A method for evaluating and selecting field experiment locations

David Trafimow, James M. Leonhardt, Mihai Niculescu & Collin Payne

Marketing Letters: A Journal of Research in Marketing
ISSN 0923-0645
DOI 10.1007/s11002-014-9345-7






© Springer Science+Business Media New York 2015

Abstract When marketing researchers perform field experiments, it is crucial that the experimental location and the control location are comparable. At present, it is difficult to assess the comparability of field locations because there is no way to distinguish differences between locations that are due to random factors from those that are due to systematic factors. We propose a methodology that enables field researchers to evaluate and select optimal field locations by parsing these random and systematic effects. To determine the accuracy of the proposed methodology, we performed computer simulations with 10,000 cases per simulation. The simulations demonstrate that accuracy increases as the number of data points increases and as consistency increases.

Keywords Marketing research . Field experiments . Experimental methodology and design . Potential performance theory

D. Trafimow (*)
Department of Psychology, New Mexico State University, MSC 3452, PO Box 30001, Las Cruces, NM 88003-8001, USA
e-mail: [email protected]

J. M. Leonhardt (*) · M. Niculescu · C. Payne
Department of Marketing, New Mexico State University, MSC 5280, PO Box 30001, Las Cruces, NM 88003-8001, USA
e-mail: [email protected]
M. Niculescu e-mail: [email protected]
C. Payne e-mail: [email protected]


1 Introduction

Much marketing research involves testing the effectiveness of particular interventions in the field. For an intervention to be practical, it is insufficient for it to work only in the laboratory. To make the case for practicality, the intervention must be demonstrated to be effective within the complex of naturally occurring forces at play in the marketplace where the intervention is to be applied (e.g., Heckman and Smith 1995; Levitt and List 2007; Lichtenstein and Slovic 1973). In addition, as field research continues to be emphasized in marketing, and as articles that depend on field research are published at an accelerating rate, we expect increased attention to be devoted to innovation in field research methods. Here, we develop a methodology for assessing and increasing the validity of field experiments in marketing. This new methodology employs the mathematics of potential performance theory (PPT; Trafimow and Rice 2008, 2009) and allows researchers to select optimal treatment and control locations for their field experiments.

A preliminary description of the issue is as follows. A close examination of proposed experimental and control locations will reveal differences. It is unlikely, for example, that every time sales increase in one store they will also increase in another. There are two general classes of reasons for these differences. First, the complex of naturally occurring forces might differ between the two stores. Second, differences can occur because of randomness, even if the complex of naturally occurring forces is precisely the same in the two stores. If there were no randomness in the world, it would be easy to test experimental-control location pairings. In the absence of randomness, a perfect control location would be one where the findings on the dependent variable of interest were the same as in the experimental location, on every measurement occasion, prior to the introduction of the intervention. The methodology presented in this paper allows researchers to handle randomness so as to approach this ideal state.

2 Theoretical background

In a field experiment, the experimenter chooses an experimental location and a control location such that the intervention occurs in the experimental location but not in the control location. For instance, suppose a large grocery retailer is interested in testing the effect of a particular promotion on produce sales. The retailer could implement the promotion in one of its stores and then compare that store's produce sales with those of another store where the promotion was absent. Should produce sales increase at the store with the promotion relative to the store without it, the retailer may conclude that the promotion was successful in increasing produce sales. In addition, better field experiments may use paired baseline periods for treatment and control stores' sales (for example), so that differences are measured not only between but within stores; the potential interaction (or difference of differences) would then be used to indicate the effect of the intervention (e.g., the promotion).

However, there are factors that can compromise this conclusion. For example, in a laboratory experiment, participants are randomly assigned to conditions, whereas in a field experiment this is typically not so. Consequently, alternative explanations pertaining to the existence


of preexisting differences tend to be more problematic for field experiments than for laboratory experiments. Put another way, differences between the two locations subsequent to the intervention could be due to factors other than the intervention. To the extent that one can plausibly argue for one or more differences between the two locations other than the intervention, confidence in the efficacy of the intervention decreases even when an effect is obtained.

The usual way of handling this problem is to attempt to show that the two locations match on variables that might be argued to differ between them. For example, suppose there was more store traffic in the experimental store than in the control store. A statistical demonstration that traffic was in fact approximately equal in both stores can help alleviate the deleterious impact of this alternative explanation on the experimenter's preferred explanation that the difference in produce sales was due to the intervention.

But there are problems with using matching to discredit alternative explanations. The most obvious problem is that, for ideal matching, the researcher needs to know all of the variables on which to match. It is unlikely that all of these will be known, greatly reducing the ability of experimenters to use a matching methodology effectively. Even if all of the relevant variables were known, it is unlikely that the two locations would match on all of them. The experimenter is then reduced to trying to control for the nonmatching variables on a statistical basis (e.g., via partial correlations, ANCOVAs, hierarchical regression, and so on), which is not very satisfactory given the many demonstrations of the problems that these methods carry with them (e.g., Heckman 1998). Even if the foregoing problems did not exist, there remains the issue that matching on observed scores does not necessarily imply a match on true scores.
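The gap between observed-score and true-score matching can be previewed with a short simulation. This example is ours, not from the paper, and all numbers are illustrative: two groups whose true scores differ by one standard deviation are "matched" within a narrow band of observed scores, yet their true-score means remain apart.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two groups whose TRUE scores differ by one standard deviation.
true_hi = rng.normal(1.0, 1.0, 100_000)
true_lo = rng.normal(0.0, 1.0, 100_000)

# OBSERVED scores add measurement noise (reliability of about 0.5).
obs_hi = true_hi + rng.normal(0.0, 1.0, 100_000)
obs_lo = true_lo + rng.normal(0.0, 1.0, 100_000)

# "Match" the groups by keeping only cases whose observed scores fall
# in the same narrow band.
keep_hi = (obs_hi > 0.4) & (obs_hi < 0.6)
keep_lo = (obs_lo > 0.4) & (obs_lo < 0.6)
m_hi, m_lo = true_hi[keep_hi], true_lo[keep_lo]

# Although observed scores match, the true-score means still differ:
# each group has regressed toward its own population mean.
print(m_hi.mean(), m_lo.mean())  # roughly 0.75 versus 0.25
```

Each matched subgroup regresses toward its own population mean, so the true-score gap shrinks but does not vanish, which is exactly the mismatch described below.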
Consider the example of socioeconomic status influencing entrepreneurial success. Suppose that a researcher suspects that entrepreneurs of high socioeconomic status average a higher intelligence level than entrepreneurs of low socioeconomic status, and wishes to match on intelligence so as to rule it out as an explanation. If the difference in intelligence is really there, matching can only work by choosing high socioeconomic status entrepreneurs whose intelligence is below their group's mean, low socioeconomic status entrepreneurs whose intelligence is above their group's mean, or both. The statistical phenomenon of regression to the mean implies that even though the matching can be carried out successfully on observed scores, the participants will remain mismatched on true scores. That is, the true score mean of the high socioeconomic status entrepreneurs will still exceed that of the low socioeconomic status entrepreneurs, even after matching on observed scores.

The totality of these problems renders matching a poor way to demonstrate that experimental and control locations are validly paired. Nevertheless, the problem remains an important one. We suggest a solution that does not depend on knowing what the relevant variables are, on measuring them to show that there are no differences with respect to them, or on handling the problem of regression to the mean when the true score means might differ. To understand the solution we propose, however, it is necessary to understand the basics of potential performance theory (Trafimow and Rice 2008, 2009). The mathematics of PPT involves two types of inputs. First, there are observed frequencies. Second, there are within-entity (e.g., location 1 or location 2) correlation


coefficients or consistency coefficients. Based on these two types of inputs and the PPT mathematics, the output is potential agreement: the agreement that would be obtained in the absence of randomness.

When performing field research, we are concerned with the agreement of two locations, the control location and the experimental location. These two locations might "agree" at a particular time or they might "disagree," and there are generally at least two ways of agreeing or disagreeing. For example, consider produce sales again. Two locations (e.g., stores) would agree if, on any particular day, their respective produce sales each exceeded or each fell below their respective produce sales medians. Alternatively, two locations would disagree if, on any particular day, one location's produce sales exceeded its daily produce sales median while the other location's fell below its median, or the reverse. In general, it is possible to compare the two locations using a 2 (location 1 is above or below its median) by 2 (location 2 is above or below its median) frequency table. The table shows the observed frequencies, that is, the frequencies of observed agreement and observed disagreement for the two locations across all trials (e.g., days) considered.

For clarity regarding subsequent equations, we define the following variables. Lower case letters denote the frequencies with which both locations were above (a) or below (d) their respective medians, location 1 was above and location 2 below (b), and the reverse (c). The row and column frequencies are r1 = a + b, r2 = c + d, c1 = a + c, and c2 = b + d. The frequency of agreement and disagreement between the two locations, as described in the foregoing 2 by 2 table, is the result of both random and systematic factors.
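The 2 by 2 classification just described can be sketched in code. The function name, the data layout (one array of daily sales per location), and the treatment of ties are our assumptions:

```python
import numpy as np

def agreement_table(sales_1, sales_2):
    """Cross-classify two locations' daily sales as above/below their own
    medians and return the 2x2 cell frequencies (a, b, c, d).
    Days exactly at the median are counted as 'below' here."""
    sales_1, sales_2 = np.asarray(sales_1), np.asarray(sales_2)
    above_1 = sales_1 > np.median(sales_1)
    above_2 = sales_2 > np.median(sales_2)
    a = int(np.sum(above_1 & above_2))    # both above their medians
    b = int(np.sum(above_1 & ~above_2))   # location 1 above, location 2 below
    c = int(np.sum(~above_1 & above_2))   # location 1 below, location 2 above
    d = int(np.sum(~above_1 & ~above_2))  # both below their medians
    return a, b, c, d
```

Two locations that rise and fall together put most days into cells a and d; two that move oppositely put most days into b and c.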
However, PPT mathematics can provide an estimate of what the "true" frequencies would be in the absence of randomness. Stated another way, PPT mathematics can be used to construct a table that indicates potential agreement between the two locations (the agreement that would be obtained in the absence of randomness). It is a PPT convention to use upper case letters to represent the frequencies that would be obtained in the absence of randomness; for example, A is the true frequency corresponding to the observed frequency a, R1 is the true row 1 frequency corresponding to the observed row frequency r1, and so on.

Before presenting the mathematical detail, consider again what makes a control location a good contrast for an experimental location. As we pointed out earlier, the point of doing research in the field is to show not just that the intervention works, but that it works amid the complex of naturally occurring forces where it needs to work to have the desired effect in the real world. But how can we know that the complex of naturally occurring forces is the same in the experimental and control locations? Even if the two locations are similar in every way, so that the control location is ideal for the experimental location with which it is paired, randomness provides an effective disguise. With PPT, however, it is possible to obtain an excellent estimate of the agreement that could be expected between the two locations in the absence of randomness. If potential agreement (PA) is near unity, the complex of naturally occurring forces works similarly in the two locations with respect to the dependent variable under investigation (e.g., produce sales), so that the control location is ideal for the experimental location with which it is paired. To the extent that PA is less than unity, the


control location is less ideal for the experimental location with which it is paired. The connection between PPT and the problem of determining the validity of control groups in field research thus becomes clear: removing the veil of randomness makes it possible to evaluate directly the validity of a control location, or the relative validity of competing control locations.

The PPT equations are easy to use by following the steps below (see Trafimow and Rice 2008 for proofs of all equations). First, it is necessary to convert the observed frequency table into a correlation coefficient. This can be accomplished via Eq. 1, where a, b, c, and d are the cell frequencies and r1, r2, c1, and c2 are the row and column frequencies.

$$ r_{XY} = \frac{ad - bc}{\sqrt{r_1 r_2 c_1 c_2}} = \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}} \tag{1} $$

The correlation coefficient in Eq. 1 is an observed correlation coefficient. To correct for randomness, however, a "corrected" or "true" correlation coefficient is needed. Equation 2 provides it, based on the observed correlation from Eq. 1 and the consistency coefficients of the two locations ($r_{XX'}$ and $r_{YY'}$).

$$ R = \frac{r_{XY}}{\sqrt{r_{XX'} r_{YY'}}} \tag{2} $$
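Equations 1 and 2 translate directly into code. The following is a minimal sketch (function names are ours); the consistency coefficients are supplied by the researcher:

```python
import math

def observed_phi(a, b, c, d):
    # Eq. 1: phi correlation from the observed 2x2 cell frequencies
    return (a * d - b * c) / math.sqrt(
        (a + b) * (a + c) * (b + d) * (c + d))

def true_correlation(r_xy, r_xx, r_yy):
    # Eq. 2: disattenuate the observed correlation using the two
    # locations' consistency coefficients
    return r_xy / math.sqrt(r_xx * r_yy)
```

Because the consistency coefficients are at most 1, the corrected correlation is always at least as large in magnitude as the observed one.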

The true correlation coefficient obtained in Eq. 2 can then be used to construct a table of true frequencies. These true frequencies are estimates of what would have happened in the absence of randomness. To obtain these estimates, the PPT researcher sets the margin frequencies at the observed levels ($R_1 = r_1$, $R_2 = r_2$, $C_1 = c_1$, and $C_2 = c_2$) and uses them in conjunction with the true correlation coefficient obtained via Eq. 2. Equations 3–6 give the four true cell frequencies.

$$ A = \frac{R\sqrt{R_1 R_2 C_1 C_2} + C_1 R_1}{R_1 + R_2} \tag{3} $$

$$ B = \frac{R_1 C_2 - R\sqrt{R_1 R_2 C_1 C_2}}{R_1 + R_2} \tag{4} $$

$$ C = \frac{C_1 R_2 - R\sqrt{R_1 R_2 C_1 C_2}}{R_1 + R_2} \tag{5} $$

$$ D = \frac{R_2 C_2 + R\sqrt{R_1 R_2 C_1 C_2}}{R_1 + R_2} \tag{6} $$

Once these true cell frequencies have been estimated, Eq. 7 can be used to obtain PA.

$$ PA = \frac{A + D}{A + B + C + D} \tag{7} $$
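Equations 3–7 can be sketched as a single helper (the function name is ours): given the true correlation R and the observed margins, it returns PA.

```python
import math

def potential_agreement(R, r1, r2, c1, c2):
    """Eqs. 3-7: true cell frequencies from the true correlation R and
    the observed margins, then potential agreement PA."""
    n = r1 + r2
    root = R * math.sqrt(r1 * r2 * c1 * c2)
    A = (root + c1 * r1) / n   # Eq. 3
    B = (r1 * c2 - root) / n   # Eq. 4
    C = (c1 * r2 - root) / n   # Eq. 5
    D = (r2 * c2 + root) / n   # Eq. 6
    return (A + D) / (A + B + C + D)   # Eq. 7
```

With equal margins (r1 = r2 = c1 = c2), this reduces algebraically to PA = 0.5 + 0.5R, so a true correlation of 1 gives PA = 1 and a true correlation of 0 gives chance agreement of 0.5.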


To gain an idea of how inconsistency reduces observed agreement relative to PA, we provide Table 1. The table illustrates the difference between PA and observed agreement when locations vary in consistency: observed agreement is always less than PA, and the discrepancy increases as consistency decreases.

3 Simulations

Considering that it is generally not possible to obtain an infinite amount of data, it is important to know how well the PPT equations perform when data are limited, in which case we might expect PPT to perform at a level far below perfection. For each simulation, we tested the ability of the PPT equations to discriminate between alternative PA levels {0.6 versus 0.7, 0.7 versus 0.8, 0.8 versus 0.9, and 0.9 versus 1.0} as sample size {N = 30, 60, 120, and 240} and consistency coefficients {0.6, 0.7, 0.8, and 0.9} varied. For example, when PA = 1.0 for one location and 0.90 for the other, we would hope that the PPT simulations would compute a higher PA estimate for the former location than for the latter. With infinite data, this would happen 100 % of the time. But with finite data, we expected a substantial percentage of failures, though the percentage of successes should increase as N increases. In addition, because lower consistency coefficients indicate more randomness, we expected success rates to increase as consistency coefficients increased.

To commence, our PPT simulations relied on user-defined consistency coefficients for each location {0.6, 0.7, 0.8, or 0.9}. There was no point in using a user-defined consistency coefficient of 1.0, as that would render PA equal to observed agreement. These user-defined values were then used as the basis for randomly generating 10,000 sample consistency coefficients for each location for each simulation. Note that as the user-defined consistency coefficients decreased, randomness increased, thereby enabling us to manipulate the amount of randomness in the simulations.

In the empirical PPT research that has been conducted thus far, researchers have employed two blocks of paired trials in order to obtain consistency coefficients.
The idea is that if every trial in the first block can be paired with a corresponding trial in the second block, the consistency coefficient for a particular person who completes the two blocks is indexed by the within-person correlation across the two blocks of paired trials. Although the two-block approach is particularly congruent with PPT, there might be cases where it is not feasible to obtain two blocks of paired trials.

Table 1 Observed agreement as a function of potential agreement (PA) and consistency coefficients for two locations (rxx′ and ryy′)

Consistency            Potential agreement (PA)
rxx′    ryy′       0.6     0.7     0.8     0.9     1.0
0.6     0.6        0.56    0.62    0.68    0.74    0.80
0.7     0.7        0.57    0.64    0.71    0.78    0.85
0.8     0.8        0.58    0.66    0.74    0.82    0.90
0.9     0.9        0.59    0.68    0.77    0.86    0.95

To simplify the simulations, we set the margin frequencies (R1 = R2 = C1 = C2, not depicted) and consistencies (rxx′ = ryy′) equal to each other


Alternative ways to compute consistency coefficients have been proposed for such cases. For example, MacDonald and Trafimow (2013) proposed an equation that allows consistency coefficients to be estimated from a single block of trials. In the simulations presented here, we generated user-defined consistency coefficients and left open the issue of how to obtain them empirically in different empirical contexts.

The difficulty with randomly generating user-defined consistency coefficients is that they are correlations, and distributions of correlations are skewed, rendering the usual normal distributions invalid for generating large numbers of consistency coefficients for the simulations. To circumvent this difficulty, we used Fisher's r-to-z transformation: Eq. 8 converts the consistency coefficients {0.6, 0.7, 0.8, and 0.9} into distribution means, and Eq. 9 gives the standard deviations. Based on these equations, 10,000 values were randomly generated for each simulation, using the means and standard deviations indicated in Eqs. 8 and 9, respectively.

$$ \mu_Z = \frac{1}{2}\log_e\left(\frac{1 + r_{XX'}}{1 - r_{XX'}}\right) \tag{8} $$

$$ \sigma_Z = \frac{1}{\sqrt{N - 3}} \tag{9} $$

Once the 10,000 values were generated for each simulation, we reversed Eq. 8 to reconvert each generated value back into a consistency coefficient. This was done via Eq. 10 below, where Z is a generated value and r is the resulting consistency coefficient, and produced 10,000 sampled consistency coefficients for each location.

$$ r = \frac{e^{2Z} - 1}{e^{2Z} + 1} \tag{10} $$
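The sampling procedure of Eqs. 8–10 might be sketched as follows (the function name and the choice of NumPy's generator are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_consistencies(r_pop, n, size=10_000):
    """Draw `size` sample consistency coefficients around the
    user-defined population value r_pop (Eqs. 8-10)."""
    mu_z = 0.5 * np.log((1 + r_pop) / (1 - r_pop))    # Eq. 8
    sigma_z = 1.0 / np.sqrt(n - 3)                    # Eq. 9
    z = rng.normal(mu_z, sigma_z, size)               # normal in z-space
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)  # Eq. 10
```

Sampling in z-space and back-transforming keeps every generated coefficient inside (−1, 1), which a normal distribution in r-space would not guarantee.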

We then addressed observed agreement for each pair of locations {0.6 versus 0.7, 0.7 versus 0.8, 0.8 versus 0.9, and 0.9 versus 1.0}. To simplify the simulations, we set the margin frequencies equal to each other (R1 = R2 = C1 = C2). In turn, this allowed us to replace Eqs. 3–6 with the single equation given below as Eq. 11 (see Trafimow and Rice 2008 for proof). In Eq. 11, s is the observed agreement and S is the PA. The relation can be reversed using Eq. 12.

$$ s = S\sqrt{r_{XX'} r_{YY'}} + 0.5 - 0.5\sqrt{r_{XX'} r_{YY'}} \tag{11} $$

$$ S = \frac{2s - 1 + \sqrt{r_{XX'} r_{YY'}}}{2\sqrt{r_{XX'} r_{YY'}}} \tag{12} $$
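Equations 11 and 12 form an inverse pair, which a short sketch makes explicit (function names are ours):

```python
import math

def observed_from_potential(S, r_xx, r_yy):
    # Eq. 11: observed agreement s implied by potential agreement S
    root = math.sqrt(r_xx * r_yy)
    return S * root + 0.5 - 0.5 * root

def potential_from_observed(s, r_xx, r_yy):
    # Eq. 12: recover potential agreement S from observed agreement s
    root = math.sqrt(r_xx * r_yy)
    return (2 * s - 1 + root) / (2 * root)
```

For example, `observed_from_potential(1.0, 0.6, 0.6)` returns approximately 0.80, matching the first row of Table 1.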



Given that Eq. 11 supplied a value for observed agreement, we converted this value to a correlation coefficient using Eq. 13 below. Because distributions of correlation coefficients are skewed, as previously described, we used the Fisher r-to-z transformation


process (Eqs. 8 and 9) to obtain 10,000 correlation values, which Eq. 14 then converted back into observed agreement values (Rosenthal and Rosnow 1991; Trafimow and Rice 2008).

$$ r = 2(s - 0.5) \tag{13} $$

$$ s = 0.5 + \frac{r}{2} \tag{14} $$

Finally, it only remained to estimate PA using Eq. 12 in conjunction with the observed agreement values and consistency coefficients obtained in the previous steps of the simulation procedure.

3.1 Simulation results

The percentage of times that PPT chose the correct control location varied with three factors. Most important, as N increased, PPT was correct more often. Second, as population consistency coefficients increased, so that randomness decreased, the accuracy of PPT increased. Finally, and of least interest, PPT performed best when choosing between population PAs of 1.0 and 0.90, and increasingly less well when choosing between 0.90 and 0.80, 0.80 and 0.70, and 0.70 and 0.60. Figure 1 illustrates these effects, as does Table 2.
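To make the simulation procedure concrete, here is a simplified end-to-end sketch of one cell of the design. It is our reconstruction, not the authors' code, and it assumes equal consistencies at both locations and equal margins, so that Eqs. 11–14 apply:

```python
import numpy as np

rng = np.random.default_rng(1)

def r_to_z(r):
    return 0.5 * np.log((1 + r) / (1 - r))

def z_to_r(z):
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

def sample_r(r_pop, n, size):
    # Fisher r-to-z sampling (Eqs. 8-10)
    return z_to_r(rng.normal(r_to_z(r_pop), 1 / np.sqrt(n - 3), size))

def success_rate(pa_hi, pa_lo, consistency, n, cases=10_000):
    """Fraction of simulated cases in which the location with the higher
    population PA also receives the higher estimated PA (Eq. 12)."""
    def estimate_pa(pa_pop):
        # population observed agreement implied by Eq. 11
        s_pop = pa_pop * consistency + 0.5 - 0.5 * consistency
        # sample observed agreement via Eqs. 13, 8-10, and 14
        s = 0.5 + sample_r(2 * (s_pop - 0.5), n, cases) / 2
        # sample the consistency coefficient, then apply Eq. 12
        rc = sample_r(consistency, n, cases)
        return (2 * s - 1 + rc) / (2 * rc)
    return float(np.mean(estimate_pa(pa_hi) > estimate_pa(pa_lo)))
```

Under this sketch, a best-case cell such as `success_rate(1.0, 0.9, 0.9, 240)` lands near 1, while a worst-case cell such as `success_rate(0.7, 0.6, 0.7, 30)` lands near 0.7, in line with the pattern of results reported above.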

Fig. 1 Average frequencies for the correct control location being selected in the pairwise comparisons between neighboring potential agreement (PA) levels across sample size (N) and consistency levels

Consistency     N = 30     N = 60     N = 120    N = 240
0.7             72.80 %    81.38 %    90.27 %    96.42 %
0.8             78.73 %    87.29 %    94.22 %    98.49 %
0.9             84.82 %    92.12 %    97.05 %    99.33 %

4 General discussion

When comparing possible field locations, how is a researcher to distinguish random differences from systematic ones? Our goal was to answer this question by invoking the mathematics of PPT. In the ideal case of an infinite number of data points, PPT works perfectly.

Table 2 Pairwise comparisons between neighboring levels of potential agreement (PA) across consistency and sample size (N) levels used in the simulations

                              Pairwise PA comparisons
N     Consistencies    1 vs.     0.9 vs.   0.8 vs.   0.7 vs.   Average
      (rxx′ = ryy′)    0.9 (%)   0.8 (%)   0.7 (%)   0.6 (%)   (%)
30    0.9              96.66     86.72     79.94     75.97     84.82
      0.8              86.38     79.83     76.00     72.70     78.73
      0.7              76.70     73.18     71.87     69.43     72.80
60    0.9              99.72     94.56     88.90     85.29     92.12
      0.8              94.27     88.88     84.21     81.79     87.29
      0.7              86.10     81.81     79.53     78.09     81.38
120   0.9              100.00    98.77     96.08     93.33     97.05
      0.8              98.93     95.78     92.39     89.77     94.22
      0.7              94.11     91.07     88.55     87.34     90.27
240   0.9              100.00    99.91     99.33     98.09     99.33
      0.8              99.98     99.45     97.91     96.60     98.49
      0.7              98.52     97.17     95.68     94.31     96.42

But because the PPT equations provide estimates, having too few data points is a potential problem. To address this issue, we performed simulations to understand how well PPT distinguishes competing control locations at different user-defined population PA levels. In the worst-case scenario, where the population PA levels are low (0.70 versus 0.60), population consistencies are low (0.70 at both locations), and there are 30 data points for every sample estimate, PPT does not perform well (proportion of correct choices = 0.69). In the best-case scenario, where population PA levels are high (1.0 versus 0.90), population consistency coefficients are impressive (0.90), and there are 240 data points for every estimate, PPT performs very well (proportion of correct choices = 1.0). In general, when N is large, PPT tends to do well even under less than ideal conditions with respect to the other two factors.

Therefore, our proposed PPT methodology works well provided that a sufficient number of data points are available. This conclusion is consistent with other research pertaining to the law of large numbers (Nisbett et al. 1983). Traditionally, marketing researchers have been able to obtain large amounts of field data from sources such as direct mail catalogs (e.g., Anderson and Simester 2001), school administrators and nonprofit organizations (e.g., Raju et al. 2010), e-commerce sites (e.g., Algesheimer et al. 2010), and retail scanner data (e.g., Levav and Zhu 2009).

In addition to having access to a sufficient quantity of data on the dependent variable of interest, users of PPT must also consider how to estimate consistency coefficients for each location under consideration. Ideally, the researcher would have sufficient repeated measures data from each location. However, this ideal may not be met given the often limited form and quantity of available field data.
For instance, we suspect that marketing researchers may encounter this problem when using grocery store scanner data at the daily level. Such data does not allow for a


sufficient number of comparisons if comparing by day of the week or month of the year. To obtain a sufficient number of comparisons, the researcher could construct a frequency table by first determining whether the dependent variable of interest (e.g., sales) during a given time period (e.g., a day) is above or below its historical median. The researcher could then compare these time periods based on some criterion (e.g., neighboring days) to assess the frequency of their agreement and disagreement. The benefit of this method of obtaining consistency coefficients is that it does not limit the researcher to repeated measures data; PPT can then be used with many more forms of data, as long as a sufficient quantity of data points is available. The drawback is that consistency coefficients will likely be lower than would be the case with repeated measures data.
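One way the neighboring-days idea might be implemented is sketched below. The helper and the strict above-median classification are our assumptions; the paper deliberately leaves the exact pairing criterion open:

```python
import numpy as np

def neighboring_day_consistency(daily_sales):
    """Classify each day's sales as above/below the historical median,
    then correlate each day's classification with the next day's
    (the 'neighboring days' pairing criterion)."""
    daily_sales = np.asarray(daily_sales, dtype=float)
    above = (daily_sales > np.median(daily_sales)).astype(float)
    # correlation between day t and day t+1 classifications
    return float(np.corrcoef(above[:-1], above[1:])[0, 1])
```

A location whose sales persist above or below the median across neighboring days yields a high coefficient; one that flips from day to day yields a low or negative one.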

5 Conclusion

As marketing researchers continue to search for interventions that work in the field, it is not surprising that field experiments are increasingly emphasized. However, because field experiments use random assignment to conditions infrequently, there is the ubiquitous concern that an obtained difference between the treatment and control locations is due to differences in the locations rather than to the manipulated intervention. As we explained earlier, the usual methods involving matching, statistical control, and so on insufficiently address this concern. A major part of the problem is the difficulty of knowing all of the relevant variables on which to match or statistically control. In addition, even in the unlikely event that all of the relevant variables are known, matching and statistical control remain problematic.

Our solution does not depend on the researcher's ability to know what all of the relevant variables are. Unlike traditional methods that try to increase internal validity at the expense of external validity, our approach increases both internal and external validity without involving a trade-off. As a result, this method is consistent with the results of a meta-analysis by Anderson et al. (1999). Our proposed methodology hearkens back to the point of doing field experiments in the first place, which is to test the intervention amid the complex of naturally occurring forces where it actually would be applied. But for a field experiment to be fair, the complex of naturally occurring forces should be the same in the treatment and control locations. Normally, this would be impossible to determine because there would be no way to distinguish systematic from random differences between potential treatment and control locations. Our methodology provides researchers with the capability to parse these effects.

The simulations demonstrated that although PPT does not work well with few data points (N = 30 or less), it works extremely well with a large number of data points (N = 120 or more). In many marketing contexts, it is not difficult to obtain a large number of data points, thereby maximizing the validity of our proposed methodology. Although the simulations were successful, we believe there is room for future research to improve the proposed methodology.


In our view, the most important difficulty for researchers who wish to use our methodology is in obtaining the consistency coefficients that it requires. There are multiple ways to obtain these coefficients. As of now, it is not clear which way is most valid, or even if a combination of ways is more valid than any single way. Fortunately, this is an issue that seems amenable to empirical testing. Therefore, we hope and expect that the present demonstration will lead to at least two tracks of research. Most obviously, we expect that future marketing researchers will routinely use our proposed methodology to test the validity of their pairings of experimental and control locations. Less obviously, we expect that researchers will perform competitive tests of alternative methods for obtaining consistency coefficients for locations. Finally, assuming that researchers pursue both of these tracks, it seems likely that the findings on both tracks will inform each other, and the integration of the two tracks will prove to be more informative than either one by itself.

References

Algesheimer, R., Borle, S., Dholakia, U. M., & Singh, S. S. (2010). The impact of customer community participation on customer behaviors: an empirical investigation. Marketing Science, 29(4), 756–769.
Anderson, C. A., Lindsay, J. M., & Bushman, B. J. (1999). Research in the psychological laboratory: truth or triviality? Current Directions in Psychological Science, 8(1), 3–9.
Anderson, E. T., & Simester, D. I. (2001). Are sale signs less effective when more products have them? Marketing Science, 20(2), 121–142.
Heckman, J. J. (1998). Detecting discrimination. The Journal of Economic Perspectives, 12(2), 101–116.
Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. The Journal of Economic Perspectives, 9(2), 85–110.
Levav, J., & Zhu, R. J. (2009). Seeking freedom through variety. Journal of Consumer Research, 36(4), 600–610.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? The Journal of Economic Perspectives, 21(2), 153–174.
Lichtenstein, S., & Slovic, P. (1973). Response-induced reversals of preference in gambling: an extended replication in Las Vegas. Journal of Experimental Psychology, 101(1), 16–20.
MacDonald, J. A., & Trafimow, D. (2013). A measure of within-participant consistency. Behavior Research Methods, 45(4), 950–954.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90(4), 339–363.
Raju, S., Rajagopal, P., & Gilbride, T. J. (2010). Marketing healthful eating to children: the effectiveness of incentives, pledges, and competitions. Journal of Marketing, 74(3), 93–106.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: methods and data analysis. New York: McGraw-Hill.
Trafimow, D., & Rice, S. (2008). Potential performance theory (PPT): a general theory of task performance applied to morality. Psychological Review, 115(2), 447–462.
Trafimow, D., & Rice, S. (2009). Potential performance theory (PPT): describing a methodology for analyzing task performance. Behavior Research Methods, 41(2), 359–371.