New Ideas in Psychology 50 (2018) 34–37


New Ideas in Psychology journal homepage: www.elsevier.com/locate/newideapsych

Means and standard deviations, or locations and scales? That is the question!




David Trafimow∗, Tonghui Wang, Cong Wang
New Mexico State University, United States

ARTICLE INFO

Keywords: Location; Scale; Skew; Skew-normal distributions

ABSTRACT

According to standard experimental practice, researchers randomly assign participants to experimental and control conditions, deeming the experiment “successful” if the means of the two conditions differ in the hypothesized direction. Even for complex experiments with many conditions, success generally depends on a comparison or contrast of means across conditions. Because the experimental manipulation may change the shape of the distribution, we show that a difference in means, even if large and in the hypothesized direction, does not necessarily indicate the success of the experiment. To make this determination, it is also necessary to compute location statistics. It is possible for means to change while locations do not, for locations to change while means do not, and for mean differences and location differences to be in opposite directions. Therefore, typical research that depends on differences between means across conditions cannot be trusted in the absence of location statistics. For similar reasons, standard deviations cannot be trusted without scale statistics. We therefore take the radical step of arguing that all researchers who report means and standard deviations should also be required to report the corresponding location and scale statistics.

1. Introduction

The standard experimental approach is similar across most sciences. At minimum, the researcher performs an experimental manipulation that includes an experimental condition and a control condition, and a difference in means across the two conditions, in the hypothesized direction, is taken as indicating “success.” The researcher also might compute an effect size. Usually, this implies dividing the difference between means by the standard deviation, to find the size of the difference in standard deviation units. Of course, much research is more sophisticated, with more conditions, but the basic procedure of comparing means across conditions, and drawing conclusions about the success or failure of the experiment from differences between means, remains. Our goal is to show that using means in this way is problematic, but also that there is a solution.

To introduce the problem, it is important to realize that most distributions are skewed rather than normal (Ho & Yu, 2015; Micceri, 1989). In addition, in the context of an experiment, it is quite possible that the researcher's experimental manipulation changes the skew of the distribution in the experimental condition relative to the control condition. The experimental manipulation might introduce ceiling effects, floor effects, extreme scores, or other factors that increase the level of skew relative to the control condition. Alternatively, the control condition might have a skewed distribution and the experimental manipulation might decrease, eliminate, or reverse the skew in the experimental condition. The possibility that an experimental manipulation can cause the experimental and control conditions to differ with respect to skewness suggests that there is an alternative explanation for differences in means across these conditions, other than that the experiment was successful. This alternative explanation constitutes the present main topic.

To gain an initial understanding of the argument to be developed more formally later, consider the familiar mean, median, and mode. These are different ways of assessing what might be considered, informally, to be the “center” or “location” of a distribution. That there are multiple such assessments suggests that the center or location of a distribution, used in this general sense, is not well defined, though we will use location in a very specific and well-defined sense later. Intuitively, the larger the center or location of a distribution (such as the mean, median, or mode), the more the distribution is shifted to the right on a horizontal axis. But there is an important caveat in the context of skew-normal distributions. Skew-normal distributions are defined by three parameters, to be elaborated later: location (the “center” of the distribution), scale (dispersion of scores), and skewness (shape of the distribution). Because location is one of the three defining parameters of skew-normal distributions, and mean, median, and mode are not, our focus is on location. It is vital not to confuse the word “location” as used generally to indicate a class of parameters, such as mean, median, and mode, with “location” as used specifically to denote a parameter that helps define a skew-normal distribution. Hereafter, we use “location” in the latter sense.

Just as “location,” in the context of skew-normal distributions, differs from mean, “scale” differs from standard deviation. Because scale is a defining characteristic of skew-normal distributions, and standard deviation is not, scale is more useful than standard deviation in understanding skew-normal distributions.

Location, scale, and skewness are independent parameters of skew-normal distributions. Consequently, it is quite possible for an experimental manipulation to influence one of them without influencing the others, or to influence two of them without influencing the third. For example, an experimental manipulation could influence the skewness parameter, which necessarily influences the mean and standard deviation if the location and scale parameters are unchanged. In that case, with the location of the skew-normal distribution unchanged, it follows that what seems a successful experimental manipulation based on means is not successful after all, based on locations.

The present argument relates to one made by Speelman and McGann (2013), who showed that any summary statistic might be misleading. In fact, although our present goal is to support the use of location and scale statistics to uncover when means and standard deviations are misleading, we also wish to be up front that even location or scale statistics can be misleading. For example, if one has a bimodal distribution rather than a skew-normal distribution, the location is misleading. This is one reason we encourage researchers to use visual displays to aid in better understanding their data (Valentine, Aloe, & Lau, 2015).

∗ Corresponding author. Department of Psychology, MSC 3452, New Mexico State University, P. O. Box 30001, Las Cruces, NM, 88003-8001, United States. E-mail address: dtrafi[email protected] (D. Trafimow).

https://doi.org/10.1016/j.newideapsych.2018.03.001
Received 26 December 2017; Received in revised form 27 January 2018; Accepted 9 March 2018
0732-118X/ © 2018 Elsevier Ltd. All rights reserved.

2. The alternative explanation

It is necessary to provide a brief introduction to the family of skew-normal distributions, of which the family of normal distributions is a subset (Azzalini, 2014). A random variable Z is said to have a standard skew-normal distribution with skewness parameter λ, denoted as Z ∼ SN(λ), if its probability density function (pdf) is given by

$$f(z) = 2\,\phi(z)\,\Phi(\lambda z), \tag{1}$$

where ϕ(⋅) and Φ(⋅) are the pdf and the cumulative distribution function (cdf) of the standard normal distribution, respectively. Let Z ∼ SN(λ), and consider the linear function of Z

$$X = \xi + \omega Z. \tag{2}$$

Then the random variable X has a skew-normal distribution with location parameter ξ, scale parameter ω, and skewness parameter λ, denoted as X ∼ SN(ξ, ω², λ). The pdf of X is given by

$$f(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\lambda\,\frac{x-\xi}{\omega}\right). \tag{3}$$

The mean and variance are

$$E(X) = \mu = \xi + \sqrt{\frac{2}{\pi}}\,\delta\omega \quad\text{and}\quad V(X) = \sigma^{2} = \omega^{2}\left(1 - \frac{2}{\pi}\,\delta^{2}\right), \tag{4}$$

where δ = λ/√(1 + λ²).

The foregoing equations have important implications. When the skewness parameter λ = 0, the distribution is normal and the mean parameter μ equals the location parameter ξ. Thus, the mean parameter also functions as the location parameter. However, when λ ≠ 0, the distribution is skew-normal, μ ≠ ξ, and the mean fails to give the location of the distribution. In addition, when the distribution is normal, the standard deviation parameter σ also functions as the scale parameter ω, but when the distribution is skew-normal, σ ≠ ω, and the standard deviation fails to function as the scale. The present argument depends on the fact that for skewed distributions (λ ≠ 0), μ ≠ ξ and σ ≠ ω.

Returning to the case where a researcher performs an experiment, suppose that the effect of the experimental manipulation is to increase the skewness (in the positive direction) of the experimental condition relative to the control condition. But let us also assume that the location ξ is the same in both conditions. If the researcher wishes to shift the distribution of scores in the experimental condition upward relative to the distribution of scores in the control condition, it should be obvious that the experiment is not successful because the location is the same in both conditions. Nevertheless, the means are necessarily different, and in the hypothesized direction too. Stated more generally, changing the skewness of a distribution causes the mean to change too, if the location and scale remain the same. And so we arrive at our alternative explanation for a larger mean in the experimental condition than in the control condition. That is, it could be that the experiment is not successful because the location does not change, but the experiment seems successful because the mean changes in the predicted direction. Furthermore, the change in skewness also forces a change in the standard deviation, if the scale does not change. In general, increasing the skewness magnitude in either the positive or negative direction forces the standard deviation to decrease when the scale remains constant.

Fig. 1 illustrates this alternative explanation. In Fig. 1, all the curves have the same location and scale, but have different shapes (amounts of skew) and, consequently, different means and standard deviations. More precise mathematical demonstrations are forthcoming.

Fig. 1. The probability density functions (pdf) for skew-normal distributions are shown with location parameter 0 (marked by ‘|’); scale parameter 1; and skewness parameters −4 (cube curve), −2 (star curve), 0 (solid curve), 1 (dot curve) and 5 (triangle curve).

It is possible to increase the precision of the argument by introducing Figs. 2–4. To create these figures, we mathematically modeled an experiment with experimental and control conditions, stipulating that the control condition distribution is normal (λ = 0), with location (and mean) equal to 0, and scale (and standard deviation) equal to 1. In contrast, we allowed the skew of the experimental condition to vary between −5 and 5, with the location and scale parameters remaining at 0 and 1, respectively. Thus, the experiment is blatantly unsuccessful no matter what level of skew is assumed, because the location and scale parameters do not change. But we shall see that as skewness increases, the experiment seems increasingly successful because of the effects of skewness on the mean and standard deviation of the experimental condition.

To understand Fig. 2, it is worthwhile to keep in mind that the mean of the control condition, where the skew is 0, also is 0. Relative to 0, then, Fig. 2 shows that increasing the skewness magnitude in the positive direction causes the mean of the experimental condition to increase too, despite the location being 0 in both conditions. And increasing the skewness magnitude in the negative direction causes the mean of the experimental condition to decrease. Either way, as the skewness magnitude increases, the mean of the experimental condition is increasingly displaced from the mean of the control condition, which is 0. Nor is it necessary to have an extreme skewness magnitude to obtain a reasonably strong apparent success. For example, even when the skew in the experimental condition is only 0.5, it nevertheless results in the mean increasing from 0 to 0.36. Clearly, showing a difference in means in the two conditions is insufficient for showing that the experiment “worked.” Rather, it is necessary to demonstrate that the locations are different.

Fig. 2. The mean of the experimental condition is presented along the vertical axis as a function of its skew, presented along the horizontal axis. The location is 0 and the scale is 1.

Fig. 3 illustrates how the standard deviation changes as the skewness magnitude in the experimental condition increases relative to 0. Because the scale was set at 1, the standard deviation of the experimental condition is less than 1 at all levels of skew that differ from zero, and the standard deviation continues to decrease as the skewness magnitude increases.

Fig. 3. The standard deviation of the experimental condition is presented along the vertical axis as a function of its skew, presented along the horizontal axis. The location is 0 and the scale is 1.

Researchers often interpret experimental findings in terms of effect size by dividing the difference between the means of the experimental and control conditions by the standard deviation. Fig. 4 illustrates how the absolute value of each of three sorts of effect size increases as skewness magnitude increases, again keeping the location in both conditions at 0 and the scale in both conditions at 1. The topmost curve was based on the experimental condition standard deviation, the middle curve was based on the average of the standard deviations of the two conditions, and the bottommost curve was based on the control group standard deviation. It is interesting to consider some numbers. According to the middle curve, when the skewness magnitude exceeds 1.72, the effect size exceeds 0.8, which is conventionally considered to be a “large” effect size (Cohen, 1988). This is despite the location being the same in both conditions, thereby indicating that the manipulation was not successful! Even a slight degree of skewness, such as 0.5, corresponds with an effect size of 0.37, which most researchers would consider reasonably successful.

Fig. 4. The absolute value of the effect size is presented along the vertical axis as a function of the skewness magnitude of the experimental condition, presented along the horizontal axis. The location is 0 and the scale is 1.

3. What if the effect size is zero?

In the previous section, we saw that it is possible to obtain impressive effect sizes, as computed using the difference between means divided by the standard deviation, even when there is no change in the location of the distributions. In this section, we take the opposite point of view. That is, suppose that the means and standard deviations in the experimental and control conditions are the same; does that mean that the experiment was unsuccessful? Or might a consideration of location and scale change this seemingly obvious conclusion?

Suppose that we have a population of people who suffer from high blood pressure, so that their mean systolic blood pressure is 180 with a standard deviation of 20. They are randomly assigned to an experimental (treatment) condition where they receive medicine designed to lower their blood pressure or to a control condition where there is no medicine. Finally, imagine that the mean and standard deviation are 180 and 20, respectively, in both conditions. In that case, the obvious conclusion is that the medicine is ineffective. But such a conclusion might be hasty if skewness is not considered. It is possible that the experimental manipulation does change the location, but also influences the skew in a way that hides the location change. To continue the example, suppose the skewness parameter is 0 in the control condition but 0.80 in the experimental condition. In that case, the locations would be 180 and 168.50 in the control and experimental conditions, respectively, for a systolic blood pressure reduction of 11.50. Thus, the medicine is effective after all, despite appearances based on means and standard deviations. Again, we see that means and standard deviations can be deceiving.
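The numerical claims in Sections 2 and 3 follow from Eq. (4), and readers may find it helpful to check them directly. The following is a minimal sketch, assuming only the mean and variance formulas of Eq. (4) with δ = λ/√(1 + λ²); the function names (`mean_sd`, `location_scale`, `effect_size_avg_sd`) are hypothetical and chosen for illustration, not taken from the article or from any library.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def delta(lam):
    """delta = lambda / sqrt(1 + lambda^2), as defined under Eq. (4)."""
    return lam / math.sqrt(1.0 + lam * lam)


def mean_sd(xi, omega, lam):
    """Mean and standard deviation of SN(xi, omega^2, lam) from Eq. (4)."""
    d = delta(lam)
    mean = xi + SQRT_2_OVER_PI * d * omega
    sd = omega * math.sqrt(1.0 - (2.0 / math.pi) * d * d)
    return mean, sd


def location_scale(mean, sd, lam):
    """Invert Eq. (4): recover (xi, omega) from mean, SD, and a known lam."""
    d = delta(lam)
    omega = sd / math.sqrt(1.0 - (2.0 / math.pi) * d * d)
    xi = mean - SQRT_2_OVER_PI * d * omega
    return xi, omega


def effect_size_avg_sd(lam):
    """|mean difference| divided by the average SD of the two conditions.

    Control is N(0, 1); experimental is SN(0, 1, lam). This mirrors the
    description of the middle curve of Fig. 4.
    """
    exp_mean, exp_sd = mean_sd(0.0, 1.0, lam)
    return abs(exp_mean - 0.0) / ((exp_sd + 1.0) / 2.0)


if __name__ == "__main__":
    # Section 2: location 0, scale 1, skewness parameter 0.5.
    print(round(mean_sd(0.0, 1.0, 0.5)[0], 2))     # mean, about 0.36
    print(round(effect_size_avg_sd(0.5), 2))       # effect size, about 0.37
    print(round(effect_size_avg_sd(1.72), 2))      # effect size, about 0.80
    # Section 3: mean 180, SD 20, skewness parameter 0.80.
    print(round(location_scale(180.0, 20.0, 0.80)[0], 2))  # location, about 168.5
```

Note that `location_scale` treats the skewness parameter λ as known, as in the worked examples; estimating λ from data is a separate step.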

4. Can mean and location differences trend in opposite directions?

Consider another blood pressure experiment where the control group is normally distributed, has a location of 180 (and mean of 180), and the scale is 20 (and the standard deviation is 20). In contrast, the experimental group is skewed (skewness parameter = −2), with a location of 190.38 and scale of 28.55. Analyzing locations shows that the blood pressure medicine increases systolic blood pressure (by 10.38 points) rather than decreasing it. And yet, the reverse appears true if one analyzes means. The mean in the experimental condition is 170 and the standard deviation is 20. Remembering that the mean in the control group is 180 (and standard deviation is 20), analyzing means renders the appearance that the blood pressure medicine reduces systolic blood pressure by 10 points, whereas analyzing locations demonstrates that the medicine actually increases systolic blood pressure by 10.38 points.

5. Discussion

If an experimental manipulation causes the experimental and control conditions to have different levels of skew, this can create at least three classes of problems in interpreting means. Specifically, there can be a difference in means when there is no difference in locations; there can be a difference in locations without a difference in means; and worst of all, a difference in means can go in the opposite direction from a difference in locations. Furthermore, the problems can occur even with a mild difference in skewness in the two conditions. Consequently, unless a researcher can rule out that the experimental manipulation influenced skewness, there is no way to know if the difference, or lack of difference, in means can be trusted.

Fortunately, the solution is obvious. If researchers would routinely provide sample location and scale statistics, as they currently do means and standard deviations, it would be easy to check whether the implications of location and scale statistics are similar to or different from the implications of means and standard deviations.

So why not also include location and scale statistics on a routine basis to address the potential problem? We see only three reasons. The first reason is that researchers may not understand the importance of location and scale statistics; but the present article should render that moot. The second reason is that researchers may not know how to compute scale and location statistics at the sample level. But this is easily handled. Practically all statistical packages give sample means and standard deviations, and most also give sample skew statistics. Based on these, Trafimow, Wang, and Wang (in press) provided the necessary equations for obtaining sample statistics for location, scale, and shape, including a worked example with real data. Researchers who wish to compute location and scale statistics merely need to follow their lead. The third reason is that researchers know how to perform null hypothesis significance tests for differences in means, but not for differences in locations. But this also is a poor reason for failing to compute location and scale statistics. As a recent statement from the American Statistical Association admits (Wasserstein & Lazar, 2016), significance tests are only capable of telling the researcher how likely a finding is under the null hypothesis and assumptions of the test, but they cannot tell the researcher what to believe; descriptive statistics are essential here. And we have seen that means and standard deviations can lead to conclusions that are misleading, or just plain wrong, unless accompanied by location and scale statistics.¹

Although the present focus was on experiments, our theme can be applied to any study comparing the means of two or more groups. For example, cross-cultural researchers might compare the means of different cultures on collectivism, economists might compare mean production levels in different industries, and so on. If skews differ across cultures, industries, and so on, means and standard deviations are misleading, and the computation of location and scale statistics is vital for understanding the data. In all such cases where researchers normally would compute and report means and standard deviations, they also should compute and report location and scale statistics.

Therefore, an important change across most sciences is required. Instead of settling just for means and standard deviations, researchers should compute location and scale statistics too, as necessary for better understanding their data. When researchers routinely compute and report location and scale statistics, it will be interesting to see how many seemingly solid conclusions turn out to be wrong, how many solid conclusions can be drawn when they seemingly cannot, and the frequency of location effects in the opposite direction of mean effects.
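To illustrate how location and scale statistics might be obtained from the mean, standard deviation, and skew that statistical packages already report, the following is a minimal sketch. It assumes the standard method-of-moments inversion for the skew-normal family, which is not necessarily the set of equations given by Trafimow, Wang, and Wang (in press). It also assumes that the "skew" input is the usual standardized third-moment skewness coefficient, which is a different quantity from the skewness parameter λ used in the worked examples. The function names are hypothetical.

```python
import math

# Constant in the skew-normal skewness formula: (4 - pi) / 2.
C = (4.0 - math.pi) / 2.0


def sn_moments(xi, omega, lam):
    """Mean, SD, and skewness coefficient of SN(xi, omega^2, lam)."""
    d = lam / math.sqrt(1.0 + lam * lam)
    m = d * math.sqrt(2.0 / math.pi)          # standardized mean shift
    mean = xi + omega * m
    sd = omega * math.sqrt(1.0 - m * m)
    skew = C * m ** 3 / (1.0 - m * m) ** 1.5
    return mean, sd, skew


def sn_from_moments(mean, sd, skew):
    """Method-of-moments estimates (xi, omega, lam) from mean, SD, skewness.

    Assumption: the data are well described by a skew-normal distribution.
    A skew-normal cannot have |skewness| of about 0.995 or more, so larger
    inputs are rejected rather than silently truncated.
    """
    g = abs(skew) ** (2.0 / 3.0)
    m2 = g / (g + C ** (2.0 / 3.0))           # squared standardized mean shift
    d2 = (math.pi / 2.0) * m2                 # delta squared
    if d2 >= 1.0:
        raise ValueError("skewness outside the range of the skew-normal family")
    d = math.copysign(math.sqrt(d2), skew)
    lam = d / math.sqrt(1.0 - d2)
    omega = sd / math.sqrt(1.0 - m2)
    xi = mean - omega * d * math.sqrt(2.0 / math.pi)
    return xi, omega, lam


if __name__ == "__main__":
    # Round trip using the Section 4 parameters (location 190.38,
    # scale 28.55, skewness parameter -2).
    mean, sd, skew = sn_moments(190.38, 28.55, -2.0)
    print(round(mean, 1), round(sd, 1))       # about 170.0 and 20.0
    print(sn_from_moments(mean, sd, skew))    # recovers the three parameters
```

With real data the inputs would be sample statistics, so the resulting location and scale values are estimates subject to sampling error, and the skewness coefficient in particular can be unstable in small samples.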

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.newideapsych.2018.03.001.

References

Azzalini, A. (2014). The skew-normal and related families. New York: Cambridge University Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Ho, A. D., & Yu, C. C. (2015). Descriptive statistics for modern test score distributions: Skewness, kurtosis, discreteness, and ceiling effects. Educational and Psychological Measurement, 75, 365–388.
Hubbard, R. (2016). Corrupt research: The case for reconceptualizing empirical management and social science. Los Angeles, California: Sage Publications.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Speelman, C. P., & McGann, M. (2013). How mean is the mean? Frontiers in Psychology, 4, 451. http://dx.doi.org/10.3389/fpsyg.2013.00451.
Trafimow, D., Wang, T., & Wang, C. (in press). From a sampling precision perspective, skewness is a friend and not an enemy! Educational and Psychological Measurement.
Valentine, J. C., Aloe, A. M., & Lau, T. S. (2015). Life after NHST: How to describe your data without “p-ing” everywhere. Basic and Applied Social Psychology, 37, 260–273.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70, 129–133.
Ziliak, S. T., & McCloskey, D. N. (2016). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. Ann Arbor, Michigan: The University of Michigan Press.

¹ This seems like a good place to point out that null hypothesis significance testing is increasingly coming under attack (see Hubbard, 2016; Ziliak & McCloskey, 2016, for recent reviews). Also, at the American Statistical Association Symposium on Statistical Inference (2017), null hypothesis significance testing was widely criticized, with the main theme pertaining to alternatives.
