Arch Sex Behav (2012) 41:325–327 DOI 10.1007/s10508-011-9887-1

GUEST EDITORIAL

Gender Balance, Representativeness, and Statistical Power in Sexuality Research Using Undergraduate Student Samples

Emily R. Dickinson • Jill L. Adelson • Jesse Owen

Published online: 7 January 2012 © Springer Science+Business Media, LLC 2012

E. R. Dickinson (✉) · J. L. Adelson · J. Owen
College of Education and Human Development, University of Louisville, Louisville, KY 40292, USA
e-mail: [email protected]

It is well documented that a great deal of psychological research, including the study of sexual behavior, relies on undergraduate students as study participants (e.g., Landrum & Chastain, 1999; Miller, 1981; Owen, Rhoades, Stanley, & Fincham, 2010; Saks & Seiber, 1989). In many ways, undergraduate students are an ideal sample for understanding sexual behavior because such behavior is developmentally relevant for young adults. Beyond providing an inexpensive and readily available source of research subjects, the use of student pools has also been justified by the educational benefits it provides to students who participate, such as increased knowledge about contemporary psychology and research ethics (Rosell et al., 2005). However, there are also two central concerns with the use of undergraduate student participants, namely gender imbalances and the representativeness of the samples.

Researchers are often concerned with obtaining balanced samples, that is, subgroups of equal or nearly equal size. Although such a concern is well rooted in statistical theory, it can pose practical limitations for those who rely largely on undergraduate student samples, as undergraduates are often given the opportunity to self-select into particular research studies (Miller, 1981). Women, freshmen, and psychology majors can be overrepresented in undergraduate research pools (Barlow & Cromer, 2006), and gender, prior sexual experience, and sexual attitudes may be related to the types of studies in which students are willing to participate (Gaither, Meier, & Sellbom, 2003; Wiederman, 1999). Specific concerns have been raised regarding the imbalance between the number of men and women participating in sexuality research, particularly a general overrepresentation of women (McCray, King, & Bailly, 2005) or women’s willingness to volunteer (Rosenbaum, 1997). Other research, however, has pointed out that men may volunteer more frequently for particular types of studies, such as those involving the viewing of images of heterosexual activity (Gaither et al., 2003).

Gender imbalance should be understood through representativeness and statistical power. As an example, consider a study conducted at a small college of 1,000 students in which the distribution of gender groups at the college is quite imbalanced (e.g., 80% female and 20% male). Given that females may be more inclined to agree to participate in our research, we could easily end up with a sample of 100 students containing 85 females and 15 males. Our resulting sample is clearly not balanced, but it may well be representative in that it approximates the distribution of gender groups of the college. Because the subgroup of men (n = 15) is likely too small to test differences between men and women, we could increase the number of male participants (i.e., collect data from 70 more males while not collecting data from additional females). Although this might be useful to address the research questions, it may come at the cost of representativeness. That is, we would have sampled 10.6% of all the women (85 of 800) and 42.5% of all the men (85 of 200) from this population (students at this college). Additionally, there are practical considerations regarding the time and expense required for additional data collection to achieve gender balance in samples.

The aim of much of social research is to make observations in a research setting that reflect what would be observed among different people, in different settings, under different treatment conditions, and using alternative measurements (Shadish, Cook, & Campbell, 2002). We increase confidence that this expectation has been met by collecting data from people with a wide range of characteristics and/or by measuring constructs in slightly different ways.
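The arithmetic behind the hypothetical college example can be checked with a short sketch. The counts come from the example in the text; the function and variable names are our own illustrative choices, not part of any published procedure.

```python
# Hypothetical college from the example: 1,000 students, 80% women, 20% men.
population = {"women": 800, "men": 200}

# Initial volunteer sample, then the sample after recruiting 70 more men.
initial_sample = {"women": 85, "men": 15}
balanced_sample = {"women": 85, "men": 15 + 70}


def shares(counts):
    """Return each group's share of the total, as a proportion."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def sampling_fractions(sample, pop):
    """Return the fraction of each population group that was sampled."""
    return {group: sample[group] / pop[group] for group in pop}


for label, sample in [("initial", initial_sample), ("balanced", balanced_sample)]:
    print(label, "composition:", shares(sample))
    print(label, "sampling fractions:", sampling_fractions(sample, population))
```

Comparing the composition of each sample with the population shares shows the trade-off: the initial sample roughly mirrors the 80/20 split, whereas the balanced sample draws on the two groups at very different rates.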
Researchers of human sexuality often are interested in sexual attitudes, beliefs, and behaviors that exist beyond the college campus. Though a single study based on a sample of undergraduate students from a particular college is likely not representative of the attitudes, beliefs, and behaviors of the full population of young adults (or even of the population of undergraduate students), it is typically the only means by which researchers can contribute to the field. Achieving representativeness of the undergraduate student population from which the study sample is drawn is a feasible goal and allows researchers to demonstrate that their findings are generalizable to populations with similar characteristics. Even if the final analytical sample is less than representative, it is important to document the extent of the differences between population and sample. To return to our hypothetical study, by adding males (or removing females from our sample), we would run the risk of losing representativeness of our student population (i.e., that specific college). Should we choose to pursue the balanced sample approach, it would be important to present descriptive statistics about the full population, the final analytical sample, and any students who were dropped from the study to achieve balance.

Additionally, researchers can examine sample sizes via power analyses. As Cohen (1962) pointed out, conducting power analysis in the early stages of research design can help researchers to avoid undertaking research that will not be able to detect an effect that is actually present in the larger population, or drawing a sample that is larger than necessary. Interestingly, power analysis is a long-recommended technique that continues to be used too sparingly in current research. In fact, studies conducted by Gigerenzer and Sedlmeier (1989) and Harris, Hyun, and Reeder (2011), more than two decades apart, highlighted the lack of use of power analysis in manuscript submissions and published psychological research in spite of the preferences of editorial staff for the inclusion of this type of analysis.
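To make the idea of a power analysis concrete, the following is a minimal sketch for a two-sided comparison of two group means, using the normal approximation. The effect size (d = 0.5), the function names, and the default alpha and power values are assumptions chosen for illustration; they are not taken from the studies cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation and ignores the negligible opposite tail.
    d is the standardized mean difference (Cohen's d).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(d) * sqrt(n1 * n2 / (n1 + n2))
    return _STD_NORMAL.cdf(noncentrality - z_crit)


def required_group_sizes(d, power=0.80, alpha=0.05, ratio=1.0):
    """Group sizes (n1, n2), with n2 = ratio * n1, that reach the target
    power under the same normal approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    n1 = ceil((1 + 1 / ratio) * (z_crit + z_power) ** 2 / d ** 2)
    return n1, ceil(ratio * n1)


# Assumed medium effect (d = 0.5); group sizes from the hypothetical study.
print("power with 15 men, 85 women:", round(approx_power(0.5, 15, 85), 2))
print("power with 85 men, 85 women:", round(approx_power(0.5, 85, 85), 2))
print("equal groups needed:", required_group_sizes(0.5))
print("men and women needed at a 1:4 ratio:", required_group_sizes(0.5, ratio=4))
```

Because this sketch relies on the normal rather than the t distribution, it gives slightly smaller sample sizes than dedicated power software would; it is meant to show the logic, not to replace such tools.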
For our hypothetical study, knowing that we had a relatively small proportion of male undergraduates from which to choose, we could use an a priori power calculator to determine the minimum sample size needed to detect the anticipated effects and then either make extra efforts to boost response rates among male students (e.g., additional reminders, targeted incentives) or seek other sources to recruit participants (e.g., another university) to achieve that minimum. Moreover, if we anticipated interactions between gender and other independent variables, then we would need to obtain even larger samples, as power is essentially reduced by half (Aiken & West, 1991). Importantly, it is not necessary to have balanced samples (e.g., equal numbers of men and women) to test interactions. Rather, it is only necessary to have sufficient power to detect the interaction effect.

Up to this point, we have mainly focused on small sample sizes that are appropriate given the population distribution and attempts at achieving representativeness. There may well be times that our samples are smaller than desired, in spite of our best efforts to collect an adequate sample. In such cases where small sample sizes reflect inadequate response rates or lack of voluntary participation, issues of sampling become a concern. Techniques such as bootstrapping (Mooney & Duval, 1993) may be used as a post hoc approach to generate standard error estimates that do not rely on parametric assumptions. Using more appropriate standard errors can potentially increase statistical power and, more importantly, allows the researcher to meet the assumptions of the selected statistical approach. Bootstrapping should be used with caution, however, especially in instances of very small samples. Chernick (1999) cautiously recommends sample sizes ≥50 when using bootstrapping techniques.

Techniques such as bootstrapping are employed when the traditional assumption that a variable is normally distributed within the population cannot be assumed to hold true. Smaller sample sizes increase the likelihood that the values of the variables will not take on the characteristics of a normal distribution. However, there are some variables that are, by their very nature, not normally distributed. Take, for example, asking undergraduate students to report the number of sexual partners over the past month. Often, such data will take on the characteristics of a zero-inflated Poisson distribution, with a large number of respondents reporting zero sexual partners during that period, a concentration of respondents at the lower values, and a long tail reflecting some respondents reporting (seemingly impossibly) large numbers of sexual partners. In this instance, a larger sample size would not change the nature of the data, nor would bootstrapping be an appropriate technique for increasing power. Collecting additional data would therefore be an inefficient use of time and resources; instead, the researcher would need to use statistical techniques appropriate for modeling such data. In summary, it is important to carefully consider potential costs and benefits rather than simply apply sampling rules indiscriminately.
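The bootstrap standard errors discussed above can be sketched in a few lines. This is a minimal nonparametric bootstrap of a sample mean, not a full treatment of the method; the simulated scores, the sample size of 50, the number of resamples, and the function name are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_se(values, statistic=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by nonparametric bootstrap.

    Resamples `values` with replacement, recomputes the statistic each time,
    and returns the standard deviation of those replicates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    replicates = [
        statistic(rng.choices(values, k=n)) for _ in range(n_resamples)
    ]
    return statistics.stdev(replicates)


# Hypothetical scores for 50 respondents (simulated, illustrative data only).
data_rng = random.Random(1)
scores = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

se_boot = bootstrap_se(scores)
se_formula = statistics.stdev(scores) / len(scores) ** 0.5
print(f"bootstrap SE = {se_boot:.3f}, formula SE = {se_formula:.3f}")
```

For a well-behaved variable such as this one, the bootstrap and the textbook formula should agree closely. The bootstrap becomes more useful when another function, such as `statistics.median`, is passed as `statistic` and no simple formula is available.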
As we have hopefully illustrated, a priori consideration of statistical power, weighed alongside the importance of sample representativeness, is in most cases preferable to post hoc manipulations of sample sizes. Software programs such as G*Power (Erdfelder, Faul, & Buchner, 1996) are readily available and can be used to determine necessary sample sizes, both total and subgroup sizes, to be able to detect main effects and interaction effects. And when, in spite of our best efforts, we are left with less than ideal sample characteristics, best practices include full and detailed reporting of decisions made along the way, along with adequate descriptions of those both included in and excluded from our analyses.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

Barlow, M. R., & Cromer, L. D. (2006). Trauma-relevant characteristics in a university human subjects pool population: Gender, major, betrayal and latency of participation. Journal of Trauma and Dissociation, 7, 59–75.

Chernick, M. R. (1999). Bootstrap methods: A practitioner’s guide. New York: Wiley.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.

Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.

Gaither, G. A., Meier, B. P., & Sellbom, M. (2003). The effect of stimulus content on volunteering for sexual interest research among college students. Journal of Sex Research, 40, 240–248.

Gigerenzer, G., & Sedlmeier, P. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–315.

Harris, A., Hyun, J., & Reeder, R. (2011). Survey of editors and reviewers of high-impact psychological journals: Statistics and research design problems in submitted manuscripts. Journal of Psychology, 145, 195–209.

Landrum, R. E., & Chastain, G. (1999). Subject pool policies in undergraduate-only departments: Results from a nationwide survey. In G. Chastain & R. E. Landrum (Eds.), Protecting human subjects: Departmental subject pools and institutional review boards (pp. 25–42). Washington, DC: American Psychological Association.

McCray, J. A., King, A. R., & Bailly, M. D. (2005). General and gender-specific attributes of the psychology major. Journal of General Psychology, 132, 139–150.

Miller, A. (1981). A survey of introductory psychology subject pool practices among leading universities. Teaching of Psychology, 8, 211–213.

Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage.

Owen, J., Rhoades, G., Stanley, S., & Fincham, F. (2010). “Hooking up” among college students: Demographic and psychosocial correlates. Archives of Sexual Behavior, 39, 653–663.

Rosell, M. C., Beck, D. M., Luther, K. E., Goedert, K. M., Shore, W. J., & Anderson, D. D. (2005). The pedagogical value of experimental participation paired with course content. Teaching of Psychology, 32, 95–99.

Rosenbaum, V. M. (1997). Understanding college age volunteers’ behavior. Unpublished doctoral dissertation, Lehigh University, Bethlehem, PA.

Saks, M. J., & Seiber, J. E. (1989). A census of subject pool characteristics and policies. American Psychologist, 44, 1053–1061.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin Company.

Wiederman, M. W. (1999). Volunteer bias in sexuality research using college student participants. Journal of Sex Research, 36, 59–66.
