
Aphasiology, 2014 http://dx.doi.org/10.1080/02687038.2014.987045

COMMENTARY

Throwing the baby out with the bathwater: pitfalls of misrepresenting single-case experimental designs

Howard Goldstein*


Department of Communication Sciences & Disorders, College of Behavioral and Community Sciences, University of South Florida, Tampa, FL, USA

*Email: [email protected]

Having studied, taught, and used single-case experimental designs (SCED) for 40 years, primarily in studies of child language intervention, I was interested in learning of advances in the adult language intervention area. The focus of Howard, Best, and Nickels’ (2014) paper was on “studies aiming to improve word retrieval in persons with aphasia.” Consequently, I surmised that I would update my knowledge of treatments for anomia and the contributions that SCED has made to that literature base. My interest in the developing methods for calculating valid effect size estimates from SCED studies led me to anticipate that this paper would update my knowledge in that area as well. This paper, unfortunately, fell short of expectations on all three counts. Moreover, it has the potential to mislead present and future investigators about the scientific method and philosophy that have been advancing behavioural science in communication disorders for several decades.

The inaccurate representation of SCED

First, the authors suggest that there are two major approaches that represent SCEDs. Their descriptions bear little resemblance to the strategies and tactics that characterise SCEDs. The first approach discussed may be construed as a SCED, as it refers to the need for probing performance pre-therapy (baseline), during therapy, and post-therapy (maintenance). SCED tactics offer many design options for determining whether a functional (causal) relation exists between a treatment and behaviour change (Johnston & Pennypacker, 2009). However, the discussion in the Howard et al. paper excludes a number of important components. The requirements for continual measurement and replication appear to be missing. This problem is illustrated by the 10 graphs included in the Howard et al. paper. None of the graphs depicts a multiple baseline design or another design by which one could determine with some confidence that experimental control is evident. An article on “optimising the design of intervention studies” should be expected to build on the fundamental logic behind SCEDs. Without an acknowledgment of the most basic strategies, the Howard et al. paper raises questions about whether case studies are being confused with SCEDs. This would be misguided because, in contrast to SCEDs, case studies are descriptive and do not support valid causal inferences about treatments.


The second approach discussed contrasts treated and untreated responses pre- and post-therapy, evaluated through statistical analysis. This approach seems to resemble a within-subjects experimental design. Indeed, one may ask whether the bulk of the Howard et al. paper is more relevant to a discussion of design characteristics and statistical analyses akin to group experimental designs that rely on repeated measures or within-subject comparisons. These designs are useful for answering many questions, but they are not well suited to evaluating the process of behaviour change as a function of treatment. Again, one should not characterise these designs as SCEDs when they do not include the fundamental components of continual measurement and replication.

Before the labels and conventions of frequently used SCEDs (e.g., multiple baseline designs and withdrawal designs) became established, Sidman (1960) offered a primer for behaviour scientists that outlined the basic tactics of scientific research. Experiments can be conducted for a variety of reasons; for applied scientists, the reason is typically to pursue an understanding of a behavioural phenomenon of interest and to examine how changes in that behaviour relate to presumed treatment conditions. In this case, the interest is in evaluating treatment approaches to increase naming in persons with aphasia. This is important because decisions about “optimising the design of intervention studies” are made in the context of knowing a behavioural phenomenon. For example, one needs to know the phenomenon to make informed decisions about the myriad details involved in designing a study (Krathwohl, 2009). As science progresses, we constantly refine our understanding of what conditions are functionally related to relevant behaviour change and why. More information from the extant literature would help readers understand the state of knowledge of aphasia treatment, or anomia in particular. Indeed, one would expect this paper to include examples of good models of prototypical SCED studies from aphasia treatment research.

As mentioned earlier, SCED methods for evaluating treatments are built around a number of basic tactics. Foremost among them are the critical needs for continual measurement to evaluate behaviour change and for direct and systematic replication. Howard et al. suggest tactics that contradict these fundamental characteristics of SCEDs. First, in their discussion of baseline data, they suggest that two baseline points may be sufficient. Such short baseline conditions would undermine interpretation of most SCEDs and limit the available effect size statistics. Moreover, their recommendation is incompatible with the behavioural phenomenon of interest, anomia, as performance for most individuals with aphasia tends to be variable both across sessions and among individuals, which typically necessitates more baseline measurements. Second, they do not highlight the importance of replication. Fundamentally, SCEDs demonstrate treatment effects by replicating systematic changes in performance across more than one baseline, within and across participants. If behaviour change occurs when and only when treatment is initiated, and this relation is replicated within and across participants, then one can be confident that there is a causal relation between the treatment and the dependent variable of interest. That confidence grows through replication, whereas failures to replicate indicate that some other process is affecting behaviour.
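To make this replication logic concrete, here is a minimal sketch in Python using entirely hypothetical naming-accuracy data; the participant labels, probe values, treatment-onset sessions, and the 20-point change criterion are invented for illustration and do not come from Howard et al. or any cited study. The sketch simply flags whether a level shift coincides with each staggered treatment onset; in actual SCED research this judgement rests on visual analysis of level, trend, and variability across tiers rather than on a single numerical rule.

```python
# Toy illustration of multiple baseline logic across three hypothetical participants.
# A treatment effect is considered "replicated" only if performance is stable before
# each participant's staggered onset and shifts upward only after treatment begins.

# Per participant: (treatment onset session, naming accuracy per probe session, % correct)
data = {
    "P1": (4, [20, 25, 22, 60, 65, 70, 72, 75, 74, 76, 78, 80]),
    "P2": (6, [15, 18, 16, 17, 19, 55, 60, 62, 65, 66, 68, 70]),
    "P3": (8, [30, 28, 32, 29, 31, 30, 33, 70, 72, 75, 74, 78]),
}

def change_coincides_with_onset(onset, scores, min_gain=20):
    """True if the level rises by at least `min_gain` points only after the onset session."""
    baseline = scores[: onset - 1]
    treatment = scores[onset - 1 :]
    baseline_stable = (max(baseline) - min(baseline)) < min_gain   # no comparable shift pre-treatment
    level_shift = (sum(treatment) / len(treatment)
                   - sum(baseline) / len(baseline)) >= min_gain    # clear rise after onset
    return baseline_stable and level_shift

replications = {p: change_coincides_with_onset(onset, scores)
                for p, (onset, scores) in data.items()}
print(replications)                 # {'P1': True, 'P2': True, 'P3': True}
print(all(replications.values()))   # three replicated effects -> stronger causal inference
```

Because each baseline is a different length and change appears only when treatment is introduced, the staggered pattern makes maturation or repeated testing an unlikely explanation; a failure to replicate in any one tier would weaken that inference.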
Sidman recently warned of the dangers of misconstruing SCEDs:

    Ignorance of the rationale for single-subject methodology leads to ignorance of the special importance of steady-state baselines and multiple baselines of various sorts and of the necessity for refining such baselines to evaluate the success or failure of their treatments. (Holth, 2010, p. 197)

This admonition indicates that SCEDs are not merely means of evaluating treatments; they offer a scientific tool for refining our understanding of behavioural phenomena.


Understanding variables that inhibit or enhance variability in performance is critical to an iterative process of developing robust treatments. The logic behind SCEDs is that replications help to elucidate the boundary conditions for our treatments, which is only a step in the process of learning what works for whom.

One should be concerned when statistical logic seems to trump scientific logic (Blampied, 2012). Investigators should be wary of statistical arguments that seemingly plead ignorance of a behavioural phenomenon and hide behind statistical uncertainty. For example, normative comparisons may be a more appropriate means of determining meaningful target ranges for behavioural phenomena than confidence intervals based on the binomial theorem. Repeated testing may indeed influence performance, which is the basic logic behind collecting adequate baseline information. It would certainly simplify the treatment of anomia if repeated testing produced robust treatment effects; however, any such effects are small at best.

Howard et al. also suggest that one can reduce statistical uncertainty by teaching 100 responses at a time rather than 10. Their discussion makes no reference to when a treatment regimen of this scope might be practically or theoretically advisable. Nor might it be methodologically advisable when designing SCEDs. If a treatment produces robust effects on naming sets of pictures, for example, then replicating those effects across three or more sets of pictures staggered according to a multiple baseline design, and across three or more participants, would make a compelling case for a reliable experimental effect. The size of the sets is less important than the number and reliability of the replications. These are important considerations in SCEDs.

Effect size estimates

There is certainly considerable interest in deriving valid means of synthesising information on the magnitude of treatment effects across group designs and SCEDs (Beeson & Robey, 2006; Goldstein, Lackey, & Schneider, 2014; Horner, Swaminathan, Sugai, & Smolkowski, 2012). Perhaps efforts to derive effect size estimates in the education literature can be informative for aphasia researchers. Several recent efforts have sought to develop valid statistical measures to summarise the effect sizes of SCEDs. A good deal of research is devoted to exploring a variety of approaches, including non-parametric approaches (Parker, Vannest, & Davis, 2011), generalised least squares regression approaches (Maggin et al., 2011; Manolov & Solanas, 2013), autoregressive techniques (Hedges, Pustejovsky, & Shadish, 2012, 2013), and Bayesian analyses (Swaminathan, Rogers, & Horner, 2014). These statisticians are sensitive to the concerns raised by Howard et al., such as the number of data points per condition, trends and slopes in data, autocorrelation, and confidence intervals.

For example, it is worth noting that effect size estimates standardised using within-subject standard deviations are not comparable to effect sizes obtained in between-group designs. Because effect size estimates from SCEDs are not comparable to those from group experimental designs (SCED effect sizes are typically much larger), a different set of conventions may be needed to interpret SCED effect sizes (Schneider, Goldstein, & Parker, 2008).
It is unclear whether those conventions will have broad generality across outcomes if those outcomes have very different distributions, rather than being standard scores with set characteristics, such as means of 100 and standard deviations of 15. Alternatively, the comparability of effect sizes may be rectified if effect sizes from individual cases are aggregated appropriately and other shortcomings of the various approaches can be overcome.


Clearly, more research is needed to help us determine how to interpret the typically much larger effect sizes obtained in SCEDs in comparison with effect sizes obtained from group designs. Many of the statisticians cited earlier are comparing various techniques as they are applied to multiple SCED data sets. Although considerable progress is being made, it is fair to say that the field has not yet converged on a particular method for estimating effect sizes in SCED research. It behoves aphasia researchers to be familiar with the considerable work going on in this area of inquiry.
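As a concrete illustration of why SCED effect sizes cannot be read against between-group conventions, here is a minimal Python sketch using assumed, hypothetical baseline and treatment series; the numbers, the baseline-SD standardisation, and the specific choice of a nonoverlap index are illustrative only and are not drawn from any of the papers cited above.

```python
# Contrast two common ways of summarising a single-case effect on one hypothetical series.
from statistics import mean, stdev

baseline  = [20, 25, 22, 18, 24]          # % correct naming across pre-treatment probes
treatment = [55, 60, 62, 65, 66, 68, 70]  # % correct naming during treatment probes

# Within-case standardised mean difference: the denominator is the within-subject
# baseline SD, which is typically far smaller than a between-group SD, so values well
# beyond Cohen's between-group benchmarks are routine.
smd_within = (mean(treatment) - mean(baseline)) / stdev(baseline)

# Nonoverlap of All Pairs, one of the nonoverlap indices reviewed by Parker et al. (2011):
# the proportion of baseline-treatment pairs in which the treatment point is higher
# (ties count as half). It is bounded at 1.0, so it ceilings with large, consistent effects.
pairs = [(b, t) for b in baseline for t in treatment]
nap = sum(1.0 if t > b else 0.5 if t == b else 0.0 for b, t in pairs) / len(pairs)

print(f"within-case SMD = {smd_within:.1f}")  # about 14.6 for these numbers
print(f"NAP = {nap:.2f}")                     # 1.00: complete nonoverlap
```

The same hypothetical series yields one summary that dwarfs conventional between-group benchmarks and another that simply ceilings at 1.0, which is part of why separate interpretive conventions (Schneider, Goldstein, & Parker, 2008) and careful aggregation across cases are needed before such estimates can be compared or synthesised.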

Conclusion

Howard et al. conclude with recommendations that are argued largely on statistical grounds. Each of the recommendations raises numerous questions. There may be occasions when some of the recommendations are worthy of consideration. However, I cannot recommend the wholesale adoption of a single one in the design of SCEDs. Indeed, these recommendations may encourage investigators to throw the baby out with the bathwater.

Aphasia researchers would be well advised to go back to the basics of SCED. They should continue developing intervention science through the systematic replication of experimental effects for specified treatments with well-described individuals with aphasia. SCED has the potential to advance our knowledge of treatment approaches for people with aphasia, and there are many excellent books that could serve as resources on SCED. Classics, like Sidman (1960), clearly spell out a scientific logic for designing experiments. The science of behaviour analysis continues to progress as investigators explore innovative designs that fit their phenomena of interest. Considerable progress is being made in the use of statistical techniques to augment visual analysis. Nevertheless, it is important to understand the reasoning behind SCED methods and to ensure that fundamental components are not sacrificed as statistical logic is lauded as the essence of the scientific method.

Keep in mind that SCEDs offer a powerful scientific method when dealing with large experimental effects but are not sensitive to small or inconsistent effects. Group experimental designs are better suited for detecting smaller effects and for treatment comparisons. But the task at hand is to develop and refine robust therapies. SCED offers a means to evaluate such therapies. It will take innovative clinical scientists and scientific clinicians to figure out how to make meaningful changes in the lives of persons with aphasia.

References

Beeson, P. M., & Robey, R. R. (2006). Evaluating single-subject treatment research: Lessons learned from the aphasia literature. Neuropsychology Review, 16, 161–169. doi:10.1007/s11065-006-9013-7

Blampied, N. M. (2012). Single-case research designs and the scientist-practitioner ideal in applied psychology. In G. Madden (Ed.), APA handbook of behavior analysis: Volume 1: Methods and principles (pp. 177–197). Washington, DC: APA.

Goldstein, H., Lackey, K. C., & Schneider, N. J. (2014). A new framework for systematic reviews: Application to social skills interventions for preschoolers with autism. Exceptional Children, 80, 262–286. doi:10.1177/0014402914522423

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3, 224–239. doi:10.1002/jrsm.1052

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2013). A standardized mean difference effect size for multiple baseline designs across individuals. Research Synthesis Methods, 4, 324–341. doi:10.1002/jrsm.1086


Holth, P. (2010). A research pioneer’s wisdom: An interview with Dr. Murray Sidman. European Journal of Behavior Analysis, 11, 181–198.

Horner, R. H., Swaminathan, H., Sugai, G., & Smolkowski, K. (2012). Considerations for the systematic analysis and use of single-case research. Education and Treatment of Children, 35, 269–290. doi:10.1353/etc.2012.0011

Howard, D., Best, W., & Nickels, L. (2014). Optimising the design of intervention studies: Critiques and ways forward. Aphasiology. Advance online publication. doi:10.1080/02687038.2014.985884

Johnston, J. M., & Pennypacker, H. S. (2009). Strategies and tactics of behavioral research (3rd ed.). New York, NY: Routledge.

Krathwohl, D. R. (2009). Methods of educational & social science research: An integrated approach. Long Grove, IL: Waveland Press.

Maggin, D. M., Swaminathan, H., Rogers, H. J., O’Keeffe, B. V., Sugai, G., & Horner, R. H. (2011). A generalized least squares regression approach for computing effect sizes in single-case research: Application examples. Journal of School Psychology, 49, 301–321. doi:10.1016/j.jsp.2011.03.004

Manolov, R., & Solanas, A. (2013). A comparison of mean phase difference and generalized least squares for analyzing single-case data. Journal of School Psychology, 51, 201–215. doi:10.1016/j.jsp.2012.12.005

Parker, R. I., Vannest, K., & Davis, J. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification, 35, 303–322. doi:10.1177/0145445511399147

Schneider, N., Goldstein, H., & Parker, R. (2008). Social skills interventions for children with autism: A meta-analytic application of percentage of all non-overlapping data (PAND). Evidence-Based Communication Assessment and Intervention, 2, 152–162. doi:10.1080/17489530802505396

Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York, NY: Basic Books.

Swaminathan, H., Rogers, H. J., & Horner, R. (2014). An effect size measure and Bayesian analysis of single-case designs. Journal of School Psychology, 52, 213–230. doi:10.1016/j.jsp.2013.12.002