Nephrol Dial Transplant (2010) 25: 1388–1393 doi: 10.1093/ndt/gfp732 Advance Access publication 12 January 2010

CME Series

Sample size calculations: basic principles and common pitfalls

Marlies Noordzij1, Giovanni Tripepi2, Friedo W. Dekker3, Carmine Zoccali2, Michael W. Tanck4 and Kitty J. Jager1

1ERA-EDTA Registry, Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands; 2CNR–IBIM, Clinical Epidemiology and Pathophysiology of Renal Diseases and Hypertension, Renal and Transplantation Unit, Ospedali Riuniti, Reggio Calabria, Italy; 3Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands; 4Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

Correspondence and offprint requests to: Marlies Noordzij; E-mail: [email protected]

Abstract

One of the most common requests that statisticians get from investigators is for sample size calculations or sample size justifications. The sample size is the number of patients or other experimental units included in a study, and determining the sample size required to answer the research question is one of the first steps in designing a study. Although most statistical textbooks describe techniques for sample size calculation, it is often difficult for investigators to decide which method to use. Many formulas are available for different types of data and study designs. However, all of these formulas should be used with caution, since they are sensitive to errors and small differences in selected parameters can lead to large differences in the sample size. In this paper, we discuss the basic principles of sample size calculations, the most common pitfalls and the reporting of these calculations.

Keywords: epidemiology; nephrology; power; sample size; statistics

Introduction

The sample size is the number of patients or other experimental units included in a study, and one of the first practical steps in designing a trial is the choice of the sample size needed to answer the research question. Likewise, in the critical appraisal of the results of published trials, evaluating whether the sample size was adequate to answer the research question is an important step in interpreting the relevance of these results. It is therefore not surprising that one of the most frequent requests that statistical consultants get from investigators is for sample size calculations or sample size justifications. Techniques for sample size calculations are described in most conventional statistical textbooks. However, the wide range of formulas that can be used for specific situations and study designs makes it difficult for most investigators to decide which method to use. Moreover, these calculations are sensitive to errors, because small differences in selected parameters can lead to large differences in the sample size. In this paper, we explain the basic principles of sample size calculations by means of examples from the nephrology literature. In addition, we discuss the most common pitfalls in sample size calculations and comment on how to report these calculations.

Why sample size calculations?

The main aim of a sample size calculation is to determine the number of participants needed to detect a clinically relevant treatment effect. Pre-study calculation of the required sample size is warranted in the majority of quantitative studies. Usually, the number of patients in a study is restricted because of ethical, cost and time considerations. However, if the sample size is too small, one may not be able to detect an important existing effect, whereas samples that are too large may waste time, resources and money. It is therefore important to optimize the sample size. Moreover, calculating the sample size in the design stage of the study is increasingly becoming a requirement when seeking ethical committee approval for a research project.

Components of sample size calculations

In order to calculate the sample size, it is necessary to have some idea of the results expected in a study. In general, the greater the variability in the outcome variable, the larger the sample size required to assess whether an observed effect is a true effect. On the other hand, the more effective (or harmful!) a tested treatment is, the smaller the sample size needed to detect this positive or negative effect. Calculating the sample size for a trial requires four basic components:

1. The type I error (alpha). Clinical studies are usually performed in a sample from a population rather than in the whole study population. In research, we are testing hypotheses to determine whether (results in) particular samples

© The Author 2010. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved. For Permissions, please e-mail: [email protected]

differ from each other. On the one hand, the null hypothesis (H0) hypothesizes that the groups of subjects (samples) being compared are not different, that is, they come from the same source population. The alternative hypothesis (H1), on the other hand, hypothesizes that these groups are different and therefore seem to be drawn from different source populations. Sample size calculations are needed to define at what number of subjects it becomes quite unlikely that adding more subjects will change the conclusion. In the process of hypothesis testing, two fundamental errors can occur. These errors are called type I and type II errors, and an overview of these errors is presented in Table 1.

Table 1. Overview of errors in clinical research

                                     Population: difference exists                Population: difference does not exist
Sample: difference found             True positive result (power, 1 - beta)       False positive result: type I error (alpha)
Sample: no difference found          False negative result: type II error (beta)  True negative result

The type I error (alpha) measures the probability of finding the observed differences when H0, i.e. that the samples come from the same source population, is in fact true. In other words, the alpha represents the chance of falsely rejecting H0 and picking up a false-positive effect. The alpha is most commonly fixed at 0.05, which means that the researcher desires a less than 5% chance of drawing a false-positive conclusion.

2. The type II error (beta). The type II error measures the probability of failing to detect an effect that is present in a population using a test based on a sample from that population (true positive). The power is the complement of beta: 1 - beta. So, in case of a beta of 0.20, the power would be 0.80 or 80%, representing the probability of avoiding a false-negative conclusion, or the chance of correctly rejecting a false null hypothesis.

3. The smallest effect of interest. The smallest effect of interest is the minimal difference between the studied groups that the investigator wishes to detect, often referred to as the minimal clinically relevant difference, sometimes abbreviated as MCRD. This should be a difference that the investigator believes to be clinically relevant and biologically plausible. For continuous outcome variables, the minimal clinically relevant difference is a numerical difference. For example, if body weight is the outcome of a trial, an investigator could choose a difference of 5 kg as the minimal clinically relevant difference. In a trial with a binary outcome, for example the effect of a drug on the development of a myocardial infarction (yes/no), an investigator should estimate a relevant difference between the event rates in both treatment groups and could choose, for instance, a difference of 10% between the treatment group and the control group as the minimal clinically relevant difference.

Even a small change in the expected difference with treatment has a major effect on the estimated sample size, as the sample size is inversely proportional to the square of the difference. For instance, if one would need 1000 subjects per treatment group to detect an absolute difference of 4.8%, 4000 subjects per treatment group would be required to detect a 2.4% difference.
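The inverse-square relationship described above follows from the standard normal-approximation formula for comparing two means with equal group sizes, n per group = 2 * sd^2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2. The sketch below is an illustration using Python's standard library, not code from the article; the standard deviation of 10 kg is an assumed value chosen to match the body weight example:

```python
import math
from statistics import NormalDist

def n_per_group(sd, diff, alpha=0.05, power=0.80):
    """Sample size per group for comparing two means (equal groups),
    normal approximation: n = 2 * sd^2 * (z_{1-a/2} + z_{1-b})^2 / diff^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05 (two-sided)
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = 0.80
    return math.ceil(2 * (sd * (z_alpha + z_beta) / diff) ** 2)

# Assumed example: body weight outcome with sd = 10 kg (hypothetical value).
print(n_per_group(sd=10, diff=5))    # 5 kg difference  -> 63 per group
print(n_per_group(sd=10, diff=2.5))  # half the difference -> 252 per group (4x)
```

With alpha = 0.05 and power = 0.80, the factor (z_{1-alpha/2} + z_{1-beta})^2 is roughly (1.96 + 0.84)^2, or about 7.85, which is why quick rules of thumb approximate n as 16 * sd^2 / d^2 per group.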