
Statistical screening of system dynamics models

Andrew Ford a,* and Hilary Flynn b

Andrew Ford is Professor of Environmental Science and Regional Planning at Washington State University. He teaches system dynamics modeling and is the author of the Island Press text on Modeling the Environment. His doctorate is in Public Policy and Technology from Dartmouth College, and his research deals with energy and environmental problems in the western U.S.A. Hilary Flynn is a researcher at the Prometheus Institute for Sustainable Development, a nonprofit organization in Cambridge, Massachusetts. Her research focuses on the photovoltaic supply chain and markets. She is responsible for the collection, analysis and dissemination of photovoltaic information that will promote the deployment of solar energy. She earned her MS degree in environmental science from Washington State University. Her research at WSU focused on system dynamics modeling of markets for electricity generation, carbon allowances and tradable green certificates.

Abstract This paper describes a pragmatic method of searching for the key inputs to a system dynamics model. This analysis is known as screening. The goal is to learn which of the many uncertain inputs stand out as most influential. The method is implemented with readily available software and relies on the simple correlation coefficient to indicate the relative importance of model inputs at different times in the simulation. The screening is demonstrated with two examples with step-by-step instructions. The paper recommends that screening analysis be used in an iterative process of screening and model expansion to arrive at tolerance intervals on model results. The appendices compare screening analysis with analytical methods to identify the key inputs to system dynamics models. Copyright © 2005 John Wiley & Sons, Ltd. Syst. Dyn. Rev. 21, 273–303, (2005)

Introduction

The parameters in system dynamics models are typically estimated in a one-at-a-time fashion by taking advantage of every source of information at our disposal. The information sources may range from the hard to the soft, as depicted in the information spectrum in Figure 1. Hard sources include physical laws and the results of controlled experiments. Social system data may take the form of time series and cross-sectional data. When numerical data1 are available, parameters may be estimated by generalized least squares2 or Kalman filtering (Ford 1999, p. 175; Sterman 2000, p. 867). The softer sources of information are depicted at the right end of the spectrum. These sources may not provide data in numerical form, but they are often the most important sources of information for model development and parameterization (Forrester 1980; Sterman 2000). Expert knowledge may be obtained from informal interviews, Delphi interviews, and intensive modeling workshops. Even better, expert judgment may be obtained when the experts become part of the modeling team and the entire modeling process. At the far end of the spectrum is personal intuition. System dynamics practitioners are willing to call on their personal intuition to estimate parameters even though the parameter value may be more of a "guesstimate" than an estimate. We include highly uncertain parameters when we believe our estimate is "better than zero."

a Program in Environmental Science and Regional Planning, Washington State University, Pullman, WA 99164-4430, U.S.A.
b Prometheus Institute for Sustainable Development, 1280 Massachusetts Avenue, Cambridge, MA 02138, U.S.A.
* Correspondence to: A. Ford, Program in Environmental Science & Regional Planning, Washington State University, Pullman, WA 99164-4430, USA. E-mail: [email protected]
System Dynamics Review Vol. 21, No. 4 (Winter 2005): 273–303
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sdr.322
Copyright © 2005 John Wiley & Sons, Ltd.

Received March 2005 Accepted August 2005




Fig. 1. The information spectrum

Our approach is to proceed with rough estimates, confident that the importance of the uncertain parameters can be tested through sensitivity analysis.3 The system dynamics approach leads to models with a large number of highly uncertain parameters, so we should ask ourselves which of the parameters are really important. One might think that the parameters with the greatest range of uncertainty are the most important and should be given the greatest attention. On the other hand, some readers would suspect that the key parameters are those located in a strategic position in the model. Perhaps the key parameters are those that control the gain around a key loop in the model?

Purpose and organization

This article describes a pragmatic method to search for the most important of the uncertain parameters. The paper builds from statistical studies conducted in the 1980s. We review the previous studies, taking special note of the software advances that now make it easy for system dynamics modelers to conduct analysis that was once possible only with customized software. However, current software does not provide a convenient way to identify the most important inputs to our models. This article shows how the key inputs can be identified with the simple correlation coefficient. The relative importance of each input is displayed by calculating the correlation coefficients between the model's main output and the values assigned to each input in a sensitivity analysis using the Vensim software. The identification of key inputs is called screening (Welch et al. 1992). The screening approach is illustrated with a model of a hypothetical sales company. The sales example is presented in full detail with step-by-step instructions. Our purpose is to make it possible for readers to verify these results and to conduct their own screening studies. For a second example, we turn to the World3 model described in The Limits to Growth (Meadows et al. 1972). We screen for which of 14 inputs exert the greatest influence on the human population and the industrial output. The appendices describe analytical methods to identify the key connections in a system dynamics model. We recommend that the analytical and statistical approaches be employed on the same test models so that the complementary powers of the methods can be determined. We illustrate how this may be done with the industrial overshoot model used to demonstrate the Digest software (Mojtahedzadeh et al. 2004).


Previous studies with customized software

Research in the 1970s and 1980s led to the development of customized software for identifying the key inputs to complex models. Some models were extremely time-consuming because of spatial complexity and a wide range of time constants.4 The challenge was to design a small number of simulations that could be executed with limited computer resources.5 Selecting the number of simulations and the values for the inputs is "sample design," and the common sample designs are random sampling, stratified sampling and Latin Hypercube Sampling (LHS). McKay et al. (1979) compared the three designs and reported that LHS was the most efficient design for computer codes with larger numbers of uncertain inputs. Reilly et al. (1987) described the practical advantage of LHS as a tenfold reduction in the sample size relative to simple random sampling. To learn if LHS sampling could be useful with a system dynamics model, Ford and McKay (1985) studied the range of projections from a model of an electric utility. The goal was to discover any surprises in the model behavior and to assign tolerance intervals to the projections of electricity demand and electricity price. With so many inputs, some might think that an inordinate number of simulations would be required to cover the input space.6 With the power of LHS, however, the key insights emerged with only 20 simulations.7 LHS sampling was implemented with customized programs during the 1980s.8 These programs allowed analysts to identify the key inputs and to estimate tolerance intervals9 on the model projections. The practical benefit of this customized capability was illustrated in a detailed study for the Bonneville Power Administration described by Ford (1990). The goal was to help BPA planners anticipate the uncertainty in electric loads over a 20-year planning interval.10 BPA used a system dynamics model of electric loads along with customized software to find tolerance intervals on the loads.11 The analysis demonstrated that BPA faced highly uncertain loads, but the uncertainty could be reduced substantially by implementing strict efficiency standards on new homes and buildings. The reduced uncertainty had practical benefits to BPA because it reduced the option costs for the agency.12 The pragmatic finding was that the efficiency standards delivered a $250 million reduction in the costs to acquire options on new resources. From a methodological point of view, the relevant portion of the BPA study is the step-by-step approach shown in Figure 2. The approach begins by specifying ranges of uncertainty for the model inputs; it concludes with a set of tolerance intervals that can be used by planners. The practical use of these tolerance intervals determines the time and effort invested in each stage of the analysis.13 The clear boxes in Figure 2 are steps easily conducted with current software. The shaded box deals with the identification of the top inputs, a step made easier by the screening analysis described in this paper.


Fig. 2. Iterative approach using customized software in the 1980s

An iterative approach to obtaining tolerance intervals

Figure 2 begins with setting the range of uncertainty for uncertain inputs to the model. In a typical study, analysts will assign ranges to many, but not all, of the inputs. This is a time-consuming step, and one should expect that some individuals will be reluctant to assign ranges of uncertainty.14 When pondering the ranges, it is useful to keep the end point in mind. The goal is to obtain a useful set of tolerance intervals. If investigators disagree on the appropriate ranges of uncertainty, the tolerance intervals can be recalculated with new ranges. For many models, the exact range assigned to the inputs will not make an appreciable difference in the estimates of the tolerance intervals. One should also guard against "overconfidence" in parameter estimates. Stainforth et al. (2005) warn that experts tend to underestimate the range of possible values for uncertain variables. Sterman (2000, p. 884) gives a similar warning. He is especially concerned about overconfidence in statistically estimated parameters, and he recommends that we "test over a range at least twice as wide as statistical and judgmental considerations suggest." The next step is to decide on the number of runs and to assign values to each of the parameters. Based on previous experience with system dynamics models, we recommend starting with 50 runs. To verify that this sample size is sufficient, one can repeat the analysis with 100 runs. When the two samples yield the same tolerance intervals, the sample size is sufficiently large. We recommend LHS for assigning the values in the 50 runs.15 The next step is to perform the simulations. With many system dynamics models, 50 simulations can be executed in a few seconds.16
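To make the sample design concrete, the short Python sketch below draws a Latin Hypercube sample of 50 runs for two uniformly distributed inputs. It is our own illustration (the two inputs and their ranges mirror the bank balance example in Figure 3, and scipy version 1.7 or later is assumed); Vensim performs the equivalent assignment internally when LHS is selected.

```python
# Minimal sketch of a Latin Hypercube sample for a 50-run sensitivity analysis.
# The two inputs and their uniform ranges are illustrative only.
import numpy as np
from scipy.stats import qmc

n_runs = 50
names = ["initial balance", "interest rate"]
lower = np.array([25.0, 0.00])   # lower bounds of the uniform ranges
upper = np.array([75.0, 0.14])   # upper bounds of the uniform ranges

sampler = qmc.LatinHypercube(d=len(names), seed=1)
unit_sample = sampler.random(n=n_runs)          # points in the unit hypercube
sample = qmc.scale(unit_sample, lower, upper)   # rescale to the stated ranges

# Each row of "sample" is one model run; each column is one uncertain input.
for name, column in zip(names, sample.T):
    print(f"{name}: {n_runs} values between {column.min():.3f} and {column.max():.3f}")
```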


The next step is to calculate the tolerance intervals. This required customized software in the 1980s, but tolerance intervals can now be obtained with the help of the sensitivity graph in the Vensim software. This graph shows color-coded intervals, with the term "confidence bounds" in the dialog box controlling the graph design. However, a more descriptive term is "percentiles" (R. Eberlein, personal communication, 2005), as we illustrate in Figure 3.

Fig. 3. (a) Vensim traces of 100 simulations of a bank balance model. (b) Vensim percentile intervals for the bank balance model

Figure 3(a) shows the results of a sensitivity analysis of a bank balance model. The initial balance is $50, but the initial value is uniformly uncertain from $25 to $75. The interest rate is normally 7 percent/year, but the rate is uniformly uncertain from 0 to 14 percent/year. We test the model with 100 simulations using LHS. The 100 runs are traced out in Figure 3(a); Vensim's percentile intervals are shown in Figure 3(b). The 100 percent percentile is shaded to enclose all 100 simulations. The 98 percent percentile is formed by eliminating the lowest and the highest run. The 90 percent percentile is formed by eliminating the lower five runs and the top five runs. These intervals may be translated into tolerance intervals with a given confidence level. The extreme example may be taken from Vensim's 100 percent percentiles. According to van Belle (2002, p. 121), the confidence, C, associated with these extreme values is

C = 1 - p^n - n(1 - p)p^{n-1}

where n is the sample size (100 runs) and p is the proportion of the population to be covered by the interval.
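As a quick numerical check, the Python sketch below forms the percentile-style bands by dropping extreme runs and evaluates the van Belle formula for the two cases discussed next. The traces are synthetic placeholders generated with our own assumptions, not the Vensim runs behind Figure 3.

```python
import numpy as np

def coverage_confidence(n: int, p: float) -> float:
    """Confidence that the extremes of n independent runs cover a
    fraction p of the population (van Belle 2002, p. 121)."""
    return 1.0 - p**n - n * (1.0 - p) * p**(n - 1)

print(coverage_confidence(100, 0.90))   # about 0.9997, essentially 100 percent confidence
print(coverage_confidence(100, 0.99))   # about 0.26, i.e., only 26 percent confidence

# Forming the interval bands from 100 traces (one row per run, one column per time):
traces = np.random.default_rng(7).normal(size=(100, 21))   # placeholder traces
ordered = np.sort(traces, axis=0)
band_100 = ordered[[0, -1], :]   # encloses all runs
band_98 = ordered[[1, -2], :]    # drops the lowest and the highest run
band_90 = ordered[[5, -6], :]    # drops the five lowest and the five highest runs
```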


In this example, the extreme values of 100 runs would cover 90 percent of the results with a confidence level of essentially 100 percent. However, if we wanted the extreme values to cover 99 percent of the results, our confidence level would be only 26 percent. The 98 percent percentile in Figure 3(b) is formed by eliminating the top simulation and the bottom simulation. This interval may be translated into a tolerance interval using the family of curves reported by Hahn and Meeker (1991, p. 86). The curves show the proportion of the population coverage as a function of sample size and the number of extreme results eliminated from the sample. For a 90 percent confidence level, for example, eliminating the upper and lower runs will provide a coverage of 97.8 percent. This coverage is remarkably close to the 98 percent in the Vensim display of a percentile interval. A similar correspondence is obtained for the 90 percent percentile interval in Figure 3(b).17 Translating the Vensim percentiles into tolerance intervals rests on the assumption that the inputs may be varied independently from one another. But most system dynamics models will include many interdependent inputs.18 These interdependencies should be removed if we are to use the Vensim estimates. However, in a large model with many inputs, there will be dozens and dozens of interdependencies. To remove them all would be infeasible. The more pragmatic approach is to concentrate on the top inputs to the model. The shaded box in Figure 2 depicts the use of partial correlation coefficients to learn which of the many inputs qualify as the top inputs. When we have identified the top inputs, we ask ourselves if they are uncorrelated. If they are, we can proceed to use the Vensim estimates. If they are not, we should expand the model to represent the interdependencies. The expanded model would include new parameters, which are likely to be uncertain. We would assign ranges of uncertainty to the new parameters and iterate through the steps in Figure 2.

Screening with the simple correlation coefficient

The customized software from the 1980s relied on the partial correlation coefficient (PCC) to identify key inputs to a model. The PCC ranges from −1 to +1 and measures the strength of the relationship between two variables when all other variables are held constant. For example, suppose we have three variables, Y, X1 and X2. A PCC shows the effect X1 has on Y when the effect of X2 is removed. The ability to remove the confounding effect of other inputs was an important benefit because researchers were testing computationally intensive models that were expensive to simulate.19 System dynamics models are normally designed to simulate quickly, so the screening power of the PCC is not an important benefit. For our purposes, the simple correlation coefficient is a more useful measure.20 (A related approach, described by Powell et al. 2005, uses the correlation ratio to find the most important inputs in a model of infectious diseases.)
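For readers who want the PCC in equation form, the standard first-order partial correlation for the three-variable case just described can be written in terms of simple correlation coefficients (this expression is standard statistical material, not a formula from the original paper):

```latex
r_{Y X_1 \cdot X_2} \;=\;
  \frac{r_{Y X_1} - r_{Y X_2}\, r_{X_1 X_2}}
       {\sqrt{\bigl(1 - r_{Y X_2}^{2}\bigr)\bigl(1 - r_{X_1 X_2}^{2}\bigr)}}
```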


The simple correlation coefficient (CC) ranges from −1 to +1 and measures the strength of the linear relationship between two variables without accounting for other variables that might be influential. The CC is often labeled r, as shown in the following equation:

r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}

The CC is easily calculated in spreadsheet software such as Excel, using the function CORREL(array1,array2). Each array is the range of cells that spans an entire column of simulated runs for a given variable at one point in time. The total number of CCs is therefore the number of uncertain inputs multiplied by the number of save periods. This can be a large (but manageable) number, as we demonstrate below.
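The same calculation can be scripted for readers who prefer to work outside a spreadsheet. The numpy sketch below is our own illustration with placeholder arrays (50 runs, 6 inputs, 21 save times); np.corrcoef plays the role of Excel's CORREL, and the nested loop fills the full input-by-time grid of coefficients.

```python
import numpy as np

# Placeholder sensitivity results: 50 runs, 6 uncertain inputs, 21 save times.
rng = np.random.default_rng(0)
inputs = rng.uniform(size=(50, 6))    # value assigned to each input in each run
output = rng.normal(size=(50, 21))    # key model output saved at each time, each run

# Equivalent of CORREL(array1, array2) for input 0 at save time 3:
r_single = np.corrcoef(inputs[:, 0], output[:, 3])[0, 1]

# The full screening grid: one correlation coefficient per (input, save time) pair.
cc = np.empty((inputs.shape[1], output.shape[1]))
for i in range(inputs.shape[1]):
    for t in range(output.shape[1]):
        cc[i, t] = np.corrcoef(inputs[:, i], output[:, t])[0, 1]

# Row i of cc traces the correlation coefficient for input i over time,
# which is what the spreadsheet template plots (compare Figure 10a).
print(cc.shape)   # (6, 21)
```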

The sales model

We begin with a textbook (Ford 1999) model of the sales company shown in Figure 4. The company is simulated to grow from 50 persons to 750 persons in around 15 years, as shown in Figure 5. The rapid growth is made possible by the positive feedback highlighted with thicker arrows in Figure 4: a larger sales force means more widget sales, more annual revenues, a larger sales department budget, more new hires and a still larger sales force. This loop dominates during the early years, and the company would grow tenfold during the first 9 years. This growth is limited by a density-dependent negative feedback involving saturation of the market. We assume a saturation size of 1000 persons. Figure 6 shows the assumed relationship between the effectiveness multiplier and the saturation. With 60 percent saturation, for example, each person is able to sell at 90 percent of the maximum. However, as the saturation approaches 100 percent, the multiplier drops rapidly. If the saturation ratio exceeds 100 percent, we assume that each person can sell at only 20 percent of the maximum effectiveness. We now use the model to study the growth in the company with uncertainty in some, but not all, of the model inputs. For example, we assume that the fraction of revenues allocated to sales and the hiring fraction are fixed. The shape in Figure 6 is fixed as well. The other six inputs are highlighted in bold in Figure 4 and represented by sliders in Figure 5. We set the width of each slider to represent the range of uncertainties shown in Table 1. Five of the six parameters are highly uncertain; their range is plus or minus 50 percent. The widget price is known with greater certainty; its range is plus or minus 2 percent.
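For readers without the textbook model at hand, the Python sketch below is our rough paraphrase of the structure in Figure 4. The base-case values come from Table 1, but the fraction of revenues allocated to sales, the hiring fraction, the number of selling days per year and the shape of the effectiveness lookup are our own guesses, so the trajectory is only indicative and will not reproduce Figure 5 exactly.

```python
import numpy as np

def effectiveness_multiplier(saturation):
    """Stand-in for the lookup in Figure 6: about 0.9 at 60 percent saturation,
    falling to 0.2 at and above 100 percent. Intermediate points are guesses."""
    x = [0.0, 0.6, 0.8, 0.9, 1.0, 2.0]
    y = [1.0, 0.9, 0.7, 0.45, 0.2, 0.2]
    return np.interp(saturation, x, y)

def simulate_sales(initial_sales_force=50, saturation_size=1000,
                   average_salary=25_000, widget_price=100,
                   exit_rate=0.2, max_effectiveness=2.0,
                   frac_to_sales=0.5, hiring_frac=0.25,   # assumed, held fixed
                   selling_days=365, years=20, dt=0.25):
    size = initial_sales_force
    history = []
    for _ in range(int(years / dt)):
        saturation = size / saturation_size
        widgets_per_year = (size * max_effectiveness * selling_days
                            * effectiveness_multiplier(saturation))
        revenues = widgets_per_year * widget_price
        budget = frac_to_sales * revenues
        new_hires = hiring_frac * budget / average_salary
        exits = exit_rate * size
        size += dt * (new_hires - exits)      # Euler integration of the stock
        history.append(size)
    return np.array(history)

print(round(simulate_sales()[-1]))   # approximate size of the sales force after 20 years
```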


Fig. 4. Model of a sales company, with six uncertain parameters in bold

Fig. 5. Base case results with six sliders for informal testing of the uncertain parameters


Table 1. Uncertain parameters in the sales model

Base case values                              Ranges of uncertainty
Initial sales force = 50 persons              Uniform (25, 75)
Saturation size = 1000 persons                Uniform (500, 1500)
Average salary = $25,000 per yr               Uniform (20,000, 30,000)
Widget price = $100                           Uniform (98, 102)
Exit rate = 0.2/year                          Uniform (0.15, 0.25)
Maximum effectiveness = 2 widgets/day         Uniform (1.5, 2.5)

Fig. 6. Effectiveness multiplier from the sales force saturation ratio

Figure 5 shows a view of the sales model as it might appear when one begins to explore the response of the model to multiple changes in the six inputs. With the speed of "SyntheSim," one can rapidly explore the model's response to a wide combination of assumptions on all six inputs. In our opinion, this informal style of testing should be conducted with every system dynamics model. This paper describes a more formal method to cover the parameter space in an organized manner. Vensim's "Sensitivity Simulation" tool is especially useful for this purpose, as explained in full detail with step-by-step instructions below. (We include the full details to encourage others to perform their own screening studies. Readers who do not envision conducting such analysis may skip to the discussion of results.) Figure 7 shows the "set-up" window for assigning ranges of uncertainty. We ask for 50 simulations, using LHS to select the values of the six inputs in each of the 50 runs. Vensim allows for a variety of statistical distributions; we assign the uniform random distribution to all inputs. We ask Vensim to store the results of the 50 runs in the .vsc file named at the top of the window. Clicking on "next" prompts Vensim to ask for a .lst file, a file with a list of variables to be saved. We focus on the Size of the Sales Force. The lst file should also contain the names of the six variables listed in Table 1. Clicking "finish" prompts Vensim to ask for the name of the .vdf file to store the results of the sensitivity analysis. The results of the analysis may be viewed in Figure 8. Figure 8(a) shows the 50 runs as individual "traces", an option that is reached by right clicking on the sensitivity graph.


Fig. 7. The sensitivity set-up window

Fig. 8. (a) Size of the sales force from 50 runs of the sales model. (b) Corresponding percentiles for 50 percent, 75 percent and 90 percent coverage

Some simulations show the company growing to around 1200 persons within the first 5 years. Other simulations show that the company would be unable to grow. Figure 8(b) shows Vensim percentile intervals: 50 percent in light gray, 75 percent in dark gray and 90 percent in black.


Translating the percentile intervals in Figure 8(b) into tolerance intervals rests on the assumption that the top inputs are independent of one another. We now use screening analysis to find the top inputs to the sales model.

Screening analysis of the sales model

To find the top inputs, we export the 50 simulations to a spreadsheet template designed to receive the values assigned to the six uncertain inputs in the 50 simulations, with results for the key output saved in 20 time periods. This is one of several templates (see Table 2) that may be downloaded from the authors' website.

Table 2. Spreadsheets available from http://www.wsu.edu/~forda/CCTemplate.html

File name                        Sensitivity results for
CC template 6x50x20.xls          6 uncertain inputs; 50 runs; 20 save periods
CC template 6x50x40.xls          6 uncertain inputs; 50 runs; 40 save periods
CC template 14x50x40.xls         14 uncertain inputs; 50 runs; 40 save periods

The results of the Vensim sensitivity analysis are stored in a .vdf file, which may be exported to a tab file as shown in Figure 9. In this view, we assign a name to the tab file to remind ourselves that we are saving results for 20 years. The other settings in Figure 9 are the default settings. Clicking on "OK" prompts Vensim to export the results to the tab file. The contents of the tab file will be difficult to read, but they are ready to be imported into the CC template. We open CC Template 6x50x20.xls and select cell A3, the green cell in the template.

Fig. 9. Naming the tab file to hold the exported results


Then click on "data" in the main menu, followed by "Get External Data" and "Import Text File." The spreadsheet software will be looking for .txt files, so we ask to look for all files to locate the tab file named in Figure 9. Click on the tab file, and the spreadsheet will open the "Text Import Wizard." At this stage, we click on next, click on next again, then finish, and agree to have the data go to cell $A$3. The results of the sensitivity analysis will appear in the boxed cells, and the time graphs of the correlation coefficients will appear on the second worksheet. Figure 10 shows the time graph with some reformatting to emphasize the inputs with the strongest influence on the size of the sales force.

Discussion of the sales model

Figure 10(a) shows the correlation coefficients for the 20-year simulation. Starting in year 1, we see that the initial value assigned to the stock stands out as the top input. As we would expect, the CC is 1.0, indicating that the size is totally explained by the initial value. This parameter is the most important input for the first 2 years of the simulation. But by the third year, the CC for the initial value has fallen below 0.5, and the stronger inputs appear to be the maximum effectiveness and the average salary. The CC for maximum effectiveness is around 0.7. The CC for the average salary is almost as important, with a negative correlation of around 0.6. These parameters control the gain21 around the positive feedback loop highlighted in Figure 4. A higher maximum effectiveness makes this loop stronger; a higher salary makes it weaker. Figure 10(a) shows that these inputs remain the dominant inputs for the remainder of the simulation. The positive correlation for the maximum effectiveness makes sense because a higher effectiveness leads to a larger company. The negative correlation for the average salary makes sense because a larger salary means that you must budget for a smaller work force. In some situations, it is possible that the CCs will not recognize an important variable because the pattern of influence is not linear across the range of uncertainty. If one feels that a parameter is more important than indicated by the CC, it is useful to examine a scatter plot between the output and the particular input. Scatter plots can be created directly in Excel if there is a single input of interest. However, if one wishes to examine scatter plots of all inputs, it is more convenient to use statistical software such as S-PLUS. To illustrate, we have imported the sensitivity results .tab file into the S-PLUS software. The software makes it convenient to view the scatter plots shown in Figure 10(b). The most visually obvious scatter pattern in Figure 10(b) shows the variation in starting sales force (T0) with the variations in the initial value assigned to the sales force. Examining the related scatter patterns for T1, T2 and T3 provides visual confirmation of the decline in the importance of the starting value (indicated by the decline in the CC in Figure 10a).
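Readers without S-PLUS can build an equivalent scatter matrix with pandas and matplotlib. The sketch below is ours; the file name and column names are hypothetical and stand for sensitivity results that have already been arranged with one row per run.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical layout: one row per run, one column per uncertain input,
# plus the saved output (here, the size of the sales force at year 3).
results = pd.read_csv("sales_sensitivity.csv")   # assumed pre-exported file
inputs = ["initial sales force", "saturation size", "average salary",
          "widget price", "exit rate", "maximum effectiveness"]

fig, axes = plt.subplots(2, 3, figsize=(10, 6), sharey=True)
for ax, name in zip(axes.flat, inputs):
    ax.scatter(results[name], results["sales force T3"], s=12)
    ax.set_xlabel(name)
for ax in axes[:, 0]:
    ax.set_ylabel("sales force at T3")
fig.tight_layout()
plt.show()
```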


Fig. 10. (a) Correlation coefficients for the six inputs to the sales model. (b) Scatter plot for all six inputs to the sales force model versus the size of the sales force at the start (T0) and the end of the first 3 years of the simulation


By the end of the third year, the scatter patterns for T3 confirm the emergence of maximum effectiveness and average salary as influential inputs. Although visual confirmation is always useful, the main purpose of Figure 10(b) is to check for scatter patterns that show an important input that may escape detection by the CC. We do not see such patterns for the sales model. However, if modelers find such patterns in their own analyses, they may wish to consider rank correlation coefficients to screen for the most important inputs. Now, let us turn to the question in the shaded box in Figure 2: are the top inputs sufficiently independent of one another that we can use the tolerance intervals? In this example, we ask if the maximum effectiveness could be specified independently of the average salary. Most readers would believe that these two inputs should not be specified in an independent manner. Companies that pay higher salaries, for example, are probably going to attract sales personnel with a higher maximum effectiveness. If the top inputs are correlated, the interdependence should be included in an expanded sales model. The new model would include additional parameters with their own ranges of uncertainty. We would specify the new ranges and begin the next iteration in Figure 2.
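Rank correlations, mentioned above as a fallback when the influence of an input is monotonic but far from linear, are a one-line change in a scripted analysis. The snippet below is our own illustration on synthetic data using scipy's Spearman coefficient.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=50)   # hypothetical uncertain input over 50 runs
y = np.exp(6 * x)                    # strongly nonlinear but monotonic response

print(np.corrcoef(x, y)[0, 1])        # Pearson CC understates this curved relationship
print(spearmanr(x, y).correlation)    # Spearman rank CC equals 1.0 for a strictly monotonic pattern
```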

The World3 model

For a second illustration, we turn to the global model described in the widely read book The Limits to Growth (Meadows et al. 1972). The model simulates growth in human population and industrial output from 1900 to 2100 and is thoroughly documented by Meadows et al. (1974). Many readers will be familiar with this model, and some will have formed their own opinions on the top inputs. The model has served as a test case for other researchers,22 and it is readily available as one of the sample models with the Vensim software. We use the Vensim sample model (World3) for this illustration. Figure 11 encourages us to think about the feedback loops that dominate the simulated growth in the world system. This image is from the cover page of Toward Global Equilibrium (Meadows and Meadows 1973), a collection of readings published as part of the global modeling study.

Fig. 11. Image of loop dominance in World3


Figure 11 depicts positive feedback weighing heavily on the scales to remind us that the positive feedbacks are sufficiently strong to permit exponential growth in industrial output and human population. For population growth, the positive loop is: more people, more children, more mature people and still more children. For industrial growth, the positive loop is: more industrial capital, more industrial output, greater investment in industrial capital, and still greater industrial output. These loops power the exponential growth in a system that must eventually come into accommodation with limits. The specific limits in The Limits to Growth are the amount of potentially arable land, the maximum yield from each hectare of land, the initial stock of nonrenewable resources and the ability of the environment to assimilate persistent pollutants. Table 3 shows a collection of 14 inputs to the World3 model. The first parameter is the initial size of the population—a population that is split into the four age groups of the population model. We then include three inputs that influence the rate of growth in human population. We include three parameters influencing the growth in industrial capital and two parameters influencing the growth in service capital. The final five parameters are associated with the limits to growth. Table 3 lists the short names (such as LEN for Life Expectancy Normal) that appeared in the original World3 model (which was implemented in Dynamo).

Table 3. Fourteen variables selected for screening analysis of World3

Growth in human population
PI = 16             Initial population (split into age groups)
LEN = 28 years      Life Expectancy Normal is the years of life expectancy with subsistence food, no medical care and no industrialization
MTFN = 12           The normal maximum fertility with sufficient food and perfect health
DCFSN = 3.8         The Desired Completed Family Size Normal

Growth in industrial capital
ICI = 2.1e11        Industrial Capital Initial in $
ALIC1 = 14 years    Average Life of Industrial Capital that would apply prior to a policy year
ICOR1 = 3 years     Industrial Capital Output Ratio is the ratio of the industrial capital stock to the annual industrial output, the value taken prior to the policy year

Growth in service capital
SCOR1 = 1 year      Service Capital Output Ratio is the ratio of the service capital stock to the annual output prior to a policy year
ALSC1 = 20 years    Average Life of Service Capital prior to the policy year

Limits to growth
NR = 1e12           Initial value of nonrenewable resources, measured in resource units
PALI = 2.3e9        Initial hectares of potentially arable land
ILF = 600           Inherent Land Fertility (vegetable equiv. kilograms/year per hectare)
IMEF = 0.1          Industrial Materials Emissions Factor
AHL70 = 1.5 years   Assimilation half-life for persistent pollutants in 1970


The estimates and uncertainties associated with these parameters are explained in detail by Meadows et al. (1974). Some estimates are highly uncertain, while others are known with more confidence. To keep this illustration simple, we will assume that each parameter is uniformly uncertain across a range of plus or minus 10 percent. We ask for a sensitivity analysis with 50 runs using LHS, and we save the results for industrial output and the human population. Figure 12 shows the 50 traces of the industrial output. The 50 runs trace out a consistent pattern of overshoot and decline. Variations in the inputs shift the timing of the overshoot, but they do not alter the fundamental pattern. The 50 runs reveal that assumptions leading to more rapid growth in the early years cause the overshoot to appear sooner and the decline to be more precipitous.

Fig. 12. Industrial output in 50 runs of the World3 model (with the vertical scale from 0 to 4 trillion $/year)

Figure 13 shows the results for the human population. These traces also reveal a consistent pattern of overshoot and decline. Variations in the inputs change the timing of the overshoot, but they do not alter the fundamental pattern. Comparing Figures 12 and 13 shows that the decline in the human population is considerably less pronounced than the decline in industrial output. For screening purposes, we export the World3 sensitivity results to CC Template 14x50x40.xls, which handles 14 inputs and 50 simulations. We set the save period to 5 years, so 40 sets of saved results correspond to the 200-year simulation. The correlation coefficients for industrial output are shown in Figure 14. The simulation begins in 1900 with the industrial output dominated by the value assigned to ICI, the initial value of industrial capital, and ICOR, the industrial capital output ratio. The base case simulation begins with $210 billion in industrial capital.


Fig. 13. Human population in 50 runs of the World3 model (with the vertical scale from 0 to 20 billion)

Fig. 14. Correlation coefficients to screen for inputs with the main influence on the industrial output in the World3 model

The base capital output ratio is set at 3 years23 so the initial capital stock of $210 billion will produce $70 billion of industrial output. Thus, we should expect high positive correlation between the industrial output and the estimate of ICI. We should expect a high negative correlation for the estimate of the ICOR. Figure 14 shows the expected results at the start of the simulation. After several decades, the ICI fades in importance. The most striking pattern in Figure 14 is the correlation coefficient for the ICOR, the industrial capital output ratio. It remains at minus 0.8 for the first half of the simulation.


This strong negative correlation is due to the ability of the industrial economy to grow more rapidly with a lower value of the capital output ratio. During 2000–2040, however, the correlation coefficient for the ICOR (and for every other input) changes sign. After this period of transition, the ICOR emerges as the dominant input for the final years of the simulation. Figure 15 helps one understand this pattern by focusing on three simulations of the model.

Fig. 15. Industrial output in three runs of the World3 model (with the vertical scale from 0 to 4 trillion $/year)

This graph shows industrial output with three values assigned to the ICOR. A lower value of the ICOR allows the industrial economy to grow more rapidly. Thus, for the first 100 years, a lower value of ICOR translates into a higher value of industrial output. However, in a world with finite limits, rapid growth eventually leads the system to an earlier overshoot and a more rapid decline. During the final 50 years of the simulation, we expect to see a high, positive correlation between output and the value assigned to the ICOR. Figure 16 shows the correlation coefficients for the human population. As we would expect, PI, the initial value assigned to the population, dominates in the early years of the simulation. After four decades, the correlation coefficient for PI has faded, and the ICOR emerges as the most important. Figure 16 shows the correlation coefficient shifting signs during the interval from 2000 to 2040, a shift explained previously. During this transition period, the estimate of the DCFSN, the desired completed family size normal, dominates. But by the end of the simulation, the ICOR shows the greatest correlation.

Discussion of the World3 results

World3 is a complicated model with around 150 equations and 600 pages of documentation (Meadows et al. 1974). Despite the model complexity, one can understand the model's dominant pattern of behavior by focusing on the power of exponential growth.


Fig. 16. Correlation coefficients to screen for the inputs with the main influence on the human population in the World3 model

In the industrial sector, output grew at 3.6 percent/year during 1900–1970, doubling in size every 19 years. The world's human population grew at 1.2 percent/year during this same time period, which implies a doubling time of 58 years. The combination of industrial growth and population growth causes the simulated world to grow past a sustainable size and decline at some time during the coming century.24 With the relatively rapid growth in the industrial sector, we should expect the screening analysis to draw our attention to parameters that control the gain in the positive feedback loop of reinvestment in industrial capital. In our illustration, the parameter with the most influence on this loop is the ICOR, the industrial sector capital output ratio. Lower values of this ratio lead to more output, greater reinvestment in industrial capital and still greater output. Thus, lower values lead to more rapid growth during early decades, followed by more rapid decline during later decades. This pattern is confirmed by the dominant values of the correlation coefficient for ICOR in Figure 14. Some may wonder about the low correlation coefficients for NR, the estimate of the natural resources available in 1900. One might expect NR to play a more important role in the second half of the simulations since the decline in industrial output in the "standard run" of World3 "occurs because of nonrenewable resource depletion" (Meadows et al. 1972, p. 125).


But Figure 14 shows that the NR is one of the least important inputs. One reason for this unexpected result is the narrow range of uncertainty. (This particular input is much more uncertain than the plus or minus 10 percent used in our analysis.25) Another reason involves the cumulative power of exponential growth. When we lower the ICOR, for example, we create the possibility for more rapid growth, growth that continues decade after decade until the limits are reached. The cumulative effects of rapid growth will tend to dominate the statistical screening analysis.26 The screening for key inputs to explain the human population is also somewhat surprising. We see that the key input is the ICOR, the same input that dominates the explanation of industrial capital. For the interval from 2020 to 2050, the dominant input is the DCFSN, the desired completed family size normal. However, for the rest of the simulation, the size of the human population is dominated by the value assigned to the ICOR.27

Summary

We have demonstrated a simple method of searching for the most important parameters in a system dynamics model. We use the simple correlation coefficient to screen for the parameters with the greatest statistical influence on the model output. Screening can now be performed with readily available software, and we encourage system dynamics practitioners to put this method to pragmatic use in their own studies. Screening is especially useful for practitioners interested in estimating tolerance intervals on model results.

Acknowledgements

The work reported here has been supported in part by the National Science Foundation under grant ECS-0224810.

References

Backus G, Amlin J. 1985. Combined Multidimensional Simulation Language. In Proceedings of the 1985 International System Dynamics Conference, System Dynamics Society, Albany, NY.
Eberlein R. 1989. Simplification and understanding of models. System Dynamics Review 5(1): 51–68.
Ford A. 1990. Estimating the impact of efficiency standards on the uncertainty of the Northwest Electric System. Operations Research 38(4): 580–597.
Ford A. 1999. Modeling the Environment. Island Press: Covelo, California.
Ford A, McKay M. 1985. Quantifying uncertainty in energy model forecasts. Energy Systems and Policy 9(3): 217–241.


Ford D. 1999. A behavioral approach to feedback loop dominance analysis. System Dynamics Review 15(1): 3–36.
Forrester J. 1980. Information sources for modeling the national economy. Journal of the American Statistical Association 75(371): 555–566.
Forrester N. 1983. Eigenvalue analysis of dominant feedback loops. In 1983 International System Dynamics Conference, System Dynamics Society, Albany, NY.
Goncalves P. 2000. The impact of shortages on push–pull production systems. Working paper from the Sloan School of Management, MIT.
Goncalves P, Hines J, Sterman J. 2005. The impact of endogenous demand on push–pull production systems. System Dynamics Review 21(3): 187–216.
Graham A. 1980. Parameter estimation in system dynamics modeling. In Elements of the System Dynamics Method, Randers J (ed.). Pegasus Communications: Waltham, MA.
Guneralp B. 2005. Towards coherent loop dominance analysis: progress in eigenvalue elasticity analysis. In Proceedings of the 2005 International Conference of the System Dynamics Society. http://www.albany.edu/cpr/sds/conf2005/index.htm [9 November 2005].
Hahn G, Meeker W. 1991. Statistical Intervals: A Guide for Practitioners. Wiley: New York.
Kampmann C. 1996. Feedback loop gains and system behavior. In Proceedings of the 1996 International System Dynamics Conference, System Dynamics Society, Albany, NY.
Kampmann C. 2004. Feedback loop gains and system behavior. Working paper from the Center for Applied Management Studies, Copenhagen Business School.
Kitching R. 1983. Systems Ecology: An Introduction to Ecological Modeling. University of Queensland Press: Saint Lucia.
Lyneis J, Pugh A. 1996. Automated vs. "hand" calibration of system dynamics models. In Proceedings of the 1996 International System Dynamics Conference.
McKay M, Conover W, Beckman R. 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2): 239–245.
Meadows D, Meadows D (eds). 1973. Toward Global Equilibrium: Collected Papers. Pegasus Communications: Waltham, MA.
Meadows D, Meadows D, Randers J, Behrens W. 1972. The Limits to Growth. Universe Books: New York, NY.
Meadows D, Behrens W, Meadows D, Naill R, Randers J, Zahn E. 1974. Dynamics of Growth in a Finite World. Pegasus Communications: Waltham, MA.
Miller JH. 1998. Active nonlinear tests of complex simulation models. Management Science 44(6): 820–830.
Mojtahedzadeh M, Andersen D, Richardson G. 2004. Using Digest to implement the pathway participation method for detecting influential system structure. System Dynamics Review 20(1): 1–20.
Oliva R. 2003. Model calibration as a testing strategy for system dynamics models. European Journal of Operational Research 151: 552–568.
Peterson D. 1980. Statistical tools for system dynamics. In Elements of the System Dynamics Method, Randers J (ed.). Pegasus Communications: Waltham, MA.
Powell D, Fair J, LeClaire R, Moore L, Thompson D. 2005. Sensitivity analysis of an infectious disease model. In Proceedings of the 2005 International Conference of the System Dynamics Society. http://www.albany.edu/cpr/sds/conf2005/index.htm [9 November 2005].


Reilly J, Edmonds J, Gardner R, Brenkert A. 1987. Uncertainty analysis of the IEA/ORAU carbon dioxide emissions model. Energy Journal, July.
Richardson G. 1986. Dominant structure. System Dynamics Review 2(1): 68–75.
Richardson G. 1996. Problems for the future of system dynamics. System Dynamics Review 12(2): 141–157.
Senge P. 1977. Statistical estimation of feedback models. Simulation 28(June): 177–184.
Stainforth DA, Aina T, Christensen C, Collins M, Faull N, Frame DJ, Kettleborough JA, Knight S, Martin A, Murphy JM, Piani C, Sexton D, Smith LA, Spicer RA, Thorpe AJ, Allen MR. 2005. Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature 433: 403–406.
Sterman J. 2000. Business Dynamics. Irwin McGraw-Hill: New York.
van Belle G. 2002. Statistical Rules of Thumb. Wiley: New York.
Welch W, Buck R, Sacks J, Wynn H, Mitchell T, Morris M. 1992. Screening, predicting and computer experiments. Technometrics 34(1): 15–25.

Appendices

This paper stresses the pragmatic benefits of screening. There could be methodological benefits as well. Perhaps screening analysis could be used in concert with analytical methods that detect the key structure in our models? Appendix A describes analytical methods for finding the key connections in system dynamics models. We discuss whether screening analysis could be used alongside these methods in a complementary manner. Appendix B illustrates complementary results with a screening analysis of the industrial structures model examined by Mojtahedzadeh (2004).

Appendix A: analytical methods and screening analysis

System dynamics models are composed of a combination of feedback loops that control the dynamic behavior. Many readers have come to expect that certain loops will dominate the behavior of system dynamics models during different phases of the simulation. (These readers would be comfortable with images like the scales in Figure 11.) Although practitioners may have shared expectations, we are not necessarily well equipped to identify the key loops in a rigorous manner. Richardson (1986, 1996) describes the search for "dominant structure" as one of the top research challenges for the field. David Ford (1999) describes a behavioral approach to identify dominant feedback loops based on the concepts of atomic behavior patterns.28 Other researchers have reported on automated analysis tools that rely on eigenvalues to characterize the modes of behavior (Eberlein 1989; Forrester 1983; Goncalves 2000; Kampmann 1996, 2004). Nathan Forrester (1983) describes a formal link between the strength of a feedback loop and the system eigenvalues using the "eigenvalue elasticity."


The elasticity represents the relative change in the eigenvalue that results from a relative change in the gain around the loop. The elasticity of the real part of the eigenvalue indicates how a change in the loop gain affects the rate of decay or growth of the state variable; the elasticity of the complex part indicates how it affects the oscillatory behavior. For example, if the elasticity of the complex part is positive, an increase in the loop gain leads to an increase in the frequency of the oscillations. These ideas are explored by Kampmann (1996, 2004), whose recent working paper applies loop elasticity analysis to a model of the economic long wave. Kampmann states that eigenvalue elasticity analysis yields insights beyond "old-fashioned" intuition.29 Goncalves (2000, 2005) also addresses the causes of oscillatory behavior. He shows eigenvalue evolution plots in a case study of capacity utilization at a semiconductor manufacturer. He states that eigenvalue analysis yields pragmatic insights30 especially when the plots show "sharp transitions from real to complex eigenvalues." A conference paper by Guneralp (2005) presents a ten-step approach for the analysis of loop dominance which uses a combination of software programs (Vensim, C language and MATLAB). Guneralp illustrates the approach by showing the evolution of loop dominance in a model of predator–prey cycles and in a model of the economic long wave. These studies of oscillatory systems raise the question of whether a screening analysis would provide complementary findings. To address this question, we consider a variation in the sales model. The model in Figure 4 has a single state variable, so it will not generate oscillatory behavior on its own. However, it will show endogenous oscillations if we introduce a time lag at some point in the negative feedback loop controlling the sales persons' effectiveness (Ford 1999, p. 218). The lag can take the form of a material delay (i.e., a training delay) or an information delay (i.e., slowing the effect of saturation on effectiveness). We introduce an information delay and then subject the new sales model to a sensitivity analysis with the length of the time lag as one of six uncertain inputs. Figure A1 shows traces of 50 simulations of the new sales model. Some simulations show a limit cycle; others show damped oscillations; and some show no oscillations at all. The length of the lag time is a key parameter in shaping the oscillations.

Fig. A1. Oscillatory behavior in 50 runs of a new version of the sales model


If the lag time is longer, the oscillations have a longer period, and they are more volatile. The results in Figure A1 were exported to a spreadsheet to permit a screening analysis of the top inputs. We were curious to learn if the length of the lag time would appear among the top inputs to the new model. Despite the known importance of the lag time, the screening indicated that the lag time would be the least important of the model inputs. This interesting result arises from the overlapping behaviors in Figure A1. It appears that correlation coefficients are not well suited to help us detect the distinctive features of oscillatory behavior. This example suggests that screening analysis might not serve as a useful complement to the eigenvalue methods when searching for dominant structure in oscillatory models. But screening might prove useful when studying models that exhibit exponential growth, S-shaped growth or overshoot. We explore this possibility in Appendix B.

Appendix B: screening analysis of the industrial structures model

This appendix screens for the key inputs of a model of growth and overshoot in industrial structures. The model was created by Mojtahedzadeh (2004) to illustrate the use of Digest, software that accepts the list of equations generated by one of the standard system dynamics software packages (i.e., Stella, Vensim, Powersim). "Once a text version of the model equations has been edited and accepted by Digest, the software leads the modeler through a series of step-by-step procedures that use the PPM calculation to first detect and then display model structure." PPM stands for the pathway participation metric, "a mathematical calculation that can help to identify the linkages between the structure and behavior of a dynamic system." Mojtahedzadeh argues that "pathways, links of causal structure between two system stocks" may be envisioned as the primary building blocks of influential structure. His recent article reviews the conceptual underpinnings of the PPM approach and "presents an experimental piece of software, Digest, that can be used to implement the approach." Figure B1 shows our version of the industrial structures model. There are 10 structures at the start of the simulation. They are subject to a demolition rate of 5 percent/year and a normal investment rate of 12 percent/year. The net growth rate would be 7 percent/year, so we should expect the number to double every ten years as long as there is sufficient water. Each structure needs 10 units of water per year, and water consumption drains a stock of water reserves remaining.31 If the relevant fraction of water reserves remaining fails to provide one year of coverage, the demand will not be fully satisfied, and the investment in new structures will be reduced compared to the normal investment. Figure B2 shows verification of the previous results (Mojtahedzadeh 2004, p. 9). Industrial structures grow at 7 percent/year for over 20 years before the investment is slowed by limitations on water.


Fig. B1. Our version of a model of industrial structures whose growth is limited by water reserves

Fig. B2. Simulation matching the original overshoot in industrial structures

The slowdown appears as an abrupt change in the investment in the 22nd year. Six years later, the investment has fallen below demolitions, and the number of industrial structures begins to decline. Mojtahedzadeh explains that Digest detects the first 24 years as a period of reinforcing growth. Digest identifies the positive feedback loop highlighted in Figure B1 as the most influential structure during this period. Digest detects the negative feedback loop involving depletion of water reserves as the "most influential structure" in the remainder of the simulation. To learn if a screening analysis will lead to similar results, we screen for which of six inputs stand out as most important. Table B1 shows the base case values and ranges of uncertainty.


Table B1. Uncertain parameters in the industrial structures model

Base case values                          Ranges of uncertainty
Initial IS = 10 structures                Uniform (8, 12)
normal inv rate = 0.12 per year           Uniform (0.10, 0.14)
demolition rate = 0.05 per year           Uniform (0.04, 0.06)
Initial Water Res = 10000 units           Uniform (8000, 12000)
Fr Counted = 0.10                         Uniform (0.08, 0.12)
water need per IS = 10 units/year         Uniform (8, 12)

To keep the illustration simple, each input is uniformly uncertain across a range of plus or minus 20 percent. We conduct a sensitivity analysis using LHS with 50 runs. The sensitivity results are exported to CC Template 6x50x40.xls, and the correlation coefficients are shown in Figure B3.
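For readers who want to experiment with this example outside Vensim, the Python sketch below is our reading of the structure in Figure B1. The coverage test and the investment cutback rule are our guesses at the published formulation (variable names are hypothetical), so the sketch is illustrative rather than a reproduction of the Digest test model.

```python
import numpy as np

def simulate_structures(initial_is=10, normal_inv_rate=0.12, demolition_rate=0.05,
                        initial_water=10_000, fr_counted=0.10, water_need=10,
                        years=60, dt=0.125):
    structures, reserves = float(initial_is), float(initial_water)
    path = []
    for _ in range(int(years / dt)):
        water_demand = water_need * structures            # water units per year
        coverage = fr_counted * reserves / water_demand   # years of demand covered
        inv_multiplier = min(1.0, coverage)               # assumed cutback below one year of coverage
        investment = normal_inv_rate * structures * inv_multiplier
        demolition = demolition_rate * structures
        structures += dt * (investment - demolition)      # Euler integration of the stock
        reserves = max(0.0, reserves - dt * water_demand)
        path.append(structures)
    return np.array(path)

path = simulate_structures()
print(round(float(path.max()), 1), "structures at the peak of the overshoot")
```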

Fig. B3. Correlation coefficients for 6 inputs to the industrial structures model

Figure B3 shows that the initial value of the industrial structures is the dominant input at the start of the simulation. This input fades in importance over time, and the normal investment rate emerges as the top input after the tenth year of the simulation. The normal investment rate is the key parameter that controls the gain around the positive feedback loop highlighted in Figure B1. This is the same feedback loop detected by the Digest software, so one might conclude that screening can be used to confirm the results of the Digest software.


But the remainder of the results in Figure B3 seems "out of phase" with the phases of behavior detected by the Digest software. When calculating PPMs, Digest identifies the first 24 years as a "phase 1" period dominated by the positive feedback of reinvestment. But the remaining years are dominated by a negative feedback loop involving the depletion of water reserves and the resulting cutback in investment. The key parameters in this loop are the initial water reserves, the water need per industrial structure and the fraction of water reserves counted in calculating coverage. Figure B3 shows that one of these inputs emerges as most influential around the end of the simulation. (The value assigned to the initial water reserves has the largest correlation coefficient after the 37th year of the simulation.) Comparing these screening results with the "influential structures" identified by Digest leaves us with the general impression that the screening results are "slow to recognize" the key inputs associated with the influential loops identified by Digest. The screening finds that the normal investment rate is dominant in the early years, but this dominance does not appear until ten years into the simulation. The screening finds that the size of the water reserves is dominant in later years, but this dominance does not emerge until 12 years after Digest's identification of a change in influential structure. Perhaps one should expect a screening analysis to be relatively slow in recognizing dominant inputs since the only information is the statistical correlation between the inputs and the slowly changing stock variable.32

Notes

1. Graham (1980) advises that parameters should be estimated by “using data below the level of aggregation of the model variables.” When following this advice, practitioners estimate some parameters a priori from direct observations and educated guesses (Oliva 2003). Parameter estimates are then revised in an interactive calibration process which may be done “by hand” (Lyneis and Pugh 1996) or with automated calibration methods (Oliva 2003).

2. Problems with statistical methods such as generalized least squares have been identified by Senge (1977). These problems leave system dynamics practitioners skeptical of multiple regression analysis. They usually prefer the method of full-information maximum likelihood estimation via optimal filtering, a method described by Peterson (1980). An alternative method is the nonlinear model reference optimization described by Lyneis and Pugh (1996).

3. Modelers from other disciplines are also willing to proceed with rough estimates based on personal intuition, as explained in Systems Ecology (Kitching 1983, p. 41): “when the measurement of real values may be difficult or even impossible . . . the value used in the model, legitimately, may be a straight-out guess. Of course, such a value cannot be regarded and treated in the same way as other parameters in the model. It must be treated tentatively and its role can be evaluated using techniques of sensitivity analysis.”

4. An example of a spatially complex model is a model to simulate the temperature and pressure within the core of a nuclear reactor. Such a model simulates in three spatial directions as well as the time dimension. Other models may require extraordinary computer time because their developers elect to mix short and long time constants in the same model.

5. Some models can require extraordinary computer time. An extreme example is the general circulation model used to calculate the climate response to rising levels of greenhouse gases. Stainforth et al. (2005) describe a sensitivity analysis to study the effect of six parameters. The analysis called for over 2500 simulations, and the computer time was so large that the team enlisted the help of over 90,000 participants who volunteered access to their personal computers.

6. Suppose our sample design was to select low and high estimates of each of 45 inputs, and one thought it was necessary to test each and every possible combination of the parameters, as in: (2 values for the 1st input) × (2 values for the 2nd input) × . . . × (2 values for the 45th input). This reasoning would suggest a need for 2⁴⁵ simulations, which is over 30 trillion simulations.

7. To verify that 20 simulations were sufficient, we increased the sample size to 100 simulations and checked for the same tolerance intervals.

8. One program was constructed by McKay et al. (1979) for use at the Los Alamos National Laboratory. A related program was developed by Backus and Amlin (1985) for use in their consulting studies.

9. A tolerance interval describes lower and upper bounds that enclose a specified proportion of a population with a given confidence. For example, we might ask for the bounds to cover 75 percent of the population, and we wish to be 90 percent confident that the bounds are sufficiently large. The other commonly used intervals are “confidence intervals” and “prediction intervals” (van Belle 2002, p. 120). The confidence interval bounds the mean value of the population. (For example, an administrator of a Corporate Average Fuel Efficiency Standard for a population of new cars may wish to know the upper and lower bounds on the average fuel efficiency with a given confidence.) The prediction interval bounds the next observation drawn from the population.

10. Regional load uncertainty was influenced by uncertainty in economic growth, the electricity rates and the extent to which consumers react to changes in electricity rates. BPA loads were influenced by the regional loads and the decision making by utilities that could turn to BPA to serve part of their load obligation.

11. The analysis focused on uncertainty in regional electric loads and the portion of those loads that were placed on Bonneville. The uncertainty was associated with the combined uncertainty in 150 inputs to a system dynamics model. The model was implemented in Dynamo, and the statistical analysis was conducted with the Hypersens Fortran program developed by Backus and Amlin (1985).

12. To appreciate the reduction in option costs, imagine that planners acquire generating resources to serve the mean estimate of their load obligation. Next, imagine that they acquire options to serve the difference between the mean and the high estimate. Efficiency standards on new homes and buildings deliver the most savings in scenarios with rapid growth in the economy, so they reduce the high estimate of load and, therefore, the need for options.

13. In the BPA study, for example, the pragmatic question was not the size of the tolerance intervals. It was whether the tolerance intervals would be reduced in a substantial way by efficiency standards on new homes and buildings.

14. From personal experience, we have noticed that analysts who are reluctant to commit to a “best guess” on a parameter are also reluctant to describe a range of uncertainty in the parameter. This tendency might arise from a general wish to avoid conducting quantitative analysis.

15. Latin Hypercube Sampling (LHS) is readily available with the Vensim software.

16. Most system dynamics practitioners avoid building models of “stiff systems”, systems with a mix of short and long time constants. For example, it should be possible to design a model to simulate with no more than 1000 steps (Ford 1999, p. 117).

17. The 90 percent percentile interval is formed by eliminating the top five simulations and the bottom five simulations. Hahn and Meeker show on page 86 that eliminating these runs will create an interval with 92 percent coverage at a confidence level of 90 percent.

18. For example, we might run a model of a particular industry with an input for the rate of inflation and another input for growth in the GNP. If one thought growth and inflation were positively correlated, these inputs would be changed in tandem when doing “what if” studies with the model.

19. The Hypersens software was used to find the partial rank correlation coefficients (i.e., PCCs of rank-transformed data), and these coefficients would reveal the key inputs in a surprisingly small number of simulations.

20. The CC is more useful because it is more readily available with spreadsheet software like Excel. Excel does not provide a function for the PCC, so one would have to turn to statistical software such as S-PLUS. The coding in S-PLUS would be a time-consuming interruption that is not necessary in the screening of system dynamics models. If one is unsure that the CCs are reliable, one can write separate code for S-PLUS estimates of the PCCs and then compare the PCCs and the CCs to learn if the same inputs emerge as important. However, an easier check on the screening analysis is simply to double the sample size and study the two sets of CCs to learn if the same inputs emerge as important. (A short illustration of the CC-versus-PCC check follows these notes.)

21. The gain is also influenced by the price of a widget. This input is assigned a narrow range of uncertainty, so it has less opportunity to influence the size of the sales force.

22. Miller (1998) used the World3 model as a test case of “active nonlinear tests” in which optimization techniques were employed in an attempt to “break” the model's implications for overshoot and collapse. His results were presented as evidence that “dramatic changes in the predictions of the World3 model can result from even minor changes in some parameters.” Sterman (2000, p. 887) describes Miller's analysis as “showing that the model exhibits significant numerical sensitivity but low behavior mode sensitivity.” The World3 model was also used as a test case by C. Kampmann (personal communication, 2005).

23. The value of 3 years is representative of the U.S. economy over the interval from 1900 to 1970 (Meadows et al. 1974, p. 218). A low value of this ratio means more industrial output is produced by the stock of industrial capital. Some of the output is reinvested in industrial capital, so a low value of this ratio means a higher gain around the positive feedback that powers the growth of the industrial sector.

24. The overshoot pattern is a robust result from World3 when the model is tested with variation in the uncertain inputs. But the overshoot is not inevitable. It can be avoided by the adoption of a mixture of growth-regulating and technological policies (Meadows et al. 1972, Ch. V).

25. A typical sensitivity test of the importance of NR is to double the value and rerun the model (Meadows et al. 1974, p. 401).

26. When interpreting results, readers should remember that we are conducting the screening analysis starting in the year 1900, and the results are not recalibrated to match historical results for 1900–1970.

27. The pattern of correlation is similar to the pattern in Figure 14. That is, values of the ICOR which cause the industrial sector to grow more rapidly in the early decades also cause more rapid growth in the human population. These same values lead to a more rapid decline in both industrial output and human population in later decades.

28. Linear, exponential and logarithmic behaviors are examples of “atomic behavior patterns.” David Ford (1999, p. 7) uses the second derivative of the variable of interest to reveal the atomic behavior pattern.

29. For example, Kampmann states that “the self-ordering loop plays a dominant destabilizing role” during the “self order growth phase” of the simulation, a result made evident by a very large elasticity of the real part of the eigenvalue. In contrast, the “hoarding loop” is said to play no role in the dynamics, a result that he confirmed by rerunning the long-wave model with the hoarding loop disabled. David Ford (1999, p. 29) applied his behavioral approach to the long-wave model and reached “the same general conclusions concerning which loops dominate the long-wave model.” Interestingly, he reported that the two approaches “differ somewhat in when shifts in loop dominance occur.”

30. Goncalves states that an analysis of the nine eigenvalues in his model suggests that the shift in the mode of operation from a stabilizing push–pull system to an unstable push system occurs due to stock-outs in upstream inventories.

31. Mojtahedzadeh does not explain the nature of the water resource, nor does he explain the units of water consumption. To make hydrologic sense of his example, one might assume that the industrial structures pump their water from an isolated aquifer with no recharge. (There is an unnamed variable that limits the available water to 10 percent of the stock, so we might envision that only 10 percent of the aquifer is amenable to pumping.) As long as the 10 percent provides one year of coverage, demand will be satisfied and investment will proceed at the normal rate. However, if the coverage were to fall to 0.75 years, for example, only 75 percent of the demand would be satisfied and investment would be 75 percent of the normal amount. This interpretation of the model eliminates the need for the two nonlinear relationships in the Mojtahedzadeh model while giving the same results.

32. David Ford (1999, p. 29) found a somewhat similar result when comparing his “behavioral approach” to loop dominance with the automated analytical approach by Kampmann. Ford reported that the two methods “differed somewhat in when shifts in loop dominance occur.” However, those differences appear to be small compared to the differences described in Appendix B of this paper.
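As a companion to note 20, the standalone sketch below computes both simple correlation coefficients (CCs) and partial correlation coefficients (PCCs) for a small synthetic example and compares which inputs emerge as important. The three-input “model” and every name in the code are illustrative assumptions, not output from the models discussed in this paper; the point is only the mechanics of the comparison, which could equally be applied to the exported sensitivity runs.

    # A sketch of the check described in note 20: compute both the simple
    # correlation coefficients (CCs) and the partial correlation coefficients
    # (PCCs) of sampled inputs against an output and ask whether the same
    # inputs emerge as important.  The "model" is a synthetic stand-in.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 50
    X = rng.uniform(0.8, 1.2, size=(n, 3))                      # three uncertain inputs
    y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.2, n)   # third input has no effect

    def partial_cc(X, y, j):
        """Correlate the parts of X[:, j] and y not explained (linearly) by the
        other inputs; this is the usual definition of a partial correlation."""
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(len(y))])
        rx = X[:, j] - others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        ry = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
        return np.corrcoef(rx, ry)[0, 1]

    for j in range(3):
        cc = np.corrcoef(X[:, j], y)[0, 1]
        print(f"input {j}:  CC = {cc: .2f}   PCC = {partial_cc(X, y, j): .2f}")

Because the inputs are sampled independently, the two measures single out the same inputs, which is precisely the situation in which the simple CC is an adequate screening statistic.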