Estimating Demand Uncertainty Using Judgmental Forecasts

MANUFACTURING & SERVICE OPERATIONS MANAGEMENT
Vol. 9, No. 4, Fall 2007, pp. 480–491
ISSN 1523-4614 | EISSN 1526-5498 | 07 | 0904 | 0480
doi 10.1287/msom.1060.0134
© 2007 INFORMS

Vishal Gaur

Leonard N. Stern School of Business, New York University, 44 West Fourth Street, New York, New York 10012, [email protected]

Saravanan Kesavan, Ananth Raman

Harvard Business School, Morgan Hall, Soldiers Field, Boston, Massachusetts 02163 {[email protected], [email protected]}

Marshall L. Fisher

The Wharton School, University of Pennsylvania, Jon M. Huntsman Hall, 3730 Walnut Street, Philadelphia, Pennsylvania 19104-6366, [email protected]

Measuring demand uncertainty is a key activity in supply chain planning, but it is difficult when demand history is unavailable, such as for new products. One method that can be applied in such cases uses dispersion among forecasting experts as a measure of demand uncertainty. This paper provides a test for this method and presents a heteroscedastic regression model for estimating the variance of demand using dispersion among experts' forecasts and scale. We test this methodology using three data sets: demand data at item level, sales data at firm level for retailers, and sales data at firm level for manufacturers. We show that the variance of a random variable (demand and sales for our data sets) is positively correlated with both dispersion among experts' forecasts and scale: The variance increases sublinearly with dispersion and more than linearly with scale. Further, we use longitudinal data sets with sales forecasts made three to nine months before the earnings report date for retailers and manufacturers to show that the effects of dispersion and scale on variance of forecast error are consistent over time.

Key words: demand uncertainty; forecast dispersion; heteroscedasticity
History: Received: April 18, 2005; accepted: July 26, 2006.

1. Introduction

Measuring demand uncertainty is a key activity in supply chain planning. When historical demand data are available, numerous statistical techniques can be applied to estimate demand uncertainty. However, in the absence of historical demand data, as in the case of new product introductions and short life-cycle seasonal items, other methods and data sets must be used. For example, Moe and Fader (2001, 2002) forecast the demand distribution for new products using historical sales data of other products and advance purchase orders for new products. Another method to estimate demand uncertainty uses judgmental forecasts from a panel of experts. Experts on the panel assign point forecasts of demand for a given product, and the standard deviation of these forecasts, which is known as the dispersion among experts, can be used to estimate the standard deviation of demand.

One of the earliest motivations for this method is suggested by the Delphi system of forecasting (Dalkey and Helmer 1963). More recently, several researchers have used the dispersion of expert forecasts as a proxy for standard deviation in different contexts; see, for example, Ajinkya and Gift (1985) and MacCormack and Verganti (2003). However, this relationship has not been empirically validated thus far. Further, while the above papers use the dispersion of expert forecasts as a proxy for the standard deviation of a random variable, statistical methods to estimate standard deviation from dispersion among experts and other variables, such as scale, have not been studied. In this paper, we test the hypothesis that the variance of demand is positively correlated with dispersion among experts' forecasts and provide a general methodology for using dispersion among experts' forecasts and scale to estimate the variance of demand.¹ We define scale as the mean forecast of demand and employ it in our model because the variance of demand is commonly known to increase with mean demand.

The principal challenge in testing our hypothesis is that the variance of demand is not known; it must be estimated from only a single realization of its value. We first propose a multiplicative functional form to represent the relationship between the variance of demand and dispersion among experts. We then construct a heteroscedastic regression model to test our hypothesis against a null hypothesis of no relationship.

We apply this methodology to three data sets. The first, obtained from Sport Obermeyer, an Aspen, Colorado-based ski-wear manufacturer, records demand forecasts and actual demand for 248 items at the style-color level. The second data set, obtained from the Institutional Brokers Estimate System (IBES), records experts' forecasts of annual sales and realized annual sales for publicly listed U.S. retailers for the years 1997–2004. We consider forecasts made at least 90 days prior to the announcement of actual sales and partition our data into seven time buckets, depending on the time difference between the forecast date and the earnings report date (when sales are revealed). We estimate our model for each of these time buckets and thus test the correlation between dispersion among experts and variance of sales over different time frames. The third data set, also obtained from IBES, pertains to public U.S. manufacturers and is analyzed in a similar way to the second data set.

Our empirical results show that the variance of forecast error has a statistically significant positive correlation with dispersion among experts for all data sets. Testing this correlation using both nonparametric and parametric methods yields consistent results. We control for bias in forecasts as well as the effect of scale.
As expected, scale is found to be statistically significant. Dispersion among experts' forecasts has a statistically significant positive correlation with variance of forecast error even after controlling for scale. We obtain 15 estimates of the exponents of dispersion and scale across the three data sets and different time buckets. The average values of the exponents are 0.466 and 1.493, respectively.

¹ To maintain conformity with the previous literature, we use both the terms standard deviation of demand and variance of demand throughout this paper. Note that statistical tests of variance are equivalent to statistical tests of standard deviation. We also use the term variance of demand interchangeably with variance of forecast error. Finally, we note that we use both demand and sales data in our analysis. When referring to the data sets together, we shall use the term demand interchangeably with sales.

Our paper is most closely related to Fisher and Raman (1996). Using data from Sport Obermeyer, Fisher and Raman propose using dispersion among experts' forecasts to estimate the standard deviation of demand. They use the heuristic rule s_i = 1.75 Z_i, where s_i is the estimated standard deviation of demand for item i, Z_i is the standard deviation of experts' forecasts for item i, and 1.75 is a parameter estimated from historical data for previous seasons. Fisher and Raman do not test the validity of this model statistically and do not show how to combine dispersion and other variables such as scale in a model for estimating demand uncertainty. Our paper addresses these questions and further shows that the parameter values hypothesized by Fisher and Raman are rejected by a likelihood ratio test (p < 0.01). Further, their method leads to underestimating the standard deviation of demand, and this error increases with scale.

Previous researchers in operations management have proposed using scale as a variable to estimate the standard deviation of demand. For example, Silver et al. (1998) describe the model σ = c1 · a^c2, where σ denotes the standard deviation of demand, a is the mean demand, and c1 and c2 are parameters. Our paper provides a methodology that can be used to estimate the parameters in this model. Further, this methodology can also be integrated with a causal forecasting model by including explanatory variables such as price, promotion, competing offerings, etc.
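To make the Silver et al. (1998) form concrete, the sketch below fits the exponent c2 in σ = c1 · a^c2 by a log-log least-squares regression. The data, the true parameter values, and the use of absolute errors as one-point proxies for σ are illustrative assumptions, not the authors' procedure; note that the intercept of such a regression is biased, while the exponent is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (synthetic data): estimate c2 in sigma = c1 * a^c2
# by regressing log|forecast error| on log(scale). Absolute errors serve
# as single-point proxies for sigma.
n = 2000
a = rng.uniform(50.0, 2000.0, size=n)        # mean demand (scale)
sigma = 2.0 * a**0.7                          # assumed true relationship, c2 = 0.7
demand = rng.normal(loc=a, scale=sigma)       # one realization per item

abs_err = np.abs(demand - a)                  # |forecast error|
X = np.column_stack([np.ones(n), np.log(a)])
coef, *_ = np.linalg.lstsq(X, np.log(abs_err), rcond=None)
c2_hat = coef[1]                              # exponent estimate
# The intercept is biased (E[log|N(0,1)|] != 0), but the slope c2_hat is a
# consistent estimate of the exponent c2.
print(round(c2_hat, 2))
```

With enough items, the recovered exponent is close to the assumed 0.7; the design choice here (log-log OLS) is only one of the estimators the paper's methodology subsumes.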
Several research papers besides Fisher and Raman (1996) suggest a relationship between the level of experts' agreement and the degree of uncertainty of the variable they are trying to predict. The Delphi system of forecasting, for example, is used widely to obtain consensus opinions by reducing divergence among experts' views. Dalkey and Helmer (1963) note that the Delphi method often leaves some residual disagreement, which they interpret as a measure of the uncertainty of the variable being forecasted. Similarly, numerous papers discussing implementations of Delphi methodology state that it is not possible to obtain a consensus opinion due to various uncertainties faced by the forecasters (Milkovich et al. 1972, Munier and Ronde 2001). In a recent application of the Delphi method to new product development, MacCormack and Verganti (2003) use the residual variation in experts' opinions (i.e., the variation that cannot be removed through the Delphi process) as a proxy for uncertainty around customers' requirements for the product.

Given the widespread use of judgmental forecasts in practice, there is a rich literature in experimental psychology on eliciting the standard deviation of a random variable from experts. Soll and Klayman (2004), for example, show experts to be grossly overconfident when estimating the confidence interval of a random variable. Our work augments this literature in that our methodology requires experts to provide only point estimates of the mean, while the papers in experimental psychology have sought to elicit confidence intervals from experts.

As in operations management, experts, referred to as analysts in the finance literature, are employed to forecast various financial measures for firms, and the characteristics of analysts' forecasts, such as herding, bias, overreaction, and underreaction, have been studied in several papers employing IBES data; see Givoly and Lakonishok (1984) for an early survey and Brown (2000) for a recent annotated bibliography. Givoly (1985) states that analysts' forecasts of earnings are rational because they seem to incorporate all previous information contained in past earnings and their past forecasts. Malkiel (1982) and Ajinkya and Gift (1985) use dispersion among analysts' earnings forecasts as a proxy for variance of earnings and find that it is a better measure of risk than historical beta.
Butler and Lang (1991) show that analysts are persistently optimistic or pessimistic relative to consensus forecasts, but that there are no statistically significant differences in accuracy across analysts. De Bondt and Forbes (1999) define herding among analysts as excessive agreement, that is, a surprising degree of consensus relative to the predictability of earnings, and test for its presence in IBES data for UK companies.

The rest of the paper is organized as follows. The main hypothesis of the paper is presented in §2. Section 3 presents the model used for empirical analysis. The data are described in §4. Results are discussed in §5, and §6 concludes the paper.

2. Hypothesis

The uncertainty of demand is often the result of many complex underlying processes that cannot be completely identified. This complexity also leads experts to use different information sets or forecasting models that capture different aspects of the information available for prediction. Hence, experts could offer different forecasts for the same random variable. More important, the same complexity that drives the uncertainty of demand also increases the level of disagreement among experts. Therefore, we hypothesize that the uncertainty of demand is positively correlated with the extent to which forecasting experts differ from each other. For example, suppose that the demand for a product depends on several factors, such as consumer tastes, weather patterns, fluctuations in the economy, alterations to the product as it evolves from design to production, etc. If some of these factors are uncertain, and experts have different views about them or each expert evaluates only a subset of the factors, then the forecasts given by experts will differ. This reasoning is consistent with the research literature in psychology and decision sciences. Experts are known to differ in their forecasts of a random variable. In fact, Hogarth (1978) suggests that experts who are not strongly correlated with each other should be selected for forecasting, and that the optimal number of experts a decision maker should consult is 8–12. Other researchers have sought to explain why experts differ in their forecasts. Summers et al. (2004) discuss how experts differ in terms of their functioning and experience; Armstrong (1986) states that experts differ when they make adjustments for recent events whose effects have not yet been observed; Flugstad and Windschitl (2003) show that even numerical estimates of probability can be interpreted in different ways depending on the reasons provided for arriving at the estimate.
Our hypothesis can also be explained using the research of Simon (1976), who notes that human beings are bounded in their rationality, so complexity in decision making causes them to use heuristics to make satisficing decisions. As the underlying complexity that drives the uncertainty of a random variable increases, experts need to rely increasingly on heuristics. Thus, the use of different heuristics by experts would lead to heterogeneity in their decisions. As a result, the correlation between the uncertainty of a random variable and the degree of disagreement among experts would be explained by the complexity of the situation.

The demand data from Sport Obermeyer illustrate that experts tend to be more accurate when they agree than when they do not. Figure 1 illustrates the relationship between forecast error and dispersion among experts' forecasts, controlling for scale. Here, the forecast error of demand is defined as the absolute value of the difference between the realized value of demand and the experts' mean forecast; dispersion among experts' forecasts is defined as the standard deviation of the point forecasts of the different experts. We control for scale by dividing both the forecast error of demand and the dispersion among experts' forecasts by the experts' mean forecast. For clarity, data points have been combined into groups of four. (The data are first sorted on dispersion among experts' forecasts and then combined into groups of four.) The figure shows a positive correlation (ρ = 0.356, p < 0.01) between dispersion and forecast error.

[Figure 1: Relationship Between Forecast Error and Dispersion Among Experts' Forecasts for the Sport Obermeyer Data Set. Scatter plot of forecast error/mean forecast (y-axis, 0 to 1.4) against dispersion among experts' forecasts/mean forecast (x-axis, 0 to 0.9).]

Motivated by this observation, we set up the following hypothesis:

Hypothesis 1. The variance of forecast error is positively correlated with the dispersion among experts' forecasts.

We test this hypothesis against the null hypothesis that the variance of forecast error is uncorrelated with the dispersion among experts' forecasts.
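The construction behind Figure 1 can be mimicked on synthetic data (the panel size, noise model, and parameter values below are assumptions, not the Sport Obermeyer data): when expert disagreement tracks true uncertainty, normalized forecast error and normalized dispersion are positively rank-correlated.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Synthetic illustration: each expert's forecast is the true mean plus
# noise proportional to the item's (unobserved) demand uncertainty.
n_items, n_experts = 248, 7
mean_demand = rng.uniform(60.0, 1200.0, size=n_items)
sigma = 0.4 * mean_demand * rng.uniform(0.2, 1.0, size=n_items)

forecasts = mean_demand[:, None] + rng.normal(0.0, sigma[:, None], size=(n_items, n_experts))
xbar = forecasts.mean(axis=1)                  # experts' mean forecast
dispersion = forecasts.std(axis=1, ddof=1)     # dispersion Z_i across experts
demand = rng.normal(mean_demand, sigma)        # realized demand

err_norm = np.abs(demand - xbar) / xbar        # forecast error / mean forecast
disp_norm = dispersion / xbar                  # dispersion / mean forecast
rho, pval = spearmanr(disp_norm, err_norm)     # rank correlation, as in the paper
print(round(rho, 2), pval < 0.01)
```

The rank correlation comes out clearly positive, qualitatively matching the ρ = 0.356 reported for the actual data.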

We control for scale in testing our hypothesis because both the standard deviation of forecast error and the dispersion among experts' forecasts could be increasing in scale. Thus, it is possible to observe a positive correlation between the standard deviation of forecast error and the dispersion among experts' forecasts just because both variables are correlated with scale (omitted variable bias). Using scale as a control variable in the model avoids this bias. We also control for the presence of bias in forecasts. Biased forecasts produce nonzero forecast errors (asymptotically) that affect the computation of the standard deviation of forecast error; i.e., it is no longer valid to use the absolute value of forecast error as a single point estimate of the standard deviation of forecast error.
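A small simulation illustrates the omitted-variable concern (all numbers are hypothetical): dispersion below carries no information about uncertainty beyond scale, yet raw error and raw dispersion still correlate strongly; dividing both by scale removes the spurious relationship.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Sketch of omitted variable bias: here uncertainty is driven by scale
# alone, and dispersion is unrelated to uncertainty given scale.
n = 1000
scale = rng.uniform(100.0, 10_000.0, size=n)
sigma = 0.05 * scale                                   # uncertainty grows with scale only
abs_err = np.abs(rng.normal(0.0, sigma))               # |forecast error|
dispersion = 0.03 * scale * rng.uniform(0.5, 1.5, n)   # independent of sigma given scale

raw, _ = spearmanr(abs_err, dispersion)                # spurious positive correlation
ctrl, _ = spearmanr(abs_err / scale, dispersion / scale)  # near zero after controlling
print(round(raw, 2), round(ctrl, 2))
```

The raw correlation is strongly positive even though dispersion is uninformative, while the scale-controlled correlation is indistinguishable from zero, which is exactly why scale enters the model as a control.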

3. Model

In this section we develop an empirical model to test the proposed hypothesis. We express the model in general terms because it could be applied to forecasts not only for demand or sales (as we do in this paper), but also for other random variables for which experts' assessments are used.

Let i index the set of random variables. Each random variable is forecast by several experts, and then its realization is observed. Thus, the ith observation includes the experts' forecasts for and realization of the ith random variable. For example, each observation might refer to the demand realization for a different product. We do not require the set of experts to be the same for all random variables.

Let X_ij be the forecast for random variable i by expert j, and let X̄_i be the mean of the X_ij across experts. Let Z_i denote the dispersion among experts' forecasts,

    Z_i = [ Σ_j (X_ij − X̄_i)² / (n_i − 1) ]^(1/2),

where n_i is the number of experts used for random variable i. Let Y_i denote the actual realization of random variable i, σ_i denote its standard deviation, and S_i denote the scale factor. We set S_i = X̄_i in our analysis.

We use a linear regression model to predict the random variable Y_i. We assume that all experts are assigned equal weights in computing the forecast, so


the only explanatory variable used to predict Y_i is the mean forecast, X̄_i. However, because the mean forecast might be biased, we introduce intercept and slope parameters in the regression model. Thus, we have

    Y_i = α + β X̄_i + ε_i,   ε_i ~ N(0, σ_i²),   (1)

where the ε_i are independent normally distributed error terms with mean 0 and variance σ_i². Here, we use the subscript i for σ_i² to model our hypothesis that the variance of the error term differs across observations. Hypothesis 1 implies that σ_i² is positively correlated with Z_i. We describe nonparametric and parametric tests of our hypothesis in §§3.1 and 3.2.

3.1. Nonparametric Tests
We perform the Goldfeld-Quandt test for heteroscedasticity (Greene 2003). To execute this test we rank the observations based on dispersion among experts' forecasts. The data are then divided into two equal groups of high and low dispersion, and the hypothesis is tested by comparing the estimates of standard error obtained from the regressions on these two groups.

We also perform the Spearman's rank correlation test to estimate the correlation between forecast error and dispersion among experts' forecasts. In this test, we first conduct an ordinary least squares regression of (1) using all observations. We define the absolute values of the residuals as forecast errors. We rank the forecast error for each observation and the corresponding dispersion among experts' forecasts and compute the correlation between the rankings. Under the null hypothesis of homoscedastic errors, this correlation should be zero. However, under the alternative hypothesis, residuals become stochastically more variable as the dispersion among experts' forecasts increases. Hence, the forecast error will be stochastically increasing in the dispersion among experts. Therefore, we expect to observe a positive correlation between forecast error and dispersion among experts' forecasts under the alternative hypothesis. While the conclusions drawn from these tests are robust to erroneous assumptions that might otherwise be made in parametric tests, their weaknesses are that they do not yield a method for estimating σ_i from Z_i and that they do not control for scale.

3.2. Parametric Tests
We consider the following heteroscedastic regression model:

    Y_i = α + β X̄_i + ε_i,   (2)
    ε_i ~ N(0, σ_i²),   (3)
    σ_i² = e^θ1 · Z_i^b1 · S_i^b2 + e^θ2,   (4)

where α, β, b1, b2, θ1, and θ2 are unknown parameters; ε_i denotes the forecast error; and the variance of the forecast error is modeled as a function of dispersion and scale. The hypothesis that the variance of Y_i is proportional to the dispersion among experts implies that b1 > 0.

The functional form (4) is well suited to represent uncertainty in our case for several reasons. First, in application to demand forecasting, it is common to model the variance of demand as having a multiplicative relationship with mean demand (see, for example, Silver et al. 1998, p. 126). Equation (4) is a natural extension of this model to incorporate dispersion. We use the form Z_i^b1 S_i^b2 to allow for the possibility of interaction between dispersion and scale. A pure multiplicative model has the drawback that observations having zero dispersion cannot be included in the estimation when the multiplicative model is transformed using logarithms. We avoid this possibility by adding the constant e^θ2 to (4).² Second, the functional form (4) admits many variations that can be estimated easily and have attractive large-sample properties. We tested a pure multiplicative model of variance as well as an additive model. Both of these models provided statistically significant support for our hypothesis. The specifications of these models are provided in the appendix.

² This modification is based on suggestions from an anonymous referee.

We estimate model (2)-(4) by maximum likelihood estimation. Assuming that the ε_i are normally distributed, the log-likelihood function is given by

    ln L = −(n/2) ln(2π) − (1/2) Σ_i [ ln σ_i² + e_i²/σ_i² ],   (5)

where the e_i are the residuals from (2) and n is the total number of observations. We use unconstrained nonlinear optimization in the statistical software SAS/IML to maximize (5) with respect to α, β, b1, b2, θ1, and θ2. On testing with various starting values, we obtained unique local maxima for all our data sets. These maximum likelihood estimators satisfy the regularity conditions required for asymptotic normality and efficiency (see Greene 2003, Theorem 4.18). Hence, the standard errors of the estimators are given by the inverse of the asymptotic Hessian matrix. We compute this inverse to test our hypotheses.

We note that independence of the ε_i is not necessary for testing our methodology or employing it to estimate the variance of demand. Two types of correlations may be possible in our data sets: correlation across observations and correlation among experts for an observation. Correlation across observations can arise from the same experts forecasting for more than one product (or firm) or from the presence of persistent biases in experts' forecasts. This correlation can be modeled using an autocorrelated variance-covariance matrix for ε_i. In the presence of simple autocorrelation models such as AR(p), the estimates of α and β are unbiased and consistent (Greene 2003, Theorem 11.2). Hence, the computed residuals can be used to obtain estimates of b1, b2, θ1, and θ2 that have the same limiting distributions as would be obtained with efficient estimators of α and β. Therefore, even though the ordinary least squares estimates of α and β may be inefficient in the presence of autocorrelation, resulting in inflated standard errors for those estimates, there is negligible effect on our hypothesis tests involving b1, b2, θ1, and θ2.

Correlation among experts for an observation can arise due to herding behavior, use of common information, or differences in power and reputation. Such correlation would imply that dispersion among experts does not fully explain the variance of forecast error. Thus, it can be modeled by allowing a portion of σ_i² to be independent of Z_i. Our model allows for this possibility because the portion of uncertainty not explained by dispersion among experts is captured by e^θ2.
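A minimal sketch of this estimation in Python (the authors used SAS/IML; the synthetic data and scipy's Nelder-Mead optimizer here are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Sketch of the estimation in (2)-(4): maximize the normal log-likelihood
# with variance sigma_i^2 = exp(th1) * Z^b1 * S^b2 + exp(th2).
n = 500
S = rng.uniform(50.0, 500.0, size=n)                   # scale (mean forecast)
Z = 0.1 * S * rng.uniform(0.5, 1.5, size=n)            # dispersion among experts
sigma2 = np.exp(-1.0) * Z**0.5 * S**1.5 + np.exp(1.0)  # assumed true variance
Y = 5.0 + 1.0 * S + rng.normal(0.0, np.sqrt(sigma2))   # realization (alpha=5, beta=1)

def neg_log_lik(p):
    alpha, beta, b1, b2, th1, th2 = p
    e = Y - alpha - beta * S                           # residuals from (2)
    v = np.exp(th1) * Z**b1 * S**b2 + np.exp(th2)      # variance model (4)
    return 0.5 * np.sum(np.log(2 * np.pi * v) + e**2 / v)

res = minimize(neg_log_lik, x0=[0.0, 1.0, 1.0, 1.0, 0.0, 0.0],
               method="Nelder-Mead", options={"maxiter": 20000})
alpha, beta, b1, b2, th1, th2 = res.x
print(round(beta, 2))
```

In practice one would also recover standard errors from the inverse Hessian, as the paper does; a derivative-free optimizer is used above only to keep the sketch short.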
Our model differs slightly from heteroscedastic regression models used in the econometrics literature, because it contains both a multiplicative function and an additive term in the computation of σ_i². The heteroscedasticity models studied previously include multiplicative heteroscedasticity (Harvey 1976), as well as additive, power function, exponential, and groupwise forms. Judge et al. (1985) and Greene (2003) provide useful references and detailed descriptions of these models.
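The Goldfeld-Quandt procedure of §3.1 can be sketched as follows on synthetic data (the data-generating process and sample sizes are illustrative assumptions):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(4)

# Goldfeld-Quandt sketch: sort observations by dispersion, fit OLS on the
# low and high halves separately, and compare residual variances by F-test.
n = 400
S = rng.uniform(50.0, 500.0, size=n)               # mean forecast (scale)
Z = rng.uniform(1.0, 50.0, size=n)                  # dispersion among experts
Y = 2.0 + 1.0 * S + rng.normal(0.0, Z)              # error variance grows with Z

order = np.argsort(Z)
halves = [order[: n // 2], order[n // 2:]]          # low- and high-dispersion groups
sse = []
for idx in halves:
    X = np.column_stack([np.ones(len(idx)), S[idx]])
    beta, *_ = np.linalg.lstsq(X, Y[idx], rcond=None)
    resid = Y[idx] - X @ beta
    sse.append(resid @ resid)

dof = n // 2 - 2                                    # observations minus parameters
F = (sse[1] / dof) / (sse[0] / dof)                 # high- vs. low-dispersion variance
p_value = f_dist.sf(F, dof, dof)
print(F > 1, p_value < 0.01)
```

Under homoscedasticity F would be near 1; here the high-dispersion half has much larger residual variance, so the null is rejected, mirroring the paper's result.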

4. Data Description

We use item-level demand data from a skiwear manufacturer, firm-level sales data for retail firms, and firm-level sales data for manufacturing firms to test our hypothesis.

The first data set, called the Sport Obermeyer data set, contains style-color level forecasts and final demand information for 248 short life-cycle items for a selling season of about three months. The 248 items span 56 unique styles and about four colors per style. Both the forecasts and the actual demand are measured in number of units of each item. The forecasts were made by members of a committee specifically constituted to forecast demand. This committee consisted of seven members: the president, a vice president, two designers, and the managers of marketing, production, and customer service (Fisher and Raman 1996). The committee members were expected to forecast independently without sharing any information among themselves. Raman et al. (2001) describe this forecasting process in detail. Table 1 presents summary statistics for the key variables relevant to our study. The average dispersion among experts is 37.6% of the mean forecast, and the average forecast error is 60.6% of forecasts, both values being of the same order of magnitude.

Table 1: Summary Statistics for Demand Data from Sport Obermeyer

    Variable                                               Mean   Std. dev.   Minimum    Maximum
    Actual sales, Y_i (units of item)                    382.15      348.29      4.00   1,720.00
    Average forecast (scale), X̄_i (units of item)        367.45      214.71     60.00   1,151.67
    Coefficient of variation of forecasts, Z_i/X̄_i (%)     37.6        14.4       5.6       95.0
    Absolute percentage error, |Y_i − X̄_i|/X̄_i (%)         60.6        50.9       0.4      380.0

    Note. Number of items = 248; number of experts who forecast for each item = 7.

The second and third data sets, which we call the retail sales data set and the manufacturing sales data set, respectively, are taken from IBES, which provides analysts' estimates of various financial measures of U.S. and international companies. IBES obtains data from multiple sources such as newswires (Bloomberg, Reuters, etc.), newspapers (Financial Times, local financial newspapers), and the contributing brokers. This database is widely used for research in finance and accounting, and hundreds of articles using this data set have been published in peer-reviewed journals; see Brown (2000) for references. IBES includes financial measures such as sales, earnings per share (EPS), book value per share, and cash flow per share for public firms registered on NYSE, AMEX, and NASDAQ for each fiscal year and quarter. Typically, analysts forecast these financial measures up to three years ahead. IBES contains two data sets, a detail history data set and a summary history data set. The latter contains summary statistics such as the mean and standard deviation of the analysts' estimates available each month for the above-mentioned financial measures. Analysts are expected to update IBES with any revisions to their forecasts, and those who do not issue revised forecasts are presumed by IBES to stick to their previous forecasts. We merge the summary history data set with the detail history data set, which contains actual realizations of the financial measures. The data are obtained using the Wharton Research Data Services.

We use forecasts of annual sales ($ million) of firms for the period 1997–2004 for our analysis (data prior to 1997 contain forecasts of earnings but not of sales). We consider forecasts that are made 90 days or more before the earnings report date (when sales are revealed) for each firm-year combination. We classify these data based on the time difference between the forecast date and the earnings report date into seven buckets, ranging from three months prior to the earnings report date to nine or more months prior to the earnings report date. Thus, each observation in the data set is a firm-month-year combination. IBES uses a proprietary scheme to classify firms based on business lines into sector/industry/group (SIG) codes.
We identified SIG codes that contain firms in the retail and manufacturing sectors, as shown in Tables 2(a) and 2(b), respectively. There are 13 SIG codes mapped to the retail sales data set and 140 SIG codes mapped to the manufacturing sales data set. Table 3(a) summarizes the final retail sales data set. It consists of 603–605 observations in each time bucket across 217 firms for the years 1998–2003. Not all firms have data available for each year. However,

Table 2a: Mapping of SIG Codes to Retail Data Set

    SIG codes                  Sector name         Industry name      Examples of group names
    40301, 40302               Consumer services   Retailing, foods   Grocery chains, food wholesalers
    40401–40406, 40408–40413   Consumer services   Retailing, goods   Department stores, specialty retailers,
                                                                      variety chains, drug chains,
                                                                      electronics chains, etc.

Table 2b: Sample Mapping of SIG Codes to Manufacturing Data Set

    SIG codes                  Sector name            Examples of industry and group names
    30001–39901                Consumer nondurables   Apparel, shoes, cosmetics, packaged foods,
                                                      beverages, toys and games, etc.
    50101–59901                Consumer durables      Auto, auto parts, home furnishings,
                                                      tools and hardware, etc.
    80101–80302, 80701–81102   Technology             Computer manufacturers, semiconductors,
                                                      electronic devices, photo-optical equipment, etc.
    90201–90501, 90701–91201   Basic industries       Chemical, containers, metal fabrication,
                                                      fertilizers, steel, textile, etc.
    100101–109901              Capital goods          Defence, auto OEM, electrical, machinery,
                                                      building material, etc.

    Note. OEM, original equipment manufacturers.

the same set of firms is represented in each time bucket. In other words, we consider only those firm-year combinations for which we have forecasts available up to nine months prior to the earnings report date. If a firm had forecasts going back less than nine months from the earnings report date for a particular year, then that firm-year was dropped from the data set to ensure comparability of results across time buckets. The number of experts in each observation ranges between 2 and 27, with an average of 5.1. The remaining rows of the table provide summary statistics for all the variables. The average dispersion among experts is 2.88% of forecasts, and average forecast error is 4.80% of forecasts. Table 3(b) summarizes the manufacturing sales data set. Here, we have 1,164–1,171 observations in each time bucket across 479 firms for the same set of years. As with the retail sales data set, not all firms have data available in each year. Further, only those firms were considered that had forecasts available up to nine months or more prior to the earnings report date. The average number of experts in an observation is 3.75.


Table 3a    Summary Statistics for Retail Sales Data

                                                  Time difference between forecast date and earnings report date (months)
Variable                                 Statistic           3        4        5        6        7        8        9
Number of firms represented                                215      217      217      217      217      217      215
Number of observations                                     603      605      605      605      605      605      603
Number of experts per observation        Mean              5.3      5.2      5.2      5.2      5.0      4.9      4.9
                                         Std. deviation    3.8      3.8      3.8      3.7      3.5      3.5      3.3
                                         Minimum             2        2        2        2        2        2        2
                                         Maximum            24       24       27       26       25       25       22
Actual sales ($ million), Yi             Mean            7,761    7,737    7,737    7,737    7,737    7,737    7,737
                                         Std. deviation 21,328   21,296   21,296   21,296   21,296   21,296   21,296
                                         Minimum            69       69       69       69       69       69       69
                                         Maximum       258,681  258,681  258,681  258,681  258,681  258,681  258,681
Average forecast (scale, $ million), Xi  Mean            7,761    7,754    7,757    7,754    7,757    7,755    7,802
                                         Std. deviation 21,330   21,308   21,339   21,297   21,306   21,339   21,550
                                         Minimum            88       91       91       85       79       79       95
                                         Maximum       258,201  257,821  257,609  257,043  257,782  259,520  267,295
Coefficient of variation of              Mean              2.3      2.5      2.7      2.8      3.2      3.3      3.4
  forecasts, Zi/Xi (%)                   Std. deviation    3.7      3.2      3.6      3.7      4.6      5.6      5.6
                                         Minimum          0.02     0.04     0.06     0.02   0.0023   0.0013   0.0023
                                         Maximum          41.1     28.5     33.0     34.0     46.0     80.0     80.0
Absolute percentage error,               Mean             2.92     3.53     4.08     4.70     5.70     6.11     6.80
  |Yi − Xi|/Xi (%)                       Std. deviation   4.52     4.90     5.67     6.73     8.27     8.80     9.69
                                         Minimum         0.000    0.001    0.007    0.001    0.001    0.011   0.0001
                                         Maximum         47.53    49.40    49.40    65.78    80.90    83.30    83.30

The average dispersion among experts is 3.50% of forecasts, and the average forecast error is 5.84% of forecasts.

5. Results

This section presents our estimation results. We first note that the Goldfeld-Quandt test strongly rejects the null hypothesis of homoscedastic errors (p < 0.001) for all three data sets. The Spearman rank correlation between dispersion among experts and the absolute value of forecast error is 0.385 for the Sport Obermeyer data set, 0.504 for the retail sales data set, and 0.219 for the manufacturing sales data set. These correlations are also statistically significant (p < 0.0001). Hence, both nonparametric tests support our hypothesis that the variance of forecast error increases with dispersion among experts' forecasts. The rest of this section discusses the results of the parametric tests.

Tables 4, 5, and 6 present the coefficient estimates and corresponding standard errors for the three data sets. For the Sport Obermeyer data set, the estimated coefficient for dispersion among experts, β1, is 0.516 (statistically significant at p = 0.04), and the estimated coefficient for scale, β2, is 1.204 (statistically significant at p < 0.0001). For the remaining two data sets, both β1 and β2 are statistically significant at p < 0.0001. The estimate of β1 ranges between 0.287 and 1.216 for the retail sales data set and between 0.316 and 0.436 for the manufacturing sales data set. The estimate of β2 ranges between 1.203 and 1.622, and between 1.575 and 1.783, for the retail sales and the manufacturing sales data sets, respectively. These results consistently support our hypothesis that the variance of Yi is positively correlated with dispersion among experts' forecasts. Further, dispersion among experts' forecasts has significant explanatory power over and above scale. The estimates of β1 and β2 show that variance increases sublinearly with dispersion and more than linearly with mean demand in almost all cases.

It is useful to compare our coefficient estimates with assumptions made in previous research with respect to demand uncertainty. For example, Fisher and Raman (1996) and MacCormack and Verganti (2003) assume direct pro-


Table 3b    Summary Statistics for Manufacturing Sales Data

                                                  Time difference between forecast date and earnings report date (months)
Variable                                 Statistic           3        4        5        6        7        8        9
Number of firms represented                                479      479      479      478      479      479      479
Number of observations                                   1,165    1,165    1,165    1,167    1,164    1,165    1,171
Number of experts per observation        Mean             4.03     3.94     3.88     3.52     3.76     3.72     3.38
                                         Std. deviation   2.54     2.46     2.39     2.20     2.37     2.33     2.08
                                         Minimum             2        2        2        2        2        2        2
                                         Maximum            22       22       22       24       22       21       20
Actual sales ($ million), Yi             Mean            7,208    7,208    7,208    7,139    7,214    7,208    7,264
                                         Std. deviation 19,077   19,077   19,077   18,918   19,084   19,077   19,076
                                         Minimum          18.3     18.3     18.3     18.3     18.3     18.3     18.3
                                         Maximum       186,763  186,763  186,763  186,763  186,763  186,763  186,763
Average forecast (scale, $ million), Xi  Mean            7,001    7,039    7,049    6,964    7,061    7,050    7,062
                                         Std. deviation 17,593   17,801   17,808   17,546   17,857   17,852   17,822
                                         Minimum         17.40    16.73    16.73    16.73    16.73    16.73    16.73
                                         Maximum       164,174  164,101  186,763  163,999  168,862  168,058  167,798
Coefficient of variation of              Mean              3.1      3.2      3.3      3.5      3.7      3.8      3.9
  forecasts, Zi/Xi (%)                   Std. deviation    4.9      4.6      4.7      4.8      5.0      5.0      5.0
                                         Minimum        0.0064   0.0049   0.0049   0.0049     0.01     0.01     0.01
                                         Maximum            52       52       52       52       52       52       52
Absolute percentage error,               Mean             4.06     4.74     5.11     5.71     6.60     7.10     7.58
  |Yi − Xi|/Xi (%)                       Std. deviation   6.90     7.14     7.35     7.68     8.33     8.78     9.15
                                         Minimum        0.0001   0.0001   0.0007   0.0080   0.0080   0.0008   0.0007
                                         Maximum            79       79       79       82       82       82       79

portionality between the standard deviation of forecast error and dispersion among experts, i.e., β1 = 1, β2 = 0. A likelihood ratio test comparing the parameter estimates of Fisher and Raman with ours shows our estimates to be significantly better (p < 0.01). Figure 2 compares the estimates of standard deviation obtained from our model with those obtained by Fisher and Raman (1996) for the Sport Obermeyer data set. Notice that our model yields larger values of standard deviation than Fisher and Raman for most of the products. This implies that, in inventory plan-

Table 4    MLE Results for the Sport Obermeyer Data

Parameter    Estimate      Std. error
θ0           −17.25        19.665
θ1             1.09**       0.081
γ1             1.62         1.581
β1             0.516*       0.252
β2             1.204**      0.273
γ2            −3.02         74,647.828

*Statistically significant at 0.05. **Statistically significant at 0.01.

ning, their model would result in underestimating the safety stock requirement compared with our model. One reason our estimates of standard deviation differ from those given by Fisher and Raman is that our model includes scale while their model does not. Thus, we expect the difference between the models to be more pronounced for items with large mean demand. Silver et al. (1998) assumed that the standard deviation of forecast error is a power function of scale. Our results augment their model by incorporating dispersion and also by providing an estimate of the coefficient of scale. Finally, Tables 3(a) and 3(b) show that the coefficient of variation of forecasts decreases with time, going from 3.4% to 2.3% for the retail sales data set and from 3.9% to 3.1% for the manufacturing sales data set as the length of the forecast interval shrinks from nine months to three months. This decrease shows that experts agree with each other more and more closely as we approach the financial statements report date. Likewise, the coefficient of variation of sales shown in Tables 5 and 6 also decreases with time, from 10.2%


Table 5    MLE Results for the Retail Sales Data Set

                     Time difference between forecast date and earnings report date (months)
Parameter           3          4          5          6          7          8          9
θ0              1.756     −0.448      0.656      1.587      2.818      3.683      6.222
               (1.685)    (1.083)    (1.467)    (1.850)    (2.223)    (2.324)    (2.853)
θ1              1.001**    1.000**    0.998**    1.001**    1.000**    1.000**    0.999**
               (0.001)    (0.002)    (0.002)    (0.003)    (0.004)    (0.004)    (0.004)
γ1             −5.632**   −2.535**   −2.431**   −3.518**   −1.685**   −1.685**   −1.241**
               (0.589)    (0.387)    (0.419)    (0.430)    (0.387)    (0.387)    (0.402)
β1              1.216**    0.616**    0.736**    0.287**    0.301**    0.301**    0.345**
               (0.073)    (0.058)    (0.062)    (0.060)    (0.053)    (0.053)    (0.053)
β2              1.270**    1.267**    1.203**    1.622**    1.438**    1.438**    1.371**
               (0.086)    (0.067)    (0.072)    (0.074)    (0.065)    (0.065)    (0.066)
γ2              5.827**    2.947**    4.226**    4.578**    4.692**    4.692**    5.388**
               (0.125)    (0.517)    (0.356)    (0.383)    (0.514)    (0.514)    (0.457)
Average of [dispersion/mean forecast] (%)
                2.3        2.5        2.6        2.8        3.2        3.3        3.4
Average CV of sales (%)
                4.7        5.4        5.9        7.0        9.3        9.5       10.2

Notes. (i) **Statistically significant at 0.01. (ii) The numbers in parentheses below the parameter estimates are the respective standard errors. (iii) Average CV (coefficient of variation) of sales is computed using (4) as the average of (e^γ1 · Zi^β1 · Si^β2 + e^γ2)^0.5 / (θ0 + θ1 · Xi) across all observations.

to 4.7% for the retail sales data set and from 11.3% to 7.5% for the manufacturing sales data set. This evidence is consistent with assumptions in the new product development literature that demand uncertainty gets resolved with the passage of time (see, for example, Terwiesch and Loch 1999 and Cohen et al. 2003). However, as far as we know, there has been no empirical test of uncertainty resolution over time. A longitudinal analysis using our methodology provides the means to test whether uncertainty is resolved with

Table 6    MLE Results for the Manufacturing Sales Data Set

                     Time difference between forecast date and earnings report date (months)
Parameter           3          4          5          6          7          8          9
θ0             −0.400     −0.095     −0.070     −0.353     −0.826     −0.135      0.154
               (0.707)    (0.784)    (0.869)    (0.975)    (0.920)    (0.790)    (0.737)
θ1              1.008**    1.007**    1.006**    1.007**    1.009**    1.008**    1.008**
               (0.002)    (0.002)    (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
γ1             −5.114**   −4.216**   −4.093**   −3.738**   −3.230**   −2.576**   −2.474**
               (0.302)    (0.293)    (0.299)    (0.297)    (0.272)    (0.253)    (0.251)
β1              0.409**    0.436**    0.410**    0.415**    0.335**    0.316**    0.347**
               (0.035)    (0.038)    (0.039)    (0.038)    (0.037)    (0.036)    (0.037)
β2              1.783**    1.669**    1.676**    1.642**    1.647**    1.591**    1.575**
               (0.049)    (0.049)    (0.050)    (0.050)    (0.047)    (0.045)    (0.046)
γ2              3.389**    3.481**    3.760**    4.004**    3.301**    1.573      0.309
               (0.231)    (0.258)    (0.245)    (0.245)    (0.375)    (1.140)    (3.132)
Average of [dispersion/mean forecast] (%)
                3.1        3.2        3.3        3.5        3.8        3.8        3.9
Average CV of sales (%)
                7.5        8.0        8.6        9.1       10.2       11.0       11.4

Notes. (i) **Statistically significant at 0.01. (ii) The numbers in parentheses below the parameter estimates are the respective standard errors. (iii) Average CV (coefficient of variation) of sales is computed using (4) as the average of (e^γ1 · Zi^β1 · Si^β2 + e^γ2)^0.5 / (θ0 + θ1 · Xi) across all observations.
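The "Average CV of sales" row can be reproduced mechanically from fitted parameters. The sketch below follows our reading of the table notes; the symbol names (g1, b1, b2, g2, t0, t1 for γ1, β1, β2, γ2, θ0, θ1) are ours, and the observations and parameter values are illustrative, not the fitted ones.

```python
import math

def avg_cv_of_sales(obs, g1, b1, b2, g2, t0, t1):
    """Average coefficient of variation of sales implied by the variance model,
    per our reading of the table notes: for each observation,
    sigma_i = (e^g1 * Z_i^b1 * S_i^b2 + e^g2)**0.5 and CV_i = sigma_i / (t0 + t1*X_i)."""
    cvs = [(math.exp(g1) * z ** b1 * s ** b2 + math.exp(g2)) ** 0.5 / (t0 + t1 * x)
           for (z, s, x) in obs]
    return sum(cvs) / len(cvs)

# Toy observations (Z = dispersion, S = scale, X = mean forecast); parameters illustrative.
obs = [(2.0, 100.0, 100.0), (5.0, 500.0, 500.0)]
print(avg_cv_of_sales(obs, g1=-3.0, b1=0.4, b2=1.6, g2=0.0, t0=0.0, t1=1.0))
```

Because b1 > 0 and b2 > 0, the implied CV rises with both dispersion and scale, matching the signs of the fitted coefficients.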


Figure 2    Comparison of Results with Fisher and Raman (1996)

[Scatter plot omitted: x-axis, estimated standard deviation (Fisher and Raman 1996); y-axis, estimated standard deviation from model (4); both axes range from 0 to 800.]

Notes. The straight line in the graph represents a 45° slope line. The standard deviation estimated from model (4) lies above this line for 191 of 248 items.

time and to determine the rate at which uncertainty gets resolved.
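The flavor of the estimation behind these results can be conveyed with a simple two-step procedure on synthetic data. This is a sketch, not the paper's exact maximum likelihood estimation: we fit the mean equation by OLS and then regress the log squared residuals on log dispersion and log scale, in the spirit of Harvey (1976); all data and names below are ours.

```python
import math, random

def ols(X, y):
    """Least squares via normal equations (X includes an intercept column)."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for c in range(k):                       # Gaussian elimination with pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):           # back substitution
        beta[r] = (b[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

random.seed(1)
# Synthetic data: Y = X + noise whose variance grows with dispersion Z and scale S.
Z = [random.uniform(1, 10) for _ in range(2000)]
S = [random.uniform(50, 500) for _ in range(2000)]
X = S[:]  # mean forecast proportional to scale, for simplicity
Y = [x + random.gauss(0.0, (0.05 * z * s) ** 0.5) for x, z, s in zip(X, Z, S)]

# Step 1: OLS for the mean equation Y = t0 + t1*X.
t0, t1 = ols([[1.0, x] for x in X], Y)
resid = [y - t0 - t1 * x for y, x in zip(Y, X)]
# Step 2: regress ln(residual^2) on ln Z and ln S to estimate the variance elasticities.
g, b1, b2 = ols([[1.0, math.log(z), math.log(s)] for z, s in zip(Z, S)],
                [math.log(r * r) for r in resid])
print(round(t1, 3), round(b1, 2), round(b2, 2))  # t1 near 1; b1, b2 near 1 for this synthetic data
```

With this data-generating process, the recovered elasticities of variance with respect to dispersion and scale should both be near their true value of one, illustrating how dispersion and scale jointly explain heteroscedasticity.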

6. Conclusions

Knowledge of the variance of demand is required by many models of decision making under uncertainty, but estimating it remains a challenge. Our paper provides a method to estimate variance using dispersion among point forecasts obtained from multiple experts and managers. It also shows empirically that dispersion among experts' forecasts is a good measure of demand uncertainty and can be augmented with scale in an estimation model. Our findings are particularly useful for forecasting demand for new products because they do not have demand history. Besides the empirical evidence reported in this paper, we have tested our hypothesis using a data set containing forecasts of firms' earnings per share (EPS), obtained from the IBES database. Using about 25,000 observations across 18 years, from 1976 to 1993, we found that the standard deviation of forecast error was positively correlated with dispersion among experts in each year. However, we do not include the results of this analysis in the paper, because it is well known that EPS data can be managed by firms to match analysts' forecasts, making it harder to interpret the results in

this context. To our knowledge, no such drawbacks have been reported for sales data. Whereas we considered only two explanatory variables, dispersion and scale, it is possible to extend our model to a larger set of variables. Uncertainty of demand might be correlated with other factors such as firm-specific risk, market uncertainty, macroeconomic conditions, and product characteristics (e.g., basic versus fashion apparel). For example, given the mean forecast for an item and dispersion among experts, the variance of the forecast error may increase with the volatility of a market index. To a certain extent, dispersion among experts would capture uncertainties related to such factors, because experts should consider all available relevant information (Makridakis and Wheelwright 1987) in forecasting demand. However, our model can easily be augmented by including additional factors to estimate their effects on demand uncertainty.

Our study has some limitations that may be examined in future research. First, many aspects of analysts' behavior have been documented in the finance literature using data on forecasts of EPS. These include herding behavior, overreaction and underreaction to information, bias, cultural effects, etc. Such behavioral characteristics may be examined for sales forecasts. Second, we assume all analysts to be equally capable. Thus, we do not consider the effects of the composition of analysts' committees in deriving our results. Given suitable data, it would be easy to extend our methodology to analyze the quality of predictions made by different analysts and thus compare their capabilities. This would require forecasts of several random variables by the same set of analysts. Third, our model should be applied to other data sets, especially item-level sales data, to test its applicability in different situations.

Acknowledgments

The authors gratefully acknowledge James Zeitler, Research Database Analyst at Harvard Business School, for help in obtaining and preparing the data for this study. The authors are also thankful to Aleda Roth (Special Issue Guest Editor), the senior editor, two anonymous referees, Professor William Greene, and seminar participants at the 2004 POMS Annual Conference and Harvard Business School Supply Chain Reading Group and Work-in-Progress sessions for numerous helpful comments that led to significant improvements in this paper.


Appendix

In a previous version of this paper, we used the following model for σi² instead of (4):

σi² = σ² · Zi^β1 · Si^β2.    (6)

This model is practically appealing because it is more parsimonious than (4) and because it implies that ln(Zi) and ln(Si) can be used directly as proxies for ln(σi²) in empirical research without having to estimate σi² explicitly. However, as noted in §3.2, this model cannot be applied if Zi = 0 for some observations. Thus, we did not use this model in the main text of the paper. The estimates of β1 and β2 obtained from (6) are presented in the online appendix available on the Manufacturing & Service Operations Management website (http://msom.pubs.informs.org/ecompanion.html) and are consistent with those obtained from (4).

We also tested our hypothesis using the following additive model as an alternative to the multiplicative model (4):

Yi = a + b·Xi + εi,   εi ~ N(0, σi²),   σi = α + β1·Zi + β2·Xi.    (7)

We found that the estimates of both β1 and β2 are greater than zero for this model for all cases in our data sets (p < 0.01), again providing statistical support for our hypothesis. The maximum log-likelihood values for this model are also close to those for the multiplicative model. Thus, this model may be used as an alternative to model (4) in an implementation. However, we prefer (4) to (7), because (7) imposes constraints on the coefficients that lack economic interpretation. Specifically, expanding the quadratic expression for σi² yields six terms that are generated from only three parameters. We also find that the coefficient estimates for (7) vary across the three data sets much more than the coefficient estimates for (4). The online appendix presents the maximum likelihood estimates for (7) for all three data sets.
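The remark about the quadratic expansion can be checked numerically. The sketch below (function names are ours) verifies that squaring σi = α + β1·Zi + β2·Xi produces the six quadratic terms generated by only three parameters.

```python
# Numeric check of the appendix remark on model (7): squaring
# sigma_i = alpha + b1*Z_i + b2*X_i yields six quadratic terms
# (alpha^2, b1^2*Z^2, b2^2*X^2, and three cross terms) from three parameters.
def sigma2_direct(alpha, b1, b2, z, x):
    return (alpha + b1 * z + b2 * x) ** 2

def sigma2_expanded(alpha, b1, b2, z, x):
    return (alpha ** 2 + b1 ** 2 * z ** 2 + b2 ** 2 * x ** 2
            + 2 * alpha * b1 * z + 2 * alpha * b2 * x + 2 * b1 * b2 * z * x)

a, b1, b2, z, x = 0.5, 0.3, 0.02, 4.0, 120.0
print(abs(sigma2_direct(a, b1, b2, z, x) - sigma2_expanded(a, b1, b2, z, x)) < 1e-9)  # True
```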

References

Ajinkya, B. J., M. J. Gift. 1985. Dispersion of financial analysts' earnings forecasts and the (option model) implied standard deviations of stock returns. J. Finance 40(5) 1353–1365.
Armstrong, S. J. 1986. Research on forecasting: A quarter-century review, 1960–1984. Interfaces 16(1) 89–103.
Brown, L. D., ed. 2000. I/B/E/S Research Bibliography, 6th ed. I/B/E/S International Inc., New York.
Butler, K. C., L. H. P. Lang. 1991. The forecast accuracy of individual analysts: Evidence of systematic optimism and pessimism. J. Accounting Res. 29(1) 150–156.
Cohen, M. A., T. H. Ho, Z. J. Ren, C. Terwiesch. 2003. Measuring imputed cost in the semiconductor equipment supply chain. Management Sci. 49(12) 1653–1670.
Dalkey, N., O. Helmer. 1963. An experimental application of the Delphi method to the use of experts. Management Sci. 9(3) 458–467.
De Bondt, W. F. M., W. P. Forbes. 1999. Herding in analyst earnings forecasts: Evidence from the United Kingdom. Eur. Financial Management 5(2) 143–163.
Fisher, M. L., A. Raman. 1996. Reducing the cost of demand uncertainty through accurate response to early sales. Oper. Res. 44(1) 87–99.
Flugstad, A. R., P. D. Windschitl. 2003. The influence of reasons on interpretations of probability forecasts. J. Behav. Decision Making 16(2) 107–126.
Givoly, D. 1985. The formation of earnings expectations. Accounting Rev. 60(3) 372–386.
Givoly, D., J. Lakonishok. 1984. Properties of analysts' forecasts of earnings: A review and analysis of the research. J. Accounting Literature 3 117–152.
Greene, W. H. 2003. Econometric Analysis, 5th ed. Prentice Hall, Upper Saddle River, NJ.
Harvey, A. C. 1976. Estimating regression models with multiplicative heteroscedasticity. Econometrica 44(3) 461–465.
Hogarth, R. M. 1978. A note on aggregating opinions. Organ. Behav. Human Performance 21(1) 40–52.
Judge, G. G., W. E. Griffiths, R. Carter Hill, H. Lutkepohl, T. C. Lee. 1985. The Theory and Practice of Econometrics, 2nd ed. John Wiley & Sons, New York.
MacCormack, A., R. Verganti. 2003. Managing the sources of uncertainty: Matching process and context in software development. J. Product Innovation Management 20 217–232.
Makridakis, S., S. C. Wheelwright. 1987. The Handbook of Forecasting: A Manager's Guide, 2nd ed. John Wiley & Sons, New York.
Malkiel, B. G. 1982. Risk and return: A new look. NBER Working Paper W0700. http://ssrn.com/abstract=227532.
Milkovich, G. T., A. J. Annoni, T. A. Mahoney. 1972. The use of the Delphi procedures in manpower forecasting. Management Sci. 19(4) 381–388.
Moe, W. W., P. S. Fader. 2001. Modeling hedonic portfolio products: A joint segmentation analysis of music CD sales. J. Marketing Res. 38(3) 376–385.
Moe, W. W., P. S. Fader. 2002. Using advance purchase orders to forecast new product sales. Marketing Sci. 21(3) 347–364.
Munier, F., P. Ronde. 2001. The role of knowledge codification in the emergence of consensus under uncertainty: Empirical analysis and policy implications. Res. Policy 30(9) 1537–1551.
Raman, A., A. McClelland, M. L. Fisher. 2001. Supply chain management at World Co. Ltd. HBS Case 9-601-072, Harvard Business School, Boston, MA.
Silver, E. A., D. F. Pyke, R. Peterson. 1998. Inventory Management and Production Planning and Scheduling, 3rd ed. John Wiley & Sons, New York.
Simon, H. 1976. Administrative Behavior, 3rd ed. Free Press, New York.
Soll, J. B., J. Klayman. 2004. Overconfidence in interval estimates. J. Experiment. Psych. 30 299–314.
Summers, B., T. Williamson, D. Read. 2004. Does method of acquisition affect the quality of expert judgment? A comparison of education with on-the-job learning. J. Occupational Organ. Psych. 77(2) 237–258.
Terwiesch, C., C. H. Loch. 1999. Measuring the effectiveness of overlapping development activities. Management Sci. 45(4) 455–465.