International Journal of Research in Marketing 35 (2018) 394–414



Full Length Article

Estimating time-varying parameters in brand choice models: A semiparametric approach

Daniel Guhl a, Bernhard Baumgartner b, Thomas Kneib c, Winfried J. Steiner d,⁎

a Humboldt University Berlin, Institute of Marketing, School of Business and Economics, Spandauer Straße 1, 10178 Berlin, Germany
b University of Osnabrück, Department of Marketing, Rolandstraße 8, 49069 Osnabrück, Germany
c Georg-August-Universität Göttingen, Department of Statistics and Econometrics, Humboldtallee 3, 37073 Göttingen, Germany
d Clausthal University of Technology, Department of Marketing, Julius-Albert-Straße 2, 38678 Clausthal-Zellerfeld, Germany

Article info

Article history: First received on January 30, 2010 and was under review for 7 months. Available online 6 August 2018.

Guest Area Editor: Harald J. Van Heerde

Keywords: Brand choice modeling; Time-varying parameters; Heterogeneity; Semiparametric regression; P(enalized) splines

Abstract

Nowadays, brand choice models are standard tools in quantitative marketing. In most applications, parameters representing brand intercepts and covariate effects are assumed to be constant over time. However, marketing theories, as well as the experience of marketing practitioners, suggest the existence of trends or short-term variations in particular parameters. Hence, having constant parameters over time is a highly restrictive assumption, which is not necessarily justified in a marketing context and may lead to biased inferences and misleading managerial insights. In this paper, we develop flexible, heterogeneous multinomial logit models based on penalized splines to estimate time-varying parameters. The estimation procedure is fully data-driven, determining the flexible function estimates and the corresponding degree of smoothness in a unified approach. The model flexibly accounts for parameter dynamics without any prior knowledge needed by the analyst or decision maker. Thus, we position our approach as an exploratory tool that can uncover interesting and managerially relevant parameter paths from the data without imposing assumptions on their shape and smoothness. Our approach further allows for heterogeneity in all parameters by additively decomposing parameter variation into time variation (at the population level) and cross-sectional heterogeneity (at the individual household level). It comprises models without time-varying parameters or heterogeneity, as well as random walk parameter evolutions used in recent state space models, as special cases. The results of our extensive model comparison suggest that models considering parameter dynamics and household heterogeneity outperform less complex models regarding fit and predictive validity. Although models with random walk dynamics for brand intercepts and covariate effects perform well, the proposed semiparametric approach still provides a higher predictive validity for two of the three data sets analyzed. For joint estimation of all regression coefficients and hyperparameters, we employ the publicly available software BayesX, making the proposed approach directly applicable.

© 2018 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (D. Guhl), [email protected] (B. Baumgartner), [email protected] (T. Kneib), [email protected] (W.J. Steiner).

https://doi.org/10.1016/j.ijresmar.2018.03.003 0167-8116/© 2018 Elsevier B.V. All rights reserved.


1. Introduction

The marketing literature comprises a large number of applications of discrete choice models directed at explaining consumer brand choices (see Russell, 2014, for a recent overview). Within the class of discrete choice models, the multinomial logit (MNL) model has been applied so regularly that it is called the “workhorse model” of marketing today (Rossi, Allenby, & McCulloch, 2005, p. 35). The MNL model is frequently applied to data from consumer or household¹ scanner panels collected during observation periods ranging from six months to several years (Bronnenberg, Kruger, & Mela, 2008). The deterministic utility function of the MNL model captures the influence of variables that are supposed to be the drivers of consumers' brand choice behavior. In the context of panel data, a consumer's utility of buying a certain brand is typically assumed to depend on brands' actual prices and related reference price terms, promotional activities (e.g., displays and feature advertising), current brand loyalties, and alternative-specific intercepts representing intrinsic brand utilities (Guadagni & Little, 1983). In addition, consumer heterogeneity plays a major role in marketing (Allenby & Rossi, 1999), and hence nowadays, almost all versions of the MNL model also account for unobserved heterogeneity, leading to the so-called mixed logit (MXL) model (Train, 2009).

In the majority of publications, estimated parameters reflecting the influence of those predictors on brand choice as well as estimated brand intercepts have been assumed to be constant over time, i.e., equal across all purchase occasions. However, the marketing literature as well as experiences reported by marketing practitioners suggest the possibility of changing consumer choice behavior over time. The effects of marketing variables might change for many different reasons.
During an economic downturn, for example, price sensitivity may increase (Gordon, Goldfarb, & Li, 2013) and consumers may increasingly search for price deals advertised by features and displays. Price sensitivity may also vary depending on the intensity of and time since previous promotional activities in the product category (e.g., Foekens, Leeflang, & Wittink, 1999; Kopalle, Mela, & Marsh, 1999). Moreover, an advertising campaign may improve a brand's awareness and image or its perceived quality over time with the result of a higher intrinsic brand utility. Advertising may as well decrease price sensitivity (e.g., Boulding, Lee, & Staelin, 1994; Kaul & Wittink, 1995). Importantly, advertising activities are typically not recorded in panel data. The brand intercepts of a model, often referred to as brand preferences, represent the intrinsic utility of a brand net of (possibly changing) marketing mix effects and can also be interpreted as the utility-based brand value (Kamakura & Russell, 1993). Brand intercepts might also evolve over time, because consumers' brand choice may be affected by situational factors associated with the personal consultation with salespeople, out-of-stock situations, or different usage situations (Miller & Ginter, 1979; Srivastava, Shocker, & Day, 1978). Furthermore, marketing practitioners report an increase in the demand for higher-tier brands in certain product categories (e.g., coffee, chocolate) in the run-up to special events like Christmas, Easter, or Mother's Day. In contrast to potential long-term trends in consumer choice behavior due to, e.g., advertising campaigns, situational factors will probably result in short-term fluctuations of parameters.² In all these cases, a more flexible model specification allowing for time-varying brand intercepts and time-varying effects of covariates is presumed to provide a better explanation and prediction of consumer choice behavior as compared to a model with constant parameters only.
We will explore this potential improvement, especially regarding prediction accuracy in holdout samples, in our empirical application. That way, we are able to validate whether changes in consumer behavior across time are present in the data at hand.

From a managerial point of view, an unexpected short-term increase in demand for a specific brand can cause an out-of-stock situation resulting in decreased profits and dissatisfied customers. An unexpected decrease in demand, on the other hand, may lead to increased inventories or deterioration of goods and, therefore, to higher costs. For this reason, it is essential to know whether and when consumers may vary in their sensitivity to price and promotional activities or in their brand preferences, even if the causes of the observed variations are not fully understood. Managers can also learn which recurring events (e.g., festive occasions) are actually important and how they influence demand (e.g., via changes in intrinsic brand utilities or changes in marketing-mix sensitivities). This information can then potentially be used for future marketing strategies (e.g., exploiting peak demand). Ignoring time-varying effects concerning brand loyalties or consumer response to promotional activities may further mask (potentially harmful) trends in a product category like decreasing loyalties or increased bargain hunting. Further, changes in intrinsic brand utilities can be an indication of changes in the competitive structure between brands. For example, if the perceived quality of an established brand increases over time (e.g., via advertising investments), it may become a competitor for the higher-tier brands in the category. While our approach does not explicitly address the supply side, time-varying effects of marketing-mix variables also imply changes in optimal marketing policies over time.
The model proposed in this paper can support managers in uncovering time-varying parameters and may serve as a basis to improve related marketing decisions. To be precise, we do not claim to present a model that incorporates all effects that can cause dynamics explicitly because the list of potential candidate variables is long and apparently depends on the context of the specific research question, the industry and/or product category, and/or the characteristics of the data set. We instead recommend

¹ In the following, we will use the terms consumer and household interchangeably.

² We use the terms short- and long-term to conceptually differentiate between effects on the weekly or bi-weekly level and ones on a longer time-scale (e.g., half a year or longer). This definition is fairly arbitrary and is used to simplify verbal interpretations. It is, however, not related to the definition used in persistence modeling (see, e.g., Dekimpe & Hanssens, 2004 for an overview).


a fully data-driven approach as an exploratory tool, where we let the data determine the shape and smoothness of parameter evolutions over time. Our approach is to understand in a first step whether a meaningful variation of utility parameters over time exists and what the corresponding parameter paths look like. Once empirical evidence for dynamics has been found, approaches for modeling the cause of those dynamics may be added in a second step. On the other hand, there may be situations where (only) measuring dynamics without explaining them is already sufficient (e.g., for the sake of brand value monitoring, see Sriram, Balachander, & Kalwani, 2007). Further, additional information for explaining time-varying effects may not be available (e.g., advertising activities). Stated differently, we propose an exploratory tool for understanding parameter dynamics even in cases where no or only limited prior knowledge regarding the shape and functional form of the time-varying functions exists, and/or when data for covariates driving changes in parameters over time are not available.

Some papers have already considered the possibility of time-varying parameters in brand choice behavior, and in the next section we present an overview and discuss the strengths and weaknesses of those approaches. Importantly, many of these models (e.g., Mela, Gupta, & Lehmann, 1997) are inherently parametric, i.e., all covariate effects are summarized in a linear predictor for each of the brands. For an exploratory approach, however, it seems promising to consider nonparametric models that provide flexible time-effect curves (Stremersch & Lemmens, 2009). Nonparametric models not only allow for smooth time-varying effect curves but also make more efficient use of the data available by borrowing strength from neighboring time intervals. Baumgartner (2003) already proposed a nonparametric MNL model to estimate time-varying brand intercepts.
A drawback of his approach is that the researcher has to predetermine the appropriate level of smoothness by optimizing information or cross-validation criteria. Using his approach to estimate time-varying parameters for brand intercepts and covariates would require extensive search procedures in multidimensional parameter spaces. Two other papers that compete with our approach are those of Kim, Menzefricke, and Feinberg (2005) and Lachaab, Ansari, Jedidi, and Trabelsi (2006). The authors of both papers independently propose a state-space approach, allowing (population) parameters to evolve across time in a (heterogeneous) choice model. State-space models are closely related to the nonparametric models considered in this paper. In particular, the best fitting dynamic specification in Lachaab et al. (2006) (i.e., the random walk model) can be interpreted as a special case of our approach since it is included in our framework.

In the following, we propose MXL models with varying (population) parameters to uncover time-varying effects in consumer brand choice. Both time-varying brand intercepts and time-varying effects of covariates are modeled based on penalized splines, a flexible yet parsimonious nonparametric smoothing technique (Eilers & Marx, 1996). Our estimation procedure is likelihood-based, and regression parameters are obtained by a penalized Fisher scoring procedure making the approximate covariance matrix available for the construction of credible intervals. Smoothing parameters governing the variability of each of the nonparametric function estimates for the covariates are derived from an approximate marginal likelihood procedure. As a consequence, there is no need for extensive search procedures to determine the appropriate amount of smoothness, and estimation is fully data-driven, determining the flexible function estimates as well as the corresponding degrees of smoothness in a unified approach.
Also, the researcher can use different settings with respect to the number of knots, the degree of the spline, and/or the type of penalty term for each time-varying parameter. Finally, our approach is implemented using free software (R and BayesX).³

The remainder of the paper is organized as follows: In Section 2, we review the literature on discrete choice models with time-varying parameters as well as the literature on semi-/nonparametric models in marketing. In Section 3, we briefly present the standard MNL model and introduce more flexible model variants based on penalized splines for the estimation of time-varying parameters and/or consumer heterogeneity. We further illustrate how our spline approach is connected to the state-space approaches of Kim et al. (2005) and Lachaab et al. (2006). Subsequently, in Section 4, we demonstrate our methodology with three empirical applications. To provide managerial implications, we also focus on brand level results including market share considerations. We conclude in Section 5 with a summary of the paper's key findings and an outlook on future research perspectives.

2. Related literature

2.1. Discrete choice models with time-varying parameters

All the papers reviewed in this subsection use scanner panel data to estimate brand choice models in which (at least some) parameters vary over time. As our focus is on continuous parameter evolutions instead of particular states, we do not consider hidden Markov models (Netzer, Lattin, & Srinivasan, 2008) or regime-switching models (Park & Gupta, 2011). We further exclude from this overview papers dealing with other forms of dynamics, such as state dependence due to inertia, habit persistence or carry over effects (Dubé, Hitsch, & Rossi, 2010; Guadagni & Little, 1983; Keane, 1997), serially correlated utility errors (Allenby & Lenk, 1995; Haaijer & Wedel, 2001⁴), or learning (e.g., Erdem & Keane, 1996).
Finally, we do not explicitly discuss models that have addressed varying parameters over choice tasks in stated preference settings (e.g., Hasegawa, Terui, & Allenby, 2012),⁵ or that focus on binary choice

³ In Web Appendix C, we have provided computer code together with a detailed description of commands to illustrate how models within our framework can be estimated.

⁴ Haaijer and Wedel (2001) use a probit model with a flexible error structure that allows for correlations over time. Different dynamics are nested within their framework, e.g., autoregressive utilities or autoregressive parameters with or without error correlations, respectively. The authors apply their approach to household panel data and compare in- and out-of-sample fits of different specifications. The dynamic probit model needs equally spaced time series observations for all households, a requirement which is not met in our empirical applications. For this reason, we abstain from comparing the dynamic probit model with our approach.

⁵ Nevertheless, the model of Hasegawa et al. (2012), who analyze brand satiation in a choice experiment, is interesting because it accounts for individual dynamics in brand preferences. Their best-fitting model includes an individual-specific quadratic trend to represent time-variation in the (baseline) utility. We picked up this idea and consider a similar specification as benchmark model in our empirical application (Section 4) by adding individual-specific quadratic trends in the brand intercepts.


settings such as churn (Holtrop, Wieringa, Gijsenberg, & Verhoef, 2017) or payment default (Zhao, Zhao, & Song, 2009) in customer relationship management.

The (selection of) papers summarized in Table 1 can be classified along several dimensions: (1) whether the choice model specification is logit or probit, (2) whether unobserved consumer heterogeneity is accounted for, (3) whether all or only a subset of parameters are allowed to vary over time, and (4) the chosen modeling and estimation framework for deriving time-varying parameters. We discuss each dimension to position our approach.

1. Roughly equal shares of the papers use either the logit model or the probit model. Probit models do not suffer from the IIA property and they can deal more easily with unobserved heterogeneity using a normal distribution. In addition, it is more straightforward to specify a probit model within a state-space framework and apply Kalman filter (KF) and Kalman smoother (KS) algorithms for estimating time-varying parameters (Lachaab et al., 2006; Rutz & Sonnier, 2011). However, computing choice probabilities is computationally demanding for more than 4 or 5 alternatives per choice set. Logit probabilities, on the other hand, can be computed in closed form and in case unobserved heterogeneity is accounted for, the IIA property is not a (serious) issue in logit models.

2. Except for Baumgartner (2003), all researchers model unobserved heterogeneity. Mela et al. (1997) and Heilman et al. (2000) use a discrete heterogeneity specification (Kamakura & Russell, 1989), whereas other researchers rely on a continuous heterogeneity specification (Gönül & Srinivasan, 1993). Consumer heterogeneity plays a very important role in marketing, and nowadays many marketing scholars favor the continuous specification (Allenby & Rossi, 1999).

3. In some papers, all utility parameters vary over time (Jedidi et al., 1999).
In other cases, depending on the research question at hand, only marketing-mix parameters (Papatla & Krishnamurti, 1996) or only brand intercepts (Baumgartner, 2003) vary. It seems difficult to justify a priori why only some utility parameters should be allowed to vary over time while others are kept fixed (except for empirical reasons). Hence, a less restrictive model that allows all parameters to be time-varying is preferable in general, particularly in an exploratory approach.

4. The papers use different frameworks for modeling and estimating time-varying parameters, and Van Heerde, Mela, and Manchanda (2004) provide a detailed discussion of related strengths and weaknesses. The papers can be differentiated according to (a) whether they model parameters as a function of observed time-varying covariates (“process functions”, Leeflang, Wittink, Wedel, & Naert, 2000), or (b) whether they explicitly yield parameter paths. The former stream, also referred to as the reparametrization approach, is easy to understand and can be estimated with standard software tools if suitable covariates for reparametrization are available. For example, Heilman et al. (2000) explain the evolution of brand preferences over time with the experience of a consumer by modeling brand preferences as depending on (the logarithm of) related category purchases, while Jedidi et al. (1999) model utility parameters as functions of long-term advertising, long-term promotion, and loyalty. Using reparametrization approaches, however, “[o]nly the variance of the parameter estimate is computed; thus, it is not possible to reconstruct the parameter paths over time” (Van Heerde et al., 2004, p. 167). Other approaches explicitly yield parameter paths. Mela et al. (1997) rely on moving windows to obtain quarterly time-varying parameters for price and promotion effects, and Gordon et al. (2013) introduce interaction effects between the price variable and quarter-indicators.
Importantly, a sufficient number of observations for each time window is necessary here to estimate parameters reliably. More efficient are state-space models⁶ (e.g., Kim et al., 2005; Lachaab et al., 2006; Rutz & Sonnier, 2011) and nonparametric regression techniques (e.g., Baumgartner, 2003). Both approaches are well suited for modeling time-varying parameters in an explorative fashion because they (a) are highly flexible, (b) impose only a few a priori restrictions on the parameter paths, and (c) are easy to interpret (Lachaab et al., 2006). However, to the best of our knowledge, there is no standard software for estimating choice models with time-varying parameters based on the state-space approach. In contrast, the proposed penalized spline approach is easy to estimate using the free software package BayesX (Brezger, Kneib, & Lang, 2005), which further has an R interface, R2BayesX (Umlauf, Adler, Kneib, Lang, & Zeileis, 2015). Another feature of our approach is that, depending on the chosen number of knots, degree of the spline, and/or type of penalty term, more or less smooth parameter paths can be obtained (the latter may also guard against overfitting problems). Once parameter paths have been determined based on approaches of this second stream, additional covariates (if available) can be considered in a further step to analyze if drivers for the parameter evolutions can be identified. For example, Gordon et al. (2013) regressed estimated price elasticities on macroeconomic growth variables (e.g., GDP) in a second step.

Besides, the data set lengths also differ considerably across papers. The shortest data set spans 52 weeks (Baumgartner, 2003), whereas the longest data set spans over 8 years (Mela et al., 1997).⁷ The periodicity of the time-varying parameters seems to be related to the length of the data set and varies from weekly (e.g., Baumgartner, 2003) to quarterly (e.g., Lachaab et al., 2006).
In case a reparametrization is used (e.g., Heilman et al., 2000), parameters could (potentially) vary for each purchase occasion, no matter when it takes place. From a modeler's perspective, an approach is highly attractive if the periodicity can be adjusted easily

⁶ In recent years there has been a strong trend in marketing to use state-space models and the KF for dealing with time-varying parameters. We focus our attention here on the subset of discrete choice models. There is a much larger body of literature that relies on these modern time series techniques to estimate aggregate demand models (e.g., Neelamegham & Chintagunta, 2004; Osinga et al., 2010; van Heerde et al., 2004) or time-varying advertising response models (e.g., Bass, Bruce, Majumdar, & Murthi, 2007; Bruce, Peters, & Naik, 2012; Naik, Mantrala, & Sawyer, 1998). See Leeflang et al. (2009) for an overview.

⁷ Scanner panel data sets with a time dimension of several years are now widely available in academia and practice, see, e.g., the IRI marketing data set (Bronnenberg et al., 2008), or the recently published consumer panel by Nielsen in cooperation with the Kilts Center for Marketing, University of Chicago, Booth School of Business.


Table 1. Overview of discrete choice models with time-varying parameters.

| Study | Choice model | Unobserved heterogeneity | Time-varying parameter(s) | Modeling and estimation approach |
|---|---|---|---|---|
| Papatla and Krishnamurti (1996) | Probit | Yes | Marketing-mix | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Mela et al. (1997) | Logit | Yes (2 classes) | Marginal price- and promotion effects | Moving window analysis that yields parameter paths + AR(1)-regression on second stage |
| Jedidi, Mela, and Gupta (1999) | Probit | Yes | All utility parameters | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Heilman, Bowman, and Wright (2000) | Logit | Yes (3 classes) | All utility parameters | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Baumgartner (2003) | Logit | No | Brand intercepts | Smoothing splines: nonparametric approach that yields parameter paths; level of smoothness needs to be predetermined |
| Kim et al. (2005) | Logit | Yes | All utility parameters | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)) |
| Lachaab et al. (2006) | Probit | Yes | All utility parameters | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)) |
| Rutz and Sonnier (2011) | Probit | Yes | Latent brand factors | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)); may include exogenous variables |
| Gordon et al. (2013) | Logit | Yes | Price parameter and elasticity | Interaction of quarter dummies and price that yields a path for the price effect only + linear regression of time-varying price elasticity on economic factors (e.g., GDP) |
| This study | Logit | Yes | All utility parameters | P(enalized)-splines: nonparametric approach that yields parameter paths; flexible function estimates and level of smoothness are determined simultaneously; different models for parameter dynamics depending on the number of knots, degree of spline, and type of roughness penalty |

to reflect what is best for the research question at hand in combination with the available data. The proposed penalized spline approach possesses this feature, as we will explain in Section 3.

In sum, although several papers in marketing have addressed time-varying parameters in brand choice models, the number is still fairly small. The existing approaches are diverse and have different strengths and weaknesses, as described above. Our approach can be characterized in the following way: it is based on the widespread logit model, parameter paths over time can be explicitly estimated, it can deal with consumer heterogeneity, it can cope with either short or long time series, and it provides high flexibility to let the data determine the shape of parameter evolutions over time in an explorative way through a number of additional spline setting options. Having such a flexible and adaptive model over time can be especially useful in situations where data sets have long time dimensions, in turbulent markets, for new or changing product categories, and/or when short- and long-term effects are both at work. In addition, improving marketing activities (e.g., pricing, promotion, advertising, etc.) in response to changing utility parameters does not necessarily require knowledge of the causes for those changes, as long as a timely and precise measurement of parameter evolutions is feasible. In particular, the flexibility to measure parameter paths as accurately as possible can be considered the strength of the proposed approach. Nevertheless, a potential disadvantage of our approach is that splines cannot incorporate additional variables in a dynamic fashion as easily as state-space models and the KF (e.g., see the model extension in Rutz & Sonnier, 2011).
Therefore, in order to explain what drives the parameter paths, further analysis becomes necessary by using the time-varying parameters estimated in the first step (or transformations thereof) as dependent variables in a second step (similar to Gordon et al., 2013).⁸ The fact that our approach is easily accessible using freely available software (BayesX) may constitute an additional benefit for practitioners and researchers who want to estimate discrete choice models with time-varying parameters.
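To make the penalized spline idea referred to throughout this section more concrete, the following Python sketch smooths a noisy weekly parameter path with a B-spline basis and a difference penalty on adjacent coefficients (Eilers & Marx, 1996). It is an illustration under simplifying assumptions and not the estimation procedure of this paper: the response is Gaussian rather than multinomial, the smoothing parameter `lam` is fixed by hand instead of being estimated from the data, and all names and values (`bspline_basis`, `pspline_fit`, the simulated `beta_true`) are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_intervals, degree):
    """B-spline basis on equally spaced knots covering [x.min(), x.max()]."""
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_intervals
    # Equally spaced knots, extended by `degree` intervals on both sides.
    knots = lo + h * np.arange(-degree, n_intervals + degree + 1)
    # Degree 0: indicator of the knot interval that contains each x.
    idx = np.minimum(np.floor((x - lo) / h).astype(int), n_intervals - 1) + degree
    B = np.zeros((len(x), len(knots) - 1))
    B[np.arange(len(x)), idx] = 1.0
    # Cox-de Boor recursion; with equally spaced knots each denominator is p * h.
    for p in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-p - 1]) / (p * h) * B[:, :-1]
        right = (knots[None, p + 1:] - x[:, None]) / (p * h) * B[:, 1:]
        B = left + right
    return B  # shape: (len(x), n_intervals + degree)

def pspline_fit(y, B, lam, order=2):
    """Penalized least squares: minimize ||y - B g||^2 + lam * ||D g||^2."""
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)  # difference matrix
    g = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ g

# Simulated example (all values are made up): a price coefficient with a
# yearly cycle, observed with noise over 104 weeks.
rng = np.random.default_rng(0)
T = 104
t = np.arange(1, T + 1, dtype=float)
beta_true = -2.0 + 0.5 * np.sin(2 * np.pi * t / 52)
beta_noisy = beta_true + rng.normal(scale=0.3, size=T)

B = bspline_basis(t, n_intervals=20, degree=3)
beta_smooth = pspline_fit(beta_noisy, B, lam=1.0, order=2)
```

Changing `n_intervals`, `degree`, or `order` alters how smooth the resulting path can be; these arguments play the role of the number of knots, the degree of the spline, and the type of penalty term discussed above. With `degree=0`, one interval per period, and `order=1`, the penalty acts on differences between adjacent period-specific parameters, which is the structure of a first-order random walk.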

2.2. Non- and semiparametric models in marketing

Given the fact that we advocate using splines for modeling time-varying parameters in choice models, a general brief discussion of non- and semiparametric regression approaches in marketing is in order. Using flexible regression methods (e.g., kernel estimators, spline estimators, or k-nearest neighbor estimators) is not new in marketing (see Leeflang et al., 2000 for an overview). Several papers analyze nonlinear pricing and promotional effects on aggregated brand sales or market shares (see, e.g., Kalyanam & Shively, 1998; Van Heerde, Leeflang, & Wittink, 2001; Hruschka, 2002; Martínez-Ruiz, Mollá-Descals, Gómez-Borja, & Rojo-Álvarez, 2006; Steiner, Brezger, & Belitz, 2007; Brezger & Steiner, 2008; Lang, Steiner, Weber, & Wechselberger, 2015; Weber, Steiner, & Lang, 2017), some others non-linear effects in brand choice models (Abe, 1995; Abe, 1999; Abe, Boztuğ, & Hildebrandt, 2004;

⁸ Of course, similar to Papatla and Krishnamurti (1996), it is possible to directly add time-varying covariates in our model.


Briesch, Chintagunta, & Matzkin, 2002; Kim, Menzefricke, & Feinberg, 2007; Kneib, Baumgartner, & Steiner, 2007; Schindler, Baumgartner, & Hruschka, 2007). Furthermore, Baumgartner and Hruschka (2005) study the allocation of catalogs to customers, Steiner, Siems, Weber, and Guhl (2014) apply nonparametric regression to the field of customer satisfaction research, and Haupt, Kagerer, and Steiner (2014) propose flexible semiparametric quantile regression models. For studying customer defection, Singh and Jain (2014) employ the semiparametric proportional hazard model of Lillard (1993), where the baseline hazard rate is modeled by piecewise linear splines. Shively, Allenby, and Kohn (2000) introduce a nonparametric approach to identifying latent relationships in hierarchical choice models.

Four other studies are closely related to our research because they flexibly model changes of parameters over time, although not in the context of brand choice modeling. Sloot, Fok, and Verhoef (2006) use cubic splines for modeling the short- and long-term effects of an assortment reduction on category sales. Stremersch and Lemmens (2009) apply penalized splines in the context of pharmaceutical marketing. They model time-varying effects of explanatory variables on aggregated new drug sales to better understand regulatory regimes. Lemmens, Croux, and Stremersch (2012) employ penalized splines for modeling flexible growth effects of several products in multiple countries in a hidden Markov model. Kumar, Choi, and Greene (2017) propose a model with time-varying effects using penalized splines to analyze synergetic effects of social media and traditional marketing on ice-cream brand sales.

The general advantage of flexible non- or semiparametric approaches is that they do not impose any specific shape or functional form on the data. This might be crucial because wrong assumptions can lead to incorrect conclusions and false implications.
Instead, flexible methods “let the data speak” and hence minimize the risk of using a wrong specification. However, more flexibility comes at the cost of needing greater sample sizes to estimate the models. The mentioned studies have clearly established the value of non- and semiparametric models in marketing. Particularly when prior knowledge about the functional relationship is scarce, flexible methods are highly valuable. Our exploratory approach for modeling and estimating time-varying parameters in brand choice models follows exactly this stream.

3. Methodology

3.1. Models

3.1.1. Standard MNL model

In brand choice modeling, the MNL model is typically motivated from considering latent utilities describing the benefit of purchasing a specific brand. In our data, we repeatedly observe purchases of households over a certain time span represented by a categorical response variable $Y_{it} \in C_{it}$, where $C_{it} = \{1, \dots, k\}$ represents the set of brands available at the store visited by household $i$ ($i = 1, \dots, n$) at time $t$ ($t \in T_i \subset \{1, \dots, T\}$). Each household's choice is then associated with a set of $k$ utility functions $L_{it}^{(r)}$, $r = 1, \dots, k$, which are composed of a deterministic part $\eta_{it}^{(r)}$ reflecting the influence of relevant covariates, and a random part $\varepsilon_{it}^{(r)}$:

$$L_{it}^{(r)} = \eta_{it}^{(r)} + \varepsilon_{it}^{(r)} = \alpha^{(r)} + x_{it}^{(r)\prime} \beta + \varepsilon_{it}^{(r)}. \tag{1}$$

The parameters $\alpha^{(r)}$ represent the brand intercepts (i.e., the intrinsic brand utilities or utility-based brand values), and the parameter vector $\beta$ captures effects of covariates (e.g., price or promotional activities) collected in the vector $x_{it}^{(r)}$. $\varepsilon_{it}^{(r)}$ is an error term accounting for the presence of unobservable influences in a household's brand choice decision. Assuming that consumers are utility maximizers and the error terms $\varepsilon_{it}^{(r)}$ are i.i.d. standard extreme value distributed, the conditional choice probability $\pi_{it}^{(r)}$ of household $i$ for brand $r$ at time $t$ is given by the well-known MNL equation (see, e.g., McFadden, 1974):

$$\pi_{it}^{(r)} = \frac{\exp\left(\eta_{it}^{(r)}\right)}{1 + \sum_{r'=1}^{k-1} \exp\left(\eta_{it}^{(r')}\right)}, \quad r = 1, \dots, k-1, \quad \text{and} \tag{2}$$

$$\pi_{it}^{(k)} = 1 - \pi_{it}^{(1)} - \dots - \pi_{it}^{(k-1)}. \tag{3}$$

For identifiability reasons, only $k - 1$ brand intercepts can be estimated. Hence, without loss of generality, we choose brand $k$ as the reference category and assume $\alpha^{(k)} = 0$ and $x_{it}^{(k)} = 0$. The latter can be achieved by simply redefining $x_{it}^{(r)}$ as $x_{it}^{(r)} - x_{it}^{(k)}$, i.e., we only consider contrasts to the reference category.

One shortcoming of this basic model formulation is that it completely ignores the time-dependency of brand choice decisions and the repeated measurements of the same households. This has two immediate implications: observations are treated as independent, i.e., as if all observations were collected from different individuals, and parameters are assumed to be constant throughout the entire observation period. Therefore, we introduce time-varying parameters, heterogeneity, and the combination of both in the MNL model in three steps.
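As an illustration of Eqs. (2) and (3), the following minimal Python sketch computes the MNL choice probabilities for a single purchase occasion, with the last brand serving as the reference category. The function name `mnl_probabilities`, the array layout, and all numerical values are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np

def mnl_probabilities(alpha, beta, x):
    """Choice probabilities of Eqs. (2)-(3) for one purchase occasion.

    alpha : (k-1,) brand intercepts of the non-reference brands
    beta  : (J,)   covariate effects common to all brands
    x     : (k, J) covariates of all k brands; the last row is the reference brand
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    # Contrasts to the reference brand: x^(r) - x^(k), so that eta^(k) = 0.
    x_contrast = x[:-1] - x[-1]
    eta = alpha + x_contrast @ beta           # deterministic utilities
    expo = np.exp(eta)
    pi = expo / (1.0 + expo.sum())            # Eq. (2)
    return np.append(pi, 1.0 - pi.sum())      # Eq. (3) for the reference brand

# Hypothetical values: three brands, one covariate (price).
alpha = np.array([-1.0, 0.8])
beta = np.array([-1.5])
x = np.array([[3.65], [4.07], [3.57]])
pi = mnl_probabilities(alpha, beta, x)
```

Because only contrasts to the reference brand enter the utilities, the result coincides with a softmax over the full utilities in which the reference intercept is set to zero.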


3.1.2. MNL-TVP model

We extend Eq. (1) to consider time-varying brand intercepts and covariate effects. Specifically, the brand intercepts $\alpha^{(r)}$ and the time-constant effects $\beta$ of covariates are replaced by time-varying functions $f_0^{(r)}(t)$ and $f_j(t)$, leading to the utility function

$$\eta_{it}^{(r)} = f_0^{(r)}(t) + \sum_{j=1}^{J} x_{itj}^{(r)} f_j(t), \tag{4}$$

where $j = 1, \dots, J$ denotes the covariates included in the model. That way, brand intercepts are allowed to vary over time, reflecting changes in intrinsic brand utilities that can be induced by either long-term trends or short-term fluctuations in brand choice behavior. The effects of covariates might change over time because of long-term macroeconomic developments (e.g., Gordon et al., 2013 find that price sensitivity is countercyclical) or because of short-term seasonal effects (e.g., Meza & Sudhir, 2006 find that the effects of price and promotion variables temporarily increase during periods of peak demand). Within this framework, we will be able to investigate whether intrinsic brand utilities, price sensitivity, effects of promotional instruments, or effects of brand loyalty are changing over time. We refer to the time-varying parameter model (MNL-TVP) in Eq. (4) as a semiparametric model, since the time-varying functions are modeled nonparametrically via penalized splines (see below), while the error term follows a parametric distribution. Similar to the basic parametric model, identifiability restrictions have to be imposed in the more flexible model variants. Again, we assume brand $k$ to represent the reference category and, therefore, $f_0^{(k)}(t) = 0$ and $x_{itj}^{(k)} = 0$.

3.1.3. MXL model

Next, we add household heterogeneity to the MNL model, which yields the MXL model (see Train, 2009 for an overview). We employ a continuous heterogeneity specification where random effects are assumed to follow i.i.d. Gaussian distributions. Heterogeneity is introduced in brand values by employing household-specific brand intercepts $\alpha_i^{(r)}$, as well as in the covariate effects, i.e., by replacing common effects $\beta$ with household-specific effects $\beta_i = \beta + b_i$. This leads to the utility function

$$\eta_{it}^{(r)} = \alpha^{(r)} + \alpha_i^{(r)} + x_{it}^{(r)\prime} \beta + x_{it}^{(r)\prime} b_i, \tag{5}$$

with $\alpha_i^{(r)} \sim$ i.i.d. $N(0, \sigma_r^2)$ and $b_{ij} \sim$ i.i.d. $N(0, \sigma_j^2)$. Note that $\alpha^{(r)}$ is now the mean intrinsic brand utility. All household-specific parameters are assumed to be the same across purchase occasions of the same household.

3.1.4. MXL-TVP model

Both previously presented model extensions can be combined into an MXL model with time-varying parameters (MXL-TVP). In this case the utility function becomes

$$\eta_{it}^{(r)} = \alpha_i^{(r)} + f_0^{(r)}(t) + \sum_{j=1}^{J} x_{itj}^{(r)} f_j(t) + x_{it}^{(r)\prime} b_i. \tag{6}$$

The very general model structure allows for differences in parameters across households and differences in the means of the heterogeneity distributions over time. It therefore additively decomposes parameter variation into time variation (at the population level) and cross-sectional variation. This setup is very similar to those of Kim et al. (2005, p. 283) and Lachaab et al. (2006, p. 61). These authors explicitly employ VAR models and variants thereof for the time-varying means of the heterogeneous parameters. However, in our case the specific model for the parameter dynamics depends on the definition of the splines (degree, number of knots) and the penalization, which we will discuss in the next section.

In the context of aggregate-level data (e.g., sales or market share data) some authors have considered heterogeneous dynamics for different brands, categories, or stores (e.g., Van Heerde et al., 2004; Ataman, van Heerde, & Mela, 2010; or Osinga, Leeflang, & Wieringa, 2010). Therefore, heterogeneity and time variation have been decoupled in those approaches. However, in these cases, each heterogeneous unit has many observations (typically 50 to 250, and typically observations for each time period), which strongly simplifies the inference of “individual” dynamics. In contrast, household panel data is usually rather sparse in the time dimension, and hence estimating “truly” individual time-varying parameters for each household is a much more difficult task. In particular, the risk of overfitting is high with sparse data because far fewer degrees of freedom are available. We leave this issue for future research.

3.2. Penalized splines

To model any of the time-varying functions $f(t)$ (for the sake of simplicity, we drop the covariate indices), we consider penalized splines, a flexible nonparametric regression technique popularized by Eilers and Marx (1996). The idea is to represent the time-varying effects in terms of a high-dimensional parametric basis and to add an appropriate penalty term to the likelihood for the sake of regularization. Particularly, we employ polynomial splines of degree $l$ to represent $f(t)$, leading to

$$f(t) = \sum_{m=1}^{M} \gamma_m B_m^l(t), \tag{7}$$


with $B_m^l$ ($m = 1, \dots, M$) representing B-spline basis functions (we refer to de Boor, 2001 as key reference for B-splines), and $\gamma_m$ denoting the regression parameter to be estimated for the $m$-th B-spline basis function. Basically, polynomial splines of degree $l$ form a class of piecewise polynomial functions under the additional condition that the piecewise polynomials are fused smoothly at the interval boundaries (also referred to as knots), such that the resulting function is $l - 1$ times continuously differentiable (Dierckx, 1993). Thus, two quantities characterize a polynomial spline fit: the number of intervals, which is reflected in the number of basis functions $M$, and the degree $l$, which determines the function's overall smoothness properties. For the latter, a default choice is cubic splines (i.e., $l = 3$), leading to twice continuously differentiable functions. However, to get a model which is similar to the one proposed by Lachaab et al. (2006), we also use splines with $l = 0$ (i.e., piecewise constant step functions).

The main difficulty lies in choosing an appropriate number of intervals. If the number is too large, the resulting fit becomes overly flexible, leading to overfitting and, as an ultimate consequence, to a model with probably insufficient predictive power when applied to new data. On the other hand, if the number of intervals is too small, the resulting function might not be flexible enough to capture, e.g., short-term fluctuations in consumers' choice behavior. As a remedy, Eilers and Marx (1996) have suggested on the one hand to use a moderately large number of intervals to ensure enough flexibility for the unknown functions, and on the other hand to add a penalty term to the likelihood to enforce sufficient smoothness and to avoid overfitting.
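The construction suggested by Eilers and Marx (1996) can be sketched in a few lines of Python: a B-spline basis on equidistant knots is combined with a difference penalty on adjacent coefficients. The sketch below is a simplified illustration only. It assumes a Gaussian response and penalized least squares instead of the penalized multinomial likelihood used in the paper, it interprets the $0.3 \cdot T$ rule as the number of intervals, and all function names (`bspline_basis`, `difference_penalty`, `fit_pspline`) as well as the simulated data are hypothetical.

```python
import numpy as np

def bspline_basis(t, n_intervals, degree, t_min, t_max):
    """B-spline design matrix on equidistant knots (Cox-de Boor recursion).

    Returns an array of shape (len(t), n_intervals + degree).
    """
    t = np.asarray(t, dtype=float)
    h = (t_max - t_min) / n_intervals
    # Equidistant knots, extended by `degree` extra knots on both sides.
    knots = t_min + h * np.arange(-degree, n_intervals + degree + 1)
    # Degree 0: indicator of the knot interval that contains each t
    # (t_max is assigned to the last inner interval).
    idx = np.floor((t - knots[0]) / h).astype(int)
    idx = np.clip(idx, degree, degree + n_intervals - 1)
    B = np.zeros((t.size, knots.size - 1))
    B[np.arange(t.size), idx] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        left = (t[:, None] - knots[None, :-d - 1]) / (d * h)
        right = (knots[None, d + 1:] - t[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def difference_penalty(n_basis, order):
    """Penalty matrix P = D'D built from the order-th difference matrix D."""
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

def fit_pspline(B, P, y, lam):
    """Penalized least squares: minimize ||y - B g||^2 + lam * g' P g."""
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

T = 134                                   # number of weeks in the ketchup data
weeks = np.arange(1, T + 1)
n_intervals = int(round(0.3 * T))         # heuristic default, read as intervals
B = bspline_basis(weeks, n_intervals, degree=3, t_min=1, t_max=T)
P = difference_penalty(B.shape[1], order=2)

# Simulated noisy parameter path (purely illustrative data).
rng = np.random.default_rng(0)
y = np.sin(weeks / 20.0) + rng.normal(scale=0.3, size=T)
gamma_rough = fit_pspline(B, P, y, lam=0.1)      # small lambda: flexible fit
gamma_smooth = fit_pspline(B, P, y, lam=1000.0)  # large lambda: smooth fit
```

With `degree=0` and one interval per week, `bspline_basis` returns an identity matrix, so a first order difference penalty then acts directly on the weekly function values.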
We use equidistant intervals and $0.3 \cdot T$ knots as a heuristic default choice in our applications, where $T$ is the number of time periods (e.g., weeks in our applications), and we round the resulting number of knots to the nearest integer. To assess the sensitivity of the estimation results regarding this default choice, we also use a higher number of knots ($0.5 \cdot T$). To mimic the random walk model of Lachaab et al. (2006), we choose one knot for each week. In this case, the basis functions basically indicate specific points in time, such that the difference penalty (see below) operates directly on the function evaluations. From a Bayesian perspective, this is equivalent to a random walk (see Lang & Brezger, 2004 for details).

To regularize the estimation problem and to ensure that the estimated function $\hat{f}(t)$ is flexible but not “too flexible”, the likelihood function is augmented by a penalty term. A suitable penalty term can be derived from squared $p$-th order derivatives and, according to B-spline theory (see, e.g., de Boor, 2001), we can approximate the derivative penalty with a roughness penalty based on first or second order differences on adjacent regression parameters $\gamma_m$, leading to

$$\lambda \sum_{m=2}^{M} (\gamma_m - \gamma_{m-1})^2 \quad \text{or} \quad \lambda \sum_{m=3}^{M} (\gamma_m - 2\gamma_{m-1} + \gamma_{m-2})^2, \tag{8}$$

for first order or second order differences, respectively. The smoothing parameter $\lambda$ controls the trade-off between flexibility ($\lambda$ small) and smoothness ($\lambda$ large) of the penalized spline. For the discussion of statistical inference in the next section, the compact representation of the difference penalties in terms of quadratic forms $\lambda \gamma' P \gamma$ will be helpful, where $P = D'D$ corresponds to the penalty matrix constructed from the first or second order difference matrix $D$. Our default choice is the second order difference penalty term. In contrast, for the Lachaab et al. (2006) specifications we use the first order difference penalty term.

The discussion shows that penalized splines seem very suitable for modeling time-varying parameters within an exploratory approach because different specifications depending on the settings for the degree of the spline, type of roughness penalty, and number of intervals (knots) can be applied and tested. Note that the specifications can further be different for each parameter in the model (e.g., more variability in brand intercepts, but smoother parameter paths for covariates). The best fitting dynamic specification in Lachaab et al. (2006), i.e., the random walk model, is just a special case of the penalized spline approach considering zero degree splines with knots at each observed point in time and a first order difference penalty term. Our framework can therefore be considered as a kind of toolbox that enables researchers and analysts to uncover parameter dynamics in brand choice data in an explorative way and, if necessary, to adapt the model specification to cope as best as possible with the characteristics of the data set at hand. We will compare several dynamic specifications (as well as combinations of different dynamic specifications for subsets of parameters) in Section 4.

3.3. Statistical inference

Statistical inference of the MXL-TVP model is based on maximizing the following penalized (log-)likelihood:⁹

$$l_{pen}(\gamma, \alpha, b) = l(\gamma, \alpha, b) - \sum_{r=1}^{k-1} \lambda_0^{(r)} \gamma_0^{(r)\prime} P_0 \gamma_0^{(r)} - \sum_{j=1}^{J} \lambda_j \gamma_j' P_j \gamma_j - \sum_{r=1}^{k-1} \frac{1}{\sigma_r^2} \alpha^{(r)\prime} \alpha^{(r)} - \sum_{j=1}^{J} \frac{1}{\sigma_j^2} b_j' b_j, \tag{9}$$

where $l(\gamma, \alpha, b)$ denotes the model's (log-)likelihood, and $\gamma = (\gamma_0^{(1)}, \dots, \gamma_0^{(k-1)}, \gamma_1, \dots, \gamma_J)$ contains the regression parameters associated with the time-varying effects in $f_0^{(1)}(t), \dots, f_0^{(k-1)}(t)$ and $f_1(t), \dots, f_J(t)$. The vectors $\alpha = (\alpha_1^{(1)}, \dots, \alpha_n^{(1)}, \dots, \alpha_1^{(k-1)}, \dots, \alpha_n^{(k-1)})$ and $b = (b_{11}, \dots, b_{1J}, \dots, b_{n1}, \dots, b_{nJ})$ contain the household-specific brand intercepts and covariate effects, respectively. Please note that each time-varying brand intercept and covariate effect is assigned a separate smoothing parameter allowing for different amounts of smoothness for each of the functions. Also note that for the heterogeneity parameters the smoothing constant is directly interpretable as the inverse variance of the normal distribution of each parameter. Given the smoothing parameters and the random effects variances, estimation of the penalized model can be achieved by a simple modification of the usual Fisher

⁹ Simpler models (e.g., MNL, MNL-TVP, or MXL) would omit some or all penalty terms in Eq. (9).


scoring algorithm. To fully automate the estimation routine (given the choice for the number of knots, degree of spline, and type of penalization), optimal smoothing parameters and random effects variances have to be provided to the Fisher scoring algorithm; the determination of the smoothing parameters is therefore a crucial step in the estimation procedure. We apply a likelihood-based approach that originates from the close connection between penalized likelihood estimation and mixed models. This approach has received considerable attention in the statistical literature (see, e.g., Ruppert, Wand, & Carroll, 2003, or Fahrmeir, Kneib, & Lang, 2004 for overviews). Kneib et al. (2007) transferred the approach to flexible brand choice models in order to accommodate nonlinear price-utility effects but considered neither time-varying parameters nor household-specific heterogeneity. In the mixed model formulation of nonparametric regression models, marginal likelihood estimation can be applied for the joint estimation of regression effects and hyperparameters. Iterative updates of the regression parameters and the hyperparameters yield penalized maximum likelihood estimates upon convergence. Based on large sample theory and approximate normality of the estimates, significance tests and credible intervals can be constructed (see Web Appendix A for technical details).

3.4. Fit measures

We deliberately use several fit measures to evaluate the model performance, because each measure has different properties and it is a priori unclear which single measure is best (Gneiting & Raftery, 2007). In particular, fit and predictive validity are measured in terms of the log-Likelihood (log-Lik), Brier score, spherical score, and (using a scoring rule on the aggregate level) in terms of the average root mean squared error (ARMSE) between actual and predicted brand shares. The measures are calculated as follows (Kneib et al., 2007):

$$\text{log-Lik} = \sum_{i=1}^{n} \sum_{t \in T_i} \log\left(\hat{\pi}_{it}^{(r^*)}\right), \tag{10}$$

$$\text{Brier score} = -\sum_{i=1}^{n} \sum_{t \in T_i} \sum_{r=1}^{k} \left(y_{it}^{(r)} - \hat{\pi}_{it}^{(r)}\right)^2, \tag{11}$$

$$\text{Spherical score} = \sum_{i=1}^{n} \sum_{t \in T_i} \frac{\hat{\pi}_{it}^{(r^*)}}{\sqrt{\sum_{r=1}^{k} \left(\hat{\pi}_{it}^{(r)}\right)^2}}, \quad \text{and} \tag{12}$$

$$\text{ARMSE} = \frac{1}{k} \sum_{r=1}^{k} \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(s_t^{(r)} - \hat{s}_t^{(r)}\right)^2}, \tag{13}$$

where $r^*$ denotes the brand that household $i$ has chosen at time $t$ and $y_{it}^{(r)}$ is the binary choice indicator observed for brand $r$ and household $i$ at time $t$. Whereas the log-Likelihood only considers the $\log(\hat{\pi}_{it}^{(r)})$-terms of brands chosen by households (but not of the brands which were not chosen by households), Brier and spherical scores utilize the entire predictive distribution of all choice probabilities $\hat{\pi} = (\hat{\pi}^{(1)}, \dots, \hat{\pi}^{(k)})$. Stated differently, the log-Likelihood does not fully exploit the information contained in the predictive distribution and is therefore rather sensitive to extreme observations.

The predicted shares of the brands in week $t$ ($\hat{s}_t^{(r)}$) are calculated from the predicted brand choice probabilities of the households in week $t$. To avoid artifacts introduced by weeks with only a very small number of purchases, squared errors are weighted by the number of purchases in the corresponding week.

4. Empirical study

4.1. Data

In this section, the previously presented models are applied to a scanner panel data set referring to the product category ketchup.¹⁰ The data set belongs to the Nielsen ERIM single source data base and covers 26,820 purchase acts of 2,494 households for three ketchup brands across 2.5 years (134 weeks) in Sioux Falls, SD, USA.¹¹ We subset the sample to households that made at least 10 ketchup purchases and bought at least once in each of the five half-year periods. Hence, we restrict our analysis to

¹⁰ Web Appendix B contains two additional empirical applications for the product categories detergent and cola. We omit them here because of space constraints. The relative performance of the competing types of models (excluding vs. including heterogeneity, excluding vs. including time-variation) is replicated across the three applications; however, in each case a different model specification performs best.
¹¹ The data was provided by the James M. Kilts Center, GSB, University of Chicago.


households with product class experience that report regularly (see Rutz & Sonnier, 2011 for a similar approach).¹² This minimizes the risk that parameter dynamics are confounded by heterogeneity and/or sample composition over time. This data pruning step results in a sample of 502 households making 10,292 purchases. On average each household made 20 purchases, and the median interpurchase time is about 7 weeks. The data set contains the dates of purchasing and choice of brands of each household, as well as observed (paid) prices and promotional activities.

For model evaluation, we randomly divided the total number of purchase acts (less the number of purchase acts used for initialization) into two halves (see Table 2), estimated the model based on the first subset (the estimation sample) and predicted consumer choices for the second subset (the validation sample). This enables us to employ a prediction-oriented approach for model evaluation, i.e., a model with higher complexity should be preferred only if it outperforms a more parsimonious model in validation samples. Please note that we did not split the data in the time dimension, since, in contrast to parametric approaches, it is not possible or at least difficult to predict new time observations with the semiparametric approach. Stated otherwise, without specifying additional assumptions or an additional reparametrization step there is no information ex ante how the parameter paths will evolve. On the other hand, our random split of purchasing acts across the whole estimation time-period allows us to determine whether the variation of parameters over time is a robust result and not just an artifact of data pathologies, such as reporting errors (Einav, Leibtag, & Nevo, 2010), which is important for an exploratory approach.

Fig. 1 shows for each brand weekly time-series plots for shares, prices, and promotion variables. The plots indicate several interesting features of the ketchup data set: (a) It contains a lot of variation over time. (b) The price levels of Del Monte and Hunt's decrease during the first half of the time window, while the price level of Heinz increases in the second half. (c) Brand shares seem to be rather stable, but whereas between weeks 52 and 78 the brands Del Monte and Hunt's have almost the same brand share, before and after this time window Hunt's seems to be the stronger brand. (d) Promotional intensities of the brands vary over time. This combination of changing marketing-mix variables and stable brand shares for some periods, as well as changing brand shares together with relatively stable prices for other periods implies that brand values and marketing-mix sensitivities should be changing over time. Therefore, the data set is suited for applying time-varying parameter brand choice models.

4.2. Specification of covariates

In contrast to observed prices and promotional activities, which are directly contained in our data, brand loyalties and reference prices have to be computed from each consumer's purchase history. Both inherently dynamic covariates capture (at least some) temporal correlation between purchases of households and have proven their ability to increase model fit and predictive validity (Ailawadi, Gedenk, & Neslin, 1999; Kalyanaram & Winer, 1995). Following Guadagni and Little (1983), we recursively calculated loyalty values by exponentially smoothing past purchases $Y_{i,\tau-1}^{(r)}$ of brand $r$ made by household $i$ at purchase occasion $\tau - 1$ using smoothing constant $\vartheta_{loy}$ according to:¹³

$$\text{loyalty}_{i\tau}^{(r)} = \vartheta_{loy}\, \text{loyalty}_{i,\tau-1}^{(r)} + \left(1 - \vartheta_{loy}\right) Y_{i,\tau-1}^{(r)}, \quad 0 \le \vartheta_{loy} \le 1, \tag{14}$$

where $Y_{i,\tau-1}^{(r)} = 1$ if household $i$ purchased brand $r$ on her/his last store visit (otherwise $Y_{i,\tau-1}^{(r)} = 0$).¹⁴ This exponential smoothing is very popular in the marketing literature for capturing brand loyalty (Ailawadi et al., 1999). Importantly, Kim et al. (2005), Lachaab et al. (2006), and Rutz and Sonnier (2011) did not incorporate a loyalty variable in their models. To the best of our knowledge, our study is the first that tries to estimate time-varying loyalty effects in brand choice models.

Reference price terms are also regularly included in brand choice models (see Kalyanaram & Winer, 1995). Reference prices constitute internal prices consumers expect at a purchase occasion and compare observed prices to. Observed prices exceeding the reference price are perceived as losses and may deter from purchasing, while observed prices below the reference price are perceived as gains and presumably stimulate purchases. Prospect theory postulates asymmetric effects of gains and losses (Kahneman & Tversky, 1979; Winer, 1986). To account for possibly asymmetric reference price effects, two additional price terms corresponding to losses and gains need to be included in the brand choice model. Again, Kim et al. (2005), Lachaab et al. (2006) and Rutz and Sonnier (2011) refrain from incorporating reference price variables. Hence, to the best of our knowledge, this study is also the first that models time-varying reference price effects.
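The loyalty recursion in Eq. (14) can be illustrated with a short Python sketch for a single household. The function name `loyalty_path`, the equal starting values, and the purchase sequence are assumptions made for this example; the paper instead initializes loyalties on a separate initialization sample. The smoothing constant 0.74 is the value reported in footnote 14.

```python
import numpy as np

def loyalty_path(purchases, theta_loy, initial):
    """Loyalty values of Eq. (14) for one household.

    purchases : (n_occasions, k) 0/1 indicators of the brand bought
    theta_loy : smoothing constant between 0 and 1
    initial   : (k,) loyalty values at the first purchase occasion (assumption)
    Returns an array (n_occasions, k); row tau holds the loyalties that enter
    the utility at occasion tau and depends only on earlier purchases.
    """
    purchases = np.asarray(purchases, dtype=float)
    loyalty = np.empty_like(purchases)
    loyalty[0] = initial
    for tau in range(1, purchases.shape[0]):
        loyalty[tau] = (theta_loy * loyalty[tau - 1]
                        + (1.0 - theta_loy) * purchases[tau - 1])
    return loyalty

# Hypothetical purchase history with three brands.
purchases = np.array([[1, 0, 0],
                      [1, 0, 0],
                      [0, 1, 0],
                      [1, 0, 0]])
loyalty = loyalty_path(purchases, theta_loy=0.74, initial=np.full(3, 1.0 / 3.0))
```

If the starting values sum to one, the loyalties of a household sum to one at every purchase occasion, because exactly one brand is bought each time.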

¹² In general, we recommend not using data sets with only a few observations per household (e.g., <4) for heterogeneous models, in particular when the data set is randomly split into two halves for estimation and validation. In the latter case, each household must appear in both the estimation and validation sample, which becomes difficult with sparse data at the individual household level.
¹³ Note that each purchase occasion τ of a household can be related to a specific time t, but a household does not necessarily purchase every time period (e.g., week) or can even purchase multiple times within a time period. Stated otherwise, the model framework does not require a balanced panel.
¹⁴ Instead of setting the smoothing constant to a fixed value (e.g., Gupta, 1988), we estimated a value of 0.74 for ϑloy by a grid search within the interval [0,1] based on the standard MNL model with constant effects. This value is comparable to the results of other papers that used the same data set (e.g., Keane, 1997; Seetharaman, 2004), and is in line with results from other data sets (e.g., Briesch et al., 1997). Although an estimation of ϑloy using a model with time-varying parameters would be better, the considerable increase in the computational effort prevents us from doing so.


Table 2
Summary statistics for the ketchup data set (502 households; 134 weeks).

| Brand name | No. purchases (estimation*) | No. purchases (validation*) | Price ($-cent per oz.): Mean | Price ($-cent per oz.): sd | Promotion (% of purchases): Display | Promotion (% of purchases): Feature |
|---|---|---|---|---|---|---|
| Del Monte | 416 | 449 | 3.651 | 0.751 | 0.050 | 0.116 |
| Heinz | 3855 | 3796 | 4.071 | 0.889 | 0.090 | 0.269 |
| Hunt's | 882 | 894 | 3.570 | 0.648 | 0.055 | 0.107 |
| Total | 5153 | 5139 | | | | |

Note: *We used about one-third of all purchase acts (not reported in this table) to initialize brand loyalties and reference prices for the households.

[Figure: four panels (share, price, display, feature), each plotted against week for the brands Del Monte, Heinz, and Hunt's.]

Fig. 1. Time-series of brand shares, prices, and promotion variables (ketchup data).

In many empirical applications, reference prices (refprice) have been derived from prices paid in the past according to the framework of adaptive expectations by the following adaptive process (e.g., Lattin & Bucklin, 1989; Kalyanaram & Little, 1994; or Abe, 1998):

$$\text{refprice}_{i\tau}^{(r)} = \vartheta_{ref}\, \text{refprice}_{i,\tau-1}^{(r)} + \left(1 - \vartheta_{ref}\right) \text{price}_{i,\tau-1}^{(r)}, \quad 0 \le \vartheta_{ref} \le 1. \tag{15}$$
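A minimal Python sketch of the adaptive process in Eq. (15), together with the gain and loss terms (the positive parts of reference price minus observed price and of observed price minus reference price), is given below for one household and one brand. The function names, the starting reference price, and the price series are hypothetical; the smoothing constant 0.69 is the value reported in footnote 15.

```python
import numpy as np

def reference_prices(prices, theta_ref, initial):
    """Reference prices of Eq. (15) for one household and one brand.

    prices    : (n_occasions,) observed prices at the purchase occasions
    theta_ref : smoothing constant between 0 and 1
    initial   : reference price at the first occasion (assumption)
    """
    prices = np.asarray(prices, dtype=float)
    ref = np.empty_like(prices)
    ref[0] = initial
    for tau in range(1, prices.size):
        ref[tau] = theta_ref * ref[tau - 1] + (1.0 - theta_ref) * prices[tau - 1]
    return ref

def gain_loss(prices, ref):
    """Gain = max(ref - price, 0) and loss = max(price - ref, 0)."""
    prices = np.asarray(prices, dtype=float)
    gain = np.maximum(ref - prices, 0.0)
    loss = np.maximum(prices - ref, 0.0)
    return gain, loss

# Hypothetical price series in $-cent per oz.
prices = np.array([4.0, 4.0, 3.0, 4.5])
ref = reference_prices(prices, theta_ref=0.69, initial=4.0)
gain, loss = gain_loss(prices, ref)
```

At every occasion at most one of the two terms is positive, and their difference equals the reference price minus the observed price.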

When comparing five different reference price formation models, Briesch, Krishnamurthi, Mazumdar, and Raj (1997) found this adaptive approach to outperform other operationalizations. Given the reference price of household $i$ for brand $r$ at purchase occasion $\tau$, gain and loss arise from $\max(\text{refprice}_{i\tau}^{(r)} - \text{price}_{i\tau}^{(r)}, 0)$ and $\max(\text{price}_{i\tau}^{(r)} - \text{refprice}_{i\tau}^{(r)}, 0)$, respectively.¹⁵ Promotional activities are operationalized by using two dummy variables describing the presence or absence of features and displays for each brand and purchase act, respectively. Summing up, all model specifications include brand intercepts, price, gain, loss, promotion (feature and display), and loyalty as covariates. Effects on deterministic utilities and brand choice probabilities are expected to be positive for gains, promotional variables, and brand loyalties, whereas effects of prices and losses are supposed to be negative. All effects are allowed to vary over time.

¹⁵ Based on the standard MNL model with constant effects, we again estimated the smoothing constant ϑref via grid search and obtained a value of 0.69. The value is slightly higher than the results reported in Briesch et al. (1997) but still comparable (0.47–0.65). The same argument as before also applies here: a grid search using a model with time-varying parameters would have been better, but the very high computational effort prevented us from doing so. However, we do not expect results to be sensitive to this decision.

4.3. Model comparison

The main purpose of our empirical application is to compare the performance of the different model versions introduced in Section 3.1 (as well as hybrid versions of them). This comparison enables us to evaluate the gains from using more flexible models, which incorporate time-varying parameters and/or account for consumer heterogeneity. Furthermore, the different versions of the models with time-varying parameters should shed light on the question which specification for parameter dynamics works best. In addition, it is also important to evaluate the performance of the proposed approach relative to the currently best available competing models.

Table 3 summarizes the models used in the comparison and describes how they are related. The nonparametric models differ in the number of knots ($0.3 \cdot T$, $0.5 \cdot T$, or $T$), the degree of the spline (zero-order or cubic), and the type of penalty term (1st or 2nd order differences). Note that some models use different settings of these characteristics for different time-varying parameters. For example, models (10) and (11) use the very flexible random walk specification of model (9) for brand intercepts, but the less flexible specification of models (7) and (8), respectively, for the time-varying effects of covariates. We also include a parametric model (MNL-TVP4) with quadratic trends in both the brand intercepts and all covariate effects across weeks, which is a model specification practitioners would most likely use to accommodate time-varying parameters. Further, we add a parametric model with individual dynamics (MXL-TVP4). It contains heterogeneous quadratic trends for the brand intercepts and has similarities to Hasegawa et al. (2012). Accordingly, this model can be stated as

$$\eta_{it}^{(r)} = \left(\alpha_0^{(r)} + \alpha_{0i}^{(r)}\right) + \left(\alpha_1^{(r)} + \alpha_{1i}^{(r)}\right) t + \left(\alpha_2^{(r)} + \alpha_{2i}^{(r)}\right) t^2 + x_{it}^{(r)\prime} \beta + x_{it}^{(r)\prime} b_i, \tag{16}$$

where $\alpha_0^{(r)}$, $\alpha_1^{(r)}$, and $\alpha_2^{(r)}$ determine the mean quadratic trends for the intrinsic brand utilities, and $\alpha_{0i}^{(r)} \sim N(0, \sigma_{0r}^2)$, $\alpha_{1i}^{(r)} \sim N(0, \sigma_{1r}^2)$, and $\alpha_{2i}^{(r)} \sim N(0, \sigma_{2r}^2)$ account for household heterogeneity in those quadratic trends. Lastly, we also add two models (MXL-VAR and MXL-RVAR) as proposed by Kim et al. (2005) to have a direct comparison to their Bayesian approach. In particular, we use the VAR(1) model and a restricted version thereof (RVAR(1)), which fitted best in Kim et al. (2005).

Table 4 reports fit measures for each model in the estimation and validation sample, respectively. Irrespective of the scoring rule the model ranking does not change, enhancing confidence in the results. Models with heterogeneity and parameter dynamics fit the data better (in- and out-of-sample) than models without heterogeneity and/or parameter dynamics. Of course, in-sample performance was expected to be better for more complex models.

The simplest model with dynamics (MNL-TVP4) shows only slight improvements over the static MNL model both in- and out-of-sample. Hence, parametric dynamics (in form of quadratic trends) are not sufficient for modeling complex parameter evolutions in this data set.

The models of Kim et al. (2005) have by far the best in-sample fit, but perform out-of-sample even worse than the simple parametric MXL model (i.e., the heterogeneous model without time-varying parameters). This indicates strong overfitting of the MXL-VAR and MXL-RVAR models and we conclude that these models are not suitable for modeling dynamics in this data set.¹⁶ The MXL-TVP4 also shows clear indication of overfitting. If we ignore the models of Kim et al. (2005), the MXL-TVP4 model has the best in-sample fit according to the log-Likelihood and the spherical score. However, the predictive performance of the MXL-TVP4 model is rather poor and in the range of the simple MXL model.
Focusing on the rest of the models, we observe that for the individual-level scoring rules (log-Lik, Brier score, spherical score) improvements in model fit due to heterogeneity turn out somewhat larger than improvements due to parameter dynamics. This result is consistent with the findings of Lachaab et al. (2006) and underlines the importance of accounting for heterogeneity in brand choice models (Allenby & Rossi, 1999). Further, the improvements in model fit are smaller in the validation sample than in the estimation sample, indicating that precise individual-level estimates are rather difficult to obtain. Relative improvements due to accommodating time-varying parameters are more or less the same in- and out-of-sample. Therefore, the estimated parameter paths (to be discussed in more detail in the next subsection) are not an artifact and applying a model with time-varying parameters is advisable. The best-fitting model out-of-sample is the MXL-TVP32 model, which is a “hybrid model” with random walk dynamics for the brand intercepts and “smoother” dynamics for the covariates. This is interesting because it shows that the model with the highest flexibility in all parameters (i.e., the MXL-TVP3 model) is not necessarily the best model. On the other hand, a model which is very smooth in all parameters (i.e., the MXL-TVP1) does not predict best either.¹⁷ This finding highlights the benefit of using a modeling toolbox with diverse specification options in the context of brand choice models with time-varying parameters. The brand share analysis mostly confirms the results discussed before. However, on the aggregate level heterogeneity seems clearly less important and larger improvements in ARMSE are due to parameter dynamics. Again, the MXL-TVP32 model shows the highest fit out-of-sample (and also in-sample if we ignore the strongly overfitting MXL-VAR and MXL-RVAR models), closely followed by the MXL-TVP31 and MXL-TVP3 models.¹⁸
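The fit measures of Eqs. (10) to (13), on which the comparison above rests, can be sketched in Python as follows. The function name `fit_measures`, the input layout, and the example values are assumptions made for illustration. The ARMSE is computed over the weeks that contain at least one purchase and, as a simplification, without the weighting by the number of purchases per week mentioned in Section 3.4.

```python
import numpy as np

def fit_measures(probs, chosen, weeks):
    """Log-likelihood, Brier score, spherical score, and (unweighted) ARMSE.

    probs  : (N, k) predicted choice probabilities per purchase act
    chosen : (N,)   index of the brand actually chosen
    weeks  : (N,)   week label of each purchase act
    """
    probs = np.asarray(probs, dtype=float)
    chosen = np.asarray(chosen)
    weeks = np.asarray(weeks)
    n_obs, k = probs.shape
    y = np.eye(k)[chosen]                              # binary choice indicators
    p_chosen = probs[np.arange(n_obs), chosen]         # probability of chosen brand
    log_lik = np.log(p_chosen).sum()                   # Eq. (10)
    brier = -np.sum((y - probs) ** 2)                  # Eq. (11)
    spherical = np.sum(p_chosen / np.sqrt(np.sum(probs ** 2, axis=1)))  # Eq. (12)
    # Weekly actual and predicted brand shares for weeks with purchases.
    observed_weeks = np.unique(weeks)
    actual = np.array([y[weeks == w].mean(axis=0) for w in observed_weeks])
    predicted = np.array([probs[weeks == w].mean(axis=0) for w in observed_weeks])
    armse = np.mean(np.sqrt(np.mean((actual - predicted) ** 2, axis=0)))  # Eq. (13)
    return {"log_lik": log_lik, "brier": brier,
            "spherical": spherical, "armse": armse}

# Hypothetical predictions for four purchase acts and three brands.
probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.5, 0.2]])
chosen = np.array([0, 1, 2, 1])
weeks = np.array([0, 0, 1, 1])
measures = fit_measures(probs, chosen, weeks)
```

All four measures are oriented as in the text: larger values of the log-likelihood, Brier score, and spherical score indicate a better fit, whereas a smaller ARMSE is better.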

4.4. Discussion of results

In this section, we discuss our estimation results for a subset of the estimated models in more detail. In particular, we compare the MXL-TVP1, MXL-TVP3, and MXL-TVP32 models to see how their parameter paths differ. The three models are among those with the highest predictive validities, with the MXL-TVP32 showing the best performance across all predictive validity measures.19

16 In Web Appendix B1 we analyze and discuss both models (MXL-VAR and MXL-RVAR) in greater detail.
17 The MXL-TVP32 model also has a better fit in-sample compared to the MXL-TVP1 and MXL-TVP3 models.
18 In a recent working paper (Baumgartner et al., 2018), we compare the class of TVP-MNL models (i.e., models without heterogeneity) in a different empirical application to a number of further benchmark models, among others the nonparametric model of Baumgartner (2003), dynamic models with seasonal effects and a model with brand-week fixed-effects. In addition, we extend the TVP-MNL models to consider alternative-specific instead of generic covariate effects.
19 According to Van Heerde, Leeflang, and Wittink (2002), marketing managers should favor the model with the highest predictive performance. For the sake of clarity, we do not include the MXL-TVP2 and MXL-TVP31 models in the figures. The predictive accuracy of the MXL-TVP2 lies in-between the MXL-TVP1 and the MXL-TVP3, the MXL-TVP31 performs almost as well as the MXL-TVP3 (please compare Table 4). Confidence bounds are included in subsequent figures.


Table 3 Model specifications.

| Model | Heterogeneity | Number of knots | Degree of spline | Roughness penalty | Description |
|---|---|---|---|---|---|
| 1) MNL | No | – | – | – | Simple MNL model |
| 2) MNL-TVP1 | No | 0.3 ∙ T⁎ | cubic | 2nd order | MNL with smooth time-varying parameters |
| 3) MNL-TVP2 | No | 0.5 ∙ T | cubic | 2nd order | MNL-TVP1 with more knots |
| 4) MNL-TVP3 | No | T | zero-order | 1st order | MNL with smooth time-varying parameters and as many knots as weeks; mimics the KF model of Lachaab et al. (2006) |
| 5) MNL-TVP4 | No | – | – | – | MNL model with quadratic trends in all parameters |
| 6) MXL | Yes | – | – | – | Simple MXL model |
| 7) MXL-TVP1 | Yes | 0.3 ∙ T | cubic | 2nd order | MXL with smooth time-varying parameters |
| 8) MXL-TVP2 | Yes | 0.5 ∙ T | cubic | 2nd order | MXL-TVP1 with more knots |
| 9) MXL-TVP3 | Yes | T | zero-order | 1st order | MXL with smooth time-varying parameters and as many knots as weeks; mimics the KF model of Lachaab et al. (2006) |
| 10) MXL-TVP31 | Yes | 0.3 ∙ T⁎⁎, T⁎⁎⁎ | cubic⁎⁎, zero-order⁎⁎⁎ | 2nd order⁎⁎, 1st order⁎⁎⁎ | Hybrid version of MXL-TVP1 and MXL-TVP3; more flexibility for intercepts, more smoothness for covariate effects |
| 11) MXL-TVP32 | Yes | 0.5 ∙ T⁎⁎, T⁎⁎⁎ | cubic⁎⁎, zero-order⁎⁎⁎ | 2nd order⁎⁎, 1st order⁎⁎⁎ | Hybrid version of MXL-TVP2 and MXL-TVP3; more flexibility for intercepts, more smoothness for covariate effects |
| 12) MXL-TVP4 | Yes | – | – | – | MXL with heterogeneous quadratic trends for brand intercepts; has similarities to Hasegawa et al. (2012) |
| 13) MXL-VAR | Yes | – | – | – | VAR(1) model of Kim et al. (2005) |
| 14) MXL-RVAR | Yes | – | – | – | RVAR(1) model of Kim et al. (2005) |

Note: ⁎ T is the number of time periods (e.g., weeks in our applications) and we round the resulting number of knots to the nearest integer. ⁎⁎ For effects of covariates. ⁎⁎⁎ For brand intercepts.
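To make the knot and penalty columns of Table 3 more concrete, the following minimal sketch shows how the number of knots could be derived from T and how first- and second-order difference penalties can be built. It is an illustration under stated assumptions, not the authors' BayesX setup: the value `T = 130` is hypothetical, and the helper names `n_knots` and `difference_penalty` are invented for this example. The sketch also illustrates why a zero-degree spline with one knot per week and a first-order penalty behaves like a random walk: the penalty is the sum of squared week-to-week changes of the coefficients.

```python
import numpy as np

def n_knots(fraction, n_periods):
    """Round fraction * T to the nearest integer (halves are rounded up)."""
    return int(np.floor(fraction * n_periods + 0.5))

def difference_penalty(n_coef, order):
    """Return K = D'D, where D takes order-th differences of the coefficients."""
    D = np.diff(np.eye(n_coef), n=order, axis=0)
    return D.T @ D

T = 130  # hypothetical number of weeks, chosen for illustration only
knots_tvp1 = n_knots(0.3, T)  # specification with 0.3 * T knots
knots_tvp2 = n_knots(0.5, T)  # specification with 0.5 * T knots

# Zero-degree B-splines with one knot per week are indicator functions of the
# weeks, so the design matrix is the identity: one coefficient per week.
B_tvp3 = np.eye(T)

# First-order penalty: penalizes week-to-week changes (random walk behavior).
K1 = difference_penalty(T, order=1)
gamma = np.cumsum(np.random.default_rng(0).normal(size=T))
penalty_value = gamma @ K1 @ gamma

# Second-order penalty: penalizes changes in slope, so a linear trend is free.
K2 = difference_penalty(10, order=2)
linear_trend = np.arange(10, dtype=float)
penalty_linear = linear_trend @ K2 @ linear_trend
```

In this sketch `penalty_value` equals the sum of squared first differences of `gamma`, while `penalty_linear` is zero. This is one way to see why the second-order penalty yields "smoother" paths, whereas the first-order penalty with one coefficient per week allows weekly fluctuations.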

We further add the simple MXL as a benchmark. Recall that models accounting for heterogeneity consistently outperformed homogeneous models (see Table 4). Fig. 2 depicts the estimated parameter paths for the four models, with time t (weeks) on the x-axis and estimated parameters for brand intercepts and covariate effects on the y-axis. The dotted black line represents the mean of the (time-constant) parameters for the MXL model, the dashed green (MXL-TVP1), dash-dotted red (MXL-TVP3), and solid blue (MXL-TVP32) lines show the means of the time-varying parameters. The horizontal lines for the MXL model can be interpreted as average results over time. Please note the different interpretation of the value 0 for the brand intercepts on the one hand and the covariate effects (i.e., price, gain, loss, loyalty, display, and feature) on the other hand. Concerning the brand intercepts, the value 0 refers to the utility of the reference brand (i.e., Hunt's). Regarding the covariate effects, the value 0 implies that the covariate has no effect on the utility of the consumer. The time-varying parameters show noticeable similarities across models. Therefore, we first focus on common patterns across the three TVP models and then discuss model differences.

4.4.1. Brand intercepts

Throughout the whole observation period, Del Monte's intrinsic brand utility has been lower than that of Hunt's. Except for some weeks in the last third of the data set, the intercept of Heinz is larger than 0. Heinz has, therefore, the largest brand value followed by Hunt's and Del Monte. The ordering of the brand values is also the same for the MXL model with constant parameters. However, this model ignores the considerable changes in the brand values over time. The models with time-varying parameters reveal, e.g., that Del Monte's brand intercept increases over the first 52 weeks and then decreases during the second year of the data set.
This inverse U-shaped brand value evolution coincides with a U-shaped evolution of its weekly average prices during this time span: the average price level of Del Monte first declines, reaching the minimum around week 60, and subsequently stabilizes at a slightly higher value (see Fig. 1). A similar price-utility pattern is observed for Heinz ketchup. The brand intercept of Heinz has been considerably higher during the first half of the data set (with a maximum at week 55) and then drops to a lower level. This pattern matches the increase in the (average) price level of Heinz in the second half of the data set. Stated differently, the influence of price changes on a consumer's deterministic utility may not only be reflected by a time-varying price parameter but may also be captured in the brand values. Indeed, in stable categories like ketchup where people most likely will not enter or leave the market because of a price change of approximately 0.5 $-cent per ounce, it makes sense that such changes will be absorbed to some degree in the brand intercepts, at least in the short run.20

In sum, the time-varying models provide interesting (explorative) insights with respect to the evolution of brand value over time, which is an inherently dynamic construct (Erdem et al., 1999). Furthermore, even though the brand shares vary over time and also lead on average to the same brand ordering (see Fig. 1), the shares are further driven by the marketing-mix of all competitors, as well as brand loyalty and reference price effects (see Kamakura & Russell, 1993). In particular, the gap between the top-brand Heinz and its competitors appears to be much larger based on the brand shares than the gap based on the comparison of the time-varying brand intercepts. The latter measure of brand performance is thus more informative because it shows differences in a more nuanced way and net of confounding effects. Therefore, a brand choice model with time-varying brand intercepts is a valuable tool for brand value measurement and monitoring (Sriram et al., 2007). Using disaggregated data (instead of market shares) further enables the researcher to control for heterogeneity as well as brand loyalty more easily, which is important for an unbiased measurement of effects (see Keller & Lehmann, 2006).

20 In a second and unrelated step, we reparameterized the time-varying brand intercepts (each one at a time) of the best-fitting model MXL-TVP32 (compare Table 4, model 11) so as to depend on past own prices and past own promotional activities, following the intercept process function specification used in Foekens et al. (1999). In particular, we computed for each brand and week the weighted sum of past prices and past promotional intensities (e.g., percent display across observed purchase acts in a week) with an exponentially declining weight within a time window of 6 weeks. We further used the inverse sampling variance of the estimated time-varying intercepts of the first stage as regression weights to account for heteroscedasticity. Although the estimated effects for the three covariates are well-interpretable (e.g., higher past own prices have a negative effect on current brand values), the explained variances of these time-series regressions were fairly low (31% for Del Monte, 19% for Heinz). Results can be obtained from the authors upon request.

Table 4 Fit and predictive validity (ketchup data).

Data set: estimation

| Model | Log-Lik | Brier score | Spherical score | ARMSE |
|---|---|---|---|---|
| 1) MNL | −2503.536 | −1382.973 | 4363.311 | 0.0721 |
| 2) MNL-TVP1 | −2448.143 | −1349.029 | 4383.192 | 0.0618 |
| 3) MNL-TVP2 | −2446.975 | −1348.324 | 4383.597 | 0.0616 |
| 4) MNL-TVP3 | −2377.526 | −1310.781 | 4406.645 | 0.0495 |
| 5) MNL-TVP4 | −2486.439 | −1371.497 | 4368.986 | 0.0701 |
| 6) MXL | −1909.215 | −1062.927 | 4543.255 | 0.0639 |
| 7) MXL-TVP1 | −1828.434 | −1010.848 | 4573.909 | 0.0550 |
| 8) MXL-TVP2 | −1813.228 | −1003.968 | 4578.426 | 0.0548 |
| 9) MXL-TVP3 | −1792.563 | −990.114 | 4586.641 | 0.0476 |
| 10) MXL-TVP31 | −1792.481 | −990.020 | 4586.673 | 0.0476 |
| 11) MXL-TVP32 | −1784.468 | −987.248 | 4588.865 | 0.0475 |
| 12) MXL-TVP4 | −1771.250 | −987.683 | 4589.098 | 0.0612 |
| 13) MXL-VAR | −1187.755 | −678.219 | 4767.558 | 0.0332 |
| 14) MXL-RVAR | −1438.816 | −820.854 | 4689.401 | 0.0409 |

Data set: validation

| Model | Log-Lik | Brier score | Spherical score | ARMSE |
|---|---|---|---|---|
| 1) MNL | −2684.858 | −1502.482 | 4272.380 | 0.0777 |
| 2) MNL-TVP1 | −2653.468 | −1484.072 | 4283.621 | 0.0723 |
| 3) MNL-TVP2 | −2652.923 | −1483.749 | 4283.809 | 0.0722 |
| 4) MNL-TVP3 | −2631.779 | −1473.027 | 4290.508 | 0.0693 |
| 5) MNL-TVP4 | −2669.448 | −1495.612 | 4275.778 | 0.0743 |
| 6) MXL | −2420.971 | −1370.856 | 4343.625 | 0.0741 |
| 7) MXL-TVP1 | −2385.807 | −1346.748 | 4358.683 | 0.0695 |
| 8) MXL-TVP2 | −2379.836 | −1344.226 | 4360.079 | 0.0694 |
| 9) MXL-TVP3 | −2369.880 | −1337.959 | 4363.315 | 0.0669 |
| 10) MXL-TVP31 | −2370.512 | −1338.788 | 4362.793 | 0.0668 |
| 11) MXL-TVP32 | −2366.492 | −1336.935 | 4363.856 | 0.0667 |
| 12) MXL-TVP4 | −2416.059 | −1372.683 | 4342.660 | 0.0732 |
| 13) MXL-VAR | −4699.364 | −1836.804 | 4132.380 | 0.0877 |
| 14) MXL-RVAR | −4098.767 | −1819.550 | 4124.422 | 0.0861 |

Note: Best-fitting model indicated in bold within each data set for each performance measure.

4.4.2. Price-related covariates

The estimated parameters for the covariates price, gain, and loss show the expected signs (negative for price and loss, positive for gain). The price effect decreases (in absolute terms) over the time period of the data set, indicating that households on average become less price-sensitive. This decreased price sensitivity might be driven by the increase in the average price level of the leading brand Heinz (see Lachaab et al., 2006 for a similar argument regarding changes in price elasticities). The gain parameter reveals some kind of U-shaped pattern (MXL-TVP1, MXL-TVP32) but in any case increases strongly toward the end of the time window (MXL-TVP1, MXL-TVP32, and MXL-TVP3), whereas the loss parameter remains almost constant over time. The loss effect is further not significantly different from zero (as indicated by the credible intervals of all three MXL-TVP model versions, see Figs. 3 and 4 below). Please note, however, that the mean estimates of the models depicted in Fig. 2 also account for heterogeneity and, therefore, at least some households are significantly loss-averse. It is important to know the impact of household heterogeneity in reference price effects for optimal pricing policies, as has been shown by Kopalle, Kannan, Boldt, and Arora (2012). Considering time-varying parameters, computing optimal prices is certainly even more difficult. Nevertheless, in the case of distinct changes, like the ones depicted in Fig. 2 for the gain effect, accommodating time-variation for optimal pricing could be potentially profitable.

4.4.3. Brand-loyalty

The loyalty effect shows the expected positive sign and is (slightly) decreasing in the course of time. Again, regarding optimal pricing policies, a similar argument as before applies here. Dubé, Hitsch, Rossi, and Vitorino (2008) discuss optimal pricing with
state-dependent utility, and being able to measure time-varying effects would be the first step to extend their approach to a situation where the effect of brand-loyalty is changing over time.

Fig. 2. Estimated parameter paths (MXL-models; ketchup data).
[Figure panels: intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature; x-axis: week; y-axis: parameter value; models: MXL, MXL-TVP1, MXL-TVP3, MXL-TVP32.]

4.4.4. Promotion covariates

The plots for the effects of display and feature advertising reveal interesting patterns, too. Both covariates have a positive and significant effect on utility (except for the feature effect during the very first weeks, where the credible intervals contain 0, see Figs. 3 and 4 below). The feature effect markedly increases over time according to an inverted s-shaped trend, ending up at least twice as high at the end of the time window as at the beginning. In contrast, the display effect is fairly stable in the long term and starts to increase only during the last third of the time window. For a retailer, such changes are also important to know in order to plan and set up promotional activities efficiently.

4.4.5. MXL-TVP1 vs. MXL-TVP3

Fig. 3 explicitly contrasts the estimated parameter paths of those two specifications and now also displays the corresponding 95% point-wise credible intervals (shaded). The comparison shows that some of the estimated parameter paths are highly similar (e.g., for the brand intercepts of Heinz) or only differ in their level of smoothness but not in their course (e.g., for the brand intercepts of Del Monte), while other effects turn out quite different (e.g., see the gain parameter evolutions).

Fig. 3. Estimated parameter paths (MXL-TVP1 vs. MXL-TVP3; ketchup data).
[Figure panels: intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature; x-axis: week; y-axis: parameter value; models: MXL-TVP1, MXL-TVP3.]

Note that the 95% point-wise credible intervals overlap for all parameter estimates of the two models. Therefore, despite the fact that the MXL-TVP3 model fits the data better in- and out-of-sample compared to the MXL-TVP1, the resulting estimates for the parameter paths are statistically indistinguishable. Still, except for the mean loss effect (which is not significant over the whole time period), we clearly observe a greater temporal flexibility of the MXL-TVP3 model, even though its parameter paths often only fluctuate around the smoother parameter paths of the MXL-TVP1 model (see, e.g., the brand intercepts or the feature effect).

Further, both models differ in their parameter evolutions for the price and gain effects. While the MXL-TVP1 model is less flexible in general, its price parameter path reveals larger amplitudes and consequently more variation over the whole time window as compared to the MXL-TVP3 model. Moreover, the estimated gain effect for the MXL-TVP3 model is clearly different from the U-shape evolution of the MXL-TVP1 model and shows considerably more variation at the weekly level (i.e., is much more wiggly). At the same time, the credible interval of the MXL-TVP3 model contains 0 most of the time, suggesting that the mean gain effect is not significant. Finally, the decreasing trend for the loyalty effect is less distinctive for the MXL-TVP3 model.

Altogether, it seems that the higher flexibility of the MXL-TVP3 model is advantageous compared to the “smoother” MXL-TVP1 model as it provides a better fit in- and out-of-sample (see Table 4). The general managerial insights regarding changes in brand values as well as price and promotion sensitivities are nevertheless often similar. Note that the less flexible MXL-TVP1 model seems to allow a more precise measurement of brand values compared to the MXL-TVP3 model, as indicated by narrower credible intervals at most points in time (this also holds for the detergent data set, as is illustrated in Appendix B).
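The interval-based comparisons in this subsection (overlap of the 95% point-wise credible intervals of two models, and whether an interval contains 0) can be sketched as follows. This is a minimal illustration on synthetic posterior draws, not the authors' code or data; the function names, the number of weeks, and the simulated paths are all hypothetical. Note also that overlap of point-wise intervals is an informal check rather than a formal test of equality.

```python
import numpy as np

def pointwise_interval(draws, level=0.95):
    """Equal-tailed point-wise interval from draws of shape (n_draws, n_weeks)."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return lower, upper

def intervals_overlap(lower_a, upper_a, lower_b, upper_b):
    """True for each week in which the two intervals share at least one point."""
    return (lower_a <= upper_b) & (lower_b <= upper_a)

def contains_zero(lower, upper):
    """True for each week in which the interval covers the value 0."""
    return (lower <= 0.0) & (upper >= 0.0)

# Synthetic posterior draws for one parameter path under two specifications.
rng = np.random.default_rng(1)
weeks = np.arange(130)                      # hypothetical number of weeks
trend = -1.0 + 0.005 * weeks                # smooth underlying path
draws_smooth = trend + rng.normal(scale=0.15, size=(2000, weeks.size))
draws_wiggly = (trend + 0.1 * np.sin(weeks / 3.0)
                + rng.normal(scale=0.30, size=(2000, weeks.size)))

lo_s, up_s = pointwise_interval(draws_smooth)
lo_w, up_w = pointwise_interval(draws_wiggly)

overlap = intervals_overlap(lo_s, up_s, lo_w, up_w)
share_overlapping = overlap.mean()          # fraction of weeks with overlap
mean_width_smooth = np.mean(up_s - lo_s)    # narrower: more precise path
mean_width_wiggly = np.mean(up_w - lo_w)
zero_inside_wiggly = contains_zero(lo_w, up_w)
```

In this synthetic setting the smoother specification produces narrower intervals, which mirrors the qualitative observation above that the less flexible model measures brand values more precisely.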

Fig. 4. Estimated parameter paths (MXL-TVP3 vs. MXL-TVP32; ketchup data).
[Figure panels: intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature; x-axis: week; y-axis: parameter value; models: MXL-TVP3, MXL-TVP32.]

4.4.6. MXL-TVP3 vs. MXL-TVP32

The model with the best predictive performance, the MXL-TVP32, uses the most flexible specification for the brand intercepts (following Lachaab et al., 2006), but “smoother” dynamics for covariate effects (see Table 3). Fig. 4 explicitly contrasts this hybrid model with the MXL-TVP3 model (which consistently imposes the random walk specification according to Lachaab et al., 2006 on all effects). The motivation behind the MXL-TVP32 is the idea that brand values may be rather volatile and would then require a more flexible specification, while changes in the effects of covariates may be expected to be smoother. If this assumption holds, the MXL-TVP3 model should be prone to overfit covariate effects, which might, in turn, entail a worse out-of-sample fit. The predictive validity results (see Table 4) interpreted together with the estimated parameter paths (Fig. 4, also Fig. 2) support this argument, because the MXL-TVP32 performs even better than the MXL-TVP3 with regard to all predictive validity measures. The MXL-TVP3 seems to overfit some covariate effects (especially the gain effect) here. While the price sensitivity decreases in a linear way according to the MXL-TVP32, the MXL-TVP1 (and also the MXL-TVP3 in less pronounced form) suggests a cyclic pattern, as discussed above. As expected, the parameter paths for the brand intercepts for the MXL-TVP3 and the MXL-TVP32 model are virtually identical. Altogether, the MXL-TVP32 seems to combine two favorable properties leading to a better model performance than the MXL-TVP3 in the ketchup category: high flexibility to model rather volatile brand values (brand intercepts), and less flexibility to model more smoothly evolving covariate effects.21

21 Please note that the model performance depends on the product category and on the characteristics of the data set at hand, as is illustrated in Web Appendix B.


4.5. Managerial implications

Instead of overall fit and predictive validity statistics, product or brand managers may be more interested in findings on the individual brand level and related improvements resulting from applying a more complex model. From a managerial point of view, the comparison of the following four models seems reasonable: (1) the MNL as the simplest benchmark model (no heterogeneity, no time-varying parameters), (2) the MXL as the standard approach in research and practice for modeling brand choice (with heterogeneity, no time-varying parameters), (3) the MNL-TVP3 as the best performing model with parameter dynamics only (no heterogeneity, with time-varying parameters, see Table 3), and (4) the MXL-TVP32 as our model with the best out-of-sample performance across all models (with heterogeneity, with time-varying parameters). Table 5 reports RMSE values in shares for each ketchup brand (in- and out-of-sample), and reveals to which extent the brands benefit from greater model flexibility. We further report the improvement in fit for the MXL, MNL-TVP3, and MXL-TVP32 models over the MNL model to gain further understanding of which component (heterogeneity, time variability) increases the model fit the most.

As expected, in-sample fit for all individual brands improves for the more flexible models. Furthermore, accommodating parameter dynamics provides a larger improvement (MNL-TVP3 vs. MNL) compared to accounting for heterogeneity only (MXL vs. MNL). This is remarkable because the marketing literature predominantly focuses on the importance of heterogeneity. However, our results suggest that homogeneous models with time-varying parameters perform at least equally well regarding brand-level share predictions. Finally, the MXL-TVP32 model has the lowest RMSE values for all brands and provides the largest improvements over the MNL. Thus, both components (heterogeneity and dynamics) are relevant for computing brand-level shares accurately (at least for the ketchup data). A closer examination of the results of the MXL-TVP32 model reveals that the highest improvements in RMSE are obtained for Heinz (+35.46%) and Hunt's (+30.08%), the two brands with the highest brand share fluctuations (see Fig. 1) and considerably changing brand values (see Fig. 2). Concerning predictive validity, the results indicate a superior performance of the MXL-TVP32 for all brands, with improvements in RMSE over the standard MNL model ranging from 9.63% (Del Monte) up to 15.09% (Hunt's). Even though the RMSE improvements in the validation sample are smaller than in-sample, the improvements are still substantial. Altogether, the brand-level RMSE results confirm our previous results regarding model performance (see Table 4).

Fig. 5 contrasts actual brand shares (grey lines) and brand shares predicted from both the MXL-TVP32 model (blue lines) and the standard MXL model (black lines) in the validation sample. The differences in share predictions can be attributed to the additional consideration of time-varying parameters in the MXL-TVP32 since heterogeneity has been accommodated in both models. The plots refer to Hunt's (left diagram) and Heinz (right diagram), i.e., the brands with the largest improvements in predictive validity when allowing for time-varying parameters. The plot referring to Hunt's ketchup reveals a markedly high discrepancy between the performance of the two models around weeks 60 and 100. In these periods, the MXL-TVP32 model can approximate brand shares rather well, while the standard MXL is far from providing accurate predictions. Recall that Hunt's intrinsic brand utility decreased toward Del Monte's around week 60. Also for Heinz, we see that the more flexible MXL-TVP32 model approximates brand shares better than the MXL model.
For example, in weeks 90 to 100 the MXL clearly overpredicts brand shares. The MXL model is not flexible enough to provide predictions that match the dynamic developments in the data, even though we include several dynamic constructs, namely the brand loyalty variable and the reference price variables (gain, loss), in contrast to Kim et al. (2005), Lachaab et al. (2006), or Rutz and Sonnier (2011). In other words, the improvements in predictive performance can be fully attributed to changes in parameters and are not due to well-established dynamic constructs because all models (even our models with constant parameters) control for these dynamic components.

Overall, these findings suggest the use of time-varying parameter models and should encourage firms and managers to adopt more flexible models to detect time-varying effects. If time-varying effects exist and are expected to recur, the use of TVP models can help managers to understand short-term peaks or dips in demand more accurately, so that out-of-stock situations or increased inventories may be prevented. If changes and fluctuations in demand or market shares are not foreseeable and thus not fully predictable, managers can use the model to detect and analyze changes in consumer behavior in response to marketing mix decisions in due time. This way, even though the decision maker cannot predict the future, he/she “is one step (but only one step) behind” (Amman & Kendrick, 2003). In the context of our ketchup data, “one step” refers to one week, which we consider a reasonably short time interval for practical purposes. In addition, estimated parameter paths can be reparameterized in a second step so as to depend on further covariates in order to identify drivers of the time-varying effects.

Table 5 RMSE values in shares at the individual brand level (improvement in % over MNL).

Data set: estimation

| Brand | MNL | MXL | MNL-TVP3 | MXL-TVP32 |
|---|---|---|---|---|
| Del Monte | 0.0460 | 0.0395 (14.13) | 0.0375 (18.48) | 0.0340 (26.09) |
| Heinz | 0.0736 | 0.0651 (11.55) | 0.0488 (33.70) | 0.0475 (35.46) |
| Hunt's | 0.0778 | 0.0700 (10.03) | 0.0580 (25.45) | 0.0544 (30.08) |

Data set: validation

| Brand | MNL | MXL | MNL-TVP3 | MXL-TVP32 |
|---|---|---|---|---|
| Del Monte | 0.0519 | 0.0491 (5.39) | 0.0485 (6.55) | 0.0469 (9.63) |
| Heinz | 0.0813 | 0.0772 (5.04) | 0.0726 (10.70) | 0.0697 (14.27) |
| Hunt's | 0.0749 | 0.0734 (2.00) | 0.0654 (12.68) | 0.0636 (15.09) |

Note: Best-fitting model (for each brand) indicated in bold within each data set.

Fig. 5. Actual and predicted brand shares (validation sample).
[Figure panels: Hunt's, Heinz; x-axis: week; y-axis: share; lines: actual, MXL, MXL-TVP32.]
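The second-step reparameterization mentioned at the end of Section 4.5 (and described in footnote 20) can be sketched as a weighted regression of an estimated intercept path on exponentially weighted past covariates. The following is a minimal illustration on synthetic data. The 6-week window and the inverse-variance weights follow the description in footnote 20; the decay rate, the normalization of the lag weights, the function names, and all numbers are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def exp_weighted_lags(x, window=6, decay=0.5):
    """Weighted average of the previous `window` values of x.

    Lag l (l = 1..window) gets weight decay**l, normalized to sum to one
    (the normalization and the decay value are assumptions of this sketch).
    The first `window` periods have no complete history and are set to NaN.
    """
    x = np.asarray(x, dtype=float)
    w = decay ** np.arange(1, window + 1)
    w = w / w.sum()
    out = np.full(x.shape, np.nan)
    for t in range(window, x.size):
        past = x[t - window:t][::-1]   # past[0] is lag 1, past[-1] is lag `window`
        out[t] = w @ past
    return out

def weighted_least_squares(y, X, weights):
    """Minimize sum_t weights_t * (y_t - X_t b)^2 via rescaled least squares."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic first-stage output: a weekly intercept path and its sampling variance.
rng = np.random.default_rng(2)
n_weeks = 60                                   # hypothetical
price = 1.0 + 0.1 * rng.normal(size=n_weeks)   # synthetic weekly average price
past_price = exp_weighted_lags(price)
valid = ~np.isnan(past_price)

intercept_hat = 0.5 - 2.0 * past_price[valid] + 0.05 * rng.normal(size=valid.sum())
sampling_var = rng.uniform(0.01, 0.05, size=valid.sum())

X = np.column_stack([np.ones(valid.sum()), past_price[valid]])
coef = weighted_least_squares(intercept_hat, X, weights=1.0 / sampling_var)
```

Here `coef[1]` estimates the effect of past own prices on the current intercept; in this synthetic setup it is negative by construction. Weeks with a more precisely estimated intercept receive a larger weight in the regression, which is the purpose of the inverse-variance weighting.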

5. Conclusions and limitations

This article presents flexible MNL and MXL models to analyze time-varying effects in consumer choice behavior. Estimation of the time-varying effects is based on penalized splines, a flexible yet parsimonious nonparametric smoothing technique. The estimation procedure is fully data-driven (determining the unknown smooth functions and individual degrees of smoothness for each of the unknown functions simultaneously) and easy to apply (using free software). The main advantage of the flexible approach is that both short-term fluctuations in brand choice behavior (e.g., due to situational factors) and longer-term effects of marketing instruments can be uncovered. The flexible model provides direct implications for future managerial decisions if systematic time-varying effects exist (e.g., recurring effects before festive occasions), and can further be used as a diagnostic tool to explore changes in consumer behavior in response to marketing actions or other influencing factors. In addition, via time-varying brand intercepts managers can measure brand values over time, thus gaining better insights into the health of their brand as well as their position compared to their competitors.

In our empirical study, we obtained a higher predictive validity for our flexible models with parameter dynamics as compared to the standard MNL and MXL models and very different patterns of time-varying effects at the brand level in the ketchup category. In particular, a model with heterogeneity but different kinds of dynamic specifications for brand intercepts on the one hand (high flexibility: random walk dynamics) and effects of covariates on the other hand (lower flexibility: cubic splines) fitted the data best compared to different possible specifications within our proposed framework as well as compared to other established benchmark models in this field. Which dynamic specification performs best depends on the data at hand, and using an adaptive approach like the proposed semiparametric framework helps to explore time-varying parameters in brand choice models (as is demonstrated with two additional empirical applications in Web Appendix B).

Next, we turn to the limitations of the study and indicate opportunities for future research. First, even though we accommodate consumer heterogeneity, the variances of the heterogeneity distributions are kept invariant across periods, i.e., parameter variation is additively decomposed into time variation (at the population level) and cross-sectional variation. We share this shortcoming with other studies (e.g., Kim et al., 2005; Lachaab et al., 2006; Rutz & Sonnier, 2011). A brand choice model with individual-level heterogeneous and nonparametrically estimated time-varying parameters would be an interesting extension. Since scanner data on the individual household level is very sparse, the main challenge of such an approach would be to cope with overfitting problems. Second, we did not analyze competitive reactions (e.g., Jedidi et al., 1999) as our focus lies on an exploratory approach for estimating parameter dynamics. However, if changes in parameters cause firms to alter their marketing policies, analyzing competitive reactions might be an interesting avenue for future research. Third, it would be interesting to use our semiparametric approach for modeling time-varying parameters in other limited-dependent variable contexts (e.g., purchase incidence and/or purchase quantity). Using BayesX this would be relatively straightforward, as the program also contains suitable families of distributions (e.g., binomial logit/probit and Poisson). Finally, we rely on the logit model, mostly because of its computational simplicity. Utilizing a probit model instead would relax the IIA assumption even at the individual household level. However, the estimation of a flexible probit model within our framework would be computationally expensive if not intractable.
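The additive decomposition referred to in the first limitation can be written compactly. The notation below is illustrative and assumed for this sketch (it is not copied from the model equations of the paper): $\beta_{it}$ denotes a parameter of household $i$ in week $t$, $f(t)$ the population-level time path estimated by a penalized spline, and $b_i$ a household-specific deviation.

```latex
\beta_{it} = f(t) + b_i, \qquad b_i \sim N(0, \sigma^2), \qquad
\operatorname{Var}(\beta_{it} \mid t) = \sigma^2 \quad \text{for all } t .
```

In this form the limitation is visible directly: the mean of the parameter moves with $t$ through $f(t)$, whereas the spread of the heterogeneity distribution, $\sigma^2$, does not depend on $t$.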


Acknowledgments

The authors thank the Editor, the Area Editor, two anonymous reviewers, and the former Editor Marnik Dekimpe for their detailed and valuable comments and suggestions that helped to improve and to position the paper. Furthermore, we thank Jin Gyo Kim and Fred Feinberg for letting us use their proprietary detergent data, their C++ code, and their server, as well as Koray Cosguner and Seethu Seetharaman for sharing their cola data with us. Daniel Guhl gratefully acknowledges financial support by the Deutsche Forschungsgemeinschaft (DFG) through CRC TRR 190. All authors have contributed equally to this research.

Web Appendix

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijresmar.2018.03.003.

References

Abe, M. (1995). A nonparametric density estimation method for brand choice using scanner data. Marketing Science, 14(3), 300–325.
Abe, M. (1998). Measuring consumer, nonlinear brand choice response to price. Journal of Retailing, 74(4), 541–568.
Abe, M. (1999). A generalized additive model for discrete choice data. Journal of Business & Economic Statistics, 17(3), 271–284.
Abe, M., Boztuğ, Y., & Hildebrandt, L. (2004). Investigating the competitive assumption of multinomial logit models of brand choice by nonparametric modeling. Computational Statistics, 19(4), 635–657.
Ailawadi, K. L., Gedenk, K., & Neslin, S. A. (1999). Heterogeneity and purchase event feedback in choice models: An empirical analysis with implications for model building. International Journal of Research in Marketing, 16(3), 177–198.
Allenby, G. M., & Lenk, P. J. (1995). Reassessing brand loyalty, price sensitivity, and merchandising effects on consumer brand choice. Journal of Business & Economic Statistics, 13(3), 281–289.
Allenby, G. M., & Rossi, P. E. (1999). Marketing models of consumer heterogeneity. Journal of Econometrics, 89(1/2), 57–78.
Amman, H. M., & Kendrick, D. A. (2003). Mitigation of the Lucas critique with stochastic control methods. Journal of Economic Dynamics & Control, 27(11–12), 2035–2057.
Ataman, M. B., van Heerde, H. J., & Mela, C. F. (2010). The long-term effect of marketing strategy on brand sales. Journal of Marketing Research, 47(5), 866–882.
Bass, F. M., Bruce, N., Majumdar, S., & Murthi, B. P. S. (2007). Wearout effects of different advertising themes: A dynamic Bayesian model of the ad-sales relationship. Marketing Science, 26(2), 179–195.
Baumgartner, B. (2003). Measuring changes in brand choice behavior. Schmalenbach Business Review, 55(3), 242–256.
Baumgartner, B., & Hruschka, H. (2005). Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research, 162(3), 839–849.
Baumgartner, B., Guhl, D., Kneib, T., & Steiner, W. J. (2018). Flexible Estimation of Time-Varying Effects for Frequently Purchased Retail Goods: A Modeling Approach Based on Household Panel Data. Working Paper.
Boulding, W., Lee, E., & Staelin, R. (1994). Mastering the mix: Do advertising, promotion, and sales force activities lead to differentiation? Journal of Marketing Research, 31(2), 159–172.
Brezger, A., Kneib, T., & Lang, S. (2005). BayesX: Analyzing Bayesian structured additive regression models. Journal of Statistical Software, 14(11), 1–22.
Brezger, A., & Steiner, W. J. (2008). Monotonic regression based on Bayesian P-splines: An application to estimating price response functions from store-level scanner data. Journal of Business & Economic Statistics, 26(1), 90–104.
Briesch, R., Chintagunta, P. K., & Matzkin, R. L. (2002). Semiparametric estimation of brand choice behavior. Journal of the American Statistical Association, 97(460), 973–982.
Briesch, R., Krishnamurthi, L., Mazumdar, T., & Raj, S. P. (1997). A comparative analysis of reference price models. Journal of Consumer Research, 24(2), 202–214.
Bronnenberg, B. J., Kruger, M. W., & Mela, C. F. (2008). The IRI marketing data set. Marketing Science, 27(4), 745–748.
Bruce, N. I., Peters, K., & Naik, P. A. (2012). Discovering how advertising grows sales and builds brands. Journal of Marketing Research, 49(6), 793–806.
de Boor, C. (2001). A Practical Guide to Splines (Revised Ed.). New York: Springer.
Dekimpe, M. G., & Hanssens, D. M. (2004). Persistence modeling for assessing marketing strategy performance. In D. Lehmann, & C. Moorman (Eds.), Assessing marketing strategy performance. Marketing Science Institute.
Dierckx, P. (1993). Curve and surface fitting with splines. Oxford: Clarendon Press.
Dubé, J.-P., Hitsch, G. J., & Rossi, P. E. (2010). State dependence and alternative explanations for consumer inertia. RAND Journal of Economics, 41(3), 417–445.
Dubé, J.-P., Hitsch, G. J., Rossi, P. E., & Vitorino, M. A. (2008). Category pricing with state-dependent utility. Marketing Science, 31(6), 873–877.
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statistical Science, 11(2), 89–121.
Einav, L., Leibtag, E., & Nevo, A. (2010). Recording discrepancies in Nielsen Homescan data: Are they present and do they matter? Quantitative Marketing and Economics, 8(2), 207–239.
Erdem, T., & Keane, M. P. (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science, 15(1), 1–20.
Erdem, T., Swait, J., Broniarczyk, S., Chakravarti, D., Kapferer, J.-N., Keane, M., ... Zettelmeyer, F. (1999). Brand equity, consumer learning and choice. Marketing Letters, 10(3), 301–318.
Fahrmeir, L., Kneib, T., & Lang, S. (2004). Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica, 14(3), 731–761.
Foekens, E. W., Leeflang, P. S. H., & Wittink, D. R. (1999). Varying parameter models to accommodate dynamic promotion effects. Journal of Econometrics, 89(1–2), 249–268.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.
Gönül, F., & Srinivasan, K. (1993). Modeling multiple sources of heterogeneity in multinomial logit models: Methodological and managerial issues. Marketing Science, 12(3), 213–229.
Gordon, B. R., Goldfarb, A., & Li, Y. (2013). Does price elasticity vary with economic growth? A cross-category analysis. Journal of Marketing Research, 50(1), 4–23.
Guadagni, P. M., & Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner data. Marketing Science, 2(3), 203–238.
Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research, 25(4), 342–355.
Haaijer, R., & Wedel, M. (2001). Habit persistence in time series models of discrete choice. Marketing Letters, 12(1), 25–35.
Hasegawa, S., Terui, N., & Allenby, G. M. (2012). Dynamic brand satiation. Journal of Marketing Research, 49(6), 842–853.
Haupt, H., Kagerer, K., & Steiner, W. J. (2014). Smooth quantile-based modeling of brand sales, price and promotional effects from retail scanner panels. Journal of Applied Econometrics, 29(6), 1007–1028.
Heilman, C. M., Bowman, D., & Wright, G. P. (2000). The evolution of brand preferences and choice behaviors of consumers new to a market. Journal of Marketing Research, 37(2), 139–155.
Holtrop, N., Wieringa, J. E., Gijsenberg, M. J., & Verhoef, P. C. (2017). No future without the past? Predicting churn in the face of customer privacy. International Journal of Research in Marketing, 34(1), 154–172.
Hruschka, H. (2002). Market share analysis using semi-parametric attraction models. European Journal of Operational Research, 138(1), 212–225.


Jedidi, K., Mela, C. F., & Gupta, S. (1999). Managing advertising and promotion for long-run profitability. Marketing Science, 18(1), 1–22. Kahneman, D., & Tversky, A. (1979). A prospect theory: An analysis of decisions under risk. Econometrica, 47(2), 263–291. Kalyanam, K., & Shively, T. S. (1998). Estimating irregular pricing effects: A stochastic spline regression approach. Journal of Marketing Research, 35(1), 16–29. Kalyanaram, G., & Little, J. D. C. (1994). An empirical analysis of latitude of price acceptance in consumer package goods. Journal of Consumer Research, 21(3), 408–418. Kalyanaram, G., & Winer, R. S. (1995). Empirical generalizations from reference price research. Marketing Science, 14(3), G161–G169 (Part 2 of 2). Kamakura, W. A., & Russell, G. J. (1989). A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research, 26(4), 379–390. Kamakura, W. A., & Russell, G. J. (1993). Measuring brand value with scanner data. International Journal of Research in Marketing, 10(1), 9–22. Kaul, A., & Wittink, D. R. (1995). Empirical generalizations about the impact of advertising on price sensitivity and price. Marketing Science, 14(3), G151–G160 (Part 2 of 2). Keane, M. P. (1997). Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business & Economic Statistics, 15(3), 310–327. Keller, K. L., & Lehmann, D. R. (2006). Brands and branding: Research findings and future priorities. Marketing Science, 25(6), 740–759. Kim, J. G., Menzefricke, U., & Feinberg, F. M. (2005). Modeling parametric evolution in a random utility framework. Journal of Business & Economic Statistics, 23(3), 282–294. Kim, J. G., Menzefricke, U., & Feinberg, F. M. (2007). Capturing flexible heterogeneous utility curves: A Bayesian spline approach. Management Science, 53(2), 340–354. Kneib, T., Baumgartner, B., & Steiner, W. J. (2007). Semiparametric multinomial logit models for analysing consumer choice behaviour. 
AStA Advances in Statistical Analysis, 91(3), 225–244. Kopalle, P. K., Kannan, P. K., Boldt, L. B., & Arora, N. (2012). The impact of household level heterogeneity in reference price effects on optimal retailer pricing policies. Journal of Retailing, 88(2), 102–114. Kopalle, P. K., Mela, C. F., & Marsh, L. (1999). The dynamic effect of discounting on sales: Empirical analysis and normative pricing implications. Marketing Science, 18 (3), 317–332. Kumar, V., Choi, J. B., & Greene, M. (2017). Synergistic effects of social media and traditional marketing on brand sales: Capturing the time-varying effects. Journal of the Academy of Marketing Science, 45(2), 268–288. Lachaab, M., Ansari, A., Jedidi, K., & Trabelsi, A. (2006). Modeling preference evolution in discrete choice models: A Bayesian state-space approach. Quantitative Marketing and Economics, 4(1), 57–81. Lang, S., & Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1), 183–212. Lang, S., Steiner, W. J., Weber, A., & Wechselberger, P. (2015). Accommodating heterogeneity and nonlinearity in price effects for predicting brand sales and profits. European Journal of Operational Research, 246(1), 232–241. Lattin, J. M., & Bucklin, R. E. (1989). Reference effects of price and promotion on brand choice behavior. Journal of Marketing Research, 26(3), 299–310. Leeflang, P. S., Bijmolt, T. H., van Doorn, J., Hanssens, D. M., van Heerde, H. J., Verhoef, P. C., & Wieringa, J. E. (2009). Creating lift versus building the base: Current trends in marketing dynamics. International Journal of Research in Marketing, 26(1), 13–20. Leeflang, P. S. H., Wittink, D. R., Wedel, M., & Naert, P. A. (2000). Building models for marketing decisions. Boston, MA: Kluwer Academic Publishers. Lemmens, A., Croux, C., & Stremersch, S. (2012). Dynamics in the international market segmentation of new product growth. International Journal of Research in Marketing, 29(1), 81–92. Lillard, L. A. (1993). 
Simultaneous equations for hazards. Marriage duration and fertility timings. Journal of Econometrics, 56(1–2), 189–217. Martínez-Ruiz, M. P., Mollá-Descals, A., Gómez-Borja, M. A., & Rojo-Álvarez, J. L. (2006). Using daily store-level data to understand price promotion effects in a semiparametric regression model. Journal of Retailing and Consumer Services, 13(3), 193–204. McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics. New York: Academic Press. Mela, C. F., Gupta, S., & Lehmann, D. R. (1997). The long-term impact of promotion and advertising on consumer brand choice. Journal of Marketing Research, 34(2), 248–261. Meza, S., & Sudhir, K. (2006). Pass-through timing. Quantitative Marketing and Economics, 4(4), 351–382. Miller, K. E., & Ginter, J. L. (1979). An investigation of situational variation in brand choice behavior and attitude. Journal of Marketing Research, 16(1), 111–123. Naik, P. A., Mantrala, M. K., & Sawyer, A. G. (1998). Planning media schedules in the presence of dynamic advertising quality. Marketing Science, 17(3), 214–235. Neelamegham, R., & Chintagunta, P. K. (2004). Modeling and forecasting the sales of technology products. Quantitative Marketing and Economics, 2(3), 195–232. Netzer, O., Lattin, J. M., & Srinivasan, V. (2008). A hidden Markov model of customer relationship dynamics. Marketing Science, 27(2), 185–204. Osinga, E. C., Leeflang, P. S. H., & Wieringa, J. E. (2010). Early marketing matters: A time-varying parameter approach to persistence modeling. Journal of Marketing Research, 46(1), 173–185. Papatla, P., & Krishnamurti, L. (1996). Measuring the dynamic effects of promotions on brand choice. Journal of Marketing Research, 33(1), 20–35. Park, S., & Gupta, S. (2011). A regime-switching model of cyclical category buying. Marketing Science, 30(3), 469–480. Rossi, P. E., Allenby, G. M., & McCulloch, R. (2005). Bayesian statistics and marketing. 
Chichester: John Wiley & Sons. Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. Cambridge: University Press. Russell, G. (2014). Brand choice models. In R. S. Winer, & S. A. Neslin (Eds.), History of marketing science. World scientific – Now publishers series in business, Vol. 3. (pp. 19–46). Rutz, O. J., & Sonnier, G. P. (2011). The evolution of internal market structure. Marketing Science, 30(2), 274–289. Schindler, M., Baumgartner, B., & Hruschka, H. (2007). Nonlinear effects in brand choice models: Comparing heterogeneous latent class to homogeneous nonlinear models. Schmalenbach Business Review, 59(2), 118–137. Seetharaman, S. (2004). Modeling multiple sources of state dependence in random utility models: A distributed lag approach. Marketing Science, 23(2), 263–271. Shively, T. S., Allenby, G. M., & Kohn, R. (2000). A nonparametric approach to identifying latent relationships in hierarchical models. Marketing Science, 19(2), 149–162. Singh, S. S., & Jain, D. C. (2014). Evaluating customer relationships: Current and future challenges. In L. Moutinho, E. Bigné, & A. K. Manrai (Eds.), The Routledge companion to the future of marketing. New York: Routledge. Sloot, L. M., Fok, D., & Verhoef, P. C. (2006). The short- and long-term impact of an assortment reduction on category sales. Journal of Marketing Research, 43(4), 536–548. Sriram, S., Balachander, S., & Kalwani, M. U. (2007). Monitoring the dynamics of brand equity using store-level data. Journal of Marketing, 71(2), 61–78. Srivastava, R. K., Shocker, A. D., & Day, G. S. (1978). An exploratory study of the influences of usage situation on perceptions of product-markets. Advances in Consumer Research, 5, 32–38. Steiner, W. J., Brezger, A., & Belitz, C. (2007). Flexible estimation of price response function using retail scanner data. Journal of Retailing and Consumer Services, 14(6), 383–393. Steiner, W. J., Siems, F. U., Weber, A., & Guhl, D. (2014). 
How customer satisfaction with respect to price and quality affects customer retention: An integrated approach considering nonlinear effects. Journal of Business Economics, 84(6), 879–912. Stremersch, S., & Lemmens, A. (2009). Sales growth of new pharmaceuticals across the globe: The role of regulatory regimes. Marketing Science, 28(4), 690–708. Train, K. E. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge: University Press. Umlauf, N., Adler, D., Kneib, T., Lang, S., & Zeileis, A. (2015). Structured additive regression models: An R interface to BayesX. Journal of Statistical Software, 63(21), 1–46. Van Heerde, H. J., Leeflang, P. S. H., & Wittink, D. R. (2001). Semiparametric analysis to estimate the deal effect curve. Journal of Marketing Research, 38(2), 197–215. Van Heerde, H. J., Leeflang, P. S. H., & Wittink, D. R. (2002). How promotions work: SCAN*PRO-based evolutionary model building. Schmalenbach Business Review, 54, 198–220. Van Heerde, H. J., Mela, C. F., & Manchanda, P. (2004). The dynamic effect of innovation on market structure. Journal of Marketing Research, 41(2), 166–183. Weber, A., Steiner, W., & Lang, S. (2017). A comparison of semiparametric and heterogeneous store sales models for optimal category pricing. OR Spectrum, 39(2), 403–445. Winer, R. S. (1986). A reference price model of brand choice for frequently purchased products. Journal of Consumer Research, 13(2), 250–256. Zhao, Y., Zhao, Y., & Song, I. (2009). Predicting new customers' risk type in the credit card market. Journal of Marketing Research, 46(4), 506–517.



⁎ Corresponding author. E-mail addresses: [email protected] (D. Guhl), [email protected] (B. Baumgartner), [email protected] (T. Kneib), [email protected] (W.J. Steiner).

https://doi.org/10.1016/j.ijresmar.2018.03.003 0167-8116/© 2018 Elsevier B.V. All rights reserved.


1. Introduction

The marketing literature comprises a large number of applications of discrete choice models directed at explaining consumer brand choices (see Russell, 2014, for a recent overview). Within the class of discrete choice models, the multinomial logit (MNL) model has been applied so regularly that it is called the “workhorse model” of marketing today (Rossi, Allenby, & McCulloch, 2005, p. 35). The MNL model is frequently applied to data from consumer or household¹ scanner panels collected during observation periods ranging from six months to several years (Bronnenberg, Kruger, & Mela, 2008). The deterministic utility function of the MNL model captures the influence of variables that are supposed to be the drivers of consumers' brand choice behavior. In the context of panel data, a consumer's utility of buying a certain brand is typically assumed to depend on brands' actual prices and related reference price terms, promotional activities (e.g., displays and feature advertising), current brand loyalties, and alternative-specific intercepts representing intrinsic brand utilities (Guadagni & Little, 1983). In addition, consumer heterogeneity plays a major role in marketing (Allenby & Rossi, 1999), and hence nowadays, almost all versions of the MNL model also account for unobserved heterogeneity, leading to the so-called mixed logit (MXL) model (Train, 2009). In the majority of publications, estimated parameters reflecting the influence of those predictors on brand choice as well as estimated brand intercepts have been assumed to be constant over time, i.e., equal across all purchase occasions. However, the marketing literature as well as experiences reported by marketing practitioners suggest the possibility of changing consumer choice behavior over time. The effects of marketing variables might change for many different reasons.
During an economic downturn, for example, price sensitivity may increase (Gordon, Goldfarb, & Li, 2013) and consumers may increasingly search for price deals advertised by features and displays. Price sensitivity may also vary depending on the intensity of and time since previous promotional activities in the product category (e.g., Foekens, Leeflang, & Wittink, 1999; Kopalle, Mela, & Marsh, 1999). Moreover, an advertising campaign may improve a brand's awareness and image or its perceived quality over time, resulting in a higher intrinsic brand utility. Advertising may also decrease price sensitivity (e.g., Boulding, Lee, & Staelin, 1994; Kaul & Wittink, 1995). Importantly, advertising activities are typically not recorded in panel data. The brand intercepts of a model, often referred to as brand preferences, represent the intrinsic utility of a brand net of (possibly changing) marketing mix effects and can also be interpreted as the utility-based brand value (Kamakura & Russell, 1993). Brand intercepts might also evolve over time, because consumers' brand choice may be affected by situational factors associated with the personal consultation with salespeople, out-of-stock situations, or different usage situations (Miller & Ginter, 1979; Srivastava, Shocker, & Day, 1978). Furthermore, marketing practitioners report an increase in the demand for higher-tier brands in certain product categories (e.g., coffee, chocolate) in the run-up to special events like Christmas, Easter, or Mother's Day. In contrast to potential long-term trends in consumer choice behavior due to, e.g., advertising campaigns, situational factors will probably result in short-term fluctuations of parameters.² In all these cases, a more flexible model specification allowing for time-varying brand intercepts and time-varying effects of covariates is presumed to provide a better explanation and prediction of consumer choice behavior as compared to a model with constant parameters only.
We will explore this potential improvement, especially regarding prediction accuracy in holdout samples, in our empirical application. That way, we are able to validate whether changes in consumer behavior across time are inherent in the data at hand. From a managerial point of view, an unexpected short-term increase in demand for a specific brand can cause an out-of-stock situation resulting in decreased profits and dissatisfied customers. An unexpected decrease in demand, on the other hand, may lead to increased inventories or deterioration of goods and, therefore, to higher costs. For this reason, it is essential to know whether and when consumers may vary in their sensitivity to price and promotional activities or in their brand preferences, even if the causes of the observed variations are not fully understood. Managers can also learn which recurring events (e.g., festive occasions) are actually important and how they influence demand (e.g., via changes in intrinsic brand utilities or changes in marketing-mix sensitivities). This information can then potentially be used for future marketing strategies (e.g., exploiting peak demand). Ignoring time-varying effects concerning brand loyalties or consumer response to promotional activities may further mask (potentially harmful) trends in a product category like decreasing loyalties or increased bargain hunting. Further, changes in intrinsic brand utilities can be an indication of changes in the competitive structure between brands. For example, if the perceived quality of an established brand increases over time (e.g., via advertising investments), it may become a competitor for the higher-tier brands in the category. While our approach does not explicitly address the supply side, time-varying effects of marketing-mix variables also imply changes in optimal marketing policies over time.
The model proposed in this paper can support managers in uncovering time-varying parameters and may serve as a basis to improve related marketing decisions. To be precise, we do not claim to present a model that explicitly incorporates all effects that can cause dynamics, because the list of potential candidate variables is long and depends on the context of the specific research question, the industry and/or product category, and/or the characteristics of the data set. We instead recommend a fully data-driven approach as an exploratory tool, where we let the data determine the shape and smoothness of parameter evolutions over time. Our aim is to understand, in a first step, whether a meaningful variation of utility parameters over time exists and what the corresponding parameter paths look like. Once empirical evidence for dynamics has been found, approaches for modeling the cause of those dynamics may be added in a second step. On the other hand, there may be situations where (only) measuring dynamics without explaining them is already sufficient (e.g., for the sake of brand value monitoring, see Sriram, Balachander, & Kalwani, 2007). Further, additional information for explaining time-varying effects may not be available (e.g., advertising activities). Stated differently, we propose an exploratory tool for understanding parameter dynamics even in cases where no or only limited prior knowledge regarding the shape and functional form of the time-varying functions exists, and/or when data for covariates driving changes in parameters over time are not available.

¹ In the following, we will use the terms consumer and household interchangeably.
² We use the terms short- and long-term to conceptually differentiate between effects on the weekly or bi-weekly level and ones on a longer time-scale (e.g., half a year or longer). This definition is fairly arbitrary and is used to simplify verbal interpretations. It is, however, not related to the definition used in persistence modeling (see, e.g., Dekimpe & Hanssens, 2004 for an overview).

Some papers have already considered the possibility of time-varying parameters in brand choice behavior; the next section presents an overview and discusses the strengths and weaknesses of those approaches. Importantly, many of these models (e.g., Mela, Gupta, & Lehmann, 1997) are inherently parametric, i.e., all covariate effects are summarized in a linear predictor for each of the brands. For an exploratory approach, however, it seems promising to consider nonparametric models that provide flexible time-effect curves (Stremersch & Lemmens, 2009). Nonparametric models not only allow for smooth time-varying effect curves but also make more efficient use of the data available by borrowing strength from neighboring time intervals. Baumgartner (2003) already proposed a nonparametric MNL model to estimate time-varying brand intercepts.
A drawback of his approach is that the researcher has to predetermine the appropriate level of smoothness by optimizing information or cross-validation criteria. Using his approach to estimate time-varying parameters for brand intercepts and covariates would require extensive search procedures in multidimensional parameter spaces. Two other papers that compete with our approach are those of Kim, Menzefricke, and Feinberg (2005) and Lachaab, Ansari, Jedidi, and Trabelsi (2006). The authors of both papers independently propose a state-space approach, allowing (population) parameters to evolve across time in a (heterogeneous) choice model. State-space models are closely related to the nonparametric models considered in this paper. In particular, the best-fitting dynamic specification in Lachaab et al. (2006) (i.e., the random walk model) can be interpreted as a special case of our approach since it is included in our framework.

In the following, we propose MXL models with varying (population) parameters to uncover time-varying effects in consumer brand choice. Both time-varying brand intercepts and time-varying effects of covariates are modeled based on penalized splines, a flexible yet parsimonious nonparametric smoothing technique (Eilers & Marx, 1996). Our estimation procedure is likelihood-based, and regression parameters are obtained by a penalized Fisher scoring procedure, making the approximate covariance matrix available for the construction of credible intervals. Smoothing parameters governing the variability of each of the nonparametric function estimates for the covariates are derived from an approximate marginal likelihood procedure. As a consequence, there is no need for extensive search procedures to determine the appropriate amount of smoothness, and estimation is fully data-driven, determining the flexible function estimates as well as the corresponding degrees of smoothness in a unified approach.
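To give a feel for what a difference penalty does, the following minimal sketch smooths a short series with a first-order difference penalty, which is one way to read the random walk special case mentioned above (one coefficient per period, squared first differences penalized). It is an illustration under simplifying assumptions and not the estimation procedure of this paper: the response is treated as Gaussian rather than entering an MNL likelihood, the smoothing parameter `lam` is fixed by hand instead of being estimated, and the function names and the weekly values are hypothetical.

```python
def random_walk_smooth(y, lam):
    """Return b minimizing sum((y[t]-b[t])**2) + lam * sum((b[t]-b[t-1])**2)."""
    n = len(y)
    if n == 1:
        return [float(y[0])]
    # Normal equations (I + lam * D'D) b = y are tridiagonal, where D takes
    # first differences: D'D has diagonal (1, 2, ..., 2, 1) and off-diagonal -1.
    diag = [1.0 + lam * (1.0 if t in (0, n - 1) else 2.0) for t in range(n)]
    off = -float(lam)
    # Thomas algorithm, forward sweep.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for t in range(1, n):
        denom = diag[t] - off * c[t - 1]
        c[t] = off / denom
        d[t] = (y[t] - off * d[t - 1]) / denom
    # Back substitution.
    b = [0.0] * n
    b[-1] = d[-1]
    for t in range(n - 2, -1, -1):
        b[t] = d[t] - c[t] * b[t + 1]
    return b


def roughness(b):
    """Sum of squared first differences, i.e. the penalty without lam."""
    return sum((b[t] - b[t - 1]) ** 2 for t in range(1, len(b)))


# Hypothetical noisy weekly values of one brand intercept (assumed data).
weekly = [0.9, 1.4, 0.8, 1.6, 1.1, 1.9, 1.5, 2.2]
for lam in (0.0, 1.0, 100.0):
    path = random_walk_smooth(weekly, lam)
    print(lam, [round(v, 3) for v in path], round(roughness(path), 4))
```

With `lam = 0` the path reproduces the raw values, and as `lam` grows the path is pulled toward a constant. Choosing that trade-off from the data, rather than by hand as in this sketch, is what the marginal likelihood procedure described above is meant to achieve.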
Also, the researcher can use different settings with respect to the number of knots, the degree of the spline, and/or the type of penalty term for each time-varying parameter. Finally, our approach is implemented using free software (R and BayesX).³

The remainder of the paper is organized as follows: In Section 2, we review the literature on discrete choice models with time-varying parameters as well as the literature on semi-/nonparametric models in marketing. In Section 3, we briefly present the standard MNL model and introduce more flexible model variants based on penalized splines for the estimation of time-varying parameters and/or consumer heterogeneity. We further illustrate how our spline approach is connected to the state-space approaches of Kim et al. (2005) and Lachaab et al. (2006). Subsequently, in Section 4, we demonstrate our methodology with three empirical applications. To provide managerial implications, we also focus on brand level results including market share considerations. We conclude in Section 5 with a summary of the paper's key findings and an outlook on future research perspectives.

2. Related literature

2.1. Discrete choice models with time-varying parameters

All the papers reviewed in this subsection use scanner panel data to estimate brand choice models in which (at least some) parameters vary over time. As our focus is on continuous parameter evolutions instead of particular states, we do not consider hidden Markov models (Netzer, Lattin, & Srinivasan, 2008) or regime-switching models (Park & Gupta, 2011). We further exclude from this overview papers dealing with other forms of dynamics, such as state dependence due to inertia, habit persistence or carry-over effects (Dubé, Hitsch, & Rossi, 2010; Guadagni & Little, 1983; Keane, 1997), serially correlated utility errors (Allenby & Lenk, 1995; Haaijer & Wedel, 2001⁴), or learning (e.g., Erdem & Keane, 1996).
Finally, we do not explicitly discuss models that have addressed varying parameters over choice tasks in stated preference settings (e.g., Hasegawa, Terui, & Allenby, 2012),⁵ or that focus on binary choice settings such as churn (Holtrop, Wieringa, Gijsenberg, & Verhoef, 2017) or payment default (Zhao, Zhao, & Song, 2009) in customer relationship management.

³ In Web Appendix C, we have provided computer code together with a detailed description of commands to illustrate how models within our framework can be estimated.
⁴ Haaijer and Wedel (2001) use a probit model with a flexible error structure that allows for correlations over time. Different dynamics are nested within their framework, e.g., autoregressive utilities or autoregressive parameters with or without error correlations, respectively. The authors apply their approach to household panel data and compare in- and out-of-sample fits of different specifications. The dynamic probit model needs equally spaced time series observations for all households, a requirement which is not met in our empirical applications. For this reason, we abstain from comparing the dynamic probit model with our approach.
⁵ Nevertheless, the model of Hasegawa et al. (2012), who analyze brand satiation in a choice experiment, is interesting because it accounts for individual dynamics in brand preferences. Their best-fitting model includes an individual-specific quadratic trend to represent time-variation in the (baseline) utility. We picked up this idea and consider a similar specification as a benchmark model in our empirical application (Section 4) by adding individual-specific quadratic trends in the brand intercepts.

The (selection of) papers summarized in Table 1 can be classified along several dimensions: (1) whether the choice model specification is logit or probit, (2) whether unobserved consumer heterogeneity is accounted for, (3) whether all or only a subset of parameters are allowed to vary over time, and (4) the chosen modeling and estimation framework for deriving time-varying parameters. We discuss each dimension to position our approach.

1. Roughly equal shares of the papers use either the logit model or the probit model. Probit models do not suffer from the IIA property, and they can deal more easily with unobserved heterogeneity using a normal distribution. In addition, it is more straightforward to specify a probit model within a state-space framework and apply Kalman filter (KF) and Kalman smoother (KS) algorithms for estimating time-varying parameters (Lachaab et al., 2006; Rutz & Sonnier, 2011). However, computing probit choice probabilities is computationally demanding for more than 4 or 5 alternatives per choice set. Logit probabilities, on the other hand, can be computed in closed form, and if unobserved heterogeneity is accounted for, the IIA property is not a (serious) issue in logit models.
2. Except for Baumgartner (2003), all researchers model unobserved heterogeneity. Mela et al. (1997) and Heilman et al. (2000) use a discrete heterogeneity specification (Kamakura & Russell, 1989), whereas other researchers rely on a continuous heterogeneity specification (Gönül & Srinivasan, 1993). Consumer heterogeneity plays a very important role in marketing, and nowadays many marketing scholars favor the continuous specification (Allenby & Rossi, 1999).
3. In some papers, all utility parameters vary over time (Jedidi et al., 1999).
In other cases, depending on the research question at hand, only marketing-mix parameters (Papatla & Krishnamurti, 1996) or only brand intercepts (Baumgartner, 2003) vary. It seems difﬁcult to justify a priori why only some utility parameters should be allowed to vary over time while others are kept ﬁxed (except for empirical reasons). Hence, a less restrictive model that allows all parameters to be time-varying is preferable in general, particularly in an exploratory approach. 4. The papers use different frameworks for modeling and estimating time-varying parameters, and Van Heerde, Mela, and Manchanda (2004) provide a detailed discussion of related strengths and weaknesses. The papers can be differentiated according to (a) whether they model parameters as a function of observed time-varying covariates (“process functions”, Leeﬂang, Wittink, Wedel, & Naert, 2000), or (b) if they explicitly yield parameter paths. The former stream, also referred to as reparametrization approach, is easy to understand and can be estimated with standard software tools if suitable covariates for reparametrization are available. For example, Heilman et al. (2000) explain the evolution of brand preferences over time with the experience of a consumer by modeling brand preferences as to depend on (the logarithm of) related category purchases, while Jedidi et al. (1999) model utility parameters as functions of long-term advertising, long-term promotion, and loyalty. Using reparametrization approaches, however, “[o]nly the variance of the parameter estimate is computed; thus, it is not possible to reconstruct the parameter paths over time” (Van Heerde et al., 2004, p. 167). Other approaches explicitly yield parameter paths. Mela et al. (1997) rely on moving windows to obtain quarterly time-varying parameters for price and promotion effects, and Gordon et al. (2013) introduce interaction effects between the price variable and quarter-indicators. 
Importantly, a sufﬁcient number of observations for each time window is necessary here to estimate parameters reliably. More efﬁcient are state-space models6 (e.g., Kim et al., 2005; Lachaab et al., 2006; Rutz & Sonnier, 2011) and nonparametric regression techniques (e.g., Baumgartner, 2003). Both approaches are well suited for modeling time-varying parameters in an explorative fashion because they are (a) highly ﬂexible, (b) impose only a few a priori restrictions on the parameter paths, and (c) are easy to interpret (Lachaab et al., 2006). However, to the best of our knowledge, there is no standard software for estimating choice models with time-varying parameters based on the state-space approach. In contrast, the proposed penalized spline approach is easy to estimate using the free software package BayesX (Brezger, Kneib, & Lang, 2005), which further has an R-interface R2BayesX (Umlauf, Adler, Kneib, Lang, & Zeileis, 2015). Another feature of our approach is that depending on the chosen number of knots, degree of the spline, and/or type of penalty term more or less smooth parameter paths can be obtained (the latter may also guard against overﬁtting problems). Once parameter paths have been determined based on approaches of this second stream, additional covariates (if available) can be considered in a further step to analyze if drivers for the parameter evolutions can be identiﬁed. For example, Gordon et al. (2013) regressed estimated price elasticities on macroeconomic growth variables (e.g., GDP) in a second step. Besides, the data set lengths also differ considerably across papers. The shortest data set spans 52 weeks (Baumgartner, 2003), whereas the longest data set spans over 8 years (Mela et al., 1997).7 The periodicity of the time-varying parameters seems to be related to the length of the data set and varies between weekly (e.g., Baumgartner, 2003) to quarterly (e.g., Lachaab et al., 2006). 
In case a reparametrization is used (e.g., Heilman et al., 2000), parameters could (potentially) vary for each purchase occasion, no matter when it takes place. From a modeler's perspective, an approach is highly attractive if the periodicity can be adjusted easily

6 In recent years, there has been a strong trend in marketing to use state-space models and the KF for dealing with time-varying parameters. We focus our attention here on the subset of discrete choice models. There is a much larger body of literature that relies on these modern time series techniques to estimate aggregate demand models (e.g., Neelamegham & Chintagunta, 2004; Osinga et al., 2010; Van Heerde et al., 2004) or time-varying advertising response models (e.g., Bass, Bruce, Majumdar, & Murthi, 2007; Bruce, Peters, & Naik, 2012; Naik, Mantrala, & Sawyer, 1998). See Leeflang et al. (2009) for an overview.

7 Scanner panel data sets with a time dimension of several years are now widely available in academia and practice; see, e.g., the IRI marketing data set (Bronnenberg et al., 2008), or the recently published consumer panel by Nielsen in cooperation with the Kilts Center for Marketing, University of Chicago, Booth School of Business.


Table 1. Overview of discrete choice models with time-varying parameters.

| Study | Choice model | Unobserved heterogeneity | Time-varying parameter(s) | Modeling and estimation approach |
|---|---|---|---|---|
| Papatla and Krishnamurti (1996) | Probit | Yes | Marketing-mix | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Mela et al. (1997) | Logit | Yes (2 classes) | Marginal price- and promotion effects | Moving window analysis that yields parameter paths + AR(1)-regression on second stage |
| Jedidi, Mela, and Gupta (1999) | Probit | Yes | All utility parameters | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Heilman, Bowman, and Wright (2000) | Logit | Yes (3 classes) | All utility parameters | Reparametrization: expectations-based approach that relies on time-varying covariates |
| Baumgartner (2003) | Logit | No | Brand intercepts | Smoothing splines: nonparametric approach that yields parameter paths; level of smoothness needs to be predetermined |
| Kim et al. (2005) | Logit | Yes | All utility parameters | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)) |
| Lachaab et al. (2006) | Probit | Yes | All utility parameters | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)) |
| Rutz and Sonnier (2011) | Probit | Yes | Latent brand factors | State-space approach: method that yields parameter paths; different models for parameter dynamics (e.g., random walk or VAR(1)); may include exogenous variables |
| Gordon et al. (2013) | Logit | Yes | Price parameter and elasticity | Interaction of quarter dummies and price that yields a path for the price effect only + linear regression of time-varying price elasticity on economic factors (e.g., GDP) |
| This study | Logit | Yes | All utility parameters | P(enalized)-splines: nonparametric approach that yields parameter paths; flexible function estimates and level of smoothness are determined simultaneously; different models for parameter dynamics depending on the number of knots, degree of spline, and type of roughness penalty |

to reﬂect what is best for the research question at hand in combination with the available data. The proposed penalized spline approach possesses this feature, as we will explain in Section 3. In sum, although several papers in marketing have addressed time-varying parameters in brand choice models, the number still is fairly small. The existing approaches are diverse and have different strengths and weaknesses, as described above. Our approach can be characterized in the following way: it is based on the wide-spread logit model, parameter paths over time can be explicitly estimated, it can deal with consumer heterogeneity, it can cope with either short or long time series, and it provides high ﬂexibility to let the data determine the shape of parameter evolutions over time in an explorative way through a number of additional spline setting options. Having such a ﬂexible and adaptive model over time can be especially useful in situations where data sets have long time dimensions, in turbulent markets, for new or changing product categories, and/or when short- and long-term effects are both at work. In addition, improving marketing activities (e.g., pricing, promotion, advertising, etc.) in response to changing utility parameters does not necessarily require the knowledge of the causes for those changes, as long as a timely and precise measurement of parameter evolutions is feasible. In particular, the ﬂexibility to measure parameter paths as accurate as possible can be considered the strength of the proposed approach. Nevertheless, a potential disadvantage of our approach is that splines cannot easily incorporate additional variables in a dynamic fashion like state-space models and the KF (e.g., see the model extension in Rutz & Sonnier, 2011). 
Therefore, in order to explain what drives the parameter paths, further analysis becomes necessary by using the time-varying parameters estimated in the first step (or transformations thereof) as dependent variables in a second step (similar to Gordon et al., 2013).8 The fact that our approach is easily accessible using freely available software (BayesX) may constitute an additional benefit for practitioners and researchers who want to estimate discrete choice models with time-varying parameters.

2.2. Non- and semiparametric models in marketing

Given the fact that we advocate using splines for modeling time-varying parameters in choice models, a general brief discussion of non- and semiparametric regression approaches in marketing is in order. Using flexible regression methods (e.g., kernel estimators, spline estimators, or k-nearest neighbor estimators) is not new in marketing (see Leeflang et al., 2000 for an overview). Several papers analyze nonlinear pricing and promotional effects on aggregated brand sales or market shares (see, e.g., Kalyanam & Shively, 1998; Van Heerde, Leeflang, & Wittink, 2001; Hruschka, 2002; Martínez-Ruiz, Mollá-Descals, Gómez-Borja, & Rojo-Álvarez, 2006; Steiner, Brezger, & Belitz, 2007; Brezger & Steiner, 2008; Lang, Steiner, Weber, & Wechselberger, 2015; Weber, Steiner, & Lang, 2017), some others non-linear effects in brand choice models (Abe, 1995; Abe, 1999; Abe, Boztuğ, & Hildebrandt, 2004;

8 Of course, similar to Papatla and Krishnamurti (1996), it is possible to directly add time-varying covariates in our model.


Briesch, Chintagunta, & Matzkin, 2002; Kim, Menzefricke, & Feinberg, 2007; Kneib, Baumgartner, & Steiner, 2007; Schindler, Baumgartner, & Hruschka, 2007). Furthermore, Baumgartner and Hruschka (2005) study the allocation of catalogs to customers, Steiner, Siems, Weber, and Guhl (2014) apply nonparametric regression to the ﬁeld of customer satisfaction research, and Haupt, Kagerer, and Steiner (2014) propose ﬂexible semiparametric quantile regression models. For studying customer defection, Singh and Jain (2014) employ the semiparametric proportional hazard model of Lillard (1993), where the baseline hazard rate is modeled by piecewise linear splines. Shively, Allenby, and Kohn (2000) introduce a nonparametric approach to identifying latent relationships in hierarchical choice models. Four other studies are closely related to our research because they ﬂexibly model changes of parameters over time, although not in the context of brand choice modeling. Sloot, Fok, and Verhoef (2006) use cubic splines for modeling the short- and longterm effects of an assortment reduction on category sales. Stremersch and Lemmens (2009) apply penalized splines in the context of pharmaceutical marketing. They model time-varying effects of explanatory variables on aggregated new drug sales to better understand regulatory regimes. Lemmens, Croux, and Stremersch (2012) employ penalized splines for modeling ﬂexible growth effects of several products in multiple countries in a hidden Markov model. Kumar, Choi, and Greene (2017) propose a model with time-varying effects using penalized splines to analyze synergetic effects of social media and traditional marketing on ice-cream brand sales. The general advantage of ﬂexible non- or semiparametric approaches is that they do not impose any speciﬁc shape or functional form to the data. This might be crucial because wrong assumptions can lead to incorrect conclusions and false implications. 
Instead, flexible methods “let the data speak” and hence minimize the risk of using a wrong specification. However, more flexibility comes at the cost of needing greater sample sizes to estimate the models. The mentioned studies have clearly established the value of non- and semiparametric models in marketing. Particularly when prior knowledge about the functional relationship is scarce, flexible methods are highly valuable. Our exploratory approach for modeling and estimating time-varying parameters in brand choice models follows exactly this stream.

3. Methodology

3.1. Models

3.1.1. Standard MNL model

In brand choice modeling, the MNL model is typically motivated from considering latent utilities describing the benefit of purchasing a specific brand. In our data, we repeatedly observe purchases of households over a certain time span represented by a categorical response variable $Y_{it} \in C_{it}$, where $C_{it} = \{1, \ldots, k\}$ represents the set of brands available at the store visited by household $i$ ($i = 1, \ldots, n$) at time $t$ ($t \in T_i \subset \{1, \ldots, T\}$). Each household's choice is then associated with a set of $k$ utility functions $L_{it}^{(r)}$, $r = 1, \ldots, k$, which are composed of a deterministic part $\eta_{it}^{(r)}$ reflecting the influence of relevant covariates, and a random part $\varepsilon_{it}^{(r)}$:

$$L_{it}^{(r)} = \eta_{it}^{(r)} + \varepsilon_{it}^{(r)} = \alpha^{(r)} + x_{it}^{(r)\prime} \beta + \varepsilon_{it}^{(r)}. \tag{1}$$

The parameters $\alpha^{(r)}$ represent the brand intercepts (i.e., the intrinsic brand utilities or utility-based brand values), and the parameter vector $\beta$ captures effects of covariates (e.g., price or promotional activities) collected in the vector $x_{it}^{(r)}$. Here, $\varepsilon_{it}^{(r)}$ is an error term accounting for the presence of unobservable influences in a household's brand choice decision. Assuming that consumers are utility maximizers and the error terms $\varepsilon_{it}^{(r)}$ are i.i.d. standard extreme value distributed, the conditional choice probability $\pi_{it}^{(r)}$ of household $i$ for brand $r$ at time $t$ is given by the well-known MNL equation (see, e.g., McFadden, 1974):

$$\pi_{it}^{(r)} = \frac{\exp\left(\eta_{it}^{(r)}\right)}{1 + \sum_{r'=1}^{k-1} \exp\left(\eta_{it}^{(r')}\right)}, \quad r = 1, \ldots, k-1, \quad \text{and} \tag{2}$$

$$\pi_{it}^{(k)} = 1 - \pi_{it}^{(1)} - \ldots - \pi_{it}^{(k-1)}. \tag{3}$$

Because of identifiability reasons, only $k - 1$ brand intercepts can be estimated. Hence, without loss of generality, we choose brand $k$ as the reference category and assume $\alpha^{(k)} = 0$ and $x_{it}^{(k)} = 0$. The latter can be achieved by simply redefining $x_{it}^{(r)}$ as $x_{it}^{(r)} - x_{it}^{(k)}$, i.e., we only consider contrasts to the reference category.

One shortcoming of this basic model formulation is that it completely ignores the time-dependency of brand choice decisions and the repeated measurements of the same households. This has two immediate implications: observations are treated as independent, i.e., as if all observations were collected from different individuals, and parameters are assumed to be constant throughout the entire observation period. Therefore, we introduce time-varying parameters, heterogeneity, and the combination of both in the MNL model in three steps.
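To make Eqs. (2) and (3) concrete, the following minimal Python sketch computes MNL choice probabilities with brand k as the reference category. The function names, the example values, and the max-shift used for numerical stability are illustrative assumptions; they are not taken from the authors' code in Web Appendix C.

```python
import math


def utilities(alpha, beta, x):
    """Deterministic utilities eta^(r) for the k-1 non-reference brands.

    alpha: k-1 brand intercepts (the reference brand k has intercept 0).
    beta:  J covariate effects.
    x:     k covariate vectors of length J; the last one belongs to brand k.
    Covariates enter as contrasts x^(r) - x^(k), as described in the text.
    """
    x_ref = x[-1]
    return [
        a + sum(b * (xr - xk) for b, xr, xk in zip(beta, x_r, x_ref))
        for a, x_r in zip(alpha, x[:-1])
    ]


def mnl_probabilities(eta):
    """Choice probabilities for k brands from k-1 utilities, Eqs. (2)-(3)."""
    # Shift by the largest utility (the reference utility is 0) so that
    # exp() cannot overflow; the shift cancels in the ratio.
    shift = max(0.0, max(eta))
    expo = [math.exp(e - shift) for e in eta]
    ref = math.exp(-shift)              # contribution of reference brand k
    denom = ref + sum(expo)
    return [e / denom for e in expo] + [ref / denom]


# Hypothetical example: three brands, one covariate (price).
eta = utilities(alpha=[0.5, -0.2], beta=[-1.0], x=[[1.2], [1.0], [1.1]])
probs = mnl_probabilities(eta)
```

The returned list has one entry per brand, with the reference brand last, and its entries sum to one by construction.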


3.1.2. MNL-TVP model

We extend Eq. (1) to consider time-varying brand intercepts and covariate effects. Specifically, the brand intercepts $\alpha^{(r)}$ and the time-constant effects $\beta$ of covariates are replaced by time-varying functions $f_0^{(r)}(t)$ and $f_j(t)$, leading to the utility function

$$\eta_{it}^{(r)} = f_0^{(r)}(t) + \sum_{j=1}^{J} x_{itj}^{(r)} f_j(t), \tag{4}$$

where $j = 1, \ldots, J$ denotes the covariates included in the model. That way, brand intercepts are allowed to vary over time, reflecting changes in intrinsic brand utilities that can be induced by either long-term trends or short-term fluctuations in brand choice behavior. The effects of covariates might change over time because of long-term macroeconomic developments (e.g., Gordon et al., 2013 find that price sensitivity is countercyclical) or because of short-term seasonal effects (e.g., Meza & Sudhir, 2006 find that the effects of price and promotion variables temporarily increase during periods of peak demand). Within this framework, we will be able to investigate whether intrinsic brand utilities, price sensitivity, effects of promotional instruments, or effects of brand loyalty are changing over time. We refer to the time-varying parameter model (MNL-TVP) in Eq. (4) as semiparametric model, since the time-varying functions are modeled nonparametrically via penalized splines (see below), while the error term follows a parametric distribution.

Similar to the basic parametric model, identifiability restrictions have to be imposed in the more flexible model variants. Again, we assume brand $k$ to represent the reference category and, therefore, $f_0^{(k)}(t) = 0$ and $x_{itj}^{(k)} = 0$.

3.1.3. MXL model

Next, we add household heterogeneity to the MNL model, which yields the MXL model (see Train, 2009 for an overview). We employ a continuous heterogeneity specification where random effects are assumed to follow i.i.d. Gaussian distributions. Heterogeneity is introduced in brand values by employing household-specific brand intercepts $\alpha_i^{(r)}$, as well as in the covariate effects, i.e., by replacing common effects $\beta$ with household-specific effects $\beta_i = \beta + b_i$. This leads to the utility function

$$\eta_{it}^{(r)} = \alpha^{(r)} + \alpha_i^{(r)} + x_{it}^{(r)\prime} \beta + x_{it}^{(r)\prime} b_i, \tag{5}$$

with $\alpha_i^{(r)} \sim \text{i.i.d. } N(0, \sigma_r^2)$ and $b_{ij} \sim \text{i.i.d. } N(0, \sigma_j^2)$. Note that $\alpha^{(r)}$ is now the mean intrinsic brand utility. All household-specific parameters are assumed to be the same across purchase occasions of the same household.

3.1.4. MXL-TVP model

Both previously presented model extensions can be combined to a MXL model with time-varying parameters (MXL-TVP). In this case the utility function becomes

$$\eta_{it}^{(r)} = \alpha_i^{(r)} + f_0^{(r)}(t) + \sum_{j=1}^{J} x_{itj}^{(r)} f_j(t) + x_{it}^{(r)\prime} b_i. \tag{6}$$

The very general model structure allows for differences in parameters across households and differences in the means of the heterogeneity distributions over time. It therefore additively decomposes parameter variation into time variation (at the population level) and cross-sectional variation. This setup is very similar to Kim et al. (2005, p. 283) and Lachaab et al. (2006, p. 61). These authors explicitly employ VAR models and variants thereof for the time-varying means of the heterogeneous parameters. However, in our case the specific model for the parameter dynamics depends on the definition of the splines (degree, number of knots) and the penalization, which we will discuss in the next section.

In the context of aggregate-level data (e.g., sales or market share data) some authors have considered heterogeneous dynamics for different brands, categories, or stores (e.g., Van Heerde et al., 2004, Ataman, van Heerde, & Mela, 2010, or Osinga, Leeflang, & Wieringa, 2010). Therefore, heterogeneity and time variation have been decoupled in those approaches. However, in these cases, each heterogeneous unit has many observations (typically 50 to 250, and typically observations for each time period), which strongly simplifies the inference of “individual” dynamics. In contrast, household panel data is usually rather sparse in the time dimension, and hence estimating “truly” individual time-varying parameters for each household is a much more difficult task. In particular, the risk of overfitting is high with sparse data due to far fewer degrees of freedom. We leave this issue for future research.

3.2. Penalized splines

To model any of the time-varying functions f(t) (for the sake of simplicity, we drop the covariate indices), we consider penalized splines, a flexible nonparametric regression technique popularized by Eilers and Marx (1996).
The idea is to represent the time-varying effects in terms of a high-dimensional parametric basis and to add an appropriate penalty term to the likelihood for the sake of regularization. Particularly, we employ polynomial splines of degree l to represent f(t), leading to

$$f(t) = \sum_{m=1}^{M} \gamma_m B_m^l(t), \tag{7}$$


with Blm (m = 1, …, M) representing B-spline basis functions (we refer to de Boor, 2001 as key reference for B-splines), and γm denoting the regression parameter to be estimated for the m-th B-spline basis function. Basically, polynomial splines of degree l form a class of piecewise polynomial functions under the additional condition that the piecewise polynomials are fused smoothly at the interval boundaries (also referred to as knots), such that the resulting function is l − 1 times continuously differentiable (Dierckx, 1993). Thus, two quantities characterize a polynomial spline ﬁt: the number of intervals that are reﬂected in the number of basis functions M, and the degree l that determines the function's overall smoothness properties. For the latter, a default choice are cubic splines (i.e., l = 3), leading to twice continuously differentiable functions. However, to get a model which is similar to the one proposed by Lachaab et al. (2006), we also use splines with l = 0 (i.e., piecewise constant step functions). The main difﬁculty lies in choosing an appropriate number of intervals. If the number is too large, the resulting ﬁt becomes overly ﬂexible leading to overﬁtting and, as an ultimate consequence, to a model with probably insufﬁcient predictive power when applied to new data. On the other hand, if the number of intervals is too small, the resulting function might not be ﬂexible enough to capture, e.g., short-term ﬂuctuations in consumers' choice behavior. As a remedy, Eilers and Marx (1996) have suggested on the one hand to use a moderately large number of intervals to ensure enough ﬂexibility for the unknown functions, and on the other hand to add a penalty term to the likelihood to enforce sufﬁcient smoothness and to avoid overﬁtting. 
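The two ingredients described above, a rich B-spline basis and a roughness penalty on its coefficients, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses the Cox-de Boor recursion with equidistant knots and the difference penalty of Eilers and Marx (1996) that the following paragraphs specify; the function names, the knot extension beyond the data range, and the example settings (40 intervals over 134 weeks) are hypothetical and do not reproduce the BayesX implementation.

```python
def bspline_basis(t, t_min, t_max, n_intervals, degree):
    """Evaluate the M = n_intervals + degree B-spline basis functions at t.

    Knots are equidistant on [t_min, t_max] and extended by `degree`
    extra knots on each side (an assumption of this sketch).
    """
    if not t_min <= t <= t_max:
        raise ValueError("t must lie inside [t_min, t_max]")
    h = (t_max - t_min) / n_intervals
    knots = [t_min + (i - degree) * h for i in range(n_intervals + 2 * degree + 1)]
    n0 = len(knots) - 1
    # Degree 0: indicator of the interval containing t
    # (t_max is assigned to the last interval).
    idx = min(int((t - t_min) / h), n_intervals - 1) + degree
    basis = [0.0] * n0
    basis[idx] = 1.0
    # Cox-de Boor recursion: raise the degree step by step.
    for d in range(1, degree + 1):
        basis = [
            (t - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            + (knots[i + d + 1] - t) / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1]
            for i in range(n0 - d)
        ]
    return basis


def spline_value(gamma, basis):
    """f(t) = sum_m gamma_m * B_m(t), as in Eq. (7)."""
    return sum(g * b for g, b in zip(gamma, basis))


def difference_matrix(m, order):
    """Difference matrix D of the given order with m columns."""
    d = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(order):
        d = [[d[i + 1][j] - d[i][j] for j in range(m)] for i in range(len(d) - 1)]
    return d


def roughness_penalty(gamma, order, lam):
    """lam * gamma' D'D gamma, i.e. lam times the sum of squared differences."""
    d = difference_matrix(len(gamma), order)
    diffs = [sum(dij * g for dij, g in zip(row, gamma)) for row in d]
    return lam * sum(v * v for v in diffs)


# Hypothetical setting: cubic splines, 134 weeks, 40 equidistant intervals.
cubic = bspline_basis(10.5, 1.0, 134.0, 40, 3)
# Degree 0 with one interval per period gives plain period indicators.
step = bspline_basis(5.0, 0.0, 10.0, 10, 0)
```

With a second-order penalty, coefficients that lie on a straight line incur no penalty, so a large smoothing parameter pulls the estimated path toward a linear trend; with a first-order penalty the limit is a constant.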
We use equidistant intervals and 0.3 ∙ T knots as a heuristic default choice in our applications, where T is the number of time periods (e.g., weeks in our applications), and we round the resulting number of knots to the nearest integer. To assess the sensitivity of the estimation results regarding this default choice, we also use a higher number of knots (0.5 ∙ T). To mimic the random walk model of Lachaab et al. (2006) we choose one knot for each week. In this case, the basis functions basically indicate specific points in time, such that the difference penalty (see below) operates directly on the function evaluations. From a Bayesian perspective, this is equivalent to a random walk (see Lang & Brezger, 2004 for details).

To regularize the estimation problem and to ensure that the estimated function $\hat{f}(t)$ is flexible but not “too flexible”, the likelihood function is augmented by a penalty term. A suitable penalty term can be derived from squared p-th order derivatives and, according to B-spline theory (see, e.g., de Boor, 2001), we can approximate the derivative penalty with a roughness penalty based on first or second order differences on adjacent regression parameters $\gamma_m$, leading to

$$\lambda \sum_{m=2}^{M} \left(\gamma_m - \gamma_{m-1}\right)^2 \quad \text{or} \quad \lambda \sum_{m=3}^{M} \left(\gamma_m - 2\gamma_{m-1} + \gamma_{m-2}\right)^2, \tag{8}$$

for first order or second order differences, respectively. The smoothing parameter λ controls the trade-off between flexibility (λ small) and smoothness (λ large) of the penalized spline. For the discussion of statistical inference in the next section, the compact representation of the difference penalties in terms of quadratic forms λγ′Pγ will be helpful, where P = D′D corresponds to the penalty matrix constructed from the first or second order difference matrix. Our default choice is the second order difference penalty term. In contrast, for the Lachaab et al. (2006) specifications we use the first order difference penalty term.

The discussion shows that penalized splines seem very suitable for modeling time-varying parameters within an exploratory approach because different specifications depending on the settings for the degree of the spline, type of roughness penalty, and number of intervals (knots) can be applied and tested. Note that the specifications can further be different for each parameter in the model (e.g., more variability in brand intercepts, but smoother parameter paths for covariates). The best fitting dynamic specification in Lachaab et al. (2006), i.e., the random walk model, is just a special case of the penalized spline approach considering zero degree splines with knots at each observed point in time and a first order difference penalty term. Our framework can therefore be considered as a kind of toolbox that enables researchers and analysts to uncover parameter dynamics in brand choice data in an explorative way and, if necessary, to adapt the model specification to cope as best as possible with the characteristics of the data set at hand. We will compare several dynamic specifications (as well as combinations of different dynamic specifications for subsets of parameters) in Section 4.

3.3. Statistical inference

Statistical inference of the MXL-TVP model is based on maximizing the following penalized (log-)likelihood9

$$l_{pen}(\gamma, \alpha, b) = l(\gamma, \alpha, b) - \sum_{r=1}^{k-1} \lambda_0^{(r)} \gamma_0^{(r)\prime} P_0 \gamma_0^{(r)} - \sum_{j=1}^{J} \lambda_j \gamma_j' P_j \gamma_j - \sum_{r=1}^{k-1} \frac{1}{\sigma_r^2} \alpha^{(r)\prime} \alpha^{(r)} - \sum_{j=1}^{J} \frac{1}{\sigma_j^2} b_j' b_j, \tag{9}$$

where $l(\gamma, \alpha, b)$ denotes the model's (log-)likelihood, and $\gamma = (\gamma_0^{(1)}, \ldots, \gamma_0^{(k-1)}, \gamma_1, \ldots, \gamma_J)$ contains the regression parameters associated with the time-varying effects in $f_0^{(1)}(t), \ldots, f_0^{(k-1)}(t)$ and $f_1(t), \ldots, f_J(t)$. The vectors $\alpha = (\alpha_1^{(1)}, \ldots, \alpha_n^{(1)}, \ldots, \alpha_1^{(k-1)}, \ldots, \alpha_n^{(k-1)})$ and $b = (b_{11}, \ldots, b_{1J}, \ldots, b_{n1}, \ldots, b_{nJ})$ contain the household-specific brand intercepts and covariate effects, respectively. Please note that each time-varying brand intercept and covariate effect is assigned a separate smoothing parameter allowing for different amounts of smoothness for each of the functions. Also note that for the heterogeneity parameters the smoothing constant is directly interpretable as the inverse variance of the normal distribution of each parameter. Given the smoothing parameters and the random effects variances, estimation of the penalized model can be achieved by a simple modification of the usual Fisher

9 Simpler models (e.g., MNL, MNL-TVP, or MXL) would omit some or all penalty terms in Eq. (9).


scoring algorithm. To fully automate the estimation routine (given the choice for the number of knots, degree of spline, and type of penalization), optimal smoothing parameters and random effects variances have to be provided to the Fisher scoring algorithm and therefore the determination of the smoothing parameters is a crucial step in the estimation procedure. We apply a likelihood-based approach that originates from the close connection between penalized likelihood estimation and mixed models. This approach has received considerable attention in the statistical literature (see, e.g., Ruppert, Wand, & Carroll, 2003, or Fahrmeir, Kneib, & Lang, 2004 for overviews). Kneib et al. (2007) transferred the approach to flexible brand choice models in order to accommodate nonlinear price-utility effects but considered neither time-varying parameters nor the inclusion of household-specific heterogeneity. In the mixed model formulation of nonparametric regression models, marginal likelihood estimation can be applied for the joint estimation of regression effects and hyperparameters. Iterative updates of the regression parameters and the hyperparameters yield penalized maximum likelihood estimates upon convergence. Based on large sample theory and approximate normality of the estimates, significance tests and credible intervals can be constructed (see Web Appendix A for technical details).

3.4. Fit measures

We deliberately use several fit measures to evaluate the model performance, because each measure has different properties and it is a priori unclear which single measure is best (Gneiting & Raftery, 2007). In particular, fit and predictive validity are measured in terms of the log-Likelihood (log-Lik), Brier score, spherical score, and (using a scoring rule on the aggregate level) in terms of the average root mean squared error (ARMSE) between actual and predicted brand shares. The measures are calculated as follows (Kneib et al., 2007):

$$\text{log-Lik} = \sum_{i=1}^{n} \sum_{t \in T_i} \log\left(\hat{\pi}_{it}^{(r^*)}\right), \tag{10}$$

$$\text{Brier score} = -\sum_{i=1}^{n} \sum_{t \in T_i} \sum_{r=1}^{k} \left(y_{it}^{(r)} - \hat{\pi}_{it}^{(r)}\right)^2, \tag{11}$$

$$\text{Spherical score} = \sum_{i=1}^{n} \sum_{t \in T_i} \frac{\hat{\pi}_{it}^{(r^*)}}{\sqrt{\sum_{r=1}^{k} \left(\hat{\pi}_{it}^{(r)}\right)^2}}, \quad \text{and} \tag{12}$$

$$\text{ARMSE} = \frac{1}{k} \sum_{r=1}^{k} \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(s_t^{(r)} - \hat{s}_t^{(r)}\right)^2}, \tag{13}$$

where $r^*$ denotes the brand that household $i$ has chosen at time $t$ and $y_{it}^{(r)}$ is the binary choice indicator observed for brand $r$ and household $i$ at time $t$. Whereas the log-Likelihood only considers the $\log(\hat{\pi}_{it}^{(r^*)})$-terms of brands chosen by households (but not of the brands which were not chosen by households), Brier and spherical scores utilize the entire predictive distribution of all choice probabilities $\hat{\pi} = (\hat{\pi}^{(1)}, \ldots, \hat{\pi}^{(k)})$. Stated differently, the log-Likelihood does not fully exploit the information contained in the predictive distribution and is therefore rather sensitive to extreme observations.

The predicted shares of the brands in week $t$ ($\hat{s}_t^{(r)}$) are calculated from the predicted brand choice probabilities of the households in week $t$. To avoid artifacts introduced by weeks with only a very small number of purchases, squared errors are weighted by the number of purchases in the corresponding week.

4. Empirical study

4.1. Data

In this section, the previously presented models are applied to a scanner panel data set referring to the product category ketchup.10 The data set belongs to the Nielsen ERIM single source data base and covers 26,820 purchase acts of 2,494 households for three ketchup brands across 2.5 years (134 weeks) in Sioux Falls, SD, USA.11 We subset the sample to households that made at least 10 ketchup purchases and bought at least once in each of the five half-year periods. Hence, we restrict our analysis to

10 Web Appendix B contains two additional empirical applications for the product categories detergent and cola. We omit them here because of space constraints. The relative performance of the competing types of models (excluding vs. including heterogeneity, excluding versus including time-variation) is replicated across the three applications, however, in each case a different model specification performs best.

11 The data was provided by the James M. Kilts Center, GSB, University of Chicago.


households with product class experience that report regularly (see for a similar approach Rutz & Sonnier, 2011).12 This minimizes the risk that parameters dynamics are confounded by heterogeneity and/or sample composition over time. This data pruning step results in a sample of 502 households making 10,292 purchases. On average each household made 20 purchases, and the median interpurchase time is about 7 weeks. The data set contains the dates of purchasing and choice of brands of each household, as well as observed (paid) prices and promotional activities. For model evaluation, we randomly divided the total number of purchase acts (less the number of purchase acts used for initialization) into two halves (see Table 2), estimated the model based on the ﬁrst subset (the estimation sample) and predicted consumer choices for the second subset (the validation sample). This enables us to employ a prediction-oriented approach for model evaluation, i.e., a model with higher complexity should be preferred only if it outperforms a more parsimonious model in validation samples. Please note that we did not split the data in the time dimension, since as compared to parametric approaches it is not possible or at least difﬁcult to predict new time observations with the semiparametric approach. Stated otherwise, without specifying additional assumptions or an additional reparametrization step there is no information ex ante how the parameter paths will evolve. On the other hand, our random split of purchasing acts across the whole estimation time-period allows us to determine whether the variation of parameters over time is a robust result and not just an artifact of data pathologies, such as reporting errors (Einav, Leibtag, & Nevo, 2010), which is important for an exploratory approach. Fig. 1 shows for each brand weekly time-series plots for shares, prices, and promotion variables. 
The plots indicate several interesting features of the ketchup data set: (a) It contains a lot of variation over time. (b) The price levels of Del Monte and Hunt's decrease during the first half of the time window, while the price level of Heinz increases in the second half. (c) Brand shares seem to be rather stable, but whereas Del Monte and Hunt's have almost the same brand share between weeks 52 and 78, Hunt's seems to be the stronger brand before and after this time window. (d) Promotional intensities of the brands vary over time. This combination of changing marketing-mix variables and stable brand shares for some periods, as well as changing brand shares together with relatively stable prices for other periods, implies that brand values and marketing-mix sensitivities should be changing over time. Therefore, the data set is suited for applying time-varying parameter brand choice models.

4.2. Specification of covariates

In contrast to observed prices and promotional activities, which are directly contained in our data, brand loyalties and reference prices have to be computed from each consumer's purchase history. Both inherently dynamic covariates capture (at least some) temporal correlation between purchases of households and have proven their ability to increase model fit and predictive validity (Ailawadi, Gedenk, & Neslin, 1999; Kalyanaram & Winer, 1995). Following Guadagni and Little (1983), we recursively calculated loyalty values by exponentially smoothing past purchases $Y^{(r)}_{i,\tau-1}$ of brand r made by household i at purchase occasion τ − 1 using the smoothing constant $\vartheta_{loy}$ according to13:

$$\mathrm{loyalty}^{(r)}_{i\tau} = \vartheta_{loy}\,\mathrm{loyalty}^{(r)}_{i,\tau-1} + \left(1-\vartheta_{loy}\right) Y^{(r)}_{i,\tau-1}, \qquad 0 \le \vartheta_{loy} \le 1, \tag{14}$$

where $Y^{(r)}_{i,\tau-1} = 1$ if household i purchased brand r on her/his last store visit (otherwise $Y^{(r)}_{i,\tau-1} = 0$).14 This exponential smoothing is very popular in the marketing literature for capturing brand loyalty (Ailawadi et al., 1999). Importantly, Kim et al. (2005), Lachaab et al. (2006), and Rutz and Sonnier (2011) did not incorporate a loyalty variable in their models. To the best of our knowledge, our study is the first that tries to estimate time-varying loyalty effects in brand choice models.

Reference price terms are also regularly included in brand choice models (see Kalyanaram & Winer, 1995). Reference prices constitute internal prices that consumers expect at a purchase occasion and to which they compare observed prices. Observed prices exceeding the reference price are perceived as losses and may deter consumers from purchasing, while observed prices below the reference price are perceived as gains and presumably stimulate purchases. Prospect theory postulates asymmetric effects of gains and losses (Kahneman & Tversky, 1979; Winer, 1986). To account for possibly asymmetric reference price effects, two additional price terms corresponding to losses and gains need to be included in the brand choice model. Again, Kim et al. (2005), Lachaab et al. (2006), and Rutz and Sonnier (2011) refrain from incorporating reference price variables. Hence, to the best of our knowledge, this study is also the first that models time-varying reference price effects.
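As an illustration only, the recursion in Eq. (14) can be sketched in a few lines of Python for one household and one brand. The function name `loyalty_path`, the starting value `initial`, and the toy purchase history are assumptions made for this sketch; the paper initializes loyalties from a separate set of purchase acts and reports an estimated smoothing constant of 0.74 (footnote 14).

```python
def loyalty_path(purchased, theta=0.74, initial=0.0):
    """Loyalty values for one household and one brand, following Eq. (14).

    purchased[k] is 1 if the brand was bought on purchase occasion k, else 0.
    `initial` is a hypothetical starting value for the first occasion.
    """
    values = [initial]
    # Loyalty at occasion k uses the choice indicator of occasion k - 1.
    for y_prev in purchased[:-1]:
        values.append(theta * values[-1] + (1.0 - theta) * y_prev)
    return values


# Hypothetical purchase history: brand bought on occasions 0, 1, and 3.
history = [1, 1, 0, 1]
path = loyalty_path(history, theta=0.5)
```

A smoothing constant close to 1 makes loyalty react slowly to the most recent choice, whereas a value close to 0 makes it track the last purchase almost exclusively.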

12 In general, we recommend not using data sets with only a few observations per household (e.g., <4) for heterogeneous models, in particular when the data set is randomly split into two halves for estimation and validation. In the latter case, each household must appear in both the estimation and validation sample, which becomes difficult with sparse data at the individual household level.
13 Note that each purchase occasion τ of a household can be related to a specific time t, but a household does not necessarily purchase every time period (e.g., week) or can even purchase multiple times within a time period. Stated otherwise, the model framework does not require a balanced panel.
14 Instead of setting the smoothing constant to a fixed value (e.g., Gupta, 1988), we estimated a value of 0.74 for ϑloy by a grid search within the interval [0,1] based on the standard MNL model with constant effects. This value is comparable to the results of other papers that used the same data set (e.g., Keane, 1997; Seetharaman, 2004), and is in line with results from other data sets (e.g., Briesch et al., 1997). Although an estimation of ϑloy using a model with time-varying parameters would be better, the considerable increase in the computational effort prevents us from doing so.


Table 2
Summary statistics for the ketchup data set (502 households; 134 weeks).

| Brand name | No. purchases (estimation*) | No. purchases (validation*) | Price ($-cent per oz.): mean | Price ($-cent per oz.): sd | Promotion (% of purchases): display | Promotion (% of purchases): feature |
|---|---|---|---|---|---|---|
| Del Monte | 416 | 449 | 3.651 | 0.751 | 0.050 | 0.116 |
| Heinz | 3855 | 3796 | 4.071 | 0.889 | 0.090 | 0.269 |
| Hunt's | 882 | 894 | 3.570 | 0.648 | 0.055 | 0.107 |
| Total | 5153 | 5139 | | | | |

Note: *We used about one-third of all purchase acts (not reported in this table) to initialize brand loyalties and reference prices for the households.
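The random division of purchase acts into the estimation and validation samples reported in Table 2 might be sketched as follows. Splitting within each household is an assumption of this sketch, chosen so that every household appears in both samples (as footnote 12 requires); the function name, the seed, and the toy panel are hypothetical and not taken from the paper.

```python
import random


def split_purchase_acts(acts_by_household, seed=0):
    """Randomly split every household's purchase acts into two halves.

    Assumption of this sketch: the split is done within each household so
    that all households contribute to both samples.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    estimation, validation = [], []
    for household, acts in acts_by_household.items():
        shuffled = list(acts)
        rng.shuffle(shuffled)
        half = (len(shuffled) + 1) // 2  # odd counts: extra act goes to estimation
        estimation.extend((household, act) for act in shuffled[:half])
        validation.extend((household, act) for act in shuffled[half:])
    return estimation, validation


# Hypothetical toy panel: household id -> list of purchase-act ids.
toy_panel = {"hh1": [1, 2, 3, 4], "hh2": [5, 6, 7, 8, 9, 10]}
estimation_sample, validation_sample = split_purchase_acts(toy_panel)
```

Because the acts are drawn at random across the whole observation period, both samples cover all weeks, which is what allows time-varying parameters estimated on one half to be evaluated on the other.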

[Figure 1: four panels (share, price, display, feature) showing weekly series by brand (Del Monte, Heinz, Hunt's); x-axis: week (0 to 130).]

Fig. 1. Time-series of brand shares, prices, and promotion variables (ketchup data).

In many empirical applications, reference prices (refprice) have been derived from prices paid in the past according to the framework of adaptive expectations by the following adaptive process (e.g., Lattin & Bucklin, 1989; Kalyanaram & Little, 1994; or Abe, 1998):

$$\mathrm{refprice}^{(r)}_{i\tau} = \vartheta_{ref}\,\mathrm{refprice}^{(r)}_{i,\tau-1} + \left(1-\vartheta_{ref}\right) \mathrm{price}^{(r)}_{i,\tau-1}, \qquad 0 \le \vartheta_{ref} \le 1. \tag{15}$$
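A minimal sketch of the adaptive process in Eq. (15), together with the gain and loss terms that the model derives from it (max(refprice − price, 0) and max(price − refprice, 0)), could look as follows. The function names, the choice of the first observed price as the starting reference price, and the toy prices are assumptions of this sketch; the default of 0.69 is the smoothing constant the paper reports from its grid search.

```python
def reference_prices(prices, theta=0.69, initial=None):
    """Reference prices for one household and one brand, following Eq. (15).

    prices[k] is the price of the brand at purchase occasion k.
    `initial` is a hypothetical starting value; by default the first
    observed price is used (an assumption of this sketch).
    """
    ref = [prices[0] if initial is None else initial]
    # The reference price at occasion k uses the price of occasion k - 1.
    for p_prev in prices[:-1]:
        ref.append(theta * ref[-1] + (1.0 - theta) * p_prev)
    return ref


def gain_loss(refprice, price):
    """Return (gain, loss); at most one of the two is positive."""
    return max(refprice - price, 0.0), max(price - refprice, 0.0)


# Hypothetical prices in $-cent per oz. for three purchase occasions.
toy_prices = [4.0, 3.0, 5.0]
toy_ref = reference_prices(toy_prices, theta=0.5)
toy_terms = [gain_loss(r, p) for r, p in zip(toy_ref, toy_prices)]
```

Entering gain and loss as two separate covariates is what lets the model estimate asymmetric responses to prices below and above the reference price.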

When comparing five different reference price formation models, Briesch, Krishnamurthi, Mazumdar, and Raj (1997) found this adaptive approach to outperform other operationalizations. Given the reference price of household i for brand r at purchase occasion τ, gain and loss arise from $\max(\mathrm{refprice}^{(r)}_{i\tau} - \mathrm{price}^{(r)}_{i\tau}, 0)$ and $\max(\mathrm{price}^{(r)}_{i\tau} - \mathrm{refprice}^{(r)}_{i\tau}, 0)$, respectively.15 Promotional activities are operationalized by two dummy variables describing the presence or absence of features and displays for each brand and purchase act, respectively. Summing up, all model specifications include brand intercepts, price, gain, loss, promotion (feature and display), and loyalty as covariates. Effects on deterministic utilities and brand choice probabilities are expected to be positive for gains, promotional variables, and brand loyalties, whereas effects of prices and losses are expected to be negative. All effects are allowed to vary over time.

15 Based on the standard MNL model with constant effects, we again estimated the smoothing constant ϑref via grid search and obtained a value of 0.69. The value is slightly higher than the results reported in Briesch et al. (1997) but still comparable (0.47–0.65). The same argument as before also applies here: a grid search using a model with time-varying parameters would have been better, but the very high computational effort prevented us from doing so. However, we do not expect results to be sensitive to this decision.

4.3. Model comparison

The main purpose of our empirical application is to compare the performance of the different model versions introduced in Section 3.1 (as well as hybrid versions of them). This comparison enables us to evaluate the gains from using more flexible models, which incorporate time-varying parameters and/or account for consumer heterogeneity. Furthermore, the different versions of the models with time-varying parameters should shed light on the question of which specification for parameter dynamics works best. In addition, it is also important to compare the performance of the proposed approach with the currently best available competing models.

Table 3 summarizes the models used in the comparison and describes how they are related. The nonparametric models differ in the number of knots (0.3 ∙ T, 0.5 ∙ T, or T), the degree of the spline (zero-order or cubic), and the type of penalty term (1st or 2nd order differences). Note that some models use different settings of these characteristics for different time-varying parameters. For example, models (10) and (11) use the very flexible random walk specification of model (9) for brand intercepts, but the less flexible specification of models (7) and (8), respectively, for the time-varying effects of covariates. We also include a parametric model (MNL-TVP4) with quadratic trends in both the brand intercepts and all covariate effects across weeks, which is a model specification practitioners would most likely use to accommodate time-varying parameters. Further, we add a parametric model with individual dynamics (MXL-TVP4). It contains heterogeneous quadratic trends for the brand intercepts and has similarities to Hasegawa et al. (2012). Accordingly, this model can be stated as

$$\eta^{(r)}_{it} = \alpha^{(r)}_{0} + \alpha^{(r)}_{0i} + \left(\alpha^{(r)}_{1} + \alpha^{(r)}_{1i}\right) t + \left(\alpha^{(r)}_{2} + \alpha^{(r)}_{2i}\right) t^{2} + x^{(r)\prime}_{it}\beta + x^{(r)\prime}_{it} b_{i}, \tag{16}$$

where $\alpha^{(r)}_{0}$, $\alpha^{(r)}_{1}$, and $\alpha^{(r)}_{2}$ determine the mean quadratic trends for the intrinsic brand utilities, and $\alpha^{(r)}_{0i} \sim N(0, \sigma^{2}_{0r})$, $\alpha^{(r)}_{1i} \sim N(0, \sigma^{2}_{1r})$, and $\alpha^{(r)}_{2i} \sim N(0, \sigma^{2}_{2r})$ account for household heterogeneity in those quadratic trends. Lastly, we also add two models (MXL-VAR and MXL-RVAR) as proposed by Kim et al. (2005) to have a direct comparison to their Bayesian approach. In particular, we use the VAR(1) model and a restricted version thereof (RVAR(1)), which fitted best in Kim et al. (2005).

Table 4 reports fit measures for each model in the estimation and validation sample, respectively. Irrespective of the scoring rule, the model ranking does not change, enhancing confidence in the results. Models with heterogeneity and parameter dynamics fit the data better (in- and out-of-sample) than models without heterogeneity and/or parameter dynamics. Of course, in-sample performance was expected to be better for more complex models. The simplest model with dynamics (MNL-TVP4) shows only slight improvements over the static MNL model both in- and out-of-sample. Hence, parametric dynamics (in the form of quadratic trends) are not sufficient for modeling complex parameter evolutions in this data set. The models of Kim et al. (2005) have by far the best in-sample fit, but perform out-of-sample even worse than the simple parametric MXL model (i.e., the heterogeneous model without time-varying parameters). This indicates strong overfitting of the MXL-VAR and MXL-RVAR models, and we conclude that these models are not suitable for modeling dynamics in this data set.16 The MXL-TVP4 also shows a clear indication of overfitting. If we ignore the models of Kim et al. (2005), the MXL-TVP4 model has the best in-sample fit according to the log-likelihood and the spherical score. However, the predictive performance of the MXL-TVP4 model is rather poor and in the range of the simple MXL model.
Focusing on the rest of the models, we observe that for the individual-level scoring rules (log-Lik, Brier score, spherical score), improvements in model fit due to heterogeneity turn out somewhat larger than improvements due to parameter dynamics. This result is consistent with the findings of Lachaab et al. (2006) and underlines the importance of accounting for heterogeneity in brand choice models (Allenby & Rossi, 1999). Further, the improvements due to heterogeneity are smaller in the validation sample than in the estimation sample, indicating that precise individual-level estimates are rather difficult to obtain. Relative improvements due to accommodating time-varying parameters are more or less the same in- and out-of-sample. Therefore, the estimated parameter paths (to be discussed in more detail in the next subsection) are not an artifact, and applying a model with time-varying parameters is advisable.

The best-fitting model out-of-sample is the MXL-TVP32 model, which is a “hybrid model” with random walk dynamics for the brand intercepts and “smoother” dynamics for the covariates. This is interesting because it shows that the model with the highest flexibility in all parameters (i.e., the MXL-TVP3 model) is not necessarily the best model. On the other hand, a model that is very smooth in all parameters (i.e., the MXL-TVP1) also does not predict best.17 This finding highlights the benefit of using a modeling toolbox with diverse specification options in the context of brand choice models with time-varying parameters.

The brand share analysis mostly confirms the results discussed before. However, on the aggregate level, heterogeneity seems clearly less important, and larger improvements in ARMSE are due to parameter dynamics. Again, the MXL-TVP32 model shows the highest fit out-of-sample (and also in-sample if we ignore the strongly overfitting MXL-VAR and MXL-RVAR models), closely followed by the MXL-TVP31 and MXL-TVP3 models.18
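To make the individual-level scoring rules concrete, the following sketch computes a log-likelihood, a Brier score, and a spherical score from predicted choice probabilities. It uses the standard textbook forms summed over purchase acts, with the Brier score negated so that larger values are better for all three measures, as the signs in Table 4 suggest; the paper's own definitions are given in an earlier section and may differ in detail. The function name and the toy probabilities are hypothetical.

```python
import math


def scoring_rules(probs, chosen):
    """Summed log-likelihood, (negative) Brier score, and spherical score.

    probs[n] is the list of predicted choice probabilities for purchase act n;
    chosen[n] is the index of the brand actually chosen at act n.
    """
    loglik = sum(math.log(p[c]) for p, c in zip(probs, chosen))
    # Negative squared distance between the 0/1 choice vector and the forecast.
    brier = -sum(
        sum((float(j == c) - pj) ** 2 for j, pj in enumerate(p))
        for p, c in zip(probs, chosen)
    )
    # Probability of the chosen brand, normalized by the forecast's length.
    spherical = sum(
        p[c] / math.sqrt(sum(pj ** 2 for pj in p)) for p, c in zip(probs, chosen)
    )
    return loglik, brier, spherical


# Hypothetical forecasts for two purchase acts and three brands.
toy_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
toy_chosen = [0, 2]
toy_scores = scoring_rules(toy_probs, toy_chosen)
```

All three rules reward forecasts that put high probability on the brand actually chosen, but they penalize confident wrong forecasts differently, which is why agreement of the model ranking across rules is reassuring.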

4.4. Discussion of results

In this section, we discuss our estimation results for a subset of the estimated models in more detail. In particular, we compare the MXL-TVP1, MXL-TVP3, and MXL-TVP32 models to see how their parameter paths differ. The three models are among those with the highest predictive validities, with the MXL-TVP32 showing the best performance across all predictive validity measures.19

16 In Web Appendix B1 we analyze and discuss both models (MXL-VAR and MXL-RVAR) in greater detail.
17 The MXL-TVP32 model also has a better fit in-sample compared to the MXL-TVP1 and MXL-TVP3 models.
18 In a recent working paper (Baumgartner et al., 2018), we compare the class of TVP-MNL models (i.e., models without heterogeneity) in a different empirical application to a number of further benchmark models, among others the nonparametric model of Baumgartner (2003), dynamic models with seasonal effects, and a model with brand-week fixed effects. In addition, we extend the TVP-MNL models to consider alternative-specific instead of generic covariate effects.
19 According to Van Heerde, Leeflang, and Wittink (2002), marketing managers should favor the model with the highest predictive performance. For the sake of clarity, we do not include the MXL-TVP2 and MXL-TVP31 models in the figures. The predictive accuracy of the MXL-TVP2 lies in between that of the MXL-TVP1 and the MXL-TVP3; the MXL-TVP31 performs almost as well as the MXL-TVP3 (please compare Table 4). Confidence bounds are included in subsequent figures.


Table 3
Model specifications.

| Model | Heterogeneity | Number of knots | Degree of spline | Roughness penalty | Description |
|---|---|---|---|---|---|
| 1) MNL | No | – | – | – | Simple MNL model |
| 2) MNL-TVP1 | No | 0.3 ∙ T⁎ | cubic | 2nd order | MNL with smooth time-varying parameters |
| 3) MNL-TVP2 | No | 0.5 ∙ T | cubic | 2nd order | MNL-TVP1 with more knots |
| 4) MNL-TVP3 | No | T | zero-order | 1st order | MNL with smooth time-varying parameters and as many knots as weeks; mimics the KF model of Lachaab et al. (2006) |
| 5) MNL-TVP4 | No | – | – | – | MNL model with quadratic trends in all parameters |
| 6) MXL | Yes | – | – | – | Simple MXL model |
| 7) MXL-TVP1 | Yes | 0.3 ∙ T | cubic | 2nd order | MXL with smooth time-varying parameters |
| 8) MXL-TVP2 | Yes | 0.5 ∙ T | cubic | 2nd order | MXL-TVP1 with more knots |
| 9) MXL-TVP3 | Yes | T | zero-order | 1st order | MXL with smooth time-varying parameters and as many knots as weeks; mimics the KF model of Lachaab et al. (2006) |
| 10) MXL-TVP31 | Yes | 0.3 ∙ T⁎⁎, T⁎⁎⁎ | cubic⁎⁎, zero-order⁎⁎⁎ | 2nd order⁎⁎, 1st order⁎⁎⁎ | Hybrid version of MXL-TVP1 and MXL-TVP3; more flexibility for intercepts, more smoothness for covariate effects |
| 11) MXL-TVP32 | Yes | 0.5 ∙ T⁎⁎, T⁎⁎⁎ | cubic⁎⁎, zero-order⁎⁎⁎ | 2nd order⁎⁎, 1st order⁎⁎⁎ | Hybrid version of MXL-TVP2 and MXL-TVP3; more flexibility for intercepts, more smoothness for covariate effects |
| 12) MXL-TVP4 | Yes | – | – | – | MXL with heterogeneous quadratic trends for brand intercepts; has similarities to Hasegawa et al. (2012) |
| 13) MXL-VAR | Yes | – | – | – | VAR(1) model of Kim et al. (2005) |
| 14) MXL-RVAR | Yes | – | – | – | RVAR(1) model of Kim et al. (2005) |

Note: ⁎ T is the number of time periods (e.g., weeks in our applications) and we round the resulting number of knots to the nearest integer. ⁎⁎ For effects of covariates. ⁎⁎⁎ For brand intercepts.
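The knot counts and roughness penalties in Table 3 can be illustrated with a small sketch. P-spline penalties are built from differences of adjacent spline coefficients: a 1st order penalty sums squared first differences (which is what links the MNL-TVP3 and MXL-TVP3 settings to a random walk), while a 2nd order penalty sums squared second differences and therefore leaves linear trends unpenalized. The function names are hypothetical, the smoothing parameter that scales the penalty is omitted, and T = 134 is the number of weeks of the ketchup data set.

```python
def number_of_knots(share, n_periods):
    """Knots as a share of the number of time periods, rounded to an integer."""
    return round(share * n_periods)


def difference_matrix(n_coefs, order):
    """Matrix that maps a coefficient vector to its differences of given order."""
    rows = [[1.0 if i == j else 0.0 for j in range(n_coefs)] for i in range(n_coefs)]
    for _ in range(order):
        # Each pass replaces the rows by differences of consecutive rows.
        rows = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return rows


def roughness_penalty(coefs, order):
    """Sum of squared differences of the given order (unscaled penalty)."""
    d = difference_matrix(len(coefs), order)
    return sum(sum(w * c for w, c in zip(row, coefs)) ** 2 for row in d)


T = 134  # weeks in the ketchup data set
knots_tvp1 = number_of_knots(0.3, T)
knots_tvp2 = number_of_knots(0.5, T)
linear_path = [1.0, 2.0, 3.0, 4.0]
first_order = roughness_penalty(linear_path, 1)
second_order = roughness_penalty(linear_path, 2)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule used in the paper when 0.3 ∙ T or 0.5 ∙ T ends in .5; this does not matter for T = 134.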

We further add the simple MXL as a benchmark. Recall that models accounting for heterogeneity consistently outperformed homogeneous models (see Table 4). Fig. 2 depicts the estimated parameter paths for the four models, with time t (weeks) on the x-axis and estimated parameters for brand intercepts and covariate effects on the y-axis. The dotted black line represents the mean of the (time-constant) parameters for the MXL model; the dashed green (MXL-TVP1), dash-dotted red (MXL-TVP3), and solid blue (MXL-TVP32) lines show the means of the time-varying parameters. The horizontal lines for the MXL model can be interpreted as average results over time. Please note the different interpretation of the value 0 for the brand intercepts on the one hand and the covariate effects (i.e., price, gain, loss, loyalty, display, and feature) on the other hand. Concerning the brand intercepts, the value 0 refers to the utility of the reference brand (i.e., Hunt's). Regarding the covariate effects, the value 0 implies that the covariate has no effect on the utility of the consumer. The time-varying parameters show noticeable similarities across models. Therefore, we first focus on common patterns across the three TVP models and then discuss model differences.

4.4.1. Brand intercepts

Throughout the whole observation period, Del Monte's intrinsic brand utility has been lower than that of Hunt's. Except for some weeks in the last third of the data set, the intercept of Heinz is larger than 0. Heinz has, therefore, the largest brand value, followed by Hunt's and Del Monte. The ordering of the brand values is also the same for the MXL model with constant parameters. However, this model ignores the considerable changes in the brand values over time. The models with time-varying parameters reveal, e.g., that Del Monte's brand intercept increases over the first 52 weeks and then decreases during the second year of the data set.
This inverse U-shaped brand value evolution coincides with a U-shaped evolution of its weekly average prices during this time span: the average price level of Del Monte first declines, reaching its minimum around week 60, and subsequently stabilizes at a slightly higher value (see Fig. 1). A similar price-utility pattern is observed for Heinz ketchup. The brand intercept of Heinz has been considerably higher during the first half of the data set (with a maximum at week 55) and then drops to a lower level. This pattern matches the increase in the (average) price level of Heinz in the second half of the data set. Stated differently, the influence of price changes on a consumer's deterministic utility may not only be reflected by a time-varying price parameter but may also be captured in the brand values. Indeed, in stable categories like ketchup, where people most likely will not enter or leave the market because of a price change of approximately 0.5 $-cent per ounce, it makes sense that such changes will be absorbed to some degree in the brand intercepts, at least in the short run.20

In sum, the time-varying models provide interesting (explorative) insights with respect to the evolution of brand value over time, which is an inherently dynamic construct (Erdem et al., 1999). Furthermore, even though the brand shares vary over time and also lead on average to the same brand ordering (see Fig. 1), the shares are further driven by the marketing-mix of all competitors, as well as brand loyalty and reference price effects (see Kamakura & Russell, 1993). In particular, the gap between the top brand Heinz and its competitors appears to be much larger based on the brand shares than the gap based on the comparison of the time-varying brand intercepts. The latter measure of brand performance is thus more informative because it shows differences in a more nuanced way and net of confounding effects. Therefore, a brand choice model with time-varying brand intercepts is a valuable tool for brand value measurement and monitoring (Sriram et al., 2007). Using disaggregated data (instead of market shares) further enables the researcher to control for heterogeneity as well as brand loyalty more easily, which is important for an unbiased measurement of effects (see Keller & Lehmann, 2006).

20 In a second and unrelated step, we reparameterized the time-varying brand intercepts (each one at a time) of the best-fitting model MXL-TVP32 (compare Table 4, model 11) so as to depend on past own prices and past own promotional activities, following the intercept process function specification used in Foekens et al. (1999). In particular, we computed for each brand and week the weighted sum of past prices and past promotional intensities (e.g., percent display across observed purchase acts in a week) with an exponentially declining weight within a time window of 6 weeks. We further used the inverse sampling variance of the estimated time-varying intercepts of the first stage as regression weights to account for heteroscedasticity. Although the estimated effects for the three covariates are well-interpretable (e.g., higher past own prices have a negative effect on current brand values), the explained variances of these time-series regressions were fairly low (31% for Del Monte, 19% for Heinz). Results can be obtained from the authors upon request.

Table 4
Fit and predictive validity (ketchup data).

Data set: estimation

| Model | Log-Lik | Brier score | Spherical score | ARMSE |
|---|---|---|---|---|
| 1) MNL | −2503.536 | −1382.973 | 4363.311 | 0.0721 |
| 2) MNL-TVP1 | −2448.143 | −1349.029 | 4383.192 | 0.0618 |
| 3) MNL-TVP2 | −2446.975 | −1348.324 | 4383.597 | 0.0616 |
| 4) MNL-TVP3 | −2377.526 | −1310.781 | 4406.645 | 0.0495 |
| 5) MNL-TVP4 | −2486.439 | −1371.497 | 4368.986 | 0.0701 |
| 6) MXL | −1909.215 | −1062.927 | 4543.255 | 0.0639 |
| 7) MXL-TVP1 | −1828.434 | −1010.848 | 4573.909 | 0.0550 |
| 8) MXL-TVP2 | −1813.228 | −1003.968 | 4578.426 | 0.0548 |
| 9) MXL-TVP3 | −1792.563 | −990.114 | 4586.641 | 0.0476 |
| 10) MXL-TVP31 | −1792.481 | −990.020 | 4586.673 | 0.0476 |
| 11) MXL-TVP32 | −1784.468 | −987.248 | 4588.865 | 0.0475 |
| 12) MXL-TVP4 | −1771.250 | −987.683 | 4589.098 | 0.0612 |
| 13) MXL-VAR | **−1187.755** | **−678.219** | **4767.558** | **0.0332** |
| 14) MXL-RVAR | −1438.816 | −820.854 | 4689.401 | 0.0409 |

Data set: validation

| Model | Log-Lik | Brier score | Spherical score | ARMSE |
|---|---|---|---|---|
| 1) MNL | −2684.858 | −1502.482 | 4272.380 | 0.0777 |
| 2) MNL-TVP1 | −2653.468 | −1484.072 | 4283.621 | 0.0723 |
| 3) MNL-TVP2 | −2652.923 | −1483.749 | 4283.809 | 0.0722 |
| 4) MNL-TVP3 | −2631.779 | −1473.027 | 4290.508 | 0.0693 |
| 5) MNL-TVP4 | −2669.448 | −1495.612 | 4275.778 | 0.0743 |
| 6) MXL | −2420.971 | −1370.856 | 4343.625 | 0.0741 |
| 7) MXL-TVP1 | −2385.807 | −1346.748 | 4358.683 | 0.0695 |
| 8) MXL-TVP2 | −2379.836 | −1344.226 | 4360.079 | 0.0694 |
| 9) MXL-TVP3 | −2369.880 | −1337.959 | 4363.315 | 0.0669 |
| 10) MXL-TVP31 | −2370.512 | −1338.788 | 4362.793 | 0.0668 |
| 11) MXL-TVP32 | **−2366.492** | **−1336.935** | **4363.856** | **0.0667** |
| 12) MXL-TVP4 | −2416.059 | −1372.683 | 4342.660 | 0.0732 |
| 13) MXL-VAR | −4699.364 | −1836.804 | 4132.380 | 0.0877 |
| 14) MXL-RVAR | −4098.767 | −1819.550 | 4124.422 | 0.0861 |

Note: Best-fitting model indicated in bold within each data set for each performance measure.

4.4.2. Price-related covariates

The estimated parameters for the covariates price, gain, and loss show the expected signs (negative for price and loss, positive for gain). The price effect decreases (in absolute terms) over the time period of the data set, indicating that households on average become less price-sensitive. This decreased price sensitivity might be driven by the increase in the average price level of the leading brand Heinz (see Lachaab et al., 2006 for a similar argument regarding changes in price elasticities). The gain parameter reveals some kind of U-shaped pattern (MXL-TVP1, MXL-TVP32) but in any case increases strongly toward the end of the time window (MXL-TVP1, MXL-TVP32, and MXL-TVP3), whereas the loss parameter remains almost constant over time. Moreover, the loss effect is not significantly different from zero (as indicated by the credible intervals of all three MXL-TVP model versions, see Figs. 3 and 4 below). Please note, however, that the mean estimates of the models depicted in Fig. 2 also account for heterogeneity and, therefore, at least some households are significantly loss-averse. It is important to know the impact of household heterogeneity in reference price effects for optimal pricing policies, as has been shown by Kopalle, Kannan, Boldt, and Arora (2012). Considering time-varying parameters, computing optimal prices is certainly even more difficult. Nevertheless, in the case of distinct changes, like the ones depicted in Fig. 2 for the gain effect, accommodating time variation for optimal pricing could be profitable.

4.4.3. Brand-loyalty

The loyalty effect shows the expected positive sign and is (slightly) decreasing in the course of time. Again, regarding optimal pricing policies, a similar argument as before applies here. Dubé, Hitsch, Rossi, and Vitorino (2008) discuss optimal pricing with state-dependent utility, and being able to measure time-varying effects would be the first step to extend their approach to a situation where the effect of brand-loyalty is changing over time.

[Figure 2: eight panels (intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature) showing the parameter value by week (0 to 130) for the models MXL, MXL-TVP1, MXL-TVP3, and MXL-TVP32.]

Fig. 2. Estimated parameter paths (MXL-models; ketchup data).

4.4.4. Promotion covariates

The plots for the effects of display and feature advertising reveal interesting patterns, too. Both covariates have a positive and significant effect on utility (except for the feature effect during the very first weeks, where the credible intervals contain 0, see Figs. 3 and 4 below). The feature effect markedly increases over time according to an inverted S-shaped trend, ending up at least twice as high at the end of the time window as at the beginning. In contrast, the display effect is fairly stable in the long term and starts to increase only during the last third of the time window. For a retailer, such changes are also important to know in order to plan and set up promotional activities efficiently.

4.4.5. MXL-TVP1 vs. MXL-TVP3

Fig. 3 explicitly contrasts the estimated parameter paths of those two specifications and now also displays the corresponding 95% point-wise credible intervals (shaded). The comparison shows that some of the estimated parameter paths are highly similar (e.g., for the brand intercepts of Heinz) or only differ in their level of smoothness but not in their course (e.g., for the brand intercepts of Del Monte), while other effects turn out quite different (e.g., see the gain parameter evolutions).

[Figure 3: eight panels (intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature) showing the parameter value by week (0 to 130) for the models MXL-TVP1 and MXL-TVP3.]

Fig. 3. Estimated parameter paths (MXL-TVP1 vs. MXL-TVP3; ketchup data).

Note that the 95% point-wise credible intervals overlap for all parameter estimates of the two models. Therefore, even though the MXL-TVP3 model fits the data better in- and out-of-sample than the MXL-TVP1, the resulting estimates for the parameter paths are statistically indistinguishable. Still, except for the mean loss effect (which is not significant over the whole time period), we clearly observe a greater temporal flexibility of the MXL-TVP3 model, even though its parameter paths often only fluctuate around the smoother parameter paths of the MXL-TVP1 model (see, e.g., the brand intercepts or the feature effect). Further, both models differ in their parameter evolutions for the price and gain effects. While the MXL-TVP1 model is less flexible in general, its price parameter path reveals larger amplitudes and consequently more variation over the whole time window as compared to the MXL-TVP3 model. Moreover, the estimated gain effect for the MXL-TVP3 model is clearly different from the U-shaped evolution of the MXL-TVP1 model and shows considerably more variation at the weekly level (i.e., is much more wiggly). At the same time, the credible interval of the MXL-TVP3 model contains 0 most of the time, suggesting that the mean gain effect is not significant. Finally, the decreasing trend for the loyalty effect is less distinctive for the MXL-TVP3 model. Altogether, it seems that the higher flexibility of the MXL-TVP3 model is advantageous compared to the “smoother” MXL-TVP1 model, as it provides a better fit in- and out-of-sample (see Table 4). The general managerial insights regarding changes in brand values as well as price and promotion sensitivities are nevertheless often similar. Note that the less flexible MXL-TVP1 model seems to allow a more precise measurement of brand values compared to the MXL-TVP3 model, as indicated by narrower credible intervals at most points in time (this also holds for the detergent data set, as is illustrated in Appendix B).


[Figure 4: eight panels (intercept Del Monte, intercept Heinz, price, gain, loss, loyalty, display, feature) showing the parameter value by week (0 to 130) for the models MXL-TVP3 and MXL-TVP32.]

Fig. 4. Estimated parameter paths (MXL-TVP3 vs. MXL-TVP32; ketchup data).

4.4.6. MXL-TVP3 vs. MXL-TVP32 The model with the best predictive performance, the MXL-TVP32, uses the most ﬂexible speciﬁcation for the brand intercepts (following Lachaab et al., 2006), but “smoother” dynamics for covariate effects (see Table 3). Fig. 4 explicitly contrasts this hybrid model with the MXL-TVP3 model (the latter which consistently imposes the random walk speciﬁcation according to Lachaab et al., 2006 on all effects). The motivation behind the MXL-TVP32 is the idea that brand values may be rather volatile and then would require a more ﬂexible speciﬁcation, while changes of effects of covariates may be expected to be more smooth. If this assumption holds the MXL-TVP3 model should be prone to overﬁt covariate effects which might, in turn, entail a worse out-of-sample ﬁt. The predictive validity results (see Table 4) interpreted together with the estimated parameter paths (Fig. 4, also Fig. 2) favor this argumentation, because the MXL-TVP32 performs even better than the MXL-TVP3 with regard to all predictive validity measures. The MXL-TVP3 seems to overﬁt some covariate effects (especially the gain effect) here. While the price sensitivity decreases in a linear way according to the MXL-TVP32, the MXL-TVP1 (and also the MXL-TVP3 in less pronounced form) suggests a cyclic pattern, as discussed above. As expected, the parameter paths for the brand intercepts for the MXL-TVP3 and the MXL-TVP32 model are virtually identical. Altogether, the MXL-TVP32 seems to combine two favorable properties leading to a better model performance than the MXL-TVP3 in the ketchup category: high ﬂexibility to model rather volatile brand values (brand intercepts), less ﬂexibility to model more smoothly evolving covariate effects.21

21 Please note that the model performance depends on the product category and on the characteristics of the data set at hand, as is illustrated in Web Appendix B.
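The contrast between a random-walk-type specification and smoother dynamics can be illustrated with a deliberately simplified Gaussian analogue. The sketch below is not the Bayesian P-spline estimation used in this paper: it assumes one coefficient per week, an invented noisy linear path, a hand-picked smoothing parameter `lam` (whereas the procedure described here determines the degree of smoothness from the data), and penalized least squares with either a first-order difference penalty (the analogue of a first-order random walk) or a second-order difference penalty. All function names are hypothetical.

```python
def difference_matrix(n, order):
    """Return the order-th difference matrix for n coefficients."""
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[rows[i + 1][j] - rows[i][j] for j in range(n)]
                for i in range(len(rows) - 1)]
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def smooth_path(y, order, lam):
    """Minimize sum (y_t - b_t)^2 + lam * sum (order-th difference of b)^2."""
    n = len(y)
    d = difference_matrix(n, order)
    # Normal equations of the penalized problem: (I + lam * D'D) b = y
    a = [[(1.0 if i == j else 0.0)
          + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, list(y))


def roughness(path, order):
    """Sum of squared order-th differences of a path."""
    d = difference_matrix(len(path), order)
    return sum(sum(r[j] * path[j] for j in range(len(path))) ** 2 for r in d)


# Invented weekly parameter path: linear trend plus alternating noise.
weeks = range(30)
trend = [-2.0 + 0.02 * t for t in weeks]
noisy = [v + (0.3 if t % 2 == 0 else -0.3) for t, v in zip(weeks, trend)]

rw_fit = smooth_path(noisy, order=1, lam=5.0)      # random-walk-type penalty
smooth_fit = smooth_path(noisy, order=2, lam=5.0)  # second-order penalty
```

A second-order penalty leaves a straight line unpenalized, so `smooth_path(trend, order=2, lam=...)` returns the linear trend for any `lam`, whereas a first-order penalty with a large `lam` shrinks the path toward a constant. For a fixed penalty order, a larger `lam` never increases the roughness of the fitted path.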


4.5. Managerial implications
Instead of overall fit and predictive validity statistics, product or brand managers may be more interested in findings at the individual brand level and in the improvements that result from applying a more complex model. From a managerial point of view, comparing the following four models seems reasonable: (1) the MNL as the simplest benchmark model (no heterogeneity, no time-varying parameters), (2) the MXL as the standard approach in research and practice for modeling brand choice (with heterogeneity, no time-varying parameters), (3) the MNL-TVP3 as the best performing model with parameter dynamics only (no heterogeneity, with time-varying parameters, see Table 3), and (4) the MXL-TVP32 as our model with the best out-of-sample performance across all models (with heterogeneity, with time-varying parameters).

Table 5 reports RMSE values in shares for each ketchup brand (in- and out-of-sample) and reveals to what extent the brands benefit from greater model flexibility. We further report the improvement in fit of the MXL, MNL-TVP3, and MXL-TVP32 models over the MNL model to understand which component (heterogeneity, time variability) increases the model fit the most. As expected, in-sample fit for all individual brands improves for the more flexible models. Furthermore, accommodating parameter dynamics provides a larger improvement (MNL-TVP3 vs. MNL) than accounting for heterogeneity only (MXL vs. MNL). This is remarkable because the marketing literature predominantly focuses on the importance of heterogeneity. However, our results suggest that homogeneous models with time-varying parameters perform at least equally well regarding brand-level share predictions. Finally, the MXL-TVP32 model has the lowest RMSE values for all brands and provides the largest improvements over the MNL. Thus, both components (heterogeneity and dynamics) are relevant for computing brand-level shares accurately (at least for the ketchup data). A closer examination of the results of the MXL-TVP32 model reveals that the highest improvements in RMSE are obtained for Heinz (+35.46%) and Hunt's (+30.08%), the two brands with the highest brand share fluctuations (see Fig. 1) and considerably changing brand values (see Fig. 2). Concerning predictive validity, the results indicate a superior performance of the MXL-TVP32 for all brands, with improvements in RMSE over the standard MNL model ranging from 9.63% (Del Monte) up to 15.09% (Hunt's). Even though the RMSE improvements in the validation sample are smaller than in-sample, they are still substantial. Altogether, the brand-level RMSE results confirm our previous results regarding model performance (see Table 4).

Fig. 5 contrasts actual brand shares (grey lines) with brand shares predicted from both the MXL-TVP32 model (blue lines) and the standard MXL model (black lines) in the validation sample. The differences in share predictions can be attributed to the additional consideration of time-varying parameters in the MXL-TVP32, since heterogeneity has been accommodated in both models. The plots refer to Hunt's (left diagram) and Heinz (right diagram), i.e., the brands with the largest improvements in predictive validity when allowing for time-varying parameters. The plot referring to Hunt's ketchup reveals a markedly high discrepancy between the performance of the two models around weeks 60 and 100. In these periods, the MXL-TVP32 model approximates brand shares rather well, while the standard MXL is far from providing accurate predictions. Recall that Hunt's intrinsic brand utility decreased toward Del Monte's around week 60. For Heinz, too, we see that the more flexible MXL-TVP32 model approximates brand shares better than the MXL model.
For example, in weeks 90 to 100 the MXL clearly overpredicts brand shares. The MXL model is not flexible enough to provide predictions that match the dynamic developments in the data, even though we include several dynamic constructs, namely the brand loyalty variable and the reference price variables (gain, loss), in contrast to Kim et al. (2005), Lachaab et al. (2006), or Rutz and Sonnier (2011). In other words, the improvements in predictive performance can be fully attributed to changes in parameters and are not due to well-established dynamic constructs, because all models (even our models with constant parameters) control for these dynamic components.

Overall, these findings suggest the use of time-varying parameter models and should encourage firms and managers to adopt more flexible models to detect time-varying effects. If time-varying effects exist and are expected to recur, the use of TVP models can help managers to understand short-term peaks or dips in demand more accurately, so that out-of-stock situations or increased inventories may be prevented. If changes and fluctuations in demand or market shares are not foreseeable and thus not fully predictable, managers can use the model to detect and analyze changes in consumer behavior in response to marketing mix decisions in due time. This way, even though the decision maker can't predict the future, he/she “is one step (but only one step) behind” (Amman & Kendrick, 2003). In the context of our ketchup data, “one step” refers to one week, which we consider a reasonably short time interval for practical purposes. In addition, estimated parameter paths can be reparameterized in a second step to depend on further covariates in order to identify drivers of the time-varying effects.

Table 5
RMSE values in shares at the individual brand level (improvement in % over MNL).

Brand        MNL      MXL              MNL-TVP3         MXL-TVP32
Data set: estimation
Del Monte    0.0460   0.0395 (14.13)   0.0375 (18.48)   0.0340 (26.09)
Heinz        0.0736   0.0651 (11.55)   0.0488 (33.70)   0.0475 (35.46)
Hunt's       0.0778   0.0700 (10.03)   0.0580 (25.45)   0.0544 (30.08)
Data set: validation
Del Monte    0.0519   0.0491 (5.39)    0.0485 (6.55)    0.0469 (9.63)
Heinz        0.0813   0.0772 (5.04)    0.0726 (10.70)   0.0697 (14.27)
Hunt's       0.0749   0.0734 (2.00)    0.0654 (12.68)   0.0636 (15.09)

Note: Best-fitting model (for each brand) indicated in bold within each data set.

[Figure 5: two panels (Hunt's, Heinz); y-axis: share; x-axis: week; lines: actual, MXL, MXL-TVP32.]

Fig. 5. Actual and predicted brand shares (validation sample).
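The percentage improvements reported in parentheses in Table 5 follow from the RMSE values themselves. The short sketch below recomputes them as the relative RMSE reduction over the MNL benchmark; the dictionary layout and the function name are illustrative choices, while the numbers are copied from Table 5.

```python
# RMSE values from Table 5, ordered as (MNL, MXL, MNL-TVP3, MXL-TVP32).
MODELS = ("MNL", "MXL", "MNL-TVP3", "MXL-TVP32")
TABLE_5 = {
    "estimation": {
        "Del Monte": (0.0460, 0.0395, 0.0375, 0.0340),
        "Heinz": (0.0736, 0.0651, 0.0488, 0.0475),
        "Hunt's": (0.0778, 0.0700, 0.0580, 0.0544),
    },
    "validation": {
        "Del Monte": (0.0519, 0.0491, 0.0485, 0.0469),
        "Heinz": (0.0813, 0.0772, 0.0726, 0.0697),
        "Hunt's": (0.0749, 0.0734, 0.0654, 0.0636),
    },
}


def improvement_over_mnl(rmse_mnl, rmse_model):
    """Relative RMSE reduction versus the MNL benchmark, in percent."""
    return 100.0 * (rmse_mnl - rmse_model) / rmse_mnl


# improvements[data_set][brand][model] holds the percentage improvement.
improvements = {
    data_set: {
        brand: {
            model: improvement_over_mnl(values[0], value)
            for model, value in zip(MODELS[1:], values[1:])
        }
        for brand, values in brands.items()
    }
    for data_set, brands in TABLE_5.items()
}

for data_set, brands in improvements.items():
    for brand, by_model in brands.items():
        cells = ", ".join(f"{m}: {v:.2f}%" for m, v in by_model.items())
        print(f"{data_set:<10} {brand:<9} {cells}")
```

Comparing the MXL column with the MNL-TVP3 column in this output makes the point of the section directly: for every brand in both data sets, adding parameter dynamics alone reduces the RMSE more than adding heterogeneity alone.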

5. Conclusions and limitations
This article presents flexible MNL and MXL models to analyze time-varying effects in consumer choice behavior. Estimation of the time-varying effects is based on penalized splines, a flexible yet parsimonious nonparametric smoothing technique. The estimation procedure is fully data-driven (determining the unknown smooth functions and individual degrees of smoothness for each of the unknown functions simultaneously) and easy to apply (using free software). The main advantage of the flexible approach is that both short-term fluctuations in brand choice behavior (e.g., due to situational factors) and longer-term effects of marketing instruments can be uncovered. The flexible model provides direct implications for future managerial decisions if systematic time-varying effects exist (e.g., recurring effects before festive occasions), and can further be used as a diagnostic tool to explore changes in consumer behavior in response to marketing actions or other influencing factors. In addition, time-varying brand intercepts allow managers to measure brand values over time and thus provide better insights into the health of their brand and its position relative to competitors. In our empirical study, we obtained a higher predictive validity for our flexible models with parameter dynamics as compared to the standard MNL and MXL models, and very different patterns of time-varying effects at the brand level in the ketchup category. In particular, a model with heterogeneity and different kinds of dynamic specifications for brand intercepts on the one hand (high flexibility: random walk dynamics) and effects of covariates on the other hand (lower flexibility: cubic splines) fitted the data best, both compared to other possible specifications within our proposed framework and compared to other established benchmark models in this field. Which dynamic specification performs best depends on the data at hand, and using an adaptive approach like the proposed semiparametric framework helps to explore time-varying parameters in brand choice models (as is demonstrated with two additional empirical applications in Web Appendix B).

Next, we turn to the limitations of the study and indicate opportunities for future research. First, even though we accommodate consumer heterogeneity, the variances of the heterogeneity distributions are kept invariant across periods, i.e., parameter variation is additively decomposed into time variation (at the population level) and cross-sectional variation. We share this shortcoming with other studies (e.g., Kim et al., 2005; Lachaab et al., 2006; Rutz & Sonnier, 2011). A brand choice model with individual-level heterogeneous and nonparametrically estimated time-varying parameters would be an interesting extension. Since scanner data at the individual household level are very sparse, the main challenge of such an approach would be to cope with overfitting problems. Second, we did not analyze competitive reactions (e.g., Jedidi et al., 1999), as our focus lies on an exploratory approach for estimating parameter dynamics. However, if changes in parameters cause firms to alter their marketing policies, analyzing competitive reactions might be an interesting avenue for future research. Third, it would be interesting to use our semiparametric approach for modeling time-varying parameters in other limited-dependent variable contexts (e.g., purchase incidence and/or purchase quantity). Using BayesX, this would be relatively straightforward, as the program also contains suitable families of distributions (e.g., binomial logit/probit and Poisson). Finally, we rely on the logit model, mostly because of its computational simplicity. Utilizing a probit model instead would relax the IIA assumption even at the individual household level. However, the estimation of a flexible probit model within our framework would be computationally expensive, if not intractable.
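Two of the points above, the additive decomposition of parameter variation and the IIA property of the logit model at the household level, can be made concrete with a small sketch. All numbers are invented for illustration, the function names are hypothetical, and the utilities contain only brand intercepts (no covariates): each household's intercept is a time-varying population-level value plus a household-specific deviation that stays constant over time.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit choice probabilities for a dict of utilities."""
    peak = max(utilities.values())  # subtract the maximum for numerical stability
    expu = {brand: math.exp(u - peak) for brand, u in utilities.items()}
    total = sum(expu.values())
    return {brand: e / total for brand, e in expu.items()}


def household_utilities(population_path, household_deviation, week):
    """Additive decomposition: population-level value in `week` plus a
    household-specific deviation that does not change over time."""
    return {brand: population_path[brand][week] + household_deviation[brand]
            for brand in population_path}


# Invented population-level brand intercepts for three weeks.
population_path = {
    "Heinz": [1.0, 1.2, 0.9],
    "Hunt's": [0.4, 0.1, 0.3],
    "Del Monte": [0.0, 0.0, 0.0],
}
# Invented time-invariant deviations of one household.
household_deviation = {"Heinz": -0.5, "Hunt's": 0.8, "Del Monte": 0.0}

utilities = household_utilities(population_path, household_deviation, week=1)
probs = logit_probabilities(utilities)

# IIA at the household level: raising Del Monte's utility changes all
# probabilities but leaves the Heinz/Hunt's odds untouched.
shifted = dict(utilities)
shifted["Del Monte"] += 2.0
probs_shifted = logit_probabilities(shifted)

odds_before = probs["Heinz"] / probs["Hunt's"]
odds_after = probs_shifted["Heinz"] / probs_shifted["Hunt's"]
print(odds_before, odds_after)
```

Within one household the odds between two brands depend only on their own utilities, which is the restriction a probit specification would relax. Averaging such probabilities over households with different deviations generally does not preserve these odds at the aggregate level.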


Acknowledgments
The authors thank the Editor, the Area Editor, two anonymous reviewers, and the former Editor Marnik Dekimpe for their detailed and valuable comments and suggestions that helped to improve and to position the paper. Furthermore, we thank Jin Gyo Kim and Fred Feinberg for letting us use their proprietary detergent data, their C++ code, and their server, as well as Koray Cosguner and Seethu Seetharaman for sharing their cola data with us. Daniel Guhl gratefully acknowledges financial support by the Deutsche Forschungsgemeinschaft (DFG) through CRC TRR 190. All authors have contributed equally to this research.

Web Appendix
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijresmar.2018.03.003.

References
Abe, M. (1995). A nonparametric density estimation method for brand choice using scanner data. Marketing Science, 14(3), 300–325.
Abe, M. (1998). Measuring consumer, nonlinear brand choice response to price. Journal of Retailing, 74(4), 541–568.
Abe, M. (1999). A generalized additive model for discrete choice data. Journal of Business & Economic Statistics, 17(3), 271–284.
Abe, M., Boztuğ, Y., & Hildebrandt, L. (2004). Investigating the competitive assumption of multinomial logit models of brand choice by nonparametric modeling. Computational Statistics, 19(4), 635–657.
Ailawadi, K. L., Gedenk, K., & Neslin, S. A. (1999). Heterogeneity and purchase event feedback in choice models: An empirical analysis with implications for model building. International Journal of Research in Marketing, 16(3), 177–198.
Allenby, G. M., & Lenk, P. J. (1995). Reassessing brand loyalty, price sensitivity, and merchandising effects on consumer brand choice. Journal of Business & Economic Statistics, 13(3), 281–289.
Allenby, G. M., & Rossi, P. E. (1999). Marketing models of consumer heterogeneity. Journal of Econometrics, 89(1–2), 57–78.
Amman, H. M., & Kendrick, D. A. (2003). Mitigation of the Lucas critique with stochastic control methods. Journal of Economic Dynamics & Control, 27(11–12), 2035–2057.
Ataman, M. B., van Heerde, H. J., & Mela, C. F. (2010). The long-term effect of marketing strategy on brand sales. Journal of Marketing Research, 47(5), 866–882.
Bass, F. M., Bruce, N., Majumdar, S., & Murthi, B. P. S. (2007). Wearout effects of different advertising themes: A dynamic Bayesian model of the ad-sales relationship. Marketing Science, 26(2), 179–195.
Baumgartner, B. (2003). Measuring changes in brand choice behavior. Schmalenbach Business Review, 55(3), 242–256.
Baumgartner, B., & Hruschka, H. (2005). Allocation of catalogs to collective customers based on semiparametric response models. European Journal of Operational Research, 162(3), 839–849.
Baumgartner, B., Guhl, D., Kneib, T., & Steiner, W. J. (2018). Flexible Estimation of Time-Varying Effects for Frequently Purchased Retail Goods: A Modeling Approach Based on Household Panel Data. Working Paper.
Boulding, W., Lee, E., & Staelin, R. (1994). Mastering the mix: Do advertising, promotion, and sales force activities lead to differentiation? Journal of Marketing Research, 31(2), 159–172.
Brezger, A., Kneib, T., & Lang, S. (2005). BayesX: Analyzing Bayesian structured additive regression models. Journal of Statistical Software, 14(11), 1–22.
Brezger, A., & Steiner, W. J. (2008). Monotonic regression based on Bayesian P-splines: An application to estimating price response functions from store-level scanner data. Journal of Business & Economic Statistics, 26(1), 90–104.
Briesch, R., Chintagunta, P. K., & Matzkin, R. L. (2002). Semiparametric estimation of brand choice behavior. Journal of the American Statistical Association, 97(460), 973–982.
Briesch, R., Krishnamurthi, L., Mazumdar, T., & Raj, S. P. (1997). A comparative analysis of reference price models. Journal of Consumer Research, 24(2), 202–214.
Bronnenberg, B. J., Kruger, M. W., & Mela, C. F. (2008). The IRI marketing data set. Marketing Science, 27(4), 745–748.
Bruce, N. I., Peters, K., & Naik, P. A. (2012). Discovering how advertising grows sales and builds brands. Journal of Marketing Research, 49(6), 793–806.
de Boor, C. (2001). A Practical Guide to Splines (Revised Ed.). New York: Springer.
Dekimpe, M. G., & Hanssens, D. M. (2004). Persistence modeling for assessing marketing strategy performance. In D. Lehmann, & C. Moorman (Eds.), Assessing marketing strategy performance. Marketing Science Institute.
Dierckx, P. (1993). Curve and surface fitting with splines. Oxford: Clarendon Press.
Dubé, J.-P., Hitsch, G. J., & Rossi, P. E. (2010). State dependence and alternative explanations for consumer inertia. RAND Journal of Economics, 41(3), 417–445.
Dubé, J.-P., Hitsch, G. J., Rossi, P. E., & Vitorino, M. A. (2008). Category pricing with state-dependent utility. Marketing Science, 31(6), 873–877.
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statistical Science, 11(2), 89–121.
Einav, L., Leibtag, E., & Nevo, A. (2010). Recording discrepancies in Nielsen Homescan data: Are they present and do they matter? Quantitative Marketing and Economics, 8(2), 207–239.
Erdem, T., & Keane, M. P. (1996). Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science, 15(1), 1–20.
Erdem, T., Swait, J., Broniarczyk, S., Chakravarti, D., Kapferer, J.-N., Keane, M., ... Zettelmeyer, F. (1999). Brand equity, consumer learning and choice. Marketing Letters, 10(3), 301–318.
Fahrmeir, L., Kneib, T., & Lang, S. (2004). Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica, 14(3), 731–761.
Foekens, E. W., Leeflang, P. S. H., & Wittink, D. R. (1999). Varying parameter models to accommodate dynamic promotion effects. Journal of Econometrics, 89(1–2), 249–268.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.
Gönül, F., & Srinivasan, K. (1993). Modeling multiple sources of heterogeneity in multinomial logit models: Methodological and managerial issues. Marketing Science, 12(3), 213–229.
Gordon, B. R., Goldfarb, A., & Li, Y. (2013). Does price elasticity vary with economic growth? A cross-category analysis. Journal of Marketing Research, 50(1), 4–23.
Guadagni, P. M., & Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner data. Marketing Science, 2(3), 203–238.
Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research, 25(4), 342–355.
Haaijer, R., & Wedel, M. (2001). Habit persistence in time series models of discrete choice. Marketing Letters, 12(1), 25–35.
Hasegawa, S., Terui, N., & Allenby, G. M. (2012). Dynamic brand satiation. Journal of Marketing Research, 49(6), 842–853.
Haupt, H., Kagerer, K., & Steiner, W. J. (2014). Smooth quantile-based modeling of brand sales, price and promotional effects from retail scanner panels. Journal of Applied Econometrics, 29(6), 1007–1028.
Heilman, C. M., Bowman, D., & Wright, G. P. (2000). The evolution of brand preferences and choice behaviors of consumers new to a market. Journal of Marketing Research, 37(2), 139–155.
Holtrop, N., Wieringa, J. E., Gijsenberg, M. J., & Verhoef, P. C. (2017). No future without the past? Predicting churn in the face of customer privacy. International Journal of Research in Marketing, 34(1), 154–172.
Hruschka, H. (2002). Market share analysis using semi-parametric attraction models. European Journal of Operational Research, 138(1), 212–225.


Jedidi, K., Mela, C. F., & Gupta, S. (1999). Managing advertising and promotion for long-run profitability. Marketing Science, 18(1), 1–22.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Kalyanam, K., & Shively, T. S. (1998). Estimating irregular pricing effects: A stochastic spline regression approach. Journal of Marketing Research, 35(1), 16–29.
Kalyanaram, G., & Little, J. D. C. (1994). An empirical analysis of latitude of price acceptance in consumer package goods. Journal of Consumer Research, 21(3), 408–418.
Kalyanaram, G., & Winer, R. S. (1995). Empirical generalizations from reference price research. Marketing Science, 14(3), G161–G169 (Part 2 of 2).
Kamakura, W. A., & Russell, G. J. (1989). A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research, 26(4), 379–390.
Kamakura, W. A., & Russell, G. J. (1993). Measuring brand value with scanner data. International Journal of Research in Marketing, 10(1), 9–22.
Kaul, A., & Wittink, D. R. (1995). Empirical generalizations about the impact of advertising on price sensitivity and price. Marketing Science, 14(3), G151–G160 (Part 2 of 2).
Keane, M. P. (1997). Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business & Economic Statistics, 15(3), 310–327.
Keller, K. L., & Lehmann, D. R. (2006). Brands and branding: Research findings and future priorities. Marketing Science, 25(6), 740–759.
Kim, J. G., Menzefricke, U., & Feinberg, F. M. (2005). Modeling parametric evolution in a random utility framework. Journal of Business & Economic Statistics, 23(3), 282–294.
Kim, J. G., Menzefricke, U., & Feinberg, F. M. (2007). Capturing flexible heterogeneous utility curves: A Bayesian spline approach. Management Science, 53(2), 340–354.
Kneib, T., Baumgartner, B., & Steiner, W. J. (2007). Semiparametric multinomial logit models for analysing consumer choice behaviour. AStA Advances in Statistical Analysis, 91(3), 225–244.
Kopalle, P. K., Kannan, P. K., Boldt, L. B., & Arora, N. (2012). The impact of household level heterogeneity in reference price effects on optimal retailer pricing policies. Journal of Retailing, 88(2), 102–114.
Kopalle, P. K., Mela, C. F., & Marsh, L. (1999). The dynamic effect of discounting on sales: Empirical analysis and normative pricing implications. Marketing Science, 18(3), 317–332.
Kumar, V., Choi, J. B., & Greene, M. (2017). Synergistic effects of social media and traditional marketing on brand sales: Capturing the time-varying effects. Journal of the Academy of Marketing Science, 45(2), 268–288.
Lachaab, M., Ansari, A., Jedidi, K., & Trabelsi, A. (2006). Modeling preference evolution in discrete choice models: A Bayesian state-space approach. Quantitative Marketing and Economics, 4(1), 57–81.
Lang, S., & Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1), 183–212.
Lang, S., Steiner, W. J., Weber, A., & Wechselberger, P. (2015). Accommodating heterogeneity and nonlinearity in price effects for predicting brand sales and profits. European Journal of Operational Research, 246(1), 232–241.
Lattin, J. M., & Bucklin, R. E. (1989). Reference effects of price and promotion on brand choice behavior. Journal of Marketing Research, 26(3), 299–310.
Leeflang, P. S., Bijmolt, T. H., van Doorn, J., Hanssens, D. M., van Heerde, H. J., Verhoef, P. C., & Wieringa, J. E. (2009). Creating lift versus building the base: Current trends in marketing dynamics. International Journal of Research in Marketing, 26(1), 13–20.
Leeflang, P. S. H., Wittink, D. R., Wedel, M., & Naert, P. A. (2000). Building models for marketing decisions. Boston, MA: Kluwer Academic Publishers.
Lemmens, A., Croux, C., & Stremersch, S. (2012). Dynamics in the international market segmentation of new product growth. International Journal of Research in Marketing, 29(1), 81–92.
Lillard, L. A. (1993). Simultaneous equations for hazards. Marriage duration and fertility timings. Journal of Econometrics, 56(1–2), 189–217.
Martínez-Ruiz, M. P., Mollá-Descals, A., Gómez-Borja, M. A., & Rojo-Álvarez, J. L. (2006). Using daily store-level data to understand price promotion effects in a semiparametric regression model. Journal of Retailing and Consumer Services, 13(3), 193–204.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics. New York: Academic Press.
Mela, C. F., Gupta, S., & Lehmann, D. R. (1997). The long-term impact of promotion and advertising on consumer brand choice. Journal of Marketing Research, 34(2), 248–261.
Meza, S., & Sudhir, K. (2006). Pass-through timing. Quantitative Marketing and Economics, 4(4), 351–382.
Miller, K. E., & Ginter, J. L. (1979). An investigation of situational variation in brand choice behavior and attitude. Journal of Marketing Research, 16(1), 111–123.
Naik, P. A., Mantrala, M. K., & Sawyer, A. G. (1998). Planning media schedules in the presence of dynamic advertising quality. Marketing Science, 17(3), 214–235.
Neelamegham, R., & Chintagunta, P. K. (2004). Modeling and forecasting the sales of technology products. Quantitative Marketing and Economics, 2(3), 195–232.
Netzer, O., Lattin, J. M., & Srinivasan, V. (2008). A hidden Markov model of customer relationship dynamics. Marketing Science, 27(2), 185–204.
Osinga, E. C., Leeflang, P. S. H., & Wieringa, J. E. (2010). Early marketing matters: A time-varying parameter approach to persistence modeling. Journal of Marketing Research, 46(1), 173–185.
Papatla, P., & Krishnamurthi, L. (1996). Measuring the dynamic effects of promotions on brand choice. Journal of Marketing Research, 33(1), 20–35.
Park, S., & Gupta, S. (2011). A regime-switching model of cyclical category buying. Marketing Science, 30(3), 469–480.
Rossi, P. E., Allenby, G. M., & McCulloch, R. (2005). Bayesian statistics and marketing. Chichester: John Wiley & Sons.
Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. Cambridge: Cambridge University Press.
Russell, G. (2014). Brand choice models. In R. S. Winer, & S. A. Neslin (Eds.), History of marketing science. World scientific – Now publishers series in business, Vol. 3. (pp. 19–46).
Rutz, O. J., & Sonnier, G. P. (2011). The evolution of internal market structure. Marketing Science, 30(2), 274–289.
Schindler, M., Baumgartner, B., & Hruschka, H. (2007). Nonlinear effects in brand choice models: Comparing heterogeneous latent class to homogeneous nonlinear models. Schmalenbach Business Review, 59(2), 118–137.
Seetharaman, S. (2004). Modeling multiple sources of state dependence in random utility models: A distributed lag approach. Marketing Science, 23(2), 263–271.
Shively, T. S., Allenby, G. M., & Kohn, R. (2000). A nonparametric approach to identifying latent relationships in hierarchical models. Marketing Science, 19(2), 149–162.
Singh, S. S., & Jain, D. C. (2014). Evaluating customer relationships: Current and future challenges. In L. Moutinho, E. Bigné, & A. K. Manrai (Eds.), The Routledge companion to the future of marketing. New York: Routledge.
Sloot, L. M., Fok, D., & Verhoef, P. C. (2006). The short- and long-term impact of an assortment reduction on category sales. Journal of Marketing Research, 43(4), 536–548.
Sriram, S., Balachander, S., & Kalwani, M. U. (2007). Monitoring the dynamics of brand equity using store-level data. Journal of Marketing, 71(2), 61–78.
Srivastava, R. K., Shocker, A. D., & Day, G. S. (1978). An exploratory study of the influences of usage situation on perceptions of product-markets. Advances in Consumer Research, 5, 32–38.
Steiner, W. J., Brezger, A., & Belitz, C. (2007). Flexible estimation of price response function using retail scanner data. Journal of Retailing and Consumer Services, 14(6), 383–393.
Steiner, W. J., Siems, F. U., Weber, A., & Guhl, D. (2014). How customer satisfaction with respect to price and quality affects customer retention: An integrated approach considering nonlinear effects. Journal of Business Economics, 84(6), 879–912.
Stremersch, S., & Lemmens, A. (2009). Sales growth of new pharmaceuticals across the globe: The role of regulatory regimes. Marketing Science, 28(4), 690–708.
Train, K. E. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge: Cambridge University Press.
Umlauf, N., Adler, D., Kneib, T., Lang, S., & Zeileis, A. (2015). Structured additive regression models: An R interface to BayesX. Journal of Statistical Software, 63(21), 1–46.
Van Heerde, H. J., Leeflang, P. S. H., & Wittink, D. R. (2001). Semiparametric analysis to estimate the deal effect curve. Journal of Marketing Research, 38(2), 197–215.
Van Heerde, H. J., Leeflang, P. S. H., & Wittink, D. R. (2002). How promotions work: SCAN*PRO-based evolutionary model building. Schmalenbach Business Review, 54, 198–220.
Van Heerde, H. J., Mela, C. F., & Manchanda, P. (2004). The dynamic effect of innovation on market structure. Journal of Marketing Research, 41(2), 166–183.
Weber, A., Steiner, W., & Lang, S. (2017). A comparison of semiparametric and heterogeneous store sales models for optimal category pricing. OR Spectrum, 39(2), 403–445.
Winer, R. S. (1986). A reference price model of brand choice for frequently purchased products. Journal of Consumer Research, 13(2), 250–256.
Zhao, Y., Zhao, Y., & Song, I. (2009). Predicting new customers' risk type in the credit card market. Journal of Marketing Research, 46(4), 506–517.