A Two-Stage Model of Innovation Adoption with Partial Observability: Model Development and Application

Christophe Van den Bulte University of Pennsylvania

Gary L. Lilien The Pennsylvania State University

August 16, 1999

Acknowledgements

We benefited from comments by Wayne Baker, Hans Baumgartner, Albert Bemmaor, the late Clifford Clogg, Jehoshua Eliashberg, David Krackhardt, Keith Ord, Arvind Rangaswamy, David Schmittlein, David Strang, Thomas Valente, and audience members at the 1999 INSNA Sunbelt Conference, the 1994 and 1998 INFORMS Marketing Science Conferences, the Australian Graduate School of Management, Carnegie Mellon, Columbia, Cornell, Duke, Harvard, KU Leuven, Michigan, Northwestern, Penn State, Stanford, UNC Chapel Hill, UT Austin, and Wharton. We thank Thomas Valente for providing us with the Medical Innovation data set prepared by Ronald Burt. Financial support from Penn State's Institute for the Study of Business Markets and the Richard D. Irwin Foundation is gratefully acknowledged. Correspondence address: Christophe Van den Bulte, The Wharton School, University of Pennsylvania, 3620 Locust Walk, Philadelphia, PA 19104-6371. Tel: 215-898-653; fax: 215-898-2534; e-mail: [email protected].

Abstract

We show how one can specify micro-level hazard models of adoption that separately represent the awareness and evaluation/choice stages of the adoption decision process, even when one observes only the final outcome, i.e. adoption or not. These models can be derived from random utility theory. Using a small simulation study, we show that the two-stage partial observability specification retrieves weak advertising or contagion effects better than a more traditional one-stage model does when people really follow a two-stage decision process. We illustrate the value of the modeling strategy in a re-examination of Medical Innovation, a classic study of the diffusion of an antibiotic, traditionally credited with establishing the role of social contagion in innovation diffusion. Using newly collected data on advertising and applying both one-stage and two-stage models, we conclude that the traditional interpretation confounds contagion with marketing effort. We also explain why other re-analyses have failed to document strong contagion effects.


Despite the widespread acceptance of the role of social contagion in innovation diffusion, recent research has come to challenge its apparent empirical support. Some work has shown that S-shaped diffusion curves, often interpreted as evidence of social contagion, can be the result of population heterogeneity rather than contagion. For instance, when a product’s price decreases linearly over time and reservation prices are normally distributed over the population, the diffusion curve will be the normal cumulative distribution function (Thirtle and Ruttan 1987). The logistic and Bass models have also been shown to be formally indistinguishable from patterns that arise in the absence of any social contagion (Bemmaor 1994; Bonus 1973). Still other work has documented that, even when contagion is truly at work, its effect can be overestimated (Van den Bulte and Lilien 1997). Another concern is that the well-documented positive relationship between prior adoptions among one’s peers and the likelihood of one’s own adoption, often interpreted as evidence of social contagion, may often be produced by factors excluded from the model (England 1998; Haunschild and Miner 1997). Changes in the marketing mix, such as declining prices, increasing distribution coverage, and increasing advertising and sales efforts, are obvious possibilities.

In this paper, we focus on separating the effects of advertising from those of social contagion. Several empirical studies have examined the impact of advertising on new product diffusion. With the exception of the work by Urban, Hauser and Roberts (1990), these studies generally represent adoption as a single, binary event (Bass et al. 1993; Horsky and Simon 1983; Kalish and Lilien 1986; Mesak 1996; Simon and Sebastian 1987). A large body of behavioral research, however, documents that adoption is really a process consisting of several distinct stages.
Most authors distinguish two separate, and temporally sequenced, stages in the decision to adopt or not (e.g., Bonus 1973; Kalish 1985; Rogers 1995). During the knowledge or awareness stage, the individual or corporate actor learns about the innovation’s existence and gains some understanding of how it functions. The next stage is persuasion or evaluation, during which the actor forms a favorable or unfavorable attitude toward (i.e., evaluation of) the innovation that leads to the decision to adopt or not. Explicitly distinguishing between these two stages may be critical to understanding what drives adoption decisions, because research suggests that different factors carry different weight in the two stages. Two such factors are advertising and word of mouth (or, more generally, social network exposure). Many early studies in rural sociology indicate that “impersonal sources are most important at the awareness stage, and personal sources are most important at the evaluation stage in the adoption process” (Rogers 1962, p. 99). Similarly, studies of new drug adoption by physicians report that initial knowledge occurs mainly through commercial sources such as salespeople and direct mailings, whereas personal contacts with colleagues gain importance in later stages (Coleman et al. 1966; Peay and Peay 1984). Such findings generalize across many settings (e.g., Valente and Saba 1998).

Given these differences in effectiveness across stages, comparing the effects of mass media and commercial efforts with those of word of mouth and other social contagion processes without distinguishing between awareness and evaluation may produce misleading results. Say, for instance, that commercial media were quite important in creating awareness, and social contagion was moderately (though still sizably) important in persuading actors to adopt the innovation. When both explanatory variables are forced into a single-stage model, the weaker social contagion effect may be masked by the stronger commercial media effect, erroneously suggesting that social contagion was not at work.
Some researchers have built macro-level models incorporating two or more stages to reflect the sequential nature of the diffusion process. The empirical application and validation of such models has proven quite difficult. Some researchers use data on the number of people at each stage at each point in time to estimate separate parameters for the awareness and evaluation subprocesses (Mahajan et al. 1984; Urban et al. 1990). Such rich data, in which each stage is observed, are not always available. More often, one does not observe each subprocess, but only the final outcome, i.e. the number of adoptions. In such situations of partial observability, researchers may attempt to derive the population at various stages by decomposing the adoption data, but this often results in ill-conditioned models having too many parameters to be estimated from the available data (Silver 1984). Developers of macro-level two-stage models have therefore collapsed their model into a single-stage specification prior to estimation (Kalish 1985) or forgone empirical analysis altogether (Dodson and Muller 1978; Tapiero 1983).

Researchers also use micro-level hazard models of adoption to study diffusion processes. These models, too, represent adoption as a single event rather than as a sequenced process. Hence, studies using micro-level models may have similar difficulty in documenting weak advertising and/or contagion effects and risk confounding one with the other. In this paper we show how one can specify micro-level two-stage hazard models of adoption, both when one observes both stages of the adoption process and when one observes only the final outcome, i.e. adoption or not. We exploit several established results that have not previously been related. We first note that discrete-time hazard models are simply binary dependent variable models (Allison 1982) and, thus, that we can represent adoption hazards using models for binary sequential processes under both full and partial observability (Abowd and Farber 1982). Being binary dependent variable models, such models can be derived from random utility theory.
Using a small-scale simulation study, we show that a two-stage partial observability specification is better than a one-stage model in retrieving weak advertising or contagion effects when people really do follow a two-stage decision process. We next use the partial observability model to empirically investigate a puzzle (in Kuhn’s sense) in the diffusion literature. Recent hazard model analyses have failed to document strong social network effects in the Medical Innovation data (Coleman et al. 1966), arguably the most influential study supporting the role of social network exposure in innovation adoption (Rogers 1995). Using newly collected data on advertising and applying both one-stage and two-stage models, we come to conclusions that conflict with the traditional interpretation (Rogers 1995). We are also able to explain why other re-analyses have failed to document strong contagion.

The tight linkages we identify between (random) utility theory and hazard modeling bring together two streams of thought in diffusion research. The extension to two-stage hazard models with partial observability should prove valuable to researchers seeking a better understanding of the differential effects of advertising, social contagion, and individual-level characteristics at different stages of the adoption process. The new findings from the Medical Innovation data are of interest in themselves, as the original study and more recent analyses, especially the one by Burt (1987), have been very influential in shaping current thinking on innovation adoption and social contagion.

SINGLE-STAGE DISCRETE-TIME HAZARD MODELS OF ADOPTION

Over the last several years, empirical diffusion research has increasingly employed micro-level hazard models, also referred to as duration or event history models. These models use individual-level data about each corporate or individual actor’s characteristics and time of adoption to capture how the probability of adoption varies not only over time (the focus of macro-level models) but also across actors (e.g., Saloner and Shepard 1995; Sinha and Chandrashekaran 1992; Strang and Tuma 1993).


Diffusion applications of duration or event history analysis often use statistical models developed in engineering and economics. In those fields, events are often recorded on a very fine time scale, such as hours or even minutes for the failure time of electric components, or days for the duration of strikes. The process itself is also typically conceived as being continuous in time (i.e., operating at an infinitesimally small time scale) such that the event (failure, end of the strike, death, etc.) can occur at any point in time. When the underlying behavioral process is continuous in time and event occurrence is measured on a fine time scale, continuous-time hazard modeling is a powerful approach. These two conditions rarely apply to diffusion studies, though. Adoption behavior often is a discrete-time process, dictated by the typical weekly schedule of shopping trips by consumers and by the annual or quarterly budgeting cycles in organizations. Adoption data often come from panel designs, recording whether an actor adopted in a specific time interval (often a month or year) rather than the exact time of adoption. Applying continuous-time methods to discrete-time data violates the key assumption of accurate time measurement. This results in a systematic negative time-aggregation bias in the estimated duration dependence and may severely affect the coefficients of time-varying covariates (Bergström and Edin 1992; Petersen 1991). When the time measurement scale is too coarse compared to the true time scale along which the process takes place, many adoptions are reported to have occurred at the same time. Using continuous-time methods with such data results in grouped-data bias in coefficient and covariance matrix estimates (Cox and Oakes 1984).
If one requires that one’s statistical model both reflect the nature of the behavioral process and match the nature of one’s data, one must conclude that, in many diffusion studies, discrete-time models are to be preferred to continuous-time models.

Ignoring the presence of multiple stages in the adoption process, we represent an actor’s adoption as a transition in a two-state, discrete-time semi-Markov process. We use state 0 to denote “not having adopted” and state 1 for “having adopted.” In this paper, we assume that people do not disadopt, i.e. state 1 is an absorbing state. We can represent the transition probabilities of actor i at time t as:

$$P_{01it} \equiv P(y_{it} = 1 \mid y_{i,t-1} = 0) = G(\alpha x_{it}), \qquad P_{11it} \equiv P(y_{it} = 1 \mid y_{i,t-1} = 1) = 1, \qquad [1]$$

where

$P_{01it}$ = probability of i transiting to the absorbing “having adopted” state at time t;
$y_{it}$ = 1 if i has adopted by time t, and 0 otherwise;
$x_{it}$ = column vector of variables affecting i’s decision to adopt;
$\alpha$ = row vector of parameters to be estimated;
$G$ = a cumulative distribution function (cdf).

Analyzing adoption behavior as this Markov process has the appealing property that the hazard of adoption, i.e. the risk that one adopts at time t given that one has not adopted just prior to t, simply equals the transition probability $P_{01it}$. As a result, discrete-time hazard models can be estimated using binary dependent variable models such as the binary logit and probit (Allison 1982). Let the censoring variable $d_{it}$ be 1 if $y_{i,t-1} = 1$ or if i has dropped out of the study by time t, and $d_{it} = 0$ otherwise. Then, the log likelihood function can be written as:

$$\ln L = \sum_i \sum_t (1 - d_{it}) \left[ y_{it} \ln G(\alpha x_{it}) + (1 - y_{it}) \ln\{1 - G(\alpha x_{it})\} \right] \qquad [2]$$

Note that post-adoption observations ($y_{i,t-1} = 1$) do not contribute to the likelihood. As a result, the model is very easy to estimate: organize the data in a panel, delete all post-adoption observations, and estimate using a standard binary dependent variable specification. As with any binary logit or probit model, the argument of the cdf can be interpreted as the deterministic component of the random utility function. Hence, discrete-time hazard models capture the choice behavior of an actor maximizing a random utility function. While some authors using continuous-time models have interpreted the hazard as the limiting probability that the product meets the reservation utility level, they offer no formal derivation for this claim (e.g., Saloner and Shepard 1995). In discrete-time models, in contrast, the derivation is straightforward and the statistical model is tightly linked to the choice process it is intended to represent.
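The estimation recipe described above (organize a person-period panel, drop post-adoption rows, apply a binary-choice likelihood) can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch that assumes a probit link for G; the function names (`person_period_rows`, `hazard_loglik`), the input format (a dict of adoption periods, with `None` for actors who do not adopt within the window), and the example covariates are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, int, int]  # (actor, period t, y_it)


def std_normal_cdf(z: float) -> float:
    """Probit link: the standard normal cdf, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def person_period_rows(adoption_time: Dict[str, Optional[int]],
                       n_periods: int) -> List[Row]:
    """Expand actors into one row per period at risk (d_it = 0).

    Periods after adoption are dropped, so post-adoption observations never
    enter the likelihood. Actors with adoption_time None are treated as
    right-censored at n_periods (early drop-outs are not handled here).
    """
    rows: List[Row] = []
    for actor, t_adopt in adoption_time.items():
        last = n_periods if t_adopt is None else min(t_adopt, n_periods)
        for t in range(1, last + 1):
            rows.append((actor, t, 1 if t == t_adopt else 0))
    return rows


def hazard_loglik(rows: Sequence[Row],
                  covariates: Callable[[str, int], Sequence[float]],
                  alpha: Sequence[float],
                  G: Callable[[float], float] = std_normal_cdf) -> float:
    """Log likelihood [2]: sum of binary log-probabilities over at-risk rows."""
    ll = 0.0
    for actor, t, y in rows:
        p = G(sum(a * x for a, x in zip(alpha, covariates(actor, t))))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


if __name__ == "__main__":
    # Hypothetical example: "a" adopts in period 2, "b" never adopts,
    # "c" adopts in period 1; the observation window has 3 periods.
    rows = person_period_rows({"a": 2, "b": None, "c": 1}, n_periods=3)
    print(rows)
    # Hypothetical covariates: an intercept and a linear time trend.
    print(hazard_loglik(rows, lambda i, t: [1.0, float(t)], alpha=[-1.0, 0.2]))
```

Passing a logistic function as `G` gives the logit variant. In practice one would hand the expanded rows to any off-the-shelf probit or logit routine rather than maximize this function by hand; the point of the sketch is only that nothing beyond a standard binary-choice likelihood is needed.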

EXTENDING THE MODEL TO TWO STAGES

We can also model adoption as the outcome of a two-stage process. First, assume that actors become aware of the innovation if they are exposed to information and that information passes their perceptual threshold. Second, assume that, conditional upon being aware, actors evaluate the innovation and adopt if it meets their minimum utility threshold. Let $a_{it}$ and $e_{it}$ be binary indicators of whether i is aware and whether i evaluates the innovation highly, respectively. Hence, the probability that i adopts at time t ($y_{it} = 1$) is the probability that i is aware at time t ($a_{it} = 1$) and, being aware, evaluates the innovation highly enough to adopt ($e_{it} = 1 \mid a_{it} = 1$). We can write:

$$P[y_{it} = 1 \mid y_{i,t-1} = 0] = P[e_{it} = 1, a_{it} = 1 \mid y_{i,t-1} = 0] = P[e_{it} = 1 \mid a_{it} = 1, y_{i,t-1} = 0]\, P[a_{it} = 1 \mid y_{i,t-1} = 0] \qquad [3]$$

We can express the probability that i becomes aware at time t as a function of the volume of commercial and impersonal information he is exposed to, say $I_{it1}$. This information exposure can be a function of how much information is circulating in general, of i’s media exposure, media habits, perceived source credibility, etc., and of random shocks. Actor i becomes aware of the innovation and its (purported) benefits when his information exposure crosses his perceptual threshold, say $\pi_i$. Similarly, we express the probability that i chooses to adopt (being aware) as a function of the innovation’s perceived utility to him, say $U_{it2}$. Actor i adopts when that evaluation meets his minimum threshold, say $\varphi_i$. Finally, we assume that $I_{it1}$ and $U_{it2}$ are random variables additively composed of a deterministic element ($V_{it1}$ and $V_{it2}$) and a stochastic element ($\varepsilon_{it1}$ and $\varepsilon_{it2}$), where the stochastic elements are independently and normally distributed over i and t with mean zero and standard deviations $\sigma_1$ and $\sigma_2$, respectively. Collecting all the explanatory variables of the first and second stage in the vectors $x_{it1}$ and $x_{it2}$, respectively, such that $V_{it1} - \pi_i = \kappa x_{it1}$ and $V_{it2} - \varphi_i = \lambda x_{it2}$, and deleting all post-adoption observations, we can write:

$$\begin{aligned}
P[y_{it} = 1 \mid y_{i,t-1} = 0] &= P[e_{it} = 1 \mid a_{it} = 1, y_{i,t-1} = 0]\, P[a_{it} = 1 \mid y_{i,t-1} = 0] \\
&= P[I_{it1} \ge \pi_i]\, P[U_{it2} \ge \varphi_i] \\
&= P[V_{it1} - \pi_i \ge -\varepsilon_{it1}]\, P[V_{it2} - \varphi_i \ge -\varepsilon_{it2}] \\
&= \Phi\big((\kappa/\sigma_1)\, x_{it1}\big)\, \Phi\big((\lambda/\sigma_2)\, x_{it2}\big) \qquad [4]
\end{aligned}$$

where Φ is the standard normal distribution function, and $\sigma_1 = \sigma_2 = 1$ can be assumed without loss of generality because $\kappa$ and $\lambda$ are identified only up to scale. These derivations require that $\varepsilon_{it1}$ and $\varepsilon_{it2}$ be independently distributed across stages. This assumption follows from the interpretation of the evaluation process as being conditional on being aware. The model does not require the events of the joint process {being aware, evaluating positively} to be independent (Abowd and Farber 1982). This is important, as one can expect that an actor with very high interest in a particular area is more likely to actively seek out information about innovations in that area and, once knowledgeable about one such innovation and its purported benefits, to evaluate it highly.

Full observability

We present model specifications for two conditions: full observability, where one has data on both time of awareness and time of adoption, and the more common situation of partial observability, where one observes only the time of adoption. When both awareness and adoption are observed, and one does not consider post-adoption observations, the data comprise three possible states:


State 1: $a_{it} = 0$, $e_{it}$ not relevant (no awareness and hence no adoption).
State 2: $a_{it} = 1$, $e_{it} = 0$ (awareness, but no positive evaluation; hence no adoption).
State 3: $a_{it} = 1$, $e_{it} = 1$ (both awareness and positive evaluation; hence adoption).

Let $P_{1it}$, $P_{2it}$ and $P_{3it}$ denote the probability of i being in each state at time t ($P_{1it} + P_{2it} + P_{3it} = 1$). The log likelihood then has the following form:

$$\ln L = \sum_i \sum_t (1 - d_{it}) \left[ a_{it} e_{it} \ln P_{3it} + a_{it} (1 - e_{it}) \ln P_{2it} + (1 - a_{it}) \ln P_{1it} \right] \qquad [5]$$

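We also have

$$\begin{aligned}
P_{3it} &= P[e_{it} = 1, a_{it} = 1 \mid y_{i,t-1} = 0] = P[a_{it} = 1 \mid y_{i,t-1} = 0]\, P[e_{it} = 1 \mid a_{it} = 1, y_{i,t-1} = 0] = \Phi(\kappa x_{it1})\, \Phi(\lambda x_{it2}), && [6a] \\
P_{2it} &= P[e_{it} = 0, a_{it} = 1 \mid y_{i,t-1} = 0] = P[a_{it} = 1 \mid y_{i,t-1} = 0]\, P[e_{it} = 0 \mid a_{it} = 1, y_{i,t-1} = 0] = \Phi(\kappa x_{it1})\, [1 - \Phi(\lambda x_{it2})], && [6b] \\
P_{1it} &= P[a_{it} = 0 \mid y_{i,t-1} = 0] = 1 - \Phi(\kappa x_{it1}). && [6c]
\end{aligned}$$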

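We can then write the total log likelihood as:

$$\ln L = \sum_i \sum_t (1 - d_{it}) \Big[ a_{it} e_{it} \ln\{\Phi(\kappa x_{it1})\, \Phi(\lambda x_{it2})\} + a_{it} (1 - e_{it}) \ln\{\Phi(\kappa x_{it1})\, [1 - \Phi(\lambda x_{it2})]\} + (1 - a_{it}) \ln\{1 - \Phi(\kappa x_{it1})\} \Big] \qquad [7]$$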
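Log likelihood [7] can be evaluated directly once the data are reduced to at-risk person-periods. The minimal sketch below, in standard-library Python, assumes each record carries $(a_{it}, e_{it}, x_{it1}, x_{it2})$ with the censoring $d_{it} = 0$ already imposed; the record layout and the names `full_obs_loglik`, `Phi` and `dot` are hypothetical. Maximizing it over κ and λ (for example with a general-purpose numerical optimizer, not shown) would give the full-observability estimates.

```python
import math
from typing import Iterable, Sequence, Tuple

Record = Tuple[int, int, Sequence[float], Sequence[float]]  # (a_it, e_it, x_it1, x_it2)


def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(coefs: Sequence[float], xs: Sequence[float]) -> float:
    """Inner product of a parameter row vector and a covariate vector."""
    return sum(c * x for c, x in zip(coefs, xs))


def full_obs_loglik(records: Iterable[Record],
                    kappa: Sequence[float],
                    lam: Sequence[float]) -> float:
    """Log likelihood [7] over at-risk person-periods (d_it = 0 already imposed)."""
    ll = 0.0
    for a, e, x1, x2 in records:
        p_aware = Phi(dot(kappa, x1))   # P[a_it = 1]
        p_eval = Phi(dot(lam, x2))      # P[e_it = 1 | a_it = 1]
        if a == 0:                      # State 1: unaware, e_it irrelevant
            ll += math.log(1.0 - p_aware)
        elif e == 0:                    # State 2: aware, negative evaluation
            ll += math.log(p_aware * (1.0 - p_eval))
        else:                           # State 3: aware and adopts
            ll += math.log(p_aware * p_eval)
    return ll


if __name__ == "__main__":
    # One hypothetical record per state, intercept-only covariates.
    recs = [(0, 0, [1.0], [1.0]), (1, 0, [1.0], [1.0]), (1, 1, [1.0], [1.0])]
    print(full_obs_loglik(recs, kappa=[0.2], lam=[-0.1]))
```

With all parameters at zero each cdf equals 0.5, so the three records contribute ln 0.5, ln 0.25 and ln 0.25, which is a convenient sanity check.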

Kardes et al. (1993) used a similar model in a cross-sectional data structure to isolate the impact of pioneering on brand retrieval, consideration, and choice in a three-stage decision process.

Partial observability

In situations of partial observability, one does not observe $a_{it}$ and $e_{it}$ separately, but only their product $a_{it} \times e_{it} = y_{it}$. As a result, one cannot distinguish between State 1 (no adoption because no awareness) and State 2 (no adoption because no positive evaluation). In spite of the partial observability of the process, the probability that actor i adopts at time t is still:


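$$P[y_{it} = 1 \mid y_{i,t-1} = 0] = P[e_{it} = 1 \mid a_{it} = 1, y_{i,t-1} = 0]\, P[a_{it} = 1 \mid y_{i,t-1} = 0] = \Phi(\kappa x_{it1})\, \Phi(\lambda x_{it2}),$$

and the log likelihood equals:

$$\ln L = \sum_i \sum_t (1 - d_{it}) \left[ y_{it} \ln\{\Phi(\kappa x_{it1})\, \Phi(\lambda x_{it2})\} + (1 - y_{it}) \ln\{1 - \Phi(\kappa x_{it1})\, \Phi(\lambda x_{it2})\} \right] \qquad [8]$$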

This model was originally developed by Abowd and Farber (1982) for cross-sectional data. The model is identified when at least one variable in $x_{it1}$ or $x_{it2}$ does not also appear in the other vector. Model [8] with partial observability allows one to extend the hazard modeling framework by representing innovation adoption as a two-stage process, even when the data capture only the outcomes of the final stage. The absence of awareness data, however, comes at some cost. In models of full observability, one can impose the condition that actors who are aware never forget about the product by coding the awareness indicators such that $a_{it} = 1 \Rightarrow a_{i,t+k} = 1, \ \forall k > 0$. This is not possible with partial observability, and one must assume that actors can forget. Also, because the two cdfs enter the log likelihood function symmetrically, the only thing that distinguishes the two stages in the partial observability model is the set of covariates. Clearly, the full observability model imposes more structure on the data.
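Log likelihood [8] differs from [7] only in collapsing States 1 and 2 into a single non-adoption outcome. The minimal sketch below (standard-library Python; the names and record layout are hypothetical) evaluates [8] and makes the symmetry point concrete: swapping the stage labels together with their covariate vectors and parameters leaves the likelihood unchanged, so only the covariate sets tell the two stages apart.

```python
import math
from typing import Iterable, Sequence, Tuple

Record = Tuple[int, Sequence[float], Sequence[float]]  # (y_it, x_it1, x_it2)


def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(coefs: Sequence[float], xs: Sequence[float]) -> float:
    return sum(c * x for c, x in zip(coefs, xs))


def partial_obs_loglik(records: Iterable[Record],
                       kappa: Sequence[float],
                       lam: Sequence[float]) -> float:
    """Log likelihood [8]: only y_it = a_it * e_it is observed."""
    ll = 0.0
    for y, x1, x2 in records:
        p = Phi(dot(kappa, x1)) * Phi(dot(lam, x2))  # P[adopt at t | at risk]
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


if __name__ == "__main__":
    # Hypothetical records: stage 1 has an extra covariate (exclusion restriction).
    recs = [(1, [1.0, 0.4], [1.0]), (0, [1.0, -0.2], [1.0])]
    swapped = [(y, x2, x1) for y, x1, x2 in recs]
    print(partial_obs_loglik(recs, [0.1, 0.5], [0.3]))
    print(partial_obs_loglik(swapped, [0.3], [0.1, 0.5]))  # same value
```

Because the non-adoption term is $\ln(1 - P_{3it}) = \ln(P_{1it} + P_{2it})$, the same records scored under [7] would need the unobserved $a_{it}$ to split that term, which is exactly the information partial observability lacks.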

SIMULATION ANALYSIS

One can expect two-stage models to provide better insight than single-stage models about the effect of covariates on the timing of adoption. We investigated this through a small two-part simulation study. In the first analysis, we assessed to what extent using a single-stage model when adoptions truly result from a two-stage process makes it less likely to detect weak effects. In the second analysis, we assessed relative model performance when one of the two stages is quasi-redundant and the two-stage adoption process hardly differs from a one-stage process. In analysis 1, both stages are relevant, though we manipulate their relative importance. In analysis 2, only one stage is relevant. We created data for a population of 100 individuals observed over 20 periods using the following model:

$$\begin{aligned}
P[y_{it} = 1 \mid y_{i,t-1} = 0] &= P[a_{it} = 1 \mid y_{i,t-1} = 0]\, P[e_{it} = 1 \mid a_{it} = 1, y_{i,t-1} = 0] \\
&= P[\alpha_1 + \beta_1 A_t + \gamma_1 x_{i1} + \varepsilon_{it1} \ge 0 \mid y_{i,t-1} = 0] \\
&\quad \times P\Big[\alpha_2 + \beta_2 \textstyle\sum_j w_{ij} y_{j,t-1} + \gamma_2 x_{i2} + \varepsilon_{it2} \ge 0 \mid a_{it} = 1, y_{i,t-1} = 0\Big] \qquad [9]
\end{aligned}$$

Awareness is affected by both market-level advertising exposure, $A_t$, and an individual-level trait $x_{i1}$ capturing differences among individuals in attentiveness, media habits, etc. Evaluation, conditional upon awareness, is affected by social network exposure ($\sum_j w_{ij} y_{j,t-1}$) and a second individual-level trait $x_{i2}$. The variables $x_{i1}$ and $x_{i2}$ follow a standard normal distribution, are independent across individuals, and are moderately correlated with each other (r = 0.5). The random error terms follow a standard normal distribution and are independent. We set advertising to be concave in time as $A_t = 1 - \exp\{-0.15t\}$. This corresponds to a situation in which advertising volume decreases exponentially over time and consumers are affected by both current and past advertising. We operationalized social network exposure using a lagged network autocorrelation structure. We define the extent of social influence i is subject to at time t as a function of whether other actors j have adopted previously (indicated by $y_{j,t-1}$) and how important each actor j is to i (indicated by the social weight $w_{ij}$). The extent of social network exposure i is experiencing can then be expressed as a lagged network autocorrelation variable $\sum_j w_{ij} y_{j,t-1}$ (e.g., Marsden and Podolny 1990; Midgley et al. 1992; Strang 1991). The actual social contagion, i.e. influence on adoption behavior, is then $\beta_2 \sum_j w_{ij} y_{j,t-1}$.¹

Our population consists of 100 individuals located in four cities of 25 actors each, each individual having either two or three relevant sources of social influence. There is no cross-city communication. Each city has a “star” structure: everyone is influenced by the same three opinion leaders and by no one else, except for the three opinion leaders themselves, who have only two influencers, i.e. the two other opinion leaders in their city. Each opinion leader is equally influential, so the weights are .50 if i is a leader and .333 if he is not. We assigned opinion leadership independently of the individual covariates $x_{i1}$ and $x_{i2}$. We believe this set-up slightly favors the detection of network effects over advertising effects because social network exposure varies over a somewhat greater range (0-1 vs. 0.14-0.95) and varies not only over time but also over individuals. We limited our analysis to situations with partial observability.

Both stages relevant (Analysis 1)

We first analyzed three conditions: (1) strong advertising effects/weak social contagion, (2) weak advertising effects/strong social contagion, and (3) both effects moderately and equally strong. We chose the values of the β’s to represent different relative sizes of advertising and contagion effects, kept the values of the γ’s constant, and set the intercepts α to values that tend to generate 75 or more adoptions by period 20. The specific parameter values we used were:

¹ This specification resembles the recent work by Putsis et al. (1997), but the models are very different in research purpose and assumptions. We use data on interaction patterns ($w_{ij}$) and test whether there was social influence on time of adoption (captured by the parameter β). Putsis et al., in contrast, assume there is social contagion and then try to infer the interaction patterns in the population. Our model uses data on interaction patterns and does not impose any structure on the $w_{ij}$. The model by Putsis et al., in contrast, organizes actors in blocks (e.g., countries) and then assumes a heterogeneous random mixing structure in which all members of a block have the same interaction pattern. We are not aware of any work extending their type of model to micro-level data.


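Condition 1: $\alpha_1 = -2,\ \beta_1 = 3,\ \gamma_1 = 1;\quad \alpha_2 = 0,\ \beta_2 = 1,\ \gamma_2 = 1$

Condition 2: $\alpha_1 = 0,\ \beta_1 = 1,\ \gamma_1 = 1;\quad \alpha_2 = -2,\ \beta_2 = 3,\ \gamma_2 = 1$

Condition 3: $\alpha_1 = -1.25,\ \beta_1 = 2,\ \gamma_1 = 1;\quad \alpha_2 = -1.25,\ \beta_2 = 2,\ \gamma_2 = 1$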
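The data-generating process [9], combined with the star network described above, can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the authors' simulation code: the placement of opinion leaders (here, the first three members of each city), the construction of the correlated traits, and the function names are hypothetical choices consistent with the description in the text. The example call uses the Condition 1 parameter values.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Weights = Dict[int, Dict[int, float]]  # w[i][j] = social weight of j for i


def star_network(n_cities: int = 4, city_size: int = 25,
                 n_leaders: int = 3) -> Weights:
    """Followers weight all leaders in their city equally; leaders weight the
    other leaders equally; no ties across cities. Leader positions (the first
    n_leaders members of each city) are an illustrative assumption."""
    w: Weights = {}
    for c in range(n_cities):
        members = range(c * city_size, (c + 1) * city_size)
        leaders = list(members)[:n_leaders]
        for i in members:
            sources = [j for j in leaders if j != i]
            w[i] = {j: 1.0 / len(sources) for j in sources}
    return w


def simulate(params: Tuple[float, ...], n_periods: int = 20,
             seed: int = 0) -> List[Optional[int]]:
    """One event history under [9]; returns each actor's adoption period or None."""
    a1, b1, g1, a2, b2, g2 = params
    rng = random.Random(seed)
    w = star_network()
    n = len(w)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Standard normal x2 with corr(x1, x2) = 0.5.
    x2 = [0.5 * x1[i] + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for i in range(n)]
    adopt: List[Optional[int]] = [None] * n
    for t in range(1, n_periods + 1):
        A_t = 1.0 - math.exp(-0.15 * t)                    # advertising exposure
        prev = [adopt[j] is not None for j in range(n)]    # y_{j,t-1}
        for i in range(n):
            if prev[i]:
                continue                                   # no longer at risk
            exposure = sum(wij for j, wij in w[i].items() if prev[j])
            aware = a1 + b1 * A_t + g1 * x1[i] + rng.gauss(0.0, 1.0) >= 0
            positive = a2 + b2 * exposure + g2 * x2[i] + rng.gauss(0.0, 1.0) >= 0
            if aware and positive:                         # only y_it is recorded
                adopt[i] = t
    return adopt


if __name__ == "__main__":
    condition_1 = (-2.0, 3.0, 1.0, 0.0, 1.0, 1.0)
    history = simulate(condition_1, seed=42)
    print(sum(t is not None for t in history), "adopters by period 20")
```

Drawing the evaluation shock even for unaware actors is harmless here: the two error terms are independent, so the adoption probability is still the product in [9]. Exposure is computed from adoptions recorded before period t, which keeps the network term lagged.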

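We performed 50 replications in each condition. To each of the resulting event histories, we fit two models:

True two-stage model:

$$\Phi\big(\alpha_1 + \beta_1 A_t + \gamma_1 x_{i1}\big)\, \Phi\Big(\alpha_2 + \beta_2 \textstyle\sum_j w_{ij} y_{j,t-1} + \gamma_2 x_{i2}\Big)$$

One-stage model:

$$\Phi\Big(\alpha_0 + \beta_1 A_t + \gamma_1 x_{i1} + \beta_2 \textstyle\sum_j w_{ij} y_{j,t-1} + \gamma_2 x_{i2}\Big)$$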
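Both fitted models are probit-type specifications, so their coefficients are identified only relative to the standard deviation of the error term. Assuming, for illustration only, that misspecification acts purely by inflating that standard deviation to some σ > 1 (a simplification; real misspecification can also distort the index), the awareness part of the fitted index becomes

$$\Pr[\alpha + \beta_1 A_t + \gamma_1 x_{i1} + u_{it} \ge 0] = \Phi\!\left(\frac{\alpha}{\sigma} + \frac{\beta_1}{\sigma} A_t + \frac{\gamma_1}{\sigma} x_{i1}\right), \qquad u_{it} \sim N(0, \sigma^2),$$

$$\text{so that} \qquad \frac{\beta_1/\sigma}{\gamma_1/\sigma} = \frac{\beta_1}{\gamma_1}.$$

Each individual coefficient shrinks toward zero by the factor 1/σ, while the ratio is unaffected. Under this simplification, a systematic departure of an estimated ratio from its true value cannot be attributed to rescaling alone.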

We use the ratio of parameter estimates to assess how well the different models recover the true parameter values. Remember that the parameters in our models are identified only up to a scaling of the error variance. Because the actual error variance in a misspecified single-stage model can be expected to be larger than the assumed variance of 1, the estimates in the single-stage model can be expected to be too small in absolute value, and one cannot infer inconsistency from the parameter estimates themselves. Using the ratio of the coefficients eliminates the scaling problem (Fischer and Nagin 1981). We are interested not only in whether erroneous use of a single-stage model provides incorrect parameter point estimates, but also in whether that use increases the likelihood of failing to reject the null hypothesis for weak effects. We therefore also report how often the estimation results indicate that advertising, social contagion, and trait effects are significantly greater than zero (based on a two-sided test).

Table 1 presents the results. We draw three conclusions. First, as expected, the estimation procedure is able to recover the parameter values for the two-stage model in all three conditions. We also compared the average estimated standard error with the standard deviation of the point estimates, and found the two to be close as well (details not presented here). Second, the log likelihoods indicate that using the two-stage model leads to better descriptive fit than using a single-stage model, again as expected. Third, the two-stage model recovers weak effects better than the single-stage model:

- In condition 1, both models recover the strong advertising effect, but the two-stage model recovers the weaker contagion effect better (true β2/γ2 = 1; estimates 0.91 vs. 0.70) and also finds it significant more often (p ≤ .10: 50% vs. 22%).
- In condition 2, both models recover the strong contagion effect, but the two-stage model recovers the weaker advertising effect better (true β1/γ1 = 1; estimates 0.99 vs. 0.69) and also finds it significant more often (p ≤ .10: 24% vs. 12%).
- In condition 3, where both advertising and contagion effects are sizable, both models recover these effects well. Using the correct two-stage model may still lead to better statistical inference, though.

Our main finding here is that erroneously ignoring the true sequential structure leads to poorer descriptive fit and also impairs one’s ability to detect weak effects in one stage in the presence of strong effects in another stage.

[ Table 1 about here ]

Only one stage relevant (Analysis 2)

Having found some evidence of the benefits of using a two-stage model, one can ask whether the one-stage specification is “good enough” when one of the two stages is all but redundant and the adoption process hardly differs from a one-stage process. How do the two models compare when all individuals always tend to have a very positive evaluation and to adopt as soon as they are aware? Or when everyone always tends to be aware of the product as soon as it is launched? When one of the stages hardly matters, one might expect the one-stage model to do as well in fit and parameter recovery as the true two-stage specification. We created two additional conditions to reflect these situations. In condition 4, evaluation is positive for the great majority of i and t. Hence, it hardly matters and awareness is the only relevant stage. In condition 5, the reverse holds: awareness is pervasive and only differences in evaluation across actors and over time matter. For the “relevant” stage, we used the same values as in the “strong effects” conditions analyzed earlier. For the “quasi-redundant” stage, we chose parameter values such that the threshold was crossed for 75%-95% of the individuals at almost any time. The specific parameter values we used were:

Condition 4: $\alpha_1 = -2,\ \beta_1 = 3,\ \gamma_1 = 1;\quad \alpha_2 = 1,\ \beta_2 = 0.5,\ \gamma_2 = 0.5$

Condition 5: $\alpha_1 = 1,\ \beta_1 = 0.5,\ \gamma_1 = 0.5;\quad \alpha_2 = -2,\ \beta_2 = 3,\ \gamma_2 = 1$

As before, we created 50 replications in each condition and fit both the two-stage and the one-stage model to the resulting data. As is to be expected with over-parameterized models and ill-conditioned estimation problems, a few cases resulted in rather extreme estimates of parameters in the “redundant” stage (e.g., social network exposure and trait x2 in condition 4). Because the median is robust to outliers, we report median rather than mean parameter estimates. Table 2 presents the results. We draw three conclusions. First, the two-stage model again tends to recover the true parameter values. Second, the very small difference in average log likelihoods indicates that using the two-stage model does not lead to better descriptive fit than using a single-stage model when only one stage really matters. The third conclusion concerns the relative ability of the two models to document the true effects:

- In both conditions, both models recover the strong effects.
- In both conditions, however, the one-stage model is more likely to produce statistically significant effects for variables in the quasi-redundant stage.

[ Table 2 about here ]

Our main conclusion from this second analysis is that, when the adoption process really has only one salient stage, (1) the one-stage and the two-stage models fit about as well, (2) both models recover the relevant effects about as well, but (3) the one-stage model is more likely to produce significant effects for variables operating at the redundant stage. The general conclusion from both analyses is that the two-stage model tends to provide sound evidence of the effects of factors affecting adoption behavior, including when the effects truly are negligible.

APPLICATION TO THE MEDICAL INNOVATION PUZZLE

Background

In this section we apply our modeling approach to a classic and now controversial account of social contagion in the marketplace: Medical Innovation by Coleman, Katz and Menzel (1966). This classic study of physicians' decisions to adopt new drugs is credited with establishing that diffusion is a social process in which people's adoption of an innovation is driven by social contagion, more specifically by influence from others who have already adopted (Rogers 1995). But the study is of more than historical interest. Its data on the diffusion of tetracycline, a broad-spectrum antibiotic, have become "a strategic research site for testing new propositions of how social structure drives contagion" (Burt 1987, p. 1301) and for assessing the performance of new modeling techniques (Marsden and Podolny 1990; Strang and Tuma 1993; Valente 1996). Hence, Medical Innovation is an important piece of evidence on the effect and nature of social influence in individuals' and organizations' decisions to adopt new products, practices and ideas. Recent re-analyses of the data, however, have found social contagion effects to be rather small (Burt 1987), sensitive to model specification (Strang and Tuma 1993), or even insignificant (Marsden and Podolny 1990). It is possible, however, that the social contagion effects, while present, were too subtle to be captured by traditional one-stage models.

The market for tetracycline

Medical Innovation is a study of the adoption of a new antibiotic drug called tetracycline by 125 physicians in four cities in Illinois. When Lederle launched the first tetracycline-based product in November 1953, three other broad-spectrum antibiotics were already on the market: Lederle had introduced chlortetracycline in 1948, Parke-Davis had introduced chloramphenicol in 1949, and Pfizer had introduced oxytetracycline in 1950. Tetracycline had product characteristics typically associated with rapid diffusion:

• Low complexity. Tetracycline was the newest in an already established family of drugs, and an undramatic pharmaceutical innovation (Coleman et al. 1966).

• Compatibility. Physicians were favorably disposed toward the pharmaceutical industry, its new products, and its efforts to market them (Caplow and Raymond 1954). Enthusiasm was particularly strong for broad-spectrum antibiotics (Peterson et al. 1956).

• Trialability and observability of results. Broad-spectrum antibiotics were generally used to treat acute rather than chronic conditions. Because of the short time between treatment and outcome, a physician could easily and quickly determine their efficacy in any particular case, and adjust the therapy if necessary (Coleman et al. 1966, p. 17).

• Relative advantage. Tetracycline produced fewer side effects than the other three broad-spectrum antibiotics (Pearson 1969). Tolerance and side effects had become an important issue by the time tetracycline was launched. In the summer of 1952, side effects of Chloromycetin (Parke-Davis's chloramphenicol) had received wide press coverage. Its share of the broad-spectrum antibiotics
market had declined to only 5% in October 1952, down from 38% four months earlier. In September 1953, two months before Lederle's launch of tetracycline, Chloromycetin's share was still at a low of 10% (Fortune 1953). As a result, physicians were "eagerly awaiting Achromycin [Lederle's tetracycline], especially since it was reported to be effective over a wider range and to be somewhat better tolerated" (Ben Gaffin and Associates 1956, p. 784).

In brief, there may be little reason to expect social contagion to have been a dominant driver. To the extent that there was little risk in prescribing tetracycline, information from previous adopters should not have affected physicians' evaluation of the drug. Since tetracycline was merely the newest in an already established family of drugs and an undramatic innovation, it is also questionable whether adoption markedly enhanced physicians' status among their peers. What could have been a major driver, in contrast, are the marketing efforts expended by the pharmaceutical companies commercializing the product, especially Lederle.

Tetracycline did not enjoy exclusive patent protection. Five companies had the right to market the drug: Lederle, Pfizer, Bristol, Squibb and Upjohn (Federal Trade Commission 1958). These five firms accounted for more than half of all antibiotics sold in the U.S. in 1950 and all had a good reputation with the medical community (FTC 1958). Lederle, the first entrant, deployed a very aggressive marketing program. Broad-spectrum antibiotics enjoyed the largest promotional budgets in the pharmaceutical industry (FTC 1958). Even by those standards, Lederle's marketing efforts for its tetracycline brand Achromycin were exceptional. As a detail man remarked, "Lederle was interested in bombarding physicians with the Achromycin name and we did just that and got the name across. We swamped them with Achromycin" (FTC 1958, p. 130). Lederle used a vast array of marketing efforts. Coleman, Katz and Menzel (1966, pp. 44 and 181) mention the "blanket exposure of all doctors to the detail man." Lederle's direct mail budget for tetracycline permitted 105 mailings, an average of two per week, to every physician in the United States during the product's first year on the market. Medical journal advertising for the first twelve months consisted of 26 insertions in the Journal of the American Medical Association (JAMA), and monthly insertions in the widely circulated Modern Medicine and Medical Economics, as well as in all state journals, 116 county journals, and most specialty journals (FTC 1958). Tetracycline also received wide publicity in the professional media (Ben Gaffin 1956).

Pfizer entered second and was much less aggressive. Fearing that strongly promoting its own brand of tetracycline, Tetracyn, would undercut its sales of oxytetracycline, Pfizer relegated the marketing of Tetracyn to J.B. Roerig and Company, a small, money-losing subsidiary it had just acquired in 1953. Only in January 1955, possibly alarmed by Lederle's tremendous success, did Pfizer start to market Tetracyn through its main sales organization (Mines 1978). We have no detailed information on the other three players' marketing strategies, but they were less aggressive than Lederle (Pearson 1969). Tetracycline was aggressively priced: though it was superior to other broad-spectrum antibiotics, it sold at parity (FTC 1958). To the extent that physicians took price into consideration in their prescription behavior, this would have favored rapid adoption. In sum, tetracycline was marketed by a small group of companies enjoying a solid reputation. The first entrant deployed a very intensive marketing campaign. The product also enjoyed a large amount of free publicity and was competitively priced. Such a market environment is conducive to rapid initial diffusion (Bauer 1961; Hahn et al. 1994; Robertson and Gatignon 1986), without physicians having to wait to be convinced or compelled by social contagion.

Overall, tetracycline's product characteristics and the way it was marketed do not make a case for strong contagion effects. Table 3, reconstructed from original reports on the Medical Innovation study, shows that physicians did not consider colleagues an important source of either information or influence. Marketing efforts were considered much more important. Note, however, that the relative importance of colleagues increases markedly from early to later stages of the decision process. So, even though the above discussion suggests that most physicians, once aware, would evaluate the product positively and adopt, Table 3 suggests that social contagion might nevertheless have been present, albeit weak and limited to the evaluation stage of the adoption process. If so, one-stage models might have been unable to detect these effects, as illustrated in the simulation analyses. Hence, the inability of earlier hazard model applications to detect social contagion effects (Marsden and Podolny 1990; Strang and Tuma 1993) might have been due to the nature of the model rather than the data. As our simulation analysis shows, two-stage models might produce richer insights.

[ Table 3 about here ]

Data

Coleman, Katz and Menzel (1966) provide a detailed description of the population, the sample, and data collection procedures. Burt (1986) placed the portion of the original data set that we use in the public domain. Since the data are accessible for public scrutiny, we limit the discussion to the variables we used or constructed for our own analysis.

Physician characteristics. We included five covariates to account for heterogeneity in physicians' tendency to adopt early. Professional age measures (on a 1-6 scale) how long ago the physician graduated.
We included both a linear and a quadratic term to account for a possible inverse U-shaped relationship between professional age and adoption proneness: compared to mid-career physicians, older physicians may be more conservative and very inexperienced physicians more risk-averse. We mean-centered age to avoid extreme collinearity. We used the number of journals a physician receives or subscribes to as a measure of media exposure. Journals included both newsletters sent by pharmaceutical companies and scientific and professional publications. We used the logarithm to reflect decreasing returns to scale. We expected physicians holding a chief or honorary position in their hospital, captured as a dummy variable, to be less involved in actual medical practice than active or regular staff, and hence to adopt later. We also included an attitudinal measure: scientific orientation was coded 1 if the physician agreed with the statement that it is more important for a physician to "keep himself informed of new scientific developments [rather than to] devote more time to his patients," and 0 otherwise. We expected scientific orientation to affect both awareness and evaluation, number of journals to affect awareness only, and the other characteristics to affect evaluation only. We also estimated models including the number of nominations a physician received as advisor or as discussant as measures of status. Although sociometric status figures prominently in the analysis by Coleman and his associates, it did not contribute significantly to model fit or change the coefficients of the contagion variables once we controlled for the number of journals received. Burt (1987), Marsden and Podolny (1990) and Strang and Tuma (1993) reported similar findings. It appears that opinion leaders adopted early as a result of their cosmopolitan perspective and media habits rather than out of pressure to maintain their status among their colleagues. We report results only for models excluding sociometric status variables.

Seasonal effects. We included a seasonal dummy variable for the summer months July and August. We expected fewer adoptions of a new antibiotic in these two months because the weather is milder and schools are closed, limiting the spread of contagious diseases, and thus the demand for antibiotics (Cliff et al. 1981).
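The reason for mean-centering professional age before adding its square can be checked numerically. The sketch below uses a hypothetical sample with one physician at each point of the 1-6 scale (not the study's data) and compares the correlation between age and age squared before and after centering; the helper `pearson` is written out only to keep the example dependency-free.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical sample: one physician at each point of the 1-6 professional-age scale.
age = [1, 2, 3, 4, 5, 6]
raw_r = pearson(age, [a ** 2 for a in age])

# Center on the sample mean, then square the centered value for the quadratic term.
mean_age = sum(age) / len(age)
centered = [a - mean_age for a in age]
centered_r = pearson(centered, [c ** 2 for c in centered])

print(f"corr(age, age^2) raw: {raw_r:.3f}, centered: {centered_r:.3f}")
```

The raw correlation is close to 1, while the centered one is essentially 0. Centering removes the correlation exactly only when ages are distributed symmetrically around their mean; with skewed data it reduces, rather than eliminates, the collinearity.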


Advertising volume. The Medical Innovation data do not contain information on the amount of marketing effort targeted towards the physicians whose prescriptions were tracked. We use the number of advertising pages in three leading advertising outlets, Modern Medicine, Medical Economics and GP, as our measure of marketing effort. These three publications were preferred by pharmaceutical advertisers and widely read by physicians (Ben Gaffin 1953, 1956). Our attempts to collect data on the number of ads appearing in JAMA were unsuccessful, as librarians had removed the advertising supplement before binding the issues for storage. We distinguish between the marketing efforts by the first entrant, Lederle, and those of the later entrants, for two reasons. First, the first entrant's marketing efforts are often more effective than those of later entrants when the latter do not offer an important therapeutic advantage (Bond and Lean 1977; Hurwitz and Caves 1988; Shankar et al. 1998). Second, Lederle had a very large sales force and was strongly committed to aggressively building a dominant position, while other companies were less well endowed and less aggressive. We matched the number of ads in each issue to the 4-week sampling periods in the data set prepared by Burt (1986). Because the data are monthly observations and previous research in the pharmaceutical industry documents sizable spillover effects over time (Berndt et al. 1997; Montgomery and Silk 1972; Rangaswamy and Krishnamurthi 1991), we expected the effects of marketing communication to span multiple periods. We therefore constructed measures of the depreciation-adjusted stock of marketing effort (Berndt et al. 1997; Kalish and Lilien 1986; Rizzo 1999). Let mt be the amount of advertising in month t (in hundreds of pages), and let δ be the monthly decay rate (0 ≤ δ ≤ 1). The stock of marketing effort Mt is then defined as:

$$M_t = m_t + (1-\delta)\,M_{t-1} = \sum_{\tau=0}^{t} (1-\delta)^{t-\tau}\, m_\tau. \qquad [10]$$
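Equation [10] can be computed either recursively or as the explicit weighted sum; the two forms agree once the stock is assumed to be zero before the first period (M at t = -1 equal to 0, our assumption). The sketch below computes both for a hypothetical advertising series, not the study's data, using δ = .25, the value selected by the grid search reported in the estimation section. Function names are illustrative.

```python
def marketing_stock_recursive(m, delta):
    """Recursive form of [10]: M_t = m_t + (1 - delta) * M_{t-1}, starting from M_{-1} = 0."""
    stock, prev = [], 0.0
    for m_t in m:
        prev = m_t + (1.0 - delta) * prev
        stock.append(prev)
    return stock

def marketing_stock_closed_form(m, delta):
    """Closed form of [10]: M_t = sum over tau = 0..t of (1 - delta)^(t - tau) * m_tau."""
    return [sum((1.0 - delta) ** (t - tau) * m[tau] for tau in range(t + 1))
            for t in range(len(m))]

# Hypothetical ad pages per 4-week period, in hundreds of pages (not the study's data).
ad_pages = [0.0, 1.2, 0.8, 0.5, 0.5, 0.3]
delta = 0.25

recursive = marketing_stock_recursive(ad_pages, delta)
closed = marketing_stock_closed_form(ad_pages, delta)
print([round(v, 3) for v in recursive])
print([round(v, 3) for v in closed])
```

The recursive form is the cheaper one to compute over a long series; the closed form makes explicit that an ad placed τ periods ago carries weight (1 - δ) raised to the power τ. One such series would be built for Lederle and one for all other competitors combined.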

We constructed one such variable for Lederle and one for all other competitors combined, assuming the same decay parameter δ across companies. We have not been able to locate data on the amount of detailing effort by the various companies marketing tetracycline. We do not believe this to be a problem: detailing effort and journal advertising are so highly correlated in pharmaceutical markets that either variable can be used to represent overall marketing effort (Berndt et al. 1997; Gatignon et al. 1990; Lilien et al. 1981; Rangaswamy and Krishnamurthi 1991; Rizzo 1999). We do not include an interaction effect between marketing effort and the number of journals received, for two reasons. First, the number of journals received includes many in-house publications by other pharmaceutical companies that would not carry ads by Lederle. Second, though we operationalize marketing effort using advertising data, true marketing effort also includes detailing and direct mail, whose effects do not depend on journals received.

Contagion variables. The great appeal of the Medical Innovation data is that they contain information on which doctors talked to which other doctors. Hence, they allow one to measure the social network exposure of each physician separately, as we did in the simulation. Coleman and his associates interviewed 228 physicians. Each physician was asked to name up to three other physicians with whom he discussed medical practice, and up to three physicians from whom he sought advice about medical practice. However, Coleman and his associates collected prescription data only for general practitioners (N=125), not for specialists. There are two approaches to this missing-data problem. Burt (1987) and Marsden and Podolny (1990) assumed that the adoption of tetracycline by specialists affected generalists' decisions to adopt, and consequently imputed adoption dates for specialists (though they analyzed only the generalists' adoptions).
In contrast, Coleman et al. (1966) and Strang and Tuma (1993) assumed that generalists did not take specialists' adoption behavior into account, and did not consider the latter's missing adoption data when constructing the social influence variables. We use both approaches, as the extant literature suggests that different assumptions about the effects of specialists on generalists may explain why re-analyses did or did not find evidence of social contagion (Strang and Tuma 1993). We constructed two types of exposure variables, each assuming a different influence mechanism represented by the wij weights. Direct ties indicates whether i nominated j as an interaction partner for advice or discussion. This measure captures information available from colleagues with whom i interacts directly. We also constructed a measure of structural equivalence, capturing whether i might mimic j out of fear of losing out in the competition for status. We operationalized structural equivalence as the proportion of exact matches between two physicians' sets of relationships with third parties: the more their portfolios of relationships overlap, the higher the weight they give to one another. Appendix A details how we constructed the network exposure variables. We also used network exposure variables constructed by Burt: one capturing word-of-mouth operating over direct ties and one capturing competition for status between structurally equivalent physicians. Burt incorporated specialists' imputed adoption data in his exposure variables. He also used different operationalizations of the influence weights than we did. Because of imputation problems, Burt could not compute influence from structurally equivalent colleagues for seven physicians. For one of these seven, Burt was also unable to compute a measure of social cohesion influence. For these few physicians, we substituted our measures of influence through structural equivalence and direct ties for Burt's missing values. Table 4 presents descriptive statistics.


[ Table 4 about here ]

Estimation and Specification Tests

After constructing the variables, we deleted four physicians from the estimation sample due to missing covariates. The data set for estimation contains 17 monthly observations for 121 individuals, 105 of whom had adopted by the last observation period. We estimated single-stage and two-stage partial observability hazard models by maximum likelihood. We estimated the marketing effort decay parameter δ using a grid search (cf. Berndt et al. 1997) in a model featuring no social network exposure variables. A value of .25 led to the highest model likelihood, and model fit was not very sensitive to values of δ between .15 and .30. In subsequent analyses of models featuring both marketing effort and social network exposure, we kept δ fixed at .25. We checked for unobserved heterogeneity in both probit and logit single-stage models. In the probit models, we estimated a normal mixture while allowing the base hazard to vary freely every three months (cf. Han and Hausman 1990). In the logit models, we used the score tests developed by Hamerle (1990) and Commenges et al. (1994). None of the tests suggested the presence of significant unobserved heterogeneity (p > 0.10). We did not develop tests for unobserved heterogeneity for the two-stage probit model. To avoid redundancy, we present the results for the probit specifications only and omit the test statistics for unobserved heterogeneity.

Results

Table 5 reports the results for the single-stage models. The coefficients of all physician characteristics are significant at 90% confidence or higher. Age does not have the expected positive sign, though: contrary to our expectation, younger physicians did not delay adoption. The summer dummy has the expected negative sign, but is at most marginally significant.
As expected from our situational analysis, marketing efforts by Lederle seem to have affected physicians more than peer influence or marketing efforts by later entrants, neither of which shows a significant effect. When we re-estimated the models without the advertising variables, the social effects became significant, but the log likelihoods were markedly lower. This indicates that Lederle's marketing effort was the dominant driver.

[ Table 5 about here ]

As we argued earlier, it is possible that advertising and other marketing efforts mostly affected the probability of physicians becoming aware of the existence and purported benefits of the product, while physicians turned to their peers mostly to evaluate the merits of adoption. When both advertising and network effects truly operate but the former are much stronger, forcing both into a single-stage model may lead to the erroneous conclusion that only advertising mattered. We therefore repeated the analysis using the two-stage approach. In the first stage, awareness, we included three variables. Two were actor attributes reflecting physicians' media exposure and likelihood of searching out information on new drugs and treatments: the number of journals received and scientific orientation (i.e., importance attached to keeping oneself informed of new scientific developments). The third variable was Lederle's advertising. We did not include others' advertising because doing so led to convergence problems and extreme point estimates with very large standard errors, without improving model fit. Both problems indicate ill-conditioning, i.e., the set of covariates is too rich relative to the information in the data. In the second stage, the set of covariates includes social network exposure, variables reflecting the need to prescribe antibiotics (summer and chief), and variables associated with one's willingness to try a new drug (scientific orientation and professional age).

Table 6 reports the results for the two-stage models. The key finding is that none of the variables operating at the evaluation stage had a significant effect: we are not able to explain any variance in the evaluation stage. Just as in Table 5, the pattern holds across social contagion mechanisms. Note also that explicitly separating the two stages did not result in better model fit (Table 6 vs. Table 5). The results allow us to separate the probability of adoption into the probability that a possible adopter is aware, i.e. Φ(κxit1), and the probability that a possible adopter who is aware evaluates the product positively, i.e. Φ(λxit2). In the first month, for instance, 11 out of 121 physicians adopted. To decompose this hazard of .09 into awareness and evaluation, we evaluate both cdfs at the sample mean values of the time-invariant physician characteristics and at the appropriate values for the time-varying covariates. Our estimates for the two structural equivalence models indicate that the .09 hazard of adoption is the product of a .10 chance of being aware and a .86 chance of adopting given awareness. The direct influence model using our own measure of social network exposure through direct ties generates the same decomposition, whereas the one using Burt's measure generates probabilities of .15 and .57 (whose product is also about .09). Hence, three of the four models indicate that physicians adopted almost as soon as they became aware of the product. Overall, the results, including the fact that second-stage covariates are significant only in the single-stage model, are similar to what we observed in our simulation study in condition 4, where evaluation hardly mattered. Our results do not mimic simulated condition 1, where social network exposure has a small effect that can be swamped in single-stage models. Rather, evaluation appears to have been irrelevant: physicians tended to adopt as soon as they were aware of the existence and benefits of tetracycline.
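The first-month decomposition is plain arithmetic. This snippet only reproduces it from the figures reported in the text: both reported pairs of probabilities multiply back to roughly the observed hazard of 11/121, and the small gaps reflect rounding of the published probabilities.

```python
# First-month figures reported in the text.
adopters, at_risk = 11, 121
hazard = adopters / at_risk

# Reported (P(aware), P(adopt | aware)) pairs from the two-stage models.
decompositions = {
    "structural equivalence and own direct-ties models": (0.10, 0.86),
    "model with Burt's direct-ties measure": (0.15, 0.57),
}

for label, (p_aware, p_eval) in decompositions.items():
    implied = p_aware * p_eval  # two-stage hazard = awareness x evaluation
    print(f"{label}: {p_aware} x {p_eval} = {implied:.3f} (observed {hazard:.3f})")
```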
[ Table 6 about here ]

Discussion

Our results about the absence of network effects once one controls for advertising not only contradict the received view of strong network effects in Medical Innovation (Rogers 1995), but also explain the "weak" results obtained more recently by Marsden and Podolny (1990) and Strang and Tuma (1993). Marsden and Podolny estimated a Cox proportional hazard model, which is very similar to a discrete-time logit hazard model with a dummy variable for each time period. These dummies capture all cross-temporal variation in the mean adoption hazard, and leave only variance within particular time periods to be explained by network exposure. Strang and Tuma incorporate lagged penetration as a covariate, besides lagged network terms. Lagged penetration assumes that any physician interacts with any other physician (i.e., a constant wij for all i and j). Thus it ignores network structure but captures the cross-temporal variation in average network exposure. Similarly, our marketing variables vary over time but not across physicians. Hence, all three studies show that differences in adoption across physicians within any particular time period are not statistically significantly associated with differences in lagged social network exposure.2 Our study, however, is the only one to provide an explanation for this finding grounded in a detailed situational analysis.
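The point about period dummies can be illustrated with a linear-projection intuition: including one dummy per period removes each covariate's period mean, so a covariate that varies over time but not across physicians within a period (like the marketing stocks) has nothing left to explain, while network exposure keeps its cross-sectional variation. The sketch below computes within-period deviations directly on hypothetical person-period rows. It is an analogy based on linear least squares, not a re-estimation of the Cox or logit models discussed above, and the function name is ours.

```python
from collections import defaultdict

def within_period_deviations(rows, key):
    """Deviation of `key` from its period mean: the part of a covariate that a full set
    of period dummies would leave unexplained in a linear model."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        totals[r["t"]] += r[key]
        counts[r["t"]] += 1
    return [r[key] - totals[r["t"]] / counts[r["t"]] for r in rows]

# Hypothetical person-period rows: ad stock is common to all physicians in a period,
# while network exposure differs across physicians.
rows = [
    {"t": 1, "id": "a", "ad_stock": 1.2, "exposure": 0.0},
    {"t": 1, "id": "b", "ad_stock": 1.2, "exposure": 0.5},
    {"t": 1, "id": "c", "ad_stock": 1.2, "exposure": 1.0},
    {"t": 2, "id": "a", "ad_stock": 1.7, "exposure": 0.5},
    {"t": 2, "id": "b", "ad_stock": 1.7, "exposure": 0.5},
    {"t": 2, "id": "c", "ad_stock": 1.7, "exposure": 1.0},
]

ad_dev = within_period_deviations(rows, "ad_stock")
exp_dev = within_period_deviations(rows, "exposure")
print("largest within-period deviation, ad stock:", max(abs(d) for d in ad_dev))
print("largest within-period deviation, exposure:", max(abs(d) for d in exp_dev))
```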

2 We thank Keith Ord and David Krackhardt for raising the issue of cross-temporal versus cross-sectional variation.

CONCLUSION

This paper offers three results:

• Discrete-time hazard models can be expanded to two-stage models separating the effect of causal variables on awareness and evaluation, even when one only observes the final outcome of the adoption decision process. A small-scale simulation study indicates that two-stage models are better at detecting weak advertising and social contagion effects;

• Discrete-time hazard models, both single-stage and two-stage variants, can be derived from random utility theory;

• The Medical Innovation data do not document that diffusion is driven by contagion operating
over social networks. Social contagion effects have been confounded with marketing effects. Even our more fine-grained two-stage models are unable to detect significant contagion effects once marketing effort is controlled for.

We hope that the first two results will contribute to tighter linkages between theoretical models specifying how diffusion processes operate and the statistical hazard models used to investigate these processes empirically. We see many avenues to adapt and further develop the simple models we presented. The single-stage models can easily be extended to allow for disadoption. Let the probability of adoption be as in equation (1):

$$P(y_{it} = 1 \mid y_{i,t-1} = 0) = G(\alpha x_{it}),$$

and let the probability of disadoption be:

$$P(y_{it} = 0 \mid y_{i,t-1} = 1) = 1 - G(\beta x_{it}). \qquad [11]$$

Adoption and disadoption can then be modeled jointly as:

$$P(y_{it} = 1 \mid y_{i,t-1}) = G(\alpha x_{it} + \gamma x_{it}\, y_{i,t-1}), \quad \text{where } \gamma = \beta - \alpha. \qquad [12]$$
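A quick numerical check of equations (1), [11] and [12]: with γ = β - α, the joint specification reduces to the adoption probability when the actor has not yet adopted, and to one minus the disadoption probability when the actor has. The sketch assumes a logistic G and uses hypothetical coefficients and covariates; the helper names are illustrative.

```python
import math

def G(z: float) -> float:
    """Logistic cdf; a probit link would be used the same way."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(coefs, x):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(c * v for c, v in zip(coefs, x))

def p_adopted(alpha, beta, x, y_prev):
    """Equation [12]: P(y_it = 1 | y_i,t-1) = G(alpha.x + gamma.x * y_prev), gamma = beta - alpha."""
    gamma = [b - a for a, b in zip(alpha, beta)]
    return G(dot(alpha, x) + dot(gamma, x) * y_prev)

# Hypothetical coefficients and covariates (intercept first), for illustration only.
alpha = [-1.0, 0.5]
beta = [2.0, 0.3]
x = [1.0, 0.8]

p_adopt = p_adopted(alpha, beta, x, y_prev=0)           # should equal G(alpha.x), eq. (1)
p_disadopt = 1.0 - p_adopted(alpha, beta, x, y_prev=1)  # should equal 1 - G(beta.x), eq. [11]
print(round(p_adopt, 4), round(p_disadopt, 4))
```

Because the lagged state enters only through the interaction term, a single binary-choice likelihood over all person-periods estimates adoption and disadoption together.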

The same procedure can be applied to the two-stage model, assuming that actors can forget even after having adopted. If actors do not forget after adoption, then a single-stage model suffices for disadoption and the likelihoods of adoption and disadoption can be maximized separately. Extensions to multiple competing innovations are straightforward for single-stage models: one simply uses a multinomial rather than binomial logit or probit specification. Andrews and Srinivasan (1995) have presented a method to extend multinomial choice models to two stages, consideration followed by choice given consideration, with the same conditional structure as the binomial model we used. The Andrews-Srinivasan method allows one to extend innovation adoption models with partial observability to competing innovations, provided that one imposes the constraint that non-adoption has probability 1 of being in the set of choice alternatives actors are aware of.

We made several restrictive behavioral assumptions when constructing the set of covariates in our application, some of which can be relaxed. Two forms of myopia need to be highlighted. First, actors considering adoption were assumed to take into account only others' past behavior, yj,t-1, not their current or future behavior, yj,t+k where k ≥ 0. Thus, we did not represent joint decision making among actors (k = 0). Nor did we account for the fact that the knowledge or expectation that others will adopt in the near future may affect one's decision to adopt today. The latter dynamics may be important in competitive contexts, where preemptive adoption improves one's competitive position. For individuals, these positional concerns may take the form of worrying about one's social status based on being regarded as a trendsetter. For businesses, preemptive adoption may help one develop or sustain a competitive advantage through lower costs, better product quality, or reputation. We expect that this problem can be circumvented in part by using a two-stage estimation method, similar to work in spatial autoregression.3 We also made restrictive assumptions about the richness of the information transmitted through the network. Social interaction, we assumed, informs potential adopters only about others' choices, not their expected or achieved utility, post-adoption attitudes, or other evaluations. This is defensible when actors do not discern internal states or outcomes of others. Sometimes, however, outcomes are actually communicated by adopters (e.g., satisfaction) or can be observed (e.g., market share gains by a company that implements a new technology). Researchers having such data available can modify the model quite easily by substituting the relevant variable (say, qjt) for the yj,t-1 indicator and computing network exposure as Σj wij qjt. Alternatively, researchers may want to infer actors' utility from their behavior statistically, and substitute these estimated utility levels for the yj,t-1 state indicator.

3 Traditional estimation techniques do not take into account the interdependence among observations once one allows contemporaneous contagion. Maximizing traditional likelihood functions does not account for this endogeneity and leads to invalid results (Besag 1975).

Our last result is that (1) the Medical Innovation data do not provide statistical evidence of network effects in new product diffusion, and (2) there are good reasons not to expect such effects for this particular innovation. Prior evidence of social contagion was an artifact, capturing the true but omitted effect of Lederle's aggressive marketing efforts. Our findings cast doubt on a small but important part of the empirical base underlying the belief that innovations diffuse through social contagion. Our results must not be interpreted as suggesting that social network effects do not matter in general. Tetracycline, we have shown, is simply not an adequate case for assessing the importance of social contagion: its product characteristics and the way it was marketed made sizable social network effects very unlikely. Still, the absence of social network effects in such a foundational study gives credence and salience to earlier calls for sound skepticism and for wariness of confounds when studying social contagion. We hope that the tools we presented will help tighten the "triple alliance between theory, method and data" (Merton 1968) in diffusion research.


Appendix A. Procedures Used to Create Social Influence Weights Our analysis uses both discussion and advice relationships. Using the network data of all 228 physicians, we constructed the social weight matrices for each city separately in a series of steps. Step 1. First, we created adjacency matrices with element aij equal to 1 if i mentions j, and zero otherwise. We created two such adjacency matrices for each city: one for discussion ties and one for advice ties. Step 2. Since being discussion partners is a naturally reciprocal relationship, we symmetrized the discussion adjacency matrix (Alba and Kadushin 1976). Step 3. We constructed a pooled adjacency matrix by adding the symmetrized discussion matrix and the advice matrix, treating discussion and advice as indicators of a common underlying variable “interacting with.” We also performed analyses (not reported here) keeping discussion and advice separate. This did not affect the results. Step 4. We constructed four different weight matrices to account for various network contagion mechanisms. Direct tie matrices are identical to the adjacency matrices. We computed structural equivalence weights as the proportion of exact matches between two physicians’ set of relationships with third parties. A valid match required that the physicians had at least one common third party, which implies that actors without any common third party did not put any weight on each other's actions. Step 5 involved deleting all rows and columns referring to physicians who were not among the 125 included in the prescription sample. In step 6, we put all diagonals to zero and normalized all rows such that (1) wii = 0, and (2) Σjwij = 1 iff wij ? 0 for some j , and Σ jwij = 0 otherwise. This row-normalization implies that physicians are sensitive to the proportion rather than the number of relevant others who have adopted, and ensures that each network exposure variable is bounded between 0 and 1. 
Actor i’s social network exposure at time t can then be computed as $\sum_j w_{ij} y_{j,t-1}$.
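Steps 3, 5, and 6 and the exposure formula can be sketched in Python with plain nested lists. This is an illustration under stated assumptions, not the authors’ code: the function names and the toy data are hypothetical, the discussion matrix is assumed to be already symmetrized (step 2), and a pair linked by both a discussion and an advice tie receives a pooled entry of 2, which is one literal reading of “adding” the two matrices in step 3.

```python
from typing import List

Matrix = List[List[float]]


def pool(discussion_sym: Matrix, advice: Matrix) -> Matrix:
    """Step 3: element-wise sum of the symmetrized discussion and advice matrices."""
    n = len(discussion_sym)
    return [[discussion_sym[i][j] + advice[i][j] for j in range(n)] for i in range(n)]


def keep_sample(w: Matrix, keep: List[int]) -> Matrix:
    """Step 5: retain only the rows and columns of physicians in the sample."""
    return [[w[i][j] for j in keep] for i in keep]


def row_normalize(w: Matrix) -> Matrix:
    """Step 6: set w_ii = 0, then scale each row so it sums to 1.

    Assumes non-negative entries, so a zero row sum means the physician
    has no ties; such a row is left as all zeros.
    """
    out = []
    for i, row in enumerate(w):
        r = [0.0 if j == i else float(x) for j, x in enumerate(row)]
        total = sum(r)
        out.append([x / total for x in r] if total > 0 else r)
    return out


def exposure(w: Matrix, y_prev: List[int]) -> List[float]:
    """Network exposure of each actor i: sum over j of w_ij * y_{j,t-1}."""
    return [sum(wij * yj for wij, yj in zip(row, y_prev)) for row in w]


# Hypothetical toy data for four physicians.
discussion_sym = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
advice = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # a self-mention at [1][1]
in_sample = [0, 1, 2]  # physician 3 is assumed to be outside the prescription sample

W = row_normalize(keep_sample(pool(discussion_sym, advice), in_sample))
print(W)                       # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 0.0]]
print(exposure(W, [0, 1, 1]))  # [1.0, 0.5, 0.0]
```

Deleting out-of-sample physicians before normalizing matters: had physician 3 been kept, physician 0’s weight would have been split between physicians 1 and 3 instead of resting entirely on physician 1.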


Example of operations in steps 2 and 4

Let matrix A shown below represent a discussion network measured by asking each of 10 physicians in a city whom they discuss medical practice with (the tenth physician is labeled 0). Hence, element $a_{ij}$ equals 1 if i mentions j, and zero otherwise (step 1). Being discussion partners is a reciprocal relationship, but some physicians may forget to mention a colleague they discuss medical practice with (e.g., 1 cites 4, but 4 does not cite 1). We therefore symmetrize A and obtain B with $b_{ij} = \max(a_{ij}, a_{ji})$ (step 2), and we use the resulting matrix B to represent direct ties. Note the special structure: there are two sets of interconnected physicians, (1,2,3,4) and (5,6,7,8,9,0), but there is no communication between these two sets.

Adjacency Matrix A (Showing Partitioning)

i\j  1 2 3 4 | 5 6 7 8 9 0
1    0 1 1 1 | 0 0 0 0 0 0
2    1 0 0 0 | 0 0 0 0 0 0
3    1 0 0 0 | 0 0 0 0 0 0
4    0 0 0 0 | 0 0 0 0 0 0
-------------+------------
5    0 0 0 0 | 0 1 1 1 0 0
6    0 0 0 0 | 0 0 1 0 0 0
7    0 0 0 0 | 1 0 0 1 0 0
8    0 0 0 0 | 1 0 0 0 1 1
9    0 0 0 0 | 0 0 0 1 0 0
0    0 0 0 0 | 0 0 0 1 0 0

Direct Ties Matrix B (Showing Partitioning)

i\j  1 2 3 4 | 5 6 7 8 9 0
1    0 1 1 1 | 0 0 0 0 0 0
2    1 0 0 0 | 0 0 0 0 0 0
3    1 0 0 0 | 0 0 0 0 0 0
4    1 0 0 0 | 0 0 0 0 0 0
-------------+------------
5    0 0 0 0 | 0 1 1 1 0 0
6    0 0 0 0 | 1 0 1 0 0 0
7    0 0 0 0 | 1 1 0 1 0 0
8    0 0 0 0 | 1 0 1 0 1 1
9    0 0 0 0 | 0 0 0 1 0 0
0    0 0 0 0 | 0 0 0 1 0 0
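The symmetrization in step 2 can be checked on this example with a short Python sketch. It encodes matrix A (rows and columns ordered 1, 2, ..., 9, 0), applies $b_{ij} = \max(a_{ij}, a_{ji})$, and lists the ties that symmetrization adds; the names `symmetrize` and `added_ties` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

LABELS = "1234567890"  # the tenth physician is labeled 0

# Adjacency matrix A: A[i][j] == 1 if physician LABELS[i] mentions LABELS[j].
A = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 3
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 4
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],  # 5
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # 6
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],  # 7
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 1],  # 8
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # 9
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # 0
]


def symmetrize(a: List[List[int]]) -> List[List[int]]:
    """Step 2: b_ij = max(a_ij, a_ji), so a tie exists if either side reports it."""
    n = len(a)
    return [[max(a[i][j], a[j][i]) for j in range(n)] for i in range(n)]


def added_ties(a: List[List[int]], b: List[List[int]]) -> List[Tuple[str, str]]:
    """Cells (i, j) that are 1 in B but 0 in A, labeled by physician."""
    n = len(a)
    return [(LABELS[i], LABELS[j]) for i in range(n) for j in range(n)
            if b[i][j] == 1 and a[i][j] == 0]


B = symmetrize(A)
print(added_ties(A, B))  # [('4', '1'), ('6', '5'), ('7', '6'), ('8', '7')]
```

Only four cells change, each one filling in the unreported side of a tie that the other physician did report.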

Structural equivalence captures the extent to which two physicians i and j share their contacts. The (i,j)th entry in matrix C indicates the proportion of non-zero matches between the ith and jth rows of the direct ties matrix, counted over third parties only: the number of third parties tied to both i and j, divided by the number tied to at least one of them. Physician 9, for instance, shares his only contact (8) with three other physicians: 5, 7, and 0. For physician 0, colleague 8 is the only contact as well; hence physicians 9 and 0 have totally overlapping portfolios and $c_{90}$ equals 1. Physician 5, in contrast, is related to 3 physicians in total, so $c_{95}$ equals only .33.

Structural Equivalence Matrix C

i\j  1    2    3    4    | 5    6    7    8    9    0
1    1.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00
2    0.00 1.00 1.00 1.00 | 0.00 0.00 0.00 0.00 0.00 0.00
3    0.00 1.00 1.00 1.00 | 0.00 0.00 0.00 0.00 0.00 0.00
4    0.00 1.00 1.00 1.00 | 0.00 0.00 0.00 0.00 0.00 0.00
-------------------------+-----------------------------
5    0.00 0.00 0.00 0.00 | 1.00 0.50 1.00 0.25 0.33 0.33
6    0.00 0.00 0.00 0.00 | 0.50 1.00 0.50 0.50 0.00 0.00
7    0.00 0.00 0.00 0.00 | 1.00 0.50 1.00 0.25 0.33 0.33
8    0.00 0.00 0.00 0.00 | 0.25 0.50 0.25 1.00 0.00 0.00
9    0.00 0.00 0.00 0.00 | 0.33 0.00 0.33 0.00 1.00 1.00
0    0.00 0.00 0.00 0.00 | 0.33 0.00 0.33 0.00 1.00 1.00
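The entries of matrix C can be reproduced with a short Python sketch. It implements one reading of the definition that matches every entry of C in this example: for i ≠ j, the number of third parties (physicians other than i and j) tied to both, divided by the number tied to at least one, which is a Jaccard-type overlap ratio; pairs with no common third party get 0. Diagonal entries are set to 1 as displayed in C, before step 6 zeroes them. The function name is hypothetical, and this is not presented as the authors’ implementation.

```python
from typing import List

# Direct ties matrix B from the example (rows/columns ordered 1, ..., 9, 0).
B = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]


def structural_equivalence(b: List[List[int]]) -> List[List[float]]:
    """Overlap of i's and j's ties to third parties (Jaccard-type ratio)."""
    n = len(b)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                c[i][j] = 1.0  # matrix C displays 1.00 on the diagonal
                continue
            third = [k for k in range(n) if k not in (i, j)]
            both = sum(1 for k in third if b[i][k] and b[j][k])
            either = sum(1 for k in third if b[i][k] or b[j][k])
            # No shared third party means no weight (both == 0 gives 0.0).
            c[i][j] = both / either if either else 0.0
    return c


C = structural_equivalence(B)
print(round(C[8][9], 2), round(C[8][4], 2))  # c_90 and c_95: 1.0 0.33
```

Excluding i and j themselves is what makes, for example, $c_{56}$ equal .50: counting their mutual tie would lower the overlap to .25.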


Table 1. How well 2-stage and 1-stage models recover advertising and social contagion effects in a simulated 2-stage process, assessed under three conditions of relative effect size

Avg. = average point estimate; 5% and 10% = proportion of estimates significant at p ≤ 5% and p ≤ 10%; LL = log-likelihood.

Condition 1: strong advertising effects, weak social contagion
Param.   True    | 2-stage Avg.  5%     10%    | 1-stage Avg.  5%     10%
α1       -2      | -2.03                       |
α2       0       | 0.23                        |
α0               |                             | -2.26
β1       3       | 3.08          100%   100%   | 2.38          98%    100%
γ1       1       | 1.00          100%   100%   | 0.74          100%   100%
β1/γ1    3       | 3.07                        | 3.23
β2       1       | 1.11          26%    50%    | 0.27          16%    22%
γ2       1       | 1.20          96%    100%   | 0.40          100%   100%
β2/γ2    1       | 0.91                        | 0.70
LL               | -218.50                     | -221.44

Condition 2: weak advertising effects, strong social contagion
Param.   True    | 2-stage Avg.  5%     10%    | 1-stage Avg.  5%     10%
α1       0       | -0.09                       |
α2       -2      | -2.00                       |
α0               |                             | -2.25
β1       1       | 1.05          14%    24%    | 0.25          6%     12%
γ1       1       | 1.04          100%   100%   | 0.39          100%   100%
β1/γ1    1       | 0.99                        | 0.69
β2       3       | 3.32          100%   100%   | 2.34          100%   100%
γ2       1       | 1.12          100%   100%   | 0.76          100%   100%
β2/γ2    3       | 3.00                        | 3.11
LL               | -193.88                     | -199.76

Condition 3: moderate advertising effects, moderate social contagion
Param.   True    | 2-stage Avg.  5%     10%    | 1-stage Avg.  5%     10%
α1       -1.25   | -1.35                       |
α2       -1.25   | -1.17                       |
α0               |                             | -2.30
β1       2       | 2.15          84%    92%    | 1.14          64%    80%
γ1       1       | 1.03          100%   100%   | 0.54          100%   100%
β1/γ1    2       | 2.10                        | 2.10
β2       2       | 2.07          100%   100%   | 1.19          96%    98%
γ2       1       | 1.07          100%   100%   | 0.62          100%   100%
β2/γ2    2       | 1.96                        | 1.96
LL               | -217.21                     | -221.28

α1: intercept stage 1; β1: advertising effect; γ1: covariate stage 1; α2: intercept stage 2; β2: social contagion; γ2: covariate stage 2; α0: intercept in one-stage model.


Table 2. How well 2-stage and 1-stage models recover advertising and social contagion effects in a simulated 2-stage process when one of the two stages is quasi-redundant

Med. = median point estimate; 5% and 10% = proportion of estimates significant at p ≤ 5% and p ≤ 10%; LL = log-likelihood. Symbols as in Table 1.

Condition 4: awareness (stage 1) relevant, evaluation (stage 2) quasi-redundant
Param.   True    | 2-stage Med.  5%     10%    | 1-stage Med.  5%     10%
α1       -2      | -2.00                       |
α2       1       | 0.82                        |
α0               |                             | -2.10
β1       3       | 3.20          100%   100%   | 2.91          100%   100%
γ1       1       | 1.05          100%   100%   | 0.91          100%   100%
β1/γ1    3       | 3.11                        | 3.29
β2       .5      | 0.57          2%     2%     | -0.03         6%     12%
γ2       .5      | 0.43          4%     12%    | 0.09          30%    32%
β2/γ2    1       | 0.77                        | -0.02
LL               | -209.24                     | -209.90

Condition 5: awareness (stage 1) quasi-redundant, evaluation (stage 2) relevant
Param.   True    | 2-stage Med.  5%     10%    | 1-stage Med.  5%     10%
α1       1       | 0.61                        |
α2       -2      | -2.02                       |
α0               |                             | -2.10
β1       .5      | 0.84          2%     6%     | 0.18          6%     12%
γ1       .5      | 0.48          14%    36%    | 0.14          44%    50%
β1/γ1    1       | 1.95                        | 1.28
β2       3       | 3.35          100%   100%   | 2.82          100%   100%
γ2       1       | 1.11          100%   100%   | 0.93          100%   100%
β2/γ2    3       | 2.86                        | 2.93
LL               | -179.65                     | -181.06


Table 3. Doctors did not consider colleagues an important source of influence or information

Columns “Original” and “Most”: percentage of physicians crediting a source with original influence or with most influence (a). Columns “First,” “Intermediate,” and “Final”: percentage of physicians mentioning a source of information as first, intermediate, or final source (b).

Source                   | Original   Most       | First    Intermediate   Final
Detail men               | 57         38         | 52       27             5
Journal articles         | 7          23         | 22       16             14
Direct mail              | 18         8          | 6        21             21
Drug house periodicals   | 4          5          | 3        11             21
Colleagues               | 7          20         | 10       15             28
Meetings                 | 3          --         | 3        4              8
All other media          | 4          6          | 3        7              3

(a) Based on Katz (1961, p. 77). A crosscheck against the Medical Innovation network data set prepared by Burt (1986) indicates that the base for these percentages is the 141 physicians (out of a total of 216 interviewed after the 12 exploratory interviews) whose most recent adoption was tetracycline.
(b) Based on Coleman, Katz, and Menzel (1966, p. 59). Data were available for 87 adopters, who generated 131 mentions of sources intermediate to the first and final source. Thus, the base for the first-source and final-source percentages is 87, and that for the intermediate-source percentages is 131.


Table 4. Descriptive statistics

Variable                           Mean    SD      Min     Max
1. Y                               .111    .314    0       1
2. Summer                          .084    .278    0       1
3. Age                             .000    1.706   -2.54   2.46
4. Age²                            2.907   2.561   0.21    6.48
5. Journals (log)                  1.484   .392    0.69    2.20
6. Science                         .268    .443    0       1
7. Chief                           .098    .298    0       1
8. Direct ties                     .384    .413    0       1
9. Structural equivalence          .379    .321    0       1
10. Direct ties (Burt)             .334    .321    0       1
11. Structural equivalence (Burt)  .451    .419    0       1
12. Advertising by Lederle         .155    .087    0       0.24
13. Advertising by others          .191    .224    0       0.98

Correlations
Var.  1     2     3     4     5     6     7     8     9     10    11    12
2.    -.05
3.    -.09   .05
4.    -.08   .05   .17
5.     .10  -.07  -.16  -.03
6.     .12  -.08  -.01   .01  -.02
7.    -.05   .00   .31   .26   .06   .29
8.     .01   .25   .15   .18  -.05  -.14   .17
9.     .01   .25   .13   .14  -.22  -.16   .05   .75
10.    .07   .18   .15   .07  -.04  -.14   .09   .68   .70
11.    .02   .27   .18   .09  -.15  -.16   .15   .86   .82   .72
12.    .05   .30   .13   .12  -.17  -.19   .05   .68   .80   .69   .75
13.    .03   .02   .16   .12  -.14  -.17   .06   .57   .71   .63   .60   .68

N = 947


Table 5. Effect of explanatory variables on the adoption hazard in single-stage models, organized by social contagion mechanism

Variable                 | Direct ties | Structural equivalence | Direct ties (Burt) | Structural equivalence (Burt)
Intercept                | -2.457 d    | -2.428 d               | -2.382 d           | -2.435 d
Summer                   | -0.410      | -0.403                 | -0.424 a           | -0.420 a
Age                      | -0.068 a    | -0.068 a               | -0.071 a           | -0.068 a
Age²                     | -0.049 b    | -0.048 b               | -0.047 b           | -0.050 b
Chief                    | -0.492 b    | -0.509 b               | -0.521 b           | -0.501 b
Journals (log)           | 0.520 d     | 0.495 c                | 0.485 c            | 0.512 d
Science                  | 0.578 d     | 0.588 d                | 0.578 d            | 0.580 d
Social contagion         | -0.087      | -0.262                 | 0.305              | -0.004
Advertising by Lederle   | 2.761 b     | 3.081 c                | 1.929 a            | 2.565 b
Advertising by others    | 0.192       | 0.283                  | 0.036              | 0.158
Ln L                     | -304.16     | -303.93                | -303.51            | -304.26

a

: p