Estimating Dynamic Games of Complete Information with an Application to the Generic Pharmaceutical Industry ∗ A. Ronald Gallant Duke University Fuqua School of Business Durham NC 27708-0120 USA

Han Hong Stanford University Department of Economics Stanford CA 94305-6072 USA

Ahmed Khwaja Duke University Fuqua School of Business Durham NC 27708-0120 USA First draft: January 2008

Abstract We estimate a dynamic oligopolistic entry model for the generic pharmaceutical industry that allows for dynamic spillovers from experience due to entry on future costs. Our paper contributes to both the estimation of oligopolistic dynamic games and the understanding of entry decisions in the pharmaceutical industry. Our dynamic model features unobserved firm production costs that are serially correlated over time. This introduces difficulty in the estimation of the dynamic game theoretic model which we overcome using sequential importance sampling methods. Our empirical findings show that the dynamic evolution of the production cost plays an important role in the equilibrium path of the pharmaceutical industry structure. Keywords: Dynamic Games, Dynamic Spillovers, Generic Pharmaceuticals, Sequential Importance Sampling. JEL Classification: E00, G12, C51, C52 ∗

Supported by the National Science Foundation.

1

1

Introduction Entry behavior by pharmaceutical firms in the generic drug industry is an important topic

of investigation in the empirical literature. For example, Scott-Morton (1999) shows that entry can be predicted by organizational experience, size of the market and the condition of the generic drug. However, much is still to be understood regarding the entry decision of pharmeceutical firms when they engage in a process of dynamic oligopolist competition. Entry decisions of myoptic pharmaceutical firms in a static competition environment are drastically different from those that are forward looking in a dynamic competitive environment. In a dynamic setting, current entry can have a potential spillover effect on future entry. A firm might enter even if current opportunity is loss generating as entry improves payoffs in the future. In order to evaluate the effects of past experience on future cost reduction and entry, and to evaluate the effects of costs on entry, we need to formulate and estimate a structural model of a dynamic game of oligopolistic competition with serially correlated unobserved cost. While there has been a substantial recent development of the empirical dynamic game estimation literature, incorporating serially correlated unobserved production costs induces additional computational difficulties to the econometric estimation procedure. We overcome this difficulty using the recent development of sequential importance sampling techniques. The sequential importance sampling method offers a drastic improvement in the computation speed which makes the estimation of this dynamic model feasible.

2

Related Literature

Static games under the incomplete information assumption have been studied by Bjorn and Vuong (1984), Bresnahan and Reiss (1991a), Bresnahan and Reiss (1991c), Haile, Hortacsu, and Kosenok (2003), Aradillas-Lopez (2005), Ho (2005), Ishii (2005), Pakes, Porter, Ho, and Ishii (2005), Augereau, Greenstein, and Rysman (2005), Seim (2005), Sweeting (2005), Tamer (2003), Manuszak and Cohen (2004), Rysman (2004), Gowrisankaran and Stavins (2004) and Bajari, Hong, Krainer, and Nekipelov (2006). Dynamic versions have been studied by Aguirregabiria and Mira (2002), Bajari, Benkard, and Levin (2004), Berry, Pakes, and 2

Ostrovsky (2003), Pesendorfer and Schmidt-Dengler (2003) and Bajari, Chernozhukov, Hong, and Nekipelov (2007). These papers all made the strong assumption that there is no market or firm level unobserved heterogeneity other than a random shock that is independent and identically distributed across both time and players. This assumption is restrictive because it rules out unobserved dynamics in the latent state variables and rules out any private information that a player might have about competing firms that the researcher does not have. On the other hand, Bresnahan and Reiss (1991b), Berry (1992), Tamer (2003), Ciliberto and Tamer (2003) and Bajari, Hong, and Ryan (2004) investigated static games of complete information. The complete information assumption allows substantial unobserved heterogeneity at the level of the firms. These games typically require combinatorial algorithm to search for an equilibrium instead of the continuous fixed point mapping for the incomplete information models. To our knowledge, we are the first to undertake the challenge of applying the complete information model in a dynamic setting. In the single agent dynamic framework, there is a considerable research that allows for unobserved heterogeneity that is time invariant, e.g., Keane and Wolpin (1997). However work that allows for serially correlated unobserved heterogeneity is rare. Building on the work of Heckman (1981), a frequentist simulation based approach was developed by Khwaja (2001) to integrate out unobserved state variables in the context of a finite horizon dynamic discrete choice model. This approach however relies on closed form expressions for the transition probabilities and a state space that is computationally tractable. Recent work by Imai, Jain, and Ching (2005) and Norets (2006) develops Bayesian methods for single agent dynamic discrete choice models with unobserved state variables that are serially correlated over time. Of relevance to our work is that Bayesian methods are likelihood based as is our approach and so, as shown by Chernozhukov and Hong (2003), computational methods useful in Bayesian inference may be applied to likelihood based frequentist inference and vice versa. Therefore, although this paper is frequentist, as a practical matter we extend this Bayesian estimation literature to dynamic discrete games. Our computational methods may be used for either frequentist likelihood based inference or Bayesian inference. Our implementation makes use of a sequential importance sampler. Sequential impor3

tance sampling methods are have been used by Fernandez-Villaverde and Rubio-Ramirez (2005) for estimating macroeconomics dynamic stochastic general equilibrium models. The structure of dynamic stochastic general equilibrium models are very similar to dynamic discrete choice models. The game theoretical component of our implementation is new. In a continuous time setting, Nekipelov (2007) develops a flexible indirect inference estimator for continuous time dynamic games in the context of eBay auctions without requiring the complete solution of the dynamic game. This is a novel approach that has potential applications in dynamic oligopolistic competition models. Relatedly, Benkard, Weintraub, and Roy (2007) introduced the notion of oblivious equilibrium to facilitate the computation of dynamic game equilibria.

3

Institutional Background

Our empirical model examines the data analyzed in Scott-Morton (1999) on the generic drug entries in the period 1984 to 1994. This time period is particularly interesting because of the 1984 Waxman-Hatch Act which permitted Abbreviated New Drug Applications (ANDAs) by generic firms. The important timing of the institutional environment was discussed in details in Scott-Morton (1999). The relevant facts for our study are summarized in the following. The preparation of an ANDA application takes months to years because it requires contruction of manufacturing facilities that need to be inspected and approved by the FDA. The sunk cost of an ANDA market entry is high even though it is much less than a new drug invention. The size and heterogeneity of entry cost relative to the size of the market revenue lead to a small number of entrants supported by each market. In addition, the FDA does not reveal when and from whom it receives ANDA applications. These features of the data are consistent with our modelling assumption of a dynamic simultaneous entry game among a small number of competing pharmaceutical firms, in which firms have to face substantial uncertainty in rival price competition when they incurr the fixed sunk cost of entry. As discussed in Scott-Morton (1999), announced entry is very rare, because firms do not want to signal the common market value. They also fear that the delay in the approval will invite competition. There are few late sequential movers who withdrew in response to rivals’ 4

approvals. But simultaneous moves in a dynamic context are more important. Because of the “generic scandal” that broke out in 1989 and the general upheaval and uncertainty in the generic drug industry surrounding this period, we take great care in processing the data between 1988 to 1993.

4

Model

In this section we formally describe the entry game. Each time the market is open counts as one time increment. The time index is denoted t. Two implications are that irrespective of the calendar time that has elapsed between the adjacent market openings, (i) cost decreases are homogeneous, and (ii) the discount rate is held constant between market openings. These are plausible assumptions for our application as in our estimation sample (described below in Section 7) there are 40 openings in the period 1990-94 or on average a market opening every 1.5 weeks. This convention avoids unsurmountable computational difficulties in solving for the equilibrium of the model caused by unequal spacing. Moreover, we are only required to get the ordering of the data correct rather than try to determine market entry dates precisely. We order according to the date when an ANDA application was recieved by the FDA. This convention appears to us to be a reasonable a priori view as to how costs, which are unobservable, would behave. In view of the excellent fit to the data that we are able to achieve (Section 8) this convention appears reasonable a posteriori. As mentioned earlier, we are calling this latent variable cost for convenience. It is really a mixture of experience (learning by doing), capital accumulation, etc. The actions available to firm i when market t opens are to enter, Ait = 1, or not enter Ait = 0. There are I firms in total so that the number who enter market t is given by Nt =

I X

Ait

(1)

i=1

Past entry decisions and random shocks determine cost Cit = exp(cit ), where, as just indicated, we follow the standard convention that a lower case quantity denotes the logarithm of an upper case quantity. The equation governing the log cost of firm i at time t is cit = µc + ρc (ci,t−1 − µc ) − κc Ai,t−1 + σc eit , 5

(2)

where eit is normally distributed shock with mean zero and unit variance, σc is a scale parameter, κc is the immediate impact of time t that is felt if the market was entered at time t − 1, µc is a location parameter, and ρc is an autoregressive parameter which is presumed to satisfy 0 < ρc < 1. We assume that all firms are ex ante identical, with all the heterogeneity in costs driven by past decisions, hence none of these parameters are indexed by i. Log cost can be decomposed into a sum of an observable and unobservable components as follows: ci,t = cu,i,t + ck,i,t

(3)

cu,i,t = µc + ρc (cu,i,t−1 − µc ) + σc eit

(4)

ck,i,t = ρc ck,i,t−1 − κc Ai,t−1

(5)

From these equations one can see that the location parameter µc can be interpreted as the mean of the unobervable portion of log cost and that the total impact of a firm’s past entry decisions is ck,i,t = −

P∞

j=0

ρj κc Ai,t−j−1 .

The total revenue to be divided among firms who enter market t is Πt = exp(πt ) given by

πt = µπ + σπ e0,t

(6)

where e0,t is normally distributed with mean zero and unit variance, µπ is a location parameter and σπ a scale parameter. The value we have for total revenue is that for the last year the drug was on patent. We interpret this value as being proportional to the total discounted value of the revenue flows as a generic drug once the drug goes off patent. Post the 1989 scandal there were fifty firms that entered the market. We cannot hope to model all fifty and so consider only the dominant firms. Therefore in the following Nt is used to denote the number of entering dominant firms. We consider the case of two, three, and four dominant firms. Nt is less than or equal to I, which is the total number of dominat firms (i.e. 2, 3, or 4), which is considered to be time-invariant. Nt is to be differentiated from Nta , which is used to denote the total number of entrant firms at time t including both dominant and nondominant firms.

6

We allow for nondominant firms as follows. Regressions indicate that log N a = b log Π, with b ≈ 0.092, is a reasonable approximation to the total number of firms that enter a market. Therefore, when one of the dominant firms is considering entry, it can anticipate that the revenue available to be divided among all dominant firms is log Πanticipated = log Π −

b log Π = log Π1−b . These considerations suggest that a reasonable functional form for dominant firm i’s per period revenue at time t is Ait (Πγt π /Nt − Cit ) ,

(7)

with 1 − b = 0.908 being a reasonable lower bound for γπ . We assume an infinite horizon. Thus, the firm’s total discounted revenue at time t is ∞ X

π β j Ai,t+j Πγt+j /Nt+j − Ci,t+j ,

(8)

j=0

where β is the discount factor, 0 < β < 1. The parameters of the model may be summarised as θ = (µc , ρc , σc , κc , µπ , σπ , γπ , β).

(9)

Denote a subgame perfect pure strategy equilibrium profile for this game by E E AE i,t , A−i,t , ⇒ Nt ,

(10)

E where AE i,t is the entry decision of firm i for market t, A−i,t the vector of entry decisions of

the other dominant firms, and NtE is the number of firms that enter, which can be computed from the profile using (1). Thoerem 3.1 of Dutta and Sundaram (1998) implies that this game has a subgame perfect Markovian equilibrium in mixed strategies under reqularity conditions, the most important of which is that revenue and cost can only take on a finite set of values. This can be relaxed to countable values Parthasarathy (1982). We could modify our problem to meet this requirement but there seems no need to because we have no trouble computing pure strategy equilibria for the problem as posed with an infinite state space. A pure strategy equilbrium is, of course, also a mixed strategy equilibrium. What is most interesting about Theorem 3.1, however, is that its proof provides the details that motivate our computational strategy, discussed in Section 6. (See also Rust (2006) in this connection.) The regularity 7

conditions of Theorem 5.1 of Dutta and Sundaram (1998) come closer to the problem as we have posed it, notably the revenue and cost do not have to be discrete but they would need to be bounded. The equilibrium provided by Theorem 5.1 may depend on two periods of the state vector but we find that we can alway find equlibria that depend on one period only. Thus, although there are results that imply that a slightly modified version of the proposed game has equilibria, in the sequel we rely mostly on the fact that we have no trouble computing equilibria. A stationary pure strategy Markov equilibrium of the dynamic game can be computed E by finding an equilibrium (AE i,t , A−i,t ) for the game with payoffs

Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt )

(11)

= Ait (Πγt π /Nt − Cit ) h

+ β E Vi (Ci,t+1 , C−i,t+1 , Πt+1 ) | Ai,t , A−i,t , Ci,t , C−i,t , Πt ,

i

where Vi (Ci , C−i , Π) solves the Bellman-type equation Vi (Cit , C−i,t , Πt )

(12)

γπ E = AE it Πt /Nt − Cit

i

h

E + β E Vi (Ci,t+1 , C−i,t+1 , Πt+1 ) | AE i,t , A−i,t , Ci,t , C−i,t , Πt .

In writing the Bellman-equation it is implicitly understood that the equilibrium actions

E AE i,t , A−i,t are functions of the state variables of cost and revenue (Ci,t , C−i,t , Πt ). The

E conditions that (AE i,t , A−i,t ) must satisfy to be the sought equilibrium are E E Vi (AE i,t , A−i,t , Ci,t , C−i,t , Πt ) ≥ Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) ∀ i, t.

5

(13)

Solving the Model

The log state vector is st = (c1t , ..., cIt , πt ) .

(14)

Denote the entry decisons of all firms (strategy profile) by At = (A1t , ..., AIt ) . 8

(15)

Conditionally upon st and At , the elements of st+1 are independently normally distributed with means µit = µc + ρc (cit − µc ) − κc Ait for the first I elements and µIt = µπ for the last and with standard deviations σi = σc for the first I elements and σI = σπ for the last. The conditional expectation of functions of the form f (st+1 ) given (At , st ), such as appear in equations (11) and (12), can be computed by Gauss-Hermite quadrature. The weights wi and abscissae xi for Gauss-Hermite quadrature may be obtained from tables such as Abramowitz and Stegun (1964) or by direct computation using algorithms such as Golub and Welsch (1969) as updated in Golub (1973). To integrate with respect to si,t+1 conditionally upon At √ and st the abscissae are transformed to s˜t+1,i = µi + 2σi xi , and the weights are transformed √ to w˜i = wi / π. Then, for a K + 1 rule, E[f (st+1 ) | At , st ] ≈

K X

···

K X

K X

f (˜ st+1,i1 ,· · · ,tildest+1,iI , s˜t+1,iI+1 )w˜i1 · · ·w˜iI w˜iI+1 .

i1 =−K iI =−K iI+1 =−K

(16) Note from above that both the abscissae s˜t+1 and the weights w˜t+1 depend on At and st . If, for example, there are two firms and a three point quadrature rule is used, then E[f (st+1 ) | At , st ] ≈

1 1 1 X X X

f (˜ si , s˜j , s˜k )w˜i w˜j w˜k .

i=−1 j=−1 k=−1

We use three point rules throughout. A three point rule will integrate a polynomial in st+1 up to degree five exactly. Equations (11), (12), and (13) can be expressed in terms of st by putting Cit = exp(sit ) for i = 1, . . . , I and Πt = exp(sI+1,t ). Assume that this has been done. Suppose, for the moment, that the value function V (st ) = (V1 (st ), . . . , VI (st )) ,

(17)

where each Vi (st ) solves its respective equation (12), is known. Then an equilibrium for the game with payoffs given by equation (11) can be found by checking (13) directly for all possible At . There will, at times, be multiple equilibria. This usually takes the form of a prisoner’s delemma situatation where one or another firm can profitably enter but if both enter they both will incurr losses whereas if neither enters they would get the continuation value of the game. In the three firm game the frequency of multiple equilibria is 4%. We 9

resolve this situation by assuming a coordination game. The firms that have the lowest costs Cit are those that are allowed to enter. That is, the At are ordered by increasing C=

PI

i=1

Ait Cit and the first At that satisfies (13) is accepted as the solution. Note that our

distributional assumptions on st guarantee that no two C can be equal so that this ordering of the At is unique. Moreover, none of the Cit can equal one another and when that is true we have never failed to be able to compute a pure strategy equilibrium. We approximate the value function V (st ) by a local linear approximation as follows. We define a grid on the state space which defines a set of (I + 1)-dimensional rectangles that have grid lines as boundaries. We refer to the center K of such a rectangle as its key. To approximate the value function at a point st , the key K of the rectangle that contains st is located and the value function is approximated by V (st ) = bK + (BK )st ,

(18)

where bk is a vector of length I and BK is an I by I + 1 matrix. We set the grid increments at 16σ although it does not matter much. The set of keys that actually get visited will be about the same for grid increments as small as 4σ because past decisions operating though recursion (5) have far more influence on the location of st than do random shocks operating through recursion (4) or equation (6). For a three firm game the number of rectangles that actually are visited is about six. It remains to compute the coefficients bk and BK . This we do as follows. We initialize to zero. We generate a set of abscissae {˜ st } clustered about K and solve the game with payoffs

(11) to get corresponding equilibria {A˜E ˜t , A˜E pairs into (11) to get t }. We substitue the s t ˜E ˜ ˜ ˜ updated values {V˜i (˜ st ) = Vi (A˜E it , A−i,t , Cit , C−i,t , Πt )} for the value function. Then, using the pairs {(˜ st , V (˜ st )} as data, we compute bk and BK by mulitvariate least squares. We repeat until the bk and BK stablize. We have found that twenty interations suffices. As the notation suggests, the easiest way to get a cluster of points about a key is to use absciccae from the quadrature rule described above with st set to K and At set to zero. But if so, one must take care to jitter the points so that no two firms have exactly the same cost. It is possible to apply a modified Howard acceleration strategy as descirbed in Kuhn (2006); see also Rust (2006) and Howard (1960). The idea is simple: The solution {A˜E t } of 10

the game with payoffs (11) will not change much, if at all, for small changes in the value function V (s). Therefore, rather than recompute the solution at every step of the (bk , Bk ) interations, one can reuse a solution for a few steps. We find that this strategy becomes riskier as the number of firms increases. A conservative approach that does work is to not use the modified Howard acceleration for the firs two iterates of (bk , Bk ). On the third, use it once; on the fourth, twice; the fifth, thrice; and so on. We have found that twenty iterations suffices, counting regular steps and modified Howard steps equally. Our code is written in C++ and makes heavy use of the object oriented programming style, the classical data structures in the standard template library, and a matrix class. We surmise that trying to implement these methods in a language without the first two features, such as Matlab, would be exasperating beyond endurance.

6

Computing the Likelihood

In this section describe our estimation strategy. From the computational standpoint, the setting is as follows. There are I firms, i = 1, . . . , I, who can enter the market or not at each time period t. If firm i enters at time t, then Ait = 1; if not, Ait = 0. The number of firms that enter at time t, is Nt =

PI

i=1

Ait . The total anticiapated revenue available to the firms in each market is

Πγt π , which is divided equally among those firms that enter. We can observe both Πt and At = (A1t , . . . , AIt ). Log cost, ci,t = log Ci,t , is the sum of two componants. The first is log Cu,i,t , which is known by all firms but not by us. The second is log Ck,i,t , which is known by all firms and by us. Both evolve as a Markov process. Cost together with revenue Πt is the state vector. Denote the part of the state vector that is hidden from us by Xt = (Cu,1,t , . . . , Cu,I,t ).

(19)

Denote the variables that we can observe by Yt = (A1t , . . . , AIt , Ck,1,t , . . . , Ck,I,t , Πt ).

(20)

As previously, a lower case variable denotes the logarithm of an upper case variable with the exception that at = At ; i.e. xit = log Xit , cu,i,t = log Cu,i,t , ck,i,t = log Ck,i,t , πit = log Πit , 11

xt = (cu,1,t , . . . , cu,I,t ), and yt = (a1t , . . . , aIt , ck,1,t , . . . , ck,I,t , log Πt ). With these conventions, cost evolves as ci,t = cu,i,t + ck,i,t

(21)

cu,i,t = µc + ρc (cu,i,t−1 − µc ) + σc eit

(22)

ck,i,t = ρc ck,i,t−1 − κc Ai,t−1

(23)

and revenue evolves as πt = µπ + σπ e0,t .

(24)

We shall estimate model parameter by maximum liklihood using the Chernozhukov and Hong (2003) MCMC method. ADD CITE TO Belloni and Chernozukov AND COMMENT ON IT This method uses the Metropois-Hastings algorithm wherein one proposes a value for model parameters and then decides whether to accept or reject it. These proposed parameter values θ = (µc , ρc , σc , κc , µπ , σπ , γπ , β) are known to us for the purpose of computation. We have data for both the pre and post scandal periods. The pre-scandal period is indexed by t = −n0 , . . . , 0 and the values of Yt over the pre-scandal period are denoted by Ypre . The post-scandal period is indexed by t = 1, . . . , n. While the scandal changed the market structure thus rendering the pre-scandal data unsuitable for general estimation, it can still be used for two purposes: The entry decisions {Ait }0t=−n0 can be used to compute the last two pre-scandal values ck,i,−1 and ck,i,0 of the observable part of log cost for each firm; and the pre-scandal log revenue {πt }0t=−n0 can be used to help identify the parameters µπ and σπ . We compute the initial values ck,i,−1 and ck,i,0 for each firm by running the recursion (23) started at zero over the observed choices {Ait }0t=−n0 . We now know the vectors y−1 and y0 because (π−1 , A−1 ) and (π0 , A0 ) are in Ypre . While the scandal may have affected which firms participated in the market post-scandal, there is no reason to believe that market oportunites were different pre- and post-scandal. Therefore the pre-scandal data can be used to help identify the revenue distribution. From Ypre we can compute a normal likelihood for log revenue over the period −n, . . . , 0. Although this likelihood actually only depends on the two elements (µπ , σπ ) of θ, we denote it as p(Ypre |θ) 12

(25)

for convenience. Because At is a deterministic function of (xt , πt , yt−1 , θ), the density p(At |πt , xt , yt−1 , θ) puts mass one on a single value of At . The implication is that the likelihood over the post scandal data Ypost assumes the value one if we predict every entry decision perfectly and zero otherwise. To avoid this situation, we adopt the following density for At p(At |πt , xt , yt−1 , θ, pa ) =

I Y

c

c

(pa )I(Ait =Ait ) (1 − pa )I(Ait 6=Ait )

(26)

i=1

where 0 < pa < 1 and Acit is computed from (xt , πt , yt−1 , θ) using the methods described in Section 5. Douced, de Freitas, and Gordon (2001) present a concise description of the sequential importance sampler that is adequate to follow what we do here. The densities relevant to a sequential importance sampler are the transition density of the hidden state vector p(xt |xt−1 , θ),

(27)

which is defined by recursion (23), the initial density p(x0 |θ),

(28) q

which, from (23), is normal with mean µc and standard deviation σc / 1 − ρ2c , and, the observation density p(yt |yt−1 , xt , θ, pa ) = p(At |πt , yt−1 , xt , θ, pa ) p(πt |yt−1 , xt , θ)

(29)

where from (24) p(πt |yt−1 , xt , θ) = p(πt |θ), is normal with mean µπ and standard deviation σπ . The sequential importance sampler for (θ, pa ) is given in the following: 1. For t = 0 (i)

(a) Start N particles by drawing x0 for i = 1, . . . , N from the initial density (28). (b) Compute p(y0 |θ, pa ) = . =

Z

p(y0 |y−1 , x0 , θ, pa ) p(y−1 , x0 |θ, pa ) dx0

N 1 X (i) p(y0 |y−1 , x0 , θ, pa ). N i=1

13

2. For t = 1, . . . , n (i)

(a) For each particle draw x˜t from the transition density (27) and set (i)

(i)

(i)

x˜0:t = (x0:t−1 , x˜t ). (i)

(b) For each particle compute the particle weights wˆt using the observation density (29); i.e. (i)

(i)

w˜t = p(yt |yt−1 , x˜t , θ, pa ) (Here is where we run into trouble with a deterministic function because the weights could all be zero.) (c) Normalize the weights so that they sum to one (i)

(i) wˆt

w˜t

= PN

i=1

(i)

w˜t

. (i)

(i)

x0:t } (d) For i = 1, . . . , N sample with replacement the particles x0:t from the set {˜ (i)

according to the weights {wˆt }. (Note the convention: Particles with unequal (i)

weights are denoted by {˜ x0:t }. After resampling the particles are denoted by (i)

{x0:t }.) (e) Compute p(yt |y1:t−1 , θ, pa ) = . =

Z

p(yt |yt−1 , xt , θ, pa ) p(yt−1 , xt |y1:t−1 , θ, pa ) dxt

N 1 X (i) p(yt |yt−1 , xt , θ, pa ). N i=1

(i)

(Note that p(yt |yt−1 , xt , θ) does not have to be recomputed here if the weights (i)

(i)

w˜t are associated to xt in the resampling step and saved. If each firms entry decisions are similarly associated, then classification error rates can be computed at this step.) 3. Done (a) The likelihood is p(y1:t |θ) = p(Ypre |θ)p(y0 |θ)

n Y t=1

14

p(yt |y1:t−1 , θ).

Steps 2a and 2b can be parallelized by means of threads as follows. The N particles can be divided up into groups, one group for each of the machine’s CPUs. Steps 2a and 2b are performed on each group by a separate thread. Unfortunately, all other parts of the algorithm are serial. Another approach to parallelization is to use message passing (MPI) and let each CPU compute a chain. These sub chains are then concatonated to get one long chain. Too avoid having to burn off transients on each CPU, one can start from the maximum of the likelihood and use a different initial seed on each CPU. The general purpose impmenentation of the Chernozhukov and Hong (2003) that we use will do this. It is in the public domain and available at econ.duke.edu/webfiles/arg/emm. Our strategy is to run using threads until the maximum has been found and the chain tuned so that the rejection rate is about 30% for each parameter and then switch to the MPI strategy to get enough draws to compute standard errors. It is probably an unnecessary precaution, but when we switch to the MPI runs, we quit using the modified Howard acceleration method. An acceleration method that can be used with either parallization strategy is to define the proposal density on a grid and save the likelihood for each draw to a binary tree with the parameter θ as the key (e.g. a C++ associateive map). When a value of θ is proposed again, the value of the likelihood is obtained from the tree rather than recomputed. If the grid increments are (fractional) powers of two, then the machine representation of parameter values is exact and a tree from a previous run can be reused in runs with differently scaled proposal densities. The aforementioned public domain software implements this strategy. The parameter pa can either be estimated or be fixed at various values. We tried values from 0.75 to 0.95. We find that estimates of the other elements of θ are hardly affected. What we do find is that varying pa affects the rate at which particles die out at step 2d in the sequential importance sampler. Because we are not using the sequential importance sampler as a smoother, the rate at which particles die out is of no concern. We always have a large number of points available at Step 2e. When pa is treated as a parameter to be estimated, the performance of the MCMC algorithm is degraded somewhat. We think that fixing pa is preferred because doing so improves performance and permits a cleaner comparision of results across the cases I = 2, 3, 4 that we consider in Section 8. We fix at a 15

fractional power of two near 0.95, namely 0.9375. The firm’s discount rate β is extremely difficult to estimate in studies of this sort (e.g. Magnac and Thesmar (2002) and Rust (1994)) and we find this to be the case here. A common rule of thumb in business is not to undertake a project whose internal rate of return is less than 20%. Grabowski, Vernon, and DiMasi (2002) state that estimates of hurdle rates specific to the drug industry range “from 13.5% to over 20%.” As to theory, a firm should not undertake a project whose rate of return is less than its cost of capital. The historical risk premium in the drug industry is 12.55%, Gebhardt, Lee, and Swaminathan (2001). If one adds to this a nominal borrowing rate of 5% one arrives at the value 17.55% below which a project should not be undertaken. Grabowski, Vernon, and DiMasi (2002) arrive at a nominal cost of capital of 14% using a CAPM method that they regard as conservative. On the basis of these considerations we set the firm’s discount rate at 20%. There are 40 market entry opportunities in our five years of data. That implies an expected time increment of 0.125 years between prospective projects for the firms in our data. Therefore, using a hurdle rate of 20%, allowing for compounding, and rounding to a nearby fractional power of two, we set β = 0.96875. Examination of (11) indicates that were γπ to enter as a linear factor then γπ would not be identified. That in fact it enters to the first order as 1 + γπ log Π does not help matters much. Attempts to estimate γπ anyway yields estimates of about 0.93 for the three and four firm case. For the two firm case γπ cannot be determined; chains have wandered between 0.89 and 1.01 with no reliable indication of what the modal value might actually be. Therefore, based on the plausible lower bound of 0.908 derived in Section 4 and our experience from trying to estimate γπ , we take 0.93 to be a reasonable value. Rounding to a nearby fractional power of two, we set γπ = 0.9375.

7

Data

Our data come from Scott Morton (1999).1 We describe the data briefly here but refer the reader to Scott Morton (1999) for details. The data consists of all ANDA approvals between 1

We are grateful to Fiona Scott Morton for providing us with here data, and to Derek Gurney for answering our questions about the data.

16

1984 and 1994. There is data on 1,233 ANDAs, and 363 markets entry opportunities for a total of 123 firms. For each market opportunity there is information on: 1. Submission date, approval date, applicant name. 2. Characteristics of drug: ingredient, concentration, route, form. 3. Characteristics of drug markets: drug therapeutic class, patent expiration date, brand name drug, revenue of brand name drug the year before expiration, revenue from hospitals. 4. Characteristics of firms: stock of all drugs approved before 1984 for firms, measures of entry cost, parent or subsidiary firm, whether firm indicted in scandal. Given our model specification and estimation strategy in order to recover the model parameters we only need information on on total market revenues and entry decisions of potential entrants at each market entry opportunity. In our estimation we focus on the period after the FDA scandal in 1989. We only look at ANDA applications for orally ingested generics in the form of pills. In this category, for the sample period 1990-94, there are 40 market openings for which there is no missing revenue information and 51 firms who entered at least once. The dominant firms in the sample after 1989, in order, are: Mylan, Novopharm, Lemmon, Geneva, Copley, Roxane, Purepac, Watson and Mutual. The top firm, Mylan, entered 45% of the markets, the top two 48%, the top three 55%, the top four 60%, the top five 65%, and the top ten 73%. In our analysis we consider situations where the potential entrants are the top two, three or four firms. In each case the remaining firms are clubbed together in category referred to as “other.” The fraction of the market allocated to “other” is taken as a given that is anticipated by the top firms when considering entry. That fraction is determined via the parameter γπ as described in Section 4. On average 3.3 firms enter a market (std. dev. is 2.6, min is 1, and max is 9). The mean log total revenue in a market (where the levels are in thousands of dollars) is 10.47 (std. dev. is 2.1, min is 4.3, and max is 13.3).

17

8

Results

We estimate the model for three cases: (1) the two most dominant firms are the only potential entrants that are strategic competitors (the actions of the remaining 49 firms firms are accounted for by the parameter γπ ), (2) the top three dominant firms are the only potential entrants, and (3) the top four dominant firms are only potential entrants. The parameters are reported in Table 1. Table 1 about here

Figure 1 about here

Figure 2 about here The parameters are tightly estimated and, as seen from the extrmely low classification error rates, model predictions are quite accurate. The large of ρc imply costs are persistent. The half life of a shock (σc ei ) or an entry decision (κc ) is about 69 market periods or 8.6 years. A value of κc of 0.07 implies that an immediate cost reduction of 7% going into the next market opening that would still have a payoff of 3.5% 8.6 years later. Interestingly, the payoff must be higher to rationalize the decisions of a larger number of firms. The parameter γπ decreases as the the number of firms increases whereas one would expect it to go the other way as one would think that the anticipated revenue share of nondominant firms should decrease as the number of dominant firms increase. Apparently it requires an increasing dose of pessimism to rationalize the behavior of the dominant firms as their number increases. Figure 1 plots the log cost of the three dominant firms in the upper three panels. The circles indicate that the firm entered that market. The logarithm of cost is computed by averaging at Step 2e of the importance sampler at the maximum likelihood estimate. The bottom panel shows log total revenue; the numbers at the bottom of this panel are the number of dominant firms who entered the market at that time point. The top firm, Mylan, 18

has a clear cost advantage over its competitors. Broad trends in cost are about the same for all firms. Figure 2 displays the entry decisions of the dominant firms period by period as circles and the model’s average prediction of their entry, period by period by crosses. The average prediction is computed by averaging game solutions at Step 2e of the importance sampler at the maximum likelihood estimate. It is interesting to consider the possibility that the firms play a different game than the game we propose. Consider two others that might be played instead of the game with payoffs (11). They could play a myoptic game with payoffs Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) = Ait (Πγt π /Nt − Cu,i,t )

(30)

where no attention at all is paid to the cost reductions arising from past market entries. Or they could play a game with payoffs Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) = Ait (Πγt π /Nt − Cit )

(31)

where they take cognizance of the effect of entry on costs but ignor the continuation value of the game. The the mypotic game has an equilibrium that agrees with the solution of the game we propose with payoffs (11) in 52% of the cases. The game that ignors the continuation value has an equilibrium that agrees in 84% of the cases. These values were computed by using the maximum likelihood estimates shown for the three firm game in Table 9 and finding all equilibria for the three games for all costs that obtained at Step 2b of the sequential importance sampler. As we are using 1024 particles and there are 40 market openings, the number of such cases is 40960. It would seem that the games with payoffs (30) and (31) would have far higher classification error rates then the game we propose with payoffs (11). Incidentally, we can also compute the incidence of multiple equilibria for these three games. They are 5%, 5%, and 4%, respectively. As discussed earlier, these cases of multiple equilibria are like prisoners delemma games and we assume a coordniation game whereby the firms with the lowest costs are those that that are allowed to ”confess”.

19

9

Conclusions

To summarize, we estimate a dynamic oligopolistic entry model for the generic pharmaceutical industry. Our stylized model fits data reasonably well, i.e., the values of pa that are consistent with our estimates are quite close to one. We find that costs affect entry decisions and that past entry decisions (e.g., experience) affect future costs. Hence we find dynamic spill overs of entry in reducing future costs. Our paper contributes to both the estimation of the oligopolistic dynamic games and the understanding of entry decisions in the pharmaceutical industry. Our dynamic model features unobserved firm production costs that are serially correlated over time. This introduces difficulty in the estimation of the dynamic game theoretic model which we overcome using sequential importance sampling methods. Our empirical findings show that the dynamic evolution of the production cost plays an important role in the equilibrium path of the pharmaceutical industry structure. To be completed...

References Abramowitz, M., and I. A. Stegun (1964): Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover. Aguirregabiria, V., and P. Mira (2002): “Sequential simulation-based estimation of dynamic discrete games,” Technical Report, Boston University. Aradillas-Lopez, A. (2005): “Semiparametric Estimation of a Simultaneous Game with Incomplete Information,” working paper, University of California, Berkeley. Augereau, A., S. Greenstein, and M. Rysman (2005): “Coordination vs. differentiation in a standards war: 56K modems,” working paper, Boston University. Bajari, P., L. Benkard, and J. Levin (2004): “Estimating Dynamic Models of Imperfect Competition,” working paper, forthcoming Econometrica. Bajari, P., V. Chernozhukov, H. Hong, and D. Nekipelov (2007): “Semiparametric

20

Estimation of a Dynamic Game of Incomplete Information,” working paper, University of Michigan and Stanford University. Bajari, P., H. Hong, J. Krainer, and D. Nekipelov (2006): “Estimating Static Models of Strategic Interactions,” working paper, Duke University, Stanford University and University of Minnesota. Bajari, P., H. Hong, and S. Ryan (2004): “Identification and Estimation of Discrete Games of Complete Information,” Working paper, Department of Economics, Duke University. Benkard, L., G. Weintraub, and B. V. Roy (2007): “Markov Perfect Industry Dynamics with Many Firms,” working paper, Stanford University. Berry, S. (1992): “Estimation of a model of entry in the airline industry,” Econometrica, 60(4), 889–917. Berry, S., A. Pakes, and M. Ostrovsky (2003): “Simple estimators for the parameters of dynamic games (with entry/exit examples),” Technical Report, Harvard University. Bjorn, P. A., and Q. Vuong (1984): “Simultaneous Equations Models for Dummy Endogenous Variables: A Game Theoretic Formulation with an Application to Labor Force Participation,” SSWP No. 537, Caltech. Bresnahan, T., and P. Reiss (1991a): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57—81. (1991b): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57—81. (1991c): “Entry and competition in concentrated markets,” Journal of Political Economy, 99, 977–1009. Chernozhukov, V., and H. Hong (2003): “A MCMC Approach to Classical Estimation,” Journal of Econometrics, 115(2), 293–346.

21

Ciliberto, F., and E. Tamer (2003): “Market Structure and Multiple Equilibria in Airline Markets,” Northwestern and North Carolina State University Working Paper. Douced, A., N. de Freitas, and N. Gordon (2001): “An Introduction to Sequential Monte Carlo Methods,” in Sequential Monte Carlo Methods in Practice, ed. by A. Douced, N. de Freitas, and N. Gordon, pp. 3–13. Springer. Dutta, P. K., and R. K. Sundaram (1998): “The Equilibrium Existence Problem in General Markovian Games,” in Organizations with Incomplete Information, ed. by M. Majumdar, pp. 159–207. Cambridge University Press. Fernandez-Villaverde, J., and J. F. Rubio-Ramirez (2005): “Estimating Dynamic Equilibrium Economies: Linear versus Nonlinear Likelihood,” Journal of Applied Econometrics, 20, 891–910. Gebhardt, W. R., C. M. C. Lee, and B. Swaminathan (2001): “Toward and Implied Cost of Capital,” Journal of Accounting Research, 39, 135–176. Golub, G. H. (1973): “Some modified matrix eigenvalue problems,” SIAM Review, 15, 318–334. Golub, G. H., and J. H. Welsch (1969): “Calculation of Gaussian quadrature rules,” Mathematics of Computation, 23, 221–230. Gowrisankaran, G., and J. Stavins (2004): “Network Externalities and Technology Adoption: Lessons from Electronic Payments,” RAND Journal of Economics, 35(2), 260– 276. Grabowski, H., J. Vernon, and J. A. DiMasi (2002): “Returns on Research and Developement for 1990s New Drug Introductions,” Pharmacoeconomics, 20, 11–29. Haile, P., A. Hortacsu, and G. Kosenok (2003): “On the Empirical Content of Quantal Response Equilibrium,” working paper, Yale University. Heckman, J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time - Discrete Data Stochastic Process,” in Structural 22

Analysis of Discrete Data with Econometric Applications, ed. by C. Manski, and D. McFadden, pp. 179–195. MIT Press. Ho, K. E. (2005): “Insurer-Provider Networks in the Medical Care Market,” working paper, Columbia University. Howard, R. A. (1960): Dynamic Programming and Markov Processes. Wiley. Imai, S., N. Jain, and A. Ching (2005): “Bayesian Estimation of Dynamic Discrete Choice Models,” working paper, Concordia University, Northern Illinois University and the University of Toronto. Ishii, J. (2005): “Interconnection Pricing, Compatibility, and Investment in Network Industries: ATM Networks in the Banking Industry,” working paper, Stanford University. Keane, M., and K. Wolpin (1997): “The Career Decisions of Young Men,” The Journal of Political Economy, 105(3), 473–522. Khwaja, A. (2001): “Health Insurance, Habits and Health Outcomes: A Dynamic Stochastic Model of Investment in Health,” Ph.D. Dissertation, University of Minnesota. Kuhn, M. (2006): “Notes on Numerical Dynamic Programming in Economic Applications,” CDSEM, University of Mannheim, Denmark. Magnac, T., and D. Thesmar (2002): “Identifying dynamic discrete decision processes,” Econometrica, 70(2), 801–816. Manuszak, M. D., and A. Cohen (2004): “Endogenous Market Structure with Discrete Product Differentiation and Multiple Equilibria: An Empirical Analysis of Competition Between Banks and Thrifts,” Working paper. Nekipelov, D. (2007): “Entry Deterrence and Learning Prevention on eBay,” Job market paper, Duke University and Stanford University. Norets, A. (2006): “Inference in dynamic discrete choice models with serially correlated unobserved state variables,” working paper, Princeton University. 23

Pakes, A., J. Porter, K. Ho, and J. Ishii (2005): “Moment Inequalities and Their Application,” working paper, Harvard University. Parthasarathy, T. (1982): “Existence of Equilibrium Stationary Strategies in Discounted Stochastic Games,” Sankhya, 44, 114–137. Pesendorfer, M., and P. Schmidt-Dengler (2003): “Identification and Estimation of Dynamic Games,” NBER working paper No. w9726. Rust, J. (1994): “Structural Estimation of Markov Decision Processes,” in Handbook of Econometrics, Vol. 4, ed. by R. Engle, and D. McFadden, pp. 3081–3143. North Holland. Rust, J. (2006): “Dynamic Programming,” Department of Economics, University of Maryland. Rysman, M. (2004): “Competition Between Networks: A Study of the Market for Yellow Pages,” The Review of Economic Studies, 71(2), 483–512. Scott-Morton, F. (1999): “Entry Decisions into Generic Pharmaceutical Markets,” RAND Journal of Economics. Seim, K. (2005): “An Empirical Model of Firm Entry with Endogenous Product-Type Choices,” forthcoming, Rand Journal of Economics. Sweeting, A. (2005): “Coordination Games, Multiple Equilibria and the Timing of Radio Commercials,” Northwestern University Working Paper. Tamer, E. (2003): “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria,” Review of Economic Studies, 70(1).

24

Table 1. Parameter Estimates Number of Potential Entrants (excluding “other” firms) Parameter

2 firms

3 firms

4 firms

µc

9.54 (0.03) 0.97 (0.002) 0.44 (0.006) 0.03 (0.003) 9.93 (0.1) 1.57 (0.074) 0.95 (0.002) 0.96875 0.9375

10.06 (0.003) 0.99 (8.0e-05) 0.37 (0.0005) 0.07 (0.0002) 9.96 (0.02) 1.54 (0.03) 0.94 (0.0002) 0.96875 0.9375

10.07 (0.0006) 0.99 (5.9e-05) 0.37 (0.0002) 0.07 (7.7e-05) 9.99 (0.01) 1.68 (0.01) 0.93 (0.0001) 0.96875 0.9375

0.08 0.08 0.07

0.08 0.09 0.08 0.09

0.10 0.10 0.10 0.10 0.14

ρc σc κc µπ σπ γπ β pa CER all firms CER firm 1 CER firm 2 CER firm 3 CER firm 4

Standard errors are shown in parentheses. CER is the classification error rate computed by averaging the classification errors at Step 2e of the importance sampler at the maximum likelihood estimate.

25

11

12

MYLAN’s log cost

10

o

o

o o

9

o

o o

o

o

o

o

o

o o o

o o o

o 0

10

20

30

40

12

NOVOPHARM’s log cost

11

o

o

o

10

o o

o

o

o

o

o

o

9

o

0

10

20

30

40

12

LEMMON’s log cost

11

o o

o

o o

10

o

o

o

o 9

o o

0

10

20

30

40

4

6

8

10

12

log total revenue

3 2 0 1 1 0 1 0 0 3 0 3 1 0 2 1 0 3 0 0 1 1 2 0 2 0 0 0 2 1 0 2 3 3 0 0 1 2 1 0 0 0

10

20

30

40

Figure 1. Cost, Revenue, and Entry Decisionts. Plotted as a solid line in the first three panels are the logarithm of cost divided for the three dominant firms in the three firm model. The logarithm of cost is computed by averaging at Step 2e of the importance sampler at the maximum likelihood estimate. The circles in these plots indicate that the firm entered the market at that time point. The bottom panel shows the logarithm of total revenue. The numbers at the bottom are the count of the number of dominant firms who entered the market at that time point.

26

MYLAN’s entry decisions o x

0.8

o o x

o

o

o x

o x o x

o x

o

x o o x

o x

o x o x o x

x

x x

x

o x o x o x

0.4

x x

0.0

x o

x o o

x x o o

0

x o

x x o o

x o

x

x x x o x o o o

10

x o

x x o x o o

o

20

x o o

x x o o

30

40

NOVOPHARM’s entry decisions o x

0.8

o

o

o x

o x

o x

o

o x

o o x o x x

o x

x x

0.4

x

x x

0.0

x x x o o o

x o x o x o x o

0

x o

x x x o x o o o

x o o

x o x o

10

x o x o x o x o o

20

x x x o o o

x o x o 30

x o x o x o 40

LEMMON’s entry decisions o

o o x x

0.8

o o x

0.4

x

o o x

x

x o 10

o x

x

x

x x x x x o x o o o o o o

0.0

o x x o

x

x x

0

o

x x o o o o

x x o x o 20

x o o

x x o x o o x o x o x o x o 30

x x x x o x o x o x o o o o 40

Figure 2. Actual and Predicted Entry Decisionts. Plotted as circles are the entry decisions of the three dominant firms in the three firm model. The crosses are the average predictions of the three firm model computed by averaging game solutions at Step 2e of the importance sampler at the maximum likelihood estimate.

27

Han Hong Stanford University Department of Economics Stanford CA 94305-6072 USA

Ahmed Khwaja Duke University Fuqua School of Business Durham NC 27708-0120 USA First draft: January 2008

Abstract We estimate a dynamic oligopolistic entry model for the generic pharmaceutical industry that allows for dynamic spillovers from experience due to entry on future costs. Our paper contributes to both the estimation of oligopolistic dynamic games and the understanding of entry decisions in the pharmaceutical industry. Our dynamic model features unobserved firm production costs that are serially correlated over time. This introduces difficulty in the estimation of the dynamic game theoretic model which we overcome using sequential importance sampling methods. Our empirical findings show that the dynamic evolution of the production cost plays an important role in the equilibrium path of the pharmaceutical industry structure. Keywords: Dynamic Games, Dynamic Spillovers, Generic Pharmaceuticals, Sequential Importance Sampling. JEL Classification: E00, G12, C51, C52 ∗

Supported by the National Science Foundation.

1

1

Introduction Entry behavior by pharmaceutical firms in the generic drug industry is an important topic

of investigation in the empirical literature. For example, Scott-Morton (1999) shows that entry can be predicted by organizational experience, size of the market and the condition of the generic drug. However, much is still to be understood regarding the entry decision of pharmeceutical firms when they engage in a process of dynamic oligopolist competition. Entry decisions of myoptic pharmaceutical firms in a static competition environment are drastically different from those that are forward looking in a dynamic competitive environment. In a dynamic setting, current entry can have a potential spillover effect on future entry. A firm might enter even if current opportunity is loss generating as entry improves payoffs in the future. In order to evaluate the effects of past experience on future cost reduction and entry, and to evaluate the effects of costs on entry, we need to formulate and estimate a structural model of a dynamic game of oligopolistic competition with serially correlated unobserved cost. While there has been a substantial recent development of the empirical dynamic game estimation literature, incorporating serially correlated unobserved production costs induces additional computational difficulties to the econometric estimation procedure. We overcome this difficulty using the recent development of sequential importance sampling techniques. The sequential importance sampling method offers a drastic improvement in the computation speed which makes the estimation of this dynamic model feasible.

2

Related Literature

Static games under the incomplete information assumption have been studied by Bjorn and Vuong (1984), Bresnahan and Reiss (1991a), Bresnahan and Reiss (1991c), Haile, Hortacsu, and Kosenok (2003), Aradillas-Lopez (2005), Ho (2005), Ishii (2005), Pakes, Porter, Ho, and Ishii (2005), Augereau, Greenstein, and Rysman (2005), Seim (2005), Sweeting (2005), Tamer (2003), Manuszak and Cohen (2004), Rysman (2004), Gowrisankaran and Stavins (2004) and Bajari, Hong, Krainer, and Nekipelov (2006). Dynamic versions have been studied by Aguirregabiria and Mira (2002), Bajari, Benkard, and Levin (2004), Berry, Pakes, and 2

Ostrovsky (2003), Pesendorfer and Schmidt-Dengler (2003) and Bajari, Chernozhukov, Hong, and Nekipelov (2007). These papers all made the strong assumption that there is no market or firm level unobserved heterogeneity other than a random shock that is independent and identically distributed across both time and players. This assumption is restrictive because it rules out unobserved dynamics in the latent state variables and rules out any private information that a player might have about competing firms that the researcher does not have. On the other hand, Bresnahan and Reiss (1991b), Berry (1992), Tamer (2003), Ciliberto and Tamer (2003) and Bajari, Hong, and Ryan (2004) investigated static games of complete information. The complete information assumption allows substantial unobserved heterogeneity at the level of the firms. These games typically require combinatorial algorithm to search for an equilibrium instead of the continuous fixed point mapping for the incomplete information models. To our knowledge, we are the first to undertake the challenge of applying the complete information model in a dynamic setting. In the single agent dynamic framework, there is a considerable research that allows for unobserved heterogeneity that is time invariant, e.g., Keane and Wolpin (1997). However work that allows for serially correlated unobserved heterogeneity is rare. Building on the work of Heckman (1981), a frequentist simulation based approach was developed by Khwaja (2001) to integrate out unobserved state variables in the context of a finite horizon dynamic discrete choice model. This approach however relies on closed form expressions for the transition probabilities and a state space that is computationally tractable. Recent work by Imai, Jain, and Ching (2005) and Norets (2006) develops Bayesian methods for single agent dynamic discrete choice models with unobserved state variables that are serially correlated over time. Of relevance to our work is that Bayesian methods are likelihood based as is our approach and so, as shown by Chernozhukov and Hong (2003), computational methods useful in Bayesian inference may be applied to likelihood based frequentist inference and vice versa. Therefore, although this paper is frequentist, as a practical matter we extend this Bayesian estimation literature to dynamic discrete games. Our computational methods may be used for either frequentist likelihood based inference or Bayesian inference. Our implementation makes use of a sequential importance sampler. Sequential impor3

tance sampling methods are have been used by Fernandez-Villaverde and Rubio-Ramirez (2005) for estimating macroeconomics dynamic stochastic general equilibrium models. The structure of dynamic stochastic general equilibrium models are very similar to dynamic discrete choice models. The game theoretical component of our implementation is new. In a continuous time setting, Nekipelov (2007) develops a flexible indirect inference estimator for continuous time dynamic games in the context of eBay auctions without requiring the complete solution of the dynamic game. This is a novel approach that has potential applications in dynamic oligopolistic competition models. Relatedly, Benkard, Weintraub, and Roy (2007) introduced the notion of oblivious equilibrium to facilitate the computation of dynamic game equilibria.

3

Institutional Background

Our empirical model examines the data analyzed in Scott-Morton (1999) on the generic drug entries in the period 1984 to 1994. This time period is particularly interesting because of the 1984 Waxman-Hatch Act which permitted Abbreviated New Drug Applications (ANDAs) by generic firms. The important timing of the institutional environment was discussed in details in Scott-Morton (1999). The relevant facts for our study are summarized in the following. The preparation of an ANDA application takes months to years because it requires contruction of manufacturing facilities that need to be inspected and approved by the FDA. The sunk cost of an ANDA market entry is high even though it is much less than a new drug invention. The size and heterogeneity of entry cost relative to the size of the market revenue lead to a small number of entrants supported by each market. In addition, the FDA does not reveal when and from whom it receives ANDA applications. These features of the data are consistent with our modelling assumption of a dynamic simultaneous entry game among a small number of competing pharmaceutical firms, in which firms have to face substantial uncertainty in rival price competition when they incurr the fixed sunk cost of entry. As discussed in Scott-Morton (1999), announced entry is very rare, because firms do not want to signal the common market value. They also fear that the delay in the approval will invite competition. There are few late sequential movers who withdrew in response to rivals’ 4

approvals. But simultaneous moves in a dynamic context are more important. Because of the “generic scandal” that broke out in 1989 and the general upheaval and uncertainty in the generic drug industry surrounding this period, we take great care in processing the data between 1988 to 1993.

4

Model

In this section we formally describe the entry game. Each time the market is open counts as one time increment. The time index is denoted t. Two implications are that irrespective of the calendar time that has elapsed between the adjacent market openings, (i) cost decreases are homogeneous, and (ii) the discount rate is held constant between market openings. These are plausible assumptions for our application as in our estimation sample (described below in Section 7) there are 40 openings in the period 1990-94 or on average a market opening every 1.5 weeks. This convention avoids unsurmountable computational difficulties in solving for the equilibrium of the model caused by unequal spacing. Moreover, we are only required to get the ordering of the data correct rather than try to determine market entry dates precisely. We order according to the date when an ANDA application was recieved by the FDA. This convention appears to us to be a reasonable a priori view as to how costs, which are unobservable, would behave. In view of the excellent fit to the data that we are able to achieve (Section 8) this convention appears reasonable a posteriori. As mentioned earlier, we are calling this latent variable cost for convenience. It is really a mixture of experience (learning by doing), capital accumulation, etc. The actions available to firm i when market t opens are to enter, Ait = 1, or not enter Ait = 0. There are I firms in total so that the number who enter market t is given by Nt =

I X

Ait

(1)

i=1

Past entry decisions and random shocks determine cost Cit = exp(cit ), where, as just indicated, we follow the standard convention that a lower case quantity denotes the logarithm of an upper case quantity. The equation governing the log cost of firm i at time t is cit = µc + ρc (ci,t−1 − µc ) − κc Ai,t−1 + σc eit , 5

(2)

where eit is normally distributed shock with mean zero and unit variance, σc is a scale parameter, κc is the immediate impact of time t that is felt if the market was entered at time t − 1, µc is a location parameter, and ρc is an autoregressive parameter which is presumed to satisfy 0 < ρc < 1. We assume that all firms are ex ante identical, with all the heterogeneity in costs driven by past decisions, hence none of these parameters are indexed by i. Log cost can be decomposed into a sum of an observable and unobservable components as follows: ci,t = cu,i,t + ck,i,t

(3)

cu,i,t = µc + ρc (cu,i,t−1 − µc ) + σc eit

(4)

ck,i,t = ρc ck,i,t−1 − κc Ai,t−1

(5)

From these equations one can see that the location parameter µc can be interpreted as the mean of the unobervable portion of log cost and that the total impact of a firm’s past entry decisions is ck,i,t = −

P∞

j=0

ρj κc Ai,t−j−1 .

The total revenue to be divided among firms who enter market t is Πt = exp(πt ) given by

πt = µπ + σπ e0,t

(6)

where e0,t is normally distributed with mean zero and unit variance, µπ is a location parameter and σπ a scale parameter. The value we have for total revenue is that for the last year the drug was on patent. We interpret this value as being proportional to the total discounted value of the revenue flows as a generic drug once the drug goes off patent. Post the 1989 scandal there were fifty firms that entered the market. We cannot hope to model all fifty and so consider only the dominant firms. Therefore in the following Nt is used to denote the number of entering dominant firms. We consider the case of two, three, and four dominant firms. Nt is less than or equal to I, which is the total number of dominat firms (i.e. 2, 3, or 4), which is considered to be time-invariant. Nt is to be differentiated from Nta , which is used to denote the total number of entrant firms at time t including both dominant and nondominant firms.

6

We allow for nondominant firms as follows. Regressions indicate that log N a = b log Π, with b ≈ 0.092, is a reasonable approximation to the total number of firms that enter a market. Therefore, when one of the dominant firms is considering entry, it can anticipate that the revenue available to be divided among all dominant firms is log Πanticipated = log Π −

b log Π = log Π1−b . These considerations suggest that a reasonable functional form for dominant firm i’s per period revenue at time t is Ait (Πγt π /Nt − Cit ) ,

(7)

with 1 − b = 0.908 being a reasonable lower bound for γπ . We assume an infinite horizon. Thus, the firm’s total discounted revenue at time t is ∞ X

π β j Ai,t+j Πγt+j /Nt+j − Ci,t+j ,

(8)

j=0

where β is the discount factor, 0 < β < 1. The parameters of the model may be summarised as θ = (µc , ρc , σc , κc , µπ , σπ , γπ , β).

(9)

Denote a subgame perfect pure strategy equilibrium profile for this game by E E AE i,t , A−i,t , ⇒ Nt ,

(10)

E where AE i,t is the entry decision of firm i for market t, A−i,t the vector of entry decisions of

the other dominant firms, and NtE is the number of firms that enter, which can be computed from the profile using (1). Thoerem 3.1 of Dutta and Sundaram (1998) implies that this game has a subgame perfect Markovian equilibrium in mixed strategies under reqularity conditions, the most important of which is that revenue and cost can only take on a finite set of values. This can be relaxed to countable values Parthasarathy (1982). We could modify our problem to meet this requirement but there seems no need to because we have no trouble computing pure strategy equilibria for the problem as posed with an infinite state space. A pure strategy equilbrium is, of course, also a mixed strategy equilibrium. What is most interesting about Theorem 3.1, however, is that its proof provides the details that motivate our computational strategy, discussed in Section 6. (See also Rust (2006) in this connection.) The regularity 7

conditions of Theorem 5.1 of Dutta and Sundaram (1998) come closer to the problem as we have posed it, notably the revenue and cost do not have to be discrete but they would need to be bounded. The equilibrium provided by Theorem 5.1 may depend on two periods of the state vector but we find that we can alway find equlibria that depend on one period only. Thus, although there are results that imply that a slightly modified version of the proposed game has equilibria, in the sequel we rely mostly on the fact that we have no trouble computing equilibria. A stationary pure strategy Markov equilibrium of the dynamic game can be computed E by finding an equilibrium (AE i,t , A−i,t ) for the game with payoffs

Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt )

(11)

= Ait (Πγt π /Nt − Cit ) h

+ β E Vi (Ci,t+1 , C−i,t+1 , Πt+1 ) | Ai,t , A−i,t , Ci,t , C−i,t , Πt ,

i

where Vi (Ci , C−i , Π) solves the Bellman-type equation Vi (Cit , C−i,t , Πt )

(12)

γπ E = AE it Πt /Nt − Cit

i

h

E + β E Vi (Ci,t+1 , C−i,t+1 , Πt+1 ) | AE i,t , A−i,t , Ci,t , C−i,t , Πt .

In writing the Bellman-equation it is implicitly understood that the equilibrium actions

E AE i,t , A−i,t are functions of the state variables of cost and revenue (Ci,t , C−i,t , Πt ). The

E conditions that (AE i,t , A−i,t ) must satisfy to be the sought equilibrium are E E Vi (AE i,t , A−i,t , Ci,t , C−i,t , Πt ) ≥ Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) ∀ i, t.

5

(13)

Solving the Model

The log state vector is st = (c1t , ..., cIt , πt ) .

(14)

Denote the entry decisons of all firms (strategy profile) by At = (A1t , ..., AIt ) . 8

(15)

Conditionally upon st and At , the elements of st+1 are independently normally distributed with means µit = µc + ρc (cit − µc ) − κc Ait for the first I elements and µIt = µπ for the last and with standard deviations σi = σc for the first I elements and σI = σπ for the last. The conditional expectation of functions of the form f (st+1 ) given (At , st ), such as appear in equations (11) and (12), can be computed by Gauss-Hermite quadrature. The weights wi and abscissae xi for Gauss-Hermite quadrature may be obtained from tables such as Abramowitz and Stegun (1964) or by direct computation using algorithms such as Golub and Welsch (1969) as updated in Golub (1973). To integrate with respect to si,t+1 conditionally upon At √ and st the abscissae are transformed to s˜t+1,i = µi + 2σi xi , and the weights are transformed √ to w˜i = wi / π. Then, for a K + 1 rule, E[f (st+1 ) | At , st ] ≈

K X

···

K X

K X

f (˜ st+1,i1 ,· · · ,tildest+1,iI , s˜t+1,iI+1 )w˜i1 · · ·w˜iI w˜iI+1 .

i1 =−K iI =−K iI+1 =−K

(16) Note from above that both the abscissae s˜t+1 and the weights w˜t+1 depend on At and st . If, for example, there are two firms and a three point quadrature rule is used, then E[f (st+1 ) | At , st ] ≈

1 1 1 X X X

f (˜ si , s˜j , s˜k )w˜i w˜j w˜k .

i=−1 j=−1 k=−1

We use three point rules throughout. A three point rule will integrate a polynomial in st+1 up to degree five exactly. Equations (11), (12), and (13) can be expressed in terms of st by putting Cit = exp(sit ) for i = 1, . . . , I and Πt = exp(sI+1,t ). Assume that this has been done. Suppose, for the moment, that the value function V (st ) = (V1 (st ), . . . , VI (st )) ,

(17)

where each Vi (st ) solves its respective equation (12), is known. Then an equilibrium for the game with payoffs given by equation (11) can be found by checking (13) directly for all possible At . There will, at times, be multiple equilibria. This usually takes the form of a prisoner’s delemma situatation where one or another firm can profitably enter but if both enter they both will incurr losses whereas if neither enters they would get the continuation value of the game. In the three firm game the frequency of multiple equilibria is 4%. We 9

resolve this situation by assuming a coordination game. The firms that have the lowest costs Cit are those that are allowed to enter. That is, the At are ordered by increasing C=

PI

i=1

Ait Cit and the first At that satisfies (13) is accepted as the solution. Note that our

distributional assumptions on st guarantee that no two C can be equal so that this ordering of the At is unique. Moreover, none of the Cit can equal one another and when that is true we have never failed to be able to compute a pure strategy equilibrium. We approximate the value function V (st ) by a local linear approximation as follows. We define a grid on the state space which defines a set of (I + 1)-dimensional rectangles that have grid lines as boundaries. We refer to the center K of such a rectangle as its key. To approximate the value function at a point st , the key K of the rectangle that contains st is located and the value function is approximated by V (st ) = bK + (BK )st ,

(18)

where bk is a vector of length I and BK is an I by I + 1 matrix. We set the grid increments at 16σ although it does not matter much. The set of keys that actually get visited will be about the same for grid increments as small as 4σ because past decisions operating though recursion (5) have far more influence on the location of st than do random shocks operating through recursion (4) or equation (6). For a three firm game the number of rectangles that actually are visited is about six. It remains to compute the coefficients bk and BK . This we do as follows. We initialize to zero. We generate a set of abscissae {˜ st } clustered about K and solve the game with payoffs

(11) to get corresponding equilibria {A˜E ˜t , A˜E pairs into (11) to get t }. We substitue the s t ˜E ˜ ˜ ˜ updated values {V˜i (˜ st ) = Vi (A˜E it , A−i,t , Cit , C−i,t , Πt )} for the value function. Then, using the pairs {(˜ st , V (˜ st )} as data, we compute bk and BK by mulitvariate least squares. We repeat until the bk and BK stablize. We have found that twenty interations suffices. As the notation suggests, the easiest way to get a cluster of points about a key is to use absciccae from the quadrature rule described above with st set to K and At set to zero. But if so, one must take care to jitter the points so that no two firms have exactly the same cost. It is possible to apply a modified Howard acceleration strategy as descirbed in Kuhn (2006); see also Rust (2006) and Howard (1960). The idea is simple: The solution {A˜E t } of 10

the game with payoffs (11) will not change much, if at all, for small changes in the value function V (s). Therefore, rather than recompute the solution at every step of the (bk , Bk ) interations, one can reuse a solution for a few steps. We find that this strategy becomes riskier as the number of firms increases. A conservative approach that does work is to not use the modified Howard acceleration for the firs two iterates of (bk , Bk ). On the third, use it once; on the fourth, twice; the fifth, thrice; and so on. We have found that twenty iterations suffices, counting regular steps and modified Howard steps equally. Our code is written in C++ and makes heavy use of the object oriented programming style, the classical data structures in the standard template library, and a matrix class. We surmise that trying to implement these methods in a language without the first two features, such as Matlab, would be exasperating beyond endurance.

6

Computing the Likelihood

In this section describe our estimation strategy. From the computational standpoint, the setting is as follows. There are I firms, i = 1, . . . , I, who can enter the market or not at each time period t. If firm i enters at time t, then Ait = 1; if not, Ait = 0. The number of firms that enter at time t, is Nt =

PI

i=1

Ait . The total anticiapated revenue available to the firms in each market is

Πγt π , which is divided equally among those firms that enter. We can observe both Πt and At = (A1t , . . . , AIt ). Log cost, ci,t = log Ci,t , is the sum of two componants. The first is log Cu,i,t , which is known by all firms but not by us. The second is log Ck,i,t , which is known by all firms and by us. Both evolve as a Markov process. Cost together with revenue Πt is the state vector. Denote the part of the state vector that is hidden from us by Xt = (Cu,1,t , . . . , Cu,I,t ).

(19)

Denote the variables that we can observe by Yt = (A1t , . . . , AIt , Ck,1,t , . . . , Ck,I,t , Πt ).

(20)

As previously, a lower case variable denotes the logarithm of an upper case variable with the exception that at = At ; i.e. xit = log Xit , cu,i,t = log Cu,i,t , ck,i,t = log Ck,i,t , πit = log Πit , 11

xt = (cu,1,t , . . . , cu,I,t ), and yt = (a1t , . . . , aIt , ck,1,t , . . . , ck,I,t , log Πt ). With these conventions, cost evolves as ci,t = cu,i,t + ck,i,t

(21)

cu,i,t = µc + ρc (cu,i,t−1 − µc ) + σc eit

(22)

ck,i,t = ρc ck,i,t−1 − κc Ai,t−1

(23)

and revenue evolves as πt = µπ + σπ e0,t .

(24)

We shall estimate model parameter by maximum liklihood using the Chernozhukov and Hong (2003) MCMC method. ADD CITE TO Belloni and Chernozukov AND COMMENT ON IT This method uses the Metropois-Hastings algorithm wherein one proposes a value for model parameters and then decides whether to accept or reject it. These proposed parameter values θ = (µc , ρc , σc , κc , µπ , σπ , γπ , β) are known to us for the purpose of computation. We have data for both the pre and post scandal periods. The pre-scandal period is indexed by t = −n0 , . . . , 0 and the values of Yt over the pre-scandal period are denoted by Ypre . The post-scandal period is indexed by t = 1, . . . , n. While the scandal changed the market structure thus rendering the pre-scandal data unsuitable for general estimation, it can still be used for two purposes: The entry decisions {Ait }0t=−n0 can be used to compute the last two pre-scandal values ck,i,−1 and ck,i,0 of the observable part of log cost for each firm; and the pre-scandal log revenue {πt }0t=−n0 can be used to help identify the parameters µπ and σπ . We compute the initial values ck,i,−1 and ck,i,0 for each firm by running the recursion (23) started at zero over the observed choices {Ait }0t=−n0 . We now know the vectors y−1 and y0 because (π−1 , A−1 ) and (π0 , A0 ) are in Ypre . While the scandal may have affected which firms participated in the market post-scandal, there is no reason to believe that market oportunites were different pre- and post-scandal. Therefore the pre-scandal data can be used to help identify the revenue distribution. From Ypre we can compute a normal likelihood for log revenue over the period −n, . . . , 0. Although this likelihood actually only depends on the two elements (µπ , σπ ) of θ, we denote it as p(Ypre |θ) 12

(25)

for convenience. Because At is a deterministic function of (xt , πt , yt−1 , θ), the density p(At |πt , xt , yt−1 , θ) puts mass one on a single value of At . The implication is that the likelihood over the post scandal data Ypost assumes the value one if we predict every entry decision perfectly and zero otherwise. To avoid this situation, we adopt the following density for At p(At |πt , xt , yt−1 , θ, pa ) =

I Y

c

c

(pa )I(Ait =Ait ) (1 − pa )I(Ait 6=Ait )

(26)

i=1

where 0 < pa < 1 and Acit is computed from (xt , πt , yt−1 , θ) using the methods described in Section 5. Douced, de Freitas, and Gordon (2001) present a concise description of the sequential importance sampler that is adequate to follow what we do here. The densities relevant to a sequential importance sampler are the transition density of the hidden state vector p(xt |xt−1 , θ),

(27)

which is defined by recursion (23), the initial density p(x0 |θ),

(28) q

which, from (23), is normal with mean µc and standard deviation σc / 1 − ρ2c , and, the observation density p(yt |yt−1 , xt , θ, pa ) = p(At |πt , yt−1 , xt , θ, pa ) p(πt |yt−1 , xt , θ)

(29)

where from (24) p(πt |yt−1 , xt , θ) = p(πt |θ), is normal with mean µπ and standard deviation σπ . The sequential importance sampler for (θ, pa ) is given in the following: 1. For t = 0 (i)

(a) Start N particles by drawing x0 for i = 1, . . . , N from the initial density (28). (b) Compute p(y0 |θ, pa ) = . =

Z

p(y0 |y−1 , x0 , θ, pa ) p(y−1 , x0 |θ, pa ) dx0

N 1 X (i) p(y0 |y−1 , x0 , θ, pa ). N i=1

13

2. For t = 1, . . . , n (i)

(a) For each particle draw x˜t from the transition density (27) and set (i)

(i)

(i)

x˜0:t = (x0:t−1 , x˜t ). (i)

(b) For each particle compute the particle weights wˆt using the observation density (29); i.e. (i)

(i)

w˜t = p(yt |yt−1 , x˜t , θ, pa ) (Here is where we run into trouble with a deterministic function because the weights could all be zero.) (c) Normalize the weights so that they sum to one (i)

(i) wˆt

w˜t

= PN

i=1

(i)

w˜t

. (i)

(i)

x0:t } (d) For i = 1, . . . , N sample with replacement the particles x0:t from the set {˜ (i)

according to the weights {wˆt }. (Note the convention: Particles with unequal (i)

weights are denoted by {˜ x0:t }. After resampling the particles are denoted by (i)

{x0:t }.) (e) Compute p(yt |y1:t−1 , θ, pa ) = . =

Z

p(yt |yt−1 , xt , θ, pa ) p(yt−1 , xt |y1:t−1 , θ, pa ) dxt

N 1 X (i) p(yt |yt−1 , xt , θ, pa ). N i=1

(i)

(Note that p(yt |yt−1 , xt , θ) does not have to be recomputed here if the weights (i)

(i)

w˜t are associated to xt in the resampling step and saved. If each firms entry decisions are similarly associated, then classification error rates can be computed at this step.) 3. Done (a) The likelihood is p(y1:t |θ) = p(Ypre |θ)p(y0 |θ)

n Y t=1

14

p(yt |y1:t−1 , θ).

Steps 2a and 2b can be parallelized by means of threads as follows. The N particles can be divided up into groups, one group for each of the machine’s CPUs. Steps 2a and 2b are performed on each group by a separate thread. Unfortunately, all other parts of the algorithm are serial. Another approach to parallelization is to use message passing (MPI) and let each CPU compute a chain. These sub chains are then concatonated to get one long chain. Too avoid having to burn off transients on each CPU, one can start from the maximum of the likelihood and use a different initial seed on each CPU. The general purpose impmenentation of the Chernozhukov and Hong (2003) that we use will do this. It is in the public domain and available at econ.duke.edu/webfiles/arg/emm. Our strategy is to run using threads until the maximum has been found and the chain tuned so that the rejection rate is about 30% for each parameter and then switch to the MPI strategy to get enough draws to compute standard errors. It is probably an unnecessary precaution, but when we switch to the MPI runs, we quit using the modified Howard acceleration method. An acceleration method that can be used with either parallization strategy is to define the proposal density on a grid and save the likelihood for each draw to a binary tree with the parameter θ as the key (e.g. a C++ associateive map). When a value of θ is proposed again, the value of the likelihood is obtained from the tree rather than recomputed. If the grid increments are (fractional) powers of two, then the machine representation of parameter values is exact and a tree from a previous run can be reused in runs with differently scaled proposal densities. The aforementioned public domain software implements this strategy. The parameter pa can either be estimated or be fixed at various values. We tried values from 0.75 to 0.95. We find that estimates of the other elements of θ are hardly affected. What we do find is that varying pa affects the rate at which particles die out at step 2d in the sequential importance sampler. Because we are not using the sequential importance sampler as a smoother, the rate at which particles die out is of no concern. We always have a large number of points available at Step 2e. When pa is treated as a parameter to be estimated, the performance of the MCMC algorithm is degraded somewhat. We think that fixing pa is preferred because doing so improves performance and permits a cleaner comparision of results across the cases I = 2, 3, 4 that we consider in Section 8. We fix at a 15

fractional power of two near 0.95, namely 0.9375. The firm’s discount rate β is extremely difficult to estimate in studies of this sort (e.g. Magnac and Thesmar (2002) and Rust (1994)) and we find this to be the case here. A common rule of thumb in business is not to undertake a project whose internal rate of return is less than 20%. Grabowski, Vernon, and DiMasi (2002) state that estimates of hurdle rates specific to the drug industry range “from 13.5% to over 20%.” As to theory, a firm should not undertake a project whose rate of return is less than its cost of capital. The historical risk premium in the drug industry is 12.55%, Gebhardt, Lee, and Swaminathan (2001). If one adds to this a nominal borrowing rate of 5% one arrives at the value 17.55% below which a project should not be undertaken. Grabowski, Vernon, and DiMasi (2002) arrive at a nominal cost of capital of 14% using a CAPM method that they regard as conservative. On the basis of these considerations we set the firm’s discount rate at 20%. There are 40 market entry opportunities in our five years of data. That implies an expected time increment of 0.125 years between prospective projects for the firms in our data. Therefore, using a hurdle rate of 20%, allowing for compounding, and rounding to a nearby fractional power of two, we set β = 0.96875. Examination of (11) indicates that were γπ to enter as a linear factor then γπ would not be identified. That in fact it enters to the first order as 1 + γπ log Π does not help matters much. Attempts to estimate γπ anyway yields estimates of about 0.93 for the three and four firm case. For the two firm case γπ cannot be determined; chains have wandered between 0.89 and 1.01 with no reliable indication of what the modal value might actually be. Therefore, based on the plausible lower bound of 0.908 derived in Section 4 and our experience from trying to estimate γπ , we take 0.93 to be a reasonable value. Rounding to a nearby fractional power of two, we set γπ = 0.9375.

7

Data

Our data come from Scott Morton (1999).1 We describe the data briefly here but refer the reader to Scott Morton (1999) for details. The data consists of all ANDA approvals between 1

We are grateful to Fiona Scott Morton for providing us with here data, and to Derek Gurney for answering our questions about the data.

16

1984 and 1994. There is data on 1,233 ANDAs, and 363 markets entry opportunities for a total of 123 firms. For each market opportunity there is information on: 1. Submission date, approval date, applicant name. 2. Characteristics of drug: ingredient, concentration, route, form. 3. Characteristics of drug markets: drug therapeutic class, patent expiration date, brand name drug, revenue of brand name drug the year before expiration, revenue from hospitals. 4. Characteristics of firms: stock of all drugs approved before 1984 for firms, measures of entry cost, parent or subsidiary firm, whether firm indicted in scandal. Given our model specification and estimation strategy in order to recover the model parameters we only need information on on total market revenues and entry decisions of potential entrants at each market entry opportunity. In our estimation we focus on the period after the FDA scandal in 1989. We only look at ANDA applications for orally ingested generics in the form of pills. In this category, for the sample period 1990-94, there are 40 market openings for which there is no missing revenue information and 51 firms who entered at least once. The dominant firms in the sample after 1989, in order, are: Mylan, Novopharm, Lemmon, Geneva, Copley, Roxane, Purepac, Watson and Mutual. The top firm, Mylan, entered 45% of the markets, the top two 48%, the top three 55%, the top four 60%, the top five 65%, and the top ten 73%. In our analysis we consider situations where the potential entrants are the top two, three or four firms. In each case the remaining firms are clubbed together in category referred to as “other.” The fraction of the market allocated to “other” is taken as a given that is anticipated by the top firms when considering entry. That fraction is determined via the parameter γπ as described in Section 4. On average 3.3 firms enter a market (std. dev. is 2.6, min is 1, and max is 9). The mean log total revenue in a market (where the levels are in thousands of dollars) is 10.47 (std. dev. is 2.1, min is 4.3, and max is 13.3).

17

8

Results

We estimate the model for three cases: (1) the two most dominant firms are the only potential entrants that are strategic competitors (the actions of the remaining 49 firms firms are accounted for by the parameter γπ ), (2) the top three dominant firms are the only potential entrants, and (3) the top four dominant firms are only potential entrants. The parameters are reported in Table 1. Table 1 about here

Figure 1 about here

Figure 2 about here The parameters are tightly estimated and, as seen from the extrmely low classification error rates, model predictions are quite accurate. The large of ρc imply costs are persistent. The half life of a shock (σc ei ) or an entry decision (κc ) is about 69 market periods or 8.6 years. A value of κc of 0.07 implies that an immediate cost reduction of 7% going into the next market opening that would still have a payoff of 3.5% 8.6 years later. Interestingly, the payoff must be higher to rationalize the decisions of a larger number of firms. The parameter γπ decreases as the the number of firms increases whereas one would expect it to go the other way as one would think that the anticipated revenue share of nondominant firms should decrease as the number of dominant firms increase. Apparently it requires an increasing dose of pessimism to rationalize the behavior of the dominant firms as their number increases. Figure 1 plots the log cost of the three dominant firms in the upper three panels. The circles indicate that the firm entered that market. The logarithm of cost is computed by averaging at Step 2e of the importance sampler at the maximum likelihood estimate. The bottom panel shows log total revenue; the numbers at the bottom of this panel are the number of dominant firms who entered the market at that time point. The top firm, Mylan, 18

has a clear cost advantage over its competitors. Broad trends in cost are about the same for all firms. Figure 2 displays the entry decisions of the dominant firms period by period as circles and the model’s average prediction of their entry, period by period by crosses. The average prediction is computed by averaging game solutions at Step 2e of the importance sampler at the maximum likelihood estimate. It is interesting to consider the possibility that the firms play a different game than the game we propose. Consider two others that might be played instead of the game with payoffs (11). They could play a myoptic game with payoffs Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) = Ait (Πγt π /Nt − Cu,i,t )

(30)

where no attention at all is paid to the cost reductions arising from past market entries. Or they could play a game with payoffs Vi (Ai,t , A−i,t , Ci,t , C−i,t , Πt ) = Ait (Πγt π /Nt − Cit )

(31)

where they take cognizance of the effect of entry on costs but ignor the continuation value of the game. The the mypotic game has an equilibrium that agrees with the solution of the game we propose with payoffs (11) in 52% of the cases. The game that ignors the continuation value has an equilibrium that agrees in 84% of the cases. These values were computed by using the maximum likelihood estimates shown for the three firm game in Table 9 and finding all equilibria for the three games for all costs that obtained at Step 2b of the sequential importance sampler. As we are using 1024 particles and there are 40 market openings, the number of such cases is 40960. It would seem that the games with payoffs (30) and (31) would have far higher classification error rates then the game we propose with payoffs (11). Incidentally, we can also compute the incidence of multiple equilibria for these three games. They are 5%, 5%, and 4%, respectively. As discussed earlier, these cases of multiple equilibria are like prisoners delemma games and we assume a coordniation game whereby the firms with the lowest costs are those that that are allowed to ”confess”.

19

9

Conclusions

To summarize, we estimate a dynamic oligopolistic entry model for the generic pharmaceutical industry. Our stylized model fits data reasonably well, i.e., the values of pa that are consistent with our estimates are quite close to one. We find that costs affect entry decisions and that past entry decisions (e.g., experience) affect future costs. Hence we find dynamic spill overs of entry in reducing future costs. Our paper contributes to both the estimation of the oligopolistic dynamic games and the understanding of entry decisions in the pharmaceutical industry. Our dynamic model features unobserved firm production costs that are serially correlated over time. This introduces difficulty in the estimation of the dynamic game theoretic model which we overcome using sequential importance sampling methods. Our empirical findings show that the dynamic evolution of the production cost plays an important role in the equilibrium path of the pharmaceutical industry structure. To be completed...

References Abramowitz, M., and I. A. Stegun (1964): Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover. Aguirregabiria, V., and P. Mira (2002): “Sequential simulation-based estimation of dynamic discrete games,” Technical Report, Boston University. Aradillas-Lopez, A. (2005): “Semiparametric Estimation of a Simultaneous Game with Incomplete Information,” working paper, University of California, Berkeley. Augereau, A., S. Greenstein, and M. Rysman (2005): “Coordination vs. differentiation in a standards war: 56K modems,” working paper, Boston University. Bajari, P., L. Benkard, and J. Levin (2004): “Estimating Dynamic Models of Imperfect Competition,” working paper, forthcoming Econometrica. Bajari, P., V. Chernozhukov, H. Hong, and D. Nekipelov (2007): “Semiparametric

20

Estimation of a Dynamic Game of Incomplete Information,” working paper, University of Michigan and Stanford University. Bajari, P., H. Hong, J. Krainer, and D. Nekipelov (2006): “Estimating Static Models of Strategic Interactions,” working paper, Duke University, Stanford University and University of Minnesota. Bajari, P., H. Hong, and S. Ryan (2004): “Identification and Estimation of Discrete Games of Complete Information,” Working paper, Department of Economics, Duke University. Benkard, L., G. Weintraub, and B. V. Roy (2007): “Markov Perfect Industry Dynamics with Many Firms,” working paper, Stanford University. Berry, S. (1992): “Estimation of a model of entry in the airline industry,” Econometrica, 60(4), 889–917. Berry, S., A. Pakes, and M. Ostrovsky (2003): “Simple estimators for the parameters of dynamic games (with entry/exit examples),” Technical Report, Harvard University. Bjorn, P. A., and Q. Vuong (1984): “Simultaneous Equations Models for Dummy Endogenous Variables: A Game Theoretic Formulation with an Application to Labor Force Participation,” SSWP No. 537, Caltech. Bresnahan, T., and P. Reiss (1991a): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57—81. (1991b): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57—81. (1991c): “Entry and competition in concentrated markets,” Journal of Political Economy, 99, 977–1009. Chernozhukov, V., and H. Hong (2003): “A MCMC Approach to Classical Estimation,” Journal of Econometrics, 115(2), 293–346.

21

Ciliberto, F., and E. Tamer (2003): “Market Structure and Multiple Equilibria in Airline Markets,” Northwestern and North Carolina State University Working Paper. Douced, A., N. de Freitas, and N. Gordon (2001): “An Introduction to Sequential Monte Carlo Methods,” in Sequential Monte Carlo Methods in Practice, ed. by A. Douced, N. de Freitas, and N. Gordon, pp. 3–13. Springer. Dutta, P. K., and R. K. Sundaram (1998): “The Equilibrium Existence Problem in General Markovian Games,” in Organizations with Incomplete Information, ed. by M. Majumdar, pp. 159–207. Cambridge University Press. Fernandez-Villaverde, J., and J. F. Rubio-Ramirez (2005): “Estimating Dynamic Equilibrium Economies: Linear versus Nonlinear Likelihood,” Journal of Applied Econometrics, 20, 891–910. Gebhardt, W. R., C. M. C. Lee, and B. Swaminathan (2001): “Toward and Implied Cost of Capital,” Journal of Accounting Research, 39, 135–176. Golub, G. H. (1973): “Some modified matrix eigenvalue problems,” SIAM Review, 15, 318–334. Golub, G. H., and J. H. Welsch (1969): “Calculation of Gaussian quadrature rules,” Mathematics of Computation, 23, 221–230. Gowrisankaran, G., and J. Stavins (2004): “Network Externalities and Technology Adoption: Lessons from Electronic Payments,” RAND Journal of Economics, 35(2), 260– 276. Grabowski, H., J. Vernon, and J. A. DiMasi (2002): “Returns on Research and Developement for 1990s New Drug Introductions,” Pharmacoeconomics, 20, 11–29. Haile, P., A. Hortacsu, and G. Kosenok (2003): “On the Empirical Content of Quantal Response Equilibrium,” working paper, Yale University. Heckman, J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time - Discrete Data Stochastic Process,” in Structural 22

Analysis of Discrete Data with Econometric Applications, ed. by C. Manski, and D. McFadden, pp. 179–195. MIT Press. Ho, K. E. (2005): “Insurer-Provider Networks in the Medical Care Market,” working paper, Columbia University. Howard, R. A. (1960): Dynamic Programming and Markov Processes. Wiley. Imai, S., N. Jain, and A. Ching (2005): “Bayesian Estimation of Dynamic Discrete Choice Models,” working paper, Concordia University, Northern Illinois University and the University of Toronto. Ishii, J. (2005): “Interconnection Pricing, Compatibility, and Investment in Network Industries: ATM Networks in the Banking Industry,” working paper, Stanford University. Keane, M., and K. Wolpin (1997): “The Career Decisions of Young Men,” The Journal of Political Economy, 105(3), 473–522. Khwaja, A. (2001): “Health Insurance, Habits and Health Outcomes: A Dynamic Stochastic Model of Investment in Health,” Ph.D. Dissertation, University of Minnesota. Kuhn, M. (2006): “Notes on Numerical Dynamic Programming in Economic Applications,” CDSEM, University of Mannheim, Denmark. Magnac, T., and D. Thesmar (2002): “Identifying dynamic discrete decision processes,” Econometrica, 70(2), 801–816. Manuszak, M. D., and A. Cohen (2004): “Endogenous Market Structure with Discrete Product Differentiation and Multiple Equilibria: An Empirical Analysis of Competition Between Banks and Thrifts,” Working paper. Nekipelov, D. (2007): “Entry Deterrence and Learning Prevention on eBay,” Job market paper, Duke University and Stanford University. Norets, A. (2006): “Inference in dynamic discrete choice models with serially correlated unobserved state variables,” working paper, Princeton University. 23

Pakes, A., J. Porter, K. Ho, and J. Ishii (2005): “Moment Inequalities and Their Application,” working paper, Harvard University. Parthasarathy, T. (1982): “Existence of Equilibrium Stationary Strategies in Discounted Stochastic Games,” Sankhya, 44, 114–137. Pesendorfer, M., and P. Schmidt-Dengler (2003): “Identification and Estimation of Dynamic Games,” NBER working paper No. w9726. Rust, J. (1994): “Structural Estimation of Markov Decision Processes,” in Handbook of Econometrics, Vol. 4, ed. by R. Engle, and D. McFadden, pp. 3081–3143. North Holland. Rust, J. (2006): “Dynamic Programming,” Department of Economics, University of Maryland. Rysman, M. (2004): “Competition Between Networks: A Study of the Market for Yellow Pages,” The Review of Economic Studies, 71(2), 483–512. Scott-Morton, F. (1999): “Entry Decisions into Generic Pharmaceutical Markets,” RAND Journal of Economics. Seim, K. (2005): “An Empirical Model of Firm Entry with Endogenous Product-Type Choices,” forthcoming, Rand Journal of Economics. Sweeting, A. (2005): “Coordination Games, Multiple Equilibria and the Timing of Radio Commercials,” Northwestern University Working Paper. Tamer, E. (2003): “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria,” Review of Economic Studies, 70(1).

24

Table 1. Parameter Estimates Number of Potential Entrants (excluding “other” firms) Parameter

2 firms

3 firms

4 firms

µc

9.54 (0.03) 0.97 (0.002) 0.44 (0.006) 0.03 (0.003) 9.93 (0.1) 1.57 (0.074) 0.95 (0.002) 0.96875 0.9375

10.06 (0.003) 0.99 (8.0e-05) 0.37 (0.0005) 0.07 (0.0002) 9.96 (0.02) 1.54 (0.03) 0.94 (0.0002) 0.96875 0.9375

10.07 (0.0006) 0.99 (5.9e-05) 0.37 (0.0002) 0.07 (7.7e-05) 9.99 (0.01) 1.68 (0.01) 0.93 (0.0001) 0.96875 0.9375

0.08 0.08 0.07

0.08 0.09 0.08 0.09

0.10 0.10 0.10 0.10 0.14

ρc σc κc µπ σπ γπ β pa CER all firms CER firm 1 CER firm 2 CER firm 3 CER firm 4

Standard errors are shown in parentheses. CER is the classification error rate computed by averaging the classification errors at Step 2e of the importance sampler at the maximum likelihood estimate.

25

11

12

MYLAN’s log cost

10

o

o

o o

9

o

o o

o

o

o

o

o

o o o

o o o

o 0

10

20

30

40

12

NOVOPHARM’s log cost

11

o

o

o

10

o o

o

o

o

o

o

o

9

o

0

10

20

30

40

12

LEMMON’s log cost

11

o o

o

o o

10

o

o

o

o 9

o o

0

10

20

30

40

4

6

8

10

12

log total revenue

3 2 0 1 1 0 1 0 0 3 0 3 1 0 2 1 0 3 0 0 1 1 2 0 2 0 0 0 2 1 0 2 3 3 0 0 1 2 1 0 0 0

10

20

30

40

Figure 1. Cost, Revenue, and Entry Decisionts. Plotted as a solid line in the first three panels are the logarithm of cost divided for the three dominant firms in the three firm model. The logarithm of cost is computed by averaging at Step 2e of the importance sampler at the maximum likelihood estimate. The circles in these plots indicate that the firm entered the market at that time point. The bottom panel shows the logarithm of total revenue. The numbers at the bottom are the count of the number of dominant firms who entered the market at that time point.

26

MYLAN’s entry decisions o x

0.8

o o x

o

o

o x

o x o x

o x

o

x o o x

o x

o x o x o x

x

x x

x

o x o x o x

0.4

x x

0.0

x o

x o o

x x o o

0

x o

x x o o

x o

x

x x x o x o o o

10

x o

x x o x o o

o

20

x o o

x x o o

30

40

NOVOPHARM’s entry decisions o x

0.8

o

o

o x

o x

o x

o

o x

o o x o x x

o x

x x

0.4

x

x x

0.0

x x x o o o

x o x o x o x o

0

x o

x x x o x o o o

x o o

x o x o

10

x o x o x o x o o

20

x x x o o o

x o x o 30

x o x o x o 40

LEMMON’s entry decisions o

o o x x

0.8

o o x

0.4

x

o o x

x

x o 10

o x

x

x

x x x x x o x o o o o o o

0.0

o x x o

x

x x

0

o

x x o o o o

x x o x o 20

x o o

x x o x o o x o x o x o x o 30

x x x x o x o x o x o o o o 40

Figure 2. Actual and Predicted Entry Decisionts. Plotted as circles are the entry decisions of the three dominant firms in the three firm model. The crosses are the average predictions of the three firm model computed by averaging game solutions at Step 2e of the importance sampler at the maximum likelihood estimate.

27