
The Stata Journal (2009) 9, Number 4, pp. 584–592

Fitting and interpreting Cragg's tobit alternative using Stata

William J. Burke
Department of Agricultural, Food, and Resource Economics
Michigan State University
East Lansing, MI
[email protected]

Abstract. In this article, I introduce the user-written command craggit, which simultaneously fits both tiers of Cragg's (1971, Econometrica 39: 829–844) "two-tier" (sometimes called "two-stage" or "double-hurdle") alternative to tobit for corner-solution models. A key limitation of the tobit model is that the probability of a positive value and the actual value, given that it is positive, are determined by the same underlying process (i.e., the same parameters). Cragg proposed a more flexible alternative that allows these outcomes to be determined by separate processes through the incorporation of a probit model in the first tier and a truncated normal model in the second. Also, tobit is nested in craggit, making the latter a popular choice among "two-tier" models. In the article, I also present postestimation syntax to facilitate the understanding and interpretation of results.

Keywords: st0179, craggit, two tier, two stage, double hurdle, corner-solution models, Cragg's tobit

1 Introduction

Introductory econometric texts commonly reference tobit for fitting models with limited dependent variables that are sometimes called "corner-solution" models.1 Fitting and interpreting the tobit model is fairly straightforward through the use of tobit and associated postestimation commands in Stata.2 A key limitation of the tobit model is that the probability of a positive value and the actual value, given that it is positive, are determined by the same underlying process (i.e., the same parameters). Cragg (1971) proposed a more flexible alternative, sometimes called a "two-tier" or "double-hurdle" model, which allows these outcomes to be determined by separate processes. Thus far, fitting Cragg's tobit alternative using Stata has required a disjointed process, and interpreting results has been complicated and tedious. In this article, I present a command, craggit, that enables a more coherent fitting of Cragg's model, and I present postestimation syntax to facilitate the interpretation of results. The model in this article is applied to cross-sectional data. The craggit command can be used with either balanced or unbalanced panel data as well,

1. For examples, see Pindyck and Rubinfeld (1998), Kennedy (2003), Wooldridge (2009), or Baum (2006).
2. For example, see Baum (2006, 262–266).

© 2009 StataCorp LP    st0179

W. J. Burke

585

but only as a pooled estimator (e.g., the vce(cluster) option can be used to compute standard errors robust to autocorrelation, but the command is not designed to control for unobserved heterogeneity). Conceptually, a corner-solution model is one where

    yi = yi∗   if yi∗ > 0
    yi = 0     if yi∗ ≤ 0

and

    yi∗ = α + xiβ + εi

In practice, as the name suggests, a corner-solution model applies to dependent variables whose distribution "piles up" at some given value but is continuous otherwise. Some examples would be agricultural production or input demand, where optimizing behavior results in no production or demand for some observations, while the outcome is continuous for others.
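As a quick illustration of this data pattern, here is a minimal simulation sketch (in Python rather than Stata; the latent-model parameters are arbitrary and not from the article): a latent variable yi∗ = α + βx + ε is observed as y = max(0, yi∗), so observations pile up exactly at the corner y = 0 while the rest are continuous.

```python
# Minimal corner-solution simulation: latent y* = a + b*x + e, observed
# y = max(0, y*).  All parameter values here are illustrative only.
import random

random.seed(1)
a, b = -0.5, 1.0                       # arbitrary latent-model parameters
data = []
for _ in range(1000):
    x = random.gauss(0.0, 1.0)
    y_star = a + b * x + random.gauss(0.0, 1.0)   # latent outcome
    data.append((x, max(0.0, y_star)))            # observed corner-solution y

# a nontrivial fraction of observations sit exactly at the corner y = 0
share_zero = sum(1 for _, y in data if y == 0.0) / len(data)
```

With these values, roughly 60 percent of observations land on the corner, while the positive observations remain continuous, which is exactly the shape tobit and Cragg's alternative are designed for.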

2 The tobit model

Introduced by Tobin (1958), the tobit model proposes the likelihood function

    f(y | x1) = {1 − Φ(x1β/σ)}^1(y=0) × [(2π)^(−1/2) σ^(−1) exp{−(y − x1β)²/(2σ²)}]^1(y>0)

where Φ is the standard normal cumulative distribution function and the exponent indicator functions are 1(y = 0) and 1(y > 0). There are four values of interest after fitting a tobit model: 1) the probability that y is zero, P(yi = 0 | xi); 2) the probability that y is positive, P(yi > 0 | xi); 3) the expected value of y, conditional on y being positive, E(yi | yi > 0, xi); and 4) the so-called "unconditional" expected value of y, E(yi | xi).3 Then, of course, one could calculate the marginal effects of an explanatory variable on each probability and expectation. Fitting this model is fairly simple using the tobit command in Stata, and calculation of these effects around data mean values can be obtained using various margins postestimation commands. For a thorough discussion of the tobit model and its interpretation, refer to Wooldridge (2009, 587–595).
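The four quantities above follow directly from the normality assumption. As a cross-language sketch (Python rather than Stata, with made-up values for x1β and σ that are not estimates from the article), they can be computed as:

```python
# The four tobit quantities for one observation.  xb stands in for x1*beta;
# the values xb = 0.5 and sigma = 2.0 are illustrative only.
from math import erf, exp, pi, sqrt

def Phi(c):                     # standard normal CDF
    return 0.5 * (1.0 + erf(c / sqrt(2.0)))

def phi(c):                     # standard normal PDF
    return exp(-c * c / 2.0) / sqrt(2.0 * pi)

def tobit_quantities(xb, sigma):
    c = xb / sigma
    p_zero = 1.0 - Phi(c)                       # 1) P(y = 0 | x)
    p_pos = Phi(c)                              # 2) P(y > 0 | x)
    e_cond = xb + sigma * phi(c) / Phi(c)       # 3) E(y | y > 0, x)
    e_uncond = p_pos * e_cond                   # 4) E(y | x)
    return p_zero, p_pos, e_cond, e_uncond

p0, p1, ec, eu = tobit_quantities(0.5, 2.0)
```

Note the structural link the text emphasizes: all four quantities are driven by the single index x1β/σ, which is precisely the restriction Cragg's alternative relaxes.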

3 Cragg's alternative to the tobit model

Again, while useful, the major drawback of the tobit model is that the choice of y > 0 and the value of y, given that y > 0, are determined by the same vector of parameters (β from above). For example, this imposes that the direction (sign) of a given determinant's marginal effect will be the same on both the probability that y > 0 and the expectation of y, conditional or otherwise. As an alternative, Cragg proposed the following, which

3. The commonly used phrase "unconditional expectation" is a bit of a misnomer because all expectations are conditional on the explanatory variables (x).

586

Cragg’s tobit alternative using Stata

integrates the probit model to determine the probability of y > 0 and the truncated normal model for given positive values of y,

    f(w, y | x1, x2) = {1 − Φ(x1γ)}^1(w=0) × [Φ(x1γ) (2π)^(−1/2) σ^(−1) exp{−(y − x2β)²/(2σ²)} / Φ(x2β/σ)]^1(w=1)

where w is a binary indicator equal to 1 if y is positive and 0 otherwise. Notice that in Cragg's model the probability of y > 0 and the value of y, given y > 0, are now determined by different mechanisms (the vectors γ and β, respectively). Furthermore, there are no restrictions on the elements of x1 and x2, implying that each decision may even be determined by a different vector of explanatory variables altogether. Also notice that the tobit model is nested within Cragg's alternative, because if x1 = x2 and γ = β/σ, the models become identical. For a more thorough discussion of this and other double-hurdle alternatives to tobit, refer to Wooldridge (2002, 536–538). Fitting Cragg's alternative requires the additional assumption of conditional independence for the latent variable's distribution, or D(y∗ | w, x) = D(y∗ | x).4 Please refer to Wooldridge (forthcoming) for a thorough discussion. In short, this implies that the unbiasedness of results is sensitive to model misspecification. From Cragg's model, we can obtain the same probabilities and expected values as with tobit by using an updated functional form. The probabilities regarding whether y is positive are

    P(yi = 0 | x1i) = 1 − Φ(x1iγ)    (1)
    P(yi > 0 | x1i) = Φ(x1iγ)        (2)
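The nesting claim (tobit is the special case with x1 = x2 and γ = β/σ) can be checked numerically. The following sketch (Python rather than Stata, with arbitrary illustrative values) evaluates both densities at several outcomes and confirms they coincide under the restriction:

```python
# Check that Cragg's density reduces to the tobit density when x1 = x2 and
# gamma = beta/sigma.  All numeric values are illustrative only.
from math import erf, exp, pi, sqrt

def Phi(c):
    return 0.5 * (1.0 + erf(c / sqrt(2.0)))

def normal_pdf(y, mu, sigma):
    return exp(-((y - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * sqrt(2.0 * pi))

def tobit_density(y, xb, sigma):
    if y == 0:
        return 1.0 - Phi(xb / sigma)
    return normal_pdf(y, xb, sigma)

def cragg_density(y, x1g, x2b, sigma):
    if y == 0:                                  # w = 0: probit "no" probability
        return 1.0 - Phi(x1g)
    # w = 1: probit "yes" probability times truncated-normal density
    return Phi(x1g) * normal_pdf(y, x2b, sigma) / Phi(x2b / sigma)

beta_x, sigma = 1.2, 0.8                        # xb and sigma, chosen arbitrarily
for y in (0.0, 0.5, 2.0):
    t = tobit_density(y, beta_x, sigma)
    c = cragg_density(y, beta_x / sigma, beta_x, sigma)   # impose gamma = beta/sigma
    assert abs(t - c) < 1e-12
```

For y > 0 the Φ(x1γ) and Φ(x2β/σ) factors cancel exactly under the restriction, which is why the two likelihoods agree term by term.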

The expected value of y, conditional on y > 0, is

    E(yi | yi > 0, x2i) = x2iβ + σ × λ(x2iβ/σ)    (3)

where λ(c) is the inverse Mills ratio (IMR), λ(c) = φ(c)/Φ(c), and φ is the standard normal probability density function. Finally, the "unconditional" expected value of y is

    E(yi | x1i, x2i) = Φ(x1iγ) {x2iβ + σ × λ(x2iβ/σ)}    (4)

For a given observation, the partial effect of an independent variable, xj, on the probability that y > 0 is5

    ∂P(y > 0 | x1)/∂xj = γj φ(x1γ)    (5)

4. As described in Wooldridge (forthcoming), the exponential type-two tobit relaxes this assumption; however, convergence can be difficult, especially when x1 = x2.
5. The marginal effect on the probability that y = 0 is the negative of the value in (5).


where γj is the element of γ representing the coefficient on xj. Equations (1), (2), and (5) are the same as the probabilities and partial effect from a probit regression of w on x1. The partial effect of an independent variable xj on the expected value of y, given y > 0, is

    ∂E(yi | yi > 0, x2i)/∂xj = βj [1 − λ(x2β/σ){x2β/σ + λ(x2β/σ)}]    (6)

where βj is the element of β representing the coefficient on xj. Equations (3) and (6) are the same as the expected values and partial effect from a truncated normal regression of y on x2, with emphasis that the effect is conditional on y being positive. The partial effect of an independent variable xj on the "unconditional" expected value of y is somewhat trickier, because it depends on whether xj is an element of x1, x2, or both. First, if xj is an element of both vectors, the partial effect is

    ∂E(y | x1, x2)/∂xj = γj φ(x1γ) × {x2β + σ × λ(x2β/σ)}
                       + Φ(x1γ) × βj [1 − λ(x2β/σ){x2β/σ + λ(x2β/σ)}],   if xj ∈ x1, x2    (7)

Now, if xj only determines the probability of y > 0, then βj = 0 and the second term on the right-hand side of (7) drops out. On the other hand, if xj only determines the value of y, given that y > 0, then γj = 0 and the first right-hand-side term in (7) drops out. In either of the latter cases, the marginal effect is still a function of parameters and explanatory variables in both tiers of the regression.
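The three partial effects (5)–(7) can be sketched numerically, and (7) can be verified as the derivative of (4). The following Python illustration uses arbitrary values for γj, βj, σ, and xj (none taken from the article), treating xj as the only regressor in both tiers for brevity:

```python
# Partial effects (5)-(7) for one observation, plus a finite-difference check
# that (7) is the derivative of the unconditional expectation (4).
# All parameter values are illustrative only.
from math import erf, exp, pi, sqrt

def Phi(c): return 0.5 * (1.0 + erf(c / sqrt(2.0)))
def phi(c): return exp(-c * c / 2.0) / sqrt(2.0 * pi)
def imr(c): return phi(c) / Phi(c)            # inverse Mills ratio, lambda(c)

g, b, sigma = 0.4, 1.5, 2.0     # gamma_j, beta_j, sigma (x_j in both tiers)
xj = 0.7                        # current value of x_j

def E_uncond(x):                # equation (4) with x_j the only regressor
    x1g, x2b = g * x, b * x
    return Phi(x1g) * (x2b + sigma * imr(x2b / sigma))

x1g, x2b = g * xj, b * xj
c = x2b / sigma
pe_prob = g * phi(x1g)                                   # eq. (5)
pe_cond = b * (1.0 - imr(c) * (c + imr(c)))              # eq. (6)
pe_uncond = (g * phi(x1g) * (x2b + sigma * imr(c))       # eq. (7), first term
             + Phi(x1g) * pe_cond)                       # eq. (7), second term

h = 1e-6                        # central finite difference on (4)
numeric = (E_uncond(xj + h) - E_uncond(xj - h)) / (2.0 * h)
assert abs(pe_uncond - numeric) < 1e-6
```

The agreement between pe_uncond and the numerical derivative reflects the identity λ'(c) = −λ(c){c + λ(c)}, which is what turns the product rule on (4) into the two-term expression in (7).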

4 Fitting the Cragg model using Stata

The maximum likelihood estimate (MLE) of γ can be obtained by regressing w on x1 by using probit in Stata. Similarly, the MLEs of β and σ can be obtained by regressing y on x2 by using truncreg. Or, all parameters can be fit simultaneously using craggit. The syntax is

    craggit depvar1 [indepvars1] [if] [in] [weight], second(depvar2 [indepvars2])
        [noconstant vce(vcetype) hetero(varlist) level(#) maximize_options]

depvar1 is the indicator variable for whether y > 0 (w in the above notation), and indepvars1 are the independent variables explaining that decision (x1 in the above notation). depvar2 is the continuous response for positive values (y from above), and indepvars2 are the independent variables explaining it (x2 from above). second() is so named because the determinants of y, given y > 0, are traditionally thought of as the second "tier" or "hurdle" of the model.


Using an example of household-level demand for subsidized fertilizer in Zambia (these data are described in the appendix available online6), output from craggit estimation is presented as follows:

    . craggit basal_g disttown cland educ age,
    >     second(qbasal_g disttown cland educ age)
    Estimating Cragg's tobit alternative
    Assumes conditional independence
    initial:       log likelihood =     -      (could not be evaluated)
    feasible:      log likelihood = -1.316e+08
    rescale:       log likelihood = -692711.38
    rescale eq:    log likelihood = -7868.8964
    Iteration 0:   log likelihood = -7868.8964
      (output omitted)
    Iteration 12:  log likelihood = -7498.6146

                                             Number of obs  =    6378
                                             Wald chi2(4)   =  310.27
    Log likelihood = -7498.6146              Prob > chi2    =  0.0000

                    Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    Tier1
      disttown  -.0069824   .0010228    -6.83   0.000    -.008987    -.0049779
      cland      .1013351   .008374     12.10   0.000     .0849223    .1177478
      educ       .0510291   .0054314     9.40   0.000     .0403837    .0616744
      age        .0065328   .0014688     4.45   0.000     .0036541    .0094116
      _cons     -1.717704   .0929843   -18.47   0.000     -1.89995   -1.535458
    Tier2
      disttown  -4.082447   12.82117    -0.32   0.750    -29.21148    21.04659
      cland      346.8557   195.8594     1.77   0.077    -37.02165     730.733
      educ       248.2076   178.2567     1.39   0.164    -101.1691    597.5843
      age         12.1808   20.47023     0.60   0.552    -27.94012    52.30172
      _cons     -10131.47   7309.131    -1.39   0.166     -24457.1    4194.162
    sigma
      _cons      1110.764   391.2037     2.84   0.005     344.0187    1877.509

Results from the section labeled Tier1 are the MLE of γ, and those labeled Tier2 are the MLE of β from Cragg’s likelihood function. The constant term in the section labeled sigma is the MLE of σ. Whether estimations are obtained simultaneously or one regression at a time, the results will be identical because of the separability of Cragg’s likelihood function. That is, while using craggit makes estimation more coherent, it will not change results. The primary benefit of using craggit is its ability to facilitate postestimation analysis and interpretation. I emphasize that, as with all double-hurdle models, separability in estimation does not imply separability in interpretation. To illustrate this point, consider a variable that explains the continuous value of y but not the probability that y > 0. If fit separately, 6. Visit https://www.msu.edu/˜burkewi2/estimating cragg appendix.pdf.
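The separability claim above (joint estimation and tier-by-tier estimation give identical results) follows from the log likelihood splitting into a probit piece in γ and a truncated-normal piece in (β, σ). This can be verified in a small sketch (Python, with illustrative single-index values rather than real data):

```python
# Check that Cragg's log likelihood separates into a probit part (gamma only)
# plus a truncated-normal part (beta, sigma only).  Data and indices are
# illustrative stand-ins, not the article's Zambian data.
from math import erf, exp, log, pi, sqrt

def Phi(c): return 0.5 * (1.0 + erf(c / sqrt(2.0)))
def npdf(y, mu, s): return exp(-((y - mu) ** 2) / (2.0 * s * s)) / (s * sqrt(2.0 * pi))

obs = [(0, 0.0), (1, 1.3), (1, 0.4), (0, 0.0), (1, 2.2)]   # (w, y) pairs
x1g, x2b, sigma = 0.3, 1.0, 0.9    # single index values shared by all obs, for brevity

# joint Cragg log likelihood
ll_joint = sum(
    log(1.0 - Phi(x1g)) if w == 0
    else log(Phi(x1g)) + log(npdf(y, x2b, sigma) / Phi(x2b / sigma))
    for w, y in obs)

# probit piece (depends only on gamma) plus truncated-normal piece (beta, sigma)
ll_probit = sum(log(Phi(x1g)) if w else log(1.0 - Phi(x1g)) for w, _ in obs)
ll_trunc = sum(log(npdf(y, x2b, sigma) / Phi(x2b / sigma)) for w, y in obs if w)

assert abs(ll_joint - (ll_probit + ll_trunc)) < 1e-9
```

Because the two pieces share no parameters, maximizing them separately and maximizing their sum yield the same estimates, which is exactly why craggit and the probit/truncreg two-step route agree.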


the partial effect of such a variable on the unconditional expected value of y [the second right-hand-side term in (7)] would be a function of probit results, even though it would not be included in the probit regression. For another example, xj may be in both explanatory vectors and have countervailing impacts in each tier (i.e., βj and γj may have different signs). Here the direction of the overall effect cannot be known unless all results are considered together, as in (7).

5 Interpreting results after fitting with craggit

Postestimation analysis commands will be presented for a general case, but it may be useful to follow along with the example by using the Zambian fertilizer data presented in the online appendix. After estimation using craggit, new variables representing the three scalar values (x1iγ̂, x2iβ̂, and σ̂) can be generated for each observation using predict:

    predict x1g, eq(Tier1)
    predict x2b, eq(Tier2)
    predict sig, eq(sigma)

Here x1g, x2b, and sig are the names of the new variables generated by the predict commands. The choice of variable name is not important to the final results, except that the names should be used consistently throughout. For this article, generated variable names (and later, program names) are chosen to be consistent with the notation in (1)–(7).

craggit is equipped to handle a model for heteroskedastic standard errors that is a function of observables with the hetero(varlist) option. If that option is used, the above commands will generate unique values of sig for each observation (σ̂i).

Now all the information we need to calculate the predicted values in (1)–(4) and partial effects in (5)–(7) is either predicted as a new variable or stored in Stata's active memory. The following commands will generate new variables with unique values for each observation that can then be aggregated to any desired level.

To calculate the probability that y = 0 from (1):

    generate Py0 = 1 - normal(x1g)

To calculate the probability that y > 0 from (2):

    generate Py1 = normal(x1g)

To calculate expected values, it is useful to generate a new variable for the IMR:

    generate imr = normalden(x2b/sig)/normal(x2b/sig)

To calculate the expected value of y, given that y > 0, from (3):

    generate Ey_cond = x2b + sig*imr

To calculate the unconditional expected value of y from (4):

    generate Ey_uncond = normal(x1g)*(x2b + sig*imr)

If indepj is an independent variable of interest (xj from the above notation), then to calculate the partial effect on the probability that y > 0 from (5):

    generate PE_prob = [Tier1]_b[indepj]*normalden(x1g)

To calculate the partial effect on the expected value of y, given y > 0, from (6):

    generate PE_cond = [Tier2]_b[indepj]*(1 - imr*(x2b/sig + imr))

Finally, to calculate the partial effect on the unconditional expected value of y from (7):

    generate PE_uncond = [Tier1]_b[indepj]*normalden(x1g)*(x2b + sig*imr) ///
        + [Tier2]_b[indepj]*normal(x1g)*(1 - imr*(x2b/sig + imr))

Once these predicted values have been generated for each observation, the average partial effect (APE) of the independent variable can be found using the summarize command. For example, the APE on the unconditional expected value of y is the mean of PE_uncond:

    summarize PE_uncond

We can also see how this APE varies across groups by using the tabulate command. Suppose we have a categorical variable, catvar:

    tabulate catvar, summarize(PE_uncond)

The standard deviations reported by these summaries describe only the data and should not be considered parameter estimates. That is, the standard deviation of the predicted partial effects should not be used as a standard error (SE) for inference on the APE. One viable option for inference on an APE is bootstrapping. Bootstrapping starts by refitting the model and computing a new APE on a random resample of the data, drawn with replacement. This process is repeated until many APEs have been computed from numerous random resamples. The standard deviation of those APEs can be used as an SE for the full-sample APE. Fortunately, Stata can be programmed to run this process for you, albeit after several lines of programming:7
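The resampling logic itself is simple and can be sketched in a few lines of Python. Here the "APE" is a placeholder sample mean over stand-in per-observation partial effects; in the real procedure each replication would refit the full craggit model before averaging:

```python
# Bootstrap-SE sketch: resample with replacement, recompute the statistic,
# and take the standard deviation across replications as the SE.
# The data are synthetic stand-ins for per-observation partial effects.
import random
import statistics

random.seed(42)
data = [random.gauss(5.0, 2.0) for _ in range(500)]

def ape(sample):
    # placeholder statistic; a full replication would refit craggit here
    return statistics.fmean(sample)

reps = 200
boot_apes = []
for _ in range(reps):
    resample = random.choices(data, k=len(data))   # draw with replacement
    boot_apes.append(ape(resample))

boot_se = statistics.stdev(boot_apes)              # bootstrap SE of the APE
```

For a plain mean, boot_se approximates the usual σ/√n, which is a useful sanity check on the mechanics before wrapping the expensive model refit inside the loop.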

7. Thank you to David Tschirley and Ana Fernandez for providing syntax to bootstrap APE standard errors.


    program define apeboot, rclass
        preserve
        craggit depvar1 indepvars1, second(depvar2 indepvars2)
        predict x1g, eq(Tier1)
        predict x2b, eq(Tier2)
        predict sig, eq(sigma)
        generate imr = normalden(x2b/sig)/normal(x2b/sig)
        generate PE_uncond = [Tier1]_b[indepj]*normalden(x1g)*(x2b + sig*imr) ///
            + [Tier2]_b[indepj]*normal(x1g)*(1 - imr*(x2b/sig + imr))
        summarize PE_uncond
        return scalar ape = r(mean)
        restore
    end
    bootstrap ape = r(ape), reps(#): apeboot

Here, again, the variable and program names can be chosen subjectively, so long as they are used consistently and none of the variables generated by the program has the same name as an existing variable. Again, the explanatory variable whose APE is being computed is indepj. The # sign must be replaced by the number of times Stata should iterate the bootstrap process (the more iterations there are, the longer the process will take, at about 5–10 seconds per iteration; 100 is a reasonable starting point). While the bootstrap is running, after the last command from above is entered, Stata provides progress updates:

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .x.......x..x..x...............................x..    50
    x...x.......x......................                  100

where each "." represents a completed iteration and the occasional "x" represents a "failed" iteration. Here the failed iterations indicate that Stata was not able to fit coefficients for that resample, likely because the random draw did not provide enough variation in the binary indicator variable (w) for craggit to converge. Stata ignores these failures, and they will not have much impact on the final results, except to reduce the number of iterations used to compute the bootstrap SE. To calculate the APE for a different explanatory variable, simply rewrite all the commands from program through bootstrap, changing the name of indepj. First, however, the original program must be cleared from Stata's memory:

    capture program drop apeboot

Unfortunately, bootstrapping can be a time-consuming process, especially when conducting inference on the APEs of multiple variables. An alternative is to approximate a standard error using the delta method, in which an SE is approximated using a Taylor expansion around the data means. This can be accomplished after estimation with craggit by using the nlcom postestimation command and a few local macros.8 After fitting the model and generating the predictions x1g (for x1iγ̂) and x2b (for x2iβ̂) with predict:

8. Thank you to Joleen Hadrich and Joshua Ariga for assistance on the syntax for the delta method inference.

    summarize x1g
    local mx1g = r(mean)
    summarize x2b
    local mx2b = r(mean)
    nlcom [Tier1]_b[indepj]*normalden(`mx1g')*(`mx2b' + [sigma]_b[_cons]* ///
        (normalden(`mx2b'/[sigma]_b[_cons])/normal(`mx2b'/[sigma]_b[_cons]))) ///
        + [Tier2]_b[indepj]*normal(`mx1g')* ///
        (1 - (normalden(`mx2b'/[sigma]_b[_cons])/normal(`mx2b'/[sigma]_b[_cons]))* ///
        (`mx2b'/[sigma]_b[_cons] + normalden(`mx2b'/[sigma]_b[_cons])/ ///
        normal(`mx2b'/[sigma]_b[_cons])))

The SE provided after nlcom can be used with the APE from the summarize command to compute the p-value manually9 (the p-value provided after nlcom is only valid for the partial effect at the mean of x, not the APE).
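The delta method underlying nlcom is just Var{g(θ̂)} ≈ ∇g' V ∇g, with V the coefficient covariance matrix. A generic sketch (Python; g, θ̂, and V are illustrative stand-ins, not the partial-effect formula or craggit estimates):

```python
# Delta-method sketch: SE of g(theta_hat) ~ sqrt(grad' V grad), with the
# gradient taken by central finite differences.  theta, V, and g are
# illustrative stand-ins for craggit estimates and the partial-effect formula.
from math import sqrt

theta = [0.4, 1.5, 2.0]                     # e.g., (gamma_j, beta_j, sigma)
V = [[0.010, 0.001, 0.000],                 # assumed covariance of theta
     [0.001, 0.040, 0.002],
     [0.000, 0.002, 0.090]]

def g(t):                                   # statistic of interest (stand-in)
    return t[0] * t[1] / t[2]

def gradient(f, t, h=1e-6):                 # central finite differences
    grad = []
    for i in range(len(t)):
        up = t[:]; up[i] += h
        dn = t[:]; dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

gr = gradient(g, theta)
var = sum(gr[i] * V[i][j] * gr[j] for i in range(3) for j in range(3))
se = sqrt(var)                              # delta-method standard error
```

nlcom performs the same calculation with analytic-quality derivatives of the expression you supply and the stored e(V), which is why it is so much faster than bootstrapping, at the cost of being a local approximation around the evaluation point.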

6 References

Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.

Cragg, J. G. 1971. Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39: 829–844.

Kennedy, P. 2003. A Guide to Econometrics. 5th ed. Cambridge, MA: MIT Press.

Pindyck, R. S., and D. L. Rubinfeld. 1998. Econometric Models and Economic Forecasts. 4th ed. New York: McGraw–Hill.

Tobin, J. 1958. Estimation of relationships for limited dependent variables. Econometrica 26: 24–36.

Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

———. 2009. Introductory Econometrics: A Modern Approach. 4th ed. Mason, OH: South-Western.

———. Forthcoming. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

About the author

William J. Burke is a PhD student and agricultural economics research specialist at Michigan State University. He has coauthored several policy analysis studies focused on Africa and has a strong interest in facilitating the use of nonstandard model structures.

9. See page 5 of the online appendix for an example of computing the p-value manually.