Matching using Semiparametric Propensity Scores

Matching using Semiparametric Propensity Scores Gregory Kordas Department of Economics University of Pennsylvania [email protected]

Steven F. Lehrer Wharton School HCSD University of Pennsylvania [email protected]

June 19, 2003 Abstract This paper considers microeconometric evaluation by matching methods when selection in to the program under consideration is heterogeneous. Existing studies generally use parametric estimators of binary response models such as the probit and logit to estimate the propensity score, which allows for very limited forms of heterogeteity and imposes strong distributional assumptions on the error term that are often violated with the underlying data. We introduce an easy to implement matching strategy that incorporates semiparametric propensity scores that allow for very general forms of heterogeneity in response across observed covariates along the conditional willingness to participate in the treatment intervention distribution. Data from the NSW experiment, CPS and PSID are used to evaluate the performance of alternative matching estimators. We find significant evidence of heterogeneity and that the proposed algorithm generally exhibits lower bias and accurately captures the experimental treatment impact. A detailed examination of the average absolute bias errors between our procedure and matching algorithms based on parametric propensity scores indicate reductions between 6.2% and 706% of the experimental program impact.

*We are grateful to Mianna Plesca and seminar participants at the 2003 CEA meetings, 2003 IHEA World Conress, Concordia University, Florida State University, Lehigh University, McGill University, Queens University, Simon Fraser University, SUNY Albany, UNC-Greensboro, University of South Florida and the Wharton School, University of Pennsylvania for comments and suggestions which have helped to improve this paper. We are responsible for all errors.

1

1

Introduction

An increasing body of evidence has found that there is significant diversity and heterogeneity in response to a given policy. Heckman (2001) argues that this has profound consequences for economic theory and for economic practice. In particular, accounting for heterogeneity may improve the performance of non-experimental estimators. In this paper, we introduce and evaluate the performance of an easy to implement propensity score matching estimation strategy that explicitly accounts for heterogeneity in response across observed covariates along the conditional willingness to participate in the treatment intervention distribution. Matching estimators evaluate the effects of a treatment intervention by comparing outcomes such as wages, employment, fertility or mortality for treated persons to those of similar persons in a comparison group. The use of the propensity score as a basis for matching treated and untreated individuals (and thus for evaluating the magnitude of treatment effects) is becoming increasingly common in clinical medicine, demographic and economic research. The propensity score is defined as the conditional probability of being treated given the individual’s covariates and requires the assumption of selection on observables.1 Existing studies use parametric estimators of binary response models, such as the probit and logit which imposes strong distributional assumptions on the underlying data. In particular, the dangers of misspecification may be severe if the error terms are not independent and identically distributed from their known parametric distributions.2 Kordas (2002) outlines the benefits of using Manski’s (1975, 1985) binary regression quantiles to provide consistent estimates of the conditional probability at different points of the distribution. This estimator avoids the distributional restrictions embedded in the parametric approach and has the advantage that it is robust and can accommodate heteroskedasticity of unknown form. This property is extremely valuable in our setting as the estimator can accommodate problems of heterogeneity, self-selection and misclassification. Todd (1999) presents the only other study that we are aware of that considers matching using 1

The assumption of selection on observables requires that conditioning on the observed variables the assignment to treatment is random. Propensity score matching (Rosenbaum and Rubin (1983)) estimators reduce the dimensionality of having to match participants and non participants on the set of conditioning variables (X) by matching solely on the basis estimated propensity scores (P (X)). 2 Horowitz (1993) demonstrates that misspecification of the conditional distribution of the residual in parametric binary response model is likely to be severe under heteroskedasticity and bimodality.

2

semiparametrically estimated propensity scores. She considers matching using the index estimated from both the semiparametric least squares estimator of Ichimura (1993) and the quasi maximum likelihood estimator of Klein and Spady (1993).3 Her Monte Carlo study demonstrates that the gains from using the semiparametric least squares procedure relative to parametric alternatives are greatest when either the systematic component of the model is misspecified or when the error distribution is highly asymmetric. Our approach offers several additional benefits for empirical researchers. First, this estimator does not require the researcher to select higher order or interaction terms to ensure balancing of the covariates across the treatment and non treatment groups. Recent work in economics (Dehejia and Wahba, 2002) has proposed the use of balancing tests to determine if additional higher order or interaction terms should be included in the estimates of the propensity scores but does not provide guidance on precisely which of these terms should be included. Second, the results from this estimator can also be used to calculate quantile treatment effects. Researchers can determine the average treatment effect on the treated at different points along the probability of participation distribution. Accounting for heterogeneity in the impact of the program across individuals provides a more complete picture of the effectiveness of the treatment employed. To demonstrate the performance of our estimation strategy we use experimental data originally employed in LaLonde (1986). This data has been used in a number of studies that have evaluated the performance of different non-experimental estimators including propensity score matching. While early evidence (Dehejia and Wahba (1999)) found that propensity score matching estimators were able to replicate experimental treatment effects, more recent evidence calls these findings in question (Smith and Todd (2002)) and indicate that accounting for permanent unobserved heterogeneity does lower the estimated bias with propensity score matching estimators. If accounting for heterogeneity is indeed a major source of bias then our estimation strategy will account for it.4 3

These methods estimate a conditional mean and overcome the distributional restrictions embedded in the parametric approach but allows for only limited forms of heterogeneity. 4 Our strategy is unable to account for selection on unobservables that would result in an omitted variable bias problem.

3

2

Econometric Methods

2.1

Framework

Cross-sectional matching estimators compare outcomes for treatment (Y1 ) and comparison group (Y0 ) individuals measured at some time period after the program. We define Di = 1 indicate if person i received treatment and Di = 0 if not. The goal of any evaluation study is to estimate the causal effect of the treatment program. One parameter of interest is the effect of the treatment on the treated (AT TD=1 (X)), which can be defined conditional on some characteristics X as AT TD=1 (X) = E(Y1 − Y0 |X, D = 1). The propensity score reduces the dimension of the conditioning problem in matching by replacing an estimate of E(Y0i |D = 0, Xi ) with an estimate of E(Y0i |D = 0, P (Xi )); where P (X) = P r(Di = 1|X). . Conditioning on the propensity score yields, AT TD=1 (X) = E{E(Y1i |P (Xi ), Di = 1) − E(Y0i |P (Xi ), Di = 1)}

(1)

where Y1i and Y0i are the potential outcomes in two counterfactual situations. To derive equation 1 given the definition of P (X) requires that matching is to be performed over an area of common support ( 0 < Pr(D = 1|X) < 1) and a balancing hypothesis. D ⊥ X|P (X). The balancing hypothesis requires observations with the same propensity score to have the same distribution of observable and unobservable characteristics independent of treatment status. The ease of implementation of these estimators has resulted in a substantial increase in their application in economics and other fields. Implementation involves two steps. In the first step, the conditional probability of participating in the treatment intervention is estimated using either a probit or logit estimator. In the second step, the researcher uses a matching algorithm to construct the matched outcomes for the treated group. Algorithms differ in the distance metric they use to determine which individuals are suitable matches to the treated persons so they can be included in the comparison group of individuals.5 Our approach differs by estimating this probability using the following semiparametric procedure. 5

See Smith and Todd (2002) for a comprehensive overview of alternative cross sectional matching algorithms.

4

2.2

Calculating the Propensity Score Semiparametrically

To avoid the distributional and other restrictions embedded in the parametric specification of Pr(Di = 1|Xi ) we use Manski’s (1975, 1985) binary regression quantiles. Our use of binary regression quantiles is motivated from an estimation viewpoint as with heterogenous populations, a family of quantile estimates can provide a more complete picture of how covariates affect various conditional quantiles of the latent response variable underlying the observed binary indicator. Define the latent variable Di∗ and assume that we may write it’s q-th conditional quantile function as linear index QDi∗ (q|Xi ) = Xi0 α(q), q ∈ (0, 1). (2) where α(q) is the coefficient vector for the q-th conditional quantile. Using the equivariance property of quantile functions with respect to monotonic transformations we write the conditional quantile function of Di = 1{Di∗ ≥ 0} as QDi (q|Xi ) ≡ Q1{Di∗ ≥0} (q|Xi ) = 1{QDi∗ (q|Xi ) ≥ 0} = 1{Xi0 α(q) ≥ 0}.

(3)

This estimator is the binary response analogue to the linear quantile regression estimator introduced by Koenker and Bassett (1978) and offers a robust and efficient semiparametric alternative to commonly used parametric models. From an empirical point of view, their main advantage is their ability to model very general forms of population heterogeneity by allowing the coefficient vector (α(q)) to vary across the conditional quantiles of the dependent variable. Estimates of the scaled coefficients α(q) such that ||α(q) = 1||, are obtained by solving the quantile regression problem (

α(q) = argmina:||a||=1 SN (a) = N

−1

N X i=1

ρq (Di −

1{Xi0 a

)

≥ 0}) ,

(4)

where ρq (u) = (q − 1{u < 0}) · u, and SN (·) is the score function. Since SN is a multimodal step function of a, binary quantile regression estimators are solutions to difficult optimization problems.6 The discontinuities of the objective function also affect the asymptotic behavior of the estimators that have been shown to converge at the slow N 1/3 rate to a non-gaussian random variable (Kim and Pollard, 1990). To overcome these problems Horowitz 6

Optimization is performed using the simulated annealing algorithm. See Goffe et al. (1994) for details.

5

(1992) smoothed the median score function and derived a smoothed median estimator that is asymptotically normally distributed 7 . Kordas (2002) extended these results to show joint asymptotic normality of families of smoothed binary quantile estimates and showed how these smoothed estimates may be optimally combined for efficient estimation. Our main objective is to use quantile estimates to derive semiparametric estimates of the propensity score, or equivalently, semiparametric estimates of the probability that a given individual receives treatment. To this effect the un-smoothed binary quantile estimates will suffice. Thus we only consider un-smoothed estimation. With these estimates we can compute the counterfactual outcome E(Y0i |P (Xi ), Di = 1). Turning to the issue of computing probabilities from quantile estimates, note that the quantile regression model in (3) implies that if an individual’s q-th conditional quantile Xi0 α(q) is (approximately) equal to zero, his conditional probability of receiving treatment is (approximately) equal to 1 − q, i.e., Pr(Di = 1|X 0 α(q) = 0) = 1 − q. (5) Given estimates of α(q) over a grid θ = {q1 , q2 , · · · , qM |q1 < q2 BRQ Bins ↓ [0-0.05%) [.05-0.1%) [0.1-0.15%) [0.15-0.2%) [0.2-0.25%) [0.25-0.3%) [0.3-0.35%) [0.35-0.4%) [0.4-0.45%) [0.45-0.5%) [0.5-0.55%) [0.55-0.6%) [0.6-0.65%) [0.65-0.7%) [0.7-0.75%) [0.75-0.8%) [0.8-0.85%) [0.85-0.9%) [0.9-0.95%) [0.95-1.0%) Total

0

1

2178 58 42 42 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2221 103

2

3

4

5

6

7

8

9

10 11 12 13 14 15 16 17 18 19 Total

17 23 2 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0 45

5 21 2 5 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 36

8 19 4 6 0 2 0 0 2 0 0 1 0 0 0 0 0 0 0 0 42

0 6 0 4 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 12

0 0 0 3 0 0 0 0 4 0 0 5 0 0 0 0 0 0 0 1 13

0 0 1 1 0 1 0 0 0 0 0 2 0 1 0 0 0 0 0 0 6

0 0 0 1 0 1 0 0 4 0 0 1 0 1 0 0 2 0 0 2 12

0 0 0 0 0 0 0 0 5 0 0 2 0 0 0 0 3 0 0 0 10

0 0 0 0 1 0 0 0 2 0 0 4 0 0 0 0 3 0 0 1 11

24

0 0 0 0 1 0 0 0 2 0 0 5 0 0 0 0 5 0 1 0 18

0 0 0 0 0 0 0 0 6 0 2 7 0 1 0 0 10 0 2 0 20

0 0 0 0 1 0 0 0 0 0 0 4 0 0 0 0 7 0 5 1 14

0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 10 0 5 1 23

0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 2 0 3 7 16

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 5 0 2 20 28

0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 10 0 5 8 24

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 12 14 29

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 32 24 57

2266 153 10 20 5 0 0 4 27 0 2 45 0 6 0 0 61 0 62 79 2740

Appendix Table 1: Summary Statistics Sample LaLonde LaLonde Treated Controls Sample Size Age Years of Education

297 24.626 (6.686) 10.380 1.818) 0.094 0.801 0.168 0.731

425 24.447 (6.590) 10.188 (1.619) 0.113 0.80 0.158 0.814

Hispanic Black Married Dropout Zero Earnings in 1974 0.441 .461 Zero Earnings in 1975 0.374 0.419 Real Earnings 3571.00 3672.49 in 1974 (5773.13) (6521.53) Real Earnings 3066.10 3026.68 in 1975 (4874.89) (5201.25) Real Earnings 5976.35 5090.05 in 1978 (6923.80) (5718.09) Note: Standard Deviation in Parentheses

DW Treated DW Controls 185 25.816 (7.155) 10.346 (2.011) 0.059 0.843 0.189 0.708

260 25.054 (7.058) 10.088 (1.614) 0.108 0.827 0.154 0.835

Early Assignment Treated 108 25.370 (6.251) 10.491 (1.643) 0.074 0.824 0.204 0.713

0.708

0.75

0.50

0.60 2095.57 (4886.62) 1532.06 (3219.25) 6349.14 (7867.40)

0.685 2107.03 (5687.91) 1266.91 (3102.98) 4554.80 (5483.84)

0.324 3589.64 (5970.74) 2596.03 (3871.68) 7357.41 (9027.18)

25

Early Assignment Control 142 26.014 (7.108) 10.275 (1.572) 0.113 0.817 0.190 0.803

CPS

PSID

15992 33.225 (11.045) 12.028 (2.871) 0.072 0.074 0.712 0.296

2490 34.851 (10.441) 12.117 (3.082) 0.032 0.251 0.866 0.305

0.542

0.120

0.086

0.472 3857.94 (7254.27) 2276.96 (3919.28) 4608.92 (6031.96)

0.109 14016.8 (9569.80) 13650.8 (9270.40) 14846.66 (9647.39)

0.100 19428.8 (13406.9) 19063.3 (13596.9) 21553.9 (15555.4)

Appendix Table 2: Treatment Effect Estimates with Parametric Propensity Scores using Stratification Matching SPECIFICATION ONE SPECIFICATION TWO

Experiment Impact 20 Bins Excluding 00.05

20 Bins no exclusion 20 Bins DW exclusion

Column 1

Column 2

Column 3

Column 4

Column 5

Column 6

Column 1

Column 2

Column 3

Column 4

Column 5

Column 6

886.32

886.32

1794.34

1794.34

2748.49

2748.49

886.32

886.32

1794.34

1794.34

2748.49

2748.49

-277.54 (603.86)

-401.81 (788.77)

1327.11 (860.30)

1631.55 (1157.9)

1386.91 (1108.3)

2242.87 (1265.9)

-373.80 (556.89)

18.44 (706.37)

1361.09 (794.66)

1940.58 (911.70)

2437.87 (1063.0)

1380.82 (1303.5)

-1067.65 (596.06) -819.79 (582.53)

-611.86 (759.50) -632.00 (800.09)

14.76 (816.72) 1096.57 (800.63)

1151.39 (1094.7) 1394.41 (1060.4)

211.18 (959.90) 1255.21 (1032.5)

1441.67 (1351.0) 1297.61 (1125.5)

-1312.4 (544.38) -803.78 (489.33)

-301.82 (707.88) -268.06 (688.45)

171.88 (776.22) 902.33 (731.83)

1354.01 (947.75) 1513.39 (920.17)

591.07 (1060.0) 1634.18 (938.87)

465.81 (1237.7) 694.29 (1245.2)

Note: Bootstrapped standard errors in parentheses. 1000 Bootstrap replications. DW exclusion drops all individuals in the treatment group with estimated propensity scores above the maximum propensity score in the control group and drops all control individuals whose estimated propensity score is less than the minimum propensity score of the treatment group. Appendix Table 3: Evaluation Bias Estimates with Parametric Propensity Scores using Stratification Matching SPECIFICATION ONE SPECIFICATION TWO

Experiment Impact 20 Bins 10 Bins 20 Bins 10 Bins

Column 1

Column 2

Column 3

Column 4

Column 5

Column 6

Column 1

Column 2

Column 3

Column 4

Column 5

Column 6

886.32

886.32

1794.34

1794.34

2748.49

2748.49

886.32

886.32

1794.34

1794.34

2748.49

2748.49

-1318.41 (530.93) -1148.28 (539.09)

-1011.45 (696.50) -1070.33 (783.31)

62.14 (673.98) -124.51 (675.83)

-685.21 (594.36) -985.08 (634.63)

-399.41 (547.29) -229.11 (566.49)

679.02 (760.47) 248.66 (1056.9)

-1275.2 (634.42) -1129.4 (641.74)

-1862.5 (1108.8) -1156.1 (876.66)

-1749.99 (517.42) -2004.45 (491.63)

-1269.00 (695.99) -1593.87 (741.56)

-411.11 (644.94) -977.55 (643.09)

726.59 -1216.63 -302.93 -1303.93 (779.96) (769.63) (911.30) (427.43) 84.97 -1135.18 -825.08 -1248.07 (926.07) (849.69) (952.01) (437.63) Including Lowest Probability Bin 455.02 -1851.27 -765.03 -1890.1 (770.67) (737.83) (947.28) (399.59) -505.34 -2282.38 -1471.57 -2051.4 (911.10) (762.06) (863.60) (398.61)

-1091.06 (600.09) -1421.89 (650.42)

-912.84 (528.85) -1275.7 (511.59)

270.58 (805.11) -375.59 (1057.2)

-1939.3 (630.75) -2526.2 (627.09)

-2286.0 (1097.8) -2186.9 (934.91)

Note: Bootstrapped standard errors in parentheses. 1000 Bootstrap replications.

26