Educational Attainment and Labor Market Outcomes - Cirano

1 downloads 0 Views 1MB Size Report
Pedro Carneiro∗, Karsten T. Hansen† and James J. Heckman‡. October 13, 2000§ .... As documented in Heckman and Robb. (1985, 1986) ...... [72] Rosenbaum, Paul R., and Rubin, Donald B. “The Central Role of the Propensity Score.
Educational Attainment and Labor Market Outcomes: Estimating Distributions of The Returns to Educational Interventions Pedro Carneiro∗, Karsten T. Hansen† and James J. Heckman‡ October 13, 2000§ May 22, 2001 Revised, September 27, 2001



Address: Department of Economics, University of Chicago, 1126 E. 59th Street, IL 60637. Email:[email protected]. † Address: Department of Economics, The University of Chicago, 1126 East 59th Street, Chicago, IL 60637. E-mail: [email protected]. ‡ Address: Department of Economics, The University of Chicago, 1126 East 59th Street, Chicago, IL 60637, and The American Bar Foundation. E-mail: [email protected]. This research is supported by NSF 97-09-873, SES-0099195, and NICHD-40-4043-000-85-261. Heckman’s work was also supported by the American Bar Foundation and the Donner Foundation. § Previous versions of this paper were given at the Midwest Econometrics Group, October 2000, Washington University St. Louis, May, 2001 and the Nordic Econometrics Meetings, May, 2001. A version of this paper was presented as the Klein Lecture at the University of Pennsylvania, September, 28, 2001.

1

1

Introduction Heterogeneity in responses to interventions is a central feature of economic data. Wher-

ever it has been sought, it has been found.1 Yet the consequences of this heterogeneity for economics, econometrics and the evaluation of social programs have not been fully explored. Many social programs are directed toward reducing inequality but the measures used to evaluate them are based on means and so can only account for between-group inequality. A more complete evaluation of these programs accounts for their distributional consequences for participants and for the overall society.2 This paper presents methods for estimating the distributions of the economic rates of return to schooling among different schooling groups accounting for self selection and attrition. The methods we develop apply to estimating the distributions of any treatment effects. We extend ideas first developed by Aakvik, Heckman and Vytlacil (1998, 2002) and show that factor structure models can estimate joint distributions of counterfactuals in a full multivariate setting. The methodology developed in this paper extends previous work in the economics of education and the evaluation of social programs in four ways.

• Accounting for self selection using several models of dynamic discrete choice, we 1 2

See the evidence summarized in Heckman (2001a,b). See Heckman, Smith and Clements, (1997).

2

formulate and estimate computationally tractable models for vectors of potential outcomes indexed by choice of a particular type of schooling and by the age at which the outcomes are measured. Our framework generalizes the Roy model to a dynamic setting. • Matching is a low cost computational method for analyzing interventions with multiple outcomes. (Robins and Rodnitzky, 1999; Gill and Robins, 2001). Our model extends matching by allowing the variables that produce conditional independence in matching to be unobserved by the analyst, so that we relax standard matching assumptions and yet still produce estimates of high dimensional outcome models. In our framework, it is not necessary to make the extreme informational assumptions assumed by matching models that analysts know all of the information relevant to decisions that the agents they model know. • Our framework allows us to identify effects at the margin. Economics is all about choices of people at the margin. Yet standard econometric and statistical methods focus on averages. With our methods, we are able to identify average and marginal responses to changes in constraints and opportunities for people at a variety of margins. We extend the marginal treatment effect (MTE) analysis of Heckman and Vytlacil (1999; 2000, 2001, 2002) to the vector setting. Contrary to matching, our methods

3

allow us to distinguish between the marginal effect of a program and the average effect in a multi-outcome setting. We apply our methods to consider the estimation of distributions of returns to schooling. Heterogeneity in the return to schooling was a cornerstone of early research on human capital by Mincer (1974). Using the familiar semilog earnings function,

`n yi = αi + β i Si + εi

(1)

where yi is earnings, Si is schooling and where β i is the “rate of return” for person i, his model assumes that

β i⊥ ⊥ (Si , εi ); (Si ⊥ ⊥εi )

and E(εi ) = 0

(where “⊥⊥” denotes statistical independence). He establishes that dispersion in β i , the “rate of return to schooling”, is an important contributor to overall inequality.34 Heterogeneity in the response to schooling also plays a central role in defining and estimating “the” return to schooling. Much of the literature on estimating “the” rate of return to schooling assumes that β i is a constant, conditional on observables (see, e.g. 3

Heckman, Lochner, and Todd (2001) present evidence against the empirical validity of model (1) for earnings data for recent years. 4 Mincer (1995) presents a recent application of this methodology to explain the growth in wage inequality over the past two decades and relaxes the assumption that β i is independent of Si .

4

Griliches, 1977, and numerous successor papers). As documented in Heckman and Robb (1985, 1986), Heckman (1997) and Carneiro et al. (2001), when responses to schooling vary in the population (so β i has a nondegenerate distribution), a variety of mean rates of return to schooling can be defined, depending on the conditioning sets used, if β i is stochastically dependent on Si . If β i is nondegenerate, but stochastically independent of Si , there is only one mean return and the representative agent paradigm provides an accurate description of earnings. The available evidence in the literature, and the evidence presented in this paper, suggests that rates of return determine schooling levels so no single “effect” of schooling can be defined. (Heckman, 2001a,b). As a consequence of this heterogeneity, standard instrumental variable methods do not identify any economically interpretable average rate of return. (See, Heckman, 1997; Heckman and Vytlacil, 1998, Heckman, 2001; Carneiro et al., 2001). In this case, the method of matching breaks down (see Heckman, 2001 and Heckman and Vytlacil, 2002). Methods for estimating certain mean rates of return have been devised and implemented for estimating the returns to schooling when there are two schooling choices and returns are heterogeneous.5 This paper contributes to the existing literature on the returns to schooling by estimating the distributions of the returns to different schooling categories correcting for 5

Willis and Rosen, 1979, apply these methods.

5

self-selection into schooling and sample attrition. We test the linearity assumption implicit in equation (1). Linearity is a valid description of the means of potential outcomes by schooling levels. As in the original human capital model of Mincer, we find considerable evidence of dispersion in rates of return to schooling. We extend his framework by explicitly modeling the schooling decision jointly with wage and employment outcomes at different ages. We extend the Willis-Rosen (1979) model by considering a variety of schooling choices and by considering vectors of potential outcomes associated with different ages. We revisit the question of comparative advantage in the labor market among educational classes in a setting with many potential outcomes associated with different schooling levels and outcomes. We also explicitly allow for both cognitive and noncognitive skills to affect earnings. We find that these components affect earnings differently at different schooling levels. As an important by-product of our analysis, we also present estimates of the serial correlation of wages by education level correcting for attrition and self-selection. We find fundamentally different serial correlation patterns in wages at different schooling levels. Our evidence questions the validity of the Mincer model as an accurate empirical description of earnings or wages. In addition, β i ⊥/⊥(Si , εi ) and Si ⊥/⊥εi . A richer model is required with “returns” (β i ) determining Si and with the distributions of both potential and actual returns differing across different schooling levels. (Equation (1) implies that the distributions of returns are the same across schooling levels.) We find evidence of

6

self-selection at some schooling levels but not at others. In our richer model of schooling, comparative advantage and self-selection bias are found at some schooling levels but not at others. We test and reject the conventional assumption that earnings dynamics are the same at all schooling levels. (See, e.g. MaCurdy, 1982; Abowd and Card, 1987). Persistence in earnings shocks is much lower for less educated groups than more educated groups and predictability of earnings and returns to schooling are greater for more educated workers. Accounting for self-selection affects estimates of wage dynamics and the predictability of wages. The previous literature on earnings dynamics assumes that schooling is exogenous. For many people, the ex post returns to schooling are negative. Wage levels associated with each schooling level are more predictable than are returns which vary across schooling levels. The wage returns to schooling at higher levels of education are slightly more predictable at the time schooling decisions are made than are the wage returns associated with less schooling. This is a benefit of schooling. The ex post heterogeneity in returns is roughly the same for college and non-college educated youth. However, ex ante it is somewhat easier to predict one’s place in the college outcome distribution than in the noncollege distribution. A central feature of the data is the unpredictability of both wage levels and rates of return at the time schooling decisions are made. 7

We find that a computationally simple semiparametric ordered discrete choice model, as analyzed and estimated by Cameron and Heckman (1998) and extended in Hansen, Heckman and Mullen (2001), is more than adequate to capture completed schooling choices in the data we analyze. The computational complexity required to fit Bellman equations is not required to empirically model life cycle schooling. We use our estimated distributions of returns to formulate a variety of policy counterfactuals that subsume the mean treatment effects used in the literature as special cases of a more general framework. We extend the idea of the marginal treatment effect of Björklund and Moffitt (1987) and Heckman and Vytlacil (1999, 2000, 2001) to both ordered choice models and general choice models. This paper also contributes to the literature on the effects of cognitive and noncognitive abilities on wages and employment. The existing literature on this topic uses direct measurements of these abilities to gauge their impacts on wages and employment. This paper uses factor structure models that explicitly account for the possibility that measured test scores and behavioral indicators are imperfect proxies of true cognitive and noncognitive skills. To obtain these results, we develop an extension of the LISREL model of Jöreskog (1977) and Jöreskog and Sörbom and Magidson (1979), and Jöreskog and Sörbom (1988) to a dynamic setting with discrete choices and discrete and continuous outcome measures.

8

We present a formal analysis of both nonparametric and parametric identifiability of the model under a variety of assumptions. The plan of this paper is as follows. In section 2 we define policy counterfactuals and present general models for the joint determination of schooling and labor market outcomes. Section 3 presents factor structure models and the ordered choice model we use. Section 4 discusses identification. Section 5 presents the marginal treatment effect for the ordered discrete choice model. Section 6 presents our methods for estimating this high dimensional model. Section 7 presents the data. Section 8 reports our empirical results and their interpretation. Section 9 concludes. We present policy simulations with the model in a companion paper (Carneiro, Hansen, Heckman, 2002).

2

Policy Counterfactuals Associated with each schooling level s is a vector of outcomes at age a for person ω ∈ Ω :

Ys,a (ω)

¯ a = 1, ..., A. ¯ s = 1, ..., S,

(2)

where there are S¯ schooling levels and A¯ ages. Associated with each person ω is a vector X(ω) of explanatory variables that assume identical values across schooling states (Xs (ω) = Xs0 (ω) for all s, s0 ).

9

The ceteris paribus effect of a change in schooling status from s to s0 is

∆((s, a), (s0 , a)) = Ys,a (ω) − Ys0 ,a (ω).

(3)

This is also known as the individual level treatment effect. Since it is usually not possible to observe the same person in both s and s0 , analysts focus on estimating various population level versions of these parameters.6 In this paper, we estimate distributions of potential outcomes (2) and parameters derived from these distributions, including the Average Treatment Effect

ATE((s, a), (s0 , a), x) = E(Ys,a − Ys0 ,a | X = x, S = s) and the Marginal Treatment Effect, defined more precisely below, but defined verbally as the average gain from moving from s to s0 for those on the margin of indifference between s and s0 . We also generalize the policy relevant treatment effects of Heckman and Vytlacil (2001) to a model with more than two outcomes. The policy relevant treatment effects are the effects of various policy interventions on various measures of aggregate outcomes. Associated with each schooling choice is a level of lifetime utility: 6

Heckman and Smith (1998) and Heckman, LaLonde and Smith (1999) discuss conditions under which it is possible to estimate (3).

10

¯ s = 1, ..., S.

Us (ω),

The utilities are assumed to be absolutely continuous. Agents select schooling levels s˜ to maximize lifetime utility:

¯

s˜ = argmax{Us (ω)}Ss=1 .

(4)

s

Associated with choices are explanatory variables Z(ω). This framework is sufficiently general to encompass a variety of choice processes including the sequential dynamic programming models of Eckstein and Wolpin (1999) and Keane and Wolpin (1997) and the ordered choice models of Cameron and Heckman (1998), and Hansen, Heckman and Mullen (2001) as well as unordered choice models. We let Ds = 1 if S¯ X schooling level s is selected and Ds = 1. s=1

In this notation, we define the marginal treatment effect for choices s and s0 more

precisely as

MT Es0 →s (a, U ss0 ) = E(Ys,a − Ys0 ,a | Us = Us0 = U ss0 ≥ Uj ,

j 6= s, s0 )

(5)

It is the average gain to going from s0 to s at age a for persons indifferent between s and s0 given that s and s0 are the best two choices in the choice set. 11

¯ s0 6= s, we may define the marginal treatment Aggregating over choices s0 = 1, ..., S; effect over all origin states as

MT Es (a, {U s0 ,s }Ss0 =1,s0 6=s ) =

S¯ X

s0 =1 s0 6=s

MT Es0 →s (a, U s0 s ) Pr(Us = Us0 = U ss0 ≥ Uj , j 6= s, s0 )

(6)

This is a weighted average of the pairwise marginal treatment effects from all sources states to s with the weights being the relative proportion of the groups at each relevant margin. The policy relevant treatment effect defines the change in a criterion with respect to a policy intervention in X or Z. For a criterion ψ defined on the components of the observables Y

Y =

S¯ X

Ds Ys

S¯ X

Ds ψ(Ys )

s=1

ψ(Y ) =

s=1

the policy relevant treatment effect is defined as

12

(7)

E(ψ(Y ) | X = x, Z = z) − E(ψ(Y ) | X = x0 , Z = z 0 )

(8)

for components of X and Z that can be affected by policies.7 We now present a framework for estimating the distributions of the treatment effects and the parameters derived from them.

3

Factor Structure Models Identifying the joint distribution of counterfactuals is the most challenging task of this

paper. By now, it is well known that in general it is not possible to identify these counterfactual distributions, even from data from experiments which only determine the marginals of the joint distributions. (Heckman, 1992; Heckman and Smith, 1993; Heckman, Smith and Clements, 1997). Frechet bounds on the joint distributions based on the marginals are very wide in practice. (See Heckman and Smith, 1993; Heckman, Smith and Clements, 1997). The strategy adopted in this paper is to postulate a low dimensional set of factors f with a dimension determined by Theorem 1 below, so that conditional on f , the Ys,a and Us are jointly independent. The distributions of the f are nonparametrically identified under 7

Heckman and Vytlacil (2001) only consider policies based on Z.

13

conditions specified below. With these distributions in hand, it is possible to construct the distribution of counterfactuals. Aakvik, Heckman and Vytlacil develop this idea in a one factor model with two potential outcomes. We extend their analysis to a multifactor model with multiple outcomes. We test the adequacy of any postulated factor model using within-sample fit. Under the conditions specified in this section, it is possible with low dimensional factors to identify the counterfactual distributions and to estimate all of the treatment effects defined in Section 2. For specificity, and to present the model that is estimated to produce the empirical results reported in Section 8, approximate utility of schooling choice as function of Z in a linear in parameters form:

Us = Z 0 β s + es

¯ s = 1, .., S.

(9)

Linear approximations to the value function have been advocated by Heckman (1981a), Eckstein and Wolpin (1989) and developed systematically by Geweke and Keane (2001). Following Heckman (1981a), Cameron and Heckman (1987, 1998), and McFadden (1984), write

es = α0s f + εs 14

(10)

where f is a L × 1 vector of mutually independent factors (f` ⊥⊥f`0 , ` 6= `0 ) and

f⊥ ⊥εs E(f ) = 0;

(11)

E(εs ) = 0.

We further postulate that latent potential outcomes are stochastically dependent only through the factors:

∗ Ys,a = X 0 β s,a + α0s,a f + εs,a

(12)

where E(εs,a ) = 0 and

f ⊥⊥εs,a ;

¯ s = 1, ..., S;

¯ a = 1, ..., A,

(13)

and

⊥εs0 εs,a ⊥

⊥(f, εs0 ,a00 ), Xs,a ⊥

¯ all s0 , s = 1, ..., S;

¯ a = 1, ..., A.

¯ a = 1, ..., A. ¯ for all s, a, s0 , a00 , s = 1, ..., S;

(14)

(15)

When the outcome is continuous, the observed value corresponds to the latent variable 15

∗ ∗ (Ys,a = Ys,a ).When the outcomes are discrete (e.g. employment status), we interpret Ys,a

in (12) as a latent variable. In that case, Ys,a is an indicator function:

∗ Ys,a = 1(Ys,a ≥ 0).

One intuitive motivation for the factor representation is that agents observe f (or components that can be described by f ) and act on it i.e. (choose schooling levels). The econometrician does not observe f . Conditional on f , the potential outcomes are independent. If (11), (12), (13), (14) and (15) characterize the model generating the data, we obtain the standard conditional independence assumptions used in matching (see, e.g. Cochran and Rubin (1973), Rosenbaum and Rubin (1983)). In matching it is assumed that

Ys,a ⊥⊥Ds | X = x, F = f. ⊥Ds | X = x, F = f. From this assumption, The parallel assumption for preferences is Us ⊥ we can identify ATE from the right hand side terms which are observed outcomes:

E(Ys,a − Ys0 ,a | x, f ) = E(Ys,a | x, f, Ds = 1) − E(Ys0 ,a | x, f, Ds0 = 1) and treatment on the treated and ATE and MTE are the same parameter conditional on 16

f and x. (Heckman 2001; Heckman and Vytlacil, 2001). Our framework differs from matching by allowing the factors that generate conditional independence to be unobserved by the analyst. We integrate out the f using a random effects estimator and estimate factor loadings and distributions of the factors. Factor structure models are notorious for being identified by arbitrary normalization and exclusion restrictions. To reduce this arbitrariness, and render greater interpretability to the estimates, we adjoin a measurement system to choice equation (9) and outcome system (12). Having measurements on the factors also facilitates identifiability as we demonstrate below. We consider a system of L measurements on the K factors, initially assumed to be on continuous outcome measures:

M1 = µ1 (x) + β 11 f1 + .... + β 1K fK + ε1 .. . ML = µL (x) + β L1 f1 + .... + β LK fK + εL

(16)

εM = (ε1 , ..., εL ), E(εM ) = 0 and where we assume f = (f1 , .., fK )⊥ ⊥(ε1 , ..., εL ), f ⊥ ⊥εs , f⊥ ⊥εs,a , εs,a ⊥ ⊥εM ⊥⊥εs , εi ⊥⊥εj . For interpretability, we assume fi ⊥ ⊥fj , ∀i 6= j, i, j = 1, ..., K. 17

We consider the case with discrete measurements on latent continuous variables below. This measurement system can be specialized in various ways. One specialization devotes a separate factor to each type of measurement. In our empirical analysis, one system of measurements is on ability test scores and it is natural in the context of ability measurement to associate a latent factor (“g”) to explain these test scores. A second system of measurement is based on noncognitive skills building on work by Heckman and Rubinstein (2001). A third system is based on wage measures over time for different schooling levels. We consider other specializations below after we discuss the general problem of identifiability. Measurement system (16) allows for erroneous measures of true ability, say f1 . Thus we are not committed to the infallibility of test scores as a measurement of ability. The measurement system allows for fallible test scores, and if the variance of the εM can be identified, we can measure the fallibility. 3.1 Choice Equations In other work, we analyze two versions of choice system (4) and (9). The first is an unordered discrete choice model with a factor structure. The second is an ordered choice model as developed by Cameron and Heckman (1998) and extended in Carneiro, Heckman and Vytlacil (2001) and Hansen, Heckman and Mullen (2001). We focus attention on the ordered model in this paper because it is adequate to describe our data. (See Hansen, 18

Heckman and Mullen (2001) for a comparison among alternative models of completed schooling).

3.2. Choice Model

In the ordered discrete choice analysis, we introduce a specific notation for the index of utility I :

I = Zη + εV εV

= γf + εI

σ 2V = γ 0 where E(ε2I ) = σ 2I , and

P

f

X

f

γ + σ 2I

is the covariance matrix of f.

Ds = 1 ⇔ cs−1 < I ≤ cs

s = 2, ..., S¯ − 1

D1 = 1 if -∞ < I ≤ c1 . Ds¯ = 1 if cs¯ < I < ∞. 19

(17)

It is required that cs ≥ cs−1 for all s ≥ 2. We parameterize the cs to be functions of transition-specific regressors

cs = Qs ρs

(18)

where we restrict cs ≥ cs−1 . We could also follow a suggestion in Heckman, LaLonde and Smith (1999) and incorporate one sided shocks ν s and work with thresholds c˜s in place of cs :

c˜s = cs + ν s

s = 1, ..., s¯

(19)

where ν s ≥ ν s−1 and vs ≥ 0. See Hansen, Heckman and Mullen (2001). ¯ = Support (Z), Conditioning on Qs , and assuming support (Z | Qs = qs , s = 1, ..., S) we can apply the conditions presented in Cameron and Heckman, to identify η, c1 , ...., cs¯ up to scale σ V . We can also nonparametrically identify the density of εV and the densities of the transition-specific shocks.8 Thus we can nonparametrically identify cs (Qs ) over the 8

Assuming ν s ⊥ ⊥ν s0 , s 6= s0 , ν s ⊥ ⊥εV , ν s ⊥ ⊥(Z, Q) this model is identified under the assumptions in Cameron and Heckman (1998) even without any exclusion restrictions, so Qs can just include an intercept). The proof is trivial. Normalize ν 1 = 0. From the first choice we compute, Pr(D1 = 1 |Z ) = Pr(Zη + εV ≤ c1 ) so we can identify f (εV ), and η up to scale σ 2V . We suppress the intercept in Z. One cannot distinguish the intercept from c1 . Proceeding to further choices we obtain

20

support of Qs . Unlike the case of the unordered discrete choice (see Elrod and Keane, 1995; Ben Akiva et al., 2001), without further restrictions on the distribution of εV , we cannot identify the factors generating εV using only choice data. 3.3 Rationalizing The Ordered Discrete Choice Model By An Economic Model

The ordered discrete choice model can be rationalized by the following life cycle model of schooling. Let the direct costs of schooling, given Z = z, be denoted by k(s |z ), which is assumed to be weakly convex, increasing in years of schooling, and a function of family

Pr(D1 + D2

= 1 |Z) = Pr(Zη + εV ≤ c2 + ν 2 ) = Pr(εV − ν 2 ≤ c2 − Zη).

Therefore we can identify f (εV − ν 2 ) and c2 up to scale σ εV −ν 2 . The scale is determined by the first µ 2 ¶1/2 σ ε +σ 2ν 2 σ εV −v2 V = by taking the ratio of the normalization (relative to σ εV ). We can estimate σε σ2 V

εV

normalized η from the second choice probability to the normalized ratio of η from the first choice probability for any coordinate of η. Define ψ(X) as the characteristic function of X. From the assumed independence of V and V2 . ψ(εV − ν 2 ) = ψ(εV )ψ(−ν 2 ).

Therefore we can identify ψ(εV − ν 2 ) = ψ(−ν 2 ), ψ(εV ) and we can determine f (−ν 2 ) from the convolution theorem adopting a normalization for σ 2V . Proceeding sequentially, we obtain Pr(D1 + D2 + D3 + ... + Dk = 1 |Z ) = Pr(Zη + εV ≤ ck + ν k ), and can identify ck and f (ν k ) up to the normalization given in the first step. See Hansen, Heckman and Mullen (2001) for further details. Nowhere do we use the assumption that Qs contains regressors.

21

resources and background characteristics as well as any external schooling subsidies. The Z are assumed to remain the same across schooling transitions. Assume k(1 |z ) = 0. The discounted lifetime return to schooling is R(s). Assume that R(s) is concave and increasing in s and that R(0) > 0. As R(s) is the discounted lifetime return to s years of school, R(s) implicitly includes in its definition both postponed and foregone earnings and it also implicitly embeds discount factors. Optimal schooling for an individual is the solution to the maximization problem:

Max{R(j) − k(j |z )}, j

¯ j = 1, ..., S.

(20)

Our assumptions on return and cost functions guarantee that net returns, R(j) − k(j |z ), are concave in j for all j and positive for at least the initial schooling state (since R(0) > 0 and k(0 |z ) = 0). Ignoring ties, the optimal solution for years of schooling will be unique and positive. To account for heterogeneous choices, we introduce an omitted variable into the model that affects the ratio of returns to costs. Let the scalar random variable r represent a person-specific shifter of the return relative to the cost that is observed and acted on by the individual but not necessarily observed by the analyst. We make two assumptions at this point to facilitate estimation. First, we maintain that r is independent of Z. Second 22

we assume that the relative cost function depends on observed and unobserved individual effects in a multiplicatively separable way:

k(j |z ) = k(j)ϕ(z)r,

(21)

where E(r) = 1; r ≥ 0, and ϕ(z) ≥ 0. At the optimal schooling level, net returns must be positive and at least as large as net returns at s − 1 or s + 1:

R(s) − k(s)ϕ(z)r > 0,

(22)

R(s) − k(s)ϕ(z)r ≥ R(s − 1) − k(s − 1)ϕ(z)r, R(s) − k(s)ϕ(z)r ≥ R(s + 1) − k(s + 1)ϕ(z)r.

Thus, the value of r for a person taking s years of schooling satisfies the following inequalities:

R(s) − R(s − 1) R(s + 1) − R(s) ≥r≥ [k(s) − k(s − 1)]ϕ(z) [k(s + 1) − k(s)]ϕ(z) and

[R(s)/k(s)ϕ(z)] ≥ r.

23

(23)

The second inequality ensures that net returns are positive at the optimal level of schooling and will always be satisfied at the optimum since net returns for the initial state are positive by assumption. The first inequality comes from the conditions that guarantee that net returns are maximized at s and has a simple interpretation: R(s) − R(s − 1) is the marginal return of s years of school and [k(s) − k(s − 1)]ϕ(z) is the marginal cost for an individual with endowment z. Hence, r is bounded by the ratio of marginal return to marginal cost. This justifies our interpretation of r as affecting the ratio of returns to costs.9 Assuming that r is continuously distributed, the fraction of the population with observable characteristics z who choose s years of schooling is

µ

¶ R(s + 1) − R(s) R(s) − R(s − 1) Pr(S = s |Z = z) = Pr ≤r≤ . [k(s + 1) − k(s)]ϕ(z) [k(s) − k(s − 1)]ϕ(z)

(24)

Introduce the function `(s) to simplify the notation:

exp(`(s)) =

R(s + 1) − R(s) , [k(s + 1) − k(s)]

¯ s = 1, ..., S,

so that 9

The weak convexity imposed in defining this model is not an intrinsic feature of the schooling choice problem and is relaxed in Heckman, Lochner and Todd (1996).

24

¶ exp(`(s − 1)) exp(`(s)) ≤r< . Pr(S = s |Z = z) = Pr ϕ(z) ϕ(z) µ

Writing εV = − log(r), ϕ(z) = exp(−zη) and −`(s) = cs = Qs ρs we obtain representation (17) and (18). Adding one-sided shocks ν s as in (19) preserves weak convexity and provides us with an additional margin of flexibility in estimating the model, allowing for spell-specific innovations in information sets. We next turn to an analysis of identification of the model.

4

Identification of Semi-parametric Factor Models In this paper, we develop nonparametric identification of the ordered models with an

adjoined measurement system. Parametric identification analysis is straightforward, but tedious, and we delete it for the sake of brevity. Under standard conditions (see, e.g. Heckman and Honoré, 1990); Heckman, 1990), we can identify subsets of the population (for specified X, Z, Qs−1 values) such that Pr(Ds = 1 | X, Z, Qs , Qs−1 ) = 1. For such subsets, there is no selection problem, and we obtain measurement system (16) and associated wage and employment equations, which are part of the overall measurement system. Formally these subsets are obtaining by varying Qs and Qs−1 to obtain the required unit probabilities. We assume this is possible by taking

25

appropriate limits of Qs , Qs−1 for all X. We develop a parallel analysis for the case where it is not possible, and analysts possess information about distributions of unobservables. Assume

¯ Z = z) = Support (X) Support (X | Qs = qs , s = 1, ..., S,

(A-1)

so that we can obtain estimates of (16) over its full support. Our measurement system can be specialized in the following way. For each schooling ¯ behavioral indicators and a vector of group s, we obtain a vector of T¯ test scores, B A¯ wages for each schooling group. The first T¯ components of (16) correspond to the ¯ components correspond to behavioral measures. The final ability measures. The next B SA measures correspond to potential wages for the S¯ schooling groups. For simplicity, we ignore the employment equations in our exposition although in the estimation we include E¯ potential employment equations for each schooling group. The means of the measurement system µ1 (x) are identified over the support of X by virtue of the assumption that X⊥⊥f. Following Anderson and Rubin (1956), we adopt a triangular structure of normalizations.

26

(ability)

β 1,1 = 1; β i,1 unrestricted, i = 2, .., T¯, β i,j = 0 j ≥ 2, ..., K, i = 1, ..., T¯

β i,2 = 0 β i,3 = 0 (behavior)

β T¯+j,1 unrestricted

¯ j = 1, ..., B,

β T¯+1,2 = 1 β T +j,2

(wage s = 1)

unrestricted,

j = 2, ..., B

β T +j,3 = 0,

¯ j = 1, ..., B

β T¯+B+j,` unrestricted, ¯

¯ ` = 1, 2 j = 1, ..., A,

β T¯+B+1,3 =1 ¯

¯ j = 2, ..., A.

β T¯+B+j,3 unrestricted ¯

(wage s = k + 1, k > 0) β T +B+kA+j,` unrestricted

¯ ` = 1, 2, 3 j = 1, ..., A, k = 1, .., S¯ − 1

This system imposes the restriction that f1 and f2 affect measured ability and behaviour the same way at all education levels. It is a strong restriction that states that the mapping from the factors to the measurements (ability, behavior) does not depend on the level of schooling, schooling has no effect on ability. We also estimate models with a more general framework with the coefficients different for each schooling subsystem. This framework 27

allows schooling to affect the mapping from true ability to measured ability. We control for the endogeneity of schooling by jointly estimating the measurement equation and the schooling choice system. (See Heckman, Hansen and Mullen, 2001 and Winship, 2001 for evidence on this question). Without any loss of generality, we can normalize the wage system on any schooling level. ¯ + A¯ measurements. Our first model Attached with each schooling level is a system of T¯ + B forces the measurement system on test scores and behavior to have the same factor loading at all schooling levels. Our second model allows these coefficients to be different across schooling categories, and so allows for schooling to affect test scores as well as allowing latent ability to affect both test scores and schooling. In both systems we control for the endogeneity of schooling on wages. In the second, more general system we impose the following the normalizations super¯ scripting the coefficients by s, s = 1, ..., S.

28

(ability)

β s1,1 = 1; β i,1 unrestricted i = 1, .., T¯ j ≥ 2, ..., K, i = 1, ..., T¯

β si,j = 0

(behavior) β sT +1,2 unrestricted

¯ j = 1, ..., B

β sT +j,2 , = 1;

(wage s)

β sT¯+j,2 unrestricted

¯ j = 2, .., B

β sT +j,3 = 0,

¯ j = 1, ..., B

β sT¯+B+j,` , unrestricted, ¯

¯ ` = 1, 2 j = 1, ..., A,

β sT¯+B+1,3 =1 ¯

¯ for some j = 2, ..., A,

for some s, s = 1, ..., S¯ β sT¯+B+j,3 unrestricted ¯ other s unrestricted. This system allows the effect of the latent factors, f1 , f2 , f3 on measured ability and behaviour to be moderated by schooling. A theorem of Bekker and ten Berge (1997) informs us that the variances of the uniquenesses are identified under the following conditions: ¯ are generically identified if the Ledermann Theorem 1 Var(εi ), i = 1, .., S¯A¯ + T¯ + B, (1937) bound is satisfied: 29

K
1, s < S)

Πs−1 (Q, Z) = FV (cs−1 + Zη) < UD ≤ FV (cs + Zη) = Πs (Q, Z) where UD ∼ U (0, 1) and Πs = Pr(S ≤ s |Q, Z), and

S = 1 if Π1 (Q, Z) < UD S = S¯ if ΠS¯ (Q, Z) ≥ UD .

Writing Ya the outcome at age a observed for a person as

Ya =

S¯ X

Ds Ys,a ,

s=1

we obtain 40

E(Ds Ya |Πs−1 (Q, Z), Πs (Q, Z)) = E(Ya,s |S = s, Πs−1 (Q, Z), Πs (Q, Z))(Πs (Q, Z) − Πs−1 (Q, Z)) ΠsZ(Q,Z) E(Ys,a |UD = u)du . = Πs−1 (Q,Z)

Now it follows that

∂E(Ds Ya | Πs−1 (Q, Z), Πs (Q, Z)) = −E(Ys,a | UD = Πs−1 (Q, Z)) ∂Πs−1

∂E(Ds Ya |Πs−1 (Q, Z), Πs (Q, Z)) = E(Ys,a |UD = Πs (Q, Z)) ∂Πs

∂E(Ds−1 Ya | Πs−2 (Q, Z), Πs−1 (Q, Z)) = E(Ys−1,a |UD = Πs−1 (Q, Z)) . ∂Πs−1 Thus

41

E(Ys,a − Ys−1,a |UD = Πs−1 (Q, Z)) ∂E(Ds Ya | Πs−1 (Q, Z), Πs (Q, Z)) ∂Πs−1 ∂E(Ds−1 Ya | Πs−2 (Q, Z), Πs−1 (Q, Z)) ∂Πs−1

= -

(31) (32)

= MT E

Extensions to account for the transition specific ν S are straightforward. We next consider how to estimate this model using Markov Chain Monte Carlo methods.

6

Markov Chain Monte Carlo Simulation Methods Due to the complex nature of the likelihood function we will rely on Markov Chain Monte

Carlo techniques to estimate the model. These are computer-intensive algorithms based on designing an ergodic discrete time continuous state Markov chain with a transition kernel having invariant measure equal to the posterior distribution of the parameter vector θ. See Robert and Casella (1999). In particular, we will be using the Gibbs sampling algorithm.10 There are several advantages of using a Gibbs algorithm to estimate the present model. 10

For other uses of Markov chain Monte Carlo techniques in models and applications related to ours, see Chib and Hamilton (2000) who implements MCMC methods for a panel version of a generalized Roy model and Chib and Hamilton (1999) who considers various cross sectional treatment models.

42

First of all, since this algorithm is based on cycling through each of the conditional posterior distributions derived from the posterior of the full parameter vector, we can achieve a substantial computational simplification by enlarging the state variables θ by the vector of factors f . To see this observe that conditioning on f effectively decouples the equation system into a system of independent equations. Second of all, use of the Gibbs algorithm allows us to exploit the latent data structure of the model by including the latent indices underlying schooling choice and labor market transitions in the state variables. This removes the need for computing the high dimensional integrals required to evaluate the schooling choice probabilities and the many evaluations of the normal integral required to compute the probabilities of the different labor market transitions. Third of all, the Gibbs sampler (and Markov Chain Monte Carlo algorithms in general) enjoy an advantage over other simulation methods when using models involving mixtures. We will be using mixtures of normal distributions as a distributional assumption on the factors and on the wage distributions in the model. While mixtures of normals are able to approximate a large set of different distributions they are also notoriously difficult to estimate due to the potentially large number of local maxima of the likelihood function. Using the Gibbs sampler coupled with a data-augmentation technique we can circumvent these problems, see below for details. (a) The ordered choice model

43

We first describe the Markov Chain Monte Carlo algorithm used to simulate the posterior distribution of the parameters in the ordered choice model. Let θs,a be parameters specific to the distribution of outcomes with schooling level s at age a, let θm be parameters specific to the distribution of measurements, let θc be parameters specific to the distribution of schooling choice and let θf be parameters specific to the factor distributions. Let n be the number of observations. The outcome matrix over all ages with schooling level s is Ys∗ and the vector of measurements is Ym . The complete data likelihood for completed schooling level S = s is n ¡ ¢ Y ¡ ¢ ∗ ∗ p Ym , Ys , I, f |θ = p Ym,i , Ys,i , Ii , fi |θ i=1

where “i” denotes a subscript for individual i and

∗ p(Ym,i , Ys,i , Ii , fi |θ)

¯ A Y

∗ p(Ys,a,i |θs,a , fi )p(Ii |fi , θc )p(fi |θ).

S¯ Y

p(Ym , Ys∗ , I, f |θ)p(θ)

= p(Ym,i |θm , fi ) ×

a=1

The complete data posterior is



p(Ym , Y , I, f, θ|data) ∝

s=1

44

where Y ∗ = (Y1 , ..., YS¯ ). What follows the conditional posteriors that constitute the transition kernel of the Gibbs sampler will be derived. Many of the conditional posteriors in the following will be posteriors originating from a regression model of the form,

y = x0 β + α0 f + ε,

where f is a vector of factors and ε is normal with mean zero and precision τ . If we specify a prior of the form ³ ´ p(β, α) = N 0, Λβα , where the precision matrix is 



0  τ β IK , Λβα =    0 τ α IL then well known results (see Zellner, 1971) give

p(β, α|τ , y, x) = N(µα,β , Σαβ ),

45

where the precision matrix is Σαβ = Λαβ + τ Z 0 Z, where Z = (X f ) and 0 µαβ = Σ−1 αβ τ Z Y.

Similarly with a gamma prior on τ , G(τ |a, b), standard results give

p(τ |α, β, y, x) = G(τ |a + n/2, b + (1/2)

X (yi − x0i β − α0 fi )2 .

Hence we can sample from the conditional posteriors by first sampling β, α|τ from a multivariate normal distribution and then τ |β, α from a gamma distribution. By letting the prior precision τ β approach zero we can express ignorance about the slope parameters. Throughout we keep a proper but diffuse prior on α, i.e. τ α positive but small to ensure a proper posterior. More specific values of prior parameters are given in the presentation of the results.

Choice equations

Conditional on the factors we have

46

¯ p(η, γ, ρ ¯θ−(η,γ,ρ) , f ) ∝

( n Y

p(Ii |Zi η + γ 0 fi , 1)

( i=1 s¯ X j=1

)

)

1(ci,j−1 < Ii < ci,j )Di,j }1(ci1 < · · · < ci¯s )p(η, γ)p(ρ)

(33)

This marginal can be factored into two conditionals. Conditional on ρ we

p(η, γ|ρ, θ−(η,γ,ρ) , f ) ∝

n Y i=1

p(Ii |Zi η + γ 0 fi , 1)p(η, γ).

This is the posterior for a normal regression model with covariates Zi , fi and precision fixed at one. From above it follows that this is a multivariate normal distribution. The second conditional (for ρ) is

p(ρ|η, γ, θ−(η,γ,ρ) , f ) ∝

s¯ n X Y i=1 j=1

1(ci,j−1 < Ii < ci,j )Di,j 1(ci1 < · · · < ci¯s )p(ρ)

We sample ρs one at a time conditional on the ρ1 , . . . , ρs−1 , ρs+1 , . . . , ρs¯. The conditional for ρs is

47

¯ p(ps ¯ρ−s , η, γ, θ−(η,γ,ρ) , f ) ∝ ×

Y

Y

1(ci,s−1 < Ii < Qis ρs )

i:si =s

1(Qi,s ρs < Ii < cis+1 )

i:si =s+1

As a prior for ρ we choose p(ρ) = distribution with very large support.

n Y

1(cis−1 < Qis ρs < cis+1 )p(ρ)

(34)

i=1

Q

s

U(−K, K) where K = 1000, i.e., a uniform

Let Ks be the number of elements in Qs . We sample ρhs , h = 1, . . . , Ks one at a time. The conditional for ρhs is a uniform distribution with boundary points which can be derived from a series of inequalities. Without loss we can assume that Qihs is positive. From (32) it follows that

ρhs

n o Ii − c˜is cis−1 − c˜is > max maxi:si =s , maxi , −K ≡ ghs Qihs Qihs

n Ii − c˜is cis+1 − c˜is o ρhs < min mini:si =s+1 , mini , K ≡ g¯hs , Qihs Qihs where c˜is =

PKs

j=1,j6=h

Qijs ρjs . Hence ρhs is uniform with boundaries (g hs , g¯hs ).

For some specifications the resulting Markov chain shows very slow mixing of the ρ parameter. This is a well known problem in ordered choice models, see Albert and Chib (2001), 48

and Liu (2001). To keep run-times reasonable we add a generalized Gibbs step.11 This has the form of a group transformation from the parameters (I, η, γ, ρ) to (hI, hη, hγ, hρ), where h > 0. The group transformation scalar parameter h is generated as h =

√ r where

³ n + K + K + L Pn (I − Z η − γ 0 f )2 + τ η 0 η + τ γ 0 γ ´ i i i η γ c , i=1 , r∼G 2 2 where K is the dimension of η and KC is the dimension of ρ. Conditional on the factors we have

p(I|θ−(η,γ,ρ) , f ) ∝

n nY i=1

s¯ ©X ¡ ¢ ª p(Ii |Zi η + γ fi , 1) 1 ci,j−1 < Ii < ci,j Di,j . 0

j=1

This factors into n independent truncated normals,

p(I|θ−(η,γ,ρ) , f ) =

n Y i=1

TN(ci,j−1 ,cij ) (Ii |Zi η + γ 0 fi , 1).

So we sample Ii , i = 1, . . . , N, one at a time from truncated normals. Measurement equations

Let Ym,i be the whole vector of measurements for individual i. By the factor structure 11

See Liu and Sabatti (1998) for the general theory of generalized Gibbs sampling and an example for the simple ordered choice model with constant cut-offs.

49

assumption these measurements are conditionally independent given the factors. Let the first m1 elements of Ym,i be continuously observed measurements (e.g., achievement test scores), and let θm,j , j = 1, . . . , m1 be the parameters of these elements. Then n Y ¡ p(θm,j |θ−θm,j , f ) ∝ p(θm,j ) p Ym,i,j |θm,j , fi ),

j = 1, . . . , m1 .

i=1

0 where p(Ym,i,j |θm,j , fi ) is the density for a normal distribution with mean Xm,i,j β m,j +α0m,j fi

and precision τ m,j . With a gamma prior on τ m,j and a multivariate normal prior on (β, α)m,j this is sampled as described above. Let the last m−m1 elements of the measurement vector Ym,i be binary indices generated as ∗ Ym,j = 1(Ym,j ≥ 0),

j = m1 + 1, . . . , m.

The parameters in the binary measurements are samples as above with two exceptions. ∗ , as First, a separate step samples the latent measurements, Ym,j

∗ Ym,i,j ∼

  ¡   TN(0,∞) Y ∗

0 m,i,j |Xm,i,j β m,j

¢ + α0m,j fi , 1

if Ym,i,j = 1,

  ¢ ¡ ∗  0 TN(−∞,0) Ym,i,j |Xm,i,j β m,j + α0m,j fi , 1 if Ym,i,j = 0.

Second, the precision τ m,j is not sampled but fixed at one.

50

Outcome equations

Let Ys,a be the outcome vector at age a with schooling level s. In the application Ys,a consists of an employment outcome and a wage if the individual is working. Let Y1,s,a be the ∗ wage outcome and Y2,s,a the employment outcome. Also let Y2,s,a be the latent employment

index. By the factor structure assumption we have

∗ ∗ p(Y1,s,a , Y2,s,a |f ) = p(Y1,s,a |f )p(Y2,s,a |f ),

for a person working. The model for wages is

1 0 Y1,s,a,i = X1,a,i β 1,s,a + α01,s,a fi + √ ε1,a,s,i , τ 1,a,s

i ∈ ρs,w,a ,

where ρs,w,a is the set of indices for people with schooling s working at age a, and

ε1,a,s,i ∼

Ks X j=1

¡ ¢ pj,s φ ε1,a,s,i |µj,s , τ j,s

We shall use the “mixture group-indicator” approach (Diebolt and Robert (1994)) to facilitate computation using mixtures. Let g1,a,s,i = j if ε1,a,s,i originates from mixture

51

component j. Conditional on the mixture group indicators g we have

1 1 0 Y1,s,a,i = X1,a,i β 1,s,a + α01,s,a fi + √ µg1,a,s,i ,s + √ ε∗ , τ 1,a,s τ 1,a,s τ g1,a,s,i ,s 1,a,s,i where ε∗1,a,s,i ∼ N(0, 1). So conditional on g this is a normal regression model with a heteroscedastic covariance matrix. We can transform this to

1 1 √ √ √ 0 τ g1,a,s,i ,s (Y1,s,a,i − √ µg1,a,s,i ,s ) = τ g1,a,s,i ,s X1,a,i β 1,s,a + τ g1,a,s,i ,s α01,s,a fi + √ ε∗ , τ 1,a τ 1,a,s 1,a,s,i

or 1 0 ˜ 1,a,i β 1,s,a + α01,s,a f˜i + √ ε∗ , Y˜1,s,a,i = X τ 1,a,s 1,a,s,i where

1 √ µ ), Y˜1,s,a,i = τ g1,a,s,i ,s (Y1,s,a,i − √ τ 1,a g1,a,s,i ,s ˜ 1,a,i = √τ g1,a,s,i ,s X1,a,i , X √ f˜i = τ g1,a,s,i ,s fi

This is now in the form of the usual normal regression model which is sampled as above. To preserve identification we fix τ 1,a,s to one at the first age a = 1.

52

We can allow for general state dependence by modelling the latent employment transition indices as

∗ = Y2,s,a,i

    X 0

2,a,s,i β 2,a,s,0

+ α02,a,s,0 fi + ε2,a,s,i,0 , if Y2,s,a−1,i = 0,

   0 X2,a,s,i β 2,a,s,1 + α02,a,s,1 fi + ε2,a,s,i,1 , if Y2,s,a−1,i = 1,

where ε2,a,s,i,0 and ε2,a,s,i,1 are both standard normal. The conditional of (β 2,a,s,0 , α2,a,s,0 ) and (β 2,a,s,1 , α2,a,s,1 ) is ¡ p β 2,a,s,0 , α2,a,s,0 |θ−β 2,a,s,0 ,α2,a,s,0 ) ∝ ¡ p β 2,a,s,1 , α2,a,s,1 |θ−β 2,a,s,1 ,α2,a,s,1 ) ∝

Y

¡ ∗ ¢ 0 p Y2,s,a,i |X2,a,s,i β 2,a,s,0 + α02,a,s,0 fi , 1

Y

¡ ∗ ¢ 0 p Y2,s,a,i |X2,a,s,i β 2,a,s,1 + α02,a,s,1 fi , 1

i:Y2,s,a−1,i =0

i:Y2,s,a−1,i =1

Both of these are normal regression models with the precision fixed at one. The latent employment indices are sampled as in the usual binary choice framework (see Albert and Chib (1993)). To sample the mixture parameters for wages at schooling level s note that conditional on β 1,s,a , α1,s,a , τ 1,a , f ,

ε1,a,s,i =

√ 0 τ 1,a (Y1,s,a,i − X1,a,i β 1,s,a − α01,s,a fi ),

53

i ∈ ρs,w,a , a = 1, . . . , A,

is known. So we have

ε1,a,s,i ∼

Ks X j=1

¡ ¢ pj,s φ ε1,a,s,i |µj,s , τ j,s ,

i ∈ ρs,w,a , a = 1, . . . , A,

with ε1,a,s,i treated as “known”. As mentioned above we follow the approach in Diebolt and Robert (1994) and augment the parameter vector by a sequence of latent group indicators defined as g1,a,s,i = j if ε1,a,s,i originates from mixture component j. Conditional on the mixture group indicators the mixture parameters are easily sampled and conditional on the mixture parameters the group indicators are simple multinomials. To preserve identification of intercepts we constrain the mixture to satisfy in Richardson et al., (2000).

P

j

pj µj,s = 0 using the method proposed

Factors

The conditional for f factors into n conditionals for f1 , . . . , fn . To see what the condi-

54

tional for fi is note that all contributions of fi originate from linear regression models,

Ii − Zi η = γ 0 fi + εD,i , 0 β m,j = α0m,j fi + εm,j , Ym,j − Xm,i,j

1 1 0 β 1,s,a − √ µg1,a,s,i ,s = α01,s,a fi + √ ε∗ , Y1,s,a,i − X1,a,i τ 1,a τ 1,a τ g1,a,s,i ,s 1,a,s,i ∗ 0 Y2,s,a,i − X2,a,s,i β 2,a,s,l = α02,a,s,l fi + ε2,a,s,i,l ,

(choice model) (measurements) (wages) (employment)

This equation system is of the form

Yˆi = Ai fi + ui ,

where ui ∼ N(0, Σi ), where Σi is a diagonal precision matrix. The conditional posterior for fi is then p(fi |θ) ∝ exp

n

o 1 − (Yˆi − Ai fi )0 Σi (Yˆi − Ai fi ) p(fi ), 2

where p(fi ) =

K` L X Y `=1 j=1

¡ ¢ p`,j φ fi` |µ`,j , τ `,j .

Sampling fi jointly in one block is not feasible. Instead we sample fi` |{fij }j6=` one at a

55

time. To do this first note that we can write

p(fi |θ) ∝ exp

n

o 1 − (fi − fˆi )0 A0i Σi Ai (fi − fˆi ) p(fi ), 2

where fˆi = (A0i Σi Ai )−1 A0i Σi Yˆi . The conditional of fi` is then K` X ¡ ¡ ¢ p(fi` |{fij }j6=` , θ) ∝ φ fi` |ˆ µi` , vˆi` ) p`,j φ fi` |µ`,j , τ `,j , j=1

where

−1 µ ˆ i` = fˆi` − V``,i V`,i (fi,−` − fˆi,−` ),

vˆi` = V`,`,i ,

where V`,`,i is element (`, `) of A0i Σi Ai and V`,i is A0i Σi Ai with row ` and column ` removed. The conditional for fi` is then seen to be a mixture of normals,

p(fi` |{fij }j6=` , θ) =

K` X j=1

¢ ¡ pˆi,`,j φ fi` |ˆ µi,`,j , τˆi,`,j ,

56

where

¡ ¢ pˆi,`,j = p`,j φ µ ˆ i` |µ`,j , τ `,j ,

µ ˆ i,`,j =

vˆi` µ ˆ i` + τ `,j µ`,j , vˆi` + τ `,j

τˆi,`,j = vˆi` + τ `,j .

Conditional on the factor vector f , we have

fi` ∼

K` X j=1

¡ ¢ p`,j φ fi` |µ`,j , τ `,j ,

i = 1, . . . , n.

Conditional on f we can treat the factors as known and update the mixture parameters (p` , µ` , τ ` ) using the same procedure as outlined above. We now turn to the data and estimates of the model.

7

The Data This paper uses data from the National Longitudinal Survey of Youth (NLSY). The

NLSY, designed to represent the entire population of American youth, consists of a randomly chosen sample of 6,111 U.S. civilian youths, a supplemental sample of 5,295 randomly chosen minority and economically disadvantaged civilian youths, and a sample of 57

1,280 youths on active duty in the military. All youths were between fourteen and twentytwo years of age when the first of annual interviews was conducted in 1979. The data set includes equal numbers of males and females. 16% of the respondents are Hispanic and 25% are Black. In 1980, NLSY respondents were administered a battery of ten achievement tests referred to as the Armed Forces Vocational Aptitude Battery (ASVAB) (See Cawley, Conneely, Heckman and Vytlacil (1997) for a complete description). The math and verbal components of the ASVAB can be aggregated into the Armed Forces Qualification Test (AFQT) scores12 . Many studies have used the overall AFQT score as a control variable, arguing that this is a measure of scholastic ability. We argue that AFQT is an imperfect proxy for scholastic ability and use the factor structure to capture this. We also avoid a potential aggregation bias by using each of the components of the ASVAB as a separate measure. The NLSY also contain information about involvement in crime, violence and other pathological behavior. This information consists of self reported yes/no answers to various questions regarding criminal activity and abnormal/violent behavior in school. We use answers to these questions as indicators of non-cognitive ability. For our analysis, we use the random sample of the NLSY and restrict the sample to 12

Implemented in 1950, the AFQT score is used by the army to screen draftees.

58

1589 white males for whom we have information on schooling, several parental background variables, test scores and behavior. Distance to nearest college at each date is constructed in the following way. Take the county of residence of each individual and all other counties within the same state. The distance between two counties is defined as the distance between the center of each county. If there exists a college (2 year or 4 year) in the county of residence where a person lives then the distance to the nearest college (2 year or 4 year) variable takes the value of zero. Otherwise we compute distance (in miles) to the nearest county with a college. Then we construct distance to nearest college at 17 by using the county of residence at 17. However for people who were older than 17 in 1979 we use the county of residence in 1979 for the construction of this variable. Tuition at age 17 is average tuition in colleges in county of residence at 17. If there is no college in the county then average tuition in the state is taken instead. For details on the construction of this variable see Cameron and Heckman (2001). Local labor market variables for the county of residence are computed using information in the 5% sample of the 1980 census. For each county group in the census we compute the local unemployment rate and average wage for high school dropouts, high school graduates, individuals with some college and four year college graduates. We do not have this variable for years other than 1980 so, for each county, we assume that it is a good proxy for local labor market conditions in all the other years where NLSY respondents are assumed to be

59

making the schooling decisions we consider in this paper. The means of the variables by schooling category are displayed in Table 1. Variables are ordered in the expected way. The data show that persons from better backgrounds are more likely to attend school. The relationships between schooling attainment and distance and tuition - two major instrumental variables used in the literature - are as expected. Higher tuition implies lower schooling attainment.

8

Empirical Results Using this data, we estimate a variety of models of educational attainment, ability,

behavior and labor market outcomes. We estimate both the ordered choice model and the more general discrete choice/unordered model. Since the additional flexibility of the unordered model is not required to fit the data on schooling or wages, in this paper we only present evidence from the ordered discrete choice model. (See Hansen, Heckman and Mullen, 2001, for evidence on the unordered choice model). Three factors are adequate to fit the model on test scores, behavioral indicators, choices and seven years of wages measured from age 26 to age 33. We consider four schooling groups: High school dropouts, High school graduates, High school graduates with some college attendance and College graduates. Covariates in the

60

choice index, Zη, include family background variables, geographical region and cohort dummies. The variables in the thresholds, Qs ρs , are (apart from a constant), local labor market measures (local wage and unemployment rate for the relevant schooling category) and tuition and distance to local colleges. Two year tuition is used in the two year transition-specific covariates. Distance to four year colleges is used in the four-year transition covariates. As a measurement system for cognitive ability we use seven components of the ASVAB test battery (Arithmetic reasoning, word knowledge, paragraph composition, math knowledge, general science, numerical operations and coding speed.). We dedicate the first factor to the ability measurement system and exclude factor two and three. (Recall the normalizations presented in Section 4). We include family background variables as additional covariates in the ASVAB test equations. The second measurement system consists of 16 binary outcomes corresponding to yes/no answers to various questions on crime and pathological behavior. The ability factor used in the ASVAB system enters this system as well as a second “behavior” factor. Both of these factors enter the choice model. This specification allows for the possibility that individuals correctly perceive their own cognitive ability and level of crime/violent non-social behavior, and the latent factors are these perceptions.

61

We model log wages as

Ya,s = X 0 β a,s + α0a,s f + σ a,s εa,s ,

(35)

where Ya,s is log-wages at age a with schooling level s, X is a vector of covariates and f is the factor vector with three elements. The first two elements are the ASVAB factor and the behavior factor. The third factor appears only in the wage system and captures any residual serial correlation in wages not captured by the first two factors. This framework allows the latent factors to be priced differently in different sectors. As described in Section 6, we assume the error term εa,s has a mixture of normals distribution, εa,s ∼

J X j=1

¡ ¢ ps,j φ εa,s |µjs , τ js ,

¯ a = 1, . . . , A.

(36)

We estimate a separate wage mixture for each schooling category. Since the employment percentage is higher than 90 percent for the categories high school, some college and college graduates, we estimate employment transitions only for ∗ ≥ 0), where dropouts. Employment is determined by 1(Ye,a,s

∗ Ye,a,s = X 0 β e,a,s + α0e,a,s f + εe,a,s ,

62

(37)

where s = 1 corresponds to dropouts, and εe,a,s ∼ N(0, 1). Finally, we assume that each of the three factors has a mixture of normals distribution,

f` ∼

J` X j=1

¡ ¢ p`,j φ f` |µj` , τ j` ,

` = 1, . . . , L.

(38)

We run the MCMC algorithm described in section 6 for 110,000 iterations, discarding the first 10,000 iterations to allow the chain to converge to its stationary distribution. We retain every 10th of the remaining 100,000 iterations for a total of 10,000 iterations13 . The Markov chain mixes well with most autocorrelations dying out at around lag 25 to 50. The coefficient estimates for the components of the model are presented in an Appendix, available on request. The model fits the within-sample data well. Tables 2a - 2c present goodness of fit tests. Table 2a shows a close fit to predicted wages using an overall goodness of fit tests. Table 2b reveals that the fit of the generalized ordered choice model to data is quite close. Table 2c shows that the predicted schooling probabilities for the ordered choice model and the more general unordered model fit the sample probabilities equally well. Appealing to Occam’s razor, we use the simpler choice model. Figures 1a - 1d show the fit of the wages for each schooling group. The agreement is close as confirmed in formal χ2 goodness of fit tests reported in Table 2. In order to achieve this fit it is necessary to allow 13

The run time was about eight hours on a 800mhz Pentium III PC.

63

for non-normal factors and non-normal uniquenesses. Figure 2a shows the distribution of each of the estimated three factors and compares them with a benchmark normal with the same mean and standard deviation. Although they look normal, they are not normal. A comparison of the ability factor and residualized AFQT is also given. The agreement is close. The distributions of the factors by schooling level are presented in Figures 2b - 2d. There is clear evidence of selection on ability (factor 1) with the less able less likely to attend school. There is less evidence of selection on the other factors. Dropouts are more likely to have higher levels of factor two associated with greater behavioral problems. There is no selection on factor three. In this specification, this factor does not enter schooling choice equations.14 (There is a zero coefficient on factor three in the ordered choice equation). There are two interpretations of this phenomenon: (a) agents do not know factor three or (b) it cancels out in decisions, but is known. The latter could happen in an income maximizing model in which factor three enters additively, and with the same coefficient, in each choice, and cancels out in making comparisons.15 Table 3 presents actual vs. predicted serial correlations in wage residuals across ages for different schooling levels. There is less persistence in wages across ages in the lower schooling levels. One of the benefits of higher schooling is greater predictability of wages. We fit the serial correlations moderately well and cannot reject overall equality although 14 15

In other specifications this factor enters in the choice equation. Thus factor 3 could be a general human capital factor.

64

there is a systematic pattern of underprediction of empirical serial correlations by model serial correlations. Figures 3a - 3d plot estimated counterfactual wage densities. For each schooling group, the predicted density of wages at the indicated schooling level are plotted for persons with the characteristics of each of the four major schooling groups we consider. Thus in Figures 3a, we plot the densities of dropout wages for persons who select into the four schooling groups. The plot for dropouts is based on actual data. The plots for other groups are counterfactuals. These plots reveal that the characteristics of those who drop out make them the worst potential dropouts. College graduates make (slightly) better dropouts. Figures 3b and 3c reveal no evidence of any systematic sorting across schooling groups for persons into high school and the “some college” category. Figure 3d reveals that college graduates are better than dropouts and high school graduates as college graduates. This evidence supports the story of comparative advantage in the labor market advocated by Willis and Rosen (1979). The other figures do not. We return to this point later. Figures 4a-4d reveal that there is little systematic sorting on the distribution of returns to high school (vs. dropout). The only systematic sorting is on the returns to college (see Figures 4c and 4d). The central feature of these figures is the low predictability of returns. Ex post, a substantial portion of people get negative returns to schooling. Table 4 presents counterfactual mean wages and mean returns across all schooling levels and ages.

65

It supports the conclusions based on Figures 3 and 4 of little sorting on levels or returns except for college graduates and high school dropouts. Figures 5a - 5d plot the estimated marginal treatment effects at age 29 for the ordered model that is defined in Section 5. The dropout - high school slope is slightly perverse those who drop out are more likely to benefit from high school (higher UD is associated with higher likelihood of graduating high school). However, one cannot reject the hypothesis that the MTE is flat. Thus the marginal gains are the average gains for this category. Figure 6a shows the MTE for all ages in our sample. For most age groups, the MTE is flat for this transition. A perverse MTE shows up more strongly for the high school - some college transition. This is found across all ages (see Figure 6b). However, the MTE is increasing for the transitions from some college - college and from high school to college across all ages (see Figures 5c, 6c and 5d, 6d). Those more likely to graduate college are more likely to benefit from it. Marginal entrants are below average participants in terms of their wage returns. Table 5 presents evidence on the predictability of wage levels and returns by age, and the sources of predictability. Assuming that agents know the factors, they can forecast wage levels (the top panel) fairly well. The higher the schooling level, the higher the predictability of the wage variance. Even though levels can be predicted well, returns cannot. (See the bottom panels). Predictability of returns increases (slightly) at higher schooling levels.

66

There is a lot of intrinsic uncertainty about returns. This is a central feature of American labor market data. Thus it is not surprising that a substantial fraction of persons get negative returns to schooling at different schooling levels. (See Figures 4a - 4d). Figures 7 report simulations showing how knowledge of the factors affects predictability of wage levels. Knowledge of the first two factors does little to improve predictability of wages. Knowledge of the third factor that does not appear in the choice system greatly improves predictability. This pattern is found across schooling groups. A parallel, but much weaker effect, is found in the effects of the factors on forecasting wage returns. See Figures 8a 8d. Figure 9 reports the benefit of knowing the first factor vs. knowing the AFQT scores in predicting wages. The differences are trivial. The measurement error in the test has trivial consequences largely because factor one has trivial consequences for predictability. The inability to predict returns to schooling has substantial policy implications. Obviously, there are gains to find better predictors for educational outcomes. At the same time, the unpredictability of returns appears to be a central feature of the labor market and suggests that ex post, choices are far from perfect. Table 6 presents evidence on the Mincer model. Consistent with the preceding tables and figures, we find little evidence of selection into schooling based on returns to schooling except for college graduates and high school dropouts. Linearity in terms of years of schooling in terms of potential outcomes is consistent with the evidence.

67

The first five columns of Table 6a present the mean wages of persons who attend different levels of schooling if they are assigned to the schooling levels reported in the left hand column. The overall dropout mean in the first row is higher than the mean of actual dropouts, so dropouts are below average in the distribution of potential dropout outcomes. Dropouts would earn substantially different returns in college than do actual college graduates. (See the bottom row). For other educational levels, there is little evidence of the systematic sorting emphasized by Willis and Rosen (1979). The final columns report the counterfactual average returns to different schooling levels for people on the margin of the choices indicated in the columns at the top. These reveal what persons on the margin would earn in different counterfactual states. There is some indication that marginal college attendees would do slightly better as dropouts than marginal entrants to other schooling categories. The marginal college entrant is below the average college entrant - consistent with the pattern of the MTE shown in Figure 5d. Table 6b present similar information on rates of return and again reinforces the impression of little systematic sorting on returns except for college graduation. Tables 6c and 6d report statistical tests of these propositions and confirm the impression from Tables 6a and 6b. Table 6e compares how well least squares estimates of a dummy variable model approximate the returns to the indicated schooling choices. OLS dummy variable models fit ATE very closely and one cannot reject equality of OLS and ATE estimates. There is no OLS

68

bias for ATE. However, OLS does not recover treatment on the treated, especially in the some college - college choice. There is OLS bias for treatment on the treated, especially in estimating returns to college. This is consistent with selection on gains. The lower panel of the table suggests that linearity of log wages in terms of years of schooling is not a bad description of the data. ATE is fairly well estimated by OLS but Treatment on the Treated (TT) is not. Figure 10a confirms the impression that a linear in parameters Mincer model fits the means of the wage data. In the NLSY, linearity in terms of years of schooling is a valid description of the data. This linearity masks a lot of intrinsic nonlinearity. Table 6b reveals that for persons who choose different schooling levels the returns across categories are far from equal. Table 7 reveals that the factor loadings on the latent abilities differ across schooling groups. Noncognitive skills (associated with Factor 2) receive the highest relative weight in the same college and high school wages. Thus, social skills are rewarded relatively more heavily in occupations that are more vocational in their orientation. Figure 10b shows that a main prediction of equation (1) - that variances in wages should increase with schooling levels - is not consistent with the data. Variances decline with schooling levels. Figure 10c shows the departure of the Mincer model from the actual patterns of wage variances. Our evidence suggests that sorting on returns is much less pervasive in the labor market

69

than the recent literature in labor economics has emphasized. The only serious evidence on sorting is for college graduates. However, heterogeneity in returns and unpredictability are central features of modern labor markets. Ex post many returns are negative, and it is not easy to forecast these returns even if we know the unobserved factors that affect wages, choices and behavioral outcomes.

9

Summary and Conclusions This paper uses low dimensional factor models to generate counterfactual distributions

of potential outcomes. It extends matching by allowing the variables that determine conditional independence to be unobserved by the analyst. Semiparametric identification is established. Applying our methods to an analysis of the returns to schooling, we find that for all schooling categories we analyze, except college graduation, there is little evidence of systematic sorting into schooling based on returns. There is a lot of heterogeneity in ex post returns, and a substantial proportion of the returns for each schooling levels are negative suggesting that agents, like econometricians, have limited ability to forecast their labor market outcomes. Returns for higher schooling levels are less variable and hence more predictable than are returns for lower schooling levels.

70

Mincer model (1) is surprisingly accurate in generating average returns to schooling. OLS provides an accurate approximation to ATE. OLS does not correctly estimate the treatment on the treated parameter for college graduates. However the Mincer model is a poor guide to variability in incomes across schooling levels. Contrary to the model, returns are less variable at higher schooling levels than are returns at lower schooling levels which are much more predictable. Cognitive and noncognitive skills are rewarded differently in different schooling markets. We find considerable evidence that ex post many people make mistakes in their educational decisions.

There is a lot of intrinsic uncertainty about the returns to schooling.

More

educated people are slightly better at predicting these returns but even for them the returns are highly uncertain. In Carneiro, Hansen and Heckman (2002), we use this model to simulate the effects of changes in tuition and access on the overall distribution of income. We relax the “Veil of Ignorance” assumption widely used in the literature to determine who is affected by the policies, where they come from in the pre-policy distribution and where they end up in the post-policy income distribution. Large changes in tuition and access produce small increases in enrollments. Only middle class people benefit from these reforms, and the poor are unaffected.

71

References [1] Aakvik, A., Heckman, James, J. and Vytlacil, Edward. (1998) “Training Effects on Employment when the Training Effects are Heterogeneous: An Application to Norwegian Vocational Rehabilitation Programs.” Manuscript. Chicago: University of Chicago. [2] Aakvik, A., Heckman, James J. and Vytlacil, Edward. “Treatment Effects For Discrete Outcomes when Responses To Treatment Vary Among Observationally Identical Persons: An Application to Norwegian Vocational Rehabilitation Programs,” NBER, Working Paper No.t0262, forthcoming Journal of Econometrics, 2002. [3] Abowd, John and Card, David “On the Covariance Structure of Earnings and Hours Changes,” Econometrica 57(2) (March 1989): 411-45 [4] Albert, J. H.and S. Chib. “Bayesian Analysis of Binary and Polychotomous Response Data,” Journal of the American Statistical Association (1993) 88: 669-679. [5] Albert, James and Chib, Siddhartha ”Sequential Ordinal Modeling with Applications to Survival Data,” Biometrics, (2001), 57, 829-836. [6] Anderson, T.W., and Rubin, H. “Statistical Inference in Factor Analysis.” In Proceedings of Third Berkely Symposium on Mathematical Statistics and Probability, edited 72

by J. Neyman. University of California Press, 5, 111-150, 1956. [7] Aubin and Ekeland, Ivar (1978). ”Second-order equations with convex Hamiltonian”, Canadian Math. Bull. 23 (1980), p.81-94. [8] Bartholomew (1978). [9] Bekker, Paul and Jos M.F. ten Berge. “Generic Global Identification in Factor Models.” Linear Algebra and Its Applications, 264 (1997): 255-263. [10] Ben Akiva, Moshe; Bolduc, Denis; and Walker, Joan. “Specification, Identification and Estimation of the Logit Kernel (or Continuous Mixed Logit Model). Manuscript, Department of Civil Engineering, MIT, February, 2001. [11] Bollen, Kenneth, Structural Equations with Latent Variables. New York: Wiley, 1989. [12] Bjorklund, A. and and Moffitt, R. (1987) “The Estimation of Wage Gains and Welfare Gains in Self-Selection,” Review of Economics and Statistics, 69(1):42-9. [13] Cameron, Stephen and Heckman, James J. “Son of CTM: The DCPA Approach Based on Discrete Factor Structure Models.” Manuscript. Chicago: University of Chicago, 1987. [14] _____. (1998) “Life Cycle Schooling and Dynamic Selection Bias.” Journal of Political Economy 106(2) (1998) :262-333. 73

[15] _____. “The Dynamics of Educational Attainment for Blacks, Whites and Hispanics.” Journal of Political Economy 109(3) (2001): 455-499. [16] Carneiro, Pedro; Karsten Hansen and James Heckman, “Removing the Veil of Ignorance In Assessing the Distributional Impacts of Social Policies” forthcoming Swedish Economic Policy Review, 2002. [17] Carneiro, Pedro; Heckman, James J.; and Vytlacil, Edward, J. “Estimating the Return to Education When It Varies Among Individuals.” Paper presented as the Economic Journal at the Royal Economic Society meeting, Durham, England, April 2001. Also presented as the Review of Economics and Statistics Lecture, Harvard University, April 2001. [18] Cawley, John; Karen Conneely, James Heckman and Edward Vytlacil (1997). “Cognitive Ability, Wages, and Meritocracy.” In Intelligence Genes, and Success: Scientists Respond to the Bell Curve, edited by B. Devlin, S. E. Feinberg, D. Resnick and K. Roeder, (Copernicus: Springer-Verlag, 1997), 179-192. [19] Chib, Siddhartha and Hamilton, Barton. ”Semiparametric Bayes Analysis of Longitudinal Data Treatment Models,” Journal of Econometrics, in press. [20] Chib, S. and Hamilton, B. (1999), ”Bayesian Analysis of Cross Section and Clustered Data Treatment Models”, working paper, 1999. 74

[21] Chib, S. and Hamilton, B. (2000), ”Semiparametric Bayes Analysios ofLongitudinal Data Treatment Models”, working paper, 2000. [22] Chib, Siddhartha and Hamilton, Barton. ”Bayesian Analysis of Cross Section and Clustered Data Treatment Models” Journal of Econometrics, (2000), 97, 25-50. [23] Cochran, W.G. & Rubin, D. B. (1973). ”Controlling Bias in Observational Studies: a review.” Sankhya A 35:417-446 [24] Diebolt, J. and Robert, C.P. “Estimation of Finite Mixture Distributions Through Bayesian Sampling,” Journal of the Royal Statistical Society, Series B., (1994) 56: 363-375. [25] Eckstein, Zvi and Kenneth Wolpin. “The Specification and Estimation of Dynamics Stochastic Discrete Choice Models: A Survey,” Journal of Human Resources, Vol. 24, 562-598, 1989. [26] Eckstein, Zvi, and Wolpin, Kenneth I. “Dynamic Labour Force Participation of Married Women and Endogenous Work Experience.” Review Economic Studies 56 (July 1999): 375-90. [27] Elrod and Keane, M. (1995) “A Factor-analytic Probit Model for Representing the Market Structure in Panel Data”, Journal of Marketing Research 32, 1-16.

75

[28] Geweke, John and Michael Keane (2001). ”An Empirical Analysis of Income Dynamics Among Men in the PSID: 1968-1989,” in Journal of Econometrics, forthcoming. [29] Hansen, Karsten, Heckman, James and Katheleen, Mullen. “Generalized Ordered Discrete Choice Models With One Sided Shocks and Parameterized Thresholds. Manuscript. Chicago:University of Chicago, 2001. [30] Heckman, James J. “Dummy Endogenous Variables In A Simultaneous Equations System.” Econometrica 46 (July 1978): 931-59. [31] Heckman, James J. “Statistical Models for Discrete Panel Data.” In Structural Analysis of Discrete Data, edited by C. Manski and D. McFadden. MIT Press, 1981. (a). [32] ______. “The Incidental Parameter Problem and the Problem of Initial Conditions.” In Structural Analysis of Discrete Data, edited by C. Manski and D. McFadden. MIT Press, 1981. (b) [33] ______. ”Varieties of Selection Bias.” American Economic Review 80 (2) (May 1990): 313-18. [34] _____. “Randomization and Social Policy Evaluation.” In Evaluating Welfare and Training Programs, edited by Charles F. Manski and Irwin Garfinkel. Cambridge, Mass.: Harvard University Press, 1992.

76

[35] _____. “Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations.” Journal of Human Resources 32 (Summer 1997): 441-462. [36] Heckman, James J. “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture.” Journal of Political Economy 109(4) (2001a): 673-748. [37] _____ .(2001b) “Heterogeneity, Diversity and Social Policy Evaluation,” forthcoming, Economic Journal. [38] Heckman, James J., and Borjas, George. “Does Unemployment Cause Future Unemployment? Definitions, Questions and Answer from a Continuous Time Model of Heterogeneity and State Dependence,” Economica 47 (1980): 47-283. [39] Heckman, James J., Layne-Farrar, Anne and Todd, Petra. “Human Capital Pricing Equations With An Application To Estimating The Effect of Schooling Quality on Earnings.” The Review of Economics and Statistics ( November 1996): 562-610. [40] Heckman, James; Lance, Lochner; and Todd, Petra. “Choice-Theoretic Generalizations of the Ordered Discrete Choice Model.” Manuscript. Chicago: University of Chicago, 1996.

77

[41] Heckman, James J., R. LaLonde, and Smith, Jeffrey. “The Economics and Econometrics of Active Labor Market Programs”. In O. Ashenfelter and D. Card, eds, Handbook of Labor Economics. Volume 3. Amsterdam: Elsevier, 1999. [42] Heckman, James J., Lochner, Lance and Todd, Petra (2001). “Fifty Years of Earnings Regressions,” working paper. [43] Heckman, James, J., and Honoré, Bo. E. “The Empirical Content of the Roy Model.” Econometrica 58(5) (1990): 1121-1149. [44] Heckman, James, Lochner, Lance and Taber Christopher. “Explaining Rising Wage Inequality: Explorations With A Dynamic General Equilibrium Model of Earnings,” Review of Economic Dynamics 1 (1998): 1-58. [45] Heckman, James, J., and Robb, Richard, Jr. “Alternative Methods for Evaluation the Impact of Interventions.” In Longitudinal Analysis of Labor Market Data, edited by James J. Heckman and Burton Singer. New York: Cambridge University Press (for Econometric Soc.), 1985. [46] _____ . “Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes.” In Drawing Inferences from Self-Selected Samples, edited by Howard Wainer. New York: Springer-Verlag, 1986.

78

[47] Heckman, James and Rubinstein, Yona. “The Importance of Noncognitive Skills: Lessons from the GED Testing Program.” AEA Papers and Proceedings, 91(2) (2001): 145-149. [48] Heckman, James J., and Smith, Jeffrey. “Evaluating the Welfare State,” in S. Strom, ed., Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial, Econometric Society Monograph Series, Cambridge University Press, 1998. [49] _____. “Assessing the Case for Randomized Evaluation of Social Programs.” In Measuring Labour Market Measures: Evaluating The Effects of Active Labour Market Policy Initiatives, edited by Karsten Jensen and Per Kongshoj Madsen. Copenhagen: Ministry Labour, 1993. [50] Heckman, James, J.; Jeffrey, Smith, and Clements, Nancy. (1997) “Making the Most out of Program Evaluations and Social Experiments: Accounting for Heterogeneity in Program Impacts.” Review of Economic Studies 64 487-535. [51] Heckman, J. and E. Vytlacil. “Instrumental Variables Methods For the Correlated Random Effect Model: Estimating The Average Rate of Return to Schooling When the Return Is Correlated With Schooling,” (with E. Vytlacil), Journal of Human Resources, (Fall 1998), 33(4): 974-1002.

79

[52] Heckman, James J. and Vytlacil, Edward. “Policy-Relevant Treatment Effects,” American Economic Review, vol 91, n2, (May 2001): 107-11. [53] Heckman, James J., and Vytlacil, Edward. “Econometric Evaluation of Social Programs.” In Handbook of Econometrics, Vol. 6, edited by James J. Heckman and Edward Leamer. Amsterdam: North-Holland, 2002 (in press). [54] _____. “Local Instrumental Variables.” In Nonlinear Statistical Modeling: Proceeding of the Thirteenth International Symposium in Theory and Econometrics: Essays in Honor of Takeshi Amemiya, edited by Cheng Hsiao, Kimio Morimune, and James Powell. Cambridge: Cambridge University Press, 2000. [55] _____ . “Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects,” Proceedings of the National Academy of Sciences, 96 (1999): 4730-4734. [56] Jöreskog, Karl, Sörbom, Dag and Magidson, Jay. Advances In Factor Analysis and Structural Equations Models, Cambridge, Mass: ABT Associates, 1979. [57] Jöreskog, Karl ,Sörbom, Dag. LISREL 7: A Guide To The Program and Applications. Chicago: SPSS, Inc., 1988.

80

[58] Jöreskog, Karl. “Structural Equations Models In The Social Sciences: Specification, Estimation and Testing. In Applications of Statistics, edited by P.R. Krishnaih. Amsterdam: North Hollland, 1977, 265-287. [59] Keane, Michael P., and Wolpin, Kenneth I. “The Career Decisions of Young Men.” Journal of Political Economy 105(3) (June 1997): 473-522. [60] Kotlarski, Ignacy, ”A theorem on joint probability distributions in stochastic locally convex linear topological spaces.” Colloquium Mathematicum 19 (1968) 175—177. [61] Ledermann, W. “On the Rank of the Reduced Correlation Matrix in Multiple Factor Analysis.” Psychometrika 2 (1937): 85-93. [62] Liu, J.S. and Sabbatti, C. (1998), ”Simulated Sintering: MCMC With Spaces of Varying Dimensions”, Bayesian Statistics , Vol 6, 386-413; Oxford University Press. [63] Liu, J.S. (2001) , ”Monte Carlo Strategies in Scientific Computing”, New York: Springer [64] MaCurdy, “The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis.” Journal of Econometrics 18(1) (Jan. 1982): 83-114.

81

[65] McFadden, Daniel. “Econometric Analysis of Qualitative Response Models.” In Handbook of Econometrics, Vol. II, edited by Z. Griliches and M. Intrilligator. Amsterdam: North Holland, 1984. [66] Mincer, Jacob. Schooling, Experience and Earnings. New York: Columbia University Press, 1974. [67] Mincer, Jacob. ”Changes in Wage Inequality, 1970-1990” NBER Working Paper No.w5823, Published “Economic Development, Growth of Human Capitol, and the Dynasty of the WageStructure,” Journal of Economic Growth , Vol.16 (1997). [68] Muthen, B. “A General Structural Equation Model With Dichotomous, Ordered Categorical and Continuous Latent Variable Indicators.” Psychometrika 49 (1985): 115132. [69] Rao(1993) [70] Richardson, S., Leblond, L., Jaussent, I, and Green, P.J., “Mixture Models in Measurement Error Problems, with Reference to Epidemiological Studies.” Working paper, 2000. [71] Robert, C.P. and Casella, G. “Monte Carlo Statistical Methods,” New York: Springer, 1999.

82

[72] Rosenbaum, Paul R., and Rubin, Donald B. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (April 1983): 41-55. [73] Willis, Robert J., and Rosen, Sherwin. “Education and Self-Selection.” Journal of Political Economy, 87, no. 5, pt. 2 (October 1979): S7-S36. [74] Winship, Christopher, “Does Going to College Make You Smarter?” Manuscript. Harvard University, 2001.

83

Variables: South Broken Urban Num. of Siblings Mother’s Education Father’s Education Age Adj. AFQT Tuition in: 2-year College 4-Year College Distance to: 2-year College 4-year College Local Unemployment Rate: Dropouts High School Grads. Some College College Grads. Local Wage: Dropouts High School Grads. Some College College Grads. Log Wage Employment

Overall (N=1536) Mean Std. Dev. 0.232 0.422 0.160 0.367 0.743 0.437 2.938 1.886 1.205 0.230 1.234 0.323 0.532 0.887

TABLE 1 SAMPLE MEANS NLSY: White Males, Age 29 Dropouts (N=246) High School Grads (N=668) Mean Std. Dev. Mean Std. Dev. 0.337 0.474 0.196 0.397 0.321 0.468 0.132 0.338 0.740 0.440 0.692 0.462 3.585 2.260 3.024 1.855 1.049 0.240 1.170 0.190 1.026 0.313 1.165 0.275 -0.354 0.792 0.311 0.743

Some College (N=239) Mean Std. Dev. 0.264 0.442 0.172 0.378 0.774 0.419 2.778 1.816 1.240 0.193 1.305 0.275 0.796 0.683

College Grads (N=383) Mean Std. Dev. 0.206 0.405 0.099 0.299 0.817 0.387 2.470 1.561 1.345 0.226 1.442 0.309 1.321 0.498

10.820 20.198

5.013 7.658

9.528 18.870

5.489 6.759

11.325 21.366

5.960 7.760

10.070 19.060

6.339 8.387

11.239 19.720

5.648 7.250

3.254 7.980

9.380 15.912

3.916 9.416

10.098 17.080

3.995 8.748

10.200 16.246

2.682 8.360

9.329 16.347

1.890 5.480

6.990 13.940

0.071 0.058 0.038 0.020

0.023 0.025 0.016 0.010

0.070 0.567 0.036 0.020

0.024 0.027 0.016 0.009

0.072 0.061 0.040 0.021

0.024 0.025 0.015 0.010

0.070 0.059 0.038 0.020

0.024 0.027 0.017 0.011

0.069 0.054 0.035 0.019

0.021 0.023 0.014 0.010

6.577 7.498 7.679 10.140 2.451 0.971

1.243 1.328 1.436 1.721 0.545 0.165

6.601 7.436 7.675 10.004 2.136 0.956

1.369 1.268 1.440 2.555 0.601 0.205

6.569 7.481 7.668 10.005 2.357 0.977

1.251 1.317 1.459 1.619 0.513 0.149

6.681 7.580 7.860 10.407 2.527 0.980

1.176 1.213 1.356 1.713 0.453 0.140

6.511 7.517 7.589 10.310 2.726 0.966

1.183 1.449 1.438 1.958 0.478 0.182

TABLE 2a TEST FOR EQUALITY OF THE EMPIRICAL AND PREDICTED DISTRIBUTIONS OF WAGES Test Value Critical Value Overall 50.1086 75.6237 Dropout 2.7760 16.9190 High School 10.8349 37.6535 Some College 6.5013 10.3070 4-Year College 7.0798 27.5871 TABLE 2b TEST FOR EQUALITY OF THE EMPIRICAL AND PREDICTED SCHOOLING PROBABILITIES Test Value Critical Value Dropout 0.3308 3.8415 High School 0.1123 3.8415 Some College 0.9396 3.8415 4-Year College 0.2956 3.8415

1

TABLE 2c PROPORTION OF PEOPLE IN EACH SCHOOLING LEVEL Actual Predicted Ordered Unordered Dropout 0.160 0.150 0.160 High School 0.435 0.440 0.450 Some College 0.156 0.170 0.160 4-Year College 0.249 0.240 0.230

1

--

--

Table 3a Correlations Schooling Choice: Dropouts Age 26 26 Actual Predicted Std.Err. 27 Actual Predicted Std.Err. 28 Actual Predicted Std.Err. 29 Actual Predicted Std.Err. 30 Actual Predicted Std.Err. 31 Actual Predicted Std.Err. 32 Actual Predicted

1

27

28

29

30

31

32

0.3751

0.3532

0.3221

0.4540

0.4058

0.3045

0.2519 (0.1200)

0.3246 (0.1245)

0.2691 (0.1155)

0.3113 (0.1234)

0.2855 (0.1198)

0.3166 (0.1299)

0.5286 0.3582 (0.1284)

0.2603 0.3182 (0.1263)

0.3685 0.3303 (0.1327)

0.5279 0.3136 (0.1367)

0.2668 0.3231 (0.1279)

0.5670

0.4930

0.5036

0.4340

0.3934 (0.1326)

0.4373 (0.1358)

0.3820 (0.1370)

0.4196 (0.1325)

0.6558

0.4178

0.4463

0.3799 (0.1294)

0.3370 (0.1341)

0.3530 (0.1297)

0.5223

0.5449

0.3739 (0.1392)

0.4093 (0.1401)

1

1

1

1

1

0.6322 0.3711 (0.1399) 1 1

--

--

Table 3b Correlations Schooling Choice: High school Age 26 26 Actual Predicted Std.Err. 27 Actual Predicted Std.Err. 28 Actual Predicted Std.Err. 29 Actual Predicted Std.Err. 30 Actual Predicted Std.Err. 31 Actual Predicted Std.Err. 32 Actual Predicted

1

27

28

29

30

31

32

0.6078

0.5135

0.5590

0.5460

0.5220

0.4246

0.4798 (0.1532)

0.5069 (0.1508)

0.5433 (0.1459)

0.5161 (0.1463)

0.4888 (0.1486)

0.4309 (0.1467)

0.6497 0.5782 (0.1504)

0.5814 0.6225 (0.1443)

0.4991 0.5907 (0.1458)

0.5343 0.5608 (0.1503)

0.4334 0.4937 (0.1514)

0.6484

0.6051

0.6585

0.4608

0.6628 (0.1404)

0.6285 (0.1420)

0.5976 (0.1470)

0.5254 (0.1506)

0.7552

0.6743

0.6524

0.6813 (0.1339)

0.6433 (0.1400)

0.5667 (0.1470)

0.7297

0.6670

0.6126 (0.1435)

0.5428 (0.1479)

1

1

1

1

1

0.7491 0.5126 (0.1517) 1 1

--

--

Table 3c Correlations Schooling Choice: Some college Age 26 26 Actual Predicted Std.Err. 27 Actual Predicted Std.Err. 28 Actual Predicted Std.Err. 29 Actual Predicted Std.Err. 30 Actual Predicted Std.Err. 31 Actual Predicted Std.Err. 32 Actual Predicted

1

27

28

29

30

31

32

0.7502

0.6311

0.5380

0.4873

0.4680

0.4857

0.6537 (0.0665)

0.6982 (0.0622)

0.6910 (0.0620)

0.6193 (0.0666)

0.6217 (0.0675)

0.5686 (0.0767)

0.7356 0.7638 (0.0519)

0.5858 0.7558 (0.0536)

0.5035 0.6846 (0.0609)

0.5595 0.6884 (0.0623)

0.4616 0.6281 (0.0726)

0.7117

0.6536

0.6216

0.5612

0.8197 (0.0451)

0.7422 (0.0565)

0.7458 (0.0571)

0.6818 (0.0696)

0.7177

0.6100

0.6168

0.7396 (0.0552)

0.7517 (0.0582)

0.6721 (0.0721)

0.6712

0.5544

0.6826 (0.0661)

0.6195 (0.0739)

1

1

1

1

1

0.5883 0.6133 (0.0786) 1 1

--

--

Table 3d Correlations Schooling Choice: College Age 26 26 Actual Predicted Std.Err. 27 Actual Predicted Std.Err. 28 Actual Predicted Std.Err. 29 Actual Predicted Std.Err. 30 Actual Predicted Std.Err. 31 Actual Predicted Std.Err. 32 Actual Predicted

1

27

28

29

30

31

32

0.7052

0.5305

0.4833

0.4748

0.4440

0.4398

0.4895 (0.0612)

0.5325 (0.0597)

0.5444 (0.0594)

0.5497 (0.0570)

0.5210 (0.0586)

0.4815 (0.0600)

0.7167 0.6769 (0.0530)

0.5985 0.6937 (0.0496)

0.5841 0.6989 (0.0475)

0.4479 0.6646 (0.0503)

0.3893 0.6164 (0.0568)

0.7212

0.6696

0.5957

0.5390

0.7703 (0.0408)

0.7729 (0.0417)

0.7348 (0.0455)

0.6826 (0.0530)

0.6796

0.6105

0.4949

0.7970 (0.0373)

0.7598 (0.0419)

0.7008 (0.0502)

0.8049

0.7089

0.7684 (0.0407)

0.7110 (0.0492)

1

1

1

1

1

0.8101 0.6775 (0.0517) 1 1

Dropout High School Some College College Graduate

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Dropout High School Some College College Graduate

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Dropout High School Some College College Graduate

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Dropout High School Some College College Graduate

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

TABLE 4 Returns to Schooling White Males, Age 29 from NLSY Age 26 Wages Overall Dropout High School Some College 2.15821 2.09387 2.17569 2.23661 2.25835 2.26002 2.26213 2.25983 2.32755 2.36758 2.34163 2.31913 2.41645 2.36106 2.40533 2.44203 Age 26 Returns Overall Dropout High School Some College 0.10107 0.20771 0.12374 0.06204 0.07309 0.11148 0.08340 0.06303 0.08602 -0.00953 0.06076 0.12027 0.15860 0.10142 0.14365 0.18309 Age 27 Wages Overall Dropout High School Some College 2.13428 2.07996 2.15612 2.19970 2.32107 2.26334 2.31552 2.34076 2.39210 2.39263 2.39295 2.39031 2.49076 2.35880 2.47056 2.53877 Age 27 Returns Overall Dropout High School Some College 0.18937 0.22530 0.19747 0.17633 0.07187 0.13022 0.07826 0.05030 0.09831 -0.03435 0.07726 0.14797 0.17007 0.09578 0.15540 0.19831 Age 28 Wages Overall Dropout High School Some College 2.14188 2.08364 2.16531 2.19897 2.36445 2.31411 2.36315 2.37478 2.44541 2.48619 2.45474 2.42828 2.57821 2.4312 2.56039 2.62145 Age 28 Returns Overall Dropout High School Some College 0.22258 0.26953 0.22987 0.20956 0.08124 0.17236 0.09185 0.05370 0.13270 -0.05500 0.10558 0.19309 0.21394 0.11724 0.19742 0.24681 Age 29 Wages Overall Dropout High School Some College 2.23573 2.18260 2.24531 2.29427 2.39513 2.33814 2.38353 2.41967 2.52795 2.56862 2.53802 2.51363 2.63706 2.49333 2.61218 2.69261 Age 29 Returns Overall Dropout High School Some College 0.15958 2 0.18869 0.16558 0.15181 0.13287 0.23047 0.15457 0.09404 0.10907 -0.07531 0.07417 0.17904 0.24195 0.15521 0.22873 0.27300

College Graduate 2.27137 2.24877 2.28078 2.45491 College Graduate 0.01634 0.03572 0.17144 0.20682 College Graduate 2.22939 2.35866 2.39121 2.58962 College Graduate 0.15684 0.03326 0.19814 0.23141 College Graduate 2.24042 2.39157 2.41368 2.67538 College Graduate 0.18770 0.02232 0.26158 0.28395 College Graduate 2.33329 2.44041 2.49001 2.74725 College Graduate 0.13285 0.04958 0.25728 0.30685

Dropout High School Some College College Graduate

Overall 2.21393 2.38278 2.55086 2.62761

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Overall 0.16911 0.16823 0.07669 0.24491

Dropout High School Some College College Graduate

Overall 2.22624 2.37138 2.51309 2.66897

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Overall 0.14497 0.14253 0.15577 0.29826

Dropout High School Some College College Graduate

Overall 2.37396 2.45180 2.64152 2.76607

Dropout - HS HS - Some Col. Some Col. - Col. Grad. HS - Col. Grad.

Overall 0.07714 0.19177 0.12458 0.31592

Age 30 Wages Dropout High School 2.14371 2.22105 2.32722 2.36593 2.56757 2.54425 2.48134 2.59444 Age 30 Returns Dropout High School 0.22628 0.18007 0.24058 0.17844 -0.08633 0.05017 0.15420 0.22859 Age 31 Wages Dropout High School 2.13808 2.22982 2.31486 2.35550 2.53245 2.50629 2.51734 2.63573 Age 31 Returns Dropout High School 0.21722 0.15607 0.21844 0.15155 -0.01529 0.12936 0.20315 0.28086 Age 32 Wages Dropout High School 2.23675 2.38842 2.42186 2.44949 2.69839 2.65512 2.62117 2.74073 Age 32 Returns Dropout High School 0.22804 0.09791 0.27868 0.20761 -0.07715 0.08568 0.20105 0.29279

3

Some College 2.30203 2.41125 2.55388 2.69081

College Graduate 2.34131 2.43359 2.55005 2.74822

Some College 0.14474 0.14260 0.13699 0.27959

College Graduate 0.12556 0.11654 0.19808 0.31467

Some College 2.31179 2.40129 2.51661 2.73394

College Graduate 2.35765 2.41989 2.51044 2.79210

Some College 0.12001 0.11607 0.21717 0.33324

College Graduate 0.09110 0.09126 0.28150 0.37280

Some College 2.46143 2.45894 2.62097 2.81113

College Graduate 2.51891 2.46970 2.59512 2.87188

Some College 0.02850 0.16396 0.19031 0.35389

College Graduate -0.02232 0.12738 0.27679 0.40388

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.2588 0.0257 1 0.0664 0.062 0.9725 -0.0175 0.0658 1.0801 0.1874 0.052 0.8804

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.192 0.0363 -0.0275 -0.0839 0.0038 0.108 0.205 -0.0138 -0.2 0.121 -0.01 -0.0921

TABLE 5 Factor Loadings Age 26 Wages Std. Dev. Proportion of Total Variance 0.0595 0.0672 0.0633 0.0043 0 0.2874 0.0284 0.008 0.0303 0.007 0.0968 0.4252 0.0582 0.0055 0.0535 0.0104 0.1317 0.5144 0.0547 0.0487 0.0446 0.0059 0.0974 0.2956 Returns Std. Dev. Proportion of Total Variance 0.0658 0.0377 0.0697 0.0054 0.0968 0.003 0.0645 0.0158 0.061 0.0051 0.104 0.0093 0.0797 0.0535 0.0683 0.0052 0.113 0.0172 0.0615 0.0203 0.0531 0.0031 0.0769 0.0047

Total Variance 0.582 0.582 0.582 0.3634 0.3634 0.3634 0.3712 0.3712 0.3712 0.4308 0.4308 0.4308 Total Variance 0.606 0.606 0.606 0.39 0.39 0.39 0.492 0.492 0.492 0.498 0.498 0.498

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.155 0.0175 1.14 0.0864 0.055 1.01 0.004 0.063 1.22 0.198 0.0154 0.948

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.0681 0.0375 -0.13 -0.0825 0.008 0.211 0.194 -0.0476 -0.276 0.111 -0.0396 -0.065

Age 27 Wages Std. Dev. Proportion of Total Variance 0.061 0.0291 0.0628 0.0044 0.118 0.41 0.0277 0.0152 0.0301 0.007 0.099 0.556 0.0626 0.0056 0.0581 0.0102 0.14 0.636 0.0504 0.0769 0.0403 0.0034 0.0981 0.495 Returns Std. Dev. Proportion of Total Varianc~e 0.0667 0.0116 0.0689 0.0075 0.113 0.0119 0.0683 0.0224 0.0652 0.0082 0.103 0.032 0.0808 0.0782 0.0692 0.0122 0.109 0.0468 0.0571 0.0321 0.0497 0.008 0.071 0.0056

Total Variance 0.528 0.528 0.528 0.301 0.301 0.301 0.386 0.386 0.386 0.297 0.297 0.297 Total Variance 0.435 0.435 0.435 0.279 0.279 0.279 0.304 0.304 0.304 0.266 0.266 0.266

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.173 0.0171 1.24 0.0853 0.0446 1.04 -0.0309 0.0748 1.2 0.22 -0.0092 1.03

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.0878 0.0275 -0.2 -0.116 0.0302 0.165 0.251 -0.084 -0.175 0.135 -0.0538 -0.01

Age 28 Wages Std. Dev. Proportion of Total Variance 0.06 0.0428 0.0614 0.005 0.122 0.579 0.0277 0.0161 0.0292 0.0055 0.101 0.632 0.0573 0.0075 0.0547 0.015 0.132 0.76 0.0514 0.101 0.0403 0.0033 0.105 0.62 Returns Std. Dev. Proportion of Total Variance 0.0652 0.0238 0.0674 0.0103 0.108 0.0316 0.0636 0.0522 0.0611 0.0136 0.0944 0.0319 0.077 0.189 0.0662 0.031 0.1 0.0336 0.0583 0.0619 0.0493 0.0149 0.0707 0.0044

Total Variance 0.434 0.434 0.434 0.277 0.277 0.277 0.309 0.309 0.309 0.277 0.277 0.277 Total Variance 0.276 0.276 0.276 0.181 0.181 0.181 0.194 0.194 0.194 0.188 0.188 0.188

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.124 0.006 1.11 0.0925 0.0481 1.09 -0.0341 0.115 1.12 0.211 -0.0108 1.08

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.0312 0.0422 -0.0162 -0.127 0.0672 0.0283 0.245 -0.126 -0.0415 0.119 -0.0589 -0.0132

Age 29 Wages Std. Dev. Proportion of Total Variance 0.057 0.0251 0.0585 0.0045 0.133 0.492 0.0283 0.0195 0.0299 0.0064 0.106 0.727 0.054 0.0081 0.0515 0.0311 0.125 0.741 0.0535 0.0932 0.0407 0.0034 0.109 0.679 Returns Std. Dev. Proportion of Total Variance 0.063 0.0102 0.0643 0.0116 0.124 0.0095 0.0612 0.0743 0.0586 0.0291 0.0913 0.0104 0.0762 0.204 0.0643 0.0611 0.0969 0.0106 0.0604 0.0666 0.0502 0.0219 0.0716 0.006

Total Variance 0.411 0.411 0.411 0.266 0.266 0.266 0.275 0.275 0.275 0.279 0.279 0.279 Total Variance 0.271 0.271 0.271 0.142 0.142 0.142 0.171 0.171 0.171 0.142 0.142 0.142

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.218 0.0887 1.25 0.0841 0.0248 1.04 0.0048 0.102 1.04 0.218 -0.0055 1.12

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.134 -0.0639 -0.207 -0.0793 0.077 -0.0001 0.213 -0.107 0.0761 0.134 -0.0303 0.076

Age 30 Wages Std. Dev. Proportion of Total Variance 0.0612 0.0607 0.0636 0.0138 0.129 0.548 0.0277 0.0167 0.0291 0.003 0.102 0.676 0.0551 0.0057 0.0542 0.0241 0.122 0.598 0.0557 0.0934 0.0428 0.0033 0.113 0.685 Returns Std. Dev. Proportion of Total Variance 0.0666 0.0436 0.0693 0.0169 0.116 0.0338 0.0614 0.0272 0.061 0.0252 0.0947 0.0072 0.0786 0.129 0.0678 0.0392 0.103 0.0123 0.062 0.0731 0.0511 0.0115 0.0741 0.0114

Total Variance 0.468 0.468 0.468 0.26 0.26 0.26 0.296 0.296 0.296 0.297 0.297 0.297 Total Variance 0.283 0.283 0.283 0.201 0.201 0.201 0.214 0.214 0.214 0.16 0.16 0.16

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.206 -0.0264 1.11 0.0995 0.0583 1.08 0.0186 0.103 1.11 0.221 0.0008 1.1

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.106 0.0847 -0.0291 -0.0808 0.0446 0.0318 0.202 -0.102 -0.0128 0.121 -0.0575 0.0189

Age 31 Wages Std. Dev. Proportion of Total Variance 0.0565 0.0571 0.0642 0.0058 0.149 0.453 0.0299 0.0186 0.0332 0.0075 0.106 0.589 0.0596 0.0065 0.0586 0.0227 0.131 0.609 0.0565 0.089 0.0437 0.0032 0.112 0.614 Returns Std. Dev. Proportion of Total Variance 0.0637 0.0236 0.0718 0.0182 0.139 0.0093 0.0669 0.0236 0.0668 0.0134 0.102 0.0072 0.0828 0.102 0.0719 0.0327 0.108 0.0077 0.0638 0.0437 0.0543 0.0142 0.077 0.0044

Total Variance 0.446 0.446 0.446 0.321 0.321 0.321 0.331 0.331 0.331 0.32 0.32 0.32 Total Variance 0.358 0.358 0.358 0.255 0.255 0.255 0.25 0.25 0.25 0.233 0.233 0.233

Dropouts

High School

Some College

College Graduates

Dropout - HS

HS - Some Col.

Some Col. - Col. Grad.

HS - Col. Grad.

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α 0.265 -0.0644 1.16 0.0756 0.0758 0.989 -0.06 0.0467 1.11 0.202 -0.0233 1.07

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

α -0.189 0.14 -0.175 -0.136 -0.0291 0.126 0.262 -0.07 -0.042 0.126 -0.0991 0.0836

Age 32 Wages Std. Dev. Proportion of Total Variance 0.0626 0.0893 0.0709 0.0106 0.145 0.488 0.0308 0.0107 0.0346 0.0108 0.1 0.464 0.0726 0.0118 0.0702 0.0092 0.141 0.499 0.0614 0.0675 0.0468 0.004 0.115 0.523 Returns Std. Dev. Proportion of Total Variance 0.0696 0.0542 0.0793 0.0332 0.134 0.0195 0.0792 0.034 0.0778 0.0092 0.116 0.012 0.0953 0.106 0.0838 0.0159 0.125 0.0072 0.0685 0.0325 0.0578 0.0201 0.0849 0.0067

Total Variance 0.461 0.461 0.461 0.345 0.345 0.345 0.412 0.412 0.412 0.361 0.361 0.361 Total Variance 0.414 0.414 0.414 0.399 0.399 0.399 0.398 0.398 0.398 0.346 0.346 0.346

Choice

Arithmetic

Word

Paragraph

Math

Science

Numerical

Coding

Skip School

Damage Property

Fight in School

Shoplift

Steal Goods < $50

Steal Goods > $50

Use Force

Hit Others

Injure Others

Choice, Measurement System and Employment Wages α Std. Dev. Proportion of Total Variance f1 1.0318 0.0567 0.3456 f2 -0.46 0.0622 0.0673 f3 0 0 0 f1 1 0 0.6975 f2 0 0 0 f3 0 0 0 f1 0.7171 0.0223 0.6076 f2 0 0 0 f3 0 0 0 f1 0.8104 0.0249 0.6044 f2 0 0 0 f3 0 0 0 f1 1.0109 0.0263 0.6724 f2 0 0 0 f3 0 0 0 f1 0.7563 0.0244 0.5457 f2 0 0 0 f3 0 0 0 f1 0.7379 0.0265 0.4466 f2 0 0 0 f3 0 0 0 f1 0.6487 0.0265 0.3622 f2 0 0 0 f3 0 0 0 f1 0.2047 0.0853 0.018 f2 -0.9353 0.1297 0.3131 f3 0 0 0 f1 0.1002 0.0695 0.004 f2 -1.3837 0.1382 0.5034 f3 0 0 0 f1 0.3586 0.0604 0.045 f2 -1 0 0.3353 f3 0 0 0 f1 0.134 0.0669 0.0065 f2 -1.2818 0.1285 0.4644 f3 0 0 0 f1 -0.091 0.0571 0.0046 f2 -0.8259 0.0957 0.2667 f3 0 0 0 f1 0.2793 0.0959 0.0233 f2 -1.3624 0.1736 0.4847 f3 0 0 0 f1 0.1619 0.0861 0.0129 f2 -0.8752 0.126 0.2874 f3 0 0 0 f1 0.1322 0.0566 0.0075 f2 -0.9695 0.096 0.3327 f3 0 0 0 f1 0.2721 0.0779 0.0249 f2 -1.1615 0.1325 0.4084 f3 0 0 0

Total Variance 1.7061 1.7061 1.7061 0.7945 0.7945 0.7945 0.4687 0.4687 0.4687 0.6018 0.6018 0.6018 0.8417 0.8417 0.8417 0.5804 0.5804 0.5804 0.6753 0.6753 0.6753 0.6436 0.6436 0.6436 1.5047 1.5047 1.5047 2.0456 2.0456 2.0456 1.6185 1.6185 1.6185 1.9025 1.9025 1.9025 1.3763 1.3763 1.3763 2.0598 2.0598 2.0598 1.4383 1.4383 1.4383 1.5206 1.5206 1.5206 1.7781 1.7781 1.7781

Smoke Pot

Use Drugs

Sell Pot

Sell Drugs

Charged by Police

Charged by Police

Convicted

Employment

f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3 f1 f2 f3

0.1543 -1.2441 0 0.2373 -1.7962 0 0.2482 -1.8934 0 0.4008 -1.2298 0 0.389 -1.2646 0 0.4116 -1.3077 0 0.2299 -1.3008 0 0.129 -0.13 1.83

0.0621 0.1351 0 0.0849 0.1984 0 0.0949 0.2203 0 0.1354 0.2066 0 0.0809 0.155 0 0.1022 0.188 0 0.0957 0.1684 0 0.149 0.16 0.296

0.0082 0.4486 0 0.0126 0.6231 0 0.0014 0.2944 0 0.0504 0.4214 0 0.0445 0.4394 0 0.0486 0.452 0 0.0175 0.465 0 0.0129 0.0137 0.343

1.852 1.852 1.852 2.7836 2.7836 2.7836 1.4526 1.4526 1.4526 1.9336 1.9336 1.9336 1.9575 1.9575 1.9575 2.0345 2.0345 2.0345 1.9568 1.9568 1.9568 1.61 1.61 1.61

TABLE 6a AVERAGE LOG WAGES Conditioning Set Marginal Person Schooling Level

Overall

Dropout

High School

Some College

4-Year College

Dropout vs. High School

High School vs. Some Col.

Some Col. vs. 4-Year Col.

Dropout Std. Error Interval

2.1961 0.0533 2.0894,2.3064

2.0951 0.0456 2.0056,2.1735

2.1781 0.0472 2.0847,2.2813

2.2301 0.0676 2.0878,2.3695

2.2666 0.0865 2.0942,2.4455

2.1226 0.0411 2.0415,2.2008

2.2148 0.0606 2.0892,2.3420

2.2420 0.0737 2.0922,2.3948

High School Std. Error Interval

2.3593 0.0181 2.3239,2.3932

2.3049 0.0318 2.2390,2.3591

2.3509 0.0184 2.3124,2.3871

2.3781 0.0233 2.3330,2.4203

2.3952 0.0322 2.3306,2.4553

2.3208 0.0258 2.2689,2.3663

2.3704 0.0207 2.3281,2.4085

2.3835 0.0259 2.3340,2.4319

Some College Std. Error Interval

2.4909 0.0339 2.4230,2.5573

2.5662 0.0660 2.4378,2.6877

2.5080 0.0375 2.4329,2.5772

2.4672 0.0339 2.3949,2.5291

2.4340 0.0422 2.3380,2.5079

2.5486 0.0554 2.4400,2.6545

2.4800 0.0331 2.4121,2.5412

2.4563 0.0359 2.3753,2.5195

4-Year College Std. Error Interval

2.6114 0.0374 2.5351,2.6863

2.4352 0.0787 2.2817,2.5996

2.5793 0.0447 2.4873,2.6709

2.6700 0.0283 2.6130,2.7239

2.7339 0.0257 2.6859,2.7875

2.4828 0.0674 2.3533,2.6224

2.6433 0.0321 2.5775,2.7068

2.6907 0.0264 2.6393,2.7405

TABLE 6b AVERAGE RETURNS Conditioning Set Marginal Person Schooling Level

Overall

Dropout

High School

Some College

4-Year College

Dropout vs. High School

High School vs. Some Col.

Some Col. vs. 4-Year Col.

High School vs. Dropout Standard Error Interval

0.1632 0.0545 0.0573,0.2652

0.2099 0.0543 0.1047,0.3113

0.1729 0.0478 0.0743,0.2583

0.1479 0.0702 0.0073,0.2863

0.1287 0.0921 -0.0531,0.3141

0.1983 0.0461 0.1071,0.2825

0.1556 0.0622 0.0342,0.2722

0.1416 0.0772 -0.0118,0.2976

Some College vs. High School Standard Error Interval

0.1316 0.0378 0.0505,0.1980

0.2613 0.0759 0.1259,0.4082

0.1571 0.0418 0.0742,0.2339

0.0892 0.0395 0.0096,0.1611

0.0388 0.0514 -0.0715,0.1370

0.2278 0.0632 0.1118,0.3474

0.1096 0.0377 0.0285,0.1719

0.0728 0.0424 -0.0166,0.1540

4-Year College vs. Some College Standard Error Interval

0.1205 0.0508 0.0210,0.2235

-0.1311 0.1004 -0.3170,0.0834

0.0713 0.0583 -0.0438,0.1886

0.2028 0.0459 0.1155,0.2921

0.2999 0.0513 0.2082,0.4131

-0.0658 0.0859 -0.2242,0.1193

0.1633 0.0474 0.0763,0.2551

0.2344 0.0465 0.1514,0.3284

4-Year College vs. High School Standard Error Interval

0.2521 0.0418 0.1776,0.3465

0.1302 0.0870 -0.0430,0.3107

0.2283 0.0489 0.1327,0.3352

0.2920 0.0351 0.2341,0.3704

0.3386 0.0386 0.2657,0.4220

0.1620 0.0739 0.0128,0.3094

0.2729 0.0373 0.2084,0.3571

0.3072 0.0350 0.2456,0.3827

Sub-Population Dropout

TABLE 6c TESTS OF EQUALITY OF LOG WAGES (Test of Selection) ∗ xβ i = xβ i + αi E (f | choice = j) Dropout High School Some College χ2 Critical χ2 Critical χ2 Critical Value Value Value 3.0819 3.8415 2.9558 3.8415 2.4005 3.8415

4-Year College χ2 Critical Value 14.0468 3.8415

High School 1.6307 3.8415 0.8290 3.8415 2.1739 3.8415 6.8023 3.8415 Some College 2.7685 3.8415 2.5916 3.8415 1.7686 3.8415 11.7613 3.8415 4-Year College 2.8384 3.8415 2.5038 3.8415 2.5921 3.8415 13.7675 3.8415 Dropout - High School 2.9122 3.8415 2.5681 3.8415 2.5376 3.8415 13.2077 3.8415 High School - Some College 2.1191 3.8415 1.8113 3.8415 0.9001 3.8415 7.1429 3.8415 Some College - 4-Year College 2.8145 3.8415 2.6053 3.8415 2.1794 3.8415 12.9354 3.8415 ∗ Tests whether the mean wage of a person taken at random from the total population and forced into state/column j is equal to the wage of a person from sub-population/row i if forced into state/column j

1

TABLE 6d TESTS OF EQUALITY OF RETURNS∗ (Test of Selection) Dropout-High Shool High School-Some College Some College-College Hgih Schoool-4-Year College Sub-Population χ2 Critical χ2 Critical χ2 Critical χ2 Critical Value Value Value Value Dropout 0.4857 3.8415 4.7363 3.8415 14.2594 3.8415 0.6322 3.8415 High School 0.3666 3.8415 2.9909 3.8415 7.8867 3.8415 0.0352 3.8415 Some College 0.4022 3.8415 4.0773 3.8415 11.8359 3.8415 0.1202 3.8415 4-Year College 0.4957 3.8415 4.7338 3.8415 14.9008 3.8415 2.3151 3.8415 Dropout vs. High School 0.4911 3.8415 4.6165 3.8415 13.6817 3.8415 0.3504 3.8415 High School vs. Some College 0.2579 3.8415 2.6735 3.8415 6.7435 3.8415 0.0338 3.8415 Some College vs. 4-Year College 0.4528 3.8415 4.4649 3.8415 13.5294 3.8415 0.3217 3.8415 ∗ Tests whether the returns for a person taken at random from the total population and forced into state/column j are equal to the returns for a person from sub-population/row i if forced into state/column j

OLS ATE TT

TABLE 6e OLS RETURNS VS. AVERAGE RETURNS y = a + bHS DHS + bSC DSC + bCOL DCOL + ε High School -Dropout∗ Some College - High School∗∗ 4-Year College - Some College∗∗∗ 0.145 0.116 0.131 0.163 0.132 0.121 0.173 0.089 0.300

OLS RETURNS VS. AVERAGE RETURNS y = a + bS + ε (Mincer) High School - Dropout Some College - High School 4-Year College - Some College OLS 0.058 0.058 0.058 ATE 0.070 0.083 0.057 TT 0.074 0.056 0.142 ∗ Average Difference in Years of Schooling = 2.3 ∗∗ Average Difference in Years of Schooling = 1.6 ∗∗∗ Average Difference in Years of Schooling = 2.1

1

Figure 1a Overall Wage Distribution 0.06 actual predicted

0.04

f(Wage)

0.02

0

0

0.5

1

1.5

2

Wage

2.5

3

3.5

4

4.5

Figure 1b Dropouts Wage Distribution 0.35 actual predicted 0.3

0.25

0.2

f(Wage) 0.15

0.1

0.05

0

0

0.5

1

1.5

Wage

2

2.5

3

3.5

Figure 1c High School Graduates Wage Distribution 0.14 actual predicted 0.12

0.1

0.08

f(Wage) 0.06

0.04

0.02

0

0

0.5

1

1.5

2

Wage

2.5

3

3.5

4

Figure 1d Some College Wage Distribution 0.3 actual predicted

0.2

f(Wage)

0.1

0

1

1.5

2

2.5

Wage

3

3.5

4

4.5

Figure 1e College Wage Distribution 0.25 actual predicted

0.2

0.15

f(Wage) 0.1

0.05

0 0.5

1

1.5

2

2.5

Wage

3

3.5

4

4.5

Figure 2a Factor, AFQT, and Normal Distributions White Males, age 29 from NLSY 0.025 factor1 factor2 normal1 normal2 afqt factor3 normal3

f(factor)

0.02

0.015

0.01

0.005

0 -2

-1

0

Factor

1

2

Figure 2b Distributions of Factor 1 White Males, age 29 from NLSY 0.018 drop hs 2y 4y

0.016

f(factor 1)

0.014

0.012

0.01

0.008

0.006

0.004

0.002

0 -2

-1

0

Factor 1

1

2

Figure 2c Distributions of Factor 2 White Males, age 29 from NLSY 0.014 drop hs 2y 4y

0.012

f(factor 2)

0.01

0.008

0.006

0.004

0.002

0 -2

-1

0

Factor 2

1

2

Figure 2d Distributions of Factor 3 White Males, age 29 from NLSY 0.025 drop hs 2y 4y

f(factor 3)

0.02

0.015

0.01

0.005

0 -2

-1

0

Factor 3

1

2

Figure 3a Distribution of Wages, Dropouts White Males, age 29 from NLSY 0.04

drop hs 2y

4y

0.035

f(wages)

0.03

0.025

0.02

0.015

0.01

0.005

0

1

1.5

2

2.5

Wages

3

3.5

4

Figure 3b Distributions of Wages, High School Graduates White Males, age 29 from NLSY 0.05 drop hs 2y 4y

0.045

0.04

f(wages)

0.035

0.03

0.025

0.02

0.015

0.01

0.005

0

1

1.5

2

2.5

Wages

3

3.5

4

Figure 3c Distributions of Wages, Some College White Males, age 29 from NLSY 0.045 drop hs 2y 4y

0.04

f(wages)

0.035

0.03

0.025

0.02

0.015

0.01

0.005

0

1

1.5

2

2.5

Wages

3

3.5

4

Figure 3d Distributions of Wages, College White Males, age 29 from NLSY 0.045 drop hs 2y 4y

0.04

f(wages)

0.035

0.03

0.025

0.02

0.015

0.01

0.005

0

1

1.5

2

2.5

Wages

3

3.5

4

Figure 4a Distributions of Returns to High School White Males, age 29 from NLSY 0.06

f(returns)

drop hs 2y 4y

0.04

0.02

0 -1.5

-1

-0.5

0

Returns

0.5

1

1.5

Figure 4b Distributions of Returns to Some College White Males, age 29 from NLSY 0.08 drop hs 2y 4y

f(returns)

0.06

0.04

0.02

0 -1.5

-1

-0.5

0

Returns

0.5

1

1.5

Figure 4c Distributions of Returns to College Graduates White Males, age 29 from NLSY 0.08 drop hs 2y 4y

f(wages)

0.06

0.04

0.02

0 -1.5

-1

-0.5

0

Wages

0.5

1

1.5

Figure 4d Distribution of Returns to College vs. High School White Males, Age 29 from NLSY 0.08 drop hs 2y 4y

f(wages)

0.06

0.04

0.02

0 -1.5

-1

-0.5

0

wages

0.5

1

1.5

Figure 5a Marginal Treatment Effect Dropouts - High School NLSY, White Males 0.5

0.4

0.3

0.2

MTE

0.1

0

-0.1

-0.2

-0.3 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 5b Marginal Treatment Effect High School - Some College NLSY, White Males 0.8

0.6

0.4

MTE

0.2

0

-0.2

-0.4 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 5c Marginal Treatment Effect Some College - College NLSY, White Males 0.8

0.6

0.4

0.2

MTE

0

-0.2

-0.4

-0.6

-0.8 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 5d Marginal Treatment Effect High School - College NLSY, White Males 0.8

0.6

0.4

MTE

0.2

0

-0.2

-0.4 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 6a Marginal Treatment Effect Dropout - High School NLSY, White Males 0.6 age 26 age 27 age 28 age 29 age 30 age 31 age 32

0.5

0.4

0.3

0.2

MTE 0.1

0

-0.1

-0.2

-0.3 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 6b Marginal Treatment Effect High School - Some College NLSY, White Males 0.5 age 26 age 27 age 28 age 29 age 30 age 31 age 32

0.4

0.3

MTE

0.2

0.1

0

-0.1 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 6c Marginal Treatment Effect Some College - College NLSY, White Males 0.6 age 26 age 27 age 28 age 29 age 30 age 31 age 32

0.4

0.2

MTE

0

-0.2

-0.4

-0.6 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 6d Marginal Treatment Effect High School - College NLSY, White Males 0.6

0.5

age 26 age 27 age 28 age 29 age 30 age 31 age 32

0.4

MTE

0.3

0.2

0.1

0

-0.1 -5

-4

-3

-2

-1

0

ud

1

2

3

4

5

Figure 7a Dropout Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 1.4 I1 I2 I3 1.2

1

f(wages )

0.8

0.6

0.4

0.2

0 0.5

1

1.5

2

2.5 wages

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

3

3.5

4

Figure 7b High S chool Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 3 I1 I2 I3 2.5

f(wages )

2

1.5

1

0.5

0 0.5

1

1.5

2

2.5 wages

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

3

3.5

4

F igure 7c S ome C ollege Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 2.5 I1 I2 I3

2

f(wages )

1.5

1

0.5

0 0.5

1

1.5

2

2.5 wages

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

3

3.5

4

F igure 7d C ollege Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 3 I1 I2 I3 2.5

f(wages )

2

1.5

1

0.5

0 0.5

1

1.5

2

2.5 wages

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

3

3.5

4

F igure 8a R eturn to High S chool Under Different Information S ets White Males, 29 years old from NLSY 1.4 I1 I2 I3 1.2

f(returns )

1

0.8

0.6

0.4

0.2

0 -1.5

-1

-0.5

0 returns

0.5

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

1

1.5

F igure 8b R eturn to C ollege Under Different Information S ets White Males, 29 years old from NLSY 1.8 I1 I2 I3

1.6

1.4

f(returns )

1.2

1

0.8

0.6

0.4

0.2

0 -1.5

-1

-0.5

0 returns

0.5

I1={Ø}, I2={f1 , f2}, I3={f1, f2, f3}

1

1.5

F igure 8c R eturn to High S chool Under Different Information S ets White Males, 29 years old from NLSY 1.4

I4 1.2

I5

f(returns )

1

0.8

0.6

0.4

0.2

0 -1.5

-1

-0.5

0 returns

I4={AS V AB }, I5={f1 }

0.5

1

1.5

F igure 8d R eturn to C ollege Under Different Information S ets White Males, 29 years old from NLSY 1.6

I4 1.4

I5 1.2

f(returns )

1

0.8

0.6

0.4

0.2

0 -1.5

-1

-0.5

0 returns

I4={AS V AB }, I5={f1 }

0.5

1

1.5

F igure 9a Dropout Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 0.8 I4 I5 0.7

0.6

f(wages )

0.5

0.4

0.3

0.2

0.1

0 0.5

1

1.5

2

2.5 wages

I4={AS V AB }, I5={f1 }

3

3.5

4

F igure 9b High S chool Log Wage Dis tribution under Different Information S ets White Males , 29 years old from NLS Y 1 I4 I5 0.9 0.8 0.7

f(wages )

0.6 0.5 0.4 0.3 0.2 0.1 0 0.5

1

1.5

2

2.5 wages

I4={AS V AB }, I5={f1 }

3

3.5

4

F igure 9c S ome C ollege Log Wage Dis tribution under Different Information S ets White Males , 29 years old from NLS Y 0.9 I4 I5 0.8

0.7

f(wages )

0.6

0.5

0.4

0.3

0.2

0.1

0 0.5

1

1.5

2

2.5 wages

I4={AS V AB }, I5={f1 }

3

3.5

4

F igure 9d C ollege Log Wage Dis tribution under Different Information S ets White Males, 29 years old from NLSY 1 I4 I5 0.9 0.8 0.7

f(wages )

0.6 0.5 0.4 0.3 0.2 0.1 0 0.5

1

1.5

2

2.5 wages

I4={AS V AB }, I5={f1 }

3

3.5

4

Figure 10a - Actual Log Wages vs. Predicted Log Wages vs. Mincer 3

Log Wages

2.5 2 Actual Predicted Mincer

1.5 1 0.5 0 0

2

4 Schooling

6

8

Figure 10b - Variance of Wages by Schooling Level 0.7

variance of wages

0.6 0.5

age 26 age 27 age 28 age 29 age 30 age 31 age 32

0.4 0.3 0.2 0.1 0 0

1

2

3

4

5

years of schooling (vs. dropout)

6

7

Figure 10c - Variance of Log Wages vs. Predicted Variance of Log Wages Using Mincer

Variance of Log Wages

7 6 5 4

Predicted Actual

3 2 1 0 0

2

4 Schooling

6

8