31 MAY
CONTEXTUAL EFFECTS ON RECOVERY FROM ILLNESS IN INDIVIDUALIZED NEIGHBORHOODS
BO MALMBERG
Dept. of Human Geography Stockholm University
EVA ANDERSSON
Dept. of Human Geography Stockholm University
ABSTRACT To what extent does spatial variation in ill health reflect the influence of contextual factors such as differences in social trust, density of social networks, and varying social support? To answer this question it is not enough to have individual and neighbourhood level data. It is also necessary to have an idea about the scale at which social influences on health are at work. In this paper we demonstrate that changes in neighbourhood scale (that is, shifts in the number of nearest neighbours used to compute contextual variables) can lead to large shifts in the values of the contextual variables that are assigned to different individuals. This implies that estimates of neighbourhood effects are not invariant to changes in scale. We also present results from an empirical analysis of scale-dependent neighbourhood effects using Swedish longitudinal register-based data on sickness-benefit recipiency as an indicator of onset of and recovery from illness. Sickness-insurance data are used because, for confidentiality reasons, our register-based data set contains limited information on health outcomes. Our first sample consists of individuals who have stayed healthy and in work for a three-year period, some of whom are affected by illness during the fourth year. Our second sample consists of those in the first group who fall ill during the fourth year, some of whom return to good health in the fifth year. In order to compute the contextual variables for different scale levels we use the Equipop software.
Key words: contextual effects, neighborhood effects, context, Equipop, MAUP.
INTRODUCTION In studies of neighbourhood effects on health much of the focus has been on how contextual factors affect the risk of falling ill (Hartig and Lawrence, 2003; Shouls, Congdon and Curtis, 1996; Stjärne, Ponce De Leon and Hallqvist, 2004). Health, however, is not only a result of not falling ill. It is also a result of successful recovery from ill-health. Analysing neighbourhood effects on recovery is, therefore, no less important than analysing transitions from health to sickness. In fact, it can be argued that neighbourhood context becomes even more important for people in ill-health, who spend more of their time close to home. Moreover, as we will argue in this paper, an analysis of neighbourhood effects on recovery from ill-health provides an opportunity to reduce the effect of selection bias on estimated neighbourhood effects.
In the early 2000s sharply increasing sickness rates, and rapidly rising costs for sickness insurance, made ill-health in the working-age population an intensively discussed issue in Sweden (Marklund et al., 2005; SCB, 2004). One factor singled out as an explanation for this ill-health was an increase in long-term sickness (see Lidwall and Marklund, 2011). Research on long-term sickness in Sweden has mainly focused on work-related factors, but it is also possible that neighbourhood factors are of importance, especially, as argued above, for rates of recovery. A challenge for studies of contextual effects on health is, however, to determine at what neighbourhood scale such effects are likely to occur (Schaefer-McDaniel, Dunn, Minian and Katz, 2010). Different attempts to determine the relevant scale have been made but, as yet, there is little consensus on this issue (Spielman and Yoo, 2009).
In this paper we will, therefore, propose an approach using individualized neighbourhoods that allows the question of scale to be addressed in a flexible way and, at the same time, makes it possible to circumvent the indeterminacy that plagues context effect studies that use administratively defined areas to measure neighbourhood context. This paper, thus, has a two-fold aim. The first aim is to complement earlier studies of neighbourhood effects on ill-health with a study that analyses whether there are contextual effects on recovery of health. The second aim is to analyse whether contextual measures based on individualized, scalable neighbourhoods can give better estimates of contextual effects than traditional area-based measures.
METHODS AND DATA Increasing interest in analysing the effects of neighbourhood context on health and other individual outcomes has been accompanied by a discussion about the methodological difficulties involved in establishing causal links. A key question has been to what extent self-selection of individuals into neighbourhoods makes it difficult to estimate true contextual effects using observational data (Diez Roux and Mair, 2010). If individuals were randomly selected into neighbourhoods, causal effects of neighbourhood context would be reflected in statistically significant differences in outcomes across neighbourhoods. It is, however, common to argue that individuals are selected into neighbourhoods on the basis of observed and unobserved characteristics (Baker, Bentley and Mason, 2013). This implies that the assignment of individuals to neighbourhoods is non-random and, as a consequence, it becomes difficult to tell whether differing outcomes between neighbourhoods are caused by the selection process or by contextual effects. A potential remedy for this problem is to estimate neighbourhood effects using individual level background variables to control for differences in composition. However, as argued by Oakes (2004), success in controlling for differences in composition would at the same time reduce the differences in outcome across neighbourhoods that are due to contextual effects. Oakes has been criticized by Subramanian (2004), who acknowledges parts of Oakes’ argument but maintains that there are empirical designs that can circumvent the selection problem. In this paper we will use two different approaches to address the problems involved in estimating contextual effects. First, instead of relying on a selection equation to control for individual level background variables, we will concentrate our analysis on a sample of individuals that is as homogeneous as possible with respect to risk factors for ill-health. Second, by using measures of the socio-economic context that are not statistical aggregates for a given neighbourhood area but are based on individualized scalable neighbourhoods, we also break the identity between neighbourhood population composition and our measure of the socio-economic environment.
DATA Our data come from the Population and Labour Market, Chorology Database (PLACE) at the Department of Human Geography at Uppsala University. This database contains register-based, longitudinal, individual level data from Statistics Sweden for the population in Sweden from 1990 to 2010, with geocodes of the residential location by 100 meter squares. For each year the data contain more than 100 different individual-level variables covering demographic information, education, occupation, employment, social insurance, and different income measures. Earlier studies using the same data to study health outcomes include Fransson and Hartig (2010) and Hartig and Fransson (2009).
SAMPLE We use data for the years 2000-2010, and our sample has been constructed in three steps. First, we have selected individuals who were between 30 and 56 years of age in year 2000. Second, from this group we have excluded individuals who in any of the years 2000, 2001, or 2002 received unemployment benefits, received social allowance, received sickness benefits, did not have wage income, or were not in employment in November. Finally, we have also excluded individuals who did not receive sickness benefits in 2003.
The rationale behind this selection procedure is to get a sample that is as homogeneous as possible with respect to risk factors for ill-health. The age bracket chosen excludes ages where people have relatively few health problems (below 30) and ages where people experience increasing health problems (above 56). In step two, individuals with socio-economic risk factors associated with ill-health are excluded. This gives a group with low levels of observed socio-economic risk factors. However, some of them will have high levels of unobserved risk factors. In order to control for this difference in unobserved risk factors in the sample we exclude, in the third step, individuals who stay healthy in 2003. This, to a large degree, will eliminate individuals who have low unobserved risks. Thus, our final sample consists of individuals who not only have low observed risk factors but also have similar levels of unobserved risk. The advantage of having a sample that is homogeneous with respect to risk factors for ill-health is that differences in outcomes across neighbourhoods for this group will not be strongly influenced by risk-factor based sorting. This implies that estimated neighbourhood effects can be given a causal interpretation.
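The three selection steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and every field name (age_2000, years, sickness_benefit and so on) are hypothetical, since the actual PLACE variable names are not given here, and "wage income" is read as a positive amount.

```python
# Hypothetical record layout: all field names below are illustrative
# assumptions, not the variable names used in the PLACE database.
BASELINE_YEARS = (2000, 2001, 2002)


def healthy_and_in_work(year_record):
    """True if none of the step-two exclusion criteria applies in that year."""
    return (
        not year_record["unemployment_benefit"]
        and not year_record["social_allowance"]
        and not year_record["sickness_benefit"]
        and year_record["wage_income"] > 0
        and year_record["employed_in_november"]
    )


def in_sample(person):
    """Apply the three selection steps to one person."""
    # Step 1: aged 30 to 56 in year 2000.
    if not 30 <= person["age_2000"] <= 56:
        return False
    # Step 2: healthy and in work in each of 2000, 2001 and 2002.
    if not all(healthy_and_in_work(person["years"][y]) for y in BASELINE_YEARS):
        return False
    # Step 3: keep only those who received sickness benefits in 2003.
    return bool(person["years"][2003]["sickness_benefit"])


def make_year(**overrides):
    """Toy yearly record for a healthy, employed person."""
    record = {
        "unemployment_benefit": False,
        "social_allowance": False,
        "sickness_benefit": False,
        "wage_income": 250000,
        "employed_in_november": True,
    }
    record.update(overrides)
    return record


healthy = {year: make_year() for year in BASELINE_YEARS}
person_a = {"age_2000": 41, "years": {**healthy, 2003: make_year(sickness_benefit=True)}}
person_b = {"age_2000": 41, "years": {**healthy, 2003: make_year()}}
print(in_sample(person_a), in_sample(person_b))  # True False
```

Person A is retained because she was healthy and in work in 2000-2002 and fell ill in 2003; person B is dropped in the third step because he stayed healthy in 2003.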
OUTCOME VARIABLE In 2000-2004 sickness benefits registered by Statistics Sweden were paid to employees who had been absent for more than two weeks (three weeks from 1 July 2003). This implies that reception of sickness benefits is an indicator of a relatively severe illness, and it can signal the onset of continuing health problems (Malmberg, Andersson and Subramanian, 2010). Most of the individuals included in our sample, however, received neither sickness benefits nor early retirement benefits in 2004. That is, from a social insurance perspective, they recovered from illness in 2004. In this study we will analyse neighbourhood effects on the recovery from ill-health, with recovery defined as non-reception of sickness and early retirement benefits in 2004 among individuals who received sickness benefits in 2003. The study design is illustrated in Figure 1.
[Figure 1: timeline of the study design. 2000, 2001 and 2002: healthy (not sick, no social allowance, not unemployed, wage income, in employment). 2003: sick (sickness benefit). 2004: healthy or sick? (no sickness benefit or sickness benefit).]
FIGURE 1. STUDY DESIGN.
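As a minimal sketch, the recovery outcome defined in the OUTCOME VARIABLE section can be coded as a binary indicator. The function and argument names are hypothetical and chosen only for illustration.

```python
def recovered_in_2004(sick_2003, sick_2004, early_retirement_2004):
    """Binary recovery outcome for one person.

    Returns 1 if the person received neither sickness benefits nor early
    retirement benefits in 2004, 0 otherwise, and None for persons who did
    not receive sickness benefits in 2003 (they are outside the sample).
    """
    if not sick_2003:
        return None
    return int(not sick_2004 and not early_retirement_2004)


print(recovered_in_2004(True, False, False))  # 1
print(recovered_in_2004(True, False, True))   # 0
```

Treating early retirement in 2004 as non-recovery follows the definition in the text: a person who leaves sickness benefits for early retirement has not returned to good health from a social insurance perspective.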
INDIVIDUAL LEVEL CONTROL VARIABLES Our approach to the elimination of selection effects has relied primarily on selecting a sample of individuals with similar observed and unobserved risk for ill-health. Given that probabilities of recovery from ill-health can also be influenced by demographic factors we will, however, employ three individual level variables in our model of recovery from ill-health: age, sex, and immigration status. The latter variable can take five different values: Swedish-born, arrived before 1975, arrived 1975-1989, arrived 1990-1994, and arrived 1995 or later.
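The five-category immigration status variable could be derived from country of birth and year of arrival along the following lines. This is a sketch only: the function name, its arguments and the category labels are assumptions made for illustration, not the coding used in the register data.

```python
def immigration_status(born_in_sweden, arrival_year=None):
    """Map birth country and arrival year to the five categories in the text."""
    if born_in_sweden:
        return "Swedish-born"
    if arrival_year is None:
        raise ValueError("arrival_year is required for persons born abroad")
    if arrival_year < 1975:
        return "arrived before 1975"
    if arrival_year <= 1989:
        return "arrived 1975-1989"
    if arrival_year <= 1994:
        return "arrived 1990-1994"
    return "arrived 1995 or later"


print(immigration_status(True))         # Swedish-born
print(immigration_status(False, 1992))  # arrived 1990-1994
```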
CONTEXTUAL MEASUREMENT Our approach to context measurement introduces two important novelties: first, and most importantly, we introduce contextual measures that are based on individually defined and scalable neighborhoods. Second, we introduce a factor-analysis based representation of the spatial variation in socio-demographic context as a means to manage the wealth of information resulting from scalability (Andersson and Malmberg, 2013; Malmberg, Andersson and Bergsten, 2013).
INDIVIDUALLY DEFINED AND SCALABLE NEIGHBORHOOD, EQUIPOP In this study we measure neighborhood population composition using individual-centered neighborhoods with fixed population size. We have used register data containing information on individual residential location to compute contextual variables based on the population composition among an individual's nearest 12, 25, 50, 100, 200, 400, 800, 1600, 3200, 6400, and 12800 neighbors for 2003. In order to measure the population composition in individually defined neighborhoods we have used Equipop, a spatial analysis program developed in 2011 by John Östh in collaboration with Eva Andersson and Bo Malmberg (Equipop version 2012-Feb-20). Equipop was first developed in order to address the modifiable areal unit problem (MAUP) in segregation measurement. As shown in Malmberg, Andersson and Östh (2011), traditional measures of segregation such as the isolation index are strongly dependent on the size of the statistical units for which the segregation index has been computed. In many cases, variation in segregation values is more influenced by varying areal subdivisions than by variation in residential patterns.
In the Equipop software, the individualized neighborhoods are obtained by expanding a circular buffer around each residential location until the population encircled by the buffer corresponds to the population threshold chosen. When this threshold is reached, the program computes an aggregate statistic for the encircled population for a selected socio-economic variable. Equipop requires that the input data are geocoded at a detailed level. We have used data from the PLACE database. From this data, six different socio-demographic indicators have been extracted and used as input for Equipop, see Table 1. The variables used in this study should be seen as examples of variables that could be of interest. There is certainly room for including other indicators in order to explore other environmental dimensions, for example crime (Lorenc et al., 2012).
TABLE 1. CONTEXT VARIABLES RUN IN EQUIPOP FOR K NEAREST NEIGHBORS IN 2003.
Variable                   Description                                           Year   Population
Education, young           1 = university/college, 0 = not university/college   2003   30-49
Education, old             1 = university/college, 0 = not university/college   2003   50-64
Sickness benefit, young    1 = Sickness benefit                                  2003   30-49
Sickness benefit, old      1 = Sickness benefit                                  2003   50-64
Employment, young          1 = In employment (November)                          2003   30-49
Employment, old            1 = In employment (November)                          2003   50-64
Number of neighbors (k), identical for all six variables: 12, 25, 50, 100, 200, 400, 800, 1600, 3200, 6400, 12800
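The buffer-expansion idea behind the individualized neighborhoods can be illustrated with a small Python sketch. It is not the Equipop implementation: the function names, the tuple layout of the grid cells and the decision to include whole grid cells in order of distance until the threshold k is reached are all assumptions made for illustration.

```python
import math


def knn_share(cells, origin, k):
    """Share with an attribute among roughly the k nearest residents.

    cells  : list of (x, y, n_residents, n_with_attribute), one per grid cell
    origin : (x, y) of the focal residential location
    k      : population threshold for the individualized neighborhood
    """
    ox, oy = origin
    # Expanding a circular buffer is equivalent to visiting the cells in
    # order of increasing distance from the focal location.
    ordered = sorted(cells, key=lambda c: math.hypot(c[0] - ox, c[1] - oy))
    population = 0
    with_attribute = 0
    for _x, _y, n_residents, n_with_attribute in ordered:
        population += n_residents
        with_attribute += n_with_attribute
        if population >= k:
            # Threshold reached: aggregate over the encircled population.
            return with_attribute / population
    raise ValueError("fewer than k residents in the data")


def context_profile(cells, origin, k_levels):
    """Contextual values for one location at several neighborhood scales."""
    return {k: knn_share(cells, origin, k) for k in k_levels}


# Toy data: four grid cells with (x, y, residents, residents on sickness benefit).
toy_cells = [(0, 0, 10, 2), (100, 0, 10, 8), (0, 200, 20, 5), (300, 300, 40, 30)]
print(context_profile(toy_cells, (0, 0), (12, 25, 50)))
# {12: 0.5, 25: 0.375, 50: 0.5625}
```

Even in this toy example the value assigned to the same location changes with k, which is the scale dependence that motivates computing each indicator for the full series of k values in Table 1.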
FACTOR-ANALYSIS BASED REPRESENTATION OF CONTEXTUAL VARIATION With 6 different socio-demographic indicators and 11 different levels of neighborhood scale we obtain a total of 66 different contextual variables. Clearly, such a large number of contextual variables cannot without problems be included as explanatory variables. Moreover, many of the indicators are strongly correlated, for example, contextual indicators based on the same socio-economic indicator but computed for different neighborhood sizes. In order to make the analysis manageable we have, therefore, subjected the contextual indicators to a factor analysis that compresses the 66 original indicators into 10 orthogonal factors that jointly capture 79% of the original variation. The factor analysis was based on covariances, and the principal components to be rotated were those with eigenvalues higher than one. The factors were rotated using the varimax method. Some factors load mainly on contextual variables computed for small numbers of neighbors (k), whereas other factors load mainly on variables computed for large numbers of neighbors. This result of the factor analysis is clearly of interest since it provides an opportunity to analyze the scale dependence of contextual effects. Table 2 shows the descriptive names of factors one to eight and indicates the scale of interest.
TABLE 2. CONTEXT DESCRIBED BY INDIVIDUALIZED NEIGHBORHOODS FOR 2003.
Factor no.   Factor name
Factor 1     Elite areas
Factor 2     High employment
Factor 3     Sick, adjacent areas
Factor 4     High employment, adjacent areas
Factor 5     Young sick
Factor 6     Old sick
Factor 7     High employment, small scale
Factor 8     Elite, adjacent areas
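The factor extraction behind Table 2 can be written out compactly. The notation below is ours and is meant only as a sketch: S denotes the covariance matrix of the p = 66 contextual indicators, V_m holds the eigenvectors of the m retained components, A is the unrotated loading matrix, and T is an orthogonal rotation. Only p = 66, m = 10, the eigenvalue-greater-than-one rule and the 79% share come from the text; the raw (non-normalized) form of the varimax criterion is an assumption.

```latex
% Eigendecomposition of the covariance matrix of the 66 indicators
\[
S = V \Lambda V^{\top}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p), \qquad
\lambda_1 \ge \dots \ge \lambda_p, \qquad p = 66
\]
% Retention rule and share of variation captured
\[
m = \#\{\, j : \lambda_j > 1 \,\} = 10, \qquad
\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \approx 0.79
\]
% Unrotated loadings A and rotated loadings B
\[
A = V_m \Lambda_m^{1/2}, \qquad B = A\,T, \qquad T^{\top} T = I_m
\]
% Varimax: choose T to maximize the variance of the squared loadings
\[
T^{*} = \arg\max_{T} \sum_{j=1}^{m}
\left[ \frac{1}{p} \sum_{i=1}^{p} b_{ij}^{4}
 - \left( \frac{1}{p} \sum_{i=1}^{p} b_{ij}^{2} \right)^{2} \right]
\]
```

Because T is orthogonal, the rotation leaves the total variation captured by the retained factors unchanged and only redistributes it so that each factor loads strongly on a few indicators, which is what makes descriptive names such as those in Table 2 possible.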
Figure 2 shows diagrams of what the different factors represent. This illustration is important since we are going to include factor scores as explanatory variables in the logistic regression of recovery from ill-health. Without an interpretation of the different factors it will be difficult to interpret the regression results.
Factor 1 Elite areas. High values of this factor in a location result in high shares of people with tertiary education, both young and old, a high employment share, and low sickness benefits for the young group.
Factor 2 High employment. High values on this factor result in high employment shares for both the young and the old group.
Factor 3 Sick, adjacent areas. High values of this factor imply a high level of sickness in adjacent areas and low shares of people with tertiary education.
Factor 4 High employment, adjacent areas. High values on this factor result in high levels of employment in adjacent areas.
Factor 5 Young sick. High values on this factor result in high levels of sickness for the young group and, to some extent, for the old group.
Factor 6 Old sick. This factor contributes to high shares of the older group having sickness benefits.
Factor 7 High employment, small scale. Factor 7 is similar to Factor 2, with the difference that Factor 7 has an effect mainly on neighborhood scales below 400 persons.
Factor 8 Elite, adjacent areas. Factor 8 is similar to Factor 1, with the difference that Factor 8 has an effect mainly for neighborhood scales above 1000 persons.
[Figure 2: eight panels, titled Factor 1 Elite; Factor 2 High employment; Factor 3 Sick, adjacent areas; Factor 4 High employment adjacent areas; Factor 5 Young sick; Factor 6 Sick old max for k=100; Factor 7 Employment old max for k=100; Factor 8 Tertiary adjacent areas. Each panel plots factor loadings (vertical axis, roughly -0.4 to 1) against the number of nearest neighbors k (horizontal axis, ticks at 10, 100, 1000 and 10000) for the indicators Tertiary young, Tertiary old, Employment share young, Employment share old, Sickness benefit young and Sickness benefit old.]
FIGURE 2. FACTORS AND LOADINGS. (TO REDUCE CLUTTER, THESE GRAPHS ONLY INCLUDE FACTORS THAT HAVE A LOADING HIGHER THAN 0.2 OR LOWER THAN -0.2 FOR AT LEAST ONE K-LEVEL.)
MODELS We will estimate 5 models. The first three are logistic regressions using recovery to health in 2004 as the dependent variable. Model 1 uses only individual level control variables. Model 2 is our main model and uses both individual level variables and contextual variables based on individualized neighborhoods. Model 3 adds interactions between contextual factors and individual characteristics (gender and migratory status) to the explanatory variables. Model 5 is similar to Model 2 but uses contextual variables based on fixed statistical areas (called SAMS in the Swedish context).
Model 6 takes advantage of the fact that we have data on survival up to year 2010 for the individuals in our sample. This data is used to estimate a proportional hazard model of survival with the same set of explanatory variables. These estimates are used to check whether it is warranted to use sickness benefits to measure ill-health. A potential criticism of our estimates is that sickness benefits need not reflect true health status, since these benefits are part of the social safety net and, thus, could also be used to provide income support for individuals without jobs.
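Whole-model comparisons of nested logistic regressions, such as those reported in Table 3 of the RESULTS section, rest on the likelihood-ratio statistic: twice the difference in negative log-likelihood between the reduced and the full model, referred to a chi-square distribution. The following sketch shows the arithmetic using the first row of Table 3 (difference 359, 7 degrees of freedom). The function names are ours, and the closed-form tail probability is valid for odd degrees of freedom only.

```python
import math


def lr_statistic(neg_loglik_difference):
    """Likelihood-ratio chi-square: 2 * (-logL_reduced - (-logL_full))."""
    return 2.0 * neg_loglik_difference


def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variable with odd df.

    Uses the closed form Q = 2*Q_N(chi) + 2*Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    where chi = sqrt(x), Z is the standard normal density and Q_N its upper tail.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form requires odd degrees of freedom")
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    normal_tails = math.erfc(chi / math.sqrt(2.0))  # equals 2 * Q_N(chi)
    series = 0.0
    odd_product = 1.0
    for r in range(1, (df - 1) // 2 + 1):
        odd_product *= 2 * r - 1
        series += chi ** (2 * r - 1) / odd_product
    return normal_tails + 2.0 * density * series


stat = lr_statistic(359.0)
print(stat)  # 718.0
p_value = chi2_sf_odd_df(stat, 7)
print(p_value < 0.001)  # True
```

The computed value of 718.0 is close to the Chi2 of 717.8 reported in the table; the small gap would be consistent with the difference in log-likelihood having been rounded to 359.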
RESULTS Table 3 presents a comparison of whole model results for Model 1 and Model 2. The comparison shows that the inclusion of contextual variables implies a significant increase in the explanatory power of the model. TABLE 3. COMPARISON OF –LOG LIKELIHOOD BETWEEN MODELS 1 AND 2.
Variables                  Comparison         Difference, –LogLikelihood   DF   Chi2    Prob.
Model 1 Individual level   Full vs. Reduced   359                          7    717.8