contextual effects on recovery from illness in ... - DiVA portal

   

         

31 May

CONTEXTUAL  EFFECTS  ON  RECOVERY  FROM  ILLNESS  IN   INDIVIDUALIZED  NEIGHBORHOODS                

BO  MALMBERG    

Dept.  of  Human  Geography   Stockholm  University   [email protected]  

     

 

EVA  ANDERSSON    

Dept.  of  Human  Geography   Stockholm  University   [email protected]  

 

 

ABSTRACT

To what extent does spatial variation in ill health reflect the influence of contextual factors such as differences in social trust, density of social networks, and varying social support? To answer this question it is not enough to have individual and neighbourhood level data. It is also necessary to have an idea about the scale at which social influences on health are at work. In this paper we demonstrate that changes in neighbourhood scale (that is, shifts in the number of nearest neighbours that are used to compute contextual variables) can lead to large shifts in the values of the contextual variables that are assigned to different individuals. This implies that estimates of neighbourhood effects are not invariant to changes in scale. We also present results from an empirical analysis of scale-dependent neighbourhood effects using Swedish longitudinal register-based data on sickness-benefit recipiency as an indicator of onset of and recovery from illness. Sickness-insurance data are used because, for confidentiality reasons, our register-based data set contains limited information on health outcomes. Our first sample consists of individuals who have stayed healthy and in work for a three-year period, some of whom are affected by illness during the fourth year. Our second sample consists of those in the first group who fall ill during the fourth year, some of whom return to good health in the fifth year. In order to compute the contextual variables for different scale levels we use the Equipop software.

Key words: contextual effects, neighborhood effects, context, Equipop


 

 

 

 

INTRODUCTION

In studies of neighbourhood effects on health much of the focus has been on how contextual factors affect the risk of falling ill (Hartig and Lawrence, 2003; Shouls, Congdon and Curtis, 1996; Stjärne, Ponce De Leon and Hallqvist, 2004). Health, however, is not only a result of not falling ill. It is also a result of successful recovery from ill-health. To analyse neighbourhood effects on recovery is, therefore, no less important than the analysis of transitions from health to sickness. In fact, it can be argued that neighbourhood context becomes even more important for people in ill-health, who spend more of their time close to the home. Moreover, as we will argue in this paper, an analysis of neighbourhood effects on the recovery from ill-health provides an opportunity for reducing the effect of selection bias on estimated neighbourhood effects.

In the early 2000s sharply increasing sickness rates, and rapidly rising costs for sickness insurance, made ill-health in the working-age population an intensively discussed issue in Sweden (Marklund et al., 2005; SCB, 2004). One factor singled out as an explanation for this ill-health was an increase in long-term sickness (see Lidwall and Marklund, 2011). Research on long-term sickness in Sweden has mainly focused on work-related factors, but it is also possible that neighbourhood factors are of importance, especially, as argued above, for rates of recovery.
A challenge for studies of contextual effects on health is, however, to determine at what neighbourhood scale such effects are likely to occur (Schaefer-McDaniel, Dunn, Minian and Katz, 2010). Different attempts to determine the relevant scale have been made but, as yet, there exists little consensus on this issue (Spielman and Yoo, 2009). In this paper we will, therefore, propose an approach using individualized neighbourhoods that allows the question of scale to be addressed in a flexible way and, at the same time, makes it possible to circumvent the indeterminacy that plagues context effect studies that use administratively defined areas to measure neighbourhood context.

This paper, thus, has a two-fold aim. The first aim is to complement earlier studies of neighbourhood effects on ill-health with a study that analyses whether there are contextual effects on recovery of health. The second aim is to analyse whether contextual measures based on individualized, scalable neighbourhoods can give better estimates of contextual effects than traditional area-based measures.

METHODS AND DATA

Increasing interest in analysing the effects of neighbourhood context on health and other individual outcomes has been accompanied by a discussion about the methodological difficulties involved in establishing causal links. A key question has been to what extent self-selection of individuals into neighbourhoods will make it difficult to estimate true contextual effects using observational data (Diez Roux and Mair, 2010).

If individuals were randomly selected into neighbourhoods, causal effects of neighbourhood context would be reflected in statistically significant differences in outcomes across neighbourhoods. It is, however, common to argue that individuals are selected into neighbourhoods on the basis of observed and unobserved characteristics (Baker, Bentley and Mason, 2013). This implies that the assignment of individuals to neighbourhoods is non-random and, as a consequence, it becomes difficult to tell if differing outcomes between neighbourhoods are caused by the selection process or by contextual effects.

A potential remedy for this problem is to estimate neighbourhood effects using individual level background variables to control for differences in composition. However, as argued by Oakes (2004), success in controlling for differences in composition would at the same time reduce differences in outcome across neighbourhoods that are due to contextual effects. Oakes has been criticized by Subramanian (2004), who acknowledges parts of Oakes' argument but maintains that there are empirical designs that can circumvent the selection problem.

In this paper we will use two different approaches to address the problems involved in estimating contextual effects. First, instead of relying on a selection equation to control for individual level background variables we will concentrate our analysis on a sample of individuals that is as homogeneous as possible with respect to risk factors for ill-health. Second, by using measures of the socio-economic context that are not statistical aggregates for a given neighbourhood area but are based on individualized scalable neighbourhoods, we also break the identity between neighbourhood population composition and our measure of the socio-economic environment.

DATA

Our data come from the Population and Labour Market, Chorology Database (PLACE) at the Department of Human Geography at Uppsala University. This database contains register-based, longitudinal, individual level data from Statistics Sweden for the population in Sweden from 1990 to 2010, with geocodes of the residential location by 100 meter squares. For each year the data contain more than 100 different individual-level variables covering demographic information, education, occupation, employment, social insurance, and different income measures. Earlier studies using the same data to study health outcomes include Fransson and Hartig (2010) and Hartig and Fransson (2009).

SAMPLE

We use data for the years 2000-2010 and our sample has been constructed in three steps. First, we selected individuals who were between 30 and 56 years of age in 2000. Second, from this group we excluded individuals who in any of the years 2000, 2001, or 2002 received unemployment benefits, received social allowance, received sickness benefits, did not have wage income, or were not in employment in November. Finally, we also excluded individuals who did not receive sickness benefits in 2003.

The rationale behind this selection procedure is to get a sample that is as homogeneous as possible with respect to risk factors for ill-health. The age bracket chosen excludes ages where people have relatively few health problems (below 30) and ages where people experience increasing health problems (above 56). In step two, individuals with socio-economic risk factors associated with ill-health are excluded. This gives a group with low levels of observed socio-economic risk factors. However, some of them will have high levels of non-observed risk factors. In order to control for this difference in unobserved risk factors in the sample we exclude, in the third step, individuals who stay healthy in 2003. This, to a large degree, will eliminate individuals who have low unobserved risks. Thus, our final sample consists of individuals who not only have low observed risk factors but also have similar levels of unobserved risk.
The advantage of having a sample that is homogeneous with respect to risk factors for ill-health is that differences in outcomes across neighbourhoods for this group will not be strongly influenced by risk-factor based sorting. This implies that estimated neighbourhood effects can be given a causal interpretation.
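The three selection steps used to build the sample can be sketched in code. This is a minimal illustration only: the record layout and all field names (`age_2000`, `unemployment_benefit`, `employed_november`, and so on) are hypothetical and are not the variable names used in the PLACE database.

```python
# Hypothetical record layout: one dict per person holding the age in 2000
# and one dict of yearly indicators. None of these names come from PLACE.
RISK_YEARS = (2000, 2001, 2002)

def in_sample(person):
    """Return True if the person passes all three selection steps."""
    # Step 1: aged 30 to 56 in the year 2000.
    if not 30 <= person["age_2000"] <= 56:
        return False
    # Step 2: no observed socio-economic risk factor in 2000-2002.
    for year in RISK_YEARS:
        rec = person["years"][year]
        if (rec["unemployment_benefit"] or rec["social_allowance"]
                or rec["sickness_benefit"] or not rec["wage_income"]
                or not rec["employed_november"]):
            return False
    # Step 3: keep only those who received sickness benefits in 2003.
    return bool(person["years"][2003]["sickness_benefit"])

def healthy_year():
    """Yearly record with no risk factor present."""
    return {"unemployment_benefit": False, "social_allowance": False,
            "sickness_benefit": False, "wage_income": True,
            "employed_november": True}

def make_person(age, sick_2003, sick_2001=False):
    """Build a toy person; only the fields the filter reads are filled in."""
    years = {y: healthy_year() for y in RISK_YEARS}
    years[2001]["sickness_benefit"] = sick_2001
    years[2003] = {"sickness_benefit": sick_2003}
    return {"age_2000": age, "years": years}

people = [
    make_person(40, sick_2003=True),                  # kept
    make_person(40, sick_2003=False),                 # dropped in step 3
    make_person(40, sick_2003=True, sick_2001=True),  # dropped in step 2
    make_person(60, sick_2003=True),                  # dropped in step 1
]
sample = [p for p in people if in_sample(p)]
print(len(sample))  # prints 1
```

Writing the filter as a single predicate keeps the three steps in the order given in the text, so each exclusion can be traced to the step that caused it.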

OUTCOME VARIABLE

In 2000-2004 sickness benefits registered by Statistics Sweden were paid to employees who had been absent for more than two weeks (three weeks from 1 July 2003). This implies that reception of sickness benefits is an indicator of a relatively severe illness and that it can signal the onset of continuing health problems (Malmberg, Andersson and Subramanian, 2010). Most of the individuals included in our sample, however, received neither sickness benefits nor early retirement benefits in 2004. That is, from a social insurance perspective, they recovered from illness in 2004. In this study we will analyse neighbourhood effects on the recovery from ill-health, with recovery defined as non-reception of sickness and early retirement benefits in 2004 among individuals who received sickness benefits in 2003. The study design is illustrated in Figure 1.

[Figure 1: flow diagram. 2000, 2001 and 2002 (healthy): not sick, no social allowance, not unemployed, wage income, in employment. 2003 (healthy or sick?): individuals with sickness benefit (sick) are retained. 2004: sickness benefit or no sickness benefit.]

  FIGURE  1.  STUDY  DESIGN.  
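The recovery outcome at the end of this design reduces to a simple rule, sketched below. The function and argument names are illustrative assumptions, not names from the register data.

```python
def recovered_2004(sickness_benefit_2004, early_retirement_2004):
    """Recovery in the social-insurance sense: neither benefit received in 2004."""
    return not sickness_benefit_2004 and not early_retirement_2004

# (sickness benefit 2004, early retirement benefit 2004) for four
# hypothetical individuals who all received sickness benefits in 2003.
cases = [(False, False), (True, False), (False, True), (True, True)]
outcomes = [int(recovered_2004(s, e)) for s, e in cases]
print(outcomes)  # prints [1, 0, 0, 0]
```

Only the first case counts as recovered; receiving either benefit in 2004 is coded as non-recovery.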

INDIVIDUAL LEVEL CONTROL VARIABLES

Our approach to the elimination of selection effects has relied primarily on selecting a sample of individuals with similar observed and unobserved risk for ill-health. Given that probabilities of recovery from ill-health can also be influenced by demographic factors we will, however, employ three individual level variables in our model of recovery from ill-health: age, sex, and immigration status. The latter variable can take five different values: Swedish-born, arrived before 1975, arrived 1975-1989, arrived 1990-94, and arrived 1995 or later.

CONTEXTUAL MEASUREMENT

Our approach to context measurement introduces two important novelties. First, and most importantly, we introduce contextual measures that are based on individually defined and scalable neighborhoods. Second, we introduce a factor-analysis based representation of the spatial variation in socio-demographic context as a means to manage the wealth of information resulting from scalability (Andersson and Malmberg, 2013; Malmberg, Andersson and Bergsten, 2013).

INDIVIDUALLY DEFINED AND SCALABLE NEIGHBORHOOD, EQUIPOP

In this study we measure neighborhood population compositions using individual-centered neighborhoods with fixed population size. We have used register data containing information on individual residential location to compute contextual variables based on the population composition among an individual's nearest 12, 25, 50, 100, 200, 400, 800, 1600, 3200, 6400, and 12800 neighbors in 2003.

In order to measure the population composition in individually defined neighborhoods we have used Equipop, a spatial analysis program developed in 2011 by John Östh in collaboration with Eva Andersson and Bo Malmberg (Equipop version 2012-Feb-20). Equipop was first developed in order to address the modifiable areal unit problem, MAUP, in segregation measurement. As shown in Malmberg, Andersson and Östh (2011), traditional measures of segregation such as the isolation index are strongly dependent on the size of the statistical units for which the segregation index has been computed. In many cases, variation in segregation values is more influenced by varying areal subdivisions than by variation in residential patterns. In the Equipop software, the individualized neighborhoods are obtained by expanding a circular buffer around each residential location until the population encircled by the buffer corresponds to the population threshold chosen. When this threshold is reached, the program computes an aggregate statistic for the encircled population for a selected socio-economic variable. Equipop requires that the input data be geocoded at a detailed level.
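The idea of a k-nearest-neighbor context measure can be illustrated with a small brute-force sketch. It is not Equipop's algorithm or interface: the function name `knn_share`, the toy coordinates, and the choice to exclude the focal individual from his or her own neighborhood are all assumptions made for illustration.

```python
import math

def knn_share(coords, indicator, k):
    """For each individual, the share of the k nearest others with indicator 1."""
    shares = []
    for i, (xi, yi) in enumerate(coords):
        # Distances to every other individual (brute force; toy data only).
        # Assumption: the focal individual is not counted as a neighbor.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        nearest = [j for _, j in others[:k]]
        shares.append(sum(indicator[j] for j in nearest) / k)
    return shares

# Six hypothetical residents in two spatial clusters; 1 = in employment.
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
employed = [1, 1, 0, 0, 0, 1]
for k in (2, 5):
    print(k, knn_share(coords, employed, k))
```

In this toy data the third resident has an employment share of 1.0 among the 2 nearest neighbors but 0.6 among the 5 nearest, which is the kind of scale dependence the individualized approach is meant to expose.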
We have used data from the PLACE database. From these data, 6 different socio-demographic indicators have been extracted and used as input for Equipop, see Table 1. The variables used in this study should be seen as examples of variables that could be of interest. There is certainly room for including other indicators in order to explore other environmental dimensions, for example crime (Lorenc et al., 2012).

TABLE 1. CONTEXT VARIABLES RUN IN EQUIPOP FOR K NEAREST NEIGHBORS IN 2003.

Variable                  Description                                            Year   Population
Education, young          1 = university/college, 0 = not university/college     2003   30-49
Education, old            1 = university/college, 0 = not university/college     2003   50-64
Sickness benefit, young   1 = Sickness benefit                                   2003   30-49
Sickness benefit, old     1 = Sickness benefit                                   2003   50-64
Employment, young         1 = In employment (November)                           2003   30-49
Employment, old           1 = In employment (November)                           2003   50-64

Number of neighbors (k), identical for all six variables: 12, 25, 50, 100, 200, 400, 800, 1600, 3200, 6400, 12800

 

 

FACTOR-ANALYSIS BASED REPRESENTATION OF CONTEXTUAL VARIATION

With 6 different socio-demographic indicators and 11 different levels of neighborhood scale we obtain a total of 66 different contextual variables. Clearly, such a large number of contextual variables cannot be included as explanatory variables without problems. Moreover, many of the indicators are strongly correlated, for example contextual indicators based on the same socio-economic indicator but computed for different neighborhood sizes. In order to make the analysis manageable we have, therefore, subjected the contextual indicators to a factor analysis that compresses the 66 original indicators to 10 orthogonal factors that jointly capture 79% of the original variation. The factor analysis was based on covariances, and the principal components to be rotated were those with eigenvalues higher than one. The factors were rotated using the varimax method.

Some factors load mainly on contextual variables computed for small numbers of neighbors (k), whereas other factors load mainly on variables computed for large numbers of neighbors. This result of the factor analysis is clearly of interest since it provides an opportunity to analyze the scale dependence of contextual effects. Table 2 shows the descriptive names of factors one to eight and indicates the scale of interest.

TABLE 2. CONTEXT DESCRIBED BY INDIVIDUALIZED NEIGHBORHOODS FOR 2003.

Factor no.   Factor name
Factor 1     Elite areas
Factor 2     High employment
Factor 3     Sick, adjacent areas
Factor 4     High employment, adjacent areas
Factor 5     Young sick
Factor 6     Old sick
Factor 7     High employment, small scale
Factor 8     Elite, adjacent areas
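The compression of many correlated contextual indicators into a few rotated factors can be sketched as follows. This is an illustration on simulated data, not the computation behind Table 2: the toy data, the standardization of the indicators before computing covariances, and the function name `varimax` are assumptions, and the rotation shown is the widely used SVD-based varimax iteration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, m = loadings.shape
    rotation = np.eye(m)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break  # criterion no longer improving
        objective = new_objective
    return loadings @ rotation

# Toy data: two latent dimensions driving 12 indicators (all simulated).
rng = np.random.default_rng(0)
n, n_indicators = 500, 12
latent = rng.normal(size=(n, 2))
weights = rng.normal(size=(2, n_indicators))
X = latent @ weights + 0.5 * rng.normal(size=(n, n_indicators))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: standardized inputs

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0  # retain components with eigenvalue above one
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
rotated = varimax(loadings)
explained = eigvals[keep].sum() / eigvals.sum()
print(int(keep.sum()), "factors retained, share of variance:", round(float(explained), 2))
```

Because the rotation is orthogonal, it redistributes the loadings across factors without changing how much of each indicator's variance the retained factors capture, which is why the rotated factors remain usable as uncorrelated regressors.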

Figure 2 shows diagrams of what the different factors represent. This illustration is important since we are going to include factor scores as explanatory variables in the logistic regression of recovery from ill-health. Without an interpretation of the different factors it will be difficult to interpret the regression results.

Factor 1 Elite areas. High values of this factor in a location result in high shares of people with tertiary education, both young and old, a high employment share, and low sickness benefits for the young group.

Factor 2 High employment. High values on this factor result in high employment shares for both the young and the old group.

Factor 3 Sick, adjacent areas. High values of this factor imply a high level of sickness in adjacent areas and low shares of people with tertiary education.

Factor 4 High employment, adjacent areas. High values on this factor result in high levels of employment in adjacent areas.

Factor 5 Young sick. High values on this factor result in high levels of sickness for the young group and, to some extent, for the old group.

Factor 6 Old sick. This factor contributes to high shares of the older group having sickness benefits.

Factor 7 High employment, small scale. Factor 7 is similar to Factor 2, with the difference that Factor 7 has an effect mainly on neighborhood scales below 400 persons.

Factor 8 Elite, adjacent areas. Factor 8 is similar to Factor 1, with the difference that Factor 8 has an effect mainly for neighborhood scales above 1000 persons.

 

 

[Figure 2: eight panels plotting factor loadings (vertical axis) against the number of nearest neighbors k (horizontal axis, ticks at 10, 100, 1000 and 10000). Series shown: Tertiary old, Tertiary young, Employment share old, Employment share young, Sickness benefit old, Sickness benefit young. Panel titles: Factor 1 Elite; Factor 2 High employment; Factor 3 Sick, adjacent areas; Factor 4 High employment adjacent areas; Factor 5 Young sick; Factor 6 Sick old max for k=100; Factor 7 Employment old max for k=100; Factor 8 Tertiary adjacent areas.]

FIGURE 2. FACTORS AND LOADINGS. (TO REDUCE CLUTTER, THESE GRAPHS ONLY INCLUDE FACTORS THAT FOR AT LEAST ONE K-LEVEL HAVE A LOADING HIGHER THAN 0.2 OR LOWER THAN -0.2.)

MODELS

We will estimate 5 models. The first three are logistic regressions using recovery to health in 2004 as the dependent variable. Model 1 uses only individual level control variables. Model 2 is our main model and uses both individual level variables and contextual variables based on individualized neighborhoods. Model 3 additionally includes interactions between contextual factors and individual characteristics (gender and migratory status) among the explanatory variables.

Model 5 is similar to Model 2 but uses contextual variables based on fixed statistical areas (called SAMS in the Swedish context).

Model 6 takes advantage of the fact that we have data on survival up to year 2010 for the individuals in our sample. These data are used to estimate a proportional hazard model of survival with the same set of explanatory variables. These estimates are used to check whether it is warranted to use sickness benefits to measure ill-health. A potential criticism of our estimates is that sickness benefits need not reflect true health status since these benefits are part of the social safety net and, thus, could also be used for providing income support for individuals without jobs.
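For orientation, the three logistic models can be written out as follows. The notation is introduced here for illustration and is not taken from the paper: $y_i$ equals 1 if individual $i$ recovers in 2004, $x_i$ collects the individual level controls (age, sex, immigration status), $f_{ij}$ is individual $i$'s score on contextual factor $j$, and $z_{ic}$ denotes the individual characteristics (gender and migratory status) that are interacted with the factors.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(y_i = 1) = \alpha + x_i^{\top}\beta \\
\text{Model 2:}\quad & \operatorname{logit} P(y_i = 1) = \alpha + x_i^{\top}\beta + \sum_{j} \gamma_j f_{ij} \\
\text{Model 3:}\quad & \operatorname{logit} P(y_i = 1) = \alpha + x_i^{\top}\beta + \sum_{j} \gamma_j f_{ij}
  + \sum_{j} \sum_{c} \delta_{jc}\, f_{ij}\, z_{ic}
\end{align*}
```

Under this reading, Model 1 is nested in Model 2 (all $\gamma_j = 0$), and Model 2 is nested in Model 3 (all $\delta_{jc} = 0$), so the models can be compared through their log likelihoods.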

RESULTS

Table 3 presents a comparison of whole-model results for Model 1 and Model 2. The comparison shows that the inclusion of contextual variables implies a significant increase in the explanatory power of the model.

TABLE 3. COMPARISON OF -LOG LIKELIHOOD BETWEEN MODELS 1 AND 2.

   

          Variables          Comparison         Difference, -LogLikelihood   DF   Chi2    Prob.
Model 1   Individual level   Full vs. Reduced   359                          7    717.8