Entropy and Accidents

Karl Kim, Pradip Pant, Eric Yamashita, and I Made Brunner

This study explored the relationships between land use entropy (the extent to which land uses are mixed, heterogeneous, and nonuniform) and motor vehicle accidents. Two aspects of entropy were considered: (a) the mix of jobs and housing and (b) the diversity of jobs in addition to the mix of jobs and housing. These measures were developed and tested with census data and geographic information system technologies combined with comprehensive police accident reports from the city and county of Honolulu, Hawaii. A grid-based approach was adopted with accident counts and negative binomial regression. Various types of accident counts were considered, including total, daytime, and nighttime accidents, as well as accidents involving tourists, nonuse of seat belts, and driving under the influence of alcohol. Grid-based characteristics were also considered, such as distance from the urban center, traffic volume, roadway length, transit use, land values, and roadway configuration (intersections versus dead ends). Although entropy plays a statistically significant role, especially for total accident counts, daytime accidents, and accidents involving tourists, the relationships involving effects such as volume, roadway length, distance to the central business district, and transit use are generally more readily detected than entropy effects. Although the research shed additional light on the complex and subtle relationships between land use and accidents, implications for both traffic safety and modeling of spatial phenomena were also apparent. Rather than examine accidents without consideration of driver characteristics and vehicle and roadway factors, this study estimated interactions between human and vehicle factors while also taking into account differences in environmental conditions and land uses that affect crashes at different spatial resolutions.

This paper explores the relationships between land use entropy and motor vehicle accidents. The research builds on efforts to explain the relationships between urban form and travel behavior advanced by Handy (1) and extended by Cervero and Kockelman (2) and others who have tackled the questions of how density, diversity, and design influence transportation. Although current research focuses on the linkages between urban form and walking, biking, and public transport use (3–5) and effects on obesity and public health (6, 7), this paper investigates the connections between entropy and accidents. Similar to the definition provided by Cervero and Kockelman (2), land use entropy is defined as the extent to which land uses are heterogeneous, mixed, and nonuniform. Sufficient evidence exists to suspect that urban form and residential choices affect travel behavior (8). The presence of parks has been shown to change physical activity patterns (9); and other correlates between land use, urban form, and transport decisions exist (10).

As the mix of different land uses increases, travel behavior should change. With increased entropy (land use mix), greater complexity, more competing agents, and increased activities would be expected, and these developments would in turn be expected to increase interactions and promote an increased risk of accidents, collisions, injuries, fatalities, and costs. Increased entropy brings a greater assortment of driver, vehicle, roadway, and environmental factors that come into play, exacerbating roadway hazards. The entropy effects would also be incurred at the boundaries and edges of districts where movements and transitions from one set of conditions to another would increase accident risk. Although entropy may be difficult to model, a reasonable hypothesis is that entropy exacerbates accidents.

On the basis of earlier research in Hawaii on land use (11, 12) and accessibility and accidents (13) through the use of a variety of tools, including geographic information systems and geospatial–temporal models (14, 15) and structural equation modeling (16), it is evident that human factors and roadway factors such as volume and roadway configuration play stronger roles than land use or the mix of activity generators in explaining accident risk. Yet, Hawaii provides a unique and compelling environment with which to explore the subtle and challenging connections between land use and safety. Although Hawaii conjures up images of swaying palms and sandy beaches, it is an ideal setting in which to explore the relationships between land use and accidents. First, because of its remote location and island setting, it provides an isolated laboratory in which to focus on the interactions between driver, vehicle, roadway, and environmental factors. Second, with only four counties and a high degree of centralization for reporting of accidents to police, accident data are generally of higher quality and consistency than those from other more distributed systems. Third, the research presented in this paper builds on more than two decades of experience working with accident, land use, demographic, and transportation interactions.

Overview of Research

The paper is structured in the following manner. After a description of the data sources and methods, the accident types and independent variables, including land use, traffic volume, roadway characteristics, transit use, and roadway patterns, as well as land values and entropy measures, are summarized with standard descriptive statistics (means, variance, skewness, kurtosis, etc.). Then, the accident data and entropy measures are mapped and described. Efforts to build, test, and validate the negative binomial regression models are recounted. The best-fitting models are presented along with a discussion that compares the models with one another and summarizes their similarities and differences. In a concluding section, the implications and contributions of the work for traffic safety programs and future research endeavors are described.

Department of Urban and Regional Planning, University of Hawaii at Manoa, Saunders 107, 2424 Maile Way, Honolulu, HI 96822. Corresponding author: K. Kim, [email protected]. Transportation Research Record: Journal of the Transportation Research Board, No. 2280, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 173–182. DOI: 10.3141/2280-19


Transportation Research Record 2280

Data and Methods

The data used in this study principally came from the Department of Transportation, State of Hawaii, and were collected by the Honolulu Police Department. Honolulu is the largest and most complex of the four counties in Hawaii. It covers the entire island of Oahu. Approximately 860,000 people reside on this island. Oahu also hosts more than 120,000 tourists and visitors (students, military, etc.) on any given day. It contains a mix of diverse land uses, including conservation areas (where development is restricted), military facilities (Army, Navy, Air Force, and Marine Corps bases), as well as larger residential and commercial–business districts, including the world-renowned Waikiki resort district on the southeastern shore of Oahu. Because of the mix of roadway types (freeways, highways, arterials, local roads) and because of a well-developed transport system, including the state’s largest bus system, Honolulu is an ideal setting in which to study entropy and accidents.

The accident, transport, and land use data were geographically coded to a uniform grid structure that comprised cells of 0.1 mi² (approximately 64 acres) in area. A uniform grid structure was used in place of census tracts and block groups, and information from cadastral files on land use, employment, and traffic was assigned to each cell. The conversion of thematic maps to grid-level data required various spatial manipulation routines with geographic information systems. Grid cells of 0.1 mi² were first overlaid on a digital map of the island of Oahu. The resulting polygon vector grid map was then used as the base map onto which different thematic maps (maps of population distribution, land use, employment distribution, etc.) were overlaid. Different approaches were used to distribute the polygon thematic map attributes to the reference grid map. For example, if a grid contained several different land uses, the most dominant land use type was assigned as the grid land use. Geographic processing routines were used to aggregate road length, intersections, and dead-end roads within each grid.

Weighted values for grid cells were calculated to capture the influences of the surrounding grid cells. The value of each cell was aggregated with the values of the two tiers of neighboring grid cells surrounding it, with weighting coefficients of 0.5 and 0.25, respectively, to reflect the effects of distance on the surrounding cells. This weighting smoothed the grid values and reduced the effects of spatial correlation.

Table 1 shows a list of the 19 variables used in this study. Nine are dependent variables that characterize the different types of accident counts by grid cell: TotAcc (total number of accidents), DayAcc (total daytime accidents), NightAcc (total nighttime accidents), ResAcc (total accidents involving residents), TourAcc (total accidents involving tourists), BeltAcc (total accidents involving belted motorists), NoBelt (total accidents involving unbelted motorists), Alc (total accidents involving drivers under the influence of alcohol), and NoAlc (total accidents without driving under the influence). Table 1 also contains descriptions of the 10 independent variables. LandUse is a categorical variable involving five land use categories (LandUse 1 to LandUse 5: preservation, military, agricultural, residential, and commercial, which includes business and resort uses). CBDdist is the distance from the center of the grid cell to the center of the central business district (CBD), Volume is a categorical variable for traffic volume (no traffic and low, medium, and high traffic) evaluated on the basis of the most dominant of the road links’ traffic conditions within the grid, RoadLen refers to the total roadway mileage in the grid cell, BusLen is a measure of the total length of bus route mileage in the grid cell, BusStop contains the number of bus stops in each grid cell, DeadInter is the weighted ratio of the number of dead-end streets to the number of intersections in the grid cell, TotVal is the estimated

TABLE 1   Variable Description, Unit, and Type

Variable | Description | Unit | Variable Type
Dependent
TotAcc | Total number of accidents | Number per grid | Count
DayAcc | Number of accidents during day | Number per grid | Count
NightAcc | Number of accidents during night | Number per grid | Count
ResAcc | Number of accidents involving only Hawaii residents | Number per grid | Count
TourAcc | Number of accidents involving visitors or tourists | Number per grid | Count
BeltAcc | Number of accidents where seat belt or restraint was used | Number per grid | Count
NoBelt | Number of accidents where seat belt or restraint was not used | Number per grid | Count
Alc | Number of accidents involving alcohol use | Number per grid | Count
NoAlc | Number of accidents where alcohol was not involved | Number per grid | Count
Independent
LandUse | Land use classification | na | Nominal
CBDdist | Distance of grid from Honolulu downtown | Mile | Ratio
Volume | Average traffic condition in the grid | na | Ordinal
RoadLen | Weighted length of road in the grid | Mile | Ratio
BusLen | Weighted length of bus route in the grid | Mile | Ratio
BusStop | Weighted number of bus stops in the grid | Number | Ratio
DeadInter | Weighted ratio of road dead end to intersection | Percentage | Interval
TotVal | Weighted total value of the development in the grid | Millions of dollars | Ratio
Entropy | Weighted mixed-use index value of the grid | Index | Interval
DivIndex | Weighted diversity index of the grid | Index | Interval

Note: na = not applicable.

Kim, Pant, Yamashita, and Brunner


total value of all real property in the cell, Entropy refers to the mix of jobs and housing within the grid cell, and DivIndex is an index measuring the cumulative mix of jobs in the individual cell.

Variables such as entropy and diversity help operationalize how land use mix influences travel behavior. These variables can be derived from various attributes, such as land area, floor area, or employment mix. Krizek defines entropy as the evenness of the distribution of usage between several land use categories (17). Cervero and Kockelman (2) and Kockelman (18) used a dissimilarity index and entropy to define diversity. Dissimilarity is defined as the proportion of dissimilar land uses among hectare grid cells within a census tract, and entropy is defined as a measure of homogeneity and heterogeneity (2). In many cases, land use balance, land use mixing, and accessibility were found to be more relevant (as measured by elasticities) than several of the household and traveler characteristics that form the basis of most travel behavior prediction models (18). Hossack applied a land use index as a measure of diversity to calculate, visualize, and analyze land use mixing for the Sacramento, California, Area Council of Governments (19).

In this research, two variables, entropy and the diversity index, were developed to measure and model the homogeneity and heterogeneity of land use. The entropy variable is a land use mix value within each grid cell and is calculated from the population and employment in the grid cell:

\[
\mathrm{entropy} = 1 - \frac{\left( \mathrm{EMP} - R_{\mathrm{Oahu}} \times \mathrm{POP} \right)}{\mathrm{EMP} + R_{\mathrm{Oahu}} \times \mathrm{POP}} \tag{1}
\]

and

\[
R_{\mathrm{Oahu}} = \frac{\mathrm{EMP}_{\mathrm{Oahu}}}{\mathrm{POP}_{\mathrm{Oahu}}} \tag{2}
\]

where

EMP = total employment within each grid cell,
POP = total population within each grid cell, and
ROahu = ratio of employment on the island of Oahu (EMPOahu) to population on the island of Oahu (POPOahu).

The diversity variable (DivIndex) is an index of the diversity of jobs in the grid cell. It is obtained from population and employment in different sectors, such that

\[
\mathrm{DivIndex} = \left( \sum_{i=1}^{n} W_i \times R_i \right) \times 100 \tag{3}
\]

\[
R_i =
\begin{cases}
\dfrac{\mathrm{POP} \times \beta_i}{\mathrm{SEMP}_i} & \text{if } \mathrm{POP} \times \beta_i \le \mathrm{SEMP}_i \\[2ex]
\dfrac{\mathrm{SEMP}_i}{\mathrm{POP} \times \beta_i} & \text{if } \mathrm{POP} \times \beta_i > \mathrm{SEMP}_i
\end{cases} \tag{4}
\]

\[
\beta_i = \frac{(\mathrm{SEMP}_{\mathrm{Oahu}})_i}{\mathrm{POP}_{\mathrm{Oahu}}} \tag{5}
\]

where

n = number of sectors,
Wi = weight factor for sector i,
POP = population within the grid,
SEMPi = sectoral employment in sector i within the grid, and
βi = ratio of employment in sector i (SEMPOahu)i to population (POPOahu) on Oahu.

Both the entropy and diversity variables were rescaled to range from 0 to 100, such that 0 indicates homogeneity and 100 indicates heterogeneity. To incorporate the influence of surrounding grid cells, the value of each predictor variable for a grid cell was added to the values for the first and second tiers of surrounding grid cells, weighted by 0.5 and 0.25, respectively, which provides a weighted value for each cell.

A combination of nominal, ordinal, interval, and ratio variables was used. Although the dependent variables (TotAcc, DayAcc, NightAcc, ResAcc, TourAcc, BeltAcc, NoBelt, Alc, NoAlc) are expressed as counts, the independent variables include class (nominal) variables (LandUse), ordinal variables (Volume), interval variables (DeadInter, Entropy, DivIndex), and ratio variables (CBDdist, RoadLen, BusLen, BusStop, TotVal). Table 2 shows more details on the categories of land use and estimated traffic volumes for the grid cells used in this analysis. The LandUse variable includes both built-up areas, such as residential and commercial areas, and less developed areas, including agricultural and preservation lands. The preservation category includes conservation lands, mountainous areas, and places where development is restricted. Military lands are included because of the large military presence on Oahu. Table 2 also shows details on the categorization of traffic into no traffic and traffic with low, medium, and high volumes.

Three methods of analysis are used in this paper: (a) descriptive statistics, (b) geographic information systems and mapping technologies, and (c) negative binomial modeling. In addition to the production of means and variances, statistics describing the shape of the distributions (skewness and kurtosis) are also provided because these give valuable clues about the proper functions for regression modeling.
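The entropy and diversity measures of Equations 1 through 5, together with the two-tier neighbor weighting, can be illustrated with a minimal Python sketch. Several choices in it are assumptions made here and are not stated in the paper: the function names are hypothetical, the numerator of Equation 1 is taken in absolute value so that single-use cells score 0, the sector weights Wi are supplied by the user, cells with no jobs and no residents are scored 0, the tiers are treated as square rings of 8 and 16 cells, and cells outside the study area contribute nothing.

```python
import numpy as np


def entropy_index(emp, pop, r_oahu):
    """Jobs-housing mix per grid cell, after Equation 1, scaled to 0-100.

    Assumption: the numerator is taken in absolute value so that cells with
    only jobs or only residents score 0 and balanced cells score 100.
    """
    emp = np.asarray(emp, dtype=float)
    pop = np.asarray(pop, dtype=float)
    scaled_pop = r_oahu * pop                    # R_Oahu x POP
    denom = emp + scaled_pop
    safe = np.where(denom > 0, denom, 1.0)       # avoid division by zero
    mix = 1.0 - np.abs(emp - scaled_pop) / safe
    # Assumption: empty cells (no jobs, no residents) are scored 0.
    return 100.0 * np.where(denom > 0, mix, 0.0)


def diversity_index(pop, semp, beta, weights):
    """Job diversity for one grid cell, after Equations 3 and 4.

    semp, beta, and weights are per-sector sequences. R_i in Equation 4 is
    the smaller of (POP x beta_i, SEMP_i) divided by the larger.
    """
    semp = np.asarray(semp, dtype=float)
    expected = pop * np.asarray(beta, dtype=float)   # POP x beta_i
    larger = np.maximum(expected, semp)
    smaller = np.minimum(expected, semp)
    safe = np.where(larger > 0, larger, 1.0)
    r = np.where(larger > 0, smaller / safe, 0.0)
    return float(np.sum(np.asarray(weights, dtype=float) * r) * 100.0)


def tier_weighted(grid):
    """Add first-tier (x0.5) and second-tier (x0.25) neighbors to each cell.

    Assumption: a tier is the square ring of cells at Chebyshev distance
    1 or 2; cells beyond the edge of the grid count as zero.
    """
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    padded = np.pad(grid, 2)                     # zero padding, two cells wide
    tier_weight = (1.0, 0.5, 0.25)
    out = np.zeros_like(grid)
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            tier = max(abs(dr), abs(dc))
            window = padded[2 + dr:2 + dr + rows, 2 + dc:2 + dc + cols]
            out += tier_weight[tier] * window
    return out
```

Under these assumptions, an interior cell surrounded by identical neighbors receives nine times its own value (1 + 8 × 0.5 + 16 × 0.25), so a weighted index can exceed 100 even though the unweighted index is bounded by 100.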
By looking at the variances of both dependent and independent variables, one can envision a strategy for modeling and testing the statistical relationships between these variables. Various forms of the Poisson and negative binomial models were estimated, and the data were found to best fit the negative binomial model. The negative binomial model is frequently used with event count data and has been recommended for use in accident data analysis (20). The principal method used to investigate the influences of entropy on accidents is therefore negative binomial regression. Negative binomial models are particularly robust in instances of overdispersion (when the variance exceeds the mean of the observations). The negative binomial distribution, commonly abbreviated as NB, is derived from a Poisson–gamma mixture distribution. It has two parameters: the mean and a dispersion parameter. The relationship between the expected

TABLE 2   Land Use Categories and Traffic Volumes

Variable | Class Value | Description
LandUse | 1 | Preservation land
LandUse | 2 | Military lands (bases, facilities, training areas)
LandUse | 3 | Agriculture
LandUse | 4 | Residential (single family, duplex, low density)
LandUse | 5 | Commercial (includes business and resort)
Volume | 1 | Virtually no traffic (absence of paved roads)
Volume | 2 | Low traffic volume
Volume | 3 | Medium traffic volume
Volume | 4 | High traffic volume


number of accidents (Yi) occurring at grid i with a set of q predictor variables, Xi1, Xi2, . . . , Xiq, is (20, 21)

\[
\operatorname{function}(Y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_q X_{iq} \tag{6}
\]

where β0, β1, . . . , βq are the regression coefficients and where it is assumed that the number of accidents (Yi) follows a negative binomial distribution with parameters α and d (with 0 ≤ α ≤ 1 and d ≥ 0). That is, the probability (Pr) that a grid defined by a known set of predictor variables, Xi1, Xi2, . . . , Xiq, experiences Yi = yi accidents can be expressed as

\[
\Pr(Y_i = y_i; \alpha, d) = \frac{(y_i + d - 1)!}{y_i!\,(d - 1)!} \, \frac{\alpha^{y_i}}{(1 + \alpha)^{y_i + d}}, \qquad y_i = 0, 1, 2, \ldots \tag{7}
\]

where the parameters and regression coefficients β0, β1, . . . , βq for the negative binomial regression can be estimated by use of a generalized linear model procedure, such as PROC GENMOD in SAS statistical software (22).

Findings

This section presents the findings of the study. First, descriptive statistics are used to characterize the data according to measures of central tendency and dispersion. The skewness and kurtosis measures are also provided. Second, the results of the geospatial analysis are presented, including measures of accidents, entropy, and diversity of jobs. Finally, the regression modeling results are described and compared.

Table 3 shows an overview of the data used in this study. The top half of the table contains data on the accident counts, beginning with the total count of accidents by grid cell, which ranges from 0 to 229 accidents, with a mean of 5.8 and a variance of 285.4. The total number of accidents is 16,212. The distributions are clearly nonnormal, as evidenced by the skew (a lack of symmetry in the distribution) and kurtosis (peakedness). The count data provide a validity check on the data, showing that the count of accidents during the day (10,083) exceeds the number for nighttime accidents (6,129). In

a state with high rates of seat belt use, the number of accidents in which seat belts were used (13,646) greatly exceeds the number in which they were not used (2,566). Similarly, the number of accidents without alcohol-impaired drivers (15,334) exceeds the number of accidents with impaired drivers (878). Accidents involving Hawaii residents (13,200) exceed the number of accidents involving tourists or visitors (3,012). The accident types with the largest means include TotAcc (total accidents), BeltAcc (belted accidents), NoAlc (no alcohol), and ResAcc (accidents involving drivers with a Hawaii driver’s license). The lower-frequency events include accidents involving alcohol (878) and accidents involving nonuse of a seat belt (2,566). Among the independent (explanatory) variables at the bottom of Table 3, land use entropy (Entropy) and job diversity (DivIndex), along with property valuations (TotVal) and several of the transport values (DeadInter, CBDdist, etc.), show the greatest means and variances. Values for both land use categories and traffic volumes show a much narrower range.

Figure 1 shows the distribution of accidents in Honolulu. It is not surprising that accidents are concentrated in the downtown area and along the major east-to-west and coastal arterial road network. Figure 2 shows the weighted entropy index for the grid cells. The index value for each grid cell is added to that of the first and second tiers of surrounding cells weighted by factors of 0.5 and 0.25, respectively, to obtain a weighted value. Similarly, Figure 3 shows a cumulative diversity index for the grid cells. The three figures give a preliminary indication that accidents should correlate with both types of land use indices (Entropy and DivIndex).

Negative Binomial Regression Model Results

Before the results and findings for the negative binomial regression model are presented, some discussion of the modeling strategy and validation is needed. As noted earlier, negative binomial modeling was selected because of the presence of overdispersion. This provides a powerful tool for examining a range of different variables. In addition to examination of the maximum likelihood estimates, parameter values, Wald 95% confidence intervals, and chi-square

TABLE 3   Data Used in Study

Variable | Mean | Variance | Minimum | Maximum | Total | Skewness | Kurtosis
Accident Type
TotAcc | 5.8 | 285.4 | 0.0 | 229.0 | 16,212 | 5.7 | 42.3
DayAcc | 3.6 | 121.2 | 0.0 | 139.0 | 10,083 | 5.9 | 45.0
NightAcc | 2.2 | 38.9 | 0.0 | 90.0 | 6,129 | 5.5 | 42.1
ResAcc | 4.7 | 190.7 | 0.0 | 191.0 | 13,200 | 5.8 | 43.8
TourAcc | 1.1 | 12.5 | 0.0 | 38.0 | 3,012 | 5.4 | 35.5
BeltAcc | 4.9 | 212.5 | 0.0 | 203.0 | 13,646 | 5.8 | 44.9
NoBelt | 0.9 | 7.0 | 0.0 | 27.0 | 2,566 | 5.1 | 33.1
Alc | 0.3 | 1.1 | 0.0 | 14.0 | 878 | 5.5 | 41.2
NoAlc | 5.5 | 257.8 | 0.0 | 215.0 | 15,334 | 5.7 | 42.3
Grid Characteristic
LandUse | 2.7 | 1.4 | 1.0 | 5.0 | na | −0.1 | −1.0
CBDdist | 13.9 | 55.7 | 0.5 | 32.5 | na | 0.3 | −0.9
Volume | 1.7 | 1.0 | 1.0 | 4.0 | na | 1.1 | 0.1
RoadLen | 6.3 | 23.2 | 0.3 | 22.5 | na | 1.1 | 0.3
BusLen | 9.1 | 207.7 | 0.0 | 117.0 | na | 2.9 | 10.0
BusStop | 10.2 | 155.5 | 0.0 | 69.8 | na | 1.9 | 3.8
DeadInter | 49.0 | 2,173.3 | 0.0 | 900.0 | na | 6.2 | 80.3
TotVal | 371.9 | 239,979.7 | 0.0 | 3,288.9 | na | 2.2 | 6.1
Entropy | 96.3 | 12,567.5 | 0.0 | 595.3 | na | 1.5 | 2.2
DivIndex | 37.9 | 3,172.8 | 0.0 | 410.7 | na | 2.7 | 9.0
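The overdispersion that motivates the negative binomial model can be checked directly from the means and variances of the accident counts reported in Table 3. The short Python sketch below computes the variance-to-mean ratio for each accident type; a ratio well above 1 is inconsistent with a Poisson model. The function name is hypothetical, and the second quantity it returns is a method-of-moments dispersion under the common parameterization Var = μ + kμ², which is an assumption made here for illustration and is not the parameterization of Equation 7.

```python
# Mean and variance of accident counts per grid cell, copied from Table 3.
TABLE3_ACCIDENTS = {
    "TotAcc": (5.8, 285.4),
    "DayAcc": (3.6, 121.2),
    "NightAcc": (2.2, 38.9),
    "ResAcc": (4.7, 190.7),
    "TourAcc": (1.1, 12.5),
    "BeltAcc": (4.9, 212.5),
    "NoBelt": (0.9, 7.0),
    "Alc": (0.3, 1.1),
    "NoAlc": (5.5, 257.8),
}


def dispersion_summary(mean, variance):
    """Return (variance/mean ratio, moment estimate of k in Var = mu + k*mu^2).

    A Poisson variable has ratio 1 and k = 0; larger values indicate
    overdispersion. The k parameterization is an illustrative assumption.
    """
    ratio = variance / mean
    k_moment = (variance - mean) / mean ** 2
    return ratio, k_moment


summary = {
    name: dispersion_summary(mean, variance)
    for name, (mean, variance) in TABLE3_ACCIDENTS.items()
}

for name, (ratio, k_moment) in summary.items():
    print(f"{name:9s} variance/mean = {ratio:6.1f}   k (moments) = {k_moment:5.2f}")
```

Every accident type in Table 3 has a variance larger than its mean, which is consistent with the choice of a negative binomial rather than a Poisson specification.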


FIGURE 1   Spatial distribution of accidents, 2002 to 2004 (number of grid cells = 2,786, number of accidents = 16,212). Map legend: accidents per grid in 10 classes, from 1–25 to 210–230; scale bar, 0 to 16 miles.

FIGURE 2   Entropy based on population and job mix. Map legend: mixed use index in 10 classes, from 0.05–60 to 540–600; scale bar, 0 to 16 miles.


FIGURE 3   Diversity based on cumulative job mix. Map legend: diversity index in 10 classes, up to 370–420; scale bar, 0 to 16 miles.

goodness-of-fit test statistics on each of the parameters, the dispersion values were also generated to ensure the appropriateness of the model. The model was compared with the intercept-only model. The model p-value was calculated from the following statistic:

\[
\chi^2 = 2 \times \left( \mathrm{FLL}_m - \mathrm{FLL}_n \right) \tag{8}
\]

where FLLm is the model full log likelihood and FLLn is the intercept-only null model full log likelihood. The number of degrees of freedom for the chi-square value is the number of predictor variables in the model. If the p-value is less than .05, the model is statistically significant.

In the initial modeling efforts, the diversity variable failed to be significant, so entropy as defined earlier became the key independent variable. Other nonsignificant predictors (p > .05), including property valuations (TotVal) and the ratio of the number of dead ends to the number of intersections (DeadInter), were also dropped from the model. Depending on the accident count formulation (total, day, night, belted, alcohol, etc.), various effects were found to be either significant or not.

The results for the best-fitting model are presented in Table 4. The column Predicted Event in Table 4 shows the incidence rate ratio, which offers another way of looking at the results besides the parameter estimates. This ratio was obtained in SAS with an option in the GENMOD procedure. The accident rate for the grids used for military uses (LandUse 2) was 0.31 times that for the reference group (preservation lands, LandUse 1). Accident rates for residential use grids (LandUse 4) and commercial use grids (LandUse 5) are 2.08 and 1.66 times the rate for the reference group (preservation lands). Because the parameter estimate for lands zoned for agriculture is not statistically significant, no accident rate for that group is provided. For traffic volume, the accident rate for grids with low traffic (Volume 2) is 5.55 times that for the reference group (grids with no traffic, Volume 1). The accident rates for grids with medium traffic (Volume 3) and high traffic (Volume 4) are 7.33 and 6.83 times the rate for the reference group (Volume 1).

With the interval-scale independent variables, the predicted event value provides a percent change in the accident rate for every unit increase in the independent variables. Accordingly, each mile in distance from the core downtown area of Honolulu (CBDdist) reduced the total accident rate by 2.08%. Similarly, the weighted length of road miles in the grid (RoadLen), the weighted length of bus route

TABLE 4   Best-Fitting Model Parameter Estimates and Predicted Events for Total Accidents TotAcc Parameter

Estimate

Probability

Predicted Event

Independent Variable Range

Intercept LandUse 2 LandUse 3 LandUse 4 LandUse 5 CBDdist Volume 2 Volume 3 Volume 4 RoadLen BusLen BusStop Entropy

−1.007 −1.1828 −0.004 0.7309 0.5076 −0.0211 1.7146 1.9923 1.9216 0.0564 0.0196 0.0205 0.0015