A Comparison between Multivariate Poisson Gamma Mixture Model and Univariate Negative Binomial Count Models to Examine Types of Accident Frequencies on Freeway

Ghasak Al-Mothafer a, Toshiyuki Yamamoto b, Venkataraman N Shankar c

a Department of Civil Engineering, Nagoya University, Nagoya, 464-8603, Japan
b EcoTopia Science Institute, Nagoya University, Nagoya, 464-8603, Japan
c Department of Civil and Environmental Engineering, Penn State University, Pennsylvania, 16802, USA
a E-mail: [email protected]
b E-mail: [email protected]
c E-mail: [email protected]

Abstract: This paper investigates the relationships among accident types on freeway sections using three years (2005-2007) of accident data for 275 multilane freeway segments in the State of Washington, U.S.A. Rear-end, sideswipe and fixed-object accidents are considered, together with a combined category of other types. To capture correlations among the different accident types and the effects of explanatory variables, while making full use of the available accident count records, a multivariate Poisson gamma mixture count model (MVPGM) is implemented. The model permits a restricted correlation pattern that allows only positive correlation among accident types. The model parameters are estimated by maximum likelihood. The empirical results show significant error correlations across accident types. The proposed model reveals significant unobserved correlations among the frequencies of different accident types, and it represents the covariance structure of accident types and the variance of the total number of accidents better than separate univariate models. The results also reveal that rear-end accidents are more likely to be affected by the geometric and traffic characteristics of the freeway.

Keywords: Multivariate Count Data, Poisson Gamma Mixture, Accident Types, Correlations.

1. INTRODUCTION

The various types of accidents that occur on freeways need to be accommodated and investigated. They include rear-end, sideswipe, fixed-object, same-direction, opposite-direction, head-on and other types. When a vehicle encounters different geometric, weather and traffic conditions, the driver responds differently, which can lead to a specific type of accident. Accident frequencies by type can be modeled separately or jointly as a function of freeway characteristics and environmental conditions. When a joint approach is adopted, the correlation is accommodated by treating the error term as unobserved factors that simultaneously affect the occurrence of the accident types.

An enormous body of literature is devoted to modeling accidents and safety. Lord & Mannering (2010) provide an extensive review and assessment of methodological alternatives for accident-frequency data. Accident frequency and severity are two key indices that measure risk for a roadway segment. The classical approach uses a conventional frequency model to predict the total number of accidents (a univariate model). For example, Lord et al. (2005) compared Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle accidents. Most studies that consider accident types in accident frequency modeling use them as an explanatory variable (Chiou et al., 2013; Shaheed et al., 2013; Gkritza et al., 2010; Yan et al., 2011; Yang et al., 2011; Mannering et al., 1996), while others study a single accident type at a specific facility in depth (Das and Abdel-Aty, 2011; Dissanayake and Lu, 2002). The need to estimate accident types simultaneously has been recognized in the literature (Ma et al., 2008; Park and Lord, 2007).

Multivariate frequency models have been developed and applied in the field of transportation before. Several researchers have investigated freeway accident statistics and developed models of accident frequencies for freeways. Ye et al. (2013) modeled accident frequency by severity on freeways using a simultaneous-equations Poisson-log-normal model with normally distributed error components. Anastasopoulos et al. (2012) used a multivariate Tobit model, considering accident rates instead of accident frequencies. Chiou and Fu (2013) modeled accident frequency by severity using a multinomial-generalized Poisson model with error components, while Dong et al. (2014) used a multivariate random-parameters zero-inflated negative binomial regression model to estimate accident frequencies of different types at intersections.

Basically, five multivariate count models are available to estimate the correlation among frequencies by accident type or severity: the multivariate Poisson model, the multivariate negative binomial model, the multivariate Poisson-gamma mixture model, the multivariate Poisson-log-normal model and the latent Poisson-normal model. The multivariate Poisson-log-normal model has been used extensively in the literature for both accident types and severity.
Ye et al. (2013) and Chiou and Fu (2013) used the maximum simulated likelihood method to estimate the parameters of this model, while Park and Lord (2007), Basyouny and Sayed (2009), Ma et al. (2008) and Ma et al. (2006) used the Bayesian approach. Nevertheless, there is a concern that these models can be time consuming when applied to high-dimensional multivariate data (Winkleman, 2008). The other model is the multivariate latent Poisson-normal model, which proposes a non-linear parameterization of the thresholds as a function of exogenous variables (Castro et al., 2013; Castro et al., 2012; Yasmin et al., 2014; Bhat et al., 2014). Complexity and the lack of a closed form for the joint probability are the main hindrances to estimating the parameters of these models.

The Poisson gamma mixture model was first introduced by Hausman et al. (1984), and further explanation was given by Dey and Chung (1992). In their model, correlation is generated by an individual-specific multiplicative error term. Miles (2001) applied it to individual consumer data on the number of purchases of bread and cookies in a one-week period, parameterizing the Poisson gamma mixture probability and estimating it by maximum likelihood. Kockelman (2001) applied the same model as Miles (2001) to time- and budget-constrained activity demand analysis. This model offers a closed form and is easy to estimate by maximum likelihood.

Thus, this paper develops a multivariate Poisson gamma mixture (MVPGM) model to simultaneously model accident frequencies by type (count data), considering the effects of various roadway, geometric and traffic volume factors on accident frequencies. Moreover, the proposed model considers the covariance matrix among accident types through error components specified under an integrated model framework.
Model estimation is achieved through maximum likelihood estimation (MLE), which provides consistent and efficient parameter estimates as well as test statistics for hypothesis testing.

The rest of the paper is structured as follows. The next section presents the accident type data. Section 3 presents the building blocks of the model in terms of formulation and inference. Section 4 illustrates an application of the proposed model to accident type counts on a freeway. The fifth and final section offers concluding thoughts and directions for further research.

2. DATA

The accident dataset was obtained for the multilane divided Interstate highway No. 5 in the State of Washington, USA. Three years of accident data were collected, from 2005 to 2007. The data comprise three categories: (1) the accident database; (2) geometric characteristics; and (3) traffic information. The accident database includes accident information such as crash type and severity. These highways, which are part of the National Highway System, are considered critical routes because of their high economic importance (Ye et al., 2013). The freeway consists of 275 roadway segments of varying lengths, with a mean segment length of roughly 0.87 miles and a standard deviation of about 0.60 miles. For each roadway segment, accidents were classified by year, and individual accident reports on the segment were aggregated by accident type. Hence, accident frequency counts by type were obtained for each freeway segment. In total, 13,359 individual accidents were included in this study.

Accident frequencies by type and year are shown in Table 1. The table shows that rear-end, sideswipe and fixed-object accidents occur most frequently. Because of the limited number of overturn, same-direction, head-on and other accidents, it was not feasible to distinguish statistically among these four categories. Hence, the four categories were combined into a single category (other types). The distribution of accident frequencies among segments per year for each accident type is shown in Figure 1. The figure shows that zero counts make up 3% of the rear-end accident observations across all segments.
In the same way, the shares of zero counts for sideswipe, fixed-object and other types of accidents are 12%, 11% and 19%, respectively. Because zero is not the dominant count in the data, a zero-inflated count model is not considered in this study.

Data from the Washington State Department of Transportation databases were used for the geometric and traffic characteristics of each roadway segment. Geometric data include the percentage of lanes cross section proportion by length of segment, minimum and maximum radii of horizontal curves, central angle of horizontal curves, grade, minimum grade, maximum grade, grade differential, tangent length, number of changes in grade, number of horizontal curves per segment, number of vertical curves per segment, presence of interchanges and presence of exits and entrances. Traffic operations data include average annual daily traffic.

Table 1 Accident frequencies by type and year
(Overturn, Same direction, Head-on and Others together form the "Other types" category.)

Year    Total accidents   Rear end   Sideswipe   Fixed objects   Overturn   Same direction   Head-on   Others
2005    4550              2578       775         684             79         228              6         199
2006    4519              2543       785         683             82         265              1         160
2007    4290              2391       817         637             81         192              5         166
Total   13359             7512       2377        2004            242        685              12        525

Figure 1 Distribution of accident frequency by type: (a) Rear end; (b) Sideswipe; (c) Fixed objects; (d) Other types

Table 2 provides the mean and standard deviation of selected variables in the dataset. There are basically 12 exogenous variables representing the traffic volume and geometry of each freeway segment; logarithmic transformations of annual average daily traffic volume and of length in miles are used in the model. Four endogenous variables represent the accident types considered in this study.

Table 2 Descriptive statistics

Abbr.      Mean     St. dev.   Variable
Accident types
Re         8.56     16.26      Number of rear-end accidents per year
Ss         2.89     5.71       Number of sideswipe accidents per year
Fo         2.44     3.32       Number of fixed object accidents per year
Oth        1.78     2.42       Other types [same direction, overturn, head-on]
Explanatory variables including geometric and traffic volume
AADT       17207    7151       Annual average daily traffic volume in vehicles per hour
LnAADT     9.75     8.87       Logarithm of AADT
Length     0.87     0.60       Length in miles
LnLength   -0.13    -0.51      Logarithm of segment length
Urorru     0.27     0.44       Urban rural dummy, 1 if rural, 0 if urban
Nl345      0.66     0.47       Percentage of three lanes or larger (up to 5 lanes) cross section proportion by length of segment
Nhorz      1.57     1.51       Number of horizontal alignments per segment
Diamond    0.48     0.50       Interchange type dummy for diamond ramps
Maxvcrvk   847.8    1836.6     Largest vertical curve rate of vertical curvature in segment
Minvcrvg   1.20     1.23       Smallest vertical curve gradient in segment
Minvcdis   0.08     0.07       Shortest vertical curve length in segment in miles
Maxvcrvc   -1.75    142.47     Largest beginning vertical curve elevation in segment
Maxhcdel   1536.5   1534.9     Longest horizontal curve central angle in segment
Nvert      2.59     2.00       Number of vertical curves in segment

3. MODEL SPECIFICATION

3.1 Selection of the Count Model

In this study a multivariate Poisson-gamma mixture model (MVPGM) is used to analyze accident types, where rear-end, sideswipe, fixed-object and other types are considered and the correlations among these types are taken into account. A univariate negative binomial (NB) regression model, which is commonly used for modeling accident frequencies (Cameron and Trivedi, 1986), is estimated first. Although the purpose of this paper is to develop a multivariate model of accident type frequencies, the univariate NB model is presented for three purposes. The first is to investigate over- or under-dispersion, that is, a variance larger or smaller than the mean, which is the most common issue for accident data (Mannering and Bhat, 2014). The second is to assist the selection of the most significant explanatory variables for each accident type. Finally, it serves as a reference for comparison with the proposed model.

3.2 Univariate NB Model

The expected number of accidents is denoted λi, and the count data model is formulated as follows:

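$$\lambda_i = \exp(x_i^T \beta) \tag{1}$$

The probability of a specific number of accidents occurring on freeway segment i is:

$$F(y_i) = \frac{\Gamma(y_i + \theta)}{y_i!\,\Gamma(\theta)} \left(\frac{\lambda_i}{\lambda_i + \theta}\right)^{y_i} \left(\frac{\theta}{\theta + \lambda_i}\right)^{\theta} \tag{2}$$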

where,
i : index of segment,
F(yi) : probability of having yi accidents,
yi : number of accidents on segment i,
θ : dispersion parameter of the negative binomial model,
λi : expected number of accidents on segment i,
xi : set of explanatory variables on segment i, and
β : weight of each explanatory variable.

3.3 MVPGM Model Specification

Multivariate count data are likely to have a non-trivial correlation structure. Modeling the correlation structure is important for the efficiency of the estimated parameters and the computation of correct standard errors, i.e., valid inference (Winkleman, 2008). The Poisson-gamma mixture model can be generalized and extended to allow for unobserved heterogeneity and overdispersion in the respective marginal distributions. The proposed model captures the error through a one-factor structure: the correlation is generated by an individual-specific multiplicative error term. This model was presented by Hausman et al. (1984). In this model, the error term represents individual-specific unobserved heterogeneity. The multivariate mixture density of the accident counts yi is obtained by integrating over the error term:

$$f(y_{ji} \mid x_{ji}) = \int \left[\prod_{j=1}^{J} \frac{\exp(-\lambda_{ji} u_i)\,(\lambda_{ji} u_i)^{y_{ji}}}{\Gamma(y_{ji} + 1)}\right] g(u_i)\, du_i \tag{3}$$

where,
j : index of the type of accident,
g(ui) : mixture function,
yji : number of accidents of type j on segment i, and
λji : expected number of accidents of type j on segment i.

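If ui is gamma distributed with E(ui) = 1 and Var(ui) = θ^(-1), it can be shown that the joint distribution function of yi is of a negative binomial form (Winkleman, 2008), given as

$$f(y_{ji} \mid x_{ji}) = \left[\prod_{j=1}^{J} \frac{\lambda_{ji}^{y_{ji}}}{\Gamma(y_{ji} + 1)}\right] \frac{\theta^{\theta}}{\Gamma(\theta)} \int e^{-u_i(\lambda_{Ti} + \theta)}\, u_i^{y_{Ti} + \theta - 1}\, du_i \tag{4}$$

$$= \left[\prod_{j=1}^{J} \frac{\lambda_{ji}^{y_{ji}}}{\Gamma(y_{ji} + 1)}\right] \frac{\Gamma(y_{Ti} + \theta)\,\theta^{\theta}}{\Gamma(\theta)}\, (\lambda_{Ti} + \theta)^{-(y_{Ti} + \theta)} \tag{5}$$

where,
yTi : sum of the number of accidents of different types on segment i, and
λTi : sum of the expected numbers of accidents of different types on segment i.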
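The closed form in Eq. (5) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values of `lam` and `theta` are illustrative assumptions, not estimates from this study. It evaluates the log of Eq. (5), confirms that the probabilities sum to about one over a large grid of counts, and confirms that with a single accident type the expression reduces to the univariate NB probability in Eq. (2).

```python
import math
from itertools import product


def mvpgm_logpmf(y, lam, theta):
    """Log of the joint probability in Eq. (5) for one segment."""
    y_tot = sum(y)
    lam_tot = sum(lam)
    out = math.lgamma(y_tot + theta) - math.lgamma(theta)
    out += theta * math.log(theta) - (y_tot + theta) * math.log(lam_tot + theta)
    for y_j, lam_j in zip(y, lam):
        out += y_j * math.log(lam_j) - math.lgamma(y_j + 1)
    return out


def nb_logpmf(y, lam, theta):
    """Log of the univariate NB probability in Eq. (2)."""
    return (math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
            + y * math.log(lam / (lam + theta))
            + theta * math.log(theta / (theta + lam)))


# Illustrative values only (hypothetical, not taken from the paper).
lam = [1.2, 0.7]
theta = 2.5

# Joint probabilities over a large grid of count pairs should sum to about 1.
total = sum(math.exp(mvpgm_logpmf(y, lam, theta))
            for y in product(range(80), repeat=2))

# With one accident type the joint form should coincide with Eq. (2).
gap = abs(mvpgm_logpmf([3], [1.2], theta) - nb_logpmf(3, 1.2, theta))

print(total, gap)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when counts are large, which is why the sketch exponentiates only at the end.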

This model is very closely related to the univariate Poisson-gamma mixture that leads to the univariate negative binomial distribution. The only difference is that the mixing is over a common variable ui rather than over independent gamma variables uji (Winkleman, 2008). The covariance between outcomes for a given individual can be derived as follows:

$$\begin{aligned}
\mathrm{Cov}(y_{ji}, y_{ki}) &= E_u\big[\mathrm{Cov}(y_{ji}, y_{ki} \mid u_i)\big] + \mathrm{Cov}_u\big[E(y_{ji} \mid u_i),\, E(y_{ki} \mid u_i)\big] \\
&= 0 + \mathrm{Cov}_u(\lambda_{ji} u_i,\, \lambda_{ki} u_i) \\
&= \theta^{-1} \lambda_{ji} \lambda_{ki},
\end{aligned} \tag{6}$$

where,
j, k : indices of the types of accident (j ≠ k),
Cov : covariance, and
Eu : expected value with respect to ui.

The first term is zero because the counts are independent Poisson variables once ui is given.
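The covariance result in Eq. (6) can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than part of the original analysis: the values of `lam_j`, `lam_k` and `theta` are hypothetical, the helper names are invented, and the Poisson sampler is a simple textbook method suited only to small means. It draws a common gamma error with mean 1 and variance 1/θ, generates two Poisson counts that share it, and compares the sample covariance with θ^(-1) λj λk.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method, small means only)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_cov(lam_j, lam_k, theta, n, seed=0):
    """Sample covariance of two counts that share one gamma error term."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.gammavariate(theta, 1.0 / theta)  # E(u) = 1, Var(u) = 1/theta
        pairs.append((poisson(rng, lam_j * u), poisson(rng, lam_k * u)))
    mean_j = sum(a for a, _ in pairs) / n
    mean_k = sum(b for _, b in pairs) / n
    return sum((a - mean_j) * (b - mean_k) for a, b in pairs) / (n - 1)


# Hypothetical values chosen for illustration.
lam_j, lam_k, theta = 3.0, 2.0, 2.5
sim = simulate_cov(lam_j, lam_k, theta, n=100_000)
theory = lam_j * lam_k / theta  # Eq. (6)
print(sim, theory)
```

The simulated value should fall close to the theoretical one, with the gap shrinking as `n` grows.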

This model does not have an "equi-covariance" property. Rather, for a specific individual the covariance is an increasing function of the product of the expected values λji λki. This could be a useful feature for modeling non-negative random variables. In particular, it eliminates the strict upper bound on the correlation that is observed for other types of multivariate count models. The unconditional correlation between two count variables is given as

$$\mathrm{Cor}[y_{ji}, y_{ki}] = \frac{\lambda_{ji}\lambda_{ki}}{\sqrt{(\theta\lambda_{ji} + \lambda_{ji}^2)(\theta\lambda_{ki} + \lambda_{ki}^2)}} \tag{7}$$
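As a small worked illustration of the correlation in Eq. (7), the sketch below divides the covariance θ^(-1) λj λk from Eq. (6) by the product of the NB standard deviations, each of the form sqrt(λ + λ²/θ). The function name is an assumption; θ = 2.482 is the MVPGM dispersion estimate reported in Section 4.2, while the λ values are arbitrary.

```python
import math


def mvpgm_corr(lam_j, lam_k, theta):
    """Correlation between two accident-type counts on one segment, Eq. (7)."""
    return lam_j * lam_k / math.sqrt(
        (theta * lam_j + lam_j ** 2) * (theta * lam_k + lam_k ** 2))


theta = 2.482  # MVPGM dispersion estimate reported in Section 4.2
corrs = [mvpgm_corr(lam, lam, theta) for lam in (0.5, 2.0, 8.0)]  # arbitrary means
print(corrs)
```

For equal expected values the expression simplifies to λ / (θ + λ), so the correlation is always positive, rises with the expected counts and stays below one.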

A potential disadvantage of this model is that the covariance is not determined independently of the dispersion. Hence, a finding of a significant θ can be as much an indicator of overdispersion in the data as it might be an indicator of correlation (or both) (Winkleman, 2008). By considering four categories, the expected numbers of accidents by type are given as

$$\lambda_{ji} = \exp(x_{ji}^T \beta_j) \tag{8}$$

where,
i : index of segment (i = 1, ..., n),
j : index of the type of accident (j = 1, 2, 3, 4),
λji : expected number of accidents of type j on segment i,
xji : set of explanatory variables on segment i for accident type j, and
βj : weight of each explanatory variable for accident type j.

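Rewriting the joint probability in Eq. (5):

$$P(y_{1i}, y_{2i}, y_{3i}, y_{4i} \mid x_i) = \frac{\Gamma(y_{1i} + y_{2i} + y_{3i} + y_{4i} + \theta)}{y_{1i}!\, y_{2i}!\, y_{3i}!\, y_{4i}!\, \Gamma(\theta)} \left(\frac{\lambda_{1i}}{\lambda_{Ti} + \theta}\right)^{y_{1i}} \left(\frac{\lambda_{2i}}{\lambda_{Ti} + \theta}\right)^{y_{2i}} \left(\frac{\lambda_{3i}}{\lambda_{Ti} + \theta}\right)^{y_{3i}} \left(\frac{\lambda_{4i}}{\lambda_{Ti} + \theta}\right)^{y_{4i}} \left(\frac{\theta}{\lambda_{Ti} + \theta}\right)^{\theta} \tag{9}$$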

Then the log-likelihood over the population of N observations is given as

$$\log L(y_{1i}, y_{2i}, y_{3i}, y_{4i} \mid x_i) = \sum_{i=1}^{N} \Big[ \log\Gamma(y_{Ti} + \theta) - \log\Gamma(\theta) + \theta\log\theta - (y_{Ti} + \theta)\log(\lambda_{Ti} + \theta) + \sum_{j=1}^{4} \big( y_{ji}\log\lambda_{ji} - \log(y_{ji}!) \big) \Big] \tag{10}$$

where,
P : joint probability of all accident types,
log L : log-likelihood of the joint probability, and
θ : dispersion parameter common to all accident types.
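To make the estimation step concrete, the sketch below evaluates the sample log-likelihood as the sum over segments of the log of Eq. (5), with λji taken from Eq. (8). It is an illustration under stated assumptions, not the GAUSS code used in the study: the function name, the toy data `X` and `Y`, and the coefficients `betas` are invented, and all accident types share the same covariates for brevity, whereas the paper allows type-specific sets. The grid search over θ only shows how the function could be handed to an optimizer.

```python
import math


def mvpgm_loglik(betas, theta, X, Y):
    """Sum over segments of the log of Eq. (5), with lambda_ji from Eq. (8).

    betas: one coefficient list per accident type
    X:     one covariate list per segment (shared across types for brevity)
    Y:     one list of type-specific counts per segment
    """
    total = 0.0
    for x_i, y_i in zip(X, Y):
        lam = [math.exp(sum(b * x for b, x in zip(beta_j, x_i)))
               for beta_j in betas]
        lam_tot, y_tot = sum(lam), sum(y_i)
        total += (math.lgamma(y_tot + theta) - math.lgamma(theta)
                  + theta * math.log(theta)
                  - (y_tot + theta) * math.log(lam_tot + theta))
        total += sum(y * math.log(l) - math.lgamma(y + 1)
                     for y, l in zip(y_i, lam))
    return total


# Toy data invented for illustration: intercept plus one covariate, two types.
X = [[1.0, 0.2], [1.0, 1.1], [1.0, 0.7]]
Y = [[2, 0], [5, 1], [3, 2]]
betas = [[0.8, 0.6], [-0.5, 0.9]]

# Crude grid search over theta with betas held fixed; a full fit would
# maximize over betas and theta jointly with a quasi-Newton method.
grid = [0.5 + 0.5 * k for k in range(40)]
best_theta = max(grid, key=lambda t: mvpgm_loglik(betas, t, X, Y))
best_value = mvpgm_loglik(betas, best_theta, X, Y)
print(best_theta, best_value)
```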

The log-likelihood function was coded in GAUSS (Aptech, 1999), and the default BFGS algorithm provided by the maxlik procedure in GAUSS was used to maximize it.

4. MODEL ESTIMATION AND PERFORMANCE

4.1 Estimation Results of Univariate NB Model

First, a univariate NB model was estimated for the total number of accidents as a function of the exogenous variables. In the same manner, four separate univariate NB models, one for each accident type, were estimated without cross-equation correlation to measure the influence of specific exogenous variables. The results are shown in Table 3. The value of θ is estimated to be 3.668 in the NB model for the total number of accidents. This parameter is statistically significant, as evidenced by the large t-value. A dispersion test was carried out to test for overdispersion. The dispersion test value is 1.633 for the total number of accidents, compared with zero under the null hypothesis, which means that overdispersion occurs, i.e., the negative binomial model should be used instead of the Poisson count model. The values of θ for each accident type are 1.718 for rear-end accidents, 4.560 for sideswipe accidents, 3.528 for fixed-object accidents and 4.759 for the other types.

As the focus of this study is on explaining accident frequencies by type and the effect of each explanatory variable, a multivariate Poisson gamma mixture model is developed in which the parameters of all accident types are estimated together in a single model. This model offers a closed form for the joint probability and is easy to estimate. The contribution of each explanatory variable is discussed in the elasticity section.

Table 3 Univariate NB models of total and type-specific accidents (coefficients)

Explanatory variable   Total accidents   Rear end    Sideswipe   Fixed object   Other types
Constant               -12.724**         -23.528**   -16.317**   -6.904**       -11.123**
LnAADT                 1.521**           2.503**     1.704**     0.790**        1.178**
LnLength               0.746**           0.546**     0.708**     0.885**        0.997**
urorru                 -0.322**          -0.647**    -0.718**    -0.204*        -
Nl345                  0.764**           0.950**     0.611**     0.363**        0.458**
Nhorz                  0.089**           0.156**     0.118**     0.070**        -
Diamond                -0.283**          -0.262**    -0.281**    -0.244**       -
θ                      3.668**           1.718**     4.560**     3.528**        4.759**
Sample size            822               822         822         822            822
L(β)                   -2507.8           -1473.9     -1350.8     -1473.9        -1278.0
AIC                    5037.6            2507.8      2721.6      2967.7         2566.1
BIC                    5089.4            5089.4      2768.7      3014.8         2589.6

Remaining coefficients, in the row order Minvcrvg, Minvcdis, Maxvcrvk, Maxcrvc, Maxhcdel, Nvert with not-relevant entries omitted: Total accidents 0.115**, -2.173**, -0.025*; Sideswipe 0.072*, 0.884*; Fixed object 0.495*, -0.048*; Rear end and Other types none.

- Not relevant; ** Significant at 1% level; * Significant at 5% level. L(β): Log-likelihood at convergence
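The information criteria in Table 3 can be reproduced from the log-likelihood with the standard definitions AIC = -2L + 2k and BIC = -2L + k ln(n). The sketch below applies them to the sideswipe model, assuming that k counts its nine coefficients plus θ and that n is the sample size of 822; the function names are illustrative.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Sideswipe model in Table 3: L(beta) = -1350.8, assumed k = 9 coefficients + theta.
sideswipe_aic = aic(-1350.8, 10)
sideswipe_bic = bic(-1350.8, 10, 822)
print(round(sideswipe_aic, 1), round(sideswipe_bic, 1))
```

Under these assumptions the results should agree with the tabulated 2721.6 and 2768.7 to the reported precision.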

4.2 Estimation Results of MVPGM Model

Estimation results of the MVPGM regression model with correlation are presented in Table 4. The estimation results provide parameter estimates for four types of accidents. The overdispersion parameter is estimated to be 2.482. This parameter is statistically significant, as evidenced by the large t-value. The log-likelihood is -6995.093, while AIC and BIC are 7025.093 and 12365.718, respectively. The combined log-likelihood value of the univariate accident type models is -5576.527. The result indicates that the separate univariate models of accident types fit better than the proposed model. A possible reason is that the different sizes of overdispersion in the separate univariate models are ignored in the multivariate model, which assumes a single overdispersion parameter in order to represent the correlation among the frequencies of different types of accidents. In the univariate models, the values of θ are 1.718, 4.560, 3.528 and 4.759 for the rear-end, sideswipe, fixed-object and other types models, respectively. These values differ from one another, implying different sizes of unobserved heterogeneity for different accident types. On the other hand, the value of θ is 2.482 for the MVPGM model, which is within the range of the four values of θ for the univariate models but significantly different from each of them.

Table 4 MVPGM model of four types of accidents (coefficients)

Explanatory variable   Rear end    Sideswipe   Fixed object   Other types
Constant               -18.628**   -16.407**   -6.985**       -10.799**
LnAADT                 2.003**     1.718**     0.781**        1.148**
LnLength               0.584**     0.670**     0.724**        0.913**
urorru (dummy)         -0.825**    -0.677**    -0.194         -
Nl345                  1.171**     0.501**     0.371**        0.392**
Nhorz                  0.112**     0.129**     0.061*         -
Diamond                -0.352**    -0.244**    -0.162*        -
Minvcrvg               -           0.048*      -              -
Minvcdis               -           -           -              -
Maxcrvc                -           0.816**     -              -
Maxhcdel               -           -           0.439*         -
Nvert                  -           -           -0.005         -

θ (common)                      2.482**
Sample size                     822
Log-likelihood at convergence   -6995.1
AIC                             7025.1
BIC                             14191.5

- Not relevant; ** Significant at 1% level; * Significant at 5% level.

The model of accident frequencies by type uses almost the same set of variables as the univariate model of the total number of accidents, except for some variables describing vertical and horizontal curve characteristics, which are statistically insignificant in the accident type models. The values of the coefficients are generally quite different across the four types of accident frequencies. Each accident type has different geometric parameters because each accident type has different characteristics. A typical scenario for a rear-end accident is a sudden deceleration by the leading vehicle, so that the vehicle behind does not have time to brake and collides with it. A common factor that contributes to the rear-end accident

type is speed reduction. Since horizontal curves tend to reduce speed, a larger number of horizontal curves gives more chances for this type of accident to occur. This variable has a statistically significant positive coefficient at the 5% significance level. It is also statistically significant with positive coefficients for both sideswipe and fixed-object accidents. One possible reason is that when the leading vehicle slows down because of many horizontal curves, the following vehicle tends either to try to pass the leading vehicle, which might lead to a sideswipe, or to run off the road and hit a fixed object. The largest vertical curve gradient in segment is found to be significant at the 5% level for sideswipe accident frequency (0.048). A plausible explanation for why this variable is significant only for sideswipe accidents is its association with sight distance, which is one of the unobserved factors in this study. Another reason is that a flatter vertical curve, as denoted by the smallest vertical curve gradient in segment, might increase speed, which leads to more accidents of this type. Another variable associated with sideswipe accidents is the largest beginning vertical curve elevation in segment. This variable is found to be statistically significant (0.816) at the 1% level. When a leading vehicle reduces its speed because of the high elevation of the vertical curve, the following vehicle tries to pass it, which gives more chances for this type of accident to occur, especially if there is a heavy vehicle ahead. Fixed-object accidents usually occur when a vehicle leaves the roadway. Contributing factors often include loss of control, misjudging a horizontal curve, or attempting to avoid a collision with another road user or an animal. Misjudging a horizontal curve is captured by the longest horizontal curve central angle in segment.
This variable is found to be significant and positive (0.439) at the 5% level for fixed-object accidents only. It should be noted that all variables have coefficients with similar signs in the equations for all types of accident frequencies. Practically all explanatory variables have coefficient estimates with realistic values and signs. The logarithms of AADT and of freeway segment length are considered exposure variables. The urban-rural dummy has a negative coefficient, indicating that it reduces the number of rear-end, sideswipe and fixed-object accidents, but not the other types of accidents. This implies that a freeway segment in a rural area tends to have fewer accidents than one in an urban area. The nature of the other types, such as head-on, overturn and same-direction accidents, makes them occur in either urban or rural areas, which makes this dummy variable irrelevant for that category. The percentage of three lanes or larger (up to 5 lanes) cross section proportion by length of segment has a positive sign for all types of accidents: as the number of lanes per segment increases, the chance of accidents increases as well. The number of diamond ramps in segment has a significantly negative coefficient for rear-end, sideswipe and fixed-object accidents. A possible explanation is that the geometry of a diamond interchange provides smooth traffic flow with less likelihood of conflicts.

4.3 Representativeness of Variance and Covariance Structure

Table 5 shows the correlations among accident types for a given segment as calculated using Equation 7. A restriction of MVPGM is that it constrains the correlation among counts to be positive (Gurmu and Elder, 2000). The correlations range between 0.369 and 0.475, which demonstrates the presence of common unobserved factors that

affect accident type frequency. These common unobserved factors that influence accident frequencies by type include roadway surface conditions, environmental and weather conditions, driver population factors, adjacent land use characteristics, traffic composition variables (trucks, buses, etc.), sight distance and others (Ye et al., 2009).

Table 5 Estimated correlation matrix for a given segment

               Rear-End   Sideswipe   Fixed Object   Other Types
Rear-End       1.000
Sideswipe      0.455      1.000
Fixed Object   0.475      0.402       1.000
Other Types    0.436      0.369       0.395          1.000

In addition to the correlation for a given segment shown above, the estimated expected numbers of type-specific accidents calculated by Equation 8 are also correlated with each other because of the common explanatory variables. The former correlation results from the unobserved heterogeneity among segments, while the latter results from the observed heterogeneity. Here, the observed covariance of the number of accidents of different types should be represented by the sum of the two covariance matrices. The observed covariance of the number of accidents between accident types, the covariance calculated from the estimated MVPGM model and the covariance calculated from the estimated univariate models are presented in Table 6. The MVPGM covariance is the sum of two components: the covariance between the estimated expected numbers of type-specific accidents by Equation 8 and the covariance resulting from the unobserved heterogeneity given by Equation 6. On the other hand, the covariance calculated from the univariate models consists of the former only and does not include the latter, because the unobserved heterogeneities are assumed to be independent of each other. The table shows that the MVPGM covariance of each pair of accident types is closer to the corresponding observed covariance. The results suggest that MVPGM represents the covariance structure more accurately than the univariate models, although MVPGM has fewer parameters than the univariate models because of the restriction of the common overdispersion parameter.

Table 6 Covariance of the numbers of accidents between accident types

Pair of accident types          Observed   MVPGM   Univariate models
Rear end and sideswipe          74.45      68.12   51.41
Rear end and fixed object       34.35      35.48   24.76
Rear end and other types        22.98      25.84   18.12
Sideswipe and fixed objects     13.73      12.01   7.69
Sideswipe and other types       8.89       8.64    5.56
Fixed objects and other types   5.30       5.36    3.28
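The two-component covariance behind the MVPGM column of Table 6 can be sketched as follows. This is an illustration under assumptions: the paper does not spell out how Equation 6 is aggregated over segments, so the sketch averages θ^(-1) λji λki across segments, as the law of total covariance would suggest; the fitted values `lam_j` and `lam_k`, the value of `theta` and the function names are hypothetical.

```python
def covariance(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def mvpgm_covariance(lam_j, lam_k, theta):
    """Observed-heterogeneity part plus averaged unobserved part (Eq. 6)."""
    observed_part = covariance(lam_j, lam_k)
    unobserved_part = sum(a * b for a, b in zip(lam_j, lam_k)) / (len(lam_j) * theta)
    return observed_part + unobserved_part


# Hypothetical fitted expected counts for two accident types on three segments.
lam_j = [1.0, 2.0, 3.0]
lam_k = [2.0, 4.0, 6.0]
theta = 2.0

univariate_cov = covariance(lam_j, lam_k)         # first component only
joint_cov = mvpgm_covariance(lam_j, lam_k, theta)  # both components
print(univariate_cov, joint_cov)
```

Because the second component is never negative, the joint figure is at least as large as the one from independent univariate models, which matches the ordering of the last two columns of Table 6.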

Since the sum of the type-specific accidents is the total number of accidents, it is desirable that the variance structures of the type-specific models are consistent with that of the total accident model. Here, the variance structure consists of the variance of the expected number of accidents among segments and the variance of the number of accidents for a given segment. The latter is given by $\lambda_i + \theta^{-1}\lambda_i^2$. The variance of the expected number of accidents among segments can be calculated from the estimated type-specific accident models by

$$\mathrm{Var}\Big(\sum_{j=1}^{4} \lambda_j\Big) = \sum_{j=1}^{4} \mathrm{Var}(\lambda_j) + 2\sum_{j=1}^{3}\sum_{k=j+1}^{4} \mathrm{Cov}(\lambda_j, \lambda_k), \tag{11}$$

which is the same for MVPGM and univariate NB models. On the other hand, the variance of the number of accidents for a given segment can be calculated from the estimated MVPGM by

$$\lambda_i + \theta_T^{-1}\lambda_i^2 = \sum_{j=1}^{4} \big(\lambda_{ji} + \theta_M^{-1}\lambda_{ji}^2\big) + 2\sum_{j=1}^{3}\sum_{k=j+1}^{4} \theta_M^{-1}\lambda_{ji}\lambda_{ki}, \tag{12}$$

where,
θT : dispersion parameter of total accident model, and
θM : dispersion parameter of MVPGM,

and from the estimated univariate NB models by 4

(

)

λi + θT −1λi 2 = ∑ λ ji + θ j −1λ ji 2 , j =1

where,

(13)

θj

: dispersion parameter of univariate NB model for accident type j. The difference between Equation 12 and Equation 13 is that the dispersion parameters varies among accident types for the univariate models while identical for MVPGM, and that the covariance among different accident types is considered only for MVPGM. Variance of the expected number of accidents among segments and that of the number of accidents for a given segment calculated by total accident mode, MVPGM and univariate models are shown in Table 7. The table shows that both variance of the expected number of accidents among segments and that of the number of accidents for a given segment calculated by the estimated total accident model are closer to those calculated by the estimated MVPGM than those calculated by the estimated univariate models. The results suggest that MVPGM represents the variance structure of the total accidents as the sum of the type specific accidents better than the univariate NB models. Table 7 Variance structure of total accidents

| | Total accident model | MVPGM | Univariate models |
|---|---|---|---|
| Variance of the expected number of accidents among segments | 335.8 | 332.9 | 423.7 |
| Variance of the number of accidents for a given segment | 237.1 | 238.2 | 171.2 |
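The decomposition in Equations 11 to 13 can be illustrated numerically. The following Python sketch is not the estimation code behind Table 7; the expected counts `lam`, the dispersion values `theta_m` and `thetas`, and all function names are hypothetical and chosen only to show how the three expressions are evaluated.

```python
from itertools import combinations


def _mean(values):
    return sum(values) / len(values)


def _cov(a, b):
    """Population covariance (divide by n) of two equal-length sequences."""
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def between_segment_variance(lam):
    """Equation 11: lam[j][i] is the expected count of type j on segment i."""
    var_terms = sum(_cov(row, row) for row in lam)
    cov_terms = sum(_cov(lam[j], lam[k])
                    for j, k in combinations(range(len(lam)), 2))
    return var_terms + 2.0 * cov_terms


def within_segment_variance_mvpgm(lam_i, theta_m):
    """Equation 12: one shared dispersion parameter plus cross-type terms."""
    own = sum(l + l * l / theta_m for l in lam_i)
    cross = sum(lam_i[j] * lam_i[k] / theta_m
                for j, k in combinations(range(len(lam_i)), 2))
    return own + 2.0 * cross


def within_segment_variance_univariate(lam_i, thetas):
    """Equation 13: type-specific dispersion parameters, no cross-type terms."""
    return sum(l + l * l / t for l, t in zip(lam_i, thetas))


# Hypothetical expected counts: 4 accident types (rows) x 3 segments (columns).
lam = [
    [6.0, 2.5, 9.0],
    [2.0, 1.0, 3.5],
    [1.5, 0.8, 2.0],
    [1.0, 0.5, 1.2],
]

# Direct variance of the segment totals, for comparison with Equation 11.
totals = [sum(col) for col in zip(*lam)]
direct = _cov(totals, totals)
v_between = between_segment_variance(lam)

segment = [row[0] for row in lam]   # expected counts on the first segment
theta_m = 2.0                        # hypothetical MVPGM dispersion
thetas = [2.0, 2.0, 2.0, 2.0]        # hypothetical univariate dispersions

v_mvpgm = within_segment_variance_mvpgm(segment, theta_m)
v_univ = within_segment_variance_univariate(segment, thetas)

print(v_between, direct)
print(v_mvpgm, v_univ)
```

In this toy case the dispersion values are set equal on purpose: the MVPGM expression then collapses to $\lambda_i + \theta_M^{-1}\lambda_i^2$ for the segment total, while the univariate expression omits the cross-type terms and is therefore smaller.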

4.3 Elasticity

The elasticity of explanatory variable x on dependent variable y can be expressed by taking the first derivative of the expected number of type-specific accidents of the MVPGM model:

$$\frac{\partial E(y \mid x)}{\partial x} = \beta_j \exp\left(x_{ji}^{T}\beta_j\right) \tag{15}$$
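As a minimal sketch of Equation 15, the Python snippet below evaluates the derivative for one accident type and checks it against a central finite difference. The covariate vector `x`, the coefficient vector `beta` and the function names are hypothetical; they are not estimates from this study.

```python
import math


def expected_count(x, beta):
    """E(y | x) = exp(x' beta) for one accident type."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def marginal_effects(x, beta):
    """Equation 15: dE(y|x)/dx_k = beta_k * exp(x' beta) for each covariate k."""
    mu = expected_count(x, beta)
    return [bk * mu for bk in beta]


# Hypothetical values: constant, log of traffic volume, log of segment length.
x = [1.0, 10.2, 0.5]
beta = [-6.0, 0.7, 0.9]

analytic = marginal_effects(x, beta)

# Central finite-difference check of each derivative.
h = 1e-6
numeric = []
for k in range(len(x)):
    up = list(x)
    down = list(x)
    up[k] += h
    down[k] -= h
    numeric.append(
        (expected_count(up, beta) - expected_count(down, beta)) / (2 * h)
    )

for a, n in zip(analytic, numeric):
    print(f"analytic={a:.6f} numeric={n:.6f}")
```

Because the expected count enters every derivative as a common factor, the effects of different covariates on the same accident type differ only through their coefficients.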

The elasticity calculated by the total accident model is shown in column 1 of Table 8, while the elasticities of type-specific accidents calculated by the MVPGM model are shown in columns 2 to 5. It is noticeable that the elasticities of rear-end accidents are larger in absolute value than those of any other accident type, regardless of the explanatory variable. The results suggest that rear-end accidents are more affected by the geometric and traffic conditions than any other type. The last column of Table 8 represents the sum of the elasticities of the four types of accidents calculated by the MVPGM model. The sums of the elasticities for LnAADT, LnLength, Nl345, Nhorz and diamond have values similar to the corresponding elasticities of the total accident model. The results suggest that, for these explanatory variables, the MVPGM model can accurately decompose the elasticity of the total accident frequency into the elasticities of each type of accident frequency. On the other hand, urorru, maxvcvg, minvcdis, maxvcvk, maxvcrvc, maxhcdel and nvert have different values of elasticity. This may be due to correlations among the explanatory variables, and to variables being omitted from some of the accident-type models because their parameter estimates were insignificant.

Table 8 Elasticity of total accident model and type specific accident models

| Explanatory variable | Total No. of accidents | Rear-End | Sideswipe | Fixed Object | Other Types | Sum |
|---|---|---|---|---|---|---|
| LnAADT | 21.814 | 16.180 | 4.674 | 1.814 | 1.986 | 24.681 |
| LnLength | 10.812 | 4.718 | 1.823 | 1.708 | 1.579 | 9.828 |
| urorru | -0.389 | -1.281 | -0.968 | -0.214 | - | -2.463 |
| Nl345 | 11.239 | 9.460 | 1.361 | 0.875 | 0.677 | 12.374 |
| Nhorz | 1.376 | 0.905 | 0.350 | 0.144 | - | 1.399 |
| diamond | -4.383 | -2.839 | -0.662 | -0.381 | - | -3.882 |
| maxvcvg | † | - | 0.129 | - | - | 0.129 |
| minvcdis | † | - | - | - | - | 0 |
| maxvcvk | † | - | - | - | - | 0 |
| maxvcrvc | † | - | 2.218 | - | - | 2.218 |
| Maxhcdel | † | - | - | 1.035 | - | 1.035 |
| Nvert | † | - | - | -0.013 | - | -0.013 |

† The total accident model reports three further values (1.703, -32.213 and -0.392) for three of these six variables; their row assignment is not recoverable from the source layout.
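The comparison between the Sum column and the total accident model can be retraced with a short Python sketch. The values below are those printed in Table 8 for the first six explanatory variables; assigning the flattened values to rows and treating a "-" cell as zero are assumptions of this sketch, and the variable and dictionary names are hypothetical.

```python
# (total model, rear-end, sideswipe, fixed object, other types, reported sum)
# A 0.0 entry stands for a "-" cell, i.e. the variable is absent from that model
# (an assumption of this sketch).
table8 = {
    "LnAADT":   (21.814, 16.180, 4.674, 1.814, 1.986, 24.681),
    "LnLength": (10.812, 4.718, 1.823, 1.708, 1.579, 9.828),
    "urorru":   (-0.389, -1.281, -0.968, -0.214, 0.0, -2.463),
    "Nl345":    (11.239, 9.460, 1.361, 0.875, 0.677, 12.374),
    "Nhorz":    (1.376, 0.905, 0.350, 0.144, 0.0, 1.399),
    "diamond":  (-4.383, -2.839, -0.662, -0.381, 0.0, -3.882),
}

recomputed = {}     # sum of the four type-specific values
relative_gap = {}   # |reported sum - total model| relative to the total model

for name, (total, *by_type, reported_sum) in table8.items():
    recomputed[name] = sum(by_type)
    relative_gap[name] = abs(reported_sum - total) / abs(total)
    print(f"{name:9s} recomputed={recomputed[name]:8.3f} "
          f"reported={reported_sum:8.3f} total={total:8.3f} "
          f"gap={relative_gap[name]:.1%}")

# Variable whose summed elasticity departs most from the total-model value.
largest_gap = max(relative_gap, key=relative_gap.get)
print("largest relative gap:", largest_gap)
```

Under these assumptions the recomputed sums agree with the reported Sum column to within rounding, and the relative gap to the total model is largest for urorru, which is the one variable among these six that the text lists as having a different elasticity.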

5. CONCLUSIONS

This paper focused on modeling accident frequencies by type for freeway segments. Researchers in the transportation safety field have jointly estimated accident types using several methodologies, including multivariate Poisson regression, multivariate negative binomial regression, and zero-inflated versions of these models. The purpose is to better understand the nature of each accident type and its influences in order to adjust safety policies. This paper developed a model that accounts for overdispersion and simultaneously models the accident types with error correlations. The accident types considered are rear end, sideswipe, fixed object, and other types. The effects of geometric and traffic characteristics of the freeway have been investigated using the proposed model. Rear-end accidents are found to be more affected by the explanatory variables in this study than the other types. The model shows lower efficiency and goodness-of-fit from a statistical standpoint than the univariate NB models of accident types. A possible reason is that different sizes of overdispersion across accident types cannot be represented in the MVPGM model. The proposed model is also found to represent the variance and covariance structure more accurately than the univariate NB models. In addition, the elasticity calculations suggest that modeling accidents by type is efficient and loses no information about the total accidents, provided that the explanatory variables are carefully selected.

The limitations of the current study offer directions for future research in this area. More investigation is needed into models in which the overdispersion and the correlations are estimated jointly but with separate parameters. Also, another dataset that contains the number of accidents by both accident type and severity level may offer richer insights into the differential impacts of various factors on accidents.