Optimizing Traffic Prediction Performance of Neural ... - CiteSeerX

Optimizing Traffic Prediction Performance of Neural Networks under Various Topological, Input, and Traffic Condition Settings

By Sherif Ishak1 Department of Civil and Environmental Engineering Louisiana State University Baton Rouge, LA 70803 Phone: (225)-578-4846 Fax: (225)-578-8652 Email: [email protected]

And Ciprian Alecsandru Graduate Student Department of Civil and Environmental Engineering Louisiana State University Baton Rouge, LA 70803

Paper submitted for presentation at the Transportation Research Board 82nd Annual Meeting Washington, D.C. 2003 and for publication in the Transportation Research Record journal

Revised and Re-Submitted on October 25, 2002

1

Corresponding Author

TRB 2003 Annual Meeting CD-ROM

Paper revised from original submittal.

1

ABSTRACT This paper presents an approach to optimize the short-term traffic prediction performance on freeways using multiple artificial neural network topologies under different network and traffic condition settings. The study was conducted to encourage multi-model techniques that are capable of improving the performance over single-model approaches. Using a mix of traditional and modern neural network topologies, the short-term speed prediction performance was extensively evaluated under different input settings and various prediction horizons (from 5 to 20 minutes). The input patterns were constructed from the most recent past data collected at multiple detector stations. To enable the network to learn from historical information, a longterm memory component was attached to the input patterns in the form of a time index. This allows the network to build internal representation of recurrent conditions that are likely to be observed at the same time of day, in addition to the short-term memory that is often encapsulated in the most recent information. Optimal settings were determined by maximizing the performance under different traffic conditions observed at the target location, as well as upstream and downstream locations. Comparative statistical analysis with naïve and heuristic approaches showed that the optimized neural network approach resulted in much better performance. The study shows that no single topology outperformed the others. However, for nearly 33% of the cases, the Co-Active Neuro-Fuzzy Inference System (CANFIS) topology was found optimal. The study also shows that the optimal settings were consistently more dependent on the longterm memory component as prediction horizon increases. Interestingly enough, the optimization of prediction performance under different traffic conditions allows for the identification of situations when prediction accuracy is unacceptable by traffic information dissemination systems. KEYWORDS Traffic prediction, speed prediction, freeway operation, artificial neural networks, and performance optimization.



2

INTRODUCTION Real time and predictive information on traffic conditions is becoming increasingly critical to the successfulness of advanced traffic management systems (ATMS). With the remarkable advancement in data collection and dissemination technology, the traveling public has now the ability to access such information at both pre-trip planning stage and en-route. With accurate and reliable information, travelers can make appropriate decisions to bypass congested segments of the network, change departure times and/or destination, whenever appropriate. Such decisions are likely to affect the potential demand at various points on the network and provide opportunities for better utilization of the existing infrastructure capacity. On the other hand, traffic management centers need such information to support their primary management and control functions. Urban freeways, a primary source of mobility, are heavily traveled by both commuters and noncommuters who continuously experience excessive delays and queuing conditions on a daily basis. Freeway travelers often seek information on traffic conditions in the form of travel times and delays along their trips. Such information provides an indirect measure of travel cost, which most travelers seek to minimize. Since travel decisions are often made in advance, travelers are often seeking predictive information on traffic conditions along their selected route and within the duration of the trip. Research in the past few years has addressed this need with a variety of short-term traffic prediction models that attempt to capture the dynamic nature of traffic conditions. A review of the literature on previously developed models and research activities in this area reveals that there is still a need to improve on the prediction performance and capability of existing models. This research study presents an approach to optimize the performance of neural networks in short-term traffic prediction by exploring different network topologies and settings under different traffic conditions. BACKGROUND The concept of intelligent transportation systems has introduced several functions and services that have the potential to improve the efficiency, safety and productivity of our surface transportation system. Among them is our ability to make better decisions on our travel choices when current and predicted information becomes available. This concept has motivated researchers to seek traffic prediction models that are capable of forecasting traffic flow, speed, and travel times in short terms (from 5 to 30 minutes). Several research efforts were thus conducted in the last few years to support ITS applications and provide the travelers with travel time information at the pre-trip planning stage and en-route. Kaysi et al. (1) and Ben-Akiva et al. (2) recommended that traffic routing strategies under recurring and non-recurring congestion be based on forecasting of future traffic conditions rather than historical and/or current traffic conditions. This is because travelers’ decisions are affected by future traffic conditions rather than current traffic conditions. Several prediction methods have been implemented in research in the past two decades. Ben Akiva et al. (3) grouped those methods into three categories: (a) statistical models, (b) macroscopic models, and (c) route choice models based on dynamic traffic assignment. Time series models have been extensively used in traffic forecasting for their simplicity and strong potential for on-line implementation (see for example; 4-14). Recently, Chen and Chien (15) conducted a study using probe vehicle data to compare the prediction accuracy under direct measuring of path-based travel time versus link-based travel



3

times. The study showed that under recurrent traffic conditions, path-based prediction is more accurate than link-based prediction. Chien and Kuchipudi (16) presented the results of using real-time and historical data for travel time prediction. Another study by Kwon et al. (17) used an approach to estimate travel time on freeways using flow and occupancy data from single loop detectors and historical travel time information. Forecasting ranged from a few minutes into the future up to an hour ahead. The study showed that current traffic conditions are good predictors for the near future, up to 20 minutes, while long-range predictions need the use of historical data. Another study by Smith et al. (18) compared parametric and nonparametric models for traffic flow forecasting. The study showed that prediction under seasonal autoregressive integrated moving average (ARIMA), a parametric modeling approach to time series, outperforms other non-parametric approaches, like regression based on heuristically improved forecast generation. Recently, several studies have investigated the use of artificial neural networks to model shortterm traffic prediction. For instance, Park and Rilett (19) proposed two modular Artificial Neural Networks (ANN) models for forecasting multiple-period freeway link travel times. One model used a Kohonen Self Organizing Feature Map (SOFM) while the other utilized a fuzzy cmeans clustering technique for traffic patterns classification. Rilett and Park (20) proposed a one-step approach for freeway corridor travel time forecasting rather than link travel time forecasting. They examined the use of a spectral basis neural network with actual travel times from Houston, Texas. Another study by Abdulhai et al. (21) used an advanced time delay neural network (TDNN) model, optimized using a Genetic Algorithm, for traffic flow prediction. The results of the study indicated that prediction errors were affected by the variables pertinent to traffic flow prediction such as spatial contribution, the extent of the loop-back interval, resolution of data, and others. Lint et al. (22) presented an approach for freeway travel time prediction with state-space neural networks. Using data from simulation models, they showed that prediction accuracy was acceptable and favorable to traditional models. Several other studies applied neural networks for predicting speed, flows, or travel times (see for instance; 2328). OBJECTIVES The objective of this study is to optimize the performance of neural networks in short-term traffic prediction of speed on freeways. The study investigates the performance of four different neural network architectures under different settings and traffic conditions. In addition to the previously applied architectures (MLP and Modular networks), two new architectures are introduced: the PCA (Principal Component Analysis) network and the Co-Active Neuro-Fuzzy Inference System (CANFIS). In each network, we investigate the effect of introducing a longterm memory component, in addition to the short-term memory components, on the prediction performance in the range of 5 to 20 minutes. To ensure optimal performance under different settings, the prediction performance was tested under a wide spectrum of spatial and temporal traffic conditions. DESCRIPTION OF THE NEURAL NETWORK MODELS Four different architectures were applied in the study to compare the prediction performance of each under different configurations and traffic conditions. Each model is described briefly in this section.



4

Multi-Layer Perceptron (MLP) Network The MLP has been extensively used in many transportation applications for its simplicity and ability to perform nonlinear pattern classification and function approximation. It is, therefore, considered the most widely implemented network topology by many researchers (29, 30). Its mapping capabilities are believed to approximate any arbitrary function. An MLP consists of three types of layers: input, hidden, and output. It is normally trained with the backpropagation algorithm, which is based on minimizing the sum of squared errors between the desired and actual outputs. Since this topology is very common, it has been repeatedly explained in detail in the literature, and thus, will not be covered here to avoid redundancy and to allow other new topologies to be explained. Modular Network Modular networks are a special class of multiple parallel feed-forward MLPs. The input is processed with several MLPs and then the results are recombined. This type offers specialization of function in each sub-module and does not require full interconnectivity between the MLP’s layers. Modular networks are often faster to train due to the smaller number of weights for the same size network. The topology used in this study is composed of two primary components: local expert networks and a gating network (31, 32). The basic idea is linked to the concept “divide-and-conquer”, where a complex system is better attacked when divided into smaller problems, the solutions of which lead to the solution of the entire system. Using a modular network, a given task will be split up among some local expert networks, thus reducing the load on each in comparison with one single network that must learn to generalize from the entire input space. A gating network eventually combines the output from the local experts to produce an overall output. FIGURE 1(a) shows the topology of the modular network. Hybrid Principal Component Analysis (PCA) Network PCA is a technique that finds an orthogonal set of directions in the input space and provides a way to find the projections into these directions in an ordered fashion. The orthogonal directions are called eigenvectors of the correlation matrix of the input vector and the projections of the corresponding eigenvalues. PCA has the ability to reduce the dimensionality of the input vectors, and therefore, can be used for data compression. When used in conjunction with MLP, the PCA can reduce the number of inputs to the MLP and improve its performance. In this study a hybrid PCA/MLP network is used, combining both unsupervised and supervised learning paradigms in one topology. The PCA projects the input vector onto a smaller dimensional space, and thus, compressing the input for the MLP network. It should be emphasized that PCA is a well known statistical procedure that is used in feature extraction from high-dimensional space (29, 30, 32). The topology of the hybrid PCA network is illustrated in FIGURE 1(b). Co-Active Neuro-Fuzzy Inference System (CANFIS) CANFIS belongs to a more general class of adaptive neuro-fuzzy inference systems (ANFIS) (31). In the context of this study, CANFIS is used as a universal approximator of any nonlinear function. The characteristics of CANFIS are emphasized by the advantages of integrating neural networks with fuzzy inference systems (FIS) in the same topology. The powerful capability of CANFIS stems from pattern-dependent weights between the consequent layer and the fuzzy association layer. Like the radial-basis function network (RBFN), CANFIS is locally tuned. The architecture of CANFIS is illustrated in FIGURE 1(c). The fundamental component for CANFIS



5

is a fuzzy axon that applies membership functions (MF) to the inputs. Two membership functions are commonly used: general bell and Gaussian. The network contains also a normalization axon to expand the output into the range of 0 to 1. The second major component in the type of CANFIS used in this study is a modular network that applies functional rules to the inputs. The number of modular networks matches the number of network outputs and the number of processing elements in each network corresponds to the number of MFs. CANFIS also has a combiner axon that applies the MF outputs to the modular network outputs. Finally, the combined outputs are channeled through a final output layer and the error is backpropagated to both the MFs and the modular networks. The CANFIS architecture used in this study is composed of four layers as shown in FIGURE 1(c). The function of each layer is described as follows. Each node in layer 1 is the membership grade of a fuzzy set (A, B, C, or D) and specifies the degree to which the given input belongs to one of the fuzzy sets, which are defined by three membership functions. Layer 2 receives input in the form of the product of all output pairs from the first layer. The third layer has two components. The upper component applies the membership functions to each of the inputs, while the lower component is a representation of the modular network that computes, for each output, the sum of all the firing strengths. The final layer calculates the weight normalization of the output of the two components from the third layer and produces the final predictions of speed at different prediction horizons. STUDY AREA AND DATA COLLECTION The study was conducted on a freeway segment of I-4 in Orlando, Florida, using 30-second loop detector speed data. The traffic surveillance system compiles data from a 40-mile six-lane corridor instrumented with 70 inductive dual loop detector stations spaced at nearly 0.5 miles in both directions. The real time and archived data is accessible on the web at the URL: trafficinfo.engr.ucf.edu. The loop detector data is collected in real time via a dedicated highspeed link between the I-4 Regional Traffic Management Center (RTMC) in Orlando and the intelligent transportation system lab at the University of Central Florida. Speed, volume counts and lane occupancies are downloaded and compiled into an SQL server that supports multiple publicly accessible web applications such as real time and short-term travel time predictions between user-selected on- and off-ramps. The web-based short-term traffic prediction system was implemented using a nonlinear time series model that was tested extensively in a previous study (14). In this study data was collected from three adjacent stations over 1-mile section. A total of 28 days were randomly selected in the year 2001 during the morning peak period from 6:00 AM to 10:00 AM in the westbound direction. To reduce the noise and random fluctuations in speed, the data resolution was reduced from 30-second averages to 5-minute averages using a 5-minute moving time window that is updated every minute. The recent past 5-minute speed averages were based on data observed as far back as 10 minutes from the current time. The 5minute moving average and the 10-minute rolling horizon were arbitrarily selected for this study. Further testing may deem necessary to verify if the selected values have a significant impact on the predictive performance of the adopted approach. The 28 days were randomly split into 14 days for training, 4 for validation, and 10 for testing. TRAINING Training each of the four network topologies was conducted using NeuroSolutions (33). In order to achieve optimal performance, different settings were attempted by varying the number and type of inputs to each network. The input to each network was classified into two components: short-term memory (STM) and long-term memory (LTM). The STM component was



6

represented by speed data observed in the past 10 minutes at the three adjacent stations. Input patterns were constructed over time and space to capture temporal and spatial variations of traffic conditions. However, to optimize the performance of the networks, input patterns were constructed from three spatial settings: current and upstream (y, z), current and downstream (y, x), or the three stations combined (x, y, z). The LTM component was introduced in addition to the STM component to test the network’s ability to learn from similar historical traffic conditions observed at the same time on other days. The LTM component was represented by a time index attached to each constructed speed pattern. The time index was referenced to the beginning of the peak period at 6:00 AM and expressed in increments of 1 minute. The essence of using LTM component is to improve the performance at relatively longer prediction horizons by making the network time-cognizant during prediction. Each network was trained to predict the average 5minute speeds at 5, 10, 15, and 20-minute horizons. During the training phase the performance of the networks was monitored via the validation set to avoid overtraining. Training is terminated when the mean square error (MSE) for the cross-validation set does not decrease for 50 consecutive training cycles, a common procedure to prevent overtraining. Except for the hybrid PCA network, the input patterns representing STM were in the form: Type 1 input: {x(t), x(t-5), y(t), y(t-5)} Type 2 input: {y(t), y(t-5), z(t), z(t-5)} Type 3 input: { x(t), x(t-5), y(t), y(t-5), z(t), z(t-5)} Where x(t) = average 5-minute speed at downstream station at time t x(t-5) = average 5-minute speed at downstream station at time t-5 y(t) = average 5-minute speed at current station at time t y(t-5) = average 5-minute speed at current station at time t-5 z(t) = average 5-minute speed at upstream station at time t z(t-5) = average 5-minute speed at upstream station at time t-5 The hybrid PCA network has the ability to reduce the dimensionality of the input space by locating the principal components. Therefore, the input patterns were presented in high dimensional vectors of 10 observations taken one minute apart from time t. In other words, the input patterns were in the form: Type 1 input: {x(t), x(t-1),… x(t-9), y(t), y(t-1),… y(t-9)} Type 2 input: {y(t), y(t-1),… y(t-9), z(t), z(t-1),… z(t-9)} Type 3 input: {x(t), x(t-1),… x(t-9), y(t), y(t-1),… y(t-9), z(t), z(t-1),… z(t-9)} The output vector for each input pattern was constructed in the form {S(t+5), S(t+10), S(t+15), S(t+20)}, where S(t+5), S(t+10), S(t+15), S(t+20) are the average 5-minute speeds taken at 5, 10, 15, and 20 minute predictions, respectively. TESTING In order to test the performance of each network after training, a set of 10 peak periods collected from 10 different days was presented to the network. For each input pattern in the testing set, multiple predictions were made at 5, 10, 15, and 20 minute horizons. Each predicted value was compared against the actual observed value to calculate two measures of performance: average absolute relative error and root mean square error. Each measure is defined as follows:



7

N

AARE =

∑ i =1

Si − Vi Vi N

(1)

And N

RMSE =

∑ (S i =1

i

− Vi ) 2

N

(2)

Where AARE = Average absolute relative error in speed RMSE = Root mean square error Si = Predicted speed (mph) for observation i Vi = Actual speed (mph) for observation i N = Number of observations The two measures were used to compare the performance of the four networks under the following settings: desired prediction horizon (5, 10, 15, and 20 minutes), input type (xy, yz, and xyz), and inclusion of LTM component (Yes/No). In addition to the previous controlled variables, the performance was also checked against various combinations of traffic conditions at each of the three stations. At each station, traffic conditions were broken down into four levels of congestion: level 1 (speed >60 mph), level 2 (40-60 mph), level 3 (20-40 mph), and level 4 (