Bus Travel Time Prediction using Machine Learning Approaches
B. Anil Kumar, Rakesh Behera, Vivek Kumar, Kranthi Kumar Reddy, Lelitha Vanajakshi, Shankar C. Subramanian
Dynamic Input Selection
Spatial Analysis: To understand the behavior of bus travel time over different subsections
[Figure residue: axis labels 'Travel time (s)' and 'Number of Previous Trips'; legend 'Actual']
Performance Comparison
[Figure: legend SVM, ANN; x-axis: Subsection index]
Bus Travel Time Prediction Method
SVM performed better than the historical average and ANN methods.
Developed prediction methodologies using two machine learning approaches: ANN and SVM.
[Figure axis: Subsection index]
Identified peak and off-peak times
Off-peak timings: 4 AM – 8 AM, 11 AM – 2 PM, and after 7 PM
Peak timings: 8 AM – 11 AM and 3 PM – 7 PM
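The peak and off-peak windows listed above can be expressed as a small lookup. This is a minimal sketch, not the authors' code: the function name `classify_period` is hypothetical, window ends are treated as exclusive, and "after 7 PM" is assumed to run until midnight. Hours that fall in none of the listed windows (midnight to 4 AM, and 2 PM to 3 PM) are returned as unclassified.

```python
def classify_period(hour):
    """Label an hour of day (0-23) as 'peak', 'off-peak', or 'unclassified'."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # Peak: 8 AM - 11 AM and 3 PM - 7 PM (start inclusive, end exclusive).
    if 8 <= hour < 11 or 15 <= hour < 19:
        return "peak"
    # Off-peak: 4 AM - 8 AM, 11 AM - 2 PM, and after 7 PM (assumed until midnight).
    if 4 <= hour < 8 or 11 <= hour < 14 or hour >= 19:
        return "off-peak"
    # Hours not covered by the listed windows.
    return "unclassified"
```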
Present study route: 19B
Length: 30 km
Origin: Kelambakkam
Destination: Saidapet
Bus Travel Time Prediction using Artificial Neural Networks (ANN)
Input: Six previous significant trips
Output: Current trip travel time
Data requirement: Training: 18 days; Validation: 7 days; Testing: 7 days
Neurons: Identified separately for each subsection
$x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e$
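The update $x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e$ has the form commonly known as a Levenberg-Marquardt step. The sketch below applies it to a two-parameter toy problem so the matrix inverse can be written in closed form. It is an illustration under stated assumptions, not the network training code used in the study: `lm_step`, the line-fitting example, and the value of `mu` are all hypothetical.

```python
def lm_step(params, jacobian, errors, mu):
    """One update x_{k+1} = x_k - (J^T J + mu*I)^{-1} J^T e for two parameters."""
    # A = J^T J + mu*I (symmetric 2x2) and g = J^T e (length 2).
    a11 = sum(row[0] * row[0] for row in jacobian) + mu
    a12 = sum(row[0] * row[1] for row in jacobian)
    a22 = sum(row[1] * row[1] for row in jacobian) + mu
    g1 = sum(row[0] * e for row, e in zip(jacobian, errors))
    g2 = sum(row[1] * e for row, e in zip(jacobian, errors))
    # Solve A * step = g using the closed-form inverse of a 2x2 matrix.
    det = a11 * a22 - a12 * a12
    step1 = (a22 * g1 - a12 * g2) / det
    step2 = (a11 * g2 - a12 * g1) / det
    return [params[0] - step1, params[1] - step2]


def fit_line(xs, ys, mu=0.01, iterations=50):
    """Fit y = w*x + b by repeating the step above (toy example)."""
    params = [0.0, 0.0]
    for _ in range(iterations):
        # e = prediction - target; each Jacobian row is [d e/d w, d e/d b].
        errors = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
        jacobian = [[x, 1.0] for x in xs]
        params = lm_step(params, jacobian, errors, mu)
    return params
```

With a small `mu` the step behaves like Gauss-Newton; a large `mu` shrinks it toward a short gradient-descent step.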
17-Oct 1101 2 06:35 16.06 50.97 32.34 12.24 9.75 17.70 14.37 8.23 6.62 6.52
17-Oct 1101 3 06:51 37.55 44.35 14.40 9.67 9.46 12.01 12.02 9.16 7.62 6.95
| Day | Number of Trips | Mean | Minimum | Maximum | T50 | T95 | Standard Deviation | Sample Variance | Kurtosis | Skewness (S) | Range | Standard Error | Average Speed (kmph) | COV (%) | PTI | (T90-T10)/T50 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sun | 42 | 54.44 | 43.83 | 65.86 | 55.24 | 64.73 | 5.36 | 28.69 | -0.300 | -0.013 | 22.04 | 0.83 | 30.86 | 10 | 2.31 | 27 |
| Mon | 51 | 57.38 | 42.39 | 76.05 | 56.90 | 73.00 | 7.66 | 58.62 | 0.271 | 0.527 | 33.66 | 1.07 | 29.28 | 13 | 2.61 | 39 |
| Tue | 38 | 66.35 | 44.60 | 91.98 | 62.34 | 89.72 | 11.78 | 138.77 | -0.530 | 0.595 | 47.38 | 1.91 | 25.32 | 18 | 3.20 | 48 |
| Wed | 43 | 62.54 | 46.37 | 87.68 | 58.46 | 86.35 | 11.53 | 132.84 | -0.247 | 0.936 | 41.31 | 1.76 | 26.86 | 18 | 3.08 | 55 |
| Thu | 36 | 68.89 | 45.43 | 97.09 | 67.30 | 96.28 | 12.42 | 154.26 | 0.315 | 0.612 | 51.65 | 2.07 | 24.39 | 18 | 3.44 | 55 |
| Fri | 53 | 71.13 | 50.55 | 115.50 | 66.33 | 112.91 | 15.83 | 250.58 | 1.207 | 0.226 | 64.95 | 2.17 | 23.62 | 22 | 4.03 | 62 |
| Sat | 42 | 58.59 | 45.86 | 75.31 | 57.81 | 73.55 | 6.48 | 42.03 | 0.489 | 0.412 | 29.44 | 1.00 | 28.67 | 11 | 2.63 | 28 |

OBSERVATIONS
Spatial analysis: high variation in travel times between sections, due to the presence of bus stops and intersections
Travel times in peak and off-peak hours show different levels of variation
Weekdays and weekends showed a clear difference
Each day of the week showed a distinct variation compared to other days of the week
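Two of the variability measures in the day-wise statistics, COV (%) and (T90-T10)/T50 (%), can be computed from a list of trip travel times as sketched below. The function names are hypothetical, the sample standard deviation is assumed, and the percentile uses linear interpolation between ranks, which may differ from the convention used in the study.

```python
import statistics


def percentile(sorted_values, p):
    """Percentile p (0-100) by linear interpolation between ranks (assumed convention)."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def variability(travel_times):
    """Return COV (%) and (T90 - T10) / T50 (%) for a list of travel times."""
    values = sorted(travel_times)
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    t10, t50, t90 = (percentile(values, p) for p in (10, 50, 90))
    return {
        "cov_pct": 100.0 * std / mean,
        "spread_pct": 100.0 * (t90 - t10) / t50,
    }
```

As a consistency check on the COV definition, the Sunday row gives 5.36 / 54.44, or about 9.8%, which matches the reported value of 10 after rounding.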
[Figure: Variation in MAPE over number of neurons; x-axis: No. of neurons]
[Figure: Variation in MAPE over various amounts of training data (1, 2 and 3 weeks); x-axis: Day index]
Bus Travel Time Prediction using Support Vector Machines (SVM)
Support Vector Regression: maps the data into a high-dimensional feature space via a nonlinear mapping and performs linear regression in that space.
Consider a set of training data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where each $x_i$ is an n-dimensional input vector and $y_i$ is the desired value (output).
$y = f(x)$
Summary and Conclusions
[Figure: MAPE comparison for various days for ANN, SVM and historical average methods; x-axis: Day index; y-axis: MAPE]
[Figure: Travel time variation over days of the week]
Termination criteria:
Maximum epochs: 600
Maximum training time
Gradient falls below minimum: 1e-10
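The three termination criteria can be combined into a single check that a training loop calls once per epoch. This is a sketch only: `should_stop` is a hypothetical helper, and the `max_seconds` default is a placeholder because the source does not give the maximum training time.

```python
import time


def should_stop(epoch, start_time, grad_norm,
                max_epochs=600, max_seconds=60.0, min_grad=1e-10):
    """Return the name of the first criterion met, or None to keep training."""
    if epoch >= max_epochs:
        return "max_epochs"
    # max_seconds is an assumed placeholder; the source gives no value.
    if time.monotonic() - start_time >= max_seconds:
        return "max_time"
    if grad_norm < min_grad:
        return "min_gradient"
    return None
```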
[Figure: Travel time variation over Weekday vs. Weekend]
Standard back propagation technique
17-Oct 1101 1 06:01 16.62 39.73 12.54 8.57 8.54 9.22 8.94 7.51 6.64 6.67
Identification of peak and off-peak
Trajectory analysis
Hourly variation
Daily variation
Started in 2009 (on two routes)
Current status: 450 devices (150 active; 300 under installation)
Total routes: 30
Real-time communication: GPRS
Data storage: SQL database
Direction identification
Missing data: interpolation
Quality check: check for outliers (95th percentile)
Distance calculation: Haversine formula
Segment-wise travel times: segment length 100 m; used for pattern analysis
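The Haversine distance between two consecutive GPS fixes can be computed as in the sketch below. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions made for illustration. Summing these distances along a trip gives the cumulative distance needed to cut the route into 100 m segments.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```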
[Figure: Comparison of static and dynamic input selection]
Data were collected by fitting GPS units in MTC buses of Chennai, India
Date, Route ID, Trip ID, Trip starting time (HH.MM), then travel time (sec) at each distance from the starting point (m): 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000
Input vector (x): travel times of the previous six vehicles in the same subsection
Output (y): travel time of the present vehicle of interest
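Building the input and output pairs described above amounts to sliding a six-trip window over the travel times recorded in one subsection. A minimal sketch follows; `build_samples` is a hypothetical name, and the travel times are assumed to be ordered by trip start time.

```python
def build_samples(travel_times, n_prev=6):
    """Pair each trip's travel time with those of the n_prev preceding trips."""
    samples = []
    for i in range(n_prev, len(travel_times)):
        inputs = travel_times[i - n_prev:i]  # previous n_prev vehicles
        target = travel_times[i]             # present vehicle of interest
        samples.append((inputs, target))
    return samples
```

A subsection with N recorded trips yields N - 6 samples, so the first six trips of a data set serve only as inputs.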
Temporal Analysis: To understand the behavior of bus travel time over different time periods of the day
Data Collection and Processing
Data Processing:
MAPE – Dynamic = 18.64%
MAPE – Static = 29.25%
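MAPE (mean absolute percentage error) is the metric used to compare the two input selection schemes. A minimal sketch of the usual definition follows; the function name `mape` is illustrative, and observed travel times are assumed to be non-zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - predicted| / |actual| over all observations.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```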
[Figure: Variation in travel time across various sections along the route]
To identify the travel time patterns under Indian traffic conditions
To identify the significant inputs using k-NN analysis
To develop a bus travel time prediction methodology using Artificial Neural Networks and Support Vector Machines, and to compare their performance
Objectives
The optimum number of inputs was identified using the Approximate Entropy (ApEn) technique
ApEn: quantifies the amount of regularity and the unpredictability of fluctuations in data over time
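For reference, the standard definition of approximate entropy can be sketched as below. The defaults for the embedding dimension `m` and the tolerance `r` are assumptions (in practice `r` is often set relative to the standard deviation of the series), and the source does not state how ApEn values were mapped to the number of inputs.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a sequence; lower values indicate a more regular series."""
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        count = n - dim + 1
        windows = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for w in windows:
            # Count windows within tolerance r (Chebyshev distance), self-match included.
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)
```

A constant series gives an ApEn of zero, and a strictly alternating series gives a value close to zero, because both are fully predictable.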
k-NN Analysis
To separate data based on similarities between the travel patterns
Methodology:
Input: travel times of the previous three subsections of the current trip
Criterion: Euclidean distance
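The nearest-neighbour search can be sketched as follows: given the current trip's travel times over the previous three subsections, find the historical trips with the smallest Euclidean distance. The function name `nearest_trips` and the choice of `k` are assumptions; the source does not state the value of k.

```python
import math


def nearest_trips(current, history, k=3):
    """Return indices of the k historical trips closest to the current trip."""
    # Each element of history holds travel times over the same three subsections.
    distances = [(math.dist(current, past), idx) for idx, past in enumerate(history)]
    distances.sort()  # ascending distance; ties broken by index
    return [idx for _, idx in distances[:k]]
```

The selected trips can then supply the inputs for the prediction step, which is one way to read the dynamic input selection named on the poster.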
Advanced Public Transportation Systems (APTS), a functional area of Intelligent Transportation Systems (ITS), applies information technologies to public transit to enhance efficiency.
Aim: To provide accurate information about bus arrival times to passengers
Earlier studies: limited data; developed methods that are less data intensive
With more data: selection of suitable inputs; applying data-intensive approaches
Data used
Training: 18 days
Validation: 7 days
Testing: 7 days
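The 18/7/7 day split can be sketched as a simple partition of the list of collection days. A chronological ordering is assumed here; the source does not say how days were assigned to each set, and `split_days` is a hypothetical helper.

```python
def split_days(days, n_train=18, n_val=7, n_test=7):
    """Split an ordered list of days into training, validation and testing sets."""
    needed = n_train + n_val + n_test
    if len(days) < needed:
        raise ValueError(f"need at least {needed} days, got {len(days)}")
    train = days[:n_train]
    validation = days[n_train:n_train + n_val]
    test = days[n_train + n_val:needed]
    return train, validation, test
```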
Pattern Analysis
Introduction
$y(x, \omega) = \sum_{i=1}^{n} \omega_i \phi_i(x) + \omega_0 = \omega^{T} \phi(x) + \omega_0$
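Once the weights are known, the regression function $y(x, \omega) = \omega^{T} \phi(x) + \omega_0$ is a dot product plus a bias. The sketch below evaluates it; `predict` is a hypothetical name, and the default feature map is the identity, which corresponds to a linear kernel. Estimating the weights is left to an SVR solver and is not shown.

```python
def predict(weights, bias, x, feature_map=lambda v: v):
    """Evaluate y = w^T phi(x) + w0 for a single input vector x."""
    phi = feature_map(x)
    if len(phi) != len(weights):
        raise ValueError("feature vector and weights must have equal length")
    return sum(w * f for w, f in zip(weights, phi)) + bias
```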
Travel times in peak and off-peak hours show different levels of variation.
Each day of the week showed a distinct variation compared to other days of the week.
Developed a dynamic input selection algorithm using a k-NN classifier.
Developed two prediction methods, using ANN and SVM, to predict bus travel time.
The ANN method used the standard back propagation technique.
The SVM method used SVR with a linear kernel function in LIBSVM and was implemented in MATLAB.
Compared the performance of the ANN and SVM methods with the historical average method.
SVM performed better than the ANN and historical average methods.
Acknowledgements
The authors acknowledge the support for this study as a part of the Sub-project CIE/10-11/168/IITM/LELI under the Centre of Excellence in Urban Transport project funded by the Ministry of Urban Development, Government of India, through letter No. N-11025/30/2008-UCD, and as a part of Project RB/16-17/CIE/001/TATC/LELI under the Development of a Dynamic Traffic Congestion Prediction System for Indian Cities, funded by Tata Consultancy Services.