Traffic Prediction Models for Bangkok Traffic Data

Computer and Information Technology Pattern Analysis and Machine Intelligence Paper 10 1475

Nattapon Klakhaeng, Jumpol Yaothanee, Sukree Sinthupinyo, Wasan Pattara-Atikom

National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency (NSTDA), Klong Luang, Pathumthani 12120, Thailand
Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand

Abstract- This paper presents a prediction model of the traffic congestion condition on roads in Bangkok. The results from our prediction model will be useful for systems related to traffic information. We constructed a model that achieves 92.1% accuracy in predicting the congestion condition in the next 30 minutes. However, we found that the accuracy on cases in which the condition changes from the current state is unsatisfactory. Hence, we constructed a new model that improves the accuracy on this portion of the traffic data. Moreover, we propose a new performance measurement that distinguishes the results in more detail. The final results show that our approach improves the accuracy over the former model.

Keywords- Traffic prediction, Decision tree, C4.5

I. INTRODUCTION
Recently emerging technologies allow people to easily access traffic data, such as maps and traffic information, via a web portal (www.traffy.in.th) or a mobile phone [1], [2]. Commuters who use this information can plan a trip to avoid traffic jams and reach their destination faster. However, current traffic information alone is not enough to plan a trip, because the departure time is usually later than the time the user starts planning, and actual traffic conditions change all the time. It would be advantageous if both current and predicted traffic information were available, so that commuters could make informed judgments in advance. Traffic information is also important for organizations that aim to mitigate traffic congestion or to provide traffic-related services to end users. In this paper, we propose prediction models, built with C4.5, that predict the traffic congestion condition and the change of that condition. This paper is organized as follows. Section II reviews related work on traffic congestion prediction. Section III presents the prediction models. Section IV describes the traffic data used in this paper. Section V presents the experimental results, and Section VI concludes the paper.

The 8th Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI) Association of Thailand - Conference 2011

II. RELATED WORKS
Several studies in the Intelligent Transport System (ITS) literature have predicted traffic congestion from traffic flow or travel time. Yaqin Wang et al. [3] present a dynamic traffic prediction method that uses raw loop-detector data such as traffic flow. Pang Ming-bao et al. [4] propose a prediction method that uses subtractive clustering for fuzzy neural network modeling; their traffic data were generated by a simulation program, and they used traffic volume, i.e., the number of vehicles per unit time. Wang Yan et al. [5] propose prediction methods based on the backpropagation algorithm and on a combination of a neural network with a genetic algorithm. Their model is built from historical data: the traffic of the past N days is used to predict that of the next M days.

III. PREDICTION MODEL
Our preliminary model predicts the traffic congestion condition in the next 30 minutes. It employs the following attributes: 1) Day of week 2) Hour of the day 3) Minute of the hour 4) Time units of five minutes, counted from 12:00 am 5) Current congestion condition 6) Current congestion condition of the inbound road(s), and 7) Current congestion condition of the outbound road(s).

TABLE I
PREDICTION PERFORMANCE OF TRAFFIC CONGESTION PREDICTION

Congestion degree   H        M        L
Data ratio          2.11%    16.73%   81.16%
True positive       44.96%   72.44%   97.38%
True negative       99.45%   96.34%   77.15%
False positive      0.55%    3.66%    22.85%
False negative      55.04%   27.56%   2.62%
Precision           63.71%   79.89%   94.83%
Accuracy            92.1%
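The seven attributes of the first model can be derived from a timestamped congestion series. The sketch below is a minimal illustration only, assuming a hypothetical layout in which `series` maps `(link_id, timestamp)` to a congestion degree `'L'`, `'M'` or `'H'`, and the inbound and outbound links of each road are known. `time_attributes` and `build_row` are hypothetical helpers, not part of the original system, and the day of week is encoded with Python's `weekday()` (0 = Monday) as an assumption.

```python
from datetime import datetime, timedelta

def time_attributes(ts: datetime) -> dict:
    """Attributes 1-4 of the first model (hypothetical encoding)."""
    return {
        "day_of_week": ts.weekday(),              # 1) day of week, 0 = Monday
        "hour": ts.hour,                          # 2) hour of the day
        "minute": ts.minute,                      # 3) minute of the hour
        "slot": (ts.hour * 60 + ts.minute) // 5,  # 4) five-minute unit from 12:00 am (0..287)
    }

def build_row(ts, link, inbound, outbound, series, horizon=timedelta(minutes=30)):
    """One training row: attributes 1-7 plus the degree `horizon` ahead as the class.

    `series` is a hypothetical dict mapping (link_id, timestamp) -> 'L' | 'M' | 'H'.
    """
    row = time_attributes(ts)
    row["current"] = series[(link, ts)]                    # 5) current condition
    row["inbound"] = [series[(r, ts)] for r in inbound]    # 6) inbound road(s)
    row["outbound"] = [series[(r, ts)] for r in outbound]  # 7) outbound road(s)
    row["target"] = series[(link, ts + horizon)]           # class: degree 30 minutes later
    return row
```

Rows built this way can be fed to any C4.5-style decision-tree learner; keeping the inbound and outbound degrees as lists is a simplification, since a tree learner needs one column per connected road.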

We trained decision trees using the attributes above. The results are shown in Table I. Under 10-fold cross-validation, this model achieves 92.1% accuracy. However, we further investigated the prediction results to find the causes of the prediction errors. We used the predicted results and the training data to calculate the accuracy of this model for each kind of condition change, as shown in Table II. In Table II, the first column shows the prediction case: the first letter is the current state and the second letter is the future traffic condition. For example, LH denotes a change from low congestion to high congestion. The second column shows the number of samples found in the training data, and the third column shows the accuracy of each case. The accuracy is very high when the future condition is the same as the current condition, but relatively low when the congestion condition changes. The overall accuracy for the cases in which the traffic condition does not change (LL, MM, and HH) is 98.08%, while that for the cases in which it changes (LM, LH, ML, MH, HL, and HM) is only 41.72%. This analysis shows that the high overall accuracy comes from the portion of the data in which the traffic condition does not change.
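The breakdown in Table II can be computed from three aligned sequences: the current degree, the actual future degree, and the predicted future degree of each test sample. The sketch below assumes such lists are available. The paper does not say how the grouped "Not change" and "Change" figures were combined, so `accuracy_by_transition` (a hypothetical helper) pools the samples of each group, which is only one reasonable reading.

```python
from collections import defaultdict

def accuracy_by_transition(current, actual, predicted):
    """Accuracy per transition case (e.g. 'LH') plus pooled accuracy of unchanged/changed cases.

    current[i]   : degree at prediction time ('L', 'M' or 'H')
    actual[i]    : true degree 30 minutes later
    predicted[i] : the model's predicted degree
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for c, a, p in zip(current, actual, predicted):
        case = c + a                    # first letter: now, second letter: future
        counts[case] += 1
        hits[case] += int(a == p)
    per_case = {case: hits[case] / counts[case] for case in counts}

    def pooled(cases):
        n = sum(counts[case] for case in cases)
        return sum(hits[case] for case in cases) / n if n else float("nan")

    unchanged = [case for case in counts if case[0] == case[1]]  # LL, MM, HH
    changed = [case for case in counts if case[0] != case[1]]    # LM, LH, ML, MH, HL, HM
    return per_case, pooled(unchanged), pooled(changed)
```

Pooling weights each group by its sample counts, so the rare HL and HM cases barely move the "Change" figure; a macro average over the six cases would be the alternative.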

TABLE II
PREDICTION ACCURACY DIVIDED INTO EACH CASE

State   Count   Accuracy
LL      13373   99.37%
MM      1999    90.94%
HH      155     78.06%
LM      717     20.50%
LH      35      17.14%
ML      177     62.10%
MH      702     21.46%
HL      19      9.09%
HM      22      73.68%

State        Accuracy
Not change   98.08%
Change       41.72%

We designed a new model that predicts the change of traffic conditions within the next 20 minutes. The reason for choosing a 20-minute horizon is explained in the next section. The change of traffic congestion is divided into three categories: 1) Increasing congestion means that traffic congestion increases within 20 minutes, such as a change from low to high. 2) Unchanging means that traffic does not change within 20 minutes. 3) Decreasing congestion means that traffic congestion is reduced within 20 minutes, such as a change from high to moderate. This model includes the following attributes: 1) Day of week 2) Hour of the day 3) Minute of the hour 4) Time units of five minutes 5) Current congestion condition 6) Current congestion condition of the connected road(s) 7) Duration of the congestion condition 8) Duration of the congestion condition on the connected road(s) 9) Change in congestion condition, and 10) Change in congestion condition of the connected road(s).

IV. DATA COLLECTION OF BANGKOK'S ROAD TRAFFIC
Road traffic information was collected in the form of congestion degrees by the Bangkok Metropolitan Administration (BMA). CCTV cameras are installed on heavily congested arterial roads in Bangkok, e.g. Silom, Sathorn, and Phaholyothin. The video streams from the CCTV cameras are translated into three levels of congestion by image processing with the occupancy ratio technique, where the occupancy ratio is the proportion of time that a focused area of the road surface is occupied by vehicles. A high occupancy ratio indicates congestion, in which vehicles remain in the focused area for a long period of time, while a low occupancy ratio indicates low congestion or free flow, in which vehicles pass through the focused area quickly. In this paper, we used the congestion degrees of road number 1206 as the data set. Road number 1206 is a section of Sukhumvit Road that starts at Yak Asoke-Sukhumvit and ends at South Akamai.


Fig. 1. Traffic Congestion on Link 1206 (x axis: Time, HH:MM:SS)

Fig. 1 shows the congestion degree of road number 1206 in each period. The x axis is the time of day, divided into five-minute intervals from 00:00 to 23:55, and the y axis shows the proportion of each congestion degree in each interval. Traffic is most congested at 18:00, when the proportions are low congestion 28%, moderate congestion 51%, and high congestion 28%. From 22:00 to 00:00, all observations are low congestion, so we do not make predictions during that period.
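Fig. 1 is a per-interval class histogram. The following is a minimal sketch of how such a profile could be computed, and how the always-free-flow intervals could be excluded from prediction. It assumes the samples arrive as hypothetical `(slot, degree)` pairs, where `slot` is the five-minute unit of the day (0 to 287); all function names are illustrative.

```python
from collections import Counter, defaultdict

DEGREES = ("L", "M", "H")

def slot_profile(samples):
    """Proportion of each congestion degree per five-minute slot (the data behind Fig. 1)."""
    counts = defaultdict(Counter)
    for slot, degree in samples:
        counts[slot][degree] += 1
    profile = {}
    for slot in sorted(counts):
        n = sum(counts[slot].values())
        profile[slot] = {d: counts[slot][d] / n for d in DEGREES}
    return profile

def slots_to_predict(profile):
    """Keep only slots that are not 100% low congestion (cf. 22:00-00:00 on link 1206)."""
    return [slot for slot, p in profile.items() if p["L"] < 1.0]

def slot_label(slot):
    """Render a slot index as HH:MM."""
    return f"{slot * 5 // 60:02d}:{slot * 5 % 60:02d}"
```

Filtering on the observed profile is a data-driven stand-in for the fixed 22:00-00:00 exclusion described above; with a longer collection period a slot might occasionally show congestion and re-enter the prediction set.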


Table III shows how long the traffic condition on road number 1206 stays unchanged. The first column gives the duration of the unchanged congestion condition. The second, third, and fourth columns give the number of samples with that duration for the high, moderate, and low conditions, and the last column gives the total of these three columns. The last row gives the average duration of an unchanged condition. On average, high congestion persists for 25 minutes 48 seconds, moderate congestion for 21 minutes 31 seconds, and low congestion for 23 minutes 33 seconds; the average over all conditions is 21 minutes 4 seconds.
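The durations in Table III are run lengths, i.e. the number of consecutive five-minute samples with the same degree. The same idea gives attribute 7 of the second model (duration of the current condition) and its three-way class. The sketch below assumes an unbroken list of five-minute samples for one link. It defines the class by comparing the degree 20 minutes ahead with the current degree; the paper's "within 20 minutes" could also be read as any change inside the window, which would need a different rule. All names are hypothetical.

```python
from itertools import groupby

STEP_MIN = 5                     # one congestion sample every five minutes
RANK = {"L": 0, "M": 1, "H": 2}  # low < moderate < high

def run_durations(series):
    """(degree, minutes) for each maximal run of an unchanged degree (Table III style).

    Runs cut off by the start or end of the series are counted as they appear.
    """
    return [(d, STEP_MIN * len(list(g))) for d, g in groupby(series)]

def average_duration(runs, degree=None):
    """Mean run length in minutes, for one degree or for all degrees."""
    mins = [m for d, m in runs if degree is None or d == degree]
    return sum(mins) / len(mins)

def current_duration(series, t):
    """Minutes the degree at index t has lasted so far (attribute 7 of the second model)."""
    n = 1
    while t - n >= 0 and series[t - n] == series[t]:
        n += 1
    return STEP_MIN * n

def change_label(series, t, horizon_min=20):
    """UP / EQUAL / DOWN from the degree horizon_min ahead versus the degree now (assumed rule)."""
    future = series[t + horizon_min // STEP_MIN]
    delta = RANK[future] - RANK[series[t]]
    if delta > 0:
        return "UP"
    if delta < 0:
        return "DOWN"
    return "EQUAL"
```

Because unchanged conditions last roughly 20 minutes on average, a 20-minute horizon is about the point where a change becomes likely, which is the paper's motivation for choosing it.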

TABLE III
CONGESTION DURATION OF LINK 1206

Duration   H         M         L         Total
0:05:00    10        16        5         31
0:10:00    17        31        13        61
0:15:00    5         15        2         22
0:20:00    8         28        10        46
0:25:00    3         8         2         13
0:30:00    6         11        9         26
0:35:00    0         9         4         13
0:40:00    2         13        4         19
0:45:00    1         1         2         4
0:50:00    2         9         4         15
0:55:00    10        16        5         31
1:00:00    17        31        13        61
Average    0:25:48   0:21:31   0:23:33   0:21:04

V. EXPERIMENT

A. Performance measures
We tested our prediction model on the congestion conditions of road number 1206 using 10-fold cross-validation [7]. In addition, we use six measures, all calculated from the confusion matrix, to evaluate the effectiveness of the model. True Positive (TP) is the proportion of correctly identified instances to all instances of that class. False Positive (FP) is the proportion of instances from other classes that were incorrectly classified as that class, relative to all instances from the other classes. True Negative (TN) is the proportion of instances that were correctly classified as negative to all negative examples. False Negative (FN) is the proportion of instances of the particular class that were incorrectly classified as negative, relative to all actual instances of that class. Precision (P) is the proportion of correct predictions to all instances predicted as that class. Accuracy (AC) is the proportion of correct predictions to all instances.

The performance of a machine learning classifier is typically evaluated with a confusion matrix [8], as shown in Table IV. The columns show the predicted class and the rows show the actual class. There are three classes: UP means that traffic congestion increases, EQUAL means that traffic congestion does not change, and DOWN means that traffic congestion decreases. For example, block A, in row UP and column UP, is the case in which the decision tree predicts that congestion will increase and congestion actually increases. Block B, in row UP and column EQUAL, is the case in which the tree predicts that congestion will not change, but congestion actually increases within 20 minutes.

TABLE IV
CONFUSION MATRIX FOR PREDICTING CHANGES OF TRAFFIC CONGESTION

                    Predicted
                    UP    EQUAL   DOWN
Actual   UP         A     B       C
         EQUAL      D     E       F
         DOWN       G     H       I

$$\mathrm{TP}(up) = \frac{A}{A+B+C} \tag{1}$$

$$\mathrm{FP}(up) = \frac{D+G}{D+G+E+F+H+I} \tag{2}$$

$$\mathrm{TN}(up) = \frac{E+F+H+I}{D+G+E+F+H+I} \tag{3}$$

$$\mathrm{FN}(up) = \frac{B+C}{A+B+C} \tag{4}$$

$$\mathrm{P}(up) = \frac{A}{A+D+G} \tag{5}$$

$$\mathrm{AC} = \frac{A+E+I}{A+B+C+D+E+F+G+H+I} \tag{6}$$

Equations (1)-(5) show how the measures are calculated for the UP class; the measures for the EQUAL and DOWN classes are obtained in the same way, and (6) gives the overall accuracy.
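Equations (1)-(6) generalize to any class k of a square confusion matrix laid out as in Table IV (rows are actual classes, columns are predicted classes). The sketch below is a minimal standard-library version under that layout; `class_measures` and `overall_accuracy` are hypothetical names, and the toy matrix is made up for illustration, not data from the paper.

```python
def class_measures(cm, k):
    """TP, FP, TN, FN and P for class index k, following equations (1)-(5).

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    For k = 0 (UP) the intermediate sums are the letters of Table IV.
    """
    n = len(cm)
    actual_k = sum(cm[k])                                   # A+B+C
    tp = cm[k][k]                                           # A
    predicted_k = sum(cm[i][k] for i in range(n))           # A+D+G
    fp = predicted_k - tp                                   # D+G
    others = sum(sum(cm[i]) for i in range(n) if i != k)    # D+G+E+F+H+I
    return {
        "TP": tp / actual_k,               # (1)
        "FP": fp / others,                 # (2)
        "TN": (others - fp) / others,      # (3): E+F+H+I over the same denominator
        "FN": (actual_k - tp) / actual_k,  # (4): B+C over A+B+C
        "P": tp / predicted_k,             # (5)
    }

def overall_accuracy(cm):
    """Equation (6): diagonal over all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Toy matrix (made-up counts); rows and columns ordered UP, EQUAL, DOWN.
cm = [[6, 3, 1],
      [2, 80, 4],
      [0, 2, 8]]
up = class_measures(cm, 0)
```

Note that TP and FN always sum to 1, as do TN and FP, so only TP, FP and P carry independent per-class information.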

B. Experimental Results

TABLE V
PREDICTION PERFORMANCE OF PREDICTING CHANGES OF TRAFFIC CONGESTION

Congestion change   UP       EQUAL    DOWN
Data ratio          6.44%    84.23%   9.33%
True positive       58.56%   96.67%   73.97%
True negative       98.55%   68.77%   98.21%
False positive      1.45%    31.23%   1.79%
False negative      41.44%   3.33%    26.03%
Precision           73.58%   94.29%   81.00%
Accuracy            92.1%
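Because every rate in Tables I and V is normalized by a class total, the columns are tied together through the data ratio r_k. With N samples, class k has r_k N actual instances, so A = TP_k r_k N and D+G = FP_k (1 - r_k) N. Substituting into (5) and (6) gives P_k = TP_k r_k / (TP_k r_k + FP_k (1 - r_k)) and AC = sum over k of TP_k r_k. The short check below applies these identities to the rounded Table V percentages; it is an illustrative consistency check, not part of the paper's method, and agreement is only expected up to rounding.

```python
# Table V rows as fractions, keyed by class.
ratio = {"UP": 0.0644, "EQUAL": 0.8423, "DOWN": 0.0933}
tp_rate = {"UP": 0.5856, "EQUAL": 0.9667, "DOWN": 0.7397}
fp_rate = {"UP": 0.0145, "EQUAL": 0.3123, "DOWN": 0.0179}
reported_p = {"UP": 0.7358, "EQUAL": 0.9429, "DOWN": 0.8100}

def precision_from_rates(tp, fp, r):
    """P = TP*r / (TP*r + FP*(1 - r)), derived from equations (1), (2) and (5)."""
    return tp * r / (tp * r + fp * (1 - r))

implied_p = {c: precision_from_rates(tp_rate[c], fp_rate[c], ratio[c]) for c in ratio}
implied_ac = sum(tp_rate[c] * ratio[c] for c in ratio)  # equation (6) as a weighted sum

for c in ratio:
    print(f"{c:5s} implied P = {implied_p[c]:.4f}  reported P = {reported_p[c]:.4f}")
print(f"implied AC = {implied_ac:.4f}  reported AC = 0.921")
```

The same identities hold for Table I with the H, M and L columns.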


Table V shows the performance of the change prediction model. The overall accuracy is 92.1%, similar to that of the first model (Table I). However, the accuracy on the portion in which the congestion condition changes increases: the true-positive rates of the UP and DOWN classes are 58.56% and 73.97%, whereas the first model reaches only 41.72% on the changing cases (Table II). The accuracy and the proportion of the UP and DOWN classes are still lower than those of the EQUAL class, which indicates that a class imbalance problem may have arisen when we constructed the decision trees. This problem occurs when one class has far fewer examples than the others; as a result, the classifier cannot learn high-coverage rules for the minority class. The lower true-positive rates and higher false-negative rates of the UP and DOWN classes reveal the effect of this problem, since the learned rules cover few of the training instances of these classes. We are currently investigating an enhanced version of this model to solve this imbalance problem.
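The paper evaluates with 10-fold cross-validation [7] but does not say whether the folds were stratified. With UP and DOWN making up only 6.44% and 9.33% of the samples, a stratified split, which keeps each fold's class mix close to the overall one, is a common way to make sure every test fold contains minority examples. The sketch below is a minimal standard-library version; `stratified_folds` and `cross_validation_splits` are hypothetical helpers, not the authors' procedure, and stratification alone does not correct the imbalance during training.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Indices of each class are shuffled and dealt round-robin; a shared counter
    keeps the fold sizes within one sample of each other.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    return folds

def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for j, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield train, test
```

Each fold then serves once as the test set while a decision tree is trained on the other nine, and the per-class measures of equations (1)-(6) are computed on the pooled test predictions.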

VI. CONCLUSION
This paper has presented two traffic congestion prediction models. Both use traffic congestion and other attributes to construct a prediction model. The first model, which predicts the future traffic congestion degree, achieves high overall accuracy. However, its results are unsatisfactory because, in most cases, it uses the current traffic congestion as the answer for the future traffic congestion, so it may not predict accurately when the traffic condition changes. Hence, we presented another model that predicts the change of the traffic congestion. This latter model predicts changes more accurately than the former one and may be more useful: commuters who use this information can know whether the traffic condition will change within the next 20 minutes and can plan their trips better.

REFERENCES
[1] T. Punyararj, K. Puntumapon, and W. Pattara-Atikom, "Bangkok traffic information portal using google-map & ajax techniques," in The 9th Intelligent Transport Systems Asia-Pacific Forum & Exhibition, 2008.
[2] J. Yaothanee and W. Pattara-Atikom, "Traffic reporter on a go," in The 9th Intelligent Transport Systems Asia-Pacific Forum & Exhibition, (Singapore), July 2008.
[3] Yaqin Wang, Yue Chen, Minggui Qin, and Yangyong Zhu, "Dynamic Traffic Prediction Based on Traffic Flow Mining," in The 6th World Congress on Intelligent Control and Automation, June 2006.
[4] Ming-bao Pang and Xin-ping Zhao, "Traffic Flow Prediction of Chaos Time Series by Using Subtractive Clustering for Fuzzy Neural Network Modeling," in Intelligent Information Technology Applications, 2007 Workshop on, pp. 23-27; 2008 Second International Symposium on Intelligent Information Technology Application, 2008.
[5] Wang Yan, Wang Hua, and Xia Limin, "Highway Traffic Prediction with Neural Network and Genetic Algorithms," in Vehicular Electronics and Safety, 2005.
[6] J. R. Quinlan, "Induction of decision trees," Machine Learning, pp. 81-106, 1986.
[7] Tom M. Mitchell, "Artificial Neural Networks," in Machine Learning, pp. 112, 1997.
[8] Howard Hamilton. (2009) Confusion Matrix. [Online]. Available: http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
