
Trip-Mine: An Efficient Trip Planning Approach with Travel Time Constraints

Eric Hsueh-Chan Lu and Chih-Yuan Lin
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
[email protected]; [email protected]

Vincent S. Tseng
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
[email protected]

Abstract—With the rapid development of wireless telecommunication technologies, a number of studies have been conducted on Location-Based Services (LBSs) because of their wide range of applications. One of the active topics is travel recommendation. Most previous studies focused on recommending attractions or trips based on the user's location. However, such recommendations may not satisfy the user's travel time constraint. Moreover, the efficiency of trip planning is sensitive to the size of the travel region. In this paper, we propose a novel data mining-based approach, namely Trip-Mine, to efficiently find the optimal trip that satisfies the user's travel time constraint based on the user's location. Furthermore, we propose three optimization mechanisms based on Trip-Mine to further improve the mining efficiency and reduce the memory required for optimal trip finding. To the best of our knowledge, this is the first work that takes efficient trip planning and travel time constraints into account simultaneously. Finally, we performed extensive experimental evaluations, which show that our proposals deliver excellent results.

Keywords-Trip Planning; Travel Time Constraint; Location-based Service; Data Mining.

I. INTRODUCTION

The advancement of wireless communication techniques and the popularity of mobile devices, such as mobile phones, PDAs, and GPS-enabled devices, have given rise to a number of Location-Based Services (LBSs). One of the popular topics is travel recommendation. For a tourist traveling to an unfamiliar city, the most important preparation is planning the trip, and many tourists need help with this. For example, is there a trip plan by which I can visit the most popular attractions in five hours? Although a number of web-based search agents can retrieve related travel information, searching them manually is not efficient. To answer this question, the first problem we face is "which attractions are popular and interesting". A number of studies have discussed how to recommend attractions to tourists. Cyberguide [1] is the first project to provide a hand-held context-aware tour guide. A Bayesian network [5] and an enhanced collaborative filtering (CF) solution [6] have been proposed for recommending tourist attractions at a given destination and for location-based recommendation, respectively. Besides, a HITS-based inference model [15] has been proposed to mine interesting locations from multiple users' GPS trajectories. Hence, information about which attractions are popular and interesting is easy to obtain from these works.

The second problem is "how to plan these attractions as a trip". Many works have been proposed for trip recommendation, including a travel agent [11], rule-based tour planning [13], and an ontological recommendation multi-agent [8], all of which recommend a trip plan for the traveler. The problem of automatically finding semantically annotated sequences [7] has been addressed based on geotagged photos. However, few works consider how much time the tourist has. No matter how interesting a trip is, it does not match the tourist's expectation if the time it needs exceeds the time the tourist has. Hence, a better trip plan should consider not only how interesting the trip is but also whether the trip time fits the tourist's schedule.

In general, to provide interesting or popular trip recommendations under a travel time constraint, we have to deal with the following two questions: 1) How to evaluate the interest or popularity of an attraction? 2) How to estimate the time needed for a trip? For the first question, a number of studies have addressed this issue, as discussed above. Hence, we do not focus on this problem in this paper. We assume that the interest of an attraction can be expressed as a rating score. For the second question, the time needed for a trip is composed of the moving time between attractions and the stay time at attractions. The moving time is approximately estimated from the average travel speed on each route, taken from traffic profiles. For the stay time of an attraction, we assume that the information is suggested by the attraction's management. Thus, the score and the time needed for a trip can be easily evaluated.
However, for a travel destination, e.g., New York City, there exists a large number of trip itineraries that satisfy the tourist's travel time constraint. It is not efficient to scan all the trip itineraries and select the most interesting one. Hence, "how to efficiently search for the most interesting trip which satisfies a travel time constraint from a large number of attractions" is an important issue. We call this the optimal trip finding problem. Take Fig. 1 as an example. There are four attractions, i.e., a1 to a4. Each attraction has a rating score and a stay time. For every two attractions, the approximate travel time is provided. The optimal trip finding problem is to generate a sequence of attractions starting from the start location such that the trip satisfies the user's travel time constraint. Besides, we want the total rating score of the optimal trip to be maximal.

Figure 1. Example of a trip map network. The current location is au and the travel time constraint is tc = 210. RS(a1) = 5, ST(a1) = 30; RS(a2) = 3, ST(a2) = 20; RS(a3) = 8, ST(a3) = 30; RS(a4) = 10, ST(a4) = 50. Edge labels give the travel time between two locations.

The optimal trip finding problem is different from the traditional Traveling Salesman Problem (TSP) [2]. The reason is that TSP finds the shortest path through exactly n given locations. In other words, all n locations have to be involved. In contrast, the proposed optimal trip finding problem is to find a sub-permutation of the n given locations. Furthermore, the optimal trip finding problem is an NP-complete problem. Essentially, a key challenge of this problem is that the computational cost can increase dramatically as the number of attractions increases, since the problem is combinatorial in nature. In the past, Lu et al. [10] tried to improve the efficiency of route search by means of dynamic programming. Although dynamic programming improves the efficiency of trip finding, the computational cost still increases significantly as the number of attractions increases. Hence, an efficient solution is highly desired to find the optimal trip under a travel time constraint.

In this paper, we aim at developing an approach to efficiently find the optimal trip under a user-specific travel time constraint. We first give the simplest approach, called BruteForce, to discuss the time complexity of the optimal trip finding problem. Then, two properties, namely the Trip Time Closure Property and the Trip Score Constant Property, are presented to describe our observations on the optimal trip finding problem. Next, we propose a novel data mining-based approach, named Trip-Mine, and three optimization mechanisms, named Attraction Sorting, Low Bound Checking, and Score Checking, for efficient discovery of the optimal trip. To the best of our knowledge, this is the first work that takes efficient trip planning and travel time constraints into account simultaneously.
Finally, through an extensive experimental evaluation by simulation, we show that our proposals deliver excellent performance in terms of execution time and memory storage. The contributions of this paper are four-fold:

1) We study the optimal trip finding problem, which has not been well explored in the research community.
2) We propose Trip-Mine, a new approach for efficient optimal trip finding.
3) Based on Trip-Mine, we propose three optimization mechanisms for improving the mining efficiency and reducing the memory storage requirement.
4) We design a simulation model and conduct a series of experiments to evaluate the performance of our proposal. The results show superior performance over other mining techniques in terms of execution time and memory storage.

The remainder of this paper is organized as follows. We briefly review the related work in Section II. In Section III, we formulate the problem. In Section IV, we first introduce the proposed approach Trip-Mine. Then, we describe the proposed optimization mechanisms Attraction Sorting, Low Bound Checking, and Score Checking. In Section V, we perform an empirical performance evaluation. Finally, in Section VI, we summarize our conclusions and future work.

II. RELATED WORK

In this section, we review previous studies in the domain of travel recommendation and classify them into two categories: 1) attraction recommendation and 2) trip planning.

A. Attraction Recommendation
In the domain of travel recommendation, the first issue of interest is attraction recommendation. In [1], Abowd et al. propose the Cyberguide project, a series of prototypes of a mobile, hand-held context-aware tour guide. The idea of the Cyberguide project is to consider which activities could be best supported by mobile technology and then determine how the technology would have to work. They argue that a mobile application should take advantage of contextual information, such as position, to offer better services to the user. In [5], Huang et al. propose a personalized recommendation of tourist attractions at a given destination. They consider a Bayesian network to be a good method for combining the content-based and collaborative filtering approaches, since the capability of a purely content-based or purely collaborative filtering approach is limited in the travel domain. In [6], Horozov et al. investigate the issues of location-based points of interest (POI) in the context of a recommender system. They use location as a key criterion for generating recommendations based on the proposed enhanced collaborative filtering (CF) solution. In [15], Zheng et al. discuss the issue of mining interesting locations and classical travel sequences. They propose a HITS-based inference model to mine interesting locations and classical travel sequences from multiple users' GPS trajectories in a given geospatial region. Although the above works can recommend interesting attractions to the user, they do not consider how to plan a trip based on the discovered attractions.

B. Trip Planning
The second issue is trip planning. In [11], Soo et al. propose a travel agent that recommends a trip plan for a particular traveler. They consider communication and negotiation with the traveler to be necessary for recommending a satisfactory trip plan. A better travel agent should satisfy not only the customer's personal preferences and constraints, but also the complex spatial, temporal, physical, and cost constraints. In [13], Yu et al. propose rule-based tour planning and design methods for facilitating the delivery of location-based recommendation services. Various recommendations can be generated based on the tourist's current location and time, as well as personal preferences and needs. However, this work needs a travel expert to define the travel rules, so it may not be suitable for everybody. In [8], Lee et al. propose an ontological recommendation multi-agent for Tainan City travel. According to the tourist's requirements, the agent can recommend a personalized travel route in Tainan City. However, the numbers of attractions and restaurants are fixed in this agent. In [7], Kisilevich et al. address the problem of automatically finding semantically annotated sequences based on geotagged photos. However, they do not consider travel constraints and user preferences. In [14], Yoon et al. propose a smart recommendation, based on multiple user-generated GPS trajectories, to efficiently find itineraries. In this itinerary recommendation system, users need to provide a start point, an end point, and a travel duration. In [10], Lu et al. target the problem of automatic travel route planning. They try to improve the efficiency of route planning by means of dynamic programming. The dynamic programming searches all possible trips in a depth-first manner. In the process, the sub-tree of a node can be pruned if the trip time of the node is greater than the travel time constraint. Hence, the computational efficiency can be improved. A similar pruning idea is used in several data mining problems such as temporal moving object cluster mining [9], maximal frequent itemset mining [4], and utility pattern mining [12]. In [3], Ge et al. develop a mobile recommender that can recommend a sequence of pick-up points for taxi drivers. However, the efficiency of the above works is sensitive to the number of location points.

TABLE I. NOTATIONS.

| Notation | Description |
|---|---|
| a; A | An attraction; Attraction set |
| RS(a) | The rating score of the attraction a |
| ST(a) | The stay time of the attraction a |
| r; R | A route; Route set |
| M | Trip map network, M = (A, R) |
| as | A set of attractions, as = {a1, a2, ..., am} |
| tp | A trip, tp = $\langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$ |
| TPS(tp) | The trip score of the trip tp |
| TPT(tp) | The trip time of the trip tp |
| au | The current location of the user |
| tc | The travel time constraint |

III. PROBLEM STATEMENT

In this section, we first define some terms used in this work and then specify our research goal. Table I summarizes the notations used in the paper.

Definition 1. Attraction. A = {a1, a2, ..., a|A|} denotes the collection of attractions. For all ai ∈ A, the attraction ai has a Rating Score RS(ai) and a Stay Time ST(ai), which represent how interesting this attraction is and how long users usually stay at this attraction, respectively.

Definition 2. Route. R = {r1, r2, ..., r|R|} denotes the collection of routes. For all rj ∈ R, $r_j = (a_{x_j}, a_{y_j})$, where $a_{x_j}, a_{y_j} \in A$ represent two attractions and $x_j \neq y_j$. In this paper, each route is defined as the fastest route between $a_{x_j}$ and $a_{y_j}$. The measurement of each route rj is defined as the Travel Time $TT(a_{x_j}, a_{y_j})$ instead of distance. We assume that $TT(a_{x_j}, a_{y_j}) = TT(a_{y_j}, a_{x_j})$ and that the travel time is approximately estimated by using the average travel speed from the route profiles.

Definition 3. Trip Map. M = (A, R) denotes the trip map network. Take Fig. 1 as an example. There are one current location of the user, four attractions, i.e., a1 to a4, and 10 routes in the trip map network. For the attraction a1, the corresponding rating score is 5 and the stay time is 30, i.e., RS(a1) = 5 and ST(a1) = 30. The travel time of the route between a1 and a2 is 30, i.e., TT(a1, a2) = 30.

Definition 4. Trip. A trip $tp = \langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$, also denoted as an n-trip, is an ordered sequence of one or more attraction destinations, where n indicates the number of attractions (i.e., |tp| = n) and $a_r \in A, \forall k_1 \le r \le k_n$. For example, ⟨a1, a2⟩ is one of the trips in Fig. 1.

Definition 5. Trip Score. For a trip $tp = \langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$, the Trip Score TPS(tp) is defined as (1), which represents how interesting this trip is. For example, TPS(⟨a1, a2⟩) = 5 + 3 = 8.

$$TPS(tp) = \sum_{i=1}^{n} RS(a_{k_i}) \tag{1}$$

Definition 6. Trip Time. Let au be the current location of the user. For a trip $tp = \langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$, the Trip Time TPT(au, tp) is formulated as (2), which represents how long the user needs to complete this trip, starting from and returning to au. For example, TPT(au, ⟨a1, a2⟩) = TT(au, a1) + TT(a2, au) + TT(a1, a2) + ST(a1) + ST(a2) = 30 + 30 + 30 + 30 + 20 = 140.

$$TPT(a_u, tp) = TT(a_u, a_{k_1}) + TT(a_{k_n}, a_u) + \sum_{j=1}^{n-1} TT(a_{k_j}, a_{k_{j+1}}) + \sum_{i=1}^{n} ST(a_{k_i}) \tag{2}$$
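Definitions 5 and 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The rating scores and stay times are those of Fig. 1; the route travel times in `TRAVEL` are an assumption, reconstructed from the edge labels of Fig. 1 and the trip times quoted in the text (for example TT(au, a1) = TT(a1, a2) = TT(a2, au) = 30). The names `RS`, `ST`, `TRAVEL`, `tt`, `trip_score`, and `trip_time` are hypothetical.

```python
# Rating scores and stay times from Fig. 1.
RS = {"a1": 5, "a2": 3, "a3": 8, "a4": 10}
ST = {"a1": 30, "a2": 20, "a3": 30, "a4": 50}

# Assumed symmetric travel times, reconstructed from Fig. 1 and the quoted trip times.
TRAVEL = {
    ("au", "a1"): 30, ("au", "a2"): 30, ("au", "a3"): 20, ("au", "a4"): 50,
    ("a1", "a2"): 30, ("a1", "a3"): 40, ("a1", "a4"): 50,
    ("a2", "a3"): 20, ("a2", "a4"): 65, ("a3", "a4"): 50,
}


def tt(x, y):
    """Travel time TT(x, y); symmetric by the assumption of Definition 2."""
    return TRAVEL[(x, y)] if (x, y) in TRAVEL else TRAVEL[(y, x)]


def trip_score(trip):
    """Trip score TPS(tp), Eq. (1): sum of the rating scores."""
    return sum(RS[a] for a in trip)


def trip_time(trip, start="au"):
    """Trip time TPT(au, tp), Eq. (2): round trip from `start` plus stay times."""
    stops = [start, *trip, start]
    travel = sum(tt(x, y) for x, y in zip(stops, stops[1:]))
    return travel + sum(ST[a] for a in trip)


print(trip_score(("a1", "a2")))  # 8
print(trip_time(("a1", "a2")))   # 140
```

With the assumed travel times, the sketch reproduces the worked examples in the text, TPS(⟨a1, a2⟩) = 8 and TPT(au, ⟨a1, a2⟩) = 140.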

Definition 7. Travel Time Constraint. The travel time constraint is defined as how much time the user has in this trip. Take Fig. 1 as an example. A user has 210 (minutes) in a trip. The travel time constraint specified by the user is 210 (minutes).

Definition 8. Valid Trip. Let au and tc be the current location of the user and the user-specific travel time constraint, respectively. A trip $tp = \langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$ is called a Valid Trip if TPT(au, tp) is less than or equal to tc; otherwise, it is an Invalid Trip. In Fig. 1, the current location of the user is au and the travel time constraint tc is 210. tp = ⟨a1, a2⟩ is a valid trip since TPT(au, tp) = 140, which is less than tc. However, tp' = ⟨a2, a4⟩ is not a valid trip since TPT(au, tp') = 215, which is greater than tc.

Definition 9. Optimal Trip. Let au and tc be the current location of the user and the user-specific travel time constraint, respectively. A valid trip $tp = \langle a_{k_1}, a_{k_2}, \ldots, a_{k_n} \rangle$ is called the Optimal Trip if there is no valid trip tp' such that TPS(tp') is greater than TPS(tp). Note that there may be more than one optimal trip under some constraints. Take Fig. 1 as an example. There are several valid trips in the trip map network, e.g., ⟨a1⟩, ⟨a1, a2⟩, ⟨a1, a2, a3⟩, etc. The optimal trip is ⟨a3, a4⟩ since no other valid trip has a greater trip score.

Problem Formulation. With the above definitions, the main problem we address in this paper is formulated as follows. Given a current location of the user and a user-specific travel time constraint, our goal is to develop a trip planner which provides the optimal trip. We expect the planner to return the answer efficiently and accurately.

IV. PROPOSED METHOD

In this paper, three important research issues need to be addressed: 1) how to obtain the optimal trip; 2) how to improve the efficiency of trip computation; and 3) how to reduce the memory cost. In this section, we first give the simplest approach, called BruteForce, to discuss the time complexity of the optimal trip finding problem. Then, two properties, namely the Trip Time Closure Property and the Trip Score Constant Property, are presented to describe our observations on the optimal trip finding problem. Next, we describe our proposed approach, namely Trip-Mine, for a location-based service that finds the optimal trip based on the user's current location and travel time constraint. Finally, based on Trip-Mine, we propose three optimization mechanisms, namely Attraction Sorting, Low Bound Checking, and Score Checking, for efficient discovery of the optimal trip.

A. The Brute Force Approach
For the optimal trip finding problem, the simplest approach, called BruteForce, is to generate all possible trips for the whole trip map network. Next, we calculate the trip time of every trip and obtain all the valid trips. Finally, the optimal trip is found by calculating the trip scores of all valid trips.

Theorem 1. Let n be the number of target attractions. The number of possible trips Complexity(n) is formulated as (3).

$$Complexity(n) = \sum_{i=1}^{n} \left( C_i^n \times i! \right) \tag{3}$$

Proof. The lengths of possible trips range from 1 to n since the number of attractions is n. For trips of length i, the number of combinations is $C_i^n$. For each combination of length i, the number of permutations is i!. Hence, the number of possible trips is the sum, for i from 1 to n, of the number of combinations multiplied by the number of permutations.

Take Fig. 1 as an example again. Suppose that the current location of the user is au and the travel time constraint tc is 210. The number of target attractions is 4, i.e., a1 to a4. At first, we generate all the possible trips and calculate their trip times and trip scores. Table II shows some of the possible trips. From this table, the optimal trips ⟨a3, a4⟩ and ⟨a4, a3⟩ are obtained by checking their trip times and trip scores. Based on Theorem 1, the number of possible trips is 64 even when the number of attractions is only 4 (4·1! + 6·2! + 4·3! + 1·4! = 4 + 12 + 24 + 24 = 64). Obviously, evaluating all these trips takes exponential time, and this is an NP-complete problem. Hence, an efficient mining approach is crucial. With our proposed Trip-Mine, the number of calculations of trip time and trip score is significantly reduced, and that is the basis of our solutions. The two observed properties and the three optimization mechanisms are discussed in the following subsections.

TABLE II. THE PARTIAL POSSIBLE TRIPS.

| Trip | Time | Score | Trip | Time | Score |
|---|---|---|---|---|---|
| ⟨a1⟩ | 90 | 5 | ⟨a1, a2, a3⟩ | 180 | 16 |
| ⟨a2⟩ | 80 | 3 | ⟨a1, a3, a2⟩ | 200 | 16 |
| ⟨a3⟩ | 70 | 8 | ⟨a2, a1, a3⟩ | 200 | 16 |
| ⟨a4⟩ | 150 | 10 | ⟨a2, a3, a1⟩ | 200 | 16 |
| ⟨a1, a2⟩ | 140 | 8 | ⟨a3, a1, a2⟩ | 200 | 16 |
| ⟨a1, a3⟩ | 150 | 13 | ⟨a3, a2, a1⟩ | 180 | 16 |
| ⟨a1, a4⟩ | 210 | 15 | ⟨a1, a2, a4⟩ | 275 | 18 |
| ⟨a2, a1⟩ | 140 | 8 | ⟨a1, a3, a4⟩ | 280 | 23 |
| ⟨a2, a3⟩ | 120 | 11 | ⟨a2, a3, a4⟩ | 250 | 21 |
| ⟨a2, a4⟩ | 215 | 13 | 4-trip | 310 | 26 |
| ⟨a3, a1⟩ | 150 | 13 | 4-trip | 325 | 26 |
| ⟨a3, a2⟩ | 120 | 11 | 4-trip | 325 | 26 |
| ⟨a3, a4⟩ | 200 | 18 | … | … | … |
| ⟨a4, a1⟩ | 210 | 15 | | | |
| ⟨a4, a2⟩ | 215 | 13 | | | |
| ⟨a4, a3⟩ | 200 | 18 | | | |

B. Downward Closure Property of Trip Time
Theorem 2. Given three attractions ax, ay, and az, let the three corresponding routes be rxy = (ax, ay), ryz = (ay, az), and rxz = (ax, az). The sum of the travel times of any two routes must be greater than or equal to the travel time of the third route.

$$\begin{cases} TT(a_x, a_y) + TT(a_y, a_z) \ge TT(a_x, a_z) \\ TT(a_x, a_y) + TT(a_x, a_z) \ge TT(a_y, a_z) \\ TT(a_y, a_z) + TT(a_x, a_z) \ge TT(a_x, a_y) \end{cases}$$

Proof. Take the first inequality as an example. If the travel time of rxz were greater than the sum of the travel times of rxy and ryz, there would exist a faster route between ax and az, i.e., from ax to az through ay, which contradicts the definition of rxz as the fastest route. Hence, the sum of the travel times of rxy and ryz is an upper bound of the travel time of rxz.

Theorem 3. Given two trips tp and tp', and the current location of the user au, TPT(au, tp) ≤ TPT(au, tp') if tp is a sub-trip of tp'. Take Fig. 1 as an example. Let tp = ⟨a1, a2⟩, tp' = ⟨a1, a2, a3⟩, and let the current location of the user be au. Then TPT(au, tp) = 140 ≤ TPT(au, tp') = 180.

Proof. We first prove the case in which tp' has exactly one more attraction a' than tp (i.e., |tp'| - |tp| = 1). This case can be divided into three sub-cases: 1) a' is the first attraction of tp': suppose that the first attraction of tp is a1 (the second attraction of tp' is also a1). Then TPT(au, tp') - TPT(au, tp) = TT(au, a') + ST(a') + TT(a', a1) - TT(au, a1). Based on Theorem 2, TT(au, a') + TT(a', a1) ≥ TT(au, a1), and ST(a') ≥ 0. Hence, TPT(au, tp') - TPT(au, tp) = TT(au, a') + TT(a', a1) - TT(au, a1) + ST(a') ≥ 0, and finally TPT(au, tp') ≥ TPT(au, tp); 2) a' is the last attraction of tp': similarly to case 1, suppose that the last attraction of tp is an. Then TPT(au, tp') - TPT(au, tp) = TT(an, a') + ST(a') + TT(a', au) - TT(an, au). Since the right-hand side is greater than or equal to 0, TPT(au, tp') ≥ TPT(au, tp); and 3) a' is a middle attraction of tp': suppose that the attractions immediately before and after a' are alast and anext, respectively. Then TPT(au, tp') - TPT(au, tp) = TT(alast, a') + ST(a') + TT(a', anext) - TT(alast, anext). The right-hand side is again not less than 0; hence, TPT(au, tp') ≥ TPT(au, tp).

The case in which tp' has several more attractions than tp (i.e., |tp'| - |tp| > 1) can be proved by mathematical induction based on the above basic case. For example, suppose that |tp'| - |tp| = k, which means there are k attractions in tp' but not in tp. We select one of these attractions and combine it with tp to form tp*. Based on the above proof, TPT(au, tp*) ≥ TPT(au, tp). Next, we replace tp with tp* and add another unselected attraction to form a new tp*, until tp* is equal to tp'. This proves that TPT(au, tp') ≥ TPT(au, tp).

Pruning Strategy 1. Trip Time Downward Closure Pruning. Based on Theorem 3, given an invalid trip tp, any super-trip of tp must be an invalid trip; all the super-trips of tp can be pruned. In other words, a trip is invalid if it contains any invalid trip.

Based on the trip time downward closure pruning strategy, we can prune the trips which contain any invalid trip, reducing the number of redundant calculations of trip time and trip score. All the possible trips are generated from length 1 to k, where k is the number of attractions. Before calculating the trip time of a trip, we check whether the trip contains any invalid trip. If it does, the trip can be directly determined to be invalid without calculating its trip time. Take Table II as an example. The trip ⟨a1, a2, a4⟩ can be determined to be an invalid trip since it contains ⟨a2, a4⟩, which is an invalid trip. Although the trip time downward closure pruning can effectively reduce the calculation cost of trip time and trip score, it still needs to generate all possible trips, and the problem is still NP-complete. Besides, the containment check needs to scan all discovered invalid trips.

TABLE III. THE ATTRACTION COMBINATIONS.

(a) All Candidate Trips.

| Candidate Trip | Score |
|---|---|
| {a1} | 5 |
| {a2} | 3 |
| {a3} | 8 |
| {a4} | 10 |
| {a1, a2} | 8 |
| {a1, a3} | 13 |
| {a1, a4} | 15 |
| {a2, a3} | 11 |
| {a2, a4} | 13 |
| {a3, a4} | 18 |
| {a1, a2, a3} | 16 |
| {a1, a2, a4} | 18 |
| {a1, a3, a4} | 23 |
| {a2, a3, a4} | 21 |
| {a1, a2, a3, a4} | 26 |

(b) {a1, a2, a3} Checking.

| Trip | Time |
|---|---|
| ⟨a1, a2, a3⟩ | 180 |
| ⟨a1, a3, a2⟩ | 200 |
| ⟨a2, a1, a3⟩ | 200 |
| ⟨a2, a3, a1⟩ | 200 |
| ⟨a3, a1, a2⟩ | 200 |
| ⟨a3, a2, a1⟩ | 180 |

(c) {a1, a2, a3, a4} Checking. Times of the listed trips: 310, 325, 325, 345, 325, 310, …
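The BruteForce enumeration of Section IV.A, the count of Theorem 1, and Pruning Strategy 1 can be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions, not the authors' code: the travel times in `TRAVEL` are reconstructed from Fig. 1 and the trip times in Table II, and the pruned search only applies the downward closure property to prefixes (a trip is extended by appending one attraction, and an invalid trip is never extended), which is a special case of Theorem 3. All function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial

RS = {"a1": 5, "a2": 3, "a3": 8, "a4": 10}   # rating scores (Fig. 1)
ST = {"a1": 30, "a2": 20, "a3": 30, "a4": 50}  # stay times (Fig. 1)
# Assumed travel times, reconstructed from Fig. 1 and Table II.
TRAVEL = {
    ("au", "a1"): 30, ("au", "a2"): 30, ("au", "a3"): 20, ("au", "a4"): 50,
    ("a1", "a2"): 30, ("a1", "a3"): 40, ("a1", "a4"): 50,
    ("a2", "a3"): 20, ("a2", "a4"): 65, ("a3", "a4"): 50,
}


def tt(x, y):
    return TRAVEL[(x, y)] if (x, y) in TRAVEL else TRAVEL[(y, x)]


def trip_score(trip):
    return sum(RS[a] for a in trip)


def trip_time(trip, start="au"):
    stops = [start, *trip, start]
    return sum(tt(x, y) for x, y in zip(stops, stops[1:])) + sum(ST[a] for a in trip)


def count_possible_trips(n):
    """Eq. (3): sum over i of C(n, i) * i!."""
    return sum(comb(n, i) * factorial(i) for i in range(1, n + 1))


def brute_force(tc):
    """Evaluate every possible trip; return ((best score, best trip), trips evaluated)."""
    best, evaluated = (0, ()), 0
    for k in range(1, len(RS) + 1):
        for trip in permutations(RS, k):
            evaluated += 1
            if trip_time(trip) <= tc and trip_score(trip) > best[0]:
                best = (trip_score(trip), trip)
    return best, evaluated


def pruned_search(tc):
    """Depth-first search that never extends an invalid trip (prefix form of Theorem 3)."""
    best, evaluated = (0, ()), 0
    stack = [(a,) for a in RS]
    while stack:
        trip = stack.pop()
        evaluated += 1
        if trip_time(trip) > tc:
            continue  # every super-trip of an invalid trip is invalid as well
        if trip_score(trip) > best[0]:
            best = (trip_score(trip), trip)
        stack.extend(trip + (a,) for a in RS if a not in trip)
    return best, evaluated


print(count_possible_trips(4))   # 64
print(brute_force(210))
print(pruned_search(210))
```

On this four-attraction example both searches reach the optimal trip score of 18, and the pruned search evaluates fewer trips than the 64 enumerated by BruteForce.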

C. Trip Score Constant Property
The BruteForce method can find all the optimal trips under any trip map network and travel time constraint. In Table II, there are two optimal trips, i.e., ⟨a3, a4⟩ and ⟨a4, a3⟩, when the travel time constraint is set to 210. However, it is unnecessary to return all the optimal trips to the user. We only recommend one of the best valid trips to the user. For a trip, the trip score is constant regardless of the order of its attractions. Take Table II as an example. The trip scores of ⟨a1, a2, a3⟩, ⟨a1, a3, a2⟩, and ⟨a2, a1, a3⟩ are all 16. Based on this idea, we first find all the attraction combinations. Next, we only need to check whether each combination yields a valid trip, since the trip score of each combination is constant.

Definition 10. Candidate Attraction Set. A candidate attraction set $cas = \{a_{k_1}, a_{k_2}, \ldots, a_{k_n}\}$, also denoted as a candidate n-set, is a set of attractions, where n indicates the number of attractions (i.e., |cas| = n) and $a_r \in A, \forall k_1 \le r \le k_n$. For example, {a1, a2} is one of the candidate attraction sets in Fig. 1.

Definition 11. Valid Attraction Set. A candidate attraction set $cas = \{a_{k_1}, a_{k_2}, \ldots, a_{k_n}\}$ is called a valid attraction set, also denoted as a valid n-set, if there exists at least one permutation of cas which satisfies the travel time constraint. Take Fig. 1 as an example. {a1, a2} has two permutations, i.e., ⟨a1, a2⟩ and ⟨a2, a1⟩. {a1, a2} is a valid attraction set (valid 2-set) since both ⟨a1, a2⟩ and ⟨a2, a1⟩ satisfy the travel time constraint.

Table III(a) shows all the candidate attraction sets for Fig. 1. The candidate 1-sets, i.e., {a1}, {a2}, {a3}, and {a4}, only need a direct check of their trip time, e.g., {a1} only requires checking TPT(au, ⟨a1⟩), since each has only one permutation (the attraction set itself). The candidate 2-sets, i.e., {a1, a2} to {a3, a4}, are also checked directly, e.g., {a1, a2} only requires checking TPT(au, ⟨a1, a2⟩), since the trip times of their permutations are the same. However, from the candidate 3-sets onward, i.e., {a1, a2, a3} to {a1, a2, a3, a4}, the trip times of the permutations differ. Take {a1, a2, a3} as an example. The trip times of ⟨a1, a2, a3⟩ and ⟨a1, a3, a2⟩ are 180 and 200, respectively. Hence, we need to check the trip times of the permutations. In the checking process, we do not need to check all the permutations: once we find one valid trip tpValid, the trip scores of the other permutations are the same as the trip score of tpValid, even if other valid trips exist. Take Table III(b) as an example. For the candidate 3-set {a1, a2, a3}, the trip checking process can be terminated at the first permutation, i.e., ⟨a1, a2, a3⟩, since its trip time already satisfies the travel time constraint (tc = 210). The other permutations do not need to be checked since their trip scores are equal to that of ⟨a1, a2, a3⟩. Hence, {a1, a2, a3} is a valid 3-set. But for the candidate 4-set {a1, a2, a3, a4} in Table III(c), we have to check all the permutations since all of them are invalid. {a1, a2, a3, a4} is not a valid 4-set. Therefore, the computational complexity can be reduced from (3) to (4), where c is a constant. The best and worst cases of c are 1 and i!, respectively.

$$Complexity(n) = \sum_{i=1}^{n} \left( C_i^n \times c \right) \tag{4}$$

With the above discussions, the optimal trip finding problem is reduced to finding all combinations of the attractions, i.e., finding all candidate attraction sets. However, it is still an NP-complete problem. Based on our observations, there are many candidate attraction sets which cannot include any valid trip. Such candidate attraction sets are called redundant candidate attraction sets. To improve the efficiency of optimal trip finding, the redundant candidate attraction sets have to be pruned as much as possible.

Figure 2. Apriori-based optimal trip generation. Candidate 1-sets (time, score): {a1} (90, 5), {a2} (80, 3), {a3} (70, 8), {a4} (150, 10). Candidate 2-sets (time, score): {a1, a2} (140, 8), {a1, a3} (150, 13), {a1, a4} (210, 15), {a2, a3} (120, 11), {a2, a4} (215, 13), {a3, a4} (200, 18). Candidate 3-sets (score; time determined by permutation checking): {a1, a2, a3} (16), {a1, a2, a4} (18), {a1, a3, a4} (23).

Figure 3. Apriori-based optimal trip generation with attraction sorting. Candidate 1-sets (time, score): {a4} (150, 10), {a1} (90, 5), {a2} (80, 3), {a3} (70, 8). Candidate 2-sets (time, score): {a4, a1} (210, 15), {a4, a2} (215, 13), {a4, a3} (200, 18), {a1, a2} (140, 8), {a1, a3} (150, 13), {a2, a3} (120, 11). Candidate 3-sets (score; time determined by permutation checking): {a4, a1, a3} (23), {a1, a2, a3} (16).

D. Trip-Mine Approach
In this sub-section, we describe our proposed Trip-Mine approach. The design idea of Trip-Mine is that we first generate the candidate attraction sets of length k (k starts from 1) and check whether the candidate attraction sets are valid or not. The valid attraction set checking process was discussed in the previous sub-section. Hence, we first describe how to generate candidate attraction sets without generating redundant ones, according to the trip time downward closure pruning property.
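The valid attraction set check of Definition 11, with the early termination described for Table III(b) and (c), might look as follows in Python. This is a sketch under assumptions: the travel times are reconstructed from Fig. 1 and the tables, the permutations are examined in sorted order so that the result is deterministic, and `first_valid_trip` is a hypothetical helper name.

```python
from itertools import permutations

ST = {"a1": 30, "a2": 20, "a3": 30, "a4": 50}  # stay times (Fig. 1)
# Assumed travel times, reconstructed from Fig. 1 and Tables II and III.
TRAVEL = {
    ("au", "a1"): 30, ("au", "a2"): 30, ("au", "a3"): 20, ("au", "a4"): 50,
    ("a1", "a2"): 30, ("a1", "a3"): 40, ("a1", "a4"): 50,
    ("a2", "a3"): 20, ("a2", "a4"): 65, ("a3", "a4"): 50,
}


def tt(x, y):
    return TRAVEL[(x, y)] if (x, y) in TRAVEL else TRAVEL[(y, x)]


def trip_time(trip, start="au"):
    stops = [start, *trip, start]
    return sum(tt(x, y) for x, y in zip(stops, stops[1:])) + sum(ST[a] for a in trip)


def first_valid_trip(candidate_set, tc):
    """Return (valid trip or None, number of permutations examined).

    The loop stops at the first permutation whose trip time is within tc,
    because every permutation of the same set has the same trip score.
    """
    examined = 0
    for trip in permutations(sorted(candidate_set)):
        examined += 1
        if trip_time(trip) <= tc:
            return trip, examined
    return None, examined


print(first_valid_trip({"a1", "a2", "a3"}, 210))        # (('a1', 'a2', 'a3'), 1)
print(first_valid_trip({"a1", "a2", "a3", "a4"}, 210))  # (None, 24)
```

The two calls correspond to the best case (c = 1) and the worst case (c = i!) of the constant c in (4).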

Policy 1. Candidate Attraction Set Generation. Let $vas_1 = \{a_{x_1}, a_{x_2}, \ldots, a_{x_{k-1}}, a_{x_k}\}$ and $vas_2 = \{a_{y_1}, a_{y_2}, \ldots, a_{y_{k-1}}, a_{y_k}\}$ be two valid attraction sets of length k. The candidate attraction set cas of length k+1 is generated if $x_1 = y_1, x_2 = y_2, \ldots, x_{k-1} = y_{k-1}$, and $cas = \{a_{x_1}, a_{x_2}, \ldots, a_{x_{k-1}}, a_{x_k}, a_{y_k}\}$.

To explain the policy of candidate attraction set generation, we take Fig. 1 as an example. Suppose OPT = (trip, score) is used to store the optimal trip. Initially, OPT.trip = ∅ and OPT.score = 0. As shown in Fig. 2, we first generate 4 candidate 1-sets and directly check their trip times and trip scores. All the candidate 1-sets are valid. ⟨a4⟩ is determined to be the optimal trip since the trip score of ⟨a4⟩ is the highest. Hence, OPT is updated to (⟨a4⟩, 10). Next, pairs of valid 1-sets are joined into candidate 2-sets. There are 6 candidate 2-sets. By checking their trip times, we observe that {a2, a4} is not valid. Hence, {a2, a4} is pruned, and it does not take part in candidate 3-set generation. OPT is updated to (⟨a3, a4⟩, 18) since the trip score of ⟨a3, a4⟩ is the highest. There are 3 candidate 3-sets. For these candidate 3-sets, we need to generate their permutations and check whether any permutation satisfies the travel time constraint. For the candidate 3-sets {a1, a2, a4} and {a1, a3, a4}, no permutation is valid. Hence, both {a1, a2, a4} and {a1, a3, a4} are pruned. The optimal trip is still ⟨a3, a4⟩ since the trip score of ⟨a1, a2, a3⟩, i.e., 16, is not greater than the trip score of ⟨a3, a4⟩, i.e., 18. The algorithm stops since no further candidate attraction set is generated. Finally, the optimal trip can be obtained from OPT. Compared with Table III, the candidate attraction sets {a2, a3, a4}, {a1, a2, a3, a4}, and their permutations do not need to be generated and checked.
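The level-wise procedure walked through above can be sketched in Python as follows. This is a minimal illustration of the join in Policy 1 and the OPT update, not the authors' implementation. It assumes the travel times reconstructed from Fig. 1 and the tables, represents each attraction set as a tuple in the given attraction order, and uses hypothetical names (`trip_mine`, `first_valid_trip`, `levels`).

```python
from itertools import permutations

RS = {"a1": 5, "a2": 3, "a3": 8, "a4": 10}   # rating scores (Fig. 1)
ST = {"a1": 30, "a2": 20, "a3": 30, "a4": 50}  # stay times (Fig. 1)
# Assumed travel times, reconstructed from Fig. 1 and the tables.
TRAVEL = {
    ("au", "a1"): 30, ("au", "a2"): 30, ("au", "a3"): 20, ("au", "a4"): 50,
    ("a1", "a2"): 30, ("a1", "a3"): 40, ("a1", "a4"): 50,
    ("a2", "a3"): 20, ("a2", "a4"): 65, ("a3", "a4"): 50,
}


def tt(x, y):
    return TRAVEL[(x, y)] if (x, y) in TRAVEL else TRAVEL[(y, x)]


def trip_time(trip, start="au"):
    stops = [start, *trip, start]
    return sum(tt(x, y) for x, y in zip(stops, stops[1:])) + sum(ST[a] for a in trip)


def first_valid_trip(cas, tc):
    """First permutation of the candidate set within tc, or None."""
    return next((t for t in permutations(cas) if trip_time(t) <= tc), None)


def trip_mine(attractions, tc):
    """Level-wise search: check candidate k-sets, then join valid k-sets (Policy 1)."""
    opt_trip, opt_score = (), 0
    candidates = [(a,) for a in attractions]
    levels = []  # candidate sets generated at each length, for inspection
    while candidates:
        levels.append(candidates)
        valid = []
        for cas in candidates:
            trip = first_valid_trip(cas, tc)
            if trip is None:
                continue  # invalid sets never take part in the next join
            valid.append(cas)
            score = sum(RS[a] for a in cas)
            if score > opt_score:
                opt_trip, opt_score = trip, score
        # Policy 1: join two valid k-sets that share their first k-1 attractions.
        candidates = [
            v1 + (v2[-1],)
            for i, v1 in enumerate(valid)
            for v2 in valid[i + 1:]
            if v1[:-1] == v2[:-1]
        ]
    return opt_trip, opt_score, levels


trip, score, levels = trip_mine(["a1", "a2", "a3", "a4"], 210)
print(trip, score)               # ('a3', 'a4') 18
print([len(l) for l in levels])  # [4, 6, 3]
```

With the attraction order a1, a2, a3, a4 the sketch generates 4, 6, and 3 candidate sets, matching the walkthrough of Fig. 2. Passing the attractions in a different order changes which candidate sets are generated but not the optimal score.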

Obviously, the efficiency of optimal trip finding is improved by the Trip-Mine approach.

E. Optimization Mechanisms

Based on the Trip-Mine approach, we design three optimization mechanisms, namely Attraction Sorting, Low Bound Checking, and Score Checking, to improve the mining efficiency and memory storage of optimal trip finding.

1) Attraction Sorting

The first optimization mechanism is Attraction Sorting. We observe that the number of candidate attraction sets can be reduced if the attractions are sorted by their trip time at the start of the algorithm. Take Fig. 1 as an example. The original order of attractions is a1, a2, a3, and a4. The trip times (TPT) of the single-attraction trips from au through a1, a2, a3, and a4 are 90, 80, 70, and 150, respectively. The attraction with the maximal trip time moves to the top of the candidate 1-sets. The result is shown in Fig. 3. The new order of attractions is a4, a1, a2, and a3. The subsequent optimal trip finding procedure is the same as in the Trip-Mine approach. Compared with the Trip-Mine approach, the candidate set {a1, a2, a4} and its permutations do not need to be generated and checked.

2) Low Bound Checking

The second optimization mechanism is Low Bound Checking. In the Trip-Mine approach, much of the computational cost is spent on checking whether a candidate attraction set is valid or not. In other words, the computational cost of checking all the permutations is very high. Hence, the goal of this mechanism is to reduce the number of permutation generations.

Theorem 4. Trip Time Low Bound. Given an attraction set A' = {a1, a2, ..., an} of length n and the current location au of the user, the Trip Time Low Bound of A' is composed of the following three parts:
a) The sum of the stay times of all attractions.
b) The sum of the travel times of the n-1 fastest routes between attractions. For n attractions, there are C(n, 2) = n(n-1)/2 such routes.
c) The sum of the travel times of the two fastest routes from the start location to the attractions.

Proof.
For the first part, the trip time of an attraction set must include the sum of the stay times of all attractions, since the user stays at each of these attractions. For the second part, the user moves between attractions, and at least n-1 routes are needed to connect n attractions. Hence, the sum of the n-1 fastest routes between attractions is a low bound of the travel time between attractions. For the third part, the user must first go to one of the attractions and finally return to the start location. Hence, the sum of the two fastest routes between the start location and the attractions is a low bound of this travel time, regardless of the number of attractions.

Pruning Strategy 2. Travel Time Low Bound Pruning. Based on Theorem 4, an attraction set A' is not a valid attraction set if the trip time low bound of A' is greater than the travel time constraint. In other words, A' can be pruned.
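As an illustration of Theorem 4 and Pruning Strategy 2, the following Python sketch computes the three parts of the bound. It is a sketch under stated assumptions rather than the authors' implementation: travel times are assumed symmetric and stored under `frozenset` keys, the function name `trip_time_low_bound` is hypothetical, and the single-attraction case (one route used twice) is an assumption not spelled out in the theorem. The sample values are taken from Fig. 4, except TT(a4, a3) = 50 and TT(au, a4) = 50, which are read from the figure and the trip time of a4 and should be treated as assumptions.

```python
from itertools import combinations


def trip_time_low_bound(attractions, stay_time, travel_time, start="au"):
    """Trip time low bound of an attraction set (Theorem 4)."""
    if not attractions:
        raise ValueError("attraction set must not be empty")
    n = len(attractions)

    def tt(a, b):
        # Symmetric travel time lookup (assumption of this sketch).
        return travel_time[frozenset((a, b))]

    # a) Sum of the stay times of all attractions.
    stay = sum(stay_time[a] for a in attractions)
    # b) Sum of the n-1 fastest routes between attractions.
    between = sorted(tt(a, b) for a, b in combinations(attractions, 2))[: n - 1]
    # c) Sum of the two fastest routes between the start and the attractions.
    from_start = sorted(tt(start, a) for a in attractions)
    if n == 1:
        ends = 2 * from_start[0]  # assumption: go and return on the same route
    else:
        ends = sum(from_start[:2])
    return stay + sum(between) + ends


stay_time = {"a1": 30, "a3": 30, "a4": 50}
travel_time = {
    frozenset(("a4", "a1")): 50,
    frozenset(("a1", "a3")): 40,
    frozenset(("a4", "a3")): 50,  # assumed from Fig. 4
    frozenset(("au", "a1")): 30,
    frozenset(("au", "a3")): 20,
    frozenset(("au", "a4")): 50,  # assumed from the trip time of a4
}
bound = trip_time_low_bound(["a4", "a1", "a3"], stay_time, travel_time)
can_prune = bound > 210  # travel time constraint tc from Fig. 4
print(bound, can_prune)
```

With these values the bound is 110 + 90 + 50 = 250, which exceeds tc = 210, so the set can be pruned without generating any permutation.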

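Figure 4. Travel time low bound calculation for attraction set {a4, a1, a3}. (The figure shows attractions a4 with RS(a4) = 10 and ST(a4) = 50, a1 with RS(a1) = 5 and ST(a1) = 30, and a3 with RS(a3) = 8 and ST(a3) = 30, the current location au, the travel times between them, and the travel time constraint tc = 210.)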

To explain the travel time low bound pruning, we take Fig. 1 and Fig. 3 as an example. In the Trip-Mine approach, for the candidate attraction set {a4, a1, a3} in Fig. 3, we need to generate all permutations of {a4, a1, a3} and determine whether {a4, a1, a3} is a valid attraction set or not. However, the computation cost of all permutations can be saved if we check the travel time low bound of {a4, a1, a3}. As Fig. 4 shows, the sum of stay time for the three attractions is 50 + 30 + 30 = 110. The routes (a4, a1) and (a1, a3) are selected as the minimal travel time between attractions, i.e., TT(a4, a1) + TT(a1, a3) = 50 + 40 = 90, and the routes (au, a1) and (au, a3) are selected as the minimal travel time between the start location and the attractions, i.e., TT(au, a1) + TT(au, a3) = 30 + 20 = 50. Finally, the travel time low bound of {a4, a1, a3} is 110 + 90 + 50 = 250, which is greater than the travel time constraint tc = 210. Hence, we do not need to generate the permutations of {a4, a1, a3} and check their trip times, since no trip time can be lower than the travel time low bound, i.e., 250. Therefore, travel time low bound checking effectively improves the efficiency of optimal trip finding.

3) Score Checking

The third optimization mechanism is named Score Checking. Based on the trip score constant property, the trip scores of an attraction set and all of its permutations are the same. For a candidate attraction set, we do not need to check whether any permutation satisfies the travel time constraint if the trip score of this attraction set is lower than the maximal trip score found so far. Take Fig. 3 as an example. The maximal trip score is 18 when the set {a4, a3} is checked among the candidate 2-sets. Then, the sets {a1, a2}, {a1, a3}, and {a2, a3} do not need to be checked since their trip scores are only 8, 13, and 11, respectively. However, the three sets cannot be pruned since we do not know whether they are valid or not.
They may join another 2-set to form a candidate 3-set if they are valid. Such a candidate k-set is called an unknown k-set, e.g., {a1, a2}, {a1, a3}, and {a2, a3} are unknown 2-sets. {a1, a2} and {a1, a3} can generate the candidate 3-set {a1, a2, a3} if both {a1, a2} and {a1, a3} are valid. Hence, we still need to check whether {a1, a2} and {a1, a3} are valid or not. However, {a2, a3} does not need to be checked since no valid or unknown 2-set can join {a2, a3} to form a candidate 3-set. Therefore, an unknown k-set can be pruned without permutation checking if no valid or unknown k-set can join it to form a candidate (k+1)-set. With score checking, the total number of permutation checks is reduced, which improves the efficiency of optimal trip finding.

V. EXPERIMENTAL EVALUATIONS

We conducted a series of experiments on synthetic data to evaluate the performance of Trip-Mine and the three optimization mechanisms under various system conditions. In the experiments, we evaluate the execution time and memory storage of optimal trip finding for the examined approaches. All of the experiments were implemented in Java on an Intel Core2 Quad Processor machine with 2 GB of memory running Windows XP.

A. Simulation Model and Experimental Settings

To evaluate the performance and scalability of Trip-Mine, we design a simulation model to generate synthetic data. Table IV summarizes the major parameters of the simulation model and their default values. In the base experiment model, we use a |W| × |W| network to model the trip map. All the attractions are randomly located on the trip map network. The number of attractions is determined by a given parameter NA. For each attraction, the rating score and the stay time are drawn from uniform distributions within the given ranges RRS and RST, respectively. For each pair of attractions, the travel time is drawn from a uniform distribution within a given range RTT. Thus, a complete trip map network is established. Next, we simulate tourists who query the optimal trip with a travel time constraint. For each tourist, the start location is randomly selected on the trip map network. For each pair of start location and attraction, the travel time is drawn from a uniform distribution within the given range RTT. The travel time constraint of the tourist is determined by a given parameter CTT.

TABLE IV. MAJOR PARAMETERS OF SIMULATION MODEL.

Parameter   Description                                     Default Value
|W|         |W| × |W| trip map network                      50km
NA          The number of attractions in the network        50
RRS         The range of rating score for each attraction   1~10
RST         The range of stay time for each attraction      15~90min
RTT         The range of travel time between locations      15~90min
CTT         The travel time constraint                      240min

The experiments are divided into two parts:

1) External Comparison: The external comparison compares the proposed approach Trip-Mine with the simplest approach, BruteForce, and the previously proposed approach, Dynamic Programming (DP) [10], under various numbers of attractions in terms of execution time.

2) Internal Evaluation: The internal evaluation examines the three proposed optimization mechanisms within Trip-Mine, namely Attraction Sorting (AS), Low Bound Checking (LB), and Score Checking (SC), under various numbers of attractions and travel time constraints in terms of execution time and memory storage. In addition, we specifically evaluate the effect of low bound checking, since the number of permutation checks is reduced if the low bound estimation of trip time for the attraction sets is accurate.

Figure 5. Comparison of various trip finding approaches. (Execution time (sec.) versus the number of attractions (NA) for Brute Force, DP, and Trip-Mine.)
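The simulation model of Section V.A can be sketched in Python as follows. This is only an illustrative reading of the description, not the authors' Java generator: the function name `generate_instance`, the use of integer-valued uniform draws, the symmetric travel times, and the fixed seed are all assumptions. Because travel times are drawn directly from RTT, the sketch does not model positions on the |W| × |W| map.

```python
import random


def generate_instance(na=50, rrs=(1, 10), rst=(15, 90), rtt=(15, 90), seed=0):
    """Generate one synthetic trip map using the defaults of Table IV."""
    rng = random.Random(seed)  # fixed seed for repeatability (assumption)
    attractions = [f"a{i}" for i in range(1, na + 1)]
    # Rating score and stay time: uniform within RRS and RST (integers assumed).
    rating = {a: rng.randint(*rrs) for a in attractions}
    stay = {a: rng.randint(*rst) for a in attractions}
    # Travel time for every pair of locations, including the start location au,
    # drawn uniformly within RTT and assumed symmetric.
    nodes = ["au"] + attractions
    travel = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            travel[frozenset((u, v))] = rng.randint(*rtt)
    return rating, stay, travel


rating, stay, travel = generate_instance()
ctt = 240  # default travel time constraint (minutes) from Table IV
print(len(rating), len(stay), len(travel))
```

With the default NA = 50, the network has 51 locations including the start location, hence 51 × 50 / 2 = 1275 travel time entries.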

B. Efficiency Comparison of Various Finding Approaches

This experiment analyzes the execution time of the examined trip finding approaches, namely BruteForce, Dynamic Programming (DP), and Trip-Mine, by varying the number of attractions from 2 to 30. Fig. 5 shows that Trip-Mine is significantly better than BruteForce and DP in terms of execution time. We observe that the execution time of Trip-Mine remains low, while that of BruteForce and DP increases exponentially as the number of attractions increases. Clearly, the BruteForce approach does not work when the number of attractions is greater than 12. The main reason is that the optimal trip finding problem is an NP-complete problem, so the time complexity grows exponentially when all the possible trips are generated and checked. For DP, although it is a pruning-based approach, it needs more time to search and check all the valid trips. Hence, the execution time of DP significantly increases when the number of attractions increases. For Trip-Mine, we first generate all the possible attraction sets. Based on the trip time downward closure property, all the redundant attraction sets are pruned to improve the search efficiency. As the results show, the BruteForce approach is not a feasible solution for optimal trip finding. Hence, we do not include the BruteForce approach in the following experimental evaluations.

C. Effect Evaluation of Low Bound Checking Mechanism

This experiment analyzes the effect of the low bound checking optimization mechanism for various lengths of attraction sets. The length of an attraction set is the number of attractions in the trip. For each length of candidate attraction set, we record all the estimated trip time low bounds and their corresponding exact trip times during candidate attraction set checking. The goal of this experiment is to evaluate how close the estimated trip time low bound is to the exact trip time. Fig. 6 shows that the estimated trip time is closer to the exact trip time for shorter attraction sets, i.e., when the length is less than 5. The main reason is that our low bound estimation strategy is based on minimal travel time selection. For a shorter attraction set, the number of routes between attractions is smaller. Hence, it is easier to estimate the exact trip time of the attraction set. Even so, the average ratio of estimated trip time to exact trip time is still 94.52% for attraction sets of length 5 or more. The result shows that our proposed low bound estimation of trip time achieves high estimation accuracy.

Figure 6. Effect evaluation of low bound estimation of trip time. (Estimated trip time and exact trip time (min) versus length of attraction set.)

D. Comparison of the Optimization Mechanisms

This experiment analyzes the effect of the three proposed optimization mechanisms based on Trip-Mine, i.e., Attraction Sorting (AS), Low Bound Checking (LB), and Score Checking (SC), in terms of execution time and memory storage. In Fig. 7, None and All represent Trip-Mine without any optimization mechanism and Trip-Mine with all optimization mechanisms, respectively. Fig. 7(a) shows that Trip-Mine with each optimization mechanism except SC is more efficient than the original Trip-Mine. For SC, both execution time and memory storage significantly increase when Trip-Mine integrates only SC. The main reason is that the trip scores of too many candidate attraction sets are less than the maximal score found so far during the mining process, so too many unknown attraction sets are generated. In the candidate join phase, Trip-Mine needs much execution time and memory to check whether two valid or unknown attraction sets can be joined. Hence, SC should not be integrated with Trip-Mine on its own. In the following experiments, we do not compare Trip-Mine with SC alone to the other optimization mechanisms. For AS and LB, we observe that both execution time and memory storage are improved by integrating them. The reasons are that the number of candidate attraction sets is reduced when the attraction order is considered, and that the computation cost of candidate attraction set checking is reduced if we first check the trip time low bound of each attraction set. Finally, we add all the optimization mechanisms to Trip-Mine. The results show that the execution time is the lowest, while Trip-Mine needs to store several unknown attraction sets. Hence, the memory storage slightly increases when Trip-Mine integrates all three optimization mechanisms. The average improvement of Trip-Mine with the three optimization mechanisms over the original Trip-Mine is a factor of 10.

Figure 7. Comparison of three optimization mechanisms. (a) Execution Time (sec.). (b) Memory Storage (MB). (Horizontal axis: optimization mechanism, i.e., None, AS, LB, SC, All.)

E. Impact of Number of Attractions NA

This experiment analyzes the execution time and memory storage of Trip-Mine, DP, and Trip-Mine with the three optimization mechanisms when the number of attractions is varied from 50 to 250. Fig. 8(a) and 8(b) show that Trip-Mine with the optimization mechanisms outperforms the original Trip-Mine and DP in terms of execution time and memory storage as the number of attractions varies. We observe that the execution time and memory storage increase significantly as the number of attractions increases. The reason is that the number of possible trips increases exponentially with the number of attractions. Because our problem is to find the optimal trip, Trip-Mine needs more execution time and memory storage to generate and check the additional candidate attraction sets. The execution time of Trip-Mine with the three optimization mechanisms is within 8 seconds, even when the number of attractions is 250.

Figure 8. Comparison of various numbers of attractions (NA). (a) Execution Time (sec.). (b) Memory Storage (MB). (Series: Trip-Mine, DP, Trip-Mine + AS, Trip-Mine + LB, Trip-Mine + All.)

F. Impact of Travel Time Constraint CTT

This experiment analyzes the execution time and memory storage of Trip-Mine, DP, and Trip-Mine with the three optimization mechanisms when the travel time constraint is varied from 180min to 360min. Fig. 9(a) and 9(b) show that Trip-Mine with the optimization mechanisms outperforms the original Trip-Mine and DP in terms of execution time and memory storage as the travel time constraint varies. We observe that the execution time and memory storage significantly increase as the travel time constraint increases. The reason is that the number of valid trips increases when the travel time constraint increases. In other words, the tourist has more trip plans to choose from. Hence, Trip-Mine needs to spend more time to find the best one. For the memory storage of DP, we observe that it stays nearly constant when the travel time constraint is varied. The main reason is that the search strategy of DP is based on the concept of depth-first search (DFS). The depth of the search tree does not change when the number of attractions is constant. Hence, the memory storage of DP is almost constant. The execution time of Trip-Mine with the three optimization mechanisms is within 6 seconds, even when the travel time constraint is 6 hours.

Figure 9. Comparison of various travel time constraints (CTT). (a) Execution Time (sec.). (b) Memory Storage (MB). (Series: Trip-Mine, DP, Trip-Mine + AS, Trip-Mine + LB, Trip-Mine + All.)

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we have proposed a novel data mining-based approach, namely Trip-Mine, for efficiently finding the optimal trip within a travel time constraint. Based on Trip-Mine, we further proposed three optimization mechanisms: 1) Attraction Sorting (AS) for accelerating the pruning of candidates; 2) Low Bound Checking (LB) for evaluating the low bound of trip time for candidates and reducing the computational cost of candidate checking; and 3) Score Checking (SC) for reducing the number of candidate checks when the trip score is less than that of the best trip found so far. To the best of our knowledge, this is the first work that facilitates efficient trip planning with consideration of a travel time constraint. To evaluate the performance of the proposed approach and the three proposed optimization mechanisms, we conducted a series of experiments. The experimental results show that the proposed approach Trip-Mine achieves high efficiency in optimal trip finding. Besides, Trip-Mine integrates the three proposed optimization mechanisms to achieve superior performance in terms of execution time and memory storage. The experimental results show that our proposed approach and the three optimization mechanisms are highly efficient under various conditions. For future work, we plan to design more efficient optimal trip finding approaches and optimization strategies to further enhance the performance of optimal trip finding. In addition, we plan to develop data mining-based approaches to automatically find interesting attractions and estimate the stay time of attractions, aiming to achieve an automatic travel recommendation system with high efficiency in trip planning.

ACKNOWLEDGMENT

This research was supported by National Science Council, Taiwan, R.O.C. under grant no. NSC99-2218-E-006-001.

REFERENCES

[1] G. D. Abowd, C. G. Atkeson, J. Hong, S. Long, R. Kooper and M. Pinkerton, “Cyberguide: A Mobile Context Aware Tour Guide,” Wireless Networks, vol. 3, no. 5, pp. 421-433, Oct. 1997.
[2] D. L. Applegate, R. E. Bixby, V. Chvatal and W. J. Cook, “The Traveling Salesman Problem: A Computational Study,” Princeton University Press, 2006.
[3] Y. Ge, H. Xiong, A. Tuzhilin, K. Xiao, M. Gruteser and M. J. Pazzani, “An Energy-Efficient Mobile Recommender System,” in Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 899-908, Jul. 2010.
[4] K. Gouda and M. J. Zaki, “GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets,” Data Mining and Knowledge Discovery, vol. 11, no. 3, pp. 223-242, Nov. 2005.
[5] Y. Huang and L. Bian, “A Bayesian Network and Analytic Hierarchy Process Based Personalized Recommendations for Tourist Attractions over the Internet,” Expert Systems with Applications, vol. 36, no. 1, pp. 933-943, Jan. 2009.
[6] T. Horozov, N. Narasimhan and V. Vasudevan, “Using Location for Personalized POI Recommendations in Mobile Environments,” in Proceedings of International Symposium on Applications on Internet, pp. 124-129, Jan. 2006.
[7] S. Kisilevich, D. Keim and L. Rokach, “A Novel Approach to Mining Travel Sequences Using Collections of Geotagged Photos,” in Proceedings of the 13th AGILE International Conference on Geographic Information Science, pp. 163-182, May 2010.
[8] C.-S. Lee, Y.-C. Chang and M.-H. Wang, “Ontological Recommendation Multi-Agent for Tainan City Travel,” Expert Systems with Applications, vol. 36, no. 3, pp. 6740-6753, Apr. 2009.
[9] Z. Li, B. Ding, J. Han and R. Kays, “Swarm: Mining Relaxed Temporal Moving Object Clusters,” in Proceedings of the VLDB Endowment, vol. 3, no. 1, pp. 723-734, 2010.
[10] X. Lu, C. Wang, J.-M. Yang, Y. Pang and L. Zhang, “Photo2Trip: Generating Travel Routes from Geo-Tagged Photos for Trip Planning,” in Proceedings of the ACM International Conference on Multimedia, pp. 143-152, Oct. 2010.
[11] V.-W. Soo and S.-H. Liang, “Recommending a Trip Plan by Negotiation with a Software Travel Agent,” Cooperative Information Agents V, vol. 2182, pp. 32-37, 2001.
[12] V. Tseng, C.-W. Wu, B.-E. Shie and P. Yu, “UP-Growth: An Efficient Algorithm for High Utility Itemsets Mining,” in Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 253-262, Jul. 2010.
[13] C.-C. Yu and H.-P. Chang, “Personalized Location-Based Recommendation Services for Tour Planning in Mobile Tourism Applications,” E-Commerce and Web Technologies, vol. 5692, pp. 38-49, 2009.
[14] H. Yoon, Y. Zheng, X. Xie and W. Woo, “Smart Itinerary Recommendation Based on User-Generated GPS Trajectories,” in Proceedings of the International Conference on Ubiquitous Intelligence and Computing, pp. 19-34, Oct. 2010.
[15] Y. Zheng and X. Xie, “Learning Travel Recommendations from User-Generated GPS Traces,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 1, 2011.