
Canadian Conference on Artificial Intelligence. AI 2014: Advances ... Part of the Lecture Notes in Computer Science book series (LNCS, volume 8436).
Weka-SAT: A Hierarchical Context-Based Inference Engine to Enrich Trajectories with Semantics

Bruno Moreno¹, Amílcar S. Júnior¹, Valéria Times¹, Patrícia Tedesco¹, and Stan Matwin²

¹ Centro de Informática, UFPE, Recife, Brazil
{bnm,asj,vct,pcart}@cin.ufpe.br
² Faculty of Computer Science, Dalhousie University, Canada
Institute for Computer Science, Polish Academy of Sciences, Poland
[email protected]

Abstract. A major challenge in trajectory data analysis is the definition of approaches to enrich the data semantically. In this paper, we use machine learning and context information to enrich trajectory data in three steps: (1) the definition of a context model for the trajectory domain; (2) the generation of rules based on that context model; (3) the implementation of a classification algorithm that processes these rules and adds semantics to trajectories. The approach is hierarchical and combines clustering and classification tasks to identify important parts of trajectories and to annotate them with semantics. These ideas were integrated into the Weka toolkit and evaluated on fishing vessel trajectories.

1 Introduction

In many application areas, trajectory data are captured in raw form as tuples (id, x, y, t), where id is the moving object identifier and (x, y) are the geographical coordinates collected at time t. Because this raw format lacks semantics, a series of studies focus on knowledge discovery from trajectory data. Aiming to add semantic information to trajectories, Spaccapietra et al. [1] defined a semantic trajectory as being composed of stops and moves. Stops are the most important parts of a trajectory, while moves are the parts that connect the stops. Current works can identify the important parts of trajectories [2], [3], [4], [5], [6], but they cannot determine which activities the moving object performed. Spinsanti et al. [7] propose a series of rules for annotating a whole trajectory with its most probable global activity. The limitation of that approach is that it cannot identify activities at a finer level of detail.

To address this issue, we propose a hierarchical approach that provides annotation at the episode level. The approach uses a context model for the trajectory domain, CMT (Context Model for Trajectories), and is implemented by an algorithm, RB-SAT (Rule Based algorithm for adding Semantic Annotations to Trajectories), that generates rules using context information from CMT. RB-SAT combines clustering and classification techniques hierarchically and recursively. To increase the utility of our approach, CMT and RB-SAT were integrated into Weka [8], resulting in Weka-SAT (Weka for Semantic Annotation in Trajectories). Weka-SAT was used to run experiments that added semantics to trajectories, and the results were evaluated using purity and coverage [9].

The rest of this work is organized as follows. Section 2 surveys related work; Section 3 presents CMT; Section 4 presents the algorithm that annotates trajectories with semantics; Section 5 presents the results of the experiments; and, finally, Section 6 lists the main conclusions and future work.

M. Sokolova and P. van Beek (Eds.): Canadian AI 2014, LNAI 8436, pp. 333–338, 2014. © Springer International Publishing Switzerland 2014

2 Related Work

Here, we define Context of Trajectories as any information that can be used to characterize the situation of an object in movement. In addition, we adopt the definition of Contextual Elements to refer to the data, information and knowledge that are used to define Context [10]. The context of a trajectory can also be viewed in terms of the well-known six dimensions of context, known as 5W+1H (Who, What, Where, When, Why and How) [10].

Context models are artifacts used to represent the domain information that is considered as context. Vieira et al. [10] defined a generic metamodel for context that assists the creation of context models for any domain. This metamodel asserts that context is composed of Contextual Elements (CEs) and Context. A CE is any kind of data, information or knowledge that allows the qualification of an entity for a specific domain, while Context is a set of instantiated CEs that are used by an agent to execute a relevant task for a specific domain.

The six dimensions of context identify basic units that characterize an entity or a situation. These dimensions are defined by answering six interrogative questions. For the trajectory domain, for example, the answers are as follows: (1) Who refers to the identity of the object that is performing the movement (e.g. car model, animal species); (2) What refers to an important activity performed by a moving object in a specific part of the trajectory (e.g. tourists visiting a famous place); (3) Where refers to the place where the activity was performed; (4) When indicates the temporal information about the activity performed; (5) Why refers to the reason why a moving object executed an activity or an action; and (6) How refers to the way the context information was acquired (e.g. from sensors or a knowledge base).

3 CMT: A Context Model for Trajectories

In this work, we consider that the moving object is not necessarily stationary inside a stop and, consequently, that another trajectory may exist inside this stop. This subpart of the trajectory is called here a sub-trajectory. It is an important concept for CMT because it motivates the hierarchical representation adopted. Sub-trajectories may exist inside stops and inside moves as well. Definition 1 describes our sub-trajectory concept.


Definition 1. Sub-trajectory: a sub-trajectory is a list P of spatiotemporal points inside an element E (stop or move), $P = \{p_0 = (x_0, y_0, t_0), p_1 = (x_1, y_1, t_1), \ldots, p_N = (x_N, y_N, t_N)\}$, where $p_0$ is the first point of E, $p_N$ is the last point of E, $x_i, y_i \in \mathbb{R}$ and $t_i \in \mathbb{R}^+$ for $i = 0, 1, \ldots, N$, and $t_0 < t_1 < t_2 < \cdots < t_N$.

In this work, a trajectory is segmented in terms of sub-trajectories. In other words, since stops and moves are composed of sub-trajectories, performing new segmentations over these sub-trajectories can generate new context situations for the application. For example, if we look inside a stop labeled as a shopping center, other sub-trajectories can be detected that represent a person performing various activities, such as shopping in different stores, working at a store or simply stopping to watch a movie. Thus, we can split a sub-trajectory into other sub-trajectories recursively and, consequently, organize them hierarchically. According to Definition 2 and Definition 3, stops and moves captured inside a sub-trajectory are seen as sub-stops and sub-moves, respectively.

Definition 2. Sub-stop: a sub-stop S is a stop inside a sub-trajectory T, where S is a tuple $(R_k, t_j, t_{j+N})$ such that $R_k = \{(x_j, y_j, t_j), (x_{j+1}, y_{j+1}, t_{j+1}), \ldots, (x_{j+N}, y_{j+N}, t_{j+N})\}$ and $|t_{j+N} - t_j| \geq \Delta t$. $R_k$ is the geometry of S and $\Delta t$ is its minimum time duration.

Definition 3. Sub-move: a sub-move M is a move inside a sub-trajectory T, where M is a tuple $(R_k, t_j, t_{j+N})$ such that $R_k = \{(x_j, y_j, t_j), (x_{j+1}, y_{j+1}, t_{j+1}), \ldots, (x_{j+N}, y_{j+N}, t_{j+N})\}$ and $|t_{j+N} - t_j| \geq \Delta t$. $R_k$ is the geometry of M and $\Delta t$ is its minimum time duration.

Moving Object, Stops, Moves, Sub-stops and Sub-moves are considered here as Contextual Entities, i.e. they have attributes (Contextual Elements) that allow the identification of contextual information. Information related to the moving object (e.g. gender of a customer) and information derived from the raw trajectory (e.g. speed variation) are examples of our Contextual Elements. Figure 1 shows the CMT in UML notation. Different instances of a Sub-trajectory's Contextual Elements represent different meanings in the real world (i.e. Contexts). For a Stop's Sub-trajectory we have two categories of meanings, "Place" and "Activity", and for a Move's Sub-trajectory we have "Semantic direction" and "Transportation means". According to Figure 1, Sub-trajectories, Sub-stops and Sub-moves inherit all Contextual Elements from Trajectory, Stops and Moves, respectively. The labels next to each concept represent the context dimension correlated to that concept.

Discovering whether the Contextual Elements are capable of describing different Contexts is clearly a classification learning task: the Contextual Elements are the features used to classify a set of data into different Contexts. Thus, we propose RB-SAT, an algorithm that adds semantic annotations to trajectories by classifying stops and moves into different classes.
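The nesting of stops, moves, sub-stops and sub-moves can be pictured as a small recursive data structure. The sketch below is only an illustration and is not taken from Weka-SAT: the class names `SubTrajectory` and `Element`, their fields and the sample coordinates are all assumptions. It checks the time ordering required by Definition 1 and shows how a stop can carry its own sub-stops and sub-moves.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t), as in Definition 1


@dataclass
class SubTrajectory:
    """Hypothetical container for the points inside one element."""
    points: List[Point]

    def __post_init__(self) -> None:
        times = [t for _, _, t in self.points]
        # Definition 1 requires t_0 < t_1 < ... < t_N.
        if any(a >= b for a, b in zip(times, times[1:])):
            raise ValueError("timestamps must be strictly increasing")

    def duration(self) -> float:
        """Time span covered by the points (0.0 if there are none)."""
        return self.points[-1][2] - self.points[0][2] if self.points else 0.0


@dataclass
class Element:
    """Hypothetical stop or move; children hold its sub-stops and sub-moves."""
    kind: str                      # "stop" or "move"
    sub_trajectory: SubTrajectory
    label: str = ""                # semantic annotation, empty until assigned
    children: List["Element"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels in the hierarchy rooted at this element."""
        return 1 + max((c.depth() for c in self.children), default=0)


# Made-up example: a stop at a shopping center that contains one sub-stop
# and one sub-move.
mall = Element(
    kind="stop",
    label="shopping center",
    sub_trajectory=SubTrajectory([(0.0, 0.0, 0.0), (5.0, 2.0, 40.0), (9.0, 3.0, 90.0)]),
    children=[
        Element("stop", SubTrajectory([(0.0, 0.0, 0.0), (5.0, 2.0, 40.0)]), label="store"),
        Element("move", SubTrajectory([(5.0, 2.0, 40.0), (9.0, 3.0, 90.0)])),
    ],
)
print(mall.depth(), mall.sub_trajectory.duration())
```

The value returned by `duration()` is the quantity that a check against Δt would use; the sketch deliberately leaves that check to the caller.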

4 The RB-SAT Algorithm

Fig. 1. Trajectory Context Model represented in UML notation

RB-SAT (Algorithm 1) works as follows. It processes each element (stop or move) received as a parameter and tries to classify it with one of the labels contained in the outcomes of the rules, which are also received as a parameter (method CheckRules in line 7). If no outcome is assigned, clustering algorithms are invoked to generate sub-stops and sub-moves (line 11), and the algorithm is invoked recursively with these sub-stops and sub-moves as parameters (line 12). To cluster the data, we implemented extensions of CB-SMoT [2] and DB-SMoT [4] that define their input parameters automatically.

CMT and RB-SAT were integrated as an extension of Weka. The goal of this integration is to assist the user in preprocessing the rules that RB-SAT requires as input. Our system, named Weka-SAT, its source code and a simple tutorial are available for download¹.
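The recursive control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Weka-SAT implementation: elements are plain dictionaries with hypothetical `start`, `end` and `speed` keys, the rule check and the clustering step are passed in as callables (`toy_rules` and `toy_split` are made-up stand-ins for the learned rules and for the CB-SMoT/DB-SMoT extensions), and the results of recursive calls are merged into the output, which Algorithm 1 does not spell out.

```python
from typing import Any, Callable, Dict, List, Optional

Element = Dict[str, Any]  # hypothetical: {"start": float, "end": float, ...}


def rb_sat(
    elements: List[Element],
    check_rules: Callable[[Element], Optional[str]],
    generate_sub_elements: Callable[[Element], List[Element]],
    min_duration: float,
) -> List[Element]:
    """Annotate elements with rule outcomes, recursing into sub-elements."""
    result: List[Element] = []
    for element in elements:
        # Elements shorter than the minimum duration are skipped (line 6).
        if element["end"] - element["start"] < min_duration:
            continue
        label = check_rules(element)  # line 7
        if label is not None:
            result.append({**element, "label": label})  # line 9
        else:
            subs = generate_sub_elements(element)  # line 11
            # Assumption: annotated sub-elements are collected in the output.
            result.extend(
                rb_sat(subs, check_rules, generate_sub_elements, min_duration)
            )  # line 12
    return result


def toy_rules(e: Element) -> Optional[str]:
    # Made-up rule: slow elements are "fishing", anything else is unlabeled.
    return "fishing" if e["speed"] < 2.0 else None


def toy_split(e: Element) -> List[Element]:
    # Made-up stand-in for clustering: halve the interval, slow the first half.
    mid = (e["start"] + e["end"]) / 2
    return [
        {"start": e["start"], "end": mid, "speed": e["speed"] / 4},
        {"start": mid, "end": e["end"], "speed": e["speed"]},
    ]


annotated = rb_sat(
    [{"start": 0.0, "end": 8.0, "speed": 6.0}], toy_rules, toy_split, min_duration=2.0
)
for e in annotated:
    print(e["start"], e["end"], e["label"])
```

Because every split produces shorter elements, the minimum-duration test also acts as the stopping condition of the recursion in this sketch.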

¹ http://www.brunomoreno.com/softwares/weka-sat

Algorithm 1. RB-SAT
 1: procedure RB-SAT(T, S, λr, Δt)
 2:   result ← ∅   // elements with semantics
 3:   aux ← ∅
 4:   annotated ← false
 5:   for all element s in S do
 6:     if time interval of s ≥ Δt then
 7:       annotated ← CheckRules(λr, s)
 8:       if annotated is true then
 9:         result ← result + s
10:       else
11:         aux ← GenerateSubElements(T, s)
12:         RB-SAT(s.subtrajectory, aux, λr, Δt)
13:       end if
14:     end if
15:   end for
16: end procedure

5 Results of Experiments

Fishing vessels operating in Brazil are monitored with GPS devices. As a result, large amounts of trajectory data are generated. Fishing activities can be summarized by two main steps: (1) releasing and (2) gathering the longline. These two phases compose what is called a "fishing bid". Vessel captains commonly write the coordinates and time instant of each fishing bid on the board map. However, for unknown reasons, this information is sometimes unreadable or absent. It is also frequent that some trajectories contain unusual movements that are not described on the board map. To run the experiments presented here and to generate the rules that RB-SAT needs as parameters, we used the trajectories and the board map information from the work of [4].

The rules were generated for two different scenarios. In scenario A we considered five classes: (1) Gathering the longline, (2) Waiting, (3) Releasing the longline, (4) Going fishing and (5) Moving to/from harbor. In scenario B we considered two classes: (1) fishing and (2) not fishing. The rules passed as parameters to RB-SAT were generated by J48 in Weka-SAT. In addition to these rules, RB-SAT also receives a set of stops and moves as an input parameter.

For each scenario, experiments were executed for different combinations of training datasets. As our fishing dataset is composed of four trajectories, our algorithm was trained using all possible combinations of two and of three trajectories (six and four possibilities, respectively). We tested RB-SAT with two different approaches: (1) training the model using two trajectories and testing with one trajectory and (2) training with three trajectories and testing with one. Since the trajectories used here are labeled data, we evaluated our results by extending two metrics from [9]: purity and coverage. Table 1 shows the purity and coverage values of our best results. We would like to highlight the results for scenario B, where the activities performed by a vessel were identified with 86.21% purity and 52.99% coverage on average.

Table 1. Results for purity and coverage

Set type      Scenario  Purity   Coverage
Two trajs.    A         71.08%   30.86%
Three trajs.  A         44.20%   15.51%
Two trajs.    B         88.05%   53.68%
Three trajs.  B         84.37%   52.29%
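The size of the experimental design and the scenario B averages can be checked with a short calculation. The snippet below is only a sanity check written for this text: the trajectory identifiers are placeholders, and the percentages are copied from the scenario B rows of Table 1. It enumerates the training sets of two and of three trajectories drawn from four, and averages each column of the scenario B rows.

```python
from itertools import combinations
from statistics import mean

trajectories = ["t1", "t2", "t3", "t4"]  # placeholder identifiers

# Training sets of two and of three trajectories out of four.
pairs = list(combinations(trajectories, 2))
triples = list(combinations(trajectories, 3))
print(len(pairs), len(triples))  # 6 and 4 training sets

# Scenario B rows of Table 1 (two-trajectory and three-trajectory training).
scenario_b = {"Purity": [88.05, 84.37], "Coverage": [53.68, 52.29]}
averages = {name: mean(values) for name, values in scenario_b.items()}
for name, value in averages.items():
    print(f"{name}: {value:.3f}%")
```

The Purity column averages to about 86.21% and the Coverage column to about 52.99%, which are the two figures quoted for scenario B.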

6 Conclusions

In this paper we presented a model for representing the context of trajectories and an algorithm that instantiates this model by using association rules and adds semantics to trajectory data. In addition, we integrated this algorithm as an extension of Weka. Our motivation is the desire to understand patterns in the trajectory domain and to enrich this data type with semantics. The main contribution of this work is for the trajectory data mining user, since we offer an approach that is able to extract trajectory patterns and add this knowledge to raw data. We summarize two main contributions: (1) the definition of a contextual model for trajectories (CMT) and (2) an algorithm that uses the rules from this model to add semantics to trajectories (RB-SAT). Our approach goes a step further than other approaches ([5], [2], [4]) because it assists the application designer in selecting the features that will be used to classify important parts of trajectories.

Acknowledgment. The authors would like to thank CNPq and the Canadian Bureau of International Education for their financial support.

References

1. Spaccapietra, S., Parent, C., Damiani, M.L., de Macêdo, J.A.F., Porto, F., Vangenot, C.: A conceptual view on trajectories. DKE 65, 126–146 (2008)
2. Palma, A.T., Bogorny, V., Kuijpers, B., Alvares, L.O.: A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the 2008 ACM Symposium on AC, pp. 863–868. ACM, New York (2008)
3. Zimmermann, M., Kirste, T., Spiliopoulou, M.: Finding stops in error-prone trajectories of moving objects with time-based clustering. In: Tavangarian, D., Kirste, T., Timmermann, D., Lucke, U., Versick, D. (eds.) IMC 2009. CCIS, vol. 53, pp. 275–286. Springer, Heidelberg (2009)
4. Rocha, J.A.M.R.: DB-SMoT: Um método baseado na direção para identificação de áreas de interesse em trajetórias. Master's thesis, UFPE, Brasil (2010)
5. Alvares, L.O., Bogorny, V., Kuijpers, B., Macêdo, J.A.F., Moelans, B., Vaisman, A.A.: A model for enriching trajectories with semantic geographical information. In: GIS, pp. 22:1–22:8. ACM, New York (2007)
6. Yan, Z., Chakraborty, D., Parent, C., Spaccapietra, S., Aberer, K.: Semitri: a framework for semantic annotation of heterogeneous trajectories. In: 14th International Conference on EDT, pp. 259–270. ACM, New York (2011)
7. Spinsanti, L., Celli, F., Renso, C.: Where you stop is who you are: Understanding peoples activities. In: Proceedings of the 5th Workshop on BMI (2010)
8. Frank, E., Hall, M.A., Holmes, G., Kirkby, R., Pfahringer, B.: Weka - a machine learning workbench for data mining. In: The Data Mining and Knowledge Discovery Handbook, pp. 1305–1314 (2005)
9. Nanni, M., Pedreschi, D.: Time-focused clustering of trajectories of moving objects. J. Intell. Inf. Syst. 27(3), 267–289 (2006)
10. dos Santos, V.V., Brézillon, P., Salgado, A.C., Tedesco, P.: A context-oriented model for domain-independent context management. RIA 22(5), 609–628 (2008)