Context-aware Data Mining using Ontologies

Sachin Singh, Pravin Vajirkar, and Yugyung Lee
School of Computing and Engineering, University of Missouri–Kansas City, Kansas City, MO 64110 USA
{sbs7vc, ppv22e, leeyu}@umkc.edu

Abstract. Data mining, which aims at extracting interesting information from large collections of data, has been widely used as an active decision-making tool. Real-world applications of data mining require a dynamic and resilient model that is aware of a wide variety of diverse and unpredictable contexts. Contexts consist of circumstantial aspects of the user and domain that may affect the data mining process. The underlying motivation is that mining datasets in the presence of context factors may improve the performance and efficacy of data mining by identifying factors that are not easily detectable with typical data mining techniques. This paper proposes a context-aware data mining framework in which context is (1) represented in an ontology, (2) automatically captured during the data mining process, and (3) used to carry adaptive behavior over to data mining. We show that the different behaviors and functionalities of our context-aware data mining framework dynamically generate information in dynamic, uncertain, and distributed medical applications.

1 Introduction

Real-world applications are laden with huge amounts of data and encompass entities that evolve over time. However, a data-rich environment does not guarantee an information-rich environment. Because the environment is dynamic, data must be interpreted differently depending upon the situation (context). For instance, a high fever in a patient with a cold may mean something different from the same fever in a patient with pneumonia. Context is a powerful, long-standing concept. It can be helpful in computer-human interaction, which runs mostly via explicit contexts of communication (e.g., user query input). Implicit context factors (e.g., physical environmental conditions, location, time) are normally ignored by the computer due to the absence of a knowledge base or an appropriate model. Context-aware computing has been studied by many researchers [2, 1, 3, 11]. Many of them have worked on defining context-awareness, and some have also focused on building context-aware applications. However, little has been done towards building a data mining framework based on context-awareness that leads to useful and accurate information extraction.

Data mining is a process that discovers useful information in data that may be used for valid predictions [5]. Context-aware data mining is concerned with how attributes should be interpreted under specific request criteria. Current data mining approaches do not provide adequate support for context-aware data mining, mainly because they lack a rich notion of context that specifies when and how a data mining process should be applied. We firmly believe that implicit context factors can be used to interpret and enhance explicit user input, thereby influencing data mining to deliver accurate and precise predictions. Different behaviors and functionalities of data mining are highly useful, and indeed required, for generating information in dynamic, uncertain, and distributed environments, because such behaviors and capabilities can increase the effectiveness and flexibility of the data mining process. In this paper, we mimic such aspects wherever feasible and show how this functionality can significantly enhance the quality of data mining. Ontologies provide a means to represent information or knowledge that is machine-processable and can be communicated between different agents, and our framework represents the context factors in carefully crafted ontologies. Because context is a very subjective term that depends on the domain under consideration, context-aware data mining can be divided into two parts: the representation of the context factors of a domain in a corresponding ontology, and a generic framework that queries this ontology, invokes the mining processes, and coordinates them according to the ontology design. Knowledge representation in an ontology can thus be a building block of context-based data mining. The paper is organized as follows.
In Section 2 we define the concept of context-awareness to suit the needs of our integrated model, which recognizes various context factors that are generic to all domains. Section 3 discusses how context factors can be applied to data mining through a carefully selected motivating example. Section 4 proposes a framework that applies context factors to the data mining process. Section 5 explains the design of the ontology used by the framework. Section 6 presents experimental results obtained with the framework on medical data, bringing out the effect of context on data mining. Section 7 concludes the paper.

2 What is Context-Awareness?

The concept of context has been interwoven into, and used in, many different fields. When information has to be conveyed from one element to another, the receiving element needs to know the reference of the discussion. Dey and Abowd [4] defined context as any piece of information that can be used to characterize the situation of a participant in an interaction. Similarly, [2, 1] defined context as location, environment, identity of people, and time. By sensing context information, context-enabled applications can present context information to users or modify their behavior according to changes in the environment [10]. Chen and Kotz [3] define context as the set of environmental states and rules that either determines an application's behavior or describes where an event occurs, which is very similar to our definition. Schilit and Theimer [12] emphasized the importance of applications that adapt themselves to context.

A lack of context-awareness leads to missing much critical and useful information that would affect the data mining process and, thereby, its results. In real-world, live datasets, the context factors that constitute context-awareness change rapidly, so the factors tend to be subjective and highly domain-specific; some existing definitions are too broad to apply to any particular application. Context allows the system to understand and adapt the data mining process, thereby providing users with time-sensitive data accurately, efficiently, and precisely. We now define the types of context factors specific to our framework.

Domain Context describes domain-specific context, which is patient-centric in our case. The Target (Patient) Context captures the personal and medical history of the patient. It also records the immediate family members and their medical history. This is useful where the diagnosis and treatment of a patient are affected by the medical history of his or her family. For example, an important factor in predicting whether a patient has diabetes is whether anyone in his or her immediate family has diabetes; in such a case, it is important to obtain the patient context before making predictions.

Location Context: Datasets are primarily formed from the population living near a certain location, and the living area is related to health issues. For example, people living in coastal regions are less likely to develop goiter. Similarly, people living in the countryside are less prone to high blood pressure than suburban residents.
It is therefore advisable to pick the appropriate datasets depending on the location context of the patient.

Data Context: It is important to determine which of the available datasets to mine for a given service. This context tells us which datasets to pick up and how to combine them to obtain useful mining results. Here datasets are combined at a semantic level rather than at a structural level. For example, from domain knowledge we know that making predictions for heart attack also involves checking the patient's diabetes status. In such a case, the two structurally disjoint datasets are combined at the semantic level before the resulting dataset is mined.

User Context: (1) User Identity Context describes the user responsible for the query, including his or her field of expertise, authorization for tasks or datasets, and team members and their fields of expertise. (2) User History Context describes the history (i.e., the user profile) built up for each user as he or she queries for particular information. This helps when the user frequently issues similar queries or uses the same piece of information.
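The Data Context's semantic combination of structurally disjoint datasets can be illustrated with a small sketch. It assumes, hypothetically, that the heart dataset and an auxiliary diabetes dataset share a patient identifier that serves as the combine key, with at most one auxiliary row per patient; the column names and records are invented for illustration, and the inner-join policy is our own choice rather than necessarily the framework's.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def v_join(primary: List[Row], auxiliary: List[Row], combine_key: str) -> List[Row]:
    """Attach the auxiliary dataset's columns to every primary row that shares
    the same combine-key value (an inner join on combine_key)."""
    # Index the auxiliary rows by key; assumes at most one row per key value.
    index = {row[combine_key]: row for row in auxiliary}
    joined: List[Row] = []
    for row in primary:
        match = index.get(row[combine_key])
        if match is None:
            continue  # no semantic counterpart, so the row is left out of the mining set
        merged = dict(row)  # copy, so the primary dataset is not mutated
        merged.update({k: v for k, v in match.items() if k != combine_key})
        joined.append(merged)
    return joined


# Hypothetical records: a heart dataset and a structurally disjoint diabetes dataset.
heart = [
    {"patient_id": 1, "age": 63, "num": ">50_1"},
    {"patient_id": 2, "age": 41, "num": "<50"},
]
diabetes = [
    {"patient_id": 1, "has_diabetes": 1},
    {"patient_id": 3, "has_diabetes": 0},
]

print(v_join(heart, diabetes, "patient_id"))
# [{'patient_id': 1, 'age': 63, 'num': '>50_1', 'has_diabetes': 1}]
```

A left join that keeps unmatched primary rows with a missing value would be the alternative if dropping patients from the mining set is undesirable.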

3 A Motivating Example of Context-aware Data Mining

The effect of context factors can be explained in light of a carefully selected scenario: a doctor wants to know the likelihood that a patient's major blood vessels show less than 50% or more than 50% narrowing, as a measure of heart attack risk. A data mining application is a natural choice for building a prediction model by mining an existing data warehouse containing large amounts of data. A typical data mining application would require a large set of input parameters to query the prediction model for a predicted value. However, by carefully selecting the context factors, the user needs to give only a small set of inputs [14], while the system deduces the rest from the context factors. The attributes of the dataset are as follows:

1. (age) Age in years
2. (sex) Sex: Value 1: male; Value 0: female
3. (chest pain) Chest pain type: Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic
4. (trestbps) Resting blood pressure (in mm Hg on admission to the hospital)
5. (chol) Serum cholesterol in mg/dl
6. (fbs) Fasting blood sugar > 120 mg/dl: Value 1: true; Value 0: false
7. (restecg) Resting electrocardiographic results: Value 0: normal; Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
8. (thalach) Maximum heart rate achieved
9. (exang) Exercise-induced angina: 1 = yes; 0 = no
10. (oldpeak) ST depression induced by exercise relative to rest
11. (slope) Slope of the peak exercise ST segment: Value 1: upsloping; Value 2: flat; Value 3: downsloping
12. (ca) Number of major vessels (0-3) colored by fluoroscopy
13. (thal) Heart status: Value 3: normal; Value 6: fixed defect; Value 7: reversible defect
14. (Family-Hist) History of any heart disease within the immediate family: Value 1: true; Value 0: false
15. (Smoke-Disease) Symptoms of smoke disease
16. (Location) Location where the person lives
17. (num) Diagnosis of heart disease (angiographic disease status): Value 0: < 50% diameter narrowing; Value 1: > 50% diameter narrowing (in any major vessel: attributes 59 through 68 are vessels)
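The coded categorical attributes in the list above can be kept in a small lookup table so that records and query inputs can be decoded into readable labels. This is a minimal sketch: the dictionary keys are our own spellings of the attribute names (for example, chest_pain and family_hist), and raising an error on unknown codes is an assumed policy, not something the framework prescribes.

```python
from typing import Any, Dict

# Codes for the categorical attributes of the heart dataset (Section 3).
CODEBOOK: Dict[str, Dict[int, str]] = {
    "sex": {1: "male", 0: "female"},
    "chest_pain": {1: "typical angina", 2: "atypical angina",
                   3: "non-anginal pain", 4: "asymptomatic"},
    "fbs": {1: "true", 0: "false"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "exang": {1: "yes", 0: "no"},
    "slope": {1: "upsloping", 2: "flat", 3: "downsloping"},
    "thal": {3: "normal", 6: "fixed defect", 7: "reversible defect"},
    "family_hist": {1: "true", 0: "false"},
    "num": {0: "< 50% diameter narrowing", 1: "> 50% diameter narrowing"},
}


def decode(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace coded categorical values by their labels; numeric attributes
    such as age, chol or thalach are passed through unchanged."""
    decoded: Dict[str, Any] = {}
    for name, value in record.items():
        codes = CODEBOOK.get(name)
        if codes is None:
            decoded[name] = value
        elif value in codes:
            decoded[name] = codes[value]
        else:
            raise ValueError(f"unknown code {value!r} for attribute {name!r}")
    return decoded


print(decode({"age": 63, "sex": 1, "chest_pain": 4, "thal": 7}))
# {'age': 63, 'sex': 'male', 'chest_pain': 'asymptomatic', 'thal': 'reversible defect'}
```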

In this scenario, the doctor is interested in determining whether a major blood vessel shows more than 50% narrowing, in order to evaluate the risk of heart attack. The answer could form the basis of further action that the doctor takes in treating the patient. Attribute 17 therefore becomes the pivot element (also called the class attribute), i.e., the value predicted by the prediction tree, and all other attributes are query parameters needed to reach a prediction. As is apparent from the dataset, many factors are required to determine the value of the pivot element. Most of attributes 4 through 13 are standard clinical tests available to doctors, and attributes 1 and 2 are trivial. Consider the other attributes.

Location: Logically, location does not seem to affect this data directly, so this attribute is unlikely to appear as a significant node in the classification tree. Its importance, however, lies in the fact that it can be used to cluster data records. Suppose, for example, that the dataset has been collected from different zones, countries, or states. If the patient under consideration falls in one of these zones, the system can extract the set of records corresponding to that zone and use this subset to mine the classification tree. This may improve the accuracy of the system, because it concentrates on the data most relevant to the patient rather than on generic data. If the patient does not fall in any of the zones, the system has to decide whether to approximate the patient's data by the zone closest to him or her, use the entire dataset, combine two zones, etc. This is an example of how the Location Context can be used.
This holds because Location here is not an input to the data mining process but a context factor: given the same values for the other input parameters, it can still change the output.

Family-History: This parameter demands an input beyond the standard clinical tests; it is an example of using historical personal data as input. The system stores information about the immediate family members of the given patient in his or her user profile, and for each of these members it can access the historical patient repository, since hospitals usually maintain a medical record of the diseases each patient has had in the past. This is a specific case of the Domain Context, called the Patient Context.

Smoke-Disease: This parameter refers to any health effects caused by smoking. Determining whether a person suffers ill effects from smoking is itself a subproblem. Here we refer to another dataset containing information such as (1) since when the person has smoked, (2) cigarettes per day, and (3) when he or she quit (period). Based on these input parameters from the user, the system picks another dataset about smoking, say Smoking Effects, mines it, and builds a classification tree that predicts whether the person has smoking-related problems. Using the input parameters for the given patient, the system then queries this tree to predict whether the patient has smoking problems. This predicted output becomes an input to the original query and is then used to query the original tree for heart disease. This is an example of the Data Context, because one of the inputs to the original query is the output of mining another, separately selected dataset. The system selects the auxiliary dataset only if the patient has ever smoked. Thus, depending on the context, different datasets are selected, queried, and used in cascade to identify implicit context factors.
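Two behaviours from this example can be sketched in a few lines: the Location Context choosing the training subset (own zone first, then the nearest zones, finally the entire dataset) and the Data Context cascading a Smoke-Disease prediction into the main query. This is a minimal sketch under stated assumptions: the attribute names (location, ever_smoked, years_smoked, cigs_per_day, years_since_quit), the min_rows threshold and the neighbour ordering are hypothetical, and a 1-nearest-neighbour classifier stands in for the C4.5 tree the framework actually builds.

```python
import math
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Model = Callable[[Row], Any]


def location_subset(rows: List[Row], zone: str, min_rows: int = 30,
                    neighbours: Optional[Dict[str, List[str]]] = None) -> List[Row]:
    """Location Context: mine the patient's own zone; if it has too few records,
    widen with the nearest zones in order, and finally fall back to all rows."""
    subset = [r for r in rows if r["location"] == zone]
    for other in (neighbours or {}).get(zone, []):
        if len(subset) >= min_rows:
            break
        subset += [r for r in rows if r["location"] == other]
    return subset if len(subset) >= min_rows else list(rows)


def train_1nn(rows: List[Row], features: List[str], target: str) -> Model:
    """Stand-in learner (1-nearest neighbour); the paper mines a C4.5 tree instead."""
    def predict(query: Row) -> Any:
        nearest = min(rows, key=lambda r: math.dist([r[f] for f in features],
                                                    [query[f] for f in features]))
        return nearest[target]
    return predict


SMOKING_FEATURES = ["years_smoked", "cigs_per_day", "years_since_quit"]


def complete_query(patient: Row, smoking_rows: List[Row]) -> Row:
    """Data Context cascade: derive Smoke-Disease by mining the auxiliary
    Smoking Effects data, but only if the patient has ever smoked."""
    query = dict(patient)
    if patient.get("ever_smoked"):
        sub_model = train_1nn(smoking_rows, SMOKING_FEATURES, "smoke_disease")
        query["smoke_disease"] = sub_model(patient)
    else:
        query["smoke_disease"] = 0  # assumption: never smoked means no smoke disease
    return query


smoking_effects = [  # hypothetical Smoking Effects records
    {"years_smoked": 20, "cigs_per_day": 20, "years_since_quit": 0, "smoke_disease": 1},
    {"years_smoked": 2, "cigs_per_day": 5, "years_since_quit": 10, "smoke_disease": 0},
]
patient = {"ever_smoked": True, "years_smoked": 18, "cigs_per_day": 15, "years_since_quit": 1}
print(complete_query(patient, smoking_effects)["smoke_disease"])  # 1
```

The completed query would then be used to query the main heart-disease model, trained on the subset returned by location_subset.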

4 Context-Aware Data Mining Framework

We propose a new framework for applications that mine real-life datasets to build effective data mining models, taking into consideration all relevant context factors. The framework does not focus on how context factors are collected but on how they are used to obtain relevant and correct results; its objective is to use context factors to improve the predictions and accuracy of the data mining process. The context factors are applied through a set of carefully designed ontology concepts, described in detail in Section 5. The framework is oriented towards medical datasets, and our examples use mining, especially classification models, to build decision support systems in the medical field. However, the idea of the framework is generic and applies to other important application areas of data mining, such as e-commerce and stock trading.

4.1 Context-Aware Data Mining Model

In the context-aware data mining framework, different context factors engage in a number of different types of data mining behaviors. First, consider a set of context factors that may affect the behavior of data mining: $C = \{c_1, c_2, \ldots, c_n\}$. Different mining behaviors are employed according to whether or not particular tuples, attributes, or values of the given datasets are related to computing the contexts. Suppose that a context factor takes values in the set $\{c_1, c_2, \ldots, c_k\}$. Let $D$ be a dataset composed of a set of tuples $T = \{t_1, t_2, \ldots, t_n\}$, a set of attributes $A = \{a_1, a_2, \ldots, a_m\}$, and, for a given attribute $a_j$, a set of values $V = \{v_1, v_2, \ldots, v_l\}$. In relation to the set of context factors $C$, we briefly describe the two main processes of data mining; please refer to [9] for details.

Phase 1 Preprocessing: the datasets to be mined are prepared using different schemas, Pick, Join, or Trim, applied to the tuples ($T$), attributes ($A$), or values ($V$) of the available datasets ($D$). The preprocessing schemas can be specified as follows:
– Pick determines how to pick particular tuple(s), field(s), or value(s) from the given dataset(s) for particular context(s) $c_k$. Picking particular tuples (rows) is called a horizontal pick, denoted $hPick(T_i, c_k)$, and picking particular fields (columns) a vertical pick, denoted $vPick(T_i, c_k)$.
– Join determines how to join particular tuple(s), field(s), or value(s) picked from the source dataset $T_i$ to a target dataset $T_j$. Joining particular tuples (rows) is called a horizontal join, denoted $hJoin(T_i, T_j, c_k)$, and joining particular fields (columns) a vertical join, denoted $vJoin(T_i, T_j, c_k)$.

– Trim determines how to trim particular tuple(s), field(s), or value(s) from a particular dataset $T_i$. Trimming particular tuples (rows) is called a horizontal trim, denoted $hTrim(T_i, c_k)$, and trimming particular fields (columns) a vertical trim, denoted $vTrim(T_i, c_k)$.

Phase 2 Data Mining: different types of mining processes can be invoked.
– A cascading mining process comprises a main process that acquires some of its inputs by recursively invoking other processes and using their outputs to complete the input set it needs to execute.
– A sequential mining process uses the output of one process as the input of the subsequent process.
– An iterative mining process repeatedly executes a set of activities.
– A parallel fork process partitions a process into a set of subsequent processes.
– An aggregating mining process aggregates the outputs of the previous processes.

A typical data mining behavior is a hybrid in which the system combines several of the behaviors above and then computes a value using the proposed schemas. For example, the height/weight-ratio context picks height and weight as two separate values and then computes their ratio. In all these cases, the system is aware of the domain and of the mining processes dealing with implicit contexts. The domain knowledge and processes are specified in ontologies, as described in Section 5.

4.2 Architecture

Fig. 1 shows the architecture of the proposed framework. The User Profile is a collection of information about the users of the system; the user context denotes all or part of this information with reference to the current query. For instance, the current user is obtained from the user's login information. As the user works with the system, the system learns more about him or her and maintains the user profile, which could contain information such as the types of queries the user is most interested in; this helps the system provide a better service.

The User Interface component is the client interface component that interacts with the client subsystems. It refers to the Service Ontology, which lists all services offered by the system and the input required from the user for each service. Once it receives a request from the user, it forwards the request to the Query Analyzer. The Query Analyzer refers to the Process Ontology to get additional information about the process to be executed to fulfill the service, typically which additional context factors to consider for the query and where to get the additional implicit input needed to complete the input for the given process. The Query Analyzer fetches all the implicit information needed to complete the query parameters and then passes the complete (explicit and implicit) input list to the Query Processor. The Query Processor carries out the actual data mining operation when invoked with the complete set of inputs from the Query Analyzer. The implicit information could be picked up from a dataset record, or it could be the result of a sub-mining task. In the latter case, the Query Analyzer invokes the Query Processor to execute each such sub-mining task individually, consolidates the results, and then passes them to the Query Processor to execute the main mining task. The Query Analyzer can thus be considered a meta task manager that manages multiple atomic tasks belonging to the same high-level task.

The Domain Ontology stores knowledge about the existing datasets, such as relationships between datasets not only at the structural level but, more importantly, at the semantic level. For example, if a query requires mining a diabetes dataset, we also know from the Data Ontology that anybody who has diabetes should also be checked for kidney disease, so other datasets must be mined in addition to the primary one. This ontology can thus be used to semantically integrate datasets for consolidated mining over multiple physical datasets. The actual data mining consists of two components. The first is a Data Preprocessing component that converts the existing dataset formats into the format accepted by the data mining tool. The second is the Data Mining Tool itself, which accepts the preprocessed dataset and the other query parameters (e.g., the dataset name and the query element), mines the dataset, and returns the queried result. We implement the data mining using WEKA [6] as part of our framework. The Query Generator may use the output of one query as the input to another, and so on, until it has all the inputs required for the primary query that the user requested. After the meta task is executed, the final result is returned to the user.

[Figure: architecture diagram. Components shown: User Profile; Context Factor Detection plane with User Context, Location Context, Domain Context, and Data Context; Domain Ontology, Data Ontology, Process Ontology, and Service Ontology; UI Component (Web server); Query Analyzer; Query Processor; Data Preprocessing; Data Mining Tool; Datasets.]

Fig. 1. Architecture of Context-Aware Data Mining Framework
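The Query Analyzer's role as a meta task manager can be sketched as a recursive resolver: each parameter of a process gets its value from the user, from a context factor, or from another process whose output must be mined first (cascading mining). The three source kinds mirror the ValueReference concept of Section 5. All class and function names below are hypothetical, and the toy mining functions are placeholders for the Query Processor's call into the data mining tool (WEKA in the framework).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass
class FromUser:
    key: str  # explicit input entered through the User Interface


@dataclass
class FromContext:
    key: str  # implicit input supplied by a context factor


@dataclass
class FromProcess:
    process: "Process"  # implicit input produced by a sub-mining task


Source = Union[FromUser, FromContext, FromProcess]


@dataclass
class Process:
    name: str
    parameters: Dict[str, Source]
    mine: Callable[[Dict[str, Any]], Any]  # placeholder for the Query Processor


def analyze_and_run(process: Process, user_input: Dict[str, Any],
                    context: Dict[str, Any]) -> Any:
    """Complete the input list of `process`, running sub-processes first,
    then hand the complete (explicit + implicit) inputs to its mining step."""
    inputs: Dict[str, Any] = {}
    for name, source in process.parameters.items():
        if isinstance(source, FromUser):
            inputs[name] = user_input[source.key]
        elif isinstance(source, FromContext):
            inputs[name] = context[source.key]
        else:  # FromProcess: cascade into the sub-mining task
            inputs[name] = analyze_and_run(source.process, user_input, context)
    return process.mine(inputs)


# Toy processes: the lambdas are stand-ins for mined classification models.
smoke = Process("smoke-disease", {"cigs_per_day": FromUser("cigs_per_day")},
                lambda p: int(p["cigs_per_day"] > 10))
heart = Process(
    "heart-risk",
    {"exang": FromUser("exang"),
     "family_hist": FromContext("family_hist"),
     "smoke_disease": FromProcess(smoke)},
    lambda p: ">50_1" if p["exang"] == "yes" or p["smoke_disease"] else "<50",
)

print(analyze_and_run(heart, {"exang": "no", "cigs_per_day": 20}, {"family_hist": 1}))  # >50_1
```

Keeping the resolution separate from the mining step means the same resolver serves both the main task and any sub-mining task, which is what lets the Query Analyzer manage them uniformly.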

5 Ontology Design

[Figure: ontology diagram; recoverable relation labels include getValueFrom and useColumn.]

Fig. 2. Context-Aware Data Mining Ontology

The system supports the application of context through a carefully designed ontology, shown in Fig. 2. The main concepts in the ontology are Service and Process. The Service concept represents the services that the system offers to the user, along with their descriptions. A service is linked to the concept of Parameter, which represents the input required for the data mining process. The concept of Parameter is generic and represents all input: explicit input from the user and implicit input from the context. The concept of Process represents the processes inside the system, i.e., the process that needs to be executed to provide a given service. A Process keeps all information about the data mining process that has to be executed to fulfill the described service, including all the parameters needed for the mining process. Some of these parameters are obtained from the user, while others are implicit or obtained indirectly, as discussed in Section 4. Thus a Process can be thought of as a reference to the complete set of parameters, while a Service refers only to the set of parameters that the user needs to enter.

The concept of Database is a semantic abstraction of the actual physical datasets available, and the concepts of Table and Column represent the actual tables and columns, respectively. A column can refer to a column of another table through the Foreign Key attribute, which can be used to combine tables based on these references; this link abstracts the structural relationships between tables. The concept of Context is an abstraction of the context factors, and all the context factors defined for our framework above are also represented as concepts. The Data Context concept refers to the datasets required for the mining operation. In addition, it provides the Combine Key attribute, which refers to the columns on which the combine operation is to be performed. As mentioned earlier, a data mining operation may require combining physical datasets before mining starts. This combination differs from the foreign-key combination in that it is semantic rather than structural. For example, a dataset may be represented as two physical tables as a result of database normalization; these are combined on the foreign key attribute. Other datasets, however, are physically unrelated, for example a dataset of heart attack symptoms and another of diabetes symptoms. Because predicting heart attack also involves checking for diabetes (just as predicting diabetes has to take kidney-failure symptoms into consideration), the two physically unrelated datasets become semantically associated.
This relationship is represented by the Combine Key attribute. Auxiliary Dataset represents implicit parameters that can be picked up directly from a given dataset. It refers to the table from which to pick the data, the column containing the data (useColumn), and the parameter that gives the foreignKey value used to query the dataset. The Location Context refers to the actual column containing location-related information, such as zone, country, or state. The physical column name may differ, but this concept semantically abstracts the general notion of a region as a location. It also refers to the parameter that provides the value. Using this information, the system filters the dataset(s) to obtain relevant mining data. The concept of Domain Context abstracts all concepts that represent domain-centric context factors; in our case, the Patient Context refers to the Patient Ontology. The Patient Ontology describes a patient through his or her relationships with other patients, such as immediate family members. As described in Section 3, the Patient Ontology is used to determine the medical history of related patients. In addition, the ontology supports Domain Filters, which are process-specific factors used to filter a given dataset(s); a Domain Filter refers to the column used for filtering and to the parameter from which it gets the value.

Each parameter refers to a ValueReference, which denotes the place from which the parameter gets its value. ValueReference is an abstraction of all entities from which a parameter can get its value: Service means that the value comes from the user, and Context means that it comes from one of the context factors. Most importantly, a parameter may refer to another Process, which is also a ValueReference. This denotes the cascading mining discussed in Section 4, where the main mining process requires one or more sub-mining tasks and the prediction output of each forms an input to the main task. Each sub-mining task is represented as a separate Process, so a parameter referring to a Process means that the output of that process is the input value for the parameter. The framework also supports multiple domain-specific contexts; the patient context in our example is one such domain context. The Patient Ontology stores knowledge about patients, such as possible relationships between them (e.g., family relationships), which can be used to determine a patient's past family health problems during the data mining process. In addition, one could plug in further contexts, for instance to filter a dataset on a given attribute before mining the resulting dataset. For example, as in the experiment in Section 6, one could use sex as a filtering attribute to obtain relevant data before mining. Our initial investigation into developing the ontology led us to the Protégé tool environment [7], which proved to be an excellent knowledge-editing environment for creating and maintaining models of the concepts and relations in the context-aware data mining process.
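How the Patient Ontology's family relationships can feed the Family-Hist attribute (attribute 14 in Section 3) is sketched below. The relation table and the historical repository are hypothetical in-memory stand-ins for the ontology and the hospital records; only immediate family members are consulted, as in the attribute's definition.

```python
from typing import Dict, List, Set

# Hypothetical fragment of the Patient Ontology: immediate-family relations.
FAMILY: Dict[str, List[str]] = {"p1": ["p2", "p3"], "p4": []}

# Hypothetical historical patient repository: past diagnoses per patient.
HISTORY: Dict[str, Set[str]] = {"p2": {"heart disease"}, "p3": {"diabetes"}}


def family_hist(patient_id: str, disease: str,
                family: Dict[str, List[str]],
                history: Dict[str, Set[str]]) -> int:
    """Return 1 if any immediate family member was ever diagnosed with
    `disease`, else 0 (the Family-Hist coding of Section 3)."""
    relatives = family.get(patient_id, [])
    return int(any(disease in history.get(r, set()) for r in relatives))


print(family_hist("p1", "heart disease", FAMILY, HISTORY))  # 1
print(family_hist("p4", "heart disease", FAMILY, HISTORY))  # 0
```

Because the value is derived rather than entered, the user never has to supply it; in the framework, the relations would come from the ontology rather than from hard-coded tables.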

6 Experimental Results

We now present experimental results for the heart attack risk case. The dataset is the one described in Section 3, which stores the history of patients with heart disease. It includes the following attributes:
– Personal details: age in years and sex (1 = male; 0 = female)
– Cardiac details: painloc, chest pain location (1: substernal; 0: otherwise)
– Prediction output: num, the diameter narrowing of the artery (angiographic disease status) (0: < 50% diameter narrowing; 1: > 50% diameter narrowing)

The user submits a query consisting of input values and expects the predicted value of num as the outcome. Without context, the system mines the entire dataset using all attributes to construct the model, and then applies the query variables to this model to obtain the prediction. We applied the J48 algorithm [13], WEKA's implementation of the C4.5 decision tree learner [8], to the heart dataset. Fig. 3 (Case 1) shows the resulting classification tree, generated by mining the entire dataset as it is.

Context-aware data mining behaves differently. Using meta-level understanding (which the system will eventually have), we know that the personal-detail attributes can serve as patient contexts, and using them as context yields interesting results. In our case, we considered Sex as a context factor. The user enters the query variables as before, but the now context-aware application knows that the Sex variable in the query is actually a context input. Using Sex as context amounts to asking why data from both sexes should be mixed when predicting the risk of heart disease: different factors may be predominant for male patients than for female patients, so mining the data separately may yield interesting results. If the query variable is Sex = male, it may not make sense to mine the entire dataset, query the model, and report the result. Instead, the system retrieves only the male records and mines this subset without the sex column; that is, it performs horizontal and vertical trimming of the data. The model mined from this subset can contain more specialized information. In our experiments, we mined the dataset separately by Sex as context information; the classification tree for males is shown in Fig. 3 (Case 2). As the figure shows, some new factors emerge in this decision tree that did not appear in the generalized model, which shows how context factors can yield different results and give more insight into the trends and interrelations in the dataset. In another example, we used Sex = female as input and obtained yet another model, shown in Fig. 3 (Case 3). This simple variation in the query makes it evident that considering context factors can greatly affect the efficacy of the results of data mining applications.

Case 1: Classification tree (entire dataset)
exang = no
|   oldpeak <= 1: <50 (190.0/27.0)
|   oldpeak > 1
|   |   slope = down: >50_1 (0.0)
|   |   slope = flat
|   |   |   sex = female: <50 (3.0/1.0)
|   |   |   sex = male: >50_1 (8.0)
|   |   slope = up: <50 (3.7)
exang = yes: >50_1 (89.3/19.3)

Case 2: Classification tree (Sex = male)
exang = no
|   oldpeak <= 1: <50 (129.0/24.0)
|   oldpeak > 1
|   |   slope = down: >50_1 (0.0)
|   |   slope = flat: >50_1 (8.0)
|   |   slope = up: <50 (2.0)
exang = yes
|   chest pain = typ angina: >50_1 (0.0)
|   chest pain = asympt: >50_1 (62.0/6.0)
|   chest pain = non anginal
|   |   age <= 55: >50_1 (3.0/1.0)
|   |   age > 55: <50 (2.0)
|   chest pain = atyp angina
|   |   oldpeak <= 1.5: <50 (4.0/1.0)
|   |   oldpeak > 1.5: >50_1 (3.0)

Case 3: Classification tree (Sex = female)
exang = no: <50 (64.82/4.0)
exang = yes
|   thalach <= 108: >50_1 (3.04/0.04)
|   thalach > 108
|   |   chol <= 254: <50 (4.0)
|   |   chol > 254
|   |   |   thalach <= 127: <50 (4.08/1.0)
|   |   |   thalach > 127: >50_1 (3.06/0.06)

Fig. 3. The Experimental Results
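The horizontal and vertical trimming applied to the Sex context can be sketched directly from the hTrim and vTrim schemas of Section 4.1. This is a minimal illustration rather than the framework's code: the predicate-based signatures, the row format and the sample records are assumptions, and the returned subset would then be handed to the learner (J48 in the experiments).

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def h_trim(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Horizontal trim: drop the tuples (rows) that fail the context predicate."""
    return [r for r in rows if keep(r)]


def v_trim(rows: List[Row], column: str) -> List[Row]:
    """Vertical trim: drop one attribute (column) from every tuple."""
    return [{k: v for k, v in r.items() if k != column} for r in rows]


def context_subset(rows: List[Row], attribute: str, value: Any) -> List[Row]:
    """Keep only the records matching the context value, then remove the
    now-constant context column, which can no longer split the tree."""
    matching = h_trim(rows, lambda r: r[attribute] == value)
    return v_trim(matching, attribute)


records = [  # hypothetical heart-dataset rows
    {"sex": "male", "age": 52, "exang": "yes", "num": ">50_1"},
    {"sex": "female", "age": 60, "exang": "no", "num": "<50"},
    {"sex": "male", "age": 45, "exang": "no", "num": "<50"},
]
males = context_subset(records, "sex", "male")
print(males)
# [{'age': 52, 'exang': 'yes', 'num': '>50_1'}, {'age': 45, 'exang': 'no', 'num': '<50'}]
```

Calling context_subset with the value "female" gives the kind of subset used for the Sex = female setting of Case 3; dropping the context column mirrors mining "excluding the sex column".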

7 Conclusion

In this paper we introduced a context-aware data mining framework that improves the accuracy and efficacy of data mining outcomes. Context factors are modelled using an ontological representation. Although the proposed framework is generic and can be applied to most fields, the medical scenario serves as a proof of concept for our model. The experimental results confirmed the effectiveness of using context factors in data mining.

References
1. Brown, P.J.: The Stick-e Document: a Framework for Creating Context-Aware Applications. Electronic Publishing 96, 259-272 (1996)
2. Brown, P.J., Bovey, J.D., Chen, X.: Context-Aware Applications: From the Laboratory to the Marketplace. IEEE Personal Communications 4(5), 58-64 (1997)
3. Chen, G., Kotz, D.: A Survey of Context-Aware Mobile Computing Research. Dartmouth Computer Science Technical Report TR2000-381 (2000)
4. Dey, A.K., Abowd, G.D.: Towards a Better Understanding of Context and Context-Awareness. GVU Technical Report GIT-GVU-99-22, College of Computing, Georgia Institute of Technology, 2, 2-14 (1999)
5. Edelstein, H.A.: Introduction to Data Mining and Knowledge Discovery, Third Edition. Two Crows Corporation (1999). ISBN: 1-892095-02-5
6. Machine Learning Software in Java. The University of Waikato (http://www.cs.waikato.ac.nz/ml/weka/index.html)
7. The Protégé Project Website, http://protege.stanford.edu/
8. Ragone, A.: Machine Learning C4.5 Decision Tree Generator.
9. Singh, S., Lee, Y.: Intelligent Data Mining Framework. Twelfth International Conference on Information and Knowledge Management (CIKM03) (submitted) (2003)
10. Salber, D., Dey, A.K., Orr, R.J., Abowd, G.D.: Designing for Ubiquitous Computing: A Case Study in Context Sensing. GVU Technical Report GIT-GVU 99-129 (http://www.gvu.gatech.edu/) (1999)
11. Schilit, B., Adams, N., Want, R.: Context-Aware Computing Applications. In: Proceedings of the IEEE Workshop on Mobile Computing Systems and Applications, Santa Cruz, California, pp. 85-90 (December 1994)
12. Schilit, B., Theimer, M.: Disseminating Active Map Information to Mobile Hosts. IEEE Network 8(5), 22-32 (1994)
13. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (1999)
14. UCI Knowledge Discovery in Databases Archive, Information and Computer Science, University of California, Irvine, CA 92697-3425 (http://kdd.ics.uci.edu/)