2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops

Multi-model Ontology-based Hybrid Recommender System in E-learning Domain

Leyla Zhuhadar and Olfa Nasraoui
Knowledge Discovery and Web Mining Lab, Dept. of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA

Robert Wyatt and Elizabeth Romero
Office of Distance Learning, Division of Extended Learning and Outreach, Western Kentucky University, KY 42101, USA

Abstract—This paper introduces a multi-model ontology-based framework for semantic search of educational content in an E-learning repository of courses, lectures, multimedia resources, etc. This hybrid recommender system is driven by two types of recommendations: content-based (domain ontology model) and rule-based (learner's interest-based and cluster-based). The domain ontology represents the learning materials as a hierarchy of concepts and sub-concepts. The learner's ontology model represents a subset of the domain ontology, and the cluster-based recommendations are added to this model as additional semantic recommendations. Combining the content-based and rule-based recommendations provides the user with hybrid recommendations, each of which influences the re-ranking of the retrieved documents with a different weight. Our proposed approach has been implemented on the HyperManyMedia¹ platform.

I. INTRODUCTION

One of the most powerful modes of personalization comes in the form of recommender systems [1]. Recommender systems originated in information retrieval [2], but emerged as an independent research field in the 1990s [3]. The field of recommender systems can be classified into the following categories, based on how recommendations are made:
- Content-based: the user is recommended items (Webpages) based on his/her past activities (interests);
- Collaborative filtering: the user is recommended items (Webpages) that people with similar interests liked in the past;
- Rule-based: the user is recommended items (Webpages) based on rules that restrict the recommended items to those that satisfy particular conditions;
- Hybrid-based: this model uses methods that combine the above models, thus trying to avoid certain limitations of each separate model.

A recommender system in an E-learning context is a software agent that tries to “intelligently” recommend actions to a learner based on the actions of previous learners [4]. Such a recommender system could recommend online learning materials or shortcuts. Those recommendations are based on previous learners' activities or on the learning styles of the students that are discovered from their navigation patterns.

II. PREVIOUS WORK

There are several approaches to automatically generating Web recommendations based on a user's browsing patterns or explicit ratings [1]. Some rely on learning a usage model from Web access data or user ratings. For example, lazy user modeling is used in the most widespread form of Collaborative Filtering, which stores all users' information and then uses K-Nearest-Neighbors (KNN) to provide recommendations from the previous history of the K most similar users [5]. Recently, [6], [7], [8], [9] used a different approach that recommends documents on the basis of user profiles. This approach learns from implicit feedback or past click history. Other ways to form a user model rely on data mining, such as mining association rules of the form: IF user views page A, THEN user views page B [10], or partitioning a set of user sessions into clusters or groups of similar sessions. The latter groups are called session clusters, or user profiles [11], [12]. More recently, [13] presented a Semantic Web usage mining methodology for mining evolving user profiles on dynamic Websites. It clusters the user sessions in each period and relates the user profiles of one period to those discovered in previous periods, both to detect profile evolution and to understand what types of profile evolution have occurred. This branch of data mining, which discovers user models from Web usage data, is referred to as Web Usage Mining.

Previous work on the use of Web mining for developing smart E-learning systems [4] integrated Web usage mining, where patterns were automatically discovered from users' actions and then fed into a recommender system that could assist learners in their online learning activities by suggesting actions or resources. Another type of data mining in E-learning was performed on documents rather than on the students' actions. This type of data mining is more akin to text mining (i.e., knowledge discovery from text data) than to Web usage mining [14]. It helps alleviate some of the problems in E-learning that are due to the volume of data, which can be overwhelming for a learner. It works by organizing the articles and documents by topic and by providing summaries of documents.

In our approach, we used a hybrid recommender system: we combined content-based recommendations with two types of rule-based recommendations. The rest of the paper is organized as follows. Section III (Methodology) gives an overview of three types of recommendations: (1) Ontology Content-based, (2) Cluster-based, and (3) Interest-based. Section IV (Experimental Evaluation) describes our evaluation methods and presents an experimental analysis. Section V (Conclusion) summarizes our findings. Finally, the paper ends with the References.

Figure 1. Hybrid Recommender System Framework

1 http://hypermanymedia.wku.edu

978-0-7695-3801-3/09 $26.00 © 2009 IEEE DOI 10.1109/WI-IAT.2009.238

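III. METHODOLOGY

$$R = \bigcup_{i=1}^{n} C_i \qquad (1)$$

where $n$ is the number of concepts in the domain. Each concept $C_i$ consists either of sub-concepts, which can be children ($C_i = \bigcup_{j=1}^{m} SC_{ji}$), or of leaves, which are the actual lecture documents ($\bigcup_{k=1}^{l} d_{ki}$):

$$C_i = \begin{cases} \bigcup_{j=1}^{m} SC_{ji} & \text{if } C_i \text{ has sub-concepts} \\ \bigcup_{k=1}^{l} d_{ki} & \text{if } C_i \text{ has leaves} \end{cases} \qquad (2)$$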

We encoded the above semantic information into a tree-structured domain ontology in OWL, based on the hierarchy of the E-learning resources. The root concepts are the colleges, the sub-concepts are the courses, and the leaves are the resources of the domain (lectures).

1- Building our Domain Ontology: A variety of knowledge-based framework applications that support ontology modeling have become available. Two of the best known are Protege² and Altova³. We used Protege. Figure 2 shows the design of the “HyperManyMedia” ontology in Protege. Recently, we added additional vocabulary terms under each sub-subclass. These terms are obtained from the cluster-based recommendations (Section B).

2- Building a Learner's Ontology Profile: We build the learner's ontology profile by extracting the learner's interests from the access log. Let $docs(U_i) = \bigcup_{k=1}^{l} d_{ki}$ be the documents visited by the $i$th learner, $U_i$. The learner ontology is considered a subset of the E-learning domain ontology. Since our log of user access activity records the visited documents (which are the leaves), a bottom-up pruning algorithm is used to extract the semantic concepts that the learner is interested in. Each learner $U_i \subset R$ has a dynamic semantic representation. First, we collect the learner's activities over a period of time to form an initial learner profile. Starting from the leaves, the bottom-up pruning algorithm then searches the domain semantic structure for each document visited by the learner.

Figure 1 shows the proposed architecture of our hybrid recommender system.

A. Ontology Content-based Recommendations

The idea of a content-based recommender system in an E-learning platform can be summarized as follows: given the lectures that the learner has visited, the platform recommends other lectures whose content is similar to the content of the viewed lectures. Since our approach is based on a search-engine-based recommender system, the content of each lecture is treated as a document, and the recommendation of pages depends on the match between a learner's query and the inverted index of the lectures (Webpages). Our search engine uses the Vector Space Model, and the score of a query q for a document d is computed as the cosine similarity between the document vector and the query vector. The implementation can be described as follows:

1- Preliminary crawling and indexing (offline): crawling and indexing the E-learning platform that contributes the content of the recommendations.

2- We start by representing each of the N documents as a term vector $\vec{d} = \langle w_1, w_2, \ldots, w_n \rangle$, where $w_i = tf_i \cdot \log\frac{N}{n_i}$ is the weight of term $i$, combining the term frequency, $tf_i$, and the term's Inverse Document Frequency ($IDF_i = \log\frac{N}{n_i}$) when the term occurs in $n_i$ documents.

3- Building the E-learning Domain Ontology: Let R represent the root of the domain, which is represented as a tree, and let $C_i$ represent a concept under R. In this case, R is the union of its concepts (Equation 1), and each concept is the union of either its sub-concepts or its leaves (Equation 2).

2 http://protege.stanford.edu/ 3 http://www.altova.com/
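The term weighting and cosine scoring described in Section A can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`tfidf_vectors`, `cosine`, `search`), the whitespace tokenization, and the three toy lectures are hypothetical and are not taken from the HyperManyMedia implementation; the query is weighted with the same tf-idf scheme as the documents, which the paper does not specify.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors with w_i = tf_i * log(N / n_i).

    docs is a list of token lists; returns (vectors, idf).
    """
    n_docs = len(docs)
    # n_i: number of documents in which term i occurs
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / n_i) for term, n_i in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_tokens, docs):
    """Return (doc_index, score) pairs sorted by decreasing cosine score."""
    vectors, idf = tfidf_vectors(docs)
    # Assumption: query terms unknown to the index are ignored.
    q_tf = Counter(t for t in query_tokens if t in idf)
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf}
    scores = [(i, cosine(q_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus of three lectures
lectures = [
    "binary search tree insertion".split(),
    "tree traversal and graph search".split(),
    "linear algebra matrix product".split(),
]
ranking = search(["matrix"], lectures)
```

Here only the third lecture contains the query term, so it is ranked first; a production search engine would add stemming, stop-word removal, and an inverted index instead of scoring every document.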


Table I. Clustering entropy measures for various algorithms (rows) and partitioning criteria (columns)

Method               I1     I2     E1     G1     G1*    H1     H2     Slink  WSlink  Clink  WClink  UPGMA
Agglomerative        0.040  0.025  0.039  0.102  0.043  0.024  0.023  0.493  0.493   0.060  0.060   0.067
Direct k-way         0.036  0.020  0.040  0.067  0.055  0.038  0.037  -      -       -      -       -
Repeated Bisection   0.027  0.034  0.036  0.058  0.036  0.022  0.032  -      -       -      -       -

Method               pe     pG1    pH1    pH2    pI1    pI2
Graph Partitional    0.033  0.051  0.042  0.01   0.32   0.017

Each document in Category 1 is assigned a priority ranking (α = 5.0). This boosting score is implemented using doc.setBoost(), with the boost set to α; the weight is only applied to the documents that the learner is interested in, based on his/her previous activities (sessions). Since we used the ontology to generate the user profile, we call this type of recommendation Ontology Content-based Recommendations.

Figure 2. Hierarchical Structure of “HyperManyMedia” Ontology
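The bottom-up pruning used to build a learner's ontology profile can be illustrated with a small Python sketch. The toy ontology below (two colleges, three courses, four lectures) and the names `PARENT` and `learner_profile` are hypothetical; the sketch assumes the ontology is stored as a child-to-parent map and that the profile keeps only concepts and sub-concepts with a non-zero visit count.

```python
from collections import defaultdict

# Hypothetical toy ontology stored as child -> parent
# (college > course > lecture); None marks a root concept.
PARENT = {
    "Engineering": None,
    "Arts": None,
    "Data Structures": "Engineering",
    "Circuits": "Engineering",
    "Art History": "Arts",
    "lecture_trees": "Data Structures",
    "lecture_heaps": "Data Structures",
    "lecture_ohm": "Circuits",
    "lecture_baroque": "Art History",
}


def learner_profile(visited_docs, parent=PARENT):
    """Propagate visit counts from visited leaves up to the root, then prune."""
    counts = defaultdict(int)  # visit counts start at 0
    for doc in visited_docs:
        node = doc
        while node is not None:
            counts[node] += 1  # increment every node on the path to the root
            node = parent[node]
    # Leaves are the nodes that never appear as a parent.
    leaves = set(parent) - {p for p in parent.values() if p is not None}
    # Keep only visited concepts and sub-concepts with their weights.
    return {node: c for node, c in counts.items() if node not in leaves}


# Visit log of one hypothetical learner (a lecture may be visited repeatedly)
profile = learner_profile(
    ["lecture_trees", "lecture_heaps", "lecture_trees", "lecture_ohm"]
)
```

For this log the profile keeps Engineering (4 visits), Data Structures (3), and Circuits (1), while the unvisited Arts branch is pruned. If the visited lectures themselves are needed for re-ranking, they can be kept alongside the concept weights.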

Algorithm 1 Re-ranking a learner's search results
  Input: q  // keyword search
  Output: Rank = {d_1, d_2, ..., d_n}  // re-ranked results
  Rank = {d_1, d_2, ..., d_n}  // default search results for query q
  UR_i = ∪_{j=1}^{n} SC_ji + ∪_{k=1}^{l} d_ki
  RC = ∪_{c=1}^{z} d_c  // z = # of documents in Recommended Cluster
  foreach d_j ∈ Rank
    if d_j ∈ UR_i then
      d_j.boost = α  // document is in user profile
    else if d_j ∈ RC then
      d_j.boost = β  // document is in recommended cluster
    else
      d_j.boost = γ
    end
  end
  Sort Rank based on boost field d_j.boost

B. Cluster-based Recommendations

We compared different hierarchical clustering algorithms using the clustering package Cluto⁴. We implemented three different clustering algorithms that are based on the agglomerative, partitional, and graph-partitioning paradigms [15]. Each algorithm was evaluated under several partitioning criteria, as shown in Table I. Of all the clustering algorithms, graph partitioning produced the best clustering results, with 35 clusters and the lowest entropy; detailed information about our clustering experiments can be found in our previous work [16]. We extract the most similar (recommended) cluster, C_i = BestCluster, which is summarized by its Top-n keywords (significant or frequent terms), and use it to modify the learner's semantic ontology by adding the cluster's terms as semantic terms under the concepts (parent nodes) that these documents belong to, as a rule-based recommendation. In Algorithm 1, this rule appears as Category 2, where each document d_i belonging to the recommended cluster is assigned a priority ranking (β = 3.0). This boosting score is implemented using doc.setBoost(), with the boost set to β. When a learner searches for lectures using a specific query q, the cosine similarity measure is used to retrieve the most similar documents that contain the terms in the query. Those documents are re-ranked based on the weight factor β. We call this type of recommendation Cluster-based Recommendations.
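Algorithm 1 can be sketched in Python as shown below. The values α = 5.0 and β = 3.0 come from the text; the default boost γ = 1.0 is an assumption, since the paper does not state a value for γ. The function name `rerank` and the document identifiers are hypothetical, and the sketch sorts on the boost alone, relying on a stable sort so that documents with equal boosts keep their default search order.

```python
ALPHA = 5.0  # boost for documents in the learner's profile (from the text)
BETA = 3.0   # boost for documents in the recommended cluster (from the text)
GAMMA = 1.0  # assumed default boost; the paper gives no value for gamma


def rerank(rank, user_profile, recommended_cluster,
           alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Re-rank default search results by the boost categories of Algorithm 1."""

    def boost(doc):
        if doc in user_profile:          # Category 1: document is in user profile
            return alpha
        if doc in recommended_cluster:   # Category 2: document is in recommended cluster
            return beta
        return gamma                     # all remaining documents

    # sorted() is stable, so ties keep their order from the default ranking.
    return sorted(rank, key=boost, reverse=True)


# Hypothetical default ranking for some query q
default_rank = ["d1", "d2", "d3", "d4", "d5"]
personalized = rerank(
    default_rank,
    user_profile={"d4"},
    recommended_cluster={"d2", "d5"},
)
```

With these inputs the profile document d4 moves to the top, followed by the cluster documents d2 and d5 and then the remaining results. In a search engine where the boost multiplies the similarity score rather than replacing it, the final order would also depend on the cosine scores.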

For each document visited by the learner, the pruning algorithm locates the document in the “domain semantic structure” and then increments the visit count (initialized to 0) of each node on the path from that document up to the root. After back-propagating the counts of all the documents through the domain structure in this way, the pruning algorithm keeps only the concepts (colleges) and sub-concepts (courses) related to the learner's interests, together with their interest weights (the number of visits). When a learner searches for a lecture using a specific query q, the cosine similarity measure is used to retrieve the most similar documents that contain the terms in the query. Those documents are then boosted and re-ranked based on two factors. Here we introduce the first factor; the second is introduced in the following section. Algorithm 1 maps the ranked documents to the learner's semantic profile (the learner's previously visited lectures) as Category 1, and each document d_i belonging to the learner's semantic profile is boosted.

4 http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview


Figure 4. Percentage of Improvement in Top-n Precision and Top-n Recall
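The Top-n measures reported in Figure 4 can be computed as in the following sketch. It assumes the usual definitions (Top-n Precision is the fraction of the first n results that are relevant; Top-n Recall is the fraction of all relevant documents found in the first n results), which the paper does not spell out. The rankings and the relevant set are hypothetical and are not data from the experiments.

```python
def top_n_precision(ranked, relevant, n):
    """Fraction of the top-n results that are relevant."""
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def top_n_recall(ranked, relevant, n):
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:n] if doc in relevant) / len(relevant)


# Hypothetical rankings for one query and one learner
normal = ["d1", "d2", "d3", "d4", "d5"]
personalized = ["d4", "d2", "d5", "d1", "d3"]
relevant = {"d4", "d5"}  # assumed relevance judgments for this learner

p_normal = top_n_precision(normal, relevant, 3)
p_personal = top_n_precision(personalized, relevant, 3)
r_normal = top_n_recall(normal, relevant, 3)
r_personal = top_n_recall(personalized, relevant, 3)
```

In this toy case both relevant documents enter the top 3 only after personalization, so precision rises from 0 to 2/3 and recall from 0 to 1. Averaging such values over queries and learner profiles for several values of n would give curves of the kind summarized in Figure 4.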


The system's models provide ontology content-based recommendations (domain ontology model) and ontology rule-based recommendations (interest-based and cluster-based). Our proposed approach has been implemented on the HyperManyMedia platform and is already being used by online students at WKU. As of 2006, the “HyperManyMedia” search engine has been ranked number 24 in “The Ultimate Guide to Using Open Courseware”⁶ (between Cambridge University and Harvard Business). This work concluded with an experimental evaluation of the results. Evidence was found that both personalization and semantic enrichment have the potential to improve an E-learning information retrieval system.

Figure 3. Semantic terms recommendation
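The interest-based recommendation of Section C, where a recommended term is added to the query and the search is repeated, can be sketched as follows. This is a simplified illustration: the profile layout (a concept mapped to its sub-concepts and cluster terms), the exact-keyword matching rule, and the names `recommend_terms` and `expand_query` are assumptions rather than the authors' implementation.

```python
def recommend_terms(query, profile):
    """Suggest semantic terms from profile concepts that match the query.

    profile maps a concept name to the terms stored under it
    (sub-concepts and cluster-based recommended terms).
    """
    keywords = set(query.lower().split())
    suggestions = []
    for concept, terms in profile.items():
        vocabulary = {concept.lower(), *(t.lower() for t in terms)}
        if keywords & vocabulary:  # the query touches this concept
            for term in [concept, *terms]:
                if term.lower() not in keywords and term not in suggestions:
                    suggestions.append(term)
    return suggestions


def expand_query(query, selected_term):
    """Add the term the learner clicked on to the query before re-searching."""
    return f"{query} {selected_term}"


# Hypothetical learner profile with single-word terms
profile = {
    "algorithms": ["sorting", "graphs"],
    "chemistry": ["acids"],
}
suggested = recommend_terms("sorting", profile)
expanded = expand_query("sorting", "graphs")
```

The expanded query would then be submitted to the search engine again, so the learner narrows or broadens the result set along the semantic structure of his/her own profile.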

REFERENCES

[1] O. Nasraoui, “World wide web personalization,” Encyclopedia of Data Mining and Data Warehousing, Idea Group, 2005.
[2] M. McGill and G. Salton, Introduction to modern information retrieval. McGraw-Hill, 1983.
[3] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[4] O. Zaiane, “Building a recommender agent for e-learning systems,” Computers in Education, 2002. Proceedings. International Conference on, pp. 55–59 vol. 1, 3-6 Dec. 2002.
[5] J. B. Schafer, J. A. Konstan, and J. Riedl, “Recommender systems in e-commerce,” in ACM Conference on Electronic Commerce, 1999, pp. 158–166. [Online]. Available: http://citeseer.ist.psu.edu/benschafer99recommender.html
[6] M. de Gemmis, G. Semeraro, P. Lops, and P. Basile, “A Retrieval Model for Personalized Searching Relying on Content-based User Profiles.”
[7] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay, “Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search,” 2007.
[8] T. Joachims and F. Radlinski, “Search engines that learn from implicit feedback,” Computer, vol. 40, no. 8, pp. 34–40, 2007.
[9] T. Joachims, “Optimizing search engines using clickthrough data,” Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 133–142, 2002.
[10] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, “Effective personalization based on association rule discovery from web usage data.”
[11] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining web access logs using a fuzzy relational clustering algorithm based on a robust estimator,” Eighth International World Wide Web Conference, Toronto, Canada, 1999.
[12] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic personalization based on web usage mining,” Communications of the ACM, vol. 43, no. 8, pp. 142–151, 2000.
[13] O. Nasraoui, M. Soliman, E. Saka, A. Badia, and R. Germain, “A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites,” IEEE Transactions on Knowledge and Data Engineering, pp. 202–215, 2008.
[14] K. Hammouda and M. Kamel, Data mining in e-learning. Springer, 2006, pp. 1–28.
[15] Y. Zhao and G. Karypis, “Evaluation of hierarchical clustering algorithms for document datasets,” Proceedings of the eleventh international conference on Information and knowledge management, pp. 515–524, 2002.
[16] L. Zhuhadar and O. Nasraoui, “Semantic information retrieval for personalized e-learning,” Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on, vol. 1, pp. 364–368, Nov. 2008.
[17] L. Zhuhadar, O. Nasraoui, and R. Wyatt, “Dual representation of the semantic user profile for personalized web search in an evolving domain,” in Proceedings of the AAAI 2009 Spring Symposium on Social Semantic Web, Where Web 2.0 meets Web 3.0, 2009, pp. 84–89.

C. Interest-based Recommendations

We provide the learner with semantic term recommendations based on his/her visited concepts. We consider this type of recommendation rule-based, since the ontology represents concepts together with the relationships, properties, functions, and rules among these concepts. For each query q submitted by a learner, a semantic mapping between the query and the learner's semantic profile retrieves all the related concepts, sub-concepts, and cluster-based recommended terms. This framework allows the learner to navigate through the semantic structure of his/her query by clicking on one of the recommended terms, as shown in Figure 3. The effect of this action is to add the selected term to the query and repeat the search. The search is therefore personalized via query expansion using the selected recommended term. We call this type of recommendation Interest-based Recommendations. For more details on a learner's interests, refer to our previous work [17].

IV. EXPERIMENTAL EVALUATION

We used Top-n Recall and Top-n Precision to measure the effectiveness of re-ranking based on the learner's semantic profile. We selected 10 learner profiles, with the size of each profile varying from one learner to another. We then used our semantic search engine⁵ to evaluate each query, and computed the Top-n Precision and Top-n Recall for normal search and for personalized semantic search for each learner. Figure 4 shows the improvement in Top-n Recall and Top-n Precision of the personalized search over the normal search, for three query sizes (1, 2, and 3 keywords). The personalized semantic search shows an improvement in precision that varies between 5% and 25%. This improvement is noticeable between the top-30 and top-50 results for single-keyword and two-keyword queries. The recall results show a noticeable improvement between top-20 and top-40. Overall, these results show the effectiveness of re-ranking based on the learner's semantic profile.

V. CONCLUSION

In this paper, we presented a hybrid recommender system driven by multi-ontology models.

6 Open Courseware: http://www.collegedegree.com/library/collegelife/the_ultimate_guide_to_using_open_courseware

5 http://hypermanymedia.wku.edu
