Who do you want to be today? Web Personae for personalised information access

JP McGowan, Nicholas Kushmerick, Barry Smyth
Smart Media Institute, Department of Computer Science, University College Dublin
{jp.mcgowan,nick,barry.smyth}@ucd.ie

Abstract. Personalised context sensitivity is the Holy Grail of web information retrieval. As a first step towards this goal, we present the Web Personae personalised search and browsing system. We use well-known information retrieval techniques to develop and track user models. Web Personae differ from previous approaches in that we model users with multiple profiles, each corresponding to a distinct topic or domain. Such functionality is essential in heterogeneous environments such as the Web. We introduce Web Personae, describe an algorithm for learning such models from browsing data, and discuss applications and evaluation methods.

Introduction

Despite recent advances in Web information retrieval technologies (e.g. [1,3]), Web search services will find it increasingly difficult to return relevant and valuable results unless they deploy mechanisms for delivering personalised context-sensitive results. As a step towards this goal, we introduce Web Personae, a simple method for developing web user models, and describe several applications that use Web Personae to deliver personalised context-sensitive search results.

Web Personae are designed to address a long-standing issue in personalised information filtering: people often have multiple information needs, and attempting to model a user with a single monolithic profile can lead to poor retrieval accuracy.

Consider the following scenario. Michelle has a variety of interests: she is a medical doctor who enjoys golf, plays computer games, and regularly visits the theatre. Given a list of Michelle’s favourite web pages or other aspects of her browsing history that reflect these interests, a Web Personae system should automatically discover that Michelle can be modelled by distinct personae such as Golf, Games, Theatre and Medical. Furthermore, given these models and a sample of her current browsing behaviour, the Web Personae system should recognise which persona is currently active, and personalise her information access accordingly. For example, if Michelle browses pages containing words such as “green” and “tee”, then the Web Personae system should recognise that her active topic is Golf.

Web Personae enable a variety of personalised information access applications. First, as a user surfs, several kinds of adaptive hypertext applications could dynamically transform the HTML of web pages. The most obvious application is web page recommendation: based on recently visited pages, we can recommend either links on the current page, or new pages entirely.
For example, if a user is interested in Golf, then, when they visit a generic sports site, the application could highlight links specifically relevant to golf. Similarly, if the user then goes to a search engine, it would be possible to highlight the results returned for their query that more closely match their personal needs, based on their estimated web persona. A more sophisticated method would involve a full re-ranking of the results based on the current persona, so that relevant results are ranked higher. For example, the application could apply a modified version of the PageRank [1] or HITS [3] topology-based algorithms that weights pages according to their similarity to the current persona before calculating the topology-based ordering. Finally, another application is query expansion based on high-weight terms in the current persona. These applications demonstrate a variety of ways in which Web Personae can deliver customised context-sensitive information access.
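Two of the applications above, query expansion and result re-ranking, can be illustrated with a minimal sketch. It assumes that a persona is available as a dictionary mapping terms to weights; the function names, the toy `golf` persona and the sample results are hypothetical and chosen only for illustration. The re-ranking here orders results purely by content similarity to the persona; it is a simplification and not the modified PageRank or HITS scheme mentioned above.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_query(query, persona, k=2):
    """Append the k highest-weight persona terms not already in the query."""
    present = set(query.lower().split())
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(persona.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [term for term, _ in ranked if term not in present][:k]
    return query.split() + extra

def rerank(results, persona):
    """Order (url, text) results by similarity of their text to the persona."""
    def score(item):
        return cosine(Counter(item[1].lower().split()), persona)
    return sorted(results, key=score, reverse=True)

# Hypothetical persona and search results (assumptions, not data from the paper).
golf = {"golf": 3.0, "green": 2.0, "tee": 2.0, "club": 1.0}
results = [
    ("a", "night club listings and music"),
    ("b", "golf club reviews tee and green tips"),
]

print(expand_query("club", golf))             # ['club', 'golf', 'green']
print([url for url, _ in rerank(results, golf)])  # ['b', 'a']
```

With the ambiguous query "club", the Golf persona both adds disambiguating terms and moves the golf-related result ahead of the nightlife one.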

Web Personae Construction and Recognition

The architecture of the Web Personae system is depicted in Figure 1. The three main components of the system are the Constructor, which learns the Web Personae, the Recogniser, which estimates which persona is currently active, and the Application, which uses this to provide personalised context-sensitive hypertext adaptation.

[Figure 1 diagram. Labels: User Model (personae P1, P2, ..., Pn), Constructor/Maintainer, Recogniser, Application, Client, web, URL, Context, Pi.]

Fig. 1. System Architecture

Personae Construction

The Constructor component uses hierarchical clustering techniques over web page content. This content is initially provided in the form of a list of frequently visited pages, browsing history, or bookmarks. Given these URLs for a given user, the Constructor fetches the web pages, and performs feature selection by stop-word removal and stemming [2]. The resulting document term vectors are then clustered, using the standard TF-IDF cosine similarity metric. In order to automatically discover a user’s distinct personae, the clustering process is halted when the ratio of intra-cluster similarity to inter-cluster similarity has reached a maximum. The clusters learned by this process are assumed to represent the user’s several Web Personae. (Below we describe some experiments we intend to run to evaluate this assumption.)
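The construction step can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: documents arrive as token lists that have already had stop words removed and been stemmed, clusters are merged by group-average linkage, and intra- and inter-cluster similarity are taken as the mean pairwise cosine similarity within and across clusters. Rather than halting at the first peak, the sketch runs every merge and keeps the partition with the highest ratio. The names `tfidf_vectors`, `discover_personae` and the `eps` guard are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF dicts (tf * log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean(values):
    """Arithmetic mean, or None for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else None

def discover_personae(docs, eps=1e-9):
    """Agglomerate documents and return the partition (lists of document
    indices) with the highest intra/inter similarity ratio."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    best_ratio, best = -1.0, [list(c) for c in clusters]
    while len(clusters) > 1:
        # Group-average linkage: merge the pair of clusters whose members
        # are most similar on average (an assumption of this sketch).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean(sim[i][j]
                               for i in clusters[p[0]]
                               for j in clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        label = {i: k for k, c in enumerate(clusters) for i in c}
        pairs = list(combinations(range(n), 2))
        intra = mean(sim[i][j] for i, j in pairs if label[i] == label[j])
        inter = mean(sim[i][j] for i, j in pairs if label[i] != label[j])
        if intra is None or inter is None:
            continue  # a single cluster has no inter-cluster pairs
        ratio = intra / max(inter, eps)  # eps guards against division by zero
        if ratio > best_ratio:
            best_ratio, best = ratio, [sorted(c) for c in clusters]
    return sorted(best)

# Toy pre-processed pages (hypothetical): two golf, two medical, two theatre.
docs = [
    ["golf", "tee", "green", "putt"],
    ["golf", "green", "club", "tee"],
    ["patient", "clinic", "drug", "dose"],
    ["patient", "drug", "trial", "clinic"],
    ["theatre", "play", "stage", "actor"],
    ["theatre", "stage", "ticket", "play"],
]
print(discover_personae(docs))  # [[0, 1], [2, 3], [4, 5]]
```

On this toy input the three topics share no vocabulary, so the ratio peaks at three clusters; real browsing data would be noisier, and the choice of linkage and similarity averages would matter more.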

As well as this offline functionality, the Constructor has a greedy, incremental mode in which the personae are modified to track preference changes. However, due to noise and transient browsing behaviour we have found that this online mode poses several thorny user-interface challenges.

Personae Recognition

Once the personae have been identified in the offline stage, we must utilise these when the user is online. An essential design requirement is that the user should not have to explicitly indicate which persona is currently active; the system should be able to infer the current persona based on user actions. Furthermore, this inference must be made rapidly as the user surfs from page to page. The Recogniser component uses a simple and efficient similarity estimate. We convert the centroids of the personae clusters and the current page into term vectors that capture only the word frequencies, without taking IDF into account. We then select the Web Persona that has the largest cosine similarity with the current page. This gives us a quick persona recognition system, which has worked well in preliminary experiments.

Related Work

Document clustering has mainly been used in information retrieval for improving the effectiveness and efficiency of the retrieval process. We utilise automatic clustering to reveal different domains of user interest. The documents are represented in a vector space [4], and we then use hierarchical agglomerative techniques to produce a cluster tree. Web document clustering has been extensively researched in recent years [5,6,7,8]. Applications range from bookmark organising to recommendation systems.

Many systems have been developed which assist web browsing. Letizia [9] learns the interests of a user by observing their browsing behaviour and can then recommend links to follow; that is, it models the browsing process rather than explicitly modelling the user, as our system does.
WebWatcher [10] takes some user interests as an initial input, then updates these interests based on the pages the user visits. The system then recommends pages, based on these interests and the previous browsing behaviour of other users with similar interests.

Various systems have been developed which utilise user models for personalisation. WebMate [7] is an agent that assists browsing and searching. It represents different domains of user interest using multiple term vectors, which it updates incrementally when users give positive feedback for visited pages. However, it does not cluster the vectors to produce a 'persona' as in our system. WebACE [6] constructs a customised user profile by recording information about the documents the user browses. It then clusters these documents, using novel clustering techniques, and uses the clusters to generate queries to search for similar documents. Personal View Agent [11] tracks, learns and manages user interests. Beginning with a fixed palette of categories, the system follows the user, detecting their domains of interest. This ‘personal view’ takes the form of a tree and corresponds closely to our notion of web personae. This view can be updated, i.e. it can adapt to changing user interests using a 'personal view maintainer', which can split and merge categories in the personal view.

Applications and Evaluation

We have introduced the notion of Web Personae, discussed how they enable personalised context-sensitive information access, and described how they can be automatically learned and recognised from browsing behaviour using standard information retrieval techniques.

We are currently conducting an empirical evaluation of the learning and recognition components. Preliminary tests indicate good performance, and we are designing more sophisticated evaluations. One experiment involves looking at logs for servers that provide a local search facility. Using a user's accesses to the server obtained from the web logs, we can build their personae. We can then look at clickthrough data from local searches run by that user to estimate how effective the Web Personae system would have been at re-ranking these search results.

We are also building applications for the system, such as the page recommender service and web query expansion service discussed earlier. Our main application will be a personalised search service, based on both the simple highlighting of relevant results, and the more sophisticated system that does a full re-ranking based on link analysis techniques.

References

1. S. Brin & L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. 7th International World Wide Web Conference, 1998.
2. M. Porter. An algorithm for suffix stripping. Program, 1980.
3. D. Gibson, J. Kleinberg & P. Raghavan. Inferring Web communities from link topology. In Proc. ACM Conf. Hypertext & Hypermedia, 1998.
4. G. Salton & M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
5. D. Cutting, D. Karger, J. Pedersen & J. Tukey. Scatter/Gather: A Cluster-Based Approach to Browsing Large Document Collections. In Proc. SIGIR, 1992.
6. E. Han, et al. WebACE: A web agent for document categorization and exploration. In Proc. 2nd Int. Conf. on Autonomous Agents, 1998.
7. L. Chen & K. Sycara. WebMate: A personal agent for browsing and searching. In Proc. 2nd Int. Conf. on Autonomous Agents, 1998.
8. Y. Maarek & I. Ben Shaul. Automatically Organising Bookmarks per Contents. In Proc. WWW5, 1996.
9. H. Lieberman. Letizia: An agent that assists web browsing. In Proc. IJCAI-95, Montreal, Canada.
10. T. Joachims, D. Freitag & T. Mitchell. WebWatcher: A tour guide for the World Wide Web. In Proc. IJCAI-97, Nagoya, Japan.
11. Chien Chin Chen, Meng Cheng Chen & Yeali Sun. A Web Document Personalisation User Model and System. In Proc. User Modelling, 2001.