Improving the usability of an e-commerce web site through personalization

F. Abbattista, M. Degemmis, O. Licchelli, P. Lops, G. Semeraro and F. Zambetta
Dipartimento di Informatica – Università di Bari
Via E. Orabona, 4 - 70125 Bari - tel +39-080-5442140 fax +39-080-5443196
{fabio, degemmis, licchelli, lops, semeraro, zambetta}@di.uniba.it

Abstract. The rapid evolution of interactive Internet services has led both to a constantly increasing number of modern web sites and to an increase in their functionality, which in turn makes them more complicated to use. The COGITO project aims at developing innovative software components that allow e-commerce companies to effectively set up and maintain Web sites which address customers in personalized and pro-active ways. The COGITO solution is based on "intelligent personalized agents", which represent virtual assistants or advisors (also visually) by modeling their ability to support customers. In this paper we present the profile extractor, the personalization component based on machine learning techniques, which discovers the preferences, needs and interests of users who access an e-commerce web site. Exploiting personalization and the underlying 'one-to-one' marketing paradigm is of great importance for businesses that want to succeed in today's competitive markets.

1 Introduction
The COGITO project, EU-funded in the 5th Framework Programme, Key Action 2: New Methods of Work and Electronic Commerce (IST-1999-13347), aims at improving consumer-supplier relationships in future e-commerce through intelligent personalized agents [1,2,4] which can play the role of virtual assistant for users. To reach this objective, the usefulness of a value-added service must be combined with a high degree of usability. Furthermore, suitable measures have to be adopted to build up trust and confidence in inexperienced users. To meet these conditions, the interaction should be as natural as possible, so that users can rely on their communicative skills; it must convey precise and relevant information; and it must address the personal background of the individual user. A chatterbot (chat robot) is a software system capable of engaging in a written conversation with a user. Simple chatterbots only simulate conversation, without using any knowledge about the individual users and their actual behavior during online sessions. Such simple virtual agents are not powerful enough to serve as a medium for customer advice. To develop a system that offers useful "proactive" advice and reacts in a cooperative way, we used learning mechanisms for extracting dynamic features of a given user from the dialogue and storing the learned information in a user profile.

The paper proposes an "intelligent" retrieval process founded upon chatterbot technology, user modeling techniques, and an automatic query expansion mechanism which takes advantage of the knowledge stored in user profiles to refine the original query acquired by the chatterbot. In the COGITO project, we use this approach to enhance the search function of the BOL¹ web site, an on-line media shop specialized in books, CDs and gifts. A chatterbot was installed at the BOL German site and used experimentally in BOL's on-line media shop: based on its rule bases, the agent provides access to relevant product information or helps the user find appropriate offers. In this way, a considerably more precise search in the product database is achieved.

2 User Profiling
Personalization is very common in the area of e-commerce, where a user explicitly wants the site to store information about her, such as her preferences. In fact, the more a system knows about users, the more effectively it can serve them. But there are different styles, and even philosophies, for teaching computers about user habits, interests, patterns and preferences. User modeling simply means ascertaining a few bits of information about each user, processing that information quickly and providing the results to applications, all without intruding upon the user's consciousness. The final result is the construction of a user model or user profile [5]. By user profile we mean all the information collected about a user who logs on to a web site, in order to take into account her needs, wishes, and interests. A user profile, as intended within the COGITO project, is composed of two main frames: the frame of user data, which comprises interaction data (number of searches or purchases within a category, number of connections, etc.), and the frame of user interests, which is the part of the profile built by supervised learning algorithms. The user preferences automatically "learned" by the system concern the ten main book categories into which the BOL product database is subdivided. User profiles are represented as XML files and are the key to personal recommendations, because they enable the agent to tailor its book recommendations to the individual user. The main advantages of this approach in e-commerce are [10]:
- making the site more attractive for users: a web site that takes user preferences into account is able to suggest products reflecting customer needs. Such a site is more attractive and will probably turn a significant share of browsers into buyers;
- obtaining customer trust and confidence: users will not be asked to explicitly enter information about their preferences, tastes, etc., but they will be able to participate in managing and updating their personal profile. This will increase their trust and confidence in a system that automatically collects data about their preferences;
- improving customer loyalty: the effectiveness of a personalization system improves in the long run. Every time a customer interacts with the web site, the personalization mechanism collects new data about her preferences, so that an increasingly satisfactory service can be offered. In this case, switching to a competitor is often unattractive for a customer: even if the competitor uses a personalization system, it has to learn a great deal about the new customer before it can offer the same satisfactory service.

¹ BOL Medien, one of the partners of the COGITO consortium, is one of the European subsidiaries of Bertelsmann AG, Europe's largest media enterprise.
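The two-frame profile described above can be sketched as a small data structure with an XML serialization. This is a minimal sketch under assumptions, not the COGITO schema: the class name `UserProfile`, the element and attribute names (`profile`, `user_data`, `user_interests`, `category`, `degree`) and the numeric degree of interest are all hypothetical, since the paper states only that profiles are XML files made of a user-data frame and a learned user-interests frame.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical two-frame user profile (not the actual COGITO schema)."""
    user_id: str
    # Frame of user data: interaction counters collected from the dialogues.
    connections: int = 0
    searches: dict = field(default_factory=dict)   # category -> number of searches
    purchases: dict = field(default_factory=dict)  # category -> number of purchases
    # Frame of user interests: filled in by the learned classifiers.
    interests: dict = field(default_factory=dict)  # category -> degree of interest

    def to_xml(self) -> str:
        root = ET.Element("profile", {"user_id": self.user_id})
        data = ET.SubElement(root, "user_data")
        ET.SubElement(data, "connections").text = str(self.connections)
        for cat in sorted(set(self.searches) | set(self.purchases)):
            ET.SubElement(data, "category", {
                "name": cat,
                "searches": str(self.searches.get(cat, 0)),
                "purchases": str(self.purchases.get(cat, 0)),
            })
        frame = ET.SubElement(root, "user_interests")
        for cat, degree in sorted(self.interests.items()):
            ET.SubElement(frame, "category", {"name": cat, "degree": f"{degree:.2f}"})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "UserProfile":
        root = ET.fromstring(text)
        profile = cls(user_id=root.get("user_id"))
        data = root.find("user_data")
        profile.connections = int(data.findtext("connections"))
        for el in data.findall("category"):
            profile.searches[el.get("name")] = int(el.get("searches"))
            profile.purchases[el.get("name")] = int(el.get("purchases"))
        for el in root.find("user_interests").findall("category"):
            profile.interests[el.get("name")] = float(el.get("degree"))
        return profile
```

One reason to keep the frames separate in such a layout is that the interaction counters can be updated on every visit, while the interest frame only changes when the classifiers are re-applied.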

3 The Profile Extractor Module in the COGITO Architecture
This section describes the COGITO architecture (Figure 1) and identifies the role of the Profile Extractor module within the system. The general architecture is based on six macro modules: BOL Web-server, Connector, eBrain, Profile Extractor, Prompter and XML Content Manager (XML CM). The overall system architecture is centered around an existing chatterbot system, eBrain by Logica pdv, which is capable of engaging in a conversation with users. Integration with back-end systems is realized through the Connector, which enables the system to access external services and knowledge sources. The XML CM provides a suite of web services that support the creation, management and distribution of XML documents and of specific Document Type Definitions (DTDs) or Schemas. In the COGITO architecture, the Profile Extractor module is the personalization component that dynamically discovers user preferences from data collected in the course of past dialogues with the chatterbot. By examining the dialogue histories, it extracts characteristics that are useful for recognizing the categories preferred by a buyer. In particular, the module automatically assigns each customer a subset of product categories (book categories) in order to improve the system usability. Finally, the Prompter is an automatic query expansion system that accesses the user profiles. Using a rule interpreter, the Prompter exploits the structure of documents and, through comparisons with the profile terms, expands the original query acquired via the chatterbot.

Fig. 1. The architecture of the COGITO system.

4 The Profile Extractor Architecture and the Profile Generation Process
The Profile Extractor is a highly reusable module that classifies the users accessing a web site. It employs supervised learning techniques to induce a set of rules that are used by a classification module, the Profile Manager, in the actual profile generation step. The complete architecture of the module, which is further divided into four submodules, is shown in Figure 2. The Profile Manager and the Profile Rules Extractor are the modules mainly involved in the profile generation process; the Usage Patterns Extractor implements both a clustering algorithm and a technique for extracting association rules (unsupervised learning). It groups dialogue sessions in order to infer usage patterns that can be exploited for understanding trends useful for further market studies, and for grouping single users who share the same interests and preferences into user communities [7]. The core of the system is Weka [11], a machine learning tool written in Java, developed at the University of Waikato (New Zealand) and available for download on the Web. It is a suite of supervised and unsupervised learning algorithms. In its native implementation, Weka accepts data in ARFF format². During a session, user dialogues with the eBrain agent are stored in log files. The Dialogue Analyzer (Figure 1) receives the log files of past sessions and processes them to produce a structured dialogue history containing some transactional data (Figure 3). The input to the Profile Extractor is an XML file containing all the interactions and personal details of users. The file comes, through the Connector, from the Dialogue Analyzer module, which provides a representation format for dialogue histories. The XML I/O Wrapper extracts all relevant information from the file in order to prepare the examples for the learning components. The most important pieces of information used to build the examples are: user_id, date of last access, number of connections and, for each book category, number of searches, frequency of searches, number of purchases and frequency of purchases.
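The example-preparation step performed by the XML I/O Wrapper can be sketched as follows. This is an illustration under stated assumptions: the dialogue-history layout (a `history` element with `event` children), its attribute names, the feature names other than `search_freq_<category>` (which appears in Fig. 4) and the definition of a frequency as the share of a user's searches or purchases falling in a category are all hypothetical, because the paper does not give the format of Figure 3. The hypothetical `to_arff` helper writes one binary (yes/no) training file for a single book category in ARFF, the format Weka accepts natively.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Three of the ten BOL book categories, used here for illustration only.
CATEGORIES = ["Belletristik", "Nachschlagewerke", "Wissenschaft_und_Technik"]


def extract_example(xml_text, categories=CATEGORIES):
    """Turn one (hypothetical) structured dialogue history into a feature dict."""
    root = ET.fromstring(xml_text)
    searches, purchases = Counter(), Counter()
    for event in root.iter("event"):
        if event.get("type") == "search":
            searches[event.get("category")] += 1
        elif event.get("type") == "purchase":
            purchases[event.get("category")] += 1
    # Assumed definition: frequency = share of the user's searches/purchases.
    total_searches = sum(searches.values()) or 1
    total_purchases = sum(purchases.values()) or 1
    example = {
        "user_id": root.get("user_id"),
        "last_access": root.get("last_access"),
        "connections": int(root.get("connections")),
        "age": float(root.get("age")),
    }
    for cat in categories:
        example[f"search_num_{cat}"] = searches[cat]
        example[f"search_freq_{cat}"] = round(searches[cat] / total_searches, 4)
        example[f"purchase_num_{cat}"] = purchases[cat]
        example[f"purchase_freq_{cat}"] = round(purchases[cat] / total_purchases, 4)
    return example


def to_arff(examples, labels, relation="bol_users"):
    """Write expert-labelled examples for ONE book category as an ARFF string."""
    # Identifiers and dates are left out; only numeric features are learned from.
    features = [k for k in examples[0] if k not in ("user_id", "last_access")]
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in features]
    lines += ["@attribute class {yes,no}", "", "@data"]
    for example, label in zip(examples, labels):
        lines.append(",".join(str(example[name]) for name in features) + f",{label}")
    return "\n".join(lines)
```

Building one file per category mirrors the setup described in the paper, where a separate rule set is learned for each book category; the class labels themselves would come from the domain expert.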

Fig. 2. The architecture of the Profile Extractor.

² In the COGITO system we developed a new version of Weka that is able to represent input and output in XML format.

Fig. 3. An example of Structured Dialogue History file.

This information is arranged into a set of unclassified instances, where each instance represents a single customer. The subset of instances chosen to train the learning system has to be pre-classified by a domain expert. These instances are the actual input to the Profile Rules Extractor, which infers a classification rule set for each book category (Figure 4). The learning scheme is based on PART [3], a rule-based learner that induces classification rules from pruned partial decision trees built using C4.5 heuristics [8]. The actual user profile generation is performed by the Profile Manager on the basis of the user history and the set of rules induced by the Profile Rules Extractor. When user profiles need to be generated or updated (a decision taken by the domain expert), the related structured dialogue history is arranged into a set of instances, which is the input to the Profile Manager. On the basis of the inferred classification rule sets, the classifier assigns a "classification" to each instance. In other words, for each book category, the module predicts whether the user is interested in that category. All these classifications, together with the interaction details, are gathered to form a user profile (Figure 5). The column on the right contains the list of the 10 book categories and the degree of user interest. After the training phase, whenever a user accesses the BOL web site, her dialogue history file is generated or updated by the system. The file is then used to produce a new example, which the Profile Extractor classifies on the basis of the inferred rules. In this way the system can track the evolution of user behavior and, consequently, customer profiles are updated across multiple interactions.
In the experiments performed to test the Profile Rules Extractor, we considered the 10 book categories of the on-line book selling company BOL (which are used to organize the books offered to customers from German-speaking countries). For each of the 10 classes, the system was trained to infer the corresponding classification rules. In the experimental session we used a training set of 500 examples, each one representing a different user of the BOL web site.

If search_freq_Belletristik > 0.3 Then Class: yes
ElsIf search_freq_Belletristik > 0.11 And age > 38.0 Then Class: yes
ElsIf search_freq_Nachschlagewerke 0.21 Then Class: yes
Otherwise Class: no

Fig. 4. An example of classification rules for the class Belletristik.
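Because Fig. 4 is an ordered If/ElsIf/Otherwise list (a decision list), applying it amounts to returning the label of the first rule whose conditions all hold. The sketch below illustrates how a Profile Manager might apply such per-category rule sets to one instance; it is not the COGITO or Weka code. Only the first two rules of Fig. 4 are encoded, because the comparison operator of the third rule is not legible in the source, and the function names `classify` and `build_profile` are hypothetical.

```python
import operator

# Comparison operators a rule condition may use.
OPS = {">": operator.gt, "<=": operator.le}

# A rule is (list of (attribute, operator, threshold) conditions, class label).
# Rules 1 and 2 of Fig. 4; rule 3 is omitted because its operator is unreadable.
BELLETRISTIK_RULES = [
    ([("search_freq_Belletristik", ">", 0.3)], "yes"),
    ([("search_freq_Belletristik", ">", 0.11), ("age", ">", 38.0)], "yes"),
]


def classify(instance, rules, default="no"):
    """Return the label of the first rule whose conditions all hold (decision list)."""
    for conditions, label in rules:
        if all(OPS[op](instance[attr], threshold) for attr, op, threshold in conditions):
            return label
    return default  # the "Otherwise" branch


def build_profile(instance, rule_sets):
    """Hypothetical Profile Manager step: one yes/no prediction per book category."""
    return {
        "user_id": instance["user_id"],
        "interests": {cat: classify(instance, rules) for cat, rules in rule_sets.items()},
    }
```

A rule set learned by PART could be plugged in the same way, one list per book category. The sketch returns only yes/no labels and does not attempt to reproduce the degree of interest shown in Figure 5.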

Fig. 5. An example of user profile.

The training phase produced 75 rules in total for the 10 classes. An empirical evaluation of the added value that the profile extractor module provides when learning interaction models for the users of a digital library service is available in [9].

5 Exploiting the Profiles to Search Interesting Products
On the basis of the user profiles, the chatterbot offers better support to customers during the interaction, providing personal recommendations and purchase incentives and helping users in problematic situations during the search. This improves the usability of the BOL web site, as shown by the following three scenarios.
Scenario 1 - unknown user: A user is known by the COGITO system if she completes the BOL registration procedure. This step provides each customer with a personal identification number and is necessary both to recognize a user accessing the on-line media shop and to collect data about her preferences for generating or updating her profile. In the first scenario, a dialogue is undertaken between the chatterbot, named Susanna, and an unknown user asking for a book by the author "King".

Susanna: Good evening! Nice to see you found your way to BOL.DE. May I introduce myself? My name is Susanna and I am your personal assistant at Bertelsmann Online. What is your name?
User: My name is Fabienne.
Susanna: Nice to meet you, Fabienne. What can I do for you?
User: I'm looking for a book.
Susanna: Then this is the right place to be! We have hundreds of thousands of deliverable books. Are you looking for a book by a specific author?
User: Yes
Susanna: What is the name of the author?
User: King
Susanna: I shall check whether we have a book by the author King. Please wait …

Fig. 6. Susanna offers a long list of books belonging to several categories by authors whose last name is “King”.

Susanna finds several books by the author "King" through a remote call (deep linking) to the search engine available on the BOL web site and displays them, as shown in Figure 6. The books ranked first are by the author Stephen King. Books by other authors appear further down the list, so a user who is not looking for a book by Stephen King has to scroll down a long list. Such a customer can now either refine the search using an advanced search function or continue chatting with Susanna about different fields of interest.
Scenario 2 - registered user: In the second scenario, the user has already chatted with Susanna about some of her interests. Therefore, a profile of this user is available to the system, which can exploit it to accomplish a more precise search in the product database. Suppose that the profile of this user is the one depicted in Figure 5 and that the query submitted by the user is the same as in the previous scenario. The first book displayed on the search results page is a book about Windows 2000 written by Robert King (Figure 7). This result is due to the fact that the system has automatically expanded the original query "King" into "King" AND "Computer & Internet" (highlighted by the circle in Figure 7), since Computer & Internet is the category with the highest degree of interest in the user's profile (see Figure 5). The query expansion process is described in detail in the following section.
Scenario 3 - another registered user: Suppose that another registered user, whose interests differ from those of the previous user, undertakes a dialogue with Susanna. Suppose that she likes "Science & Technique" (Wissenschaft_und_Technik) and dislikes "Narrative" (Belletristik). The first book displayed in Figure 8 is "Allergy in ENT Practice" by the author Hueston C. King, and the query results show books in this category at the top. In this case, the system exploits the query expansion mechanism to modify the original query "King" into "King" AND "Naturwissenschaften", a subcategory of Wissenschaft_und_Technik.
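The profile-driven expansion in Scenario 2 can be sketched as follows: if the user has a profile, pick the category with the highest degree of interest and AND it with the original term. This is a simplified illustration under assumptions: the function name, the numeric degrees of interest and the `min_degree` threshold are hypothetical, and the sketch does not model the mapping from a category to a subcategory seen in Scenario 3 (Naturwissenschaften), which the paper does not describe.

```python
def expand_query(term, interests=None, min_degree=0.5):
    """Hypothetical profile-based expansion of a single search term."""
    query = f'"{term}"'
    if not interests:  # Scenario 1: unknown user, no profile to exploit
        return query
    # Category with the highest degree of interest in the profile.
    category, degree = max(interests.items(), key=lambda item: item[1])
    if degree < min_degree:  # no category is clearly preferred (assumed threshold)
        return query
    return f'{query} AND "{category}"'


print(expand_query("King", {"Computer & Internet": 0.9, "Belletristik": 0.2}))
# "King" AND "Computer & Internet"
```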

Fig. 7. List of books by authors whose last name is “King” belonging to the book category “Computer & Internet”.

These scenarios highlight the dependence of the result set on the profile of the user that issued the query.

6 The Query Expansion Process
When a user asks the chatterbot for a book by the author "King", eBrain dynamically builds an XML file containing the value "King" for the proper tag (in this case, …) and sends it to the Prompter, which expands the original query using the favorite book categories stored in the user's profile. Query expansion improves the criteria used to specify a query, usually by adding search terms to an already defined query. The additional terms may be taken from different sources, as shown in Figure 9.
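The message exchange between eBrain and the Prompter can be illustrated with Python's standard-library XML API. The tag names (`query`, `author`, `category`) and the function names are hypothetical: the paper does not name the tag that carries the value "King", and the real message format may differ. The sketch shows only the idea of query expansion as adding search terms to an already defined query.

```python
import xml.etree.ElementTree as ET


def build_request(field_name, value):
    """eBrain side (hypothetical): wrap one search criterion in an XML message."""
    root = ET.Element("query")
    ET.SubElement(root, field_name).text = value
    return ET.tostring(root, encoding="unicode")


def expand_request(xml_text, favorite_categories):
    """Prompter side (hypothetical): append the user's favorite categories as extra terms."""
    root = ET.fromstring(xml_text)
    for category in favorite_categories:
        ET.SubElement(root, "category").text = category
    return ET.tostring(root, encoding="unicode")


def search_terms(xml_text):
    """List the (tag, value) pairs that a search back end would combine with AND."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]


request = build_request("author", "King")
expanded = expand_request(request, ["Computer & Internet"])
print(search_terms(expanded))  # [('author', 'King'), ('category', 'Computer & Internet')]
```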

Fig. 8. List of books by authors whose last name is “King” belonging to the book category “Naturwissenschaften”.

Fig. 9. The query expansion process.

Once the input file has been parsed, the Prompter determines a suitable query expansion method according to the information available in the input file. The decision process is described in [6]. The information sources invoked for expanding a query are:
1) The Product Thesaurus. Products like books or other kinds of media are usually characterized by a textual description. The most relevant words contained in these descriptions are clustered according to their relation to the most frequently appearing ones, thus generating a "thesaurus" of terms.
2) The User Profiles. They are accessed to identify the book categories preferred by a user, which can be included in a query for a more specific result identification, as described above.
3) The Usage Patterns. Applying association rules to a specific user can reveal a possible interest of this user in a product or service. In this way, the chatterbot can decide which dialogue context to use when the dialogue comes to a dead end, i.e. when the user wants neither to take the initiative nor to mention a specific topic of discourse.
The information retrieved through the interface to the information sources (i.e. the expanded keywords, the preferred book category or the dialogue context resulting from an applied usage pattern) is used to generate a deep link, which the Prompter forwards directly to the chatterbot.
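A possible shape for the Prompter's source selection is sketched below. The actual decision process is described in [6] and is not reproduced here: the ordering used (association rules only at a dialogue dead end, then the user profile, then the thesaurus), the data layouts and the base URL `https://shop.example/search` are all hypothetical stand-ins chosen for illustration. The result is either a deep link built with `urllib.parse.urlencode` or a dialogue context for the chatterbot.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://shop.example/search"  # hypothetical deep-link target


def prompter(query, profile=None, thesaurus=None, usage_rules=(), history=()):
    """Hypothetical source selection for query expansion.

    profile:     category -> degree of interest (user profile)
    thesaurus:   term -> related terms (product thesaurus)
    usage_rules: (antecedent categories, dialogue context) pairs (usage patterns)
    history:     categories the user has already shown interest in
    """
    if query is None:
        # Dead end: fire the first association rule whose antecedent the user satisfies.
        for antecedent, context in usage_rules:
            if set(antecedent) <= set(history):
                return {"dialogue_context": context}
        return {}
    params = {"q": query}
    if profile:
        # Known user: add the preferred book category.
        params["category"] = max(profile, key=profile.get)
    elif thesaurus and query in thesaurus:
        # Unknown user: fall back to related terms from the thesaurus.
        params["q"] = " ".join([query, *thesaurus[query]])
    return {"deep_link": f"{SEARCH_URL}?{urlencode(params)}"}
```

Returning a plain dictionary keeps the sketch independent of how the chatterbot actually consumes the Prompter's answer, which the paper does not specify.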

7 Conclusions
A key issue when developing personalization applications is constructing accurate and comprehensive customer profiles based on the collected data. User models are a prerequisite for personalized interaction between computer systems and their users. The paper has shown that user modeling is crucial for providing intelligent personalized user support in modern e-commerce applications. We presented the profile extractor module, the personalization component based on machine learning techniques used in the COGITO project to build user profiles. Moreover, we examined how a chatterbot using this module can be integrated into the web site of an online book selling company to improve the intelligent retrieval process. We conclude that our approach improves the usability of the BOL web site and of similar ones.

Acknowledgements
The COGITO project is funded by the European Commission under contract IST-1999-13347. The consortium is led by FhG-IPSI and comprises Logica Pdv Unternehmensberatung GmbH, Hamburg; BOL Medien GmbH, Rheda-Wiedenbrück; Risø National Laboratory (System Analysis Department), Roskilde; University of Bari, LACAM Laboratory; and Sword ICT S.r.l., Bari.

References
1. André, E., Rist, T., Müller, J.: Integrating reactive and scripted behaviors in a life-like presentation agent. In: Proceedings of the Second International Conference on Autonomous Agents (Agents '98), ACM Press, New York (1998) 261-268
2. Ball, G., Ling, D., Kurlander, D., Miller, J., Pugh, D., Skelly, T., Stankosky, A., Thiel, D., Dantzich, M. van, Wax, T.: Lifelike computer characters: the Persona project at Microsoft Research. In: Bradshaw, J.M. (ed.): Software Agents, AAAI/MIT Press, Menlo Park (1997) 191-222
3. Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In: Proceedings of International Conference on Machine Learning, Morgan Kaufmann Publishers, Menlo Park (1998) 144-151
4. Hayes-Roth, B., Johnson, V., Gent, R. van, Wescourt, K.: Staffing the Web with interactive characters. Communications of the ACM 43 (1999) 103-105
5. Kobsa, A.: User Modeling: Recent Work, Prospects and Hazards. In: Schneider-Hufschmidt, M., Kuehme, T., Malinowski, U. (eds.): Adaptive User Interfaces: Principles and Practice, North-Holland, Amsterdam (1993) 111-128
6. L'Abbate, M., Thiel, U.: Intelligent Product Information Search in E-Commerce: Retrieval Strategies for Virtual Shop Assistants. In: Stanford-Smith, B., Chiozza, E. (eds.): Proceedings of E-work and E-Commerce Conference 2001, IOS Press, Amsterdam (2001) 347-353
7. Paliouras, G., Papatheodorou, C., Karakaletsis, V., Spyropoulos, C., Malaveta, V.: Learning User Communities for Improving the Service of Information Providers. In: Lecture Notes in Computer Science, Vol. 1513. Springer-Verlag, Berlin Heidelberg New York (1998) 367-384
8. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
9. Semeraro, G., Ferilli, S., Fanizzi, N., Abbattista, F.: Learning Interaction Models in a Digital Library Service. In: Bauer, M., Gmytrasiewicz, P.J., Vassileva, J. (eds.): UM2001: User Modelling - Proc. of the 8th Int. Conf., Lecture Notes in Artificial Intelligence, Vol. 2109. Springer, Berlin Heidelberg New York (2001) 44-53
10. Tasso, C., Omero, P.: Personalization of web content: e-commerce, i-access, e-government. Franco Angeli Ed., Milano (2002) (in Italian)
11. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, Menlo Park (2000)