Data Mining and Personalization Technologies - IEEE Xplore

Data Mining and Personalization Technologies

Philip S. Yu
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
[email protected]

Abstract

Data mining has become increasingly popular and is widely used in various application areas. In this paper, we examine new developments in data mining and its application to personalization in E-commerce. Personalization is what merchants and publishers do to tailor the Web site, advertisement, and product promotion to a customer based on his past behavior and inference from other like-minded people. E-commerce offers the opportunity to deploy this type of one-to-one marketing instead of the traditional mass marketing. The technology challenges to support personalization will be discussed. These include the need to perform clustering and searching in very high dimensional data spaces with huge amounts of data. We will examine some of the new data mining technologies developed that can support personalization.

1. Introduction

As the World Wide Web becomes increasingly important as an information source and a place to conduct commerce, Web surfers face the daunting challenge of how to sift through a morass of information to get to what they need. Thus there is a need for a personalized recommendation system which can provide one-to-one guidance to the user [32]. This is especially important for E-commerce. In an E-commerce site, Web pages often contain some promotion of the company's products and external advertisements from other companies. The objective of personalization is to tailor the promotion and advertisement to match each viewer's interests. Using dynamically constructed Web pages, the promotion items can be selected in real time (from a promotion list) based on what the person has liked or purchased in the past and his current interests, which may be reflected by the Web pages browsed during the current session. Similarly, the advertisement can be selected based on the viewer's demographic information and past behavior, e.g., clicking history on the different types of advertisement.

There are two major approaches to providing personalized recommendation. The content based approach recommends items that are similar to what the user has liked in the past [25]. The collaborative filtering approach identifies other users that have shown preferences similar to the given user's and recommends what they have liked [35]. A hybrid of the two approaches has also been explored [9].

Data mining, which is also referred to as knowledge discovery in databases, has recently become an important area of research. It focuses on techniques for the nontrivial extraction of implicit, previously unknown, and potentially useful information (such as rules and constraints) from very large amounts of data. The common data mining problems include finding clusterings, associations, and classifications [13].

In this paper, we will examine the major challenges of providing personalized recommendation. We will explore how some of the data mining techniques can be applied to personalization. In Section 2, we review the different approaches to personalization. The various data mining techniques and their applications to personalization are considered in Section 3. Section 4 provides a summary.

2. Personalization

2.1. Alternative Approaches

Collaborative filtering is the more popular technique for providing personalized recommendation [19, 35, 31]. It makes recommendations by observing the behavior of a like-minded peer group. Ringo is a pioneering prototype based on this concept for music recommendation [35]. Under Ringo, each user is requested to provide ratings for a given set of music titles and, based

on the ratings, the peer group of each user is then identified. Besides the obvious issue that the approach needs a large number of users participating in the rating effort before the recommendations start to make sense, the more serious weakness of this approach is that rating schemes can only naturally be applied to a homogeneous type of simple products, such as books, CDs, videos, or computer games. In that environment, the more items two persons have rated in common, the more similar in preference they can be considered. Only the count of the number of similarly (or dissimilarly) rated items matters, i.e., we do not need to be concerned about the identities of the items.

The rating technique used in collaborative filtering does not seem to be directly applicable to a heterogeneous product environment. For example, a computer store can sell PCs, printers, adapters, disk drives, memory chips, game software, etc. These items have vastly different costs and characteristics. Since the implication of people preferring the same disk drive model is likely to be quite different from that of people preferring the same PC model, to determine how close in preference two persons are, we need to know not only how many items they have rated in common, but also the identities of these items. Furthermore, for a complex product, such as a computer or automobile, one can like many aspects of the product item but dislike some other aspects of it. So it may be hard to use a single number to convey a person's taste, as people can give the same rating to an item for vastly different reasons. For example, one person can give a low rating due to his dislike of the color of an item, while another person may like the color but is not satisfied with its functionality.

Content based filtering analyzes the content of the items that a person has liked in the past and recommends items with similar content [25, 24].
So in general the items it can recommend are far more limited than with the collaborative filtering approach, which can get clues for recommendation from other like-minded people. That is to say, the content based approach tends to be more narrowly focused and recommends only items that score highly against the user profile or past behavior. Furthermore, it cannot distinguish items based on some assessment of quality or style. Another problem with content based filtering is how to analyze the content of each item. For certain items, the content dimensions may be easily identifiable. For example, for video, the potential dimensions may include action, drama, humor, sex, violence, suspense, and offbeat [30]. Each video can be rated across these dimensions. The resulting rating vector will serve as

a content descriptor or state vector of the video. For other products, such as a video camera or computer, the content dimensions may be less obvious. Even if the dimensions are readily identifiable, as in the video case, there is still a considerable amount of effort needed to rate each item across these dimensions. The exception is text documents, where content analysis may be automated. In the information retrieval community, the content based approach has been employed to match user profiles with text documents. Various techniques have been developed to extract features (or keywords) from a document using different weighting schemes [15, 20].

Finally, there is the hybrid approach, which combines the collaborative filtering approach with the content based approach. Here the peer group is formed based on the contents of the items that a person has liked in the past. The preference of a person is described by the aggregation of the content of each item which he has liked. One example is the Fab [9] system providing Web page recommendations. Another example exploring the hybrid approach is the Intelligent Recommendation Analyzer (IRA), a prototype we developed at IBM T.J. Watson Research Center. IRA is a personalization engine for E-commerce. It allows for multiple recommendation engines for product promotion. Each of these recommendation engines uses a different method to address the needs of a different type of product/store requirements, e.g., homogeneous vs. heterogeneous product mix, simple vs. complex product, etc. IRA also includes target mailing and advertisement engines. In addition to the personalization aspect, the advertisement engine can take into account different revenue objectives, such as number of exposures vs. clicks or a mixture of the two [6]. The target mailing engine identifies the customer list to send promotion e-mail to, based on the specific product or product category to be promoted.

2.2. Issues

The major challenges for a personalization system are its efficiency, scalability, and the quality of its recommendations. Efficiency, the amount of processing required to make a recommendation, is important, as the recommendation generally has to be done in real time. As an E-commerce site can have thousands of products and customers, the personalization system has to be scalable to the database size. For collaborative filtering, an important issue is how to efficiently determine the peer group of a person. This peer group may need to be determined dynamically in real time. This is due to the fact that a user

can exhibit distinct behaviors at different times; e.g., when shopping for a friend, the buying behavior will differ from the normal buying pattern. If a person usually only buys technical books, but is now shopping for fiction for a friend, a peer group which is interested in fiction should be used to make the recommendation on the appropriate book. If the system has to compare the interests or profile of a user with the profiles of all the other users in order to determine his peer group, it will not be scalable.

For any personalization approach, the quality of the recommendation can be strongly affected by the way we determine the closeness between persons and/or items. Specifically, an issue here is the similarity function or measure used to determine whether two persons have close enough interests to be put into the same peer group for the collaborative filtering approach, or whether two items have close enough characteristics to be put into the same category for content based recommendation. Even in the simple case with a binary rating, i.e., like and dislike, or purchased and not purchased, there are many alternative similarity measures for comparing the closeness of the tastes of two persons. We can use the number of matches between the lists of items liked by each person, the hamming distance (i.e., the number of mismatches), or some function of the matches and hamming distance. For the more complex case, where the rating can be over a wider range of values, one needs to consider rating bias and the range effect. Rating bias is the problem that some people tend to rate things at the extreme ends of a given scale, while others tend to rate around the middle or neutral point. In [35], several different similarity measures are compared, including the mean squared difference and standard and constrained Pearson r correlation coefficients. The range effect occurs when the rating range is open or wide (e.g., when the rating represents the number of purchases of an item or the number of occurrences of a keyword).
In that case, the peer group formation will be dominated by the highest rated item if one simply uses the sum of the rating differences on each item to select a peer group. Instead of directly using the magnitude of the rating, other similarity measures being considered include the cosine measure [20] or the square root measure.
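To make these alternatives concrete, the following sketch computes several of the similarity measures mentioned above. The code and its function names are our own illustration, not part of any system described in this paper; the first two functions assume binary (purchased/not purchased) vectors.

```python
import math

def matches(a, b):
    """Number of items both users have rated positively (binary vectors)."""
    return sum(1 for x, y in zip(a, b) if x == y == 1)

def hamming(a, b):
    """Hamming distance: number of positions where the ratings disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

def cosine(a, b):
    """Cosine measure: dampens the range effect of large rating magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pearson(a, b):
    """Standard Pearson r correlation: compensates for per-user rating bias
    by subtracting each user's mean rating."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

# Binary purchase vectors over five items:
u, v = [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]
print(matches(u, v), hamming(u, v))  # 2 matches, 2 mismatches
```

Note that two pairs of users can have the same match count but very different cosine or Pearson values once ratings span a wide range, which is exactly the range-effect and rating-bias issue discussed above.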

3. Data Mining

Various data mining techniques can be used to improve recommendation systems. Here we will consider clustering, association, classification, and similarity indexing. Clustering and similarity indexing techniques can be applied as a means to identify or form peer groups or content categories. Association rule mining can be used to identify products which are often bought together, for cross-selling. It can also be generalized to identify customer profiles for target promotion [4]. Classification is another means for target promotion or categorization.

3.1. Clustering

The clustering problem has been discussed extensively in the database literature as a tool for similarity search, customer segmentation, pattern recognition, trend analysis, and classification. The method has been studied in considerable detail by both the statistics and database communities [11, 16, 17, 18, 27, 39]. Detailed studies on clustering methods may be found in [21].

3.1.1 Issues

The problem of clustering data points can be defined as follows: Given a set of points in multidimensional space, find a partition of the points into clusters so that the points within each cluster are close to one another. (There may also be a group of outlier points.) Some algorithms assume that the number of clusters is prespecified as a user parameter.

We now consider how to map the peer group formation problem into a clustering problem. Each product item represents a dimension of the state space. Each customer can be represented by a state vector, where the value on each dimension indicates the person's interest in the corresponding item. So identifying a group with similar interests is similar to finding a cluster where the points are close together. The personalization system can pre-generate the clusters and select the appropriate peer group dynamically based on the user behavior.

However, there is a dimensionality issue, as the number of items can be very large; e.g., in a commerce application, the number of different products can often be more than a couple of thousand. Most clustering algorithms do not work efficiently in higher dimensional spaces because of the inherent sparsity of the data [22]. In high dimensional applications, it is likely that at least a few dimensions exist for which a given pair of points are far apart from one another. So a clustering algorithm is often preceded by feature selection (see, for example, [23]). The goal is to find the particular dimensions for which the points in the data are correlated. Pruning away irrelevant dimensions reduces the noise in the data. The problem with using traditional feature selection algorithms is that picking certain dimensions in advance can lead to a loss of information. Furthermore, in many real examples, some points are correlated with respect to a given set of dimensions and others are correlated with respect to different dimensions. Thus it may not always be feasible to prune off too many dimensions without at the same time incurring a substantial loss of information.

We demonstrate this with the help of an example. In Figure 1, we have illustrated two different projected cross sections for a set of points in 3-dimensional space. There are two patterns to the data. The first pattern corresponds to a set of points in the x-y plane, which are close in distance to one another. The second pattern corresponds to a set of points in the x-z plane, which are also close in distance. We would like to have some way of discovering such patterns. Feature preselection is not a viable option here, since each dimension is relevant to at least one of the clusters.

[Figure 1. Difficulties Associated with Feature Preselection: two projected cross sections, on the X-Y axis and on the X-Z axis, of a set of points in 3-dimensional space, each showing a different cluster.]

3.1.2 Projected Clustering

In this context, we shall now define what we call a projected cluster [1]. Consider a set of data points in some (possibly large) dimensional space. A projected cluster is a subset D of dimensions together with a subset C of data points such that the points in C are closely clustered in the projected subspace of dimensions D. In Figure 1, two clusters exist in two different projected subspaces. Cluster 1 exists in the projected x-y space, while cluster 2 exists in the projected x-z space. We assume that the number k of clusters to be found is an input parameter. The output of the algorithm will be twofold:

- a (k + 1)-way partition {C1, ..., Ck, O} of the data, such that the points in each partition element except the last form a cluster. (The points in the last partition element are the outliers, which by definition do not cluster well.)

- a possibly different subset Di of dimensions for each cluster Ci, 1 <= i <= k, such that the points in the i-th cluster are correlated with respect to these dimensions. (The dimensions for the outlier set O can be assumed to be empty.) The cardinality of each set Di for the different clusters can be different.

Techniques for performing projected clustering have been studied in [1] to address clustering in high dimensional space. One advantage of this technique is that since its output is twofold, reporting both the points and the dimensions, it gives the user a very good idea of both the identity and the nature of the similarity of the different points in each cluster. For example, in the peer group formation problem for collaborative filtering discussed before, the state space can contain thousands of dimensions to represent the different product items. It is very unlikely that similarity can be found for each and every dimension in the high dimensional space. However, it is possible to segment the data points into groups, such that each group is defined by its similarity based on a specific set of dimensions. Clearly the set of representative dimensions for each cluster is useful information, since it contains the set of product items which are of common interest to all people in the cluster and may directly be used for analyzing the behavior of that cluster.
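The central operation behind this idea, measuring closeness only within a cluster's own dimension subset Di, can be sketched as follows. This is our own illustrative code, not the actual algorithm of [1]; the normalization choice is an assumption we make so that clusters with few dimensions remain comparable.

```python
import math

def projected_distance(p, q, dims):
    """Euclidean distance restricted to a cluster's dimension subset."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

def assign(point, centroids, cluster_dims):
    """Assign a point to the projected cluster whose centroid is nearest
    in that cluster's own subspace (one step of a projected-clustering
    style iteration)."""
    best, best_d = None, float("inf")
    for c, (centroid, dims) in enumerate(zip(centroids, cluster_dims)):
        # Normalize by sqrt(|dims|) so subspaces of different sizes compare.
        d = projected_distance(point, centroid, dims) / math.sqrt(len(dims))
        if d < best_d:
            best, best_d = c, d
    return best

# Two clusters: one tight in dimensions (0, 1), the other in (0, 2),
# mirroring the x-y and x-z patterns of Figure 1.
centroids = [[1.0, 1.0, 0.0], [1.0, 0.0, 5.0]]
cluster_dims = [(0, 1), (0, 2)]
print(assign([1.1, 0.9, 9.9], centroids, cluster_dims))  # 0: close in x-y
```

A full-space distance would be dominated by the irrelevant third coordinate here, which is precisely why the per-cluster dimension subsets matter.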

3.2. Similarity Indexing

One approach to quickly identifying people with similar tastes (respectively, items with similar characteristics) is to provide an efficient index structure by mapping each person (respectively, item) to a state vector based on his rating or purchase history (respectively, its attribute values). It has been observed in [34] that index structures are often sensitive to the nature of the similarity measure/function used. Two simple similarity functions between two state vectors are the number of matches and the hamming distance. The inverted index is a data structure which can support queries based on the number of matches between the target state vector and the database. It relies on the sparsity of the data in order to provide efficient responses to similarity queries [33] and is widely used for information retrieval applications. Unfortunately, the inverted index cannot efficiently resolve queries in which the similarity criterion can be any general function of the number of matches and the hamming distance between the target state vector and the database. It also cannot easily support queries which are variations on the traditional similarity search problem (e.g., range

queries or multi-target variations of similarity queries). Furthermore, the degradation in performance of the inverted index with increasing density of the state space can be very steep. In [2], a similarity indexing method is proposed to construct a single index that simultaneously supports a variety of queries based on different similarity functions. It considers the case when the state vector is binary. For example, the state vector can represent the product items a person has purchased. More specifically, each product item represents a dimension in the state space, and for each item purchased by the person in the past, the corresponding dimension is set to 1; otherwise it is zero. In another example, the state vector can represent the keywords of interest to a person, where each dimension denotes a potential keyword. The flexible index structure is able to perform similarity search when the similarity function between two state vectors is a general function of the number of matches and the hamming distance. The index structure, referred to as the signature table, is independent of the similarity function, which can be selected at query time. A signature is a set of items (or attributes). The set of items in the original data is partitioned into a set of signatures. Assuming that it is partitioned into K sets of signatures, the supercoordinate of a state vector will exist in K-dimensional space such that each dimension of the supercoordinate has a unique correspondence with a particular signature. Each dimension in this K-dimensional supercoordinate is a 0-1 value which indicates whether or not the corresponding signature is activated by the state vector. The signature table consists of a set of 2^K entries. One entry in the signature table corresponds to each possible supercoordinate. Thus, the entries in the signature table create a partition of the state space. This partition will be used for the purpose of similarity search based on some branch and bound technique.
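The supercoordinate mapping itself is simple to sketch. The code below is our own illustration, not the actual structure of [2]; in particular, we assume "activated" means the state vector contains at least one item of the signature.

```python
from collections import defaultdict

def supercoordinate(vector, signatures):
    """K-bit key: bit k is 1 iff the binary state vector activates
    (contains at least one item of) signature k."""
    return tuple(int(any(vector[i] for i in sig)) for sig in signatures)

def build_signature_table(vectors, signatures):
    """Partition the database: one bucket per occupied supercoordinate.
    A query then prunes whole buckets using a bound on the chosen
    similarity function, instead of scanning every vector."""
    table = defaultdict(list)
    for v in vectors:
        table[supercoordinate(v, signatures)].append(v)
    return table

# Six items partitioned into K = 3 signatures of two items each.
signatures = [(0, 1), (2, 3), (4, 5)]
db = [[1, 0, 0, 0, 1, 0], [0, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 1]]
table = build_signature_table(db, signatures)
target = [1, 0, 0, 0, 0, 0]
print(supercoordinate(target, signatures))  # (1, 0, 0)
```

Because the bucket key depends only on the partition of items and not on any particular similarity function, the same table can serve match-count, hamming, or mixed queries, which is the property emphasized above.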
The method is shown in [2] to provide very good pruning and accuracy, and to scale well with database size and memory availability.

3.3. Association Rules

The association rule problem was originally proposed in the context of supermarket data to study the relationships among the buying patterns of customers in transaction data [7], i.e., to find how the items bought in a consumer basket relate to each other.

3.3.1 Problem Formulation

Let I = {i1, i2, ..., im} be a set of binary literals called items. Each transaction T is a set of items, such that T ⊆ I. This corresponds to the set of items which a consumer may buy in a basket transaction. An association rule is a condition of the form X ⇒ Y, where X ⊆ I and Y ⊆ I are two sets of items. The idea of an association rule is to develop a systematic method by which a user can figure out how to infer the presence of some sets of items, given the presence of other items in a transaction. Such information is useful in making decisions such as customer targeting, shelving, and sales promotions. The support of a rule X ⇒ Y is the fraction of transactions which contain both X and Y. The confidence of a rule X ⇒ Y is the fraction of transactions containing X which also contain Y. Thus, if we say that a rule has 90% confidence, it means that 90% of the tuples containing X also contain Y.

The process of mining association rules is a two-phase technique in which all large itemsets are determined, and then these large itemsets are used in order to find the rules [7]. The large itemset approach is as follows. Generate all combinations of items that have fractional transaction support above a certain user-defined threshold called minsupport. We call all such combinations large itemsets. Given a large itemset S = {i1, i2, ..., ik}, we can use it to generate at most k rules of the type [S - {ir}] ⇒ {ir} for each r ∈ {1, ..., k}. Once these rules have been generated, only those rules above a certain user-defined threshold called minconfidence may be retained. In order to generate the large itemsets, an iterative approach is used to first generate the set of large 1-itemsets L1, then the set of large 2-itemsets L2, and so on, until for some value of r the set Lr is empty [8].
3.3.2 Profile Association Rules

A number of interesting extensions [37, 38, 4] and improvements on rule quality [5, 12] and mining efficiency [8, 10, 28, 3] have been proposed. The problem of mining profile association rules has been proposed in [4], which associates customer profile information with behavior information. The left hand side of the rules consists of quantitative attributes corresponding to customer profile information. Examples of such attributes are age and salary. The right hand side of the rules consists of binary or categorical attributes corresponding to customer behavior information. Examples of such attributes are buying beer or diapers. The objective of mining profile association rules is to identify customer profiles for target marketing.

Let C correspond to a customer profile and b1, ..., bk correspond to behavior attributes with values v1, ..., vk. The rule C ⇒ b1 = v1 | b2 = v2 | ... | bk = vk is true at a given level of support and confidence if and only if all of the rules C ⇒ bi = vi for all i ∈ {1, 2, ..., k} are true at that level of confidence and support. Thus, C represents the common profile of people who exhibit all of the behaviors bi = vi for all i ∈ {1, 2, ..., k}. Note that if the set of rules Ci ⇒ bi = vi holds for all i ∈ {1, 2, ..., k}, then the rule ∪(i=1..k) Ci ⇒ b1 = v1 | b2 = v2 | ... | bk = vk may not necessarily hold. Thus, a straightforward postprocessing after finding rules for individual behavior types does not solve the above problem. In [4], a multidimensional indexing technique is developed to generate the profile association rules in an online fashion. An additional advantage of using an indexing structure is that it allows the flexibility to specify specific profile ranges and behavior attributes to find the profile rules.

The profile association rule problem is closely related to the mining of quantitative association rules in relational tables proposed in [38]. In such cases association rules are discovered in relational tables which have both categorical and quantitative attributes. Thus, it is possible to find rules which indicate how a given range of quantitative and categorical attributes may affect the values of other attributes in the data. However, the mining algorithms are very different. The algorithm for the quantitative association rule problem discretizes the quantitative data into disjoint ranges and then constructs an item corresponding to each such range. Once these pseudo-items have been constructed, a large itemset procedure can be applied in order to find the association rules. Often a large number of rules may be produced by such partitioning methods, many of which may not be interesting. By using the multidimensional index approach, the profile association rule algorithm avoids these problems.
3.4. Classification

The problem of classification has been studied extensively by the database and Artificial Intelligence communities. The problem of classification is defined as follows: The input data is referred to as the training set, which contains a plurality of records, each of which contains multiple attributes or features. Each example in the training set is tagged with a class label. The class label may be either categorical or quantitative. The problem of classification in the context of a quantitative class label is referred to as the regression modeling problem. The training set is used in order to build a model of the classification attribute based upon the

other attributes. This model is used in order to predict the value of the class label for the test set. The classification problem has several direct applications in target marketing and E-commerce. A prototypical application of classification is that of mass mailing for marketing. For example, credit card companies often mail solicitations to consumers. Naturally, they would like to target those consumers who are most likely to respond to a mailer. Often demographic information is available for those people who have responded before to such solicitations, and this information may be used in order to target the most likely respondents.

The decision tree is one commonly used approach [29]. The idea in decision trees is to recursively partition the data set until each partition consists mostly of examples from a particular class [26, 36]. Each non-leaf node of the tree contains a split point which uses some condition to decide how the data set should be partitioned. For each example in the training data, the split point is used in order to find how to best partition the data. The performance of the decision tree depends critically upon how the split point is chosen. The condition which describes the split point is described as a predicate. Predicates with high inference power are desirable for building decision trees with better accuracy. Most decision trees in the literature are based on single attribute predicates. Recent work by Chen and Yu [14] has discussed how to use multi-attribute predicates in building decision trees.

The k-nearest neighbor technique is another approach. In this case, we find the nearest neighbors of the test example and assign its class label based on the majority label of the nearest neighbors. The distributions of the class labels among the nearest neighbors may also be used in order to find the relative probabilities for the test example to take on different values.
Thus, nearest neighbor techniques assume that locality in the feature space may often imply strong relationships among the class labels. This technique may often lose robustness in very high dimensional spaces, since the data tends to be sparse and the concept of locality is no longer well defined. (In other words, well defined clusters do not exist in the original feature space.)
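A minimal k-nearest-neighbor classifier along the lines described above can be sketched as follows. The feature names, labels, and data are hypothetical, chosen to echo the mass-mailing example; this is illustrative code, not any system from the literature cited here.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority class label among the k training points nearest to `query`.
    `train` is a list of (feature_tuple, label) pairs."""
    dist = lambda pair: math.dist(pair[0], query)  # Euclidean distance
    nearest = sorted(train, key=dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, past_purchases) features for a mailing campaign.
train = [((25, 9), "responder"), ((27, 8), "responder"),
         ((52, 1), "non-responder"), ((55, 2), "non-responder"),
         ((24, 7), "responder")]
print(knn_predict(train, (26, 8), k=3))  # responder
```

In a space with thousands of product dimensions, the distances between a query and all its "neighbors" become nearly indistinguishable, which is the loss of locality, and hence of robustness, noted above.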

4. Conclusions

Personalization provides merchants and publishers the ability to tailor the Web site, advertisement, and product promotion to each customer based on his past behavior and inference from other like-minded customers. This offers E-commerce the opportunity to deploy one-to-one marketing instead of the traditional mass marketing. We discussed the technological challenges of providing personalization and examined new developments in data mining and their applications to personalization, including techniques to support clustering and searching in very high dimensional data spaces with huge amounts of data.

References

[1] Aggarwal C. C., Procopiuc C., Wolf J. L., Yu P. S., Park J.-S.: Fast Algorithms for Projected Clustering. Proceedings of the ACM SIGMOD Conference (1999).
[2] Aggarwal C. C., Wolf J. L., Yu P. S.: A New Method for Similarity Indexing for Market Data. Proceedings of the ACM SIGMOD Conference (1999).
[3] Aggarwal C. C., Yu P. S.: Online Generation of Association Rules. Proceedings of the International Conference on Data Engineering (1998).
[4] Aggarwal C. C., Sun Z., Yu P. S.: Online Algorithms for Finding Profile Association Rules. Proceedings of the International Conference on Knowledge Discovery and Data Mining (1998).
[5] Aggarwal C. C., Yu P. S.: A New Framework for Itemset Generation. Proceedings of the ACM Symposium on Principles of Database Systems (1998).
[6] Aggarwal C. C., Wolf J. L., Yu P. S.: A Framework for the Optimizing of WWW Advertising. Trends in Distributed Systems for Electronic Commerce (Lecture Notes in Computer Science 1402), Springer (1998).
[7] Agrawal R., Imielinski T., Swami A.: Mining Association Rules between Sets of Items in Very Large Databases. Proceedings of the ACM SIGMOD Conference (1993).
[8] Agrawal R., Srikant R.: Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th VLDB Conference (1994).
[9] Balabanovic M., Shoham Y.: Content-Based, Collaborative Recommendation. Communications of the ACM, Vol. 40, No. 3 (1997).
[10] Bayardo R. J.: Efficiently Mining Long Patterns from Databases. Proceedings of the ACM SIGMOD Conference (1998).

[11] Berger M., Rigoutsos I.: An Algorithm for Point Clustering and Grid Generation. IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 5 (1991).
[12] Brin S., Motwani R., Silverstein C.: Beyond Market Baskets: Generalizing Association Rules to Correlations. Proceedings of the ACM SIGMOD Conference (1997).
[13] Chen M.-S., Han J., Yu P. S.: Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6 (1996).
[14] Chen M.-S., Yu P. S.: Using Multi-Attribute Predicates for Mining Classification Rules. IBM Research Report 20562 (1996).
[15] Cutting D. R., Karger D. R., Pedersen J. O.: Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections. Proceedings of the ACM SIGIR Conference (1993).
[16] Ester M., Kriegel H.-P., Xu X.: A Database Interface for Clustering in Large Spatial Databases. Proceedings of the International Conference on Knowledge Discovery and Data Mining (1995).
[17] Ester M., Kriegel H.-P., Xu X.: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. Proceedings of the International Symposium on Large Spatial Databases (1995).
[18] Ester M., Kriegel H.-P., Sander J., Xu X.: A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the International Conference on Knowledge Discovery and Data Mining (1995).
[19] Goldberg D., Nichols D., Oki B. M., Terry D.: Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM, Vol. 35, No. 12 (1992).
[20] Hearst M. A., Pedersen J. O.: Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. Proceedings of the ACM SIGIR Conference (1996).
[21] Jain A., Dubes R.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey (1988).

[22] Keim D., Berchtold S., Böhm C., Kriegel H.-P.: A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space. Proceedings of the International Symposium on Principles of Database Systems (1997).
[23] Kohavi R., Sommerfield D.: Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology. Proceedings of the International Conference on Knowledge Discovery and Data Mining (1995).
[24] Krulwich B., Burkey C.: Learning User Information Interests through Extraction of Semantically Significant Phrases. Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access (1996).
[25] Lang K.: Newsweeder: Learning to Filter Netnews. Proceedings of the 12th International Conference on Machine Learning (1995).
[26] Mehta M., Agrawal R., Rissanen J.: SLIQ: A Fast Scalable Classifier for Data Mining. Proceedings of the 5th International Conference on Extending Database Technology (1996).
[27] Ng R., Han J.: Efficient and Effective Clustering Methods for Spatial Data Mining. Proceedings of the 20th VLDB Conference (1994).
[28] Park J. S., Chen M. S., Yu P. S.: Using a Hash-based Method with Transaction Trimming for Mining Association Rules. IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5 (1997).
[29] Quinlan J. R.: Induction of Decision Trees. Machine Learning, Vol. 1, No. 1 (1986).
[30] http://www.reel.com
[31] Resnick P., Iacovou N., Suchak M., Bergstrom P., Riedl J.: GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer-Supported Cooperative Work (1994).
[32] Resnick P., Varian H. R.: Recommender Systems. Communications of the ACM, Vol. 40, No. 3 (1997).
[33] Salton G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Publishing Company.

[34] Seidl T., Kriegel H.-P.: Optimal Multi-Step k-Nearest Neighbor Search. Proceedings of the ACM SIGMOD Conference (1998).
[35] Shardanand U., Maes P.: Social Information Filtering: Algorithms for Automating "Word of Mouth". Proceedings of the Conference on Human Factors in Computing Systems (CHI '95) (1995).
[36] Shafer J., Agrawal R., Mehta M.: SPRINT: A Scalable Parallel Classifier for Data Mining. Proceedings of the 22nd VLDB Conference (1996).
[37] Srikant R., Agrawal R.: Mining Generalized Association Rules. Proceedings of the 21st VLDB Conference (1995).
[38] Srikant R., Agrawal R.: Mining Quantitative Association Rules in Large Relational Tables. Proceedings of the ACM SIGMOD Conference (1996).
[39] Zhang T., Ramakrishnan R., Livny M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proceedings of the ACM SIGMOD Conference (1996).