Collaborative Filtering Recommender Systems Using Tag ... - Core

QUT Digital Repository: http://eprints.qut.edu.au/

Liang, Huizhi and Xu, Yue and Li, Yuefeng and Nayak, Richi (2008) Collaborative Filtering Recommender Systems Using Tag Information. In: Web Intelligence and Intelligent Agent Technology. IEEE/WIC/ACM International Conference (and Workshops).

©

Copyright 2009 IEEE

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Collaborative Filtering Recommender Systems Using Tag Information

Huizhi Liang, Yue Xu, Yuefeng Li, Richi Nayak Faculty of Information Technology Queensland University of Technology, Brisbane, Australia [email protected], {yue.xu, y2.li, r.nayak}@qut.edu.au

Abstract

voting on the tagged information resources or items [1]. Thus, the tagging information can be used to make

Recommender Systems is one of the effective tools to

recommendations.

deal with information overload issue. Similar with the

Currently some researches are focusing on how to

explicit rating and other implicit rating behaviours such

use collaborative tagging information to recommend

as purchase behaviour, click streams, and browsing

personalized tags to users [2], but not much work has

history etc., the tagging information implies user’s

been done on utilizing tagging information to help users

important

to find interested items easily and quickly.

personal

interests

and

preferences

information, which can be used to recommend

In this paper, we will discuss how to recommend

personalized items to users. This paper is to explore

items to users based on tag information.

how to utilize tagging information to do personalized

2. Related work

recommendations. Based on the distinctive three dimensional relationships among users, tags and items, a new user profiling and similarity measure method is

Collaborative filtering is a traditional and wildly used

proposed. The experiments suggest that the proposed

approach to recommend items to users based on the

approach is better than the traditional collaborative

assumption that similar minded people may have

filtering recommender systems using only rating data.

similar taste or behaviors. In general, there are two kinds of collaborative filtering methods: user-based and

1. Introduction

item-based. Though there is a lot of work on the collaborative filtering recommender systems, to the best

Recommender systems can provide personalized

of our knowledge, only Tso-Sutter’s [3] work discussed

contents, services and information items to potential

about

consumers to decrease information retrieval time and

recommendation.

using

the

tag

information

to

do

item

support decision making process. Because user’s

In Tso-Sutter’s work, the tag information was

explicit rating is not always available, the implicit

converted into two 2-dimensional relationships, user-tag

rating such as purchase history, downloading behaviour

and tag-item, and was used as a supplementary source

and click patterns etc. become another important

to extend the rating data. Because it ignored the three

information source for recommender systems.

dimensional relationship among users, items, and tags,

With the development of web 2.0, collaborative

the users’ tagging behavior was not accurately profiled,

tagging information becomes popular. Besides helping

and thus the recommendation quality based on the

user organize his or her personal collections, a tag also

extended data is still not satisfactory.

can be regarded as a user’s personal opinion expression,

3. Tag-based Recommender systems

while tagging can be considered as implicit rating or 978-0-7695-3496-1/08 $25.00 © 2008 IEEE DOI 10.1109/WIIAT.2008.97

59

Neighbourhood formation is to generate a set of

3.1 User profiling

like-minded peers for a target user. Based on user

User profiling is to model users’ features or

profiles, the similarity of users can be calculated

preferences. The approaches of profiling users with

through various proximity measures such as Pearson

user-item rating matrix and keywords vectors are

correlation and cosine similarity. In Tso-Sutter’s work,

widely used in recommender systems. However, these

the overlap of tags shared by users was used to measure

approaches are used for describing two-dimensional

the similarity [3] just like the traditional collaborative

relationships

Though

filtering (CF) using the overlap of commonly rated

Tso-Sutter’s approach takes the relationship between

items. Tso-Sutter’s method actually is the traditional CF

tags and items into consideration [3], it ignores the

with an extended dataset treating tags as additional

relationship between tags and items for each user. The

items. The improvement is very limit, because the

user should be profiled not only by the tags and items,

user’s neighborhood may be incorrectly formed if only

but also the relationship between the tags and items of

treating users’ tagging as implicit rating and ignoring to

the user.

measure the similarities of the relationships of tags and

between

users

and

items.

To profile user’s tagging behavior accurately, we

items. We propose to measure the similarity of two

propose to model a user in a collaborative tagging

users from the following three aspects:

community in three aspects, i.e., the tags used by the

(1) UTsim(ui, uj): The similarity of users’ tags, which is

user, the items tagged by the user, and the relationship

measured by the percentage of common tags used

between the tags and the tagged items. For easy

by the two users:

describing the proposed approach, we give the

UTsim(ui,uj)=

|Tui∩Tuj|

(1) max{|Tuk|} uk ∈U (2) UPsim(ui, uj): the similarity of user’s items, which

following definitions first: U= {u1, u2…un}: Set of users in the collaborative tagging community.

is measured by the percentage of common items

P= {p1, p2… pm}: Set of items that have been

tagged

tagged by the two users:

by users.

UPsim(ui,uj)=

T= {t1, t2…, tl}: Set of tags that have been used by E(ui,tj,pk)={0,1}: a function that specifies whether user

(3) UTPsim(ui, uj): the similarity of the users’ tag-item

ui has used the tag tj to tag item pk

relationship, which is measured by the percentage

The user profile is defined as follows:

of common relations shared by the two users: |TPi∩TPj| (3) UTPsim(ui,uj)= max {|TPk|} uk ∈U

For a user ui, i=1..n, let Tui be the tag set of ui, E(ui,tj, pk) =1}, Tui

(2)

max{|Puk|} uk ∈U

users.

Tui={tj|tj∈T, ∃pk∈P,

|Pui∩Puj|

⊆ T, Pui be

the item set of ui, Pui={pk|pk∈T, ∃tj∈P, E(ui,tj, pk) =1}, Pui ⊆ P, TPi be the relationship between ui’s tag

Thus, the overall similarity measure of two users is

and item set, TPi={| tj∈T, pk∈P, and

defined as below:

E(ui,tj,pk)=1} , UFi = (Tui, Pui, TPi) is defined as the

Simu(ui, uj)=wUT*UTsim(ui, uj)+wUP*UPsim(ui, uj)

user profile of user ui. The user profile or user model of

+wUTP*UTPsim(ui, uj) (4)

all users is denoted as UF, UF={UFi|i=1..n }.

Where wUT + wUP+ wUTP=1, wUT, wUP and wUTP are the

3.2 Neighborhood Formation

weights

to

the

three

similarity

measures,

respectively. Similarly, the similarity between two items is defined

60

as formula (5).

(9) given below. According to the prediction scores, the

Simp(pi,pj)=wPU*PUsim(pi, pj)+wPT*PTsim(pi, pj) +wPUT*PUTsim(pi, pj)

top N items will be recommended to ui. .

(5)

Where wPU, wPT, wPUT are the weights. Their sum is 1,

Au(ui,pk)=

and PUsim(pi, pj), PTsim(pi, pj), PUTsim(pi, pj) are

∑simu(ui,uj)*R(uj, pk) uj∈C(ui)

(9)

|C(ui)| For the item based approach, the prediction score is

defined as follows: (1) PTsim(pi, pj): The similarity of two items based on

calculated by formula (10) using the item similarities.

the percentage of being put in the same tag. |Tpi∩Tpj| PTsim(pi, pj)= (6) max|Tpk| pk ∈P Where Tpk is the tag set of item pk. Tpk= {ti| tj∈T, E(pk,

Ap(ui,pk)= ∑simp(pk,pj) pj∈Pui,

(10)

4. Experiments

tj)=1}. We have conducted experiments to evaluate the

(2) PUsim(pi, pj): the similarity of two items based on

methods proposed in Section 3. The dataset for the

the percentage of being tagged by the same user. PUsim(pi,pj)=

|Upi∩Upj|

experiments is obtained from Amazon.com. Because the

(7)

max|Upk| pk ∈P

items of the Amazon tagging community are mainly books, the book items are collected. To avoid severe

Where Upk is the user set of item pk. Upk= {ui| ui∈U,

sparsity problem, we selected those users who tagged at

∃tj∈T, E(ui,tj, pk) =1}.

least 5 items, tags that are used by at least 5 users, and

(3) PUTsim(pi, pj): the similarity of the two items

items that are tagged at least 5 times. The final dataset

based on the percentage of common tag-item relationship. PUTsim(pi,pj)=

|UPi∩UPj|

comprises 3179 users, 8083 tags and 11942 books. The whole dataset is split into a test dataset and a

(8)

training dataset and the split percentage is 50% each.

max{|UPk|} pk ∈P Where UPj is the user and item set of tag tj. UPj= {| ui∈U,pk∈P, and E(ui,tj,pk)=1}.

recommendations. To evaluate the effectiveness of the proposed

3.3 Recommendation Generation

approach (Tag-based CF), we compared the precision and recall of the recommended top 5 items of the item

proposed approach with the performance of the standard

recommendations to the target user ui, namely, a user

We

propose

two

methods

to

make

collaborative filtering (Traditional CF) approaches that

based approach and an item based approach,

only use

based

user ratings and also compared with

on the neighbour users’ item lists or the similarity of

Tso-Sutter’s approach (Tag-aware method) that extends

items, respectively.

the user rating matrix with the tag information. In fact,

Let C(ui) be the neighbourhood of ui. For the user

the proposed approach covers the two approaches when

based approach, the candidate items for ui are taken

some of the similarity measure weights are set to zero.

from the items tagged by the users in C(ui). For each

The comparison of precision and recall of user-based

candidate item pk, based on the similarity between ui

approaches is illustrated in Figure 1, while item-based

and its neighbour users, and the neighbour users’

comparison is shown in Figure 2.

implicit ratings to pk denoted as R(uj, pk), a prediction score denoted as Au(ui,pk) is calculated using

5. Discussion

Equation

61

results suggest that the recommendation accuracy is The experimental results in Figure 1 and Figure 2

more improved by profiling users with tag, item and the

show that the precision and recall of the proposed

relationship between tag and item than profiling users

approach are better than the traditional CF and

by extending implicit rating with tag information.

Tso-Sutter’s

Furthermore, the results also suggest that it is better to

approach

for

both

user-based

and

item-based models

calculate the similarity based on the overall similarity of

Though Tso-Sutter claimed that the tag information

tagging behaviour than just measuring it as implicit

can only be useful after fusing the user and item

rating similarity.

collaborative filtering and it will be seen as noise for

6. Conclusion

standard user-based and item-based CF alone, our experimental results show that tag information can be

This paper discusses how to recommend items to

used to improve the standard user-based and item-based

users based on collaborative tagging information.

collaborative filtering.

Instead of treating tagging behaviour as just implicit rating behaviour, the proposed tag based collaborative

User-based approach

filtering

0.12 0.1 0.08 0.06 0.04 0.02 0

approach

uses

the

three

dimensional

relationship of tagging behaviour to profile users and

Precision Recall

generate most likely minded neighbours or similar items. The experiments show promising results of employing

Traditional CF

Tag-aware method

Tag-based CF

the

tag

based

collaborative

filtering

approach to recommend personalized items. The experimental

Top 5

results

also

indicate

that

the

tag

information can be used to improve the standard user-based

Figure 1. Comparison of user based approaches

item-based

collaborative

filtering

approaches.

Item-based approach 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0

and

References: [1] H. Halpin, V.Robu, and H. Shepherd. “The Complex Precision Recall

Dynamics of Collaborative Tagging”, Proceedings of the 16th international conference on World Wide Web, ACM, USA, 2007, pp. 211 – 220.

Traditional CF

Tag-aware method

Tag-based CF

[2] P. Heymann, D. Ramage, and H. Garcia-Molina. ”Social tag prediction”, Proceedings of the 31st annual international

Top 5

ACM SIGIR conference on Research and development in

Figure 2.

information retrieval, ACM, USA, 2008, pp. 531-538.

Comparison of item-based approaches

[3] K.H.L. Tso-Sutter, L.B. Marinho, and L.Schmidt-Thieme.

Besides, the experimental results also show that the

“Tag-aware

traditional collaborative filtering recommendation based

Collaborative Filtering Algorithms”, Proceedings of the 2008

on the similarity of rating behaviour doesn’t work well

ACM symposium on Applied computing, ACM, USA, 2008,

to process the collaborative tagging information. The

pp. 1995-1999.

62

Recommender

Systems

by

Fusion

of