QUT Digital Repository: http://eprints.qut.edu.au/
Liang, Huizhi and Xu, Yue and Li, Yuefeng and Nayak, Richi (2008) Collaborative Filtering Recommender Systems Using Tag Information. In: Web Intelligence and Intelligent Agent Technology. IEEE/WIC/ACM International Conference (and Workshops).
©
Copyright 2009 IEEE
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
Collaborative Filtering Recommender Systems Using Tag Information
Huizhi Liang, Yue Xu, Yuefeng Li, Richi Nayak Faculty of Information Technology Queensland University of Technology, Brisbane, Australia
[email protected], {yue.xu, y2.li, r.nayak}@qut.edu.au
Abstract
voting on the tagged information resources or items [1]. Thus, the tagging information can be used to make
Recommender Systems is one of the effective tools to
recommendations.
deal with information overload issue. Similar with the
Currently some researches are focusing on how to
explicit rating and other implicit rating behaviours such
use collaborative tagging information to recommend
as purchase behaviour, click streams, and browsing
personalized tags to users [2], but not much work has
history etc., the tagging information implies user’s
been done on utilizing tagging information to help users
important
to find interested items easily and quickly.
personal
interests
and
preferences
information, which can be used to recommend
In this paper, we will discuss how to recommend
personalized items to users. This paper is to explore
items to users based on tag information.
how to utilize tagging information to do personalized
2. Related work
recommendations. Based on the distinctive three dimensional relationships among users, tags and items, a new user profiling and similarity measure method is
Collaborative filtering is a traditional and wildly used
proposed. The experiments suggest that the proposed
approach to recommend items to users based on the
approach is better than the traditional collaborative
assumption that similar minded people may have
filtering recommender systems using only rating data.
similar taste or behaviors. In general, there are two kinds of collaborative filtering methods: user-based and
1. Introduction
item-based. Though there is a lot of work on the collaborative filtering recommender systems, to the best
Recommender systems can provide personalized
of our knowledge, only Tso-Sutter’s [3] work discussed
contents, services and information items to potential
about
consumers to decrease information retrieval time and
recommendation.
using
the
tag
information
to
do
item
support decision making process. Because user’s
In Tso-Sutter’s work, the tag information was
explicit rating is not always available, the implicit
converted into two 2-dimensional relationships, user-tag
rating such as purchase history, downloading behaviour
and tag-item, and was used as a supplementary source
and click patterns etc. become another important
to extend the rating data. Because it ignored the three
information source for recommender systems.
dimensional relationship among users, items, and tags,
With the development of web 2.0, collaborative
the users’ tagging behavior was not accurately profiled,
tagging information becomes popular. Besides helping
and thus the recommendation quality based on the
user organize his or her personal collections, a tag also
extended data is still not satisfactory.
can be regarded as a user’s personal opinion expression,
3. Tag-based Recommender systems
while tagging can be considered as implicit rating or 978-0-7695-3496-1/08 $25.00 © 2008 IEEE DOI 10.1109/WIIAT.2008.97
59
Neighbourhood formation is to generate a set of
3.1 User profiling
like-minded peers for a target user. Based on user
User profiling is to model users’ features or
profiles, the similarity of users can be calculated
preferences. The approaches of profiling users with
through various proximity measures such as Pearson
user-item rating matrix and keywords vectors are
correlation and cosine similarity. In Tso-Sutter’s work,
widely used in recommender systems. However, these
the overlap of tags shared by users was used to measure
approaches are used for describing two-dimensional
the similarity [3] just like the traditional collaborative
relationships
Though
filtering (CF) using the overlap of commonly rated
Tso-Sutter’s approach takes the relationship between
items. Tso-Sutter’s method actually is the traditional CF
tags and items into consideration [3], it ignores the
with an extended dataset treating tags as additional
relationship between tags and items for each user. The
items. The improvement is very limit, because the
user should be profiled not only by the tags and items,
user’s neighborhood may be incorrectly formed if only
but also the relationship between the tags and items of
treating users’ tagging as implicit rating and ignoring to
the user.
measure the similarities of the relationships of tags and
between
users
and
items.
To profile user’s tagging behavior accurately, we
items. We propose to measure the similarity of two
propose to model a user in a collaborative tagging
users from the following three aspects:
community in three aspects, i.e., the tags used by the
(1) UTsim(ui, uj): The similarity of users’ tags, which is
user, the items tagged by the user, and the relationship
measured by the percentage of common tags used
between the tags and the tagged items. For easy
by the two users:
describing the proposed approach, we give the
UTsim(ui,uj)=
|Tui∩Tuj|
(1) max{|Tuk|} uk ∈U (2) UPsim(ui, uj): the similarity of user’s items, which
following definitions first: U= {u1, u2…un}: Set of users in the collaborative tagging community.
is measured by the percentage of common items
P= {p1, p2… pm}: Set of items that have been
tagged
tagged by the two users:
by users.
UPsim(ui,uj)=
T= {t1, t2…, tl}: Set of tags that have been used by E(ui,tj,pk)={0,1}: a function that specifies whether user
(3) UTPsim(ui, uj): the similarity of the users’ tag-item
ui has used the tag tj to tag item pk
relationship, which is measured by the percentage
The user profile is defined as follows:
of common relations shared by the two users: |TPi∩TPj| (3) UTPsim(ui,uj)= max {|TPk|} uk ∈U
For a user ui, i=1..n, let Tui be the tag set of ui, E(ui,tj, pk) =1}, Tui
(2)
max{|Puk|} uk ∈U
users.
Tui={tj|tj∈T, ∃pk∈P,
|Pui∩Puj|
⊆ T, Pui be
the item set of ui, Pui={pk|pk∈T, ∃tj∈P, E(ui,tj, pk) =1}, Pui ⊆ P, TPi be the relationship between ui’s tag
Thus, the overall similarity measure of two users is
and item set, TPi={| tj∈T, pk∈P, and
defined as below:
E(ui,tj,pk)=1} , UFi = (Tui, Pui, TPi) is defined as the
Simu(ui, uj)=wUT*UTsim(ui, uj)+wUP*UPsim(ui, uj)
user profile of user ui. The user profile or user model of
+wUTP*UTPsim(ui, uj) (4)
all users is denoted as UF, UF={UFi|i=1..n }.
Where wUT + wUP+ wUTP=1, wUT, wUP and wUTP are the
3.2 Neighborhood Formation
weights
to
the
three
similarity
measures,
respectively. Similarly, the similarity between two items is defined
60
as formula (5).
(9) given below. According to the prediction scores, the
Simp(pi,pj)=wPU*PUsim(pi, pj)+wPT*PTsim(pi, pj) +wPUT*PUTsim(pi, pj)
top N items will be recommended to ui. .
(5)
Where wPU, wPT, wPUT are the weights. Their sum is 1,
Au(ui,pk)=
and PUsim(pi, pj), PTsim(pi, pj), PUTsim(pi, pj) are
∑simu(ui,uj)*R(uj, pk) uj∈C(ui)
(9)
|C(ui)| For the item based approach, the prediction score is
defined as follows: (1) PTsim(pi, pj): The similarity of two items based on
calculated by formula (10) using the item similarities.
the percentage of being put in the same tag. |Tpi∩Tpj| PTsim(pi, pj)= (6) max|Tpk| pk ∈P Where Tpk is the tag set of item pk. Tpk= {ti| tj∈T, E(pk,
Ap(ui,pk)= ∑simp(pk,pj) pj∈Pui,
(10)
4. Experiments
tj)=1}. We have conducted experiments to evaluate the
(2) PUsim(pi, pj): the similarity of two items based on
methods proposed in Section 3. The dataset for the
the percentage of being tagged by the same user. PUsim(pi,pj)=
|Upi∩Upj|
experiments is obtained from Amazon.com. Because the
(7)
max|Upk| pk ∈P
items of the Amazon tagging community are mainly books, the book items are collected. To avoid severe
Where Upk is the user set of item pk. Upk= {ui| ui∈U,
sparsity problem, we selected those users who tagged at
∃tj∈T, E(ui,tj, pk) =1}.
least 5 items, tags that are used by at least 5 users, and
(3) PUTsim(pi, pj): the similarity of the two items
items that are tagged at least 5 times. The final dataset
based on the percentage of common tag-item relationship. PUTsim(pi,pj)=
|UPi∩UPj|
comprises 3179 users, 8083 tags and 11942 books. The whole dataset is split into a test dataset and a
(8)
training dataset and the split percentage is 50% each.
max{|UPk|} pk ∈P Where UPj is the user and item set of tag tj. UPj= {| ui∈U,pk∈P, and E(ui,tj,pk)=1}.
recommendations. To evaluate the effectiveness of the proposed
3.3 Recommendation Generation
approach (Tag-based CF), we compared the precision and recall of the recommended top 5 items of the item
proposed approach with the performance of the standard
recommendations to the target user ui, namely, a user
We
propose
two
methods
to
make
collaborative filtering (Traditional CF) approaches that
based approach and an item based approach,
only use
based
user ratings and also compared with
on the neighbour users’ item lists or the similarity of
Tso-Sutter’s approach (Tag-aware method) that extends
items, respectively.
the user rating matrix with the tag information. In fact,
Let C(ui) be the neighbourhood of ui. For the user
the proposed approach covers the two approaches when
based approach, the candidate items for ui are taken
some of the similarity measure weights are set to zero.
from the items tagged by the users in C(ui). For each
The comparison of precision and recall of user-based
candidate item pk, based on the similarity between ui
approaches is illustrated in Figure 1, while item-based
and its neighbour users, and the neighbour users’
comparison is shown in Figure 2.
implicit ratings to pk denoted as R(uj, pk), a prediction score denoted as Au(ui,pk) is calculated using
5. Discussion
Equation
61
results suggest that the recommendation accuracy is The experimental results in Figure 1 and Figure 2
more improved by profiling users with tag, item and the
show that the precision and recall of the proposed
relationship between tag and item than profiling users
approach are better than the traditional CF and
by extending implicit rating with tag information.
Tso-Sutter’s
Furthermore, the results also suggest that it is better to
approach
for
both
user-based
and
item-based models
calculate the similarity based on the overall similarity of
Though Tso-Sutter claimed that the tag information
tagging behaviour than just measuring it as implicit
can only be useful after fusing the user and item
rating similarity.
collaborative filtering and it will be seen as noise for
6. Conclusion
standard user-based and item-based CF alone, our experimental results show that tag information can be
This paper discusses how to recommend items to
used to improve the standard user-based and item-based
users based on collaborative tagging information.
collaborative filtering.
Instead of treating tagging behaviour as just implicit rating behaviour, the proposed tag based collaborative
User-based approach
filtering
0.12 0.1 0.08 0.06 0.04 0.02 0
approach
uses
the
three
dimensional
relationship of tagging behaviour to profile users and
Precision Recall
generate most likely minded neighbours or similar items. The experiments show promising results of employing
Traditional CF
Tag-aware method
Tag-based CF
the
tag
based
collaborative
filtering
approach to recommend personalized items. The experimental
Top 5
results
also
indicate
that
the
tag
information can be used to improve the standard user-based
Figure 1. Comparison of user based approaches
item-based
collaborative
filtering
approaches.
Item-based approach 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0
and
References: [1] H. Halpin, V.Robu, and H. Shepherd. “The Complex Precision Recall
Dynamics of Collaborative Tagging”, Proceedings of the 16th international conference on World Wide Web, ACM, USA, 2007, pp. 211 – 220.
Traditional CF
Tag-aware method
Tag-based CF
[2] P. Heymann, D. Ramage, and H. Garcia-Molina. ”Social tag prediction”, Proceedings of the 31st annual international
Top 5
ACM SIGIR conference on Research and development in
Figure 2.
information retrieval, ACM, USA, 2008, pp. 531-538.
Comparison of item-based approaches
[3] K.H.L. Tso-Sutter, L.B. Marinho, and L.Schmidt-Thieme.
Besides, the experimental results also show that the
“Tag-aware
traditional collaborative filtering recommendation based
Collaborative Filtering Algorithms”, Proceedings of the 2008
on the similarity of rating behaviour doesn’t work well
ACM symposium on Applied computing, ACM, USA, 2008,
to process the collaborative tagging information. The
pp. 1995-1999.
62
Recommender
Systems
by
Fusion
of