Towards the Semantic Web: Collaborative Tag Suggestions

Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su


Yahoo! Inc
2821 Mission College Blvd., Santa Clara, CA 95054
{zhichen, yfu, jmao, difu}@yahoo-inc.com

ABSTRACT
Content organization over the Internet has gone through several interesting phases of evolution: from structured directories to unstructured Web search engines and, more recently, to tagging as a way of aggregating information, a step towards the semantic Web vision. Tagging allows ranking and data organization to directly utilize inputs from end users, enabling machine processing of Web content. Since tags are created by individual users in a free form, one important problem facing tagging is to identify the most appropriate tags while eliminating noise and spam. For this purpose, we define a set of general criteria for a good tagging system. These criteria include high coverage of multiple facets to ensure good recall, least effort to reduce the cost involved in browsing, and high popularity to ensure tag quality. We propose a collaborative tag suggestion algorithm that uses these criteria to spot high-quality tags. The proposed algorithm employs a goodness measure for tags derived from collective user authorities to combat spam. The goodness measure is iteratively adjusted by a reward-penalty algorithm, which also incorporates other sources of tags, e.g., content-based auto-generated tags. Our experiments based on My Web 2.0 show that the algorithm is effective.

Keywords
Classification, tagging, information retrieval, collaborative filtering, Web 2.0.

1. INTRODUCTION
Effectively organizing information over the World Wide Web has been a challenging problem since the beginning. In the early days of the Internet, portal services organized Web content into hierarchical directories, assuming that the Web can be organized by strict structures of topics. However, the manually supervised directories have gradually been displaced by crawler-based search engines for at least two reasons: data explosion and the unstructured nature of Web content. While search engines work well for users who access Web information by issuing ad hoc queries, they use very limited semantic information about the Web content, obtained by parsing content and exploiting the hyperlink structure established by Web masters. The pull model used by search engines makes it hard to discover new and dynamic content. According to Brightplanet, the deep Web can be 500 times larger than the surface Web. In addition, personalization and spam detection require human inputs. Furthermore, it is difficult for people to share massive numbers of unstructured Web pages with each other or to recover them later. A push model that directly takes inputs from users solves these problems. Tagging is a process by which users assign labels (in the form of keywords) to Web objects in order to share, discover, and recover them. Discovery enables users to find new content of interest shared by other users. Recovery enables a user to recall content that was discovered before. Further, tagging allows ranking and data organization to utilize metadata from individual users directly. It brings some benefits of the semantic Web into the current HTML-dominated Web.

We are witnessing an increasing number of tagging services on the Web, such as Flickr [11], Delicious [10], My Web 2.0 [12], Rawsugar [14], and Shadows [15]. Flickr enables users to tag photos and share them with others. Delicious users can tag URLs and share their bookmarks with the public. My Web 2.0 provides a Web-scale social search engine to enable users to find, use, share, and expand human knowledge. It allows users to save and tag Web pages so that they can easily browse and search for the content again. It also enables users to share Web pages within a personalized community or with the public by setting access privileges. Further, My Web 2.0 provides scoped search within a user’s trusted social networks, e.g., friends or friends of friends. Consequently, the search results are personalized and spam-filtered by the trusted networks.

Tagging advocates a grass-roots approach to forming a so-called “folksonomy”, which is neither hierarchical nor exclusive. With tagging, a user can enter labels in a free form to tag any object; it therefore relieves users of much of the burden of fitting objects into a universal ontology. Meanwhile, a user can use a certain tag combination to express interest in objects tagged by other users, e.g., tags (renewable, energy) for objects tagged with both the keywords renewable and energy.

Ontology works well when the corpus is small or in a constrained domain, the objects to be categorized are stable, and the users are experts [8]. A universal ontology is difficult and expensive to construct and maintain when it involves hundreds of millions of users with diverse backgrounds. When used to organize Web objects, ontology faces two hard problems: unlike physical objects, digital content is seldom semantically pure enough to fit in a specific category; and it is difficult to predict the paths through which a user would explore to discover a digital object [8]. Taking the Yahoo directory as an example, a recipe book belongs to both the categories Shopping and Health, since it is hard to predict which category an end user would perceive to be the best fit.
Tagging bridges some of the gap between browsing and search. Browsing enumerates all objects and finds the desirable one by exercising the recognition aspect of the human brain, whereas search uses association and dives directly to the objects of interest, and thus is mentally less taxing [9].

The benefits of tagging do not come without a cost. For instance, the number of tags in a social network multiplies like rabbits [13]. The structure of a traditional hierarchy disappears. Tagging relates to faceted classification, which uses clearly defined, mutually exclusive, and collectively exhaustive aspects to describe objects. For instance, a music piece can be identified by facets such as artist, album, genre, and composer. Faceted systems fail to dictate a linear order in which to experience the facets, a step crucial for guiding the users to explore such a system. Since tags are created by end users in a free form, they can be chaotic when compared with a faceted system constructed by experts. This lack of order and depth can result in a disaster, leaving the users muddled in a “hodgepodge” [13].

To remedy the shortcomings of tagging, we advocate using collaborative filtering to automatically identify high-quality tags for users, leveraging the collective wisdom of Web users. Specifically, this paper makes the following contributions:

• We discuss the desirable properties of a good tagging system, which include: (a) high coverage of multiple facets, (b) high popularity, and (c) least-effort. Faceted and generic tags can facilitate the aggregation of objects entered by different users, which makes discovery and recovery of tagged content easier. Tags used by a large number of people for a given object are less likely to be spam and more likely to be used by a new user for the same object. Least-effort has two meanings: the number of objects identified by the suggested tags should be small, and the number of tags for identifying an object should be minimized as well. This enables efficient recovery of the tagged objects.

• We propose collaborative tagging techniques that suggest tags for an object based on what other users use to tag the object. This not only addresses the vocabulary divergence problem, but also relieves users of the tedious task of having to come up with a good set of tags.

• We propose a reputation score for each user based on the quality of the tags contributed by the user.

• By introducing the notion of “virtual” users, our tag suggestion algorithm incorporates not only user-generated tags but also other sources of tags, such as tags auto-generated via content-based or context-based analysis.

• We have implemented a simplified tag suggestion scheme in My Web 2.0. Our experience shows that this simple scheme is quite effective in suggesting appropriate tags that possess the properties we propose for a good tagging system.

The rest of the paper is organized as follows: Section 2 discusses an important usage of tags for relational browsing. Section 3 describes a set of criteria for selecting high-quality tags and proposes an algorithm for tag suggestion. In Section 4, we illustrate our algorithm with a few examples. We conclude in Section 5.

2. RELATIONAL TAG BROWSING
Tagging is a tool to organize objects for the purposes of recovery and discovery. Unlike scientific classification, which forces a hierarchical structure on objects, tagging organizes objects in a network structure, thus making it suitable for organizing Web objects, which lack a clear hierarchical structure by nature. Tagging, when combined with search technology, becomes a powerful tool to discover interesting Web objects. With the help of search technology, tagged objects can be browsed or searched for. The way tags work is analogous to filters: they are treated as logical constraints that filter the objects. Refinement of results is done by strengthening the constraints, whereas generalization is done by weakening them. E.g., the tag combination (2006, calendar) strengthens the tag (2006) and the tag (calendar).

Figure 1. Tag browsing via filtering. The objects tagged by the tag “folksonomy” intersect with those tagged by the tags “tagging” and “ontology.” Therefore, the tags “tagging” and “ontology” are related to the tag “folksonomy.”

Figure 1 illustrates how tags can be used as a filtering mechanism for browsing and searching for objects. In My Web 2.0, we explore the co-occurrence of tags to enable tag
browsing through progressive refinement. When a user selects a tag combination, the system returns the set of objects tagged with the combination. Meanwhile, it also returns the tags that relate to the selected tags, namely those that co-occur with the selected tags. In Figure 1, the tags (tagging) and (ontology) relate to the tag folksonomy.

In the next section, we describe our collaborative tag suggestion algorithm.

3. COLLABORATIVE TAG SUGGESTION
3.1 A taxonomy of tags
Before presenting the algorithm, we first describe the categories of tags that we observe on My Web 2.0.

1. Content-based tags: Tags that describe the content of an object or the categories that the object belongs to, e.g., Autos, Honda Odyssey, batman, open source, Lucene, and German Embassy. These tags are usually specific terms and are common in My Web 2.0.

2. Context-based tags: Tags that provide the context in which the object was created or saved, e.g., tags describing locations and time such as San Francisco, Golden Gate Bridge, and 2005-10-19.

3. Attribute tags: Tags that are inherent attributes of an object but may not be derivable from the content directly, e.g., the author of a piece of content such as Jeremy’s Blog and Clay Shirky.

4. Subjective tags: Tags that express the user’s opinion and emotion, e.g., funny or cool.

5. Organizational tags: Tags that identify personal stuff, e.g., my paper or my work, and tags that serve as a reminder of certain tasks such as to-read or to-review. This type of tag is usually not useful for global aggregation with other users’ tags.

Golder and Huberman have also discussed tag categorization [3].

3.2 Criteria for good tags
In a large-scale tagging system like My Web 2.0, an object is usually identified by a group of tags. A specific tag is efficient for identifying an object but less useful for other people to discover new objects. In contrast, a generic tag is useful for discovery but not effective for narrowing down objects. Tagging an object with a good set of tags helps both discovery and recovery. We argue that a good tag combination should have the following properties.

High coverage of multiple facets. A good tag combination should include multiple facets of the tagged objects. For example, tags for a URL to a travel attraction site may include generic tags such as category (travel), location (San Francisco), time (2005), specific tag (Golden Gate Bridge), and subjective tag (cool). Generic tags facilitate the aggregation of the content entered by different users and thus are often used for a large number of objects. The larger the number of facets, the more likely a user is to be able to recall the tagged content.

High popularity. If a set of tags is used by a large number of people for a particular object, these tags are less likely to be spam. They are more likely to uniquely identify the tagged content and more likely to be used by a new user for the given object. This is analogous to term frequency in traditional information retrieval.

Least-effort. The number of tags for identifying an object should be minimized, and the number of objects identified by the tag combination should be small. As a result, a user can reach any tagged object in a small number of steps via tag browsing.

Uniformity (normalization). Since there is no universal ontology, tags can diverge dramatically. Different people can use different terms for the same concept. We have observed two general types of divergence: those due to syntactic variance, e.g., blogs, blogging, and blog; and those due to synonyms, e.g., cell-phone and mobile-phone, which are different syntactic terms that refer to the same underlying concept. These kinds of divergence are a double-edged sword. On the one hand, they introduce noise to the system; on the other hand, they can increase recall. The right thing to do is to allow the users to use whatever form they like but to collapse the variances to an internal canonical representation.

Exclusion of certain types of tags. For example, personally used organizational tags are less likely to be shared by different users. Thus, they should be excluded from public usage. Rather than ignoring these tags, My Web 2.0 includes a feature that auto-completes tags as they are being typed by matching the prefixes of the tags entered by the user before. This not only improves the usability of the system but also enables the convergence of tags.

Our criteria are based on a study of tag usage by real users in My Web 2.0. Figure 2 shows the rank of a tag versus the number of URLs labeled by the tag in a log-log scale, which demonstrates a Zipf-like distribution. The figure only shows a subset of the data publicly shared by users. We excluded three system-introduced tags, which are automatically generated for Web objects imported from other services. Our data shows that people naturally select some popular and generic tags to label the Web objects that interest them. The most popular tags include music, news, software, blog, rss, web, programming, and design. These tags are convenient for users to recover and share with other users.
Figure 2. Tag popularity

Figure 3. Distribution of the number of Web objects tagged with the corresponding number of tags
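
The filtering behaviour described in Section 2, where a selected tag combination acts as a conjunctive constraint and the co-occurring tags are offered for further refinement, can be sketched roughly as follows. This is only an illustration under assumed names: the function `browse`, the in-memory dictionary layout, and the sample objects are hypothetical and are not taken from My Web 2.0.

```python
from collections import Counter


def browse(objects, selected):
    """Return (matching objects, related tags) for a selected tag combination.

    objects  -- dict mapping an object id to the set of tags on that object
                (an assumed in-memory layout, for illustration only)
    selected -- iterable of tags that every returned object must carry
    """
    selected = set(selected)
    # Tags act as logical constraints: keep objects carrying all selected tags.
    matches = {obj for obj, tags in objects.items() if selected <= tags}
    # Related tags are those that co-occur with the selection on the matches.
    related = Counter()
    for obj in matches:
        related.update(objects[obj] - selected)
    return matches, related


# Hypothetical sample data mirroring the examples in Section 2.
objects = {
    "url1": {"folksonomy", "tagging"},
    "url2": {"folksonomy", "ontology"},
    "url3": {"tagging", "2006", "calendar"},
    "url4": {"2006", "news"},
}

matches, related = browse(objects, {"folksonomy"})
narrowed, _ = browse(objects, {"2006", "calendar"})  # stronger constraint
```

Adding a tag to the selection can only shrink the result set (refinement), and dropping one can only grow it (generalization), which matches the strengthening and weakening of constraints described in the text.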

Figure 3 shows the distribution of the number of tags versus the number of Web objects tagged with the corresponding number of tags. From the figure, we can observe that 92% of Web objects are labeled with 5 or fewer tags, and 79% with 3 or fewer tags. The figure demonstrates that our least-effort criterion will be acceptable to most users.

3.3 Collaborative Tag Suggestions
Our tag suggestion algorithm takes the above criteria into consideration. First, it favors tags that are used by a large number of people (with good reputation). Second, it aims to minimize the overlap of concepts among the suggested tags to allow for high coverage of multiple facets. Third, it honors the high correlation among tags, e.g., if the tags ajax and javascript tend to be used together by most users for a given object, they should co-occur in our suggested tags. We first introduce some basic concepts and notations before presenting our tag suggestion algorithm:

• Ps(ti|tj;o) --- The probability that an object o is tagged with ti given that it is already tagged with tj by the same user. For the given object o, one way to measure such correlation between ti and tj is to divide the number of people who have tagged o with both ti and tj by the number of people who have tagged it with tj. Our algorithm honors such correlation when suggesting tags.

• Pa(ti|tj) --- The probability that any object is tagged with ti, given that it is already tagged with tj by any user. Such correlation can be measured as the number of people who have used both ti and tj over the number of people who have used tj. This probability indicates the overlap in terms of the concepts between ti and tj. To ensure that the suggested tags cover multiple facets, our algorithm attempts to minimize the overlap of the concepts identified by the suggested tags.

• S(t,o) --- The goodness measure (score) of the tag t for an object o. We use the sum of the authority scores of all users who have assigned tag t to the object o. In the simple case, we assign a uniform authority score of 1.0 to every user.

• C(t) --- The coverage of tag t, defined as the number of different objects tagged by t with some dampening. In practice, the goodness measure can be enhanced by accounting for the coverage of a tag. The wider the coverage, the less specific the tag is to a given object. This is analogous to TF*IDF used in traditional information retrieval.

The basic idea of our algorithm is to iteratively select the tags with the highest additional contribution, measured by S(t,o), to the already selected tag set. S(t,o) is initialized to the sum of the authority scores (of all users who have assigned tag t to object o) multiplied by the inverse of C(t). In the remainder of the paper, we ignore C(t) for simplicity of presentation. At each step, after a tag ti is selected, we adjust the score for each remaining tag t' as follows:

• Penalize tag t' by removing the redundant information, e.g., by subtracting Pa(t'|ti)*S(ti,o) from S(t',o), i.e.,

    S(t',o) = S(t',o) - Pa(t'|ti)*S(ti,o)

  This minimizes the overlap of the concepts identified by the suggested tags.

• Reward tag t' if it co-occurs with the selected tag ti when users tag object o.

    S(t',o) = S(t',o) + Ps(t'|ti;o)*S(ti,o)

  A user is not likely to tag a given URL using tags that are syntactic variances, e.g., blogs, blogging,
and blog. This rewarding mechanism also improves the uniformity of the suggested tags.

This simple principle ensures that the suggested tag combination has a good balance between coverage and popularity.

The algorithm is summarized in Table 1. T is the set of tags assigned to a given object o by all users. The algorithm suggests a pre-specified number K of tags for object o to users based on the tags in T. The suggested tags are stored in R.

Table 1. Basic Algorithm

    R = {};  // result tag set
    T = all the tags assigned to object o by all users;
    X = a set of excluded tags;
    K = pre-specified maximum number of suggested tags;
    T = T – X;
    Compute S(t,o) for each t in T;

    While (T ≠ empty AND |R| < K) {
        // find the tag with the highest additional contribution
        ti = the tag in T such that S(ti,o) ≥ S(tj,o) for all tj∈T, j≠i;

        // remove the chosen tag from T
        T = T - {ti};

        // adjust the additional contribution of the remaining tags
        foreach tag t'∈T {
            S(t',o) = S(t',o) – Pa(t'|ti)*S(ti,o) + Ps(t'|ti;o)*S(ti,o);
        }

        // record the chosen tag
        R = R ∪ {ti};
    }

Note that we have adopted a greedy approach to penalize and reward the tag score because of its efficiency, which is important for dealing with Web-scale data. Other more sophisticated algorithms are under investigation.

3.4 Tag Spam Elimination
As tagging becomes more and more popular, tag spam could become a serious problem. In order to combat tag spam, we introduce an authority score (or reputation score) for each user. The authority score measures how well each user has tagged in the past. This can be modeled as a voting problem. Each time a user votes correctly (consistently with the majority of other users), the user gets a higher authority score; the user gets a lower score with more bad votes.

Let a(u) be the authority score of a given user u. As we have mentioned before, the goodness measure of a (tag, object) pair is the sum of the authority scores of all users who have tagged the object with the tag, that is

    S(t,o) = \sum_{u \in user(t,o)} a(u)        (1)

Here user(t,o) denotes the set of users who have tagged a given object o with the tag t.

One simple way to measure the authority of a user is to assign the authority score of the user according to the average quality of this user’s tags (see Equation (2)).

    a(u) = \frac{\sum_{o \in object(u)} \sum_{t \in tag(o,u)} S(t,o)}{\sum_{o \in object(u)} |tag(o,u)|}        (2)

In Equation (2), object(u) is the set of objects tagged by the user u, and tag(o,u) denotes the set of tags assigned to object o by user u. Equation (2) measures the average quality of a given user’s tags. The authority score a(u) can be computed via an iterative algorithm similar to HITS [7]. Initially, we can set the weight of each user to be the same, e.g., 1.0.

The above formula treats heavy users the same way as light users. It does not distinguish people who introduce original tags from those who follow in the steps of others. People who introduce original and high-quality tags should be assigned higher authority than those who follow, and similarly for people who are heavy users of the system. One way to handle this is to give the user who introduces an original tag some bonus credit each time the tag is reinforced by another user.

If a tagging application also allows users to rate other users or tagged objects, as in many open rating systems [4][5], the authority score from such open rating systems can be incorporated into our collaborative tag suggestion algorithm.

3.5 Content-based Tag Suggestions
In addition to using tags entered by real end users as a source for tag suggestion, we can also suggest content-based (and context-based) tags based on analysis and classification of the tagged content and context. This not only solves the cold start problem, but also increases the tag quality of those objects that are less popular.

One simple way to incorporate auto-generated tags is to introduce a virtual user and assign an authority score to this user. The auto-generated tags are then attributed to this virtual user. The algorithm described in Table 1 remains
intact. This mechanism allows us to incorporate multiple sources of tag suggestions under the same framework.

3.6 Tag Normalization
Collapsing syntactic variances of the same term can fit in the same algorithmic framework, for instance, by computing the bi-grams (shingles of two characters [1]) of the tags in the currently chosen tag set; call this set of bi-grams C. To adjust the additional contribution of another tag, we compute the set of bi-grams (S) of the tag. The additional contribution of the tag can be computed by multiplying its current value by the following factor: 1 - |S ∩ C|/|S|. Other techniques for improving tag uniformity include stemming, edit distance, thesauri, etc.

3.7 Temporal Tags
Tags are often time sensitive, e.g., due to recent events such as Katrina, shifting user interests, or announcements of new products. In My Web 2.0 we have seen a lot of such tags, like iTune and ajax. Thus, a higher weight can be assigned to more recent tags than to those introduced a long time ago.

3.8 Adjustments
Our algorithm considers a variety of factors simultaneously. Ideally, we would like to train our algorithm by adjusting the parameters, e.g., (i) by dampening the tag coverage score, and (ii) by adding coefficients to the penalizing and rewarding forces. What is interesting to speculate is that as an object is being tagged by more people, the penalizing and rewarding forces start to reflect more in the goodness measure.

4. EXAMPLES
To see how effective our algorithm is, we use the URL http://wiki.osfoundation.org/bin/view/Projects/AjaxLibraries (saved in My Web 2.0) as an example. We compare several cases and show how the forces of penalty and reward interact. As a base case, we suggest tags by using the S score alone without penalty and reward adjustments. The suggested tags are listed in the first column in Table 2.

Table 2. Suggested Tags for the URL http://wiki.osfoundation.org/bin/view/Projects/AjaxLibraries

    Rank | Base case    | Pa           | Ps           | Pa AND Ps    | Pa AND Ps AND Syntactic Variance Elimination
    1    | ajax         | ajax         | ajax         | ajax         | ajax
    2    | javascript   | library      | javascript   | javascript   | javascript
    3    | library      | ajax library | programming  | library      | library
    4    | ajax library | development  | webdev       | programming  | programming
    5    | development  | javascript   | development  | ajax library | development
    6    | programming  | programming  | reference    | development  | reference
    7    | webdev       | reference    | library      | webdev       | webdev
    8    | reference    | webdev       | ajax library | reference    | ajax library

In the second case, we consider the penalty adjustment in the column labeled Pa. In this case, javascript and webdev are pushed down in the list. This is due to the relatively large overlap between ajax and javascript and the overlap between ajax and webdev. In our system, Pa(javascript|ajax) = 0.37 and Pa(webdev|ajax) = 0.22.

In the third case (see the third column of Table 2), we consider the rewarding mechanism without factoring in penalties. As a result, the tags programming and webdev are pulled higher up in the list due to high Ps values, where Ps(programming|ajax) = 0.31 and Ps(webdev|ajax) = 0.26 respectively. Users who have tagged the URL with ajax also tagged it with programming or webdev.

The next experiment shows the results of the interaction between the forces of penalty and reward. The results are shown in the fourth column of Table 2. We observe that the joint force pulls the tag programming up but pushes the tag ajax library down.

If we need to suggest four tags to users, these tags would be ajax, javascript, library, and programming. We can see that this tag combination includes three fairly orthogonal facets: javascript, library, and programming. At the same time, it also honors the popular demand of users to include ajax along with javascript.

In the last column of Table 2, we show results with syntactic variance elimination, which pushes the redundant phrase ajax library to the bottom of our list. The order of the tags being suggested is also meaningful. What is more important to note is the intricate balance between the forces of reward and penalty.

Table 3 shows more examples of tag suggestions for URLs with variable popularity. We observe that the tags suggested by our algorithm both have a good facet mix and are fairly indicative of the target objects.
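
The greedy reward-penalty selection summarized in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the My Web 2.0 implementation: the function name `suggest_tags`, the `(user, object, tag)` triple layout, and the sample data are assumptions. C(t) is ignored, as in the presentation above. Pa is estimated here from users who have used both tags on any objects, which is one possible reading of the definition, and ties are broken alphabetically for determinism.

```python
from collections import defaultdict


def suggest_tags(assignments, obj, k, authority=None, excluded=frozenset()):
    """Greedy tag suggestion for `obj`, following the outline of Table 1.

    assignments -- iterable of (user, object, tag) triples (assumed layout)
    authority   -- optional dict user -> a(u); missing users default to 1.0
    excluded    -- the set X of tags that must never be suggested
    """
    authority = authority or {}
    on_obj = defaultdict(set)    # tag -> users who tagged `obj` with it
    anywhere = defaultdict(set)  # tag -> users who used it on any object
    for user, o, tag in assignments:
        anywhere[tag].add(user)
        if o == obj:
            on_obj[tag].add(user)

    def p_s(t, ti):  # Ps(t|ti;o): co-occurrence on this object
        return len(on_obj[t] & on_obj[ti]) / len(on_obj[ti])

    def p_a(t, ti):  # Pa(t|ti): overlap across all objects (assumed reading)
        return len(anywhere[t] & anywhere[ti]) / len(anywhere[ti])

    # S(t,o): sum of the authority scores of the users who applied t to obj.
    score = {t: sum(authority.get(u, 1.0) for u in users)
             for t, users in on_obj.items() if t not in excluded}

    result = []
    while score and len(result) < k:
        ti = max(sorted(score), key=score.get)  # highest remaining score
        s_ti = score.pop(ti)                    # T = T - {ti}
        for t in score:                         # penalize, then reward
            score[t] += (p_s(t, ti) - p_a(t, ti)) * s_ti
        result.append(ti)                       # R = R ∪ {ti}
    return result


# Hypothetical sample data: four users, two objects.
assignments = [
    ("u1", "o1", "ajax"), ("u1", "o1", "javascript"),
    ("u2", "o1", "ajax"), ("u2", "o1", "javascript"),
    ("u3", "o1", "ajax"), ("u3", "o1", "library"),
    ("u4", "o1", "library"), ("u4", "o1", "programming"),
    ("u3", "o2", "javascript"), ("u4", "o2", "ajax"),
]
top3 = suggest_tags(assignments, "o1", 3)
```

Because the penalty and the reward are both scaled by S(ti,o), they can be folded into a single update per remaining tag, so each of the K rounds costs time proportional to the number of remaining tags.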
Table 3. Tags suggested for URLs with varying popularity

    URL                                             | Suggested Tags
    http://maps.yahoo.com/                          | maps, yahoo, directions, reference, map
    http://www.php.net/                             | php, programming, opensource, php home page, development
    http://sourceforge.net/                         | open source, download, applications, programming, projects
    http://code.google.com/                         | google, api, code, opensource, programming
    http://delicious.mozdev.org/                    | firefox, del.icio.us, extension, tags, tools
    http://www.apple.com/                           | apple, mac, computer, ipod, itunes
    http://azureus.sourceforge.net/                 | bittorrent, software, p2p, java, windows
    http://blogs.law.harvard.edu/tech/rss           | rss, specification, xml, rss-learning, web design
    http://eventful.com/                            | calendar, events, web2.0, community, tags
    http://hymn-project.org/                        | itunes, ipod, aac, mp3, kickass
    http://hype.non-standard.net/                   | music, mp3, blog, audio, aggregator
    http://del.icio.us/                             | bookmark, del.icio.us, tagging, social, blog
    http://digg.com/                                | digg, news, daily, aggregator, rss
    http://en.wikipedia.org/wiki/Main_Page          | encyclopedia, reference, wiki, knowledge, research
    http://johnvey.com/features/deliciousdirector/  | del.icio.us, ajax, javascript, tools, xml
    http://maps.google.com/                         | maps, google, satellite, directions, search
    http://myweb2.search.yahoo.com/                 | my web, yahoo, bookmarks, search, beta
    http://next.yahoo.com/                          | yahoo, betas, next, 1 varios technologia, search
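
The mutual dependence between Equation (1) and Equation (2) of Section 3.4 can be resolved by alternating the two updates, in the spirit of the HITS-like iteration mentioned there. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name `authority_scores`, the fixed number of rounds, and in particular the rescaling of scores to a mean of 1.0 after each round (added so that the values stay bounded) are choices made here, not details given in the paper.

```python
from collections import defaultdict


def authority_scores(assignments, rounds=20):
    """Alternate Equations (1) and (2) starting from a(u) = 1.0 for all users.

    assignments -- iterable of (user, object, tag) triples (assumed layout)
    """
    pairs_by_user = defaultdict(set)  # u -> {(t, o)} tagged by u
    users_by_pair = defaultdict(set)  # (t, o) -> user(t, o)
    for user, obj, tag in assignments:
        pairs_by_user[user].add((tag, obj))
        users_by_pair[(tag, obj)].add(user)

    a = {u: 1.0 for u in pairs_by_user}
    if not a:
        return a
    for _ in range(rounds):
        # Equation (1): S(t,o) is the summed authority of the users of (t,o).
        s = {pair: sum(a[u] for u in users)
             for pair, users in users_by_pair.items()}
        # Equation (2): a(u) is the average S over the user's (t,o) pairs.
        a = {u: sum(s[p] for p in pairs) / len(pairs)
             for u, pairs in pairs_by_user.items()}
        # Assumed normalization step: rescale so the mean authority is 1.0.
        mean = sum(a.values()) / len(a)
        a = {u: v / mean for u, v in a.items()}
    return a


# Hypothetical data: three users agree on a tag, one user adds an outlier tag.
votes = [
    ("u1", "o1", "ajax"), ("u2", "o1", "ajax"), ("u3", "o1", "ajax"),
    ("spammer", "o1", "cheap-pills"),
]
scores = authority_scores(votes)
```

In this toy run the three agreeing users reinforce one another, while the lone outlier's authority shrinks round by round, which is the voting behaviour the section describes.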

5. CONCLUSIONS
The pull model widely adopted by search engines uses limited semantic information about Web content. This makes it hard to personalize search results, detect spam, and discover new or dynamic content. A push model that directly takes inputs from end users has the potential to address these problems. Tagging allows users to assign keywords to Web objects for sharing, discovering, and recovering them. It allows ranking and data organization to utilize metadata from individual users directly, and brings some benefits of the semantic Web into the current HTML-dominated Web.

Since tags are created by individual users in a free form, one important problem facing tagging is to identify the most appropriate tags while eliminating noise and spam. We advocate using the collective wisdom of Web users to suggest tags for Web objects. We discussed the basic criteria for a good tagging system and proposed a collaborative algorithm for suggesting tags that meet these criteria. Our preliminary experience shows that a simple embodiment of such an algorithm is effective. In the future, we plan to make the following improvements.

• Improve the tag browsing experience by applying the same principles in constructing tag clouds, e.g., by presenting tags with a good facet mix while considering popularity and user interests. At a high level, we will investigate how to bridge the gap between taxonomy and faceted systems to get the best of both worlds.

• Develop metrics to quantitatively measure the quality of suggested tags, and study how tag suggestion can help to facilitate convergence of tag vocabulary.

• Introduce automatically generated content-based tags and also consider the time-sensitivity of tags. This addresses the cold start problem as well as the evolution of concepts and user interests over time.

• Improve tag uniformity by normalizing semantically similar tags that are not similar in letters. The bi-gram method cannot achieve this. This would require incorporating certain linguistic analysis features.

• Using voting and existing tags alone may prevent new high-quality tags from emerging, which in turn can make content discovery harder. In practice, we can do the following to avoid this limitation. (i) We could give new users bootstrapping time to establish their reputation. (ii) Rather than relying only on the tags assigned to a given object, we should also consider the tags across similar objects identified by clustering. (iii) We should allow tags that the algorithm assigns a low score to have an opportunity to be judged by users. To do so, we can separate tags into buckets with different score ranges and display tags from each bucket. Thus, we get users’ feedback on tags that are identified by the algorithm as having low quality.

• We are in the process of incorporating the full algorithm into My Web 2.0. Part of the challenge is to handle Internet-scale data and Yahoo-scale users.
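
The bi-gram factor of Section 3.6, 1 - |S ∩ C|/|S|, and the limitation noted in the improvements above can be illustrated with a short sketch. The helper names `bigrams` and `novelty_factor` are hypothetical, lower-casing is an added assumption, and C is taken here to be the union of the bi-grams of all tags chosen so far.

```python
def bigrams(text):
    """Set of two-character shingles of a tag (lower-cased, an assumption)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def novelty_factor(tag, chosen_tags):
    """Multiplier 1 - |S ∩ C| / |S| applied to a tag's additional contribution.

    S is the bi-gram set of `tag`; C is the union of the bi-gram sets of the
    tags already chosen. A factor near 0 means the tag is mostly a syntactic
    variant of what has been chosen; 1.0 means no shared bi-grams at all.
    """
    s = bigrams(tag)
    if not s:  # single-character tags have no bi-grams
        return 1.0
    c = set()
    for t in chosen_tags:
        c |= bigrams(t)
    return 1.0 - len(s & c) / len(s)


variant = novelty_factor("blogs", ["blog"])                     # 1 - 3/4
phrase = novelty_factor("ajax library", ["ajax", "library"])    # 1 - 9/11
synonym = novelty_factor("mobile-phone", ["cell-phone"])        # 1 - 5/11
unrelated = novelty_factor("programming", ["ajax"])             # no overlap
```

The phrase ajax library is discounted heavily once ajax and library are chosen, consistent with the last column of Table 2, whereas the synonym pair is discounted only through the shared substring "-phone". Synonyms with no letters in common would not be discounted at all, which is why the text calls for linguistic analysis beyond bi-grams.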
6. ACKNOWLEDGMENTS
Many thanks to Caterina Fake, Hao Xu, Adrienne Basset, Tom Chi, Chung-Man Tam, Ken Norton, Nathan Arnold, Chad Norwood, and David Rout for many helpful discussions.

7. REFERENCES
[1] Broder, A. Z. “On the resemblance and containment of documents.” In Proceedings of the Compression and Complexity of Sequences, June, 1997.
[2] Dvorak, John C. “To Tag or Not To Tag, That Is the Question.” PC Magazine, (http://www.pcmag.com/article2/0,1759,1819101,00.asp), 2005.
[3] Golder, Scott A., Huberman, Bernardo A. “The Structure of Collaborative Tagging Systems.” HPL Technical Report. 2005.
[4] Guha, R. “Open Rating Systems.” Proceedings of the 1st workshop on Friends of a Friend, Social Networking and the Semantic Web, 2004.
[5] Guha, R., Kumar R., Raghavan P., and Tomkins A. “Propagation of trust and distrust.” In Proceedings of the Thirteenth International World Wide Web Conference, 2004.
[6] “Interview on tagging with Jon Lebkowsky and Clay Shirky.” http://adam.easyjournal.com/entry.aspx?eid=2632426, July 28, 2005.
[7] Kleinberg, J. “Authoritative sources in a hyperlinked environment.” Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
[8] Shirky, C. “Ontology is Overrated: Categories, Links, and Tags.” In Economics & Culture, Media & Community. (http://www.shirky.com/writings/ontology_overrated.html), 2005.
[9] Xu, Z., Karlsson, M., Tang, C., and Karamanolis C. “Towards a Semantic-Aware File Store.” 9th Workshop on Hot Topics in Operating Systems (HotOS IX). May 18-21, 2003.
[10] Delicious. http://del.icio.us/
[11] Flickr. http://www.flickr.com/
[12] My Web 2.0. http://myweb2.search.yahoo.com/
[13] “OSAF wiki.Journal.HierarchyVersusFacets.” http://wiki.osafoundation.org/bin/view/Journal/HierarchyVersusFacetsVersusTags?skin=print, 2005.
[14] Rawsugar. http://www.rawsugar.com/
[15] Shadows. http://www.shadows.com/
