Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are
often used to create effective visualizations of text. While prior work in HCI and visualization has proposed
a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive
terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human
judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611
responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics
of human-generated keyphrases, including phrase length, commonness, position, and part of speech. Next,
we systematically assess the contribution of each feature within statistical models of keyphrase quality.
We then introduce a method for grouping similar terms and varying the specificity of displayed phrases so
that applications can select phrases dynamically based on the available screen space and current context
of interaction. Precision-recall measures find that our technique generates keyphrases that match those
selected by human judges. Crowdsourced ratings of tag cloud visualizations rank our approach above other
automatic techniques. Finally, we discuss the role of HCI methods in developing new algorithmic techniques
suitable for user-facing applications.
Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems
General Terms: Human Factors
Additional Key Words and Phrases: Keyphrases, visualization, interaction, text summarization
ACM Reference Format:
Chuang, J., Manning, C. D., and Heer, J. 2012. “Without the clutter of unimportant words”: Descriptive
keyphrases for text visualization. ACM Trans. Comput.-Hum. Interact. 19, 3, Article 19 (October 2012), 29
pages.
DOI = 10.1145/2362364.2362367 http://doi.acm.org/10.1145/2362364.2362367
1. INTRODUCTION
Document collections, from academic publications to blog posts, provide rich sources
of information. People explore these collections to understand their contents, uncover
patterns, or find documents matching an information need. Keywords (or keyphrases)
aid exploration by providing summary information intended to communicate salient
aspects of one or more documents. Keyphrase selection is critical to effective visualiza-
tion and interaction, including automatically labeling documents, clusters, or themes
[Havre et al. 2000; Hearst 2009]; choosing salient terms for tag clouds or other text
visualization techniques [Collins et al. 2009; Viégas et al. 2006, 2009]; or summarizing
text to support small display devices [Yang and Wang 2003; Buyukkokten et al. 2000,
This work is part of the Mimir Project conducted at Stanford University by Daniel McFarland, Dan Jurafsky,
Christopher Manning, and Walter Powell. This project is supported by the Office of the President at Stanford
University, the National Science Foundation under Grant No. 0835614, and the Boeing Company.
Authors’ addresses: J. Chuang, C. D. Manning, and J. Heer, 353 Serra Mall, Stanford, CA 94305;
emails: {jcchuang, manning, jheer}@cs.stanford.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this
work in other works requires prior specific permission and/or a fee. Permissions may be requested from
Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212)
869-0481, or [email protected].
© 2012 ACM 1073-0516/2012/10-ART19 $15.00
DOI 10.1145/2362364.2362367 http://doi.acm.org/10.1145/2362364.2362367
ACM Transactions on Computer-Human Interaction, Vol. 19, No. 3, Article 19, Publication date: October 2012.
2002]. While terms hand-selected by people are considered the gold standard, manually
assigning keyphrases to thousands of documents simply does not scale.
To aid document understanding, keyphrase extraction algorithms select descriptive
phrases from text. A common method is bag-of-words frequency statistics [Laver et al.
2003; Monroe et al. 2008; Rayson and Garside 2000; Robertson et al. 1981; Salton and
Buckley 1988]. However, such measures may not be suitable for short texts [Boguraev
and Kennedy 1999] and typically return single words, rather than more meaning-
ful longer phrases [Turney 2000]. While others have proposed methods for extracting
longer phrases [Barker and Cornacchia 2000; Dunning 1993; Evans et al. 2000; Hulth
2003; Kim et al. 2010; Medelyan and Witten 2006], researchers have yet to systemat-
ically evaluate the contribution of individual features predictive of keyphrase quality
and often rely on assumptions—such as the presence of a reference corpus or knowledge
of document structure—that are not universally applicable.
In this article, we characterize the statistical and linguistic properties of human-
generated keyphrases. Our analysis is based on 5,611 responses from 69 students de-
scribing Ph.D. dissertation abstracts. We use our results to develop a two-stage method
for automatic keyphrase extraction. We first apply a regression model to score candi-
date keyphrases independently; we then group similar terms to reduce redundancy
and control the specificity of selected phrases. Through this research, we investigate
the following concerns.
Reference Corpora. HCI researchers work with text from various sources, including
data whose domain is unspecified or in which a domain-specific reference corpus is
unavailable. We examine several frequency statistics and assess the trade-offs of se-
lecting keyphrases with and without a reference corpus. While models trained on a
specific domain can generate higher-quality phrases, models incorporating language-
level statistics in lieu of a domain-specific reference corpus produce competitive results.
Document Diversity. Interactive systems may need to show keyphrases for a col-
lection of documents. We compare descriptions of single documents and of multiple
documents with varying levels of topical diversity. We find that increasing the size or
diversity of a collection reduces the length and specificity of selected phrases.
Feature Complexity. Many existing tools select keyphrases solely using raw term
counts or tf.idf scores [Salton and Buckley 1988], while recent work [Collins et al. 2009;
Monroe et al. 2008] advocates more advanced measures, such as G2 statistics [Dunning
1993; Rayson and Garside 2000]. We find that raw counts or tf.idf alone provide poor
summaries but that a simple combination of raw counts and a term’s language-level
commonness matches the improved accuracy of more sophisticated statistics. We also
examine the impact of features such as grammar and position information; for example,
we find that part-of-speech tagging provides significant benefits, while more costly
statistical parsing provides little additional improvement.
Term Similarity and Specificity. Multiword phrases identified by an extraction al-
gorithm may contain overlapping terms or reference the same entity (person, place,
etc). We present a method for grouping related terms and reducing redundancy. The
resulting organization enables users to vary the specificity of displayed terms and al-
lows applications to dynamically select terms in response to available screen space.
For example, a keyphrase label might grow longer and more specific through semantic
zooming.
We assess our resulting extraction approach by comparing automatically and manu-
ally selected phrases and via crowdsourced ratings. We find that the precision and recall
of candidate keyphrases chosen by our model can match that of phrases hand-selected
by human readers. We also apply our approach to tag clouds as an example of real-world
presentation of keyphrases. We asked human judges to rate the quality of tag clouds
using phrases selected by our technique and unigrams selected using G2. We find that
raters prefer the tag clouds generated by our method and identify other factors such
as layout and prominent errors that affect judgments of keyphrase quality. Finally, we
conclude the article by discussing the implications of our research for human-computer
interaction, information visualization, and natural language processing.
2. RELATED WORK
Our research is informed by prior work in two surprisingly disjoint domains: (1) text
visualization and interaction and (2) automatic keyphrase extraction.
2.1. Text Visualization and Interaction
Many text visualization systems use descriptive keyphrases to summarize text or label
abstract representations of documents [Cao et al. 2010; Collins et al. 2009; Cui et al.
2010; Havre et al. 2000; Hearst 2009; Shi et al. 2010; Viégas et al. 2006, 2009]. One
popular way of representing a document is as a tag cloud, that is, a list of descriptive
words typically sized by raw term frequency. Various interaction techniques summarize
documents as descriptive headers for efficient browsing on mobile devices [Buyukkok-
ten et al. 2000, 2002; Yang and Wang 2003]. While HCI researchers have developed
methods to improve the layout of terms [Cui et al. 2010; Viégas et al. 2009], they have
paid less attention to methods for selecting the best descriptive terms.
Visualizations including Themail [Viégas et al. 2006] and TIARA [Shi et al. 2010]
display terms selected using variants of tf.idf (term frequency by inverse document
frequency [Salton and Buckley 1988])—a weighting scheme for information retrieval.
Rarely are more sophisticated methods from computational linguistics used. One excep-
tion is Parallel Tag Clouds [Collins et al. 2009], which weight terms using G2 [Dunning
1993], a probabilistic measure of the significance of a document term with respect to a
reference corpus.
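The tf.idf weighting these systems build on can be sketched in a few lines. The following is a minimal illustration of the general scheme (one of many variants, with smoothed idf and toy documents); it is not the scoring code of any of the systems above.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus_docs):
    """Score one term in one document: raw term frequency times a
    smoothed inverse document frequency, in the spirit of Salton & Buckley."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for d in corpus_docs if term in d)  # document frequency
    idf = math.log(len(corpus_docs) / (1 + df))
    return tf * idf

# toy corpus of three tokenized "documents"
docs = [["route", "map", "route"], ["map", "survey"], ["graph", "layout"]]
score = tfidf("route", docs[0], docs)
```

A term frequent in one document but rare across the corpus (here "route") receives a high weight; a term appearing everywhere is discounted.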
Other systems, including Jigsaw [Stasko et al. 2008] and FacetAtlas [Cao et al. 2010],
identify salient terms by extracting named entities, such as people, places, and dates
[Finkel et al. 2005]. These systems extract specific types of structured data but may
miss other descriptive phrases. In this article, we first score phrases independent of
their status as entities but later apply entity recognition to group similar terms and
reduce redundancy.
2.2. Automatic Keyphrase Extraction
As previously indicated, the most common means of selecting descriptive terms is via
bag-of-words frequency statistics of single words (unigrams). Researchers in natural
language processing have developed various techniques to improve upon raw term
counts, including removal of frequent “stop words,” weighting by inverse document
frequency as in tf.idf [Salton and Buckley 1988] and BM25 [Robertson et al. 1981],
heuristics such as WordScore [Laver et al. 2003], or probabilistic measures [Kit and
Liu 2008; Rayson and Garside 2000] and the variance-weighted log-odds ratio [Monroe
et al. 2008]. While unigram statistics are popular in practice, there are two causes for
concern.
First, statistics designed for document retrieval weight terms in a manner that
improves search effectiveness, and it is unclear whether the same terms provide good
summaries for document understanding [Boguraev and Kennedy 1999; Collins et al.
2009]. For decades, researchers have anecdotally noted that the best descriptive terms
are often neither the most frequent nor infrequent terms, but rather mid-frequency
terms [Luhn 1958]. In addition, frequency statistics often require a large reference
corpus and may not work well for short texts [Boguraev and Kennedy 1999]. As a result,
it is unclear which existing frequency statistics are best suited for keyphrase extraction.
Second, the set of good descriptive terms usually includes multiword phrases as well
as single words. In a survey of journals, Turney [2000] found that unigrams account
for only a small fraction of human-assigned index terms. To allow for longer phrases,
Dunning proposed modeling words as binomial distributions using G2 statistics to
identify domain-specific bigrams (two-word phrases) [Dunning 1993]. Systems such as
KEA++ or Maui use pseudo-phrases (phrases that remove stop words and ignore word
ordering) for extracting longer phrases [Medelyan and Witten 2006]. Hulth considered
all trigrams (phrases up to length of three words) in her algorithm [2003]. While the
inclusion of longer phrases may allow for more expressive keyphrases, systems that per-
mit longer phrases can suffer from poor precision and meaningless terms. The inclusion
of longer phrases may also result in redundant terms of varied specificity [Evans et al.
2000], such as “visualization,” “data visualization,” and “interactive data visualization.”
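Dunning's G2 mentioned above is a log-likelihood ratio computed over a 2x2 contingency table of bigram counts. The sketch below uses hypothetical counts and the standard entropy-sum formulation; real implementations differ in how the table is constructed from the corpus.

```python
import math

def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G2) for a 2x2 contingency table:
    k11 = count(w1 w2), k12 = count(w1 ~w2),
    k21 = count(~w1 w2), k22 = count(~w1 ~w2)."""
    total = k11 + k12 + k21 + k22
    def h(*ks):  # sum of k * ln(k), skipping empty cells
        return sum(k * math.log(k) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)          # cells
                - h(k11 + k12, k21 + k22)      # row sums
                - h(k11 + k21, k12 + k22)      # column sums
                + h(total))                    # grand total

# hypothetical counts for a candidate bigram in a small corpus
score = g2(k11=30, k12=70, k21=20, k22=9880)
```

G2 is zero when the two words are statistically independent and grows as their co-occurrence departs from what chance would predict.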
Researchers have taken several approaches to ensure that longer keyphrases are
meaningful and that phrases of the appropriate specificity are chosen. Many ap-
proaches [Barker and Cornacchia 2000; Daille et al. 1994; Evans et al. 2000; Hulth
2003] filter candidate keyphrases by identifying noun phrases using a part-of-speech
tagger or a parser. Of note is the use of so-called technical terms [Justeson and Katz
1995] that match regular expression patterns over part-of-speech tags. To reduce redun-
dancy, Barker and Cornacchia [2000] choose the most specific keyphrase by eliminating
any phrases that are a subphrase of another. Medelyan and Witten’s KEA++ system
[2006] trains a naïve Bayes classifier to match keyphrases produced by professional
indexers. However, all existing methods produce a static list of keyphrases and do not
account for task- or application-specific requirements.
Recently, the Semantic Evaluation (SemEval) workshop [Kim et al. 2010] held a
contest comparing the performance of 21 keyphrase extraction algorithms over a corpus
of ACM Digital Library articles. The winning entry, named HUMB [Lopez and Romary
2010], ranks terms using bagged decision trees learned from a combination of features,
including frequency statistics, position in a document, and the presence of terms in
ontologies (e.g., MeSH, WordNet) or in anchor text in Wikipedia. Moreover, HUMB
explicitly models the structure of the document to preferentially weight the abstract,
introduction, conclusion, and section titles. The system is designed for scientific articles
and intended to provide keyphrases for indexing digital libraries.
The aims of our current research are different. Unlike prior work, we seek to system-
atically evaluate the contributions of individual features to keyphrase quality, allowing
system designers to make informed decisions about the trade-offs of adding potentially
costly or domain-limiting features. We have a particular interest in developing methods
that are easy to implement, computationally efficient, and make minimal assumptions
about input documents.
Second, our primary goal is to improve the design of text visualization and interaction
techniques, not the indexing of digital libraries. This orientation has led us to develop
techniques for improving the quality of extracted keyphrases as a whole, rather than
just scoring terms in isolation (cf., [Barker and Cornacchia 2000; Turney 2000]). We
propose methods for grouping related phrases that reduce redundancy and enable
applications to dynamically tailor the specificity of keyphrases. We also evaluate our
approach in the context of text visualization.
3. CHARACTERIZING HUMAN-GENERATED KEYPHRASES
To better understand how people choose descriptive keyphrases, we compiled a corpus
of phrases manually chosen by expert and non-expert readers. We analyzed this corpus
to assess how various statistical and linguistic features contribute to keyphrase quality.
3.1.2. Independent Factors. We varied the following three independent factors in the user
study.
Familiarity. We considered a subject familiar with a topic if they had conducted
research in the same discipline as the presented text. We relied on self-reports to
determine subjects’ familiarity.
Document count. Participants were asked to summarize the content of either a single
document or three documents as a group. In the case of multiple documents, we used
three dissertations supervised by the same primary advisor.
Topic diversity. We measured the similarity between two documents using the cosine
of the angle between tf.idf term vectors. Our experimental setup provided sets of three
documents with either low or high topical similarity.
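The cosine measure used for topic diversity can be sketched directly over sparse term-weight vectors. The tf.idf weights below are hypothetical, not values from our corpus.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse tf.idf vectors,
    represented as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical tf.idf vectors for two dissertation abstracts
a = {"visualization": 2.1, "route": 1.3, "map": 0.9}
b = {"visualization": 1.8, "graph": 1.1}
similarity = cosine(a, b)
```

Similarity of 1 indicates identical term distributions; values near 0 indicate topically diverse documents.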
3.1.3. Dependent Statistical and Linguistic Features. To analyze responses, we computed
the following features for the documents and subject-authored keyphrases. We use
“term” and “phrase” interchangeably. Term length refers to the number of words in a
phrase; an n-gram is a phrase consisting of n words.
Documents are the texts we showed to subjects, while responses are the provided
summary keyphrases. We tokenize text based on the Penn Treebank standard [Marcus
et al. 1993] and extract all terms of up to length five. We record the position of each
phrase in the document as well as whether or not a phrase occurs in the first sen-
tence. Stems are the roots of words with inflectional suffixes removed. We apply light
stemming [Minnen et al. 2001] which removes only noun and verb inflections (such as
plural s) according to a word’s part of speech. Stemming allows us to group variants of
a term when counting frequencies.
Term frequency (tf ) is the number of times a phrase occurs in the document (docu-
ment term frequency), in the full dissertation corpus (corpus term frequency), or in all
English webpages (Web term frequency), as indicated by the Google Web n-gram corpus
[Brants and Franz 2006]. We define term commonness as the normalized term fre-
quency relative to the most frequent n-gram, either in the dissertation corpus or on the
Web. For example, the commonness of a unigram equals log(tf)/log(tf_the), where tf_the
is the frequency of "the"—the most frequent unigram. When distinctions are needed,
we refer to the former as corpus commonness and the latter as Web commonness.
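This definition translates directly to code. The counts below are hypothetical stand-ins for web-scale frequencies; the actual values come from the Google Web n-gram corpus.

```python
import math

def commonness(tf, tf_max):
    """Normalized term commonness: log frequency relative to the most
    frequent n-gram of the same length (e.g., "the" for unigrams)."""
    if tf <= 0:
        return 0.0
    return math.log(tf) / math.log(tf_max)

# hypothetical counts: a mid-frequency unigram vs. "the"
c = commonness(tf=2_500_000, tf_max=23_000_000_000)
```

The log ratio maps term frequency onto a [0, 1] scale, where 0 is an unseen term and 1 is the most common n-gram of that length.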
Term position is a normalized measure of a term’s location in a document; 0 corre-
sponds to the first word and 1 to the last. The absolute first occurrence is the minimum
position of a term (cf., [Medelyan and Witten 2006]). However, frequent terms are
more likely to appear earlier due to higher rates of occurrence. We introduce a new
feature—the relative first occurrence—to factor out the correlation between position and
frequency. Relative first occurrence (formally defined in Section 4.3.1) is the probability
that a term’s first occurrence is lower than that of a randomly sampled term with the
same frequency. This measure makes a simplistic assumption—that term positions
are uniformly distributed—but allows us to assess term position as an independent
feature.
We annotate terms that are noun phrases, verb phrases, or match technical term pat-
terns [Justeson and Katz 1995] (see Table I). Part-of-speech information is determined
using the Stanford POS Tagger [Toutanova et al. 2003]. We additionally determine
grammatical information using the Stanford Parser [Klein and Manning 2003] and
annotate the corresponding words in each sentence.
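The technical-term annotation can be sketched as a regular expression over coarse part-of-speech tags, in the spirit of Justeson and Katz. This is a simplified version: tags are assumed pre-mapped to single letters (A = adjective, N = noun, P = preposition), and unlike the original pattern it does not enforce a minimum length of two words.

```python
import re

# Justeson-Katz-style technical term pattern over coarse POS tags:
# ((A|N)+ | (A|N)*(NP)?(A|N)*) N  -- ends in a noun, optionally
# allowing one preposition between adjective/noun runs.
TECH_TERM = re.compile(r"^((A|N)+|(A|N)*(NP)?(A|N)*)N$")

def is_technical(tags):
    """tags: sequence of coarse POS letters for a candidate phrase."""
    return TECH_TERM.match("".join(tags)) is not None

is_technical("AN")    # e.g., "interactive visualization"
is_technical("NPN")   # e.g., "degrees of freedom"
is_technical("V")     # a bare verb does not match
```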
[Figure 1 omitted: histograms of the number of keyphrases per response (5–16) for single, multiple, and diverse documents.]
Fig. 1. How many keyphrases do people use? Participants use fewer keyphrases to describe multiple documents or documents with diverse topics, despite the increase in the amount of text and topics.
[Figure 2 omitted: histograms of phrase length (1–10 words) for single, multiple, and diverse documents.]
Fig. 2. Do people use words or phrases? Bigrams are the most common. For single documents, 75% of responses contain multiple words. Unigram use increases with the number and diversity of documents.
decrease in the use of trigrams and longer terms. The prevalence of bigrams confirms
prior work [Turney 2000]. By permitting users to enter any response, our results
provide additional data on the tail end of the distribution: there is minimal gain when
assessing the quality of phrases longer than five words, which account for <5% of
responses.
Figure 3 shows the distribution of responses as a function of Web commonness. We
observe a bell-shaped distribution centered around mid-frequency, consistent with the
distribution of significant words posited by Luhn [1958]. As the number of documents
and topic diversity increases, the distribution shifts toward more common terms. We
found similar correlations for corpus commonness.
[Figure 3 omitted: histograms of term Web commonness (0–1) for single, multiple, and diverse documents.]
Fig. 3. Do people use generic or specific terms? Term commonness increases with the number and diversity of documents.
For each user-generated keyphrase, we find matching text in the reading and note
that 65% of the responses are present in the document. Considering for the rest of this
paragraph just the two-thirds of keyphrases present in the readings, the associated
positional and grammatical properties of this subset are summarized in Table II. 22%
of keyphrases occur in the first sentence, even though first sentences contain only
9% of all terms. Comparing the first occurrence of keyphrases with that of randomly
sampled phrases of the same frequency, we find that keyphrases occur earlier 56%
of the time—a statistically significant result (χ²(1) = 88, p < 0.001). Nearly two-
thirds of keyphrases found in the document are part of a noun phrase (i.e., continuous
subsequence fully contained in the phrase). Only 7% are part of a verb phrase, though
this is still statistically significant (χ²(1) = 147,000, p < 0.001). Most strikingly, over
80% of the keyphrases are part of a technical term.
In summary, our exploratory analysis shows that subjects primarily choose multi-
word phrases, prefer terms with medium commonness, and largely use phrases already
present in a document. Moreover, these features shift as the number and diversity of
documents increases. Keyphrase selection also correlates with term position, suggest-
ing we should treat documents as more than just “bags of words.” Finally, human-
selected keyphrases show recurring grammatical patterns, indicating the utility of
linguistic features.
We also assessed each model using model selection criteria (i.e., AIC, BIC). As these
scores coincide with the rankings from precision-recall measures, we omit them.
[Figure 4 omitted: four precision-recall panels — (a) frequency statistics, (b) adding term commonness, (c) adding grammatical features, (d) adding positional features — comparing G2, weighted log-odds ratio, BM25, tf.idf variants (1-, 2-, 3-, 5-grams, hierarchical), WordScore, log tf, and log tf plus commonness models.]
Fig. 4. Precision-recall curves for keyphrase regression models. Legends are sorted by decreasing initial
precision. (a) Frequency statistics only; G2 and log-odds ratio perform well. (b) Adding term commonness;
a simple combination of log(tf) and commonness performs competitively to G2. (c) Grammatical features
improve performance. (d) Positional features provide further gains for both a complete model and a simplified
corpus-independent model.
for position x ∈ [0, 1] and some normalization constant η. Suppose a term w occurs k
times in the document and its first occurrence is observed to be at position a ∈ [0, 1].
Its relative first occurrence is the cumulative probability distribution from a to 1.
Relative first occurrence of w = P( min_{i=1,…,k} w_i ≥ a ) = η ∫_a^1 (1 − x)^(k−1) dx = (1 − a)^k,
where the normalization constant is η = k.
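The (1 − a)^k closed form for relative first occurrence translates directly to code; a sketch, under the stated simplifying assumption that term positions are uniformly distributed:

```python
def relative_first_occurrence(a, k):
    """Probability that a randomly sampled term occurring k times
    (positions uniform on [0, 1]) first occurs later than position a.
    Integrating the density of the minimum from a to 1 gives (1 - a)^k."""
    return (1.0 - a) ** k

# a term seen 3 times whose first occurrence is 10% into the document
p = relative_first_occurrence(a=0.1, k=3)
```

Values near 1 indicate a term that appears unusually early given its frequency; 0.5 is what chance alone would produce.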
[Figure 5 omitted: two precision-recall panels — (a) the best-performing and corpus-independent models against human-selected phrases, (b) the corpus-independent model against the SemEval maximum, median, and minimum results.]
Fig. 5. Precision-recall curves for keyphrase regression models. Legends are sorted by decreasing initial
precision. (a) Comparison with human-selected keyphrases; our models provide higher precision at low
recall values. (b) Comparison with SemEval 2010 [Kim et al. 2010] results for 5, 10, and 15 phrases; our
corpus-independent model closely matches the median scores.
spend 10–15 minutes per paper generating keyphrases. For each class, precision and
recall were computed for the top 5, 10, and 15 keyphrases.
We used this same data to evaluate the performance of our corpus-independent
modeling approach trained on the SemEval corpus. The coefficients of our SemEval
model differ slightly from those of our Stanford dissertations model (Table V), but the
relative feature weightings remain similar, including a preference for mid-commonness
terms, a strong negative weight for high commonness, and strong weights for technical
term patterns.
Figure 5(b) compares our precision-recall scores against the distribution of
SemEval results for the combined author- and reader-assigned keyphrases. Our
corpus-independent model closely matches the median scores. Though intentionally
simplified, our approach matches or outperforms half of the contest entries. This
outcome is perhaps surprising, as competing techniques include more assumptions
and complex features (e.g., leveraging document structure and external ontologies)
and more sophisticated learning algorithms (e.g., bagged decision trees vs. logistic
regression). We believe these results argue in favor of our identified features.
4.4.3. Lexical Variation and Relaxed Matching. While we are encouraged by the results of
our precision-recall analysis, some skepticism is warranted. Up to this point, our anal-
ysis has concerned only exact matches of stemmed terms. In practice, it is reasonable
to expect that both people and algorithms will select keyphrases that do not match ex-
actly but are lexically and/or conceptually similar (e.g., “analysis” vs. “data analysis”).
How might the results change if we permit a more relaxed matching?
To gain a better sense of lexical variation among keyphrases, we analyzed the impact
of a relaxed matching scheme. We experimented with a number of matching approaches
by permitting insertion or removal of terms in phrases or re-arrangement of terms in
genitive phrases. For brevity, we report on just one simple but effective strategy: we
consider two phrases “matching” if they either match exactly or if one can induce an
exact match by adding a single word to either the beginning or the end of the shorter
phrase.
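The relaxed-matching rule described above can be sketched as a short predicate over tokenized phrases:

```python
def relaxed_match(p1, p2):
    """Two phrases match if they are identical, or if adding a single
    word to the start or end of the shorter yields the longer."""
    a, b = sorted((p1.split(), p2.split()), key=len)
    if a == b:
        return True
    if len(b) - len(a) != 1:
        return False
    return b[1:] == a or b[:-1] == a

relaxed_match("data analysis", "analysis")        # leading word added
relaxed_match("route map", "route map design")    # trailing word added
relaxed_match("route map", "interactive design")  # no match
```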
Permitting relaxed matching significantly raises the proportion of automatically ex-
tracted keyphrases that match human-selected terms. Considering just the top-ranked
term produced by our model for each document in the SemEval contest, 30.0% are exact
matches, while 75.0% are relaxed matches. Looking at the top five terms per document,
27.4% exactly match a human-selected term; permitting a relaxed match increases this
number to 64.2%. These results indicate that human-selected terms regularly differ
from our automatically extracted terms by a single leading or trailing word. This obser-
vation suggests that (a) precision-recall analysis may not reveal the whole picture and
(b) related keyphrases might vary in length but still provide useful descriptions. We
now build upon this insight to provide means for parameterizing keyphrase selection.
Fig. 6. Term grouping. The graph shows a subset of unigrams, bigrams, and trigrams considered to be
conceptually similar by our algorithm. Connected terms differ by exactly one word at the start or the end
of the longer phrase. Values in parentheses are the scores from our simplified model for the dissertation
“Visualizing Route Maps.” By default, our algorithm displays the keyphrase “route map” and suppresses
“route”, “map”, and “hand-designed route maps”. Users may choose to display a shorter word (“map”) or
longer phrase (“hand-designed route map”) to describe this document.
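The grouping illustrated in Figure 6 can be sketched as follows. This simplified version connects phrases whose token sequences differ by exactly one leading or trailing word and keeps the top-scoring phrase per group; it operates on surface tokens and omits the stemming and entity typing described elsewhere, and the scores are hypothetical.

```python
def build_groups(scores):
    """Group phrases differing by one word at the start or end; return
    the top-scoring representative of each group (union-find sketch).
    scores: {phrase: model score}."""
    parent = {p: p for p in scores}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p
    def union(p, q):
        parent[find(p)] = find(q)
    phrases = list(scores)
    for p in phrases:
        for q in phrases:
            a, b = sorted((p.split(), q.split()), key=len)
            if len(b) - len(a) == 1 and (b[1:] == a or b[:-1] == a):
                union(p, q)
    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return [max(g, key=scores.get) for g in groups.values()]

# hypothetical scores echoing the "route map" example of Figure 6
picked = build_groups({"map": 0.2, "route map": 0.9,
                       "route": 0.3, "hand-designed route map": 0.4})
```

All four candidates collapse into one group, and the highest-scoring phrase ("route map") is displayed by default; an application can then walk the group to show a shorter or more specific variant.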
Fig. 7. Term grouping for named entities and acronyms. The graph shows typed edges that embed additional
relationships between terms in a document about President Obama. Black edges represent basic term
grouping based on string similarity. Bold blue edges represent people: terms that share a common trailing
substring and are tagged as “person” by a named entity recognition algorithm. By default, our algorithm
displays “Obama” to summarize the text. Users may choose to show a longer phrase “President Obama”
or display a longer and more specific description “President Barack Obama” by shifting the scores along
the typed edges. Users may also apply type-specific operations, such as showing the longest name without
honorifics, “Barack H. Obama.”
Note: Top 25 keyphrases for an open letter from Adobe about Flash technologies.
We apply redundancy reduction to both lists.
or text visualization; we hypothesize that visual features such as layout, sizing, term
proximity, and other aesthetics are likely to affect the perceived utility of and prefer-
ences for keyphrases in real-world applications. Tag clouds are a popular form used by
a diverse set of people [Viégas et al. 2009]. Presenting selected terms in a simple list
would fail to reveal the impact of these effects. Second, keyphrases are often displayed
in aggregate; we hypothesize that the perceived quality of a collective set of keyphrases
differs from that of evaluating each term independently. Tag clouds encourage readers
to assess the quality of keyphrases as a whole.
Parallel Tag Clouds [Collins et al. 2009] use unigrams weighted by G2 for text ana-
lytics, making G2 statistics an interesting and ecologically valid comparison point. We
hypothesized that tag clouds created using our technique would be preferred due to
more descriptive terms and complete phrases. We also considered variable-length G2
that includes phrases up to 5-grams. Upon inspection, many of the bigrams (e.g., “more
about”, “anyone can”) and the majority of trigrams and longer phrases selected by G2
statistics are irrelevant to the document content. We excluded the results from the
study, as they were trivially uncompetitive. Including only unigrams results in shorter
terms, which may lead to a more densely-packed layout (this is another reason that we
chose to compare to G2 unigrams).
7.1. Method
We asked subjects to read a short text passage and write a 1–2 sentence summary.
Subjects then viewed two tag clouds and were asked to rate which they preferred on
a 5-point scale (with 3 indicating a tie) and provide a brief rationale for their choice.
We asked raters to “consider to what degree the tag clouds use appropriate words,
avoid unhelpful or unnecessary terms, and communicate the gist of the text.” One tag
cloud consisted of unigrams with term weights calculated using G2 ; the other contained
keyphrases selected using our corpus-independent model with redundancy reduction
and with the default preferred length. We weighted our terms by their regression
score: the linear combination of features used as input to the logistic function. Each
tag cloud contained the top 50 terms, with font sizes proportional to the square root
of the term weight. Occasionally our method selected fewer than 50 terms with positive
weights; we omitted negatively weighted terms. Tag cloud images were generated by
Wordle [Viégas et al. 2009] using the same layout and color parameters for each. We
randomized the presentation order of the tag clouds.
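The weighting and sizing scheme just described, top 50 positively weighted terms with font size proportional to the square root of the term weight, might be sketched as follows. The pixel bounds here are illustrative assumptions, not values from the study:

```python
import math

def tag_cloud_sizes(term_weights, max_terms=50, min_px=10, max_px=60):
    """Pick the top positively weighted terms and map each weight to a
    font size proportional to sqrt(weight).  Pixel bounds are
    illustrative choices, not values from the paper."""
    positive = [(t, w) for t, w in term_weights.items() if w > 0]
    top = sorted(positive, key=lambda tw: tw[1], reverse=True)[:max_terms]
    if not top:
        return {}
    hi = math.sqrt(max(w for _, w in top))
    lo = math.sqrt(min(w for _, w in top))
    span = (hi - lo) or 1.0
    return {t: round(min_px + (max_px - min_px) * (math.sqrt(w) - lo) / span)
            for t, w in top}
```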
We included tag clouds of 24 text documents. To sample a variety of genres, we used
documents in four categories: CHI 2010 paper abstracts, short biographies (three U.S.
presidents, three musicians), blog posts (two each from opinion, travel, and photogra-
phy blogs), and news articles. Figure 8 shows tag clouds from a biography of the singer
Lady Gaga; Figures 9 and 10 show two other clouds used in our study.
We conducted our study using Amazon’s Mechanical Turk (cf., [Heer and Bostock
2010]). Each trial was posted as a task with a US$0.10 reward. We requested 24
assignments per task, resulting in 576 ratings. Upon completion, we tallied the ratings
for each tag cloud and coded free-text responses with the criteria invoked by raters’
rationales.
7.2. Results
On average, raters significantly preferred tag clouds generated using our keyphrase
extraction approach (267 ratings vs. 208 for G2 and 101 ties; χ2(2) = 73.76, p < 0.0001).
Moreover, our technique garnered a larger share of strong ratings: 49% (132/267) of its
positive ratings were “MUCH better,” compared to 38% (80/208) for G2.
Looking at raters’ rationales, we find that 70% of responses in favor of our technique
cite the improved saliency of descriptive terms, compared to 40% of ratings in favor of
G2 . More specifically, 12% of positive responses note the presence of terms with mul-
tiple words (“It’s better to have the words ‘Adobe Flash’ and ‘Flash Player’ together”),
while 13% cite the use of fewer unnecessary terms (“This is how tag clouds should
be presented, without the clutter of unimportant words”). On the other hand, some
(16/208, 8%) rewarded G2 for showing more terms (“Tag cloud 2 is better since it has
more words used in the text.”).
Fig. 8. Tag cloud visualizations of an online biography of the pop singer Lady Gaga. (top) Single-word
phrases (unigrams) weighted using G2 . (bottom) Multiword phrases, including significant places and song
titles, selected using our corpus-independent model.
Tag clouds in both conditions were sometimes preferred due to visual features, such
as layout, shape, and density: 29% (60/208) for G2 and 23% (61/267) for our technique.
While visual features were often mentioned in conjunction with remarks about term
saliency, G2 led to more ratings (23% vs. 14%) that mentioned only visual features
(“One word that is way bigger than the rest will give a focal point . . . it is best if that
word is short and in the center”).
The study results also reveal limitations of our keyphrase extraction technique.
While our approach was rated superior for abstracts, biographies, and blog posts, on
average, G2 fared better for news articles. In one case, this was due to layout issues (a
majority of raters preferred the central placement of the primary term in the G2 cloud),
but others specifically cite the quality of the chosen keyphrases. In an article about
racial discrimination in online purchasing, our technique disregarded the term “black”
due to its commonness and its adjective part of speech. Our technique’s tendency to
give higher scores to the names of people not central to the text at times led raters to
prefer G2. In general, raters were quick to cite prominent mistakes or omissions by
either technique.
Unsurprisingly, our technique was preferred by the largest margin for research paper
abstracts, the domain closest to our training data. This observation suggests that
applying our modeling methodology to human-selected keyphrases from other text
genres may result in better selections. Our study also suggests that we might improve
our keyphrase weighting by better handling named entities, so as to avoid giving high
scores to non-central actors. Confirming our hypothesis, layout affects tag cloud ratings.
Fig. 9. Tag clouds for a research paper on chart perception. (top) Unigrams weighted using G2 . (bottom)
Multiword phrases selected by our method.
The ability to dynamically adjust keyphrase length, however, can produce alternative
terms and may allow users to generate tag clouds with better spatial properties.
Fig. 10. Tag clouds for a travel article. (top) Unigrams weighted using G2 . (bottom) Multiword phrases
selected by our method.
parameterize phrase length. Our grouping approach (§5) provides a means of parame-
terizing selection while preserving descriptive quality.
Choice of frequency statistics. In our studies, probabilistic measures such as G2 sig-
nificantly outperformed common techniques, such as raw term frequency and tf.idf.
Moreover, a simple linear combination of log term frequency and Web commonness
matches the performance of G2 without the need for a domain-specific reference corpus.
We advocate using these higher-performing frequency statistics when identifying
descriptive keyphrases.
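The G2 statistic referenced throughout, Dunning's log-likelihood ratio, compares a term's rate in the document against its rate in a reference corpus. A minimal sketch of the 2×2 computation, not the paper's exact implementation:

```python
import math

def g_squared(k_doc, n_doc, k_ref, n_ref):
    """Dunning's log-likelihood ratio G2 for one term: does its rate in
    the document (k_doc of n_doc tokens) differ from its rate in a
    reference corpus (k_ref of n_ref tokens)?"""
    p_all = (k_doc + k_ref) / (n_doc + n_ref)
    g2 = 0.0
    for k, n in ((k_doc, n_doc), (k_ref, n_ref)):
        # Two cells per corpus: the term itself and all other tokens,
        # with expected counts under the pooled rate p_all.
        for observed, expected in ((k, n * p_all), (n - k, n * (1 - p_all))):
            if observed > 0:
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2
```

Identical rates yield G2 = 0, and terms overrepresented in the document score highly, which is why G2 surfaces document-distinctive vocabulary.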
Grammar and position. At the cost of additional implementation effort, our results
show that keyphrase quality can be further improved by adding grammatical annotations
(specifically, technical term pattern matching over part-of-speech tags) and positional
information. More computationally costly statistical parsing provides little
additional benefit.
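The technical-term pattern matching mentioned here follows Justeson and Katz [1995], whose part-of-speech filter can be run as a regular expression once each token's tag is reduced to a single letter (a sketch; the one-letter encoding is an assumption of this example):

```python
import re

# Justeson & Katz's (1995) part-of-speech filter for technical terms:
#   ((A|N)+ | ((A|N)* (N P)?) (A|N)*) N
# A = adjective, N = noun, P = preposition.
JK_PATTERN = re.compile(r"^(?:[AN]+|[AN]*(?:NP)?[AN]*)N$")

def is_candidate_term(pos_tags):
    """pos_tags: a sequence like ['A', 'N'] for 'visual analytics'."""
    return bool(JK_PATTERN.match("".join(pos_tags)))

is_candidate_term(["A", "N"])        # e.g., "visual analytics"  -> True
is_candidate_term(["N", "P", "N"])   # e.g., "degrees of freedom" -> True
is_candidate_term(["P", "N"])        # e.g., "of freedom"         -> False
```

In practice Justeson and Katz restrict candidates to multiword strings, so a length check would accompany the pattern.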
Keyphrase selection. When viewed as a set, keyphrases may overlap or reference
the same entity. Our results show how text visualizations might make better use of
screen space by identifying related terms (including named entities and acronyms) and
reducing redundancy. Interactive systems might leverage these groupings to enable
dynamic keyphrase selection based on term length or specificity.
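Such dynamic selection might, for instance, pick from each group of related terms the member whose length best fits the available space; a hypothetical sketch:

```python
def select_representative(group_scores, preferred_len):
    """From a set of related phrases (e.g., {'map', 'route map',
    'hand-designed route map'}) pick the one whose word count is
    closest to the requested length, breaking ties by model score."""
    def key(item):
        phrase, score = item
        return (abs(len(phrase.split()) - preferred_len), -score)
    return min(group_scores.items(), key=key)[0]

group = {"map": 1.1, "route map": 2.0, "hand-designed route map": 0.7}
select_representative(group, preferred_len=2)  # -> "route map"
select_representative(group, preferred_len=1)  # -> "map"
```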
Potential effects of layout and collective accuracy. Our study comparing tag cloud
designs provides examples suggesting that layout decisions (e.g., central placement of
the largest term) and collective accuracy (e.g., prominent errors) impact user judgments
of keyphrase quality. Our results do not provide definitive insights but suggest that
further studies on the spatial organization of terms may yield insights for more effective
layout and that keyphrase quality should not be assessed in isolation.
Fig. 11. Parallel tag cloud using our keyphrase extraction algorithm as the underlying text processing step. The columns contain the top 50 keyphrases
(without redundancy reduction) in chapters 3 through 12 of Lewis Carroll’s Alice’s Adventures in Wonderland. Longer phrases enable display of entities,
such as “Cheshire Cat” and “Lobster Quadrille”, that might be more salient to a reader than unigrams alone. Term grouping can enable novel interaction
techniques, such as brushing-and-linking conceptually similar terms. When a user selects the word “tone”, the visualization shows the similar but changing
tones in Alice’s adventures from “melancholy tone” to “solemn tone” and from “encouraging tone” to “hopeful tone” as the story develops.
Fig. 12. Adaptive tag cloud summarizing an article about the new subway map by the New York City
Metropolitan Transportation Authority. By adjusting the model output to show more specific or more general
terms, a visualization can adapt the text for readers with varying familiarity with the city’s subway system.
For example, a user might interactively drag a slider to explore different levels of term specificity. The top
tag cloud provides a general gist of the article and of the redesigned map. By increasing term specificity,
the middle tag cloud progressively reveals additional terms, including neighborhoods such as “TriBeCa”,
“NoHo”, and “Yorkville”, that may be of interest to local residents. The bottom tag cloud provides additional
details, such as historical subway maps with the “Massimo Vignellis abstract design.”
quantitative measures (e.g., precision-recall on exact matches), we evaluated the ex-
tracted keyphrases in situations closer to the actual context of use. An analysis using
relaxed matching yielded insights on the shortcomings of the standard equality-based
precision-recall scores and provided the basis for our redundancy reduction algorithm.
Evaluating keyphrase use in tag clouds revealed effects due to visual features as well
as the impact of prominent mistakes.
While many of the preceding concepts may be familiar to HCI practitioners, their use
in natural language processing is not yet widespread. Incorporating HCI methods,
however, may benefit various active areas of NLP research.
For example, topic models are tools for analyzing the content of large text corpora;
they can automatically produce latent topics that capture coherent and significant
themes in the text. While topic models have the potential to enable large-scale text
analysis, their deployment in the real world has been limited. Studies with domain ex-
perts might better characterize human-defined textual topics and inform better models
of textual organization. HCI design methods may lead to visualizations and interfaces
that better address domain-specific tasks and increase model adoption. HCI evalua-
tions may also enable more meaningful assessment of model performance in the context
of real-world tasks.
9. CONCLUSION
In this article, we characterize the statistical and grammatical features of human-
generated keyphrases and present a model for identifying highly descriptive terms in
a text. The model allows for adjustment of keyphrase specificity to meet application
and user needs. Because it relies on simple linguistic features, our approach requires
no preprocessed reference corpus, external taxonomies, or genre-specific document
structure, making it well suited to interactive applications. Evaluations reveal that our model is
preferred by human judges, can match human extraction performance, and performs
well even on short texts.
Finally, the process through which we arrived at our algorithm—identifying human
strategies via a formal experiment and exploratory analysis, designing our algorithm
based on these identified strategies, and evaluating its performance in ecologically-
valid settings—demonstrates how HCI methods can be applied to aid the design and
development of effective algorithms in other domains.
REFERENCES
BARKER, K. AND CORNACCHIA, N. 2000. Using noun phrase heads to extract document keyphrases. In Proceed-
ings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence:
Advances in Artificial Intelligence. 40–52.
BOGURAEV, B. AND KENNEDY, C. 1999. Applications of term identification technology: Domain description and
content characterisation. Nat. Lang. Eng. 5, 1, 17–44.
BRANTS, T. AND FRANZ, A. 2006. Web 1T 5-gram Version 1, Linguistic Data Consortium, Philadelphia.
BUYUKKOKTEN, O., GARCIA-MOLINA, H., PAEPCKE, A., AND WINOGRAD, T. 2000. Power browser: Efficient Web
browsing for PDAs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
BUYUKKOKTEN, O., KALJUVEE, O., GARCIA-MOLINA, H., PAEPCKE, A., AND WINOGRAD, T. 2002. Efficient Web browsing
on handheld devices using page and form summarization. ACM Trans. Inf. Syst. 20, 82–115.
CAO, N., SUN, J., LIN, Y.-R., GOTZ, D., LIU, S., AND QU, H. 2010. FacetAtlas: Multifaceted visualization for rich
text corpora. IEEE Trans. Visual Comput. Graphics 16, 1172–1181.
COLLINS, C., VIÉGAS, F. B., AND WATTENBERG, M. 2009. Parallel tag clouds to explore and analyze faceted text
corpora. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology. 91–98.
CUI, W., WU, Y., LIU, S., WEI, F., ZHOU, M. X., AND QU, H. 2010. Context-preserving, dynamic word cloud
visualization. In Proceedings of the IEEE PacificVis Symposium. 42–53.
DAILLE, B., GAUSSIER, E., AND LANGÉ, J.-M. 1994. Towards automatic extraction of monolingual and bilingual
terminology. In Proceedings of the Conference on Computational Linguistics. 515–521.
DUNNING, T. 1993. Accurate methods for the statistics of surprise and coincidence. Comput. Ling. 19, 1, 61–74.
EVANS, D. K., KLAVANS, J. L., AND WACHOLDER, N. 2000. Document processing with LinkIT. In Recherche
d’Informations Assistee par Ordinateur.
FARAWAY, J. J. 2006. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric
Regression Models. Chapman & Hall/CRC.
FINKEL, J. R., GRENAGER, T., AND MANNING, C. 2005. Incorporating non-local information into information
extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics (ACL). 363–370.
HAVRE, S., HETZLER, B., AND NOWELL, L. 2000. ThemeRiver: Visualizing theme changes over time. In Proceed-
ings of the IEEE Symposium on Information Visualization. 115.
HEARST, M. 2009. Search User Interfaces. Cambridge Press, Cambridge, U.K.
HEER, J. AND BOSTOCK, M. 2010. Crowdsourcing graphical perception: Using Mechanical Turk to assess visu-
alization design. In Proceedings of the 28th International Conference on Human Factors in Computing
Systems. 203–212.
HULTH, A. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of
the Conference on Empirical Methods in Natural Language Processing. 216–223.
JUSTESON, J. S. AND KATZ, S. M. 1995. Technical terminology: Some linguistic properties and an algorithm for
identification in text. Nat. Lang. Eng. 1, 1, 9–27.
KIM, S. N., MEDELYAN, O., KAN, M.-Y., AND BALDWIN, T. 2010. Semeval-2010 task 5: Automatic keyphrase extrac-
tion from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation.
KIT, C. AND LIU, X. 2008. Measuring mono-word termhood by rank difference via corpus comparison.
Terminol. 14, 2, 204–229.
KLEIN, D. AND MANNING, C. D. 2003. Accurate unlexicalized parsing. In Proceedings of the Annual Meeting on
Association for Computational Linguistics (ACL). 423–430.
LAVER, M., BENOIT, K., AND GARRY, J. 2003. Extracting policy positions from political texts using words as
data. Am. Political Sci. Rev. 97, 2, 311–331.
LOPEZ, P. AND ROMARY, L. 2010. HUMB: Automatic key term extraction from scientific articles in GROBID. In
Proceedings of the International Workshop on Semantic Evaluation.
LUHN, H. P. 1958. The automatic creation of literature abstracts. IBM J. Res. Develop. 2, 2, 159–165.
MANNING, C. D., RAGHAVAN, P., AND SCHÜTZE, H. 2008. Introduction to Information Retrieval. Cambridge Univer-
sity Press, New York, NY.
MARCUS, M. P., MARCINKIEWICZ, M. A., AND SANTORINI, B. 1993. Building a large annotated corpus of English:
The Penn Treebank. Comput. Ling. 19, 2, 313–330.
MEDELYAN, O. AND WITTEN, I. H. 2006. Thesaurus based automatic keyphrase indexing. In Proceedings of the
6th ACM/IEEE-CS Joint Conference on Digital Libraries. 296–297.
MINNEN, G., CARROLL, J., AND PEARCE, D. 2001. Applied morphological processing of English. Nat. Lang.
Eng. 7, 3, 207–223.
MONROE, B., COLARESI, M., AND QUINN, K. 2008. Fightin’ words: Lexical feature selection and evaluation for
identifying the content of political conflict. Political Anal. 16, 4, 372–403.
RAYSON, P. AND GARSIDE, R. 2000. Comparing corpora using frequency profiling. In Proceedings of the Workshop
on Comparing Corpora. 1–6.
ROBERTSON, S. E., VAN RIJSBERGEN, C. J., AND PORTER, M. F. 1981. Probabilistic models of indexing and searching.
In Research and Development in Information Retrieval, R. N. Oddy, S. E. Robertson, C. J. van Rijsbergen,
and P. W. Williams, Eds. 35–56.
SALTON, G. AND BUCKLEY, C. 1988. Term-weighting approaches in automatic text retrieval. Inform. Proc.
Manage. 513–523.
SCHWARTZ, A. S. AND HEARST, M. A. 2003. A simple algorithm for identifying abbreviation definitions in
biomedical text. In Proceedings of the Pacific Symposium on Biocomputing.
SHI, L., WEI, F., LIU, S., TAN, L., LIAN, X., AND ZHOU, M. X. 2010. Understanding text corpora with multiple
facets. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology. 99–106.
STASKO, J., GÖRG, C., AND LIU, Z. 2008. Jigsaw: Supporting investigative analysis through interactive visual-
ization. Inform. Visual. 7, 118–132.
TOUTANOVA, K., KLEIN, D., MANNING, C. D., AND SINGER, Y. 2003. Feature-rich part-of-speech tagging with
a cyclic dependency network. In Proceedings of the Conference of the North American Chapter of
the Association for Computational Linguistics on Human Language Technologies (HLT-NAACL). 252–
259.
TURNEY, P. D. 2000. Learning algorithms for keyphrase extraction. Inf. Retr. 2, 4, 303–336.
VIÉGAS, F. B., GOLDER, S., AND DONATH, J. 2006. Visualizing email content: Portraying relationships from
conversational histories. In Proceedings of the International Conference on Human Factors in Computing
Systems (CHI). 979–988.
VIÉGAS, F. B., WATTENBERG, M., AND FEINBERG, J. 2009. Participatory visualization with Wordle. IEEE Trans.
Visual Comput. Graphics 15, 6, 1137–1144.
YANG, C. C. AND WANG, F. L. 2003. Fractal summarization for mobile devices to access large documents on the
web. In Proceedings of the 12th International Conference on World Wide Web. 215–224.