
A Reflective View on Text Similarity

Daniel Bär, Torsten Zesch, and Iryna Gurevych


Ubiquitous Knowledge Processing Lab
Computer Science Department, Technische Universität Darmstadt
Hochschulstrasse 10, D-64289 Darmstadt, Germany
www.ukp.tu-darmstadt.de

Proceedings of Recent Advances in Natural Language Processing, pages 515–520, Hissar, Bulgaria, 12–14 September 2011.

Abstract

While the concept of similarity is well grounded in psychology, text similarity is less well-defined. Thus, we analyze text similarity with respect to its definition and the datasets used for evaluation. We formalize text similarity based on the geometric model of conceptual spaces along three dimensions inherent to texts: structure, style, and content. We empirically ground these dimensions in a set of annotation studies, and categorize applications according to these dimensions. Furthermore, we analyze the characteristics of the existing evaluation datasets, and use those datasets to assess the performance of common text similarity measures.

1 Introduction

Within the natural language processing (NLP) community, similarity between texts (text similarity, henceforth) is utilized in a wide range of tasks, e.g. automatic essay grading (Attali and Burstein, 2006) or paraphrase recognition (Tsatsaronis et al., 2010). However, text similarity is often used as an umbrella term covering quite different phenomena. Therefore, we formalize text similarity and analyze the datasets used for evaluation.

We argue that the seemingly simple question "How similar are two texts?" cannot be answered independently from asking what properties make them similar. Goodman (1972) gives a good example regarding the baggage check at an airport: While a spectator might compare bags by shape, size, or color, the pilot only focuses on a bag's weight, and the passenger compares them by destination and ownership. Similarly, texts also have certain inherent properties (dimensions, henceforth) that need to be considered in any attempt to judge their similarity. Consider, for example, two novels by Leo Tolstoy, a famous 19th century Russian writer of realist fiction and philosophical essays. A reader may readily argue that these novels are completely dissimilar due to different plots, people, or places (i.e. dissimilar content). On the other hand, another reader may argue that both texts are indeed highly similar because of their stylistic similarities. Hence, text similarity is a loose notion unless we provide a certain frame of reference. Therefore, we introduce a formalization based on conceptual spaces (Gärdenfors, 2000). Furthermore, we discuss the datasets used for evaluating text similarity measures. We analyze the properties of each dataset by means of annotation studies and a critical view on the performance of common similarity measures.

2 Formalization

In psychology, similarity is well formalized and captured in formal models such as the set-theoretic model (Tversky, 1977) or the geometric model (Widdows, 2004). In an attempt to overcome the traditionally loose definition of text similarity, we rely on a conceptual framework based on conceptual spaces (Gärdenfors, 2000). In this model, objects are represented in a number of geometric spaces. For example, potential spaces related to countries are political affinity and geographical proximity. In order to adapt this model to texts, we need to define explicit spaces (i.e. dimensions) suitable for texts. Therefore, we analyzed common NLP tasks with respect to the relevant dimensions of similarity, and then conducted annotation studies to ground them empirically.

Table 1 gives an overview of common NLP tasks and their relevant dimensions: structure, style, and content. Structure thereby refers to the internal developments of a given text, e.g. the order of sections. Style refers to grammar, usage, mechanics, and lexical complexity (Attali and Burstein, 2006). Content addresses all facts and their relationships within a text.
For example, the task of automatic essay scoring (Attali and Burstein, 2006) typically not only requires the essay to be about a certain topic (content dimension), but also an adequate style and a coherent structure are necessary. However, in authorship classification (Holmes, 1998) only style is important.

Task (str sty c)
Authorship Classification X
Automatic Essay Scoring X X X
Information Retrieval X X X
Paraphrase Recognition X
Plagiarism Detection X X
Question Answering X
Short Answer Grading X X X
Summarization X X
Text Categorization X
Text Segmentation X X
Text Simplification X X
Word Sense Alignment X

Table 1: Classification of common NLP tasks with respect to the relevant dimensions of text similarity: structure (str), style (sty), and content (c)

Taking this dimension-centric view on text similarity also opens up new perspectives. For example, standard information retrieval usually considers only the content dimension (keyword overlap between query and document). However, a scholar in digital humanities might be interested in texts that are similar to a reference document with respect to style and structure, while texts with similar content are of minor interest. In this paper, we only address dimensions inherent to texts, and do not consider dimensions such as user intentions.
2.1 Empirical Grounding

In order to empirically ground the proposed dimensions of text similarity, we conducted a number of exemplary annotation studies. The results show that annotators indeed distinguish between different dimensions of text similarity.

Content vs. Structure. In this study, we used the dataset by Lee et al. (2005) that contains pairwise human similarity judgments for 1,225 text pairs. We selected a subset of 50 pairs with a uniform distribution of judgments across the whole similarity range. We then asked three annotators: "How similar are the given texts?" We then computed the Spearman correlation of each annotator's ratings with the gold standard: ρ_A1 = 0.83, ρ_A2 = 0.65, and ρ_A3 = 0.85. The much lower correlation of the annotator A2 indicates that a different dimension might have been used to judge similarity.

To further investigate this issue, we asked the annotators about the reasons for their judgments. A1 and A3 consistently focused only on the content of the texts and completely disregarded other dimensions. A2, however, was also taking structural similarities into account, e.g. two texts were rated highly similar because of the way they are organized: First, an introduction to the topic is given, then a quotation is stated, then the text concludes with a certain reaction of the acting subject.
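The agreement figures above are plain Spearman rank correlations between one annotator's ratings and the gold standard. The following minimal sketch shows how such a check can be computed; the numbers are made-up placeholders, not the study data.

```python
# Spearman correlation between one annotator's ratings and the gold
# standard, as used for the per-annotator agreement figures above.
# The values below are illustrative placeholders, not the study data.
from scipy.stats import spearmanr

gold = [1.2, 3.4, 2.1, 4.8, 2.9]        # aggregated gold-standard similarity scores
annotator = [1.0, 3.0, 2.5, 5.0, 3.0]   # one annotator's ratings for the same pairs

rho, p_value = spearmanr(gold, annotator)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```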

Content vs. Style. The annotators in the previous study only identified the dimensions content and structure. Style was not addressed, as the text pairs were all of similar style, and hence that dimension was not perceived as salient. Thus, we selected 10 pairs of short texts from Wikipedia (WP) and Simple Wikipedia (SWP); articles written in Simple English use a limited vocabulary and easier grammar than the standard Wikipedia. We used the first paragraphs of WP articles and the full texts of SWP articles to obtain pairs of similar length. Pairs were formed in all combinations (WP-WP, SWP-WP, and SWP-SWP) to ensure that both similarity dimensions were salient for some pairs. For example, an article from SWP and one from WP about the same topic share the same content, but are different in style, while two articles from SWP have a similar style, but different content. We then asked three annotators to rate each pair according to the content and style dimensions. The results show that WP-WP and SWP-SWP pairs are perceived as stylistically similar, while WP-SWP pairs are seen similar with respect to their content.

2.2 Discussion

The results demonstrate that humans indeed distinguish the major dimensions of text similarity. Also, they seem intuitively able to find an appropriate dimension of comparison for a given text collection. Smith and Heise (1992) refer to that as perceived similarity which "changes with changes in selective attention to specific perceptual properties." Selective attention can be modeled using dimension-specific similarity measures. The scores for all dimensions are computed in parallel, and then summed up for each text pair (the last step requires all measures to be normalized). Thereby, we automatically obtain the discriminating dimension (see Figure 1). A, B, and C are documents of the same style but rather different content (as indicated by the comparable height of the stacked bars). Adding another text D of the very same style, but where the content is rather similar to B, changes the situation to what is shown in Figure 1 (left). The pair BD stands out as its aggregated score is significantly higher than that of the others. In contrast, adding document E which is written with a different style, results in the situation as shown in Figure 1 (right). Even though B and E have rather similar content, the content dimension will not become salient because of the dominance of the style dimension. Consequently, the better measures for a certain dimension are available, the better this automatic discrimination will work. Developing such dimension-specific measures, however, requires evaluation datasets which are explicitly annotated according to those dimensions. In the next section, we analyze whether the existing datasets already fulfill this requirement.

[Figure 1: Combination of specialized text similarity measures to determine the salient dimension. Stacked content and style scores (aggregated similarity) are shown for the text pairs AB, AC, BC, AD, BD, CD (left) and AB, AC, BC, AE, BE, CE (right). Left: Adding document D makes content salient. Right: Adding document E makes style salient.]
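The sketch below illustrates this aggregation scheme under stated assumptions: the two dimension-specific measures are simple stand-ins of our own (word overlap for content, average word length for style), not the measures discussed in this paper. It only shows how normalized per-dimension scores are computed in parallel, summed per pair, and inspected for a pair that stands out.

```python
# Sketch of the aggregation idea behind Figure 1: each dimension-specific
# measure is normalized to [0, 1], the scores are computed in parallel and
# summed per text pair; the pair whose aggregate stands out indicates the
# salient dimension. The measure functions are illustrative stand-ins.
from itertools import combinations

def content_similarity(a: str, b: str) -> float:
    """Stand-in content measure: word overlap (Jaccard), already in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def style_similarity(a: str, b: str) -> float:
    """Stand-in style measure: closeness of average word length."""
    avg = lambda t: sum(len(w) for w in t.split()) / max(len(t.split()), 1)
    return 1.0 / (1.0 + abs(avg(a) - avg(b)))

MEASURES = {"content": content_similarity, "style": style_similarity}

def aggregated_scores(documents: dict) -> dict:
    """Sum the normalized dimension scores for every document pair."""
    scores = {}
    for (na, a), (nb, b) in combinations(documents.items(), 2):
        per_dim = {dim: m(a, b) for dim, m in MEASURES.items()}
        scores[na + nb] = (sum(per_dim.values()), per_dim)
    return scores

docs = {"A": "the cat sat on the mat", "B": "a cat lay on a mat",
        "C": "dogs bark at night", "D": "the cat sat on the mat quietly"}
for pair, (total, per_dim) in sorted(aggregated_scores(docs).items(),
                                     key=lambda kv: -kv[1][0]):
    print(pair, round(total, 2), {d: round(v, 2) for d, v in per_dim.items()})
```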
3 Evaluation Datasets

Four datasets are commonly used for evaluation (see Table 2). They contain text pairs together with human judgments about their perceived similarity. However, none of those datasets has yet undergone a thorough analysis with respect to the dimensions of text similarity encoded therein.

Dataset                                                    Text Type / Domain    Length in Terms   # Pairs   Rating Scale   # Judges per Pair
30 Sentence Pairs (Li et al., 2006)                        Concept Definitions   5–33 (avg. 11)    30        0–4            32
50 Short Texts (Lee et al., 2005)                          News (Politics)       45–126 (avg. 80)  1,225     1–5            8–12
Computer Science Assignments (Mohler and Mihalcea, 2009)   Computer Science      1–173 (avg. 18)   630       0–5            2
Microsoft Paraphrase Corpus (Dolan et al., 2004)           News                  5–31 (avg. 19)    5,801     binary         2–3

Table 2: Statistics for text similarity evaluation datasets

3.1 30 Sentence Pairs

Li et al. (2006) introduced 65 sentence pairs which are based on the noun pairs by Rubenstein and Goodenough (1965). Each noun was replaced by its definition from the Collins Cobuild English Dictionary (Sinclair, 2001). The dataset contains judgments from 32 subjects on how similar in meaning one sentence is to another. Li et al. (2006) selected 30 pairs to reduce the bias in the frequency distribution (30 Sentence Pairs, henceforth).

We conducted a re-rating study to evaluate whether text similarity judgments are stable across time and subjects. We collected 10 judgments per pair asking: "How close do these sentences come to meaning the same thing?" (the same question as in the original study by Li et al. (2006); we used Amazon Mechanical Turk via CrowdFlower). The Spearman correlation of the aggregated results with the original scores is ρ = 0.91. We conclude that text similarity judgments are stable across time and subjects. It also indicates that humans indeed share a common understanding on what makes texts similar.

In order to better understand the characteristics of this dataset, we performed another study. For each text pair we asked the annotators: "Why did people agree that these two sentences are (not) close in meaning?" We collected 10 judgments per pair in the same crowdsourcing setting as before.

To our surprise, the annotators only used lexical semantic relations between terms to justify the similarity relation between texts. For example, the text pairs about tool/implement and cemetery/graveyard were consistently said to be synonymous. We conclude that, in this setting, humans reduce text similarity to term similarity.

As the text pairs are originally based on term pairs, we computed the Spearman correlation between the text pair scores and the original term pair scores. The very high correlation of ρ = 0.94 shows that annotators indeed judged the similarity between terms rather than texts. We conclude that this dataset encodes the content dimension of similarity, but a rather constrained one.
Evaluation Results. Table 3 shows the results of state of the art similarity measures obtained on this dataset. We used a cosine baseline and implemented an additional baseline which disregards the actual texts and only takes the target noun of each sentence into account. We computed their pairwise term similarity using the metric by Lin (1998) on WordNet (Fellbaum, 1998). Our heuristic achieves Pearson r = 0.83 and Spearman ρ = 0.84. The block of results in the middle shows our implementation of Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2007) using different knowledge sources (Zesch et al., 2008). The bottom rows show scores previously obtained and reported in the literature. None of the measures significantly (α = .05, Fisher Z-value transformation) outperforms the baselines. Given the limitation of encoding rather term than text similarity and the fact that the dataset is also very small (30 pairs), it is questionable whether it is a suitable evaluation dataset for text similarity.

Measure                               r     ρ
Cosine Baseline                       .81   .83
Term Pair Heuristic                   .83   .84

ESA (Wikipedia)                       .61   .77
ESA (Wiktionary)                      .77   .82
ESA (WordNet)                         .75   .80

Kennedy and Szpakowicz (2008)         .87   -
LSA (Tsatsaronis et al., 2010)        .84   .87
OMIOTIS (Tsatsaronis et al., 2010)    .86   .89
STASIS (Li et al., 2006)              .82   .81
STS (Islam and Inkpen, 2008)          .85   .84

Table 3: Results on the 30 Sentence Pairs dataset
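Both baselines in Table 3 are straightforward to reproduce in outline. The sketch below uses NLTK for the WordNet/Lin part, which is our assumption since the paper does not name a toolkit; the example calls at the bottom are illustrative.

```python
# Sketch of the two baselines: a cosine over plain term-frequency vectors,
# and the term pair heuristic that ignores the texts and scores only the two
# target nouns with Lin's (1998) WordNet measure. Using NLTK is an
# assumption; it requires the NLTK "wordnet" and "wordnet_ic" data packages.
import math
from collections import Counter

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

def cosine_baseline(text_a: str, text_b: str) -> float:
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content used by Lin similarity

def term_pair_heuristic(noun_a: str, noun_b: str) -> float:
    """Lin similarity of the first noun senses of the two target nouns."""
    sa, sb = wn.synsets(noun_a, "n"), wn.synsets(noun_b, "n")
    return sa[0].lin_similarity(sb[0], brown_ic) if sa and sb else 0.0

print(cosine_baseline("a gem is a jewel or stone", "a jewel is a precious stone"))
print(term_pair_heuristic("gem", "jewel"))
```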
3.2 50 Short Texts

The dataset by Lee et al. (2005) comprises 50 relatively short texts (45 to 126 words; Lee et al. (2005) report the shortest document having 51 words, probably due to a different tokenization strategy) which contain newswire from the political domain. In analogy to the study in Section 3.1, we performed an annotation study to show whether the encoded judgments are stable across time and subjects. We asked three annotators to rate "How similar are the given texts?". We used the same uniformly distributed subset as in Section 2.1. The resulting Spearman correlation between the aggregated results of the annotators and the original scores is ρ = 0.88. This shows that judgments are quite stable across time and subjects.

In Section 2.1, two annotators had a content-centric view on similarity while one subject also considered structural similarity important. When combining only the two content-centric annotators, the correlation is ρ = 0.90, while it is much lower for the other annotator. Thus, we conclude that this dataset encodes the content dimension of text similarity.

Evaluation Results. Table 4 summarizes the results obtained on this dataset. We used a cosine baseline, and our implementation of ESA applied to different knowledge sources. The results at the bottom are scores previously obtained and reported in the literature. All of them significantly (α = .01, Fisher Z-value transformation) outperform the baseline. In contrast to the 30 Sentence Pairs, this dataset encodes a broader view on the content dimension of similarity. It obviously contains text pairs that are similar (or dissimilar) for reasons beyond partial string overlap. Thus, the dataset might be used to intrinsically evaluate text similarity measures.

Measure                                   r
Cosine Baseline                           .56

ESA (Wikipedia)                           .46
ESA (Wiktionary)                          .53
ESA (WordNet)                             .59

ESA (Gabrilovich and Markovitch, 2007)    .72
LSA (Lee et al., 2005)                    .60
WikiWalk (Yeh et al., 2009)               .77

Table 4: Results on the 50 Short Texts dataset. Statistically significant improvements (α = .01, Fisher Z-value transformation) in bold.
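The significance statements here and in Section 3.1 rest on a Fisher Z-value transformation of the correlation coefficients. The paper does not spell out the exact procedure, so the sketch below shows the textbook test for comparing two correlations obtained on n pairs; the example numbers are taken from Table 4.

```python
# Significance of the difference between two correlations via Fisher's
# z-transformation. The paper only names the transformation, so the exact
# variant used there is an assumption; this is the standard test for two
# correlation coefficients.
import math
from scipy.stats import norm

def fisher_z_test(r1: float, r2: float, n1: int, n2: int) -> float:
    """Two-sided p-value for H0: the two correlation coefficients are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return 2.0 * (1.0 - norm.cdf(abs((z1 - z2) / se)))

# Example: WikiWalk (r = .77) vs. the cosine baseline (r = .56) on the
# 1,225 pairs of the 50 Short Texts dataset.
print(fisher_z_test(0.77, 0.56, 1225, 1225))
```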
However, the distribution of similarity scores in this dataset is heavily skewed towards low scores, with 82% of all text pairs having a text similarity score between 1 and 2 on a 1–5 scale. This limits the kind of conclusions that can be drawn, as the number of pairs in the most interesting class of highly similar pairs is actually very small.

Another observation is that we were not able to reproduce the ESA score on Wikipedia reported by Gabrilovich and Markovitch (2007). We found that the difference probably relates to the cut-off value used to prune the vectors, as reported by Yeh et al. (2009). By tuning the cut-off value, we could improve the score to 0.70, which comes very close to the reported score of 0.72. However, as this tuning is done directly on the evaluation dataset, it probably overfits the cut-off value to the dataset.
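For readers unfamiliar with ESA, the following compact sketch shows where such a cut-off enters: a text is projected into a space of knowledge-source concepts and only the strongest concept weights are kept. It uses scikit-learn and toy concept texts, and it is only an illustration of the idea, not the implementation evaluated above.

```python
# Compact sketch of Explicit Semantic Analysis with a cut-off used to prune
# the concept vectors. Illustration only; the concept texts are toy data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each "concept" is a knowledge-source article (e.g. a Wikipedia page).
concepts = ["politics government election minister parliament",
            "football match goal team player",
            "stock market shares trading economy"]

vectorizer = TfidfVectorizer()
concept_index = vectorizer.fit_transform(concepts)      # (n_concepts, n_terms)

def esa_vector(text: str, cutoff: int = 2) -> np.ndarray:
    """Project a text into concept space and keep only the top-k concept weights."""
    term_vec = vectorizer.transform([text])               # (1, n_terms)
    concept_vec = (term_vec @ concept_index.T).toarray().ravel()
    if cutoff < concept_vec.size:                          # prune weak concept weights
        concept_vec[np.argsort(concept_vec)[:-cutoff]] = 0.0
    return concept_vec

def esa_similarity(a: str, b: str, cutoff: int = 2) -> float:
    return float(cosine_similarity([esa_vector(a, cutoff)],
                                   [esa_vector(b, cutoff)])[0, 0])

print(esa_similarity("the election of the new minister",
                     "parliament voted on the government"))
```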
3.3 Computer Science Assignments

The dataset by Mohler and Mihalcea (2009) was introduced for assessing the quality of short answer grading systems in the context of computer science assignments. The dataset comprises 21 questions, 21 reference answers and 630 student answers. The answers were graded by two teachers, not according to stylistic properties, but to the extent the content of the student answers matched with the content of the reference answers.

Evaluation Results. We summarize the results obtained on this dataset in Table 5. The scores are reported without relevance feedback (Mohler and Mihalcea, 2009), which distorts results by changing the reference answers. None of the measures significantly (α = .05, Fisher Z-value transformation) outperforms the baseline. This is not overly surprising, as the textual similarity between the reference and the student answer only constitutes part of what makes an answer the correct one. More sophisticated measures that also take lexical semantic relationships between terms into account might even worsen the results, as typically a specific answer is required, not a similar one. We conclude that similarity measures can be used to grade assignments, but it seems questionable whether this dataset is suited to draw any conclusions on the performance of similarity measures outside of this particular task.

Measure                            r
Cosine Baseline                    .44

ESA (Mohler and Mihalcea, 2009)    .47
LSA (Mohler and Mihalcea, 2009)    .43
Mohler and Mihalcea (2009)         .45

Table 5: Results on the Computer Science Assignments dataset

3.4 Microsoft Paraphrase Corpus

Dolan et al. (2004) introduced a dataset of 5,801 sentence pairs taken from news sources on the Web. They collected binary judgments from 2–3 subjects whether each pair captures a paraphrase relationship or not (83% interrater agreement). The dataset has been used for evaluating text similarity measures as, by definition, paraphrases need to be similar with respect to their content.

Evaluation Results. We summarize the results obtained on this dataset in Table 6. As detecting paraphrases is a classification task, we use an additional majority baseline which classifies all results according to the predominant class of true paraphrases. The block of results in the middle contains measures that are not specifically tailored towards paraphrase recognition. None of them beats the cosine baseline. The results at the bottom show measures which are specifically tailored towards the detection of a bidirectional entailment relationship. None of them, however, significantly outperforms the cosine baseline. Obviously, recognizing paraphrases is a very hard task that cannot simply be tackled by computing text similarity, as sharing similar content is a necessary, but not a sufficient condition for detecting paraphrases.

Measure                                F-measure
Cosine Baseline                        .81
Majority Baseline                      .80

ESA (Wikipedia)                        .80
LSA (Mihalcea et al., 2006)            .81
Mihalcea et al. (2006)                 .81
OMIOTIS (Tsatsaronis et al., 2010)     .81
PMI-IR (Mihalcea et al., 2006)         .81
Ramage et al. (2009)                   .80
STS (Islam and Inkpen, 2008)           .81

Finch et al. (2005)                    .83
Qiu et al. (2006)                      .82
Wan et al. (2006)                      .83
Zhang and Patrick (2005)               .81

Table 6: Results on Microsoft Paraphrase Corpus
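To make the comparison with the majority baseline concrete, the sketch below shows how a similarity score is thresholded into a paraphrase classifier and scored with the F-measure; the threshold, scores and labels are illustrative, not the corpus data.

```python
# Sketch: turning a similarity measure into a paraphrase classifier and
# comparing it against the majority baseline with the F-measure. The
# threshold, scores and labels below are illustrative placeholders.
from sklearn.metrics import f1_score

gold_labels = [1, 1, 0, 1, 0, 1]             # 1 = paraphrase (the majority class)
similarity_scores = [0.82, 0.64, 0.58, 0.91, 0.40, 0.77]

threshold = 0.6                               # in practice tuned on held-out data
similarity_pred = [1 if s >= threshold else 0 for s in similarity_scores]
majority_pred = [1] * len(gold_labels)        # always predict "paraphrase"

print("similarity F1:", f1_score(gold_labels, similarity_pred))
print("majority   F1:", f1_score(gold_labels, majority_pred))
```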

3.5 Discussion

We showed that all four datasets encode the content dimension of text similarity. The Computer Science Assignments dataset and the Microsoft Paraphrase Corpus are tailored quite specifically to a certain task. Thereby, factors exceeding the similarity of texts are important. Consequently, none of the similarity measures significantly outperformed the cosine baseline. The evaluation of similarity measures on these datasets is hence questionable outside of the specific application scenario. The 30 Sentence Pairs dataset was found to rather represent the similarity between terms than texts. Obviously, it is not suited for evaluating text similarity measures. However, the 50 Short Texts dataset currently seems to be the best choice. As it is heavily skewed towards low similarity scores, though, the conclusions that can be drawn from the results are limited. Further datasets are necessary to guide the development of measures along other dimensions such as structure or style.
4 Conclusions

In this paper, we reflected on text similarity as a foundational technique for a wide range of tasks. We argued that while similarity is well grounded in psychology, text similarity is less well-defined. We introduced a formalization based on conceptual spaces for modeling text similarity along explicit dimensions inherent to texts. We empirically grounded these dimensions by annotation studies and demonstrated that humans indeed judge similarity along different dimensions. Furthermore, we discussed common evaluation datasets and showed that it is of crucial importance for text similarity measures to address the correct dimensions. Otherwise, these measures fail to outperform even simple baselines.

We propose that future studies aiming at collecting human judgments on text similarity should explicitly state which dimension is targeted in order to create reliable annotation data. Further evaluation datasets annotated according to the structure and style dimensions of text similarity are necessary to guide further research in this field.

Acknowledgments

This work has been supported by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806, and by the Klaus Tschira Foundation under project No. 00.133.2008. We thank György Szarvas for sharing his insights into the ESA similarity measure with us.

References

Yigal Attali and Jill Burstein. 2006. Automated essay scoring with e-rater v.2.0. Journal of Technology, Learning, and Assessment, 4(3).

Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In Proceedings of the 20th International Conference on Computational Linguistics.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Andrew Finch, Young-Sook Hwang, and Eiichiro Sumita. 2005. Using machine translation evaluation techniques to determine sentence-level semantic equivalence. In Proceedings of the 3rd International Workshop on Paraphrasing, pages 17–24.

Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1606–1611.

Peter Gärdenfors. 2000. Conceptual Spaces: The Geometry of Thought. MIT Press.

Nelson Goodman. 1972. Seven strictures on similarity. In Problems and Projects, pages 437–446. Bobbs-Merrill.

David I. Holmes. 1998. The Evolution of Stylometry in Humanities Scholarship. Literary and Linguistic Computing, 13(3):111–117.

Aminul Islam and Diana Inkpen. 2008. Semantic Text Similarity Using Corpus-Based Word Similarity and String Similarity. ACM Transactions on Knowledge Discovery from Data, 2(2):1–25.

Alistair Kennedy and Stan Szpakowicz. 2008. Evaluating Roget's Thesauri. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 416–424.

Michael D. Lee, Brandon Pincombe, and Matthew Welsh. 2005. An empirical evaluation of models of text document similarity. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, pages 1254–1259.

Yuhua Li, David McLean, Zuhair Bandar, James O'Shea, and Keeley Crockett. 2006. Sentence Similarity Based on Semantic Nets and Corpus Statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150.

Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning, pages 296–304.

Rada Mihalcea, Courtney Corley, and Carlo Strapparava. 2006. Corpus-based and Knowledge-based Measures of Text Semantic Similarity. In Proceedings of the 21st National Conference on Artificial Intelligence.

Michael Mohler and Rada Mihalcea. 2009. Text-to-text Semantic Similarity for Automatic Short Answer Grading. In Proceedings of the Conference of the European Chapter of the ACL, pages 567–575.

Long Qiu, Min-Yen Kan, and Tat-Seng Chua. 2006. Paraphrase Recognition via Dissimilarity Significance Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 18–26.

Daniel Ramage, Anna N. Rafferty, and Christopher D. Manning. 2009. Random Walks for Text Semantic Similarity. In Proceedings of the Workshop on Graph-based Methods for Natural Language Processing, pages 23–31.

Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.

John Sinclair, editor. 2001. Collins COBUILD Advanced Learner's English Dictionary. HarperCollins, 3rd edition.

Linda B. Smith and Diana Heise. 1992. Perceptual similarity and conceptual structure. In B. Burns, editor, Percepts, Concepts, and Categories. Elsevier.

George Tsatsaronis, Iraklis Varlamis, and Michalis Vazirgiannis. 2010. Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research, 37:1–39.

Amos Tversky. 1977. Features of similarity. Psychological Review, 84:327–352.

Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris. 2006. Using dependency-based features to take the "para-farce" out of paraphrase. In Proceedings of the Australasian Language Technology Workshop, pages 131–138.

Dominic Widdows. 2004. Geometry and Meaning. Center for the Study of Language and Information.

Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre, and Aitor Soroa. 2009. WikiWalk: Random walks on Wikipedia for Semantic Relatedness. In Proceedings of the Workshop on Graph-based Methods for Natural Language Processing, pages 41–49.

Torsten Zesch, Christof Müller, and Iryna Gurevych. 2008. Using Wiktionary for Computing Semantic Relatedness. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence, pages 861–867.

Yitao Zhang and Jon Patrick. 2005. Paraphrase Identification by Text Canonicalization. In Proceedings of the Australasian Language Technology Workshop, pages 160–166.
