Topic Extraction From Online Reviews For Classification and Recommendation
CLARITY: Centre for Sensor Web Technologies
∗ This work is supported by Science Foundation Ireland under grant 07/CE/I1147.

Abstract

Automatically identifying informative reviews is increasingly important given the rapid growth of user generated reviews on sites like Amazon and TripAdvisor. In this paper, we describe and evaluate techniques for identifying and recommending helpful product reviews using a combination of review features, including topical and sentiment information, mined from a review corpus.

1 Introduction
The web is awash with user generated reviews, from the contemplative literary critiques of GoodReads to the flame wars that can sometimes erupt around hotels on TripAdvisor. User generated reviews are now an important part of how we inform opinions when we make decisions to travel and shop. The availability of reviews helps shoppers to choose [Hu et al., 2008] and increases the likelihood that they will make a buying decision [Zhu and Zhang, 2010]. But the helpfulness
of user reviews depends on their quality (detail, objectivity,
readability, etc.). As the volume of online reviews grows it
is becoming increasingly important to provide users with the
tools to filter hundreds of opinions about a given product. Sorting reviews by helpfulness is one approach, but it takes time to accumulate helpfulness feedback, and more recent reviews are often disadvantaged until they have accumulated a minimum amount of feedback. One way to address this is to develop techniques for automatically assessing the helpfulness of reviews. This has been attempted in the past with varying degrees of success [O'Mahony et al., 2009] by learning classifiers using review features based on the readability of the text, the reputation of the reviewer, the star rating of the review, and various content features based on the review terms.
In this paper we build on this research in a number of ways. We describe a technique to extract interesting topics from reviews and assign sentiment labels to these topics. Figure 1 provides a simple example based on a review of a camera. In this case the reviewer has mentioned certain topics, such as build quality and action shots, in a positive or negative context. Our aim is to automatically mine these types of topics from the raw review text and to automatically assign sentiment labels to the relevant topics and review elements. We describe and evaluate how such features can be used to predict review quality (helpfulness). Further, we show how this can be used as the basis of a review recommendation system to automatically recommend high quality reviews even in the absence of any explicit helpfulness feedback.

[Figure 1: A product review for a digital camera with topics marked as bold, underlined text and sentiment highlighted as either a green (positive) or red (negative) background. The example review reads: "The Fuji X100 is a great camera. It looks beautiful and takes great quality images. I have found the battery life to be superb during normal use. I only seem to charge after well over 1000 shots. The build quality is excellent and it is a joy to hold. The camera is not without its quirks however and it does take some getting used to. The auto focus can be slow to catch, for example. So it's not so good for action shots but it does take great portraits and its night shooting is excellent."]

2 Related Work

Recent research highlights how online product reviews can influence the purchasing behaviour of users; see [Hu et al., 2008; Zhu and Zhang, 2010]. The effect of consumer reviews on book sales on Amazon.com and Barnesandnoble.com [Chevalier and Mayzlin, 2006] shows that the relative sales of books on a site correlate closely with positive review sentiment; although, interestingly, there was insufficient evidence to conclude that retailers themselves benefit from making product reviews available to consumers; see also the work of [Dhar and Chang, 2009] and [Dellarocas et al., 2007] for music and movie sales, respectively. But as review volume has grown, retailers need to develop ways to help users find high quality reviews for products of interest and to avoid malicious or biased reviews. This has led to a body of research focused on classifying or predicting review helpfulness and also research on detecting so-called spam reviews.
A classical review classification approach, proposed by [Kim et al., 2006], considered features relating to the ratings, structural, syntactic, and semantic properties of reviews
to find ratings and review length among the most discriminating. Reviewer expertise was found to be a useful predictor of review helpfulness by [Liu et al., 2008], confirming, in this case, the intuition that people interested in a certain genre of movies are likely to pen high quality reviews for similar genre movies. Review timeliness was also found to be important since review helpfulness declined as time went by. Furthermore, opinion sentiment has been mined from user reviews to predict ratings and helpfulness in services such as TripAdvisor by the likes of [Baccianella et al., 2009; Hsu et al., 2009; O'Mahony et al., 2009; O'Mahony and Smyth, 2009].

Just as it is useful to automate the filtering of helpful reviews, it is also important to weed out malicious or biased reviews. These reviews can be well written and informative and so appear to be helpful. However, they often adopt a biased perspective that is designed to help or hinder sales of the target product [Lim et al., 2010]. [Li et al., 2011] describe a machine learning approach to spam detection that is enhanced by information about the spammer's identity as part of a two-tier co-learning approach. On a related topic, [O'Callaghan et al., 2012] use network analysis techniques to identify recurring spam in user generated comments associated with YouTube videos by identifying discriminating comment motifs that are indicative of spambots.

In this paper we extend the related work in this area by considering novel review classification features. We describe techniques for mining topical and sentiment features from user generated reviews and demonstrate their ability to boost classification accuracy.

3 Topic Extraction and Sentiment Analysis

For the purpose of this work our focus is on mining topics from user-generated product reviews and assigning sentiment to these topics on a per-review basis. Before we describe how this topical and sentiment information can be used as novel classification features, we will outline how we automatically extract topics and assign sentiment as per Figure 2.

[Figure 2: System architecture for extracting topics and assigning sentiment for user generated reviews. The pipeline applies topic extraction (part-of-speech tagging, shallow NLP, bi-gram and noun extraction, thresholding and ranking) followed by sentiment analysis (opinion pattern mining, sentiment matching and assignment), producing (Ri, Sj, Tk, +/-/=) tuples.]

3.1 Topic Extraction

We consider two basic types of topics — bi-grams and single nouns — which are extracted using a combination of shallow NLP and statistical methods, primarily by combining ideas from [Hu and Liu, 2004a] and [Justeson and Katz, 1995]. To produce a set of bi-gram topics we extract all bi-grams from the global review set which conform to one of two basic part-of-speech co-location patterns: (1) an adjective followed by a noun (AN), such as wide angle; and (2) a noun followed by a noun (NN), such as video mode. These are candidate topics that need to be filtered to avoid including ANs that are actually opinionated single-noun topics; for example, excellent lens is a single-noun topic (lens) and not a bi-gram topic. To do this we exclude bi-grams whose adjective is found to be a sentiment word (e.g. excellent, good, great, lovely, terrible, horrible, etc.) using the sentiment lexicon proposed in [Hu and Liu, 2004b].

To identify the single-noun topics we extract a candidate set of (non stop-word) nouns from the global review set. Often these single-noun candidates will not make for good topics; for example, they might include words such as family or day or vacation. [Qiu et al., 2009] proposed a solution for validating such topics by eliminating those that are rarely associated with opinionated words. The intuition is that nouns that frequently occur in reviews and that are frequently associated with sentiment-rich, opinion-laden words are likely to be product topics that the reviewer is writing about, and therefore good topics. Thus, for each candidate single noun, we calculate how frequently it appears with nearby words from a list of sentiment words (again, as above, we use Hu and Liu's sentiment lexicon), keeping the single noun only if this frequency is greater than some threshold (in this case 70%). The result is a set of bi-gram and single-noun topics which we further filter based on their frequency of occurrence in the review set, keeping only those topics (T1, ..., Tm) that occur in at least k reviews out of the total number of n reviews; in this case, for bi-gram topics we set k_bg = n/20 and for single-noun topics we set k_sn = 10 × k_bg.
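To make the extraction step concrete, the following Python sketch gathers candidate bi-gram and single-noun topics from POS-tagged review sentences. It assumes the sentences have already been tagged with Penn Treebank tags and that the Hu and Liu lexicon is available as a set of lowercase words; the function name, the 5-word window and the data layout are illustrative assumptions rather than the implementation used in this work.

    # Sketch of the candidate topic extraction in Section 3.1 (illustrative only).
    from collections import Counter

    def extract_candidate_topics(tagged_sentences, sentiment_words,
                                 stop_words, window=5, noun_ratio=0.7):
        """tagged_sentences: list of sentences, each a list of (word, POS-tag) pairs."""
        bigrams, noun_counts, noun_hits = Counter(), Counter(), Counter()
        for sent in tagged_sentences:
            for i, (word, tag) in enumerate(sent):
                # Bi-gram candidates: AN (adjective-noun) and NN (noun-noun) patterns,
                # excluding AN pairs whose adjective is itself a sentiment word.
                if i + 1 < len(sent):
                    nxt_word, nxt_tag = sent[i + 1]
                    if nxt_tag.startswith('NN') and (
                            tag.startswith('NN') or
                            (tag.startswith('JJ') and word.lower() not in sentiment_words)):
                        bigrams[(word.lower(), nxt_word.lower())] += 1
                # Single-noun candidates, validated by the presence of a nearby
                # sentiment word (the 70% threshold of Section 3.1).
                if tag.startswith('NN') and word.lower() not in stop_words:
                    noun_counts[word.lower()] += 1
                    nearby = sent[max(0, i - window):i + window + 1]
                    if any(w.lower() in sentiment_words for w, _ in nearby):
                        noun_hits[word.lower()] += 1
        single_nouns = {n for n, c in noun_counts.items()
                        if noun_hits[n] / c >= noun_ratio}
        # A further frequency filter (k_bg = n/20 reviews for bi-grams and
        # k_sn = 10 * k_bg for single nouns) would be applied in a second pass.
        return bigrams, single_nouns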
3.2 Sentiment Analysis

To determine the sentiment of the topics in the product topic set we use a method similar to the opinion pattern mining technique proposed by [Moghaddam and Ester, 2010] for extracting opinions from unstructured product reviews. Once again we use the sentiment lexicon from [Hu and Liu, 2004b] as the basis for this analysis. For a given topic Ti, and corresponding review sentence Sj from review Rk (that is, the sentence in Rk that includes Ti), we determine whether there are any sentiment words in Sj. If there are none then this topic is marked as neutral from a sentiment perspective. If there are sentiment words (w1, w2, ...) then we identify the word (wmin) which has the minimum word-distance to Ti.

Next we determine the part-of-speech tags for wmin, Ti, and any words that occur between wmin and Ti. This POS sequence corresponds to an opinion pattern. For example, in the case of the bi-gram topic noise reduction and the review sentence "...this camera has great noise reduction...", wmin is the word "great", which corresponds to the opinion pattern JJ-TOPIC as per [Moghaddam and Ester, 2010].
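As an illustration of this step, the sketch below locates wmin for a topic occurrence in a tagged sentence and builds the corresponding opinion pattern; the indexing convention and names are assumptions made for the example, not the authors' code.

    # Sketch of opinion-pattern construction for one topic mention (illustrative only).
    def opinion_pattern(tagged_sentence, topic_index, sentiment_words):
        """tagged_sentence: list of (word, POS-tag); topic_index: position of the topic.
        Returns (index of wmin, opinion pattern), or (None, None) when the sentence
        contains no sentiment word, in which case the topic is treated as neutral."""
        candidates = [i for i, (w, _) in enumerate(tagged_sentence)
                      if w.lower() in sentiment_words]
        if not candidates:
            return None, None
        wmin = min(candidates, key=lambda i: abs(i - topic_index))
        if wmin < topic_index:
            between = [tag for _, tag in tagged_sentence[wmin:topic_index]]
            pattern = '-'.join(between + ['TOPIC'])   # e.g. "JJ-TOPIC" for "great noise reduction"
        else:
            between = [tag for _, tag in tagged_sentence[topic_index + 1:wmin + 1]]
            pattern = '-'.join(['TOPIC'] + between)
        return wmin, pattern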
Once an entire pass of all topics has been completed we can compute the frequency of all opinion patterns that have been recorded. A pattern is deemed to be valid (from the perspective of our ability to assign sentiment) if it occurs more than the average number of occurrences over all patterns [Moghaddam and Ester, 2010]. For valid patterns we assign sentiment based on the sentiment of wmin and subject to whether Sj contains any negation terms within a 4-word-distance of wmin. If there are no such negation terms then the sentiment assigned to Ti in Sj is that of the sentiment word in the sentiment lexicon. If there is a negation word then this sentiment is reversed. If an opinion pattern is deemed not to be valid (based on its frequency) then we assign a neutral sentiment to each of its occurrences within the review set.
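The assignment rules above can be sketched as follows, where pattern_counts holds the corpus-wide frequency of each opinion pattern from the first pass; the negation list, data layout and names are illustrative assumptions rather than the authors' implementation.

    # Sketch of the sentiment assignment rules of Section 3.2 (illustrative only).
    NEGATION_TERMS = {'not', 'no', 'never', "n't", 'hardly'}   # assumed negation list

    def assign_sentiment(tokens, wmin, pattern, pattern_counts, pos_words, neg_words):
        """tokens: the words of sentence Sj; wmin: index of the nearest sentiment word;
        pattern: the opinion pattern recorded for this topic occurrence."""
        avg = sum(pattern_counts.values()) / len(pattern_counts)
        if pattern_counts.get(pattern, 0) <= avg:
            return '='                                 # invalid (rare) pattern -> neutral
        word = tokens[wmin].lower()
        sentiment = '+' if word in pos_words else '-' if word in neg_words else '='
        window = tokens[max(0, wmin - 4):wmin + 5]     # negation within a 4-word distance
        if any(t.lower() in NEGATION_TERMS for t in window):
            sentiment = {'+': '-', '-': '+'}.get(sentiment, '=')
        return sentiment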
4 Classifying Helpful Reviews

In the previous section we described our approach for automatically mining topics (T1, ..., Tm) from review texts and assigning sentiment values to them. Now we can associate each review Ri with sentiment tuples, (Ri, Sj, Tk, +/-/=), corresponding to a sentence Sj containing topic Tk with a sentiment value positive (+), negative (-), or neutral (=).

To build a classifier for predicting review helpfulness we adopt a supervised machine learning approach. In the data that is available to us each review has a helpfulness score that reflects the percentage of positive votes that it has received, if any. In this work we label a review as helpful if and only if it has a helpfulness score in excess of 0.75. All other reviews are labeled as unhelpful; thus we adopt a similar approach to that described by [O'Mahony and Smyth, 2009].

To represent review instances we rely on a standard feature-based encoding using a set of 7 different types of features including temporal information (AGE), rating information (RAT), simple sentence and word counts (SIZE), topical coverage (TOP), sentiment information (SENT), readability metrics (READ), and content made up of the top 50 most popular topics extracted from the reviews (CNT). These different types, and the corresponding individual features, are summarised in Table 1. Some of these features, such as rating, word and sentence length, date and readability, have been considered in previous work [Kim et al., 2006; Liu et al., 2008; O'Mahony and Smyth, 2010] and reflect best practice in the field of review classification. But the topical and sentiment features (explained in detail below) are novel, and in this paper our comparison of the performance of the different feature sets is intended to demonstrate the efficacy of our new features (in isolation and combination) in comparison to classical benchmarks across a common dataset and experimental setup.

[Table 1: Classification Feature Sets.]

4.1 From Topics and Sentiment to Classification Features

For each review Rk, we assign a collection of topics (topics(Rk) = T1, T2, ..., Tm) and corresponding sentiment scores (pos/neg/neutral) which can be considered in isolation and/or in aggregate as the basis for classification features. For example, we can encode information about a review's breadth (see Equation 1) and depth of topic coverage by simply counting the number of topics contained within the review and the average word count associated with the corresponding review sentences, as in Equation 2. Similarly, we can aggregate the popularity of review topics, relative to the topics across the product as a whole, as in Equation 3 (with rank(Ti) as a topic's popularity rank for the product and UniqueTopics(Rk) as the set of unique topics in a review); so if a review covers many popular topics then it receives a higher score than if it covers fewer rare topics.

Breadth(R_k) = |topics(R_k)|   (1)

Depth(R_k) = \frac{\sum_{T_i \in topics(R_k)} len(sentence(R_k, T_i))}{Breadth(R_k)}   (2)

TopicRank(R_k) = \sum_{T_i \in UniqueTopics(R_k)} \frac{1}{rank(T_i)}   (3)

When it comes to sentiment we can formulate a variety of classification features, from the number of positive (NumPos and NumUPos), negative (NumNeg and NumUNeg) and neutral (NumNeutral and NumUNeutral) topics (total and unique) in a review, to the rank-weighted number of positive (WPos), negative (WNeg), and neutral (WNeutral) topics, to the relative sentiment, positive (RelUPos), negative (RelUNeg), or neutral (RelUNeutral), of a review's topics; see Table 1.

We also include a measure of the relative density of opinionated (non-neutral sentiment) topics in a review (see Equation 4) and a relative measure of the difference between the overall review sentiment and the user's normalized product rating, i.e. SignedRatingDiff(R_k) = RelUPos(R_k) - NormUserRating(R_k); we also compute an unsigned version of this metric. The intuition behind the rating difference metrics is to note whether the user's overall rating is similar to or different from the positivity of their review content. Finally, as shown in Table 1, each review instance also encodes a vector of the top 50 most popular review topics (CNT), indicating whether each is present in the review or not.

Density(R_k) = \frac{|pos(topics(R_k))| + |neg(topics(R_k))|}{|topics(R_k)|}   (4)
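For illustration, the sketch below computes the Breadth, Depth, TopicRank and Density features of Equations 1-4 from the per-review sentiment tuples; the tuple layout and the topic_rank mapping (1 = most popular topic for the product) are assumptions made for the example.

    # Sketch of the topic/sentiment features of Equations 1-4 (illustrative only).
    def topic_sentiment_features(tuples, topic_rank):
        """tuples: list of (sentence_text, topic, sentiment) for one review,
        with sentiment one of '+', '-', '='; topic_rank maps topic -> product rank."""
        topics = [t for _, t, _ in tuples]
        breadth = len(topics)                                         # Equation 1
        depth = (sum(len(s.split()) for s, _, _ in tuples) / breadth
                 if breadth else 0.0)                                 # Equation 2
        rank_score = sum(1.0 / topic_rank[t] for t in set(topics))    # Equation 3
        opinionated = sum(1 for _, _, sent in tuples if sent != '=')
        density = opinionated / breadth if breadth else 0.0           # Equation 4
        return {'Breadth': breadth, 'Depth': depth,
                'TopicRank': rank_score, 'Density': density}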
4.2 Expanding Basic Features

Each of the basic features in Table 1 is calculated for a given single review. For example, we may calculate the breadth of review Ri to be 5, indicating that it covers 5 identified topics. Is this a high or low value for the product in question, which may have tens or even hundreds of reviews written about it? For this reason, in addition to this basic feature value, we include 4 other variations as follows to reflect the distribution of its values across a particular product:

• The mean value for this feature across the set of reviews for the target product.
• The standard deviation of the values for this feature across the target product reviews.
• The normalised value for the feature based on the number of standard deviations above (+) or below (-) the mean.
• The rank of the feature value, based on a descending ordering of the feature values for the target product.

Accordingly, most of the features outlined in Table 1 translate into 5 different actual features (the original plus the above 4 variations) for use during classification. This is the case for every feature (30 in all) in Table 1 except for the content features (CNT). Thus each review instance is represented as a total set of 200 features ((30 × 5) + 50).
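A minimal sketch of this expansion, assuming the values of one basic feature for all reviews of a single product are held in a dictionary; the names are illustrative.

    # Sketch of the per-product feature expansion of Section 4.2 (illustrative only).
    from statistics import mean, pstdev

    def expand_feature(values):
        """values: {review_id: feature_value} for all reviews of one product."""
        vals = list(values.values())
        mu, sigma = mean(vals), pstdev(vals)
        order = sorted(values, key=values.get, reverse=True)          # descending rank
        rank = {rid: pos + 1 for pos, rid in enumerate(order)}
        return {rid: {'raw': v,
                      'product_mean': mu,
                      'product_std': sigma,
                      'z_score': (v - mu) / sigma if sigma else 0.0,  # std devs above/below mean
                      'rank': rank[rid]}
                for rid, v in values.items()}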
5 Evaluation

We have described techniques for extracting topical features from product reviews and an approach for assigning sentiment to review sentences that cover these topics. Our hypothesis is that these topical and sentiment features will help when it comes to the automatic classification of user generated reviews, into helpful and unhelpful categories, by improving classification performance above and beyond more traditional features (e.g. terms, ratings, readability); see [Kim et al., 2006; O'Mahony et al., 2009]. In this section we test this hypothesis on real-world review data for a variety of product categories using a number of different classifiers.

5.1 Datasets & Setup

The review data for this experiment was extracted from Amazon.com during October 2012: 51,837 reviews from 1,384 unique products. We focused on 4 product categories — Digital Cameras (DC), GPS Devices, Laptops, Tablets — and labeled reviews as helpful or unhelpful, depending on whether their helpfulness score was above 0.75 or not, as described in Section 4. For the purpose of this experiment, all reviews included at least 5 helpfulness scores (to provide a reliable ground-truth) and the helpful and unhelpful sets were sampled so as to contain approximately the same number of reviews. Table 2 presents a summary of these data, per product type, including the average helpfulness scores across all reviews, and separately for helpful and unhelpful reviews.

Category       #Reviews   #Prod.   Avg. Helpfulness (Help. / Unhelp. / All)
DC             3180       113      0.93 / 0.40 / 0.66
GPS Devices    2058       151      0.93 / 0.46 / 0.69
Laptops        4172       592      0.93 / 0.40 / 0.67
Tablets        6652       241      0.92 / 0.39 / 0.65

Table 2: Filtered and Balanced Datasets.

Each review was processed to extract the classification features described in Section 4. Here we are particularly interested in understanding the classification performance of different categories of features. In this case we consider 8 different categories: AGE, RAT, SIZE, TOP, SENT-1, SENT-2, READ and CNT. Note that we have split the sentiment features (SENT) into two groups, SENT-1 and SENT-2. The latter contains all of the sentiment features from Table 1 whereas
the former excludes the ratings difference features (signed and unsigned) so that we can better gauge the influence of rating information (usually a powerful classification feature in its own right) within the sentiment feature-set. Accordingly, we prepared corresponding datasets for each category (Digital Cameras, GPS Devices, Laptops and Tablets) in which the reviews were represented by a single set of features; for example, the SENT-1 dataset consists of reviews (one set of reviews for each product category) represented according to the SENT-1 features only.

For the purpose of this evaluation we used three commonly used classifiers: RF (Random Forest), JRip and NB (Naive Bayes); see [Witten and Frank, 2005]. In each case we evaluated classification performance, in terms of the area under the ROC curve (AUC), using 10-fold cross validation.
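This evaluation protocol can be reproduced along the following lines with scikit-learn, assuming a feature matrix X and binary helpful/unhelpful labels y; note that JRip is a Weka rule learner with no direct scikit-learn counterpart, so only RF and NB are shown in this sketch.

    # Sketch of the 10-fold cross-validated AUC evaluation (illustrative only).
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    def evaluate(X, y):
        """Return the mean AUC over 10-fold cross validation for each classifier."""
        classifiers = {'RF': RandomForestClassifier(n_estimators=100, random_state=0),
                       'NB': GaussianNB()}
        return {name: cross_val_score(clf, X, y, cv=10, scoring='roc_auc').mean()
                for name, clf in classifiers.items()}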
5.2 Results

The results are presented in Figures 3(a-d). In Figures 3(a-c) we show the AUC performance for each classification algorithm (RF, JRip, NB) separately; each graph plots the AUC of one algorithm for the 8 different categories of classification features for each of the four different product categories (DC, GPS, Laptop, and Tablet). Figure 3(d) provides a direct comparison of all classification algorithms (RF, JRip, NB); here we use a classifier using all features combined. AUC values in excess of 0.7 can be considered as useful from a classification performance viewpoint [Streiner and Cairney, 2007]. Overall we can see that RF tends to produce better classification performance across the various feature groups and product categories. Classification performance tends to be poorer for the GPS dataset compared to Laptop, Tablet, and DC.

We know from previous research that ratings information proves to be particularly useful when it comes to evaluating review helpfulness [Kim et al., 2006]. It is perhaps no surprise therefore to see that our ratings-based features perform well, often achieving an AUC > 0.7 on their own; for example, in Figure 3(a) we see an AUC of approximately 0.75 for the Laptop and Tablet datasets, compared to between 0.65 and 0.69 for GPS and DC, respectively. Other 'traditional' feature groups (AGE, SIZE, READ, and CNT) rarely manage to achieve AUC scores > 0.7 across the product categories.

We can see strong performance from the new topic and sentiment feature-sets proposed in this work. The SENT-2 features consistently and significantly outperform all others, with AUC scores in excess of 0.7 for all three algorithms and across all four product categories; indeed, in some cases the SENT-2 features deliver AUC greater than 0.8 for DC, Laptop and Tablet products; see Figure 3(a). The SENT-2 feature group benefits from a combination of sentiment and ratings based features, but a similar observation can be made for the sentiment-only features of SENT-1, which also achieve AUC greater than 0.7 for almost all classification algorithms and product categories. Likewise, the topical features (TOP) also deliver a strong performance, with AUC > 0.7 for all product categories except for GPS.

These results bode well for a practical approach to review helpfulness prediction/classification, with or without ratings data. The additional information contained within the topical and sentiment features contributes to an uplift in classification performance, particularly with respect to more conventional features that have been traditionally used for review classification. In Figure 3(d) we present summary classification results according to product category when we build classifiers using the combination of all types of features. Once again we can see strong classification performance. We achieve an AUC of more than 0.7 for all conditions and the RF classifier delivers an AUC close to 0.8 or beyond for all categories.

6 Recommending Helpful Reviews

In many situations users are faced with a classical information overload problem: sifting through potentially hundreds or even thousands of product opinions. Sites like Amazon collect review helpfulness feedback so that they can rank reviews by their average helpfulness scores, but this is far from perfect. Many reviews (often a majority) have received very few or no helpfulness scores. This is especially true for more recent reviews, which arguably may be more reliable in the case of certain product categories (e.g. hotel rooms). Moreover, if reviews are sorted by helpfulness then it is unlikely that users will get to see those yet to be rated, making it even less likely that they will attract ratings. It quickly becomes a case of "the rich get richer" for those early rated helpful reviews. This is one of the strong motivations behind our own work on review classification, but can our classifier be used to recommend helpful reviews to the end user?

Amazon currently adopts a simple approach to review recommendation, by suggesting the most helpful positive and most helpful critical review from a review collection. To evaluate the ability of our classifier to make review recommendations we can use the classification confidence as one simple way to rank-order helpful reviews and select the top-ranked review for recommendation to the user. In this experiment we select the single most confident helpful review for each individual product across the four different product categories; we refer to this strategy as Pred. Remember we are making this recommendation without the presence of actual helpfulness scores and rely only on our ability to predict whether a review will be helpful. In this experiment we use an RF classifier using all features. As a baseline recommendation strategy we also select a review at random; we call this strategy Rand.

We can test the performance of these recommendation techniques in two ways. First, because we know the actual helpfulness scores of all reviews (the ground-truth), we can compare the recommended review to the review which has the actual highest helpfulness score for each product, and average across all products in a given product class. Thus the two line graphs in Figure 4 plot the actual helpfulness of the recommended reviews (for Pred and Rand) as a percentage of the actual helpfulness of the most helpful review for each product; we call this the helpfulness ratio (HR). We can see that Pred significantly outperforms Rand, delivering a helpfulness ratio of 0.9 and higher compared to approximately 0.7 for Rand. This means that Pred is capable of recommending a review that has an average helpfulness score that is 90% that of the actual most helpful review.
[Figure 3: Classification performance results: (a-c) for RF, JRip and NB classifiers and different feature groups; (d) comparison of RF, JRip and NB for all features.]