01 What Is Text Classification 8-12

Uploaded by

idhitappu

In this lecture, we'll introduce the topic
of text classification and the Naive Bayes
algorithm, which is one of the most
important ways of doing text
classification.

Let's begin by looking at some examples of
text classification applications. Here
I've shown an email that I actually
received the other day. How do I know that
this email is spam? Take a look at the
mail and think of some features you might
automatically extract from this email that
tell you that it's spam. You might notice
the word "greats", a misspelling of
"great", so we have a typo here. You might
notice "important notice" and maybe an
exclamation point. It's pretty rare that
universities put exclamation points in
their subject headers. You might notice
that there's no "Dan" here: it's not
addressed to me in particular, we have
undisclosed recipients, and there's no
particular address. And the URL is a
little funny here; that's not a Stanford
URL. Maybe the word "exciting". Each of
these features can be combined in a
classifier to give us some evidence that
we've got a piece of spam.
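
To make the idea of automatically
extracted features concrete, here is a
minimal Python sketch. It is only an
illustration: the function name, the
feature names, the cue-word list, and the
sample email are all assumptions, not
something shown in the lecture.

```python
import re

# Hypothetical cue words, taken from the examples mentioned above.
SUSPICIOUS_WORDS = {"greats", "exciting"}


def extract_spam_features(subject, to_field, body,
                          trusted_domain="stanford.edu"):
    """Return a dict of binary features for one email."""
    lowered = body.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    # Host part of every http(s) URL found in the body.
    hosts = re.findall(r"https?://([^/\s]+)", lowered)
    return {
        "exclamation_in_subject": "!" in subject,
        "undisclosed_recipients": "undisclosed" in to_field.lower(),
        "has_suspicious_word": bool(words & SUSPICIOUS_WORDS),
        "url_outside_trusted_domain": any(
            not (h == trusted_domain or h.endswith("." + trusted_domain))
            for h in hosts
        ),
    }


# Invented example email, loosely modeled on the cues described above.
features = extract_spam_features(
    subject="Important notice!",
    to_field="undisclosed-recipients:;",
    body="Greats news, an exciting offer: http://example.com/claim",
)
print(features)
```

Each entry in the resulting dictionary is
one piece of evidence; a classifier would
weigh them together rather than trust any
single one.
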
Another important text classification
task is authorship attribution: how do I
know which author wrote which piece of
text? One of the most famous examples of
authorship attribution involves the
anonymous essays called the Federalist
Papers, which were written at the
beginning of the history of our country,
in part to convince the state of New York
to ratify the early Constitution. Three
authors wrote various numbers of the
letters, but for twelve of the letters it
wasn't clear which author wrote them. In
1963, Mosteller and Wallace showed that
Bayesian methods were able to distinguish
which letters were written by Madison and
which letters were written by Hamilton.
The Bayesian methods that they used in
1963 gave rise to the Naive Bayes method
that we're going to be talking about
today.

Another text classification task is
gender identification: determining whether
an author is male or female. Recent
research in gender identification has
shown that the number of pronouns and
other features, such as the number of
determiners and the number of noun
phrases, are subtly indicative of the
difference between male and female
writers. Female writers tend to use more
pronouns, and male writers tend to use
more facts and determiners in their noun
phrases. And you can see that here we
have a lot of pronouns, and here we have a
lot of determiners and factual sentences
with copula verbs. So you might determine
that this is in fact a male and this is a
female, and that would be correct. This is
the author Margaret Drabble and this is
the author Anthony Grey.

Another text classification
task is sentiment analysis. One of the
classic sentiment analysis tasks is movie
review identification: given a review,
whether it's of a movie or a product, can
I tell whether this review is positive or
negative? Although I'm going to show you
an example here for movies, this can apply
to a review of any product or service you
might find on the web, so this is actually
a very important commercial application.
Suppose we saw a review that said
"unbelievably disappointing". Well, that's
clearly a negative review. How about "full
of zany characters and richly applied
satire"? Positive. How about "this is the
greatest screwball comedy ever filmed"?
We've got words like "greatest", or
"greatest ever"; that's very positive. How
about "It was pathetic. The worst part
about it was the boxing scenes"? Here
we've got evidence like "pathetic" and
"worst", and so on, to tell us that this
is, in fact, a negative review.

We also
apply text classification to scientific
articles, for example, deciding what the
topic of a particular article in a
database like MEDLINE might be. When
automatically indexing an article, we
might have to decide which of various
subjects, such as antagonists, blood
supply, drug therapy, or epidemiology,
apply to any particular article in our
database.

In summary, text classification
is the task of assigning any kind of topic
category to any piece of text. That could
be subject categories in some kind of
online database. It could be detecting
spam. It could be choosing an author from
a set of authors, or choosing their gender
or maybe their age, if you want to find
the young writers or the old writers. It
could be telling whether a text was
written in one language versus another
language. And there is the important
application of sentiment analysis. All of
these are examples of text classification.

Let's
define the task of text classification. We
have as input a document d and a fixed set
of classes C with J classes, c1, c2, up to
cJ. Our goal, given this document and the
set of classes, is to predict a class c
from that set of classes. Our job is to
take a document and assign a class to that
document.

How do we
do this? The simplest possible text
classification method is to use
hand-written rules. For example, if we're
doing spam detection, we might just have a
blacklist of bad email addresses whose
owners are probably spammers. Or we might
look for phrases like "millions of
dollars" or "you have been selected".
These are good indications that we have
spam. And if these rules are carefully
refined by an expert, you can get high
accuracy from hand-written rules.
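
A rule-based spam detector of this kind
might look like the following Python
sketch. The blacklist entry, the function
name, and the sample messages are
hypothetical; only the two phrases come
from the lecture.

```python
# Hypothetical blacklist and phrase list; a real system would maintain
# much larger, expert-curated versions of both.
BLACKLISTED_SENDERS = {"offers@spam.example"}
SPAM_PHRASES = ("millions of dollars", "you have been selected")


def classify_by_rules(sender, text):
    """Return "spam" if any hand-written rule fires, else "not spam"."""
    if sender.lower() in BLACKLISTED_SENDERS:
        return "spam"
    lowered = text.lower()
    if any(phrase in lowered for phrase in SPAM_PHRASES):
        return "spam"
    return "not spam"


print(classify_by_rules("friend@mail.example",
                        "You have been selected to win!"))  # spam
print(classify_by_rules("friend@mail.example",
                        "Lunch on Friday?"))  # not spam
```

Every new spam pattern needs another entry
in one of these lists, which is where the
maintenance effort goes.
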
But in general, building and maintaining
these rules is expensive. So, although
hand-coded rules are often used as part of
a text classification system, we generally
combine them with an important method from
machine learning: supervised machine
learning.

In supervised machine learning, we have a
document d, just as we did before, and a
fixed set of classes, as we did before,
but we need one more thing now. We need a
training set of m documents that have been
hand-labeled with their class. So for
document one, we know that it's in class
one. For document two, it's in some other
class. And so on up to document m, where
we have a label for the class of document
m. So, given the document, the set of
classes, and the training set of
hand-labeled documents, the goal of
machine learning is to produce a
classifier. We'll be using gamma to refer
to the classifier, and gamma is a function
that, given a new document, will give us
the class. So, given a training set of
labeled documents and classes, we'll learn
a classifier that maps a document to a
class.

There are many kinds
of machine learning classifiers. We're
going to talk today about Naive Bayes, but
later in the course we'll talk about
logistic regression, and we'll touch on
other kinds of classifiers, like support
vector machines (so-called SVMs),
k-nearest neighbors, and lots of other
classifiers. No matter which classifier we
use, the task of text classification is to
take a document, with its text and other
kinds of features, extract features that
represent the document, and build a
classifier that can tell us which class
the document belongs to.
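
As a rough illustration of the supervised
setup, the Python sketch below learns a
function gamma from a hand-labeled
training set and then applies it to a new
document. The learner is a very small
Naive Bayes classifier. Its details (bag
of words features, add-one smoothing, log
probabilities) are assumptions made for
this sketch and are not defined in this
lecture segment. The toy training set
reuses the review snippets from earlier.

```python
import math
from collections import Counter, defaultdict


def tokenize(document):
    """Feature extraction: lowercase, whitespace-separated words."""
    return document.lower().split()


def train(training_set):
    """Learn a classifier gamma from (document, class) pairs."""
    class_counts = Counter(label for _, label in training_set)
    word_counts = defaultdict(Counter)  # class -> word -> count
    for document, label in training_set:
        word_counts[label].update(tokenize(document))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = len(training_set)

    def gamma(document):
        """Map a new document to the highest-scoring class."""
        def score(label):
            total_words = sum(word_counts[label].values())
            log_prob = math.log(class_counts[label] / total_docs)
            for word in tokenize(document):
                if word in vocabulary:  # skip words unseen in training
                    count = word_counts[label][word] + 1  # add-one smoothing
                    log_prob += math.log(
                        count / (total_words + len(vocabulary)))
            return log_prob
        return max(class_counts, key=score)

    return gamma


# Toy hand-labeled training set, for illustration only.
training_set = [
    ("unbelievably disappointing", "negative"),
    ("it was pathetic", "negative"),
    ("the greatest screwball comedy ever filmed", "positive"),
    ("full of zany characters", "positive"),
]
gamma = train(training_set)
print(gamma("the greatest comedy"))  # positive
```

Swapping in a different learner would
change only the body of train; the
interface, a training set in and a
function from document to class out,
stays the same.
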
