NLP DL Lecture 1
The term-document matrix (w_ij = weight of term i in document j):

      T1    T2   …   Tt
D1    w11   w21  …   wt1
D2    w12   w22  …   wt2
 :     :     :        :
Dn    w1n   w2n  …   wtn
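Each cell of this matrix holds a term weight. As a rough illustration, a raw count matrix (f_ij) can be built like this; the toy corpus and tokenization are assumptions, not from the lecture:

```python
from collections import Counter

# Hypothetical toy corpus (not from the lecture)
docs = {
    "D1": "neural networks learn word representations",
    "D2": "recurrent neural networks process word sequences",
}

# Raw term-document matrix: counts[term][doc] = f_ij (frequency of term i in document j)
vocab = sorted({tok for text in docs.values() for tok in text.split()})
counts = {t: {d: Counter(text.split())[t] for d, text in docs.items()} for t in vocab}

for t in vocab:
    print(t, [counts[t][d] for d in docs])
```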
Term Weights: Term Frequency
• More frequent terms in a document are more important, i.e. more indicative of its topic.
  f_ij = frequency of term i in document j
• In the worked example below, the raw frequency is normalized by the largest term frequency in the document: tf_ij = f_ij / max_k f_kj.
Term Weights: Inverse Document Frequency
• Terms that appear in many different documents are less indicative of the overall topic.
  df_i  = document frequency of term i
        = number of documents containing term i
  idf_i = inverse document frequency of term i
        = log2(N / df_i)    (N: total number of documents)
• idf is an indication of a term's discrimination power.
• The log is used to dampen the effect relative to tf.
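A minimal sketch of df and idf under these definitions; the small document collection is made up for illustration:

```python
import math

# Hypothetical collection of N documents, each reduced to its set of terms
docs = [
    {"neural", "network", "language"},
    {"neural", "attention"},
    {"language", "model", "attention"},
    {"neural", "language", "model"},
]
N = len(docs)

def idf(term):
    # df_i = number of documents containing term i; idf_i = log2(N / df_i)
    df = sum(1 for d in docs if term in d)
    return math.log2(N / df)

print(idf("neural"))     # in 3 of 4 docs -> log2(4/3) ≈ 0.42, low discrimination power
print(idf("attention"))  # in 2 of 4 docs -> log2(4/2) = 1.0
```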
TF-IDF Weighting
• A typical combined term importance indicator is tf-idf weighting:
  w_ij = tf_ij · idf_i = tf_ij · log2(N / df_i)
• A term occurring frequently in the document but rarely in the rest of
the collection is given high weight.
• Many other ways of determining term weights have been proposed.
• Experimentally, tf-idf has been found to work well.
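A sketch of the combined weighting as a function; tf is normalized by the maximum frequency in the document (an assumption consistent with the example on the next slide), and the input counts are made up for illustration:

```python
import math

def tfidf_weights(counts, N=None):
    """counts: {doc: {term: f_ij}} raw frequencies. Returns {doc: {term: w_ij}}.
    Uses tf_ij = f_ij / max_k f_kj and idf_i = log2(N / df_i)."""
    N = N or len(counts)
    df = {}
    for terms in counts.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    weights = {}
    for doc, terms in counts.items():
        max_f = max(terms.values())
        weights[doc] = {t: (f / max_f) * math.log2(N / df[t]) for t, f in terms.items()}
    return weights

# Hypothetical counts, not from the lecture
print(tfidf_weights({"D1": {"cat": 3, "dog": 1}, "D2": {"dog": 2, "fish": 2}}))
```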
Computing TF-IDF -- An Example
Given a document containing terms with given frequencies:
A(3), B(2), C(1)
Assume the collection contains 10,000 documents and the
document frequencies of these terms are:
A(50), B(1300), C(250)
Then:
A: tf = 3/3; idf = log2(10000/50)   = 7.6; tf-idf = 7.6
B: tf = 2/3; idf = log2(10000/1300) = 2.9; tf-idf = 2.0
C: tf = 1/3; idf = log2(10000/250)  = 5.3; tf-idf = 1.8
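The arithmetic above can be reproduced directly; a small sketch, with tf taken as the frequency divided by the maximum frequency in the document:

```python
import math

N = 10_000                                  # total documents in the collection
freqs = {"A": 3, "B": 2, "C": 1}            # term frequencies in the document
doc_freqs = {"A": 50, "B": 1300, "C": 250}  # document frequencies in the collection
max_f = max(freqs.values())

for term, f in freqs.items():
    tf = f / max_f
    idf = math.log2(N / doc_freqs[term])
    print(f"{term}: tf = {tf:.2f}  idf = {idf:.1f}  tf-idf = {tf * idf:.1f}")
# A: tf = 1.00  idf = 7.6  tf-idf = 7.6
# B: tf = 0.67  idf = 2.9  tf-idf = 2.0
# C: tf = 0.33  idf = 5.3  tf-idf = 1.8
```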
Neural-based Approaches
Why Neural?
Neural-based Milestones for NLP
Neural Language Model
Multitask Learning
Word Embedding
Recurrent Neural Networks
RNN Common Architectures
Enhancements to RNNs
Gated Recurrent Unit
Seq2seq Architecture
Seq2seq Limitations
Attention Mechanism
Dynamic Memory Model
Pre-trained Language Model
Pre-trained Neural Language Model
Supervised Learning Problem