
Word Embedding

A report by:
Naveen Kumar
Sarthak Sharma
Ayush Purohit
Topics to be covered:
Prerequisite

Problem of previous technique

About Word Embedding

Some word embedding techniques

Programming for word embedding

Graphical visualization
Prerequisite

Need for encoding techniques

Some commonly used encoding techniques:

1. One-Hot Encoding
2. Label Encoding
3. Mapping
Need for Encoding Techniques:
Gender City Age Income (k$) Buys Product

Male Delhi 25 50 Yes

Female Mumbai 30 60 No

Male Delhi 22 45 Yes

Female Bangalore 28 55 No

Male Mumbai 35 80 Yes

Female Delhi 24 48 Yes

Male Bangalore 40 90 No

Note: Machines only understand numbers.


One Hot Encoding
One-Hot Encoding is a way to convert categories into numbers by creating a new column for each
category.

If a category is present, we put 1, otherwise 0.

Gender_Male Gender_Female City_Delhi City_Mumbai City_Bangalore Age Income (k$) Buys Product

1 0 1 0 0 25 50 Yes

0 1 0 1 0 30 60 No

1 0 1 0 0 22 45 Yes

0 1 0 0 1 28 55 No

1 0 0 1 0 35 80 Yes

0 1 1 0 0 24 48 Yes

1 0 0 0 1 40 90 No
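
As a quick illustration, here is a minimal sketch of producing this one-hot table with pandas (the DataFrame below simply re-creates the example data; pd.get_dummies is the standard helper for this):

```python
# Minimal sketch: one-hot encoding the example table with pandas.
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "City": ["Delhi", "Mumbai", "Delhi", "Bangalore", "Mumbai", "Delhi", "Bangalore"],
    "Age": [25, 30, 22, 28, 35, 24, 40],
    "Income (k$)": [50, 60, 45, 55, 80, 48, 90],
    "Buys Product": ["Yes", "No", "Yes", "No", "Yes", "Yes", "No"],
})

# get_dummies creates one 0/1 column per category of the selected features
encoded = pd.get_dummies(df, columns=["Gender", "City"])
print(encoded)
```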
Label Encoding
Label Encoding is a technique used to convert categorical variables into numerical values by assigning each
unique category a distinct integer.

Key Points:

● Each category is mapped to a unique number.

● Typically used when the categorical feature has a meaningful order or only a few distinct values (for unordered categories, One-Hot Encoding is usually preferred).

● Fast and memory-efficient.

Category Encoded Value

Red 0

Blue 1

Green 2
Label Encoding Example

Gender City Age Income (k$) Buys Product

1 1 25 50 1

0 2 30 60 0

1 1 22 45 1

0 3 28 55 0

1 2 35 80 1

0 1 24 48 1

1 3 40 90 0
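
A minimal sketch of label encoding with scikit-learn's LabelEncoder (note that it assigns integers in alphabetical order, so the exact codes may differ from the 1/2/3 values shown in the table above):

```python
# Minimal sketch: label encoding the City column with scikit-learn.
from sklearn.preprocessing import LabelEncoder

cities = ["Delhi", "Mumbai", "Delhi", "Bangalore", "Mumbai", "Delhi", "Bangalore"]

encoder = LabelEncoder()
city_codes = encoder.fit_transform(cities)          # each unique city -> one integer

print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
# {'Bangalore': 0, 'Delhi': 1, 'Mumbai': 2}
print(city_codes)                                    # [1 2 1 0 2 1 0]
```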
Mapping
Mapping encoding is a technique to convert categorical data into numerical data by assigning specific
numbers to each category manually.

It gives flexibility to assign any number to any category based on your choice or logic.

Example:
Suppose we have City = {Delhi, Mumbai, Bangalore}.
We can map them in any way we choose, for example:
● Delhi → 10, Mumbai → 20, Bangalore → 30
● Delhi → 5, Mumbai → 6, Bangalore → 7
● Delhi → 1, Mumbai → 2, Bangalore → 3
Mapping used for the example table below:
Gender: Male → 5, Female → 10
City: Delhi → 100, Mumbai → 200, Bangalore → 300
Buys Product: Yes → 7, No → 3

Gender City Age Income (k$) Buys Product

5 100 25 50 7

10 200 30 60 3

5 100 22 45 7

10 300 28 55 3

5 200 35 80 7

10 100 24 48 7

5 300 40 90 3
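
A minimal sketch of this manual mapping with pandas .map(), using the same hand-picked numbers as above:

```python
# Minimal sketch: manual mapping encoding with pandas .map().
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "City": ["Delhi", "Mumbai", "Delhi", "Bangalore"],
    "Buys Product": ["Yes", "No", "Yes", "No"],
})

# The mapping is chosen manually, based on our own logic
gender_map = {"Male": 5, "Female": 10}
city_map = {"Delhi": 100, "Mumbai": 200, "Bangalore": 300}
buys_map = {"Yes": 7, "No": 3}

df["Gender"] = df["Gender"].map(gender_map)
df["City"] = df["City"].map(city_map)
df["Buys Product"] = df["Buys Product"].map(buys_map)
print(df)
```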
One Hot Encoding vs Label Encoding vs Mapping
What it does: Label Encoding gives a number to each value; One-Hot Encoding makes a new column for each value; with Mapping you decide the numbers yourself.

When to use: Label Encoding when categories have an order or only a few types; One-Hot Encoding when categories have no order; Mapping when you know the ranking.

Result: Label Encoding gives one column with numbers; One-Hot Encoding gives many columns with 0s and 1s; Mapping gives one column with custom numbers.

Memory use: Label Encoding uses very little; One-Hot Encoding uses more (because of more columns); Mapping uses very little.

Example: Label Encoding: Red → 0, Blue → 1, Green → 2; One-Hot Encoding: Red → [1,0,0], Blue → [0,1,0]; Mapping: Low → 5, Medium → 25, High → 150.
Problem with previous Encoding techniques
Label Encoding Problems:

● The machine may think the numbers have an order (like 0 < 1 < 2), even when there is no real order.

● This can confuse models and lead to wrong predictions.

One Hot Encoding Problems:

● Creates many new columns (especially if there are many categories).

● Takes more memory and slows down training.

● Can cause "curse of dimensionality" (too many features).

Mapping Problems:

● Depends on human logic; if wrong numbers are assigned, the model learns wrong patterns.

● Not automatic — needs manual work and good domain knowledge.


Main Drawback:

The previously discussed encoding techniques can convert categorical data into numerical form.

Previous methods => Categorical → Numerical form (good work)

But they do not tell us anything about the relationships / similarities between these words.

Hence, Word Embedding comes into the picture.

Word Embedding => Categorical → Numerical + Similarity (nice work)


Word Embedding
What is Word Embedding?

● Word Embedding is a way to turn words into numbers, but in a smart way.
● It gives similar words similar numbers.
● Instead of simple numbers (like 1, 2, 3), words are mapped into vectors (groups of numbers) that capture their meaning.

Key Points:

● It helps machine learning models understand the meaning of words.


● Used in NLP tasks like translation, sentiment analysis, chatbots, etc.

Need for Word Embedding?


● To reduce dimensionality
● To use a word to predict the words around it.
● Inter-word semantics must be captured.

How are Word Embeddings used?


● They are used as input to machine learning models:
take the words → get their numeric representations → use them for training or inference.
● They can also be used to represent or visualize underlying patterns of usage in the corpus that was used to train them.
Some Word Embedding Techniques
Word2Vec: Uses shallow neural networks to learn word associations. Models: CBOW (predicts the center word from its context) and Skip-Gram (predicts the context from the center word). Efficient at capturing semantic relationships.

GloVe: Matrix factorization technique based on word co-occurrence matrix, capturing both local and global
statistics.

FastText: Represents words as character n-grams, improving handling of out-of-vocabulary words and subword information (a short sketch follows after this list).

ELMo: Context-dependent embeddings from bidirectional LSTM, offering dynamic word representations based
on context.

BERT: Pre-trains deep bidirectional representations, using context from both directions to generate contextual
embeddings for superior NLP task performance.

Transformer Models (e.g., GPT-3, T5): Use attention mechanisms for generating highly accurate, complex, and
contextual embeddings.
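
As a small illustration of FastText's subword idea, here is a sketch using Gensim on a toy corpus (the sentences and hyperparameters are made up for the example). Because vectors are built from character n-grams, even a word never seen in training still gets an embedding:

```python
# Minimal sketch: FastText with Gensim, showing out-of-vocabulary handling.
from gensim.models import FastText

sentences = [["i", "love", "dogs"],
             ["i", "love", "cats"],
             ["dogs", "chase", "cats"]]

model = FastText(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["dogs"][:5])     # in-vocabulary word
print(model.wv["doggos"][:5])   # out-of-vocabulary word: vector built from its character n-grams
```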
Example for better clarity
Example Continued
Example Visualization
Programming - Part 1 - (From Scratch)
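
Below is one possible from-scratch sketch of Skip-Gram style embeddings on a toy corpus, using only NumPy (the corpus, vector size, and learning rate are assumptions chosen for illustration):

```python
# Minimal sketch: Skip-Gram style word embeddings from scratch with NumPy.
import numpy as np

corpus = ["i love dogs", "i love cats", "dogs and cats are animals"]
tokens = [sentence.split() for sentence in corpus]

# Build the vocabulary
vocab = sorted({w for sent in tokens for w in sent})
word2idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Build (center, context) training pairs with a window of 1
pairs = []
for sent in tokens:
    for i, center in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                pairs.append((word2idx[center], word2idx[sent[j]]))

# Two weight matrices: input (embedding) vectors and output vectors
dim, lr = 10, 0.05
rng = np.random.default_rng(42)
W_in = rng.normal(scale=0.1, size=(V, dim))
W_out = rng.normal(scale=0.1, size=(dim, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for epoch in range(200):
    for center, context in pairs:
        h = W_in[center]                     # center word's embedding
        probs = softmax(W_out.T @ h)         # predicted distribution over the vocabulary
        grad = probs.copy()
        grad[context] -= 1.0                 # gradient of cross-entropy loss w.r.t. the scores
        grad_in = W_out @ grad               # gradient w.r.t. the center embedding
        W_out -= lr * np.outer(h, grad)      # update output vectors
        W_in[center] -= lr * grad_in         # update the center word's embedding

print("Embedding for 'dogs':", W_in[word2idx["dogs"]])
```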
Programming - Part 2 - Using the Gensim Library (Word2Vec)
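
A minimal sketch using Gensim's Word2Vec (gensim 4.x API); the toy sentences and hyperparameters below are assumptions for illustration:

```python
# Minimal sketch: training Word2Vec with the Gensim library.
from gensim.models import Word2Vec

sentences = [
    ["i", "love", "dogs"],
    ["i", "love", "cats"],
    ["dogs", "and", "cats", "are", "animals"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embeddings
    window=2,         # context window size
    min_count=1,      # keep every word, even if it appears only once
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
    epochs=100,
)

print(model.wv["dogs"])                        # the learned vector for "dogs"
print(model.wv.most_similar("dogs", topn=3))   # nearest words by cosine similarity
```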
Output: the learned vector for the queried word and its most similar words (exact values vary from run to run).
Example of Bag of Words
Definition:

● Represents text by counting word occurrences.


● Word order is ignored, only frequency matters.
● Each document is converted into a vector.

Example: Sentences:

● I love dogs
● I love cats

Vocabulary: [I, love, dogs, cats]

Word "I love dogs" "I love cats"

I 1 1

love 1 1

dogs 1 0

cats 0 1
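
This count table can be reproduced with scikit-learn's CountVectorizer, as in the sketch below (the token pattern is widened so the one-letter word "I" is not dropped, and text is lowercased by default):

```python
# Minimal sketch: Bag-of-Words counts with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love dogs", "I love cats"]

vectorizer = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
counts = vectorizer.fit_transform(docs)       # sparse matrix of word counts

print(vectorizer.get_feature_names_out())     # ['cats' 'dogs' 'i' 'love']
print(counts.toarray())                       # [[0 1 1 1]
                                              #  [1 0 1 1]]
```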
Thank You 😊
