Word Embedding
A report by :-
Naveen Kumar
Sarthak Sharma
Ayush Purohit
Topics to be covered:-
Prerequisite
Graphical visualization
Prerequisite
1. One-Hot Encoding
2. Label Encoding
3. Mapping
Need for Encoding Techniques:-
Gender | City      | Age | Income (k$) | Buys Product
Female | Mumbai    | 30  | 60          | No
Female | Bangalore | 28  | 55          | No
Male   | Bangalore | 40  | 90          | No
Gender_Male | Gender_Female | City_Delhi | City_Mumbai | City_Bangalore | Age | Income (k$) | Buys Product
1 | 0 | 1 | 0 | 0 | 25 | 50 | Yes
0 | 1 | 0 | 1 | 0 | 30 | 60 | No
1 | 0 | 1 | 0 | 0 | 22 | 45 | Yes
0 | 1 | 0 | 0 | 1 | 28 | 55 | No
1 | 0 | 0 | 1 | 0 | 35 | 80 | Yes
0 | 1 | 1 | 0 | 0 | 24 | 48 | Yes
1 | 0 | 0 | 0 | 1 | 40 | 90 | No
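To make this concrete, here is a minimal sketch of One-Hot Encoding with pandas. The small DataFrame and the pd.get_dummies call are illustrative assumptions, not code from the original slides.

```python
# A minimal One-Hot Encoding sketch with pandas (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male"],
    "City": ["Mumbai", "Bangalore", "Bangalore"],
    "Age": [30, 28, 40],
    "Income (k$)": [60, 55, 90],
    "Buys Product": ["No", "No", "No"],
})

# One new 0/1 column is created for every distinct Gender and City value.
# dtype=int gives integer 0/1 columns instead of booleans.
encoded = pd.get_dummies(df, columns=["Gender", "City"], dtype=int)
print(encoded)
```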
Label Encoding
Label Encoding is a technique used to convert categorical variables into numerical values by assigning each
unique category a distinct integer.
Key Points:
Red → 0
Blue → 1
Green → 2
Label Encoding Example
(Male → 1, Female → 0; Delhi → 1, Mumbai → 2, Bangalore → 3; Yes → 1, No → 0)
Gender | City | Age | Income (k$) | Buys Product
1 | 1 | 25 | 50 | 1
0 | 2 | 30 | 60 | 0
1 | 1 | 22 | 45 | 1
0 | 3 | 28 | 55 | 0
1 | 2 | 35 | 80 | 1
0 | 1 | 24 | 48 | 1
1 | 3 | 40 | 90 | 0
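A minimal sketch of Label Encoding with scikit-learn's LabelEncoder is shown below. Note that LabelEncoder numbers the categories alphabetically, so the exact integers can differ from the Red → 0, Blue → 1, Green → 2 assignment above; the data here is only an illustration.

```python
# A minimal Label Encoding sketch with scikit-learn (illustrative data).
from sklearn.preprocessing import LabelEncoder

colors = ["Red", "Blue", "Green", "Red", "Green"]

le = LabelEncoder()
codes = le.fit_transform(colors)

# LabelEncoder sorts the categories alphabetically before assigning integers.
print(list(le.classes_))  # ['Blue', 'Green', 'Red'] -> codes 0, 1, 2
print(list(codes))        # [2, 0, 1, 2, 1]
```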
Mapping
Mapping is a technique to convert categorical data into numerical data by manually assigning a specific number to each category.
It gives flexibility to assign any number to any category based on your choice or logic.
Example:
Suppose we have City = {Delhi, Mumbai, Bangalore}.
We can map them like this:
● Delhi → 10, Mumbai → 20, Bangalore → 30
● or Delhi → 5, Mumbai → 6, Bangalore → 7
● or Delhi → 1, Mumbai → 2, Bangalore → 3
Gender:
○ Male → 5
○ Female → 10
City:
○ Delhi → 100
○ Mumbai → 200
○ Bangalore → 300
Buys Product:
● Yes → 7
● No → 3
Gender | City | Age | Income (k$) | Buys Product
5 | 100 | 25 | 50 | 7
10 | 200 | 30 | 60 | 3
5 | 100 | 22 | 45 | 7
10 | 300 | 28 | 55 | 3
5 | 200 | 35 | 80 | 7
10 | 100 | 24 | 48 | 7
5 | 300 | 40 | 90 | 3
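Below is a minimal sketch of manual Mapping with pandas, using the dictionaries from the slide (Male → 5, Female → 10, and so on). The DataFrame rows are illustrative.

```python
# A minimal manual Mapping sketch with pandas .map() (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "City": ["Delhi", "Mumbai", "Bangalore"],
    "Buys Product": ["Yes", "No", "Yes"],
})

# The numbers are chosen by hand, exactly as on the slide.
gender_map = {"Male": 5, "Female": 10}
city_map = {"Delhi": 100, "Mumbai": 200, "Bangalore": 300}
buys_map = {"Yes": 7, "No": 3}

df["Gender"] = df["Gender"].map(gender_map)
df["City"] = df["City"].map(city_map)
df["Buys Product"] = df["Buys Product"].map(buys_map)
print(df)
```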
One Hot Encoding vs Label Encoding vs Mapping
Feature      | Label Encoding                      | One Hot Encoding                  | Mapping
What it does | Gives a number to each value        | Makes a new column for each value | You decide the numbers yourself
When to use  | Categories have order or few types  | Categories have no order          | When you know the ranking
Result       | One column with numbers             | Many columns with 0s and 1s       | One column with custom numbers
Label Encoding Problems:
● The machine may think the numbers have an order (like 0 < 1 < 2), even if there is no real order.
Mapping Problems:
● Depends on human logic; if wrong numbers are assigned, the model learns wrong patterns.
But none of these encodings tell us about the relationships / similarities between words.
● Word Embedding is a way to turn words into numbers, but in a smart way.
● It gives similar words similar numbers.
● Instead of simple numbers (like 1, 2, 3), words are mapped into vectors (groups of numbers) that capture their meaning.
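To illustrate the idea in the bullets above, here is a tiny sketch with made-up 3-dimensional vectors (real embeddings are learned and have many more dimensions): "dog" and "cat" get similar vectors, "car" gets a different one, and cosine similarity makes that difference measurable.

```python
# Toy word vectors (made-up numbers, not real embeddings) and cosine similarity.
import numpy as np

vectors = {
    "dog": np.array([0.9, 0.8, 0.1]),
    "cat": np.array([0.85, 0.75, 0.2]),
    "car": np.array([0.1, 0.2, 0.95]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors["dog"], vectors["cat"]))  # close to 1 -> similar meaning
print(cosine(vectors["dog"], vectors["car"]))  # much smaller -> different meaning
```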
Key Points:
Word2Vec: Neural network-based technique that learns dense word vectors by predicting context words (Skip-gram) or target words (CBOW).
GloVe: Matrix factorization technique based on a word co-occurrence matrix, capturing both local and global statistics.
FastText: Represents words as character n-grams, improving handling of out-of-vocabulary words and subword
information.
ELMo: Context-dependent embeddings from bidirectional LSTM, offering dynamic word representations based
on context.
BERT: Pre-trains deep bidirectional representations, using context from both directions to generate contextual
embeddings for superior NLP task performance.
Transformer Models (e.g., GPT-3, T5): Use attention mechanisms for generating highly accurate, complex, and
contextual embeddings.
Example for better clarity
Example Continued
Example Visualization
Programming - Part 1 - (From Scratch)
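A minimal sketch of how word embeddings can be trained from scratch: a tiny skip-gram model with full softmax on a toy corpus. The corpus, embedding size, and hyperparameters are illustrative assumptions, not the exact code from the slides.

```python
# Minimal from-scratch word embedding sketch: skip-gram with full softmax.
import numpy as np

corpus = ["i love dogs", "i love cats", "dogs and cats are pets"]
tokens = [sentence.split() for sentence in corpus]

# Build the vocabulary
vocab = sorted({w for sent in tokens for w in sent})
word2idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Generate (center, context) training pairs with a window of 1
pairs = []
for sent in tokens:
    for i, center in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                pairs.append((word2idx[center], word2idx[sent[j]]))

# Two small weight matrices: W_in holds the embeddings we want to learn
rng = np.random.default_rng(0)
dim = 10
W_in = rng.normal(scale=0.1, size=(V, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(dim, V))  # context-word vectors

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr = 0.05
for epoch in range(200):
    for center, context in pairs:
        h = W_in[center]                  # embedding of the center word
        probs = softmax(W_out.T @ h)      # predicted distribution over context words
        err = probs.copy()                # gradient of cross-entropy w.r.t. scores
        err[context] -= 1.0
        grad_h = W_out @ err
        W_out -= lr * np.outer(h, err)
        W_in[center] -= lr * grad_h

# After training, words that appear in similar contexts get similar vectors
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(W_in[word2idx["dogs"]], W_in[word2idx["cats"]]))
```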
Programming - Part 2 - Using the Gensim Library (Word2Vec)
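A minimal sketch using Gensim's Word2Vec (assuming Gensim 4.x, where the embedding size parameter is called vector_size); the toy sentences and hyperparameters are illustrative assumptions.

```python
# Minimal Word2Vec sketch with the Gensim library (illustrative data).
from gensim.models import Word2Vec

sentences = [
    ["i", "love", "dogs"],
    ["i", "love", "cats"],
    ["dogs", "and", "cats", "are", "pets"],
]

# sg=1 selects the skip-gram architecture; sg=0 would use CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["dogs"])                     # the learned vector for "dogs"
print(model.wv.similarity("dogs", "cats"))  # cosine similarity between two words
print(model.wv.most_similar("dogs"))        # closest words in the embedding space
```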
Output:-
Example of Bag of Words
Definition: Bag of Words represents a piece of text by the count of each word it contains, ignoring grammar and word order.
Example: Sentences:
● I love dogs
● I love cats
Word | "I love dogs" | "I love cats"
I    | 1 | 1
love | 1 | 1
dogs | 1 | 0
cats | 0 | 1
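As a sketch, the same Bag of Words counts can be produced with scikit-learn's CountVectorizer; the token_pattern override is an assumption used here so that the one-letter word "I" is kept.

```python
# Minimal Bag of Words sketch with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["I love dogs", "I love cats"]

# The default token pattern drops one-letter tokens like "I", so override it.
vectorizer = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
counts = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # ['cats' 'dogs' 'i' 'love']
print(counts.toarray())                    # one row of word counts per sentence
```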
Thank You 😊