What - Why: Dummy Variables

Dummy variables are used to represent categorical variables in machine learning models that cannot process categorical data directly. There are two main techniques for converting categorical variables to dummy variables: label encoding and one-hot encoding. Label encoding assigns integer values to each category, but this can incorrectly imply an ordering between categories. One-hot encoding creates a new binary feature for each unique category value and avoids any ordering implications. It is preferable when the categorical variable is nominal rather than ordinal.

Uploaded by Naing Naing

Dummy Variables

• What: A dummy variable is a numerical variable that represents a categorical variable.
• Why: Many machine learning algorithms cannot work with categorical variables directly, so the categories must first be converted to numbers.
• How: There are multiple ways of handling categorical variables:
  1. Label Encoding
  2. One-Hot Encoding

Label Encoding
• Each categorical label is simply assigned a unique integer.

Original data:

Country  Age  Salary
India    44   32000
US       34   33400
Japan    43   45000
US       23   23000
Japan    23   67000

After label encoding:

Country  Age  Salary
0        44   32000
2        34   33400
1        43   45000
2        23   23000
1        23   67000

• An effective technique when the categorical data is ordinal.
• Challenge: Country is a nominal variable with no inherent ordering, but label encoding assigns ranks to the countries. For example, here: India < Japan < US.
• A model may treat this artificial ordering as meaningful, which affects model interpretation.
• We can use one-hot encoding to overcome this.
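
The label encoding above can be sketched in a few lines of plain Python. This is a minimal illustration, not the method of any particular library: the helper name `label_encode` is hypothetical, and assigning codes in alphabetical order is an assumption chosen because it reproduces the slide's mapping (India = 0, Japan = 1, US = 2).

```python
# Rows from the example table: (Country, Age, Salary).
rows = [
    ("India", 44, 32000),
    ("US", 34, 33400),
    ("Japan", 43, 45000),
    ("US", 23, 23000),
    ("Japan", 23, 67000),
]


def label_encode(values):
    """Map each distinct label to an integer, in sorted (alphabetical) order.

    Hypothetical helper for illustration; the ordering rule is an assumption.
    """
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


codes, mapping = label_encode([country for country, _, _ in rows])

# Replace the Country column with its integer code, keep Age and Salary as-is.
encoded_rows = [(code, age, salary) for code, (_, age, salary) in zip(codes, rows)]

print(mapping)       # {'India': 0, 'Japan': 1, 'US': 2}
print(encoded_rows)
```

Note that the integers carry an order (0 < 1 < 2) that the country names never had, which is exactly the challenge described above.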
One-Hot Encoding
• One-hot encoding represents a categorical variable as a set of binary vectors.
• It creates one additional feature for each unique label in the categorical feature.

Original data:

Country  Age  Salary
India    44   32000
US       34   33400
Japan    43   45000
US       23   23000
Japan    23   67000

After one-hot encoding:

Country.India  Country.Japan  Country.US  Age  Salary
1              0              0           44   32000
0              0              1           34   33400
0              1              0           43   45000
0              0              1           23   23000
0              1              0           23   67000

• Three new features are added in place of Country.
• This solves the ranking problem, because each category is represented by its own binary vector.
• Apply this technique when the categorical data is not ordinal.
• Challenge: If the number of categories is high, one-hot encoding can lead to high dimensionality.
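
As a rough sketch of the same idea in plain Python, the snippet below builds one binary column per unique country. The helper name `one_hot_encode` and its `prefix` parameter are hypothetical, and the `Country.<label>` column naming simply mirrors the table above.

```python
countries = ["India", "US", "Japan", "US", "Japan"]


def one_hot_encode(values, prefix):
    """Return (column_names, rows), one 0/1 column per distinct value.

    Hypothetical helper for illustration; columns are ordered alphabetically.
    """
    categories = sorted(set(values))
    columns = [f"{prefix}.{category}" for category in categories]
    vectors = [[1 if value == category else 0 for category in categories]
               for value in values]
    return columns, vectors


columns, vectors = one_hot_encode(countries, "Country")

print(columns)
for country, vector in zip(countries, vectors):
    print(country, vector)
```

Each row contains exactly one 1, so no category is numerically larger than another. The number of columns equals the number of unique labels, which is why a feature with many categories inflates the dimensionality.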
Note: For One-Hot Encoding

• A regression model does not actually need all the dummy variables.
• It does not need the final dummy variable, because that value can be deduced from the combination of all the other dummy variables.
• To avoid multicollinearity, drop one dummy variable (use n-1 of them for model building).
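
To illustrate why n-1 dummies are enough, the sketch below drops the last category (US, under the alphabetical ordering assumed here) and then recovers it from the remaining columns. The helper name `one_hot_drop_last` is hypothetical.

```python
countries = ["India", "US", "Japan", "US", "Japan"]


def one_hot_drop_last(values):
    """One-hot encode, omitting the column for the last (alphabetical) category.

    Hypothetical helper for illustration; the omitted category acts as the baseline.
    """
    categories = sorted(set(values))
    kept = categories[:-1]      # n-1 categories keep their own column
    dropped = categories[-1]    # this one is implied by the others
    vectors = [[1 if value == category else 0 for category in kept]
               for value in values]
    return kept, dropped, vectors


kept, dropped, vectors = one_hot_drop_last(countries)

# A row belongs to the dropped category exactly when all kept dummies are 0,
# so the missing column equals 1 minus the sum of the kept columns.
recovered = [1 - sum(vector) for vector in vectors]

print(kept, dropped)
print(vectors)
print(recovered)
```

Because the full set of dummy columns always sums to 1 in every row, keeping all of them alongside an intercept term makes one column an exact linear combination of the others, which is the source of the multicollinearity mentioned above.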
