ET 610 - Data Preprocessing

Learning Analytics Tools

Data Preprocessing: Dr Ashwin T S

IIT Bombay
Data Processing

Why Data Processing?

Data → Information → Knowledge

What are the general types?

○ Pre-processing

○ Post-processing

What is Data Preprocessing?

"Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format"

OR

"Data preprocessing can refer to the manipulation or dropping of data before it is used, in order to ensure or enhance performance; it is an important step in the data mining process"
Why Data Preprocessing - Abstract Level

● Algorithm

Can we write an algorithm for any given problem?

Not always. It depends on the problem type: whether a solution is verifiable in polynomial time and whether the problem is solvable in polynomial time, i.e., whether an algorithm, and hence a solution, exists.

● Methods/architectures/frameworks

Can we get an optimum solution for any given problem?


Why Data Preprocessing - Abstract Level

● Algorithm Complexity

Time and Space

Common time complexities / growth rates: constant, logarithmic, linear, quadratic, exponential, factorial

● Efficiency

What does efficiency mean?

● Effectiveness

How effective should your algorithm be?
Why Data Preprocessing - Abstract Level

● Errors

The algorithm must handle errors in both the input data and the system specification

● Human Error

Slips

Mistakes

Violations

● System Error
Summary

● What data processing is at an abstract level
● Complexity of an algorithm
● What the terms Efficiency and Effectiveness mean
● The different types of error
Why is Data Preprocessing Important?

Preprocessing of data mainly checks data quality. Quality can be assessed along the following dimensions:

● Accuracy: whether the data entered is correct.
● Completeness: whether the data is available, or not recorded.
● Consistency: whether copies of the same data kept in different places match.
● Timeliness: whether the data is updated in time.
● Believability: whether the data is trustworthy.
● Interpretability: how understandable the data is.
Major Tasks in Data Preprocessing
1. Data cleaning
2. Data integration
3. Data reduction
4. Data transformation
Data Cleaning

Data cleaning routines work to "clean" the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.

Before (Adult dataset):

Sr. No | Gender | Pregnant
1      | Male   | No
2      | Female | Yes
3      | Male   | Yes
4      | Female | No
5      | Male   | Yes

After cleaning (inconsistent rows removed):

Sr. No | Gender | Pregnant
1      | Male   | No
2      | Female | Yes
4      | Female | No
Data Editing

Before (Adult dataset):

Sr. No | Gender | Pregnant
1      | Male   | No
2      | Female | Yes
3      | Male   | Yes
4      | Female | No
5      | Male   | Yes

After editing (Gender corrected for the inconsistent rows):

Sr. No | Gender | Pregnant
1      | Male   | No
2      | Female | Yes
3      | Female | Yes
4      | Female | No
5      | Female | Yes
Data Reduction

Before (Adult dataset):

Sr. No | Gender | Pregnant
1      | Male   | No
2      | Female | Yes
3      | Male   | Yes
4      | Female | No
5      | Male   | Yes

After (rows grouped by Gender):

Sr. No | Gender | Pregnant
2      | Female | Yes
4      | Female | No
1      | Male   | No
3      | Male   | Yes
5      | Male   | Yes

Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form.

Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
Data Transformation and Data Integration

Normalization: the method of scaling data so that it can be represented in a smaller range, for example from -1.0 to 1.0.

What are the other possibilities?

Entity identification problem: identifying entities across multiple databases. For example, the system or the user should know that student_id in one database and student_name in another database belong to the same entity.

What are the other possibilities?

Summary
● Why Data Pre-processing
● What are the Major Tasks in data preprocessing
○ Data cleaning
○ Data integration
○ Data reduction
○ Data transformation
Missing Data

What is Missing Data?

Missing data is defined as values that are not stored (or not present) for some variable(s) in the given dataset.

How to find it?

Manually

In code: isnull() and notnull(), checking for NaN

Is this sufficient?
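As a minimal sketch of the code-based check (the column names and values below are illustrative, not from the course dataset):

```python
# Detecting missing values with pandas isnull()/notnull().
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "score": [85.0, np.nan, 72.0, np.nan],  # two scores were never recorded
})

# isnull() marks missing entries (NaN/None) as True; notnull() is its complement.
missing_mask = df["score"].isnull()
n_missing = missing_mask.sum()          # 2 missing scores
n_present = df["score"].notnull().sum() # 2 recorded scores
```

Note that isnull() only catches NaN/None: sentinel values such as -1 or the string "N/A" must first be converted (e.g. with replace), which is why a manual inspection is still useful and why "is this sufficient?" is worth asking.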
Why Is Data Missing From The Dataset?

Some of the reasons are listed below:

● Past data might get corrupted due to improper maintenance.
● Observations are not recorded for certain fields; there might be a failure in recording the values due to human error.
● The user has not provided the values intentionally.
Why Handle Missing Data?

● Many machine learning algorithms fail if the dataset contains missing values. However, algorithms like k-nearest neighbours and Naive Bayes can support data with missing values.
● You may end up building a biased machine learning model, leading to incorrect results, if the missing values are not handled properly.
● Missing data can lead to a lack of precision in the statistical analysis.
How to Handle?

1. Deleting the missing values
2. Imputing the missing values
● to impute: to assign (a value) to something by inference
Types of Missingness

Missing data is either non-ignorable (MNAR: Missing Not At Random) or ignorable (MCAR: Missing Completely At Random, and MAR: Missing At Random).

Jingjing Chen, Sharon Hunter, Krisztina Kisfalvi, Richard A. Lirio, "A hybrid approach of handling missing data under different missing data mechanisms: VISIBLE 1 and VARSITY trials for ulcerative colitis," Contemporary Clinical Trials, Volume 100, 2021, 106226, ISSN 1551-7144, https://doi.org/10.1016/j.cct.2020.106226; https://towardsdatascience.com/missing-data-cfd9dbfd11b7
Deleting the Missing Value

Deleting the entire row

Deleting the entire column

Note: if every row has some column value missing, you might end up deleting the whole dataset.
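Both deletion strategies can be sketched with pandas dropna (the toy frame below is made up; it also illustrates the warning above, since every row has some value missing):

```python
# Deleting rows vs. columns that contain missing values.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":   [23, np.nan, 31, 27],
    "score": [88, 92, np.nan, 75],
    "notes": [np.nan, np.nan, np.nan, np.nan],  # a column that was never filled in
})

# Drop every row that has any missing value.
rows_kept = df.dropna()

# Drop only the columns that are entirely missing.
cols_kept = df.dropna(axis=1, how="all")
```

Here rows_kept ends up empty, because the all-NaN "notes" column makes every row incomplete: exactly the failure mode the note warns about. Dropping the useless column first (cols_kept) preserves most of the data.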
Imputing the Missing Value

Replacing with an arbitrary value

e.g., the corporate/work experience of an undergrad student can be set to zero

Replacing with the mean

the most common method of imputing missing values in numeric columns

Replacing with the mode

used in the case of categorical features
Imputing the Missing Value

Replacing with the median

used when there are outliers

Replacing with the previous value - forward fill

imputing values with the previous value instead of the mean, mode, or median is more appropriate in some cases; this is called forward fill, and is typically used for time series data

Replacing with the next value - backward fill
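The numeric imputation strategies above can be sketched on a toy series (the values are illustrative):

```python
# Mean, median, forward-fill, and backward-fill imputation with pandas.
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0])

mean_filled   = s.fillna(s.mean())    # mean of the observed values (30.0)
median_filled = s.fillna(s.median())  # median: more robust when outliers are present
ffilled       = s.ffill()             # forward fill: carry the previous value ahead
bfilled       = s.bfill()             # backward fill: pull the next value back
```

For time series, ffill/bfill preserve the local ordering of the data, whereas mean/median imputation ignores it; that is why forward fill is the usual choice there.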


Imputing the Missing Value

Imputing missing values for categorical features:

● Impute the most frequent value

● Impute the value "missing", which treats it as a separate category

(one extra column in one-hot encoding)

Univariate approach

Multivariate approach
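A quick sketch of the two categorical strategies (the grade values are made up):

```python
# Categorical imputation: most frequent value vs. a dedicated "missing" category.
import numpy as np
import pandas as pd

grade = pd.Series(["A", "B", np.nan, "A", np.nan])

# Strategy 1: impute the most frequent value (the mode, ignoring NaN).
most_frequent = grade.fillna(grade.mode()[0])

# Strategy 2: treat missingness as its own category.
as_category = grade.fillna("missing")
```

Strategy 2 is what later produces the extra column under one-hot encoding: "missing" becomes one more level of the feature.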
Handling Missing Values: Deleting

Pros:
Complete removal of data with missing values results in a robust and highly accurate model
Deleting a particular row or column with no specific information is better, since it does not have a high weightage
Cons:
Loss of information and data
Works poorly if the percentage of missing values is high (say 30%) compared to the whole dataset
Replacing With Mean/Median/Mode

(Source: CS37300: Data Mining & Machine Learning, cs.purdue.edu)

Pros:

This is a better approach when the data size is small

It can prevent the data loss that results from removing rows and columns

Cons:

Imputing approximations adds variance and bias

Works poorly compared to multiple-imputation methods

Assigning A Unique Category

Pros:
Fewer possibilities with one extra category, resulting in low variance after one-hot encoding (since it is categorical)
Negates the loss of data by adding a unique category
Cons:
Adds less variance
Adds another feature to the model while encoding, which may result in poor performance
Predicting The Missing Values

Pros:
Imputing the missing variable is an improvement as long as the resulting bias is smaller than the omitted-variable bias
Yields unbiased estimates of the model parameters
Cons:
Bias also arises when an incomplete conditioning set is used for a categorical variable
The predictions are only a proxy for the true values
Using Algorithms That Support Missing Values

Pros:
Does not require creating a predictive model for each attribute with missing data in the dataset
Cons:
Correlation in the data is neglected
It is a very time-consuming process, which can be critical in data mining where large databases are being extracted
The choice of distance function (Euclidean, Manhattan, etc.) may not yield a robust result
Missing Data Handling

Gangadharan, Nishanthi; Turner, Richard; Field, Ray; Oliver, Stephen; Slater, Nigel; Dikicioglu, Duygu (2019). "Metaheuristic approaches in biopharmaceutical process development data analysis." Bioprocess and Biosystems Engineering, 42. doi:10.1007/s00449-019-02147-0.
Summary
● What is missing data
● Why to handle missing data
● What are the types of missingness
● What are the various ways to handle missing data
Outliers

Outliers are of three types, namely:

1. Global (or Point) Outliers
2. Collective Outliers
3. Contextual (or Conditional) Outliers

(Source: https://www.geeksforgeeks.org/)
Finding Outliers - Box Plot

Worked example (source: https://www.geeksforgeeks.org/):

data = [0, 1, 2, 3, 4, 5, 9]: Q1 = 1.5 and Q3 = 4.5, so IQR = Q3 - Q1 = 3. The upper fence is Q3 + 1.5 * IQR = 4.5 + 4.5 = 9, so the value 9 sits exactly on the fence and is not flagged as an outlier.

data = [0, 1, 2, 3, 4, 5, 10]: the quartiles are the same, and 10 > 9, so 10 is an outlier.

A further example to try: data = [0, 1, 2, 3, 6, 6, 6]
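The box-plot rule from the slide can be sketched as a small function (numpy's default linear interpolation reproduces the quartiles Q1 = 1.5 and Q3 = 4.5 used above):

```python
# Tukey's IQR rule for flagging outliers, as used by box plots.
import numpy as np

def iqr_outliers(data):
    """Return the values lying outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

no_outliers = iqr_outliers([0, 1, 2, 3, 4, 5, 9])    # 9 sits exactly on the fence
one_outlier = iqr_outliers([0, 1, 2, 3, 4, 5, 10])   # 10 exceeds the fence of 9
```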
Encoding Categorical Data

What is encoding categorical data?

Categorical encoding is a process where we transform categorical data into numerical data.

Why is encoding important?

The performance of a machine learning model depends not only on the model and the hyperparameters but also on how we process and feed different types of variables to the model. Since most machine learning models only accept numeric variables, preprocessing the categorical variables becomes a necessary step.
Label Encoding or Ordinal Encoding

Label encoding is mostly applicable only to ordinal data, i.e., categorical data with a meaningful order.

In label encoding, each label is converted into an integer value.

The process is easy and informative.

Why is label encoding not applicable to non-ordinal (nominal) data?

Because it causes a prioritization issue: the model reads an order into the integers that the categories do not actually have.

One-Hot Encoding

In one-hot encoding, we create a new variable for each level of a categorical feature. Each category is mapped to a binary variable containing either 0 or 1, where 0 represents the absence and 1 represents the presence of that category.

One-hot encoding can introduce sparsity in the dataset: it creates multiple dummy features without adding much information, and the redundant column gives rise to the Dummy Variable Trap.
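Both schemes can be sketched in pandas (the feature name and its levels are illustrative, loosely echoing the Difficulty_Level example):

```python
# Ordinal (label) encoding vs. one-hot encoding.
import pandas as pd

df = pd.DataFrame({"difficulty": ["low", "high", "medium", "low"]})

# Ordinal encoding: only sensible here because the levels have a meaningful order.
order = {"low": 0, "medium": 1, "high": 2}
df["difficulty_ord"] = df["difficulty"].map(order)

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["difficulty"], prefix="difficulty")

# drop_first=True removes one redundant column, avoiding the dummy variable trap.
one_hot_no_trap = pd.get_dummies(df["difficulty"], prefix="difficulty", drop_first=True)
```

With three levels, one_hot has three columns while one_hot_no_trap has two: the dropped column is fully determined by the others, which is exactly the redundancy behind the trap.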
Binary Encoding

In this encoding scheme, the categorical feature is first converted into numbers using an ordinal encoder. Then each number is transformed into its binary representation. After that, the binary value is split into different columns.
Scaling: Why Feature Scaling?

Dataset: Data Preprocessing course, with a dependent variable (Result) and 3 independent variables (Time_Spent, Age, and Difficulty_Level).

We can easily notice that the variables are not on the same scale: the range of Age is from 27 to 50, while Time_Spent goes from 48K seconds to 83K seconds.

The range of Time_Spent is much wider than the range of Age. This will cause issues in our models, since many machine learning models, such as k-means clustering and nearest-neighbour classification, are based on the Euclidean distance.
Scaling - Standardisation and Normalisation

Feature scaling is one of the most important data preprocessing steps in machine learning.

Normalization, or min-max scaling, is used to transform features to be on a similar scale. The new value is calculated as:

X_new = (X - X_min) / (X_max - X_min)

Standardization, or z-score normalization, is the transformation of features by subtracting the mean and dividing by the standard deviation. The result is often called a z-score:

X_new = (X - mean) / std
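Both formulas can be sketched directly with numpy (the values below are made up, loosely echoing the Time_Spent example in thousands of seconds):

```python
# Min-max normalization and z-score standardization of a feature.
import numpy as np

x = np.array([48.0, 60.0, 83.0])

# X_new = (X - X_min) / (X_max - X_min): result lies in [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# X_new = (X - mean) / std: result has zero mean and unit standard deviation.
x_std = (x - x.mean()) / x.std()
```

After normalization the smallest value maps to 0 and the largest to 1; after standardization the feature has mean 0 and standard deviation 1, but individual values are not bounded to any fixed range.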
Normalization vs Standardization:

● Normalization uses the minimum and maximum values of a feature for scaling; standardization uses the mean and standard deviation.
● Normalization is used when features are on different scales; standardization is used when we want zero mean and unit standard deviation.
● Normalization scales values to [0, 1] or [-1, 1]; standardization is not bounded to a certain range.
● Normalization is strongly affected by outliers; standardization is much less affected by outliers.
● Normalization is useful when we don't know the distribution; standardization is useful when the feature distribution is Normal (Gaussian).
● Normalization is often called min-max scaling; standardization is often called z-score normalization.

Open Ended Questions

1. Should we always scale our features?
2. Is there a single best scaling technique?
3. How do different scaling techniques affect different classifiers?
4. Should we consider the scaling technique an important hyperparameter of our model?

Major Sources of Data Preprocessing used in these lecture slides

● Data Mining: Concepts and Techniques, 3rd Edition. Jiawei Han, Micheline Kamber, Jian Pei
● https://www.geeksforgeeks.org/
● https://www.analyticsvidhya.com
● https://towardsdatascience.com
Summary
● Types of Outliers
○ Global
○ Collective
○ Contextual
● Categorical data Encoding
○ Label or Ordinal
○ One-Hot
○ Binary
● Data Transformation: Scaling
○ Normalization
○ Standardization
Overall Summary
● Why Data Pre-processing
● What are the Major Tasks in data preprocessing
○ Data cleaning
○ Data integration
○ Data reduction
○ Data transformation
● Missing Values
○ Types of missingness: MNAR, MAR, MCAR
● Outliers
○ Types of Outliers: Global, Collective, Contextual
● Categorical Data Encoding
○ Types of Categorical Data Encoding: Ordinal, One-hot and Binary
● Scaling
○ Types of Scaling: Normalization and Standardization
Thank You
