
Data Migration Process Infographics by Slidesgo

Uploaded by Anushree Asthana
Copyright © All Rights Reserved

Data preprocessing and cleaning
Presentation by: Anushree Asthana
A20204221006
Content

Cleaning
Data cleaning involves identifying and correcting errors, inconsistencies, and inaccuracies in the data.

Preprocessing
Data preprocessing involves transforming and standardizing the data to make it suitable for analysis.
Data cleaning

Deletion
Removing records with missing values, but only if they make up a small percentage of the dataset.

Prediction
Using machine learning algorithms to predict missing values based on other features in the dataset.

Imputation
Replacing missing values with estimated values. This can be done using techniques like mean, median, or mode imputation, or more advanced methods like regression imputation.

Detection
Identifying data points (outliers) that deviate significantly from the rest of the data. This can be done using statistical methods like z-scores or box plots.
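
A dependency-free sketch of mean, median and mode imputation, plus the two detection rules named above, using only Python's `statistics` module. The function names, the z-score threshold and the box-plot whisker factor `k=1.5` (Tukey's usual convention) are assumptions. Quartiles come from `statistics.quantiles`, whose default interpolation may differ slightly from the one a given plotting library uses.

```python
from statistics import mean, median, mode, pstdev, quantiles

FILLERS = {"mean": mean, "median": median, "mode": mode}


def impute(values, strategy="mean"):
    """Imputation: replace None with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = FILLERS[strategy](observed)
    return [fill if v is None else v for v in values]


def zscore_outliers(values, threshold=3.0):
    """Detection: indices whose |z-score| (population standard deviation) exceeds threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Detection: indices outside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


with_gaps = [2, 4, None, 4, 6]
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]  # hypothetical; 50 is the planted outlier

print(impute(with_gaps, "mode"))                 # [2, 4, 4, 4, 6]
print(zscore_outliers(readings, threshold=2.5))  # [9]
print(iqr_outliers(readings))                    # [9]
```

The example lowers the z-score threshold to 2.5 because, with n values and the population standard deviation, no |z| can exceed sqrt(n - 1). For these ten readings that bound is exactly 3, so the default threshold of 3 could never flag anything.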
Data preprocessing
Data Integration
Combining data from multiple sources into a coherent dataset and resolving inconsistencies between them.
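
As a sketch of integration, the hypothetical example below joins two made-up sources on a shared `id`. One source reports sepal length in millimetres, and the other spells species labels inconsistently. The field names, units and the inner-join choice are assumptions for illustration. In practice a library join (for example pandas `merge`) would usually handle the matching step.

```python
def normalize_species(name):
    """Resolve label inconsistencies such as 'Iris-setosa', 'setosa' and 'SETOSA '."""
    return name.strip().lower().removeprefix("iris-")  # removeprefix needs Python 3.9+


def integrate(measurements, labels):
    """Inner-join two sources on 'id', harmonizing units and labels."""
    label_by_id = {row["id"]: normalize_species(row["species"]) for row in labels}
    combined = []
    for row in measurements:
        if row["id"] not in label_by_id:
            continue  # inner join: skip records with no matching label
        combined.append({
            "id": row["id"],
            # This source reports millimetres; convert to centimetres
            "sepal_length_cm": row["sepal_length_mm"] / 10,
            "species": label_by_id[row["id"]],
        })
    return combined


# Hypothetical sources
measurements = [
    {"id": 1, "sepal_length_mm": 51},
    {"id": 2, "sepal_length_mm": 70},
    {"id": 3, "sepal_length_mm": 63},
]
labels = [
    {"id": 1, "species": "Iris-setosa"},
    {"id": 2, "species": "VERSICOLOR "},
]
combined = integrate(measurements, labels)
print(combined[0])  # {'id': 1, 'sepal_length_cm': 5.1, 'species': 'setosa'}
```

Record 3 is dropped because it has no label. An outer join would keep it with a missing species, which then becomes a data cleaning problem.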

Data Transformation
Normalization, discretization and
standardization.

Feature Selection
Identifying the most relevant
features for the analysis to reduce
the dimensionality and improve
model performance.
Feature Engineering
Creating new features from existing
ones to capture more complex
relationships in the data.
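
For flower measurements like those in the Iris dataset, products and ratios of existing columns are common engineered features. The column names and the two derived features below (`petal_area`, `sepal_ratio`) are hypothetical illustrations, not features this presentation defines.

```python
def add_engineered_features(rows):
    """Return copies of the records with two derived columns added."""
    enriched = []
    for r in rows:
        r = dict(r)  # leave the original records unchanged
        # Interaction term: a rough proxy for petal size as one number
        r["petal_area"] = r["petal_length"] * r["petal_width"]
        # Ratio: sepal shape, independent of overall flower size
        r["sepal_ratio"] = r["sepal_length"] / r["sepal_width"]
        enriched.append(r)
    return enriched


sample = [{"sepal_length": 6.0, "sepal_width": 3.0, "petal_length": 4.0, "petal_width": 1.5}]
result = add_engineered_features(sample)[0]
print(result["petal_area"])   # 6.0
print(result["sepal_ratio"])  # 2.0
```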
Let’s talk more about data transformation

Normalization
Scaling numerical data to a common range (e.g., 0 to 1 or -1 to 1). This helps prevent features with larger scales from dominating the analysis.

Discretization
Converting continuous numerical data into discrete intervals or categories. This is useful for certain algorithms and can simplify analysis.

Standardization
Transforming data to have a mean of 0 and a standard deviation of 1. This is useful for algorithms that assume zero-centered, roughly normally distributed features. Standardizing shifts and rescales the data but does not make a skewed distribution normal.
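
The three transformations described above can each be written in a few lines of standard-library Python. This is a sketch: the function names, bin edges and labels are hypothetical. Standardization here uses the population standard deviation (`pstdev`), which is also what scikit-learn's `StandardScaler` uses.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def normalize(values, low=0.0, high=1.0):
    """Normalization: linearly rescale values into [low, high] (min-max scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]  # constant column: map everything to the lower bound
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


def discretize(values, edges, labels):
    """Discretization: edges [a, b] define bins (-inf, a), [a, b), [b, inf)."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


def standardize(values):
    """Standardization: shift to mean 0, rescale to standard deviation 1 (values must not all be equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


lengths = [1.0, 2.0, 3.0, 4.0, 5.0]
print(normalize(lengths))             # [0.0, 0.25, 0.5, 0.75, 1.0]
print(normalize(lengths, -1.0, 1.0))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(discretize([1.4, 4.5, 6.0], [2.0, 5.0], ["short", "medium", "long"]))  # ['short', 'medium', 'long']
z = standardize(lengths)
```

All three functions derive their parameters (min, max, mean, standard deviation) from the values they receive. When preparing training and test sets, those parameters are usually computed on the training data only and then reused for the test data.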

Erroneous data
Now let’s walk through an example.

Our dataset
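We will use the Iris dataset for our example: 150 flowers from three species (setosa, versicolor and virginica), each described by four measurements (sepal length, sepal width, petal length and petal width).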
Let’s take a look at the dataset
https://colab.research.google.com/drive/1fBRlo17ZPjWp0Tiw-icFeFdWDyutx-Jt?usp=sharing
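
A dependency-free sketch of a first look at the data follows. It parses rows in the format of the UCI `iris.data` file (four comma-separated measurements in centimetres, then the class label, no header) and prints simple per-column summaries. Only the first five rows of the file are embedded, so the numbers describe that slice, not the full dataset. In a notebook one would more typically call `sklearn.datasets.load_iris()` or `pandas.read_csv`. The function and column names are assumptions.

```python
import csv
import io
from collections import Counter
from statistics import mean

# First five rows of the UCI iris.data file: four measurements (cm) and the class label
IRIS_HEAD = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""
MEASUREMENTS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_iris_rows(text):
    """Parse iris.data-style CSV text into dicts, skipping blank lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        row = {name: float(value) for name, value in zip(MEASUREMENTS, record[:4])}
        row["species"] = record[4]
        rows.append(row)
    return rows


def summarize(rows):
    """Per-measurement (min, mean, max), plus a count of rows per species."""
    stats = {
        name: (min(r[name] for r in rows), mean(r[name] for r in rows), max(r[name] for r in rows))
        for name in MEASUREMENTS
    }
    return stats, Counter(r["species"] for r in rows)


rows = load_iris_rows(IRIS_HEAD)
stats, species_counts = summarize(rows)
for name, (lo, avg, hi) in stats.items():
    print(f"{name:>12}: min={lo:.1f} mean={avg:.2f} max={hi:.1f}")
print(species_counts)  # Counter({'Iris-setosa': 5})
```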
