Data Migration Process Infographics by Slidesgo
Preprocessing and cleaning
Presentation by: Anushree Asthana
A20204221006
Content
Cleaning
Data cleaning involves identifying and correcting errors, inconsistencies, and inaccuracies in the data.
Preprocessing
Data preprocessing involves transforming and standardizing the data to make it suitable for analysis.
Data cleaning
Deletion
Removing records with missing values, but only if they are a small percentage of the dataset.
Prediction
Using machine learning algorithms to predict missing values based on other features in the dataset.
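As a minimal sketch of deletion and prediction, the snippet below uses a made-up two-column table. The helper name `drop_if_few_missing`, the 25% cut-off, and the choice of a simple linear fit as the "prediction" model are all assumptions for illustration, not part of the original slides.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): one missing value in "width".
df = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [2.0, 4.0, np.nan, 8.0, 10.0],
})

def drop_if_few_missing(frame, max_fraction=0.25):
    """Drop incomplete rows only when they are a small share of the data."""
    missing_fraction = frame.isna().any(axis=1).mean()
    if missing_fraction <= max_fraction:
        return frame.dropna()
    return frame  # too many incomplete rows: deleting them would lose too much

# Deletion: 1 of 5 rows is incomplete (20%), so it is removed.
deleted = drop_if_few_missing(df)

# Prediction: fit width ~ length on the complete rows, then fill the gap.
known = df.dropna()
slope, intercept = np.polyfit(known["length"], known["width"], deg=1)
predicted = df.copy()
mask = predicted["width"].isna()
predicted.loc[mask, "width"] = slope * predicted.loc[mask, "length"] + intercept

print(deleted)
print(predicted)
```

A richer model (for example a k-nearest-neighbours or tree-based regressor) could replace the linear fit; the pattern of training on complete rows and predicting the missing ones stays the same.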
Imputation
Replacing missing values with estimated values. This can be done using techniques like mean, median, mode imputation, or more advanced methods like regression imputation.
Detection
Identifying data points that deviate significantly from the rest of the data. This can be done using statistical methods like z-scores or box plots.
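The following sketch illustrates mean, median, and mode imputation, followed by outlier detection with z-scores and with the interquartile-range rule that box plots use. The sample values, the z-score threshold of 3, and the 1.5 × IQR fences are common conventions assumed here, not values given in the slides.

```python
import numpy as np
import pandas as pd

# Imputation on a toy series with one missing value.
s = pd.Series([1.0, 2.0, 2.0, np.nan, 10.0])
mean_filled = s.fillna(s.mean())            # fills with 3.75
median_filled = s.fillna(s.median())        # fills with 2.0
mode_filled = s.fillna(s.mode().iloc[0])    # fills with 2.0

# Outlier detection on a toy series with one extreme value.
x = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0,
               12.0, 10.0, 13.0, 11.0, 12.0, 95.0])

# z-score rule: flag points more than 3 standard deviations from the mean.
z_scores = (x - x.mean()) / x.std()
z_outliers = x[z_scores.abs() > 3]

# Box-plot (IQR) rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = x.quantile(0.25), x.quantile(0.75)
iqr = q3 - q1
iqr_outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]

print(z_outliers.tolist(), iqr_outliers.tolist())
```

On small samples the z-score rule can miss outliers because the extreme value inflates the standard deviation, which is one reason the IQR rule is often preferred.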
Data preprocessing
Data Integration
Combining data from multiple
sources into a coherent dataset
and resolving inconsistencies between them.
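A small sketch of data integration, assuming two hypothetical sources whose species labels are written inconsistently. The table contents and column names are invented for illustration.

```python
import pandas as pd

# Two hypothetical sources describing the same flowers.
measurements = pd.DataFrame({
    "id": [1, 2, 3],
    "species": ["Setosa", "versicolor ", "VIRGINICA"],  # inconsistent labels
})
colours = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica"],
    "colour": ["white", "blue", "purple"],
})

# Resolve the inconsistency first: trim whitespace and lower-case the key.
measurements["species"] = measurements["species"].str.strip().str.lower()

# Then combine the sources on the cleaned key.
combined = measurements.merge(colours, on="species", how="left")
print(combined)
```

Without the normalization step the left join would leave every `colour` missing, because none of the raw labels match exactly.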
Data Transformation
Rescaling or recoding values through
normalization, standardization and discretization.
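The three transformations can be sketched on a toy column as follows. Min-max scaling is assumed for normalization, z-scores for standardization, and three equal-width bins for discretization; the bin labels are arbitrary.

```python
import pandas as pd

x = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Normalization (min-max): rescale values to the range [0, 1].
normalized = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean and unit (population) standard deviation.
standardized = (x - x.mean()) / x.std(ddof=0)

# Discretization: split the range into three equal-width bins.
discretized = pd.cut(x, bins=3, labels=["low", "medium", "high"])

print(normalized.tolist())
print(standardized.round(3).tolist())
print(list(discretized))
```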
Feature Selection
Identifying the most relevant
features for the analysis to reduce
dimensionality and improve
model performance.
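One simple way to illustrate feature selection is a correlation filter: rank each feature by the absolute value of its correlation with the target and keep the top few. This is only one of many selection strategies; the helper `select_top_k` and the toy data are assumptions for illustration.

```python
import pandas as pd

# Toy data: "a" and "c" track the target closely, "b" is mostly noise.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [5, 3, 5, 3, 5, 3],
    "c": [1, 4, 9, 16, 25, 36],
    "target": [2, 4, 6, 8, 10, 12],
})

def select_top_k(frame, target, k):
    """Return the k feature names most correlated (in absolute value) with the target."""
    correlations = frame.drop(columns=target).corrwith(frame[target]).abs()
    return correlations.sort_values(ascending=False).head(k).index.tolist()

selected = select_top_k(df, "target", k=2)
print(selected)
```

A correlation filter only captures linear relationships with the target, so it is a starting point rather than a complete selection method.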
Feature Engineering
Creating new features from existing
ones to capture more complex
relationships in the data.
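As an illustration of feature engineering, the sketch below derives two new columns from petal measurements. The derived features `petal_area` and `petal_ratio` and the sample values are hypothetical choices, not features named in the slides.

```python
import pandas as pd

# A few made-up petal measurements in centimetres.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
})

# New features built from existing ones.
df["petal_area"] = df["petal_length"] * df["petal_width"]    # rough size
df["petal_ratio"] = df["petal_length"] / df["petal_width"]   # shape

print(df)
```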
Let’s talk more about data transformation
Erroneous data
Now let’s take an example of the topic
Cleaning Preprocessing
Our dataset
We will use the Iris dataset for our example.
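One way to load the dataset for a first look is sketched below. It assumes scikit-learn is installed and uses its bundled copy of Iris; the linked notebook may load the data differently.

```python
from sklearn.datasets import load_iris

# Load Iris as a pandas DataFrame: four measurements plus a numeric target.
iris = load_iris(as_frame=True)
df = iris.frame

print(df.head())               # first few rows
print(df.isna().sum())         # missing values per column
print(df.duplicated().sum())   # number of exact duplicate rows
print(df.describe())           # summary statistics for spotting odd values
```

These checks cover the first cleaning questions: whether anything is missing, whether rows are repeated, and whether any value falls outside a plausible range.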
Let’s take a look at the dataset
https://colab.research.google.com/drive/1fBRlo17ZPjWp0Tiw-icFeFdWDyutx-Jt?usp=sharing