
ML Lab 3

Uploaded by

zulqarnain
Copyright
© All Rights Reserved

AI-3002- Machine Learning

Lab 2
Data Pre-Processing and Feature Engineering
Using Pandas and Numpy

Instructor: Shaina Laraib


1. Introduction
DATA PRE-PROCESSING

Data preprocessing is an integral step in machine learning: the quality of the data
and the useful information that can be derived from it directly affect a model's
ability to learn. It is therefore important to preprocess the data before feeding it
into the model.
In this lab, we will cover the following steps of data pre-processing:
• Data Cleaning
- Handling missing values
- Handling outliers
- Dealing with duplicate values
• Data Transformation
- Scaling
- Normalization
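
As a rough illustration of the two transformation steps, here is a minimal sketch. The lab does not define the terms precisely, so this assumes one common reading: min-max scaling to the range [0, 1] and z-score standardization (zero mean, unit standard deviation). The toy data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 200.0],
    "rooms": [1.0, 2.0, 3.0, 5.0],
})

# Min-max scaling: map each column linearly onto [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: subtract the column mean and divide by the
# population standard deviation (ddof=0).
z_score = (df - df.mean()) / df.std(ddof=0)

print(min_max)
print(z_score)
```

Min-max scaling is sensitive to extreme values, so it is usually applied after outliers have been handled.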

FEATURE ENGINEERING

Feature engineering is the process of selecting, manipulating, and transforming raw
data into features that a model can learn from. It uses the existing data to create
new variables that are not present in the raw training set. It can produce new
features for both supervised and unsupervised learning, with the goal of simplifying
and speeding up data transformations while also improving model accuracy. Feature
engineering is important because, regardless of the data or architecture, a poorly
chosen feature directly degrades your model.
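
To make "creating new variables" concrete, the sketch below derives three features from existing columns. The column names (living_area, basement_area, year_built, year_sold) are hypothetical and are not taken from the lab dataset.

```python
import pandas as pd

# Hypothetical raw columns; not the actual lab dataset.
df = pd.DataFrame({
    "living_area": [120.0, 85.0, 200.0],
    "basement_area": [40.0, 0.0, 60.0],
    "year_built": [1995, 2010, 1980],
    "year_sold": [2015, 2020, 2018],
})

# Combine two related measurements into one feature.
df["total_area"] = df["living_area"] + df["basement_area"]

# Turn two raw years into a quantity that is easier to interpret.
df["age_at_sale"] = df["year_sold"] - df["year_built"]

# Encode the presence of a basement as a 0/1 indicator.
df["has_basement"] = (df["basement_area"] > 0).astype(int)

print(df[["total_area", "age_at_sale", "has_basement"]])
```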
Data Cleaning
Data cleaning is a key step in machine learning. Data is usually gathered
from multiple sources, which often introduces duplicate and redundant values.
Such values need to be dealt with before the data is given to the model.
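
A small sketch of how merging sources can introduce duplicates, and how pandas can detect and drop them. The two source tables and their columns are made up for illustration.

```python
import pandas as pd

# Two hypothetical sources that both contain the record for house 3.
source_a = pd.DataFrame({"house_id": [1, 2, 3], "price": [100, 150, 200]})
source_b = pd.DataFrame({"house_id": [3, 4], "price": [200, 250]})

# Stack the sources into one table.
combined = pd.concat([source_a, source_b], ignore_index=True)

# duplicated() marks every row that repeats an earlier row exactly.
n_duplicates = int(combined.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduplicated = combined.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(deduplicated)
```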
Looking for Missing Values
- Loading and importing the dataset.
- Looking for Null Values across Rows and Columns
- Handling Missing Values
- Outlier Detection and Removal
- Duplicate Removal
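
The steps above can be sketched end to end as follows. This is only one possible approach, under stated assumptions: the inline CSV stands in for the real dataset file, missing values are filled with the column median (dropping the rows is an alternative), and outliers are identified with the 1.5 × IQR rule, which the lab does not prescribe.

```python
import io
import pandas as pd

# Inline toy CSV standing in for the real dataset file (hypothetical data).
csv_text = """area,price
50,100
60,120
,130
70,140
80,160
80,160
1000,170
"""

# 1. Load the dataset.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Count null values per column and per row.
nulls_per_column = df.isnull().sum()
nulls_per_row = df.isnull().sum(axis=1)

# 3. Handle missing values: fill with the column median
#    (df.dropna() would remove the affected rows instead).
df["area"] = df["area"].fillna(df["area"].median())

# 4. Detect and remove outliers with the 1.5 * IQR rule (an assumption).
q1 = df["area"].quantile(0.25)
q3 = df["area"].quantile(0.75)
iqr = q3 - q1
in_range = df["area"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range]

# 5. Remove exact duplicate rows.
cleaned = df.drop_duplicates().reset_index(drop=True)

print(nulls_per_column)
print(cleaned)
```

The order matters here: filling missing values first means the quartiles are computed on a complete column, and removing duplicates last avoids keeping a duplicate of a row that would have been filtered out anyway.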
LAB TASK:
- Apply data cleaning step by step on House Price
Prediction Dataset
- Apply transformations on the data if required.
- You’re required to remove null values, remove outliers,
handle duplicates, and apply scaling and normalization on
features (if required).
- Bonus Question: Can you apply some feature engineering
to this data?
