Lab 2: Data Pre-Processing and Feature Engineering Using Pandas and NumPy
Instructor: Shaina Laraib
1. Introduction
DATA PRE-PROCESSING
Data preprocessing is an integral step in machine learning: the quality of the data, and of the information that can be derived from it, directly affects a model's ability to learn. It is therefore important to preprocess the data before feeding it into the model. In this lab, we will cover the following steps of data pre-processing:
Data Cleaning
- Handling missing values
- Handling outliers
- Dealing with duplicate values
Data Transformation
- Scaling
- Normalization
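The two transformation steps listed above can be sketched in a few lines of pandas. This is a minimal illustration, not a prescribed method: the toy DataFrame, its column names (`area`, `rooms`), and the helper names `min_max_scale` and `standardize` are all assumptions made for the example. Note also that sources use the words "scaling" and "normalization" differently; here min-max rescaling to [0, 1] and z-score standardization are shown as two common choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "area": [50.0, 100.0, 150.0, 200.0],
    "rooms": [1.0, 2.0, 3.0, 4.0],
})


def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column linearly so its values lie in [0, 1]."""
    col_min = frame.min()
    col_range = frame.max() - col_min
    # A constant column has zero range; replace it with 1 to avoid dividing by zero.
    col_range = col_range.replace(0, 1)
    return (frame - col_min) / col_range


def standardize(frame: pd.DataFrame) -> pd.DataFrame:
    """Shift each column to mean 0 and scale it to unit (population) standard deviation."""
    col_std = frame.std(ddof=0).replace(0, 1)
    return (frame - frame.mean()) / col_std


scaled = min_max_scale(df)
standardized = standardize(df)

print(scaled)
print(standardized)
```

Min-max scaling preserves the shape of the distribution but is sensitive to extreme values, which is one reason outliers are usually handled before the data is transformed.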
FEATURE ENGINEERING
Feature engineering is the process of selecting, manipulating, and transforming raw data into features that a model can use. It leverages the existing data to create new variables that are not present in the original training set. It can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also improving model accuracy. Feature engineering is important because, regardless of the data or the model architecture, a poor feature will directly degrade your model.
Data Cleaning
Data cleaning is a key step in machine learning. Data is usually gathered from multiple sources, which results in duplicate and redundant values. Such values need to be dealt with before the data is given to the model.
Looking for Missing Values
- Loading and importing the dataset
- Looking for null values across rows and columns
- Handling missing values
- Outlier detection and removal
- Duplicate removal
LAB TASK:
- Apply data cleaning step by step to the House Price Prediction dataset.
- Apply transformations to the data if required.
- You are required to remove null values, remove outliers, handle duplicates, and apply scaling and normalization to the features (if required).
- Bonus question: Can you apply some feature engineering to this data?
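As a starting point, the cleaning steps listed above might look like the following sketch. It runs on a small made-up DataFrame rather than the House Price Prediction dataset, so the column names (`area`, `rooms`), the values, and the derived `area_per_room` feature are all hypothetical. The choices shown (median imputation, the 1.5 × IQR rule, keeping the first duplicate) are common defaults, not the only valid ones; for a real file you would typically load the data with `pd.read_csv` first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
raw = pd.DataFrame({
    "area": [52.0, 60.0, np.nan, 55.0, 65.0, 58.0, 1000.0, 60.0],
    "rooms": [2, 3, 2, 2, 3, 2, 3, 3],
})

# 1. Look for null values across columns and across rows.
nulls_per_column = raw.isnull().sum()
nulls_per_row = raw.isnull().sum(axis=1)

# 2. Handle missing values. Here the gap is filled with the column median;
#    dropping the row with raw.dropna() is another option.
filled = raw.copy()
filled["area"] = filled["area"].fillna(filled["area"].median())

# 3. Detect and remove outliers in one column using the 1.5 * IQR rule.
q1, q3 = filled["area"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = filled["area"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = filled[in_range]

# 4. Remove duplicate rows, keeping the first occurrence of each.
cleaned = no_outliers.drop_duplicates().reset_index(drop=True)

# 5. A simple engineered feature: a new column derived from existing ones.
featured = cleaned.assign(area_per_room=cleaned["area"] / cleaned["rooms"])

print(nulls_per_column)
print(featured)
```

Which imputation strategy and outlier threshold are appropriate depends on the dataset, so inspect the data (for example with `df.describe()`) before settling on either.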