Wayspire AI Course

This document provides an overview of key techniques for exploratory data analysis (EDA) and data preprocessing, including:
1. Visualizing data distributions and relationships using the Matplotlib and Seaborn libraries in Python.
2. Detecting and handling missing values and encoding categorical data. Common encoding techniques include one-hot encoding, label encoding, and ordinal encoding.
3. Standardizing data using min-max scaling and z-score normalization to center and normalize features.


Plots:
- Two plotting libraries are used:
1. Matplotlib
2. Seaborn

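For five point summary (minimum, Q1, median, Q3, maximum), shown graphically by a box plot:

import matplotlib.pyplot as plt
# Jupyter/IPython magic: show plots inside the notebook (not valid in a plain .py script)
%matplotlib inline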
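
The five point summary can also be read off as numbers. A minimal sketch, assuming a small made-up DataFrame df with an Age column (the data is hypothetical, since the notes do not include a dataset):

```python
import pandas as pd

# Hypothetical sample data; not part of the course material.
df = pd.DataFrame({"Age": [18, 22, 25, 30, 35]})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max.
summary = df["Age"].describe()

# Keep only the five entries that make up the five point summary.
five_point = summary[["min", "25%", "50%", "75%", "max"]]
print(five_point)
```

The 25%, 50% and 75% rows are Q1, the median and Q3.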

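For outliers:
import seaborn as sns
# Box plot of one numeric column; points beyond the whiskers are potential outliers
sns.boxplot(x="Age", data=df)
# or, one box per category
sns.boxplot(x="Gender", y="Age", data=df)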
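
By default a seaborn box plot draws its whiskers at 1.5 times the interquartile range (IQR), so the same rule can be applied by hand to list the outliers. A rough sketch, assuming a made-up ages Series (hypothetical data):

```python
import pandas as pd

# Hypothetical ages with one obviously extreme value.
ages = pd.Series([18, 22, 25, 30, 35, 90])

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Values outside these fences are flagged as potential outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = ages[(ages < lower) | (ages > upper)]
print(outliers.tolist())
```

Whether a flagged value is an error or a genuine extreme still has to be decided from the context of the data.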

Count plot (using seaborn):

sns.countplot(x="Product", hue="MaritalStatus", data=df)

Pairplot:
sns.pairplot(df)

To check the shape of the distribution (using seaborn):
# sns.distplot is deprecated since seaborn 0.11; histplot is its replacement
sns.histplot(df["Age"])

(or, to overlay a smooth density curve that shows the shape)

sns.histplot(df["Age"], kde=True)
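
The shape seen in the plot can also be summarised with a single number, the skewness. A small sketch using pandas' skew() on two made-up Series (hypothetical data): a value near 0 suggests a symmetric distribution, a positive value a longer right tail.

```python
import pandas as pd

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 2, 2, 3, 10])

print(symmetric.skew())      # close to 0 for a symmetric sample
print(right_skewed.skew())   # positive for a right-skewed sample
```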

For correlation:
# numeric_only=True skips text columns (needs pandas 1.5 or later)
corr = df.corr(numeric_only=True)
print(corr)
Heatmap:
sns.heatmap(corr, annot=True)
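
As an illustration of what the correlation matrix contains, here is a minimal sketch on a made-up DataFrame (the column names and values are assumptions for the example). Income is exactly twice Age, so their correlation comes out as 1, and the text column is left out.

```python
import pandas as pd

# Hypothetical data: Income is exactly 2 * Age, Gender is a text column.
df = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Income": [40, 60, 80, 100],
    "Gender": ["M", "F", "F", "M"],
})

# Pearson correlation between the numeric columns only.
corr = df.corr(numeric_only=True)
print(corr)
```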

EDA (Exploratory Data Analysis)

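1. Missing values
i. Standard missing values
Values which pandas can detect on its own, e.g. NaN, None, or empty cells.
ii. Non-Standard missing values
Values which pandas cannot detect on its own, e.g. placeholders such as "?" or "--" typed in by hand.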
2. Encoding:
i. Dealing with categorical data.
ii. Encodes the categorical data and helps in regional and
iii. Encoding Techniques:
a. N-1 Dummy encoding
b. One-Hot encoding
c. Label encoding
d. Ordinal encoding
e. Frequency encoding
f. Target encoding
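
Before encoding, the two kinds of missing values from point 1 can be checked like this. A minimal sketch, assuming a tiny in-memory CSV in which "?" and "--" stand for missing entries (the data and the placeholder strings are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV text: one empty Age cell, one "?" and one "--".
raw = "Age,Gender\n25,M\n,F\n?,M\n30,--\n"

# By default only the empty cell is recognised as missing.
default = pd.read_csv(io.StringIO(raw))
print(default.isna().sum())

# Listing the placeholders in na_values makes pandas treat them as missing too.
cleaned = pd.read_csv(io.StringIO(raw), na_values=["?", "--"])
print(cleaned.isna().sum())
```

Which placeholder strings to list depends on how the particular dataset was filled in.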

I) N-1 Dummy encoding:

Ex: If we have 3 different categories in a column we use "3-1
encoding", i.e. 2 columns will be encoded. The category with no
column of its own (Veg here) is represented by all zeros.
Category     Dairy product    Fruits product
1. Veg       0                0
2. Fruits    0                1
3. Dairy     1                0
If there are 10 categories then the no. of encoded columns
would be 9.
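
One way to get N-1 dummy columns in pandas is get_dummies with drop_first=True. This is a sketch rather than the course's own code; the category order is set by hand here so that Veg is the dropped baseline, as in the example above.

```python
import pandas as pd

# Fix the category order so that "Veg" comes first and becomes the baseline.
category = pd.Series(
    pd.Categorical(["Veg", "Fruits", "Dairy"],
                   categories=["Veg", "Dairy", "Fruits"])
)

# drop_first=True removes the first category's column, leaving n-1 columns.
dummies = pd.get_dummies(category, drop_first=True, dtype=int)
print(dummies)
```

Without an explicit order, pandas sorts the categories and would drop Dairy instead.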
II) One-Hot encoding:
In this for 'n' categories the no. of encoded columns would
be 'n'.
Categories    Veg-Category    Fruit-Category    Dairy-Category
Veg           1               0                 0
Fruits        0               1                 0
Dairy         0               0                 1
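
A minimal one-hot sketch with pandas get_dummies, assuming a made-up DataFrame with a single Category column (the names are chosen for illustration). Note that pandas orders the new columns alphabetically, so they appear as Dairy, Fruits, Veg.

```python
import pandas as pd

# Hypothetical data with three categories.
df = pd.DataFrame({"Category": ["Veg", "Fruits", "Dairy"]})

# One column per category; each row has exactly one 1.
one_hot = pd.get_dummies(df["Category"], prefix="Category", dtype=int)
print(one_hot)
```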

III) Label encoding:

In this we take the labels of a categorical variable in
alphabetical order for encoding.
It encodes the column with integers from 0 to n-1.
Ex: Performance of a car
Performance    Encoded Perf.
Average        0
Bad            1
Good           2
Alphabetical order ignores the natural ranking Bad < Average < Good;
when the ranking matters, use ordinal encoding.
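
Alphabetical label encoding can be sketched with pandas category codes, which sort the labels before numbering them (scikit-learn's LabelEncoder sorts its classes in the same way). The performance Series is made-up data; with alphabetical order the labels come out as Average = 0, Bad = 1, Good = 2.

```python
import pandas as pd

# Hypothetical performance ratings.
performance = pd.Series(["Bad", "Average", "Good", "Average"])

# Converting to a category sorts the labels: Average, Bad, Good.
as_category = performance.astype("category")

# The integer code of each value is its position in that sorted list.
codes = as_category.cat.codes
print(list(as_category.cat.categories))
print(codes.tolist())
```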

IV) Ordinal encoding:

Encoded values range from 0 to n-1, assigned in the natural order of the categories.
Ex: Bad = 0, Average = 1, Good = 2
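
The ordinal mapping above can be sketched with an ordered pandas Categorical, where the order is stated explicitly instead of being alphabetical. The Series and the variable names are assumptions for the example.

```python
import pandas as pd

# Hypothetical performance ratings.
performance = pd.Series(["Bad", "Average", "Good", "Average"])

# State the ranking explicitly: Bad < Average < Good.
order = ["Bad", "Average", "Good"]
ordinal = pd.Categorical(performance, categories=order, ordered=True)

# Codes follow the given order: Bad = 0, Average = 1, Good = 2.
codes = ordinal.codes.tolist()
print(codes)
```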

Standardization and Z transform scaling

- Min-Max scaling and Z Standardization (Scaling)

i. Z-score normalization (Standardisation)
Formula: z = (x - µ)/σ, where µ is the mean and σ the standard deviation of the feature.
ii. New mean in Z transform is always 0 (and the new standard deviation is 1).
iii. Min-Max scaling (normalization) rescales values to the range [0, 1]
X_norm = (X - X_min)/(X_max - X_min)
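
Both formulas can be tried on a short list of numbers. A minimal sketch using only the Python standard library; the values are made up, and the population standard deviation (pstdev) is an assumption here, since the notes do not say which version of σ is meant.

```python
from statistics import mean, pstdev

# Hypothetical feature values.
values = [10, 20, 30, 40, 50]

# Z-score: subtract the mean, divide by the standard deviation.
mu = mean(values)
sigma = pstdev(values)  # population standard deviation (assumed)
z_scores = [(x - mu) / sigma for x in values]

# Min-max: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
min_max = [(x - lo) / (hi - lo) for x in values]

print(z_scores)
print(min_max)
```

After the z transform the values have mean 0 and standard deviation 1; after min-max scaling they lie between 0 and 1.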

- Transformation
i.
