Data - Preprocessing - Tools - Ipynb - Colaboratory

Uploaded by

Abhay Yadav


3/7/22, 3:49 PM Copy of data_preprocessing_tools.ipynb - Colaboratory

Data Preprocessing Tools

Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

Importing the dataset

dataset = pd.read_csv('Data.csv')

X = dataset.iloc[:, :-1]

Y = dataset.iloc[:, -1]

print(X)

Country Age Salary

0 France 44.0 72000.0

1 Spain 27.0 48000.0

2 Germany 30.0 54000.0

3 Spain 38.0 61000.0

4 Germany 40.0 NaN

5 France 35.0 58000.0

6 Spain NaN 52000.0

7 France 48.0 79000.0

8 Germany 50.0 83000.0

9 France 37.0 67000.0

print(Y)

0 No

1 Yes

2 No

3 No

4 Yes

5 Yes

6 No

7 Yes

8 No

9 Yes

Name: Purchased, dtype: object
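
Data.csv itself is not included with the notebook. A minimal sketch for recreating it, assuming the file holds exactly the ten rows printed above and that the two NaN entries are empty fields in the CSV (the variable names csv_text and missing_per_column are illustrative, not from the notebook):

```python
import io

import pandas as pd

# Assumption: CSV content reconstructed from the printed output above;
# the empty fields stand for the two missing values shown as NaN.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# read_csv accepts any file-like object, so no file on disk is needed.
dataset = pd.read_csv(io.StringIO(csv_text))

# Count missing entries per column before any imputation.
missing_per_column = dataset.isna().sum()
print(missing_per_column)
```

Counting the missing entries first makes it easy to see which columns the imputation step needs to cover.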

Taking care of missing data

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

imputer.fit(X.iloc[:, 1:3])

X.iloc[:, 1:3] = imputer.transform(X.iloc[:, 1:3])
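
A small sketch of what the 'mean' strategy computes, using the Age and Salary values printed earlier. The array name values and the comparison against np.nanmean are illustrative additions, not part of the notebook:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns as printed above, with the two missing entries as NaN.
values = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
    [40.0, np.nan],
    [35.0, 58000.0],
    [np.nan, 52000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [37.0, 67000.0],
])

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
filled = imputer.fit_transform(values)

# The learned fill values are the column means computed while ignoring NaN.
column_means = np.nanmean(values, axis=0)
print(imputer.statistics_)
print(filled[6, 0], filled[4, 1])
```

The two filled entries should correspond to the 3.87777778e+01 and 6.37777778e+04 values that appear in the encoded matrix further down.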

Encoding categorical data

Encoding the Independent Variable

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')

X = np.array(ct.fit_transform(X))

print(X)

[[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01

7.20000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01

4.80000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 3.00000000e+01

5.40000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01

6.10000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01

6.37777778e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01

5.80000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01

5.20000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01

7.90000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01

8.30000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.70000000e+01

6.70000000e+04]]
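
To read the matrix above it helps to know which dummy column belongs to which country. A hedged sketch on three sample rows (the names rows, encoded and categories are illustrative); it relies on OneHotEncoder sorting the categories it finds:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Three sample rows; dtype=object keeps strings and numbers in one array.
rows = np.array([
    ['France', 44.0, 72000.0],
    ['Spain', 27.0, 48000.0],
    ['Germany', 30.0, 54000.0],
], dtype=object)

ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0])],
    remainder='passthrough',
)
encoded = np.array(ct.fit_transform(rows), dtype=float)

# Categories are sorted, so the dummy columns are France, Germany, Spain.
categories = [str(c) for c in ct.named_transformers_['encoder'].categories_[0]]
print(categories)
print(encoded)
```

The encoded columns come first and the passed-through Age and Salary columns follow, which matches the five-column layout printed above.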

Encoding the Dependent Variable

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

Y = le.fit_transform(Y)

print(Y)

[0 1 0 0 1 1 0 1 0 1]
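
A short sketch showing how the integer codes map back to the original labels. The list labels repeats the Purchased column printed earlier; mapping and decoded are illustrative names:

```python
from sklearn.preprocessing import LabelEncoder

# Purchased column as printed earlier in the notebook.
labels = ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes']

le = LabelEncoder()
encoded = le.fit_transform(labels)

# classes_ lists the labels in sorted order; the position is the integer code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}

# inverse_transform recovers the original string labels from the codes.
decoded = le.inverse_transform(encoded)
print(mapping)
```

Keeping the fitted encoder around is useful later, since inverse_transform can turn model predictions back into 'Yes' and 'No'.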

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

print(X_train)

[[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01

5.20000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01

6.37777778e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01

7.20000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01

6.10000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01

4.80000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01

7.90000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01

8.30000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01

5.80000000e+04]]

print(Y_train)

[0 1 0 0 1 1 0 1]

print(X_test)

[[0.0e+00 1.0e+00 0.0e+00 3.0e+01 5.4e+04]

[1.0e+00 0.0e+00 0.0e+00 3.7e+01 6.7e+04]]

print(Y_test)

[0 1]
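
A minimal sketch of the two properties the split above depends on: test_size=0.2 on ten rows leaves two rows for testing, and a fixed random_state makes the shuffle repeatable. X_demo and y_demo are placeholder arrays, not the notebook's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features (10 rows, 2 columns) and labels for illustration only.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

# Two calls with the same random_state should produce the same partition.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)
```

Features and labels are shuffled together, so each row of X_train stays aligned with its entry in y_train.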

Feature Scaling

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])

X_test[:, 3:] = sc.transform(X_test[:, 3:])

print(X_train)

[[ 0. 0. 1. -0.19159184 -1.07812594]

[ 0. 1. 0. -0.01411729 -0.07013168]

[ 1. 0. 0. 0.56670851 0.63356243]

[ 0. 0. 1. -0.30453019 -0.30786617]

[ 0. 0. 1. -1.90180114 -1.42046362]

[ 1. 0. 0. 1.14753431 1.23265336]

[ 0. 1. 0. 1.43794721 1.57499104]

[ 1. 0. 0. -0.74014954 -0.56461943]]

print(X_test)

[[ 0. 1. 0. -1.46618179 -0.9069571 ]

[ 1. 0. 0. -0.44973664 0.20564034]]
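
The scaled values can be checked by hand. The sketch below assumes the Age and Salary columns of the training and test rows printed earlier (with the imputed means written as fractions) and compares StandardScaler against a manual z-score; the names train, test and manual_test are illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and Salary of the eight training rows printed above (after imputation).
train = np.array([
    [349 / 9, 52000.0],
    [40.0, 574000 / 9],
    [44.0, 72000.0],
    [38.0, 61000.0],
    [27.0, 48000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [35.0, 58000.0],
])
# Age and Salary of the two test rows.
test = np.array([[30.0, 54000.0], [37.0, 67000.0]])

sc = StandardScaler()
train_scaled = sc.fit_transform(train)   # learns mean and std from train only
test_scaled = sc.transform(test)         # reuses the training statistics

# Manual z-score using the population standard deviation (ddof=0).
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std
print(test_scaled)
```

Because the scaler is fitted on the training rows only, the test columns are not expected to have zero mean or unit variance, which is why both test rows show negative Age values above.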
