Data - Preprocessing - Tools - Ipynb - Colaboratory

Uploaded by

Abhay Yadav

3/7/22, 3:49 PM Copy of data_preprocessing_tools.ipynb - Colaboratory

Data Preprocessing Tools

Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

Importing the dataset

dataset = pd.read_csv('Data.csv')

# Features: every column except the last one

X = dataset.iloc[:, :-1]

# Target: the last column (Purchased)

Y = dataset.iloc[:, -1]

print(X)

Country Age Salary

0 France 44.0 72000.0

1 Spain 27.0 48000.0

2 Germany 30.0 54000.0

3 Spain 38.0 61000.0

4 Germany 40.0 NaN

5 France 35.0 58000.0

6 Spain NaN 52000.0

7 France 48.0 79000.0

8 Germany 50.0 83000.0

9 France 37.0 67000.0

print(Y)

0 No

1 Yes

2 No

3 No

4 Yes

5 Yes

6 No

7 Yes

8 No

9 Yes

Name: Purchased, dtype: object

Taking care of missing data


from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

imputer.fit(X.iloc[:, 1:3])

X.iloc[:, 1:3] = imputer.transform(X.iloc[:, 1:3])
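
As a rough check of what the 'mean' strategy does, the sketch below fills each missing value with the mean of the non-missing values in the same column, using NumPy only. The Age and Salary values are copied from the printed X above; `fill_with_mean` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import numpy as np

# Age and Salary as printed above, with NaN marking the missing entries.
age = np.array([44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37], dtype=float)
salary = np.array([72000, 48000, 54000, 61000, np.nan,
                   58000, 52000, 79000, 83000, 67000], dtype=float)

def fill_with_mean(column):
    """Hypothetical helper: replace NaNs with the mean of the non-missing values."""
    filled = column.copy()
    filled[np.isnan(filled)] = np.nanmean(column)
    return filled

age_filled = fill_with_mean(age)
salary_filled = fill_with_mean(salary)

# Row 6 had a missing Age and row 4 a missing Salary.
print(age_filled[6], salary_filled[4])
```

The two filled values should agree with the 3.87777778e+01 and 6.37777778e+04 entries that appear in the encoded X printed later in the notebook.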

Encoding categorical data

Encoding the Independent Variable

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')

X = np.array(ct.fit_transform(X))

print(X)

[[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01

7.20000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01

4.80000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 3.00000000e+01

5.40000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01

6.10000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01

6.37777778e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01

5.80000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01

5.20000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01

7.90000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01

8.30000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.70000000e+01

6.70000000e+04]]
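
To read the first three columns of this output, it may help to see how a one-hot matrix can be built by hand. The NumPy-only sketch below assumes the categories are ordered alphabetically (France, Germany, Spain), which is what the printed array suggests; it illustrates the idea and is not the encoder's actual implementation.

```python
import numpy as np

# Country column as printed in the original X.
countries = np.array(['France', 'Spain', 'Germany', 'Spain', 'Germany',
                      'France', 'Spain', 'France', 'Germany', 'France'])

# np.unique returns the sorted distinct values and, for each row,
# the index of its value in that sorted list.
categories, codes = np.unique(countries, return_inverse=True)

# Picking rows of the identity matrix turns each index into a one-hot row.
one_hot = np.eye(len(categories))[codes]

print(categories)
print(one_hot[:3])
```

Under that assumption, column 0 flags France, column 1 Germany, and column 2 Spain, so the first row (France) starts with 1, 0, 0.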

Encoding the Dependent Variable

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

Y = le.fit_transform(Y)

print(Y)

[0 1 0 0 1 1 0 1 0 1]
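
The integer codes above can be reproduced with a plain dictionary. This is a minimal sketch in pure Python, assuming the labels are numbered in sorted order ('No' before 'Yes'), which is consistent with the printed result; the names `to_code`, `encoded`, and `decoded` are introduced here for illustration only.

```python
# Purchased column as printed above.
labels = ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes']

# Number the distinct labels in sorted order.
classes = sorted(set(labels))
to_code = {label: code for code, label in enumerate(classes)}

encoded = [to_code[label] for label in labels]

# The mapping is reversible: each code indexes back into `classes`.
decoded = [classes[code] for code in encoded]

print(to_code)
print(encoded)
```

Keeping the list of classes around is what makes it possible to turn predictions back into the original 'No' and 'Yes' labels later.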

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

print(X_train)

[[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01

5.20000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01

6.37777778e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01

7.20000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01

6.10000000e+04]

[0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01

4.80000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01

7.90000000e+04]

[0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01

8.30000000e+04]

[1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01

5.80000000e+04]]

print(Y_train)

[0 1 0 0 1 1 0 1]

print(X_test)

[[0.0e+00 1.0e+00 0.0e+00 3.0e+01 5.4e+04]

[1.0e+00 0.0e+00 0.0e+00 3.7e+01 6.7e+04]]

print(Y_test)

[0 1]
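
With test_size=0.2 and 10 rows, 2 rows are held out and 8 remain for training. The sketch below shows the general idea of a shuffled hold-out split with NumPy. `simple_split` is a hypothetical helper, and its random number generator differs from the one scikit-learn uses, so it will not necessarily select the same rows as the call above.

```python
import math
import numpy as np

def simple_split(n_samples, test_size=0.2, seed=1):
    """Hypothetical helper: shuffle row indices and hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)
    # First n_test shuffled indices form the test set, the rest the training set.
    return order[n_test:], order[:n_test]

train_idx, test_idx = simple_split(10, test_size=0.2, seed=1)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two sets, and fixing the seed makes the split repeatable from run to run.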

Feature Scaling

from sklearn.preprocessing import StandardScaler

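sc = StandardScaler()

# Fit the scaler on the training set only; columns 0-2 are the one-hot dummies and stay unscaled

X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])

# Reuse the training mean and standard deviation on the test set

X_test[:, 3:] = sc.transform(X_test[:, 3:])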

print(X_train)

[[ 0. 0. 1. -0.19159184 -1.07812594]

[ 0. 1. 0. -0.01411729 -0.07013168]

[ 1. 0. 0. 0.56670851 0.63356243]

[ 0. 0. 1. -0.30453019 -0.30786617]

[ 0. 0. 1. -1.90180114 -1.42046362]

[ 1. 0. 0. 1.14753431 1.23265336]

[ 0. 1. 0. 1.43794721 1.57499104]

[ 1. 0. 0. -0.74014954 -0.56461943]]

print(X_test)

[[ 0. 1. 0. -1.46618179 -0.9069571 ]

[ 1. 0. 0. -0.44973664 0.20564034]]
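
The scaled Age column can be checked by hand: standardization subtracts the training mean and divides by the training standard deviation, and the same two numbers are then applied to the test set. The sketch below does this with NumPy for Age only, using the values printed in X_train and X_test before scaling (349 / 9 is the imputed age, 38.777...). It assumes the population standard deviation (ddof=0), which appears to match the printed results.

```python
import numpy as np

# Age column of X_train and X_test before scaling, in printed row order.
train_age = np.array([349 / 9, 40.0, 44.0, 38.0, 27.0, 48.0, 50.0, 35.0])
test_age = np.array([30.0, 37.0])

# Statistics come from the training set only.
mean = train_age.mean()
std = train_age.std()  # NumPy default ddof=0 (population standard deviation)

train_scaled = (train_age - mean) / std
test_scaled = (test_age - mean) / std

print(train_scaled)
print(test_scaled)
```

Because the test rows are scaled with the training statistics, their values need not have zero mean or unit variance; that is expected and avoids leaking information from the test set into preprocessing.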
