Normalization

The document outlines data preprocessing steps for the Iris dataset using Python's pandas and scikit-learn libraries. It includes normalization using MinMaxScaler, standardization with StandardScaler, and encoding categorical variables with LabelEncoder and OneHotEncoder. The final output consists of a processed DataFrame ready for analysis or modeling.

Uploaded by

nandhus3860kvk
Copyright
© © All Rights Reserved

import pandas as pd

from sklearn.preprocessing import MinMaxScaler, StandardScaler


from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.read_csv("C:\\Users\\HP\\Downloads\\iris_data (2).csv")
df

     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]

# The original df contains both the features and the target

# axis=0 drops rows, axis=1 drops columns; here the 'species' column is dropped
df_num = df.drop('species', axis=1)
df_num.head()

   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
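
The axis argument used above can be illustrated on a tiny frame. This is a minimal sketch with made-up data (the `toy` frame and its column names are hypothetical, not taken from the Iris CSV); it shows that `axis=1` and the `columns=` keyword drop the same column, while `axis=0` drops a row by index label.

```python
import pandas as pd

# Hypothetical toy frame, used only to illustrate the axis argument.
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "label": ["x", "y"]})

# axis=1 drops a column by name.
by_axis = toy.drop("label", axis=1)

# The columns= keyword is an equivalent, more explicit spelling.
by_name = toy.drop(columns="label")

# axis=0 drops a row by index label (here the row with index 0).
rows_dropped = toy.drop(0, axis=0)
```

The `columns=` form avoids having to remember which axis number means what.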

n = MinMaxScaler()
df_normalization = pd.DataFrame(n.fit_transform(df_num), columns=df_num.columns)

df_normalization

     sepal_length  sepal_width  petal_length  petal_width
0        0.222222     0.625000      0.067797     0.041667
1        0.166667     0.416667      0.067797     0.041667
2        0.111111     0.500000      0.050847     0.041667
3        0.083333     0.458333      0.084746     0.041667
4        0.194444     0.666667      0.067797     0.041667
..            ...          ...           ...          ...
145      0.666667     0.416667      0.711864     0.916667
146      0.555556     0.208333      0.677966     0.750000
147      0.611111     0.416667      0.711864     0.791667
148      0.527778     0.583333      0.745763     0.916667
149      0.444444     0.416667      0.694915     0.708333

[150 rows x 4 columns]
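
With its default settings, MinMaxScaler maps each column to the range [0, 1] using (x - min) / (max - min). The sketch below reproduces that by hand and compares it with scikit-learn. It assumes a small made-up frame (`toy`, with hypothetical values) rather than the Iris CSV, so the numbers differ from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data, not the Iris CSV.
toy = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 20.0]})

# Manual min-max scaling, applied column by column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

# The same transformation through scikit-learn (default feature_range=(0, 1)).
scaled = pd.DataFrame(MinMaxScaler().fit_transform(toy), columns=toy.columns)

# Both approaches should agree up to floating-point error.
assert np.allclose(manual.to_numpy(), scaled.to_numpy())
```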

# Standardize the data (z-score scaling)

s = StandardScaler()
df_standardize = pd.DataFrame(s.fit_transform(df_num), columns=df_num.columns)
df_standardize.head()

   sepal_length  sepal_width  petal_length  petal_width
0     -0.900681     1.032057     -1.341272    -1.312977
1     -1.143017    -0.124958     -1.341272    -1.312977
2     -1.385353     0.337848     -1.398138    -1.312977
3     -1.506521     0.106445     -1.284407    -1.312977
4     -1.021849     1.263460     -1.341272    -1.312977
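
Z-score scaling computes (x - mean) / std for each column. One detail that is easy to miss: StandardScaler divides by the population standard deviation (ddof=0), while pandas `std()` defaults to the sample version (ddof=1). The sketch below checks this on a made-up frame (`toy` and its values are hypothetical, not from the Iris CSV).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, not the Iris CSV.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 6.0], "b": [10.0, 20.0, 20.0, 50.0]})

# Manual z-score with the population standard deviation (ddof=0),
# which is what StandardScaler uses.
manual = (toy - toy.mean()) / toy.std(ddof=0)

# The same transformation through scikit-learn.
scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

assert np.allclose(manual.to_numpy(), scaled.to_numpy())
```

After scaling, each column has mean 0 and population standard deviation 1.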

# LabelEncoder: map each species name to an integer code
label_encoder = LabelEncoder()
df['species_encoder'] = label_encoder.fit_transform(df['species'])
print(df['species_encoder'].head())
print('=' * 92)
df

0    0
1    0
2    0
3    0
4    0
Name: species_encoder, dtype: int32
============================================================================================

     sepal_length  sepal_width  petal_length  petal_width    species  species_encoder
0             5.1          3.5           1.4          0.2     setosa                0
1             4.9          3.0           1.4          0.2     setosa                0
2             4.7          3.2           1.3          0.2     setosa                0
3             4.6          3.1           1.5          0.2     setosa                0
4             5.0          3.6           1.4          0.2     setosa                0
..            ...          ...           ...          ...        ...              ...
145           6.7          3.0           5.2          2.3  virginica                2
146           6.3          2.5           5.0          1.9  virginica                2
147           6.5          3.0           5.2          2.0  virginica                2
148           6.2          3.4           5.4          2.3  virginica                2
149           5.9          3.0           5.1          1.8  virginica                2

[150 rows x 6 columns]
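
LabelEncoder assigns codes in sorted order of the class names, which is why setosa becomes 0 and virginica becomes 2 above. The sketch below shows how the mapping can be inspected through `classes_` and reversed with `inverse_transform`. It assumes a short hand-written list of species (`species` is hypothetical sample input, not the CSV column).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels, deliberately out of alphabetical order.
species = ["setosa", "virginica", "versicolor", "setosa"]

le = LabelEncoder()
codes = le.fit_transform(species)

# classes_ holds the sorted class names; the position of a name is its code.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}

# inverse_transform recovers the original labels from the integer codes.
restored = le.inverse_transform(codes).tolist()
```

The integer codes imply an ordering that the species do not have. The scikit-learn documentation describes LabelEncoder as a tool for target labels rather than input features.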

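# =================== OneHot Encoding =========================

import pandas as pd
# Reload the data so df no longer carries the species_encoder column
df = pd.read_csv('C:\\Users\\HP\\Downloads\\iris_data (2).csv')

# sparse_output=False returns a dense array (requires scikit-learn 1.2 or later)
onehot_encoder = OneHotEncoder(sparse_output=False)

# One column per species; the resulting columns are named 0, 1, 2
encoded_species = pd.DataFrame(onehot_encoder.fit_transform(df[['species']]))
encoded_species

merged = pd.concat([df, encoded_species], axis=1)
merged

result = merged.drop('species', axis=1)
result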

     sepal_length  sepal_width  petal_length  petal_width    0    1    2
0             5.1          3.5           1.4          0.2  1.0  0.0  0.0
1             4.9          3.0           1.4          0.2  1.0  0.0  0.0
2             4.7          3.2           1.3          0.2  1.0  0.0  0.0
3             4.6          3.1           1.5          0.2  1.0  0.0  0.0
4             5.0          3.6           1.4          0.2  1.0  0.0  0.0
..            ...          ...           ...          ...  ...  ...  ...
145           6.7          3.0           5.2          2.3  0.0  0.0  1.0
146           6.3          2.5           5.0          1.9  0.0  0.0  1.0
147           6.5          3.0           5.2          2.0  0.0  0.0  1.0
148           6.2          3.4           5.4          2.3  0.0  0.0  1.0
149           5.9          3.0           5.1          1.8  0.0  0.0  1.0

[150 rows x 7 columns]
