DMLAB-2 - 4238 - 01-08-2024.ipynb - Colab

Uploaded by

subhash.ramanoju
Copyright: © All Rights Reserved

8/26/24, 11:46 AM DMLAB-2_4238_01-08-2024.ipynb - Colab

A) Dealing with categorical data

import pandas as pd
data = {'chocolets': ["Cadbury", "Snickers", "KitKack", "MilkyBar"]}
df = pd.DataFrame(data)
df

  chocolets
0   Cadbury
1  Snickers
2   KitKack
3  MilkyBar

df_encoded = pd.get_dummies(df)
df_encoded.head()

   chocolets_Cadbury  chocolets_KitKack  chocolets_MilkyBar  chocolets_Snickers
0               True              False               False               False
1              False              False               False                True
2              False               True               False               False
3              False              False                True               False
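
The boolean columns above are what recent pandas versions return by default. As a small sketch (the `dtype` and `drop_first` arguments are not used in the original notebook and are shown here only for illustration), `pd.get_dummies` can also emit 0/1 integers and drop one redundant indicator column:

```python
import pandas as pd

df = pd.DataFrame({'chocolets': ["Cadbury", "Snickers", "KitKack", "MilkyBar"]})

# dtype=int yields 0/1 integers instead of booleans
df_int = pd.get_dummies(df, dtype=int)

# drop_first=True drops the first category in sorted order (here "Cadbury");
# that category is then implied when all remaining indicators are 0
df_reduced = pd.get_dummies(df, dtype=int, drop_first=True)

print(df_int)
print(df_reduced)
```

Dropping one column avoids perfectly collinear indicators, which can matter for linear models; whether it is appropriate depends on the downstream estimator.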

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
encoder_result = encoder.fit_transform(df).toarray()
encoder_result

array([[1., 0., 0., 0.],
       [0., 0., 0., 1.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

B) Scaling

import pandas as pd
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pd.read_excel("/content/students.xlsx")
df.head()

https://colab.research.google.com/drive/1ghyP3zLjCY_gZhTo4Hrkan3CXFTgtwnn#printMode=true 1/4

        Name  ID Branch  Marks  Percentage
0       Uday   1    CSM    456        91.2
1     Kalyan   2    CSM    434        86.8
2       Ajay   3    CSM    334        66.8
3   Abhi ram   4    CSM    423        84.6
4  Sai Kumar   5    CSM    458        91.6

x = df[['Marks', 'Percentage']]
scaledx = scale.fit_transform(x)
print(scaledx)

[[ 0.87218389 0.87218389]
[ 0.58557 0.58557 ]
[-0.7172204 -0.7172204 ]
[ 0.44226306 0.44226306]
[ 0.8982397 0.8982397 ]
[ 1.01549084 1.01549084]
[-0.73024831 -0.73024831]
[-0.86052735 -0.86052735]
[-0.54785765 -0.54785765]
[-1.73339692 -1.73339692]
[ 0.71584905 0.71584905]
[ 0.87218389 0.87218389]
[ 1.43238377 1.43238377]
[-0.57391346 -0.57391346]
[-2.02001081 -2.02001081]
[-1.17319705 -1.17319705]
[ 0.12959336 0.12959336]
[ 1.27604892 1.27604892]
[ 0.11656546 0.11656546]]
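
The two scaled columns are identical, which is consistent with the rows shown by `df.head()`: there, Percentage equals Marks divided by 5, and standardization cancels any such constant rescaling. The sketch below uses only those five visible rows, since the full `students.xlsx` is not available here, so its numbers differ from the output above. It also checks that `StandardScaler` matches the manual formula (x - mean) / std with the population standard deviation (`ddof=0`):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Only the five rows visible in df.head(); the real file has more rows
df = pd.DataFrame({
    'Marks': [456, 434, 334, 423, 458],
    'Percentage': [91.2, 86.8, 66.8, 84.6, 91.6],
})
x = df[['Marks', 'Percentage']]

scaled = StandardScaler().fit_transform(x)

# Manual standardization: subtract the column mean, divide by the
# population standard deviation (ddof=0), as StandardScaler does
manual = (x - x.mean()) / x.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))   # same result as the manual formula
print(np.allclose(scaled[:, 0], scaled[:, 1]))  # both columns coincide
```

After scaling, each column has mean 0 and unit variance, so features measured on different scales contribute comparably to distance-based models.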

C) Splitting the dataset into training and testing sets

import numpy as np
from sklearn.model_selection import train_test_split
x = np.arange(1, 25).reshape(12, 2)
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

x

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16],
       [17, 18],
       [19, 20],
       [21, 22],
       [23, 24]])

y

array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

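# With the defaults, 25% of the samples (3 of 12) go to the test set;
# the split is shuffled randomly, so the result varies between runs
x_train, x_test, y_train, y_test = train_test_split(x, y)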

x_train

array([[21, 22],
       [ 9, 10],
       [19, 20],
       [13, 14],
       [ 7,  8],
       [15, 16],
       [ 1,  2],
       [ 3,  4],
       [23, 24]])

x_test

array([[ 5,  6],
       [11, 12],
       [17, 18]])

y_train

array([1, 1, 0, 0, 0, 1, 0, 1, 0])

y_test

array([1, 0, 1])
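
Because the split above is random, rerunning the cell gives different rows each time. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable; `test_size`, `random_state` and `stratify` are real `train_test_split` parameters, but the values chosen here (including the seed 42) are illustrative and not from the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(1, 25).reshape(12, 2)
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# random_state fixes the shuffle; stratify=y keeps the class balance
# of y as close as possible in both the training and the test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42, stratify=y
)

# Repeating the call with the same seed yields the same split
x_train2, x_test2, y_train2, y_test2 = train_test_split(
    x, y, test_size=0.25, random_state=42, stratify=y
)

print(x_train.shape, x_test.shape)
print(np.array_equal(x_test, x_test2))
```

Stratifying matters most for small or imbalanced datasets, where a purely random split can leave a class almost absent from the test set.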
