
vertopal.com_Numpy,,Pandas(24.4.25)

The document demonstrates NumPy and Pandas operations for data manipulation: array creation, element-wise arithmetic, matrix multiplication, and stacking. It then creates and manipulates Pandas Series and DataFrames, covering indexing, slicing, sorting, and filtering. Finally, it reads a CSV file and inspects its contents: missing-value counts, duplicate rows, unique values, and value counts.

Uploaded by

gipom89038
Copyright
© All Rights Reserved

import numpy as np

a= np.random.randint(1,50,4).reshape(2,2)
a

array([[34, 18],
[36, 39]])

b= np.random.randint(1,50,4).reshape(2,2)
b

array([[16, 8],
[33, 47]])
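Because `np.random.randint` draws new values on every run, re-executing the cells above will print different arrays from the ones shown. If reproducible arrays are wanted, one option is NumPy's `Generator` API with a fixed seed; this is a minimal sketch, and the seed value 42 is an arbitrary choice:

```python
import numpy as np

# Two generators built from the same (arbitrary) seed produce identical draws
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

# integers(low, high, size) draws from [low, high), i.e. 1..49 like randint(1, 50)
a = rng1.integers(1, 50, size=(2, 2))
a_again = rng2.integers(1, 50, size=(2, 2))

print(a)
print(np.array_equal(a, a_again))  # True: same seed, same sequence
```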

a+b

array([[50, 26],
[69, 86]])

a-b

array([[18, 10],
[ 3, -8]])

a*b

array([[ 544, 144],
[1188, 1833]])

a/b

array([[2.125 , 2.25 ],
[1.09090909, 0.82978723]])

## matrix multiplication
a@b

array([[1138, 1118],
[1863, 2121]])
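To make the difference between `*` and `@` concrete, the sketch below recomputes the matrix product of the two arrays printed above with explicit loops. Each entry of `a @ b` is the dot product of a row of `a` with a column of `b`, whereas `a * b` multiplies values at matching positions. The loop function `matmul_loops` is a hypothetical helper written only for illustration:

```python
import numpy as np

# The arrays printed above (values copied from the notebook output)
a = np.array([[34, 18], [36, 39]])
b = np.array([[16, 8], [33, 47]])

def matmul_loops(x, y):
    """Naive matrix product: out[i, j] = sum over k of x[i, k] * y[k, j]."""
    rows, inner = x.shape
    inner2, cols = y.shape
    assert inner == inner2, "inner dimensions must match"
    out = np.zeros((rows, cols), dtype=x.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += x[i, k] * y[k, j]
    return out

# e.g. out[0, 0] = 34*16 + 18*33 = 544 + 594 = 1138
print(matmul_loops(a, b))  # same values as a @ b
print(a * b)               # element-wise products, a different result
```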

## Stacking
np.hstack([a,b])

array([[34, 18, 16, 8],
[36, 39, 33, 47]])

np.vstack([a,b])

array([[34, 18],
[36, 39],
[16, 8],
[33, 47]])
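`np.hstack` and `np.vstack` are convenience wrappers: for 2-D inputs they give the same result as `np.concatenate` along axis 1 and axis 0 respectively. A short check using the arrays above:

```python
import numpy as np

a = np.array([[34, 18], [36, 39]])
b = np.array([[16, 8], [33, 47]])

# hstack joins side by side (columns grow), vstack joins top to bottom (rows grow)
h = np.hstack([a, b])
v = np.vstack([a, b])

print(h.shape, v.shape)  # (2, 4) (4, 2)
print(np.array_equal(h, np.concatenate([a, b], axis=1)))  # True
print(np.array_equal(v, np.concatenate([a, b], axis=0)))  # True
```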

np.identity(3)

array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
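`np.identity(n)` returns an n×n float array with ones on the diagonal. Multiplying a compatible matrix by it with `@` leaves the matrix unchanged, which the sketch below checks for the 2×2 array `a` (values copied from the output above):

```python
import numpy as np

a = np.array([[34, 18], [36, 39]])
I = np.identity(2)  # float64 identity matrix

# a @ I and I @ a both reproduce a (as floats, since I is float64)
print(np.array_equal(a @ I, a))  # True
print(np.array_equal(I @ a, a))  # True
print(I.dtype)                   # float64
```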

a

array([[34, 18],
[36, 39]])

np.sum(a,axis=0)

array([70, 57])

np.sum(a,axis=1)

array([52, 75])
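The `axis` argument names the dimension that is collapsed: `axis=0` sums down each column (one result per column) and `axis=1` sums across each row (one result per row). The same convention applies to other reductions such as `max`:

```python
import numpy as np

a = np.array([[34, 18], [36, 39]])

col_sums = np.sum(a, axis=0)  # collapse rows: [34+36, 18+39] -> [70, 57]
row_sums = np.sum(a, axis=1)  # collapse columns: [34+18, 36+39] -> [52, 75]
total = np.sum(a)             # no axis: sum of every element -> 127

print(col_sums, row_sums, total)
print(a.max(axis=0))  # largest value in each column -> [36, 39]
```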

Pandas

import pandas as pd

Series

ser1= pd.Series(data=[100,60,88,99,86],index=['Ajay','Sanjay','Ramesh','Ramya','Tom'])
ser1

Ajay 100
Sanjay 60
Ramesh 88
Ramya 99
Tom 86
dtype: int64

ser1['Ramesh']

88

ser1['Ajay':'Ramesh']

Ajay 100
Sanjay 60
Ramesh 88
dtype: int64

ser1[['Ajay','Ramesh']]

Ajay 100
Ramesh 88
dtype: int64
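Slicing a Series by label, as in `ser1['Ajay':'Ramesh']`, includes both endpoints, while positional slicing with `.iloc` excludes the stop position, as with plain Python lists. The explicit `.loc` and `.iloc` accessors make the intent unambiguous:

```python
import pandas as pd

ser1 = pd.Series(data=[100, 60, 88, 99, 86],
                 index=['Ajay', 'Sanjay', 'Ramesh', 'Ramya', 'Tom'])

by_label = ser1.loc['Ajay':'Ramesh']  # label slice: the stop label IS included
by_position = ser1.iloc[0:2]          # positional slice: the stop position is excluded

print(list(by_label.index))     # ['Ajay', 'Sanjay', 'Ramesh']
print(list(by_position.index))  # ['Ajay', 'Sanjay']
print(ser1.iloc[2])             # 88, the value at position 2 ('Ramesh')
```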

ser1.max()

100

ser1.min()

60

ser1.mean()

86.6

ser1.median()

88.0

ser1[ser1==100]

Ajay 100
dtype: int64

ser1[ser1<80]

Sanjay 60
dtype: int64
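Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==`, `<` or `>=`. A sketch on the same Series; the thresholds are arbitrary examples:

```python
import pandas as pd

ser1 = pd.Series(data=[100, 60, 88, 99, 86],
                 index=['Ajay', 'Sanjay', 'Ramesh', 'Ramya', 'Tom'])

# Scores between 85 and 95 inclusive
mid = ser1[(ser1 >= 85) & (ser1 <= 95)]
# Scores below 70 or equal to 100
extremes = ser1[(ser1 < 70) | (ser1 == 100)]
# Everything except the maximum score
not_top = ser1[~(ser1 == ser1.max())]

print(list(mid.index))       # ['Ramesh', 'Tom']
print(list(extremes.index))  # ['Ajay', 'Sanjay']
print(len(not_top))          # 4
```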

ser1

Ajay 100
Sanjay 60
Ramesh 88
Ramya 99
Tom 86
dtype: int64

ser1['Sanjay']=75

ser1

Ajay 100
Sanjay 75
Ramesh 88
Ramya 99
Tom 86
dtype: int64
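Assigning through a label updates an existing entry, and assigning to a label that does not exist yet appends a new entry. Using `.loc` for both makes the label-based access explicit. The name `'Priya'` below is hypothetical, added only to show the append:

```python
import pandas as pd

ser1 = pd.Series(data=[100, 60, 88, 99, 86],
                 index=['Ajay', 'Sanjay', 'Ramesh', 'Ramya', 'Tom'])

ser1.loc['Sanjay'] = 75  # update an existing label in place
ser1.loc['Priya'] = 91   # hypothetical new label: appended at the end

print(ser1.loc['Sanjay'])  # 75
print(list(ser1.index))    # ['Ajay', 'Sanjay', 'Ramesh', 'Ramya', 'Tom', 'Priya']
```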

DataFrame
dict1={'Names':['Ajay','Sanjay','Ramya','Suman','John'],
'Age':[22,25,32,26,29],'Gender':['Male','Male','Female','Female','Male'],
'Location':['Bengaluru','Chennai','Orissa','Bengaluru','Pune']}

dict1

{'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
 'Age': [22, 25, 32, 26, 29],
 'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
 'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}

df=pd.DataFrame(dict1,index=range(1,6))
df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df.columns

Index(['Names', 'Age', 'Gender', 'Location'], dtype='object')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 1 to 5
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Names 5 non-null object
1 Age 5 non-null int64
2 Gender 5 non-null object
3 Location 5 non-null object
dtypes: int64(1), object(3)
memory usage: 292.0+ bytes

df.shape

(5, 4)

df.size

20
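For a DataFrame, `size` is the number of cells, i.e. rows × columns (5 × 4 = 20 here), and `ndim` is always 2, while a single column (a Series) has `ndim` 1. A quick check with the same data:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

rows, cols = df.shape
print(df.shape, df.size, df.ndim)  # (5, 4) 20 2
print(df.size == rows * cols)      # True
print(df['Age'].ndim)              # 1: a single column is a Series
```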

df.ndim

2

df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df.sort_values(by=['Age','Gender'],ascending=False)

    Names  Age  Gender   Location
3   Ramya   32  Female     Orissa
5    John   29    Male       Pune
4   Suman   26  Female  Bengaluru
2  Sanjay   25    Male    Chennai
1    Ajay   22    Male  Bengaluru
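`ascending` also accepts a list with one flag per sort key. The sketch below sorts by `Gender` alphabetically and, within each gender, by `Age` from oldest to youngest:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

# Gender A to Z first, then Age descending within each Gender group
out = df.sort_values(by=['Gender', 'Age'], ascending=[True, False])
print(out)
print(list(out.index))  # [3, 4, 5, 2, 1]
```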

## df.drop(columns=['Names','Age'],inplace=True)
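The commented-out line above would remove the `Names` and `Age` columns from `df` permanently because of `inplace=True`. Without that flag, `drop` returns a new DataFrame and leaves the original untouched, which is often the safer pattern:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

# drop returns a new object; df keeps all four columns
trimmed = df.drop(columns=['Names', 'Age'])

print(list(trimmed.columns))  # ['Gender', 'Location']
print(list(df.columns))       # ['Names', 'Age', 'Gender', 'Location']
```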

df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df[:3]

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa

df[['Names','Gender']]

    Names  Gender
1    Ajay    Male
2  Sanjay    Male
3   Ramya  Female
4   Suman  Female
5    John    Male

df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df.columns.get_loc('Location')

3
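`get_loc` converts a column label into its 0-based integer position (`'Location'` is at position 3), which is handy when mixing a label lookup with positional `.iloc` selection:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

pos = df.columns.get_loc('Location')  # 3 (0-based position)
# Positional selection: first two rows of that column
first_two = df.iloc[:2, pos]

print(pos)              # 3
print(list(first_two))  # ['Bengaluru', 'Chennai']
```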

df.iloc[:,:3]

    Names  Age  Gender
1    Ajay   22    Male
2  Sanjay   25    Male
3   Ramya   32  Female
4   Suman   26  Female
5    John   29    Male

df.iloc[:3,1:]

   Age  Gender   Location
1   22    Male  Bengaluru
2   25    Male    Chennai
3   32  Female     Orissa

df.iloc[[0,2,4],[0,2,3]]

   Names  Gender   Location
1   Ajay    Male  Bengaluru
3  Ramya  Female     Orissa
5   John    Male       Pune

df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df.loc[1:3,'Names':'Gender']

    Names  Age  Gender
1    Ajay   22    Male
2  Sanjay   25    Male
3   Ramya   32  Female

df.loc[3:5,'Age':'Location']

   Age  Gender   Location
3   32  Female     Orissa
4   26  Female  Bengaluru
5   29    Male       Pune

df.loc[[1,3,5],['Names','Age','Location']]

   Names  Age   Location
1   Ajay   22  Bengaluru
3  Ramya   32     Orissa
5   John   29       Pune
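The two accessors treat slice endpoints differently: `.iloc` is positional and excludes the stop, while `.loc` is label-based and includes it. Because this DataFrame's index starts at 1, the same numbers select different rows:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

by_position = df.iloc[1:3]  # positions 1 and 2 -> labels 2 and 3
by_label = df.loc[1:3]      # labels 1 through 3, inclusive

print(list(by_position.index))  # [2, 3]
print(list(by_label.index))     # [1, 2, 3]
```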

df

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
3   Ramya   32  Female     Orissa
4   Suman   26  Female  Bengaluru
5    John   29    Male       Pune

df[df.Gender=='Male']

    Names  Age  Gender   Location
1    Ajay   22    Male  Bengaluru
2  Sanjay   25    Male    Chennai
5    John   29    Male       Pune

df[(df.Gender=='Male') & (df.Location=='Bengaluru')]

   Names  Age  Gender   Location
1   Ajay   22    Male  Bengaluru
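The same filter can be written with `DataFrame.query`, which takes the condition as a string and accepts `and`/`or` in place of `&`/`|`. For membership tests against several values, `isin` avoids chaining many `|` comparisons:

```python
import pandas as pd

dict1 = {'Names': ['Ajay', 'Sanjay', 'Ramya', 'Suman', 'John'],
         'Age': [22, 25, 32, 26, 29],
         'Gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
         'Location': ['Bengaluru', 'Chennai', 'Orissa', 'Bengaluru', 'Pune']}
df = pd.DataFrame(dict1, index=range(1, 6))

mask_result = df[(df.Gender == 'Male') & (df.Location == 'Bengaluru')]
query_result = df.query("Gender == 'Male' and Location == 'Bengaluru'")
print(query_result.equals(mask_result))  # True

# Rows whose Location is either Bengaluru or Pune
subset = df[df.Location.isin(['Bengaluru', 'Pune'])]
print(list(subset.Names))  # ['Ajay', 'Suman', 'John']
```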

df= pd.read_csv('D:\Desktop(8.1.25)\Machine Learning Datasets\CARS_1.cs

df.head(10)

   name_of_car                    Model    Type  Origin  DriveTrain     MSRP  Invoice
0        Acura   NSX coupe 2dr manual S  Sports    Asia        Rear  $89,765  $79,978
1        Acura  3.5 RL w/Navigation 4dr   Sedan    Asia       Front  $46,100  $41,100
2        Acura               3.5 RL 4dr   Sedan    Asia       Front  $43,755  $39,014
3        Acura                      MDX     NaN    Asia         All  $36,945      NaN
4        Acura                   TL 4dr     NaN    Asia       Front  $33,195      NaN
5        Acura                  TSX 4dr     NaN    Asia       Front  $26,990      NaN
6        Acura           RSX Type S 2dr     NaN    Asia       Front  $23,820      NaN
7         Audi                 RS 6 4dr     NaN  Europe       Front  $84,600      NaN
8         Audi         A8 L Quattro 4dr     NaN  Europe         All  $69,190      NaN
9         Audi       A6 4.2 Quattro 4dr     NaN  Europe         All  $49,690      NaN

df.tail()

     name_of_car     Model                             Type  Origin  DriveTrain  MSRP  Invoice
438          NaN       NaN                              NaN     NaN         NaN   NaN      NaN
439          NaN       NaN                              NaN     NaN         NaN   NaN      NaN
440          NaN  group by               1) data will split     NaN         NaN   NaN      NaN
441          NaN       NaN  2) apply the function mean(mpg)     NaN         NaN   NaN      NaN
442          NaN       NaN                       3) combine     NaN         NaN   NaN      NaN
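The text in the last rows of the file describes group-by as three steps: split the data into groups, apply a function (here the mean of MPG), and combine the results. A minimal sketch of that pattern follows. Since the CSV itself is not reproduced here, it uses a small hypothetical frame that borrows the `Origin` and `MPG_City` column names from the dataset, with invented values:

```python
import pandas as pd

# Hypothetical sample: column names follow the CARS dataset, values are made up
cars = pd.DataFrame({
    'Origin': ['Asia', 'Asia', 'Europe', 'USA', 'USA', 'Europe'],
    'MPG_City': [20.0, 24.0, 18.0, 16.0, 22.0, 20.0],
})

# split by Origin -> apply mean to MPG_City -> combine into one Series
mean_mpg = cars.groupby('Origin')['MPG_City'].mean()
print(mean_mpg)  # expected means: Asia 22.0, Europe 19.0, USA 19.0
```

On the real file, the same call would read `df.groupby('Origin')['MPG_City'].mean()`; the mostly empty trailing rows have `NaN` in `Origin`, and `groupby` leaves NaN keys out by default.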

df.shape

(443, 16)

df.size

7088

df.ndim

2

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 443 entries, 0 to 442
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name_of_car 428 non-null object
1 Model 429 non-null object
2 Type 409 non-null object
3 Origin 428 non-null object
4 DriveTrain 428 non-null object
5 MSRP 428 non-null object
6 Invoice 406 non-null object
7 EngineSize 428 non-null float64
8 Cylinders 426 non-null float64
9 Horsepower 343 non-null float64
10 MPG_City 428 non-null float64
11 MPG_Highway 428 non-null float64
12 Weight 428 non-null float64
13 Wheelbase 428 non-null float64
14 Length 428 non-null float64
15 Unnamed: 15 2 non-null float64
dtypes: float64(9), object(7)
memory usage: 55.5+ KB
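`df.info()` reports `MSRP` and `Invoice` as `object` because the values are strings such as `$89,765`. One way to convert them, sketched here on a few values copied from the `head()` output above, is to strip the dollar sign and thousands separator and then call `pd.to_numeric`; `errors='coerce'` turns anything unparseable into NaN:

```python
import numpy as np
import pandas as pd

# A few MSRP/Invoice strings as they appear in the head() output
sample = pd.DataFrame({
    'MSRP': ['$89,765', '$46,100', '$36,945'],
    'Invoice': ['$79,978', '$41,100', np.nan],
})

for col in ['MSRP', 'Invoice']:
    # remove '$' and ',' then parse; missing or malformed values become NaN
    cleaned = sample[col].str.replace(r'[$,]', '', regex=True)
    sample[col] = pd.to_numeric(cleaned, errors='coerce')

print(sample.dtypes)            # both columns now have numeric dtypes
print(sample['MSRP'].tolist())  # the numbers 89765, 46100, 36945
```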

## missing values
df.isnull().sum()

name_of_car 15
Model 14
Type 34
Origin 15
DriveTrain 15
MSRP 15
Invoice 37
EngineSize 15
Cylinders 17
Horsepower 100
MPG_City 15
MPG_Highway 15
Weight 15
Wheelbase 15
Length 15
Unnamed: 15 441
dtype: int64
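Most columns are missing exactly 15 values, which matches the 15 mostly empty rows at the end of the file (indices 428 to 442), and `Unnamed: 15` is almost entirely empty. A sketch of common next steps, on a small hypothetical frame rather than the real CSV: the percentage of missing values per column, dropping rows in which every value is missing, and dropping the near-empty column by name:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely empty, 'Unnamed: 15' is mostly empty
raw = pd.DataFrame({
    'Model': ['A', np.nan, 'B'],
    'Horsepower': [200.0, np.nan, np.nan],
    'Unnamed: 15': [np.nan, np.nan, 7.0],
})

pct_missing = raw.isnull().mean() * 100  # percent missing per column
print(pct_missing)

# Drop rows where every value is NaN, then the near-empty column by name
cleaned = raw.dropna(how='all').drop(columns=['Unnamed: 15'])
print(cleaned.shape)        # (2, 2)
print(list(cleaned.index))  # [0, 2]
```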

df.duplicated().sum()

11

df[df.duplicated()]

     name_of_car  Model  Type  Origin  DriveTrain  MSRP  Invoice  Eng
429          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
430          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
431          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
432          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
433          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
434          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
435          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
436          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
437          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
438          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
439          NaN    NaN   NaN     NaN         NaN   NaN      NaN  NaN
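`duplicated()` marks every repeat after the first occurrence, and it treats NaN values as equal, so the repeated all-NaN rows are flagged while the first such row is kept as the original. `drop_duplicates()` removes the flagged rows, and `keep=False` flags every copy instead. A sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one repeated car row and two fully empty rows
frame = pd.DataFrame({
    'name_of_car': ['Acura', 'Audi', 'Acura', np.nan, np.nan],
    'Origin': ['Asia', 'Europe', 'Asia', np.nan, np.nan],
})

print(frame.duplicated().tolist())            # [False, False, True, False, True]
print(frame.duplicated(keep=False).tolist())  # [True, False, True, True, True]

deduped = frame.drop_duplicates()  # keeps the first occurrence of each row
print(len(deduped))                # 3
```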

df.Origin.unique()

array(['Asia', 'Europe', 'USA', nan], dtype=object)

df.Origin.value_counts()

Asia 158
USA 147
Europe 123
Name: Origin, dtype: int64
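`value_counts()` skips NaN by default, so the `Origin` counts above add up to 428 (158 + 147 + 123) rather than 443. `dropna=False` counts missing values as their own category, and `normalize=True` returns proportions instead of counts. A sketch on a small hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical values; the last entry is missing
origin = pd.Series(['Asia', 'USA', 'Asia', 'Europe', np.nan])

counts = origin.value_counts()                # NaN excluded
with_nan = origin.value_counts(dropna=False)  # NaN counted as its own category
shares = origin.value_counts(normalize=True)  # proportions of the 4 non-missing values

print(counts)
print(with_nan)
print(shares)
print(origin.nunique())  # 3 distinct non-missing values
```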

df.name_of_car.value_counts()

Toyota 28
Chevrolet 27
Mercedes-Benz 26
Ford 23
BMW 20
Audi 19
Honda 17
Nissan 17
Volkswagen 15
Chrysler 15
Dodge 13
Mitsubishi 13
Volvo 12
Jaguar 12
Hyundai 12
Subaru 11
Pontiac 11
Mazda 11
Lexus 11
Kia 11
Buick 9
Mercury 9
Lincoln 9
Saturn 8
Cadillac 8
Suzuki 8
Infiniti 8
GMC 8
Acura 7
Porsche 7
Saab 7
Land Rover 3
Oldsmobile 3
Jeep 3
Scion 2
Isuzu 2
MINI 2
Hummer 1
Name: name_of_car, dtype: int64
