Pandas 2 Complete Notes Class XII
import pandas as pd
import numpy as np
df=pd.DataFrame({'Population':[10927986,12691836,4631392,4328063],
                 'Hospital':[189,208,149,157],
                 'School':[7916,8508,7226,7617]},
                index=['Delhi','Mumbai','Kolkata','Chennai'])
print(df)
for (row, rowSeries) in df.iterrows():
    print("Row index :" , row)
    print("Containing :")
    print(rowSeries)
df1=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
df2=pd.DataFrame([[51,12,32],[41,55,62],[17,88,None]])
df.quantile()
df[['Maths','Science','S.St','Hindi','Eng']].var()
df[['Maths','Science','S.St','Hindi','Eng']].std()
df2=pd.DataFrame([[51,12,32],[41,55,62],[17,88,None]])
df2.describe()
               0          1          2
count   3.000000   3.000000   2.000000
mean   36.333333  51.666667  47.000000
std    17.473790  38.109491  21.213203
min    17.000000  12.000000  32.000000
25%    29.000000  33.500000  39.500000
50%    41.000000  55.000000  47.000000
75%    46.000000  71.500000  54.500000
max    51.000000  88.000000  62.000000
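The statistical functions used above can be tried on a small frame. The following is a minimal sketch; the frame `scores` and its values are made up for illustration and are not part of the marks data in these notes.

```python
import pandas as pd

# Hypothetical data, chosen so the results are easy to check by hand.
scores = pd.DataFrame({'Maths': [10, 20, 30, 40],
                       'Science': [2, 4, 6, 8]})

median = scores.quantile()      # default q=0.5, the median of each column
q1 = scores.quantile(0.25)      # first quartile (linear interpolation)
variance = scores.var()         # sample variance (divides by n-1)
std_dev = scores.std()          # sample standard deviation, sqrt of variance

print(median, q1, variance, std_dev, sep='\n\n')
```

Note that var() and std() use n-1 in the denominator by default, which is why the std row of describe() matches std() and not the population formula.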
import pandas as pd
df=pd.DataFrame(marksUT)
print(df)
>>> df.aggregate('max')
Name Zuhaire
UT 3
Maths 24
Science 25
S.St 25
Hindi 25
Eng 24
dtype: object
>>> df.aggregate(['max','count'])
Name UT Maths Science S.St Hindi Eng
max Zuhaire 3 24 25 25 25 24
count 12 12 12 12 12 12 12
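As a rough sketch of how aggregate() behaves with different arguments, the example below uses a small made-up frame (`marks` and its values are assumptions, not the marksUT data). A single function name returns a Series, a list of names returns a DataFrame with one row per function, and a dictionary applies a different function to each column.

```python
import pandas as pd

# Hypothetical marks for three tests.
marks = pd.DataFrame({'Maths': [22, 21, 14],
                      'Science': [21, 20, 19]})

single = marks.aggregate('max')                    # Series: one value per column
multi = marks.aggregate(['max', 'min'])            # DataFrame: one row per function
per_col = marks.aggregate({'Maths': 'mean',        # different function per column
                           'Science': 'sum'})

print(single)
print(multi)
print(per_col)
```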
Sorting a DataFrame
Sorting refers to arranging data elements in a specified order,
either ascending or descending.
Pandas provides the sort_values() function to sort the data values
of a DataFrame.
It takes the list of columns to sort by (by), the axis (0 to sort
rows, 1 to sort columns) and the sort order (ascending=True or
False) as arguments. By default, rows are sorted (axis=0) in
ascending order.
>>> print(df.sort_values(by=['Name']))
Name UT Maths Science S.St Hindi Eng
6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20
0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
5 Zu haire 3 22 18 19 23 13
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15
print(df.sort_values(by=['Science']))
print(df.sort_values(by=['Eng'],ascending=False))
A DataFrame can be sorted based on multiple columns.
>>> print(df.sort_values(by=['Science','Hindi']))
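When sorting on multiple columns, ascending can also be given as a list with one entry per column. The sketch below assumes a small made-up frame (`marks` here is hypothetical): it sorts by Science in descending order and breaks ties by Hindi in ascending order.

```python
import pandas as pd

# Hypothetical data with a tie in the Science column.
marks = pd.DataFrame({'Name': ['Raman', 'Mishti', 'Ashravy', 'Zuhaire'],
                      'Science': [21, 22, 21, 17],
                      'Hindi': [20, 22, 15, 24]})

# Science: highest first; ties in Science are ordered by Hindi, lowest first.
ranked = marks.sort_values(by=['Science', 'Hindi'], ascending=[False, True])
print(ranked)
```

sort_values() returns a new DataFrame; marks itself is left unchanged unless inplace=True is passed.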
GROUP BY Functions
The groupby() function is used to split the data into groups based
on some criteria. Pandas objects like a DataFrame can be split
on any of their axes.
In other words, rows that have the same value in the chosen field
are grouped together.
g1=df.groupby('Name')
Note: Pandas created the groups based on the column's values but did
not display the grouped data, because groupby() returns a GroupBy
object rather than a DataFrame.
df1=df.groupby('Name')
>>> df1.groups
{'Ashravy': [6, 7, 8], 'Mishti': [9, 10, 11], 'Raman': [0, 1, 2],
'Zu haire': [5], 'Zuhaire': [3, 4]}
#df1.get_group('Mishti')
df1.get_group('Raman')
df1=df.groupby(['Name', 'UT'])
>>> df1.first()
Maths Science S.St Hindi Eng
Name UT
Ashravy 1 23 19 20 15 22
2 24 22 24 17 21
3 12 25 19 21 23
Mishti 1 15 22 25 22 22
2 18 21 25 24 23
3 17 18 20 25 20
Raman 1 22 21 18 20 21
2 21 20 17 22 24
3 14 19 15 24 23
Zu haire 3 22 18 19 23 13
Zuhaire 1 20 17 22 24 19
2 23 15 21 25 15
>>> df.groupby('Name').size()
Name
Ashravy 3
Mishti 3
Raman 3
Zu haire 1
Zuhaire 2
dtype: int64
>>> df1.count()
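The groupby steps above can be put together in one short sketch. The frame `marks` and its values are made up for illustration; only groups, get_group(), size() and a per-group mean() are shown.

```python
import pandas as pd

# Hypothetical data: two students with a different number of tests each.
marks = pd.DataFrame({'Name': ['Raman', 'Raman', 'Mishti', 'Mishti', 'Mishti'],
                      'UT': [1, 2, 1, 2, 3],
                      'Maths': [22, 21, 15, 18, 17]})

by_name = marks.groupby('Name')           # GroupBy object, nothing is computed yet

print(by_name.groups)                     # group label -> row index labels
raman = by_name.get_group('Raman')        # rows belonging to one group
sizes = by_name.size()                    # number of rows in each group
mean_maths = by_name['Maths'].mean()      # average Maths marks per student

print(raman)
print(sizes)
print(mean_maths)
```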
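When we slice or filter the data, the result keeps the original
index labels, which are no longer continuous. reset_index() renumbers
the rows, and set_index() makes a column the index.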
>>> a=df[df.UT == 1]
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
3 Zuhaire 1 20 17 22 24 19
6 Ashravy 1 23 19 20 15 22
9 Mishti 1 15 22 25 22 22
>>> a.reset_index(inplace=True)
>>> a
index Name UT Maths Science S.St Hindi Eng
0 0 Raman 1 22 21 18 20 21
1 3 Zuhaire 1 20 17 22 24 19
2 6 Ashravy 1 23 19 20 15 22
3 9 Mishti 1 15 22 25 22 22
>>> a.drop(columns=['index'],inplace=True)
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
1 Zuhaire 1 20 17 22 24 19
2 Ashravy 1 23 19 20 15 22
3 Mishti 1 15 22 25 22 22
>>> a.set_index('Name',inplace=True)
>>> a
UT Maths Science S.St Hindi Eng
Name
Raman 1 22 21 18 20 21
Zuhaire 1 20 17 22 24 19
Ashravy 1 23 19 20 15 22
Mishti 1 15 22 25 22 22
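The reset and drop steps shown above can be combined by passing drop=True to reset_index(), which discards the old index instead of keeping it as a column. A minimal sketch on made-up data (`marks` here is hypothetical):

```python
import pandas as pd

# Hypothetical data.
marks = pd.DataFrame({'Name': ['Raman', 'Raman', 'Zuhaire', 'Zuhaire'],
                      'UT': [1, 2, 1, 2],
                      'Maths': [22, 21, 20, 23]})

ut1 = marks[marks['UT'] == 1]               # keeps original labels 0 and 2
renumbered = ut1.reset_index(drop=True)     # labels become 0, 1; no 'index' column
by_name = renumbered.set_index('Name')      # 'Name' becomes the row index

print(ut1)
print(renumbered)
print(by_name)
```

Without inplace=True, both functions return a new DataFrame and leave the original untouched.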
Reshaping Data
For reshaping data, Pandas provides two basic functions,
pivot() and pivot_table().
import pandas as pd
d1={'Tutor':['Tahira','Gurjot','Anusha','Jacob','Venkat'],
    'Class':[28,36,41,32,40],
    'Country':['USA','UK','Japan','USA','Brazil']}
df=pd.DataFrame(d1)
>>> df
Tutor Class Country
0 Tahira 28 USA
1 Gurjot 36 UK
2 Anusha 41 Japan
3 Jacob 32 USA
4 Venkat 40 Brazil
>>> df.pivot(index='Country', columns='Tutor',values='Class')
>>> df.pivot(index='Country', columns='Tutor', values='Class').fillna(0)
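To see what pivot() produces, here is a minimal sketch on a shortened, made-up version of the tutor data (the frame `tutors` is an assumption for illustration). Each Country becomes a row, each Tutor becomes a column, and combinations that do not occur are filled with NaN.

```python
import pandas as pd

# Hypothetical, shortened tutor data.
tutors = pd.DataFrame({'Tutor': ['Tahira', 'Gurjot', 'Jacob'],
                       'Class': [28, 36, 32],
                       'Country': ['USA', 'UK', 'USA']})

wide = tutors.pivot(index='Country', columns='Tutor', values='Class')
filled = wide.fillna(0)      # replace the NaN cells with 0

print(wide)
print(filled)
```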
df1=pd.DataFrame(ontutD)
>>> df1
Tutor Classes Quarter Country
0 Tahira 28 1 USA
1 Gurjot 36 1 UK
2 Anusha 41 1 Japan
3 Jacob 32 1 USA
4 Venkat 40 1 Brazil
5 Tahira 36 2 USA
6 Gurjot 40 2 USA
7 Anusha 36 2 Japan
8 Jacob 40 2 Brazil
9 Venkat 46 2 USA
10 Tahira 24 3 Brazil
11 Gurjot 30 3 USA
12 Anusha 44 3 UK
13 Jacob 40 3 Brazil
14 Venkat 32 3 USA
15 Tahira 36 4 Japan
16 Gurjot 32 4 Japan
17 Anusha 36 4 Brazil
18 Jacob 41 4 UK
19 Venkat 38 4 USA
For data having multiple values for the same row and column
combination, pivot() raises an error. In that case we can use
another pivoting function, pivot_table(), which aggregates the
duplicate entries.
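The difference between the two functions can be sketched as follows. The frame `classes` is made up for illustration, and aggfunc='sum' is just one possible choice (the default for pivot_table() is the mean).

```python
import pandas as pd

# Hypothetical data: Tahira appears twice for USA, so (Tahira, USA) is duplicated.
classes = pd.DataFrame({'Tutor': ['Tahira', 'Tahira', 'Gurjot', 'Gurjot'],
                        'Country': ['USA', 'USA', 'UK', 'USA'],
                        'Classes': [28, 36, 36, 40]})

# pivot() cannot decide which of the duplicate values to keep.
try:
    classes.pivot(index='Tutor', columns='Country', values='Classes')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() combines the duplicates using the aggregation function.
table = classes.pivot_table(index='Tutor', columns='Country',
                            values='Classes', aggfunc='sum', fill_value=0)
print(pivot_failed)
print(table)
```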
df2=pd.DataFrame([[51,12,32],[41,np.nan,55,62],[17,88,None]])
>>> df2.isnull()
0 1 2 3
0 False False False True
1 False True False False
2 False False True True
print(df['Tutor'].isnull())
print(df['Country'].isnull())
print(df.isnull().any())
Tutor False
Class False
Country False
dtype: bool
df = pd.DataFrame(marksUT)
print(df.isnull())
print(df['Science'].isnull())
print(df.isnull().any())
The function any() can also be used for a particular attribute
(column), and sum() gives the number of missing values in each column.
print(df.isnull().sum())
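The checks above can be tried on a small frame with a few missing entries. This is only a sketch; `marks` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing marks.
marks = pd.DataFrame({'Name': ['Raman', 'Zuhaire', 'Mishti'],
                      'Maths': [22, np.nan, 15],
                      'Science': [21, 17, None]})

mask = marks.isnull()                      # True wherever a value is missing
has_missing = marks.isnull().any()         # per column: is anything missing?
maths_missing = marks['Maths'].isnull().any()   # same check for one column
missing_counts = marks.isnull().sum()      # per column: how many are missing?
total_missing = int(missing_counts.sum())  # missing values in the whole frame

print(mask)
print(has_missing)
print(missing_counts)
print(total_missing)
```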
a.dropna(inplace=True, how='any')
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22.0 21.0 18 20 21.0
1 Raman 2 21.0 20.0 17 22 24.0
2 Raman 3 14.0 19.0 15 24 23.0
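The how parameter of dropna() decides when a row is removed. A minimal sketch on made-up data (`marks` is hypothetical): how='any' drops a row if at least one value is missing, while how='all' drops it only if every value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one missing value, row 2 has all values missing.
marks = pd.DataFrame({'Maths': [22, np.nan, np.nan],
                      'Science': [21, 17, np.nan]})

any_dropped = marks.dropna(how='any')   # keeps only row 0
all_dropped = marks.dropna(how='all')   # keeps rows 0 and 1

print(any_dropped)
print(all_dropped)
```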
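To make the column labels appear in sorted order, we can set the
parameter sort=True. The column labels appear in unsorted order
when sort=False.
dFrame2 = df1.append(df, sort=True)
Note: DataFrame.append() was removed in pandas 2.0; on newer
versions, pd.concat([df1, df], sort=True) gives the same result.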
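The effect of the sort parameter can be seen in the sketch below. It uses pd.concat() rather than append(), since DataFrame.append() is not available in pandas 2.0 and later; the frames `df_a` and `df_b` are made up for illustration.

```python
import pandas as pd

# Hypothetical frames whose columns are not in alphabetical order.
df_a = pd.DataFrame({'C2': [1, 2], 'C1': [3, 4]})
df_b = pd.DataFrame({'C3': [5], 'C1': [6]})

sorted_cols = pd.concat([df_a, df_b], sort=True)     # columns: C1, C2, C3
unsorted_cols = pd.concat([df_a, df_b], sort=False)  # columns in order of appearance

print(sorted_cols)
print(unsorted_cols)
```

Columns missing from one of the frames are filled with NaN in the combined result.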
We can load the data from the data.csv file into a DataFrame,
say marks, using the Pandas read_csv() function.
marks = pd.read_csv(r"C:\Users\Ashutosh\Desktop\data.csv",
                    sep=",", header=0)
marks = pd.read_csv(r"C:\Users\Ashutosh\Desktop\data.csv",
                    sep=",", names=['RNo','StudentName', 'Sub1','Sub2'])
df1.to_csv(r"C:\Users\Ashutosh\Desktop\data12.csv",
           sep=",", header=0)
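The reading and writing calls above depend on files at a particular path. To try them without any file on disk, the sketch below uses io.StringIO as a stand-in for a CSV file; the CSV text and the student names in it are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content without a header row.
csv_text = "1,Asha,45,50\n2,Ravi,38,42\n"

# names= supplies the column labels because the data has no header line.
marks = pd.read_csv(io.StringIO(csv_text), sep=",",
                    names=['RNo', 'StudentName', 'Sub1', 'Sub2'])

# Write the DataFrame back as CSV text; index=False leaves out the row labels.
buffer = io.StringIO()
marks.to_csv(buffer, sep=",", index=False)
written = buffer.getvalue()

print(marks)
print(written)
```

Replacing the StringIO objects with a file path gives the same behaviour on a real file.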