Pandas 2 Complete Notes Class XII

Uploaded by asayushsingh638

Python Pandas II

Introduction: Basic operations on a DataFrame: descriptive
statistics, pivoting, handling missing data, combining/merging,
etc.

Descriptive statistics are used to summarise the given data. In
other words, they refer to the methods which are used to get
some basic idea about the data.

Iteration over a DataFrame:-

(i) The iterrows() method iterates over a DataFrame row-wise:-
We can iterate over a DataFrame row-wise, where each row's
values are returned as a Series type object.

import pandas as pd
import numpy as np
df=pd.DataFrame({'Population':[10927986,12691836,4631392,4328063],
                 'Hospital':[189,208,149,157],
                 'School':[7916,8508,7226,7617]},
                index=['Delhi','Mumbai','Kolkata','Chennai'])
print(df)
for (row, rowSeries) in df.iterrows():
    print("Row index :", row)
    print("Containing :")
    print(rowSeries)

Row index : Delhi


Containing :
Population 10927986
Hospital 189
School 7916
Name: Delhi, dtype: int64
Row index : Mumbai
Containing :
Population 12691836
Hospital 208
School 8508
Name: Mumbai, dtype: int64
Row index : Kolkata
Containing :
Population 4631392
Hospital 149
School 7226
Name: Kolkata, dtype: int64
Row index : Chennai
Containing :
Population 4328063
Hospital 157
School 7617
Name: Chennai, dtype: int64

(ii) The iteritems() method iterates over a DataFrame
column-wise:-

for (column, ColSeries) in df.iteritems():
    print("Col index :", column)
    print("Containing :")
    print(ColSeries)
Col index : Population
Containing :
Delhi 10927986
Mumbai 12691836
Kolkata 4631392
Chennai 4328063
Name: Population, dtype: int64
Col index : Hospital
Containing :
Delhi 189
Mumbai 208
Kolkata 149
Chennai 157
Name: Hospital, dtype: int64
Col index : School
Containing :
Delhi 7916
Mumbai 8508
Kolkata 7226
Chennai 7617
Name: School, dtype: int64
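Note that iteritems() was deprecated and then removed in pandas 2.0; the equivalent method there is items(), which yields the same (column, Series) pairs. A small runnable sketch using the same city data:

```python
import pandas as pd

df = pd.DataFrame({'Population': [10927986, 12691836, 4631392, 4328063],
                   'Hospital': [189, 208, 149, 157],
                   'School': [7916, 8508, 7226, 7617]},
                  index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])

# items() replaces iteritems() in pandas 2.x: it yields (column, Series) pairs
col_sums = {}
for column, col_series in df.items():
    col_sums[column] = col_series.sum()   # e.g. total hospitals across cities
```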

Binary Operations on DataFrames

df1=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])

df2=pd.DataFrame([[51,12,32],[41,55,62],[17,88,None]])

(i) Addition: +, add() and radd()
df1+df2 or df1.add(df2)
(ii) Subtraction: -, sub() and rsub()
df1-df2 or df1.sub(df2)
(iii) Multiplication: *, mul() and rmul()
df1*df2 or df1.mul(df2)
(iv) Division: /, div() and rdiv()
df1/df2 or df1.div(df2)
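Because df2 contains a missing value (None becomes NaN), the plain operators propagate NaN into the result. The method forms accept a fill_value parameter that substitutes for a missing operand; a minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df2 = pd.DataFrame([[51, 12, 32], [41, 55, 62], [17, 88, None]])

plain = df1 + df2                    # NaN wherever an operand is missing
filled = df1.add(df2, fill_value=0)  # treat the missing df2 value as 0 instead
```

With fill_value, the cell where df2 holds NaN becomes 9 + 0 = 9.0 rather than NaN; if a cell is missing in both frames, the result stays NaN.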

Descriptive Statistics with Pandas


(i)DataFrame.max() is used to calculate the maximum
values from the DataFrame.
print(df.max())
If we want to output maximum value for the columns having
only numeric values, then we can set the parameter
numeric_only=True in the max() method
print(df.max(numeric_only=True))

(ii) DataFrame.min() is used to display the minimum
values from the DataFrame.
print(df.min())
(iii) DataFrame.sum() will display the sum of the values
from the DataFrame.
print(df.sum())
print(df['Maths'].sum())
To total the marks of a particular student (a row), sum along
axis=1 instead of naming a column.
(iv) DataFrame.count() will display the total number of
values for each column or row of a DataFrame. To count
the rows we need to use the argument axis=1.
print(df.count())
(v) DataFrame.mean() will display the mean (average)
of the values of each column of a DataFrame.
df.mean()
(vi) DataFrame.median() will display the middle value of
the data, i.e. the median of the values of each column of a
DataFrame.
print(df.median())
(vii) DataFrame.mode() will display the mode, i.e. the value
that appears the most number of times in the data.
df.mode()
Quartile
DataFrame.quantile() is used to get the quartiles. Quartiles
divide the data into four equal parts: the first quartile is the
25% point (parameter q=.25), the second quartile is the 50%
point (the median), and the third quartile is the 75% point
(parameter q=.75). By default, it displays the second
quartile (median) of all numeric values.

df.quantile()
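All three quartiles can be requested at once by passing a list of q values. A small sketch on a hypothetical Marks column:

```python
import pandas as pd

df = pd.DataFrame({'Marks': [10, 20, 30, 40, 50]})

# one row per requested quantile: 25%, 50% (median) and 75%
q = df.quantile([0.25, 0.5, 0.75])
```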

DataFrame.var() is used to display the variance. It is the
average of the squared differences from the mean.

df[['Maths','Science','S. St','Hindi','Eng']].var()

DataFrame.std() returns the standard deviation of the values.
Standard deviation is calculated as the square root of the
variance.

df[['Maths','Science','S. St','Hindi','Eng']].std()

DataFrame.describe() displays the descriptive
statistical values in a single command. These values help us
describe a set of data in a DataFrame.
The describe() function gives the following information for a
DataFrame:
• count Count of non-NA values in a column
• mean Computed mean of the values in a column
• std Standard deviation of the values in a column
• min Minimum value in a column
• 25%, 50%, 75% Percentiles of the values in the column
• max Maximum value in a column

df2=pd.DataFrame([[51,12,32],[41,55,62],[17,88,None]])
df2.describe()
0 1 2
count 3.000000 3.000000 2.000000
mean 36.333333 51.666667 47.000000
std 17.473790 38.109491 21.213203
min 17.000000 12.000000 32.000000
25% 29.000000 33.500000 39.500000
50% 41.000000 55.000000 47.000000
75% 46.000000 71.500000 54.500000
max 51.000000 88.000000 62.000000

Data Aggregation :- Aggregation means to transform the
dataset and produce a single numeric value from an array.
Aggregation can be applied to one or more columns together.
Aggregate functions are max(), min(), sum(), count(), std(),
var().

import pandas as pd
df=pd.DataFrame(marksUT)   # marksUT: a dictionary of students' unit-test marks
print(df)
>>> df.aggregate('max')
Name Zuhaire
UT 3
Maths 24
Science 25
S.St 25
Hindi 25
Eng 24
dtype: object
>>>df.aggregate(['max','count'])
Name UT Maths Science S.St Hindi Eng
max Zuhaire 3 24 25 25 25 24
count 12 12 12 12 12 12 12
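The same idea on a small self-contained frame (hypothetical marks, just for illustration): aggregate() accepts a list of function names and returns one row per function:

```python
import pandas as pd

df = pd.DataFrame({'Maths': [22, 21, 14], 'Science': [21, 20, 19]})

# one result row per aggregate function, one column per data column
agg = df.aggregate(['max', 'min', 'sum'])
```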

Sorting a DataFrame
Sorting refers to arranging data elements in a specified order,
which can be either ascending or descending. Pandas provides
the sort_values() function to sort the data values of a
DataFrame.

DataFrame.sort_values(by, axis=0, ascending=True)

Here, a column list (by), an axis argument (0 for rows and 1 for
columns) and the order of sorting (ascending = True or False)
are passed as arguments. By default, sorting is done along the
rows (axis=0) in ascending order.

print(df.sort_values(by=['Name']))
>>> print(df.sort_values(by=['Name']))
Name UT Maths Science S.St Hindi Eng
6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20
0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
5 Zu haire 3 22 18 19 23 13
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15

print(df.sort_values(by=['Science']))

print(df.sort_values(by=['Eng'],ascending=False))
A DataFrame can be sorted based on multiple columns.
>>> print(df.sort_values(by=['Science','Hindi']))
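When sorting on multiple columns, ascending can also be a list giving a separate order per column. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mishti', 'Raman', 'Mishti', 'Raman'],
                   'Eng': [22, 21, 23, 24]})

# Name ascending (A->Z), but Eng descending within each name
out = df.sort_values(by=['Name', 'Eng'], ascending=[True, False])
```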

The groupby() Function
The groupby() function is used to split the data into groups based
on some criteria. Pandas objects like a DataFrame can be split
on any of their axes.
In other words, the duplicate values in the same field are
grouped together to form groups.

Step 1: Split the data into groups by creating a groupby
object from the original DataFrame.
Step 2: Apply the required function.
Step 3: Combine the results to form a new DataFrame.

g1=df.groupby('Name')
Note:- Python created groups based on the column's values but did
not display the grouped data, as groupby() returns an object.

df1=df.groupby('Name')

>>> df1.groups (lists the groups created)

{'Ashravy': [6, 7, 8], 'Mishti': [9, 10, 11], 'Raman': [0, 1, 2],
'Zu haire': [5], 'Zuhaire': [3, 4]}

# Displaying a group's data, i.e., the rows belonging to that group

df1.get_group('Mishti')

Name UT Maths Science S.St Hindi Eng


9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20

df1.get_group('Raman')
df1=df.groupby(['Name', 'UT'])

>>> df1.first()
Maths Science S.St Hindi Eng
Name UT
Ashravy 1 23 19 20 15 22
2 24 22 24 17 21
3 12 25 19 21 23
Mishti 1 15 22 25 22 22
2 18 21 25 24 23
3 17 18 20 25 20
Raman 1 22 21 18 20 21
2 21 20 17 22 24
3 14 19 15 24 23
Zu haire 3 22 18 19 23 13
Zuhaire 1 20 17 22 24 19
2 23 15 21 25 15

#Displaying the size of each group (output shown for the earlier grouping df1=df.groupby('Name'))

>>> df1.size()
Name
Ashravy 3
Mishti 3
Raman 3
Zu haire 1
Zuhaire 2
dtype: int64

>>> df1.count()

UT Maths Science S.St Hindi Eng


Name
Ashravy 3 3 3 3 3 3
Mishti 3 3 3 3 3 3
Raman 3 3 3 3 3 3
Zu haire 1 1 1 1 1 1
Zuhaire 2 2 2 2 2 2
df.groupby(['UT']).aggregate('mean')   # in pandas >= 2, select the numeric columns first so the Name column is skipped
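A minimal runnable version of the same pattern (hypothetical small frame), selecting the numeric column before aggregating so that non-numeric columns do not get in the way:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Raman', 'Mishti', 'Raman', 'Mishti'],
                   'UT': [1, 1, 2, 2],
                   'Maths': [22, 15, 21, 18]})

# group by UT, then average only the numeric Maths column
avg = df.groupby('UT')[['Maths']].aggregate('mean')
```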

Altering the Index :- We use indexing to access the elements


of a DataFrame. It is used for fast retrieval of data. By default,
a numeric index starting from 0 is created as a row index.

When we slice the data, we get the original index which is not
continuous.

We create a new continuous index alongside this using the


reset_index() function.

>>> a=df[df.UT == 1]
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
3 Zuhaire 1 20 17 22 24 19
6 Ashravy 1 23 19 20 15 22
9 Mishti 1 15 22 25 22 22

>>> a.reset_index(inplace=True)
>>> a
index Name UT Maths Science S.St Hindi Eng
0 0 Raman 1 22 21 18 20 21
1 3 Zuhaire 1 20 17 22 24 19
2 6 Ashravy 1 23 19 20 15 22
3 9 Mishti 1 15 22 25 22 22

A new continuous index is created while the original one is also


intact. We can drop the original index by using the drop
function.

a.drop(columns=['index'],inplace=True)

>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
1 Zuhaire 1 20 17 22 24 19
2 Ashravy 1 23 19 20 15 22
3 Mishti 1 15 22 25 22 22

We can change the index to some other column of the data.

a.set_index('Name',inplace=True)
>>> a
UT Maths Science S.St Hindi Eng
Name
Raman 1 22 21 18 20 21
Zuhaire 1 20 17 22 24 19
Ashravy 1 23 19 20 15 22
Mishti 1 15 22 25 22 22

We can revert to the previous index.


a.reset_index('Name', inplace = True)
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22 21 18 20 21
1 Zuhaire 1 20 17 22 24 19
2 Ashravy 1 23 19 20 15 22
3 Mishti 1 15 22 25 22 22

Reshaping Data :
For reshaping data, two basic functions are available in Pandas:
pivot() and pivot_table().

(A) Pivot: The pivot() function is used to reshape the data and
create a new DataFrame from the original one.
Pivoting is a summary technique that works on tabular
data.

import pandas as pd
d1={'Tutor':['Tahira','Gurjot','Anusha','Jacob','Venkat'],
    'Class':[28,36,41,32,40],
    'Country':['USA','UK','Japan','USA','Brazil']}
df=pd.DataFrame(d1)
>>> df
Tutor Class Country
0 Tahira 28 USA
1 Gurjot 36 UK
2 Anusha 41 Japan
3 Jacob 32 USA
4 Venkat 40 Brazil
>>> df.pivot(index='Country', columns='Tutor',values='Class')

Tutor Anusha Gurjot Jacob Tahira Venkat


Country
Brazil NaN NaN NaN NaN 40.0
Japan 41.0 NaN NaN NaN NaN
UK NaN 36.0 NaN NaN NaN
USA NaN NaN 32.0 28.0 NaN

The following arguments work with the pivot() function:

index: stores the column name about which the
information is to be summarised (will become the rows
in the result)
columns: stores the column name whose distinct values
will each become a column in the summary
information (will become the columns in the result)
values: stores the column name whose data will be
displayed for each index/column combination
(will become the cells in the result)

df.pivot(index='Country', columns='Tutor', values='Class').fillna(0)
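Putting the pieces together as one runnable sketch (the Tutor data from above), with fillna(0) replacing the NaN cells of the pivoted result:

```python
import pandas as pd

d1 = {'Tutor': ['Tahira', 'Gurjot', 'Anusha', 'Jacob', 'Venkat'],
      'Class': [28, 36, 41, 32, 40],
      'Country': ['USA', 'UK', 'Japan', 'USA', 'Brazil']}
df = pd.DataFrame(d1)

# rows labelled by Country, one column per Tutor, Class values as cells
pv = df.pivot(index='Country', columns='Tutor', values='Class').fillna(0)
```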

Using the pivot_table() function:- If there are multiple entries of a
column's value for the same index (row) value, pivot() raises an
error. Hence, before we use pivot(), we should ensure that the
data does not have rows with duplicate values for the specified
columns.

ontutD={'Tutor':['Tahira','Gurjot','Anusha','Jacob','Venkat',
                 'Tahira','Gurjot','Anusha','Jacob','Venkat',
                 'Tahira','Gurjot','Anusha','Jacob','Venkat',
                 'Tahira','Gurjot','Anusha','Jacob','Venkat'],
        'Classes':[28,36,41,32,40,36,40,36,40,46,24,30,44,40,32,36,32,36,41,38],
        'Quarter':[1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
        'Country':['USA','UK','Japan','USA','Brazil','USA','USA','Japan','Brazil','USA',
                   'Brazil','USA','UK','Brazil','USA','Japan','Japan','Brazil','UK','USA']}

df1=pd.DataFrame(ontutD)
>>> df1
Tutor Classes Quarter Country
0 Tahira 28 1 USA
1 Gurjot 36 1 UK
2 Anusha 41 1 Japan
3 Jacob 32 1 USA
4 Venkat 40 1 Brazil
5 Tahira 36 2 USA
6 Gurjot 40 2 USA
7 Anusha 36 2 Japan
8 Jacob 40 2 Brazil
9 Venkat 46 2 USA
10 Tahira 24 3 Brazil
11 Gurjot 30 3 USA
12 Anusha 44 3 UK
13 Jacob 40 3 Brazil
14 Venkat 32 3 USA
15 Tahira 36 4 Japan
16 Gurjot 32 4 Japan
17 Anusha 36 4 Brazil
18 Jacob 41 4 UK
19 Venkat 38 4 USA

For data having multiple values for the same row and column
combination we can use another pivoting function, the
pivot_table() function.

The pivot_table() function, like pivot(), also produces a pivoted
table, but it differs from pivot() in two ways:
(i) It does not raise errors for multiple entries of a row,
column combination.
(ii) It aggregates the multiple entries present for a row-
column combination. We need to specify what type of
aggregation we want (sum, mean, etc.).
Parameters:
index contains the column name for rows.
columns contains the column name for columns.
values contains the column names for the data of the
pivoted table.
aggfunc contains the function as per which the data is to be
aggregated. By default the mean is computed.

>>> df1.pivot_table(index='Country', columns='Tutor',
values='Classes', aggfunc='mean')

(Passing a list of functions, e.g. aggfunc=['sum','max','mean'],
produces one sub-table per function.)

Tutor Anusha Gurjot Jacob Tahira Venkat


Country
Brazil 36.0 NaN 40.0 24.0 40.000000
Japan 38.5 32.0 NaN 36.0 NaN
UK 44.0 36.0 41.0 NaN NaN
USA NaN 35.0 32.0 32.0 38.666667
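The aggregation behaviour is easiest to see on a tiny frame with a deliberate duplicate row/column combination (hypothetical values):

```python
import pandas as pd

df = pd.DataFrame({'Tutor': ['Tahira', 'Tahira', 'Gurjot'],
                   'Country': ['USA', 'USA', 'UK'],
                   'Classes': [28, 36, 40]})

# (USA, Tahira) occurs twice; pivot() would raise, pivot_table() averages
pt = df.pivot_table(index='Country', columns='Tutor',
                    values='Classes', aggfunc='mean')
```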

Filling NaN values :- Where NaN values appear in the pivoted
result, we can fill them in using the fillna(0) method.

df1.pivot_table(index='Country', columns='Tutor', values='Classes').fillna(0)

Handling missing values : - If a value corresponding to a


column is not present, it is considered to be a missing value. A
missing value is denoted by NaN.

Missing values create a lot of problems during data analysis
and have to be handled properly. The two most common
ways of handling missing values are:
i) drop the object having missing values,
ii) fill in the missing value

df2=pd.DataFrame([[51,12,32],[41,np.nan,55,62],[17,88,None]])

Checking Missing Values :- isnull() checks whether any
value is missing in the DataFrame. This function checks
every value and returns True where the value is missing,
otherwise it returns False.

>>> df2.isnull()
0 1 2 3
0 False False False True
1 False True False False
2 False False True True

print(df['Tutor'].isnull())
print(df['Country'].isnull())

To check whether a column (attribute) has a missing value
anywhere in the dataset, the any() function is used. It returns
True in case of a missing value, else it returns False.

print(df.isnull().any())
Tutor False
Class False
Country False
dtype: bool

(These calls use the Tutor DataFrame df from the Pivot section;
on the numeric df2 the column labels are 0, 1, 2, 3.)

The any() function can also be used for a particular attribute:

print(df2[1].isnull().any())

Checking Missing Values in a Larger Dataset

marksUT={'Name':['Raman','Raman','Raman','Raman',
                 'Zuhaire','Zuhaire','Zuhaire','Zuhaire',
                 'Ashravy','Ashravy','Ashravy','Ashravy',
                 'Mishti','Mishti','Mishti','Mishti'],
         'UT':[1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4],
         'Maths':[22,21,14,np.nan,20,23,22,19,23,24,12,15,15,18,17,14],
         'Science':[21,20,19,np.nan,17,15,18,20,19,22,25,20,22,21,18,20],
         'S.St':[18,17,15,19,22,21,19,17,20,24,19,20,25,25,20,19],
         'Hindi':[20,22,24,18,24,25,23,21,15,17,21,20,22,24,25,20],
         'Eng':[21,24,23,np.nan,19,15,13,16,22,21,23,17,22,23,20,18]}

df = pd.DataFrame(marksUT)

print(df.isnull())
print(df['Science'].isnull())
print(df.isnull().any())

To find the number of NaN values corresponding to each


attribute.

print(df.isnull().sum())

To find the total number of NaN in the whole dataset, we can


use-
print(df.isnull().sum().sum())
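As a runnable check of those two counting idioms, on a tiny frame with two deliberately missing values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Maths': [22, np.nan, 14], 'Eng': [21, 24, np.nan]})

per_col = df.isnull().sum()            # NaN count per column
total = int(df.isnull().sum().sum())   # NaN count in the whole frame
```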

Dropping Missing Values :- Dropping will remove the entire
row (object) having the missing value(s). We can use the
dropna() function to drop rows containing NaN values.
>>> a=df[df.Name=='Raman']
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22.0 21.0 18 20 21.0
1 Raman 2 21.0 20.0 17 22 24.0
2 Raman 3 14.0 19.0 15 24 23.0
3 Raman 4 NaN NaN 19 18 NaN

a.dropna(inplace=True, how='any')
>>> a
Name UT Maths Science S.St Hindi Eng
0 Raman 1 22.0 21.0 18 20 21.0
1 Raman 2 21.0 20.0 17 22 24.0
2 Raman 3 14.0 19.0 15 24 23.0
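dropna() also accepts how='all', which drops a row only when every value in it is missing. A small sketch contrasting how='any' and how='all' on hypothetical data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, np.nan], 'B': [4, 5, np.nan]})

dropped_any = df.dropna(how='any')  # drop rows with at least one NaN
dropped_all = df.dropna(how='all')  # drop only rows that are entirely NaN
```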

Joining, Merging and Concatenation of DataFrames


(A) Joining :- We can use the pandas DataFrame.append()
method to merge two DataFrames. It appends rows of the
second DataFrame at the end of the first DataFrame. Columns
not present in the first DataFrame are added as new columns.

df=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'],
                index=['R1', 'R2', 'R3'])
>>> df
C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN

>>> df1=pd.DataFrame([[10, 20], [30], [40, 50]],
                     columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])
>>> df1
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0
dfnew=df.append(df1)
>>> dfnew
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0

To make the column labels appear in sorted order we can set the
parameter sort=True; with sort=False the column labels keep
their original (unsorted) order.
dFrame2 = df1.append(df, sort=True)

The parameter ignore_index of append()method may be set to


True, when we do not want to use row index labels. By default,
ignore_index = False.

dFrame1 = df.append(df1, ignore_index=True)


>>> dFrame1
C1 C2 C3 C5
0 1.0 2.0 3.0 NaN
1 4.0 5.0 NaN NaN
2 6.0 NaN NaN NaN
3 NaN 10.0 NaN 20.0
4 NaN 30.0 NaN NaN
5 NaN 40.0 NaN 50.0
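Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() produces the same result. A sketch of the example above rewritten with concat (using explicit None for the cells that ragged rows would pad with NaN):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, None], [6, None, None]],
                  columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
df1 = pd.DataFrame([[10, 20], [30, None], [40, 50]],
                   columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# pd.concat stacks the rows like append() did; missing columns become NaN
dfnew = pd.concat([df, df1])
```

As with append(), pd.concat() also takes ignore_index=True to renumber the rows.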

Importing a CSV file to a DataFrame : We can create a


DataFrame by importing data from CSV files where values are
separated by commas.

Suppose some marks data is stored in a CSV file, such as the
C:\Users\Ashutosh\Desktop\data.csv read below.

We can load the data from the data.csv file into a DataFrame,
say marks using Pandas read_csv() function.

marks = pd.read_csv(r"C:\Users\Ashutosh\Desktop\data.csv",
sep =",", header=0)

• The first parameter to the read_csv() is the name of the


comma separated data file along with its path.

• The parameter sep specifies whether the values are
separated by a comma, semicolon, tab, or any other character.
The default value for sep is a comma (',').
• The parameter header specifies the number of the row whose
values are to be used as the column names. It also marks the
start of the data to be fetched. Header=0 implies that column
names are inferred from the first line of the file. By default,
header=0.

The names parameter is used to specify our own labels for the
columns of the DataFrame:

marks = pd.read_csv(r"C:\Users\Ashutosh\Desktop\data.csv",
sep =",", names=['RNo','StudentName', 'Sub1','Sub2'])
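Since the actual CSV file is not included in these notes, here is a self-contained sketch that reads the same kind of data from an in-memory string via io.StringIO (hypothetical rows, standing in for the file path above):

```python
import pandas as pd
from io import StringIO

# StringIO behaves like an open file, so read_csv accepts it in place of a path
csv_text = StringIO("RNo,StudentName,Sub1,Sub2\n"
                    "1,Raman,22,21\n"
                    "2,Mishti,15,22\n")
marks = pd.read_csv(csv_text, sep=",", header=0)
```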

Exporting a DataFrame to a CSV file :- We can use the
to_csv() function to save a DataFrame to a text or CSV file.

df1.to_csv(r"C:\Users\Ashutosh\Desktop\data12.csv", sep=",")

By default to_csv() writes the column labels as the first row of
the file; pass index=False if the row index should not be written.
