1723524625270_Data_Frame_Notes3


Data Frames: (NCERT – Chapter 2)

1. Creation of DataFrame from A) dictionary of series, B) list of dictionaries, C) text/CSV files,


2. Operations on DataFrame - A) display B) i. Selection ii. Slicing iii. Iteration
3. Operations on rows and columns: A) add ( insert /append) , B) select, C) delete (drop
column and row), D) rename, E) Head and Tail functions,
4. indexing using A) labels, B) Boolean indexing.

Data Visualization (NCERT – Chapter 4)


Data Visualization : Purpose of plotting, drawing and saving of plots using Matplotlib
(line plot, bar graph, histogram).
Customizing plots: adding label, title, and legend in plots.

Notes 3 DataFrame Operations


2. Operations on DataFrame - A) Display B) i. Selection

A) Display – 1. with print() (in both Jupyter and IDLE)


2. use the dataframe object name independently (only in Jupyter)

print(dataframe_object)

Ex. of A) 1 
import pandas as pd
MyDoS = {
'Term1': pd.Series([90, 100, 90, 99]),
'Term2': pd.Series([80, 90, 85, 99])
}
Mydf=pd.DataFrame(MyDoS, index=[1,2,3,4])
print(Mydf)
O/P 
Jupyter IDLE

Ex. of A) 2 
import pandas as pd
MyDoS = { 'Term1': pd.Series([90, 100, 90, 99]),
'Term2': pd.Series([80, 90, 85, 99])
}
Mydf=pd.DataFrame(MyDoS, index=[1,2,3,4])
Mydf

O/P  Jupyter

print(Mydf)
DataFrame Created from LoD
Ex. of A) 1 
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
O/P 

Jupyter IDLE

Ex. Of A)2 
DataFrame Created from a csv file (WorldCups.csv)
    Year  Country    Winner     Runners-Up      Third       Fourth          GoalsScored  QualifiedTeams  MatchesPlayed  Attendance
 1  1930  Uruguay    Uruguay    Argentina       USA         Yugoslavia      70           13              18             590.549
 2  1934  Italy      Italy      Czechoslovakia  Germany     Austria         70           16              17             363.000
 3  1938  France     Italy      Hungary         Brazil      Sweden          84           15              18             375.700
 4  1950  Brazil     Uruguay    Brazil          Sweden      Spain           88           13              22             1.045.246
 5  1954  Switzer    Germany    Hungary         Austria     Uruguay         140          16              26             768.607
 6  1958  Sweden     Brazil     Sweden          France      Germany FR      126          16              35             819.810
 7  1962  Chile      Brazil     Czechoslovakia  Chile       Yugoslavia      89           16              32             893.172
 8  1966  England    England    Germany FR      Portugal    Soviet Union    89           16              32             1.563.135
 9  1970  Mexico     Brazil     Italy           Germany     Uruguay         95           16              32             1.603.975
10  1974  Germany    Germany    Netherlands     Poland      Brazil          97           16              38             1.865.753
11  1978  Argentina  Argentina  Netherlands     Brazil      Italy           102          16              38             1.545.791
12  1982  Spain      Italy      Germany FR      Poland      France          146          24              52             2.109.723
13  1986  Mexico     Argentina  Germany FR      France      Belgium         132          24              52             2.394.031
14  1990  Italy      Germany    Argentina       Italy       England         115          24              52             2.516.215
15  1994  USA        Brazil     Italy           Sweden      Bulgaria        141          24              52             3.587.538
16  1998  France     France     Brazil          Croatia     Netherlands     171          32              64             2.785.100
17  2002  Korea      Brazil     Germany         Turkey      Korea Republic  161          32              64             2.705.197
18  2006  Germany    Italy      France          Germany     Portugal        147          32              64             3.359.439
19  2010  S Africa   Spain      Netherlands     Germany     Uruguay         145          32              64             3.178.856
20  2014  Brazil     Germany    Argentina       Nethers     Brazil          171          32              64             3.386.810

Ex. of A) 1 
import pandas as pd
Mydf = pd.read_csv("WorldCups.csv")
print(Mydf)
Ex. Of A)2 
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

Loan_ID Gender Married Dependents Education Self_Employed \


0 LP001002 Male No 0 Graduate No
1 LP001003 Male Yes 1 Graduate No
2 LP001005 Male Yes 0 Graduate Yes
3 LP001006 Male Yes 0 Not Graduate No
4 LP001008 Male No 0 Graduate No
.. ... ... ... ... ... ...
609 LP002978 Female No 0 Graduate No
610 LP002979 Male Yes 3+ Graduate No
611 LP002983 Male Yes 1 Graduate No
612 LP002984 Male Yes 2 Graduate No
613 LP002990 Female No 0 Graduate Yes

ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \


0 5849 0.0 NaN 360.0
1 4583 1508.0 128.0 360.0
2 3000 0.0 66.0 360.0
3 2583 2358.0 120.0 360.0
4 6000 0.0 141.0 360.0
.. ... ... ... ...
609 2900 0.0 71.0 360.0
610 4106 0.0 40.0 180.0
611 8072 240.0 253.0 360.0
612 7583 0.0 187.0 360.0
613 4583 0.0 133.0 360.0

Credit_History Property_Area Loan_Status


0 1.0 Urban Y
1 1.0 Rural N
2 1.0 Urban Y
3 1.0 Urban Y
4 1.0 Urban Y
.. ... ... ...
609 1.0 Rural Y
610 1.0 Rural Y
611 1.0 Urban Y
612 1.0 Urban Y
613 0.0 Semiurban N

Exercises – Write the executable statements to select an entire dataframe named DF12
?
??

2. B)i. Selection – of an entire dataframe


To select the entire dataframe  DataFrame_object name

For DataFrame created from DoS –

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)

Mydf
For DataFrame created from LoD –

import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])

To select the entire dataframe  DataFrame_object name


Mydf

For DataFrame created from CSV File –

** online csv file -


import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

To select the entire dataframe  DataFrame_object name


Mydf

2. B ii. Extraction / Slicing - is an operation performed over the dataframe to show/extract values from the dataframe either row wise or column wise, or a single data –

For DataFrame created from DoS –


import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
a. To Select/Extract a particular column → print( dataframe_object[ 'Column name' ] )

Ex. 1 → print(Mydf[ 'Term1%' ] )


Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object

Ex. 2  print(Mydf[ 'Term2%' ] )


Amit Shekhar 80
Aryaman Bhagat 90
Bhavyam Kamal 85
Pulkit Sinha ML
Name: Term2%, dtype: object

b. To Select/Extract one data from the entire dataframe →

print( dataframe_object[ 'Column name' ] [index address])

Index addresses (row positions) of Mydf:
0th Row = Amit Shekhar 90 80
1st Row = Aryaman Bhagat NA 90
2nd Row = Bhavyam Kamal 90 85
3rd Row = Pulkit Sinha 99 ML

Aryaman's Term2% ??? → print( Mydf[ 'Term2%' ] [ 1 ] ) → 90

Ex. 1 → print(Mydf[ 'Term1%' ] [0] ) Data will be extracted from the 0th Row = Amit Shekhar 90 80, but only of Term1%
90

Ex. 2 → print(Mydf[ 'Term2%' ] [3] ) Data will be extracted from the 3rd Row = Pulkit Sinha 99 ML, but only of Term2%
ML

Ex. 3 → print(Mydf[ 'Term2%' ] [5] ) There is no such 5th Row

IndexError: index 5 is out of bounds for axis 0 with size 4
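The single-value extraction above can also be written with explicit accessors. This is a minimal sketch, assuming the same four-row Mydf; recent pandas releases warn about (or no longer accept) an integer position inside plain [ ] on a Series whose index holds name labels, so .iloc (by position) and .loc (by label) are the safer spellings. The variable names by_position and by_label are only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# By index address (position): 1st row of the column 'Term2%'
by_position = Mydf['Term2%'].iloc[1]

# By index label: the same cell, addressed with the row label
by_label = Mydf.loc['Aryaman Bhagat', 'Term2%']

print(by_position, by_label)
```

Both statements read the same cell; .iloc counts rows from 0, while .loc needs the label exactly as it appears in Mydf.index.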

c. To Select/Extract one/more row(s) → print( dataframe_object[ 'Column name' ] [range of the row])
Ex. 1 →
print(Mydf[ 'Term1%' ] [ 0:1 ] ) → [ 0 : 1 ] → 0th row to (1-1)th row = 0th to 0th (row)
0th row = Amit Shekhar 90 80
From the 0th row only the [ Term1%] value is extracted. The Index Label (Amit Shekhar) will be displayed by default.

O/P 
Amit Shekhar 90
Name: Term1%, dtype: object

Ex. 2 
print(Mydf[ 'Term1%' ] [ 1:2 ] )  [ 1 : 2 ]  1st row to (2-1) row = 1st to 1st (row)
1st row = Aryaman Bhagat NA 90
From 1st row the Extraction of [ Term1%] value Aryaman Bhagat NA 90 Index Label (Aryaman
Bhagat) will be displayed by default.

Aryaman Bhagat NA
Name: Term1%, dtype: object

A = Mydf[ 'Term1%' ] [ 1:2 ]

A = Aryaman Bhagat NA
A.tail( ) = Aryaman Bhagat NA

Ex. 3 
print(Mydf[ 'Term1%' ] [ 2:2 ] ) → [ 2 : 2 ] → 2nd row to (2-1) row = 2nd row to 1st row (invalid direction of movement)
** A slice moves forward from the start row (2nd to 3rd and so on) and stops before the stop row. It cannot move in the opposite direction, so no row is selected.
So the output is an empty Series( ) !!!
Series([], Name: Term1%, dtype: object)

Why a Series??? My data_object is a dataframe, so shouldn't the output after extraction be a dataframe too!!!
Simply recall the structure of a Series and compare it with a dataframe.
1-D is Series and 2-D is dataframe.
A single column is 1-D, so whenever the output has a 1-D structure it will be of type Series( ) and you can perform all Series( ) operations on it and use its attributes.

And whenever the output has a 2-D structure it will be of type DataFrame( ) and you can perform all DataFrame( ) operations on it and use its attributes.

Ex. 4 
print(Mydf[ 'Term1%' ] [ 1:3 ] ) = 1st row to (3-1) row = 1st to 2nd row
1st row = Aryaman Bhagat NA 90
2nd row = Bhavyam Kamal 90 85

From 1st row to 2nd row the Extraction of [ Term1%] value


Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Index Label will be displayed by default.

O/P 
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object
Ex. 5  print(Mydf[ 'Term1%' ] [ : ] ) = when ‘n’ and ‘m’ values are not specified it means
slicing/ extraction of all the rows (n = 0 default value) (m = total number of rows = 4 )
So, all the rows will be extracted and value of Term1% for each row will be displayed .
Index Label will be displayed by default.
O/P 
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
Ex. 6  print(Mydf[ 'Term1%' ] [ 0 : ] )  ‘n’ = starting index address from where extraction has
to begin = 0 = 0th row
‘m’ is not specified = last row
[ 0 : ] = Extraction from 0th row till last row
So, all the rows will be extracted and value of Term1% for each row will be displayed .
Index Label will be displayed by default.

O/P 
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object

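Negative index addresses (counted from the last row):
-4th Row = Amit Shekhar
-3rd Row = Aryaman Bhagat
-2nd Row = Bhavyam Kamal
-1st Row = Pulkit Sinha

Ex. 7 → print(Mydf['Term1%'] [ -4 : ] ) = -4th row to the last row (when 'm' is not specified, extraction continues till the last row) = -4th row to -1st row, the same as [ -4 : 4 ] (m = total no. of rows)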
So, all the rows will be extracted and value of Term1% for each row will be displayed .
Index Label will be displayed by default.

O/P 
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object

Ex. 8  print(Mydf['Term1%'] [ -4: -1 ] )  -4th row to (-1 – 1) row = -4th row to -2nd row
-4th row = Amit Shekhar 90 80
-3rd row = Aryaman Bhagat NA 90
-2nd row = Bhavyam Kamal 90 85
From -4th row to -2nd row the Extraction of [ Term1%] value
Amit Shekhar 90 80
Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Index Label will be displayed by default.

O/P 
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object

Ex. 9 → print(Mydf[ 'Term1%' ] [ :-1 ] ) → when 'n' is not specified its value is 0, so 0th row to (-1 – 1) row = 0th row to -2nd row
-4th row = Amit Shekhar 90 80
-3rd row = Aryaman Bhagat NA 90
-2nd row = Bhavyam Kamal 90 85

From -4th row to -2nd row the Extraction of [ Term1%] value


Amit Shekhar 90 80
Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Index Label will be displayed by default.

O/P 
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object
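The slicing examples Ex. 1 to Ex. 9 can be checked in one go. The sketch below assumes the same Mydf and uses .iloc, which always slices by index address (position); the variable names are only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)
term1 = Mydf['Term1%']

first = term1.iloc[0:1]            # 0th row only (stop row 1 is excluded)
empty = term1.iloc[2:2]            # start == stop, so nothing is selected
middle = term1.iloc[1:3]           # 1st and 2nd rows
everything = term1.iloc[:]         # n and m both omitted: all rows
all_but_last = term1.iloc[:-1]     # 0th row to -2nd row
same_rows = term1.iloc[-4:-1]      # -4th row to -2nd row, the same three rows

print(middle)
print(type(middle).__name__)       # a single column slice is a Series
```

Every result is still a Series, because only one column is involved.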

For DataFrame created from LoD –


import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
Mydf

a. To Select/Extract a particular column  print( dataframe_object[ ‘Column name’ ] )

print( Mydf[ 'Term1%' ] )

Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object

b. To Select/Extract one particular data → print( dataframe_object[ 'Column name' ] [index address])
print(Mydf['Term2%'][2]) →
85

c. To Select/Extract one/more row → print( dataframe_object[ 'Column name' ] [range of the row])

print(Mydf['Term1%'][0:4]) 

Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object

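** With this syntax, columns cannot be given as a range (slice) or as comma-separated names inside a single pair of brackets. (A list of column names inside a second pair of brackets, Mydf[ [ 'Term1%', 'Term2%' ] ], does work.)

print(Mydf[ 'Term1%' : 'Term2%' ] [ 0 : 4 ] ) → a slice inside [ ] is applied to the rows, not to the columns, so it does not select the two columns. On a dataframe with the default index it raises:

TypeError: cannot do slice indexing on RangeIndex with these indexers [Term1%] of type str

print(Mydf['Term1%', 'Term2%' ] [ 0 : 4 ] ) → the two names are read as one single column label
KeyError: ('Term1%', 'Term2%')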
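More than one column can be extracted by passing a list of column names, that is, with double brackets. A minimal sketch, assuming the same Mydf; the names both and tuple_error are only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# Inner brackets: a list of column names. Then slice the rows by position.
both = Mydf[['Term1%', 'Term2%']].iloc[0:2]
print(both)

# Comma-separated names in ONE pair of brackets form a tuple key
try:
    Mydf['Term1%', 'Term2%']
    tuple_error = None
except KeyError as err:
    tuple_error = err
print(repr(tuple_error))
```

The result of the list form is 2-D, so it is a DataFrame and not a Series.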

For DataFrame created from CSV File –

** online csv file -


import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

a. To Select/Extract a particular column  print( dataframe_object[ ‘Column name’ ] )

print(Mydf['Education']) 
0 Graduate
1 Graduate
2 Graduate
3 Not Graduate
4 Graduate
...
609 Graduate
610 Graduate
611 Graduate
612 Graduate
613 Graduate
Name: Education, Length: 614, dtype: object

b. To Select/Extract one data → print( dataframe_object[ 'Column name' ] [index address])
print(Mydf['LoanAmount'][3]) →
120.0
c. To Select/Extract one/more row → print( dataframe_object[ 'Column name' ] [range of the row])

print(Mydf['Loan_ID'][0:10]) 

0 LP001002
1 LP001003
2 LP001005
3 LP001006
4 LP001008
5 LP001011
6 LP001013
7 LP001014
8 LP001018
9 LP001020
Name: Loan_ID, dtype: object

** Attributes of DataFrame_object
1. index – will return the index labels (row labels) of the dataframe as a list-like Index object.
print(Mydf.index) →
Index(['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'], dtype='object')

2. columns – will return the column labels of the dataframe as a list-like Index object.
print(Mydf.columns) →
Index(['Term1%', 'Term2%'], dtype='object')

3. values – will return the values of the dataframe as a 2-D array (like a list of lists, one inner list per row).
print(Mydf.values) →
[[90 80]
 ['NA' 90]
 [90 85]
 [99 'ML']]
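If plain Python lists are needed from these attributes, they can be converted explicitly. A small sketch, assuming the same Mydf; the names row_labels, column_labels and rows are only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

row_labels = list(Mydf.index)        # Index object -> list of row labels
column_labels = list(Mydf.columns)   # Index object -> list of column labels
rows = Mydf.values.tolist()          # 2-D array -> list of lists (one per row)

print(row_labels)
print(column_labels)
print(rows)
```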

2. B iii. Iteration – is an operation performed over the dataframe to move across the data of a dataframe row wise, here for a single column.

1. using for loop with index attribute of dataframe_object – the index attribute of the dataframe object is used to iterate (move across) through the rows of the dataframe, as the index attribute returns the index labels of the rows.

Syntax –
for counter_variable in dataframe_object.index:
    executable statement

Ex. 1 
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99, 98, 100], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha',
'Kalash', 'Praganya']),
'Term2%': pd.Series([80, 90, 85, 'ML',99,99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha',
'Kalash', 'Praganya'])
}
Mydf=pd.DataFrame(MyDoS)
print("\n Iterating over all the row values :\n")

Mydf[ 'Term1%'] [0] → ??
Mydf[ 'Term1%'] [1] → ??
.
.
Mydf[ 'Term1%'] [5] → ??

Instead of writing one statement per row, use a loop:
for i in range(0,6):
    print(Mydf[ 'Term1%'] [i])

range(start_value, stop_value, jump_value)

What if the total number of rows is unknown!!! Range unknown!!!
Use the attribute Mydf.index = will return the index labels of the dataframe.
Each index label also has a default index address (position):

Mydf.index = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha', 'Kalash', 'Praganya']
index address = 0, 1, 2, 3, 4, 5

for i in Mydf.index:
    print( Mydf [ 'Term1%'] [i])

In each pass of the loop, i takes the next index label from Mydf.index:

Pass   i (from Mydf.index)   Row                     print( Mydf [ 'Term1%'] [i])
1      Amit Shekhar          Amit Shekhar 90 80      90
2      Aryaman Bhagat        Aryaman Bhagat NA 90    NA
.
.
6      Praganya              Praganya 100 99         100
After the 6th pass there is no label left in Mydf.index, so the loop stops.

O/P →
Iterating over all the row values :
90
NA
.
.
100
** Iteration extracts only the data. (Not with the Index label)

Ex. 2 → Iteration over the rows of a particular column of a DataFrame created from an LoD data_object

import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
print("\n Iterating over row values using the index attribute :\n")
for i in Mydf.index:
    print( Mydf ['Term2%'] [i])

O/P →
Iterating over row values using the index attribute :
80
90
85
ML
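When the index label is also needed along with the value, the Series method items( ) yields both in each pass. A minimal sketch, assuming the same LoD dataframe; the list name pairs is only for illustration and this is an alternative to the index-attribute loop, not a replacement prescribed by the notes.

```python
import pandas as pd

LoD = [{'Term1%': 90, 'Term2%': 80},
       {'Term1%': 'NA', 'Term2%': 90},
       {'Term1%': 90, 'Term2%': 85},
       {'Term1%': 99, 'Term2%': 'ML'}]
Mydf = pd.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat',
                                'Bhavyam Kamal', 'Pulkit Sinha'])

pairs = []
# items() gives (index label, value) for every row of the column
for label, value in Mydf['Term2%'].items():
    pairs.append((label, value))
    print(label, value)
```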

Ex. 3 → Iteration over the rows of a particular column of a DataFrame created from a CSV File (url)
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Iterating over row values using the index attribute :\n")

for i in Mydf.index:    # i = 0 to 613
    print( Mydf ['Loan_ID'] [i])
O/P 
Iterating over row values using the index attribute :
LP001002
LP001003
LP001005
LP001006
LP001008
.
.
.

LP001865
LP001868
LP002990
Ex. 4 
import pandas as pd
sample = { 'Employee' : ['Amitej', 'Prakhar', 'Naman', 'Amitej', 'Prakhar'],
'Payable Amount':[10000, 12000, 14000, 20000, 15000]
}
mydf = pd.DataFrame(sample)
mydf
print("\n Iterating over rows using the index attribute :\n")
for i in mydf.index:
    print( mydf [ 'Employee'] [i])

** Extraction of a single Character

i. from a single row & a single column →

Syntax → dataframe_object [ 'Column name' ] [ row index address ] [ character index address ]
** Extraction only for characters (string values), not for integer / float

Character index addresses inside each value (0 = 1st character, 1 = 2nd character):

                 Term1%   Term2%
char index       0 1      0 1
Amit Shekhar     90       80
Aryaman Bhagat   NA       90
Bhavyam Kamal    90       85
Pulkit Sinha     99       ML
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf['Term2%'] [ 0 ] [ 0 ] →

Write the code to extract the 1st character of the 2nd column of the 2nd row of the given dataframe –
What is it ???
2nd Column's name = 'Term2%'
2nd row's index address = 1
1st character index address = 0
DataFrame object = Mydf

Ex. → Mydf [ 'Term2%' ] [ 1 ] [ 0 ]

The value at the 2nd row of Term2% is the integer 90 (Aryaman Bhagat NA 90).

O/P → TypeError: 'int' object is not subscriptable

** Extraction only for characters not for integer / float

Ex. → Mydf [ 'Term1%' ] [ 1 ] [ 0 ]

The value at the 2nd row of Term1% is the string 'NA', and its 1st character is 'N'.

O/P → N

** Extraction of a single Character

ii. from multiple rows (from a single column)
Syntax → for counter_variable in dataframe_object.index:
             executable statement
Ex. → for i in Mydf.index:
          print( Mydf [ 'Term1%'] [ i ] [ 0 ] )

This will return the First Character (Character at 0th location) of each row (iterating through the counter variable i) of the Column 'Term1%',
provided the Term1% column holds only Characters (string values). If a row holds an integer or float, the loop stops with a TypeError.
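When a column mixes numbers and text, as Term1% does, one way to avoid the TypeError is to convert each value to a string before taking its first character. This is only a sketch of that idea, assuming the same Mydf; the list name first_chars is for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

first_chars = []
for i in Mydf.index:
    value = Mydf.loc[i, 'Term1%']     # value of the row with label i
    first_chars.append(str(value)[0])  # str() makes integers subscriptable too

print(first_chars)
```

Note that the first "character" of the integer 90 is then the digit '9'.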

Ex. 1 
Ex. 2

Ex. 3
Ex. 4 
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Extracting a Character from specified index address of the value stored in each row :\n")
for i in Mydf.index:
    print( Mydf ['Gender'] [ i ][0])

3. Operations on rows and columns of a DataFrame:
3. A) add ( insert /append) -
Addition / Insertion of a row/column in a Dataframe

Add a new Column → DataFrame_Object[ 'newcolumn_name' ] = [column_value1 , column_value2, .. ]
Mydf['Term_Total'] = value / [ values ] / Operations

Add a column to a DataFrame created from DoS –


import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf

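Mydf['Term_Total']=[ 170, 'NA', 175, 99]

Mydf

OR, to give every row the same value:
Mydf['Term_Total']=170

OR
Mydf['Term_Total']=Mydf['Term1%']+ Mydf['Term2%']
Mydf

TypeError: must be str, not int
** The addition fails because the columns hold strings ('NA', 'ML') along with numbers, and a string cannot be added to an integer. The exact wording of the message depends on the Python version.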
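One possible way to still compute a total is to convert the text entries to missing values first. The sketch below uses pandas.to_numeric with errors='coerce', which turns anything that is not a number into NaN; treating 'NA' and 'ML' as missing is an assumption made here for illustration, not something the notes prescribe.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# Non-numeric entries ('NA', 'ML') become NaN instead of raising an error
term1 = pd.to_numeric(Mydf['Term1%'], errors='coerce')
term2 = pd.to_numeric(Mydf['Term2%'], errors='coerce')

# NaN + number = NaN, so rows with a missing term get a missing total
Mydf['Term_Total'] = term1 + term2
print(Mydf)
```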

Add a column to a DataFrame created from LoD –


import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
Mydf

Mydf['Term_Total']=[170, 'NA', 175, 99]


Mydf

OR
Mydf['Term_Total']=Mydf['Term1%']+ Mydf['Term2%']
Mydf
TypeError: must be str, not int

Add a column to a DataFrame created from CSV File –


import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

Mydf['StartLoanCredit']=20000
Mydf['Loan_Rebate']=100
Mydf

Mydf['Loan_Discount']=Mydf['LoanAmount']*20/100
Mydf['Final_Amount']=Mydf['LoanAmount']-Mydf['Loan_Discount']
Mydf
Can I add multiple columns at one go ??? Not with this syntax!! Each assignment statement adds one column, so write as many statements as there are columns.
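For completeness, pandas also offers the DataFrame method assign( ), which returns a new dataframe with several columns added in one call. A minimal sketch with a small made-up dataframe (the name loans and the three LoanAmount values are hypothetical, chosen only to keep the arithmetic easy to follow):

```python
import pandas as pd

# Hypothetical stand-in for the loan dataframe read from the CSV file
loans = pd.DataFrame({'LoanAmount': [100.0, 200.0, 50.0]})

# assign() takes column_name=value pairs and returns a NEW dataframe
loans = loans.assign(Loan_Rebate=100,
                     Loan_Discount=loans['LoanAmount'] * 20 / 100)

# A column that depends on a freshly added column is added afterwards
loans['Final_Amount'] = loans['LoanAmount'] - loans['Loan_Discount']
print(loans)
```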

Add a row to a DataFrame created from DoS –


To Add a new Row → DataFrame_Object.loc[index_label] = [column_value1 , column_value2, .. ]
Mydf.loc['Amartya Vats'] = 99 / [99, 98]

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf

Mydf.loc['Amartya Vats']=[99, 98]


Mydf

** If a row already exists and you write the code to assign new column values to it, the old column values of that row will be overwritten with the new ones.
** You can assign one single value to all the columns of a particular row
Mydf.loc['Amit Shekhar']= 99     # do not write the value as a list element, just an independent value
Mydf

Mydf.loc['Rayarth Bhat']= 'NaN'     # note: this stores the text 'NaN' in every column, not a missing value


Mydf
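To store a real missing value rather than the text 'NaN', numpy's nan can be assigned instead. A minimal sketch, assuming the same Mydf; importing numpy is an addition made here and missing_count is an illustrative name.

```python
import numpy as np
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

Mydf.loc['Amartya Vats'] = [99, 98]   # new label: a new row is added
Mydf.loc['Rayarth Bhat'] = np.nan     # one missing value for every column

# isna() recognises np.nan, but it would not recognise the string 'NaN'
missing_count = int(Mydf.loc['Rayarth Bhat'].isna().sum())
print(Mydf)
print(missing_count)
```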

Add a row to a DataFrame created from LoD –


import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])

Mydf.loc['Amartya Vats']=[99, 98]


Mydf

Mydf.loc['Rayarth Bhat']= 'NaN'


Mydf
Add a row to a DataFrame created from CSV File –
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

Mydf.loc[614]='NaN'
Mydf

** More than one new row can’t be added at one go


** But multiple existing rows can be edited at one go.

Method II – df_obj.insert( ) – adds a new column at a specified position

To add a new column -
df_obj.insert(loc=indx_add, column=clm_nam, value=[ ])

Code 1 
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])

Mydf.insert(loc=2, column='Total',value=99)
Mydf

Code II 

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf2=pd.DataFrame(MyDoS)
Mydf2.insert(loc=2, column='Total',value=99)
Mydf2
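The loc argument of insert( ) is the position of the new column, counted from 0. The sketch below places a column at the front; the column name RollNo and its values are hypothetical. It also shows that insert( ) changes the dataframe itself and refuses a column name that already exists.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# loc=0 puts the new column before 'Term1%'; one value per row
Mydf.insert(loc=0, column='RollNo', value=[1, 2, 3, 4])
print(Mydf)

# Inserting the same column name again raises a ValueError
try:
    Mydf.insert(loc=1, column='RollNo', value=0)
    duplicate_error = ''
except ValueError as err:
    duplicate_error = str(err)
print(duplicate_error)
```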
3. B) Selection of rows and columns –
Already covered in 2. B ii. Extraction / Slicing.

3. C) Delete (drop column and row)


dataframe_obj.drop('index_name' / 'column_name', axis, inplace) - will return a new dataframe with the rows/columns which have not been deleted.

axis – 0 means rows / 'index', 1 means 'columns'; default axis = 0

inplace – when set to True, the drop is applied to the dataframe itself and nothing is returned (default = False)

Delete a row From a DataFrame created from DoS –


dataframe_obj.drop('index_name', axis, inplace) - will return a new dataframe with the rows which have not been deleted.

axis – 0 means rows / 'index', 1 means 'columns'; default axis = 0

inplace – when set to True, the drop is applied to the dataframe itself (default = False)

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)

Mydf.loc['Amartya Vats']=[99, 95]


Mydf.loc['Rayirth Bhat']= [97, 96]

Mydf
** To change Mydf itself, drop( ) needs inplace to be set to True (default is False)

Mydf.drop(['Amartya Vats'], inplace=True)

Mydf

OR, keep Mydf unchanged and store the result in a new dataframe:

df=Mydf.drop( ['Amartya Vats'] )

print(df)

Mydf.drop(['Rayirth Bhat'], inplace=True)

** If you try deleting an already deleted row – KeyError: [index_name] not found
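The difference between the two forms of drop( ) can be seen side by side. A minimal sketch, assuming the same six-row Mydf; the names smaller and again are only for illustration, and errors='ignore' is an existing drop( ) argument that skips labels which are not found.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)
Mydf.loc['Amartya Vats'] = [99, 95]
Mydf.loc['Rayirth Bhat'] = [97, 96]

# Without inplace: Mydf keeps all 6 rows, the result has 5
smaller = Mydf.drop(['Amartya Vats'])
rows_before = len(Mydf)

# Dropping a label that is already gone: no KeyError with errors='ignore'
again = smaller.drop(['Amartya Vats'], errors='ignore')

# With inplace=True: Mydf itself loses the row
Mydf.drop(['Amartya Vats'], inplace=True)
rows_after = len(Mydf)
print(rows_before, len(smaller), len(again), rows_after)
```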

Delete a row from a DataFrame created from LoD –


import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])

Mydf.loc['Amartya Vats']=[99, 95]


Mydf.loc['Rayirth Bhat']=[97, 96]

Mydf
Delete a row from a DataFrame created from CSV File –
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")

Mydf.loc[614]='NaN'
Mydf
Mydf.drop([614]) →

** Without inplace set to True, drop( ) returns a new dataframe without the row of the specified index; Mydf itself still holds that row.

Delete a column From a DataFrame created from DoS –

dataframe_obj.drop('column_name', axis=1) - will return a new dataframe with the columns which have not been deleted.

axis – 0 means rows / 'index', 1 means 'columns'; default axis = 0

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf

Mydf['Term_Total']=[170, 'NA', 175, 99]


Mydf

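Mydf.drop(['Term_Total'], axis=1)

Mydf.drop(['Term_Total'], inplace=True)

KeyError: "['Term_Total'] not found in axis"
** Without axis=1, drop( ) looks for 'Term_Total' among the row labels (default axis = 0), so the column is not found.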

Delete a column from a DataFrame created from LoD –


import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
Mydf['Term_Total']=[170, 'NA', 175, 99]
Mydf
Mydf.drop(['Term_Total'], axis=1)

Mydf.drop(['Term_Total'], inplace=True) → ??

KeyError: "['Term_Total'] not found in axis"
** axis=1 is missing, so drop( ) searches the row labels instead of the column labels.

Delete a column From a DataFrame created from CSV –


import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf['Loan_Rebate']=100
Mydf

Mydf.drop(['Loan_Rebate'],axis=1)
Mydf

Extras

del dataframe_obj[ 'Column_name' ] - the del keyword will delete the specified column, with its entire content, from the dataframe.

dataframe_obj.pop('Column_name') - will delete the specified column and will also return (show) the deleted column with its values.
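Both del and pop( ) change the dataframe itself; only pop( ) hands the deleted column back. A minimal sketch, assuming the same Mydf with the extra Term_Total column; the name removed is only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)
Mydf['Term_Total'] = [170, 'NA', 175, 99]

# pop() removes the column and returns it as a Series
removed = Mydf.pop('Term_Total')
print(removed)

# del removes the column and returns nothing
del Mydf['Term2%']
print(Mydf)
```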
3. D) Rename a column of

A DataFrame created from DoS –

Method 1 → DataFrame_Object.columns = [ 'new_column_name1' , 'new_column_name2' , ... ]

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)

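Mydf.columns = ['T1%', 'T2%'] →

** The list must hold a name for every column:
Mydf.columns = ['T1%']
Mydf

ValueError:
Length mismatch: Expected axis has 2 elements, new values have 1 elements

Method 2 → DataFrame_Object.rename(columns = { 'old_column_name' : 'new_column_name' }, inplace = True)

Mydf.rename(columns = {'Term1%':'T1%'}, inplace = True)

Mydf

Mydf.rename(columns = {'Term1%':'T1%', 'Term2%':'T2%'}, inplace = True)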
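A short sketch of how rename( ) behaves, assuming the same Mydf. Without inplace=True it returns a renamed copy, and an old name that does not exist in the dataframe is simply skipped; the name renamed and the non-existent column 'Total' are only for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# 'Total' is not a column of Mydf, so that pair has no effect
renamed = Mydf.rename(columns={'Term1%': 'T1%', 'Total': 'Tot'})

print(list(renamed.columns))   # only Term1% was renamed
print(list(Mydf.columns))      # Mydf is unchanged without inplace=True
```

Unlike assigning to Mydf.columns, rename( ) does not need a name for every column.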



A DataFrame created from LoD –
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
Mydf

A DataFrame created from CSV File –


import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf

E) Head and Tail functions,


Extraction of first/last 'n' number of rows of the DataFrame –
DataFrame_object.head( )
Mydf.head( ) → will extract the first 5 rows (default 5) from the dataframe
Mydf.head(n) → will extract the first 'n' (specified 'n' value) rows from the dataframe
Mydf.head(-n) → will extract all the rows other than the last 'n' (specified 'n' value) rows from the dataframe

import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf
Mydf.head(5)

Mydf.head(-5) → Will give all the rows other than the last 5


DataFrame_object.tail( )
Mydf.tail( ) → will extract the last 5 rows (default 5) from the dataframe
Mydf.tail(n) → will extract the last 'n' (specified 'n' value) rows from the dataframe
Mydf.tail(-n) → will extract all the rows other than the first 'n' (specified 'n' value) rows from the dataframe

import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf
Mydf.tail(7) → the last 7 records

Mydf.tail(-7) → other than the top 7

import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv",
skiprows = 7)
Mydf

What is the difference between tail(-n) and skiprows=n ????
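One way to explore this question is with a tiny CSV held in memory. The sketch below is only an illustration: the CSV text and the names csv_text, df and skipped are made up, and io.StringIO stands in for a file.

```python
import io
import pandas as pd

# A made-up CSV: one header line followed by four data lines
csv_text = "id,score\n1,10\n2,20\n3,30\n4,40\n"

df = pd.read_csv(io.StringIO(csv_text))
top = df.head(-1)      # all rows other than the last 1
rest = df.tail(-2)     # all rows other than the first 2; header is kept

# skiprows=2 skips the first 2 LINES of the file, header line included,
# so the line "2,20" is then read as the header
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(rest)
print(skipped)
```

tail(-n) works on an already loaded dataframe and keeps the column names, while skiprows=n acts while the file is being read.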

Add (not mathematical adding but appending) rows with columns to an existing dataframe, or appending two dataframes

dataframe_object.append(object, ignore_index=True)

This method is used to append rows of another dataframe to the end of the given dataframe, returning a new dataframe object.
Columns which do not exist in the original dataframe are added as new columns and the new cells appear with the default value NaN.
** append( ) was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions use pandas.concat( ) (covered under Extras).

ignore_index is an argument which by default is False, so each row keeps the index address it had in its own dataframe (index addresses may repeat). When set to True, the rows of the new dataframe are renumbered 0, 1, 2, ... in order.

Ex.1  Appending a row (at the last) in the dataframe


import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99]),
'Term2%': pd.Series([80, 90, 85, 'ML'])
}
Mydf=pd.DataFrame(MyDoS)
print(Mydf, "\n")
Newdf=Mydf.append({'Term1%':100, 'Term2%':50}, ignore_index=True)
Newdf.index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha','Amartya Vats']
print(Newdf)
** The index address is as per the new dataframe's default index address

Limitation with a dictionary when ignore_index is not set to True (directly adding a row as a dictionary, not as a dataframe): append( ) raises a TypeError, because a dictionary can only be appended with ignore_index=True.

Adding a new row to an existing dataframe created from a list of dictionaries.

import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
print(Mydf, "\n")
Newdf=Mydf.append({'Term1%':100, 'Term2%':100}, ignore_index=True)
print(Newdf)
** With ignore_index=True the name labels of Mydf are replaced by the default index addresses 0 to 4 in Newdf.
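On pandas 2.0 and later, where append( ) is no longer available, the same row can be added with pandas.concat( ). A minimal sketch, assuming the DoS dataframe of Ex.1 with its default index; the new row is first wrapped in a one-row dataframe named new_row (an illustrative name).

```python
import pandas as pd

Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']})

# The dictionary becomes a one-row dataframe
new_row = pd.DataFrame([{'Term1%': 100, 'Term2%': 50}])

# ignore_index=True renumbers the rows 0 to 4 in the result
Newdf = pd.concat([Mydf, new_row], ignore_index=True)
Newdf.index = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
               'Pulkit Sinha', 'Amartya Vats']
print(Newdf)
```

Like append( ), concat( ) returns a new dataframe and leaves Mydf unchanged.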

Extras

Adding a row and dataframes in a dataframe created


through a csv file

pandas.concat( )
pandas.concat([row_value, dataframe_object]).reset_index(drop)
OR
pandas.concat([dataframe_objects], axis, join, join_axes[ ] , ignore_index)

row_value - is the new row, itself stored as a dataframe, which has to be added.
dataframe_object - is the dataframe in which the new row has to be added.
reset_index(drop) - is a method of the dataframe returned by concat( ); it resets the
index labels of the new dataframe to 0, 1, 2, ... and with drop=True the old labels
are discarded instead of being kept as a column.
axis - default value is 0 (0 / 1); 0 stacks the dataframes row wise (one below the
other) and 1 places them column wise (side by side).
join - default value is 'outer' ('outer' / 'inner' ) where outer keeps the union of the
labels of the dataframe objects and inner keeps only the intersection of their labels.
join_axes[ ] - (only in pandas versions before 1.0) replaces the indexes of the
dataframes with a new set of indexes, ignoring their actual index; if one of the
dataframes is longer than the corresponding index, the extra rows are truncated in
the resultant dataframe.
ignore_index - default value is False; when True the resultant dataframe gets the
new index labels 0, 1, 2, ...
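
A small sketch of the join and axis arguments described above. The dataframes A and B and their values are made up for illustration; only the concat( ) arguments matter here.

```python
import pandas as pd

A = pd.DataFrame({'Symbol': ['SIB', 'TCS'], 'Open': [250, 3100]})
B = pd.DataFrame({'Symbol': ['INFY'], 'Close': [1500]})

# join='outer' (default): union of the columns, missing cells become NaN
Outer = pd.concat([A, B], join='outer', ignore_index=True)
print(Outer, "\n")

# join='inner': only the columns common to both dataframes are kept
Inner = pd.concat([A, B], join='inner', ignore_index=True)
print(Inner, "\n")

# axis=1: the dataframes are placed side by side, matched on the row index
Side = pd.concat([A, B], axis=1)
print(Side)
```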

import pandas as pd
Mydf = pd.read_csv("https://www.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm")

# the new row is stored as a one-row dataframe
new_row = pd.DataFrame( { 'Symbol':'SIB', 'Open':1200.50, 'High':350, 'Low':220, 'Age':33,
                          'Close': 250}, index =[0])

# simply concatenate both dataframes; the new row becomes the first row
Mydf2 = pd.concat([new_row, Mydf]).reset_index(drop = True)
print(Mydf2)

To concatenate the dataframes as one dataframe -

# Mydf1, Mydf2 and Mydf3 are dataframes that have already been created
Listdf = [ Mydf1, Mydf2, Mydf3 ]
Finaldf = pandas.concat(Listdf)
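
A runnable version of the above, assuming three one-row dataframes with the same column headings (the marks are sample values made up for illustration):

```python
import pandas

# Three small dataframes with the same columns
Mydf1 = pandas.DataFrame({'Term1%': [90], 'Term2%': [80]}, index=['Amit Shekhar'])
Mydf2 = pandas.DataFrame({'Term1%': [99], 'Term2%': [85]}, index=['Bhavyam Kamal'])
Mydf3 = pandas.DataFrame({'Term1%': [100], 'Term2%': [50]}, index=['Amartya Vats'])

Listdf = [Mydf1, Mydf2, Mydf3]
Finaldf = pandas.concat(Listdf)    # rows are stacked in the order of the list
print(Finaldf)
```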

***********************************
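2. using for loop with loc attribute of dataframe_object – the loc attribute of the
dataframe object is used to iterate (move across) through the rows of the dataframe as this attribute
takes the row label (and, optionally, the column label) as input. When the rows carry name
labels, the position i is first converted to a label with dataframe_object.index[ i ].

range(n) – generates the positions 0, 1, 2, ..., n-1
len(data_object) – is a function which returns the number of items of the
specified data_object; for a dataframe it returns the number of rows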
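
A short sketch of how range( ), len( ) and loc work together when the rows have name labels. The two-row dataframe is a cut-down sample; iloc is shown only for comparison, as it accepts the position directly.

```python
import pandas as pd

Mydf = pd.DataFrame({'Term1%': [90, 99]}, index=['Amit Shekhar', 'Pulkit Sinha'])

print(len(Mydf))                  # number of rows in the dataframe
for i in range(len(Mydf)):        # i takes the positions 0 and 1
    label = Mydf.index[i]         # position -> row label
    print(label, Mydf.loc[label, 'Term1%'])   # loc needs the row label
    print(label, Mydf.iloc[i, 0])             # iloc needs the position
```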
Ex. 1 

import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']),
          'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
        }
Mydf=pd.DataFrame(MyDoS)
print("\n Iterating over rows using the index attribute :\n")
for i in range(len(Mydf)) :
    # the rows are labelled by name, so pass the label at position i to loc
    print(Mydf.loc[ Mydf.index[i], 'Term1%' ] )

Ex. 2 

import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
        { 'Term1%': 'NA', 'Term2%': 90 },
        { 'Term1%': 90, 'Term2%': 85 },
        { 'Term1%': 99, 'Term2%': 'ML' }
      ]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
print("\n Iterating over rows using the index attribute :\n")
for i in range(len(Mydf)) :
    # the rows are labelled by name, so pass the label at position i to loc
    print(Mydf.loc[ Mydf.index[i], 'Term1%' ] )

Ex. 3 

import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Iterating over rows using the index attribute :\n")
for i in range(len(Mydf)) :
    # use a column heading that is present in the csv file in place of 'Term1%'
    print(Mydf.loc[ i, 'Term1%' ] )

Ex. 4 

import pandas as pd
sample = { 'Employee' : ['Amitej', 'Prakhar', 'Naman', 'Amitej', 'Prakhar'],
           'Payable Amount':[10000, 12000, 14000, 20000, 15000]
         }
mydf = pd.DataFrame(sample)
mydf
print("\n Iterating over rows using the index attribute :\n")
# the default index is 0, 1, 2, ... so the position i is also the row label
for i in range(len(mydf)):
    print( mydf.loc [ i, 'Employee'])

iterrows( ): - is the method / function which returns the data of a dataframe row wise, one (row index, row data) pair at a time.

Syntax – for (row, rowSeries) in dataframeobjectname.iterrows( ):

              executable statement(s)

# row is a variable (any valid name can be used) which holds the index label of the current row; it moves to the next row in every iteration

# rowSeries is a variable which holds the data of the current row as a Series, with the column headings as its index

Eg- for (row, rowSeries) in mydf1.iterrows( ):

        print("Row Index – ", row, " & its record is – ", rowSeries)
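
A small runnable sketch of iterrows( ), assuming mydf1 is a two-row version of the employee data used earlier. It also shows that a single value can be picked from rowSeries with the column heading.

```python
import pandas as pd

mydf1 = pd.DataFrame({'Employee': ['Amitej', 'Prakhar'],
                      'Payable Amount': [10000, 12000]})

for (row, rowSeries) in mydf1.iterrows():
    # row is the index label; rowSeries is a Series indexed by the column headings
    print("Row Index -", row)
    print(rowSeries['Employee'], rowSeries['Payable Amount'])
```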

ndarray (numpy) – used here as a 1-D structure (one ndarray per row)

DataFrame (pandas) - 2-D structure

Many ndarrays = 1 dataframe

So, first create ndarray objects and then create a dataframe object out of those
ndarray objects.

For the same given data as above (#Ref1), how many ndarrays?? (As many ndarrays
as there are rows)
Creation of ndarray objects –
import numpy
n1=numpy.array(['Abhishek', 95, 90, 91] )
n2=numpy.array(['Amitej', 96, 89, 93] )
n3=numpy.array(['Prakhar', 97, 88, 95] )
n4=numpy.array(['Bhavya', 98, 87, 97] )
n5=numpy.array(['Stephy', 99, 86, 99] )

Creation of a dataframe object from the ndarray objects –


import pandas
df1=pandas.DataFrame( [n1,n2,n3,n4,n5] )
print(df1)

          0   1   2   3
0  Abhishek  95  90  91
1    Amitej  96  89  93
2   Prakhar  97  88  95
3    Bhavya  98  87  97
4    Stephy  99  86  99
The output does not match with the table #Ref1 (Default row and column labels are
appearing)

To match, change the column headings ( default is 0, 1, 2, 3 …. ) to the ones in the table.
How??
Use the keyword argument 'columns' of the constructor pandas.DataFrame(data [,
index , columns ] )
df1=pandas.DataFrame( [n1,n2,n3,n4,n5] , columns=['Name', 'M1', 'M2', 'M3'])
df1

To change the default row index 0, 1, 2, 3, 4 to the roll no 1, 2, 3, 4, 5 use the 'index'
argument of the pandas.DataFrame( ) constructor.

df1=pandas.DataFrame( [n1,n2,n3,n4,n5] , columns=['Name', 'M1', 'M2', 'M3'],
                      index=[1,2,3,4,5] )

One last thing: the heading 'Roll No' is still missing above the index column.

Since the Roll No column is the index column, we can use the index.name attribute
to give it a name.

Method 1 

df1.index.name='Roll No'
df1

Method 2 
df1.rename_axis('Roll No', inplace=True)
df1
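
A point worth checking in the example above: numpy.array( ) keeps one data type per array, so when a name and marks are mixed the marks are stored as text. The sketch below reuses two of the ndarrays and shows one possible way to convert the marks columns back to numbers with pandas.to_numeric( ); the variable first_mark is only for illustration.

```python
import numpy
import pandas

n1 = numpy.array(['Abhishek', 95, 90, 91])
n2 = numpy.array(['Amitej', 96, 89, 93])
print(n1.dtype.kind)     # 'U' -> every element of the ndarray is text

df1 = pandas.DataFrame([n1, n2], columns=['Name', 'M1', 'M2', 'M3'])
first_mark = df1['M1'].iloc[0]     # still the text '95', not the number 95

for col in ['M1', 'M2', 'M3']:
    df1[col] = pandas.to_numeric(df1[col])    # text -> numbers

print(df1['M1'].sum())   # the marks can now be added as numbers
```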

Ex. 2  Create a dataframe from ndarrays using the below data as data source
Roll No   Name       M1   M2    M3
1         Abhishek   95   90    AB
2         Amitej     96   89
3         Prakhar    97   ML    95
4         Bhavya     98   87    97
5         Stephy     99   NaN

import numpy
n1=numpy.array( [ 'Abhishek', 95, 90, 'AB' ] )
n2=numpy.array( [ 'Amitej', 96, 89 ] )
n3=numpy.array( [ 'Prakhar', 97, 'ML', 95 ] )
n4=numpy.array( [ 'Bhavya', 98, 87, 97 ] )
n5=numpy.array( [ 'Stephy', 99, 'NaN', ] )

import pandas
df1=pandas.DataFrame( [ n1, n2, n3, n4, n5 ] , columns = [ 'Name', 'M1', 'M2', 'M3' ] )
df1.index.name= 'Roll No'
df1

This table is created through the DataFrame( ) constructor.

This is how the dataframe appears

***********************
