1723524625270_Data_Frame_Notes3
print(dataframe_object)
Ex. of A) 1
import pandas as pd
MyDoS = {
'Term1': pd.Series([90, 100, 90, 99]),
'Term2': pd.Series([80, 90, 85, 99])
}
Mydf=pd.DataFrame(MyDoS, index=[1,2,3,4])
print(Mydf)
O/P
Jupyter IDLE
Ex. of A) 2
import pandas as pd
MyDoS = { 'Term1': pd.Series([90, 100, 90, 99]),
'Term2': pd.Series([80, 90, 85, 99])
}
Mydf=pd.DataFrame(MyDoS, index=[1,2,3,4])
Mydf
O/P Jupyter
print(Mydf)
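One detail worth checking in the examples above: each Series is created with the default index 0 to 3, while the DataFrame is asked for index=[1,2,3,4]. The sketch below (the names df_reindexed and df_labelled are illustrative, not from the notes) suggests that pandas then picks the matching labels rather than relabelling the rows, so label 4 has no data and shows NaN.

```python
import pandas as pd

term1 = pd.Series([90, 100, 90, 99])   # default index 0, 1, 2, 3
term2 = pd.Series([80, 90, 85, 99])

# index=[1, 2, 3, 4] selects labels from the Series; it does not relabel them
df_reindexed = pd.DataFrame({'Term1': term1, 'Term2': term2}, index=[1, 2, 3, 4])

# To label the four values 1..4, give each Series that index up front
labels = [1, 2, 3, 4]
df_labelled = pd.DataFrame({
    'Term1': pd.Series([90, 100, 90, 99], index=labels),
    'Term2': pd.Series([80, 90, 85, 99], index=labels),
})

print(df_reindexed)
print(df_labelled)
```

If the intention is simply to number the students from 1, the second form is probably the one wanted.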
DataFrame Created from LoD
Ex. of A) 1
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
O/P
Jupyter IDLE
Ex. Of A)2
DataFrame Created from a csv file (WorldCups.csv)
     Year  Country    Winner     Runners-Up      Third        Fourth          GoalsScored  QualifiedTeams  MatchesPlayed  Attendance
1    1930  Uruguay    Uruguay    Argentina       USA          Yugoslavia      70           13              18             590.549
2    1934  Italy      Italy      Czechoslovakia  Germany      Austria         70           16              17             363.000
3    1938  France     Italy      Hungary         Brazil       Sweden          84           15              18             375.700
4    1950  Brazil     Uruguay    Brazil          Sweden       Spain           88           13              22             1.045.246
5    1954  Switzer    Germany    Hungary         Austria      Uruguay         140          16              26             768.607
6    1958  Sweden     Brazil     Sweden          France       Germany FR      126          16              35             819.810
7    1962  Chile      Brazil     Czechoslovakia  Chile        Yugoslavia      89           16              32             893.172
8    1966  England    England    Germany FR      Portugal     Soviet Union    89           16              32             1.563.135
9    1970  Mexico     Brazil     Italy           Germany      Uruguay         95           16              32             1.603.975
10   1974  Germany    Germany    Netherlands     Poland       Brazil          97           16              38             1.865.753
11   1978  Argentina  Argentina  Netherlands     Brazil       Italy           102          16              38             1.545.791
12   1982  Spain      Italy      Germany FR      Poland       France          146          24              52             2.109.723
13   1986  Mexico     Argentina  Germany FR      France       Belgium         132          24              52             2.394.031
14   1990  Italy      Germany    Argentina       Italy        England         115          24              52             2.516.215
15   1994  USA        Brazil     Italy           Sweden       Bulgaria        141          24              52             3.587.538
16   1998  France     France     Brazil          Croatia      Netherlands     171          32              64             2.785.100
17   2002  Korea      Brazil     Germany         Turkey       Korea Republic  161          32              64             2.705.197
18   2006  Germany    Italy      France          Germany      Portugal        147          32              64             3.359.439
19   2010  S Africa   Spain      Netherlands     Germany      Uruguay         145          32              64             3.178.856
20   2014  Brazil     Germany    Argentina       Nethers      Brazil          171          32              64             3.386.810
Ex. of A) 1
import pandas as pd
Mydf = pd.read_csv("WorldCups.csv")
print(Mydf)
Ex. Of A)2
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf
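To try read_csv( ) without a file on disk or a network connection, the CSV text can be supplied from memory. This is a minimal sketch: the three-row sample and the names csv_text and sample_df are assumptions made for illustration, standing in for WorldCups.csv.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for WorldCups.csv
csv_text = """Year,Country,Winner,GoalsScored
1930,Uruguay,Uruguay,70
1934,Italy,Italy,70
1938,France,Italy,84
"""

# read_csv accepts a file path, a URL, or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df)
print(sample_df.shape)   # (number of rows, number of columns)
```

The first line of the text becomes the column labels, and the rows get the default index 0, 1, 2.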
Exercises – Write the executable statements to select an entire dataframe named DF12
?
??
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf
For DataFrame created from LoD –
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
0th Row
1st Row
2nd Row
3rd Row
Ex. 1
print(Mydf[ 'Term1%' ] [0] )
Data is extracted from the 0th row (Amit Shekhar: 90, 80), but only the Term1% value.
O/P
90
Ex. 2
print(Mydf[ 'Term2%' ] [3] )
Data is extracted from the 3rd row (Pulkit Sinha: 99, ML), but only the Term2% value.
O/P
ML
O/P
Amit Shekhar 90
Name: Term1%, dtype: object
Ex. 2
print(Mydf[ 'Term1%' ] [ 1:2 ] )
[ 1 : 2 ] means 1st row to (2-1) row = 1st row to 1st row.
1st row = Aryaman Bhagat NA 90
Only the Term1% value of the 1st row is extracted. The index label (Aryaman Bhagat) is displayed by default.
O/P
Aryaman Bhagat NA
Name: Term1%, dtype: object
Ex. 3
print(Mydf[ 'Term1%' ] [ 2:2 ] )
[ 2 : 2 ] means 2nd row to (2-1) row = 2nd row to 1st row (invalid direction of movement).
** With the default step a slice moves forward (2nd to 3rd and so on), never in the opposite direction. When the start equals the stop, no row is selected.
So the output is an empty Series( ).
O/P
Series([], Name: Term1%, dtype: object)
And whenever the output of the structure is 2-D it will be of type DataFrame( ) and you can
perform all sort of operations on that output wrt DataFrame( ) and its attributes.
Ex. 4
print(Mydf[ 'Term1%' ] [ 1:3 ] ) = 1st row to (3-1) row = 1st to 2nd row
1st row = Aryaman Bhagat NA 90
2nd row = Bhavyam Kamal 90 85
O/P
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object
Ex. 5 print(Mydf[ 'Term1%' ] [ : ] ) = when ‘n’ and ‘m’ values are not specified it means
slicing/ extraction of all the rows (n = 0 default value) (m = total number of rows = 4 )
So, all the rows will be extracted and value of Term1% for each row will be displayed .
Index Label will be displayed by default.
O/P
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
Ex. 6 print(Mydf[ 'Term1%' ] [ 0 : ] ) ‘n’ = starting index address from where extraction has
to begin = 0 = 0th row
‘m’ is not specified = last row
[ 0 : ] = Extraction from 0th row till last row
So, all the rows will be extracted and value of Term1% for each row will be displayed .
Index Label will be displayed by default.
O/P
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
- 4th Row
- 3rd Row
- 2nd Row
- 1st Row
O/P
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
Ex. 8 print(Mydf['Term1%'] [ -4: -1 ] ) -4th row to (-1 – 1) row = -4th row to -2nd row
-4th row = Amit Shekhar 90 80
-3rd row = Aryaman Bhagat NA 90
-2nd row = Bhavyam Kamal 90 85
From -4th row to -2nd row the Extraction of [ Term1%] value
Amit Shekhar 90 80
Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Index Label will be displayed by default.
O/P
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object
Ex. 9 print(Mydf[ 'Term1%' ] [ :-1 ] ) when ‘n’ is not specified its value is 0, so the slice is 0th row
to (-1 – 1) row = 0th row to -2nd row
-4th row = Amit Shekhar 90 80
-3rd row = Aryaman Bhagat NA 90
-2nd row = Bhavyam Kamal 90 85
O/P
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Name: Term1%, dtype: object
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
print(Mydf['Term1%'][0:4])
Amit Shekhar 90
Aryaman Bhagat NA
Bhavyam Kamal 90
Pulkit Sinha 99
Name: Term1%, dtype: object
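The slices above count rows by position even though the rows are labelled with names. A way to make that explicit is the iloc attribute, which always works by position. The sketch below rebuilds the same data (from plain lists, an assumption made for brevity) and repeats a few of the slices with iloc. In recent pandas versions, a single integer such as [0] on a Series with name labels is deprecated, so iloc[0] is the safer spelling there.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

term1 = Mydf['Term1%']
first_value = term1.iloc[0]        # single value at position 0
middle = term1.iloc[1:3]           # positions 1 and 2 (the stop is excluded)
all_but_last = term1.iloc[-4:-1]   # positions -4, -3 and -2
empty = term1.iloc[2:2]            # start equals stop, so nothing is selected

print(first_value)
print(middle)
print(all_but_last)
print(empty)
```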
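** Several columns cannot be selected by writing their labels separated by a comma: pandas reads the labels as one
tuple key and raises a KeyError. Pass the labels as a list instead, e.g. Mydf[['Term1%', 'Term2%']][0:4].
print(Mydf['Term1%', 'Term2%' ] [ 0 : 4 ] )
KeyError: ('Term1%', 'Term2%')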
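The KeyError above comes from how the column labels are written, not from a limit on slicing. A short sketch, using the same student data rebuilt from plain lists (the names both_terms and outcome are illustrative):

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# A list of labels inside the brackets selects several columns,
# and the row slice then applies to the resulting 2-D DataFrame
both_terms = Mydf[['Term1%', 'Term2%']][0:2]
print(both_terms)

# Without the inner list, the two labels are read as one tuple key
try:
    Mydf['Term1%', 'Term2%']
    outcome = 'no error'
except KeyError:
    outcome = 'KeyError'
print(outcome)
```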
print(Mydf['Education'])
0 Graduate
1 Graduate
2 Graduate
3 Not Graduate
4 Graduate
...
609 Graduate
610 Graduate
611 Graduate
612 Graduate
613 Graduate
Name: Education, Length: 614, dtype: object
print(Mydf['Loan_ID'][0:10])
0 LP001002
1 LP001003
2 LP001005
3 LP001006
4 LP001008
5 LP001011
6 LP001013
7 LP001014
8 LP001018
9 LP001020
Name: Loan_ID, dtype: object
** Attributes of DataFrame_object
1. index – will return the index (row) labels of the dataframe as an Index object (list-like).
print(Mydf.index)
Index(['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'], dtype='object')
2. columns – will return the column labels of the dataframe as an Index object (list-like).
print(Mydf.columns)
Index( [ 'Term1%', 'Term2%' ], dtype='object')
3. values – will return the values of the dataframe as a 2-D NumPy array (one inner array per row)
print(Mydf.values)
[ [90 80]
['NA' 90]
[90 85]
[99 'ML'] ]
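The three attributes return pandas and NumPy objects rather than plain Python lists. When an actual list is needed, tolist( ) can be applied to each of them, as in this small sketch (the variable names are illustrative):

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

index_labels = Mydf.index.tolist()     # row labels as a list
column_labels = Mydf.columns.tolist()  # column labels as a list
cell_values = Mydf.values.tolist()     # list of lists, one inner list per row

print(type(Mydf.index).__name__)    # Index
print(type(Mydf.values).__name__)   # ndarray
print(index_labels)
print(column_labels)
print(cell_values)
```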
2. B iii. Iteration – is an operation performed over the dataframe to move across the
data of a dataframe row wise, one column at a time.
1. using for loop with index attribute of dataframe_object – the index attribute of the
dataframe object returns the index labels of the rows, so looping over it iterates (moves across)
the rows of the dataframe.
Syntax –
for counter_variable in dataframe_object.index:
    executable statement
Ex. 1
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99, 98, 100], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha',
'Kalash', 'Praganya']),
'Term2%': pd.Series([80, 90, 85, 'ML',99,99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha',
'Kalash', 'Praganya'])
}
Mydf=pd.DataFrame(MyDoS)
print("\n Iterating over all the row values :\n")
Mydf.index = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha', 'Kalash', 'Praganya']
# positions of the labels above: 0 1 2 3 4 5
for i in Mydf.index:
    print( Mydf [ 'Term1%'] [i])
O/P
Iterating over all the row values :
90
NA
.
.
100
** Iteration extracts only the data. (Not with the Index label)
Ex. 2 Iteration over the rows of a particular column of a DataFrame created from an
LoD data_object
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
print("\n Iterating over row values using the index attribute :\n")
for i in Mydf.index:
    print( Mydf ['Term2%'] [i])
O/P
Iterating over row values using the index attribute :
80
90
85
ML
Ex. 3 Iteration over the rows of a particular column of a DataFrame created from a
CSV File (url)
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Iterating over row values using the index attribute :\n")
for i in Mydf.index:
    print( Mydf ['Loan_ID'] [i])
O/P
LP001865
LP001868
LP002990
Ex. 4
import pandas as pd
sample = { 'Employee' : ['Amitej', 'Prakhar', 'Naman', 'Amitej', 'Prakhar'],
'Payable Amount':[10000, 12000, 14000, 20000, 15000]
}
mydf = pd.DataFrame(sample)
mydf
print("\n Iterating over rows using the index attribute :\n")
for i in mydf.index:
    print( mydf [ 'Employee'] [i])
Term1% Term2%
0 1 2 0 1
Amit Shekhar 900 80
Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Pulkit Sinha 99 ML
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf['Term2%' ] [ 0 ] [ 0 ]
Write the code to extract the 1st character of 2nd column of 2nd row of the given
dataframe –
What is it ???
2nd Column’s name = 'Term2%'
2nd row’s index address =1
1st character index address = 0
DataFrame object = Mydf
Term1% Term2%
01 01
Amit Shekhar 90 80
Aryaman Bhagat NA 90
Bhavyam Kamal 90 85
Pulkit Sinha 99 ML
O/P N
This will return the First Character (Character at 0th location) of each row (Iterating through the counter variable i)
of the Column ‘Term1%’
But provided the Term1% column holds only Characters (Alphabets)
Ex. 1
Ex. 2
Ex. 3
Ex. 4
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Extracting a Character from specified index address of the value stored in each row :\n")
for i in Mydf.index:
    print( Mydf ['Gender'] [ i ][0])
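As noted above, indexing a character with [0] only works when the cell holds text. For a column such as Term1%, which mixes numbers and text, one possible workaround is to convert every value to a string first. The sketch below is an assumption about how one might do that, not the method from the notes. It shows a loop version and the equivalent .str accessor.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# Loop version: str( ) turns 90 into '90' so that [0] is valid for every row
loop_chars = []
for i in Mydf.index:
    loop_chars.append(str(Mydf.loc[i, 'Term1%'])[0])

# Same result without a loop, using the string accessor of a Series
first_chars = Mydf['Term1%'].astype(str).str[0]

print(loop_chars)
print(first_chars)
```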
Mydf['Term_Total']=170
OR
Mydf['Term_Total']=Mydf['Term1%']+ Mydf['Term2%']
Mydf
** If the columns hold non-numeric values (here 'NA' and 'ML'), the addition fails:
TypeError: must be str, not int
Mydf['StartLoanCredit']=20000
Mydf['Loan_Rebate']=100
Mydf
Mydf['Loan_Discount']=Mydf['LoanAmount']*20/100
Mydf['Final_Amount']=Mydf['LoanAmount']-Mydf['Loan_Discount']
Mydf
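The TypeError seen with Term1% + Term2% is caused by the text entries 'NA' and 'ML'. One possible way around it, sketched here under the assumption that such entries should count as missing marks, is to convert the columns with pandas.to_numeric( ) before adding them. Text that cannot be read as a number becomes NaN.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

# errors='coerce' replaces values that are not numbers ('NA', 'ML') with NaN
term1 = pd.to_numeric(Mydf['Term1%'], errors='coerce')
term2 = pd.to_numeric(Mydf['Term2%'], errors='coerce')

# NaN + number is NaN, so students with a missing mark get no total
Mydf['Term_Total'] = term1 + term2
print(Mydf)
```

The original Term1% and Term2% columns are left as they were. Only the new Term_Total column is numeric.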
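Can I add multiple columns at one go ??? Not with dataframe_object['column'] = value: as many columns, that many
statements. (The assign( ) method of the dataframe can add several columns in a single call.)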
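For completeness, here is a small sketch of adding several columns in a single call with the assign( ) method of a dataframe. The dataframe loans and its LoanAmount values are made up for illustration. The column names follow the loan example above.

```python
import pandas as pd

# Hypothetical loan amounts, standing in for the LoanAmount column of the CSV data
loans = pd.DataFrame({'LoanAmount': [100, 200, 300]})

# Each keyword becomes a new column; a function receives the dataframe itself
with_extras = loans.assign(
    StartLoanCredit=20000,
    Loan_Rebate=100,
    Loan_Discount=lambda d: d['LoanAmount'] * 20 / 100,
)

print(with_extras)
print(loans.columns.tolist())   # assign( ) returns a new dataframe; loans is unchanged
```

Keyword names must be valid Python identifiers, so a label such as 'Term1%' cannot be written this way directly.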
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf
** If a row already exists and you write the code to assign new column values to
the existing row, the old column values of that row are overwritten with the new
ones.
** You can assign one single value to all the columns of a particular row
Mydf.loc['Amit Shekhar']= 99
Mydf # write the value as an independent value, not as a list element
Mydf.loc[614]='NaN'
Mydf
Code 1
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal',
'Pulkit Sinha'])
Mydf.insert(loc=2, column='Total',value=99)
Mydf
Code II
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf2=pd.DataFrame(MyDoS)
Mydf2.insert(loc=2, column='Total',value=99)
Mydf2
3. B) Selection of rows and columns –
Already covered in 2. B ii. Extraction / Slicing.
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf
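** drop( ) returns a new dataframe and leaves the original unchanged. To change the original, set inplace to True (default is False)
OR
assign the result back to the same name, e.g. Mydf = Mydf.drop(...)
Mydf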
Delete a row from a DataFrame created from CSV File –
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf.loc[614]='NaN'
Mydf
Mydf.drop([614])
** Without inplace set to True, drop( ) shows a new dataframe with the entire row of the specified index
removed; Mydf itself still holds that row.
dataframe_obj.drop('column_name', axis) - will show the new dataframe with the
rows/columns which have not been deleted.
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf
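Mydf.drop(['Term_Total'], axis=1) # shows a new dataframe without the column; Mydf is unchanged
Mydf.drop(['Term_Total'], axis=1, inplace=True) # removes the column from Mydf itself
Mydf.drop(['Term_Total'], inplace=True) # ?? KeyError: without axis=1, drop( ) searches the row labels
Mydf.drop(['Loan_Rebate'],axis=1)
Mydf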
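The difference between dropping with and without inplace, and between rows (axis=0, the default) and columns (axis=1), can be checked with a small sketch like the one below. The Term_Total value of 170 and the variable names are assumptions for illustration.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)
Mydf['Term_Total'] = 170

# Without inplace, drop( ) returns a new dataframe and leaves Mydf alone
without_col = Mydf.drop(['Term_Total'], axis=1)
without_row = Mydf.drop(['Pulkit Sinha'])          # axis=0: a row label
still_there = 'Term_Total' in Mydf.columns         # True at this point

# With inplace=True, Mydf itself is changed and nothing is returned
Mydf.drop(['Term_Total'], axis=1, inplace=True)

print(without_col)
print(without_row)
print(Mydf)
```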
Extras
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat',
'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
Mydf.columns = ['T1%']
Mydf
ValueError:
Length mismatch: Expected axis has 2 elements, new values have 1 elements
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf
Mydf.head(5) # the first 5 records
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
Mydf
Mydf.tail(7) # the last 7 records
dataframe_object.append(object, ignore_index=True)
This method is used to append rows of other dataframe to the end of the given dataframe,
returning a new dataframe object.
** append( ) was deprecated in pandas 1.4 and removed in pandas 2.0; with newer versions use pandas.concat( ).
Columns which do not exist in the original dataframe are added as new columns and the new
cells appear with default value NaN.
ignore_index is an argument which by default is False and keeps the index labels
of each dataframe as they are (so labels may repeat). When set to True, the rows of the new dataframe
are renumbered 0, 1, 2, ... in order.
Limitation when the new row is given directly as a dictionary (not as a dataframe): ignore_index must be
set to True, otherwise a TypeError is raised.
Inserting or adding a new row to an existing dataframe created from a list of
dictionaries.
* with ignore_index
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
print(Mydf, "\n")
Newdf=Mydf.append({'Term1%':100, 'Term2%':100}, ignore_index=True)
print(Newdf)
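On pandas 2.0 and later, DataFrame.append( ) is no longer available, so the example above raises an AttributeError there. A minimal equivalent using pandas.concat( ) is sketched below. Wrapping the dictionary in a one-row dataframe (new_row) is one reasonable approach, not the only one.

```python
import pandas as pd

LoD = [{'Term1%': 90, 'Term2%': 80},
       {'Term1%': 'NA', 'Term2%': 90},
       {'Term1%': 90, 'Term2%': 85},
       {'Term1%': 99, 'Term2%': 'ML'}]
Mydf = pd.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat',
                                'Bhavyam Kamal', 'Pulkit Sinha'])

# The new row is first turned into a dataframe of its own
new_row = pd.DataFrame([{'Term1%': 100, 'Term2%': 100}])

# ignore_index=True renumbers the rows 0..4, as append( ) did with the same argument
Newdf = pd.concat([Mydf, new_row], ignore_index=True)
print(Newdf)
```

As with append( ), the student names are lost when ignore_index is True, because the whole index is replaced by 0, 1, 2, and so on.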
Inserting or adding a new Dataframe to an existing dataframe created from a list of
dictionaries.
* without ignore_index
Extras
pandas.concat( )
pandas.concat([row_value, dataframe_object]).reset_index(drop)
OR
pandas.concat([dataframe_objects], axis, join, join_axes[ ] , ignore_index)
row_value - is the new row value (itself a dataframe or a series) which has to be added to a predefined dataframe.
dataframe_object - is the dataframe in which a new row has to be added.
reset_index(drop) - is a method of the dataframe returned by concat( ) which resets the index
labels of the new dataframe; drop=True discards the old index instead of keeping it as a column.
axis - default value is 0 (0 / 1) which means the objects are joined row wise (one below the other); 1 joins them column wise.
join - default value is 'outer' ('outer' / 'inner' ) where outer is for union between the
columns of the dataframe objects and inner for the intersection of the columns of the dataframe objects.
join_axes[ ] - replaces the indexes of the dataframes with a new set of indexes,
ignoring their actual index and if one of the dataframes is longer than the
corresponding index, then that particular index will be truncated in the resultant
dataframe. (This argument is available only in older pandas versions; it was removed in pandas 1.0.)
import pandas as pd
Mydf = pd.read_csv("https://www.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm")
Mydf1
Mydf2
Mydf3
Listdf = [ Mydf1, Mydf2, Mydf3 ]
Finaldf = pd.concat(Listdf)
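Since the three dataframes in the fragment above are not defined in the notes, here is a self-contained sketch with three small made-up dataframes (df_a, df_b, df_c are hypothetical). It shows the effect of join='outer' versus join='inner' together with ignore_index.

```python
import pandas as pd

# Three hypothetical dataframes; only df_b has the extra column M2
df_a = pd.DataFrame({'Name': ['Amitej', 'Prakhar'], 'M1': [96, 97]})
df_b = pd.DataFrame({'Name': ['Bhavya'], 'M1': [98], 'M2': [87]})
df_c = pd.DataFrame({'Name': ['Stephy'], 'M1': [99]})
Listdf = [df_a, df_b, df_c]

# outer (default): union of the columns, missing cells are filled with NaN
outer = pd.concat(Listdf, ignore_index=True)

# inner: only the columns common to all the dataframes are kept
inner = pd.concat(Listdf, join='inner', ignore_index=True)

print(outer)
print(inner)
```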
***********************************
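2. using for loop with loc attribute of dataframe_object – the loc attribute of the
dataframe object is used to iterate (move across) through the rows of the dataframe as this attribute
takes the index label of a row as input. When the index labels are not 0, 1, 2, ... the label at position i is dataframe_object.index[i].
range(n) – generates the values 0, 1, ..., n-1
len(data_object) – is a function which returns the total number of elements of the
specified data_object (for a dataframe, the total number of rows)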
Ex. 1
import pandas as pd
MyDoS = { 'Term1%': pd.Series([90, 'NA', 90, 99], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']),
'Term2%': pd.Series([80, 90, 85, 'ML'], index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
}
Mydf=pd.DataFrame(MyDoS)
print("\n Iterating over rows using the loc attribute :\n")
for i in range(len(Mydf)) :
    print(Mydf.loc[ Mydf.index[i], 'Term1%' ] )
Ex. 2
import pandas
LoD = [ { 'Term1%': 90, 'Term2%': 80 },
{ 'Term1%': 'NA', 'Term2%': 90 },
{ 'Term1%': 90, 'Term2%': 85 },
{ 'Term1%': 99, 'Term2%': 'ML' }
]
Mydf=pandas.DataFrame(LoD, index=['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha'])
print("\n Iterating over rows using the loc attribute :\n")
for i in range(len(Mydf)) :
    print(Mydf.loc[ Mydf.index[i], 'Term1%' ] )
Ex. 3
import pandas as pd
Mydf = pd.read_csv("https://s3-ap-southeast-1.amazonaws.com/av-datahack-datacamp/train.csv")
print("\n Iterating over rows using the loc attribute :\n")
for i in range(len(Mydf)) :
    print(Mydf.loc[ i, 'Loan_ID' ] )
Ex. 4
import pandas as pd
sample = { 'Employee' : ['Amitej', 'Prakhar', 'Naman', 'Amitej', 'Prakhar'],
'Payable Amount':[10000, 12000, 14000, 20000, 15000]
}
mydf = pd.DataFrame(sample)
mydf
print("\n Iterating over rows using the loc attribute :\n")
for i in range(len(mydf)):
    print( mydf.loc [ i, 'Employee'])
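A point that is easy to miss in these examples: loc looks rows up by label, so a counter from range( ) only works directly when the labels are 0, 1, 2 and so on (as in Ex. 3 and Ex. 4). The sketch below uses the name-labelled student data to compare the options. The list names are illustrative.

```python
import pandas as pd

names = ['Amit Shekhar', 'Aryaman Bhagat', 'Bhavyam Kamal', 'Pulkit Sinha']
Mydf = pd.DataFrame({'Term1%': [90, 'NA', 90, 99],
                     'Term2%': [80, 90, 85, 'ML']}, index=names)

by_label = []
by_position = []
for i in range(len(Mydf)):
    label = Mydf.index[i]                          # label at position i
    by_label.append(Mydf.loc[label, 'Term1%'])     # loc: label based
    by_position.append(Mydf['Term1%'].iloc[i])     # iloc: position based

# Passing the counter itself to loc fails, because no row is labelled 0
try:
    Mydf.loc[0, 'Term1%']
    outcome = 'no error'
except KeyError:
    outcome = 'KeyError'

print(by_label)
print(by_position)
print(outcome)
```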
iterrows( ): - is the method / function which shows data of a dataframe row wise.
Eg- for (row, rowSeries) in mydf1.iterrows( ):
    executable statement(s)
# row is a loop variable (the name is not fixed) which holds the index label of the current row; it moves to the next row on every iteration
# rowSeries is a loop variable which holds the data of the current row as a Series, each value along with its column heading
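A runnable sketch of iterrows( ), using the first three rows of the Employee sample from the earlier example. The list rows_seen is only there to collect what the loop visits.

```python
import pandas as pd

mydf1 = pd.DataFrame({'Employee': ['Amitej', 'Prakhar', 'Naman'],
                      'Payable Amount': [10000, 12000, 14000]})

rows_seen = []
for (row, rowSeries) in mydf1.iterrows():
    # row is the index label; rowSeries holds that row's values keyed by column heading
    print('Row index:', row)
    print(rowSeries)
    rows_seen.append((row, rowSeries['Employee'], rowSeries['Payable Amount']))

print(rows_seen)
```

Each pass of the loop hands over one complete row, so values from several columns can be read together.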
So, first create ndarrays objects and then create a dataframe object out of those
ndarrays objects.
For the same given data as above (#Ref1) How many ndarrays?? (As many rows
those many ndarrays)
Creation of ndarrays objects –
import numpy
n1=numpy.array(['Abhishek', 95, 90, 91] )
n2=numpy.array(['Amitej', 96, 89, 93] )
n3=numpy.array(['Prakhar', 97, 88, 95] )
n4=numpy.array(['Bhavya', 98, 87, 97] )
n5=numpy.array(['Stephy', 99, 86, 99] )
import pandas
df1=pandas.DataFrame( [ n1, n2, n3, n4, n5 ] )
print(df1)
0 1 2 3
0 Abhishek 95 90 91
1 Amitej 96 89 93
2 Prakhar 97 88 95
3 Bhavya 98 87 97
4 Stephy 99 86 99
The output does not match with the table #Ref1 (Default row and column labels are
appearing)
Method 1
df1.index.name='Roll No'
df1
Method 2
df1.rename_axis('Roll No', inplace=True)
df1
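To get closer to the intended table, the column labels and the roll numbers can be supplied when the dataframe is created. The sketch below uses the first two ndarrays only. The column names and the roll numbers 1 and 2 are assumed from the table layout used in these notes. It also shows that a NumPy array mixing a name with numbers stores every element as text, so the marks have to be converted before any arithmetic.

```python
import numpy
import pandas

n1 = numpy.array(['Abhishek', 95, 90, 91])
n2 = numpy.array(['Amitej', 96, 89, 93])

# columns= sets the column labels, index= sets the row labels (roll numbers)
df1 = pandas.DataFrame([n1, n2], columns=['Name', 'M1', 'M2', 'M3'], index=[1, 2])
df1.index.name = 'Roll No'

# The mixed ndarray holds strings only, so a mark comes back as text
marks_are_text = isinstance(df1.loc[1, 'M1'], str)

# Convert the marks columns to integers
df1[['M1', 'M2', 'M3']] = df1[['M1', 'M2', 'M3']].astype(int)
print(df1)
```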
Ex. 2 Create a dataframe from ndarrays using the below data as data
source
Roll No  Name      M1  M2   M3
1        Abhishek  95  90   AB
2        Amitej    96  89
3        Prakhar   97  ML   95
4        Bhavya    98  87   97
5        Stephy    99  NaN
import numpy
n1=numpy.array( [ 'Abhishek', 95, 90, 'AB' ] )
n2=numpy.array( [ 'Amitej', 96, 89 ] )
n3=numpy.array( [ 'Prakhar', 97, 'ML', 95 ] )
n4=numpy.array( [ 'Bhavya', 98, 87, 97 ] )
n5=numpy.array( [ 'Stephy', 99, 'NaN', ] )
import pandas
df1=pandas.DataFrame( [ n1, n2, n3, n4, n5 ] , columns = [ 'Name', 'M1', 'M2', 'M3' ]
)
df1.index.name= 'Roll No'
df1
***********************