Python Code

Copyright
© © All Rights Reserved
1. Data Handling Using Pandas


Python module- A Python module is a Python script file (.py file) containing variables, classes,
functions, statements etc.

Python Library/package- A Python library is a collection of modules that together cater to a specific type of
need or application. The advantage of using libraries is that we can directly use their functions/methods for
a specific task instead of rewriting the code for that particular use. A library is made available by
using the import command as-
import libraryname
usually at the top of the Python code/script file.

Some examples of Python Libraries-


1. Python standard library- It is a collection of modules which is normally distributed along with the Python
installation. Some of them are-
a. math module- provides mathematical functions
b. random module- provides functions for generating pseudo-random numbers.
c. statistics module- provides statistical functions
2. Numpy (Numerical Python) library- It provides functions for working with large multi-dimensional
arrays(ndarrays) and matrices. NumPy provides a large set of mathematical functions that can
operate quickly on the entries of the ndarray without the need of loops.
3. Pandas (PANel + DAta) library- Pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool. Pandas is built on top of NumPy, relying on ndarray and its fast and
efficient array based mathematical functions.
4. Matplotlib library- It provides functions for plotting and drawing graphs.
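
As a small illustration of importing libraries, the sketch below uses the three standard library modules named above and one NumPy operation. The variable names and the sample values are arbitrary choices made for this example only.

```python
import math
import random
import statistics

import numpy as np

root = math.sqrt(16)                  # square root from the math module
random.seed(1)                        # fix the seed so that runs are repeatable
roll = random.randint(1, 6)           # pseudo-random integer from 1 to 6 (both inclusive)
average = statistics.mean([2, 4, 6])  # mean from the statistics module

arr = np.array([1, 2, 3, 4])
squares = arr ** 2                    # applied to every element without writing a loop

print(root, roll, average, squares)
```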

Data Structure- Data structure is the arrangement of data in such a way that permits efficient access and
modification.

Pandas Data Structures- Pandas offers the following data structures-


a) Series - 1D labelled array
b) DataFrame - 2D labelled array
c) Panel - 3D array (not in syllabus; Panel has been removed from pandas since version 0.25)

Series- Series is a one-dimensional array with homogeneous data.


Index/Label:     0    1    2    3    4
1D data values:  abc  def  ghi  Jkl  mno

Key features of Series-


A Series has only one dimension, i.e. one axis
Each element of the Series can be associated with an index/label that can be used to access the data
Series is data mutable i.e. the data values can be changed in-place in memory
Series is usually described as size immutable i.e. once a Series object is created with a fixed number of
elements, its size is treated as fixed, and most methods return a new Series object instead of resizing the
existing one. Note that pandas does not strictly enforce this: del, pop() and assignment to a new label
change the number of elements of the same object.
All the elements of the Series are homogeneous data i.e. their data type is the same. For example-
Index/Label:  0    1    2    3    4
Data values:  223  367  456  339  927
(all data is of int type)

Index/Label:  a    b    c     de   fg
Data values:  1    def  10.5  Jkl  True
(all data is of object type)

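
The two examples above can be checked with a short sketch. It assumes a recent pandas installation; the names s and mixed are chosen only for this illustration.

```python
import pandas as pd

# All values are integers, so pandas stores them with one common dtype
s = pd.Series([223, 367, 456, 339, 927])
s.iloc[0] = 100           # data mutable: the first value is changed in place
print(s.dtype)

# Values of different types are stored together using the generic object dtype
mixed = pd.Series([1, 'def', 10.5, 'Jkl', True], index=['a', 'b', 'c', 'de', 'fg'])
print(mixed.dtype)
```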
Creating a Series- A Series object can be created by calling the Series() constructor in the following ways-
a) Create an empty Series- A Series object not containing any elements is an empty Series. It can be
created as follows-
import pandas as pd
s1=pd.Series(dtype='float64') #dtype is given explicitly because newer pandas versions default an empty Series to object
print(s1)

o/p-
Series([], dtype: float64)

b) Create a series from array without index- A numpy 1D array can be used to create a Series object as follows-

import pandas as pd
import numpy as np
a1=np.array(['hello', 'world', 'good', np.nan]) #np.nan is stored as the string 'nan' in this string array
s1=pd.Series(a1)
print(s1)

o/p-
0 hello
1 world
2 good
3 nan
dtype: object

c) Create a series from array with index- The default index for a Series object can be changed by the
programmer by passing a list of labels to the index parameter. The number of elements of the array
must match the number of labels specified, otherwise pandas raises a ValueError.
#Creating a Series object using numpy array and specifying index
import pandas as pd
import numpy as np
a1=np.array(['hello', 'world', 'good', 'morning'])
s1=pd.Series(a1, index=[101, 111, 121, 131])
print(s1)
o/p-
101 hello
111 world
121 good
131 morning
dtype: object
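
The length rule can be seen with a minimal sketch that deliberately passes three labels for four values. The variable name mismatch_error is made up for this example.

```python
import numpy as np
import pandas as pd

a1 = np.array(['hello', 'world', 'good', 'morning'])

try:
    pd.Series(a1, index=[101, 111, 121])   # 4 values but only 3 labels
    mismatch_error = None
except ValueError as err:
    mismatch_error = err                    # pandas reports the length mismatch

print(type(mismatch_error).__name__, mismatch_error)
```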

d) Create a Series from dictionary- Each element of the dictionary contains a key:value pair. The key of
the dictionary becomes the index of the Series object and the value of the dictionary becomes the
data.
#4 Creating a Series object from dictionary
import pandas as pd

d={101:'hello', 111:'world', 121:'good', 131:'morning'}


s1=pd.Series(d)
print(s1)

o/p-
101 hello
111 world
121 good
131 morning
dtype: object

e) Create a Series from dictionary, reordering the index- When we are creating a Series object from a
dictionary then we can specify which all elements of the dictionary, we want to include in the Series
object and in which order by specifying the index argument while calling the Series() method.
If any key of the dictionary is missing in the index argument, then that element is not added
to the Series object.
If the index argument contains a key not present in the dictionary then a value of NaN is
assigned to that particular index.
The order in which the index arguments are specified determines the order of the elements
in the Series object.
#5 Creating a Series object from dictionary reordering the index
import pandas as pd

d={101:'hello', 111:'world', 121:'good', 131:'morning'}


s1=pd.Series(d, index=[131, 111, 121, 199])
print(s1)

o/p-
131 morning
111 world
121 good
199 NaN
dtype: object
f) Create a Series from a scalar value- A Series object can be created from a single value i.e. a scalar
value. The scalar value is repeated once for every label given in the index argument.
#6 Creating a Series object from scalar value
import pandas as pd

s1=pd.Series(7, index=[101, 111, 121])


print(s1)

o/p-
101 7
111 7
121 7
dtype: int64

g) Create a Series from a List- A Series object can be created from a list as shown below.
#7 Creating a Series object from list
import pandas as pd
L=['abc', 'def', 'ghi', 'jkl']
s1=pd.Series(L)
print(s1)

o/p-
0 abc
1 def
2 ghi
3 jkl
dtype: object

h) Create a Series from a Numpy Array (using various array creation methods) - A Series object can be
created from a numpy array as shown below. All the methods of numpy array creation can be used
to create a Series object.
#7a Creating a Series object from numpy arrays
import pandas as pd
import numpy as np

#a. Create an array consisting of elements of a list [2,4,7,10, 13.5, 20.4]


a1=np.array([2,4,7,10, 13.5, 20.4])
s1=pd.Series(a1)
print('s1=', s1)

#b. Create an array consisting of ten zeros.


a2=np.zeros(10)
s2=pd.Series(a2, index=range(101, 111))
print('s2=', s2)

#c. Create an array consisting of five ones.


a3=np.ones(5)
s3=pd.Series(a3)
print('s3=', s3)
#d. Create an array consisting of the elements from 1.1, 1.2, 1.3,1.4, 1.5, 1.6, 1.7
a4=np.arange(1.1,1.8,0.1)
s4=pd.Series(a4)
print('s4=', s4)

#e. Create an array of 4 elements which are linearly spaced between 1 and 10 (both inclusive)
a5=np.linspace(1,10,4)
s5=pd.Series(a5)
print('s5=', s5)

#f. Create an array from the individual characters of the string 'helloworld'
a6=np.fromiter('helloworld', dtype='U1')
s6=pd.Series(a6)
print('s6=', s6)

o/p:
s1= 0 2.0
1 4.0
2 7.0
3 10.0
4 13.5
5 20.4
dtype: float64
s2= 101 0.0
102 0.0
103 0.0
104 0.0
105 0.0
106 0.0
107 0.0
108 0.0
109 0.0
110 0.0
dtype: float64
s3= 0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
s4= 0 1.1
1 1.2
2 1.3
3 1.4
4 1.5
5 1.6
6 1.7
dtype: float64
s5= 0 1.0
1 4.0
2 7.0
3 10.0
dtype: float64
s6= 0 h
1 e
2 l
3 l
4 o
5 w
6 o
7 r
8 l
9 d
dtype: object

Operations on Series objects-


1. Accessing elements of a Series object
The elements of a series object can be accessed using different methods as shown below-
a) Using the indexing operator []
The square brackets [] can be used to access a data value stored in a Series object. The index
of the element must be entered within the square brackets. If the index is a string then the
index must be written in quotes. If the index is a number then the index must be written
without the quotes. Attempting to use a label which does not exist leads to an error (KeyError).
#8 Accessing elements of Series using index
import pandas as pd

d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}


s=pd.Series(d)
print(s['abc'])
print(s[131])

o/p-
world
morning
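
A label that is not present can be handled with try/except, as in the following sketch. The label 'xyz' and the fallback text are made up for this illustration.

```python
import pandas as pd

d = {101: 'hello', 'abc': 'world', 121: 'good', 131: 'morning'}
s = pd.Series(d)

try:
    value = s['xyz']              # 'xyz' is not one of the labels
except KeyError:
    value = 'label not found'     # fallback chosen for this example

print(value)
```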

b) Using the get() method


The get() method returns the data value associated with an index.
Syntax: seriesobject.get(key, default=None)
The first argument to the get method is the index of the element which we want to access.
Here if the key/index is not present in the series object and the second argument is not
specified then None is returned. If the key is not present and we want some default value to
be returned then it is specified using the default argument.
#9 Accessing elements of Series using get() method
import pandas as pd

d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}


s=pd.Series(d)
print(s.get('abc'))
print(s.get(131))
print(s.get(200))
print(s.get(333, default='nice day'))

o/p-
world
morning
None
nice day

c) Using the at property of the Series object


The at property of a Series object can be used to access a single data value using its label/index. The
label can be a string or a number, but it must be the label itself and not the integer position. If the
label is not present in the Series object then it gives an error (KeyError).

#10 Accessing elements of Series using at property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
print(s.at['def'])

o/p-
world

d) Using the iat property of the Series object


The iat property of a Series object can be used to access a data value using the integer
position of the index. Here we can use the forward indexing method (i.e. the index starts
from 0 for the first element) or the backward indexing method (i.e. the last element is counted
first, having index -1, -2, ... and so on).
If the integer value is out of bounds then it gives an error.

#11 Accessing elements of Series using iat property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
print(s.iat[0])
print(s.iat[-1])

o/p-
hello
morning
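
The errors mentioned for at and iat can be observed with a small sketch. The list name errors, the label 'xyz' and the position 10 are arbitrary choices for this example.

```python
import pandas as pd

d = {'abc': 'hello', 'def': 'world', 'ghi': 'good', 'jkl': 'morning'}
s = pd.Series(d)

errors = []
try:
    s.at['xyz']                   # label is not present in the Series
except KeyError as err:
    errors.append(type(err).__name__)

try:
    s.iat[10]                     # position is outside 0..3
except IndexError as err:
    errors.append(type(err).__name__)

print(errors)
```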

e) Using the loc property of the Series object


The loc property of a Series object can be used to access a range of data values using the
label/index name inside [] brackets in the following ways:
1. A single index can be passed to the loc property. This will return back a single value.
2. A list of indexes can be passed. This will return back a Series object containing the
multiple values
3. A slice notation using labels/index such as startindex:stopindex. Here, contrary to the
usual Python slice, the ending index value is also included in the result.
4. A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

#12 Accessing elements of Series using loc property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
x=s.loc['def']
print(type(x))
print(x)
y=s.loc[['def', 'jkl']] #note the use of nested [[]]
print(type(y))
print('y=\n', y)
z=s.loc['def':'jkl']
print('z=\n', z)
m=s.loc[[False,True,True,False]] #note the use of nested [[]]
print('m=\n', m)

o/p-
<class 'str'>
world
<class 'pandas.core.series.Series'>
y=
def world
jkl morning
dtype: object
z=
def world
ghi good
jkl morning
dtype: object
m=
def world
ghi good
dtype: object

f) Using the iloc property of the Series object


The iloc property of a Series object can be used to access a range of data values using the
index position numbers inside [] brackets in the following ways:
1. A single int can be passed to the iloc property. This will return back a single value.
2. A list of int representing index position numbers can be passed. This will return back a
Series object containing the multiple values
3. A slice notation using index position numbers can be passed. The data values at the slice
position numbers will be included in the returned Series object (the stop position itself is excluded)
4. A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

#13 Accessing elements of Series using iloc property


import pandas as pd

d={'abc':'hello', 'def':'world', 'ghi':'good', 'jkl':'morning'}


s=pd.Series(d)
x=s.iloc[2]
print(type(x))
print(x)
y=s.iloc[[0,2]] #note the use of nested [[]]
print(type(y))
print('y=\n', y)
z=s.iloc[1:]
print('z=\n', z)
m=s.iloc[[False,True,True,False]] #note the use of nested [[]]
print('m=\n', m)

o/p-
<class 'str'>
good
<class 'pandas.core.series.Series'>
y=
abc hello
ghi good
dtype: object
z=
def world
ghi good
jkl morning
dtype: object
m=
def world
ghi good
dtype: object
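
The difference between slicing with loc and with iloc can be compared side by side. This is a minimal sketch; the names by_label and by_position are chosen only for this illustration.

```python
import pandas as pd

d = {'abc': 'hello', 'def': 'world', 'ghi': 'good', 'jkl': 'morning'}
s = pd.Series(d)

by_label = s.loc['def':'ghi']     # label slice: the end label 'ghi' is included
by_position = s.iloc[1:2]         # position slice: the stop position 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
```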

2. Accessing the top elements of a Series object


The head() method can be used to return back the top elements of a Series object. This function returns
back another Series object. If no parameter is passed to the head() method it returns back the top 5
elements. If an integer parameter (say n) is passed to the head() method, then the top n elements of the
Series object is returned back. The index of the respective elements is returned as it was in the original
object.
#14 Accessing the top elements of a Series object
import pandas as pd

L=[101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 201, 211]
s=pd.Series(L)
x=s.head()
print('x=\n', x)
y=s.head(3)
print('y=\n', y)

o/p:
x=
0 101
1 111
2 121
3 131
4 141
dtype: int64
y=
0 101
1 111
2 121
dtype: int64

3. Accessing the bottom elements of a Series object


The tail() method can be used to return back the bottom elements of a Series object. This function returns
back another Series object. If no parameter is passed to the tail() method it returns back the bottom 5
elements. If an integer parameter (say n) is passed to the tail() method, then the bottom n elements of the
Series object is returned back. The index of the respective elements is returned as it was in the original
object.

#15 Accessing the bottom elements of a Series object


import pandas as pd

L=[101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 201, 211]
s=pd.Series(L)
x=s.tail()
print('x=\n', x)
y=s.tail(3)
print('y=\n', y)
o/p:
x=
7 171
8 181
9 191
10 201
11 211
dtype: int64
y=
9 191
10 201
11 211
dtype: int64
4. Indexing/Slicing a Series object-
The index [] operator can be used to perform indexing and slicing operations on a Series object. The index[]
operator can accept either-
a) Index/labels
b) Integer index positions

a) Using the index operator with labels-


The index operator can be used in the following ways-
i) Using a single label inside the square brackets- Using a single label/index inside the square brackets
will return only the corresponding element referred to by that label/index.
# 16 indexing a Series object single label
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
t=s['b']
print(t)

o/p:
102

ii) Using multiple labels- We can pass multiple labels, in any order, that are present in the Series object.
The multiple labels must be passed as a list i.e. the multiple labels must be separated by commas and
enclosed in double square brackets. Passing a label that is not present in the Series object
must be avoided: older pandas versions returned NaN for such a label, but current versions
(pandas 1.0 onwards) raise a KeyError.
# 17 indexing a Series object multiple labels
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
u=s[['b', 'a', 'f']]
print(u)

o/p:
b 102
a 101
f 106
dtype: int64
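
The behaviour for a label that is not present can be checked with the sketch below. It assumes a recent pandas version (1.0 or later, to the best of our knowledge) and uses the reindex() method, which is not covered in these notes, as one way of getting NaN for a missing label. The label 'z' is made up for this example.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 102, 'c': 103})

try:
    s[['b', 'z']]                 # 'z' is not a label of s
    missing_label_error = False
except KeyError:
    missing_label_error = True

r = s.reindex(['b', 'z'])         # reindex() fills the missing label with NaN instead
print(missing_label_error)
print(r)
```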

iii) Using slice notation startlabel:endlabel- Inside the index operator we can pass startlabel:endlabel.
Here contrary to the slice concept all the items from startlabel values till the endlabel values including
the endlabel values is returned back.
# 18 indexing a Series object using startlabel:endlabel
import pandas as pd

d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}


s=pd.Series(d)
u=s['b': 'e']
print(u)

o/p:
b 102
c 103
d 104
e 105
dtype: int64

b) Slicing a Series object using Integer Index positions-


The concept of slicing a Series object is similar to that of slicing python lists, strings etc. Even though the data
type of the labels can be anything, each element of the Series object is associated with two integer numbers:

In forward indexing method the elements are numbered from 0, 1, 2, ... with 0 being assigned to the
first element, 1 being assigned to the second element and so on.

In backward indexing method the elements are numbered from -1, -2, -3, ... with -1 being assigned to
the last element, -2 being assigned to the second last element and so on.
For example consider the following Series object-
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
The Series object is having the following integer index positions-
forward indexing --->      0    1    2    3    4    5
labels                     a    b    c    d    e    f
data values              101  111  121  131  141  151
<--- backward indexing    -6   -5   -4   -3   -2   -1

Slice concept-
The basic concept of slicing using integer index positions is common to Python objects such as strings, lists,
tuples, Series, DataFrame etc. Slice creates a new object using elements of an existing object. It is created as:
ExistingObjectName[start : stop : step] where start, stop , step are integers

The basic rules of slice:


i. The slice generates index/integers from : start, start + step, start + step + step, and so on. All the
numbers generated must be less than the stop value when step is positive.
ii. If step value is missing then by default is taken to be 1
iii. If start value is missing and step is positive then start value is by default taken as 0.
iv. If stop value is missing and step is positive then stop value is by default taken to mean till you reach
the ending index(including the ending index)
v. A negative step value means the numbers are generated in backwards order i.e. from - start, then
start - step, then start -step -step and so on. All the numbers generated in negative step must be
greater than the stop value.
vi. If start value is missing and step is negative then start value takes default value -1
vii. If stop value is missing and step is negative then stop value is by default taken to be till you reach the
first element(including the 0 index element)
#19 Slicing a Series object
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

x=s[1: :2]
print('x=\n', x)

y=s[-1: :-1]
print('y=\n', y)

z=s[1: -2: 2]
print('z=\n', z)

o/p:
x=
b 111
d 131
f 151
dtype: int64
y=
f 151
e 141
d 131
c 121
b 111
a 101
dtype: int64
z=
b 111
d 131
dtype: int64
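
Since the slice rules are shared with Python lists, one way to check them is to apply the same slices to a list and to a Series. This is a sketch only; iloc is used so that the slices are always treated as integer positions, and the particular slices are arbitrary.

```python
import pandas as pd

values = [101, 111, 121, 131, 141, 151]
s = pd.Series(values, index=['a', 'b', 'c', 'd', 'e', 'f'])

# slice(start, stop, step) objects are equivalent to the start:stop:step notation
slices = [slice(1, None, 2), slice(-1, None, -1), slice(1, -2, 2),
          slice(None, 3), slice(4, 0, -2)]

# Each slice should pick the same values from the Series as from the list
matches = [list(s.iloc[sl]) == values[sl] for sl in slices]
print(matches)
```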

5. Modifying elements of Series object-


The elements of a Series object can be modified using any of the following methods-
a) Using index [] operator to modify single/multiple values
#20 Modifying a Series object index [] method
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s['c'] = 555
s[['f','a']] = [666,777]
print('s=\n', s)

s['b':'d']=[0,1,2]
print('s=\n', s)
o/p:
s=
a 777
b 111
c 555
d 131
e 141
f 666
dtype: int64
s=
a 777
b 0
c 1
d 2
e 141
f 666
dtype: int64

b) Using at/iat property to modify a single value


#21 Modifying a Series object at iat property
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s.at['d'] = 999
s.iat[-1] = 777
print('s=\n', s)

o/p:
s=
a 101
b 111
c 121
d 999
e 141
f 777
dtype: int64

c) Using loc, iloc property to modify single /multiple values


#22 Modifying a Series object loc iloc property
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s.loc['b'] = 9
s.loc['e':'f'] = [8,7]
print('s=\n', s)
s.iloc[1: :2] = [33,44,55]
print('s=\n', s)

o/p:
s=
a 101
b 9
c 121
d 131
e 8
f 7
dtype: int64
s=
a 101
b 33
c 121
d 44
e 8
f 55
dtype: int64
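
Besides a list of values, a single scalar value can also be assigned to a loc or iloc slice, in which case it is repeated for every selected element. A minimal sketch with arbitrary values:

```python
import pandas as pd

d = {'a': 101, 'b': 111, 'c': 121, 'd': 131, 'e': 141, 'f': 151}
s = pd.Series(d)

s.loc['b':'d'] = 0        # labels b, c and d (end label included) all become 0
s.iloc[-2:] = 9           # the last two positions both become 9

print('s=\n', s)
```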

d) Using slice method to modify multiple values


#23 Modifying a Series object slice method
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}


s=pd.Series(d)

s[1: :2] = [1,2,3]


print('s=\n', s)

o/p:
s=
a 101
b 1
c 121
d 2
e 141
f 3
dtype: int64

6. Changing indexes of Series object-


The index property can be used to change the indexes of a Series object.
#24 Changing indexes of Series object
import pandas as pd

d={'a':101, 'b':111, 'c':121, 'd':131}


s=pd.Series(d)

s.index = ['have','a','nice', 'day']


print('s=\n', s)
o/p:
s=
have 101
a 111
nice 121
day 131
dtype: int64
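
When the index property is assigned, the number of new labels has to match the number of elements. The sketch below shows the error for a wrong length and, as an alternative not covered in these notes, the rename() method for changing only some labels. The new labels are made up for this example.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 111, 'c': 121, 'd': 131})

try:
    s.index = ['x', 'y']              # only 2 labels for 4 elements
    length_error = False
except ValueError:
    length_error = True

renamed = s.rename({'a': 'first'})    # returns a new Series with only 'a' relabelled
print(length_error)
print(renamed)
```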

7. Vector/Arithmetic Operations on Series object-


All Arithmetic, Relational, Logical Operations are possible between a Series object and a scalar(single) value
as well as between two Series object.

Vector Operations: When an operation is performed on a Series object then all the elements of that object
take part in that operation. Such operations are known as Vector Operations. A Series object supports Vector
operations and the result that is returned is also a Series object.

Data Alignment: When Vector/Arithmetic operations are performed between two Series objects then the
data is aligned based on the common/matching labels/indexes between the two Series objects. In the case
where the data does not align due to incompatible data or corresponding index not being available then
usually a result of NaN is assigned to the result.
#25 Vector Operations on Series object
import pandas as pd

d={'a':21, 'b':5, 'c':11, 'd':3}


s1=pd.Series(d)

s2=s1*2
print('s2=\n', s2)

s3=s1>10
print('s3=\n', s3)

o/p:
s2=
a 42
b 10
c 22
d 6
dtype: int64
s3=
a True
b False
c True
d False
dtype: bool

#26 Arithmetic Operations on Series object


import pandas as pd
d1={'a':21, 'b':5, 'c':11, 'd':3}
s1=pd.Series(d1)

d2={'b':2, 'c':7, 'e':5 }


s2=pd.Series(d2)

s3=s1+s2
print('s3=\n', s3)

o/p:
s3=
a NaN
b 7.0
c 18.0
d NaN
e NaN
dtype: float64
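
If NaN is not wanted for labels present in only one of the two objects, the add() method with its fill_value parameter can be used in place of the + operator. A minimal sketch with the same data, where fill_value=0 is an arbitrary choice:

```python
import pandas as pd

s1 = pd.Series({'a': 21, 'b': 5, 'c': 11, 'd': 3})
s2 = pd.Series({'b': 2, 'c': 7, 'e': 5})

# A label missing from one Series is treated as 0 before the addition
total = s1.add(s2, fill_value=0)
print('total=\n', total)
```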

7. Deleting an element of Series object-


The following commands can be used to delete an element of a Series object.
a) del seriesobject[index]
The del command can be used to delete a particular element of the series object by specifying the
index.

b) seriesobject.drop(labels=[list_of_indexes], inplace=False)
The drop() method can be used to delete one or more elements by passing a list of labels/indexes to
the labels parameter. If the inplace parameter is not passed or is False then the new series object
with the elements deleted is returned back. If the inplace=True parameter is passed then the current
object is modified in place.
c) seriesobject.pop(index)
The pop() method is passed an index of the element that is to be deleted. This method returns back
the value of the element that is being deleted.

import pandas as pd

s1=pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
del s1['b']
print('s1=',s1)

s1.drop(labels=['c','e'], inplace=True)
print('s1=',s1)

x=s1.pop('d')
print('Deleted value:',x)
print('s1=',s1)

o/p:
s1= a 1
c 3
d 4
e 5
dtype: int64
s1= a 1
d 4
dtype: int64
Deleted value: 4
s1= a 1
dtype: int64
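
The effect of leaving out the inplace parameter of drop() can be seen in the following sketch, where the original object stays unchanged. The name s2 is chosen only for this illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

s2 = s1.drop(labels=['c', 'e'])   # inplace is False by default, so a new Series is returned

print(list(s1.index))             # s1 still has all five labels
print(list(s2.index))
```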
8. Boolean Indexing in Series object-
The index operator [], the loc and the iloc properties can be passed a boolean list containing the same
number of elements as that in the Series object. Wherever the True values are present, those elements are
selected and returned back as another Series object.

Inside the index operator a relational expression can also be passed that gives a boolean Series object having
the same number of elements as the Series object itself. If different boolean expressions are combined then
instead of using the 'and' operator the 'bitwise and' i.e. '&' is used, instead of 'or' operator the 'bitwise or'
i.e. '|' is used and instead of the 'not' operator the 'bitwise not' i.e. '~' is used.

Whenever multiple relational expressions are used for Boolean indexing then individual expressions must be
enclosed in parentheses since the bitwise and(&), or(|), not(~) operators have higher precedence than the
relational operators.
#Boolean indexing
import pandas as pd

s1=pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
s2=s1[[True,False,True,False,True]]
print('s2=',s2)

s3=s1[s1>=3]
print('s3=',s3)

s4=s1[(s1>=2)&(s1<=4)]
print('s4=',s4)

s5=s1[(s1<2)|(s1>4)]
print('s5=',s5)

s6=s1[~(s1==3)]
print('s6=',s6)

o/p:
s2= a 1
c 3
e 5
dtype: int64
s3= c 3
d 4
e 5
dtype: int64
s4= b 2
c 3
d 4
dtype: int64
s5= a 1
e 5
dtype: int64
s6= a 1
b 2
d 4
e 5
dtype: int64
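
The reason for using & instead of 'and' can be demonstrated with a short sketch: combining two boolean Series objects with 'and' raises a ValueError in pandas. The flag name and_failed is made up for this example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

try:
    s1[(s1 >= 2) and (s1 <= 4)]   # 'and' needs a single True/False, not a Series
    and_failed = False
except ValueError:
    and_failed = True

s4 = s1[(s1 >= 2) & (s1 <= 4)]    # element-wise combination with parentheses
print(and_failed)
print(s4)
```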

9. Mathematical Properties/Methods of Series object-


The following Mathematical functions are defined on Series objects:
Each entry lists the serial number, the property/method, its description and an example.
(import pandas as pd and import numpy as np are assumed to be already present in the examples)
1. is_monotonic
Returns True if the values in the object are monotonically increasing, otherwise returns False.
This property was removed in pandas 2.0; is_monotonic_increasing gives the same result.
d1={'a':21, 'b':55, 'c':61, 'd':93}
s1=pd.Series(d1)
print(s1.is_monotonic_increasing) #s1.is_monotonic in pandas versions before 2.0

o/p:
True

2. is_monotonic_decreasing
Returns True if the values in the object are monotonically decreasing, otherwise returns False.
d1={'a':9, 'b':7, 'c':5, 'd':1}
s1=pd.Series(d1)
print(s1.is_monotonic_decreasing)

o/p:
True

3. is_monotonic_increasing
Returns True if the values in the object are monotonically increasing, otherwise returns False.
d1={'a':9, 'b':7, 'c':5, 'd':1}
s1=pd.Series(d1)
print(s1.is_monotonic_increasing)

o/p:
False
4. ndim
Returns the number of dimensions of the underlying data. A Series object by definition has only one
dimension i.e. 1.
d1={'a':9, 'b':1, 'c':7, 'd':2}
s1=pd.Series(d1)
print(s1.ndim)

o/p:
1

5. shape, size
shape property returns a tuple (n,) containing a single element which is the number of elements in the
Series object.
size property returns an integer value containing the number of elements in the Series object.
d1={'a':9, 'b':1, 'c':7, 'd':2}
s1=pd.Series(d1)
print(s1.shape)
print(s1.size)

o/p:
(4,)
4
6. abs()
Returns a Series object with the absolute i.e. non-negative value of each numeric element.
s1=pd.Series([2.5, -3.2, 4, -99], index=['a','b','c','d'])
s2=s1.abs()
print('s2=\n', s2)
o/p:
s2=
a 2.5
b 3.2
c 4.0
d 99.0
dtype: float64
7. obj1.add(obj2, fill_value=None)
Returns the addition of the two Series element/index-wise.
fill_value: Fill existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation. If data in both corresponding Series locations is
missing the result will be missing.
#28 add Function on Series object
import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([10, 15, 20, np.nan], index=['a', 'b', 'd', 'e'])
c = a.add(b)
print('c=\n', c)

d=a.add(b, fill_value=100)
print('d=\n', d)

o/p:
c=
a 11.0
b 16.0
c NaN
d NaN
e NaN
dtype: float64
d=
a 11.0
b 16.0
c 101.0
d 120.0
e NaN
dtype: float64
8. obj1.radd(obj2, fill_value=None)
Reverse addition. It computes obj2+obj1 and otherwise works like the add function.
9. agg()
aggregate()
Both agg() and aggregate() methods are used to perform aggregate functions over a Series object. The
aggregate operation viz 'min', 'max', 'sum', 'mean', 'median', 'mode' is passed as a string parameter to
the agg()/aggregate() method.
a = pd.Series([23,11,2,7,7,2], index=['a','b','c','d','e','f'])
b = a.agg('min')
print('b=', b)
c = a.agg('sum')
print('c=', c)

d = a.aggregate('mean')
print('d=', d)

e = a.aggregate('mode') #multiple modes
print('e=\n', e)

f = a.aggregate('median')
print('f=', f)

o/p:
b= 2
c= 52
d= 8.666666666666666
e=
0 2
1 7
dtype: int64
f= 7.0
10. obj1.count()
Returns the number of non-NA/null observations in the Series.
import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
print(a.count())

o/p:
3
11. obj1.div(obj2, fill_value=None)
obj1.divide(obj2, fill_value=None)
obj1.truediv(obj2, fill_value=None)
obj1.rtruediv(obj2, fill_value=None)
obj1.rdiv(obj2, fill_value=None)
Returns the floating division of the two Series element/index-wise.
fill_value: Fill existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation. If data in both corresponding Series locations is
missing the result will be missing.
rdiv does reverse division i.e. obj2/obj1
#30 divide Function on Series object
import pandas as pd
import numpy as np

a = pd.Series([10, 12, 13, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([2, 5, 6, np.nan], index=['a', 'b', 'd', 'e'])
c = a.div(b)
print('c=\n', c)

d=a.divide(b, fill_value=3)
print('d=\n', d)

e=a.rdiv(b)
print('e=\n', e)

o/p:
c=
a 5.0
b 2.4
c NaN
d NaN
e NaN
dtype: float64
d=
a 5.000000
b 2.400000
c 4.333333
d 0.500000
e NaN
dtype: float64
e=
a 0.200000
b 0.416667
c NaN
d NaN
e NaN
dtype: float64
12. Relational functions on Series object:
obj1.eq(obj2, fill_value=None)
obj1.ne(obj2, fill_value=None)
obj1.gt(obj2, fill_value=None)
obj1.ge(obj2, fill_value=None)
obj1.lt(obj2, fill_value=None)
obj1.le(obj2, fill_value=None)
The eq, ne, gt, ge, lt, le functions are similar to the relational operators ==, !=, >, >=, <, <=.
These functions compare the elements index-wise and return back a Series object of True/False values.
If only one value is missing or NaN then the result is False (for ne the result is True, since NaN is
not equal to any value).
In addition the use of fill_value is:
fill_value: Fill existing missing (NaN) values, and any new element needed for successful Series
alignment, with this value before computation.
If data in both corresponding Series locations is missing/NaN the result will be False (True for ne).
x = pd.Series([12,2,3,4,np.nan], index=['a', 'b', 'c', 'd','f'])
y = pd.Series([1,9,4,5,np.nan], index=['a', 'b', 'd', 'e','f'])
m = x.eq(y)
print('m=\n', m)

n = x.gt(y)
print('n=\n', n)

p=x.le(y)
print('p=\n', p)

q=x.le(y,fill_value=10)
print('q=\n', q)

o/p:
m=
a False
b False
c False
d True
e False
f False
dtype: bool
n=
a True
b False
c False
d False
e False
f False
dtype: bool
p=
a False
b True
c False
d True
e False
f False
dtype: bool
q=
a False
b True
c True
d True
e False
f False
dtype: bool
13. equals()
It returns a single True value if all the elements of both Series objects match index-wise. If both
Series objects contain NaN at the same position then also it evaluates to True. Otherwise it returns False.
x = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
y = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
print(x.equals(y))

o/p:
True
14. fillna(value, inplace=False)
It fills NaN values with the value passed. If the inplace parameter is not passed or is False, then it
returns back another Series object with the NaN values filled with the value that was passed.
If inplace=True parameter is passed then the current object itself is modified.
x = pd.Series([1,2,3,4,np.nan], index=['a', 'b', 'c', 'd','e'])
y = x.fillna(0)
print('x=', x)
print('y=', y)

z = pd.Series([2,2,np.nan], index=['a', 'b', 'c'])
z.fillna(99,inplace=True)
print('z=', z)

o/p:
x= a 1.0
b 2.0
c 3.0
d 4.0
e NaN
dtype: float64
y= a 1.0
b 2.0
c 3.0
d 4.0
e 0.0
dtype: float64
z= a 2.0
b 2.0
c 99.0
dtype: float64
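
The fill value does not have to be a fixed constant. As a sketch only, and not something prescribed by these notes, the NaN values can be replaced by the mean of the remaining values:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e'])

# mean() skips NaN, so the mean of 1, 2, 3, 4 is used as the fill value
y = x.fillna(x.mean())
print('y=\n', y)
```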
15. obj1.floordiv(obj2, fill_value=None)
obj1.rfloordiv(obj2, fill_value=None)
Performs action similar to // i.e. integer/floor division with the addition of the fill_value argument.
x = pd.Series([10,7,9,5,np.nan], index=['a', 'b', 'c', 'd','e'])
y = pd.Series([3,2,5,4,7], index=['a', 'b', 'c', 'd','e'])
z = x.floordiv(y)
print('z=',z)

o/p:
z= a 3.0
b 3.0
c 1.0
d 1.0
e NaN
dtype: float64
16. max(), min(), mean(), median(), mode(), sum()
max-finds the max
min-finds the min
mean-finds the mean
median-finds the median
mode-finds the mode. Mode can return multiple values and it returns back another Series object as the result.
sum-finds the sum.
If any data is NaN then it is not counted while doing the calculation
a = pd.Series([2,1,2,1,np.nan], index=['a','b','c','d','e'])
b = a.min()
print('b=', b)

c = a.max()
print('c=', c)

d = a.mean()
print('d=', d)

e = a.median()
print('e=', e)

f = a.mode() #mode can return multiple values
print('f=\n', f)

g = a.sum()
print('g=', g)

o/p:
b= 1.0
c= 2.0
d= 1.5
e= 1.5
f=
0 1.0
1 2.0
dtype: float64
g= 6.0
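
Skipping of NaN is controlled by the skipna parameter of these methods, which is True by default. The sketch below compares the two settings for sum(); the variable names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

a = pd.Series([2, 1, 2, 1, np.nan], index=['a', 'b', 'c', 'd', 'e'])

with_skip = a.sum()                  # NaN is ignored by default
without_skip = a.sum(skipna=False)   # NaN takes part, so the result is NaN

print(with_skip, without_skip)
```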
17. obj1.mul(obj2, fill_value=None)
obj1.multiply(obj2, fill_value=None)
Similar to the * operator with the addition of the fill_value argument
a = pd.Series([2,1,3,4,np.nan], index=['a','b','c','d','e'])
b = pd.Series([4,5,7,np.nan], index=['a','b','d','e'])
c= a.mul(b)
print('c=\n', c)

o/p:
c=
a 8.0
b 5.0
c NaN
d 28.0
e NaN
dtype: float64
18. nlargest(), nsmallest()
If no parameter is passed, it returns the 5 largest/smallest elements in the Series object. If an
integer parameter x is passed then it returns the 'x' largest/smallest elements in the Series object.
a = pd.Series([2,1,3,4,7,10, 19, 21, 8, np.nan])
b = a.nlargest()
print('b=\n', b)

c=a.nsmallest(3)
print('c=\n', c)
o/p:
b=
7 21.0
6 19.0
5 10.0
8 8.0
4 7.0
dtype: float64
c=
1 1.0
0 2.0
2 3.0
dtype: float64
19 obj1.pow(obj2, fill_value=None)
Description: Similar to the ** operator with the option of using the fill_value parameter to fill NaN values.
Example:
a = pd.Series([2,1,3], index=['a','b','c'])
b = pd.Series([3,4,2], index=['a','b','d'])
c = a.pow(b)
print('c=\n', c)

o/p:
c=
a 8.0
b 1.0
c NaN
d NaN
dtype: float64
20 obj1.prod()
   obj1.product()
Description: Returns the product of the values in the Series object.
Example:
a = pd.Series([2,4,3], index=['a','b','c'])
b = a.prod()
print('b=', b)

o/p:
b= 24
21 obj1.round(decimals=0)
Description: Rounds each value in a Series to the given number of decimals. The parameter decimals has a default value of 0, i.e. if no parameter is specified then it rounds to integers. If decimals is negative, it specifies the number of positions to the left of the decimal point.
Example:
a = pd.Series([212.542,452.987,327.192], index=['a','b','c'])
b = a.round()
print('b=\n', b)
c = a.round(2)
print('c=\n', c)
d = a.round(-2)
print('d=\n', d)

o/p:
b=
a 213.0
b 453.0
c 327.0
dtype: float64
c=
a 212.54
b 452.99
c 327.19
dtype: float64
d=
a 200.0
b 500.0
c 300.0
dtype: float64
22 obj1.std(ddof=1)
Description: std() without any parameters takes the default ddof parameter as 1 and calculates the sample standard deviation:
    s = sqrt( sum((x_i - mean)^2) / (N - 1) )
If we want to calculate the population standard deviation, then use obj1.std(ddof=0), which is given by the formula:
    sigma = sqrt( sum((x_i - mean)^2) / N )
Example:
a = pd.Series([9, 2, 5, 4])
b = a.std() #calculates sample standard deviation
print('b=', b)
c = a.std(ddof=0) #calculates population standard deviation
print('c=', c)

o/p:
b= 2.943920288775949
c= 2.5495097567963922

23 obj1.var(ddof=1)
Description: var() without any parameters takes the default ddof parameter as 1 and calculates the sample variance:
    s^2 = sum((x_i - mean)^2) / (N - 1)
var(ddof=0) calculates the population variance:
    sigma^2 = sum((x_i - mean)^2) / N
Example:
a = pd.Series([9, 2, 5, 4])
b = a.var() #calculates sample variance
print('b=', b)
c = a.var(ddof=0) #calculates population variance
print('c=', c)

o/p:
b= 8.666666666666666
c= 6.5

24 obj1.sub(obj2, fill_value=None)
   obj1.subtract(obj2, fill_value=None)
   obj1.rsub(obj2, fill_value=None)
Description: Similar to the - operator with the option of using the fill_value parameter to fill NaN values. rsub performs the reverse subtract operation, i.e. obj2-obj1.
Example:
import pandas as pd
import numpy as np

a = pd.Series([9, 2, 5, 4])
b = pd.Series([1, 5, np.nan, 4])
c = a.sub(b)
print('c=\n', c)

d = a.rsub(b)
print('d=\n', d)

o/p:
c=
0 8.0
1 -3.0
2 NaN
3 0.0
dtype: float64
d=
0 -8.0
1 3.0
2 NaN
3 0.0
dtype: float64

10. Dropping empty/NaN values-


The dropna() method can be used to drop empty/NaN values. The syntax is:
seriesobject.dropna(inplace=False)
The empty string '' is not considered as an NaN value whereas the object None is considered as empty and is
removed by the dropna() method. The default value for inplace parameter is False and the dropna() method
returns a new Series object with the empty/NaN values removed. If the parameter inplace=True is passed
then the current object is modified inplace and None is returned back by the method.
#53 dropping empty NaN values
import pandas as pd
import numpy as np

s1=pd.Series([1,None,3,4,np.nan], index=['a','b','c','d','e'])
s2=s1.dropna()
print('s1=\n',s1)
print('s2=\n',s2)

s1.dropna(inplace=True)
print('s1=\n',s1)

o/p:
s1=
a 1.0
b NaN
c 3.0
d 4.0
e NaN
dtype: float64
s2=
a 1.0
c 3.0
d 4.0
dtype: float64
s1=
a 1.0
c 3.0
d 4.0
dtype: float64
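
The difference between the empty string and None mentioned above can be checked with a small sketch. The sample values below are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one empty string, one None and one NaN
s = pd.Series(['x', '', None, np.nan, 'y'], index=['a', 'b', 'c', 'd', 'e'])

cleaned = s.dropna()   # returns a new Series; s itself is not modified

print('cleaned=\n', cleaned)
print('labels kept:', list(cleaned.index))
print('length of s is still:', len(s))
```

The empty string under label 'b' is kept, whereas the None and NaN entries are dropped.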

11. Filling empty/NaN values-


The fillna() method can be used to fill empty/NaN values. The syntax is:
seriesobject.fillna(value, method=None,inplace=False,limit=None)
where -
value - is the value that is to be filled in place of empty/None/NaN values
method - is the filling method to be used, which can be one of :
'pad' / 'ffill' - front fill, propagate last valid observation forward to sequence of NaN values
till you reach the next valid value.
'backfill' / 'bfill' - back fill, take the next valid observation value and fill backwards the NaN
values till you reach a previous valid value
[Note: Either the value or the method parameter can be used. Both cannot be used together]
limit - the number of consecutive empty/NaN values to fill in. The remaining NaN values are left as is.
inplace - default value is False. If inplace=True is passed then the current object is modified inplace and None
is returned back by the method.
#53 filling empty NaN values
import pandas as pd
import numpy as np

s1=pd.Series([1,np.nan,np.nan,np.nan,5], index=['a','b','c','d','e'])
s2=s1.fillna(0)
print('s2=\n',s2)

s3=s1.fillna(method='ffill')
print('s3=\n',s3)

s4=s1.fillna(method='bfill')
print('s4=\n',s4)

s1.fillna(1.5,limit=2,inplace=True)
print('s1=\n',s1)

o/p:
s2=
a 1.0
b 0.0
c 0.0
d 0.0
e 5.0
dtype: float64
s3=
a 1.0
b 1.0
c 1.0
d 1.0
e 5.0
dtype: float64
s4=
a 1.0
b 5.0
c 5.0
d 5.0
e 5.0
dtype: float64
s1=
a 1.0
b 1.5
c 1.5
d NaN
e 5.0
dtype: float64
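
Recent pandas releases deprecate the method parameter of fillna(), so the forward fill and back fill shown above may produce a warning or an error depending on the installed version. A minimal sketch of the same fills using the dedicated ffill() and bfill() methods is given below. The variable s5 is an extra illustration of the limit parameter and is not part of the original example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, np.nan, np.nan, np.nan, 5], index=['a', 'b', 'c', 'd', 'e'])

s3 = s1.ffill()          # forward fill: propagate the last valid value
s4 = s1.bfill()          # back fill: propagate the next valid value
s5 = s1.ffill(limit=1)   # fill at most one consecutive NaN, leave the rest

print('s3=\n', s3)
print('s4=\n', s4)
print('s5=\n', s5)
```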
DataFrame

A DataFrame is a two-dimensional data structure in the python pandas library which stores heterogeneous
(different kinds of) data in different columns.

[Figure: layout of a DataFrame. The row labels (index) run along axis=0 and the column labels (columns) run along axis=1.]

Key features of DataFrame

The DataFrame contains labelled axes (rows and columns).


The rows are also known as axis=0 and the row labels are also known as index.
The columns are also known as axis=1 and the column labels are also known simply as columns.
Any operations on DataFrame are aligned on both row as well as column labels.
All elements within a single column have the same data type, but different columns can have different
data types.
DataFrame is size mutable as well as data-mutable
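
The label alignment and mutability features listed above can be seen in a short sketch. The two small DataFrames here are hypothetical and are used only for illustration.

```python
import pandas as pd

# Two hypothetical DataFrames that share only row 'r2' and column 'y'
df_a = pd.DataFrame({'x': [1, 2], 'y': [10, 20]}, index=['r1', 'r2'])
df_b = pd.DataFrame({'y': [1, 1], 'z': [5, 5]}, index=['r2', 'r3'])

# Addition is aligned on both row labels and column labels;
# positions with no matching label in both objects become NaN
total = df_a + df_b
print('total=\n', total)

# Size mutability: a new column can be added to an existing DataFrame
df_a['w'] = [100, 200]
print('columns of df_a:', list(df_a.columns))
```

Only the cell at row 'r2', column 'y' exists in both DataFrames, so it is the only cell of the result that is not NaN.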

For using the DataFrame object we must import the pandas library by using the statement:
import pandas as pd

Creating a DataFrame
The DataFrame() constructor is used to create a DataFrame. It can accept different kinds of input, so there
are many different ways of creating a DataFrame, some of which are:

1. Creating an Empty DataFrame


The DataFrame method when it is called with no parameters creates an empty DataFrame.

#1 Creating an Empty DataFrame


import pandas as pd

df1=pd.DataFrame()
print('df1=\n',df1)

o/p:
df1=
Empty DataFrame
Columns: []
Index: []
2. Creating a DataFrame from List of Lists
A two-dimensional nested list can be used to create a DataFrame. The columns parameter is used to
pass the names of the columns as a list.
#2 Creating a DataFrame from List of Lists
import pandas as pd

L = [['abc', 15], ['def', 16], ['ghi', 17]]


df1=pd.DataFrame(L, columns=['name', 'age'])
print('df1=\n',df1)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17

3. Creating a DataFrame from Dictionary of Lists/ndarrays/Series


A dictionary can be used to create a DataFrame. The key of the dictionary becomes the column label
and the values, which can be lists/ndarrays/Series objects, become the elements appearing under that
column. The row labels can be specified by passing to the index parameter, a list of row labels.
#3 Creating a DataFrame from Dictionary of Lists/ndarrays/Series
import pandas as pd
import numpy as np

d1 = {'name':['abc', 'def', 'ghi'], 'age':[15,16,17] }


df1 = pd.DataFrame(d1)
print('df1=\n',df1)

a1 = np.array(['jkl','mno','pqr'])
a2 = np.array([20,21,22])
d2 = { 'name' : a1, 'age' : a2 }
df2 = pd.DataFrame(d2, index=['r1', 'r2', 'r3'])
print('df2=\n', df2)

s1 = pd.Series(['stu','vw', 'xyz'])
s2 = pd.Series([23, 24, 25])
d3 = {'name':s1, 'age' : s2}
df3 = pd.DataFrame(d3)
print('df3=\n', df3)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17
df2=
name age
r1 jkl 20
r2 mno 21
r3 pqr 22
df3=
name age
0 stu 23
1 vw 24
2 xyz 25
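
When the dictionary values are Series objects, the rows are matched by the index labels of the Series rather than by position. The following sketch, with made-up labels and values, shows what happens when one Series has no entry for a label.

```python
import pandas as pd

# Hypothetical data: 'ages' has no entry for row label 'r2'
names = pd.Series(['abc', 'def', 'ghi'], index=['r1', 'r2', 'r3'])
ages = pd.Series([15, 17], index=['r1', 'r3'])

df = pd.DataFrame({'name': names, 'age': ages})
print('df=\n', df)   # the age for 'r2' appears as NaN
```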

4. Creating a DataFrame from List of Dictionary


A DataFrame can be created from a list of dictionaries. The elements of each dictionary are key:value
pairs. The keys of the dictionaries become the column names in the DataFrame object and the values of
the dictionaries become the column values of the DataFrame object. If any one of the column
names (keys) is missing from a particular dictionary, then that column has a NaN value in the
corresponding row of the DataFrame object.
#4 Creating a DataFrame from List of Dictionary
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]


df1 = pd.DataFrame(d1)
print('df1=\n',df1)

o/p:
df1=
name age class
0 abc 15 NaN
1 def 16 5.0

5. Creating a DataFrame from List of Dictionary and specifying row index


Similar to the previous example, we can create a DataFrame from a list of dictionaries and, in addition,
specify our own row labels instead of the default row labels by passing the
index=[list_of_row_labels] parameter when using the DataFrame() method.
#5 Creating a DataFrame from List of Dictionary and row index
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]


df1 = pd.DataFrame(d1, index=['r1', 'r2'])
print('df1=\n',df1)

o/p:
df1=
name age class
r1 abc 15 NaN
r2 def 16 5.0

6. Creating a DataFrame from List of Dictionary and specifying row / column index
Similar to the previous two examples we can use the index=[list_of_row_labels] and
columns=[list_of_column_labels] to specify the row index as well as the column index.

Here, while specifying the column labels, we have the flexibility of specifying only a limited list of
column names, in which case only the columns appearing in the list appear in the
DataFrame object.
Another flexibility is that if any additional column name is specified which does not exist in any of the
dictionaries, then that column is created in the DataFrame object and all the values appear as NaN under
that column.
#6 Creating a DataFrame from List of Dictionary and row / column index
import pandas as pd

d1 = [{'name':'abc', 'age':15 }, {'name': 'def', 'age':16, 'class':5} ]

df1 = pd.DataFrame(d1, index=['r1', 'r2'], columns=['name','age']) #column class is left out


print('df1=\n',df1)

df2 = pd.DataFrame(d1, index=['r1', 'r2'], columns=['name','age','marks']) # column marks is added


print('df2=\n',df2)

o/p:
df1=
name age
r1 abc 15
r2 def 16
df2=
name age marks
r1 abc 15 NaN
r2 def 16 NaN

7. Creating a DataFrame using csv files / Writing to csv file


A csv file can be imported directly to a DataFrame object using the read_csv() method. The read_csv()
method has many parameters to control the kind of data imported.

The parameter sep='char' can be used to specify the character used to separate the column values, by
default it is the comma(,).

The parameter index_col=int can be used to specify which column the row labels are to be taken
from. The int is the number of the column containing the row labels. The first
column has index 0, the second column has index 1, and so on.

Similar to importing of data from a csv file, data from a DataFrame object can be exported to a csv file
using the to_csv() method. The to_csv() method has many parameters to control the kind of data
exported. The parameter index=False will not export the index as a column in the csv file. The
parameter header=False will omit writing of the column names to the csv file being exported.

Content of file 'students.csv' (shown as a table):
name age
abc 15
def 16
ghi 17

Content of file 'newdata.csv' (shown as a table):
regno name age hometown
101 abc 15 lll
111 def 16 mmm
121 ghi 17 nnn
#7 Creating a DataFrame using csv files / Writing to csv file
import pandas as pd
df1 = pd.read_csv('students.csv')
print('df1=\n',df1)

df2 = pd.read_csv('newdata.csv',sep=',', index_col=0 )


print('df2=\n',df2)

df1.to_csv('newfile1.csv')
df1.to_csv('newfile2.csv',index=False, header=False)

o/p:
df1=
name age
0 abc 15
1 def 16
2 ghi 17
df2=
name age hometown
regno
101 abc 15 lll
111 def 16 mmm
121 ghi 17 nnn

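Content of file 'newfile1.csv' (shown as a table; the first column holds the exported index):
  name age
0 abc 15
1 def 16
2 ghi 17

Content of file 'newfile2.csv' (shown as a table; no index and no header):
abc 15
def 16
ghi 17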
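
The sep, index_col, index and header parameters described above can be tried without creating any files, as in the sketch below. It is only an illustration: the semicolon-separated text is made up, io.StringIO stands in for a file on disk, and to_csv() is called without a file name so that it returns the csv text as a string.

```python
import io
import pandas as pd

# Hypothetical csv text that uses ';' instead of ',' as the separator
csv_text = "regno;name;age\n101;abc;15\n111;def;16\n"

# sep tells read_csv which separator to expect;
# index_col=0 takes the row labels from the first column (regno)
df = pd.read_csv(io.StringIO(csv_text), sep=';', index_col=0)
print('df=\n', df)

# With no file name, to_csv returns the text instead of writing a file.
# index=False drops the row labels, header=False drops the column names.
out = df.to_csv(index=False, header=False)
print(out)
```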
Common properties/attributes of DataFrames
Assume DataFrame df1 is as defined below:
dict1={'students':['abc', 'def','ghi'],
       'marks': [24.5, 27.5, 30],
       'sports': ['cricket', 'badminton', 'football']}
df1=pd.DataFrame(dict1,index=['I','II','III'])
print('df1=\n',df1)

o/p:
df1=
students marks sports
I abc 24.5 cricket
II def 27.5 badminton
III ghi 30.0 football

SrNo Attribute Description Example


1 index displays the index (row labels) of DataFrame print('index is:\n', df1.index)
o/p:
index is:
Index(['I', 'II', 'III'], dtype='object')
2 columns displays the column labels of the DataFrame print('columns are:\n', df1.columns)
o/p:
columns are:
Index(['students', 'marks', 'sports'], dtype='object')
3 axes Returns a list containing both the axes elements print('axes are:\n', df1.axes)
o/p:
axes are:
[Index(['I', 'II', 'III'], dtype='object'),
Index(['students', 'marks', 'sports'], dtype='object')]
4 dtypes Returns the dtype of data of each columns print("dtypes are:\n", df1.dtypes)
o/p:
dtypes are:
students object
marks float64
sports object
dtype: object
5 size Returns the number of elements in the object print('size are:\n', df1.size)
o/p:
size are:
9
6 shape Returns a tuple () representing the dimensions print('shape is :\n', df1.shape)
(rows,columns) of the DataFrame o/p:
shape is :
(3, 3)
7 ndim Returns an int representing the number of axes print('ndim is :\n', df1.ndim)
o/p:
ndim is :
2
8 empty Returns True/False to show if the DataFrame is print('Is DataFrame empty:\n', df1.empty)
empty o/p:
Is DataFrame empty:
False
9 T Displays the Transpose of the DataFrame print('Transpose is:\n', df1.T)
o/p:
Transpose is:
I II III
students abc def ghi
marks 24.5 27.5 30
sports cricket badminton football
10 len Displays the number of rows of the DataFrame print('Number of rows of DataFrame is:', len(df1))
o/p:
Number of rows of DataFrame is: 3
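
The attributes above are related to each other, which the following sketch checks on the same df1. The names two_cols and empty_df are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'students': ['abc', 'def', 'ghi'],
                    'marks': [24.5, 27.5, 30],
                    'sports': ['cricket', 'badminton', 'football']},
                   index=['I', 'II', 'III'])

rows, cols = df1.shape                 # shape is (rows, columns)
print(df1.size == rows * cols)         # size is rows multiplied by columns
print(len(df1) == rows)                # len() gives the number of rows

two_cols = df1[['students', 'marks']]  # 3 rows, 2 columns
print(two_cols.T.shape)                # the transpose swaps rows and columns

empty_df = pd.DataFrame()
print(empty_df.empty, empty_df.size)   # an empty DataFrame has size 0
```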
DataFrame Operations
Assume DataFrame df1 is as defined below:
dict1={'students':['abc', 'def','ghi'],
       'marks': [24.5, 27.5, 30],
       'sports': ['cricket', 'badminton', 'football']}
df1=pd.DataFrame(dict1,index=['I','II','III'])
print('df1=\n',df1)

o/p:
df1=
students marks sports
I abc 24.5 cricket
II def 27.5 badminton
III ghi 30.0 football

#1. Selecting/Accessing a single column


print("Students column is :\n", df1['students']) #using square brackets to access column

o/p:
Students column is :
I abc
II def
III ghi
Name: students, dtype: object

print("Marks column is :\n", df1.marks) #using dot notation to access a column

o/p:
Marks column is :
I 24.5
II 27.5
III 30.0
Name: marks, dtype: float64
The square bracket notation (df1['students'], df1[2017]) can be used whether the column names are
strings ('students') or numbers (2017). The dot notation (df1.marks) can only be used when the column
name is a string that is a valid Python identifier and does not clash with an existing DataFrame
attribute or method. Hence we use the square bracket notation in general for all cases.

#2. Selecting/Accessing Multiple columns


print("Students and Marks columns are:\n", df1[['students','marks']])
# use two square brackets only to access multiple columns
o/p:
Students and Marks columns are:
students marks
I abc 24.5
II def 27.5
III ghi 30.0

#3. Selecting subset of rows/columns from Dataframe using row names and column names
print('Displaying subset:\n', df1.loc['I':'II', 'students':'marks'])
o/p:
Displaying subset:
students marks
I abc 24.5
II def 27.5
<dataframe>.loc[ <startrow> : <endrow> , <startcolumn> : <endcolumn> ]
Used to access a subset of a DataFrame using row index names and column index names. Both the
start and the end labels are included in the result.
#4. Selecting subset of rows/columns using row numbers and column numbers
print('Displaying subset using row and column index numbers:\n', df1.iloc[0:2, 1:3])

o/p:
Displaying subset using row and column index numbers:
marks sports
I 24.5 cricket
II 27.5 badminton
<dataframe>.iloc[ <startrow index> : <endrow index> : <step value> ,
<startcolumn index> : <endcolumn index> : <step value> ]
Used to access a subset of a DataFrame using row index numbers and column index numbers as a
row slice and a column slice. If the step value is not written it is assumed to be 1. As with ordinary
Python slices, the end index is not included in the result.
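
The two slicing styles can be compared side by side. The sketch below uses the same df1 and assumes nothing beyond it. The names by_label, by_position and every_other are introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'students': ['abc', 'def', 'ghi'],
                    'marks': [24.5, 27.5, 30],
                    'sports': ['cricket', 'badminton', 'football']},
                   index=['I', 'II', 'III'])

# Label based slice: the end labels 'II' and 'marks' are included
by_label = df1.loc['I':'II', 'students':'marks']

# Position based slice: the end positions 2 and 2 are excluded
by_position = df1.iloc[0:2, 0:2]

print(by_label.equals(by_position))   # both select the same 2 x 2 block

# A step value of 2 picks every other row (rows I and III)
every_other = df1.iloc[::2, :]
print('every_other=\n', every_other)
```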

#5. Selecting/Accessing individual value using column name and row name
print("Value in row I column student is:\n", df1.students['I'])
# after dot there is column name and inside square bracket the row name
o/p:
Value in row I column student is:
abc

#6. Selecting/Accessing individual value using column name and row number
print("Value in row 0 column sports is:\n", df1.sports[0])
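# after dot there is column name and inside square bracket the row number
# (recent pandas versions warn when an integer is used this way on a labelled column;
# df1.sports.iloc[0] is the explicit positional form)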
o/p:
Value in row 0 column sports is:
cricket

#7. Selecting/Accessing individual value using at attribute i.e. using row name and column
name
print("Accessing individual value using at attribute:\n", df1.at['II','students'])
# after .at there is square bracket and then inside it row name and column name
o/p:
Accessing individual value using at attribute:
def

#8. Selecting/Accessing individual value using iat attribute i.e. using numeric row index and
column index
print("Accessing individual value using iat attribute:\n", df1.iat[2,2])
# after .iat there is square bracket and then inside it row number and column number
o/p:
Accessing individual value using iat attribute:
football

Difference between at, iat, loc, iloc:


at - used to access a single element of a DataFrame using row index name and column index name
iat - used to access a single element of a DataFrame using row index number and column index
number
loc - used to access a group of rows and columns using row index name and column index name
iloc - used to access a group of rows and columns using row index number and column index number
#9. Changing a single data value
df1.students['I']='xyz'
df1.sports[0]='chess'
df1.at['II','students']='pqr'
df1.iat[2,2]='carrom'
print('df1(After updating values)=\n', df1)

o/p:
df1(After updating values)=
students marks sports
I xyz 24.5 chess
II pqr 27.5 badminton
III ghi 30.0 carrom
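All four ways of accessing an individual value described previously can also be used to change
that value. Note that the first two forms (df1.students['I'] and df1.sports[0]) are chained
assignments: recent pandas versions warn about them, and with Copy-on-Write enabled they may
leave df1 unchanged, so at, iat, loc and iloc are the safer choices.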
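
A minimal sketch of changing single values with a single indexing step is shown below. It assumes the same df1 as before and uses only at, iat, loc and iloc, which act directly on the DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({'students': ['abc', 'def', 'ghi'],
                    'marks': [24.5, 27.5, 30],
                    'sports': ['cricket', 'badminton', 'football']},
                   index=['I', 'II', 'III'])

df1.at['I', 'students'] = 'xyz'    # row name, column name
df1.loc['I', 'sports'] = 'chess'   # loc also accepts a single row/column pair
df1.iat[2, 2] = 'carrom'           # row number, column number
df1.iloc[1, 0] = 'pqr'             # iloc also accepts a single position pair

print('df1(After updating values)=\n', df1)
```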

#10. Adding / Changing a column (same value in all rows)


df1['hometown'] = 'otp' # column named hometown is added to the DataFrame
print('df1(After adding column hometown)=\n', df1)

o/p:
df1(After adding column hometown)=
students marks sports hometown
I xyz 24.5 chess otp
II pqr 27.5 badminton otp
III ghi 30.0 carrom otp
The value 'otp' is appearing across all rows of the DataFrame.

#11. Adding / Changing a column (different values in all rows)


df1['hometown'] = ['ottapalam', 'shoranur', 'palakkad']
print('df1(After updating column hometown)=\n', df1)

o/p:
df1(After updating column hometown)=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30.0 carrom palakkad
The value ['ottapalam', 'shoranur', 'palakkad'] is appearing across the rows of the DataFrame.

#12. Adding / Changing row (same value in all columns)


df1.at['IV', :] = 'rrr'
print('df1(After adding row IV )=\n', df1)

o/p:
df1(After adding row IV )=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30 carrom palakkad
IV rrr rrr rrr rrr
The value 'rrr' is appearing across the columns of the DataFrame.

Only at and loc can be used to add a new row as well as modify an existing one, since they
address rows by label. iat and iloc address rows by position and cannot refer to a row that
does not exist yet.

#13. Adding / Changing row (different values in all columns)


df1.loc['IV',: ] = ['mno', 25.5, 'football', 'delhi']
print('df1(After updating row 3 )=\n', df1)

o/p:
df1(After updating row 3 )=
students marks sports hometown
I xyz 24.5 chess ottapalam
II pqr 27.5 badminton shoranur
III ghi 30 carrom palakkad
IV mno 25.5 football delhi
The value ['mno', 25.5, 'football', 'delhi'] is appearing across the columns of the DataFrame.


#14. Deleting a column


del df1['hometown']
print('df1(After deleting column hometown )=\n', df1)

df1.drop(['sports'], axis=1, inplace=True)


print('df1(After deleting column sports )=\n', df1)

s1=df1.pop('marks')
print('Deleted column is:\n', s1)
print('df1(After deleting column marks )=\n', df1)

o/p:
df1(After deleting column hometown )=
students marks sports
I xyz 24.5 chess
II pqr 27.5 badminton
III ghi 30 carrom
IV mno 25.5 football
df1(After deleting column sports )=
students marks
I xyz 24.5
II pqr 27.5
III ghi 30
IV mno 25.5
Deleted column is:
I 24.5
II 27.5
III 30
IV 25.5
Name: marks, dtype: object
df1(After deleting column marks )=
students
I xyz
II pqr
III ghi
IV mno
There are three ways of deleting a column of a DataFrame:
a) using the python del command as:
del dataframeobject[columnname]
b) using the dataframe drop() method
The drop command can be used to delete rows (axis=0) or columns(axis=1).
The first parameter is a list containing either the row index names or the column index
names.
The parameter inplace=True is used to modify/delete the dataframe df1 itself. If this
parameter is not specified or is False then the dataframe df1 is not modified, instead it
returns a new dataframe with the row or column deleted.
c) Using the pop('columnname') method
The pop() method is used to delete a single column from a DataFrame. In addition, the
column that was deleted is returned back as a Series object.
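
The effect of leaving out inplace=True can be seen in the following sketch. It starts again from a fresh three-column df1. The names without_sports and same_thing are introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'students': ['abc', 'def', 'ghi'],
                    'marks': [24.5, 27.5, 30],
                    'sports': ['cricket', 'badminton', 'football']},
                   index=['I', 'II', 'III'])

# Without inplace=True, drop returns a new DataFrame and df1 keeps all columns
without_sports = df1.drop(['sports'], axis=1)
print(list(df1.columns))

# The columns keyword is an alternative to passing axis=1
same_thing = df1.drop(columns=['sports'])
print(without_sports.equals(same_thing))

marks = df1.pop('marks')   # removes the column from df1 and returns it as a Series
del df1['sports']          # removes the column from df1, returns nothing
print(list(df1.columns))
```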

#15. Deleting a row


df1.drop(['II','III'], axis=0,inplace=True) #first parameter contains multiple row labels to be deleted
print('df1(After deleting row 1 and 2 )=\n', df1)

o/p:
df1(After deleting row 1 and 2 )=
students marks sports
I xyz 24.5 chess
IV mno 25.5 football
The drop command can be used to delete rows (axis=0) or columns(axis=1).

If multiple rows are to be deleted then the first parameter must contain the list of row names to be
deleted.

#16. head() and tail() functions


The head() function is used to retrieve the top rows of a DataFrame whereas the tail() function is
used to retrieve the bottom rows of a DataFrame. If no parameter is passed, then it retrieves the top
5 or bottom 5 rows.

If a positive value, n, is passed to the head function then it retrieves the top n rows. If a negative
value, -n, is passed to the head function, then it returns all the rows except the last n rows.

Similarly, if a positive value, n, is passed to the tail function then it retrieves the bottom n rows of the
DataFrame. If a negative value, -n, is passed to the tail function then all the rows except the first n
rows are retrieved.

These functions are useful for quickly verifying the data for example after sorting or adding rows.

#10 head and tail functions


import pandas as pd
import numpy as np

d={'students':['a', 'b','c','d','e','f','g','h','i','j'],
'marks': [25,21,8,9,15,29,np.nan,25,24,30]}
df1=pd.DataFrame(d)
df2=df1.head()
print('df2=\n',df2)

df3=df1.head(-7)
print('df3=\n',df3)

print(df1.tail(2))

o/p:
df2=
students marks
0 a 25.0
1 b 21.0
2 c 8.0
3 d 9.0
4 e 15.0
df3=
students marks
0 a 25.0
1 b 21.0
2 c 8.0
students marks
8 i 24.0
9 j 30.0
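
The example above does not show a negative value being passed to tail(). A short sketch with made-up marks covers that case next to head().

```python
import pandas as pd

# Hypothetical data with ten rows labelled 0 to 9
df1 = pd.DataFrame({'students': list('abcdefghij'),
                    'marks': list(range(10))})

first_three = df1.head(-7)   # all rows except the last 7
last_three = df1.tail(-7)    # all rows except the first 7

print('first_three=\n', first_three)
print('last_three=\n', last_three)
```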

#17. Boolean Indexing


The following three methods of accessing DataFrame elements, can be passed a Boolean array to
select specific rows of the DataFrame:
a) index operator[]
b) loc property
c) iloc property
This method of accessing the rows of the DataFrame based on a Boolean array is known as Boolean
Indexing.

The length of the Boolean array passed must match the number of rows/indexes of the DataFrame,
otherwise an error is raised. The Boolean array can also be a Series object which can be derived
from applying a relational operator to one or more columns of the DataFrame. Different relational
expressions can be combined using the bitwise and (&), or (|), not (~) operators. When using the
bitwise operators the individual relational expressions must be enclosed in parentheses () as the
bitwise operators have higher precedence than the relational operators.

#11 Boolean Indexing


import pandas as pd
import numpy as np

d={'students':['a', 'b','c','d','e','f','g','h','i','j'],
'marks': [25,21,8,9,15,29,np.nan,25,24,30],
'hobby': ['mm','nn','oo','pp','qq','rr','ss','t','uu','vv']}
df1=pd.DataFrame(d)

df2=df1[[True, True, False,False,False,False,False,False,True, True]]


print('df2=\n',df2)

df3=df1.loc[[True, True, False,False,False,False,False,False,False, False]]


print('df3=\n',df3)

df4=df1.iloc[[False,False, False,False,False,False,False,False,True, True]]


print('df4=\n',df4)

#display the details of student having marks > 25


df5=df1[df1['marks'] >25]
print('df5=\n',df5)

#display details of students having marks between 20 and 25 (both limits excluded)


df6=df1[(df1['marks'] >20) & (df1['marks'] <25)]
print('df6=\n',df6)

o/p:
df2=
students marks hobby
0 a 25.0 mm
1 b 21.0 nn
8 i 24.0 uu
9 j 30.0 vv
df3=
students marks hobby
0 a 25.0 mm
1 b 21.0 nn
df4=
students marks hobby
8 i 24.0 uu
9 j 30.0 vv
df5=
students marks hobby
5 f 29.0 rr
9 j 30.0 vv
df6=
students marks hobby
1 b 21.0 nn
8 i 24.0 uu
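
The or (|) and not (~) operators mentioned above are not used in the example, so a small sketch on the same marks data is given here. The between() call is an extra illustration of a range test that includes both limits.

```python
import numpy as np
import pandas as pd

d = {'students': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
     'marks': [25, 21, 8, 9, 15, 29, np.nan, 25, 24, 30]}
df1 = pd.DataFrame(d)

# or: marks below 10 or at least 29 (each condition in its own parentheses)
low_or_high = df1[(df1['marks'] < 10) | (df1['marks'] >= 29)]

# not: rows where "marks > 20" is False; a comparison with NaN is False,
# so student g (NaN marks) is also selected here
not_above_20 = df1[~(df1['marks'] > 20)]

# between() includes both limits by default
in_range = df1[df1['marks'].between(20, 25)]

print(list(low_or_high['students']))
print(list(not_above_20['students']))
print(list(in_range['students']))
```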

Iterating over a DataFrame


Generally, if some columns of a DataFrame need to be worked on, the columns are extracted using
df[column_name] or a similar method, and if rows need to be processed, df.loc or df.iloc is used.
These methods should be preferred wherever possible, as they are optimized for performance.
Only on the rare occasion that we really need to visit the rows or the columns of a DataFrame one
at a time should the iteration methods be used. The following methods can be used to iterate
over a DataFrame:
1. Iterate directly over a DataFrame
2. Use the df.items() method (df.iteritems() in pandas versions before 2.0)
3. Use the df.iterrows() method
4. Use the df.itertuples() method

The iteration methods usually hand back copies of the data rather than the DataFrame itself. So we must
not modify the values obtained while iterating, as such changes are not reflected in the original DataFrame.
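
The point about copies can be demonstrated with a short sketch. The DataFrame below is made up and mixes text and numbers, in which case each row handed out by iterrows() is a separate Series. The name after_loop is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['abc', 'def'], 'age': [19, 20]},
                  index=['s1', 's2'])

# Changing the row Series inside the loop does not change df
for label, row in df.iterrows():
    row['age'] = row['age'] + 1

after_loop = df['age'].tolist()
print(after_loop)             # the ages in df are unchanged

# Working on the whole column updates df itself
df['age'] = df['age'] + 1
print(df['age'].tolist())
```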
#18. Iterating directly over a DataFrame
Iterating directly over a DataFrame gives the column names.
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Iterating over DataFrame')
for i in df:
    print(i)    #i is a column name
o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Iterating over DataFrame
name
age
hobby

#19. Use the df.iteritems() or df.items() method


The df.items() method yields two objects at each iteration - the first one is the column name and
the second one is a Series object having all the values of that particular column. Older pandas
versions also offered df.iteritems() with the same effect; it was removed in pandas 2.0.
#Using iteritems
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using iteritems')
for cname,cseries in df.items(): #df.iteritems() gave the same results before pandas 2.0
    print('cname:',cname)
    print('cseries:\n',cseries)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using iteritems
cname: name
cseries:
s1 abc
s2 def
s3 ghi
Name: name, dtype: object
cname: age
cseries:
s1 19
s2 20
s3 21
Name: age, dtype: int64
cname: hobby
cseries:
s1 reading
s2 playing
s3 gardening
Name: hobby, dtype: object

#20. Use the df.iterrows() method


Using the df.iterrows() method we get back two objects - the first object is the row label or index and
the second object is a Series object containing the elements of one particular row at each iteration.
The Series object has index as the column name and the value of Series object is the value under that
particular column for that particular row.
#Using iterrows
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using iterrows')
for rname,rseries in df.iterrows():
    print('rname:',rname)
    print('rseries:\n',rseries)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using iterrows
rname: s1
rseries:
name abc
age 19
hobby reading
Name: s1, dtype: object
rname: s2
rseries:
name def
age 20
hobby playing
Name: s2, dtype: object
rname: s3
rseries:
name ghi
age 21
hobby gardening
Name: s3, dtype: object

#21. Use the df.itertuples() method


Using the df.itertuples() method we get back a named tuple for each row of the DataFrame. [ Note:
Named tuple is not there in syllabus ]

The first element of the named tuple is the row label and the remaining elements are the values
under different columns for that particular row.
#Using itertuples
import pandas as pd

d={ 'name': ['abc','def','ghi'],
    'age': [19,20,21],
    'hobby':['reading','playing','gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
print('Using itertuples')
for r in df.itertuples():
    print(r)

o/p:
name age hobby
s1 abc 19 reading
s2 def 20 playing
s3 ghi 21 gardening
Using itertuples
Pandas(Index='s1', name='abc', age=19, hobby='reading')
Pandas(Index='s2', name='def', age=20, hobby='playing')
Pandas(Index='s3', name='ghi', age=21, hobby='gardening')
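
The values inside each named tuple can be read by field name, as the sketch below shows on the same data. The lists labels, ages and plain are introduced only for illustration. Passing index=False and name=None to itertuples() gives ordinary tuples without the row label.

```python
import pandas as pd

d = {'name': ['abc', 'def', 'ghi'],
     'age': [19, 20, 21],
     'hobby': ['reading', 'playing', 'gardening']}
df = pd.DataFrame(d, index=['s1', 's2', 's3'])

labels = []
ages = []
for r in df.itertuples():
    labels.append(r.Index)   # the row label is stored in the field Index
    ages.append(r.age)       # each column value is stored under its column name

print(labels)
print(ages)

# Ordinary tuples, one per row, without the row label
plain = list(df.itertuples(index=False, name=None))
print(plain)
```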
