XII IP Ch 1 Python Pandas - I Series
Pandas
• Pandas is a software library written for the Python
programming language, used for data manipulation and analysis.
import pandas as pd
Data Structures in Pandas
• A data structure is a way of storing and
organizing data in a computer so that it
can be accessed and worked with in an
appropriate way.
• Pandas provides the following data structures:
– 1D Series
– 2D DataFrame
– 3D Panel (deprecated; removed in pandas 0.25 and later)
Series
• A Series is a one-dimensional data
structure that holds homogeneous data:
all values share one data type
(int, float, string, etc.)
• It contains
– A sequence of values (actual data)
– Associated data labels or index
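The two parts of a Series can be inspected separately. The short sketch below uses made-up labels 'A', 'B', 'C' purely for illustration and prints the values and the index of the same object.

```python
import pandas as pd

# A Series pairs each value with an index label
# (the labels 'A', 'B', 'C' are illustrative only)
s = pd.Series([47, 58, 69], index=['A', 'B', 'C'])

print(s.values)   # the actual data, as a NumPy array
print(s.index)    # the associated data labels
```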
DataFrame
• A DataFrame is a data structure that stores data in
two-dimensional form (rows and columns).
• Different columns may store values of different
datatypes.
• Within a single column, all values have the same
type.
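As a rough illustration of the points above, the sketch below builds a small DataFrame from a dictionary of lists and prints the datatype of each column. The column names and values are sample data chosen for this example.

```python
import pandas as pd

# Each dictionary key becomes a column; each column has its own dtype
df = pd.DataFrame({
    'Name': ['Arpit', 'Jai', 'Piyush'],
    'Class': ['XII', 'XII', 'XII'],
    'Marks': [47, 58, 69],
})

print(df.dtypes)   # one dtype per column
print(df.shape)    # (rows, columns)
```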
Series vs. DataFrame
                SERIES                      DATAFRAME
Dimension       1-dimensional               2-dimensional
Type of data    Homogeneous: all values     Heterogeneous: can have
                of the same data type       values of different data types
Mutability      Value mutable               Value mutable, size mutable
Example (Series):
0    47
1    58
2    69
3    85
4    74
Example (DataFrame):
     Name Class  Marks
0   Arpit   XII     47
1     Jai   XII     58
2  Piyush   XII     69
3  Aditya   XII     85
Series
Creating Empty Series
import pandas as pd
s=pd.Series(dtype=object)   # an explicit dtype avoids a warning in some pandas versions
print(s)
Method 1
Creating Series using List/Tuple
import pandas as pd
S=pd.Series([47,58,69,85,74])
print(S)
Output:
0    47
1    58
2    69
3    85
4    74
dtype: int64
import pandas as pd
A=[47,58,69,85,74]
S=pd.Series(A)
print(S)
Output:
0    47
1    58
2    69
3    85
4    74
dtype: int64
Mentioning Index while creating Series
import pandas as pd
S=pd.Series([47,58,69,85,74], ['A', 'B', 'C', 'D', 'E'])
print(S)
Output:
A    47
B    58
C    69
D    85
E    74
dtype: int64
Method 2
S=pd.Series([47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 3
S=pd.Series(data=[47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])
Method 4
S=pd.Series(index=['A', 'B', 'C', 'D', 'E'], data=[47,58,69,85,74])
A 47
B 58
C 69
D 85
E 74
dtype: int64
Q1. Write a program (WAP) to create the given Series.
Abhinav    20
Udit       22
Ansh       25
Jai        18
Kunal      23
Arushi     22
dtype: int64
import pandas as pd
marks=[20,22,25,18,23,22]
names=['Abhinav','Udit','Ansh','Jai','Kunal','Arushi']
S=pd.Series(marks, index=names)
print(S)
Q2. Write a program (WAP) to create the given Series using a Dictionary.
Q1    50000
Q2    47000
Q3    52500
Q4    36000
dtype: int64
import pandas as pd
d={'Q1':50000,'Q2':47000,'Q3':52500,'Q4':36000}
s=pd.Series(d)
print(s)
Method 6
Creating Series using range() function
import pandas as pd
s=pd.Series(range(101,106), index=range(1,6))
print(s)
1 101
2 102
3 103
4 104
5 105
dtype: int64
Method 7
Creating Series using Numpy Array (ndarray)
import numpy as np
import pandas as pd
n=np.array([2,4,6,8])
s=pd.Series(n)
print(s)
Output:
0    2
1    4
2    6
3    8
dtype: int64
Method 8
Creating Series using Scalar/ Constant value
import pandas as pd
s=pd.Series(55, index=[1,2,3,4,5])
print(s)
Output:
1    55
2    55
3    55
4    55
5    55
dtype: int64
Using mixed datatypes while creating
Series
• All values in a Series share a single datatype.
• If the values given at the time of creating a Series are
of different types, pandas converts them to one common
type according to the following precedence:
• String (object) > float > int
• In the example below, the integers are converted to float
because one value (25.6) is a float.
import pandas as pd
s = pd.Series([10,20,25.6,30,40])
print(s)
Output:
0    10.0
1    20.0
2    25.6
3    30.0
4    40.0
dtype: float64
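The precedence rule can be checked by creating three small Series and comparing their dtypes. This is only a sketch; the variable names and the sample value 'abc' are chosen for illustration.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])             # only int values
with_float = pd.Series([10, 20, 25.6])     # int and float values
with_text = pd.Series([10, 25.6, 'abc'])   # int, float and string values

print(ints.dtype)         # integer dtype
print(with_float.dtype)   # float dtype, the ints are upcast
print(with_text.dtype)    # object dtype, nothing narrower fits
```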
Specifying the Datatype while creating
the Series
import pandas as pd
days=[31,28,31,30,31]
mon=['Jan','Feb','Mar','Apr','May']
s=pd.Series(data=days, index=mon, dtype=float)
print(s)
Output:
Jan    31.0
Feb    28.0
Mar    31.0
Apr    30.0
May    31.0
dtype: float64
Specify missing values
• A missing value can be specified with the Python
keyword None.
• When None is added to a numeric Series, it is stored
as NaN and the datatype changes to float.
import pandas as pd
s=pd.Series([10,20,None,40,None])
print(s)
Output:
0    10.0
1    20.0
2     NaN
3    40.0
4     NaN
dtype: float64
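Missing values can also be written as np.nan from NumPy, and isnull() reports where they are. A small sketch, assuming NumPy is installed alongside pandas:

```python
import numpy as np
import pandas as pd

# None and np.nan are both stored as NaN in a numeric Series
s = pd.Series([10, 20, None, 40, np.nan])

print(s.dtype)      # float64
print(s.isnull())   # True where the value is missing
```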
Specifying duplicate indexes
• The index labels of a Series object are not
required to be unique.
• There can be duplicate entries in the index.
import pandas as pd
A=[10,20,30,40,50,60]
B=[1,2,3,1,3,5]
s=pd.Series(A,index=B)
print(s)
Output:
1    10
2    20
3    30
1    40
3    50
5    60
dtype: int64
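One consequence of duplicate labels is that looking up such a label returns every matching entry. The sketch below reuses the same data and uses .loc for label-based access:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=[1, 2, 3, 1, 3, 5])

print(s.loc[1])            # label 1 occurs twice, so a Series of two values
print(s.loc[5])            # label 5 occurs once, so a single value
print(s.index.is_unique)   # False, because labels repeat
```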
Specifying data/ indexes using a loop
import pandas as pd
s=pd.Series(range(1,15,3), index=[x for x in 'abcde'])
print(s)
a 1
b 4
c 7
d 10
e 13
dtype: int64
Getting number of rows and count of non-NA
values in a Series
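• The len() function gives the number of rows in a
Series, including rows with missing values.
• The count() method gives the number of non-NA values.
• For s=pd.Series([10,20,None,40,None]):
print(len(s))      Output: 5
print(s.count())   Output: 3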
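The difference between the two counts is the number of missing values. A minimal sketch (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, None, 40, None])

total = len(s)               # all rows, missing or not
non_missing = s.count()      # rows that hold a value
missing = s.isnull().sum()   # rows that are NaN

print(total, non_missing, missing)
```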
Accessing Data from Series
• Data can be accessed from a Series using the
user-defined labels or in-built indexes.
                       Indexing            Slicing
                       (Single value)      (Part of a Series)
In-built (0,1,2…)      only +ve            +ve, -ve
Example, for a Series with the in-built index, e.g. S=pd.Series([10,20,30,40,50]):
print(S[-3:-1])
Output:
2    30
3    40
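S=pd.Series([10,20,30,40,50], index=['A', 'B', 'C', 'D', 'E'])
Indexing (Single value)
print(S[0])          Output: 10
print(S[-1])         Output: 50
print(S['D'])        Output: 40
Slicing (Part of a Series)
print(S[0:2])
A    10
B    20
print(S[-3:-1])
C    30
D    40
print(S['B':'D'])
B    20
C    30
D    40
• Note: positional access such as S[0] or S[-1] on a Series with
user-defined labels is deprecated in recent pandas versions;
S.iloc[0] and S.iloc[-1] are the explicit positional forms.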
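The same lookups can be written with the explicit accessors .iloc (position) and .loc (label), which behave the same across pandas versions. A short sketch using the Series from this slide:

```python
import pandas as pd

S = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

first = S.iloc[0]            # by position
last = S.iloc[-1]            # negative positions count from the end
d_value = S.loc['D']         # by label

by_pos = S.iloc[0:2]         # positional slice, end position excluded
by_label = S.loc['B':'D']    # label slice, end label included

print(first, last, d_value)
print(by_pos)
print(by_label)
```

Note that a label slice includes its end label, while a positional slice excludes its end position.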
Attributes of Series
Attribute   Description
index       The index (row labels)
values      NumPy representation of the Series
dtype       dtype of the data
shape       a tuple representing the dimensions
nbytes      returns the number of bytes
ndim        number of dimensions
size        number of elements
hasnans     returns True if there are any NaN values, else False
empty       True / False (Series is empty or not)
name        returns the name of the Series, can be changed
Application of Attributes
import pandas as pd
s=pd.Series({'Jan':31 , 'Feb':28 , 'Mar':31 , 'Apr':30 })
s=pd.Series([10,20,30,40,50,60], index=['A','B','C','D','E','F'])
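A possible way to try out the attributes on the second Series is sketched below. The name 'Marks' is a hypothetical label added only to demonstrate the name attribute.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'B', 'C', 'D', 'E', 'F'])
s.name = 'Marks'   # hypothetical name, set here for illustration

print(s.index)     # row labels
print(s.values)    # data as a NumPy array
print(s.dtype)     # int64
print(s.shape)     # (6,)
print(s.nbytes)    # 6 values x 8 bytes each
print(s.ndim)      # 1
print(s.size)      # 6
print(s.hasnans)   # False
print(s.empty)     # False
print(s.name)      # Marks
```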
Vector operations
• When a function or expression is applied to a
Series, it is applied to each item of the
object individually.
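For instance, a scalar operation on a Series acts on every element without an explicit loop. A minimal sketch with sample values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

plus_five = s + 5     # 5 is added to every element
squared = s ** 2      # every element is squared
above_25 = s > 25     # element-wise comparison gives a boolean Series

print(plus_five)
print(squared)
print(above_25)
```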
Arithmetic
• When you perform arithmetic operations on
two Series objects, the data is first aligned on the
basis of matching indexes (called Data
Alignment) and the operation is then performed.
• For non-overlapping indexes, the arithmetic
operation results in NaN.
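Data alignment can be seen with two Series whose indexes only partly overlap. The sketch below uses illustrative labels; the add() method with fill_value is shown as one way to treat a missing side as 0 instead of getting NaN.

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
b = pd.Series([1, 2, 3], index=['B', 'C', 'D'])

total = a + b                     # 'A' and 'D' have no partner, so NaN
filled = a.add(b, fill_value=0)   # missing side is treated as 0

print(total)
print(filled)
```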
Filtering Entries
A=pd.Series([10,20,30,40,50], index=[11,12,13,14,15])
print(A)
Jan 31
Feb 29
Mar 31
Apr 30
May 31
dtype: int64
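Filtering is usually done with a boolean condition placed inside square brackets. The sketch below applies this to the Series A created in this section; the conditions themselves are chosen only as examples.

```python
import pandas as pd

A = pd.Series([10, 20, 30, 40, 50], index=[11, 12, 13, 14, 15])

mask = A > 30                          # boolean Series, True where the condition holds
high = A[mask]                         # keeps only the entries marked True
middle = A[(A >= 20) & (A <= 40)]      # conditions are combined with & and parentheses

print(mask)
print(high)
print(middle)
```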
Removing Values from the Series
Temporary Permanent
Jan 31 Jan 31
Feb 29 Feb 29
Mar 31 Mar 31
May 31 May 31
dtype: int64 dtype: int64
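The commands behind the two outputs are not shown, so the following is only one plausible sketch: drop() returns a new Series and leaves the original unchanged (temporary), while del removes the entry from the Series itself (permanent). The data matches the month-wise output above, with 'Apr' assumed to be the removed label.

```python
import pandas as pd

s = pd.Series({'Jan': 31, 'Feb': 29, 'Mar': 31, 'Apr': 30, 'May': 31})

# Temporary: drop() returns a copy without 'Apr'; s itself is unchanged
temp = s.drop('Apr')
rows_after_drop = len(s)   # still 5

# Permanent: del removes 'Apr' from s itself
del s['Apr']

print(temp)
print(s)
```

Assigning the result back, as in s = s.drop('Apr'), is another common way to make the removal permanent.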
Viewing Values
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})
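The viewing commands for this Series are not shown here. As an assumption about what is intended, the sketch below uses head() and tail(), two standard Series methods for viewing the first and last few entries.

```python
import pandas as pd

A = pd.Series({'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30})

first_two = A.head(2)   # first two entries
last_two = A.tail(2)    # last two entries

print(first_two)
print(last_two)
print(A.values)         # only the values, without the index
```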