XII IP Ch 1 Python Pandas - I Series

Pandas is a powerful open-source library for data analysis in Python, created by Wes McKinney, that supports various data formats and operations on datasets. It provides key data structures like Series (1D) and DataFrame (2D) for organizing and manipulating data, along with methods for data selection, reshaping, and visualization. The document also covers installation, creating Series, accessing data, and performing operations on Series objects.

Python Pandas - I

Pandas
• Pandas is a software library written for the Python programming language, used for data analysis.

• It is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool.

• The main author of Pandas is Wes McKinney.


Key features of Pandas
• It can read and write many different data formats.
• It can calculate in all the possible ways data is organized, i.e. across rows and down columns.
• It can easily select subsets of data from bulky datasets, combine multiple datasets and also find and fill missing data.
• It allows operations to be applied to independent groups within the data.
• It supports reshaping of data into different forms.
• It supports advanced time-series functionality.
• It supports visualization by integrating with the matplotlib and seaborn libraries.
Installing Pandas
• To install the pandas library, use the following command:

pip install pandas


Working in Pandas
• To work with Pandas, we need to import the pandas library into the Python environment:

import pandas as pd
Data Structures in Pandas
• A data structure is a way of storing and organizing data in a computer so that it can be accessed and worked with in an appropriate way.
• Pandas originally provided three data structures:
– 1D Series
– 2D DataFrame
– 3D Panel (deprecated in pandas 0.20 and removed in pandas 0.25; current versions provide only Series and DataFrame)
Series
• A Series is a one-dimensional data structure which holds homogeneous data, i.e. values of the same type (int, string, float, etc.).
• It contains
– A sequence of values (actual data)
– Associated data labels or index
DataFrame
• A DataFrame is a data structure which stores data in two-dimensional form (rows and columns).
• Different columns may store values of different datatypes.
• A single column will have values of the same type.
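A minimal sketch of the points above, using a small illustrative DataFrame (the variable name df and the sample rows are assumptions made for this example):

```python
import pandas as pd

# Illustrative data: each column holds values of a single type,
# but different columns hold different types.
df = pd.DataFrame({
    'Name': ['Arpit', 'Jai', 'Piyush'],
    'Class': ['XII', 'XII', 'XII'],
    'Marks': [47, 58, 69],
})

print(df)
print(df.dtypes)   # one datatype per column
print(df.ndim)     # 2, i.e. two-dimensional
```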
Series vs. DataFrame
               SERIES                     DATAFRAME
Dimension      1 Dimensional              2 Dimensional
Type of Data   Homogeneous, all values    Heterogeneous, can have values
               of same data type.         of different data types.
Mutable        Value mutable              Value mutable, size mutable
Example        0    47                       Name    Class  Marks
               1    58                    0  Arpit   XII    47
               2    69                    1  Jai     XII    58
               3    85                    2  Piyush  XII    69
               4    74                    3  Aditya  XII    85
Series
Creating Empty Series

import pandas as pd
s=pd.Series()
print(s)
Method 1
Creating Series using List/Tuple

import pandas as pd
S=pd.Series([47,58,69,85,74])
print(S)

Output:
0    47
1    58
2    69
3    85
4    74
dtype: int64

import pandas as pd
A=[47,58,69,85,74]
S=pd.Series(A)
print(S)

Output:
0    47
1    58
2    69
3    85
4    74
dtype: int64
Mentioning Index while creating Series

import pandas as pd
S=pd.Series([47,58,69,85,74], ['A', 'B', 'C', 'D', 'E'])
print(S)

Output:
A    47
B    58
C    69
D    85
E    74
dtype: int64
Method 2
S=pd.Series([47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])

Method 3
S=pd.Series(data=[47,58,69,85,74], index=['A', 'B', 'C', 'D', 'E'])

Method 4
S=pd.Series(index=['A', 'B', 'C', 'D', 'E'], data=[47,58,69,85,74])

Note:
1. Tuples can be used in place of lists.
2. If the indexes are not given, the default indexes (0, 1, 2, …) are used.
Method 5
Creating Series using a Dictionary

S=pd.Series({'A':47 , 'B':58 , 'C':69 , 'D':85 , 'E':74})

A 47
B 58
C 69
D 85
E 74
dtype: int64
Q1. WAP (write a program) to create the given Series.

Abhinav    20
Udit       22
Ansh       25
Jai        18
Kunal      23
Arushi     22
dtype: int64

import pandas as pd
marks=[20,22,25,18,23,22]
names=['Abhinav','Udit','Ansh','Jai','Kunal','Arushi']
S=pd.Series(marks, index=names)
print(S)
Q2. WAP (write a program) to create the given Series using a Dictionary.

Q1    50000
Q2    47000
Q3    52500
Q4    36000
dtype: int64

import pandas as pd
d={'Q1':50000,'Q2':47000,'Q3':52500,'Q4':36000}
s=pd.Series(d)
print(s)
Method 6
Creating Series using range() function

import pandas as pd
s=pd.Series(range(101,106), index=range(1,6))
print(s)
1 101
2 102
3 103
4 104
5 105
dtype: int64
Method 7
Creating Series using Numpy Array (ndarray)

import numpy as np
import pandas as pd
n=np.array([2,4,6,8])
s=pd.Series(n)
print(s)

Output:
0    2
1    4
2    6
3    8
dtype: int64
Method 8
Creating Series using Scalar/ Constant value

import pandas as pd
s=pd.Series(55, index=[1,2,3,4,5])
print(s)

Output:
1    55
2    55
3    55
4    55
5    55
dtype: int64
Using mixed datatypes while creating
Series
• A Series can store values of only one datatype.
• If the values given at the time of creating a Series are of different types, the Series takes a single datatype according to the following precedence (highest first):
• String (object) > float > int

import pandas as pd
s = pd.Series([10,20,25.6,30,40])
print(s)

Output:
0    10.0
1    20.0
2    25.6
3    30.0
4    40.0
dtype: float64
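The precedence can be checked by inspecting the dtype attribute. The sketch below uses three illustrative Series (the variable names and values are assumptions for this example); note that when a string is mixed with numbers, the Series dtype becomes object.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])             # integers only
mixed_num = pd.Series([10, 20, 25.6])      # int + float
mixed_str = pd.Series([10, 2.5, 'abc'])    # numbers + string

print(ints.dtype)        # int64
print(mixed_num.dtype)   # float64
print(mixed_str.dtype)   # object
```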
Specifying the Datatype while creating the Series

import pandas as pd
days=[31,28,31,30,31]
mon=['Jan','Feb','Mar','Apr','May']
s=pd.Series(data=days, index=mon, dtype=float)
print(s)

Output:
Jan    31.0
Feb    28.0
Mar    31.0
Apr    30.0
May    31.0
dtype: float64
Specify missing values
• Missing values are denoted by the keyword None in Python.
• When None is added to a numeric Series, it is stored as NaN (Not a Number) and the datatype changes to float.

import pandas as pd
s=pd.Series([10,20,None,40,None])
print(s)

Output:
0    10.0
1    20.0
2     NaN
3    40.0
4     NaN
dtype: float64
Specifying duplicate indexes
• While creating a Series object, the index values are not required to be unique.
• There can be duplicate entries in the index.

import pandas as pd
A=[10,20,30,40,50,60]
B=[1,2,3,1,3,5]
s=pd.Series(A,index=B)
print(s)

Output:
1    10
2    20
3    30
1    40
3    50
5    60
dtype: int64
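One consequence of duplicate indexes, sketched here with the same data: looking up a label that occurs once returns a single value, while a label that occurs more than once returns a Series of all matching entries. The use of .loc and index.is_unique is an addition for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=[1, 2, 3, 1, 3, 5])

print(s.loc[2])            # label 2 occurs once: a single value, 20
print(s.loc[1])            # label 1 occurs twice: a Series holding 10 and 40
print(s.index.is_unique)   # False, because labels 1 and 3 repeat
```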
Specifying data/ indexes using a loop

import pandas as pd
s=pd.Series(range(1,15,3), index=[x for x in 'abcde'])
print(s)
a 1
b 4
c 7
d 10
e 13
dtype: int64
Getting number of rows and count of non-NA values in a Series
For the Series s=pd.Series([10,20,None,40,None]):

• The len() function can be used to find the number of rows in a Series, including missing values.

print(len(s))       5

• count() can be used to count the non-NaN values.

print(s.count())    3
Accessing Data from Series
• Data can be accessed from a Series using the
user-defined labels or in-built indexes.
In-built index (0,1,2…)
Indexing (single value): in-built index (only +ve)
Slicing (part of a Series): +ve, -ve

S = pd.Series([10,20,30,40,50])

print(S[0])        10
print(S[-1])       Error
print(S[0:2])      0    10
                   1    20
print(S[-3:-1])    2    30
                   3    40
User-defined index (numeric)
Indexing (single value): user-defined index
Slicing (part of a Series): +ve, -ve

S = pd.Series([10,20,30,40,50], index=[3,4,5,6,7])

print(S[3])        10
print(S[0])        Error
print(S[-1])       Error
print(S[0:2])      3    10
                   4    20
print(S[-3:-1])    5    30
                   6    40
User-defined index (text)
Indexing (single value): user-defined, +ve, -ve
Slicing (part of a Series): user-defined, +ve, -ve

S = pd.Series([10,20,30,40,50], index=['A', 'B', 'C', 'D', 'E'])

print(S[0])         10
print(S[-1])        50
print(S['D'])       40
print(S[0:2])       A    10
                    B    20
print(S[-3:-1])     C    30
                    D    40
print(S['B':'D'])   B    20
                    C    30
                    D    40
Attributes of Series
Attribute Description
index The index (row labels)
values NumPy representation of the Series
dtype dtype of the data
shape a tuple representing the dimensions
nbytes returns the number of bytes
ndim number of dimensions
size number of elements
hasnans returns True if there are any NaN values, else False
empty True / False (Series is empty or not)
name returns the name of the Series, can be changed
Application of Attributes
import pandas as pd
s=pd.Series({'Jan':31 , 'Feb':28 , 'Mar':31 , 'Apr':30 })

print(s.index)      # Index(['Jan', 'Feb', 'Mar', 'Apr'], dtype='object')
print(s.values)     # [31 28 31 30]
print(s.dtype)      # int64
print(s.shape)      # (4,)
print(s.nbytes)     # 32
print(s.ndim)       # 1
print(s.size)       # 4
print(s.hasnans)    # False
print(s.empty)      # False
print(s.name)       # None
s.name='Days'
print(s.name)       # Days
Operations on Series Object
• Modifying Elements of Series Object
• The head() and tail() Functions
• Vector operations
• Arithmetic
• Filtering Entries
• Sorting Series Values
• Adding & Removing values from Series Object
Modifying Elements of Series Object
Renaming indexes
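The examples for this topic did not survive in the text, so the following is only a sketch of one common way to modify values and rename indexes; the Series and the new labels are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

s['B'] = 25             # change a single value using its label
s.loc['C':'D'] = 0      # a label slice includes both ends, so C and D change
print(s)

# Renaming indexes: assign a new list of labels of the same length.
s.index = ['W', 'X', 'Y', 'Z']
print(s)
```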
The head() and tail() Functions
• The head() function returns the first n rows and
tail() function returns the last n rows.
• If n is not specified, the default value is 5.

s=pd.Series([10,20,30,40,50,60], index=['A','B','C','D','E','F'])
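A short sketch of head() and tail() on the Series defined above, with and without the argument n:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=['A', 'B', 'C', 'D', 'E', 'F'])

print(s.head())     # n not given, so the first 5 rows: A to E
print(s.head(2))    # first 2 rows: A and B
print(s.tail())     # n not given, so the last 5 rows: B to F
print(s.tail(3))    # last 3 rows: D, E and F
```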
Vector operations
• If we apply a function or expression then it is
individually applied on each item of the
object.
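For instance, the expressions below are each applied to every item of an illustrative Series (the data is an assumption for this sketch); each returns a new Series and leaves s unchanged.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

print(s + 5)     # 5 is added to every value
print(s * 2)     # every value is doubled
print(s ** 2)    # every value is squared
print(s > 25)    # every value is compared, giving True/False per item
```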
Arithmetic
• When you perform arithmetic operations on two Series objects, the data is first aligned on the basis of matching indexes (called Data Alignment) and then the arithmetic is performed.
• For non-overlapping indexes, the arithmetic operation results in NaN.
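A minimal sketch of data alignment, using two illustrative Series (a and b are assumed names) that share only the indexes B and C:

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
b = pd.Series([1, 2, 3], index=['B', 'C', 'D'])

total = a + b    # values are matched by index label, not by position
print(total)
# A     NaN
# B    21.0
# C    32.0
# D     NaN
# dtype: float64
```

The result is of type float64 because NaN is a floating-point value.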
Filtering Entries
A=pd.Series([10,20,30,40,50], index=[11,12,13,14,15])

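• A condition applied to a Series performs a vector operation and results in True/False for each element.
• Using that condition as the index of the Series returns the filtered result, i.e. only the entries which fulfil the condition.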
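The original expressions for this topic are not present in the text; the sketch below assumes the condition A > 30 purely for illustration and shows both behaviours on the Series A defined above.

```python
import pandas as pd

A = pd.Series([10, 20, 30, 40, 50], index=[11, 12, 13, 14, 15])

mask = A > 30       # vector operation: True/False for each element
print(mask)

print(A[mask])      # filtered result: only the entries where the condition holds
print(A[A > 30])    # the same filter written in a single step
```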
Sorting Series Values
• You can sort the Series on the basis of values or indexes.
• To sort on the basis of values
seriesobject.sort_values([ascending=True|False])

• To sort on the basis of indexes
seriesobject.sort_index([ascending=True|False])
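A short sketch of both methods on an illustrative Series (the data and variable names are assumptions). Each call returns a new sorted Series and leaves the original unchanged.

```python
import pandas as pd

s = pd.Series([47, 85, 58, 74], index=['D', 'A', 'C', 'B'])

by_value = s.sort_values()                      # ascending order by default
by_value_desc = s.sort_values(ascending=False)  # descending order of values
by_index = s.sort_index()                       # ordered by the labels A, B, C, D

print(by_value)
print(by_value_desc)
print(by_index)
```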
Adding Values in the Series
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

A['Feb']=29 # modifies the value as index exists
A['May']=31 # adds a value

print(A)

Output:
Jan    31
Feb    29
Mar    31
Apr    30
May    31
dtype: int64
Removing Values from the Series
Temporary (returns a new Series without 'Apr'; A itself is unchanged):
A.drop('Apr')

Permanent (A itself is changed):
A.drop('Apr', inplace=True)
OR
A = A.drop('Apr')
print(A)

In both cases the result is:
Jan    31
Feb    29
Mar    31
May    31
dtype: int64
Viewing Values
A=pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})

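• Using user-defined Index

print(A['Mar'])
print(A.loc['Mar'])

• Using in-built Index

print(A[0])
print(A.iloc[0])

Note: recent pandas versions (2.1 and later) issue a FutureWarning for A[0] when the index consists of text labels, so A.iloc[0] is the safer form for positional access.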
