Uploaded by

Naren Kumar

1/05/20

INTRODUCING PYTHON PANDAS


PANDAS
•Pandas (or Python Pandas) is Python's library for data analysis.

•Pandas derives its name from "panel data", a term used for structured data sets.

•It is useful for data analysis and manipulation.

Data analysis: the process of evaluating large data sets using statistical tools to discover useful information and draw conclusions that support business decision-making.

•Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures.
WHY Pandas?
Pandas is capable of many tasks, including the following:
•It can read and write data in many different formats (integer, float, double, etc.).

•It can perform calculations in whichever way the data is organised, i.e. across rows and down columns.

•It can easily select subsets of data from bulky data sets and even combine multiple data sets.

•It has functionality to find and fill missing data.

•It allows you to apply operations to independent groups within the data.

•It supports reshaping data into different forms.

•It supports advanced time series functionality. (Time series forecasting is the use of a model to predict future values based on previously observed values.)

•It supports data visualization.


2/05/20
DATA STRUCTURE IN PANDAS
DATA STRUCTURE:-
A data structure is a specialized way of storing and organizing data in a computer so that it can be accessed and operated on as required.

Pandas deals with 3 data structures:

1. Series
2. DataFrame
3. Panel
Only Series and DataFrame are in our syllabus. (Panel has since been removed from pandas, as of version 0.25.)
SERIES

Series :- A Series is a one-dimensional, array-like structure holding homogeneous data (that is, data of the same kind), which can be used to handle and manipulate data. What makes it special is its index attribute, which offers powerful functionality and can be relabelled as needed.

It has two parts:

1. Data part (an array of the actual data)
2. Associated index (an array of indexes, or data labels, associated with the data)
e.g.
Index   Data
0       10
1       15
2       18
3       22
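The two parts described above can be inspected directly. The following is a minimal sketch using the values from the table; the letter labels in the second Series are made up purely for illustration.

```python
import pandas as pd

# Data part: the values from the table above.
# Index part: left to the default labels 0, 1, 2, 3.
s = pd.Series([10, 15, 18, 22])

data_part = s.values.tolist()    # the actual data
index_part = s.index.tolist()    # the labels attached to the data

# The index does not have to be numeric.
# These labels are hypothetical, chosen only for this example.
labelled = pd.Series([10, 15, 18, 22], index=["a", "b", "c", "d"])
value_at_c = labelled["c"]       # look up a value by its label

print(data_part, index_part, value_at_c)
```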
•Pandas data structures are enhanced versions of NumPy structured arrays.

•FOR WORKING IN PANDAS WE GENERALLY IMPORT BOTH THE PANDAS AND NUMPY LIBRARIES

•NumPy is used because some Pandas functions return their results as NumPy arrays (the Pandas library's data manipulation capabilities are built on top of the NumPy library).
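As a small illustration of how the two libraries work together, the sketch below pulls the data of a Series out as a NumPy array and applies a NumPy function to the Series. The sample values are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 15, 18, 22])

# The data part of a Series can be obtained as a NumPy array.
arr = s.to_numpy()
is_ndarray = isinstance(arr, np.ndarray)

# NumPy functions accept a Series and hand back a Series with the same index.
roots = np.sqrt(s)

print(type(arr).__name__, type(roots).__name__)
```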
04/05/20

CREATION OF SERIES FROM


•Ndarray
•Dictionary

•Scalar value
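The three ways of creating a Series listed above might look like this. It is a minimal sketch; the month names and the labels "a", "b", "c" are illustrative values, not taken from the slides.

```python
import numpy as np
import pandas as pd

# 1. From an ndarray: the default index 0, 1, 2, ... is attached.
from_ndarray = pd.Series(np.array([10, 15, 18, 22]))

# 2. From a dictionary: keys become the index, values become the data.
from_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})

# 3. From a scalar value: the value is repeated once for every index label,
#    so an index has to be supplied.
from_scalar = pd.Series(5, index=["a", "b", "c"])

print(from_ndarray, from_dict, from_scalar, sep="\n\n")
```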
5/05/20

USE OF MATHEMATICAL FUNCTION TO CREATE DATA ARRAY IN Series()

•Series() lets us pass an expression or function that calculates the values of the data sequence.

•e.g.

import pandas as pd
import numpy as np

a=np.arange(9,13)
print(a)
[ 9 10 11 12]

S=pd.Series(index=a,data=a*2)
S
Out[6]:
9     18
10    20
11    22
12    24
dtype: int32
6/05/20

SERIES OBJECT ATTRIBUTES

•When we create a Series, all information related to it (such as its size, its datatype, etc.) is available through attributes.

•We can use these attributes in the following format to get information about the Series object:
<series object>.<attribute name>

ATTRIBUTE        DESCRIPTION
Series.index     The index (axis labels) of the Series
                 s.index
                 RangeIndex(start=0, stop=4, step=1)
Series.values    Return the Series as an ndarray or ndarray-like, depending on the dtype
                 s.values
                 array([2, 6, 7, 9])
Series.dtype     Return the dtype object of the underlying data
                 s.dtype
                 dtype('int32')
Series.size      Return the number of elements in the underlying data
                 print(s.size)
                 4
7/05/20

ATTRIBUTE        DESCRIPTION
Series.itemsize  Return the size in bytes of the dtype of each item of the underlying data (this attribute has been removed from recent pandas versions; s.dtype.itemsize gives the same value)
                 s.itemsize
                 4
Series.nbytes    Return the number of bytes in the underlying data
                 print(s.nbytes)
                 16
                 (nbytes is equal to size*itemsize)
Series.ndim      Return the number of dimensions of the underlying data
                 s.ndim
                 Out[6]: 1
Series.hasnans   Return True if there are NaN values; otherwise return False
                 s.hasnans
                 False
Series.empty     Return True if the Series object is empty, False otherwise
                 s.empty
                 Out[8]: False
                 import pandas as pd
                 obj1=pd.Series([])
                 obj1.empty
                 Out[14]: True
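The attributes in the table can be checked together in one short script. This is a sketch under two assumptions: the dtype is forced to int32 so that the numbers match the slides on any platform, and the item size is read from s.dtype.itemsize because Series.itemsize is not available in recent pandas releases.

```python
import pandas as pd

# Same values as in the table; dtype fixed so the byte counts are predictable.
s = pd.Series([2, 6, 7, 9], dtype="int32")

size = s.size                 # number of elements
itemsize = s.dtype.itemsize   # bytes per element (int32 -> 4)
nbytes = s.nbytes             # total bytes = size * itemsize
ndim = s.ndim                 # a Series is always one-dimensional
has_nans = s.hasnans          # no NaN values in this Series

# An empty Series; the dtype is given explicitly to avoid a dtype warning.
empty = pd.Series([], dtype="float64").empty

print(size, itemsize, nbytes, ndim, has_nans, empty)
```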
8/05/20

ACCESSING A SERIES OBJECT AND ITS ELEMENTS

After creating a Series object, we can access it in many ways. We can access:

•its indexes separately

•its data separately

•individual elements and slices

1. Accessing individual elements
•To access an individual element of a Series object, give its index in square brackets after the object's name
e.g. <Series object name>[<valid index>]
2. Extracting slices from a Series object
•We can extract slices from a Series object too.
•Slicing is a powerful way to retrieve subsets of data from a pandas object.
•Slicing with integer positions takes place position-wise and not index-wise in a Series object.
e.g. obj1
Position   Index   Data
0          Feb     28
1          Jan     31
obj1[1:]
obj1[1:3]
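The access patterns above can be tried on a slightly longer Series. In this sketch only the Feb and Jan rows come from the slide; the Mar and Apr rows are added as assumptions so that the slices have something to return.

```python
import pandas as pd

# Feb/Jan are from the slide; Mar/Apr are hypothetical extra rows.
obj1 = pd.Series([28, 31, 31, 30], index=["Feb", "Jan", "Mar", "Apr"])

labels = obj1.index.tolist()    # the indexes on their own
values = obj1.values.tolist()   # the data on its own

jan_days = obj1["Jan"]          # one element, looked up by its index label

# Slices count positions (0, 1, 2, ...), not index labels.
from_second = obj1[1:]          # positions 1 to the end
middle = obj1[1:3]              # positions 1 and 2 (3 is excluded)

# .iloc makes the position-wise behaviour explicit and gives the same result.
same_as_iloc = middle.equals(obj1.iloc[1:3])

print(from_second, middle, sep="\n\n")
```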
9/05/20

OPERATIONS ON SERIES OBJECT

After creating a Series object, we can perform various types of operations on it:

•Modifying Elements of Series Object

•The head() and tail() functions

•Vector Operations on Series Objects

•Arithmetic on Series objects

•Filtering Entries
1. Modifying Elements of Series Object
The data values of a Series object can be easily modified through item assignment.

e.g. (a) <Series object>[<index>] = <new value>
The above assignment changes the data value at the given index in the Series object.

(b) <Series object>[<start>:<stop>] = <new value>
The above assignment replaces all the values falling in the given slice.

Please note that a Series object's values can be modified but its size cannot. So we can say that Series objects are value-mutable but size-immutable.
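Both forms of item assignment can be sketched as follows. The sample data and the replacement values (100 and -1) are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([2, 3, 21, 12, 31, 7, 8])
size_before = s.size

# (a) Assign to a single index: only that element changes.
s[0] = 100

# (b) Assign to a slice: every element in positions 1 and 2 is replaced.
s[1:3] = -1

size_after = s.size   # item assignment changes values, not the size

print(s.tolist(), size_before, size_after)
```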
11/05/20

The head() and tail() functions

head():- returns the first n rows of a Series (n defaults to 5).
<pandas object>.head(n)

tail():- returns the last n rows of a pandas object (n defaults to 5).
<pandas object>.tail(n)
import pandas as pd

s=pd.Series([2,3,21,12,31,7,8])

s
Out[3]:
0     2
1     3
2    21
3    12
4    31
5     7
6     8
dtype: int64

s.head(4)
Out[8]:
0     2
1     3
2    21
3    12
dtype: int64

s.tail(3)
Out[9]:
4    31
5     7
6     8
dtype: int64
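When no argument is given, head() and tail() both fall back to n=5. A minimal sketch with the same Series as above:

```python
import pandas as pd

s = pd.Series([2, 3, 21, 12, 31, 7, 8])

first_five = s.head()    # same as s.head(5)
last_five = s.tail()     # same as s.tail(5)

# The original index labels are kept, so tail() starts at label 2 here.
print(first_five.tolist(), last_five.index.tolist())
```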
VECTOR OPERATIONS ON SERIES OBJECTS

import pandas as pd

s=pd.Series([2,3,21,12,31,7,8])

s
Out[3]:
0     2
1     3
2    21
3    12
4    31
5     7
6     8
dtype: int64

s+2
Out[10]:
0     4
1     5
2    23
3    14
4    33
5     9
6    10
dtype: int64

s*3
Out[11]:
0     6
1     9
2    63
3    36
4    93
5    21
6    24
dtype: int64

s=s**2

s
Out[16]:
0      4
1      9
2    441
3    144
4    961
5     49
6     64
dtype: int64
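One point worth noticing in the examples above is that a vector operation builds a new Series; the original only changes when the result is assigned back, as in s=s**2. A small sketch of that behaviour, with the variable name doubled chosen for illustration:

```python
import pandas as pd

s = pd.Series([2, 3, 21, 12, 31, 7, 8])

# The operation is applied to every element and returns a new Series.
doubled = s * 2
unchanged = s.tolist()      # s itself still holds the original values

# Assigning the result back to the same name replaces the old Series.
s = s ** 2

print(doubled.tolist(), unchanged, s.tolist())
```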
Filtering Entries

import pandas as pd

s=pd.Series([2,3,21,12,31,7,8])

s
Out[3]:
0     2
1     3
2    21
3    12
4    31
5     7
6     8
dtype: int64

s>15
Out[12]:
0    False
1    False
2     True
3    False
4     True
5    False
6    False
dtype: bool

(The next output was produced after s had been squared with s=s**2, so the filter keeps the squared values greater than 15.)

s[s>15]
Out[17]:
2    441
3    144
4    961
5     49
6     64
dtype: int64
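Filtering is a two-step idea: a comparison produces a Series of True/False values, and indexing with that Series keeps only the True rows. The sketch below applies both steps to the original, unsquared values; the name mask is an illustrative choice.

```python
import pandas as pd

s = pd.Series([2, 3, 21, 12, 31, 7, 8])

# Step 1: the comparison gives one True/False value per element.
mask = s > 15

# Step 2: indexing with the boolean Series keeps only the True rows,
# together with their original index labels.
filtered = s[mask]

print(mask.tolist())
print(filtered)
```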
Arithmetic on Series objects
•We can perform arithmetic such as addition, subtraction, division, etc. on two Series objects.

import pandas as pd

s=pd.Series([2,3,4,1])

s2=pd.Series([6,7,8,9])

s+s2
Out[25]:
0 8
1 10
2 12
3 10
dtype: int64
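Subtraction and division work the same way as the addition shown above, element by element for matching index labels. A minimal sketch with the same two Series:

```python
import pandas as pd

s = pd.Series([2, 3, 4, 1])
s2 = pd.Series([6, 7, 8, 9])

difference = s2 - s     # element-wise subtraction
quotient = s2 / s       # element-wise division; the result is float

print(difference.tolist())
print(quotient)
```

Division always produces floating-point values here, even when every result happens to be a whole number.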
12/05/20

Q :- What is the Pandas library of Python? What is its significance?
Solution:- Pandas is a Python data analysis library that provides data structures and functions for data manipulation and analysis. It provides fast, flexible, and expressive data structures designed to make working with labeled data easy and intuitive. It is capable of handling huge amounts of data, and at the same time it provides multiple ways to handle missing data, thereby making data analysis more accurate and reliable.
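As an illustration of the "multiple ways to handle missing data" mentioned in the solution, the sketch below finds, fills and drops missing values in a Series. The sample values and the fill value 0 are arbitrary choices.

```python
import numpy as np
import pandas as pd

# NaN marks the missing entries.
s = pd.Series([2.0, np.nan, 21.0, np.nan, 31.0])

missing_count = int(s.isna().sum())   # find: how many values are missing
filled = s.fillna(0)                  # fill: replace missing values with 0
dropped = s.dropna()                  # drop: remove the missing rows

print(missing_count, filled.tolist(), dropped.index.tolist())
```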
