Pandas Notes For XII PDF

(1) Pandas is a Python library used for data analysis and manipulation. It allows import and analysis of data from many formats. (2) Pandas has two main data structures - Series for 1D data and DataFrame for 2D data. Series is a 1D array holding data of the same type while DataFrame allows different data types. (3) A Series can be created from many data types including lists, NumPy arrays, scalars, and functions. It allows accessing, slicing, filtering and sorting of data.

Uploaded by

Manish Jain

Python Pandas- I

Introduction
Pandas (or Python pandas) is Python’s library for data analysis. The name pandas
is derived from “panel data”, an econometrics term for multidimensional,
structured data sets. Pandas has become a popular choice for data analysis.
Data analysis refers to the process of evaluating large data sets using
analytical and statistical tools so as to discover useful information and conclusions
that support business decision-making.

Using pandas- Pandas is an open-source, BSD-licensed library built for the Python
programming language. Pandas offers high-performance, easy-to-use data structures
and data analysis tools.

The following statement is used to import pandas in a Python program-

import pandas as pd

Why Pandas- Pandas is the most popular library in the scientific Python
ecosystem for doing data analysis. Pandas is capable of many tasks, including:

• It can read or write data in many different formats and types (integer, float,
double, etc.).
• It can perform calculations along any of the ways the data is organized, for
example across rows or down columns.
• It can easily select subsets of data from bulky data sets and even combine
multiple data sets together. It has functionality to find and fill missing data.
• It allows us to apply operations to independent groups within the data.
• It supports reshaping of data into different forms.
• It supports advanced time-series functionality.
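
A few of these tasks can be sketched on a tiny table. The column names and values below are made up for illustration and are not part of these notes:

```python
import numpy as np
import pandas as pd

# Hypothetical data: marks of four students in two sections, one mark missing
df = pd.DataFrame({
    "section": ["a", "a", "b", "b"],
    "marks": [80.0, np.nan, 70.0, 90.0],
})

# Select a subset of the rows
section_a = df[df["section"] == "a"]

# Fill the missing mark with 0
filled = df.fillna({"marks": 0.0})

# Apply an operation (the mean) to each group independently
mean_by_section = filled.groupby("section")["marks"].mean()
print(mean_by_section)
```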

Pandas Data Structures


A data structure refers to a specialized way of storing data so that a specific type of
functionality can be applied to it.

There are two basic data structures in pandas: Series and DataFrame.

This PDF is created at https://www.pdfonline.com/convert-pdf/


1. Series: It is the 1-dimensional data structure of Python pandas.
2. DataFrame: It is the 2-dimensional data structure of Python pandas.
Older versions of pandas also supported a third data structure called Panel; it was removed in pandas 0.25.
Property       Series                              DataFrame
Dimensions     1-dimensional                       2-dimensional
Type of data   Homogeneous: all the elements       Heterogeneous: a DataFrame object
               must be of the same data type       can have elements of different
               in a Series object.                 data types.
Mutability     Value-mutable: the values of its    Value-mutable: the values of its
               elements can change.                elements can change.
               Size-immutable: the size of a       Size-mutable: the size of a
               Series object, once created,        DataFrame object, once created,
               cannot change.                      can change in place. You can
                                                   add/drop elements in an existing
                                                   DataFrame object.
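
The rows of this comparison can be checked with a short sketch. The names and values are illustrative assumptions:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"name": ["x", "y"], "marks": [97.5, 98.0]})

print(s.dtype)     # a Series has a single data type
print(df.dtypes)   # a DataFrame has one data type per column

s[0] = 100                 # value-mutable: an element is changed
df["grade"] = ["A", "A"]   # size-mutable: a column is added in place
print(df.shape)
```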

Series Data Structure:


Series is an important data structure of pandas. It represents a one-dimensional
array of indexed data. A Series type object has two main
components:
• An array of actual data
• An associated array of indexes or data labels.
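
Both components can be inspected directly. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([4, 6, 8], index=["a", "b", "c"])
print(s.values)   # the array of actual data
print(s.index)    # the associated array of labels
```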

Creating Series Objects: A Series type object can be created in many ways
using the pandas library’s Series() function. pandas must be imported first; NumPy
must also be imported when the data is built with NumPy functions.

1. Creating an empty Series object using Series()-


<Series object>=pandas.Series()
2. Creating a non-empty Series object-
<Series object>=pd.Series(data,index=idx)
Where data is the data part of the Series object and idx is a sequence of labels
to use as the index (the labels may be of any valid NumPy data type).
(i) Specify data as a Python sequence:
<Series object>=pd.Series(<any Python sequence>)

Example –
import pandas as pd
s1=pd.Series([4,6,8,10])    # Series() is written with a capital S
print("series object1")
print(s1)
Output
series object1
0 4
1 6
2 8
3 10
dtype: int64
Data can also be given in the form of a tuple:
s1=pd.Series((4,6,8,10))

Example-
import pandas as pd
s1=pd.Series(["i","am","laughing"])
print("series object")
print(s1)

Output:
series object
0 i
1 am
2 laughing
(ii) Specify data as an ndarray

import pandas as pd
import numpy as np
nda1=np.arange(3,13,3.5)    # values 3.0, 6.5 and 10.0
print(nda1)
ser1=pd.Series(nda1)
print(ser1)

linspace(): It returns the given number of evenly spaced values from the starting
value to the ending value (the ending value is included by default).
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None,
axis=0)
Example-
import pandas as pd
import numpy as np
s1=pd.Series(np.linspace(24,64,5))
print(s1)
Output:
0 24.0
1 34.0
2 44.0
3 54.0
4 64.0
dtype: float64

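tile(): It repeats the whole array the given number of times.


Example-
import pandas as pd
import numpy as np
s1=pd.Series(np.tile([3,5],2))
print(s1)
Output:
0 3
1 5
2 3
3 5
dtype: int64
(The dtype is shown as int32 on platforms where NumPy's default integer is 32-bit.)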
(iii) Specify data as a Scalar Value-
The scalar value is repeated to match the length of the index.
Example-
import pandas as pd
s1=pd.Series(50000,index=['qtr1','qtr2','qtr3','qtr4'])
print(s1)
Output:
qtr1 50000
qtr2 50000
qtr3 50000
qtr4 50000
dtype: int64

Example:
import pandas as pd
s1=pd.Series(200,index=range(2020,2029,2))
print(s1)
Output:
2020 200
2022 200
2024 200
2026 200
2028 200
dtype: int64
(iv) Specify index(es) as well as data with Series()-
<Series Object>=pandas.Series(data=None,index=None)
Example-
import pandas as pd
obj1=pd.Series(data=[32,24,26],index=['a','b','c'])
print(obj1)

Example-
import pandas as pd
section=['a','b','c','d']
contri=[6700,5600,5000,5200]
s1=pd.Series(data=contri,index=section)
print(s1)

(v) Using a Mathematical Function/Expression to Create Data Array in Series()-
<Series Object>=pandas.Series(index=None,data=<function/expression>)

import pandas as pd
import numpy as np
a=np.arange(9,13)
s1=pd.Series(data=a,index =a*2)
print(s1)
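
The same idea works when the expression is used for the data rather than the index. A short sketch (the names s2 and s3 are chosen here for illustration):

```python
import numpy as np
import pandas as pd

a = np.arange(9, 13)                  # 9, 10, 11, 12
s2 = pd.Series(data=a * 2, index=a)   # data computed by an expression
s3 = pd.Series(data=a ** 2, index=a)  # squares of the index values
print(s2)
print(s3)
```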

Example-
import pandas as pd
import numpy as np
section=['a','b','c','d','e']
contri=np.array([6700,5600,5000,5200,np.nan])    # np.nan marks a missing value
s1=pd.Series(data=contri,index =section,dtype=np.float32)
print(s1)
Output:
a 6700.0
b 5600.0
c 5000.0
d 5200.0
e NaN
dtype: float32

(A) Accessing Individual Elements


To access an individual element, give its index in square brackets after the
name of the Series object.
<Series Object Name>[<valid index>]
Example :
import pandas as pd
s1=pd.Series(200,index=range(2020,2029,2))
print(s1)
print(s1[2024])    # 2024 is a label in the index; s1[2] would raise a KeyError here
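
When the index holds integers, as the year index above does, it helps to say explicitly whether a label or a position is meant. A small sketch using .loc (label) and .iloc (position); the values 10 to 50 are made up:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=range(2020, 2029, 2))
by_label = s1.loc[2024]     # element whose index label is 2024
by_position = s1.iloc[2]    # third element, counting from 0
print(by_label, by_position)
```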

(B) Extracting Slices from Series Object-
A slice is extracted from a Series object using the syntax
<Object>[start:end:step], where start and end signify the positions of
elements, not the index labels. A slice can also be used to modify several elements at once.
Example:

import pandas as pd
s1=pd.Series([4,6,8,10])
print("series object1")
print(s1)
s1[0]=100     # modify one element
s1[2:]=50     # modify every element from position 2 onwards
print(s1)
Output

series object1
0 4
1 6
2 8
3 10
dtype: int64
0 100
1 6
2 50
3 50
dtype: int64
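
Because a slice works on positions, it behaves the same way whatever the index labels are. A small sketch with made-up string labels:

```python
import pandas as pd

s = pd.Series([4, 6, 8, 10], index=["a", "b", "c", "d"])
middle = s[1:3]        # positions 1 and 2
every_other = s[::2]   # positions 0 and 2
rev = s[::-1]          # all elements in reverse order
print(middle)
```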

(C) The head() and tail() functions-


The head() function is used to fetch the first n rows from a pandas object and the
tail() function returns the last n rows from a pandas object. If n is not given, 5 rows are returned.
<pandas object>.head([n])
<pandas object>.tail([n])
Example-
import pandas as pd
data=[2,4,5,6]
trdata1=pd.Series(data)
print(trdata1.head(2))
print(trdata1.tail(2))

Output:
0 2
1 4
dtype: int64
2 5
3 6
dtype: int64
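
When n is left out, both functions fall back to 5 rows. A short sketch on a slightly longer, made-up Series:

```python
import pandas as pd

s = pd.Series(range(10, 80, 10))   # seven values: 10, 20, ..., 70
first_five = s.head()              # n defaults to 5
last_three = s.tail(3)
print(first_five)
print(last_three)
```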
Filtering Entries- The user can give a Boolean condition on the elements; only the elements for which it is True are returned.
<Series object>[<Boolean expression on the Series object>]
Example-
import pandas as pd
info=pd.Series(data=[31,41,51])
print(info)
print(info>40)
print(info[info>40])
output
0 31
1 41
2 51
dtype: int64
0 False
1 True
2 True
dtype: bool
1 41
2 51
dtype: int64
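
Conditions can also be combined. A minimal sketch on the same info object, assuming the element-wise operators & (and) and ~ (not); each condition needs its own parentheses:

```python
import pandas as pd

info = pd.Series(data=[31, 41, 51])
between = info[(info > 30) & (info < 50)]   # both conditions must hold
not_41 = info[~(info == 41)]                # condition negated
print(between)
print(not_41)
```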

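Sorting Series Values:


To sort a Series object on the basis of its values, use the sort_values()
function with the following syntax:
<Series object>.sort_values([ascending=True|False])
Example (info is the Series object from the filtering example)-
print(info.sort_values(ascending=False))
To sort on the basis of the index instead, use the sort_index() function:
<Series object>.sort_index([ascending=True|False])
Example (s1 is the Series object created earlier from contri and section)-
>>>s1.sort_index(ascending=False)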
d 5200
c 5000
b 5600
a 6700
dtype: int64
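
The two kinds of sorting can be compared on the same object. A sketch that rebuilds s1 from the section and contri values used earlier:

```python
import pandas as pd

s1 = pd.Series([6700, 5600, 5000, 5200], index=["a", "b", "c", "d"])
by_value = s1.sort_values()                      # ascending by default
by_index_desc = s1.sort_index(ascending=False)   # labels d, c, b, a
print(by_value)
print(by_index_desc)
```

Both functions return a new Series; s1 itself keeps its original order.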

Reindexing: reindex() returns a new object whose data is arranged according to the given index.
<Series Object>=<Object>.reindex(<sequence with new order>)
obj1=obj2.reindex(['e','b','c','d','a'])
Example-
import pandas as pd
data=[2,4,5,6]
trdata1=pd.Series(data)
print(trdata1)
trdata2=trdata1.reindex([3,2,1,0])
print(trdata2)
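
If the new sequence contains a label that the original object does not have, that entry is filled with NaN. A small sketch (trdata3 is a name chosen here for illustration):

```python
import pandas as pd

trdata1 = pd.Series([2, 4, 5, 6])
trdata3 = trdata1.reindex([3, 2, 1, 0, 4])   # label 4 does not exist in trdata1
print(trdata3)
```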

Drop: drop() returns a new object with the given index label removed.
<Series Object>.drop(<index to be removed>)
obj2=obj1.drop('c')
Example-
import pandas as pd
data=[2,4,5,6]
trdata1=pd.Series(data)
print(trdata1)
trdata2=trdata1.drop(0)
print(trdata2)
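
Several labels can be removed at once by passing a list. A minimal sketch; the original object is left unchanged:

```python
import pandas as pd

trdata1 = pd.Series([2, 4, 5, 6])
trdata2 = trdata1.drop([0, 3])   # remove the labels 0 and 3
print(trdata2)
print(len(trdata1))              # still 4
```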

DataFrame Data Structure: A DataFrame is another pandas structure,
which stores data in a two-dimensional way. It is a two-dimensional (tabular,
spreadsheet-like) labelled array: an ordered collection of
columns that can store data of different types.
Creating and Displaying a DataFrame-
First import pandas (and NumPy, if it is needed):
import pandas as pd
import numpy as np
Example-
import pandas as pd
dict1={'section':['a','b','c','d'],'contri':[6700,5600,5000,5200]}
df1=pd.DataFrame(dict1)
print(df1)
Output:
section contri
0 a 6700
1 b 5600
2 c 5000
3 d 5200
Example-
import pandas as pd
dict1={'section':['a','b','c','d','e'],'contri':[6700,5600,5000,5200,4000],'class':[11,12,11,12,11]}
df1=pd.DataFrame(dict1)
print(df1)

Example:
import pandas as pd
topperA={'Rollno':115,'Name':'Pavni','Marks':97.5}
topperB={'Rollno':116,'Name':'Rishi','Marks':98}
topperC={'Rollno':117,'Name':'Paula','Marks':98.5}
toppers=[topperA,topperB,topperC]
topdf=pd.DataFrame(toppers)
print(topdf)

Output:
Rollno Name Marks
0 115 Pavni 97.5
1 116 Rishi 98.0
2 117 Paula 98.5
Example:
import pandas as pd
list2=[[25,45,60],[34,67,89],[88,90,56]]
df2=pd.DataFrame(list2,index=['row1','row2','row3'])
print(df2)
Output
0 1 2
row1 25 45 60
row2 34 67 89
row3 88 90 56
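
Column labels can be given in the same way as row labels. A sketch that extends the example above; the column names c1 to c3 are made up:

```python
import pandas as pd

list2 = [[25, 45, 60], [34, 67, 89], [88, 90, 56]]
df2 = pd.DataFrame(list2,
                   index=["row1", "row2", "row3"],
                   columns=["c1", "c2", "c3"])
print(df2)
```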

DataFrame Attributes-
When you create a DataFrame object, all information related to it is
available through its attributes.
<DataFrame object>.<attribute name>
Some commonly used attributes and functions:
1. len(<DataFrame object>) - counts the rows
2. <DataFrame object>.count() - counts the non-NA values in each column
3. <DataFrame object>.T - transposes the DataFrame (rows become columns and columns become rows)
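
These three can be tried on a small DataFrame. In this sketch one contri value is deliberately missing (an assumption made for illustration) so that count() differs between the columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"section": ["a", "b", "c"],
                   "contri": [6700, np.nan, 5000]})
n_rows = len(df)      # number of rows
non_na = df.count()   # non-NA values in each column
transposed = df.T     # rows and columns swapped
print(n_rows)
print(non_na)
print(transposed.shape)
```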
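Selecting or accessing data-

<DataFrame object>[<column name>] selects one column
<DataFrame object>[[<column name>,<column name>]] selects several columns

>>>df[2:5]    # rows at positions 2, 3 and 4
>>>df.loc[2:5,['Item Type','Total Profit']]    # rows labelled 2 to 5 (both included), two columns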
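
A runnable sketch of these selections. The table is made up; only the column names Item Type and Total Profit come from the example above:

```python
import pandas as pd

df = pd.DataFrame({
    "Item Type": ["Fruits", "Snacks", "Clothes", "Cereal", "Toys", "Books"],
    "Units": [10, 20, 30, 40, 50, 60],
    "Total Profit": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
})
one_col = df["Units"]                                        # a Series
two_cols = df[["Item Type", "Total Profit"]]                 # a DataFrame
rows_by_pos = df[2:5]                                        # positions 2, 3, 4
rows_by_label = df.loc[2:5, ["Item Type", "Total Profit"]]   # labels 2 to 5
print(rows_by_label)
```

The positional slice leaves out the end position, while the .loc slice includes the end label, so the last two results have 3 and 4 rows respectively.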
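Adding/Modifying a Column-
To change or add a column-
<DataFrame object>[<column name>]=<new value>
(<DataFrame object>.<column name>=<new value> can change an existing column, but it cannot add a new one.)
Example:
>>>dtf5['Density']=1219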
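
A self-contained sketch of both cases. The contents of dtf5 are not shown in these notes, so the small table here is an assumption:

```python
import pandas as pd

# Hypothetical contents for dtf5
dtf5 = pd.DataFrame({"Population": [100, 200]}, index=["Delhi", "Mumbai"])
dtf5["Density"] = 1219             # new column, same value in every row
dtf5["Population"] = [110, 210]    # existing column replaced
print(dtf5)
```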
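Adding/Modifying a Row-
To change or add a row-
<DataFrame object>.loc[<row name>,:]=<new value>
(at is meant for a single value, <DataFrame object>.at[<row name>,<column name>]; recent pandas versions may reject a whole row given to at.)
Example-
>>>dtf.loc['Bangalore',:]=[1200,1100,1000,900]
>>>dtf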
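
A self-contained sketch of adding and changing a row, using .loc and a made-up two-column table (the contents of dtf are not shown in these notes):

```python
import pandas as pd

# Hypothetical contents for dtf
dtf = pd.DataFrame({"Q1": [10, 20], "Q2": [30, 40]}, index=["Delhi", "Mumbai"])
dtf.loc["Bangalore", :] = [1200, 1100]   # row label not present: a row is added
dtf.loc["Delhi", :] = [11, 31]           # row label present: the row is changed
print(dtf)
```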
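Deleting row/column in a DataFrame-
del <DataFrame object>[<column name>]    deletes a column in place
<DataFrame object>.drop(<row name>)    returns a new DataFrame without that row
>>>del dtf5['Density']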
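
A short sketch of deleting a column and dropping a row. The table is made up, and drop() is used for the row because del works only on columns:

```python
import pandas as pd

dtf = pd.DataFrame({"Population": [100, 200, 300], "Density": [1, 2, 3]},
                   index=["Delhi", "Mumbai", "Chennai"])
del dtf["Density"]              # the column is removed in place
smaller = dtf.drop("Chennai")   # new DataFrame without the row
print(smaller)
```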
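Renaming rows/columns-

<DataFrame object>.rename(index={<name dictionary>},columns={<name dictionary>})

rename() returns a new DataFrame, so the result has to be assigned back (or inplace=True given) to keep it.
>>>topdf=topdf.rename(index={'Sec A':'A','Sec B':'B','Sec C':'C','Sec D':'D'})
>>>topdf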
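
A self-contained sketch that renames rows and a column in one call. The table and the new column name Score are assumptions for illustration:

```python
import pandas as pd

topdf = pd.DataFrame({"Rollno": [115, 116], "Marks": [97.5, 98.0]},
                     index=["Sec A", "Sec B"])
renamed = topdf.rename(index={"Sec A": "A", "Sec B": "B"},
                       columns={"Marks": "Score"})
print(renamed)
print(topdf.index)   # the original labels are unchanged
```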