Exp 25_26

The document provides an introduction to the Pandas library in Python, detailing installation, importing, and data structures such as Series and DataFrame. It explains how to create and manipulate Series and DataFrames, including accessing elements, indexing, and performing binary operations. Additionally, it covers basic operations on rows and columns within DataFrames, including selection and indexing techniques.

Uploaded by

Prasad Nirmal
Copyright
© All Rights Reserved

Getting Started with Pandas

Installing Pandas
The first step in working with Pandas is to check whether it is installed on the system.
If it is not, install it using the pip command:
pip install pandas
Importing Pandas
Once Pandas has been installed, we need to import the library. The module is
conventionally imported as follows:
import pandas as pd
Note: Here, pd is an alias for pandas. Importing the library under an alias is not
required; it just means less typing every time a method or property is called.
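A quick way to confirm the installation from Python itself is to import the library and print its version. The sketch below is only illustrative: the version string it prints depends on the local environment, and the name same_module is introduced here for the example.

```python
# Minimal sketch: confirm that pandas is importable and show that the
# alias is only a second name for the same module.
import pandas
import pandas as pd

# Both names refer to the same module object.
same_module = pd is pandas
print(same_module)

# The installed version (the value depends on your environment).
print(pd.__version__)
```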
Data Structures in Pandas Library
Pandas provides two main data structures for manipulating data:
• Series
• DataFrame

Python Pandas Series


A Pandas Series is a one-dimensional labeled array capable of holding data of any
type (integer, string, float, Python objects, etc.). The axis labels are collectively
called the index.
Creating a Series
In practice, a Pandas Series is often created by loading a dataset from existing storage
(which can be a SQL database, a CSV file, or an Excel file).
A Pandas Series can also be created from lists, dictionaries, scalar values, etc.
Pandas Series Examples
# import the pandas library
import pandas as pd

# a simple list of values
data = [1, 2, 3, 4]
ser = pd.Series(data)
print(ser)

Output
0 1
1 2
2 3
3 4
dtype: int64
A Pandas Series is comparable to a single column in an Excel sheet.
Labels need not be unique but must be of a hashable type. The object supports both integer
and label-based indexing and provides a host of methods for performing operations
involving the index.
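To illustrate the point that labels need not be unique, here is a minimal sketch with a repeated label. The values and labels are made up for the example.

```python
import pandas as pd

# Labels may repeat; here the label 'a' appears twice.
ser = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

# Selecting a repeated label returns every matching element as a Series.
dup = ser['a']
print(dup)

# Selecting a unique label returns a single value.
single = ser['b']
print(single)

# is_unique reports whether the index is free of repeated labels.
print(ser.index.is_unique)
```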

Python Pandas Series


This section gives a brief overview of the basic operations that can be performed on a
Pandas Series:
• Creating a Series
• Accessing element of Series
• Indexing and Selecting Data in Series
• Binary operation on Series
• Conversion Operation on Series

Creating a Pandas Series


In the real world, a Pandas Series is usually created by loading a dataset from existing
storage, such as a SQL database, a CSV file, or an Excel file. A Pandas Series can also be
created from a list, a dictionary, a scalar value, etc. Here are some of the ways to create
a Series:
Creating a series from array: To create a Series from an array, we import the numpy
module and use its array() function.
# import pandas and numpy
import pandas as pd
import numpy as np

# a simple NumPy array of characters
data = np.array(['g','e','e','k','s'])
ser = pd.Series(data)
print(ser)

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
Creating a series from Lists:
To create a Series from a list, first create the list and then pass it to pd.Series().
import pandas as pd

# a simple list (named lst so that it does not shadow the built-in list type)
lst = ['g', 'e', 'e', 'k', 's']

# create a series from the list
ser = pd.Series(lst)
print(ser)

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
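The text above also mentions dictionaries and scalar values as sources for a Series. A minimal sketch of both, using made-up values and labels, might look like this:

```python
import pandas as pd

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(from_dict)

# From a scalar: the value is repeated to match the supplied index.
from_scalar = pd.Series(7, index=['x', 'y', 'z'])
print(from_scalar)
```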
Accessing element of Series
There are two ways to access an element of a Series:
• Accessing Element from Series with Position
• Accessing Element Using Label (index)
Accessing Element from Series with Position : To access a Series element by position, refer
to its integer position, counting from 0. With the default index, the index operator [ ]
accepts this integer directly. To access multiple elements from a Series, we use a slice
operation.
Accessing first 5 elements of Series.
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data)
# retrieve the first five elements
print(ser[:5])

Output
0 g
1 e
2 e
3 k
4 s
dtype: object
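As a hedged alternative to the plain index operator, the .iloc indexer is always positional, whatever labels the Series carries. The sketch below assumes the same thirteen characters as the example above; the variable names are chosen for illustration.

```python
import pandas as pd

ser = pd.Series(list('geeksforgeeks'))

# Element at position 0 (the first element).
first = ser.iloc[0]

# Elements at positions 0 through 4 (the end of the slice is excluded).
first_five = ser.iloc[:5]

# Elements from position 5 to the end.
rest = ser.iloc[5:]

print(first)
print(first_five.tolist())
print(len(rest))
```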
Accessing Element Using Label (index) :
To access an element by label, pass its index label to the index operator. A Series is
like a fixed-size dictionary in that you can get and set values by index label.
Accessing a single element using index label.
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17,18,19,20,21,22])

# accessing an element using its index label
print(ser[16])

Output
o
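The dictionary comparison also covers setting values and testing for a label. The following sketch uses a shortened, hypothetical version of the Series above and the explicit .loc indexer:

```python
import pandas as pd

ser = pd.Series(['g', 'e', 'e', 'k', 's'], index=[10, 11, 12, 13, 14])

# Get a value by its label.
value = ser.loc[12]

# Set a value by its label.
ser.loc[12] = 'E'

# Like a dictionary, membership tests check the labels.
has_label = 13 in ser
missing_label = 99 in ser

print(value, ser.loc[12], has_label, missing_label)
```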
Indexing and Selecting Data in Series
Indexing in pandas means simply selecting particular data from a Series. Indexing could
mean selecting all of the data or only some of it. Indexing is also known as Subset
Selection.
Indexing a Series using indexing operator [] :
The indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use square brackets to make selections. In this section,
the indexing operator means the plain square brackets, as in ser[ ].
# importing pandas module
import pandas as pd
# making data frame
df = pd.read_csv("nba.csv")
ser = pd.Series(df['Name'])
data = ser.head(10)
data

Now we access the element of series using index operator [ ].


# using indexing operator
data[3:6]

Indexing a Series using .loc[ ] :


.loc[ ] selects data by referring to the explicit index labels. The .loc indexer selects data
in a different way than the plain indexing operator: a label slice includes both of its endpoints.
# importing pandas module
import pandas as pd
# making data frame
df = pd.read_csv("nba.csv")
ser = pd.Series(df['Name'])
data = ser.head(10)
data
Now we access elements of the Series using the .loc[] indexer.
# using the .loc[] indexer
data.loc[3:6]
Output :
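Because nba.csv is not reproduced here, the sketch below uses a hypothetical ten-element Series as a stand-in to compare a plain [ ] slice with a .loc slice on the default index.

```python
import pandas as pd

# Hypothetical stand-in for the ten-element Series used above;
# the default index labels are 0 through 9.
data = pd.Series(list('abcdefghij'))

# Plain slice: positions 3, 4, 5 (end excluded).
by_brackets = data[3:6]

# .loc slice: labels 3, 4, 5, 6 (end included).
by_loc = data.loc[3:6]

print(by_brackets.tolist())
print(by_loc.tolist())
```

The two results differ by one element, which is easy to overlook when switching between the two forms.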

Indexing a Series using .iloc[ ] :


.iloc[ ] allows us to retrieve data by position. In order to do that, we’ll need to specify
the positions of the data that we want. The .iloc indexer is very similar to .loc but only
uses integer positions to make its selections.
# importing pandas module
import pandas as pd

# making data frame


df = pd.read_csv("nba.csv")

ser = pd.Series(df['Name'])
data = ser.head(10)
data
Output:

Now we access elements of the Series using the .iloc[] indexer.

# using the .iloc[] indexer
data.iloc[3:6]
Output :
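The difference between positions and labels is easiest to see when the labels do not start at 0. This is a minimal sketch with made-up data, not the nba.csv file:

```python
import pandas as pd

# Hypothetical data: the labels start at 10, so labels and positions differ.
ser = pd.Series(['g', 'e', 'e', 'k', 's'], index=[10, 11, 12, 13, 14])

# .iloc counts positions from 0 and ignores the labels.
by_position = ser.iloc[1:3]

# .loc uses the labels themselves and includes the end label.
by_label = ser.loc[11:13]

print(by_position.index.tolist())
print(by_label.index.tolist())
```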

Binary Operation on Series


We can perform binary operations on Series, such as addition, subtraction and many other
operations. To do so, we can use methods such as .add() and .sub(), which align the two
Series by index label before computing.
Code #1:
# importing pandas module
import pandas as pd
# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)
Output
a 5
b 2
c 3
d 7
dtype: int64

a 1
b 6
d 4
e 9
dtype: int64
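Now we add the two series using the .add() method. The fill_value=0 argument substitutes 0
for a label that is missing from one of the two series.
# adding two series using .add()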
data.add(data1, fill_value=0)
Output :
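To make the effect of fill_value concrete, the sketch below compares the plain + operator with .add(..., fill_value=0) on the same two Series. The names plain and filled are introduced here for the comparison.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Plain + aligns on labels; labels present in only one Series give NaN.
plain = data + data1

# .add with fill_value=0 treats the missing side as 0 instead.
filled = data.add(data1, fill_value=0)

print(plain)
print(filled)
```

Labels 'c' and 'e' each exist in only one input, so they are NaN in plain and keep their single value in filled.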

Code #2:
# importing pandas module
import pandas as pd

# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)

Output
a 5
b 2
c 3
d 7
dtype: int64

a 1
b 6
d 4
e 9
dtype: int64
Now we subtract the second series from the first using the .sub() method.
# subtracting two series using .sub()
data.sub(data1, fill_value=0)
Output :
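Other binary methods follow the same pattern. As a sketch, here is .sub() next to .mul(), where a fill value of 1 is chosen (an assumption made for this example) so that a missing label leaves the other operand unchanged.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Subtraction: a missing label is treated as 0.
diff = data.sub(data1, fill_value=0)

# Multiplication: a missing label is treated as 1.
prod = data.mul(data1, fill_value=1)

print(diff)
print(prod)
```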

Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns); that is, data is aligned in a
tabular fashion in rows and columns.
A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
Creating a Pandas DataFrame
In practice, a Pandas DataFrame is often created by loading a dataset from existing storage,
such as a SQL database, a CSV file, or an Excel file. A Pandas DataFrame can also be created
from lists, a dictionary, a list of dictionaries, etc.
Here are some of the ways to create a DataFrame:
Creating a dataframe using List: A DataFrame can be created using a single list or a list of lists.
import pandas as pd

# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list


df = pd.DataFrame(lst)
print(df)
Output:
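The list-of-lists case mentioned above can be sketched as follows. The rows and the column names are hypothetical and serve only to show the shape of the result.

```python
import pandas as pd

# Each inner list becomes one row; the column names are hypothetical.
rows = [['Tom', 20], ['nick', 21], ['krish', 19]]
df = pd.DataFrame(rows, columns=['Name', 'Age'])

print(df)
print(df.shape)
```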
Creating DataFrame from dict of ndarray/lists: To create a DataFrame from a dict of ndarrays
or lists, all the arrays must be of the same length. If an index is passed, its length must
equal the length of the arrays. If no index is passed, the index defaults to range(n),
where n is the array length.
# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays / lists
# with the default index.
import pandas as pd

# initialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.


print(df)

Output:
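The index rules stated above can be checked with a short sketch. The row labels 'r1' to 'r4' are made up for the example, and the flag mismatch_rejected is a helper name introduced here.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Default index: 0, 1, 2, 3.
df_default = pd.DataFrame(data)

# Explicit index (hypothetical labels); its length matches the lists.
df_labeled = pd.DataFrame(data, index=['r1', 'r2', 'r3', 'r4'])

# An index of the wrong length is rejected with a ValueError.
try:
    pd.DataFrame(data, index=['r1', 'r2'])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(df_default.index.tolist())
print(df_labeled.index.tolist())
print(mismatch_rejected)
```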
Table of Contents
• Dealing with Rows and Columns
• Indexing and Selecting Data
• Selecting a single row
• Working with Missing Data
• Iterating over rows and columns

Dealing with Rows and Columns in Pandas DataFrame


A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns. We can perform basic operations on rows/columns like selecting,
deleting, adding, and renaming.
Column Selection: To select a column in a Pandas DataFrame, we access it by its column
name. Passing a list of names selects several columns at once.
# Import pandas package
import pandas as pd

# Define a dictionary containing employee data


data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])

Output:
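The other column operations mentioned above (adding, deleting, and renaming) can be sketched in the same style. The new column name 'Years' is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})

# Adding: assign a list with one value per row to a new column name.
df['Address'] = ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj']

# Renaming: rename() returns a new DataFrame by default.
df = df.rename(columns={'Age': 'Years'})

# Deleting: drop() also returns a new DataFrame by default.
df = df.drop(columns=['Address'])

print(df.columns.tolist())
```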

Row Selection: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a
DataFrame by label. Rows can also be selected by passing an integer position to the
iloc[] indexer.

Note: The examples below use the nba.csv file.


# importing pandas package
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)

Output:
Two Series are returned, because a single row label was passed to .loc[] each time.
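Since nba.csv is not included here, the following sketch uses a small hypothetical DataFrame indexed by name to show how the result type depends on what is passed to .loc[].

```python
import pandas as pd

# Hypothetical stand-in for nba.csv, indexed by name.
data = pd.DataFrame({'Age': [27, 24, 22],
                     'Address': ['Delhi', 'Kanpur', 'Allahabad']},
                    index=['Jai', 'Princi', 'Gaurav'])

# One label: the row comes back as a Series.
one = data.loc['Jai']

# A list of labels: the rows come back as a DataFrame.
several = data.loc[['Jai', 'Gaurav']]

print(type(one).__name__)
print(type(several).__name__)
```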
Indexing and Selecting Data in Pandas
Indexing in pandas means simply selecting particular rows and columns of data from a
DataFrame. Indexing could mean selecting all the rows and some of the columns, some of
the rows and all of the columns, or some of each of the rows and columns. Indexing can also
be known as Subset Selection.
Indexing a Dataframe using indexing operator []
The indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use square brackets to make selections. In this section,
the indexing operator means the plain square brackets, as in df[].
In order to select a single column, we simply put the name of the column in between the
brackets:
# importing pandas package
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
Output:
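One detail worth a sketch is the difference between single and double brackets. The DataFrame below is a hypothetical stand-in for nba.csv.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})

# A column name in single brackets returns a Series.
col = df['Age']

# A list of names (double brackets) returns a DataFrame,
# even when the list holds only one name.
sub = df[['Age']]

print(type(col).__name__, type(sub).__name__)
```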

Indexing a DataFrame using .loc[ ]


.loc[ ] selects data by the labels of the rows and columns. The df.loc indexer selects
data in a different way than the plain indexing operator. It can select subsets of rows or
columns. It can also simultaneously select subsets of rows and columns.
In order to select a single row using .loc[], we put a single row label inside the brackets.
# importing pandas package
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Output:
Two Series are returned, because a single row label was passed to .loc[] each time.
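Selecting rows and columns at the same time with .loc[] can be sketched as below. The data is a hypothetical stand-in for nba.csv.

```python
import pandas as pd

data = pd.DataFrame({'Age': [27, 24, 22, 32],
                     'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
                     'Qualification': ['Msc', 'MA', 'MCA', 'Phd']},
                    index=['Jai', 'Princi', 'Gaurav', 'Anuj'])

# One row label and one column label: a single value.
cell = data.loc['Princi', 'Age']

# Lists of row and column labels: a smaller DataFrame.
block = data.loc[['Jai', 'Anuj'], ['Age', 'Qualification']]

# A bare colon selects all rows; here, one column for every row.
ages = data.loc[:, 'Age']

print(cell)
print(block)
print(ages.tolist())
```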

Indexing a DataFrame using .iloc[ ]


.iloc[ ] allows us to retrieve rows and columns by position. In order to do that, we’ll
need to specify the positions of the rows that we want, and the positions of the columns
that we want as well. The df.iloc indexer is very similar to df.loc but only uses integer
positions to make its selections.
In order to select a single row using .iloc[], we pass a single integer inside the brackets.
import pandas as pd

# making data frame from csv file


data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving the row at integer position 3 (the fourth row) with iloc
row = data.iloc[3]
print(row)

Output:
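The same kind of selection by position, for rows and columns together, might look like this. The data is again a hypothetical stand-in for nba.csv.

```python
import pandas as pd

data = pd.DataFrame({'Age': [27, 24, 22, 32],
                     'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
                     'Qualification': ['Msc', 'MA', 'MCA', 'Phd']},
                    index=['Jai', 'Princi', 'Gaurav', 'Anuj'])

# A single integer: the row at that position, returned as a Series.
row = data.iloc[3]

# Row position and column position: a single value.
cell = data.iloc[0, 1]

# First two rows, first and third columns.
block = data.iloc[:2, [0, 2]]

print(row.name)
print(cell)
print(block)
```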
