Pandas Module

Pandas is an open-source Python library primarily used for data manipulation, analysis, and cleaning, providing easy-to-use data structures and analysis tools. It allows for the creation of DataFrames and Series, supports various operations, and offers functionalities for handling missing data and reading CSV files. Key features include the ability to clean data, analyze correlations, and integrate seamlessly with other libraries.


PYTHON MODULE.

PANDAS
Pandas is an open-source Python library

It is mainly used for:


 data manipulation,
 analysis,
 data cleaning.

It provides easy-to-use
1. data structures
2. data analysis tools

Why do we need Pandas?


 Easy to learn and use
 Works well with other libraries like NumPy, Matplotlib, and Scikit-learn
 Excellent performance with large datasets
 Built-in functions for time series analysis
IMPORTING PANDAS
It's a common convention in Python to import the Pandas library and give it the alias pd.
An alias is essentially a secondary name or shortcut for an object, module, or function.
import pandas as pd
This shorthand makes it easier to refer to the library's functions and methods without needing to write out pandas every time.
import pandas as pd

# Creating a DataFrame using 'pd'
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'City': ['Kampala', 'Entebbe', 'Mbarara']
}

df = pd.DataFrame(data)  # pd is used to call pandas functions

print(df)
PANDAS SERIES
What Is A Series?
A Pandas Series is essentially a one-dimensional labeled array capable of holding any data type (integers,
strings, floats, etc.).
Each element in the Series has an associated index (label), and you can access and manipulate these elements by
their index.
Key Features of a Pandas Series:
It's like a column in a DataFrame.
It has an index that labels each element.
It can hold various data types like integers, floats, or even strings.

Labels
If nothing else is specified, the values are labeled with their index number. First value has index 0, second value has
index 1 etc.
This label can be used to access a specified value.
With the index argument, you can name your own labels
Note: When a Series is created from a dictionary, the keys of the dictionary become the labels.
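The labeling rules above can be sketched as follows (the month labels and values are illustrative, not from the slides):

```python
import pandas as pd

# Default labels are the integer positions 0, 1, 2, ...
s = pd.Series([1.2, 0.8, 1.5])
print(s[0])  # access by default label

# Custom labels via the index argument
s2 = pd.Series([1.2, 0.8, 1.5], index=['jan', 'feb', 'mar'])
print(s2['feb'])  # access by custom label

# Creating a Series from a dictionary: the keys become the labels
s3 = pd.Series({'jan': 1.2, 'feb': 0.8})
print(s3['jan'])
```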
import pandas as pd

# Creating a Pandas Series for 'DataUsed_GB'
data_used_series = pd.Series(data_used_GB)

# Display the Series
print("Data Used Series:")
print(data_used_series)

Output:
Data Used Series:
0    1.2
1    0.8
2    1.5
3    2.1
4    0.9
5    1.0
6    1.8
7    2.3
8    1.6
dtype: float64

# Creating a Pandas Series with custom index
dates_series = pd.Series(dates, index=userIDs)

# Display the Series with custom index
print("\nDates Series with User IDs as index:")
print(dates_series)

Output:
Dates Series with User IDs as index:
UserID
101    2025-03-01
102    2025-03-01
103    2025-03-01
104    2025-03-02
105    2025-03-02
106    2025-03-02
107    2025-03-03
108    2025-03-03
109    2025-03-03
dtype: object
PANDAS DATAFRAME
A Pandas DataFrame is a two-dimensional labeled data structure, similar to a table in a
database or an Excel spreadsheet. It contains rows and columns, where each column can be of
a different data type (integers, floats, strings, etc.).

Key Features of a Pandas DataFrame:


•Rows and columns can be labeled with custom indices.
•Each column is a Pandas Series.
•Supports a wide range of operations like filtering, sorting, merging, reshaping, and more.
•Allows easy handling of missing data.

Locate Row
The DataFrame is like a table with rows and columns.
Pandas uses the loc (locate) attribute to return one or more specified rows.
import pandas as pd
# Data
userIDs = [101, 102, 103, 104, 105, 106, 107, 108, 109]
dates = ["2025-03-01", "2025-03-01", "2025-03-01", "2025-03-02", "2025-03-02", "2025-03-02", "2025-03-03", "2025-03-03", "2025-03-03"]
data_used_GB = [1.2, 0.8, 1.5, 2.1, 0.9, 1.0, 1.8, 2.3, 1.6]
call_duration_min = [45, 30, 60, 75, 40, 50, 65, 90, 55]
locations = ["New York", "San Francisco", "Chicago", "New York", "Los Angeles", "Chicago", "San Francisco", "Los Angeles", "New York"]
signal_strength_dBm = [-70, -65, -75, -68, -60, -72, -63, -62, -69]

# Creating DataFrame
data = {
    'UserID': userIDs,
    'Date': dates,
    'DataUsed_GB': data_used_GB,
    'CallDuration_min': call_duration_min,
    'Location': locations,
    'SignalStrength_dBm': signal_strength_dBm
}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
UserID Date DataUsed_GB CallDuration_min Location SignalStrength_dBm
0 101 2025-03-01 1.2 45 New York -70
1 102 2025-03-01 0.8 30 San Francisco -65
2 103 2025-03-01 1.5 60 Chicago -75
3 104 2025-03-02 2.1 75 New York -68
4 105 2025-03-02 0.9 40 Los Angeles -60
5 106 2025-03-02 1.0 50 Chicago -72
6 107 2025-03-03 1.8 65 San Francisco -63
7 108 2025-03-03 2.3 90 Los Angeles -62
8 109 2025-03-03 1.6 55 New York -69
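The loc attribute introduced under "Locate Row" is not demonstrated on the slides themselves; a minimal sketch on a cut-down version of the DataFrame above (only two columns kept for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'UserID': [101, 102, 103],
    'DataUsed_GB': [1.2, 0.8, 1.5],
})

# loc with a single label returns one row as a Series
print(df.loc[0])

# loc with a list of labels returns those rows as a DataFrame
print(df.loc[[0, 2]])

# loc with a label slice includes BOTH endpoints
print(df.loc[0:1])
```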
CSV Files/JSON Files
Pandas provides an easy way to read CSV (Comma Separated Values) files using the read_csv() function.

This allows you to load data from CSV files into a Pandas DataFrame for easy manipulation and analysis.

Contents of user_data.csv:

UserID,Date,DataUsed_GB,CallDuration_min,Location,SignalStrength_dBm
101,2025-03-01,1.2,45,New York,-70
102,2025-03-01,0.8,30,Santa Clara,-65
103,2025-03-01,1.5,60,Chicago,-75
104,2025-03-02,2.1,75,New York,-68
105,2025-03-02,0.9,40,Los Angeles,-60

import pandas as pd

# Read CSV file
df = pd.read_csv('user_data.csv')

# Display the DataFrame
print(df)

# Display the first 3 rows
print("\nFirst 3 rows:")
print(df.head(3))

# Display specific columns (e.g., 'UserID' and 'DataUsed_GB')
print("\nSelected columns:")
print(df[['UserID', 'DataUsed_GB']])
OUTPUT
UserID Date DataUsed_GB CallDuration_min Location SignalStrength_dBm
0 101 2025-03-01 1.2 45 New York -70
1 102 2025-03-01 0.8 30 Santa Clara -65
2 103 2025-03-01 1.5 60 Chicago -75
3 104 2025-03-02 2.1 75 New York -68
4 105 2025-03-02 0.9 40 Los Angeles -60

First 3 rows:
UserID Date DataUsed_GB CallDuration_min Location SignalStrength_dBm
0 101 2025-03-01 1.2 45 New York -70
1 102 2025-03-01 0.8 30 Santa Clara -65
2 103 2025-03-01 1.5 60 Chicago -75

Selected columns:
UserID DataUsed_GB
0 101 1.2
1 102 0.8
2 103 1.5
3 104 2.1
4 105 0.9
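The slide title also mentions JSON files, but only CSV is demonstrated. A minimal read_json sketch; the JSON string and its fields are illustrative, and StringIO stands in for a file on disk:

```python
import pandas as pd
from io import StringIO

# A small JSON document with the same column names as the CSV example
json_data = '[{"UserID": 101, "DataUsed_GB": 1.2}, {"UserID": 102, "DataUsed_GB": 0.8}]'

# read_json accepts a path or a file-like object
df = pd.read_json(StringIO(json_data))
print(df)
```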
MAX_ROWS
In Pandas, the default display of a DataFrame is limited to a certain number of rows when printed to the
console. This limit is controlled by the max_rows option. This setting specifies the maximum number of
rows that will be shown when printing a DataFrame.
To display a maximum number of rows: You can change the max_rows setting to control how many rows
you want to display when printing a DataFrame.

import pandas as pd
# Create a sample DataFrame with more rows
data = {'UserID': range(1, 101), 'Name': ['User'+str(i) for i in range(1, 101)]}
df = pd.DataFrame(data)

# Set max_rows to control the display of rows
pd.set_option('display.max_rows', 20)

# Print the DataFrame (only 20 rows will be shown)
print(df)
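The setting can also be inspected and restored, not just changed; a short sketch using get_option and reset_option (the value 20 mirrors the slide):

```python
import pandas as pd

# Inspect the current setting
print(pd.get_option('display.max_rows'))

# Change it, then restore the library default
pd.set_option('display.max_rows', 20)
print(pd.get_option('display.max_rows'))
pd.reset_option('display.max_rows')
```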
PANDAS - ANALYZING DATAFRAMES
1. Displaying the First and Last Few Rows:
•head(): Displays the first few rows of the DataFrame (default is 5 rows).
•tail(): Displays the last few rows of the DataFrame (default is 5 rows).

# Display the first 5 rows
print(df.head())

# Display the first 10 rows
print(df.head(10))

# Display the last 5 rows
print(df.tail())

# Display the last 10 rows
print(df.tail(10))

2. Viewing the Shape of the DataFrame:
To know the number of rows and columns in the DataFrame, use the shape attribute:

print(df.shape)  # Output: (rows, columns)

3. Viewing Basic Information:
•info(): Provides a concise summary of the DataFrame, including the number of non-null entries, column names, and data types.

print(df.info())

4. Accessing Specific Columns:
To view a specific column or subset of columns, you can access them directly by their names:

# Access a single column
print(df['column_name'])

# Access multiple columns
print(df[['column1', 'column2']])
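The snippets above use placeholder column names; here is a self-contained run of the same calls on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'UserID': [101, 102, 103],
                   'DataUsed_GB': [1.2, 0.8, 1.5]})

print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
print(df.shape)     # (rows, columns)
df.info()           # prints the summary; returns None
print(df['UserID'])                    # single column -> Series
print(df[['UserID', 'DataUsed_GB']])  # list of columns -> DataFrame
```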
Cleaning Data
 Cleaning Data
 Cleaning Empty Cells
 Cleaning Wrong Format
 Cleaning Wrong Data
 Removing Duplicates
PANDAS - CLEANING DATA
Data cleaning is an essential step in the data analysis process, as raw data often comes with inconsistencies, missing values, duplicates, and errors.
1. Handling Missing Data (NaN values)
Missing data is often represented as NaN (Not a Number). You can clean or fill missing data using the following
methods:
a) Identifying Missing Values
To check for missing values in a DataFrame, you can use the isnull() method:
# Check for missing values in the DataFrame
print(df.isnull())

# Count missing values in each column
print(df.isnull().sum())
b) Dropping Missing Values
If you want to remove rows or columns with missing data, you can use dropna().
# Drop rows with any missing values
df_cleaned = df.dropna()

# Drop rows where all columns are NaN
df_cleaned = df.dropna(how='all')

# Drop columns with missing values
df_cleaned = df.dropna(axis=1)
c) Filling Missing Values
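This heading has no accompanying code on the slide; a minimal fillna() sketch (the Age values mirror the sample data used later in this section):

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, None, 30, 22, 35]})

# Fill missing values with a constant
print(df['Age'].fillna(0))

# Fill missing values with a statistic such as the column mean
print(df['Age'].fillna(df['Age'].mean()))
```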
2. Removing Duplicates
You can remove duplicate rows using drop_duplicates(). You can
also specify which columns to check for duplicates.
# Remove duplicate rows
df_cleaned = df.drop_duplicates()

# Remove duplicates based on specific columns
df_cleaned = df.drop_duplicates(subset=['column_name'])

# Keep the last occurrence of each duplicate (instead of the first)
df_cleaned = df.drop_duplicates(keep='last')
3. Renaming Columns
If the column names are not meaningful or need to be changed, you can rename them using rename().

# Rename columns
df.rename(columns={'old_name': 'new_name', 'old_column': 'new_column'}, inplace=True)
CLEANING EMPTY CELLS
Cleaning empty cells (or missing values) is a crucial step when preparing data for analysis. Empty cells
can be represented in different ways, such as NaN, None, or an empty string ""
1. Identifying Empty Cells
To check for empty or missing cells, you can use isnull() or isna(). These
functions return a DataFrame of the same shape as the original, but with True where
there are missing values and False where there are not.
import pandas as pd

# Sample DataFrame with missing values
data = {'UserID': [101, 102, 103, None, 105],
        'Name': ['Alice', 'Bob', None, 'David', 'Eve'],
        'Age': [25, None, 30, 22, 35]}

df = pd.DataFrame(data)

# Check for missing values
print(df.isnull())

# Count missing values in each column
print(df.isnull().sum())
2. Dropping Empty Cells
You can remove rows or columns with empty cells using dropna().

a) Dropping Rows with Empty Cells
By default, dropna() removes any row with at least one missing value:

# Drop rows with any missing values
df_cleaned = df.dropna()
print(df_cleaned)

b) Dropping Rows Based on Specific Columns
You can drop rows with missing values only in certain columns:

# Drop rows where 'Name' or 'Age' has missing values
df_cleaned = df.dropna(subset=['Name', 'Age'])
print(df_cleaned)

3. Filling Empty Cells
4. Replacing Empty Cells with Conditional Logic
5. Checking and Handling Empty Strings
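Item 5 above (handling empty strings) can be sketched as follows; the key point is that an empty string "" is not NaN, so it must be converted before isnull()-based cleaning sees it (the names are illustrative):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Alice', '', 'Bob', None]})

# isnull() detects None/NaN but NOT empty strings
print(df['Name'].isnull().sum())

# Convert empty strings to NaN first, then treat them like other missing values
df['Name'] = df['Name'].replace('', np.nan)
print(df['Name'].isnull().sum())
```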
RESEARCH ABOUT:
 Cleaning Wrong Format
 Cleaning Wrong Data
 Removing Duplicates
Summary of Key Pandas Functions for Cleaning Data:
•isnull() / isna(): Check for missing values.
•dropna(): Drop rows or columns with missing values.
•fillna(): Fill missing values with a constant, method, or statistic.
•drop_duplicates(): Remove duplicate rows.
•rename(): Rename columns.
•astype(): Convert data types.
•str.strip(): Remove leading/trailing spaces.
•str.lower() / str.upper(): Normalize case.
•replace(): Replace specific values.
•get_dummies(): Convert categorical variables to dummy variables (one-hot encoding).
•clip(): Cap values in a column.
•map(): Map categories to numerical values.
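A few of the summarized functions in action on a small illustrative DataFrame (the values are made up for the demo):

```python
import pandas as pd

df = pd.DataFrame({'City': ['  Kampala ', 'ENTEBBE', 'kampala'],
                   'Age': ['25', '30', '22']})

df['City'] = df['City'].str.strip().str.lower()  # str.strip() / str.lower()
df['Age'] = df['Age'].astype(int)                # astype()
df = df.replace({'kampala': 'Kampala'})          # replace()
print(df)
```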
PANDAS - DATA CORRELATIONS
In Pandas, data correlations are used to measure the relationship between two or
more variables.

Correlation helps determine how one variable might change in response to another.

METHODS TO COMPUTE CORRELATIONS

 Pearson correlation coefficient


 Spearman rank correlation
 Kendall tau correlation(Research About this)
1. Pearson Correlation (default)
The Pearson correlation coefficient measures the linear relationship between two variables. It
returns a value between -1 and 1:
•1 indicates a perfect positive linear relationship.
•-1 indicates a perfect negative linear relationship.
•0 indicates no linear relationship.
You can compute the Pearson correlation using corr() on a DataFrame:

import pandas as pd

# Sample DataFrame
data = {'X': [1, 2, 3, 4, 5],
        'Y': [5, 4, 3, 2, 1],
        'Z': [1, 1, 2, 2, 3]}

df = pd.DataFrame(data)

# Compute the Pearson correlation matrix
correlation_matrix = df.corr()

print(correlation_matrix)

Output:
          X         Y         Z
X  1.000000 -1.000000  0.970725
Y -1.000000  1.000000 -0.970725
Z  0.970725 -0.970725  1.000000

Here, you can see that X and Y have a perfect negative correlation (-1.0), while X and Z have a strong positive correlation (0.97).
2. Spearman Rank Correlation
The Spearman rank correlation measures the monotonic relationship between two
variables. Unlike Pearson, Spearman does not require a linear relationship and works well
for ordinal or non-linear data.
You can calculate the Spearman correlation by passing method='spearman' to corr()

# Compute the Spearman rank correlation
spearman_corr = df.corr(method='spearman')

print(spearman_corr)

Output:
          X         Y         Z
X  1.000000 -1.000000  1.000000
Y -1.000000  1.000000 -1.000000
Z  1.000000 -1.000000  1.000000

Here, the Spearman correlation between X and Y is -1, showing a perfect negative monotonic relationship.
RESEARCH ABOUT:
 Plotting
Note: "Use Knowledge from Matplotlib"