Box Plot Data-Aggregation To Normalization DJB Notes 25-04-2024

A boxplot, or box-and-whisker plot, graphically summarizes the central tendency, dispersion, and skewness of a dataset. A box spans the first to third quartiles with a line at the median, whiskers extend to the minimum and maximum non-outlier values, and outliers are drawn as individual points. Boxplots are useful for comparing distributions and identifying outliers.

Box Plots

● A boxplot, also known as a box-and-whisker plot, is a graphical
representation of the distribution of a dataset. It summarizes the central
tendency, dispersion, and skewness of the data in a concise manner.
Here's a breakdown of the components of a boxplot:
● Median (Q2): The middle value of the dataset, also known as the second
quartile (Q2). It divides the data into two halves, with 50% of the data
points falling below it and 50% above it.
● Quartiles (Q1 and Q3): The quartiles divide the dataset into four equal
parts. Q1 represents the first quartile, which is the median of the lower
half of the data. Q3 represents the third quartile, which is the median of
the upper half of the data.
● Interquartile Range (IQR): The IQR is the difference between the third
quartile (Q3) and the first quartile (Q1), that is, IQR = Q3 - Q1. It covers
the middle 50% of the data.
● Whiskers: The whiskers extend from the edges of the box to the
smallest and largest data values that lie within 1.5 times the IQR of
the first and third quartiles, respectively (no lower than
Q1 - 1.5 × IQR and no higher than Q3 + 1.5 × IQR). They represent
the range of the data, excluding outliers.
● Outliers: Data points that fall outside the whiskers are considered
outliers and are plotted individually as points. They represent data
values that are significantly different from the rest of the dataset.
● Boxplots are particularly useful for comparing distributions between
different groups or variables and identifying potential outliers. They
provide a visual summary of the data's spread, skewness, and central
tendency in a single plot, making them a valuable tool in exploratory
data analysis and statistical analysis.
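As a rough illustration of the components listed above, the following sketch computes the quartiles, IQR, whisker limits, and outliers for a small made-up sample. It assumes NumPy's default linear-interpolation percentiles, which can differ slightly from the "median of each half" rule on small datasets. The sample values and variable names are illustrative only.

```python
import numpy as np

# Made-up sample with one obviously extreme value (illustrative only)
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Quartiles using NumPy's default (linear interpolation) percentile method
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: points beyond these limits are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that lie inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is plotted individually as an outlier
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Whiskers:", whisker_low, "to", whisker_high)
print("Outliers:", outliers)
```

Note that the whiskers stop at actual data points (here 1 and 9), not at the fence values themselves.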
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV file into a Pandas DataFrame
# (raw string so the backslashes in the Windows path are not read as escapes)
df = pd.read_csv(r'D:\DATA_SCIENCE\Sample_CSV_files\height_weight.csv')

# Create box plots for height and weight
plt.figure(figsize=(10, 6))

# Box plot for height
plt.subplot(1, 2, 1)
plt.boxplot(df['height'])
plt.title('Box Plot of Height')
plt.ylabel('Height')

# Box plot for weight
plt.subplot(1, 2, 2)
plt.boxplot(df['weight'])
plt.title('Box Plot of Weight')
plt.ylabel('Weight')

plt.tight_layout()
plt.show()
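Since box plots are described here as a way to compare groups and spot outliers, a small self-contained sketch that does not depend on the CSV file may help. The two groups below are made-up values, the "Agg" backend is selected only so the sketch also runs without a display, and the names group_a, group_b, and parts are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up measurements for two groups (illustrative only)
group_a = [150, 152, 155, 158, 160, 162, 165, 168, 170, 200]
group_b = [140, 145, 148, 150, 152, 155, 157, 160, 162, 165]

# Passing a list of sequences draws one box per group on the same axes
fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.boxplot([group_a, group_b])
ax.set_xticks([1, 2])
ax.set_xticklabels(["Group A", "Group B"])
ax.set_title("Box Plots by Group")
ax.set_ylabel("Value")

# The returned dict exposes the plotted pieces; "fliers" holds the outlier points
outliers_a = parts["fliers"][0].get_ydata()
outliers_b = parts["fliers"][1].get_ydata()
print("Outliers in group A:", outliers_a)
print("Outliers in group B:", outliers_b)
```

With an interactive backend, plt.show() displays the figure; fig.savefig() writes it to a file instead.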
Data aggregation and grouping with Pandas
● In Pandas, data aggregation and grouping are fundamental
operations for data analysis. Here's a basic overview of how to
perform these tasks:
● Grouping Data: The groupby() function is used to split the data
into groups based on some criteria.
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
        'Score': [85, 92, 78, 90, 88]}
df = pd.DataFrame(data)

# Grouping by 'Name'
grouped = df.groupby('Name')

# Displaying groups: each iteration yields the group key and its sub-DataFrame
for name, group in grouped:
    print(name)
    print(group)
Aggregating Data:

● After grouping, you can apply aggregate functions like
sum, mean, count, etc. using agg().
# Aggregating data
agg_result = grouped.agg({'Score': 'mean'})  # Mean score for each person
print(agg_result)
Applying Multiple Aggregations:

● You can apply multiple aggregation functions
simultaneously.
# Applying multiple aggregations
agg_result = grouped.agg({'Score': ['mean', 'sum', 'count']})
print(agg_result)
Custom Aggregation Functions:

● You can define custom aggregation functions.

# Custom aggregation function: range (max - min) of each group's scores
def custom_agg(series):
    return series.max() - series.min()

agg_result = grouped.agg({'Score': custom_agg})
print(agg_result)
Flattening MultiIndex:

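● You can flatten the MultiIndex columns that result from
applying several aggregation functions to a column.
# Flattening MultiIndex
# Recompute the multi-function aggregation so the columns are a MultiIndex
agg_result = grouped.agg({'Score': ['mean', 'sum', 'count']})
agg_result.columns = ['_'.join(col).strip() for col in agg_result.columns.values]
print(agg_result)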
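As an alternative to flattening afterwards, pandas named aggregation produces flat column names directly. The sketch below reuses the same example data; the output column names (Score_mean, Score_sum, Score_count) are chosen here for illustration.

```python
import pandas as pd

# Same example data as in the grouping section
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
                   'Score': [85, 92, 78, 90, 88]})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby('Name').agg(
    Score_mean=('Score', 'mean'),
    Score_sum=('Score', 'sum'),
    Score_count=('Score', 'count'),
)
print(summary)
```

The result has ordinary single-level columns, so no join over column tuples is needed.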
Data transformation and normalization
● In SciPy, you can perform data transformation and
normalization using various functions available in the
scipy.stats module. Here's a basic overview of how to do
data transformation and normalization using SciPy:
● Data Transformation: Transformation involves changing the
scale or distribution of your data. Common transformations
include log transformation, square root transformation, etc.
You can use the scipy.stats.boxcox function for the Box-Cox power
transformation, which requires strictly positive input data.
from scipy import stats

# Example data
data = [1, 2, 3, 4, 5]

# Perform Box-Cox transformation
# (with no lmbda argument, boxcox also returns the lambda it estimated)
transformed_data, lambda_value = stats.boxcox(data)

print("Transformed data:", transformed_data)
print("Lambda value:", lambda_value)
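To make the transformation itself more concrete, the sketch below compares scipy.stats.boxcox at a fixed lambda against the Box-Cox formula written out by hand: (x^lambda - 1) / lambda for a nonzero lambda, and log(x) when lambda is 0. The value lambda = 0.5 is an arbitrary choice for illustration, not an estimated one.

```python
import numpy as np
from scipy import stats

# Example data (must be strictly positive for Box-Cox)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Arbitrary fixed lambda, chosen only for illustration
lam = 0.5

# Box-Cox formula written out by hand for a nonzero lambda
manual = (data ** lam - 1) / lam

# When lmbda is given, boxcox returns only the transformed array
from_scipy = stats.boxcox(data, lmbda=lam)

# With lambda = 0 the transformation reduces to the natural logarithm
log_case = stats.boxcox(data, lmbda=0)

print("Matches manual formula:", np.allclose(manual, from_scipy))
print("Lambda 0 equals log:", np.allclose(np.log(data), log_case))
```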
Data Normalization:

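● Normalization rescales your data to a common scale. Min-max
normalization maps values to a fixed range, usually between 0 and 1,
while z-score normalization (standardization) rescales them to have
mean 0 and standard deviation 1. The scipy.stats.zscore
function is commonly used for z-score normalization.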
from scipy import stats

# Example data
data = [1, 2, 3, 4, 5]

# Perform z-score normalization

normalized_data = stats.zscore(data)

print("Normalized data:", normalized_data)
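The difference between z-score and min-max scaling can be sketched as follows. The manual formulas are shown next to scipy.stats.zscore for comparison; the min-max step is plain NumPy arithmetic written for this example and is not a SciPy function.

```python
import numpy as np
from scipy import stats

# Example data
data = np.array([1, 2, 3, 4, 5], dtype=float)

# Z-score by hand: subtract the mean, divide by the population standard deviation
# (ddof=0, which is also the default used by stats.zscore)
z_manual = (data - data.mean()) / data.std()
z_scipy = stats.zscore(data)

# Min-max scaling by hand: maps the smallest value to 0 and the largest to 1
min_max = (data - data.min()) / (data.max() - data.min())

print("Z-scores:", z_scipy)
print("Z-score mean and std:", z_scipy.mean(), z_scipy.std())
print("Min-max scaled:", min_max)
```

Z-scores are not confined to a fixed range, whereas the min-max result always lies between 0 and 1.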
