
Boxplot (4) (1)

The document provides an overview of boxplots, including their definition, components, and applications in data analysis. It explains how to create and interpret boxplots, highlighting their usefulness in identifying outliers and comparing distributions. Boxplots are emphasized as a robust tool for visualizing data spread and skewness, making them valuable in exploratory data analysis and various practical scenarios like fraud detection.

Uploaded by

Swaira Riaz

Boxplots

22-CSE-09, 22-CSE-12, 22-CSE-15, 22-CSE-18
Index

1. Introduction of Boxplot
2. Parts of Boxplot
3. Create and Interpret Boxplot
4. Applications of Boxplot
5. Summary

Introduction:
What is a Boxplot?
A boxplot (or box-and-whisker plot) is a
standardized way to display data distribution
based on a five-number summary:
• Minimum
• First quartile (Q1)
• Median
• Third quartile (Q3)
• Maximum
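
The five-number summary can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the customer ages are made up, and the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method, which is only one of several quartile conventions.

```python
import statistics

# Hypothetical customer ages (illustrative data, not from the deck).
ages = [18, 22, 25, 27, 30, 34, 38, 45, 61]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
# The "inclusive" method treats the data as the whole population.
q1, median, q3 = statistics.quantiles(sorted(ages), n=4, method="inclusive")

summary = {
    "min": min(ages),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(ages),
}
print(summary)
```

Other quartile conventions can shift Q1 and Q3 slightly, so small differences between tools are expected.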

Why Use Boxplots?

• Compact representation of data spread and skewness.


• Quickly compare multiple distributions.
• Identify outliers without complex calculations.

Uses:
• Exploratory Data Analysis (EDA).
• Preprocessing data for machine learning.

Comparison with other plots:

Boxplot
• Summarizes data using five statistics: min, Q1, median, Q3, and max.
• Highlights outliers clearly beyond the whiskers.
• Great for comparing multiple groups side by side with minimal clutter.

Scatter plot
• Shows individual data points to reveal patterns and relationships.
• Ideal for exploring correlation between two variables.
• Can become messy with large datasets and doesn’t summarize distribution.

Histogram
• Displays the frequency distribution of a single variable.
• Helps visualize the shape (e.g., skewed, normal, bimodal) of data.
• Doesn’t directly show outliers or compare groups easily.

Example: Analyzing customer
ages for a retail store.

3. Whiskers
Lines extending from the box to the smallest and largest data points
within a certain range.
Formula Tip:
• Whiskers go up to the values that are not considered outliers.
• Lower Whisker: smallest data point ≥ (Q1 − 1.5 × IQR)
• Upper Whisker: largest data point ≤ (Q3 + 1.5 × IQR)
IQR (Interquartile Range) = Q3 − Q1
Why Whiskers? They help us see the spread of most of the data.

4. Outliers
Data points far outside the whiskers (unusually high/low values).
In a boxplot, they’re usually small dots or stars.
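
The whisker and outlier rules above can be sketched as follows. This is an illustrative sketch under stated assumptions: the ages are hypothetical, `whiskers_and_outliers` is a name introduced here, and quartiles use `statistics.quantiles` with the "inclusive" method.

```python
import statistics

def whiskers_and_outliers(values):
    """Return (lower whisker, upper whisker, outliers) using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return inside[0], inside[-1], outliers

# Hypothetical ages with one unusually high value.
ages = [18, 22, 25, 27, 30, 34, 38, 45, 95]
lower, upper, outliers = whiskers_and_outliers(ages)
print(lower, upper, outliers)
```

Note that the whiskers stop at actual data points, not at the fence values themselves.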
Creating and Interpreting a Boxplot

What we'll cover
• Content:
o Steps to create a boxplot
o How to interpret a boxplot:
▪ Symmetrical vs. skewed data
▪ Identifying outliers
▪ Comparing multiple boxplots
o Example with numbers and a boxplot

Creating a Boxplot
1. Collect Data: Gather a numerical dataset (e.g., test
scores, sales figures).
2. Order Data: Arrange data in ascending order.
3. Find Key Values:
• Median (Q2): Middle value of the dataset.
• First Quartile (Q1): Median of the lower half.
• Third Quartile (Q3): Median of the upper half.
• Interquartile Range (IQR): Q3 - Q1.
• Whiskers: Extend to the smallest/largest values within 1.5 * IQR
from Q1/Q3.
• Outliers: Values beyond the whiskers.
4. Draw the Boxplot:
Code to create boxplot:
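
A minimal sketch of the drawing step, assuming matplotlib is installed. The sales figures are made up, and the output file name `boxplot.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical daily sales figures; 80 is an intentionally extreme value.
sales = [12, 15, 18, 21, 24, 27, 30, 33, 80]

fig, ax = plt.subplots(figsize=(4, 5))
# whis=1.5 places the whiskers at the last data points within 1.5 x IQR of the box.
parts = ax.boxplot(sales, whis=1.5)
ax.set_xticks([1])
ax.set_xticklabels(["Sales"])
ax.set_ylabel("Units sold")
ax.set_title("Boxplot of daily sales")
fig.savefig("boxplot.png", dpi=150)
```

`ax.boxplot` returns a dictionary of the drawn artists (boxes, medians, whiskers, caps, fliers), so the outlier markers can be inspected through `parts["fliers"]`.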

Interpreting a Boxplot -
Symmetry vs. Skewness
• Symmetrical Data: Median is centered in the box,
whiskers are equal length.
o Indicates a balanced distribution (e.g., normal distribution).
• Skewed Data:
o Right Skew (Positive): Longer upper whisker, median closer to
Q1.
o Left Skew (Negative): Longer lower whisker, median closer to
Q3.
• Why It Matters: Skewness affects data analysis and
model assumptions.
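
The "median closer to Q1 or Q3" reading can be turned into a number. The sketch below uses the quartile (Bowley) skewness coefficient, which is not named in the slides and is swapped in here as one possible measure. The three datasets are made up.

```python
import statistics

def quartile_skew(values):
    """Bowley skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1).

    Positive: median sits closer to Q1 (right skew).
    Negative: median sits closer to Q3 (left skew).
    Assumes Q3 != Q1; otherwise the ratio is undefined.
    """
    q1, med, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return (q3 + q1 - 2 * med) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 3, 4, 5, 8, 12, 20, 40]
left_skewed = [1, 21, 29, 33, 36, 37, 38, 39, 40]

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(quartile_skew(data), 2))
```

This coefficient only looks at the box, so whisker lengths still need to be read from the plot itself.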
Interpreting a Boxplot -
Outliers
• What Are Outliers?: Data points below Q1 - 1.5 * IQR or
above Q3 + 1.5 * IQR.
• How to Spot Them: Marked as dots or stars outside the
whiskers in a boxplot.
• Why They Matter:
• May indicate errors, anomalies, or significant variations.
• Critical in data mining for fraud detection or quality control.

Interpreting a Boxplot -
Comparing Boxplots
• Purpose: Compare distributions across groups (e.g., sales
by region).
• How to Read:
• Compare medians: Higher/lower central tendency.
• Compare IQRs: Spread of the middle 50% of data.
• Compare whiskers/outliers: Range and anomalies.
• Example: Boxplots of test scores for different classes.
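
The same comparison can be made numerically before plotting. This is a sketch with made-up scores for two hypothetical classes. `median_and_iqr` is a helper introduced here for illustration.

```python
import statistics

# Hypothetical test scores for two classes (illustrative data).
classes = {
    "Class A": [55, 60, 62, 65, 68, 70, 72, 75, 80],
    "Class B": [40, 50, 58, 66, 70, 78, 85, 90, 98],
}

def median_and_iqr(values):
    """Return (median, IQR) for one group."""
    q1, med, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return med, q3 - q1

comparison = {name: median_and_iqr(scores) for name, scores in classes.items()}
for name, (med, iqr) in comparison.items():
    print(f"{name}: median={med}, IQR={iqr}")
```

Here the two classes have similar medians, but Class B has a much wider IQR. Side-by-side boxplots would show this as a taller box.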
Applications of Box Plot in DM
• Outlier Detection: Box plots are excellent for identifying outliers in a
dataset, which can be crucial in fraud detection or data cleaning processes.
• Data Distribution Visualization: Box plots provide a five-number summary
(minimum, Q1, median, Q3, maximum), allowing data scientists to understand
the distribution and spread of data quickly.
• Comparative Analysis: When comparing multiple datasets (e.g., sales across
regions or performance across models), box plots make it easy to compare
medians, spreads, and detect outliers across categories.
Importance of Box Plot in DM
• Box plots condense large datasets into a simple graphical
representation, aiding quick understanding and decision-making.
• They help in selecting or transforming variables based on their
distribution, variability, and presence of outliers.
• They provide a clear way to present statistical summaries to
stakeholders, especially in dashboards or reports.
• Unlike the mean and standard deviation, box plots use the median and
interquartile range, making them more robust to noise and outliers.

Practical Scenario: Fraud Detection
Scenario: Credit Card Fraud Detection
A bank analyzes daily transaction amounts of customers to detect potential fraud activities.
Use of Box Plot:
• A box plot is created for each customer showing their daily
transaction amounts over a month.
• Most customers show consistent spending patterns (tight
interquartile ranges).
• For one customer, the box plot shows:
o A low median transaction (e.g., $50)
o Multiple outliers above $2000
Insight:
• These outliers are far from the typical spending behavior and
may indicate unauthorized transactions.
Action Taken:
• The system flags the account for review. The fraud team
investigates and confirms fraudulent activity. The customer is
notified, and the card is deactivated.
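
A minimal sketch of the flagging step in this scenario. The customer IDs and amounts are invented, only nine days are shown instead of a full month to keep it short, and `high_outliers` is a hypothetical helper rather than any bank's actual rule.

```python
import statistics

def high_outliers(amounts):
    """Return the amounts above the upper fence Q3 + 1.5 x IQR."""
    q1, _, q3 = statistics.quantiles(sorted(amounts), n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    return [a for a in amounts if a > upper_fence]

# Hypothetical daily transaction amounts in dollars, per customer.
transactions = {
    "customer_1": [42, 55, 48, 60, 51, 47, 58, 53, 49],
    "customer_2": [45, 50, 52, 48, 2100, 55, 47, 2500, 50],
}

# Keep only the customers that have at least one high outlier.
flagged = {}
for customer, amounts in transactions.items():
    suspicious = high_outliers(amounts)
    if suspicious:
        flagged[customer] = suspicious

print(flagged)
```

In this toy data, `customer_2` has a median of $50 but two amounts above $2000, which matches the pattern described in the scenario. A flag like this would only prompt a review, not confirm fraud.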
Real World Example
Student Math Test Scores
Suppose a teacher wants to visualize the distribution of Math scores out
of 100 for a class of 30 students. The scores are:
45, 50, 52, 55, 58, 60, 60, 61, 62, 63, 65, 65, 66, 67, 68, 70, 71, 72, 74,
75, 76, 78, 80, 82, 85, 88, 90, 92, 94, 95
Box plot would show:
• Minimum score: 45
• First quartile (Q1): 61 (median of the lower 15 scores)
• Median (Q2): 69 (average of the 15th and 16th scores, 68 and 70)
• Third quartile (Q3): 80 (median of the upper 15 scores)
• Maximum score: 95
How it's useful
• Teachers can quickly spot outliers (e.g., very low or very high
scores).
• They can compare this box plot to other classes to assess
performance variability.
• It helps identify whether the class distribution is skewed or
symmetric.
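
The summary for these 30 scores can be checked with a short script. This sketch follows the median-of-halves convention from the "Creating a Boxplot" steps (Q1 is the median of the lower half, Q3 the median of the upper half). Other quartile conventions can give values that differ by a point or two. `five_number_summary` is a name introduced here.

```python
import statistics

scores = [45, 50, 52, 55, 58, 60, 60, 61, 62, 63, 65, 65, 66, 67, 68,
          70, 71, 72, 74, 75, 76, 78, 80, 82, 85, 88, 90, 92, 94, 95]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(minimum, q1, median, q3, maximum)
print("IQR:", iqr, "fences:", low_fence, high_fence, "outliers:", outliers)
```

With this convention the IQR is 19 and the fences fall at 32.5 and 108.5, so none of these scores is flagged as an outlier. The whiskers therefore reach the minimum and maximum.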