
Week 2-4-Organizing and Visualizing Variables

Uploaded by

Naomi Ivo
Copyright
© All Rights Reserved

Lesson 2

Organizing and Visualizing Variables

https://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
ORGANIZING
1 CATEGORICAL Variables
Main Reason Young Adults Shop Online
Reason For Shopping Online?            Percent
Better Prices                            37%
Avoiding holiday crowds or hassles       29%
Convenience                              18%
Better selection                         13%
Ships directly                            3%
Source: Data extracted and adapted from “Main Reason Young Adults Shop Online?” USA Today, December 5, 2012, p. 1A.

Organizing Categorical Data – SUMMARY Table with ONE Category
A random sample of 400 invoices is drawn. Each invoice is categorized as a small,
medium, or large amount. Each invoice is also examined to identify if there are any
errors.

Frequency of Invoices Categorized By Size and The Presence Of Errors


                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400

Organizing Categorical Data – CONTINGENCY Table with TWO Categories
Contingency Table based on Percentage Of Overall Total
Each count in the frequency table is divided by the overall total of 400 invoices, for example:
42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400

                 No Errors   Errors   Total
Small Amount       42.50%     5.00%   47.50%
Medium Amount      25.00%    10.00%   35.00%
Large Amount       16.25%     1.25%   17.50%
Total              83.75%    16.25%   100.0%

Findings: 83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.
Contingency Table based on Percentage Of Rows Total
Each count in the frequency table is divided by the total of its row, for example:
89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70

                 No Errors   Errors   Total
Small Amount       89.47%    10.53%   100.0%
Medium Amount      71.43%    28.57%   100.0%
Large Amount       92.86%     7.14%   100.0%
Total              83.75%    16.25%   100.0%

Findings: Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.
Contingency Table based on Percentage Of Columns Total
Each count in the frequency table is divided by the total of its column, for example:
50.75% = 170 / 335
30.77% = 20 / 65

                 No Errors   Errors   Total
Small Amount       50.75%    30.77%   47.50%
Medium Amount      29.85%    61.54%   35.00%
Large Amount       19.40%     7.69%   17.50%
Total              100.0%    100.0%   100.0%

Findings: 61.54% of the invoices with errors are of medium size.
A Contingency Table – WHY?
1. Study patterns that may exist between the responses of two or
more categorical variables
2. Cross tabulates or tallies jointly the responses of the
categorical variables
3. For two variables the tallies for one variable are located in the
rows and the tallies for the second variable are located in the
columns
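The three percentage views of the invoice contingency table can be reproduced with a short sketch. The counts come from the frequency table above; the variable names and the helper pct are illustrative assumptions, not part of the lesson.

```python
# Counts from the invoice example: rows are invoice sizes, columns are error status.
counts = {
    "Small":  {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large":  {"No Errors": 65,  "Errors": 5},
}

def pct(part, whole):
    """Return part / whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100 * part / whole, 2)

# Row totals, column totals, and the grand total.
row_totals = {size: sum(cells.values()) for size, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for status, n in cells.items():
        col_totals[status] = col_totals.get(status, 0) + n
grand_total = sum(row_totals.values())

# The same counts expressed against three different denominators.
overall_pct = {s: {c: pct(n, grand_total) for c, n in cells.items()}
               for s, cells in counts.items()}
row_pct = {s: {c: pct(n, row_totals[s]) for c, n in cells.items()}
           for s, cells in counts.items()}
col_pct = {s: {c: pct(n, col_totals[c]) for c, n in cells.items()}
           for s, cells in counts.items()}

print(overall_pct["Small"]["No Errors"])  # share of all 400 invoices
print(row_pct["Medium"]["Errors"])        # share of medium invoices that have errors
print(col_pct["Medium"]["Errors"])        # share of invoices with errors that are medium
```

Which denominator to use depends on the question: the row view compares error rates across sizes, while the column view describes the makeup of the invoices with errors.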
ORGANIZING
1 CATEGORICAL Variables

Categorical Data: Tallying Data
One Categorical Variable: Summary Table
Two/More Categorical Variables: Contingency Table
ORGANIZING
2 NUMERICAL Variables
Age of Surveyed College Students in Ascending Order
Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42
Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Organizing Numerical Data – ORDERED ARRAY


▪ A sequence of data, in rank order, from the smallest value to the largest value
▪ Shows range (minimum value to maximum value)
▪ May help identify outliers (unusual observations)
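As a minimal sketch, an ordered array and its range can be produced as follows. The values are the day-student ages from the example; the unsorted input order is an assumption made purely for illustration, since the lesson lists the ages already sorted.

```python
# Day-student ages from the example, deliberately shuffled for illustration.
ages = [22, 18, 42, 17, 19, 25, 16, 20, 38, 18, 21, 17, 32, 19, 27, 18, 22, 20]

# The ordered array: values in rank order from smallest to largest.
ordered = sorted(ages)

# The range runs from the minimum value to the maximum value.
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print(f"min={smallest}, max={largest}, range={data_range}")
```

Scanning the top end of the ordered array (32, 38, 42) is one informal way to spot values that sit apart from the rest.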
A manufacturer of insulation randomly selects 20 winter days and records the
daily high temperature
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20       3          15%                3                     15%
20 but less than 30       6          30%                9                     45%
30 but less than 40       5          25%               14                     70%
40 but less than 50       4          20%               18                     90%
50 but less than 60       2          10%               20                    100%
Total                    20         100%               20                    100%

Organizing Numerical Data – FREQUENCY DISTRIBUTION


A summary table in which the data are arranged into numerically ordered classes
STEPS:
1. Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find range: 58 - 12 = 46
3. Select number of classes: 5 (usually between 5 and 15)
4. Compute class interval (width): 46 / 5 = 9.2, rounded up to 10
5. Determine class boundaries (limits):
Class 1: 10 to less than 20
Class 2: 20 to less than 30
Class 3: 30 to less than 40
Class 4: 40 to less than 50
Class 5: 50 to less than 60
6. Compute class midpoints: 15, 25, 35, 45, 55
7. Count observations & assign to classes
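The steps above can be sketched in code using the 20 daily high temperatures. Starting the first class at the multiple of the width just below the minimum is an assumption chosen to match the class boundaries listed in step 5; all variable names are illustrative.

```python
import math

temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Step 1: sort the raw data in ascending order.
data = sorted(temps)

# Step 2: find the range.
data_range = data[-1] - data[0]

# Steps 3 and 4: choose the number of classes, then round the width up.
num_classes = 5
width = math.ceil(data_range / num_classes)

# Step 5: class boundaries, each class being "lower to less than upper".
# Assumption: the first class starts at a multiple of the width at or below the minimum.
start = (data[0] // width) * width
classes = [(start + i * width, start + (i + 1) * width) for i in range(num_classes)]

# Step 6: class midpoints.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 7: count the observations that fall in each class.
frequencies = [sum(lower <= x < upper for x in data) for lower, upper in classes]

for (lower, upper), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lower} but less than {upper}: midpoint {mid:g}, frequency {freq}")
```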
A Frequency Distribution – WHY?
1. It condenses the raw data into a more useful form
2. It allows for a quick visual interpretation of the data
3. It enables the determination of the major characteristics of the
data set including where the data are concentrated / clustered
Frequency Distributions: Some Tips
1. Different class boundaries may provide different pictures for
the same data (especially for smaller data sets)
2. Shifts in data concentration may show up when different
class boundaries are chosen
3. As the size of the data set increases, the impact of
alterations in the selection of class boundaries is greatly
reduced
4. When comparing two or more groups with different sample
sizes, you must use either a relative frequency or a
percentage distribution
Relative & Percent Frequency Distribution
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Relative Frequency   Percentage
10 but less than 20       3             .15               15%
20 but less than 30       6             .30               30%
30 but less than 40       5             .25               25%
40 but less than 50       4             .20               20%
50 but less than 60       2             .10               10%
Total                    20            1.00              100%
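A short sketch of how the relative, percentage, and cumulative columns follow from the class frequencies. The frequencies are those of the temperature example; the variable names are illustrative.

```python
from itertools import accumulate

class_labels = [
    "10 but less than 20",
    "20 but less than 30",
    "30 but less than 40",
    "40 but less than 50",
    "50 but less than 60",
]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [f / total for f in frequencies]

# Percentage: relative frequency expressed per hundred.
percentage = [100 * f / total for f in frequencies]

# Cumulative columns: running totals of the frequencies and of the percentages.
cumulative_freq = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative_freq]

for row in zip(class_labels, frequencies, relative, percentage,
               cumulative_freq, cumulative_pct):
    label, f, rel, pct, cf, cp = row
    print(f"{label}: {f}  {rel:.2f}  {pct:.0f}%  {cf}  {cp:.0f}%")
```

Because relative frequencies and percentages do not depend on the sample size, they are the columns to use when comparing groups of different sizes.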
ORGANIZING
2 NUMERICAL Variables

Numerical Data: Ordered Array, Frequency Distributions, Cumulative Distributions
Sections 1 and 2: TABLE
VISUALIZING
3 CATEGORICAL Variables
Visualizing Categorical Data
Through Graphical Displays
Categorical Data: Visualizing Data
Summary Table For One Variable: Bar Chart, Pareto Chart, Pie Chart
Contingency Table For Two Variables: Side By Side Bar Chart
The Pareto Chart
Ordered Summary Table For Causes
Of Incomplete ATM Transactions
Cause                       Frequency   Percent   Cumulative Percent
Warped card jammed             365       50.41%        50.41%
Card unreadable                234       32.32%        82.73%
ATM malfunctions                32        4.42%        87.15%
ATM out of cash                 28        3.87%        91.02%
Invalid amount requested        23        3.18%        94.20%
Wrong keystroke                 23        3.18%        97.38%
Lack of funds in account        19        2.62%       100.00%
Total                          724      100.00%
Source: Data extracted from A. Bhalla, “Don’t Misuse the Pareto Principle,” Six Sigma Forum
Magazine, May 2009, pp. 15–18.
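The “Vital Few”: the two leading causes (warped card jammed and card unreadable) account for 82.73% of the incomplete transactions.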
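The ordered summary table behind a Pareto chart can be built as in the sketch below. The frequencies are those of the ATM example; the input order and the variable names are illustrative assumptions.

```python
# Causes of incomplete ATM transactions, in no particular order.
causes = {
    "ATM malfunctions": 32,
    "Card unreadable": 234,
    "Invalid amount requested": 23,
    "Lack of funds in account": 19,
    "Warped card jammed": 365,
    "ATM out of cash": 28,
    "Wrong keystroke": 23,
}

total = sum(causes.values())

# A Pareto chart orders the categories from most to least frequent.
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

# Build rows of (cause, frequency, percent, cumulative percent).
rows = []
running = 0
for cause, freq in ordered:
    running += freq
    rows.append((cause, freq,
                 round(100 * freq / total, 2),
                 round(100 * running / total, 2)))

for cause, freq, pct, cum_pct in rows:
    print(f"{cause:<26}{freq:>5}{pct:>8.2f}%{cum_pct:>9.2f}%")
```

The cumulative column is what the line in a Pareto chart plots; reading it from the top shows how few categories make up most of the total.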
Side By Side Bar Charts
The side by side bar chart represents the data from a contingency table.

                 No Errors   Errors   Total
Small Amount       50.75%    30.77%   47.50%
Medium Amount      29.85%    61.54%   35.00%
Large Amount       19.40%     7.69%   17.50%
Total              100.0%    100.0%   100.0%

[Chart: Invoice Size Split Out By Errors & No Errors. Horizontal side by side bars for Large, Medium, and Small within each of Errors and No Errors; axis from 0.0% to 70.0%.]

Invoices with errors are much more likely to be of medium size (61.54% vs 30.77%
and 7.69%)
VISUALIZING
4 NUMERICAL Variables
Stem and Leaf Display
Age of Surveyed College Students
Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42
Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Age of College Students

Day Students          Night Students
Stem | Leaf           Stem | Leaf
1    | 67788899       1    | 8899
2    | 0012257        2    | 0138
3    | 28             3    | 23
4    | 2              4    | 15
A stem-and-leaf display shows how the data are distributed and where concentrations of data exist.
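A minimal sketch of how the two displays can be generated from the student ages. It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for name, ages in (("Day Students", day), ("Night Students", night)):
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```

The length of each row of leaves works like a bar in a histogram, while the individual values remain readable.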
The Histogram
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20       3             .15                15
20 but less than 30       6             .30                30
30 but less than 40       5             .25                25
40 but less than 50       4             .20                20
50 but less than 60       2             .10                10
Total                    20            1.00               100

[Chart: Histogram: Age Of Students. Vertical axis: Frequency (0 to 8); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More.]
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class)
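As a rough sketch, the bars of a frequency histogram can be drawn as text, one row of asterisks per class, labelled by class midpoint. The frequencies are those in the table above; a real chart would normally come from a plotting tool, which is not assumed here.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]

lines = []
for (lower, upper), freq in zip(classes, frequencies):
    midpoint = (lower + upper) // 2
    # One asterisk per observation, so bar length equals class frequency.
    lines.append(f"{midpoint:>3} | {'*' * freq} ({freq})")

print("\n".join(lines))
```

Replacing the frequencies with class percentages would give the percentage histogram mentioned in the note above.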
The Frequency Polygon
Useful When Comparing Two or More Groups
The Percentage Polygon
The Polygon
1. A percentage polygon is formed by having the
midpoint of each class represent the data in that class
and then connecting the sequence of midpoints at their
respective class percentages.
2. The cumulative percentage polygon, or ogive,
displays the variable of interest along the X axis, and
the cumulative percentages along the Y axis.
3. Useful when there are two or more groups to compare.
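The points that a percentage polygon and an ogive connect can be sketched as follows, using the class percentages from the temperature example. Plotting the cumulative percentage at each upper class boundary, with a starting point of 0% at the first lower boundary, is an assumption of this sketch; it reads as "percentage of observations less than x".

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
percentages = [15, 30, 25, 20, 10]

# Percentage polygon: each class is represented by its midpoint.
polygon_points = [((lower + upper) / 2, pct)
                  for (lower, upper), pct in zip(classes, percentages)]

# Ogive (cumulative percentage polygon): running percentage total,
# here paired with the upper boundary of each class (assumption).
cumulative = list(accumulate(percentages))
ogive_points = [(classes[0][0], 0)] + [
    (upper, cum) for (_, upper), cum in zip(classes, cumulative)
]

print(polygon_points)
print(ogive_points)
```

Because both use percentages rather than counts, points from two groups of different sizes can be drawn on the same axes.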
Two Numerical Variables: Scatter Plot, Time-Series Plot
Scatter Plot
A scatter plot is used for numerical data consisting of paired observations taken from two numerical
variables, to examine possible relationships between those two variables.

Volume per day   Cost per day
      23             125
      26             140
      29             146
      33             160
      38             167
      42             170
      50             188
      55             195
      60             200

[Chart: Cost per Day vs. Production Volume. Vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (20 to 70).]
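The paired observations that a scatter plot displays can be assembled as below. The check that cost rises with every increase in volume is only a crude, illustrative way of describing the pattern in this particular data set; it is an assumption of this sketch and not a substitute for looking at the plot.

```python
volume_per_day = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost_per_day = [125, 140, 146, 160, 167, 170, 188, 195, 200]

# Each point on the scatter plot is one (x, y) pair: x = volume, y = cost.
points = sorted(zip(volume_per_day, cost_per_day))

# Crude pattern check: does cost increase whenever volume increases?
cost_rises_with_volume = all(
    later_cost > earlier_cost
    for (_, earlier_cost), (_, later_cost) in zip(points, points[1:])
)

for volume, cost in points:
    print(f"volume={volume:>2}  cost={cost}")
print("Cost rises with volume:", cost_rises_with_volume)
```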
Time Series Plot
A time series plot is used to study patterns in the values of a numeric variable over time.

Year   Number of Franchises
1996          43
1997          54
1998          60
1999          73
2000          82
2001          95
2002         107
2003          99
2004          95

[Chart: Number of Franchises, 1996-2004. Vertical axis: Number of Franchises (0 to 120); horizontal axis: Year (1994 to 2006).]

Numeric variable is measured on the vertical (y) axis and the time period is measured on the horizontal (x) axis
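A small sketch of the kind of pattern a time series plot makes visible, using the franchise counts from the example. Computing year-over-year changes and the peak year is an illustrative choice of this sketch, and the variable names are assumptions.

```python
franchises = {
    1996: 43, 1997: 54, 1998: 60, 1999: 73, 2000: 82,
    2001: 95, 2002: 107, 2003: 99, 2004: 95,
}

# Time goes on the horizontal axis, so process the years in order.
years = sorted(franchises)

# Change from each year to the next: positive means growth, negative means decline.
changes = {
    year: franchises[year] - franchises[previous]
    for previous, year in zip(years, years[1:])
}

# The year with the highest value of the series.
peak_year = max(franchises, key=franchises.get)

for year in years[1:]:
    print(f"{year}: {franchises[year]:>3} ({changes[year]:+d})")
print("Peak year:", peak_year)
```

The sign change in the differences after 2002 corresponds to the point where the plotted line turns downward.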
VISUALIZING
4 NUMERICAL Variables

Numerical Data
Ordered Array: Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
ORGANIZING (sections 1 and 2): TABLE
VISUALIZING (sections 3 and 4): CHART
