MathEng3_EDA_Lesson 4_Frequency Distribution and Graphs
MATHENG3 – Engineering Data Analysis
Engr. Jerime C. Jimenez, MSCE, SO2
Lesson Objectives
• Represent data using bar graphs, Pareto charts, time series graphs,
pie graphs, and dot plots.
When data are presented in their original form, they are called raw data.
Organizing Data
Ages of the Top 50 Wealthiest People
45 46 64 57 85
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Frequency Distribution
Ages of the Top 50 Wealthiest People
88 45 89 67 56
81 58 55 62 38
55 56 64 81 38
49 68 91 56 68
46 47 83 71 62
Frequency Distribution Table
CLASS LIMITS    TALLY      FREQUENCY
81 – 89         |||| ||    7
90 – 98         ||         2
Total                      50
Categorical and Grouped Frequency Distributions
Organize data using a frequency distribution
Categorical Frequency Distributions
The categorical frequency distribution is used for data that can be placed in
specific categories, such as nominal- or ordinal-level data.
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Step 2: Tally the data and place the results in column ‘Tally’.
Step 3: Count the tallies and place the results in column ‘Frequency’.
Step 4: Find the percentage of values in each class by using the formula $\% = \frac{f}{n} \cdot 100$, where f is the frequency of the class and n is the total number of values.
Conclusion: More people have type ‘O’ blood than any other type.
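The tally, count, and percentage steps can be sketched in a few lines of Python. This is only an illustration: the function name `categorical_distribution` is an assumption, and the data are the 25 blood types listed above.

```python
from collections import Counter

# Blood types of the 25 subjects, copied row by row from the data above.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

def categorical_distribution(values):
    """Return {category: (frequency, percent)}, with percent = f / n * 100."""
    n = len(values)
    counts = Counter(values)  # tallies and counts in one pass
    # f * 100 / n is the same formula, ordered to keep the result exact here.
    return {cat: (f, f * 100 / n) for cat, f in counts.items()}

distribution = categorical_distribution(blood_types)
for cat in ("A", "B", "O", "AB"):
    f, pct = distribution[cat]
    print(f"{cat:<3} {f:>2} {pct:5.1f}%")
```

The largest frequency belongs to type O, which agrees with the conclusion stated above.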
Grouped Frequency Distributions
When the range of the data is large, the data must be grouped into classes that are
more than one unit in width, in what is called a grouped frequency distribution.
CLASS LIMITS   CLASS BOUNDARIES   TALLY   FREQUENCY
58 – 64 57.5 – 64.5 | 1
65 – 71 64.5 – 71.5 |||| | 6
72 – 78 71.5 – 78.5 |||| |||| 10
79 – 85 78.5 – 85.5 |||| |||| |||| 14
86 – 92 85.5 – 92.5 |||| |||| || 12
93 – 99 92.5 – 99.5 |||| 5
100 – 106 99.5 – 106.5 || 2
Total 50
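The relationship between the class limits and the class boundaries in this table can be checked with a short sketch. It assumes the data are recorded to the nearest whole unit, so each boundary lies half a unit outside its limit; the helper name `class_boundaries` is illustrative.

```python
# Class limits and frequencies copied from the table above.
classes = [
    ((58, 64), 1), ((65, 71), 6), ((72, 78), 10), ((79, 85), 14),
    ((86, 92), 12), ((93, 99), 5), ((100, 106), 2),
]

def class_boundaries(lower, upper, unit=1):
    """Boundaries sit half a unit outside the class limits (assumed rule)."""
    return (lower - unit / 2, upper + unit / 2)

table = [(limits, class_boundaries(*limits), f) for limits, f in classes]

# Class width: difference between the lower limits of consecutive classes.
width = classes[1][0][0] - classes[0][0][0]
total = sum(f for _, f in classes)

for (lo, hi), (lb, ub), f in table:
    print(f"{lo:>3} – {hi:<3}  {lb:>5} – {ub:<5}  {f:>2}")
print("class width =", width, " total =", total)
```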
2. It is preferable but not absolutely necessary that the class width be an odd number.
This ensures that the midpoint of each class has the same place value as the data.
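As a quick check of this guideline, take the class 58 – 64 (width 7) from the grouped table and compare it with a class 58 – 63 of width 6. The second class is a made-up counterexample, and the symbol $X_m$ for the class midpoint is a notational assumption.

```latex
X_m = \frac{58 + 64}{2} = 61
\qquad \text{versus} \qquad
X_m = \frac{58 + 63}{2} = 60.5
```

With the odd width the midpoint is a whole number like the data; with the even width it picks up an extra decimal place.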
Age Age
10 – 20 10 – 20
20 – 30 21 – 31
30 – 40 32 – 42
40 – 50 43 – 53
127 117 114 110 120
120 116 115 121 117
134 118 118 113 105
118 122 117 120 110
CLASS #   CLASS LIMITS   CLASS BOUNDARIES   TALLY
2         105 – 109      104.5 – 109.5      |||| |||
3         110 – 114      109.5 – 114.5      |||| |||| |||| |||
4         115 – 119      114.5 – 119.5      |||| |||| |||
5         120 – 124      119.5 – 124.5      |||| ||
H = 19 and L = 12
Ungrouped Frequency Distributions
Example: The data shown here represent the number of miles per gallon (mpg) that
30 selected four-wheel drive sport utility vehicles obtained in city driving. Construct a
frequency distribution, and analyze the distribution.
STEP 3 Plot the frequencies for each class, and draw the vertical bars for the
histogram and the lines for the frequency polygon and ogive.
NOTE: Remember that the lines for the frequency polygon begin and end on
the x axis while the lines for the ogive begin on the x axis.
Histogram
The histogram is a graph that displays the data by using contiguous vertical bars
(unless the frequency of a class is 0) of various heights to represent the frequencies of
the classes.
CLASS #   CLASS BOUNDARIES   FREQUENCY
1         99.5 – 104.5       2
2         104.5 – 109.5      8
3         109.5 – 114.5      18
4         114.5 – 119.5      13
5         119.5 – 124.5      7
6         124.5 – 129.5      1
7         129.5 – 134.5      1
Total                        50
[Figure: histogram axes; x axis: class boundaries 99.5° to 134.5° in steps of 5°; y axis: frequency]
Step 2: Represent the frequency in the y axis and the class boundary on the x axis.
Example: Construct a histogram to represent the data shown for the record high
temperatures for each of the 50 states.
[Figure: histogram of the record high temperatures; bars drawn over the class boundaries 99.5° to 134.5°, with heights equal to the class frequencies]
Step 3: Using frequencies as the heights, draw vertical bars for each class.
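The bar geometry described in Steps 2 and 3 can be sketched as follows. This is a minimal text-only illustration rather than a drawn graph; the function name `histogram_bars` is an assumption, and the numbers are the class boundaries and frequencies for the record high temperatures.

```python
# Class boundaries and frequencies for the record high temperatures.
temp_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
temp_frequencies = [2, 8, 18, 13, 7, 1, 1]

def histogram_bars(boundaries, frequencies):
    """Return (left edge, width, height) for each contiguous bar."""
    return [
        (lo, hi - lo, f)
        for lo, hi, f in zip(boundaries, boundaries[1:], frequencies)
    ]

bars = histogram_bars(temp_boundaries, temp_frequencies)
for left, width, height in bars:
    # One '#' per observation stands in for the height of the bar.
    print(f"{left:>5}° – {left + width:>5}° | {'#' * height} {height}")
```

Because each bar starts where the previous one ends, the bars are contiguous, which is what distinguishes a histogram from an ordinary bar graph.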
Frequency Polygon
The frequency polygon is a graph that displays the data by using lines that connect
points plotted for the frequencies at the midpoints of the classes. The frequencies are
represented by the heights of the points.
Example: Construct a frequency polygon to represent the data shown for the record high
temperatures for each of the 50 states.
CLASS #   CLASS BOUNDARIES   FREQUENCY
1         99.5 – 104.5       2
2         104.5 – 109.5      8
3         109.5 – 114.5      18
4         114.5 – 119.5      13
5         119.5 – 124.5      7
6         124.5 – 129.5      1
7         129.5 – 134.5      1
[Figure: frequency polygon axes; x axis labeled with the class midpoints 102°, 107°, 112°, 117°, 122°, 127°, 132°; y axis: frequency]
Step 2: Draw the x and y axes. Label the x axis with the midpoint of each class, and then use
a suitable scale on the y axis for the frequencies.
Step 3: Using the midpoints for the x values and the frequencies as the y values, plot the
points.
Step 4: Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph, at the same distance that the previous and next midpoints
would be located.
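The points of the frequency polygon, including the two closing points on the x axis, can be computed as in the sketch below. It assumes equal class widths, as in the temperature table; the variable names are illustrative.

```python
# Class boundaries and frequencies for the record high temperatures.
temp_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
temp_frequencies = [2, 8, 18, 13, 7, 1, 1]

# Midpoint of each class: average of its lower and upper boundary.
midpoints = [(lo + hi) / 2 for lo, hi in zip(temp_boundaries, temp_boundaries[1:])]

# Equal class widths are assumed, so one width is enough.
class_width = temp_boundaries[1] - temp_boundaries[0]

# Step 4: close the polygon on the x axis where the previous and next
# midpoints would be located.
polygon_points = (
    [(midpoints[0] - class_width, 0)]
    + list(zip(midpoints, temp_frequencies))
    + [(midpoints[-1] + class_width, 0)]
)

for x, y in polygon_points:
    print(f"({x}°, {y})")
```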
Ogive
The ogive is a graph that represents the cumulative frequencies for the classes in a
frequency distribution. This type of graph is also called the cumulative frequency
graph.
Example: Construct an ogive to represent the data shown for the record high
temperatures for each of the 50 states.
CLASS #   CLASS BOUNDARIES   FREQUENCY
1         99.5 – 104.5       2
2         104.5 – 109.5      8
3         109.5 – 114.5      18
4         114.5 – 119.5      13
5         119.5 – 124.5      7
6         124.5 – 129.5      1
7         129.5 – 134.5      1
Total                        50
                   CUMULATIVE FREQUENCY
Less than 99.5     0
Less than 104.5    2
Less than 109.5    10
Less than 114.5    28
Less than 119.5    41
Less than 124.5    48
Less than 129.5    49
Less than 134.5    50
Step 2: Draw the x and y axes. Label the x axis with the class boundaries. Use an appropriate
scale for the y axis to represent the cumulative frequencies.
Step 3: Plot the cumulative frequency at each upper class boundary. Upper boundaries are
used since the cumulative frequencies represent the number of data values accumulated up to
the upper boundary of each class.
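The cumulative frequencies plotted in Step 3 are running totals of the class frequencies. A minimal sketch using the temperature data (variable names are illustrative):

```python
from itertools import accumulate

# Class boundaries and frequencies for the record high temperatures.
temp_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
temp_frequencies = [2, 8, 18, 13, 7, 1, 1]

# No values lie below the first lower boundary, so the ogive starts at 0;
# each later point is the running total up to that upper boundary.
cumulative = [0] + list(accumulate(temp_frequencies))
ogive_points = list(zip(temp_boundaries, cumulative))

for boundary, cf in ogive_points:
    print(f"Less than {boundary:<6} {cf:>2}")
```

The last cumulative frequency equals the total number of observations, which is a handy check on the arithmetic.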
$\text{Relative frequency} = \frac{\text{frequency count}}{\text{total number of observations}}$
Relative Frequency Graphs
Example: Construct a histogram, frequency polygon, and ogive using relative
frequencies for the distribution of the miles that 20 randomly selected runners ran
during a given week.
CLASS #   CLASS BOUNDARIES   FREQUENCY
1         5.5 – 10.5         1
2         10.5 – 15.5        2
3         15.5 – 20.5        3
4         20.5 – 25.5        5
5         25.5 – 30.5        4
6         30.5 – 35.5        3
7         35.5 – 40.5        2
Total                        20
Step 1: Convert each frequency to a proportion or relative frequency by dividing the frequency
for each class by the total number of observations.
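Step 1 can be sketched as below for the runners' data. Exact fractions are used so that the relative frequencies sum to exactly 1; that choice, like the variable names, is an assumption of this illustration rather than part of the lesson.

```python
from fractions import Fraction
from itertools import accumulate

# (lower boundary, upper boundary, frequency) for the runners' miles.
runner_classes = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]

n = sum(f for _, _, f in runner_classes)  # total number of observations

# Relative frequency of each class: f / n.
relative = [Fraction(f, n) for _, _, f in runner_classes]

# Cumulative relative frequencies, used for the relative-frequency ogive.
cumulative_relative = list(accumulate(relative))

for (lo, hi, f), rf, crf in zip(runner_classes, relative, cumulative_relative):
    print(f"{lo:>4} – {hi:<4}  {f}/{n} = {float(rf):.2f}  cumulative {float(crf):.2f}")
```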
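CLASS #   CLASS BOUNDARIES   REL. FREQ.
1         5.5 – 10.5         1/20 = 0.05
2         10.5 – 15.5        2/20 = 0.10
3         15.5 – 20.5        3/20 = 0.15
4         20.5 – 25.5        5/20 = 0.25
5         25.5 – 30.5        4/20 = 0.20
6         30.5 – 35.5        3/20 = 0.15
7         35.5 – 40.5        2/20 = 0.10
Total                        1.00
[Figure: relative frequency histogram; x axis: Miles, class boundaries 5.5 to 40.5; y axis: Relative frequency]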
CLASS #   CLASS BOUNDARIES   MIDPOINTS   REL. FREQ.
5         25.5 – 30.5        28          4/20 = 0.20
[Figure: relative frequency polygon; x axis: Miles, class midpoints 8, 13, 18, 23, 28, 33, 38; y axis: Relative frequency]
                  CUM. FREQ.   CUM. REL. FREQ.
Less than 25.5    11           0.55
[Figure: Ogive for Runners' Miles; x axis: Miles, class boundaries 5.5 to 40.5; y axis: Relative frequency]
Other Types of Graphs
[Figure: horizontal bar graph; x axis: amount spent, $0 to $800; y axis: spending categories]
Step 1: Draw and label the x and y axes. For the horizontal bar graph, place the frequency
scale on the x axis; for the vertical bar graph, place the frequency scale on the y axis.
Bar Graph
Example: The table shows the average money spent by first-year college students.
Draw a horizontal and vertical bar graph for the data.
Electronics   $728
Dorm décor    344
Shoes         72
[Figure: vertical bar graph titled "Average Amount Spent"; x axis: Electronics, Dorm décor, Clothing, Shoes; y axis: amount in dollars]
Bar graphs can also be used to compare data for two or more groups. These types of bar
graphs are called compound bar graphs.
Step 1: Arrange the data from the largest to smallest according to frequency.
Pareto Chart
Example: The data shown here consist of the number of police calls for specific
categories that a local municipality received in 2011. Draw and analyze a Pareto Chart
for the data.
Disabled vehicle               65
Driving under the influence    38
Loud noise/music/party         27
[Figure: Pareto chart titled "Police Calls"; y axis: number of calls]
Step 2: Draw and label the x and y axes. Then, draw the bars corresponding to the
frequencies.
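The ordering step that makes a bar graph a Pareto chart can be sketched as follows. Only the three categories legible in this copy of the table are used, and they are entered out of order on purpose; the dictionary name `police_calls` is illustrative.

```python
# Categories legible in the table above, deliberately listed out of order.
police_calls = {
    "Loud noise/music/party": 27,
    "Disabled vehicle": 65,
    "Driving under the influence": 38,
}

# Step 1: arrange the categories from largest to smallest frequency.
pareto_order = sorted(police_calls.items(), key=lambda item: item[1], reverse=True)

# Step 2 (text stand-in for the bars): one '#' per five calls, rounded down.
for category, count in pareto_order:
    print(f"{category:<28} {'#' * (count // 5)} {count}")
```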
Step 1: Draw and label the x and y axes. Label the x axis for years and the y axis for the
percent.
Time Series Graph
Year 1970 1980 1990 2000 2010
Percent 37 33 25 23 19
[Figure: time series graph; x axis: Year, 1970 to 2010; y axis: Percent]
When two or more data sets are compared on the same graph, the graph is called a compound
time series graph.
Step 1: Since there are 360° in a circle, the frequency for each class must be converted to a
proportional part of the circle. This conversion is done by using the formula $\text{Degrees} = \frac{f}{n} \cdot 360^\circ$.
Pie Graph
Example: This frequency distribution shows the number of pounds of each snack food
eaten during the Super Bowl. Construct a pie graph for the data.
Step 2: Each frequency must also be converted to a percentage: $\% = \frac{f}{n} \cdot 100$.
Step 3: Using a protractor and a compass, draw the graph, using the appropriate degree
measures found in Step 1, and label each section with the name and percentages.
Popcorn       3.8 million    46°     12.7%
Snack nuts    2.5 million    30°     8.3%
Total         30.0 million   360°    99.9%
[Figure: pie graph; labeled sections include Potato chips 38%, Tortilla chips 27%, Pretzels 14%]
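The degree and percentage conversions of Steps 1 and 2 can be checked against the popcorn and snack nuts rows with a short sketch. The function name `pie_slice` is an assumption; the frequencies and the 30.0 million total are the ones shown in the table.

```python
def pie_slice(f, n):
    """Return (degrees, percent) for a class with frequency f out of n."""
    return (f / n * 360, f / n * 100)

total_pounds = 30.0  # millions of pounds, from the total row
snacks = {"Popcorn": 3.8, "Snack nuts": 2.5}  # millions of pounds

slices = {name: pie_slice(f, total_pounds) for name, f in snacks.items()}

for name, (degrees, percent) in slices.items():
    # Degrees are rounded to whole degrees, percentages to one decimal place.
    print(f"{name:<10} {degrees:.0f}°  {percent:.1f}%")
```

Rounding each slice separately is why the percentages in the table add up to 99.9% rather than exactly 100%.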
Dotplot
A dotplot is a statistical graph in which each data value is plotted as a point (dot)
above the horizontal axis. Dotplots are used to show how the data values are
distributed and to see if there are any extremely high or low data values.
Step 1: Find the lowest and highest data values, and decide what scale to use on the
horizontal axis.
Example: The data show the number of named storms each year for the last 40 years.
Construct and analyze a dotplot for the data.
Lowest data: 4
Highest data: 28
5 10 15 20 25 30
Step 2: Draw a horizontal line, and draw the scale on the line.
Step 3: Plot each data value above the line. If the value occurs more than once, plot the other
point above the first point.
The graph shows that in most years there were between 6 and 16 named storms. In only 3
years were there 19 or more named storms.
Stem and Leaf Plots
The stem and leaf plot is a method of organizing data and is a combination of sorting
and graphing. It has the advantage over a grouped frequency distribution of retaining
the actual data values while showing them in graphical form.
Raw data:
52 62 51 50 69
58 77 66 53 57
75 56 55 67 73
79 59 68 65 72
57 51 63 69 75
65 53 78 66 55
Sorted data:
50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63,
65, 65, 66, 66, 67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79
Grouped by class:
50 – 54    50, 51, 51, 52, 53, 53
55 – 59    55, 55, 56, 57, 57, 58, 59
60 – 64    62, 63
65 – 69    65, 65, 66, 66, 67, 68, 69, 69
70 – 74    72, 73
75 – 79    75, 75, 77, 78, 79
Leading digit (stem)    Trailing digit (leaf)
5                       011233
5                       5567789
6                       23
6                       55667899
7                       23
7                       55789
When you analyze a stem and leaf plot, look for peaks and gaps in the distribution. See if the
distribution is symmetric or skewed. Check the variability of the data by looking at the
spread.
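The split-stem plot for this example can be reproduced from the raw data with the sketch below. It assumes two-digit values and splits each stem into leaves 0 to 4 and 5 to 9, matching the classes used above; the function name `split_stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Raw data from the example, row by row.
data = [52, 62, 51, 50, 69, 58, 77, 66, 53, 57, 75, 56, 55, 67, 73,
        79, 59, 68, 65, 72, 57, 51, 63, 69, 75, 65, 53, 78, 66, 55]

def split_stem_and_leaf(values):
    """Return (stem, leaves) rows, each stem split into leaves 0-4 and 5-9."""
    rows = defaultdict(list)
    for value in sorted(values):           # sorting first keeps leaves ordered
        stem, leaf = divmod(value, 10)     # tens digit is the stem, units the leaf
        rows[(stem, leaf >= 5)].append(leaf)
    return [
        (stem, "".join(str(leaf) for leaf in leaves))
        for (stem, _), leaves in sorted(rows.items())
    ]

plot = split_stem_and_leaf(data)
for stem, leaves in plot:
    print(f"{stem} | {leaves}")
```

Reading the printed rows from top to bottom shows the peaks in the upper 50s and upper 60s and the thinner rows in the low 60s and low 70s.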
Stem and Leaf Plot
Example 2: The number of stories in two selected samples of tall buildings in Atlanta
and Philadelphia is shown. Construct a back-to-back stem and leaf plot, and compare
the distributions.
Step 2: Construct a stem and leaf plot, using the same digits as stems. Place the digits for the
leaves for Atlanta on the left side of the stem and the digits for the leaves for Philadelphia on
the right side.
Step 3: Compare the distributions. The buildings in Atlanta have a large variation in the
number of stories per building. Although both distributions are peaked in the 30- to 39-story
class, Philadelphia has more buildings in this class. Atlanta has more buildings that have 40
or more stories than Philadelphia does.