Frequency Distribution and Graphs
Learning Objectives
Upon completion of Chapter 2, you will be able to:
I. Basic Vocabulary
• Raw data is data in its original form.
• A frequency distribution is the organization of raw data into a table using categories for the
data in one column and the frequencies for each category in the second column.
• Frequency (f) is the tally or count of the number of data values in each class.
• Relative frequency (f/n) is the tally or count of the number of data values in each class
divided by the total number of data values.
• Cumulative frequency is the tally or count of the number of data values in a class plus the
frequencies for all lower classes.
• Cumulative relative frequency is the cumulative frequency divided by the total number of
data values.
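As an illustration of the four quantities defined above, here is a minimal sketch in Python. It borrows the class frequencies 4, 9, 20, 40, 24 from the class-boundary table in these notes; the variable names are assumptions made for this example.

```python
from itertools import accumulate

# Class frequencies (f), lowest class first, taken from the table in these notes.
frequencies = [4, 9, 20, 40, 24]
n = sum(frequencies)  # total number of data values

relative = [f / n for f in frequencies]              # f / n
cumulative = list(accumulate(frequencies))           # f plus all lower-class frequencies
cumulative_relative = [cf / n for cf in cumulative]  # cumulative frequency / n

for f, rf, cf, crf in zip(frequencies, relative, cumulative, cumulative_relative):
    print(f"{f:>3} {rf:6.3f} {cf:>3} {crf:6.3f}")
```

The relative frequencies should sum to 1, and the last cumulative frequency should equal n.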
b) Ungrouped frequency distribution (for data with a small range) is a chart of each
possible individual value of data in the first column and the count of the amount of
data with that value in the second column.
Dr. Janet Winter, [email protected] Stat 200 Page 1
B. Examples of Frequency Distributions
I. Qualitative or Categorical Frequency Distributions
• Create a table with gender (Male/Female) in the first column and the count of the
number of men and women in the class in the second column.
• Create a table with level of Employment (none, part time, full time) in the first
column and the count of the number of students in the class in each category in the
second column.
• In the first column, list the numbers 0, 1, 2, 3, 4… representing the number of cars
in your family. In the second column, list the count of the number of students with
that many cars in their family.
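The car-count example could be tallied as in the sketch below. The survey responses are hypothetical values invented for illustration, not data from the course.

```python
from collections import Counter

# Hypothetical responses: number of cars in each student's family.
cars = [1, 2, 2, 0, 3, 1, 2, 1, 4, 2]

counts = Counter(cars)

# One row per possible value between the smallest and largest response,
# including any value that nobody reported (frequency 0).
table = [(value, counts.get(value, 0)) for value in range(min(cars), max(cars) + 1)]

for value, f in table:
    print(value, f)
```

The same `Counter` approach works for categorical data such as employment level; only the row labels change.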
I. Find the class limits (same number of decimal places as the data).
II. Find upper class boundaries by adding ½ unit to the upper class limit of each class.
III. Find the lower class boundaries by subtracting ½ unit from the lower class limit of each
class.
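The ½-unit rule in steps II and III can be sketched as follows. The class limits shown are assumed whole-number limits chosen for illustration, and `class_boundaries` is a hypothetical helper name.

```python
def class_boundaries(limits, unit=1):
    """Return (lower boundary, upper boundary) for each (lower limit, upper limit).

    `unit` is the smallest step in the data: 1 for whole numbers,
    0.1 for data recorded to one decimal place, and so on.
    """
    half = unit / 2
    return [(lower - half, upper + half) for lower, upper in limits]

# Assumed class limits for whole-number data.
limits = [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]
boundaries = class_boundaries(limits)
print(boundaries)
```

With these limits, each upper boundary coincides with the next class's lower boundary, so the classes leave no gaps.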
IV. Graphs
A. The Role of Graphs
• Presents the data in pictorial form.
• Attracts attention in a publication or a presentation.
B. Types of Graphs
• Bar graph – graph of the frequency distribution for qualitative or categorical data.
• Histogram – graph of the frequency distribution for quantitative data.
• Ogive – graph of the cumulative frequency for quantitative data.
• Frequency polygon – graph of the frequency for quantitative data.
Vertical (or horizontal) bars are proportional to the frequencies for each class.
Class Boundaries    Frequency
0.5 – 20.5          4
20.5 – 40.5         9
40.5 – 60.5         20
60.5 – 80.5         40
80.5 – 100.5        24
Note: The scale on the non-frequency axis is either the class boundaries or class midpoints.
• Class midpoints are located in the middle of the bars and class boundaries are
located at the ends of the bars.
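A rough text rendering of the histogram for the class-boundary table in these notes is sketched below. Each midpoint is computed as the average of the two class boundaries; the one-`#`-per-data-value scale is an arbitrary choice for this example.

```python
boundaries = [(0.5, 20.5), (20.5, 40.5), (40.5, 60.5), (60.5, 80.5), (80.5, 100.5)]
frequencies = [4, 9, 20, 40, 24]

# The midpoint of a class is the average of its lower and upper boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

# One '#' per data value, so bar length is proportional to class frequency.
for (lower, upper), mid, f in zip(boundaries, midpoints, frequencies):
    bar = "#" * f
    print(f"{lower:>5} - {upper:<5}  midpoint {mid:>4}  {bar} ({f})")
```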
D. Frequency Polygon
Scale: class midpoints
• Plot the frequency of each class at its midpoint, i.e., (class midpoint, class frequency).
• The scale is sequential midpoints.
• Extend the midpoint scale once below the first class midpoint and once above the last
class midpoint. Label the extensions.
• Plot a point at each extension with a frequency of zero (extension, 0).
• Connect all of the points with line segments forming a polygon.
Note: Remember that a polygon is a many-sided closed figure. The extension points and the
axis make the figure closed.
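The points of a frequency polygon can be assembled as in this sketch, which uses the class frequencies from the table in these notes and the midpoints of those classes. It assumes equal class widths, so the extensions sit one midpoint step beyond each end.

```python
midpoints = [10.5, 30.5, 50.5, 70.5, 90.5]
frequencies = [4, 9, 20, 40, 24]

# Distance between consecutive midpoints (equal class widths assumed).
step = midpoints[1] - midpoints[0]

# One extension below the first midpoint and one above the last, each with frequency 0.
points = (
    [(midpoints[0] - step, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + step, 0)]
)
print(points)
```

Joining consecutive points with line segments gives the polygon; the two zero-frequency points bring it back down to the axis.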
Note: Cumulative frequency for each upper boundary is the sum of the frequency in that
class plus all lower class frequencies.
Start at the point (lowest lower boundary, 0), then plot the cumulative frequency at the
upper class boundary of each class. End at the point (highest upper boundary, n), where n
is the total number of data values.
[Figure: ogive. Vertical axis: Number of Students (0 to 90). Horizontal axis: class boundaries 0.5, 20.5, 40.5, 60.5, 80.5, 100.5.]
Note: The line segments connect at (0.5, 0), (20.5, 4), (40.5, 13), (60.5, 33), (80.5, 73),
(100.5, 97), which are (lowest lower boundary, 0), (first upper boundary, frequency for
first class), (second upper boundary, cumulative frequency through second class), …, (last
upper boundary, total frequency).
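The ogive points listed in the note can be reproduced from the class frequencies with a running total, as in this minimal sketch (variable names are assumptions for the example).

```python
from itertools import accumulate

# Lowest lower boundary, followed by the upper boundary of each class.
boundaries = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5]
frequencies = [4, 9, 20, 40, 24]

# Cumulative frequency is 0 at the lowest lower boundary, then a running total.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))
print(ogive_points)
```

The last point pairs the highest upper boundary with n = 97, the total number of data values.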
[Figure: bar graph. Vertical axis: Frequency (0 to 30). Categories: Auto, Bus, Trolley, Train, Walk.]
[Figure: time series graph. Vertical axis: 35 to 50. Horizontal axis: Time (12, 1, 2, 3, 4, 5).]
IV. A pie graph is a circle divided into sections proportional to the percentage in each category.
[Figure: pie graph. Recoverable sector labels: Pretzels 14%, Tortilla Chips 27%.]
Note: The degree for a segment is the relative frequency for the segment times 360°.
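The degree calculation can be checked with a short sketch. It uses the two percentages that survive from the pie graph in these notes (Pretzels 14%, Tortilla Chips 27%); `sector_degrees` is a hypothetical helper name.

```python
def sector_degrees(relative_frequency):
    """Central angle of a pie-graph sector, in degrees."""
    return relative_frequency * 360

# Relative frequencies recovered from the pie graph labels.
shares = {"Pretzels": 0.14, "Tortilla Chips": 0.27}

for label, share in shares.items():
    print(f"{label}: {sector_degrees(share):.1f} degrees")
```

Rounded to one decimal place, the Pretzels sector spans 50.4° and the Tortilla Chips sector spans 97.2°.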
V. A stem-and-leaf plot
• Used for quantitative data.
• The left part of each data value (the stem) is listed vertically, in order.
• The rightmost digit of each data value (the leaf) is listed horizontally, in order, to the
right of its stem.
• Retains the actual data while showing it in graphic form.
b) Example:
Data: 123 125 131 113 101 102 104 111
114 111 132 133 141 142 143 132
Stem Plot:
10 | 1 2 4
11 | 1 1 3 4
12 | 3 5
13 | 1 2 2 3
14 | 1 2 3
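The stem plot above can be generated from the data with the sketch below, which splits each value into a stem (all digits but the last) and a leaf (the last digit). The `stem | leaves` output layout is a choice made for this example.

```python
from collections import defaultdict

data = [123, 125, 131, 113, 101, 102, 104, 111,
        114, 111, 132, 133, 141, 142, 143, 132]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # leaf = rightmost digit, stem = the remaining digits
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

This sketch lists only stems that occur in the data; a stem with no data values would be skipped rather than shown as an empty row.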
2. A statistics professor gives a very easy 100 point test, with the highest score being 98 and
the lowest score being 71. We want to divide this data into categories. Then, a reasonable
width of categories could be
a) 1
b) 5
c) 10
3. The manager of a computer store wishes to track how many computer monitors of
different screen sizes are sold during the week. He tallies the sales by the following
categories: less than 15”, 15-15.9”, 16-16.9”, 17-17.9”, 18-18.9”, 19-19.9”, and 20” and
above. The best way to represent the data is using a
a) Histogram.
b) Frequency polygon.
c) Ogive.
d) All of the above.
5. If we would like to display all the areas of the states in the United States and we only care
about the states with the largest areas, then an appropriate graph would be a
a) Pareto chart.
b) Time series graph.
c) Pie graph.
6. The dean of engineering at a school wishes to track the number of students with
engineering majors over the past 10 years. An appropriate graph would be a
a) Pareto chart.
b) Time series graph.
c) Pie graph.
• Pareto charts and bar graphs are frequency graphs for qualitative variables.
• Time series graphs are used to show a pattern or trend that occurs over time.
• Pie graphs are used to show the relationship between the parts and the whole for
qualitative or categorical data.
• Data can be organized in meaningful ways using frequency distributions and graphs.
2. A statistics professor gives a very easy 100 point test, with the highest score being a 98 and
the lowest score being 71. We want to divide this data into categories. Then, a reasonable
width of categories could be
b) 5
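One way to see why a width of 5 is reasonable is to count how many classes each candidate width needs, as in this sketch. It assumes whole-number scores, so 71 through 98 covers 28 possible values. The comparison against a range of roughly 5 to 20 classes is a common textbook guideline, not something stated in these notes.

```python
import math

high, low = 98, 71
possible_scores = high - low + 1  # 28 whole-number scores from 71 to 98

# Number of classes needed to cover every score at each candidate width.
classes_by_width = {width: math.ceil(possible_scores / width) for width in (1, 5, 10)}
print(classes_by_width)
```

A width of 1 needs 28 classes and a width of 10 needs only 3, while a width of 5 needs 6 classes.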
3. The manager of a computer store wishes to track how many computer monitors of
different screen sizes are sold during the week. He tallies the sales by the following
categories: less than 15”, 15-15.9”, 16-16.9”, 17-17.9”, 18-18.9”, 19-19.9”, and 20” and
above. The best way to represent the data is using a
d) All of the above.
5. If we would like to display the areas of the states in the United States and we only care
about the states with the largest areas, then an appropriate graph would be a
a) Pareto chart.
6. The dean of engineering at a school wishes to track the number of students with
engineering majors over the past 10 years. An appropriate graph would be a
b) Time series graph.