3. Lacture Note 03_Frequency Distributions and Graphical Representation

Uploaded by

Sabbir Ahmed
Copyright
© All Rights Reserved

Frequency Distributions and Graphical Representations

Frequency Distribution
When conducting a statistical study, the researcher must gather data for the particular variable
under study. For example, if a researcher wishes to study the number of people who were bitten
by poisonous snakes in a specific geographic area over the past several years, he or she has to
gather data from various doctors, hospitals, or health departments.

To describe situations, draw conclusions, or make inferences about events, the researcher must
organize the data in some meaningful way. The most convenient method of organizing data is to
construct a frequency distribution.

A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Example
Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the
world. The researcher first would have to get the data on the ages of the people. In this case, these
ages are listed in Forbes Magazine. When the data are in original form, they are called raw data
and are listed next.

49 57 38 73 81 78 82 43 64 67 52 56 81
77 79 85 40 85 59 80 60 71 57 61 69 61
83 90 87 74 74 59 76 65 69 54 56 69 68
78 65 85 49 69 61 48 81 68 37 43
A frequency distribution consists of classes and their corresponding frequencies. Each raw data
value is placed into a quantitative or qualitative category called a class. The frequency of a class
then is the number of data values contained in a specific class. A frequency distribution is shown
for the preceding data set.

The classes in this distribution are 35–41, 42–48, etc. These values are called class limits. The data
values 35, 36, 37, 38, 39, 40, 41 can be tallied in the first class; 42, 43, 44, 45, 46, 47, 48 in the
second class; and so on.
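
The tallying described above can be sketched in a few lines of Python. This is only an illustration: it assumes the first class starts at 35 and that every class is 7 units wide, as implied by the limits 35–41 and 42–48, and the helper name class_of is made up for this example.

```python
from collections import Counter

# Ages of the 50 wealthiest people, copied from the raw data above.
ages = [
    49, 57, 38, 73, 81, 78, 82, 43, 64, 67, 52, 56, 81,
    77, 79, 85, 40, 85, 59, 80, 60, 71, 57, 61, 69, 61,
    83, 90, 87, 74, 74, 59, 76, 65, 69, 54, 56, 69, 68,
    78, 65, 85, 49, 69, 61, 48, 81, 68, 37, 43,
]

LOWEST_LIMIT = 35  # lower limit of the first class (35-41)
WIDTH = 7          # assumed class width, inferred from 35-41 and 42-48


def class_of(age):
    """Return the (lower limit, upper limit) of the class that contains age."""
    lower = LOWEST_LIMIT + ((age - LOWEST_LIMIT) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)


# The frequency of a class is the number of data values that fall in it.
frequencies = Counter(class_of(age) for age in ages)

for (lower, upper), f in sorted(frequencies.items()):
    print(f"{lower}-{upper}: {f}")
```

Each age lands in exactly one class, so the frequencies add up to the 50 data values.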
Two types of frequency distributions that are most often used are the categorical frequency
distribution and the grouped frequency distribution.

Categorical Frequency Distributions


The categorical frequency distribution is used for data that can be placed in specific categories, such
as nominal- or ordinal-level data. For example, data such as political affiliation, religious affiliation,
or major field of study would use categorical frequency distributions.

Example

Twenty-five army inductees were given a blood test to determine their blood type. The data set is

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Since the data are categorical, discrete classes can be used. There are four blood types: A, B, O,
and AB. These types will be used as the classes for the distribution.

The procedure for constructing a frequency distribution for categorical data is given next.

Step 1 Make a table as shown.

A        B        C            D
Class    Tally    Frequency    Percent
A
B
O
AB

Step 2 Tally the data and place the results in column B.

Step 3 Count the tallies and place the results in column C.

Step 4 Find the percentage of values in each class by using the formula % = (f/n) × 100, where
f = frequency of the class and n = total number of values. For example, in the class of type A
blood, the percentage is % = (5/25) × 100 = 20%.
Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of graphs such as pie graphs.
Also, the decimal equivalent of a percent is called a relative frequency.

Step 5 Find the totals for columns C (frequency) and D (percent). The completed table is shown.
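
As a rough check on Steps 2 to 5, the blood type data can be counted with Python's collections.Counter. The variable names are illustrative only, and the percentage is computed as 100 * f / n, which is the same formula written in an order that avoids floating-point rounding noise.

```python
from collections import Counter

# Blood types of the 25 inductees, copied from the data set above.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

counts = Counter(blood_types)   # Steps 2 and 3: tally and count
n = len(blood_types)            # total number of values

rows = []
for blood_type in ["A", "B", "O", "AB"]:
    f = counts[blood_type]
    percent = 100 * f / n       # Step 4: % = (f/n) x 100
    rows.append((blood_type, f, percent))

# Step 5: print the table with totals for frequency and percent.
print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_type, f, percent in rows:
    print(f"{blood_type:<6}{f:>10}{percent:>9.0f}")
print(f"{'Total':<6}{n:>10}{sum(p for _, _, p in rows):>9.0f}")
```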

Grouped Frequency Distributions


When the range of the data is large, the data must be grouped into classes that are more than one
unit in width, in what is called a grouped frequency distribution. For example, a distribution of
the number of hours that boat batteries lasted is the following.

In this distribution, the values 24 and 30 of the first class are called class limits. The lower class limit
is 24; it represents the smallest data value that can be included in the class. The upper class limit is
30; it represents the largest data value that can be included in the class. The numbers in the second
column are called class boundaries. These numbers are used to separate the classes so that there
are no gaps in the frequency distribution. The gaps are due to the limits; for example, there is a
gap between 30 and 31.

Students sometimes have difficulty finding class boundaries when given the class limits. The basic
rule of thumb is that the class limits should have the same decimal place value as the data, but the
class boundaries should have one additional place value and end in a 5. For example, if the values
in the data set are whole numbers, such as 24, 32, and 18, the limits for a class might be 31–37,
and the boundaries are 30.5–37.5. Find the boundaries by subtracting 0.5 from 31 (the lower class
limit) and adding 0.5 to 37 (the upper class limit).

Lower limit - 0.5 = 31 - 0.5 = 30.5 = lower boundary

Upper limit + 0.5 = 37 + 0.5 = 37.5 = upper boundary

If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class hypothetically might be
7.8–8.8, and the boundaries for that class would be 7.75–8.85. Find these values by subtracting
0.05 from 7.8 and adding 0.05 to 8.8.

Finally, the class width for a class in a frequency distribution is found by subtracting the lower (or
upper) class limit of one class from the lower (or upper) class limit of the next class. For example,
the class width in the preceding distribution on the duration of boat batteries is 7, found from 31
- 24 = 7.

The class width can also be found by subtracting the lower boundary from the upper boundary
for any given class. In this case, 30.5 - 23.5 = 7.
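
The boundary and width rules above can be written as two small functions. This is a sketch under stated assumptions: the function names are made up for illustration, the limits are passed as strings, and Python's decimal module is used so that values such as 7.75 and 8.85 come out exact rather than as binary floating-point approximations.

```python
from decimal import Decimal


def class_boundaries(lower_limit, upper_limit, unit):
    """Return (lower boundary, upper boundary) for one class.

    unit is the precision of the data: "1" for whole numbers, "0.1" for
    tenths. Half a unit is subtracted from the lower limit and added to
    the upper limit.
    """
    half_unit = Decimal(unit) / 2
    return (Decimal(lower_limit) - half_unit, Decimal(upper_limit) + half_unit)


def class_width(lower_limit, next_lower_limit):
    """Width = lower limit of the next class minus lower limit of this class."""
    return Decimal(next_lower_limit) - Decimal(lower_limit)


print(class_boundaries("31", "37", "1"))      # whole-number data
print(class_boundaries("7.8", "8.8", "0.1"))  # data in tenths
print(class_width("24", "31"))                # boat battery classes

# The width can also be found from the boundaries of a single class.
low, high = class_boundaries("24", "30", "1")
print(high - low)
```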

The researcher must decide how many classes to use and the width of each class. To construct a
frequency distribution, follow these rules:

1. There should be between 5 and 20 classes. Although there is no hard-and-fast rule for the
number of classes contained in a frequency distribution, it is of the utmost importance to
have enough classes to present a clear description of the collected data.
2. It is preferable but not absolutely necessary that the class width be an odd number. This
ensures that the midpoint of each class has the same place value as the data.
The class midpoint Xm is obtained by adding the lower and upper boundaries and dividing
by 2, or adding the lower and upper limits and dividing by 2:
Xm = (lower boundary + upper boundary)/2 or Xm = (lower limit + upper limit)/2
The midpoint is the numeric location of the center of the class. Midpoints are necessary for
graphing. If the class width is an even number, the midpoint is in tenths.
3. The classes must be mutually exclusive. Mutually exclusive classes have nonoverlapping class
limits so that data cannot be placed into two classes. Many times, frequency distributions
such as
Age
10–20
20–30
30–40
40–50
are found in the literature or in surveys. If a person is 40 years old, into which class should
she or he be placed? A better way to construct a frequency distribution is to use classes such
as
Age
10–20
21–31
32–42
43–53
4. The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no gaps in a frequency distribution.
The only exception occurs when the class with a zero frequency is the first or last class. A
class with a zero frequency at either end can be omitted without affecting the distribution.
5. The classes must be exhaustive. There should be enough classes to accommodate all the
data.
6. The classes must be equal in width. This avoids a distorted view of the data.
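
Some of these rules can be checked mechanically. The sketch below covers rules 3 to 6 for whole-number data only; the function name check_classes and the sample ages are hypothetical and serve just to exercise the two sets of age classes shown in rule 3.

```python
def check_classes(classes, data):
    """Check a list of (lower limit, upper limit) classes for whole-number data.

    classes must be listed in ascending order. Returns one True/False
    result per rule.
    """
    pairs = list(zip(classes, classes[1:]))  # each class with the next one
    widths = {nxt[0] - cur[0] for cur, nxt in pairs}
    return {
        # Rule 3: no value may belong to two classes.
        "mutually_exclusive": all(cur[1] < nxt[0] for cur, nxt in pairs),
        # Rule 4: each class starts one unit after the previous one ends.
        "no_gaps": all(nxt[0] - cur[1] == 1 for cur, nxt in pairs),
        # Rule 5: every data value fits in some class.
        "exhaustive": all(any(lo <= x <= hi for lo, hi in classes) for x in data),
        # Rule 6: every class has the same width.
        "equal_width": len(widths) == 1,
    }


ages = [12, 20, 33, 40, 47]  # hypothetical sample, not from the notes

overlapping = [(10, 20), (20, 30), (30, 40), (40, 50)]
better = [(10, 20), (21, 31), (32, 42), (43, 53)]

print(check_classes(overlapping, ages))
print(check_classes(better, ages))
```

The first set fails the mutually exclusive check because 20, 30, and 40 each belong to two classes; the second set passes all four checks.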

Guidelines for Creating Frequency Distributions from Grouped Data:


1. Find the range of values—the difference between the highest and lowest values.

2. Decide how many intervals to use (usually choose between 6 and 20 unless the data set is very
large). The choice should be based on how much information is in the distribution you wish to
display.

3. To determine the width of the interval, divide the range by the number of class intervals selected.
Round this result as necessary.

4. Be sure that the class categories do not overlap!

5. Most of the time, use equally spaced intervals, which are simpler than unequally spaced intervals
and avoid interpretation problems. In some cases, unequal intervals may be helpful to emphasize
certain details. Sometimes wider intervals are needed where the data are sparse.

Exercise
These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50
states. Construct a grouped frequency distribution for the data using 7 classes. (Try this yourself.)

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Cumulative Frequency Distribution
Sometimes it is necessary to use a cumulative frequency distribution. A cumulative frequency
distribution is a distribution that shows the number of data values less than or equal to a specific
value (usually an upper boundary). The values are found by adding the frequencies of the classes
less than or equal to the upper class boundary of a specific class. This gives an ascending cumulative
frequency. In the record high temperature example, the cumulative frequency for the first class is
0 + 2 = 2; for the second class it is 0 + 2 + 8 = 10; for the third class it is 0 + 2 + 8 + 18 = 28.
Naturally, a shorter way to do this is to add the cumulative frequency of the preceding class to the
frequency of the given class. For example, the cumulative frequency for the number of data values
less than 114.5 can be found by adding 10 + 18 = 28. The cumulative frequency distribution for the
data in this example is as follows:

                   Cumulative frequency
Less than 99.5       0
Less than 104.5      2
Less than 109.5     10
Less than 114.5     28
Less than 119.5     41
Less than 124.5     48
Less than 129.5     49
Less than 134.5     50

Cumulative frequencies are used to show how many data values are accumulated up to and
including a specific class. In this example, 28 of the total record high temperatures are less than or
equal to 114°F. Forty-eight of the total record high temperatures are less than or equal to 124°F.
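
The running totals can be reproduced with itertools.accumulate. The sketch assumes the seven classes 100–104, 105–109, and so on up to 130–134 for the record high temperatures, and tallies the class frequencies directly from the raw data.

```python
from itertools import accumulate

# Record high temperatures (degrees Fahrenheit) for the 50 states.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

lower_limits = range(100, 135, 5)  # 100, 105, ..., 130 (assumed classes)

# Frequency of each class: values from the lower limit to lower limit + 4.
frequencies = [sum(lo <= t <= lo + 4 for t in temps) for lo in lower_limits]

# Each cumulative frequency is the previous one plus the class frequency.
cumulative = list(accumulate(frequencies))

for lo, cf in zip(lower_limits, cumulative):
    print(f"Less than {lo + 4.5}: {cf}")
```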

After the raw data have been organized into a frequency distribution, they can be analyzed by looking
for peaks and extreme values. The peaks show which class or classes have the most data values
compared to the other classes. Extreme values, called outliers, are data values that are extremely
large or small relative to the other data values.

Different types of distributions are used in statistics and are helpful when one is organizing and
presenting data.

The reasons for constructing a frequency distribution are as follows:


1. To organize the data in a meaningful, intelligible way.

2. To enable the reader to determine the nature or shape of the distribution.

3. To facilitate computational procedures for measures of average and spread.

4. To enable the researcher to draw charts and graphs for the presentation of data.

5. To enable the reader to make comparisons among different data sets.


Bar Graphs
When the data are qualitative or categorical, bar graphs can be used to represent the data. A bar
graph can be drawn using either horizontal or vertical bars.

A bar graph represents the data by using vertical or horizontal bars whose heights or lengths
represent the frequencies of the data.

The table shows the average money spent by first-year college students. Draw a horizontal and
vertical bar graph for the data.

Electronics $728
Dorm decor 344
Clothing 141
Shoes 72

1. Draw and label the x and y axes. For the horizontal bar graph place the frequency scale on the
x axis, and for the vertical bar graph place the frequency scale on the y axis.

2. Draw the bars corresponding to the frequencies.
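
For a quick look at the data without any plotting library, a horizontal bar graph can be approximated in plain text. The function name and the scale of 20 dollars per character are arbitrary choices made for this sketch.

```python
# Average money spent by first-year college students (dollars).
spending = {"Electronics": 728, "Dorm decor": 344, "Clothing": 141, "Shoes": 72}

SCALE = 20  # dollars represented by one '#' character (arbitrary choice)


def horizontal_bars(data, scale):
    """Return one text line per category; bar length is proportional to value."""
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / scale)
        lines.append(f"{label:<12}|{bar} ${value}")
    return lines


bars = horizontal_bars(spending, SCALE)
print("\n".join(bars))
```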

The Histogram
The histogram is a graph that displays the data by using contiguous vertical bars (unless the
frequency of a class is 0) of various heights to represent the frequencies of the classes.

In other words, a frequency histogram is simply a bar graph with the continuous class intervals
on the x axis and the frequency of occurrence of the values in each interval on the y axis.
Example: Construct a frequency distribution to represent the data shown for the record high
temperatures for each of the 50 states. Then construct a histogram.

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
The procedure for constructing a grouped frequency distribution for numerical data follows.

Step 1 Determine the classes.

Find the highest value and lowest value: H = 134 and L = 100.

Find the range: R = highest value - lowest value = H - L, so R = 134 - 100 = 34

Select the number of classes desired (usually between 5 and 20). In this case, 7 is arbitrarily
chosen.

Find the class width by dividing the range by the number of classes. So, Width = R/number of
classes = 34/7 ≈ 4.9

Round the answer up to the nearest whole number if there is a remainder: 4.9 rounds up to 5.

Select a starting point for the lowest class limit. This can be the smallest data value or any
convenient number less than the smallest data value. In this case, 100 is used. Add the width to
the lowest score taken as the starting point to get the lower limit of the next class. Keep adding
until there are 7 classes, as shown, 100, 105, 110, etc.

Subtract one unit from the lower limit of the second class to get the upper limit of the first class.
Then add the width to each upper limit to get all the upper limits. 105 – 1= 104

The first class is 100–104, the second class is 105–109, etc.

Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each
upper class limit: 99.5–104.5, 104.5–109.5, etc.
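
Step 1 can be sketched as a small function for whole-number data. The name build_classes and its parameters are illustrative, and rounding up is done with math.ceil. When the range divides evenly, the caller should still confirm that the highest value falls inside the last class.

```python
import math


def build_classes(lowest, highest, num_classes, start=None):
    """Return (width, classes) for whole-number data.

    Each class is a dict with its limits and its boundaries. start is the
    lowest class limit; it defaults to the smallest data value.
    """
    data_range = highest - lowest                   # R = H - L
    width = math.ceil(data_range / num_classes)     # round up: 34/7 -> 5
    start = lowest if start is None else start
    classes = []
    for i in range(num_classes):
        lower = start + i * width       # add the width to get each lower limit
        upper = lower + width - 1       # one unit below the next lower limit
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
        })
    return width, classes


width, classes = build_classes(lowest=100, highest=134, num_classes=7)
print("width:", width)
for c in classes:
    print(c["limits"], c["boundaries"])
```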

Step 2 Tally the data.

Step 3 Find the numerical frequencies from the tallies.

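The completed frequency distribution is

Class limits    Class boundaries    Frequency
100–104         99.5–104.5           2
105–109         104.5–109.5          8
110–114         109.5–114.5         18
115–119         114.5–119.5         13
120–124         119.5–124.5          7
125–129         124.5–129.5          1
130–134         129.5–134.5          1
Total                               50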


Histogram Construction

Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is
always the vertical axis.

Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.

Step 3 Using the frequencies as the heights, draw vertical bars for each class.

The Frequency Polygon


The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the heights
of the points.
Example: Using the frequency distribution given in Example of Histogram, construct a frequency
polygon.

Step 1 Find the midpoints of each class. Recall that midpoints are found by adding the upper and
lower boundaries and dividing by 2: (99.5 + 104.5)/2 = 102, (104.5 + 109.5)/2 = 107, and so on.
The midpoints are 102, 107, 112, 117, 122, 127, and 132.

Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.

Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.

Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the beginning
and end of the graph, at the same distance that the previous and next midpoints would be located,
as shown in Figure below:

The frequency polygon and the histogram are two different ways to represent the same data set.
The choice of which one to use is left to the discretion of the researcher.
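
The points of the frequency polygon can be listed before any drawing is done. In this sketch the class frequencies 2, 8, 18, 13, 7, 1, 1 are those obtained by tallying the 50 record high temperatures into the classes 100–104 through 130–134; the variable names are illustrative.

```python
WIDTH = 5
lower_boundaries = [99.5 + WIDTH * i for i in range(7)]  # 99.5, 104.5, ...
frequencies = [2, 8, 18, 13, 7, 1, 1]                    # tallied from the data

# Step 1: midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lb + (lb + WIDTH)) / 2 for lb in lower_boundaries]

# Steps 3 and 4: plot (midpoint, frequency) and close the polygon on the
# x axis one class width before the first and after the last midpoint.
points = (
    [(midpoints[0] - WIDTH, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + WIDTH, 0)]
)

for x, y in points:
    print(x, y)
```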
The Ogive
The third type of graph that can be used represents the cumulative frequencies for the classes. This
type of graph is called the cumulative frequency graph, or ogive. The cumulative frequency is the
sum of the frequencies accumulated up to the upper boundary of a class in the distribution.

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency
distribution.

Example:
Step 1 Find the cumulative frequency for each class.

Step 2 Draw the x and y axes. Label the x-axis with the class boundaries. Use an appropriate scale
for the y-axis to represent the cumulative frequencies.

Step 3 Plot the cumulative frequency at each upper class boundary. Upper boundaries are used
since the cumulative frequencies represent the number of data values accumulated up to the upper
boundary of each class.

Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with line
segments. Then extend the graph to the first lower class boundary, 99.5, on the x-axis.
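
The points of the ogive follow directly from Steps 1 to 4. The sketch below again assumes the class frequencies 2, 8, 18, 13, 7, 1, 1 tallied from the record high temperatures and starts the graph at the first lower class boundary with a cumulative frequency of 0.

```python
from itertools import accumulate

frequencies = [2, 8, 18, 13, 7, 1, 1]                    # tallied from the data
upper_boundaries = [104.5 + 5 * i for i in range(7)]     # 104.5, ..., 134.5

# Step 1: cumulative frequency for each class.
cumulative = list(accumulate(frequencies))

# Steps 3 and 4: one point per upper boundary, preceded by (99.5, 0).
ogive_points = [(99.5, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, cf in ogive_points:
    print(boundary, cf)
```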
Relative Frequency Graphs
The histogram, the frequency polygon, and the ogive shown previously were constructed by using
frequencies in terms of the raw data. These distributions can be converted to distributions using
proportions instead of raw data as frequencies. These types of graphs are called relative frequency
graphs.

Graphs of relative frequencies instead of frequencies are used when the proportion of data values
that fall into a given class is more important than the actual number of data values that fall into
that class. For example, if you wanted to compare the age distribution of adults in Philadelphia,
Pennsylvania, with the age distribution of adults of Erie, Pennsylvania, you would use relative
frequency distributions.
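
Converting frequencies to relative frequencies is a single division per class. As an illustration only, the sketch reuses the class frequencies 2, 8, 18, 13, 7, 1, 1 from the record high temperature example.

```python
from itertools import accumulate

frequencies = [2, 8, 18, 13, 7, 1, 1]
n = sum(frequencies)  # total number of data values

# Relative frequency = class frequency / total number of values.
relative = [f / n for f in frequencies]

# Cumulative relative frequencies, as used for a relative frequency ogive.
cumulative_relative = [cf / n for cf in accumulate(frequencies)]

for rf, crf in zip(relative, cumulative_relative):
    print(f"{rf:.2f}  {crf:.2f}")
```

The relative frequencies sum to 1 (up to floating-point rounding), which is what allows data sets of different sizes to be compared on the same scale.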

The Pie Graph


Pie graphs are used extensively in statistics. The purpose of the pie graph is to show the relationship
of the parts to the whole by visually comparing the sizes of the sections. Percentages or proportions
can be used. The variable is nominal or categorical.

This frequency distribution shows the number of pounds of each snack food eaten during the Super
Bowl. Construct a pie graph for the data.

Snack Pounds (frequency)


Potato chips 11.2 million
Tortilla chips 8.2 million
Pretzels 4.3 million
Popcorn 3.8 million
Snack nuts 2.5 million
Total n = 30.0 million
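Step 1 Since there are 360° in a circle, the frequency for each class must be converted into a
proportional part of the circle. This conversion is done by using the formula
Degrees = (f/n) × 360°
where f = frequency for each class and n = sum of the frequencies.
Hence, the following conversions are obtained. The degrees should sum to 360°.

Potato chips     (11.2/30) × 360° = 134.4°
Tortilla chips   (8.2/30) × 360° = 98.4°
Pretzels         (4.3/30) × 360° = 51.6°
Popcorn          (3.8/30) × 360° = 45.6°
Snack nuts       (2.5/30) × 360° = 30.0°
Total            360.0°

Step 2 Each frequency must also be converted to a percentage by using the formula
% = (f/n) × 100, since each section of the graph is labeled with its percentage.
Step 3 Next, using a protractor and a compass, draw the graph using the appropriate degree
measures found in step 1, and label each section with the name and percentage.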
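
The degree and percentage conversions can be checked with a short script. This sketch uses Python's fractions module so that the degrees sum to exactly 360; the variable names are illustrative.

```python
from fractions import Fraction

# Pounds of each snack eaten (in millions), from the table above.
snacks = {
    "Potato chips": Fraction("11.2"),
    "Tortilla chips": Fraction("8.2"),
    "Pretzels": Fraction("4.3"),
    "Popcorn": Fraction("3.8"),
    "Snack nuts": Fraction("2.5"),
}

n = sum(snacks.values())  # sum of the frequencies

# Degrees = (f/n) x 360 and % = (f/n) x 100, kept as exact fractions.
degrees = {name: f * 360 / n for name, f in snacks.items()}
percents = {name: f * 100 / n for name, f in snacks.items()}

for name in snacks:
    print(f"{name:<15}{float(degrees[name]):>7.1f} deg{float(percents[name]):>7.1f} %")
print(f"{'Total':<15}{float(sum(degrees.values())):>7.1f} deg")
```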

Stem and Leaf Plots


Histograms summarize a dataset and provide an idea of the shape of the distribution of the data.
However, some information is lost in the summary. We are not able to reconstruct the original
data from the histogram.

John W. Tukey created an innovation in the 1970s that he termed the “stem-and-leaf diagram.”
Tukey (1977) elaborates on this method and other innovative exploratory data analysis techniques.
The stem-and-leaf diagram not only provides the desirable features of the histogram, but also gives
us a way to reconstruct the entire data set from the diagram. Consequently, we do not lose any
information by constructing the plot.

The basic idea of a stem-and-leaf diagram is to construct “stems” that represent the class intervals
and to have “leaves” that exhibit all the individual values.

To form the leaves, we place a single digit for each observation that belongs to that class interval
(stem). The value used will be the trailing digit of the observation (for data recorded to one decimal
place, the single digit that appears after the decimal point). If a particular
value is repeated in the data set, we repeat that value on the leaf as many times as it appears in
the data set. Usually the numbers on the leaf are placed in increasing order. In this way, we can
exhibit all of the data. Intervals that include more observations than others will have longer leaves
and thus produce the frequency appearance of a histogram.

Example: At an outpatient testing center, the number of cardiograms performed each day for 20
days is shown. Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45
Step 1 Arrange the data in order: 02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45,
51, 52, 57.

[Arranging the data in order is not essential and can be cumbersome when the data set is large;
however, it is helpful in constructing a stem and leaf plot. The leaves in the final stem and leaf plot
should be arranged in order]

Step 2 Separate the data according to the first digit, as shown.

02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57

Step 3 A display can be made by using the leading digit as the stem and the trailing digit as the
leaf. For example, for the value 32, the leading digit, 3, is the stem and the trailing digit, 2, is the
leaf. For the value 14, the 1 is the stem and the 4 is the leaf. Now a plot can be constructed as
shown.

0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7

The plot shows that the distribution peaks in the center and that there are no gaps in the data. For 7
of the 20 days, the number of patients receiving cardiograms was between 31 and 36. The plot
also shows that the testing center treated from a minimum of 2 patients to a maximum of 57
patients in any one day.

If there are no data values in a class, you should write the stem number and leave the leaf row
blank. Do not put a zero in the leaf row.
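
The three steps can be sketched for two-digit whole numbers as follows. The function name stem_and_leaf is made up for this example; the value 02 is written as 2 because Python does not allow integer literals with a leading zero. Stems with no data are kept with an empty leaf row, as described above.

```python
from collections import defaultdict

# Cardiograms performed each day for 20 days.
cardiograms = [
    25, 31, 20, 32, 13,
    14, 43, 2, 57, 23,
    36, 32, 33, 32, 44,
    32, 52, 44, 51, 45,
]


def stem_and_leaf(values):
    """Return {stem: [leaves in increasing order]} for whole numbers below 100."""
    leaves = defaultdict(list)
    for value in sorted(values):            # Step 1: arrange in order
        stem, leaf = divmod(value, 10)      # leading digit, trailing digit
        leaves[stem].append(leaf)           # Step 2: group by leading digit
    # Keep every stem between the smallest and largest, even if it is empty.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


plot = stem_and_leaf(cardiograms)
for stem, row in plot.items():              # Step 3: display the plot
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```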

Box-and-Whisker Plots
John W. Tukey created another scheme for data analysis, the box-and-whisker plot. The box-and-
whisker plot provides a convenient and compact picture of the general shape of a data distribution.
Although it contains less information than a histogram, the box-and-whisker plot can be very
useful in comparing one distribution to other distributions.

A box plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct
a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3
(the third quartile), and the maximum value.

The length of the box is called the interquartile range, the range of values that constitute the middle
half of the data. Lines called whiskers extend from the lower and upper ends of the box to short
perpendicular bars that mark the extremes of the distribution. We consider the
minimum (i.e., the smallest value in the data set) and the maximum (i.e., the largest value in the
data set) to be the ends of the whiskers.

Example
Consider the following data set: 3, 4, 8, 5, 7, 2, 5, 6, 5, 9, 7, 8, 6, 4, 5. Draw a box plot of the
data.
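
The five statistics needed for this box plot can be computed as sketched below. The notes do not say which quartile convention to use, so this example assumes one common choice: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the number of values is odd. Other conventions can give slightly different quartiles.

```python
from statistics import median

data = [3, 4, 8, 5, 7, 2, 5, 6, 5, 9, 7, 8, 6, 4, 5]


def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]     # values below the median position
    upper_half = ordered[-half:]    # values above the median position
    return {
        "min": ordered[0],
        "q1": median(lower_half),
        "median": median(ordered),
        "q3": median(upper_half),
        "max": ordered[-1],
    }


summary = five_number_summary(data)
iqr = summary["q3"] - summary["q1"]  # length of the box
print(summary, "IQR:", iqr)
```

Under this convention the box runs from Q1 to Q3 with a line at the median, and the whiskers end at the minimum and maximum.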
Uses of Box plot
The box plot is very useful for indicating the presence or absence of symmetry and for comparing
spread or variability of two or more data sets. If the distribution is not symmetric, it is possible that
the median will not be in the center of the box and that the whiskers will not be the same length.
Looking at box plots is a very good first step to take when analyzing data.
If a box-and-whisker plot indicates the presence of symmetry, the distribution may be a normal
distribution. Symmetry means that if we split the distribution (i.e., probability density function) at
the median, the half to the right will be the mirror image of the half to the left. For a box-and-
whisker plot that shows a symmetric distribution: (1) the median will be in the middle of the box;
and (2) the right and left whiskers will have equal lengths.
