Module Two: Frequency Distributions and Their Graphic Representations
Objectives
Introduction
After we obtain a set of measurements, a common next step is to put them in a systematic
order by grouping them in classes. A set of individual measurements taken as they come does
not convey much useful information. At most, we get a rough idea of how large the values run
numerically.
The most common method of summarizing a large data set is to present it in condensed
form in tables or charts. The study of statistical presentation once took up most of the time in
statistics courses. The scope of statistics has since grown to such an extent that less time is
devoted to this kind of work, since it can now mostly be done with computers. Nevertheless, it is
necessary to discuss how students and researchers summarize data in frequency distributions.
Data obtained from surveys or research studies usually form a large set of measurements
that are by their very nature disorganized and varied. It is therefore necessary to organize them
in some systematic fashion so that the patterns in the numbers can emerge. The first step in
arriving at a meaningful interpretation of data is setting up a frequency distribution.
Frequency Distribution
Ungrouped data, also called raw data, are data which have not been arranged in a systematic
order, while grouped data are those presented in the form of a frequency distribution. When
numerical raw data are arranged in ascending or descending order, the result is called an array.
Valuable information about a large data set can be gained, and a good overall picture of it
obtained, by grouping the data into a number of classes. For instance, consider the scores of
60 students in Statistical Methods listed in Table 1. These scores range from 30 to 100. If you
arrange all the scores from highest to lowest and then place a slash mark (tally) alongside each
score every time it occurs, the result, shown in Table 2, is an ungrouped frequency distribution
of scores. Note that the scores are widely spread out and a number of scores have a frequency of
zero. Usually, it is advisable to “group” the scores into what are referred to as class intervals
and then obtain a frequency distribution of grouped scores.
Table 1.
Total Quiz Scores of a random sample of 60 Graduate students in Statistical Methods during
the first semester of school year 2012-2013
67 73 64 77 33 70
43 35 41 78 63 70
87 39 52 54 45 86
71 95 57 61 44 85
88 57 78 55 84 83
30 71 53 50 48 82
69 41 47 69 94 80
30 67 39 49 30 68
47 69 84 69 30 70
67 100 36 66 36 70
These data can be arranged in either descending or ascending order to form an array.
This is done so that it will be easy to construct a frequency distribution. A frequency
distribution is any arrangement of data that shows the frequency of occurrence of values falling
within arbitrarily defined ranges of the variable known as class intervals. This is the case of a
grouped frequency distribution.
For the ungrouped case, a frequency distribution is an ordering of the data from either
highest to lowest or lowest to highest, with the frequency of occurrence of each value. Table 2
shows the ungrouped frequency distribution, in descending order, of the data given in Table 1.
The tallies are also shown.
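The tallying just described can also be sketched in a few lines of Python. This is only an illustration, not part of the module: the variable names are arbitrary, the scores are copied from Table 1, and scores that never occur (frequency zero) are simply left out of the printout.

```python
from collections import Counter

# Scores copied from Table 1 (60 values).
scores = [
    67, 73, 64, 77, 33, 70,
    43, 35, 41, 78, 63, 70,
    87, 39, 52, 54, 45, 86,
    71, 95, 57, 61, 44, 85,
    88, 57, 78, 55, 84, 83,
    30, 71, 53, 50, 48, 82,
    69, 41, 47, 69, 94, 80,
    30, 67, 39, 49, 30, 68,
    47, 69, 84, 69, 30, 70,
    67, 100, 36, 66, 36, 70,
]

# An array is the same data sorted; here in descending order.
array = sorted(scores, reverse=True)

# Count how often each distinct score occurs.
frequency = Counter(scores)

# Print score, tally marks, and frequency from highest to lowest score.
for score in sorted(frequency, reverse=True):
    f = frequency[score]
    print(f"{score:>5}  {'/' * f:<6} {f}")
```

A Counter returns 0 for any score that does not appear, so frequency[31] can be looked up without an error.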
Table 2.
Ungrouped Frequency Distribution of Scores in Statistical Methods Course
The first thing to decide is the size of the class interval. How many units shall it contain?
Obviously, the interval selected must not be so large that we lose the discrimination provided by
the original measurement, nor should the interval be so fine that the purposes of grouping are
defeated. The following are the steps in constructing a grouped frequency distribution.
Step 1. Find the difference between the highest and the lowest score values. Call this difference
the range. For the given data in Table 2, the range is (100 - 30) = 70.
Step 2. Decide on the size of the class intervals. The general practice is to use not fewer than
10 nor more than 20 class intervals. In our example an interval of 5 points will give 15
class intervals: the range 70 divided by the class size 5 gives 14, plus 1,
for a total of 15 class intervals. We shall designate the size of the class interval by the
symbol “ i ”.
Step 3. Start the intervals with their lowest scores at a multiple of the size of the interval: when
the interval is 5, start with 5, 10, 15, 20, etc.; when the interval is 3, start with 3, 6, 9,
12, 15, etc. In our present example, the lowest class interval begins with 30. Add i - 1 to
this to obtain the upper limit of the class interval, here 30 + 4 = 34. The top and bottom
scores for each interval are called class limits.
Step 4. The next higher class interval begins at the integer following the upper limit of the
lowest class interval. In our example, the next integer is 35. Follow the same steps as in
Step 3 to obtain the upper limit of the second class interval.
Follow these procedures for each successive higher class interval until all the
scores are included in their designated class interval.
Step 5. Assign each obtained score to the class interval within which it is included.
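Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration under the assumptions of the worked example (class size i = 5, scores from Table 1); the variable names are not from the module.

```python
# Scores copied from Table 1 (60 values).
scores = [
    67, 73, 64, 77, 33, 70,
    43, 35, 41, 78, 63, 70,
    87, 39, 52, 54, 45, 86,
    71, 95, 57, 61, 44, 85,
    88, 57, 78, 55, 84, 83,
    30, 71, 53, 50, 48, 82,
    69, 41, 47, 69, 94, 80,
    30, 67, 39, 49, 30, 68,
    47, 69, 84, 69, 30, 70,
    67, 100, 36, 66, 36, 70,
]

i = 5  # class size chosen in Step 2

# Step 1: range = highest score - lowest score.
score_range = max(scores) - min(scores)

# Step 3: start the lowest interval at a multiple of i
# that is at or below the lowest score.
start = (min(scores) // i) * i

# Steps 3 and 4: each interval runs from its lower limit to lower + i - 1,
# and the next interval begins at the following integer.
intervals = []
lower = start
while lower <= max(scores):
    intervals.append((lower, lower + i - 1))
    lower += i

# Step 5: assign every score to the interval that contains it.
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]

# Print from the highest interval down.
for (lo, hi), f in zip(reversed(intervals), reversed(frequencies)):
    print(f"{lo:>3} - {hi:<3} {f}")
```

With these data the sketch produces 15 class intervals, from 30 - 34 up to 100 - 104, and the frequencies add up to 60.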
The grouped frequency distribution of the data in Table 2 appears in Table 3.
Table 3
Frequency Distribution of Scores of 60 Students in Statistical Methods
Class limits are the smallest and largest values (numerical data, events, etc.) that can fall
in each class. The exact limits of a number are equal to its apparent value plus and minus
one-half of the unit of measurement. The same is true of class intervals. Thus, the exact limits
of the intervals 26 - 30, 31 - 35, and 36 - 40 are 25.5 - 30.5, 30.5 - 35.5 and 35.5 - 40.5,
respectively.
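As a small illustration of the half-unit rule, the sketch below computes exact limits for the three intervals in the text. The function name exact_limits and the default unit of 1 are assumptions made for this example only.

```python
def exact_limits(lower, upper, unit=1):
    """Return the exact limits of a class interval.

    Half of the unit of measurement is subtracted from the lower
    class limit and added to the upper class limit.
    """
    half = unit / 2
    return lower - half, upper + half


for lower, upper in [(26, 30), (31, 35), (36, 40)]:
    print((lower, upper), "->", exact_limits(lower, upper))
```

Note that the upper exact limit of one interval coincides with the lower exact limit of the next, so the exact limits leave no gaps between classes.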
Midpoints
In grouping the data in class intervals, we assume that all the observations are
concentrated at the midpoint of the interval; that is, we regard the midpoint as the representative
score of all the observations in that interval. The midpoint is halfway between the exact limits of
the interval. It may be obtained by averaging the upper and lower limits of the class interval.
Thus, the midpoints of the class intervals 26 - 30, 31 - 35, and 36 - 40, or equivalently of the
exact limits 25.5 - 30.5, 30.5 - 35.5 and 35.5 - 40.5, are 28, 33 and 38, respectively.
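The midpoint calculation can be checked with a short Python sketch. The function name midpoint is chosen for this example; the intervals are those used in the text.

```python
def midpoint(lower, upper):
    """Average the lower and upper limits of an interval."""
    return (lower + upper) / 2


# Class limits and exact limits give the same midpoint.
for class_limits, exact in [((26, 30), (25.5, 30.5)),
                            ((31, 35), (30.5, 35.5)),
                            ((36, 40), (35.5, 40.5))]:
    print(class_limits, midpoint(*class_limits), midpoint(*exact))
```

Either pair of limits may be averaged because the exact limits extend the class limits by the same half unit on both sides.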
Sometimes we have a need for information regarding the number or percentage of values
“greater than” or “less than” a specified value. The answer is readily available by the preparation
of a cumulative frequency distribution. The cumulative frequencies are obtained by adding
successively, starting from the bottom, the individual frequencies. The top entry in the
cumulative frequency column is always equal to N.
From a cumulative frequency distribution we can obtain the number of cases or
frequencies below the upper exact limit of any given interval.
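The successive adding can be sketched as below. The three intervals come from the text, but the frequencies are hypothetical values invented for illustration; the intervals are listed from lowest to highest, which corresponds to reading a table from the bottom up.

```python
from itertools import accumulate

# Intervals listed from lowest to highest.
intervals = [(26, 30), (31, 35), (36, 40)]
frequencies = [4, 10, 6]  # hypothetical frequencies, for illustration only

n = sum(frequencies)

# Running total of the frequencies, starting from the lowest interval.
cumulative = list(accumulate(frequencies))

# Cumulative frequency expressed as a percentage of N.
cumulative_percent = [100 * cf / n for cf in cumulative]

for (lo, hi), f, cf, cp in zip(intervals, frequencies, cumulative, cumulative_percent):
    # cf is the number of cases below the upper exact limit, hi + 0.5.
    print(f"{lo} - {hi}: f = {f}, cf = {cf} (below {hi + 0.5}), cum% = {cp:.1f}")
```

The last cumulative frequency equals N, which is the entry that appears at the top of the column when the highest interval is listed first.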
Table 4
Frequency Distribution of Scores of 60 Students in Statistical Methods with the exact limits,
midpoint, cumulative frequency and cumulative percent
Exercise 2a
Exercises:
1. The data below are the grades of 50 students in Methods of Research class:
92 80 85 81 78
50 62 77 78 66
79 73 57 39 79
90 87 80 60 89
79 68 75 73 65
79 46 69 80 54
59 59 70 85 48
65 78 95 80 49
48 64 42 89 68
82 74 70 59 25
Questions:
2. The following scores represent the final examination grade for an elementary statistics
course.
21 65 75 30 57
74 52 70 82 36
80 77 81 95 41
65 92 85 55 76
52 25 64 75 78
25 80 98 81 67
41 71 83 54 64
72 88 62 74 43
60 78 89 76 84
48 84 90 35 70
34 67 47 82 69
74 63 80 85 61
Graphs typically have two coordinate axes: the x-axis (the horizontal axis) and the y-axis
(the vertical axis). A graph is a device for showing numerical values or relationships in pictorial
form. It enables us to think about a problem in visual terms. Graphic representations of a
frequency distribution enable us to visualize its important properties. A
bar graph is a graphical representation of a frequency distribution in which vertical bars are
centered above each category along the x-axis and are separated from each other by a space
(Jackson, 2012).
[Figure: histogram with Score in Statistics on the x-axis (30 to 100) and Frequency on the y-axis (0.0 to 12.5); Mean = 64.55, Std. Dev. = 17.004, N = 60]
Matrix 4.
Description and use of the different graphical forms commonly applied in
presenting the data
A cumulative frequency graph, or ogive, is a graphic representation of the sum of all the
frequencies in a frequency distribution up to any given point. Plotting the cumulative
frequencies against the lower (or upper) limits results in a “greater than” (or “less than”)
ogive. The “less than” ogive corresponding to the cumulative frequency distribution given in
Table 4 is shown in Figure 3.
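The points to be plotted for each kind of ogive can be sketched in Python. The frequencies here are hypothetical, and the sketch assumes that the exact limits (class limits plus or minus 0.5) are used on the x-axis.

```python
from itertools import accumulate

# Intervals listed from lowest to highest.
intervals = [(26, 30), (31, 35), (36, 40)]
frequencies = [4, 10, 6]  # hypothetical frequencies, for illustration only

# "Less than" ogive: cases below each upper exact limit.
upper_limits = [hi + 0.5 for _, hi in intervals]
less_than = list(zip(upper_limits, accumulate(frequencies)))

# "Greater than" ogive: cases above each lower exact limit,
# accumulated from the highest interval downward.
lower_limits = [lo - 0.5 for lo, _ in intervals]
from_top = list(accumulate(reversed(frequencies)))[::-1]
greater_than = list(zip(lower_limits, from_top))

print("less than:   ", less_than)
print("greater than:", greater_than)
```

Each pair is an (x, y) point; joining the points with line segments gives the ogive. The "less than" curve rises to N, while the "greater than" curve falls from N.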
Central location refers to a value of the variable near the center of the frequency
distribution. It is a middle point. Measures of central location are called averages.
Figure 3 Graph of Cumulative Frequency Distribution or ogive for the Scores of 60 students in
Statistical Methods
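[Figure: two skewed curves on X and Y axes, one labeled Mean, Median, Mode from left to right and the other labeled Mode, Median, Mean]
[Figure: kurtosis curves. A: Mesokurtic; B: Leptokurtic; C: Platykurtic]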
95 80 56 90 91
40 42 87 78 56
79 63 57 39 79
90 87 80 60 89
79 68 75 73 65
79 46 69 80 54
59 59 70 85 48
65 78 93 80 49
48 64 42 89 68
82 71 74 59 45
2. The following scores represent the final examination grade for an elementary statistics
course.
23 60 79 32 57
74 52 70 82 36
80 77 81 95 41
65 92 85 55 76
52 10 64 75 78
25 80 98 81 67
41 71 83 54 64
72 88 62 74 43
60 78 89 76 84
48 84 90 15 70
34 67 17 82 69
74 63 80 85 61
23 60 79 32 57 74 70 52 82 36
80 77 81 95 41 65 85 92 55 76
52 10 64 75 78 25 98 80 81 67
41 71 83 54 64 72 63 88 74 43
60 78 89 76 84 48 90 84 15 79
34 67 17 82 69 74 80 63 85 61
Exercise 2.c
Revisit the data file you created in Part II of Exercise 1. Present some data in grouped and
ungrouped frequency distributions.