Frequency Distribution and Graphs
Learning Outcomes
After completing this chapter, the students will be able to:
• Define some basic terms used in the formulation of a frequency distribution.
• Organize data into a frequency distribution.
When conducting statistical research, an investigation, or a study, the researcher must gather data for the particular variable under investigation. To describe situations, make conclusions, and draw inferences about events, the researcher must organize the gathered data in some meaningful way. The easiest and most widely used way of organizing data is to construct a frequency distribution. A frequency distribution is a grouping of the data into categories showing the number of observations in each of the non-overlapping classes.
After organizing the data, the researcher's next move is to present the data so they can be understood easily by those who will benefit from reading the study. The most useful method of presenting data is by constructing graphs and charts. There are a number of ways to plot graphs and charts, and each one has a specific purpose.
This chapter discusses how to organize data by constructing a frequency distribution and how to present data by constructing graphs and charts.
Before we start constructing a frequency distribution, we must define some terms that are essential to a deeper understanding of the nature of the data displayed in a frequency distribution.
• Raw data is the data collected in original form.
• Range is the difference between the highest value and the lowest value in a distribution.
• Frequency distribution is the organization of data in a tabular form, using mutually exclusive classes showing the number of observations in each.
• Class Limits (or Apparent Limits) are the highest and lowest values describing a class.
• Class Boundaries (or Real Limits) are the upper and lower values of a class for a grouped frequency distribution; they have one more decimal place than the class limits and end with the digit 5.
• Interval (or width) is the distance between the class lower boundary and the class upper boundary, and it is denoted by the symbol i.
• Frequency (f) is the number of values in a specific class of a frequency distribution.
• Relative Frequency (rf) is the value obtained when the frequency of each class of the frequency distribution is divided by the total number of values.
Example: Twenty applicants were given a performance evaluation appraisal. The data set is:
High      High      High      Low       Average
Average   Low       Average   Average   Average
Low       Average   Average   High      High
Low       Low       Average   High      High
Construct a frequency distribution for the data.
Solution:
Step 1: Construct a table as shown below.
Class      Tally
High       ||||| ||
Average    ||||| |||
Low        |||||
Step 3: Convert the tallied data into numerical frequencies.
Class      Tally        Frequency
High       ||||| ||     7
Average    ||||| |||    8
Low        |||||        5
$$\text{Percentage} = \frac{f}{n} \times 100\%$$
where f = frequency of the class and
n = total number of values.
Class      Frequency   Percentage
High       7           (7 ÷ 20) × 100 = 35
Average    8           (8 ÷ 20) × 100 = 40
Low        5           (5 ÷ 20) × 100 = 25
Total      20          100
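As a quick check of the steps above, the following Python sketch tallies the 20 ratings with `collections.Counter` and applies the percentage formula. The list `ratings` transcribes the example's data row by row; the helper name `categorical_distribution` is ours, not the text's.

```python
from collections import Counter

# The 20 performance ratings from the example, row by row.
ratings = [
    "High", "High", "High", "Low", "Average",
    "Average", "Low", "Average", "Average", "Average",
    "Low", "Average", "Average", "High", "High",
    "Low", "Low", "Average", "High", "High",
]


def categorical_distribution(values, order):
    """Return (class, frequency, percentage) rows in the requested order."""
    counts = Counter(values)  # tallying: one count per category
    n = len(values)           # total number of values
    # Percentage = f / n x 100, computed as f * 100 / n.
    return [(category, counts[category], counts[category] * 100 / n) for category in order]


table = categorical_distribution(ratings, ["High", "Average", "Low"])
for category, f, pct in table:
    print(f"{category:<8}{f:>4}{pct:>8.1f}")
print(f"{'Total':<8}{sum(row[1] for row in table):>4}{sum(row[2] for row in table):>8.1f}")
```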
2. Rule 2. Another way to determine the class interval is to apply Formula 2-2.
$$\text{Suggested Class Interval} = \frac{\text{Range}}{1 + 3.322(\text{logarithm of total frequencies})} \qquad \text{(Formula 2-2)}$$
3. Rule 3. Another guideline to determine the class interval is to choose an ideal number of classes, then apply Formula 2-3.
$$\text{Suggested Class Interval} = \frac{\text{Highest Value} - \text{Lowest Value}}{\text{Number of Classes}} \qquad \text{(Formula 2-3)}$$
Example 1: Suppose a researcher wished to do a study on the monthly salary (in ₱ thousands) of all call center agents of selected Business Process Outsourcing (BPO) companies. The researcher would first have to collect the data by asking each call center agent about his or her monthly salary. The data collected in original form are called raw data. In this case, the data are:
Construct a frequency distribution using Rule 1 and determine the following:
a. Range
b. Interval
c. Class limits
d. Class boundaries
e. Relative frequencies
f. Percentages
g. Cumulative frequencies
h. Midpoints
Solution:
Step 1: Arrange the raw data in ascending or descending order. In this particular example, we will arrange the raw data in ascending order. This will make it easier for us to tally the data.
₱14.10  ₱17.95  ₱20.25  ₱21.75  ₱22.90  ₱23.70  ₱24.75  ₱26.50  ₱27.50  ₱30.60
 14.30   18.35   20.30   21.80   22.90   23.70   25.00   26.50   27.60   30.75
 15.50   18.40   20.40   21.90   23.00   23.85   25.00   26.80   27.80   30.??
 15.70   18.70   20.50   21.90   23.20   24.10   25.20   26.90   27.90   30.90
 17.00   18.80   20.80   22.00   23.40   24.30   26.00   27.00   27.90   31.00
 17.30   20.00   21.00   22.60   23.40   24.50   26.10   27.00   29.30   32.10
 17.40   20.20   21.30   22.75   23.50   24.60   26.20   27.30   29.50   32.90
 17.80   20.25   21.60   22.80   23.70   24.70   26.30   27.40   30.10   33.70
The objective is to use just enough classes. We can determine the number of classes (k) using the "2 to the k rule". This rule selects the smallest number of classes k such that $2^k$ (2 raised to the power of k) is greater than the number of observations (n). In our example, there are 80 call center agents (n = 80). If we apply k = 6, which means we would use 6 classes, then $2^6 = 64$, somewhat less than 80. Thus, 6 is not enough classes. If we try k = 7, then $2^7 = 128$, which is greater than 80. Therefore, the recommended number of classes is 7.
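The 2 to the k rule amounts to a small search for the smallest k. This is a minimal sketch; `classes_by_2k_rule` is a hypothetical helper name.

```python
def classes_by_2k_rule(n):
    """Smallest k such that 2**k is greater than n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:  # 2**k must exceed the number of observations
        k += 1
    return k


print(classes_by_2k_rule(80))  # 2**6 = 64 <= 80, 2**7 = 128 > 80, so 7
```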
Generally, the class interval (or width) should be equal for all classes. The classes must cover all the values in the raw data (that is, from the lowest to the highest). The class interval is generated using the formula:
$$\text{Suggested Class Interval} = \frac{\text{Range}}{\text{Number of Classes}} = \frac{HV - LV}{k} = \frac{33.70 - 14.10}{7} = \frac{19.60}{7} = 2.80 \approx 3$$
Note: Round the value of the interval up to the nearest whole number.
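Formula 2-3 and the rounding note combine into a few lines of Python. The sketch takes HV = 33.70 and LV = 14.10 from the ordered array; `suggested_interval` is an illustrative name. Rounding the ratio to six decimals before `math.ceil` is a guard we added so that floating-point noise cannot push an exact whole number up by one.

```python
import math


def suggested_interval(highest, lowest, k):
    """Formula 2-3: (HV - LV) / k, rounded up to the next whole number."""
    value_range = highest - lowest
    raw_width = value_range / k
    # Strip float noise first (e.g. 3.0000000000000004 would otherwise become 4).
    width = math.ceil(round(raw_width, 6))
    return value_range, raw_width, width


k = 7  # from the 2 to the k rule
value_range, raw_width, width = suggested_interval(33.70, 14.10, k)
print(f"range = {value_range:.2f}, {value_range:.2f} / {k} = {raw_width:.2f}, interval = {width}")
# range = 19.60, 19.60 / 7 = 2.80, interval = 3
```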
• Select a starting point for the lowest class limit.
The starting point can be the smallest data value or any convenient number less than the smallest data value. In our case, 14 is used.
Add the interval (or width) to the lowest score taken as the starting point to obtain the lower limit of the next class. Keep adding until we reach the 7 classes, as reflected in 14, 17, 20, 23, 26, 29, and 32.
To obtain the upper class limits, subtract one unit from the lower limit of the second class to obtain the upper limit of the first class. That is, 17 - 1 = 16. Then add the interval (or width) to each upper limit to obtain all the upper limits.
Class Limits
14-16
17-19
20-22
23-25
26-28
29-31
32-34
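The limit-building procedure (step by the interval for the lower limits, subtract one unit for the upper limits) can be sketched as follows. It assumes whole-number class limits as in this example; `class_limits` is a hypothetical helper.

```python
def class_limits(start, width, k):
    """Build k (lower, upper) class limits for whole-number limits.

    Lower limits step by `width` from `start`; each upper limit is the next
    class's lower limit minus one unit.
    """
    lowers = [start + i * width for i in range(k)]
    return [(lower, lower + width - 1) for lower in lowers]


for lower, upper in class_limits(14, 3, 7):
    print(f"{lower}-{upper}")
```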
• Set the class boundaries for each class. To obtain the class boundaries, subtract 0.5 from each lower class limit and add 0.5 to each upper class limit.
Class Limits    Class Boundaries
14-16           13.5-16.5
17-19           16.5-19.5
20-22           19.5-22.5
23-25           22.5-25.5
26-28           25.5-28.5
29-31           28.5-31.5
32-34           31.5-34.5
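Boundaries also decide which class a decimal salary belongs to: 22.60, for instance, lies above 22.5 and is therefore counted in 23-25. The sketch below computes the boundaries and tallies the 80 salaries from Step 1 into them, which reproduces the frequencies 4, 9, 16, 23, 17, 8, and 3 used later in this section. One value in the ordered array is illegible in the source; the code uses 30.80 as a labeled placeholder, which is safe because any value between 30.75 and 30.90 falls in the same class. The helper names are illustrative.

```python
# The 80 salaries (in ₱ thousands) from the ordered array in Step 1.
# ASSUMPTION: one value is illegible in the source; it lies between 30.75 and 30.90,
# so 30.80 is a placeholder (any value in that range gives the same counts).
salaries = [
    14.10, 14.30, 15.50, 15.70, 17.00, 17.30, 17.40, 17.80,
    17.95, 18.35, 18.40, 18.70, 18.80, 20.00, 20.20, 20.25,
    20.25, 20.30, 20.40, 20.50, 20.80, 21.00, 21.30, 21.60,
    21.75, 21.80, 21.90, 21.90, 22.00, 22.60, 22.75, 22.80,
    22.90, 22.90, 23.00, 23.20, 23.40, 23.40, 23.50, 23.70,
    23.70, 23.70, 23.85, 24.10, 24.30, 24.50, 24.60, 24.70,
    24.75, 25.00, 25.00, 25.20, 26.00, 26.10, 26.20, 26.30,
    26.50, 26.50, 26.80, 26.90, 27.00, 27.00, 27.30, 27.40,
    27.50, 27.60, 27.80, 27.90, 27.90, 29.30, 29.50, 30.10,
    30.60, 30.75, 30.80, 30.90, 31.00, 32.10, 32.90, 33.70,
]

limits = [(14 + 3 * i, 16 + 3 * i) for i in range(7)]


def class_boundaries(limits, half_unit=0.5):
    """Subtract half a unit from each lower limit and add it to each upper limit."""
    return [(lower - half_unit, upper + half_unit) for lower, upper in limits]


def tally(values, boundaries):
    """Count each value into the class whose boundaries contain it."""
    counts = [0] * len(boundaries)
    for x in values:
        for j, (lower, upper) in enumerate(boundaries):
            if lower <= x < upper:
                counts[j] += 1
                break
    return counts


boundaries = class_boundaries(limits)
frequencies = tally(salaries, boundaries)
for (lo, hi), (lb, ub), f in zip(limits, boundaries, frequencies):
    print(f"{lo}-{hi}   {lb}-{ub}   {f}")
```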
Using Formula 2-2:
$$
\begin{aligned}
\text{Suggested Class Interval} &= \frac{\text{Range}}{1 + 3.322(\text{logarithm of total frequencies})} \\
&= \frac{77 - 18}{1 + 3.322(\log 50)} = \frac{59}{1 + 3.322(1.698970004)} \\
&= \frac{59}{6.643978354} = 8.88 \approx 9
\end{aligned}
$$
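The arithmetic of Formula 2-2 can be reproduced in Python. This sketch assumes, as the value 1.698970004 indicates, that "logarithm" means the base-10 logarithm (`math.log10`); `sturges_interval` is an illustrative name, after Sturges' rule, with which this formula is commonly associated.

```python
import math


def sturges_interval(value_range, n):
    """Formula 2-2: Range / (1 + 3.322 * log10(n)), rounded up to a whole number."""
    denominator = 1 + 3.322 * math.log10(n)  # "logarithm" = common (base-10) log
    raw_width = value_range / denominator
    return denominator, raw_width, math.ceil(raw_width)


denominator, raw_width, width = sturges_interval(77 - 18, 50)
print(f"{denominator:.4f} {raw_width:.2f} {width}")  # 6.6440 8.88 9
```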
• Select a starting point for the lowest class limit. The lowest value in the data set is 18.
Class Limits    Class Boundaries
18-26           17.5-26.5
27-35           26.5-35.5
36-44           35.5-44.5
45-53           44.5-53.5
54-62           53.5-62.5
63-71           62.5-71.5
72-80           71.5-80.5
When the data set contains a large number of values, drawing conclusions from an ordered array or a stem-and-leaf plot is often difficult. We need graphs or charts in such situations. There are a number of graphs or charts that visually show numerical data. These include the histogram, the frequency polygon, and the cumulative frequency graph (ogive).
In this section, we discuss several graphical methods that are used for interval data. The most important of these graphical methods is the histogram. The histogram is a powerful graphical technique used to summarize interval data, and it also helps explain important aspects of probability.
A. Histogram
A histogram is a graph in which the classes are marked on the horizontal axis (x-axis) and the class frequencies on the vertical axis (y-axis). The heights of the bars represent the class frequencies, and the bars are drawn adjacent to each other. However, the histogram focuses on the frequency of each class and sacrifices whatever information was contained in the actual observations.
B. Frequency Polygon
A frequency polygon is a graph that displays the data using points that are connected by lines. The frequencies are represented by the heights of the points at the midpoints of the classes. The vertical axis represents the frequency of the distribution, while the horizontal axis represents the class midpoints of the frequency distribution.
Example: Shown below is the frequency distribution in Example 1 in Section 2.3.
Class Limits   Class Boundaries   Midpoint   Frequency   Cumulative Frequency
14-16          13.5-16.5          15         4           4
17-19          16.5-19.5          18         9           13
20-22          19.5-22.5          21         16          29
23-25          22.5-25.5          24         23          52
26-28          25.5-28.5          27         17          69
29-31          28.5-31.5          30         8           77
32-34          31.5-34.5          33         3           80
Construct a histogram, a frequency polygon, and a cumulative frequency polygon. What conclusions can you reach based on the information presented in the histogram?
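Items e to h of Example 1 (relative frequencies, percentages, cumulative frequencies, and midpoints) follow mechanically from the table above. The sketch below computes them with the standard library; the midpoint formula (lower limit + upper limit) / 2 is the usual one and matches the midpoints in the table.

```python
from itertools import accumulate

# Class limits and frequencies from the salary table.
limits = [(14, 16), (17, 19), (20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
frequencies = [4, 9, 16, 23, 17, 8, 3]

n = sum(frequencies)
midpoints = [(lower + upper) / 2 for lower, upper in limits]  # (LL + UL) / 2
cumulative = list(accumulate(frequencies))                    # running totals of f
relative = [f / n for f in frequencies]                       # rf = f / n
percentages = [f * 100 / n for f in frequencies]              # rf x 100

print("Limits  Midpoint   f      rf       %   cf")
rows = zip(limits, midpoints, frequencies, relative, percentages, cumulative)
for (lo, hi), m, f, rf, pct, cf in rows:
    print(f"{lo}-{hi}   {m:>6.1f}  {f:>3}  {rf:.4f}  {pct:>6.2f}  {cf:>3}")
```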
Solution:
a. Constructing a Histogram
Step 1: Find the midpoint of each class.
Step 2: Draw and label the x-axis and y-axis.
Step 3: Represent the frequency on the y-axis and the midpoints on the x-axis.
Step 4: Use the frequency to represent the height and draw the vertical bars.
The class frequencies are scaled along the vertical axis and the class midpoints along the horizontal axis. From Figure 2.1, we note that there are 4 employees in the class with midpoint ₱15,000, that is, the ₱14,000-₱16,000 class. Therefore, the height of the column for the ₱14,000-₱16,000 class is 4. Applying the same procedure to the other classes, we obtain the graph below.
As the histogram shows, the class with the greatest number of data values (23) is ₱23,000-₱25,000, followed by ₱26,000-₱28,000 with 17. The graph also has one peak, with the data clustering around it.
Figure 2.1: Histogram (x-axis: Salary (in Thousands); y-axis: Frequency)
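The histogram's geometry is simple to compute: each bar spans its class boundaries, so neighbouring bars touch, and its height is the class frequency. The sketch below builds those rectangles and prints a rotated text version (one row per class, labelled by its midpoint); `histogram_bars` is a hypothetical name.

```python
# Class boundaries and frequencies of the salary distribution (₱ thousands).
boundaries = [(13.5 + 3 * i, 16.5 + 3 * i) for i in range(7)]
frequencies = [4, 9, 16, 23, 17, 8, 3]


def histogram_bars(boundaries, frequencies):
    """One (left edge, width, height) rectangle per class.

    Each bar spans its class boundaries, so neighbouring bars touch,
    and its height equals the class frequency.
    """
    return [(lower, upper - lower, f) for (lower, upper), f in zip(boundaries, frequencies)]


bars = histogram_bars(boundaries, frequencies)
# Rotated text rendering: one row per class, labelled by its midpoint.
for left, width, height in bars:
    print(f"{left + width / 2:>5.1f} | {'#' * height} {height}")
```

With matplotlib installed, the same rectangles could be drawn with `plt.bar([b[0] for b in bars], [b[2] for b in bars], width=3, align="edge")`; the sketch itself uses only the standard library.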
Figure 2.2: Frequency Polygon for Call Center Agents' Salary (x-axis: Salary (in Thousands))
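The frequency polygon's vertices are the (midpoint, frequency) pairs from the table. Closing the polygon with a zero-frequency point one class width beyond each end is a common convention that the text does not state here, so treat it as an assumption of this sketch; `frequency_polygon_points` is an illustrative name.

```python
def frequency_polygon_points(midpoints, frequencies, width):
    """Vertices of a frequency polygon: (midpoint, frequency) for every class.

    ASSUMPTION: the polygon is closed with a zero-frequency point one class
    width below the first midpoint and one above the last (a common convention).
    """
    points = list(zip(midpoints, frequencies))
    return [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]


midpoints = [15, 18, 21, 24, 27, 30, 33]
frequencies = [4, 9, 16, 23, 17, 8, 3]
points = frequency_polygon_points(midpoints, frequencies, 3)
print(points)
# [(12, 0), (15, 4), (18, 9), (21, 16), (24, 23), (27, 17), (30, 8), (33, 3), (36, 0)]
```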
Figure 2.3: Cumulative Frequency Polygon (Ogive) (y-axis: Cumulative Frequency)
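The steps for the cumulative frequency polygon are not shown in this part of the text. A common construction, assumed in this sketch, starts at zero on the first lower class boundary and then plots each running total of the frequencies at its class's upper boundary; `ogive_points` is an illustrative name.

```python
from itertools import accumulate


def ogive_points(first_lower_boundary, upper_boundaries, frequencies):
    """Vertices of a cumulative frequency polygon (ogive).

    Starts at zero on the lowest class boundary, then plots each running total
    of the frequencies against that class's upper boundary.
    """
    running_totals = accumulate(frequencies)
    return [(first_lower_boundary, 0)] + list(zip(upper_boundaries, running_totals))


upper_boundaries = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5]
frequencies = [4, 9, 16, 23, 17, 8, 3]
points = ogive_points(13.5, upper_boundaries, frequencies)
print(points)
# [(13.5, 0), (16.5, 4), (19.5, 13), (22.5, 29), (25.5, 52), (28.5, 69), (31.5, 77), (34.5, 80)]
```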
As discussed in the previous section, the only allowable calculation on nominal data is to count the frequency of each value of the variable. We can graphically display these counts in three ways: Pareto charts, bar charts, and pie charts. This section also covers how to display data graphically with a time series graph, a pictograph, and a scatter plot.
A. Pareto Chart
A Pareto chart is a graph used to represent a frequency distribution for categorical (or nominal-level) data, in which the frequencies are displayed by the heights of vertical bars arranged in order from highest to lowest.
B. Bar Chart (Bar Graph)
A bar chart is similar to a histogram. The bases of the rectangles are arbitrary intervals whose centers are the codes. The height of each rectangle represents the frequency of that category. It is also applicable to categorical (or nominal-level) data.
C. Pie Chart
A pie chart is a circle divided into portions that represent the relative frequencies (or percentages) of the data belonging to different categories. The data in a pie chart should be categorical or nominal-level.
E. Pictograph (Pictogram)
F. Scatter Plot
A scatter plot is used to examine possible relationships between two numerical variables. One variable is plotted on the x-axis and the other on the y-axis.
Now we will illustrate how to construct the Pareto chart, bar chart, pie chart, time series graph, pictograph, and scatter plot using the succeeding examples.
Example 1: Using the information in the table about the favorite snacks of youths, construct a Pareto chart, a bar chart, and a pie chart.
Snack          Frequency
Junk Foods     135
Candy          250
Ice Cream      185
Chocolate      210
Others         90
Solution:
a. Constructing a Pareto Chart
Step 1: Arrange the data from highest to lowest according to frequency.
Step 2: Draw and label the x-axis (Products) and y-axis (Sales).
Step 3: Construct the chart by arranging the frequencies from highest to lowest, from left to right. Make bars of the same width and draw their heights corresponding to the frequencies. Figure 2.4 shows the Pareto chart on the favorite snacks of the youth.
Figure 2.4: Pareto Chart, Favorite Snacks (x-axis: Products; y-axis: Sales)
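Step 1 of the Pareto chart is a descending sort by frequency. The sketch below performs it and prints equal-width text bars at one `#` per 10 units, an arbitrary scale chosen for illustration; `pareto_order` is a hypothetical helper.

```python
snacks = {"Junk Foods": 135, "Candy": 250, "Ice Cream": 185, "Chocolate": 210, "Others": 90}


def pareto_order(frequencies):
    """Categories sorted from the highest to the lowest frequency (Step 1)."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


# Text rendering: left-to-right order of the chart becomes top-to-bottom here.
for name, f in pareto_order(snacks):
    print(f"{name:<11}| {'#' * (f // 10)} {f}")
```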
b. Constructing a Bar Chart
Step 1: Draw and label the x-axis (Products) and y-axis (Sales).
Step 2: Make bars of the same width and draw their heights corresponding to the frequencies. Figure 2.5 shows the bar chart on the favorite snacks of the youth.
Step 3: Using a protractor, draw each section and write its name and appropriate percentage, as shown in Figure 2.6.
Figure 2.6: Pie Chart for Example 1 (title: Favorite Snacks; labels include Others 10%, Junk Foods 16%, Ice Cream 21%)
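Steps 1 and 2 of the pie chart do not appear in this part of the text. The usual preparation, assumed in this sketch, converts each frequency into a percentage (f/n × 100) and into a central angle (f/n × 360°) for the protractor. Rounded to whole numbers, the percentages agree with the labels recoverable from the pie chart (Others 10%, Junk Foods 16%, Ice Cream 21%). `pie_slices` is an illustrative name.

```python
snacks = {"Junk Foods": 135, "Candy": 250, "Ice Cream": 185, "Chocolate": 210, "Others": 90}


def pie_slices(frequencies):
    """(category, percentage, central angle in degrees) for each slice."""
    total = sum(frequencies.values())
    return [(name, f * 100 / total, f * 360 / total) for name, f in frequencies.items()]


for name, pct, angle in pie_slices(snacks):
    print(f"{name:<11}{pct:6.1f}%{angle:8.1f} deg")
```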
Example 2: Using the information in the table below about the dollar-to-peso exchange rate from January to December 2015, construct a time series graph.
Solution:
Example 3: VSAS Realty Inc. is a real estate company that develops houses in Rizal province. The information in the table shows the number of houses constructed from 2011 to 2015. Construct a pictograph.
Year            2011   2012   2013   2014   2015
No. of Houses    400    250    600    550    700
Solution:
Step 1: Draw and label the x-axis and y-axis.
Step 2: Label the x-axis for Year and the y-axis for Number of Houses.
Pictograph for Example 3 (x-axis: Year; y-axis: Number of Houses)
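A pictograph replaces each bar with repeated symbols. The source does not state its key, so this sketch assumes one full symbol `H` per 100 houses and a half symbol `h` for a remainder of at least 50; `pictograph_row` and its parameters are hypothetical.

```python
houses = {2011: 400, 2012: 250, 2013: 600, 2014: 550, 2015: 700}


def pictograph_row(count, unit=100, full="H", half="h"):
    """Symbols for one bar of a pictograph.

    ASSUMPTION: one full symbol stands for `unit` houses and a half symbol
    for a remainder of at least half a unit; the source does not state its key.
    """
    full_symbols, remainder = divmod(count, unit)
    return full * full_symbols + (half if remainder >= unit / 2 else "")


print("Key: H = 100 houses, h = 50 houses")
for year, count in houses.items():
    print(f"{year}  {pictograph_row(count):<8} {count}")
```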
Example 4: The owner of a chain of halo-halo stores would like to study the effect of atmospheric temperature on sales during the summer season. A random sample of 12 days is selected, with the results given as follows:
Day                1     2     3     4     5     6     7     8     9     10    11    12
Temperature (°F)   79    76    83    93    94    97    85    84    90    88    82    78
Sales              147   143   147   168   206   155   192   211   209   200   150   187
Put the data on a scatter diagram.
Solution:
Step 1: Draw and label the x-axis and y-axis.
Step 2: Label the x-axis for Temperature (°F) and the y-axis for Sales.
Step 3: Plot the points of each ordered pair in the Cartesian coordinate system.
Scatter diagram for Example 4 (x-axis: Temperature (°F); y-axis: Sales)
Good graphical displays tell what the data are conveying. Sadly, many graphs and charts shown in newspapers and magazines are misleading, incorrect, or overly complicated, and such displays must not be used. To develop good graphs and charts correctly, there are some guidelines to bear in mind, such as