Elementary Statistics: Frequency Distribution
Frequency Distribution
A possible frequency representation of the foregoing bulb data may be given as:
Table 2
Class       Class mark   Frequency   Relative frequency   Cumulative frequency   Relative cumulative frequency
            $x_i$        $f_i$       $r_i = f_i/N$        $F_i$                  $R_i = F_i/N$
800-850     825          3           0.03                 3                      0.03
850-900     875          4           0.04                 7                      0.07
900-950     925          8           0.08                 15                     0.15
950-1000    975          12          0.12                 27                     0.27
1000-1050   1025         16          0.16                 43                     0.43
1050-1100   1075         28          0.28                 71                     0.71
1100-1150   1125         24          0.24                 95                     0.95
1150-1200   1175         5           0.05                 100                    1.00
Here the data have been placed in eight classes. Each class has length 50 hrs,
and together the classes cover all of the data. Although equal-length classes
are not necessary, they are certainly convenient and quite common.
The center of each class is characterized by a class mark $x_i$. Since the actual
raw values are not present in the frequency distribution, the class mark $x_i$ is
often used to represent each datum in the $i$-th class during computations.
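
The columns of the bulb-life table can be rebuilt from the class boundaries and frequencies alone. The following is a minimal sketch; the function name `frequency_table` and the tuple layout are illustrative choices, not part of the original notes.

```python
# Class boundaries and frequencies copied from the bulb-life table (hours).
boundaries = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200]
freqs = [3, 4, 8, 12, 16, 28, 24, 5]


def frequency_table(boundaries, freqs):
    """Return one tuple per class: (lower, upper, x_i, f_i, r_i, F_i, R_i)."""
    total = sum(freqs)  # N, the total frequency
    rows = []
    cumulative = 0
    for lower, upper, f in zip(boundaries[:-1], boundaries[1:], freqs):
        cumulative += f                # F_i: running sum of the f_i
        mark = (lower + upper) / 2     # x_i: center of the class
        rows.append((lower, upper, mark, f, f / total,
                     cumulative, cumulative / total))
    return rows


table = frequency_table(boundaries, freqs)
for row in table:
    print("{}-{}  {:7.1f} {:4d} {:5.2f} {:4d} {:5.2f}".format(*row))
```

Each printed row should agree with the corresponding row of the table.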
The histogram of the bulb-life distribution is shown below (Fig. 1).
It is sometimes useful to consider the frequency distribution derived by
accumulating the individual class frequencies. The resulting compilation is
known as the cumulative frequency distribution. Fig. 2 illustrates the
cumulative frequency histogram for the bulb-life distribution.
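[Fig. 1: Histogram of the bulb-life distribution. Horizontal axis: bulb life (hrs); vertical axis: frequency.]
[Fig. 2: Cumulative frequency histogram of the bulb-life distribution. Horizontal axis: bulb life (hrs); vertical axis: cumulative frequency.]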
Sometimes line graphs are employed instead of bar graphs to depict both
frequency and cumulative frequency distributions. To construct the frequency
polygon corresponding to the distribution, we first extend the classes of the
frequency distribution by one class in either direction and assign each extra
class the frequency 0. We then plot the points $(x_i, f_i)$,
$i = 0, 1, 2, \ldots, m, m+1$, where $m$ is the original number of classes, and
connect adjacent points by straight lines.
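
The construction of the polygon vertices can be sketched as follows. The helper name `polygon_points` is hypothetical, and the sketch assumes equally spaced class marks, as in the bulb-life data.

```python
def polygon_points(marks, freqs):
    """Vertices (x_i, f_i), i = 0..m+1, of the frequency polygon.

    Assumes equally spaced class marks, so the two extra classes sit one
    class length below the first mark and one above the last mark.
    """
    width = marks[1] - marks[0]
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]   # the extra classes get frequency 0
    return list(zip(xs, ys))


# Class marks and frequencies of the bulb-life distribution.
marks = [825, 875, 925, 975, 1025, 1075, 1125, 1175]
freqs = [3, 4, 8, 12, 16, 28, 24, 5]

points = polygon_points(marks, freqs)
print(points)
```

Joining consecutive entries of `points` with straight segments gives the polygon; any plotting library can draw it from this list.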
The cumulative frequency polygon, or ogive, is a piecewise linear representation
of the cumulative frequency distribution in which the cumulative frequency of a
class is plotted at the upper class boundary. The frequency polygon and ogive are
shown in Figs. 3 and 4, respectively.
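
A minimal sketch of the ogive as a piecewise linear function is given below. Starting the curve at cumulative frequency 0 on the lowest class boundary is an assumption (a common convention, not stated in the text), and the names `ogive_points` and `ogive_value` are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies of the bulb-life distribution.
boundaries = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200]
freqs = [3, 4, 8, 12, 16, 28, 24, 5]


def ogive_points(boundaries, freqs):
    """Pair each upper class boundary with its cumulative frequency.

    Assumption: the curve starts at 0 on the lowest class boundary.
    """
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))


def ogive_value(points, x):
    """Read the ogive at x by linear interpolation between its vertices."""
    if x <= points[0][0]:
        return 0.0
    if x >= points[-1][0]:
        return float(points[-1][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


ogive = ogive_points(boundaries, freqs)
print(ogive_value(ogive, 1025))  # halfway through the class 1000-1050
```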
[Fig. 3: Frequency polygon of the bulb-life distribution, plotted at the class marks 825 to 1175.]
[Fig. 4: Ogive of the bulb-life distribution.]
Relative-Frequency Distributions
Given that we understand the intuitive nature of the discussion, suppose we wish
to employ relative frequencies to give meaning to the statement $P(a < X < b)$,
where $a$ and $b$ are not class boundaries. Assuming the data in each class are
uniformly distributed throughout the class interval, it seems reasonable to
estimate the desired probability by finding the fraction of the data that lies
between $a$ and $b$.
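Referring to Fig. 5, where $a$ lies in the second class and $b$ in the fourth,
let $c$ denote the common length of the classes, $r_i$ the relative frequency of
the $i$-th class, and $u_i$ and $l_i$ the upper and lower boundaries of the
$i$-th class. A straightforward geometrical analysis yields

$$P(a < X < b) = \frac{u_2 - a}{c}\, r_2 + r_3 + \frac{b - l_4}{c}\, r_4$$

[Fig. 5: Relative-frequency histogram with the points $a$, $u_2$, $x_3$, $l_4$, $b$ marked on the horizontal axis.]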
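
The estimate generalizes to any $a < b$ by weighting each class by the fraction of its interval that falls inside $(a, b)$. The sketch below uses the bulb-life relative frequencies; the function name `estimate_probability` and the sample values $a = 870$, $b = 960$ are illustrative assumptions chosen so that $a$ falls in the second class and $b$ in the fourth.

```python
def estimate_probability(boundaries, rel_freqs, a, b):
    """Estimate P(a < X < b), assuming data are uniform within each class."""
    total = 0.0
    for lower, upper, r in zip(boundaries[:-1], boundaries[1:], rel_freqs):
        overlap = min(b, upper) - max(a, lower)   # length of class inside (a, b)
        if overlap > 0:
            total += r * overlap / (upper - lower)
    return total


# Bulb-life class boundaries and relative frequencies r_i.
boundaries = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200]
rel_freqs = [0.03, 0.04, 0.08, 0.12, 0.16, 0.28, 0.24, 0.05]

# (900 - 870)/50 * 0.04 + 0.08 + (960 - 950)/50 * 0.12 = 0.128
p = estimate_probability(boundaries, rel_freqs, 870, 960)
print(round(p, 3))
```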
The step function $s(x)$ depends solely on the sample data. Given a new set of
observations, it is likely that a different histogram, and thus a different step
function, would be obtained. Indeed, the result could be a markedly different step
function. The problem is that we would really like some function that would, at
least in theory, give the actual values of the probability statements, not simply
estimates based upon this or that sample. We desire a function $f(x)$ such that

$$P(a < X < b) = \int_a^b f(x)\, dx$$

for all values of $a$ and $b$ with $a < b$. Of course, the "actual" function that
is appropriate for the population might look very different from the step function
$s(x)$ derived from a given set of observations. One role of statistical analysis
is to provide a methodology for deciding on an appropriate $f(x)$.
Probability as an integral
Again consider the relative-frequency histogram of Fig. , only this time let the
$y$ axis be rescaled by dividing each relative frequency by $c$, the common class
length, to obtain the normalized relative frequencies

$$p_i = \frac{r_i}{c} = \frac{f_i}{cN}$$

If the tops of the bars of this normalized relative-frequency histogram are viewed
as portions of a step function $s(x)$, then

$$\int_a^b s(x)\, dx = (u_2 - a)\, p_2 + c\, p_3 + (b - l_4)\, p_4$$

Since $p_i = r_i / c$, the right-hand side is the relative-frequency estimate of
$P(a < X < b)$.
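
As a rough check, the normalized heights $p_i$ can be computed for the bulb-life data and the step function integrated exactly, since it is constant on each class. The names `step_heights` and `integrate_step`, and the limits 870 and 960, are illustrative assumptions.

```python
# Bulb-life class boundaries and frequencies; all classes share length c.
boundaries = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200]
freqs = [3, 4, 8, 12, 16, 28, 24, 5]


def step_heights(boundaries, freqs):
    """Heights p_i = f_i / (c N) of the normalized relative-frequency bars."""
    c = boundaries[1] - boundaries[0]   # common class length
    n = sum(freqs)                      # total frequency N
    return [f / (c * n) for f in freqs]


def integrate_step(boundaries, heights, a, b):
    """Integral of s(x) over (a, b): height times overlap, summed by class."""
    area = 0.0
    for lower, upper, p in zip(boundaries[:-1], boundaries[1:], heights):
        area += p * max(0.0, min(b, upper) - max(a, lower))
    return area


heights = step_heights(boundaries, freqs)
whole = integrate_step(boundaries, heights, 800, 1200)   # total area under s(x)
part = integrate_step(boundaries, heights, 870, 960)     # estimate of P(870 < X < 960)
print(round(whole, 6), round(part, 6))
```

The total area is 1 because the bar areas $c\,p_i = r_i$ sum to 1, which is what lets areas under $s(x)$ be read as probabilities.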
Megaflop observations (Edward R. Dougherty, Probability and Statistics for the
Engineering, Computing, and Physical Sciences, Prentice Hall, 1990):
3.9 4.7 3.7 5.6 4.3 4.9 5.0 6.1 5.1 4.5
5.3 3.9 4.3 5.0 6.0 4.7 5.1 4.2 4.4 5.8
3.3 4.3 4.1 5.8 4.4 3.8 6.1 4.3 5.3 4.5
4.0 5.4 3.9 4.7 3.3 4.5 4.7 4.2 4.5 4.8
For a frequency distribution with $m$ classes and total frequency $N$, the
empirical mean is defined as

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{m} f_i x_i = \sum_{i=1}^{m} r_i x_i$$
Table 3
Class       Class mark   Frequency   Relative freq.
3.25-3.75   3.5          3           0.075
3.75-4.25   4.0          8           0.200
4.25-4.75   4.5          14          0.350
4.75-5.25   5.0          6           0.150
5.25-5.75   5.5          4           0.100
5.75-6.25   6.0          5           0.125
Employing the frequency information in Table 3 gives the empirical mean (or simply
the mean) defined above:

$$\bar{x} = \frac{1}{40}\,(3 \times 3.5 + 8 \times 4.0 + 14 \times 4.5 + 6 \times 5.0 + 4 \times 5.5 + 5 \times 6.0) = 4.6875$$
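
The grouping and the mean computation can be reproduced from the raw megaflop observations. This is a sketch only; the helper name `group` and the half-open class convention (lower boundary included, upper excluded) are assumptions, and no observation here falls on a boundary.

```python
# The 40 megaflop observations listed above.
megaflops = [
    3.9, 4.7, 3.7, 5.6, 4.3, 4.9, 5.0, 6.1, 5.1, 4.5,
    5.3, 3.9, 4.3, 5.0, 6.0, 4.7, 5.1, 4.2, 4.4, 5.8,
    3.3, 4.3, 4.1, 5.8, 4.4, 3.8, 6.1, 4.3, 5.3, 4.5,
    4.0, 5.4, 3.9, 4.7, 3.3, 4.5, 4.7, 4.2, 4.5, 4.8,
]
boundaries = [3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25]


def group(data, boundaries):
    """Count the observations falling in each class [lower, upper)."""
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        for i, (lower, upper) in enumerate(zip(boundaries[:-1], boundaries[1:])):
            if lower <= x < upper:
                counts[i] += 1
                break
    return counts


freqs = group(megaflops, boundaries)
marks = [(lower + upper) / 2 for lower, upper in zip(boundaries[:-1], boundaries[1:])]

# Empirical mean of the frequency distribution: (1/N) * sum of f_i * x_i.
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
# Ordinary mean of the raw observations, for comparison.
raw_mean = sum(megaflops) / len(megaflops)
print(freqs, grouped_mean, round(raw_mean, 4))
```

The raw-data mean is about 4.66, so replacing each datum by its class mark shifts the mean only slightly.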
Median
For a raw data sample $x_1, x_2, x_3, \ldots, x_N$, the median is defined in one of
two ways, depending on whether $N$ is odd or even. Let

$$y_1 \le y_2 \le \cdots \le y_N$$

be a relisting of the data according to increasing magnitude.
If $N$ is odd, then the median is defined to be the middle value in the relisting:

$$\tilde{x} = y_{(N+1)/2}$$

If $N$ is even, then the median is defined to be the mean of the two "middle" values in the relisting:

$$\tilde{x} = \frac{y_{N/2} + y_{(N+2)/2}}{2}$$
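
The two cases translate directly into code once the 1-based subscripts are shifted to 0-based list indices. A minimal sketch, with `raw_median` as an illustrative name, applied to the megaflop observations:

```python
def raw_median(data):
    """Median of raw data, following the odd/even definition above."""
    y = sorted(data)                  # y_1 <= y_2 <= ... <= y_N
    n = len(y)
    if n % 2 == 1:
        return y[(n + 1) // 2 - 1]    # y_{(N+1)/2}, shifted to a 0-based index
    # Mean of y_{N/2} and y_{(N+2)/2}, shifted to 0-based indices.
    return (y[n // 2 - 1] + y[n // 2]) / 2


megaflops = [
    3.9, 4.7, 3.7, 5.6, 4.3, 4.9, 5.0, 6.1, 5.1, 4.5,
    5.3, 3.9, 4.3, 5.0, 6.0, 4.7, 5.1, 4.2, 4.4, 5.8,
    3.3, 4.3, 4.1, 5.8, 4.4, 3.8, 6.1, 4.3, 5.3, 4.5,
    4.0, 5.4, 3.9, 4.7, 3.3, 4.5, 4.7, 4.2, 4.5, 4.8,
]

print(raw_median([3, 1, 2]))      # odd N: the middle value
print(raw_median([4, 1, 3, 2]))   # even N: mean of the two middle values
print(raw_median(megaflops))      # N = 40, so y_20 and y_21 are averaged
```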
To adapt the above definition to frequency distributions, we find the point on the $x$ axis that divides the area of the
histogram in two. Suppose $F_{k-1} \le N/2 \le F_k$; then the point on the $x$ axis that divides the area of the histogram into
two equal portions lies within the $k$-th class, called the median class.
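
Locating the median class and the area-splitting point inside it can be sketched as below. Because each histogram bar has constant height, the point sits a fraction $(N/2 - F_{k-1})/f_k$ of the way across the median class; this linear interpolation is an assumption following the uniform-within-class reasoning used earlier, and `grouped_median` is a hypothetical name.

```python
from itertools import accumulate


def grouped_median(boundaries, freqs):
    """Return (k, point): 0-based index of the median class and the point
    on the x axis that splits the histogram area into two equal parts.

    Assumes a non-empty distribution with data uniform within each class.
    """
    half = sum(freqs) / 2                      # N / 2
    cumulative = list(accumulate(freqs))       # F_1, ..., F_m
    for k, f_cum in enumerate(cumulative):
        if f_cum >= half:                      # first class with F_k >= N/2
            f_prev = cumulative[k - 1] if k > 0 else 0
            lower, upper = boundaries[k], boundaries[k + 1]
            return k, lower + (upper - lower) * (half - f_prev) / freqs[k]


# Megaflop distribution (Table 3).
k, point = grouped_median([3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25],
                          [3, 8, 14, 6, 4, 5])
print(k + 1, round(point, 4))   # median class number (1-based) and median

# Bulb-life distribution.
kb, point_b = grouped_median([800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200],
                             [3, 4, 8, 12, 16, 28, 24, 5])
print(kb + 1, point_b)
```

For the megaflop data $F_2 = 11 \le 20 \le F_3 = 25$, so the third class, 4.25-4.75, is the median class.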