Lecture_04_frequency & Frequency Distribution
Md. Earfan Ali Khondaker, PhD
Professor
Dept. of Statistics, HSTU
Learning objectives
After studying this chapter, the student will be able to understand:
The simple way of representing data with tables and
graphs
How a frequency distribution gives more precise
information
Source and presentation of data
Necessity of frequency & frequency
Distribution
Frequency distributions may be presented by
graphs and charts in order to make them clear,
more easily understandable and to compare
distributions quickly.
Graphs and charts are also easy to understand for people
who cannot read and for people from different regions who
speak different languages.
Graphical representation brings to light the
salient features of the data at a glance. It is also
useful in locating some partition values.
Frequency
Definition: In many situations several population
elements assume the same value; i.e., numerical
values of a population characteristic may often
repeat again and again. The number of times a
value of the variable is repeated is called its
frequency. Frequency is usually denoted by ‘f’.
Example
The manager of Hudson Auto would like to have a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
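As a small illustration of frequency as a count of repeated values, the following Python sketch tallies the costs shown above. It uses only the 40 values printed in this section; the names costs and freq are chosen for illustration.

```python
from collections import Counter

# The 40 parts costs printed above, in dollars
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Counter maps each distinct value to the number of times it occurs (its frequency f)
freq = Counter(costs)

# Show only the values that repeat, in increasing order
for value in sorted(freq):
    if freq[value] > 1:
        print(value, freq[value])
```

A value such as 75 appears three times in the list, so its frequency is f = 3, while a value such as 52 appears once.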
Frequency Distribution:
Definition: Information collected in any process
is usually classified or grouped according to
specific characteristics. An arrangement of
observational data according to the frequencies of
the observations is called a frequency distribution.
Frequency Distribution:
A frequency distribution should arrange the
observations so that they become easily
understandable.
Frequency distributions are constructed
mainly to present the data in condensed form
and for easy understanding.
Frequency distributions are very important in
statistical studies.
Cumulative Frequency Distribution:
Definition: The cumulative frequency of a class is
the number of observations up to the end of that
class, obtained by adding the frequencies of all
previous classes to the frequency of the class in
question.
In other words, the cumulative frequency of
the last class is the sum of all frequencies.
The arrangement of classes with their cumulative
frequencies is known as a cumulative frequency
distribution.
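The cumulating step can be sketched in a few lines of Python. The class frequencies below are hypothetical values chosen only to illustrate the running total; they do not come from the lecture data.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest
frequencies = [2, 5, 8, 4, 1]

# Each cumulative frequency is the running total up to and including that class
cumulative = list(accumulate(frequencies))

for f, cf in zip(frequencies, cumulative):
    print(f, cf)
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the table.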
Note: Generally, a frequency distribution
contains the following five parts
Construction of Frequency Distributions:
The following are the steps for the construction of a
frequency distribution.
1. Finding the Range: In constructing a frequency
distribution, the highest and the lowest values in the
data set are first identified and their difference is
obtained. This difference between the highest value
and the lowest value is called the range, usually
denoted by R.
Range, R = X_H - X_L
X_H = Highest value of the variable x in the
given data set
X_L = Lowest value of the variable x in the given
data set
2. Decision about the Number of Classes:
It is necessary to decide the number of classes into
which the entire data set should be divided. The
number of classes should be neither too large nor
too small; usually it lies between 5 and 20,
depending on the practical situation. The formula of
the statistician H. A. Sturges gives a guideline for the
desired number of classes. The formula is
K = 1 + 3.322 \log_{10} N
where, K = Number of classes
N = Total number of observations in the
data set
3. Choosing the Class Interval:
Each class will have two limits, the lower limit
(the lower value) and the upper limit (the higher
value). The difference of the upper limit and the
lower limit of a class is known as class interval,
usually denoted by c or h. If the range is divided
by the number of classes, we get the class
interval.
Class Interval, c = \frac{\text{Range}}{\text{Number of Classes}} = \frac{R}{K}
4. Counting of Frequencies:
Count the number of tally marks corresponding
to each class interval and write the result in the
respective frequency column.
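The four steps above can be sketched in Python using the 40 parts costs printed in the earlier example. This is a minimal sketch under stated assumptions, not the lecture's own worked solution: rounding K and c up to whole numbers, starting the first class at the lowest value, and counting a value equal to a class's upper limit in the next class are all choices made here for illustration.

```python
import math

# The 40 parts costs printed in the earlier example, in dollars
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Step 1: range R = highest value - lowest value
R = max(costs) - min(costs)

# Step 2: number of classes from K = 1 + 3.322 log10(N), rounded up (assumption)
N = len(costs)
K = math.ceil(1 + 3.322 * math.log10(N))

# Step 3: class interval c = R / K, rounded up to a whole dollar (assumption)
c = math.ceil(R / K)

# Step 4: count the observations in each class.
# The first class starts at the lowest value (assumption); each class
# includes its lower limit and excludes its upper limit.
lowest = min(costs)
frequencies = [0] * K
for x in costs:
    frequencies[(x - lowest) // c] += 1

# Print class interval, tally marks, and frequency
for i, f in enumerate(frequencies):
    lower = lowest + i * c
    print(f"{lower} - {lower + c}", "|" * f, f)
```

For these 40 values the sketch gives R = 53, K = 7, and c = 8. A different rounding rule or starting point would give a different but equally valid table.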
Methods of Constructing a Frequency Distribution
a) Exclusive method
b) Inclusive method
Exclusive method
In this method, the upper limit of any class
interval is the same as the lower limit of
the next higher class, so there is no gap between
the upper limit of one class and the lower limit of
the next. This gives a continuous distribution, as
in the following example.
CI Tally marks Frequency (f)
0 - 10
10 - 20
20 - 30
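Because neighbouring classes share a limit in the exclusive method, a rule is needed for a value that falls exactly on a shared limit. The sketch below assumes the usual convention, which the text above does not state explicitly, that each class includes its lower limit and excludes its upper limit. The function name exclusive_class and its parameters are hypothetical.

```python
def exclusive_class(x, lowest=0, width=10):
    """Return the (lower, upper) limits of the exclusive class containing x.

    Assumes classes of equal width starting at `lowest`, with the lower
    limit included and the upper limit excluded.
    """
    start = lowest + ((x - lowest) // width) * width
    return (start, start + width)

# A value on a shared limit goes to the higher class
print(exclusive_class(10))
print(exclusive_class(9))
```

Under this convention the value 10 is counted in the class 10 - 20, not in 0 - 10, so every observation belongs to exactly one class.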
Inclusive method
In this method there is a gap between the upper limit of any
class and the lower limit of the next higher class. This gives a
discontinuous distribution, as in the following example.
CI Tally marks Frequency (f)
0 - 9
10 - 19
20 - 29
A discontinuous distribution is converted to a continuous distribution by
subtracting 0.5 from each lower limit and adding 0.5 to each upper limit
(0.5 being half the gap of 1 between successive classes).
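The conversion can be sketched in Python for the inclusive classes 0 - 9, 10 - 19, 20 - 29 shown above. The adjustment is computed here as half the gap between successive classes, which equals 0.5 for these classes; the variable names are illustrative.

```python
# Inclusive class limits from the table above
inclusive_classes = [(0, 9), (10, 19), (20, 29)]

# Gap between the upper limit of one class and the lower limit of the next
gap = inclusive_classes[1][0] - inclusive_classes[0][1]
half_gap = gap / 2

# Subtract half the gap from each lower limit and add it to each upper limit
continuous_classes = [(lo - half_gap, hi + half_gap) for lo, hi in inclusive_classes]

for lo, hi in continuous_classes:
    print(lo, "-", hi)
```

After the conversion the upper limit of each class coincides with the lower limit of the next, as in the exclusive method.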
Note: The data are arranged into groups such that each group
contains some of the observations. These groups are called classes, and
the number of observations in each class is called its frequency.
Definition of terms
We need to define some terms when talking
about frequency distributions. Some of these we
have used already; in this section we shall define
the terms more closely.
(a) Class frequency
The class frequency in a distribution gives the
number of observations falling within that
particular class. When presenting a frequency
distribution in tabular form, the classes always go in
the left hand column, with the class frequencies on
the right.
(b) Class limits
The smallest and largest values (rounded where
necessary) that can go into any given class are
termed its class limits.
In the electric bulb weights (in gm) table the
class limits are 2.0, 2.9, 3.0, 3.9, and so on. We
usually differentiate between the lower class
limits (2.0, 3.0, 4.0, etc.) and the upper class
limits (2.9, 3.9, 4.9, etc.).
(c) Class boundaries
These represent the actual or true limits to a
class. There is a fine distinction between class
boundaries and class limits, and it is important
to be clear on this distinction.
In our example we may note that a bulb weighing
(say) 2.96 gm will be recorded as weighing 3.0 gm,
because weights are rounded to one decimal place.
The class boundaries in this example are therefore
1.95, 2.95, 3.95, and so on.
(d) Class marks
The class mark is the mid-point of the class, and
is obtained by taking the arithmetic mean of the
upper and lower class limits.
In the example, the class marks are 2.45, 3.45,
4.45, etc. These are often also referred to as
mid-marks, mid-points, mid-values, etc.
(e) Class interval, or range
The class interval is the length of any class, the
range of values it contains. The class interval of a
class is the difference between the lower class
limit of that class and the lower class limit of the
next class. If all the intervals are equal then it is
also equal to the difference between successive
class marks.
For example, the class interval for the bulb
weights is 1.0 gm and is equal for all classes.
Note that the class interval is not necessarily the
difference between the upper and lower limits
of the class. (In our table this is equal to 0.9 gm.)
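The terms defined above can be checked numerically for the bulb weight classes 2.0 - 2.9, 3.0 - 3.9, 4.0 - 4.9. The sketch below uses Python's decimal module so that values such as 2.45 and 1.95 come out exactly; the variable names are illustrative, and only the first three classes mentioned in the text are used.

```python
from decimal import Decimal as D

# Class limits (lower, upper) for the first three bulb weight classes, in gm
limits = [(D("2.0"), D("2.9")), (D("3.0"), D("3.9")), (D("4.0"), D("4.9"))]

# Class boundaries: move each limit outward by half the gap between classes
half_gap = (limits[1][0] - limits[0][1]) / 2
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]

# Class marks: arithmetic mean of the lower and upper class limits
marks = [(lo + hi) / 2 for lo, hi in limits]

# Class interval: difference between successive lower class limits
interval = limits[1][0] - limits[0][0]

# Difference between the upper and lower limits of one class (not the class interval)
limit_difference = limits[0][1] - limits[0][0]

print(boundaries)
print(marks)
print(interval, limit_difference)
```

The first class has boundaries 1.95 and 2.95 and class mark 2.45, the class interval is 1.0, and the difference between the limits of a class is 0.9, in agreement with the figures quoted above.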
Continuous and discrete data
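Variables are of two types. The first type can take any
value within a range and is usually obtained by
measurement; we call it a continuous variable and
also refer to continuous data. The second type can
take only distinct, separate values and is usually
obtained by counting; we call it discrete.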
Examples of continuous variables are:
(a) length of machine;
(b) weight of machine;
(c) temperature of engine surface.
(d) costs of parts.
Examples of discrete variables are:
(a) number of bulbs in a house;
(b) number of parts in an automobile;
(c) number of mechanics.