Unit-1 IGNOU Statistics
FREQUENCY DISTRIBUTION OF A CHARACTER
Structure
1.1 Introduction
Objectives
1.2 Raw Materials of Statistics
1.3 Frequency Distributions
Ungrouped Frequency Distributions
Grouped Frequency Distributions
1.4 Diagrammatic Representation of Frequency Distributions
Frequencies
Cumulative Frequencies
Frequency Curve
Broad Classes of Distributions
1.5 Summary
1.6 Solutions and Answers
1.1 INTRODUCTION
In this unit, we shall talk about the basics of statistics. We shall define the terms
which we shall be using again and again throughout this course. It is possible that
you have read all this before. But that might have been some years ago. So a quick
look through this unit will help you to recall the relevant facts. In case you have never
been introduced to statistics before, this unit will gradually acquaint you with its basic
concepts. You will find that most of the terms we use in statistics are part of our daily
vocabulary. But we have to know their precise meaning before we use them in
statistics.
Further, you will see how to collect the data relating to a given investigation. You
will also be introduced to the concept of frequency distributions. Through simple
examples, we shall acquaint you with the various modes of presenting a frequency
distribution: tabular as well as diagrammatic.
Objectives
On reading this unit, you should be able to :
distinguish between a qualitative and a quantitative character,
differentiate between a discrete and a continuous variable,
draw up a frequency table and get the relative frequencies, cumulative frequencies
and frequency densities,
decide upon a suitable mode of representing a frequency distribution
diagrammatically.
Further, we can classify the data as primary or secondary. If we collect our own data
on the relevant group of individuals and use it in a study, then the data will be called
primary. In some cases, however, we may choose to make use of the data already
available in government publications or the data collected by some other agency.
Such data are said to be secondary. We can save a lot of time and money if we use
secondary data. But, at the same time, we have to be very careful. We have to make
sure that
the data are relevant to our enquiry,
the concepts and definitions used conform to what we have in mind, and
the data are reliable.
On the other hand, if we decide to use primary data, we shall have to decide on how
to go about collecting it. Primary data can be obtained in a number of ways,
depending on the information sought and also on our knowledge of the relevant
group of individuals. We give below some of the commonly used methods.
1) Direct Observation
Suppose we want to know the number of leaves per twig of a tree, or the weight (in
grams) per egg in a basket of eggs or the health status (good/indifferent/poor) per
student in a class. In each of these cases, we can obtain the required information by
direct observation, through counting or measurement or, simply, by inspection.
But in social and behavioural sciences, we collect information from persons who are
supposed to know. These persons are called informants. We can either get the
information directly from the informants or through intermediaries (called
enumerators) appointed for the purpose. In such cases, we can use the following
methods.
2) Questionnaire Method
If the informants happen to be sufficiently enlightened, then we can give them blank
questionnaire forms and request them to provide the necessary information by filling
out the forms. This method would be appropriate in gathering information about,
say, the attitude of doctors towards euthanasia (mercy-killing).
3) Interview Method
In case the informants are illiterate or not enlightened enough, the enumerators fill
out the schedule by a thorough and tactful questioning of each informant. As you are
aware, this method is used in the population census held once in ten years in our
country.
Solve these exercises and check whether you have grasped these ideas or not.
E 1) Indicate which of the following are primary data and which are secondary :
a) Data taken from the Government of India publication, Statistical Abstract
India of 1986.
b) Data collected by a market research bureau through door-to-door enquiry
to study the demand for a newly marketed shaving lotion.
c) Data collected by a medical research group through questioning of patients
visiting a hospital's out-door facilities.
d) Weather data recorded by the Department of Meteorology and then used
by the investigator for writing a Ph.D. thesis.
E 2) What mode of data collection would you recommend for
a) studying the progress of a public health programme covering a city's slums?
b) finding out the reactions of a number of economists to this year's budget
proposals?
c) estimating the yield rate (per acre) of a particular variety of wheat?
d) estimating the time taken to complete a particular calculation?
We have observed before that data relate to one or more "characters". Let us look
at this term more closely.
Characters fall into two broad categories.
There are certain characters which take varying forms for different individuals but
cannot be expressed numerically. The brand name of motor cars plying in an Indian
city is such a character; it may be Ambassador Contessa, Premier Padmini Deluxe,
Standard Herald Gazelle, Maruti 1000 or other. The employees in a city hospital may
be observed for their smoking habits; any given employee will then be recorded as a
smoker or a non-smoker. Such a character, whose possible forms can be distinguished
verbally, but not numerically, is called a qualitative character (or attribute).
On the other hand, we can express characters like the size of families, age of teachers,
height of students, weight of eggs, etc., in numerical or quantitative terms. The size
of a family (i.e., the number of members in the family) will be a positive integer:
1, 2, 3, etc. The age of a teacher may be given in years or in years and months. The
height of a student may be given in centimetres and may be rounded off to the nearest
centimetre. The weight of an egg may be recorded in grams and again may be
rounded off to the nearest tenth of a gram. Such characters are called quantitative
characters (or variables).
A qualitative character, too, ultimately yields numerical data. This is because we will
finally note how many of the individuals under study have any given form of the
character. In the case of motor cars in a city, we thus note how many of the cars are
Ambassador Contessas, how many are Maruti 1000, and so on. However, the data
on a quantitative character are numerical right from the beginning and so we can give
them a more in-depth statistical treatment than those on a qualitative character. A
qualitative character whose forms have an implied ranking (or gradation), however,
stands on a somewhat different footing. We can assign scores to these forms and thus,
express the raw data in quantitative terms. Data of this type are called ordinal data.
For example, an employee's performance in a year may be very good, good,
satisfactory, bad or poor. But we can assign the scores 5,4,3,2 and 1, to these five
categories, and immediately the data on the performance of the employees in an
office assume a numerical look. Surely, there is a lot of arbitrariness in assigning
scores this way. Nevertheless, this method of scoring is quite popular with research
workers in social and behavioural sciences.
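The scoring described above can be sketched in a few lines of Python. The five categories and their scores 5, 4, 3, 2, 1 are the ones named in the text; the list of ratings and the names `SCORES` and `to_scores` are made up purely for illustration.

```python
# Scores for the five ordered performance categories named in the text.
SCORES = {"very good": 5, "good": 4, "satisfactory": 3, "bad": 2, "poor": 1}

def to_scores(ratings):
    """Replace each verbal rating by its numerical score."""
    return [SCORES[r] for r in ratings]

# Hypothetical ratings of six employees (not data from this unit).
ratings = ["good", "very good", "satisfactory", "good", "poor", "bad"]
scores = to_scores(ratings)
print(scores)  # [4, 5, 3, 4, 1, 2]
```

Any other increasing set of scores would preserve the same ranking, which is exactly the arbitrariness the text warns about.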
Note that 'scoring' must be distinguished from 'coding' used to facilitate the
processing of data on an electronic computer. We use codes mainly for identification
purposes, similar to the use of roll numbers in IGNOU. Scores, on the other hand,
are more informative. For example, if you get a B grade in MTE-11, it means that
you have a good grasp of the course.
We have classified characters into two categories: qualitative and quantitative. Now
quantitative characters or variables, in their turn, may be classified as discrete and
continuous.
A discrete variable is one that can conceivably assume only some discrete, or isolated,
values. The size of families, the proportion or the number of males in each group of
25 students, or the length of a word are variables of this type. The size of a family
or the length of a word may take values like 1,2,3, etc., but no values in between.
The number of males in a group of 25 students may be 0, 1, 2, ..., 24 or 25, while the
proportion of males may be 0, 0.04, 0.08, ..., 0.96 or 1; values in between these
numbers are inconceivable.
A continuous variable, on the other hand, can possibly take any value in some
interval. For example, the age (in years) of teachers, the height (in cm) of students,
the weight (in grams) of eggs are all continuous variables. Supposing the minimum
age at which a person can join the teaching profession is α years and that every
member of the teaching community has to retire on reaching the age β years, then
the age of teachers must vary between α and β and can take any value within the
interval [α, β]. Indeed, the actual age of a teacher may well be 32.119237 years!
However, there will be hardly any need to record the age with this much precision!
The enquirer may be satisfied by taking the age correct to the second decimal place
so that the teacher's age may be recorded as 32.12 years. This is an example of how
limitations of the measuring instruments can introduce a discreteness into the
observations of a continuous variable. Similarly, the actual monthly income of an
Indian, which is a continuous variable, has to be expressed in rupees or in rupees and
paise, since the paisa happens to be the smallest denomination coin in the Indian
system of currency. This is also the case with the score in an examination of students
taking the examination. The score is invariably expressed in integers and yet it has
to be regarded as a continuous variable. This is because the score is supposed to
measure the proficiency of the students in the subject concerned, and the proficiency
may be taken to vary in a continuous manner (say, between 0 and 100).
Try this exercise now.
E 4) Indicate which of the following variables are discrete and which are continuous :
a) diameter of ball-bearings produced by a steel mill;
b) number of beds per hospital in a city;
c) proportion of heads in sets of 10 tosses of a coin;
d) length (in mm) of needles produced by a factory;
e) weight of loaves (in kg) produced by a bakery;
f) size of households in a village.
In this section, we shall discuss the method of organising raw data into frequency
distributions. You will see that we can get information out of a frequency distribution
more easily than out of raw data. Here, we shall first discuss ungrouped frequency
distributions and then discuss grouped ones.
Blue
Lilac
White
Pink
Total 314 0.999
The figures in the second column of Table 1 are called the frequencies of the four
classes (or of the four colours). So 'frequency' indicates how frequently the
corresponding form of the character under study (viz., colour) occurs in the collected
data. The sum of the frequencies, 314 in this case, is said to be the total frequency.
The first two columns in Table 1 constitute a frequency table. Since these indicate
the manner in which the total frequency 314 (or the total number of individuals) is
distributed among the four classes, they are also said to represent the frequency
distribution of colour for the 314 flowers. Perhaps a better expression is 'the
frequency distribution of the 314 flowers by colour'.
Alternatively, we can also write the frequency distribution in terms of the proportions
of blue, lilac, white and pink flowers in the group. These proportions give the relative
frequencies, and are shown in the third column of Table 1. By definition,
\[
\text{relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \qquad \ldots (1)
\]
Then what is the total relative frequency? One, of course. But you can see that in
Table 1, the relative frequencies do not add up exactly to 1. This is because the
individual figures are all approximate, rounded off to a certain number of decimal
places.
Note that while the distribution of frequencies answers questions of the type 'How
many flowers in the given group are blue?', the relative frequency has to do with
questions like 'What is the proportion (or percentage) of blue flowers in the group?'
Further, in any situation, a frequency must be a non-negative integer. The value 0 is
admissible, for in the above situation it is conceivable that we might have a fifth
flower colour, say yellow, which was absent in the sample. A relative frequency, on
the other hand, must be a rational number in the interval [0,1].
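Formula (1) and the effect of rounding can be illustrated with a short Python sketch. The four counts below are invented and are not the frequencies of Table 1; they were chosen only so that they add up to 314. The function name `relative_frequencies` is likewise hypothetical.

```python
def relative_frequencies(freqs, places=3):
    """Divide each class frequency by the total frequency and round."""
    total = sum(freqs.values())
    return {cls: round(f / total, places) for cls, f in freqs.items()}

# Made-up counts adding up to 314 (NOT the actual figures of Table 1).
counts = {"blue": 100, "lilac": 89, "white": 78, "pink": 47}

rel = relative_frequencies(counts)
total_rel = sum(rel.values())

for colour, r in rel.items():
    print(f"{colour:<6} {counts[colour]:>4} {r:.3f}")
print(f"{'Total':<6} {sum(counts.values()):>4} {total_rel:.3f}")
```

With these particular counts the rounded relative frequencies add up to 0.999 rather than 1, which shows how a total slightly different from 1 can arise from rounding alone.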
The simplest type of classification of a group of individuals by a qualitative character
is a dichotomy, i.e., a classification with just two classes. A group of students may
thus be classified by sex as boys and girls or by performance at an examination as
successful and unsuccessful.
Let us now take an example of the data on a discrete variable.
4 4 2 4 5 2 3 3
4 3 5 5 6 6 7 5
5 3 7 2 7 6 2 6
8 1 6 5 6 6 8 7
7 9 5 4 5 5 6 3
As in the case of a qualitative character, here too, we would like to summarise the
data by forming a frequency table. For this it would be necessary to count the number
of times 1 appears, the number of times 2 appears and so on. We can count more
easily if we follow a tallying system. This system can be used by people without any
formal training in arithmetic (like our cave-dwelling forebears!)
Thus, we take nine classes defined by the nine distinct values 1,2,...9, noting that 9
was the largest household size recorded in the data. The second column in Table 3
shows the tallies against each of these values. After counting the tallies, we write the
frequencies in the third column. In the fourth column we have written the relative
frequencies.
Fig. 1 : Early notch-cutting by primitive man
Table 3 : Frequency table for size of 80 households
Household size    Tallies    Frequency    Relative frequency
Total
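The tallying procedure can be mimicked in Python with `collections.Counter`. This is only a sketch: the twelve household sizes below are hypothetical and are not the 80 observations summarised in Table 3.

```python
from collections import Counter

# Hypothetical household sizes (illustrative only, not the data of Table 3).
sizes = [4, 2, 5, 3, 4, 1, 2, 4, 6, 3, 4, 5]

freq = Counter(sizes)   # frequency of each distinct value
n = len(sizes)          # total frequency

print("Size  Tallies  Freq  Rel. freq")
for value in sorted(freq):
    tallies = "|" * freq[value]          # one stroke per observation
    print(f"{value:>4}  {tallies:<7}  {freq[value]:>4}  {freq[value] / n:.3f}")
print(f"Total          {n:>4}")
```

Each row of the output corresponds to one class, that is, to one distinct value of the variable.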
There are two more ways in which we can represent the frequency distribution of a
discrete variable. Both make use of what are called the cumulative frequencies of the
variable. For a discrete variable like household size, the frequencies answer questions
of the type : 'How many individuals in the given group have the value k of the
variable?', and the relative frequencies answer questions like : 'What proportion of
the individuals has the value k of the variable?' But how do we answer a question
like "How many individuals have the value k or less?" or "How many individuals
have the value k or more?"
From Table 3, you can see that the number of households of size k or less is 3 for
k = 1, 3+7 = 10 for k = 2, 10+11 = 21 for k = 3, and so on. We obtained these figures by
taking cumulative totals of the frequencies in Table 3, starting from the lowest
observed value of the variable and going successively to the higher values. These are
called cumulative frequencies of the less than type. Similarly, to get the number of
We cannot talk of cumulative frequencies of a qualitative character unless it is of the ordinal type.
for the data on household size by means of Tables 4a and 4b.
Table 4a : Cumulative frequency table of "less than" type for size of 80 households.
Household size    Cumulative frequencies
While making use of Table 4a, you should remember that the cumulative frequency
of the less-than type is 0 for any value of the variable less than 1, is 3 for any value
between 1 and 2 but less than 2, is 10 for any value between 2 and 3 but less than 3,
and so on. Finally, the cumulative frequency of the less than type is 80 (the total
frequency) for 9 or any value exceeding 9.
Similarly, the cumulative frequency of the more-than type is 0 for any value of the
variable exceeding 9, is 2 for any value between 8 and 9 but exceeding 8, is 6 for any
value between 7 and 8 but exceeding 7, and so on. Finally, the cumulative frequency
of the more than type is 80 for the value 1 or any value less than 1.
Thus, we can see that the cumulative frequencies are constant in some intervals, but
when they change, they change in jumps.
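The jump behaviour can be seen in a small Python sketch. The first three frequencies (3, 7, 11) are those quoted above for household sizes 1, 2 and 3; the remaining two are invented to keep the example short, so the totals here do not match Tables 4a and 4b. The helper names `cum_less` and `cum_more` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

values = [1, 2, 3, 4, 5]
freqs = [3, 7, 11, 6, 3]   # 3, 7, 11 from the text; 6 and 3 are made up

# Running totals from the lowest value upwards ("less than" type) ...
less_than = list(accumulate(freqs))                  # [3, 10, 21, 27, 30]
# ... and from the highest value downwards ("more than" type).
more_than = list(accumulate(reversed(freqs)))[::-1]  # [30, 27, 20, 9, 3]

def cum_less(x):
    """Number of individuals with value x or less, for any real x."""
    i = bisect_right(values, x)
    return less_than[i - 1] if i else 0

def cum_more(x):
    """Number of individuals with value x or more, for any real x."""
    i = bisect_left(values, x)
    return more_than[i] if i < len(values) else 0

print(cum_less(0.5), cum_less(1), cum_less(1.7), cum_less(2))  # 0 3 3 10
print(cum_more(5.5), cum_more(5), cum_more(4.2), cum_more(1))  # 0 3 3 30
```

Both functions stay constant between two successive values of the variable and change only at the values themselves.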
It goes without saying that by taking cumulative total of the relative frequencies (or
by dividing the cumulative frequencies by the total frequency), we can form two other
tables : a table of cumulative proportions of the less than type and a table of
cumulative proportions of the more-than type. The former would provide answers to
questions like, 'What is the proportion of individuals having the value of the variable
less than or equal to k?' The latter would answer questions like, 'What is the
proportion of individuals having the value of the variable greater than or equal to k?'
If you have understood the discussion so far, you will surely be able to do these
exercises.
G B V P B G B S G S S
G S G V B B G S V B G
S G B B G G V G G S B
V S S G S V S B S G S
B V S S B S S S B G G
[V = very good, G = good, S = satisfactory, B = bad, P = poor]
Draw up a frequency table and a relative frequency table for these data. Hence,
answer the following questions:
a) How many of the inmates think the services are good?
b) How many think that the services are at least satisfactory?
c) What is the percentage of inmates who consider the services to be less than
satisfactory?
E 6) The following data indicate the length per word for the 91 words in Tagore's
poem 'Where the mind is without fear and the head is held high, etc.':
5 4 3 5 8 6 6 3 4 5
3 4 4 5 8 2 6 7 4
4 5 6 4 9 6 4 2 6
2 9 2 3 3 3 2 4 2
7 2 4 4 4 3 4 4 7
4 4 9 3 7 4 5 4 2
3 5 2 5 10 3 5 8 6
3 3 6 2 5 3 3 7 3
4 5 8 5 3 4 4 3 2
2 3 5 5 5 3 2 6 7
Draw up a frequency table. In the same table, show the relative frequencies and
the cumulative frequencies of both types. Hence, answer the following
questions :
a) How many of the words have at least 6 letters?
b) How many have 5 letters or more?
c) What is the proportion of words with 2 letters?
d) What is the proportion of words of length 4 or more?
Until now, we have seen how to construct ungrouped frequency tables for qualitative
characters and discrete variables. For such a table, we count the frequencies of each
distinct attribute or value taken by the variable, and so there is no loss of information.
But this may not always be feasible. For example, suppose we have raw data on the
number of grains per earhead for 400 ears of a variety of wheat. It is quite possible
that there are some earheads with as few as 8 grains and some with as many as 57
grains. In this case, if we construct a frequency table taking each distinct value
between 8 and 57, then the table would be too long. Then again, ungrouped
frequency tables cannot be constructed for data on continuous variables, because a
continuous variable can take infinitely many distinct values. In such cases, then, it
becomes necessary to group some variable values together and then construct
frequency tables. We shall discuss such grouped frequency distributions in the next
sub-section.
Here, the values are recorded correct to one decimal place (i.e., correct to the nearest
tenth of a centimetre). The lowest observation in the set is 0.8 and the highest 9.2.
If we take our classes as 0.6-1.3, 1.4-2.1, 2.2-2.9, ..., 8.6-9.3, then the total number
of classes will be 11. To get the frequencies for these classes, we again go in for the
tallying system which we had adopted in Table 3. This is done in Table 6.
Fig. 2 : Petiole length
Table 6 : Frequency table for the data of Table 5 on petiole length of leaves of a pipal tree
Petiole length (cm)    Tallies    Frequency
Total    198
But here the classes need to be redefined. The reason is that the value recorded as,
say, 4.6 actually stands for some value between 4.55 and 4.65. Similarly the value 5.3
stands for some value between 5.25 and 5.35. Thus, the class taken as 4.6-5.3 in
Table 6, in fact, begins at 4.55 and ends at 5.35. The other classes have to be viewed
in the same way. We then have to properly define the classes in terms of
class-intervals, with no gap between any two successive intervals. The two end-points
of a class-interval are called class boundaries (the lower and the upper) while the
mid-point is called the class mark. The width (or length) of a class interval is, of
course, the difference between the upper class boundary and the lower. The
end-values of a class, when the classes are defined as in Table 6, are called the class
limits to distinguish them from the class boundaries. We may then say that the
frequency table in the form of Table 7 presents the frequency distribution of petiole
length more appropriately than does Table 6. The width of each class here is 0.8 cm.
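The passage from class limits to class boundaries can be sketched as follows. The example uses the first class, 0.6-1.3, with values recorded to the nearest 0.1 cm; the function name `class_boundaries` is hypothetical, and `Decimal` is used only to avoid floating-point noise in the printed values.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, unit):
    """Return (lower boundary, upper boundary, class mark, width).

    Each limit is extended by half the recording unit, because a value
    recorded as 0.6 stands for some value between 0.55 and 0.65.
    """
    half = Decimal(unit) / 2
    lo = Decimal(lower_limit) - half
    hi = Decimal(upper_limit) + half
    return lo, hi, (lo + hi) / 2, hi - lo

lo, hi, mark, width = class_boundaries("0.6", "1.3", "0.1")
print(lo, hi, mark, width)  # 0.55 1.35 0.95 0.80
```

The result agrees with the first class interval of Table 7 and with the common class width of 0.8 cm.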
Table 7 : Frequency/relative frequency table for petiole length of 198 leaves of a pipal tree
Petiole length (cm)
Class interval    Frequency    Relative frequency
0.55-1.35          2           0.0101
1.35-2.15          6           0.0303
2.15-2.95          8           0.0404
2.95-3.75         10           0.0505
3.75-4.55         24           0.1212
4.55-5.35         43           0.2172
5.35-6.15         52           0.2626
The third column in Table 7 shows the same frequency distribution in terms of relative
frequencies. While the frequencies tell us, for any interval, how many of the leaves
have petiole length between the two class boundaries, the relative frequencies indicate
what the proportion (or percentage) of such leaves is.
Here again, the cumulative frequencies of less than and more than types provide us
with two additional modes of representing the same frequency distribution. You can
see such representation in Table 8.
Table 8 : Cumulative frequencies for petiole length of 198 leaves of a pipal tree
Petiole length (cm)    Cumulative frequency
Class interval         (less than type)    (more than type)
of the different classes. By the frequency density of a class we mean the frequency
per unit of width in the class. It is somewhat similar to the population density of a
locality and is defined by the formula:
\[
\text{frequency density of a class} = \frac{\text{class frequency}}{\text{class width}}
\]
The series of class intervals taken together with the series of frequency densities
should give a good idea of the frequency distribution of the variable being studied.
You may wonder why we need to bring in frequency densities at all. Aren't the
frequencies supposed to give us an idea of the frequency distribution? But there are
situations where we have to take classes of varying widths. In these cases, frequency
densities become more meaningful. For example, consider the frequency distribution
of a variable like monthly family income or family wealth at a given date. For any
group of families, a large majority of the families will have incomes in the lower
income brackets while the number of families will be smaller towards the higher
income brackets. As in other cases, here too, we may choose to have classes of the
same width. However, if the common width is small, say Rs. 200, then too many
classes will have to be taken. Many of these classes might be empty. This will bring
in an irregular pattern and gross distortion in the true nature of the distribution. On
the other hand, if the common width is large, say Rs. 1,000, then the number of
classes will be too few and the true nature of the distribution, which usually shows
rapid changes in the lower parts of the range, will get blurred. This will also lead to
serious errors in the statistical measures computed on the basis of the grouped data.
Therefore, it would be advisable to have classes of varying width: narrower classes
in the lower parts of the income range and classes of increasing width towards the
higher parts of the range. Now, when the classes are of varying width, the class
frequencies will not be comparable. In such situations, the frequency densities that
are obtained from the frequencies by reducing them to a common base (see Table 9)
should be used.
Table 9 : Frequency distribution of monthly income for 1,276 urban families
Income (Rs.)    Frequency    Frequency density
0               218          1.0900
200             153          0.7650
400             190          0.6333
700             152          0.5067
1000            159          0.3975
2800             49          0.0817
3400             23          0.0375
4000             15          0.0188
4800              8          0.0080
Total         1,276          -
Thus, frequency densities give us a true picture of the frequency distribution when
the classes are of varying width. This will be all the more obvious when we consider
the problem of diagrammatic representation of the frequency distribution of a
continuous variable in the next section.
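As a check on the idea, the frequency densities of the first four classes of Table 9 can be recomputed in Python. Two assumptions are made here: the upper boundary of each class is taken to be the lower value of the next class, and the frequency of the third class is read as 190, which is the reading consistent with the printed density 0.6333.

```python
# (lower boundary, upper boundary, frequency) for the first four classes
# of Table 9; the upper boundaries are inferred, not printed in the table.
classes = [
    (0, 200, 218),
    (200, 400, 153),
    (400, 700, 190),
    (700, 1000, 152),
]

# frequency density = class frequency / class width
densities = [round(f / (hi - lo), 4) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo:>5}-{hi:<5} width {hi - lo:>4}  freq {f:>4}  density {d:.4f}")
```

The second and fourth classes have almost the same frequency (153 and 152) but different widths, so their densities differ markedly (0.7650 against 0.5067). This is why frequencies alone are not comparable when the widths vary.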
We have already mentioned at the end of Sec. 1.3.1 that even in the case of a discrete
variable (or, for that matter, of a qualitative character), we may have to define the
classes in terms of more than one distinct value of the variable (or more than one
distinct form of the qualitative character). Table 10 illustrates this point. We would
have to deal with as many as 50 classes if we did not use the type of condensation
that is indicated by the first column of the table.
Table 10 : Frequency distribution of number of grains per earhead for 400 ears of
a variety of wheat (see Fig. 3)
8-12 1
13-17 17
18-22 25
23-27 86
28-32 125
33-37 77
Total
Fig. 3 : Wheat earhead
Source : Statistical Methods for Agricultural Workers by Panse and Sukhatme
Now before we end this section, we list the main considerations guiding the
construction of a frequency table.
For one thing, the classes should be exhaustive, in the sense that each of the
observations should be assignable to one class or another.
Secondly, the classes should be mutually exclusive. This means that no two classes
should overlap so that each of the observations can be assigned to exactly one of the
classes without any ambiguity. These two criteria have clearly been followed in
constructing the tables in Sections 1.3.1 and 1.3.2.
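The two criteria can be made concrete with a small Python sketch that assigns each observation to a class. It uses the first four class intervals of Table 7; the observations are hypothetical, and the convention that a class includes its lower boundary but not its upper one is an assumption made for the code (values recorded to one decimal place never fall on these boundaries).

```python
from bisect import bisect_right

# Boundaries of the first four class intervals of Table 7.
boundaries = [0.55, 1.35, 2.15, 2.95, 3.75]

def class_index(x, boundaries):
    """Index of the class [b[i], b[i+1]) containing x, or None if no class does."""
    if x < boundaries[0] or x >= boundaries[-1]:
        return None              # the classes are not exhaustive for this x
    return bisect_right(boundaries, x) - 1

# Hypothetical petiole lengths (cm), recorded to one decimal place.
observations = [0.8, 1.4, 2.1, 2.9, 3.0]
indices = [class_index(x, boundaries) for x in observations]
print(indices)  # [0, 1, 1, 2, 3]
```

Because successive intervals share a boundary and leave no gap, every value in the covered range gets exactly one index: the classes are mutually exclusive. A result of `None` would signal that the classes fail to be exhaustive.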
Thirdly, while it seems natural in the case of a qualitative character to take a separate
class for each distinct form of the character and in the case of a discrete variable to
take a class for each distinct value of the variable, the classes should not be too
numerous. For the main objective is to summarise the data into an easily manageable
and comprehensible form. Besides, having too many classes might result in a situation
where many of the classes may have zero frequencies. Whereas the true distribution
may show a gradual increase or decrease of frequency, the observed distribution, in
such a case, will indicate abrupt changes in the frequency. Because of this, in many
cases, we have to define the classes in terms of more than a single value of the
character concerned.
But the classes should not be too few either. If there are too few classes, we are likely
to overlook some important features of the distribution. For instance, an
asymmetrical distribution may appear to be fairly symmetrical. We shall talk about
symmetrical distributions in Section 1.4.4.
Further, in the computation of various measures related to the distribution, we
assume that the observations within each class interval are concentrated at the class
mark, instead of being spread over it. You will come across this in Unit 2. So, if the
classes are too few, or equivalently, if each class is too wide, then this assumption
may lead to considerable error in the computation of these measures.
Last, but not the least, we should see to it that in the case of a variable, the classes
are defined in terms of the same number of distinct values of the variable or are of
the same width. Otherwise, the frequencies (or the relative frequencies) for the
different classes will not be comparable. On occasion we have to deviate from this
rule, as you have seen from Table 9. In such cases, we have to work with frequency
densities.
So far we have seen that a frequency distribution presents the data in a concise form.
We can get a general idea of a distribution more readily and effectively through an
appropriate diagram. In the next section, we talk about this diagrammatic
representation.
1.4.1 Frequencies
We shall discuss the cases of ungrouped and grouped frequency distributions, one by
one.
Fig. 4 : Bar diagram showing the frequency distribution of 314 flowers in an F1 population of linseed
by colour.
The relative frequencies can be represented by means of a bar diagram in the same
way as the absolute frequencies. But in case of data on a qualitative character, a
better mode of representing them would be to use what is called a pie diagram or
chart. This diagram makes use of a circle, whose total area is divided into as many
sectors as there are classes by drawing angles at the centre. The area of each sector
represents (is proportional to) the corresponding relative frequency. To illustrate the
use of a pie diagram, let us consider the relative frequency table of colour of flowers
in the F2 generation of linseed (Table 1). We first determine the angles (in degrees)
to be drawn at the centre of the circle (see Table 11).
Table 11 : Angles to be drawn at the centre of pie diagram for the frequency
distribution of Table 1
The figures in the third column of the table indicate the measures (in degrees) of the
angle to be drawn for each class, its sides extending from the centre of the circle to
its circumference. Note that the angle for any given class measures
\[ 360^\circ \times \text{relative frequency}. \]
Now, the area of a sector of angle \(\theta\) radians in a circle of radius \(r\) is \(\tfrac{1}{2} r^2 \theta\). Thus, in a
given circle, the area of a sector is directly proportional to its angle (whether in radians
or in degrees). So if we draw sectors with angles given in the third column of
Table 11, then the area of each sector is proportional to the corresponding angle
which, in turn, is proportional to the corresponding relative frequency.
You can see the pie diagram corresponding to Table 11 in Fig. 5.
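The angle computation can be sketched in Python. The counts below are invented (they are not the frequencies of Table 1 or the angles of Table 11) and serve only to show that the angles are proportional to the relative frequencies and add up to 360 degrees; `pie_angles` is a hypothetical name.

```python
def pie_angles(freqs):
    """Angle (in degrees) at the centre for each class: 360 x relative frequency."""
    total = sum(freqs.values())
    return {cls: 360 * f / total for cls, f in freqs.items()}

# Made-up counts for four flower colours (NOT the figures of Table 1).
counts = {"blue": 100, "lilac": 89, "white": 78, "pink": 47}

angles = pie_angles(counts)
for colour, angle in angles.items():
    print(f"{colour:<6} {angle:6.1f} degrees")
print(f"Total  {sum(angles.values()):6.1f} degrees")
```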
If we are dealing with a discrete variable, we can also form a column diagram to
represent its frequency distribution. In Fig. 6(a) we have the column diagram for the
frequency distribution of household size (Table 3).
Fig. 6 : (a) Column diagram (b) Frequency polygon for the data on household size
When the possible distinct values of the variable are equispaced, as in the case of
household size, word length, etc., an alternative mode of representing the frequency
distribution is available to us. Again we take two mutually perpendicular axes, the
horizontal for the variable and the vertical for the frequency (relative frequency).
Then we plot each distinct value and the corresponding frequency (relative
frequency) as a point on the graph paper, with respect to these axes. See Fig. 6(b)
which represents the frequency distribution in Table 3. Then we join the points for
the successive values of the variable by straight line segments. Next we take two
additional points, one for the possible lower value than the lowest in the table and
the other for the possible higher value than the highest in the table, the corresponding
frequencies being of course, zero. For the distribution of household size, for instance,
the two additional points will correspond to household size 0 and household size 10.
Then we join these points with the points corresponding to the adjoining value, and
thus obtain a closed polygon. Such a diagram is called a frequency polygon. Note that
we can also get the frequency polygon by joining together the tops of the columns in
the column diagram.
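The construction of the polygon's vertices can be sketched in Python. The frequencies below are hypothetical (they are not the entries of Table 3); the point is only the addition of the two zero-frequency end points, here at household sizes 0 and 10. The name `polygon_points` is made up for the sketch.

```python
def polygon_points(values, freqs):
    """Vertices of the frequency polygon for equispaced values.

    One extra point with zero frequency is added on each side, so that
    joining successive points by line segments gives a closed polygon.
    """
    step = values[1] - values[0]
    xs = [values[0] - step] + list(values) + [values[-1] + step]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

values = list(range(1, 10))                 # household sizes 1, 2, ..., 9
freqs = [3, 7, 11, 14, 18, 12, 9, 4, 2]     # hypothetical frequencies

points = polygon_points(values, freqs)
print(points[0], points[1], points[-2], points[-1])  # (0, 0) (1, 3) (9, 2) (10, 0)
```

Passing these points to any plotting routine that joins them in order would draw the polygon.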
You may try your hand at drawing a frequency polygon now.
E8) Draw a column diagram and a frequency polygon to represent the frequency
distribution of the data in E6.
In Fig. 8, you can see the histogram for the frequency distribution of family income
given in Table 9.
E9) Draw a histogram to represent the frequency distribution of the yield of seed
cotton given in E7.
Here we have seen some ways of diagrammatically representing the frequency tables
of qualitative characters, and discrete and continuous variables. We can also use
cumulative frequencies to represent the frequency distribution of a variable. In the
next sub-section, we shall see how the cumulative frequency tables of a variable can
be diagrammatically represented.
Fig. 9 : Step diagram representing cumulative frequencies of the (a) less than type, (b) more than type,
for the data on household size.
The picture takes a somewhat different form when it comes to the cumulative
frequencies of the more-than type. From Table 4 you can see that the cumulative
frequency diagram will again be a step diagram, but like the one in Fig. 9(b).
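The two kinds of cumulative frequencies, and the heights of the steps in the less than diagram, can be computed as in the following sketch. We assume here that "less than" counts the observations not exceeding a value and "more than" counts those not falling below it, which is what the solution to E6 suggests. The function names and the frequencies are hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Return (less_than, more_than) cumulative frequencies.

    less_than[i] counts observations <= the i-th value;
    more_than[i] counts observations >= the i-th value.
    """
    less_than = list(accumulate(freqs))
    more_than = list(accumulate(reversed(freqs)))[::-1]
    return less_than, more_than


def step_height(x, values, less_than):
    """Height of the less-than step diagram at the point x."""
    height = 0
    for v, c in zip(values, less_than):
        if x >= v:          # the diagram jumps at each distinct value
            height = c
    return height


# Hypothetical frequencies for household sizes 1 to 9 (NOT the data of Table 4).
sizes = list(range(1, 10))
counts = [2, 5, 9, 14, 11, 7, 4, 2, 1]
less, more = cumulative_frequencies(counts)
print(less)
print(more)
print(step_height(3.5, sizes, less))
```

Between two successive values the height stays constant, which is why the diagram takes the form of steps.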
b) Continuous variable
In representing the cumulative frequencies of either type for a continuous variable,
we proceed as in the discrete case, taking two rectangular axes of coordinates, the
horizontal for values of the variable and the vertical for cumulative frequency. But
we have to bear in mind that the cumulative frequency of the less than type for any
class corresponds to the upper class boundary and that it increases gradually and not
by jumps (as it does in the discrete case). Similarly, we have to remember that the
cumulative frequency of the more than type for any class corresponds to the lower
class boundary and that it decreases gradually and not by jumps.
So while drawing the diagram for the cumulative frequencies of either type, the points
corresponding to the successive class boundaries are joined by straight line segments.
Note that the cumulative frequency of less than (more than) type is zero (n) for any
variable value less than the lower boundary of the lowest class and is n (zero) for any
variable value exceeding the upper boundary of the highest class. Hence, the graph
of the cumulative frequency of the less than type will coincide with the horizontal
axis for values less than the lower boundary of the lowest class and will be parallel to that
axis at a height of n for all values equal to or exceeding the upper boundary of the
highest class.
In the case of the cumulative frequency diagram of the more than type, the picture
gets reversed : the graph will now be coincident with the horizontal axis for values
exceeding the upper boundary of the highest class and will be parallel to that axis at
a height of n for all values not exceeding the lower boundary of the lowest class. In
Figs. 10(a) and (b), we have these diagrams for the data on petiole length of leaves
of a pipal tree.
Fig. 10 : Ogives for the data in Table 8 (a) less than type, (b) more than type.
The two cumulative frequency diagrams for a continuous variable resemble in shape
the two curves forming the top of an ogee, a type of arch. Hence they have been
called the ogives of the distribution of the variable.
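The way the ogives are built up from the class boundaries can be sketched as follows. The class boundaries and frequencies below are hypothetical (they are not the petiole length data of Table 8), and the function names are our own. The sketch only illustrates the idea of joining the points at successive class boundaries by straight line segments.

```python
from itertools import accumulate


def less_than_ogive(x, boundaries, freqs):
    """Cumulative frequency of the less-than type at x.

    boundaries: class boundaries in ascending order (one more than the classes)
    freqs:      class frequencies
    """
    cum = [0] + list(accumulate(freqs))    # heights at the class boundaries
    if x <= boundaries[0]:
        return 0.0                         # coincides with the horizontal axis
    if x >= boundaries[-1]:
        return float(cum[-1])              # stays at height n
    for i in range(len(freqs)):
        lo, hi = boundaries[i], boundaries[i + 1]
        if lo <= x <= hi:
            # straight line segment between two successive boundaries
            return cum[i] + freqs[i] * (x - lo) / (hi - lo)


def more_than_ogive(x, boundaries, freqs):
    """Cumulative frequency of the more-than type at x."""
    return sum(freqs) - less_than_ogive(x, boundaries, freqs)


# Hypothetical grouped data: four classes with n = 20 observations.
bounds = [0.5, 1.5, 2.5, 3.5, 4.5]
class_freqs = [4, 10, 5, 1]
print(less_than_ogive(2.0, bounds, class_freqs))   # 4 + 10 * 0.5 = 9.0
print(more_than_ogive(2.0, bounds, class_freqs))   # 20 - 9.0 = 11.0
```

Note that at every point the two ogives add up to n, so each is the mirror image of the other about the height n/2.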
Here is an exercise for you.
E10) a) Represent the cumulative frequencies of the less than and more than types
for the data in E6 by suitable diagrams.
b) Draw the ogives corresponding to the data in E7.
So far we have considered frequency distributions of variables where the total number
of individuals was finite. Later, in Block 4, you will see that the frequency
distributions that we encounter in real life situations arise from sampling from a large
group of individuals, called a population. In most of these situations, we can regard
the population as infinite. Let us now discuss the diagrammatic representation of the
frequency distribution of an infinite population by a frequency curve.
Fig. 11 : Histogram of a frequency distribution of a continuous variable approaching a smooth curve.
Similarly, we can also say that with increasing total frequency and decreasing class
width, the ogive of a continuous variable of either type will also gradually approach
a smooth curve as shown in Fig. 12. For the sake of comparability, we draw these on
the basis of cumulative relative frequencies (rather than cumulative frequencies).
Fig. 12 : Limiting forms of the ogives of a continuous variable : (a) less than type (b) more than type
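A small simulation may help you see this limiting behaviour. The sketch below is built on assumptions of our own: the values are simulated from a variable spread uniformly over the range 0 to 1, for which the smooth limiting curve of the less than type is simply F(x) = x, and the function name `ogive_gap` is hypothetical. It measures, at the class boundaries, how far the ogive of cumulative relative frequencies lies from that smooth curve.

```python
import random


def ogive_gap(n, classes, seed=0):
    """Largest gap, at the class boundaries, between the less-than ogive
    (cumulative relative frequency) of n simulated values and the smooth
    curve F(x) = x of a variable spread uniformly over 0 to 1."""
    rng = random.Random(seed)          # fixed seed so that runs are repeatable
    counts = [0] * classes
    for _ in range(n):
        k = min(int(rng.random() * classes), classes - 1)  # class of the value
        counts[k] += 1
    gap, running = 0.0, 0
    for k, c in enumerate(counts, start=1):
        running += c                   # cumulative frequency at upper boundary
        gap = max(gap, abs(running / n - k / classes))
    return gap


# Total frequency increases while the class width (1 / classes) decreases.
for n, classes in [(50, 5), (5000, 20), (200000, 100)]:
    print(n, classes, round(ogive_gap(n, classes), 4))
```

With a larger total frequency and narrower classes, the gap should typically become very small, which is the sense in which the ogive approaches a smooth curve.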
Now here are some exercises. In each of these we have asked you to give a
diagrammatic representation to some of the frequency distributions which you have
met in this unit.
Many of the distributions that are encountered in the physical, biological and
behavioural sciences, as well as those arising from measurements in the field of
manufacturing industry, closely follow this form. For instance, if we collect data on
the stature (in cm) of a large number of adult males of a given race, then we will end
up with a distribution of this type.
ii) Bell-shaped Moderately Asymmetrical Distribution
A distribution of this type also has a single maximum, but the frequency or frequency
density decreases on one side at a higher rate than on the other (Fig. 14).
The income distribution of Table 9 falls in this category, as you can see from Fig. 8.
The distribution of land-holding per family, the distribution of age at death of people
of age 60 years or less, the distribution of life of lamp bulbs, etc., will also be similar
to Fig. 15 (a).
iv) U-shaped Distribution
Such distributions are extremely rare. A distribution of this type has its minimum
frequency (or frequency density) towards the middle of the range of variation while
the frequency (or frequency density) gradually increases, at the same rate or at
different rates, as the variable value changes either to the left or to the right (see
Fig. 16).
In general, a multimodal distribution signifies heterogeneity of the data, that is, the fact that
the data have been obtained from groups with widely different characteristics.
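The broad classes of distributions can be told apart, very roughly, by looking at where the frequencies reach their maxima. The following sketch is only a crude rule of thumb of our own, not a standard procedure: the function name `classify_shape` is hypothetical, the frequency lists are made up, and the rule assumes that no two neighbouring frequencies are equal.

```python
def classify_shape(freqs):
    """Roughly classify a distribution from its list of frequencies.

    Assumes neighbouring frequencies are never equal (no flat stretches).
    """
    if len(freqs) < 3:
        raise ValueError("need at least three frequencies")
    padded = [-1] + list(freqs) + [-1]     # -1 lies below any frequency
    # a peak is a frequency larger than both of its neighbours
    peaks = [i for i in range(len(freqs))
             if padded[i + 1] > padded[i] and padded[i + 1] > padded[i + 2]]
    last = len(freqs) - 1
    if len(peaks) == 1:
        # a single maximum: at one end (J-shaped) or inside the range (bell-shaped)
        return "J-shaped" if peaks[0] in (0, last) else "bell-shaped"
    if peaks == [0, last]:
        return "U-shaped"                  # maxima at both ends, minimum inside
    return "multimodal"


# Hypothetical frequency lists, one for each broad class.
print(classify_shape([1, 4, 9, 4, 1]))     # bell-shaped
print(classify_shape([10, 6, 3, 1]))       # J-shaped
print(classify_shape([8, 3, 1, 3, 8]))     # U-shaped
print(classify_shape([1, 5, 2, 6, 1]))     # multimodal
```

The rule does not distinguish symmetrical from asymmetrical bell-shaped distributions; for that you would have to compare the rates at which the frequencies fall off on the two sides of the maximum.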
So, we have seen that we can obtain a lot of information about the data from its
pictorial representation.
With this we bring this unit to a close. Let us go back and recall the points covered
in it.
1.5 SUMMARY
In this unit, we have discussed the following points :
1) The term statistics may mean either numerical data arising in some sphere of life
or the scientific discipline that concerns itself with the collection, analysis and
interpretation of such numerical data.
2) Methods of data collection :
Direct observation method
Questionnaire method
Interview method
3) Classification of characters into qualitative and quantitative, and that of
quantitative characters into discrete and continuous ones.
4) Representation of frequency distribution of a character by means of a table.
5) Relative frequencies and cumulative frequencies.
6) Representation of the frequency distribution of a character by means of a
diagram : bar diagram, pie diagram, column diagram, frequency polygon,
histogram, ogive curve.
7) Classification of univariate distributions into certain broad categories :
Bell-shaped symmetrical and asymmetrical distributions,
J-shaped distributions,
U-shaped distributions,
Multimodal distributions.
1.6 SOLUTIONS
E1) a) and d) are secondary data,
b) and c) are primary.
E2) a) interview method
b) questionnaire method (or interview method)
c) measurement (of yield for some sample plots)
d) measurement.
E3) c) and d) are qualitative,
a), b) and e) are quantitative.
E4) b), c) and f) are discrete,
a), d) and e) are continuous.
E5) The frequency distribution is :
Assessment    Frequency    Relative Frequency
V                 7             0.1272
G                16             0.2909
S                18             0.3272
B                13             0.2363
P                 1             0.0181
Total            55
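You can check the relative frequencies in this table with a short computation such as the one sketched below. The variable names are our own. The figures in the table appear to be cut off after four decimal places rather than rounded, so the sketch shows both versions.

```python
import math

# Frequencies of the five assessment categories, as given in the solution to E5.
freqs = {"V": 7, "G": 16, "S": 18, "B": 13, "P": 1}

total = sum(freqs.values())                     # total frequency
relative = {k: f / total for k, f in freqs.items()}

# Cut off after four decimal places (as the table seems to do) ...
truncated = {k: math.floor(r * 10**4) / 10**4 for k, r in relative.items()}
# ... and rounded to four decimal places, for comparison.
rounded = {k: round(r, 4) for k, r in relative.items()}

for k in freqs:
    print(k, freqs[k], truncated[k], rounded[k])
```

The relative frequencies must add up to 1; the cut-off values fall slightly short of 1 only because of the digits dropped.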
E6)
                                                  Cumulative frequency
Word Length    Frequency    Relative Freq.    less than    more than
2                 13           0.1428            13           91
3                 19           0.2087            32           78
4                 21           0.2307            53           59
5                 15           0.1648            68           38
6                  9           0.0989            77           23
E7) Noting that the lowest and the highest of the observations are 23 and 115, you
may take your classes as 21-30, 31-40, ..., 111-120.
(a)
                                                           Cumulative Frequency
Yield (gm)    Class mark    Frequency    Relative Freq.    less than    more than
c) i) 47/120   ii) 79/120   iii) 3/120
E11)
E12)