Unit-1 IGNOU Statistics

This document provides an introduction to frequency distributions and related statistical concepts. It discusses raw materials of statistics such as characters, individuals, primary and secondary data collection methods. It defines key terms like qualitative vs quantitative characters, discrete vs continuous variables. It explains how to construct frequency tables and represents distributions diagrammatically using frequencies, cumulative frequencies and frequency curves. Examples are provided to illustrate concepts like ungrouped and grouped frequency distributions. The objectives are to understand these foundational statistical concepts and be able to draw frequency tables and choose appropriate representation methods.

Uploaded by

Carbideman
Copyright
© © All Rights Reserved

UNIT 1 FREQUENCY DISTRIBUTION OF A CHARACTER
Structure
1.1 Introduction
Objectives
1.2 Raw Materials of Statistics
1.3 Frequency Distributions
Ungrouped Frequency Distributions
Grouped Frequency Distributions
1.4 Diagrammatic Representation of Frequency Distributions
Frequencies
Cumulative Frequencies
Frequency Curve
Broad Classes of Distributions
1.5 Summary
1.6 Solutions and Answers

1.1 INTRODUCTION
In this unit, we shall talk about the basics of statistics. We shall define the terms
which we shall be using again and again throughout this course. It is possible that
you have read all this before. But that might have been some years ago. So a quick
look through this unit will help you to recall the relevant facts. In case you have never
been introduced to statistics before, this unit will gradually acquaint you with its basic
concepts. You will find that most of the terms we use in statistics are part of our daily
vocabulary. But we have to know their precise meaning before we use them in
statistics.
Further, you will see how to collect the data relating to a given investigation. You
will also be introduced to the concept of frequency distributions. Through simple
examples, we shall acquaint you with the various modes of presenting a frequency
distribution, tabular as well as diagrammatic.

Objectives
On reading this unit, you should be able to :
distinguish between a qualitative and a quantitative character,
differentiate between a discrete and a continuous variable,
draw up a frequency table and get the relative frequencies, cumulative frequencies
and frequency densities,
decide upon a suitable mode of representing a frequency distribution diagrammatically.

1.2 RAW MATERIALS OF STATISTICS

We have told you that in this unit we are going to define some basic terms which
occur frequently in statistics. How about starting with the word "statistics"? We use
the term "statistics" in two different contexts. Numerical data arising in some sphere
of life, as well as the discipline that concerns itself with the collection, analysis and
interpretation of such data, are both called statistics. (Note that 'data' is the plural of 'datum'.)
For example, we talk about
the admission statistics of IGNOU,
the statistics of steel production in India, or
the statistics of the Indian team's performance in international cricket tests.
In all these cases, we are talking about numerical data. On the other hand, when we talk about
a student of statistics or
a book of statistics,
we have the discipline in mind.
Now let us turn our attention to the two concepts of "character" and "individual"
which are basic to any statistical study. To understand these two terms, we consider
the following cases :
1) A teacher looks at the grades (say A,B,C,D and E) awarded to his students on
their performance in an examination. Here, the students are the individuals and
the character is the grade (per student).
2) An economist collects data on the household size and the expenditure on food in a
given month for urban households. In this case, the individuals are the households.
What about the character? Here we see that there are two characters under
study, namely, household size and expenditure on food (per household) in the
given month.
Thus, in any instance, the data relate to one or more characters and a group of
individuals who possess the character or characters in varying forms or amounts.

Further, we can classify the data as primary or secondary. If we collect our own data
on the relevant group of individuals and use it in a study, then the data will be called
primary. In some cases, however, we may choose to make use of the data already
available in government publications or the data collected by some other agency.
Such data are said to be secondary. We can save a lot of time and money if we use
secondary data. But, at the same time, we have to be very careful. We have to make
sure that
the data are relevant to our enquiry,
the concepts and definitions used conform to what we have in mind, and
the data are reliable.

On the other hand, if we decide to use primary data, we shall have to decide on how
to go about collecting them. Primary data can be obtained in a number of ways,
depending on the information sought and also on our knowledge of the relevant
group of individuals. We give below some of the commonly used methods.

1) Direct Observation
Suppose we want to know the number of leaves per twig of a tree, or the weight (in
grams) per egg in a basket of eggs or the health status (good/indifferent/poor) per
student in a class. In each of these cases, we can obtain the required information by
direct observation, through counting or measurement or, simply, by inspection.
But in social and behavioural sciences, we collect information from persons who are
supposed to know. These persons are called informants. We can either get the
information directly from the informants or through intermediaries (called
enumerators) appointed for the purpose. In such cases, we can use the following
methods.

2) Questionnaire Method
If the informants happen to be sufficiently enlightened, then we can give them blank
questionnaire forms and request them to provide the necessary information by filling
out the forms. This method would be appropriate in gathering information about,
say, the attitude of doctors towards euthanasia (mercy-killing).

3) Interview Method
In case the informants are illiterate or not enlightened enough, the enumerators fill
out the schedule by a thorough and tactful questioning of each informant. As you are
aware, this method is used in the population census held once in ten years in our
country.
Now solve these exercises and check whether you have grasped these ideas or not.

E 1) Indicate which of the following are primary data and which are secondary :
a) Data taken from the Government of India publication, Statistical Abstract
India of 1986.
b) Data collected by a market research bureau through door-to-door enquiry
to study the demand for a newly marketed shaving lotion.
c) Data collected by a medical research group through questioning of patients
visiting a hospital's out-door facilities.
d) Weather data recorded by the Department of Meteorology and then used
by the investigator for writing a Ph.D. thesis.
E 2) What mode of data collection would you recommend for
a) studying the progress of a public health programme covering a city's slums?
b) finding out the reactions of a number of economists to this year's budget
proposals?
c) estimating the yield rate (per acre) of a particular variety of wheat?
d) estimating the time taken to complete a particular calculation?

We have observed before that data relate to one or more "characters". Let us look
at this term more closely.
Characters fall into two broad categories.
There are certain characters which take varying forms for different individuals but
cannot be expressed numerically. The brand name of motor cars plying in an Indian
city is such a character; it may be Ambassador Contessa, Premier Padmini Deluxe,
Standard Herald Gazelle, Maruti 1000 or other. The employees in a city hospital may
be observed for their smoking habits; any given employee will then be recorded as a
smoker or a non-smoker. Such a character, whose possible forms can be distinguished
verbally, but not numerically, is called a qualitative character (or attribute).
On the other hand, we can express characters like the size of families, age of teachers,
height of students, weight of eggs, etc., in numerical or quantitative terms. The size
of a family (i.e., the number of members in the family) will be a positive integer-
1,2,3, etc. The age of a teacher may be given in years or in years and months. The
height of a student may be given in centimetres and may be rounded off to the nearest
centimetre. The weight of an egg may be recorded in grams and again may be
rounded off to the nearest tenth of a gram. Such characters are called quantitative
characters (or variables).
A qualitative character, too, ultimately yields numerical data. This is because we will
finally note how many of the individuals under study have any given form of the
character. In the case of motor cars in a city, we thus note how many of the cars are
Ambassador Contessas, how many are Maruti 1000, and so on. However, the data
on a quantitative character are numerical right from the beginning and so we can give
them a more in-depth statistical treatment than those on a qualitative character. A
qualitative character whose forms have an implied ranking (or gradation), however,
stands on a somewhat different footing. We can assign scores to these forms and thus,
express the raw data in quantitative terms. Data of this type are called ordinal data.
For example, an employee's performance in a year may be very good, good,
satisfactory, bad or poor. But we can assign the scores 5,4,3,2 and 1, to these five
categories, and immediately the data on the performance of the employees in an
office assume a numerical look. Surely, there is a lot of arbitrariness in assigning
scores this way. Nevertheless, this method of scoring is quite popular with research
workers in social and behavioural sciences.
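
The scoring of ordinal data described above can be sketched in a few lines of Python. Only the five categories and the scores 5 to 1 come from the text; the name `SCORES` and the list of six ratings are hypothetical, made up purely for illustration.

```python
# Scores for the five ordered performance categories named in the text.
SCORES = {"very good": 5, "good": 4, "satisfactory": 3, "bad": 2, "poor": 1}

# Hypothetical ratings for six employees (invented for this illustration).
ratings = ["good", "very good", "satisfactory", "good", "poor", "bad"]

# Replace each verbal rating by its score.
scored = [SCORES[r] for r in ratings]
print(scored)
```

The scores preserve the ranking of the categories, but the gaps between them (for instance, that "good" is exactly one unit above "satisfactory") are a matter of convention, which is the arbitrariness the text warns about.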
Note that 'scoring' must be distinguished from 'coding' used to facilitate the
processing of data on an electronic computer. We use codes mainly for identification
purposes, similar to the use of roll numbers in IGNOU. Scores, on the other hand,
are more informative. For example, if you get a B grade in MTE-11, it means that
you have a good grasp of the course.

See if you can distinguish between variables and attributes now.

E 3) Classify the following characters as qualitative or quantitative.
a) word-length (i.e., number of letters per word) of the words of a poem;
b) diameter of balls (in cm) produced by a firm;
c) mother tongue of the residents of a city;
d) attitude towards family planning of the couples living in a locality;
e) proportion of males in each group of 25 students.

We have classified characters into two categories: qualitative and quantitative. Now
quantitative characters or variables, in their turn, may be classified as discrete and
continuous.
A discrete variable is one that can conceivably assume only some discrete, or isolated,
values. The size of families, the proportion or the number of males in each group of
25 students, or the length of a word are variables of this type. The size of a family
or the length of a word may take values like 1,2,3, etc., but no values in between.
The number of males in a group of 25 students may be 0,1,2,...,24 or 25, while the
proportion of males may be 0, 0.04, 0.08, ..., 0.96 or 1; values in between these
numbers are inconceivable.
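
As a small check on the example of 25 students, the sketch below lists every possible proportion of males and confirms that neighbouring values are always 1/25 = 0.04 apart. The variable names are illustrative choices, not notation from the unit.

```python
from fractions import Fraction

GROUP_SIZE = 25  # students per group, as in the text

# Every possible number of males in a group, and the matching proportion.
counts = list(range(GROUP_SIZE + 1))
proportions = [Fraction(k, GROUP_SIZE) for k in counts]

# Collect the gaps between consecutive proportions; a discrete variable
# of this kind has a single fixed gap.
gaps = {b - a for a, b in zip(proportions, proportions[1:])}
print(len(proportions), float(proportions[1]), float(proportions[-2]))
print(gaps)
```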
A continuous variable, on the other hand, can possibly take any value in some
interval. For example, the age (in years) of teachers, the height (in cm.) of students,
the weight (in grams) of eggs are all continuous variables. Supposing the minimum
age at which a person can join the teaching profession is α years and that every
member of the teaching community has to retire on reaching the age β years, then
the age of teachers must vary between α and β and can take any value within the
interval [α, β]. Indeed, the actual age of a teacher may well be 32.119237 years!
However, there will be hardly any need to record the age with this much precision!
The enquirer may be satisfied by taking the age correct to the second decimal place
so that the teacher's age may be recorded as 32.12 years. This is an example of how
limitations of the measuring instruments can introduce a discreteness into the
observations of a continuous variable. Similarly, the actual monthly income of an
Indian, which is a continuous variable, has to be expressed in rupees or in rupees and
paise, since the paisa happens to be the smallest denomination coin in the Indian
system of currency. This is also the case with the score in an examination of students
taking the examination. The score is invariably expressed in integers and yet it has
to be regarded as a continuous variable. This is because the score is supposed to
measure the proficiency of the students in the subject concerned, and the proficiency
may be taken to vary in a continuous manner (say, between 0 and 100).
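
The way recording precision introduces discreteness can be seen with the age quoted above. In this sketch the second age, 32.1234, is a hypothetical value added only to show that different actual ages collapse to the same recorded figure.

```python
actual_age = 32.119237               # the actual age quoted in the text, in years
recorded_age = round(actual_age, 2)  # recorded correct to the second decimal place

# A different (hypothetical) actual age that is recorded as the same value.
also_recorded = round(32.1234, 2)

print(recorded_age, also_recorded)
```

The recorded ages can only move in steps of 0.01 year, even though the underlying variable is continuous.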
Try this exercise now.

E 4) Indicate which of the following variables are discrete and which are continuous :
a) diameter of ball-bearings produced by a steel mill;
b) number of beds per hospital in a city;
c) proportion of heads in sets of 10 tosses of a coin;
d) length (in mm) of needles produced by a factory;
e) weight of loaves (in kg) produced by a bakery;
f) size of households in a village.

The distinction between a discrete and a continuous variable is important. Quite
often, the statistical analysis of the data will differ accordingly. In fact, there are some
techniques of statistical inference which are based on the assumption that the
variable under study is continuous. These are clearly inapplicable to data on a discrete
variable.
In the next section, we shall discuss the concept of frequency distributions of
qualitative characters and variables.

1.3 FREQUENCY DISTRIBUTIONS

In this section, we shall discuss the method of organising raw data into frequency
distributions. You will see that we can get information out of a frequency distribution
more easily than out of raw data. Here, we shall first discuss ungrouped frequency
distributions and then discuss grouped ones.

1.3.1 Ungrouped Frequency Distributions


We use ungrouped frequency distributions when the data are of a qualitative nature,
or when the variable under consideration is discrete. Here, we will take one example
of each situation for illustration.

Frequency Distribution of a Qualitative Character


A botanist obtained a variety of linseed by cross-breeding of two pure varieties. She
observed the colour of flowers of plants grown through inbreeding of the new mixed
type (called plants of the F2 generation). On the basis of these observations, she
prepared the following table.
Table 1 : Classification of flowers in an F2 population of linseed by colour
Colour    Number of flowers (frequency)    Relative frequency

Blue
Lilac
White
Pink
Total 314 0.999

(Ref: Statistical Methods for Agricultural Workers by Panse and Sukhatme).

The figures in the second column of Table 1 are called the frequencies of the four
classes (or of the four colours). So 'frequency' indicates how frequently the
corresponding form of the character under study (viz., colour) occurs in the collected
data. The sum of the frequencies, 314 in this case, is said to be the total frequency.
The first two columns in Table 1 constitute a frequency table. Since these indicate
the manner in which the total frequency 314 (or the total number of individuals) is
distributed among the four classes, they are also said to represent the frequency
distribution of colour for the 314 flowers. Perhaps a better expression is 'the
frequency distribution of the 314 flowers by colour'.
Alternatively, we can also write the frequency distribution in terms of the proportions
of blue, lilac, white and pink flowers in the group. These proportions give the relative
frequencies, and are shown in the third column of Table 1. By definition,
$$\text{relative frequency of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \qquad \ldots (1)$$
Then what is the total relative frequency? One, of course. But you can see that in
Table 1, the relative frequencies do not add up exactly to 1. This is because the
individual figures are all approximate, rounded off to a certain number of decimal
places.
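
The effect of rounding on the total can be reproduced with a deliberately simple, hypothetical set of frequencies (three classes with one individual each). These are not the linseed counts; they are chosen only because each relative frequency is 1/3, which cannot be written exactly with three decimals.

```python
from fractions import Fraction

# Hypothetical class frequencies, invented for this illustration.
counts = [1, 1, 1]
total = sum(counts)

exact = [Fraction(c, total) for c in counts]     # exact relative frequencies
rounded = [round(c / total, 3) for c in counts]  # rounded to three decimals

print(sum(exact))              # exact total of the relative frequencies
print(round(sum(rounded), 3))  # total of the rounded figures
```

The exact relative frequencies always add up to 1; it is only the rounded figures that can fall slightly short of (or exceed) 1.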
Note that while the distribution of frequencies answers questions of the type 'How
many flowers in the given group are blue?', the relative frequency has to do with
questions like 'What is the proportion (or percentage) of blue flowers in the group?'
Further, in any situation, a frequency must be a non-negative integer. The value 0 is
admissible, for in the above situation it is conceivable that we might have a fifth
flower colour, say yellow, which was absent in the sample. A relative frequency, on
the other hand, must be a rational number in the interval [0,1].
The simplest type of classification of a group of individuals by a qualitative character
is a dichotomy, i.e., a classification with just two classes. A group of students may
thus be classified by sex as boys and girls or by performance at an examination as
successful and unsuccessful.
Let us now take an example of the data on a discrete variable.

Ungrouped Frequency Distribution of a Discrete Variable


Consider the data collected by a social scientist on household size for households in
an urban locality, given in Table 2.
Table 2 : Data on household size for 80 households in an urban locality

4 4 2 4 5 2 3 3
4 3 5 5 6 6 7 5
5 3 7 2 7 6 2 6
8 1 6 5 6 6 8 7
7 9 5 4 5 5 6 3

As in the case of a qualitative character, here too, we would like to summarise the
data by forming a frequency table. For this it would be necessary to count the number
of times 1 appears, the number of times 2 appears and so on. We can count more
easily if we follow a tallying system. This system can be used by people without any
formal training in arithmetic (like our cave-dwelling forebears!)
Thus, we take nine classes defined by the nine distinct values 1,2,...,9, noting that 9
was the largest household size recorded in the data. The second column in Table 3
shows the tallies against each of these values. After counting the tallies, we write the
frequencies in the third column. In the fourth column we have written the relative
frequencies.
Fig. 1 : Early notch-cutting by primitive man
Table 3 : Frequency table for size of 80 households
Household size    Tallies    Frequency    Relative frequency

Total
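
The tallying procedure can be sketched with Python's `collections.Counter`. The sketch uses only the 40 household sizes printed in Table 2 above, so its counts refer to those 40 values and are not the 80-household figures of Table 3; the column layout of the printout is an illustrative choice.

```python
from collections import Counter

# The 40 household sizes printed in Table 2 above.
sizes = [
    4, 4, 2, 4, 5, 2, 3, 3,
    4, 3, 5, 5, 6, 6, 7, 5,
    5, 3, 7, 2, 7, 6, 2, 6,
    8, 1, 6, 5, 6, 6, 8, 7,
    7, 9, 5, 4, 5, 5, 6, 3,
]

freq = Counter(sizes)        # plays the role of the tally marks
total = sum(freq.values())   # total frequency

print("size  tallies     frequency  relative frequency")
for size in sorted(freq):
    f = freq[size]
    # one stroke per observation, then the count and the proportion
    print(f"{size:<5} {'|' * f:<11} {f:<10} {f / total:.3f}")
print(f"Total {'':<11} {total:<10} {1:.3f}")
```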

There are two more ways in which we can represent the frequency distribution of a
discrete variable. Both make use of what are called the cumulative frequencies of the
variable. For a discrete variable like household size, the frequencies answer questions
of the type : 'How many individuals in the given group have the value k of the
variable?', and the relative frequencies answer questions like : 'What proportion of
the individuals has the value k of the variable?' But how do we answer a question
like "How many individuals have the value k or less?" or "How many individuals
have the value k or more?"

From Table 3, you can see that the number of households of size k or less is 3 for
k=1, 3+7=10 for k=2, 10+11=21 for k=3, and so on. We obtained these figures by
taking cumulative totals of the frequencies in Table 3, starting from the lowest
observed value of the variable and going successively to the higher values. These are
called cumulative frequencies of the less than type. Similarly, to get the number of
households of size k or more, we take cumulative totals starting from the highest
observed value and going successively to the lower values. These are called
cumulative frequencies of the more than type. Both types are shown
for the data on household size by means of Tables 4a and 4b.
(Note: We cannot talk of cumulative frequencies of a qualitative character unless it is
of the ordinal type.)

Table 4a : Cumulative frequency table of "less than" type for size of 80 households
Household size    Cumulative frequencies

Table 4b : Cumulative frequency table of "more than" type for size of 80 households
Household size    Cumulative frequencies

any value greater than 9

While making use of Table 4a, you should remember that the cumulative frequency
of the less than type is 0 for any value of the variable less than 1, is 3 for any value
between 1 and 2 but less than 2, is 10 for any value between 2 and 3 but less than 3,
and so on. Finally, the cumulative frequency of the less than type is 80 (the total
frequency) for 9 or any value exceeding 9.
Similarly, the cumulative frequency of the more than type is 0 for any value of the
variable exceeding 9, is 2 for any value between 8 and 9 but exceeding 8, is 6 for any
value between 7 and 8 but exceeding 7, and so on. Finally, the cumulative frequency
of the more than type is 80 for the value 1 or any value less than 1.
Thus, we can see that the cumulative frequencies are constant in some intervals, but
when they change, they change in jumps.
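
The step-like behaviour can be checked with a short sketch. It uses the frequencies of the 40 household sizes printed in Table 2 above (not the 80-household frequencies of Table 3), and the function names `cum_less_than` and `cum_more_than` are hypothetical helpers introduced for this illustration.

```python
# Frequencies of the 40 household sizes printed in Table 2 above.
freq = {1: 1, 2: 4, 3: 5, 4: 5, 5: 9, 6: 8, 7: 5, 8: 2, 9: 1}

def cum_less_than(x):
    """Number of households of size x or less."""
    return sum(f for size, f in freq.items() if size <= x)

def cum_more_than(x):
    """Number of households of size x or more."""
    return sum(f for size, f in freq.items() if size >= x)

# Both functions stay constant between consecutive sizes
# and jump at each observed size.
for x in (0.5, 1, 1.5, 2, 8.5, 9, 9.5):
    print(x, cum_less_than(x), cum_more_than(x))
```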
It goes without saying that by taking cumulative totals of the relative frequencies (or
by dividing the cumulative frequencies by the total frequency), we can form two other
tables : a table of cumulative proportions of the less than type and a table of
cumulative proportions of the more than type. The former would provide answers to
questions like, 'What is the proportion of individuals having the value of the variable
less than or equal to k?' The latter would answer questions like, 'What is the
proportion of individuals having the value of the variable greater than or equal to k?'
If you have understood the discussion so far, you will surely be able to do these
exercises.
G B V P B G B S G S S
G S G V B B G S V B G
S G B B G G V G G S B
V S S G S V S B S G S
B V S S B S S S B G G
[V=very good, G= good, S=satisfactory , B=bad, P=poor]

Draw up a frequency table and a relative frequency table for these data. Hence,
answer the following questions:
a) How many of the inmates think the services are good?
b) How many think that the services are at least satisfactory?
c) What is the percentage of inmates who consider the services to be less than
satisfactory?

E 6) The following data indicate the length per word for the 91 words in Tagore's
poem 'Where the mind is without fear and the head is held high, etc.':
5 4 3 5 8 6 6 3 4 5
3 4 4 5 8 2 6 7 4
4 5 6 4 9 6 4 2 6
2 9 2 3 3 3 2 4 2
7 2 4 4 4 3 4 4 7
4 4 9 3 7 4 5 4 2
3 5 2 5 10 3 5 8 6
3 3 6 2 5 3 3 7 3
4 5 8 5 3 4 4 3 2
2 3 5 5 5 3 2 6 7

Draw up a frequency table. In the same table, show the relative frequencies and
the cumulative frequencies of both types. Hence, answer the following
questions :
a) How many of the words have at least 6 letters?
b) How many have 5 letters or more?
c) What is the proportion of words with 2 letters?
d) What is the proportion of words of length 4 or more?

Until now, we have seen how to construct ungrouped frequency tables for qualitative
characters and discrete variables. For such a table, we count the frequencies of each
distinct attribute or value taken by the variable, and so there is no loss of information.
But this may not always be feasible. For example, suppose we have raw data on the
number of grains per earhead for 400 ears of a variety of wheat. It is quite possible
that there are some earheads with as few as 8 grains and some with as many as 57
grains. In this case, if we construct a frequency table taking each distinct value
between 8 and 57, then the table would be too long. Then again, ungrouped
frequency tables cannot be constructed for data on continuous variables, because a
continuous variable can take infinitely many distinct values. In such cases, then, it
becomes necessary to group some variable values together and then construct
frequency tables. We shall discuss such grouped frequency distributions in the next
sub-section.

1.3.2 Grouped Frequency Distributions


To illustrate the method of construction of a grouped frequency table, we consider
the data collected by a botanist in Shillong, shown in Table 5. Note that we are
dealing with a continuous variable here.
Table 5 : Petiole length (in cm.) of 198 leaves of a four-year-old pipal tree (see Fig. 2)
4.5 5.4 5.3 6.3 5.7 5.5 4.1 2.9 2.7 6.0 5.9 1.8 3.7 4.1 5.6
2.6 3.0 6.0 7.8 4.5 5.7 4.5 8.0 5.5 7.5 3.1 3.1 5.2 6.8 9.2
5.5 4.5 5.5 7.0 4.5 4.0 5.9 3.8 6.0 5.2 5.6 7.0 6.3 5.1 6.0
6.3 4.5 5.0 5.3 5.6 6.3 3.4 5.1 6.7 6.2 7.2 6.2 5.0 6.1 6.3
4.7 4.1 6.1 5.6 5.5 4.4 6.0 5.0 3.4 5.0 2.5 5.7 5.2 6.1 6.5
5.5 5.5 4.5 5.5 7.7 7.0 7.3 6.5 6.7 6.1 6.7 4.7 8.5 4.7 6.7
6.5 4.2 6.9 3.9 7.2 4.2 6.1 1.6 7.2 6.5 3.6 5.9 5.3 6.6 5.0
6.2 1.9 2.2 5.2 6.6 4.9 5.9 5.4 6.5 6.6 6.8 4.1 4.7 5.7 4.1
5.7 5.0 5.7 5.2 2.8 4.3 4.6 4.9 6.0 5.9 4.5 3.7 5.7 3.8 5.6
5.2 3.9 6.5 5.0 5.2 6.0 2.3 5.2 3.2 5.5 7.1 7.0 3.2 7.2 5.9
5.3 1.6 6.9 6.1 6.3 6.7 2.4 6.3 4.8 4.6 6.7 1.5 6.8 5.9 5.3
7.0 4.3 6.7 5.4 4.7 5.1 5.2 7.4 4.5 6.4 5.0 2.0 5.7 4.6 4.9
5.2 6.0 4.5 6.1 3.5 5.9 5.0 6.8 5.0 1.0 5.5 4.9 5.9 5.2 6.1
0.8 5.3 5.9

Here, the values are recorded correct to one decimal place (i.e., correct to the nearest
tenth of a centimetre). The lowest observation in the set is 0.8 and the highest 9.2.
If we take our classes as 0.6-1.3, 1.4-2.1, 2.2-2.9, ..., 8.6-9.3, then the total number
of classes will be 11. To get the frequencies for these classes, we again go in for the
tallying system which we had adopted in Table 3. This is done in Table 6.

Fig. 2 : Petiole length
Table 6 : Frequency table for the data of Table 5 on petiole length of leaves of a pipal tree
Petiole length (cm)    Tallies    Frequency

Total    198

But here the classes need to be redefined. The reason is that the value recorded as,
say, 4.6 actually stands for some value between 4.55 and 4.65. Similarly the value 5.3
stands for some value between 5.25 and 5.35. Thus, the class taken as 4.6-5.3 in
Table 6, in fact, begins at 4.55 and ends at 5.35. The other classes have to be viewed
in the same way. We then have to properly define the classes in terms of
class-intervals, with no gap between any two successive intervals. The two end-points
of a class-interval are called class boundaries (the lower and the upper) while the
mid-point is called the class mark. The width (or length) of a class interval is, of
course, the difference between the upper class boundary and the lower. The
end-values of a class, when the classes are defined as in Table 6, are called the class
limits to distinguish them from the class boundaries. We may then say that the
frequency table in the form of Table 7 presents the frequency distribution of petiole
length more appropriately than does Table 6. The width of each class here is 0.8 cm.
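
The passage from class limits to class boundaries can be sketched as follows. The sketch assumes, as stated above, that lengths are recorded to the nearest 0.1 cm, so each limit is pushed out by half that unit; the function name `boundaries` and the use of `Decimal` (to avoid binary rounding noise) are choices made for this illustration.

```python
from decimal import Decimal

UNIT = Decimal("0.1")   # observations are recorded to the nearest 0.1 cm

def boundaries(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary, class mark, width) for one class."""
    lo = Decimal(lower_limit) - UNIT / 2
    hi = Decimal(upper_limit) + UNIT / 2
    return lo, hi, (lo + hi) / 2, hi - lo

# Class limits 0.6-1.3, 1.4-2.1, ..., 8.6-9.3, built in steps of 0.8.
STEP = Decimal("0.8")
limits = [(Decimal("0.6") + k * STEP, Decimal("1.3") + k * STEP) for k in range(11)]

for lower, upper in limits:
    lo, hi, mark, width = boundaries(lower, upper)
    print(f"{lower}-{upper}  ->  {lo}-{hi}  mark {mark}  width {width}")
```

Each upper boundary coincides with the next lower boundary, so the class intervals leave no gaps.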
Table 7 : Frequency/relative frequency table for petiole length of 198 leaves of a pipal tree
Petiole length (cm)    Frequency    Relative frequency
Class interval

0.55-1.35                   2          0.0101
1.35-2.15                   6          0.0303
2.15-2.95                   8          0.0404
2.95-3.75                  10          0.0505
3.75-4.55                  24          0.1212
4.55-5.35                  43          0.2172
5.35-6.15                  52          0.2626
6.15-6.95                  33          0.1667
6.95-7.75                  15          0.0758
7.75-8.55                   4          0.0202
8.55-9.35                   1          0.0050
Total                     198          1.0000

The third column in Table 7 shows the same frequency distribution in terms of relative
frequencies. While the frequencies tell us, for any interval, how many of the leaves
have petiole length between the two class boundaries, the relative frequencies indicate
what the proportion (or percentage) of such leaves is.
Here again, the cumulative frequencies of less than and more than types provide us
with two additional modes of representing the same frequency distribution. You can
see such representation in Table 8.
Table 8 : Cumulative frequencies for petiole length of 198 leaves of a pipal tree
Petiole length (cm)    Cumulative frequency    Cumulative frequency
Class interval         (less than type)        (more than type)

We have to be careful in interpreting the cumulative frequencies of either type for
a continuous variable. In Table 8, 2 is the number of leaves having petiole length
1.35 cm or less, 8 is the number of leaves having petiole length 2.15 cm or less, and
so on. Hence, the cumulative frequencies of less than type now correspond actually
to the respective upper class boundaries. On the other hand, if we look at the column
of cumulative frequencies of more than type in Table 8, then obviously the number
of leaves having petiole length 8.55 cm or more is 1, the number of leaves with petiole
length 7.75 cm or more is 5, and so on. Thus, the cumulative frequencies of more than
type now correspond to the respective lower class boundaries.
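
The cumulative frequencies of both types follow directly from the class frequencies of Table 7. The sketch below computes them with `itertools.accumulate`; the printed layout, which attaches each less than figure to an upper boundary and each more than figure to a lower boundary, is an illustrative choice.

```python
from itertools import accumulate

# Class boundaries and frequencies as given in Table 7.
boundaries = [0.55, 1.35, 2.15, 2.95, 3.75, 4.55, 5.35, 6.15, 6.95, 7.75, 8.55, 9.35]
freqs = [2, 6, 8, 10, 24, 43, 52, 33, 15, 4, 1]

# "Less than" type: running totals from the lowest class, tied to upper boundaries.
less_than = list(accumulate(freqs))
# "More than" type: running totals from the highest class, tied to lower boundaries.
more_than = list(accumulate(reversed(freqs)))[::-1]

for i in range(len(freqs)):
    print(f"{boundaries[i]:.2f}-{boundaries[i + 1]:.2f}  "
          f"{less_than[i]:>3} (<= {boundaries[i + 1]:.2f})  "
          f"{more_than[i]:>3} (>= {boundaries[i]:.2f})")
```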
But then there is yet another way of describing the frequency distribution of a
continuous variable, viz., through the use of what are called the frequency densities
of the different classes. By the frequency density of a class we mean the frequency
per unit of width in the class. It is somewhat similar to the population density of a
locality and is defined by the formula:
$$\text{frequency density of a class} = \frac{\text{class frequency}}{\text{class width}}$$
The series of class intervals taken together with the series of frequency densities
should give a good idea of the frequency distribution of the variable being studied.
You may wonder why we need to bring in frequency densities at all. Aren't the
frequencies supposed to give us an idea of the frequency distribution? But there are
situations where we have to take classes of varying widths. In these cases, frequency
densities become more meaningful. For example, consider the frequency distribution
of a variable like monthly family income or family wealth at a given date. For any
group of families, a large majority of the families will have incomes in the lower
income brackets while the number of families will be smaller towards the higher
income brackets. As in other cases, here too, we may choose to have classes of the
same width. However, if the common width is small, say Rs. 200, then too many
classes will have to be taken. Many of these classes might be empty. This will bring
in an irregular pattern and gross distortion in the true nature of the distribution. On
the other hand, if the common width is large, say Rs. 1,000, then the number of
classes will be too few and the true nature of the distribution, which usually shows
rapid changes in the lower parts of the range, will get blurred. This will also lead to
serious errors in the statistical measures computed on the basis of the grouped data.
Therefore, it would be advisable to have classes of varying width: narrower classes
in the lower parts of the income range and classes of increasing width towards the
higher parts of the range. Now, when the classes are of varying width, the class
frequencies will not be comparable. In such situations, the frequency densities that
are obtained from the frequencies by reducing them to a common base (see Table 9)
should be used.
Table 9 : Frequency distribution of monthly income for 1,276 urban families

Income (Rs)    Frequency    Frequency density

      0           218            1.0900
    200           153            0.7650
    400           190            0.6333
    700           152            0.5067
   1000           159            0.3975

   2800            49            0.0817
   3400            23            0.0375
   4000            15            0.0188
   4800             8            0.0080

Total           1,276              -
Thus, frequency densities give us a true picture of the frequency distribution when
the classes are of varying width. This will be all the more obvious when we consider
the problem of diagrammatic representation of the frequency distribution of a
continuous variable in the next section.
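The arithmetic behind the density column can be checked with a minimal sketch. It assumes that the income figures in Table 9 are lower class boundaries and that each class ends where the next begins; the upper boundary 1400 of the fifth class is not printed in the table and is inferred here from its density (159 / 0.3975 = 400). The function name is only illustrative.

```python
# (lower boundary, upper boundary, frequency) for the first five classes of Table 9.
# The upper boundaries are assumptions: each class is taken to end where the next
# one begins, and 1400 is inferred from the printed density of the fifth class.
classes = [
    (0, 200, 218),
    (200, 400, 153),
    (400, 700, 190),
    (700, 1000, 152),
    (1000, 1400, 159),
]

def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width."""
    return frequency / (upper - lower)

densities = [round(frequency_density(*c), 4) for c in classes]
for (lower, upper, freq), d in zip(classes, densities):
    print(f"{lower:>5}-{upper:<5} frequency={freq:<4} density={d:.4f}")
```

Note that the second and fourth classes have almost the same frequency (153 and 152) but quite different densities, because the fourth class is one and a half times as wide.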
We have already mentioned at the end of Sec. 1.3.1 that even in the case of a discrete
variable (or, for that matter, of a qualitative character), we may have to define the
classes in terms of more than one distinct value of the variable (or more than one
distinct form of the qualitative character). Table 10 illustrates this point. We would
have to deal with as many as 50 classes if we did not use the type of condensation
that is indicated by the first column of the table.
Table 10 : Frequency distribution of number of grains per earhead for 400 ears of
a variety of wheat (see Fig. 3)

Number of grains per earhead    Frequency

          8-12                       1
         13-17                      17
         18-22                      25
         23-27                      86
         28-32                     125
         33-37                      77

Total                              400
Fig. 3 : Wheat earhead
Source : Statistical Methods for Agricultural Workers by Panse and Sukhatme
Now before we end this section, we list the main considerations guiding the
construction of a frequency table.
For one thing, the classes should be exhaustive, in the sense that each of the
observations should be assignable to one class or another.
Secondly, the classes should be mutually exclusive. This means that no two classes
should overlap so that each of the observations can be assigned to exactly one of the
classes without any ambiguity. These two criteria have clearly been followed in
constructing the tables in Sections 1.3.1 and 1.3.2.
Thirdly, while it seems natural in the case of a qualitative character to take a separate
class for each distinct form of the character and in the case of a discrete variable to
take a class for each distinct value of the variable, the classes should not be too
numerous. For the main objective is to summarise the data into an easily manageable
and comprehensible form. Besides, having too many classes might result in a situation
where many of the classes may have zero frequencies. Whereas the true distribution
may show a gradual increase or decrease of frequency, the observed distribution, in
such a case, will indicate abrupt changes in the frequency. Because of this, in many
cases, we have to define the classes in terms of more than a single value of the
character concerned.
But the classes should not be too few either. If there are too few classes, we are likely
to overlook some important features of the distribution. For instance, an
asymmetrical distribution may appear to be fairly symmetrical. We shall talk about
symmetrical distributions in Section 1.4.4.
Further, in the computation of various measures related to the distribution, we
assume that the observations within each class interval are concentrated at the class
mark, instead of being spread over it. You will come across this in Unit 2. So, if the
classes are too few, or equivalently, if each class is too wide, then this assumption
may lead to considerable error in the computation of these measures.
Last, but not least, we should see to it that in the case of a variable, the classes
are defined in terms of the same number of distinct values of the variable or are of
the same width. Otherwise, the frequencies (or the relative frequencies) for the
different classes will not be comparable. On occasion we have to deviate from this
rule, as you have seen from Table 9. In such cases, we have to work with frequency
densities.

Try this exercise now.

E7) Consider the data shown below :

Yield of seed cotton (in gm) for 120 plots of size 0.0005 acre
93 81 57 42 95 80 52 70 105 72 60 68
49 74 60 57 63 51 100 41 50 66 65 81

a) Draw up a frequency table with 10 classes. Also show, alongside the
frequencies, the relative frequencies and the cumulative frequencies of both
types.
b) Estimate the number of plots with a yield of
i) 65.5 gm to 85.5 gm;
ii) more than 100 gm; and
iii) less than 60 gm.
c) What is the proportion of plots with yield
i) between 70 gm and 100 gm ?
ii) less than 75 gm ?
iii) more than 105 gm ?
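
The tabulation asked for in part a) can be sketched in a few lines. The sketch below uses only the 24 yield values printed above, not the full set of 120, so its output is not the answer to the exercise. The classes 21-30, 31-40, ..., 111-120 and the function name are assumptions made for illustration.

```python
# The 24 yield values (in gm) printed under E7. The exercise refers to 120 plots,
# so this is only the part of the data reproduced in the text.
yields = [93, 81, 57, 42, 95, 80, 52, 70, 105, 72, 60, 68,
          49, 74, 60, 57, 63, 51, 100, 41, 50, 66, 65, 81]

# Assumed classes: 21-30, 31-40, ..., 111-120 (ten classes of width 10).
lower_limits = list(range(21, 121, 10))
width = 10

def build_frequency_table(values, lower_limits, width):
    """Return one row per class; assumes every value falls in some class."""
    n = len(values)
    table = []
    running = 0  # number of observations in all lower classes so far
    for lo in lower_limits:
        hi = lo + width - 1
        f = sum(1 for v in values if lo <= v <= hi)
        more_than = n - running   # observations in this class or a higher one
        running += f              # observations in this class or a lower one
        table.append({
            "class": f"{lo}-{hi}",
            "frequency": f,
            "relative_frequency": f / n,
            "cum_less_than": running,
            "cum_more_than": more_than,
        })
    return table

table = build_frequency_table(yields, lower_limits, width)
for row in table:
    print(row["class"], row["frequency"], round(row["relative_frequency"], 4),
          row["cum_less_than"], row["cum_more_than"])
```

The classes are exhaustive and mutually exclusive for integer yields, so each value is counted exactly once and the frequencies add up to the number of observations.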

So far we have seen that a frequency distribution presents the data in a concise form.
We can get a general idea of a distribution more readily and effectively through an
appropriate diagram. In the next section, we talk about this diagrammatic
representation.

1.4 DIAGRAMMATIC REPRESENTATION OF FREQUENCY DISTRIBUTIONS
We can use various kinds of diagrams to represent frequency distributions. In this
section, we shall first see how to give a visual representation to the information in a
frequency table. Then we shall talk about the representation of cumulative
frequencies. After this, we shall discuss frequency curves, the diagrammatic
representation of the frequency distribution of a variable which takes infinitely many
values. Finally, we shall classify distributions into broad categories on the basis of
their shapes. So let us start with the table of frequencies.

1.4.1 Frequencies
We shall discuss the cases of ungrouped and grouped frequency distributions, one by
one.

a) Case of an Ungrouped Frequency Distribution


An ungrouped frequency distribution of a qualitative character, given by the
frequencies or the relative frequencies, may be represented by means of what is called
a bar diagram. There are as many bars (actually rectangles) as there are classes. These
are drawn perpendicular to the same base line, either vertically or horizontally.
Further, the bars are equispaced and have the same width. Their heights or lengths (as
the case may be) indicate the frequencies (or relative frequencies) of the respective
classes. The frequency distribution in Table 1 is represented in the bar diagram in Fig. 4.

Fig. 4 : Bar diagram showing the frequency distribution of 314 flowers in an F2 population of linseed
by colour.
The relative frequencies can be represented by means of a bar diagram in the same
way as the absolute frequencies. But in the case of data on a qualitative character, a
better mode of representing them would be to use what is called a pie diagram or
pie chart. This diagram makes use of a circle, whose total area is divided into as many
sectors as there are classes by drawing angles at the centre. The area of each sector
represents (is proportional to) the corresponding relative frequency. To illustrate the
use of a pie diagram, let us consider the relative frequency table of colour of flowers
in the F2 generation of linseed (Table 1). We first determine the angles (in degrees)
to be drawn at the centre of the circle (see Table 11).

Table 11 : Angles to be drawn at the centre of pie diagram for the frequency
distribution of Table 1

Flower colour    Relative frequency    Angle to be taken

Blue                   0.538                193.7°
Lilac                  0.194                 69.8°
White                  0.198                 71.3°
Pink                   0.070                 25.2°

Total                  1.000                360.0°

The figures in the third column of the table indicate the measures (in degrees) of the
angle to be drawn for each class, its sides extending from the centre of the circle to
its circumference. Note that the angle for any given class measures
360° × relative frequency.
Now, the area of a sector of angle $\theta$ radians in a circle of radius $r$ is
$\frac{1}{2} r^2 \theta$. Thus, in a
given circle, the area of a sector is directly proportional to its angle (whether in radians
or in degrees). So if we draw sectors with angles given in the third column of
Table 11, then the area of each sector is proportional to the corresponding angle
which, in turn, is proportional to the corresponding relative frequency.
You can see the pie diagram corresponding to Table 11 in Fig. 5.
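
The angles of Table 11 can be reproduced with a short sketch. The relative frequencies are those given in the table; the function name is only illustrative.

```python
# Relative frequencies of flower colour, as given in Table 11.
relative_frequencies = {"Blue": 0.538, "Lilac": 0.194, "White": 0.198, "Pink": 0.070}

def pie_angles(rel_freqs):
    """Angle at the centre (in degrees) for each class: 360 x relative frequency."""
    return {label: 360.0 * rf for label, rf in rel_freqs.items()}

angles = pie_angles(relative_frequencies)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
print(f"Total: {sum(angles.values()):.1f} degrees")
```

Since the relative frequencies add up to 1, the angles add up to 360°, so the sectors fill the circle exactly.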

Fig. 5 : Pie diagram corresponding to Table 11.

If we are dealing with a discrete variable, we can also form a column diagram to
represent its frequency distribution. In Fig. 6(a) we have the column diagram for the
frequency distribution of household size (Table 3).
Fig. 6 : (a) Column diagram (b) Frequency polygon for the data on household size

When the possible distinct values of the variable are equispaced, as in the case of
household size, word length, etc., an alternative mode of representing the frequency
distribution is available to us. Again we take two mutually perpendicular axes, the
horizontal for the variable and the vertical for the frequency (relative frequency).
Then we plot each distinct value and the corresponding frequency (relative
frequency) as a point on the graph paper, with respect to these axes. See Fig. 6(b),
which represents the frequency distribution in Table 3. Then we join the points for
the successive values of the variable by straight line segments. Next we take two
additional points, one for the next possible value below the lowest in the table and
the other for the next possible value above the highest in the table, the corresponding
frequencies being, of course, zero. For the distribution of household size, for instance,
the two additional points will correspond to household size 0 and household size 10.
Then we join each of these points with the point corresponding to the adjoining value, and
thus obtain a closed polygon. Such a diagram is called a frequency polygon. Note that
we can also get the frequency polygon by joining together the mid-points of the tops of
the columns in the column diagram.
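
The construction of the polygon can be sketched as a list of vertices. The frequencies used below are hypothetical (they are not those of Table 3) and the function name is only illustrative; the values are assumed to be equispaced with a known step.

```python
def frequency_polygon_vertices(frequencies, step=1):
    """Vertices of the frequency polygon for equispaced values.

    `frequencies` maps each distinct value to its frequency. Two extra points
    with zero frequency are added, one step below the lowest value and one
    step above the highest, so that the polygon closes on the horizontal axis.
    """
    values = sorted(frequencies)
    vertices = [(values[0] - step, 0)]
    vertices += [(v, frequencies[v]) for v in values]
    vertices.append((values[-1] + step, 0))
    return vertices

# Hypothetical frequencies for household sizes 1 to 9 (illustrative only).
household = {1: 3, 2: 7, 3: 11, 4: 14, 5: 12, 6: 8, 7: 6, 8: 4, 9: 2}
vertices = frequency_polygon_vertices(household)
print(vertices)
```

Joining consecutive vertices by straight line segments gives the polygon; for sizes 1 to 9 the two added end points fall at 0 and 10, as described above.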
You may try your hand at drawing a frequency polygon now.

E8) Draw a column diagram and a frequency polygon to represent the frequency
distribution of the data in E6.

b) Case of a Grouped Frequency Distribution


You would agree that the diagrammatic representation of a grouped frequency
distribution has to be different from that of the ungrouped one. The reason for this
is that unlike those for the ungrouped case, the frequencies in a grouped distribution
are scattered over the different class intervals.
To represent the frequencies (relative frequencies), we again take two rectangular
axes of coordinates, the horizontal for the variable value and the vertical for the
frequency density (relative frequency density). Having marked the class boundaries
on the horizontal axis, we draw on each class interval as base, a rectangle whose
height equals the corresponding frequency density (relative frequency density). The
area of each rectangle, therefore, represents the product of the class width and the
frequency density (relative frequency density), i.e., the class frequency (relative
frequency). The resulting diagram is called a histogram. In Fig. 7, we show you the
histogram for the frequency distribution of petiole length per leaf of a pipal tree,
drawn on the basis of the frequency densities given in Table 7.
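
The relation between heights and areas in a histogram can be checked with a minimal sketch. The class boundaries and frequencies below are hypothetical, and the function name is only illustrative. Heights are taken as relative frequency densities, so each area equals a relative frequency.

```python
def histogram_rectangles(boundaries, frequencies):
    """Return (left edge, width, height) for each class.

    The height is the relative frequency density, i.e. the class frequency
    divided by (total frequency x class width).
    """
    n = sum(frequencies)
    rectangles = []
    for left, right, f in zip(boundaries, boundaries[1:], frequencies):
        width = right - left
        rectangles.append((left, width, f / (n * width)))
    return rectangles

# Hypothetical grouped data with classes of unequal width.
boundaries = [0, 10, 20, 40, 80]
frequencies = [5, 15, 20, 10]

rectangles = histogram_rectangles(boundaries, frequencies)
areas = [width * height for _, width, height in rectangles]
print(rectangles)
print(areas, sum(areas))
```

The third class has the largest frequency but not the tallest rectangle, because it is twice as wide as the second class. This is why heights must be densities when the class widths differ.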

Fig. 7 : Histogram for the frequency distribution of petiole length (horizontal axis: petiole length).

In Fig. 8, you can see the histogram for the frequency distribution of family income
given in Table 9.

Fig. 8 : Histogram for the frequency distribution of family income

Try this exercise now.

E9) Draw a histogram to represent the frequency distribution of the yield of seed
cotton given in E7.

Here we have seen some ways of diagrammatically representing the frequency tables
of qualitative characters, and discrete and continuous variables. We can also use
cumulative frequencies to represent the frequency distribution of a variable. In the
next sub-section, we shall see how the cumulative frequency tables of a variable can
be diagrammatically represented.

1.4.2 Cumulative Frequencies


Now we divide our discussion into two parts : a) discrete and b) continuous variables.
a) Discrete variable
We again take two perpendicular axes of coordinates. The vertical axis will now be
used for the cumulative frequency while the horizontal axis will continue to be used
for the variable itself. But note the way the cumulative frequency changes : in the
discrete case, whenever it changes it changes by jumps (a point already mentioned in
Sec. 1.3.2).
In Table 4, which is a cumulative frequency table of the less than type, the cumulative
frequency is zero for values of the variable less than 1, is 3 for values not less than 1
but less than 2, is 10 for values not less than 2 but less than 3, and so on. Hence, the
cumulative frequency diagram takes the form indicated in Fig. 9(a). It is called a step
diagram owing to its resemblance to a flight of steps.
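
The jumps of the step diagram can be made concrete with a minimal sketch of the cumulative frequency as a function of the variable value. The first two frequencies (3 and 7) follow from the cumulative frequencies 3 and 10 quoted above; the remaining two are hypothetical, and the function names are only illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

def make_step_function(values, frequencies):
    """Cumulative frequency (less than type) of a discrete variable.

    Returns a function giving the number of observations not exceeding x.
    It is constant between successive values and jumps at each value.
    """
    cumulative = list(accumulate(frequencies))

    def cumulative_frequency(x):
        i = bisect_right(values, x)   # how many distinct values are <= x
        return cumulative[i - 1] if i > 0 else 0

    return cumulative_frequency

values = [1, 2, 3, 4]
frequencies = [3, 7, 11, 9]   # 3 and 7 follow from the text; 11 and 9 are hypothetical

F = make_step_function(values, frequencies)
for x in [0.5, 1, 1.9, 2, 2.5, 10]:
    print(x, F(x))
```

The function stays at 3 for every x from 1 up to, but not including, 2 and then jumps to 10, which is exactly one step of the diagram.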

Fig. 9 : Step diagram representing cumulative frequencies of the (a) less than type, (b) more than type,
for the data on household size.
The picture takes a somewhat different form when it comes to the cumulative
frequencies of the more-than type. From Table 4 you can see that the cumulative
frequency diagram will again be a step diagram, but like the one in Fig. 9(b).

b) Continuous variable
In representing the cumulative frequencies of either type for a continuous variable,
we proceed as in the discrete case, taking two rectangular axes of coordinates, the
horizontal for values of the variable and the vertical for cumulative frequency. But
we have to bear in mind that the cumulative frequency of the less than type for any
class corresponds to the upper class boundary and that it increases gradually and not
by jumps (as it does in the discrete case). Similarly, we have to remember that the
cumulative frequency of the more than type for any class corresponds to the lower
class boundary and that it decreases gradually and not by jumps.
So while drawing the diagram for the cumulative frequencies of either type, the points
corresponding to the successive class boundaries are joined by straight line segments.
Note that the cumulative frequency of less than (more than) type is zero (n) for any
variable value less than the lower boundary of the lowest class and is n (zero) for any
variable value exceeding the upper boundary of the highest class. Hence, the graph
of the cumulative frequency of the less than type will coincide with the horizontal
axis for values less than the lower boundary of the lowest class and will be parallel to that
axis at a height of n for all values equal to or exceeding the upper boundary of the
highest class.
In the case of the cumulative frequency diagram of the more than type, the picture
gets reversed : the graph will now be coincident with the horizontal axis for values
exceeding the upper boundary of the highest class and will be parallel to that axis at
a height of n for all values not exceeding the lower boundary of the lowest class. In
Figs. 10(a) and (b), we have these diagrams for the data on petiole length of leaves
of a pipal tree.
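
Joining the points at successive class boundaries by straight line segments amounts to linear interpolation within each class. A minimal sketch follows, using hypothetical class boundaries and frequencies; the function name is only illustrative.

```python
def less_than_ogive(boundaries, frequencies):
    """Cumulative frequency of the less than type for a continuous variable.

    The cumulative frequency is attached to the upper boundary of each class
    and the points are joined by straight lines (linear interpolation).
    Returns the interpolating function and the total frequency n.
    """
    cumulative = [0]
    for f in frequencies:
        cumulative.append(cumulative[-1] + f)
    n = cumulative[-1]

    def F(x):
        if x <= boundaries[0]:
            return 0.0            # graph coincides with the horizontal axis
        if x >= boundaries[-1]:
            return float(n)       # graph is parallel to the axis at height n
        for i, f in enumerate(frequencies):
            lo, hi = boundaries[i], boundaries[i + 1]
            if lo <= x <= hi:
                return cumulative[i] + f * (x - lo) / (hi - lo)

    return F, n

# Hypothetical grouped data: four classes of width 10.
boundaries = [0, 10, 20, 30, 40]
frequencies = [4, 10, 12, 4]

F, n = less_than_ogive(boundaries, frequencies)
print(F(15))        # estimated number of observations below 15
print(n - F(25))    # more than type: estimated number of observations above 25
```

The more than type is obtained as n minus the less than type, which is why the two graphs are mirror images of each other about the height n/2.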

Fig. 10 : Ogives for the data in Table 8 (a) less than type, (b) more than type (horizontal axis: petiole length).

The two cumulative frequency diagrams for a continuous variable resemble in shape
the two curves forming the top of an ogive, a type of arch. Hence they have been
called the ogives of the distribution of the variable.
Here is an exercise for you.

E10) a) Represent the cumulative frequencies of the less than and more than types
for the data in E6 by suitable diagrams.
b) Draw the ogives corresponding to the data in E7.
So far we have considered frequency distributions of variables where the total number
of individuals was finite. Later, in Block 4, you will see that the frequency
distributions that we encounter in real life situations arise from sampling from a large
group of individuals, called a population. In most of these situations, we can regard
the population as infinite. Let us now discuss the diagrammatic representation of the
frequency distribution of an infinite population by a frequency curve.

1.4.3 Frequency Curve


Let us try to visualise what the frequency distribution or its histogram would look
like in an infinite population, especially when the variable is continuous. We first
divide [a,b], the range of variation of a continuous variable, into a few class intervals
when the total frequency (i.e., the sample size) is small. But let us consider samples
of increasing size and at the same time suppose the class intervals are taken smaller
and smaller. Suppose we draw the histograms of the distributions obtained in this
manner. To make these histograms comparable, we replace frequency density by
relative frequency density on the vertical axis. See Fig. 11(a), (b) and (c). Isn't
it natural then to expect that the histogram will gradually take the form of a smooth
curve (Fig. 11 (d))? This smooth curve, representing the frequency distribution of
the variable in the infinite population, is called the frequency curve of the variable.
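
The limiting process can be imitated numerically. The sketch below draws samples of increasing size and uses more and narrower classes each time. The choice of population (a triangular distribution on [0, 1], generated with Python's `random.triangular`), the sample sizes and the numbers of classes are arbitrary assumptions made for illustration; the function name is also illustrative.

```python
import random

def relative_frequency_densities(sample, a, b, k):
    """Heights of a histogram with k equal classes on [a, b].

    Each height is a relative frequency density, so the areas of the
    rectangles add up to 1 whatever the sample size.
    """
    width = (b - a) / k
    counts = [0] * k
    for x in sample:
        i = min(int((x - a) / width), k - 1)   # the value b goes into the last class
        counts[i] += 1
    n = len(sample)
    return [c / (n * width) for c in counts], width

random.seed(0)  # fixed seed so that the run is repeatable
for n, k in [(50, 5), (500, 10), (5000, 20)]:
    sample = [random.triangular(0, 1, 0.5) for _ in range(n)]
    heights, width = relative_frequency_densities(sample, 0.0, 1.0, k)
    total_area = sum(h * width for h in heights)
    print(n, k, round(total_area, 6), [round(h, 2) for h in heights])
```

Because relative frequency density is used on the vertical axis, the total area stays equal to 1 at every stage, which is what makes the successive histograms comparable.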

Fig. 11 : Histogram of a frequency distribution of a continuous variable approaching a smooth curve.
Similarly, we can also say that with increasing total frequency and decreasing class
width, the ogive of a continuous variable of either type will also gradually approach
a smooth curve as shown in Fig. 12. For the sake of comparability, we draw these on
the basis of cumulative relative frequencies (rather than cumulative frequencies).

Fig. 12 : Limiting forms of the ogives of a continuous variable : (a) less than type (b) more than type.
Now here are some exercises. In each of these we have asked you to give a
diagrammatic representation of some of the frequency distributions which you have
met in this unit.

E11) Represent the frequency distribution of assessment of the services offered by
the nursing home for the 55 inmates (given in E5) in terms of frequencies.
E12) Draw a suitable diagram to represent the frequency distribution given in
Table 10.

So far we have discussed various ways of visual representation of frequency
distributions. Now we shall see how frequency distributions can be classified into
certain broad categories according to shape.

1.4.4 Broad Classes of Distributions


In this section, we'll consider five different classes of distributions. These are
i) Bell-shaped symmetrical
ii) Bell-shaped moderately asymmetrical
iii) J-shaped
iv) U-shaped
v) Multimodal
distributions.
Let's discuss these one by one.
i) Bell-shaped Symmetrical Distribution
Such a distribution is also called a unimodal symmetrical distribution. It may be
related to either a discrete or a continuous variable. It has the feature that its highest
frequency or frequency density occurs right at the middle of its range of variation,
and the frequency or frequency density decreases on either side gradually and at the
same rate (see Fig. 13).

Fig. 13 : Frequency curve of a symmetrical distribution

Many of the distributions that are encountered in the physical, biological and
behavioural sciences, as well as those arising from measurements in the field of
manufacturing industry, closely follow this form. For instance, if we collect data on
the stature (in cm) of a large number of adult males of a given race, then we will end
up with a distribution of this type.
ii) Bell-shaped Moderately Asymmetrical Distribution
A distribution of this type also has a single maximum, but the frequency or frequency
density decreases on one side at a higher rate than on the other (Fig. 14).

Fig. 14 : Frequency curve of a bell-shaped asymmetrical distribution.

While we very rarely encounter an exactly symmetrical distribution, most
real-life distributions fall in the present category. The distribution of petiole
length per leaf of a pipal tree, as indicated by the histogram of Fig. 7, has a
longer tail to the left of the maximum than to the right. The
distribution of number of defects per piece of a manufactured item, on the other
hand, will have a longer tail to the right of the maximum than to the left. The
distribution of the births occurring in a year in a big community by age of mother
will also be found to belong to this category.
iii) J-shaped Distribution
A J-shaped distribution may be said to be the most extreme form of an asymmetrical
distribution. Here the frequency or frequency density is maximum at one end of the
range and decreases monotonically as the variable value changes from this end of the
range to the other (see Fig. 15).

Fig. 15 : Two J-shaped distributions (horizontal axis: variable value).

The income distribution of Table 9 falls in this category, as you can see from Fig. 8.
The distribution of land-holding per family, the distribution of age at death of people
of age 60 years or less, the distribution of life of lamp bulbs, etc., will also be similar
to Fig. 15 (a).
iv) U-shaped Distribution
Such distributions are extremely rare. A distribution of this type has its minimum
frequency (or frequency density) towards the middle of the range of variation while
the frequency (or frequency density) gradually increases, at the same rate or at
different rates, as the variable value changes either to the left or to the right (see
Fig. 16).

Fig. 16 : A U-shaped distribution (horizontal axis: variable value).

The distribution of days in a month by the degree of cloudiness at a place (if
cloudiness may be considered a continuous variable) has been found to follow this
pattern. In other words, the number of days with no or very high cloudiness will be
large, while there will be fewer days with moderately low or moderately high degrees
of cloudiness.
v) Multimodal Distribution
In some situations, we may come across distributions with more than one maximum
as in Fig. 17. You may realise that such a distribution may result if several groups of
individuals are mixed together. For each group separately, the distribution may be
unimodal, but if they have distinct maxima, then the distribution in the composite
group will take on a multimodal form.

Fig. 17 : A multimodal distribution

In general, a multimodal distribution signifies heterogeneity of the data, that is, the fact that
the data have been obtained from groups with widely different characteristics.
So, we have seen that we can obtain a lot of information about the data from its
pictorial representation.
With this we bring this unit to a close. Let us go back and recall the points covered
in it.

1.5 SUMMARY
In this unit, we have discussed the following points :
1) The term statistics may mean either numerical data arising in some sphere of life
or the scientific discipline that concerns itself with the collection, analysis and
interpretation of such numerical data.
2) Methods of data collection :
Direct observation method
Questionnaire method
Interview method
3) Classification of characters into qualitative and quantitative, and that of
quantitative characters into discrete and continuous ones.
4) Representation of frequency distribution of a character by means of a table.
5) Relative frequencies and cumulative frequencies.
6) Representation of the frequency distribution of a character by means of a
diagram : bar diagram, pie diagram, column diagram, frequency polygon,
histogram, ogive curve.
7) Classification of univariate distributions into certain broad categories :
Bell-shaped symmetrical and asymmetrical distributions,
J-shaped distributions,
U-shaped distributions,
Multimodal distributions.

1.6 SOLUTIONS
E1) a) and d) are secondary data,
b) and c) are primary.
E2) a) interview method
b) questionnaire method (or interview method)
c) measurement (of yield for some sample plots)
d) measurement.
E3) c) and d) are qualitative,
a), b) and e) are quantitative.
E4) b), c) and f) are discrete,
a), d) and e) are continuous.
E5) The frequency distribution is :

Assessment    Frequency    Relative frequency
V                  7            0.1272
G                 16            0.2909
S                 18            0.3272
B                 13            0.2363
P                  1            0.0181
Total             55

E6)
                                                Cumulative frequency
Word length    Frequency    Relative freq.    less than    more than

     2             13           0.1428            13           91
     3             19           0.2087            32           78
     4             21           0.2307            53           59
     5             15           0.1648            68           38
     6              9           0.0989            77           23

E7) Noting that the lowest and the highest of the observations are 23 and 115, you
may take your classes as 21-30, 31-40, ......, 111-120.
(a)
                                                              Cumulative frequency
Yield (gm)    Class mark    Frequency    Relative freq.    less than    more than

  21-30          25.5            2           0.0166             2           120
  31-40          35.5            5           0.0416             7           118
  41-50          45.5           13           0.1083            20           113
  51-60          55.5           21           0.175             41           100
  61-70          65.5           27           0.225             68            79
  71-80          75.5           22           0.1833            90            52

b) i) 43.5 ii) 5 iii) 41
c) i) 47/120 ii) 79/120 iii) 3/120
E11)

E12)
