Unit-1 - Data Classification and Tabulation - L-4 - 28th November, 2020
The data collected for a statistical inquiry sometimes consist of a few fairly
simple figures, which can be easily understood without any special treatment. But
more often there is a mass of raw data without any structure. Such an unwieldy,
unorganized and shapeless mass of collected data cannot be easily assimilated or
interpreted. Unorganized data are not fit for further analysis and interpretation.
In order to make the data simple and easily understandable, the first task is to
condense and simplify them in such a way that irrelevant data are removed and
their significant features stand out prominently. The procedure adopted for this
purpose is known as the method of classification and tabulation.
Classification helps proper tabulation. Classification is the process of arranging
data into sequences and groups according to their common characteristics or
separating them into different but related parts.
Or we can say that the process of grouping a large number of individual facts and
observations on the basis of similarity among the items is called classification.
Guiding principles (rules) of classification
a) Geographical Classification
Ex: Region-wise sales of the company (in million rupees)
Region Sales
East 235
West 215
North 265
South 247
b) Chronological Classification
If the statistical data are classified according to the time of their occurrence,
the classification is called chronological classification.
Month Sales (Rs. in lakhs)
January 45
February 63
March 48
April 54
May 56
June 60
July 64
c) Qualitative Classification
a) Simple classification
b) Manifold classification
i) Simple classification: If the data are divided into only two classes, the
classification is known as simple classification.
In this classification, the marks obtained by students are the variable, and the
number of students in each class is the frequency.
Tabulation
Tabulation may be defined as the systematic arrangement of data in columns and
rows. It is designed to simplify the presentation of data for the purpose of
analysis and statistical inference.
2. To facilitate comparison
Classification of tables
Frequency Distribution
A frequency distribution is a table used to organize data. The left column (called
classes or groups) lists numerical intervals of the variable under study. The right
column lists the frequencies, i.e., the number of occurrences in each class/group.
Intervals are normally of equal size and together cover the range of the sample
observations.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences falling in each class is recorded.
A series of individual observations is a series in which the items are listed one
after another, each as a separate observation. For statistical calculations, these
observations can be arranged in either ascending or descending order. Such an
arrangement is called an array.
Discrete (ungrouped) Frequency Distribution
21, 23, 19, 17, 12, 15, 15, 17, 17, 19, 23, 23, 21, 23, 25, 25, 21, 19, 19, 19
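As an illustration, the observations listed above can be condensed into a discrete frequency distribution by counting how often each distinct value occurs. The following is a minimal sketch in Python; the names `observations` and `discrete_frequency_table` are chosen here for illustration and are not part of the notes.

```python
from collections import Counter

# The 20 observations listed above.
observations = [21, 23, 19, 17, 12, 15, 15, 17, 17, 19,
                23, 23, 21, 23, 25, 25, 21, 19, 19, 19]


def discrete_frequency_table(values):
    """Return (value, frequency) pairs sorted by value (illustrative helper)."""
    counts = Counter(values)          # tally each distinct value
    return sorted(counts.items())     # ascending order of the variable


table = discrete_frequency_table(observations)

print("Value  Frequency")
for value, frequency in table:
    print(f"{value:>5}  {frequency:>9}")
print(f"Total  {sum(f for _, f in table):>9}")
```

The total of the frequency column should equal the number of observations, which is a quick check that no item was missed while tallying.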
Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only approximations
and are expressed in class intervals within certain limits. In a continuous
frequency distribution the class intervals are theoretically continuous from the
start of the frequency distribution to the end, without a break; i.e., the variable
can take every intermediate value between the smallest and largest values in the
distribution.
The above data can be presented in groups. These groups are called classes or
class intervals.
Each class interval is bounded by two figures called the class limits.
The lower value of a class interval is called the lower limit and the upper value
of that class interval is called the upper limit. Thus, each class interval has a
lower and an upper limit.
For Example:
In the class interval 10 - 20, 10 is the lower limit and 20 is the upper limit.
In this method, the class intervals are (0 – 10), (10 – 20), (20 – 30). Here we
include the lower limit but exclude the upper limit.
So, (10 – 20) means values from 10 and more but less than 20.
(20 – 30) would mean values from 20 and more but less than 30.
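The rule "lower limit included, upper limit excluded" can be sketched in Python as follows. The function name `exclusive_class`, the default class width of 10 and the sample marks are assumptions made for illustration only.

```python
def exclusive_class(value, width=10, start=0):
    """Return (lower, upper) such that lower <= value < upper.

    Illustrative helper: assumes equal-width classes beginning at `start`.
    """
    index = (value - start) // width   # which class the value falls in
    lower = start + index * width
    return (lower, lower + width)


# Hypothetical marks, chosen to sit on and near the class limits.
for mark in [9, 10, 19, 20]:
    lower, upper = exclusive_class(mark)
    print(f"{mark} falls in the class ({lower} - {upper})")
```

A value equal to an upper limit, such as 20, is therefore counted in the next class (20 – 30), never in (10 – 20).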
Here also, we arrange the data into different groups called class intervals, e.g.,
(0 – 10), (11 – 20), (21 – 30).
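For class intervals of this second kind, a minimal Python sketch is given below. It assumes, as the non-overlapping limits suggest, that both the lower and the upper limit belong to the class (commonly called the inclusive method). The names `INCLUSIVE_CLASSES` and `inclusive_class` are illustrative only.

```python
# Class intervals as listed above; both limits are assumed to be included.
INCLUSIVE_CLASSES = [(0, 10), (11, 20), (21, 30)]


def inclusive_class(value, classes=INCLUSIVE_CLASSES):
    """Return the (lower, upper) class with lower <= value <= upper, else None."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # value lies outside every class, e.g. 10.5


# Hypothetical whole-number values on the class limits.
for value in [10, 11, 20, 21]:
    print(value, "->", inclusive_class(value))
```

Because there is a gap between consecutive classes (for example between 10 and 11), this form suits whole-number data; a fractional value such as 10.5 would not fall in any class.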