CHAPTER 4 ORGANISATION OF DATA

MEANING OF CLASSIFICATION

Classification is the process of arranging data into sequences and groups according to their common characteristics.
• Under classification, on the basis of chosen characteristics, the similarity and dissimilarity in the various items are noted and
items exhibiting similarity are grouped together in one class. Through classification, we try to strike a note of homogeneity in
the heterogeneous elements of the collected information.
Objectives of Classification
1. Brief and simplify: In classification, the aim is to eliminate unnecessary details and convert the huge mass of complex data
into simple, condensed, logical & comprehensible form. It helps in highlighting the significant features of the data. For
example, the huge and fragmented data collected during a population census has to be classified according to gender, marital
status, education, occupation, etc., to ascertain the structure and nature of the population.
2. To explain similarity and dissimilarity of Data: Classification facilitates the grouping of data according to certain similarities
and dissimilarities. This enables the investigators to grasp them easily. Facts like educated and uneducated, married and
unmarried, employed and unemployed, etc. are kept in separate classes.
3. To facilitate comparisons: Classification enables us to make meaningful comparisons, draw inferences and locate facts.
4. To study the relationships: Classification helps in finding cause-and-effect relationships in the data on the basis of chosen criteria.
For example, the characteristics of income and education can be related after classifying the mass of data.
5. To prepare the data for tabulation: Only classified data can be presented in tabular form. Classification thus provides a
basis for tabulation and further statistical processing.
6. Scientific Arrangement: Classification facilitates the arrangement of data in a scientific manner, which increases their accuracy
and reliability.
Requisites of a Good Classification
A good classification must possess the following features:
1. Suitability: The classification should conform to the object of the enquiry. For example, if investigation is conducted to
inquire into the economic conditions of workers, then it will be of no use to classify them on the basis of their religion.
2. Unambiguous: The classification should not lead to any confusion. It should not be difficult to place units into different
groups according to their common characteristics.
3. Flexibility: A good classification should be capable of being adjusted according to the changed situations and conditions.
4. Mutually Exclusive: The classes must not overlap so that an observed value belongs to one and only one of the classes.
There must be no item which can find its way into more than one class.
5. Stability: The principle of classification, once decided, should remain same throughout the analysis, otherwise it will not be
possible to get meaningful results.
6. Homogeneity: A classification is said to be homogeneous if similar items are placed in a class. All units belonging to a group
should exhibit similar characteristics.
METHODS OF CLASSIFICATION
Statistical data is classified after taking into account the nature, scope, and purpose of an investigation. Generally, data is
classified on the following four bases (see chart):
METHODS OF CLASSIFICATION (chart)
• Geographical Classification (data is classified according to geographical location or region)
• Chronological Classification (data is classified with respect to different periods of time)
• Qualitative Classification (data is classified on the basis of descriptive characteristics); it is of two types: Simple Classification and Manifold Classification
• Quantitative or Numerical Classification (data is classified on the basis of characteristics which can be measured)


1. Geographical Classification (Spatial Classification)
When the data is classified according to geographical location or region (such as countries, states, districts, etc.), it is known
as geographical classification. When population of different states is presented, we say that it is according to geographical
classification.
2. Chronological Classification (Temporal Classification)
When data is classified with respect to different periods of time (such as decades, years, months, etc.), the type of
classification is known as chronological classification.
3. Qualitative Classification
In qualitative classification, data is classified on the basis of attributes like gender, literacy, region, caste, education, etc.
which cannot be quantified.
This type of classification is of two types:
I. Simple Classification: When facts are classified into two classes according to one attribute only. For example, if we divide (or
classify) the population of a city into two groups (males and females), then the classification is simple.
(Diagram: Population divided into Males and Females.)
II. Manifold Classification: When facts are classified according to
more than one attribute, or when each class is sub-divided into more than two sub-classes.

4. Quantitative Classification (or Numerical Classification)


In this classification, data is classified on the basis of some characteristics which can be measured such as height, weight,
income, expenditure, production, or sales.
For example, if we classify students according to their age in a school, then we can always express the ages numerically, i.e. in
terms of numbers. Let us suppose that the ages of 500 students of a school lie between 10 and 18 years. Now, we can classify
them into 4 groups as follows: 10 to 12 years, 12 to 14 years, 14 to 16 years, and 16 to 18 years.
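
The grouping described above can be sketched in a few lines of Python. This is only an illustration: the ages in the list are hypothetical (they are not the 500 students mentioned in the text), and the rule that the lower limit is included while the upper limit is excluded is an assumption made here so that an age such as 12 falls into exactly one group.

```python
from collections import Counter

# Class boundaries taken from the text: 10-12, 12-14, 14-16, 16-18 (years).
CLASSES = [(10, 12), (12, 14), (14, 16), (16, 18)]


def age_group(age):
    """Return the class label for an age.

    Assumption: the lower limit is included and the upper limit is excluded,
    so a student aged exactly 12 is placed in 12-14.
    """
    for lower, upper in CLASSES:
        if lower <= age < upper:
            return f"{lower}-{upper}"
    raise ValueError(f"age {age} falls outside the classes")


# Hypothetical ages, for illustration only.
ages = [10, 11, 12, 13, 13, 15, 16, 17, 14, 12]

counts = Counter(age_group(a) for a in ages)
for lower, upper in CLASSES:
    label = f"{lower}-{upper}"
    print(label, "years:", counts[label])
```

Each age lands in one and only one group, which is the 'mutually exclusive' requisite listed earlier in the chapter.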

CONCEPT OF VARIABLE
A variable refers to a quantity or characteristic whose value varies from one investigation to another.
Examples:
(i) "Price" is a variable as the prices of different commodities are different.
(ii) "Age" is a variable as the ages of different students vary.
(iii) Similarly, some more variables are: Height, Weight, Wages, Expenditure, Imports, Production, etc.
It may be noted that different variables are measured in different units. For example, age is measured in years, height in
inches or centimeters, weight in kgs, income in rupees etc.
Variable Vs Attribute
'Variable' is generally taken as anything that changes or varies over a period of time. But, in statistics, only that change is taken
as a variable which can be numerically expressed, such as length, height, width, temperature, etc.
Things which cannot be measured numerically, such as intelligence, beauty, efficiency, aptitude, etc., are called 'Attributes'.
Variables are of two kinds:
(i) Discrete Variable (Discontinuous Variable).
(ii) Continuous Variable.
Discrete Variable (Discontinuous Variable)
Variables which are capable of taking only exact or finite values and generally not any fractional value are termed discrete
variables. In other words, discrete variables are expressed in terms of complete numbers.
For example, the number of workers or the number of students in a class are discrete variables as they cannot be in fractions. Similarly,
the number of members in a family can be 1, 2 and so on, but cannot be 1.5 or 2.75. Some other examples are the population of a
town, the number of rooms in a house, the total number of mobiles in a family, etc.
Continuous Variable
Those variables which can take all the possible values (integral as well as fractional) in a given specified range are termed as
continuous variables. In such a case, data is obtained by measurement.
• For example, Temperature is a continuous variable because it can take any value in the range of measurement, like 20°C or
20.1°C or 20.2°C or 20.5°C and so on.
Discrete Variable Vs Continuous Variable
Basis | Discrete Variable | Continuous Variable
Meaning | A discrete variable is a variable which is capable of taking only exact values and generally not any fractional value. | A continuous variable is a variable which can take all the possible values (integral as well as fractional) in a given specified range.
Change in Values | These variables increase in complete numbers. | These variables can increase in fractions as well as in complete numbers.
Data Collection | In case of a discrete variable, data is obtained by counting. | In case of a continuous variable, data is obtained by measurement.
Example | Number of workers or number of students in a class are discrete variables as they cannot be in fractions. | Height or weight of individuals are continuous variables as they can be in fractions.

STATISTICAL SERIES
The arrangement of classified data in some logical order, like according to the size, according to the time of occurrence or
according to some other measurable or non-measurable characteristics, is known as Statistical Series.
• Statistical series are prepared to present the collected and classified data in a properly arranged way.
• For example, if data pertaining to the marks of 35 students in a class are arranged according to their roll numbers, then it can be
called a statistical series.

4.7 KINDS OF STATISTICAL SERIES


Statistical series can be divided as:

4.8 INDIVIDUAL SERIES (Series without frequency)


Individual series refers to that series in which items are listed singly, i.e. each item is given a separate value of
measurement.
An individual series may be presented in two ways:
(i) According to Serial Number;
(ii) According to order of Magnitude (Ascending or Descending order).
(i) According to Serial Number
An individual series can be arranged in a serial order. So, marks obtained by 10 students may be arranged either in serial
number or in order of their roll numbers as shown in the following table:
(ii) According to order of magnitude (Ascending or Descending order)
Individual observations can also be arranged in order of magnitude (ascending order or descending order).
• The arrangement of raw data in ascending or descending order of magnitude is known as 'Array'.
• So, marks obtained by 10 students can be presented either in ascending order or in descending order.
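
As a small illustration of an array, the sketch below arranges the marks of 10 students in ascending and descending order. The marks are hypothetical values chosen for this example; they do not come from the chapter.

```python
# Hypothetical marks of 10 students, listed by roll number (illustrative data).
marks = [35, 48, 22, 40, 15, 48, 30, 27, 40, 18]

# An 'array' is the same raw data arranged in order of magnitude.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)
```

Note that `sorted` returns a new list, so the original serial-number order in `marks` is left unchanged.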
DISCRETE SERIES (UNGROUPED FREQUENCY DISTRIBUTION)
A much better way of presenting data is to express it in the form of a discrete series. A discrete series is that series where
individual values differ from each other by a definite amount.
• In a discrete frequency distribution, various values of the variable are shown along with their corresponding frequencies.
• For a discrete variable, the classification of its data is known as a 'Frequency Array'.
• A discrete variable takes a definite integral value and not fractional values. So, we have frequencies that correspond to each
of its integral values.
Construction of Discrete Frequency Distribution using Tally Marks
The method of Tally Marks or Tally Bars is used to count the number of observations or the frequency of each value of the
variable.
The various steps involved are:
1. First of all, a table is prepared with three columns headed: (i) Variable; (ii) Tally Marks; and (iii) Frequency.
2. In the 1st column, each possible value of the variable is written.
3. In the 2nd column, a tally mark denoted by | is noted for every observation, against its corresponding value.
• After a particular value has occurred four times, for the fifth occurrence, we put a cross tally mark, cutting across the first four
tally marks, and this gives a block of five.
• For the sixth item, we put another tally mark leaving some space.
4. In the 3rd column, we put the total of tally marks corresponding to each value of the variable.
Tally Marks
The method of tally marks is used to count the number of observations or the frequency of each value of the variable.
• Each possible value of the variable is written in a column.
• For every observation, a tally mark denoted by | is noted against its corresponding value.
• Five observations are denoted by a block of five, i.e., the fifth tally mark crosses the earlier four marks, and so on.
Example 1. Represent the marks of 28 students of class XI in the form of discrete frequency distribution:
20 15 15 10 15 20 15 10 20 15 10 25 20 25
25 10 25 15 20 10 20 15 10 20 15 20 10 15
Solution:
The discrete frequency distribution can be constructed in the following steps:
1. In the 1st column, write the values of different marks (10, 15, 20 and 25).
2. In the 2nd column, tally marks are entered (the 1st value is 20, so the first tally mark is given to 20, the next tally mark to 15, and so
on). These tally marks are grouped in blocks of five.
3. In the 3rd column, the tally marks corresponding to the different values are counted and entered as the
frequency.
Using this procedure, we get the following frequency distribution:
Marks    Tally Marks     Frequency
10       ||||/ ||        7
15       ||||/ ||||      9
20       ||||/ |||       8
25       ||||            4
Total Students           28
(Here ||||/ stands for a block of five: four tally marks crossed by the fifth.)
Thus, from the above table, it is clear that out of 28 students, 7 students got 10 marks, 9 got 15 marks,
8 got 20 marks and 4 got 25 marks.
Note: In the given frequency distribution, the marks of students are a discrete variable which takes the integral values 10,
15, 20 and 25. It must be noted that in the case of a discrete frequency distribution, the identity of the various observations is not lost,
i.e., it is possible to get back the original observations from the given frequency distribution.
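
The counting in Example 1 can be checked with a short Python sketch. It uses the 28 marks given in the example; the `tally` helper and its `||||/` notation for a crossed block of five are conventions assumed here for plain-text output, not part of the chapter.

```python
from collections import Counter

# The 28 marks from Example 1.
marks = [20, 15, 15, 10, 15, 20, 15, 10, 20, 15, 10, 25, 20, 25,
         25, 10, 25, 15, 20, 10, 20, 15, 10, 20, 15, 20, 10, 15]

# Frequency of each distinct value of the variable.
frequency = Counter(marks)


def tally(n):
    """Return n as text tally marks.

    Assumed notation: '||||/' is a block of five (four strokes crossed
    by the fifth); any remainder is written as plain strokes.
    """
    blocks = ["||||/"] * (n // 5)
    if n % 5:
        blocks.append("|" * (n % 5))
    return " ".join(blocks)


for value in sorted(frequency):
    print(f"{value:<6}{tally(frequency[value]):<14}{frequency[value]}")
print("Total Students", sum(frequency.values()))

# The identity of the observations is not lost: expanding the table
# gives back the original values (in sorted order).
rebuilt = [v for v in sorted(frequency) for _ in range(frequency[v])]
```

Here `rebuilt` equals `sorted(marks)`, which is what the note above means by getting back the original observations from a discrete frequency distribution.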
Relative Frequency Distribution
When actual frequencies are expressed as a percentage of total number of observations, then relative frequencies are
obtained.
Example 2. Transform the following data on the sizes of 30 pairs of shoes sold by a showroom on a particular day into a
discrete series. Also calculate the Relative Frequency.
8 9 5 8 6 7 9 10 8 7 8 6 7 10 7
5 8 5 6 7 9 8 6 10 8 7 7 8 9 8
Solution:
Size of Shoes    Tally Marks     Frequency    Relative Frequency (%)
5                |||             3            3/30 × 100 = 10
6                ||||            4            4/30 × 100 = 13.33
7                ||||/ ||        7            7/30 × 100 = 23.33
8                ||||/ ||||      9            9/30 × 100 = 30
9                ||||            4            4/30 × 100 = 13.33
10               |||             3            3/30 × 100 = 10
Total Shoes Sold                 30
(Here ||||/ stands for a block of five: four tally marks crossed by the fifth.)
In the given table, the 1st column shows the size of shoes, which varies from 5 to 10. The sizes have been arranged in
ascending order. The 2nd column shows the tally marks, the 3rd column shows the sum total of tally marks (i.e. frequency) and the 4th
column shows the relative frequency.
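
The relative frequencies of Example 2 can be reproduced with the sketch below. It uses the 30 shoe sizes from the example; rounding to two decimal places is a choice made here to match the figures in the solution.

```python
from collections import Counter

# The 30 shoe sizes from Example 2.
sizes = [8, 9, 5, 8, 6, 7, 9, 10, 8, 7, 8, 6, 7, 10, 7,
         5, 8, 5, 6, 7, 9, 8, 6, 10, 8, 7, 7, 8, 9, 8]

frequency = Counter(sizes)
total = len(sizes)

# Relative frequency (%) = frequency / total number of observations x 100.
relative = {size: round(f * 100 / total, 2)
            for size, f in sorted(frequency.items())}

for size, pct in relative.items():
    print(size, frequency[size], pct)
```

Because of rounding, the six percentages add up to 99.99 rather than exactly 100.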

4.10 CONTINUOUS SERIES (GROUPED FREQUENCY DISTRIBUTION)


As compared to a discrete variable, a continuous variable can take any value in an interval. A continuous series is that series
which represents continuous variables, showing range of values of different items of the series.
Continuous Series is also known as Frequency Distribution or Grouped Frequency Distribution or Series with Class Intervals or
Series of Grouped Data.
In a continuous series, the measurements are only approximations and are expressed in class-intervals, i.e. within certain
limits. Here classes are framed without any break from beginning to end.
Important Terms under Continuous Series
In constructing a continuous series, we come across certain words like class, class-limits, class-intervals, range, etc., which
should be properly understood.
Class
Class here means a group of numbers in which items are placed, such as 0-10, 10-20, 20-30, etc.
• The classes should be clearly defined and should not lead to any confusion.
• Classes should be exhaustive and mutually exclusive, so that any value of the variable corresponds to one and only one of
the classes.
Class limits
The lowest and highest values of the variable within a class are called 'class limits'.
• In a continuous series, each class is located between two numbers. These two numbers make up the class limits.
• The lowest value of a class is known as the 'lower limit' or 'l1', while the highest value constitutes the 'upper limit' or 'l2'.
• If the class is 10-20, then the lower and upper limits will be 10 and 20 respectively.
• It is convenient to have lower limit of a class either equal to zero or some multiple of 5.
• Class limits should be whole numbers as far as possible and be such that every item of the given data is included in these.
Class-Interval
The difference between the lower limit (l1) and upper limit (l2) is known as class-interval.
• The class-interval is generally indicated by the symbol 'i' or 'c'.
• Symbolically, Class-Interval ('i' or 'c') = l2 – l1
• Class-interval is also known as 'magnitude' or 'size' or 'length' of the class.
Width of Class-Intervals
While constructing the frequency distribution, it is desirable that the width of each class-interval should be equal in size. The
size (or width) of each class-interval can be determined by the following formula:
Width of Class-Interval = (Largest Observation − Smallest Observation) / Number of Classes Desired
Before taking a final decision on the width of various class-intervals, it is necessary to consider the following points:
1. Normally, a class-interval should be a multiple of 5, as it is easy to grasp numbers like 5, 10, 15, etc.
2. It should be convenient to find the mid-value of a class-interval.
3. As far as possible, all classes should be of equal width. A frequency distribution with equal class width is convenient to
represent on a diagram and easy to analyse.
4. Most of the observations in a class should be uniformly distributed or concentrated around its mid-value.
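
The width formula and point 1 above can be tried out with the following sketch. The marks are hypothetical, and rounding the raw width up to the next multiple of 5 is only one possible reading of point 1 (the text does not say whether to round up or down).

```python
import math


def class_width(observations, number_of_classes):
    """Raw width: (largest - smallest observation) / number of classes desired."""
    return (max(observations) - min(observations)) / number_of_classes


def rounded_width(raw_width, multiple=5):
    """Round the raw width up to the next multiple of 5.

    Assumption: rounding up is chosen here so that the classes
    still cover every observation.
    """
    return math.ceil(raw_width / multiple) * multiple


# Hypothetical marks, for illustration only.
marks = [12, 47, 33, 8, 25, 41, 19, 50, 36, 10]

raw = class_width(marks, 6)   # (50 - 8) / 6
width = rounded_width(raw)

print("Raw width:", raw)
print("Width used:", width)
```

With the wider interval of 10, fewer than six classes are needed to cover the data, so the number of classes and the width have to be settled together.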
Range
The range of a frequency distribution can be defined as the difference between the lower limit of the first class-interval and the
upper limit of the last class-interval. For example, if the classes are 0-10, 10-20, and so on till 70-80, then the range is 80 - 0 = 80.
Mid-Point or Mid-Value
Mid-point is the central point of a class-interval.
• It is calculated by dividing the sum of the lower and upper limits by 2.
Mid-Point or Mid-Value = (Lower Class Limit + Upper Class Limit) / 2
• For example, the mid-point of class 10-20 will be: Mid-Point = (10 + 20) / 2 = 15
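
The mid-value formula, together with the range defined just before it, can be checked with a short sketch. The classes 0-10, 10-20, and so on till 70-80 are the ones used in the range example; the function and variable names are chosen here for illustration.

```python
def mid_point(lower_limit, upper_limit):
    """Mid-value = (lower class limit + upper class limit) / 2."""
    return (lower_limit + upper_limit) / 2


# Classes 0-10, 10-20, ... till 70-80, as in the range example above.
classes = [(lower, lower + 10) for lower in range(0, 80, 10)]

mid_values = [mid_point(l1, l2) for l1, l2 in classes]

# Range = upper limit of the last class - lower limit of the first class.
series_range = classes[-1][1] - classes[0][0]

print("Mid-values:", mid_values)
print("Range:", series_range)
```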
Frequency
Frequency refers to the number of items (observations) falling within a particular class. For example, if class 0-20 has 10
students, then frequency is 10. It means that there are 10 students who have obtained marks between 0 and 20.
Frequency Distribution
Frequency distribution is a table, which shows how the different values of a variable are distributed in different classes
along with their corresponding class frequencies. A frequency distribution is a comprehensive way to classify raw data of a
quantitative variable.
Class Frequency
The number of observations corresponding to a particular class is known as class frequency or the frequency of that class.
• It is denoted generally by f.
• The sum of frequencies is denoted as ∑ f or N.
• For example, if 15 students have secured marks between 60 and 70, then '15' is the class frequency.

4.11 TYPES OF FREQUENCY DISTRIBUTION (CONTINUOUS SERIES)


The continuous series are mainly of following types:
1. Exclusive Series
2. Inclusive Series
3. Open-End distribution
4. Equal and Unequal class-interval series
5. Mid-value series
6. Cumulative frequency Series ('Less than' and 'More than')

Exclusive Series
The classes of the type 10-20, 20-30, 30-40, etc., wherein the upper limit of one class-interval becomes the lower limit of
the next class, are known as exclusive classes. Such classification ensures continuity of data because the upper limit of one
class is the lower limit of succeeding class.
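
The continuity property of exclusive classes can be expressed as a simple check. This is a minimal sketch: the function name `is_exclusive_series` is hypothetical, and classes are assumed to be given as (lower limit, upper limit) pairs in ascending order.

```python
def is_exclusive_series(classes):
    """Return True if the upper limit of each class equals the lower limit of the next."""
    return all(prev_upper == next_lower
               for (_, prev_upper), (next_lower, _) in zip(classes, classes[1:]))


exclusive = [(10, 20), (20, 30), (30, 40)]   # classes of the type 10-20, 20-30
inclusive = [(10, 19), (20, 29), (30, 39)]   # classes of the type 10-19, 20-29

print(is_exclusive_series(exclusive))
print(is_exclusive_series(inclusive))
```

The first call reports that the limits join up without a gap; the second reports that they do not, since there is a gap between 19 and 20.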
TYPES OF CONTINUOUS SERIES (chart)
• Exclusive Series (classes of the type 10-20, 20-30, etc.)
• Inclusive Series (classes of the type 10-19, 20-29, etc.)
• Open-End Distribution (the lower limit of the first class and the upper limit of the last class are not given)
Example: Below 5, below 10, below 15
• Cumulative Frequency Series ('Less than' and 'More than' series)
• Mid-value Series (middle values of the class-intervals are given)
• Unequal Class-interval Series (class-intervals are not equal)
• Equal Class-interval Series (classes are of the same interval)
