
UNIT 2 DATA ORGANISATION AND GRAPHICAL REPRESENTATION*
Structure
2.0 Objectives
2.1 Introduction
2.2 Classification and Tabulation of Qualitative and Quantitative Data
2.2.1 Classification
2.2.2 Tabulation
2.3 Construction of Frequency Distribution
2.3.1 Computation of Ungrouped Frequency Distribution
2.3.2 Computation of Grouped Frequency Distribution
2.4 Cumulative Frequency Distribution
2.5 Percentile and Percentile Ranks
2.6 Graphical Representation of Data
2.6.1 Bar Graph
2.6.2 Histogram
2.6.3 Frequency Polygon
2.6.4 Cumulative Percentage Frequency Curve or Ogive
2.6.5 Circle Graph or Pie Chart
2.7 Let Us Sum Up
2.8 References
2.9 Key Words
2.10 Answers to Check Your Progress
2.11 Unit End Questions

2.0 OBJECTIVES
After reading this unit, you will be able to:
• discuss the classification and tabulation of statistical data;
• describe the steps in construction of a frequency distribution;
• create a cumulative frequency distribution table;
• explain the meaning of percentile and percentile ranks; and
• discuss the graphical representation of data.

2.1 INTRODUCTION
The objective of all statistical inquiry is to describe and understand the
population of interest. For example, in an exit poll survey, a news channel
wants to assess the political attitude of the voters: how they are going to vote
in the upcoming election, and what the chances are of the current Government
coming back to power. This information about the population of interest can be
gained from a number of statistical enquiries. Exit poll surveys provide
tentative information about which party will gain what percentage of votes in
which state of India and so on. Such exit poll surveys make use of basic
statistical techniques that can be categorised under descriptive statistics.

* Dr. Vijay Viegas, Assistant Professor, Abbé Faria Post Graduate Department of Psychology,
St. Xavier’s College, Goa

In the previous unit we mainly discussed the term statistics, its definition,
nature and key terms. We also discussed scales of measurement and the two main
categories of statistics, namely descriptive and inferential statistics. In the
present unit, we will mainly focus on the varied aspects of descriptive
statistics, viz., classification, tabulation, organisation and graphical
representation of data. One of the most basic yet important methods, known as
frequency distribution, will also be discussed in this unit. Further, we will
discuss cumulative frequency distribution, percentile and percentile rank.

2.2 CLASSIFICATION AND TABULATION OF QUALITATIVE AND QUANTITATIVE DATA
Any data can be qualitative or quantitative in nature. Qualitative data are
measures of types and are denoted by a name, symbol, or a number code. They
are types of information whose features cannot be measured numerically. In
simple words, qualitative data are data about categorical variables. Some
examples of qualitative data are the smoothness of your skin, the colour of
your eyes, the texture of your hair, the softness of your palm etc.
Quantitative data, on the other hand, state information about quantities, that
is, information that can be measured and written down with numbers. In other
words, quantitative classification refers to the classification of data according
to some characteristics that can be measured. Examples of quantitative data are
weight, height, shoe size, the length of fingernails, income, sales, profits,
production etc.
In descriptive statistics, classification and tabulation of data, whether
qualitative or quantitative, are two important functions that will help the
researcher in organising the data in a better manner so that further statistical
analysis (whether by computing measures of central tendency, measures of
variability or inferential statistics) can be carried out.
In this context, we also need to explain the term univariate analysis. The term
univariate implies that there is only one variable. And when statistical analysis
is to be carried out with just one variable, descriptive statistics are used. For
example, if a researcher wants to study achievement motivation of students in
class tenth, the data obtained (with the help of a standardised psychological
test) cannot be subjected to inferential statistics or higher level statistical
techniques. The researcher will be able to classify and tabulate the data based
on the students who secured higher, lower or moderate scores. He/ she may
further be able to compute mean (that will be discussed in the unit on measures
of central tendency) and standard deviation (that will be discussed in the unit
on measures of variability).
Thus, in the context of univariate analysis, we mainly focus on the use of
descriptive statistics. In the present section we will discuss classification and
tabulation of data.

2.2.1 Classification
Data classification is a method of organising data into groups for its most
effective and efficient use. A well-planned data classification system makes
vital data easy to find and retrieve whenever required. In other words, the
process of ordering data into homogeneous groups or classes according to some
common characteristics present in the data is called classification. For
example, when letters are sorted in a post office, they are first classified
according to city and then arranged according to street and other details, so
that it becomes easier to deliver the letters to their destinations.
In the context of research, the data collected by a researcher is arranged in
formats that will help him/ her draw conclusions. Basically, classification
involves sorting the data based on similarities. Once the data is classified, the
researcher can proceed with further statistical analysis and decision making.
Some of the main objectives of classification are as follows:
1) The data is presented in a concise form. Raw data as such convey little
meaning, but once classified, they begin to reflect some meaning.
2) Classification helps in identifying the similarities and diversities in the
data. For example, based on the marks obtained in an English test,
students can be grouped into those obtaining marks between 76-100,
between 51-75, between 26-50 and between 1-25. These groups are distinct
from each other in terms of marks obtained, but the students within each
group are placed together because of the similarity of their marks
(refer to table 2.1).

Table 2.1: Marks obtained by Students

Marks Obtained    Students
76-100            28
51-75             40
26-50             12
1-25              20

3) Classification also helps in comparisons. The groups can be compared
with each other and conclusions can be drawn. Computation of
percentage will tell us the percentage of students falling in each of the
four groups mentioned in the above example.
4) Classification can be carried out for both qualitative as well as
quantitative data. Individuals can be classified on the basis of the colour
of their hair or gender; that would be qualitative data. Individuals can
also be categorised based on quantitative data, for example, their income,
their age and so on.
One way in which quantitative data can be adequately classified is with the
help of frequency distribution, which will be discussed in detail later in this unit.
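The percentage computation mentioned in point 3 can be sketched in a few lines of Python. This is only an illustration using the frequencies of table 2.1; the variable names (marks_table, percentages) are chosen here for the example and do not come from the unit.

```python
# Frequencies from table 2.1 (marks obtained -> number of students).
marks_table = {"76-100": 28, "51-75": 40, "26-50": 12, "1-25": 20}

total_students = sum(marks_table.values())

# Percentage of students in a group = group frequency x 100 / total.
percentages = {
    group: count * 100 / total_students
    for group, count in marks_table.items()
}

for group, pct in percentages.items():
    print(f"{group:>7}: {pct:.1f}%")
```

Because the four groups happen to add up to exactly 100 students, each percentage here equals the group frequency; with any other total the two would differ.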
2.2.2 Tabulation
Tabulation is the process of inserting classified data into tabular form. A
table is a systematic arrangement of statistical data in rows and columns. Rows
are horizontal arrangements, whereas columns are vertical arrangements. A table
may be simple, double or complex, depending upon the type of classification
used.
Tables are an important aspect of any research report or thesis. Any table will
have some key components that are discussed as follows:
1) Table number: Any table needs to have a table number. In various units
of this course, you will notice that all the tables are numbered. This
mainly helps in identification of the table as well as provides a reference.
So if you are asked to refer to say table 2.2, you know exactly where to
look for it in this unit. Table numbers need to be provided in a systematic
manner and in serial order, especially if you have included more than one
table in your report or thesis.
2) Title for the table: Besides a table number, a table should also have a
title that is specific and briefly reflects what the table is about. The
title needs to be clear and self-explanatory, so that the reader can
instantly gauge the content of the table.
3) Captions and stubs: Any table will have rows and columns based
on its contents. The headings given to columns are termed captions,
whereas stubs are the headings given to the rows. These again
need to be concise and self-explanatory. The captions and stubs will be
decided by the researcher based on the research he/she is carrying out.
4) Body of the table: The body is the main part of the table. It contains
the numerical information obtained from data collection, classified
according to the captions and stubs.
5) Headnote: Tables may also have a headnote, written at the extreme
right below the title, which provides information about units of
measurement.
6) Footnote: Footnotes are written below the table and may give crucial
additional information about the entries under the captions and stubs.
7) Source of data: The source of data can then be mentioned below the
table.

A table thus prepared is given below:

Table 2.2: Percentage of male and female students based on marks obtained
by them in English test

Marks Obtained in        Gender (Caption head)
English (Stub Head)      Males (N= 50)    Females (N= 50)
                         (Caption)        (Caption)
76-100 (Stub)            20%              21%
51-75 (Stub)             12%              13%
26-50 (Stub)             40%              39%
1-25 (Stub)              28%              27%
Total                    100%             100%

Footnote: Number of students is in terms of percentage (%).
Source: Data collected from the Term End Examination results

As discussed above, classification and tabulation are significant in organising
the data. Some of the merits of classification and tabulation are as follows:
1) Clarifies the data: Information arranged in the form of a table is
easily accessible and provides adequate and clear information to the user
of the data.
2) Simplification: Classification and tabulation reduce the mass, that is,
the size of the data, and present the data in the simplest possible way.
When the data are classified and presented in tables, much of the
complexity is removed and the data become simple and clear for
the user.
3) Facilitates comparisons: It enables quick comparison of the statistical
data shown in rows and columns.
4) Information can be easily referred to: When information is tabulated,
it is very easy to refer to.
Check Your Progress I
1) What is quantitative data?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2) List the merits of classification and tabulation.
......................................................................................................................
......................................................................................................................
38
...................................................................................................................... Data Organisation
and Graphical
...................................................................................................................... Representation

......................................................................................................................

2.3 CONSTRUCTION OF FREQUENCY DISTRIBUTION
Earlier in this unit we discussed classification and tabulation of data.
A frequency distribution is one way in which raw data can be classified so as to
provide a clear understanding of the data. It is a tabular representation in
which the raw data are organised into class intervals.
Frequency distributions can be categorised into three types:
1) Relative frequency distribution: In such a distribution, the value
allotted to each class interval is its frequency expressed as a proportion
of the total number of cases in the distribution. For example, if in a
frequency distribution of 100 employees based on years of experience, 35
employees fall in the range (class interval) 10-14 years of experience,
then the relative frequency of that class interval is 35/100 = 0.35. Thus,
it can be said that 35% of the employees fall in this class interval.
2) Cumulative Frequency Distribution: In such a distribution, the cumulative
frequency for a certain class interval is the sum of the frequency of that
class interval and the frequencies of all the class intervals below it.
This will be discussed in detail later in this unit.
3) Cumulative Relative Frequency Distribution: In such a distribution,
the cumulative relative frequency for a particular score is the relative
frequency for that score added to the relative frequencies of all
the scores that lie below it. This will be clear from
table 2.3, which provides examples of the three types of frequency
distribution.

Table 2.3: Examples of relative frequency, cumulative frequency and
cumulative relative frequency distributions

Scores    Frequency    Relative     Cumulative    Cumulative Relative
                       Frequency    Frequency     Frequency
34        3            10%          30            100%
23        4            13.33%       27            90%
22        10           33.33%       23            76.67%
21        6            20%          13            43.33%
19        7            23.33%       7             23.33%

N= 30
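
The three distributions in table 2.3 can be reproduced with a short Python sketch. The names used (scores_desc, rows and so on) are illustrative only. Note that the cumulative relative frequency is computed here from the cumulative frequency itself rather than by adding rounded percentages, which gives 90% for the score 23 and 76.67% for the score 22.

```python
# Scores and frequencies from table 2.3, highest score first.
scores_desc = [34, 23, 22, 21, 19]
freqs_desc = [3, 4, 10, 6, 7]

n = sum(freqs_desc)  # total number of cases (N)

rows = []
running_total = 0
# Accumulate from the lowest score upwards, as in the table.
for score, f in zip(reversed(scores_desc), reversed(freqs_desc)):
    running_total += f
    rows.append({
        "score": score,
        "f": f,
        "relative_pct": round(f * 100 / n, 2),
        "cumulative_f": running_total,
        "cumulative_relative_pct": round(running_total * 100 / n, 2),
    })

# Print highest score first to match the layout of the table.
for row in reversed(rows):
    print(row)
```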
In a frequency distribution there are two main methods of describing class
intervals.
1) The exclusive method: In this method, the upper limit of a certain class
interval is the lower limit of the class interval next to it, thus there is
continuity between the class intervals. The score that equals the upper
limit of a class interval is excluded from that interval and falls in the
class interval of which it is the lower limit. Thus, in the exclusive method
a score equal to the upper limit is not included in a class interval, but a
score equal to its lower limit is included in it. For example, in a
distribution with class intervals using the exclusive method, a score of 20
will fall in the class interval 20-30 and not in the class interval 10-20.
2) The inclusive method: In the inclusive method there is no continuity
between the class intervals, and this method is used especially for discrete
scores. In this method, scores equal to both the lower and the upper limit
are included in the class interval. For example, the class intervals will be
1-5, 6-10, 11-15 and so on.
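The difference between the two methods can be illustrated with a small Python sketch. The function names and parameters (find_class_exclusive, find_class_inclusive, lowest_limit, width) are hypothetical and introduced only for this example; the sketch assumes whole-number scores and equal class widths.

```python
def find_class_exclusive(score, lowest_limit, width):
    """Exclusive method: return (lower, upper) with lower <= score < upper."""
    index = (score - lowest_limit) // width
    lower = lowest_limit + index * width
    return lower, lower + width


def find_class_inclusive(score, lowest_limit, width):
    """Inclusive method: return (lower, upper) with lower <= score <= upper."""
    index = (score - lowest_limit) // width
    lower = lowest_limit + index * width
    return lower, lower + width - 1


# Exclusive intervals 10-20, 20-30, ...: a score of 20 goes to 20-30.
print(find_class_exclusive(20, lowest_limit=10, width=10))  # (20, 30)

# Inclusive intervals 1-5, 6-10, 11-15, ...: a score of 10 goes to 6-10.
print(find_class_inclusive(10, lowest_limit=1, width=5))    # (6, 10)
```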
Frequency distribution can also be categorised into ungrouped or grouped
frequency distribution.
Ungrouped Frequency Distribution: An ungrouped frequency distribution is
one in which all the values are listed in ascending or descending order.
For each occurrence of a score, a tally mark ( / ) is placed against the
respective value, and the frequency (denoted by ‘f’) of each score is
stated in the next column. An example of an ungrouped frequency distribution
is given in table 2.4:

Table 2.4: Ungrouped frequency distribution

Values    Tallies    f
6         ///        3
9         ////       4
12        /////      5
23        /          1
24        //         2

Grouped Frequency Distribution: Sometimes the data set is too large for an
ungrouped frequency distribution to give the researcher a clear picture. In
such cases a grouped frequency distribution can be used. Here the data are
organised into classes or class intervals; a tally mark is placed against the
class interval in which a given score falls, and then the frequency is denoted.
An example is given in table 2.5.

Table 2.5: Grouped frequency distribution

Values    Tallies    f
1-5       ///        3
6-10      /////      5
11-15     //         2
16-20     /          1
21-25     /          1

The concept of grouped and ungrouped frequency distribution should be clear
from the above examples. We will now discuss computation of frequency
distribution with the help of an example.
Suppose, in a class of forty students, the following marks were obtained on a
test of ten marks:

3 8 6 5 6 4 7 6
5 3 5 6 3 5 4 4
3 6 7 8 1 10 7 6
4 5 0 7 6 5 6 7
1 7 5 4 5 8 5 7

These numbers (marks of the students) are called raw data, as they are
obtained from the field directly and have not gone through any statistical
analysis. Now the question is, what do these numbers or raw data suggest about
the target population of students? Which marks are most common? How many
students got the highest marks? How many students passed this test? From raw
data alone, it is difficult to draw any such conclusion. Thus, we need to create
a frequency distribution on the basis of the raw scores. A frequency can be
calculated for each of the scores obtained by the students.
Frequency is the number of times a particular value or observation (obtained
marks in our context) occurs in the raw data.
The distribution of a variable is the pattern of frequencies of the observations.
Frequency distributions are portrayed as frequency tables, histograms,
or polygons. A frequency distribution is simply the arrangement of scores
together with their frequency of occurrence within a group. A frequency
distribution table is one way you can organise data so that it makes more sense
to the reader.
As discussed earlier, there are two major types of frequency distribution,
grouped frequency distribution and ungrouped frequency distribution. The
computation of both these frequency distributions is discussed as follows:

2.3.1 Computation of Ungrouped Frequency Distribution
To calculate frequency we are going to use the Tally Score Method: “This
method consists of making a stroke in the proper class for each observation and
summing these for each class to obtain the frequency. It is customary for
convenience in counting to place each fifth stroke through the preceding
four . . .” (Lawal, 2014, page 13). The frequency can be tabulated as follows
(based on the example of marks obtained by forty students):

Table 2.6: Frequency distribution using tally method

Marks    Tallies       Frequency (f)
0        /             1
1        //            2
2                      0
3        ////          4
4        /////         5
5        ///// ////    9
6        ///// ///     8
7        ///// //      7
8        ///           3
9                      0
10       /             1
                       ∑ = 40

Please note that the total (∑) should be equal to the number of students, that is,
40. Now, we can draw the following information from the frequency table:
• Only one student got full marks.
• The most common mark is five, followed by six.
• Only one student scored zero on the test.
The steps involved in creating an ungrouped frequency distribution are as
follows:
Step 1: Arrange your raw data in an array, in ascending or descending order.
Step 2: Make a table with three columns and name them as variable (that is,
marks in the case of the present example), tallies and frequency.
Step 3: Enter the values of your variable (marks in case of this example) in the
first column from lowest to highest order.
Step 4: Now, go one by one through your raw data and make a mark (/) for
each observation next to its value in the second column of your table.
Step 5: Count the tally marks for each value and write the total in the third
column, that is, the frequency column.
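
The steps above can be sketched in Python for the marks of the forty students. This is a minimal illustration that uses collections.Counter from the standard library to do the counting; the names marks and frequency_table are chosen for the example.

```python
from collections import Counter

# Raw data: marks of the forty students on the ten-mark test.
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

counts = Counter(marks)

# List every possible mark from 0 to 10, including marks nobody obtained.
frequency_table = {mark: counts.get(mark, 0) for mark in range(0, 11)}

print("Marks  Tallies     f")
for mark, f in frequency_table.items():
    print(f"{mark:>5}  {'/' * f:<10}  {f}")
print("Total:", sum(frequency_table.values()))
```

The printed tallies are plain strokes; the convention of drawing every fifth stroke across the preceding four is not reproduced in this sketch.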

2.3.2 Computation of Grouped Frequency Distribution


One disadvantage of the ungrouped frequency distribution method is that it
becomes tiresome and difficult to make a table when there are many values or
observations. Suppose, in the above example of the class test, the number of
students were 250. Would it be convenient to make an ungrouped frequency
distribution table for such data? Probably not! Then what can we do? We can use
another statistical procedure called the grouped frequency distribution method.
To understand this method, let us take another example. Suppose you have the
scores obtained by students on a class test in History:

12 7 13 14 12 23 21 14 13 23

30 12 1 21 23 21 23 21 5 21

11 22 30 14 4 17 35 24 13 17

Step 1: Range is to be found. In the case of our example, the lowest value
is 1 and the highest value is 35. Range= Highest Score - Lowest Score
(R=H-L)
Thus, R = 35-1 = 34.

Step 2: The class interval can be derived by dividing the range by the number of
categories that we need.
i = Range/ Number of categories needed
In our example, the range is 34, and the total number of scores
(number of students) is 30. Thus, around 6 categories would be sufficient.
Thus,
i = 34/6 = 5.7, which can be rounded off to 6.
While creating categories, ensure that not more than 10 categories are created
if there are approximately 50 scores, not more than around 10 to 15 categories
if there are between 50 and 100 scores, and not more than 20 categories
if there are more than 100 scores (Mangal, 2002). Make sure you
have a few items in each category. For example, if you have 20 items, choose 5
classes (4 items per category on average), not 20 classes (which would give you
only 1 item per category on average).
It is sometimes possible that the ‘i’ obtained is not a whole number. In such a
situation, the whole number nearest to it can be taken. For example,
if ‘i’ is obtained as 5.8, then 6 can be taken, being the nearest whole number.
It is also possible that the class interval or ‘i’ is finalised before the number of
categories is decided. For convenience, a class interval of 10, 5 or 2, for
example, can be taken.
Thus, class interval can be derived in either way as mentioned above.
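Steps 1 and 2 can be checked with a few lines of Python. The sketch below uses the History scores from the example; the variable names are illustrative, and the choice of 6 categories is taken from the text rather than computed.

```python
# Raw data: scores of 30 students on the class test in History.
history_scores = [
    12, 7, 13, 14, 12, 23, 21, 14, 13, 23,
    30, 12, 1, 21, 23, 21, 23, 21, 5, 21,
    11, 22, 30, 14, 4, 17, 35, 24, 13, 17,
]

# Step 1: Range = Highest Score - Lowest Score (R = H - L).
highest = max(history_scores)
lowest = min(history_scores)
score_range = highest - lowest

# Step 2: i = Range / Number of categories needed.
number_of_categories = 6
i_exact = score_range / number_of_categories

# Take the nearest whole number as the class interval.
class_interval = round(i_exact)

print(f"R = {highest} - {lowest} = {score_range}")
print(f"i = {score_range}/{number_of_categories} = {i_exact:.1f}, rounded to {class_interval}")
```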
Step 3: The frequency distribution table can now be created. The following is to
be done to create a frequency distribution table:
a) Create a table with three columns: variable (that is, marks in the case
of the present example), tallies and frequency (this is similar to the
steps followed in creating an ungrouped frequency distribution).
b) Enter the class intervals of your variable in the first column.
c) Go through your raw data and make a mark (/) for each score next to
the class interval in which it falls, in the second column of your table.
d) Count the tally marks for each class interval and write the total in the
third column, that is, the frequency column.

Marks    Tallies          Frequency (f)
31-36    /                1
25-30    //               2
19-24    ///// ///// /    11
13-18    ///// ///        8
7-12     /////            5
1-6      ///              3
Total                     30

Step 4: Totalling the frequencies. All the frequencies in the third column are
totalled, and the number thus obtained needs to be equal to the total number of
scores. In the case of our example, N = 30 and the total of frequencies is also 30.
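Steps 3 and 4 can be sketched in Python as follows. The sketch assumes inclusive class intervals of width 6 starting at the lowest score (1-6, 7-12 and so on), as in the example; the helper name class_of is hypothetical. Counting the 30 History scores directly in this way gives 5 scores in the class interval 7-12 and 8 scores in the class interval 13-18.

```python
# Raw data: scores of 30 students on the class test in History.
history_scores = [
    12, 7, 13, 14, 12, 23, 21, 14, 13, 23,
    30, 12, 1, 21, 23, 21, 23, 21, 5, 21,
    11, 22, 30, 14, 4, 17, 35, 24, 13, 17,
]

class_interval = 6            # 'i' obtained in Step 2
lowest = min(history_scores)  # the first class starts at the lowest score


def class_of(score):
    """Return the inclusive class interval (lower, upper) containing score."""
    lower = lowest + ((score - lowest) // class_interval) * class_interval
    return lower, lower + class_interval - 1


# Step 3: start every class at frequency 0, then count each score once.
grouped = {
    (lower, lower + class_interval - 1): 0
    for lower in range(lowest, max(history_scores) + 1, class_interval)
}
for score in history_scores:
    grouped[class_of(score)] += 1

# Print the highest class first, as in the table.
for (lower, upper), f in sorted(grouped.items(), reverse=True):
    print(f"{lower:>2}-{upper:<2}  {'/' * f:<11}  {f}")

# Step 4: the frequencies must add up to the number of scores.
total = sum(grouped.values())
print("Total", total)
```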
Check Your Progress II
1) What is frequency distribution?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2) The number of people treated in a local hospital on a daily basis is given
below. Construct the frequency distribution table with class interval 5.
15, 23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26, 30, 16, 19, 22, 14, 17,
21, 28, 9, 16, 13, 11, 16, 20, 1

Class Interval    Tallies    f

2.4 CUMULATIVE FREQUENCY DISTRIBUTION


After understanding frequency distribution, let us now take a look at the
cumulative frequency distribution. Cumulative frequency is obtained
when we successively add all the frequencies from the bottom of the
distribution (Mangal, 2002). A cumulative frequency distribution table is a
more detailed table. It looks similar to a frequency distribution table, but it
has an added column that gives the cumulative frequency.
Let us understand the cumulative frequency distribution table with an example.
In a walking race conducted by a club, all 10 of the participants had to fill out a
form that gave their personal and demographic details. Participants filled in
various details, but here we will consider only their age for constructing the
cumulative frequency distribution table.
The ages (in years) of the participants were as follows:
36, 48, 54, 92, 57, 63, 66, 76, 66, 80
Now, answer the following questions based on the above raw data:
• How many participants were aged less than 45?
• How many participants were aged more than 44?
• What is the percentage of participants who are older than 65 years?
The answer to these questions can be best given using the cumulative frequency
distribution method. These are called cumulative frequencies “because they
tell how many scores are accumulated up to this point on the table”(Aron, Aron
and Coups, 2013, page 7).
Let us present this data in a cumulative frequency distribution table.
Step 1: Divide the values into intervals, and then count the number of values in
each interval. In this case, intervals of 10 are appropriate. Since 36 is the
lowest age and 92 is the highest age, start the intervals at 35 to 44 and end the
intervals with 85 to 94.
Step 2: Create a table similar to the frequency distribution table but with three
extra columns.
Step 3: In the first column or the lower value column, list the lower value of
the intervals. For example, in the first row, you would put the number 35.
Step 4: The next column is the upper value column. Place the upper value of
the intervals. For example, you would put the number 44 in the first row.
Step 5: The third column is the Frequency column. Record the number of times
a value appears between the lower and upper values of the intervals. For
example, in the first row, place the number 1.
Step 6: The fourth column is the Cumulative frequency column. Here, we add
the cumulative frequency of the previous row to the frequency of the current
row. Since this is the first row, the cumulative frequency is the same as the
frequency. However, in the second row, the frequency for the 35–44 interval
(i.e., 1) is added to the frequency for the 45–54 interval (i.e., 2). Thus, the
cumulative frequency is 3, meaning we have 3 participants in the 35 to 54 age
group.
1 + 2 = 3
Steps 7 and 8 can be added to obtain the cumulative percentage frequency.
Step 7: The next column is the Percentage column. In this column, list the
percentage of the frequency. To do this, divide the frequency by the total
number of values and multiply by 100. In this case, the frequency of the first
row is 1 and the total number of values is 10. The percentage would then be 10.
(1 ÷ 10) × 100 = 10
Step 8: The final column is Cumulative percentage frequency. In this column,
multiply the cumulative frequency by 100 and then divide it by the total
number of values. Note that the last number in this column should always
equal 100.0. In this example, the cumulative frequency of the first row is 1
and the total number of values is 10, therefore the cumulative percentage
frequency of the first row is 10.0.
1 × 100 ÷ 10 = 10
The cumulative frequency distribution table will look like this:

Lower Value       Upper Value       Frequency    Cumulative    Percentage    Cumulative
(age in years)    (age in years)    (f)          frequency                   percentage
                                                                             frequency
85                94                1            10            10            100
75                84                2            9             20            90
65                74                2            7             20            70
55                64                2            5             20            50
45                54                2            3             20            30
35                44                1            1             10            10

N= 10

Based on the preceding table, the following information can now be obtained:
• Number of participants aged less than 45 years = 1
• Number of participants aged more than 44 years = 9
• Percentage of participants aged above 65 years = 50%
Note that cumulative frequencies can easily be converted to cumulative
percentage frequencies by multiplying the cumulative frequencies by 100 and
dividing by N (N is the total number of frequencies in the distribution).
Cumulative percentage frequencies provide information about the percentage of
frequencies that lie below a certain score/ class interval (Mangal, 2002).
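The eight steps above can be sketched in Python for the ages of the ten participants. The variable names are illustrative; the intervals (35-44 up to 85-94) follow Step 1, and each row of the result holds the six columns of the table in the same order.

```python
# Raw data: ages (in years) of the 10 participants in the walking race.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]

n = len(ages)
interval_width = 10

table = []
cumulative_f = 0
# Lower values 35, 45, ..., 85 (Steps 1 and 3).
for lower in range(35, 95, interval_width):
    upper = lower + interval_width - 1                      # Step 4
    f = sum(1 for age in ages if lower <= age <= upper)     # Step 5
    cumulative_f += f                                       # Step 6
    percentage = f * 100 / n                                # Step 7
    cumulative_percentage = cumulative_f * 100 / n          # Step 8
    table.append((lower, upper, f, cumulative_f, percentage, cumulative_percentage))

# Print the highest interval first, as in the table.
for row in reversed(table):
    print(row)

# Answers to the three questions asked about the raw data.
aged_less_than_45 = table[0][3]          # cumulative frequency up to 44
aged_more_than_44 = n - aged_less_than_45
pct_older_than_65 = sum(1 for age in ages if age > 65) * 100 / n
print(aged_less_than_45, aged_more_than_44, pct_older_than_65)
```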
Check Your Progress III
1) How is cumulative frequency obtained?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2) The number of people treated in a local hospital on a daily basis is given
below. Construct the cumulative frequency distribution and cumulative
percentage frequency with class interval 5.
15, 23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26, 30, 16, 19, 22, 14,
17, 21, 28, 9, 16, 13, 11, 16, 20, 1

Class Interval    Tallies    f    Cumulative    Cumulative percentage
                                  frequency     frequency


2.5 PERCENTILE AND PERCENTILE RANKS


There are two terms that are used frequently in the academic and corporate
world: percentile and percentile rank. Both these statistical terms are used as
indicators of performance in comparison to others in a large group. It can be
said that these indicators are relative measures of one’s performance. There are
many tests that report scores in percentiles or percentile ranks. You may have
heard about the Common Admission Test (CAT), a common entrance exam
conducted for MBA admissions in India. This exam gives its result in percentiles.
For example, a student may obtain the 90th percentile in math ability and the
84th percentile in verbal ability.
In this section of the unit, we will discuss the terms percentile and
percentile rank and also learn how to compute them.
Percentile: A percentile can be explained as “a point on the score scale below
which a given percent of cases lie” (Mangal, 2002, page 56). For example, if a
student obtains the 90th percentile (P90), it means that 90% of the students have
scored below him/her, and if the student obtains the 84th percentile (P84), then
84% of the students lie below him/her. Percentiles are expressed in terms of
the percentage of persons in the standardization sample who fall below a given
raw score. A percentile thus shows an individual’s relative position in the
standardization sample. There is a difference between rank and percentile. In
ranks we count from the top, and the best person in the group gets Rank 1.
In percentiles, however, we count from the bottom, and the lower the percentile,
the poorer is an individual’s position in the group. The 50th percentile or P50
corresponds to the median. A score above the 50th percentile denotes above
average performance, while a score below P50 denotes below average performance.
Percentiles are different from percentage scores. Percentage scores are raw
scores expressed in terms of the percentage of correct items, while percentiles
are derived scores.
Advantages of Percentile Scores
1) They are universally applicable.
2) They can be readily understood and are easy to compute, even by untrained persons.
3) They are suitable for any type of test.

Drawbacks of Percentile Scores
1) Percentiles show an individual’s relative position in the normative sample but not how much the individuals’ scores differ from one another.
2) Percentile scores have unequal units, and this is a major drawback.
Computation of percentile: The formula for computation of percentiles is similar to that of the median (Mangal, 2002):
P = L + [(pN/100 - F)/f] × i
Where,
L = The lower limit of the percentile class, that is, the class in which the percentile lies
p = The number of the percentile to be computed (for example, 30 for P30)
N = The total number of frequencies
F = Total of the frequencies that lie below the percentile class
f = Frequency of the percentile class
i = The size of the class interval
Thus, the formula for the 1st percentile would be
P1 = L + [(N/100 - F)/f] × i
The formula for the 10th percentile would be
P10 = L + [(10N/100 - F)/f] × i
= L + [(N/10 - F)/f] × i
And the formula for the 75th percentile would be
P75 = L + [(75N/100 - F)/f] × i
= L + [(3N/4 - F)/f] × i
Let us now compute percentile with the help of an example given in table 2.7.
Table 2.7: Data for computation of Percentile

Class Interval f

25-29 5

20-24 4

15-19 6

10-14 4

5-9 4

0-4 7

N= 30

Now if we want to compute the 30th percentile for the above data, we will compute it with the help of the following steps:
Step 1: Find the class interval within which the 30th percentile will fall. P30 indicates that 30% of the scores lie below this point. Thus, 30% of N = 30 × 30/100 = 9. As we look at the data, the 9th score from below lies in the class interval 5-9, because 7 scores lie in the class interval 0-4 and the next 4 scores lie in the class interval 5-9.
Step 2: Identify L, that is, the lower limit of the percentile class. In this example, it is 4.5, the actual lower limit of the class interval 5-9.
Step 3: Identify F, f and i. In this example, F, the total of the frequencies that lie below the percentile class, is 7; f, the frequency of the percentile class, is 4; and i, the size of the class interval, is 5.
Step 4: Let us now substitute the values in the formula
P30 = L + [(30N/100 - F)/f] × i
= 4.5 + [(30 × 30/100 - 7)/4] × 5
= 4.5 + [(9 - 7)/4] × 5
= 4.5 + (2/4) × 5
= 4.5 + 2.5
= 7
Thus, the obtained P30 is 7, which falls in the class interval 5-9.
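The steps above can also be sketched in a few lines of Python. This is only an illustrative sketch: the function name grouped_percentile and the way the table is stored are assumptions made for this example, and the code assumes whole-number scores, so that the actual lower limit of a class is its lower score limit minus 0.5.

```python
def grouped_percentile(p, classes, width):
    """Return the p-th percentile of a grouped frequency distribution.

    classes: list of (lower_score_limit, frequency), lowest class first.
    width:   size of the class interval (i in the formula).
    """
    n = sum(f for _, f in classes)      # N, total number of cases
    target = p * n / 100                # pN/100
    below = 0                           # F, cases below the current class
    for lower, f in classes:
        if f > 0 and below + f >= target:
            exact_lower = lower - 0.5   # L (assumes whole-number scores)
            return exact_lower + (target - below) / f * width
        below += f
    raise ValueError("p must be between 0 and 100")


# Table 2.7, written with the lowest class first
table_2_7 = [(0, 7), (5, 4), (10, 4), (15, 6), (20, 4), (25, 5)]
p30 = grouped_percentile(30, table_2_7, width=5)
print(p30)  # 7.0
```

The loop does what Step 1 does by inspection: it moves up the classes, adding frequencies, until it reaches the class that contains the required case.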
Percentile Ranks: In statistics, percentile rank refers to the percentage of scores that are identical to or less than a given score. Percentile rank can be explained as “the number representing the percentage of the total number of cases lying below the given score” (Mangal, 2002, page 60). Percentile ranks, like percentages, fall on a continuum from 0 to 100. For example, a percentile rank of 50 indicates that 50% of the scores in a distribution fall at or below that score. Percentile ranks are useful when you want to quickly understand how a specific score compares to the other scores in a distribution. For instance, knowing that someone scored 300 points in an exam does not tell you much. You do not know how many points were possible, and even if you did, you would not know how that person scored compared to the rest of his/her classmates. If, however, you were told that he/she scored at the 95th percentile rank, then you would know that he/she did as well as or better than 95% of his/her class.
Computation of percentile rank: Percentile rank can be computed for ungrouped data as well as grouped data. These computations are discussed as follows with the help of examples:
Computation of percentile rank for ungrouped data: The formula for computation of percentile rank for ungrouped data is:
PR = 100 - (100R - 50)/N
Where,
PR = Percentile Rank
R = The rank position of the person for whom the percentile rank is to be computed
N = Total number of persons in the group
We will now compute percentile rank with the help of the following data:
The marks obtained by 10 students in a psychology test are given as follows:
34, 45, 23, 67, 43, 78, 87, 56, 88, 46
We will now find the percentile rank for the marks 67.
Step 1: The marks are to be arranged in descending order as follows:

Marks Rank order

88 1

87 2

78 3

67 4

56 5

46 6

45 7

43 8

34 9

23 10

Step 2: The rank for the marks is identified. As can be seen above, the rank for marks 67 is 4 and N is 10.
Step 3: Let us now substitute the values in the formula
PR = 100 - (100R - 50)/N
= 100 - (100 × 4 - 50)/10
= 100 - (400 - 50)/10
= 100 - 350/10
= 100 - 35
= 65
Thus, the percentile rank obtained for marks 67 is 65.
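The same computation can be checked with a short Python sketch. The function name percentile_rank_ungrouped is an assumption made for this illustration, and the sketch assumes that no two students have the same marks, so that each score has a single rank.

```python
def percentile_rank_ungrouped(scores, score):
    """PR = 100 - (100R - 50)/N, where R is the rank counted from the top."""
    ordered = sorted(scores, reverse=True)   # descending order, best score first
    rank = ordered.index(score) + 1          # R (assumes no tied scores)
    n = len(ordered)                         # N
    return 100 - (100 * rank - 50) / n


marks = [34, 45, 23, 67, 43, 78, 87, 56, 88, 46]
pr_67 = percentile_rank_ungrouped(marks, 67)
print(pr_67)  # 65.0
```

Sorting in descending order mirrors Step 1, and looking up the position of the score mirrors Step 2.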
Computation of percentile rank for grouped data: There are two methods for computing percentile rank for grouped data: one in which a formula is not required and another in which a formula is used.
We will now compute percentile rank with the help of the following data:

Marks f

90-99 1

80-89 3

70-79 2

60-69 10

50-59 9

40-49 3

30-39 6

20-29 7

10-19 8

0-9 1

N= 50

We will compute the percentile rank for marks 35.


Method 1: Without formula
The steps in this computation are discussed as follows:
Step 1: We know that the marks 35 fall in the class interval 30-39. If we add the frequencies that lie below the upper limit of the class interval 20-29, that is, 29.5, there are (7 + 8 + 1) = 16 cases.
Step 2: We need to find how far the marks 35 lie above this limit. Thus, 35 - 29.5 = 5.5.
Step 3: The frequency of the class interval 30-39 is 6. Thus, the 10 marks of this interval (29.5 to 39.5) are shared by 6 individuals, that is, 6/10 of an individual per mark. The number of cases lying within the 5.5 marks between 29.5 and 35 is therefore 6/10 × 5.5 = 3.3.
Step 4: Thus, up to marks 35, there are 16 + 3.3 = 19.3 cases.
Step 5: To present these cases on a scale of 100, we multiply them by 100/N, where N = 50.
19.3 × 100/50 = 1930/50 = 38.6
Thus, the percentile rank is 38.6, or approximately 39, for marks 35.
Method 2: With formula
The formula for computation of percentile rank for grouped data is:
PR = (100/N) × [F + ((X - L)/i) × f]
Where,
PR = Percentile Rank
F = The cumulative frequency that lies below the class interval that contains X
X = The marks for which the percentile rank is to be computed
L = The lower limit of the class interval that contains X
i = Size of the class interval
f = Frequency of the class interval that contains X
N = Total number of cases in the distribution
We will take the same example discussed above and compute the percentile
rank for marks 35 with the help of the formula.
Step 1: The cumulative frequency below the class interval (30-39) that contains X (35) is 16 (7 + 8 + 1). Thus F is 16.
Step 2: L, that is, the lower limit of the class interval that contains X, is 29.5, i = 10 and f = 6.
Step 3: Let us now substitute the values in the formula
PR = (100/N) × [F + ((X - L)/i) × f]
= (100/50) × [16 + ((35 - 29.5)/10) × 6]
= 2 × [16 + (5.5/10) × 6]
= 2 × [16 + 3.3]
= 2 × 19.3
= 38.6
Thus, the percentile rank is 38.6, or approximately 39, for marks 35.
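The formula method can be sketched in Python as follows. The function name percentile_rank_grouped and the list-of-pairs layout of the table are assumptions made for this illustration, and the sketch again assumes whole-number scores, so that the actual lower limit of a class is its lower score limit minus 0.5.

```python
def percentile_rank_grouped(x, classes, width):
    """PR = (100/N) * [F + ((X - L)/i) * f] for a grouped distribution.

    classes: list of (lower_score_limit, frequency), lowest class first.
    width:   size of the class interval (i in the formula).
    """
    n = sum(f for _, f in classes)           # N
    below = 0                                # F
    for lower, f in classes:
        exact_lower = lower - 0.5            # L (assumes whole-number scores)
        if exact_lower <= x < exact_lower + width:
            return 100 / n * (below + (x - exact_lower) / width * f)
        below += f
    raise ValueError("x lies outside the distribution")


# The marks table used above, written with the lowest class first
marks_table = [(0, 1), (10, 8), (20, 7), (30, 6), (40, 3),
               (50, 9), (60, 10), (70, 2), (80, 3), (90, 1)]
pr_35 = percentile_rank_grouped(35, marks_table, width=10)
print(round(pr_35, 1))  # 38.6
```

The result is rounded only for display, because arithmetic with decimal fractions on a computer can leave a very small rounding error.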
Percentiles and percentile ranks are important in statistics because they not only provide information about the comparative position of an individual in a particular group based on certain characteristics, but also help in comparing individuals in two or more groups or under two or more circumstances or conditions. For example, if a learner from one college obtained 55 marks in psychology and a learner from another college obtained 65 marks, these raw marks cannot be compared directly. If, however, the marks are converted into percentile ranks and both learners are found to have a percentile rank of 60, then a comparison is possible. Percentiles also play an important role in the standardisation of psychological tests, where raw data can be converted to percentiles and interpreted.
Check Your Progress IV
1) What is percentile?
......................................................................................................................
...................................................................................................................... 53
Introduction ......................................................................................................................
......................................................................................................................
......................................................................................................................
2) Compute percentile rank for 22 in the following data:
23, 34, 22, 33, 45, 55, 32, 43, 46, 21

2.6 GRAPHICAL REPRESENTATION OF DATA


All available numerical data can be represented graphically. A graph is the representation of data by using graphical symbols such as lines, bars, pie diagrams, dots etc. A graph represents numerical data in the form of a structure and provides important information to the user of the data.
When organised data are graphically represented, they not only look attractive but are also easier to understand. A large amount of data can be presented in a very concise and attractive manner. Graphs are effective and economical as well. They are also easy to interpret and adequately reflect any comparison between two sets of data.
There are various types of graphs, such as the bar graph, histogram and frequency polygon, that can be effectively used to represent data graphically. However, one must know when to use which graph.
Let us now discuss various types of graphs.

2.6.1 Bar Graph or Bar Diagram


A bar graph is also called a bar diagram. It is the most frequently used graph in statistics. A bar graph consists of rectangles or rectangular bars whose lengths are proportional to the numerical values they represent. The bars may be plotted either horizontally or vertically, depending on the preference of the plotter.

A bar graph or diagram can be easily drawn for raw scores, frequencies, percentages and means (Mangal, 2002).
The following needs to be taken care of while drawing bar graphs (Mangal, 2002):
1) Rules need to be followed with regard to the length of the bars: the lengths or heights of the bars need to be in proportion to the amounts they represent. No rules apply to the width, except that all the bars need to be of equal width.
2) The space between two bars could be around half of the width of a bar, and the space between any two bars should be the same.
The steps followed while drawing a vertical bar graph are as follows:
Step 1: On graph paper, draw the vertical (y axis) and horizontal (x axis) lines. These lines should be perpendicular to each other and need to intersect at 0.
Step 2: Provide adequate labels for the y axis and the x axis.
Step 3: Select a scale for the length of the bars. The scale is usually written at the top right of the bar graph.
Step 4: On the x axis, select a width for the bars as well as a gap between the bars; both need to be uniform.
Step 5: Draw the bars based on your data.
An example of a bar graph or diagram is given in figure 2.1, which is based on table 2.1 and reflects the marks obtained by students in a class test in Psychology of 100 marks. There are 20 students who scored between 1 and 25 marks, 12 who scored between 26 and 50, 40 who scored between 51 and 75 and 28 who scored between 76 and 100.
The bar graph based on table 2.1 will look as follows:
[Figure: vertical bar graph. x axis: Marks obtained (1 to 25, 26 to 50, 51 to 75, 76 to 100); y axis: Number of students (0 to 50); bar heights: 20, 12, 40 and 28]

Fig. 2.1: Bar Graph
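The rule that bar lengths must be proportional to the values can be illustrated with a small Python sketch that prints a horizontal text bar graph for the marks data given above. The function name text_bar_graph and the scale of one '#' for every 2 students are assumptions made for this illustration only.

```python
def text_bar_graph(data, unit=2):
    """Return one text line per category; bar length is proportional to the value.

    unit: how many students one '#' stands for (the scale of the graph).
    Values that are not multiples of unit are rounded down.
    """
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * (value // unit)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines


students = {"1 to 25": 20, "26 to 50": 12, "51 to 75": 40, "76 to 100": 28}
graph = text_bar_graph(students)
print("\n".join(graph))
```

Every bar is drawn to the same scale and each bar occupies one line, so the bars are of equal width, as required by the rules listed above.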


2.6.2 Histogram
A histogram is a bar diagram drawn on the basis of a frequency distribution. The following steps are to be taken while drawing a histogram.
Step 1: A histogram is based on a grouped frequency distribution, which has class intervals. Before drawing a histogram, two more class intervals with zero frequency are added, one below and one above. As can be seen in table 2.8, the frequency distribution originally had 5 class intervals, but two more, one below and one above, have been added.
Step 2: For the histogram, the class intervals are changed to their actual limits, as can be seen in table 2.8 and figure 2.2, where the class interval 10-19 has changed to 9.5-19.5 and so on.
Step 3: On the x axis, the actual lower limits of all the class intervals are plotted, and frequencies are plotted on the y axis.
Step 4: A single rectangle then represents the frequency of each class interval.
Ensure that the height of the graph is around 75% of its width.

Table 2.8: Data for Histogram

Class Intervals (10)    Class Intervals taken for Histogram    Frequencies
70-79                   69.5-79.5                              0
60-69                   59.5-69.5                              5
50-59                   49.5-59.5                              4
40-49                   39.5-49.5                              13
30-39                   29.5-39.5                              12
20-29                   19.5-29.5                              10
10-19                   9.5-19.5                               0
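Steps 1 and 2 for the histogram (adding an empty class at each end and converting the class intervals to their actual limits) can be sketched in Python as follows. The function name histogram_classes is an assumption made for this illustration, and the sketch assumes whole-number scores, so that the actual limits lie 0.5 below and 0.5 above the score limits.

```python
def histogram_classes(classes, width):
    """Add an empty class at each end and convert score limits to actual limits.

    classes: list of (lower_score_limit, frequency), lowest class first.
    Returns a list of (actual_lower, actual_upper, frequency) tuples.
    """
    first_lower = classes[0][0]
    last_lower = classes[-1][0]
    padded = [(first_lower - width, 0)] + classes + [(last_lower + width, 0)]
    return [(lower - 0.5, lower + width - 0.5, f) for lower, f in padded]


# The five original class intervals of table 2.8, lowest class first
original = [(20, 10), (30, 12), (40, 13), (50, 4), (60, 5)]
rows = histogram_classes(original, width=10)
for row in rows:
    print(row)
```

The seven rows printed correspond to the seven class intervals of table 2.8, listed from the lowest class upwards.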

[Figure: histogram. x axis: Actual lower limits of Class Interval (9.5 to 69.5); y axis: Frequencies]

Fig. 2.2: Histogram

2.6.3 Frequency Polygon
A line graph used for plotting a frequency distribution is called a frequency polygon. A frequency polygon can either be constructed directly or be constructed by joining, with straight lines, the midpoints of the upper bases of the rectangles of the histogram (Mangal, 2002), as shown in figure 2.4.
Steps followed while drawing a frequency polygon are as follows:
Step 1: Like the histogram, the frequency polygon is based on a frequency distribution. Here too, before drawing the frequency polygon, two more class intervals are added, one below and one above, as can be seen in table 2.9.
Step 2: For all the class intervals, midpoints are computed.
Step 3: Like every graph, the frequency polygon has an x axis and a y axis. The midpoints are plotted on the x axis and the frequencies are represented on the y axis.
Step 4: The corresponding frequencies of the class intervals are then plotted against the midpoints given on the x axis.
Step 5: These points are then joined to form a line.
Ensure that the height of the graph is around 75% of its width. Once plotted, the frequency polygon will look as given in figure 2.3.
Table 2.9: Data for Frequency Polygon

Class Intervals (10)    Midpoints of Class Intervals    Frequencies
70-79                   74.5                            0
60-69                   64.5                            5
50-59                   54.5                            4
40-49                   44.5                            13
30-39                   34.5                            12
20-29                   24.5                            10
10-19                   14.5                            0
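The midpoints in table 2.9 can be reproduced with a short Python sketch. The function name polygon_points is an assumption made for this illustration. The midpoint of an inclusive class interval such as 20-29 is taken as (20 + 29)/2 = 24.5.

```python
def polygon_points(classes, width):
    """Return (midpoint, frequency) pairs, with an empty class added at each end.

    classes: list of (lower_score_limit, frequency), lowest class first.
    """
    first_lower = classes[0][0]
    last_lower = classes[-1][0]
    padded = [(first_lower - width, 0)] + classes + [(last_lower + width, 0)]
    # An inclusive class runs from `lower` to `lower + width - 1`
    return [((lower + (lower + width - 1)) / 2, f) for lower, f in padded]


# The five original class intervals of table 2.9, lowest class first
table_2_9 = [(20, 10), (30, 12), (40, 13), (50, 4), (60, 5)]
points = polygon_points(table_2_9, width=10)
print(points)
```

Each pair gives the x and y coordinates of one point of the polygon. Because the two added classes have zero frequency, the line starts and ends on the x axis.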

[Figure: frequency polygon. x axis: Midpoints of Class Interval (14.5 to 74.5); y axis: Frequencies]

Fig. 2.3: Frequency Polygon

[Figure: frequency polygon drawn over a histogram. x axis: Actual lower limits of Class Interval]

Fig. 2.4: Frequency Polygon drawn with the help of Histogram

2.6.4 Cumulative Frequency Percentage Curve or Ogive


Cumulative frequency percentages can be plotted in the form of a graph, and this graph is called a cumulative frequency percentage curve or ogive. Such a graph is a line graph. The cumulative frequency percentages are plotted on the y axis and the upper limits of the class intervals are plotted on the x axis. This graph never has a negative slope, and when a certain class interval has zero frequency, the line or curve remains horizontal over that interval.

As was discussed in the section on cumulative frequency distribution, cumulative frequency percentage is computed by multiplying the cumulative frequency by 100/N, where N stands for the total number of frequencies.

The steps to draw a cumulative frequency percentage curve or ogive are as follows:

Step 1: Prepare the frequency distribution table, including the cumulative frequency percentages.

Step 2: Mark the cumulative frequency percentages on the y axis and the upper limits of the class intervals on the x axis.

Step 3: Plot the points representing the cumulative frequency percentage for each class interval.

Step 4: Join the points with the help of a line.

Table 2.10: Data for Cumulative frequency and cumulative frequency percentage

Class Intervals (10)    Upper Limit of Class Intervals    Frequencies    Cumulative frequencies    Cumulative frequency percentage
60-69                   69.5                              10             44                        100
50-59                   59.5                              12             34                        77.27
40-49                   49.5                              13             22                        50
30-39                   39.5                              4              9                         20.45
20-29                   29.5                              5              5                         11.36
10-19                   19.5                              0              0                         0
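The last two columns of table 2.10 can be checked with a short Python sketch. The function name cumulative_percentages is an assumption made for this illustration. The frequencies are entered from the lowest class upwards, because cumulative frequencies are added from the bottom of the distribution.

```python
from itertools import accumulate


def cumulative_percentages(frequencies):
    """Return (cumulative frequency, cumulative frequency percentage) pairs.

    frequencies: class frequencies, lowest class first.
    """
    n = sum(frequencies)                        # N
    cumulative = list(accumulate(frequencies))  # running totals from the bottom
    return [(cf, round(cf * 100 / n, 2)) for cf in cumulative]


# Frequencies of table 2.10 for the classes 10-19 up to 60-69
table_2_10 = [0, 5, 4, 13, 12, 10]
ogive_points = cumulative_percentages(table_2_10)
print(ogive_points)
```

Plotting the percentages against the upper limits 19.5, 29.5 and so on gives the points of the ogive. The values never decrease, which is why the curve never has a negative slope.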

[Figure: ogive. x axis: Upper limits of Class Interval; y axis: cumulative frequency percentage]

Fig. 2.5: Cumulative Frequency Percentage Curve or Ogive

2.6.5 Circle Graph or Pie Chart


A pie chart is also known as a circle graph. A pie chart is a graph consisting of a circle divided into sectors. These sectors illustrate the numerical proportions of the data; each sector of the circle represents one component of the data. This circle graph is called a pie chart because ‘pie’ (π) is a quantity that is considered when the circumference of a circle is determined (Mangal, 2002).
Steps in construction of a pie chart:
Step 1: The data are represented through 360° because the total angle around the centre of a circle is 2π radians, or 360°.
Step 2: The total frequency is taken as equal to 360° and the angle for each component part is then computed. This is done by using the formula:
(Frequency of the component/Total frequency) × 360°
If the components are presented in percentages, then the formula used is
(Percentage value of a particular component/100) × 360°
Step 3: The sectors are then drawn after the angles are determined.
Table 2.11: Data for Pie Chart

Occupation      Number of Individuals    Angle of the circle
Lawyer          5                        5/30 × 360° = 60°
Accountant      6                        6/30 × 360° = 72°
Psychologist    4                        4/30 × 360° = 48°
Engineer        7                        7/30 × 360° = 84°
Doctor          8                        8/30 × 360° = 96°
Total           30                       360°
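The angles in table 2.11 can be verified with a short Python sketch. The function name pie_angles is an assumption made for this illustration.

```python
def pie_angles(counts):
    """Return the sector angle in degrees for each component.

    Angle = (frequency of the component / total frequency) x 360.
    """
    total = sum(counts.values())
    return {name: count * 360 / total for name, count in counts.items()}


occupations = {"Lawyer": 5, "Accountant": 6, "Psychologist": 4,
               "Engineer": 7, "Doctor": 8}
angles = pie_angles(occupations)
print(angles)
print(sum(angles.values()))  # 360.0
```

Checking that the angles add up to 360 is a quick way to confirm that no component has been left out.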

[Figure: pie chart with sectors for Lawyer, Accountant, Psychologist, Engineer and Doctor]

Fig. 2.6: Circle Graph or Pie Chart


Check Your Progress V
1) What care needs to be taken while drawing a bar graph?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2) What is a pie chart?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2.7 LET US SUM UP
In this unit we initially discussed classification and tabulation of qualitative and quantitative data. In descriptive statistics, classification and tabulation of data, whether qualitative or quantitative, are two important functions that help researchers organise the data in a better manner and then subject it to further statistical analysis. Data classification is the method of organising data into groups for its most effective and efficient use. A well-planned data classification system makes vital data easy to find and retrieve whenever required. Tabulation, on the other hand, is the process of insertion of classified data into tabular form. A table is a systematic arrangement of statistical data in rows and columns. We also discussed the key components of tabulation. The significance of classification and tabulation was also highlighted.
Further in this unit, we discussed frequency distribution. A frequency distribution is arranged in a tabular form in which the raw data is organised into class intervals. Frequency distribution can be categorised as relative frequency distribution, cumulative frequency distribution and cumulative relative frequency distribution, which were discussed in the unit with the help of examples. Besides, the two main methods of describing class intervals in a frequency distribution, namely, the exclusive and inclusive methods, were also discussed. The unit then focused on computing frequency distribution for both ungrouped and grouped data. The steps involved in creating a cumulative frequency distribution were also highlighted. Cumulative frequency percentage was also explained in the unit.
Further, the unit focused on the concepts and computation of percentile and percentile rank with the help of examples. A percentile can be explained as a point on the score scale below which a given percent of cases lie, and percentile rank refers to the percentage of scores that are identical to or less than a given score.
The last section of the unit explained the graphical representation of data. A graph is the representation of data that uses graphical symbols such as lines, bars, pie diagrams, dots etc. When organised data are graphically represented, they not only look attractive but are also easier to understand. A large amount of data can be presented in a very concise and attractive manner. Graphs are effective and economical as well. In the present unit, the bar graph, histogram, frequency polygon, cumulative frequency percentage curve or ogive and pie chart were discussed in detail with the help of examples and figures.

2.8 REFERENCES
Kurtz, A. K., & Mayo, S. T. (2012). Statistical Methods in Education and
Psychology. Springer Science & Business Media.
Kurtz, A. K., & Mayo, S. T. (1979). Percentiles and Percentile Ranks. In Statistical Methods in Education and Psychology. New York, NY: Springer.
Miles, J. N. V., & Banyard, P. (2007). Understanding and Using Statistics in
Psychology: A Practical Introduction. London: Sage.
Wright, D. B., & London, K. (2009). First Steps in Statistics (2nd ed.).
London: Sage.
Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics. Sage.
Rosnow, R. L., & Rosenthal, R. (2005). Beginning Behavioural Research: A
Conceptual Primer (5th ed.). Englewood Cliffs, NJ: Pearson/Prentice Hall.
Aron, A., Coups, E. J. & Aron, E. N. (2013). Statistics for Psychology (6th ed.).
Pearson Education

2.9 KEY WORDS


Classification: It is the process of ordering data into homogeneous groups or classes according to some common characteristics present in the data.
Tabulation: It is the process of insertion of classified data into tabular form.
Frequency: It is the number of times a particular variable/individual or observation (obtained marks in our context) occurs in raw data.
Percentiles: These are expressed in terms of the percentage of persons in the standardisation sample who fall below a given raw score. A percentile will show an individual’s relative position in the standardisation sample.
Percentile ranks: These refer to the percentage of scores that are identical to or less than a given score. Percentile ranks, like percentages, fall on a continuum from 0 to 100.

2.10 ANSWERS TO CHECK YOUR PROGRESS


Check Your Progress I
1) What is quantitative data?
Quantitative data states information about quantities, that is, information
that can be measured and written down with numbers.
2) List the merits of classification and tabulation.
The merits of classification and tabulation are as follows:
a) It helps in clarifying the data
b) The data is presented in simple form
c) Comparison is possible between the data
d) Information can be easily referred to
Check Your Progress II
1) What is frequency distribution?
Frequency distribution is a way in which raw data can be classified so as to
provide a clearer understanding of the data.
2) The number of people treated in a local hospital on a daily basis is given below. Construct the frequency distribution table with class interval 5.
15, 23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26, 30, 16, 19, 22, 14, 17,
21, 28, 9, 16, 13, 11, 16, 20, 1

Class Interval    Tallies    f
30-34 / 1
25-29 /// 3
20-24 //// / 6
15-19 //// /// 8
10-14 //// //// 9
5-9 // 2
0-4 / 1
N= 30
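The frequencies in this answer can be checked with a short Python sketch. The function name frequency_table is an assumption made for this illustration, and the sketch assumes that the class intervals begin at multiples of the class size (0-4, 5-9 and so on), as they do in the table above.

```python
from collections import Counter


def frequency_table(scores, width):
    """Return (lower limit, upper limit, frequency) rows, highest class first.

    Assumes class intervals start at multiples of width, e.g. 0-4, 5-9, ...
    """
    # Map every score to the lower limit of its class and count the scores
    counts = Counter(score // width * width for score in scores)
    lowest, highest = min(counts), max(counts)
    return [(lower, lower + width - 1, counts.get(lower, 0))
            for lower in range(highest, lowest - 1, -width)]


patients = [15, 23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26,
            30, 16, 19, 22, 14, 17, 21, 28, 9, 16, 13, 11, 16, 20, 1]
table = frequency_table(patients, width=5)
for row in table:
    print(row)
```

The sketch replaces the tally marks with a count for each class. The frequencies should add up to N = 30.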

Check Your Progress III


1) How is cumulative frequency obtained?
Cumulative frequency is obtained by successively adding all the frequencies from the bottom of the distribution.
2) The number of people treated in a local hospital on a daily basis is given below. Construct the cumulative frequency distribution table with class interval 5.
15, 23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26, 30, 16, 19, 22, 14,
17, 21, 28, 9, 16, 13, 11, 16, 20, 1

Class Interval    f    Cumulative Frequency    Cumulative Percentage Frequency

30-34 1 30 100

25-29 3 29 96.67

20-24 6 26 86.67

15-19 8 20 66.67

10-14 9 12 40

5-9 2 3 10

0-4 1 1 3.33

N= 30

Check Your Progress IV
1) What is percentile?
Percentile can be described as a point on the score scale below which a given
percent of cases lie.
2) Compute percentile rank for 22 in the following data:
23, 34, 22, 33, 45, 55, 32, 43, 46, 21

Data Rank order


55 1
46 2
45 3
43 4
34 5
33 6
32 7
23 8
22 9
21 10

Here the rank (R) for 22 is 9 and N is 10. Thus, PR = 100 - (100 × 9 - 50)/10 = 100 - 85 = 15.
The percentile rank for 22 is 15.


Check Your Progress V
1) What care needs to be taken while drawing a bar graph?
The lengths or heights of the bars in the bar graph need to be in proportion to the amounts they represent, and all the bars need to be of equal width. The space between two bars could be around half of the width of a bar, and the space between any two bars should be the same.
2) What is a pie chart?
A pie chart is a circular graph consisting of a circle that is divided into sectors.

2.11 UNIT END QUESTIONS


1) Explain classification of data with a focus on its objective.
2) Describe the key components of a table.
3) Elucidate percentile and percentile ranks with suitable examples.
4) Describe the bar diagram with a suitable diagram.
5) Discuss the steps involved in drawing a cumulative frequency percentage curve or ogive.