Unit 2
DATA ORGANISATION AND GRAPHICAL REPRESENTATION*
Structure
2.0 Objectives
2.1 Introduction
2.2 Classification and Tabulation of Qualitative and Quantitative Data
2.2.1 Classification
2.2.2 Tabulation
2.0 OBJECTIVES
After reading this unit, you will be able to:
discuss the classification and tabulation of statistical data;
describe the steps in the construction of a frequency distribution;
create a cumulative frequency distribution table;
explain the meaning of percentile and percentile ranks; and
discuss the graphical representation of data.
2.1 INTRODUCTION
The objective of all statistical inquiry is to describe and understand the
population of interest. For example, in an exit poll survey, a news channel
wants to assess the political attitude of the voters, how they are going to vote in
the upcoming election, and what the chances are of the current Government
coming back to power. This information about the population of interest
can be gained from a number of statistical enquiries. Exit poll surveys provide
tentative information about which party will gain what percentage of votes in
which state of India and so on. Such exit poll surveys make use of basic
statistical techniques that can be categorised under descriptive statistics.
* Dr. Vijay Viegas, Assistant Professor, Abbé Faria Post Graduate Department of Psychology,
St. Xavier’s College, Goa
In the previous unit we mainly discussed the term statistics, its definition,
nature and key terms. We also discussed scales of measurement and
the two main categories of statistics, namely descriptive and inferential
statistics. In the present unit, we will mainly focus on the varied aspects of
descriptive statistics, viz., classification, tabulation, organisation and graphical
representation of data. One of the most basic yet important methods, known as
frequency distribution, will also be discussed in this unit. Further, we will
discuss the method of cumulative frequency distribution, percentile, percentile
rank and graphical representation of data.
descriptive statistics. In the present section we will discuss classification and
tabulation of data.
2.2.1 Classification
Data classification is a method of organising data into groups for its most
effective and efficient use. A well-planned data classification system makes
vital data easy to find and retrieve whenever required. In other words, the
process of ordering data into homogeneous groups or classes according to some
common characteristics present in the data is called classification. For
example, during the process of sorting letters in a post office, the letters are
classified according to cities and further arranged according to streets and
other details, so that it becomes easier to deliver the letters to their
destinations.
In the context of research, the data collected by a researcher are arranged in
formats that will help him/her draw conclusions. Basically, classification
involves sorting the data based on similarities. Once the data is classified, the
researcher can proceed with further statistical analysis and decision making.
Some of the main objectives of classification are as follows:
1) The data is presented in a concise form. Raw data as such has no
meaning, but once it is classified, it will reflect some meaning.
2) Classification helps in identifying the similarities and diversities in the
data. For example, based on the marks obtained in an English test,
students can be grouped into those obtaining marks between 76-100, those
obtaining marks between 51-75, those obtaining marks between 26-50 and
those obtaining marks between 1-25. Each of these groups is distinct from
the others in terms of marks obtained, but the students within a group are
placed together because of the similarity of their marks (refer to table 2.1).
Marks obtained    Number of students
76-100            28
51-75             40
26-50             12
1-25              20
A table thus prepared is given below:
Table 2.2: Percentage of male and female students based on marks obtained
by them in English test
......................................................................................................................
Score    f     Percentage    Cumulative frequency    Cumulative percentage
34       3     10%           30                      100%
23       4     13.33%        27                      90%
22       10    33.33%        23                      76.67%
21       6     20%           13                      43.33%
19       7     23.33%        7                       23.33%
N = 30
In frequency distribution there are two main methods to describe class interval.
1) The exclusive method: In this method, the upper limit of a class
interval is the lower limit of the class interval next to it, so there is
continuity between the class intervals. A score equal to the upper limit
of a class interval is excluded from that interval and falls in the class
interval of which it is the lower limit; a score equal to the lower limit
of a class interval is included in it. For example, in a distribution with
class intervals using the exclusive method, a score of 20 will fall in the
class interval 20-30 and not in the class interval 10-20.
2) The inclusive method: In the inclusive method there is no continuity
between the class intervals, and this method is used especially for discrete
scores. In this method, scores equal to both the lower and the upper limit
are included in the class interval. For example, the class intervals will be
1-5, 6-10, 11-15 and so on.
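The difference between the two methods can be illustrated with a short Python sketch. The function names, the starting points (0 and 1) and the widths (10 and 5) are assumptions chosen to match the examples in the text, not part of the original unit.

```python
def exclusive_interval(score, lower=0, width=10):
    """Exclusive method: the interval (low, high) holds scores with low <= score < high."""
    start = lower + ((score - lower) // width) * width
    return (start, start + width)


def inclusive_interval(score, lower=1, width=5):
    """Inclusive method: the interval (low, high) holds scores with low <= score <= high."""
    start = lower + ((score - lower) // width) * width
    return (start, start + width - 1)


# A score of 20 goes to 20-30, not 10-20, under the exclusive method.
print(exclusive_interval(20))
# Under the inclusive method, 5 belongs to 1-5 and 6 belongs to 6-10.
print(inclusive_interval(5), inclusive_interval(6))
```

With the exclusive method the upper limit is shared with the next interval, so only the lower limit is counted; with the inclusive method both limits belong to the interval.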
Frequency distribution can also be categorised into ungrouped or grouped
frequency distribution.
Ungrouped Frequency Distribution: An ungrouped frequency distribution is
the one in which all the values are listed in an ascending or a descending order.
Based on the frequency of occurrence of each score, a tally mark ( / ) is placed
in front of the respective value and frequency (denoted by ‘f’) of each score is
stated in the next column. The example of ungrouped frequency distribution is
given in table 2.4:
Values Tallies f
6 /// 3
9 //// 4
12 //// 5
23 / 1
24 // 2
Table 2.5: Grouped frequency distribution
Values Tallies f
1-5 /// 3
6-10 //// 5
11-15 // 2
16-20 / 1
21-25 / 1
3 8 6 5 6 4 7 6
5 3 5 6 3 5 4 4
3 6 7 8 1 10 7 6
4 5 0 7 6 5 6 7
1 7 5 4 5 8 5 7
These numbers (marks of the students) are called raw data, as they are
obtained directly from the field and have not gone through any statistical
analysis. Now the question is, what do these numbers or raw data suggest about
the target population of students? Which marks are most common? How many
students got the highest marks? How many students passed this test? From the
raw data alone, it is difficult to draw any conclusion. Thus, we need to create
a frequency distribution on the basis of the raw scores. Frequency can be
calculated for each of the scores obtained by the students.
Frequency is the number of times a particular variable/ individual or
observation (obtained marks in our context) occurs in raw data.
The distribution of a variable is the pattern of frequencies of the observation.
Frequency distributions are portrayed as frequency tables, histograms,
or polygons. It is just the arrangement of scores and the frequency of
occurrence within a group. A frequency distribution table is one way you can
organise data so that it makes more sense to the reader.
As discussed earlier, there are two major types of frequency distribution,
grouped frequency distribution and ungrouped frequency distribution. The
computation of both these frequency distributions is discussed as follows:
2.3.1 Computation of Ungrouped Frequency Distribution
To calculate frequency we are going to use the Tally Score Method: “This
method consists of making a stroke in the proper class for each observation and
summing these for each class to obtain the frequency. It is customary for
convenience in counting to place each fifth stroke through the preceding
four . . .” (Lawal, 2014, page 13). The frequency can be tabulated as follows
(based on the example of marks obtained by forty students):
Marks    Tallies       f
0        /             1
1        //            2
2                      0
3        ////          4
4        ////          5
5        //// ////     9
6        //// ///      8
7        //// //       7
8        ///           3
9                      0
10       /             1
∑ = 40
Please note that the total (∑) should be equal to the number of students, that is,
40. Now, we can draw the following information from the frequency table:
Only one student got full marks.
The most common mark is five, followed by six.
Only one student scored zero on the test.
The steps involved in creating an ungrouped frequency distribution are as
follows:
Step 1: Arrange your raw data in an array, in ascending or descending order.
Step 2: Make a table with three columns and name them as variable (that is,
marks in the case of the present example), tallies and frequency.
Step 3: Enter your variables (marks in case of this example) in the first column
from lowest to highest order.
Step 4: Now, go one by one through your raw data and make a mark (/) for
each variable next to its value in the second column of your table.
Step 5: Count the tally marks for each variable and write the total in the third
column, that is, the frequency column.
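The five steps above can be sketched in Python using the forty marks from the example. The variable names are illustrative, and `collections.Counter` stands in for the manual tallying; plain strokes are printed without the customary crossing of every fifth stroke.

```python
from collections import Counter

# Raw data: marks obtained by the forty students.
marks = [
    3, 8, 6, 5, 6, 4, 7, 6,
    5, 3, 5, 6, 3, 5, 4, 4,
    3, 6, 7, 8, 1, 10, 7, 6,
    4, 5, 0, 7, 6, 5, 6, 7,
    1, 7, 5, 4, 5, 8, 5, 7,
]

counts = Counter(marks)

# List every mark from lowest to highest, including marks nobody obtained.
table = [(mark, counts.get(mark, 0)) for mark in range(min(marks), max(marks) + 1)]

print("Marks  Tallies      f")
for mark, f in table:
    print(f"{mark:<6} {'/' * f:<12} {f}")
print("Total", sum(f for _, f in table))
```

The final total serves as the same check the text recommends: it should equal the number of students.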
12 7 13 14 12 23 21 14 13 23
30 12 1 21 23 21 23 21 5 21
11 22 30 14 4 17 35 24 13 17
Step 1: Find the range. In the case of our example, the lowest value
is 1 and the highest value is 35. Range = Highest Score - Lowest Score
(R = H - L)
Thus, R = 35 - 1 = 34.
Step 2: The class interval can be derived by dividing the range by the number of
categories that we need.
i = Range / Number of categories needed
In our example, the range is 34 and the total number of scores
(number of students) is 30. Thus, around 6 categories would be sufficient.
Thus,
i = 34 / 6 = 5.7, which can be rounded off to 6.
While creating categories, ensure that not more than 10 categories are created
if there are approximately 50 scores, not more than around 10 to 15 categories
if there are between 50 and 100 scores, and not more than 20 categories
if there are more than 100 scores (Mangal, 2002). Make sure you
have a few items in each category. For example, if you have 20 items, choose 5
classes (4 items per category), not 20 classes (which would give you only 1
item per category).
It is sometimes possible that the ‘i’ obtained is not a whole number. In such a
situation, a number nearest to this obtained number can be taken. For example
if ‘i’ is obtained as 5.8 then 6 can be taken being the nearest number.
It is also possible that the class interval or ‘i’ is finalised before the number of
categories are decided. For convenience, the class interval of 10, 5, 2, for
example, can be taken.
Thus, the class interval can be derived in either of the ways mentioned above.
Step 3: Frequency distribution table can now be created. The following is to be
done to create a frequency distribution table:
a) For this a table with three columns is to be created with variable (that is,
marks in the case of the present example), tallies and frequency (this is
similar to the steps followed in creating an ungrouped frequency
distribution).
b) Then enter your variables in the first column.
c) Go through your raw data and make a mark (/) for each variable next to
its value in the second column of your table.
d) Count the tally marks for each variable and write its total in third column,
that is, frequency column.
Class Interval    Tallies        f
31-36             /              1
25-30             //             2
19-24             //// //// /    11
13-18             //// ///       8
7-12              ////           5
1-6               ///            3
Total                            30
Step 4: Totalling the frequencies. All the frequencies in the third column are
totalled and the number thus achieved needs to be equal to the total number of
scores. In case of our example, N = 30 and the total of frequencies is also 30.
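As a cross-check on Steps 1 to 4, the grouped distribution can be computed directly from the thirty raw scores. This is a minimal sketch: the function name is hypothetical, the number of categories (6) is the choice made in the text, and the inclusive method is assumed for the class intervals.

```python
# Raw data: the thirty scores used in the example.
scores = [
    12, 7, 13, 14, 12, 23, 21, 14, 13, 23,
    30, 12, 1, 21, 23, 21, 23, 21, 5, 21,
    11, 22, 30, 14, 4, 17, 35, 24, 13, 17,
]


def grouped_frequency(values, categories):
    """Return (range, class width, [(lower, upper, f), ...]) using inclusive intervals."""
    low, high = min(values), max(values)
    value_range = high - low                      # Step 1: R = H - L
    width = round(value_range / categories)       # Step 2: i, rounded to a whole number
    table = []
    start = low
    while start <= high:                          # Step 3: count scores per interval
        end = start + width - 1
        f = sum(1 for v in values if start <= v <= end)
        table.append((start, end, f))
        start += width
    return value_range, width, table


value_range, width, table = grouped_frequency(scores, categories=6)
print("Range:", value_range, "Class interval:", width)
for lower, upper, f in reversed(table):
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(f for _, _, f in table))      # Step 4: should equal N
```

Counting programmatically is a convenient way to verify a hand-made tally, since the total alone will not reveal a score placed in the wrong interval.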
Check Your Progress II
1) What is frequency distribution?
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
2) The number of people treated in a local hospital on a daily basis is given
below, construct the frequency distribution table with class interval 5. 15,
23, 12, 10, 28, 7, 12, 17, 20, 21, 18, 13, 11, 12, 26, 30, 16, 19, 22, 14, 17,
21, 28, 9, 16, 13, 11, 16, 20, 1
Class Interval Tallies f
Class Interval    f    Cumulative frequency    Percentage    Cumulative percentage
85-94             1    10                      10            100
75-84             2    9                       20            90
65-74             2    7                       20            70
55-64             2    5                       20            50
45-54             2    3                       20            30
35-44             1    1                       10            10
N = 10
Drawbacks of Percentile Scores
1) Percentiles show an individual's relative position in the normative scores, but
not how the individuals' scores compare with one another.
2) Percentile scores have unequal units, and this is a major drawback.
Computation of percentile: Percentile can be computed as follows:
The formula for computation of percentiles is similar to that of the median
(Mangal, 2002).
P = L + [(pN/100 - F)/f] × i
Where,
L = The lower limit of the percentile class, that is, the class where the
percentile may lie
p = The number of the percentile for which the calculation is to be carried out
N = The total number of frequencies
F = The total of the frequencies that exist before the percentile class
f = The frequency of the percentile class
i = The size of the class interval
Thus, the formula for the 1st percentile would be
P1 = L + [(N/100 - F)/f] × i
And the formula for the 10th percentile would be
P10 = L + [(10N/100 - F)/f] × i
= L + [(N/10 - F)/f] × i
And the formula for the 75th percentile would be
P75 = L + [(75N/100 - F)/f] × i
= L + [(3N/4 - F)/f] × i
Let us now compute percentile with the help of an example given in table 2.7.
Table 2.7: Data for computation of Percentile
Class Interval f
25-29 5
20-24 4
15-19 6
10-14 4
5-9 4
0-4 7
N= 30
Now if we want to compute the 30th percentile for the above data, we will do so
with the help of the following steps:
Step 1: Find the class interval within which the 30th percentile will fall. P30
indicates that 30% of the scores lie below this point. Thus, 30% of N = 30 ×
30/100 = 9. Now as we look at the data, the 9th score from below lies in the
class interval 5-9.
Step 2: L, that is, the lower limit of the percentile class or the class where the
percentile may fall, is identified. In the case of this example, it will be 4.5,
which is the exact lower limit of the class interval 5-9.
Step 3: In the case of this example, F, that is, the total of the frequencies that
exist before the percentile class, is 7; f, that is, the frequency of the percentile
class, is 4; and i, the size of the class interval, is 5.
Step 4: Let us now substitute the values in the formula
P30 = L + [(30N/100 - F)/f] × i
= 4.5 + [(30 × 30/100 - 7)/4] × 5
= 4.5 + [(9 - 7)/4] × 5
= 4.5 + (2/4) × 5
= 4.5 + 2.5
= 7
Thus, the obtained P30 is 7, which falls in the class interval 5-9.
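The same computation can be sketched in Python. The function name and the representation of the table as (lower, upper, f) tuples are assumptions made for illustration; the formula is the one given above, with the exact lower limit taken as 0.5 below the stated lower limit.

```python
def grouped_percentile(p, intervals):
    """Percentile for grouped data: P = L + [(pN/100 - F)/f] * i.

    intervals: (lower, upper, f) tuples in ascending order, inclusive limits.
    """
    n = sum(f for _, _, f in intervals)          # N
    target = p * n / 100                         # pN/100
    cumulative = 0                               # F
    for lower, upper, f in intervals:
        if f > 0 and cumulative + f >= target:
            exact_lower = lower - 0.5            # L
            width = upper - lower + 1            # i
            return exact_lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must be between 0 and 100")


# Table 2.7, listed from the lowest class interval upwards.
table_2_7 = [(0, 4, 7), (5, 9, 4), (10, 14, 4), (15, 19, 6), (20, 24, 4), (25, 29, 5)]
p30 = grouped_percentile(30, table_2_7)
print(p30)  # 7.0
```

Listing the intervals in ascending order lets the running total play the role of F when the percentile class is reached.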
Percentile Ranks: In statistics, percentile rank refers to the percentage of
scores that are identical to or less than a given score. Percentile rank can be
explained as “the number representing the percentage of the total number of
cases lying below the given score” (Mangal, 2002, page 60). Percentile ranks,
like percentages, fall on a continuum from 0 to 100. For example, a percentile
rank of 50 indicates that 50% of the scores in a distribution of scores fall at or
below the score at the 50th percentile. Percentile ranks are beneficial when you
want to quickly understand how a specific score compares to the other scores
in a distribution. For instance, knowing that someone scored 300 points in an
exam does not tell you much. You do not know how many points were possible, and
even if you did, you would not know how that person scored compared to the
rest of his/her classmates. If, however, you were told that he/she scored at the
95th percentile rank, then you would know that he/she did as well as or better
than 95% of his/her class.
Computation of percentile rank: Percentile rank can be computed for an
ungrouped data as well as grouped data. These computations have been
discussed as follows with the help of examples:
Computation of Percentile rank for ungrouped data: The formula for
computation of percentile rank for ungrouped data is:
PR = 100 - (100R - 50)/N
Where,
PR = Percentile Rank
R = The rank position of the person for whom the percentile rank is to be
computed.
N = Total number of persons in the group.
We will now compute percentile rank with the help of the following data:
The marks obtained by 10 students in a psychology test are given as follows:
34, 45, 23, 67, 43, 78, 87, 56, 88, 46
We will now find percentile rank for the marks 67.
Step 1: The marks are to be arranged in descending order as follows:
Marks    Rank
88       1
87       2
78       3
67       4
56       5
46       6
45       7
43       8
34       9
23       10
Step 2: The rank for the marks is identified. As can be seen above, the rank for
marks 67 is 4 and N is 10.
Step 3: Let us now substitute the values in the formula
PR = 100 - (100R - 50)/N
= 100 - (100 × 4 - 50)/10
= 100 - (400 - 50)/10
= 100 - 350/10
= 100 - 35
= 65
Thus, the percentile rank obtained for marks 67 is 65.
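The three steps can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes there are no tied scores, as in the example.

```python
def percentile_rank(score, scores):
    """Percentile rank for ungrouped data: PR = 100 - (100R - 50)/N."""
    ordered = sorted(scores, reverse=True)   # Step 1: descending order
    rank = ordered.index(score) + 1          # Step 2: rank R (assumes no ties)
    n = len(scores)                          # N
    return 100 - (100 * rank - 50) / n       # Step 3: substitute in the formula


marks = [34, 45, 23, 67, 43, 78, 87, 56, 88, 46]
print(percentile_rank(67, marks))  # 65.0
```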
Computation of Percentile rank for grouped data: There are two methods for
computing percentile rank for grouped data. One does not require a formula as
such, and the other does.
We will now compute percentile rank with the help of the following data:
Marks f
90-99 1
80-89 3
70-79 2
60-69 10
50-59 9
40-49 3
30-39 6
20-29 7
10-19 8
0-9 1
N= 50
F= The cumulative frequency that lies below the class interval that consists
of X
X= The marks for which the percentile rank is to be computed.
L= The lower limit of the class interval that consists of X
i= Size of the class interval
f= Frequency of the class interval that consists of X
N= Total number of cases in the distribution
We will take the same example discussed above and compute the percentile
rank for marks 35 with the help of the formula.
Step 1: The cumulative frequency below the class interval (30-39) that consists
of X (35) is 16 (7 +8 +1). Thus F is 16.
Step 2: L, that is, the lower limit of the class interval that consists of X, is 29.5,
i = 10 and f = 6.
Step 3: Let us now substitute the values in the formula
PR = (100/N) × [F + ((X - L)/i) × f]
= (100/50) × [16 + ((35 - 29.5)/10) × 6]
= 2 × [16 + (5.5/10) × 6]
= 2 × [16 + 3.3]
= 2 × 19.3
= 38.6
Thus, the percentile rank is 38.6, or approximately 39, for marks 35.
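The formula method can be sketched in Python with the same data. The function name and the (lower, upper, f) representation of the table are illustrative assumptions; the exact lower limit is taken as 0.5 below the stated lower limit, as in the worked example.

```python
def grouped_percentile_rank(x, intervals):
    """Percentile rank for grouped data: PR = (100/N) * [F + ((X - L)/i) * f].

    intervals: (lower, upper, f) tuples in ascending order, inclusive limits.
    """
    n = sum(f for _, _, f in intervals)          # N
    cumulative = 0                               # F
    for lower, upper, f in intervals:
        if lower <= x <= upper:
            exact_lower = lower - 0.5            # L
            width = upper - lower + 1            # i
            return 100 / n * (cumulative + (x - exact_lower) / width * f)
        cumulative += f
    raise ValueError("x lies outside the class intervals")


# Marks table from the example, listed from the lowest class interval upwards.
marks_table = [
    (0, 9, 1), (10, 19, 8), (20, 29, 7), (30, 39, 6), (40, 49, 3),
    (50, 59, 9), (60, 69, 10), (70, 79, 2), (80, 89, 3), (90, 99, 1),
]
pr_35 = grouped_percentile_rank(35, marks_table)
print(round(pr_35, 1))  # 38.6
```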
Percentile and percentile rank can be termed as important in statistics as they
not only provide information about the comparative position of an individual in
a particular group based on certain characteristics, but they also help in
comparing individuals in two or more groups or under two or more
circumstances or conditions. For example, if a learner from one college
obtained 55 marks in psychology and another learner from another college
obtained 65 marks, these cannot be compared, but if these marks are converted
into percentile ranks and it is then stated that both are at the 60th percentile
rank, then a comparison is possible. Percentiles also play an important role in
standardisation of psychological tests where the raw data can be converted to
percentiles and interpreted.
Check Your Progress IV
1) What is percentile?
......................................................................................................................
...................................................................................................................... 53
Introduction ......................................................................................................................
......................................................................................................................
......................................................................................................................
2) Compute percentile rank for 22 in the following data:
23, 34, 22, 33, 45, 55, 32, 43, 46, 21
Bar graph or diagram can be easily drawn for raw scores, frequencies,
percentages and mean (Mangal, 2002).
The following needs to be taken care of while drawing bar graphs (Mangal,
2002):
1) Rules need to be followed with regard to the length of the bars. Though no
rules apply to the width, all the bars need to be of equal width.
The lengths or heights of the bars need to be in proportion
to the amounts of the variable they represent.
2) The space between two bars could be around half of the width of a bar,
and the space between any two bars should be the same.
The steps followed while drawing a vertical bar graph are as follows:
Step 1: On a graph paper draw the vertical (y axis) and horizontal (x axis)
lines. These lines should be perpendicular to each other and need to intersect at
0.
Step 2: Provide adequate labels to the y axis and x axis.
Step 3: A scale needs to be selected for the length of the bars that is usually
written on the extreme right at the top of the bar graph.
Step 4: On x axis, we need to select a width for the bars as well as the gap
between the bars that needs to be uniform.
Step 5: Based on your data you may then draw the graph.
An example of bar graph or diagram is given in figure 2.1, which is based on
table 2.1, which reflects the marks obtained by students in a class test in
Psychology of 100 marks. There are 20 students who scored between 1 and 25
marks, 12 who scored between 26 and 50, 40 who scored between 51 and 75,
and 28 who scored between 76 and 100.
The bar graph based on table 2.1 will look as follows:
[Figure 2.1: Bar graph. x axis: Marks obtained (1 to 25, 26 to 50, 51 to 75, 76 to 100); y axis: Number of students (20, 12, 40 and 28 respectively)]
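The rule that bar lengths must be in proportion to the amounts they represent can be illustrated with a small text-based sketch in Python. It draws horizontal rather than vertical bars, and the scale of one character per two students is an arbitrary assumption made for display only.

```python
# Number of students in each marks group (from table 2.1).
students = {"1 to 25": 20, "26 to 50": 12, "51 to 75": 40, "76 to 100": 28}


def bar_lengths(data, scale):
    """Length of each bar in characters: one character per `scale` units."""
    return {label: value // scale for label, value in data.items()}


lengths = bar_lengths(students, scale=2)
for label, length in lengths.items():
    # Every bar uses the same symbol and line height, so only the length varies.
    print(f"{label:<10} {'#' * length} {students[label]}")
```

Because every bar is scaled by the same factor, the bar for 40 students is exactly twice as long as the bar for 20 students.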
[Figure: Histogram. x axis: Actual lower limits of Class Interval (9.5 to 69.5); y axis: Frequencies]
2.6.3 Frequency Polygon
A line graph used for plotting a frequency distribution is called a frequency
polygon. A frequency polygon can either be constructed directly or be
constructed by drawing straight lines through the midpoints of the upper bases
of the bars of the histogram (Mangal, 2002), as shown in figure 2.4.
Steps followed while drawing a frequency polygon are as follows:
Step 1: A frequency polygon is based on a frequency distribution. Before
drawing a frequency polygon, two more class intervals are added, one below and
one above the existing class intervals, each with a frequency of zero, as can
be seen in table 2.9.
Step 2: For all the class intervals, midpoints are computed.
Step 3: Like every graph, frequency polygon also has x axis and y axis. On x
axis, the midpoints are to be plotted and the frequencies will be represented on
the y axis.
Step 4: The corresponding frequencies of the class intervals are then plotted
based on the midpoints given on x axis.
Step 5: These points are then joined to form a line.
Ensure that the height of the graph is around 75% of its width. Once plotted, the
frequency polygon will look as given in figure 2.3.
Table 2.9: Data for Frequency Polygon
Class Intervals (10)    Midpoints of Class Intervals    Frequencies
70-79                   74.5                            0
60-69                   64.5                            5
50-59                   54.5                            4
40-49                   44.5                            13
30-39                   34.5                            12
20-29                   24.5                            10
10-19                   14.5                            0
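The points to be plotted for a frequency polygon can be derived as in the following Python sketch. The function name is hypothetical; it adds the two zero-frequency class intervals and computes the midpoints, which gives the (midpoint, frequency) pairs to be joined by straight lines.

```python
def polygon_points(intervals):
    """Return (midpoint, frequency) pairs, padded with an empty interval at each end.

    intervals: (lower, upper, f) tuples in ascending order, inclusive limits.
    """
    first_lower, first_upper, _ = intervals[0]
    last_upper = intervals[-1][1]
    width = first_upper - first_lower + 1
    padded = (
        [(first_lower - width, first_lower - 1, 0)]   # extra interval below
        + list(intervals)
        + [(last_upper + 1, last_upper + width, 0)]   # extra interval above
    )
    return [((lower + upper) / 2, f) for lower, upper, f in padded]


# Class intervals with non-zero frequencies from table 2.9, lowest first.
data = [(20, 29, 10), (30, 39, 12), (40, 49, 13), (50, 59, 4), (60, 69, 5)]
points = polygon_points(data)
for midpoint, f in points:
    print(midpoint, f)
```

Padding with empty intervals makes the line start and end on the x axis, which closes the polygon.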
[Figure 2.3: Frequency polygon. x axis: Midpoints of Class Interval (14.5 to 74.5); y axis: Frequencies]
Step 2: Plot the cumulative frequency percentage on y axis and the upper
limits of class interval on x axis.
Step 3: Plot the points representing the cumulative frequency percentage for
each class interval.
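The values needed for these plotting steps, namely the exact upper limit of each class interval and its cumulative frequency percentage, can be computed as in the sketch below. The function name is hypothetical, and the frequencies are those of the table with class intervals 0-4 to 30-34 (N = 30) given near the end of this unit, used here only for illustration.

```python
from itertools import accumulate


def ogive_points(intervals):
    """Return (exact upper limit, cumulative frequency, cumulative percentage) per interval.

    intervals: (lower, upper, f) tuples in ascending order, inclusive limits.
    """
    n = sum(f for _, _, f in intervals)
    cumulative = accumulate(f for _, _, f in intervals)
    return [
        (upper + 0.5, cf, round(100 * cf / n, 2))
        for (_, upper, _), cf in zip(intervals, cumulative)
    ]


data = [(0, 4, 1), (5, 9, 2), (10, 14, 9), (15, 19, 8), (20, 24, 6), (25, 29, 3), (30, 34, 1)]
points = ogive_points(data)
for upper_limit, cf, percentage in points:
    print(upper_limit, cf, percentage)
```

The last cumulative frequency should equal N and the last cumulative percentage should be 100, which is a quick check on the arithmetic.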
Table 2.10: Data for Cumulative frequency and cumulative frequency percentage
Class Intervals (10)    Upper Limit of Class Intervals    Frequencies    Cumulative frequencies    Cumulative frequency percentage
40-49                   49.5                              13             22                        50
10-19                   19.5                              0              0                         0
Total 30 360º
[Figure: Pie chart with sectors labelled Lawyer, Doctor, Accountant, Engineer and Psychologist]
2.8 REFERENCES
Kurtz, A. K., & Mayo, S. T. (2012). Statistical Methods in Education and
Psychology. Springer Science & Business Media.
Kurtz, A. K., & Mayo, S. T. (1979). Percentiles and Percentile Ranks. In
Statistical Methods in Education and Psychology. New York, NY: Springer.
Miles, J. N. V., & Banyard, P. (2007). Understanding and Using Statistics in
Psychology: A Practical Introduction. London: Sage.
Wright, D. B., & London, K. (2009). First Steps in Statistics (2nd ed.).
London: Sage.
Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics. Sage.
Rosnow, R. L., & Rosenthal, R. (2005). Beginning Behavioural Research: A
Conceptual Primer (5th ed.). Englewood Cliffs, NJ: Pearson/Prentice Hall.
Aron, A., Coups, E. J. & Aron, E. N. (2013). Statistics for Psychology (6th ed.).
Pearson Education
Class Interval    f    Cumulative frequency    Cumulative percentage
30-34             1    30                      100
25-29             3    29                      96.67
20-24             6    26                      86.67
15-19             8    20                      66.67
10-14             9    12                      40
5-9               2    3                       10
0-4               1    1                       3.33
N = 30
Check Your Progress IV
1) What is percentile?
Percentile can be described as a point on the score scale below which a given
percent of cases lie.
2) Compute percentile rank for 22 in the following data:
23, 34, 22, 33, 45, 55, 32, 43, 46, 21