Unit 3 Statistical Graphics
STATISTICAL GRAPHICS /
EXPLORATORY DATA ANALYSIS
Written By:
Miss Sumbal Asghar
Reviewed By:
Dr. Rizwan Akram Rana
Introduction
Data are presented graphically to make them easier to interpret. Facts and figures on
their own rarely catch our attention unless they are presented in an interesting way.
Graphical representation is one of the most commonly used modes of presenting data.
The purpose of this unit is to make you familiar with this mode of presentation.
Objectives
After reading this unit, you will be able to explain:
1. Bar Chart
2. Pictograms
3. Histogram
4. Frequency Polygon or Ogive
5. Scatter Plot
6. Box Plot
7. Pie Chart
3.1 Bar Chart
Data for a bar chart are entered in columns. Each numeric data value becomes a bar. The
chart is constructed so that the lengths of the bars are proportional to the size of the
categories they represent. In a vertical bar chart the x-axis represents the categories
and has no scale, while the y-axis has a scale and indicates the units of measurement.
In a horizontal bar chart the roles of the two axes are reversed.
The following figures show the results of a student in the first, second and third terms
in the subjects of English, Urdu, Mathematics and Pak-Studies.
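[Figure: Vertical bar chart. X-axis: English, Urdu, Maths, Pak Studies. Y-axis: scale from 0 to 120. Bars: 1st Term, 2nd Term, 3rd Term.]
[Figure: Horizontal bar chart of the same data. Y-axis: English, Urdu, Maths, Pak Studies. X-axis: scale from 0 to 120. Bars: 1st Term, 2nd Term, 3rd Term.]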
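The proportionality rule described above can be illustrated with a minimal sketch that draws a horizontal bar chart as text. The scores are hypothetical (they are not read from the figure), and the function name text_bar_chart and the choice of one symbol per 5 marks are assumptions made only for this illustration.

```python
# Hypothetical scores for one term; these values are NOT taken from the figure.
scores = {"English": 70, "Urdu": 85, "Maths": 60, "Pak Studies": 90}


def text_bar_chart(data, units_per_symbol=5):
    """Return one text line per category; bar length is proportional to the value."""
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        # One '#' stands for `units_per_symbol` marks (an assumed scale).
        bar = "#" * round(value / units_per_symbol)
        lines.append(f"{name.ljust(label_width)} | {bar} {value}")
    return lines


chart = text_bar_chart(scores)
print("\n".join(chart))
```

The category axis carries only labels, while the length of each bar is read against the numeric scale, which is the distinction the paragraph above draws between the two axes.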
3.1.1 Advantages and Disadvantages of Bar Charts
Following are the advantages of bar charts.
i) They show each data category in a frequency distribution.
ii) They display the relative numbers / proportions of multiple categories.
iii) They summarize a large amount of data in an easily interpretable manner.
iv) They make trends easier to highlight than tables do.
v) They allow estimates to be made quickly and accurately.
vi) They are easily accessible to everyone.
3.2 Pictograms
A pictogram is a graphical symbol that conveys its meaning through its pictorial
resemblance to a physical object. A pictogram may include a symbol plus graphic
elements such as a border, background pattern, or color that are intended to convey
specific information. We can also say that a pictogram is a kind of graph that uses
pictures instead of bars to represent the data under analysis. A pictogram is also called
a “pictograph”, or simply a “picto”.
Pictograms form a part of our daily lives. They are used in transport, medication,
education, computers, etc. They indicate, in iconic form, places, directions, actions or
constraints on actions, either in the real world (a road, a town, etc.) or in the virtual
world (computer, internet, etc.).
3.2.1 Advantages and Drawbacks of Pictograms
Following are the advantages of pictograms:
i) Pictograms can make warnings more eye-catching.
ii) They can serve as an “instant reminder” of a hazard or an established message.
iii) They may improve warning comprehension for those with visual or literacy
difficulties.
iv) They have the potential to be interpreted more accurately and more quickly than words.
v) They can be recognized and recalled far better than words.
vi) They can improve the legibility of warnings.
vii) They may be better when undertaking familiar routine tasks.
Example
The following table shows the number of laptops sold by a company for the months
January to March. Construct a pictograph for the table.
Month                 January    February    March
Number of laptops     25         15          20
Solution:
January     5 symbols
February    3 symbols
March       4 symbols
Each symbol represents 5 laptops.
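The construction of this pictograph can be sketched in a few lines of code. The sales figures and the key of 5 laptops per symbol come from the example; the function name pictograph_rows and the "@" character standing in for the laptop picture are assumptions made for illustration. The sketch only handles counts that are exact multiples of the key.

```python
# Data from the example table; one symbol represents 5 laptops.
sales = {"January": 25, "February": 15, "March": 20}
LAPTOPS_PER_SYMBOL = 5


def pictograph_rows(data, per_symbol, symbol="@"):
    """Map each label to a row of symbols; '@' stands in for the picture."""
    rows = {}
    for label, count in data.items():
        whole, remainder = divmod(count, per_symbol)
        if remainder:
            # Partial symbols are outside the scope of this simple sketch.
            raise ValueError(f"{label}: {count} is not a multiple of {per_symbol}")
        rows[label] = symbol * whole
    return rows


rows = pictograph_rows(sales, LAPTOPS_PER_SYMBOL)
for label, row in rows.items():
    print(f"{label.ljust(8)} {row}")
print(f"@ represents {LAPTOPS_PER_SYMBOL} laptops")
```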
Example
[Figure: School subject pictogram. Source: www.kids-pages.com]
3.3 Histogram
A histogram is a type of graph that provides a visual interpretation of numerical data by
showing the number of data points that lie within each range of values. These ranges
are called classes or bins.
A histogram looks similar to a bar chart. Both are ways of displaying a data set. The
height of each bar corresponds to the relative frequency of the data in its class: the
higher the bar, the greater the frequency, and vice versa. The main difference between
the two graphs is the level of measurement of the data. Bar graphs are used for data at
the nominal level of measurement; they show the frequency of categorical data.
Histograms, on the other hand, are used for data that are at least at the ordinal level of
measurement. As a common practice the bars of a bar graph are rearranged in order of
decreasing height. The bars of a histogram, however, cannot be rearranged. They must be
displayed in the order in which the classes occur.
A bar graph presents actual counts against categories. The height of each bar indicates
the number of items in that category. A histogram instead groups numerical data into
bins. When you create a histogram, you are in effect creating a bar graph that shows how
many data points fall within each range (interval), called a bin.
There are no hard and fast rules about how many bins there should be, but the rule of
thumb is 5-20 bins. Fewer than 5 bins carry little meaning, and more than 20 bins make
the data hard to read and interpret. Ideally 5-7 bins are enough.
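Grouping data into bins can be sketched as follows. The data set is hypothetical, and the function name histogram_counts, the starting point of 10 and the bin width of 10 are assumptions chosen so that the number of bins (6) falls within the rule of thumb given above.

```python
# Hypothetical test scores, used only to illustrate binning.
values = [12, 15, 21, 22, 27, 33, 35, 36, 38, 41, 44, 47, 52, 58, 63]


def histogram_counts(data, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in data:
        index = int((v - start) // width)
        # Values outside the covered range are ignored in this sketch.
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


counts = histogram_counts(values, start=10, width=10, n_bins=6)
for i, count in enumerate(counts):
    low = 10 + i * 10
    print(f"{low}-{low + 9}: {'*' * count}")
```

Unlike the bars of a bar chart, the rows printed here must stay in the order of the classes, because the bins lie on a numerical scale.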
ii) Bimodal
A bimodal shape, shown below, has two peaks. This shape may show that the data
has come from two different systems. Often in a single system, there may be two
modes in the data set.
iii) Skewed right
Some histograms will show a distribution skewed to the right, as shown below. A
distribution skewed to the right is said to be positively skewed. This kind of
distribution has a large number of occurrences in the lower value cells (left side)
and few in the upper value cells (right side). A skewed distribution can result when
data are gathered from a system that has a boundary such as zero. In other words, all
the collected data have values greater than zero.
v) Uniform
A uniform distribution, as shown below, provides little information about the
system. It may describe a distribution which has several modes (peaks). If your
histogram has this shape, check to see if several sources of variation have been
combined. If so, analyze them separately. If multiple sources of variation do not
seem to be the cause of this pattern, different groupings can be tried to see if a more
useful pattern results. This could be as simple as changing the starting and ending
points of the cells, or changing the number of cells. A uniform distribution often
means that the number of classes is too small.
vi) Random
A random distribution, as shown below, has no apparent pattern. Like the uniform
distribution, it may describe a distribution that has several modes (peaks). If your
histogram has this shape, check to see if several sources of variation have been
combined. If so, analyze them separately. If multiple sources of variation do not
seem to be the cause of this pattern, different groupings can be tried to see if a more
useful pattern results. This could be as simple as changing the starting and ending
points of the cells, or changing the number of cells. A random distribution often
means there are too many classes.
Source: http://www.pqsystems.com/qualityadvisor/DataAnalysisTools/histogram.php
3.4 Frequency Polygon
A frequency polygon is a graph that displays data by using lines that connect points
plotted for the frequencies at the midpoints of the classes. This graph is useful for
understanding the shape of a distribution. Frequency polygons are also a good choice for
displaying cumulative frequency distributions; a cumulative frequency polygon is called
an ogive.
There are two methods of drawing a cumulative frequency curve or ogive.
i) The less than method
In this method a frequency distribution is prepared which gives the number of items
that are less than a certain size. It gives a series which is cumulatively upward.
ii) The greater than method
In this method a frequency distribution is prepared that gives the number of items
that exceed a certain size and gives a series which is cumulatively downward.
Example
Marks of 30 students of a class, obtained in a test out of 75, are given below: 42, 21, 50, 37,
38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51.
[Table: frequency distribution of the marks; total frequency 30]
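The points needed for a frequency polygon and for the two kinds of ogive can be computed as in the sketch below. It uses the marks exactly as they are listed above; note that only 28 values appear in the list although the example speaks of 30 students, so the totals here come to 28. The class width of 10 starting at 0, and the variable names, are assumptions made for illustration.

```python
# Marks as listed in the example (28 values appear in the printed list).
marks = [42, 21, 50, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29, 59,
         61, 33, 17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51]

CLASS_WIDTH = 10                                  # assumed class width
lower_limits = list(range(0, 70, CLASS_WIDTH))    # classes 0-9, 10-19, ..., 60-69

# Frequency of each class: number of marks with low <= mark < low + width.
frequencies = [sum(1 for m in marks if low <= m < low + CLASS_WIDTH)
               for low in lower_limits]

# A frequency polygon joins the points (midpoint, frequency).
midpoints = [low + CLASS_WIDTH / 2 for low in lower_limits]

# "Less than" method: running total up to the upper limit of each class.
less_than = []
running = 0
for f in frequencies:
    running += f
    less_than.append(running)

# "Greater than" method: number of marks at or above each lower limit.
greater_than = [len(marks) - (c - f) for c, f in zip(less_than, frequencies)]

for low, f, lt, gt in zip(lower_limits, frequencies, less_than, greater_than):
    print(f"{low:2d}-{low + 9:2d}  f={f}  less than {low + 10}: {lt}  {low} or more: {gt}")
```

The "less than" series rises from class to class and the "greater than" series falls, which matches the upward and downward cumulation described in the two methods.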
3.6 Scatter Plot
A scatter plot is used to plot data in the XY-plane to show how much one variable or data
set is affected by another. It has points that show the relationship between two variables
or two sets of data. These points are sometimes called markers, and the position of each
point depends on the values of the two variables plotted on the X and Y axes. A scatter
plot gives a good visual picture of the relationship or association between two variables
or data sets, and aids the interpretation of the correlation coefficient or regression
model.
The relationship between two data sets or variables is called correlation. If the markers
are close together and form a straight line in the scatter plot, the two variables or data
sets have a high correlation. If the markers are evenly scattered across the plot, the
correlation is low, or zero.
Correlation may be positive or negative. Correlation is positive when the values increase
together, i.e. if one value increases the other also increases, or if one value decreases
the other also decreases. On the other hand, correlation is negative when one value
increases as the other decreases, and vice versa.
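The sign and strength of a correlation can be checked numerically. The sketch below assumes that the correlation coefficient meant here is Pearson's r, which the text does not state, and uses small hypothetical data sets; the function and variable names are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours of study against two different sets of scores.
hours = [1, 2, 3, 4, 5]
scores_rising = [52, 58, 61, 70, 74]    # increases with hours
scores_falling = [74, 70, 61, 58, 52]   # decreases as hours increase

print(round(pearson_r(hours, scores_rising), 2))   # close to +1
print(round(pearson_r(hours, scores_falling), 2))  # close to -1
```

A value near +1 or -1 corresponds to markers lying close to a straight line, and a value near 0 corresponds to markers scattered with no clear direction.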
Name of Student GPA
A 2.0
B 2.21
C 2.38
D 2.32
E 2.11
F 3.01
G 3.92
H 3.11
I 3.25
J 3.60
K 2.97
L 3.11
M 3.34
N 3.96
O 3.69
P 2.99
Q 2.94
R 3.41
S 3.90
Example
[Figure: Scatter plot titled "GPA". Y-axis: GPA, scale from 0 to 4.5. X-axis: scale from 0 to 20.]
Example
[Figure: Scatter plot with three series: Achievement, Motivation and Anxiety. Y-axis: scale from 0 to 120. X-axis: scale from 0 to 20.]
3.7 Box Plot
The box plot is an exploratory graph. It is a standardized way of displaying the
distribution of data based on five summary statistics: minimum, first quartile, median,
third quartile, and maximum. The first and third quartiles are called the two hinges: the
first quartile is the lower hinge and the third quartile is the upper hinge. The minimum
and the maximum are the two whiskers: the minimum is the lower whisker and the maximum
is the upper whisker. In other words, a box plot visualizes five summary statistics: the
median, two hinges and two whiskers.
In the simplest box plot the central rectangle spans the first quartile to the third
quartile (the interquartile range, IQR). A segment inside the rectangle shows the median,
and whiskers above and below the box show the locations of the minimum and maximum.
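The five summary statistics can be computed as in the following sketch, which uses the GPA values from the table in Section 3.6. Several conventions exist for quartiles; this sketch assumes the hinge convention in which each hinge is the median of one half of the sorted data, with the overall median included in both halves when the number of values is odd. The function name is illustrative.

```python
from statistics import median

# GPA values of students A to S from the table in Section 3.6.
gpas = [2.0, 2.21, 2.38, 2.32, 2.11, 3.01, 3.92, 3.11, 3.25, 3.60,
        2.97, 3.11, 3.34, 3.96, 3.69, 2.99, 2.94, 3.41, 3.90]


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    data = sorted(values)
    n = len(data)
    # Assumed convention: for odd n the middle value belongs to both halves.
    lower_half = data[: (n + 1) // 2]
    upper_half = data[n // 2:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


minimum, q1, med, q3, maximum = five_number_summary(gpas)
print("minimum:", minimum)
print("lower hinge:", round(q1, 3))
print("median:", med)
print("upper hinge:", round(q3, 3))
print("maximum:", maximum)
print("IQR:", round(q3 - q1, 3))
```

The box would be drawn from the lower hinge to the upper hinge, with a segment at the median and whiskers reaching to the minimum and maximum.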
A box plot is useful for identifying outliers and for comparing distributions. In other
words, a box plot gives us information about the location and variation in the data set.
In particular, it helps us to detect and illustrate changes in location and variation
between different groups of data.
A box plot can tell whether a data set is symmetric (when the median is in the center of
the box), but it cannot show the shape of the distribution the way a histogram can.
comparison among various segments of data. When items are presented on a pie chart, it
is easy to see which item has the highest frequency and which has the lowest, or which
item is the most popular and which is not. The main purpose of using a pie chart is to
show a part-whole relationship. These charts are used for displaying data that are
classified into nominal or ordinal categories.
[Figure: Pie chart titled "Scores" with segments for 1st Term and 2nd Term.]
iii) Do you want to compare the parts to each other or the parts to the whole?
If the main purpose is to show a part-whole relationship, a pie chart is useful. If
the main purpose is to show a part-part relationship, a pie chart is not suitable
and it is wiser to use another chart.
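The part-whole idea behind a pie chart can be made concrete by computing the share of each category and the angle of its slice. The sketch below reuses the laptop sales from the pictogram example in Section 3.2; the function name pie_slices is an assumption made for illustration.

```python
# Laptop sales from the pictogram example in Section 3.2.
sales = {"January": 25, "February": 15, "March": 20}


def pie_slices(data):
    """Map each label to (percentage of the whole, slice angle in degrees)."""
    total = sum(data.values())
    return {label: (value * 100 / total, value * 360 / total)
            for label, value in data.items()}


slices = pie_slices(sales)
for label, (percent, angle) in slices.items():
    print(f"{label.ljust(8)} {percent:5.1f}%  {angle:5.1f} degrees")
```

Every slice is expressed as a fraction of the same total, so the angles always add up to 360 degrees. Comparing two slices with each other is harder to do by eye, which is why another chart is preferred for part-part comparisons.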
[Figure: Pie chart with six segments labelled 1st Qtr to 6th Qtr.]
Source: https://eagereyes.org/techniques/pie-charts
3.9 Self Assessment Questions
Q. 1 What is a bar chart?
Q. 2 For what purposes are bar charts used?
Q. 3 What characteristics should a pictogram have to convey its meaning
successfully?
Q. 4 Write down the advantages and drawbacks of using pictograms.
Q. 5 What is a histogram?
Q. 6 Draw a bell-shaped histogram.
Q. 7 Write down the methods for drawing a cumulative frequency polygon.
Q. 8 Write down the rationale for using a scatter plot.
Q. 9 Write down any four questions that can be answered using a scatter plot.
Q. 10 Write down the types of box plot.
Q. 11 What is a pie chart?
Q. 12 Write down the criteria to determine whether a pie chart is the right choice.
3.10 Activities
1. Make a list of the advantages and disadvantages of bar charts.
2. Make a list of the advantages and disadvantages of pictograms.
3. Make a list of the situations that provide a rationale for using a scatter plot.
4. Make a pie chart that shows the drawbacks of pie charts.
3.11 Bibliography
Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of Statistics for the Behavioral
Sciences (4th Ed.). Wadsworth, California, USA.
https://eagereyes.org/techniques/pie-charts
http://www.pqsystems.com/qualityadvisor/DataAnalysisTools/histogram.php