
Unit 3 Statistical Graphics

This document provides an introduction to statistical graphics and exploratory data analysis. It discusses several types of graphs used to visually represent data, including bar charts, pictograms, histograms, frequency polygons, scatter plots, box plots, and pie charts. For each graph type, the document outlines how to construct the graph, what it is used for, advantages and disadvantages. Examples are provided to illustrate bar charts, pictograms, and histograms. The objectives are to familiarize the reader with these common graphical representations of data.

Uploaded by

HafizAhmad
Copyright
© All Rights Reserved

UNIT-3

STATISTICAL GRAPHICS /
EXPLORATORY DATA ANALYSIS

Written By:
Miss Sumbal Asghar

Reviewed By:
Dr. Rizwan Akram Rana
Introduction
Data are represented graphically so that they are easier to interpret. Facts and
figures as such do not catch our attention unless they are presented in an interesting way.
Graphical representation is one of the most commonly used ways of presenting data in
an interesting form. The purpose of this unit is to make you familiar with this mode of
presentation.

Objectives
After reading this unit, you will be able to explain:
1. Bar Chart
2. Pictograms
3. Histogram
4. Frequency Polygon or Ogive
5. Scatter Plot
6. Box Plot
7. Pie Chart

3.1 Bar Chart


Bar charts are among the most commonly used graphical representations of data. They
display values visually so that they can be compared with one another. They are easy to
create and interpret. They are also flexible: variations of the standard bar chart include
vertical and horizontal bar charts, component or grouped bar charts, and stacked bar charts.

Data for a bar chart are entered in columns. Each numeric data value becomes a bar. The
chart is constructed so that the lengths of the bars are proportional to the sizes of
the categories they represent. In a vertical bar chart the x-axis represents the categories
and has no scale, while the y-axis has a scale and indicates the units of measurement. In a
horizontal bar chart the roles of the two axes are reversed.
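
The proportionality rule described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (the values behind the unit's figures are not recoverable), and the names bar_lengths and render_horizontal are made up for this example. Each bar is drawn as a row of "#" characters whose length is proportional to the value.

```python
# Hypothetical scores, used only to illustrate the construction rule.
scores = {"English": 80, "Urdu": 65, "Maths": 90, "Pak Studies": 70}


def bar_lengths(values, max_width=40):
    """Scale each value so that bar length is proportional to the value.

    The largest value gets max_width characters.
    """
    largest = max(values.values())
    return {label: round(value / largest * max_width)
            for label, value in values.items()}


def render_horizontal(values, max_width=40):
    """Return one text line per category: label, bar, and the value itself."""
    lengths = bar_lengths(values, max_width)
    label_width = max(len(label) for label in values)
    return [f"{label:<{label_width}} | {'#' * lengths[label]} {values[label]}"
            for label in values]


for line in render_horizontal(scores):
    print(line)
```

The category axis carries only labels, while the bar lengths carry the scale, which mirrors the description of the two axes given above.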

The following figure shows the results of a student in the first, second and third terms
in the subjects of English, Urdu, Mathematics and Pak-Studies.

[Figure: vertical grouped bar chart of scores (y-axis, 0 to 120) by subject (x-axis: English, Urdu, Maths, Pak Studies); series: 1st Term, 2nd Term, 3rd Term]

Fig 1: Vertical bar chart

A bar chart can also be drawn in horizontal form.

[Figure: horizontal grouped bar chart of scores (x-axis, 0 to 120) by subject (y-axis: English, Urdu, Maths, Pak Studies); series: 1st Term, 2nd Term, 3rd Term]

Fig 2: Horizontal bar chart

3.1.1 Advantages and Disadvantages of Bar Charts
Following are the advantages of bar charts.
i) They show each data category in a frequency distribution.
ii) They display relative numbers / proportions of multiple categories.
iii) They summarize a large amount of data in an easily interpretable manner.
iv) They make trends easier to highlight than tables do.
v) They allow estimates to be made quickly and accurately.
vi) They are easily accessible to everyone.

Following are the disadvantages of bar charts.
i) They often require additional explanation.
ii) They fail to expose key assumptions, causes, impacts and patterns.
iii) They can be manipulated to give false impressions.

3.2 Pictograms
A pictogram is a graphical symbol that conveys its meaning through its pictorial
resemblance to a physical object. A pictogram may include a symbol plus graphic
elements such as a border, back pattern, or color that are intended to convey specific
information. We can also say that a pictogram is a kind of graph that uses pictures
instead of bars to represent the data under analysis. A pictogram is also called a
“pictograph”, or simply a “picto”.

A pictogram or pictograph represents the frequency of data as pictures or symbols. Each
picture or symbol may represent one or more units of data.

Pictograms form a part of our daily lives. They are used in transport, medication,
education, computers etc. They indicate, in iconic form, places, directions, actions or
constraints on actions in either the real world (a road, a town, etc.) or the virtual world
(computer, internet etc.).

To successfully convey the meaning, a pictogram:


i) Should be self-explanatory.
ii) Should be recognizable by all people.
iii) Must represent a general concept.
iv) Should be clear, concise and interesting.
v) Should be identifiable as a set, through uniform treatment of scale, style and
subject.
vi) Should be highly visible, easy to reproduce in any scale and in positive or negative
form.
vii) Should not be dependent upon a border and should work equally well in positive or
negative form.
viii) Should avoid stylistic fads or a commercial appearance and should appeal to a wide
audience that has a sophisticated, creative culture.
ix) Should be attractive when used with other design elements and typestyles.

3.2.1 Advantages and Drawbacks of Pictograms
Following are the advantages of pictograms:
i) Pictograms can make warnings more eye-catching.
ii) They can serve as an “instant reminder” of a hazard or an established message.
iii) They may improve warning comprehension for those with visual or literacy
difficulties.
iv) They have the potential to be interpreted more accurately and more quickly than words.
v) They can be recognized and recalled far better than words.
vi) They can improve the legibility of warnings.
vii) They may be better when undertaking familiar routine tasks.

There are a number of disadvantages of relying on pictograms.
i) Very few pictograms are universally understood.
ii) Even well-understood pictograms will not be interpreted in the same way by all groups
of people and across all cultures, and it takes years for any pictogram to reach
maximum effectiveness.
iii) They can be interpreted with the opposite or an otherwise undesired meaning,
which can create additional confusion.

Example
The following table shows the number of laptops sold by a company for the months
January to March. Construct a pictograph for the table.

Month               January   February   March
Number of laptops   25        15         20

Solution:
January    5 symbols
February   3 symbols
March      4 symbols

Each symbol represents 5 laptops.
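
The arithmetic behind this pictograph can be sketched as follows. The sales figures and the scale of 5 laptops per symbol come from the example. The function name pictograph_rows and the "*" character standing in for the picture are assumptions made for this illustration.

```python
# Sales figures from the example; one symbol stands for 5 laptops.
laptops_sold = {"January": 25, "February": 15, "March": 20}
UNITS_PER_SYMBOL = 5


def pictograph_rows(data, units_per_symbol, symbol="*"):
    """Turn each count into a row of symbols.

    Raises ValueError if a count is not a whole number of symbols,
    because this simple sketch does not draw partial symbols.
    """
    rows = {}
    for label, count in data.items():
        whole, remainder = divmod(count, units_per_symbol)
        if remainder:
            raise ValueError(f"{label}: {count} is not a multiple of {units_per_symbol}")
        rows[label] = symbol * whole
    return rows


rows = pictograph_rows(laptops_sold, UNITS_PER_SYMBOL)
for month, symbols in rows.items():
    print(f"{month:<9} {symbols}")
print(f"* represents {UNITS_PER_SYMBOL} laptops")
```

January therefore gets 25 / 5 = 5 symbols, February 15 / 5 = 3 and March 20 / 5 = 4.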

Example
School Subject pictogram

Source: www.kids-pages.com

3.3 Histogram
A histogram is a type of graph that provides a visual interpretation of numerical data by
indicating the number of data points that lie within each range of values. These ranges
are called classes or bins.

A histogram looks similar to a bar chart. Both are ways to display a data set. The height of
each bar corresponds to the frequency (or relative frequency) of the data in its class: the
higher the bar, the greater the frequency, and vice versa. The main difference between
these graphs is the level of measurement of the data. Bar graphs are used for data at the
nominal level of measurement; they show the frequency of categorical data. Histograms, on
the other hand, are used for data that are at least at the ordinal level of measurement. As a
common practice the bars of a bar graph are rearranged in order of decreasing height. The
bars of a histogram, however, cannot be rearranged. They must be displayed in the order in
which the classes occur.

A bar graph presents actual counts against categories. The height of the bar indicates the
number of items in that category. A histogram instead groups the values of a numerical
variable into bins. While creating a histogram, you are in effect creating a bar graph that
shows how many data points fall within each range (interval), called a bin.

There are no hard and fast rules about how many bins there should be, but the rule of
thumb is 5-20 bins. Fewer than 5 bins convey little meaning, and more than 20 bins make
the data hard to read and interpret. Ideally 5-7 bins are enough.
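
Counting data points per bin, as described above, can be sketched like this. The data values are hypothetical, and the function name bin_counts is made up for this example. The sketch also assumes that each bin includes its lower limit and excludes its upper limit, a convention the unit does not specify.

```python
# Hypothetical measurements, used only to illustrate binning.
values = [3, 12, 15, 18, 21, 24, 27, 33, 35, 36, 38, 41, 44, 47, 52]


def bin_counts(data, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bin i covers start + i*width (included) up to start + (i+1)*width
    (excluded). Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


counts = bin_counts(values, start=0, width=10, n_bins=6)
for i, count in enumerate(counts):
    lower, upper = i * 10, (i + 1) * 10
    print(f"{lower:>2}-{upper:<2} | {'#' * count} {count}")
```

Six bins fall inside the 5-20 rule of thumb. Changing width and n_bins shows how the apparent shape of the same data depends on the choice of bins.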

3.3.1 Shapes of Histogram


Histograms may have different shapes. Following are some of the shapes.
i) Bell-shaped
A bell-shaped picture, shown below, usually represents a normal distribution.

ii) Bimodal
A bimodal shape, shown below, has two peaks. This shape may show that the data
has come from two different systems. Often in a single system, there may be two
modes in the data set.

iii) Skewed right
Some histograms will show a skewed distribution to the right, as shown below. A
distribution skewed to the right is said to be positively skewed. This kind of
distribution has a large number of occurrences in the lower value cells (left side)
and few in the upper value cells (right side). A skewed distribution can result when
data is gathered from a system that has a boundary such as zero. In other words, all
the collected data has values greater than zero.

iv) Skewed left


Some histograms will show a skewed distribution to the left, as shown below. A
distribution skewed to the left is said to be negatively skewed. This kind of
distribution has a large number of occurrences in the upper value cells (right side)
and few in the lower value cells (left side). A skewed distribution can result when
data is gathered from a system with a boundary such as 100. In other words, all the
collected data has values less than 100.

v) Uniform
A uniform distribution, as shown below, provides little information about the
system. It may describe a distribution which has several modes (peaks). If your
histogram has this shape, check to see if several sources of variation have been
combined. If so, analyze them separately. If multiple sources of variation do not
seem to be the cause of this pattern, different groupings can be tried to see if a more
useful pattern results. This could be as simple as changing the starting and ending
points of the cells, or changing the number of cells. A uniform distribution often
means that the number of classes is too small.

vi) Random
A random distribution, as shown below, has no apparent pattern. Like the uniform
distribution, it may describe a distribution that has several modes (peaks). If your
histogram has this shape, check to see if several sources of variation have been
combined. If so, analyze them separately. If multiple sources of variation do not
seem to be the cause of this pattern, different groupings can be tried to see if a more
useful pattern results. This could be as simple as changing the starting and ending
points of the cells, or changing the number of cells. A random distribution often
means there are too many classes.

Source: http://www.pqsystems.com/qualityadvisor/DataAnalysisTools/histogram.php

3.4 Frequency Polygon
The frequency polygon is a graph that displays data by using lines that connect points
plotted for the frequencies at the midpoints of the classes. This graph is useful for
understanding the shape of a distribution. Frequency polygons are also a good choice for
displaying cumulative frequency distributions.

A frequency polygon is similar to a histogram. The difference is that a histogram is made
up of rectangles, while a frequency polygon resembles a line graph.
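
The points of a frequency polygon can be computed as sketched below. The classes and frequencies are those of the marks example in section 3.5, and the function name polygon_points is made up for this illustration. Each class is represented by its midpoint, which is the average of its lower and upper limits.

```python
# Classes and frequencies from the marks example in section 3.5.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [1, 4, 3, 7, 7, 7, 1]


def polygon_points(classes, frequencies):
    """Pair each class midpoint with the frequency of that class."""
    return [((lower + upper) / 2, freq)
            for (lower, upper), freq in zip(classes, frequencies)]


points = polygon_points(classes, frequencies)
for midpoint, freq in points:
    print(f"midpoint {midpoint:>4}: frequency {freq}")
```

Joining these points with straight lines in order of increasing midpoint gives the polygon. A histogram of the same table would draw a rectangle over each class instead.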

3.5 Cumulative Frequency Polygon or Ogive


The cumulative frequency is the sum of the frequencies accumulated up to the upper
boundary of a class in the distribution. A graph that can be used to represent the
cumulative frequencies for the classes is called a cumulative frequency graph or ogive.

An ogive is drawn on the basis of cumulative frequency. To construct an ogive, we first
have to form a cumulative frequency table. The upper limits of the classes are taken on
the x-axis and the cumulative frequencies on the y-axis, and the points are plotted.

There are two methods of drawing a cumulative frequency curve or ogive.
i) The less than method
In this method a frequency distribution is prepared which gives the number of items
that are less than a certain size. It gives a series which is cumulatively upward.
ii) The greater than method
In this method a frequency distribution is prepared that gives the number of items
that exceed a certain size and gives a series which is cumulatively downward.

Example
Marks of 30 students of a class, obtained in a test out of 75, are given below: 42, 21, 50, 37,
38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51.

Classes   Frequency   Cumulative Frequency
                      Less Than       Greater Than
0-10      1           1               29 + 1 = 30
10-20     4           1 + 4 = 5       25 + 4 = 29
20-30     3           5 + 3 = 8       22 + 3 = 25
30-40     7           8 + 7 = 15      15 + 7 = 22
40-50     7           15 + 7 = 22     8 + 7 = 15
50-60     7           22 + 7 = 29     1 + 7 = 8
60-70     1           29 + 1 = 30     1

Total     30
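
Both cumulative series can be derived from the class frequencies with a running sum, as sketched below. The frequencies are those of the table. The function names are made up for this illustration, and plotting the "greater than" series against the lower class limits is the usual convention rather than something stated in this unit.

```python
from itertools import accumulate

# Classes and frequencies from the marks table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [1, 4, 3, 7, 7, 7, 1]


def cumulative_less_than(freqs):
    """Running total from the first class downwards (cumulatively upward)."""
    return list(accumulate(freqs))


def cumulative_greater_than(freqs):
    """Running total from the last class upwards (cumulatively downward)."""
    return list(accumulate(reversed(freqs)))[::-1]


less_than = cumulative_less_than(frequencies)
greater_than = cumulative_greater_than(frequencies)

# Points to plot: "less than" against upper limits, "greater than"
# against lower limits (the latter is an assumed convention).
less_than_points = [(upper, c) for (_, upper), c in zip(classes, less_than)]
greater_than_points = [(lower, c) for (lower, _), c in zip(classes, greater_than)]

print(less_than)     # [1, 5, 8, 15, 22, 29, 30]
print(greater_than)  # [30, 29, 25, 22, 15, 8, 1]
```

The "greater than" series is built from the last class backwards, so it is the mirror image of the "less than" series only when the frequencies themselves are symmetric.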

3.6 Scatter Plot
A scatter plot is used to plot data in the XY-plane to show how one variable or data set
is related to another. Its points show the relationship between two variables or
two sets of data. These points are sometimes called markers, and the position of each
point depends on its values on the X and Y axes. A scatter plot gives a good visual
picture of the relationship or association between two variables or data sets, and aids
the interpretation of the correlation coefficient or regression model.

The relationship between two data sets or variables is called correlation. If the markers
are close together and form a straight line in the scatter plot, the two variables or data
sets have a high correlation. If the markers are scattered evenly over the plot, the
correlation is low, or zero.

Correlation may be positive or negative. Correlation is positive when the values increase
together, i.e. if one value increases the other also increases, or if one value decreases
the other also decreases. On the other hand, correlation is negative when one value
increases as the other decreases, and vice versa.

A scatter plot helps to answer the following questions.
i) Are variables X and Y or two data sets related?
ii) Are variables X and Y or two data sets linearly related?
iii) Are variables X and Y or two data sets non-linearly related?
iv) Does the variation in Y (or one data set) change depending on X (or the other data set)?
v) Are there outliers?

3.6.1 When to Use Scatter Plot?


The following situations provide a rationale for using a scatter plot.
i) When there is paired numerical data.
ii) When the dependent variable has multiple values for each value of the independent
variable.
iii) When the researcher tries to determine whether the two variables are related, such as:
a) When trying to identify potential root causes of problems.
b) To determine objectively whether a particular cause and effect are related.
c) When determining whether two effects that appear to be related both occur
with the same cause.
d) When testing for autocorrelation before constructing a control chart.

Name of Student GPA
A 2.0
B 2.21
C 2.38
D 2.32
E 2.11
F 3.01
G 3.92
H 3.11
I 3.25
J 3.60
K 2.97
L 3.11
M 3.34
N 3.96
O 3.69
P 2.99
Q 2.94
R 3.41
S 3.90

Example

[Figure: scatter plot of GPA (y-axis, 0 to 4.5) against student number (x-axis, 0 to 20)]

Example

Name of Student Achievement Motivation Anxiety


A 95 50 15
B 96 84 54
C 65 46 25
D 59 33 36
E 68 24 56
F 84 86 54
G 59 90 58
H 74 14 47
I 58 66 56
J 59 71 59
K 68 56 68
L 59 71 84
M 62 79 59
N 35 82 62
O 48 80 10
P 57 69 15
Q 96 64 59
R 58 86 67
S 86 90 68

[Figure: scatter plot of Achievement, Motivation and Anxiety scores (y-axis, 0 to 120) against student number (x-axis, 0 to 20)]

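
The direction and strength of the relationship between two of these columns can be summarized with a correlation coefficient. The sketch below uses the Pearson coefficient, which is one common choice; the unit does not say which coefficient it has in mind, so treat that as an assumption. The scores are copied from the table above, and the function name pearson_r is made up for this illustration.

```python
from math import sqrt

# Scores of students A to S, copied from the table.
achievement = [95, 96, 65, 59, 68, 84, 59, 74, 58, 59, 68, 59, 62, 35, 48, 57, 96, 58, 86]
motivation = [50, 84, 46, 33, 24, 86, 90, 14, 66, 71, 56, 71, 79, 82, 80, 69, 64, 86, 90]
anxiety = [15, 54, 25, 36, 56, 54, 58, 47, 56, 59, 68, 84, 59, 62, 10, 15, 59, 67, 68]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists.

    Returns a value between -1 and +1. Both lists must vary
    (a constant list would lead to division by zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


print("achievement vs motivation:", round(pearson_r(achievement, motivation), 2))
print("achievement vs anxiety:   ", round(pearson_r(achievement, anxiety), 2))
print("motivation vs anxiety:    ", round(pearson_r(motivation, anxiety), 2))
```

A result near +1 or -1 corresponds to markers lying close to a straight line, and a result near 0 to markers scattered evenly over the plot.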
3.7 Box Plot
The box plot is an exploratory graph. It is a standardized way of displaying the
distribution of data based on five summary statistics: minimum, first quartile, median,
third quartile, and maximum. The first and third quartiles are called the two hinges: the
first quartile is the lower hinge and the third quartile is the upper hinge. The minimum
and the maximum are the two whiskers: the minimum is the lower whisker and the maximum is
the upper whisker. In other words, a box plot visualizes five summary statistics: the
median, two hinges and two whiskers.

In the simplest box plot the central rectangle spans the first quartile to the third
quartile (the interquartile range, IQR). A segment inside the rectangle shows the median,
and whiskers above and below the box show the locations of the minimum and maximum.
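
The five summary statistics can be computed as sketched below, using the GPA values from the scatter plot example in section 3.6 as data. Textbooks and software differ in how they compute quartiles. This sketch assumes the hinge convention, in which each hinge is the median of one half of the sorted data and the overall median belongs to both halves when the number of values is odd. The function name five_number_summary is made up for this illustration.

```python
from statistics import median

# GPA values of students A to S from the scatter plot example.
gpa = [2.0, 2.21, 2.38, 2.32, 2.11, 3.01, 3.92, 3.11, 3.25, 3.60,
       2.97, 3.11, 3.34, 3.96, 3.69, 2.99, 2.94, 3.41, 3.90]


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Hinges are the medians of the lower and upper halves of the sorted
    data; with an odd count the middle value is included in both halves
    (an assumed convention, other quartile rules give other results).
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    lower_half = data[:half]
    upper_half = data[n - half:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


minimum, lower_hinge, mid, upper_hinge, maximum = five_number_summary(gpa)
print("minimum:", minimum)
print("lower hinge (Q1):", lower_hinge)
print("median:", mid)
print("upper hinge (Q3):", upper_hinge)
print("maximum:", maximum)
print("IQR:", round(upper_hinge - lower_hinge, 2))
```

The box is drawn from the lower hinge to the upper hinge, the segment inside it at the median, and the whiskers out to the minimum and maximum.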

A box plot is useful for identifying outliers and for comparing distributions. In other
words, a box plot gives us information about the location and variation of the data
set. In particular, it helps us to detect and illustrate changes in location and variation
between different groups of data.

3.7.1 Types of Box Plot


Commonly used types of box plot are single box plot and multiple box plot.

Single box plot


A single box plot can be drawn for one set of data with no distinct groups. In such a plot
the width of the box is arbitrary.

Multiple box plot


Multiple box plots can be drawn together to compare multiple data sets or to compare
groups in a single data set. In such a plot the width of the box plot can be set proportional
to the number of points in the given group or sample.

The box plot provides answers to the following questions.


i) Is a factor significant?
ii) Does the location differ between subgroups or between different data sets?
iii) Does the variation differ between subgroups or between different data sets?
iv) Are there any outliers?

A box plot can tell whether a data set is symmetric (when the median is in the center of
the box), but it cannot show the shape of the distribution the way a histogram can.

3.8 Pie Chart


A pie chart displays data in an easy pie-slice format with slices of varying sizes. The
size of a slice tells how much data exists in one element. The bigger the slice, the more
of that particular data was gathered, and vice versa. Pie charts are mainly used to show
comparison among various segments of data. When items are presented on a pie chart, it
is easy to see which item has the maximum frequency or is the most popular, and which
does not. The main purpose of using a pie chart is to show the part-whole relationship.
These charts are used for displaying data that are classified into nominal or ordinal
categories.

[Figure: pie chart titled "Scores" with slices for 1st Term and 2nd Term]

3.8.1 How to Read a Pie Chart?


A pie chart is easy to read and interpret. It usually shows several items of data, each
pictured as a slice. Some items have larger slices than others, so it is easy to decide
which items have the maximum frequency and which the minimum.
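
How the size of each slice follows from the data can be sketched as below. The laptop sales from the pictogram example in section 3.2 are reused as data. The function name slice_angles is made up for this illustration. Each slice receives the share of the full circle (360 degrees) that matches its share of the total.

```python
# Laptop sales from the pictogram example, reused as pie chart data.
laptops_sold = {"January": 25, "February": 15, "March": 20}


def slice_angles(data):
    """Return the angle in degrees of each slice; the angles add up to 360."""
    total = sum(data.values())
    return {label: value * 360 / total for label, value in data.items()}


angles = slice_angles(laptops_sold)
total = sum(laptops_sold.values())
for label, angle in angles.items():
    share = laptops_sold[label] / total * 100
    print(f"{label:<9} {angle:6.1f} degrees  ({share:.1f}% of the whole)")
```

With a total of 60 laptops, January takes 25/60 of the circle (150 degrees), February 15/60 (90 degrees) and March 20/60 (120 degrees).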

3.8.2 When to Use the Pie Chart?


There are some simple criteria that can be used to determine whether a pie chart is the
right choice for a given set of data.

i) Do the parts make up a meaningful whole?


Pie charts should be used only if the parts or slices make up the entire set of data in a
way that makes sense to the viewer.

ii) Are the parts mutually exclusive?


If there is overlap between the parts, it is better to use any other chart.

iii) Do you want to compare the parts to each other or the parts to the whole?
If the main purpose is to show the part-whole relationship then a pie chart is useful,
but if the main purpose is to show the part-part relationship then a pie chart is of
little use and it is wiser to use another chart.

iv) How many parts do you have?


If there are more than five to seven parts it is advisable to use a different chart. Pie
charts with lots of slices of varying size are hard to read.

3.8.3 Drawbacks of Pie Charts


There are two features that help us read the values on a pie chart: the angle a slice covers
(compared to the entire circle) and the area of the slice (compared to the entire circle).
Generally, we are not very good at measuring angles. We recognize only angles of 90°
and 180° with a high degree of precision. Other angles are nearly impossible to perceive
precisely. Look at the following two pie charts. In the first, which quarter is larger and
which is smaller? And what information can we get from the second chart?

[Figure: two pie charts; legend: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr, 5th Qtr, 6th Qtr]

Source: https://eagereyes.org/techniques/pie-charts

3.9 Self Assessment Questions
Q. 1 What is a bar chart?
Q. 2 For what purpose are bar charts used?
Q. 3 What characteristics should a pictogram have to successfully
convey its meaning?
Q. 4 Write down the advantages and drawbacks of using pictograms.
Q. 5 What is a histogram?
Q. 6 Draw a bell-shaped histogram.
Q. 7 Write down the methods for drawing cumulative frequency polygon.
Q. 8 Write down the rationale for using scatter plot.
Q. 9 Write down any four questions that can be answered using scatter plot.
Q. 10 Write down the types of box plot.
Q. 11 What is a pie-chart?
Q. 12 Write down the criteria to determine whether a pie chart is the right choice.

3.10 Activities
1. Make a list of advantages and disadvantages of bar chart.
2. Make a list of advantages and disadvantages of pictogram.
3. Make a list of the situations that provide rationale to use scatter plot.
4. Make a pie chart that shows the drawback of pie chart.

3.11 Bibliography
Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of Statistics for the Behavioral
Sciences (4th Ed.). Wadsworth, California, USA.

https://eagereyes.org/techniques/pie-charts

http://www.pqsystems.com/qualityadvisor/DataAnalysisTools/histogram.php
