Lecture 02:
Tabular and Graphical Presentation
of Data and Measures of Locations
04/02/25 1
Presentation of Qualitative Variables
• The simplest way of presenting/summarizing a qualitative
variable is by using a frequency table, which shows the
frequency of occurrence of each of the different categories.
An Example
• A manufacturer of jeans has plants in California (CA),
Arizona (AZ), and Texas (TX). A sample of 25 pairs of
jeans was randomly selected from a computerized
database, and the state in which each was produced was
recorded. The data are as follows:
• CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX
CA AZ TX TX TX CA AZ AZ CA CA
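The tally behind a frequency table for these data could be sketched in Python as follows. This is a minimal sketch using only the standard library; the variable names (`data`, `freq`, `rel_freq`) are illustrative and not part of the lecture.

```python
from collections import Counter

# State of production for the 25 sampled pairs of jeans (data from the example).
data = ("CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX "
        "CA AZ TX TX TX CA AZ AZ CA CA").split()

freq = Counter(data)   # frequency of each category
n = len(data)          # sample size

# Relative frequency = frequency divided by the sample size.
rel_freq = {state: count / n for state, count in freq.items()}

for state in ("CA", "AZ", "TX"):
    print(f"{state}  {freq[state]:2d}  {rel_freq[state]:.2f}")
```

Because the categories have no natural order, the rows could be listed in any order without changing the meaning of the table.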
The Bar Chart
[Figure: bar chart of Frequency (vertical axis, 0 to 10) by state (CA, AZ, TX).]
Example … continued
Pie Diagram
Angles (in degrees):
CA = (360)(.36) = 129.6°
AZ = (360)(.32) = 115.2°
TX = (360)(.32) = 115.2°
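The slice angles of the pie diagram follow from the relative frequencies: each category gets 360 degrees times its share of the sample. A minimal sketch, assuming the frequencies CA = 9, AZ = 8, TX = 8 from the jeans example (the variable names are illustrative):

```python
# Frequencies from the jeans example (n = 25).
freq = {"CA": 9, "AZ": 8, "TX": 8}
n = sum(freq.values())

# Angle of each slice = 360 degrees x relative frequency.
angles = {state: 360 * count / n for state, count in freq.items()}

for state, angle in angles.items():
    print(f"{state}: {angle:.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check on the arithmetic.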
Pie Chart from Minitab
Pie Chart of Place
AZ (8, 32.0%)
CA (9, 36.0%)
TX (8, 32.0%)
Presentation of Quantitative Variables
• When the quantitative variable is discrete (such as counts),
a frequency table and a bar graph could also be used for
summarizing it.
• The only difference is that the values of the variable have a
natural order, so they cannot be reshuffled along the axis of the
graph, in contrast to the categories of a qualitative variable.
• For example, suppose that we asked a sample of 20 students
about the number of siblings in their family. The sample
data might be:
• 4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5, 3, 4, 6, 3
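For a discrete variable, the frequency table is built the same way as for a qualitative one, except that the values are listed in increasing order. A minimal sketch using the sibling counts above (the names `siblings` and `table` are illustrative):

```python
from collections import Counter

# Number of siblings reported by the 20 sampled students.
siblings = [4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5, 3, 4, 6, 3]

freq = Counter(siblings)

# The values have a natural order, so the table is sorted by value.
table = [(value, freq[value]) for value in sorted(freq)]

for value, count in table:
    # A row of asterisks gives a rough text version of the bar graph.
    print(f"{value}  {count}  {'*' * count}")
```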
Its Bar Graph is
[Figure: bar graph of frequency (vertical axis, 0 to 5) against number of siblings (horizontal axis, 1 to 7).]
An Example of a Real Data Set: Poverty versus PACT in SC
Lunch  ActualLang  ActualMath
73     30          41
74     48          54
31     24          30
77     43          55
59     32          38
75     45          57
94     41          62
46     26          30
57     29          40
88     49          62
90     63          67
80     51          63
78     50          59
29     17          24
54     30          44
79     46          58
41     24          26
67     28          33
61     41          47
51     30          41
76     45          50
45     26          34
41     25          30
87     61          61
87     49          62
43     32          36
54     27          33
68     36          52
70     33          36
60     32          41
76     45          56
93     50          66
35     26          35
32     22          31
84     50          66
51     29          36
63     39          53
64     27          32
50     35          42
33     20          26
52     36          43
43     23          26
64     44          53
50     31          43
66     32          44
39     20          22
53     28          35
86     63          75
37     21          27
78     36          41
54     25          33
47     23          30
57     31          42
87     60          69
40     29          41
51     39          42
49     29          37
43     25          27
55     41          53
46     38          43
37     24          31
60     37          45
50     38          44
64     37          43
96     46          66
57     40          50
59     36          45
75     34          45
90     60          75
70     32          41
60     29          36
26     17          20
55     37          46
71     43          53
47     23          27
90     38          47
68     42          51
53     37          39
45     32          35
76     47          52
58     34          43
31     25          24
82     49          55
16     13          15
35     29          32
15     14          18
Frequency Tables and Histograms
Consider the variable “Lunch,” which represents the
percentage of students in the school district whose
lunches are not free. The higher the value of this variable,
the richer the district.
n = Number of Observations = 86
LV = Lowest Value = 15
HV = Highest Value = 96
Frequency Table for Variable “Lunch”
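A frequency table of this kind could be tabulated along the following lines. This is only a sketch: the class width of 10 and the starting point of 10 are assumptions suggested by the histogram axis, not values stated in the lecture, and the `lunch` list assumes that the first number of every three in the data listing is the Lunch value.

```python
# Lunch values, taken as the first number of each (Lunch, ActualLang, ActualMath) triple.
lunch = [
    73, 74, 31, 77, 59, 75, 94, 46, 57, 88, 90, 80, 78, 29, 54, 79,
    41, 67, 61, 51, 76, 45, 41, 87, 87, 43, 54, 68, 70, 60, 76, 93,
    35, 32, 84, 51, 63, 64, 50, 33, 52, 43, 64, 50, 66, 39, 53, 86,
    37, 78, 54, 47, 57, 87, 40, 51, 49, 43, 55, 46, 37, 60, 50, 64,
    96, 57, 59, 75, 90, 70, 60, 26, 55, 71, 47, 90, 68, 53, 45, 76,
    58, 31, 82, 16, 35, 15,
]

n = len(lunch)                    # number of observations
lv, hv = min(lunch), max(lunch)   # lowest and highest values

width = 10                        # assumed class width
start = 10                        # assumed lower limit of the first class
classes = list(range(start, 100, width))   # lower limits 10, 20, ..., 90

# Count how many observations fall in each class [lower, lower + width).
freq = {lower: 0 for lower in classes}
for x in lunch:
    lower = start + ((x - start) // width) * width
    freq[lower] += 1

print(f"n = {n}, LV = {lv}, HV = {hv}")
for lower in classes:
    print(f"{lower:3d} to under {lower + width:3d}: {freq[lower]:2d}")
```

A different choice of class width or starting point would give a different table and a differently shaped histogram, so the choice is worth stating alongside the table.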
Frequency Histogram
[Figure: frequency histogram of Lunch; horizontal axis Lunch from 10 to 100 in steps of 10, vertical axis Frequency.]
Stem-and-Leaf Plots
• A stem-and-leaf plot is an important tool for presenting
quantitative data when the sample size is not too large.
• With this method there is usually no loss of information:
the exact values of the observations can be recovered
(in contrast to a frequency table for continuous data).
• Basic idea: divide each observation into a stem and a
leaf.
• The stems serve as the ‘body of the plant’ while the
leaves serve as the ‘branches or leaves’ of the plant.
• An illustration makes the idea transparent.
An Example
• A random sample of 30 subjects from the 1910 subjects in
the blood pressure data set was selected. We present here
the systolic blood pressures of these 30 subjects.
Stem-and-Leaf … continued
• In this stem-and-leaf plot, because there would be only 5
stems if we used 9, 10, 11, 12, 13, we subdivided
each stem into two parts: one for leaf values <= 4
and one for leaf values >= 5.
• Such a procedure usually produces better-looking
distributions.
• Looking at this stem-and-leaf plot, notice that many of the
observations are in the range 100-126.
• The exact values can be recovered from this plot.
• Arranging the leaves in ascending order also makes the
plot more informative.
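The split-stem construction described above could be sketched as follows. The function name `split_stem_and_leaf` is hypothetical, and the data are assumed to be the 30 systolic blood pressures listed in the sample mean illustration of this lecture; the source does not confirm that this is the same sample.

```python
from collections import defaultdict

def split_stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot in which each stem is split
    into a low half (leaves 0-4) and a high half (leaves 5-9)."""
    rows = defaultdict(list)
    for v in sorted(values):            # sorting puts the leaves in ascending order
        stem, leaf = divmod(v, 10)      # e.g. 126 -> stem 12, leaf 6
        half = 0 if leaf <= 4 else 1
        rows[(stem, half)].append(leaf)

    lowest_stem, highest_stem = min(rows)[0], max(rows)[0]
    lines = []
    for stem in range(lowest_stem, highest_stem + 1):
        for half in (0, 1):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:2d} | {leaves}")
    return lines

# Assumed data: the 30 systolic blood pressures from the sample mean illustration.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124,
      108, 110, 92, 98, 118, 110, 102, 108, 126, 104,
      110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

lines = split_stem_and_leaf(bp)
print("\n".join(lines))
```

Every observation can be read back from a row by joining the stem with each leaf, which is the sense in which the plot loses no information.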
Comparative Stem-and-Leaf Plots
• When comparing the distributions of two groups (e.g.,
when classified according to GENDER), side-by-side
stem-and-leaf plots (also side-by-side histograms) could be
used.
• To illustrate, consider 30 observations from the blood
pressure data set with Gender and Systolic Blood Pressure
being the observed variables.
• For the males (Sex = 0): 122, 120, 130, 110, 134, 136, 142,
100, 120, 162, 126, 132, 124, 130
• For the females (Sex = 1): 132, 94, 104, 100, 130, 110,
102, 110, 130, 92, 125, 108, 100, 130, 100, 100
Comparing Male/Female Systolic Blood
Pressures
 Females          Males
  Leaves | Stem | Leaves
      24 |  09  |
 0000248 |  10  | 0
      00 |  11  | 0
       5 |  12  | 00246
    0002 |  13  | 00246
         |  14  | 2
         |  15  |
         |  16  | 2
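A comparative plot like this one could be generated with a short script. This is a minimal sketch; the helper name `leaves_by_stem` is hypothetical, and the leaves on both sides are printed in ascending order from left to right, as in the plot above.

```python
def leaves_by_stem(values):
    """Map each stem (tens and above) to a string of its leaves (units digits),
    with the leaves in ascending order."""
    table = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        table.setdefault(stem, []).append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in table.items()}

males = [122, 120, 130, 110, 134, 136, 142, 100, 120, 162, 126, 132, 124, 130]
females = [132, 94, 104, 100, 130, 110, 102, 110, 130, 92, 125, 108, 100, 130,
           100, 100]

f_leaves = leaves_by_stem(females)
m_leaves = leaves_by_stem(males)

# Use a common set of stems so the two groups line up row by row.
stems = range(min(min(f_leaves), min(m_leaves)),
              max(max(f_leaves), max(m_leaves)) + 1)

for stem in stems:
    print(f"{f_leaves.get(stem, ''):>8} | {stem:02d} | {m_leaves.get(stem, '')}")
```

Sharing one column of stems is what makes the comparison easy to read: the female pressures concentrate on the lower stems, while the male pressures extend to higher ones.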
Scatterplots:
Studying Relationship Between Poverty and Math
[Figure: scatterplot of ActualMath (vertical axis, 10 to 80) against Lunch (horizontal axis, 10 to 100).]
• Overview
• Why do we need numerical summary
measures?
• Measures of Location
• Measures of Variation
• Measures of Position
• Box Plots
Why Do We Need Summary Measures?
• “A picture is worth a thousand words, but beauty is
always in the eyes of the beholder!”
• Graphs or pictures are sometimes unwieldy
• We usually want a small set of numbers that
conveys the important features of the data set
• Decisions are more objective
when they are based on numbers!
• Numerical summaries and tabular/graphical
presentations complement each other
The Setting
• In defining and illustrating our summary
measures, assume that we have sample data
• Sample Data: X1, X2, X3, …, Xn
• Sample Size: n
• These summary measures are thus (sample)
statistics.
• If instead they are based on the population values,
they will be (population) parameters.
Measures of Location or Center
• These are summary measures that provide
information on the “center” of the data set
• Usually, these measures of location are where the
observations cluster, but not always
• In layman’s terms, these measures are what we
associate with “averages”
• We will discuss two measures: the sample mean and
the sample median
Sample Mean or Arithmetic Average
• The sample mean equals the sum of the
observations divided by the number of
observations.
• It is defined symbolically via
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{n} (X_1 + X_2 + \cdots + X_n)$
Properties of the Sample Mean
• “Center of Gravity”
• Sum of the deviations of the observations from the
mean is always zero (barring rounding errors)
• The sample mean can, however, be affected
drastically by extreme values (outliers)
• The sample mean is very conducive to
mathematical analysis compared to other measures
of location
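Two of these properties are easy to check numerically. The sketch below uses a small hypothetical data set (not from the lecture) chosen so that the arithmetic is exact.

```python
from statistics import mean

data = [2, 4, 6, 8]                 # hypothetical values, for illustration only
xbar = mean(data)                   # sample mean

# Deviations of the observations from the mean sum to zero.
deviations = [x - xbar for x in data]
print(xbar, sum(deviations))

# Adding a single extreme value pulls the mean far from the bulk of the data.
with_outlier = data + [100]
print(mean(with_outlier))
```

With data whose mean is not exactly representable in floating point, the sum of the deviations may come out as a tiny nonzero number, which is the rounding error mentioned in the list above.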
Illustration
• Consider the systolic blood pressure data set
from Lecture 01
• Sample Size = n = 30
• Data: 122, 135, 110, 126, 100, 110, 110, 126, 94,
124, 108, 110, 92, 98, 118, 110, 102, 108, 126,
104, 110, 120, 110, 118, 100, 110, 120, 100, 120,
92
Sample Mean Computation
$\sum_{i=1}^{30} X_i = 122 + 135 + \cdots + 92 = 3333$
$\bar{X} = \frac{3333}{30} = 111.1$
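The same computation can be reproduced in a few lines of Python. This is a minimal sketch using the 30 systolic blood pressures listed in the illustration; the variable names are illustrative.

```python
# The 30 systolic blood pressures from the illustration.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124,
      108, 110, 92, 98, 118, 110, 102, 108, 126, 104,
      110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

n = len(bp)          # sample size
total = sum(bp)      # sum of the observations
xbar = total / n     # sample mean = sum divided by sample size

print(f"n = {n}, sum = {total}, mean = {xbar:.1f}")
```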
• This value of 111.1 could be interpreted as the
balancing point of the 30 systolic blood pressure
observations.
• Locating this in the histogram we have:
Sample Mean in Histogram
[Figure: relative frequency histogram (in %) of the systolic blood pressures, with the sample mean located on it.]
Sample Median
• Sample median (M) = value that divides the
arranged/ordered data set into two equal parts.
• At least 50% of the observations are <= M and at least 50% are >= M
• Not sensitive to outliers, but harder to deal with
mathematically
• Appropriate when the histogram is left- or right-skewed
• In practice, it is better to present both the mean and the median
Illustration of Computation of Median
• Consider again the blood pressure data from earlier.
• n=30: an even number.
• The median is the average of the 15th and 16th
observations in the arranged data.
• Arranged data: 92, 92, 94, 98, 100, 100, 100, 102,
104, 108, 108, 110, 110, 110, 110, 110, 110, 110,
110, 118, 118, 120, 120, 120, 122, 124, 126, 126,
126, 135
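The rule for the median can be sketched directly: sort the data, then take the middle observation when n is odd or average the two middle observations when n is even. The code below applies it to the blood pressure data and compares the result with `statistics.median` from the Python standard library; the variable names are illustrative.

```python
from statistics import median as library_median

bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124,
      108, 110, 92, 98, 118, 110, 102, 108, 126, 104,
      110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

ordered = sorted(bp)     # arranged data
n = len(ordered)

if n % 2 == 0:
    # Even n: average the (n/2)th and (n/2 + 1)th ordered observations.
    # List indices start at 0, so these sit at positions n//2 - 1 and n//2.
    m = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd n: the single middle observation.
    m = ordered[n // 2]

print(f"15th = {ordered[14]}, 16th = {ordered[15]}, median = {m}")
print(library_median(bp) == m)
```

Here the 15th and 16th ordered observations are both 110, so the sample median is 110, slightly below the sample mean of 111.1.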
Continued ...
Relative Positions of Mean and Median
• For symmetric distributions, the mean and the
median coincide.
• For right-skewed distributions, the mean tends to be
larger than the median (mean pulled up by the large
extreme values)
• For left-skewed distributions, the mean tends to be
smaller than the median (mean pulled down by the
small extreme values)
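These relative positions can be illustrated with small hypothetical data sets (invented for this sketch, not taken from the lecture): one symmetric, one with a long right tail, and its mirror image with a long left tail.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]                     # hypothetical, symmetric about 3
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # hypothetical, one large extreme value
left_skewed = [21 - x for x in right_skewed]    # mirror image: one small extreme value

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:13s} mean = {mean(data):6.2f}  median = {median(data):6.2f}")
```

In the right-skewed set the single value 20 pulls the mean above the median, and in the mirrored set the single value 1 pulls it below, while the median barely reacts in either case.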