Chap 3. Data presentation
Frequency distribution for continuous variables
• Frequency distributions present data in a relatively compact
form, give a good overall picture, and contain information that
is adequate for many purposes, but some things can usually be
determined only from the original data.
Classes   Midpoint   Absolute    Relative        Cumulative relative
                     frequency   frequency (%)   frequency (%)
[40-50[   45           870        14.0            90.9
[50-60[   55           399         6.4            97.4
[60-70[   65           127         2.0            99.4
[70-80[   75            37         0.6           100.0
Total                 6209       100
Absolute/relative frequency
[Figure: Histogram of age. x-axis: Age (0 to 80); y-axis: absolute frequency (0 to 2500) with the matching relative-frequency scale (0% to 40.3%). Bars are labelled with their absolute and relative frequencies: 120 or 1.9%; 2255 or 36.3%; 2090 or 33.7%; 870 or 14%; 127 or 2%; 37 or 0.6%. Total: 6209 or 100%. Mean = 33.03, Std. Dev. = 12.348, N = 6209.]
Density of relative frequency
Density of relative frequencies = relative frequency / class interval (here 10).

Classes   Midpoints   Absolute      Relative          Density of relative
(ci)      (xi)        frequencies   frequencies (%)   frequencies
                      (ni)
[10-20[   15            120           1.9             0.19
[20-30[   25           2255          36.3             3.63
[30-40[   35           2090          33.7             3.37
[40-50[   45            870          14.0             1.40
[50-60[   55            399           6.4             0.64
[60-70[   65            127           2.0             0.20
[70-80[   75             37           0.6             0.06
Total                  6209         100              10
N.B.
When all classes have the same interval, we can use any of:
• Absolute frequencies
• Relative frequencies
• Density of frequency
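The three quantities are linked by simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the class limits and counts are copied from the age table, the total N = 6209 is taken from its last row, and the function name `density_table` is a hypothetical helper, not part of the course material.

```python
# Class limits and absolute frequencies copied from the age table.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
counts = [120, 2255, 2090, 870, 399, 127, 37]
TOTAL = 6209  # assumption: N is read from the "Total" row of the table


def density_table(classes, counts, total):
    """Return (low, high, n_i, relative %, density) for each class."""
    rows = []
    for (low, high), n_i in zip(classes, counts):
        relative = 100 * n_i / total          # relative frequency in %
        density = relative / (high - low)     # relative frequency per unit of interval
        rows.append((low, high, n_i, round(relative, 1), round(density, 2)))
    return rows


rows = density_table(classes, counts, TOTAL)
for low, high, n_i, relative, density in rows:
    print(f"[{low}-{high}[  {n_i:5d}  {relative:5.1f}  {density:5.2f}")
```

Because every class here has the same interval (10), the three columns differ only by a constant factor, so the histogram has the same shape whichever one is plotted.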
3.2. Graphs
• Frequency distributions can often be displayed effectively using
graphs or diagrams.
• Diagrams give a very clear picture of data.
• The relationship between numbers of various magnitudes can
usually be seen more quickly and easily from a graph than from a
table.
• They are more attractive and make comparison easier.
• But a diagram should not be used when comparison is either not
possible or not necessary.
• Diagrammatic representation is not an alternative to tabulation.
• A diagram gives only an approximate idea, so it is not suitable
where greater accuracy is needed.
Histogram
• For quantitative continuous data.
• Put the observations in ascending order
• Take a number of classes close to √N tot (the square root of the total number of observations)
• Define classes [1-2] [3-4] or [1-2[ [2-3[...
• Calculate the frequency (absolute, relative, cumulative) or
the frequency density for each class
• Draw a rectangle for each class.
• The base of the rectangle = the interval of the class
• The height of each rectangle gives the frequency (or the frequency density) of the class
• The area of the rectangle is proportional (not necessarily
equal) to the number of observations in that class
• The total area equals 100% of all observations
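The rule that area, not height, carries the frequency matters most when classes have unequal intervals. The sketch below uses hypothetical classes and counts (invented for illustration only) and a hypothetical helper `rectangles`; it takes the density of relative frequency as the height, so that the areas add up to 100%.

```python
# Hypothetical classes with unequal intervals (illustration only).
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
counts = [20, 30, 30, 20]


def rectangles(classes, counts):
    """Return base, height and area of each histogram rectangle."""
    total = sum(counts)
    rects = []
    for (low, high), n_i in zip(classes, counts):
        base = high - low
        relative = 100 * n_i / total      # relative frequency in %
        height = relative / base          # density of relative frequency
        rects.append({"base": base, "height": height, "area": base * height})
    return rects


rects = rectangles(classes, counts)
total_area = sum(r["area"] for r in rects)
for r in rects:
    print(r)
print("total area:", total_area)
```

In this example the classes [10-20[ and [20-40[ hold the same number of observations, but the wider class gets a rectangle half as high, so both areas are equal.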
[Figure: Histogram of a continuous variable. Left axis: frequency (0 to 2500); right axis: density of relative frequencies (0 to 3.5).]
Exercise n° 3
9 15 15 7 11 12 14 10 11 8
8 11 11 14 8 10 11 11 10 11
7 15 12 6 14 9 15 8 8 14
15 10 11 13 11 11 15 12 15 10
11 9 8 13 9 8 13 14 15 15
10 10 7 15 15 7 14 9 3 10
15 10 15 8 15 8 14 9 6 13
12 11 9 9 13 14 8 13 8 5
Make a table of 10 classes of equal interval (0-2; 2-4; 4-6; … 18-20) giving the absolute,
relative and cumulative frequencies and the density of relative frequencies.
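One possible way to build the table requested in this exercise is sketched below. The 80 values are copied from the exercise; the helper `frequency_table` is hypothetical, and the sketch assumes that each class includes its lower limit and excludes its upper limit, as in the [a-b[ notation used earlier.

```python
marks = [
    9, 15, 15, 7, 11, 12, 14, 10, 11, 8,
    8, 11, 11, 14, 8, 10, 11, 11, 10, 11,
    7, 15, 12, 6, 14, 9, 15, 8, 8, 14,
    15, 10, 11, 13, 11, 11, 15, 12, 15, 10,
    11, 9, 8, 13, 9, 8, 13, 14, 15, 15,
    10, 10, 7, 15, 15, 7, 14, 9, 3, 10,
    15, 10, 15, 8, 15, 8, 14, 9, 6, 13,
    12, 11, 9, 9, 13, 14, 8, 13, 8, 5,
]


def frequency_table(values, width=2, upper=20):
    """Build one row per class [low, low + width[ from 0 up to `upper`."""
    n = len(values)
    table, cumulative = [], 0
    for low in range(0, upper, width):
        count = sum(low <= v < low + width for v in values)  # absolute frequency
        cumulative += count
        relative = 100 * count / n                           # relative frequency in %
        table.append({
            "class": f"[{low}-{low + width}[",
            "absolute": count,
            "relative": relative,
            "cumulative": cumulative,
            "density": relative / width,
        })
    return table


table = frequency_table(marks)
for row in table:
    print(row)
```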
• The distance of each plotted point above the base-line indicates its
numerical value.
[Figure: Bar diagram of marital status. x-axis: Marital status (Single, Married, Divorced, Widowed); y-axis: 0 to 30.]
C. Component (or sub-divided) Bar Diagram
• Bars are sub-divided into component parts of
the figure.
• These sorts of diagrams are constructed when
each total is built up from two or more
component figures.
Component bar diagram
4. Pie-chart
• For displaying the relative frequency distribution of
qualitative or quantitative discrete data
• It is a circle divided into sectors so that the areas of the
sectors are proportional to the frequencies.
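Since the sectors must be proportional to the frequencies, the angle of each sector is its relative frequency multiplied by 360°. A minimal sketch, using hypothetical counts for four marital-status categories (the numbers are invented for illustration) and a hypothetical helper `sector_angles`:

```python
# Hypothetical counts (illustration only, not taken from the course data).
counts = {"Single": 30, "Married": 45, "Divorced": 15, "Widowed": 10}


def sector_angles(counts):
    """Return the angle in degrees of each pie-chart sector."""
    total = sum(counts.values())
    return {label: 360 * n / total for label, n in counts.items()}


angles = sector_angles(counts)
for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```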
3.3. Summarizing data
Example
Example 1: The systolic blood pressures of seven
patients were as follows:
151, 124, 132, 170, 146, 124 and 113.
The mean is
$$\bar{x} = \frac{151 + 124 + 132 + 170 + 146 + 124 + 113}{7} = \frac{960}{7} = 137.14$$
$$\bar{x} = \frac{\sum x}{n}$$
where $\sum$ means "the sum of".
Example 2. Marks out of 20 for 20 students:
15 7 12 10 8
11 14 10 11 11
15 6 9 8 14
16 13 11 12 10
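Both examples can be checked with the same one-line formula, the sum of the values divided by their number. The sketch below copies the two data sets from the examples; the function name `arithmetic_mean` is only illustrative.

```python
blood_pressure = [151, 124, 132, 170, 146, 124, 113]
marks = [15, 7, 12, 10, 8,
         11, 14, 10, 11, 11,
         15, 6, 9, 8, 14,
         16, 13, 11, 12, 10]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


bp_mean = arithmetic_mean(blood_pressure)
marks_mean = arithmetic_mean(marks)
print(f"Blood pressure: {bp_mean:.2f}")
print(f"Marks: {marks_mean:.2f}")
```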
2. Median:
Position: 1 2 3  4  5
Value:    7 8 10 12 15
n = 5
Position of the median = 3
Value of the median = 10
N.B.: Median = 50th percentile = P50
Example 1 – n is odd
The reordered systolic blood pressure data seen earlier are:
113, 124, 124, 132, 146, 151, 170.
The median is the middle (4th) value, 132.
Example 2 – n is even
Six men with high cholesterol participated in a study to
investigate the effects of diet on cholesterol level. At the
beginning of the study, their cholesterol levels (mg/dL) were as
follows:
366, 327, 274, 292, 274 and 230.
Rearrange the data in numerical order as follows:
230, 274, 274, 292, 327, 366.
The median is halfway between the middle two readings, i.e.
(274 + 292) / 2 = 283.
Two men have the same cholesterol level: the mode is 274.
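The odd and even cases can be checked with Python's standard `statistics` module, which sorts the data and averages the two middle values when n is even. The data are copied from the two examples; this is only a verification sketch.

```python
from statistics import median, mode

blood_pressure = [151, 124, 132, 170, 146, 124, 113]   # n = 7 (odd)
cholesterol = [366, 327, 274, 292, 274, 230]           # n = 6 (even)

bp_median = median(blood_pressure)        # middle value of the sorted data
chol_median = median(cholesterol)         # mean of the two middle values
chol_mode = mode(cholesterol)             # most frequent value

print(sorted(blood_pressure), "median =", bp_median)
print(sorted(cholesterol), "median =", chol_median, "mode =", chol_mode)
```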
1. Mode: the value or class with the highest
frequency in the sample / population
Marks out of 20 for 20 students (MCQ):
15 7 12 10 8
11 14 10 11 11
15 6 9 8 14
16 13 11 12 10
Mode = 11
For a continuous variable: modal class
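Examples of unimodal distributions (one mode)
Symmetric unimodal distribution
[Figure: symmetric curve; the mean and the median coincide.]
Unimodal distribution with negative skewness
[Figure: curve with a long left tail; the mean lies to the left of the median.]
Unimodal distribution with positive skewness
[Figure: curve with a long right tail; the median lies to the left of the mean.]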
Skewness
• If extremely low or extremely high
observations are present in a distribution,
the mean tends to shift towards those
scores.
• Based on the type of skewness, distributions
can be:
a) Negatively skewed distribution: occurs when the
majority of scores are at the right end of the curve
and a few small scores are scattered at the left end.
b) Positively skewed distribution: occurs when the
majority of scores are at the left end of the curve
and a few extremely large scores are scattered at the
right end.
c) Symmetrical distribution: it is neither positively nor
negatively skewed. A curve is symmetrical if one half
of the curve is the mirror image of the other half.
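The pull of extreme observations on the mean can be seen by comparing it with the median. The three small data sets below are hypothetical (invented for illustration), and reading the sign of mean minus median as the direction of skewness is only a rule of thumb, not a formal skewness coefficient.

```python
from statistics import mean, median

# Hypothetical data sets (illustration only).
symmetric = [8, 9, 10, 11, 12]
high_outlier = [8, 9, 10, 11, 42]     # one extremely high observation
low_outlier = [-22, 9, 10, 11, 12]    # one extremely low observation


def mean_minus_median(values):
    """Positive when the mean is pulled to the right, negative when pulled to the left."""
    return mean(values) - median(values)


for name, data in [("symmetric", symmetric),
                   ("high outlier", high_outlier),
                   ("low outlier", low_outlier)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}, "
          f"difference = {mean_minus_median(data)}")
```

The median stays at 10 in all three cases, while the mean moves towards the extreme value.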
c) Geometric mean
1. Range:
The difference between the maximum and the
minimum value in the data set
Range = Max – Min
E.g. data: -4 -3 -1 1 3 5
Range = 5 – (-4) = 9
• easy to calculate
• useful for “best” or “worst” case scenarios
• sensitive to extreme values
2. Variance: the mean of squared deviations from
the mean. It is always positive or zero.
Population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $N$ is the population size.
Sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
E.g. data: 4 3 1 2 3 5
Mean: 18/6 = 3
Squared deviations from the mean: 1 0 4 1 0 4
Sum of squared deviations from the mean: 10
Variance: s² = 10/5 = 2
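The small variance example can be reproduced step by step, and the result compared with Python's standard `statistics` module (`variance` divides by n − 1, `pvariance` divides by N). This is a verification sketch using the six values of the example.

```python
from statistics import pvariance, variance

data = [4, 3, 1, 2, 3, 5]

n = len(data)
mean = sum(data) / n                                  # 18 / 6
squared_deviations = [(x - mean) ** 2 for x in data]  # (x_i - mean)^2
sum_of_squares = sum(squared_deviations)

sample_variance = sum_of_squares / (n - 1)   # divide by n - 1 for a sample
population_variance = sum_of_squares / n     # divide by N for a population

print("squared deviations:", squared_deviations)
print("sample variance:", sample_variance, "check:", variance(data))
print("population variance:", population_variance, "check:", pvariance(data))
```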
3. Standard Deviation
• The sample standard deviation, s, is the square root
of the variance
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Example
Data          Deviation     Deviation²
151            13.86          192.02
124           -13.14          172.73
132            -5.14           26.45
170            32.86         1079.59
146             8.86           78.45
124           -13.14          172.73
113           -24.14          582.88
Sum = 960.0   Sum = 0.00    Sum = 2304.86
$\bar{x} = 137.14$
Example (contd.)
$$\sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86$$
Therefore,
$$s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6$$
4. Coefficient of Variation
• In some cases the variance of a variable changes with its mean
• The coefficient of variation (CV) or relative standard deviation (RSD) is a measure of relative
variability.
• It is the ratio of data dispersion (standard deviation) to the mean and shows the extent of
variability in relation to the mean
$$CV = \frac{s}{\bar{x}} \times 100\%$$
• The CV is not affected by multiplicative changes in scale
• Consequently, it is a useful way of comparing the dispersion of variables
independently of the unit in which the measurements were taken
• Generally small values of CV are considered best, since they mean that the variability in
measurements is small relative to their mean (measurements are consistent in their
magnitudes).
• i.e. the higher the CV, the greater the dispersion
Example
The CV of the blood pressure data is:
$$CV = \frac{19.6}{137.1} \times 100\% = 14.3\%$$
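The standard deviation and CV of the blood pressure data can be checked as follows. The sketch also illustrates the claim that the CV is not affected by a multiplicative change of scale, by converting the values with an approximate mmHg to kPa factor; the conversion is an added illustration, not part of the original example.

```python
from statistics import mean, stdev

blood_pressure = [151, 124, 132, 170, 146, 124, 113]   # mmHg
KPA_PER_MMHG = 0.133322  # approximate conversion factor, used only for illustration


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


s = stdev(blood_pressure)                       # sample SD (divides by n - 1)
cv = coefficient_of_variation(blood_pressure)
cv_kpa = coefficient_of_variation([x * KPA_PER_MMHG for x in blood_pressure])

print(f"s = {s:.1f}, CV = {cv:.1f}%")
print(f"CV after rescaling to kPa = {cv_kpa:.1f}%")
```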
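5. Inter-quartile range
• The median divides a distribution into two halves.
• The first and third quartiles (denoted Q1 and Q3) are defined
as follows:
– 25% of the data lie below Q1 (and 75% lie above Q1),
– 25% of the data lie above Q3 (and 75% lie below Q3)
• The inter-quartile range is the difference Q3 – Q1; it covers the middle 50% of the data.
[Figure: distribution divided into quarters by Q1, Q2 (the median) and Q3.]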
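As an illustration, the quartiles of the blood pressure data seen earlier can be obtained with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Several conventions exist for locating quartiles in small samples, so other textbooks or programs may give slightly different values; the sketch below uses the function's default "exclusive" method, which is an assumption and not necessarily the convention of this course.

```python
from statistics import quantiles

blood_pressure = [151, 124, 132, 170, 146, 124, 113]

# n=4 cuts the data into four equal parts and returns the three cut points.
q1, q2, q3 = quantiles(blood_pressure, n=4)
iqr = q3 - q1   # inter-quartile range

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr}")
```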
Exercise
In one class, the marks (out of 20) obtained in biostatistics by a
sample of students are as follows:
Mean 14
Mode 14
Median 13.5
Variance 14.67
Std dev 3.83
Range 11
2.3.2. Box-plots
• A box-plot is a visual description of the
distribution based on
– Minimum
– Q1
– Median
– Q3
– Maximum
• Useful for comparing large sets of data
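The five values on which a box-plot is based can be computed as in the sketch below, here for the six cholesterol readings used in the median example. The helper `five_number_summary` is hypothetical, and the quartiles come from the default "exclusive" method of `statistics.quantiles`, which is only one of several conventions in use.

```python
from statistics import quantiles

cholesterol = [366, 327, 274, 292, 274, 230]   # mg/dL


def five_number_summary(values):
    """Return the minimum, Q1, median, Q3 and maximum of the data."""
    q1, med, q3 = quantiles(values, n=4)   # assumption: default quartile method
    return {"min": min(values), "Q1": q1, "median": med, "Q3": q3, "max": max(values)}


summary = five_number_summary(cholesterol)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The box is then drawn from Q1 to Q3 with a line at the median; how far the whiskers extend depends on the convention chosen.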
Building a box plot
1. Calculate important values
Example 1: Box-plot
Remarks
• The box is always limited by Q1 and Q3
• But the whiskers can represent several things according to
different authors/programs:
– the minimum and the maximum
– the low and high adjacent values
– one standard deviation above and below the mean
http://en.wikipedia.org/wiki/Box_plot
QUIZ 1/ 2 marks