Summarizing Data-Measures of Dispersion
Dispersion
Prepared by:
Dr Vijendra Singh
Department of Informatics
School of Computer Science, UPES
Numerically Summarizing Data
Dispersion
Measures of Variability
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Interquartile Range
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
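As a rough illustration of the interquartile range for the 70 observations above, the sketch below uses `statistics.quantiles` from Python's standard library. The variable name `data` is a placeholder, and the quartile convention is an assumption: textbooks and software packages interpolate quartiles in slightly different ways, so other data sets may give slightly different results under other conventions.

```python
import statistics

# The 70 observations listed above (already sorted).
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
# The default method ("exclusive") is one convention among several.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The interquartile range is the spread of the middle 50% of the data.
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For this data set the quartiles come out as 445 and 525, which agrees with the values quoted in the skewness example.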
Variance
The variance is denoted $s^2$ for a sample and $\sigma^2$ for a population.
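The slide only gives the notation, so the following is a minimal sketch of the usual definitions, not a formula taken from the slides: the sample variance divides the sum of squared deviations by n - 1, and the population variance divides by n. It uses Python's standard `statistics` module on the 70 observations listed earlier; the variable names are placeholders.

```python
import statistics

# The 70 observations listed earlier in the slides.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(data)

# Sample variance s^2: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(data)

# Population variance sigma^2: the same sum divided by n.
population_var = statistics.pvariance(data)

# The standard deviation is the square root of the variance.
sample_sd = statistics.stdev(data)

print(f"mean = {mean:.2f}")
print(f"s^2 = {sample_var:.2f}, sigma^2 = {population_var:.2f}")
print(f"s = {sample_sd:.2f}")
```

Treating the data as a sample gives a mean of 490.80 and s of about 54.74, the values used in the z-score example.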
Coefficient of Variation
Distribution Shape
z-Scores
Detecting Outliers
Distribution Shape: Skewness
An important measure of the shape of a distribution
is called skewness.
The formula for computing skewness for a data set is
somewhat complex.
Skewness can be easily computed using statistical
software.
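To make the point about software concrete, here is a minimal sketch in plain Python. It assumes the adjusted Fisher-Pearson standardised moment coefficient is the skewness measure wanted; other definitions exist and give different numbers. The function name `adjusted_skewness` and the variable `data` are hypothetical names chosen for this illustration.

```python
import math

# The 70 observations listed earlier in the slides.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def adjusted_skewness(values):
    """Adjusted Fisher-Pearson standardised moment coefficient (assumed form)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    # Sum of cubed standardised deviations, scaled by n / ((n - 1)(n - 2)).
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


sk = adjusted_skewness(data)
print(f"skewness = {sk:.2f}")
```

A positive result indicates a longer right tail; a perfectly symmetric data set gives zero.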
Skewness: a bit of history
Skewness (adjusted Fisher-Pearson standardised moment coefficient):
$$sk = \frac{T}{(T-1)(T-2)} \sum_{i=1}^{T} \left( \frac{x_i - \bar{x}}{s} \right)^3$$
Here $T$ is the number of observations, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.
In the quartile-based measure of skewness, Q1 is the lower quartile, Q3 is the upper quartile, and Q2 is the median.
Skewness Example
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
2nd Quartile (Q2) = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
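The quartile-based formula itself did not survive in these slides. Assuming the intended measure is the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), the three quartiles quoted above can be plugged in as follows. The function name `quartile_skewness` is hypothetical, and the choice of formula is an assumption, not something the slides confirm.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile (Bowley) skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Assumed formula; the slides name the quartiles but not the expression.
    """
    if q3 == q1:
        raise ValueError("Q3 equals Q1; the coefficient is undefined")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Quartiles quoted in the example above.
sk_q = quartile_skewness(q1=445, q2=475, q3=525)
print(sk_q)  # 0.25
```

The coefficient is positive because the median sits closer to Q1 than to Q3, which points to a right-skewed distribution.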
Distribution Shape: Skewness
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; Skewness = .31]
Distribution Shape: Skewness
[Figure: relative frequency histogram; Skewness = 1.25]
z-Scores
$$z_i = \frac{x_i - \bar{x}}{s}$$
Population z-score: $z = (x - \mu) / \sigma$
Sample z-score: $z = (x - \bar{x}) / s$
z-Scores
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$
Lisa Leslie:
z = (77 - 63.7)/2.7 ≈ 4.93
Because O'Neal's z-score is greater than Lisa's z-score, we say that O'Neal holds a higher position than Lisa relative to their respective groups.
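The two z-score calculations above can be reproduced with a small helper. This is only a sketch: the function name `z_score` is hypothetical, and the means and standard deviations are the ones quoted in the slides. O'Neal's figures are not given in the surviving text, so they are left out.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Smallest observation in the data set, with the sample mean and s from the slides.
z_low = z_score(425, 490.80, 54.74)

# Lisa Leslie's value, with the group mean and standard deviation from the slides.
z_leslie = z_score(77, 63.7, 2.7)

print(f"{z_low:.2f}")     # -1.20
print(f"{z_leslie:.2f}")  # 4.93
```

Because a z-score is unit-free, values from groups with different means and spreads can be compared directly, which is what the comparison between the two players relies on.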
Empirical Rule
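For data having a bell-shaped distribution:
approximately 68% of the data lie within 1 standard deviation of the mean;
approximately 95% of the data lie within 2 standard deviations of the mean;
approximately 99.7% of the data lie within 3 standard deviations of the mean.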
(a) Compute the population mean and standard
deviation.
(b) Draw a histogram to verify the data is bell-
shaped.
(c) Determine the percentage of patients that have
serum HDL within 3 standard deviations of the
mean according to the Empirical Rule.
(d) Determine the percentage of patients that have
serum HDL between 34 and 80.8 according to the
Empirical Rule.
(e) Determine the actual percentage of patients
that have serum HDL between 34 and 80.8.
(a) Using a TI-83 Plus graphing calculator, we find
$\mu = 57.4$ and $\sigma = 11.7$
(d) Because 34 is 2 standard deviations below the
mean (57.4 - 2(11.7) = 34) and 80.8 is 2 standard
deviations above the mean (57.4 + 2(11.7) = 80.8),
the Empirical Rule states that approximately 95% of
the data will lie between 34 and 80.8.
(e) There are no observations below 34. There are
2 observations greater than 80.8. Therefore, 52/54
= 96.3% of the data lie between 34 and 80.8.
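The arithmetic in parts (d) and (e) can be checked with a short sketch. It assumes the usual Empirical Rule percentages (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations) and uses the mean and standard deviation reported in part (a). The names `interval` and `EMPIRICAL_PCT` are hypothetical, and the raw HDL observations are not reproduced here, so only the counts quoted in part (e) are used.

```python
mu, sigma = 57.4, 11.7  # population mean and standard deviation from part (a)

# Approximate share of a bell-shaped distribution within k standard deviations.
EMPIRICAL_PCT = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(k):
    """Return (lower, upper) bounds lying k standard deviations around the mean."""
    return mu - k * sigma, mu + k * sigma


for k, pct in EMPIRICAL_PCT.items():
    lower, upper = interval(k)
    print(f"within {k} sd: {lower:.1f} to {upper:.1f} (about {pct}%)")

# Part (e): 52 of the 54 observations fall between 34 and 80.8.
actual_pct = 100 * 52 / 54
print(f"actual: {actual_pct:.1f}%")
```

The observed 96.3% is close to the 95% the rule predicts, which is consistent with the data being roughly bell-shaped.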
Detecting Outliers