Chapter 4

1. This document discusses various measures of dispersion used to quantify how spread out or clustered data values are around a central point. 2. Absolute measures of dispersion like range, interquartile range, and standard deviation have units, while relative measures like coefficient of variation are unit-less. 3. Common measures include range, interquartile range, mean deviation, variance, and standard deviation. Boxplots provide a visual representation of a data set's spread and outliers.

Uploaded by

Javeria Naseem
Copyright
© © All Rights Reserved
Measures of Dispersion
Suppose we compare two different data sets (for now, measured in the same units, e.g. kg or km). By chance, the two data sets may have the same mean, median or mode.
Does this mean that the two data sets are the same, or that they have the same features?
No.
Here we need some extra insight into the data. As a first step, we measure their respective dispersions (variabilities) about the center and then compare them.
What are Absolute Dispersion and Relative Dispersion?

• Absolute and relative dispersion are two different ways to measure the spread of a data set. They are used extensively in biological statistics, as biological phenomena almost always show some variation and spread.
• The easiest way to tell relative dispersion from absolute dispersion is to check whether the statistic involves units. Absolute measures always have units, while relative measures do not.
Most commonly used absolute measures of dispersion

1. Range
2. Mid-range
3. Inter-quartile Range (also called the fourth spread)
4. Semi-inter-quartile Range (or Quartile Deviation)
5. Mean Deviation
6. Variance
7. Standard Deviation
1: Range

• The range R is defined as the difference between the largest and smallest observations in a set of data.
Symbolically, it is given by the relation
$$R = X_m - X_0$$
where
$X_m$ stands for the largest observation and
$X_0$ stands for the smallest observation.
Mid-range

• It is just the average of the two extreme values, i.e.
$$\text{mid-range} = \frac{\text{max value} + \text{min value}}{2} = \frac{X_m + X_0}{2}$$
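As a small illustration of the two definitions above, the sketch below computes the range and the mid-range in Python. The function names are hypothetical, and the data are borrowed from the mean-deviation example later in this chapter.

```python
def data_range(values):
    """Range R = X_m - X_0 (largest minus smallest observation)."""
    return max(values) - min(values)


def mid_range(values):
    """Mid-range = (X_m + X_0) / 2 (average of the two extremes)."""
    return (max(values) + min(values)) / 2


# Data borrowed from the mean-deviation example in this chapter.
data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

print(data_range(data))  # 92 - 14 = 78
print(mid_range(data))   # (92 + 14) / 2 = 53.0
```

Both measures depend only on the two extreme observations, so a single unusual value changes them directly.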
Inter-quartile Range

• The interquartile range is a measure of spread and is defined as the difference between the third and first quartiles.
• It is denoted by IQR, and symbolically
$$\text{IQR} = Q_3 - Q_1$$
Quartile Deviation

• The quartile deviation is a measure of spread. It is denoted by Q.D., and symbolically
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
It is also called the semi-interquartile range (SIQR) because it is just half of the IQR.
Co-efficient of Quartile Deviation

• The pure measure (free of units of measurement) is the co-efficient of quartile deviation.
• It is defined as
$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
• This measure is free of measurement units and can be used to compare two or more data sets with different units of measurement.
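The three quartile-based measures can be sketched together in Python. The slides do not say which quartile convention to use, so this sketch assumes the standard library's `statistics.quantiles` with `method="inclusive"`. Other conventions, including the "fourths" used in the boxplot section, can give slightly different values of Q1 and Q3. The function name `quartile_measures` is hypothetical.

```python
import statistics


def quartile_measures(values):
    """Return Q1, Q3, IQR, Q.D. and the co-efficient of Q.D.

    Assumption: quartiles use the 'inclusive' interpolation rule of
    statistics.quantiles; the slides do not fix a convention.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,                  # Q3 - Q1
        "QD": iqr / 2,               # (Q3 - Q1) / 2
        "coef_QD": iqr / (q3 + q1),  # (Q3 - Q1) / (Q3 + Q1), unit-free
    }


# Data borrowed from the mean-deviation example in this chapter.
data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
measures = quartile_measures(data)
print(measures["Q1"], measures["Q3"])    # 50.0 76.0
print(measures["IQR"], measures["QD"])   # 26.0 13.0
print(round(measures["coef_QD"], 3))     # 0.206
```

The first four entries carry the units of the data, while `coef_QD` is a pure number.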
Mean Deviation:

• The mean (or median) deviation (MD), also called the mean absolute deviation (MAD), is a measure of dispersion defined as the average of the absolute deviations of the data values from the data center (usually the mean or the median).
• Mathematically, using the mean $\bar{x}$ as the data center:
Mean deviation from mean

For ungrouped data:
$$\text{M.D.} = \frac{\sum |x_i - \bar{x}|}{n}$$

For grouped data:
$$\text{M.D.} = \frac{\sum f_i |x_i - \bar{x}|}{n}, \qquad n = \sum f_i$$
Mean deviation from median:
Here $\tilde{x}$ denotes the median.
For ungrouped data:
$$\text{M.D. (median)} = \frac{\sum |x_i - \tilde{x}|}{n}$$

For grouped data:
$$\text{M.D. (median)} = \frac{\sum f_i |x_i - \tilde{x}|}{n}$$
Example
• Find the MD and MedD for the following simple data.
• 65 55 89 56 35 14 56 55 87 45 92
Solution:

• Let us denote the data by X.
• What we need first are the mean and the median. The mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{65 + 55 + \dots + 92}{11} = \frac{649}{11} = 59$$
Since n is odd, the median is just the middle observation of the ordered data,
14 35 45 55 55 56 56 65 87 89 92
hence the median is $\tilde{x} = 56$.
$$\text{M.D.} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{194}{11} = 17.6$$
$$\text{M.D. (median)} = \frac{\sum |x_i - \tilde{x}|}{n} = \frac{185}{11} = 16.8$$
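The ungrouped calculation above can be reproduced with a short Python sketch. The helper name `mean_deviation` is hypothetical; it takes the data and whichever center (mean or median) is wanted.

```python
import statistics


def mean_deviation(values, center):
    """Average absolute deviation of the values from the given center."""
    return sum(abs(x - center) for x in values) / len(values)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

mean = statistics.mean(data)      # 649 / 11 = 59
median = statistics.median(data)  # middle of the 11 ordered values = 56

md_mean = mean_deviation(data, mean)      # 194 / 11
md_median = mean_deviation(data, median)  # 185 / 11

print(round(md_mean, 1))    # 17.6
print(round(md_median, 1))  # 16.8
```

The deviation from the median is the smaller of the two, which is expected because the median minimises the sum of absolute deviations.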
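Example
• Find the MD and MedD for the following grouped data.

x: 14 35 45 55 56 65 87 89 92
f:  4  7 11 13 18 13  8  6  3
• Solution: Again, first we need the mean and the median to calculate the necessary columns. With $n = \sum f_i = 83$, the mean is
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{4870}{83} = 58.7$$
and the median is
$$\tilde{x} = 56$$
$$\text{M.D.} = \frac{\sum f_i |x_i - \bar{x}|}{n} = \frac{1182}{83} = 14.2$$
$$\text{M.D. (median)} = \frac{\sum f_i |x_i - \tilde{x}|}{n} = \frac{1120}{83} = 13.5$$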
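The grouped example can be checked in the same way. The sketch below assumes the table lists discrete values with their frequencies (not class intervals), so the median is found by expanding the table. All function names are hypothetical.

```python
import statistics


def grouped_mean(xs, fs):
    """Mean of a frequency table: sum(f_i * x_i) / n with n = sum(f_i)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def grouped_median(xs, fs):
    """Median of a discrete frequency table, found by expanding it."""
    expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
    return statistics.median(expanded)


def grouped_mean_deviation(xs, fs, center):
    """sum(f_i * |x_i - center|) / n with n = sum(f_i)."""
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)


xs = [14, 35, 45, 55, 56, 65, 87, 89, 92]
fs = [4, 7, 11, 13, 18, 13, 8, 6, 3]

mean = grouped_mean(xs, fs)
median = grouped_median(xs, fs)

print(sum(fs))                                           # 83
print(round(mean, 1), median)                            # 58.7 56
print(round(grouped_mean_deviation(xs, fs, mean), 1))    # 14.2
print(round(grouped_mean_deviation(xs, fs, median), 1))  # 13.5
```

The code uses the unrounded mean, so its numerator differs slightly from the 1182 obtained with the rounded mean 58.7, but the result still rounds to 14.2.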
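Variance
• Variance is defined as the mean of the squared deviations of all the observations from the mean.
• The population variance is denoted by $\sigma^2$.
• The sample variance is denoted by $S^2$ or $\hat{\sigma}^2$.

Mathematically,
• For small samples (n ≤ 30):
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \quad \text{for ungrouped data}$$
$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} \quad \text{for grouped data}$$
Note: dividing by n - 1 makes $S^2$ an unbiased estimator of $\sigma^2$. If $S^2$ is calculated for every possible sample, the mean of all those values equals $\sigma^2$, i.e. $E(S^2) = \sigma^2$.

• For large samples (n > 30):
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n} \quad \text{for ungrouped data}$$
$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n} \quad \text{for grouped data}$$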
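To make the role of the divisor concrete, the sketch below computes the variance of an ungrouped data set with both divisors. The function name and its `sample` flag are hypothetical; the data are the 11 observations from the mean-deviation example, reused only for illustration.

```python
import statistics


def variance(values, sample=True):
    """Variance of ungrouped data.

    sample=True  divides by n - 1 (the small-sample formula above),
    sample=False divides by n     (the large-sample formula above).
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1 if sample else n)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

s2_n_minus_1 = variance(data, sample=True)  # 5656 / 10
s2_n = variance(data, sample=False)         # 5656 / 11

print(round(s2_n_minus_1, 1))  # 565.6
print(round(s2_n, 1))          # 514.2

# Cross-check against the standard library, which uses the same divisors.
print(round(statistics.variance(data), 1))   # 565.6
print(round(statistics.pvariance(data), 1))  # 514.2
```

The two results differ by the factor n / (n - 1), which approaches 1 as n grows, so the choice of divisor matters less for large samples.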
Standard deviation (SD)

• It is a widely used measure of variability or diversity in statistics and probability theory.
• It shows how much variation or "dispersion" exists from the average (mean, or expected value).
• A low standard deviation indicates that the data points tend to be very close to the mean,
• whereas a high standard deviation indicates that the data points are spread out over a large range of values.
Formulas for SD
Note: if the observations are in cm, the variance is in cm^2 because of the squaring in the variance formula. Taking the square root brings cm^2 back to cm, so the S.D. is on the same scale as the data.

• For small samples (n ≤ 30):
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \quad \text{for ungrouped data}$$
$$S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n-1}} \quad \text{for grouped data}$$

• For large samples (n > 30):
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \quad \text{for ungrouped data}$$
$$S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}} \quad \text{for grouped data}$$
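Formulas for SD

$$S = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} \quad \text{for ungrouped data}$$

$$S = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} \quad \text{for grouped data}$$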
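The shortcut formula for ungrouped data is algebraically the same as the definitional formula with divisor n. The following sketch checks this numerically; the function names are hypothetical and the data are reused from the earlier example.

```python
import math


def sd_definitional(values):
    """S = sqrt(sum((x - mean)^2) / n), the divisor-n definition."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_shortcut(values):
    """S = sqrt(sum(x^2) / n - (sum(x) / n)^2), the computational form."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

print(math.isclose(sd_definitional(data), sd_shortcut(data)))  # True
print(round(sd_definitional(data), 2))                         # 22.68
```

The shortcut avoids computing each deviation, which is convenient by hand. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.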
A few important properties of

1. summation $\sum$
2. mean $\bar{x}$
3. variance $S^2$
Properties of summation

Properties of mean
- The sum of the deviations from the mean is equal to zero, for the same data set from which the mean was calculated.
- The sum of the squared deviations from the mean is always a minimum. (For the variance, also check whether the divisor is n or n - 1.)
- Combined mean calculation; the combined mean is denoted by $\bar{X}_c$.
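The properties of the mean noted here can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the combined-mean formula used, (n1·x̄1 + n2·x̄2) / (n1 + n2), is the standard one and is assumed here because the notes name the combined mean without stating its formula.

```python
import math


def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from an arbitrary point a."""
    return sum((x - a) ** 2 for x in values)


def combined_mean(n1, mean1, n2, mean2):
    """Assumed standard formula: (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
mean = sum(data) / len(data)  # 59.0

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(x - mean for x in data)
print(math.isclose(sum_dev, 0.0, abs_tol=1e-9))  # True

# Property 2: the sum of squared deviations is smallest about the mean.
others = [50, 56, 58, 60, 70]
print(all(sum_sq_dev(data, mean) <= sum_sq_dev(data, a) for a in others))  # True

# Property 3: the combined mean of two parts equals the overall mean.
part1, part2 = data[:5], data[5:]
xc = combined_mean(len(part1), sum(part1) / len(part1),
                   len(part2), sum(part2) / len(part2))
print(math.isclose(xc, mean))  # True
```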
Properties of variance
Box plot
• Stem-and-leaf displays and histograms convey rather general impressions about a data set,
• whereas a single summary such as the mean or standard deviation focuses on just one aspect of the data.
In recent years, a pictorial summary called a boxplot has been used successfully to describe several of a data set's most prominent features:
1. center,
2. spread,
3. the extent and nature of any departure from symmetry,
4. identification of "outliers".
Outliers are observations that lie unusually far from the main body of the data.
A box plot is the visual representation of the statistical five-number summary of a given data set.
A Five Number Summary includes:
1. Minimum value
2. First Quartile
3. Median (Second Quartile)
4. Third Quartile
5. Maximum value
Box plot
Example

• Ultrasound was used to gather the accompanying corrosion data on the thickness of the floor plate of an aboveground tank used to store crude oil. Each observation is the largest pit depth in the plate, expressed in milli-in.
• 40 52 55 60 70 75 85 85 90 90 92 94 94 95 98 100 115 125 125
• The five-number summary is as follows:
• smallest $x_i$ = 40
• lower fourth = 72.5
• median = 90
• upper fourth = 96.5
• largest $x_i$ = 125
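The five-number summary above can be reproduced with a short sketch. It assumes the "fourths" convention that matches the values on this slide: the lower fourth is the median of the smallest half of the ordered data and the upper fourth is the median of the largest half, with the overall median included in both halves when n is odd. The function name is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower fourth, median, upper fourth, max).

    Assumption: fourths are medians of the lower and upper halves,
    and the median belongs to both halves when n is odd.
    """
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2  # size of each half (median shared if n is odd)
    lower_fourth = statistics.median(s[:half])
    upper_fourth = statistics.median(s[n - half:])
    return s[0], lower_fourth, statistics.median(s), upper_fourth, s[-1]


pit_depths = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90,
              92, 94, 94, 95, 98, 100, 115, 125, 125]

print(five_number_summary(pit_depths))  # (40, 72.5, 90, 96.5, 125)
```

The fourth spread for these data is 96.5 - 72.5 = 24.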
Boxplots That Show Outliers
DEFINITION
• Any observation farther than $1.5 f_s$ from the closest fourth is an outlier, where $f_s$ is the fourth spread (upper fourth minus lower fourth). An outlier is extreme if it is more than $3 f_s$ from the nearest fourth, and it is mild otherwise.
• Even a single extreme outlier in the sample warns the investigator that inferential procedures which assume a normal population may be unreliable, and the presence of several mild outliers conveys the same message.
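The outlier rule in this definition can be sketched as a small classifier. The function name is hypothetical, and the fourths are passed in as arguments (here the values 72.5 and 96.5 from the corrosion example). The two extra observations, 140 and 200, are made up purely to show a mild and an extreme outlier; they are not part of the original data.

```python
def classify_outliers(values, lower_fourth, upper_fourth):
    """Split observations into mild and extreme outliers.

    An observation is an outlier if it lies more than 1.5 * fs beyond the
    nearest fourth, and extreme if it lies more than 3 * fs beyond it.
    """
    fs = upper_fourth - lower_fourth  # fourth spread
    mild, extreme = [], []
    for x in values:
        if x < lower_fourth:
            distance = lower_fourth - x
        elif x > upper_fourth:
            distance = x - upper_fourth
        else:
            continue  # inside the box, never an outlier
        if distance > 3 * fs:
            extreme.append(x)
        elif distance > 1.5 * fs:
            mild.append(x)
    return mild, extreme


pit_depths = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90,
              92, 94, 94, 95, 98, 100, 115, 125, 125]

# The corrosion data contain no outliers under this rule.
print(classify_outliers(pit_depths, 72.5, 96.5))  # ([], [])

# Hypothetical extra observations, added only for illustration.
print(classify_outliers(pit_depths + [140, 200], 72.5, 96.5))  # ([140], [200])
```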
Box plot
Comparative Boxplots

A comparative or side-by-side boxplot is a very effective way of revealing similarities and differences between two or more data sets consisting of observations on the same variable.
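Co-efficient of variation (CV)

$$\text{CV} = \frac{\text{SD}}{\text{mean}} = \frac{S}{\bar{x}}$$
Note: the CV can be read as a noise-to-signal ratio. Where is it applied? One area where it is used is aerospace.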
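A minimal sketch of the CV calculation follows. It assumes the sample standard deviation (divisor n - 1), which the notes do not specify, and reuses the 11 observations from the mean-deviation example. The function name is hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = SD / mean, using the sample SD (divisor n - 1) by assumption."""
    return statistics.stdev(values) / statistics.mean(values)


data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

cv = coefficient_of_variation(data)
print(round(cv, 3))           # 0.403
print(round(100 * cv, 1))     # 40.3 (the CV expressed as a percentage)
```

Because the units of the SD and the mean cancel, the CV is a relative measure and can be compared across data sets recorded in different units.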
Moments
QUESTION

• Find the first four central moments for the following data.

14 35 45 55 55 56 56 65 87 89 92
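One way to approach this question is sketched below. It assumes the usual definition of the r-th central moment with divisor n, that is, the average of (x_i - x̄)^r. The slide's own formula was not recovered, so treat this definition as an assumption. The function name is hypothetical.

```python
import math


def central_moment(values, r):
    """r-th central moment with divisor n: sum((x - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]

m1, m2, m3, m4 = (central_moment(data, r) for r in range(1, 5))

print(m1)            # 0.0 (the first central moment is always zero)
print(round(m2, 2))  # 514.18 (the variance with divisor n)
print(m3)            # -2070.0
print(round(m4, 2))  # 643942.18
```

The second central moment equals the divisor-n variance, and the negative third moment indicates a longer tail to the left of the mean for these data.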
Raw Moments

• In some texts, moments about an arbitrary value are also calculated, and the central moments are then obtained from them using certain relationships. You can also first calculate the moments about zero, called raw moments, and then calculate the central moments from them using those equations.
Those equations are presented as.
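The equations from the original slide were not recovered. The standard relationships between the raw moments $m'_r = \frac{1}{n}\sum x_i^r$ and the central moments $m_r = \frac{1}{n}\sum (x_i - \bar{x})^r$ are given below as a reference; the notation $m'_r$ and $m_r$ is an assumption and may differ from the notation used in the lecture.

```latex
\begin{aligned}
m_1 &= 0, \qquad \bar{x} = m'_1 \\
m_2 &= m'_2 - (m'_1)^2 \\
m_3 &= m'_3 - 3\,m'_1 m'_2 + 2\,(m'_1)^3 \\
m_4 &= m'_4 - 4\,m'_1 m'_3 + 6\,(m'_1)^2 m'_2 - 3\,(m'_1)^4
\end{aligned}
```

Each line follows from expanding $(x_i - \bar{x})^r$ with the binomial theorem and averaging term by term. The second line is the same identity as the shortcut formula for the variance with divisor n.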
Skewness
[Figure: graphs of positive skewness and negative skewness]

Note: if the co-efficient of skewness is near -3, the distribution is highly negatively skewed; if it is near 3, it is highly positively skewed.
Skewness and kurtosis

leptokurtic

mesokurtic

platykurtic
Practice exercises:

• Section 1.4 in Chapter 1 of J. L. Devore's Modern Mathematical Statistics with Applications.

Then solve questions 41-46, 48 and 49 in Exercise 1.4. Also, calculate the first four raw and central moments for the above questions.
