Introduction. Average and Dispersion
Data are indisputable
Data help make better decisions
Data help solve problems
Data help evaluate performance
Data help improve processes
Data help understand consumers
Data (Variable)
A variable is a characteristic of a unit being observed that may assume more than one of a set of
values to which a numerical measure or a category from a classification can be assigned.
Example: income, age, weight, “occupation”, “industry”, “disease”, etc.
Numerical Data: - A characteristic of objects in the population which can be expressed
numerically is known as numerical data.
Example: Number of Facebook friends
Categorical Data (Attribute): - A Characteristic which cannot be expressed quantitatively but can
be described qualitatively is known as categorical data or attribute.
Example: Favorite Social Media Application.
Types of Data
Ex. (1) A survey by an electric company contains questions on the following. Describe the data
implicit in these 11 items as numerical or categorical data.
a. Age of household head.
b. Sex of household head.
c. Number of people in household.
d. Use of electric heating (yes or no).
e. Number of large appliances used daily.
f. Thermostat setting in winter.
g. Average number of hours heating is on.
h. Average number of heating days.
i. Household income.
j. Average monthly electrical bill.
k. Ranking of this electric company as compared with two previous electricity suppliers.
Types of data
Discrete Data: A discrete variable takes only distinct and integer values (analogous to 'counting').
Example: number of defective items, number of students absent in statistics class.
Continuous data: A continuous variable takes any value on a range of real numbers (analogous
to measurement). Example: height of first year students, time spent on studying at home.
Ex. (2) For each of the following indicate if a discrete or a continuous random variable provides the
best definition.
(a) number of defective items in a sample of 20 items from a large shipment
(b) yearly income for a family
(c) change in price of a share of IBM common stock in a month
(d) number of errors detected in a corporation's accounts
(e) number of claims on a medical insurance policy in a particular year
(f) amount of oil imported into India in a month
(g) number of questions answered correctly in a 50-question objective examination
(h) number of nonproductive hours in an 8-hour workday.
Average: Introduction
Main Value: One of the objectives of data analysis is to obtain a single value that
describes the characteristics of the entire mass of data and can be considered
representative of it. A value satisfying this criterion is the central value or
“average”.
Central Tendency: The average is the representative or typical value of the data. It usually lies
somewhere near the center of the group, which is why averages are termed measures of
central tendency or central value.
Comparison: A large volume of data cannot be easily understood or remembered, so a single
value such as the average is used to summarize its prominent features. When two or more
sets of data are to be compared, it is not practical to compare each and every item, so we
require one figure, the average, that represents the entire data in condensed form. Thus averages
facilitate comparisons.
Definition
Arithmetic Mean: The most widely used measure of location or central tendency is the arithmetic
mean. It is defined as the sum of the observations divided by the number of observations.
Median: When all the observations of a variable are arranged in ascending (or descending)
order, the middle observation is known as the median.
Mode: It is the most frequently occurring observation in the data, i.e. the most common or most
fashionable value, if it exists.
Average for ungrouped data
Example (1) A random sample of 22 business economists were asked to predict the percentage
growth in the consumer price index number over the next year. The forecasts were:
3.6 3.1 3.9 3.7 3.5 3.7 3.4
3.0 3.6 3.4 3.1 2.9 3.0 4.0 2.8
3.8 4.2 2.5 3.1 3.9 2.9 2.6
Find the sample mean.
Example (2): The following data represent the number of days it took 7 individuals to quit smoking
after completing a course designed for this purpose. What is the sample median?
1 100 5 2 8 3 7
Example (3): A sample of 12 senior executives found the following results for percentage of total
compensation derived from bonus payments. Find the sample median.
15.8 7.3 28.4 18.2 15.0 24.7
13.1 10.2 29.3 34.7 16.9 25.3
Example (4) The following are the sizes of the last 8 dresses sold at a women's boutique. What is
the sample mode?
8 10 6 4 10 12 14 10
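The four ungrouped examples above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only and are not part of the original notes.

```python
import statistics

# Example (1): forecasts of percentage growth in the consumer price index
forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 3.7, 3.4,
             3.0, 3.6, 3.4, 3.1, 2.9, 3.0, 4.0, 2.8,
             3.8, 4.2, 2.5, 3.1, 3.9, 2.9, 2.6]
sample_mean = statistics.mean(forecasts)      # sum of observations / number of observations

# Example (2): days taken to quit smoking (odd n, so the median is the middle value)
days_to_quit = [1, 100, 5, 2, 8, 3, 7]
median_days = statistics.median(days_to_quit)

# Example (3): bonus percentages (even n, so the median averages the two middle values)
bonus_pct = [15.8, 7.3, 28.4, 18.2, 15.0, 24.7,
             13.1, 10.2, 29.3, 34.7, 16.9, 25.3]
median_bonus = statistics.median(bonus_pct)

# Example (4): dress sizes, mode is the most frequent value
dress_sizes = [8, 10, 6, 4, 10, 12, 14, 10]
mode_size = statistics.mode(dress_sizes)

print(round(sample_mean, 2), median_days, round(median_bonus, 2), mode_size)
```

Note how the single extreme value 100 in Example (2) does not affect the median, whereas it would pull the mean upward.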
Average for grouped data (discrete case)
Example (5) Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches at random from the lot and observes the following number of defects
in the 30 switches. Find the mean, median and mode.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4
Mean: Here the variable X assumes separate, distinct values $x_1, x_2, x_3, \dots, x_k$ with the
corresponding frequencies $f_1, f_2, f_3, \dots, f_k$.
Then the arithmetic mean is
$$\bar{X} = \frac{\text{sum of the values}}{\text{no. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum f x}{n}$$
where $n = \sum f = f_1 + f_2 + f_3 + \dots + f_k$.
Median: First calculate the less than type cumulative frequency. The median is then the
value of the variable for which the cumulative frequency first equals or exceeds $\frac{n+1}{2}$, starting from the top,
where $n$ represents the total number of observations.
Mode: Here mode can be obtained as the value of the variable with the maximum frequency.
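The three rules above can be sketched in Python for the switch-defect table of Example (5). The variable names are illustrative, and when two values tie for the highest frequency this sketch simply reports the first one, which is an assumption not addressed in the notes.

```python
from itertools import accumulate

# Example (5): number of defects per switch and the corresponding frequencies
defects = [0, 1, 2, 3, 4]
freq = [2, 8, 10, 6, 4]

n = sum(freq)                                             # total number of observations
mean = sum(f * x for f, x in zip(freq, defects)) / n      # sum(f*x) / n

# Less than type cumulative frequencies: 2, 10, 20, 26, 30
cum_freq = list(accumulate(freq))

# Median: first value whose cumulative frequency reaches (n + 1) / 2
target = (n + 1) / 2
median = next(x for x, cf in zip(defects, cum_freq) if cf >= target)

# Mode: value with the maximum frequency
mode = defects[freq.index(max(freq))]

print(n, round(mean, 2), median, mode)
```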
Average for grouped data (continuous case)
Example (6): “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons. Calculate the mean, median and mode.
Class interval (computer usage in hours) Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3
Mean: Here the variable X assumes the representative values (mid values or class marks) $x_1, x_2, x_3, \dots, x_k$
of the class intervals, with the corresponding frequencies $f_1, f_2, f_3, \dots, f_k$.
Then the arithmetic mean is
$$\bar{X} = \frac{\text{sum of the values}}{\text{no. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum f x}{n}$$
where $n = \sum f = f_1 + f_2 + f_3 + \dots + f_k$.
Median: First calculate the less than type cumulative frequency and then identify the median
class (the class interval which contains the median value) as the class interval for which the
cumulative frequency first equals or exceeds $\frac{n+1}{2}$, starting from the top, where $n$ represents the total
number of observations. The median is then calculated using the formula
$$\text{Median} = l_1 + \frac{(l_2 - l_1)\left(\frac{n+1}{2} - c.f.\right)}{f}$$
where
$l_1$ - lower limit of the median class
$l_2$ - upper limit of the median class
$f$ - frequency of the median class
$c.f.$ - cumulative frequency of the pre-median class
Mode: First identify the modal class (the class interval which contains the mode value) as the
class interval for which the frequency is maximum. The mode is then given by
$$\text{Mode} = l_1 + \frac{(l_2 - l_1)(f_1 - f_0)}{2 f_1 - f_0 - f_2}$$
where
$l_1$ - lower limit of the modal class
$l_2$ - upper limit of the modal class
$f_0$ - frequency of the pre-modal class
$f_1$ - frequency of the modal class
$f_2$ - frequency of the post-modal class
Example (7): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the mean, median and mode.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2
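The grouped-data formulas for the mean, median and mode can be sketched as three small Python functions and applied to the tables of Examples (6) and (7). The function names are hypothetical. As an assumption not stated in the notes, a missing neighbouring class (when the modal class is the first or last class) is given frequency 0, and the cumulative frequency before the first class is taken as 0.

```python
from itertools import accumulate

def grouped_mean(classes, freq):
    """Mean using class marks (mid values) as representative values."""
    n = sum(freq)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freq)) / n

def grouped_median(classes, freq):
    """Median = l1 + (l2 - l1) * ((n + 1)/2 - c.f.) / f for the median class."""
    n = sum(freq)
    target = (n + 1) / 2
    cum = list(accumulate(freq))                     # less than type cumulative frequency
    i = next(k for k, cf in enumerate(cum) if cf >= target)
    l1, l2 = classes[i]
    cf_before = cum[i - 1] if i > 0 else 0           # assumption: 0 before the first class
    return l1 + (l2 - l1) * (target - cf_before) / freq[i]

def grouped_mode(classes, freq):
    """Mode = l1 + (l2 - l1) * (f1 - f0) / (2*f1 - f0 - f2) for the modal class."""
    i = freq.index(max(freq))
    l1, l2 = classes[i]
    f1 = freq[i]
    f0 = freq[i - 1] if i > 0 else 0                 # assumption: 0 if no pre-modal class
    f2 = freq[i + 1] if i < len(freq) - 1 else 0     # assumption: 0 if no post-modal class
    return l1 + (l2 - l1) * (f1 - f0) / (2 * f1 - f0 - f2)

# Example (6): computer usage in hours
usage_classes = [(0, 3), (3, 6), (6, 9), (9, 12), (12, 15)]
usage_freq = [5, 28, 8, 6, 3]

# Example (7): minutes late at Heathrow
late_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
late_freq = [27, 10, 7, 5, 4, 2]

for name, classes, freq in [("usage", usage_classes, usage_freq),
                            ("late", late_classes, late_freq)]:
    print(name,
          round(grouped_mean(classes, freq), 2),
          round(grouped_median(classes, freq), 2),
          round(grouped_mode(classes, freq), 2))
```

For the Heathrow table this gives a median of 11 and a mode of about 6.14; for the computer-usage table the mode is about 4.60.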
Example (1): The following data represent the smartphone operating systems used by a class of
students. Prepare the frequency distribution of the data. (A = Android, W = Windows Phone, I =
iPhone, AM = Amazon Fire Phone)
AM, A, I, I, I, W, I, AM, W, A,
I, I, W, A, I, A, AM, W, W, I,
I, I, W, A, A, A, W, I, AM, AM,
A, A, I, A, I, A, A, W, I, I
Solution: Frequency distribution of OS of smartphone
Example (2): Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches at random from the lot and observes the following number of defects
in the 30 switches.
2 1 3 2 1 3 3 2 4 1
2 1 0 1 0 2 3 2 1 3
1 2 1 4 4 2 4 3 2 2
Solution: Frequency distribution of No. of Defect
Class(No of defects) Frequency
0
1
2
3
4
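The tallies asked for in Examples (1) and (2) can be sketched with `collections.Counter` from the Python standard library, which counts how often each distinct value occurs. The variable names are illustrative.

```python
from collections import Counter

# Example (1): operating system codes (A, W, I, AM) for 40 students
os_codes = """
AM A I I I W I AM W A
I I W A I A AM W W I
I I W A A A W I AM AM
A A I A I A A W I I
""".split()
os_freq = Counter(os_codes)

# Example (2): number of defects observed in each of 30 switches
defects = [2, 1, 3, 2, 1, 3, 3, 2, 4, 1,
           2, 1, 0, 1, 0, 2, 3, 2, 1, 3,
           1, 2, 1, 4, 4, 2, 4, 3, 2, 2]
defect_freq = Counter(defects)

# Print each frequency distribution as "class frequency" rows
for label, count in os_freq.most_common():
    print(label, count)
for value in sorted(defect_freq):
    print(value, defect_freq[value])
```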
Example (3): In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater satisfaction. Summarise the data
using a frequency distribution.
87 59 80 61 50 60 70 89 84 76
76 41 81 88 47 65 74 84 76 78
67 50 70 46 81 92 53 83 78 67
58 90 73 85 87 77 43 70 64 74
92 75 69 97 75 71 61 46 69 64
Example (4): “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Summarize the data by constructing a frequency distribution with a class width of 3 hours.
Solution: Frequency distribution of Computer usage in hours
Class interval (computer usage in hours) Frequency
0-3
3-6
6-9
9-12
12-15
Total =
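A minimal sketch of how the class frequencies for Example (4) can be tallied in Python. It assumes each class includes its lower limit and excludes its upper limit (so 3.0 falls in 3-6), a convention the notes do not state explicitly; the variable names are illustrative.

```python
usage_hours = [
    4.1, 1.5, 10.4, 5.9, 3.4, 5.7, 1.6, 6.1, 3.0, 3.7,
    3.1, 4.8, 2.0, 14.8, 5.4, 4.2, 3.9, 4.1, 11.1, 3.5,
    4.1, 4.1, 8.8, 5.6, 4.3, 3.3, 7.1, 10.3, 6.2, 7.6,
    10.8, 2.8, 9.5, 12.9, 12.1, 0.7, 4.0, 9.2, 4.4, 5.7,
    7.2, 6.1, 5.7, 5.9, 4.7, 3.9, 3.7, 3.1, 6.1, 3.1,
]

width = 3          # class width in hours
n_classes = 5      # classes 0-3, 3-6, 6-9, 9-12, 12-15
freq = [0] * n_classes

# Floor division by the class width gives the index of the class
# [k*width, (k+1)*width) that contains each observation.
for x in usage_hours:
    freq[int(x // width)] += 1

for k, f in enumerate(freq):
    print(f"{k * width}-{(k + 1) * width}", f)
print("Total =", sum(freq))
```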
Presentation of Data: Diagrams
The following points should be remembered when making a diagram or graph.
A good diagram or graph:
Provides a clear summary of data
Is a fair and honest representation
Highlights underlying patterns
Allows the extraction of a lot of information quickly.
A bad diagram:
Confuses the viewer
Misleads (either accidentally or intentionally).
Diagrams: Frequency diagram
Example (5) The following frequency diagram represents the number of confirmed cases of COVID-19
in India and the world.
Diagram: Subdivided bar diagram
Example (8): The following diagram represents the distribution of seniors, adults and children in hotel
accommodation for the Irish, British, Mainland European and Rest of World categories.
Pie-chart: With a small number of categories, we could use a pie-chart. The angle of each sector can be
calculated using the formula
$$\text{Angle in degrees} = \frac{\text{Component value} \times 360}{\text{Total value of all components}}$$
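As an illustration of the angle formula, the sketch below uses the frequencies obtained by tallying the smartphone operating system data from the frequency distribution example (Android 12, iPhone 15, Windows Phone 8, Amazon Fire Phone 5). The choice of this data set for a pie-chart is an assumption made here for illustration only.

```python
# Frequencies tallied from the smartphone operating system data (40 students)
os_freq = {"Android": 12, "iPhone": 15, "Windows Phone": 8, "Amazon Fire Phone": 5}

total = sum(os_freq.values())

# Angle in degrees = component value * 360 / total value of all components
angles = {name: value * 360 / total for name, value in os_freq.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The sector angles always add up to 360 degrees, which is a quick check on the arithmetic.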
Example (9): The following Pie diagram represents distribution of favorite types of movie.
Example (10): “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Prepare the histogram representing the data. Calculate the mode from histogram and verify your
answer by calculating it using the formula. (Answer=4.6)
Class interval Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3
Total = 50
Presentation of Data: Ogives
Cumulative (less than type) frequency graph: It plots the number of observations less than
a given value. Plot the points by taking the upper limit of each class interval on the x-axis and
the corresponding cumulative frequency on the y-axis. Join these points by a smooth freehand curve.
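The points to be plotted can be sketched in Python for the computer-usage table (class intervals 0-3 to 12-15 with frequencies 5, 28, 8, 6, 3). This only computes the coordinates; drawing the curve itself is left to graph paper or a plotting tool. The variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [3, 6, 9, 12, 15]     # upper limit of each class interval (x-axis)
freq = [5, 28, 8, 6, 3]              # class frequencies

# Less than type cumulative frequency: running total of the frequencies (y-axis)
cum_freq = list(accumulate(freq))

ogive_points = list(zip(upper_limits, cum_freq))
print(ogive_points)
```

The last cumulative frequency equals the total number of observations, here 50.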
Example (11): For the data given in Example (10), draw the less than type cumulative frequency
curve. Hence find the value of the median from the graph and verify your answer by calculating it
using the formula. (Given Answer = 5.22)
Example (12): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2
Prepare the histogram representing the data. Calculate the mode from histogram and verify your
answer by calculating it using the formula. (Mode =6.14)
Draw the less than type cumulative curve. Hence find the value of median from the graph and
verify your answer by calculating it using the formula. (Median = 11)
Quartiles: Quartiles are not measures of central tendency but partitioning values; that is, they
are specific points that separate a large ordered data set into four quarters.
The data must first be arranged in ascending order; the quartiles are then given by:
First quartile (lower quartile), Q1: The first quartile, Q1, divides the ordered data set such that
25% of observations are at or below this value.
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ value}$$
Second quartile, Q2: The second quartile, Q2, divides the ordered data set such that 50% of
observations are at or below this value.
$$Q_2 = \text{Median} = \left\{2\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}$$
Third quartile (upper quartile), Q3: The third quartile, Q3, divides the ordered data set such that
75% of observations are at or below this value.
$$Q_3 = \left\{3\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value}$$
where $n$ is the number of observations in the data.
Example (1): The growing use of personal computers is suggested to be one reason people can
operate at-home businesses. Following is a sample of age data for individuals working at home.
22 58 24 50 29 52 57 31 30 41
44 40 46 29 31 37 32 44 49 29
Compute the first, second and third quartiles.
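A minimal sketch of the quartile position formulas applied to the age data of Example (1). When the position k(n+1)/4 is not a whole number, this sketch interpolates linearly between the two neighbouring ordered values; that interpolation rule is an assumption, since the notes do not say how fractional positions are handled. The helper name `quartile` is hypothetical.

```python
import statistics

def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) as the {k(n+1)/4}-th value of ordered data."""
    n = len(ordered)
    pos = k * (n + 1) / 4          # 1-based position in the ordered data
    lower = int(pos)               # whole part of the position
    frac = pos - lower             # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Assumption: linear interpolation between the neighbouring ordered values
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

ages = [22, 58, 24, 50, 29, 52, 57, 31, 30, 41,
        44, 40, 46, 29, 31, 37, 32, 44, 49, 29]
ordered = sorted(ages)             # data must be in ascending order first

q1, q2, q3 = (quartile(ordered, k) for k in (1, 2, 3))
print(q1, q2, q3)

# Cross-check: the standard library's default ("exclusive") method uses the
# same (n + 1)-based positions with linear interpolation.
print(statistics.quantiles(ages, n=4))
```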
Example (2) The IQ scores for a sample of 30 students who are entering their first year of high
school are shown below:
95 95 97 98 101
102 103 104 105 106
106 107 108 108 110
111 115 115 117 119
119 121 121 126 126
128 133 134 136 142
Find the three quartiles. Without calculating, give the value of median.
Example (3) Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches at random from the lot and observes the following number of defects
in the 30 switches. Find the three quartiles.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4
Example (4): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the three quartiles.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2
Dispersion: Introduction
In addition to averages, further information about the observations is required to know the
extent to which the values vary from one another and from the central value.
A measure of the spread or scatter of the data is called a measure of variation or dispersion.
A measure of dispersion gives an idea of the reliability of the average. When the variability
is small, the average is more reliable and is a better estimate of the population average; when
the dispersion is large, the average is not a good representative of the data.
Measures of dispersion can also be used to compare two or more distributions. The one with less
dispersion is more consistent or homogeneous, and the one with more dispersion is less consistent.
Absolute measure Relative measure
Quartile deviation Coefficient of Q.D.
Mean deviation Coefficient of M.D.
Standard deviation Coefficient of variation
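Range: It is defined as the difference between the maximum and minimum observations in the
data.
Quartile Deviation: It is defined as half the difference between the third and first quartiles.
$$Q.D. = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$
And the corresponding relative measure is given by
$$\text{Coeff. of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$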
Mean Deviation: It is defined as the average of the absolute deviations of the values from the mean.
$$M.D. = \frac{1}{n} \sum |X - \bar{X}|$$
And the corresponding relative measure is given by
$$\text{Coeff. of } M.D. = \frac{M.D.}{\bar{X}}$$
Standard Deviation: It is defined as the square root of the average of the squared deviations of the
values from the mean.
$$S.D. = S = \sqrt{\frac{1}{n} \sum (X - \bar{X})^2} = \sqrt{\frac{1}{n} \sum X^2 - (\bar{X})^2}$$
And the corresponding relative measure is known as the coefficient of variation and is given by
$$\text{Coeff. of variation} = \frac{S.D.}{\bar{X}} \times 100$$
Example (1) Eight participants in a bike race had the following finishing times in minutes.
28 22 26 33 21 23 37 24
Compute the range, Q.D., M.D. and S.D. and their coefficients.
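The dispersion measures defined above can be sketched in Python for the bike race times of Example (1). The quartiles use the (n+1)-based positions with linear interpolation for fractional positions, which is an assumption; the standard deviation divides by n, as in the formula above. The names are illustrative, and the relative measure for the range is omitted because its formula is not given above.

```python
import math

def quartile(ordered, k):
    """k-th quartile as the {k(n+1)/4}-th ordered value (linear interpolation assumed)."""
    n = len(ordered)
    pos = k * (n + 1) / 4
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

times = [28, 22, 26, 33, 21, 23, 37, 24]    # finishing times in minutes
n = len(times)
ordered = sorted(times)
mean = sum(times) / n

data_range = max(times) - min(times)        # maximum minus minimum

q1, q3 = quartile(ordered, 1), quartile(ordered, 3)
qd = (q3 - q1) / 2                          # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)

md = sum(abs(x - mean) for x in times) / n  # mean deviation about the mean
coeff_md = md / mean

sd = math.sqrt(sum((x - mean) ** 2 for x in times) / n)   # divides by n
cv = sd / mean * 100                        # coefficient of variation, in percent

print("Range:", data_range)
print("Q.D.:", qd, "Coeff. of Q.D.:", round(coeff_qd, 4))
print("M.D.:", md, "Coeff. of M.D.:", round(coeff_md, 4))
print("S.D.:", round(sd, 4), "Coeff. of variation:", round(cv, 2))
```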
Example (2) The Los Angeles Times regularly reports the air quality index for various areas of
Southern California. A sample of air quality index values for Pomona provided the following data:
28 42 58 48 45 55 60 49 50
Compute the range, Q.D., M.D. and S.D. and their coefficients.
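For a frequency distribution, the standard deviation is
$$S.D. = S = \sqrt{\frac{1}{n} \sum f (X - \bar{X})^2} = \sqrt{\frac{1}{n} \sum f X^2 - (\bar{X})^2}$$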
Example (3) The scores of 20 students in a color sensitivity test are given by the following frequency
distribution. Calculate the range, Q.D., M.D. and S.D. and their coefficients. (1 = least sensitive,
7 = most sensitive)
score 1 2 3 4 5 6 7
frequency 3 1 3 4 6 2 1
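As a partial check for Example (3), the sketch below computes the mean, range and standard deviation from the frequency table and confirms that the two forms of the grouped standard deviation formula agree. The variable names are illustrative; Q.D. and M.D. are left as an exercise.

```python
import math

scores = [1, 2, 3, 4, 5, 6, 7]
freq = [3, 1, 3, 4, 6, 2, 1]

n = sum(freq)                                              # total number of students
mean = sum(f * x for f, x in zip(freq, scores)) / n        # sum(f*x) / n

# Definition form: average of the squared deviations from the mean
var_def = sum(f * (x - mean) ** 2 for f, x in zip(freq, scores)) / n

# Shortcut form: (1/n) * sum(f * x^2) - mean^2
var_short = sum(f * x ** 2 for f, x in zip(freq, scores)) / n - mean ** 2

sd = math.sqrt(var_def)
score_range = max(scores) - min(scores)                    # every score has non-zero frequency

print(n, mean, score_range, round(sd, 4))
print(math.isclose(var_def, var_short))                    # both forms should agree
```

The shortcut form is usually quicker by hand because it avoids computing each deviation from the mean.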
Example (4) Following is the frequency distribution of the age of Instagram users in a random survey.
Calculate the range, Q.D., M.D. and S.D. and their coefficients.
Example (5) In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater satisfaction. Calculate the range,
Q.D., M.D. and S.D. and their coefficients.