Introduction. Average and Dispersion

Data is essential for making better decisions and solving problems by evaluating performance and improving processes to understand consumers. Data can be numerical, representing quantitative values, or categorical, representing qualitative characteristics. A survey of households contained both numerical data like age and number of people, as well as categorical data like heating type and favorite social media. Data can also be discrete, taking distinct integer values, or continuous, taking any real number value. Averages like the mean, median, and mode are used to summarize data and facilitate comparisons by providing a single representative value of a data set.

Uploaded by

tilak chauhan

Relevance of Data

Data is indisputable
Data support better decisions
Data help solve problems
Data help evaluate performance
Data help improve processes
Data help us understand consumers
Data (Variable)
A variable is a characteristic of a unit being observed that may assume more than one of a set of
values, to which a numerical measure or a category from a classification can be assigned.
Example: income, age, weight, "occupation", "industry", "disease", etc.
Numerical Data: - A characteristic of objects in the population which can be expressed
numerically is known as numerical data.
Example: Number of Facebook friends
Categorical Data (Attribute): - A characteristic which cannot be expressed quantitatively but can
be described qualitatively is known as categorical data or an attribute.
Example: Favorite Social Media Application.
Types of Data
Ex. (1) A survey by an electric company contains questions on the following. Describe the data
implicit in these 11 items as numerical or categorical data.
a. Age of household head.
b. Sex of household head.
c. Number of people in household.
d. Use of electric heating (yes or no).
e. Number of large appliances used daily.
f. Thermostat setting in winter.
g. Average number of hours heating is on.
h. Average number of heating days.
i. Household income.
j. Average monthly electrical bill.
k. Ranking of this electric company as compared with two previous electricity suppliers.

Types of data

Discrete Data: A discrete variable takes only distinct and integer values (analogous to 'counting').
Example: number of defective items, number of students absent in statistics class.
Continuous data: A continuous variable takes any value on a range of real numbers (analogous
to measurement). Example: height of first year students, time spent on studying at home.

Ex. (2) For each of the following indicate if a discrete or a continuous random variable provides the
best definition.
(a) number of defective items in a sample of 20 items from a large shipment
(b) yearly income for a family
(c) change in price of a share of IBM common stock in a month
(d) number of errors detected in a corporation's accounts
(e) number of claims on a medical insurance policy in a particular year
(f) amount of oil imported into India in a month
(g) questions answered correctly in a 50-question objective examination
(h) number of nonproductive hours in an 8-hour workday.

Average: Introduction

Main Value: One of the objectives of data analysis is to obtain a single value that describes the characteristics of the entire mass of data and can be considered representative of it. A value satisfying this criterion is the central value or an "average".
Central Tendency: The average is the representative or typical value of the data. It usually lies somewhere near the center of the group, which is why averages are termed measures of central tendency or central value.
Comparison: A large volume of data cannot be easily understood or remembered, so a single value summarizing the prominent features of the data, the average, can be used instead. If two or more sets of data are to be compared, it is not possible to compare each and every item, so we require one figure representing the entire data in condensed form. Thus averages facilitate comparisons.

Definition

Arithmetic Mean: The most widely used measure of location or central tendency is the Arithmetic
Mean. It is defined as sum of the observations divided by the number of observations.
Median: When all the observations of a variable are arranged in ascending (or descending)
order, the middle observation is known as the median.

Mode: It is the most frequently occurring observation in a data set, i.e. the most common or most
fashionable value, if it exists.
Average for ungrouped data
Example (1) A random sample of 22 business economists was asked to predict the percentage
growth in the consumer price index over the next year. The forecasts were:
3.6 3.1 3.9 3.7 3.5 3.7 3.4
3.0 3.6 3.4 3.1 2.9 3.0 4.0 2.8
3.8 4.2 2.5 3.1 3.9 2.9 2.6
Find the sample mean.

Example (2): The following data represent the number of days it took 7 individuals to quit smoking
after completing a course designed for this purpose. What is the sample median?
1 100 5 2 8 3 7
Example (3): A sample of 12 senior executives found the following results for percentage of total
compensation derived from bonus payments. Find the sample median.
15.8 7.3 28.4 18.2 15.0 24.7
13.1 10.2 29.3 34.7 16.9 25.3
Example (4) The following are the sizes of the last 8 dresses sold at a women's boutique. What is
the sample mode?
8 10 6 4 10 12 14 10
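
As a quick check of the three definitions, Examples (1) to (4) above can be worked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Example (1): forecasts of percentage growth in the consumer price index
forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 3.7, 3.4,
             3.0, 3.6, 3.4, 3.1, 2.9, 3.0, 4.0, 2.8,
             3.8, 4.2, 2.5, 3.1, 3.9, 2.9, 2.6]
mean_forecast = statistics.mean(forecasts)    # sum of values / number of values

# Example (2): odd number of observations, so the median is the middle ordered value
days_to_quit = [1, 100, 5, 2, 8, 3, 7]
median_days = statistics.median(days_to_quit)

# Example (3): even number of observations; statistics.median averages the two middle values
bonus_pct = [15.8, 7.3, 28.4, 18.2, 15.0, 24.7,
             13.1, 10.2, 29.3, 34.7, 16.9, 25.3]
median_bonus = statistics.median(bonus_pct)

# Example (4): the most frequently occurring dress size
dress_sizes = [8, 10, 6, 4, 10, 12, 14, 10]
mode_size = statistics.mode(dress_sizes)

print(round(mean_forecast, 2), median_days, round(median_bonus, 2), mode_size)
```

In Example (2) the median is 5 days, while the mean of the same data is 126/7 = 18 days: the single extreme value 100 pulls the mean upward but leaves the median unchanged.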
Average for grouped data (discrete case)
Example (5) Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches. Find the mean, median and mode.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4

Mean: Here the variable X assumes separate, distinct values $x_1, x_2, x_3, \ldots, x_k$ with the
corresponding frequencies $f_1, f_2, f_3, \ldots, f_k$.
Then the arithmetic mean is
$$\bar{X} = \frac{\text{sum of the values}}{\text{no. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum f x}{n}$$
where $n = \sum f = f_1 + f_2 + f_3 + \cdots + f_k$.
Median: First calculate the less-than-type cumulative frequency; the median is then the value of the
variable for which the cumulative frequency first equals or exceeds $\frac{n+1}{2}$, starting from the top,
where $n$ represents the total number of observations.
Mode: Here mode can be obtained as the value of the variable with the maximum frequency.
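
The three rules for a discrete frequency table can be sketched in a few lines of Python, here applied to the defect counts of Example (5). The variable names are illustrative assumptions, not notation from the notes.

```python
from itertools import accumulate

defects = [0, 1, 2, 3, 4]      # distinct values x
freq = [2, 8, 10, 6, 4]        # corresponding frequencies f

n = sum(freq)                                           # n = sum of f
mean = sum(f * x for x, f in zip(defects, freq)) / n    # sum of f*x divided by n

# Less-than-type cumulative frequencies: 2, 10, 20, 26, 30
cum_freq = list(accumulate(freq))

# Median: first value whose cumulative frequency is at or above (n + 1) / 2
position = (n + 1) / 2
median = next(x for x, cf in zip(defects, cum_freq) if cf >= position)

# Mode: value with the maximum frequency
mode = defects[freq.index(max(freq))]

print(round(mean, 2), median, mode)
```

Here $(n+1)/2 = 15.5$, and the first cumulative frequency at or above 15.5 is 20, which belongs to the value 2.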
Averages Grouped data (continuous case)

Example (6): "Computer Today" reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons. Calculate the mean, median and mode.
Class interval (computer usage in hours) Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3

Mean: Here $x_1, x_2, x_3, \ldots, x_k$ are the representative values (mid values or class marks)
of the class intervals, with the corresponding frequencies $f_1, f_2, f_3, \ldots, f_k$.
Then the arithmetic mean is
$$\bar{X} = \frac{\text{sum of the values}}{\text{no. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum f x}{n}$$
where $n = \sum f = f_1 + f_2 + f_3 + \cdots + f_k$.
Median: First calculate the less-than-type cumulative frequency and then identify the median
class (the class interval which contains the median value) as the class interval for which the
cumulative frequency first equals or exceeds $\frac{n+1}{2}$, starting from the top, where $n$ represents the total
number of observations. Then the median is calculated using the formula
$$\text{Median} = l_1 + \frac{(l_2 - l_1)\left(\frac{n+1}{2} - c.f.\right)}{f}$$
where
$l_1$ - lower limit of the median class
$l_2$ - upper limit of the median class
$f$ - frequency of the median class
$c.f.$ - cumulative frequency of the pre-median class

Mode: First identify the modal class (the class interval which contains the mode value) as the
class interval for which the frequency is maximum. Then the mode is given by
$$\text{Mode} = l_1 + \frac{(l_2 - l_1)(f_1 - f_0)}{2 f_1 - f_0 - f_2}$$

where
$l_1$ - lower limit of the modal class
$l_2$ - upper limit of the modal class
$f_0$ - frequency of the pre-modal class
$f_1$ - frequency of the modal class
$f_2$ - frequency of the post-modal class

Example (7): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the mean, median and mode.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2
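
The grouped-data formulas for the mean, median and mode can be sketched in Python using the Heathrow table of Example (7). Two points are assumptions of this sketch rather than statements from the notes: the variable names, and treating the frequency of a missing neighbouring class as 0 when the modal class is the first or last class.

```python
from itertools import accumulate

lower = [0, 10, 20, 30, 40, 50]      # lower class limits l1
upper = [10, 20, 30, 40, 50, 60]     # upper class limits l2
freq = [27, 10, 7, 5, 4, 2]          # class frequencies
n = sum(freq)                        # 55 aircraft

# Mean: class marks (mid values) weighted by frequency
marks = [(lo + up) / 2 for lo, up in zip(lower, upper)]
mean = sum(f * x for x, f in zip(marks, freq)) / n

# Median: find the median class via the less-than cumulative frequency
cum_freq = list(accumulate(freq))
position = (n + 1) / 2
i = next(idx for idx, cf in enumerate(cum_freq) if cf >= position)
cf_before = cum_freq[i - 1] if i > 0 else 0      # c.f. of the pre-median class
median = lower[i] + (upper[i] - lower[i]) * (position - cf_before) / freq[i]

# Mode: modal class is the one with the maximum frequency
j = freq.index(max(freq))
f1 = freq[j]
f0 = freq[j - 1] if j > 0 else 0                 # assumed 0 when no pre-modal class
f2 = freq[j + 1] if j < len(freq) - 1 else 0     # assumed 0 when no post-modal class
mode = lower[j] + (upper[j] - lower[j]) * (f1 - f0) / (2 * f1 - f0 - f2)

print(round(mean, 2), median, round(mode, 2))
```

With these conventions the sketch gives a median of 11 and a mode of about 6.14, which agree with the answers quoted for the same table in the ogive section of these notes.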

Presentation of data: Frequency distribution


As the number of observations gets larger, the data need to be further condensed into
summary tables in order to properly present, analyse, and interpret the findings.
The most common method is preparing a frequency distribution, which is a summary table in
which the data is arranged into conveniently established numerically ordered class groupings or
categories.
Important terms:
Frequency
Class /Class interval
Inclusive type of class- interval
Exclusive type of class-interval
Class marks
Some important points to remember while preparing frequency distributions:
Each observation in the dataset must fall into one and only one class/class interval
The number of class intervals should be greater than 4 but no more than 15
Class intervals of equal widths are usually used (although not always)
The midpoint of each interval should be close to the average of the observations included in the
class interval
A useful rule of thumb for establishing the width of the intervals is as follows:
Width of class interval = (largest value-smallest value)/number of class intervals desired

Frequency distribution: Qualitative data

Example (1): The following data represent the operating systems of the smartphones used by a class of
students. Prepare the frequency distribution of the data. (A = Android, W = Windows Phone, I =
iPhone, AM = Amazon Fire Phone)
AM, A, I, I, I, W, I, AM, W, A,
I, I, W, A, I, A, AM, W, W, I,
I, I, W, A, A, A, W, I, AM, AM,
A, A, I, A, I, A, A, W, I, I
Solution: Frequency distribution of OS of smartphone
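
One way to tally a qualitative variable is with `collections.Counter` from the Python standard library. The sketch below counts the operating-system codes listed in Example (1); the parsing step and variable names are illustrative assumptions.

```python
from collections import Counter

raw = """
AM, A, I, I, I, W, I, AM, W, A,
I, I, W, A, I, A, AM, W, W, I,
I, I, W, A, A, A, W, I, AM, AM,
A, A, I, A, I, A, A, W, I, I
"""

# Split on commas and drop surrounding whitespace and empty pieces
codes = [c.strip() for c in raw.replace("\n", " ").split(",") if c.strip()]

frequency = Counter(codes)          # category -> number of students
total = sum(frequency.values())

for category, count in frequency.most_common():
    print(category, count)
print("Total", total)
```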

Frequency distribution: Quantitative data

Example (2): Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches.
2 1 3 2 1 3 3 2 4 1
2 1 0 1 0 2 3 2 1 3
1 2 1 4 4 2 4 3 2 2
Solution: Frequency distribution of No. of Defect
Class(No of defects) Frequency
0
1
2
3
4

Example (3): In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; a higher score represents greater satisfaction. Summarise the data
using a frequency distribution.
87 59 80 61 50 60 70 89 84 76

76 41 81 88 47 65 74 84 76 78

67 50 70 46 81 92 53 83 78 67

58 90 73 85 87 77 43 70 64 74

92 75 69 97 75 71 61 46 69 64

Solution: Frequency distribution of Satisfaction score


Class Interval (Satisfaction score) Frequency
40 - 49 5
50 - 59 5
60 - 69 10
70 - 79 15
80 - 89 11
90 - 99 4

Example (4): "Computer Today" reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Summarize the data by constructing a frequency distribution with a class width of 3 hours.
Solution: Frequency distribution of Computer usage in hours
Class interval (computer usage in hours) Frequency
0-3
3-6
6-9
9-12
12-15
Total =
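
The construction of an exclusive-type frequency distribution can be sketched in Python for the computer usage data of Example (4). The sketch assumes five classes of width 3 starting at 0 (as in the table layout above) and that a value equal to a class boundary, such as 3.0, is counted in the class that starts at that boundary; the names used are illustrative.

```python
hours = [4.1, 1.5, 10.4, 5.9, 3.4, 5.7, 1.6, 6.1, 3.0, 3.7,
         3.1, 4.8, 2.0, 14.8, 5.4, 4.2, 3.9, 4.1, 11.1, 3.5,
         4.1, 4.1, 8.8, 5.6, 4.3, 3.3, 7.1, 10.3, 6.2, 7.6,
         10.8, 2.8, 9.5, 12.9, 12.1, 0.7, 4.0, 9.2, 4.4, 5.7,
         7.2, 6.1, 5.7, 5.9, 4.7, 3.9, 3.7, 3.1, 6.1, 3.1]

# Rule of thumb: width = (largest value - smallest value) / number of classes desired
suggested_width = (max(hours) - min(hours)) / 5     # about 2.82, rounded up to 3

width = 3
lower_limits = list(range(0, 15, width))            # 0, 3, 6, 9, 12
freq = [0] * len(lower_limits)
for h in hours:
    freq[int(h // width)] += 1                      # class index of this observation

for lo, f in zip(lower_limits, freq):
    print(f"{lo}-{lo + width}", f)
print("Total =", sum(freq))
```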
Presentation of Data: Diagrams

The following are important points to remember while making a diagram or graph.
A good diagram or graph:
Provides a clear summary of data
Is a fair and honest representation
Highlights underlying patterns
Allows the extraction of a lot of information quickly.
A bad diagram:
Confuses the viewer
Misleads (either accidentally or intentionally).
Diagrams: Frequency diagram

Example (5) The following frequency diagram represents the number of confirmed cases of COVID-19
in India and the world.

Diagram: Simple Bar Diagrams


Example (6) The following bar chart represents the proportion of women in the total labor force (%).

[Figure: bar chart "Proportion of Women in the Total Labour Force (%)"; vertical axis from 0 to 50 in steps of 5; horizontal axis lists countries (labels not recoverable).]

Diagrams: Multiple bar diagram


Example (7): The following diagram represents the record of disinvestment (Rs. in crores) for the
years 1991-02.
[Figure: multiple bar diagram "Record of Disinvestment"; vertical axis from 0 to 14000 in steps of 2000; series: Target Set by Government, Actual Receipts; horizontal axis: financial years from 1991-92 to 2001-02.]
Diagram: Subdivided bar diagram

Example (8): The following diagram represents the distribution of seniors, adults and children in hotel
accommodation by origin: Irish, British, Mainland European and Rest of World.

Diagram: Pie diagram

Pie chart: With a small number of categories, we could use a pie chart. The angle for each component can be
calculated using the formula
$$\text{Angle in degrees} = \frac{\text{Component value} \times 360}{\text{Total value of all components}}$$
Example (9): The following Pie diagram represents distribution of favorite types of movie.
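
The angle formula can be illustrated with a short Python sketch. Since the data behind the movie chart are not given, the sketch borrows the defect frequencies of the switch example (2, 8, 10, 6, 4) purely as illustrative component values.

```python
# Component values: frequencies of 0, 1, 2, 3 and 4 defects (borrowed for illustration)
labels = ["0 defects", "1 defect", "2 defects", "3 defects", "4 defects"]
values = [2, 8, 10, 6, 4]

total = sum(values)
# Angle in degrees = component value * 360 / total value of all components
angles = [v * 360 / total for v in values]

for label, angle in zip(labels, angles):
    print(label, angle)
```

The angles always add up to 360 degrees, which is a convenient check on the arithmetic.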

Presentation of Data: Quantitative data


Presentation of Data: Histogram

Example (10): "Computer Today" reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Prepare the histogram representing the data. Calculate the mode from histogram and verify your
answer by calculating it using the formula. (Answer=4.6)
Class interval Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3
Total = 50
Presentation of Data: Ogives

Cumulative (less than type) frequency graph: It plots the number of observations less than
a given value. Plot the points by taking the upper limit of each class interval on the x-axis and the
corresponding cumulative frequency on the y-axis. Join these points with a smooth freehand curve.
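
The points to be plotted for a less-than ogive can be computed as in the following sketch, which uses the computer usage table of Example (10). The variable names are illustrative, and the plotting itself is left out.

```python
from itertools import accumulate

upper_limits = [3, 6, 9, 12, 15]     # upper limit of each class interval (x-axis)
freq = [5, 28, 8, 6, 3]              # class frequencies

# Less-than-type cumulative frequencies (y-axis)
cum_freq = list(accumulate(freq))

# One point per class: (upper limit, cumulative frequency)
ogive_points = list(zip(upper_limits, cum_freq))
print(ogive_points)
```

The last cumulative frequency equals the total number of observations, 50, so the curve ends at the point (15, 50).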
Example (11): For the data given in Example (10), draw the less than type cumulative
curve. Hence find the value of the median from the graph and verify your answer by calculating it
using the formula. (Given Answer = 5.22)
Example (12): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below.
Minutes Late No. of aircrafts
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2

Prepare the histogram representing the data. Calculate the mode from histogram and verify your
answer by calculating it using the formula. (Mode =6.14)
Draw the less than type cumulative curve. Hence find the value of median from the graph and
verify your answer by calculating it using the formula. (Median = 11)

Quartiles: Quartiles are not measures of central tendency but partitioning values; that is, they
are specific points in a data set that separate a large ordered data set into four quarters.
The data must first be arranged in ascending order; the quartiles are then given as follows.
First quartile (lower quartile), Q1: The first quartile, Q1, divides the ordered data set such that
25% of observations are at or below this value.
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ value}$$
Second quartile, Q2: The second quartile, Q2, divides the ordered data set such that 50% of
observations are at or below this value.
$$Q_2 = \text{Median} = \left\{2\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}$$
Third quartile (upper quartile), Q3: The third quartile, Q3, divides the ordered data set such that
75% of observations are at or below this value.
$$Q_3 = \left\{3\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value}$$
where $n$ is the number of observations in the data.
Example (1): The growing use of personal computers is suggested to be one reason people can
operate at-home businesses. Following is a sample of age data for individuals working at home.
22 58 24 50 29 52 57 31 30 41
44 40 46 29 31 37 32 44 49 29
Compute the first, second and third quartiles.
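
The positional formulas for the quartiles can be sketched in Python for the age data of Example (1). When the position $k(n+1)/4$ is not a whole number, this sketch interpolates linearly between the two neighbouring ordered values; that interpolation rule and the helper name `quartile` are assumptions of the sketch, since the notes state only the position.

```python
def quartile(sorted_values, k):
    """Return Q_k as the k(n+1)/4-th ordered value, interpolating linearly."""
    n = len(sorted_values)
    position = k * (n + 1) / 4           # 1-based position in the ordered data
    whole = int(position)                # whole part of the position
    fraction = position - whole          # fractional part used for interpolation
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)

ages = [22, 58, 24, 50, 29, 52, 57, 31, 30, 41,
        44, 40, 46, 29, 31, 37, 32, 44, 49, 29]
ordered = sorted(ages)                   # data must be in ascending order first

q1 = quartile(ordered, 1)
q2 = quartile(ordered, 2)
q3 = quartile(ordered, 3)
print(q1, q2, q3)
```

For $n = 20$ the positions are 5.25, 10.5 and 15.75, so each quartile falls between two ordered observations.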

Example (2) The IQ scores for a sample of 30 students who are entering their first year of high
school are shown below:
95 95 97 98 101
102 103 104 105 106
106 107 108 108 110
111 115 115 117 119
119 121 121 126 126
128 133 134 136 142
Find the three quartiles. Without calculating, give the value of median.
Example (3) Mr. XYZ is the quality control manager of ABC Electrical Limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches. Find the three quartiles.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4

Example (4): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the three quartiles.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2

Dispersion: Introduction

In addition to averages, further information about the observations is required to know the
extent to which the values vary from one another and from the central value.
A measure of the spread or scatter of the data is called a measure of variation or dispersion.
A measure of dispersion gives an idea of the reliability of the average. When the variability
is small, the average is more reliable, so it is a better estimate of the population average; when
the dispersion is large, the average is not a good representative of the data.
Measures of dispersion can also be used to compare two or more distributions. The one with less
dispersion is more consistent or homogeneous, and the one with more dispersion is less consistent.

Dispersion: Types of dispersion

There are two kinds of measures of dispersion, namely:


• Absolute measures of dispersion
• Relative measures of dispersion
Absolute measures of dispersion indicate the amount of variation in a set of values, in
terms of the units of the observations. For example, when rainfall data for different
days is available in mm, any absolute measure of dispersion gives the variation in rainfall in mm.
On the other hand, relative measures of dispersion (also known as coefficients) are free from
the units of measurement of the observations. They are pure numbers, used to
compare the variation in two or more data sets that have different units of
measurement.
Absolute dispersion    Relative dispersion
Range                  Coefficient of Range
Quartile deviation     Coefficient of Q.D.
Mean deviation         Coefficient of M.D.
Standard deviation     Coefficient of Variation

Range: It is defined as the difference between the maximum and minimum observations in the
data.
$$\text{Range} = \text{Max} - \text{Min}$$
The corresponding relative measure is given by
$$\text{Coeff. of Range} = \frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}}$$
Quartile Deviation (Q.D.): It is defined as the average spread in the middle 50% of the data; it is the average
of the distances of the first quartile, Q1, and the third quartile, Q3, from the median, Q2.
$$Q.D. = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$
The corresponding relative measure is given by
$$\text{Coeff. of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Mean Deviation: It is defined as the average of the absolute deviations of the values from the mean.
$$M.D. = \frac{1}{n} \sum |X - \bar{X}|$$
The corresponding relative measure is given by
$$\text{Coeff. of M.D.} = \frac{M.D.}{\bar{X}}$$
Standard Deviation: It is defined as the square root of the average of the squared deviations of the values from the
mean.
$$S.D. = S = \sqrt{\frac{1}{n} \sum (X - \bar{X})^2} = \sqrt{\frac{1}{n} \sum x^2 - (\bar{x})^2}$$

The corresponding relative measure is known as the coefficient of variation and is given by
$$\text{Coeff. of S.D.} = \frac{S.D.}{\bar{X}} \times 100$$

Example (1) Eight participants in a bike race had the following finishing times in minutes.
28 22 26 33 21 23 37 24
Compute the range, Q.D., M.D. and S.D. and their coefficients.

Example (2) The Los Angeles Times regularly reports the air quality index for various areas of
southern California. A sample of air quality index values for Pomona provided the following data:
28 42 58 48 45 55 60 49 50
Compute the range, Q.D., M.D. and S.D. and their coefficients.
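
The four absolute measures and their coefficients can be sketched in Python for the bike race times of Example (1). The helper `quartile` and its linear interpolation between neighbouring ordered values are assumptions of this sketch; the standard deviation divides by $n$, as in the formula above.

```python
import math

def quartile(sorted_values, k):
    """Return Q_k as the k(n+1)/4-th ordered value, interpolating linearly."""
    n = len(sorted_values)
    position = k * (n + 1) / 4
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    return sorted_values[whole - 1] + fraction * (sorted_values[whole] - sorted_values[whole - 1])

times = sorted([28, 22, 26, 33, 21, 23, 37, 24])    # finishing times in minutes
n = len(times)
mean = sum(times) / n

# Range and coefficient of range
value_range = times[-1] - times[0]
coeff_range = value_range / (times[-1] + times[0])

# Quartile deviation and its coefficient
q1, q3 = quartile(times, 1), quartile(times, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

# Mean deviation about the mean and its coefficient
md = sum(abs(x - mean) for x in times) / n
coeff_md = md / mean

# Standard deviation (divisor n) and coefficient of variation in percent
sd = math.sqrt(sum((x - mean) ** 2 for x in times) / n)
coeff_variation = 100 * sd / mean

print(value_range, qd, md, round(sd, 4), round(coeff_variation, 2))
```

The absolute measures here are all in minutes, while the four coefficients are pure numbers.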

For grouped data, the formulas are
$$M.D. = \frac{1}{n} \sum f\,|X - \bar{X}|$$
$$S.D. = S = \sqrt{\frac{1}{n} \sum f (X - \bar{X})^2} = \sqrt{\frac{1}{n} \sum f x^2 - (\bar{x})^2}$$
where $n = \sum f$.

Example (3) The scores of 20 students in a color sensitivity test are given by the following frequency
distribution. Calculate the range, Q.D., M.D. and S.D. and their coefficients. (1 - least sensitive and 7 -
most sensitive)
score 1 2 3 4 5 6 7
frequency 3 1 3 4 6 2 1
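
The grouped-data formulas for the mean deviation and standard deviation can be checked with a short Python sketch on the color sensitivity scores of Example (3). The variable names are illustrative; the sketch also confirms that the two forms of the standard deviation formula agree.

```python
import math

scores = [1, 2, 3, 4, 5, 6, 7]       # values x
freq = [3, 1, 3, 4, 6, 2, 1]         # frequencies f

n = sum(freq)
mean = sum(f * x for x, f in zip(scores, freq)) / n

# Mean deviation: (1/n) * sum of f * |x - mean|
md = sum(f * abs(x - mean) for x, f in zip(scores, freq)) / n

# Standard deviation, definitional form: sqrt((1/n) * sum of f * (x - mean)^2)
sd_definition = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(scores, freq)) / n)

# Standard deviation, computational form: sqrt((1/n) * sum of f * x^2 - mean^2)
sd_shortcut = math.sqrt(sum(f * x * x for x, f in zip(scores, freq)) / n - mean ** 2)

print(round(mean, 2), round(md, 3), round(sd_definition, 4), round(sd_shortcut, 4))
```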

Example (4) Following is the frequency distribution of the ages of Instagram users in a random survey.
Calculate the range, Q.D., M.D. and S.D. and their coefficients.

Age of Instagram user No. of users
12-18 9
18-24 34
24-30 35
30-36 16
36-42 8
42-48 4
48-54 2

Example (5) In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; a higher score represents greater satisfaction. Calculate the range,
Q.D., M.D. and S.D. and their coefficients.

Class Interval (Satisfaction score) Frequency
40 - 49 5
50 - 59 5
60 - 69 10
70 - 79 15
80 - 89 11
90 - 99 4
