Measures of Variability and Normal Distribution

The document discusses measures of variability, including range, interquartile range, variance, and standard deviation. It defines each measure and provides the formulas and steps to calculate them. The key measures are range, which is the difference between the highest and lowest values, and standard deviation, which quantifies how far data points deviate from the mean.

Uploaded by

bsaguado

MEASURES OF VARIABILITY AND NORMAL DISTRIBUTION
Module
Table of Contents

❖ INTRODUCTION
❖ OBJECTIVES
❖ TOPICS
❖ MEASURES OF VARIABILITY
● Importance of Measuring Variability
● Range and Interquartile Range
● Variance
● Standard Deviation
● Consideration of Choosing Measures of Variability

❖ NORMAL DISTRIBUTION
● Normal Distribution
● Areas Under the Normal Curve
● Shaded Region Under the Normal Curve
● Understanding Z-score
● Percentile Under the Normal Curve
❖ PRACTICAL APPLICATIONS

❖ LEARNING MATERIALS

INTRODUCTION

Measures of variability and the normal distribution are fundamental concepts in statistics. Measures of variability, such as the range, variance, and standard deviation, quantify the spread or dispersion of data points in a dataset. The range is the simplest measure, representing the difference between the maximum and minimum values. Variance and standard deviation provide more detailed information by calculating the average of the squared differences from the mean, with the standard deviation being the square root of the variance. A smaller standard deviation indicates that data points are closer to the mean, while a larger one suggests greater variability.

The normal distribution, also known as the Gaussian distribution or bell curve, is a widely observed and important statistical distribution. It is characterized by a symmetric, bell-shaped curve, where data tends to cluster around the mean, with decreasing frequency as values move away from the mean. The properties of the normal distribution, such as the 68-95-99.7 rule, which states that approximately 68% of data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations, make it a powerful tool in statistical analysis, hypothesis testing, and modeling in various fields, from natural sciences to social sciences. These concepts are essential in understanding and analyzing data, making informed decisions, and drawing conclusions from research and observations.

Objectives related to measures of variability and the normal
distribution typically focus on understanding, calculating, and
applying these concepts in various statistical and data analysis
contexts. Here are some common objectives related to these topics:
★ Understanding Measures of Variability:
1. Comprehend the concept of variability in data.
2. Differentiate between variance, standard deviation, range, and interquartile range
(IQR).
3. Recognize the significance of measures of variability in the field of statistics.
★ Calculating Variance and Standard Deviation:
1. Learn the procedures for computing variance and standard deviation for datasets.
2. Grasp the formulas and steps involved in these computations.
3. Gain proficiency in calculating variance and standard deviation for both sample and
population data.
★ Interpreting Variability Measures:
1. Interpret the implications of variance and standard deviation within the context of
data analysis.
2. Articulate the meaning of high and low values of these measures.
3. Explore the link between variability and the spread of data.
★ Exploring the Range and IQR:
1. Calculate and elucidate the significance of the range and IQR when assessing data
distribution.
2. Comprehend how the range and IQR aid in discerning data spread.
3. Analyze and contrast the range and IQR as measures of data spread.
★ Grasping Normal Distribution Fundamentals:
1. Define and elucidate the characteristics of the normal distribution.
2. Understand the key attributes of the normal distribution, including its bell-shaped
curve, mean, and standard deviation.
3. Identify the empirical rule (68-95-99.7) associated with the normal distribution.
★ Understanding Characteristics of Normal Distribution:
1. Justify the importance of the normal distribution in statistical and data analysis
contexts.
2. Describe the symmetry and skewness patterns observed in normal distributions.
3. Delve into how the normal distribution is employed to model real-world data.
★ Developing Problem-Solving Proficiency:
1. Cultivate problem-solving abilities in interpreting and analyzing data using
measures of variability and the normal distribution.
2. Solve a variety of statistical problems and exercises pertaining to these concepts.

Measures of Variability
This module has been specifically created with your learning needs in mind. Its primary purpose is to help you understand the concept of variability measures. This module is designed to be comprehensive and self-contained for your current learning situation. The language used in this module is tailored to your vocabulary level. The lessons are organized to align with the standard curriculum sequence, but you have the flexibility to read them in a different order if it better corresponds with the textbook you are currently using.

Upon completing this module, you should be able to:

1. Demonstrate and compute measures of variability (such as range, average deviation, variance, and standard deviation) for statistical data.

2. Explain the concept of measures of variability (range, average deviation, variance, standard deviation) for statistical data.

3. Identify the factors to consider when choosing a measure of variability.

Importance of Measuring Variability

The term "variability" refers to the distance between data points within a distribution and
their distance from its center. Measures of variability give you descriptive statistics that summarize
your data in addition to measures of central tendency.
Variability summarizes the distance between your points, whereas central tendency, or
average, indicates where the majority of your points are located. This is significant because the
degree of variability affects the degree to which results from the sample may be applied to the
entire population.
Low variability is desirable because it makes it easier to extrapolate population information
from sample data. It is more difficult to

Range and Interquartile Range

The range and interquartile range are two measures of the spread or dispersion of data in a dataset.

Range is a statistical measure that represents the spread or dispersion of data in a dataset: it is the difference between the largest and smallest values.

To calculate the range:

1. Find the minimum value (the lowest number) in your dataset.

2. Find the maximum value (the highest number) in your dataset.

3. Subtract the minimum value from the maximum value.

Range = Maximum Value - Minimum Value

The range gives you an idea of how much the data values vary from the smallest to the largest in the dataset. It's a simple way to understand the extent of data dispersion.

Here’s an example calculating the range for a dataset:

Suppose you have the following set of exam scores for a class of students:

85, 92, 78, 95, 89, 63, 97, 88, 91, 72

To find the range:

1. First, find the minimum value in the dataset, which is 63.

2. Next, find the maximum value, which is 97.

3. Finally, subtract the minimum value from maximum value to calculate the range.

Range = Maximum Value - Minimum Value

Range = 97 - 63

Range = 34

So, the range of exam scores in this dataset is 34. This means that the scores vary from a minimum

of 63 to a maximum of 97, with a range of 34 points.
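
The three steps above can be sketched in a few lines of Python. This is only an illustration using the exam scores from the example; the function name `data_range` is a hypothetical helper, not part of the module.

```python
# Exam scores from the worked example above.
scores = [85, 92, 78, 95, 89, 63, 97, 88, 91, 72]


def data_range(values):
    """Return the range: maximum value minus minimum value."""
    return max(values) - min(values)


print(min(scores), max(scores), data_range(scores))  # 63 97 34
```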

Interquartile Range (IQR) is a statistical measure of the spread or dispersion of data that is less sensitive to outliers than the range.

To calculate the interquartile range:

1. Arrange your data in ascending order.

2. Calculate the first quartile (Q1), which represents the 25th percentile of the data. It's the value below which 25% of the data falls.

3. Calculate the third quartile (Q3), which represents the 75th percentile of the data. It's the value below which 75% of the data falls.

4. Calculate the interquartile range by subtracting Q1 from Q3.

IQR = Q3 - Q1

The interquartile range gives you a measure of the spread of the middle 50% of the data. It's useful for identifying the variability of the central portion of the dataset while minimizing the influence of extreme values or outliers.

Here’s an example of calculating the interquartile range (IQR) for a dataset:

Let’s use the following dataset of exam scores: 68, 75, 80, 85, 88, 92, 95, 98

To calculate the IQR:

1. First, arrange the data in ascending order: 68, 75, 80, 85, 88, 92, 95, 98

2. Calculate the first quartile (Q1) and the third quartile (Q3):

● Q1 (25th percentile): The median of the lower half of the data (68, 75, 80, 85), which is the average of the 2nd and 3rd values in this case.

Q1 = ( 75 + 80 ) / 2 = 77.5

● Q3 (75th percentile): The median of the upper half of the data (88, 92, 95, 98), which is the average of the 6th and 7th values in this case.

Q3 = ( 92 + 95 ) / 2 = 93.5

3. Calculate the interquartile range (IQR) by subtracting Q1 from Q3:

IQR = Q3 - Q1

IQR = 93.5 - 77.5

IQR = 16

So, the interquartile range (IQR) for this dataset is 16. It represents the spread of the middle 50% of the data, indicating that the middle 50% of exam scores span 16 points.
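
As a cross-check, here is a minimal Python sketch of the "median of each half" rule described above. The helper name `quartiles_by_halves` is an assumption made for illustration. Statistical packages use several different quartile conventions, so other tools may report slightly different Q1 and Q3 values for the same data.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    For an odd number of values the middle value is left out of both
    halves. Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


scores = [68, 75, 80, 85, 88, 92, 95, 98]
q1, q3 = quartiles_by_halves(scores)
iqr = q3 - q1
print(q1, q3, iqr)
```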

Variance

Variance is a measure of how data points differ from the mean. In layman's terms, variance is a measure of how far a set of data (numbers) is spread out from its mean (average) value.

Variance is the expected squared deviation of a value from the mean, so it is directly tied to the standard deviation of the given data set. The higher the variance, the more the data are scattered about the mean; the lower the variance, the less they are scattered. For this reason it is called a measure of the spread of data about the mean.

In probability and statistics, variance is the expected value of the squared deviation of a random variable from its mean value. Informally, variance estimates how far a set of (random) numbers is spread out from its mean value.
The value of variance is equal to the square of the standard deviation, which is another central measure of spread.

Variance is symbolically represented by σ², s², or Var(X).

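The formula for variance is given by:

Population variance: σ² = ∑ (X − μ)² / N
Sample variance: s² = ∑ (x − x̄)² / (n − 1)

Where:
X (or x) = Value of Observations
μ = Population mean of all Values
x̄ = Sample mean
N = Total number of values in the population
n = Total number of values in the sample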

As we know already, the variance is the square of the standard deviation, i.e.,

Variance = (Standard deviation)² = σ²

EXAMPLE

Find the variance of the numbers

3, 8, 6, 10, 12, 9, 11, 10, 12, 7.

SOLUTIONS:

STEP 1
Compute the mean of the 10 values given.

Mean = (3+8+6+10+12+9+11+10+12+7) / 10 = 88 / 10 = 8.8

STEP 2
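Make a table with three columns: one for the X values, the second for the deviations, and the third for the squared deviations. Because the data are not described as a sample, we use the formula for population variance. Thus, the mean is denoted by μ.

X      X − μ    (X − μ)²
3      −5.8     33.64
8      −0.8     0.64
6      −2.8     7.84
10     1.2      1.44
12     3.2      10.24
9      0.2      0.04
11     2.2      4.84
10     1.2      1.44
12     3.2      10.24
7      −1.8     3.24
                ∑ (X − μ)² = 73.6

STEP 3
Divide the sum of the squared deviations by the number of values.

σ² = ∑ (X − μ)² / N
= 73.6 / 10
= 7.36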
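
The three steps can be reproduced with a short Python sketch. It is only an illustration of the population formula applied to the ten numbers above; the variable names are chosen here for readability and are not part of the module.

```python
from statistics import pvariance

data = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]

# Step 1: mean of the values.
mu = sum(data) / len(data)

# Step 2: deviation and squared deviation for each value.
rows = [(x, x - mu, (x - mu) ** 2) for x in data]

# Step 3: average the squared deviations (population variance).
sum_sq = sum(sq for _, _, sq in rows)
var_pop = sum_sq / len(data)

print(round(mu, 2), round(sum_sq, 2), round(var_pop, 2))  # 8.8 73.6 7.36

# The standard library gives the same population variance.
print(round(pvariance(data), 2))  # 7.36
```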

VARIANCE FORMULAS

Variance can be computed for either grouped or ungrouped data. To recall, a variance can be of two types, which are:

1. Variance of a population
Population Variance - All the members of a group are known as the population. When we want to
find how each data point in a given population varies or is spread out then we use the population
variance. It is used to give the squared distance of each data point from the population mean.
2. Variance of a sample
Sample Variance - If the size of the population is too large then it is difficult to take each data
point into consideration. In such a case, a select number of data points are picked up from the
population to form the sample that can describe the entire group. Thus, the sample variance can be
defined as the average of the squared distances from the mean. The variance is always calculated
with respect to the sample mean.

The variance of a population is denoted by σ² and the variance of a sample by s².

There are separate variance formulas for the ungrouped data and the grouped data. The variance
formulas are mentioned below.

VARIANCE FORMULAS FOR UNGROUPED DATA

Population Variance Formula:

σ² = ∑ (x − x̅)² / n
Where,
σ² = Population Variance
∑ = denotes the sum
xi = ith observation of given data
x̄ = is the mean
n = Total number of observations (Population size)

EXAMPLE:
Calculate Population Variance (σ²) from the following data
10, 50, 30, 20,10, 20, 70, 30

Solution:

Mean x̅ = ∑x/n
= (10+50+30+20+10+20+70+30)/8
= 240/8
= 30

Population Variance
σ² = ∑ (x − x̅)² / n
= 3000/8

σ² = 375

Sample Variance Formula:


s² = ∑ (x − x̅)² / (n − 1)
Where,
s² = Sample Variance
xi = ith observation of given data
x̄ = Sample mean
n = Sample size (or Number of data values in sample)

EXAMPLE:
Calculate Sample Variance (s²) from the following data
10,50,30,20,10,20,70,30

Mean x̅ = ∑x/n
= (10+50+30+20+10+20+70+30)/8
= 240/8
= 30

Sample Variance
s² = ∑ (x − x̅)² / (n − 1)
= 3000/7
s² = 428.5714
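
The only difference between the two ungrouped formulas is the divisor. A minimal Python sketch makes that explicit; the function `variance_with_divisor` and its `sample` flag are hypothetical names introduced here for illustration.

```python
def variance_with_divisor(values, sample=False):
    """Sum of squared deviations divided by n (population) or n - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1 if sample else n)


data = [10, 50, 30, 20, 10, 20, 70, 30]

print(variance_with_divisor(data))                          # 375.0
print(round(variance_with_divisor(data, sample=True), 4))   # 428.5714
```

Because the same sum of squared deviations (3000) is divided by a smaller number, the sample variance is always larger than the population variance computed from the same values.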

VARIANCE FORMULAS FOR GROUPED DATA

Population Variance Formula:


σ² = ∑ f (m − x̅)² / N
Where,
σ² = Population Variance
∑ = denotes the sum
f = is the frequency of the ith interval
m = is the mid-point of the ith interval
x̄ = is the mean
N = Total number of observations (Population size)

EXAMPLE:
Calculate Population Variance (σ²) from the following grouped data

Solution:

Mean x̅ = ∑fx/n
= 55/25
= 2.2

Population Variance

σ² = [∑fx² − (∑fx)²/n] / n
= [147 − (55)²/25] / 25
= (147 − 121) / 25
= 26/25
σ² = 1.04

Sample Variance Formula:


s² = ∑ f (m − x̅)² / (N − 1)
Where,
s² = Sample Variance
∑ = denotes the sum
f = is the frequency of the ith interval
m = is the mid-point of the ith interval
x̄ = is the mean
N = Total number of observations (Sample size)

EXAMPLE:
Calculate Sample Variance (s²) from the following grouped data

Solution:

Mean x̅ = ∑fx/n
= 55/25
= 2.2

Sample Variance

s² = [∑fx² − (∑fx)²/n] / (n − 1)
= [147 − (55)²/25] / 24
= (147 − 121) / 24
= 26/24
s² = 1.0833
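
The grouped-data calculations above use a shortcut that needs only three totals: ∑f, ∑fx and ∑fx². The sketch below applies that shortcut to the totals quoted in the solutions (25, 55 and 147). The function name and its parameters are assumptions made for illustration.

```python
def grouped_variance_from_sums(sum_f, sum_fx, sum_fx2, sample=False):
    """Shortcut variance: [sum(fx^2) - (sum(fx))^2 / n] / divisor.

    The divisor is n for a population and n - 1 for a sample,
    where n = sum of the frequencies.
    """
    numerator = sum_fx2 - sum_fx ** 2 / sum_f
    divisor = sum_f - 1 if sample else sum_f
    return numerator / divisor


# Totals taken from the worked solutions above.
pop_var = grouped_variance_from_sums(25, 55, 147)
samp_var = grouped_variance_from_sums(25, 55, 147, sample=True)

print(round(pop_var, 4), round(samp_var, 4))  # 1.04 1.0833
```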

PRACTICAL APPLICATION

ACTIVITY 1. IT'S YOUR TURN.

Find the Variance of the following.

1. Find the variance for the heights of the top 12 buildings in London, England. The heights
(in feet) are: 800, 720, 655, 655, 625, 600, 590, 529, 513, 502, 502, 502.

2. Calculate Population Variance (σ²) from the following data


85,96,76,108,85,100,85,70,95

3. Calculate Sample Variance (s²) from the following data


85, 96, 76, 108, 85, 80, 100, 85, 70, 95

4. Calculate Population Variance (σ²) and Sample Variance (s²) from the following grouped
data

References:
https://www.cuemath.com/data/variance/
https://www.cuemath.com/variance-formula/
https://byjus.com/variance-formula/
https://atozmath.com/default.aspx

Standard Deviation
In Lesson 3, you learned how to compute the variance, which is defined as the average of
the squared differences from the mean. There are different ways to compute variance: it can be
the variance of a population or the variance of a sample, and it can be computed for grouped
or ungrouped data, as discussed in the previous lesson.

In this lesson, you will understand the importance of variance
in computing standard deviation. Standard Deviation is a measure of
how spread out numbers are. Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance. Low, or
small, standard deviation indicates data are clustered tightly around
the mean, and high, or large, standard deviation indicates data are
more spread out. A standard deviation close to zero indicates that data
points are very close to the mean, whereas a larger standard deviation
indicates data points are spread further away from the mean.

In the image, the curve on top is more spread out and therefore has a
higher standard deviation, while the curve below is more clustered
around the mean and therefore has a lower standard deviation.

This is the formula for standard deviation:

σ = √[ (1/N) ∑ (xi − µ)² ]

Where:
σ = standard deviation
∑ = denotes the sum
xi = individual data point in the set
µ = the population mean
N = Total number of observations (Population size)

OK. Let us explain it step by step.
Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.
To calculate the standard deviation of those numbers:
1. Work out the mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!

The formula actually says all of that, and I will show you how.
Example:
1. Sam has 20 Rose Bushes. The number of flowers on each bush is
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the Standard Deviation.
Solution:
Step 1. Work out the mean
The mean is:
= (9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+4) / 20
= 140 / 20
= 7
And so μ = 7
Step 2. Then for each number: subtract the Mean and square the result.
This is the part of the formula that says: (xi − μ)²
So what is xi? They are the individual x values 9, 2, 5, 4, 12, 7, etc.
In other words x1 = 9, x2 = 2, x3 = 5, etc.
So it says "for each value, subtract the mean and square the result", like this
(9 - 7)² = (2)² = 4
(2 - 7)² = (-5)² = 25
(5 - 7)² = (-2)² = 4
(4 - 7)² = (-3)² = 9
(12 - 7)² = (5)² = 25
(7 - 7)² = (0)² = 0
(8 - 7)² = (1)² = 1
... etc …
And we get these results:
4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9

Step 3. Then work out the mean of those squared differences.


To work out the mean, add up all the values then divide by how many.
First add up all the values from the previous step.
But how do we say "add them all up" in mathematics? We use "Sigma": Σ

Note:
The handy sigma notation says to sum up as many terms as we want.

We want to add up all the values from 1 to N, where N = 20 in our case because there
are 20 values:

∑ (xi − μ)², for i = 1 to N

Which means: Sum all values from (x1 − 7)² to (xN − 7)²

We already calculated (x1 − 7)² = 4 etc. in the previous step, so just sum them up:
= 4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178

But that isn't the mean yet, we need to divide by how many, which is done by
multiplying by 1/N (the same as dividing by N):

Mean of squared differences = (1/20) × 178 = 8.9


(Note: this value is called the "Variance")

Step 4. Take the square root of that and we are done!

σ= √(8.9) = 2.983…
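
The four steps map directly onto a few lines of Python. This is a minimal sketch of the population formula applied to Sam's 20 rose bushes; the variable names are illustrative only.

```python
import math

flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(flowers)
mean = sum(flowers) / n                             # Step 1: the mean
squared_diffs = [(x - mean) ** 2 for x in flowers]  # Step 2: squared differences
variance = sum(squared_diffs) / n                   # Step 3: mean of those squares
sigma = math.sqrt(variance)                         # Step 4: square root

print(mean, sum(squared_diffs), variance, round(sigma, 3))  # 7.0 178.0 8.9 2.983
```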

SAMPLE STANDARD DEVIATION
Example:
2. Sam has 20 rose bushes, but only counted the flowers on 6 of them!
The "population" is all 20 rose bushes, and the "sample" is the 6 bushes that Sam
counted the flowers of.
Let us say Sam's flower counts are:
9, 2, 5, 4, 12, 7

Note:
We can still estimate the Standard Deviation.
But when we use the sample as an estimate of the whole population, the Standard
Deviation formula changes to this:

The formula for Sample Standard Deviation:

s = √[ (1/(N − 1)) ∑ (xi − x̄)² ]

where:
s = sample standard deviation
∑ = denotes the sum
xi = each value of the data distribution
x̄ = the sample mean
N = Total number of observations (sample size)
Note:
The important change is "N-1" instead of "N" (which is called "Bessel's correction").

The symbols also change to reflect that we are working on a sample instead of the whole
population:
● The mean is now x̄ (called "x-bar") for sample mean, instead of μ for the population
mean,
● And the answer is s (for sample standard deviation) instead of σ.
But they do not affect the calculations. Only N-1 instead of N changes the calculations.

Solution:
Step 1. Work out the mean
Using sampled values 9, 2, 5, 4, 12, 7
The mean is (9+2+5+4+12+7) / 6 = 39/6 = 6.5
So: x̄ = 6.5

Step 2. Then for each number: subtract the Mean and square the result.
(9 - 6.5)² = (2.5)² = 6.25
(2 - 6.5)² = (-4.5)² = 20.25
(5 - 6.5)² = (-1.5)² = 2.25
(4 - 6.5)² = (-2.5)² = 6.25
(12 - 6.5)² = (5.5)² = 30.25
(7 - 6.5)² = (0.5)² = 0.25

Step 3. Then work out the mean of those squared differences.

To work out the mean, add up all the values then divide by how many.

But hang on ... we are calculating the Sample Standard Deviation, so instead of
dividing by how many (N), we will divide by N-1

Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5


Divide by N-1: (1/5) × 65.5 = 13.1
(This value is called the "Sample Variance")

Step 4. Take the square root of that and we are done!

𝑠 = √(13.1) = 3.619
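
The sample calculation can be checked the same way. The sketch below divides by N − 1 (Bessel's correction) and compares the result with `statistics.stdev` from the Python standard library, which uses the same divisor. Variable names are illustrative.

```python
import math
from statistics import stdev

sample = [9, 2, 5, 4, 12, 7]

n = len(sample)
x_bar = sum(sample) / n                         # Step 1: sample mean
sum_sq = sum((x - x_bar) ** 2 for x in sample)  # Step 2: squared differences
sample_variance = sum_sq / (n - 1)              # Step 3: divide by N - 1
s = math.sqrt(sample_variance)                  # Step 4: square root

print(x_bar, sum_sq, sample_variance, round(s, 3))  # 6.5 65.5 13.1 3.619
print(round(stdev(sample), 3))                      # 3.619
```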

Note: To compute the standard deviation of either a whole population or a sample more
easily, you can lay out the variance computation in a distribution table first. Since the
standard deviation is the square root of the variance, the last step is then a single square root.

STANDARD DEVIATION OF GROUPED DATA AND UNGROUPED DATA


In case of grouped data or grouped frequency distribution, the standard deviation can be
found by considering the frequency of data values. This can be understood with the help of an
example.
GROUPED DATA

SAMPLE: s = √s² = √[ ∑ f(x − x̄)² / (n − 1) ]
POPULATION: σ = √σ² = √[ ∑ f(X − µ)² / N ]

UNGROUPED DATA

SAMPLE: s = √s² = √[ ∑ (x − x̄)² / (n − 1) ]
POPULATION: σ = √σ² = √[ ∑ (X − µ)² / N ]

Where:
f = frequency
x = class mark (or value) for sample
X = class mark (or value) for population
x̄ = sample mean
µ = population mean
N = count of values in Population
n = count of individual values in sample

Question: Calculate the mean, variance and standard deviation for the following data:

Class Interval   0-10   10-20   20-30   30-40   40-50   50-60
Frequency        27     10      7       5       4       2

Solution #1:

Class Interval   Frequency (f)   Mid Value (xi)   fxi    fxi²
0-10             27              5                135    675
10-20            10              15               150    2250
20-30            7               25               175    4375
30-40            5               35               175    6125
40-50            4               45               180    8100
50-60            2               55               110    6050
Total            ∑f = 55                          ∑fxi = 925    ∑fxi² = 27575

n = ∑f = 55

Mean = ∑fx / ∑f
= 925/55
= 16.818

Variance = s² = [∑fx² − (∑fx)²/n] / (n − 1)
= [27575 − (925)²/55] / (55 − 1)
= (27575 − 15556.8182) / 54
= 222.559

Standard Deviation = s = √{ [∑fx² − (∑fx)²/n] / (n − 1) }
= √222.559
= 14.918

Solution #2:

Class Interval   Frequency (f)   Mid Value (x)   fx     x̄       x − x̄     (x − x̄)²   f(x − x̄)²
0-10             27              5               135    16.82   −11.82    139.71     3772.17
10-20            10              15              150    16.82   −1.82     3.31       33.1
20-30            7               25              175    16.82   8.18      66.91      468.37
30-40            5               35              175    16.82   18.18     330.51     1652.55
40-50            4               45              180    16.82   28.18     794.11     3176.44
50-60            2               55              110    16.82   38.18     1457.71    2915.42
Total            ∑f = 55                         ∑fx = 925                           ∑f(x − x̄)² = 12018.05

n = ∑f = 55

Mean = ∑fx / ∑f
= 925/55
= 16.818

Variance = s² = ∑f(x − x̄)² / (n − 1)
= 12018.05 / 54
= 222.56

Standard Deviation = s = √[ ∑f(x − x̄)² / (n − 1) ]
= √222.56
= 14.92

The small difference from Solution #1 comes only from rounding the mean and the squared deviations to two decimal places.

Note: You can use either Solution #1 or Solution #2 when calculating the standard deviation of
grouped data. Refer to the formulas discussed in Lesson 3 (Variance), then compute the
standard deviation by taking the square root of the variance.
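
Both approaches can be sketched in Python for the class intervals in the question. The code treats the data as a sample (divisor n − 1), as the solutions do, and uses the class midpoints as stand-ins for the values in each interval. Variable names are illustrative. Working with the unrounded mean, the two formulas give the same variance.

```python
import math

# Class intervals and frequencies from the question above.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [27, 10, 7, 5, 4, 2]

# Class marks (midpoints) represent every value inside an interval.
mids = [(lo + hi) / 2 for lo, hi in intervals]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
mean = sum_fx / n

# Shortcut form: uses the totals of fx and fx^2.
var_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Definitional form: squared deviations from the unrounded mean.
var_deviation = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / (n - 1)

sd = math.sqrt(var_shortcut)
print(round(mean, 3), round(var_shortcut, 3), round(sd, 3))  # 16.818 222.559 14.918
```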

Practice Problems on Standard Deviation


1. Calculate the standard deviation of the following values:
5, 10, 25, 30, 50.
2. Find the mean and standard deviation for the following data.

x 60 61 62 63 64 65 66 67 68

f 2 1 12 29 25 12 10 4 5

PRACTICAL APPLICATION
Let's calculate the standard deviation for the number of gold coins on a ship run by pirates.
There are a total of 100 pirates on the ship, so statistically the population is 100. If we know
the number of gold coins every pirate has, we use the standard deviation equation for the
entire population.
Now consider a sample of 5 pirates instead. With a sample size of 5, we use the standard
deviation equation for a sample of a population.
Consider the number of gold coins 5 pirates have: 4, 2, 5, 8, 6.
References:
https://www.mathsisfun.com/data/standard-deviation-formulas.html
https://www.nlm.nih.gov/oet/ed/stats/02-900.html
https://byjus.com/maths/standard-deviation/

Consideration of Choosing a Measure of Variability

Earlier in this module, we discussed five measures of variation: the index of qualitative
variation (IQV), range, interquartile range, variance, and standard deviation. Each can be used
to indicate a distribution's level of variability. Which one should we use, however? There is no
single answer to this question: we typically report only one measure of variation, and choosing
the appropriate one involves several considerations. The variable's level of measurement is one
of the most fundamental factors to consider when selecting a measure of variability, just as it is
when selecting a measure of central tendency. The data must be measured at the level required
for that measure, or higher, for the measure to be used correctly.

How to Choose a Measure for Variation

Figure 1

A. Nominal level. The options for a measure of variability with nominal variables are limited
to the IQV.

B. Ordinal level. For ordinal variables, it is more challenging to choose the appropriate
measure of variation. Although the IQV can be used to reflect variation in ordinal variable
distributions, it is less informative because it is not sensitive to the rank ordering of values
implied by ordinal variables. The interquartile range is another option. However, the
interquartile range relies on the distance between two scores to express variation, and
distance is information that ordinal scores do not strictly provide. An acceptable compromise
is to report the interquartile range (showing Q1 and Q3) together with the median, interpreting
the interquartile range as the range of rank-ordered values that contains the middle 50% of the
observations.

C. Interval-ratio level. There are three options for interval-ratio variables: the variance (or
its square root, the standard deviation), the range, or the interquartile range. The variance
and/or standard deviation are typically favored because the range and, to a lesser extent, the
interquartile range are based on just two scores in the distribution (and, as a result, are
sensitive when either of the two scores is extreme). However, the range and the interquartile
range might be used if a distribution is so highly skewed that the mean is no longer indicative
of the distribution's central tendency. The range and the interquartile range are also helpful
when reading tables or quickly scanning data to get a general sense of the degree of dispersion
in a distribution.

Practical Application

You decide to investigate how young Americans feel about alcohol (ATDRINK) and cigarette use
(ATSMOKE). You obtain the selected output shown below. You should note that
ATDRINK measures how respondents feel about trying alcohol, while ATSMOKE measures how
respondents feel about smoking one pack of cigarettes per day. These are substantially different
questions, and you should consider that in your answer.

a. What would an appropriate measure of variability be for these variables? Why?

b. Calculate the appropriate measure of variability for each variable.

c. In 2006, was there more variability in attitudes toward trying alcohol or smoking one pack of
cigarettes per day? Offer an explanation for your findings.

Adolescent Attitudes Toward     Trying Alcohol   Smoking One Pack of
Alcohol and Cigarettes                           Cigarettes per Day
Don't Disapprove                71.0%            19.3%
Disapprove                      16.2%            30.2%
Strongly Disapprove             12.8%            50.5%
Total                           100%             100%
                                N = 1,466        N = 1,474

Normal Distribution
Previously, you have learned about continuous random variables - variables that have a

value anywhere in a given interval. In this module, you will learn about the most important of all

continuous random variables - the normal distribution variable.

In this module, you will be able to:

● illustrate a normal random variable and its characteristics;

● construct a normal curve;

● identify the regions under the normal curve that correspond to different standard normal

values;

● compute probabilities and percentiles using the standard normal table

Normal Distribution

The normal distribution is the most important distribution in statistics. Many researchers

from different fields use its idea in order to test their research hypotheses that will generate new

knowledge and transform this knowledge into new applications that improve the quality of

people’s lives (Albay 2019, p. 82).

You are expected to learn normal distribution and its characteristics and how to construct

a normal curve.

At the end of the lesson, you are expected to:

1. illustrate a normal random variable and its characteristics,

2. construct a normal curve;

3. describe the characteristics of normal random variable; and

4. discuss the importance of knowing ourselves better than the other.

The Normal Random Variable
A continuous random variable is considered normal when its values are distributed
normally, that is, when the majority of the values are close to the expected value, with only
very few values that are extremely small or extremely large. For example, in a Grade 11
class, it may be observed that the students normally have a height of 170 cm or very close to
that, with only a few students who are extremely tall and a few who are extremely short. This
illustrates a normal random variable. Other examples of normal random variables include blood
pressure, scores in a test, and the weights of students belonging to the same group.
Figure 6.1 shows the graph of a normal distribution. The graph of a normal distribution is
a bell-shaped curve, which is also called the normal curve, and the majority of the values are
clustered around the value of 5 with only very few values which are too small or too large.

Normal Random Variable


A continuous random variable X can be considered a normal variable when it has a
probability density function of the form:

f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))

where μ is the expected value (mean), σ is the standard deviation, π ≈ 3.14, and e ≈
2.718.
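
To make the density function concrete, here is a minimal Python sketch that evaluates it at a few points. The function name `normal_pdf` and its default arguments (mean 0, standard deviation 1) are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The curve peaks at the mean and is symmetric about it.
print(round(normal_pdf(0.0), 4))   # 0.3989
print(round(normal_pdf(1.0), 4))   # 0.242
print(round(normal_pdf(-1.0), 4))  # 0.242
```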

Properties of the Normal Distribution


The following are properties that can be observed
from the graph of a normal distribution, which is also
called the Gaussian distribution.

1. The curve of the distribution is bell-shaped. The


graph is asymptotic to the x-axis - the value of the
variable approached but will never be equal to 0.
2. The curve is symmetrical about the mean.

This means if we cut the curve about the mean,


we will have balanced proportions of the halves.
Specifically, we say that one is a reflection of the
other. Meaning, the qualities exhibited by one
are the same qualities exhibited by the other.
3. The mean, median and mode are of the equal
values and when sketched, they coincide at the center of the graph.

34 | P a g e
This means that the mean, median and mode of
the given distribution are located at exactly one
point since their values are equal, and they are
located at the center of the graph which
indicates the highest peak of the curve.

4. The width of the curve is determined by the standard deviation of the distribution.

The curve shown at the left is a standard normal curve. A standard normal curve is a
normal distribution that has a mean equal to 0 and a standard deviation equal to 1.
5. The curve extends indefinitely approaching
the x-axis but never touching it. Thus, the curve is asymptotic to the line.
6. The area of the region under the curve is 1. It represents the probability or percentage or
proportion associated with the specific sets of measurement values.

This means that for every specific measurement value, there corresponds exactly one
probability / percentage / proportion value which describes a particular area of the
region under the normal curve.

7. The standard deviation precisely describes the spread of the normal curve. In fact,
approximately 68.3% of the values in the distribution are within one standard deviation
of the mean (on either side), 95.4% are within two standard deviations of the mean, and
99.7% are within three standard deviations of the mean.
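
The percentages in property 7 can be checked numerically. The sketch below assumes the standard identity that the area within k standard deviations of the mean equals erf(k/√2); the helper name `area_within` is hypothetical.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

# Print the percentage of values within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```

The printed percentages agree with the 68.3%, 95.4%, and 99.7% figures to one decimal place.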

These properties will be very important as you explore further the study of the normal
distribution and its applications. Moreover, knowing the properties of the distribution will also
facilitate the solutions to some problems involving the identification of the mean and standard
deviation, as well as the construction of the normal curve.

Example 1: Find the standard deviation of the normal distribution where 99.7% of the values
fall between 52 and 82.

Solution: The mean is the midpoint between 52 and 82. Thus,

$$\mu = \frac{52 + 82}{2} = \frac{134}{2} = 67$$

The mean is 67.

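Since 99.7% of the values fall between 52 and 82, then by property 7 these endpoints lie
three standard deviations from the mean, i.e., 𝜇 − 3𝜎 = 52 and 𝜇 + 3𝜎 = 82. Solving
either equation for 𝜎 gives the same result:

$$\mu + 3\sigma = 82 \;\Rightarrow\; 67 + 3\sigma = 82 \;\Rightarrow\; 3\sigma = 15 \;\Rightarrow\; \sigma = 5$$

$$\mu - 3\sigma = 52 \;\Rightarrow\; 67 - 3\sigma = 52 \;\Rightarrow\; -3\sigma = -15 \;\Rightarrow\; \sigma = 5$$

The standard deviation is 5.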
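
The same reasoning can be sketched in a few lines of Python. The helper `mean_and_sd` is a hypothetical name; it assumes the interval is symmetric about the mean and spans k standard deviations on each side.

```python
def mean_and_sd(lower, upper, k):
    """Mean and standard deviation when [lower, upper] spans k standard
    deviations on each side of the mean."""
    mean = (lower + upper) / 2
    sd = (upper - mean) / k
    return mean, sd

# Example 1: 99.7% of the values fall between 52 and 82, so k = 3.
mean, sd = mean_and_sd(52, 82, 3)
print(mean, sd)  # 67.0 5.0
```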

Example 2: Assume that 68.3% of grade 11 students have heights between 1.5 and 1.7 m and
the data are normally distributed.
a. Find the mean.

b. Compute the standard deviation.

Solution: a. To find the mean, compute the value that is halfway between 1.5 and 1.7.

$$\mu = \frac{1.5 + 1.7}{2} = 1.6$$

The mean height is 1.6 m.

b. By property 7 of the normal distribution, 68.3% of the values lie within 1 standard
deviation of the mean. Therefore, 𝜇 − 𝜎 = 1.5 and 𝜇 + 𝜎 = 1.7. Use the second
equation to solve for the standard deviation, substituting 𝜇 = 1.6.

$$\mu + \sigma = 1.7 \;\Rightarrow\; 1.6 + \sigma = 1.7 \;\Rightarrow\; \sigma = 0.1$$

The standard deviation is 0.1 m.
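
One way to check the answer, sketched here with `NormalDist` from Python's standard `statistics` module, is to confirm that a normal distribution with mean 1.6 and standard deviation 0.1 places about 68.3% of its values between 1.5 and 1.7. The variable names are illustrative.

```python
from statistics import NormalDist

# Normal distribution with the mean and standard deviation found in Example 2.
heights = NormalDist(mu=1.6, sigma=0.1)

# Proportion of heights between 1.5 m and 1.7 m.
proportion = heights.cdf(1.7) - heights.cdf(1.5)
print(round(proportion * 100, 1))
```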

Read and answer the following questions on a separate sheet of paper.

1. Complete the statement by filling in the appropriate word or term on the blank.
a. The graph of a normal distribution is asymptotic to the ________.
b. The total area under the normal curve is ________.
c. The graph of a normal distribution is symmetric along the vertical line that contains
the ________ of the distribution.
d. The mean, median and mode of normal distribution are ________.
e. The graph of the normal distribution depends on the ________ and the ________.
2. The IQ scores of 95.4% of the grade 11 students are between 90 and 110.
a. Compute the mean.
b. Find the standard deviation.

Areas Under the Normal Curve

Areas under all normal curves are related. For example, the area
percentage to the right of 1.5 standard deviations above the mean is identical
for all normal curves. (The term "area" will refer to "area percentage".)

The fact stated above is the reason we can find an area over an interval for any normal curve by
finding the corresponding area under a standard normal curve (with a mean of 0 and a standard
deviation of 1).

We have seen that the Empirical Rule (68% - 95% - 99.7%) subdivides the area under a normal
distribution into sections with widths of one standard deviation. These subdivisions are fine for
determining percentages as long as we are dealing with values that fall at these exact subdivision
locations.

What do we do when the value does not fall at an Empirical Rule subdivision? By using z-scores,
we have the ability to locate a percentage (or area) under a standard normal distribution at any
location. Z-scores allow for the calculation of area percentages (also called proportions or
probabilities) anywhere along a standard normal distribution curve (and, consequently along the
corresponding normal distribution).

The area percentage (proportion, probability) calculated using a z-score will be a decimal value
between 0 and 1, and will appear in a Z-Score Table. The total area under any normal curve is 1
(or 100%). Since the normal curve is symmetric about the mean, the area on each side of the
mean is 0.5 (or 50%).

To find a specific area under a normal curve, find the z-score of the data value and use a
Z-Score Table to find the area. A Z-Score Table is a table that shows the percentage of values (or
area percentage) to the left of a given z-score on a standard normal distribution.
[Figures: Positive Z-Score Table and Negative Z-Score Table]
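
When a printed table is not at hand, the same left-tail areas can be computed. The sketch below assumes the standard relationship between the cumulative area of the standard normal curve and the error function; the function name `cumulative_from_left` is hypothetical.

```python
import math

def cumulative_from_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few sample z-scores, rounded to four decimals as in a printed table.
for z in (-1.0, 0.0, 1.0, 1.96):
    print(z, round(cumulative_from_left(z), 4))
```

Each printed value corresponds to one cell in the body of a "cumulative from the left" table.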

• These tables are designed only for the standard normal distribution, which has a mean of 0 and a
standard deviation of 1.

• The leftmost column shows how many standard deviations above (or below) the mean the value
lies, to one decimal place. (The label in the row contains the integer part and the first decimal
of the z-score.)

• The part of the z-score denoting hundredths is found across the top row of the table. (The label
for columns contains the second decimal of the z-score.)

• The intersection of the rows and columns gives the probability or area under the normal curve.
Each value in the body of the table is a cumulative area.

Z-Score Tables come in different formats, determined by where the accumulation of area
starts. Consider these two most popular formats:
1. One form of the table yields probability or area starting at the mean and going to the right of
the mean up to the needed z-score. These tables are usually labeled "cumulative from mean". This
table works with only half of the area under the normal curve, and the user must take this into
consideration and make adjustments when using it. This type of table lists positive z-scores
only.
2. Another form of the table yields probability or area starting from negative infinity (the farthest
left) and going to the right up to the needed z-score. These tables are usually labeled "cumulative
from the left". This table works with the entire area under the normal curve, and requires fewer
adjustments than the first option. This table lists both positive and negative z-scores. Most
beginning statistical textbooks include this Z-Score Table, and this site will be using this format.
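
The two table formats differ only by the 0.5 that lies to the left of the mean. The following sketch, using `NormalDist` from Python's standard `statistics` module, shows the conversion; the function names are illustrative assumptions.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def cumulative_from_left(z):
    """Area from negative infinity up to z."""
    return standard_normal.cdf(z)

def cumulative_from_mean(z):
    """Area from the mean (z = 0) up to a non-negative z."""
    return standard_normal.cdf(z) - 0.5

z = 1.0
from_mean = cumulative_from_mean(z)
from_left = cumulative_from_left(z)
# By symmetry, the area to the left of -z is 0.5 minus the area from 0 to z.
left_of_negative = 0.5 - from_mean
print(round(from_mean, 4), round(from_left, 4), round(left_of_negative, 4))
```

The last line shows the kind of adjustment a "cumulative from mean" table requires for negative z-scores.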

Example 1:

Example 2:

Example 3:

Example 4:

Example 5:

Shaded Region Under the Normal Curve

Mathematicians are not fond of lengthy expressions. They use notations or symbols instead.

Probability notations are commonly used to express lengthy ideas about the normal curve in
symbols.

The following are the most common probability notations used in studying concepts on the
normal curve, where a and b are z-score values.

P(a < z < b) denotes the probability that the z-value is between a and b.

P(z > a) denotes the probability that the z-value is above a.

P(z < a) denotes the probability that the z-value is below a.

P(z = a) = 0 states that the probability that the z-value is exactly equal to a is 0. A single
z-value corresponds to exactly one point on the horizontal axis, and the vertical line drawn at
that point has no width, so it encloses no area under the curve. Area lies only below or above
that line. That is why the probability that a z-value is exactly equal to a given value is 0.

Some of the terms involved in using notations:

Negative, P(z < a)        Positive, P(z > a)
“less than z”             “greater than z”
“to the left of z”        “to the right of z”
“below z”                 “above z”
“lower than z”            “more than z”
“under z”                 “at least z”

Illustration.
1. Find the proportion of the area between z = 2 and z = 3.

Steps and Solution

Step: Draw a normal curve.

Step: Locate the required z-values. Shade the required region.

Step: Locate from the z-Table the corresponding areas of the given z-values.
Solution: z = 2 has a corresponding area of 0.4772, and z = 3 has a corresponding area of
0.4987. Each area is measured from z = 0.

Step: With the graph, decide on what operation will be used to identify the proportion of
the area of the region. Use probability notation to avoid lengthy expressions.
Solution: With the given graph, the operation to be used is subtraction.
P(2 < z < 3) = 0.4987 − 0.4772 = 0.0215

Step: Make a concluding statement.
Solution: The required area between z = 2 and z = 3 is 0.0215.
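
As a cross-check, the area between two z-values can be computed directly rather than read from a table. This sketch uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative. Note that the computed area is about 0.0214, while the table-based answer is 0.0215; the difference in the last digit comes from rounding each table entry to four decimals before subtracting.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# P(2 < z < 3): subtract the area to the left of 2 from the area to the left of 3.
area = standard_normal.cdf(3) - standard_normal.cdf(2)
print(round(area, 4))
```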

2. Find the proportion of the area below z = 1.

Steps and Solution

Step: Draw a normal curve. Locate the required z-value. Shade the required region.

Step: Locate from the z-Table the corresponding area of the given z-value.
Solution: z = 1 has a corresponding area of 0.3413. This area covers only the region from
z = 0 to z = 1.

Step: With the graph, decide on what operation will be used to identify the proportion of
the area of the region. Use probability notation to avoid lengthy expressions.
Solution: With the given graph, the operation to be used is addition.
P(z < 1) = 0.5000 + 0.3413 = 0.8413
This is so because the area of the region to the left of z = 0 is 0.5, since it represents
half of the normal curve. Because the total area under the curve is 1, half of it is 0.5000
or 0.5.

Step: Make a concluding statement.
Solution: The required area below z = 1 is 0.8413.
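
The addition step can be sketched in Python as follows. The helper `area_from_mean` is a hypothetical name; it assumes the standard identity that the area from z = 0 to z equals half of erf(z/√2), which plays the role of a "cumulative from mean" table entry.

```python
import math

def area_from_mean(z):
    """Area under the standard normal curve from z = 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

# P(z < 1): the left half of the curve plus the area from 0 to 1.
below = 0.5 + area_from_mean(1.0)
# P(z > 1): the right half of the curve minus the area from 0 to 1.
above = 0.5 - area_from_mean(1.0)
print(round(below, 4), round(above, 4))
```

The two results add up to 1, consistent with the total area under the curve.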

3. Find the proportion of the area where the z-value is exactly equal to 1.

Steps and Solution

Step: Draw a normal curve.

Step: Locate the required z-value. Shade the required region.

Step: Locate from the z-Table the corresponding areas of the given z-values.
Solution: With the given graph, there is no need to decide on what operation to use since,
as defined, if a z-value is equal to exactly one number, then its probability or the
proportion of the area of the region is automatically 0.

Step: With the graph, decide on what operation will be used to identify the proportion of
the area of the region. Use probability notation to avoid lengthy expressions.
Solution: P(z = 1) = 0

Step: Make a concluding statement.
Solution: The required area for the z-value exactly equal to 1 is 0.

Understanding Z-score
Z-score: Definition, Formula, and Uses
Introduction:
What is a Z-score?
A z-score measures the distance between a data point and the mean in units of standard
deviations. Z-scores can be positive or negative. The sign tells you whether the observation is
above or below the mean.
For example, a z-score of +2 indicates that the data point falls two standard deviations
above the mean, while a z-score of -2 signifies that it is two standard deviations below the mean.
A z-score of zero indicates a value equal to the mean. Statisticians also refer to z-scores as
standard scores, and this module uses those terms interchangeably.

Standardizing the raw data by transforming them into z-scores provides the following benefits:

o Understand where a data point fits into a distribution.

o Compare observations between dissimilar variables.

o Identify outliers

o Calculate probabilities and percentiles using the standard normal distribution.

How to Find a Z-score?

To calculate z-scores, take the raw measurements, subtract the mean, and divide by the standard
deviation.

The formula for finding z-scores is the following:

$$z = \frac{X - \mu}{\sigma}$$

X represents the data point of interest.

Mu (𝜇) and sigma (𝜎) represent the mean and standard deviation for the population from which
you drew your sample.
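
The calculation described above (subtract the mean, then divide by the standard deviation) can be sketched in Python. The function name `z_score` and the sample numbers are hypothetical and chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical data: mean 70, standard deviation 5.
above = z_score(80, 70, 5)   # two standard deviations above the mean
below = z_score(60, 70, 5)   # two standard deviations below the mean
at_mean = z_score(70, 70, 5) # exactly at the mean
print(above, below, at_mean)  # 2.0 -2.0 0.0
```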

Using Z-scores to Understand How an Observation Fits into a Distribution

Z-scores help you understand where a specific observation falls within a distribution. Sometimes
the raw test scores are not informative.

When your data are normally distributed, you can graph z-scores on the standard normal
distribution, which is a particular form of the normal distribution. The mean occurs at the peak
with a z-score of zero. Above average z-scores are on the right half of the distribution and below
average values are on the left. The graph below shows where the baby’s z-score of 0.74 fits in the
population.

Percentile Under the Normal Curve

Introduction:
Which of the following expressions are familiar to you?
● “First honor”
● “Top five”
● “A score of 98%”
These are expressions of order. They indicate relative standing. In real life, many people want
to attain a high level in terms of relative standing.

PERCENTILE:
For any set of measurements (arranged in ascending or descending order), a percentile (or a
centile) is a point in the distribution such that a given percentage of the cases is below it.

A percentile is a measure of relative standing. It is a descriptive measure of the relationship of
the measurement to the rest of the data.

A percentile is a comparison score between a particular score and the scores of the rest of a group.
It shows the percentage of scores that a particular score surpassed. For example, if you score 75
points on a test, and are ranked in the 85th percentile, it means that the score 75 is higher than 85%
of the scores.

At the end of the lesson, you are expected to:

● Find z-scores when probabilities are given
● Locate percentiles under the normal curve

AREAS UNDER THE NORMAL CURVE

What are percentiles on a normal distribution curve?

A percentile is the value in a normal distribution that has a specified percentage of observations
below it. Percentiles are often used in standardized tests like the GRE and in comparing height and
weight of children to gauge their development relative to their peers.

What is the 95th percentile of a normal curve?

What is the 25th percentile of the standard normal distribution?
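
Questions like these ask for the z-score that has a given percentage of the area to its left. As a minimal sketch, this can be answered with the `inv_cdf` method of `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-score with 95% of the area to its left (95th percentile).
p95 = standard_normal.inv_cdf(0.95)
# z-score with 25% of the area to its left (25th percentile).
p25 = standard_normal.inv_cdf(0.25)
print(round(p95, 3), round(p25, 3))
```

The 95th percentile is a positive z-score and the 25th percentile is a negative one, since the 50th percentile sits at z = 0. For a normal curve that is not standard, the corresponding raw value is 𝜇 + z𝜎.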

A. Directions: Solve the following problems.

Scenario: Danny is one of the students who took final examinations in three subjects. The results
of the examination are as follows:

Subject    Average Score of Students (𝜇)    𝜎     Danny’s Score (x)
Math       78                               5     85
English    82                               9     87
Science    85                               15    92

1. In which subject did Danny get a higher score than the rest?
2. In which subject did Danny get a lower score than the rest?
3. If the top 7% of the examinees in Math will be given incentives, what must be their score
to be part of the list?

B. Directions: Solve the following.

1. What is the z-score of 𝑥 = 90 if 𝜇 = 95.3 and 𝜎 = 3?
2. What is the area under the normal curve bounded by 𝑥 = 90 and 𝜇 = 95.3 if 𝜎 = 3?
3. What is the probability that an amount between ₱85,000 and ₱110,000 will be randomly
chosen if the mean is ₱100,000, with a standard deviation of ₱15,000?
4. What is the probability that a randomly selected value lies between 115 and 125 if the mean
is 100, with a standard deviation of 15?

C. Directions: Find the area under a normal curve in percent given the following conditions.

1. from z = 0 to z = 2.07
2. from z = 0 to z = −1.03
3. from z = −2.33 to z = 3.03
4. from z = 0.22 to z = 2.22

D. Directions: Solve the following.

The average net sales per year of the products in 60 branches of DG Company is ₱85
million, with a standard deviation of ₱15 million. Determine how many branches have net sales
of:

1. ₱60 million to ₱85 million
2. ₱70 million to ₱105 million
3. ₱90 million to ₱110 million
4. Above ₱110 million
5. Below ₱60 million

Learning Materials

● Module
● Ppt Presentation
❖ https://www.canva.com/design/DAFxyxPhHVw/74xjGdwEAk7imp4tFNb81Q/edit?utm_content=DAFxyxPhHVw&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
