
C. Normal Distribution

Dr Arunangshu Mukhopadhyay
Professor
Dr B R Ambedkar NIT Jalandhar
Normal Distribution
Parameters of the Normal Distribution
Mean: The mean is the central tendency of the distribution. It defines the location of the peak of the normal distribution. Most values cluster around the mean. On a graph, changing the mean shifts the entire curve left or right along the x-axis.

Non-parametric tests
Descriptive/Inductive Statistics
The bell-shaped curve represents the probability density curve:

y = (1/(σ√(2π))) · e^(−(x − µ)² / (2σ²))

where µ = population mean, σ = population SD, and x = actual value.
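As a quick check of this formula, here is a minimal sketch (assuming Python with NumPy and SciPy available) that evaluates the density at the peak x = µ:

import numpy as np
from scipy.stats import norm

mu, sigma = 96.0, 8.0   # illustrative values (the chest-girth example later uses these)

# Density from the formula above, evaluated at x = mu (the peak)
x = mu
y = (1.0 / (sigma * np.sqrt(2.0 * np.pi))) * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Cross-check against SciPy's implementation of the same density
assert np.isclose(y, norm.pdf(x, loc=mu, scale=sigma))
print(y)   # ~0.0499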
Why hump?

Normal Probability Distribution Curve (Gaussian Distribution)

One of the most important examples of a continuous probability distribution is the normal distribution, also called the normal curve, "bell-shaped curve", or Gaussian distribution. It is defined by the equation above.
Normal Probability Distribution Curve (Gaussian Distribution)

■ 99.73% of the values lie within ±3 standard deviations (σ) of the mean.
■ The total area under the curve is equal to 100% (or 1.00).
■ Two parameters, µ and σ. Note that the normal distribution is actually a family of distributions, since µ and σ determine the shape of the distribution.
The Normal Distribution
■ "Bell shaped"
■ Symmetrical
■ Mean, median and mode are equal
■ Interquartile range equals 1.33σ
■ Random variable has infinite range
(Figure: density f(X) against x, with mean = median = mode at the centre.)
The Normal Distribution

The Mathematical Model
Many Normal Distributions
There are an infinite number of normal distributions: by varying the parameters σ and μ, we obtain different normal distributions.


Standard Deviation and the Normal Distribution
The standard deviation defines the shape of the normal distribution (particularly its width).
■ A larger std. dev. means more scatter about the mean and worse precision.
■ A smaller std. dev. means less scatter about the mean and better precision.
Equations
Finding Probabilities
Probability is the area under the curve!
(Figure: shaded area under f(X) between c and d.)
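In practice this area is obtained from the cumulative distribution function rather than a table. A minimal sketch (parameter values are illustrative, not from the slides):

from scipy.stats import norm

mu, sigma = 50.0, 5.0   # hypothetical mean and SD
c, d = 45.0, 55.0       # interval of interest

# P(c <= X <= d) is the area under the density between c and d
p = norm.cdf(d, loc=mu, scale=sigma) - norm.cdf(c, loc=mu, scale=sigma)
print(p)   # ~0.683, the familiar one-sigma area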
Which Table to Use?
An infinite number of normal distributions would seem to mean an infinite number of tables to look up!
The Standard Normal Distribution (U)

Transformation by a linear transformation function
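The transformation itself is shown as an image on the slide; in the deck's notation (U for the standard normal variable) it is the usual standardization, written in LaTeX as:

$$U = \frac{X - \mu}{\sigma}, \qquad U \sim N(0,\,1)$$

so one table of areas for U serves every normal distribution.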
The Standard Normal Distribution (U)
Percent of items included between certain values of the std. deviation.
Standardizing Example
(Figures: a normal distribution and the corresponding standardized normal distribution; shaded areas exaggerated.)

Example:
(Further standardization examples shown as figures; shaded areas exaggerated.)
Example:
1. The chest girths of a large sample of men were measured, and the mean and standard deviation of the measurements were found to be mean = 96 cm, standard deviation = 8 cm. It is required to estimate the proportion of men in the population with chest girths
i) greater than 104 cm
ii) less than 100 cm
iii) less than 90 cm
Sol. Since the sample is large, we can assume that the sample mean and standard deviation are good estimates of the corresponding population parameters, i.e., μ = 96 cm, σ = 8 cm.
(Figures: shaded areas under the N(96, 8²) curve at 104, 100 and 90 cm for parts (i)–(iii).)
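The shaded-area solutions are shown as images on the slides. A sketch of the same three calculations (assuming SciPy; the probabilities follow from the standard normal table):

from scipy.stats import norm

mu, sigma = 96.0, 8.0   # estimated population parameters

# (i)   P(X > 104): U = (104 - 96)/8 = 1.0
p_i = 1 - norm.cdf((104 - mu) / sigma)   # ~0.1587

# (ii)  P(X < 100): U = (100 - 96)/8 = 0.5
p_ii = norm.cdf((100 - mu) / sigma)      # ~0.6915

# (iii) P(X < 90): U = (90 - 96)/8 = -0.75
p_iii = norm.cdf((90 - mu) / sigma)      # ~0.2266

print(p_i, p_ii, p_iii)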
Descriptive Statistics
2. The diameter of a metal shaft in a direct drive has mean 0.2508 inch and SD 0.0005 inch. The specification on the shaft has been established as 0.2500 ± 0.0015 inch. Determine what fraction of shafts produced conform to specifications.
Sol. 0.2500 ± 0.0015 inch gives limits of 0.2485 and 0.2515 inch.
Pr (x ≥ 0.2515) = Pr (U ≥ (0.2515 – 0.2508)/0.0005) = Pr (U ≥ 1.4)
⇒ α = 0.0808 = 8.08%
Similarly, Pr (x ≤ 0.2485) = Pr (U ≤ (0.2485 – 0.2508)/0.0005) = Pr (U ≤ –4.6)
⇒ α = 0 = 0%
Total conforming = 100 – (8.08 + 0) = 91.92%
Therefore, a 0.9192 fraction of the shafts produced conform to specifications.
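The same answer can be verified numerically; a sketch assuming SciPy:

from scipy.stats import norm

mu, sigma = 0.2508, 0.0005
lsl, usl = 0.2485, 0.2515   # specification limits, 0.2500 +/- 0.0015 inch

p_above = 1 - norm.cdf((usl - mu) / sigma)   # P(U >= 1.4)  ~ 0.0808
p_below = norm.cdf((lsl - mu) / sigma)       # P(U <= -4.6) ~ 0
print(1 - p_above - p_below)                 # ~0.9192 of shafts conform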
Inductive Statistics
For large values of n, the distribution of the sample mean becomes normal.
(Figure: sampling distribution centred at µ.)
Equations

Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of the sample mean approximates the normal distribution as the sample size becomes larger, regardless of the shape of the population distribution (assuming the samples are drawn in the same way).

Comparison between the Normal Theorem and the Central Limit Theorem

No. | Normal Theorem (NT) | Central Limit Theorem (CLT)
1 | Population mean = μ, population SD = σ | Population mean = μ, population SD = σ
2 | Shape of the population histogram is known to be a N(μ, σ²) curve | Shape of the population histogram is either unknown or not normal
3 | Sample average X̄ has N(μx, σx²) for any n, with μx = μ and σx = σ/√n | Sample average X̄ has N(μx, σx²) approximately, only for large n, with μx = μ and σx = σ/√n
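A small simulation makes the CLT concrete: sample means drawn from a decidedly non-normal (exponential) population still behave like N(µ, σ²/n). This is an illustrative sketch, not from the slides:

import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000

# Exponential population with mean 1 (and SD 1) -- far from normal
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# CLT prediction: mean of X-bar ~ 1.0, SD of X-bar ~ 1/sqrt(50) ~ 0.141
print(means.mean(), means.std())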
Central Limit Theorem
Z Formula for Sample Means
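The formula on this slide is an image; from the surrounding slides it is the standard result for the sampling distribution of the mean:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$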
Z Values for Some of the More Common Levels of Confidence
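The table on this slide is an image; the standard two-sided values it refers to are (stating the usual figures, since the slide itself is not reproduced):

$$z_{0.05} = 1.645 \;(90\%), \qquad z_{0.025} = 1.96 \;(95\%), \qquad z_{0.005} = 2.576 \;(99\%)$$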
Example
Graphic Solution to Example
(Figure: the X curve and the standardized Z curve, each showing equal areas of .0793.)
Statistical Estimation
• Point estimate -- the single value of a statistic calculated from a sample.
• Interval estimate -- a range of values calculated from a sample statistic(s) and a standardized statistic, such as Z.
– Selection of the standardized statistic is determined by the sampling distribution.
– Selection of critical values of the standardized statistic is determined by the desired level of confidence.
Confidence Interval to Estimate μ when σ is Known

Confidence Interval to Estimate μ when n is Large
• Point estimate
• Interval estimate
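The two estimates on this slide are images; assuming the standard large-sample forms, they are:

$$\text{point estimate: } \bar{x}, \qquad \text{interval estimate: } \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

with the sample SD s replacing σ when n is large and σ is unknown.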
Distribution of Sample Means for (1−α)% Confidence
(Figure: sampling distribution of X̄ centred at μ, with the middle 1−α area marked on both the X̄ and Z axes.)
Probability Interpretation of the Level of Confidence

Distribution of Sample Means for 95% Confidence
(Figure: 95% of the area lies between Z = −1.96 and Z = +1.96, with .4750 on each side of μ and .025 in each tail.)
95% Confidence Interval for μ

95% Confidence Intervals for μ
(Figure: repeated interval estimates around μ; about 95% of such intervals contain μ.)
Is our interval, 143.22 ≤ μ ≤ 162.78, in the red?
Example:
Confidence limits
• A confidence interval is a range of values around the sample mean that is likely to contain the actual parameter; the confidence level is the probability that intervals constructed this way capture it.
• It gives the degree of certainty or uncertainty in a sampling method.
• The most commonly used levels are 90%, 95% and 99%.
• A 99% interval has a greater probability of containing the true value than a 90% interval.
• Ideally, the confidence limits lie within the specification limits.
• The interval is symmetric about the mean (e.g. for a 5% significance level, 2.5% lies on each side of the mean).
(Figure: nested 90%, 95% and 99% limits on either side of the mean.)
Continued…
• Example: assume a yarn of nominal count 32 is tested. From the test results we might say that we are 95% sure that the count lies between, say, 30 and 34, and 99% sure that it lies between, say, 27 and 37.
• A 100% confidence interval would mean no data can exist outside the interval.
• When there is less deviation in the data, the range is narrower; for more deviating data, the interval is wider.
(Figure: the 90%, 95% and 99% limits move in or out as the spread of the data changes.)


Continued…
■ E.g. there is a normal distribution curve of certain data, with mean μ. After testing the sample, we get sample mean A as shown in the figure. This point lies between the 95% and 99% confidence limits.
■ We can conclude that at the 95% confidence limit, point A is rejected. But we cannot be sure that at the 99% limit point A is to be rejected; it depends on the rejection criteria.
■ Selection or rejection also depends upon perspective: e.g. if we are talking about strength, a higher strength is not objectionable.
(Figure: point A lying between the 95% and 99% limits.)
Specification limit
• The specification limit is the range of product specification provided by the customer. If a product characteristic (e.g. mean count, mean strength) is higher or lower than the specification limits, the product is not acceptable.
• Example: if a customer demands a yarn of count 20 with ±5% specification limits, then yarn with count more than 21 or less than 19 will not be accepted.
• The specification limit is not affected by the curve itself. It is a fixed value throughout the demand and supply process.
• Wider specification limits mean the customer permits more variation in the mean, whereas narrower specification limits mean more accurate and precise data are needed.
(Figure: lower and upper specification limits at μ − error and μ + error.)
Continued…
• It doesn't change with changes in the test data or the statistical curve.
• To reduce the error %, either the sample size should be high or the deviation should be less.
• The narrower the curve, the less the error; the wider the curve, the higher the error.
• It is not necessarily symmetric.
(Figure: the same specification limits with wide and narrow curves, showing more and less error.)
Confidence limit with respect to specification limit
■ The curves below show the relative positions of confidence limits and specification limits:
(A) Confidence limits within the specification limits (most desirable)
(B) Confidence limits outside the specification limits (objectionable)
(C) Lower confidence limit satisfying the specification limit but upper does not (objectionable)
(D) Upper confidence limit satisfying the specification limit but lower does not (objectionable)
Determining Sample Size when Estimating μ
• Z formula
• Error of estimation (tolerable error)
• Estimated sample size
• Estimated σ
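The equations on this slide are images; the standard forms they refer to are (E denotes the tolerable error of estimation):

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}$$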
Sample Size When Estimating μ: Example
Solution for Demonstration Problem
Sample size determination for estimating the population mean μ.
Inductive Statistics
(Derivation slides shown as images.)
Inductive Statistics
Q. A company manufactures rope whose mean breaking strength is 300 lbs, with population SD = 24 lbs. It is believed that by a newly developed process the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the new process has increased the mean breaking strength to 310 lbs? Assume the SD is still 24 lbs.

Sol. i. Here, µ = 300 lbs, σ = 24 lbs, n = 64.
Statistical Hypothesis Test
■ Based on the truth and the decision we make, we get this table, where
H0 = the null hypothesis is true (e.g. no change in linear density)
H1 = the alternate hypothesis is true (e.g. linear density has changed)

Decision based on sample   H0 true                                      H1 true
Accept H0                  Case A: no error (1−α)                       Case C: Type 2 error (β) (customer's risk)
Reject H0 (accept H1)      Case B: Type 1 error (α) (producer's risk)   Case D: no error (1−β)
Let's take an example
Type (A)
■ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn't changed; that means the null hypothesis (H0) is true. It's agreed to test 10 samples. The test results show mean = 31 (taking 5% significance).
z = 1.5, which lies inside the acceptance region (at 2.5% significance on each side, the critical value is Z = 1.96).
(Figure: z falls between −1.96 and +1.96, in the acceptance region.)
In actuality the null hypothesis was true, and we fail to reject the null hypothesis. That is the correct decision, made with probability "1 − α".
Let's take an example
Type (B)
■ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn't changed; that means the null hypothesis (H0) is true. It's agreed to test 10 samples. The test results show mean = 32 (taking 5% significance).
z = (32 − 30)/(3/√10) ≈ 2.1, which lies outside the acceptance region (at 2.5% significance on each side, the critical value is Z = 1.96).
(Figure: z = 2.1 falls beyond +1.96, in the rejection region of the curve centred at μ = 30.)
In actuality the null hypothesis was true, and we rejected it. That was a wrong decision. This is called a "Type 1" error (α risk). Although it was an error, this mean is also not desirable, because the sample mean is outside the confidence limit.
Let's take an example
Type (C)
■ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population mean has shifted to 34; that means the alternate hypothesis (H1) is true. It's agreed to test 10 samples. The test results show mean = 31 (taking 5% significance).
z = 1.5, which lies inside the acceptance region (at 2.5% significance on each side, the critical value is Z = 1.96).
(Figure: the sample mean falls in the β region between the curves centred at μ0 = 30 and μ1 = 34.)
In actuality the alternate hypothesis was true, but we did not reject the null hypothesis. That is a "Type 2" error (the point lies in the β region); we call it the β risk. (This is not desirable.)
Let's take an example
Type (D)
■ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population mean has shifted to 34; that means the alternate hypothesis (H1) is true. It's agreed to test 10 samples. The test results show mean = 32 (taking 5% significance).
z = (32 − 30)/(3/√10) ≈ 2.1, which lies outside the acceptance region (at 2.5% significance on each side, the critical value is Z = 1.96).
(Figure: z = 2.1 falls in the rejection region of the curve centred at μ0 = 30, under the alternative centred at μ1 = 34.)
In actuality the alternate hypothesis was true, and we rejected the null hypothesis. That is the correct decision, made with probability "1 − β" (the power). (The shifted mean itself is still not desirable.)
Two types of error
• False positive rate: α (Type 1 error).
• False negative rate: β (Type 2 error); 1 − β is the "sensitivity" or "power".
Step 1: Decide on alpha and identify your decision rule (Zcrit)
(Figure: null distribution centred at µ0 = 50, with the rejection region beyond Zcrit = 1.64.)
Step 2: State your decision rule in units of the sample mean (Xcrit)
(Figure: the same null distribution, with the rejection region beyond Xcrit = 52.61, i.e. Zcrit = 1.64.)
Step 3: Identify µA, the suspected true population mean for your sample
(Figure: alternative distribution centred at µA = 55 alongside the null at µ0 = 50; acceptance region up to Xcrit = 52.61.)
Step 4: How likely is it that this alternative distribution would produce a mean in the rejection region?
(Figure: the area of the alternative distribution beyond Xcrit = 52.61 is the power; the area below it is beta. Xcrit corresponds to Z = −1.51 on the alternative distribution.)
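The four steps can be scripted directly. A sketch using the slides' numbers (assuming SciPy; the standard error ≈ 1.59 is an inferred value, implied by Zcrit = 1.64 and Xcrit = 52.61):

from scipy.stats import norm

mu0, muA = 50.0, 55.0
alpha = 0.05      # one-sided test
se = 1.59         # standard error implied by the slides

# Steps 1-2: decision rule, first in Z units, then in units of the sample mean
z_crit = norm.ppf(1 - alpha)    # ~1.64
x_crit = mu0 + z_crit * se      # ~52.61

# Steps 3-4: probability that the alternative distribution lands in the rejection region
power = 1 - norm.cdf((x_crit - muA) / se)   # ~0.93, so beta ~0.07
print(x_crit, power)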
Power & Error
(Figure: beta is the area of the alternative distribution below Xcrit; alpha is the area of the null distribution above it.)
Power is a function of
● The chosen alpha level (α)
● The true difference between μ0 and μA
● The size of the sample (n), via the standard error
● The standard deviation (s or σ)
Changing alpha
(Animation frames: as alpha changes, Xcrit shifts between µ0 and µA, trading beta against alpha.)
• Raising alpha gives you less Type II error (more power) but more Type I error. A trade-off.
Changing distance between μ0 and μA
(Animation frames: as μA moves away from μ0, beta shrinks while alpha is unchanged.)
• Increasing the distance between μ0 and μA lowers Type II error (improves power) without changing Type I error.
Changing standard error
(Animation frames: as the standard error shrinks, both curves narrow around their means and both error areas shrink.)
• Decreasing the standard error simultaneously reduces both kinds of error and improves power.
To increase power
● Try to make μ really different from the null-hypothesis value (if possible)
● Loosen your alpha criterion (from .05 to .10, for example)
● Reduce the standard error (increase the size of the sample, or reduce variability)

For a given level of alpha and a given sample size, power is directly related to effect size.
1. Power increases as effect size increases
(Figure: curves A and B; beta = likelihood of Type 2 error.)
2. Power increases as alpha increases
(Figure: curves A and B; beta = likelihood of Type 2 error.)
3. Power increases as sample size increases
(Figures: with low n, curves A and B overlap heavily; with high n, they separate.)
(Figure: power as jointly determined by alpha, effect size and sample size.)
Contd.
A company manufactures rope whose mean breaking strength is 300 lbs, with population SD = 24 lbs. It is believed that by a newly developed process the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the new process has increased the mean breaking strength to 310 lbs? Assume the SD is still 24 lbs.

Sol. i. Here, µ = 300 lbs, σ = 24 lbs, n = 64.
Inductive Statistics
(Figure: standard normal curve with the one-sided critical value Z = 2.3263 at the 1% significance level.)
Decision rule: reject the old process if the sample mean exceeds 300 + 2.3263 × (24/√64) ≈ 307 lbs.
Inductive Statistics

Truth about population                   Decision: continue process   Decision: adjust process
H0 true; process mean hasn't shifted     Yes (correct)                α-error
H1 true; process mean has shifted        β-error                      Yes (correct)

■ For the process change we have to draw the other curve (centred on the shifted mean).
■ The β-error is more damaging to the company, as it will increase complaints.
ii. Zβ = (307 – 310)/(24/√64) = –1
⇒ β = 0.1587 = 15.87%
The probability of falsely accepting the old process is 0.1587.
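A sketch verifying both parts (assuming SciPy):

from scipy.stats import norm

mu0, sigma, n = 300.0, 24.0, 64
se = sigma / n ** 0.5           # = 3 lbs

# (i) one-sided decision rule at the 1% significance level
x_crit = mu0 + norm.ppf(0.99) * se        # ~306.98, i.e. reject the old process if x-bar > 307 lbs

# (ii) beta risk if the true mean is now 310 lbs
beta = norm.cdf((x_crit - 310.0) / se)    # ~0.157 (the slide rounds x_crit to 307, giving 0.1587)
print(x_crit, beta)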
Choice of Sample Size
Suppose that the null hypothesis H0: μ = μ0 is false (H1: μ ≠ μ0) and the true mean is µ = µ0 + δ, where δ > 0. Then

β = Φ(Zα/2 − δ√n/σ) − Φ(−Zα/2 − δ√n/σ)

■ Taking the right side only (the dominant term for δ > 0), β ≈ Φ(Zα/2 − δ√n/σ), which gives, more appropriately,

n ≈ (Zα/2 + Zβ)² σ² / δ²
Inductive Statistics
Q. To detect a departure of 1 tex from a 40 tex yarn count, given SD = 2, how many samples are to be tested? α = 0.05 and β = 0.1.
Sol. Here, δ = 1 tex, x̄ = 40 tex, σ = 2 tex, α = 0.05 and β = 0.1.

■ As n = (Zα/2 + Zβ)²σ²/δ²
= [(1.96 + 1.28)² × 2²] / 1²
= 41.99 ≈ 42 tests
Inductive Statistics
Q. Mean µ = 12 gf/tex, σ = 1.5 gf/tex, n = 25.
H0: µ = 12
H1: µ < 12
i. What is the critical region if α = 0.01?
ii. Find the β-error if the mean strength has become 11.25 gf/tex.
Sol. α = 0.01
⇒ Zα = –2.3263
Critical region: x̄ < 12 – 2.3263 × (1.5/√25) = 11.302
Zβ = (11.302 – 11.25)/(1.5/√25) = 0.1733
Therefore, β = 0.4364
⇒ β-error = 43.64%
Sample Size Requirements
Sample size for a one-sample z test:
where
1 – β ≡ desired power
α ≡ desired significance level (two-sided)
σ ≡ population standard deviation
Δ = μ0 – μa ≡ the difference worth detecting
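Written out, the formula this slide refers to (a standard form consistent with the definitions above) is:

$$n = \left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\Delta}\right)^{2}$$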
Student's t probability distribution

Inductive Statistics
(Figure: t distributions for v = 4 and v = 20, centred at µ.)
Inductive Statistics
(Figure: critical region of the t distribution.)

Comparing a measured result with a "known" value
• The "known" value would typically be a certified value from a standard reference material (SRM).
• Another application of the t statistic.

We compare tcalc to the tabulated value of t at the appropriate df and CL.
df = (n – 1) for this test, based on the concept of the random variable.

Note: for large values of v or N (certainly N > 30), the t curves closely approximate the standardized normal curve.

Difference between the normal distribution and the t distribution


Standard Deviation
• The standard deviation, s, is simply the square root of the variance.
• The standard deviation s is the sample standard deviation, and is used to estimate the actual population standard deviation, σ.
Contd.
Point Estimation
Sample data are used to estimate the parameters of a population.
Statistics are calculated using sample data; parameters are the characteristics of population data.

The sample mean x̄ estimates the population mean µ; the sample SD s estimates the population SD σ.
Standard Deviation
• What if we don't want to assume that the population SD σ is known?
• If σ is unknown, we can't use our formula σ/√n for the standard deviation of the sample mean.
• Instead, we use the standard error of the sample mean: s/√n.
• The standard error uses the sample SD s as an estimate of σ.
Confidence intervals

Example of calculating a confidence interval
Consider measurement of fibre denier:
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48
DF = n – 1 = 7 – 1 = 6
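The rest of the computation is shown as an image; a sketch that completes it (assuming SciPy; the resulting limits come from this calculation, not the slides):

import numpy as np
from scipy.stats import t

data = np.array([1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48])
n = len(data)
mean, s = data.mean(), data.std(ddof=1)   # sample mean and sample SD

# 95% confidence interval using t with df = n - 1 = 6
half_width = t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
print(mean - half_width, mean + half_width)   # ~1.18 to ~1.51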
Estimating the Mean of a Normal Population: Small n and Unknown σ
• The population has a normal distribution.
• The value of the population standard deviation is unknown.
• The sample size is small, n < 30.
• The Z distribution is not appropriate under these conditions; the t distribution is.
The t Distribution
• Developed by the British statistician William Gosset
• A family of distributions -- a unique distribution for each value of its parameter, degrees of freedom (d.f.)
• Symmetric, unimodal, mean = 0, flatter than Z
• t formula:
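The t formula itself is an image; in the notation used elsewhere in the deck it is:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{d.f.} = n - 1$$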
Comparison of Selected t Distributions to the Standard Normal
(Figure: the standard normal with t at d.f. = 25, 5 and 1, over the range −3 to 3.)
Table of Critical Values of t

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576

With df = 24 and α = 0.05, tα = 1.711 (upper-tail value).
Confidence Intervals for μ of a Normal Population: Small n and Unknown σ
Example
Solution for Demonstration Problem

Number of samples to be tested?


Comparing means (continuous data)
• Comparing BETWEEN groups:
– 2 groups → independent t-test
– 3+ groups → one-way ANOVA
• Comparing measurements WITHIN the same subject:
– 2 measurements → paired t-test
– 3+ measurements → repeated-measures ANOVA
ANOVA = Analysis of variance
Same mean but different standard deviation
                           R/F-1        R/F-2
Sample size                35           40
Mean strength              60 units     56 units
Std. deviation of strength 1.25 units   1.50 units
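With two samples this large, the means can be compared with a two-sample z statistic; the sketch below is one reasonable treatment (the slides' own working is in images, so the choice of test here is an assumption):

from scipy.stats import norm

n1, m1, s1 = 35, 60.0, 1.25   # R/F-1
n2, m2, s2 = 40, 56.0, 1.50   # R/F-2

# Standard error of the difference of means, then z and a two-sided p-value
se = (s1**2 / n1 + s2**2 / n2) ** 0.5
z = (m1 - m2) / se
p = 2 * (1 - norm.cdf(abs(z)))
print(z, p)   # z ~ 12.6: the difference in mean strength is highly significant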
Estimating the Difference of Two Population Means
Inductive Statistics

Comparing replicate measurements or comparing means of two sets of data
• Yet another application of the t statistic.
• Example: given the same sample analyzed by two different methods, do the two methods give the "same" result?

We compare tcalc to the tabulated value of t at the appropriate df and CL.
df = n1 + n2 – 2 for this test.
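The formula is shown as an image; the usual pooled-variance version it describes is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \qquad s_p^{2} = \frac{(n_1-1)s_1^{2} + (n_2-1)s_2^{2}}{n_1+n_2-2}$$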
Inductive Statistics
Case 2: σ1² ≠ σ2²

Flowchart for comparing means of two sets of data or replicate measurements:
Use the F-test to see whether the std. devs. of the two sets of data are significantly different or not.
– Std. devs. are significantly different → use the 2nd version of the t-test (the beastly version).
– Std. devs. are not significantly different → use the 1st version of the t-test.
Inductive Statistics
Two types of cotton are tested for shedding per 2500 g of yarn and the following results are obtained. Is there a significant difference between the two cottons in terms of shedding %?

Paired t test
t-statistics with matched pairs
To compare the effect of finish on the air permeability of various fabrics using the matched-pairs t-test:

Fabric                                             A     B     C     D
Permeability (cu.cm/s/sq.cm) without finish (X1)   915   671   457   366
Permeability (cu.cm/s/sq.cm) after finish (X2)     600   407   213   92
XD = X1 − X2                                       315   264   244   274

As the confidence limits do not include zero, the finish has made a significant difference to the air permeability of the fabrics.
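A sketch reproducing the paired analysis (assuming SciPy; the slide's own limits are in an image, so these numbers come from this calculation):

import numpy as np
from scipy.stats import t

x1 = np.array([915, 671, 457, 366])   # permeability without finish
x2 = np.array([600, 407, 213, 92])    # permeability after finish
d = x1 - x2                           # differences: 315, 264, 244, 274

mean_d, s_d, n = d.mean(), d.std(ddof=1), len(d)
half = t.ppf(0.975, df=n - 1) * s_d / np.sqrt(n)
print(mean_d - half, mean_d + half)   # ~227 to ~322: the interval excludes zero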
Inductive Statistics

Plant        A    B    C    D     E    F    G    H    I    J    Average
Before       45   73   46   124   33   57   83   34   26   17   53.8
After        36   60   44   119   35   51   77   29   24   11   48.9
Difference   9    13   2    5     -2   6    6    5    2    6    5.2

Inductive Statistics
t (table at 95% significance) = 1.83
As t (calculated) = 4.03 is greater than t (table), the improvement in the process is significant.
(Figure: t distribution with 1.83 and 4.03 marked.)
Population Variance
• Variance is an inverse measure of the group's homogeneity.
• Variance is an important indicator of total quality in standardized products and services.
Study of Variance
Single variance => χ²-test
Two variances => F-test
More than two variances => ANOVA
• A chi-squared test, also written as χ² test, is a statistical hypothesis test that is valid to perform when the test statistic is chi-squared distributed under the null hypothesis; specifically Pearson's chi-squared test and variants thereof.
• Pearson's chi-squared test is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table.
Chi-square (χ²) probability distribution
• If we consider samples of size N drawn from a normal population with standard deviation σ, and if for each sample we compute χ² = (N − 1)s²/σ², then a sampling distribution for χ² can be obtained. This distribution, called the chi-square distribution, is given by

Y = Y₀ · (χ²)^((v−2)/2) · e^(−χ²/2)

where v = N − 1 is the number of degrees of freedom, and Y₀ is a constant depending on v such that the total area under the curve is 1.
• Note that degrees of freedom can be defined as the number of independent random variables used in defining the new χ² random variable.
Chi Square test
• For a single variance study we use the chi-square test (the only case studied in this section).
• For two variances we use the F-test.
• For multiple variances we go for ANOVA.
Selected χ² Distributions
(Figure: χ² densities for df = 3, 5 and 10.)
Estimating the Population Variance
• Population parameter: σ²
• Estimator of σ²: the sample variance s²
• χ² formula for a single variance:
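The formulas on this slide are images; the standard forms are:

$$\chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}}, \qquad \frac{(n-1)s^{2}}{\chi^{2}_{\alpha/2}} \le \sigma^{2} \le \frac{(n-1)s^{2}}{\chi^{2}_{1-\alpha/2}}$$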


Study of Variance
• In the standard applications of this test, the observations are classified into mutually exclusive classes.
• If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ² frequency distribution.
• The purpose of the test is to evaluate how likely the observed frequencies would be, assuming the null hypothesis is true.
Study of Variance
■ The curve always starts at zero.
■ The area under the curve is one.
(Figure: χ² density function Y for v = 2, 4 and 6.)
Chi-square (χ²)
• Used for small samples or a small sampling distribution.
• The quantity χ² describes the magnitude of the discrepancy between theoretical and observed values.
• Let X1, X2, …, Xn be a random sample from a normal distribution with parameters µ and σ²; then

χ² = Σ(Xᵢ − X̄)²/σ² = (n − 1)s²/σ², with n − 1 degrees of freedom (df).
STUDY OF VARIANCE

• The F-test is a statistical test which helps us find whether two populations whose data points follow a normal distribution have the same standard deviation or variance.
• The first and foremost requirement for performing an F-test is that the data sets should have a normal distribution.
• This is applied to the F distribution under the null hypothesis.
• The F-test is a very crucial part of the Analysis of Variance (ANOVA) and is calculated as the ratio of the two sample variances.
Confidence Interval for σ
Inference about single variance


Chi squared distribution
• The p-value is calculated using the chi-squared distribution for this test.
• Chi-squared is a skewed distribution which varies depending on the degrees of freedom.

Testing relationships between two categorical variables:
v = degrees of freedom = (no. of rows – 1) × (no. of columns – 1)
χ² Table

df    0.975         0.950         0.100      0.050      0.025
1     9.82068E-04   3.93219E-03   2.70554    3.84146    5.02390
2     0.0506357     0.102586      4.60518    5.99148    7.37778
3     0.2157949     0.351846      6.25139    7.81472    9.34840
4     0.484419      0.710724      7.77943    9.48773    11.14326
5     0.831209      1.145477      9.23635    11.07048   12.83249
6     1.237342      1.63538       10.6446    12.5916    14.4494
7     1.689864      2.16735       12.0170    14.0671    16.0128
8     2.179725      2.73263       13.3616    15.5073    17.5345
9     2.700389      3.32512       14.6837    16.9190    19.0228
10    3.24696       3.94030       15.9872    18.3070    20.4832
20    9.59077       10.8508       28.4120    31.4104    34.1696
21    10.28291      11.5913       29.6151    32.6706    35.4789
22    10.9823       12.3380       30.8133    33.9245    36.7807
23    11.6885       13.0905       32.0069    35.1725    38.0756
24    12.4011       13.8484       33.1962    36.4150    39.3641
25    13.1197       14.6114       34.3816    37.6525    40.6465
70    48.7575       51.7393       85.5270    90.5313    95.0231
80    57.1532       60.3915       96.5782    101.8795   106.6285
90    65.6466       69.1260       107.5650   113.1452   118.1359
100   74.2219       77.9294       118.4980   124.3421   129.5613

With df = 5 and α = 0.10, χ² = 9.23635 (figure shows the 0.10 upper tail of the df = 5 curve).
Two Table Values of χ²
(Figure: χ² density for df = 7 with the lower .05 area below 2.16735 and the upper .05 area above 14.0671.)

df    0.950         0.050
1     3.93219E-03   3.84146
2     0.102586      5.99148
3     0.351846      7.81472
4     0.710724      9.48773
5     1.145477      11.07048
6     1.63538       12.5916
7     2.16735       14.0671
8     2.73263       15.5073
9     3.32512       16.9190
10    3.94030       18.3070
20    10.8508       31.4104
21    11.5913       32.6706
22    12.3380       33.9245
23    13.0905       35.1725
24    13.8484       36.4150
25    14.6114       37.6525

For df = 7: χ²(0.950) = 2.16735 and χ²(0.050) = 14.0671.
90% Confidence Interval for σ²
Solution for Demonstration Problem

Q. A machine is producing yarn; 5 samples are taken. The % moisture content of each was found, with the following results:
7.2  7.3  7.6  7.5  7.1
Calculate 95% confidence limits for the variance in moisture of these samples.
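A sketch of the solution (assuming SciPy; note the χ² table above gives 0.484419 and 11.14326 at df = 4):

import numpy as np
from scipy.stats import chi2

x = np.array([7.2, 7.3, 7.6, 7.5, 7.1])   # % moisture content
n, s2 = len(x), x.var(ddof=1)             # s^2 ~ 0.043

lower = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)   # ~0.0154
upper = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)   # ~0.355
print(lower, upper)   # 95% confidence limits for the variance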
F-test to compare standard deviations
• Used to determine whether std. deviations are significantly different before applying the t-test to compare replicate measurements or compare the means of two sets of data.
• Also used as a simple general test to compare the precision (as measured by the std. deviation) of two sets of data.
• Uses the F distribution.
'F' probability distribution
• Consider two samples, 1 and 2, of sizes N1 and N2, respectively, drawn from two normal (or nearly normal) populations having variances σ1² and σ2².
• The statistic is then defined as

F = (S1²/σ1²) / (S2²/σ2²), with v1 = N1 − 1 and v2 = N2 − 1 degrees of freedom.

• The PDF is an expression (shown as an image on the slide) in which C is a constant depending on v1 and v2 such that the total area under the curve is 1.

Fisher's F distribution
(Figure: F density with the critical value at k1 and k2, where k1 and k2 are the degrees of freedom of the two variances.)
F-test to compare standard deviations
We compute Fcalc and compare it to Ftable.
DF = n1 – 1 and n2 – 1 for this test.
Choose a confidence level (95% is a typical CL).
(Figure: F distribution with the critical region shaded.)
Study of Variance
(Figure: F density function Y with lower and upper critical values F(k1, k2, 1−α/2) and F(k1, k2, α/2).)

Study of Variance
k1 and k2 are the degrees of freedom of the 1st and 2nd sets.
F(k2, k1, α/2) = 1/F(k1, k2, 1−α/2)
1/F(k2, k1, α/2) ≤ F = (S1²/σ1²)/(S2²/σ2²) ≤ F(k1, k2, α/2)
⇒ S1²/(S2² F(k1, k2, α/2)) ≤ σ1²/σ2² ≤ (S1² F(k2, k1, α/2))/S2²
Practice Problems
1. A retailer buys garments from two different places. From the first industry 20 samples were taken, with mass variance 25; from the second industry 25 samples were taken, with mass variance 14.1. Is there a significant difference in variability between the two sources?
Study of Variance
Sol. Here, S1² = 25, S2² = 14.1, n1 = 20 and n2 = 25.
Therefore, F19,24,0.025 = 2.06 and F24,19,0.025 = 2.114.
S1²/(S2² F19,24,0.025) ≤ σ1²/σ2² ≤ (S1² F24,19,0.025)/S2²
⇒ 0.86 ≤ σ1²/σ2² ≤ 3.72
As the interval contains 1, sometimes σ1 is larger and other times σ2 is larger. Therefore, there is no significant difference between the samples from the two sources.
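A sketch reproducing these limits (assuming SciPy; note that the quoted critical values 2.06 and 2.114 match upper 5% points of F, so the interval shown is effectively a 90% interval):

from scipy.stats import f

s1_sq, s2_sq = 25.0, 14.1
k1, k2 = 20 - 1, 25 - 1    # degrees of freedom: 19 and 24

ratio = s1_sq / s2_sq
lower = ratio / f.ppf(0.95, k1, k2)   # F(19, 24) upper 5% point ~2.04
upper = ratio * f.ppf(0.95, k2, k1)   # F(24, 19) upper 5% point ~2.11
print(lower, upper)   # ~0.87 and ~3.75, matching the slide's 0.86 and 3.72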
Summary of Statistics

Summary of Confidence Interval Procedure

Test of Variance
