C - Normal Distribution
Normal Distribution
Dr Arunangshu Mukhopadhyay
Professor
Dr B R Ambedkar NIT Jalandhar
Parameters of the Normal Distribution
Mean: The mean is the central tendency of the distribution. It defines the
location of the peak for normal distributions. Most values cluster around the
mean. On a graph, changing the mean shifts the entire curve left or right on
the X-axis.
Non-parametric tests
Descriptive/Inductive Statistics
The bell-shaped curve represents the probability density curve:
■ y = (1/(σ√(2π))) · e^(−(x − µ)²/(2σ²))
Where, µ = population mean
σ = population SD
x = actual value
Why hump?
Normal Probability Distribution Curve
(Gaussian Distribution)
■ 99.73% of the values lie within ±3 standard deviations (σ) of the mean.
■ The total area under the curve is equal to 100% (or 1.00).
■ Two parameters, µ and σ. Note that the normal distribution is actually a
family of distributions, since µ and σ determine the shape of the
distribution.
The Normal Distribution
■ “Bell shaped”
■ Symmetrical
■ Mean, median and mode are equal
■ Interquartile range equals 1.33 σ
■ Random variable has infinite range
[Figure: f(X) vs X, with mean = median = mode at the centre]
The Mathematical Model
Many Normal Distributions
There are an infinite number of normal distributions
Equations
Finding Probabilities
Probability is the area under the curve between two values c and d.
[Figure: shaded area under f(X) between X = c and X = d]
Which Table to Use?
Transformation
By the linear transformation function:
■ U = (X − µ)/σ
The Standard Normal Distribution (U)
Example (shaded area exaggerated in the figure):
Examples:
1. The chest girths of a large sample of men were measured and the mean
and standard deviation of the measurements were found to be
Mean = 96 cm, Standard deviation = 8 cm
It is required to estimate the proportion of men in the population with chest
girths
i) Greater than 104 cm
ii) Less than 100 cm
iii) Less than 90 cm
Sol. Since the sample is large, we can assume that the mean and standard
deviation of the sample are good estimates of the corresponding parameters
in the population, i.e., μ = 96 cm, σ = 8 cm.
[Figures: normal curves with μ = 96, σ = 8, shaded (i) beyond 104, (ii) below 100, and (iii) below 90]
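The three proportions can be checked numerically with the standard normal CDF; a minimal sketch in Python (standard library only), using the μ = 96, σ = 8 values from the example:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 96, 8  # chest girth parameters from the example

p_gt_104 = 1 - phi((104 - mu) / sigma)  # (i)   P(X > 104) ~ 0.1587
p_lt_100 = phi((100 - mu) / sigma)      # (ii)  P(X < 100) ~ 0.6915
p_lt_90 = phi((90 - mu) / sigma)        # (iii) P(X < 90)  ~ 0.2266
```

Each probability comes from standardising the cutoff to U = (x − µ)/σ and reading the area under the standard normal curve.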
Descriptive Statistics
2. The diameter of a metal shaft in a direct drive has mean 0.2508 inch
and SD 0.0005 inch. The specification on the shaft has been established as
0.2500 ± 0.0015 inch. Determine what fraction of shafts produced conform to
specifications.
Sol. 0.2500 ± 0.0015 inch = 0.2515, 0.2485 inch
Pr (x ≥ 0.2515) = Pr (U ≥ (0.2515 – 0.2508)/0.0005) = Pr (U ≥ 1.4)
⇒ α = 0.0808 = 8.08%
Similarly, Pr (x ≤ 0.2485) = Pr (U ≤ (0.2485 – 0.2508)/0.0005)
⇒ α = Pr (U ≤ – 4.6) ≈ 0 = 0%
Total conforming = 100 – (8.08 + 0) = 91.92%
Therefore, a 0.9192 fraction of the shafts produced conform to specifications.
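The same shaft calculation as a short sketch, reusing the standard normal CDF:

```python
from math import erf, sqrt

def phi(u):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(u / sqrt(2)))

mu, sigma = 0.2508, 0.0005
usl, lsl = 0.2515, 0.2485             # 0.2500 +/- 0.0015 inch

p_high = 1 - phi((usl - mu) / sigma)  # P(x >= USL), U = 1.4  -> ~0.0808
p_low = phi((lsl - mu) / sigma)       # P(x <= LSL), U = -4.6 -> ~0
conforming = 1 - (p_high + p_low)     # ~0.9192
```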
Inductive Statistics
Equations
Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of the sample
mean approaches the normal distribution as the sample size becomes larger,
no matter what the shape of the population distribution, provided the
samples are drawn under similar conditions.

Comparison between the Normal Theorem and the Central Limit Theorem:

No. | Normal Theorem (NT)                                  | Central Limit Theorem (CLT)
1   | Population mean = μ, population SD = σ               | Population mean = μ, population SD = σ
2   | Shape of the population histogram is known to be a N(μ, σ²) curve | Shape of the population histogram is either unknown or not normal
3   | Sample average X̄ has N(μx, σx²) for any n            | Sample average X̄ has N(μx, σx²) approximately, only for large n
    | μx = μ, and σx = σ/√n                                | μx = μ, and σx = σ/√n
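The CLT can be demonstrated by simulation; a minimal sketch drawing sample means from a clearly non-normal (exponential) population and checking them against the μx = μ, σx = σ/√n prediction:

```python
import random
from math import sqrt

random.seed(42)

# Population: exponential with mean 1 (heavily skewed, clearly non-normal).
n, trials = 50, 20000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(trials)]

grand_mean = sum(means) / trials
sd_of_means = sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)
# CLT prediction: mean of X-bar ~ mu = 1, SD of X-bar ~ sigma/sqrt(n) = 1/sqrt(50)
```

A histogram of `means` would look close to a normal curve even though the population itself is strongly skewed.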
Z Formula for Sample Means
Z = (X̄ − μ)/(σ/√n)
Z Values for Some of the More Common Levels of Confidence
Example
Graphic Solution to Example
[Figure: equal shaded areas of .0793 under the X scale and the Z scale]
Statistical Estimation
• Point estimate -- the single value of a statistic calculated from a sample
Confidence Interval to Estimate μ when n is Large
• Point estimate
• Interval estimate
Distribution of Sample Means for (1−α)% Confidence
[Figure: sampling distribution of X̄ centred at μ with central area 1−α; Z scale centred at 0]
Probability Interpretation of the Level of Confidence
Distribution of Sample Means for 95% Confidence
[Figure: central area 95% (.4750 on each side of μ), .025 in each tail, Z from −1.96 to 1.96]
95% Confidence Interval for μ
95% Confidence Intervals for μ
[Figure: repeated sampling — most of the 95% intervals around the sample means capture μ, a few miss]
Is our interval, 143.22 ≤ μ ≤ 162.78, in the red?
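A sketch of the interval computation. The slides do not give the underlying sample, so the inputs below are assumed values chosen to reproduce the quoted interval 143.22 ≤ μ ≤ 162.78:

```python
from math import sqrt

# Assumed inputs (not given in the slides) chosen to reproduce the quoted interval.
xbar, sigma, n = 153.0, 46.0, 85
z = 1.96                           # two-sided 95% confidence

half = z * sigma / sqrt(n)         # margin of error
lo, hi = xbar - half, xbar + half  # -> (143.22, 162.78)
```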
Example:
Confidence limits
• A confidence interval is a range of values around the mean that, with a
stated probability, contains the actual parameter.
• It gives the degree of certainty or uncertainty in a sampling method.
• The most commonly used intervals are 90%, 95% and 99%.
• A 99% interval has a greater probability of containing the true value
than a 90% interval.
• The confidence limit should lie within the specification limit.
• It is symmetric about the mean (e.g., for a 5% significance level, 2.5%
lies on each side of the mean).
[Figure: 95% and 99% confidence limits about the mean]
Specification limit
• The specification limit is a range of product specification that is provided
by the customer. If the product specification (i.e., mean count, mean
strength, etc.) is higher or lower than the specification limit then the
product would not be acceptable.
• Example: if the customer demands a yarn of count 20 with a ±5%
specification limit, then yarn having count more than 21 or less than
19 will not be accepted.
• The specification limit is not affected by the curve itself. It is a fixed value
throughout the demand and supply process.
• A wider specification limit means more variation in the mean is permitted
by the customer, whereas a narrower specification limit means more
accurate and precise data are needed.
[Figure: specification limits at μ − error and μ + error]
Continued…
• It doesn’t change with a change in test data or the statistical curve.
• To reduce the error %, either the sample size should be high or the
deviation should be less.
• The narrower the curve, the lesser the error; the wider the curve, the
higher the error.
• It is not necessarily symmetric.
Confidence limit with respect to specification limit
■ The curves below show the respective positions of confidence limits and
specification limits:
(A) Confidence limits within the specification limit (most desirable)
(B) Confidence limits outside the specification limit (objectionable)
(C) Lower confidence limit satisfies the specification limit but the upper does not (objectionable)
(D) Upper confidence limit satisfies the specification limit but the lower does not (objectionable)
Determining Sample Size when Estimating μ
• Z formula
• Error of estimation (tolerable error)
• Estimated σ
Sample Size When Estimating μ: Example
Solution for Demonstration Problem
Sample size determination for estimating the population mean μ.
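A sketch of the sample size calculation n = (Z·σ/E)², with hypothetical inputs (the slide's own numbers are not shown in the text):

```python
from math import ceil

# Hypothetical inputs for illustration.
z = 1.96       # 95% confidence
sigma = 2.0    # estimated population SD
E = 0.5        # tolerable error of estimation

n = ceil((z * sigma / E) ** 2)  # round up so the error bound is guaranteed
```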
Inductive Statistics
Inductive Statistics
Q. A company manufactures rope whose mean breaking strength is 300 lbs
with population SD = 24 lbs. It is believed that, by a newly developed
process, the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance
level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the
new process has increased the mean breaking strength to 310 lbs?
Assume the SD is still 24 lbs.
Statistical Hypothesis Test
■ Based on the truth and the decision we make this table, where:
H0: the null hypothesis is true (e.g., no change in linear density)
H1: the alternate hypothesis is true (e.g., linear density has changed)
No error — H0 true, z = 1.5 falls inside the acceptance region (−1.96, 1.96).
In actuality the null hypothesis was true, and we fail to reject the null
hypothesis. That is a correct decision, made with probability 1 − α.
Let’s take an example — Type (B)
■ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn’t
changed; that means the null hypothesis (H0) is true. It is agreed to test 10 samples. The test results show
mean = 32 (taking 5% significance).
z = (32 − 30)/(3/√10) = 2.1 (at 2.5% significance on both sides, Z = 1.96)
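The z statistic from the example above, as a one-line computation:

```python
from math import sqrt

mu0, sigma = 30, 3    # H0: linear density 30 tex
n, xbar = 10, 32      # 10 samples, observed mean 32 tex

z = (xbar - mu0) / (sigma / sqrt(n))  # ~2.11
reject_h0 = abs(z) > 1.96             # two-sided 5% significance
```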
The possible cases:
■ H0 true, z = 2.1 falls in the rejection region → α error (Type I, false positive).
■ H1 true, z = 1.5 falls in the acceptance region → β error (Type II; β is the false negative rate).
■ H1 true, z = 2.1 falls in the rejection region → no error (correct rejection).
Step 1: Decide on alpha and identify your decision rule (Zcrit)
[Figure: null distribution centred at µ0 = 50 with the rejection region in the upper tail]
Step 2: State your decision rule in units of sample mean (Xcrit)
[Figure: null distribution, µ0 = 50, rejection region beyond Xcrit = 52.61]
Step 3: Identify µA, the suspected true population mean for your sample
[Figure: alternative distribution at µA = 55; acceptance region up to Xcrit = 52.61, rejection region beyond]
Step 4: How likely is it that this alternative distribution would produce a
mean in the rejection region?
[Figure: alternative distribution at µA = 55 split at Xcrit = 52.61 into beta (below) and power (above); on the alternative's Z scale, Xcrit sits at Z = −1.51 and µA at Z = 0]
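Step 4 as a computation. The standard error is not stated on the slide, so it is assumed here, back-computed from the quoted Z = −1.51:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, mu_a = 50, 55
x_crit = 52.61
se = 1.583   # assumed standard error, back-computed from Z = -1.51

z = (x_crit - mu_a) / se   # ~ -1.51
beta = phi(z)              # P(mean falls in acceptance region | mu = mu_a)
power = 1 - beta           # ~0.93
```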
Power & Error
[Figure: overlapping null (µ0) and alternative (µA) distributions split at Xcrit into beta and alpha regions]
Power is a function of:
• alpha
• the distance between µ0 and µA (effect size)
• the standard error (sample size and variability)
Changing alpha
[Figure sequence: as Xcrit moves with alpha, the beta and alpha areas trade off against each other]
• Raising alpha gives you less Type II error (more power) but
more Type I error. A trade-off.
Changing distance between μ0 and μA
[Figure sequence: as the distance between μ0 and μA grows, beta shrinks and power rises]
Changing standard error
[Figure sequence: as the standard error shrinks, the two distributions narrow and overlap less, so beta falls and power rises]
To increase power
● Try to make μ really different from the null-hypothesis value (if possible)
● Loosen your alpha criterion (from .05 to .10, for example)
● Reduce the standard error (increase the size of the sample, or reduce
variability)
1. Power increases as effect size increases
[Figure: power vs effect size]
2. Power increases as alpha increases
[Figure: power at two alpha levels]
3. Power increases as sample size increases
[Figures: power for low n vs high n]
[Diagram: alpha, effect size, and sample size all feed into power]
Contd.
A company manufactures rope whose mean breaking strength is 300 lbs
with population SD = 24 lbs. It is believed that, by a newly developed
process, the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance
level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the
new process has increased the mean breaking strength to 310 lbs?
Assume the SD is still 24 lbs.
[Figure: standard normal curve, Z scale centred at 0]
Inductive Statistics

Decision                                  | Continue process | Adjust process
H0 true; process mean hasn’t been shifted | Yes (correct)    | α-error
H1 true; process mean has shifted         | β-error          | Yes (correct)

■ For a process change we have to draw the other curve.
■ The β-error is more damaging to the company, as it will increase complaints.
ii. Zβ = (307 – 310)/(24/√64) = –1
(where 307 ≈ 300 + 2.3263 × 24/√64 is the critical mean from part (i))
⇒ β = 0.1587 = 15.87%
The probability of falsely accepting the old process is 0.1587.
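Both parts of the rope problem as a short sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma, n = 300, 24, 64
se = sigma / sqrt(n)              # 3 lbs

# (i) one-sided decision rule at alpha = 0.01:
# reject the old process if the sample mean exceeds x_crit.
x_crit = mu0 + 2.3263 * se        # ~307 lbs

# (ii) probability of accepting the old process when the true mean is 310.
beta = phi((x_crit - 310) / se)   # ~0.157 (the slides round x_crit to 307)
```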
Choice of Sample Size
Suppose that the null hypothesis H0: μ = μ0 is false (H1: μ ≠ μ0) and the
true mean is µ = µ0 + δ, where δ > 0. β is then the area of the acceptance
region under the alternative distribution. Taking the right side only, more
appropriately:
■ n ≈ (Z_{α/2} + Z_β)² σ² / δ²
Inductive Statistics
Q. To detect a departure of 1 tex from a 40 tex yarn count, given SD = 2,
how many samples must be tested? α = 0.05 & β = 0.1.
Sol. Here, δ = 1 tex, x̄ = 40 tex, σ = 2 tex, α = 0.05 & β = 0.1
■ As n = (Z_{α/2} + Z_β)² σ²/δ²
= [(1.96 + 1.28)² · 2²]/1²
= 41.99 ≈ 42 tests
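The same sample size calculation in code, using the rounded table values from the solution above:

```python
from math import ceil

z_alpha, z_beta = 1.96, 1.28   # alpha = 0.05 (two-sided), beta = 0.10
sigma, delta = 2.0, 1.0        # SD and the departure to detect (tex)

n = ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)  # round up
```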
Inductive Statistics
Q. Mean, µ = 12 gf/tex, σ = 1.5 gf/tex, n = 25
H0: µ = 12
H1: µ < 12
i. What is the critical region if α = 0.01?
ii. Find the β-error if the mean strength has become 11.25 gf/tex.
Sol. α = 0.01
⇒ Zα = −2.3263
Critical region: x̄ < 12 − 2.3263 × 1.5/√25 = 11.302
Zβ = (11.302 – 11.25)/(1.5/√25) = 0.1733
Therefore, β = P(Z > 0.1733) ≈ 0.4312
⇒ β-error ≈ 43.1%
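The β-error above can be checked numerically (note β is the probability the sample mean stays above the critical value when µ has actually dropped to 11.25):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma, n = 12, 1.5, 25
se = sigma / sqrt(n)               # 0.3 gf/tex

x_crit = mu0 - 2.3263 * se         # ~11.302: reject H0 if mean below this
z_beta = (x_crit - 11.25) / se     # ~0.173
beta = 1 - phi(z_beta)             # ~0.431
```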
Sample Size Requirements
Sample size for the one-sample z test:
n = ((z_{1−α/2} + z_{1−β}) · σ / Δ)²
where
1 – β ≡ desired power
α ≡ desired significance level (two-sided)
σ ≡ population standard deviation
Δ = μ0 – μa ≡ the difference worth detecting
Student’s t probability distribution
Inductive Statistics
[Figure: t distributions for ν = 4 and ν = 20 centred at µ; the larger ν, the closer to normal]
Inductive Statistics
[Figure: critical region in the tail of the distribution]
Comparing a measured result with a “known” value
• The “known” value would typically be a certified value from a standard
reference material (SRM)
• Another application of the t statistic
• The sample mean estimates the population mean; the sample SD
estimates the population SD.
Standard Deviation
• What if we don’t want to assume that the population SD σ is known?
• If σ is unknown, we can’t use our formula for the standard deviation of
the sample mean, σ/√n.
Estimating the Mean of a Normal Population:
Small n and Unknown σ
• The population has a normal distribution.
• The value of the population standard deviation is unknown.
• The sample size is small, n < 30.
• The Z distribution is not appropriate for these conditions.
• The t distribution is appropriate.
The t Distribution
• Developed by the British statistician William Gosset (publishing as “Student”)
• A family of distributions -- a unique distribution for each value of its
parameter, degrees of freedom (d.f.)
• Symmetric, unimodal, mean = 0, flatter than the Z distribution
• t formula: t = (x̄ − μ)/(s/√n)
Comparison of Selected t Distributions to the Standard Normal
[Figure: standard normal vs t with d.f. = 25, 5, and 1 over the range −3 to 3; smaller d.f. gives heavier tails]
Table of Critical Values of t
Comparing means of measurements WITHIN the same subject:
2 measurements → paired t-test; 3+ measurements → repeated-measures
ANOVA (ANOVA = analysis of variance).
Same mean but different standard deviation

                            R/F-1       R/F-2
Sample size                 35          40
Mean strength               60 units    56 units
Std. deviation of strength  1.25 units  1.50 units
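A two-sample z statistic for the R/F-1 vs R/F-2 data above (a sketch; the slides' own worked solution is not shown in the text):

```python
from math import sqrt

# R/F-1 vs R/F-2 strength data from the table above.
x1, s1, n1 = 60.0, 1.25, 35
x2, s2, n2 = 56.0, 1.50, 40

se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # SE of the difference in means
z = (x1 - x2) / se                      # ~12.6, far beyond 1.96 -> significant
```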
Estimating the Difference of Two Population Means
Inductive Statistics
Comparing replicate measurements or comparing means of two sets of data
• Yet another application of the t statistic
• Example: Given the same sample analyzed by two different methods, do
the two methods give the “same” result?
• If the standard deviations differ significantly, use the 2nd version of the
t-test (the beastly unequal-variance version); otherwise use the 1st version.
Inductive Statistics
Two types of cotton were tested for shedding per 2500 g of yarn and the
following results were obtained. Is there a significant difference between
the two cottons in terms of shedding %?
Paired t test
t-statistics with matched pairs
To compare the effect of finish on air permeability of various fabrics by
using the t-test matched-pairs method:

Fabric                                            A    B    C    D
Permeability (cu.cm/s/sq.cm) without finish (X1)  915  671  457  366
Permeability (cu.cm/s/sq.cm) after finish (X2)    600  407  213   92
XD = X1 − X2                                      315  264  244  274

As the confidence limit does not include zero, the finish has made a
significant difference in the air permeability of the fabrics.
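The matched-pairs t statistic and confidence interval for the fabric data, as a sketch:

```python
from math import sqrt

d = [315, 264, 244, 274]   # XD = X1 - X2 for fabrics A-D
n = len(d)
mean_d = sum(d) / n
sd = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))

t = mean_d / (sd / sqrt(n))          # ~18.3
t_crit = 3.182                       # t(df = 3), two-sided 95%
half = t_crit * sd / sqrt(n)
ci = (mean_d - half, mean_d + half)  # interval excludes zero
```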
Inductive Statistics

Plant       A   B   C   D    E   F   G   H   I   J   Average
Before      45  73  46  124  33  57  83  34  26  17  53.8
After       36  60  44  119  35  51  77  29  24  11  48.9
Difference   9  13   2    5  -2   6   6   5   2   6   5.2
Inductive Statistics
t (table at 95% significance, d.f. = 9, one-sided) = 1.83
As t (calculated) = 4.03 is greater than t (table), the improvement in the
process is significant.
[Figure: t axis with the table value 1.83 and the calculated value 4.03 beyond it]
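The paired t statistic for the plant data, reproducing the calculated value 4.03:

```python
from math import sqrt

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

d = [b - a for b, a in zip(before, after)]  # per-plant differences
n = len(d)
mean_d = sum(d) / n                         # 5.2
sd = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
t = mean_d / (sd / sqrt(n))                 # ~4.03 > 1.83 -> significant
```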
Population Variance
[Figure: chi-square distributions for df = 5 and df = 10, both starting at 0]
Estimating the Population Variance
• Population parameter σ²
• Estimator of σ²: the sample variance S²
• If the null hypothesis that there are no differences between the classes in
the population is true, the test statistic computed from the observations
follows a χ² frequency distribution.
Study of Variance
■ The curve always starts at zero.
■ The area under the curve is one.
[Figure: χ² density functions for ν = 2, 4, and 6]
Chi-square (𝜒2)
• Used for small samples or a small sampling distribution.
• The quantity 𝜒2 describes the magnitude of the discrepancy between
theoretical and observed values.
• Let X1, X2, …, Xn be a random sample from a normal distribution with
parameters µ and σ²; then
𝜒2 = (n − 1)S²/σ², with n − 1 degrees of freedom (df)
STUDY OF VARIANCE
• The F-test is a statistical test which helps us find whether two normally
distributed populations have the same standard deviation or variance.
• But the first and foremost requirement to perform the F-test is that the
data sets should have a normal distribution.
• This is applied to the F distribution under the null hypothesis.
• The F-test is a very crucial part of the Analysis of Variance (ANOVA) and
is calculated as the ratio of the two sample variances, F = S1²/S2².
Confidence Interval for σ²
Inference about a single variance
Chi-squared distribution
• The p-value is calculated using the Chi-squared distribution for this test
• Chi-squared is a skewed distribution which varies depending on the
degrees of freedom

Excerpt from the χ² table (.05 in each tail):
df   χ²(0.95)   χ²(0.05)
7    2.16735    14.0671
20   10.8508    31.4104
21   11.5913    32.6706
22   12.3380    33.9245
23   13.0905    35.1725
24   13.8484    36.4150
25   14.6114    37.6525
90% Confidence Interval for σ²
Solution for Demonstration Problem
Q. A machine produces 5 samples of yarn. The % moisture content of each
was found, with the following results:
7.2 7.3 7.6 7.5 7.1
Calculate 95% confidence limits for the variance in moisture of these
samples.
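A sketch of the variance confidence interval for the moisture data, using the standard χ²-based formula (n−1)S²/χ² with df = 4 critical values:

```python
x = [7.2, 7.3, 7.6, 7.5, 7.1]   # moisture content %
n = len(x)
m = sum(x) / n
s2 = sum((v - m) ** 2 for v in x) / (n - 1)   # sample variance ~0.043

# chi-square critical values for df = 4 at 0.025 (upper) and 0.975 (lower)
chi_hi, chi_lo = 11.143, 0.484
ci = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)  # ~ (0.015, 0.355)
```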
F-test to compare standard deviations
• Used to determine if the std. deviations are significantly different before
application of the t-test to compare replicate measurements or compare
means of two sets of data
• Uses the F distribution
‘F’ probability distribution
• Two samples, 1 and 2, of sizes N1 and N2, respectively, are drawn
from two normal (or nearly normal) populations having variances
σ1² and σ2².
• Then the statistic is defined as F = S1²/S2², where S1² and S2² are the
sample variances.
• The PDF is defined by the F value at k1 and k2 (where k1 and k2 are the
degrees of freedom of the two variances).
F-test to compare standard deviations
We will compute Fcalc and compare it to Ftable.
[Figure: F distribution with the critical region in the upper tail]
Study of Variance
[Figure: F density function with lower and upper critical values F(k1, k2, 1−α/2) and F(k1, k2, α/2)]
k1 and k2 are the degrees of freedom from the 1st and 2nd sets.
F(k2, k1, α/2) = 1/F(k1, k2, 1−α/2)
1/F(k2, k1, α/2) ≤ F = (S1²/σ1²)/(S2²/σ2²) ≤ F(k1, k2, α/2)
Practice Problems
1. A retailer buys garments from two different sources. From the first
industry, 20 samples were taken with mass variance 25; from the second
industry, 25 samples were taken with mass variance 14.1. Is there a
significant difference in variability between the two sources?
Study of Variance
Sol. Here, S1² = 25, S2² = 14.1,
n1 = 20 & n2 = 25
Therefore, F19,24,0.025 = 2.06
& F24,19,0.025 = 2.114
S1²/(S2² · Fk1,k2,α/2) ≤ σ1²/σ2² ≤ (S1² · Fk2,k1,α/2)/S2²
⇒ 0.86 ≤ σ1²/σ2² ≤ 3.72
As the interval contains 1, sometimes σ1 is larger and at other times σ2 is
larger. Therefore, there is no significant difference between the samples
from the two sources.
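The confidence interval for the variance ratio, using the table values quoted in the solution above:

```python
s1_sq, s2_sq = 25.0, 14.1       # sample variances
f_12, f_21 = 2.06, 2.114        # table values F(19,24) and F(24,19) from the slides

lo = s1_sq / (s2_sq * f_12)     # ~0.86
hi = s1_sq * f_21 / s2_sq       # ~3.75 (the slides quote 3.72)
contains_one = lo <= 1.0 <= hi  # True -> no significant difference
```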
Summary of Statistics
Summary of Confidence Interval Procedures
Test of Variance