Week 8

This document discusses analysis of variance (ANOVA) and assumptions for parametric statistical tests including normality and outliers. It covers checking for normality, one-way and two-way ANOVA, applications of ANOVA in Excel, outliers in analytical data, and robust and non-parametric statistics. Graphical and statistical tests for verifying data distribution are also presented, such as histograms, normal quantile plots, and the Kolmogorov-Smirnov test.

Uploaded by

Reza Joia

MODULE 4. Analysis of variance. Assumptions for parametric statistical tests: normality and outliers. Non-parametric and robust statistics

Week 8
• L 8. Checking normality. Analysis of variance. One-way and two-way ANOVA. (1)
• PC 8. Applications of ANOVA in Excel using various chemical data. (2; 5)
• IWST 4. Exercises and problems regarding ANOVA and normality tests.

Week 9
• L 9. Outliers in analytical data. Detecting outliers using Dixon and Grubbs tests. Robust and non-parametric statistics. (1)
• PC 9. Practical application involving outlier detection and the non-parametric Wilcoxon and Mann-Whitney tests. (2; 10)
• IWS 3. Individual work with exercises and problems regarding outlier testing, ANOVA, non-parametric and robust statistics. (10)

Learning resources:
1. Stephen Kokoska, Introductory Statistics: A Problem-Solving Approach, Publisher: WH Freeman; 3rd edition, chapters 6, 11
2. James Miller, Jane Miller, Robert Miller, Statistics and Chemometrics for Analytical Chemistry, Publisher: Pearson Education; 7th edition, chapter 3
3. Stephen L. R. Ellison, Vicki J. Barwick, Trevor J. Duguid Farrant, Practical Statistics for the Analytical Scientist: A Bench Guide, Publisher: Royal Society of Chemistry; 2nd edition, chapter 6

Data symmetry based on descriptive statistics

The position of the quartiles Q1, Q2, Q3 indicates the shape of the distribution:
• Positively skewed distribution (right tailed): (Q3 - Q2) > (Q2 - Q1)
• Symmetric distribution: (Q3 - Q2) = (Q2 - Q1)
• Negatively skewed distribution (left tailed): (Q3 - Q2) < (Q2 - Q1)
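The quartile comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `quartile_shape`, the tolerance `tol` and the sample data are assumptions made here, and the quartiles use the default ("exclusive") interpolation of the standard library, which may differ slightly from other software.

```python
import statistics

def quartile_shape(values, tol=1e-12):
    """Classify symmetry by comparing (Q3 - Q2) with (Q2 - Q1).

    `tol` is a hypothetical tolerance for treating the two distances as equal.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    upper, lower = q3 - q2, q2 - q1
    if abs(upper - lower) <= tol:
        return "symmetric"
    return "positively skewed" if upper > lower else "negatively skewed"

# Made-up illustrative data sets
print(quartile_shape([1, 2, 3, 4, 5, 6, 7]))
print(quartile_shape([1, 2, 2, 3, 3, 4, 9, 15, 20]))
print(quartile_shape([-20, -15, -9, -4, -3, -3, -2, -2, -1]))
```

With real measurements the two distances are almost never exactly equal, so in practice the comparison is read as a rough indication rather than a strict test.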

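Moments

The kth raw moment:
$$\mu'_k = E[X^k]$$
The kth moment about a value a:
$$\mu_k(a) = E[(X - a)^k]$$
• a = 0: raw moment
• a = μ: moment about the mean μ (central moment)
• central moment divided by σ^k: standardized central moment

Skewness
$$b_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{s^3}$$

Kurtosis ...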
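
As a minimal sketch, the skewness formula above can be evaluated directly. The function name `sample_skewness` and the example data are assumptions made for illustration; `s` is taken to be the sample standard deviation with n - 1 in the denominator.

```python
import math

def sample_skewness(values):
    """b1 = n / ((n-1)(n-2)) * sum(((x_i - mean) / s)^3), s = sample std. dev."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Made-up data: one large value on the right gives positive skewness
print(round(sample_skewness([1, 2, 3, 4, 10]), 3))
# Symmetric data give zero skewness
print(sample_skewness([1, 2, 3, 4, 5]))
```
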
Skewness
If:
• β1 = 0 then the data is symmetric
• β1 > 0 positive asymmetry
• β1 < 0 negative asymmetry
[Figure: density curves with negative asymmetry (β1 < 0) and positive asymmetry (β1 > 0), and curves for β1 = 0, 0.5, 1 and 1.5]

Kurtosis (Peakedness)
For a normal distribution, β2 = 3. Therefore, we define the excess kurtosis β2' = β2 - 3, which takes values in [-2, ∞).
If:
• β2' = 0, the distribution is as peaked as a normal distribution (mesokurtic)
• β2' > 0, leptokurtic distribution
• β2' < 0, platykurtic distribution
[Figure: density curves for β2' > 0 (leptokurtic), β2' = 0 (mesokurtic) and β2' < 0 (platykurtic)]

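
The classification above can be illustrated with a short sketch. Since the slides do not show the kurtosis estimator, this example assumes the simple moment-based definition β2 = m4 / m2² (with m2 and m4 the second and fourth central moments computed with divisor n); the names `excess_kurtosis`, `classify_kurtosis` and `tol` are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2^2 - 3 (assumed estimator, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

def classify_kurtosis(beta2_excess, tol=1e-9):
    """Map the sign of the excess kurtosis to the usual name."""
    if abs(beta2_excess) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2_excess > 0 else "platykurtic"

flat = [1, 2, 3, 4, 5]          # made-up, evenly spread data
print(excess_kurtosis(flat), classify_kurtosis(excess_kurtosis(flat)))
# A two-point data set reaches the lower bound of the interval [-2, inf)
print(excess_kurtosis([0, 1, 0, 1]))
```
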
Verification of data distribution through graphical representations
i. Histogram
ii. Normal quantile plot (normal quantile-quantile plot, QQ plot)
iii. Stem and leaf plot

Verification of data distribution by statistical tests
i. χ2 test
ii. Kolmogorov-Smirnov test
iii. Shapiro-Wilk test

i. Histogram

• The interval of values, ordered ascendingly, is divided into subintervals of equal width.
• The optimal number of subintervals is established by Sturges' formula: k = 1 + 3.322 log10(n).
• On the width of these intervals, rectangles are constructed with length (height) proportional to the relative frequency.

[Figure: histogram of rice length (mm), from 3.5 to 9.5 mm, versus no. of observations; β1 = 0.243 ± 0.127, β2' = -0.592 ± 0.253]

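
A minimal sketch of choosing the histogram subintervals, assuming the rule meant here is Sturges' formula k = 1 + log2(n) (equivalently 1 + 3.322 log10(n)), rounded up. The function names are hypothetical; n = 371, min = 4.0 and max = 8.7 are the values reported with the stem and leaf plot in these slides.

```python
import math

def sturges_bins(n):
    """Number of subintervals from Sturges' rule, rounded up (assumed rule)."""
    return math.ceil(1 + math.log2(n))

def bin_edges(lowest, highest, k):
    """Edges of k equal-width subintervals covering [lowest, highest]."""
    width = (highest - lowest) / k
    return [lowest + j * width for j in range(k + 1)]

k = sturges_bins(371)
print(k)
print([round(edge, 2) for edge in bin_edges(4.0, 8.7, k)])
```

Software often rounds the resulting width to a convenient value (for example 0.5 mm), so the number of bars actually drawn can differ from k.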
ii. Normal quantile-quantile plot

Plot the i-th ordered value versus the corresponding quantile of the standard normal distribution (the expected z-score), or vice versa. The plot allows the identification of potential atypical points. A more general formula can be used to determine the quantile in the corresponding normal distribution.

[Figure: normal quantile-quantile plot of expected z-scores (axis from -4 to 4, with reference lines at ±2.6 and ±3.0) versus rice length (mm)]

iii. Stem and leaf plot

A stem and leaf plot is a way of organizing data into a form that allows for an easy visual perception of frequencies for different types of values. Such a presentation allows easy determination of quantiles as well as of the data distribution profile. It also allows the identification of potential atypical points.

It requires establishing a code for the stem and leaves; here 4 | 6 means 4.6.

Stem | leaf (e.g., 6 | 5 = 6.5)
3 |
4 | 034
4 | 6788899999
5 | 00011111112222222233333333344444444444
5 | 55555556666666666677777777777888888888899999999
6 | 000000011111122222233333333444444
6 | 5555555666666777778888889999999
7 | 000000001111111122222233444
7 | 555667
8 |
8 | 7
9 |
min = 4.0   max = 8.7   Total n: 371
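
The coding rule "4 | 6 means 4.6" can be sketched as follows. This is a simplified illustration with made-up values: the function name `stem_and_leaf` is hypothetical, values are assumed non-negative with one decimal, and each stem gets a single row (the plot in the slides splits every stem into two rows, leaves 0-4 and 5-9).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by integer part (stem) and first decimal (leaf, unit 0.1)."""
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # e.g. 4.6 -> 46
        rows[tenths // 10].append(str(tenths % 10))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}

plot = stem_and_leaf([4.0, 4.3, 4.6, 5.1, 5.1, 8.7])
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```
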

Results from Statistica software.

[Figure: histogram of rice length (mm); x-axis: X <= Category Boundary (3 to 9), y-axis: No. of obs. (0 to 220)]
K-S d = .07410, p < .05; Lilliefors p < .01
Shapiro-Wilk W = .97839, p = .00002

Reference: Normality/Symmetry Graphs | Real Statistics Using Excel (real-statistics.com)

Example using Statistica software: using a QQ plot, determine whether the data set with 8 elements {-5.2, -3.9, -2.1, 0.2, 1.1, 2.7, 4.9, 5.3} is normally distributed.

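
For the eight-element data set of the QQ-plot example, the plotted points can be sketched as below. The plotting position (i - 0.5) / n is an assumption made here (the slides do not state which formula is used, and other choices exist); `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each ordered value with its expected standard normal z-score.

    Uses the plotting position p_i = (i - 0.5) / n (an assumed choice).
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

data = [-5.2, -3.9, -2.1, 0.2, 1.1, 2.7, 4.9, 5.3]
points = qq_points(data)
for x, z in points:
    print(f"{x:6.1f}  {z:7.3f}")
```

If the points lie close to a straight line when plotted, the data are compatible with a normal distribution; marked curvature or isolated points far from the line suggest otherwise.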
One-way Analysis of Variance (ANOVA)
From an applied point of view, ANOVA extends the t-test for comparing two independent samples (when the variances are unknown) to more than two samples. Basically, one-way ANOVA tests the effect of a single factor (an independent variable) on a dependent variable across more than two samples (several levels of the factor). For two or more factors, two-factor (bifactorial) or multi-factor ANOVA is used; MANOVA is applied when there are several dependent variables.
Examples of factors tested:
• qualitative (catalyst, operator, a particular analytical method, etc.)
• quantitative (pH, temperature, pressure, etc.)
The dependent variable can be any quantity, measured or quantitatively assessed, for the tested factor at different levels.

Thus, the statistical hypotheses will be:

H0: there is no difference between population means, μA = μB = μC = ...
H1: at least one mean differs, μp ≠ μq for some p ≠ q

Data layout: k groups (treatments, methods, etc.), i.e. k levels of the same factor. Each group contains nj values. The j-th group is Tj, and xij is the i-th measurement from the j-th group (i = 1, ..., nj; j = 1, ..., k).
• j: index for the position of a group
• i: index for the position of a value in a group

T1      T2      ...   Tj      ...   Tk
x11     x12     ...   x1j     ...   x1k
x21     x22     ...   x2j     ...   x2k
.       .       ...   .       ...   .
.       .       ...   xij     ...   .
.       .       ...   .       ...   .
xn1,1   xn2,2   ...   xnj,j   ...   xnk,k

Within each group there is a variance (Within: internal, residual), and we suspect a variance between groups (Between: external, explained). If the factor has no effect, there is no difference between the two variances.

• We define the mean for the j-th group:
$$\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij}, \quad j = 1, \dots, k$$
• We define an overall average (Total), $\bar{x}$, the mean of all n values.

Sums of squares:
$$SS_j = \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
$$SS_W = \sum_{j=1}^{k} SS_j = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
$$SS_B = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$
$$SS_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$$

Degrees of freedom:
$$\nu_W = \sum_{j=1}^{k} (n_j - 1) = n - k, \quad \nu_B = k - 1, \quad \nu_T = n - 1$$

Source    index   degrees of freedom (ν)   sum of squares (SS)   mean squares (MS)
Within    W       n - k                    SS_W                  MS_W = SS_W / ν_W
Between   B       k - 1                    SS_B                  MS_B = SS_B / ν_B
Total     T       n - 1                    SS_T

If the null hypothesis is true, MS_B and MS_W are both a measure of random errors and we expect that F = MS_B / MS_W ≈ 1; if it is false, MS_B > MS_W and F > 1.

Decision:
• If F ≤ F crit (for the chosen α, with ν_B and ν_W degrees of freedom), the null hypothesis is not rejected (the factor tested has no significant effect, μA = μB = μC = ...).
• If F > F crit, reject the null hypothesis and accept the alternative hypothesis (the tested factor has a significant effect; at least one mean differs, μp ≠ μq for some p ≠ q).
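
As a short worked derivation (added here for illustration, using the usual definitions of the within-group, between-group and total sums of squares SS_W, SS_B and SS_T), the total variation splits exactly into the two parts compared by the F ratio. Writing each deviation from the overall mean as a within-group part plus a between-group part:

```latex
x_{ij} - \bar{x} = (x_{ij} - \bar{x}_j) + (\bar{x}_j - \bar{x})

SS_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2
     = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
     + \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2
     + 2 \sum_{j=1}^{k} (\bar{x}_j - \bar{x}) \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)
     = SS_W + SS_B

\text{since } \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j) = 0 \text{ for every group } j.

\nu_T = n - 1 = (n - k) + (k - 1) = \nu_W + \nu_B
```

This is why only two of the three rows of the ANOVA table need to be computed; the third follows by addition or subtraction.
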
Example: The table below shows the results obtained in a stability study of a fluorescent reagent stored under different conditions. The values given are fluorescence signals (in arbitrary units) from solutions diluted to the same concentration. Three measurements were made on each sample. The table shows that the average values for the four samples are different. However, we know that due to random error, even if the true value we are trying to measure is unchanged, the sample average may vary from sample to sample. Using ANOVA, test (α = 0.05) whether the difference between sample means is too large to be explained by random errors.

A (freshly diluted)   B (after 1 h in the dark)   C (after 1 h in the shade)   D (after 1 h in light)
102                   101                         97                           90
100                   101                         95                           92
101                   104                         99                           94

Overall mean: x̄ = 98

ANOVA: Single Factor

SUMMARY
Groups     Count   Sum   Average   Variance
Column 1   3       303   101       1
Column 2   3       306   102       3
Column 3   3       291   97        4
Column 4   3       276   92        4

ANOVA
Source of Variation   SS    df   MS   F       P-value   F crit
Between Groups        186   3    62   20.67   0.0004    4.07
Within Groups         24    8    3
Total                 210   11

Since F = 20.67 > F crit = 4.07 (P-value = 0.0004 < 0.05), we reject H0; the tested effect is strongly significant.
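
The numbers in the ANOVA table can be checked with a short sketch that computes the between-group and within-group sums of squares and the F ratio from the fluorescence data. The function name `one_way_anova` and the dictionary keys are choices made for this illustration; the critical value 4.07 is simply taken from the table above rather than computed.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares and F."""
    n = sum(len(g) for g in groups)           # total number of values
    k = len(groups)                           # number of groups (levels)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    df_b, df_w = k - 1, n - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_B": ss_b, "SS_W": ss_w, "SS_T": ss_b + ss_w,
            "df_B": df_b, "df_W": df_w, "MS_B": ms_b, "MS_W": ms_w,
            "F": ms_b / ms_w}

# Fluorescence signals for samples A, B, C, D from the example
fluorescence = [[102, 100, 101], [101, 101, 104], [97, 95, 99], [90, 92, 94]]
result = one_way_anova(fluorescence)
print(result)

F_CRIT = 4.07  # critical value quoted in the ANOVA table (alpha = 0.05; 3 and 8 d.f.)
print("reject H0" if result["F"] > F_CRIT else "do not reject H0")
```
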

IWST

P1. The following results show the percentage of total interstitial water that was recovered by centrifuging samples taken at different depths in stone sediment. Choose a statistical test and show (at the 95% confidence level) that the percentage of recovered water differs significantly at different depths.

Depth (m)   Water recovered (%)
7           33.3   33.3   35.7   38.1   31.0   33.3
8           43.6   45.2   47.7   45.4   43.8   46.5
16          73.2   68.7   73.6   70.9   72.5   74.5
23          72.5   70.4   65.2   66.7   77.6   69.8

P2. Six analysts each made six determinations of paracetamol content in the same batch of tablets. The results are presented below. Test whether there is any significant difference (α = 0.05) between the averages obtained by the six analysts.

Analyst   Paracetamol (% m/m)
A         84.32   84.51   84.63   84.61   84.64   84.51
B         84.24   84.25   84.41   84.13   84.00   84.30
C         84.29   84.40   84.68   84.28   84.40   84.36
D         84.14   84.22   84.02   84.48   84.27   84.33
E         84.50   83.88   84.49   83.91   84.11   84.06
F         84.70   84.17   84.11   84.36   84.61   83.81
