Week 8
The k-th moment about a value a: $m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - a)^k$ (skewness is based on the 3rd moment about the mean, kurtosis on the 4th).

Skewness: $b_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{s^3}$

Kurtosis: ...
If:
• β1 = 0: the data are symmetric
• β1 > 0: positive asymmetry (right-skewed)
• β1 < 0: negative asymmetry (left-skewed)
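To make the skewness formula concrete, here is a minimal Python sketch (not from the slides; it assumes NumPy/SciPy and a made-up sample) that computes b1 directly and checks it against SciPy's bias-corrected estimator:

```python
import numpy as np
from scipy import stats

x = np.array([4.0, 4.6, 5.1, 5.3, 5.8, 6.0, 6.4, 7.1, 7.4, 9.0])  # made-up sample
n = len(x)
s = x.std(ddof=1)                                # sample standard deviation

# b1 = n / ((n-1)(n-2)) * sum(((xi - mean)/s)^3)
b1 = n / ((n - 1) * (n - 2)) * np.sum(((x - x.mean()) / s) ** 3)

# SciPy's bias-corrected skewness uses the same estimator
print(b1, stats.skew(x, bias=False))
```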
Kurtosis (Peakedness)
For a normal distribution, β2 = 3. Therefore, if we define β2′ = β2 − 3, the excess kurtosis, it takes values in [−2, ∞).
If:
• β2′ = 0: the data are normally distributed (mesokurtic)
• β2′ > 0: leptokurtic distribution
• β2′ < 0: platykurtic distribution
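A quick numerical check of the β2′ interpretation (a sketch, not from the slides; it assumes NumPy/SciPy and synthetic data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=6.0, scale=0.8, size=500)     # synthetic, approximately normal sample

# fisher=True returns the excess kurtosis beta2' = beta2 - 3 (bias-corrected)
b2_excess = stats.kurtosis(x, fisher=True, bias=False)
print(b2_excess)                                 # close to 0 for a mesokurtic sample
```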
Verification of data distribution through graphical representations:
i. Histogram
ii. Normal quantile plot, normal quantile-quantile plot (QQ plot)
iii. Stem and leaf plot

Verification of data distribution by statistical tests:
i. χ2 test
ii. Kolmogorov-Smirnov test
iii. Shapiro-Wilk test
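As a sketch of how the statistical tests in the second list can be run in Python (SciPy assumed; the data here are synthetic, not the rice measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=6.0, scale=0.8, size=200)     # synthetic sample

# Shapiro-Wilk test (H0: the data come from a normal distribution)
w_stat, p_shapiro = stats.shapiro(x)

# Kolmogorov-Smirnov test against N(mean, sd) estimated from the sample;
# estimating the parameters from the same data makes the p-value approximate
d_stat, p_ks = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))

print(p_shapiro, p_ks)                           # p > alpha -> no evidence against normality
```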
i. Histogram
• The optimal number of subintervals is established by Sturges' formula: k ≈ 1 + 3.322·log10(n)
• On the width of these intervals, rectangles are constructed with length (height) proportional to the relative frequency
[Figure: histogram of rice length (mm) vs. number of observations; β2′ = −0.592 ± 0.253]
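A minimal sketch of the histogram construction in Python (assuming Sturges' rule for the number of bins and a synthetic stand-in for the rice-length data):

```python
import numpy as np

rng = np.random.default_rng(2)
lengths = rng.normal(loc=6.0, scale=0.9, size=375)        # stand-in for the rice lengths (mm)

k = int(np.ceil(1 + 3.322 * np.log10(len(lengths))))      # Sturges' rule for the number of bins
counts, edges = np.histogram(lengths, bins=k)

# text histogram: bar length proportional to the relative frequency
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:5.2f}, {hi:5.2f})  {'#' * round(50 * c / counts.max())}")
```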
ii. Normal quantile-quantile plot
Plot the i-th ordered value versus the quantile of the standard normal distribution (the corresponding z-score), or vice versa. It allows the identification of potential atypical points. A more general formula can be used to determine the quantiles of the corresponding normal distribution.
[Figure: normal QQ plot of rice length (mm) vs. expected z-scores]
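A sketch of the normal QQ plot in Python (SciPy/Matplotlib assumed; synthetic stand-in data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
lengths = rng.normal(loc=6.0, scale=0.9, size=375)        # stand-in for the rice lengths (mm)

# probplot returns the expected normal quantiles (z-scores) and the ordered values
(osm, osr), (slope, intercept, r) = stats.probplot(lengths, dist='norm')

plt.scatter(osr, osm, s=10)                               # ordered values vs. expected z-scores
plt.xlabel('Rice length (mm)')
plt.ylabel('Expected z-scores')
plt.show()
```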
iii. Stem and leaf plot
A stem and leaf plot is a way of organizing data into a form that allows for an easy visual perception of the frequencies of different values. Such a presentation allows easy determination of quantiles as well as of the data distribution profile. It also allows the identification of atypical points.

Stem and leaf plot of the rice-length data (each stem split into low/high halves):
3 |
4 | 034
4 | 6788899999
5 | 00011111112222222233333333344444444444
5 | 55555556666666666677777777777888888888899999999
6 | 000000011111122222233333333444444
6 | 5555555666666777778888889999999
7 | 000000001111111122222233444
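A small Python sketch of how such a display can be produced (illustrative only; unlike the slide, each stem is kept on a single row rather than split into low/high halves):

```python
def stem_and_leaf(values):
    """Print a stem-and-leaf display with leaf unit 0.1 (stem = units digit)."""
    stems = {}
    for v in sorted(values):
        tenths = int(round(v * 10))                  # value expressed in leaf units
        stems.setdefault(tenths // 10, []).append(str(tenths % 10))
    for stem in sorted(stems):
        print(f"{stem} | {''.join(stems[stem])}")

stem_and_leaf([4.0, 4.3, 4.6, 5.0, 5.2, 5.2, 5.7, 6.1, 6.4, 7.0, 7.2])
```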
One-way Analysis of Variance (ANOVA)
From an applicative point of view, ANOVA is an extension of the t-test for comparing two independent samples (when the variances are unknown) to more than two samples. Basically, ANOVA tests the effect of a single factor (an independent variable) on a dependent variable for more than two samples/groups (at several levels). For testing two or more factors, two-way (bifactorial) or multi-factor ANOVA is used; when several dependent variables are analysed simultaneously, MANOVA is applied.
Examples of factors tested:
• qualitative (catalyst, operator, a particular analytical method, etc.)
• quantitative (pH, temperature, pressure, etc.)
The dependent variable can be any measurable or quantitatively assessed quantity observed for the tested factor at its different levels.
Data layout (k groups/levels, with n_j observations in group j):

x11      x12      ...  x1j      ...  x1k
x21      x22      ...  x2j      ...  x2k
 .        .       ...   .       ...   .
 .        .       ...  xij      ...   .
 .        .       ...   .       ...   .
x_{n_1,1} x_{n_2,2} ... x_{n_j,j} ... x_{n_k,k}
x̄1       x̄2       ...  x̄j       ...  x̄k

$SS_j = \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$,  $SS_W = \sum_{j=1}^{k} SS_j = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$

$SS_B = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$,  $SS_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$

$\nu_W = \sum_{j=1}^{k} (n_j - 1) = n - k$,  $\nu_B = k - 1$,  $\nu_T = n - 1$
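The sums of squares can be computed directly from these definitions; a minimal Python sketch with made-up groups (not the worked example below) is:

```python
import numpy as np

# three illustrative groups (k = 3, unequal group sizes are allowed)
groups = [np.array([9.8, 10.1, 10.0, 9.9]),
          np.array([10.4, 10.6, 10.5]),
          np.array([9.7, 9.6, 9.8, 9.9, 9.7])]

all_x = np.concatenate(groups)
grand_mean = all_x.mean()                                          # overall average x̄

ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within-group sum of squares
ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between-group sum of squares
ss_t = ((all_x - grand_mean) ** 2).sum()                           # total sum of squares

print(ss_w, ss_b, ss_t, np.isclose(ss_t, ss_w + ss_b))             # SS_T = SS_W + SS_B
```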
• there is a variance within the groups (Within), internal, residual
• we suspect a variance between groups (Between), external, explained
• if the factor has no effect, there is no difference between the two variances
• we define an overall average (Total), x̄, and the group means

$\bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij}$,  $j = 1, \dots, k$  (mean for the j-th group)

ANOVA table (degrees of freedom, sums of squares, mean squares):

index         ν (degrees of freedom)   SS (sum of squares)   MS (mean squares)
Within (W)    ν_W = n − k              SS_W                  MS_W = SS_W / ν_W
Between (B)   ν_B = k − 1              SS_B                  MS_B = SS_B / ν_B
Total (T)     ν_T = n − 1              SS_T
If the null hypothesis is true, MS_W and MS_B are both a measure of the random errors and we expect F = MS_B / MS_W ≈ 1.
Decision:
• If F ≤ F_crit(α; ν_B, ν_W), the null hypothesis is not rejected (the factor tested has no significant effect, μA = μB = μC = ...)
• If F > F_crit(α; ν_B, ν_W), reject the null hypothesis and accept the alternative hypothesis (the tested factor has a significant effect; at least one average differs, μp ≠ μq for a certain p ≠ q)
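As a sketch of this decision step in Python (hypothetical mean squares and degrees of freedom; SciPy assumed for the F critical value):

```python
from scipy import stats

ms_b, ms_w = 20.0, 4.0          # hypothetical mean squares MS_B and MS_W
nu_b, nu_w = 2, 12              # hypothetical degrees of freedom
alpha = 0.05

f_calc = ms_b / ms_w
f_crit = stats.f.ppf(1 - alpha, nu_b, nu_w)      # F_crit(alpha; nu_B, nu_W)

# reject H0 (significant factor effect) only when F exceeds the critical value
print(f_calc, f_crit, f_calc > f_crit)
```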
Example: The table below shows the results obtained in a stability study of a fluorescent reagent stored under different
conditions. The values given are fluorescence signals (in arbitrary units) from solutions diluted to the same concentration.
Three measurements were made in each sample. The table shows that the average values for the four samples are different.
However, we know that due to random error, even if the true value we are trying to measure is unchanged, the sample
average may vary from sample to sample. Using ANOVA, test (α = 0.05) whether the difference between the sample means is too large to
be explained by random errors.
A B C D
(freshly diluted) (after 1h in the dark) (after 1h in the shade) (after 1h in light)
102 101 97 90
100 101 95 92
101 104 99 94
Grand mean: x̄ = 98
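For comparison with a hand calculation, the example can be run with SciPy's one-way ANOVA (a sketch; f_oneway returns the F statistic and the p-value):

```python
from scipy import stats

A = [102, 100, 101]   # freshly diluted
B = [101, 101, 104]   # after 1 h in the dark
C = [97, 95, 99]      # after 1 h in the shade
D = [90, 92, 94]      # after 1 h in light

f_calc, p_value = stats.f_oneway(A, B, C, D)
print(f_calc, p_value)   # p < 0.05 -> reject H0: at least one mean differs
```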