STAT II
Two-way classification: Analyzing variance with two factors, including cases with equal number of entries per cell.
Plot and Block Design: Importance of plot size, shape, and arrangement.
Analysis of Completely Randomized Design: Understanding this design and its analysis.
Main Effects and Interaction Effects: Calculating the effects of factors individually and in combination.
MS Excel and R: Using software for one-way, two-way, and two-way with multiple entries classification ANOVA.
This syllabus provides a clear breakdown of analysis and experimental design methods used in statistical research.
Question: Stating the linear model for one-way analysis of variance (ANOVA), find the least square estimates of the parameters in the model and show that the total sum of squares can be decomposed into the treatment and error sums of squares.
Solution:
In one-way ANOVA, we analyze the difference between the means of several groups or treatments. The linear model can be
represented as:
Yij = μ + τi + ϵij
Where:
Yij = the observation for the j-th subject in the i-th group.
μ = the overall mean.
τi = the effect of the i-th group (treatment), subject to the constraint ∑ niτi = 0.
ϵij = the random error (residual) associated with the j-th observation in the i-th group, assumed to be independent and identically distributed as N(0, σ²).
The total sum of squares decomposes as SS Total = SS Treatment + SSE, where:

SS Total = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳ)²

SS Treatment = ∑_{i=1}^{k} n_i (Ȳi − Ȳ)²

SSE = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳi)²

Where:
Ȳ is the grand mean of all observations and Ȳi is the mean of the i-th group.
In one-way ANOVA, we estimate the parameters μ (the overall mean) and τi (the group effects) by minimizing the sum of squared errors ∑_i ∑_j (Yij − μ − τi)². Under the constraint ∑ niτi = 0, the least square estimates are:

μ̂ = (1/N) ∑_{i=1}^{k} ∑_{j=1}^{n_i} Yij = Ȳ, where N is the total number of observations, and

τ̂i = Ȳi − Ȳ.
Since the total sum of squares can be decomposed as the sum of the treatment sum of squares and the error sum of squares, the
expectation of these quantities are as follows:
1. The expectation of the total variability (SS Total) is the total variance in the data.
2. The expectation of the SS due to treatment is the variance explained by the group means.
3. The expectation of the error sum of squares is the variance within each group.
Thus, the decomposition of the total sum of squares into the treatment and error components is correct.
Question: State the Gauss-Markov theorem. Define: (i) linear parametric function, (ii) estimable linear parametric function, (iii) BLUE.
Solution:
The Gauss-Markov Theorem provides a foundation for the Best Linear Unbiased Estimator (BLUE) in the context of linear regression
models. It states:
"In a linear regression model where the errors are uncorrelated with constant variance (homoscedasticity), the Ordinary Least
Squares (OLS) estimator is the best (most efficient) linear unbiased estimator, i.e., it has the smallest variance among all linear
unbiased estimators."
In simpler terms, if you have a linear model with certain assumptions (like errors being uncorrelated and having constant variance), the
OLS method provides the best linear unbiased estimates of the parameters.
Definitions:
1. Linear Parametric Function: A linear parametric function refers to any function that is linear in the parameters (i.e., coefficients)
of the model. For example, in a linear regression model:
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + β_pX_p + ϵ

any linear combination of the coefficients, such as θ = β₁ + β₂, is a linear parametric function.
2. Estimable Linear Parametric Function: An estimable linear parametric function refers to a linear function of parameters that can
be consistently estimated from the available data. In other words, we can compute an unbiased estimate of the function based on
the sample data. For example:
θ = β1 − 2β2
This is estimable if it can be written as a linear combination of the expected values of the observations, i.e., if its coefficient vector lies in the row space of the design matrix. Whether or not a function is estimable therefore depends on the design matrix and the data available.
3. BLUE (Best Linear Unbiased Estimator): BLUE refers to the Best Linear Unbiased Estimator, which is the estimator that is:
Unbiased: The estimator's expected value equals the true value of the parameter.
Best: The estimator has the minimum variance among all unbiased linear estimators.
The Gauss-Markov Theorem guarantees that in the context of linear regression with uncorrelated errors and constant variance, the
OLS estimator is BLUE. That is, it provides the most efficient (smallest variance) unbiased estimate of the parameters.
Mathematically, the BLUE for the parameter vector β in the linear model Y = Xβ + ϵ (where ϵ has mean 0 and covariance σ²I) is given by:

β̂ = (XᵀX)⁻¹XᵀY

This estimator has the smallest variance among all linear unbiased estimators of β.
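To make the formula concrete, here is a minimal numerical sketch of the OLS estimate computed from the normal equations; the design matrix X and response Y below are made-up illustration data, not from the text:

```python
import numpy as np

# Hypothetical data: n = 5 observations, intercept plus 2 predictors
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 4.0, 1.0]])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# OLS / BLUE estimate: beta_hat solves (X'X) beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Sanity check against numpy's own least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # both routes give the same estimate
```

Solving the normal equations directly is fine for a full-rank X like this one; `lstsq` is the more numerically robust choice in general.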
Question: What is meant by ANOVA? State the basic assumptions in ANOVA. Explain its use.
Solution:
What is ANOVA?
ANOVA (Analysis of Variance) is a statistical technique used to compare the means of three or more groups to determine if there is a
statistically significant difference between them. The main idea behind ANOVA is to partition the total variability in the data into
components attributable to different sources, such as group differences and random error.
ANOVA helps us assess if variations in a dependent variable can be explained by the independent variables (or factors) under study.
For example, if we are comparing the exam scores of students from three different teaching methods, ANOVA helps determine if the
differences in scores are significant or just due to random variation.
ANOVA relies on several key assumptions about the data to ensure its validity:
1. Independence of observations:
The observations within each group should be independent of each other. In other words, the score of one individual in a group should not affect the score of another individual in the same group.
2. Normality:
The residuals (the differences between the observed values and the group means) should be approximately normally distributed for each group.
3. Homogeneity of variances:
The variances of the residuals (errors) should be equal across all groups. In other words, each group should have similar variability.
Use of ANOVA
ANOVA is commonly used in various fields, including psychology, medicine, agriculture, and economics, for the following purposes:
1. Comparison of group means:
It allows the comparison of means from different groups. For example, you could compare the mean heights of individuals in different age groups or the effectiveness of different drugs on patients.
2. Hypothesis testing:
ANOVA helps in testing the null hypothesis that the means of all groups are equal. The alternative hypothesis suggests that at least one of the means is different.
Null hypothesis (H₀): All group means are equal.
Alternative hypothesis (H₁): At least one group mean is different from the others.
3. Understanding variations:
ANOVA breaks down the total variation in the data into components due to between-group variation and within-group
variation. This helps to understand how much of the variation in the dependent variable is explained by the independent
variables (factors).
4. Experimental design:
In experimental designs, ANOVA helps assess the effects of different treatments or conditions on a response variable.
5. Factorial experiments:
ANOVA is also used to analyze data from factorial experiments (experiments with two or more factors). It helps assess the main
effects of each factor and their interactions.
TOPIC 3 & 4: Linear Model for One-Way and Two-Way ANOVA
Q4: Linear Model for One-Way ANOVA with Unequal Number of Entries in Each Class
Year(s) Asked: S17 (E)
Question: Explain the linear model in analysis of variance for one-way classified data with unequal numbers of entries in each class.
Obtain the breakup of total sum of squares and the expected values of various mean sum of squares. Write its ANOVA table and
explain how various hypotheses can be tested.
Solution:
Linear Model for One-Way ANOVA with Unequal Numbers of Entries in Each Class
In One-Way ANOVA, we are analyzing the effect of a single factor (independent variable) on a dependent variable, where the factor has
multiple levels (or groups). The data in each group can be different in number (unequal entries per class).
Yij = μ + αi + ϵij
Where:
Yij is the observation for the j-th entry in the i-th group, μ is the overall mean, and αi is the effect of the i-th group.
ϵij is the random error associated with the j-th observation in the i-th group. It is assumed to be normally distributed with mean 0 and variance σ².
In this case, we have unequal numbers of observations in each group, so each group may have a different number of entries, i.e.,
n1 , n2 , … , nk , where k is the number of groups.
The total sum of squares (Total SS) measures the total variation in the data. It can be partitioned into two components:
1. Sum of Squares due to the factor (SS due to groups): This measures the variation between the group means and the overall mean.
2. Sum of Squares due to Error (SSE): This measures the variation within the groups (i.e., how much each observation deviates from
its respective group mean).
Total SS = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳ)²

SS due to groups = ∑_{i=1}^{k} n_i (Ȳi − Ȳ)², and SS due to error = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳi)².

Where:
Ȳ is the overall mean, Ȳi is the mean of the i-th group, and n_i is the number of observations in the i-th group.
The Mean Sums of Squares (MS) are calculated as:

MSG = (SS due to groups) / (k − 1), where k is the number of groups.

MSE = (SS due to error) / (N − k), where N is the total number of observations (sum of all group sizes).

Their expected values are E(MSE) = σ² and E(MSG) = σ² + (1/(k − 1)) ∑_{i=1}^{k} n_i α_i², so MSG estimates σ² only when all group effects are zero.
ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Between Groups | SS due to groups | k − 1 | MSG = SS due to groups / (k − 1) | F = MSG / MSE |
| Within Groups (Error) | SS due to error | N − k | MSE = SS due to error / (N − k) | — |
| Total | Total SS | N − 1 | — | — |
Hypothesis Testing
Null Hypothesis (H₀): H₀ : μ₁ = μ₂ = ⋯ = μ_k
Alternative Hypothesis (H₁): At least one group mean is different from the others, i.e., μ_i ≠ μ_j for some pair i ≠ j.
The test statistic is F = MSG / MSE. We compare the F-statistic with the critical value from the F-distribution table with k − 1 and N − k degrees of freedom. If the F-statistic is greater than the critical value, we reject the null hypothesis and conclude that there is a significant difference between the group means.
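The decision rule above can be sketched in code. A small illustration, assuming SciPy is available; the F value, k, and N below are hypothetical numbers, not from the text:

```python
from scipy.stats import f

k, N = 3, 15      # hypothetical: 3 groups, 15 observations in total
F_stat = 6.42     # hypothetical computed F = MSG / MSE
alpha = 0.05

# Critical value of the F distribution with k-1 and N-k degrees of freedom
F_crit = f.ppf(1 - alpha, k - 1, N - k)

# Reject H0 when the computed F exceeds the critical value
decision = "reject H0" if F_stat > F_crit else "fail to reject H0"
print(round(F_crit, 3), decision)
```

This replaces a lookup in a printed F-table: `f.ppf` is the inverse of the F cumulative distribution function.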
Q5: Linear Model for Two-Way ANOVA with One Observation per Cell
Year(s) Asked: W18 (E)
Solution:
Linear Model for Two-Way ANOVA with One Observation per Cell
In Two-Way ANOVA, we analyze the effects of two factors (independent variables) on a dependent variable. Each factor has multiple levels, and the design assumes one observation per cell, meaning each combination of factor levels has exactly one data point. The linear model is:

Yij = μ + αi + βj + ϵij

Where:
Yij is the observation at the i-th level of factor A (i = 1, …, a) and the j-th level of factor B (j = 1, …, b).
μ is the overall mean, αi is the effect of the i-th level of factor A, and βj is the effect of the j-th level of factor B.
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ² (ϵij ∼ N(0, σ²)).
Since there is one observation per cell (no replication), the interaction between A and B cannot be estimated separately from the error; the model therefore contains only the main effects, and the residual term absorbs any interaction.
The total sum of squares (Total SS) can be partitioned into three components:
1. Sum of Squares for Factor A (SS for A): Measures the variation due to the different levels of factor A.
2. Sum of Squares for Factor B (SS for B): Measures the variation due to the different levels of factor B.
3. Sum of Squares for Error (SS for Error): Measures the residual variation, which, with one observation per cell, includes any interaction between A and B.

Total SS = ∑_{i=1}^{a} ∑_{j=1}^{b} (Yij − Ȳ)²

Where Ȳ is the overall mean of all ab observations.
ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Factor A | SS for A | a − 1 | MS for A = SS for A / (a − 1) | F_A = MS for A / MS for Error |
| Factor B | SS for B | b − 1 | MS for B = SS for B / (b − 1) | F_B = MS for B / MS for Error |
| Error (Residual) | SS for Error | (a − 1)(b − 1) | MS for Error = SS for Error / ((a − 1)(b − 1)) | — |
| Total | Total SS | ab − 1 | — | — |
Hypothesis Testing
Null Hypothesis (H₀): All levels of factor A have the same effect on the dependent variable.
H 0 : α1 = α2 = ⋯ = αa = 0
Alternative Hypothesis (H₁): At least one level of factor A has a different effect on the dependent variable.
H₁ : At least one αi ≠ 0
Null Hypothesis (H₀): All levels of factor B have the same effect on the dependent variable.
H0 : β 1 = β 2 = ⋯ = β b = 0
Alternative Hypothesis (H₁): At least one level of factor B has a different effect on the dependent variable.
H₁ : At least one βj ≠ 0
Null Hypothesis (H₀): There is no interaction between factors A and B, i.e., H₀ : (αβ)ij = 0 for all i, j.
Alternative Hypothesis (H₁): There is an interaction between factors A and B. Note that with one observation per cell this hypothesis cannot be tested, since the interaction is confounded with the error term.
We compute the F-statistics for each factor and the interaction, and compare them with the critical F-value from the F-distribution table
at the chosen significance level.
Q6: Linear Model for Two-Way ANOVA with Multiple Entries per Cell
Year(s) Asked: W19 (B)
Solution:
Linear Model for Two-Way ANOVA with Multiple Entries per Cell
In a two-way ANOVA with multiple entries per cell, we analyze two factors (A and B) and their interaction on a dependent variable. This
model assumes more than one observation per combination of factor levels, so the error term is included in the model.
The linear model is:

Yijk = μ + αi + βj + (αβ)ij + ϵijk,  k = 1, 2, …, m

Where:
Yijk is the k-th observation for the i-th level of factor A and the j-th level of factor B.
μ is the overall mean, αi and βj are the main effects of factors A and B, and (αβ)ij is their interaction effect.
ϵijk is the random error term for the k-th observation, assumed to follow a normal distribution with mean 0 and variance σ² (ϵijk ∼ N(0, σ²)).
In this case, we assume multiple entries per cell, meaning that for each combination of factor levels (i, j), there are m observations (indexed by k).
The total sum of squares (Total SS) is the total variability in the data. It can be partitioned into the following components:
1. Sum of Squares for Factor A (SS for A): Measures the variation due to factor A.
2. Sum of Squares for Factor B (SS for B): Measures the variation due to factor B.
3. Sum of Squares for Interaction (SS for Interaction): Measures the variation due to the interaction between factors A and B.
4. Sum of Squares for Error (SS for Error): Measures the random variation not explained by factors A, B, or their interaction.
Total SS = ∑_{i=1}^{a} ∑_{j=1}^{b} ∑_{k=1}^{m} (Yijk − Ȳ)²

Where Ȳ is the overall mean of all abm observations.
ANOVA Table
The ANOVA table for two-way ANOVA with multiple entries per cell is as follows:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Factor A | SS for A | a − 1 | MS for A = SS for A / (a − 1) | F_A = MS for A / MS for Error |
| Factor B | SS for B | b − 1 | MS for B = SS for B / (b − 1) | F_B = MS for B / MS for Error |
| Interaction (A × B) | SS for AB | (a − 1)(b − 1) | MS for AB = SS for AB / ((a − 1)(b − 1)) | F_AB = MS for AB / MS for Error |
| Error (Residual) | SS for Error | ab(m − 1) | MS for Error = SS for Error / (ab(m − 1)) | — |
| Total | Total SS | abm − 1 | — | — |
Hypothesis Testing
The hypotheses for testing the effects in two-way ANOVA with multiple entries per cell are:
Null Hypothesis (H₀): All levels of factor A have the same effect on the dependent variable.
H 0 : α1 = α2 = ⋯ = αa = 0
Alternative Hypothesis (H₁): At least one level of factor A has a different effect on the dependent variable.
H₁ : At least one αi ≠ 0
Null Hypothesis (H₀): All levels of factor B have the same effect on the dependent variable.
H0 : β 1 = β 2 = ⋯ = β b = 0
Alternative Hypothesis (H₁): At least one level of factor B has a different effect on the dependent variable.
H₁ : At least one βj ≠ 0
Null Hypothesis (H₀): There is no interaction between factors A and B, i.e., H₀ : (αβ)ij = 0 for all i, j.
Alternative Hypothesis (H₁): There is an interaction between factors A and B, i.e., (αβ)ij ≠ 0 for some i, j.
We compute the F-statistics for each factor and the interaction, and compare them with the critical F-value from the F-distribution table
at the chosen significance level.
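The sum-of-squares partition for this design can be checked numerically. A pure-Python sketch for a balanced two-way layout with m entries per cell; the data values are invented for illustration:

```python
# data[i][j] holds the m observations for level i of A and level j of B
data = [
    [[5.0, 6.0], [8.0, 9.0]],   # A1: cells (A1,B1), (A1,B2)
    [[7.0, 7.0], [6.0, 5.0]],   # A2: cells (A2,B1), (A2,B2)
]
a, b, m = len(data), len(data[0]), len(data[0][0])

grand = sum(y for row in data for cell in row for y in cell) / (a * b * m)
mean_A = [sum(y for cell in row for y in cell) / (b * m) for row in data]
mean_B = [sum(y for row in data for y in row[j]) / (a * m) for j in range(b)]
cell_mean = [[sum(cell) / m for cell in row] for row in data]

SS_A = b * m * sum((ai - grand) ** 2 for ai in mean_A)
SS_B = a * m * sum((bj - grand) ** 2 for bj in mean_B)
SS_AB = m * sum((cell_mean[i][j] - mean_A[i] - mean_B[j] + grand) ** 2
                for i in range(a) for j in range(b))
SS_E = sum((y - cell_mean[i][j]) ** 2
           for i in range(a) for j in range(b) for y in data[i][j])
SS_T = sum((y - grand) ** 2 for row in data for cell in row for y in cell)

# For balanced data the partition Total SS = SS_A + SS_B + SS_AB + SS_E is exact
print(abs(SS_T - (SS_A + SS_B + SS_AB + SS_E)) < 1e-9)
```

Dividing each SS by its df from the table above and forming MS ratios against MS for Error then gives the three F-statistics.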
Q7: Linear Model for Three-Way ANOVA
Year(s) Asked: S19 (E)
Solution:
In a three-way ANOVA, we analyze three factors (A, B, and C) and their interactions on a dependent variable. Each factor has several
levels, and we test how these factors, as well as their interactions, affect the outcome.
The linear model is:

Yijkℓ = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk + ϵijkℓ

Where:
Yijkℓ is the ℓ-th observation (ℓ = 1, …, n) for the i-th level of factor A, the j-th level of factor B, and the k-th level of factor C.
μ is the overall mean; αi, βj, γk are the main effects; the bracketed terms are the interaction effects.
ϵijkℓ is the random error term for the ℓ-th observation at each combination of factors, assumed to be normally distributed with mean 0 and variance σ².
The model incorporates main effects for all three factors (A, B, and C) and their interactions, including two-way and three-way interactions.
The Total Sum of Squares (Total SS) is the total variation in the data. It is partitioned into several components as follows:
1. Sum of Squares for Factor A (SS for A): Measures the variation due to factor A.
2. Sum of Squares for Factor B (SS for B): Measures the variation due to factor B.
3. Sum of Squares for Factor C (SS for C): Measures the variation due to factor C.
4. Sums of Squares for the Interactions:
SS for (A * B): Measures the variation due to the interaction between factors A and B.
SS for (A * C): Measures the variation due to the interaction between factors A and C.
SS for (B * C): Measures the variation due to the interaction between factors B and C.
SS for (A * B * C): Measures the variation due to the three-way interaction between factors A, B, and C.
5. Sum of Squares for Error (SS for Error): Measures the residual or unexplained variation.
Total SS = SS for A + SS for B + SS for C + SS for (A * B) + SS for (A * C) + SS for (B * C) + SS for (A * B * C) + SS for Error
ANOVA Table

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic |
|---|---|---|---|---|
| Factor A | SS for A | a − 1 | SS for A / (a − 1) | MS for A / MS for Error |
| Factor B | SS for B | b − 1 | SS for B / (b − 1) | MS for B / MS for Error |
| Factor C | SS for C | c − 1 | SS for C / (c − 1) | MS for C / MS for Error |
| Interaction (A * B) | SS for (A * B) | (a − 1)(b − 1) | SS for (A * B) / ((a − 1)(b − 1)) | MS for (A * B) / MS for Error |
| Interaction (A * C) | SS for (A * C) | (a − 1)(c − 1) | SS for (A * C) / ((a − 1)(c − 1)) | MS for (A * C) / MS for Error |
| Interaction (B * C) | SS for (B * C) | (b − 1)(c − 1) | SS for (B * C) / ((b − 1)(c − 1)) | MS for (B * C) / MS for Error |
| Interaction (A * B * C) | SS for (A * B * C) | (a − 1)(b − 1)(c − 1) | SS for (A * B * C) / ((a − 1)(b − 1)(c − 1)) | MS for (A * B * C) / MS for Error |
| Error (Residual) | SS for Error | abc(n − 1) | SS for Error / (abc(n − 1)) | — |
| Total | Total SS | abcn − 1 | — | — |

Where n is the number of replications per cell.
Hypothesis Testing
The hypotheses for testing the effects in three-way ANOVA are as follows:
Null Hypothesis (H₀): All levels of factor A have the same effect on the dependent variable.
H 0 : α1 = α2 = ⋯ = αa = 0
Alternative Hypothesis (H₁): At least one level of factor A has a different effect on the dependent variable.
H₁ : At least one αi ≠ 0
Null Hypothesis (H₀): All levels of factor B have the same effect on the dependent variable.
H0 : β 1 = β 2 = ⋯ = β b = 0
Alternative Hypothesis (H₁): At least one level of factor B has a different effect on the dependent variable.
H₁ : At least one βj ≠ 0
Null Hypothesis (H₀): All levels of factor C have the same effect on the dependent variable.
H 0 : γ1 = γ2 = ⋯ = γc = 0
Alternative Hypothesis (H₁): At least one level of factor C has a different effect on the dependent variable.
H₁ : At least one γk ≠ 0
Null Hypothesis (H₀): There is no interaction between factors A and B, i.e., H₀ : (αβ)ij = 0 for all i, j.
Alternative Hypothesis (H₁): There is an interaction between factors A and B.
The same type of hypotheses is applied to test for interactions between factors A ∗ C , B ∗ C , and the three-way interaction A ∗ B ∗ C .
Conclusion
The three-way ANOVA model allows us to study the main effects and interactions between three factors. We can use the ANOVA table to
perform hypothesis testing, compute F-statistics, and determine which factors or interactions significantly affect the dependent
variable. By carefully examining the p-values for each effect, we can draw conclusions about the relationship between the factors.
Q8W16 (E): Linear Model in Three-Way ANOVA and Total Sum of Squares
Solution:
In the analysis of variance (ANOVA) with three-way classification, we examine the effects of three factors (A, B, and C) and their
interactions on a dependent variable. The general linear model for this analysis is expressed as:
Yijkℓ = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk + ϵijkℓ

Where:
Yijkℓ is the observed value for the ℓ-th replication at the i-th level of factor A, j-th level of factor B, and k-th level of factor C.
μ, αi, βj, γk, and the bracketed terms are the overall mean, the main effects, and the interaction effects, respectively.
ϵijkℓ is the random error term, assumed to follow a normal distribution with mean 0 and variance σ².
The total sum of squares (Total SS) represents the total variation in the dependent variable, which can be partitioned into various
components that correspond to the factors, their interactions, and error. The total sum of squares is given by:
Total SS = ∑_{i,j,k,ℓ} (Yijkℓ − Ȳ)², where Ȳ is the grand mean of all observations.
The total sum of squares can be broken down into seven components:
With n observations in each of the abc cells, the components are:

SS for A = bcn ∑_{i} (Ȳ_{i···} − Ȳ)²,
where Ȳ_{i···} is the mean of all observations at the i-th level of factor A (each level of A contains bcn observations).

SS for B = acn ∑_{j} (Ȳ_{·j··} − Ȳ)² and SS for C = abn ∑_{k} (Ȳ_{··k·} − Ȳ)², defined analogously.

SS for (A * B) = cn ∑_{i,j} (Ȳ_{ij··} − Ȳ_{i···} − Ȳ_{·j··} + Ȳ)²,
where Ȳ_{ij··} is the mean of all observations at the i-th level of factor A and j-th level of factor B.

SS for (A * C) = bn ∑_{i,k} (Ȳ_{i·k·} − Ȳ_{i···} − Ȳ_{··k·} + Ȳ)²,
where Ȳ_{i·k·} is the mean of all observations at the i-th level of factor A and k-th level of factor C.

SS for (B * C) = an ∑_{j,k} (Ȳ_{·jk·} − Ȳ_{·j··} − Ȳ_{··k·} + Ȳ)²,
where Ȳ_{·jk·} is the mean of all observations at the j-th level of factor B and k-th level of factor C.

SS for (A * B * C) = n ∑_{i,j,k} (Ȳ_{ijk·} − Ȳ_{ij··} − Ȳ_{i·k·} − Ȳ_{·jk·} + Ȳ_{i···} + Ȳ_{·j··} + Ȳ_{··k·} − Ȳ)²,
where Ȳ_{ijk·} is the mean of the n observations at the i-th level of factor A, j-th level of factor B, and k-th level of factor C.

SS for Error = ∑_{i,j,k,ℓ} (Yijkℓ − Ȳ_{ijk·})²
Conclusion
Thus, the total sum of squares can be partitioned into seven systematic components plus error:

Total SS = SS for A + SS for B + SS for C + SS for (A * B) + SS for (A * C) + SS for (B * C) + SS for (A * B * C) + SS for Error
These components allow us to analyze the contribution of each factor and its interactions to the total variability in the data. By
examining the mean squares for each component and performing hypothesis testing, we can determine the significance of the factors
and interactions in explaining the variation in the dependent variable.
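The seven-component breakup holds exactly for any balanced data set, which makes it easy to verify numerically. A pure-Python sketch; the dimensions and data below are arbitrary, generated only to check the identity:

```python
import random

random.seed(42)
a, b, c, n = 2, 3, 2, 2   # arbitrary numbers of levels and replications
Y = [[[[random.gauss(0, 1) for _ in range(n)] for _ in range(c)]
      for _ in range(b)] for _ in range(a)]

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

A, B, C, L = range(a), range(b), range(c), range(n)
g = mean(Y[i][j][k][l] for i in A for j in B for k in C for l in L)
mA = [mean(Y[i][j][k][l] for j in B for k in C for l in L) for i in A]
mB = [mean(Y[i][j][k][l] for i in A for k in C for l in L) for j in B]
mC = [mean(Y[i][j][k][l] for i in A for j in B for l in L) for k in C]
mAB = [[mean(Y[i][j][k][l] for k in C for l in L) for j in B] for i in A]
mAC = [[mean(Y[i][j][k][l] for j in B for l in L) for k in C] for i in A]
mBC = [[mean(Y[i][j][k][l] for i in A for l in L) for k in C] for j in B]
mABC = [[[mean(Y[i][j][k]) for k in C] for j in B] for i in A]

SS_A = b * c * n * sum((mA[i] - g) ** 2 for i in A)
SS_B = a * c * n * sum((mB[j] - g) ** 2 for j in B)
SS_C = a * b * n * sum((mC[k] - g) ** 2 for k in C)
SS_AB = c * n * sum((mAB[i][j] - mA[i] - mB[j] + g) ** 2 for i in A for j in B)
SS_AC = b * n * sum((mAC[i][k] - mA[i] - mC[k] + g) ** 2 for i in A for k in C)
SS_BC = a * n * sum((mBC[j][k] - mB[j] - mC[k] + g) ** 2 for j in B for k in C)
SS_ABC = n * sum((mABC[i][j][k] - mAB[i][j] - mAC[i][k] - mBC[j][k]
                  + mA[i] + mB[j] + mC[k] - g) ** 2
                 for i in A for j in B for k in C)
SS_E = sum((Y[i][j][k][l] - mABC[i][j][k]) ** 2
           for i in A for j in B for k in C for l in L)
SS_T = sum((Y[i][j][k][l] - g) ** 2
           for i in A for j in B for k in C for l in L)

parts = SS_A + SS_B + SS_C + SS_AB + SS_AC + SS_BC + SS_ABC + SS_E
print(abs(SS_T - parts) < 1e-9)  # the decomposition holds
```

Each mean here matches one of the dot-subscripted means defined above (e.g. `mAB[i][j]` is Ȳ_{ij··}).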
Question:
Q9W19 (F): In the analysis of variance, for three-way classification, write the linear model explaining the constants in it. Prepare the
ANOVA table for this analysis.
Solution:
In a three-way classification, we have three factors (A, B, C), and the response variable is influenced by the interaction of these three
factors. The general linear model for a three-way classification with n replications per cell is:

Yijkℓ = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk + ϵijkℓ

Where:
Yijkℓ is the observed value for the i-th level of factor A, the j-th level of factor B, the k-th level of factor C, and the ℓ-th replication.
μ : The overall mean.
αi : The effect of the i-th level of factor A, assuming all other factors are constant.
βj : The effect of the j-th level of factor B, assuming all other factors are constant.
γk : The effect of the k-th level of factor C, assuming all other factors are constant.
(αβ)ij, (αγ)ik, (βγ)jk, and (αβγ)ijk are interaction terms that capture the combined effects of two or more factors on the response variable.
ϵijkℓ is the random error term.
ANOVA Table:
We can break down the total variability in the data into different components, each corresponding to the main effects and interaction
effects. The ANOVA table for a three-way classification is:
| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value |
|---|---|---|---|---|
| Factor A | SSA | a − 1 | MSA = SSA / (a − 1) | FA = MSA / MSError |
| Factor B | SSB | b − 1 | MSB = SSB / (b − 1) | FB = MSB / MSError |
| Factor C | SSC | c − 1 | MSC = SSC / (c − 1) | FC = MSC / MSError |
| Interaction A × B | SSAB | (a − 1)(b − 1) | MSAB = SSAB / ((a − 1)(b − 1)) | FAB = MSAB / MSError |
| Interaction A × C | SSAC | (a − 1)(c − 1) | MSAC = SSAC / ((a − 1)(c − 1)) | FAC = MSAC / MSError |
| Interaction B × C | SSBC | (b − 1)(c − 1) | MSBC = SSBC / ((b − 1)(c − 1)) | FBC = MSBC / MSError |
| Interaction A × B × C | SSABC | (a − 1)(b − 1)(c − 1) | MSABC = SSABC / ((a − 1)(b − 1)(c − 1)) | FABC = MSABC / MSError |
| Error (Residual) | SSError | abc(n − 1) | MSError = SSError / (abc(n − 1)) | — |
| Total | SSTotal | abcn − 1 | — | — |
Where the hypotheses tested are:
Factor A: H₀ : α₁ = α₂ = ⋯ = α_a = 0 versus H₁ : at least one αi ≠ 0.
Factor B: H₀ : β₁ = β₂ = ⋯ = β_b = 0 versus H₁ : at least one βj ≠ 0.
Factor C: H₀ : γ₁ = γ₂ = ⋯ = γ_c = 0 versus H₁ : at least one γk ≠ 0.
Interaction Effects: the null hypotheses for each interaction term (A × B, A × C, B × C, and A × B × C) are that the corresponding interaction effects are zero.
Question:
Q10S16 (E): Provide the complete analysis of one-way classified data with an equal number of entries in each class.
Solution:
In one-way classification, we are dealing with a single factor, and the data is divided into different classes (groups or treatments). The
objective is to determine if there are significant differences between the means of these groups.
Assumptions:
The data is normally distributed within each class, the observations are independent, and the error variance is the same in every class.
Let the data be classified into k groups, and each group has the same number of entries, say n.
Yij = μ + αi + ϵij
Where:
μ is the overall mean.
αi is the effect of the i-th group.
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ².
1. Compute the Total Sum of Squares (SS_Total): This measures the total variation of the observations around the overall mean Ȳ:

SSTotal = ∑_{i=1}^{k} ∑_{j=1}^{n} (Yij − Ȳ)²
2. Compute the Sum of Squares Between Groups (SS_Between): This measures the variation due to the differences between the
group means.
SSBetween = n ∑_{i=1}^{k} (Ȳi − Ȳ)²
3. Compute the Sum of Squares Within Groups (SS_Within): This measures the variation within each group due to the random error.
SSWithin = ∑_{i=1}^{k} ∑_{j=1}^{n} (Yij − Ȳi)²
4. Degrees of Freedom: dfBetween = k − 1, dfWithin = k(n − 1), and dfTotal = kn − 1.
5. Mean Squares:

MSBetween = SSBetween / dfBetween

MSWithin = SSWithin / dfWithin
6. F-Statistic: The F-statistic is used to test if there are significant differences between the group means. It is calculated as:
F = MSBetween / MSWithin
Hypothesis Testing:
Null Hypothesis (H₀): The means of all groups are equal (α₁ = α₂ = ⋯ = α_k = 0).
Alternative Hypothesis (H₁): At least one αi ≠ 0, i.e., at least one group mean differs.
Decision Rule:
If F is greater than the critical value Fcritical from the F-distribution table (at a chosen significance level, e.g., 0.05), reject the null
hypothesis. This indicates that there is a significant difference between the group means.
If F is less than Fcritical , do not reject the null hypothesis, meaning there is no significant difference between the group means.
ANOVA Table:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value |
|---|---|---|---|---|
| Between Groups | SSBetween | k − 1 | MSBetween = SSBetween / (k − 1) | F = MSBetween / MSWithin |
| Within Groups | SSWithin | k(n − 1) | MSWithin = SSWithin / (k(n − 1)) | — |
| Total | SSTotal | kn − 1 | — | — |
Example: suppose k = 3 classes, each with n = 5 entries (columns taken as the three classes):

| Class 1 | Class 2 | Class 3 |
|---|---|---|
| 8 | 10 | 15 |
| 7 | 9 | 16 |
| 9 | 11 | 14 |
| 8 | 10 | 17 |
| 7 | 8 | 16 |
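Treating the three columns above as Classes 1–3, the whole analysis can be reproduced in a few lines of Python:

```python
groups = [
    [8, 7, 9, 8, 7],       # Class 1
    [10, 9, 11, 10, 8],    # Class 2
    [15, 16, 14, 17, 16],  # Class 3
]
k, n = len(groups), len(groups[0])
N = k * n

grand = sum(sum(g) for g in groups) / N          # overall mean
means = [sum(g) / n for g in groups]             # class means

SS_between = n * sum((m - grand) ** 2 for m in means)
SS_within = sum((y - means[i]) ** 2 for i, g in enumerate(groups) for y in g)

MS_between = SS_between / (k - 1)
MS_within = SS_within / (k * (n - 1))
F = MS_between / MS_within
print(round(SS_between, 1), round(SS_within, 1), round(F, 2))  # 166.8 13.2 75.82
```

With F ≈ 75.8 far above the 5% critical value for (2, 12) degrees of freedom (about 3.89), H₀ is rejected: the three class means differ significantly.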
Question:
Q11S18 (A): Give a complete statistical analysis of one-way classified data with an unequal number of entries in different classes.
Solution:
In this case, we have a one-way classification, but the number of observations per class (or group) may not be equal. This situation is
common in real-world data, where different groups may have different sample sizes.
The basic approach remains the same, but the calculation of the sum of squares (SS) and degrees of freedom (df) needs to account for
the unequal number of entries in each class.
Assumptions:
The data is normally distributed within each class, the observations are independent, and the error variance is the same in every class.
Let the data be classified into k groups, with n1 , n2 , … , nk entries in each group.
Yij = μ + αi + ϵij
Where:
μ is the overall mean and αi is the effect of the i-th group.
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ².
1. Compute the Total Sum of Squares (SS_Total):
SSTotal = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳ)²
2. Compute the Sum of Squares Between Groups (SS_Between): This measures the variation due to the differences between the group means. Since the number of observations in each group is unequal, we multiply each group's mean deviation from the overall mean by the sample size for that group:

SSBetween = ∑_{i=1}^{k} n_i (Ȳi − Ȳ)²
3. Compute the Sum of Squares Within Groups (SS_Within): This measures the variation within each group due to the random error. We compute the deviation of each observation within a group from that group's mean:

SSWithin = ∑_{i=1}^{k} ∑_{j=1}^{n_i} (Yij − Ȳi)²
4. Degrees of Freedom: dfBetween = k − 1, dfWithin = N − k, and dfTotal = N − 1, where N = n₁ + n₂ + ⋯ + n_k.
5. Mean Squares:

MSBetween = SSBetween / dfBetween

MSWithin = SSWithin / dfWithin
6. F-Statistic: The F-statistic is used to test if there are significant differences between the group means. It is calculated as:
F = MSBetween / MSWithin
Hypothesis Testing:
Null Hypothesis (H₀): The means of all groups are equal (α₁ = α₂ = ⋯ = α_k = 0).
Alternative Hypothesis (H₁): At least one αi ≠ 0, i.e., at least one group mean differs.
Decision Rule:
If F is greater than the critical value Fcritical from the F-distribution table (at a chosen significance level, e.g., 0.05), reject the null
hypothesis. This indicates that there is a significant difference between the group means.
If F is less than Fcritical , do not reject the null hypothesis, meaning there is no significant difference between the group means.
ANOVA Table:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value |
|---|---|---|---|---|
| Between Groups | SSBetween | k − 1 | MSBetween = SSBetween / (k − 1) | F = MSBetween / MSWithin |
| Within Groups | SSWithin | N − k | MSWithin = SSWithin / (N − k) | — |
| Total | SSTotal | N − 1 | — | — |
Example: data classified into three classes (columns):

| Class 1 | Class 2 | Class 3 |
|---|---|---|
| 8 | 10 | 15 |
| 7 | 9 | 16 |
| 9 | 11 | 14 |
| 8 | 10 | 17 |
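The same analysis can be written as a reusable function that accepts unequal class sizes; the group data below are hypothetical, with class sizes n₁ = 3, n₂ = 4, n₃ = 5 chosen unequal for illustration:

```python
def oneway_anova(groups):
    """One-way ANOVA for classes with (possibly) unequal sizes n_i."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Weight each class mean's squared deviation by that class's size n_i
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    F = (ss_between / (k - 1)) / (ss_within / (N - k))
    return F, k - 1, N - k

F, df1, df2 = oneway_anova([[8, 7, 9], [10, 9, 11, 10], [15, 16, 14, 17, 16]])
print(df1, df2)  # 2 and 9 degrees of freedom
```

The only changes from the equal-n case are the per-group weights nᵢ in SS_Between and the error degrees of freedom N − k.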
Question:
Q12S16 (A): Give the complete statistical analysis of two-way classified data with one observation per cell.
Solution:
In a two-way classification with one observation per cell, we have two factors, and each combination of their levels has exactly one observation. Because there is no replication, the interaction between the two factors cannot be separated from the error; the residual (interaction) sum of squares therefore serves as the error term for testing the main effects.
In this case, the data is organized into a matrix with r × c cells, and each cell contains one observation.
The model for the data is:
Yij = μ + αi + βj + ϵij
Where:
Yij is the observation at the i-th level of factor A and the j-th level of factor B.
μ is the overall mean, αi is the effect of the i-th level of factor A, and βj is the effect of the j-th level of factor B.
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ².
1. Compute the Total Sum of Squares (SS_Total):

SSTotal = ∑_{i=1}^{r} ∑_{j=1}^{c} (Yij − Ȳ)²

Where Ȳ is the overall mean of all the observations.
2. Compute the Sum of Squares for Factor A (SS_A): This represents the variation due to factor A, i.e., the effect of the different levels
of factor A. It is calculated as:
SSA = c ∑_{i=1}^{r} (Ȳi − Ȳ)²

Where Ȳi is the mean of the i-th level of factor A, and c is the number of levels of factor B.
3. Compute the Sum of Squares for Factor B (SS_B): This represents the variation due to factor B. It is calculated as:
SSB = r ∑_{j=1}^{c} (Ȳj − Ȳ)²

Where Ȳj is the mean of the j-th level of factor B, and r is the number of levels of factor A.
4. Compute the Residual Sum of Squares (SS_AB): This represents the variation not explained by the main effects of A and B. It is calculated as:

SSAB = ∑_{i=1}^{r} ∑_{j=1}^{c} (Yij − Ȳi − Ȳj + Ȳ)²

Where Ȳi and Ȳj are the group means for factors A and B, respectively.
5. Sum of Squares for Error (SS_Error): Since we only have one observation per cell, there is no separate within-cell variation; the residual term is used as the error:

SSError = SSAB, with (r − 1)(c − 1) degrees of freedom.
6. Degrees of Freedom: dfA = r − 1, dfB = c − 1, dfError = (r − 1)(c − 1), and dfTotal = rc − 1.
7. Mean Squares:

MSA = SSA / dfA

MSB = SSB / dfB

MSError = SSError / dfError
8. F-Statistics:

FA = MSA / MSError

FB = MSB / MSError

With one observation per cell, no F-test for the interaction is possible, because MSError is itself the residual (interaction) mean square.
Hypothesis Testing:
Null Hypothesis for Factor A (H₀): There is no significant effect of factor A on the observations (α₁ = α₂ = ⋯ = α_r = 0).
Null Hypothesis for Factor B (H₀): There is no significant effect of factor B on the observations (β₁ = β₂ = ⋯ = β_c = 0).
(An interaction hypothesis cannot be tested in this design, because the interaction is confounded with the error.)
Decision Rule:
If the computed F-statistic is greater than the critical value from the F-distribution table (at a chosen significance level, say 0.05),
reject the null hypothesis, indicating a significant effect.
If the computed F-statistic is less than the critical value, do not reject the null hypothesis.
ANOVA Table:

| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Value |
|---|---|---|---|---|
| Factor A | SSA | r − 1 | MSA | FA |
| Factor B | SSB | c − 1 | MSB | FB |
| Error (Residual) | SSError | (r − 1)(c − 1) | MSError | — |
| Total | SSTotal | rc − 1 | — | — |
Example:
Let’s say we have the following data with two factors (A and B):
| Factor A \ Factor B | B1 | B2 |
|---|---|---|
| A1 | 5 | 8 |
| A2 | 7 | 6 |
1. Compute the means for each factor level and the overall mean.
2. Calculate the sum of squares for each source of variation (A, B, and interaction).
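For the small 2×2 example above, these quantities can be computed directly; note that the residual term plays the role of error here:

```python
Y = [[5.0, 8.0],   # row A1, at levels B1 and B2
     [7.0, 6.0]]   # row A2
r, c = len(Y), len(Y[0])

grand = sum(map(sum, Y)) / (r * c)
row_means = [sum(row) / c for row in Y]
col_means = [sum(Y[i][j] for i in range(r)) / r for j in range(c)]

SS_A = c * sum((m - grand) ** 2 for m in row_means)
SS_B = r * sum((m - grand) ** 2 for m in col_means)
SS_resid = sum((Y[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(r) for j in range(c))
SS_total = sum((Y[i][j] - grand) ** 2 for i in range(r) for j in range(c))

print(SS_A, SS_B, SS_resid, SS_total)  # 0.0 1.0 4.0 5.0
```

The row means are both 6.5, so factor A contributes nothing (SS_A = 0), and the decomposition SS_Total = SS_A + SS_B + SS_resid checks out as 5 = 0 + 1 + 4.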
Question:
Q13S18 (E): Give a complete statistical analysis of two-way classified data with one observation per cell.
Solution:
This is similar to the previous question but explicitly asks for a complete statistical analysis. Let's go through the steps once again in
detail for a clearer understanding.
Data Structure:
Each combination of levels of Factor A and Factor B contains exactly one observation. Therefore, the data are arranged in a r × c matrix
with one observation per cell.
The Model:
Yij = μ + αi + βj + ϵij
Where:
Yij is the observation at the i-th level of Factor A and the j-th level of Factor B.
μ is the overall mean, αi is the effect of the i-th level of Factor A, and βj is the effect of the j-th level of Factor B.
ϵij is the random error term (assumed to be normally distributed with mean 0 and variance σ²).
1. Total Sum of Squares (SS_Total):

SSTotal = ∑_{i=1}^{r} ∑_{j=1}^{c} (Yij − Ȳ)², where Ȳ = (1/rc) ∑_{i=1}^{r} ∑_{j=1}^{c} Yij.
2. Sum of Squares for Factor A (SS_A):

SSA = c ∑_{i=1}^{r} (Ȳi − Ȳ)², where Ȳi = (1/c) ∑_{j=1}^{c} Yij.
3. Sum of Squares for Factor B (SS_B):

SSB = r ∑_{j=1}^{c} (Ȳj − Ȳ)², where Ȳj = (1/r) ∑_{i=1}^{r} Yij.
4. Residual Sum of Squares (SS_AB):

SSAB = ∑_{i=1}^{r} ∑_{j=1}^{c} (Yij − Ȳi − Ȳj + Ȳ)²

Where Ȳi and Ȳj are the means for the levels of Factors A and B, respectively.
5. Sum of Squares for Error (SS_Error):
Because there is only one observation per cell, the within-cell error sum of squares is identically zero, so no separate estimate of error is available. The residual (interaction) sum of squares SSRes therefore serves as the error term, with (r − 1)(c − 1) degrees of freedom; equivalently, the model is taken to be additive (no interaction).
6. Degrees of Freedom:
dfA = r − 1, dfB = c − 1, dfRes = (r − 1)(c − 1), dfTotal = rc − 1
7. Mean Squares:
MSA = SSA / (r − 1)
MSB = SSB / (c − 1)
MSRes = SSRes / [(r − 1)(c − 1)]
8. F-Statistics:
The F-statistics are used to test the significance of the two factors. They are calculated as:
FA = MSA / MSRes, with (r − 1, (r − 1)(c − 1)) degrees of freedom
FB = MSB / MSRes, with (c − 1, (r − 1)(c − 1)) degrees of freedom
The interaction itself cannot be tested, since there is no independent estimate of error.
9. Hypothesis Testing:
Null Hypothesis for Factor A (H0): There is no significant effect of Factor A on the observations.
Null Hypothesis for Factor B (H0): There is no significant effect of Factor B on the observations.
If the computed F-statistic exceeds the critical value, reject the null hypothesis; otherwise, do not reject it.
ANOVA Table:
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic
Factor A | SSA | r − 1 | MSA | FA = MSA/MSRes
Factor B | SSB | c − 1 | MSB | FB = MSB/MSRes
Residual | SSRes | (r − 1)(c − 1) | MSRes | -
Total | SSTotal | rc − 1 | - | -
Example:
Let’s assume the following data:
Factor A \ Factor B B1 B2
A1 12 14
A2 18 20
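The partition for this table can be sketched numerically (a hedged illustration of the one-observation-per-cell analysis described above):

```python
# Rows are levels A1, A2; columns are levels B1, B2.
data = [[12.0, 14.0],
        [18.0, 20.0]]
r, c = 2, 2
grand = sum(map(sum, data)) / (r * c)                                  # 16.0
row_means = [sum(row) / c for row in data]                             # [13.0, 19.0]
col_means = [sum(data[i][j] for i in range(r)) / r for j in range(c)]  # [15.0, 17.0]

ss_a = c * sum((m - grand) ** 2 for m in row_means)                    # 36.0
ss_b = r * sum((m - grand) ** 2 for m in col_means)                    # 4.0
ss_total = sum((y - grand) ** 2 for row in data for y in row)          # 40.0
ss_res = ss_total - ss_a - ss_b                                        # 0.0
print(ss_a, ss_b, ss_res)
```

Note that the residual comes out exactly zero here: these four values are perfectly additive, so this tiny illustration leaves no error estimate and the F-tests cannot actually be carried out on it.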
Question:
Q14S19 (A): In a two-way ANOVA with m (m > 1) entries per cell, explain the linear model and obtain the least square estimates of the
parameters in this mathematical model. State the hypothesis to be tested and interpret the conclusions under the following situations
in this ANOVA:
(i) When the hypothesis of interaction between the two factors is accepted at the 5% level of significance
(ii) When the hypothesis of interaction between the two factors is rejected.
Solution:
Introduction:
In a two-way ANOVA with m entries per cell, we have two factors, Factor A and Factor B, each with r and c levels, respectively. Each
combination of Factor A and Factor B has m observations.
Data Structure:
Let:
The number of observations in each cell is m, so we have rc cells, and each cell contains m observations.
The Model:
The linear model for the two-way ANOVA with m entries per cell is:
Yijk = μ + αi + βj + (αβ)ij + ϵijk
Where:
Yijk is the k-th observation at the i-th level of Factor A and the j-th level of Factor B.
μ is the overall mean.
αi is the effect of the i-th level of Factor A.
βj is the effect of the j-th level of Factor B.
(αβ)ij is the interaction effect of the i-th level of Factor A with the j-th level of Factor B.
ϵijk is the random error term for the k-th observation in the (i, j)-th cell.
Assumptions:
1. The errors ϵijk are independent and normally distributed with mean 0 and common variance σ².
2. Sums of Squares:
Total: SSTotal = ∑_{i=1}^{r} ∑_{j=1}^{c} ∑_{k=1}^{m} (Yijk − Ȳ)², where Ȳ = (1/rcm) ∑_{i}∑_{j}∑_{k} Yijk is the grand mean.
Factor A: SSA = cm ∑_{i=1}^{r} (Ȳi − Ȳ)², where Ȳi = (1/cm) ∑_{j=1}^{c} ∑_{k=1}^{m} Yijk.
Factor B: SSB = rm ∑_{j=1}^{c} (Ȳj − Ȳ)², where Ȳj = (1/rm) ∑_{i=1}^{r} ∑_{k=1}^{m} Yijk.
Interaction: SSAB = m ∑_{i=1}^{r} ∑_{j=1}^{c} (Ȳij − Ȳi − Ȳj + Ȳ)², where Ȳij = (1/m) ∑_{k=1}^{m} Yijk is the mean of cell (i, j).
Error: SSError = ∑_{i=1}^{r} ∑_{j=1}^{c} ∑_{k=1}^{m} (Yijk − Ȳij)²
3. Mean Squares:
MSA = SSA / dfA, with dfA = r − 1
MSB = SSB / dfB, with dfB = c − 1
MSAB = SSAB / dfAB, with dfAB = (r − 1)(c − 1)
MSError = SSError / dfError, with dfError = rc(m − 1)
4. F-Statistics:
The F-statistics are calculated as:
FA = MSA / MSError
FB = MSB / MSError
FAB = MSAB / MSError
5. Hypothesis Testing:
Null Hypothesis for Factor A: H0: α1 = α2 = ⋯ = αr = 0 (No effect of Factor A)
Null Hypothesis for Factor B: H0: β1 = β2 = ⋯ = βc = 0 (No effect of Factor B)
Null Hypothesis for Interaction: H0: (αβ)ij = 0 for all i and j (No interaction effect between Factor A and Factor B)
6. Decision Rule:
If the computed F -statistic for Factor A or Factor B is greater than the critical value from the F -distribution table, reject the null
hypothesis.
If the computed F -statistic for Interaction is significant, then there is interaction between the factors, and the main effects should
be interpreted carefully.
7. Interpretation of Results:
(i) When the hypothesis of interaction between the two factors is accepted at the 5% significance level (i.e., the interaction F-test is significant, so interaction is present):
The effect of one factor depends on the level of the other factor.
The interpretation should focus on the interaction and avoid interpreting the main effects independently.
(ii) When the hypothesis of interaction between the two factors is rejected (i.e., the interaction F-test is not significant):
In this case, the main effects of Factor A and Factor B can be interpreted independently.
The effects of Factor A and Factor B are additive and do not depend on each other.
Conclusion:
The analysis involves partitioning the variance into components attributable to Factor A, Factor B, their interaction, and error. Based on
the F-statistics and hypothesis testing, you can interpret whether the factors or their interaction have a significant effect on the data.
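The partition and F-statistics above can be checked numerically on a small hypothetical 2 × 2 layout with m = 2 entries per cell (all cell values below are invented, chosen for easy arithmetic):

```python
# Cells are indexed (level of A, level of B); each holds m = 2 observations.
cells = {
    (0, 0): [3.0, 5.0], (0, 1): [6.0, 8.0],
    (1, 0): [4.0, 6.0], (1, 1): [11.0, 13.0],
}
r, c, m = 2, 2, 2
all_obs = [y for v in cells.values() for y in v]
grand = sum(all_obs) / (r * c * m)

cell_mean = {k: sum(v) / m for k, v in cells.items()}
row_mean = [sum(y for j in range(c) for y in cells[(i, j)]) / (c * m) for i in range(r)]
col_mean = [sum(y for i in range(r) for y in cells[(i, j)]) / (r * m) for j in range(c)]

ss_a = c * m * sum((rm - grand) ** 2 for rm in row_mean)
ss_b = r * m * sum((cm - grand) ** 2 for cm in col_mean)
ss_ab = m * sum((cell_mean[(i, j)] - row_mean[i] - col_mean[j] + grand) ** 2
                for i in range(r) for j in range(c))
ss_e = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)
ss_total = sum((y - grand) ** 2 for y in all_obs)
print(ss_a, ss_b, ss_ab, ss_e)  # 18.0 50.0 8.0 8.0

# F-statistics against MS_Error, with df_Error = r*c*(m-1) = 4
mse = ss_e / (r * c * (m - 1))
ms_a = ss_a / (r - 1)
ms_b = ss_b / (c - 1)
ms_ab = ss_ab / ((r - 1) * (c - 1))
print(ms_a / mse, ms_b / mse, ms_ab / mse)  # 9.0 25.0 4.0
```

The five sums of squares add back to SS_Total (84 here), which is a useful sanity check on any hand computation.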
Question:
Q15W19 (B): Write the linear model for two-way classification with m entries per cell, explaining the constants in the model. State the
null and alternative hypotheses to be tested, and prepare the ANOVA table for two-way ANOVA with m entries (m > 1) per cell.
Solution:
Introduction:
In a two-way classification with m entries per cell, we are studying the interaction between two factors, Factor A and Factor B, each
having multiple levels. This type of ANOVA is used when we want to analyze how two factors affect a dependent variable, both
independently and in interaction with each other.
Data Structure:
Let:
Each combination of Factor A and Factor B contains m observations, so the total number of observations is rc × m.
The general linear model for two-way classification with m entries per cell is as follows:
Yijk = μ + αi + βj + (αβ)ij + ϵijk
Where:
Yijk is the k-th observation in the i-th level of Factor A and the j-th level of Factor B.
μ is the overall mean.
αi represents the effect of the i-th level of Factor A on the dependent variable.
βj represents the effect of the j-th level of Factor B on the dependent variable.
(αβ)ij represents the interaction effect of the i-th level of Factor A with the j-th level of Factor B.
ϵijk is the random error term for the k-th observation in the (i, j)-th cell (assumed to be independent and normally distributed with mean 0 and variance σ²).
Assumptions:
The errors ϵijk are independent and normally distributed with mean 0 and variance σ 2 .
Hypotheses to be Tested:
1. For Factor A:
Null hypothesis: H0: α1 = α2 = ⋯ = αr = 0 (No effect of Factor A); Alternative: at least one αi ≠ 0.
2. For Factor B:
Null hypothesis: H0: β1 = β2 = ⋯ = βc = 0 (No effect of Factor B); Alternative: at least one βj ≠ 0.
3. For Interaction:
Null hypothesis: H0: (αβ)ij = 0 for all i and j (No interaction between Factor A and Factor B); Alternative: at least one (αβ)ij ≠ 0.
ANOVA Table:
The ANOVA table partitions the total variability into components attributed to Factor A, Factor B, their interaction, and the error term.
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Statistic (F)
Factor A | SSA | dfA = r − 1 | MSA = SSA/dfA | FA = MSA/MSError
Factor B | SSB | dfB = c − 1 | MSB = SSB/dfB | FB = MSB/MSError
Interaction (A × B) | SSAB | dfAB = (r − 1)(c − 1) | MSAB = SSAB/dfAB | FAB = MSAB/MSError
Error | SSError | dfError = rc(m − 1) | MSError = SSError/dfError | N/A
Total | SSTotal | dfTotal = rcm − 1 | N/A | N/A
Computation of the Sums of Squares:
1. Total Sum of Squares (SS_Total):
SSTotal = ∑_{i=1}^{r} ∑_{j=1}^{c} ∑_{k=1}^{m} (Yijk − Ȳ)²
2. Sum of Squares for Factor A (SS_A):
SSA = cm ∑_{i=1}^{r} (Ȳi − Ȳ)²
3. Sum of Squares for Factor B (SS_B):
SSB = rm ∑_{j=1}^{c} (Ȳj − Ȳ)²
4. Sum of Squares for Interaction (SS_AB):
SSAB = m ∑_{i=1}^{r} ∑_{j=1}^{c} (Ȳij − Ȳi − Ȳj + Ȳ)²
Where Ȳij is the mean for the i-th level of Factor A and the j-th level of Factor B.
5. Sum of Squares for Error (SS_Error):
SSError = ∑_{i=1}^{r} ∑_{j=1}^{c} ∑_{k=1}^{m} (Yijk − Ȳij)²
Conclusion:
The ANOVA table helps to partition the total variability into the components due to Factor A, Factor B, their interaction, and the error. By
calculating the F-statistics and comparing them with the critical value from the F-distribution table, we can decide whether to reject or
fail to reject the null hypotheses for Factor A, Factor B, and their interaction.
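The decision step can be sketched as a tiny comparison routine; the two F-statistics below are hypothetical, and 7.71 is the standard table value of F_0.05 with (1, 4) degrees of freedom:

```python
# Sketch of the ANOVA decision rule: compare a computed F-statistic with the
# tabulated critical value at the chosen significance level.
def decide(f_stat, f_crit):
    """Return the decision for one null hypothesis."""
    return "reject H0" if f_stat > f_crit else "fail to reject H0"

f_crit = 7.71                  # F_0.05(1, 4), from an F-table
print(decide(9.0, f_crit))     # reject H0
print(decide(4.0, f_crit))     # fail to reject H0
```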
Question:
Q16W18 (B): Define: (i) Fixed effect model, (ii) Random effect model.
Solution:
Introduction:
In the context of ANOVA (Analysis of Variance), we distinguish between Fixed effect models and Random effect models based on the
type of factors (or treatments) being analyzed. These models describe how the factors (independent variables) are treated in terms of
their levels and variability.
Definition:
In a Fixed effect model, the levels of the factor(s) under consideration are fixed and specifically chosen by the researcher. The primary
goal of the fixed effect model is to make inferences about the specific levels of the factor(s) that are included in the study.
Key Features:
Factors (Treatments) are fixed: The levels of the factor(s) are chosen deliberately, and the conclusions drawn are only valid for
these specific levels.
Focus on comparing the levels of the factor: The aim is to test whether different levels of the factor have a significant effect on the
response variable.
No generalization: Inferences are limited to the specific levels in the study, and the model does not generalize to other levels
beyond those studied.
Mathematical Form:
In a simple one-way fixed effect model, the linear model can be written as:
Yij = μ + αi + ϵij
Where:
μ is the overall mean, and αi is the fixed effect of the i-th level of the factor (with the usual constraint ∑ αi = 0).
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ².
Example:
If a researcher is studying the effect of three different teaching methods on student performance, and they deliberately choose those
three methods to study, they are using a fixed effect model. The conclusions about these specific methods cannot be generalized to
other methods that were not included in the study.
Definition:
In a Random effect model, the factor(s) under consideration are considered to be random samples from a larger population of possible
factor levels. The researcher is interested in making inferences about the variability in the population of levels, not about specific levels.
Key Features:
Factors are random: The levels of the factor(s) are randomly selected from a larger population. These levels are not fixed and could
vary from one study to another.
Focus on estimating variance components: The aim is to understand the variance within and between groups and how the
random factor(s) contribute to the variability in the response.
Generalization: The inferences are meant to generalize to the larger population from which the factor levels were randomly chosen.
Mathematical Form:
In a simple one-way random effect model, the linear model can be written as:
Yij = μ + αi + ϵij
Where:
αi is the random effect of the i-th level, assumed to be normally distributed with mean 0 and variance σα².
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ², independently of αi.
Example:
If a researcher is studying the effect of different schools on student performance, and the schools are randomly selected from a population of schools, the schools are considered as random effects. The conclusions then generalize to the whole population of schools rather than applying only to the particular schools that happened to be sampled.
Feature | Fixed Effect Model | Random Effect Model
Nature of Factors | Factors are fixed and specifically chosen | Factors are randomly chosen from a larger population
Inferences | Valid for the specific levels studied | Generalize to the entire population of factor levels
Focus | Comparison of specific levels of the factor | Estimation of variability due to random factors
Example | Comparing specific drug doses | Comparing a random selection of hospitals or schools
Variance Components | Only the within-group (error) variance is considered | Both within-group and between-group variances are considered
Conclusion:
Fixed effect models are used when the researcher is interested in the specific levels of the factor and wants to compare these
levels.
Random effect models are used when the researcher is interested in the variability of the factor levels and wants to generalize the
results to a broader population.
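The contrast can be made concrete with a one-way random effects sketch. Under the balanced random effects model, E(MS_Groups) = σ² + n·σα², so the between-level variance can be estimated by the method of moments as (MS_Groups − MSE)/n. The school scores below are invented for illustration:

```python
# Hypothetical scores for 3 randomly sampled schools, 4 students each.
groups = [[6, 7, 7, 8], [4, 5, 5, 6], [8, 9, 9, 10]]
k, n = len(groups), len(groups[0])
N = k * n

grand = sum(sum(g) for g in groups) / N
means = [sum(g) / n for g in groups]

# Between-schools and within-schools mean squares
ms_groups = n * sum((m - grand) ** 2 for m in means) / (k - 1)
mse = sum((y - means[i]) ** 2 for i, g in enumerate(groups) for y in g) / (N - k)

sigma2_error = mse                     # estimate of sigma^2
sigma2_alpha = (ms_groups - mse) / n   # method-of-moments estimate of sigma_alpha^2
print(round(sigma2_error, 3), round(sigma2_alpha, 3))  # 0.667 3.833
```

Most of the variability here is attributed to differences between schools, which is exactly the kind of statement a random effects analysis aims to make.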
Question:
Q17W19 (E): In the context of analysis of variance, explain the linear model. Define the fixed effect model, mixed effect model, and
random effect model.
Solution:
Introduction:
In the context of ANOVA (Analysis of Variance), the linear model describes the relationship between the response variable and the
factors involved in the experiment. The fixed effect model, random effect model, and mixed effect model are three different ways of
treating the factors (independent variables) based on whether they are considered fixed, random, or a combination of both.
Yij = μ + αi + ϵij
Where:
Yij is the j-th observation from the i-th group (factor level).
μ is the overall mean and αi is the effect of the i-th factor level.
ϵij is the random error associated with the j-th observation in the i-th group, assumed to be independent and normally distributed with mean 0 and variance σ².
This model represents the basic structure for ANOVA, where we analyze the variability in the response variable by partitioning it into
components attributable to factors and random errors.
Definition:
In the Fixed effect model, the factors or treatments are considered fixed, meaning the levels of the factor(s) are specifically chosen and
are of primary interest to the researcher. The fixed effects model is used when the researcher is interested in comparing the specific
levels of a factor and drawing conclusions only about those levels.
Key Points:
The levels of the factors are predefined and fixed by the researcher.
The goal is to make inferences about the specific levels of the factors studied, not to generalize to other levels.
Mathematical Form:
The linear model for a one-way fixed effect can be written as:
Yij = μ + αi + ϵij
Where αi is a fixed effect for the i-th treatment or factor level, and the error terms ϵij are assumed to be normally distributed with mean 0 and variance σ².
Example:
If a study is conducted to compare three specific brands of fertilizers, and these brands are preselected, then the levels of the factor
(fertilizers) are fixed. The researcher is only interested in comparing these three brands.
Definition:
In the Random effect model, the factors are considered random, meaning the levels of the factor are randomly chosen from a larger
population. The goal is to make inferences about the variability in the population of factor levels, rather than about the specific levels
themselves.
Key Points:
The levels of the factors are randomly selected from a larger population of possible levels.
The focus is on the variance due to the random effects, and generalizations are made about the population of factor levels.
This model is typically used when the researcher is interested in estimating the variance components associated with the factors.
Mathematical Form:
The linear model for a one-way random effect can be written as:
Yij = μ + αi + ϵij
Where αi is the random effect of the i-th treatment, and αi is assumed to be normally distributed with mean 0 and variance σα2 . The
error terms ϵij are assumed to be normally distributed with mean 0 and variance σ 2 .
Example:
If a researcher is studying the effect of schools on student performance, and the schools are randomly selected from a larger pool of
schools, then the schools are treated as random factors.
Definition:
A Mixed effect model involves both fixed and random effects. In this model, some factors are treated as fixed (the researcher is
interested in the specific levels of these factors), while others are treated as random (the researcher is interested in the variability of
these factors across a population).
Key Points:
Some factors are treated as fixed (with specific levels of interest), and others are treated as random (with levels randomly selected
from a larger population).
This model is useful when there are factors for which the researcher is interested in specific comparisons, and others for which the
goal is to understand variability.
It helps in estimating both fixed effects (for comparison) and random effects (for variance components).
Mathematical Form:
For a mixed effect model, the linear model can be written as:
Yij = μ + αi + βj + ϵij
Where:
αi is the fixed effect of the i-th level of the fixed factor.
βj is the random effect of the j-th level of the random factor, assumed to be normally distributed with mean 0 and variance σβ².
ϵij is the random error term, assumed to be normally distributed with mean 0 and variance σ².
Example:
A study could be conducted to examine the effect of specific treatments on plant growth, where the treatments are fixed effects, but the
plots of land on which the treatments are applied are considered random effects.
Feature | Fixed Effect Model | Random Effect Model | Mixed Effect Model
Nature of Factors | Fixed levels (specific treatments) | Randomly chosen levels from a population | Both fixed and random levels
Focus | Comparison of specific levels | Estimation of variability in the population | Comparison of specific levels & estimation of variability
Inferences | Valid for specific levels | Generalizes to the entire population | Generalizes to the population but includes fixed-level inferences
Example | Comparing drug doses | Comparing random schools or hospitals | Comparing treatments, with random effects from different locations
Use Case | When the factor levels are fixed and chosen | When the factor levels are random | When a combination of fixed and random effects is involved
Conclusion:
The Fixed effect model is used when the researcher is interested in specific levels of factors and making comparisons between
them.
The Random effect model is used when the researcher is interested in generalizing the findings to a larger population of factor
levels, estimating variability.
The Mixed effect model is a combination of both fixed and random effects, useful when there is a need to estimate both specific
treatment effects and the variability due to random factors.
UNIT 2
Q1 (S16):
Discuss the three principles of the design of experiments. What are uniformity trials? Explain their use.
Answer:
The Three Principles of Design of Experiments:
1. Randomization:
Definition: Allotting treatments to experimental units purely by chance.
Purpose: Eliminates bias and ensures each experimental unit has an equal chance of receiving any treatment.
2. Replication:
Definition: Repeating each treatment on several experimental units.
Purpose: Improves the precision of the experiment by reducing the impact of variability.
Variance of the mean: σx̄² = σ² / r
where:
σ²: Variance of a single observation.
r: Number of replications.
3. Local Control:
Definition: Grouping experimental units into blocks with similar conditions to minimize variability within each block.
Uniformity Trials:
Definition: Preliminary experiments conducted to assess the natural variability of experimental units (e.g., soil plots, lab
samples).
Use: To estimate the natural variability of the site, to determine the optimal size and shape of plots, and to map fertility gradients for forming homogeneous blocks.
Conclusion:
The three principles—randomization, replication, and local control—reduce experimental error and improve the reliability of
results. Uniformity trials play a crucial role in optimizing plot size and arrangement.
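The replication principle can be seen numerically from the relation σx̄² = σ²/r above (σ² = 36 below is a hypothetical error variance):

```python
sigma2 = 36.0  # hypothetical variance of a single observation
for r in (1, 4, 9, 16):
    var_mean = sigma2 / r   # variance of a treatment mean based on r replications
    se = var_mean ** 0.5    # standard error of that mean
    print(r, var_mean, se)
```

Quadrupling the number of replications halves the standard error (r = 1 gives SE 6.0; r = 4 gives 3.0; r = 16 gives 1.5), which is why replication is central to precision.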
Q2 (S17):
What is meant by "Design of an Experiment"? Describe the three basic principles of experimental design. How are these
principles used in C.R.D., R.B.D., and L.S.D.?
Answer:
Objective: Maximize the efficiency of an experiment by minimizing error and ensuring reliable conclusions.
The Three Basic Principles:
1. Randomization: Treatments are allotted to experimental units at random to avoid bias.
2. Replication: Each treatment is repeated on several units to estimate and reduce experimental error.
3. Local Control: Units are grouped into homogeneous blocks to reduce the effect of variability.
Application in the Standard Designs:
1. Completely Randomized Design (C.R.D.):
Randomization: Treatments are assigned to all units completely at random; replication may be equal or unequal; local control is not used.
Example Layout:
Unit      | 1 | 2 | 3
Treatment | A | B | C
2. Randomized Block Design (R.B.D.):
Randomization: Treatments are randomized afresh within each block.
Replication: Each treatment appears once in every block.
Local Control: Blocks are formed based on homogeneity (e.g., soil fertility).
Example Layout (treatments A, B, C randomized within each block):
Block 1 | B A C
Block 2 | A C B
3. Latin Square Design (L.S.D.):
Randomization: The rows, columns, and treatments of the square are randomized.
Replication: Each treatment appears exactly once in every row and column.
Local Control: Controls variability in two directions (e.g., row and column effects).
Example Layout:
Row\Col | 1 | 2 | 3
1       | A | B | C
2       | B | C | A
3       | C | A | B
Conclusion:
The principles of randomization, replication, and local control are applied differently in C.R.D., R.B.D., and L.S.D., depending on
the experimental layout. These principles ensure unbiased and precise results.
Question: Write short notes on:
1. Size and shape of plots
2. Uniformity trials
3. Fertility gradient
4. Local control
Answer:
1. Plot Size and Shape:
Plot Size: Refers to the area assigned to each experimental unit where a single treatment is applied.
Larger Plots: Reduce experimental error but may increase cost and variability within plots.
Smaller Plots: Economical but may lead to higher error due to variability.
Plot Shape: Should minimize variability due to external factors like soil fertility or moisture.
Rectangular plots are often used, with a longer dimension aligned along the fertility gradient to reduce variability.
2. Uniformity Trials:
Definition: Preliminary experiments conducted on the experimental site without applying treatments to understand natural
variability.
Purpose: To estimate the natural variability of the experimental area, to choose an efficient plot size and shape, and to detect any fertility gradient.
3. Fertility Gradient:
Definition: A systematic trend in soil fertility across the experimental area.
Impact on Design:
Managed using local control principles by creating blocks along the gradient.
Example: If fertility decreases along rows, blocks should align with rows to minimize error.
4. Local Control:
Definition: Groups similar experimental units into blocks to minimize the effect of variability caused by extraneous factors (e.g., soil fertility).
Application in Designs: Blocks in R.B.D.; rows and columns in L.S.D.
Conclusion:
These concepts are foundational for designing effective experiments. They help reduce error and ensure reliable conclusions by
managing variability systematically.
Question: Write short notes on:
1. Treatments
2. Uniformity trials and fertility gradient
3. Size and shape of plots and blocks
4. Efficiency of a design
Answer:
1. Treatments:
Definition: The procedures, objects, or levels under comparison in an experiment (e.g., varieties, fertilizers, doses).
Purpose:
To study the effect of treatments on the response variable (e.g., crop yield).
2. Uniformity Trials and Fertility Gradient:
Uniformity Trials: Preliminary trials conducted without treatments to measure the natural variability of the experimental site.
Fertility Gradient: A systematic trend in fertility across the field, which uniformity-trial data help to map.
3. Size and Shape of Plots and Blocks:
Size: Larger plots and blocks reduce experimental error but increase cost; smaller ones are economical but more variable.
Shape:
Long rectangular plots are often used to align with fertility gradients.
4. Efficiency of a Design:
Definition: A measure of how well a design reduces error and increases precision in estimating treatment effects.
Conclusion:
These terms collectively define critical components of experimental design, ensuring systematic planning, accurate analysis, and
meaningful conclusions.
Answer:
Layout of C.R.D.:
In a Completely Randomized Design, treatments are assigned randomly to all experimental units.
Example Layout: Suppose there are 4 treatments (T1, T2, T3, T4) and 3 replications (r = 3).
A possible random assignment could be:
T1 T4 T2
T3 T2 T1
T4 T3 T3
T2 T1 T4
Merits of C.R.D.:
1. Simplicity: Easy to lay out and analyze.
2. Flexibility: Any number of treatments and unequal replications can be accommodated.
3. Maximum Error df: Provides the maximum degrees of freedom for estimating experimental error.
Demerits of C.R.D.:
1. Sensitivity to Heterogeneity:
If experimental units are not uniform, variability may increase, leading to incorrect conclusions.
Mathematical Model:
Yij = μ + τi + ϵij
Where:
Yij: Observation from the i-th treatment and j-th replication.
μ: Overall mean.
τi : Effect of the i-th treatment (i = 1, 2, … , t).
ϵij: Random error, assumed to be independently and identically distributed N(0, σ²).
Analysis of Variance (ANOVA) Table:
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Ratio
Treatments | SST = ∑_{i=1}^{t} ri (Ȳi − Ȳ)² | t − 1 | MST = SST/(t − 1) | F = MST/MSE
Error | SSE = ∑_{i=1}^{t} ∑_{j=1}^{ri} (Yij − Ȳi)² | N − t | MSE = SSE/(N − t) | -
Total | SSTot = ∑_{i=1}^{t} ∑_{j=1}^{ri} (Yij − Ȳ)² | N − 1 | - | -
Where:
t: Number of treatments.
ri : Number of replications for the i-th treatment.
Ȳi: Mean of the i-th treatment; Ȳ: Overall mean.
Answer:
1. Randomization:
Helps eliminate bias and ensures that each treatment has an equal chance of being assigned to any unit.
2. Replication:
Increases precision and provides a more reliable estimate of the experimental error.
3. Local Control:
As C.R.D. assumes homogeneous experimental units, local control is not explicitly applied.
Yij = μ + τi + ϵij
Where:
Yij : Observation from the i-th treatment and j -th replication.
μ: Overall mean.
τi : Effect of the i-th treatment (i = 1, 2, … , t).
Steps in Analysis:
1. Hypotheses:
H0: τ1 = τ2 = ⋯ = τt = 0 (all treatments have the same effect) against H1: at least one τi ≠ 0.
2. Calculate the Overall Mean (Ȳ):
Ȳ = (1/N) ∑_{i=1}^{t} ∑_{j=1}^{ri} Yij
Where N = ∑_{i=1}^{t} ri is the total number of observations.
3. Calculate the Treatment Means (Ȳi):
Ȳi = (1/ri) ∑_{j=1}^{ri} Yij
4. Sums of Squares:
SSTot = ∑_{i=1}^{t} ∑_{j=1}^{ri} (Yij − Ȳ)²
SST = ∑_{i=1}^{t} ri (Ȳi − Ȳ)²
SSE = SSTot − SST
5. Degrees of Freedom:
Treatments: dfT = t − 1
Error: dfE = N − t
Total: dfTot = N − 1
6. Mean Squares (MS):
MST = SST / dfT
MSE = SSE / dfE
7. F-Statistic:
F = MST / MSE
Compare the F-statistic with the critical value from the F-distribution for dfT and dfE.
ANOVA Table:
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Ratio
Treatments | SST | t − 1 | MST | F = MST/MSE
Error | SSE | N − t | MSE | -
Total | SSTot | N − 1 | - | -
Decision Rule:
Reject H0 at significance level α if F > Fα(t − 1, N − t); otherwise, do not reject H0.
Conclusion:
The complete statistical analysis of C.R.D. involves computing the sum of squares, degrees of freedom, mean squares, and the F-
ratio to test the equality of treatment means.
Question: Write short notes on:
1. Size and shape of plots
2. Uniformity trials
3. Experimental error
4. Local control
5. Efficiency of a design
Answer:
1. Size and Shape of Plots:
Definition:
A plot is the smallest unit of land or area on which a single treatment is applied during an experiment. The size and shape of the plot significantly impact the accuracy and precision of the experiment.
Importance:
Larger plots help in reducing variability caused by heterogeneity in the field but increase costs.
Smaller plots are economical but may not adequately capture treatment effects due to variability.
Ideal Shape:
Rectangular plots are preferred as they reduce border effects and are easier to manage.
The aspect ratio depends on the nature of the crop or experimental material.
2. Uniformity Trials:
Definition:
Preliminary experiments conducted to determine the extent of variability in the experimental field or area.
Purpose:
To estimate experimental error and decide on the plot size and shape.
Benefits: Help in mapping fertility gradients and in planning efficient blocking.
3. Experimental Error:
Definition:
The variation in the observed responses that cannot be attributed to the treatments. It arises due to factors like soil
heterogeneity, measurement errors, or environmental conditions.
Minimization: Experimental error is reduced through randomization, adequate replication, and local control.
4. Local Control:
Definition:
A principle in the design of experiments where the experimental area is divided into homogeneous blocks or groups to
minimize variability.
Purpose:
Reduces the effect of heterogeneity in the experimental area.
Ensures more reliable results by accounting for systematic differences across blocks.
Examples in Designs:
Applied in Randomized Block Design (RBD) and Latin Square Design (LSD).
5. Efficiency of a Design:
Definition:
The ability of a design to detect significant differences between treatments. Efficiency is influenced by the design's ability to
minimize experimental error and maximize the use of available resources.
Calculation:
Efficiency is often compared with a standard design using the ratio of error variances, e.g.
Efficiency = (Error mean square of the standard design) / (Error mean square of the design under study).
Answer:
Design of Experiment:
Definition:
The Design of Experiment (DOE) refers to the process of planning and organizing experiments in a systematic way to test
hypotheses or investigate treatment effects.
Basic Principles:
1. Randomization: Allotting treatments to experimental units at random to avoid bias.
2. Replication: Repeating each treatment on several units to estimate experimental error.
3. Local Control: Dividing the experimental area into homogeneous blocks to reduce variability.
Completely Randomized Design (C.R.D.):
1. Layout of C.R.D.:
The experimental units are randomly assigned treatments without any restrictions.
Example: If there are 4 treatments (T1 , T2 , T3 , T4 ) and 3 replications, a random assignment might look like this:
T1 T2 T4 T3
T3 T4 T1 T2
T2 T3 T4 T1
2. Mathematical Model:
yij = μ + τi + ϵij
Where:
μ: Overall mean.
τi : Effect of the ith treatment.
ϵij : Random error component, assumed to be independently and identically distributed (N (0, σ 2 )).
Treatment total:
Ti = ∑_{j=1}^{r} yij
Grand total:
G = ∑_{i=1}^{t} Ti
Overall mean:
ȳ = G / N
Where t is the number of treatments, r is the number of replications, and N = t ⋅ r is the total number of observations.
Total Sum of Squares:
SST = ∑_{i=1}^{t} ∑_{j=1}^{r} (yij − ȳ)²
Partitioned into:
Treatment Sum of Squares:
SSTreat = (∑_{i=1}^{t} Ti²)/r − G²/N
Error Sum of Squares (SSE):
SSE = SST − SSTreat
Step 3: Degrees of Freedom
Total: dfT = N − 1
Treatment: dfTreat = t − 1
Error: dfE = N − t
Step 4: Mean Squares
MSTreat = SSTreat / dfTreat
MSE = SSE / dfE
Step 5: F-Test
F = MSTreat / MSE
Compare F with the critical value from the F-distribution with dfTreat and dfE degrees of freedom.
4. ANOVA Table:
Source | SS | df | MS | F
Treatments | SSTreat | t − 1 | MSTreat | F = MSTreat/MSE
Error | SSE | N − t | MSE | -
Total | SST | N − 1 | - | -
5. Advantages of C.R.D.:
Simple to lay out and analyze; allows any number of treatments and unequal replication; gives the maximum error degrees of freedom.
6. Disadvantages of C.R.D.:
Suitable only when the experimental units are homogeneous; less precise than blocked designs when units are heterogeneous.
Q9 (W16):
Give the complete statistical analysis of C.R.D. (Completely Randomized Design).
Answer:
The analysis for Completely Randomized Design (C.R.D.) involves the following steps:
1. Experimental Layout:
Suppose 3 treatments (T1, T2, T3), each replicated 4 times, are assigned at random to 12 units:
T1 T3 T2 T1
T2 T3 T1 T2
T3 T1 T3 T2
2. Mathematical Model:
yij = μ + τi + ϵij
Where:
μ: Overall mean.
τi : Effect of the ith treatment.
ϵij : Random error component, assumed independently distributed N(0, σ²).
Treatment totals (Ti):
Ti = ∑_{j=1}^{r} yij
Grand total (G):
G = ∑_{i=1}^{t} Ti
Overall mean (ȳ):
ȳ = G / N
Where t is the number of treatments, r is the number of replications, and N = t ⋅ r.
Total Sum of Squares:
SST = ∑_{i=1}^{t} ∑_{j=1}^{r} (yij − ȳ)²
Partitioned into:
Treatment Sum of Squares:
SSTreat = (∑_{i=1}^{t} Ti²)/r − G²/N
Error Sum of Squares (SSE):
SSE = SST − SSTreat
Degrees of Freedom:
Total: dfT = N − 1
Treatment: dfTreat = t − 1
Error: dfE = N − t
Mean Squares:
MSTreat = SSTreat / dfTreat
MSE = SSE / dfE
F-Statistic:
F = MSTreat / MSE
Compare F with the critical value from the F -distribution with dfT reat and dfE .
4. ANOVA Table:
Source | SS | df | MS | F
Treatments | SSTreat | t − 1 | MSTreat | F = MSTreat/MSE
Error | SSE | N − t | MSE | -
Total | SST | N − 1 | - | -
5. Example Problem:
Given Data:
3 Treatments (T1 , T2 , T3 ).
Observations:
Treatment y1 y2 y3
T1 5 7 6
T2 8 6 7
T3 4 5 3
1. Treatment Totals and Grand Total:
T1 = 5 + 7 + 6 = 18, T2 = 8 + 6 + 7 = 21, T3 = 4 + 5 + 3 = 12
G = T1 + T2 + T3 = 18 + 21 + 12 = 51
Overall mean: ȳ = G/N = 51/9 ≈ 5.67
2. Partition the Sum of Squares (correction factor G²/N = 51²/9 = 289):
SST = ∑∑ yij² − G²/N = 309 − 289 = 20
SSTreat = (∑ Ti²)/r − G²/N = (18² + 21² + 12²)/3 − 289 = 303 − 289 = 14
SSE = SST − SSTreat = 20 − 14 = 6
3. Mean Squares and F-Test:
MSTreat = 14/2 = 7, MSE = 6/6 = 1, F = MSTreat/MSE = 7
Since F = 7 > F0.05(2, 6) = 5.14, the treatment means differ significantly.
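The example can be cross-checked in Python, using the same shortcut formulas (a sketch; variable names are my own):

```python
data = {"T1": [5, 7, 6], "T2": [8, 6, 7], "T3": [4, 5, 3]}
t, r = 3, 3
N = t * r

totals = {k: sum(v) for k, v in data.items()}   # T1=18, T2=21, T3=12
G = sum(totals.values())                        # grand total = 51
cf = G ** 2 / N                                 # correction factor = 289.0

sst = sum(y ** 2 for v in data.values() for y in v) - cf   # total SS
ss_treat = sum(T ** 2 for T in totals.values()) / r - cf   # treatment SS
sse = sst - ss_treat                                       # error SS

ms_treat = ss_treat / (t - 1)
mse = sse / (N - t)
F = ms_treat / mse
print(sst, ss_treat, sse, F)   # 20.0 14.0 6.0 7.0
```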
6. Advantages of C.R.D.:
Simple layout and analysis; any number of replications per treatment is allowed.
7. Disadvantages of C.R.D.:
Less efficient if variability is present among the experimental units.
Answer:
1. Experimental Layout:
For a Completely Randomized Design (C.R.D.), we randomly assign the treatments to the experimental units. Suppose we have
the following setup:
Number of treatments (t) = 3
Number of replications (r) = 4
Number of observations (N ) = t × r = 3 × 4 = 12
Let’s say the data (for simplicity) for 3 treatments in 4 replications looks like this:
Treatment | Rep 1 | Rep 2 | Rep 3 | Rep 4
T1 | 8 | 7 | 9 | 6
T2 | 4 | 5 | 6 | 7
T3 | 3 | 3 | 4 | 5
2. Mathematical Model:
The model for C.R.D. is:
yij = μ + τi + ϵij
Where: yij is the j-th observation under the i-th treatment, μ is the overall mean, τi is the effect of the i-th treatment, and ϵij is the random error, assumed N(0, σ²).
3. Calculations:
Treatment totals (Ti ):
T1 = 8 + 7 + 9 + 6 = 30,
T2 = 4 + 5 + 6 + 7 = 22,
T3 = 3 + 3 + 4 + 5 = 15
G = T1 + T2 + T3 = 30 + 22 + 15 = 67
Overall Mean (ȳ):
ȳ = G/N = 67/12 ≈ 5.58
Total Sum of Squares:
SST = ∑_{i=1}^{t} ∑_{j=1}^{r} (yij − ȳ)² = ∑∑ yij² − G²/N = 415 − 4489/12 ≈ 415 − 374.08 = 40.92
Treatment Sum of Squares:
SSTreat = (∑_{i=1}^{t} Ti²)/r − G²/N = (30² + 22² + 15²)/4 − 374.08 = 402.25 − 374.08 ≈ 28.17
Error Sum of Squares:
SSE = SST − SSTreat ≈ 40.92 − 28.17 = 12.75
4. Degrees of Freedom:
The degrees of freedom for each source:
dfT = N − 1 = 12 − 1 = 11
dfT reat = t − 1 = 3 − 1 = 2
Error Degrees of Freedom (dfE ):
dfE = N − t = 12 − 3 = 9
5. Mean Squares:
Now calculate the mean squares:
MSTreat = SSTreat / dfTreat = 28.17 / 2 ≈ 14.08
MSE = SSE / dfE = 12.75 / 9 ≈ 1.42
6. F-Test:
Finally, compute the F-statistic:
F = MSTreat / MSE ≈ 14.08 / 1.42 ≈ 9.94
Compare this F-value with the critical value from the F-distribution table at dfTreat = 2 and dfE = 9 (F0.05(2, 9) = 4.26). Since 9.94 > 4.26, we reject H0 and conclude that the treatment means differ significantly.
7. ANOVA Table:
Here’s the ANOVA table for the analysis:
Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-value
Treatments | 28.17 | 2 | 14.08 | 9.94
Error | 12.75 | 9 | 1.42 | -
Total | 40.92 | 11 | - | -
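The same computations can be wrapped in a small reusable helper and applied to this example (a sketch; the function name is my own):

```python
def crd_anova(data, decimals=2):
    """One-way (CRD) ANOVA via the shortcut formulas; data maps treatment -> observations."""
    t = len(data)
    N = sum(len(v) for v in data.values())
    G = sum(sum(v) for v in data.values())
    cf = G ** 2 / N                       # correction factor
    sst = sum(y ** 2 for v in data.values() for y in v) - cf
    ss_treat = sum(sum(v) ** 2 / len(v) for v in data.values()) - cf
    sse = sst - ss_treat
    ms_treat = ss_treat / (t - 1)
    mse = sse / (N - t)
    return {
        "SST": round(sst, decimals),
        "SSTreat": round(ss_treat, decimals),
        "SSE": round(sse, decimals),
        "F": round(ms_treat / mse, decimals),
    }

table = crd_anova({"T1": [8, 7, 9, 6], "T2": [4, 5, 6, 7], "T3": [3, 3, 4, 5]})
print(table)   # {'SST': 40.92, 'SSTreat': 28.17, 'SSE': 12.75, 'F': 9.94}
```

The helper also handles unequal replication, since each treatment total is squared and divided by its own number of observations.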
E(SSE) = (N − t)σ²
Where σ² is the error variance; hence MSE = SSE/(N − t) is an unbiased estimator of σ².
Answer:
1. Experimental Error:
Experimental error refers to the variation or discrepancies in the observed data that cannot be explained by the treatment or
factors being studied. This error can arise due to a variety of reasons during the experiment and does not result from the effect
of the treatments themselves.
In other words, experimental error is the deviation of the observed values from the true or expected values under the
treatment, which is usually attributed to random factors.
Natural Variation:
Differences that occur due to natural fluctuations in the environment or the experimental units, such as slight
differences in the conditions of the laboratory, measurement equipment, or even slight variations in the individuals in
the sample.
Systematic Errors:
Errors that occur consistently in the same direction and may be caused by faulty equipment, improper technique, or bias
in measurement, but they are often controllable.
Measurement Error:
Errors arising from the limited precision of instruments or from inaccuracies in recording observations.
Environmental Factors:
Temperature, humidity, lighting, and other environmental factors can influence the outcome of experiments.
Sampling Error:
The inherent differences between the sample and the population, leading to variations in measurements.
Subjective Bias:
Variation introduced by the observer's judgment when recording or scoring results.
3. Principles Used to Control Experimental Error:
(a) Randomization:
Definition:
Randomization refers to the random allocation of treatments to experimental units, meaning that each unit is equally likely
to receive any treatment.
(b) Replication:
Definition:
Replication is the process of repeating the experiment on multiple experimental units or performing multiple observations
under each treatment.
(c) Local Control:
Definition:
Local control involves controlling or accounting for variability within the experimental environment. This can include
controlling environmental factors, grouping experimental units with similar characteristics, or using blocking techniques.
4. Conclusion:
In summary, experimental error arises due to uncontrolled variability in the system, and it is essential to minimize this error to
improve the reliability and accuracy of experimental results. By using the principles of randomization, replication, and local
control, experimenters can effectively manage and reduce the impact of experimental error, leading to more valid and precise
conclusions.
Q12 (W19):
Carry out a complete analysis of CRD (Completely Randomized Design) by explaining its layout. Explain the test used for
testing the equality of any pair of treatment means in CRD.
Answer:
1. Experimental Units:
The subjects, plots, or items to which treatments are applied; in CRD they are assumed to be homogeneous.
2. Treatment Groups:
Assume there are k different treatments, and each treatment is applied to ni experimental units, where i = 1, 2, … , k.
3. Randomization:
The experimental units are randomly assigned to one of the k treatments. Randomization is used to ensure that the
assignment of treatments is unbiased and not influenced by external factors.
4. Data Collection:
After the treatments are applied, data are collected for each experimental unit.
Diagram (one possible random allocation of 3 treatments to 9 units; illustrative only):

Unit:      U1 | U2 | U3 | U4 | U5 | U6 | U7 | U8 | U9
Treatment: T2 | T1 | T3 | T3 | T2 | T1 | T1 | T3 | T2
In this layout, each treatment is randomly assigned to experimental units, and the outcome is recorded for each unit.
Steps for Conducting ANOVA:
Yij = μ + τi + ϵij
Where:
Yij = Observation for the jth unit receiving the ith treatment.
μ = Overall mean.
τi = Effect of the ith treatment.
ϵij = Random error, assumed independent and identically distributed N(0, σ²).
2. Sum of Squares (SS): The total variation in the data is partitioned into different components:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{..})^2$$
$$SSTr = \sum_{i=1}^{k} n_i(\bar{Y}_{i.} - \bar{Y}_{..})^2$$

$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i.})^2$$
3. Degrees of Freedom:
df for Treatments: k − 1
df for Error: N − k, where N is the total number of observations (N = Σᵢ₌₁ᵏ nᵢ).
4. Mean Squares:
$$MSTr = \frac{SSTr}{k-1}, \qquad MSE = \frac{SSE}{N-k}$$
5. F-statistic: The F-statistic is used to test whether the treatments are significantly different:
$$F = \frac{MSTr}{MSE}$$
If F is large enough (greater than the critical value from the F-distribution), then we reject the null hypothesis that all
treatment means are equal.
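The whole CRD partition above can be sketched in a few lines of Python; the group labels and response values below are invented purely for illustration:

```python
# One-way (CRD) ANOVA sums of squares, computed from first principles.
# The data are made up for this demonstration.
groups = {
    "T1": [12.0, 14.0, 11.0, 13.0],
    "T2": [15.0, 17.0, 16.0, 18.0],
    "T3": [10.0, 9.0, 11.0, 10.0],
}

all_obs = [y for ys in groups.values() for y in ys]
N, k = len(all_obs), len(groups)
grand_mean = sum(all_obs) / N

# Total SS, treatment SS, and error SS
sst = sum((y - grand_mean) ** 2 for y in all_obs)
sstr = sum(len(ys) * ((sum(ys) / len(ys)) - grand_mean) ** 2
           for ys in groups.values())
sse = sum((y - sum(ys) / len(ys)) ** 2
          for ys in groups.values() for y in ys)

mstr = sstr / (k - 1)   # mean square for treatments
mse = sse / (N - k)     # mean square for error
f_stat = mstr / mse     # compare against the F(k-1, N-k) critical value
```

The identity SST = SSTr + SSE holds exactly, which is a useful self-check on any hand computation.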
To test whether any two treatment means are significantly different from each other, we use multiple comparisons. The most common method is Tukey's Honest Significant Difference (HSD) test, which compares all pairs of treatment means.
Tukey's HSD:
The Tukey HSD test is used after ANOVA when we have a significant F-statistic. It calculates the minimum difference between
treatment means that can be considered statistically significant. The formula for HSD is:
$$HSD = q_{\alpha,k,N-k}\sqrt{\frac{MSE}{n}}$$
Where:
q_{α,k,N−k} = critical value of the studentized range distribution at level α with k treatments and N − k error degrees of freedom.
n = number of observations per treatment (balanced case).
MSE = mean square error from the ANOVA.
If the difference between any two treatment means exceeds the HSD value, then those two means are considered significantly
different.
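A small numeric sketch of this rule in Python; the MSE, group size n, and the studentized-range value q below are assumed numbers for illustration (q would normally be read from a table for the chosen α, k, and N − k):

```python
import math

# Tukey HSD threshold for a balanced CRD (all numbers are illustrative).
mse = 2.5
n = 4            # observations per treatment (balanced case)
q_crit = 3.77    # hypothetical q_{alpha, k, N-k} from tables

hsd = q_crit * math.sqrt(mse / n)

# Any pair of treatment means differing by more than `hsd` is declared
# significantly different.
means = {"T1": 12.5, "T2": 16.5, "T3": 10.0}
pairs = [(a, b) for a in means for b in means if a < b]
significant = [(a, b) for a, b in pairs if abs(means[a] - means[b]) > hsd]
```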
4. Conclusion:
The CRD is a simple yet powerful design to analyze the effects of different treatments. The analysis of variance (ANOVA) helps
partition the total variation into components due to treatments and error, and the Tukey's HSD test can be used to test the
equality of treatment means in case of significant differences.
UNIT 3
Topic 1: Randomized Block Design (RBD)
Q1 (S16): What is meant by a Randomized Block Design (RBD)? Provide the analysis of variance for this design, clearly
stating the mathematical model and the underlying assumptions.
Answer:
The Randomized Block Design (RBD) is a type of experimental design in which the experimental units are divided into groups
(called blocks) that are similar to each other. Within each block, treatments are randomly assigned. The goal of RBD is to reduce
the impact of variability within experimental units by grouping them into blocks based on some common characteristic.
Steps of RBD:
1. Blocking:
Divide the experimental units into b blocks, where each block is more homogeneous than the overall experimental units.
2. Randomization:
Within each block, treatments are assigned randomly to the experimental units.
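The within-block randomization step can be sketched in Python; the block and treatment labels are arbitrary:

```python
import random

# Randomly assign treatments within each block of an RBD.
random.seed(1)  # fixed seed so the layout is reproducible
treatments = ["T1", "T2", "T3", "T4"]
blocks = ["B1", "B2", "B3"]

layout = {}
for block in blocks:
    order = treatments[:]   # every treatment appears once per block
    random.shuffle(order)   # randomized order within the block
    layout[block] = order

for block, order in layout.items():
    print(block, "|", " | ".join(order))
```

Every block receives each treatment exactly once; only the order within a block is random.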
3. Data Collection:
After applying the treatments, data is collected from each experimental unit.
Diagram (one possible randomization of 3 treatments within 3 blocks; illustrative only):

Block 1 | T2 | T1 | T3
Block 2 | T3 | T2 | T1
Block 3 | T1 | T3 | T2
Yij = μ + τi + βj + ϵij
Where:
Yij = Observation for the ith treatment in the jth block.
μ = Overall mean.
τi = Effect of the ith treatment.
βj = Effect of the jth block.
ϵij = Random error associated with the ith treatment in the jth block.
3. Underlying Assumptions:
1. Additivity: Treatment and block effects are additive (no treatment-block interaction).
2. Normality: The errors ϵij are normally distributed with mean zero and constant variance σ².
3. Random assignment: Treatments are randomly assigned to units within each block.
The total variation in the experiment is divided into components due to treatments, blocks, and error. The steps for ANOVA are as
follows:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{..})^2$$
$$SSTr = r\sum_{i=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
Where Ȳi. is the mean of the ith treatment.
$$SSB = k\sum_{j=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
The error sum of squares is obtained by subtraction: SSE = SST − SSTr − SSB.
df for Treatments: k − 1
df for Blocks: b − 1
df for Error: (k − 1)(b − 1)
7. F-statistics:
$$F_{treatment} = \frac{MSTr}{MSE}, \qquad F_{block} = \frac{MSB}{MSE}$$
where MSTr = SSTr/(k − 1), MSB = SSB/(b − 1), and MSE = SSE/((k − 1)(b − 1)).
If Ftreatment or Fblock is large enough (greater than the critical value), we reject the corresponding null hypothesis that there are no treatment or block effects.
Q2.1 (S17): Give a layout of a Randomized Block Design (R.B.D.) with 3 blocks and 4 treatments.
State the mathematical model of R.B.D. and describe various terms involved in it. Provide the
complete analysis of an R.B.D., assuming the expected values of various sum of squares.
Answer:
In this case, there are 3 blocks and 4 treatments. Each treatment is applied to every block, and the treatments are assigned
randomly within each block.
Here is a possible layout of the Randomized Block Design with 3 blocks and 4 treatments:
Block 1 | T1 | T2 | T3 | T4
----------------------------
Block 2 | T3 | T1 | T4 | T2
----------------------------
Block 3 | T2 | T4 | T1 | T3
Where:
Block 1, Block 2, Block 3 are the three blocks in which treatments are randomly assigned.
Yij = μ + τi + βj + ϵij
Where:
Yij = Observation for the ith treatment in the jth block.
μ = Overall mean.
τi = Effect of the ith treatment.
βj = Effect of the jth block.
ϵij = Random error associated with the ith treatment in the jth block.
3. Analysis of R.B.D.
To perform the analysis of variance (ANOVA) for the Randomized Block Design, we divide the total variation into components
due to treatments, blocks, and error. Below is the step-by-step analysis:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{..})^2$$
$$SSTr = r\sum_{i=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
$$SSB = k\sum_{j=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
The error sum of squares is obtained by subtraction, SSE = SST − SSTr − SSB. This is the unexplained variation after accounting for treatment and block effects.
4. Degrees of Freedom:
df for Treatments: k − 1 = 3; df for Blocks: b − 1 = 2; df for Error: (k − 1)(b − 1) = 6; df Total: kb − 1 = 11.
5. Mean Squares:
1. MS for Treatments:
$$MSTr = \frac{SSTr}{k-1} = \frac{SSTr}{3}$$
2. MS for Blocks:
$$MSB = \frac{SSB}{b-1} = \frac{SSB}{2}$$
3. MS for Error:
$$MSE = \frac{SSE}{(k-1)(b-1)} = \frac{SSE}{6}$$
6. F-statistics:
To test the significance of the treatments and blocks, we calculate the F-statistics:
1. F for Treatments:
$$F_{treatment} = \frac{MSTr}{MSE}$$
2. F for Blocks:
$$F_{block} = \frac{MSB}{MSE}$$
We compare these F-values with the F-distribution to determine if there is a statistically significant difference between
treatments and blocks.
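The full RBD analysis above can be sketched in Python; the 4-treatment, 3-block data table is invented for illustration:

```python
# Two-way ANOVA for an RBD (one observation per treatment-block cell).
# data[i][j] = response for treatment i in block j; values are invented.
data = [
    [10.0, 12.0, 11.0],   # T1 in blocks B1..B3
    [14.0, 15.0, 16.0],   # T2
    [9.0, 10.0, 8.0],     # T3
    [13.0, 12.0, 14.0],   # T4
]
k, b = len(data), len(data[0])       # treatments, blocks
grand = sum(sum(row) for row in data) / (k * b)
t_means = [sum(row) / b for row in data]
b_means = [sum(data[i][j] for i in range(k)) / k for j in range(b)]

sst = sum((data[i][j] - grand) ** 2 for i in range(k) for j in range(b))
sstr = b * sum((m - grand) ** 2 for m in t_means)
ssb = k * sum((m - grand) ** 2 for m in b_means)
sse = sst - sstr - ssb               # residual after treatments and blocks

# identity check: subtraction matches the direct residual formula
sse_direct = sum((data[i][j] - t_means[i] - b_means[j] + grand) ** 2
                 for i in range(k) for j in range(b))

mstr = sstr / (k - 1)
msb = ssb / (b - 1)
mse = sse / ((k - 1) * (b - 1))
f_treatment, f_block = mstr / mse, msb / mse
```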
Conclusion:
Q2.2 (S17): Explain the procedure for testing the significance of the difference between two
treatment means in R.B.D.
Answer:
Procedure for Testing the Significance of the Difference Between Two Treatment Means in R.B.D.
To test the significance of the difference between two treatment means in a Randomized Block Design (R.B.D.), we use a
hypothesis testing procedure based on the analysis of variance (ANOVA). Below is the detailed procedure:
Null Hypothesis (H0 ): There is no significant difference between the two treatment means.
H0: μi = μj
(where μi and μj are the means of two treatments)
Alternative Hypothesis (H1 ): There is a significant difference between the two treatment means.
H1: μi ≠ μj
We perform the ANOVA to partition the total variability into components due to treatments, blocks, and error. This step helps us
understand how much of the variability is explained by the treatment differences.
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{..})^2$$
$$SSTr = r\sum_{i=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
$$MSTr = \frac{SSTr}{k-1}$$
$$MSE = \frac{SSE}{(k-1)(b-1)}$$
Step 4: Calculate the F-statistic
Next, we calculate the F-statistic for the treatments to compare the variance due to treatments against the variance due to error:
$$F_{treatment} = \frac{MSTr}{MSE}$$
The F-statistic is compared to the critical F-value from the F-distribution table at a chosen significance level (α, typically 0.05).
If F_treatment > F_critical, we reject the null hypothesis (H0) and conclude that there is a significant difference between the treatment means.
If F_treatment ≤ F_critical, we fail to reject the null hypothesis and conclude that there is no significant difference between the treatment means.
If we reject the null hypothesis, indicating that there is a significant difference between treatments, we may need to perform a
post-hoc test (such as Tukey’s Honest Significant Difference test) to identify which specific treatment means differ from each
other.
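One common way to compare a specific pair of treatment means in an RBD is through a critical difference (least significant difference) based on the error mean square. The sketch below assumes illustrative values for MSE and for the t critical value (which would normally come from tables with (k − 1)(b − 1) error degrees of freedom):

```python
import math

# Critical-difference test for H0: mu_i = mu_j in an RBD.
# MSE and t_crit are assumed numbers, chosen only for this sketch.
mse = 1.2
b = 3              # number of blocks = replications per treatment
t_crit = 2.447     # hypothetical t_{0.025, (k-1)(b-1)}

cd = t_crit * math.sqrt(2 * mse / b)    # critical (least significant) difference

mean_i, mean_j = 14.0, 11.0
reject_h0 = abs(mean_i - mean_j) > cd   # significant difference between the pair?
```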
Conclusion:
This is the detailed procedure for testing the significance of the difference between two treatment means in a Randomized Block Design.
Q3 (S19): Explain how the three principles of design of experiments are used in an RBD
(Randomized Block Design). State the linear model in RBD. Derive expressions for the expected
values of the total sum of squares, block sum of squares, treatment sum of squares, and error
sum of squares in an RBD.
Answer:
In a Randomized Block Design (RBD), the three core principles of design of experiments are randomization, replication, and
blocking. Let’s discuss how each principle is applied in RBD:
1. Randomization:
This principle ensures that the treatments are assigned to experimental units randomly within each block. This helps
eliminate bias and allows for generalization of results. In RBD, the treatments are randomly assigned to experimental
units within each block.
2. Replication:
Replication involves repeating the experiment multiple times to ensure that the results are reliable and can be
generalized. In RBD, each treatment is applied in each block, and blocks themselves are repeated (if possible) to increase
the precision of the results.
3. Blocking:
This principle helps control for variability by grouping similar experimental units together in blocks. In RBD, blocks are
formed by grouping experimental units that are similar, and the treatments are applied randomly within these blocks to
control for block-related variability.
In Randomized Block Design (RBD), we assume that the data can be modeled as follows:
Yij = μ + τi + βj + ϵij
Where:
Yij = Response for the ith treatment in the jth block.
μ = Overall mean.
τi = Effect of the ith treatment (fixed effect).
βj = Effect of the jth block.
ϵij = Random error for the ith treatment in the jth block, assumed to be independent and identically distributed N(0, σ²).
Now, we derive the expected values of the sum of squares involved in the analysis of variance (ANOVA) for RBD.
The Total Sum of Squares represents the total variability in the data and is given by:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{..})^2$$
The Treatment Sum of Squares represents the variability due to the treatments. It is given by:
$$SSTr = r\sum_{i=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
The Block Sum of Squares represents the variability due to the blocks. It is given by:
$$SSB = k\sum_{j=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
The Error Sum of Squares represents the residual or unexplained variability due to random error. It is given by:
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$
This error sum of squares accounts for the randomness in the data that is not explained by the treatments or blocks.
1. Expected Value of SST (Total Sum of Squares):
$$E(SST) = (bk - 1)\sigma^2 + b\sum_{i=1}^{k}\tau_i^2 + k\sum_{j=1}^{b}\beta_j^2$$
2. Expected Value of SSTr (Treatment Sum of Squares):
$$E(SSTr) = (k - 1)\sigma^2 + b\sum_{i=1}^{k}\tau_i^2$$
3. Expected Value of SSB (Block Sum of Squares):
$$E(SSB) = (b - 1)\sigma^2 + k\sum_{j=1}^{b}\beta_j^2$$
4. Expected Value of SSE (Error Sum of Squares):
$$E(SSE) = (k - 1)(b - 1)\sigma^2$$
Conclusion:
These derivations provide a clear understanding of how the sums of squares are partitioned in a Randomized Block Design
(RBD), allowing us to understand the variability in the data attributable to treatments, blocks, and error.
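The expectation of the error sum of squares can be checked by a quick Monte Carlo simulation: under H0 (no treatment or block effects), the average SSE over many simulated experiments should be close to (k − 1)(b − 1)σ². The sizes and σ below are chosen arbitrarily for the demonstration:

```python
import random

# Monte Carlo check of E(SSE) = (k-1)(b-1)*sigma^2 under H0.
random.seed(0)
k, b, sigma = 3, 4, 1.0     # treatments, blocks, error sd (demo values)
reps = 2000

total = 0.0
for _ in range(reps):
    y = [[random.gauss(0.0, sigma) for _ in range(b)] for _ in range(k)]
    grand = sum(map(sum, y)) / (k * b)
    t_means = [sum(row) / b for row in y]
    b_means = [sum(y[i][j] for i in range(k)) / k for j in range(b)]
    sse = sum((y[i][j] - t_means[i] - b_means[j] + grand) ** 2
              for i in range(k) for j in range(b))
    total += sse

avg_sse = total / reps       # should be near (k-1)*(b-1)*sigma^2 = 6
```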
Q4 (W16): What is a Randomized Block Design (RBD)? Discuss its advantages and disadvantages
compared to Completely Randomized Design (CRD).
Answer:
A Randomized Block Design (RBD) is an experimental design used to control variability by grouping similar experimental units
into blocks. Within each block, treatments are randomly assigned to experimental units. This design is particularly useful when
there are known sources of variability (such as environmental factors or characteristics of subjects) that need to be controlled.
In RBD, the idea is to reduce the experimental error by ensuring that units within each block are more similar to each other. By
controlling for variability within each block, the design increases the precision of treatment comparisons.
Yij = μ + τi + βj + ϵij
Where:
Yij = Response for the j th observation in the ith treatment and j th block.
μ = Overall mean.
τi = Effect of the ith treatment (fixed effect).
βj = Effect of the jth block (fixed effect).
ϵij = Random error associated with the ith treatment in the jth block (assumed to have mean 0 and variance σ²).
Advantages of RBD:
1. Reduction in Variability:
By grouping similar experimental units into blocks, the RBD helps control variability within the experimental units. This
leads to more precise treatment comparisons.
2. Increased Precision:
Since blocks are homogenous, the variability within each block is minimized, increasing the accuracy of treatment effect
estimates.
3. Control of Known Sources of Variability:
RBD can effectively control for known sources of variability (such as environmental conditions) that could otherwise confound the results.
4. Better Estimation of Treatment Effects:
The use of blocks allows for a better estimation of treatment effects because the influence of extraneous variability is reduced.
5. Flexibility:
RBD can be adapted to experimental situations where blocking factors (e.g., age, gender, environmental conditions) are
important, making it a versatile experimental design.
Disadvantages of RBD:
1. Increased Complexity:
RBD requires a clear understanding of which factors should be used to form blocks. If the blocking factor is not properly
chosen, it may not lead to better precision.
2. Limited Applicability:
RBD is useful when the variability between blocks is high, but if there is little variability within blocks, the design may not
be as beneficial.
3. Block Sizes:
RBD often requires that each block has the same number of experimental units, which may not always be feasible in
some experimental settings.
4. Difficulty in Forming Blocks:
It may not always be easy to form blocks in a meaningful way. The blocks should contain units that are as similar as possible, but if this is not achievable, the benefits of the design may be diminished.
A Completely Randomized Design (CRD) is the simplest experimental design in which treatments are randomly assigned to all
experimental units without any consideration for potential variability in the experimental units. Let’s compare RBD and CRD:
| Aspect | RBD | CRD |
|---|---|---|
| Control of Variability | Controls variability by grouping similar units into blocks. | No control of variability, as treatments are randomly assigned to all units. |
| Efficiency | More efficient when there are known sources of variability. | Less efficient if there is significant variability between experimental units. |
| Complexity | More complex due to the need for blocks and careful planning. | Simpler and easier to implement. |
| Applicability | Suitable for experiments where units are naturally grouped. | Suitable for homogeneous populations or when blocking is not possible. |
| Precision of Estimates | Increases precision by reducing error variance. | Less precise due to lack of control over variability. |
| Statistical Analysis | Requires more complex analysis (e.g., ANOVA with blocks). | Simpler analysis (e.g., one-way ANOVA). |
| Flexibility | Flexible in handling different types of blocking factors. | Less flexible; doesn't account for sources of variability. |
5. Conclusion
RBD is a more sophisticated design that accounts for variability by using blocks, leading to more reliable and accurate
treatment effect estimates.
CRD, while simpler, may not provide the same level of precision and control over variability, especially when there are
significant differences between experimental units.
Q5 (W17): Give the complete statistical analysis of RBD (Randomized Block Design).
Answer:
1. Overview of Randomized Block Design (RBD):
A Randomized Block Design (RBD) is used when there are experimental units that can be grouped into blocks, where each block
is homogeneous, meaning the units within each block are similar in some way. The treatments are then randomly assigned to
the experimental units within each block.
The main goal of RBD is to reduce the variability within each block, improving the precision of the treatment comparisons.
The linear model for the Randomized Block Design (RBD) is expressed as:
Yij = μ + τi + βj + ϵij
Where:
Yij = Response from the experimental unit receiving the ith treatment in the jth block.
μ = Overall mean.
τi = Effect of the ith treatment.
βj = Effect of the jth block.
ϵij = Random error associated with the ith treatment and jth block, assumed to be normally distributed with mean 0 and variance σ².
To perform the statistical analysis of RBD, we use Analysis of Variance (ANOVA). The steps are as follows:
The total variation in the experiment is measured by the total sum of squares:
$$TSS = \sum_{i=1}^{t}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{..})^2$$
Where Ȳ.. is the grand mean of all observations.
$$SS_{Treatments} = n\sum_{i=1}^{t}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
Where:
n = Number of observations per treatment (assuming equal number of units in each block).
Yˉi. = Mean of the ith treatment.
$$SS_{Blocks} = t\sum_{j=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
Where Ȳ.j is the mean of the jth block (each block contains t observations, one per treatment).
$$SS_{Error} = \sum_{i=1}^{t}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$
Where the summand is the residual left after removing the treatment and block effects.
1. Degrees of Freedom:
df Treatments = t − 1; df Blocks = b − 1; df Error = (t − 1)(b − 1); df Total = bt − 1.
2. Mean Squares:
Mean Square for Treatments: $MS_{Treatments} = \frac{SS_{Treatments}}{df_{Treatments}}$
Mean Square for Blocks: $MS_{Blocks} = \frac{SS_{Blocks}}{df_{Blocks}}$
Mean Square for Error: $MS_{Error} = \frac{SS_{Error}}{df_{Error}}$
3. F-Ratio:
$$F_{Treatments} = \frac{MS_{Treatments}}{MS_{Error}}$$
If FTreatments is larger than the critical value from the F-distribution table (based on the desired significance level), we reject the
null hypothesis and conclude that there is a significant difference between the treatments.
| Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Ratio |
|---|---|---|---|---|
| Treatments | SSTreatments | t − 1 | MSTreatments | MSTreatments / MSError |
| Blocks | SSBlocks | b − 1 | MSBlocks | MSBlocks / MSError |
| Error | SSError | (t − 1)(b − 1) | MSError | |
| Total | SSTotal | bt − 1 | | |
Where:
t = Number of treatments.
b = Number of blocks.
5. Interpretation of Results:
Significance Testing: The main aim of the ANOVA in RBD is to test if there is a significant difference between the treatment
means. This is done by comparing the FTreatments statistic to the critical value from the F-distribution.
Error Analysis: A significant error term (large MS_Error) indicates that there is unexplained variability in the response
variable, possibly due to uncontrolled factors.
6. Conclusion:
The statistical analysis of a Randomized Block Design (RBD) involves partitioning the total variation into treatment, block, and
error components. Using ANOVA, we can test the significance of treatment effects while controlling for block variability. The key
steps include calculating sums of squares, degrees of freedom, mean squares, and F-ratios, followed by hypothesis testing.
Q6 (W18): Give a layout of R.B.D. with 4 treatments and 3 blocks. Provide the complete analysis of
R.B.D., assuming the expected values of various sums of squares.
Answer:
4 treatments: T1 , T2 , T3 , T4
3 blocks: B1 , B2 , B3
The layout is constructed by randomly assigning the treatments to the experimental units within each block. The number of
experimental units is the product of the number of treatments and blocks, i.e., 4 × 3 = 12 units.
Here’s the layout of the experiment:
Block 1 | T2 | T4 | T1 | T3
Block 2 | T1 | T3 | T4 | T2
Block 3 | T4 | T2 | T3 | T1

(one possible random arrangement of the 4 treatments within each of the 3 blocks)
Where:
Yij represents the observed response from the ith block and j th treatment.
Normally distributed errors within each block with mean 0 and constant variance σ 2 .
3. Mathematical Model of RBD:
Yij = μ + τj + βi + ϵij
Where:
Yij = Observed response in the ith block under the jth treatment.
μ = Overall mean.
τj = Effect of the jth treatment.
βi = Effect of the ith block.
ϵij = Random error associated with the ith block and jth treatment.
$$TSS = \sum_{i=1}^{3}\sum_{j=1}^{4}(Y_{ij} - \bar{Y}_{..})^2$$
Where Ȳ.. is the grand mean of all 12 observations.
$$SS_{Treatments} = 3\sum_{j=1}^{4}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
Where:
3 = Number of blocks (replications of each treatment).
Ȳ.j = Mean of the jth treatment.
$$SS_{Blocks} = 4\sum_{i=1}^{3}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
Where:
4 = Number of treatments (observations per block).
Ȳi. = Mean of the ith block.
$$SS_{Error} = \sum_{i=1}^{3}\sum_{j=1}^{4}(Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$
Where the summand is the residual after removing block and treatment effects.
Step 5: Degrees of Freedom (df):
df Treatments = 4 − 1 = 3; df Blocks = 3 − 1 = 2; df Error = 3 × 2 = 6; df Total = 12 − 1 = 11.
Step 6: Mean Squares:
$$MS_{Treatments} = \frac{SS_{Treatments}}{df_{Treatments}}, \quad MS_{Blocks} = \frac{SS_{Blocks}}{df_{Blocks}}, \quad MS_{Error} = \frac{SS_{Error}}{df_{Error}}$$
Step 7: F-Ratio:
$$F_{Treatments} = \frac{MS_{Treatments}}{MS_{Error}}$$
Similarly, the F-ratio for testing the significance of blocks can be calculated:
$$F_{Blocks} = \frac{MS_{Blocks}}{MS_{Error}}$$
| Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-Ratio |
|---|---|---|---|---|
| Treatments | SSTreatments | 3 | MSTreatments | MSTreatments / MSError |
| Blocks | SSBlocks | 2 | MSBlocks | MSBlocks / MSError |
| Error | SSError | 6 | MSError | |
| Total | SSTotal | 11 | | |
6. Interpretation of Results:
F-Test for Treatments: The F_Treatments statistic is used to test if there is a significant difference in the treatment means. If F_Treatments is larger than the critical value from the F-distribution table, we conclude that at least one treatment mean differs significantly from the others.
F-Test for Blocks: Similarly, F_Blocks tests whether the blocks have a significant effect. However, in most experiments, blocks are used to control variability, and their significance may not be the primary focus.
Error: The error sum of squares represents the unexplained variability, and a high value may indicate that additional factors
or random error may be influencing the outcome.
Q7 (W19): Derive expressions for the expected values of total SS, block SS, and treatment SS in an
RBD with b blocks and v treatments. Also, find the expected value of error SS in an RBD.
Answer:
In an RBD, the experiment consists of b blocks and v treatments. Each block contains all the treatments, and the treatments are
randomly assigned to the experimental units within each block.
The mathematical model for the observed response in an RBD is given by:
Yij = μ + τj + βi + ϵij
Where:
Yij = Observed response in the ith block under the jth treatment.
μ = Overall mean; τj = effect of the jth treatment; βi = effect of the ith block; ϵij = random error, assumed independent and identically distributed N(0, σ²).
The total sum of squares (TSS) measures the total variation in the data. It is computed as:
$$TSS = \sum_{i=1}^{b}\sum_{j=1}^{v}(Y_{ij} - \bar{Y}_{..})^2$$
Where Ȳ.. is the grand mean of all bv observations.
3. Treatment Sum of Squares (SS_Treatments):
The treatment sum of squares (SS_Treatments) measures the variation in the responses due to the treatments. It is computed as:
$$SS_{Treatments} = b\sum_{j=1}^{v}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
Where:
b = Number of blocks.
Ȳ.j = Mean of the jth treatment.
Its expected value is:
$$E[SS_{Treatments}] = (v-1)\sigma^2 + b\sum_{j=1}^{v}\tau_j^2$$
Where:
σ 2 = Error variance.
τj2 = Variance of the treatment effects.
The block sum of squares (SS_Blocks) measures the variation in the responses due to the blocks. It is computed as:
$$SS_{Blocks} = v\sum_{i=1}^{b}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
Where:
v = Number of treatments.
Ȳi. = Mean of the ith block.
Its expected value is:
$$E[SS_{Blocks}] = (b-1)\sigma^2 + v\sum_{i=1}^{b}\beta_i^2$$
Where:
σ 2 = Error variance.
βi2 = Variance of the block effects.
The error sum of squares (SS_Error) measures the variation due to random error. It is computed as:
$$SS_{Error} = \sum_{i=1}^{b}\sum_{j=1}^{v}(Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..})^2$$
Where the summand is the residual for the ith block and jth treatment after removing the block and treatment effects.
$$E[SS_{Error}] = (b-1)(v-1)\sigma^2$$
Where:
σ 2 = Error variance.
b = Number of blocks.
v = Number of treatments.
To summarize, the expected values for the sums of squares in an RBD are:
$$E[TSS] = (bv-1)\sigma^2 + b\sum_{j=1}^{v}\tau_j^2 + v\sum_{i=1}^{b}\beta_i^2$$
$$E[SS_{Treatments}] = (v-1)\sigma^2 + b\sum_{j=1}^{v}\tau_j^2$$
$$E[SS_{Blocks}] = (b-1)\sigma^2 + v\sum_{i=1}^{b}\beta_i^2$$
$$E[SS_{Error}] = (b-1)(v-1)\sigma^2$$
Conclusion:
These expected values provide a deeper understanding of the sources of variation in the RBD and help in the interpretation of
the results through ANOVA. The treatment and block sum of squares reflect the impact of treatments and blocks on the variation
in the response, while the error sum of squares quantifies the random variation that is not explained by the treatments or blocks.
Q8 (S18, F): Obtain the least square estimates of different effects involved in the linear model of
an R.B.D. Derive an expression for S.S.E. (Sum of Squares for Error) in R.B.D. in terms of T.S.S.
(Total Sum of Squares), S.S.B. (Sum of Squares Between blocks), and S.S.T. (Sum of Squares Total).
Answer:
In a Randomized Block Design (RBD), the experiment consists of b blocks and v treatments. The linear model for the observed
response Yij in the RBD is given by:
Yij = μ + τj + βi + ϵij
Where:
Yij = Observed response in the ith block under the jth treatment.
μ = Overall mean; τj = effect of the jth treatment; βi = effect of the ith block; ϵij = random error.
In the RBD, the least square estimates are obtained by minimizing the sum of squared deviations between the observed values
and the expected values based on the model.
The least square estimates for the treatment and block effects are given by:
Estimated Treatment Effects (τ̂j):
$$\hat{\tau}_j = \frac{1}{b}\sum_{i=1}^{b} Y_{ij} - \bar{Y}_{..}$$
Where:
b = Number of blocks.
Yˉ.. = Grand mean of all observations.
Estimated Block Effects (β̂i):
$$\hat{\beta}_i = \frac{1}{v}\sum_{j=1}^{v} Y_{ij} - \bar{Y}_{..}$$
Where:
v = Number of treatments.
Overall Mean (μ̂):
$$\hat{\mu} = \frac{1}{bv}\sum_{i=1}^{b}\sum_{j=1}^{v} Y_{ij}$$
Where:
3. Sum of Squares for Error (S.S.E.):
The Sum of Squares for Error (SSE) measures the variability in the data that cannot be explained by the treatment and block
effects. It is computed as the difference between the Total Sum of Squares (TSS) and the sum of squares due to treatments and
blocks.
$$TSS = \sum_{i=1}^{b}\sum_{j=1}^{v}(Y_{ij} - \bar{Y}_{..})^2$$
$$SS_{Blocks} = v\sum_{i=1}^{b}(\bar{Y}_{i.} - \bar{Y}_{..})^2$$
$$SS_{Treatments} = b\sum_{j=1}^{v}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
The error sum of squares (SSE) is the difference between the Total Sum of Squares (TSS) and the sum of the Treatment Sum of
Squares (SS_Treatments) and Block Sum of Squares (SS_Blocks):
$$SSE = \sum_{i=1}^{b}\sum_{j=1}^{v}(Y_{ij} - \hat{\mu} - \hat{\beta}_i - \hat{\tau}_j)^2$$
This represents the residual sum of squares after accounting for the block and treatment effects.
From the above, the expression for the Sum of Squares for Error (SSE) in terms of the Total Sum of Squares (TSS), the Sum of Squares for Blocks (SS_Blocks), and the Sum of Squares for Treatments (SS_Treatments) is:
$$SSE = TSS - SS_{Blocks} - SS_{Treatments}$$
5. Final Formula for SSE:
$$SSE = TSS - SS_{Blocks} - SS_{Treatments}$$
The Sum of Squares for Error (SSE) quantifies the variation that remains unexplained after considering the effects of treatments and blocks. It provides an estimate of the random error or noise in the experiment.
Conclusion:
This derivation of the Sum of Squares for Error (SSE) and the least square estimates for the effects of treatments and blocks in
the Randomized Block Design (RBD) provides valuable insight into the variability in the experiment. The SSE reflects the
unexplained variability in the data and is crucial for assessing the significance of the treatment effects in the presence of block
effects.
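The least-squares estimates and the SSE identity can be verified numerically in Python; the 3-block, 4-treatment data table below is invented for the sketch:

```python
# Least-squares estimates in an RBD and the identity
# SSE = TSS - SS_Treatments - SS_Blocks, on invented data.
data = [
    [8.0, 9.0, 7.0, 10.0],     # block 1, treatments T1..T4
    [11.0, 12.0, 10.0, 13.0],  # block 2
    [9.0, 11.0, 8.0, 12.0],    # block 3
]
b, v = len(data), len(data[0])          # blocks, treatments
mu_hat = sum(map(sum, data)) / (b * v)  # estimate of the overall mean
tau_hat = [sum(data[i][j] for i in range(b)) / b - mu_hat for j in range(v)]
beta_hat = [sum(data[i]) / v - mu_hat for i in range(b)]

tss = sum((data[i][j] - mu_hat) ** 2 for i in range(b) for j in range(v))
ss_tr = b * sum(t ** 2 for t in tau_hat)
ss_bl = v * sum(be ** 2 for be in beta_hat)
sse = sum((data[i][j] - mu_hat - tau_hat[j] - beta_hat[i]) ** 2
          for i in range(b) for j in range(v))
```

The estimated effects sum to zero, and the residual sum of squares equals TSS minus the treatment and block sums of squares, exactly as derived above.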
Q9 (S18, G): Derive an expression for the expected value of the sum of squares due to treatments
in R.B.D.
Answer:
In a Randomized Block Design (RBD), we are given b blocks and v treatments. The linear model for the observed response Yij in
Yij = μ + τj + βi + ϵij
Where:
μ = Overall mean.
τj = Effect of the jth treatment.
βi = Effect of the ith block.
ϵij = Random error.
The treatments τj are fixed effects, and the error terms ϵij are the random errors.
The Sum of Squares due to Treatments (SS_Treatments) measures the variation in the data that is attributed to the treatments. It
is given by:
$$SS_{Treatments} = b\sum_{j=1}^{v}(\bar{Y}_{.j} - \bar{Y}_{..})^2$$
Where:
b = Number of blocks.
Ȳ.j = Mean of the jth treatment across all blocks.
The treatment sum of squares quantifies the variation in the data due to differences between the treatment means.
To derive the expected value of the Sum of Squares due to Treatments (SS_Treatments), we start by recognizing that the total
variation in the observed data can be partitioned into the variation due to treatments, blocks, and the error.
The expected value of the Sum of Squares due to Treatments is derived as follows:
Averaging the model over the b blocks, the mean of the jth treatment is:
$$\bar{Y}_{.j} = \mu + \tau_j + \frac{1}{b}\sum_{i=1}^{b}\epsilon_{ij} = \mu + \tau_j + \bar{\epsilon}_{.j}$$
Thus, the variation due to the jth treatment is composed of the treatment effect τj and the averaged random errors.
Now, the expected value of the sum of squares due to treatments can be expressed as:
$$E(SS_{Treatments}) = b\sum_{j=1}^{v}E\left((\bar{Y}_{.j} - \bar{Y}_{..})^2\right)$$
Since Ȳ.j − Ȳ.. = τj + ε̄.j − ε̄.. (using the usual constraint Στj = 0), the cross terms vanish in expectation, and
$$E\left((\bar{\epsilon}_{.j} - \bar{\epsilon}_{..})^2\right) = \frac{\sigma^2}{b} - \frac{\sigma^2}{bv} = \frac{(v-1)\sigma^2}{bv}$$
Therefore:
$$E(SS_{Treatments}) = b\sum_{j=1}^{v}\tau_j^2 + bv\cdot\frac{(v-1)\sigma^2}{bv} = (v-1)\sigma^2 + b\sum_{j=1}^{v}\tau_j^2$$
This is the expected value of the sum of squares due to treatments in the Randomized Block Design.
4. Conclusion:
The expected value of the sum of squares due to treatments in an RBD is:
$$E(SS_{Treatments}) = (v-1)\sigma^2 + b\sum_{j=1}^{v}\tau_j^2$$
This result reflects the contribution of the treatment effects to the overall variation in the experiment, considering b blocks and v
treatments.
Q10 (S18, H): Derive an expression for the estimate of the relative efficiency of R.B.D. over C.R.D.
Answer:
Relative efficiency (RE) is a measure used to compare two experimental designs. It quantifies how much more efficient one
design is compared to another. In the case of comparing a Randomized Block Design (RBD) with a Completely Randomized
Design (CRD), the relative efficiency helps to understand how much more precise the RBD is in estimating treatment effects
compared to the CRD.
The relative efficiency (RE) of RBD over CRD is defined as the ratio of the variance of the treatment effects in CRD to the variance
of the treatment effects in RBD.
In a Completely Randomized Design (CRD), the treatments are assigned randomly across all experimental units without any
blocking. The model for CRD is:
Yij = μ + τj + ϵij
Where:
μ = Overall mean.
τj = Effect of the jth treatment.
ϵij = Random error.
The variance of treatment effects in CRD is the variance of the treatment means. The expected value of the treatment sum of
squares (SS_Treatment) is:
$$E(SS_{Treatment})_{CRD} = \frac{v}{n}\sigma^2 + \frac{n}{v}\tau_j^2$$
Where:
v = Number of treatments.
n = Number of replications.
σ 2 = Error variance.
τj2 = Treatment variance.
$$\text{Var(Treatment effects in CRD)} = \frac{\sigma^2}{n} + \frac{\tau_j^2}{v}$$
In a Randomized Block Design (RBD), the treatments are assigned randomly within each block. The model for RBD is:
Yij = μ + τj + βi + ϵij
Where:
μ = Overall mean.
τj = Effect of the jth treatment.
βi = Effect of the ith block.
ϵij = Random error.
In RBD, the variance of treatment effects is smaller because the block effects are taken into account. The variance of treatment
effects in RBD is given by:
$$\text{Var(Treatment effects in RBD)} = \frac{\sigma^2}{b} + \frac{\tau_j^2}{v}$$
Where:
b = Number of blocks.
Now, we can calculate the relative efficiency (RE) of RBD over CRD by taking the ratio of the variances:
$$RE_{RBD\text{ vs }CRD} = \frac{\dfrac{\sigma^2}{n} + \dfrac{\tau_j^2}{v}}{\dfrac{\sigma^2}{b} + \dfrac{\tau_j^2}{v}}$$
This expression gives the relative efficiency of RBD in comparison to CRD, considering the variances associated with the
treatments and the error terms in both designs.
5. Conclusion:
The relative efficiency of RBD over CRD is:
$$RE_{RBD\text{ vs }CRD} = \frac{\dfrac{\sigma^2}{n} + \dfrac{\tau_j^2}{v}}{\dfrac{\sigma^2}{b} + \dfrac{\tau_j^2}{v}}$$
This ratio shows how much more efficient the RBD design is, considering the variance of the treatment effects in both
experimental setups. A higher value of RERBD vs CRD indicates that the RBD is more efficient than the CRD.
Answer:
A Latin Square Design (LSD) is a special experimental design used to control for two sources of variability. It arranges treatments
in a square grid such that each treatment appears only once in each row and each column.
Y_{ij(k)} = μ + ρi + λj + τk + ϵij
Where:
μ = Overall mean.
ρi = Effect of the ith row.
λj = Effect of the jth column.
τk = Effect of the kth treatment (the treatment appearing in cell (i, j)).
ϵij = Random error.
In a 5x5 Latin Square Design, there are 5 treatments arranged in a 5x5 grid. Each treatment appears exactly once in each row
and each column.
T1 T2 T3 T4 T5
T2 T3 T4 T5 T1
T3 T4 T5 T1 T2
T4 T5 T1 T2 T3
T5 T1 T2 T3 T4
Where each treatment T1–T5 appears exactly once in every row and every column.
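The cyclic layout above can be generated and checked in a couple of lines of Python:

```python
# Build the cyclic v x v Latin square and verify the defining property:
# each treatment appears exactly once in every row and every column.
v = 5
treatments = [f"T{i+1}" for i in range(v)]
square = [[treatments[(i + j) % v] for j in range(v)] for i in range(v)]

for row in square:
    print(" ".join(row))
```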
3. Assumptions of LSD:
1. The experimental units are homogenous within each row and column.
2. The treatments are applied randomly within each row and column.
3. The error terms are independent and normally distributed with mean zero and constant variance.
4. The effects of treatments, rows, and columns are additive and linear.
LSD is often used in field experiments where there are two sources of variability, such as:
1. Agricultural experiments: Where rows may represent different soil types and columns may represent different planting
methods.
2. Industrial trials: Where rows could correspond to different machine operators and columns could represent different
production methods.
3. Psychological experiments: Where rows may represent different groups of subjects and columns represent different
conditions.
| Source of Variation | Degrees of Freedom (df) | Sum of Squares (SS) | Mean Square (MS) | F-statistic |
|---|---|---|---|---|
| Treatments | v − 1 | SSTreatment | MSTreatment | MSTreatment / MSError |
| Rows | v − 1 | SSRows | MSRows | MSRows / MSError |
| Columns | v − 1 | SSColumns | MSColumns | MSColumns / MSError |
| Error | (v − 1)(v − 2) | SSError | MSError | |
| Total | v² − 1 | SSTotal | | |
Where:
v = Number of treatments (also the number of rows and columns in a square matrix).
SSTreatment = Sum of squares due to treatments.
$$SS_{Treatment} = v\sum_{i=1}^{v}(\bar{Y}_{i(T)} - \bar{Y}_{..})^2$$
where Ȳ_{i(T)} is the mean of the ith treatment (each treatment occurs v times in the square).
$$MS_{Treatment} = \frac{SS_{Treatment}}{v-1}$$
$$F_{Treatment} = \frac{MS_{Treatment}}{MS_{Error}}$$
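A sketch of these computations in Python, using a cyclic 4×4 square as the treatment plan; the responses y are invented, and the error degrees of freedom are the standard (v − 1)(v − 2):

```python
# ANOVA sums of squares for a v x v Latin square (illustrative data).
# y[i][j] = response in row i, column j; trt[i][j] = treatment applied there.
v = 4
trt = [["T1", "T2", "T3", "T4"],
       ["T2", "T3", "T4", "T1"],
       ["T3", "T4", "T1", "T2"],
       ["T4", "T1", "T2", "T3"]]
y = [[10.0, 14.0, 9.0, 12.0],
     [15.0, 8.0, 13.0, 11.0],
     [9.0, 12.0, 10.0, 16.0],
     [13.0, 11.0, 15.0, 8.0]]

grand = sum(map(sum, y)) / v**2
row_means = [sum(r) / v for r in y]
col_means = [sum(y[i][j] for i in range(v)) / v for j in range(v)]
trt_vals = {}
for i in range(v):
    for j in range(v):
        trt_vals.setdefault(trt[i][j], []).append(y[i][j])
trt_means = {t: sum(vals) / v for t, vals in trt_vals.items()}

ss_total = sum((y[i][j] - grand) ** 2 for i in range(v) for j in range(v))
ss_rows = v * sum((m - grand) ** 2 for m in row_means)
ss_cols = v * sum((m - grand) ** 2 for m in col_means)
ss_trt = v * sum((m - grand) ** 2 for m in trt_means.values())
ss_error = ss_total - ss_rows - ss_cols - ss_trt   # df = (v-1)(v-2)

# By orthogonality of rows, columns, and treatments in a Latin square,
# ss_error equals the direct residual sum of squares:
residual_ss = sum((y[i][j] - row_means[i] - col_means[j]
                   - trt_means[trt[i][j]] + 2 * grand) ** 2
                  for i in range(v) for j in range(v))

ms_trt = ss_trt / (v - 1)
ms_error = ss_error / ((v - 1) * (v - 2))
f_trt = ms_trt / ms_error
```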
Merits:
1. Control of two sources of variability: LSD helps control for row and column effects, making it useful when two sources of
variability are present.
2. Efficient use of experimental units: Each treatment is applied only once in each row and column, reducing the number of
experimental units required compared to a full factorial design.
Demerits:
1. Limited to square designs: The number of rows and columns must be equal, which can be restrictive.
2. Requires a balanced number of treatments: All treatments must appear in each row and column, limiting the flexibility of
the design.
3. Not ideal for large datasets: It may not be practical for very large datasets, as the size of the Latin square grows rapidly.
Answer:
In a 4×4 Latin Square Design (LSD), there are 4 treatments arranged in a 4×4 square grid. Each treatment appears exactly once in
each row and each column.
T1 T2 T3 T4
T2 T3 T4 T1
T3 T4 T1 T2
T4 T1 T2 T3
Where each treatment T1–T4 appears exactly once in every row and every column.
The mathematical model for the Latin Square Design (LSD) is:
Y_{ij(k)} = μ + ρi + λj + τk + ϵij
Where:
μ = Overall mean.
ρi = Effect of the ith row.
λj = Effect of the jth column.
τk = Effect of the kth treatment (the treatment in cell (i, j)).
ϵij = Random error.
The main idea behind this model is to control for row and column effects by assigning treatments in a square matrix, ensuring
that each treatment appears once per row and column.
Source of Variation Degrees of Freedom (df) Sum of Squares (SS) Mean Square (MS) F-statistic
Treatments v − 1 SSTreatment MSTreatment = SSTreatment/(v − 1) FTreatment = MSTreatment/MSError
Rows v − 1 SSRows MSRows = SSRows/(v − 1) FRows = MSRows/MSError
Columns v − 1 SSColumns MSColumns = SSColumns/(v − 1) FColumns = MSColumns/MSError
Error (v − 1)(v − 2) SSError MSError = SSError/[(v − 1)(v − 2)]
Total v² − 1 SSTotal
Where:
v = Number of treatments (also the number of rows and columns in the square matrix).
SSTreatment = Sum of squares due to treatments.
4. Formulae for Sum of Squares, Mean Sum of Squares, and F-statistic:
With Ti the total of the ith treatment, G the grand total, and r (= v) the number of times each treatment occurs in the square:
SSTreatment = (1/r) Σ_{i=1}^{v} Ti² − G²/v²
MSTreatment = SSTreatment / (v − 1)
FTreatment = MSTreatment / MSError
The expected value of the Mean Square for Treatments (MSTreatment) is given by:
E(MSTreatment) = σ² + (r / (v − 1)) Σ_{i=1}^{v} τi²
Where σ² is the error variance, τi is the effect of the ith treatment, and r is the number of replicates of each treatment (r = v in a Latin square).
This formula shows the contribution of both the treatment variance and the error variance to the mean square due to
treatments.
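The decomposition can be computed directly from data. A minimal sketch on hypothetical 4×4 yield data (the numbers and the cyclic treatment layout are invented for illustration):

```python
# Hypothetical 4x4 LSD: yields[i][j] is the observation in row i, column j;
# letters[i][j] gives the treatment index assigned to that cell (cyclic layout).
letters = [[(i + j) % 4 for j in range(4)] for i in range(4)]
yields = [[25, 23, 20, 20],
          [19, 24, 21, 18],
          [19, 14, 17, 20],
          [17, 18, 13, 17]]

v = 4
N = v * v
G = sum(sum(row) for row in yields)          # grand total
CF = G ** 2 / N                              # correction factor G^2 / v^2

row_tot = [sum(r) for r in yields]
col_tot = [sum(yields[i][j] for i in range(v)) for j in range(v)]
trt_tot = [0.0] * v
for i in range(v):
    for j in range(v):
        trt_tot[letters[i][j]] += yields[i][j]

SS_total = sum(y * y for r in yields for y in r) - CF
SS_rows = sum(t * t for t in row_tot) / v - CF
SS_cols = sum(t * t for t in col_tot) / v - CF
SS_trt = sum(t * t for t in trt_tot) / v - CF
SS_err = SS_total - SS_rows - SS_cols - SS_trt   # error by subtraction

df_err = (v - 1) * (v - 2)                   # = 6 for a 4x4 square
MS_trt = SS_trt / (v - 1)
MS_err = SS_err / df_err
F_trt = MS_trt / MS_err                      # small here: no real effect in this toy data
print(SS_trt, SS_err, F_trt)
```

Each SS uses the usual computing form (sum of squared totals over the group size, minus CF), and SS_Error falls out by subtraction.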
Control of two sources of variation: LSD controls for both row and column effects, making it useful when there are two
sources of variability.
Efficient use of experimental units: Each treatment is applied only once in each row and column, reducing the number of
experimental units compared to a full factorial design.
Balanced design: LSD ensures that each treatment is equally represented across rows and columns.
Limited to square designs: The number of rows and columns must be equal, which restricts its use.
Requires a balanced number of treatments: All treatments must appear in each row and column, which may not be
possible with unequal numbers of treatments.
Not ideal for large datasets: It may not be practical for very large datasets, as the size of the Latin square increases rapidly.
Assumption of homogeneity: The design assumes that rows and columns are homogeneous, which may not always hold in
real-world scenarios.
Answer:
A Latin Square Design (LSD) is a type of experimental design used when there are two sources of variability (like rows and
columns) that need to be controlled, and there are r treatments to be applied in a way that each treatment appears only once in
each row and once in each column.
LSD is typically used when the researcher wants to control two types of extraneous variables (e.g., rows and columns) while
testing the treatment effects. It can be viewed as an extension of the Randomized Block Design (RBD) with a second blocking factor.
In a r × r Latin Square, there are r treatments arranged in a square matrix with r rows and r columns, such that each treatment
appears exactly once in each row and once in each column.
A possible layout for a 4×4 Latin Square Design (LSD) with 4 treatments (T1, T2, T3, T4) is:
T1 T2 T3 T4
T2 T3 T4 T1
T3 T4 T1 T2
T4 T1 T2 T3
In this layout:
Each treatment appears exactly once in every row and every column.
Yij = μ + τi + λj + ϵij
Where μ is the overall mean, τi is the effect of the ith treatment, λj is the effect of the jth row, and ϵij is the random error.
This model assumes that the observed values are influenced by the overall mean, the treatment effects, the row effects, and
random error.
The Analysis of Variance (ANOVA) table for the Latin Square Design (LSD) with r treatments is structured as follows:
Source of Variation Degrees of Freedom (df) Sum of Squares (SS) Mean Squares (MS) F-statistic
Treatments r − 1 SSTreatments MSTreatments = SSTreatments/(r − 1) FTreatments = MSTreatments/MSError
Rows r − 1 SSRows MSRows = SSRows/(r − 1) FRows = MSRows/MSError
Columns r − 1 SSColumns MSColumns = SSColumns/(r − 1) FColumns = MSColumns/MSError
Error (r − 1)(r − 2) SSError MSError = SSError/[(r − 1)(r − 2)]
Total r² − 1 SSTotal
Where:
SS = Sum of Squares.
MS = Mean Square.
F = F-statistic.
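The df column can be sanity-checked programmatically (a small sketch; the function name is invented here):

```python
# df partition for an r x r Latin Square: treatments, rows, and columns each
# get r - 1 df; error gets (r - 1)(r - 2); together they sum to r^2 - 1.
def lsd_df(r):
    df = {"treatments": r - 1, "rows": r - 1, "columns": r - 1,
          "error": (r - 1) * (r - 2)}
    df["total"] = sum(df.values())
    return df

print(lsd_df(4))  # error df = 6, total = 15 = 4^2 - 1
```

For any r, the four components add up to r² − 1, confirming the table's bookkeeping.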
The expected values of the Mean Squares for Treatments, Rows, and Columns are as follows:
For Treatments:
E(MSTreatments) = σ² + (r / (r − 1)) Σ τi²
Where τi is the effect of the ith treatment.
For Rows:
E(MSRows) = σ² + (r / (r − 1)) Σ ρj²
Where ρj is the effect of the jth row.
For Columns:
E(MSColumns) = σ² + (r / (r − 1)) Σ θk²
Where θk is the effect of the kth column.
For Error:
E(MSError) = σ²
This assumes that the error variance is the same across all sources of variation.
Control of Two Sources of Variability: The Latin Square Design allows for the control of two sources of variation (rows and
columns), which reduces experimental error.
Efficient Use of Experimental Units: Each treatment is applied once per row and column, making the design efficient.
Balanced Design: It is a balanced design where each treatment appears equally often across rows and columns.
Minimization of Experimental Error: By controlling both row and column effects, LSD minimizes uncontrolled variability.
Limited to Square Designs: LSD is restricted to situations where the number of treatments equals the number of rows and
columns. This may not be feasible for larger numbers of treatments.
Homogeneity Assumption: The design assumes homogeneity within rows and columns. If this assumption does not hold,
the results may be biased.
Not Suitable for Larger Datasets: As the number of treatments increases, the design can become cumbersome and
inefficient.
Interference Between Treatments: If there are interactions between treatments, rows, or columns, the design may not
provide reliable results.
Control of Sources of Variation: LSD controls for two sources of variability (rows and columns) simultaneously, while RBD
controls for only one (blocks).
Flexibility: RBD is more flexible because it can handle more blocks or treatments, while LSD is limited to designs where the
number of treatments equals the number of rows and columns.
Efficiency: LSD is more efficient when both rows and columns are important sources of variation, but RBD may be more
appropriate when additional blocking factors are necessary.
Let's start with Q14.
Answer:
The general layout for a r × r Latin Square, where there are r treatments, can be shown as a square matrix where each
treatment appears once in each row and column.
Yij = μ + τi + λj + ϵij
Where:
Yij = The observed value for the ith row and j th column.
μ = The overall mean.
τi = The effect of the treatment applied in the (i, j)th cell.
λj = The effect of the j th column.
ϵij = The random error term for the (i, j)th position.
This model assumes that the observed value is the result of the overall mean, the treatment effects, the row effects, and random
error. The goal is to estimate the effects of treatments and rows, controlling for the row and column sources of variability.
Answer:
To derive the expected value of the Mean Sum of Squares due to treatments (MSTreatment) in a Latin Square Design (LSD), we
start with the analysis of variance (ANOVA) model.
Treatments (T)
Rows (R)
Columns (C)
Error (E)
Now, we focus on deriving the expected value for the Mean Sum of Squares due to treatments (MSTreatment).
SSTreatment = r · Σ_{i=1}^{r} (Ȳi· − Ȳ··)²
Where Ȳi· is the mean of the observations receiving the ith treatment and Ȳ·· is the grand mean.
MSTreatment = SSTreatment / (r − 1)
Now, let's calculate the expected value of MSTreatment. Since we have two sources of variation here (treatment and error), the
expected value of MSTreatment is given by:
E(MSTreatment) = σ² + (r / (r − 1)) Σ_{i=1}^{r} τi²
Where σ² is the error variance and τi is the effect of the ith treatment.
Thus, the expected value of the MSTreatment involves both the random error variance and the treatment variance.
This is the expected value of the Mean Sum of Squares due to treatments in a Latin Square Design (LSD).
Should I proceed with the solution to Q16?
Answer:
To test the significance of the difference between two treatment means in a Latin Square Design (LSD), we follow these steps:
Null Hypothesis (H₀): There is no significant difference between the means of the two treatments.
H0: μi = μj
(where μi and μj are the treatment means)
Alternative Hypothesis (H₁): There is a significant difference between the means of the two treatments.
H1: μi ≠ μj
Step 2: Compute the difference between the two treatment means:
D = Ȳi − Ȳj
Where Ȳi and Ȳj are the observed means of treatments i and j.
Step 3: Compute the standard error of the difference:
SE(D) = √(2 · MSE / r)
Where MSE is the error mean square from the ANOVA table and r is the number of replicates of each treatment (the order of the square).
Step 4: Calculate the Test Statistic
The test statistic (t-value) for the significance of the difference is computed using the following formula:
t = D / SE(D)
Where D is the difference between the two treatment means and SE(D) is its standard error.
Step 5: Determine the Degrees of Freedom
The t-statistic is referred to the t-distribution with the error degrees of freedom:
dfError = (r − 1)(r − 2)
Step 6: Compare with the Critical Value
Compare |t| with the critical t-value at the chosen significance level (e.g., α = 0.05).
Step 7: Conclusion
Based on the comparison with the critical t-value, we conclude whether or not there is a significant difference between the two
treatment means. If we reject H0 , we can conclude that there is a significant difference between the two treatments. If we fail to
reject H0 , we conclude that there is no significant difference between the two treatment means.
This is the procedure for testing the significance of the difference between two treatment means in a Latin Square Design (LSD).
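The steps above can be sketched numerically. The treatment means and MSE below are hypothetical, and the critical value 2.447 is the tabulated two-sided t(0.05) for 6 df:

```python
import math

# Test the difference between two treatment means in a 4x4 LSD (toy numbers).
r = 4                    # number of treatments = order of the square
mean_i, mean_j = 21.5, 18.0
MSE = 8.8125             # hypothetical error mean square from the ANOVA table

D = mean_i - mean_j                 # difference of the two means
SE = math.sqrt(2 * MSE / r)         # SE(D) = sqrt(2 * MSE / r)
t = D / SE                          # test statistic
df_error = (r - 1) * (r - 2)        # = 6 for a 4x4 square

t_crit = 2.447  # tabulated t for alpha = 0.05 (two-sided), 6 df
print(round(t, 3), "significant" if abs(t) > t_crit else "not significant")
```

With these numbers |t| falls below the critical value, so H0 would not be rejected.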
Answer:
The Relative Efficiency of a Randomized Block Design (RBD) with respect to a Completely Randomized Design (CRD) compares
how efficient the RBD is in reducing the error variance compared to the CRD. It is calculated as the ratio of the error mean square
(MSE) for the CRD to the error mean square for the RBD.
Step 1: Define the Mean Square Error for CRD and RBD
For both designs, we need to understand the following:
In CRD: The error variance is σ², since no blocking is used.
In RBD: The error variance is also derived from σ², but the blocks are used to reduce the effective error variance.
Step 2: Write the Error Mean Squares
MSECRD = σ²
(Since there is no blocking, the error variance is just σ², the general error term.)
MSERBD = σ² / b
Where b is the number of blocks.
This formula reflects the simplifying assumption used here: blocking in RBD reduces the error variance, leading to a smaller error mean square.
Step 3: Form the Ratio
RERBD/CRD = MSECRD / MSERBD = σ² / (σ²/b) = b
Step 4: Interpretation of Relative Efficiency
The relative efficiency of RBD with respect to CRD is equal to the number of blocks b. This means that by using blocks, the
efficiency of the experiment increases by a factor of b. The more blocks there are, the more efficient the design becomes in
reducing variability compared to the CRD.
Conclusion:
The relative efficiency of RBD with respect to CRD is:
RERBD/CRD = b
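Under the text's simplified assumption (MSE_CRD = σ² and MSE_RBD = σ²/b), the ratio can be checked in two lines; the numbers below are illustrative:

```python
# Relative efficiency as the ratio of error mean squares (text's simplification).
def relative_efficiency(mse_crd, mse_rbd):
    return mse_crd / mse_rbd

sigma2, b = 12.0, 5
re = relative_efficiency(sigma2, sigma2 / b)   # reduces to b under this model
print(re)
```

In practice both MSEs would be estimated from the data; the closed-form result RE = b holds only under the stylized variance model stated above.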
Answer:
A Latin Square design arranges treatments in such a way that each treatment appears only once in each row and each column of
the layout, making it a balanced design that controls both row and column effects.
T1 T2 T3 T4
T2 T3 T4 T1
T3 T4 T1 T2
T4 T1 T2 T3
3. Mathematical Model of LSD
The mathematical model for a Latin Square design is:
Yijk = μ + αi + βj + τk + ϵijk
Where:
Yijk = Observation in the i-th row, j -th column, and k -th treatment.
μ = Overall mean.
αi = Effect of the i-th row.
βj = Effect of the j-th column.
τk = Effect of the k-th treatment.
ϵijk = Random error term.
In this model, the row, column, and treatment effects enter additively, so each source of variation can be separated in the analysis.
Source of Variation Degrees of Freedom (df) Sum of Squares (SS) Mean Squares (MS) F-Statistic
Treatments r − 1 SSTreatments MSTreatments = SSTreatments/(r − 1) FTreatments = MSTreatments/MSError
Rows r − 1 SSRows MSRows = SSRows/(r − 1) FRows = MSRows/MSError
Columns r − 1 SSColumns MSColumns = SSColumns/(r − 1) FColumns = MSColumns/MSError
Error (r − 1)(r − 2) SSError MSError = SSError/[(r − 1)(r − 2)]
Total r² − 1 SSTotal
Where:
SSTreatments , SSRows , SSColumns , SSError = Sums of squares for treatments, rows, columns, and error.
Efficient with Few Treatments: For a small number of treatments, LSD offers better control over variability compared to
other designs like CRD and RBD.
Minimizes Confounding: It reduces the confounding between treatments, rows, and columns.
Difficult to Extend: It is hard to extend LSD to more than two sources of variability (i.e., more than two blocking factors).
Requires Equal Number of Treatments, Rows, and Columns: It requires the number of rows to equal the number of
columns and the number of treatments to be the same. This may not always be possible in practical experiments.
RBD is more adaptable when there is a need for more than two blocking factors, whereas LSD is restricted to only two
blocking factors (rows and columns).
LSD can be more efficient in some cases, but RBD might be preferred when the block sizes are unequal.
Answer:
1. Relative Efficiency (RE) of R.B.D. (Randomized Block Design) with Respect to C.R.D. (Completely
Randomized Design)
The relative efficiency of two experimental designs (in this case, R.B.D. and C.R.D.) refers to the comparison of the efficiency of
one design over another, based on their variances. Efficiency compares how well a design can detect treatment effects under the
same conditions.
The relative efficiency between R.B.D. (Randomized Block Design) and C.R.D. (Completely Randomized Design) can be
expressed as the ratio of their error variances.
Variance of Error for R.B.D.: In an R.B.D., the error variance is reduced due to the blocking effect. The formula for the
variance of error in an R.B.D. is:
Variance of Error (RBD) = σ² / b
Where:
σ 2 is the overall error variance.
b is the number of blocks.
Variance of Error for C.R.D.: In a C.R.D., the error variance is larger because there is no blocking, so the variance is simply:
Variance of Error (CRD) = σ²
The ratio of these error variances is therefore σ² / (σ²/b) = b.
So, the relative efficiency of R.B.D. compared to C.R.D. is b, the number of blocks.
Conclusion:
R.B.D. is more efficient than C.R.D. by a factor of the number of blocks b.
As the number of blocks increases, the efficiency of the R.B.D. increases because blocking reduces the error variance.
Answer:
1. Control of Two Sources of Variability:
L.S.D. allows control over two sources of variability by arranging treatments in both rows and columns. This helps in
reducing the confounding effects that may arise from variation in rows or columns.
2. Efficiency with a Limited Number of Treatments:
When you have a limited number of treatments (r), the Latin Square design is very efficient as it minimizes the number
of experimental units while still controlling for two factors simultaneously.
3. Minimization of Error:
By ensuring that each treatment appears exactly once in each row and column, L.S.D. minimizes the error due to
variability in rows and columns, leading to more precise estimates.
4. Flexibility in Applications:
L.S.D. is flexible in applications, especially when it’s important to study the effects of treatments under the influence of
two blocking factors. It is useful in agricultural experiments and industrial settings.
5. Balanced Design:
Each treatment is assigned to each row and each column exactly once, ensuring that there is no bias due to the
positioning of treatments.
1. Limited to Square Designs:
L.S.D. can only be used when the number of treatments is equal to the number of rows and columns (r × r). This limits its
application to square experimental designs.
2. Complexity in Layout:
Setting up a Latin Square design can be complex, especially when the number of treatments (r) becomes large.
Designing the arrangement of treatments in rows and columns can become cumbersome.
3. Limited Blocking Factors:
L.S.D. can only control two blocking factors (rows and columns). If there are more factors influencing the experiment,
L.S.D. may not be the best choice as it lacks the ability to incorporate more than two blocking factors.
4. Susceptibility to Confounding:
If the number of treatments exceeds the number of rows and columns, or if additional factors are involved that are not
controlled in the design, L.S.D. may suffer from confounding issues, leading to biased results.
5. Homogeneity Assumption:
The Latin Square design assumes that all experimental units within a row or column are homogeneous in terms of the
experimental conditions, which may not always be the case in real-world settings.
Conclusion:
While the Latin Square Design is a powerful tool for minimizing error variance and controlling for two blocking factors, its
restrictions on the number of treatments and the complexity of layout make it less suitable for experiments with more than two
factors or a large number of treatments.
This concludes Q20. Would you like to move on to any other topics or questions?
It seems that we've covered all the questions you provided. Would you like to dive into more topics, or would you prefer to
explore any other areas related to your studies? Feel free to let me know!
STAT II UNIT 4
Sure! Let's start solving each question one by one.
Solution:
In a factorial experiment, factors are typically represented by levels. The experiment you're asking about is a 2³ design, which means it
has 3 factors (A, B, C), each at 2 levels (usually denoted as +1 and −1).
Orthogonality of Main Effects: Orthogonality means that the factors do not interfere with each other, i.e., the sum of their products
across the runs is zero.
For a 2³ design, the main effects A, B, and C are orthogonal if, for every pair of factors, the sum over all runs of the products of their coded levels is zero.
We can demonstrate this by looking at the design matrix for the factorial experiment. The general form of the matrix will be:
Run A B C
1 +1 +1 +1
2 +1 +1 −1
3 +1 −1 +1
4 +1 −1 −1
5 −1 +1 +1
6 −1 +1 −1
7 −1 −1 +1
8 −1 −1 −1
The main effects A, B, and C are estimated using the following formula (shown for A):
A = (1/8) Σ_{i=1}^{8} (Ai · Yi)
where Ai is the coded level (±1) of factor A in the ith run, and Yi is the observed output. The sums for B and C are calculated similarly.
Orthogonality: To show that the main effects are orthogonal, we need to show that the sum of the products of the main effects is zero:
8 8 8
∑ Ai Bi = 0,
∑ Bi Ci = 0, ∑ A i Ci = 0
From the design matrix, it is evident that the sum of the interactions is zero, which confirms that AA, BB, and CC are orthogonal.
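This orthogonality check can be run directly on the design matrix (a sketch; the run order matches the table above):

```python
from itertools import product

# Full 2^3 design matrix in the standard order of the table above:
# each run assigns +1 or -1 to the factors A, B, C.
runs = [dict(zip("ABC", levels)) for levels in product([+1, -1], repeat=3)]

# Pairwise orthogonality: the sum over all 8 runs of the product of any two
# factor columns is zero.
dots = {(f1, f2): sum(r[f1] * r[f2] for r in runs)
        for f1, f2 in [("A", "B"), ("B", "C"), ("A", "C")]}
print(dots)  # every value is 0
```

Because each pair of columns multiplies out to +1 and −1 equally often, the sums vanish, which is exactly the orthogonality condition stated above.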
Yates’ Method for Factorial Effect Totals: Yates’ method is used to calculate the main effects and interaction effects. It is based on
calculating the contrasts between the high and low levels of the factors in the design.
1. The first step is to write the treatment totals in standard order, as given by the design matrix.
2. Next, repeatedly form a new column whose first half contains the sums of adjacent pairs and whose second half contains their differences; after one pass per factor, the column holds the grand total followed by the factorial effect totals.
You can apply Yates’ method to determine the factorial effects systematically for A, B, C and their interactions.
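Yates' tabular method can be sketched in a few lines. The input totals must be in Yates standard order (1), a, b, ab, c, ac, bc, abc; the output gives the grand total followed by the contrast totals for A, B, AB, C, AC, BC, ABC (the totals below are invented for illustration):

```python
def yates(totals):
    # One pass per factor: the first half of the new column holds sums of
    # adjacent pairs, the second half holds differences (second minus first).
    col = list(totals)
    k = len(col).bit_length() - 1   # number of factors; len(totals) must be 2^k
    for _ in range(k):
        pairs = list(zip(col[0::2], col[1::2]))
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return col

# Hypothetical treatment totals in standard order (1), a, b, ab, c, ac, bc, abc:
print(yates([1, 2, 3, 4, 5, 6, 7, 8]))  # [36, 4, 8, 0, 16, 0, 0, 0]
```

The first entry (36) is the grand total; the remaining entries match the contrasts obtained by summing each total with its ± sign pattern.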
Would you like me to proceed to the next question?
Solution:
Linear Contrast:
A linear contrast is a weighted sum of the treatment means, where the weights sum to zero. A contrast allows the comparison of
different treatment conditions by assigning specific weights to each treatment level.
C = Σ_{i=1}^{k} wi · μi
where wi is the weight assigned to the ith treatment mean μi, and the weights sum to zero:
Σ_{i=1}^{k} wi = 0
Orthogonality of Contrasts:
Two contrasts are said to be orthogonal if the sum of the products of their weights is zero:
Σ_{i=1}^{k} wi · vi = 0
where wi and vi are the weights of the two contrasts being compared.
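Both conditions are easy to verify numerically (a sketch with illustrative weights on k = 4 treatment means):

```python
# w compares treatments {1,2} with {3,4}; v compares treatment 1 with treatment 2.
w = [1, 1, -1, -1]
v = [1, -1, 0, 0]

print(sum(w), sum(v))                    # 0 0  -> each is a valid contrast
print(sum(a * b for a, b in zip(w, v)))  # 0    -> the two contrasts are orthogonal
```

Any pair of weight vectors passing both checks defines independent comparisons of the treatment means.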
Consider a 2³ factorial experiment arranged in a Randomized Block Design (R.B.D.) with r replicates. In such an experiment, the main
effects A and B and the interaction effects AB and ABC are orthogonal if the sum of the products of their weights is zero.
Main effect of A: The main effect of factor A represents the average difference in the outcome due to changing the level of
factor A, holding other factors constant. In a 2³ design, this is represented by the contrast between the high (+1) and low (−1)
levels of factor A.
Main effect of B: Similarly, the main effect of factor B is the contrast between the high (+1) and low (−1) levels of factor B.
AB interaction effect: This represents the combined effect of factors A and B, showing how the levels of one factor modify the
response at the levels of the other factor.
ABC interaction effect: This represents the interaction between all three factors, showing how the levels of factors A, B, and C
interact to affect the outcome.
These effects are orthogonal because their contrasts are designed in such a way that the sum of the products of their weights across all
treatment combinations is zero.
The orthogonality of these effects can be shown using the design matrix. If we take the following design matrix for a 2³ factorial
experiment:
Run A B C Y
1 +1 +1 +1 Y1
2 +1 +1 −1 Y2
3 +1 −1 +1 Y3
4 +1 −1 −1 Y4
5 −1 +1 +1 Y5
6 −1 +1 −1 Y6
7 −1 −1 +1 Y7
8 −1 −1 −1 Y8
The orthogonality of main effects and interaction effects can be shown by calculating the sum of the products of the respective
contrasts across all runs. It turns out that the sum of these products is zero, thus proving their orthogonality.
To test the significance of the main effects and interaction effects, we can perform the following steps:
For each main effect and interaction, calculate the sum of squares based on the differences between the means of the
corresponding treatment groups.
The DF for each main effect is the number of levels minus 1 (for a 2-level factor, DF = 1).
The DF for interaction effects is calculated as the product of the degrees of freedom of the factors involved.
The Mean Square (MS) is calculated by dividing the Sum of Squares (SS) by the corresponding degrees of freedom (DF).
MS = SS / DF
4. F-statistic:
The F-statistic is calculated as the ratio of the Mean Square of the effect to the Mean Square of the error (residual).
F = MSEffect / MSError
Using the F-distribution table, compare the calculated F-statistic with the critical value at a chosen significance level (typically
0.05). If F is greater than the critical value, the effect is considered significant.
Solution:
1. Simple Experiments:
A simple experiment typically involves only one factor being studied at two or more levels. For example, testing the effect of a
single fertilizer type on plant growth.
The focus is on the effect of one factor at a time, and interactions between multiple factors are not considered.
Analysis is generally straightforward, and interactions or combined effects of factors are ignored.
2. Factorial Experiments:
Factorial experiments involve multiple factors being studied simultaneously. Each factor is tested at multiple levels, and the
combinations of these levels for all factors are included in the experiment.
Factorial designs allow the study of both the individual main effects of each factor as well as the interaction effects between
factors.
This type of experiment is more complex, but it provides richer information, including the effects of interactions between
factors.
1. Interaction Effects:
Factorial experiments allow the study of interactions between factors, which is often critical in understanding complex
systems.
In simple experiments, interactions are overlooked, which may lead to incomplete or misleading conclusions.
2. Efficiency:
Factorial experiments are more efficient because they allow for the simultaneous study of multiple factors. This means fewer
experiments are needed to gather more information.
3. Better Insight:
Since factorial experiments consider multiple factors, they provide a more comprehensive understanding of the system being
studied. This is especially useful when multiple factors contribute to the outcome.
4. Flexibility:
Factorial designs can accommodate more than two factors, offering flexibility in experimental setups.
In factorial experiments, both main effects and interactions can be analyzed, whereas in simple experiments, only main effects
can be observed.
Complete Statistical Analysis of a 2³ Factorial Experiment in Randomized Block Design (R.B.D.):
Let us consider a 2³ factorial experiment with 3 factors (A, B, C), each at 2 levels (high and low), the experiment being arranged
in a Randomized Block Design (R.B.D.) with r replicates.
1. Treatment Combinations: The treatment combinations for a 2³ factorial experiment are as follows:
(A, B, C) ∈ {(+1, +1, +1), (+1, +1, −1), (+1, −1, +1), (+1, −1, −1), (−1, +1, +1), (−1, +1, −1), (−1, −1, +1), (−1, −1, −1)}
2. Model for the Experiment: The general model for the 2³ factorial experiment in an R.B.D. can be written as:
Yijkl = μ + Ai + Bj + Ck + (AB)ij + (AC)ik + (BC)jk + (ABC)ijk + ρl + ϵijkl
where ρl is the effect of the lth block and:
Ai , Bj , Ck are the main effects.
3. Sum of Squares (SS): The next step is to calculate the Sum of Squares (SS) for each main effect and interaction, together with:
Error Sum of Squares (SS Error): The remaining variation after accounting for the block, main, and interaction effects.
4. Degrees of Freedom (DF): Each main effect and interaction in a 2³ experiment carries 1 df, blocks carry r − 1 df, and the error df is the total df minus the df used by these effects.
5. Mean Squares (MS): Each Mean Square is the Sum of Squares divided by its degrees of freedom:
MS = SS / df
6. F-statistics: The F-statistic for testing the significance of each effect is calculated as:
Feffect = MSeffect / MSError
7. ANOVA Table: An ANOVA table summarizes the variance analysis for the experiment.
Source of Variation Degrees of Freedom (DF) Sum of Squares (SS) Mean Square (MS) F-statistic
8. Significance Testing:
To test the significance of each effect (main or interaction), we compare the calculated F-statistic for that effect with the critical
value from the F-distribution table based on the chosen significance level (typically α = 0.05).
If the calculated F-statistic exceeds the critical value, the effect is considered statistically significant.
What is a treatment contrast? When are two such contrasts said to be orthogonal? In a 2² factorial experiment with two factors K
and N, define the main effects K and N, and the interaction effect KN. Obtain expressions for these effects. Show that:
Treatment Contrast:
A treatment contrast is a linear combination of the treatment means in an experiment. It represents the comparison of the effects of
different treatments or combinations of factors. It is typically used in statistical analysis to compare the impact of various factors or
levels of factors.
A treatment contrast for two factors, say A and B, with levels +1 and −1, can be written as:
C = Σi αi μi
where αi are coefficients and μi are the means corresponding to the treatment combinations. These coefficients are chosen to sum to zero.
Orthogonal Contrasts:
Two contrasts are said to be orthogonal if the sum of the products of their coefficients equals zero. In other words, two contrasts C1
and C2 are orthogonal if:
Σi αi1 αi2 = 0
where αi1 and αi2 are the coefficients for the contrasts C1 and C2, respectively.
Orthogonality implies that the contrasts are independent of each other, and testing one contrast does not affect the testing of the
other.
In a 2² factorial experiment, there are two factors, K and N, each at two levels, say +1 and −1. The treatments are represented by
combinations of the factors' levels: (K, N ) = {(+1, +1), (+1, −1), (−1, +1), (−1, −1)}.
Let’s define the main effects of K and N, and their interaction effect KN as follows:
Main Effects:
1. Main effect of K (K ):
The main effect of factor K represents the difference in the average response when K is at its high level (+1) versus its low level (−1),
irrespective of the level of N . The expression for K is:
K = (1/2) [(μ(K+1,N+1) + μ(K+1,N−1)) − (μ(K−1,N+1) + μ(K−1,N−1))]
2. Main effect of N (N ):
The main effect of factor N represents the difference in the average response when N is at its high level (+1) versus its low level (−1),
irrespective of the level of K . The expression for N is:
N = (1/2) [(μ(K+1,N+1) + μ(K−1,N+1)) − (μ(K+1,N−1) + μ(K−1,N−1))]
Interaction Effect:
The interaction effect KN represents the difference in the response between the combined effect of K and N at different levels. The
expression for the interaction effect is:
KN = (1/4) [(μ(K+1,N+1) − μ(K+1,N−1)) − (μ(K−1,N+1) − μ(K−1,N−1))]
Main effect K is a treatment contrast because it is a linear combination of the treatment means, comparing the two levels of
factor K while averaging over the levels of factor N.
Main effect N is also a treatment contrast for the same reason, comparing the two levels of factor N while averaging over the levels of factor K.
Interaction effect KN is a treatment contrast because it is a linear combination of the treatment means and compares the
combined effect of both factors at different levels.
Each of these effects involves a linear comparison between treatment combinations, making them treatment contrasts.
Taking the coefficients of K as (1, 1, −1, −1) and of N as (1, −1, 1, −1) over the four treatment combinations, the sum of their products is (1)(1) + (1)(−1) + (−1)(1) + (−1)(−1) = 1 − 1 − 1 + 1 = 0
Since the sum of the products is zero, the contrasts K and N are orthogonal.
Conclusion:
The main effects K and N and the interaction effect KN are all treatment contrasts because they are linear combinations of
treatment means.
The main effects K and N are orthogonal treatment contrasts because the sum of the products of their coefficients equals zero.
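The three effect expressions and the orthogonality check can be evaluated on hypothetical cell means (the μ values below are invented for illustration):

```python
# Hypothetical cell means mu[(K level, N level)] for the four treatment
# combinations, plugged into the expressions for K, N, and KN above.
mu = {(+1, +1): 20.0, (+1, -1): 16.0, (-1, +1): 14.0, (-1, -1): 12.0}

K = 0.5 * ((mu[+1, +1] + mu[+1, -1]) - (mu[-1, +1] + mu[-1, -1]))
N = 0.5 * ((mu[+1, +1] + mu[-1, +1]) - (mu[+1, -1] + mu[-1, -1]))
KN = 0.25 * ((mu[+1, +1] - mu[+1, -1]) - (mu[-1, +1] - mu[-1, -1]))

# Coefficients of K and N over the combinations (++, +-, -+, --):
ck, cn = [1, 1, -1, -1], [1, -1, 1, -1]
dot = sum(a * b for a, b in zip(ck, cn))
print(K, N, KN, dot)  # 5.0 3.0 0.5 0
```

The zero dot product of the coefficient vectors is the orthogonality of K and N shown in the conclusion above.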
Would you like to move on to the next question?
1. Simple Experiment:
It examines the effect of one independent variable (factor) on the dependent variable, keeping other factors constant.
Example: Testing the effect of fertilizer type on plant growth while keeping the watering method fixed.
2. Factorial Experiment:
Involves the study of two or more factors simultaneously, with all possible combinations of factor levels.
It examines the interactions between factors and their individual effects on the dependent variable.
Example: Testing the effect of fertilizer type and watering method together on plant growth.
Interaction Effects: Enables studying interactions between factors, which may be overlooked in simple experiments.
Flexibility: Can handle multiple levels of factors and multiple factors at the same time.
In a 2² factorial experiment, we have two factors (let's call them K and N) each at two levels, +1 and −1. The factorial design consists
of all possible combinations of these levels for both factors, leading to 4 treatment combinations:
Now, consider this experiment is arranged in a Randomized Block Design (R.B.D.). In this design, the blocks are used to account for
variability in the experimental units. Suppose the experiment is conducted with b blocks, and each treatment is randomly assigned to
the experimental units within the blocks.
1. Data Collection:
Collect observations for each treatment combination in each block. For example, the data could look like this:
Block Treatment Combination Observation
1 (+1, +1) y1
1 (+1, -1) y2
1 (-1, +1) y3
1 (-1, -1) y4
2 (+1, +1) y5
2 (+1, -1) y6
2 (-1, +1) y7
2 (-1, -1) y8
Sum of Squares for Treatments (SST): This measures the variation between treatment groups.
Sum of Squares for Blocks (SSB): This measures the variation between blocks.
Sum of Squares for Error (SSE): This measures the residual variation within treatments and blocks.
Then, the mean squares (MS) are computed by dividing the sum of squares by their respective degrees of freedom (df).
3. F-Statistic:
The F-statistic is computed by dividing the mean square for treatments by the mean square for error:
Ftreatment = MStreatment / MSerror
Similarly, we can calculate the F-statistic for blocks and interaction effects.
4. Hypothesis Testing:
Test the significance of the main effects and interaction effects using an F-test. If the F-statistic exceeds the critical value from
the F-distribution table, we reject the null hypothesis and conclude that the effects are significant.
Simple Experiments:
Focuses on determining the effect of one independent variable on the dependent variable.
Example: Measuring the effect of temperature on the rate of reaction, while holding pressure constant.
Factorial Experiments:
Can identify not only the main effects but also the interactions between the factors.
Example: Measuring the effect of both temperature and pressure on the rate of reaction.
Interaction Effects: Can study how multiple factors interact and affect the outcome.
Insight into Factors: Helps identify which factors have the most significant impact on the outcome.
This analysis is similar to what was done for Q5, with the main difference being the inclusion of more detailed steps and data
organization. In a 2² factorial experiment, we have two factors K and N , each with two levels. This design generates four treatment
combinations:
Data is collected for each treatment combination in multiple blocks (denoted as b). For example, with b = 2 blocks:
Block Treatment Combination Observation
1 (+1, +1) y1
1 (+1, -1) y2
1 (-1, +1) y3
1 (-1, -1) y4
2 (+1, +1) y5
2 (+1, -1) y6
2 (-1, +1) y7
2 (-1, -1) y8
2. Sum of Squares:
Sum of Squares for Treatments (SST): Measures the variation between treatment groups.
Sum of Squares for Blocks (SSB): Measures the variation between blocks.
Sum of Squares for Error (SSE): Measures the residual variation within treatments and blocks.
For treatments, the degrees of freedom are the number of treatment combinations minus one: dftreatment = 4 − 1 = 3.
For blocks, the degrees of freedom are the number of blocks minus one: dfblocks = b − 1.
For error, the degrees of freedom are calculated as: dferror = (b − 1)(4 − 1) = 3b − 3.
4. Mean Squares:
The mean square for each factor and error is calculated by dividing the sum of squares by their respective degrees of freedom.
MStreatment = SST / dftreatment
MSblocks = SSB / dfblocks
MSerror = SSE / dferror
5. F-Statistic:
The F-statistic is computed for treatment effects and error using the mean squares:
Ftreatment = MStreatment / MSerror
6. Hypothesis Testing:
Compare each F-statistic with the critical value from the F-distribution at the chosen significance level; an effect whose F-statistic exceeds the critical value is declared significant.
Factorial Experiments are a type of experimental design that allows the study of the effects of two or more factors (independent
variables) simultaneously.
Each factor is tested at multiple levels, and the effects of individual factors, as well as their interactions, are analyzed.
Factorial designs are highly efficient because they allow us to understand not only the main effects of the factors but also how they
interact with each other.
1. Efficiency:
Factorial experiments involve testing multiple factors at once, making them more efficient compared to simple experiments, which test only one factor at a time.
2. Interaction Detection:
Factorial experiments can identify interactions between factors, which is not possible in simple experiments where factors are kept fixed.
3. Comprehensive Analysis:
Factorial designs provide a more complete picture of the experimental situation, helping researchers understand the
combined effects of various factors.
4. Increased Precision:
Factorial experiments provide better precision in estimating the effects of factors, especially when there are interactions.
Analysis of Degrees of Freedom in a Factorial Experiment with Three Factors at Two Levels Each in Three Replications, Arranged in
an R.B.D.:
In this case, we have a 2³ factorial experiment, which means there are 3 factors, each with two levels.
Total number of treatment combinations = 2³ = 8 (since there are 3 factors, each with 2 levels).
Total number of observations = 8 treatment combinations × 3 replications = 24 observations.
The degrees of freedom for each factor is based on the number of levels (which is 2 in this case) minus one.
The error degrees of freedom is calculated as the total number of observations minus the degrees of freedom used by the
factors and interactions.
df_error = (24 − 1) − (1 + 1 + 1 + 1 + 1 + 1 + 1) − (3 − 1) = 23 − 7 − 2 = 14
Since the experiment is arranged in a randomized block design (R.B.D.), the degrees of freedom for blocks depend on the
number of blocks: with b blocks, the degrees of freedom for blocks = b − 1 (here b = 3, so df_blocks = 2).
Expressions for Main Effects and Interaction Effects for a 2³ Experiment:
For a 2³ factorial experiment with three factors A, B, C at two levels each, the effects can be computed as follows:
A = (1/N) ∑_{i=1}^{N} y_i (for A1) − (1/N) ∑_{i=1}^{N} y_i (for A2)
Where yi are the observations at different levels of A and N is the total number of observations for each level of A.
B = (1/N) ∑_{i=1}^{N} y_i (for B1) − (1/N) ∑_{i=1}^{N} y_i (for B2)
C = (1/N) ∑_{i=1}^{N} y_i (for C1) − (1/N) ∑_{i=1}^{N} y_i (for C2)
AB = (1/N) ∑_{i=1}^{N} y_i (for AB1) − (1/N) ∑_{i=1}^{N} y_i (for AB2)
AC = (1/N) ∑_{i=1}^{N} y_i (for AC1) − (1/N) ∑_{i=1}^{N} y_i (for AC2)
BC = (1/N) ∑_{i=1}^{N} y_i (for BC1) − (1/N) ∑_{i=1}^{N} y_i (for BC2)
ABC = (1/N) ∑_{i=1}^{N} y_i (for ABC1) − (1/N) ∑_{i=1}^{N} y_i (for ABC2)
These expressions allow us to calculate the main effects and interaction effects for each factor and their combinations.
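As a quick illustration of these expressions, a minimal Python sketch (with hypothetical response values) computes a main effect as the difference of level means:

```python
# Sketch: a main effect as the difference of level means, following the
# expressions above. The response values are hypothetical.
def main_effect(responses_high, responses_low):
    """Mean response at one level minus mean response at the other."""
    return (sum(responses_high) / len(responses_high)
            - sum(responses_low) / len(responses_low))

# Hypothetical responses at the two levels of factor A
a1 = [12.0, 14.0, 13.0, 15.0]
a2 = [10.0, 11.0, 9.0, 10.0]
print(main_effect(a1, a2))  # 3.5
```

The same helper applies to interaction effects once the observations are grouped by the sign of the relevant factor combination.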
1. Simple Experiments:
In a simple experiment, only one factor is varied at a time, and the effect of this factor is studied on the dependent variable.
Example: Testing the effect of temperature on the yield of a chemical reaction, keeping all other factors constant.
2. Factorial Experiments:
A factorial experiment involves the simultaneous manipulation of two or more factors. The effect of each factor and their
interactions are studied.
Factorial experiments allow for the analysis of interactions between factors and help in understanding how factors work
together.
Example: Studying the effect of temperature and pressure on the yield of a chemical reaction, where both factors are varied
simultaneously.
A 2² factorial experiment involves 2 factors, each at 2 levels. Let's denote the factors as A and B, each with two levels, say A1, A2 and B1, B2.
The 2² factorial experiment consists of 4 treatment combinations: (A1, B1), (A1, B2), (A2, B1), (A2, B2).
Assume we have 3 replications of the experiment arranged in a Randomized Block Design (R.B.D.), which means there are 3 blocks, and
each treatment is randomly assigned to a block.
2. Total Observations:
Total observations = 4 treatment combinations × 3 replications = 12.
3. Degrees of Freedom:
Degrees of freedom for treatments: Since there are 4 treatment combinations, the degrees of freedom for treatments = 4 −
1 = 3.
Degrees of freedom for blocks: There are 3 blocks, so the degrees of freedom for blocks = 3 − 1 = 2.
Degrees of freedom for error: The degrees of freedom for error = Total degrees of freedom – (df for treatments + df for
blocks).
dferror = 12 − 1 − 3 − 2 = 6
4. Sum of Squares:
The sum of squares for each component (blocks, treatments, error) can be calculated using the following formulae:
SS_treatments = ∑_{i=1}^{4} (T̄_i − Ȳ)²
where T̄_i is the mean for each treatment, and Ȳ is the overall mean.
SS_blocks = ∑_{j=1}^{3} (B̄_j − Ȳ)²
SS_error = ∑_{i=1}^{12} (Y_i − Ŷ_i)²
where Y_i is the observed value and Ŷ_i is the predicted value from the model.
5. Mean Squares:
MS_treatments = SS_treatments / df_treatments
MS_blocks = SS_blocks / df_blocks
MS_error = SS_error / df_error
6. F-Statistics:
F_treatments = MS_treatments / MS_error
F_blocks = MS_blocks / MS_error
7. Interpretation:
If the F_treatments or F_blocks value is greater than the critical value from the F-distribution table, the respective factor or block effect is considered statistically significant.
The error term (residual) is used to estimate the variability that is not explained by the factors or blocks.
A = (1/N) ∑_{i=1}^{N} y_i (for A1) − (1/N) ∑_{i=1}^{N} y_i (for A2)
B = (1/N) ∑_{i=1}^{N} y_i (for B1) − (1/N) ∑_{i=1}^{N} y_i (for B2)
AB = (1/N) ∑_{i=1}^{N} y_i (for AB1) − (1/N) ∑_{i=1}^{N} y_i (for AB2)
These effects are analyzed using the sums of squares and mean squares, and their significance is tested using the F-test.
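The full chain above (sums of squares, mean squares, F-statistic) can be sketched in Python for a 2² factorial in an RBD with 3 blocks. The data are hypothetical, and the treatment and block sums of squares are scaled by the number of observations behind each mean:

```python
# Sketch: ANOVA computations for a 2x2 factorial in an RBD with 3 blocks.
# Response values are hypothetical; treatment/block sums of squares are
# scaled by the number of observations behind each mean.
data = {  # treatment -> responses in blocks 1..3
    "A1B1": [10.0, 12.0, 11.0],
    "A1B2": [15.0, 14.0, 16.0],
    "A2B1": [20.0, 19.0, 21.0],
    "A2B2": [25.0, 26.0, 24.0],
}
b = 3                                   # number of blocks
t = len(data)                           # number of treatments (4)
all_y = [y for ys in data.values() for y in ys]
grand = sum(all_y) / len(all_y)         # overall mean

treat_means = [sum(v) / b for v in data.values()]
block_means = [sum(data[k][j] for k in data) / t for j in range(b)]

ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
ss_block = t * sum((m - grand) ** 2 for m in block_means)
ss_total = sum((y - grand) ** 2 for y in all_y)
ss_error = ss_total - ss_treat - ss_block   # by the decomposition of SS_total

df_treat, df_block = t - 1, b - 1
df_error = df_treat * df_block              # (4 - 1)(3 - 1) = 6
ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
f_treat = ms_treat / ms_error
print(round(ss_treat, 2), round(ss_block, 2), round(ss_error, 2), round(f_treat, 2))  # 332.25 0.5 7.5 88.6
```

The resulting F value would then be compared with the tabulated F critical value at (3, 6) degrees of freedom.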
A 2² factorial experiment involves 2 factors, each with 2 levels, giving 4 treatment combinations. This experiment is arranged in a
Randomized Block Design (RBD), meaning that each treatment is randomly assigned within blocks and each treatment appears
once in every block. Let the two factors be labeled as A and B.
(A1 , B1 )
(A1 , B2 )
(A2 , B1 )
(A2 , B2 )
2. Total Observations:
Total observations = 4 treatment combinations × 3 blocks = 12.
3. Degrees of Freedom:
Degrees of freedom for treatments: Since there are 4 treatment combinations, the degrees of freedom for treatments = 4 − 1 =
3.
Degrees of freedom for blocks: Since there are 3 blocks, the degrees of freedom for blocks = 3 − 1 = 2.
Degrees of freedom for error: The degrees of freedom for error = Total degrees of freedom – (df for treatments + df for blocks).
dferror = 12 − 1 − 3 − 2 = 6
SS_treatments = ∑_{i=1}^{4} (T̄_i − Ȳ)²
SS_blocks = ∑_{j=1}^{3} (B̄_j − Ȳ)²
SS_error = ∑_{i=1}^{12} (Y_i − Ŷ_i)²
where Y_i is the observed value, and Ŷ_i is the predicted value from the model.
5. Mean Squares:
MS_treatments = SS_treatments / df_treatments
MS_blocks = SS_blocks / df_blocks
MS_error = SS_error / df_error
6. F-Statistics:
F_treatments = MS_treatments / MS_error
F_blocks = MS_blocks / MS_error
7. Main Effects and Interaction Effect:
A = (1/N) ∑_{i=1}^{N} y_i (for A1) − (1/N) ∑_{i=1}^{N} y_i (for A2)
B = (1/N) ∑_{i=1}^{N} y_i (for B1) − (1/N) ∑_{i=1}^{N} y_i (for B2)
AB = (1/N) ∑_{i=1}^{N} y_i (for AB1) − (1/N) ∑_{i=1}^{N} y_i (for AB2)
8. Interpretation:
If the F-statistics for the main effects or interaction effect are significant (i.e., larger than the critical value from the F-distribution), it
suggests that the corresponding factor or interaction significantly affects the response variable.
Simple Experiments:
In a simple experiment, one factor is varied at different levels to observe its effect on the response variable.
The analysis usually involves comparing the means of different levels of the factor to determine if there is a significant
difference.
Example: Studying the effect of one factor (e.g., temperature) on the yield of a reaction.
Factorial Experiments:
A factorial experiment involves varying two or more factors simultaneously, at different levels, to study not only the individual
effects of each factor but also the interactions between them.
This experiment allows for a more comprehensive understanding of how factors interact and their combined impact on the
response variable.
Example: A 2² factorial experiment would involve two factors (e.g., temperature and pressure) each at two levels (e.g., high and
low), resulting in 4 treatment combinations.
Let’s assume the experiment involves two factors A and B, each at two levels:
1. A1 , B1
2. A1 , B2
3. A2 , B1
4. A2 , B2
1. Experimental Design:
The experiment is arranged in a Randomized Block Design (RBD) with 3 blocks (replications).
2. Degrees of Freedom:
dftotal = 12 − 1 = 11
dftreatments = 4 − 1 = 3
dfblocks = 3 − 1 = 2
dferror = 11 − 3 − 2 = 6
SS_treatments = ∑_{i=1}^{4} (T̄_i − Ȳ)²
where T̄_i is the mean of each treatment combination and Ȳ is the overall mean.
SS_blocks = ∑_{j=1}^{3} (B̄_j − Ȳ)²
where B̄_j is the mean of each block.
SS_error = ∑_{i=1}^{12} (Y_i − Ŷ_i)²
where Y_i is the observed value, and Ŷ_i is the predicted value from the model.
4. Mean Squares:
MS_treatments = SS_treatments / df_treatments
MS_blocks = SS_blocks / df_blocks
MS_error = SS_error / df_error
5. F-statistics:
F_treatments = MS_treatments / MS_error
F_blocks = MS_blocks / MS_error
6. Main Effects and Interaction Effect:
A = (1/N) ∑_{i=1}^{N} y_i (for A1) − (1/N) ∑_{i=1}^{N} y_i (for A2)
B = (1/N) ∑_{i=1}^{N} y_i (for B1) − (1/N) ∑_{i=1}^{N} y_i (for B2)
AB = (1/N) ∑_{i=1}^{N} y_i (for AB1) − (1/N) ∑_{i=1}^{N} y_i (for AB2)
7. Interpretation:
The main effects represent the individual effects of the factors A and B on the response variable.
The interaction effect shows how the combination of factors A and B affects the response variable.
If the F-statistics for any of these effects are significant, it indicates that the corresponding factor or interaction has a significant
influence on the response variable.
Q11. (H) S18:
If a 2² factorial experiment is carried out in L.S.D., then state the order of this L.S.D. and write an ANOVA table in this case.
In the context of experimental design, the Least Significant Difference (LSD) method is often used after performing an analysis of
variance (ANOVA) to determine if the means of different treatment groups are significantly different from each other.
For a 2² factorial experiment, LSD is used when testing pairwise comparisons between treatment means. The LSD method relies on
the error mean square obtained from the ANOVA and compares it to the difference between treatment means.
Order of LSD:
For a 2² factorial experiment with 2 factors (A and B), each at 2 levels, carried out in LSD, the order of the LSD would be based on
the number of replications performed.
If there are 3 replications (as we assume for this question), the LSD would involve comparing the means of different treatment
combinations (e.g., A1 B1 , A1 B2 , A2 B1 , and A2 B2 ) to see which ones are significantly different.
ANOVA Table:
Let’s denote the total number of observations as n = 4 × 3 = 12, where 4 is the number of treatment combinations, and 3 is the
number of replications.
The ANOVA table for the 2² factorial experiment in an R.B.D. would look like this:
Source Degrees of Freedom (df) Sum of Squares (SS) Mean Squares (MS) F-statistic
Sum of Squares for Treatment (SS_T): Variability due to differences between the treatment groups.
Sum of Squares for Block (SS_B): Variability due to the differences between blocks (replications).
Sum of Squares for Error (SS_E): Variability within the blocks and treatments.
MS_T = SS_T / df_T
MS_B = SS_B / df_B
MS_E = SS_E / df_E
4. F-statistics:
F for Treatment: F_T = MS_T / MS_E
F for Block: F_B = MS_B / MS_E
These F-statistics are used to test the significance of the treatment and block effects.
5. LSD Test:
After calculating the F-statistics and determining the significance, the LSD test can be performed. The LSD is calculated as:
LSD = t_α × √(2 · MS_E / r)
where tα is the critical value from the t-distribution for the desired significance level, and r is the number of replications per
treatment.
The LSD value is then compared with the absolute difference between the means of treatment combinations. If the difference
between two treatment means is greater than the LSD value, they are considered significantly different.
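A minimal sketch of this LSD calculation; the t critical value would normally be read from a t-table, and the numbers used here are assumptions:

```python
# Sketch of the LSD computation described above. The t critical value is
# supplied as a parameter; the values below are hypothetical.
import math

def lsd(t_crit, ms_error, r):
    """Least significant difference: t * sqrt(2 * MSE / r)."""
    return t_crit * math.sqrt(2.0 * ms_error / r)

# Assumed values: t(0.05, 6 df) ~ 2.447, MS_E = 1.25, r = 3 replications
value = lsd(2.447, 1.25, 3)
# Two treatment means further apart than `value` differ significantly.
print(round(value, 3))
```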
Comparison of Simple and Factorial Experiments:
Number of Factors. Simple: typically one factor is tested. Factorial: two or more factors are tested simultaneously.
Treatment Combinations. Simple: only one treatment combination is applied. Factorial: several treatment combinations are applied to study their interactions.
Interaction Effects. Simple: no interaction effects between factors are considered. Factorial: interaction effects between factors are studied.
Purpose. Simple: used for testing the effect of a single factor. Factorial: used to evaluate the effect of multiple factors and their interactions.
Examples. Simple: testing the effect of fertilizer type on plant growth. Factorial: testing the effects of fertilizer type and water amount on plant growth.
Statistical Analysis. Simple: single-factor ANOVA. Factorial: multi-factor analysis, often involving ANOVA for multiple factors.
Complete Analysis of a 2² Factorial Experiment in an R.B.D.:
Given:
A 2² factorial experiment is conducted with two factors (A and B), each at two levels (e.g., A1, A2 and B1, B2). The experiment is arranged in an R.B.D. with the four treatment combinations:
A1 B1 , A1 B2 , A2 B1 , and A2 B2
r = number of replications = 3
n = total number of experimental units = 4 (treatment combinations) × 3 (replications) = 12
1. Data Structure: The data will consist of measurements from the 4 treatment combinations, repeated over 3 blocks.
Source             df    SS        MS                        F-statistic
Factor A           1     SS_A      MS_A = SS_A / 1           F_A = MS_A / MS_E
Factor B           1     SS_B      MS_B = SS_B / 1           F_B = MS_B / MS_E
Interaction (AB)   1     SS_AB     MS_AB = SS_AB / 1         F_AB = MS_AB / MS_E
Blocks             2     SS_Block  MS_Block = SS_Block / 2   F_Block = MS_Block / MS_E
Error              6     SS_E      MS_E = SS_E / 6
SS for each source (A, B, AB, and Block) is computed based on the variation observed from the treatments and blocks.
F-statistics:
F-statistics for each factor and interaction are calculated by dividing the corresponding MS by the MS for the error term M SE .
F-statistics are then used to test the significance of factors and interaction.
Hypothesis Testing:
For each source (A, B, AB, Blocks), test the null hypothesis:
H0 : No effect (factor or interaction)
vs. H1 : Effect is significant.
Compare the F-statistics with the critical value from the F-distribution to determine significance.
Given:
A 2³ factorial experiment is conducted with three factors (A, B, and C), each at 2 levels (e.g., A1, A2; B1, B2; and C1, C2). The experiment is arranged in an R.B.D. with 5 blocks.
Since each factor has 2 levels, the total number of treatment combinations is 2³ = 8 combinations.
Treatment combinations: A1 B1 C1 , A1 B1 C2 , A1 B2 C1 , A1 B2 C2 , A2 B1 C1 , A2 B1 C2 , A2 B2 C1 , A2 B2 C2 .
The number of observations is the product of the number of treatments (8) and the number of blocks (5), so the total number
of observations is 8 × 5 = 40.
3. Degrees of Freedom:
Total df = 40 − 1 = 39
Block df = 5 − 1 =4
Factor A df = 2 − 1 =1
Factor B df = 2 − 1 =1
Factor C df = 2 − 1 =1
Interaction AB df = (2 − 1)(2 − 1) =1
Interaction AC df = (2 − 1)(2 − 1) =1
Interaction BC df = (2 − 1)(2 − 1) =1
Interaction ABC df = (2 − 1)(2 − 1)(2 − 1) =1
Error df = 39 − (4 + 1 + 1 + 1 + 1 + 1 + 1 + 1) = 28
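The degrees-of-freedom bookkeeping above can be checked with a few lines of Python:

```python
# Sketch: degrees-of-freedom bookkeeping for the 2^3 factorial in 5 blocks.
factors = 3
blocks = 5
treatments = 2 ** factors              # 8 treatment combinations
n = treatments * blocks                # 40 observations
df_total = n - 1                       # 39
df_blocks = blocks - 1                 # 4
df_effects = treatments - 1            # 7: A, B, C, AB, AC, BC, ABC (1 df each)
df_error = df_total - df_blocks - df_effects
print(df_error)  # 28
```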
ANOVA Table:
Source of Variation   df    SS        MS                          F-statistic
Blocks                4     SS_Block  MS_Block = SS_Block / 4     F_Block = MS_Block / MS_E
Factor A              1     SS_A      MS_A = SS_A / 1             F_A = MS_A / MS_E
Factor B              1     SS_B      MS_B = SS_B / 1             F_B = MS_B / MS_E
Factor C              1     SS_C      MS_C = SS_C / 1             F_C = MS_C / MS_E
Interaction AB        1     SS_AB     MS_AB = SS_AB / 1           F_AB = MS_AB / MS_E
Interaction AC        1     SS_AC     MS_AC = SS_AC / 1           F_AC = MS_AC / MS_E
Interaction BC        1     SS_BC     MS_BC = SS_BC / 1           F_BC = MS_BC / MS_E
Interaction ABC       1     SS_ABC    MS_ABC = SS_ABC / 1         F_ABC = MS_ABC / MS_E
Error                 28    SS_E      MS_E = SS_E / 28
Total                 39    SS_total
Hypothesis Testing:
For each source (A, B, C, AB, AC, BC, ABC, and Blocks), we will test the null hypothesis:
H0 : No effect (factor or interaction)
vs. H1 : Effect is significant.
The F-statistic for each factor/interaction is compared to the critical value from the F-distribution table to determine significance. If
F exceeds the critical value, we reject H0 .
Stating the hypothesis to be tested, write the ANOVA table for this experiment.
1. Factorial Experiments:
Factorial Experiments involve studying multiple factors (variables) simultaneously, with each factor having two or more levels, and
all possible combinations of these levels are tested. This allows for the assessment of both the main effects and interaction effects
between factors.
Simple Experiments, on the other hand, typically test only one factor at a time, often ignoring the interactions between factors.
2. Experiment Details:
Factorial Design: We have a 2³ factorial design, which means three factors (A, B, and C), each at two levels (e.g., A1, A2; B1, B2; C1, C2).
Randomized Block Design (RBD) with r blocks, meaning there are r replications of the treatments.
The number of treatments = 2³ = 8 (for all combinations of A, B, C).
The total number of observations = 8 × r (since each treatment is replicated r times).
3. Main Effect of A:
Main effect of A represents the average change in the response when we change factor A from its low level to its high level, while
keeping factors B and C constant.
Main effect A = (1/r) ∑_{i=1}^{r} (Ȳ_{A2,i} − Ȳ_{A1,i})
Where Ȳ_{A2,i} and Ȳ_{A1,i} are the mean responses at the high and low levels of A in the i-th replicate.
4. Interaction Effects:
Interaction Effect AB :
The interaction effect between A and B measures how the effect of one factor (say A) changes depending on the level of another
factor (say B ).
Interaction effect AB = (1/r) ∑_{i=1}^{r} [(Y_{A2B2} − Y_{A1B2}) − (Y_{A2B1} − Y_{A1B1})]
The interaction effect between all three factors A, B , and C measures how the combined effect of factors A and B changes
depending on the level of factor C .
Interaction effect ABC = (1/r) ∑_{i=1}^{r} [(Y_{A2B2C2} − Y_{A1B2C2}) − (Y_{A2B1C1} − Y_{A1B1C1})]
5. Hypothesis Testing:
For each effect, we test the following hypotheses:
Main Effect of A:
H0: μ_A1 = μ_A2 (no main effect of A)
H1: μ_A1 ≠ μ_A2 (main effect of A present)
Interaction Effect AB:
H0: There is no interaction between A and B
H1: There is an interaction between A and B
6. ANOVA Table:
Source of Variation   df          SS        MS                              F-statistic
Blocks                r − 1       SS_Block  MS_Block = SS_Block / (r − 1)   F_Block = MS_Block / MS_E
Main effect A         1           SS_A      MS_A = SS_A / 1                 F_A = MS_A / MS_E
Main effect B         1           SS_B      MS_B = SS_B / 1                 F_B = MS_B / MS_E
Main effect C         1           SS_C      MS_C = SS_C / 1                 F_C = MS_C / MS_E
Interaction AB        1           SS_AB     MS_AB = SS_AB / 1               F_AB = MS_AB / MS_E
Interaction AC        1           SS_AC     MS_AC = SS_AC / 1               F_AC = MS_AC / MS_E
Interaction BC        1           SS_BC     MS_BC = SS_BC / 1               F_BC = MS_BC / MS_E
Interaction ABC       1           SS_ABC    MS_ABC = SS_ABC / 1             F_ABC = MS_ABC / MS_E
Error                 7(r − 1)    SS_E      MS_E = SS_E / (7(r − 1))
Total                 8r − 1      SS_total
Conclusion:
The F-statistics for each effect are compared to the critical value from the F-distribution table. If the F-statistic exceeds the critical
value, the corresponding hypothesis is rejected, indicating that the factor or interaction effect is significant.
Simple Experiments:
Definition: In a simple experiment, only one factor is studied at a time, and the effect of this factor on the outcome is analyzed,
while keeping other factors constant.
Example: Studying the effect of a single factor, like temperature, on the yield of a chemical reaction.
Factorial Experiments:
Definition: Factorial experiments involve the study of two or more factors simultaneously. The interaction between these factors
is also analyzed, in addition to their individual effects.
Example: Studying the combined effect of temperature and pressure on the yield of a chemical reaction.
The experiment is carried out in a Randomized Block Design (RBD) with r replicates (blocks).
3. Orthogonality of Contrasts:
Orthogonality means that the contrasts (comparisons) between the factors and interactions do not interfere with each other and can
be tested independently.
Main Effect A: The main effect of A compares the response when factor A is at its two levels (e.g., A1 vs A2 ).
Main Effect B : The main effect of B compares the response when factor B is at its two levels (e.g., B1 vs B2 ).
Interaction Effect AB : The interaction effect of A and B compares how the combination of factors A and B affects the response.
Interaction Effect ABC : The interaction effect of A, B , and C compares how the combination of A, B , and C affects the
response.
Now, we show that the contrasts for the main effects and interaction effects are orthogonal:
The contrasts CA , CB , CAB , and CABC are orthogonal because the interaction between each factor (or combination of factors) is
independent, and the effects of one factor do not interfere with the effects of others.
This orthogonality condition allows us to test the main effects and interaction effects independently in the ANOVA.
4. ANOVA Table:
Source of Variation   df          SS        MS                              F-statistic
Blocks                r − 1       SS_Block  MS_Block = SS_Block / (r − 1)   F_Block = MS_Block / MS_E
Main effect A         1           SS_A      MS_A = SS_A / 1                 F_A = MS_A / MS_E
Main effect B         1           SS_B      MS_B = SS_B / 1                 F_B = MS_B / MS_E
Main effect C         1           SS_C      MS_C = SS_C / 1                 F_C = MS_C / MS_E
Interaction AB        1           SS_AB     MS_AB = SS_AB / 1               F_AB = MS_AB / MS_E
Interaction AC        1           SS_AC     MS_AC = SS_AC / 1               F_AC = MS_AC / MS_E
Interaction BC        1           SS_BC     MS_BC = SS_BC / 1               F_BC = MS_BC / MS_E
Interaction ABC       1           SS_ABC    MS_ABC = SS_ABC / 1             F_ABC = MS_ABC / MS_E
Error                 7(r − 1)    SS_E      MS_E = SS_E / (7(r − 1))
Total                 8r − 1      SS_total
The computed F-values are compared to the critical value from the F-distribution table with the appropriate degrees of freedom to
determine whether the effect is significant or not.
If F > Fcritical , the null hypothesis is rejected, meaning the corresponding effect is significant.
Conclusion:
This detailed solution shows that in a 2³ factorial experiment arranged in an RBD, the main effects and interaction effects are
orthogonal, and we can test their significance using the ANOVA table. By comparing F-values with critical values, we can decide
whether the factors and interactions significantly affect the response.
In a 2³ factorial experiment, there are three factors: A, B, and C, each at two levels (e.g., A1, A2; B1, B2; C1, C2). This results in 8
treatment combinations.
The orthogonality of the main effects means that the contrasts used to estimate the main effects do not interfere with each other,
making it possible to estimate them independently.
For simplicity, the treatment combinations can be listed as follows (with levels 1 and 2 for each factor):
A B C Y (Response)
1 1 1 y111
1 1 2 y112
1 2 1 y121
1 2 2 y122
2 1 1 y211
2 1 2 y212
2 2 1 y221
2 2 2 y222
1. Main Effect A:
The main effect of factor A compares the average response at the two levels of A, averaged over the levels of B and C:
C_A = (y111 + y112 + y121 + y122) − (y211 + y212 + y221 + y222)
2. Main Effect B:
Similarly, the main effect of factor B compares the average response at the two levels of B, averaged over the levels of A and C:
C_B = (y111 + y112 + y211 + y212) − (y121 + y122 + y221 + y222)
3. Main Effect C:
The main effect of factor C compares the average response at the two levels of C, averaged over the levels of A and B:
C_C = (y111 + y121 + y211 + y221) − (y112 + y122 + y212 + y222)
The contrasts CA , CB , and CC are orthogonal to each other because the effects of A, B , and C are independent, meaning that
the impact of each factor can be measured independently of the others. This is due to the structure of the factorial design where
the interactions are balanced.
In simple terms, this orthogonality ensures that no factor's effect is confounded with the effect of any other factor, allowing for
independent estimation of main effects.
2. Yate's Method for Calculating Factorial Effect Totals in a 2³ Factorial Experiment:
Yate’s method is a systematic procedure used to calculate factorial effects (main effects and interaction effects) in factorial experiments.
The factorial effect is defined as the difference between the sum of responses for certain treatment combinations. Yate's method
breaks down the calculation into the following steps:
In a 2³ factorial experiment, label the treatment combinations based on the levels of factors A, B, and C.
Main effect of A is calculated by subtracting the average response at A2 from that at A1 (while averaging over the levels of B
and C ).
Similarly, the main effects of B and C are computed by holding the other factors constant and comparing the levels of each
factor.
The interaction effects AB , AC , and BC , and higher-order interactions like ABC can be calculated by comparing the
response totals at specific treatment combinations, using the differences between paired treatment averages.
Main effect of B: (y111 + y112 + y211 + y212) − (y121 + y122 + y221 + y222)
Main effect of C: (y111 + y121 + y211 + y221) − (y112 + y122 + y212 + y222)
Interaction effect AB: (y111 + y112 + y221 + y222) − (y121 + y122 + y211 + y212)
Effect of ABC = (y111 − y112 − y121 + y122) − (y211 − y212 − y221 + y222)
Conclusion:
The main effects A, B, and C in a 2³ factorial experiment are orthogonal because the design is balanced, and no factor's effect is confounded with the effect of any other factor.
Yate's method provides a systematic way to compute the factorial effects, helping to break down the experimental responses into their components (main effects and interactions).
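Yate's method itself can be sketched as a forward algorithm: with responses listed in standard order ((1), a, b, ab, c, ac, bc, abc), each pass replaces the column by pairwise sums followed by pairwise differences; after k passes the column holds the grand total and the effect totals. The response values below are hypothetical:

```python
# Sketch of Yate's forward algorithm for a 2^k experiment. Responses must be
# supplied in standard (Yates) order: (1), a, b, ab, c, ac, bc, abc, ...
def yates(y):
    k = len(y).bit_length() - 1        # number of factors (len(y) == 2**k)
    col = list(y)
    for _ in range(k):
        sums = [col[i] + col[i + 1] for i in range(0, len(col), 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, len(col), 2)]
        col = sums + diffs
    return col  # [grand total, A, B, AB, C, AC, BC, ABC] effect totals

# Hypothetical responses in standard order
print(yates([10, 30, 20, 40, 15, 35, 25, 45]))  # [220, 80, 40, 0, 20, 0, 0, 0]
```

Dividing each entry after the first by 2^(k−1) (here 4) converts the effect totals into the effects themselves.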
A  B  Y (Response)
1  1  y11
1  2  y12
2  1  y21
2  2  y22
The main effect of factor A is calculated as the difference between the sum of the responses at A1 and A2, averaging over the levels of B:
C_A = (y11 + y12) − (y21 + y22)
This contrast computes the difference in the average response when A is at level 1 compared to when it is at level 2.
The main effect of factor B is calculated similarly by comparing the sum of responses at B1 and B2, averaging over the levels of A:
C_B = (y11 + y21) − (y12 + y22)
This computes the difference in the average response when B is at level 1 compared to when it is at level 2.
Interaction Effect AB:
The interaction effect between factors A and B shows how the effect of one factor changes depending on the level of the other factor. It is calculated as:
C_AB = (y11 − y21) − (y12 − y22)
This computes the difference in the change of response when A moves from level 1 to level 2, compared across both levels of B.
A B Y (Response)
1 1 10
1 2 15
2 1 20
2 2 25
Main Effect of A:
C_A = (y11 + y12) − (y21 + y22) = (10 + 15) − (20 + 25) = −20
Main Effect of B:
C_B = (y11 + y21) − (y12 + y22) = (10 + 20) − (15 + 25) = −10
Interaction Effect AB:
C_AB = (y11 − y21) − (y12 − y22) = (10 − 20) − (15 − 25) = 0
4. Conclusion:
Main effect of A: −20
Main effect of B: −10
Interaction effect AB: 0
This analysis tells us that factor A has a significant effect on the response (since the main effect of A is large), while factor B also has
an effect (but smaller), and there is no interaction between factors A and B (since the interaction effect is 0).
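The three contrasts for this example can be checked directly:

```python
# Sketch: the three contrasts for the 2x2 example above
# (y11 = 10, y12 = 15, y21 = 20, y22 = 25).
y11, y12, y21, y22 = 10, 15, 20, 25
C_A = (y11 + y12) - (y21 + y22)    # A at level 1 minus A at level 2
C_B = (y11 + y21) - (y12 + y22)    # B at level 1 minus B at level 2
C_AB = (y11 - y21) - (y12 - y22)   # change due to A at B1 minus at B2
print(C_A, C_B, C_AB)  # -20 -10 0
```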
In a factorial experiment, the interaction effect AB represents how the effect of one factor depends on the level of the other factor.
Specifically, in a 2² factorial experiment, this effect shows how the difference in response when factor A changes is influenced by the
change in factor B.
In this context, for a 2² factorial experiment with two factors A and B, each at two levels, the interaction effect is calculated as follows:
2. Experiment Layout:
A B Y (Response)
1 1 y11
1 2 y12
2 1 y21
2 2 y22
Where y11, y12, y21, y22 are the responses at the four factor-level combinations (first subscript: level of A; second: level of B).
3. General Expression for the Interaction Effect AB:
To find the interaction effect AB, we look at how the difference in response changes when both A and B vary:
C_AB = (y11 − y21) − (y12 − y22)
This formula captures the difference in how the response changes when A changes from level 1 to level 2, as compared between the levels of B.
The interaction effect C_AB is essentially the difference between the change in response due to factor A at B1 and the change in response due to factor A at B2; the interaction effect is the difference between these two differences.
This formula is derived from the basic principle that the interaction effect captures how the effects of one factor change at different levels of the other factor.
5. Example Calculation:
Let’s consider the following hypothetical values for the responses:
A B Y (Response)
1 1 10
1 2 15
2 1 20
2 2 25
At B1: y11 − y21 = 10 − 20 = −10
At B2: y12 − y22 = 15 − 25 = −10
Interaction effect AB = (−10) − (−10) = 0
Thus, the interaction effect AB is 0, meaning that there is no interaction between factors A and B in this case.
6. Conclusion:
The interaction effect AB is computed as the difference between the changes in the response at different levels of the two factors.
In our example, the interaction effect is zero, meaning the factors do not interact.
In a 2³ factorial experiment, we have 3 factors, each at two levels, typically denoted as −1 (low) and +1 (high). The goal of Yate's
method is to compute the main effects and interaction effects for all combinations of these factors.
Run A B C Response Y
1 -1 -1 -1 y1
2 -1 -1 +1 y2
3 -1 +1 -1 y3
4 -1 +1 +1 y4
5 +1 -1 -1 y5
6 +1 -1 +1 y6
7 +1 +1 -1 y7
8 +1 +1 +1 y8
Where:
A, B , and C are the factors, and each factor can be at two levels: −1 and +1.
The response variable Y represents the measurement of the outcome at each combination of factor levels.
For a main effect A, the contrast is defined as the average response at high levels of A minus the average response at low
levels of A, adjusted for other factors.
Similarly, for interaction effects, contrasts are calculated using combinations of the factors.
The contrast for the main effect of A is the difference between the average responses at the high and low levels of factor A, i.e.,
between the runs where A = +1 and A = −1.
A_main = [(y5 + y6 + y7 + y8) − (y1 + y2 + y3 + y4)] / 4
Similarly, for factor B , the contrast is the difference between the average responses at the high and low levels of B , i.e., between the
runs where B = +1 and B = −1.
B_main = [(y3 + y4 + y7 + y8) − (y1 + y2 + y5 + y6)] / 4
Similarly, for factor C:
C_main = [(y2 + y4 + y6 + y8) − (y1 + y3 + y5 + y7)] / 4
4. Interaction Effects:
The contrast for an interaction effect is computed from the runs where the product of the corresponding factor levels is +1 versus −1:
AB_int = [(y1 + y2 + y7 + y8) − (y3 + y4 + y5 + y6)] / 4
AC_int = [(y1 + y3 + y6 + y8) − (y2 + y4 + y5 + y7)] / 4
BC_int = [(y1 + y4 + y5 + y8) − (y2 + y3 + y6 + y7)] / 4
ABC_int = [(y2 + y3 + y5 + y8) − (y1 + y4 + y6 + y7)] / 4
6. Example Calculation:
If we have the following response values for the 2³ factorial experiment:
Run A B C Y (Response)
1 -1 -1 -1 10
2 -1 -1 +1 15
3 -1 +1 -1 20
4 -1 +1 +1 25
5 +1 -1 -1 30
6 +1 -1 +1 35
7 +1 +1 -1 40
8 +1 +1 +1 45
You can apply the formulas for the contrasts as demonstrated earlier to calculate the main effects and interaction effects using Yate’s
method.
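A short Python check of these contrasts on the table above, building each contrast from the ±1 sign columns (the helper function is illustrative):

```python
# Sketch: computing the 2^3 contrasts for the example data above using sign
# columns (A slowest-varying, C fastest, matching the run table).
from itertools import product

runs = list(product([-1, 1], repeat=3))   # (A, B, C) for runs 1..8
y = [10, 15, 20, 25, 30, 35, 40, 45]

def effect(signs):
    # Contrast divided by 4 (four runs at each sign level)
    return sum(s * yi for s, yi in zip(signs, y)) / 4

A = effect([a for a, b, c in runs])
B = effect([b for a, b, c in runs])
C = effect([c for a, b, c in runs])
AB = effect([a * b for a, b, c in runs])
ABC = effect([a * b * c for a, b, c in runs])
print(A, B, C, AB, ABC)  # 20.0 10.0 5.0 0.0 0.0
```

For this additive data set, all interaction effects come out to zero.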
7. Conclusion:
Yate’s method provides an efficient way to compute factorial effects by calculating the contrasts between different combinations of
factor levels. This method helps in identifying the impact of main effects and interactions in a factorial design.
In a factorial experiment, orthogonal contrasts help us isolate the effects of individual factors and interactions. This is important
because it simplifies the interpretation of the main effects and interaction effects.
For a 2² factorial experiment, there are two factors N and P, each at two levels (denoted as −1 and +1).
Run  N    P    Response
1    -1   -1   y1
2    -1   +1   y2
3    +1   -1   y3
4    +1   +1   y4
Where y1, y2, y3, y4 are the responses observed at the four runs.
The main effect of N is the difference between the average response at the high and low levels of N . We calculate this contrast using
the following formula:
N_main = [(y3 + y4) − (y1 + y2)] / 2
The interaction effect NP measures how the combination of factors N and P affects the response, over and above the main effects.
The contrast for the interaction effect is:
NP_int = [(y2 + y3) − (y1 + y4)] / 2
This contrast captures the interaction between N and P , i.e., how the effect of N changes at different levels of P and vice versa.
5. Showing Orthogonality:
To show that the main effect of N and the interaction effect NP are orthogonal, we need to check if the sum of the products of their
coefficients is zero.
The coefficients for the main effect N (runs 1 to 4) are −1, −1, +1, +1.
The coefficients for the interaction effect NP (runs 1 to 4) are −1, +1, +1, −1.
The sum of their products is (−1)(−1) + (−1)(+1) + (+1)(+1) + (+1)(−1) = 1 − 1 + 1 − 1 = 0.
Since the sum is zero, the main effect of N and the interaction effect NP are orthogonal.
6. Conclusion:
The main effect of N and the interaction effect NP are orthogonal contrasts in the 2² factorial experiment. This means the effect of N
can be estimated independently of the interaction between N and P, and vice versa, making the analysis more straightforward and
interpretable.
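The orthogonality check itself is a one-line dot product; a sketch using the coefficient vectors in run order:

```python
# Sketch: checking orthogonality of the N and NP contrasts (coefficients in
# run order 1..4, matching the signs used in the text).
n_coef = [-1, -1, 1, 1]    # main effect N: runs at N = -1, -1, +1, +1
np_coef = [-1, 1, 1, -1]   # interaction NP, per the text's sign convention
dot = sum(a * b for a, b in zip(n_coef, np_coef))
print(dot)  # 0 -> orthogonal contrasts
```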
The experiment is arranged in a Randomized Block Design (RBD), which means that the experimental units are grouped into blocks to
account for variability between them. Each combination of the factor levels (i.e., all possible factor combinations) is applied in each
block.
The experiment layout for a 2² factorial in an RBD, with 2 blocks (for example), is as follows:
Block  Run  N    P    Response
1      1    -1   -1   y1
1      2    -1   +1   y2
1      3    +1   -1   y3
1      4    +1   +1   y4
2      1    -1   -1   y5
2      2    -1   +1   y6
2      3    +1   -1   y7
2      4    +1   +1   y8
Where:
Each treatment combination is repeated in each block to reduce the effects of block-to-block variability.
2. Hypotheses to be Tested:
For a 2² factorial experiment in an RBD, the hypotheses for the main effects and interaction effect are:
Main Effect of N: H0: N has no effect on the response vs. H1: N has a significant effect (and similarly for P and for the interaction NP).
3. ANOVA Table for the 2² Factorial Experiment in an RBD:
An Analysis of Variance (ANOVA) table summarizes the variation in the response variable due to different sources (blocks, main effects, interaction, and error). For r blocks, the table is:
Source of Variation Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-ratio (F)
Blocks SSBlocks r − 1 MSBlocks = SSBlocks/(r − 1) F = MSBlocks/MSE
Main effect N SSN 1 MSN = SSN/1 FN = MSN/MSE
Main effect P SSP 1 MSP = SSP/1 FP = MSP/MSE
Interaction NP SSNP 1 MSNP = SSNP/1 FNP = MSNP/MSE
Error SSE 3(r − 1) MSE = SSE/[3(r − 1)]
Total SSTotal 4r − 1
Each F-ratio compares the mean square of the effect with the mean square of error.
5. Conclusion:
In this ANOVA table:
The main effects of N and P are tested to check if the individual factors significantly affect the response.
The interaction effect NP is tested to check if the combined effect of N and P significantly affects the response.
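The decomposition behind this ANOVA table can be sketched numerically. In the following Python sketch, the responses, the block count (b = 2), and the run order are all illustrative assumptions; the single-degree-of-freedom effects are extracted from treatment totals via their contrasts:

```python
# 2^2 factorial in an RBD with b = 2 blocks; rows are blocks, columns are the
# runs (N, P) = (-1,-1), (-1,+1), (+1,-1), (+1,+1). All numbers are made up.
yy = [[10, 12, 15, 20],   # block 1: y1..y4
      [11, 14, 16, 22]]   # block 2: y5..y8

b, t = 2, 4
grand = sum(sum(r) for r in yy) / (b * t)
block_means = [sum(r) / t for r in yy]
treat_means = [sum(yy[k][j] for k in range(b)) / b for j in range(t)]

ss_total = sum((x - grand) ** 2 for r in yy for x in r)
ss_block = t * sum((m - grand) ** 2 for m in block_means)
ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
ss_error = ss_total - ss_block - ss_treat

# Split SS(treatments) into the 1-df effects N, P, NP via contrasts on the
# treatment totals: SS(effect) = L^2 / (b * sum(c^2)) with sum(c^2) = 4.
totals = [sum(yy[k][j] for k in range(b)) for j in range(t)]
contrasts = {"N": [-1, -1, 1, 1], "P": [-1, 1, -1, 1], "NP": [1, -1, -1, 1]}
ss_effect = {name: sum(c * tj for c, tj in zip(cs, totals)) ** 2 / (b * t)
             for name, cs in contrasts.items()}
print(ss_total, ss_block, ss_treat, ss_error)
```

The three effect sums of squares add up to the treatment sum of squares, confirming the orthogonal decomposition.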
Question: If a 2³ factorial experiment with three factors A, B, and C, each at two levels, is applied in an RBD with r blocks, obtain expressions for:
1. Main effect A
Stating the hypotheses to be tested, write the ANOVA table for this experiment.
1. Factorial Experiments:
A factorial experiment is an experiment where two or more factors are varied simultaneously to investigate their individual and
interactive effects on the response variable. Each factor is tested at multiple levels, and every combination of these levels is considered
in the experiment.
For a 2³ factorial experiment, there are 3 factors A, B, and C, each at 2 levels (say −1 and +1).
Simple Experiment: Involves varying only one factor at a time to assess its effect on the response variable. Each factor is tested
independently, and its effect is considered in isolation.
Factorial Experiment: Involves varying multiple factors simultaneously and studying both their individual and interaction effects.
The goal is to understand how factors work together to influence the response variable.
Run A B C Response Y
1 -1 -1 -1 y1
2 -1 -1 +1 y2
3 -1 +1 -1 y3
4 -1 +1 +1 y4
5 +1 -1 -1 y5
6 +1 -1 +1 y6
7 +1 +1 -1 y7
8 +1 +1 +1 y8
Main Effect A:
The main effect of factor A is defined as the difference between the average response when A is at its high level (+1) and its low level (−1), averaged over all levels of the other factors B and C:
Main Effect of A = (1/4)[(y5 + y6 + y7 + y8) − (y1 + y2 + y3 + y4)]
Main Effect B:
Similarly, the main effect of factor B is the difference between the average response when B is at its high level (+1) and its low level (−1):
Main Effect of B = (1/4)[(y3 + y4 + y7 + y8) − (y1 + y2 + y5 + y6)]
Main Effect C:
The main effect of factor C is the difference between the average response when C is at its high level (+1) and its low level (−1):
Main Effect of C = (1/4)[(y2 + y4 + y6 + y8) − (y1 + y3 + y5 + y7)]
Interaction Effect AB:
The interaction effect between A and B measures how the effect of one factor changes at different levels of the other factor. Its contrast coefficients are the products of the A and B level columns, averaged over all levels of the third factor C:
Interaction Effect of AB = (1/4)[(y1 + y2 + y7 + y8) − (y3 + y4 + y5 + y6)]
Interaction Effect AC:
Interaction Effect of AC = (1/4)[(y1 + y3 + y6 + y8) − (y2 + y4 + y5 + y7)]
The interaction effects BC and ABC are obtained in the same way, using the products of the corresponding level columns as contrast coefficients.
3. Hypotheses and ANOVA Table for the 2³ Factorial Experiment in an RBD:
For each effect (A, B, C, AB, AC, BC, ABC) we test H0: the effect is zero, against H1: the effect is not zero.
Source of Variation Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-ratio (F)
Blocks SSBlocks r − 1 MSBlocks = SSBlocks/(r − 1) F = MSBlocks/MSE
Main effect A SSA 1 MSA = SSA/1 FA = MSA/MSE
Main effect B SSB 1 MSB = SSB/1 FB = MSB/MSE
Main effect C SSC 1 MSC = SSC/1 FC = MSC/MSE
Interaction AB SSAB 1 MSAB = SSAB/1 FAB = MSAB/MSE
Interaction AC SSAC 1 MSAC = SSAC/1 FAC = MSAC/MSE
Interaction BC SSBC 1 MSBC = SSBC/1 FBC = MSBC/MSE
Interaction ABC SSABC 1 MSABC = SSABC/1 FABC = MSABC/MSE
Error SSE 7(r − 1) MSE = SSE/[7(r − 1)]
Total SSTotal 8r − 1
4. Conclusion:
In this ANOVA table, the main effects of A, B, and C, as well as their interaction effects, are tested for significance. The error term accounts for the variability not explained by the factors and their interactions.
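The seven effect formulas above all follow the same recipe: multiply each response by the product of the relevant level columns and divide by 2^(k−1) = 4. A Python sketch with hypothetical responses y1–y8 (the data and the function name `effect` are illustrative):

```python
# Hypothetical responses y1..y8 in the run order of the table above
# (A varies slowest, C fastest); all numbers are illustrative.
y = [10, 12, 14, 17, 13, 16, 18, 24]
runs = [(a, b, c) for a in (-1, +1) for b in (-1, +1) for c in (-1, +1)]

def effect(picks):
    """Contrast for the effect named by factor indices: (0,) = A, (0, 1) = AB, ..."""
    total = 0
    for levels, yi in zip(runs, y):
        coef = 1
        for p in picks:
            coef *= levels[p]   # coefficient = product of the chosen level columns
        total += coef * yi
    return total / 4            # divisor 2^(k-1) = 4 for k = 3 factors

A, B, C = effect((0,)), effect((1,)), effect((2,))
AB, AC, BC = effect((0, 1)), effect((0, 2)), effect((1, 2))
ABC = effect((0, 1, 2))
print(A, B, C)   # 4.5 5.5 3.5
```

Checking `effect((0,))` against the formula for the main effect of A reproduces (1/4)[(y5 + y6 + y7 + y8) − (y1 + y2 + y3 + y4)] exactly.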
Factor A: A1 = −1, A2 = +1
Factor B : B1 = −1, B2 = +1
Run A B Response Y
1 -1 -1 y1
2 -1 +1 y2
3 +1 -1 y3
4 +1 +1 y4
For the RBD, assume that the experiment is repeated r times (blocks) and each block contains all treatment combinations. The linear model is:
Yijk = μ + αi + βj + (αβ)ij + ρk + ϵijk
Where:
Yijk is the response for the i-th level of factor A, the j-th level of factor B, and the k-th block; αi and βj are the main effects, (αβ)ij is the interaction, ρk is the block effect, and ϵijk is the random error.
Source of Variation Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-ratio (F)
Blocks SSBlocks r − 1 MSBlocks F = MSBlocks/MSE
Factor A SSA 1 MSA FA = MSA/MSE
Factor B SSB 1 MSB FB = MSB/MSE
Interaction AB SSAB 1 MSAB FAB = MSAB/MSE
Error SSE 3(r − 1) MSE = SSE/[3(r − 1)]
Total SSTotal 4r − 1
1. Total Sum of Squares:
SSTotal = ∑∑∑ (Yijk − Ȳ)², summed over all observations.
2. Sum of Squares for Factor A (SSA):
SSA = 2r ∑ (ȲA − Ȳ)²
(summed over the two levels of A; each level mean ȲA is based on 2r observations)
3. Sum of Squares for Factor B (SSB):
SSB = 2r ∑ (ȲB − Ȳ)²
4. Sum of Squares for the Interaction AB (SSAB):
SSAB = r ∑ (ȲAB − Ȳ)² − SSA − SSB
(where the ȲAB are the four treatment-combination means, each based on r observations)
If the F-ratio is large enough (greater than the critical F-value from the F-distribution table), the null hypothesis is rejected,
indicating a significant effect.
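The decision rule can be sketched in a few lines. In this Python sketch the mean squares and the tabulated critical value are illustrative assumptions, not computed from real data:

```python
# Decision rule for one 1-df effect against the error mean square.
# The critical value F_{0.05; 1, 3} ≈ 10.13 is taken from an F table
# (hard-coded here as an assumption, not computed).
ms_effect = 84.5            # hypothetical mean square for the effect (1 d.f.)
ms_error = 0.5 / 3          # hypothetical error mean square (3 d.f.)

f_ratio = ms_effect / ms_error
f_crit = 10.13              # tabulated F(1, 3) at the 5% level

significant = f_ratio > f_crit
print(round(f_ratio, 1), significant)
```

With a larger design the same comparison is repeated for every row of the ANOVA table, each against the same MSE.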
Factor A: A1 = −1, A2 = +1
Factor B: B1 = −1, B2 = +1
Run A B Response Y
1 -1 -1 y1
2 -1 +1 y2
3 +1 -1 y3
4 +1 +1 y4
In an RBD (Randomized Block Design), we have r blocks. Each block will contain all the treatment combinations. Let’s assume that the
experiment is replicated across r blocks.
Step 2: Layout of the Experiment
In the RBD, the treatments are randomly assigned to the blocks. Each block consists of all four treatment combinations. The layout of
the experiment could look like this (for r = 3 blocks):
Block A B Response Y
1 -1 -1 y1
1 -1 +1 y2
1 +1 -1 y3
1 +1 +1 y4
2 -1 -1 y5
2 -1 +1 y6
2 +1 -1 y7
2 +1 +1 y8
3 -1 -1 y9
3 -1 +1 y10
3 +1 -1 y11
3 +1 +1 y12
Here, the treatments are randomly assigned to each block, and each block contains all four treatment combinations.
Source of Variation Sum of Squares (SS) Degrees of Freedom (df) Mean Square (MS) F-ratio (F)
Blocks SSBlocks r − 1 MSBlocks F = MSBlocks/MSE
Factor A SSA 1 MSA FA = MSA/MSE
Factor B SSB 1 MSB FB = MSB/MSE
Interaction AB SSAB 1 MSAB FAB = MSAB/MSE
Error SSE 3(r − 1) MSE = SSE/[3(r − 1)]
Total SSTotal 4r − 1
1. Total Sum of Squares:
SSTotal = ∑∑∑ (Yijk − Ȳ)², summed over all observations.
2. Sum of Squares for Factor A:
SSA = 2r ∑ (ȲA − Ȳ)² (each level mean of A is based on 2r observations)
3. Sum of Squares for Factor B:
SSB = 2r ∑ (ȲB − Ȳ)²
4. Sum of Squares for the Interaction AB:
SSAB = r ∑ (ȲAB − Ȳ)² − SSA − SSB (the ȲAB are the treatment-combination means)
If the F-ratio is large enough (greater than the critical F-value from the F-distribution table), the null hypothesis is rejected,
indicating a significant effect.
BLUE is an estimator that is the best (has the least variance) among all unbiased linear estimators.
One assumption of ANOVA is that the errors (residuals) are normally distributed.
A linear model is one that is linear in its parameters β0, β1, …, βn.
Q4. What are uniformity trials in the design of experiments?
Uniformity trials are used to assess the uniformity or homogeneity of experimental units, often to detect any variation across
experimental plots.
Efficiency of a design refers to how well it achieves a given objective (e.g., estimating treatment effects) with the least possible
error, typically expressed as the ratio of the variance of an optimal design to the variance of the proposed design.
Q6. What are the degrees of freedom (d.f.) for the Error Sum of Squares (ESS) in a CRD?
The degrees of freedom for ESS in a Completely Randomized Design (CRD) is n − t, where n is the total number of observations
and t is the number of treatments.
A standard Latin square is an n × n arrangement in which each treatment appears exactly once in each row and each column, and whose first row and first column are in natural (e.g., alphabetical) order.
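One standard Latin square of any order can be built by cyclic shifts. A minimal Python sketch (the cyclic square is one example of a standard square, not the only one):

```python
def standard_latin_square(n):
    """Cyclic construction: entry (i, j) is symbol (i + j) mod n, so the
    first row and first column come out in natural (alphabetical) order."""
    symbols = [chr(ord("A") + k) for k in range(n)]
    return [[symbols[(i + j) % n] for j in range(n)] for i in range(n)]

sq = standard_latin_square(4)
for row in sq:
    print(" ".join(row))
# A B C D / B C D A / C D A B / D A B C
```

Every symbol appears exactly once in each row and each column, which is the defining Latin-square property.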
LSD (Latin Square Design) is an improvement over RBD (Randomized Block Design) because it controls variability in two directions (rows and columns) rather than in only one direction (blocks).
A treatment contrast is a linear combination of treatment means with specific coefficients summing to zero, used to compare
different treatment effects.
A treatment contrast is a comparison between two or more treatment means, represented as a linear combination of treatment
means.
Q12. What is the purpose of conducting uniformity trials in the design of experiments?
The purpose is to assess the consistency of experimental units in terms of response, ensuring that the treatment effects can be
attributed to the treatments rather than variability in the experimental units.
A fertility contour map is a graphical representation that shows the variation in soil fertility across experimental plots or fields.
A linear mixed effect model is a statistical model that includes both fixed effects (e.g., treatment) and random effects (e.g.,
random variations between subjects or blocks).
Q15. Write a possible layout of an RBD with 4 treatments (A, B, C, D) and 3 replicates.
Block Treatments
1 A B C D
2 B C D A
3 C D A B
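In practice the treatment order within each block is randomized independently. A Python sketch of that randomization (the seed is only so the illustration is reproducible):

```python
import random

# Randomize the order of 4 treatments independently within each of 3 blocks.
random.seed(1)                      # reproducibility of this illustration only
treatments = ["A", "B", "C", "D"]
layout = []
for block in (1, 2, 3):
    order = treatments[:]           # copy, then shuffle within the block
    random.shuffle(order)
    layout.append((block, order))

for block, order in layout:
    print(block, " ".join(order))
```

Each block still receives every treatment exactly once; only the within-block order changes.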
One disadvantage of L.S.D. is that the number of rows, columns, and treatments must all be equal, so the design becomes impractical when the number of treatments is large, making it less flexible than R.B.D.
Q17. Write the standard order of treatment combinations in a 23 factorial experiment with factors A, B, and C.
(1), a, b, ab, c, ac, bc, abc — i.e., (−1, −1, −1), (+1, −1, −1), (−1, +1, −1), (+1, +1, −1), (−1, −1, +1), (+1, −1, +1), (−1, +1, +1), (+1, +1, +1), with A varying fastest.
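The standard (Yates) order can be generated mechanically, with A changing fastest, then B, then C. A short Python sketch:

```python
# Yates standard order for a 2^3 factorial: A changes fastest, then B, then C.
runs = [(a, b, c) for c in (-1, +1) for b in (-1, +1) for a in (-1, +1)]

labels = []
for a, b, c in runs:
    # Name each run by the factors at their high level; "(1)" if none are.
    name = "".join(f for f, lvl in zip("abc", (a, b, c)) if lvl == +1)
    labels.append(name or "(1)")

print(labels)   # ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

The same pattern extends to any 2^k design: each new factor doubles the list, appending its letter to every existing label.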
The linear model for one-way ANOVA is:
Yij = μ + τi + ϵij
where Yij is the response, μ is the overall mean, τi is the effect of treatment i, and ϵij is the random error.
Critical difference is used to determine if the difference between two treatment means is statistically significant at a given
confidence level.
Q20. Define the critical difference for testing the significance of the difference between two treatment means.
Critical difference is the minimum difference between two treatment means that is required to be considered statistically
significant, typically based on the Student’s t-distribution.
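For equal replication, the usual form is CD = t × √(2·MSE/r). A Python sketch in which the MSE, replications, observed difference, and the tabulated t value (t_{0.025, 12} ≈ 2.179) are all illustrative assumptions:

```python
import math

# CD = t * sqrt(2 * MSE / r); the t value would come from a t table.
mse = 4.0        # hypothetical error mean square from the ANOVA
r = 5            # hypothetical number of replications per treatment
t_crit = 2.179   # two-sided 5% t value for the (assumed) 12 error d.f.

cd = t_crit * math.sqrt(2 * mse / r)
diff = 3.1       # hypothetical difference between two treatment means
print(diff > cd) # significant if the observed difference exceeds CD
```

Any pair of treatment means differing by more than CD is declared significantly different at the chosen level.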
Q21. What is meant by the efficiency of a design in the context of experimental designs?
The efficiency of a design refers to how well it minimizes the experimental error and maximizes the precision of estimates while
using fewer resources.
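For example, the relative efficiency of an RBD over a CRD is commonly estimated from the RBD mean squares. A Python sketch using the textbook estimate ((b−1)·MSB + b·(t−1)·MSE)/(bt−1) for the CRD error variance; the block count, treatment count, and MS values are illustrative:

```python
# Relative efficiency of an RBD versus a CRD, from the RBD mean squares.
b, t = 4, 5            # hypothetical blocks and treatments
msb, mse = 30.0, 6.0   # hypothetical block and error mean squares

# Estimated error variance had the same data come from a CRD:
var_crd = ((b - 1) * msb + b * (t - 1) * mse) / (b * t - 1)
re = var_crd / mse     # relative efficiency of the RBD over a CRD
print(re > 1)          # > 1 means blocking reduced the error variance
```

A ratio above 1 indicates that blocking paid off; a ratio near 1 suggests the blocks removed little variability.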
Q22. If a 2³ factorial experiment is applied over LSD, what should be the order of the Latin Square? Why?
The Latin Square should be of order 8 (an 8 × 8 grid), because the 2³ factorial has 8 treatment combinations and each combination must appear exactly once in every row and every column of the square.