Bstat Design
Planning of Experiments
4. Choice of experimental design
This is the actual data collection process. The experimenter should carefully
monitor the progress of the experiment to ensure that it is proceeding according to plan.
Particular attention should be paid to randomization, measurement accuracy and
maintaining as uniform an experimental environment as possible.
6. Data Analysis
Once the data have been analyzed, the experimenter may draw conclusions or
inferences about the results. The statistical inferences must be physically interpreted,
and the practical significance of the findings must be evaluated. Then recommendations
concerning the findings must be made. These recommendations may involve a further
round of experiments, as experimentation is usually an iterative process, with one
experiment answering some questions and simultaneously posing others. In the
presentation of results and conclusions, charts and graphs are very effective.
Treatments: - The different factors whose effects are being compared are called
treatments or varieties. Eg: standard ration, temperature, humidity combination,
insecticides.
2
Experimental material: The material to which treatment is applied.
Sample unit:- The sample unit can be identical to the experimental unit, or it can be a
part of the experimental unit. For example, if we measure the weights of independent
calves at the age of 6 months, then a calf is both the sample unit and the experimental
unit. On the other hand, if some treatment is applied to 10 chicks in a cage and each
chick is weighed, then the cage is the experimental unit and each chick is a sample unit.
Definition:
4. The specification of measurements or other records to be made on each
experimental unit
2. Precision: - Precision means how close the measurements are to one another
regardless of how close they are to the true mean, that is, it explains the
repeatability of the experiment. Accuracy represents how close the estimated mean
of replicated measurements is to the true mean. The closer to the true mean, the
more accurate the result. Random errors affect precision of an experiment and to a
lesser extent its accuracy. A small random error means greater precision.
Systematic errors affect the accuracy of the experiment, but not the precision. In
order to have a successful experiment, systematic errors must be eliminated and
random errors should be as small as possible. In experiments, precision is expressed
as the amount of information I = n/σ², where n is the number of observations in a
group or treatment and σ² is the variance between units in the population. Just as
the estimator of the variance σ² is the mean square error s² = MSE, the estimate of
the amount of information is I = n/MSE.
The reciprocal of I is the square of the estimator of the standard error of the mean,
1/I = s_ȳ² = MSE/n; that is, more information results in a smaller standard error and
the estimate of the mean is more precise. More information and greater precision also
result in easier detection of possible differences between means.
The observations in any experiment are affected not only by the action of
treatment, but also by some extraneous factors which tend to mask the effect of
treatment. Randomization, replication and local control are the three basic principles of
experimental design, for increasing the precision of experiment and also for drawing
valid inferences from the experiments.
1. Randomization:- After the treatments and experimental units are decided, the
treatments are allotted to the experimental units at random to avoid any type of
personal or subjective bias, which may be conscious or unconscious. This ensures
the validity of the results and helps to make an objective comparison among the
treatments. It also ensures the independence of the observations, which is necessary
for drawing valid inferences from the observations by applying appropriate statistical
techniques.
UNIT II
Uniformity Trials
When a crop is grown in a field under uniform conditions and the variability among
equal-sized plots is measured for yield or any other trait, the trial is called a
uniformity trial. A fertility map is prepared on the basis of the fertility gradient,
and blocks are formed from patches that have the same fertility.
3. The errors are distributed normally with mean zero and common variance σ².
ANOVA is based on a linear statistical model. The ANOVA model for an experiment
with different levels of only one factor is
yij = μ + ti + eij ;  i = 1, 2, …, t and j = 1, 2, …, r
where yij is the value of the variable in the jth replicate of the ith treatment, μ is the
general mean effect, ti is the effect due to the ith treatment, and eij is the random error,
which is assumed to be independently and normally distributed with mean zero and
variance σe².
CRD is the simplest design, using only two basic principles of experimentation,
namely replication and randomization. In this design, the whole experimental material,
assumed to be homogeneous, is divided into a number of experimental units
depending upon the number of treatments and the number of replications for each
treatment. The treatments are then allotted randomly to the units over the entire
material. This design is useful for laboratory or greenhouse experiments, whereas its
use in field experiments is limited. Missing values or unequal replications do not
create any difficulty in the analysis of this design.
Advantages
1. The number of replications may be varied from treatment to treatment. Because
of this flexibility, all the available experimental material can be used without
any wastage.
2. CRD provides the maximum number of degrees of freedom for the estimation
of experimental error.
3. The statistical analysis of CRD is very simple, even if information on some
units is missing.
Disadvantages
1. It is less accurate than other designs. When a large number of treatments is
included, a relatively large amount of experimental material must be used, which
increases the heterogeneity of the experimental material. This results in
increased experimental error and reduced precision.
Lay out of CRD
The placement of the treatments on the experimental units, along with the arrangement
of the experimental units, is known as the layout of an experiment.
Suppose that there are t treatments, namely T1, T2, …, Tt. Further suppose that the
treatments are replicated r times each. Then we require t × r = n experimental units. In
case of unequal replications, the number of experimental units required will be
r1 + r2 + … + rt = n.
1 2 3 4 5
10 9 8 7 6
11 12 13 14 15
20 19 18 17 16
Then n distinct three-digit random numbers are selected from random number tables.
The random numbers are written in order and are ranked. The first set of r units is
allotted to treatment T1, the next r units to T2, and so on. The procedure is continued
until all treatments have been applied. For example, the selected random numbers and
their ranks are as follows:
That is, treatment 1 is applied to units 18, 4, 10 and 9; treatment 2 is applied to units
14, 7, 19 and 13; and so on. The final layout, with unit numbers and treatments
allocated, is as follows:
1 T4 2 T3 3 T4 4 T1 5 T3
10 T1 9 T1 8 T3 7 T2 6 T3
11 T5 12 T4 13 T2 14 T2 15 T4
20 T5 19 T2 18 T1 17 T5 16 T5
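The allocation procedure above can be sketched in Python. The 5 treatments × 4 replications layout follows the example; shuffling the unit numbers stands in for drawing and ranking random numbers, and the seed is only for reproducibility:

```python
import random

random.seed(42)                      # only for reproducibility
t, r = 5, 4                          # 5 treatments, 4 replications each
units = list(range(1, t * r + 1))    # experimental units numbered 1..20

# Shuffling plays the role of selecting and ranking random numbers:
# the first r shuffled units get T1, the next r get T2, and so on.
random.shuffle(units)
layout = {}
for i in range(t):
    for unit in units[i * r:(i + 1) * r]:
        layout[unit] = "T" + str(i + 1)
```

Each unit receives exactly one treatment, and each treatment is applied to exactly r units.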
Analysis of CRD
The model is yij = μ + ti + eij, where yij is the value of the variable in the jth replicate
of the ith treatment, μ is the general mean effect, ti is the effect due to the ith treatment,
and eij is the random error, which is assumed to be independently and normally
distributed with mean zero and variance σe².
Treatment         1       2       3     …     t
                 y11     y21     y31    …    yt1
                 y12     y22     y32    …    yt2
                  ⋮       ⋮       ⋮           ⋮
                 y1r1    y2r2    y3r3   …    ytrt
Total            T1      T2      T3     …    Tt      Grand Total G
No. of
replications     r1      r2      r3     …    rt      n
Next, the required sums of squares are computed.
Correction Factor (CF) = G²/n, where G is the grand total,
Total sum of squares (TSS) = Σ yij² − CF, summed over i = 1, …, t and j = 1, …, ri,
Treatment sum of squares (SST) = T1²/r1 + T2²/r2 + … + Tt²/rt − CF = Σ Ti²/ri − CF.
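As a minimal sketch, the formulas above can be computed directly; the yields below are hypothetical, with unequal replications, and SSE is obtained by subtraction as in the usual ANOVA:

```python
# Hypothetical CRD yields: t = 3 treatments with unequal replications
data = {
    "T1": [8, 10, 12],
    "T2": [14, 16, 15, 15],
    "T3": [9, 11, 10],
}

n = sum(len(obs) for obs in data.values())
G = sum(sum(obs) for obs in data.values())               # grand total
CF = G ** 2 / n                                          # correction factor
TSS = sum(y ** 2 for obs in data.values() for y in obs) - CF
SST = sum(sum(obs) ** 2 / len(obs) for obs in data.values()) - CF
SSE = TSS - SST        # error sum of squares by subtraction
```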
If the calculated value of F is greater than the table value Fα(t − 1, n − t), where α
denotes the level of significance, the hypothesis H0 is rejected and it can be inferred
that the treatment effects are significantly different from one another.
A non-significant F may result either from small treatment differences or from a very
large experimental error, or both. It does not always mean that all the treatments have
the same effect. When the experimental error is large, it is an indication of the failure
of the experiment to detect treatment differences. In order to judge the reliability of
the experiment, the coefficient of variation (CV) is used. It is computed as
CV = (√MSE / overall mean) × 100.
In case of a significant F, the null hypothesis is rejected. Then the problem is to know
which of the treatment means are significantly different. The most commonly used test
for this purpose is the least significant difference (LSD), otherwise known as the
critical difference (CD).
The formula is CD = t × SE(d), where SE(d) is the standard error of the difference of
two means and t is the table value of t for a specified level of significance and the
error degrees of freedom. If we are comparing the ith and jth treatment means,
SE(d) = √[MSE (1/ri + 1/rj)]
where ri and rj are the numbers of replications for the ith and jth treatments
respectively. If the replications are equal, SE(d) = √(2MSE/r).
Two treatment means are significantly different if the difference between them is
greater than the calculated CD value; otherwise they are not significantly different.
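The CD test can be sketched as follows; the MSE, table t-value and treatment means below are assumed inputs taken from a completed ANOVA, not values from the text:

```python
import math

MSE = 4.0          # assumed error mean square from the ANOVA
t_table = 2.101    # assumed table t at the 5% level for the error d.f.

def se_diff(r_i, r_j, mse=MSE):
    """SE of the difference between means replicated r_i and r_j times."""
    return math.sqrt(mse * (1.0 / r_i + 1.0 / r_j))

CD = t_table * se_diff(4, 4)     # equal replications: sqrt(2*MSE/r)
diff = abs(20.5 - 16.2)          # hypothetical pair of treatment means
significant = diff > CD          # True here: the difference exceeds CD
```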
Contrast
A linear combination of the treatment means, Z = c1T̄1 + c2T̄2 + … + ctT̄t, is called a
contrast. If the treatments do not have the same number of replications, say treatment
Ti has ri replications, then for Z to be a contrast the condition is Σ rici = 0.
Eg: T̄1 − T̄2 , T̄1 − 2T̄2 + T̄3
Orthogonal contrast
Two contrasts for the same set of treatment means, each of them having the same
number of replications, are said to be orthogonal if the sum of the products of the
corresponding coefficients is zero. The contrasts Z1 = c1T̄1 + c2T̄2 + … + ctT̄t and
Z2 = d1T̄1 + d2T̄2 + … + dtT̄t are orthogonal if and only if Σ ricidi = 0. For t
treatments, there cannot be more than t − 1 mutually orthogonal contrasts.
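The two conditions (Σ rici = 0 for a contrast, Σ ricidi = 0 for orthogonality) translate directly into code; the coefficients and replication numbers below are illustrative:

```python
def is_contrast(coeffs, reps):
    """Z = sum(c_i * mean_i) is a contrast when sum(r_i * c_i) = 0."""
    return sum(r * c for r, c in zip(reps, coeffs)) == 0

def are_orthogonal(c, d, reps):
    """Two contrasts are orthogonal when sum(r_i * c_i * d_i) = 0."""
    return sum(r * ci * di for r, ci, di in zip(reps, c, d)) == 0

reps = [4, 4, 4]        # equal replications for three treatments
z1 = [1, -1, 0]         # T1 - T2
z2 = [1, 1, -2]         # T1 + T2 - 2*T3
```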
Advantages
Disadvantage
When the number of treatments is increased, the block size increases. If the block
size is large, it may be difficult to maintain homogeneity within blocks, and
consequently the experimental error will be increased. Hence RBD may not be suitable
for a large number of treatments.
Each block is divided in to t units. The units in each block are numbered from 1 to t.
The treatments are also numbered conveniently. By using random number table, we
select t distinct random numbers from 1 to t. These random numbers correspond to the
treatment numbers. The first selected treatment is applied to the first unit of a block, the
second selected treatment to the second unit and so on. The randomization is done in
each block in the same way.
Analysis of RBD
The model is yij = μ + ti + bj + eij, where yij is the value of the variate for the ith
treatment in the jth block, μ is the general mean effect, ti is the effect due to the ith
treatment, bj is the effect due to the jth block, and eij is the random error, which is
assumed to be independently and normally distributed with mean zero and variance σe².
The results from an RBD can be arranged in a two-way table according to the
replications (blocks) and treatments. There will be rt observations in total. The data
arrangement is given below.
Treatments       Replications (blocks)
                 1      2      3     …     r        Total
1               y11    y12    y13    …    y1r       T1
2               y21    y22    y23    …    y2r       T2
3               y31    y32    y33    …    y3r       T3
⋮                ⋮      ⋮      ⋮           ⋮         ⋮
t               yt1    yt2    yt3    …    ytr       Tt
Total           B1     B2     B3     …    Br        G
The total variation is divided into three sources: between blocks, between treatments,
and error. The required sums of squares are obtained as follows:
Correction Factor (CF) = G²/(rt),
Total sum of squares (TSS) = Σ yij² − CF,
Block sum of squares (SSB) = (1/t) Σ Bj² − CF, summed over j = 1, …, r,
Treatment sum of squares (SST) = (1/r) Σ Ti² − CF, summed over i = 1, …, t,
Error sum of squares (SSE) = TSS − SST − SSB.
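A minimal sketch of these computations on a hypothetical t = 3 by r = 4 layout (rows are treatments, columns are blocks):

```python
# Hypothetical RBD yields: y[i][j] = treatment i in block j
y = [
    [10, 12, 11, 13],
    [15, 14, 16, 15],
    [9, 10, 8, 9],
]
t, r = len(y), len(y[0])

G = sum(sum(row) for row in y)                 # grand total
CF = G ** 2 / (r * t)
TSS = sum(v ** 2 for row in y for v in row) - CF
SSB = sum(sum(y[i][j] for i in range(t)) ** 2 for j in range(r)) / t - CF
SST = sum(sum(row) ** 2 for row in y) / r - CF
SSE = TSS - SST - SSB
```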
The ANOVA table for testing the hypothesis H0: T1 = T2 = … = Tt against the
alternative that they are not all equal is given below:

Source of       Degrees of       Sum of    Mean sum of
variation       freedom          squares   squares                     F         Table F
Blocks or
replications    r − 1            SSB       MSB = SSB/(r − 1)           MSB/MSE   F(r − 1, (r − 1)(t − 1))
Treatments      t − 1            SST       MST = SST/(t − 1)           MST/MSE   F(t − 1, (r − 1)(t − 1))
Error           (r − 1)(t − 1)   SSE       MSE = SSE/((r − 1)(t − 1))
Total           rt − 1           TSS
If the calculated value of F = MSB/MSE is less than F(r − 1, (r − 1)(t − 1)), there is no
significant difference between blocks, which is an indication that blocking will not
contribute to precision in detecting treatment differences. In such situations, the
adoption of RBD in preference to CRD is not advantageous.
If the calculated value of F = MST/MSE is greater than the table value
F(t − 1, (r − 1)(t − 1)), the hypothesis H0 is rejected and it can be inferred that there
is a significant difference between treatment means. Now, for comparing pairs of
treatments, we can calculate
CD = t × SE(d)
where t is the table value of t for the specified level of significance and the error
degrees of freedom, and SE(d) = √(2MSE/r).
The treatment means are given by T̄i = Ti/r (i = 1, 2, …, t). Any two treatment means
are said to differ significantly if their difference is larger than the critical difference.
Efficiency of Blocking
The efficiency of RBD over CRD is obtained as RE(RBD) = [I(RBD)/I(CRD)] × 100.
If RE is more than 100%, the excess is known as the gain in efficiency due to RBD.
When the error degrees of freedom is less than 20, the relative efficiency (RE) has to
be adjusted by multiplying it by a precision factor. The precision factor is computed as
PF = [(ne + 1)(ne′ + 3)] / [(ne + 3)(ne′ + 1)]
where ne is the error degrees of freedom for RBD and ne′ = nr + ne is the error degrees
of freedom for the corresponding CRD.
treatment occurs once and only once in each row and each column. The numbers of
rows and columns are equal, so the arrangement forms a square. Thus, a Latin square
of size s is an arrangement of s Latin letters into s² positions such that every row and
every column contains every treatment precisely once. Through the elimination of row
and column effects, the error variation can be considerably reduced.
LSD is not suitable for fewer than five treatments. For a Latin square with five
treatments, written as a 5 × 5 Latin square, the arrangement may be as follows:
A B C D E A B C D E
B C D E A B A E C D
C D E A B C D A E B
D E A B C D E B A C
E A B C D E C D B A
The selection of squares can be done from Fisher and Yates (1953) statistical tables.
Some terminologies
Standard square:- A standard square is one in which the first row and first column are
ordered alphabetically.
Conjugate square:- Two standard squares are said to be conjugate squares when the
rows of one square are the columns of the other.
If the number of treatments is t, then a t × t standard square is selected from Fisher and
Yates (1953) statistical tables. The columns of the selected standard square are
rearranged randomly by using random number tables. Then, keeping the first row of the
rearranged square as it is, the remaining rows are randomized. The treatments are
then allocated as in the final arrangement.
Analysis of LSD
The model is
yijk = μ + ri + cj + tk + eijk
where yijk is the observation on the kth treatment in the ith row and jth column
(i, j, k = 1, 2, …, t), μ is the general mean effect, ri is the effect due to the ith row,
cj is the effect due to the jth column, tk is the effect due to the kth treatment, and eijk
is the random error, which is assumed to be independently and normally distributed
with mean zero and variance σe².
The results of an LSD will be in the form of two-way tables according to rows and
columns. The results have to be arranged according to treatments also. Let Ri denote
the ith row total, Cj the jth column total, Tk the kth treatment total, and G the grand
total. The different sums of squares for a t × t LSD can be obtained as follows:
Correction Factor (CF) = G²/t²,
Total sum of squares (TSS) = Σ yijk² − CF,
Sum of squares due to rows (SSR) = Σ Ri²/t − CF,
Sum of squares due to columns (SSC) = Σ Cj²/t − CF,
Sum of squares due to treatments (SST) = Σ Tk²/t − CF,
Sum of squares due to error (SSE) = TSS − SSR − SSC − SST.
For testing the hypothesis H0:T1=……..=Tt against the alternative T’s are not all equal,
the ANOVA table is as given below
If F is not significant for treatments, we can conclude that the treatment effects do not
differ significantly. If F is significant, we calculate CD = t × SE(d), where t denotes
the table value of t for a specified level of significance and the error degrees of
freedom, and SE(d) = √(2MSE/t), where t is the number of rows or columns.
Efficiency of LSD
In estimating the efficiency of LSD over RBD, we have to consider the type of blocks.
If the LSD had been an RBD with columns as blocks, it is termed column blocking;
similarly, if the LSD had been an RBD with rows as blocks, it is termed row blocking.
When we resort to column blocking, the variation due to rows is added to the error
variation. Hence, in case of column blocking, the estimate of MSE for RBD is given by
When the error degrees of freedom is less than 20, the precision factor is to be taken
into account. The precision factor is
PF = [(ne + 1)(ne′ + 3)] / [(ne + 3)(ne′ + 1)]
where ne is the error degrees of freedom for LSD and ne′ = nr + ne = nc + ne is the
error degrees of freedom for RBD.
For estimating the efficiency of LSD over CRD, the estimate of MSE for CRD is
MSE(CRD) = [nr·MSR + nc·MSC + (nt + ne)·MSE] / (nr + nc + nt + ne).
The RE of LSD over CRD is
RE = [I(LSD)/I(CRD)] × 100 = [MSE(CRD)/MSE(LSD)] × 100.
In the PF formula, the error degrees of freedom for CRD will be ne′ = nr + nc + ne.
Unit IV
With a single missing value in an RBD with t treatments and r replications each, the
first step is to estimate the missing value by using the formula
X = (rB′ + tT′ − G′) / ((r − 1)(t − 1))
where X is the estimate of the missing value, B′ is the total of the available values in
the block containing the missing value, T′ is the total of the available values for the
treatment with the missing value, and G′ is the grand total of all available values.
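The estimation formula can be sketched as follows; the block, treatment and grand totals below are hypothetical:

```python
def estimate_missing_rbd(r, t, B1, T1, G1):
    """X = (r*B' + t*T' - G') / ((r-1)(t-1)), where B', T' and G' are
    the totals of the available values in the affected block, the
    affected treatment, and the whole experiment respectively."""
    return (r * B1 + t * T1 - G1) / ((r - 1) * (t - 1))

# Hypothetical totals for r = 4 blocks and t = 5 treatments
X = estimate_missing_rbd(4, 5, B1=52, T1=41, G1=270)
```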
The analysis is then carried out as usual after substituting the estimated value of the
missing observation, with the following changes:
Bias = [B′ − (t − 1)X]² / [t(t − 1)] = [B′ + tT′ − G′]² / [t(t − 1)(r − 1)²].
3. For comparing the mean of the treatment with the missing value and the mean of
any other treatment,
SE(d) = √( MSE [ 2/r + t/(r(r − 1)(t − 1)) ] ).
The SE(d) formula for other pairs of treatments remains as usual.
The procedure is first to obtain the estimate of the missing value X by the formula
X = [t(R′ + C′ + T′) − 2G′] / [(t − 1)(t − 2)]
where t is the number of treatments,
R′ is the total of the available observations in the row with the missing value,
C′ is the total of the available observations in the column with the missing value,
T′ is the total of the available observations for the treatment with the missing value,
G′ is the grand total of all the available observations.
The estimated missing value is then inserted and the analysis is carried out according
to the usual procedure for an LSD, except that 1 degree of freedom is subtracted from
each of the degrees of freedom for the total sum of squares and the error sum of
squares.
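The estimation step can be sketched in the same way as for the RBD case; the totals below are hypothetical:

```python
def estimate_missing_lsd(t, R1, C1, T1, G1):
    """X = [t*(R' + C' + T') - 2*G'] / ((t-1)(t-2)), where R', C', T'
    and G' are the totals of the available observations in the affected
    row, column, treatment, and the whole square respectively."""
    return (t * (R1 + C1 + T1) - 2 * G1) / ((t - 1) * (t - 2))

# Hypothetical totals for a 5 x 5 Latin square
X = estimate_missing_lsd(5, R1=40, C1=38, T1=42, G1=250)
```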
The upward bias in SST is computed by using the formula
upward bias = [G′ − R′ − C′ − (t − 1)T′]² / [(t − 1)(t − 2)]².
The SE of the difference between the mean of the treatment with the missing value and
the mean of any other treatment is given by
SE(d) = √( MSE [ 2/t + 1/((t − 1)(t − 2)) ] ).
Analysis of covariance
In many experiments, for each experimental unit we have observations on one or more
supplementary variables in addition to the response variable. These are usually called
concomitant observations or concomitant variables. If the concomitant variables are
unrelated to the treatments but influence the response variable, the variation in the
response variable caused by them should be eliminated before comparing the
treatments. For example, in animal feeding experiments designed to compare the effect
of different diets on growth, the initial weight of an animal is expected to affect the
increase in weight recorded at the end of the experimental period, and therefore the
comparison of diets should be made after the variation in weight increase resulting
from the differences in initial weights of the animals has been eliminated.
The technique of analysis used to eliminate the variation resulting from the influence
of concomitant variables on the response variable is called analysis of covariance. It is
an adaptation of the methods of regression analysis to experimental designs and
consists essentially of fitting a regression of the response variable on the concomitant
variables. The response variables are then ‘adjusted’ by the regression, and the
comparison of treatments is carried out using the adjusted response variables. At the
same time, variation due to non-uniformity of the experimental material is eliminated
by the use of a suitable design.
Analysis of covariance for CRD
For the CRD with t treatments and ri replications for the ith treatment, i = 1, 2, …, t,
where Σ ri = n, the model is
yij = μ + ti + β(xij − x̄) + eij
where yij is the jth observation on the response variable taken under the ith treatment,
μ is the general mean effect, ti is the ith treatment effect, β is the linear regression
coefficient indicating the dependence of yij on xij, xij is the observation on the
concomitant variable corresponding to yij, x̄ is the mean of the xij values, and eij is the
random error component, which is independently and normally distributed with mean
zero and variance σ².
The data arrangement is as given below.

Treatments      1            2          …        t
             y     x      y     x             y     x
            y11   x11    y21   x21           yt1   xt1
            y12   x12    y22   x22           yt2   xt2
             ⋮     ⋮      ⋮     ⋮             ⋮     ⋮
Total       Ty1   Tx1    Ty2   Tx2     …     Tyt   Txt     Grand totals: GTy, GTx
The first step in the analysis of covariance is to compute the sums of squares for the
variable (Y) and the covariate (X), as well as the sum of products of Y and X. The
sums of squares for both Y and X are computed in the usual manner for a CRD. The
sum of products (SP) of Y and X is computed as follows:
CFyx = (GTy × GTx)/n,
Treatment SP = Tyx = Σ (Tyi Txi)/ri − CFyx.
Analysis on Y:
CFy = GTy²/n,
SST = Tyy = Σ Tyi²/ri − CFy,
SSE = Eyy = Gyy − Tyy.
Analysis on X:
CFx = GTx²/n,
SSTx = Txx = Σ Txi²/ri − CFx.
Then the regression coefficient within treatments is computed as β̂ = Eyx/Exx. The
significance of β̂ is tested using the F-test. The test statistic is
F = (mean square due to regression) / (adjusted error mean square)
  = (Eyx²/Exx) / { [Eyy − Eyx²/Exx] / (n − t − 1) }.
This F follows an F distribution with 1 and (n − t − 1) degrees of freedom. If the
regression coefficient is significant, we proceed to make the adjustments for the
variate. If it is not significant, it is not worthwhile to make the adjustments.
The adjusted values for the variable Y are then computed as follows:
G′yy = Gyy − Gyx²/Gxx,
E′yy = Eyy − Eyx²/Exx,
T′yy = G′yy − E′yy.
If F is not significant for treatments, we can conclude that the treatment effects do not
differ significantly. If F is significant, we calculate CD = t × SE(d), where t denotes
the table value of t for a specified level of significance and the error degrees of
freedom.
Then the adjusted treatment means are obtained from the formula
ȳi′ = ȳi − β̂(x̄i − x̄)
and the standard error of the difference between two adjusted means is given by
SE(d) = √( MSE [ 1/ri + 1/rj + (x̄i − x̄j)²/Exx ] ).
When the number of replications is the same for all treatments, and when averaged
over all values of (x̄i − x̄j)²,
SE(d) = √( (2MSE/r) [ 1 + Txx/((t − 1)Exx) ] ).
Two treatments are significantly different if the difference between their adjusted
treatment means is greater than the calculated CD value; otherwise they are not
significantly different.
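The adjustment ȳi′ = ȳi − β̂(x̄i − x̄) and the SE(d) formula can be sketched with assumed summary quantities; Eyx, Exx, the MSE and the means below are all hypothetical inputs, not values from the text:

```python
import math

E_yx, E_xx = 30.0, 20.0
beta = E_yx / E_xx             # within-treatment regression coefficient

x_bar = 12.0                   # overall mean of the covariate
means = {"T1": (25.0, 10.0),   # (y-mean, x-mean) for each treatment
         "T2": (22.0, 14.0)}

# adjusted mean: y_i' = y_i - beta * (x_i - x_bar)
adjusted = {k: y - beta * (x - x_bar) for k, (y, x) in means.items()}

def se_diff_adjusted(r_i, r_j, x_i, x_j, mse):
    """SE of the difference between two adjusted treatment means."""
    return math.sqrt(mse * (1.0 / r_i + 1.0 / r_j + (x_i - x_j) ** 2 / E_xx))
```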
UNIT III
Factorial Experiments
Experiments are characterized by the nature of treatments under investigation and the
nature of comparisons required. There are three main types of experiments.
The treatments in a single-factor experiment are the different levels of the same
factor, for example, several feeds for animals or different doses of a drug. The main
purpose of such an experiment is to compare the treatments in all possible pairs. Thus,
when the treatments consist of different levels of a single variable factor and all other
factors are kept at a single prescribed level, it is known as a single-factor experiment.
A factorial experiment is named according to the number of factors and the levels
of each factor. For example, if there are four factors each at two levels, the experiment
is known as a 2⁴ factorial experiment, and if there are two factors each at three levels,
it is known as a 3² factorial experiment. In general, if there are n factors each with p
levels, it is known as a pⁿ factorial experiment. If there are three factors, one at two
levels, a second at three levels and a third at four levels, it is a 2 × 3 × 4 factorial
experiment.
If the number of levels of each factor in an experiment is the same, the
experiment is called symmetrical factorial; otherwise, it is called asymmetrical factorial
or mixed factorial.
When several treatment combinations are involved, execution of the experiment and
statistical analysis become complex.
the letters A and B. The levels of the factors can be denoted by 0 and 1. The treatment
combinations can therefore be written as a0b0, a0b1, a1b0 and a1b1,
or 00, 01, 10 and 11, or I, a, b and ab.
The symbol I denotes that both factors are at the lower level in the combination; this
is called the control treatment.
When there are three factors each at two levels, the factorial is denoted by 2³ and there
are eight treatment combinations. The factors are denoted by A, B and C. The general
factorial with n factors each at two levels is denoted by 2ⁿ. For a 2² or 2³ factorial,
any of the three designs CRD, RBD or LSD can be used, but their analysis involves
some further partitioning of the treatment sum of squares to obtain the main effect and
interaction variation of the factors.
The main effects and interaction effects in a factorial experiment can be computed by
many methods.
Suppose that we have two factors A and B, each at two levels. Then the treatment
combinations are represented as
= (1/2)(a − 1)(b − 1)
or = ½[simple effect of B at a1 − simple effect of B at a0].
In general,
Effect of X = (1/2ⁿ⁻¹)(a ± 1)(b ± 1)(c ± 1)(d ± 1)…, where n is the number of factors,
and the sign in each bracket is negative if the corresponding capital letter is present in
X and positive if it is absent.
For example, if there are three factors A, B and C, each at two levels, we have
Main effect of A = (1/2²)(a − 1)(b + 1)(c + 1)
                 = ¼[abc + ab + ac + a − bc − b − c − (1)]
Effect of BC = (1/2²)(a + 1)(b − 1)(c − 1)
             = ¼[abc + bc + a + (1) − ab − ac − b − c]
Effect of ABC = (1/2²)(a − 1)(b − 1)(c − 1)
              = ¼[abc + a + b + c − ab − ac − bc − (1)]
In the expansions, the sign + is given to a treatment combination if the corresponding
small letter is present and − if it is absent; the sign for an interaction is the product of
the corresponding signs for the individual letters. If there is replication, then the
divisor is multiplied by r, where r is the number of replications.
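The sign rule can be implemented directly: the sign of each treatment combination in an effect is the product, over the letters of the effect, of +1 if the small letter is present and −1 if absent. A sketch, with hypothetical 2² totals:

```python
def effect(name, totals, r=1):
    """Effect of `name` (e.g. "A", "BC") in a 2^n factorial, from a dict
    mapping treatment combinations ("(1)", "a", "ab", ...) to their
    totals over r replicates.  Divisor: r * 2**(n-1)."""
    letters = {ch for combo in totals for ch in combo if ch.isalpha()}
    n = len(letters)
    s = 0
    for combo, total in totals.items():
        sign = 1
        for letter in name.lower():
            sign *= 1 if letter in combo else -1
        s += sign * total
    return s / (r * 2 ** (n - 1))

totals = {"(1)": 10, "a": 14, "b": 12, "ab": 20}   # hypothetical totals
```

With these totals, `effect("A", totals)` gives (ab + a − b − (1))/2 = 6.0.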
For a 2ⁿ experiment with r replications,
effect of X = (total of X) / (r·2ⁿ⁻¹).
In the first column, we write the treatment combinations in the standard order. In the
second column, against each treatment combination, we write the corresponding total
yields from all replicates. The entries in the third column are obtained in two parts:
the first half is obtained by writing the pairwise sums of the entries in the second
column, and the second half is obtained by subtracting the first entry of each pair
from the second. In a 2ⁿ experiment, the procedure is repeated n times; for example,
in a 2² experiment it is repeated twice.
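The tabular procedure just described is Yates' algorithm; a sketch operating on treatment totals given in standard order:

```python
def yates(totals):
    """Yates' algorithm for a 2^n factorial.  `totals` holds the
    treatment totals in standard order, e.g. [(1), a, b, ab] for 2^2.
    Each pass forms pairwise sums (first half) and pairwise differences,
    second minus first (second half); after n passes the column holds
    the grand total followed by the effect totals."""
    col = list(totals)
    n = len(col).bit_length() - 1
    for _ in range(n):
        sums = [col[i] + col[i + 1] for i in range(0, len(col), 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, len(col), 2)]
        col = sums + diffs
    return col
```

For hypothetical 2² totals (1) = 10, a = 14, b = 12, ab = 20, `yates([10, 14, 12, 20])` returns [56, 12, 8, 4]: the grand total, then the effect totals of A, B and AB.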
Analysis
Suppose that we have a factorial RBD with two factors each at two levels. Then the
ANOVA model is
yijk = μ + ri + aj + bk + (ab)jk + eijk
where the terms have the usual meanings. The degrees of freedom associated with a
factor equal its number of levels minus one. For an interaction, the degrees of freedom
are the product of the degrees of freedom of the individual factors in that interaction.
Source        Degrees of freedom      Sum of squares
Blocks        r − 1
Treatments:
  A           1
  B           1
  AB          1
Error         (r − 1)(2² − 1)
Total         2²·r − 1
S.E.(d) of A = √(2MSE/(r·2)),
S.E.(d) of B = √(2MSE/(r·2)),
S.E.(d) of AB = √(2MSE/r).
In general, S.E.(d) of X = √(2MSE/(r·D)), where X is the main factor or interaction,
D is the product of the levels of the factors left out of X, and r is the number of
replications.
3n Factorial
This is a factorial arrangement with n factors at three levels. The three levels of the
factor can be low, intermediate and high. The levels will be denoted as 0 for low, 1 for
intermediate and 2 for high. Factors and interaction will be denoted by capital letters.
Treatment combination is denoted by n digits, where the first digit the level of factor A,
2nd digit that of factor B and so on. For example 102 denote A at intermediate level, B
at low level and C at high level.
In an asymmetrical factorial experiment, the factors are not all at the same number of
levels. Suppose that in an experiment there are p levels of factor A and q levels of
factor B; then such an experiment is a p × q factorial experiment.
The ANOVA model for this experiment is
yijk = μ + ri + aj + bk + (ab)jk + eijk
where the terms have the usual meanings. TSS and the SS due to replications (SSR)
are found in the usual way. The degrees of freedom associated with a factor equal its
number of levels minus one.
In order to compute the main effect sums of squares and the interaction sum of squares
in an asymmetrical factorial experiment, first prepare a two-way table of the factors as
given below. The values in the table are the totals of the treatments over the r
replications, arranged in the form of a table.
SSE = TSS − SSR − SSA − SSB − SSAB
Source        Degrees of          Sum of     Mean sum of    F-Ratio
              freedom             squares    squares
Blocks        r − 1               SSR        MSR            MSR/MSE
Treatments:
  A           p − 1               SSA        MSA            MSA/MSE
  B           q − 1               SSB        MSB            MSB/MSE
  AB          (p − 1)(q − 1)      SSAB       MSAB           MSAB/MSE
Error         (pq − 1)(r − 1)     SSE        MSE
Total         pqr − 1
Suppose that we have a factorial RBD with three factors A, B and C having p, q and s
levels respectively. There will be pqs treatment combinations. The ANOVA model for
this experiment is the two-factor model above extended with terms for C and its
interactions. The terms (ab), (ac), (bc) and (abc) are interaction effects; the other
terms have the usual meanings. The degrees of freedom associated with a factor equal
its number of levels minus one.
For an interaction, the degrees of freedom are the product of the degrees of freedom
of the individual factors in that interaction.
Source        Degrees of          Sum of
              freedom             squares
Blocks        r − 1               SSR
Treatments:
  A           p − 1               SSA
  B           q − 1               SSB
  C           s − 1               SSC
  AB          (p − 1)(q − 1)      SSAB
  ⋮            ⋮                   ⋮
Error         (pqs − 1)(r − 1)    SSE
Total         pqsr − 1
The general formula for SE(d) of X = √(2MSE/(r·D)), where X is the main factor or
interaction, D is the product of the levels of the factors left out of X, and r is the
number of replications. For example,
SE(d) for A = √(2MSE/(r·q·s)),
SE(d) for AB = √(2MSE/(r·s)),
SE(d) for ABC = √(2MSE/r).
Confounding
In factorial experiments, when the number of factors or the levels of the factors are
increased, the number of treatment combinations increases rapidly. For example, in a
2⁴ factorial experiment there are 16 treatment combinations, so large blocks have to
be used and it will be difficult to ensure homogeneity within the blocks. In such
situations, we use an incomplete factorial that investigates the main effects of the
factors and the more important interactions under uniform conditions by suitably
subdividing the experimental material into smaller homogeneous blocks. The
heterogeneity of blocks is allowed to affect only interactions which are likely to be
unimportant. The process by which unimportant comparisons are deliberately
confused, or mixed up, with block comparisons for the purpose of assessing the more
important comparisons with greater precision is called confounding. Confounding may
also be defined as a technique for arranging a complete factorial experiment in blocks
where the block size is smaller than the number of treatment combinations in one
replicate.
Replication 1
Block 1 Block 2
a ab
b (1)
For a 2³ factorial experiment, suppose that each replicate is divided into two blocks of
four units each and the interaction ABC is confounded. The interaction ABC is
estimated from
abc + a + b + c − ab − ac − bc − (1).
The two sets are therefore {abc, a, b, c} and {ab, ac, bc, (1)}.
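The split into the two blocks follows from the sign of each combination in the ABC contrast; a sketch:

```python
def blocks_confounding_abc():
    """Split the 2^3 combinations into two blocks by confounding ABC:
    a combination's sign in the ABC contrast is the product, over the
    letters a, b, c, of +1 if the letter is present and -1 if absent."""
    combos = ["(1)", "a", "b", "c", "ab", "ac", "bc", "abc"]
    plus, minus = [], []
    for combo in combos:
        sign = 1
        for letter in "abc":
            sign *= 1 if letter in combo else -1
        (plus if sign > 0 else minus).append(combo)
    return plus, minus
```

The '+' block is {abc, a, b, c} and the '−' block is {(1), ab, ac, bc}, matching the two sets above.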
If a treatment effect is confounded in some replications and unconfounded in others,
the system is known as partial confounding. For example, consider a 2³ factorial
experiment with three replications; the arrangement of treatment combinations may be
as follows.
The analysis of confounded factorial experiments involves the same principles as any
other factorial analysis. In the sources of variation, the component ‘blocks within
replications’ is added. Let r be the number of replications, b = 2ⁿ⁻ᵖ the number of
blocks in a replication, and t = 2ᵖ the number of treatment combinations in a block.
Replication sum of squares = Σᵢ Rᵢ²/2ⁿ − CF

Block sum of squares = Σᵢ₌₁ʳ Σⱼ₌₁ᵇ Bᵢⱼ²/t − CF

Block within replication sum of squares = Block sum of squares − Replication sum of
squares
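These sums of squares can be computed directly from the block totals. A minimal sketch with hypothetical totals (the values of B are made up for illustration):

```python
# Replication, block, and block-within-replication sums of squares for a
# confounded 2^n factorial with r replications and b blocks of t units each.
import numpy as np

n = 3                      # a 2^3 factorial
r, b, t = 2, 2, 4          # replications, blocks per replication, units per block
# hypothetical block totals B_ij, one row per replication
B = np.array([[30.0, 26.0],
              [34.0, 28.0]])
R = B.sum(axis=1)          # replication totals R_i
G = B.sum()                # grand total
CF = G**2 / (r * 2**n)     # correction factor

ss_rep = (R**2).sum() / 2**n - CF
ss_block = (B**2).sum() / t - CF
ss_block_within_rep = ss_block - ss_rep
```

With these totals, ss_rep = 2.25, ss_block = 8.75 and ss_block_within_rep = 6.5.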
UNIT IV
Split-Plot Design
This design enables the effects of the sub plot treatments, and the interaction of the
whole plot treatments with the sub plot treatments, to be tested more efficiently than the
main effects of the main plot treatments. That is, the effects of the main plot treatments
are estimated with lower precision, while the sub plot treatment and interaction effects
are estimated with higher precision.
The model for the split-plot design is
yijk = μ + ri + mj + eij + sk + (ms)jk + eijk
where
yijk = the observation of the ith replication, jth main plot and kth sub plot,
μ = overall mean,
ri = ith replication effect,
mj = jth main plot treatment effect,
eij = main plot error, or error (a),
sk = kth sub plot treatment effect,
(ms)jk = interaction effect,
eijk = error component for sub plot and interaction, or error (b).
The ANOVA will have two parts which correspond to the main plots and sub
plots. For the main plot analysis, replication X main plot treatment table is formed.
From this two way table, sum of squares for replication, main plot treatment & error (a)
are computed.
Replication        Main plot treatments
                1       2      ...     m        Total
    1         y11.    y12.    ...    y1m.     y1.. = R1
    2         y21.    y22.    ...    y2m.     y2.. = R2
    .           .       .              .          .
    .           .       .              .          .
    r         yr1.    yr2.    ...    yrm.     yr.. = Rr
Replication sum of squares (SSR) = Σᵢ yᵢ..²/(sm) − CF = Σᵢ Rᵢ²/(sm) − CF

Main plot treatment sum of squares (SSM) = Σⱼ y.ⱼ.²/(rs) − CF = Σⱼ Mⱼ²/(rs) − CF

Main plot error sum of squares, SSE(a) = Total sum of squares for the M × R table −
Replication sum of squares − Main plot treatment sum of squares
For the analysis of sub plot treatments, main plot x sub plot treatment table is formed
Main plot                 Sub plot
                1       2      ...     s        Total
    1         y.11    y.12    ...    y.1s      y.1.
    2         y.21    y.22    ...    y.2s      y.2.
    .           .       .              .         .
    m         y.m1    y.m2    ...    y.ms      y.m.
  Total       y..1    y..2    ...    y..s
              (S1)    (S2)    ...    (Ss)
From the table, sum of squares for subplot treatments and interaction between main plot
and sub plot treatments are computed. Error (b) sum of square is found out by residual
method.
Total sum of squares for the M × S table = Σⱼₖ y.ⱼₖ²/r − CF

Sum of squares due to sub plots, SSS = Σₖ Sₖ²/(mr) − CF

Sum of squares due to interaction, SS(MS) = Total sum of squares (M × S) − Main plot
sum of squares − Sum of squares due to sub plots
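The whole split-plot decomposition can be sketched numerically. This is a minimal illustration with simulated data (the data, and the variable names, are ours), following the two-table procedure described above:

```python
# Sums of squares for a split-plot design with r replications,
# m main plot treatments and s sub plot treatments.
import numpy as np

rng = np.random.default_rng(0)
r, m, s = 3, 2, 2
y = rng.normal(10.0, 1.0, size=(r, m, s))        # y[i, j, k]

G = y.sum()
CF = G**2 / (r * m * s)
ss_total = (y**2).sum() - CF

# main plot analysis: replication x main plot table of totals y_ij.
RM = y.sum(axis=2)
ss_rm_table = (RM**2).sum() / s - CF
ss_rep = (y.sum(axis=(1, 2))**2).sum() / (s * m) - CF
ss_main = (y.sum(axis=(0, 2))**2).sum() / (r * s) - CF
ss_err_a = ss_rm_table - ss_rep - ss_main        # error (a)

# sub plot analysis: main plot x sub plot table of totals y_.jk
MS = y.sum(axis=0)
ss_ms_table = (MS**2).sum() / r - CF
ss_sub = (y.sum(axis=(0, 1))**2).sum() / (m * r) - CF
ss_int = ss_ms_table - ss_main - ss_sub          # M x S interaction

# error (b) by the residual method
ss_err_b = ss_total - ss_rm_table - ss_sub - ss_int
```

All six components add back to the total sum of squares, which is a useful check on the arithmetic.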
The ANOVA table for a split-plot design in randomized blocks is given in the
following table.

Source                  df                SS        MS
Replication             r − 1             SSR       MSR
Main plot               m − 1             SSM       MSM
Error (a)               (r − 1)(m − 1)    SSE(a)    MSE(a)
Sub plot                s − 1             SSS       MSS
Main plot × sub plot    (m − 1)(s − 1)    SS(MS)    MS(MS)
Error (b)               m(r − 1)(s − 1)   SSE(b)    MSE(b)
Total                   rms − 1

For comparing two main plot treatment means,
CD = t[(r−1)(m−1)] √(2MSE(a)/(rs))
For comparing two sub plot treatment means,
CD = t[m(r−1)(s−1)] √(2MSE(b)/(rm))
For comparing two sub plot treatment means at a given main plot treatment,
CD = t[m(r−1)(s−1)] √(2MSE(b)/r)
For comparing two main plot treatment means either at a given sub plot treatment or at
different sub plot treatments, the standard error involves both error mean squares,
SE(d) = √(2[(s−1)MSE(b) + MSE(a)]/(rs)),
and the corresponding t value is a weighted average of the t values for the error (a) and
error (b) degrees of freedom.
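The standard errors behind these critical differences can be computed directly; CD is then the tabulated t at the error degrees of freedom times SE(d). A minimal sketch with hypothetical mean squares (the numbers are made up for illustration):

```python
# Standard errors of a difference for a split-plot design.
import math

r, m, s = 4, 3, 4
mse_a, mse_b = 2.5, 1.2   # hypothetical error (a) and error (b) mean squares

se_main = math.sqrt(2 * mse_a / (r * s))       # two main plot means; df = (r-1)(m-1)
se_sub = math.sqrt(2 * mse_b / (r * m))        # two sub plot means; df = m(r-1)(s-1)
se_sub_within = math.sqrt(2 * mse_b / r)       # two sub plot means at one main plot
# CD = t(df) * SE(d), with t taken from tables at the df noted above
```

Note that comparisons within a main plot use a larger SE than overall sub plot comparisons, because fewer observations enter each mean.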
Strip-plot design
If two factors are involved and if both the factors require large plot sizes, it is difficult
to carry out the experiment in a split plot design. In some other situations a higher
precision may be required for the interaction than the precision for the two factors. The
strip-plot design is suitable for such experiments. It is also known as split-block design.
In strip-plot design each block is divided in to number of vertical and horizontal strips
depending on the levels of the respective factors. The vertical strip treatments are laid
out either in randomized blocks or in Latin square. The intersection of plots provides
information on the interaction of the two factors.
Replication 1: vertical strips a0, a2, a1, a3; horizontal strips b1, b0, b2
Replication 2: vertical strips a2, a0, a3, a1; horizontal strips b1, b2, b0
For example, for factors like spacing and ploughing, a block may be divided into strips
in one direction, to be allotted to one set of treatments, say different spacings, and into
another set of strips, at right angles to the first, to be allotted to the second set of
treatments, say ploughings. The allotment of the treatments to the strips at each stage has
to be made at random. The analysis of a strip-plot design is carried out in three parts. The
first part is the vertical strip analysis, the second part is the horizontal strip analysis,
and the third is the interaction analysis. Suppose that A and B are the vertical strip and
horizontal strip factors, respectively. The data are rearranged into an A × Replication
table, a B × Replication table and an A × B table. From the A × Replication table, the
sums of squares for replication, A and error (a) are computed. From the B × Replication
table, the sums of squares for B and error (b) are computed; and from the A × B table,
the A × B interaction sum of squares is computed. The ANOVA table is formed from
these results.
SE(d) for A = √(2MSE(a)/(rb))

SE(d) for B = √(2MSE(b)/(ra))
Incomplete Block Design (IBD)
Balanced IBD
When the number of times (λ) that every pair of treatments occurs together in a design
is the same, the design is known as a BIBD. The design ensures equal precision of the
estimates of all pairs of treatment effects.
The quantities t, b, r, k and λ are called the parameters of a BIBD. The necessary
relationships between the parameters of a BIBD are
(i) rt = bk
(ii) λ(t − 1) = r(k − 1)
(iii) b ≥ t; r > λ
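These relations are necessary (though not sufficient) for a BIBD to exist, so they give a quick check on candidate parameter sets. A minimal sketch (the function name is ours):

```python
# Check the necessary parameter relations of a BIBD.
def is_valid_bibd(t, b, r, k, lam):
    """Necessary (not sufficient) conditions: rt = bk, lam(t-1) = r(k-1),
    b >= t and r > lam."""
    return (r * t == b * k
            and lam * (t - 1) == r * (k - 1)
            and b >= t
            and r > lam)

# the classical (t, b, r, k, lam) = (7, 7, 3, 3, 1) design satisfies all three
print(is_valid_bibd(7, 7, 3, 3, 1))   # True
print(is_valid_bibd(7, 7, 3, 3, 2))   # False: lam(t-1) = 12 but r(k-1) = 6
```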
Lattice design
A balance two dimensional (design) with k2 treatments having one restriction is called
simple lattice. Also in this design, the treatments should be assigned to each block in
such a manner that is same for all pairs of treatments. For a design to be balanced,
minimum required number of blocks is k(k+1). Thus, at least k+1 replications are
needed for a k2 simple lattice. This property of having separate replications for BIBD
holds only when t is a multiple of k and especially for lattice square design. Most of the
BIB design do not hold this property. In general, in an m-dimensional balanced lattice
design, the number of treatments is km, where k is a prime number or prime power.
Response surface methodology
Y=f(N, P, K)+e
is the required model relating the observed response to the levels of input factors. In
general, if we have r input factors (variables) X1, X2,…..Xr, the model can be written as
Y=f(X1, X2,…..Xr)+e
There are two uses of response surface. First, it has been applied to describe how the
response is affected by a number of quantitative variables over some already chosen
levels of interest. The second use is to locate the neighborhood of maximal or minimal
response. In agricultural experiments, it is often of interest to determine the optimum
level of a factor, or combination of factors, that will maximize the yield. When the
response is the cost of production per unit of output, the objective may be to minimize
the response.
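Locating an optimum from a fitted surface can be sketched for a single input factor. This is a minimal illustration with hypothetical yield data (the levels and responses are made up): a second-order polynomial is fitted by least squares and the stationary point gives the estimated optimum level.

```python
# Fit a second-order response surface in one input factor and locate
# the level that maximizes the predicted response.
import numpy as np

x = np.array([0.0, 25.0, 50.0, 75.0, 100.0])   # e.g. levels of N (kg/ha)
y = np.array([12.0, 20.0, 24.0, 23.0, 18.0])   # hypothetical yields

# design matrix for y = b0 + b1*x + b2*x^2 + e
X = np.column_stack([np.ones_like(x), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

x_opt = -b1 / (2 * b2)    # stationary point; a maximum when b2 < 0
```

With several input factors the same idea applies, with cross-product terms added to the model and the stationary point found from the fitted coefficients.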
Cross-over designs are used in situations in which treatments are applied in sequence
over several periods to a group of individuals, so the number of experimental
units may be less than the number of observations. The design has been used for
comparing two or four treatments in dairy husbandry and other biological studies. The
cross-over design has two restrictions imposed on the randomization of the treatments
to the experimental units. The first restriction is that all the treatments are included in
each replicate or group, the experimental units being rotated with regard to the time of
application in each replicate or group. The second restriction is that each treatment must
be applied an equal number of times in each period in the replicates.
For example, suppose that we have to compare the effect of two feeding rations, A and
B, on the amount and quality of milk produced by cows. Since cows vary greatly in their
milk production, each ration is tested on every cow by feeding it in either the first or the
second half of the period of lactation, so that each cow gives a separate replicate. The
rations are allotted to the periods at random, with the restriction that half of the cows
receive ration A and the other half receive ration B in period 1; in period 2 the rations
are interchanged, so cows that received A receive B, and vice versa.
The experimental design for the six replicates (six cows) is of the following form:

             Cows or Replications
             1    2    3    4    5    6
Period I     B    B    A    A    B    A
Period II    A    A    B    B    A    B
If the above design were applied to an experimental situation which requires a separate
experimental unit for each replicate, the analysis would be the same as given above. For
example, suppose that two treatments A and B are applied to dairy cows, that a single
treatment period is used, that twelve cows are grouped into 6 pairs with each member of
a pair being rated as superior or inferior, and that one half of the superior and one half
of the inferior cows receive treatment B. The experimental design might be of the
following form:

             Cows or Replications
             1    2    3    4    5    6
Superior     B    B    A    A    B    A
Inferior     A    A    B    B    A    B
The cross over design may be used for any number of treatments with the condition that
the number of replicates must be a multiple of the number of treatments.
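Generating a randomized layout that respects the two cross-over restrictions is straightforward. A minimal sketch for the two-treatment, six-cow example above (the allocation produced is illustrative, not the one in the text):

```python
# Randomly allot two treatments A and B to six cows under the cross-over
# restrictions: both treatments appear for each cow (column), and each
# treatment is applied an equal number of times in each period (row).
import random

random.seed(1)
n_reps = 6
# half the cows get the sequence (A, B), the other half (B, A)
sequences = ["AB"] * (n_reps // 2) + ["BA"] * (n_reps // 2)
random.shuffle(sequences)

period1 = [seq[0] for seq in sequences]   # row for period I
period2 = [seq[1] for seq in sequences]   # row for period II
```

For t treatments the same idea applies with Latin-square-based sequences, the number of replicates being a multiple of t.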
TRANSFORMATIONS
3. The errors are distributed normally with mean zero and common variance σ².
When the above assumptions of ANOVA are violated, we have to transform the data.
Whenever the standard deviations of samples are roughly proportional to the means, an
effective transformation may be a log transformation. Frequency distributions skewed
to the right are often made more symmetrical by transformation to a logarithmic scale.
While logarithms to any base can be used, common logarithms (base 10) or natural
logarithms (base e) are generally the most convenient. The presence of multiplicative
effects and a rough proportionality between standard deviations and means suggest that
a logarithmic transformation may be appropriate. For example, a log transformation is
often appropriate when the dependent variable is a concentration. This cannot be less
than zero, and may have several moderately high observations, but may have a small
number of very high values. Taking logs (one can be added to each observation, if some
are zero) often normalizes the data.
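The log transformation with the "add one" guard against zeros can be sketched as follows (the concentration values are hypothetical):

```python
# Log transformation of right-skewed data; log10(y + 1) guards against zeros.
import numpy as np

y = np.array([0.0, 2.0, 3.0, 8.0, 20.0, 150.0])   # hypothetical concentrations
z = np.log10(y + 1.0)
```

The transformed values compress the few very large observations while leaving the order of the data unchanged.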
Whenever the response variable is a count of relatively rare events (e.g. insect counts
on a leaf, blood cells within a gridded region of a haematocytometer, etc.) the data tend
to follow a special distribution called a Poisson distribution. In such situations a square
root transformation is used. It is better to use √(y + 0.5) instead of √y. If there are
negative values in the data, add an appropriate constant throughout to make them
positive. For example, counts such as the numbers of cells in a haemocytometer square
can sometimes produce data which can be analysed by the ANOVA. If the mean count
is low, say less than about five, then the data may have a Poisson distribution. This can
be transformed by taking the square root of the observations.
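The recommended form of the square root transformation is a one-liner (the counts are hypothetical):

```python
# Square root transformation for small Poisson-type counts, using
# sqrt(y + 0.5) as recommended for low mean counts.
import numpy as np

counts = np.array([0, 1, 2, 4, 6, 9])   # hypothetical counts per square
z = np.sqrt(counts + 0.5)
```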
Another kind of data that may require transformation is that based on counts
expressed as percentages or proportions of the total sample. Such data generally
exhibit what is called a binomial distribution rather than a normal distribution. One of
the characteristics of such a distribution is that the variances are related to the means.
In such situations we use the arcsine (angular) transformation of the square root of the
proportion or percentage, i.e., sin⁻¹(√p). It is used to stabilize the variance when the
observed proportions are in the range of 0 to 30% or 70 to 100%. When the data contain
0 or 1, the transformation is improved by replacing 0 by 1/(4n) and 1 by 1 − 1/(4n)
before taking angular values, where n is the number of observations on which p is
estimated for each group.
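The angular transformation with the 1/(4n) adjustment for observed 0s and 1s can be sketched as follows (the proportions and n are hypothetical):

```python
# Angular (arcsine-square-root) transformation of proportions, with the
# 1/(4n) adjustment for observed 0s and 1s.
import numpy as np

n = 20                                    # observations behind each proportion
p = np.array([0.0, 0.05, 0.25, 0.90, 1.0])
p_adj = np.where(p == 0.0, 1.0 / (4 * n),
                 np.where(p == 1.0, 1.0 - 1.0 / (4 * n), p))
angles = np.degrees(np.arcsin(np.sqrt(p_adj)))   # angles in degrees, 0 to 90
```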
A logit transformation, logₑ(p/(1 − p)), where p is the proportion, will often correct
percentages or proportions in which there are many observations less than 0.2 or greater
than 0.8 (assuming the proportions cannot be < 0 or > 1).
When the treatment standard deviations are proportional to the square of the means, the
appropriate transformation is the reciprocal, x to 1/x. It is mostly used when the
observations are measurements of time.
When a transformation has been made, the analysis is carried out with the
transformed data, and the conclusions are drawn from that analysis. However, while
presenting the results, the means and standard errors are transformed back into the
original units. While transforming back to the original units, some corrections have to
be made. In the case of log-transformed data, if the mean value is X̄, the mean value in
the original units will be Ȳ = antilog(X̄ + 1.15 V(X̄)), where V(X̄) is the variance of
the mean X̄.
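The corrected back-transformation can be sketched for base-10 logs (the data are hypothetical; `naive` is our label for the uncorrected antilog):

```python
# Back-transform the mean of log10-transformed data to original units
# with the correction antilog(xbar + 1.15 * V(xbar)).
import numpy as np

y = np.array([3.0, 8.0, 21.0, 55.0, 140.0])   # hypothetical original data
x = np.log10(y)
xbar = x.mean()
v_xbar = x.var(ddof=1) / len(x)               # variance of the mean, V(xbar)

y_back = 10 ** (xbar + 1.15 * v_xbar)         # corrected back-transformed mean
naive = 10 ** xbar                            # uncorrected antilog, for comparison
```

The correction raises the back-transformed mean slightly, compensating for the fact that the antilog of a mean of logs underestimates the arithmetic mean.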