AgStat 2.22019 Mannula PDF
PRACTICAL MANUAL
AG. STAT. 2. 2
STATISTICAL METHODS
SECOND SEMESTER B.Sc.(Agri.) CLASS
Name :
University Seat No. :
Registration No. :
Prepared By :
Dr. Alok Shrivastava, Dr. Y A Garde Dr. H. R. Pandya
DEPARTMENT OF AGRICULTURAL STATISTICS
COLLEGE OF AGRICULTURE
Bharuch
CERTIFICATE
Date : - - 201
Publisher:
Principal and Dean
College of Agriculture
Navsari Agricultural University
Campus Bharuch
Bharuch - 392012
Edition: FIRST
Year : 2018
FOREWORD
Uncertainty and variation are two major components that govern the laws of
nature. Because of its analytical power in such situations, statistical science has
been widely used in diverse fields to analyse behaviour and to increase the precision of
findings. There is hardly any branch of science where statistical methods are not in use.
Agriculture and related fields have led to the development and discovery of many
statistical theories that increase the precision of inference.
The Manual on Statistical Methods is intended to be a source of reference.
Undergraduate students, especially of this college, will benefit from it in the fields of
Agriculture, Livestock, Horticulture, Forestry and other allied disciplines, as will researchers
and extension workers who need some basic concepts of Statistics before going into the
matter more deeply. Though it is a challenging task, I am happy that Dr. Alok Shrivastava
has taken steps in this direction.
As a matter of fact, the explanations, examples and inferences drawn in each and
every chapter of this Practical Manual/Notes will help the user greatly. I hope this
collection will draw the attention of students, researchers and other users in diverse
agricultural and allied fields as a guide to solving a variety of real-life problems.
(K.G.Patel )
(As per 5th Dean Committee Recommendation)
: SYLLABUS:
Theory:
5 Problems on Normal
Distribution
6 Problems on Large
Sample Test (Z-test)
7 Problems on Small
Sample Test (t-test)
8 Problem on F-test
9 Problem on χ² test
10 Problems on Correlation
and Regression
11 Problems on Rank
Correlation
12 Completely
Randomized Design
(CRD)
13 Randomized Block
Design (RBD)
14 Latin Square Design
(LSD)
15 Simple Random
Sampling with and
without replacement
16 APPENDICES
Statistical Table
**************
DEPARTMENT OF AGRICULTURAL STATISTICS
COLLEGE OF AGRICULTURE, BHARUCH
Some Preliminary
[A single death is a tragedy. A million deaths is a statistic. @ Joseph Stalin]
------------------------------------------------------------------------------------------------------------
Introduction
The term “statistics” is used in two senses: first, in the plural sense, meaning a
collection of numerical facts or estimates, the figures themselves. It is in this sense
that the public usually thinks of statistics, e.g., figures relating to population, profits
of different units in an industry etc.
Secondly, as a singular noun, the term ‘statistics’ denotes the various methods
adopted for the collection, analysis and interpretation of the facts numerically
represented. In singular sense, the term ‘statistics’ is better described as statistical
methods. In our study of the subject, we shall be more concerned with the second
meaning of the word ‘statistics’.
Definition
Statistics has been defined differently by different authors and each author has
assigned new limits to the field which should be included in its scope. We can do no
better than give selected definitions of statistics by some authors and then come to
the conclusion about the scope of the subject.
A.L. Bowley defines, “Statistics may be called the science of counting”. At
another place he defines, “Statistics may be called the science of averages”. Both
these definitions are narrow and throw light only on one aspect of Statistics.
According to King, “The science of statistics is the method of judging
collective, natural or social phenomena from the results obtained from the analysis
or enumeration or collection of estimates.”
Frequency Distribution & Frequency Table
Types of data: There are two types of data (a) Primary data (b) Secondary data
Collection of primary data: Primary data are collected through following methods
(i) Direct Personal Investigation
(ii) Indirect oral investigation
(iii)Information through correspondents
(iv) Information through schedules to be filled in by informants
(v) Information through Schedules in Charge of Enumerators
Classification: The process of arranging data in groups or classes according to resemblances
and similarities is called classification.
Objectives of classification: (i) To make data easy to handle and comprehend (ii) To bring
out similarities and dissimilarities (iii) To support comparison (iv) To make a scientifically
reasonable arrangement (v) To provide a basis for tabulation.
Types of Classification: (a) Qualitative (b) Quantitative
Tally mark: A bar (|) put against the class/variable for each occurrence is called a tally mark.
Frequency: Total number of tally marks put against a particular class/ variable is called its
frequency.
Frequency distribution: Arrangement of classes of values and their frequencies in a
systematic manner is called frequency distribution.
Frequency distribution table: A table showing the distribution of the frequencies in the
different classes is called a frequency distribution table.
Types of class:
(a) Inclusive classes __ In which both the upper and lower limits are included, and
(b) Exclusive classes __ In which upper limit of the class is not included in the class.
------------------------------------------------------------------------------------------------
Classification and tabulation of data: The process of reduction of data to a manageable
size is called classification OR The process by which the data are arranged in groups or
classes according to similarities is known as classification and the process by which the
classified data are presented in an orderly manner by being placed in proper rows and
columns of a table in order to bring out their essential features or characteristics is known as
tabulation.
Objectives of classification:
1. To reduce data in groups/classes according to similarity.
2. To facilitate comparison through statistical analysis.
3. To point out most significant features of the data at a glance.
4. To give importance to a particular item by dropping out the unnecessary elements.
5. To enable a statistical treatment of the material collected.
Types of classification:
1. Geographical: When the classification is made on area basis e.g. district, taluka, city.
2. Chronological: When the classification is made on the basis of time e.g. production of
wheat in past 10 years.
3. Qualitative: When classification is made on the basis of some attributes. This
classification is further divided into four types.
(A) Simple classification: Only one attribute is considered e.g. blindness or sex.
(B) Two way classification: Two attributes are considered e.g. blindness & deafness,
colour & shape of flowers.
(C) Three way classification: Three attributes are considered e.g. sex, education level
and residing location.
(D) Manifold classification: More than three attributes are considered.
e.g. Classification of a population:
One way (Sex)    Two way (Sex & Marital status)    Three way (Sex, Marital status & Education level)
Male             Married                           High / Medium / Low
Male             Unmarried                         High / Medium / Low
Female           Married                           High / Medium / Low
Female           Unmarried                         High / Medium / Low
4. Quantitative: When the classification is made in the form of magnitude e.g. cows are
classified according to milk yield. This classification is further divided into two types.
(A) Discrete classification: Specific value in the range is considered e.g.
no. of petal, no. of insects etc.
(B) Continuous classification: Any value in the range of variation is
considered e.g. length, width etc.
FREQUENCY DISTRIBUTION
Objectives:
1. To condense the mass of data in such a manner that similarities and dissimilarities can be
easily understood.
2. To enable statistical treatment of the data collected.
Frequency: The number of individual items occurring in each class is termed its frequency.
Frequency distribution: The manner in which the frequencies are distributed over the
different class is called frequency distribution of the character under study and the table
indicating frequency distribution is called frequency table.
Class limit: It is the lowest and highest values of the distribution that can be included in the
class e.g 10-20, 20-30 etc. Two boundaries of a class are known as the lower limit and upper
limit of a class.
Class interval: The width of a class that is the difference of upper and lower limit of the
class is known as class interval.
Class mid point: It is the value lying half way between the lower limit (LL) and upper limit
(UL) of a class interval i.e. (LL + UL)/2.
Points while deciding class interval/classes:
1. It should be of uniform width which facilitates the statistical computation.
2. Range of the class should cover the data and should be continuous.
3. It should be convenient to make the mid-point of a class.
4. It should not be over lapping.
Types of frequency distribution:
(1) Discrete frequency distribution
(2) Continuous frequency distribution
Methods of classifying the data according to class interval:
Exclusive method: When the class intervals are so fixed that the upper limit of one class is
the lower limit of the next class, the method is known as the exclusive method, e.g. 0-10, 10-20.
Usually this method is preferred for continuous type of data. Data observed up to 9.99
would be included in the 0-10 class, while 10 or greater would be included in the 10-20 class.
Inclusive method: In this method of classification, the upper limit of one class is included in
that class itself e.g. 100-199, 200-299. The value of 100 and 199 will be included in the class
of 100-199. This method is preferred for discrete type of data.
Procedure to form frequency distribution:
Step 1: Find range of the data. Range = Highest value – Lowest value.
Step 2: Fix the number of classes. The number of classes should preferably be between 5 and 15;
it should not be less than 5 or more than 30.
Approximate no. of classes = K = 1 + 3.322 log10 N (Sturges' rule), where N = no. of
observations under study.
Step 3: Fix the class interval = CI = Range/No. of classes or (L-S)/K where L = largest
value and S = smallest value
Step 4: Arrange different classes in ascending order of magnitude
Step 5: Pick up the values of observation and make tally mark against respective classes.
Step 6: Find total tally mark of each class which will give the no. of frequencies in the
respective classes.
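Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration, not a prescribed routine: the function name `frequency_table`, the choice to round Sturges' K up to the next whole number, and the rule that the largest observation is placed in the last class are all assumptions made for this sketch. The ten values are the milk yields of exercise 1.1.

```python
import math

def frequency_table(values, k=None):
    """Build an exclusive-class frequency table following Steps 1 to 6."""
    n = len(values)
    smallest, largest = min(values), max(values)
    data_range = largest - smallest                 # Step 1: range
    if k is None:
        # Step 2: Sturges' rule, rounded up (assumption of this sketch)
        k = math.ceil(1 + 3.322 * math.log10(n))
    width = data_range / k                          # Step 3: class interval
    # Step 4: classes in ascending order of magnitude
    classes = [(smallest + i * width, smallest + (i + 1) * width) for i in range(k)]
    counts = [0] * k
    for x in values:                                # Steps 5 and 6: tally and count
        index = min(int((x - smallest) / width), k - 1)  # largest value -> last class
        counts[index] += 1
    return classes, counts

milk_yield = [5.5, 6.7, 8.9, 9.6, 12.5, 12.0, 13.7, 14.5, 10.8, 9.6]
classes, counts = frequency_table(milk_yield)
for (low, high), f in zip(classes, counts):
    print(f"{low:5.1f} - {high:5.1f}   {f}")
```

In practice the class limits are usually rounded to convenient values (for example 5-7, 7-9, ...) rather than taken straight from the computed width.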
1.1 The following data represent the milk yield in litres/day of Surti buffaloes.
5.5, 6.7, 8.9, 9.6, 12.5, 12.0, 13.7, 14.5, 10.8, 9.6 (ungrouped data)
1.2 Find out the frequency distribution table from the following data.
4,7,6,5,4,2,10,1,3,6,4,3,5,7,8,3,4,5,2,6,4,7,5,4,1,6,2,5,4,10,4,5,2,7,4,5,4,8,5,4,7
,9,5,6,5,9,6,5,7,0,8,1,5,6,3,9,3,1,5,3,7,8,5,3,0,6,8,4,3,2 (Discrete frequency
distribution)
Marks (X) Tally marks Frequency (fi)
0 ││ 2
1 ││││ 4
2 ││││ 5
3 ││││ │││ 8
4 ││││ ││││ ││ 12
5 ││││ ││││ ││││ 14
6 ││││ │││ 8
7 ││││ ││ 7
8 ││││ 5
9 │││ 3
10 ││ 2
Total 70
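The discrete frequency table above can be cross-checked with a short Python sketch. It uses only the standard-library `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

marks = [
    4, 7, 6, 5, 4, 2, 10, 1, 3, 6, 4, 3, 5, 7, 8, 3, 4, 5, 2, 6, 4, 7, 5, 4,
    1, 6, 2, 5, 4, 10, 4, 5, 2, 7, 4, 5, 4, 8, 5, 4, 7,
    9, 5, 6, 5, 9, 6, 5, 7, 0, 8, 1, 5, 6, 3, 9, 3, 1, 5, 3, 7, 8, 5, 3, 0,
    6, 8, 4, 3, 2,
]

frequency = Counter(marks)          # one tally per occurrence of each mark
for value in sorted(frequency):
    tally = "|" * frequency[value]  # plain bars, no five-bar grouping
    print(f"{value:>5}  {tally:<15} {frequency[value]:>3}")
print("Total", sum(frequency.values()))
```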
1.3 The following data represent the fat yield in kg/lactation of buffaloes of Mehsana
breed. Prepare the frequency distribution table by using the 160-170, 170-180,
etc. class intervals. (Continuous frequency distribution)
295 199 195 192 209 197 200 189 177 195
169 205 202 204 165 206 207 189 201 203
187 208 191 203 226 212 172 182 207 217
213 221 214 222 244 221 180 229 215 219
216 223 231 253 225 237 225 230 227 228
218 234 240 260 267 242 243 232 236 279
251 252 261 268 246 284 283 257 233 162
262 178 247 273 239 259 231 173 245 266
270 175 212 211 214 196 215 243 216 244
274 194 225 223 224 226 228 255 227 198
248 218 233 234 235 238 242 265 258 211
252 235 249 254 255 241 256 271 264 272
224 286 253 263 264 254 269 285 282 281
210 298 188 175 276 277 278 289 287 288
182 170 299 208 181 193 271 205 221 291
Some Preliminary
[ I could prove God statistically ~ George Gallup]
------------------------------------------------------------------------------------------------------------
Objective: To Compute Arithmetic Mean for Grouped and Un-Grouped data.
Arithmetic Mean: The arithmetic mean of a set of observations is the quantity obtained by dividing the
sum of the values of the observations by their number.
The arithmetic mean of $n$ observations $X_1, X_2, \ldots, X_n$ is given by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
In case of grouped data, $X_i$ is taken as the mid value of the corresponding class.
In a frequency table, if the class intervals are of equal width, say $I$, then it is convenient to use this class
width as a divisor to make the calculations simpler.
(i) Select an assumed mean $A$ and take the deviations of the given values from $A$. Now divide the
deviations by an arbitrary point (preferably the class width $I$) to get $d_{xi} = (X_i - A)/I$.
(ii) Multiply each $d_{xi}$ by the corresponding frequency $f_i$.
(iii) Add the products and divide the sum by the total frequency $n$.
(iv) Multiply by the divisor $I$ and add it to the assumed mean $A$. The resulting value is the required
mean.
This result shows that if each score X is multiplied by a number $c$, the net effect of this operation is to
multiply the mean by $c$. This process is known as step-deviation method.
Combined mean: If $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ are the means of $k$ distributions whose corresponding sizes are
$n_1, n_2, \ldots, n_k$, then the mean of the combined distribution obtained on combining the distributions is given
by
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$$
Properties of mean: (1) The sum of deviations from the mean is always zero. (2) The sum of squares of
deviations from the mean is least. (3) The mean is not independent of change of origin and scale.
Central tendency: Generally it is found that in any distribution, values of
the variable tend to cluster around a central value or centrally located
observation of the distribution. This characteristic is known as central tendency.
This centrally located value which represents the group of values is termed as the
measure of central tendency e.g. an average is called measure of central tendency.
Objectives
1) To get one single value that describes the characteristics of the entire
series/group.
2) To compare two or more distributions.
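Ungrouped data
(i) Direct method :
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
(ii) Assumed mean method :
$$\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}, \qquad d_i = X_i - A \quad (A = \text{Assumed mean})$$
Grouped data
(i) Direct method :
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
(ii) Assumed mean method :
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n}, \qquad d_i = X_i - A, \quad A = \text{Assumed mean}$$
(iii) Step deviation method
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_{xi}}{n} \times I$$
where, $d_{xi} = \dfrac{X_i - A}{I}$, A = Assumed mean, I = Class interval,
Xi = Class mid value.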
EXERCISE NO. 2
Problems On Measures Of Central Tendency
------------------------------------------------------------------------------------------------
2.1 Work out the mean from the following information.
5.5, 6.7, 8.9, 9.6, 12.5, 12.0, 13.7, 14.5, 10.8, 9.6
2.2 Calculate mean from the following distribution.
Marks No of Xi Ci fixi
Students (fi)
00-20 10
20-40 22
40-60 36
60-80 f4
80-100 f5
Total 100
2.4 Work out the mean value from the following information.
Sr. No. Course No. Credits (wi) Grade Point (Xi) wiXi
1 Eng. 1.2 2 5.4
2 Agron. 1.2 5 6.6
3 Ag. Chem.1.2 5 6.3
4 Ag. Bot.1.2 3 6.7
5 Pl. Path. 1.2 3 7.6
6 Hort. 1.2 2 8.1
Example : The microbial counts per Petri plate of a milk sample are given below.
Find the average no. of microbes.
No. of plate  1 2 3 4 5 6 7 8 9 10
No. of microbes 6 16 25 38 85 108 100 65 28 9
Solution
No. of seeds (X)    No. of pods (f)    fX
1                   6                  6
2                   16                 32
3                   25                 75
4                   38                 152
5                   85                 425
6                   108                648
7                   100                700
8                   65                 520
9                   28                 252
10                  9                  90
Total               480                2900
Here
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{2900}{480} = 6.04$$
Since seed cannot be in fraction, therefore we can say average number of seeds per pod of the
given variety = 6
Example: Compute the average income (Rs.) of dairy farmers of a village from the
following distribution of sale of milk and milk products per day.
Income(Rs.) 90-110 110-130 130-150 150-170 170-190 190-210 210-230 230-250
No. of farmers
15 42 60 64 112 174 150 83
Solution:
No. of Dairy Mid Here, assumed mean
Income farmers value and divisor (class interval)
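As a cross-check for this example, the sketch below computes the grouped mean of the income distribution by the direct method and by the step-deviation method. The assumed mean A = 180 is chosen here only for illustration (the value used in the original worked solution is not shown above); any class mid value gives the same answer. Variable names are illustrative.

```python
lower_limits = [90, 110, 130, 150, 170, 190, 210, 230]  # class lower limits (Rs.)
width = 20                                              # class interval I
farmers = [15, 42, 60, 64, 112, 174, 150, 83]           # frequencies f_i

mids = [low + width / 2 for low in lower_limits]        # class mid values X_i
n = sum(farmers)

# Direct method: sum(f_i * X_i) / sum(f_i)
direct_mean = sum(f * x for f, x in zip(farmers, mids)) / n

# Step-deviation method with an assumed mean (A = 180 is an assumption of this sketch)
A = 180
step_dev = [(x - A) / width for x in mids]              # d_xi = (X_i - A) / I
step_mean = A + sum(f * d for f, d in zip(farmers, step_dev)) / n * width

print(f"n = {n}")
print(f"direct mean         = {direct_mean:.2f}")
print(f"step-deviation mean = {step_mean:.2f}")
```

Both methods agree, which is the point of the step-deviation shortcut: it only reduces the size of the numbers handled by hand.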
Exercise : In an agribusiness company, the average monthly incomes of 18 field workers and 5
office assistants at Raipur centre are Rs. 20000.00 and Rs. 17000.00, and those of 32 field workers
and 10 office assistants at Bilaspur centre are Rs. 16500.00 and Rs. 14500.00, respectively. Find the
average monthly income of all the workers in the company.
Solution:
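One way to check a hand calculation for this exercise is to weight each group mean by its group size. The sketch below is illustrative only; the list layout and variable names are assumptions.

```python
# (group size n_i, group mean in Rs.) for the four groups in the exercise
groups = [
    (18, 20000.0),  # field workers, Raipur
    (5, 17000.0),   # office assistants, Raipur
    (32, 16500.0),  # field workers, Bilaspur
    (10, 14500.0),  # office assistants, Bilaspur
]

total_workers = sum(size for size, _ in groups)
# Combined mean = sum(n_i * mean_i) / sum(n_i)
combined_mean = sum(size * mean for size, mean in groups) / total_workers

print(f"workers = {total_workers}, combined mean = Rs. {combined_mean:.2f}")
```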
Objective: To Compute Median for Grouped and Un-Grouped data.
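Median: The median is that value of the variable which divides the group into two equal parts, one part
comprising all values greater and the other all values less than the median. (Connor)
Median of ungrouped data: Let $n$ be the number of values of the variate. Arrange the series in
order (ascending or descending); then
$$\text{Median} = \begin{cases} \text{value of the } \left(\dfrac{n+1}{2}\right)\text{th term}, & n \text{ odd} \\[2mm] \text{mean of the } \left(\dfrac{n}{2}\right)\text{th and } \left(\dfrac{n}{2}+1\right)\text{th terms}, & n \text{ even} \end{cases}$$
Median of grouped data: The class corresponding to the cumulative frequency just greater than
$n/2$ is called the median class and the value of median is obtained by
$$\text{Median} = l + \frac{\dfrac{n}{2} - C}{f} \times I$$
where $l$ = lower limit of the median class, $f$ = frequency of the median class, $C$ = cumulative
frequency of the class preceding the median class and $I$ = class interval.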
Median can also be located graphically by making ‘more than’ or/and ‘less than’ cumulative frequency
curve. Different steps of this method are
(i) Draw an ogive (Cumulative frequency curve).
(ii) Mark the median point $n/2$ along the Y-axis.
(iii) Draw a line from point $n/2$ parallel to the X-axis meeting the curve at P.
(iv) Draw a perpendicular on the X-axis from point P.
(v) The point of intersection on the X-axis will be the median.
Different steps for finding median when both ‘more than’ and ‘less than’ cumulative frequency are
drawn.
(i) Plot ‘less than’ and ‘more than’ ogive.
(ii) The intersecting point is P.
(iii) Draw a perpendicular on the X-axis from point P. The point of intersection on the X-axis will be the
median.
Uses of median: Median is useful when information is desired on the relative position and is the most
appropriate average in dealing with rates, ranks, scores and items that are not counted or measured, i.e. it
is very much useful when data are to be measured qualitatively and in group e.g. it is most suitable
measure for comparing data on health, intelligence, honesty etc. It is especially useful and more
representatives when the distribution is highly skewed or open ended.
Merits: (i) It can be located by inspection. (ii) It is easy to understand and compute. (iii) Its value is
not affected by extreme values. (iv) It can be calculated for distributions with open end classes. (v) It is
suitable for qualitative studies. (vi) It can be exactly computed.
Demerits: (i) It necessitates arraying of data before it can be found. (ii) In case of an even number of
observations, the median cannot be determined exactly. We merely estimate it by taking the mean of the two
middle terms. (iii) As compared with the mean, it is more affected by fluctuations of sampling. (iv) The median
may not be representative if the distribution is irregular and abnormal. (v) It is not amenable to
algebraic and arithmetical treatment.
Example 1: Calculate median from the following data
(i) 18 22 6 25 32 35 15 50 45 43
(ii) Class 30-35 35-40 40-45 45-50 50-55 55-60 60-65
Frequency 14 16 18 23 18 8 3
Solution: (i) Arranging the term in ascending order
6 15 18 22 25 32 35 43 45 50
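Here $n = 10$, therefore the median will be the mean of the 5th and 6th terms.
Median $= \dfrac{25 + 32}{2} = 28.5$
(ii)
Class    Frequency    Cumulative frequency
30-35    14           14
35-40    16           30
40-45    18           48
45-50    23           71
50-55    18           89
55-60    8            97
60-65    3            100
Total    100
Median item $= n/2 = 100/2 = 50$. The 50th item lies in class 45-50, therefore this
is the median class. Hence lower limit $l = 45$, frequency of the median class $f = 23$,
cumulative frequency of the preceding class $C = 48$, class interval $I = 5$ and
$$\text{Median} = l + \frac{n/2 - C}{f} \times I = 45 + \frac{50 - 48}{23} \times 5 = 45.43$$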
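The two parts of Example 1 can be verified with a short Python sketch. `statistics.median` is the standard-library function; `grouped_median` is a hypothetical helper written for this illustration, using the usual interpolation inside the median class (the first class whose cumulative frequency exceeds n/2).

```python
import statistics

def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution with equal class width."""
    half = sum(freqs) / 2            # n / 2
    cumulative = 0                   # cumulative frequency before the current class
    for low, f in zip(lower_limits, freqs):
        if cumulative + f > half:    # cumulative frequency just greater than n/2
            return low + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")

# (i) ungrouped data
raw_median = statistics.median([18, 22, 6, 25, 32, 35, 15, 50, 45, 43])

# (ii) grouped data
class_median = grouped_median(
    lower_limits=[30, 35, 40, 45, 50, 55, 60],
    width=5,
    freqs=[14, 16, 18, 23, 18, 8, 3],
)

print(raw_median, round(class_median, 2))
```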
Example 2: Given the following frequency distribution with some missing frequencies. If the total
frequency is 685 and median is 42.6, then find the missing frequencies.
Class 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency 180 - 34 180 136 - 50
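Solution: Let the missing frequencies be $f_1$ and $f_2$ respectively.
Class    Frequency    Cumulative frequency
10-20    180          180
20-30    $f_1$        180 + $f_1$
30-40    34           214 + $f_1$
40-50    180          394 + $f_1$
50-60    136          530 + $f_1$
60-70    $f_2$        530 + $f_1$ + $f_2$
70-80    50           580 + $f_1$ + $f_2$
Total    685
We have $580 + f_1 + f_2 = 685$, i.e. $f_1 + f_2 = 105$.
As median = 42.6 lies in the class 40-50, the median class is 40-50.
We have lower limit $l = 40$, frequency of the median class $f = 180$, cumulative frequency of the
preceding class $C = 214 + f_1$, class interval $I = 10$ and $n/2 = 342.5$, so that
$$42.6 = 40 + \frac{342.5 - (214 + f_1)}{180} \times 10$$
or $2.6 \times 18 = 128.5 - f_1$, i.e. $f_1 = 81.7$.
As frequency of an item is always a whole number, we take $f_1 = 82$ and hence $f_2 = 105 - 82 = 23$.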
Example 3: Find median of the following data, graphically.
Solution :
50-60 60 7 56
Mode: In grouped data, if the series is inclusive series, the first step is to convert it into
exclusive series. The class which is having maximum frequency is known as modal class. If the
maximum frequency occurred in more than one class, the modal class is found out through the
method of grouping. Once the modal class is selected, the mode of the distribution is computed
through the following formula
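$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times I$$
where, $I$ = Class interval
$l$ = lower limit of modal class, $f_0$ = Frequency of class preceding the modal class
$f_1$ = Frequency of the modal class, $f_2$ = Frequency of class succeeding the modal class
In some situations when mode is ill defined, following empirical relationship should be used
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
or
$$\text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median})$$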
Mode can also be located graphically by drawing histogram of the frequency distribution.
Different steps are as follows
(i) Draw histogram.
(ii) Join the right corner of the modal class's rectangle to the right corner of the previous class rectangle
and the left corner to the left corner of the succeeding class rectangle.
(iii) Draw a perpendicular on the X-axis from the intersecting point, meeting the X-axis at M.
(iv) M is the mode of the distribution.
Maximum frequency is not always a correct indication of mode. The concentration may be
around two or more points. In these cases, we have to find the point of maximum concentration.
In determining the point, we use the method of grouping.
The series, in which the concentration of items is around two or more than two values, are
called bimodal, trimodal or multimodal series depending upon the number of values around
which items concentrate.
Method of grouping is also used to determine the mode for the distribution in which the
maximum frequency occurs in the very beginning or at the end of the distribution.
Merits: (i) It can be located merely by inspection. (ii) It is easily comprehensible and commonly
used. (iii) It is not affected by extreme variation. (iv) Open end classes also do not pose any
problem in the location of mode. (v) Mode can be conveniently located even if the frequency
distribution has classes of unequal intervals provided the modal class and the class preceding
and succeeding it are of same magnitude.
Demerits: (i) It is not based on all the observations. (ii) It is not suitable for further
mathematical treatment. (iii) It is ill defined, and not always possible to find a clearly defined
mode.
Uses of mode: Mode is very useful for dealing with qualitative data. Mode is widely
used in business, forecasting weather changes and in biological studies, and for market studies.
Example 1 Find mode of the following data set.
26 23 24 20 18 26 24 20 24 19 25 24 24 28 30
25 32 30 24 18 20 24 30 22 24 22 28 30 24 18
Solution Arrange the data in an order (either ascending or descending)
18 18 18 19 20 20 20 22 22 23 24 24 24 24 24
24 24 24 24 25 25 26 26 28 28 30 30 30 30 32
Make a frequency table
Value 18 19 20 22 23 24 25 26 28 30 32
Frequency 3 1 3 2 1 9 2 2 2 4 1
Here 24 repeated maximum number of times.
Mode = 24.
Example 2: The agricultural holdings of 362 families of a village are given below. Find out the
modal size of holdings.
Holdings (ha) 0-5 5-10 10-15 15-20 20-25
No. of families 25 36 180 89 32
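Solution
Holding (ha)    No. of families
0-5             25
5-10            36
10-15           180
15-20           89
20-25           32
Total           362
Here modal class is 10-15 as it has the maximum frequency (180). Therefore lower limit $l = 10$,
modal frequency $f_1 = 180$, preceding frequency $f_0 = 36$, succeeding frequency $f_2 = 89$,
class interval $I = 5$ and
$$\text{Mode} = 10 + \frac{180 - 36}{2(180) - 36 - 89} \times 5 = 10 + \frac{144}{235} \times 5 = 13.06 \text{ ha}$$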
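The modal-class computation of this example can be sketched as below. `grouped_mode` is a hypothetical helper, not a library function; it applies the usual interpolation Mode = l + (f1 - f0) / (2 f1 - f0 - f2) x I. It assumes equal class widths and a single clearly defined modal class, and it treats a missing neighbouring class as frequency 0, which is a simplification: the text recommends the method of grouping when the maximum lies at either end or is not unique.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous frequency distribution with one clear modal class."""
    i = freqs.index(max(freqs))                      # position of the modal class
    f1 = freqs[i]                                    # modal class frequency
    f0 = freqs[i - 1] if i > 0 else 0                # preceding class (0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # succeeding class (0 if none)
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

modal_holding = grouped_mode(
    lower_limits=[0, 5, 10, 15, 20],
    width=5,
    freqs=[25, 36, 180, 89, 32],
)
print(f"modal size of holding = {modal_holding:.2f} ha")
```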
Example 3: Find the mode of the following frequency distribution through graphical method.
Class     0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 5    10    12    15    13    8     4
Solution:
Exercise 3: Find mode from the following table by the use of graph and check the results by
calculations.
Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency 2 18 30 45 35 20 6 3
Solution:
Some Preliminary
[I can prove anything by statistics - except the truth. ~ George Canning]
------------------------------------------------------------------------------------------------------------
Definition
Dispersion may be defined as the extent of the scatter of observations around a
measure of central tendency, and a measure of such scatter is called a measure of dispersion.
1) Range.
2) Absolute mean deviation or Absolute deviation ( A.D.)
3) Standard deviation ( S)
4) Variance (S²)
5) Standard error of mean ( SEm.)
6) Coefficient of variation ( C.V.%)
Definition
"It is a square root of a ratio of sum of square of deviation calculated from arithmetic
mean to the total number of observations minus one ."
Properties of standard deviation: (i) Standard deviation is based on all observations of the
distribution. (ii) It is dependent on change of scale. It changes as much as the scale is
changed. (iii) It is independent of change of origin. (iv) It is always positive. (v) It is affected
by extreme values.
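Combined variance: If $\bar{X}_1$ and $\bar{X}_2$ are the means of two series of size n1 and n2 with variance
$S_1^2$ and $S_2^2$, respectively, then the variance of the series formed by adding the two
given series is given by
$$S^2 = \frac{n_1\left(S_1^2 + d_1^2\right) + n_2\left(S_2^2 + d_2^2\right)}{n_1 + n_2}$$
where $d_1 = \bar{X}_1 - \bar{X}$, $d_2 = \bar{X}_2 - \bar{X}$ and $\bar{X} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$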
Standard deviation is the best measure of dispersion because of following reasons:
(i) Arithmetic mean, which is the best measure of central tendency, is used for
computing standard deviation.
(ii) It is based on all observations.
(iii)It is not much affected by fluctuations of sampling.
(iv) Algebraic sign is not ignored for computing standard deviation.
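Raw data or ungrouped data
(1) Deviation method
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
(2) Variable square method
$$S = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}}$$
Grouped data or frequency distribution
(1) Deviation method
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}}$$
where $n = \sum_{i=1}^{k} f_i$ ; fi = Frequency of ith class
(2) Variable square method
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i X_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i X_i\right)^2}{n}}{n - 1}}$$
(3) Assumed mean method
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i d_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i d_i\right)^2}{n}}{n - 1}}, \qquad d_i = X_i - A$$
(4) Step deviation method
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i d_{xi}^2 - \dfrac{\left(\sum_{i=1}^{k} f_i d_{xi}\right)^2}{n}}{n - 1}} \times I, \qquad d_{xi} = (X_i - A)/I$$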
Variance
Variance is the square of standard deviation. It is also called the "Mean square
deviation". It is used very extensively in the analysis of variance of results from field
experiments. Symbolically, it is denoted by
$S^2$ = Sample variance and $\sigma^2$ = Population variance.
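Method of computation
Raw data or ungrouped data
(1) Deviation method
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
(2) Variable square method
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1} = \frac{SS}{df}$$
(3) Assumed mean method
$$S^2 = \frac{\sum_{i=1}^{n} d_i^2 - \dfrac{\left(\sum_{i=1}^{n} d_i\right)^2}{n}}{n - 1}, \qquad d_i = X_i - A$$
A = Assumed mean
Grouped data or frequency distribution
(1) Deviation method
$$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$$
(2) Variable square method
$$S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i X_i\right)^2}{n}}{n - 1}$$
(3) Assumed mean method
$$S^2 = \frac{\sum_{i=1}^{k} f_i d_i^2 - \dfrac{\left(\sum_{i=1}^{k} f_i d_i\right)^2}{n}}{n - 1}, \qquad d_i = X_i - A$$
A = Assumed mean
fi = Frequency of ith class
(4) Step deviation method
$$S^2 = \frac{\sum_{i=1}^{k} f_i d_{xi}^2 - \dfrac{\left(\sum_{i=1}^{k} f_i d_{xi}\right)^2}{n}}{n - 1} \times I^2, \qquad d_{xi} = (X_i - A)/I$$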
Standard error of mean ( SEm.)
The standard deviation is the standard error of a single variate, whereas the standard error
of mean is the standard deviation of the sampling distribution of the sample mean OR it refers
to the average magnitude of difference between the sample estimate and population
parameter taken over all possible samples from the population.
Definition
It is defined as the square root of the ratio of the variance to the total no. of observations in a
given set of data:
$$SE_m = S_{\bar{X}} = \sqrt{\frac{S^2}{n}} = \frac{S}{\sqrt{n}}$$
For statistical analysis work the use of $S_{\bar{X}}$ is common. It is also used to provide confidence
limits on the population mean and for tests of significance.
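The deviation method and the variable square method give the same sample variance, and the standard error of mean follows as the square root of the variance divided by the number of observations. The sketch below checks this on the ten milk yields of exercise 1.1 and compares the result with the standard-library `statistics` module; the variable names are illustrative.

```python
import math
import statistics

yields = [5.5, 6.7, 8.9, 9.6, 12.5, 12.0, 13.7, 14.5, 10.8, 9.6]
n = len(yields)
mean = sum(yields) / n

# (1) Deviation method: sum of squared deviations from the mean / (n - 1)
var_deviation = sum((x - mean) ** 2 for x in yields) / (n - 1)

# (2) Variable square method: [sum(X^2) - (sum X)^2 / n] / (n - 1)
var_square = (sum(x * x for x in yields) - sum(yields) ** 2 / n) / (n - 1)

sd = math.sqrt(var_deviation)   # standard deviation S
se_mean = sd / math.sqrt(n)     # standard error of mean, S / sqrt(n)

print(f"S^2 = {var_deviation:.4f}, S = {sd:.4f}, SEm = {se_mean:.4f}")
```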
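(1) Mean
$$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_{xi}}{n} \times I = 425 + \frac{(-25)}{125} \times 50 = 425 - 10 = 415 \text{ litres (milk yield)}$$
(2) Standard deviation
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i d_{xi}^2 - \dfrac{\left(\sum_{i=1}^{k} f_i d_{xi}\right)^2}{n}}{n - 1}} \times I = \sqrt{\frac{501 - 5}{124}} \times 50 = \sqrt{\frac{496}{124}} \times 50 = 2 \times 50 = 100$$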
3.4 Compare the variability in intelligence of two classes A and B from the following
information.
X-series Y-series
3.6: A shopkeeper mixes a batch of 200 apples of mean mass 150 g and standard
deviation 30 g with another batch of 300 apples of mean mass 100 g and standard
deviation 20 g. find the mean and standard deviation of the combined batch of 500
apples.
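Solution: We have $n_1 = 200$, $\bar{X}_1 = 150$ g, $S_1 = 30$ g and $n_2 = 300$, $\bar{X}_2 = 100$ g, $S_2 = 20$ g.
Combined mean
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{200 \times 150 + 300 \times 100}{500} = 120 \text{ g}$$
With $d_1 = 150 - 120 = 30$ and $d_2 = 100 - 120 = -20$, the combined variance is
$$S^2 = \frac{200\,(30^2 + 30^2) + 300\,(20^2 + (-20)^2)}{500} = \frac{360000 + 240000}{500} = 1200, \qquad S = \sqrt{1200} = 34.64 \text{ g}$$
Hence the mean weight of combined batch is 120 g with standard deviation of 34.64 g.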
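The result of exercise 3.6 can be reproduced with a few lines of Python. The sketch uses the usual combined-variance formula, S² = [n1(S1² + d1²) + n2(S2² + d2²)] / (n1 + n2), where d1 and d2 are the deviations of the two batch means from the combined mean; variable names are illustrative.

```python
import math

n1, mean1, sd1 = 200, 150.0, 30.0   # first batch of apples
n2, mean2, sd2 = 300, 100.0, 20.0   # second batch of apples

combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
d1 = mean1 - combined_mean          # deviation of batch 1 mean from combined mean
d2 = mean2 - combined_mean          # deviation of batch 2 mean from combined mean

combined_var = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / (n1 + n2)
combined_sd = math.sqrt(combined_var)

print(f"combined mean = {combined_mean:.0f} g, combined SD = {combined_sd:.2f} g")
```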
3.7 From the performance of following two plant characters of a rice variety state which
character is more stable?
Panicle length (cm)   15 20 16 22 20 14 21 24 30 18 15 25
100 seed weight (gm)  28 30 22 32 35 20 28 35 42 28 22 38
Solution:
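Stability of characters measured in different units is usually compared through the coefficient of variation, C.V.% = (S / mean) x 100, the character with the smaller C.V. being the more stable. The sketch below is one way to check a hand calculation for this exercise; `cv_percent` is a hypothetical helper built on the standard-library `statistics` module, which uses the n - 1 divisor for the standard deviation.

```python
import statistics

panicle_length = [15, 20, 16, 22, 20, 14, 21, 24, 30, 18, 15, 25]   # cm
seed_weight = [28, 30, 22, 32, 35, 20, 28, 35, 42, 28, 22, 38]      # gm per 100 seeds

def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_panicle = cv_percent(panicle_length)
cv_seed = cv_percent(seed_weight)
more_stable = "100 seed weight" if cv_seed < cv_panicle else "Panicle length"

print(f"C.V. panicle length  = {cv_panicle:.2f}%")
print(f"C.V. 100 seed weight = {cv_seed:.2f}%")
print("More stable character:", more_stable)
```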
3.8: In a sensory evaluation experiment, two judges accorded the following ranks to eight
milk products. State which judge is more stable.
Judge A 8 7 6 3 1 1 5 4
Judge B 7 5 4 1 3 2 6 8
Some Preliminary
[ Facts are stubborn things, but statistics are pliable. ― Mark Twain ]
------------------------------------------------------------------------------------------------
Background: Most of the decision-making situations in business management involve
uncertainty. Since uncertainty is present and is an important aspect in determining the
consequences of various alternative courses of action, it is imperative to get proper
appreciation of it, draw a mathematical picture of it and attempt to measure it in numerical
terms.
Basic terminology
1) Random experiment
If in each trial of an experiment conducted under identical conditions, the outcome is
not unique but may be any one of the possible outcomes, then such an experiment is called
a random experiment.
Example of Random experiment are: tossing a coin, throwing a die etc.
2) Trial and event
Any particular performance of a random experiment is called a trial, and an outcome or
a combination of outcomes is termed an event.
For example (i) If a coin is tossed repeatedly, the result is not unique. We may get
either of the two faces. Thus tossing a coin is a random experiment and getting a head or a tail is an event.
3) Exhaustive event
The total number of possible outcomes of a random experiment is known as the
exhaustive events or cases.
Example: In tossing of a coin, there are two exhaustive cases viz. head or tail.
4) Favorable event
The number of cases favorable to an event in a trial is the number of outcomes which
entail the happening of an event.
e.g. In throwing two dice, the cases favorable to getting a sum of 5 are
(1, 4), (4, 1), (2, 3), (3, 2), i.e. 4
5) Mutually exclusive events
Events are said to be mutually exclusive or incompatible if the happening of any one
of them precludes the happening of all the others, e.g. in throwing a die all the 6 faces
numbered 1 to 6 are mutually exclusive.
6) Equally likely events
Outcomes of a trial are said to be equally likely if, taking into consideration all the
relevant evidence, there is no reason to expect one in preference to the others.
Example: In a random toss of unbiased or uniform coin, head and tail are equally
likely events.
7) Independent event
Several events are said to be independent if the happening of an event is not affected
by the supplementary knowledge concerning the occurrence of any number of the remaining
events.
Example: In tossing an unbiased coin, the event of getting a head in the first toss is
independent of getting a head in the second, third and subsequent tosses.
Definitions
Solution:
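Exhaustive no. of cases = $^{52}C_3$
Favorable number of cases of a king = $^{4}C_1$
Favorable number of cases of a queen = $^{4}C_1$
Favorable number of cases of a knave = $^{4}C_1$
$$P(E) = \frac{^{4}C_1 \times {}^{4}C_1 \times {}^{4}C_1}{^{52}C_3} = \frac{4 \times 4 \times 4 \times 6}{52 \times 51 \times 50} = \frac{16}{5525}$$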
Example. Show that the probability of obtaining a total of 9 in a single throw with 2
dice is 1/9.
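This result can be confirmed by listing all exhaustive cases and counting the favorable ones, exactly as in the classical definition of probability. The sketch below is illustrative; it uses only the standard library.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of a single throw with two dice
outcomes = list(product(range(1, 7), repeat=2))

# Cases favorable to a total of 9
favourable = [pair for pair in outcomes if sum(pair) == 9]

p_nine = Fraction(len(favourable), len(outcomes))
print(favourable)   # the favorable pairs
print(p_nine)       # probability as an exact fraction
```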
If two events A and B are mutually exclusive with probabilities P1 and P2 respectively, then
the probability of occurrence of either of them (A or B) is equal to the sum of the individual
probabilities, P(A) + P(B).
Proof: If an event A can happen in m1 ways and B in m2 ways, then the number of ways in
which either event can happen is m1 + m2. If the number of possibilities is n, then by
definition the probability of either the first or the second event happening is
$$P(A \text{ or } B) = \frac{m_1 + m_2}{n} = \frac{m_1}{n} + \frac{m_2}{n} = P(A) + P(B) = P_1 + P_2$$
where $P(A) = \dfrac{m_1}{n} = P_1$ and $P(B) = \dfrac{m_2}{n} = P_2$
Example: If A is the event of drawing an ace from a pack of cards and B is the event of drawing a
king, then P(ace = A) = 4/52 and P(king = B) = 4/52. The probability of drawing either an
ace or a king in a single draw is
P(A or B) = 4/52 + 4/52 = 8/52 = 2/13,
since both an ace and a king cannot be drawn in a single draw and they are thus mutually exclusive
events.
From the above explanation, one can point out two facts. They are:
If A and B are not mutually exclusive events, then the probability of either of them is equal to
the sum of their probabilities less the probability of their simultaneous occurrence.
Symbolically
Since the ace of spade can be drawn. Thus the probability of drawing either ace or a
spade or both is
P(A or B) = P(A) + P(B) - P(A∩B)
          = 4/52 + 13/52 - 1/52 = 16/52 = 4/13
Similarly, we can generalize the rule for more than two events also.
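Both forms of the addition law can be verified by direct counting over a pack of cards. This is a minimal sketch, assuming a standard 52-card pack; the helper name prob and the card labels are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spade", "heart", "diamond", "club"]
deck = list(product(ranks, suits))  # 52 equally likely cards


def prob(event):
    """Classical probability: favourable cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_ace = prob(lambda c: c[0] == "A")
p_king = prob(lambda c: c[0] == "K")
p_spade = prob(lambda c: c[1] == "spade")

# Mutually exclusive events: ace or king.
p_ace_or_king = prob(lambda c: c[0] in ("A", "K"))

# Not mutually exclusive: ace or spade (the ace of spades is in both).
p_ace_and_spade = prob(lambda c: c[0] == "A" and c[1] == "spade")
p_ace_or_spade = prob(lambda c: c[0] == "A" or c[1] == "spade")

print(p_ace_or_king, p_ace + p_king)                       # 2/13 2/13
print(p_ace_or_spade, p_ace + p_spade - p_ace_and_spade)   # 4/13 4/13
```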
LAW OF MULTIPLICATION
Let n1 and m1 be the possible and favorable numbers of cases for the
event A and n2 and m2 for the event B then
Thus,
P(A and B, both at a time) = (m1·m2)/(n1·n2) = (m1/n1)·(m2/n2) = P1·P2
Example
Let A be the event “first ball drawn is black” and B the event “second ball
drawn is black”, where the balls are not replaced after being drawn. Here A and B
are dependent events.
EXERCISE NO. 4
PROBLEMS ON PROBABILITY
------------------------------------------------------------------------------------------------------------
4.1 A die is rolled, find the probability that an even number is obtained.
4.2 Two coins are tossed, find the probability that two heads are obtained.
4.3 i) A die is rolled, find the probability that the number obtained is
greater than 4.
ii) Two dice are rolled, find the probability that the sum is equal to 5.
4.4 If the probabilities of solving a problem by two students, Ram and Shyam,
are 1/2 and 1/3 respectively, what is the probability that the problem will
be solved?
4.5 A number is selected from the first 30 natural numbers. What is the
probability that it would be divisible by 4 or 7 ?
4.6 A single card is chosen at random from a standard deck of 52 playing cards.
What is the probability of choosing a king or a club?
DEPARTMENT OF AGRICULTURAL STATISTICS
COLLEGE OF AGRICULTURE, BHARUCH
Some Preliminary
[How do you nurture a positive attitude when all the statistics say you're a dead
man? You go to work: Patrick Swayze]
------------------------------------------------------------------------------------------------------------
NORMAL DISTRIBUTION:
The most important continuous probability distribution used in the entire field of
statistics is the normal distribution. The normal curve is bell-shaped and extends
infinitely in both directions, coming closer and closer to the horizontal axis without
touching it. The mathematical equation of the normal curve was developed by De
Moivre in 1733. A continuous random variable X is said to be normally distributed if it
has the probability density function represented by the equation of the normal curve.
The normal distribution, also called the normal probability distribution, is the most
useful theoretical distribution for continuous variables. The data of many biological
phenomena follow the normal distribution. The area under the curve represents the total
number of observations. The distribution is represented mathematically by
f(X) = [1 / (σ√(2π))] · e^(−(X − μ)² / (2σ²))
The quantities μ and σ are parameters of this distribution. The above equation
takes the following form under the assumption that
μ = 0, σ = 1 and (X − μ) / σ = Z:
f(Z) = [1 / √(2π)] · e^(−Z² / 2)
1. It is a symmetrical, bell-shaped, single-peaked curve. It falls away on either
side of the peak and flattens towards the ends. It is an asymptotic curve, i.e. it
approaches closer and closer to the base line but never touches the base
line.
2. The shape of the curve at the center towards the x-axis is concave while at
the ends it is convex. The curve changes its shape at a distance of ±σ from the
mean.
3. There are two parameters, viz. μ (mean) and σ (standard deviation). The curve
can be drawn if we have the values of both the parameters of the
population.
4. The normal curve is symmetrical about the mean therefore, mean divides the
entire area of the curve into two equal parts and hence the mean is also the
median. The maximum frequencies are also at the center of the curve and
therefore, the mode is also equal to median. Thus, mean, mode and median
coincide at the center.
5. If two ordinates are erected at a distance of 1σ on both sides of the mean,
the area of the curve so cut off is equal to 68.26 percent, i.e. about
2/3 of the entire curve.
7. If two ordinates are erected on both sides of the mean at a distance of 3σ,
the area so cut off will be 99.73 percent of the entire area of the curve.
10. The mean absolute deviation about the mean = 0.7979σ ≈ (4/5)σ
Solution:
Given μ = 3, σ = 2
(i) P (0 ≤ X ≤ 4)
We know that Z = (X – μ) / σ
When X = 0, Z = (0 – 3) / 2 = −3 / 2 = −1.5
When X = 4, Z = (4 – 3) / 2 = 1 / 2 = 0.5
P(0 ≤ X ≤ 4) = P(−1.5 ≤ Z ≤ 0.5) = 0.4332 + 0.1915 = 0.6247
P(−2 ≤ Z ≤ 2) = 2(0.4772) = 0.9544
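The table look-up for part (i) can be cross-checked numerically. The following is a minimal sketch using Python's standard-library NormalDist; it assumes μ = 3 and σ = 2 as given in the solution, and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 3, 2
lower, upper = 0, 4

# Standardise the limits: Z = (X - mu) / sigma.
z_low = (lower - mu) / sigma    # -1.5
z_high = (upper - mu) / sigma   #  0.5

# Area under the standard normal curve between the two Z values.
standard_normal = NormalDist()
p_via_z = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# The same probability computed directly on the X scale.
x_dist = NormalDist(mu=mu, sigma=sigma)
p_direct = x_dist.cdf(upper) - x_dist.cdf(lower)

print(round(p_via_z, 4))   # 0.6247
```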
EXERCISE NO. 5
5.1 The average number of acres burned by forest and range fires in a large New
Mexico county is 4,300 acres per year, with a standard deviation of 750 acres. The
distribution of the number of acres burned is normal. What is the probability that
between 2,500 and 4,200 acres will be burned in any given year?
5.2 If mean of a given data for a random value is 81.1 and standard deviation is
4.7, then find the probability of getting a value more than 83.
Some Preliminary
[There are three types of lies -- lies, damn lies, and statistics. ― Benjamin Disraeli]
------------------------------------------------------------------------------------------------------------
TEST OF SIGNIFICANCE
Test of significance: It is a test which enables us to decide, on the basis of sample
results, whether
(i) the deviation between the observed sample statistic and the hypothetical parameter value, or
(ii) the deviation between two sample statistics,
is significant or might be attributed to chance or the fluctuations of sampling.
Null hypothesis: It is a statement about the population parameter which is tested for possible
rejection under the assumption that it is true.
It is usually denoted by H0. Generally, it is called the no-difference hypothesis, because we
hypothesize that the statistic/parameter/ratio about which the statement has been developed is
not different from the population parameter/ratio.
Alternative hypothesis: Any hypothesis complementary to the null hypothesis is called an
alternative hypothesis. It is generally denoted by H1. The alternative hypothesis may be
single-tailed or two-tailed. For example, for H0: μ = μ0,
H1: (i) μ ≠ μ0 or (ii) μ > μ0 or (iii) μ < μ0
Here, alternative hypothesis (i) is a two-tailed hypothesis whereas hypotheses (ii) and (iii) are
single-tailed hypotheses.
The table values for one-tailed and two-tailed tests are observed as
Z(α) for a single-tailed test and Z(α/2) for a two-tailed test.
Errors in sampling: There are two types of errors in sampling: Type I error and Type II
error.
Type I error: Rejecting H0 when it is true. Its probability is denoted by α. It is also known as
producer's risk.
Type II error: Accepting H0 when it is wrong. Its probability is denoted by β. It is also known as
consumer's risk.
Level of significance: The probability of a Type I error is known as the level of
significance. It is also called the 'size of the critical region'. The levels of significance
usually employed in testing of hypothesis are 5% and 1%. It is usually fixed in advance,
before collecting the sample information.
Z test can be defined as the ratio of the difference between the estimated
population mean and the hypothetical mean to the standard error of the mean, based on the
population standard deviation or its estimate from a large sample.
SND (Standard Normal Deviate) test for single sample: Under the null hypothesis H0
that the sample has been drawn from a population with mean μ and variance σ², i.e., there is
no difference between the sample mean x̄ and the population mean μ, the test statistic (for
large samples) is
Z = (x̄ − μ) / (σ/√n)
where σ/√n is known as the standard error of the mean.
If the population standard deviation σ is unknown then we use its estimate s provided by the
sample variance s²:
Z = (x̄ − μ) / (s/√n)
where s² is the variance computed from the sample.
The steps for SND test are:
1. Compute the test statistic Z under the null hypothesis H0.
2. If |Z| > 3, H0 is always rejected.
3. If |Z| ≤ 3, we test its significance at a pre-fixed level of significance. In agriculture, it is
generally 5% and sometimes 1%. Thus for a two-tailed test
(a) If |Z| > 1.96, H0 is rejected at 5% level of significance.
(b) If |Z| > 2.58, H0 is rejected at 1% level of significance.
(c) If |Z| ≤ 2.58, H0 is accepted at 1% level of significance.
Similarly, for a one-tailed test |Z| is compared with 1.645 (at 5% level) and 2.33 (at 1% level)
and H0 is accepted or rejected accordingly.
A sample may be regarded as large if n > 30.
Confidence limits for μ
99% confidence limits (1% level of significance) for μ are x̄ ± 2.58 σ/√n
98% confidence limits (2% level of significance) for μ are x̄ ± 2.33 σ/√n
95% confidence limits (5% level of significance) for μ are x̄ ± 1.96 σ/√n
90% confidence limits (10% level of significance) for μ are x̄ ± 1.645 σ/√n
Example 1: An automatic packing machine was designed to pack exactly 2.0 kg of paneer.
A sample of 100 packets was examined to test the machine. The average weight was found to
be 1.94 kg with standard deviation 0.10 kg. Was the machine working properly?
Solution: Given sample size n = 100, sample mean x̄ = 1.94 kg,
sample standard deviation s = 0.10 kg.
It is required to test the hypothesis that the population mean is 2.0 kg.
H0: μ = 2.0 kg     H1: μ ≠ 2.0 kg
Since the sample size is large, the sample mean is approximately normally distributed with mean
μ and S.E. σ/√n. However, since the population s.d. σ is not known, an approximate value of
S.E. is S.E. = s/√n = 0.10/√100 = 0.01.
Therefore, Z = (x̄ − μ)/S.E. = (1.94 − 2.0)/0.01 = −6, so |Z| = 6.
Since |Z| exceeds 2.58, we reject the null hypothesis at 1% level of significance and
conclude that the machine is not functioning properly.
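The arithmetic of Example 1 can be reproduced in a few lines. This is a sketch only: the function name one_sample_z is hypothetical, the sample s.d. is used in place of σ (as in the solution), and the exact 1% two-tailed critical value (about 2.576) is computed rather than the rounded 2.58 used in the text.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesised_mean, sd, n):
    """Z = (sample mean - hypothesised mean) / (sd / sqrt(n))."""
    standard_error = sd / sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error


# Data of Example 1: n = 100, mean = 1.94 kg, s = 0.10 kg, H0: mu = 2.0 kg.
n, mean_wt, sd_wt, mu0 = 100, 1.94, 0.10, 2.0
z = one_sample_z(mean_wt, mu0, sd_wt, n)

# Two-tailed critical value at the 1% level of significance.
critical_1pct = NormalDist().inv_cdf(1 - 0.01 / 2)
reject_at_1pct = abs(z) > critical_1pct

# 95% confidence limits for the population mean: mean +/- 1.96 * S.E.
half_width = NormalDist().inv_cdf(0.975) * sd_wt / sqrt(n)
limits_95 = (mean_wt - half_width, mean_wt + half_width)

print(round(z, 2), reject_at_1pct)
print(tuple(round(v, 4) for v in limits_95))
```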
EXERCISE: PROBLEMS ON LARGE SAMPLE TEST (Z-TEST)
4.1 Average milk yield of a cow in a year is estimated as 1750 Kg. To test this,
a random sample of 100 cows was taken. The results of which are given in
the following table. Test whether average yield of cow is 1750 Kg. or not.
Also work out the confidence limits at 95%.
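If the samples have been drawn from populations with a common standard deviation σ, then under
H0: μ1 = μ2, the test statistic becomes
Z = (x̄1 − x̄2) / [σ √(1/n1 + 1/n2)]
where x̄1 and x̄2 are the sample means.
If σ is not known then its estimate based on the sample variances is used. If the sample sizes are not
sufficiently large then an unbiased estimate of σ² is given by
σ̂² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)
where S1² and S2² are the sample variances computed with divisors (n1 − 1) and (n2 − 1).
But, since the sample sizes are large, S1² ≈ s1², S2² ≈ s2², n1 − 1 ≈ n1 and n2 − 1 ≈ n2.
Therefore, in practice, for large samples,
σ̂² = (n1 s1² + n2 s2²) / (n1 + n2)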
We know
Since |Z| exceeds 2.58, we reject the null hypothesis at 1% level and conclude that there is a
significant difference in the mean yields of milk in the two districts.
4.2 A random sample of 200 villages have been taken from one district and average
population per village was found to be 495 with a standard deviation of 50.
Another random sample of 200 villages from another district gave an average
population 510 with S.D. 40. Test whether there is any difference between the
average of the two samples. Also work out the confidence limits at 95 percent.
X-series Y-series
Sx2 = 0.2 Yi = 300
S2 = 8 Yi2 = 3145
Xi2 = 9312 n = 30
4.4 Head diameter of 125 sunflowers was measured and found that the average
diameter was 38 cm. Is it true to say that the sample was taken from the
population having average diameter 43.5 cm with variance 20.25 sq cm?
Some Preliminary
[Statistics are used much like a drunk uses a lamppost: for support, not illumination: Vin Scully]
------------------------------------------------------------------------------------------------------------
"t" test
When the sample is large and σ is not known, we estimate it by S, and this estimate can be
used in the Z test. But if n is small, the error introduced by replacing σ with S is larger, and
in that situation the statistic no longer follows the normal distribution but another
distribution named "t".
The "t" distribution was derived by W. S. Gosset under the pen name 'Student' in 1908.
The value of t depends on the degrees of freedom and, at a given level of significance, is
always greater than its limiting value Z for any finite degrees of freedom. When d.f. is
large, t --> Z. The difference between t and Z becomes more and more marked as n becomes
smaller and smaller.
Definition: It is the ratio of the deviation between the sample mean and the hypothetical mean
to the standard error of the mean estimated from the small sample.
(1) Test of single mean : To test if the sample mean differs significantly from the
hypothetical value of the population mean.
Assumptions: (i) The parent population from which the sample is drawn is normal. (ii) The
sample observations are independent, i.e. the sample is random.
Conditions: (i) The population standard deviation σ is unknown. (ii) Sample size is small.
Let x1, x2, ..., xn be a random sample of size n from a normal population with mean μ and
variance σ². Then Student's t is defined by the statistic
t = (x̄ − μ) / (S/√n), where x̄ = Σxi/n and S² = Σ(xi − x̄)²/(n − 1).
Compare the calculated |t| with the significant points of the t-distribution with (n − 1)
degrees of freedom.
Applications of t-distribution
1. To test if the sample mean differs significantly from the hypothetical value of the
population mean.
2. To test the significance of the difference between two sample means.
3. To test the significance of an observed sample correlation coefficient.
4. To test the significance of observed partial correlation coefficient.
5. To test the significance of observed multiple correlation coefficients.
6. To test the significance of the sample regression coefficient.
7. To test the significance of difference of two regression coefficients.
Example of single mean
Example : Following are the girth at breast-height (gbh) of trees attained since 6 years of planting.
Tree No. 1 2 3 4 5 6 7 8
gbh (cm) 30.85 30.24 30.94 29.89 21.52 25.38 22.89 29.44
Do we say that the trees selected are taken from the population of trees having mean gbh 25.50 cm?
Solution: Level of significance = 5%
Tree No.   gbh (cm) (x)   x²
1          30.85          951.7225
2          30.24          914.4576
3          30.94          957.2836
4          29.89          893.4121
5          21.52          463.1104
6          25.38          644.1444
7          22.89          523.9521
8          29.44          866.7136
Total      221.15         6214.7963
x̄ = 221.15/8 = 27.64 cm
S² = [6214.7963 − (221.15)²/8] / 7 = 101.381/7 = 14.483
t = (27.64 − 25.50) / √(14.483/8) = 2.14/1.3455 = 1.59
Observe the table value of t at 7 degrees of freedom at 5% level of significance (2.365).
Since the calculated |t| is less than the table value at 7 degrees of freedom, it may therefore be
concluded that the trees selected are taken from the population of trees having mean gbh 25.50 cm.
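The single-mean t statistic for the gbh data can be recomputed as below. This is an illustrative sketch using only the standard library; the tabulated value 2.365 (two-tailed, 5%, 7 d.f.) is typed in from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Girth at breast-height (cm) of the 8 trees in the example.
gbh = [30.85, 30.24, 30.94, 29.89, 21.52, 25.38, 22.89, 29.44]
mu0 = 25.50                      # hypothetical population mean

n = len(gbh)
sample_mean = mean(gbh)
sample_sd = stdev(gbh)           # uses the (n - 1) divisor
standard_error = sample_sd / sqrt(n)

t_stat = (sample_mean - mu0) / standard_error
df = n - 1

t_table = 2.365                  # two-tailed 5% value for 7 d.f. (from a t table)
significant = abs(t_stat) > t_table

print(round(sample_mean, 2), round(t_stat, 2), df, significant)
```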
Example of paired sample
Example: Following are the observations on organic carbon (%) obtained from soil core samples
drawn from two different layers of a number of soil pits in a natural forest. Compare the organic
carbon status of soil at the two soil depth levels.
Soil pit                       1     2     3     4     5     6     7     8     9     10
Organic carbon (%)
  Layer 1 (x)                  1.59  1.39  1.64  1.17  1.27  1.58  1.64  1.53  1.21  1.48
  Layer 2 (y)                  1.21  0.92  1.31  1.52  1.62  0.91  1.23  1.21  1.58  1.18
Hence, pooled
Statistic
Hence, pooled
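Since the two layers are sampled from the same soil pits, the observations can be treated as pairs. The sketch below assumes the usual paired t statistic, t = d̄ / (s_d/√n) with (n − 1) degrees of freedom, where d = x − y for each pit; this is one way to analyse these data and is not presented as the working printed in the manual.

```python
from math import sqrt
from statistics import mean, stdev

# Organic carbon (%) in the two layers of the same 10 soil pits.
layer1 = [1.59, 1.39, 1.64, 1.17, 1.27, 1.58, 1.64, 1.53, 1.21, 1.48]
layer2 = [1.21, 0.92, 1.31, 1.52, 1.62, 0.91, 1.23, 1.21, 1.58, 1.18]

# Pit-wise differences d = x - y.
diffs = [x - y for x, y in zip(layer1, layer2)]

n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)                 # (n - 1) divisor
t_paired = mean_diff / (sd_diff / sqrt(n))
df = n - 1

print(round(mean_diff, 3), round(t_paired, 2), df)
```

The calculated value is then compared with the table value of t at 9 degrees of freedom, exactly as in the single-mean case.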
5.1 The plants are chosen from a population at random whose height in inches is
given below.
62, 65, 67, 71, 74, 75, 77, 78, 80, 81.
In the light of the above data, discuss the suggestion that the mean height of plants
in the population is 70 inches. Also work out the confidence limits at 95%.
A 9 17 14 13 15 10 11 13
B 8 15 11 10 13 9 - -
Examine whether the two series differ significantly in mean weight.
A (X)   (Xi − X̄)   (Xi − X̄)²   B (Y)   (Yi − Ȳ)   (Yi − Ȳ)²
9       -3.75      14.06       8       -3         9
17      4.25       18.06       15      4          16
14      1.25       1.56        11      0          0
13      0.25       0.06        10      -1         1
15      2.25       5.06        13      2          4
10      -2.75      7.56        9       -2         4
11      -1.75      3.06        -       -          -
13      0.25       0.06        -       -          -
Total
102     0          49.48       66      0          34
5.4 To test the effectiveness of a new drug, the blood pressure of 12 patients was
recorded before and after the drug administration. The data are given in the table.
Also work out the confidence limits at 95 percent.
Crown Before 5.9 8.3 8.5 9.3 10.6 11.4 11.9 12.3 12.6 13.0
Solution:
Some Preliminary
---------------------------------------------------------------------------------------------------------------
t and Z tests are used for comparing two population means. When the populations are to
be compared with respect to their variances, the F test is used.
F-test: The F-statistic is the ratio of two independent unbiased estimators of the population
variance. The F-test is also known as the variance ratio test.
Let s1² be the sample variance of a random sample of size n1 drawn from a normal population
with variance σ1² and let s2² be the sample variance of another independent random sample of
size n2 drawn from a normal population with variance σ2². We are interested to test the null
hypothesis H0: σ1² = σ2².
If the test statistic
F = s1²/s2² > F(α; n1 − 1, n2 − 1) [if s1² > s2²] or F = s2²/s1² > F(α; n2 − 1, n1 − 1) [if s2² > s1²]
then we reject H0 in favour of H1 at α level of significance; otherwise not.
Assumption for F-test
(i) Both samples drawn are simple random samples. (ii) Parent populations of both samples
are normal.
Uses of F-test
1. F-test for equality of two population variances.
2. F-test for significance of an observed correlation ratio.
3. F-test for equality of several means.
4. F-test for the significance of an observed multiple correlation coefficient.
5. F-test for testing the linearity of regression.
6. To test the significance of an observed partial correlation coefficient.
Example : The standard deviations calculated from two random samples of sizes 9 and 13 are
2.23 and 1.87 respectively. May the samples be regarded as drawn from normal populations
with the same standard deviation? Given that .
Solution Here , , and
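The variance ratio for this example can be sketched as follows. The function name variance_ratio is hypothetical, and the sketch assumes that the quoted standard deviations (2.23 and 1.87) are already the unbiased estimates; if they were computed with divisor n, each variance would first have to be multiplied by n/(n − 1).

```python
def variance_ratio(sd_a, n_a, sd_b, n_b):
    """Return (F, (numerator d.f., denominator d.f.)) with the larger variance on top."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, (n_a - 1, n_b - 1)
    return var_b / var_a, (n_b - 1, n_a - 1)


# Example data: samples of sizes 9 and 13 with s.d. 2.23 and 1.87.
f_stat, df = variance_ratio(2.23, 9, 1.87, 13)

print(round(f_stat, 3), df)   # 1.422 (8, 12)
```

The calculated F is then compared with the table value of F at (8, 12) degrees of freedom.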
6.1 If S1 = 1.2, S2 = 1.5, n1 = 15 and n2 = 16, then calculate the F statistic and draw your
conclusion.
6.2 Two random samples of bottle gourd are drawn from two populations and the following
lengths (cm) were obtained
Sample I 40.0 41.6 42.3 44.0 45.4 45.8 46.0 46.2 46.5 47.0
Sample II 35.6 36.2 36.8 37.1 37.4 38.4 39.6 40.0 42.5 44.0 45.2 46.0
Find the variances of two samples and test whether the two populations have the same
variance.
Given that , .
Solution:
6.3: Tree diameter (cm) at breast height recorded on two different samples trees are
Sample Tree diameter at breast height (cm)
Sample 1 14.8 12.0 10.5 14.2 11.8 13.6 13.8 14.5 10.0 12.2
Sample 2 10.0 10.1 9.6 9.5 10.1 11.6 14.1 12.7 12.6 8.5
Can it be said that the both samples are drawn from the populations of equal variances?
Solution:
Some Preliminary
of freedom.
Step VI : If cal χ² ≤ table χ² (0.05, (k-1) d.f.), the observed difference is not significant at 5% level of
significance. Ho accepted.
If cal χ² > table χ² (0.05, (k-1) d.f.), the observed difference is significant at 5% level of
significance. Ho rejected.
Step VII : Conclusion : Non significance difference indicates that the given sampling
distribution is in agreement with theoretical distribution and the fit is good.
Significant difference indicates that the given sampling distribution is not
in agreement with theoretical distribution and the fit is poor.
2) Test of Independence
Another common use of the chi square test is in testing independence of
classifications.
Independence: The two attributes A and B are said to be independent to each other if
the proportion of A's among B's is the same as that in not - B's.
Contingency table : When the individuals in a sample have two characters or attributes
and a frequency distribution is made classifying them according to both so as to show the
relation between the characters, the resulted table is termed as contingency table.
Yates' correction
In order to avoid irregularities caused by small frequencies in a 2 x 2
contingency table, a correction for continuity known as Yates' correction is to be applied.
When the product of the principal diagonal (ad) is greater than that of the off diagonal (bc), i.e. ad > bc,
then 1/2 is to be subtracted from the principal diagonal cell frequencies and 1/2
is to be added to the off diagonal cell frequencies so that the marginal totals remain unchanged.
Similarly, if bc > ad, then 1/2 is to be added to the principal diagonal cell frequencies and
1/2 is to be subtracted from the off diagonal cell frequencies.
If ad > bc,
χ² = (ad − bc − N/2)² N / (R1 R2 C1 C2)
Class   A1   A2   Total
B1      a    b    R1
B2      c    d    R2
Total   C1   C2   N
Without the correction,
χ² = (ad − bc)² N / (R1 R2 C1 C2)
where a, b, c and d are the observed frequencies of the respective cells; R1, R2,
C1 and C2 are the row and column totals; N is the grand total.
Step V : Compare calculated χ² with the table value at 5% level of significance and (r-1)(c-1)
degrees of freedom.
Step VI : If cal χ² ≤ table χ² (0.05, (r-1)(c-1) d.f.), the observed difference is not significant at 5%
level of significance. Ho accepted.
If cal χ² > table χ² (0.05, (r-1)(c-1) d.f.), the observed difference is significant at 5% level of
significance. Ho rejected.
Step VII : Acceptance of Ho means the two characters are independent of each other.
Rejection of Ho means the two characters are not independent of each other.
Yates' Correction: If the expected cell frequencies in a 2 x 2 table are small (say less than 5),
combining classes becomes meaningless as it would leave zero degrees of freedom. In that case add
0.5 to the cell frequency which is less than 5 and then adjust the other cell frequencies so that
the observed marginal totals are unchanged. Adding 0.5 to the minimum frequency of a
contingency table is called Yates' Correction. Or calculate χ² by the following
modified formula:
χ² = N (|ad − bc| − N/2)² / (R1 R2 C1 C2)
Procedure for test of Independence of attribute in case of r x c contingency table :
Step I : Set the appropriate null hypothesis.
Ho : The given classification of group of individuals is independent to each other.
Ha : The given classification of group of individuals is not independent to each other.
Step II : Fix the level of significance.
Step III : Let a group of individuals be classified in two ways, in 'r' rows and 'c' columns, as in the
following table.
Class    I     II    III   ...   Total
1st      a11   a12   a13   ...   R1
2nd      a21   a22   a23   ...   R2
:        :     :     :     aij   :
Total    C1    C2    C3    ...   N
Step IV : Calculate the expected frequency of each cell as
E(a11) = R1C1/N     E(a12) = R1C2/N
E(a21) = R2C1/N     E(a22) = R2C2/N
In general, E(aij) = RiCj/N
Step V : Calculate Chi square as
χ² = Σ(i,j) (Oij − Eij)² / Eij
Step VI : Compare calculated χ² with the table value at 5% level of significance and (r-1)(c-1) degrees
of freedom.
Step VII : If cal χ² ≤ table χ² (0.05, (r-1)(c-1) d.f.), the observed difference is not significant at 5% level
of significance. Ho accepted.
If cal χ² > table χ² (0.05, (r-1)(c-1) d.f.), the observed difference is significant at 5% level of
significance. Ho rejected.
Step VIII Acceptance of Ho means the two characters are independent to each other Rejection of
Ho means the two characters are not independent to each other.
Example :In an orchard of 1750 trees, a record was taken of the number of shaded and un-
shaded trees, and in each of these classes the frequency of high and low yielding trees was
noted as below:
Shaded Un-shaded
Low yielding 640 305
High yielding 510 295
Test whether shading on the tree has any effect on its yielding capacity?
Solution: H0 : The shading on the tree has no effect on its yielding capacity
H1 : The yielding capacity of the tree is affected by shading
                Shaded       Un-shaded    Total
Low yielding    640 (a)      305 (b)      945 (a+b)
High yielding   510 (c)      295 (d)      805 (c+d)
Total           1150 (a+c)   600 (b+d)    1750 (N)
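The chi-square value for the orchard table can be obtained either from the expected frequencies E = (row total × column total)/N or from the 2 x 2 shortcut formula. The sketch below computes both, without Yates' correction (all frequencies are large here); the function name is illustrative, and the tabulated value 3.841 (5%, 1 d.f.) is taken from a standard χ² table.

```python
def chi_square_independence(table):
    """Chi-square for an r x c table using E(ij) = R(i) * C(j) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: low / high yielding; columns: shaded / un-shaded.
orchard = [[640, 305], [510, 295]]
chi2, df = chi_square_independence(orchard)

# 2 x 2 shortcut: (ad - bc)^2 N / (R1 R2 C1 C2).
a, b, c, d = 640, 305, 510, 295
n = a + b + c + d
chi2_shortcut = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

chi2_table = 3.841   # 5% value for 1 d.f. (from a chi-square table)
print(round(chi2, 3), df, chi2 > chi2_table)
```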
7.1 In a series of experiments, 1342 plants with green foliage and 1138 with yellow
foliage were obtained. This was a back cross in which the theoretical ratio was 1:1. Test
whether the ratio between the observed number of plants agrees with the
theoretical ratio.
Observed     Expected          (O − E)   (O − E)²   (O − E)²/E
value (O)    value (E)
1342         2480/2 = 1240     102       10404      10404/1240 = 8.39
1138         2480/2 = 1240     -102      10404      8.39
Total 2480                                          16.78
7.2: A cross between two varieties of sorghum, one giving high yield and the other a high
amount of fodder, was made. The numbers of plants in the F2 generation were observed as 79,
160, 85. Test whether this sample data is in agreement with the Mendelian ratio 1:2:1 or not.
Solution H0 : The sample ratio is in agreement with 1:2:1 ratio.
Observed frequency (O)   Expected frequency (E)   (O − E)²/E
79                       81                       0.0494
160                      162                      0.0247
85                       81                       0.1975
Total = 324              324                      0.2716
Conclusion: Calculated χ² = 0.2716 is less than the table value of χ² with 2 d.f. at 5% level of
significance. Therefore, the null hypothesis is accepted, i.e., the plants are segregating
according to the Mendelian ratio 1:2:1 in the F2 generation.
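Both goodness-of-fit calculations above follow the same pattern: split the total in the theoretical ratio to get the expected frequencies, then sum (O − E)²/E. A minimal sketch, with a hypothetical helper name:

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness of fit against a theoretical ratio such as 1:1 or 1:2:1."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, expected, df


# Exercise 7.1: back cross, theoretical ratio 1:1.
chi2_backcross, exp_backcross, df_backcross = chi_square_gof([1342, 1138], [1, 1])

# Exercise 7.2: sorghum cross, Mendelian ratio 1:2:1.
chi2_sorghum, exp_sorghum, df_sorghum = chi_square_gof([79, 160, 85], [1, 2, 1])

print(exp_backcross, round(chi2_backcross, 2), df_backcross)
print(exp_sorghum, round(chi2_sorghum, 4), df_sorghum)
```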
Some Preliminary
[Facts are stubborn, but statistics are more pliable. Mark Twain]
----------------------------------------------------------------------------------------------------------------
So far we have studied problems relating to one variable only. In practice we come across a
large number of problems involving the use of two or more than two variables.
UNIVARIATE POPULATION
BIVARIATE POPULATION
When two variables are studied simultaneously in a single population, it is termed a bivariate
population, e.g. the height and weight of students, rainfall and yield, the amount of fertilizer used
and the crop yield.
If two quantities vary in such a way that movement in one are accompanied by movements in
the other, these quantities are said to be correlated. e.g. price of commodities and amount
demanded, increase in rainfall up to a point and production of crop. The degree of relationship
between the variables under consideration is measured through the correlation analysis.
CORRELATION
It indicates the association between the two or more variables in a bivariate distribution or an
analysis of the covariation of two or more variables is usually called correlation.
TYPES OF CORRELATION
Correlation is described or classified in several different ways. Three of the most important
ways of classifying correlation are:
i) Positive or negative
ii) Simple, partial and multiple
iii) Linear and non-linear
METHODS OF STUDYING CORRELATION
There are four major approaches of ascertaining whether two variables are correlated or not:
1. Scatter diagram method
2. Graphic method
3. Algebraic method: Karl Pearson’s coefficient of correlation
4. Rank method
Computational formula:
ρ = Cov(X, Y) / (σx σy)
r = Cov(X, Y) / (Sx Sy) = Σxy / √(Σx² · Σy²) = SP(xy) / √(SSx · SSy)
where, Σxy = ΣXY − (ΣX)(ΣY)/n
Σx² = ΣX² − (ΣX)²/n
Σy² = ΣY² − (ΣY)²/n
PROPERTIES OF CORRELATION COEFFICIENT:
1. A change in an origin does not affect the value of the correlation coefficient.
2. A change in a scale does not affect the value of correlation coefficient.
3. The value of correlation coefficient lies between -1 to +1.
4. Correlation coefficient is unit free.
5. Geometric mean of two-regression coefficient is equal to correlation coefficient.
where, , , ,
EXERCISE NO. 10
Example: Following are the number of seeds per cob and their weight (g) of corn. Find the
coefficient of correlation and test its significance.
Cob No. 1 2 3 4 5 6 7 8 9 10
No. of seeds (X) 278 236 298 275 225 282 290 262 265 239
Seed weight (gm) (Y) 184 151 191 168 160 162 186 158 153 147
Solution: Let X = No. of seeds and Y = Seed weight; X̄ = 2650/10 = 265, Ȳ = 1660/10 = 166
Cob     X      Y      x = X − X̄   x²     y = Y − Ȳ   y²     xy
1       278    184    13          169    18          324    234
2       236    151    -29         841    -15         225    435
3       298    191    33          1089   25          625    825
4       275    168    10          100    2           4      20
5       225    160    -40         1600   -6          36     240
6       282    162    17          289    -4          16     -68
7       290    186    25          625    20          400    500
8       262    158    -3          9      -8          64     24
9       265    153    0           0      -13         169    0
10      239    147    -26         676    -19         361    494
Total   2650   1660   0           5398   0           2224   2704
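The column totals of the table give r directly. The sketch below recomputes them from the raw data using the computational formula; the significance test uses the usual statistic t = r√(n − 2)/√(1 − r²) with (n − 2) d.f., which is stated here as an assumption about the intended test.

```python
from math import sqrt

seeds = [278, 236, 298, 275, 225, 282, 290, 262, 265, 239]    # X
weight = [184, 151, 191, 168, 160, 162, 186, 158, 153, 147]   # Y
n = len(seeds)

# Corrected sums of products and squares.
sp_xy = sum(x * y for x, y in zip(seeds, weight)) - sum(seeds) * sum(weight) / n
ss_x = sum(x * x for x in seeds) - sum(seeds) ** 2 / n
ss_y = sum(y * y for y in weight) - sum(weight) ** 2 / n

# Karl Pearson's coefficient of correlation.
r = sp_xy / sqrt(ss_x * ss_y)

# t statistic for testing the significance of r, with n - 2 d.f.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(sp_xy, ss_x, ss_y)
print(round(r, 4), round(t_stat, 2), n - 2)
```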
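Method of computation
βyx = E[(X − μx)(Y − μy)] / E[(X − μx)²] = Cov(X, Y) / V(X)
byx = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n] = Σxy / Σx²
Similarly,
βxy = E[(X − μx)(Y − μy)] / E[(Y − μy)²] = Cov(X, Y) / V(Y)
bxy = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)² = [ΣXY − (ΣX)(ΣY)/n] / [ΣY² − (ΣY)²/n] = Σxy / Σy²
The line indicating the mean relationship between two variables is known as the regression line.
The regression coefficient is the rate of change in one variable for a unit change in the other.
The two regression lines are
Y − Ȳ = byx (X − X̄)   (regression of Y on X)
X − X̄ = bxy (Y − Ȳ)   (regression of X on Y)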
USES OF REGRESSION:
1) To predict the value of Y for a given value of X with the help of regression equation.
2) To know the rate of change in Y for a unit change in X with the help of regression
coefficient.
Relations among r, byx, bxy, Sx and Sy
(i) r = ±√(byx · bxy)     (ii) byx = r · Sy/Sx     (iii) bxy = r · Sx/Sy
DIFFERENCES BETWEEN CORRELATION AND REGRESSION
    CORRELATION                               REGRESSION
1   It deals with mutual association          It deals with cause and effect relationship
2   It is a two way relationship              It is a one way relationship
3   Correlation coefficient is unit free      Regression coefficient is in the units of the
                                              dependent variable
4   Correlation coefficient lies between      Regression coefficient lies between
    -1 and +1                                 -∞ and +∞
5   For a given value of one variable the     For a given value of the independent variable the
    other variable cannot be predicted        value of the dependent variable can be
                                              predicted.
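These relations can be checked numerically with the corrected sums from the corn-cob example (Σxy = 2704, Σx² = 5398, Σy² = 2224, n = 10, X̄ = 265, Ȳ = 166). This is an illustrative sketch; the function predict_y is a hypothetical helper for the regression line of Y on X.

```python
from math import sqrt

# Corrected sums taken from the corn-cob correlation table.
sp_xy, ss_x, ss_y, n = 2704, 5398, 2224, 10
x_bar, y_bar = 265, 166

# Regression coefficients.
b_yx = sp_xy / ss_x      # change in Y per unit change in X
b_xy = sp_xy / ss_y      # change in X per unit change in Y

# (i) r is the geometric mean of the two coefficients (same sign as both).
r = sqrt(b_yx * b_xy)

# (ii) and (iii): b_yx = r * Sy / Sx and b_xy = r * Sx / Sy.
s_x = sqrt(ss_x / (n - 1))
s_y = sqrt(ss_y / (n - 1))


def predict_y(x):
    """Regression line of Y on X: Y - y_bar = b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
print(round(predict_y(280), 2))
```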
EXERCISE: PROBLEMS ON REGRESSION
10.4 The following table shows the yield of straw (Y) and yield of grain (X) in Kg.
from plots of 10 x 10 m.
Calculate the regression coefficient of Y on X and X on Y. Estimate the grain yield
for plot giving 24.5 Kg. straw. What will be the correlation coefficient value in
between straw and grain yields?
Grain(X) 59 65 62 68 59 67 65 68 67 66 69 65
Straw(Y) 19 27 21 21 20 22 28 23 26 27 22 20
Since correlation coefficient and regression coefficients have same sign therefore
Correlation coefficient between X and Y
Exercise 10.7: The following data are for the amount of water supplied in inches
and the yield of alfalfa in tons per acre
Water 12 18 24 30 36 42 48
Yield 5.3 5.7 6.5 7.2 8.2 9.7 8.4
(i) Find the regression of yield on water.
(ii) Assuming that the relation between the two is linear, calculate the expected
yield when the amount of water supplied is 20 inches.
Solution:
Some Preliminary
[All statistics have outliers.― Nenia Campbell, Terrorscape]
---------------------------------------------------------------------------------------------------
Karl Pearson's method is based on the assumption that the population being studied
is normally distributed. When it is known that the population is not normal, or when the
shape of the distribution is not known, there is a need for a measure of correlation that
involves no assumption about the parameters of the population.
This method was developed by Charles Spearman in 1904. This measure is especially
useful when quantitative measures for certain factors cannot be fixed, e.g. (i) correlation
between marks obtained in two different subjects by the same group of students; (ii)
correlation of height and weight of students can be worked out without making exact
measurements. We first rank the students according to height; the same procedure is
then used for ranking by weight. When two or more items are of equal
magnitude, their ranks are calculated by taking the average of the ranks they would occupy.
R = 1 − 6 [Σdi² + (1/12)(P³ − P)] / [n(n² − 1)]   or   R = 1 − 6 Σdi² / [n(n² − 1)]
where Σdi² = sum of squares of the differences of ranks
n = number of pairs
P = number of items where ranks are common
Judge 1 4 3 6 1 2 7 9 8 10 5
Judge 2 1 6 4 7 5 8 10 9 3 2
Do the two judges appear agree in their judgments?
Solution:
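Flower set No.          1    2    3    4    5    6    7    8    9    10   Total
Judge 1                 4    3    6    1    2    7    9    8    10   5
Judge 2                 1    6    4    7    5    8    10   9    3    2
Rank difference (di)    3    -3   2    -6   -3   -1   -1   -1   7    3    0
di²                     9    9    4    36   9    1    1    1    49   9    128
Here Σdi² = 128 and n = 10, and no ranks are tied, so
R = 1 − 6 Σdi² / [n(n² − 1)] = 1 − (6 × 128)/(10 × 99) = 1 − 768/990 = 0.2242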
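The rank correlation for the two judges can be recomputed as below. This sketch assumes the no-ties form R = 1 − 6Σd²/[n(n² − 1)], which applies here because neither judge gives the same rank twice.

```python
# Ranks given by the two judges to the 10 flower sets.
judge1 = [4, 3, 6, 1, 2, 7, 9, 8, 10, 5]
judge2 = [1, 6, 4, 7, 5, 8, 10, 9, 3, 2]

n = len(judge1)
rank_diff = [a - b for a, b in zip(judge1, judge2)]
sum_d2 = sum(d * d for d in rank_diff)

# Spearman's rank correlation coefficient (no tied ranks).
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum(rank_diff), sum_d2, round(R, 4))   # 0 128 0.2242
```

The low positive value suggests only weak agreement between the two judges.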
Some Preliminary
[“Statistics is the main of all inaccurate studies” ― Edmond Goncourt (de), Jules de Goncourt]
---------------------------------------------------------------------------------------------------
Design of Experiments refers to a plan for assigning subjects to experimental conditions and
the statistical analysis associated with the plan.
Experimental unit: The basic units for which response measurements are collected are
called experimental units or subjects.
Factors: Distinct types of conditions that are manipulated on the experimental units are called
factors.
Factor levels: The different modes of presence of a factor are called factor levels.
Treatment: Each specific combination of the levels of different factors is called a treatment.
Analysis of Variance (ANOVA): The analysis of variance technique divides the total
variance of a set of data into component parts. Each component part has its own source of
variation which the ANOVA procedure will identify and locate. Additionally, the magnitude
of contribution of each source of variation is delimited by this procedure.
Assumptions for validity of ANOVA
(i) The samples drawn are independent of each other and random.
(ii) Parent population from which observations are taken is normal
(iii) Various treatment and environmental effects are additive in nature.
There are three basic principles of experimental designs-
(a) Randomization
Random allotment of treatments to different plots is known as randomization.
(b) Replication
Repetition of treatments under test in an experiment is known as replication.
(c) Local control
Randomization: To obtain a better estimate of a treatment mean, the treatment should be exposed
to all the types of soil variation existing in the experimental area. This helps to give a fair
comparison between treatments. Randomization gives every treatment an equal chance to show
its performance. The statistical procedures employed for comparing the means of different
treatments hold good only when the treatments are allotted at random.
Randomization avoids personal bias in the allotment of treatments. This is necessary for
the validity of the use of the standard error. In short, randomization helps to provide
unbiased estimates of (a) treatment means and (b) experimental error.
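A minimal sketch of random allotment in Python, assuming a completely randomized layout with 5 treatments and 4 plots each. The function name `randomize_crd`, the treatment labels and the seed are illustrative choices, not part of the manual.

```python
import random


def randomize_crd(treatments, replications, seed=None):
    """Allot every treatment to `replications` plots completely at random."""
    rng = random.Random(seed)          # a seed makes the plan reproducible
    plan = [t for t in treatments for _ in range(replications)]
    rng.shuffle(plan)                  # every ordering of the plots is equally likely
    return plan


layout = randomize_crd(["A", "B", "C", "D", "E"], replications=4, seed=1)
for plot, treatment in enumerate(layout, start=1):
    print(f"plot {plot:2d}: treatment {treatment}")
```

Because the allotment comes from a shuffle rather than from the experimenter's choice, no treatment is systematically favoured by better plots.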
Replications: The number of experimental units on which a particular treatment is applied is
called the number of replications of that treatment.
The purpose of replication is to obtain more information (more degrees of freedom) for
estimating and assessing the experimental error and to obtain estimates of effects with
smaller standard errors.
Variation in soil fertility cannot be avoided owing to its unpredictable nature. The
experimenter, therefore, seeks to average out its influence over the different treatments by
repetition. If a treatment is repeated n times, the mean of these repetitions will be subject to a
standard error of $\sigma/\sqrt{n}$, where σ = standard deviation of an individual plot, estimated
from the experiment. This means that as n increases the standard error of a treatment mean goes on
decreasing and smaller differences between the treatments can be detected. If the area is limited,
it is better to increase the replications than the plot size. Thus, replications are necessary for
(i) estimating the experimental error
(ii) increasing the precision of treatment means, and
(iii) increasing the sensitiveness of the test of significance by decreasing the standard error.
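The effect of replication on the standard error of a treatment mean can be illustrated numerically. In this sketch the value of sigma is hypothetical and the helper name `se_of_mean` is made up for illustration.

```python
import math


def se_of_mean(sigma, n):
    """Standard error of a treatment mean based on n replications: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 4.0  # hypothetical plot-to-plot standard deviation
for n in (2, 4, 8, 16):
    print(f"n = {n:2d}  standard error = {se_of_mean(sigma, n):.3f}")
```

Quadrupling the number of replications only halves the standard error, which is why gains in precision become costly beyond a moderate number of replications.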
Local control: Local control means control of all factors except the ones under
investigation. Variation does occur if the experimenter is careless during the conduct
of the experiment and does not carry out the operations in all the plots on the same day or
part of the day. For example, in varietal or manurial experiments, operations like sowing,
weeding or interculturing have to be completed on the same day. Weeding done in some plots on
one day and in the remaining plots on the next day is likely to cause variation, particularly
when it is preceded or followed by rain, cloudy weather or drought conditions. If the
experimental area is big and weeding cannot be completed in one day due to shortage of labour
or some such unavoidable cause, the operation should be completed block-wise on the same day or
part of the day, so that every block containing all the treatments receives similar care and
the variation caused by that operation between blocks is isolated as replication (block)
variation.
Some Preliminary
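Statistical model:
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$
Where, $Y_{ij}$ = Response or yield from the jth unit receiving the ith treatment
$\mu$ = General mean
$\tau_i$ = Effect of ith treatment
$\varepsilon_{ij}$ = Uncontrolled variation associated with the jth unit receiving the ith treatment.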
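Source of variation   DF        Sum of Squares (SS)   M.S. (SS/DF)   Cal. F
Treatment             (t-1)     $\frac{1}{r}\sum_{i=1}^{t} Y_{i.}^{2} - \frac{1}{rt}\left(\sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}\right)^{2}$   MST   MST/MSE
Error                 t(r-1)    By subtraction   MSE
Total                 (rt-1)    $\sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^{2} - \frac{1}{rt}\left(\sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}\right)^{2}$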
Analysis of variance : Completely randomized design with unequal replication.
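Source of variation   DF        Sum of Squares (SS)   M.S. (SS/DF)   Cal. F
Treatment             (t-1)     $\sum_{i=1}^{t}\frac{Y_{i.}^{2}}{r_i} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{r_i} Y_{ij}\right)^{2}}{\sum_{i=1}^{t} r_i}$   MST   MST/MSE
Error                 $\sum_{i=1}^{t} r_i - t$    By subtraction   MSE
Total                 $\sum_{i=1}^{t} r_i - 1$    $\sum_{i=1}^{t}\sum_{j=1}^{r_i} Y_{ij}^{2} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{r_i} Y_{ij}\right)^{2}}{\sum_{i=1}^{t} r_i}$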
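The sums of squares for a completely randomized design with unequal replication can be sketched in Python as below. The function name `crd_anova` and the three small treatment groups are hypothetical; the numbers are chosen only so that the arithmetic is easy to follow by hand and are not data from this manual.

```python
def crd_anova(groups):
    """One-way ANOVA for a CRD; `groups` maps treatment -> list of observations.

    The number of observations may differ from treatment to treatment.
    """
    all_obs = [y for obs in groups.values() for y in obs]
    n_total = len(all_obs)                       # sum of r_i
    t = len(groups)                              # number of treatments
    cf = sum(all_obs) ** 2 / n_total             # correction factor
    total_ss = sum(y * y for y in all_obs) - cf
    treat_ss = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    error_ss = total_ss - treat_ss               # by subtraction
    df_treat, df_error = t - 1, n_total - t
    mst, mse = treat_ss / df_treat, error_ss / df_error
    return {"treat_ss": treat_ss, "error_ss": error_ss, "total_ss": total_ss,
            "df_treat": df_treat, "df_error": df_error,
            "mst": mst, "mse": mse, "f": mst / mse}


# Hypothetical observations with 3, 2 and 4 replications.
example = {"T1": [10, 12, 14], "T2": [20, 22], "T3": [15, 17, 19, 21]}
result = crd_anova(example)
for name, value in result.items():
    print(f"{name:9s} {value:8.2f}")
```

With equal replication the same function applies unchanged, because each treatment total is simply divided by its own number of observations.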
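$$\mathrm{SEm} = \sqrt{\frac{MSE}{r \ \text{or}\ r_0}} \qquad \text{or} \qquad \mathrm{SEd} = \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$
Where, r = Number of observations per treatment (equal number of observations)
$r_0$ = Harmonic mean of the numbers of observations for different treatments (when
the number of observations per treatment is unequal).
$$r_0 = \frac{\text{No. of treatments}}{\dfrac{1}{r_1} + \dfrac{1}{r_2} + \dots + \dfrac{1}{r_t}}$$
Where $r_1, r_2, \dots$ are the numbers of observations for different treatments.
$$CV\,\% = \frac{\sqrt{MSE}}{\bar{Y}_{..}} \times 100$$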
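The standard errors, the harmonic mean of replications and the coefficient of variation can be wrapped in small helper functions. This is a sketch only: the function names are invented here, and the replication numbers, error mean square and grand mean in the usage lines are illustrative values.

```python
import math


def harmonic_mean_reps(reps):
    """r0 = number of treatments / (1/r1 + 1/r2 + ... + 1/rt)."""
    return len(reps) / sum(1 / r for r in reps)


def se_mean(mse, r):
    """Standard error of a treatment mean; pass r0 when replication is unequal."""
    return math.sqrt(mse / r)


def se_diff(mse, r_i, r_j):
    """Standard error of the difference between two treatment means."""
    return math.sqrt(mse * (1 / r_i + 1 / r_j))


def cv_percent(mse, grand_mean):
    """Coefficient of variation in percent."""
    return math.sqrt(mse) / grand_mean * 100


reps = [6, 5, 6, 4]            # illustrative unequal replication numbers
mse, grand_mean = 4.0, 20.0    # hypothetical error mean square and grand mean
r0 = harmonic_mean_reps(reps)
print("r0      =", round(r0, 3))
print("SEm     =", round(se_mean(mse, r0), 3))
print("SEd(1,2)=", round(se_diff(mse, reps[0], reps[1]), 3))
print("CV %    =", round(cv_percent(mse, grand_mean), 2))
```

When every treatment has the same number of observations, `harmonic_mean_reps` returns that common number and `se_diff` reduces to the familiar square root of 2 MSE / r.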
Statistical analysis of variance
Let there be N units and t treatments. If N is a multiple of t, i.e. N = nt, each treatment
can be allotted to n units at random. As the only source of assignable variation is the effect of
treatments, the total variation in the data under C.R.D. can be analysed as a one-way
classification. The number of replications per treatment should be such that the degrees of
freedom for error are at least 10. The actual computation is illustrated by the following
example.
: CRD Calculation(with equal replication):
Ex. Yield per plot of 5 varieties of wheat, each applied to 4 plots at random.
Now rank the treatment means ($\bar{Y}$):
C D B E A
16 12 11 9 8
--- ---------- --------
---------
Varieties which do not differ significantly are underlined by a common bar.
Normally we would allot the same number of experimental units to each treatment.
However, on account of death or failure in performance (e.g. an animal that stops giving milk),
it can happen that we end the experiment with an unequal number of experimental units in
different treatments. The analysis continues to be simple; only the divisors in the sums of
squares change. The variance of the difference between two treatment means depends on the
number of observations in each mean:
$$V(\bar{Y}_i - \bar{Y}_j) = S^{2}\left(\frac{1}{r_i} + \frac{1}{r_j}\right)$$
Where $S^{2}$ = Error m. s., and $r_i$ and $r_j$ are the numbers of observations that make up
$\bar{Y}_i$ and $\bar{Y}_j$, respectively.
Example: An experiment is conducted to determine the soil moisture deficit resulting from varying amounts of
residual timber left after cutting trees in the forest. The measurements of moisture deficit are given in the
following table. Perform the ANOVA test and construct confidence intervals for treatment differences.
Moisture deficit under different treatments
Treatment Moisture deficit in soil
T1 1.44 1.64 1.20 1.48 1.55 1.46
T2 2.65 3.82 2.76 2.12 2.78
T3 1.02 1.22 1.05 1.16 1.32 1.04
T4 0.68 0.82 0.95 0.84
Solution: H0: All treatments retain the soil moisture equally or the effect of all treatments to retain soil
moisture is equal.
Totals of moisture deficit under different treatments
Treatment Moisture deficit in soil Total Mean
T1 1.44 1.64 1.20 1.48 1.55 1.46 7.31 1.218
T2 2.65 3.82 2.76 2.12 2.78 14.13 2.826
T3 1.02 1.22 1.05 1.16 1.32 1.04 6.81 1.135
T4 0.68 0.82 0.95 0.84 3.29 0.822
Since the calculated F value (12.03) exceeds the table value of F at the 1% level of significance (5.18), the
difference between the treatments in retaining soil moisture is highly significant. Therefore, the null
hypothesis is rejected.
Critical difference between treatment 1 and 2, Treatment 2 and 3
= 0.961
Critical difference between treatment 1 and 4, Treatment 3 and 4
= 1.075
Critical difference between treatment 2 and 4
= 1.117
Now, keeping the treatment means in descending order
Treatment T2 T1 T3 T4
Mean yield 2.826 1.218 1.135 0.822
CV(%) = 38.25
Conclusion: Treatment T2 showed its superiority to retain the soil moisture among all other treatments. All
other treatments are statistically alike for the purpose of retaining soil moisture.
EXERCISE NO. 12
PROBLEMS ON CRD
------------------------------------------------------------------------------------------------------------
Exercise 12.1: Wood density (g/cc) observed on a randomly collected set of stems
belonging to different cane species are given below:
Some Preliminary
When the experimental material is heterogeneous, efforts should be made to divide it into
homogeneous groups, each of size equal to the number of treatments; each group constitutes a
replication. The treatments are applied to the units of a group at random. Fresh
randomization is to be followed in assigning treatments to the experimental units of each group.
In the case of a field experiment, if it is observed that the fertility gradient of the field is
in one direction, the whole field may be divided into a number of blocks. The number of plots in
each block is equal to the number of treatments, so that each block is a replicate.
The blocks should be either rectangular or square, and the experimental area should be made
as compact as possible. This reduces the difference in soil fertility within the blocks to a
minimum.
The fertility within a block should be as uniform as possible. During the course of the
experiment, a uniform technique should be employed for all the plots of the same block. If
necessary, changes in technique and other conditions may be made between the blocks, but
within the blocks uniformity should be maintained.
Advantages
(i) Accuracy: Blocking can increase precision by removing one source of variation from the
experimental error.
(ii) Flexibility: There is no restriction on the number of treatments or the number of blocks,
so long as each treatment occurs the same number of times in each block.
(iii) Easy computation: The statistical analysis is relatively simple. Moreover, any number of
treatments may be omitted from the analysis without complicating it.
(iv) It is possible to separate the sum of squares for error into components corresponding to
particular treatment effects.
Disadvantages
(i) The efficiency of the design decreases as the number of treatments, and hence the block
size, increases.
(ii) Missing data can cause some difficulty in the analysis.
(iii) The design is less efficient than others in the presence of more than one source of
variation.
Statistical model :
Yij = µ + Ti + Bj + Eij
Where,
Yij = Yield of ith treatment in jth replication.
µ = general mean
Ti = Effect due to ith treatment
Bj = Effect due to jth replication
Eij = Uncontrolled variation in plot receiving ith treatment in jth replication.
Analysis of Variance
$$\text{Total S.S.} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^{2} - \frac{\left(\sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}\right)^{2}}{rt}$$
The standard error of a treatment mean is
$$S.Em. = \sqrt{\frac{MSE}{r}}$$
The standard error of the difference between two treatment means based on r
replications is estimated by the relation
$$S.Ed. = \sqrt{\frac{2\,MSE}{r}}$$
where MSE = Error M.S. and r = No. of replications.
Ex. 13 : The yields of 6 varieties of wheat, in kg/plot, are given below. The number of
replications is 5, the plot size is 1/20 acre, and the varieties are represented by A, B, C, D,
E & F.
Treatment Replication Treat. Treat.
I II III IV V Total means
A 20 26 30 28 23
B 9 12 10 16 7
C 12 15 16 14 14
D 17 10 20 23 20
E 28 26 23 35 30
F 40 50 56 64 70
Total
Statistical analysis :
Treatment Replication Treat. Treat.
I II III IV V Total means
A 20 26 30 28 23 127 25.40
B 9 12 10 16 7 54 10.80
C 12 15 16 14 14 71 14.20
D 17 10 20 23 20 90 18.00
E 28 26 23 35 30 142 28.40
F 40 50 56 64 70 280 56.00
Total 126 139 155 180 164 764
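Analysis :
1. General Mean = Grand total / Total no. of observations = 764 / 30 = 25.47
2. Correction Factor (C. F.) = (Grand total)$^2$ / Total no. of observations = (764)$^2$ / 30 = 19456.53
3. Total S. S. (29 d.f.) = $\sum$ (Individual observations)$^2$ - C. F.
   = 27000 - 19456.53 = 7543.47
4. Treatment S. S. (5 d.f.) = $\sum$ (Treatment total)$^2$ / No. of replications - C. F.
   = 130750 / 5 - 19456.53 = 26150.00 - 19456.53 = 6693.47
5. Replication S. S. (4 d.f.) = $\sum$ (Replication total)$^2$ / No. of treatments - C. F.
   = 118518 / 6 - 19456.53 = 19753.00 - 19456.53 = 296.47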
ANOVA TABLE :
Source        D.F.              S.S.       M.S.       Cal. F    Tab. F   Result
Replication   4  (r-1)          296.47     74.12      2.67      2.87     NS
Treatment     5  (t-1)          6693.47    1338.69    48.36*    2.71     Sig.
Error         20 (r-1)(t-1)     553.53     27.68
Total         29 (rt-1)         7543.47
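The sums of squares in this ANOVA table can be checked with a short Python sketch that works directly from the yield table of Ex. 13. The function name `rbd_anova` and the dictionary layout are illustrative choices; the F ratios it prints may differ from the table in the last decimal place because the table uses rounded mean squares.

```python
# Yields (kg/plot) of varieties A-F in replications I-V, from the table above.
yields = {
    "A": [20, 26, 30, 28, 23],
    "B": [9, 12, 10, 16, 7],
    "C": [12, 15, 16, 14, 14],
    "D": [17, 10, 20, 23, 20],
    "E": [28, 26, 23, 35, 30],
    "F": [40, 50, 56, 64, 70],
}


def rbd_anova(data):
    """Treatment x replication ANOVA for a randomized block design."""
    t = len(data)                                  # number of treatments
    r = len(next(iter(data.values())))             # number of replications
    grand_total = sum(sum(row) for row in data.values())
    cf = grand_total ** 2 / (r * t)                # correction factor
    total_ss = sum(y * y for row in data.values() for y in row) - cf
    treat_ss = sum(sum(row) ** 2 for row in data.values()) / r - cf
    rep_totals = [sum(row[j] for row in data.values()) for j in range(r)]
    rep_ss = sum(x ** 2 for x in rep_totals) / t - cf
    error_ss = total_ss - treat_ss - rep_ss        # by subtraction
    mse = error_ss / ((r - 1) * (t - 1))
    return {"cf": cf, "total_ss": total_ss, "treat_ss": treat_ss,
            "rep_ss": rep_ss, "error_ss": error_ss, "mse": mse,
            "f_rep": (rep_ss / (r - 1)) / mse,
            "f_treat": (treat_ss / (t - 1)) / mse}


result = rbd_anova(yields)
for name, value in result.items():
    print(f"{name:9s} {value:10.2f}")
```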
Conclusion:
It is clear from the table that the treatment effect is significant at the 5% level, i.e. there
are significant differences between the treatment means.
Now we have to test the significance of the difference between individual treatment means; this
is done with the help of the Least Significant Difference (L.S.D.) or Critical Difference (C.D.).
Note:-
If the F test reveals that treatments are non-significant, then there is no need to find out value
of L.S.D. or C.D.
Treatments                F       E       A       D       C       B
Mean yields in kg/plot    56.00   28.40   25.40   18.00   14.20   10.80
                                  -------------
                                                  -------------
                                                          -------------
The treatments which do not differ significantly have been underlined by a common bar. The
treatment F has been found to be the best of all treatments.
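The pairwise comparison behind the common bars can be sketched as follows. The error mean square and number of replications come from the ANOVA table above; the t value of 2.086 (two-sided 5% point for 20 error d.f.) is an assumption taken from a standard t table, and the variable names are illustrative.

```python
import math
from itertools import combinations

# Treatment means (kg/plot) in descending order, from the table above.
means = {"F": 56.00, "E": 28.40, "A": 25.40, "D": 18.00, "C": 14.20, "B": 10.80}
mse, r = 27.68, 5        # error M.S. and number of replications
t_5pct_20df = 2.086      # assumed: two-sided 5% t value for 20 d.f. (standard t table)

se_d = math.sqrt(2 * mse / r)      # standard error of a difference
cd = t_5pct_20df * se_d            # critical difference at the 5% level

# Pairs of treatments whose means differ by less than the C.D.
not_different = [(a, b) for a, b in combinations(means, 2)
                 if abs(means[a] - means[b]) < cd]

print("S.Ed. =", round(se_d, 3))
print("C.D.  =", round(cd, 2))
print("pairs not differing significantly:", not_different)
```

Each pair listed by the script corresponds to two treatments joined by a common bar.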
EXERCISE NO. 13
Problems on RBD
------------------------------------------------------------------------------------------------
Example 13.1: An experiment with 10 treatments was carried out in a randomized block
design with three replications for urdbean, variety LBG-17. The seed yield (q/ha) under
different treatments and replications is given in the following table. Analyze the data and
interpret the results.
Seed yield (q/ha)
Block   Treatments
        1       2       3       4       5       6       7       8       9       10
I       16.22   25.78   28.59   27.47   26.13   23.96   16.99   18.21   37.75   32.15
II      24.59   33.97   26.08   18.32   24.77   23.23   10.05   25.72   39.88   36.71
III     33.39   33.78   34.16   27.88   23.01   28.17   31.62   20.55   39.20   39.83
Some Preliminary
[To consult the statistician after an experiment is finished is often merely to ask
him to conduct a post mortem examination. He can perhaps say what the
experiment died of. : Ronald Fisher]
------------------------------------------------------------------------------------------------
LATIN SQUARE DESIGN
Columns
D C B A
Rows C B A D
B A D C
A D C B
What is a Latin square design, and under what circumstances is it preferred?
While planning an experiment, restrictions are imposed as needed, and the design changes
according to these restrictions.
A design which can simultaneously control variation in two directions is known as a Latin
square design. The name does not mean that the length and breadth of the plots must be equal; it
means that the number of rows and the number of columns are equal. It is reliable and gives
precise results.
1. It is preferred when the experimental material can be divided into homogeneous groups in
one way and also into homogeneous groups in another way.
Ex. A field having a fertility gradient in two directions.
2. Animals can be divided into groups according to their body weight, age, lactation number, etc.
3. In cross-over trials, the Latin square design is most suitable.
Randomization involves placing the treatments at random in the positions of the square, subject
to the restriction that each treatment occurs once and only once in each row and in each column.
The basic principle, as stated by Fisher, is that each plot has an equal probability of receiving
any of the possible treatments, and each pair of plots an equal probability of being treated
alike. Yates discussed in detail the procedures necessary for the randomization of Latin squares
from 3 x 3 to 12 x 12. In general, if we have all the possible arrangements for a Latin square of
a given dimension, the process of randomization involves:
(i) Drawing one of these at random, for example, for the 5 x 5 square:
      C1  C2  C3  C4  C5
R1    A   B   C   D   E
R2    E   A   B   C   D
R3    D   E   A   B   C
R4    C   D   E   A   B
R5    B   C   D   E   A
(ii) Randomizing the rows:
      C1  C2  C3  C4  C5
R3    D   E   A   B   C
R1    A   B   C   D   E
R5    B   C   D   E   A
R4    C   D   E   A   B
R2    E   A   B   C   D
(iii) Randomizing the columns:
      C1  C5  C3  C2  C4
R3    D   C   A   E   B
R1    A   E   C   B   D
R5    B   A   D   C   E
R4    C   B   E   D   A
R2    E   D   B   A   C
(iv) Letter randomization, to obtain the Latin square for the actual conduct of the experiment:
Serial no.         1   2   3   4   5
Original letter    A   B   C   D   E
Random letter      C   E   B   D   A
Put these letters in place of the original letters. The final Latin square design to be used
for the experiment is:
      1   2   3   4   5
1     D   B   C   A   E
2     C   A   B   E   D
3     E   C   D   B   A
4     B   E   A   D   C
5     A   D   E   C   B
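The row, column and letter randomization steps can be traced in Python. This sketch replays the particular row order, column order and letter mapping used in the example above, so the result can be compared with the final square; in practice these three permutations would be drawn at random (for instance with `random.sample`). The function names are illustrative.

```python
# Step (i): the chosen 5 x 5 square, rows R1..R5 as strings of treatment letters.
base = ["ABCDE", "EABCD", "DEABC", "CDEAB", "BCDEA"]
row_order = [3, 1, 5, 4, 2]                # step (ii): new order of the rows
col_order = [1, 5, 3, 2, 4]                # step (iii): new order of the columns
relabel = dict(zip("ABCDE", "CEBDA"))      # step (iv): A->C, B->E, C->B, D->D, E->A


def randomize_latin_square(square, row_order, col_order, relabel):
    """Apply a row permutation, a column permutation and a letter relabelling."""
    rows = [square[i - 1] for i in row_order]
    shuffled = ["".join(row[j - 1] for j in col_order) for row in rows]
    return ["".join(relabel[letter] for letter in row) for row in shuffled]


def is_latin_square(square):
    """True if every letter occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok


final = randomize_latin_square(base, row_order, col_order, relabel)
for row in final:
    print(" ".join(row))
print("valid Latin square:", is_latin_square(final))
```

Permuting rows, permuting columns and renaming letters each preserve the Latin square property, which is why the final layout still has every treatment once in each row and column.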
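Statistical Model :
$$Y_{ij(k)} = \mu + R_i + C_j + T_{(k)} + \varepsilon_{ij(k)}$$
i = 1, 2, ..., r;  j = 1, 2, ..., c;  k = 1, 2, ..., t  and  r = c = t
Where :
$Y_{ij(k)}$ = Observation from the plot in the ith row and jth column receiving the kth treatment
$\mu$ = General mean
$R_i$ = Effect of ith row
$C_j$ = Effect of jth column
$T_{(k)}$ = Effect of kth treatment
$\varepsilon_{ij(k)}$ = Uncontrolled variation associated with that plot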
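Source of variation   DF            Sum of Squares   M.S.
Treatments            (t-1)         $\frac{1}{t}\sum_{k=1}^{t} Y_{..(k)}^{2} - \frac{1}{t^{2}}\left(\sum_{i,j,k=1}^{t} Y_{ij(k)}\right)^{2}$
Error                 (t-1)(t-2)    By difference   MSE or $S_e^{2}$ = SSE/DF
Total                 ($t^{2}$-1)   $\sum_{i,j,k=1}^{t} Y_{ij(k)}^{2} - \frac{1}{t^{2}}\left(\sum_{i,j,k=1}^{t} Y_{ij(k)}\right)^{2}$
$$C.V.\,\% = \frac{\sqrt{S_e^{2}}}{\bar{Y}} \times 100$$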
When the experimental material is not completely homogeneous and more than one kind of
variation is observed in the material, e.g. the fertility gradient in the field runs in two
directions, we divide the experimental area into rows and columns in such a way that the
variation in the experimental material is controlled in two directions. All the treatments
under study are then applied at random within each row and each column. An experimental design
of this kind is called a Latin Square Design (LSD).
Advantages
(iii) Chief advantage of LSD is that, it controls the heterogeneity of soil in two directions instead
of one as in case of RBD.
(iv) The precision of experiment is increased because of compact blocks.
Disadvantages
(ii) This design is not as flexible as RBD, because the number of rows, columns and treatments
must be equal; this limits the number of treatments. The number of plots increases rapidly as
the number of treatments increases. So, for a large number of treatments, say beyond 12, LSD is
less efficient, as the size of the rows and columns also increases, introducing heterogeneity
as a source of error. Similarly, if the number of treatments is small, say less than 5, the
degrees of freedom for error become very small.
(iii) The analysis becomes very complicated if there are missing data or if treatments are
mis-assigned.
Method of analysis
Let there be t treatments.
Structure of ANOVA for LSD
Source of Variation   Degree of Freedom   Sum of Square   Mean Sum of Square   F
Rows                  t-1                 SSR
3. Total S. S.
Total S. S. = $\sum$ (Individual observations)$^2$ - C. F.
= ($9^2$ + ... + $2^2$) - C. F.
= 535 - 441 = 94
4. Row S. S.
Row S. S. = $\sum R_i^2 / t$ - C. F.
= [$(31)^2$ + ... + $(13)^2$] / 5 - C. F. = 481 - 441 = 40
5. Column S. S.
6. Treatment S. S.
Treatment totals:
F1 = 17
F2 = 25
F3 = 19
F4 = 16
F5 = 28
Total = 105
Treatment S. S. = $\sum T_k^2 / t$ - C. F.
= 463.0 - 441.0 = 22
7. Error S. S.
Error S. S. = Total S. S. - (Row S. S. + Column S. S. + Treatment S. S.)
= 94.0 - (40.0 + 18.8 + 22.0)
= 94.0 - 80.8
= 13.2
If we let t represent the number of treatments, rows and columns in a Latin square, the form of
the analysis is:
Source        Sum of squares                    Degrees of freedom
Rows          $\sum R_i^2 / t - G^2 / t^2$      (t - 1)
Columns       $\sum C_j^2 / t - G^2 / t^2$      (t - 1)
Treatments    $\sum T_k^2 / t - G^2 / t^2$      (t - 1)
Error         By difference                     (t - 1)(t - 2)
Total         -                                 ($t^2$ - 1)
Where $R_i$, $C_j$ and $T_k$ represent the row, column and treatment totals, and G the grand total (G.T.).
ANOVA TABLE :
Source         d. f.   S.S.     M.S.    Cal. F   Table F (n1, n2, 0.05)
Age group      4       40.00    10.0    9.09*    3.26
Weight group   4       18.80    4.7     4.27*
Feeds          4       22.00    5.5     5.00*
Error          12      13.20    1.1     -
Total          24      94.00    -       -
* Significant.
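The mean squares and F ratios in this table follow mechanically from the sums of squares worked out above. A minimal sketch, assuming the row, column and treatment sums of squares are those labelled age group, weight group and feeds (the dictionary keys are illustrative names):

```python
t = 5                                            # rows = columns = treatments
ss = {"age group (rows)": 40.0,
      "weight group (columns)": 18.8,
      "feeds (treatments)": 22.0}
total_ss = 94.0

error_ss = total_ss - sum(ss.values())           # by difference
error_df = (t - 1) * (t - 2)
mse = error_ss / error_df

# For each source: (mean square, calculated F); every source has t - 1 d.f.
anova = {name: (value / (t - 1), (value / (t - 1)) / mse)
         for name, value in ss.items()}

for name, (ms, f) in anova.items():
    print(f"{name:24s} M.S. = {ms:5.2f}   F = {f:5.2f}")
print(f"{'error':24s} M.S. = {mse:5.2f}   d.f. = {error_df}")
```

Each calculated F is compared with the table value for 4 and 12 degrees of freedom, 3.26 at the 5% level.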
Standard error of the difference
between two means (S.Ed.) = $\sqrt{(2 \times \text{Error m. s.}) / t}$
= $\sqrt{(2 \times 1.1) / 5}$ = 0.6633
C. D. (Critical difference) or least significant difference (L.S.D.)
Feed    F5    F2    F3    F1    F4
Mean    5.6   5.0   3.8   3.4   3.2
        ---------
              ---------
                    ---------------
Interpretation of results :
Among the various feeds, feed 5 gives the best performance and feed 4 the poorest.
Feeds F3, F1 and F4 have more or less equal effects.
Example 14.1: Yield of six hybrid maize varieties as influenced by plant population and
integrated nitrogen management from an experiment laid on LSD were recorded, with
layout, as follows-
Grain yield (q/ha)
68.69 (H2) 102.76 (H5) 94.84 (H1) 101.53 (H6) 79.23 (H4) 98.00 (H3)
79.50 (H3) 95.25 (H6) 60.60 (H2) 75.30 (H4) 106.80 (H5) 90.35 (H1)
89.50 (H1) 78.00 (H4) 110.20 (H5) 85.65 (H3) 73.55 (H2) 102.30 (H6)
106.20 (H6) 92.98 (H3) 81.45 (H4) 106.20 (H5) 86.25 (H1) 65.75 (H2)
85.50 (H4) 90.80 (H1) 98.00 (H6) 70.70 (H2) 92.55 (H3) 111.95 (H5)
114.65 (H5) 67.50 (H2) 90.45 (H3) 94.00 (H1) 102.20 (H6) 82.60 (H4)
Analyze the data and draw conclusion.
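As a starting point for the analysis, the row, column and hybrid totals can be tabulated from the layout. The sketch below only organizes the data given above and checks that each hybrid occurs six times; the variable names are illustrative, and the sums of squares, F tests and conclusions are left to the reader.

```python
from collections import defaultdict

# (grain yield in q/ha, hybrid) for each plot, row by row, as in the layout above.
plots = [
    [(68.69, "H2"), (102.76, "H5"), (94.84, "H1"), (101.53, "H6"), (79.23, "H4"), (98.00, "H3")],
    [(79.50, "H3"), (95.25, "H6"), (60.60, "H2"), (75.30, "H4"), (106.80, "H5"), (90.35, "H1")],
    [(89.50, "H1"), (78.00, "H4"), (110.20, "H5"), (85.65, "H3"), (73.55, "H2"), (102.30, "H6")],
    [(106.20, "H6"), (92.98, "H3"), (81.45, "H4"), (106.20, "H5"), (86.25, "H1"), (65.75, "H2")],
    [(85.50, "H4"), (90.80, "H1"), (98.00, "H6"), (70.70, "H2"), (92.55, "H3"), (111.95, "H5")],
    [(114.65, "H5"), (67.50, "H2"), (90.45, "H3"), (94.00, "H1"), (102.20, "H6"), (82.60, "H4")],
]

row_totals = [sum(y for y, _ in row) for row in plots]
col_totals = [sum(row[j][0] for row in plots) for j in range(len(plots))]

treat_totals = defaultdict(float)   # total yield per hybrid
counts = defaultdict(int)           # number of plots per hybrid
for row in plots:
    for y, hybrid in row:
        treat_totals[hybrid] += y
        counts[hybrid] += 1

grand_total = sum(row_totals)
print("row totals   :", [round(x, 2) for x in row_totals])
print("column totals:", [round(x, 2) for x in col_totals])
print("hybrid totals:", {h: round(v, 2) for h, v in sorted(treat_totals.items())})
print("grand total  :", round(grand_total, 2))
```

The three sets of totals feed directly into the row, column and treatment sums of squares of the LSD analysis.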
EXERCISE NO. 15
Some Preliminaries
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.41 99.42 99.42 99.43 99.43
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 27.13 27.05 26.98 26.92 26.87
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.45 14.37 14.31 14.25 14.20
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.96 9.89 9.82 9.77 9.72
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72 7.66 7.60 7.56
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.54 6.47 6.41 6.36 6.31
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.73 5.67 5.61 5.56 5.52
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.18 5.11 5.05 5.01 4.96
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.77 4.71 4.65 4.60 4.56
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.46 4.40 4.34 4.29 4.25
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.22 4.16 4.10 4.05 4.01
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 4.02 3.96 3.91 3.86 3.82
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.86 3.80 3.75 3.70 3.66
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.73 3.67 3.61 3.56 3.52
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.62 3.55 3.50 3.45 3.41
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.52 3.46 3.40 3.35 3.31
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.43 3.37 3.32 3.27 3.23
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.36 3.30 3.24 3.19 3.15
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.29 3.23 3.18 3.13 3.09
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.24 3.17 3.12 3.07 3.03
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.18 3.12 3.07 3.02 2.98
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.14 3.07 3.02 2.97 2.93
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.09 3.03 2.98 2.93 2.89
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13 3.06 2.99 2.94 2.89 2.85
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 3.02 2.96 2.90 2.86 2.81
27 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 3.06 2.99 2.93 2.87 2.82 2.78
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.96 2.90 2.84 2.79 2.75
29 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 3.00 2.93 2.87 2.81 2.77 2.73
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.91 2.84 2.79 2.74 2.70
31 7.53 5.36 4.48 3.99 3.67 3.45 3.28 3.15 3.04 2.96 2.88 2.82 2.77 2.72 2.68
32 7.50 5.34 4.46 3.97 3.65 3.43 3.26 3.13 3.02 2.93 2.86 2.80 2.74 2.70 2.65
33 7.47 5.31 4.44 3.95 3.63 3.41 3.24 3.11 3.00 2.91 2.84 2.78 2.72 2.68 2.63
34 7.44 5.29 4.42 3.93 3.61 3.39 3.22 3.09 2.98 2.89 2.82 2.76 2.70 2.66 2.61
35 7.42 5.27 4.40 3.91 3.59 3.37 3.20 3.07 2.96 2.88 2.80 2.74 2.69 2.64 2.60
A3: F table values at 5% level of significance
Df for Degrees of freedom for greater variance (numerator)
Smaller
variance 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 161.45 199.50 215.71 224.58 230.16 233.99 236.77 238.88 240.54 241.88 242.98 243.91 244.69 245.36 245.95
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.40 19.41 19.42 19.42 19.43
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.76 8.74 8.73 8.71 8.70
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.94 5.91 5.89 5.87 5.86
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.70 4.68 4.66 4.64 4.62
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00 3.98 3.96 3.94
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.60 3.57 3.55 3.53 3.51
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.31 3.28 3.26 3.24 3.22
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.10 3.07 3.05 3.03 3.01
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.94 2.91 2.89 2.86 2.85
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.82 2.79 2.76 2.74 2.72
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.72 2.69 2.66 2.64 2.62
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.63 2.60 2.58 2.55 2.53
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.57 2.53 2.51 2.48 2.46
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.51 2.48 2.45 2.42 2.40
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.46 2.42 2.40 2.37 2.35
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.41 2.38 2.35 2.33 2.31
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 2.34 2.31 2.29 2.27
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.34 2.31 2.28 2.26 2.23
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.31 2.28 2.25 2.22 2.20
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.28 2.25 2.22 2.20 2.18
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.26 2.23 2.20 2.17 2.15
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.24 2.20 2.18 2.15 2.13
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.22 2.18 2.15 2.13 2.11
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.20 2.16 2.14 2.11 2.09
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15 2.12 2.09 2.07
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20 2.17 2.13 2.10 2.08 2.06
28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.15 2.12 2.09 2.06 2.04
29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18 2.14 2.10 2.08 2.05 2.03
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.13 2.09 2.06 2.04 2.01
31 0.44 0.69 0.79 0.84 0.87 0.90 0.91 0.92 0.93 0.94 0.95 0.95 0.96 0.96 0.97
32 0.44 0.69 0.79 0.84 0.87 0.90 0.91 0.92 0.93 0.94 0.95 0.95 0.96 0.96 0.97
33 0.44 0.69 0.79 0.84 0.87 0.90 0.91 0.92 0.93 0.94 0.95 0.95 0.96 0.96 0.96
34 4.13 3.28 2.88 2.65 2.49 2.38 2.29 2.23 2.17 2.12 2.08 2.05 2.02 1.99 1.97
35 4.12 3.27 2.87 2.64 2.49 2.37 2.29 2.22 2.16 2.11 2.07 2.04 2.01 1.99 1.96