Full Notes Mohan Sir PDF
1.1. Introduction:
In the modern world of computer and information technology, the importance of statistics
is very well recognized by all the disciplines. Statistics has originated as a science of statehood
and found applications slowly and steadily in Agriculture, Economics, Commerce, Biology,
Medicine, Industry, Planning, Education and so on.
The word statistics in our everyday life means different things to different people. For a
layman, ‘Statistics’ means numerical information expressed in quantitative terms. A student
knows statistics more intimately as a subject of study like economics, mathematics, chemistry,
physics and others. It is a discipline, which scientifically deals with data, and is often described
as the science of data. For football fans, statistics are the information about rushing yardage,
passing yardage, and first downs, given at halftime. To the manager of a power generating station,
statistics may be information about the quantity of pollutants being released into the atmosphere
and the power generated. For a school principal, statistics is information on absenteeism, test
scores and teacher salaries. For medical researchers investigating the effects of a new drug,
statistics are data from patient diaries. For college students, statistics are the grade lists of
different courses, OGPA, CGPA etc... Each of these people is using the word statistics correctly,
yet each uses it in a slightly different way and for a somewhat different purpose.
The term statistics is ultimately derived from the Latin word Status or Statisticum
Collegium (council of state), the Italian word Statista ("statesman"), and the German word
Statistik, which means political state.
The Father of Statistics is Sir R. A. Fisher (Ronald Aylmer Fisher). The Father of Indian
Statistics is P.C. Mahalanobis (Prasanta Chandra Mahalanobis).
2) Geographical classification:
In this type of classification, the data are classified according to geographical region or
geographical location (area) such as District, State, Countries, City-Village, Urban-Rural, etc...
Ex: The production of paddy in different states in India, production of wheat in different
countries etc...
State-wise classification of production of food grains in India:
State Production (in tonnes)
Orissa 3,00,000
A.P 2,50,000
U.P 22,00,000
Assam 10,000
3) Qualitative classification:
In this type of classification, data are classified on the basis of attributes or quality
characteristics like sex, literacy, religion, employment, social status, nationality, occupation
etc... Such attributes cannot be measured on a numerical scale.
Ex: If the population is to be classified with respect to one attribute, say sex, then we can classify
it into males and females. Similarly, it can also be classified into 'employed' or
'unemployed' on the basis of another attribute, 'employment', etc...
Qualitative classification can be of two types as follows
(i) Simple classification (ii) Manifold classification
4) Quantitative classification:
In quantitative classification the data are classified according to quantitative
characteristics that can be measured numerically such as height, weight, production, income,
marks secured by the students, age, land holding etc...
Ex: Students of a college may be classified according to their height as given in the table
Height(in cm) No of students
100-125 20
125-150 25
150-175 40
175-200 15
3. Head note: It is used to explain certain points relating to the table that have not been included
in the title, captions or stubs. For example, the unit of measurement is frequently written
as head note such as ‘in thousands’ or ‘in million tonnes’ or ‘in crores’ etc...
4. Captions or Column Designations: Captions in a table stand for brief and self-explanatory
headings of vertical columns. Captions may involve headings and sub-headings as well.
Usually, a relatively less important and shorter classification should be tabulated in the columns.
5. Stubs or Row Designations: Stubs stand for brief and self-explanatory headings of
horizontal rows. Normally, a relatively more important classification is given in rows. Also a
variable with a large number of classes is usually represented in rows.
6. Body: The body of the table contains the numerical information. This is the most vital part of
the table. Data presented in the body are arranged according to the description or classification of
the captions and stubs.
7. Footnotes: If any item has not been explained properly, a separate explanatory note should be
added at the bottom of the table. Thus, they are meant for explaining or providing further details
about the data that have not been covered in title, captions and stubs.
8. Sources of data: At the bottom of the table a note should be added indicating the primary and
secondary sources from which data have been collected. This may preferably include the name
of the author, volume, page and the year of publication.
When three or more characteristics are represented in the same table, it is called three-way
tabulation. As the number of characteristics increases, the tabulation becomes increasingly
complicated and confusing.
Ex: Triple table (three way table): Population of country in different State according to Sex
and Education
Ex: Marks scored by 15 students: 55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74.
2. Find the class width: Divide the range (maximum entry − minimum entry) by the number of classes.
Round this result to get a convenient number. You might need to change the number of classes,
but the priority should be to use values that are easy to understand.
3. Find the class limits: You can use the minimum data entry as the lower limit of the first class.
To find the remaining lower limits, add the class width to the lower limit of the preceding class
(Add the class width to the starting point to get the second lower class limit. Add the class width
to the second lower class limit to get the third, and so on.).
4. Find the upper limit of the first class: List the lower class limits in a vertical column and
proceed to enter the upper class limits, which can be easily identified. Remember that classes
cannot overlap. Find the remaining upper class limits.
5. Go through the data set by putting a tally in the appropriate class for each data value. Use the
tally marks to find the total frequency for each class.
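To make the tally step concrete, here is a minimal Python sketch that groups the 15 marks listed above into exclusive classes. The class width of 20 and the starting point at the minimum mark are illustrative choices following the steps above, not values prescribed by the notes.

```python
# Minimal sketch: tally the 15 marks into exclusive classes of width 20 (width chosen for illustration).
marks = [55, 82, 45, 18, 29, 42, 62, 72, 83, 15, 75, 87, 93, 56, 74]

width = 20                      # range = 93 - 15 = 78; 78 / 5 classes = 15.6, rounded up to 20
lower = min(marks)              # use the minimum entry (15) as the first lower class limit

freq = {}
for m in marks:
    k = (m - lower) // width                       # index of the class this mark falls in
    lo = lower + k * width
    freq[(lo, lo + width)] = freq.get((lo, lo + width), 0) + 1

for (lo, hi), f in sorted(freq.items()):           # empty classes are simply omitted
    print(f"{lo}-{hi}: {f}")                       # e.g. 15-35: 3, 35-55: 2, ...
```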
One of the most convincing and appealing ways in which statistical results may be
presented is through diagrams and graphs. Just one diagram can represent a given set of data
more effectively than a thousand words. Moreover, even a layman who has nothing to do with
numbers can understand diagrams. Evidence of this can be found in newspapers,
magazines, journals, advertisements, etc....
Diagrams are nothing but geometrical figures like lines, bars, squares, cubes, rectangles,
circles, pictures, maps, etc... A diagrammatic representation of data is a visual form of
presentation of statistical data, highlighting their basic facts and relationships. If we draw
diagrams on the basis of the data collected, they will easily be understood and appreciated by all.
A diagram is readily intelligible and saves a considerable amount of time and energy.
6.2 Advantage/Significance of diagrams:
Diagrams are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and understandable.
3. They make comparison possible.
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
6.3 Demerits (or) limitations:
1. Diagrams are only approximate presentations of quantity.
2. Minute differences in values cannot be represented properly in diagrams.
3. Large differences in values spoil the look of the diagram, and it is impossible to show very wide gaps.
4. Some of the diagrams can be drawn by experts only, e.g. the pie chart.
5. Different scales portray different pictures to laymen.
6. Only data with similar characteristics can be compared.
7. They are of no utility to an expert for further statistical analysis.
6.5 Types of diagrams:
In practice, a very large variety of diagrams are in use and new ones are constantly being
added. For convenience and simplicity, they may be divided under the following heads:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
[Figure: multiple bar diagram of values for UP, AP and MH by year (1951, ...)]
ii) Cartogram:
In this technique, statistical facts are presented through maps, accompanied by various
types of diagrammatic presentation. They are generally used to present facts according to
geographical regions. Population and its constituents like births, deaths, growth, density,
production, imports, exports, and several other facts can be presented on maps with certain
colours, dots, crosses, points etc...
Systolic BP (mmHg)    No. of persons
100-109 7
110-119 16
120-129 19
130-139 31
140-149 41
150-159 23
160-169 10
170-179 3
Fig 7.3: Systolic Blood Pressure (BP) in mmHg of people
Construction of Histogram:
1) Construction of histogram for frequency distributions having equal class intervals:
i) Convert the data into exclusive class intervals if they are given as inclusive class
intervals.
ii) Each class interval is drawn on the X-axis as a section or base (width of the rectangle)
equal to the magnitude of the class interval. On the Y-axis, we plot the corresponding
frequencies.
iii) Build rectangles on each class interval having height proportional to the corresponding
frequency of the class.
iv) It should be kept in mind that the rectangles are drawn adjacent to each other. These adjacent
rectangles thus formed give the histogram of the frequency distribution.
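As an illustration of these steps, the following minimal Python sketch draws a histogram for the systolic blood pressure table above. It assumes the matplotlib library is available; the inclusive classes are first converted to exclusive boundaries as step (i) requires.

```python
# Minimal sketch: histogram for equal class intervals using matplotlib (assumed available).
import matplotlib.pyplot as plt

# Exclusive class boundaries and frequencies from the systolic BP table above
# (inclusive classes 100-109, 110-119, ... converted to exclusive 99.5-109.5, ...).
edges = [99.5, 109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5]
freqs = [7, 16, 19, 31, 41, 23, 10, 3]

widths = [edges[i + 1] - edges[i] for i in range(len(freqs))]
plt.bar(edges[:-1], freqs, width=widths, align='edge',
        edgecolor='black')                 # adjacent rectangles, heights = frequencies
plt.xlabel('Systolic BP (mmHg)')
plt.ylabel('No. of persons')
plt.title('Histogram with equal class intervals')
plt.show()
```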
2) Histogram for frequency distributions having unequal class intervals:
i) In the case of a frequency distribution with unequal class intervals, it becomes a bit
difficult to construct a histogram.
ii) In such cases, a correction for the unequal class intervals is essential, by determining the
"frequency density" or "relative frequency".
iii) Here the height of a bar in the histogram is the frequency density instead of the frequency,
and these densities are plotted on the Y-axis.
iv) The frequency density is determined using the following formula:
Frequency density = Frequency of the class / Width of the class interval
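A small sketch of the frequency-density correction, using made-up classes of unequal width purely for illustration:

```python
# Minimal sketch: frequency density = class frequency / class width (illustrative classes assumed).
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]   # unequal widths
freqs   = [5, 8, 12, 16]

for (lo, hi), f in zip(classes, freqs):
    density = f / (hi - lo)                          # height of the histogram bar
    print(f"{lo}-{hi}: frequency={f}, width={hi - lo}, frequency density={density:.2f}")
```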
8.1 Introduction
While studying a population with respect to a variable/characteristic of interest, we
may get a large number of raw observations in uncondensed form. It is not possible to
grasp any idea about the characteristic by looking at all the observations. Therefore, it is better to
get a single number for each group. That number must be a good representative of all the
observations, giving a clear picture of the characteristic. Such a representative number is a
central value for these observations. This central value is called a measure of central
tendency or an average or a measure of location.
8.2 Definition:
“A measure of central tendency is a typical value around which other figures
congregate.”
8.3 Objective and function of Average
1) To provide a single value that represents and describes the characteristic of entire group.
2) To facilitate comparison between and within groups.
3) To draw a conclusion about population from sample data.
4) To form a basis for statistical analysis.
8.4 Essential characteristics/Properties/Pre-requisite for a good or an ideal Average:
An ideal average should possess the following characteristics.
1. It should be easy to understand and simple to compute.
2. It should be rigidly defined.
3. Its calculation should be based on all the items/observations in the data set.
4. It should be capable of further algebraic treatment (mathematical manipulation).
5. It should be least affected by sampling fluctuation.
6. It should not be much affected by extreme values.
7. It should be helpful in further statistical analysis.
8.5 Types of Average
Mathematical Averages:
1) Arithmetic Mean or Mean
   i) Simple Arithmetic Mean
   ii) Weighted Arithmetic Mean
   iii) Combined Mean
2) Geometric Mean
3) Harmonic Mean
Positional Averages:
1) Median
2) Mode
3) Quantiles
   i) Quartiles
   ii) Deciles
   iii) Percentiles
Commercial Averages:
1) Moving Average
2) Progressive Average
3) Composite Average
x̄ = (f1x1 + f2x2 + ⋯ + fkxk) / (f1 + f2 + ⋯ + fk) = (Σ fi xi) / N,  i = 1, 2, ..., k
c) Step-Deviation Method:
x̄ = A + [(Σ fi di') / N] × C,  i = 1, 2, ..., k
where, A = the assumed mean or any value in x,
N = Σ fi = the sum of the frequencies or total frequency,
di' = (xi − A) / C = the deviation of the ith value from the assumed mean divided by the class width C.
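The following minimal Python sketch computes the mean of a small frequency distribution by both the direct and the step-deviation methods; the mid-points, frequencies, assumed mean A and class width C are illustrative values, and the two methods agree.

```python
# Minimal sketch: arithmetic mean of a frequency distribution by the direct and
# step-deviation methods (mid-points, frequencies, A and C are illustrative values).
mid = [10, 20, 30, 40, 50]        # mid-points x_i
f   = [4, 6, 10, 6, 4]            # frequencies f_i
N   = sum(f)

mean_direct = sum(fi * xi for fi, xi in zip(f, mid)) / N

A, C = 30, 10                     # assumed mean and class width
d = [(xi - A) / C for xi in mid]  # d_i' = (x_i - A) / C
mean_step = A + (sum(fi * di for fi, di in zip(f, d)) / N) * C

print(mean_direct, mean_step)     # both give the same value (30.0 here)
```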
4. Combined Mean: If x̄1, x̄2, ..., x̄k are the means of k samples of sizes n1, n2, ..., nk respectively, then their
combined mean is given by
x̄c = (n1x̄1 + n2x̄2 + ⋯ + nkx̄k) / (n1 + n2 + ⋯ + nk)
5. Weighted Arithmetic Mean: If x1, x2, ..., xn are the values with weights w1, w2, ..., wn
assigned to them, then the weighted arithmetic mean is given by:
x̄w = (w1x1 + w2x2 + ⋯ + wnxn) / (w1 + w2 + ⋯ + wn) = (Σ wi xi) / (Σ wi)
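A short sketch of the weighted mean and the combined mean formulas above, with made-up values, weights and sample sizes:

```python
# Minimal sketch: weighted mean and combined mean (all numbers are illustrative).
x = [60, 75, 90]                  # values
w = [2, 3, 5]                     # weights assigned to them
weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)   # (Σ w_i x_i) / (Σ w_i)

means = [52.0, 58.0]              # means of two samples
sizes = [40, 60]                  # corresponding sample sizes
combined_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

print(weighted_mean, combined_mean)   # 79.5 and 55.6
```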
Uses of the weighted mean:
Weighted arithmetic mean is used in:
1. Construction of index numbers.
2. Comparison of results of two or more groups where number of items differs in each
group.
3. Computation of standardized death and birth rates.
4. When values of items are given in percentage or proportion.
2) Geometric Mean (GM):
i) For raw data/individual series/ungrouped data:
GM = Antilog [ (Σ log xi) / n ]
ii) For discrete frequency distribution (Ungrouped frequency distribution) data:
GM = Antilog [ (Σ fi log xi) / N ]
where, N = Σ fi = the sum of the frequencies or total frequency
iii) For continuous frequency distribution (Grouped frequency distribution) data:
GM = Antilog [ (Σ fi log mi) / N ]
where, N = Σ fi = the sum of the frequencies or total frequency
mi = Mid-points / mid values of class intervals
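A minimal sketch of the geometric mean computed through logarithms. Natural logarithms and exp() are used here in place of log tables and antilogs (the result is the same); the data values are illustrative.

```python
# Minimal sketch: geometric mean via logarithms, for raw data and for a frequency
# distribution (data values are illustrative).
import math

x = [4, 8, 16]
gm_raw = math.exp(sum(math.log(v) for v in x) / len(x))        # antilog of the mean log

xi = [2, 4, 8]                    # values (or mid-points)
fi = [1, 2, 1]                    # frequencies
N = sum(fi)
gm_freq = math.exp(sum(f * math.log(v) for f, v in zip(fi, xi)) / N)

print(round(gm_raw, 4), round(gm_freq, 4))   # 8.0 and 4.0
```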
Merits of Geometric mean:
1. It is rigidly defined.
2. It is based on all observations.
3. It is capable of further mathematical treatment.
4. It is not affected much by the fluctuations of sampling.
5. Unlike AM, it is not affected much by the presence of extreme values.
6. It is very suitable for averaging ratios, rates and percentages.
Demerits of Geometric mean:
1. Its calculation is not as simple as that of the A.M. and it is not easy to understand.
2. The GM may not be an actual value of the series.
3. It cannot be determined graphically or by inspection.
4. It cannot be used when the values are negative because if any one observation is
negative, G.M. becomes meaningless or doesn’t exist.
5. It cannot be used when the values are zero, because if any one observation is zero, G. M.
becomes zero.
6. It cannot be calculated for open-end classes.
HM = n / (1/x1 + 1/x2 + ⋯ + 1/xn) = n / Σ(1/xi)
where, n = number of observations
Computation of Harmonic Mean:
i) For raw data/individual-series/ungrouped data:
If x1, x2, ..., xn are 'n' observations, then their harmonic mean is given by:
HM = n / (1/x1 + 1/x2 + ⋯ + 1/xn) = n / Σ(1/xi)
ii) For frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
If x1, x2, ..., xk are 'k' observations with corresponding frequencies f1, f2, ..., fk, then their
harmonic mean is computed by:
HM = (f1 + f2 + ⋯ + fk) / (f1/x1 + f2/x2 + ⋯ + fk/xk) = N / Σ(fi/xi)
where, N = Σ fi = the sum of the frequencies or total frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
If m1, m2, ..., mk are the mid-points of the class intervals with corresponding frequencies f1, f2, ..., fk, then the harmonic mean is computed by:
HM = (f1 + f2 + ⋯ + fk) / (f1/m1 + f2/m2 + ⋯ + fk/mk) = N / Σ(fi/mi)
where, N = Σ fi = the sum of the frequencies or total frequency
mi = Mid-points / mid values of class intervals
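A minimal sketch of the harmonic mean for raw data and for a discrete frequency distribution, using illustrative values:

```python
# Minimal sketch: harmonic mean for raw data and for a frequency distribution
# (values are illustrative).
x = [2, 4, 4]
hm_raw = len(x) / sum(1 / v for v in x)                     # n / Σ(1/x_i)

xi = [10, 20, 40]                 # values (or class mid-points)
fi = [3, 5, 2]                    # frequencies
N = sum(fi)
hm_freq = N / sum(f / v for f, v in zip(fi, xi))            # N / Σ(f_i/x_i)

print(round(hm_raw, 4), round(hm_freq, 4))
```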
i) For raw data/individual-series/ungrouped data:
If x1, x2, ..., xn are 'n' observations, then arrange the given values in ascending order of
magnitude; the median is the [(n + 1)/2]th value.
ii) Frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
Step 1: Find the cumulative frequencies (CF).
Step 2: Obtain the total frequency N = Σ fi and find (N + 1)/2.
Step 3: See in the cumulative frequencies the value just greater than (N + 1)/2; the
corresponding value of x is the median.
2) Continuous frequency distribution (Grouped frequency distribution) data:
If m1, m2, ..., mk represent the mid-points of the k class intervals x0–x1, x1–x2, ..., x(k−1)–xk
with their corresponding frequencies f1, f2, ..., fk, then the steps given below
are followed for the calculation of the median in a continuous series.
Find N/2 and locate the median class (the class whose cumulative frequency is just greater than N/2).
Then apply the formula given below.
Median = Md = L + [(N/2 − c.f.) / f] × C
where L = lower limit of the median class, c.f. = cumulative frequency of the class preceding the
median class, f = frequency of the median class and C = width of the median class.
Remarks:
1. From the point of intersection of the 'less than' and 'more than' ogives, if a perpendicular is
drawn on the x-axis, the point so obtained on the horizontal axis gives the value of the
median.
Then apply the following formula to find the mode:
Mode = Mo = L + [(f1 − f0) / (2f1 − f0 − f2)] × C
Where, L = lower limit of the modal class,
C = class interval (width) of the modal class,
f1 = frequency of the modal class,
f0 = frequency of the class preceding the modal class,
f2 = frequency of the class succeeding the modal class.
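The median and mode formulas for a continuous (grouped) frequency distribution can be checked with a short sketch. The class limits and frequencies below are illustrative, and the modal class is assumed not to be the first or last class so that f0 and f2 exist.

```python
# Minimal sketch: median and mode of a grouped frequency distribution using the
# formulas above (class limits and frequencies are illustrative).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq    = [5, 8, 15, 10, 2]
N = sum(freq)

# Median: locate the class whose cumulative frequency first reaches N/2.
cf, k = 0, 0
while cf + freq[k] < N / 2:
    cf += freq[k]
    k += 1
L, C = classes[k][0], classes[k][1] - classes[k][0]
median = L + (N / 2 - cf) / freq[k] * C

# Mode: the class with the largest frequency is taken as the modal class
# (assumed here to be an interior class so that f0 and f2 exist).
m = freq.index(max(freq))
f1, f0, f2 = freq[m], freq[m - 1], freq[m + 1]
Lm, Cm = classes[m][0], classes[m][1] - classes[m][0]
mode = Lm + (f1 - f0) / (2 * f1 - f0 - f2) * Cm

print(round(median, 2), round(mode, 2))
```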
iii) Deciles: Deciles are nine in number and divide the whole series into ten equal parts.
They are represented by D1, D2, ..., D9.
First Decile: D1 = [(N + 1)/10]th value
Second Decile: D2 = [2(N + 1)/10]th value
:
Ninth Decile: D9 = [9(N + 1)/10]th value
iv) Percentiles: Percentiles are 99 in number and divide the whole series into 100 equal parts.
They are represented by P1, P2, ..., P99.
First Percentile: P1 = [(N + 1)/100]th value
Second Percentile: P2 = [2(N + 1)/100]th value
:
Ninety-ninth Percentile: P99 = [99(N + 1)/100]th value
quartile (Q3), i.e.
Q.D. = (Q3 − Q1) / 2
The range between the first quartile (Q1) and the third quartile (Q3) is called the inter-quartile
range (IQR), i.e. IQR = Q3 − Q1.
Half of the IQR is known as the semi inter-quartile range. Hence, Q.D. is also known as the semi
inter-quartile range.
Co-efficient of Q.D. = (Q3 − Q1) / (Q3 + Q1)
Computation of Q.D.:
i) For raw data/Individual series/ungrouped data:
Q.D. = (Q3 − Q1) / 2
Where
First quartile: Q1 = [(n + 1)/4]th value
Third quartile: Q3 = [3(n + 1)/4]th value
n = number of observations
ii) Frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
Q.D. = (Q3 − Q1) / 2
Where
First quartile: Q1 = [(N + 1)/4]th value
Third quartile: Q3 = [3(N + 1)/4]th value
N = Σ fi = Total frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
Q.D. = (Q3 − Q1) / 2
Where
First quartile: Q1 = L1 + [(N/4 − c1) / f1] × C1
Third quartile: Q3 = L3 + [(3N/4 − c3) / f3] × C3
Where, L1 & L3 = lower limits of the first & third quartile classes,
N = Σ fi = Total frequency,
f1 & f3 = frequencies of the first & third quartile classes,
c1 & c3 = cumulative frequencies of the classes preceding the first & third quartile classes,
C1 & C3 = widths of the class intervals.
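A minimal sketch of Q1, Q3 and the quartile deviation for a grouped frequency distribution, using the interpolation formulas above with illustrative classes and frequencies:

```python
# Minimal sketch: Q1, Q3 and the quartile deviation from a grouped frequency
# distribution (class limits and frequencies are illustrative).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freq    = [6, 10, 18, 6]
N = sum(freq)

def quartile(target):
    """Locate the class containing the target cumulative position and interpolate."""
    cf, k = 0, 0
    while cf + freq[k] < target:
        cf += freq[k]
        k += 1
    L, C = classes[k][0], classes[k][1] - classes[k][0]
    return L + (target - cf) / freq[k] * C

q1 = quartile(N / 4)
q3 = quartile(3 * N / 4)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```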
Merits of Q. D.:
1. It is simple to understand and easy to calculate.
2. It is rigidly defined.
3. It is not affected by the extreme values.
4. In the case of open-ended distribution, it is most suitable.
5. Since it is not influenced by the extreme values in a distribution, it is particularly
suitable in highly skewed distribution.
Demerits of Q. D.:
1. It is not based on all the items. It is based on two positional values Q1 and Q3 and ignores
the extreme 50% of the items.
2. It is not amenable to further mathematical treatment.
3. It is affected by sampling fluctuations.
4. Since it is a positional average, it is not considered as a measure of dispersion. It merely
shows a distance on scale and not a scatter around an average.
3) Mean Deviation (M.D.):
The range and quartile deviation are not based on all observations. They are positional
measures of dispersion. They do not show any scatter of the observations from an average. The
mean deviation is a measure of dispersion based on all the items in a distribution.
Definition:
“Mean deviation is the arithmetic mean of the absolute deviations of a series computed
from any measure of central tendency; i.e., the mean, median or mode, all the deviations are
taken as positive”.
M.D. = Σ|xi − A| / n
Where, M.D. = Mean Deviation,
A = any one measure of average, i.e. Mean or Median or Mode,
n = number of observations.
Co-efficient of M.D. = M.D. / (Mean or Median or Mode)
Computation of M.D.:
i) For raw data/Individual series/ungrouped data:
M.D. = Σ|xi − A| / n
Where, M.D. = Mean Deviation, xi = observations
ii) Frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
M.D. = Σ fi|xi − A| / N
Where, M.D. = Mean Deviation, xi = observations, N = Σ fi = total frequency
2) Continuous frequency distribution (Grouped frequency distribution) data:
M.D. = Σ fi|mi − A| / N
Where, mi = mid-points of class intervals
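A short sketch of the mean deviation about the mean for raw data and for a discrete frequency distribution (illustrative data):

```python
# Minimal sketch: mean deviation about the mean for raw data and for a discrete
# frequency distribution (values are illustrative).
x = [2, 4, 6, 8, 10]
mean = sum(x) / len(x)
md_raw = sum(abs(v - mean) for v in x) / len(x)            # Σ|x_i - A| / n

xi = [10, 20, 30]
fi = [2, 5, 3]
N = sum(fi)
mean_f = sum(f * v for f, v in zip(fi, xi)) / N
md_freq = sum(f * abs(v - mean_f) for f, v in zip(fi, xi)) / N   # Σ f_i|x_i - A| / N

print(md_raw, round(md_freq, 2))
```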
Merits of M. D.:
1. It is simple to understand and easy to compute.
2. It is rigidly defined.
3. It is based on all items of the series.
4. It is not much affected by the fluctuations of sampling.
S.D.(σ) = √[ Σ(xi − x̄)² / n ]
Where, S.D. = Standard Deviation,
xi = observations,
x̄ = Arithmetic Mean,
n = number of observations.
Co-efficient of S.D. = S.D. / Mean = σ / x̄
Computation of S.D.:
i) For raw data/Individual series/ungrouped data:
a) Deviations taken from Actual mean:
S.D.(σ) = √[ Σ(xi − x̄)² / n ]
b) Direct Method:
S.D.(σ) = √[ Σx²/n − (Σx/n)² ]
c) Short-cut method (Deviations are taken from assumed mean):
S.D.(σ) = √[ Σd²/n − (Σd/n)² ]
Where d stands for the deviation from the assumed mean = (xi − A)
ii) Frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
a) Deviations taken from Actual mean:
S.D.(σ) = √[ Σ fi(xi − x̄)² / N ]
Where, S.D. = Standard Deviation,
xi = observations,
x̄ = Arithmetic Mean,
fi = actual frequency,
N = Σ fi = Total frequency.
b) Direct Method:
S.D.(σ) = √[ Σfx²/N − (Σfx/N)² ]
c) Short-cut method (Deviations are taken from assumed mean):
S.D.(σ) = √[ Σfd²/N − (Σfd/N)² ]
Where d stands for the deviation from the assumed mean = (xi − A)
2) Continuous frequency distribution (Grouped frequency distribution) data:
a) Deviations taken from Actual mean:
S.D.(σ) = √[ Σ fi(mi − x̄)² / N ]
Where, mi = mid-points of class intervals,
x̄ = Arithmetic Mean,
fi = actual frequency,
N = Σ fi = Total frequency.
b) Direct Method:
S.D.(σ) = √[ Σfm²/N − (Σfm/N)² ]
c) Short-cut method (Deviations are taken from assumed mean):
S.D.(σ) = √[ Σfd²/N − (Σfd/N)² ]
Where d stands for the deviation from the assumed mean = (mi − A)
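The three computational forms of the standard deviation give the same answer; the following minimal sketch verifies this for an illustrative discrete frequency distribution (the values may equally be read as class mid-points).

```python
# Minimal sketch: standard deviation of a frequency distribution by the actual-mean,
# direct and short-cut methods, showing that they agree (data are illustrative).
import math

xi = [5, 15, 25, 35]              # values or class mid-points
fi = [3, 7, 6, 4]                 # frequencies
N = sum(fi)

mean = sum(f * x for f, x in zip(fi, xi)) / N
sd_actual = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(fi, xi)) / N)

sd_direct = math.sqrt(sum(f * x * x for f, x in zip(fi, xi)) / N
                      - (sum(f * x for f, x in zip(fi, xi)) / N) ** 2)

A = 25                            # assumed mean
d = [x - A for x in xi]
sd_short = math.sqrt(sum(f * di * di for f, di in zip(fi, d)) / N
                     - (sum(f * di for f, di in zip(fi, d)) / N) ** 2)

print(round(sd_actual, 4), round(sd_direct, 4), round(sd_short, 4))   # identical
```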
Mathematical properties of standard deviation (σ)
7. The S.D. of the first n natural numbers viz. 1, 2, 3, ..., n is given by
S.D.(σ) = √[ (n² − 1) / 12 ]
The square of the standard deviation is called the Variance.
Variance (σ²) = Σ(xi − x̄)² / n
Where, σ² = Variance,
xi = observations,
x̄ = Arithmetic Mean,
n = number of observations.
Computation of Variance:
i) For raw data/Individual series/ ungrouped data:
a) Deviations taken from Actual mean:
σ² = Σ(xi − x̄)² / n
b) Direct Method:
σ² = Σx²/n − (Σx/n)²
c) Short-cut method (Deviations are taken from assumed mean):
σ² = Σd²/n − (Σd/n)²
Where d stands for the deviation from the assumed mean = (xi − A)
ii) Frequency distribution data:
1) Discrete frequency distribution (Ungrouped frequency distribution) data:
a) Deviations taken from Actual mean:
σ² = Σ fi(xi − x̄)² / N
N = Σ fi = Total frequency
b) Direct Method:
σ² = Σfx²/N − (Σfx/N)²
c) Short-cut method (Deviations are taken from assumed mean):
σ² = Σfd²/N − (Σfd/N)²
Where d stands for the deviation from the assumed mean = (xi − A)
2) Continuous frequency distribution (Grouped frequency distribution) data:
a) Deviations taken from Actual mean:
σ² = Σ fi(mi − x̄)² / N
mi = mid-points of class intervals
b) Direct Method:
σ² = Σfm²/N − (Σfm/N)²
c) Short-cut method (Deviations are taken from assumed mean):
σ² = Σfd²/N − (Σfd/N)²
Where d stands for the deviation from the assumed mean = (mi − A)
Remarks: 1) Variance is independent of change of origin but not of change of scale.
{Change of Origin: If all values in the series are increased or decreased by a constant,
the Variance will remain the same.}
Coefficient of Variation (C.V.): Symbolically,
Coefficient of Variation (C.V.) = (S.D. / Mean) × 100 = (σ / x̄) × 100
Remarks:
1. Generally, the coefficient of variation is used to compare two or more series. If the coefficient of
variation (C.V.) is more for series-I as compared to series-II, it indicates that the
population (or sample) of series-I is more variable, less stable, less uniform, less
consistent and less homogeneous. If the C.V. is less for series-I as compared to series-II,
then series-I is less variable, more stable, more uniform, more consistent and more homogeneous.
4. Relationship between Q.D., M.D. & S.D.:
i) Q.D. = (2/3) S.D.
ii) M.D. = (4/5) S.D.
In both these distributions the value of mean and standard deviation is the same (Mean =
15, σ =5). But it does not imply that the distributions are alike in nature. The distribution on the
left-hand side is a symmetrical one whereas the distribution on the right-hand side is
asymmetrical or skewed. In this way, measures of central tendency & dispersion are
inadequate to depict all the characteristics of a distribution. Measures of skewness give an idea
about the shape of the curve & help us to determine the nature & extent of concentration of the
observations towards the higher or lower values of the distribution.
10.2 Definition:
"Skewness refers to asymmetry or lack of symmetry in the shape of a frequency
distribution curve"
"When a series is not symmetrical it is said to be asymmetrical or skewed."
10.3. Symmetrical Distribution.
An ideal symmetrical distribution is unimodal, bell shaped curve. The values of mean,
median and mode coincide. Spread of the frequencies on both sides from the centre point of the
curve is same. Then the distribution is symmetrical distribution.
According to Karl Pearson, it involves the mean, mode and standard deviation.
Absolute measure of skewness = Mean − Mode
Karl Pearson's coefficient of skewness (SkP) = (Mean − Mode) / S.D. = (Mean − Mode) / σ
In case the mode is ill-defined, the coefficient can be determined by the formula:
Karl Pearson's coefficient of skewness (SkP) = 3(Mean − Median) / S.D. = 3(x̄ − Md) / σ
Remarks:
1. For a moderately skewed distribution, the empirical relationship between mean, median and
mode is: Mode = 3 Median − 2 Mean ⇒ Mean − Mode = 3(Mean − Median)
2. Karl Pearson's coefficient of skewness ranges from −1 to +1, i.e. −1 ≤ SkP ≤ 1
3. SkP = 0 ⇒ Mean = Median = Mode, zero skewness (symmetrical distribution)
4. SkP positive ⇒ positively skewed distribution
5. SkP negative ⇒ negatively skewed distribution
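A minimal sketch of Karl Pearson's coefficient of skewness using the median form (appropriate when the mode is ill-defined); the data are illustrative.

```python
# Minimal sketch: Karl Pearson's coefficient of skewness using the median form
# Sk_P = 3(Mean - Median)/S.D. (data are illustrative).
import statistics as st

x = [12, 15, 17, 20, 22, 25, 45]
mean = st.mean(x)
median = st.median(x)
sd = st.pstdev(x)                 # population S.D. (divides by n)

skp = 3 * (mean - median) / sd
print(round(skp, 3))              # positive here, so the data are positively skewed
```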
(2) Bowley’s Method:
Karl Pearson's method of measuring skewness requires the whole series for its
calculation. Prof. Bowley has suggested a formula based on the relative position of quartiles. In a
symmetrical distribution, the quartiles are equidistant from the value of the median. Bowley's
method of skewness is based on the values of the median and the lower and upper quartiles.
Absolute measure of skewness = Q3 + Q1 − 2 Median
Bowley's coefficient of skewness (SkB) = (Q3 + Q1 − 2 Median) / (Q3 − Q1)
Where Q3 and Q1 are the upper and lower quartiles.
Remarks:
1. Bowley's coefficient of skewness ranges from −1 to +1, i.e. −1 ≤ SkB ≤ 1
2. SkB = 0 ⇒ zero skewness (symmetrical distribution)
3. SkB positive ⇒ positively skewed distribution
4. SkB negative ⇒ negatively skewed distribution
5. Bowley's coefficient of skewness is also called the Quartile coefficient of skewness. It can be
used with open-end class intervals and when the mode is ill defined.
6. One of the main limitations of Bowley's coefficient of skewness is that it includes only the two
quartiles and is based on the middle 50% of the observations; it does not cover all the observations.
(3) Kelly's method:
Kelly developed another measure of skewness, which is based on percentiles or deciles.
Absolute measure of skewness = (P90 + P10 − 2 P50) / 2
Kelly's coefficient of skewness (SkK) = (P90 + P10 − 2 P50) / (P90 − P10)
Where P10, P50 & P90 are respectively the tenth, fiftieth and ninetieth percentiles.
Or
Absolute measure of skewness = (D9 + D1 − 2 D5) / 2
Kelly's coefficient of skewness (SkK) = (D9 + D1 − 2 D5) / (D9 − D1)
Where D1, D5 & D9 are respectively the first, fifth and ninth deciles.
(4) Skewness based on moments:
The measure of skewness based on moments is denoted by β1 or γ1 and is given by:
β1 = μ3² / μ2³ ,  γ1 = √β1
10.7 Moments:
Moments refer to the average of the deviations from the mean or origin raised to a certain
power. The arithmetic mean of various powers of these deviations in any distribution is called
the moments of the distribution about mean. Moments about mean are generally used in
statistics. The moments about the actual arithmetic mean are denoted by µr. The first four
moments about the mean or central moments are as follows:
rth central moment: μr = Σ(xi − x̄)^r / n ,  r = 1, 2, 3, ..., k
1st moment: μ1 = Σ(xi − x̄) / n = 0 (always zero)
2nd moment: μ2 = Σ(xi − x̄)² / n = Variance
3rd moment: μ3 = Σ(xi − x̄)³ / n = measures Skewness
4th moment: μ4 = Σ(xi − x̄)⁴ / n = measures Kurtosis
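A short sketch computing the first four central moments and, from them, the moment measures of skewness (β1, γ1). The kurtosis measure β2 = μ4/μ2² used at the end is the standard moment definition related to the next section, stated here as an assumption.

```python
# Minimal sketch: the first four central moments and the moment measures of
# skewness (beta1, gamma1) and kurtosis (beta2) for illustrative data.
def central_moment(data, r):
    m = sum(data) / len(data)
    return sum((v - m) ** r for v in data) / len(data)

x = [2, 3, 5, 7, 11, 13, 17]
mu1, mu2, mu3, mu4 = (central_moment(x, r) for r in (1, 2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3       # skewness based on moments
gamma1 = beta1 ** 0.5
beta2 = mu4 / mu2 ** 2            # kurtosis based on moments (normal curve: beta2 = 3)

print(round(mu1, 6), round(mu2, 3), round(beta1, 3), round(beta2, 3))
```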
Definition:
“Kurtosis’ is used to describe the degree of peakedness/flatness of a unimodal frequency
curve or frequency distribution”.
"Kurtosis is another measure, which refers to the extent to which a unimodal frequency curve
is more peaked or more flat-topped than the normal curve".
Remarks:
1) Every set is subset of itself i.e. A⊂ A
2) Null set is a sub set of every set i.e. φ ⊂ A, φ⊂ B, φ⊂ C...
5) Equal Set: If A is sub set of B ( i.e. A⊂ B) and B is sub set of A (i.e. B⊂ A), then A& B are
said to be equal i.e. A=B.
6) Equivalent Set: Two sets are said to be equivalent set, if they contain the same number of
elements i.e if n(A)=n(B).
7) Universal Set: The set which contains all the sets under consideration is known as the universal
set. It is always denoted by S or U.
11.4 Operation on Set:
1) Union of Sets: Union of two sets A & B is the set consisting of elements which belong to
either A or B or both (At least one of them should occur/happen).
Symbolically: A∪B={x: x∈A or x∈B}
Ex: U= {a, b, c, d, e, f}, A= {a, b, c, d}, B={b, d, e, f}
Then A or B =A∪B = {a, b, c, d, e, f}
2) Intersection of sets: Intersection of two sets A & B is the set consisting of elements, which
are common in both A & B sets.
Symbolically: A and B = A∩B ={x: x∈A and x∈B}
Ex: if U= {a, b, c, d, e, f}, A= {a, b, c, d}, B={b, d, e, f}
Then A and B =A∩B = {b, d}
3) Disjoint or Mutually exclusive sets: Sets A & B are said to be disjoint if their intersection is
the null set, i.e. A∩B = φ.
4) Complement of a set: The complement of a set A (denoted by Ā, A′ or Aᶜ) is the set of elements
of the universal set U which do not belong to A.
Ex: if U = {a, b, c, d, e, f}, A = {a, b, c, d}, B = {b, d, e, f}
Then Ā = {e, f}
5) Difference of two sets: The difference of two sets A & B, which is denoted by A-B is the set
of elements which belongs to A but not belongs to B.
Symbolically: A - B = {x: x∈A and x∉B}
Ex: if if U = {a, b, c, d, e, f}, A= {a, b, c, d}, B={b, d, e, f}
Then A-B = {a, c }, B-A ={e,f}
The event "A occurs" and the event "A does not occur" are called complementary events
to each other. The event 'A does not occur' is denoted by A′ or Ā or Aᶜ. The event and its
complement are mutually exclusive.
If an experiment results in n exhaustive cases out of which m are favourable to an event A, then
the probability (p) of happening of 'A' is given by:
P(A) = p = n(A)/n(S) = m/n = (Number of favourable cases to A)/(Number of exhaustive cases)
Where, n(A)=m= number of favourable cases to an event A
n(S)= n= number of exhaustive cases
Remarks:
1) If m = 0 ⇒ P(A)=p = 0, then ‘A’ is called an impossible event.
2) If m = n ⇒ P(A) = 1, then ‘A’ is called sure (or) certain event.
3) P(φ) = 0 ⇒ probability of null event is always zero
4) P(S) = 1 ⇒ probability of sample space is always one
5) The probability is a non-negative real number and cannot exceed unity
i.e. 0 ≤ P(A) ≤ 1 (i.e. probability lies between 0 to 1)
If an experiment is repeated 'n' times under essentially identical conditions and an event 'A'
happens 'm' times, then the statistical (or empirical) probability of 'A' is given by:
P(A) = p = lim (m/n) as n → ∞
Remarks: The Statistical probability calculated by conducting an actual experiment is also
called a posteriori probability or empirical probability.
Drawbacks:
1) It fails to determine the probability in cases where the experimental conditions do not
remain identical and homogeneous.
2) The relative frequency (m/n) may not attain a unique value because the actual limiting value may
not really exist.
3) The concept of an infinitely large number of observations is theoretical and impracticable.
3) Axiomatic approach to probability: (by A.N. Kolmogorov in 1933)
The modern approach to probability is purely axiomatic and it is based on the set theory.
Axioms of probability:
Let ‘S’ be a sample space and ‘A’ be an event in ‘S’ and P(A) is the probability satisfying
the following axioms:
(1) The probability of any event ranges from zero to one. i.e 0 ≤ P(A) ≤ 1
(2) The probability of the entire space is 1. i.e P(S) = 1
(3) If A1, A2, ..., An is a sequence of n mutually exclusive events in S, then
P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
1) Permutation:
A permutation is an arrangement of objects in a definite order. The number of ways of
arranging 'r' objects selected from 'n' objects in order is given by
nPr = n! / (n − r)!
Where n! is the factorial of n,
n! = n × (n − 1) × (n − 2) × ... × 3 × 2 × 1
Remarks: (a) 0! = 1, (b) nPn = n!, (c) nP0 = 1, (d) nP1 = n
2) Combination:
A combination is a selection of objects from a group of objects without considering the
order of arrangement. The number of combinations is the number of ways of selecting 'r' objects
from 'n' objects when the order of arrangement is not important, given by:
nCr = n! / [(n − r)! × r!]
Or
nCr = nPr / r!
Remarks: (a) nCn = 1, (b) nC0 = 1, (c) nC1 = n, (d) nCr = nPr / r!, (e) nPr = r! × nCr
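A minimal sketch checking the permutation and combination formulas against Python's math module (math.perm and math.comb require Python 3.8 or later):

```python
# Minimal sketch: the permutation and combination formulas above, checked against
# Python's math module.
import math

n, r = 5, 2
perm = math.factorial(n) // math.factorial(n - r)                        # nPr = n!/(n-r)!
comb = math.factorial(n) // (math.factorial(n - r) * math.factorial(r))  # nCr

print(perm, comb)                          # 20 and 10
print(math.perm(n, r), math.comb(n, r))    # same results (Python 3.8+)
print(perm == math.factorial(r) * comb)    # nPr = r! * nCr -> True
```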
11.9 Theorems of Probability:
There are two important theorems of probability namely,
1. The addition theorem on probability
2. The multiplication theorem on probability.
1) The addition theorem on probability: Here we have two cases
Case I: when events are not mutually exclusive:
If A and B are any two events which are not mutually exclusive, then the probability of
occurrence of at least one of them (either A or B or both) is given by:
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Case II: when events are mutually exclusive:
If A and B are mutually exclusive events, then the probability of occurrence of at least one of them is:
P(A or B) = P(A ∪ B) = P(A) + P(B)
2) The multiplication theorem on probability: Here also we have two cases
Case I: when events are independent:
If A and B are independent events, then the probability of occurrence of both of them is equal to
the product of their individual probabilities:
P(A and B) = P(A ∩ B) = P(A) · P(B)
Case II: when events are dependent:
If A and B are dependent events, then the probability that both of them occur is given by:
P(A and B) = P(A ∩ B) = P(A) · P(B/A) ;  P(A) > 0
Or
P(A and B) = P(A ∩ B) = P(B) · P(A/B) ;  P(B) > 0
For three events A, B & C:
P(A ∩ B ∩ C) = P(A) · P(B/A) · P(C/A ∩ B)
11.10 Conditional Probability:
If two events ‘A’ and ‘B’ are said to be dependent with P(A) >0, then the probability that
an event ‘B’ occurs subject to the condition that ‘A’ has already occurred is known as the
conditional probability of the event ‘B’ on the assumption that the event ‘A’ has already
occurred. It is denoted by the symbol P(B/A) or P(B|A) and read as the probability of B given A.
If two events A and B are dependent, then the conditional probability of B given A is
P(B/A) = P(A ∩ B) / P(A) ;  P(A) > 0
Similarly, if two events A and B are dependent, then the conditional probability of A
given B, denoted by P(A/B) or P(A|B), is
P(A/B) = P(A ∩ B) / P(B) ;  P(B) > 0
The following conditions should hold:
(1) P(X = xi) ≥ 0 and
(2) Σ P(X = xi) = 1
In the tossing of two coins example, P(X = xi) is the probability function given as:
Sample point:   HH    HT    TH    TT
X:               2     1     1     0
P(X = xi):      1/4   1/4   1/4   1/4
If the random variable X is a discrete random variable, the probability function
P(X = xi) is called the probability mass function and its distribution is called a discrete probability
distribution. It satisfies the following conditions:
(i) P(X = xi) ≥ 0 and
(ii) Σ P(X = xi) = 1
Ex: for discrete probability distribution:
1) Bernoulli Distributions
2) Binomial Distributions
3) Poisson Distributions
2) Probability density function (pdf) & Continuous probability distribution:
If the random variable X is a continuous random variable, its probability function f(x) is called
the probability density function and its distribution is called a continuous probability distribution.
It satisfies the following conditions:
(i) f(x) ≥ 0 and
(ii) ∫ f(x) dx = 1
Ex: for continuous probability distribution:
1) Normal Distributions
2) Standard Normal Distributions
12.4. Probability mass function/Discrete probability distribution:
1) Bernoulli distribution: (Given by Jacob Bernoulli):
The Bernoulli distribution is based on Bernoulli trials. A Bernoulli trial is a random
experiment in which there are only two possible/dichotomous outcomes, success or
failure. Examples of Bernoulli trials are:
1) Toss of a coin (head or tail)
2) Throw of a die (even or odd number)
3) Performance of a student in an examination (pass or fail)
4) Germination of seed (germinate or not) etc...
Definition: A random variable x is said to follow the Bernoulli distribution if it takes only two
possible values, 1 and 0, with respective probability of success 'p' and probability of failure 'q',
i.e., P(x = 1) = p and P(x = 0) = q, where q = 1 − p. The Bernoulli probability mass function is given by
P(X = x) = p^x q^(1−x) ;  x = 0, 1
         = 0, otherwise
Where x = Bernoulli variate, p = probability of success, and q = probability of failure
2) Binomial distributions:
The Binomial distribution is a discrete probability distribution which arises when Bernoulli
trials are performed repeatedly a fixed number of times, say 'n'.
Definition: A random variable 'x' is said to follow the binomial distribution if it assumes
non-negative values and its probability mass function is given by
P(X = x) = nCx p^x q^(n−x) ;  x = 0, 1, 2, 3, ..., n
         = 0, otherwise
The two independent constants ‘n’ and ‘p’ in the distribution are known as the parameters
of the distribution.
Condition/assumptions of Binomial distribution:
We get the Binomial distribution under the following experimental conditions.
1) The number of trials ‘n’ is finite.
2) The probability of success ‘p’ is constant for each trial.
3) The trials are independent of each other.
4) Each trial must result in only two possible outcomes i.e. success or failure.
The problems relating to tossing of coins or throwing of dice or drawing cards from a
pack of cards with replacement lead to binomial probability distribution.
Constants of the Binomial distribution:
Parameters of the model are n & p
1) Mean = E(X) = np
2) Variance = V(X) = npq ;  Mean > Variance
Standard Deviation = SD(X) = √(npq)
3) Coefficient of Skewness = (q − p) / √(npq)
4) Coefficient of Kurtosis = (1 − 6pq) / (npq)
5) Mode of the Binomial distribution is that value of the variable x which occurs with the
largest probability. The distribution may be unimodal or bimodal.
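A minimal sketch of the binomial probability mass function and the constants listed above, for an illustrative n and p:

```python
# Minimal sketch: binomial p.m.f. and its constants (mean, variance, skewness,
# kurtosis) for illustrative n and p, using only the math module.
import math

n, p = 10, 0.3
q = 1 - p

pmf = [math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]   # nCx p^x q^(n-x)

mean = n * p
var = n * p * q
skew = (q - p) / math.sqrt(n * p * q)
kurt = (1 - 6 * p * q) / (n * p * q)

print(round(sum(pmf), 6))                          # probabilities sum to 1
print(mean, var, round(skew, 4), round(kurt, 4))   # mean > variance
```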
Importance/Situation of Binomial Distribution:
1) In quality control, an officer may want to know about and classify items as defective or
non-defective.
2) Number of seeds germinated or not when a set of seeds are sown
3) Poisson distribution: its probability mass function is given by
P(X = x) = (e^(−λ) λ^x) / x! ;  x = 0, 1, 2, ...
         = 0, otherwise
Where λ is known as the parameter of the distribution, so that λ > 0
X = Poisson variate
e = 2.7183
Constant of Poisson distribution:
Parameter of model is λ
1) Mean = E(X) = λ
2) Variance = V(X) = λ
Mean =Variance =λ
The standard normal distribution is given by
φ(z) = (1/√(2π)) e^(−z²/2) ;  −∞ ≤ z ≤ ∞
The advantage of the above function is that it doesn't contain any parameter. This enables
us to compute the area under the normal probability curve. All the properties of the normal
distribution hold good for the standard normal distribution. The standard normal distribution is
also known as the unit normal distribution.
Importance/ application of normal distribution:
The normal distribution occupied a central place of theory of Statistics
1) The ND has a remarkable property stated in the central limit theorem: as the sample
size (n) increases, the distribution of the mean of a random sample becomes approximately
normal.
2) As the sample size (n) becomes large, the ND serves as a good approximation of many discrete
probability distributions viz. Binomial, Poisson, Hypergeometric etc.
3) Many sampling distributions, e.g. Student's t, Snedecor's F, the Chi-square distribution
etc., tend to normality for large samples.
4) In testing of hypothesis, the entire theory of small sample tests viz. t, F, chi-square tests is
based on the assumption that the samples are drawn from parent populations that follow the normal
distribution.
5) ND is extensively used in statistical quality control in industries.
Sampling enables us to draw inference about a population parameter on the basis of a few
observations (a sample).
Remarks: Sampling error is inversely proportional to the square root of the sample size (n), i.e.
SE ∝ 1/√n. Sampling error decreases as the sample size (n) is increased. Sampling errors are non-
existent in a census survey, but exist only in a sample survey.
k) Errors in recording & interviews etc...
Remarks: Non-sampling error is directly proportional to the sample size (n), i.e. NSE ∝ n. Non-
sampling error increases as the sample size (n) is increased. Non-sampling errors are more in a
census survey & less in a sample survey.
The probability distribution of a statistic value like x̄, S etc. computed from all possible samples
is called the sampling distribution of that statistic. The standard deviation of the sampling
distribution of a statistic is known as its standard error. For Ex: the standard deviation of the
sampling distribution of the mean (x̄) is known as the standard error of the mean. It is abbreviated
as S.E.
Nature of Hypothesis        Decision
                            Accept Ho            Reject Ho
Ho is true                  Correct Decision     Type I error
Ho is false                 Type II error        Correct Decision
1) Type-I error: Rejecting H0 when H0 is true. i.e. The Null hypothesis is true but our test
rejects it. It is also called as first kind of error.
2) Type-II error: Accepting H0 when H0 is false. i.e. The Null hypothesis is false but our test
accepts it. It is also called as second kind of error.
3) The Null hypothesis is true and our test accepts it (correct decision).
4) The Null hypothesis is false and our test rejects it (correct decision).
P(Type I error) = α
P(Type II error) = β
Remarks:
1) In quality control, Type-I error amounts to rejecting a lot when it is good, so Type-I error is
also called as producer risk. Type-II error may be regarded as accepting the lot when it is
bad, so Type-II error is called as consumer risk.
2) Two types of errors are inversely proportional. If one increase, then others decrease, and
vice-versa.
3) Among two errors, Type-I error is more serious than the Type-II error.
Ex: A judge who has to decides whether a person has committed the crime or not. Statistical
hypothesis in this case are,
Ho: person is innocent
H1: Person is guilty (has committed the crime)
Type-I error: Innocent person is found guilty and punished
Type-II error: A guilty person is set free
P(Type-I error) = α
The probability of committing a Type-I error is called the level of significance. It is denoted by α.
The maximum probability at which we would be willing to risk a Type-I error is known
as the level of significance; in other words, the size of the Type-I error is called the level of significance.
The levels of significance usually employed in testing of hypothesis are 5% and 1%. The
level of significance is always fixed in advance, before collecting the sample information. A LoS of
5% means that the results obtained will be true in 95 out of 100 cases and may be wrong
in 5 out of 100 cases.
14.12 Level of Confidence:
The probability of Type-I error is denoted by α. The correct decision of accepting the null
hypothesis when it is true is known as the level of confidence. The level of confidence is denoted
by 1- α.
14.13 Power of test:
The probability of Type-II error is denoted by β. The correct decision of rejecting the null
hypothesis when it is false is known as the power of the test. It is denoted by 1-β.
14.14 Critical Region and Critical Value: In any test, the critical region is represented by a
portion of the area under the probability curve of the sampling distribution of the test statistic.
A region in the sample space S which amounts to rejection of Null hypothesis H0 is
termed as critical region or region of rejection.
The value of test statistic which separates the critical (or rejection) region and the
acceptance region is called the critical value or significant value. It depends upon
i) level of significance (α) used and
ii) alternative hypothesis, whether it is two-tailed or single-tailed.
Left tailed test: In the left tailed test (H1 : µ < µ0 ) the critical region is entirely in the left of the
distribution of x.
Two tailed test: When the critical region falls on either end of the sampling distribution, it is
called two tailed test.
A test of statistical hypothesis where the alternative hypothesis is two tailed such as,
H0 : µ = µ0 against the alternative hypothesis
H1: µ ≠µ0 (µ > µ0 and µ < µ0)
is known as two tailed test and in such a case the critical region is given by the portion of
the area lying in both the tails of the probability curve of test of statistic.
Remark: Whether a one tailed (right or left tailed) or two tailed test is to be applied depends only
on the alternative hypothesis (H1).
14.16 Test of Significance
The theory of tests of significance consists of various test statistics. The theory has been
developed under two broad headings:
1. Test of significance for large sample
Large sample test or Asymptotic test or Z test (n≥30)
2. Test of significance for small samples (n<30)
Small sample test or exact test-t, F and χ2.
It may be noted that small sample tests can be used in case of large samples also.
I. To test the significance of a single population mean (one sample test): Here we are interested to
examine whether the sample would have come from a population having mean µ equal to the
specified/hypothesized mean µ0, on the basis of the sample mean x̄.
Test statistic: Z = (x̄ − µ0) / (σ/√n) ~ N(0, 1), when the population S.D. σ is known,
or Z = (x̄ − µ0) / (S/√n) ~ N(0, 1), when σ is unknown,
where 'x̄' is the sample mean and 'S' is the sample standard deviation.
4 Compute the Z test statistic value (denote it as Zcal) and obtain the Z table value at α level of
significance (denote it as Ztab). Table values for a two tailed test are 1.96 at 5% and 2.58 at 1%
level of significance. Table values for a one tailed test are 1.645 at 5% and 2.33 at 1% level of
significance.
5 Determination of Significance and Decision Rule:
a. If |Z cal| ≥ Z tab at α, Reject H0
b. If. |Z cal| < Z tab at α, Accept H0.
6 Conclusions:
a. If we reject the null hypothesis H0, then our conclusion will be that there is a significant
difference between the sample mean and the population mean.
b. If we accept the null hypothesis H0, then our conclusion will be that there is no significant
difference between the sample mean and the population mean.
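A minimal sketch of this one-sample Z test procedure. The sample mean, hypothesised mean, population S.D. and sample size are illustrative figures, and 1.96 is the two-tailed 5% table value quoted above.

```python
# Minimal sketch of the one-sample Z test described above (sample figures are
# illustrative, not from the notes).
import math

x_bar, mu0 = 52.0, 50.0           # sample mean and hypothesised population mean
sigma, n = 6.0, 36                # known population S.D. and sample size
z_tab = 1.96                      # two-tailed table value at the 5% level

z_cal = (x_bar - mu0) / (sigma / math.sqrt(n))

if abs(z_cal) >= z_tab:
    print(f"Zcal = {z_cal:.2f}: reject H0 (significant difference)")
else:
    print(f"Zcal = {z_cal:.2f}: accept H0 (no significant difference)")
```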
II. To test the significant difference between two Population Means µ1 & µ2 (two sample
test): Here we are interested to test the equality of two population means µ1 & µ2 on the basis of
the sample means x̄1 & x̄2, i.e. to test the significant difference between the two population means
µ1 and µ2.
Where σ² = (n1σ1² + n2σ2²) / (n1 + n2), when the population S.D.s are known,
or S² = (n1S1² + n2S2²) / (n1 + n2), when they are estimated from the samples.
4. Compute the Z test statistic value and denote it as Z cal and Z table value at α level of
significance, denote it as Z tab.
5. Determination of Significance and Decision Rule:
a. If |Z cal| ≥ Z tab at α, Reject H0
b. If. |Z cal| < Z tab at α, Accept H0.
6. Conclusions:
a. If we reject the null hypothesis H0, then our conclusion will be that there is a significant
difference between the two population means.
b. If we accept the null hypothesis H0, then our conclusion will be that there is no
significant difference between the two population means.
Student, and was later developed and extended by Prof. R.A. Fisher.
Let x1, x2, ..., xn be a random sample of size 'n' from a normal population with
mean 'µ' and variance 'σ²'. Then Student's t-test is defined by the statistic
t = (x̄ − µ) / (S/√n) ~ t with (n − 1) d.f.
where, x̄ = Σxi / n and S = √[ Σ(xi − x̄)² / (n − 1) ] ; S is an unbiased estimate of the population
S.D. (σ). The above test statistic follows Student's t-distribution with (n − 1) degrees of freedom.
15.3 Properties of t- distribution:
1. t-distribution ranges from - ∞ to ∞ just as does a normal distribution.
2. Like the normal distribution, the t-distribution is also symmetrical and has a mean of zero.
3. The t-distribution has a greater dispersion than the standard normal distribution.
4. As the sample size approaches 30, the t-distribution approaches the normal distribution.
15.4 Assumptions:
1. The parent population from which the sample drawn is normal.
2. The sample observations are random and independent.
3. The population standard deviation σ is not known.
4. Size of the sample is small (i.e. n<30)
15.5 Applications of t-distribution or t-test
1) To test significant difference between sample mean and hypothetical value of the
population mean (single population mean).
2) To test whether any significant difference between two sample means.
i. Independent samples
ii. Related samples: paired t-test
3) To test the significance of an observed sample correlation co-efficient.
4) To test the significance of an observed sample regression co-efficient.
5) To test the significance of observed partial correlation co-efficient.
1) Test for single population means (one sample t- test)
Test procedure
Aim: To test whether any significant difference between sample mean and population mean.
Let ‘µ’ is the population mean
‘f̅ ’ is the sample mean
‘S’ is the sample standard deviation
‘n’ is sample size
3. Consider the test statistic (under H0):
t = (x̄ − µ) / (S/√n) ~ t with (n − 1) d.f.
4. Compare the ‘tcal’ calculated value with the ‘ttab’ table value for (n-1) df at α level of
significance.
5. Determination of Significance and Decision
c. If |t cal| ≥ |t tab| for (n-1) df at α, Reject H0.
d. If |t cal| < |t tab| for (n-1) df at α, Accept H0.
6.Conclusion:
a. If we reject the null hypothesis, the conclusion will be that there is a significant difference
between the sample mean and the population mean.
b. If we accept the null hypothesis, the conclusion will be that there is no significant difference
between the sample mean and the population mean.
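A minimal sketch of the one-sample t-test steps above with illustrative data; the table value 2.365 is the two-tailed 5% value for 7 degrees of freedom.

```python
# Minimal sketch of the one-sample t-test procedure above (data are illustrative).
import math

x = [48, 52, 55, 47, 53, 51, 50, 54]
mu0 = 50.0

n = len(x)
x_bar = sum(x) / n
s = math.sqrt(sum((v - x_bar) ** 2 for v in x) / (n - 1))   # unbiased estimate of sigma

t_cal = (x_bar - mu0) / (s / math.sqrt(n))
t_tab = 2.365                      # two-tailed 5% table value for 7 d.f.

print(round(t_cal, 3), "reject H0" if abs(t_cal) >= t_tab else "accept H0")
```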
2) Test of significance for difference between two means:
2a) Independent samples t-test:
This test is used when we want to test whether two independent samples have been drawn from two
normal populations having the same means, when the standard deviations of the two populations are
the same but unknown.
Let x1, x2, …. xn1 and y1, y2,…… yn2 are two independent random samples from the given
normal populations. Let µ 1 and µ 2 are the mean of two populations, f̅W and f̅h are mean of two
samples, Wh and hh are variance of two samples, and n1 and n2 are size of two samples.
Test procedure
Aim: To test whether any significant difference between the two independent samples mean.
Steps:
1. Null Hypothesis H0: µ 1 = µ 2 i. e. the samples have been drawn from the normal
populations with same means or both population have same mean
Alternative Hypothesis H1: µ 1 ≠ µ 2
2. Level of significance(α) = 5% or 1%
3. Consider the test statistic (under H0):
t = (x̄ − ȳ) / [ S √(1/n1 + 1/n2) ] ~ t with (n1 + n2 − 2) d.f.
where, x̄ = Σxi / n1 ,  ȳ = Σyi / n2 ,
and S² = [ Σ(xi − x̄)² + Σ(yi − ȳ)² ] / (n1 + n2 − 2)
4. Compare the ‘tcal’ calculated value with the ‘ttab’ table value for (n1 + n2 –2) df at α level of
significance.
5. Determination of Significance and Decision
a. If |t cal| ≥ t tab for (n1 + n2 – 2) df at α, Reject H0.
b. If |t cal| < t tab for (n1 + n2 – 2) df at α, Accept H0.
6. Conclusion
a. If we reject the null hypothesis, the conclusion will be that there is a significant difference
between the two sample means.
b. If we accept the null hypothesis, the conclusion will be that there is no significant difference
between the two sample means.
2b) Dependent or related samples or Paired t-test:
When n1 = n2 = n and the two samples are not independent but the sample
observations are paired together, the paired t-test is applied. The paired t-test is
generally used when measurements are taken from the same subject before and after some
manipulation/treatment, such as injection of a drug. For ex, you can use a paired t-test to
determine the significance of a difference in blood pressure before and after administration of an
experimental pressor substance.
You can also use a paired ‘t’-test to compare samples that are subjected to different
conditions, provided the samples in each pair are identical otherwise. For ex, you might test the
effectiveness of a water additive in reducing bacterial numbers by sampling water from different
sources and comparing bacterial counts in the treated versus untreated water sample. Each
different water source would give a different pair of data points.
Assumptions/Conditions:
1. Samples are related with each other i.e. The sample observations (x1, x2 , ……..xn) and (y1,
y2,…….yn) are not completely independent but they are dependent in pairs.
2. Sizes of the samples are small and equal i.e., n1 = n2 = n(say),
3. Standard deviations in the populations are equal and not known
Test procedure
Let x1, x2………...xn are ‘n’ observations in first sample.
y1, y2………..yn are ‘n’ observations in second sample.
di = (xi - yi) = difference between paired observations.
3. Consider the test statistic (under H0):
t = |d̄| / (S/√n) ~ t with (n − 1) d.f.
where, d̄ = Σdi / n ; di = (xi − yi) = difference between paired observations, and
S = √{ [ Σd² − (Σd)²/n ] / (n − 1) }
4. Compare the ‘tcal’ calculated value with the ‘ttab’ table value for (n-1) df at α level of
significance.
5. Determination of Significance and Decision
a. If |t cal| ≥ t tab for (n-1) df at α, Reject H0.
b. If |t cal| < t tab for (n-1) df at α, Accept H0.
6. Conclusion
a. If we reject the null hypothesis H0, the conclusion will be that there is a significant difference
between the two sample means.
b. If we accept the null hypothesis H0, the conclusion will be that there is no significant difference
between the two sample means.
15.6 Chi- Square Test (èé test):
The various tests of significance such as the Z-test, t-test and F-test are mostly applicable
only to quantitative data and are based on the assumption that the samples were drawn from a normal
population. Under this assumption the various statistics are normally distributed. Since the
procedure of testing significance requires knowledge about the type of population or the
parameters of the population from which the random samples have been drawn, these tests are known as
parametric tests.
But in many practical situations it is not possible to make any assumption about the distribution of
the population or its parameters. The alternative techniques, where no assumption about the
distribution or about the parameters of the population is made, are known as non-parametric tests.
The Chi-square test is an example of a non-parametric and distribution-free test.
Definition:
The Chi-square (χ²) test (Chi is pronounced as 'ki') is one of the simplest and most widely used
non-parametric tests in statistical work. The χ² test was first used by Karl Pearson in the year
1900 for testing the discrepancy between theory and observation. It is defined as
χ² = Σ [ (Oi − Ei)² / Ei ] ~ χ² with (n) d.f.
Where 'O' refers to the observed frequencies and 'E' refers to the expected frequencies.
Remarks:
1) If ê h is zero, it means that the observed and expected frequencies coincide with each other.
The greater the discrepancy between the observed and expected frequencies the greater is the
value of ê h .
2) The χ²-test depends only on the set of observed and expected frequencies and on the degrees
of freedom (df); it does not make any assumption regarding the parent population from which the
observations are drawn, and its test statistic does not involve any population parameter. Hence it is
termed a non-parametric and distribution-free test.
Measuremental data: The data obtained by actual measurement is called measuremental data.
For example, height, weight, age, income, area etc.,
Enumeration data: The data obtained by enumeration or counting is called enumeration data.
For example, number of blue flowers, number of intelligent boys, number of curled leaves, etc.,
The χ²-test is used for enumeration data, which generally relate to discrete variables, whereas the
t-test and standard normal deviate (Z) tests are used for measuremental data, which generally relate
to continuous variables.
Properties of Chi-square distribution:
1. The mean of ê h distribution is equal to the number of degrees of freedom (n)
2. The variance of ê h distribution is equal to 2n
3. The median of ê h distribution divides, the area of the curve into two equal parts, each part
being 0.5.
4. The mode of ê h distribution is equal to (n-2)
5. Since Chi-square values are always positive, the Chi-square curve is always positively skewed.
6. Since Chi-square values increase with the increase in the degrees of freedom, there is a new
Chi-square distribution with every increase in the number of degrees of freedom.
7. The lowest value of Chi-square is zero and the highest value is infinity. i.e. Chi-square ranges
from 0 to ∞
Conditions for applying ê h test:
The following conditions should be satisfied before applying ê h test.
1. N, the total frequency should be reasonably large, say greater than 50.
2. No theoretical (expected) cell-frequency should be less than 5. If it is less than 5, the
frequencies should be pooled together in order to make it 5 or more than 5.
3. Sample observations for this test must be independent of each other.
1. Testing the Goodness of fit (Binomial and Poisson Distribution):
Karl Pearson developed a χ²-test for testing the significance of the discrepancy between
the observed (experimental) values and the theoretical (expected) values. Karl Pearson proved that
the statistic
χ² = Σ (i = 1 to n) [ (Oi − Ei)² / Ei ] ~ χ² with υ = (n − k − 1) d.f.
follows the χ²-distribution with υ = n − k − 1 d.f., where O1, O2, ..., On are the observed
frequencies, E1, E2, ..., En the corresponding expected frequencies and k is the number of
parameters to be estimated from the data.
2. Test for independence of attributes:
The expected frequency Eij for the (i, j)th cell is calculated as
Eij = (Row total of the ith row × Column total of the jth column) / Grand total (N)
1 Null Hypothesis and Alternative Hypothesis:
2 H0: The two factors or attributes are independent of each other.
3 H1: The two factors or attributes are not independent of each other.
4 Level of Significance (α) = 0.05 or 0.01
5 Test Statistic:
χ² = Σi Σj (Oij − Eij)² / Eij ~ χ² with (m − 1)(n − 1) d.f.
6 Compare the calculated χ²cal value with the χ²tab table value for (m − 1)(n − 1) df
at α level of significance, where m = number of rows and n = number of columns.
The formula for finding χ² from the observed frequencies a, b, c and d of a 2×2 contingency table is

              B1             B2             Row Total
A1            a              b              (a+b) = R1
A2            c              d              (c+d) = R2
Col Total     (a+c) = C1     (b+d) = C2     a+b+c+d = N

χ² = N(ad − bc)² / [ (a + b)(c + d)(a + c)(b + d) ] ~ χ² with 1 d.f.
The decision about the independence of the factors/attributes A and B is taken by comparing χ²cal with
χ²tab at a certain level of significance; we reject or accept the null hypothesis accordingly at that
level of significance.
Yates' Correction for Continuity
In a 2×2 contingency table, the number of df is (2 − 1)(2 − 1) = 1. If any one of the
theoretical cell frequencies is less than 5, the use of the pooling method will result in df = 0,
which is meaningless. In this case we apply a correction given by F. Yates (1934), which is usually
known as "Yates' correction for continuity". This consists of adding 0.5 to the cell frequency which
is less than 5 and then adjusting the remaining cell frequencies accordingly. The corrected value
of χ² is given as
χ² = N[ |ad − bc| − N/2 ]² / [ (a + b)(c + d)(a + c)(b + d) ]
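A minimal sketch of the 2×2 chi-square statistic with and without Yates' correction for continuity; the cell counts a, b, c, d are illustrative and 3.841 is the 5% table value for 1 degree of freedom.

```python
# Minimal sketch: chi-square for a 2x2 contingency table, with and without Yates'
# correction (cell counts a, b, c, d are illustrative).
a, b, c, d = 12, 8, 6, 14
N = a + b + c + d

denom = (a + b) * (c + d) * (a + c) * (b + d)
chi2 = N * (a * d - b * c) ** 2 / denom
chi2_yates = N * (abs(a * d - b * c) - N / 2) ** 2 / denom

chi2_tab = 3.841                   # table value for 1 d.f. at the 5% level
print(round(chi2, 3), round(chi2_yates, 3),
      "reject H0" if chi2 >= chi2_tab else "accept H0")
```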
F-Statistic Definition:
If X is a χ² variate with n1 df and Y is an independent χ² variate with n2 df, then the F-statistic is
defined as
F = (X/n1) / (Y/n2) ~ F with (n1, n2) d.f.
i.e. the F-statistic is the ratio of two independent chi-square variates divided by their
respective degrees of freedom. This statistic follows G.W. Snedecor's F-distribution with (n1, n2) df.
Application of F-test:
1 Testing Equality/homogeneity of two population variances.
2 Testing of Significance of Equality of several means.
3 Testing of Significance of observed multiple correlation coefficients.
4 Testing of Significance of observed sample correlation ratio.
5 Testing of linearity of regression
1) Testing the Equality/homogeneity of two population variances:
Suppose we are interested to test whether two normal populations have the same variance
or not. Let x1, x2, x3, ..., xn1 be a random sample of size n1 from the first population with
variance σ1², and y1, y2, y3, ..., yn2 be a random sample of size n2 from the second population with
variance σ2². Obviously the two samples are independent.
F = Larger variance / Smaller variance
It should be noted that the numerator is always greater than the denominator in the F-ratio.
ν1 = n1 − 1 = df for the sample having the larger variance
ν2 = n2 − 1 = df for the sample having the smaller variance
The calculated value Fcal is compared with the table value Ftab for ν1 and ν2 at the 5% or
1% level of significance. If Fcal > Ftab then we reject H0. On the other hand, if Fcal < Ftab we accept
the null hypothesis and infer that both samples have come from populations having the
same variance.
Since the F-test is based on the ratio of variances, it is also known as the Variance Ratio test. The ratio of two variances follows a distribution called the F distribution, named after the famous statistician R. A. Fisher.
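A minimal Python sketch of the variance-ratio test follows (illustrative samples; the larger sample variance is placed in the numerator as described above, and the table value is taken at the 5% level).

    import numpy as np
    from scipy.stats import f

    # Illustrative samples from two normal populations
    x = np.array([23, 25, 28, 30, 26, 27, 24, 29])
    y = np.array([31, 35, 29, 40, 33, 38, 36])

    s1_sq, s2_sq = np.var(x, ddof=1), np.var(y, ddof=1)

    # Put the larger variance in the numerator
    if s1_sq >= s2_sq:
        F_cal, v1, v2 = s1_sq / s2_sq, len(x) - 1, len(y) - 1
    else:
        F_cal, v1, v2 = s2_sq / s1_sq, len(y) - 1, len(x) - 1

    F_tab = f.ppf(0.95, v1, v2)                  # table value at the 5% level
    print(round(F_cal, 3), round(F_tab, 3))
    print("Reject H0 (variances differ)" if F_cal > F_tab
          else "Accept H0 (same variance)")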
Ranges of common statistics:

    Statistic          Range
    Probability        0 to 1
    Z statistic        −∞ to +∞
    t-statistic        −∞ to +∞
    χ² statistic       0 to +∞
    F-statistic        0 to +∞
    Correlation        −1 to +1
    Regression         −∞ to +∞
    Binomial variate   0 to n
    Poisson variate    0 to +∞
    Normal variate     −∞ to +∞
16.1 Introduction
The term correlation is used by the common man without knowing that he is making use of the term correlation. For example, when parents advise their children to work hard so that they may get good marks, they are correlating good marks with hard work. Sometimes the variables may be inter-related. The nature and strength of the relationship may be examined by correlation and regression analysis.
16.2 Definition:
Correlation is a technique/device/tool to measure the nature and extent of the relationship between two or more variables.
Ex: Study the relationship between blood pressure and age, consumption level of nutrient
and weight gain, total income and medical expenditure, relation between height of father and
son, yield and rainfall, wage and price index, share and debentures etc.
Correlation is a statistical analysis which measures the nature and degree of association or relationship between two or more variables. The word association or relationship is important: it indicates that there is some connection between the variables, and correlation measures the closeness of that relationship. Correlation does not indicate a cause-and-effect relationship.
16.3 Uses of correlation:
1) It is used in physical and social sciences.
2) It is useful for economists to study the relationship between variables like price and quantity; businessmen use correlation to estimate costs, sales, prices, etc.
3) It is helpful in measuring the degree of relationship between the variables like income
and expenditure, price and supply, supply and demand etc…
4) It is the basis for the concept of regression.
16.4 Types of Correlation:
i) Positive, Negative and No Correlation
ii) Simple, Multiple, and Partial Correlation
iii) Linear and Non-linear
iv) Nonsense and Spurious Correlation
i) Positive, Negative, and No Correlation:
These depend upon the direction/movement of change of the variables.
Positive or direct correlation
If the two variables tend to move together in the same direction, i.e. an increase in the
value of one variable is accompanied by an increase in the value of the other (↑↑) or decrease in
the value of one variable is accompanied by a decrease in the value of other (↓↓), then the
correlation is called positive or direct correlation.
The Karl Pearson (product-moment) correlation coefficient is defined as

    r(X, Y) = Cov(X, Y) / √[ V(X) · V(Y) ]

where, X and Y → variables
    Cov(X, Y) = (1/n) Σ(xi − x̄)(yi − ȳ) → covariance between X and Y
    V(X) = (1/n) Σ(xi − x̄)² → variance of X
    V(Y) = (1/n) Σ(yi − ȳ)² → variance of Y

Then the correlation coefficient is given by

    r = Σ(xi − x̄)(yi − ȳ) / [ √Σ(xi − x̄)² · √Σ(yi − ȳ)² ]
We can further simplify the calculations; the Pearsonian correlation coefficient is then given as

    r = [ ΣXY − (ΣX)(ΣY)/n ] / { √[ΣX² − (ΣX)²/n] · √[ΣY² − (ΣY)²/n] }

or, equivalently,

    r = [ nΣXY − (ΣX)(ΣY) ] / { √[nΣX² − (ΣX)²] · √[nΣY² − (ΣY)²] }
In the above method we need not find mean or standard deviation of variables separately.
However, if X and Y assume large values, the calculation is again quite time consuming.
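A minimal Python sketch of the shortcut (raw-sums) formula follows, cross-checked against NumPy's corrcoef (the paired data are illustrative).

    import numpy as np

    # Illustrative paired data
    X = np.array([10, 12, 15, 18, 20, 23, 25])
    Y = np.array([32, 35, 41, 44, 49, 53, 58])
    n = len(X)

    # Shortcut form of the Pearson correlation coefficient
    num = n * (X * Y).sum() - X.sum() * Y.sum()
    den = np.sqrt(n * (X ** 2).sum() - X.sum() ** 2) * \
          np.sqrt(n * (Y ** 2).sum() - Y.sum() ** 2)
    r = num / den

    print(round(r, 4))
    print(round(np.corrcoef(X, Y)[0, 1], 4))     # cross-check with NumPy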
Remarks:
The denominator in the above formulas is always positive. The numerator may be positive or negative; therefore the sign of the correlation coefficient (r) is decided by the sign of Cov(X, Y).
Assumptions of Pearsonian correlation coefficient (r):
Correlation coefficient r is used under certain assumptions, they are
1. The variables under study are continuous random variables and they are normally distributed
2. The relationship between the variables is linear
3. Each pair of observations is unconnected with other pair (independent)
Interpreting the value of ‘r’:
The following table sums up the degrees of correlation corresponding to various values of
Pearsonian correlation coefficient (r):
Degree of Correlation Positive Negative
Perfect Correlation +1 -1
Very high degree of correlation > +0.9 > -0.9
Sufficiently high degree of correlation +0.75 to +0.9 -0.75 to -0.9
Moderate degree of correlation +0.6 to +0.75 -0.6 to -0.75
Only possibility of correlation +0.3 to +0.6 -0.3 to -0.6
Possibly no correlation < +0.3 < -0.3
No correlation 0 0
Spearman's rank correlation coefficient, usually denoted by ρ (rho), is given by the equation

    ρ = 1 − [ 6 Σdi² ] / [ n(n² − 1) ]

where di = (xi − yi) is the difference between the pair of ranks of the same individual in the two characteristics and n is the number of pairs of observations.
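A short Python sketch of the rank-correlation formula (for data without ties) is given below, cross-checked against SciPy's spearmanr; the marks are illustrative.

    import numpy as np
    from scipy.stats import rankdata, spearmanr

    # Marks of 6 students in two subjects (illustrative, no tied values)
    maths   = np.array([78, 36, 98, 25, 75, 82])
    physics = np.array([84, 51, 91, 60, 68, 62])

    d = rankdata(maths) - rankdata(physics)      # difference between ranks
    n = len(maths)
    rho = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

    rho_scipy, _ = spearmanr(maths, physics)     # cross-check
    print(round(rho, 4), round(rho_scipy, 4))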
Repeated values/tied observations:
If there is a tie in values, i.e. if any two or more individuals are placed with the same value with respect to an attribute, then Spearman's formula for calculating the rank correlation coefficient breaks down. In this case common ranks are assigned to the repeated values (observations). For example, if a value is repeated twice at the 5th rank, the common rank assigned to each of the two items is (5 + 6)/2 = 5.5, the average of the ranks 5 and 6 which they would otherwise have occupied. These common ranks are the arithmetic mean of the ranks assigned to the tied observations, and the formula is adjusted with a correction factor:

    ρ = 1 − [ 6 (Σdi² + c.f.) ] / [ n(n² − 1) ]

where, c.f. = correction factor = Σ(mi³ − mi) / 12
    mi = number of times a value is repeated/tied
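A minimal Python sketch of the tie-corrected formula above follows; the data are illustrative, mid-ranks are used for the ties, and (as in many textbooks) the correction factor is accumulated over the tied groups in both variables. Conventions for handling ties vary slightly between texts and software, so this sketch only implements the formula stated in these notes.

    import numpy as np
    from scipy.stats import rankdata

    # Illustrative data containing tied values in both X and Y
    X = np.array([68, 64, 75, 50, 64, 80, 75, 40])
    Y = np.array([62, 58, 68, 45, 81, 60, 68, 48])

    rx, ry = rankdata(X), rankdata(Y)            # mid-ranks for tied values
    d_sq = ((rx - ry) ** 2).sum()
    n = len(X)

    def correction(values):
        # c.f. = sum of (m^3 - m) / 12 over each group of tied values
        _, counts = np.unique(values, return_counts=True)
        m = counts[counts > 1]
        return ((m ** 3 - m) / 12).sum()

    cf = correction(X) + correction(Y)
    rho = 1 - 6 * (d_sq + cf) / (n * (n ** 2 - 1))
    print(round(rho, 4))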
Remarks on Spearman's Rank Correlation Coefficient
1. The rank correlation coefficient lies between −1 and +1, i.e. −1 ≤ ρ ≤ +1. Spearman's rank correlation coefficient, ρ, is nothing but Karl Pearson's correlation coefficient (r) between the ranks; it can be interpreted in the same way as the Karl Pearson correlation coefficient.
2. Karl Pearson's correlation coefficient assumes that the parent population from which the sample observations are drawn is normal. If this assumption is violated, we need a measure which is distribution-free (non-parametric). Spearman's ρ is such a distribution-free, non-parametric measure, since no strict assumptions are made about the form of the population from which the sample observations are drawn.
3. Spearman’s formula is the only formula to be used for finding correlation coefficient if we are
dealing with qualitative characteristics, which cannot be measured quantitatively but can be
arranged serially. It can also be used where actual data are given.
4. Spearman’s rank correlation can also be used even if we are dealing with variables, which are
measured quantitatively, i.e. when the actual data but not the ranks relating to two variables
are given. In such a case we shall have to convert the data into ranks. The highest (or the
smallest) observation is given the rank 1. The next highest (or the next lowest) observation is
given rank 2 and so on. It is immaterial in which way (descending or ascending) the ranks are
assigned.
The regression equation of Y on X is given as

    Y = a + bX + e

where,
    Y = dependent variable
    X = independent variable
    a = intercept
    b = the regression coefficient (or slope) of the line
    e = error
'a' and 'b' are called constants.
The regression coefficient of Y on X is

    b = bYX = Cov(X, Y) / V(X)

    bYX = [ ΣXY − (ΣX)(ΣY)/n ] / [ ΣX² − (ΣX)²/n ]

or

    bYX = [ nΣXY − (ΣX)(ΣY) ] / [ nΣX² − (ΣX)² ]

and

    a = Ȳ − bYX X̄

where bYX is called the estimate of the regression coefficient of Y on X; it measures the change in Y for a unit change in X. The estimated value of Y for a given value of X is given by

    Ŷ = a + bYX X
Similarly, the regression coefficient of X on Y is

    bXY = [ nΣXY − (ΣX)(ΣY) ] / [ nΣY² − (ΣY)² ]

and

    a = X̄ − bXY Ȳ

where bXY is called the estimate of the regression coefficient of X on Y; it measures the change in X for a unit change in Y. The estimated value of X for a given value of Y is given by

    X̂ = a + bXY Y
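A minimal Python sketch computing both regression coefficients and intercepts from the raw-sums formulas follows (illustrative data); it also verifies numerically that the fitted line of Y on X passes through (X̄, Ȳ).

    import numpy as np

    # Illustrative paired data
    X = np.array([5, 7, 9, 11, 13, 15])
    Y = np.array([12, 15, 19, 22, 26, 30])
    n = len(X)

    sxy = n * (X * Y).sum() - X.sum() * Y.sum()
    b_yx = sxy / (n * (X ** 2).sum() - X.sum() ** 2)   # regression coefficient of Y on X
    b_xy = sxy / (n * (Y ** 2).sum() - Y.sum() ** 2)   # regression coefficient of X on Y

    a_yx = Y.mean() - b_yx * X.mean()                  # intercept of Y on X
    a_xy = X.mean() - b_xy * Y.mean()                  # intercept of X on Y

    print("Y on X: Y_hat = %.3f + %.3f X" % (a_yx, b_yx))
    print("X on Y: X_hat = %.3f + %.3f Y" % (a_xy, b_xy))

    # The line of Y on X passes through the point of means (X_bar, Y_bar)
    print(round(a_yx + b_yx * X.mean(), 3), round(Y.mean(), 3))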
9) The arithmetic mean of the two regression coefficients is greater than or equal to the coefficient of correlation, i.e. (bYX + bXY) / 2 ≥ r.
10) If two variables X and Y are independent, then the regression and correlation coefficients are zero.
11) Both the lines of regression pass through the point (X̄, Ȳ). In other words, the mean values (X̄, Ȳ) can be obtained as the point of intersection of the two regression lines.
7. Correlation coefficient: range is −1 to +1. Regression coefficient: range is −∞ to +∞.
8. Correlation coefficient is a relative measure of the linear relationship between X and Y. Regression coefficient is an absolute measure.
9. Correlation coefficient is a pure number, independent of the units of measurement. Regression coefficient is expressed in the units of the dependent variable.
10. The correlation coefficient is denoted by 'ρ' for the population and 'r' for the sample. The regression coefficient is denoted by 'β' for the population and 'b' for the sample.
17.9 Regression Lines and Coefficient of Correlation
The regression coefficient is given by

    b = bYX = Cov(X, Y) / V(X) = Cov(X, Y) / σX²        ... (1)

The correlation coefficient is given by

    r = Cov(X, Y) / (σX σY)

which can be written as

    Cov(X, Y) = r σX σY        ... (2)

By substituting eqn. (2) in (1) we get bYX = r σX σY / σX². After simplification we get

    bYX = r (σY / σX)

Similarly,

    bXY = r (σX / σY)

where r is the correlation coefficient and σX and σY are the S.D. of X and Y respectively.
Test of significance of the regression coefficient: The steps are:
1. Null Hypothesis H0: β = 0 (the regression coefficient is not significant)
   Alternative Hypothesis H1: β ≠ 0 (the regression coefficient is significant)
2. Level of significance (α) = 0.05 or 0.01
3. Consider the test statistic

    t = b / SE(b)   ~   t with (n − 2) d.f.

    where b = r (sy / sx) and SE(b) = (sy / sx) √[ (1 − r²) / (n − 2) ]
4. Compare the calculated 't' value with the table 't' value for (n − 2) d.f. at the α level of significance.
5. Determination of significance and Decision
a. If |t cal | ≥ t tab for (n-2) df at α, Reject H0.
b. If |t cal | < t tab for (n-2) df at α, Accept H0.
6. Conclusion
a. If we reject the null hypothesis conclusion will be regression co-efficient is significant.
b. If we accept the null hypothesis conclusion will be regression co-efficient is not
significant.
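A minimal Python sketch of this significance test follows (illustrative data). It uses the r·(sy/sx) form of the slope and its standard error as written above, and cross-checks the slope and standard error against SciPy's linregress, which computes the same quantities by least squares.

    import numpy as np
    from scipy.stats import t as t_dist, linregress

    # Illustrative data: does X significantly predict Y?
    X = np.array([2, 4, 5, 7, 8, 10, 12, 13])
    Y = np.array([10, 14, 15, 19, 20, 25, 28, 30])
    n = len(X)

    r = np.corrcoef(X, Y)[0, 1]
    sx, sy = X.std(ddof=1), Y.std(ddof=1)

    b = r * sy / sx                                   # regression coefficient of Y on X
    se_b = (sy / sx) * np.sqrt((1 - r ** 2) / (n - 2))
    t_cal = b / se_b
    t_tab = t_dist.ppf(1 - 0.05 / 2, n - 2)           # two-sided 5% table value

    print(round(t_cal, 3), round(t_tab, 3))
    print("Regression coefficient is significant" if abs(t_cal) >= t_tab
          else "Regression coefficient is not significant")

    # Cross-check: linregress reports the same slope and its standard error
    res = linregress(X, Y)
    print(round(res.slope, 4), round(res.stderr, 4))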
Suppose n observations of a random variable yij (i = 1, 2, ..., k; j = 1, 2, ..., ni) are grouped into 'k' classes of sizes n1, n2, ..., nk respectively (n = Σ ni), as given in the table below.
The total variation in the observation Yij can be split into the following two components:
1) The variation between the classes, commonly known as treatment variation/class variation.
2) The variation within the classes i.e., the inherent variation of the random variable within the
observations of a class.
    Class   Observations                    Total   Mean
    1       y11  y12  y13  ...  y1n1        T1      ȳ1 = T1/n1
    2       y21  y22  y23  ...  y2n2        T2      ȳ2 = T2/n2
    3       y31  y32  y33  ...  y3n3        T3      ȳ3 = T3/n3
    :       :    :    :         :           :       :
    k       yk1  yk2  yk3  ...  yknk        Tk      ȳk = Tk/nk
                              Grand total (GT)      Grand mean (ȳ)
Test Procedure: The steps involved in carrying out the analysis are:
1) Null Hypothesis (H0): H0: µ1 = µ2 = …= µk=µ
Alternative Hypothesis (H1): all µi’s are not equal (i = 1,2,…,k)
2) Level of significance (α ): Let α = 0.05 or 0.01
3) Computation of the test statistic: The sums of squares are set out in an ANOVA table with the columns Sources of Variation, d.f., Sum of Squares (S.S.), Mean Sum of Squares (M.S.S.) and F ratio, where MSTr = SSTr/(k − 1) and EMS = ESS/(n − k).
Test Statistic: Under H0,

    F = (variance between the samples) / (variance within the samples) = MSTr / EMS   ~   F(k − 1, n − k)
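For illustration, a short Python sketch of the one-way test using SciPy's f_oneway is given below (the yields are invented illustrative data); the calculated F is compared with the table value at the 5% level.

    import numpy as np
    from scipy.stats import f_oneway, f

    # Yields (kg/plot) under three treatments (illustrative data)
    t1 = [20, 22, 25, 24, 21]
    t2 = [28, 30, 27, 26, 29]
    t3 = [18, 17, 21, 20, 19]

    F_cal, p_value = f_oneway(t1, t2, t3)

    k = 3
    n = len(t1) + len(t2) + len(t3)
    F_tab = f.ppf(0.95, k - 1, n - k)            # table value at the 5% level

    print(round(F_cal, 3), round(F_tab, 3), round(p_value, 4))
    print("Reject H0: treatment means differ" if F_cal > F_tab else "Accept H0")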
            1     2     3    ...   j   ...   h     Row total   Row mean
    1      y11   y12   y13   ...            y1h       R1         ȳ1.
    2      y21   y22   y23   ...            y2h       R2         ȳ2.
    3      y31   y32   y33   ...            y3h       R3         ȳ3.
    i       :     :     :          yij       :         :          :
    k      yk1   yk2   yk3   ...            ykh       Rk         ȳk.
The total variation in the observation yij can be split into the following three components:
(i) The variation between the treatments (rations)
(ii) The variation between the varieties (breeds)
(iii) The inherent variation within the observations of treatments and varieties.
The first two types of variation are due to assignable causes, which can be detected and controlled by human endeavour, while the third type of variation is due to chance causes, which are beyond the control of the human hand.
Test procedure for two -way analysis: The steps involved in carrying out the analysis are:
1. Null hypothesis (Ho):
Ho : µ1. = µ2. = ……µk. = µ. (for comparison of treatment/ rations) i.e., there is no significant
difference between rations (treatments)
where, SE(d) = √(2 × EMS / r)
    r = number of replications
    t(α, error df) → table 't' value for the error d.f. at the α level of significance
If the difference between two treatment means is less than the calculated CD value, the two treatments are not significantly different from each other; otherwise they are significantly different.
7) Bar chart:
It is a diagrammatic representation used to draw conclusions about the superiority of treatments in an experiment.
Eg: Let T1, T2, ..., T5 be the treatment means; arranged in descending order they may appear as
    T2  T5  T1  T3  T4
Conclusion: T2 and T5 are significantly superior to all the others.
Suppose that there are 't' treatments T1, T2, ..., Tt and each treatment is replicated 'r' times.
3) Assign the treatments to the experimental units by using random numbers.
Note: Only replication and randomization principles are adopted in this design. But local control
is not adopted (because experimental material is homogeneous).
4) The Analysis of Variance (ANOVA) model for CRD is

    yij = µ + ti + eij

where,
    yij → observation on the jth replication of the ith treatment
    i = 1, 2, ..., t
    j = 1, 2, ..., r
    µ → overall mean effect
    ti → ith treatment effect
    eij → error effect
Arrangement of results for analysis

    Treatments   Observations                      Treatment Total   No. of replications
    t1           y11  y12  ...  y1r                T1                r
    t2           y21  y22  ...  y2r                T2                r
    :            :                                 :                 :
    ti           yi1  yi2  ...  yij  ...  yir      Ti                r
    :            :                                 :                 :
    tt           yt1  yt2  ...  ytr                Tt                r
Error sum of squares (ESS) = TSS − SSTr
ANOVA TABLE

    Source of Variation         df      Sum of Squares   Mean Squares          F ratio
    Between treatments          t − 1   SSTr             MSTr = SSTr/(t − 1)   F = MSTr/EMS
    Within treatments (error)   n − t   ESS              EMS = ESS/(n − t)
    SE(d) = √(2 × EMS / r)
    r = number of replications (for equal replication)
Lastly, based on the CD value the bar chart can be drawn, and using the bar chart the conclusions can be written.
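The CRD analysis above can be carried out step by step as in the following Python sketch (illustrative yields, assuming the usual correction factor C.F. = GT²/n); it builds the quantities of the ANOVA table directly from the data.

    import numpy as np
    from scipy.stats import f

    # CRD with t = 3 treatments, each replicated r = 4 times (illustrative yields)
    data = np.array([[12, 14, 13, 15],    # treatment 1
                     [18, 20, 19, 21],    # treatment 2
                     [10, 11,  9, 12]])   # treatment 3
    t, r = data.shape
    n = t * r

    GT = data.sum()
    CF = GT ** 2 / n                                  # correction factor
    TSS = (data ** 2).sum() - CF                      # total sum of squares
    SSTr = (data.sum(axis=1) ** 2 / r).sum() - CF     # treatment sum of squares
    ESS = TSS - SSTr                                  # error sum of squares

    MSTr = SSTr / (t - 1)
    EMS = ESS / (n - t)
    F_cal = MSTr / EMS
    F_tab = f.ppf(0.95, t - 1, n - t)

    print(round(SSTr, 2), round(ESS, 2), round(F_cal, 3), round(F_tab, 3))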
Suppose that there are 't' treatments T1, T2, ..., Tt and each treatment is replicated 'r' times; the number of blocks is equal to the number of replications for the treatments.
Field layout: the field is divided into 'r' blocks and, within each block, the 't' treatments are allotted to the plots at random, each treatment appearing once in every block.

The results are arranged as:

    Treatments   Replications (blocks)               Treatment Total
    t1           y11  y12  ...  y1r                  T1
    t2           y21  y22  ...  y2r                  T2
    :            :                                   :
    ti           yi1  yi2  ...  yij  ...  yir        Ti
    :            :                                   :
    tt           yt1  yt2  ...  ytr                  Tt
    Total        R1   R2   ...  Rj   ...  Rr         GT
    Replication sum of squares (RSS) = Σ Rj² / t − C.F.
    Treatment sum of squares (SSTr) = Σ Ti² / r − C.F.
    Error sum of squares (ESS) = TSS − RSS − SSTr
ANOVA Table

    Source of Variation         df               Sum of Squares   Mean Squares   F cal
    Between Replications        r − 1            RSS              RMS            F = RMS/EMS
    Between treatments          t − 1            SSTr             MSTr           F = MSTr/EMS
    Within treatments (error)   (r − 1)(t − 1)   ESS              EMS
    Total                       n − 1            TSS
5) Test Procedure: The steps involved in carrying out the analysis are:
1. Null hypothesis:
The first step is to setting up a null hypothesis H0
Ho : µ1. = µ2. = ……µt. = µ (for comparison of treatment) i.e., there is no significant
difference between treatments
Ho : µ.1 = µ.2 = ... = µ.r = µ (for comparison of replications), i.e., there is no significant difference between replications.
2. Level of significance (α ): 0.05 or 0.01
3. Test Statistic:
For comparison of treatments,

    Fcal = MSTr / EMS   ~   F(t − 1, (r − 1)(t − 1))
4. The calculated F statistic value, denoted Fcal, is compared with the F table value (Ftab) for the respective degrees of freedom at the given level of significance.
5. Decision criteria
a) If F cal ≥ F tab Reject H0.
b) If F cal < F tab Accept H0.
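A minimal Python sketch of the two-way (RBD) analysis follows (illustrative yields, assuming the usual correction factor C.F. = GT²/n); it computes the sums of squares of the ANOVA table and the F ratio for treatments.

    import numpy as np
    from scipy.stats import f

    # RBD with t = 4 treatments and r = 3 replications (blocks); illustrative yields
    data = np.array([[25, 27, 26],   # treatment 1 across the 3 blocks
                     [30, 32, 31],   # treatment 2
                     [22, 23, 21],   # treatment 3
                     [28, 29, 30]])  # treatment 4
    t, r = data.shape
    n = t * r

    GT = data.sum()
    CF = GT ** 2 / n
    TSS = (data ** 2).sum() - CF
    SSTr = (data.sum(axis=1) ** 2 / r).sum() - CF    # between treatments
    RSS = (data.sum(axis=0) ** 2 / t).sum() - CF     # between replications (blocks)
    ESS = TSS - SSTr - RSS

    MSTr, RMS, EMS = SSTr / (t - 1), RSS / (r - 1), ESS / ((r - 1) * (t - 1))
    F_treat = MSTr / EMS
    F_tab = f.ppf(0.95, t - 1, (r - 1) * (t - 1))

    print(round(F_treat, 3), round(F_tab, 3))
    print("Treatments differ significantly" if F_treat > F_tab
          else "No significant difference between treatments")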
Then, to know which of the treatment means are significantly different, we use the Critical Difference (CD):

    CD = t(α, error df) × SE(d)

where, SE(d) = √(2 × EMS / r)
    r = number of replications
Lastly, based on the CD value the bar chart can be drawn, and using the bar chart the conclusions can be written.
[Note: For replication comparison:
a) If F cal < F tab then F is not significant. We can conclude that there is no significant
difference between replications. It indicates that the RBD will not contribute to precision in
detecting treatment differences. In such situations the adoption of RBD in preference to CRD
is not advantageous.
b) If F cal ≥ F tab then F is significant. It indicates that there is a significant difference between replications. In such situations the adoption of RBD in preference to CRD is advantageous.
Then, to know which of the replication means are significantly different, we will use the Critical Difference (CD):

    CD = t(α, error df) × SE(d)

where, t(α, error df) → table 't' value for the error d.f. at the α level of significance
    SE(d) = √(2 × EMS / t)
    t = number of treatments]
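For illustration, the sketch below computes a CD for treatment comparison and flags which pairs of treatment means differ significantly. The EMS, error d.f. and treatment means are invented illustrative values, and the table 't' value is taken as the two-tailed 5% point, which is the usual convention for CD.

    import numpy as np
    from scipy.stats import t as t_dist

    # Suppose the RBD analysis gave EMS = 2.4 with error df = 6 and r = 3 replications,
    # and the treatment means below (illustrative values)
    EMS, error_df, r = 2.4, 6, 3
    means = {"T1": 26.0, "T2": 31.0, "T3": 22.0, "T4": 29.0}

    SE_d = np.sqrt(2 * EMS / r)
    CD = t_dist.ppf(1 - 0.05 / 2, error_df) * SE_d     # CD at the 5% level
    print("CD =", round(CD, 3))

    # Pairs of treatments whose mean difference exceeds CD are significantly different
    names = list(means)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            diff = abs(means[names[i]] - means[names[j]])
            verdict = "significant" if diff > CD else "not significant"
            print(names[i], "vs", names[j], round(diff, 2), verdict)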
7) Advantages of RBD
1) The precision is more in RBD.
2) The amount of information obtained in RBD is more as compared to CRD.
3) RBD is more flexible.
4) Statistical analysis is simple and easy.
5) Even if some values are missing, still the analysis can be done by using missing plot
technique.
6) It uses all the basic principles of experimental designs.
7) It can be applied to field experiments.