Mathematics Statistics MBA 1st Sem
MBA
First Semester
Bharathidasan University
Centre for Distance and Online Education
Chairman:
Dr. M. Selvam
Vice-Chancellor
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
Co-Chairman:
Dr. G. Gopinath
Registrar
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
Course Co-Ordinator:
Dr. A. Edward William Benjamin
Director-Centre for Distance and Online Education
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
The Syllabus is Revised from 2021-22 onwards
Reviewer
Dr. O.M. Haja Mohideen, Head, Dept. of Business Admn., Khadhir Mohideen College, Adiarampattinam.
Dr. M. Lakshmi Bala, Asst. Prof. & Head, Dept. of Business Admn., K.N. Govt. Arts College for Women,
Thanjavur - 613 007
Authors
J. S. Chandan (Units: 1, 2.6.2, 2.6.4, 3.1-3.4, 3.6, 3.8-3.11, 5)
C.R. Kothari (Units: 2.1-2.3, 2.6.1, 2.6.3, 2.8-2.12, 3.7, 4, Appendix)
G.S. Monga (Units: 2.4, 2.6, 2.7, 3.5)
Vikas® Publishing House (Unit: 2.5)
Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has
been obtained by its Authors from sources believed to be reliable and correct to the best of their
knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions
or damages arising out of use of this information and specifically disclaim any implied warranties of
merchantability or fitness for any particular use.
Statistics is considered a mathematical science pertaining to the collection, analysis, interpretation or explanation,
and presentation of data, and can be categorized as Inferential Statistics or Descriptive Statistics. Statistical
analysis is important for decision-making and is widely used by academic institutions, natural and social science
departments, governments and business organizations. The word statistics is derived from the Latin word status,
which means a political state or government. The subject of statistics is primarily concerned with making decisions
about various disciplines of markets and employment, such as stock market trends, unemployment rates in various
sectors of industries, demographic shifts, interest rates, inflation rates over the years, and so on. These conclusions
help to formulate specific policies and attitudes with respect to diverse areas of interest. Statistics, then, is a
science that deals with numbers or figures describing the state of affairs of various situations with which we are
generally or specifically concerned. Its objective is to summarize such data so that the summary gives us an
indication of some characteristics of a population or phenomenon that we wish to study. To ensure that conclusions
are meaningful, it is necessary to subject data to scientific analysis so that rational decisions can be made. Hence,
the field of statistics is concerned with proper collection of data, organizing this data into a manageable and
presentable form, and analysing and interpreting the data into useful conclusions.
Organized data can be presented in the form of tables, diagrams or graphs. This presentation in an orderly
manner facilitates the understanding and analysis of data. The basic purpose of data analysis is to make it useful
for drawing conclusions. This analysis may simply be a critical observation of data to draw meaningful conclusions
about it or it may involve highly complex and sophisticated mathematical techniques. Interpretation means drawing
conclusions from the data, which then form the basis of decision-making.
The objectives of this book are to:
• Introduce students to frequently used mathematical and basic statistical methods in business decision-
making.
• Explain how to assess these techniques and use them when appropriate.
• Formulate a suitable statistical problem, obtain a solution and interpret important features of this solution in
a business decision-making process.
The learning material is presented in a structured format so that it is easy to grasp. Each unit begins with an
outline of the Learning Objectives followed by Introduction to the topic of the unit. The detailed content is then
presented in a simple language, interspersed with Check Your Progress questions to enable the student to test his
understanding as and when he goes through each unit. Let Us Sum Up provided at the end of each unit helps in
quick recollection. Possible Questions are also provided for further practice.
The book is divided into five units.
UNIT 1: Concepts of Probability
UNIT 2: Probability Distributions
UNIT 3: Sampling and Hypothesis Testing
UNIT 4: Correlation and Regression Analysis
UNIT 5: Time Series Analysis
MATHEMATICS & STATISTICS
DETAILED SYLLABUS
Unit–1: Concept and Definitions of Probability; Bayes’ Theorem and its Application Activity
Unit–2: Random Variable; Expectation and Variance; Continuous and Discrete Probability Functions;
Discrete Probability Distributions; Continuous Probability Distributions; Normal Distribution.
Unit–3: Sampling Procedures–Random and Non-Random Methods; Sample Size Determination, Sampling
Distribution and Standard Error; Hypothesis Testing; T-Distribution Testing ; Chi Square Test for
Independence of Attributes Activity
Unit–4: Correlation Analysis; Methods of Studying Simple Correlation; Properties of Correlation
Coefficient; Rank Correlation; Regression Analysis.
Unit–5: Components of Time Series Analysis; Forecasting with Decomposed Components of Time Series.
CONTENTS
UNIT 1 CONCEPTS OF PROBABILITY 1-21
1.1 Learning Objectives
1.2 Introduction
1.3 Concept and Definitions of Probability
1.3.1 Sample Space
1.3.2 Events
1.3.3 Addition and Multiplication Theorems on Probability
1.3.4 Independent Events
1.3.5 Conditional Probability
1.4 Bayes’ Theorem and its Application Activity
1.5 Let Us Sum Up
1.6 Further Reading
1.7 Answers to Check Your Progress
1.8 Possible Questions
1.9 Learning Outcomes
UNIT 2 PROBABILITY DISTRIBUTIONS 23-66
2.1 Learning Objectives
2.2 Introduction
2.3 Random Variable
2.3.1 Techniques of Assigning Probabilities
2.3.2 Mean of Random Variable or the Expected Value of Random Variable
2.3.3 Variance and Standard Deviation of Random Variable
2.4 Expectation and Variance
2.4.1 Expected Value of a Function of X; 2.4.2 Variance
2.4.3 Expectation (Conditional); 2.4.4 Expectation (Iterated)
2.4.5 Expectation: Continuous Variables
2.5 Continuous and Discrete Probability Functions
2.6 Discrete Probability Distributions
2.6.1 Binomial Distribution—Properties
2.6.2 Applications of Binomial Distribution
2.6.3 Poisson Distribution—Properties
2.6.4 Applications of Poisson Distribution
2.7 Continuous Probability Distributions
2.8 Normal Distribution
2.8.1 Characteristics of Normal Distribution
2.8.2 Family of Normal Distributions
2.8.3 How to Measure the Area under the Normal Curve
2.9 Let Us Sum Up
2.10 Further Reading
2.11 Answers to Check Your Progress
2.12 Possible Questions
2.13 Learning Outcomes
UNIT 1 CONCEPTS OF
PROBABILITY
UNIT STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Concept and Definitions of Probability
1.3.1 Sample Space
1.3.2 Events
1.3.3 Addition and Multiplication Theorems on Probability
1.3.4 Independent Events
1.3.5 Conditional Probability
1.4 Bayes’ Theorem and its Application Activity
1.5 Let Us Sum Up
1.6 Further Reading
1.7 Answers to Check Your Progress
1.8 Possible Questions
1.9 Learning Outcomes
1.2 INTRODUCTION
In this unit, you will learn different theories of probability and will be able to
understand why probability is considered the most important tool in statistical
calculations. The subject of probability in itself is cumbersome; hence, only the
basic concepts will be discussed in this unit. The word probability or chance is very
commonly used in day-to-day conversation, and terms such as possible or probable
or likely, all have similar meanings.
Probability can be defined as a measure of the likelihood that a particular
event will occur. It is a numerical measure, with a value between 0 and 1, of such a
likelihood, where a probability of 0 indicates that the given event cannot occur
and a probability of 1 assures certainty of such an occurrence.

Probability: A measure of the likelihood that a particular event will occur.
In this unit, you will also learn why all these uncertainties require knowledge
of probability so that calculated risks can be taken. Since the outcomes of most
decisions cannot be accurately predicted because of the impact of many
uncontrollable and unpredictable variables, it is necessary that all the known risks
be scientifically evaluated. Probability theory, sometimes referred to as the science
of uncertainty, is very helpful in such evaluations. It helps the decision-maker with
only limited information to analyse the risks and select the strategy of minimum
risk.
This unit will also discuss the laws of addition and multiplication. The law of
addition states that when two events are mutually exclusive, the probability that
either of the events will occur is the sum of their separate probabilities. The law of
multiplication is applicable when two events occur at the same time.
This unit will also describe the probability distributions and Bayes’ theorem.
Bayes’ theorem is based on the philosophy of science and tries to clarify the
relationship between theory and evidence.
It is not that the classical theory of probability is not useful because of the above
described limitations. We can use it as an important guiding factor to calculate
the probability of uncertain situations as mentioned above, and it underlies the
axiomatic approach to probability.
Frequency of occurrence
This approach to probability is widely used across a wide range of scientific disciplines.
It is based on the idea that the underlying probability of an event can be measured
by repeated trials.

Probability as a measure of frequency: Let n_A be the number of times event
A occurs after n trials. We define the probability of event A as,

P(A) = lim (n → ∞) n_A / n

It is not possible to conduct an infinite number of trials. However, it usually suffices
to conduct a large number of trials, where the standard of 'large' depends on the
probability being measured and how accurate a measurement we need.
Definition of probability

It is not guaranteed that the sequence n_A/n will converge in the limit to the same
result every time, or that it will converge at all. To understand this, let us consider
an experiment consisting of flipping a coin an infinite number of times, where we
want to determine the probability of heads. The result may appear as the following
sequence:
HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHH
HHHTTTTTTTTTTTTTTTT...
Here each run of k heads is followed by a run of k tails. For this example, the
sequence n_A/n oscillates between 1/3 and 2/3 and does not converge. Such
sequences may be unlikely, but they are possible.
The definition given above does not express convergence in the required way, but it
shows some kind of convergence in probability. The problem of formulating this exactly
can be addressed using axiomatic probability theory.
(iii) Empirical Probability Theory
The empirical approach to determine probabilities relies on data from actual
experiments to determine approximate probabilities instead of the assumption of
equal likeliness. Probabilities in these experiments are defined as the ratio of the
frequency of the possibility of an event, f(E), to the number of trials in the experiment,
n, written symbolically as P(E) = f(E)/n. For example, while flipping a coin, the
empirical probability of heads is the number of heads divided by the total number
of flips.
The relationship between these empirical probabilities and the theoretical
probabilities is suggested by the Law of Large Numbers. The law states that as the
number of trials of an experiment increases, the empirical probability approaches
the theoretical probability. Hence, if we roll a die a number of times, each number
would come up approximately 1/6 of the time. The study of empirical probabilities
is known as statistics.
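The Law of Large Numbers can be illustrated with a short simulation. The following Python sketch (an illustration added here, not part of the original text) estimates the empirical probability of heads for increasing numbers of trials; the trial counts and the seed are arbitrary choices.

```python
import random

# Estimate P(heads) as the relative frequency f(E)/n for growing n.
random.seed(42)

for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n = {n:>9}: empirical P(heads) = {heads / n:.4f}")

# As n grows, the estimates settle near the theoretical value 0.5.
```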
(A′, read as A prime). This means that the outcomes contained in A and the outcomes
contained in A′ must together equal the total sample space. Therefore,
P[A] + P[A′] = P[S] = 1
or, P[A] = 1 – P[A′]
For example, if a passenger airliner has 300 seats and it is nearly full, but not totally
full, then event A would be the number of occupied seats and A′ would be the
number of unoccupied seats. Suppose there are 287 seats occupied by passengers
and only 13 seats are empty. Typically, the stewardess will count the number of
empty seats, which are only 13, and report that 287 people are aboard. This is much
simpler than counting 287 occupied seats. Accordingly, in such a situation, knowing
event A′ is much more efficient than knowing event A.
• Mutually exclusive events: Two events are said to be mutually exclusive, if
both events cannot occur at the same time as outcome of a single experiment.
For example, if we toss a coin, then either event head or event tail would
occur, but not both. Hence, these are mutually exclusive events.
Venn diagrams
We can visualize the concept of events, their relationships and sample space using
Venn diagrams. The sample space is represented by a rectangular region and the
events and the relationships among these events are represented by circular regions
within the rectangle.
For example, two mutually exclusive events A and B are represented in the
Venn diagram as follows:
[Venn diagrams: mutually exclusive events A and B shown as disjoint circles within the sample space S; non-mutually exclusive events A and B shown as overlapping circles with the common region [AB]]
Example 1.1: A sample of 50 students is taken and a survey is made on the reading
habits of the sample selected. The survey results are shown as follows:
[A] = 20 read Time
[B] = 15 read Newsweek
[C] = 10 read Filmfare
[AB] = 8 read Time and Newsweek
[AC] = 6 read Time and Filmfare
[BC] = 4 read Newsweek and Filmfare
[ABC] = 2 read Time, Newsweek and Filmfare
Find out the probability that a student picked up at random from this sample of 50
students does not read any of these 3 magazines.
Solution: The problem can be solved by a Venn diagram as follows:

[Venn diagram of events A (Time), B (Newsweek) and C (Filmfare): A only = 8, B only = 5, C only = 2; A∩B only = 6, A∩C only = 4, B∩C only = 2; A∩B∩C = 2; outside all three circles = 21]
Since there are 21 students who do not read any of the three magazines, the probability
that a student picked up at random among this sample of 50 students who does not
read any of these three magazines is 21/50.
The problem can also be solved by the formula for the probability of the union of
three events:
P[A ∪ B ∪ C] = P[A] + P[B] + P[C] – P[AB] – P[AC] – P[BC] + P[ABC]
= (20 + 15 + 10 – 8 – 6 – 4 + 2)/50 = 29/50
so that the probability of reading none of the three magazines is 1 – 29/50 = 21/50.
Law of Addition
When two events are mutually exclusive, then the probability that either of the
events will occur is the sum of their separate probabilities. For example, if you roll
a single die, then the probability that it will come up with a face 5 or face 6, where
event A refers to face 5 and event B refers to face 6 and both events being mutually
exclusive events, is given by:
P[A or B] = P[A] + P[B]
or, P[5 or 6] = P[5] + P[6]
= 1/6 +1/6
= 2/6 = 1/3
P[A or B] is written as P[A ∪ B] and is known as P[A union B].
However, if events A and B are not mutually exclusive, then the probability of
occurrence of either event A or event B or both is equal to the probability that event
A occurs plus the probability that event B occurs minus the probability that events
common to both A and B occur.
Symbolically, it can be written as:
P[A ∪ B] = P[A] + P[B] – P[A and B]
Events [A and B] consist of all those events which are contained in both A and
B simultaneously. For example, in an experiment of taking cards out of a pack of 52
playing cards, assume that:
Event A = An ace is drawn.
Event B = A spade is drawn.
Event [AB] = An ace of spade is drawn.
Hence, P[A ∪ B] = P[A] + P[B] – P[AB]
= 4/52 + 13/52 – 1/52
= 16/52
= 4/13
This is so, because there are 4 aces, 13 cards of spades, including 1 ace of spades
out of a total of 52 cards in the pack. The logic behind subtracting P[AB] is that the
ace of spades is counted twice—once in event A (4 aces) and once again in event B
(13 cards of spade including the ace).
Another example for P[A ∪ B], where event A and event B are not mutually
exclusive is as follows:
Suppose a survey of 100 persons revealed that 50 persons read India Today
and 30 persons read Time magazine and 10 of these 100 persons read both India
Today and Time. Then:
Event [A] = 50
Event [B] = 30
Event [AB] = 10
Since event [AB] of 10 is included twice, both in event A as well as in event B, event
[AB] must be subtracted once in order to determine the event [A ∪ B], which means
that a person reads India Today or Time or both. Hence,
P[A ∪ B] = P[A] + P[B] – P[AB]
= 50/100 + 30/100 –10/100
= 70/100 = 0.7
Law of Multiplication
Multiplication rule is applied when it is necessary to compute the probability if
both events A and B will occur at the same time. The multiplication rule is different
if the two events are independent as against the two events being not independent.
If events A and B are independent events, then the probability that they both
will occur is the product of their separate probabilities. This is a strict condition so
that events A and B are independent if and only if,
P [AB] = P[A] × P[B]
or = P[A] P[B]
For example, if we toss a coin twice, then the probability that the first toss results in
a head and the second toss results in a tail is given by,
P [HT] = P[H] × P[T]
= 1/2 × 1/2 = 1/4
However, if events A and B are not independent, meaning that the probability of
occurrence of an event is dependent or conditional upon the occurrence or non-
occurrence of the other event, then the probability that they will both occur is given
by,
P[AB] = P[A] × P[B/Given outcome of A]
This relationship is written as,
P[AB] = P[A] × P[B/A] = P[A] P[B/A]
Where P[B/A] means the probability of event B on the condition that event A has
occurred. As an example, assume that a bowl has 6 black balls and 4 white balls. A
ball is drawn at random from the bowl. Then a second ball is drawn without
replacement of the first ball back in the bowl. The probability of the second ball
being black or white would depend upon the result of the first draw as to whether
the first ball was black or white. The probability that both these balls are black is
given by,
P [Two black balls] = P [Black on 1st draw] × P [Black on 2nd draw/Black
on 1st draw]
= 6/10 × 5/9 = 30/90 = 1/3
This is so, because there are 6 black balls out of a total of 10, but if the first ball
drawn is black then we are left with 5 black balls out of a total of 9 balls.
1.3.4 Independent Events
Two events A and B are said to be independent events, if the occurrence of one
event is not influenced at all by the occurrence of the other. For example, if two fair
coins are tossed, then the result of one toss is totally independent of the result of the
other toss. The probability that a head will be the outcome of any one toss will
always be 1/2, irrespective of whatever the outcome is of the other toss. Hence,
these two events are independent.
Let us assume that one fair coin is tossed 10 times and it happens that the first
nine tosses resulted in heads. What is the probability that the outcome of the tenth
toss will also be a head? There is always a psychological tendency to think that a
tail would be more likely in the tenth toss since the first nine tosses resulted in
heads. However, since the events of tossing a coin 10 times are all independent
events, the earlier outcomes have no influence whatsoever on the result of the tenth
toss. Hence, the probability that the outcome will be a head on the tenth toss is still
1/2.
On the other hand, consider drawing two cards from a pack of 52 playing
cards. The probability that the second card will be an ace would depend upon whether
the first card was an ace or not. Hence, these two events are not independent
events.
or P[A/B] = P[AB]/P[B] = (1/6)/(2/6) = 1/2

But for independent events, P[AB] = P[A] P[B]. Thus, substituting this relationship
in the formula for conditional probability, we get,

P[A/B] = P[AB]/P[B] = P[A]P[B]/P[B] = P[A]
This means that P[A] will remain the same no matter what the outcome of event B
is. For example, if we want to find out the probability of a head on the second toss
of a fair coin, given that the outcome of the first toss was a head, this probability
would still be 1/2 because the two events are independent events and the outcome
of the first toss does not affect the outcome of the second toss.
Solution: Based upon this information, the probability that a student picked up at
random will be female is 30/50 or 0.6, since there are 30 females in the total class of
50 students. Now suppose that we are given additional information that the person
picked up at random is Indian, then what is the probability that this person is a
female? This additional information will result in revised probability or posterior
probability in the sense that it is assigned to the outcome of the event after this
additional information is made available.
We are interested in the revised probability of picking a female student
at random, given that we know that the student is Indian. Let A1 be the event
female, A2 be the event male and B be the event Indian. Then, based upon our
knowledge of conditional probability, Bayes' theorem can be stated as follows,

P(A1/B) = P(A1)P(B/A1) / [P(A1)P(B/A1) + P(A2)P(B/A2)]
In the example discussed above, there are 2 basic events which are A1 (female) and
A2 (male). However, if there are n basic events, A1, A2, ......An, then Bayes’ theorem
can be generalized as,
P(A1/B) = P(A1)P(B/A1) / [P(A1)P(B/A1) + P(A2)P(B/A2) + ... + P(An)P(B/An)]
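The generalized formula translates directly into a small helper function. In the sketch below, the priors P(female) = 0.6 and P(male) = 0.4 come from the class example above, but the likelihoods P(Indian/female) and P(Indian/male) are made-up numbers used only to exercise the formula.

```python
def bayes(priors, likelihoods):
    """Posteriors P(Ai/B) from priors P(Ai) and likelihoods P(B/Ai)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)          # P(B), by the law of total probability
    return [j / total for j in joints]

# Likelihood values below are hypothetical, chosen only for illustration.
print(bayes(priors=[0.6, 0.4], likelihoods=[0.5, 0.25]))
# -> [0.75, 0.25]: P(female/Indian) = 0.75 under these assumptions
```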
Solution: The businessman has also assigned ‘prior probabilities’ to the demand
structure or rooms. These probabilities reflect the initial judgement of the
businessman based upon his intuition and his degree of belief regarding the outcomes
of the states of nature.
Based upon these values, the expected payoffs for various rooms can be computed
as follows,
EV (50) = ( 25 × 0.2) + (35 × 0.5) + (50 × 0.3) = 37.50
EV (100) = (–10 × 0.2) + (40 × 0.5) + (70 × 0.3) = 39.00
EV (150) = (–30 × 0.2) + (20 × 0.5) + (100 × 0.3) = 34.00
This gives us the maximum payoff of $39,000 for building a 100-room hotel.
                           X1      X2      X3
States of Nature    (A1)   0.5     0.3     0.2
(Demand)            (A2)   0.2     0.6     0.2
                    (A3)   0.1     0.3     0.6
The values in the preceding table are conditional probabilities and are interpreted
as follows:
The upper north-west value of 0.5 is the probability that the consultant’s
prediction will be for low demand (X1) when the demand is actually low. Similarly,
the probability is 0.3 that the consultant’s estimate will be for medium demand (X2)
when in fact the demand is low, and so on. In other words, P(X1/A1) = 0.5 and
P(X2/A1) = 0.3. Similarly, P(X1/A2) = 0.2 and P(X2/A2) = 0.6, and so on.
Our objective is to obtain posteriors which are computed by taking the
additional information into consideration. One way to reach this objective is to first
compute the joint probability which is the product of prior probability and conditional
probability for each state of nature. Joint probabilities as computed is given as,
Now, the posterior probabilities for each state of nature Ai are calculated as follows:

P(Ai/X) = P(Ai)P(X/Ai) / Σj P(Aj)P(X/Aj)

i.e., each joint probability divided by the sum of the joint probabilities.
By using this formula, the joint probabilities are converted into posterior probabilities
and the computed table for these posterior probabilities is given as,
States of Nature Posterior Probabilities
Now, we have to compute the expected payoffs for each course of action with the
new posterior probabilities assigned to each state of nature. The net profits for each
course of action for a given state of nature is the same as before and is restated
below. These net profits are expressed in thousands of dollars.
Let Oij be the monetary outcome of course of action (i) when (j) is the corresponding
state of nature, so that in the above case O11 will be the outcome of course of action
R1 and state of nature A1, which in our case is $25,000. Similarly, O21 will be the
outcome of course of action R2 and state of nature A1, which in our case is –$10,000, and so
on. The expected value EV (in thousands of dollars) is calculated on the basis of the
actual state of nature that prevails as well as the estimate of the state of nature as
provided by the consultant. These expected values are calculated as follows,
Course of action = Ri
Estimate of consultant = Xi
Actual state of nature = Ai
Where, i = 1, 2, 3
Then,
(A) Course of action = R1 = Build a 50-room hotel

EV(R1/X1) = Σi P(Ai/X1) · Oi1
= 0.435(25) + 0.435(–10) + 0.130(–30)
= 10.875 – 4.35 – 3.9 = 2.625

EV(R1/X2) = Σi P(Ai/X2) · Oi1
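The posterior and expected-value arithmetic for the X1 branch can be verified with a short script; the priors, the P(X1/Ai) column and the payoffs are the figures used above, while the arrangement of the computation is ours.

```python
priors = [0.2, 0.5, 0.3]           # P(A1), P(A2), P(A3)
p_x1_given_a = [0.5, 0.2, 0.1]     # consultant predicts low demand X1
payoffs = [25, -10, -30]           # the O_i1 column, in $ thousands

joints = [p * c for p, c in zip(priors, p_x1_given_a)]
posteriors = [j / sum(joints) for j in joints]   # 0.435, 0.435, 0.130
ev = sum(p * o for p, o in zip(posteriors, payoffs))
print([round(p, 3) for p in posteriors], round(ev, 3))
# Exact value is 60/23 = 2.609; the 2.625 above uses rounded posteriors.
```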
3. Two events are said to be mutually exclusive, if both events cannot occur at
the same time as the outcome of a single experiment. For example, if we toss
a coin, then either event head or event tail would occur, but not both. Hence,
these are mutually exclusive events.
Check Your Progress-2
1. The addition rule states that when two events are mutually exclusive, then
the probability that either of the events will occur is the sum of their separate
probabilities. For example, if you roll a single die then the probability that it
will come up with a face 5 or face 6, where event A refers to face 5 and event
B refers to face 6, both events being mutually exclusive events, is given by,
P[A or B] = P[A] + P[B]
Or, P[5 or 6] = P[5] + P[6]
= 1/6 +1/6
= 2/6 = 1/3
P[A or B] is written as P[A ∪ B] and is known as P[A union B].
2. The multiplication rule is applied when it is necessary to compute the
probability in case two events occur at the same time.
3. The Bayes’ theorem on probability gives a method for estimating the
probability of causes which are responsible for the outcome of an observed
effect. The theorem contributes to the statistical decision theory in revising
prior probabilities of outcomes of events based upon the observation and
analysis of additional information.
Short-Answer Questions
1. Explain the concept of probability.
2. What are the different theories of probability? Explain briefly.
3. What are mutually exclusive events?
4. What do you mean by simple probability?
5. Explain the axiomatic approach to probability.
6. Explain the concept of multiplication rule.
7. What is Bayes’ theorem? What is its importance in statistical calculations?
Long-Answer Questions
1. A family plans to have two children. What is the probability that both children
will be boys? (List all the possibilities and then select the one which would
be two boys.)
2. A wheel of fortune has numbers 1 to 40 painted on it, each number being at
equal distance from the other so that when the wheel is rotated, there is the
same chance that the pointer will point at any of these numbers. Tickets have
been issued to contestants numbering 1 to 40. The number at which the wheel
stops after being rotated would be the winning number. What is the probability
that,
(i) Ticket number 29 wins
(ii) A person who bought 5 tickets numbered 18 to 22 (inclusive), wins the
prize
3. Two fair dice are rolled. Comment on the probability of getting:
(i) A sum of 10 or more.
(ii) A pair of which at least one number is 3.
(iii) A sum of 8, 9, or 10.
(iv) One number less than 4.
4. An urn contains 12 white balls and 8 red balls. Two balls are to be selected in
succession, at random and without replacement. What is the probability that,
(i) Both balls are white.
(ii) The first ball is white and the second ball is red.
(iii) One white ball and one red ball are selected.
Would the probabilities change if the first ball after being identified is put
back in the urn before the second ball is selected?
5. 200 students from the college were surveyed to find out if they were taking
any of the Management, Marketing or Finance courses. It was found that 80
of them were taking Management courses, 70 of them were taking Marketing
courses and 50 of them were taking Finance courses. It was also found that
30 of them were taking Management and Marketing courses, 30 of them
were taking Management and Finance courses and 25 of them were taking
Marketing and Finance courses. It was further determined that 20 of these
students were taking courses in all the three areas. What is the probability
that a particular student is not taking any course in any of these areas?
6. A family plans to have three children. List all the possible combinations and
find the probability that all the three children will be boys.
7. A movie house is filled with 700 people and 60 per cent of these people are
females. 70 per cent of these people are seated in the no smoking area including
300 females. What is the probability that a person picked up at random in the
movie house is:
(i) A male.
(ii) A female smoker.
(iii) A male or a non-smoker.
(iv) A smoker if we knew that the person is a male.
(v) Are the events sex and smoking statistically independent?
8. If a fair die is rolled once, state the probability of getting:
(i) An odd number
(ii) A number greater than 3
9. In a computer course, the probability that a student will get an A is 0.09. The
probability that he will get a B grade is 0.15 and the probability that he will
get a C grade is 0.45. What is the probability that the student will get either a
D or an F grade?
10. An urn contains 12 white balls and 8 red balls. Two balls are to be selected in
succession, at random and without replacement. What is the probability that:
(i) Both balls are white.
(ii) The first ball is white and the second ball is red.
(iii) One white ball and one red ball are selected.
(iv) Would the probabilities change if the first ball after being identified is
put back in the urn before the second ball is selected?
11. In a statistics class, the probability that a student picked up at random comes
from a two parent family is 0.65, and the probability that he will fail the exam
is 0.20. What is the probability that such a randomly selected student will be
a low achiever given that he comes from a two parent family?
12. The following is a breakdown of faculty members in various ranks at the
college.
Rank Number of Males Number of Females
Professor 20 12
Assoc. Professor 18 20
Asst. Professor 25 30
What is the probability that a faculty member selected at random is:
(i) A female.
(ii) A female professor.
(iii) A female given that the person is a professor.
(iv) A female or a professor.
(v) A professor or an assistant professor.
(vi) Are the events of being a male and being an associate professor
statistically independent events?
UNIT 2 PROBABILITY
DISTRIBUTIONS
UNIT STRUCTURE
2.1 Learning Objectives
2.2 Introduction
2.3 Random Variable
2.3.1 Techniques of Assigning Probabilities
2.3.2 Mean of Random Variable or the Expected Value of Random Variable
2.3.3 Variance and Standard Deviation of Random Variable
2.4 Expectation and Variance
2.4.1 Expected Value of a Function of X; 2.4.2 Variance
2.4.3 Expectation (Conditional); 2.4.4 Expectation (Iterated)
2.4.5 Expectation: Continuous Variables
2.5 Continuous and Discrete Probability Functions
2.6 Discrete Probability Distributions
2.6.1 Binomial Distribution—Properties
2.6.2 Applications of Binomial Distribution
2.6.3 Poisson Distribution—Properties
2.6.4 Applications of Poisson Distribution
2.7 Continuous Probability Distributions
2.8 Normal Distribution
2.8.1 Characteristics of Normal Distribution
2.8.2 Family of Normal Distributions
2.8.3 How to Measure the Area under the Normal Curve
2.9 Let Us Sum Up
2.10 Further Reading
2.11 Answers to Check Your Progress
2.12 Possible Questions
2.13 Learning Outcomes
2.2 INTRODUCTION
Randomness means each possible entity has the same chance of being considered.
A random variable may be qualitative or quantitative in nature. In this unit, you will
study probability distribution, which means listing of all possible outcomes of an
experiment together with their probabilities. It may be discrete or continuous. You
will also study the concept of expectation. An expected value is the sum of the products of each
possible outcome and its probability of occurrence. Expectation may
be conditional or iterated. You will study the moment generating function
(MGF), which generates the moments for the probability distribution of a random
variable.
All these above stated random variable assignments cover every possible
outcome and each numerical value represents a unique set of outcomes. A random
variable can be either discrete or continuous. If a random variable is allowed to
take on only a limited number of values, it is a discrete random variable but if it is
allowed to assume any value within a given range, it is a continuous random variable.
Random variables presented in the above table are examples of discrete random
variables. We can have continuous random variables if they can take on any value
within a range of values, for example, within 2 and 5, in that case we write the
values of a random variable x as follows:
2 ≤ x ≤ 5
There are three techniques of assignment of probabilities to the values of the random
variable:
(a) Subjective probability assignment. It is the technique of assigning
probabilities on the basis of personal judgement. Such assignment may differ
from individual to individual and depends upon the expertise of the person
assigning the probabilities. It cannot be termed as a rational way of assigning
probabilities but is used when the objective methods cannot be used for one
reason or the other.
(b) A-priori probability assignment. It is the technique under which the
probability is assigned by calculating the ratio of the number of ways in
which a given outcome can occur to the total number of possible outcomes.
The basic underlying assumption in using this procedure is that every
possible outcome is likely to occur equally. But at times, the use of this
technique gives ridiculous conclusions. For example, we have to assign
probability to the event that a person of age 35 will live upto age 36. There
are two possible outcomes, he lives or he dies. If the probability assigned
in accordance with a-priori probability assignment is half then the same
may not represent reality. In such a situation, probability can be assigned
by some other techniques.
(c) Empirical probability assignment. It is an objective method of assigning
probabilities and is used by decision-makers. Using this technique the
probability is assigned by calculating the relative frequency of occurrence of
a given event over an infinite number of occurrences. However, in practice
only a finite (perhaps very large) number of cases are observed and relative
frequency of the event is calculated. The probability assignment through this
technique may as well be unrealistic, if future conditions do not happen to be
a reflection of the past.
Thus, what constitutes the ‘best’ method of probability assignment can only
be judged in the light of what seems best to depict reality. It depends upon the
nature of the problem and also on the circumstances under which the problem
is being studied.
2.3.2 Mean of Random Variable or the Expected Value of Random
Variable
Mean of random variable is the sum of the values of the random variable weighted
by the probability that the random variable will take on the value. In other words,
it is the sum of the product of the different values of the random variable and
their respective probabilities. Symbolically, we write the mean of a random variable,
say X, as X̄. The expected value of the random variable is the average value that
would occur if we have to average an infinite number of outcomes of the random
variable. In other words, it is the average value of the random variable in the
long run. The expected value of a random variable is calculated by weighting
each value of a random variable by its probability and summing over all
values. The symbol for the expected value of a random variable is E (X).
and,

E(X) = Σ (i = 1 to n) Xi · pr(Xi)

Thus, the mean and the expected value of a random variable are conceptually
and numerically the same, but are usually denoted by different symbols; as such,
the two symbols, viz., X̄ and E(X), are completely interchangeable. We can,
therefore, express the two as follows:

E(X) = Σ (i = 1 to n) Xi · pr(Xi) = X̄
The variance of the number of heads for two coins is double the variance
of the number of heads for one coin.
For a single fair die,
E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 7/2 or 3.5
Hence, the expectation is 3.5, which is also halfway between the possible values
the die can take, and so this is what you should have expected.
2.4.1 Expected Value of a Function of X
To find E[ f(X) ], where f(X) is a function of X, we use the following formula:
E[f(X)] = Σ f(x)P(X = x)
Let us consider the above example of die, and calculate E(X2)
Using the notation above, f(x) = x2
f(1) = 1, f(2) = 4, f(3) = 9, f(4) = 16, f(5) = 25, f(6) = 36
P(X = 1) = 1/6, P(X = 2) = 1/6, etc.
Hence, E(X2) = 1/6 + 4/6 + 9/6 + 16/6 + 25/6 + 36/6 = 91/6 = 15.167
The expected value of a constant is just the constant, as for example E(1) = 1.
Multiplying a random variable by a constant multiplies the expected value by that
constant.
Therefore, E[2X] = 2E[X]
An important formula, where a and b are constants, is:
E[aX + b] = aE[X] + b
Hence, we can say that the expectation is a linear operator.
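These expectation rules are easy to verify numerically. The sketch below recomputes E(X), E(X²) and, using the identity Var(X) = E(X²) – [E(X)]² (introduced formally in the next section), the variance of a fair die; exact fractions avoid rounding error.

```python
from fractions import Fraction

values = range(1, 7)          # faces of a fair die
p = Fraction(1, 6)            # each face equally likely

ex = sum(x * p for x in values)          # E(X)   = 7/2
ex2 = sum(x * x * p for x in values)     # E(X^2) = 91/6
var = ex2 - ex**2                        # Var(X) = 35/12
print(ex, ex2, var)
```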
2.4.2 Variance
The variance of a random variable tells us something about the spread of the possible
values of the variable. For a discrete random variable X, the variance of X is written
as Var(X).
Var(X) = E[(X – )2]
E[ X ] = z xp( x)dx
If we consider two random variables X and Y (not necessarily independent), then
their combined behaviour is described by their joint probability density function
p(x, y) and is defined as:
E[ X ] = z +
E[ X |Y = y]. f ( y).dy
E(x) = x P(x)dx = u
–
Example 2.4: A newspaper boy earns 100 a day if there is suspense in the news.
He loses 10 a day if it is an eventless newspaper. What is the expectation of his
earnings if the probability of suspense news is 0.4?
Solution: E(x) = p1x1 + p2x2
= 0.4 × 100 – 0.6 × 10
= 40 – 6 = 34
Example 2.5: A player tossing three coins earns 10 for 3 heads, 6 for 2 heads
and 2 for 1 head. He loses 25 if 3 tails appear. Find his expectation.
Solution: p(HHH) = (1/2)(1/2)(1/2) = 1/8 = p1, say (3 heads)
p(HHT) = 3C2 (1/2)(1/2)(1/2) = 3/8 = p2, say (2 heads, 1 tail)
p(HTT) = 3C1 (1/2)(1/2)(1/2) = 3/8 = p3, say (1 head, 2 tails)
p(TTT) = (1/2)(1/2)(1/2) = 1/8 = p4, say (3 tails)
E(x) = p1x1 + p2x2 + p3x3 – p4x4
= (1/8)(10) + (3/8)(6) + (3/8)(2) – (1/8)(25)
= 9/8 = 1.125
Example 2.6: A and B roll a die. Whoever gets a 6 first wins 550. Find their
individual expectations if A makes a start. What will be the answer if B makes a
start?
Solution: A may get 6 in the 1st trial: p1 = 1/6.
A may not get 6 in the 1st, B may not get 6 in the 2nd, and A may get 6 in the 3rd:
p3 = (5/6)(5/6)(1/6) = (5/6)²(1/6), and so on.

A's winning chance = 1/6 + (5/6)²(1/6) + (5/6)⁴(1/6) + ...
= (1/6) × 1/(1 – (5/6)²) = 6/11

B's winning chance = 1 – 6/11 = 5/11

A wins 550 with probability p = 6/11
A gets nothing if he loses, with probability q = 5/11
Expectation of A = p·x + q·0 = (6/11)(550) + (5/11)(0) = 300

Similarly, B wins with p = 5/11 and gets nothing if he loses, with q = 6/11.
Expectation of B = (5/11)(550) + (6/11)(0) = 250

If B starts first, his expectation would be 300 and A's would be 250.
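The 6/11 result can also be checked by simulation. A rough sketch, assuming A always rolls first; the seed and number of repetitions are arbitrary choices.

```python
import random

random.seed(1)
trials = 200_000
wins_a = 0
for _ in range(trials):
    a_turn = True                    # A starts
    while True:
        if random.randint(1, 6) == 6:
            wins_a += a_turn         # count a win when it is A's turn
            break
        a_turn = not a_turn          # pass the die to the other player

p_a = wins_a / trials
print(round(p_a, 4), round(550 * p_a, 1))   # ~0.5455 and ~300
```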
Example 2.7: Calculate the standard deviation (S.D.) when x takes the values 0, 1,
3 and 9 with probabilities 0.4, 0.2, 0.3 and 0.1.
Solution: x takes the values 0, 1, 3, 9 with probabilities 0.4, 0.2, 0.3, 0.1.
μ = E(x) = Σ xi pi = 0 × 0.4 + 1 × 0.2 + 3 × 0.3 + 9 × 0.1 = 2.0
E(x²) = Σ xi² pi = 0² × 0.4 + 1² × 0.2 + 3² × 0.3 + 9² × 0.1 = 11.0
V(x) = E(x²) – μ² = 11 – 4 = 7
S.D.(x) = √7 ≈ 2.65
Example 2.8: The purchase of some shares can give a profit of 400 with probability
1/100 and 300 with probability 1/20. Comment on a fair price of the share.
Solution: Expected value E(x) = Σ xi pi = 400 × (1/100) + 300 × (1/20) = 4 + 15 = 19.
Hence, 19 is a fair price for the share.
Example 2.9: Find the variance and standard deviation (S.D.) of the following
probability distribution:
xi    1     2     3     4
pi    0.1   0.3   0.2   0.4
The probability that x lies between any two numbers a and b is
P(a ≤ x ≤ b) = ∫ (a to b) f(x) dx.
The total integral of f must be 1. The converse is also true: if a function f is
non-negative and its total integral equals 1, then f represents the probability density
of some probability distribution.
A probability density function is, in fact, a refined version of a histogram with
very small or infinitesimal intervals; due to this, the curve is smooth. If we sample
many values of a continuous random variable and plot a histogram of the results
over very narrow output ranges, the histogram approximates the probability density.
A function can serve as a probability density function if and only if it is non-negative
and the area under its graph is 1. Putting it mathematically, with the logical connective
showing 'conjunction', this is given as:

f(x) ≥ 0 for all x, and ∫ (−∞ to +∞) f(x) dx = 1

Not all distributions have a density function. A distribution is said to have a density
function f(x) only if its CDF, denoted as F(x), is continuous.
This is expressed mathematically as:

f(x) = d/dx F(x)
Link between discrete and continuous distributions
A probability density function describes the association of variable in a distribution
that is continuous: taking a set of two state variables in the interval [a, b].
Discrete random variables may be represented in probability distribution using
a delta function. If we consider a binary discrete random variable by taking two
distinct values, say 1 and 2, which are equally likely, the probability density of such
a variable is given by:

f(t) = (1/2)(δ(t – 1) + δ(t – 2))

We may generalize this as follows: if a discrete variable assumes n different
values in the set of real numbers, then the associated probability density function is
given by:

f(t) = Σ (i = 1 to n) Pi δ(t – xi)

where x1, ..., xn stand for the discrete values of the variable and Pi
(i = 1, 2, ..., n) are the probabilities associated with these values.
This representation is used to obtain the characteristics of a distribution, such as
its mean, variance and kurtosis, and to describe mathematically the characteristics
of Brownian movement and to decide on its initial configuration.
Probability functions associated with multiple variables
In the case of continuous random variables Xi, where i = 1, 2, 3, ..., n, one can
define a PDF for the whole set. This is termed the joint PDF, defined as a function
of the n variables. The density of any one variable taken on its own is known by a
different name: the marginal density function (MDF).
Independence
Continuous random variables X1, ..., Xn are independent of each other
if and only if fX1,...,Xn(x1, ..., xn) = fX1(x1) · ... · fXn(xn).
Corollary
If the joint probability density function of a vector of n random variables
can be factored into a product of n functions of one variable each,
fX1,...,Xn(x1, ..., xn) = f1(x1) · ... · fn(xn), then all these variables are
independent of each other, and the marginal probability density function
of each is expressed as:

fXi(xi) = fi(xi) / ∫ fi(x) dx
The density of the sum of two independent random variables u and v is the
convolution of their densities: fu+v(x) = ∫ fu(y) fv(x – y) dy
When a random variable x takes discrete values x1, x2,.... , xn with probabilities p1,
p2,.. ,pn, we have a discrete probability distribution of X.
The function p(x) for which X = x1, x2,..., xn takes values p1, p2, ... ,pn, is the
probability function of X.
The variable is discrete because it does not assume all values. Its properties
are:
p(xi) = Probability that X assumes the value xi = Prob(x = xi) = pi
p(x) ≥ 0, Σ p(x) = 1
For example, four coins are tossed and the number of heads X noted. X can take
value 0, 1, 2, 3, 4 heads.
p(X = 0) = (1/2)⁴ = 1/16
p(X = 1) = 4C1 (1/2)(1/2)³ = 4/16
p(X = 2) = 4C2 (1/2)²(1/2)² = 6/16
p(X = 3) = 4C3 (1/2)³(1/2) = 4/16
p(X = 4) = 4C4 (1/2)⁴(1/2)⁰ = 1/16

[Bar chart: p(x) against the number of heads x = 0, 1, 2, 3, 4, with heights 1/16, 4/16, 6/16, 4/16 and 1/16]

Σ p(x) = 1/16 + 4/16 + 6/16 + 4/16 + 1/16 = 1
This is a discrete probability distribution.
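The same distribution can be generated mechanically from the binomial formula; a small sketch using Python's standard library:

```python
from fractions import Fraction
from math import comb

# p(x) = 4Cx (1/2)^x (1/2)^(4-x) for the number of heads in 4 tosses
for x in range(5):
    print(x, comb(4, x) * Fraction(1, 2)**4)    # 1/16, 1/4, 3/8, 1/4, 1/16
assert sum(comb(4, x) for x in range(5)) == 16  # probabilities sum to 1
```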
Example 2.12: If a discrete variable X has the following probability function, then
find (i) a, (ii) p(X ≤ 3), (iii) p(X > 3).
Solution:
xi    p(xi)
0     0
1     a
2     2a
3     2a²
4     4a²
5     2a
Since Σ p(x) = 1, 0 + a + 2a + 2a² + 4a² + 2a = 1
6a² + 5a – 1 = 0, so that (6a – 1)(a + 1) = 0
a = 1/6 or a = –1 (not admissible)
For a = 1/6, p(X ≤ 3) = 0 + a + 2a + 2a² = 2a² + 3a = 5/9
p(X > 3) = 4a² + 2a = 4/9
Discrete Distributions
There are several discrete distributions. Some other discrete distributions are
described below.
Uniform or rectangular distribution
Each possible value of the random variable x has the same probability in the uniform
distribution. If x takes values x1, x2, ..., xk, then,

p(xi, k) = 1/k

The numbers on a die follow the uniform distribution,

p(xi, 6) = 1/6 (Here, x = 1, 2, 3, 4, 5, 6)
Bernoulli trials
In a Bernoulli experiment, an event E either happens or does not happen (E′).
Examples are getting a head on tossing a coin, getting a six on rolling a die, and so
on.
The Bernoulli random variable is written,
X = 1 if E occurs
  = 0 if E′ occurs
Since there are two possible values, it is a case of a discrete variable where,
Probability of success = p = p(E)
Probability of failure = 1 – p = q = p(E′)
We can write,
For k = 1, f(k) = p
For k = 0, f(k) = q
For k = 0 or 1, f(k) = p^k q^(1–k)
Negative binomial
In this distribution, the variance is larger than the mean.
Suppose the probability of success p in a series of independent Bernoulli
trials remains constant, and suppose the rth success occurs after x failures in
x + r trials. Then:
1. The probability of the success of the last trial is p.
2. The number of remaining trials is x + r – 1, in which there should be r – 1
successes. The probability of r – 1 successes is given by,

(x+r–1)C(r–1) p^(r–1) q^x

The combined probability of cases (1) and (2) happening together is,

p(x) = (x+r–1)C(r–1) p^r q^x,  x = 0, 1, 2, ...

E(x) = Mean = rq/p
Variance = rq/p²
Example 2.13: Find the expectation of the number of failures preceding the first
success in an infinite series of independent trials with constant probability p of
success.
Solution: The probability that the first success occurs in the,
1st trial = p (Success at once)
2nd trial = qp (One failure, then success)
3rd trial = q²p (Two failures, then success, and so on)
The expected number of failures preceding the first success is therefore,
E(x) = 0·p + 1·qp + 2·q²p + ... = qp(1 + 2q + 3q² + ...) = qp/(1 – q)² = q/p
Hypergeometric distribution
If a sample of n items is drawn without replacement from a population of N items
of which N1 are successes, the probability of x successes in the sample is,

p(x) = [N1Cx · (N–N1)C(n–x)] / NCn  (x = 0, 1, 2, ..., n)

Here, x is the number of successes in the sample and n – x is the number of
failures in the sample.
It can be shown that,
Mean: E(X) = n(N1/N)
Variance: Var(X) = [(N – n)/(N – 1)] · n · (N1/N) · (1 – N1/N)
Example 2.14: There are 20 lottery tickets with three prizes. Find the probability
that out of 5 tickets purchased exactly two prizes are won.
Solution: We have N1 = 3, N2 = N – N1 = 17, x = 2, n = 5.
p(2) = (3C2 · 17C3) / 20C5
The probability of no prize, p(0) = (3C0 · 17C5) / 20C5
The probability of exactly 1 prize, p(1) = (3C1 · 17C4) / 20C5
Example 2.15: Examine the nature of the distribution when r balls are drawn, one at
a time without replacement, from a bag containing m white and n black balls.
Solution: It is the hypergeometric distribution. It corresponds to the probability
that x balls will be white out of r balls so drawn and is given by,

p(x) = (mCx · nC(r–x)) / (m+n)Cr
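For a numerical check of Example 2.14, the hypergeometric probabilities can be evaluated directly; the function below is our own small helper.

```python
from math import comb

def hyper_pmf(x, N, N1, n):
    # P(x successes) when n items are drawn from N containing N1 successes
    return comb(N1, x) * comb(N - N1, n - x) / comb(N, n)

for x in (0, 1, 2):
    print(x, round(hyper_pmf(x, 20, 3, 5), 4))   # 0.3991, 0.4605, 0.1316
```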
Multinomial
There are k possible outcomes of trials, viz., x1, x2, ..., xk with probabilities p1, p2, ...,
pk, and n independent trials are performed. The multinomial distribution gives the
probability that out of these n trials, x1 occurs n1 times, x2 occurs n2 times, and so on.
This is given by,

p = [n! / (n1! n2! ... nk!)] p1^n1 p2^n2 ... pk^nk,

where Σ (i = 1 to k) ni = n
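A direct implementation of the multinomial formula; the die example at the end is our own illustration.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # n! / (n1! n2! ... nk!) * p1^n1 * ... * pk^nk
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)   # exact integer division at every step
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

# e.g., in 6 rolls of a fair die: two 1s, two 2s and two 3s
print(multinomial_pmf([2, 2, 2, 0, 0, 0], [1/6] * 6))   # ~0.00193
```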
[Graph: binomial distribution skewed to the right (p smaller than 0.5) — probability against the number of successes, 0 to 8]
(b) When p is equal to 0.5, the binomial distribution is symmetrical and the graph
takes the form as shown in the following figure.
[Graph: symmetrical binomial distribution — probability against the number of successes, 0 to 8]
(c) When p is larger than 0.5, the binomial distribution is skewed to the left and
the graph takes the form as shown in the following figure.
[Graph: binomial distribution skewed to the left — probability against the number of successes, 0 to 8]
But if ‘p’ stays constant and ‘n’ increases, then as ‘n’ increases the vertical
lines become not only numerous but also tend to bunch up together to form a bell
shape, i.e., the binomial distribution tends to become symmetrical and the graph
takes the shape as shown in the following figure.
[Graph: bell-shaped binomial distribution for large n — probability against the number of successes, 0, 1, 2, ...]
Kurtosis = 3 + (1 – 6pq) / (n·p·q)

The mean of the binomial distribution1 is given by n·p = 10 × (1/2) = 5 and the
variance of this distribution is equal to n·p·q = 10 × (1/2) × (1/2) = 2.5.
These values are exactly the same as we have found them in the above table.
Hence, these values stand verified with the calculated values of the two
measures as shown in the table.
2.6.2 Applications of Binomial Distribution
Binomial distribution is one of the simplest and most frequently used discrete
probability distributions and is very useful in many practical situations involving
either/or types of events. It has certain distinct properties which are enumerated as
follows:
1. It describes the distribution of probabilities where there are only two mutually
exclusive outcomes for each trial of an experiment. For example, when tossing
a coin, there are only two possible outcomes, namely a head and a tail, which
are both mutually exclusive events. Similarly, when checking the quality of a
product, we can see that either the item is good or it is defective. These two
possible outcomes are denoted as success or failure. Success is simply the
outcome in which we are interested. Similarly, if we are interested in the
probability of a head in the toss of a coin, then the outcome tail will be
considered a failure. The probability of success is symbolized by p and the
probability of failure is symbolized by q, which is also (1 – p).
2. Each trial is independent of other trials. This means that the outcome of any
trial is independent of the outcome of any other trial.
3. The probability of success p remains constant from trial to trial. Similarly,
the probability of failure q or (1 – p) remains constant over all observations.
4. The process is performed under the same conditions for a fixed and finite
number of trials, say (n).
The concept of binomial distribution may be clear with the following case:
Suppose we toss a coin five times. The outcome head is designated as success
and outcome tail as failure, with probability of success and failure being respectively
1. The value of the binomial probability function for various values of n and p are also available in
tables (known as binomial tables) which can be used for the purpose to ease calculation work. The
tables are of considerable help particularly when n is large. (See Appendix)
(p) and (q). Suppose that the following was the sequence of outcomes of these
tosses.
H, T, H, T, T
This means that there are two successes and three failures in the given order
as above. The probability of this sequence of outcomes can be found by the
multiplication rule of joint probability for independent events and is given as:
p · q · p · q · q = p²q³
However, if we are not concerned with any particular sequence of the outcome,
but only in the outcome of two successes in any order out of the five tosses, then it
can be shown that there are ten different ways of obtaining two heads out of five
tosses. By applying the addition rule of probability, we can see that the probability
of getting any sequence with two heads and three tails would be ten times the
probability of the single sequence obtained above.
Accordingly, the probability of any two heads in 5 tosses would be,
10 × p²q³
In our case, we have,
p = 0.5
q = (1 – p) = 0.5
x = number of successes desired = 2
n = number of trials undertaken = 5
(n – x) = number of failures
Hence, the probability of x successes in n trials is,

P(x) = [n! / (x!(n – x)!)] (p)^x (q)^(n–x)

This expression is known as the binomial formula.
Based upon this formula, the probability in our example of two heads in five
tosses can be calculated as follows,
P(2) = [5! / (2!(5 – 2)!)] (0.5)²(0.5)³
= [(5 × 4 × 3 × 2 × 1) / ((2 × 1)(3 × 2 × 1))] (0.5)²(0.5)³
= 10 (0.5)²(0.5)³
= 0.3125
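The binomial formula packaged as a small function, reproducing the result above:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial formula: P(x) = nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 5, 0.5))   # 0.3125, two heads in five tosses
```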
Example 2.17: If a new drug is found to be effective 40% of the time, then what is
the probability that in a random sample of 4 patients, it will be effective on 2 of
them?
Solution: Let us define effective as success and non-effective as failure.
Then,
p = 0.4 (since the drug is effective 40% of the time)
q = (1 – p) = (1 – 0.4) = 0.6
x=2
n=4
Now,
P(x) = [n! / (x!(n – x)!)] (p)^x (q)^(n–x)
P(2) = [4! / (2!(4 – 2)!)] (0.4)²(0.6)²
= 6 × 0.16 × 0.36
= 0.3456
2.6.3 Poisson Distribution—Properties
Poisson distribution is also a discrete probability distribution, associated with
the name of the Frenchman Simeon Denis Poisson, who developed this distribution.
This distribution is frequently used in the context of operations research and for this
reason has a great significance for management people. This distribution plays an
important role in Queuing theory, Inventory control problems and also in Risk
models.
Poisson process
The distribution applies in case of Poisson process which has the following
characteristics:
• Concerning a given random variable, the mean relating to a given interval
can be estimated on the basis of past data concerning the variable under
study.
• If we divide the given interval into very small intervals, we will find:
(a) The probability that exactly one event will happen during a very
small interval is a very small number and is constant for every other
very small interval.
(b) The probability that two or more events will happen within a very small
interval is so small that we can assign it a zero value.
(c) The event that happens in a given very small interval is independent of
when that very small interval falls within the given interval.
(d) The number of events in any small interval is not dependent on the
number of events in any other small interval.
Hence, mean = λ = n·p.

3. There are tables which give the e^(–λ) values and the (λ^x · e^(–λ))/x! values for
x = 0, 1, 2, ... for a given λ, and thus facilitate the calculation.
4. The variance of the binomial distribution is n·p·q and the variance of the Poisson distribution is λ.
Therefore, λ = n·p·q. Since q is almost equal to unity and, as pointed out earlier, n·p = λ in the Poisson
distribution, the variance of the Poisson distribution is also λ.
P(with 0 defects, i.e., x = 0) = (2⁰ · e⁻²)/0! = 0.13534/1 = 0.13534
P(with 3 defects, i.e., x = 3) = (2³ · e⁻²)/3! = (8 × 0.13534)/(3 × 2 × 1) = 0.54136/3 = 0.18045
P(with 4 defects, i.e., x = 4) = (2⁴ · e⁻²)/4! = (16 × 0.13534)/(4 × 3 × 2 × 1) = 0.27068/3 = 0.09023
Example 2.19: How would you use a Poisson distribution to find approximately
the probability of exactly 5 successes in 100 trials, the probability of success in
each trial being p = 0.1?
Solution: In the question we have been given,
n = 100 and p = 0.1
= n.p = 100 × 0.1 = 10
To find the required probability, we can use the Poisson probability function as an
approximation to the binomial probability function, as shown below:

f(Xi = x) = (λ^x · e^(–λ))/x! = ((n·p)^x · e^(–n·p))/x!

or P(5) = (10⁵ · e⁻¹⁰)/5! = (100000 × 0.00005)/(5 × 4 × 3 × 2 × 1) = 5/120 = 1/24 = 0.042
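The quality of the approximation can be judged by computing both probabilities exactly; note that the 0.042 above comes from using the rounded value e⁻¹⁰ ≈ 0.00005, while the sketch below uses full precision.

```python
from math import comb, exp, factorial

n, p, x = 100, 0.1, 5
lam = n * p                                   # lambda = np = 10

exact = comb(n, x) * p**x * (1 - p)**(n - x)  # binomial probability
approx = lam**x * exp(-lam) / factorial(x)    # Poisson approximation
print(round(exact, 4), round(approx, 4))      # 0.0339 vs 0.0378
```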
2.6.4 Applications of Poisson Distribution
Poisson distribution is another theoretical discrete distribution, which is useful for
modelling certain real situations. It differs from the binomial distribution in the
sense that in the binomial distribution we must be able to count the number of
successes and the number of failures, while in Poisson distribution, all we want to
know is the average number of successes in a given unit of time or space. In many
situations, it is not possible to count the number of failures even though we can
know the number of successes. For example, in the case of patients coming to the
hospital for emergency treatment, we can always count the number of patients
arriving in any given hour. If the number of patients arriving is considered as the
number of successes, then we cannot know the number of failures because it is not
possible to count the number of patients not coming for emergency treatment in
that hour. Accordingly, it is not possible to determine the total number of possible
outcomes (successes and failures), and hence, binomial distribution cannot be applied
as a decision-making tool. In such a situation, we can use Poisson distribution, if
we know the average number of patients arriving for emergency treatment per hour.
It is assumed that such arrival of patients is a random phenomenon and hence, the
exact number of patients arriving in any hour is not predictable. Other examples of
Poisson distribution are telephone calls going through a switchboard system, the
number of cars passing through India Gate, the number of customers coming to a
bank for service, and so on. All these arrivals can be described by a discrete random
variable that takes on integer values (0, 1, 2, 3, ...).
Characteristics of the Poisson distribution
A physical situation must possess certain characteristics before it can be characterized
by poisson distribution. Some of these characteristics are:
(a) In a very small time interval between t and (t + Δt), where Δt is infinitesimally
small, the probability that exactly one event will occur is a very small number
(the event is a rare event) and is constant for every such small time period.
(b) The probability that two or more events will occur in this small time period
(t to t + Δt) is so small that it can be assigned a value of zero.
(c) The events must be random and independent of each other. The occurrence
of one event cannot influence the chances of another event occurring, nor can
the occurrence of any one event be predicted in advance.
With the characteristics as described here, we need to know the average number of
events per unit of time. The symbol for this average is λ (lambda) and it could be
the average number of cars passing under a bridge in any given hour or it could be
the average number of machine breakdowns per month, or it could be the average
number of customers arriving at a bank per day, and so on.
The probability that exactly x events will occur in a given time is given as
follows:

P(x) = (λ^x . e^–λ)/x!
where λ is the average number of occurrences per unit of time and e is the
base of the natural logarithms, equal to 2.71828... .
Example 2.20: Assume that on an average 3 persons enter a bank for service every
10 minutes. What is the probability that exactly 5 customers will enter the bank in
a given 10-minute period, assuming that the process can be described by a Poisson
distribution?
Solution: Here λ = 3 and x = 5.

P(x) = (λ^x . e^–λ)/x!

P(5) = ((3)^5 (2.71828)^–3)/5! = (243)(0.0498)/120 = 0.1008
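As an illustration (not part of the original solution), the same probability can be computed directly in Python:

from math import exp, factorial

def poisson(x, lam):
    # P(x) = lambda^x e^(-lambda) / x!
    return lam**x * exp(-lam) / factorial(x)

# Average of 3 arrivals per 10-minute period; probability of exactly 5 arrivals
print(round(poisson(5, 3), 4))  # 0.1008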
A continuous probability function p(x) satisfies

∫p(x)dx = 1 over –∞ < x < ∞, with p(x) > 0,

i.e., the area under the curve p(x) is 1, and the probability of x lying between two
values a, b, i.e., p(a < x < b), is positive.
The most prominent example of a continuous probability function is the normal
distribution.
Cumulative Probability Function (CPF)
The cumulative probability function (CPF) shows the probability that x takes a
value less than or equal to, say, z and corresponds to the area under the curve up
to z:
p(x ≤ z) = ∫p(x)dx, the integral being taken from –∞ to z
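The integral defining the CPF can be approximated numerically. The sketch below (an illustrative example using the standard normal density, not a formula from the text) applies a simple trapezoidal rule:

from math import exp, pi, sqrt

def p(x):
    # Standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def cpf(z, lower=-8.0, steps=4000):
    # Trapezoidal approximation of the integral of p(x) from -infinity (here -8) to z
    h = (z - lower) / steps
    area = 0.5 * (p(lower) + p(z))
    for i in range(1, steps):
        area += p(lower + i * h)
    return area * h

print(round(cpf(0.0), 4))  # 0.5, since half the area lies below the mean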
5. Quite often, mathematicians use the normal approximation of the binomial distribution whenever
‘n’ is equal to or greater than 30 and np and nq each are greater than 5.
6. Equation of the normal curve in its simplest form is,

y = y0 . e^(–x²/2σ²)

Where, y = The computed height of an ordinate at a distance of X from the mean.
y0 = The height of the maximum ordinate at the mean. It is a constant in the equation
and is worked out as under:

y0 = N.i/(σ√2π), where N is the total frequency and i the class interval.
7. A symmetric distribution is one which has no skewness. As such it has the following statistical
properties:
(a) Mean = Mode = Median (i.e., X̄ = Z = M)
(b) (Upper Quartile – Median) = (Median – Lower Quartile) (i.e., Q3 – M = M – Q1)
(c) Mean Deviation = 0.7979 (Standard Deviation)
(d) (Q3 – Q1)/2 = 0.6745 (Standard Deviation)
8. This also means that in a normal distribution, the probability of area lying between various
limits are as follows:
Limits Probability of area lying within the stated limits
µ ± 1 S.D. 0.6827
µ ± 2 S.D. 0.9545
µ ± 3 S.D. 0.9973 (This means that almost all cases
lie within µ ± 3 S.D. limits)
Normal curves with identical standard deviation but each with different means:
Normal curves each with different standard deviations and different means:
Z = (X – µ)/σ

Where, Z = The standard variate (or number of standard deviations from X to the
mean of the distribution).
X = Value of the random variable under consideration.
µ = Mean of the distribution of the random variable.
σ = Standard deviation of the distribution.
The table showing the area under the normal curve (often termed as the standard
normal probability distribution table) is organized in terms of standard variate (or
Z) values. It gives the values for only half the area under the normal curve, beginning
with Z = 0 at the mean. Since the normal distribution is perfectly symmetrical the
values true for one half of the curve are also true for the other half. We now illustrate
the use of such a table for working out certain problems.
Example 2.22: A banker claims that the life of a regular savings account opened
with his bank averages 18 months with a standard deviation of 6.45 months. Answer
the following: (a) What is the probability that there will still be money in a savings
account, opened with the said bank by a depositor, after 22 months? (b) What is the
probability that the account will have been closed before two years?
Solution: (a) For finding the required probability we are interested in the area of
the normal curve (µ = 18 and σ = 6.45) lying to the right of X = 22. For this,

Z = (X – µ)/σ = (22 – 18)/6.45 = 0.62

The value from the concerning table, when Z = 0.62, is 0.2324, which refers to the
area of the curve between µ = 18 and X = 22. Hence, the area to the right of X = 22
is (0.5) – (0.2324) = 0.2676, which is the required probability that there will still be
money in the account after 22 months.

(b) Here we are interested in the area of the curve lying to the left of X = 24. For this,

Z = (24 – 18)/6.45 = 0.93

The value from the concerning table, when Z = 0.93, is 0.3238, which refers to the
area of the curve between µ = 18 and X = 24. The area of the entire left hand portion
of the curve is 0.5 as usual.

Hence, the required area is (0.5) + (0.3238) = 0.8238, which is the probability that
the account will have been closed before two years, i.e., before 24 months.
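For readers who wish to verify such table look-ups, Python's standard library provides statistics.NormalDist (Python 3.8 onwards). The snippet below, an illustrative sketch, reproduces both answers of Example 2.22:

from statistics import NormalDist

account = NormalDist(mu=18, sigma=6.45)

# (a) P(X > 22): money still in the account after 22 months
print(round(1 - account.cdf(22), 4))   # about 0.2676

# (b) P(X < 24): account closed before two years
print(round(account.cdf(24), 4))       # about 0.8238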
Example 2.23: Regarding a certain normal distribution concerning the income of
the individuals we are given that mean=500 rupees and standard deviation =100
rupees. Find the probability that an individual selected at random will belong to
income group,
(a) 550 to 650 (b) 420 to 570
Solution: (a) For finding the required probability we are interested in the area of
the normal curve (µ = 500 and σ = 100) lying between X = 550 and X = 650.
For finding the area of the curve between X = 550 to 650, let us do the following
calculations:

Z = (550 – 500)/100 = 50/100 = 0.50
Corresponding to which the area between µ = 500 and X = 550 in the curve as per
table is equal to 0.1915 and,
Z = (650 – 500)/100 = 150/100 = 1.5
Corresponding to which, the area between µ = 500 and X = 650 in the curve, as per
table, is equal to 0.4332.
Hence, the area of the curve that lies between X = 550 and X = 650 is,
(0.4332) – (0.1915) = 0.2417
This is the required probability that an individual selected at random will belong to
income group of 550 to 650.
(b) For finding the required probability we are interested in the area of the normal
curve lying between X = 420 and X = 570. To find this area we make the following
calculations:

Z = (570 – 500)/100 = 0.70
Corresponding to which the area between µ = 500 and X = 570 in the curve as per
table is equal to 0.2580.
and Z = (420 – 500)/100 = –0.80
Corresponding to which the area between µ = 500 and X = 420 in the curve as per
table is equal to 0.2881.
Hence, the required area in the curve between X = 420 and X = 570 is,
(0.2580) + (0.2881) = 0.5461
This is the required probability that an individual selected at random will belong to
income group of 420 to 570.
Example 2.24: A certain company manufactures 1½″ all-purpose rope using imported
hemp. The manager of the company knows that the average load-bearing capacity
of the rope is 200 lbs. Assuming that normal distribution applies, find the standard
deviation of load-bearing capacity for the 1½″ rope if it is given that the rope has a
0.1210 probability of breaking with 68 lbs. or less pull.
Solution: With µ = 200 and X = 68, the probability of the area falling between
them is (0.5) – (0.1210) = 0.3790.

If the probability of the area falling within µ = 200 and X = 68 is 0.3790 as stated
above, the corresponding value of Z as per the table⁹ showing the area of the normal
curve is –1.17 (the minus sign indicates that we are in the left portion of the curve).
Now to find σ, we can write,

Z = (X – µ)/σ

or –1.17 = (68 – 200)/σ

or –1.17σ = –132

or σ = 112.8 lbs. approx.

Thus, the required standard deviation is 112.8 lbs. approximately.
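The reverse reading of the table mentioned in footnote 9 corresponds to the inverse of the cumulative normal function. A minimal sketch (illustrative, not part of the original solution) using the standard library:

from statistics import NormalDist

# P(X <= 68) = 0.1210 for a normal variable with mean 200; find sigma
z = NormalDist().inv_cdf(0.1210)   # about -1.17, the table read in reverse
sigma = (68 - 200) / z
print(round(sigma, 1))             # about 112.8 lbs.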
Example 2.25: In a normal _distribution, 31 per cent items are below 45 and 8 per
cent are above 64. Find the X and of this distribution.
Solution: Since 31 per cent of the items are below 45, the area between X = 45 and
the mean µ is (0.5 – 0.31) = 0.19; since 8 per cent of the items are above 64, the
area between µ and X = 64 is (0.5 – 0.08) = 0.42.

If the probability of the area falling within µ and X = 45 is 0.19 as stated above, the
corresponding value of Z from the table showing the area of the normal curve is
–0.50. Since we are in the left portion of the curve, we can express this as under,

–0.50 = (45 – µ)/σ    (1)

Similarly, if the probability of the area falling within µ and X = 64 is 0.42, as stated
above, the corresponding value of Z from the area table is +1.41. Since we are in
the right portion of the curve, we can express this as under,

1.41 = (64 – µ)/σ    (2)

If we solve equations (1) and (2) above to obtain the values of µ and σ, we have,

–0.5σ = 45 – µ    (3)

1.41σ = 64 – µ    (4)

By subtracting equation (4) from equation (3) we have,

–1.91σ = –19

σ = 10 approx.

Putting σ = 10 in equation (3), we get µ = 45 + 0.5 × 10 = 50. Thus, the mean of the
distribution is 50 and the standard deviation is 10 approximately.
9. The table is to be read in the reverse order for finding Z value (See Appendix).
• Statistical data is used to draw certain conclusions with the effective use of
probability. The various terms and concepts can be applied in the decision-
making process and also in inferring the occurrence of several events in the
business environment of an organization.
• The outcomes of most decisions cannot be accurately predicted because of
the impact of many uncontrollable and unpredictable variables, so it is
necessary to scientifically evaluate the known risks.
• The expected value is the weighted average of the possible values a variable
can take. The variance of a random variable describes the spread of the possible
values of the variable.
• Probability distribution refers to the listing of all possible outcomes of an
experiment together with their probabilities. It may be discrete or continuous
depending upon the nature of the variable being considered.
• Binomial distribution is probably the best known of discrete distributions.
• The normal distribution, or Z-distribution, is often used to approximate the
binomial distribution. However, if the sample size is very large, Poisson
distribution is a philosophically more correct alternative to binomial
distribution than normal distribution.
• One of the main differences between Poisson distribution and the binomial
distribution is that in using binomial distribution, all eligible phenomena are
studied, whereas in Poisson only the cases with a particular outcome are
studied.
• Amongst all, normal probability distribution is by far the most important and
frequently used distribution because it fits well with many types of problems.
14. In a distribution exactly normal, 7 per cent of the items are under 35 and 89
per cent are under 63. What are the mean and standard deviation of the
distribution?
15. Fit a normal distribution to the following data:
Height in inches Frequency
60–62 5
63–65 18
66–68 42
69–71 27
72–74 8
3.2 INTRODUCTION
Sampling is the process of selecting a sample from the population. This unit
introduces you to sampling procedures, where you will learn about random and
non-random sampling methods. You will also learn the different methods of sample
size determination.
Hypothesis is an assumption that is tested to find its logical or empirical
consequences. A hypothesis should be clear and accurate. Various concepts such as
null and alternate hypotheses help to verify the testability of an assumption.
In this unit, you will also learn about the chi-square test and t-test. Any
statistical hypothesis test in which the test statistic has a chi-square distribution
when the null hypothesis is true is termed as a chi-square test.
Too small a sample may fail to represent the population adequately and may lead
to incorrect inferences. Too large a sample would be costly in terms of time and
money. The optimum sample size should fulfil the requirements of efficiency,
representativeness, reliability and flexibility. What is an optimum sample size is
also open to question. Some experts have suggested that 5 per cent of the population
properly selected would constitute an adequate sample, while others have suggested
as high as 10 per cent depending upon the size of the population under study.
However, proper selection and representation of the sample is more important than
size itself. The following considerations may be taken into account in deciding
about the sample size:
(a) The larger the size of the population, the larger should be the sample size.
(b) If the resources available do not put a heavy constraint on the sample size,
a larger sample would be desirable.
(c) If the samples are selected by scientific methods, a larger sample size would
ensure greater degree of accuracy in conclusions.
(d) A smaller sample could adequately represent the population, if the population
consists of mostly homogeneous units. A heterogeneous universe would
require a larger sample.
3.3.2 Census and Sampling
Under the census or complete enumeration survey method, data is collected for
each and every unit (e.g., person, consumer, employee, household, organization) of
the population or universe which are the complete set of entities and which are of
interest in any particular situation. In spite of the benefits of such an all-inclusive
approach, it is infeasible in most of the situations. Besides, the time and resource
constraints of the researcher, infinite or huge population, the incidental destruction
of the population unit during the evaluation process (as in the case of bullets,
explosives, etc.) and cases of data obsolescence (by the time census ends) do not
permit this mode of data collection.
Sampling is simply a process of learning about the population on the basis of
a sample drawn from it. Thus, in any sampling technique, instead of every unit of
the universe, only a part of the universe is studied and conclusions are drawn on
that basis for the entire population. The process of sampling involves selection of a
sample based on a set of rules, collection of information and making an inference
about the population. It should be clear to the researcher that a sample is studied not
for its own sake, but the basic objective of its study is to draw inference about the
population. In other words, sampling is a tool which helps us know the characteristics
of the universe or the population by examining only a small part of it. The values
obtained from the study of a sample, such as the average and dispersion are known
as ‘statistics’ and the corresponding such values for the population are called
‘parameters’.
Although diversity is a universal quality of mass data, every population has
characteristic properties with limited variation. The following two laws of statistics
are very important in this regard.
1. The law of statistical regularity states that a moderately large number of items
chosen at random from a large group are almost sure on the average to possess
the characteristics of the large group.
2. The law of inertia of large numbers states that, other things being equal, the
larger the size of the sample, the more accurate the results are likely to be.
(i) Simple random sampling
In simple random sampling, every unit of the population has an equal chance of
being included in the sample. This reduces the chance of bias
creeping into the analysis besides enhancing the representativeness of the sample.
Furthermore, it is easy to assess the accuracy of the sampling estimates because
sampling errors follow the principles of chance. However, a completely catalogued
universe is a prerequisite for this method. The sample size requirements would be
usually larger under random sampling than under stratified random sampling, to
ensure statistical reliability. It may escalate the cost of collecting data as the cases
selected by random sampling tend to be too widely dispersed geographically.
(ii) Stratified random sampling
In stratified random sampling method, the universe to be sampled is subdivided
(stratified) into groups which are mutually exclusive, but collectively exhaustive
based on a variable known to be correlated with the variable of interest. Then, a
simple random sample is chosen independently from each group. This method differs
from simple random sampling in that, in the latter the sample items are chosen at
random from the entire universe. In stratified random sampling, the sampling is
designed in such a way that a designated number of items is chosen from each
stratum. If the ratio of items between various strata in the population matches with
the ratio of corresponding items between various strata in the sample, it is called
proportionate stratified sampling; otherwise, it is known as disproportionate stratified
sampling. Ideally, we should assign greater representation to a stratum with a larger
dispersion and smaller representation to one with small variation. Hence, it results
in a more representative sample than simple random sampling.
(iii) Systematic sampling
Systematic sampling is also known as quasi-random sampling method because once
the initial starting point is determined, the remainder of the items selected for the
sample are predetermined by the sampling interval. A systematic sample is formed
by selecting one unit at random and then selecting additional units at evenly spaced
intervals until the sample has been formed. This method is popularly used in those
cases where a complete list of the population from which sample is to be drawn is
available. The list may be prepared in alphabetical, geographical, numerical or
some other order. The items are serially numbered. The first item is selected at
random generally by following the lottery method. Subsequent items are selected
by taking every Kth item from the list, where ‘K’ stands for the sampling interval or
the sampling ratio, i.e., the ratio of the population size to the size of the sample.
Symbolically,
K = N / n , where K = Sampling interval; N = Universe size; n = Sample size. In
case K is a fractional value, it is rounded off to the nearest integer.
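A systematic sample is easy to draw programmatically. The following sketch (an illustration with assumed names, not from the text) numbers a frame serially, picks a random start in the first interval and then takes every Kth unit:

from random import randint

def systematic_sample(frame, n):
    k = round(len(frame) / n)      # sampling interval K = N / n, rounded
    start = randint(0, k - 1)      # random start within the first interval
    return frame[start::k]

population = list(range(1, 101))   # a serially numbered frame, N = 100
print(systematic_sample(population, 10))   # every 10th unit from a random start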
(iv) Multi-stage sampling
In multi-stage sampling, the sample is selected in successive stages, a random
method of selection being employed for the first stage units. For example, in a survey of 10,000 households in
AP, we may choose a few districts in the first stage, a few towns/villages/mandals
in the second stage and select a number of households from each town/village/
mandal selected in the previous stage. This method is quite flexible and is particularly
useful in surveys of underdeveloped areas, where no frame is generally sufficiently
detailed and accurate for subdivision of the material into reasonably small sampling
units. However, a multistage sample is, in general, less accurate than a sample
containing the same number of final stage units which have been selected by some
suitable single stage process.
3.3.4 Sampling and Non-Sampling Errors
The basic objective of a sample is to draw inferences about the population from
which such sample is drawn. This means that sampling is a technique which helps
us in understanding the parameters or the characteristics of the universe or the
population by examining only a small part of it. Therefore, it is necessary that the
sampling technique be a reliable one. The randomness of the sample is especially
important because of the principle of statistical regularity, which states that a sample
taken at random from a population is likely to possess almost the same characteristics
as those of the population. However, in the total process of statistical analysis,
some errors are bound to be introduced. These errors may be the sampling errors or
the non-sampling errors. Sampling errors arise due to drawing faulty inferences
about the population based upon the results of the samples. In other words, it is the
difference between the results that are obtained by the sample study and the results
that would have been obtained if the entire population was taken for such a study,
provided that the same methodology and manner was applied in studying both the
sample as well as the population. For example, if a sample study indicates that 25
per cent of the adult population of a city does not smoke and the study of the entire
adult population of the city indicates that 30 per cent are non-smokers, then this
difference would be considered as the sampling error. This sampling error would be
smaller if the sample size is large relative to the population, and vice versa.
Non-sampling errors, on the other hand, are introduced due to technically faulty
observations or during the processing of data. These errors could also arise due to
defective methods of data collection and incomplete coverage of the population,
because some units of the population are not available for study, inaccurate
information provided by the participants in the sample and errors occurring during
editing, tabulating and mathematical manipulation of data. These are the errors
which can arise even when the entire population is taken under study.
Both the sampling as well as the non-sampling errors must be reduced to a minimum
in order to get as representative a sample of the population as possible.
One of the major objectives of statistical analysis is to know the ‘true’ values of
different parameters of the population. Since it is not possible due to time, cost and
other constraints to take the entire population for consideration, random samples
are taken from the population. These samples are analysed properly and they lead
to generalizations that are valid for the entire population. The process of relating
the sample results to the population is referred to as, ‘Statistical Inference’ or
‘Inferential Statistics’.
Statistical inference: The process of relating the sample results to the population.

In general, a single sample is taken and its mean X̄ is considered to represent
the population mean. However, in order to use the sample mean to estimate the
population mean, we should examine every possible sample (and its mean, etc.)
that could have occurred, because a single sample may not be representative enough.
If it was possible to take all the possible samples of the same size, then the distribution
of the results of these samples would be referred to as, ‘sampling distribution’. The
distribution of the means of these samples would be referred to as, ‘sampling
distribution of the means’.
The relationship between the sample means and the population mean can
best be illustrated by Example 3.1.
Example 3.1: Suppose a babysitter has 5 children under her supervision with an average
age of 6 years. However, individually, the age of each child is as follows:
X1 = 2
X2 = 4
X3 = 6
X4 = 8
X5 = 10
Now these 5 children would constitute our entire population, so that N = 5.
Solution:

The population mean µ = ΣX/N = (2 + 4 + 6 + 8 + 10)/5 = 30/5 = 6

and the standard deviation is given by the formula:

σ = √(Σ(X – µ)²/N)

Total Σ(X – µ)² = (2 – 6)² + (4 – 6)² + (6 – 6)² + (8 – 6)² + (10 – 6)² = 40

Then,

σ = √(40/5) = √8 = 2.83
Now, let us assume the sample size, n = 2, and take all the possible samples of size
2, from this population. There are 10 such possible samples. These are as follows,
along with their means.
X1, X2 (2, 4) X̄1 = 3
X1, X3 (2, 6) X̄2 = 4
X1, X4 (2, 8) X̄3 = 5
X1, X5 (2, 10) X̄4 = 6
X2, X3 (4, 6) X̄5 = 5
X2, X4 (4, 8) X̄6 = 6
X2, X5 (4, 10) X̄7 = 7
X3, X4 (6, 8) X̄8 = 7
X3, X5 (6, 10) X̄9 = 8
X4, X5 (8, 10) X̄10 = 9
Now, if only the first sample was taken, the average of the sample would be 3.
Similarly, the average of the last sample would be 9. Both these samples are totally
unrepresentative of the population. However, if a grand mean X̿ of the distribution
of these sample means is taken, then,

X̿ = (ΣX̄i)/10 = (3 + 4 + 5 + 6 + 5 + 6 + 7 + 7 + 8 + 9)/10 = 60/10 = 6
As we can see from our sampling distribution of the means, the grand mean
X̿ of the sample means equals µ, the population mean. However, realistically
speaking, it is not possible to take all the possible samples of size n from the
population. In practice only one sample is taken, but the discussion on the sampling
distribution is concerned with the proximity of ‘a’ sample mean to the population
mean.
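The enumeration in Example 3.1 can be reproduced with a short Python sketch (illustrative only), which lists all 10 samples of size 2 and confirms that the grand mean of the sample means equals the population mean:

from itertools import combinations

ages = [2, 4, 6, 8, 10]                    # the population of Example 3.1
samples = list(combinations(ages, 2))      # all 10 possible samples of size 2
means = [sum(s) / 2 for s in samples]

print(means)                     # [3.0, 4.0, 5.0, 6.0, 5.0, 6.0, 7.0, 7.0, 8.0, 9.0]
print(sum(means) / len(means))   # 6.0, the population mean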
It can be seen that the possible values of sample means tend towards the
population mean and, according to the Central Limit Theorem, the distribution of
sample means tends to be normal for a sample size n larger than 30. Hence, we can
draw conclusions based upon our knowledge about the characteristics of the normal
distribution.
For example, in the case of the sampling distribution of the means, if we know
the grand mean of this distribution, which is equal to µ, and the standard deviation
of this distribution, known as the ‘standard error of the mean’ and denoted by σx̄,
then we know from the normal distribution that there is a 68.26 per cent chance that
a sample selected at random from a population will have a mean that lies within
one standard error (σx̄) of the population mean. Similarly, this chance
increases to 95.44 per cent that the sample mean will lie within two standard errors
(2σx̄) of the population mean. Hence, knowing the properties of the
sampling distribution tells us how close the sample mean will be to the true
population mean.
3.4.2 Standard Error
Standard error of the mean (σx̄)
Standard error of the mean (σx̄) is a measure of dispersion of the distribution of
sample means. It is similar to the standard deviation in a frequency distribution,
and it measures the likely deviation of a sample mean from the grand mean of the
sampling distribution.
If all sample means are given, then σx̄ can be calculated as follows:
σx̄ = √(Σ(X̄ – X̿)²/N), where N is the number of sample means.

Then, for instance, with Σ(X̄ – X̿)² = 28 and N = 7,

σx̄ = √(28/7) = √4 = 2
However, since it is not possible to take all possible samples from the
population, we must use alternate methods to compute –× .
The standard error of the mean can be computed from the following formula
if the population is finite and we know the population standard deviation:

σx̄ = (σ/√n) × √((N – n)/(N – 1))

Where,
σ = Population standard deviation
N = Population size
n = Sample size
This formula can be made simpler to use by the fact that we generally deal
with very large populations, which can be considered infinite. If the population
size N is very large and the sample size n is small, as for example in the case of
items tested from assembly line operations, then the factor

√((N – n)/(N – 1))

would approach 1. Hence,

σx̄ = σ/√n

The factor √((N – n)/(N – 1)) is also known as the ‘finite correction factor’, and should be
used when the population size is finite.
As this formula suggests, σx̄ decreases as the sample size (n) increases,
meaning that the general dispersion among the sample means decreases, meaning
further that any single sample mean will become closer to the population mean as
the value of σx̄ decreases. Additionally, since according to the property of the
normal curve there is a 68.26 per cent chance of the population mean being within
one σx̄ of the sample mean, a smaller value of σx̄ will make this range shorter,
thus making the population mean closer to the sample mean (refer Example 3.2).
Example 3.2: The IQ scores of college students are normally distributed with the
mean of 120 and standard deviation of 10.
(a) What is the probability that the IQ score of any one student chosen at random
is between 120 and 125?
(b) If a random sample of 25 students is taken, what is the probability that the
mean of this sample will be between 120 and 125?
Solution: (a) Given µ = 120 and σ = 10,

Z = (X – µ)/σ = (125 – 120)/10 = 5/10 = 0.5

The area for Z = 0.5 is 0.1915.
This means that there is a 19.15 per cent chance that a student picked up at
random will have an IQ score between 120 and 125.
(b) With the sample of 25 students, it is expected that the sample mean will be
much closer to the population mean, hence it is highly likely that the sample
mean would be between 120 and 125.
The formula to be used in the case of the standardized normal distribution for
the sampling distribution of the means is given by,

Z = (X̄ – µ)/σx̄

where, σx̄ = σ/√n = 10/√25 = 10/5 = 2

Then,

Z = (125 – 120)/2 = 5/2 = 2.5

The area for Z = 2.5 is 0.4938.
This shows that there is a 49.38 per cent chance that the sample mean will be
between 120 and 125. As the sample size increases further, this chance will also
increase. It can be noted that the probability of a sample mean being between 120
and 125 is much higher than the probability of an individual student having an IQ
between 120 and 125.
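The effect of the standard error can also be seen by simulation. The sketch below (illustrative, with figures matching Example 3.2) draws many samples of size 25 from a normal population and checks that the spread of the sample means is close to σ/√n = 2:

from random import gauss
from statistics import mean, stdev

# 2000 samples of size 25 from a N(120, 10) population; record each sample mean
sample_means = [mean(gauss(120, 10) for _ in range(25)) for _ in range(2000)]

print(round(stdev(sample_means), 2))   # close to 2, the standard error of the mean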
Here, we should not insist on calling either hypothesis null and the other alternative
since the reverse could also be true.
3.5.2 Type I and Type II Errors
There are two types of errors in statistical hypothesis, which are as follows:
• Type I error: In this type of error, you reject a null hypothesis when
it is true. It means rejection of a hypothesis which should have been
accepted. It is denoted by α (alpha) and is also known as alpha error.
• Type II error: In this type of error, you accept a null
hypothesis when it is not true. It means accepting a hypothesis which
should have been rejected. It is denoted by β (beta) and is also known as
beta error.
Type I error can be controlled by fixing it at a lower level. For example, if
you fix it at 2 per cent, then the maximum probability of committing a Type I
error is 0.02. But reducing Type I error has a disadvantage when the sample
size is fixed, as it increases the chance of Type II error. In other words,
both types of errors cannot be reduced simultaneously. The only solution to
this problem is to set an appropriate level by considering the costs and
penalties attached to the errors, or to strike a proper balance between the
two types of errors.
In a hypothesis test, a Type I error occurs when the null hypothesis is rejected when
it is in fact true; that is, H0 is wrongly rejected. For example, in a clinical trial of a
new drug, the null hypothesis might be that the new drug is no better, on average,
than the current drug; that is H0: there is no difference between the two drugs on
average. A Type I error would occur if we concluded that the two drugs produced
different effects, when in fact there was no difference between them.
In a hypothesis test, a Type II error occurs when the null hypothesis H0, is not
rejected when it is in fact false. For example, in a clinical trial of a new drug, the
null hypothesis might be that the new drug is no better, on average, than the current
drug; that is H0: there is no difference between the two drugs on average. A Type II
error would occur if it were concluded that the two drugs produced the same effect,
that is, there is no difference between the two drugs on average, when in fact they
produced different ones.
In how many ways can we commit errors?
We reject a hypothesis when it may be true. This is type I error.
We accept a hypothesis when it may be false. This is type II error.
The other true situations are desirable:
We accept a hypothesis when it is true. We reject a hypothesis when it is false.
                 Accept H0            Reject H0
H0 is true       Correct decision     Type I error
H0 is false      Type II error        Correct decision
The level of significance implies the probability of Type I error. A 5 per cent level
implies that the probability of committing a Type I error is 0.05. A 1 per cent level
implies 0.01 probability of committing Type I error.
Lowering the significance level and hence the probability of Type I error is good
but unfortunately, it would lead to the undesirable situation of committing Type II
error.
To sum up:
• Type I Error: Rejecting H0 when H0 is true.
• Type II Error: Accepting H0 when H0 is false.
Note: The probability of making a Type I error is the level of significance of a
statistical test. It is denoted by α,
Where, α = Prob. (Rejecting H0 / H0 true)
1 – α = Prob. (Accepting H0 / H0 true)
The probability of making a Type II error is denoted by β,
Where, β = Prob. (Accepting H0 / H0 false)
1 – β = Prob. (Rejecting H0 / H0 false) = Prob. (the test correctly rejects H0 when
H0 is false)
1 – β is called the power of the test. It depends on the level of significance α, the sample size
n and the parameter value.
3.5.3 Null and Alternative Hypotheses
Hypothesis is usually considered as the principal instrument in research.
In the context of statistical analysis, while comparing any two methods, the
following concepts or assumptions are taken into consideration:
• Null hypothesis: While comparing two different methods in terms of their
superiority, the assumption that both the methods are equally good
is called the null hypothesis. It is also known as the statistical hypothesis and is
symbolized as H0.
• Alternate hypothesis: While comparing two different methods regarding
their superiority, the statement that a particular method is good or bad
as compared to the other is called the alternate hypothesis. It is symbolized as
H1.
Comparison of null hypothesis with alternate hypothesis
Following are the points of comparison between null hypothesis and alternate
hypothesis:
• Null hypothesis is always specific, while alternate hypothesis gives an
approximate value.
• The rejection of null hypothesis involves great risk, which is not the case of
alternate hypothesis.
Null hypothesis is more frequently used in statistics than alternate hypothesis because
it is specific and is not based on probabilities.
The hypothesis to be tested is called the null hypothesis and is denoted by H0.This
is to be tested against other possible states of nature called alternative hypothesis.
The alternative is usually denoted by H1.
The null hypothesis implies that there is no difference between the statistic and the
population parameter. To test whether there is any difference between the sample
mean X̄ and the population mean µ, we write the null hypothesis,

H0: X̄ = µ

The alternative hypothesis would be,

H1: X̄ ≠ µ

This means X̄ > µ or X̄ < µ. This is called a two-tailed hypothesis.
The alternative hypothesis H1: X̄ > µ is right-tailed.
The alternative hypothesis H1: X̄ < µ is left-tailed.
These are one sided or one-tailed alternatives.
Note 1: The alternative hypothesis H1 implies all such values of the parameter,
which are not specified by the null hypothesis H0.
Note 2: Testing a statistical hypothesis is a rule, which leads to a decision to accept
or reject a hypothesis.
A one-tailed test requires rejection of the null hypothesis when the sample statistic
is greater than the population value or less than the population value at a certain
level of significance.
1. We may want to test if the sample mean X̄ exceeds the population mean µ.
Then the null hypothesis is,
H0: X̄ > µ
2. In the other case the null hypothesis could be,
H0: X̄ < µ
Each of these two situations leads to a one-tailed test and has to be dealt with in the
same manner as the two-tailed test. Here, the critical rejection region is on one side only,
right for X̄ > µ and left for X̄ < µ. Both the Figures 3.2 and 3.3 here show a 5 per cent
level of test of significance.
For example, a minister in a certain government has an average life of 11 months
without being involved in a scam. A new party claims to provide ministers with an
average life of more than 11 months without scam. We would like to test if, on the
average, the new ministers last longer than 11 months. We may write the null
hypothesis H0: µ = 11 and the alternative hypothesis H1: µ > 11.
Critical region is the area of the sampling distribution in which the test statistic
must fall for the null hypothesis to be rejected.
We can say that the critical region corresponds to the range of values of the statistic,
which according to the test requires the hypothesis to be rejected.
• Two-tailed and one-tailed tests: A two-tailed test rejects the null hypothesis
if the sample mean is either more or less than the hypothesized value of the
mean of the population. It is considered to be apt when null hypothesis is of
some specific value whereas alternate hypothesis is not equal to the value of
null hypothesis. In a two-tailed curve there are two rejection regions, also
called critical regions.
For example, at the 5 per cent level of significance, the two rejection regions lie
beyond Z = –1.96 and Z = +1.96, with 0.475 of the area lying between the mean
and each of these critical values.
• Conditions for the occurrence of one-tailed test: When the alternative states
that the population mean is lower than some hypothesized value, a one-tailed
test with the rejection region only on the left tail of the
curve is considered to be appropriate. This is known as a left-tailed test.
For example, what will happen if the acceptance region is made larger? α will
decrease. It will be more easily possible to accept H0 when H0 is false (a Type II
error), i.e., it will lower the probability of making a Type I error but raise the
probability of a Type II error (β).
Note: α and β are probabilities of making an error; 1 – α and 1 – β are probabilities of
making correct decisions.
Can we say α + β = 1?
No. Each is concerned with a different type of error. But both are not independent
of each other.
3.5.5 Penalty
Usually, Type II error is considered the worse of the two, though it is mainly the
circumstances of a case that decide the answer to this question.
If Type I error means accepting the hypothesis that a guilty person is innocent
and if Type II error means accepting the hypothesis that an innocent person is guilty,
then Type II error would be dangerous. The penalties and costs associated with an
error determine the balance or trade-off between Type I and Type II errors.
Usually, Type I error is shown as the shaded area, say 5 per cent of a normal
curve which is supposed to represent the data. If the sample statistic, say the sample
mean, falls in the shaded area, the hypothesis is rejected at 5 per cent level of
significance.
3.5.6 Standard Error
The concept of the Standard Error (SE) of a statistic is used to test the precision of a
sample and provides the confidence limits for the corresponding population
parameter.

The statistic may be the sample arithmetic mean X̄, the sample proportion p,
etc.

The SE of any such statistic is the standard deviation of the sampling
distribution of the statistic. The SEs in common use are given below.

The SE of the difference between two means X̄1, X̄2, or between two proportions
p1, p2, with sample sizes n1, n2, can be stated as,

SE(X̄1 – X̄2) = √(σ1²/n1 + σ2²/n2)

SE(p1 – p2) = √(P1Q1/n1 + P2Q2/n2)
A decision function D(x) assigns to every possible outcome a unique action. This
may result in a loss (positive or negative) depending on an unknown parameter w.
So the loss function L(w, D), which depends on the outcome x, is a random
variable. Its expected value is called the risk function.

z = (X̄ – µ)/SE(X̄) is approximately normal. The critical region for z, depending on
the desired level of significance, can be calculated.
Example 3.3: A factory produces items, each weighing 5 kg with variance 4. Can a
random sample of size 900 with mean weight 4.45 kg be justified as having been
taken from this factory?
Solution:
n = 900
X = 4.45
µ =5
σ = √4 = 2

z = (X̄ – µ)/SE(X̄) = (4.45 – 5)/(2/√900) = –0.55/(2/30) = –8.25

We have |z| > 3. The null hypothesis is rejected. The sample may not be regarded as
coming from the factory, at the 0.27% level of significance (corresponding to the 99.73%
acceptance region).
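The z statistic of Example 3.3 can be verified with a few lines of Python (an illustrative sketch):

from math import sqrt

n, xbar, mu, sigma = 900, 4.45, 5.0, 2.0
se = sigma / sqrt(n)          # SE of the mean = 2/30
z = (xbar - mu) / se
print(round(z, 2))            # -8.25; |z| > 3, so the null hypothesis is rejected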
3.5.9 Test for Equality of Two Proportions
If p1, p2 are proportions of some characteristic in two samples of sizes n1, n2, drawn
from populations with proportions P1, P2, then we have H0: P1 = P2 vs H1: P1 ≠ P2.
Case (I): If H0 is true, then let P1 = P2 = p,
Where p can be found from the data:

p = (n1p1 + n2p2)/(n1 + n2)

q = 1 – p

so that SE(p1 – p2) = √(pq(1/n1 + 1/n2)), and

z = (p1 – p2)/SE(p1 – p2) is approximately normal (0, 1)

We write z ~ N(0, 1)
The usual rules for rejection or acceptance are applicable here.
Case (II): If it is assumed that the proportion under question is not the same in the
two populations from which the samples are drawn, and that P1, P2 are the true
proportions, we write,

SE(p1 – p2) = √(p1q1/n1 + p2q2/n2)

The confidence limits for the difference would be,

(p1 – p2) ± z(α/2) √(p1q1/n1 + p2q2/n2)

The 90 per cent confidence limits would be obtained with α = 0.1 [100(1 – α)% = 90%].
Example 3.4: Out of 5000 interviewees, 2400 are in favour of a proposal, and out
of another set of 2000 interviewees, 1200 are in favour. Is the difference significant?

Solution: Given, p1 = 2400/5000 = 0.48, p2 = 1200/2000 = 0.6

n1 = 5000, n2 = 2000

z = (p1 – p2)/SE(p1 – p2) = –0.12/0.013 = –9.2, so that |z| = 9.2 > 3
The difference is highly significant at 0.27% level.
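A sketch of the same two-proportion test in Python (illustrative; the pooled proportion follows Case (I) above):

from math import sqrt

n1, n2 = 5000, 2000
p1, p2 = 2400 / n1, 1200 / n2

p = (n1 * p1 + n2 * p2) / (n1 + n2)    # pooled proportion under H0: P1 = P2
q = 1 - p
se = sqrt(p * q * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
print(round(abs(z), 1))   # about 9.1 (the text, rounding SE to 0.013, reports 9.2)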
3.5.10 Test for Difference of Two Means

H0: µ1 = µ2

H1: µ1 ≠ µ2

z = (X̄1 – X̄2)/√(σ1²/n1 + σ2²/n2), approximately normally distributed with mean 0 and S.D. = 1.

We write z ~ N(0, 1)

As usual, if |z| > 2 we reject H0 at the 4.55 per cent level of significance, and so on.
Example 3.5: Two groups of sizes 121 and 81 are subjected to tests. Their means
are found to be 84 and 81 and standard deviations 10 and 12. Test for the significance
of difference between the groups.
Solution: X̄1 = 84, X̄2 = 81, n1 = 121, n2 = 81, σ1 = 10, σ2 = 12

z = (X̄1 – X̄2)/√(σ1²/n1 + σ2²/n2) = (84 – 81)/√(100/121 + 144/81) = 1.86 < 1.96
The difference is not significant at the 5 per cent level of significance.
3.5.11 Small Sample Tests of Significance
The sampling distribution of many statistics for large samples is approximately
normal. For small samples with n < 30, the normal distribution, as shown in Example
3.5, can be used only if the sample is from a normal population with known .
If is not known, we can use student’s t distribution instead of the normal. We then
replace by sample standard deviation with some modification as given below.
Let x1, x2, ..., xn be a random sample of size n drawn from a normal population with
mean and S.D. . Then,
x
t .
s/ n 1
Here, t follows the student’s t distribution with n – 1 degrees of freedom.
Note: For small samples of n < 30, the term n 1 , in SE = s / n 1 , corrects the
bias, resulting from the use of sample standard deviation as an estimator of .
Also,
s2 n 1 n 1
or s S
S2 n n
Student   1   2   3   4   5   6   7   8   9  10  11
x        23  20  19  21  18  20  18  17  23  16  19
y        24  19  22  18  20  22  20  20  23  20  17
di       –1   1  –3   3  –2  –2  –2  –3   0  –4   2

d̄ = Σdi/11 = –11/11 = –1

s = √(Σ(d – d̄)²/(n – 1)) = √(50/10) = 2.24

df = 11 – 1 = 10

t = d̄/(s/√n) = –1/(2.24/√11) = –1.48
Null hypothesis H0: σ² = σ0²

H1: σ² ≠ σ0²

Test statistic χ² = ns²/σ0² = Σ(x – x̄)²/σ0²

If χ² is greater than the table value, we reject the null hypothesis.
(i) To test the significance of the mean of a single sample

t = |X̄ – µ|/SEx̄

where, X̄ = Mean of the sample

SEx̄ = s/√n, with s = √(Σ(xi – x̄)²/(n – 1))

and the degrees of freedom = (n – 1)

The above stated formula for t can as well be stated as under:

t = |x̄ – µ|/SEx̄

= |x̄ – µ|/(√(Σ(xi – x̄)²/(n – 1))/√n)

= (|x̄ – µ| × √n)/√(Σ(xi – x̄)²/(n – 1))
If we want to work out the probable or fiducial limits of population mean (µ) in
case of small samples, we can use either of the following:
(a) Probable limits with 95 per cent confidence level:

µ = X̄ ± SEx̄ (t0.05)

(b) Probable limits with 99 per cent confidence level:

µ = X̄ ± SEx̄ (t0.01)
At other confidence levels, the limits can be worked out in a similar manner, taking
the concerning table value of t just as we have taken t0.05 in (a) and t0.01 in (b) above.
(ii) To test the difference between the means of two samples

t = |X̄1 – X̄2|/SEx̄1–x̄2

where the standard error of the difference,

SEx̄1–x̄2 = √((Σ(x1i – x̄1)² + Σ(x2i – x̄2)²)/(n1 + n2 – 2)) × √(1/n1 + 1/n2)

When assumed means A1 and A2 are used, the sum of squares can be worked out as,

Σ(x1i – x̄1)² + Σ(x2i – x̄2)² = Σ(x1i – A1)² + Σ(x2i – A2)² – n1(x̄1 – A1)² – n2(x̄2 – A2)²

with (n1 + n2 – 2) degrees of freedom.

(iii) To test the significance of the difference in paired items (the difference test)

t = (X̄Diff – 0)/(σDiff/√n)
where, X̄Diff or D̄ = Mean of the differences of sample items.
0 = The value zero, on the hypothesis that there is no difference.
σDiff = Standard deviation of differences and is worked out as

σDiff = √(Σ(D – D̄)²/(n – 1)) or √((ΣD² – (ΣD)²/n)/(n – 1))

D = Differences
n = Number of pairs in two samples; the test is based on (n – 1)
degrees of freedom.
The following examples would illustrate the application of t-test using the above
stated formulae.
Example 3.8:
A sample of 10 measurements of the diameter of a sphere gave a mean X̄ = 4.38
inches and a standard deviation s = 0.06 inches. Find (a) 95% and (b) 99%
confidence limits for the actual diameter.
Solution:
On the basis of the given data, the standard error of the mean is

SEx̄ = s/√(n – 1) = 0.06/√(10 – 1) = 0.06/3 = 0.02

Assuming the sample mean 4.38 inches to be the population mean, the required
limits are as follows:
(i) 95% confidence limits = X̄ ± SEx̄ (t0.05) with 9 degrees of freedom
= 4.38 ± 0.02(2.262)
= 4.38 ± 0.04524
i.e., 4.335 to 4.425
(ii) 99% confidence limits = X̄ ± SEx̄ (t0.01) with 9 degrees of freedom
= 4.38 ± 0.02(3.25) = 4.38 ± 0.0650
i.e., 4.315 to 4.445.
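The limits of Example 3.8 can be reproduced in Python; the t values are taken from the t table as in the text (illustrative sketch):

from math import sqrt

n, xbar, s = 10, 4.38, 0.06
se = s / sqrt(n - 1)               # 0.02

t05, t01 = 2.262, 3.25             # table values of t for 9 degrees of freedom

print(xbar - t05 * se, xbar + t05 * se)   # 95% limits: about 4.335 to 4.425
print(xbar - t01 * se, xbar + t01 * se)   # 99% limits: about 4.315 to 4.445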
Example 3.9:
The specimens of copper wire drawn from a large lot have the following breaking
strengths (in kg. wt.):
578, 572, 570, 568, 572, 578, 570, 572, 596, 544
Test whether the mean breaking strength of the lot may be taken to be 578 kg. wt.
Solution:
We take the hypothesis that there is no difference between the mean breaking
strength of the sample and the given breaking strength of the universe. In other
words, we can write, H0: µ = X̄, Ha: µ ≠ X̄. Then, on the basis of the sample data,
the mean and standard deviation are worked out as follows:
S. No.    Xi      (Xi – X̄)    (Xi – X̄)²
1         578      6           36
2         572      0           0
3         570     –2           4
4         568     –4           16
5         572      0           0
6         578      6           36
7         570     –2           4
8         572      0           0
9         596      24          576
10        544     –28          784
n = 10    ΣXi = 5720           Σ(Xi – X̄)² = 1456
X̄ = ΣXi/n = 5720/10 = 572

s = √(Σ(Xi – X̄)²/(n – 1)) = √(1456/9) = 12.72

SEx̄ = s/√n = 12.72/√10 = 12.72/3.16 = 4.03

t = |X̄ – µ|/SEx̄ = |572 – 578|/4.03 = 1.488
Degrees of freedom = n – 1 = 9
At the 5% level of significance for 9 degrees of freedom, the table value of t = 2.262
for a two-tailed test.
The calculated value of t is less than its table value and hence the difference is
insignificant. The mean breaking strength of the lot may be taken to be 578 kg. wt.
with 95% confidence.
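A short Python check of Example 3.9 (illustrative sketch; statistics.stdev uses the n – 1 divisor, as in the solution):

from math import sqrt
from statistics import mean, stdev

strengths = [578, 572, 570, 568, 572, 578, 570, 572, 596, 544]

xbar = mean(strengths)             # 572
s = stdev(strengths)               # about 12.72
se = s / sqrt(len(strengths))      # about 4.02
t = abs(xbar - 578) / se
print(round(t, 2))                 # about 1.49; below the table value 2.262, so H0 holds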
Example 3.10:
Sample of sales in similar shops in two towns are taken for a new product with the
following results:
          Mean sales    Variance    Size of sample
Town A    57            5.3         5
Town B    61            4.8         7
Is there any evidence of difference in sales in the two towns?
Solution:
We take the hypothesis that there is no difference between the two sample means
concerning sales in the two towns. In other words, H0: X̄1 = X̄2, Ha: X̄1 ≠ X̄2. Then,
we work out the concerning t value as follows:
t = |X̄1 – X̄2|/SEx̄1–x̄2

SEx̄1–x̄2 = √((Σ(x1i – x̄1)² + Σ(x2i – x̄2)²)/(n1 + n2 – 2)) × √(1/n1 + 1/n2)

= 1.421 on substituting the given values.

Hence,

t = |57 – 61|/1.421 = 4/1.421 = 2.82
Degrees of freedom = (n1 + n2 – 2) = (5 + 7 – 2) = 10
Table value of t at 5% level of significance for 10 degrees of freedom is 2.228, for
a two-tailed test.
The calculated value of t is greater than its table value. Hence, the hypothesis is
wrong and the difference is significant.
Example 3.11:
The sales data of an item in six shops before and after a special promotional campaign
is:
Shops A B C D E F
Before the
promotional
campaign 53 28 31 48 50 42
After the campaign 58 29 30 55 56 45
Can the campaign be judged to be a success? Test at 5% level of significance.
Solution:
We take the hypothesis that the campaign does not bring any improvement in sales.
We can thus write: H0: X̄Before = X̄After, Ha: X̄After > X̄Before.
In order to judge this, we apply the ‘difference test’. For this purpose we calculate
the mean and standard deviation of differences in the two sample items as follows:

Shops    Before    After    D = After – Before    (D – D̄)    (D – D̄)²
A        53        58       +5                    +1.5        2.25
B        28        29       +1                    –2.5        6.25
C        31        30       –1                    –4.5        20.25
D        48        55       +7                    +3.5        12.25
E        50        56       +6                    +2.5        6.25
F        42        45       +3                    –0.5        0.25
n = 6                       ΣD = 21                           Σ(D – D̄)² = 47.50
Mean of difference D̄ = ΣD/n = 21/6 = 3.5

Standard deviation of difference,

σDiff = √(Σ(D – D̄)²/(n – 1)) = √(47.50/5) = 3.08

t = (D̄ – 0)/(σDiff/√n) = 3.5 × √6/3.08 = 2.78

Degrees of freedom = (n – 1) = 5. The table value of t at the 5% level of significance
for 5 degrees of freedom is 2.015 for a one-tailed test. Since the calculated value of
t is greater than its table value, the difference is significant and the hypothesis is
rejected. Hence, the campaign can be judged to be a success.

Example 3.12: Memory capacity of 9 students was tested before and after a training
programme. From the following scores, state whether the training was effective or
not, at the 5% level of significance (one-tailed test).
Student 1 2 3 4 5 6 7 8 9
Before (XBi) 10 15 9 3 7 12 16 17 4
After (XAi) 12 17 8 5 6 11 18 20 3
Solution:
We take the hypothesis that the training was not effective. We can write,
H0: X̄A = X̄B, Ha: X̄A > X̄B. We apply the difference test, for which purpose we first
of all calculate the mean and standard deviation of differences as follows:

Student    Before (XBi)    After (XAi)    Di = XAi – XBi    Di²
1 10 12 2 4
2 15 17 2 4
3 9 8 –1 1
4 3 5 2 4
5 7 6 –1 1
6 12 11 –1 1
7 16 18 2 4
8 17 20 3 9
9 4 3 –1 1
n = 9                                      ΣD = 7            ΣD² = 29
D̄ = ΣD/n = 7/9 = 0.78

σDiff = √((ΣD² – (D̄)² × n)/(n – 1)) = √((29 – (0.78)² × 9)/(9 – 1)) = 1.71

t = D̄/(σDiff/√n) = 0.78 × √9/1.71 = 1.369
Degrees of freedom = (n – 1) = (9 – 1) = 8
Table value of t at 5% level of significance for 8 degrees of freedom
= 1.860 for one-tailed test.
Since the calculated value of t is less than its table value, the difference is insignificant
and the hypothesis is true. Hence, it can be inferred that the training was not effective.
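The difference test of Example 3.12 can be verified with the following illustrative Python sketch:

from math import sqrt

before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]

d = [a - b for a, b in zip(after, before)]
n = len(d)
dbar = sum(d) / n                                      # about 0.78
s = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))    # about 1.72
t = dbar / (s / sqrt(n))
print(round(t, 2))   # about 1.36, below the one-tailed table value 1.860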
Example 3.13:
It was found that the coefficient of correlation between two variables calculated
from a sample of 25 items was 0.37. Test the significance of r at 5% level with the
help of t-test.
Solution:
To test the significance of r through t-test, we use the following formula for
calculating t value:
t = (r × √(n – 2))/√(1 – r²)

= (0.37 × √(25 – 2))/√(1 – (0.37)²)

= 1.903
Degrees of freedom = (n–2) = (25–2) = 23
The table value of t at the 5% level of significance for 23 degrees of freedom is 2.069
for a two-tailed test.
The calculated value of t is less than its table value, hence r is insignificant.
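The t value for testing the significance of r (Example 3.13) takes one line of Python (illustrative):

from math import sqrt

r, n = 0.37, 25
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
print(round(t, 2))   # about 1.91; below the table value 2.069, so r is insignificant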
Example 3.14:
A group of seven week old chickens reared on high protein diet weighs 12, 15, 11,
16, 14, 14 and 16 ounces; a second group of five chickens similarly treated except
that they receive a low protein diet weighs 8, 10, 14, 10 and 13 ounces. Test at 5%
level whether there is significant evidence that additional protein has increased the
weight of chickens. (Use assumed mean (or A1) = 10 for the sample of 7 and assumed
mean (or A2) = 8 for the sample of 5 chickens in your calculation).
Solution:
We take the hypothesis that additional protein has not increased the weight of the
chickens. We can write, Ho: X1 > X2 Ho : X1 > X2.
Applying t-test, we work out the value of t for measuring the significance of two
sample means as follows:
X X2
t = SE1
x x
1 2
High protein diet (A1 = 10)               Low protein diet (A2 = 8)
X1i    (X1i – A1)    (X1i – A1)²          X2i    (X2i – A2)    (X2i – A2)²
12     2             4                    8      0             0
15     5             25                   10     2             4
11     1             1                    14     6             36
16     6             36                   10     2             4
14     4             16                   13     5             25
14     4             16
16     6             36
n1 = 7, Σ(X1i – A1)² = 134                n2 = 5, Σ(X2i – A2)² = 69

with X̄1 = 98/7 = 14 and X̄2 = 55/5 = 11
SEX̄1–X̄2 = √((Σ(X1i – A1)² + Σ(X2i – A2)² – n1(X̄1 – A1)² – n2(X̄2 – A2)²)/(n1 + n2 – 2)) × √(1/n1 + 1/n2)

= √((134 + 69 – 7(14 – 10)² – 5(11 – 8)²)/(7 + 5 – 2)) × √(1/7 + 1/5)

= (2.14)(0.59) = 1.2626
We now calculate the value of t,

t = (X̄1 – X̄2)/SEX̄1–X̄2 = (14 – 11)/1.2626 = 2.376

Degrees of freedom = (n1 + n2 – 2) = 10. The table value of t at the 5% level of
significance for 10 degrees of freedom is 1.812 for a one-tailed test. Since the
calculated value of t is greater than its table value, the hypothesis is rejected and we
conclude that the additional protein has increased the weight of the chickens.
χ² = Σ((f0 – fe)²/fe)

The chi-square test enables us to: (a) Test the goodness
of fit; (b) Test the homogeneity of a number of frequency distributions; and (c)
Test the significance of association between two attributes. In other words, the chi-
square test is a test of independence, goodness of fit and homogeneity. At times
the chi-square test is used as a test of population variance also.
As a test of goodness of fit, the χ² test enables us to see how well the distribution of
observed data fits the assumed theoretical distribution, such as the Binomial
distribution, the Poisson distribution or the Normal distribution.
As a test of independence, the χ² test helps explain whether or not two
attributes are associated. For instance, we may be interested in knowing whether
a new medicine is effective in controlling fever or not, and the χ² test will help us in
deciding this issue. In such a situation, we proceed on the null hypothesis that the
two attributes (viz., new medicine and control of fever) are independent, which
means that the new medicine is not effective in controlling fever. It may, however,
be stated here that χ² is not a measure of the degree or the form of the
relationship between two attributes; it simply is a technique of judging the
significance of such association or relationship between two attributes.
As a test of homogeneity, the χ² test helps us in stating whether different
samples come from the same universe. Through this test, we can also explain
whether the results worked out on the basis of samples are in conformity
with a well-defined hypothesis or fail to support it.
As such, the test can be taken as an important decision-making technique.
As a test of population variance, χ² is also used to test the
significance of population variance through confidence intervals, specially in the
case of small samples.
3.7.4 Steps in Finding the Value of Chi-Square
The various steps involved are as follows:
(i) First of all, calculate the expected frequencies.
(ii) Obtain the difference between observed and expected frequencies and
find out the squares of these differences, i.e., calculate (f0 – fe)².
(iii) Divide the quantity (f0 – fe)² obtained, as stated above, by the corresponding
expected frequency to get (f0 – fe)²/fe.
(iv) Then find the summation of these values, Σ((f0 – fe)²/fe).

This is the required χ² value.
The χ² value obtained as such should be compared with the relevant table value
of χ² and the inference may be drawn as stated above.
The following examples illustrate the use of Chi-square test.
Example 3.15: A dice is thrown 132 times with the following results:
Number Turned Up 1 2 3 4 5 6
Frequency 16 20 25 14 29 28
Test the hypothesis that the dice is unbiased.
Solution: Let us take the hypothesis that the dice is unbiased. If that is so, the
probability of obtaining any one of the six numbers is 1/6, and as such the
expected frequency of any one number coming upward is 132 × (1/6) = 22. Now we
can write the observed frequencies along with the expected frequencies and work out
the value of χ² as follows:

Number turned up    f0    fe    (f0 – fe)²/fe
1                   16    22    36/22
2                   20    22    4/22
3                   25    22    9/22
4                   14    22    64/22
5                   29    22    49/22
6                   28    22    36/22

χ² = Σ((f0 – fe)²/fe) = 198/22 = 9.0 approx.

Degrees of freedom = (6 – 1) = 5. The table value of χ² at the 5% level of significance
for 5 degrees of freedom is 11.07. Since the calculated value is less than the table
value, the hypothesis is accepted and the dice can be taken as unbiased.
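The goodness-of-fit computation of Example 3.15 in Python (an illustrative sketch):

observed = [16, 20, 25, 14, 29, 28]
expected = 132 / 6                  # 22 for each face if the dice is unbiased

chi_sq = sum((fo - expected) ** 2 / expected for fo in observed)
print(round(chi_sq, 2))   # 9.0, with 6 - 1 = 5 degrees of freedom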
Example 3.17:
Two research workers classified some people in income groups on the basis of
sampling studies. Their results are as follows:
Income Groups
Investigators Poor Middle Rich
Total
A 160 30 10 200
B 140 120 40 300
Total 300 150 50 500
Show that the sampling technique of at least one research worker is defective.
Solution:
Let us take the hypothesis that the sampling techniques adopted by the research
workers are similar (i.e., there is no difference between the techniques adopted
by the research workers). This being so, the expectation of A investigator
classifying the people in,
200 300
(i) Poor income group 120
500
200 150
(ii) Middle income group 60
500
200 50
(iii) Rich income group 20
500
Similarly, the expectation of B investigator classifying the people in
300 300
(i) Poor income group 180
500
300 150
(ii) Middle income group 90
500
300 50
(iii) Rich income group 30
500
We can now calculate the value of χ² as follows:

χ² = Σ((f0 – fe)²/fe)
= (160 – 120)²/120 + (30 – 60)²/60 + (10 – 20)²/20 + (140 – 180)²/180 + (120 – 90)²/90 + (40 – 30)²/30
= 55.54 approx.

Degrees of freedom = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2. The table value of χ² at the
5% level of significance for 2 degrees of freedom is 5.991. The calculated value far
exceeds the table value; hence, the hypothesis is rejected and we conclude that the
sampling technique of at least one research worker is defective.
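The expected frequencies of Example 3.17 follow the rule (row total × column total)/N, which the following illustrative Python sketch applies cell by cell:

rows = {"A": [160, 30, 10], "B": [140, 120, 40]}

row_totals = {k: sum(v) for k, v in rows.items()}         # A: 200, B: 300
col_totals = [sum(c) for c in zip(*rows.values())]        # 300, 150, 50
grand = sum(row_totals.values())                          # 500

chi_sq = 0.0
for k, obs in rows.items():
    for fo, ct in zip(obs, col_totals):
        fe = row_totals[k] * ct / grand    # expected = (row total x column total) / N
        chi_sq += (fo - fe) ** 2 / fe

print(round(chi_sq, 2))   # about 55.56, far above 5.991, the 5% table value for 2 df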
In the case of a (2 × 2) contingency table, the value of χ² can also be worked out by
the following short-cut formula:

χ² = ((ad – bc)² × N)/((a + c)(b + d)(a + b)(c + d))

Where, N means the total frequency, ad means the larger cross product, bc means
the smaller cross product and (a + c), (b + d), (a + b) and (c + d) are the
marginal totals. The alternative formula is rarely used in finding out the value of
chi-square as it is not applicable uniformly in all cases, but can be used only in
a (2 × 2) contingency table.
3.7.6 Yates’ Correction
F. Yates has suggested a correction in the χ² value calculated in connection with a
(2 × 2) table, particularly when cell frequencies are small (since no cell frequency
should be less than 5 in any case, though 10 is better, as stated earlier) and χ² is
just on the significance level. The correction suggested by Yates is popularly
known as Yates’ correction:

χ²(corrected) = (N × (|ad – bc| – 0.5N)²)/((a + b)(c + d)(a + c)(b + d))

In case we use the usual formula for calculating the value of chi-square,
viz., χ² = Σ((f0 – fe)²/fe), then Yates’ correction can be applied as follows:

χ²(corrected) = (|f01 – fe1| – 0.5)²/fe1 + (|f02 – fe2| – 0.5)²/fe2 + ...

It may again be emphasized that Yates’ correction is made only in the case of a
(2 × 2) table, and that too when cell frequencies are small.
3.7.7 Chi-Square as a Test of Population Variance
χ² is used, at times, to test the significance of population variance (σp)² through
confidence intervals. This, in other words, means that we can use the χ² test to judge
if a random sample has been drawn from a normal population with mean (µ) and
with specified variance (σp)². In such a situation, the test statistic for a null
hypothesis will be as follows:

χ² = Σ(Xi – X̄s)²/(σp)² = n(σs)²/(σp)², with (n – 1) degrees of freedom.
By comparing the calculated value (worked out with the help of the above formula)
with the table value of χ² for (n – 1) df at a certain level of significance, we may
accept or reject the null hypothesis. If the calculated value is equal to or less than
the table value, the null hypothesis is accepted, but if the calculated value is greater
than the table value, the hypothesis is rejected. This can be made clear by an
example.
Example 3.18:
The weights of 10 students are as follows:
Sl. No. 1 2 3 4 5 6 7 8 9 10
Weight in kg. 38 40 45 53 47 43 55 48 52 49
Can we say that the variance of the distribution of weights of all students from
which the above sample of 10 students was drawn is equal to 20 square kg? Test
this at 5% and 1% level of significance.
Solution:
First, we should work out the standard deviation of the sample ( s).
Calculation of the sample standard deviation:
Sl. No.    Xi (Weight in kg)    (Xi – X̄s)    (Xi – X̄s)²
1 38 – 9 81
2 40 – 7 49
3 45 – 2 04
4 53 + 6 36
5 47 + 0 00
6 43 – 4 16
7 55 + 8 64
8 48 + 1 01
9 52 + 5 25
10 49 + 2 04
n = 10    ΣXi = 470             Σ(Xi – X̄s)² = 280

X̄s = ΣXi/n = 470/10 = 47 kg

σs = √(Σ(Xi – X̄s)²/n) = √(280/10) = √28 = 5.3 kg

(σs)² = 28

Taking the null hypothesis as H0: (σp)² = (σs)²,

the test statistic χ² = n(σs)²/(σp)² = (10 × 28)/20 = 280/20 = 14
Degrees of freedom in this case is (n – 1) = 10 – 1 = 9
At 5% level of significance, the table value of 2 = 16.92, and at 1% level of
significance it is 21.67 for 9 df, and both these values are greater than the
calculated value of 2 which is 14. Hence, we accept the null hypothesis and
conclude that the variance of the given distribution can be taken as 20 square kg
at 5% as well as at 1% level of significance.
3.7.8 Additive Property of Chi-Square (χ²)
An important property of χ² is its additive nature. This means that several values
of χ² can be added together, and if the degrees of freedom are also added, this
number gives the degrees of freedom of the total value of χ². Thus, if a number of
χ² values have been obtained from a number of samples of similar data, then,
because of the additive nature of χ², we can combine the various values of χ² by
simply adding them. Such addition of various values of χ² gives one value of χ²,
which helps in forming a better idea about the significance of the problem under
consideration. The following example illustrates the additive property of χ².
2. The Critical Region (CR), or Rejection Region (RR), is a set of values for
testing statistic for which the null hypothesis is rejected in a hypothesis test.
3. The t-test is used when two conditions are fulfilled:
(i) The sample size is less than 30, i.e., when n < 30.
(ii) The population standard deviation (σp) must be unknown.
Check Your Progress-3
1. Chi-square, symbolically written as $\chi^2$ (pronounced as Ki-square), is a
statistical measure with the help of which, it is possible to assess the
significance of the difference between the observed frequencies and the
expected frequencies obtained from some hypothetical universe. Chi-square
tests enable us to test whether more than two population proportions can be
considered equal.
2. As a test of goodness of fit, the $\chi^2$ test enables us to see how well the distribution
of observed data fits the assumed theoretical distribution such as binomial
distribution, Poisson distribution or the normal distribution.
3. (i) Chi-square test is based on frequencies and not on parameters like mean
and standard deviation.
(ii) This test is used for testing the hypothesis and is not useful for estimation.
Short-Answer Questions
1. Define sampling.
2. Explain briefly the different types of sampling.
3. What is sampling distribution? Give examples.
4. Define population in statistical terms.
5. What is a hypothesis?
6. Explain the importance of statistical decision-making.
7. Define null and alternate hypotheses.
8. Describe the various types of errors that occur in statistical hypotheses.
9. Describe standard error.
10. What do you mean by the level of significance?
11. What is critical region?
12. Describe the one-tailed test.
13. What is the importance of a two-tailed test in statistics?
14. Write the importance of a small sample test.
15. Explain t-test and Chi-square test.
Long-Answer Questions
1. Differentiate between probability samples and non-probability samples. Under
what circumstances would non-probability types of samples be more useful
in statistical analyses?
2. How does sampling with replacement differ from sampling without
replacement? Give some examples of situations where sampling has to be
done without replacement.
3. Explain in detail the situations that would require:
(a) Judgement sampling
(b) Quota sampling
(c) Stratified sampling
4. Differentiate between sampling errors and non-sampling errors. Under what
circumstances would each type of error occur? What steps can be taken to
minimize the impact of such errors upon statistical analyses?
5. Your college has a total population of 5000 students. It is desired to estimate
the proportion of students who use drugs.
(a) What type of sampling would be necessary to reach a meaningful
conclusion regarding the drug use habits of all students?
(b) What type of sampling would you select so that the sample is most
representative of the population?
(c) Drug use being a sensitive issue, what type of questions would you
include in your questionnaire? What type of questions would you avoid?
Give reasons.
6. The lottery method of sample selection is still the most often used method.
Discuss this method in detail and give reasons as to why a sample selected by
the lottery method would be representative of the population.
7. You are the chairman of the Department of Business Administration and you
have been asked to make a report on the current status of students who
graduated with B.S. in Business during the two years of 1989 and 1990.
Records have indicated that a total of 425 students graduated from the
department during these two years. The report is to include information
regarding sex of the student, grade point average at the time of graduation,
whether the students completed MBA degree or started a job after B.S. degree,
current employment position and current annual salary. Prepare a proposal
for this survey and include in this proposal:
(a) Objectives of the survey
(b) Type of sampling technique
(c) Size of the sample
8. At New Delhi airport, there is a green channel and a red channel. Passengers
without any custom duty articles can go through the green channel. Some
passengers are stopped for a random check. What type of random sampling
would be appropriate in such situations? Would judgement sampling be more
appropriate? Give reasons.
17. 800 ore pieces from a mine were found to contain an average of 74.5 gm of gold. From a nearby mine, 1600 pieces had an average of 75 gm of gold. Test the equality of the averages from the two mines, given that each sample has an S.D. of 2.4.
18. To test the goodness of a coin, it is tossed 5 times. It is considered a bad coin
if more than 4 heads show up. (a) What is the probability of Type I error?
(b) If the probability of a head is 0.2, what is the probability of Type II error?
19. 10 persons randomly selected are found to have heights 63, 63, 66, 67, 68,
69, 70, 71, 71, 71 inches. Discuss the suggestion that the mean height in the
population is 66 inches.
20. 360 persons out of 600 are found to suffer from pollution induced bronchitis
in one city. In another, 400 out of 500 are found to suffer from bronchitis. Is
there any significant difference in the incidence of bronchitis?
4.2 INTRODUCTION
In this unit, you will learn about correlation analysis. This technique looks at indirect relationships and establishes which variables are most closely associated with a given variable of interest; it is the process of finding how accurately a fitted line describes the observations. Correlation analysis can be referred to as the statistical tool used to describe the degree to which one variable is related to another. The relationship,
if any, is assumed to be a linear one. In fact, the word 'correlation' refers to the relationship or the interdependence between two variables. There are various phenomena that are related to each other. The theory by means of which quantitative connections between two sets of phenomena are determined is called the 'Theory of Correlation'. On the basis of this theory, you can study the comparative changes occurring in two related phenomena, and their cause–effect relation can also be examined. Thus, correlation is concerned with the relationship between two related and quantifiable variables and can be positive or negative.
In this unit, you will also learn about regression analysis, which is the
mathematical process of using observations to find the line of best fit through the
data in order to make estimates and predictions about the behaviour of variables.
This technique is used to determine the statistical relationship between two or more
variables and to make prediction of one variable on the basis of one or more other
variables.
Correlation analysis is the statistical tool generally used to describe the degree to
which one variable is related to another. The relationship, if any, is usually assumed
to be a linear one. This analysis is used quite frequently in conjunction with regression
analysis to measure how well the regression line explains the variations of the
dependent variable. In fact, the word correlation refers to the relationship or
interdependence between two variables. There are various phenomena which are
related to each other. For instance, when demand of a certain commodity increases,
its price goes up, and when its demand decreases, its price comes down. Similarly, the height of children increases with age, the weight of children increases with height, and the general level of prices goes up with the money supply. Such relationships can as well
be noticed for several other phenomena. The theory by means of which quantitative
connections between two sets of phenomena are determined is called the ‘Theory
of Correlation’.
On the basis of the theory of correlation, one can study the comparative
changes occurring in two related phenomena and their cause–effect relation can
be examined. It should, however, be borne in mind that relationships like ‘black
cat causes bad luck’, ‘filled up pitchers result in good fortune’ and similar other
beliefs of the people cannot be explained by the theory of correlation, since they
are all imaginary and are incapable of being justified mathematically. Thus,
correlation is concerned with relationship between two related and quantifiable
variables. If two quantities vary in sympathy, so that a movement (an increase or
decrease) in one tends to be accompanied by a movement in the same or opposite
direction in the other and the greater the change in one, the greater is the change
in the other, the quantities are said to be correlated. This type of relationship is
known as correlation or what is sometimes called, in statistics, as covariation.
For correlation, it is essential that the two phenomena should have cause–
effect relationship. If such relationship does not exist then one should not talk of
correlation. For example, if the height of the students as well as the height of the
trees increases, then one should not call it a case of correlation because the two
phenomena, viz., the height of students and the height of trees, are not even causally
related. However, the relationship between the price of a commodity and its
demand, the price of a commodity and its supply, the rate of interest and savings,
etc. are examples of correlation, since in all such cases the change in one
phenomenon is explained by a change in another phenomenon.
It is appropriate here to mention that correlation in the case of phenomena pertaining to the natural sciences can often be reduced to exact mathematical terms.
Coefficient of Determination
The coefficient of determination (symbolically indicated as r2, though some people
would prefer to put it as R2) is a measure of the degree of linear association or
correlation between two variables, say X and Y, one of which happens to be an
independent variable and the other being a dependent variable. This coefficient is
based on the following two types of variations:
(i) The variation of the Y values around the fitted regression line, viz., $\sum (Y - \hat{Y})^2$, technically known as the unexplained variation.
(ii) The variation of the Y values around their own mean, viz., $\sum (Y - \bar{Y})^2$, technically known as the total variation.
The difference between the two is the explained variation, since

$$\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2 = \sum (\hat{Y} - \bar{Y})^2$$
[Fig. 4.1 Diagram Showing Total, Explained and Unexplained Variations: consumption expenditure plotted against income ('00 Rs), with the mean line of Y, the regression line, and the explained variation ($\hat{Y} - \bar{Y}$) marked at a specific point.]
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$
Interpreting r²
The coefficient of determination can have a value ranging from 0 to 1. The value of 1 can occur only if the unexplained variation is 0, which simply means that all the data points in the scatter diagram fall exactly on the regression line. For a 0 value to occur, $\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2$, which simply means that X tells us nothing about Y and hence there is no regression relationship between the X and Y variables. Values between 0 and 1 indicate the 'goodness of fit' of the regression line to the sample data: the higher the value of r², the better the fit, and in every case the value of r² lies between 0 and 1.
Analysis of result: The regression equation used to calculate the value of the
coefficient of determination (r2) from the sample data shows that, about 90 per cent
of the variations in consumption expenditure can be explained. In other words, it
means that the variations in income explain about 90 per cent of variations in
consumption expenditure.
Observation                 1    2    3    4    5    6    7    8    9    10
Income (X) ('00 Rs)        41   65   50   57   96   94  110   30   79   65
Consumption
Expenditure (Y) ('00 Rs)   44   60   39   51   80   68   84   34   55   48
Then, by applying the following formulae, we can find the value of the coefficient
of correlation as,
$$r = \sqrt{r^2} = \sqrt{\frac{\text{Explained variation}}{\text{Total variation}}} = \sqrt{1 - \frac{\text{Unexplained variation}}{\text{Total variation}}} = \sqrt{1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}}$$
This clearly shows that the coefficient of correlation happens to be the square root
of the coefficient of determination.
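For the income-consumption data given above, this relationship can be checked with a short Python sketch (an illustrative addition; the least squares line is fitted first and the variations are then compared):

# r-squared and r from unexplained and total variation
from math import sqrt

X = [41, 65, 50, 57, 96, 94, 110, 30, 79, 65]   # income ('00 Rs)
Y = [44, 60, 39, 51, 80, 68, 84, 34, 55, 48]    # consumption ('00 Rs)
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
    sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
total = sum((y - ybar) ** 2 for y in Y)
r_squared = 1 - unexplained / total
print(round(r_squared, 2), round(sqrt(r_squared), 2))   # about 0.9 and 0.95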
The short-cut formula for finding the value of 'r' by the method of least squares can be written as,

$$r = \sqrt{\frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}}$$

Where, a = Y-intercept
b = Slope of the estimating equation
X = Values of the independent variable
Y = Values of the dependent variable
$\bar{Y}$ = Mean of the observed values of Y
n = Number of items in the sample (i.e., pairs of observed data)
The plus (+) or the minus (–) sign of the coefficient of correlation worked out by the method of least squares is related to the sign of 'b' in the estimating equation, viz., $\hat{Y} = a + bX_i$. If 'b' has a minus sign, the sign of 'r' will also be minus, but if 'b' has a plus sign, then the sign of 'r' will also be plus. The value of 'r' indicates the degree along with the direction of the relationship between the two variables X and Y.
we find the regression coefficient of Y on X, i.e., the slope of the estimating equation of Y (symbolically written as $b_{YX}$), and this happens to be equal to $r\dfrac{\sigma_Y}{\sigma_X}$. For finding 'r', the square root of the product of these two regression coefficients is worked out as¹

$$r = \sqrt{b_{XY} \cdot b_{YX}} = \sqrt{r\frac{\sigma_X}{\sigma_Y} \cdot r\frac{\sigma_Y}{\sigma_X}} = \sqrt{r^2} = r$$
As stated earlier, the sign of ‘r’ will depend upon the sign of the regression
coefficients. If they have minus sign, then ‘r’ will take minus sign but the sign of ‘r’
will be plus if regression coefficients have plus sign.
4.5.3 Karl Pearson’s Coefficient
Karl Pearson’s method is the most widely used method of measuring the relationship
between two variables. This coefficient is based on the following assumptions:
(i) There is a linear relationship between the two variables, which means that a
straight line would be obtained if the observed data is plotted on a graph.
(ii) The two variables are causally related, which means that one of the variables is independent and the other one is dependent.
(iii) A large number of independent causes operate on both the variables so as to
produce a normal distribution.
According to Karl Pearson, 'r' can be worked out as,

$$r = \frac{\sum xy}{n\,\sigma_X\,\sigma_Y}$$

Where, $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$ are deviations from the respective means,
$\sigma_X$ = Standard deviation of the X series $= \sqrt{\dfrac{\sum x^2}{n}}$
$\sigma_Y$ = Standard deviation of the Y series $= \sqrt{\dfrac{\sum y^2}{n}}$
n = Number of pairs of X and Y observed
A short-cut formula, known as the Product Moment Formula, can be derived from the above formula as,

$$r = \frac{\sum xy}{n\,\sigma_X\,\sigma_Y} = \frac{\sum xy}{n\sqrt{\dfrac{\sum x^2}{n}}\sqrt{\dfrac{\sum y^2}{n}}} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
The above formulae are based on obtaining true means (viz., $\bar{X}$ and $\bar{Y}$) first and
then doing all other calculations. This happens to be a tedious task, particularly
if the true means are in fractions. To avoid difficult calculations, we make use of
the assumed means in taking out deviations and doing the related calculations. In
such a situation, we can use the following formula for finding the value of ‘r’:2
a. In case of ungrouped data:
$$r = \frac{\dfrac{\sum dX \cdot dY}{n} - \dfrac{\sum dX}{n} \cdot \dfrac{\sum dY}{n}}{\sqrt{\dfrac{\sum dX^2}{n} - \left(\dfrac{\sum dX}{n}\right)^2}\,\sqrt{\dfrac{\sum dY^2}{n} - \left(\dfrac{\sum dY}{n}\right)^2}} = \frac{\sum dX \cdot dY - \dfrac{\sum dX \cdot \sum dY}{n}}{\sqrt{\sum dX^2 - \dfrac{\left(\sum dX\right)^2}{n}}\,\sqrt{\sum dY^2 - \dfrac{\left(\sum dY\right)^2}{n}}}$$

(where dX and dY are deviations from the assumed means of X and Y respectively)
2. In case we take the assumed mean to be zero for the X variable as well as for the Y variable, then our formula will be,

$$r = \frac{\dfrac{\sum XY}{n} - \bar{X}\bar{Y}}{\sqrt{\dfrac{\sum X^2}{n} - \bar{X}^2}\,\sqrt{\dfrac{\sum Y^2}{n} - \bar{Y}^2}} \quad \text{or} \quad r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}}$$
b. In case of grouped data:

$$r = \frac{\sum f\,dX \cdot dY - \dfrac{\sum f\,dX \cdot \sum f\,dY}{n}}{\sqrt{\sum f\,dX^2 - \dfrac{\left(\sum f\,dX\right)^2}{n}}\,\sqrt{\sum f\,dY^2 - \dfrac{\left(\sum f\,dY\right)^2}{n}}}$$
Example 4.2: Find the coefficient of correlation for the following data (i) by the method of least squares, (ii) through the regression coefficients and (iii) by Karl Pearson's product moment method:
X   1   2   3   4   5   6   7   8   9
Y   9   8  10  12  11  13  14  16  15
Solution:
Let us develop the following table for calculating the value of ‘r’:
X Y X2 Y2 XY
1 9 1 81 9
2 8 4 64 16
3 10 9 100 30
4 12 16 144 48
5 11 25 121 55
6 13 36 169 78
7 14 49 196 98
8 16 64 256 128
9 15 81 225 135
n = 9   $\sum X$ = 45   $\sum Y$ = 108   $\sum X^2$ = 285   $\sum Y^2$ = 1356   $\sum XY$ = 597
$\bar{X}$ = 5;  $\bar{Y}$ = 12
(i) Coefficient of correlation by the method of least squares is worked out as
follows:
First find out the estimating equation,

$$\hat{Y} = a + bX_i, \quad \text{where } b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \text{ and } a = \bar{Y} - b\bar{X}$$

Here, $b = \dfrac{597 - 9 \times 5 \times 12}{285 - 9(5)^2} = \dfrac{57}{60} = 0.95$ and $a = 12 - 0.95 \times 5 = 7.25$.

Then, as per the short-cut formula,

$$r = \sqrt{1 - \frac{\sum(Y - \hat{Y})^2}{\sum(Y - \bar{Y})^2}} = \sqrt{\frac{\sum(\hat{Y} - \bar{Y})^2}{\sum(Y - \bar{Y})^2}} = \sqrt{\frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}}$$

$$r = \sqrt{\frac{7.25 \times 108 + 0.95 \times 597 - 9(12)^2}{1356 - 9(12)^2}} = \sqrt{\frac{54.15}{60}} = \sqrt{0.9025} = 0.95$$
(ii) Coefficient of correlation by the method based on regression coefficients is
worked out as,
Regression coefficient of Y on X,

$$b_{YX} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{597 - 9 \times 5 \times 12}{285 - 9(5)^2} = \frac{597 - 540}{285 - 225} = \frac{57}{60}$$

Regression coefficient of X on Y,

$$b_{XY} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2} = \frac{597 - 540}{1356 - 9(12)^2} = \frac{57}{1356 - 1296} = \frac{57}{60}$$

Hence,

$$r = \sqrt{b_{YX} \cdot b_{XY}} = \sqrt{\frac{57}{60} \times \frac{57}{60}} = \frac{57}{60} = 0.95$$
(iii) Coefficient of correlation by the product moment method of Karl Pearson is
worked out as,
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}} = \frac{597 - 9 \times 5 \times 12}{\sqrt{285 - 9(5)^2}\,\sqrt{1356 - 9(12)^2}} = \frac{57}{\sqrt{60}\sqrt{60}} = \frac{57}{60} = 0.95$$
Hence, we get the value of r = 0.95. We get the same value by applying the other
two methods also. Therefore, whichever method we apply, the results will be the
same.
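This agreement is easy to confirm in a few lines of Python (an illustrative addition, standard library only):

# Verifying that the three methods give the same r = 0.95
from math import sqrt

X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [9, 8, 10, 12, 11, 13, 14, 16, 15]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum(x * y for x, y in zip(X, Y)) - n * mx * my   # 57
sxx = sum(x * x for x in X) - n * mx * mx              # 60
syy = sum(y * y for y in Y) - n * my * my              # 60
r_regression = sqrt((sxy / sxx) * (sxy / syy))   # via regression coefficients
r_pearson = sxy / sqrt(sxx * syy)                # product moment method
print(round(r_regression, 2), round(r_pearson, 2))   # 0.95 and 0.95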
4.5.4 Other Measures
Two other measures are often talked about along with the coefficients of determination and correlation. These are as follows:
(i) Coefficient of nondetermination: Instead of using coefficient of
determination, sometimes coefficient of nondetermination is used. Coefficient
of nondetermination (denoted by $k^2$) is the ratio of unexplained variation to total variation in the Y variable with respect to the X variable. Algebraically,

$$k^2 = \frac{\text{Unexplained variation}}{\text{Total variation}} = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$
(ii) Coefficient of alienation: Based on $k^2$, we can work out one more measure, namely the coefficient of alienation, symbolically written as 'k'. Thus, the coefficient of alienation is $k = \sqrt{k^2}$.
Unlike $r^2 + k^2 = 1$, the sum of 'r' and 'k' will not be equal to 1 unless one of the two coefficients is 1, in which case the remaining coefficient must be zero. In all other cases, r + k > 1. The coefficient of alienation is not a popular measure from a practical point of view and is used very rarely.
Example 4.3: The ranks given by two judges to 10 individuals are as follows. Calculate the coefficient of rank correlation:
Rank given by
Individual    Judge I (x)    Judge II (y)    D = x – y    D²
1 1 7 6 36
2 2 5 3 9
3 7 8 1 1
4 9 10 1 1
5 8 9 1 1
6 6 4 2 4
7 4 1 3 9
8 3 6 3 9
9 10 3 7 49
10 5 2 3 9
$\sum D^2$ = 128
Solution:
The rank correlation is given by,

$$\rho = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 128}{10^3 - 10} = 1 - 0.776 = 0.224$$

The value of ρ = 0.224 shows that the agreement between the judges is not high.
Example 4.4: Consider Example 4.3 and compute r and compare.
Solution:
The simple coefficient of correlation r for the previous data is calculated as follows:
x    y    x²    y²    xy
1 7 1 49 7
2 5 4 25 10
7 8 49 64 56
9 10 81 100 90
8 9 64 81 72
6 4 36 16 24
4 1 16 1 4
3 6 9 36 18
10 3 100 9 30
5 2 25 4 10
$\sum x$ = 55   $\sum y$ = 55   $\sum x^2$ = 385   $\sum y^2$ = 385   $\sum xy$ = 321

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} = \frac{321 - \dfrac{55 \times 55}{10}}{\sqrt{385 - \dfrac{(55)^2}{10}}\,\sqrt{385 - \dfrac{(55)^2}{10}}} = \frac{18.5}{82.5} = 0.224$$
This shows that the Spearman ρ for any two sets of ranks is the same as the Pearson r computed for those ranks. However, ρ is much easier to compute.
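The equality can be demonstrated with a minimal Python sketch (an illustrative addition using the ranks of Example 4.3):

# Spearman's rho versus Pearson's r on the same ranks
from math import sqrt

x = [1, 2, 7, 9, 8, 6, 4, 3, 10, 5]   # ranks given by Judge I
y = [7, 5, 8, 10, 9, 4, 1, 6, 3, 2]   # ranks given by Judge II
n = len(x)
rho = 1 - 6 * sum((a - b) ** 2 for a, b in zip(x, y)) / (n ** 3 - n)
mx, my = sum(x) / n, sum(y) / n
r = (sum(a * b for a, b in zip(x, y)) - n * mx * my) / sqrt(
    (sum(a * a for a in x) - n * mx * mx) *
    (sum(b * b for b in y) - n * my * my))
print(round(rho, 3), round(r, 3))     # 0.224 and 0.224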
[A set of seven scatter diagrams (not reproduced) illustrates how r reflects the pattern of the data: (i) positive slope, a direct linear relationship, with high scatter giving a low positive r; (ii) negative slope, an inverse linear relationship, with high scatter giving a low negative r; (iii) and (iv) slope = 0, no relationship, r = 0 or r ≈ 0; (v) a direct curvilinear relationship; (vi) an inverse curvilinear relationship; and (vii) a perfect relationship for which r = 0 nevertheless, because the relation is non-linear.]
(v) The analysis is to be used to predict values within the range (and not for
values outside the range) for which it is valid.
Simple Linear Regression Model
In case of simple linear regression analysis, a single variable is used to predict
another variable on the assumption of linear relationship (i.e., relationship of the
type defined by Y = a + bX) between the given variables. The variable to be predicted
is called the dependent variable and the variable on which the prediction is based is
called the independent variable.
Simple linear regression model3 (or the Regression Line) is stated as,
Yi = a + bXi + ei
Where, Yi = The dependent variable
Xi = The independent variable
ei = Unpredictable random element (usually called
residual or error term)
(i) a represents the Y-intercept, i.e., the intercept specifies the value of the
dependent variable when the independent variable has a value of zero.
(However, this term has practical meaning only if a zero value for the
independent variable is possible).
(ii) b is a constant, indicating the slope of the regression line. Slope of the line
indicates the amount of change in the value of the dependent variable for a
unit change in the independent variable.
If the two constants (viz., a and b) are known, the accuracy of our prediction of Y (denoted by $\hat{Y}$ and read as Y-hat) depends on the magnitude of the values of $e_i$. If, in the model, all the $e_i$ tend to have very large values, then the estimates will not be very good; but if these values are relatively small, then the predicted values ($\hat{Y}$) will tend to be close to the true values ($Y_i$).
Estimating the intercept and slope of the regression model (or estimating the
regression equation)
The two constants or the parameters viz., ‘a’ and ‘b’ in the regression model for the
entire population or universe are generally unknown and as such are estimated from
sample information. The following are the two methods used for estimation:
(i) Scatter diagram method
(ii) Least squares method
3. Usually, the estimate of Y, denoted by $\hat{Y}$, is written as,
$$\hat{Y} = a + bX_i$$
on the assumption that the random disturbance to the system averages out or has an expected
value of zero (i.e., e = 0) for any single observation. This regression model is known as the
Regression line of Y on X from which the value of Y can be estimated for the given value of X.
[Fig. 4.2 Scatter diagram (not reproduced): consumption expenditure ('00 Rs) on the y-axis plotted against income ('00 Rs) on the x-axis, both axes running from 0 to 120.]
The scatter diagram by itself is not sufficient for predicting values of the dependent
variable. Some formal expression of the relationship between the two variables is
necessary for predictive purposes. For the purpose, one may simply take a ruler and
draw a straight line through the points in the scatter diagram and this way can
4. Five possible forms which a scatter diagram may assume are depicted in the five diagrams referred to above: Diagram (1) is indicative of a perfect positive relationship; Diagram (2) shows a perfect negative relationship; Diagram (3) shows no relationship; Diagram (4) shows a positive relationship; and Diagram (5) shows a negative relationship between the two variables under consideration.
determine the intercept and the slope of the said line, and then the line can be defined as $\hat{Y} = a + bX_i$, with the help of which we can predict Y for a given value of X.
However, there are shortcomings in this approach. For example, if five different
persons draw such a straight line in the same scatter diagram, it is possible that
there may be five different estimates of a and b, especially when the dots are more
dispersed in the diagram. Hence, the estimates cannot be worked out only through
this approach. A more systematic and statistical method is required to estimate the
constants of the predictive equation. The least squares method is used to draw the
best fit line.
2. Least squares method
The least squares method of fitting a line (the line of best fit or the regression line)
through the scatter diagram is a method which minimizes the sum of the squared
vertical deviations from the fitted line. In other words, the line to be fitted will pass
through the points of the scatter diagram in such a way that the sum of the squares
of the vertical deviations of these points from the line will be a minimum.
The meaning of the least squares criterion can be easily understood through
Figure 4.3, where the earlier Figure 4.2 in scatter diagram has been reproduced
along with a line which represents the least squares line to fit the data.
Fig. 4.3 Scatter Diagram, Regression Line and Short Vertical Lines Representing ‘e’
In Figure 4.3, the vertical deviations of the individual points from the line are shown
as the short vertical lines joining the points to the least squares line. These deviations
will be denoted by the symbol ‘e’. The value of ‘e’ varies from one point to another.
In some cases it is positive, while in others it is negative. If the line drawn happens to be the least squares line, then the sum of the squared $e_i$ values is the least possible. It is because of this feature that the method is known as the Least Squares Method.
Why we insist on minimizing the sum of squared deviations is a question that needs explanation. If we denote the deviation of the actual value Y from the estimated value $\hat{Y}$ by $e_i = (Y_i - \hat{Y}_i)$, then any $e_i$ can be positive or negative. Large positive values and large negative values could cancel one another. However, large values of $e_i$, regardless of their sign, indicate a poor prediction. Even if we ignore the signs while working out $\sum_{i=1}^{n} |e_i|$,
the difficulties may continue. Hence, the standard procedure is to eliminate the
effect of signs by squaring each observation. Squaring each term accomplishes
two purposes, viz., (i) it magnifies (or penalizes) the larger errors, and (ii) it cancels
the effect of the positive and negative values (since a negative error when squared
becomes positive). The choice of minimizing the squared sum of errors rather
than the sum of the absolute values implies that there are many small errors rather
than a few large errors. Hence, in obtaining the regression line, we follow the
approach that the sum of the squared deviations be minimum and on this basis
work out the values of its constants viz., ‘a’ and ‘b’ also known as the intercept
and the slope of the line. This is done with the help of the following two normal
equations:5
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$

In these two equations, 'a' and 'b' are unknowns and all other values, viz., $\sum X$, $\sum Y$, $\sum X^2$ and $\sum XY$, are the sums and cross products to be calculated from the sample data; 'n' is the number of observations in the sample.
Example 4.6 explains the Least squares method.
Example 4.6: Fit a regression line $\hat{Y} = a + bX_i$ by the method of least squares to the following sample information.
Observations                1    2    3    4    5    6    7    8    9    10
Income (X) ('00 Rs)        41   65   50   57   96   94  110   30   79   65
Consumption
Expenditure (Y) ('00 Rs)   44   60   39   51   80   68   84   34   55   48
Solution:
We are to fit a regression line $\hat{Y} = a + bX_i$ to the given data by the method of least squares. Accordingly, we work out the 'a' and 'b' values with the help of the normal equations stated above, and for the purpose work out $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ from the sample data.
5. If we proceed by centring each variable, i.e., setting its origin at its mean, and write x and y for the deviations from the means, then the two normal equations become,
$$\sum y = na + b\sum x \qquad \sum xy = a\sum x + b\sum x^2$$
But since $\sum y$ and $\sum x$ will be zero, the first equation and the first term of the second equation disappear, and we simply have,
$$\sum xy = b\sum x^2, \quad \text{i.e., } b = \frac{\sum xy}{\sum x^2}$$
The value of 'a' can then be worked out as $a = \bar{Y} - b\bar{X}$.
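Since the worked solution of Example 4.6 does not survive in full in this extract, the following Python sketch (an illustrative addition) solves the normal equations in the centred form given in the footnote above for the sample data:

# Least squares fit for Example 4.6: b = sum(xy)/sum(x^2), a = Ybar - b*Xbar
X = [41, 65, 50, 57, 96, 94, 110, 30, 79, 65]
Y = [44, 60, 39, 51, 80, 68, 84, 34, 55, 48]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
    sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar
print(round(a, 2), round(b, 3))   # about 14.0 and 0.616: Y-hat = 14.0 + 0.616X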
The square of the Se, also known as the variance of the error term, is the basic
measure of reliability. The larger the variance, the more significant are the magnitudes
of the e’s and the less reliable is the regression analysis in predicting the data.
Interpreting the standard error of estimate and finding the confidence
limits for the estimate in large and small samples
The larger the S.E. of estimate (SEe), the greater happens to be the dispersion,
or scattering, of given observations around the regression line. However, if the
S.E. of estimate happens to be zero, then the estimating equation is a ‘perfect’
estimator (i.e., cent per cent correct estimator) of the dependent variable.
(i) In case of large samples, i.e., where n > 30 in a sample, it is assumed that the
observed points are normally distributed around the regression line and we may
find that,
• 68 per cent of all points lie within $\hat{Y} \pm 1\,SE_e$ limits.
• 95.5 per cent of all points lie within $\hat{Y} \pm 2\,SE_e$ limits.
• 99.7 per cent of all points lie within $\hat{Y} \pm 3\,SE_e$ limits.
This can be stated as,
a. The observed values of Y are normally distributed around each estimated value of $\hat{Y}$.

Expressed through the coefficient of correlation and the standard deviations, the regression equation of Y on X is,

$$\hat{Y} - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}\left(X_i - \bar{X}\right) \quad \text{or} \quad \hat{Y} = r\frac{\sigma_Y}{\sigma_X}\left(X_i - \bar{X}\right) + \bar{Y}$$

so that the coefficient of $X_i$ is $b = r\dfrac{\sigma_Y}{\sigma_X}$ and $a = \bar{Y} - r\dfrac{\sigma_Y}{\sigma_X}\bar{X} = \bar{Y} - b\bar{X}$.

Similarly, the regression equation of X on Y is,

$$\hat{X} - \bar{X} = r\frac{\sigma_X}{\sigma_Y}\left(Y - \bar{Y}\right) \quad \text{or} \quad \hat{X} = r\frac{\sigma_X}{\sigma_Y}\left(Y - \bar{Y}\right) + \bar{X}$$

and the regression coefficient of X on Y (or $b_{XY}$) is

$$b_{XY} = r\frac{\sigma_X}{\sigma_Y} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$
If we are given the two regression equations as stated above, along with the values of the 'a' and 'b' constants, and we solve them for X and Y, then the values of X and Y so obtained are the mean values $\bar{X}$ and $\bar{Y}$.
If we are given the two regression coefficients (viz., bXY and bYX), then we can
work out the value of coefficient of correlation by just taking the square root of the
product of the regression coefficients as shown,
$$r = \sqrt{b_{YX} \cdot b_{XY}} = \sqrt{r\frac{\sigma_Y}{\sigma_X} \cdot r\frac{\sigma_X}{\sigma_Y}} = \sqrt{r \cdot r} = r$$
The (±) sign of r will be determined on the basis of the sign of the given regression
coefficients. If regression coefficients have minus sign then r will be taken with
minus (–) sign and if regression coefficients have plus sign then r will be taken with
plus (+) sign, (Remember that both regression coefficients will necessarily have the
same sign, whether it is minus or plus, for their sign is governed by the sign of
coefficient of correlation.) To understand it better, see Examples 4.7 and 4.8.
Example 4.7: Given is the following information:
X Y
Mean 39.5 47.5
Standard Deviation 10.8 17.8
Simple correlation coefficient between X and Y = +0.42.
Find the estimating equations of Y and X.
Solution:
Estimating equation of Y can be worked out as,

$$\hat{Y} - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}\left(X_i - \bar{X}\right) \quad \text{or} \quad \hat{Y} = r\frac{\sigma_Y}{\sigma_X}\left(X_i - \bar{X}\right) + \bar{Y}$$

$$\hat{Y} = 0.42 \times \frac{17.8}{10.8}\left(X_i - 39.5\right) + 47.5 = 0.69X_i - 27.25 + 47.5 = 0.69X_i + 20.25$$

Similarly, the estimating equation of X can be worked out as,

$$\hat{X} - \bar{X} = r\frac{\sigma_X}{\sigma_Y}\left(Y_i - \bar{Y}\right) \quad \text{or} \quad \hat{X} = r\frac{\sigma_X}{\sigma_Y}\left(Y_i - \bar{Y}\right) + \bar{X}$$

$$\hat{X} = 0.42 \times \frac{10.8}{17.8}\left(Y_i - 47.5\right) + 39.5 = 0.26Y_i - 12.35 + 39.5 = 0.26Y_i + 27.15$$
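The same computation in Python (an illustrative addition; the intercepts printed differ in the second decimal from the text, which rounds the slopes to two decimals before computing the intercepts):

# Estimating equations of Example 4.7 from the means, SDs and r
xbar, ybar = 39.5, 47.5
sx, sy = 10.8, 17.8
r = 0.42
b_yx = r * sy / sx         # about 0.69
a_y = ybar - b_yx * xbar   # about 20.16 (text: 20.25 after rounding)
b_xy = r * sx / sy         # about 0.25 (text rounds to 0.26)
a_x = xbar - b_xy * ybar   # about 27.40 (text: 27.15 after rounding)
print(round(b_yx, 2), round(a_y, 2), round(b_xy, 2), round(a_x, 2))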
Example 4.8: The following data are given:
Variance of X = 9
Regression equations:
4X – 5Y + 33 = 0
20X – 9Y – 107 = 0
Find (i) the mean values of X and Y, (ii) the coefficient of correlation between X and Y, and (iii) the standard deviation of Y.
Solution:
(i) Solving the two regression equations simultaneously gives the mean values, $\bar{X}$ = 13 and $\bar{Y}$ = 17.
(ii) Solving the second equation for X and the first for Y gives,

$$\hat{X} = \frac{9Y_i + 107}{20} \quad \text{and} \quad \hat{Y} = \frac{4X_i + 33}{5}$$

so that $b_{XY} = 9/20$ and $b_{YX} = 4/5$. Hence,

$$r = \sqrt{\frac{9}{20} \times \frac{4}{5}} = \sqrt{\frac{9}{25}} = \frac{3}{5} = 0.6$$

Since the regression coefficients have plus signs, we take r = +0.6.
(iii) Standard deviation of Y can be calculated as follows:
Variance of X = 9, so the standard deviation of X is $\sigma_X = 3$.

$$b_{YX} = r\frac{\sigma_Y}{\sigma_X} \;\Rightarrow\; \frac{4}{5} = 0.6 \times \frac{\sigma_Y}{3} = 0.2\,\sigma_Y \;\Rightarrow\; \sigma_Y = 4$$

Alternatively, we can work it out as,

$$b_{XY} = r\frac{\sigma_X}{\sigma_Y} \;\Rightarrow\; \frac{9}{20} = 0.6 \times \frac{3}{\sigma_Y} = \frac{1.8}{\sigma_Y} \;\Rightarrow\; \sigma_Y = 4$$
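A short Python sketch (an illustrative addition) recovers both results of Example 4.8 from the regression coefficients:

# r and the standard deviation of Y from the regression coefficients
from math import sqrt

b_xy = 9 / 20           # from 20X - 9Y - 107 = 0 solved for X
b_yx = 4 / 5            # from 4X - 5Y + 33 = 0 solved for Y
r = sqrt(b_xy * b_yx)   # 0.6, taken positive since both b's are positive
sigma_x = sqrt(9)       # variance of X = 9
sigma_y = b_yx * sigma_x / r   # from b_yx = r * sigma_y / sigma_x
print(round(r, 2), round(sigma_y, 2))   # 0.6 and 4.0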
5.2 INTRODUCTION
In this unit, you will learn how time series analysis differs from regression analysis.
We often see a number of charts on company drawing boards or in newspapers, where lines go up and down from left to right on a graph. The vertical axis represents a variable such as productivity or crime data in the city, and the horizontal axis represents the different periods of increasing time, such as days, weeks, months or years.
The analysis of the movements of such variables over periods of time is referred to as
time series analysis. Time series can then be defined as a set of numeric observations
of the dependent variable, measured at specific points in time in a chronological order,
usually at equal intervals, in order to determine the relationship of time to such
variables.
In this unit, you will also learn that one of the major elements of planning, and
specifically strategic planning of any organization is accurately forecasting the future
events that would have an impact on the operations of the organization. Previous
performances must be studied so as to forecast future activity. Even in our daily lives,
we plan our future events on the basis of a reasonable estimate of the future environment
that could affect our plans, whether it is forecasting rain on our picnic on Saturday, or
forecasting economic conditions for ten years. Textbook publishers must predict future
sales of books to print enough copies for students; financial advisors must predict the
values of a variety of economic factors in order to advise clients regarding stocks,
bonds and other business opportunities. Similarly, hotel builders in a city must project
the future influx of tourists, and so on. The quality of such forecasts is strongly related
to the relevant information that can be extracted and used from past data. In that respect,
time series can be used to determine patterns in the data of the past over a period of
time and extrapolate the data into the future.
(i) Season and climate. Changes in the climate and weather conditions
have a profound effect on sales. For example, the sale of umbrellas in India is
always more during monsoons. Similarly, during winter, there is a greater demand
for woollen clothes and hot drinks, while during summer months there is an
increase in the sales of fans and air conditioners.
(ii) Customs and festivals. Customs and traditions affect the pattern of
seasonal spending. For example, Mother’s Day or Valentine’s Day in America see
increase in gift sales preceding these days. In India, festivals such as Baisakhi
and Diwali mean a big demand for sweets and candy. It is customary all over the
world to give presents to children when they graduate from high school or college.
Accordingly, the month of June, when most students graduate, is a time for the
increase of sale for presents befitting the young.
An accurate assessment of seasonal behaviour is an aid in business planning and scheduling, such as in the areas of production, inventory control, personnel, advertising, and so on. The seasonal fluctuations over four repeating quarters in a given year for the sale of a given item are illustrated by a chart (not reproduced in this extract) in which sales peak and dip in the same quarters of each year.
In time series analysis, the independent variable is time, so we will use the
symbol t in place of X and we will use the symbol Yt in place of Yc which we
have used previously.
Hence, the equation for linear trend is given as:
Yt = b0 + b1t
where, Yt = Forecast value of the time series in period t
b0 = Intercept of the trend line on Y-axis
b1 = Slope of the trend line
t = Time period
As discussed earlier, we can calculate the values of $b_0$ and $b_1$ by the following formulae:

$$b_1 = \frac{n\sum ty - (\sum t)(\sum y)}{n\sum t^2 - (\sum t)^2}, \quad \text{and} \quad b_0 = \bar{y} - b_1\bar{t}$$

where, y = Actual value of the time series in period t
n = Number of periods
$\bar{y} = \dfrac{\sum y}{n}$ = Average value of the time series
$\bar{t} = \dfrac{\sum t}{n}$ = Average value of t

Knowing these values, we can calculate the value of $Y_t$.
Example 5.1:
A car fleet owner has 5 cars which have been in the fleet for several different years.
The manager wants to establish if there is a linear relationship between the age of
the car and the repairs in hundreds of dollars for a given year. This way, he can
predict the repair expenses for each year as the cars become older. The information
for the repair costs he had collected for the last year on these cars is as follows:
Car # Age (t) Repairs (Y)
1 1 4
2 3 6
3 3 7
4 5 7
5 6 9
The manager wants to predict the repair expenses for the next year for the
two cars that are 3 years old.
Solution:
The trend in repair costs suggests a linear relationship with the age of the
car, so that the linear regression equation is given as:
$$Y_t = b_0 + b_1 t$$

t        Y        tY       t²
1        4         4        1
3        6        18        9
3        7        21        9
5        7        35       25
6        9        54       36
Total:   $\sum t$ = 18   $\sum Y$ = 33   $\sum tY$ = 132   $\sum t^2$ = 80
Knowing that n = 5, let us substitute these values to calculate the regression
coefficients b0 and b1.
$$b_1 = \frac{5(132) - (18)(33)}{5(80) - (18)^2} = \frac{660 - 594}{400 - 324} = \frac{66}{76} = 0.87$$

and $b_0 = \bar{y} - b_1\bar{t}$, where

$$\bar{y} = \frac{\sum y}{n} = \frac{33}{5} = 6.6 \quad \text{and} \quad \bar{t} = \frac{\sum t}{n} = \frac{18}{5} = 3.6$$

Then,

$$b_0 = 6.6 - 0.87(3.6) = 6.6 - 3.13 = 3.47$$

Hence, $Y_t = 3.47 + 0.87t$.
The cars that are 3 years old now will be 4 years old next year, so that t = 4. Hence,

$$Y_{(4)} = 3.47 + 0.87(4) = 3.47 + 3.48 = 6.95$$
Accordingly, the repair costs on each car that is 3 years old are expected to
be $695.00.
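The fit and the forecast can be checked with a few lines of Python (an illustrative addition):

# Trend line of Example 5.1 and the forecast for t = 4
t = [1, 3, 3, 5, 6]
y = [4, 6, 7, 7, 9]
n = len(t)
b1 = (n * sum(a * b for a, b in zip(t, y)) - sum(t) * sum(y)) / \
     (n * sum(a * a for a in t) - sum(t) ** 2)   # 66/76, about 0.87
b0 = sum(y) / n - b1 * sum(t) / n                # about 3.47
print(round(b0 + b1 * 4, 2))                     # 6.95, i.e., $695 of repairs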
Then the average of the sum of squared errors, also known as mean squared
error (MSE) is given as:
$$MSE = \frac{16 + 9 + 1}{3} = \frac{26}{3} = 8.67$$
The value of MSE is an often-used measure of the accuracy of the forecasting
method, and the method which results in the least value of MSE is considered
more accurate than others. The value of MSE can be manipulated by varying the
number of data values to be included in the moving average. For example, if we
had calculated the value of MSE by taking 4 periods into consideration for
calculating the moving average, rather than 3, then the value of MSE would be
less. Accordingly, by using trial and error method, the number of data values
selected for use in forecasting would be such that the resulting MSE value would
be minimum.
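The trial-and-error comparison of MSE values can be automated with a small Python function (an illustrative addition; the car-sales series itself is not reproduced in this extract, so hypothetical figures are used):

# Mean squared error of a k-period moving-average forecast
def moving_average_mse(series, k):
    errors = []
    for i in range(k, len(series)):
        forecast = sum(series[i - k:i]) / k   # average of the previous k values
        errors.append((series[i] - forecast) ** 2)
    return sum(errors) / len(errors)

sales = [20, 21, 19, 23, 24, 22, 25]          # hypothetical data
print(moving_average_mse(sales, 3), moving_average_mse(sales, 4))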
2. Exponential smoothing. In the moving average method, each value contributes equally towards the calculation of the moving average, irrespective of the number of time periods taken into consideration. In most actual situations, this is not a realistic assumption. Because of the dynamics of the environment over a period of time, it is more likely that the forecast for the next period would be closer to the most recent previous period than to the more distant previous periods, so that the more recent value should get more weight than the previous value, and so on. The exponential smoothing technique uses the moving average with appropriate weights assigned to the values taken into consideration in order to arrive at a more accurate or smoothed forecast. It takes into consideration the decreasing impact of past time periods as we move further into the past. This decreasing impact is exponentially distributed and hence the name, exponential smoothing.
In this method, the smoothed value for period t, which is the weighted
average of that period’s actual value and the smoothed average from the previous
period (t – 1), becomes the forecast for the next period (t + 1). The exponential smoothing model for time period (t + 1) can then be expressed as follows:

$$F_{t+1} = \alpha Y_t + (1 - \alpha)F_t$$

where $F_{t+1}$ = The forecast of the time series for period (t + 1)
$Y_t$ = Actual value of the time series in period t
$\alpha$ = Smoothing factor ($0 \le \alpha \le 1$)
$F_t$ = Forecast of the time series for period t

The value of α is selected by the decision-maker on the basis of the degree of smoothing required. A small value of α means a greater degree of smoothing. A large value of α means very little smoothing. When α = 1, there is no smoothing at all, so that the forecast for the next time period is exactly the same as the actual value of the time series in the current period. This can be seen as follows: when α = 1,

$$F_{t+1} = Y_t + 0 \cdot F_t = Y_t$$
The exponential smoothing approach is simple to use, and once the value of α is selected, it requires only two pieces of information, namely $Y_t$ and $F_t$, to calculate $F_{t+1}$.
To begin the exponential smoothing process, we let $F_1$ equal the actual value of the time series in period 1, which is $Y_1$. Hence, the forecast for period 2 is written as:

$$F_2 = \alpha Y_1 + (1 - \alpha)F_1$$

But since we have put $F_1 = Y_1$,

$$F_2 = \alpha Y_1 + (1 - \alpha)Y_1 = Y_1$$
Let us now apply the exponential smoothing method to the problem of forecasting car sales discussed in the case of moving averages. For this purpose, it is convenient to rewrite the model as,

$$F_{t+1} = \alpha Y_t + (1 - \alpha)F_t = F_t + \alpha\left(Y_t - F_t\right)$$

where $(Y_t - F_t)$ is the forecast error during the time period t.
The accuracy of the forecast can be improved by carefully selecting the value of α. If the time series contains substantial random variability, then a small value of α (known as the smoothing factor or smoothing constant) is preferable. On the other hand, a larger value of α would be desirable for a time series with relatively little random variability.
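The smoothing recursion itself is only a few lines in Python (an illustrative addition with a hypothetical series):

# Exponential smoothing: F(t+1) = alpha*Y(t) + (1 - alpha)*F(t), with F(1) = Y(1)
def exponential_smoothing(y, alpha):
    forecasts = [y[0]]                # F1 = Y1, so F2 also works out to Y1
    for actual in y[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts                  # F1, F2, ..., Fn

series = [20, 21, 19, 23, 24, 22]     # hypothetical data
print([round(f, 2) for f in exponential_smoothing(series, 0.3)])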
Since secular trend component can be described by the trend line (usually
calculated by line of regression), we can isolate cyclical and irregular components
from the trend. Furthermore, since irregular variation occurs by chance and cannot
be predicted or identified accurately, it can be reasonably assumed that most of
the variation in time series left unexplained by the trend component can be
explained by the cyclical component. In that respect, cyclical variation can be
considered as residual, once other causes of variation have been identified.
The measure of cyclical variation as a percentage of trend is calculated as follows:
1. Determine the trend line (usually by regression analysis).
2. Compute the trend value $Y_t$ for each time period (t) under consideration.
3. Calculate the ratio $Y/Y_t$ for each time period.
4. Multiply this ratio by 100 to get the percentage of trend, so that:

$$\text{Percentage of trend} = \frac{Y}{Y_t} \times 100$$
Example 5.3:
The following is the data for energy consumption (measured in quadrillions of
BTU) in the United States from 1981–1986 as reported in the Statistical Abstracts
of the United States.
Year Time Period (t) Annual Energy
Consumption (Y)
1981 1 74.0
1982 2 70.8
1983 3 70.5
1984 4 74.1
1985 5 74.0
1986 6 73.9
Assuming a linear trend, calculate the percentage of trend for each year
(cyclical variation).
Solution:
First, we find the secular trend by the regression line method, which is
given by:
$$Y_t = b_0 + b_1 t, \quad \text{where } b_1 = \frac{n\sum ty - (\sum t)(\sum y)}{n\sum t^2 - (\sum t)^2} \text{ and } b_0 = \bar{y} - b_1\bar{t}$$

Here,

$$b_1 = \frac{9221.4 - 9183.3}{546 - 441} = \frac{38.1}{105} = 0.363$$

and

$$\bar{y} = \frac{\sum y}{n} = \frac{437.3}{6} = 72.88, \qquad \bar{t} = \frac{\sum t}{n} = \frac{21}{6} = 3.5$$

Hence,

$$b_0 = 72.88 - 0.363(3.5) = 72.88 - 1.27 = 71.61$$

Then, $Y_t = 71.61 + 0.363t$.
Calculating the value of $Y_t$ for each time period, we get the following table for the percentage of trend, $(Y/Y_t) \times 100$:
Time Period Energy Consumption Trend Percentage of Trend
(t) (Y) (Yt) (Y/Yt)100
1 74.0 71.97 102.82
2 70.8 72.34 97.87
3 70.5 72.70 96.97
4 74.1 73.06 101.42
5 74.0 73.43 100.77
6 73.9 73.79 100.15
A graph (not reproduced in this extract) shows the actual energy consumption (Y), the trend line ($Y_t$) and the cyclical fluctuations above and below the trend line over the time period (t) for the six years.
The percentage of trend figures show that in 1981, the actual consumption
of energy was 102.82 per cent of expected consumption that year and in 1983,
the actual consumption was 96.97 per cent.
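The whole calculation of Example 5.3 can be replicated in Python (an illustrative addition; the printed figures agree with the table above up to rounding):

# Percentage of trend for the energy-consumption data
t = [1, 2, 3, 4, 5, 6]
y = [74.0, 70.8, 70.5, 74.1, 74.0, 73.9]
n = len(t)
b1 = (n * sum(a * b for a, b in zip(t, y)) - sum(t) * sum(y)) / \
     (n * sum(a * a for a in t) - sum(t) ** 2)   # about 0.363
b0 = sum(y) / n - b1 * sum(t) / n                # about 71.61
for ti, yi in zip(t, y):
    trend = b0 + b1 * ti
    print(ti, round(trend, 2), round(100 * yi / trend, 2))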
Short-Answer Questions
1. Differentiate between secular trend and cyclic fluctuation.
2. How is irregular variation caused?
3. Define seasonal variation.
4. What do you mean by trend analysis?
5. How will you measure cyclical effect?
6. Describe the simple average method of isolating seasonal fluctuations in
time series.
Long-Answer Questions
1. The following data shows the number of Lincoln Continental cars sold by a
dealer in Queens during the 12 months of 1994.
Month Number Sold
Jan 52
Feb 48
Mar 57
Apr 60
May 55
June 62
July 54
Aug 65
Sept 70
Oct 80
Nov 90
Dec 75
2. The owner of six gasoline stations in New Jersey would like to have some
reasonable indication of future sales. He would like to use the moving average
method to forecast future sales. He has recorded the quarterly gasoline sales
(in thousands of gallons) for all his gas stations for the past three years. These
are shown in the following table.
Year Quarter Sales
1 1 38
2 58
3 80
4 30
2 1 40
2 60
3 50
4 55
3 1 50
2 45
3 80
4 70
The General Manager asked the company statistician to calculate the five-
year moving average.
(a) If you were the company statistician, how would you present the results
to the General Manager?
(b) Calculate the seven-year moving average and explain the difference
between the five-year moving average and the seven-year moving
average to the General Manager as to which of the two smoothes the
data better.
(c) Plot the actual sales and the seven-year moving average sales on the
same graph and interpret it.
5. An economist has calculated the variable rate of return on money market
funds for the last twelve months as follows:
Month Rate of Return (%)
January 6.2
February 5.8
March 6.5
April 6.4
May 5.9
June 5.9
July 6.0
August 6.8
September 6.5
October 6.1
November 6.0
December 6.0
(a) Using a three-month moving average, forecast the rate of return for
next January.
(b) Using exponential smoothing method and setting, = 0.8, forecast the
rate of return for next January.
6. The Indian Motorcycle Company is concerned about declining sales in the
western region. The following data shows monthly sales (in millions of dollars)
of the motorcycles for the past twelve months.
Month Sales
January 6.5
February 6.0
March 6.3
April 5.1
May 5.6
June 4.8
July 4.0
August 3.6
September 3.5
October 3.1
November 3.0
December 3.0
(a) Plot the trend line and describe the relationship between sales and time.
(b) What is the average monthly change in sales?
(c) If the monthly sales fall below $2.4 million, then the West Coast office
must be closed. Is it likely that the office will be closed during the next
six months?
7. The chief economist for New York State Department of Commerce reported
the following seasonally adjusted values for the consumption of durable goods
for the last twelve-month period.
Month Index
January 119
February 114
March 115
April 116
May 120
June 113
July 115
August 112
September 117
October 116
November 121
December 124
Values of $e^x$ and $e^{-x}$
x        e^x        e^(−x)             x        e^x        e^(−x)
0.0 1.000 1.000
0.1 1.105 0.905 5.1 164.0 0.0061
0.2 1.221 0.819 5.2 181.3 0.0055
0.3 1.350 0.741 5.3 200.3 0.0050
0.4 1.492 0.670 5.4 221.4 0.0045
0.5 1.649 0.607 5.5 244.7 0.0041
0.6 1.822 0.549 5.6 270.4 0.0037
0.7 2.014 0.497 5.7 298.9 0.0033
0.8 2.226 0.449 5.8 330.3 0.0030
0.9 2.460 0.407 5.9 365.0 0.0027
1.0 2.718 0.368 6.0 403.4 0.0025
1.1 3.004 0.333 6.1 445.9 0.0022
1.2 3.320 0.301 6.2 492.8 0.0020
1.3 3.669 0.273 6.3 544.6 0.0018
1.4 4.055 0.247 6.4 601.8 0.0017
1.5 4.482 0.223 6.5 665.1 0.0015
1.6 4.953 0.202 6.6 735.1 0.0014
1.7 5.474 0.183 6.7 812.4 0.0012
1.8 6.050 0.165 6.8 897.8 0.0011
1.9 6.686 0.150 6.9 992.3 0.0010
2.0 7.389 0.135 7.0 1096.6 0.0009
2.1 8.166 0.122 7.1 1212.0 0.0008
2.2 9.025 0.111 7.2 1339.4 0.0007
2.3 9.974 0.100 7.3 1480.3 0.0007
2.4 11.023 0.091 7.4 1636.0 0.0006
2.5 12.18 0.082 7.5 1808.0 0.00055
2.6 13.46 0.074 7.6 1998.2 0.00050
2.7 14.88 0.067 7.7 2208.3 0.00045
2.8 16.44 0.061 7.8 2440.6 0.00041
2.9 18.17 0.055 7.9 2697.3 0.00037
3.0 20.09 0.050 8.0 2981.0 0.00034
3.1 22.20 0.045 8.1 3294.5 0.00030
3.2 24.53 0.041 8.2 3641.0 0.00027
3.3 27.11 0.037 8.3 4023.9 0.00025
3.4 29.96 0.033 8.4 4447.1 0.00022
3.5 33.12 0.030 8.5 4914.8 0.00020
3.6 36.60 0.027 8.6 5431.7 0.00018
3.7 40.45 0.025 8.7 6002.9 0.00017
3.8 44.70 0.022 8.8 6634.2 0.00015
3.9 49.40 0.020 8.9 7332.0 0.00014
4.0 54.60 0.018 9.0 8103.1 0.00012
4.1 60.34 0.017 9.1 8955.3 0.00011
4.2 66.69 0.015 9.2 9897.1 0.00010
4.3 73.70 0.014 9.3 10938 0.00009
4.4 81.45 0.012 9.4 12088 0.00008
4.5 90.02 0.011 9.5 13360 0.00007
4.6 99.48 0.010 9.6 14765 0.00007
4.7 109.95 0.009 9.7 16318 0.00006
4.8 121.51 0.008 9.8 18034 0.00006
4.9 134.29 0.007 9.9 19930 0.00005
5.0 148.4 0.0067
Area under the Normal Curve from the Mean to Z
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
TABLE V
Cumulative Normal Distribution
[Extract from the Table Concerning the Area under the Normal Curve]
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7703 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.90147
1.3 0.9032 0.9049 0.90658 0.90824 0.90988 0.91149 0.91309 0.91466 0.91621 0.91774
1.4 0.91924 0.92073 0.92220 0.92364 0.92507 0.92647 0.92785 0.92922 0.93056 0.93189
1.5 0.93319 0.93448 0.93574 0.93699 0.93822 0.93943 0.94062 0.94179 0.94295 0.94408
1.6 0.94520 0.94630 0.94738 0.94845 0.94950 0.95053 0.95154 0.95254 0.95352 0.95449
1.7 0.95543 0.95637 0.95728 0.95818 0.95907 0.95994 0.96080 0.96164 0.96246 0.96327
1.8 0.96407 0.96485 0.96562 0.96638 0.96712 0.96784 0.96851 0.96926 0.96995 0.97062
1.9 0.97128 0.97193 0.97257 0.97320 0.97381 0.97441 0.97500 0.97558 0.97615 0.97670
2.0 0.97725 0.97778 0.97831 0.97882 0.97932 0.97982 0.98030 0.98077 0.98125 0.98169
2.1 0.98214 0.98257 0.98300 0.98341 0.98382 0.98422 0.98461 0.98500 0.98537 0.98574
2.2 0.98610 0.98645 0.98679 0.98713 0.98745 0.98773 0.98809 0.98840 0.98870 0.98899
2.3 0.98928 0.98956 0.98983 0.990097 0.990358 0.990613 0.990863 0.991106 0.991344 0.991576
2.4 0.991802 0.992024 0.992240 0.992451 0.992656 0.992857 0.993053 0.993244 0.993431 0.993613
Note: When Z = 4.09, the table value is 0.99997843 (i.e. = 1 for all practical purposes).
(This table contains the area of left half of the curve (i.e., 0.5) plus the relevant area of the right half for different values of ‘Z’)
Statistics for Management
Appendix
Appendix
TABLE VI
Some Critical Values of ‘t ’
Note: These table values of ‘t’ are in respect of two-tailed tests. If we use the t-distribution for one-
tailed test then we are interested in determining the area located in one tail. So to find the
appropriate t-value for a one-tailed test say at a 5% level with 12 degrees of freedom, then we
should look in the above table under the 10% column opposite the 12 degrees of freedom row.
(This value will be 1.782). This is true because the 10% column represents 10% of the area under
the curve contained in both tails combined, and so it also represents 5% of the area under the
curve contained in each of the tails separately.
[The body of this table, giving critical values of 't' at the 10%, 5% and 1% levels of significance for each number of degrees of freedom, is not reproduced in this extract.]
Note: For degrees of freedom greater than 30, the quantity $\sqrt{2\chi^2} - \sqrt{2v - 1}$ (where v denotes the degrees of freedom) may be used as a normal variate with unit variance.
Values of F at the 1 per cent level of significance (v1 = 1, 2, 3, 4, 5, 6, 8, 12, 24, ∞ across the columns; v2 down the rows)
1 4052 5000 5403 5625 5764 5859 5982 6106 6235 6366
2 98.50 99.00 99.17 99.25 99.30 99.33 99.37 99.42 99.46 99.50
3 34.12 30.82 29.46 28.71 28.24 27.91 27.49 27.05 26.60 26.13
4 21.20 18.20 16.69 15.88 15.52 15.21 14.80 14.37 13.93 13.45
5 16.26 13.27 12.06 11.39 10.97 10.67 10.29 9.89 9.47 9.02
6 13.75 10.92 9.78 9.15 8.75 8.47 8.10 7.72 7.31 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.84 6.47 6.07 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.67 5.28 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.12 4.73 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 4.33 3.91
11 9.65 7.21 6.22 5.67 5.32 5.07 4.74 4.40 4.02 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.50 4.16 3.78 3.36
13 9.07 6.70 5.74 5.21 4.86 4.62 4.30 3.96 3.59 3.17
14 8.86 6.51 5.56 5.04 4.69 4.46 4.14 3.80 3.43 3.00
15 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 3.29 2.87
16 8.53 6.23 5.29 4.77 4.44 4.20 3.89 3.55 3.18 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.79 3.46 3.08 2.65
18 8.29 6.01 5.09 4.58 4.25 4.01 3.71 3.37 3.00 2.59
19 8.18 5.93 5.01 4.50 4.17 3.94 3.63 3.30 2.92 2.49
20 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.23 2.86 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.51 3.17 2.80 2.36
22 7.95 5.72 4.82 4.31 3.99 3.76 3.45 3.12 2.75 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.41 3.07 2.70 2.26
24 7.82 5.61 4.72 4.22 3.90 3.67 3.36 3.03 2.66 2.21
25 7.77 5.57 4.68 4.18 3.85 3.63 3.32 2.99 2.62 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.29 2.96 2.58 2.13
27 7.68 5.49 4.60 4.11 3.78 3.56 3.26 2.93 2.55 2.10
28 7.64 5.45 4.57 4.07 3.75 3.53 3.23 2.90 2.52 2.06
29 7.60 5.42 4.54 4.04 3.73 3.50 3.20 2.87 2.49 2.03
30 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 2.47 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 2.99 2.66 2.29 1.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.82 2.50 2.12 1.60
120 6.85 4.79 3.95 3.48 3.17 2.96 2.66 2.34 1.95 1.38
∞ 6.64 4.60 3.78 3.32 3.02 2.80 2.51 2.18 1.79 1.00
Values of F at the 5 per cent level of significance
v1   1     2     3     4     5     6     8     12    24    ∞
v2
1 161.4 199.5 215.7 224.6 230.2 234.0 238.9 243.9 249.0 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.37 19.41 19.45 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.84 8.74 8.64 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.04 5.91 5.77 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.82 4.68 4.53 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.15 4.00 3.84 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.73 3.57 3.41 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.44 3.28 3.12 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.23 3.07 2.90 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.07 2.91 2.74 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 2.95 2.79 2.61 2.40
12 4.75 3.88 3.49 3.26 3.11 3.00 2.85 2.69 2.50 2.30
13 4.67 3.80 3.41 3.18 3.02 2.92 2.77 2.60 2.42 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.70 2.53 2.35 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.64 2.48 2.29 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.59 2.42 2.24 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.55 2.38 2.19 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.51 2.34 2.15 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.48 2.31 2.11 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.45 2.28 2.08 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.42 2.25 2.05 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.40 2.23 2.03 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.38 2.20 2.00 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.36 2.18 1.98 1.73
25 4.24 3.38 2.99 2.76 2.60 2.49 2.34 2.16 1.96 1.71
26 4.22 3.37 2.98 2.74 2.59 2.47 2.32 2.15 1.95 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.30 2.13 1.93 1.67
28 4.20 3.34 2.95 2.71 2.56 2.44 2.29 2.12 1.91 1.65
29 4.18 3.33 2.93 2.70 2.54 2.43 2.28 2.10 1.90 1.64
30 4.17 3.32 2.92 2.69 2.53 2.42 2.27 2.09 1.89 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.18 2.00 1.79 1.51
60 4.00 3.15 2.76 2.52 2.37 2.25 2.10 1.92 1.70 1.39
120 3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 1.61 1.25
∞ 3.84 2.99 2.60 2.37 2.21 2.09 1.94 1.75 1.52 1.00
(Extract of 80 random numbers lying between 00 and 99 from R.A. Fisher and F. Yates’ table of
random numbers)
44 16 09 83 19
84 07 99 11 32
82 77 81 45 14
50 66 97 65 31
85 50 30 34 96
40 20 36 89 03
96 50 75 12 93
88 95 72 64 16
16 58 79 86 62
16 44 83 46 24
97 77 07 32 08
92 11 00 76 38
39 08 42 07 88
33 38 13 51 74
83 87 97 25 47
42 45 16 36 00