2024 Statistics Lecture Notes
LECTURE NOTES
COURSE CONTENT
CHAPTER ONE: PROBABILITY THEORY
1.1 Introduction
1.2 Outcomes and Events
1.3 Probability Function
1.4 Properties of the Probability Function
1.5 Equally-Likely Outcomes
1.6 Joint Events
1.7 Conditional Probability
1.8 Independence
1.9 Law of Total Probability
1.10 Bayes Rule
1.11 Permutations and Combinations
1.12 Sampling With and Without Replacement
6.2 Hypotheses
6.3 Acceptance and Rejection
6.4 Type I and II Error
6.5 One-Sided Tests
6.6 Two-Sided Tests
6.7 What Does “Accept H0” Mean About H0?
6.8 Statistical Significance
6.9 P-Value
6.10 Application Exercises
CHAPTER ONE: PROBABILITY THEORY
For example, if we toss a coin into the air and allow it to land, there is a likelihood
(probability) that it will land either with the head up (H) or with the tail up (T). The total
number of possible outcomes is two, that is, head (H) and tail (T). This means that the
probability that the coin will land with the head up (H) is 1/2, and similarly, the probability
that the coin will land with the tail up (T) is also 1/2. Again, if we cast a fair die (with six
faces {1, 2, 3, 4, 5, 6}), one of the faces will end up facing upward, and the probability that
any particular face will be facing upward is 1/6.
The above definition of probability in a quantitative sense has a limitation: we sometimes
encounter events whose occurrence is very rare. For instance, in an industry, what is the
probability that a plant may malfunction and lead to many deaths? What is the probability that
a space satellite will fall out of orbit and land in Bamenda City? What is the probability that a
pregnant woman in her first delivery will give birth to twin children, given that no woman in
her family has done so before? Hence, we notice that computing the probability of such events
with the definition above is very difficult. The type of probability which can be determined
with a high degree of confidence, purely on formal or logical grounds, independently of
experience of the event (such as the illustration above for the coin and the fair die), is known
as Theoretical Probability. On the other hand, the type of probability which is computed on
the basis of past experience of the event is known as Empirical Probability.
When the expected outcome of an event has an uncertain value, it is called a random
variable (also a chance variable or stochastic variable). Random variables can be either
continuous or discrete. Some examples of experiments that generate random variables include
tossing coins, weighing bricks, drawing cards, playing football, etc.
A sample space is the set of all possible outcomes of a probability experiment. It is also known
as the outcome space of the experiment. For example, in the experiment of tossing a coin into
the air, the sample space is S = {H, T}. If two coins are flipped, the sample space is
S = {HH, HT, TH, TT}. In the case of rolling a fair die with six faces, the sample space is
S = {1, 2, 3, 4, 5, 6}. The various outcomes that define the sample space must be mutually
exclusive and exhaustive. Outcomes are said to be mutually exclusive if no two of the outcomes
(results) can both occur on a given trial, i.e. only one of the outcomes can occur on a given
trial. When the outcomes that define a sample space include every possible outcome of the
experiment, we say that the outcomes are exhaustive. For example, consider the probability
experiment of tossing a coin into the air. The two outcomes of the experiment, head (H) and
tail (T), are mutually exclusive because, in a single trial, the two outcomes cannot occur at
the same time. Again, the outcomes are exhaustive because H and T represent all the possible
outcomes for the toss of the coin. Outcomes which fulfil the two conditions (mutual
exclusivity and exhaustivity) are called Simple Outcomes of an experiment.
N/B: It should be noted that defining the sample space of an experiment is the most crucial,
and also the most difficult, aspect of solving a probability problem.
1.4 Outcomes
An outcome is a specific result when an experiment is conducted. For example, in the
experiment of tossing a coin into the air, the possible outcomes are either head (H) or tail (T).
If two coins are flipped in sequence, we may write an outcome as HT for a head and then a tail.
A roll of a six-sided fair die has the following six outcomes: {1, 2, 3, 4, 5, 6}.
1.5 Event
An event (A) is a subset of outcomes in the outcome space. For example, when we roll a fair
die with six faces {1, 2, 3, 4, 5, 6}, the occurrence of any of the six faces is known as an
event in statistics. Hence, a sample space for an experiment will always contain all the
possible events or outcomes for the experiment.
The outcomes of an experiment are said to be equally-likely when each of the outcomes in the
sample space has the same probability of occurring when the experiment is conducted. For
instance, if we consider the experiment of tossing a coin into the air, the possible outcomes
are head (H) and tail (T). Since each of them has the same probability of occurring when the
experiment is conducted, they are said to be equally-likely outcomes of the experiment. The
same can be said of the outcomes when a fair die is rolled.
In assigning probability values to the various outcomes of an experiment, some important
rules must be respected;
1 => The probability of a simple outcome is represented by a number between zero and one:
0 ≤ P(A) ≤ 1. This means that the probability of a simple outcome cannot exceed one, and it
is non-negative.
2 => The sum of the probabilities of all the simple outcomes within a sample space is equal
to one (1). That is, ∑ P(S) = 1, where S = sample space. This means that the sum of the
probabilities of the simple outcomes within the sample space cannot exceed 1.
3 => If we have a simple outcome (A) in a sample space, the probability that the outcome
will occur, P(A), plus the probability that the outcome will not occur, P(~A), must be equal
to one. That is, P(A) + P(~A) = 1.
P(~A) is known as the probability of the complement of the event (A). It follows from the
above that P(A) = 1 − P(~A).
Tree Diagram for Determining the Sample Space for a Double Coin Toss Experiment
(Probability Distribution)

1st coin toss   2nd coin toss   Outcome   Probability
H               H               HH        1/4
H               T               HT        1/4
T               H               TH        1/4
T               T               TT        1/4

The sample space has 4 outcomes, and the sum of the probabilities of all the simple outcomes
is given by
P(S) = 1/4 + 1/4 + 1/4 + 1/4 = 1
From the figure above, we can deduce the following:
- The probability that two heads (H, H) occur in the double coin toss is P(H, H) = 1/4.
  Similarly,
- The probability that two tails (T, T) occur in the double coin toss is P(T, T) = 1/4.
- The probability that at least one head (H) occurs in the double toss is
  P(H, H) + P(H, T) + P(T, H) = 1/4 + 1/4 + 1/4 = 3/4.
- The probability that exactly one head occurs in the double toss is
  P(H, T) + P(T, H) = 1/4 + 1/4 = 1/2.
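These deductions can be checked by enumerating the sample space directly. The following Python sketch (assuming a fair coin, so each simple outcome has probability 1/4) reproduces the probabilities above:

```python
# Enumerate the double coin toss sample space and check the derived probabilities.
from itertools import product
from fractions import Fraction

sample_space = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')
p = Fraction(1, len(sample_space))            # each simple outcome has probability 1/4

# P(at least one head): outcomes containing at least one H
p_at_least_one_head = sum(p for outcome in sample_space if "H" in outcome)
# P(exactly one head): outcomes with a single H
p_exactly_one_head = sum(p for outcome in sample_space if outcome.count("H") == 1)

print(p_at_least_one_head)  # 3/4
print(p_exactly_one_head)   # 1/2
```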
1.10 COMPOSITE OUTCOMES OR JOINT EVENTS
A composite outcome (or joint event) is an event made up of two or more simple outcomes;
its probability is the sum of the probabilities of the simple outcomes that make it up. For
instance, let us consider the example treated earlier, the experiment of rolling a fair die
with six faces. The sample space is defined as S = {1, 2, 3, 4, 5, 6}. Since these are
equally-likely outcomes,
1. The probability that 1 will occur when the experiment is conducted is Pr (1) = 1/6
2. The probability that 2 will occur when the experiment is conducted is Pr (2) =1/6
3. The probability that 3 will occur when the experiment is conducted is Pr (3) =1/6
4. The probability that 4 will occur when the experiment is conducted is Pr (4) =1/6
5. The probability that 5 will occur when the experiment is conducted is Pr (5) =1/6
6. The probability that 6 will occur when the experiment is conducted is Pr (6) =1/6
These are all examples of simple outcomes and their respective probabilities.
Now, if we are interested in knowing the probability that an odd number will occur when the
experiment is conducted, it is Pr(odd no.) = Pr(1) + Pr(3) + Pr(5).
This is equal to Pr(odd no.) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2.
Similarly, the probability that an even number will occur when the experiment is conducted
is given by Pr(even no.) = Pr(2) + Pr(4) + Pr(6).
This is equal to Pr(even no.) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2.
The probability that an odd number occurs and the probability that an even number occurs,
when the experiment is conducted, are examples of composite or joint events. We can
therefore conclude that the probability of a composite outcome is equal to the sum of the
probabilities of the simple outcomes that make up the composite outcome or event.
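The rule just stated can be checked directly for the fair die. A short Python sketch (assuming each face has probability 1/6):

```python
# Probability of a composite event as the sum of its simple outcomes,
# for a fair six-sided die.
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p = {face: Fraction(1, 6) for face in faces}  # equally-likely outcomes

pr_odd = sum(p[f] for f in faces if f % 2 == 1)   # Pr(1) + Pr(3) + Pr(5)
pr_even = sum(p[f] for f in faces if f % 2 == 0)  # Pr(2) + Pr(4) + Pr(6)

print(pr_odd, pr_even)  # 1/2 1/2
```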
The conditional probability that A occurs, given that B occurs, is given by the formula;

P[A|B] = P[A ∩ B] / P(B)

Similarly, the probability that B occurs when A occurs is given by the formula;

P[B|A] = P[A ∩ B] / P(A)
For example, take the roll of a fair die. Let A = {1, 2, 3, 4} and B = {4, 5, 6}. The
intersection is A ∩ B = {4}, which has probability P[A ∩ B] = 1/6. The probability of B is
P[B] = 1/2. Thus P[A|B] = (1/6)/(1/2) = 1/3. This can also be calculated by observing that,
conditional on B, the events {4}, {5}, and {6} each have probability 1/3. Event A occurs
given B only if {4} occurs. Thus P[A|B] = P[{4}|B] = 1/3.
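The same answer can be obtained by enumeration. A short Python sketch of the die example, computing P[A|B] = P[A ∩ B]/P[B]:

```python
# Conditional probability by enumeration for the fair-die example,
# with A = {1,2,3,4} and B = {4,5,6}.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {4, 5, 6}

def prob(event):
    # equally-likely outcomes: |event| / |S|
    return Fraction(len(event), len(S))

p_a_given_b = prob(A & B) / prob(B)  # (1/6) / (1/2)
print(p_a_given_b)  # 1/3
```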
1.12. Independent Events
We say that events are independent if their occurrence is unrelated, or equivalently if
knowledge of one event does not affect the conditional probability of the other event. Take
two coin flips. If there is no mechanism connecting the two flips, we would typically expect
that neither flip is affected by the outcome of the other. Similarly, if we take two die
throws, there is typically no mechanism connecting the dice and thus no reason to expect that
one is affected by the outcome of the other. This discussion implies that two unrelated
(independent) events A and B satisfy the properties P[A|B] = P[A] and P[B|A] = P[B]. In
words, the probability that a coin is H is unaffected by the outcome (H or T) of another coin.
From the definition of conditional probability this implies P[A ∩ B] = P[A]·P[B]. Hence,
events A and B are statistically independent if P[A ∩ B] = P[A]·P[B]. We can therefore
conclude that, if A and B are independent with P[A] > 0 and P[B] > 0, then
P[A|B] = P[A]
P[B|A] = P[B].
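The independence condition can be verified by enumeration for two fair coin flips. A short Python sketch, with A = "first flip is H" and B = "second flip is H" (events chosen here purely for illustration):

```python
# Verify the independence condition P[A ∩ B] = P[A]·P[B] for two fair coin flips.
from itertools import product
from fractions import Fraction

S = set(product("HT", repeat=2))          # sample space of the double flip
A = {s for s in S if s[0] == "H"}         # first flip is H
B = {s for s in S if s[1] == "H"}         # second flip is H

def prob(event):
    return Fraction(len(event), len(S))   # equally-likely outcomes

assert prob(A & B) == prob(A) * prob(B)   # 1/4 == 1/2 · 1/2
print(prob(A & B))  # 1/4
```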
Exercise 3. Consider the example concerning the test on a cow for BSE or Mad Cow Disease,
which we illustrated above. Suppose a cow chosen at random tests positive; what is the
probability that it has Mad Cow Disease (BSE)?
Hint: The question asks for the value of P(B|T). The information we are given is P(T|B),
which is the wrong way round, so we have to switch between T and B.
By a similar calculation, P(B|T^c) = 0.0068. These probabilities reflect that this is not a
very good test; a perfect test would give P(B|T) = 1 and P(B|T^c) = 0.
What we have just seen is known as Bayes’ rule, after the English clergyman Thomas Bayes,
who derived it in the 18th century. Bayes’ rule is stated as follows;
Suppose the events C1, C2, . . . , Cm are disjoint and C1 ∪ C2 ∪ · · · ∪ Cm = Ω. The conditional
probability of Ci, given an arbitrary event A, can be expressed as:

P(Ci | A) = [P(A | Ci) · P(Ci)] / [P(A | C1)P(C1) + P(A | C2)P(C2) + · · · + P(A | Cm)P(Cm)]

This is the traditional form of Bayes’ formula. It follows from

P(Ci | A) = [P(A | Ci) · P(Ci)] / P(A)
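Bayes' rule translates directly into code. The sketch below uses made-up illustrative numbers for the priors P(Ci) and likelihoods P(A|Ci); they are assumptions, not values from the notes:

```python
# Bayes' rule for a partition C1..Cm:
# P(Ci|A) = P(A|Ci)·P(Ci) / sum_j P(A|Cj)·P(Cj)
def bayes(priors, likelihoods, i):
    numerator = likelihoods[i] * priors[i]
    denominator = sum(l * p for l, p in zip(likelihoods, priors))
    return numerator / denominator

# Made-up illustrative numbers (not from the notes):
priors = [0.3, 0.5, 0.2]        # P(C1), P(C2), P(C3); must sum to 1
likelihoods = [0.9, 0.1, 0.4]   # P(A|C1), P(A|C2), P(A|C3)

print(bayes(priors, likelihoods, 0))  # P(C1|A) = 0.27/0.40 = 0.675
```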
A. PERMUTATION
A permutation is an order in which n different objects can be placed. For instance, consider
the three numbers 1, 2, and 3. How many different ways can these numbers be arranged?
The sample space is given by S = {123, 132, 213, 231, 312, 321}. Hence, there are 6 different
ways, or permutations, of arranging three numbers.
Consider the four numbers 1, 2, 3, and 4. How many different ways can these numbers be
arranged? By the same reasoning, there are 4 · 3 · 2 · 1 = 24 permutations.
In general, if we have n objects, then the number of possible ways of arranging the n objects
is given by n · (n − 1) · · · 3 · 2 · 1 = n!
where n! is the number of possible permutations of n objects. Here n! is the standard
notation for this product and is pronounced “n factorial.” It is convenient to define 0! = 1.
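The count n! can be checked by generating the permutations explicitly. A short Python sketch:

```python
# List the permutations of {1, 2, 3} and check the n! count.
from itertools import permutations
from math import factorial

perms = list(permutations([1, 2, 3]))
print(len(perms))  # 6 orderings: (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)
assert len(perms) == factorial(3)

# For four numbers there are 4! = 24 arrangements:
assert len(list(permutations([1, 2, 3, 4]))) == factorial(4) == 24
```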
The binomial distribution is a family of distributions, all of which have certain
characteristics in common. It describes the number of “successes” obtained when a number of
identical “trials” of an experiment are conducted. A binomial situation is characterized by
the following;
1 => the existence of a trial or experiment which is defined in terms of two states:
“success”, with probability p, and “failure”, with probability q, where q = 1 − p and
p + q = 1.
N/B: In a binomial probability distribution, each trial occurs independently of the others,
and we are mostly concerned with the probability of the number of successes when an
experiment is conducted repeatedly.
Hence the binomial distribution is of great importance in statistics, because it enables us
to compute the probability that a sample of n observations (or trials) will result in any
specified number of successes from 0 to n.
When a probability experiment is conducted repeatedly with p = probability of success at any
trial and n = number of trials, then the probability of obtaining x successes is given by
Pr(X = x) = nCx · p^x · (1 − p)^(n−x), or
Pr(X = x) = nCx · p^x · q^(n−x)
When the probability is computed with this formula for all values of x, the result is known
as a binomial probability distribution.
N/B: It is important to note that in the above formula, nCx (read “n combination x”) is
obtained as follows:
nCx = n! / (x!(n − x)!)
so that
Pr(X = x) = [n! / (x!(n − x)!)] · p^x · (1 − p)^(n−x)
N/B: It should be noted that the number of successes (x) never exceeds the number of trials (n).
To represent the probability distribution, we plot the values of x against the calculated
probabilities.
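The formula can be implemented directly. A short Python sketch (the choice of n = 3 and p = 0.5 is purely illustrative):

```python
# Binomial probability Pr(X = x) = nCx · p^x · (1-p)^(n-x),
# implemented directly from the formula above.
from math import comb

def binomial_pmf(x, n, p):
    # the number of successes x never exceeds the number of trials n
    assert 0 <= x <= n
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Full distribution for n = 3 fair coin tosses (p = 0.5):
dist = {x: binomial_pmf(x, 3, 0.5) for x in range(4)}
print(dist)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```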
APPLICATION EXERCISE 1
In a company, the probability that a machine will need some correcting adjustment during a
day’s production is 0.2. If there are 6 machines running on a particular day, find the
probability that;
SOLUTION
Pr(x) = nCx · p^x · (1 − p)^(n−x)
For x = 1 => Pr(1) = 6C1 · (0.2)^1 · (1 − 0.2)^5 = [6×5×4×3×2×1 / (1!(5×4×3×2×1))] · (0.2)^1 · (0.8)^5
Pr(1) = 6(0.2)(0.8)^5 = 0.393
For x = 2 => Pr(2) = 6C2 · (0.2)^2 · (1 − 0.2)^(6−2) = [6×5×4×3×2×1 / (2!·4!)] · (0.2)^2 · (0.8)^4
Pr(2) = 15(0.2)^2(0.8)^4 = 0.246
For the probability that at least 3 machines need adjustment:
Pr(x ≥ 3) = 1 − [Pr(0) + Pr(1) + Pr(2)] = 1 − [0.262 + 0.393 + 0.246] = 0.099
The mean is
μx = E(x) = np = 5 × 20/100 = 100/100 = 1.00
and the standard deviation is
δx = √(∑ [Xi − E(Xi)]² P(Xi)) = √(npq)
But q = 1 − p, so
δx = √(∑ [Xi − E(Xi)]² P(Xi)) = √(np(1 − p))
δx = √(5 × 25/100 × 75/100) = √(9375/10000) = 0.9682
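The probabilities in the exercise can be re-checked numerically. A short Python sketch for n = 6 machines and p = 0.2:

```python
# Re-check the machine-adjustment exercise with n = 6, p = 0.2.
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.2
pr1 = binomial_pmf(1, n, p)  # 6 · 0.2 · 0.8^5
pr2 = binomial_pmf(2, n, p)  # 15 · 0.2^2 · 0.8^4
pr_at_least_3 = 1 - sum(binomial_pmf(x, n, p) for x in range(3))

print(round(pr1, 3))            # 0.393
print(round(pr2, 3))            # 0.246
print(round(pr_at_least_3, 3))  # 0.099
```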
*** N/B: Binomial probability formulae are suitable for use when the number of observations
(n) is small, e.g. less than 20. However, if the sample size (n) is large, it is better to
use the normal probability calculations. The reason is that as the sample size (n) gets
larger (n > 30), a binomial distribution approximates a normal distribution.
When the probability of success (p) is equal to the probability of failure (q, or 1 − p) for
a binomial distribution, we say that the distribution is symmetrical. However, if p ≠ q, the
distribution is said to be asymmetrical or skewed.
This is another type of discrete probability distribution which is very similar to the
binomial distribution, even though it uses a different formula for computing the probability
of x successes in n trials. In this situation, if the probability of success (p) per trial is
very small or close to zero, e.g. p = 0.001, 0.01, 0.02, etc., it is advisable to use the
Poisson distribution. (Another rule is that if the probability of success (p) is so small
that np < 10, use the Poisson distribution.)
According to the Poisson distribution, the probability of k successes in n trials, when p
is the probability of success in one trial, is approximately

P(X = k) = e^(−μ) · μ^k / k!

where μ = np is the mean number of successes per unit of a sample; it is the mean of the
Poisson distribution, and e is the base of natural logarithms, with value 2.71828. By
substituting these elements into the formula for the Poisson distribution given above, we
shall have

P(X = k) = e^(−np) · (np)^k / k!
The mean of the Poisson distribution is just like the mean of the binomial distribution; it
is given by μ = np, and the standard deviation is
δ = √μ = √(np)
*** N/B: In the binomial distribution, the important quantities are the fixed number of
observations (n) and the probability of success in any given observation or trial (p).
For the Poisson distribution, the quantity necessary to specify the distribution is the mean
number of successes occurring per unit of measure, i.e. the mean of the distribution.
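The Poisson formula can be implemented directly. A short Python sketch (the values n = 1000 and p = 0.002, giving μ = 2, are purely illustrative):

```python
# Poisson probability P(X = k) = e^(−μ) · μ^k / k!, with μ = np.
from math import exp, factorial

def poisson_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

# e.g. n = 1000 trials with p = 0.002, so μ = np = 2
mu = 1000 * 0.002
print(round(poisson_pmf(0, mu), 4))  # 0.1353
print(round(poisson_pmf(2, mu), 4))  # 0.2707
```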
The binomial probability formulae given above are very practical in situations where the
number of trials (n), and consequently the number of successes (k), are small. But as the
number of trials gets larger, the binomial distribution gets close to a normal distribution.
Hence, whenever n is large, we can use the normal probability calculations to calculate the
binomial probabilities. But how large should n be to warrant the use of the normal
probability?
Answer
Some hold that when n is greater than or equal to 20, i.e. n ≥ 20, we can use the normal
probability. Others hold that when n is large, the distribution of X will be approximately
normal with mean μ = np and standard deviation δ = √(np(1 − p)).
Hence, as a rule of thumb, we use the normal approximation when n and p satisfy the
following conditions;
1 => np ≥ 10
2 => n(1 − p) ≥ 10
EXAMPLE
**A recent survey was carried out on a random sample of 2500 adults in a nation. The aim of
the survey was to find out whether they would agree or disagree to buy a new product that has
just been introduced into the market. Suppose that 60% of all adults in the nation would say
“Agree” if asked the question. What is the probability that 1520 or more of the sample agree?
SOLUTION
The responses of the 2500 randomly chosen adults are independent of each other, and in each
case a respondent will either agree or disagree. Hence the number in our sample who agree
that they will buy the new product is a discrete variable which follows the binomial
distribution with n = 2500 and p = 0.6. But we notice that the sample size (n) is so large
that
np = 2500 × 60/100 = 1500 ≥ 10, and
n(1 − p) = 2500 × 40/100 = 1000 ≥ 10.
Hence, we use the normal probability calculation to calculate the binomial probability as
follows;
μ = np = (2500)(0.6) = 1500
δ = √(np(1 − p)) = √(2500 × 0.6 × 0.4) = √600 = 24.49
X ~ N(1500, 24.49)
The probability is given by Zc = (Xi − μ)/δ = (1520 − 1500)/24.49 = 0.82
P(X ≥ 1520) = P(Z ≥ 0.82) = 1 − P(Z ≤ 0.82)
From the standard normal table, P(Z ≤ 0.82) = 0.5000 + 0.2939 = 0.7939 (the area from
negative infinity to the mean is 50% = 0.5000), so
P(X ≥ 1520) = 1 − 0.7939 = 0.2061
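The table lookup can be reproduced numerically using the standard normal CDF built from the error function. A short Python sketch of the survey example:

```python
# Normal approximation to the binomial for the survey example,
# using the standard normal CDF Φ built from math.erf.
from math import erf, sqrt

n, p = 2500, 0.6
mu = n * p                      # 1500
sigma = sqrt(n * p * (1 - p))   # sqrt(600) ≈ 24.49

def phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (1520 - mu) / sigma         # ≈ 0.82
p_at_least_1520 = 1 - phi(z)
print(round(z, 2))                 # 0.82
print(round(p_at_least_1520, 4))   # ≈ 0.2071 (close to the table value 0.2061,
                                   # which uses z rounded to 0.82)
```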