Biostat Lecture five

Chapter five introduces the concept of probability, defining it as the chance of observing a particular outcome in a random process. It emphasizes the importance of probability in statistics and medicine, particularly in decision-making under uncertainty. The chapter covers basic terms, types of probability, and key rules for calculating probabilities, including conditional probabilities and the distinction between independent and mutually exclusive events.

Uploaded by

birukfirdut
Copyright © All Rights Reserved
Chapter Five

Introduction to Probability

02/08/25 [email protected] 1
Probability
Probability is the chance of observing a particular outcome, or the likelihood of
observing an event.
It assumes a “random” process, i.e., the outcome is not predetermined: there
is an element of chance.
Probability theory developed from the study of games of chance such as dice and
cards.
Processes such as flipping a coin, rolling a die or drawing a card from a deck are
probability experiments.

Why Probability in Statistics and Medicine?
• Because medicine is not an exact science, physicians can seldom predict an outcome with absolute certainty.
E.g., to formulate a diagnosis, a physician must rely on the available diagnostic information about a patient:
– History and physical examination
– Laboratory studies, X-ray findings, ECG, etc.
• Although no test result is absolutely accurate, it

An understanding of probability is fundamental for quantifying the uncertainty that is inherent in the decision-making process.
Probability theory is a foundation for statistical inference and allows us to draw conclusions about a population of patients based on information obtained from a sample of patients drawn from that population.

Basic Terms of Probability
• Probability experiment: an action through which specific results/outcomes (counts, measurements or responses) are obtained.
Example:
• Tossing a coin and observing the face showing up is a probability experiment.
• Outcome: the result of a single trial in a probability experiment. It is also called a simple event.
Example: the sex of a newborn delivered to a mother in the delivery room is either male or female.

Basic Terms Cont…
• Sample space: the set of all possible outcomes of a statistical experiment is called the sample space and is represented by the symbol S.
Example: The sample space for the sexes of the newborns when two mothers are in the gynecology ward to give birth is: S = {MM, MF, FM, FF}
• Event: consists of one or more outcomes and is a subset of the sample space.
Example: From the above experiment, the event consisting of at least one female is E = {MF, FM, FF}
• Random variable: a function that associates a unique numerical value with every outcome of an experiment.
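To make these definitions concrete, here is a minimal Python sketch (not part of the lecture) that enumerates the two-birth sample space S and the event E "at least one female"; the variable names are illustrative choices.

```python
from itertools import product

# Each birth is either male ("M") or female ("F"); two mothers give two births.
sample_space = ["".join(pair) for pair in product("MF", repeat=2)]

# The event E is the subset of outcomes containing at least one female birth.
at_least_one_female = [outcome for outcome in sample_space if "F" in outcome]

print(sample_space)         # ['MM', 'MF', 'FM', 'FF']
print(at_least_one_female)  # ['MF', 'FM', 'FF']
```

Because E is built by filtering S, it is a subset of the sample space by construction.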
Basic Terms…
• Certain event: an event that is sure to occur.
• Impossible event: an event that cannot occur.
• Complement of an event: the complement of event A (denoted by Ā) consists of all the sample points in the sample space that are not in A.

Two Categories of Probability
Objective and subjective probabilities.
Objective probability:
1) Classical probability and
2) Relative frequency probability.

Types of Probability
Classical (or theoretical) probability
It is used when each outcome in a sample space is equally likely to occur.
That is, if an experiment has n equally likely outcomes, then each possible outcome has probability 1/n of occurring. Equivalently, the probability of event E is:
P(E) = (number of outcomes in E) / (total number of outcomes in S)
Example: The probability of getting at least one female birth from two pregnant mothers is:
P(E) = 3/4 = 0.75
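The classical formula can be checked by counting. This is a minimal sketch, assuming (as the formula requires) that MM, MF, FM and FF are equally likely; `classical_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S); valid only when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Two births, each assumed equally likely to be male (M) or female (F)
sample_space = ["".join(p) for p in product("MF", repeat=2)]
event = [o for o in sample_space if "F" in o]  # at least one female

p = classical_probability(event, sample_space)
print(p, float(p))  # 3/4 0.75
```

Using `Fraction` keeps the answer exact (3/4) before converting it to a decimal.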

Types of Probability cont…
Empirical (or statistical) probability is based on observations obtained from experiments, a large number of trials, or historical data.
Example:
• A medical doctor observed that out of 100,000 patients who visited the hospital, there were 50 cancer cases. What is the probability that a patient to be examined will be positive for cancer?
P(+ve for cancer) = 50/100,000 = 0.0005

Example 2
In a sample of 50 people, 21 had type O blood, 22 had
type A blood, 5 had type B blood, and 2 had type
AB blood. Set up a frequency distribution and find
the following probabilities
a. A person has type O blood
b. A person has type A or type B blood
c. A person has neither type A nor type O blood
d. A person does not have type AB blood

Solution
Blood type    Frequency
A             22
B             5
AB            2
O             21
Total         50

• P(O) = 21/50 = 0.42
• P(A) = 22/50 = 0.44
• P(A or B) = P(A) + P(B) = 22/50 + 5/50 = 27/50 = 0.54
• Work out parts (c) and (d) in the same way.
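A short Python sketch that reproduces these empirical probabilities from the frequency table and, for (c) and (d), applies the complement rule given later in the chapter. The helper `prob` is an illustrative name, and the sketch assumes each person has exactly one blood type (so the types are mutually exclusive).

```python
from fractions import Fraction

# Frequency distribution from the sample of 50 people
counts = {"A": 22, "B": 5, "AB": 2, "O": 21}
total = sum(counts.values())  # 50

def prob(*types):
    """Empirical probability of having any one of the given blood types."""
    return Fraction(sum(counts[t] for t in types), total)

p_a = prob("O")               # (a) type O
p_b = prob("A", "B")          # (b) type A or type B
p_c = 1 - prob("A", "O")      # (c) neither A nor O, via the complement
p_d = 1 - prob("AB")          # (d) not AB, via the complement

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(label, float(p))
```

Run as written, it gives 0.42, 0.54, 0.14 and 0.96 for parts (a) to (d).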

Example: Of 158 people who attended a dinner party, 99 were ill.
P(Illness) = 99/158 = 0.63 = 63%.
In 1998, there were 2,500,000 registered live births; of these, 200,000 were low birth weight (LBW) infants.
Therefore, the probability that a newborn is LBW is estimated by P(LBW) = 200,000/2,500,000 = 0.08

Subjective Probability
Personalistic: it represents one’s degree of belief in the occurrence of an event.
• Personal assessment of which treatment, traditional or modern, is more effective in providing a cure.
• Personal assessment of which sports team will win a match.
It may also use classical and relative frequency methods to assess the likelihood of an event.

E.g., if someone says that he is 95% certain that a cure for AIDS will be discovered within 5 years, then he means that:
P(discovery of cure for AIDS within 5 years) = 95% = 0.95
Although the subjective view of probability has enjoyed increased attention over the years, it has not been fully accepted by scientists.

Mutually Exclusive Events
Two events A and B are mutually exclusive if they cannot both happen at the same time:
P(A ∩ B) = 0
Examples:
• A coin toss cannot produce heads and tails simultaneously.
• An individual’s weight cannot be classified simultaneously as more than one of “underweight”, “normal” and “overweight”.

Independent Events
Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out.
The outcome of one event has no effect on the occurrence or non-occurrence of the other.
P(A ∩ B) = P(A) × P(B) (independent events)
P(A ∩ B) ≠ P(A) × P(B) (dependent events)
Example:
• The outcomes of the first and second coin tosses are independent.
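The product rule for independent events can be verified by enumeration. This minimal sketch assumes a fair coin, so the four outcomes HH, HT, TH and TT are equally likely; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The four outcomes of two tosses, assumed equally likely (fair coin)
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Classical probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def both_heads(o):
    return first_heads(o) and second_heads(o)

p_joint = prob(both_heads)
p_product = prob(first_heads) * prob(second_heads)
print(p_joint, p_product, p_joint == p_product)  # 1/4 1/4 True
```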

Intersection and Union
The intersection of two events A and B, A ∩ B, is the event that A and B happen simultaneously:
P(A and B) = P(A ∩ B)
Let A represent the event that a randomly selected newborn is LBW, and B the event that he or she is from a multiple birth.
The intersection of A and B is the event that the infant is both LBW and from a multiple birth.

The union of A and B, A ∪ B, is the event that either A happens or B happens or they both happen simultaneously:
P(A or B) = P(A ∪ B)
In the example above, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both.

Probability concepts are used to understand:
• Probability distributions: binomial, Poisson, and normal distributions
• Sampling and sampling distributions
• Estimation
• Hypothesis testing
• Advanced statistical analysis

Properties of Probability
1. The numerical value of a probability always lies between 0 and 1, inclusive:
0 ≤ P(E) ≤ 1
• A value of 0 means the event cannot occur (impossible event).
• A value of 1 means the event will definitely occur (sure event).
• A value of 0.5 means that the probability that the event will occur is the same as the probability that it will not occur.

2. The sum of the probabilities of all mutually exclusive and exhaustive outcomes (i.e., all outcomes in the sample space) is equal to 1:
P(E1) + P(E2) + ... + P(En) = 1
3. For two mutually exclusive events A and B:
P(A or B) = P(A ∪ B) = P(A) + P(B)
If they are not mutually exclusive:
P(A or B) = P(A) + P(B) − P(A and B)

4. The complement of an event A, denoted by Ā or Aᶜ, is the event that A does not occur; it consists of all the outcomes in which event A does NOT occur:
P(Ā) = P(not A) = 1 − P(A)
Ā occurs only when A does not occur. A and Ā are complementary events.

In the LBW example, the complement of A is the event that a newborn is not LBW.
In other words, Ā is the event that the child weighs 2500 grams or more at birth.
P(Ā) = 1 − P(A)
With P(low bwt) = 0.076:
P(not low bwt) = 1 − P(low bwt)
= 1 − 0.076
= 0.924

Basic Probability Rules
1. Addition rule
• If events A and B are mutually exclusive, then P(A ∩ B) = 0 and
P(A or B) = P(A) + P(B)
• More generally:
P(A or B) = P(A) + P(B) − P(A and B)
i.e., P(event A or event B occurs, or they both occur)

Example: The probabilities below represent years of
schooling completed by mothers of newborn infants.

What is the probability that a mother has completed < 12 years of schooling?
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159
Since these two events are mutually exclusive,
P(≤ 8 or 9-11) = P(≤ 8 ∪ 9-11)
= P(≤ 8) + P(9-11)
= 0.056 + 0.159
= 0.215

What is the probability that a mother has completed 12 or more years of schooling?
P(≥ 12) = P(12 or 13-15 or ≥ 16)
= P(12 ∪ 13-15 ∪ ≥ 16)
= P(12) + P(13-15) + P(≥ 16)
= 0.321 + 0.218 + 0.230
= 0.769
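The two schooling answers are direct applications of the addition rule for mutually exclusive events. This sketch uses only the five category probabilities quoted in the worked answers (the full table is not reproduced here); the dictionary keys and the helper `prob_union` are illustrative names.

```python
# Probabilities quoted in the worked answers (categories are mutually exclusive)
schooling = {"<=8": 0.056, "9-11": 0.159, "12": 0.321, "13-15": 0.218, ">=16": 0.230}

def prob_union(categories):
    """Addition rule for mutually exclusive events: add the individual probabilities."""
    return sum(schooling[c] for c in categories)

less_than_12 = prob_union(["<=8", "9-11"])
twelve_or_more = prob_union(["12", "13-15", ">=16"])
print(round(less_than_12, 3), round(twelve_or_more, 3))  # 0.215 0.769
```

Rounding is applied only for display, since floating-point sums can carry tiny representation errors.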

If A and B are not mutually exclusive events, subtract the overlap so that it is not counted twice:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
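The slides give no worked example for overlapping events, so the sketch below uses a standard 52-card deck purely as an illustration (an assumption, not lecture data). It counts P(heart or face card) directly and compares the result with P(A) + P(B) − P(A ∩ B).

```python
from fractions import Fraction
from itertools import product

# Illustrative 52-card deck: 13 ranks x 4 suits, each card equally likely
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Classical probability: favorable cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

direct = prob(lambda c: is_heart(c) or is_face(c))    # count A ∪ B directly
overlap = prob(lambda c: is_heart(c) and is_face(c))  # A ∩ B: J, Q, K of hearts
by_rule = prob(is_heart) + prob(is_face) - overlap    # general addition rule
print(direct, by_rule)  # 11/26 11/26
```

Without subtracting the overlap, the three face cards of hearts would be counted twice (25/52 instead of 22/52).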

2. Multiplication rule
• If A and B are independent events, then
P(A ∩ B) = P(A) × P(B)
• More generally,
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
• P(A and B) denotes the probability that A and B both occur at the same time.

Conditional Probability
Refers to the probability of an event, given that another event is known to have occurred.
“What happened first is assumed.”
Hint: when thinking about conditional probabilities, think in stages.
Think of the two events A and B occurring chronologically, one after the other, either in time or space.

The conditional probability that event B has occurred given that event A has already occurred is denoted P(B|A) and is defined as
P(B|A) = P(A ∩ B) / P(A),
provided that P(A) ≠ 0.

Example:
A study investigated the effect of prolonged exposure to bright light on retina damage in premature infants.

                 Retinopathy
                 Yes    No    Total
Bright light     18     3     21
Reduced light    21     18    39
Total            39     21    60

The probability of developing retinopathy is:
P(Retinopathy) = No. of infants with retinopathy / Total No. of infants
= (18 + 21)/(21 + 39)
= 39/60 = 0.65

We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability of retinopathy given that the infant was exposed to reduced light.
Exposure to bright light and exposure to reduced light are conditioning events: events we want to take into account when calculating conditional probabilities.

The conditional probability of retinopathy, given exposure to bright light, is:
P(Retinopathy | exposure to bright light) = No. of infants with retinopathy exposed to bright light / No. of infants exposed to bright light
= 18/21 = 0.86

P(Retinopathy | exposure to reduced light) = No. of infants with retinopathy exposed to reduced light / No. of infants exposed to reduced light
= 21/39 = 0.54
The conditional probabilities suggest that premature infants exposed to bright light have a higher risk of retinopathy than premature infants exposed to reduced light.
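The same numbers can be obtained from the 2×2 table with the definition P(B|A) = P(A ∩ B)/P(A). This is a minimal sketch; the nested-dictionary layout and the function names are illustrative choices.

```python
# 2x2 table from the retinopathy study: counts[exposure][outcome]
counts = {
    "bright": {"yes": 18, "no": 3},
    "reduced": {"yes": 21, "no": 18},
}
total = sum(sum(row.values()) for row in counts.values())  # 60

def p_retinopathy():
    """Unconditional P(Retinopathy) = infants with retinopathy / all infants."""
    return sum(row["yes"] for row in counts.values()) / total

def p_retinopathy_given(exposure):
    """P(Retinopathy | exposure) = P(Retinopathy ∩ exposure) / P(exposure)."""
    row = counts[exposure]
    p_joint = row["yes"] / total
    p_exposure = sum(row.values()) / total
    return p_joint / p_exposure

print(round(p_retinopathy(), 2))                 # 0.65
print(round(p_retinopathy_given("bright"), 2))   # 0.86
print(round(p_retinopathy_given("reduced"), 2))  # 0.54
```

Dividing the joint probability by the marginal probability is equivalent to restricting attention to the row for that exposure (18/21 and 21/39).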

• For independent events A and B:
P(A|B) = P(A)
• For any events A and B, whether independent or not:
P(A and B) = P(A|B) P(B)
(General Multiplication Rule)

Test for Independence
Two events A and B are independent if:
P(B|A) = P(B), or equivalently P(A and B) = P(A) • P(B)
Two events A and B are dependent if:
P(B|A) ≠ P(B), or equivalently P(A and B) ≠ P(A) • P(B)

Example
In a study of optic-nerve degeneration in Alzheimer’s disease, postmortem examinations were conducted on 10 Alzheimer’s patients.
The following table shows the distribution of these patients according to sex and evidence of optic-nerve degeneration.
Are the events “patient has optic-nerve degeneration” and “patient is female” independent for this sample of 10 patients?

          Optic-nerve degeneration
Sex       Present    Not present
Female    4          1
Male      4          1

Solution
P(Optic-nerve degeneration | Female) = No. of females with optic-nerve degeneration / No. of females
= 4/5 = 0.80
P(Optic-nerve degeneration) = No. of patients with optic-nerve degeneration / Total No. of patients
= 8/10 = 0.80
Since P(degeneration | female) = P(degeneration), the events are independent for this sample.
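The test P(B|A) = P(B) can be applied mechanically to the table. A minimal sketch with illustrative names, using exact fractions so that the equality check is not affected by rounding:

```python
from fractions import Fraction

# counts[sex][status] for the 10 Alzheimer's patients
counts = {
    "female": {"present": 4, "absent": 1},
    "male": {"present": 4, "absent": 1},
}
total = sum(sum(row.values()) for row in counts.values())  # 10

# P(degeneration), regardless of sex
p_degen = Fraction(sum(row["present"] for row in counts.values()), total)
# P(degeneration | female), restricted to the female row
p_degen_given_female = Fraction(counts["female"]["present"], sum(counts["female"].values()))

print(p_degen_given_female, p_degen, p_degen_given_female == p_degen)  # 4/5 4/5 True
```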


Example
• The following data show the association between aspirin use and heart attack.
• Table 4.1: Data for treatment versus myocardial infarction

              Myocardial infarction
Treatment     Yes     No      Total
Placebo       100     500     600
Aspirin       60      900     960
Total         160     1400    1560

Let A be the event “positive for myocardial infarction” and B the event “aspirin used”.

Example cont…
• Find:
A. P(A|B)   B. P(B|A)
C. Are the characteristics A and B independent?
Solution:
A. P(A|B) = P(A ∩ B)/P(B) = (60/1560) ÷ (960/1560) = 0.0625
B. P(B|A) = P(B ∩ A)/P(A) = (60/1560) ÷ (160/1560) = 0.375
C. To test independence, check whether P(A|B) = P(A), or equivalently P(A ∩ B) = P(A) × P(B). Here P(A|B) = 0.0625, whereas P(A) = 160/1560 = 0.103.
Now P(A|B) ≠ P(A), i.e. 0.0625 ≠ 0.103.
So, the characteristics A and B are not independent, i.e. they are dependent.
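A minimal Python sketch of parts A to C for Table 4.1. The dictionary keys and variable names are illustrative; exact fractions are used so that the independence check compares P(A ∩ B) with P(A) × P(B) without rounding.

```python
from fractions import Fraction

# Table 4.1 counts: rows = treatment, columns = myocardial infarction (MI) yes/no
table = {
    "placebo": {"mi": 100, "no_mi": 500},
    "aspirin": {"mi": 60, "no_mi": 900},
}
n = sum(sum(row.values()) for row in table.values())  # 1560

p_a = Fraction(sum(row["mi"] for row in table.values()), n)  # A: positive for MI
p_b = Fraction(sum(table["aspirin"].values()), n)            # B: aspirin used
p_ab = Fraction(table["aspirin"]["mi"], n)                   # A ∩ B

p_a_given_b = p_ab / p_b  # P(A | B)
p_b_given_a = p_ab / p_a  # P(B | A)

print(float(p_a_given_b), float(p_b_given_a), round(float(p_a), 3))  # 0.0625 0.375 0.103
print("independent" if p_ab == p_a * p_b else "dependent")           # dependent
```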
Probability Distribution
A probability distribution describes the way data are distributed, which allows us to draw conclusions about a set of data.
It tells us how the total probability of 1 is distributed among the various values that the random variable can take.
A probability distribution of a random variable can be displayed as a table, a graph or a mathematical formula.
Random variable: any quantity or characteristic that is able to assume a number of different values such that any particular outcome is determined by chance.

Random variables can be either discrete or continuous.
• A discrete random variable can assume only a finite or countable number of outcomes.
• A continuous random variable can take on any value in a specified interval.
The probability distribution can be displayed in the form of a table giving the values and their associated probabilities, and/or it can be expressed as a mathematical formula giving the probability of all possible values.

Common Probability Distributions
1. Binomial distribution
Consider a dichotomous variable (a nominal variable with only two possible values).
The two mutually exclusive outcomes are referred to as “failure” and “success”.
E.g., let X represent smoking status: X = 1 for a smoker and X = 0 for a non-smoker. The two outcomes are mutually exclusive.
E.g., in 1987, 29% of adults in the USA were smokers; therefore Pr(X = 1) = 0.29 and Pr(X = 0) = 1 − 0.29 = 0.71.

Binomial distribution…
In general, in a binomial distribution:
• There is a fixed number n of trials, each of which results in one of two mutually exclusive outcomes.
• The outcomes of the n trials are independent.
• The probability of “success” is constant for each trial:
Pr(X = success) = Pr(X = 1) = p, Pr(X = failure) = Pr(X = 0) = 1 − p

If an experiment is repeated n times, the probability Pr(X = x) that the outcome of interest occurs exactly x times is
Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}
• n (number of trials) and p (probability of the outcome of interest on each trial) are the parameters of the binomial distribution.
• x is the number of successes, and n! (read “n factorial”) is the product of all integers from 1 to n inclusive.
• By definition, 0! = 1 (and 1! = 1).

Binomial distribution….
In addition to the probabilities of individual outcomes, we can also compute the numerical summary measures associated with a probability distribution.
The mean of a binomial distribution, i.e., the average number of successes in repeated samples of size n, is n × p, and the standard deviation is σ = √(np(1 − p)).

Binomial distribution….
Suppose that in a certain population 52% of all recorded births are males. If we randomly select 10 birth records, what is the probability that:
A. Exactly 5 will be males? (n = 10, x = 5, p = 0.52)
Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}
Pr(X = 5) = \frac{10!}{5!\,(10-5)!}\, (0.52)^{5} (1-0.52)^{10-5} ≈ 0.24
B. Fewer than 3 will be females? (here X counts females, so p = 1 − 0.52 = 0.48)
Pr(X < 3) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2)
≈ 0.0014 + 0.0133 + 0.0554 ≈ 0.070
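The binomial formula, the worked example and the summary measures can be checked with a few lines of Python. This is a minimal sketch using `math.comb` (Python 3.8+) for n!/(x!(n − x)!); `binomial_pmf` is an illustrative name, not a library function.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Pr(X = x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 10
p_male = 0.52

# A. Exactly 5 males among 10 births
p_a = binomial_pmf(5, n, p_male)

# B. Fewer than 3 females: count females, so the success probability is 1 - 0.52
p_b = sum(binomial_pmf(x, n, 1 - p_male) for x in range(3))

# Summary measures for the number of males
mean = n * p_male
sd = sqrt(n * p_male * (1 - p_male))

print(f"A: {p_a:.4f}  B: {p_b:.4f}  mean: {mean:.1f}  sd: {sd:.3f}")
```

It gives about 0.2441 for A and 0.0702 for B; summing rounded terms (0.001 + 0.013 + 0.055) gives a slightly lower 0.069.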

2. Normal Distribution
The normal distribution (ND) is the most important probability distribution in statistics.
It is frequently called the “Gaussian distribution” or the bell-shaped curve.
Variables such as blood pressure, weight, height and serum cholesterol level are approximately normally distributed.
The ND is vital to statistical work: most estimation procedures and hypothesis tests are based on it.

Properties of the Normal Distribution
1. It is symmetrical about its mean, μ.
2. The mean, median and mode are equal, and the distribution is unimodal.
3. The total area under the curve above the x-axis is 1 square unit.
4. The curve never touches the x-axis.
5. As the value of σ increases, the curve becomes flatter and flatter.
6. The distribution is completely determined by the parameters μ and σ.

Standard Normal Distribution
It is a normal distribution that has a mean equal to 0 and an SD equal to 1 (so its variance is also 1), and is denoted by N(0, 1).
The main idea is to standardize the given data using Z-scores.
These Z-scores can then be used to find the area (and thus the probability) under the normal curve.

Z-Transformation
If a random variable X ~ N(μ, σ²), we can transform it to the standard normal distribution (SND) with the Z-transformation
z = \frac{x - \mu}{\sigma}
Z represents the Z-score for a given x value: it tells us how many SDs x lies from the mean of the normal distribution.
This process is known as standardization and gives the position on a normal curve with μ = 0 and σ = 1, i.e., the SND, Z.
A Z-score is the number of standard deviations that a given x value is above or below the mean.
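The Z-transformation is a one-line computation. This sketch uses illustrative values μ = 80 and σ = 12; `z_score` is a hypothetical helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Illustrative values: mean 80, SD 12
print(z_score(95, 80, 12))  # 1.25
print(z_score(68, 80, 12))  # -1.0
```

A positive z lies above the mean and a negative z below it, matching the definition of a Z-score.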
Finding Normal Curve Areas
1. The table gives areas between −∞ and the value of z.
2. Find the z value to the tenths place in the column at the left margin and locate its row. Find the hundredths place in the appropriate column.
3. Read the value of the area (P) from the body of the table where the row and column intersect. Values of P are given as decimals to four places.
• Following the model of the ND, a given value of x must be converted to a z-score before it can be looked up in the z table.

Some Useful Tips
Only a single curve for which μ = 0 and σ = 1 is tabulated.

a) What is the probability that z < −1.96?
(1) Sketch a normal curve.
(2) Draw a perpendicular line at z = −1.96.
(3) Find the area in the table.
(4) The answer is the area to the left of the line: P(z < −1.96) = 0.0250

b) What is the probability that −1.96 < z < 1.96?
The area between the values: P(−1.96 < z < 1.96) = 0.9750 − 0.0250 = 0.9500

c) What is the probability that z > 1.96?
• The answer is the area to the right of the line, found by subtracting the table value from 1.0000: P(z > 1.96) = 1.0000 − 0.9750 = 0.0250
• Formulas (for constants a < b):
P(a < Z < b) = P(Z < b) − P(Z < a)
P(Z > a) = 1 − P(Z < a)
P(Z < a) = 1 − P(Z > a)
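Python's standard library can stand in for the z table: `statistics.NormalDist().cdf(z)` (Python 3.8+) returns the area from −∞ to z. This sketch applies the formulas above to examples a) to c); `p_between` and `p_above` are illustrative helper names.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution N(0, 1)

def p_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return Z.cdf(b) - Z.cdf(a)

def p_above(a):
    """P(Z > a) = 1 - P(Z < a)."""
    return 1 - Z.cdf(a)

print(f"{Z.cdf(-1.96):.4f}")            # 0.0250
print(f"{p_between(-1.96, 1.96):.4f}")  # 0.9500
print(f"{p_above(1.96):.4f}")           # 0.0250
```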

Exercise
1. Compute P(−1 ≤ Z ≤ 1.5)
2. Find the area under the SND from 0 to 1.45 (answer: 0.4265)
3. Compute P(−1.66 < Z < 2.85)
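To check answers to the exercise, each area can be written as a difference of two cumulative areas. A minimal sketch using `statistics.NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative area from -infinity to z under N(0, 1)

ex1 = phi(1.5) - phi(-1)      # P(-1 <= Z <= 1.5)
ex2 = phi(1.45) - phi(0)      # area from 0 to 1.45
ex3 = phi(2.85) - phi(-1.66)  # P(-1.66 < Z < 2.85)

for name, value in [("1", ex1), ("2", ex2), ("3", ex3)]:
    print(name, round(value, 4))
```

Part 2 should reproduce the 0.4265 given above; answers obtained from a four-place table may differ from these in the last digit because the table values are themselves rounded.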

Example on Z-transformation
The diastolic blood pressures (DBP) of males 35–44 years of age are normally distributed with µ = 80 mm Hg and σ² = 144 mm Hg², so σ = 12 mm Hg. Individuals with DBP above 95 mm Hg are considered to be hypertensive.
a. What is the probability that a randomly selected male has a DBP above 95 mm Hg?

Z = (95 − 80)/12 = 1.25
P(Z > 1.25) = 0.1056
Approximately 10.6% of this population would be classified as hypertensive.

b. What is the probability that a randomly selected male has a DBP above 110 mm Hg?
Z = (110 − 80)/12 = 2.50
P(Z > 2.50) = 0.0062
Approximately 0.6% of the population has a DBP above 110 mm Hg.

c. What is the probability that a randomly selected male has a DBP below 60 mm Hg?
Z = (60 − 80)/12 = −1.67
P(Z < −1.67) = 0.0475
Approximately 4.8% of the population has a DBP below 60 mm Hg.
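The three blood-pressure questions can be answered without standardizing by hand, because `statistics.NormalDist` accepts any mean and SD. A minimal sketch under the stated model (μ = 80 mm Hg, σ = √144 = 12 mm Hg):

```python
from statistics import NormalDist

# Diastolic blood pressure of males aged 35-44: mean 80 mm Hg, variance 144 mm Hg^2
dbp = NormalDist(mu=80, sigma=144 ** 0.5)  # sigma = 12 mm Hg

p_above_95 = 1 - dbp.cdf(95)    # hypertensive
p_above_110 = 1 - dbp.cdf(110)
p_below_60 = dbp.cdf(60)

print(f"{p_above_95:.4f} {p_above_110:.4f} {p_below_60:.4f}")
```

The last value comes out near 0.0478 rather than 0.0475 because the code uses the unrounded z = −1.667 instead of the table value for −1.67.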

• The normal distribution depends on the two parameters μ and σ.
• μ determines the location of the curve. [Figure: normal curves with μ1 < μ2 < μ3]
• σ determines the scale of the curve, i.e., the degree of flatness or peakedness of the curve. [Figure: normal curves with σ1 < σ2 < σ3]

Student’s t Distribution
The t distribution, a family of continuous probability distributions, was described by W. S. Gosset in 1908.
He used the pseudonym “Student” to avoid getting fired for doing statistics on the job!!!
The shape of the t distribution is very similar to the shape of the standard normal distribution.
All t distributions are symmetric and unimodal.
They are all centered at 0.

• The t distribution is flatter/broader than N(0, 1).
This means:
• The variability of t is greater than that of Z, which is N(0, 1).
• Thus, there is more area in the tails and less at the center.
• Because the variability is greater, the resulting confidence intervals will be wider.

Student’s t Distribution…….
The t distribution has a (slightly) different shape for each possible sample size.
As the degrees of freedom (df) get larger, Student’s t distribution looks more and more like the SND with mean = 0 and variance = 1.
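The convergence of t to N(0, 1) can be seen numerically by comparing densities. This sketch computes the standard t density from its closed form using only the standard library (`math.lgamma` keeps the gamma-function ratio from overflowing for large df); `t_pdf` and `normal_pdf` are illustrative names, not library functions.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work with log-gamma values to avoid overflow when df is large
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2)
    coef = exp(log_coef) / sqrt(df * pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution N(0, 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

# Height at the center (t = 0) and in the tail (t = 3) for increasing df
for df in (1, 5, 30, 100):
    print(f"df={df:>3}  center={t_pdf(0, df):.4f}  tail={t_pdf(3, df):.4f}")
print(f"N(0,1)  center={normal_pdf(0):.4f}  tail={normal_pdf(3):.4f}")
```

For small df the center is lower and the tail higher than for N(0, 1); as df grows, both values approach the normal ones.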

Student’s t Table
The body of the table contains t values, not probabilities.

Thank You for Being
Patient Till the End!!!
