
Biostatistics

By Dr. Jeanne P. Uyisenga


[email protected]
0784686167
Chap 4. Introduction to Probability
4.1 Probability and statistics
• Probability is used to describe the likelihood of observing a particular sample outcome.
• Probability and statistics are related in an important way:
• In statistics we apply probability to draw conclusions from sample data when the population is unknown, i.e. statistical inference.
• Probability is used as a tool; it allows you to evaluate the reliability of your conclusions about the population when you have only sample information.

• Data are obtained by observing either uncontrolled events in nature or controlled situations in a laboratory.
4.2 Random Variables and Probability Distributions
• We have previously introduced the concepts of populations, samples, variables and statistics.
• We observe a sample from some population, measure a variable outcome (categorical or numeric) on each element of the sample, and compute statistics to describe the sample (such as Ȳ).
• The variables observed in the sample, as well as the statistics they are used to compute, are random variables.
• → There is a population of such outcomes, and we observe a random subset of them in our sample.
• The collection of all possible outcomes in the population, together with their corresponding relative frequencies, is called a probability distribution.
• A variable x is a random variable if the value that it assumes, corresponding to the outcome of the experiment, is a chance or random event.

• For discrete variables
Ø Probability = sum of the areas of all rectangles
Ø Density of relative frequency = height of the rectangle
• For continuous variables
Ø Probability = area under the curve (calculated by integrating the density; a short sketch follows)
Ø Density = f(x) = height of the curve, described by a mathematical function
4.3 Terminology
• An experiment is the process by which an observation or measurement is obtained (i.e. a method of data collection).
• When a repetition of an experiment is performed, what we observe is an outcome called a simple event (E).
• The probability of a simple event is a measure of our belief that the event will occur.
• An event is a collection of simple events.
• Eg: Event A = {E1, E2, E3}, Event B = {E1, E3, E5}
• The probability of an event A is written as P(A), p(A) or Pr(A).
• The set of all simple events (possible outcomes) is called the sample space.
• Probability function: a function giving the probability for each outcome.
• A probability lies between 0 and 1; the sum of the probabilities for all simple events equals 1.
4.4 Basic probability laws

1. For any event A, 0 ≤ P(A) ≤ 1.
2. The sum of the probabilities for all simple events in S equals 1.

Example:
Blood type    O     A    B     AB
Probability   0.44  ?    0.10  0.04
3. Complement rule
• The probability that a certain event will not occur:
• P(not A) = 1 − P(A)

For any event A, P(A) + P(not A) = 1; then P(not A) = 1 − P(A).
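A minimal sketch of rules 1–3 using the blood-type table above; the missing probability for type A follows from rule 2 (probabilities sum to 1).

```python
# Rule 2 and the complement rule applied to the blood-type example.
known = {"O": 0.44, "B": 0.10, "AB": 0.04}

p_A = 1 - sum(known.values())    # rule 2  ->  P(A) = 0.42
p_not_O = 1 - known["O"]         # complement rule  ->  P(not O) = 0.56
print(round(p_A, 2), round(p_not_O, 2))
```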
Probabilities involving multiple events

4. Addition rule
• If both events A and B occur on a single performance of an experiment, this is called the intersection or joint probability of A and B, denoted P(A ∩ B).
• In this case, P(A ∩ B) ≠ 0.
Therefore, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• When A and B are disjoint or mutually exclusive,
P(A ∩ B) = 0, and P(A or B) = P(A ∪ B) = P(A) + P(B)
(a short sketch of the addition rule follows the example below)
Eg: Record a person's blood type.
The four mutually exclusive possible outcomes are these simple events:
E1 = Blood type A
E2 = Blood type B
E3 = Blood type AB
E4 = Blood type O
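A minimal sketch of the addition rule, reusing the events A = {E1, E2, E3} and B = {E1, E3, E5} from the terminology section; the equal probabilities of 0.2 per simple event are an illustrative assumption.

```python
# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
probs = {"E1": 0.2, "E2": 0.2, "E3": 0.2, "E4": 0.2, "E5": 0.2}
A = {"E1", "E2", "E3"}
B = {"E1", "E3", "E5"}

def P(event):
    return sum(probs[e] for e in event)

p_union = P(A) + P(B) - P(A & B)   # 0.6 + 0.6 - 0.4 = 0.8
print(round(p_union, 2), round(P(A | B), 2))   # both 0.8
```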
5. Multiplication rule
• If two events A and B are independent, then the joint probability is
P(A ∩ B) = P(A) P(B)
Ø The occurrence of event A does not affect the occurrence of event B, and vice versa.
In that case, P(A ∪ B) = P(A) + P(B) − P(A)P(B)
• The conditional probability of event A is the probability of event A given that event B has occurred, and is denoted P(A|B); and
P(A|B) = P(A ∩ B) / P(B)

The general multiplication rule:
P(A ∩ B) = P(A) P(B|A)  or  P(A ∩ B) = P(B) P(A|B)
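A minimal sketch of conditional probability and an independence check; the numbers are illustrative and are not the exercise values below.

```python
# Conditional probability and the independence test P(A and B) == P(A) * P(B).
p_A, p_B, p_A_and_B = 0.5, 0.4, 0.20

p_B_given_A = p_A_and_B / p_A                      # P(B|A) = 0.40
p_A_given_B = p_A_and_B / p_B                      # P(A|B) = 0.50
independent = abs(p_A_and_B - p_A * p_B) < 1e-12   # True, since 0.5 * 0.4 = 0.20
print(p_B_given_A, p_A_given_B, independent)
```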
Summary of basic properties of probability
• 1. Probabilities are real numbers on the interval from 0 to 1; i.e., 0 ≤ P(A) ≤ 1.
• 2. If an event is certain to occur, its probability is 1, and if the event is certain not to occur, its probability is 0.
• 3. If two events are mutually exclusive (disjoint), the probability that one or the other will occur equals the sum of the probabilities: P(A or B) = P(A) + P(B).
• 4. If A and B are two events, not necessarily disjoint, then P(A or B) = P(A) + P(B) − P(A and B).
• 5. The sum of the probabilities that an event will occur and that it will not occur is equal to 1; hence, P(A′) = 1 − P(A).
• 6. If A and B are two independent events, then P(A and B) = P(A) P(B).
Exercises
• Suppose that P(A) = 0.4 and P(A and B) = 0.12
a. P(B|A) = ?
b. Are the two events mutually exclusive?
c. If P(B) = 0.3, are the two events independent?
4.5 Diagnostic Tests
• Diagnostic testing provides another situation where basic rules of probability can be applied.
• Subjects in a study group are determined to have a disease (D+), or not have a disease (D−), based on a gold standard (a process that can detect disease with certainty).
• Then, the same subjects are subjected to the newer (usually less traumatic) diagnostic test and are determined to have tested positive for disease (T+) or tested negative (T−).
• Patients will fall into one of four combinations of gold standard and diagnostic test outcomes (D+T+, D+T−, D−T+, D−T−).
Sensitivity and Specificity
• Sensitivity: this is the probability that a person with disease (D+) will correctly test positive on the diagnostic test (T+).
• Sensitivity = P(Test positive | patient has disease) = P(T+|D+).
• Specificity: this is the probability that a person without disease (D−) will correctly test negative on the diagnostic test (T−).
• Specificity = P(Test negative | patient does not have disease) = P(T−|D−).
• This notation describes conditional probability.

• Sensitivity and specificity depend on
– how well the test separates the two groups
– what threshold we choose
Sensitivity and Specificity
• Sensitivity and specificity tell us about the test result, given that we know whether the patient has the disease or not.
• In the clinic, we don't know whether the patient has the disease; that is what we want the test to tell us.

• Sensitivity = P(Test positive | patient has disease) = P(T+ | D+) = true positive rate

• Specificity = P(Test negative | patient does not have disease) = P(T− | D−) = true negative rate

• Two commonly used terms related to diagnostic testing are false positive and false negative.
• A false positive is when a person who is non-diseased (D−) tests positive (T+), and a false negative is when a person who is diseased (D+) tests negative (T−).
• The probabilities of these events can be written in terms of sensitivity and specificity:
• P(False Positive) = P(T+|D−) = 1 − specificity
• P(False Negative) = P(T−|D+) = 1 − sensitivity
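A minimal sketch of these two relations; the 0.90 values mirror the glucose tolerance test example further below.

```python
# False-positive and false-negative probabilities from sensitivity and specificity.
sensitivity = 0.90   # P(T+ | D+)
specificity = 0.90   # P(T- | D-)

p_false_positive = 1 - specificity   # P(T+ | D-) = 0.10
p_false_negative = 1 - sensitivity   # P(T- | D+) = 0.10
print(round(p_false_positive, 2), round(p_false_negative, 2))
```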
POPULATION

Test Results    With Disease               Without Disease
Positive        True Positive (TP) (a)     False Positive (FP) (b)
Negative        False Negative (FN) (c)    True Negative (TN) (d)
[Slide figures: with a dichotomous test, true disease status is known and subjects fall into the four groups above; with a continuous test variable, true disease status is unknown and all positives and negatives are grouped by applying an artificial cutoff.]
Sensitivity = True positives / (True positives + false negatives) × 100
            = True positives / All persons with the disease × 100
            = TP / (TP + FN)
            = a / (a + c)

Specificity = True negatives / (True negatives + false positives) × 100
            = True negatives / All persons without the disease × 100
            = TN / (TN + FP)
            = d / (b + d)

Percent false positives = % of people without the disease who were incorrectly labeled by the test as having the disease
                        = FP / (FP + TN) × 100
                        = b / (b + d)

Percent false negatives = % of people with the disease who were not detected by the test
                        = FN / (FN + TP) × 100
                        = c / (a + c)
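A minimal sketch computing the four quantities above from the 2×2 table cells a (TP), b (FP), c (FN), d (TN); the counts used are those of the glucose tolerance test example further below.

```python
# Sensitivity, specificity and error rates from a 2x2 diagnostic table.
def diagnostic_rates(a, b, c, d):
    return {
        "sensitivity": a / (a + c),               # TP / all with the disease
        "specificity": d / (b + d),               # TN / all without the disease
        "pct_false_positives": 100 * b / (b + d), # FP among the non-diseased
        "pct_false_negatives": 100 * c / (a + c), # FN among the diseased
    }

print(diagnostic_rates(a=315, b=190, c=35, d=1710))
# sensitivity 0.90, specificity 0.90, 10% false positives, 10% false negatives
```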
Predictive Values
• Positive Predictive Value: this is the probability that a person who has tested positive on a diagnostic test (T+) actually has the disease (D+).
• PPV = P(patient has disease | Test positive) = P(D+|T+).
• Negative Predictive Value: this is the probability that a person who has tested negative on a diagnostic test (T−) actually does not have the disease (D−).
• NPV = P(patient does not have disease | Test negative) = P(D−|T−).
• Overall Accuracy: this is the probability that a randomly selected subject is correctly diagnosed by the test.
• It can be written as accuracy = P(D+) × sensitivity + P(D−) × specificity.
Predictive Value

Pos. PV = True Positives / (TP + FP) × 100 = a / (a + b)

Neg. PV = True Negatives / (TN + FN) × 100 = d / (c + d)
Summary

• Sensitivity = P(T+ | D+) = a/(a+c)
• Specificity = P(T− | D−) = d/(b+d)
• PPV = P(D+ | T+) = a/(a+b)
• NPV = P(D− | T−) = d/(c+d)
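A minimal sketch of the predictive values and the overall accuracy formula, using the counts from the glucose tolerance test example below (a = 315, b = 190, c = 35, d = 1710).

```python
# PPV, NPV and overall accuracy from the 2x2 table counts.
a, b, c, d = 315, 190, 35, 1710
N = a + b + c + d

ppv = a / (a + b)                         # P(D+ | T+) ~ 0.62
npv = d / (c + d)                         # P(D- | T-) ~ 0.98
prevalence = (a + c) / N                  # P(D+) ~ 0.156
sens, spec = a / (a + c), d / (b + d)     # 0.90, 0.90
accuracy = prevalence * sens + (1 - prevalence) * spec   # equals (a + d) / N = 0.90
print(round(ppv, 2), round(npv, 2), round(accuracy, 2))
```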
Example: Se & Sp
TEST 2 (Glucose Tolerance Test): Sensitivity = 90%, Specificity = 90%

                         DIABETES
                     +           −           Tot
TEST RESULTS  +   315 (a)     190 (b)     505 (a+b)
              −    35 (c)    1710 (d)    1745 (c+d)
Tot               350 (a+c)  1900 (b+d)  2250 (N)

Sensitivity = 315/350 = 90%
Specificity = 1710/1900 = 90%
Specificity & Predictive Value

• As specificity increases, positive predictive value increases.
• As sensitivity increases, positive predictive value also increases, but to a much lesser extent.
Exercises
• 1. In a population of 1000 individuals, 100 have the disease. The screening test given to each of the individuals was positive for 80 of the 100 diseased individuals. In the same population, the test was negative for 800 of the 900 non-diseased individuals.
• Sensitivity = ?
• Specificity = ?
• PPV = ?
• NPV = ?
                       Disease Status
                  Diseased (D+)   Not Diseased (D−)   Total
Test results (T+)       80              100            180
             (T−)       20              800            820
Total                  100              900           1000

Factorials
x factorial (x!) is the product of the whole numbers from x down to 1:

x! = (x)(x − 1)(x − 2)…(1)

5! = 5 × 4 × 3 × 2 × 1 = 120

By definition, 0! = 1.
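A minimal sketch: x! computed from the definition above, checked against the standard library's math.factorial.

```python
import math

def factorial(x: int) -> int:
    """x! = x * (x - 1) * ... * 1, with 0! = 1 by definition."""
    result = 1
    for k in range(2, x + 1):
        result *= k
    return result

print(factorial(5), math.factorial(5))   # 120 120
print(factorial(0), math.factorial(0))   # 1 1
```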
Combination
• A combination is an arrangement of objects, without repetition, where order is not important.
• Since a combination is the number of ways you can select objects, it will always be a whole number.
Permutation
• A permutation is an arrangement of objects, without repetition, where order is important.
• Since a permutation is the number of ways you can arrange objects, it will always be a whole number.
Permutation Example
If we have 3 colored marbles (one RED, one BLUE, and one GREEN), how many permutations using all three marbles are possible?

RBG  RGB  BGR  BRG  GRB  GBR

Instead of listing all possible permutations, there is a formula to figure out the total number of permutations:

nPx = n! / (n − x)!

where n is the total # of objects and x is the total # of objects you are choosing.

3P3 = 3! / (3 − 3)! = (3 × 2 × 1) / 0! = 6
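A minimal sketch of the marble example: permutations by enumeration with itertools.permutations and by the formula nPx = n! / (n − x)!.

```python
import math
from itertools import permutations

marbles = ["R", "B", "G"]
listed = list(permutations(marbles, 3))                 # RBG, RGB, BGR, BRG, GRB, GBR
by_formula = math.factorial(3) // math.factorial(3 - 3)
print(len(listed), by_formula)                          # 6 6
```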
Combination Example
If we have 3 colored marbles (one RED, one BLUE, and one GREEN), how many combinations using all three marbles are possible?

Just 1: RBG

There is a formula to figure out the total number of combinations for more complex situations:

nCx = n! / (x!(n − x)!)

where n is the total # of objects and x is the total # of objects in the arrangement.

3C3 = 3! / (3!(3 − 3)!) = (3 × 2 × 1) / ((3 × 2 × 1)(0!)) = 1
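A minimal sketch of the formula nCx = n! / (x!(n − x)!), using the ABCD example that follows; math.comb gives the same count directly.

```python
import math
from itertools import combinations

letters = ["A", "B", "C", "D"]
listed = list(combinations(letters, 3))   # ABC, ABD, ACD, BCD
by_formula = math.factorial(4) // (math.factorial(3) * math.factorial(4 - 3))
print(len(listed), by_formula, math.comb(4, 3))   # 4 4 4
```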
Example
List all combinations of the letters ABCD in groups of 3:

ABC  ABD  ACD  BCD

4C3 = 4! / (3!(4 − 3)!) = (4 × 3 × 2 × 1) / ((3 × 2 × 1)(1!)) = 4

Order does NOT matter.

List all permutations of the letters ABCD in groups of 3:

ABC ACB BAC BCA CAB CBA
ABD ADB BAD BDA DAB DBA
ACD ADC CAD CDA DAC DCA
BCD BDC CBD CDB DBC DCB

4P3 = 4! / (4 − 3)! = (4 × 3 × 2 × 1) / 1! = 24

Order DOES matter.
Combinations & Permutations
Permutations
• no repetition of objects allowed
• order is important

Combinations
• no repetition of objects allowed
• order isn't important
Exercise
• You own 4 pairs of jeans, 12 clean T-shirts, and 4 wearable pairs of sneakers.
• How many outfits (jeans, T-shirt, and sneakers) can you create?
