Advanced Statistics Concepts
STATISTICAL CONCEPTS
Dr. Amel Fayed
MD, MPH, PhD
Associate Professor Biostatistics
College of Medicine
PNU
Outline
■ Hypothesis testing
■ Confidence intervals
How to Talk to a
Statistician?
■ “It’s all Greek to me . . .” Καλημέρα
Why Do I Need
STATISTICS?
■ Planning a study
■ Proposal writing
■ Literature interpretation
Comprehensive
Sample
Types of data
Constant
Variables
Data: Example
Discrete (number of patients, pulse)
Nominal (gender, colours)
parameter
sample
statistic
NO!
■ Data from a sample can tell us something about a
population
Population
■ The entire collection of events of
interest
Generalizations will depend on how
well the sample represents the
population.
• Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Cluster Random Sampling
• Multi-Stage Sampling
Simple Random Sample
• In order to be random, a full list of
everyone within a sample frame is required.
• Random number tables or a computer is
then used to select respondents at random
from the list.
Advantage
Most representative group
Disadvantage
Difficult to identify every member of a population
(sample frame)
How to select a simple
random sample
1. Define the population
2. Determine the desired sample size
3. List all members of the population or the potential
subjects
■ For example:
– 4th grade boys who have demonstrated problem
behaviors
– Let’s select 10 boys from the list
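The three steps above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 40 names, the function name `simple_random_sample`, and the seed are assumptions made for the example, not part of the original slides.

```python
import random


def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size members from the frame without replacement.

    Every member of the frame has the same chance of being selected.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: a full list of 40 eligible boys.
frame = [f"boy_{i:02d}" for i in range(1, 41)]

# Select 10 boys at random from the list.
selected = simple_random_sample(frame, 10, seed=42)
print(selected)
```

The computer plays the role of the table of random numbers: it needs the complete list first, which is exactly why the sample frame is required.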
Simple random sampling
Table of random numbers
684257954125632140
582032154785962024
362333254789120325
985263017424503686
Systematic Random Sampling
• This selection is like random
sampling, but rather than using
random number tables or a computer to
select your respondents, you select
them in a systematic way (for example,
every k-th person on the list).
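A minimal sketch of one common way to do this, assuming a fixed sampling interval k and a random starting point (the function name `systematic_sample` and the list of 100 patient numbers are hypothetical):

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Take every k-th member of the frame after a random start."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    k = len(frame) // sample_size             # sampling interval
    start = random.Random(seed).randrange(k)  # random start in the first interval
    return frame[start::k][:sample_size]


# Hypothetical frame: patients numbered 1 to 100; choose 10 of them.
patients = list(range(1, 101))
chosen = systematic_sample(patients, 10, seed=7)
print(chosen)
```

Only the starting point is random here; every later selection follows from it, which is the difference from simple random sampling.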
Cluster sampling
• Similar to stratified
sampling, but the groups
are selected for their
geographical location,
e.g. school children within
a particular school.
• The school is the cluster,
with the children being
selected randomly from
within the cluster.
[Figure: a population divided into five clusters, Section 1 to Section 5]
Multistage random
sampling
■ Stage 1
– randomly sample clusters (schools)
■ Stage 2
– randomly sample individuals from the schools
selected
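The two stages can be sketched as follows. The five schools, their 30 pupils each, and the function name `multistage_sample` are made up for illustration only.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters at random. Stage 2: sample individuals within them."""
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)       # stage 1
    return {
        name: rng.sample(clusters[name], n_per_cluster)              # stage 2
        for name in chosen_clusters
    }


# Hypothetical population: 5 schools with 30 pupils each.
schools = {
    f"school_{s}": [f"school_{s}_pupil_{i}" for i in range(1, 31)]
    for s in "ABCDE"
}

sample = multistage_sample(schools, n_clusters=2, n_per_cluster=5, seed=1)
for school, pupils in sample.items():
    print(school, pupils)
```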
Sampling Methods
Probability Sampling
■ Simple random sampling
■ Stratified random sampling
■ Systematic random sampling
■ Cluster (area) random sampling
■ Multistage random sampling
Non-Probability Sampling
■ Deliberate (quota) sampling
■ Convenience sampling
■ Purposive sampling
■ Snowball sampling
■ Consecutive sampling
Convenience Sampling
• This involves selecting the nearest and
most convenient people to participate in
the research.
• This method of selection is not
representative and is considered a very
unsatisfactory way to conduct research.
Quota Sampling
■ For example, interviewers might be tempted to interview those who
look most helpful. The problem is that these samples may be biased
because not everyone gets a chance of selection. This non-random element
is its greatest weakness, and quota versus probability sampling has been a
matter of controversy for many years.
Snowball Sampling
• This type of sampling is used when the research is focused on
participants with very specific characteristics, such as being members
of a certain group.
• Having identified and contacted one gang member, the researcher asks
to be put in touch with any friends or associates who are also group
members.
• This type of sampling is not representative; however, it is useful,
especially where the groups in the research do not appear on any lists.
Purposive Sampling
■ Purposive sampling (criterion-based sampling)
– Establish criteria necessary for being included in
study and find sample to meet criteria
■ Solution: Screening
– Use random sampling to obtain a representative
sample of larger population and then those
subjects that are not members of the desired
population are screened or filtered out
– EX: want to study smokers but can’t identify all
smokers
Consecutive sampling
– Outcome of 1000 consecutive patients presenting to the
emergency room with chest pain
NORMAL
DISTRIBUTION
Properties of the Normal
Distribution
■ Many continuous
variables have
distributions that are bell-
shaped and are called
normally distributed
variables.
■ The theoretical curve,
called the normal
distribution curve, can be
used to study many
variables that are not
normally distributed but
are approximately normal.
Areas Under the Normal Curve
Interpreting Z Scores
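As a rough illustration of how a z score links to an area under the normal curve, the sketch below converts a value to a z score and computes the area within ±z of the mean. The blood pressure figures (mean 120, SD 10) and the function names are assumptions for the example only.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(z) - std.cdf(-z)


# Hypothetical example: a reading of 130 when the mean is 120 and the SD is 10.
z = z_score(130, mean=120, sd=10)
print(z)                            # z score of the reading
print(round(area_within(1), 4))     # share of cases within 1 SD of the mean
print(round(area_within(1.96), 4))  # share of cases within 1.96 SD of the mean
```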
25% of cases
2nd Quartile or
Median
25% of cases
Lower
■ Notice that these three quartiles cut the data set into four equal parts, each containing 25% of cases
Vocabulary
■ Hypotheses: a statement of the research
statistical evaluation
variables
The Scientific Method
Observation
Hypothesis
Experiment
Results
Evidence supports H
Evidence inconsistent with H
Revise H
Hypothesis Testing
■ Null hypothesis:
H0: there is no difference between the studied
groups
■ Alternative hypothesis:
H1: there is a difference between the studied
groups
Level of confidence
Type I error
■ Type I error: rejecting the null hypothesis when it is
actually true (a false positive result).
■ Usually regarded as more serious than a type II error.
■ Inversely related to type II error: for a fixed sample size,
lowering α raises β.
■ The probability of a type I error is called α.
■ α is usually set at 0.01 or 0.05 and is the same as the
level of significance.
Type II error
■ Type II error: failing to reject (accepting) the null hypothesis
when it is actually false (a false negative result).
■ Usually regarded as less serious than a type I error.
■ The probability of a type II error is called β.
■ β = 1 − Power
■ Power is the probability of detecting a difference that really
exists, that is, of avoiding a false negative result. A power of
0.80 (80%) is usually accepted as adequate.
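To make the link between α, β and power concrete, here is a small sketch for a two-sided one-sample z-test with a known standard deviation. The choice of test, the function name, and the numbers (a 5 mmHg difference, SD 15, n = 71) are all assumptions chosen for illustration; they are not taken from the slides.

```python
from statistics import NormalDist


def power_two_sided_z(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (known sd).

    effect: true difference between the population mean and the null value.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)   # rejection cut-off for the chosen alpha
    shift = effect / (sd / n ** 0.5)      # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


power = power_two_sided_z(effect=5, sd=15, n=71)
beta = 1 - power                          # probability of a type II error
print(round(power, 3), round(beta, 3))

# With no true difference, the chance of rejecting H0 is just alpha (type I error).
print(round(power_two_sided_z(effect=0, sd=15, n=71), 3))
```

Raising n or the size of the true difference increases power, while choosing a smaller α lowers it, which is the inverse relation between the two error types.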
TYPE I
AND
TYPE II
ERRORS
One sided Vs two sided
(one tail Vs two tails)
If we want to test whether the SBP of group A differs from the SBP of
group B:
The null hypothesis is “mean SBP in group A = mean SBP in group B”
The two-sided alternative hypothesis is “mean SBP in group A ≠
mean SBP in group B”
A one-sided alternative hypothesis is “mean SBP in group
A > mean SBP in group B”
Another one-sided alternative hypothesis is “mean SBP in group
A < mean SBP in group B”
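The choice of alternative hypothesis changes the p-value obtained from the same data. The sketch below assumes the comparison has already been reduced to a z statistic (positive when group A has the higher mean); the value z = 1.8 and the function name are hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """p-values for the three alternative hypotheses, given a z statistic."""
    std = NormalDist()
    return {
        "A != B (two-sided)": 2 * (1 - std.cdf(abs(z))),
        "A > B (one-sided)": 1 - std.cdf(z),
        "A < B (one-sided)": std.cdf(z),
    }


results = p_values(1.8)  # hypothetical z statistic
for hypothesis, p in results.items():
    print(f"{hypothesis}: p = {p:.4f}")
```

For this z the one-sided test in the observed direction falls below 0.05 while the two-sided test does not, which is why the direction of the hypothesis should be fixed before the data are seen.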
CONFIDENCE
INTERVAL
CONFIDENCE
INTERVAL FOR
PROPORTIONS
Notation for
Proportions
p = population proportion
p̂ = x/n = sample proportion of x successes in a sample of size n
(pronounced ‘p-hat’)
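[Figure: a confidence interval centred on the point estimate, extending a margin E to the lower limit and a margin E to the upper limit, with 5% marked on each side]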
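One common way to obtain the margin E for a proportion is the normal-approximation (Wald) interval, p̂ ± z·√(p̂(1 − p̂)/n). The slides do not say which method is intended, so the sketch below is an assumption, as are the function name and the example counts (50 successes out of 100).

```python
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Wald (normal approximation) confidence interval for a proportion.

    Returns (p_hat, lower limit, upper limit).
    """
    p_hat = x / n                                        # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5        # margin of error E
    return p_hat, p_hat - margin, p_hat + margin


p_hat, lower, upper = proportion_ci(x=50, n=100)
print(f"p-hat = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The approximation works best for large samples with p̂ not too close to 0 or 1.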
Overlapping of
Confidence Intervals
CONFIDENCE
INTERVAL FOR
RISK RATIO
RISK
RATIO
Interpretation of the
significance of OR and RR
■ Confidence interval
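A sketch of how the confidence interval is used to judge significance, shown for the risk ratio only: the ratio is significant at the chosen level when its interval excludes 1. The interval is computed on the log scale, a widely used approximation; the 2×2 counts and the function name are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio_ci(a, b, c, d, confidence=0.95):
    """Risk ratio and its confidence interval from a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR), log-scale approximation.
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical study: 20 of 100 exposed and 10 of 100 unexposed develop the outcome.
rr, lower, upper = risk_ratio_ci(20, 80, 10, 90)
significant = not (lower <= 1 <= upper)   # an interval containing 1 is not significant
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), significant: {significant}")
```

The same reading applies to the odds ratio, although its standard error formula differs.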
Linear relationships
Curvilinear relationships
[Figure: scatter plots of Y against X showing linear and curvilinear relationships]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Linear Correlation
Strong relationships
Weak relationships
[Figure: scatter plots of Y against X showing strong and weak linear relationships]
Linear Correlation
No relationship
A statistic that quantifies a
relation between two
variables
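Assuming the statistic meant here is Pearson's correlation coefficient r, it can be computed directly from its definition. The function name and the small data sets below are made up for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfect negative linear relationship
print(pearson_r(x, [3, 1, 4, 1, 5]))    # weaker relationship, between -1 and 1
```

r measures only linear association, so a strong curvilinear relationship can still give a value near 0.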