Basic Biostats, 2

The document discusses sampling, hypothesis testing, and errors in hypothesis testing. It provides examples of hypothesis testing including comparing forced expiratory volume between a treatment and control group in a clinical trial.

Uploaded by aishp2897


Basic Biostatistics Part 2

1st March, 2017

Content

• Part 1 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression
Part 1 Summary

• What were the key learning points from Part 1?
− In groups, identify 3 key learning points from the first session
Sampling
• An investigation of a population is said to be a survey or study of the
population
• A population is a group of individuals or objects that meets a set of
pre-defined criteria; e.g.
- All people with permanent residence in the UK
- All patient records held in a database
- All patients with schizophrenia
- All staff members of an organisation
- All patients registered to a particular specialist
- All members of the population diagnosed with a particular health
condition
• A survey or study that collects information from every member of a
population is referred to as a census
Sampling

• It is not always possible to collect information from every member of a population, because of limited time and resources
• A ‘good sample’ can be used to reliably estimate characteristics (e.g. the mean) of the population
• Sample – any subset of a population

[Figure: a sample shown as a subset of the population]
Sampling Error

• Errors in surveys can be divided into two categories: sampling error and non-sampling error
• Sampling error – error due to taking a sample rather than studying the whole population
- e.g. if a psychiatrist randomly selects a sample of patients and records the duration of each appointment, the average treatment time can be calculated
- if the times for all patients were recorded (i.e. the entire population), the population average would most likely differ from the sample average
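The psychiatrist example can be imitated with a small simulation. The sketch below is hypothetical throughout: it invents a population of 10,000 appointment durations (Normal, mean 30 min, SD 8 min, floored at 5 min), draws a simple random sample of 50, and compares the sample mean with the population mean. The difference is the sampling error; a second random sample gives a slightly different estimate again.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: appointment durations (minutes) for 10,000 patients.
# Simulated for illustration only; these are not real data.
population = [max(5.0, random.gauss(30.0, 8.0)) for _ in range(10_000)]
population_mean = statistics.mean(population)

# A simple random sample of 50 patients, drawn without replacement.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: sample estimate minus the true population value.
sampling_error = sample_mean - population_mean

# A different random sample usually gives a different estimate.
other_sample_mean = statistics.mean(random.sample(population, 50))

print(f"population mean = {population_mean:.2f} min")
print(f"sample mean     = {sample_mean:.2f} min")
print(f"sampling error  = {sampling_error:+.2f} min")
print(f"another sample  = {other_sample_mean:.2f} min")
```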
Non-sampling error

• Non-sampling error is error due to:
- poor selection of strata or sample (coverage errors)
- poor data entry (processing errors)
- inaccurate responses (measurement errors)
- non-response errors
• In surveys, non-sampling errors can be more of a
problem than sampling errors
Statistical Hypothesis Tests
Hypothesis Testing

• Hypothesis testing is a process used to quantify the evidence against a particular hypothesis about the population
• There are many different types of hypothesis test
• Five stages of hypothesis testing can be defined:
5 Stages

1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the P-value and results
The Null Hypothesis

• The test is carried out assuming the Null Hypothesis (H0), which states that there is no effect (e.g. the difference in means equals zero) in the population

• Example: comparing the rates of hallucinations in men and women in the population
− Null Hypothesis (H0): rates of hallucinations are the same in men and women in the population
The Alternative Hypothesis

• The Alternative Hypothesis (H1) holds if the Null Hypothesis is not true

• Example
− Alternative Hypothesis (H1): rates of hallucinations are different in men and women in the population
The test statistic

• After data collection, the sample data are used to calculate a test statistic
• The test statistic reflects the amount of evidence in the data against H0
• Generally, the larger its value (regardless of sign), the greater the evidence against H0
The P-value

• The test statistic is compared to values from the relevant probability distribution to obtain a P-value
• The P-value is the probability of obtaining our results, or something more extreme, if the Null Hypothesis is true
• The smaller the P-value, the greater the evidence against H0
Rejecting H0

• Conventionally, if the P-value < 0.05, there is sufficient evidence to reject H0
• There is only a small chance of obtaining results at least as extreme as these if H0 is true
– H0 is rejected; the results are statistically significant at the 5% level
Not rejecting H0

• If the P-value ≥ 0.05, there is insufficient evidence to reject H0
– H0 is not rejected; the results are not statistically significant at the 5% level

• NB: This does not mean that the null hypothesis is true, simply that we do not have enough evidence to reject it!
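The P-value and the conventional 5% decision rule can be sketched in Python for the simple case where the test statistic follows a standard Normal distribution under H0 (a z-statistic). This is an assumption made for illustration: many tests, including the t-tests in the examples, compare the statistic to a different distribution. The function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard Normal statistic Z, assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Conventional decision rule: reject H0 when P < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Larger |z| gives a smaller P-value; the sign of z does not matter.
for z in (0.8, 1.5, 2.9, -2.9):
    p = two_sided_p_value(z)
    print(f"z = {z:+.2f}  P = {p:.4f}  -> {decide(p)}")
```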
Parametric vs. Non-Parametric tests

• Tests that are based on the assumption that the data follow a known probability distribution (often the Normal) are known as parametric tests

• When the data do not conform to this assumption, non-parametric tests can be used instead

• Non-parametric tests do not assume that the data follow a particular probability distribution
Non-parametric tests
• Useful when:
− the sample size is small
− the data are measured on a categorical scale (though they are used for numerical data as well)

• However:
− they have less power to detect a real difference than the equivalent parametric tests, when the assumptions of those tests hold
− they lead to decisions rather than generating a true understanding of the data
Statistical tests

• Numerical data (parametric tests)
– One-sample t-test
– Independent t-test
– Paired t-test
– One-way ANOVA
Statistical tests

• Numerical data (non-parametric tests)
– Sign test
– Wilcoxon signed rank test
– Wilcoxon rank sum test
– Kruskal-Wallis test
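As an illustration of a non-parametric test, the sign test below uses only the signs of paired differences and makes no assumption about their distribution. Under H0 (median difference of zero) the number of positive differences follows a Binomial(n, 0.5) distribution, which gives an exact P-value. This is a minimal sketch: the differences are invented for illustration, and zero differences are simply dropped, which is one common convention.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided sign test; returns (number positive, n used, P-value)."""
    nonzero = [d for d in differences if d != 0]   # ties (zero differences) dropped
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    # Tail probabilities under Binomial(n, 0.5)
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, min(1.0, 2 * min(lower, upper))


# Hypothetical paired differences (e.g. after minus before) for 10 patients
differences = [0.4, 1.2, -0.3, 0.8, 0.5, 0.9, 1.1, 0.2, 0.7, 0.6]
k, n, p = sign_test(differences)
print(f"{k} of {n} differences positive, P = {p:.4f}")
```

Here 9 of 10 differences are positive, so P = 2 × 11/1024, about 0.021.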
Statistical tests

• Categorical data
– z-test for a proportion
– Sign test
– McNemar’s test
– Chi-squared test
– Chi-squared trend test
– Fisher’s exact test
Choosing a statistical test

• Medical statistics textbooks often contain a flowchart to guide the choice of test

• Considerations include:
– what is the data type?
– how many groups of data are there?
– can a probability distribution be assumed?
Errors in Hypothesis Testing
Making a wrong decision
• There is always the possibility of making a wrong decision when conducting a hypothesis test

• A wrong decision may be made either when rejecting or when not rejecting the Null Hypothesis

• The possible mistakes are:
– Type I error
– Type II error
Type I error
• Rejecting the Null Hypothesis when it is true

• Concluding that there is an effect (difference) when in reality there is none

• The maximum chance of making a Type I error is denoted by alpha (α)

• α is the significance level of the test: we reject the null hypothesis if the P-value is less than the significance level
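A small simulation can make the meaning of α concrete. In the hypothetical sketch below, every simulated study draws its data from a population in which H0 is true (mean 0, known SD 1), and a two-sided z-test at α = 0.05 is applied. The proportion of studies that wrongly reject H0 is the Type I error rate, which should come out close to 0.05. The sample size, number of studies and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

random.seed(2017)

ALPHA = 0.05
N = 20             # observations per simulated study (hypothetical)
SIGMA = 1.0        # population SD, assumed known by the z-test
N_STUDIES = 10_000
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96

rejections = 0
for _ in range(N_STUDIES):
    # H0 is TRUE here: the data really come from a population with mean 0.
    data = [random.gauss(0.0, SIGMA) for _ in range(N)]
    z = fmean(data) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        rejections += 1   # Type I error: H0 rejected although it is true

type_i_rate = rejections / N_STUDIES
print(f"Proportion of studies wrongly rejecting H0: {type_i_rate:.3f} (alpha = {ALPHA})")
```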
Type II error
• Not rejecting the Null Hypothesis when it is false

• Concluding that there is no effect (difference) when one really exists

• The chance of making a Type II error is denoted by beta (β)

• Its complement, 1 − β, is the Power of the test

Power and Sample Size
Power of the test

• The Power is the probability of rejecting the Null Hypothesis when it is false
– i.e. the probability of correctly detecting a real effect

• Ideally, the power of the test would be 100%

• However, in practice there is always some possibility of making a Type II error
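Power can be approximated before a study is run. The sketch below uses the Normal approximation for a two-sided comparison of two independent means, assuming equal group sizes and a common SD; the effect size (0.1) and SD (0.27) are hypothetical values chosen for illustration. Power rises with sample size, and with no true effect (delta = 0) the "power" equals α, the chance of a Type I error.

```python
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two independent means.

    Normal approximation; assumes equal group sizes and a common SD.
    """
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5     # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability the statistic falls in either rejection region when the
    # true difference is delta.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (25, 50, 100, 200):
    print(n, round(power_two_means(delta=0.1, sd=0.27, n_per_group=n), 3))
```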
Sample size

• If the number of patients/samples in the study is too small, there may be inadequate power to detect an important effect that really exists, and resources are wasted

• If the sample is too large, the study may be unnecessarily time-consuming, expensive or unethical

• An optimal sample size must be chosen that strikes a balance between the implications of making a Type I or a Type II error
Calculating an optimal sample size for a test

• The following quantities need to be specified at the design stage of the investigation in order to calculate an optimal sample size:
– The Power
– Significance Level
– Variability
– Smallest effect of interest
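The four quantities above feed directly into a sample-size formula. For comparing two independent means with equal group sizes, a standard Normal-approximation formula is n per group = 2(z_{1−α/2} + z_{1−β})² σ² / δ², where σ is the SD (variability) and δ the smallest effect of interest. The sketch below assumes that formula and hypothetical inputs; dedicated software based on the t distribution may give a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means.

    delta: smallest effect of interest; sd: common SD (variability);
    alpha: significance level; power: 1 - beta.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)   # round up to whole patients


# Hypothetical design values
print(n_per_group(delta=0.1, sd=0.27))              # 80% power
print(n_per_group(delta=0.1, sd=0.27, power=0.90))  # 90% power needs more
print(n_per_group(delta=0.2, sd=0.27))              # a larger effect needs fewer
```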
Recall: 5 stages

1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the P-value and results
Examples
Scenario 1

• A randomised double-blind trial to determine the effect of inhaled corticosteroids on wheezing episodes in children
• Inhaled beclomethasone dipropionate was compared with a placebo
• The response variable was average forced expiratory volume (FEV) over a 6-month period
• Sample sizes: Treatment group = 50, Placebo group = 48
Stages 1 and 2
• Stage 1: Define H0 and H1:
H0: the mean FEV in the population of children is the same in the two groups
H1: the mean FEV in the population of children is different in the two groups

• Stage 2: Collect data

Graphical Analysis
[Figure: Boxplots comparing treated group to control group; y-axis: Forced Expiratory Volume (FEV), 1.00 to 2.50; groups: Treated Group, Control Group]

Selecting a test

• What is the data type? Numerical
• How many groups are there? 2
• Are the groups paired or independent? Independent
• Can Normality and equal variances of the data be assumed? Yes

→ Unpaired (independent) t-test

Analysis Output

Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

Sample  N   Mean   StDev  SE Mean
1       50  1.640  0.286  0.040
2       48  1.537  0.246  0.035
(Sample 1 = treatment group, Sample 2 = placebo group)

Difference = mu (1) - mu (2)
Estimate for difference: 0.1033
95% CI for difference: (-0.0038, 0.2104)
T-Test of difference = 0 (vs not =): T-Value = 1.91  P-Value = 0.059  DF = 96
Both use Pooled StDev = 0.2670
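Stages 3 and 4 can be reproduced from the summary statistics in the output above. The sketch below computes the pooled SD, the t statistic and a two-sided P-value using only the Python standard library. The tail area of the t distribution is obtained by numerically integrating its density with Simpson's rule, an implementation choice made here to avoid external libraries; the helper name `t_two_sided_p` is hypothetical. Because the inputs are the rounded means and SDs, results agree with the output only to about two decimal places (e.g. a difference of 0.103 rather than 0.1033).

```python
import math


def t_two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value P(|T| >= |t_stat|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t_stat| with Simpson's rule (steps must be even).
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def density(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t_stat)
    h = b / steps
    total = density(0.0) + density(b)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3          # P(0 <= T <= |t_stat|)
    return max(0.0, 1 - 2 * area)


# Summary statistics from the output (Sample 1 = treatment, Sample 2 = placebo)
n1, mean1, sd1 = 50, 1.640, 0.286
n2, mean2, sd2 = 48, 1.537, 0.246

df = n1 + n2 - 2
# Pooled SD: assumes equal variances in the two populations
pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
t = (mean1 - mean2) / se_diff
p = t_two_sided_p(t, df)

print(f"difference = {mean1 - mean2:.4f}, pooled SD = {pooled_sd:.4f}")
print(f"t = {t:.2f}, df = {df}, P = {p:.3f}")
```

This should give t ≈ 1.91 with 96 degrees of freedom and P ≈ 0.06, in line with the output.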
Stage 5: Interpret the results

• The P-value is 0.059

• There is insufficient evidence (just!) to reject H0 at the 5% level; consistently, the 95% CI for the difference, (-0.0038, 0.2104), includes zero
• There is insufficient statistical evidence of a difference between the 2 groups
• The Power of the test should be checked
• A Type II error may have been made in not rejecting H0
Scenario 2

• A study was conducted to determine whether a heart condition influences the age at which children start to walk
• The response variable was the age at which the children started to walk
• 30 children with a specific heart condition were analysed in the study
• Children (in general) are known to start walking at a mean age of 11.4 months
• Does the heart condition influence the age at which children start to walk?
Stages 1 and 2

• Stage 1: Define H0 and H1

H0: the mean walking age of the children with the heart condition = 11.4 months
H1: the mean walking age of the children with the heart condition ≠ 11.4 months
• Stage 2: Collect data
Graphical Analysis
[Figure: Histogram showing walking age of children; x-axis: Months (10 to 18); y-axis: Frequency]
Selecting a test

• What is the data type? Numerical
• How many groups are there? 1
• Can Normality of the data be assumed? Yes

→ One-sample t-test
Analysis Output

Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

One-Sample T
Test of mu = 11.4 vs not = 11.4

N   Mean    StDev  SE Mean  95% CI            T     P
30  13.158  2.583  0.472    (12.193, 14.123)  3.73  0.001
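The one-sample t statistic can likewise be rebuilt from the summary line above: t = (sample mean − 11.4) / (SD / √n), compared with a t distribution on n − 1 degrees of freedom. The sketch below repeats the standard-library t-distribution helper (Simpson's-rule integration of the density, a hypothetical implementation choice) so that it is self-contained.

```python
import math


def t_two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value P(|T| >= |t_stat|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t_stat| with Simpson's rule (steps must be even).
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def density(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t_stat)
    h = b / steps
    total = density(0.0) + density(b)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * total * h / 3)


# Summary statistics from the output
n, mean, sd = 30, 13.158, 2.583
mu0 = 11.4                  # mean walking age (months) under H0

df = n - 1
se = sd / math.sqrt(n)      # standard error of the mean
t = (mean - mu0) / se
p = t_two_sided_p(t, df)

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}, P = {p:.4f}")
```

This should give SE ≈ 0.472 and t ≈ 3.73, with P just under 0.001, matching the rounded P = 0.001 in the output.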
Stage 5: Interpret the results

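• The P-value is 0.001

• There is strong evidence to reject H0
• There is statistical evidence that the heart condition influences the age at which children start to walk
• If H0 were true, results at least this extreme would occur only about 0.1% of the time; this is not the probability that a Type I error has been made (the maximum chance of a Type I error is set by the significance level, α = 0.05)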
Correlation and Regression

• Correlation
– measures the strength of association
between two variables

• Regression
– models a relationship between two or
more variables
Correlation
• The degree of association between two variables is called their correlation

• Positive correlation – the points appear in a band running from lower left to upper right (as x increases, y increases)

• Negative correlation – the points appear in a band running from upper left to lower right (as x increases, y decreases)

• No correlation – the points are randomly scattered about the graph
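The slides do not name a particular correlation coefficient; the sketch below assumes Pearson's product-moment correlation r, which measures the strength of a linear association and lies between −1 and +1. It uses the composite hardness and wear-rate data from the medical example later in this document. The result is strongly negative (wear falls as hardness increases), and r² matches the R-Sq reported for that example's fitted line.

```python
import math

# Composite data from the medical example: hardness and wear rate
hardness = [120, 168, 290, 42, 78, 90, 130]
wear = [56, 46, 21, 98, 80, 65, 32]


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hardness, wear)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```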
Correlation and “Line of best fit”

Here are some examples
[Figure: example scatter plots with lines of best fit]
Be Careful!

"Correlation does not imply causality"

• In other words, a scatter plot may show that a relationship exists, but it does not and cannot prove that one factor is causing the other

• The scatter plot can only provide a clue that two factors may be related as “cause and effect”
Correlation - example

• Driving test scores – written paper

• Outcome compared by plotting scores against the number of lessons (1–10)
– does the score improve as the number of lessons increases?
Scatter plot for learner drivers
[Figure: scatter plot of written-paper marks (y-axis, 90 to 170) against number of classes (x-axis, 0 to 10)]
Linear Regression

• Investigates a straight-line (linear) association between variables

• The equation of the straight line fitted to the scatter diagram is known as the regression equation

• Least squares – the line is chosen so that the sum of the squared differences between the observed and predicted values is minimised
Medical example

• Does increasing hardness improve abrasion resistance for composites?

• Does increasing etch time improve bond strength to enamel?

• Both questions require a regression approach
– using just two or three materials of different hardness is not acceptable
– using just two etch times would not provide an answer
Data

Composite  Hardness  Wear rate
1          120       56
2          168       46
3          290       21
4          42        98
5          78        80
6          90        65
7          130       32
Regression equation 1

A regression equation is: wear = 94.6 - 0.288 hardness

[Figure: Fitted Line Plot of wear against hardness (x-axis 50 to 300); wear = 94.65 - 0.2882 hardness; S = 14.5829, R-Sq = 75.4%, R-Sq(adj) = 70.4%]
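The fitted line for the composites can be reproduced by least squares from the seven rows of the data table. The sketch below uses the usual closed-form least-squares formulas for the slope and intercept, then computes S (the residual standard deviation), R-Sq and R-Sq(adj); the results should match the plot annotations to the printed precision.

```python
import math

hardness = [120, 168, 290, 42, 78, 90, 130]
wear = [56, 46, 21, 98, 80, 65, 32]

n = len(hardness)
mx = sum(hardness) / n
my = sum(wear) / n
sxx = sum((x - mx) ** 2 for x in hardness)
sxy = sum((x - mx) * (y - my) for x, y in zip(hardness, wear))
syy = sum((y - my) ** 2 for y in wear)

# Least-squares estimates
slope = sxy / sxx
intercept = my - slope * mx

# Residual sum of squares: the quantity least squares minimises
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hardness, wear))
s = math.sqrt(sse / (n - 2))                        # residual SD ("S")
r_sq = 1 - sse / syy                                # "R-Sq"
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)       # "R-Sq(adj)"

print(f"wear = {intercept:.2f} {slope:+.4f} hardness")
print(f"S = {s:.4f}, R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```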
Regression equation 2
• Etch time 5 to 60 s
• Bond strength 15 to 26 MPa
Regression equation: bond strength = 17.3 + 0.110 etch time
[Figure: Fitted Line Plot of bond strength against etch time (x-axis 0 to 60); bond strength = 17.31 + 0.1103 etch time; S = 2.51095, R-Sq = 35.2%, R-Sq(adj) = 32.2%]
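Once fitted, a regression equation can be used to predict the response for a given value of the explanatory variable. The hypothetical helper below uses the coefficients shown on the etch-time plot; it refuses etch times outside the 5 to 60 s range of the data, since extrapolating a fitted line beyond the observed data is unreliable. With R-Sq = 35.2%, individual bond strengths scatter widely around these predictions.

```python
def predicted_bond_strength(etch_time_s: float) -> float:
    """Fitted line from the plot: bond strength (MPa) = 17.31 + 0.1103 * etch time (s)."""
    if not 5 <= etch_time_s <= 60:
        raise ValueError("etch time outside the 5-60 s range of the data; "
                         "extrapolation is unreliable")
    return 17.31 + 0.1103 * etch_time_s


for etch_time in (5, 30, 60):
    print(f"{etch_time:>2} s -> {predicted_bond_strength(etch_time):.2f} MPa")
```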
Summary

• Part 2 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression
