Biostatistics
Importance:
Examples:
Inferential Statistics
Purpose: Makes predictions or generalizations based on sample data.
Example:
Organized into intervals (e.g., age groups: 0–10, 11–20)
Raw, unorganized data (e.g., individual ages: 5, 12, 34)
2. Categories of Data
Ranked: Sorted numerically/categorically. Example: Order of finishing in a race.
📊 Representing Data
1. Tabulated Data
Organize information into rows and columns for clarity.
2. Graphical Representation
Bar Charts: For comparing categorical data.
Example:
Untitled 2
Sum = 120
Mean = 120 / 5 = 24
1.2 Median
Steps to Calculate:
Median = 24
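A minimal Python sketch of the mean and median calculations above. It assumes the five-value dataset 20, 22, 24, 26, 28, which matches the sum of 120 and the deviations listed later in these notes; the variable names are illustrative only.

```python
import statistics

# Assumed dataset: consistent with Sum = 120 and n = 5 in the notes.
data = [20, 22, 24, 26, 28]

# Mean: sum of the values divided by how many there are.
mean_value = sum(data) / len(data)

# Median: middle value once the data are sorted.
median_value = statistics.median(data)

print("Mean:", mean_value)
print("Median:", median_value)
```

For an even number of values, `statistics.median` averages the two middle values.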
1.3 Mode
Definition: The most frequently occurring value in the dataset.
Example:
2. Measures of Dispersion
2.1 Range
Formula: Range = Maximum value - Minimum value
Example:
Range = 28 - 20 = 8
2.2 Interquartile Range (IQR)
Definition: The range of the middle 50% of the data.
Steps to Calculate:
3. IQR = Q3 - Q1
Example:
Q1 = 21
Q3 = 27
IQR = 27 - 21 = 6
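One way to reproduce this IQR in Python is the median-of-halves convention: Q1 is the median of the lower half and Q3 the median of the upper half, with the middle value left out when the count is odd. The dataset 20, 22, 24, 26, 28 and the function name are assumptions for illustration; other quartile conventions can give slightly different values.

```python
import statistics

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves convention."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the median position
    upper = ordered[-half:]    # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1

# Assumed dataset, chosen to match Q1 = 21 and Q3 = 27 in the notes.
q1, q3, iqr = iqr_median_of_halves([20, 22, 24, 26, 28])
print(q1, q3, iqr)
```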
2.3 Variance
Formula: Variance = (Sum of squared deviations from the mean) / (Total number of
data points)
Steps to Calculate:
2. Subtract the mean from each data point and square the result.
Example:
Mean = 24
Deviations: (20 - 24), (22 - 24), (24 - 24), (26 - 24), (28 - 24)
Variance = 40 / 5 = 8
Formula: Standard Deviation = √Variance
Example:
Variance = 8
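The variance and standard deviation steps can be sketched in Python as follows. This uses the population formula (dividing by n, as in the notes) and assumes the dataset 20, 22, 24, 26, 28 implied by the listed deviations.

```python
import math

# Assumed dataset, taken from the deviations listed in the variance example.
data = [20, 22, 24, 26, 28]

mean_value = sum(data) / len(data)

# Squared deviation of each point from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]

# Population variance: sum of squared deviations divided by n.
variance = sum(squared_deviations) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

A sample variance would divide by n - 1 instead of n; which one applies depends on whether the data are the whole population or a sample.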
Example:
3. Graphical Representation
Example - Histogram
20–25 3
26–30 3
31–35 2
Steps:
Graphical Representation:
Insert histograms, bar charts, or scatter plots via the "Insert" tab.
CH2////////////////////////////////
📚 Probability and Probability Distributions
Let's dive into the world of probability, where we predict outcomes and analyze
random events in different scenarios.
Types of Events
Independent Events:
Events where the occurrence of one does not affect the probability of the
other.
Dependent Events:
Events where the occurrence of one affects the probability of the other.
Example: Drawing cards from a deck without replacement.
Example: Tossing a coin (it can't land heads and tails at the same time).
Complementary Events:
Two events that together exhaust all possible outcomes.
Key Properties:
1. Each probability is between 0 and 1.
🎯 Binomial Distribution
What is Binomial Distribution?
Definition: A binomial distribution models the number of successes in a
fixed number of independent Bernoulli trials (yes/no outcomes).
Conditions:
Formula:
To calculate the probability of exactly k successes in n trials:
P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
Where:
(n choose k) is the binomial coefficient, calculated as:
(n choose k) = n! / [k!(n - k)!]
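A short Python sketch of the binomial formula. Only n = 5 comes from these notes; the values k = 3 and p = 0.5 are hypothetical, chosen just to show the calculation.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# n = 5 as in the notes; k and p are illustrative assumptions.
probability = binomial_pmf(k=3, n=5, p=0.5)
print(probability)

# The probabilities over all possible k should add up to 1.
total = sum(binomial_pmf(k, 5, 0.5) for k in range(6))
print(total)
```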
n = 5 (number of trials)
Solution:
📈 Poisson Distribution
What is Poisson Distribution?
Definition: A Poisson distribution models the probability of a given number
of events happening in a fixed interval of time or space, given a known
average rate.
Conditions:
Formula:
To calculate the probability of exactly k events happening in a fixed interval,
use the Poisson formula:
P(X = k) = (λ^k * e^(-λ)) / k!
Where λ is the average number of events per interval.
Solution:
So, the probability of exactly 2 cars passing in the next 10 minutes is 0.2241 or
22.41%.
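The Poisson calculation can be checked with a few lines of Python. The average rate is not stated in the surviving text, so λ = 3 cars per 10 minutes is an assumption here; it gives a probability of about 0.224 for k = 2, in line with the figure quoted above.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate lam per interval."""
    return (lam**k) * math.exp(-lam) / math.factorial(k)

# lam = 3 is an assumed rate; k = 2 is the count from the example.
probability = poisson_pmf(k=2, lam=3.0)
print(round(probability, 4))
```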
Properties:
Symmetric: The mean, median, and mode are all the same.
68-95-99.7 Rule:
95% falls within 2 standard deviations.
μ = 70 (mean)
σ = 10 (standard deviation)
To find this, use z-scores and refer to the normal distribution table or use a
calculator to find the cumulative probability.
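As a sketch of the z-score approach, the Python standard library can stand in for the normal distribution table. The mean 70 and standard deviation 10 come from the example; the cutoff x = 85 is hypothetical, since the question itself is not stated in these notes.

```python
from statistics import NormalDist

mu = 70      # mean from the example
sigma = 10   # standard deviation from the example
x = 85       # hypothetical cutoff, not from the notes

# z-score: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# Cumulative probability P(X <= x).
distribution = NormalDist(mu=mu, sigma=sigma)
p_below = distribution.cdf(x)

# Check of the 68-95-99.7 rule: share within 1 standard deviation.
within_one_sd = distribution.cdf(mu + sigma) - distribution.cdf(mu - sigma)

print(z, round(p_below, 4), round(within_one_sd, 4))
```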
📚 Key Takeaways
Probability measures the likelihood of an event occurring.
CH3/////////
Types of Correlation:
1. Positive Correlation: As one variable increases, the other also increases.
Example: Height and weight are positively correlated (taller people tend to
weigh more).
r = -1: Perfect negative correlation.
r = 0: No correlation.
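To make the meaning of r concrete, here is a minimal Python sketch of the Pearson correlation coefficient. The function name and the small datasets are made up for illustration; they are chosen so that r comes out at exactly +1 and -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises in step with x, then falls in step with x.
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])
print(r_positive, r_negative)
```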
Objective: Predict the value of the dependent variable (y) based on the
independent variable (x).
Regression Equation:
The formula for the regression line is:
y = β₀ + β₁x + ε
Where:
β₀ is the y-intercept.
β₁ is the slope of the line (shows the change in y per unit change in x).
Hours Studied (x) Exam Score (y)
1 60
2 65
3 70
4 75
5 80
Mean of y:
ȳ = (60 + 65 + 70 + 75 + 80) / 5 = 70
Regression Equation:
y = 55 + 5x (slope β₁ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² = 50 / 10 = 5; intercept β₀ = ȳ - β₁x̄ = 70 - 5 × 3 = 55)
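The least-squares fit for the hours/score table can be checked with a short Python sketch. The function name is illustrative; the data are the five rows from the table, and the fit works out to a slope of 5 and an intercept of 55.

```python
def fit_simple_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

intercept, slope = fit_simple_regression(hours, scores)
print(intercept, slope)

# Predicted score for a student who studies 3 hours.
predicted = intercept + slope * 3
print(predicted)
```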
🔢 Introduction to Multiple Regression and Correlation
Objective: Predict the value of the dependent variable (y) based on the
values of several independent variables (x₁, x₂, x₃,...).
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ... + βₙxₙ + ε
Where:
β₀ is the intercept.
β₁, β₂, ... βₙ are the coefficients for each independent variable.
🔍 Key Differences Between Simple and Multiple Regression
📚 Key Takeaways
Correlation helps measure the strength and direction of relationships
between variables.
CH4/////
📚 Hypothesis Testing and Confidence Intervals
🔍 Hypothesis Testing of a Single Population Mean, Proportion, and Variance
1. Testing the Population Mean
Formula (t-test for population mean):
t = (x̄ - μ₀) / (s / √n)
Where:
x̄ = sample mean
μ₀ = hypothesized population mean (the value stated in the null hypothesis)
n = sample size
p̂ = sample proportion
n = sample size
n = sample size
s² = sample variance
Where:
x̄ ₁, x̄ ₂ = sample means
Example: Testing if the mean weight loss is different between two groups
using different diets.
p̂ ₁, p̂ ₂ = sample proportions
F = s₁² / s₂²
Where:
📊 Interpretation of p-value, Type I, and Type II Errors
1. p-value
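Definition: The p-value is the probability of observing data at least as
extreme as the sample, assuming the null hypothesis is true. Smaller values
indicate stronger evidence against the null hypothesis.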
Interpretation:
p-value > α: Fail to reject the null hypothesis (not enough evidence for
the alternative hypothesis).
"False positive."
"False negative."
📏 Confidence Intervals: Estimation and Interpretation
Confidence Interval (CI): A range of values, derived from sample data, that
is likely to contain the population parameter with a certain level of
confidence (e.g., 95%).
x̄ = sample mean
z = z-value for the desired confidence level (e.g., 1.96 for 95% CI)
n = sample size
Example: If the sample mean is 50, the sample standard deviation is 10, and
the sample size is 30, the 95% confidence interval for the population mean
is:
CI = 50 ± 1.96 * (10 / √30) ≈ 50 ± 3.58 → (46.42, 53.58)
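The same confidence interval can be reproduced in a few lines of Python. The numbers are the ones from the example; the variable names are illustrative, and the z-value 1.96 is used as given even though a t-value is often preferred for small samples.

```python
import math

sample_mean = 50
sample_sd = 10
n = 30
z = 1.96  # z-value for a 95% confidence level

# Standard error of the mean, then the margin of error.
standard_error = sample_sd / math.sqrt(n)
margin = z * standard_error

lower = sample_mean - margin
upper = sample_mean + margin
print(round(margin, 2), round(lower, 2), round(upper, 2))
```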
p̂ = sample proportion
n = sample size
OR = odds ratio
🔄 Relationship Between Confidence Intervals and Hypothesis Testing
Key Connection: A confidence interval for a parameter can be used to
perform hypothesis testing.
If the null value (e.g., 0 for the difference in means, 1 for odds ratio)
falls within the confidence interval, we fail to reject the null hypothesis.
If the null value falls outside the confidence interval, we reject the null
hypothesis.
💡 Key Takeaways
Hypothesis Testing allows us to make decisions about population
parameters based on sample data.
Confidence Intervals provide a range of plausible values for a population
parameter, and they are closely related to hypothesis testing.
💬 Practice Questions
t-test: Test whether the average salary of employees in a department is
different from $50,000.
p-value: If the p-value from a hypothesis test is 0.03 and α = 0.05, should
you reject or fail to reject the null hypothesis?
//ch5
Used when: Data is interval or ratio scale and meets assumptions about
population parameters (e.g., mean, standard deviation).
Examples:
t-tests (One-sample, Independent two-sample, Paired sample)
Nonparametric Tests
Definition: Tests that do not assume a specific data distribution.
Examples:
Mann-Whitney U test
Chi-square test
📊 Student’s t-tests
1. One-Sample t-test
Purpose: Compares the sample mean to a known population mean.
Formula:
t = (x̄ - μ₀) / (s / √n)
Where:
x̄ = sample mean
μ₀ = hypothesized population mean (the value stated in the null hypothesis)
n = sample size
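A minimal Python sketch of the one-sample t statistic. All four input values are hypothetical, picked so the arithmetic is easy to follow; the function name is also illustrative. The sketch stops at the statistic, because looking up the p-value needs a t table or a statistics package.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Hypothetical inputs: mean 52, hypothesized mean 50, s = 8, n = 16.
t_stat = one_sample_t(sample_mean=52, mu0=50, sample_sd=8, n=16)
degrees_of_freedom = 16 - 1
print(t_stat, degrees_of_freedom)
```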
Example:
Formula:
t = (x̄ ₁ - x̄ ₂) / √[(s₁² / n₁) + (s₂² / n₂)]
Where:
x̄ ₁, x̄ ₂ = sample means
Example:
Comparing average test scores of two different teaching methods. Group 1
(n₁ = 25) has a mean score of 75 and standard deviation of 12, and Group 2
(n₂ = 30) has a mean score of 80 and standard deviation of 15.
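Plugging the teaching-method numbers into the two-sample formula can be sketched in Python as below. The inputs come from the example; the function name is illustrative, and the form used is the unpooled (unequal-variance) version shown in the formula.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Group 1: n = 25, mean 75, SD 12. Group 2: n = 30, mean 80, SD 15.
t_stat = two_sample_t(75, 12, 25, 80, 15, 30)
print(round(t_stat, 3))
```

The negative sign only reflects that Group 1 has the lower mean; the size of t is what gets compared with the critical value.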
Formula:
t = d̄ / (s_d / √n)
Where:
d̄ = mean of the paired differences
s_d = standard deviation of the paired differences
n = number of pairs
Example:
Formula:
F = (Between-group variance) / (Within-group variance)
Example:
Comparing the effectiveness of three different diets on weight loss, where
you have three groups with different diets.
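The F ratio for a one-way ANOVA can be sketched in Python as follows. The three groups of weight-loss values are made up for illustration (they are not from these notes), and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: each value versus its own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical weight loss (kg) under three diets.
diets = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
f_stat = one_way_anova_f(diets)
print(f_stat)
```

A large F means the group means differ by more than the spread inside the groups would suggest.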
2. Two-Way ANOVA
Purpose: Examines two factors simultaneously, and their interaction effect
on the dependent variable.
Formula:
The formula for the F-statistic is similar but involves multiple sources of
variance (Factor A, Factor B, and Interaction).
Example:
Testing the effect of teaching methods (A/B/C) and time of day
(Morning/Evening) on test scores.
Example:
📊 Chi-Square Test
Purpose: Tests the relationship between two categorical variables.
Formula:
χ² = Σ [(O - E)² / E]
Where:
O = observed frequency
E = expected frequency
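As a sketch of how observed and expected frequencies combine, the Python below computes the chi-square statistic for a contingency table. The 2×2 counts are hypothetical, and expected frequencies are taken as (row total × column total) / grand total, the usual choice for a test of independence.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic

# Hypothetical counts: rows = two groups, columns = two outcomes.
table = [[30, 20], [20, 30]]
chi2 = chi_square_statistic(table)
print(chi2)
```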
Example:
🧩 Nonparametric Tests
1. Mann-Whitney U Test
Purpose: A nonparametric test that compares differences between two
independent groups when the dependent variable is ordinal or continuous
but not normally distributed.
Hypothesis:
Null Hypothesis (H₀): The distributions of the two groups are identical.
Hypothesis:
Formula:
ρ = 1 - [(6 Σ dᵢ²) / (n(n² - 1))]
Where:
dᵢ = difference between the two ranks of observation i
n = number of observations
Example:
Assessing the relationship between job satisfaction and employee
performance by ranking both variables.
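The rank-difference formula above can be sketched in Python. The two sets of ranks are hypothetical (five employees ranked on satisfaction and on performance), and this shortcut formula assumes there are no tied ranks.

```python
def spearman_rho(ranks_a, ranks_b):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), assuming no ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n**2 - 1))

# Hypothetical ranks for five employees.
satisfaction_ranks = [1, 2, 3, 4, 5]
performance_ranks = [2, 1, 4, 3, 5]

rho = spearman_rho(satisfaction_ranks, performance_ranks)
print(rho)
```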
Chi-Square Test: Tests relationships between categorical variables.
💡 Classroom Practice
t-test: Compare the average height of students in two classrooms.
📚 Hypothesis Testing and Confidence Intervals
In today's lesson, we'll be diving into the world of hypothesis testing and
confidence intervals, two key concepts in statistics that help us make decisions
based on data.
Alternative Hypothesis (H₁ or Ha): The hypothesis that suggests a
significant effect or difference.
🔍 Hypothesis Testing of a Single Population Mean, Proportion, and Variance
1. Testing the Population Mean
Formula (t-test for population mean):
t = (x̄ - μ₀) / (s / √n)
Where:
x̄ = sample mean
μ₀ = hypothesized population mean (the value stated in the null hypothesis)
n = sample size
p̂ = sample proportion
n = sample size
Example: Testing if the proportion of students who pass a test is 80%.
n = sample size
s² = sample variance
x̄ ₁, x̄ ₂ = sample means
Example: Testing if the mean weight loss is different between two groups
using different diets.
p̂ ₁, p̂ ₂ = sample proportions
n₁, n₂ = sample sizes
OR = odds ratio
SE = standard error of the log(OR)
📚 Research Project and Practical Applications
📝 Research Project: Applying Advanced Statistical Methods
Objective
Individual or Group Research Project:
Confidence Intervals and Hypothesis Testing: To estimate population
parameters and test assumptions.
🏥 Practical Applications of Advanced Biostatistics Techniques in Healthcare
Real-World Applications
Advanced biostatistics techniques are widely used in healthcare to improve
patient outcomes, optimize treatment strategies, and inform public health
policies. Some common applications include:
Cohort Studies and Case-Control Studies: Common in epidemiology to
explore relationships between exposures and outcomes.
4. Interpreting Results:
5. Creating Visualizations:
🎤 Presentation and Discussion of Research Findings
Structure of Presentation:
1. Introduction: Provide background on the research question, objectives, and
the importance of the study.
2. Methods: Explain the data collection process, statistical methods used, and
any assumptions made.
3. Results: Present findings with the help of visual aids (charts, graphs).
1. Data Collection:
2. Statistical Analysis:
3. Results:
If the p-value is less than 0.05, reject H₀ (suggesting the drug has a
significant effect).
4. Discussion:
💡 Key Takeaways:
Research Projects allow students to apply statistical methods to real-world
problems.
Practice Exercise:
Scenario: You are conducting a study to determine if a new heart
medication reduces cholesterol levels more effectively than the current
standard treatment. Create a research question, outline your hypothesis,
and suggest statistical methods for analysis.