
📚 Introduction to Biostatistics

Let's explore how biostatistics shapes scientific research and healthcare!

🌟 Why Do We Need Biostatistics?


Definition: The application of statistics to biological, medical, and health-
related fields.

Importance:

📈 Helps in understanding data trends in research and healthcare.


🧪 Provides tools to validate scientific hypotheses.
Example:
Biostatistics helps determine whether a new drug is effective by analyzing
patient recovery rates.

📊 Descriptive vs. Inferential Statistics


Descriptive Statistics
Purpose: Summarizes and organizes data.

Examples:

Mean, Median, Mode

Graphs like Bar Charts, Histograms

Inferential Statistics
Purpose: Makes predictions or generalizations based on sample data.

Example:

🧬 Testing if a vaccine reduces disease cases across populations.


📋 Types of Data
1. Grouped vs. Ungrouped Data

| Grouped Data | Ungrouped Data |
| --- | --- |
| Organized into intervals (e.g., age groups: 0–10, 11–20) | Raw, unorganized data (e.g., individual ages: 5, 12, 34) |

2. Categories of Data

| Category | Definition | Example |
| --- | --- | --- |
| Nominal | No specific order. | Blood type (A, B, AB, O) |
| Ordinal | Ranked data. | Disease severity (mild, moderate, severe) |
| Ranked | Sorted numerically/categorically. | Order of finishing in a race |
| Discrete | Whole numbers only. | Number of students in a class |
| Continuous | Can take any value within a range. | Height (e.g., 162.5 cm) |

📊 Representing Data
1. Tabulated Data
Organize information into rows and columns for clarity.

2. Graphical Representation
Bar Charts: For comparing categorical data.

Histograms: For frequency distributions of continuous data.

Pie Charts: For visualizing proportions.

1. Measures of Central Tendency


1.1 Mean (Average)
Formula:
Mean = (Sum of all data points) / (Total number of data points)

Example:

Data: 20, 22, 24, 26, 28

Sum = 120

Mean = 120 / 5 = 24

1.2 Median
Steps to Calculate:

1. Arrange data in ascending order.

2. Find the middle value:

If odd: Middle value is the median.

If even: Median = Average of two middle values.

Example (Odd Data Points):

Data: 20, 22, 24, 26, 28

Median = 24

Example (Even Data Points):

Data: 20, 22, 24, 26

Median = (22 + 24) / 2 = 23

1.3 Mode
Definition: The most frequently occurring value in the dataset.

Example:

Data: 20, 22, 22, 24, 28

Mode = 22 (occurs twice)

2. Measures of Dispersion
2.1 Range
Formula: Range = Maximum value - Minimum value

Example:

Data: 20, 22, 24, 26, 28

Range = 28 - 20 = 8

2.2 Interquartile Range (IQR)
Definition: The range of the middle 50% of the data.

Steps to Calculate:

1. Find Q1 (First Quartile): Median of the lower half.

2. Find Q3 (Third Quartile): Median of the upper half.

3. IQR = Q3 - Q1

Example:

Data: 20, 22, 24, 26, 28

Q1 = 21

Q3 = 27

IQR = 27 - 21 = 6

2.3 Variance
Formula: Variance = (Sum of squared deviations from the mean) / (Total number of
data points)

Steps to Calculate:

1. Find the mean.

2. Subtract the mean from each data point and square the result.

3. Sum the squared deviations.

4. Divide by the number of data points.

Example:

Data: 20, 22, 24, 26, 28

Mean = 24

Deviations: (20 - 24), (22 - 24), (24 - 24), (26 - 24), (28 - 24)

Squared deviations: 16, 4, 0, 4, 16

Sum of squared deviations = 40

Variance = 40 / 5 = 8

2.4 Standard Deviation (SD)

Formula: Standard Deviation = √Variance

Example:

Variance = 8

Standard Deviation = √8 ≈ 2.83

2.5 Coefficient of Variation (CV)


Formula: CV = (Standard Deviation / Mean) × 100

Example:

Mean = 24, SD = 2.83

CV = (2.83 / 24) × 100 ≈ 11.79%
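The worked examples above can be reproduced with Python's standard library; this is a minimal sketch using the same data as the notes. It uses the population formulas (divide by n), which match =VAR.P and =STDEV.P in the Excel section below.

```python
# Reproducing the central-tendency and dispersion examples above.
import statistics

data = [20, 22, 24, 26, 28]          # dataset used for mean/median/dispersion
mode_data = [20, 22, 22, 24, 28]     # dataset used for the mode example

mean = statistics.mean(data)            # 24
median = statistics.median(data)        # 24
mode = statistics.mode(mode_data)       # 22
variance = statistics.pvariance(data)   # 8  (population variance, divide by n)
sd = statistics.pstdev(data)            # ~2.83 (population standard deviation)
cv = sd / mean * 100                    # ~11.79 %

print(mean, median, mode, variance, round(sd, 2), round(cv, 2))
```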

3. Graphical Representation
Example - Histogram

| Age Range (Bins) | Frequency |
| --- | --- |
| 20–25 | 3 |
| 26–30 | 3 |
| 31–35 | 2 |

Steps:

Plot Age Ranges on the x-axis.

Plot Frequency on the y-axis.
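A short sketch of these steps, assuming matplotlib is available. Since only the binned frequency table is given (not the raw ages), it plots the table directly as a bar chart with age ranges on the x-axis and frequency on the y-axis.

```python
# Plotting the binned frequency table above as a bar chart.
import matplotlib.pyplot as plt

bins = ["20–25", "26–30", "31–35"]
frequency = [3, 3, 2]

plt.bar(bins, frequency)
plt.xlabel("Age Range (Bins)")
plt.ylabel("Frequency")
plt.title("Frequency distribution of ages (binned)")
plt.show()
```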

4. Using Tools (e.g., Excel)


Mean, Median, Mode:

Use formulas: =AVERAGE(range) , =MEDIAN(range) , and =MODE(range)

Variance and Standard Deviation:

Use formulas: =VAR.P(range) and =STDEV.P(range)

Graphical Representation:

Insert histograms, bar charts, or scatter plots via the "Insert" tab.

Chapter 2

📚 Probability and Probability Distributions

Let's dive into the world of probability, where we predict outcomes and analyze
random events in different scenarios.

🔢 Basic Probability Concepts


What is Probability?
Definition: Probability is the measure of the likelihood of an event
happening.

Formula: P(A) = (Number of favorable outcomes) / (Total number of possible outcomes)

Range: Probability values range from 0 to 1

P(A) = 0 means the event will not happen.

P(A) = 1 means the event will definitely happen.

Types of Events
Independent Events:

Events where the occurrence of one does not affect the probability of the
other.

Example: Tossing a coin and rolling a die.

Dependent Events:

Events where the occurrence of one affects the probability of the other.
Example: Drawing cards from a deck without replacement.

Mutually Exclusive Events:

Two events that cannot happen at the same time.

Example: Tossing a coin (it can’t land heads and tails at the same time).

Complementary Events:
Two events that together exhaust all possible outcomes.

Example: If event A is "rain," then the complement is "no rain."

📊 Discrete Probability Distributions


What is a Discrete Distribution?
Definition: A discrete probability distribution is a distribution of a random
variable that can take on a finite or countably infinite number of values.

Example: The number of heads when tossing three coins.

Key Properties:
1. Each probability is between 0 and 1.

2. The sum of all probabilities in a discrete distribution is equal to 1.

🎯 Binomial Distribution
What is Binomial Distribution?
Definition: A binomial distribution models the number of successes in a
fixed number of independent Bernoulli trials (yes/no outcomes).

Conditions:

Fixed number of trials (n).

Only two possible outcomes: success (S) or failure (F).

Constant probability of success in each trial (p).

Independence between trials.

Formula:
To calculate the probability of exactly k successes in n trials:
P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
Where:

(n choose k) is the binomial coefficient, calculated as: (n choose k) = n! / [k!(n - k)!]

p is the probability of success.

k is the number of successes.

n is the number of trials.

Example (Binomial Distribution):


Suppose you flip a coin 5 times. What is the probability of getting 3 heads?

n = 5 (number of trials)

p = 0.5 (probability of heads)

k = 3 (number of heads desired)

Solution:

1. First, calculate the binomial coefficient for n = 5 and k = 3: (5 choose 3) = 5! / [3!(5 - 3)!] = 10

2. Apply the formula: P(X = 3) = 10 * 0.5^3 * (0.5)^2 = 10 * 0.125 * 0.25 = 0.3125

So, the probability of getting exactly 3 heads is 0.3125 or 31.25%.
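A quick check of the coin-flip example, computed both from the formula and (assuming SciPy is available) with the built-in binomial distribution.

```python
# Binomial example: probability of exactly 3 heads in 5 fair coin flips.
from math import comb
from scipy.stats import binom

n, p, k = 5, 0.5, 3

manual = comb(n, k) * p**k * (1 - p)**(n - k)   # 10 * 0.125 * 0.25 = 0.3125
scipy_pmf = binom.pmf(k, n, p)                  # same value via SciPy

print(manual, scipy_pmf)   # both ≈ 0.3125
```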

📈 Poisson Distribution
What is Poisson Distribution?
Definition: A Poisson distribution models the probability of a given number
of events happening in a fixed interval of time or space, given a known
average rate.

Conditions:

The events occur independently.

The average number of events (λ) is known.

The events are rare and random.

Formula:
To calculate the probability of exactly k events happening in a fixed interval, use the Poisson formula:

P(X = k) = (λ^k * e^(-λ)) / k!


Where:

λ is the average number of events in the interval.

k is the number of events.

e is Euler's number (approximately 2.718).

Example (Poisson Distribution):


Suppose on average, 3 cars pass through a toll booth every 10 minutes. What
is the probability of exactly 2 cars passing in the next 10 minutes?

λ = 3 (average number of cars)

k = 2 (number of cars to occur)

Solution:

1. Apply the Poisson formula: P(X = 2) = (3^2 * e^(-3)) / 2! = (9 * 0.0498) / 2 ≈ 0.2241

So, the probability of exactly 2 cars passing in the next 10 minutes is 0.2241 or
22.41%.
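The toll-booth example computed two ways: directly from the formula and, assuming SciPy is available, with its Poisson distribution.

```python
# Poisson example: P(exactly 2 cars) when the average is 3 per 10 minutes.
from math import exp, factorial
from scipy.stats import poisson

lam, k = 3, 2

manual = lam**k * exp(-lam) / factorial(k)   # 9 * e^(-3) / 2
scipy_pmf = poisson.pmf(k, lam)              # same value

print(round(manual, 4), round(scipy_pmf, 4))   # ≈ 0.2240
```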

📐 Normal Probability Distribution


What is Normal Distribution?
Definition: A normal distribution is a continuous probability distribution that
is symmetric around the mean, where most of the observations cluster
around the central peak.

Properties:

Symmetric: The mean, median, and mode are all the same.

Bell-shaped curve: The curve is highest at the mean and decreases symmetrically on either side.

68-95-99.7 Rule:

68% of the data falls within 1 standard deviation of the mean.

95% falls within 2 standard deviations.

99.7% falls within 3 standard deviations.

Formula for Normal Distribution:

The Probability Density Function (PDF) is:
f(x) = (1 / (σ√(2π))) * e^(-((x - μ)^2 / (2σ^2)))
Where:

μ is the mean of the distribution.

σ is the standard deviation.

x is the variable being measured.

Example (Normal Distribution):


Suppose the scores of a class on a final exam follow a normal distribution with
a mean of 70 and a standard deviation of 10. What is the probability that a
randomly chosen student scored between 60 and 80?

μ = 70 (mean)

σ = 10 (standard deviation)

We need to find P(60 ≤ X ≤ 80).

To find this, use z-scores and refer to the normal distribution table or use a
calculator to find the cumulative probability.
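If SciPy is available, its normal CDF can stand in for the z-table. This sketch finishes the exam-score example above.

```python
# P(60 ≤ X ≤ 80) for a normal distribution with mean 70 and SD 10.
from scipy.stats import norm

mu, sigma = 70, 10

p = norm.cdf(80, loc=mu, scale=sigma) - norm.cdf(60, loc=mu, scale=sigma)
print(round(p, 4))   # ≈ 0.6827, i.e. about 68% (the 68-95-99.7 rule for ±1 SD)
```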

📚 Key Takeaways
Probability measures the likelihood of an event occurring.

Binomial Distribution is used for discrete events with two outcomes.

Poisson Distribution deals with rare events occurring in a fixed interval.

Normal Distribution is continuous and bell-shaped, used for many natural phenomena.

Chapter 3

📚 Correlation and Regression


Welcome to today's topic! Let's explore how we analyze relationships between
variables using correlation and regression techniques.

🔑 Basic Concepts of Correlation and Regression


What is Correlation?
Definition: Correlation is a statistical technique used to measure the
strength and direction of the relationship between two variables.

Key Idea: Correlation helps us understand if an increase (or decrease) in one variable results in an increase (or decrease) in another variable.

Types of Correlation:
1. Positive Correlation: As one variable increases, the other also increases.
Example: Height and weight are positively correlated (taller people tend to
weigh more).

2. Negative Correlation: As one variable increases, the other decreases.


Example: The number of hours spent watching TV and academic
performance might have a negative correlation.

3. No Correlation: No discernible relationship between the variables.


Example: Shoe size and intelligence.

Correlation Coefficient (r):


Formula:
r = (Σ(xi - x̄ )(yi - ȳ)) / √(Σ(xi - x̄ )² Σ(yi - ȳ)²)
Where:

xi and yi are the data points for variables x and y.

x̄ and ȳ are the means of x and y.

r ranges from -1 to +1:

r = +1: Perfect positive correlation.

r = -1: Perfect negative correlation.

r = 0: No correlation.
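A small sketch of computing r in code, assuming SciPy is available; the height/weight values below are invented purely to illustrate a positive correlation.

```python
# Pearson correlation coefficient on a tiny made-up dataset.
from scipy.stats import pearsonr

height_cm = [150, 160, 165, 170, 180]
weight_kg = [50, 56, 61, 65, 72]

r, p_value = pearsonr(height_cm, weight_kg)
print(round(r, 3))   # close to +1: strong positive correlation
```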

📊 Simple Linear Regression and Correlation


What is Simple Linear Regression?
Definition: Simple linear regression models the relationship between two
variables by fitting a straight line (linear equation) to the data.

Objective: Predict the value of the dependent variable (y) based on the
independent variable (x).

Regression Equation:
The formula for the regression line is:
y = β₀ + β₁x + ε

Where:

y is the dependent variable.

x is the independent variable.

β₀ is the y-intercept.

β₁ is the slope of the line (shows the change in y per unit change in x).

ε is the error term (residuals).

Steps to Calculate the Regression Line:


1. Find the mean of both x and y.

2. Calculate the slope (β₁):


β₁ = Σ((xi - x̄ )(yi - ȳ)) / Σ((xi - x̄ )²)

3. Calculate the intercept (β₀):


β₀ = ȳ - β₁x̄

4. Plot the line on a scatter plot of x and y.

Example of Simple Linear Regression


Suppose we have the following data on hours studied (x) and exam scores (y):

| Hours Studied (x) | Exam Score (y) |
| --- | --- |
| 1 | 60 |
| 2 | 65 |
| 3 | 70 |
| 4 | 75 |
| 5 | 80 |

Step 1: Calculate the Means


Mean of x:
x̄ = (1 + 2 + 3 + 4 + 5) / 5 = 3

Mean of y:
ȳ = (60 + 65 + 70 + 75 + 80) / 5 = 70

Step 2: Calculate the Slope (β₁)

β₁ = Σ((xi - x̄)(yi - ȳ)) / Σ((xi - x̄)²)
Σ((xi - x̄)(yi - ȳ)) = (-2)(-10) + (-1)(-5) + (0)(0) + (1)(5) + (2)(10) = 50
Σ((xi - x̄)²) = 4 + 1 + 0 + 1 + 4 = 10
β₁ = 50 / 10 = 5

Step 3: Calculate the Intercept (β₀)

β₀ = ȳ - β₁x̄ = 70 - (5)(3) = 55

Regression Equation:
y = 55 + 5x
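The same fit can be checked in code, assuming SciPy is available; the data are the five (x, y) pairs from the table above.

```python
# Simple linear regression on the hours-studied example.
from scipy.stats import linregress

hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

result = linregress(hours, scores)
print(result.slope, result.intercept)   # 5.0 and 55.0  →  y = 55 + 5x
print(result.rvalue)                    # 1.0 (the points lie exactly on a line)
```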

🔢 Introduction to Multiple Regression and Correlation

What is Multiple Regression?


Definition: Multiple regression is an extension of simple linear regression. It
models the relationship between a dependent variable and two or more
independent variables.

Objective: Predict the value of the dependent variable (y) based on the
values of several independent variables (x₁, x₂, x₃,...).

Multiple Regression Equation:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ... + βₙxₙ + ε
Where:

y is the dependent variable.

x₁, x₂, ... xₙ are the independent variables.

β₀ is the intercept.

β₁, β₂, ... βₙ are the coefficients for each independent variable.

Applications of Multiple Regression:


Predicting outcomes based on several factors.
Example: Predicting a person’s salary based on years of experience,
education level, and location.

Analyzing the impact of multiple factors on a dependent variable.


Example: Analyzing how advertising, price, and store location affect sales.
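A minimal sketch of fitting y = β₀ + β₁x₁ + β₂x₂ by ordinary least squares with NumPy (NumPy assumed); the salary, experience, and education values below are invented for illustration only.

```python
# Multiple regression via least squares: salary ~ experience + education.
import numpy as np

experience = np.array([1, 3, 5, 7, 9])     # years of experience (invented)
education = np.array([12, 16, 16, 18, 18]) # years of schooling (invented)
salary = np.array([40, 55, 65, 80, 90])    # salary in thousands (invented)

# Design matrix with an intercept column: y = β₀ + β₁x₁ + β₂x₂ + ε
X = np.column_stack([np.ones_like(experience), experience, education])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

print(beta)       # estimated [β₀, β₁, β₂]
print(X @ beta)   # fitted salaries for the five observations
```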

🔍 Key Differences Between Simple and Multiple Regression

| Feature | Simple Linear Regression | Multiple Regression |
| --- | --- | --- |
| Number of Variables | One independent variable (x) | Two or more independent variables (x₁, x₂, ..., xₙ) |
| Regression Equation | y = β₀ + β₁x + ε | y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε |
| Purpose | Predict a dependent variable using one predictor | Predict a dependent variable using multiple predictors |

📚 Key Takeaways
Correlation helps measure the strength and direction of relationships
between variables.

Simple Linear Regression models the relationship between two variables.

Multiple Regression extends simple regression to two or more independent variables, often improving predictions by considering multiple factors simultaneously.

Chapter 4

📚 Hypothesis Testing and Confidence Intervals

🔑 Basic Concepts of Hypothesis Testing


Hypothesis Testing: A statistical method used to make inferences or draw
conclusions about a population based on sample data.

Null Hypothesis (H₀): The hypothesis that there is no significant effect or difference. It is assumed true until evidence suggests otherwise.

Alternative Hypothesis (H₁ or Ha): The hypothesis that suggests a significant effect or difference.

🧮 Simple and Alternative Hypotheses


Null Hypothesis (H₀): States there is no effect or no difference.

Example: H₀: μ = 50 (The population mean is 50).

Alternative Hypothesis (H₁ or Ha): States there is an effect or a difference.

Example: H₁: μ ≠ 50 (The population mean is not 50).

🔍 Hypothesis Testing of a Single Population Mean, Proportion, and Variance
1. Testing the Population Mean
Formula (t-test for population mean):

t = (x̄ - μ₀) / (s / √n)


Where:

x̄ = sample mean

μ₀ = population mean

s = sample standard deviation

n = sample size

Example: Testing if the average height of a group of students is 170 cm.

2. Testing the Population Proportion


Formula (z-test for population proportion):

z = (p̂ - p₀) / √[p₀(1 - p₀) / n]


Where:

p̂ = sample proportion

p₀ = hypothesized population proportion

n = sample size

Example: Testing if the proportion of students who pass a test is 80%.

3. Testing the Population Variance


Formula (Chi-Square test for variance):
χ² = (n - 1) * s² / σ₀²
Where:

n = sample size

s² = sample variance

σ₀² = hypothesized population variance

Example: Testing if the variance in blood pressure measurements is 25.
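A sketch that plugs summary statistics into the three formulas above; all numbers are placeholders invented for illustration, and SciPy (assumed available) is used only to turn each statistic into a two-sided p-value.

```python
# Single-population tests from summary statistics (placeholder numbers).
from math import sqrt
from scipy.stats import t, norm, chi2

# 1. Population mean (one-sample t): invented sample of n = 40
xbar, mu0, s, n = 52.0, 50.0, 8.0, 40
t_stat = (xbar - mu0) / (s / sqrt(n))
p_mean = 2 * t.sf(abs(t_stat), df=n - 1)

# 2. Population proportion (z-test): 150 of 200 students passed, p₀ = 0.80
p_hat, p0, n_prop = 150 / 200, 0.80, 200
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_prop)
p_prop = 2 * norm.sf(abs(z_stat))

# 3. Population variance (chi-square): sample variance 30 vs σ₀² = 25, n = 20
s2, sigma0_sq, n_var = 30.0, 25.0, 20
chi2_stat = (n_var - 1) * s2 / sigma0_sq
p_var = 2 * min(chi2.cdf(chi2_stat, df=n_var - 1), chi2.sf(chi2_stat, df=n_var - 1))

print(t_stat, p_mean, z_stat, p_prop, chi2_stat, p_var)
```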

🔄 Hypothesis Testing for the Difference Between Two Populations (Mean, Proportion, and Variance)
1. Difference Between Two Population Means (Independent t-
test)
Formula:
t = (x̄ ₁ - x̄ ₂) / √[(s₁² / n₁) + (s₂² / n₂)]

Where:

x̄ ₁, x̄ ₂ = sample means

s₁², s₂² = sample variances

n₁, n₂ = sample sizes

Example: Testing if the mean weight loss is different between two groups
using different diets.

2. Difference Between Two Population Proportions (z-test)


Formula:
z = (p̂ ₁ - p̂ ₂) / √[p̂ (1 - p̂ ) * (1/n₁ + 1/n₂)]
Where:

p̂ ₁, p̂ ₂ = sample proportions

n₁, n₂ = sample sizes

p̂ = pooled sample proportion

Example: Testing if the proportion of smokers in two cities is different.

3. Difference Between Two Population Variances (F-test)


Formula:

F = s₁² / s₂²
Where:

s₁², s₂² = sample variances

Example: Testing if the variability in exam scores is different between two schools.
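The two-proportion z-test and the F-test can be coded directly from the formulas above; the counts and variances below are placeholders invented for illustration, and SciPy (assumed available) supplies the p-values.

```python
# Two-population tests from summary statistics (placeholder numbers).
from math import sqrt
from scipy.stats import norm, f

# Two-proportion z-test: smokers in two cities
x1, n1 = 120, 500    # city 1: 120 smokers out of 500
x2, n2 = 90, 450     # city 2: 90 smokers out of 450
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_value_z = 2 * norm.sf(abs(z))

# F-test for equality of variances: exam-score variances in two schools
s1_sq, m1 = 144.0, 25    # school 1: sample variance and sample size
s2_sq, m2 = 100.0, 30    # school 2
F = s1_sq / s2_sq
p_value_f = 2 * min(f.cdf(F, m1 - 1, m2 - 1), f.sf(F, m1 - 1, m2 - 1))

print(round(z, 3), round(p_value_z, 3), round(F, 3), round(p_value_f, 3))
```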

📊 Interpretation of p-value, Type I, and Type II Errors

1. p-value
Definition: The p-value indicates the strength of the evidence against the
null hypothesis.

Interpretation:

p-value ≤ α (significance level): Reject the null hypothesis (evidence suggests the alternative hypothesis is true).

p-value > α: Fail to reject the null hypothesis (not enough evidence for
the alternative hypothesis).

2. Type I Error (α)


Definition: Occurs when the null hypothesis is incorrectly rejected when it
is actually true.

"False positive."

3. Type II Error (β)


Definition: Occurs when the null hypothesis is not rejected even though it
is actually false.

"False negative."

📏 Confidence Intervals: Estimation and Interpretation

Confidence Interval (CI): A range of values, derived from sample data, that
is likely to contain the population parameter with a certain level of
confidence (e.g., 95%).

Formula for Confidence Interval of the Mean:


CI = x̄ ± z * (s / √n)
Where:

x̄ = sample mean

z = z-value for the desired confidence level (e.g., 1.96 for 95% CI)

s = sample standard deviation

n = sample size

Example: If the sample mean is 50, the sample standard deviation is 10, and
the sample size is 30, the 95% confidence interval for the population mean
is:

CI = 50 ± 1.96 * (10 / √30) ≈ 50 ± 3.58 → (46.42, 53.58)
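This sketch reproduces the 95% CI example above (sample mean 50, SD 10, n = 30); SciPy (assumed available) supplies the z-value, which could also simply be hard-coded as 1.96.

```python
# 95% confidence interval for the mean from summary statistics.
from math import sqrt
from scipy.stats import norm

xbar, s, n, conf = 50, 10, 30, 0.95
z = norm.ppf(1 - (1 - conf) / 2)        # ≈ 1.96 for 95%
margin = z * s / sqrt(n)                # ≈ 3.58

print(round(xbar - margin, 2), round(xbar + margin, 2))   # ≈ (46.42, 53.58)
```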

Confidence Interval for Proportions:


CI = p̂ ± z * √[p̂ (1 - p̂ ) / n]
Where:

p̂ = sample proportion

z = z-value for the desired confidence level

n = sample size

Confidence Interval for Odds Ratios:


CI = exp(ln(OR) ± z * SE)
Where:

OR = odds ratio

SE = standard error of the log(OR)

z = z-value for the desired confidence level

🔄 Relationship Between Confidence Intervals and Hypothesis Testing
Key Connection: A confidence interval for a parameter can be used to
perform hypothesis testing.

If the null value (e.g., 0 for the difference in means, 1 for odds ratio)
falls within the confidence interval, we fail to reject the null hypothesis.

If the null value falls outside the confidence interval, we reject the null
hypothesis.

💡 Key Takeaways
Hypothesis Testing allows us to make decisions about population
parameters based on sample data.

p-values help determine whether we should reject the null hypothesis or not.

Confidence Intervals provide a range of plausible values for a population parameter, and they are closely related to hypothesis testing.

Understanding Type I and Type II Errors is crucial for interpreting the results correctly.

💬 Practice Questions
t-test: Test whether the average salary of employees in a department is
different from $50,000.

Confidence Interval: Estimate the population mean of test scores given a sample mean of 78, sample standard deviation of 10, and sample size of 25.

p-value: If the p-value from a hypothesis test is 0.03 and α = 0.05, should
you reject or fail to reject the null hypothesis?

Chapter 5

📚 Parametric and Nonparametric Tests


🔑 What are Parametric and Nonparametric Tests?
Parametric Tests
Definition: Tests that assume the data follows a specific distribution
(usually normal distribution).

Used when: Data is interval or ratio scale and meets assumptions about
population parameters (e.g., mean, standard deviation).

Examples:
t-tests (One-sample, Independent two-sample, Paired sample)

Analysis of Variance (ANOVA)

Nonparametric Tests

Definition: Tests that do not assume a specific data distribution.

Used when: Data is ordinal or when parametric assumptions (normality) are not met.

Examples:
Mann-Whitney U test

Wilcoxon signed-rank test

Chi-square test

📊 Student’s t-tests
1. One-Sample t-test
Purpose: Compares the sample mean to a known population mean.

Formula:
t = (x̄ - μ₀) / (s / √n)
Where:

x̄ = sample mean

μ₀ = population mean

s = sample standard deviation

n = sample size

Example:

You want to test if the average height of a group of 30 students is 170 cm (population mean μ₀ = 170), and the sample mean is 168 cm with a sample standard deviation of 10 cm.
t = (168 - 170) / (10 / √30) ≈ -2 / 1.826 ≈ -1.10
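Re-checking the height example from its summary statistics; SciPy (assumed available) is used only for the p-value.

```python
# One-sample t-test from summary statistics.
from math import sqrt
from scipy.stats import t

xbar, mu0, s, n = 168, 170, 10, 30

t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)

print(round(t_stat, 2), round(p_value, 3))   # t ≈ -1.10, p ≈ 0.28 → fail to reject H₀ at α = 0.05
```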

2. Independent Two-Sample t-test


Purpose: Compares the means of two independent groups.

Formula:
t = (x̄ ₁ - x̄ ₂) / √[(s₁² / n₁) + (s₂² / n₂)]
Where:

x̄ ₁, x̄ ₂ = sample means

s₁², s₂² = sample variances

n₁, n₂ = sample sizes

Example:
Comparing average test scores of two different teaching methods. Group 1
(n₁ = 25) has a mean score of 75 and standard deviation of 12, and Group 2
(n₂ = 30) has a mean score of 80 and standard deviation of 15.
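Assuming SciPy is available, the teaching-methods example can be run directly from its summary statistics; equal_var=False matches the unpooled formula shown above.

```python
# Independent two-sample t-test from summary statistics.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=75, std1=12, nobs1=25,
    mean2=80, std2=15, nobs2=30,
    equal_var=False,   # Welch-style test, no equal-variance assumption
)
print(round(t_stat, 2), round(p_value, 3))
```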

3. Paired Sample t-test


Purpose: Compares the means of two related groups (e.g., pre-test and
post-test scores).

Formula:
t = d̄ / (s_d / √n)
Where:

d̄ = mean of the differences

s_d = standard deviation of the differences

n = number of pairs

Example:

Comparing pre- and post-treatment scores of patients in a clinical study.

🧪 Analysis of Variance (ANOVA)


1. One-Way ANOVA
Purpose: Compares the means of three or more independent groups.

Formula:
F = (Between-group variance) / (Within-group variance)

Example:
Comparing the effectiveness of three different diets on weight loss, where
you have three groups with different diets.
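A one-way ANOVA sketch for the three-diet example; the weight-loss values are invented for illustration, and SciPy is assumed to be available.

```python
# One-way ANOVA: do mean weight losses differ across three diets?
from scipy.stats import f_oneway

diet_a = [2.1, 3.0, 2.5, 2.8, 3.2]   # invented weight loss (kg)
diet_b = [1.0, 1.5, 1.2, 0.8, 1.4]
diet_c = [2.0, 2.2, 1.8, 2.5, 2.1]

f_stat, p_value = f_oneway(diet_a, diet_b, diet_c)
print(round(f_stat, 2), round(p_value, 4))   # small p → at least one diet mean differs
```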

2. Two-Way ANOVA

Purpose: Examines two factors simultaneously, and their interaction effect
on the dependent variable.

Formula:

The formula for the F-statistic is similar but involves multiple sources of
variance (Factor A, Factor B, and Interaction).

Example:
Testing the effect of teaching methods (A/B/C) and time of day
(Morning/Evening) on test scores.

3. ANCOVA (Analysis of Covariance)


Purpose: Combines ANOVA and regression to compare group means while adjusting for one or more continuous covariates.

Example:

Comparing the effectiveness of different teaching methods on student performance while controlling for previous academic achievement.

📊 Chi-Square Test
Purpose: Tests the relationship between two categorical variables.

Formula (Chi-Square Test of Independence):


χ² = Σ [(O - E)² / E]
Where:

O = observed frequency

E = expected frequency

Example:

Testing if the distribution of voter preferences (for political parties) is independent of region.
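A chi-square test of independence on an invented region-by-party contingency table (SciPy assumed).

```python
# Chi-square test of independence: region vs. party preference.
from scipy.stats import chi2_contingency

observed = [
    [120, 90, 40],   # Region 1: counts for parties A, B, C (invented)
    [80, 100, 60],   # Region 2
]
chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(round(chi2_stat, 2), round(p_value, 4), dof)
```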

🧩 Nonparametric Tests
1. Mann-Whitney U Test

Purpose: A nonparametric test that compares differences between two
independent groups when the dependent variable is ordinal or continuous
but not normally distributed.

Hypothesis:

Null Hypothesis (H₀): The distributions of the two groups are identical.

Alternative Hypothesis (H₁): The distributions of the two groups are different.

2. Wilcoxon Signed-Rank Test


Purpose: A nonparametric test to compare two related samples (e.g.,
before and after measurements).

Hypothesis:

Null Hypothesis (H₀): The median of the differences is zero.

Alternative Hypothesis (H₁): The median of the differences is not zero.
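Quick sketches of both tests with invented data (SciPy assumed); alternative="two-sided" is passed explicitly to the Mann-Whitney test.

```python
# Mann-Whitney U (independent groups) and Wilcoxon signed-rank (paired data).
from scipy.stats import mannwhitneyu, wilcoxon

# Mann-Whitney U: two independent groups with ordinal/skewed scores (invented)
group1 = [3, 5, 4, 6, 7, 5]
group2 = [8, 9, 7, 6, 9, 8]
u_stat, p_u = mannwhitneyu(group1, group2, alternative="two-sided")

# Wilcoxon signed-rank: paired before/after measurements (invented)
before = [10, 12, 9, 11, 13, 10]
after = [12, 14, 10, 13, 15, 11]
w_stat, p_w = wilcoxon(before, after)

print(round(p_u, 3), round(p_w, 3))
```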

3. Spearman’s Rank Correlation Coefficient (ρ)


Purpose: Measures the strength and direction of association between two
ranked variables.

Formula:
ρ = 1 - [(6 Σ dᵢ²) / (n(n² - 1))]
Where:

dᵢ = difference in ranks for each pair of data

n = number of observations

Example:
Assessing the relationship between job satisfaction and employee
performance by ranking both variables.
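Spearman's ρ on invented satisfaction and performance ranks (SciPy assumed); the same value follows from the formula above with the rank differences dᵢ.

```python
# Spearman's rank correlation between two ranked variables.
from scipy.stats import spearmanr

satisfaction_rank = [1, 2, 3, 4, 5, 6]   # invented ranks
performance_rank = [2, 1, 4, 3, 6, 5]

rho, p_value = spearmanr(satisfaction_rank, performance_rank)
print(round(rho, 3), round(p_value, 3))   # ρ ≈ 0.829 for these ranks
```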

📚 Summary of Key Tests


Student's t-test: Used to compare the means of one or two groups.

ANOVA: Compares the means of three or more groups, or tests for interaction effects between two factors.

Chi-Square Test: Tests relationships between categorical variables.

Nonparametric Tests: Used when data doesn’t meet assumptions of parametric tests. Includes tests like the Mann-Whitney U and Wilcoxon signed-rank tests.

💡 Classroom Practice
t-test: Compare the average height of students in two classrooms.

ANOVA: Test if three types of training methods have different effects on student performance.

Chi-Square: Analyze if gender influences voting preferences.

Spearman’s Rank Correlation: Investigate the relationship between employee experience and salary.

📚 Research Project and Practical Applications

In this section, we will explore how advanced statistical methods can be applied in real-world research projects and practical healthcare applications.

📝 Research Project: Applying Advanced Statistical Methods

Objective
Individual or Group Research Project:

Students will apply advanced statistical methods to a real-world research problem.

The project will involve data collection, analysis, and interpretation of results.

Research Project Phases:


1. Topic Selection: Choose a research topic related to biostatistics or
healthcare (e.g., analyzing the effectiveness of a new drug, comparing
different treatment options).

2. Data Collection: Collect primary or secondary data, ensuring it’s relevant and sufficient to answer the research question.

3. Data Analysis: Use statistical methods like hypothesis testing, regression analysis, ANOVA, and more.

4. Interpretation of Results: Draw conclusions from the data, test hypotheses, and interpret the significance of the findings.

5. Presentation: Share the results with the class in a professional manner.

Key Statistical Methods to Use:


Regression Analysis: To model relationships between variables.

ANOVA (Analysis of Variance): To compare means across different groups.

Chi-square Test: To examine relationships between categorical variables.

Confidence Intervals and Hypothesis Testing: To estimate population
parameters and test assumptions.

🏥 Practical Applications of Advanced Biostatistics Techniques in Healthcare
Real-World Applications
Advanced biostatistics techniques are widely used in healthcare to improve
patient outcomes, optimize treatment strategies, and inform public health
policies. Some common applications include:

1. Clinical Trials: Using statistical methods to analyze the efficacy of drugs or interventions.

Example: Analyzing the effectiveness of a new vaccine using t-tests and confidence intervals.

2. Epidemiology: Applying statistical models to study the distribution and determinants of diseases in populations.

Example: Using regression models to identify risk factors for heart disease.

3. Public Health Research: Analyzing large datasets to guide health policy decisions.

Example: Multiple regression to predict hospital readmissions based on patient characteristics.

4. Healthcare Quality Improvement: Using data to optimize healthcare services.

Example: ANOVA to assess differences in treatment outcomes across various clinics.

Common Statistical Techniques Used in Healthcare:


Survival Analysis: To analyze time-to-event data (e.g., time to recovery or death).

Logistic Regression: To model the probability of a binary outcome (e.g., disease present or absent).

Cohort Studies and Case-Control Studies: Common in epidemiology to explore relationships between exposures and outcomes.

📊 Key Steps in Research Project Implementation


1. Formulating Hypotheses:

Null Hypothesis (H₀): Assumes no effect or difference.

Alternative Hypothesis (H₁ or Ha): Assumes there is an effect or difference.

2. Data Collection and Preprocessing:

Clean the dataset (remove outliers, handle missing data).

Visualize data distributions (e.g., histograms, box plots).

3. Performing Statistical Analysis:

Descriptive Statistics: Calculate mean, median, mode, standard deviation.

Inferential Statistics: Use t-tests, chi-square tests, and regression analysis for hypothesis testing and estimating relationships.

4. Interpreting Results:

Look at p-values and confidence intervals to draw conclusions.

Assess model assumptions (normality, homogeneity of variance, etc.).

5. Creating Visualizations:

Bar Charts, Histograms, and Scatter Plots to present data trends.

Box Plots for comparing distributions across groups.

🎤 Presentation and Discussion of Research Findings

Structure of Presentation:
1. Introduction: Provide background on the research question, objectives, and
the importance of the study.

2. Methods: Explain the data collection process, statistical methods used, and
any assumptions made.

3. Results: Present findings with the help of visual aids (charts, graphs).

4. Discussion: Interpret results, explain limitations, and suggest future research directions.

5. Conclusion: Summarize the key takeaways and implications of the findings.

Tips for Effective Presentation:


Clear Visuals: Use simple and easy-to-understand charts/graphs.

Data Interpretation: Focus on explaining what the numbers mean in the context of the research question.

Engage Your Audience: Keep the presentation interactive by asking questions or discussing practical implications of the findings.

📈 Example Project Workflow


Research Question: Does a new drug reduce recovery time in patients with flu
symptoms compared to a placebo?

1. Data Collection:

Group 1: Patients receiving the drug (n = 50).

Group 2: Patients receiving placebo (n = 50).

Measure recovery time (in days) for each patient.

2. Statistical Analysis:

Hypothesis: H₀: μ₁ = μ₂ (No difference in recovery times).

Test: Independent two-sample t-test to compare means.

Confidence Interval: Calculate a 95% CI for the difference in recovery times (a worked sketch follows after this list).

3. Results:

If the p-value is less than 0.05, reject H₀ (suggesting the drug has a
significant effect).

Present the confidence interval for the difference in recovery times.

4. Discussion:

If the results support the hypothesis, discuss the practical implications for treatment options.

Address any limitations, such as sample size or potential biases.
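An end-to-end sketch of the workflow above with simulated recovery times for drug vs. placebo (all values invented); SciPy and NumPy are assumed. It uses Welch's t-test and a simple-df confidence interval as one reasonable analysis, not necessarily the exact approach a real trial would take.

```python
# Simulated drug-vs-placebo comparison: two-sample t-test plus a 95% CI
# for the difference in mean recovery times.
import numpy as np
from scipy.stats import ttest_ind, t

rng = np.random.default_rng(0)
drug = rng.normal(loc=5.5, scale=1.5, size=50)      # recovery time in days (simulated)
placebo = rng.normal(loc=7.0, scale=1.5, size=50)

# Welch's t-test (does not assume equal variances)
t_stat, p_value = ttest_ind(drug, placebo, equal_var=False)

# 95% CI for the difference in means
diff = drug.mean() - placebo.mean()
se = np.sqrt(drug.var(ddof=1) / len(drug) + placebo.var(ddof=1) / len(placebo))
df = len(drug) + len(placebo) - 2        # simple df; Welch-Satterthwaite is more precise
margin = t.ppf(0.975, df) * se

print(round(p_value, 4), (round(diff - margin, 2), round(diff + margin, 2)))
```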

💡 Key Takeaways:
Research Projects allow students to apply statistical methods to real-world
problems.

Practical Applications of biostatistics are vast in healthcare, ranging from clinical trials to public health research.

Effective Presentation of research findings involves clear visuals and strong data interpretation.

The use of advanced statistical methods is crucial in making informed decisions in healthcare.

Practice Exercise:
Scenario: You are conducting a study to determine if a new heart
medication reduces cholesterol levels more effectively than the current
standard treatment. Create a research question, outline your hypothesis,
and suggest statistical methods for analysis.
