
📚 Introduction to Biostatistics

Let's explore how biostatistics shapes scientific research and healthcare!

🌟 Why Do We Need Biostatistics?


Definition: The application of statistics to biological, medical, and health-
related fields.

Importance:

📈 Helps in understanding data trends in research and healthcare.


🧪 Provides tools to validate scientific hypotheses.
Example:
Biostatistics helps determine whether a new drug is effective by analyzing
patient recovery rates.

📊 Descriptive vs. Inferential Statistics


Descriptive Statistics
Purpose: Summarizes and organizes data.

Examples:

Mean, Median, Mode

Graphs like Bar Charts, Histograms

Inferential Statistics
Purpose: Makes predictions or generalizations based on sample data.

Example:

🧬 Testing if a vaccine reduces disease cases across populations.


📋 Types of Data
1. Grouped vs. Ungrouped Data

| Grouped Data | Ungrouped Data |
| --- | --- |
| Organized into intervals (e.g., age groups: 0–10, 11–20) | Raw, unorganized data (e.g., individual ages: 5, 12, 34) |

2. Categories of Data

| Category | Definition | Example |
| --- | --- | --- |
| Nominal | No specific order. | Blood type (A, B, AB, O) |
| Ordinal | Ranked data. | Disease severity (mild, moderate, severe) |
| Ranked | Sorted numerically/categorically. | Order of finishing in a race |
| Discrete | Whole numbers only. | Number of students in a class |
| Continuous | Can take any value within a range. | Height (e.g., 162.5 cm) |

📊 Representing Data
1. Tabulated Data
Organize information into rows and columns for clarity.

2. Graphical Representation
Bar Charts: For comparing categorical data.

Histograms: For frequency distributions of continuous data.

Pie Charts: For visualizing proportions.

1. Measures of Central Tendency


1.1 Mean (Average)
Formula:
Mean = (Sum of all data points) / (Total number of data points)

Example:

Data: 20, 22, 24, 26, 28

Sum = 120

Mean = 120 / 5 = 24

1.2 Median
Steps to Calculate:

1. Arrange data in ascending order.

2. Find the middle value:

If odd: Middle value is the median.

If even: Median = Average of two middle values.

Example (Odd Data Points):

Data: 20, 22, 24, 26, 28

Median = 24

Example (Even Data Points):

Data: 20, 22, 24, 26

Median = (22 + 24) / 2 = 23

1.3 Mode
Definition: The most frequently occurring value in the dataset.

Example:

Data: 20, 22, 22, 24, 28

Mode = 22 (occurs twice)

2. Measures of Dispersion
2.1 Range
Formula: Range = Maximum value - Minimum value

Example:

Data: 20, 22, 24, 26, 28

Range = 28 - 20 = 8

2.2 Interquartile Range (IQR)
Definition: The range of the middle 50% of the data.

Steps to Calculate:

1. Find Q1 (First Quartile): Median of the lower half.

2. Find Q3 (Third Quartile): Median of the upper half.

3. IQR = Q3 - Q1

Example:

Data: 20, 22, 24, 26, 28

Q1 = 21

Q3 = 27

IQR = 27 - 21 = 6

2.3 Variance
Formula: Variance = (Sum of squared deviations from the mean) / (Total number of
data points)

Steps to Calculate:

1. Find the mean.

2. Subtract the mean from each data point and square the result.

3. Sum the squared deviations.

4. Divide by the number of data points.

Example:

Data: 20, 22, 24, 26, 28

Mean = 24

Deviations: (20 - 24), (22 - 24), (24 - 24), (26 - 24), (28 - 24)

Squared deviations: 16, 4, 0, 4, 16

Sum of squared deviations = 40

Variance = 40 / 5 = 8

2.4 Standard Deviation (SD)

Formula: Standard Deviation = √Variance

Example:

Variance = 8

Standard Deviation = √8 ≈ 2.83

2.5 Coefficient of Variation (CV)


Formula: CV = (Standard Deviation / Mean) × 100

Example:

Mean = 24, SD = 2.83

CV = (2.83 / 24) × 100 ≈ 11.79%
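The worked examples above can be reproduced with Python's standard library; this is a minimal sketch using the same data as the notes. It uses the population formulas (divide by n), which match =VAR.P and =STDEV.P in the Excel section below.

```python
# Reproducing the central-tendency and dispersion examples above.
import statistics

data = [20, 22, 24, 26, 28]          # dataset used for mean/median/dispersion
mode_data = [20, 22, 22, 24, 28]     # dataset used for the mode example

mean = statistics.mean(data)            # 24
median = statistics.median(data)        # 24
mode = statistics.mode(mode_data)       # 22
variance = statistics.pvariance(data)   # 8  (population variance, divide by n)
sd = statistics.pstdev(data)            # ~2.83 (population standard deviation)
cv = sd / mean * 100                    # ~11.79 %

print(mean, median, mode, variance, round(sd, 2), round(cv, 2))
```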

3. Graphical Representation
Example - Histogram

| Age Range (Bins) | Frequency |
| --- | --- |
| 20–25 | 3 |
| 26–30 | 3 |
| 31–35 | 2 |

Steps:

Plot Age Ranges on the x-axis.

Plot Frequency on the y-axis.
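A short sketch of these steps, assuming matplotlib is available. Since only the binned frequency table is given (not the raw ages), it plots the table directly as a bar chart with age ranges on the x-axis and frequency on the y-axis.

```python
# Plotting the binned frequency table above as a bar chart.
import matplotlib.pyplot as plt

bins = ["20–25", "26–30", "31–35"]
frequency = [3, 3, 2]

plt.bar(bins, frequency)
plt.xlabel("Age Range (Bins)")
plt.ylabel("Frequency")
plt.title("Frequency distribution of ages (binned)")
plt.show()
```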

4. Using Tools (e.g., Excel)


Mean, Median, Mode:

Use formulas: =AVERAGE(range) , =MEDIAN(range) , and =MODE(range)

Variance and Standard Deviation:

Use formulas: =VAR.P(range) and =STDEV.P(range)

Graphical Representation:

Insert histograms, bar charts, or scatter plots via the "Insert" tab.

Chapter 2

📚 Probability and Probability Distributions

Let's dive into the world of probability, where we predict outcomes and analyze
random events in different scenarios.

🔢 Basic Probability Concepts


What is Probability?
Definition: Probability is the measure of the likelihood of an event
happening.

Formula: P(A) = (Number of favorable outcomes) / (Total number of possible outcomes)

Range: Probability values range from 0 to 1

P(A) = 0 means the event will not happen.

P(A) = 1 means the event will definitely happen.

Types of Events
Independent Events:

Events where the occurrence of one does not affect the probability of the
other.

Example: Tossing a coin and rolling a die.

Dependent Events:

Events where the occurrence of one affects the probability of the other.
Example: Drawing cards from a deck without replacement.

Mutually Exclusive Events:

Two events that cannot happen at the same time.

Example: Tossing a coin (it can’t land heads and tails at the same time).

Complementary Events:
Two events that together exhaust all possible outcomes.

Example: If event A is "rain," then the complement is "no rain."

📊 Discrete Probability Distributions


What is a Discrete Distribution?
Definition: A discrete probability distribution is a distribution of a random
variable that can take on a finite or countably infinite number of values.

Example: The number of heads when tossing three coins.

Key Properties:
1. Each probability is between 0 and 1.

2. The sum of all probabilities in a discrete distribution is equal to 1.

🎯 Binomial Distribution
What is Binomial Distribution?
Definition: A binomial distribution models the number of successes in a
fixed number of independent Bernoulli trials (yes/no outcomes).

Conditions:

Fixed number of trials (n).

Only two possible outcomes: success (S) or failure (F).

Constant probability of success in each trial (p).

Independence between trials.

Formula:
To calculate the probability of exactly k successes in n trials:
P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
Where:

(n choose k) is the binomial coefficient, calculated as: (n choose k) = n! / [k!(n - k)!]

p is the probability of success.

k is the number of successes.

n is the number of trials.

Example (Binomial Distribution):


Suppose you flip a coin 5 times. What is the probability of getting 3 heads?

n = 5 (number of trials)

p = 0.5 (probability of heads)

k = 3 (number of heads desired)

Solution:

1. First, calculate the binomial coefficient for n = 5 and k = 3: (5 choose 3) = 5! / [3!(5 - 3)!] = 10

2. Apply the formula: P(X = 3) = 10 * 0.5^3 * (0.5)^2 = 10 * 0.125 * 0.25 = 0.3125

So, the probability of getting exactly 3 heads is 0.3125 or 31.25%.
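A quick check of the coin-flip example, computed both from the formula and (assuming SciPy is available) with the built-in binomial distribution.

```python
# Binomial example: probability of exactly 3 heads in 5 fair coin flips.
from math import comb
from scipy.stats import binom

n, p, k = 5, 0.5, 3

manual = comb(n, k) * p**k * (1 - p)**(n - k)   # 10 * 0.125 * 0.25 = 0.3125
scipy_pmf = binom.pmf(k, n, p)                  # same value via SciPy

print(manual, scipy_pmf)   # both ≈ 0.3125
```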

📈 Poisson Distribution
What is Poisson Distribution?
Definition: A Poisson distribution models the probability of a given number
of events happening in a fixed interval of time or space, given a known
average rate.

Conditions:

The events occur independently.

The average number of events (λ) is known.

The events are rare and random.

Formula:
To calculate the probability of exactly k events happening in a fixed interval, use the Poisson formula:

P(X = k) = (λ^k * e^(-λ)) / k!


Where:

λ is the average number of events in the interval.

k is the number of events.

e is Euler's number (approximately 2.718).

Example (Poisson Distribution):


Suppose on average, 3 cars pass through a toll booth every 10 minutes. What
is the probability of exactly 2 cars passing in the next 10 minutes?

λ = 3 (average number of cars)

k = 2 (number of cars to occur)

Solution:

1. Apply the Poisson formula: P(X = 2) = (3^2 * e^(-3)) / 2! = (9 * 0.0498) / 2 ≈ 0.2241

So, the probability of exactly 2 cars passing in the next 10 minutes is 0.2241 or
22.41%.
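The toll-booth example computed two ways: directly from the formula and, assuming SciPy is available, with its Poisson distribution.

```python
# Poisson example: P(exactly 2 cars) when the average is 3 per 10 minutes.
from math import exp, factorial
from scipy.stats import poisson

lam, k = 3, 2

manual = lam**k * exp(-lam) / factorial(k)   # 9 * e^(-3) / 2
scipy_pmf = poisson.pmf(k, lam)              # same value

print(round(manual, 4), round(scipy_pmf, 4))   # ≈ 0.2240
```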

📐 Normal Probability Distribution


What is Normal Distribution?
Definition: A normal distribution is a continuous probability distribution that
is symmetric around the mean, where most of the observations cluster
around the central peak.

Properties:

Symmetric: The mean, median, and mode are all the same.

Bell-shaped curve: The curve is highest at the mean and decreases symmetrically on either side.

68-95-99.7 Rule:

68% of the data falls within 1 standard deviation of the mean.

95% falls within 2 standard deviations.

99.7% falls within 3 standard deviations.

Formula for Normal Distribution:

The Probability Density Function (PDF) is:
f(x) = (1 / (σ√(2π))) * e^(-((x - μ)^2 / (2σ^2)))
Where:

μ is the mean of the distribution.

σ is the standard deviation.

x is the variable being measured.

Example (Normal Distribution):


Suppose the scores of a class on a final exam follow a normal distribution with
a mean of 70 and a standard deviation of 10. What is the probability that a
randomly chosen student scored between 60 and 80?

μ = 70 (mean)

σ = 10 (standard deviation)

We need to find P(60 ≤ X ≤ 80).

To find this, use z-scores and refer to the normal distribution table or use a
calculator to find the cumulative probability.
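If SciPy is available, its normal CDF can stand in for the z-table. This sketch finishes the exam-score example above.

```python
# P(60 ≤ X ≤ 80) for a normal distribution with mean 70 and SD 10.
from scipy.stats import norm

mu, sigma = 70, 10

p = norm.cdf(80, loc=mu, scale=sigma) - norm.cdf(60, loc=mu, scale=sigma)
print(round(p, 4))   # ≈ 0.6827, i.e. about 68% (the 68-95-99.7 rule for ±1 SD)
```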

📚 Key Takeaways
Probability measures the likelihood of an event occurring.

Binomial Distribution is used for discrete events with two outcomes.

Poisson Distribution deals with rare events occurring in a fixed interval.

Normal Distribution is continuous and bell-shaped, used for many natural phenomena.

Chapter 3

📚 Correlation and Regression


Welcome to today's topic! Let's explore how we analyze relationships between
variables using correlation and regression techniques.

🔑 Basic Concepts of Correlation and Regression


What is Correlation?
Definition: Correlation is a statistical technique used to measure the
strength and direction of the relationship between two variables.

Key Idea: Correlation helps us understand if an increase (or decrease) in one variable results in an increase (or decrease) in another variable.

Types of Correlation:
1. Positive Correlation: As one variable increases, the other also increases.
Example: Height and weight are positively correlated (taller people tend to
weigh more).

2. Negative Correlation: As one variable increases, the other decreases.


Example: The number of hours spent watching TV and academic
performance might have a negative correlation.

3. No Correlation: No discernible relationship between the variables.


Example: Shoe size and intelligence.

Correlation Coefficient (r):


Formula:
r = (Σ(xi - x̄ )(yi - ȳ)) / √(Σ(xi - x̄ )² Σ(yi - ȳ)²)
Where:

xi and yi are the data points for variables x and y.

x̄ and ȳ are the means of x and y.

r ranges from -1 to +1:

r = +1: Perfect positive correlation.

r = -1: Perfect negative correlation.

r = 0: No correlation.
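A small sketch of computing r in code, assuming SciPy is available; the height/weight values below are invented purely to illustrate a positive correlation.

```python
# Pearson correlation coefficient on a tiny made-up dataset.
from scipy.stats import pearsonr

height_cm = [150, 160, 165, 170, 180]
weight_kg = [50, 56, 61, 65, 72]

r, p_value = pearsonr(height_cm, weight_kg)
print(round(r, 3))   # close to +1: strong positive correlation
```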

📊 Simple Linear Regression and Correlation


What is Simple Linear Regression?
Definition: Simple linear regression models the relationship between two
variables by fitting a straight line (linear equation) to the data.

Objective: Predict the value of the dependent variable (y) based on the
independent variable (x).

Regression Equation:
The formula for the regression line is:
y = β₀ + β₁x + ε

Where:

y is the dependent variable.

x is the independent variable.

β₀ is the y-intercept.

β₁ is the slope of the line (shows the change in y per unit change in x).

ε is the error term (residuals).

Steps to Calculate the Regression Line:


1. Find the mean of both x and y.

2. Calculate the slope (β₁):


β₁ = Σ((xi - x̄ )(yi - ȳ)) / Σ((xi - x̄ )²)

3. Calculate the intercept (β₀):


β₀ = ȳ - β₁x̄

4. Plot the line on a scatter plot of x and y.

Example of Simple Linear Regression


Suppose we have the following data on hours studied (x) and exam scores (y):

| Hours Studied (x) | Exam Score (y) |
| --- | --- |
| 1 | 60 |
| 2 | 65 |
| 3 | 70 |
| 4 | 75 |
| 5 | 80 |

Step 1: Calculate the Means


Mean of x:
x̄ = (1 + 2 + 3 + 4 + 5) / 5 = 3

Mean of y:
ȳ = (60 + 65 + 70 + 75 + 80) / 5 = 70

Step 2: Calculate the Slope (β₁)

β₁ = Σ((xi - x̄)(yi - ȳ)) / Σ((xi - x̄)²)
Σ((xi - x̄)(yi - ȳ)) = (-2)(-10) + (-1)(-5) + (0)(0) + (1)(5) + (2)(10) = 50
Σ((xi - x̄)²) = 4 + 1 + 0 + 1 + 4 = 10
β₁ = 50 / 10 = 5

Step 3: Calculate the Intercept (β₀)

β₀ = ȳ - β₁x̄ = 70 - (5)(3) = 55

Regression Equation:
y = 55 + 5x
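The same fit can be checked in code, assuming SciPy is available; the data are the five (x, y) pairs from the table above.

```python
# Simple linear regression on the hours-studied example.
from scipy.stats import linregress

hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

result = linregress(hours, scores)
print(result.slope, result.intercept)   # 5.0 and 55.0  →  y = 55 + 5x
print(result.rvalue)                    # 1.0 (the points lie exactly on a line)
```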

🔢 Introduction to Multiple Regression and Correlation

What is Multiple Regression?


Definition: Multiple regression is an extension of simple linear regression. It
models the relationship between a dependent variable and two or more
independent variables.

Objective: Predict the value of the dependent variable (y) based on the
values of several independent variables (x₁, x₂, x₃,...).

Multiple Regression Equation:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ... + βₙxₙ + ε
Where:

y is the dependent variable.

x₁, x₂, ... xₙ are the independent variables.

β₀ is the intercept.

β₁, β₂, ... βₙ are the coefficients for each independent variable.

Applications of Multiple Regression:


Predicting outcomes based on several factors.
Example: Predicting a person’s salary based on years of experience,
education level, and location.

Analyzing the impact of multiple factors on a dependent variable.


Example: Analyzing how advertising, price, and store location affect sales.
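A minimal sketch of fitting y = β₀ + β₁x₁ + β₂x₂ by ordinary least squares with NumPy (NumPy assumed); the salary, experience, and education values below are invented for illustration only.

```python
# Multiple regression via least squares: salary ~ experience + education.
import numpy as np

experience = np.array([1, 3, 5, 7, 9])     # years of experience (invented)
education = np.array([12, 16, 16, 18, 18]) # years of schooling (invented)
salary = np.array([40, 55, 65, 80, 90])    # salary in thousands (invented)

# Design matrix with an intercept column: y = β₀ + β₁x₁ + β₂x₂ + ε
X = np.column_stack([np.ones_like(experience), experience, education])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

print(beta)       # estimated [β₀, β₁, β₂]
print(X @ beta)   # fitted salaries for the five observations
```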

🔍 Key Differences Between Simple and Multiple Regression

| Feature | Simple Linear Regression | Multiple Regression |
| --- | --- | --- |
| Number of Variables | One independent variable (x) | Two or more independent variables (x₁, x₂, ..., xₙ) |
| Regression Equation | y = β₀ + β₁x + ε | y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε |
| Purpose | Predict a dependent variable using one predictor | Predict a dependent variable using multiple predictors |

📚 Key Takeaways
Correlation helps measure the strength and direction of relationships
between variables.

Simple Linear Regression models the relationship between two variables.

Multiple Regression extends simple regression to two or more independent variables, often improving predictions by considering multiple factors simultaneously.

Chapter 4

📚 Hypothesis Testing and Confidence Intervals

🔑 Basic Concepts of Hypothesis Testing


Hypothesis Testing: A statistical method used to make inferences or draw
conclusions about a population based on sample data.

Null Hypothesis (H₀): The hypothesis that there is no significant effect or difference. It is assumed true until evidence suggests otherwise.

Alternative Hypothesis (H₁ or Ha): The hypothesis that suggests a significant effect or difference.

🧮 Simple and Alternative Hypotheses


Null Hypothesis (H₀): States there is no effect or no difference.

Example: H₀: μ = 50 (The population mean is 50).

Alternative Hypothesis (H₁ or Ha): States there is an effect or a difference.

Example: H₁: μ ≠ 50 (The population mean is not 50).

🔍 Hypothesis Testing of a Single Population Mean, Proportion, and Variance
1. Testing the Population Mean
Formula (t-test for population mean):

t = (x̄ - μ₀) / (s / √n)


Where:

x̄ = sample mean

μ₀ = population mean

s = sample standard deviation

n = sample size

Example: Testing if the average height of a group of students is 170 cm.

2. Testing the Population Proportion


Formula (z-test for population proportion):

z = (p̂ - p₀) / √[p₀(1 - p₀) / n]


Where:

p̂ = sample proportion

p₀ = hypothesized population proportion

n = sample size

Example: Testing if the proportion of students who pass a test is 80%.

3. Testing the Population Variance


Formula (Chi-Square test for variance):
χ² = (n - 1) * s² / σ₀²
Where:

n = sample size

s² = sample variance

σ₀² = hypothesized population variance

Example: Testing if the variance in blood pressure measurements is 25.
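A sketch that plugs summary statistics into the three formulas above; all numbers are placeholders invented for illustration, and SciPy (assumed available) is used only to turn each statistic into a two-sided p-value.

```python
# Single-population tests from summary statistics (placeholder numbers).
from math import sqrt
from scipy.stats import t, norm, chi2

# 1. Population mean (one-sample t): invented sample of n = 40
xbar, mu0, s, n = 52.0, 50.0, 8.0, 40
t_stat = (xbar - mu0) / (s / sqrt(n))
p_mean = 2 * t.sf(abs(t_stat), df=n - 1)

# 2. Population proportion (z-test): 150 of 200 students passed, p₀ = 0.80
p_hat, p0, n_prop = 150 / 200, 0.80, 200
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_prop)
p_prop = 2 * norm.sf(abs(z_stat))

# 3. Population variance (chi-square): sample variance 30 vs σ₀² = 25, n = 20
s2, sigma0_sq, n_var = 30.0, 25.0, 20
chi2_stat = (n_var - 1) * s2 / sigma0_sq
p_var = 2 * min(chi2.cdf(chi2_stat, df=n_var - 1), chi2.sf(chi2_stat, df=n_var - 1))

print(t_stat, p_mean, z_stat, p_prop, chi2_stat, p_var)
```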

🔄 Hypothesis Testing for the Difference Between Two Populations (Mean, Proportion, and Variance)
1. Difference Between Two Population Means (Independent t-
test)
Formula:
t = (x̄ ₁ - x̄ ₂) / √[(s₁² / n₁) + (s₂² / n₂)]

Where:

x̄ ₁, x̄ ₂ = sample means

s₁², s₂² = sample variances

n₁, n₂ = sample sizes

Example: Testing if the mean weight loss is different between two groups
using different diets.

2. Difference Between Two Population Proportions (z-test)


Formula:
z = (p̂ ₁ - p̂ ₂) / √[p̂ (1 - p̂ ) * (1/n₁ + 1/n₂)]
Where:

p̂ ₁, p̂ ₂ = sample proportions

n₁, n₂ = sample sizes

p̂ = pooled sample proportion

Example: Testing if the proportion of smokers in two cities is different.

3. Difference Between Two Population Variances (F-test)


Formula:

F = s₁² / s₂²
Where:

s₁², s₂² = sample variances

Example: Testing if the variability in exam scores is different between two schools.
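The two-proportion z-test and the F-test can be coded directly from the formulas above; the counts and variances below are placeholders invented for illustration, and SciPy (assumed available) supplies the p-values.

```python
# Two-population tests from summary statistics (placeholder numbers).
from math import sqrt
from scipy.stats import norm, f

# Two-proportion z-test: smokers in two cities
x1, n1 = 120, 500    # city 1: 120 smokers out of 500
x2, n2 = 90, 450     # city 2: 90 smokers out of 450
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_value_z = 2 * norm.sf(abs(z))

# F-test for equality of variances: exam-score variances in two schools
s1_sq, m1 = 144.0, 25    # school 1: sample variance and sample size
s2_sq, m2 = 100.0, 30    # school 2
F = s1_sq / s2_sq
p_value_f = 2 * min(f.cdf(F, m1 - 1, m2 - 1), f.sf(F, m1 - 1, m2 - 1))

print(round(z, 3), round(p_value_z, 3), round(F, 3), round(p_value_f, 3))
```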

📊 Interpretation of p-value, Type I, and Type II Errors

1. p-value
Definition: The p-value indicates the strength of the evidence against the
null hypothesis.

Interpretation:

p-value ≤ α (significance level): Reject the null hypothesis (evidence suggests the alternative hypothesis is true).

p-value > α: Fail to reject the null hypothesis (not enough evidence for
the alternative hypothesis).

2. Type I Error (α)


Definition: Occurs when the null hypothesis is incorrectly rejected when it
is actually true.

"False positive."

3. Type II Error (β)


Definition: Occurs when the null hypothesis is not rejected even though it
is actually false.

"False negative."

📏 Confidence Intervals: Estimation and Interpretation

Confidence Interval (CI): A range of values, derived from sample data, that
is likely to contain the population parameter with a certain level of
confidence (e.g., 95%).

Formula for Confidence Interval of the Mean:


CI = x̄ ± z * (s / √n)
Where:

x̄ = sample mean

z = z-value for the desired confidence level (e.g., 1.96 for 95% CI)

s = sample standard deviation

n = sample size

Example: If the sample mean is 50, the sample standard deviation is 10, and
the sample size is 30, the 95% confidence interval for the population mean
is:

CI = 50 ± 1.96 * (10 / √30) ≈ 50 ± 3.58 → (46.42, 53.58)
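This sketch reproduces the 95% CI example above (sample mean 50, SD 10, n = 30); SciPy (assumed available) supplies the z-value, which could also simply be hard-coded as 1.96.

```python
# 95% confidence interval for the mean from summary statistics.
from math import sqrt
from scipy.stats import norm

xbar, s, n, conf = 50, 10, 30, 0.95
z = norm.ppf(1 - (1 - conf) / 2)        # ≈ 1.96 for 95%
margin = z * s / sqrt(n)                # ≈ 3.58

print(round(xbar - margin, 2), round(xbar + margin, 2))   # ≈ (46.42, 53.58)
```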

Confidence Interval for Proportions:


CI = p̂ ± z * √[p̂ (1 - p̂ ) / n]
Where:

p̂ = sample proportion

z = z-value for the desired confidence level

n = sample size

Confidence Interval for Odds Ratios:


CI = exp(ln(OR) ± z * SE)
Where:

OR = odds ratio

SE = standard error of the log(OR)

z = z-value for the desired confidence level

🔄 Relationship Between Confidence Intervals and Hypothesis Testing
Key Connection: A confidence interval for a parameter can be used to
perform hypothesis testing.

If the null value (e.g., 0 for the difference in means, 1 for odds ratio)
falls within the confidence interval, we fail to reject the null hypothesis.

If the null value falls outside the confidence interval, we reject the null
hypothesis.

💡 Key Takeaways
Hypothesis Testing allows us to make decisions about population
parameters based on sample data.

p-values help determine whether we should reject the null hypothesis or not.

Confidence Intervals provide a range of plausible values for a population parameter, and they are closely related to hypothesis testing.

Understanding Type I and Type II Errors is crucial for interpreting the results correctly.

💬 Practice Questions
t-test: Test whether the average salary of employees in a department is
different from $50,000.

Confidence Interval: Estimate the population mean of test scores given a sample mean of 78, sample standard deviation of 10, and sample size of 25.

p-value: If the p-value from a hypothesis test is 0.03 and α = 0.05, should
you reject or fail to reject the null hypothesis?

Chapter 5

📚 Parametric and Nonparametric Tests


🔑 What are Parametric and Nonparametric Tests?
Parametric Tests
Definition: Tests that assume the data follows a specific distribution
(usually normal distribution).

Used when: Data is interval or ratio scale and meets assumptions about
population parameters (e.g., mean, standard deviation).

Examples:
t-tests (One-sample, Independent two-sample, Paired sample)

Analysis of Variance (ANOVA)

Nonparametric Tests

Definition: Tests that do not assume a specific data distribution.

Used when: Data is ordinal or when parametric assumptions (normality) are not met.

Examples:
Mann-Whitney U test

Wilcoxon signed-rank test

Chi-square test

📊 Student’s t-tests
1. One-Sample t-test
Purpose: Compares the sample mean to a known population mean.

Formula:
t = (x̄ - μ₀) / (s / √n)
Where:

x̄ = sample mean

μ₀ = population mean

s = sample standard deviation

n = sample size

Example:

You want to test if the average height of a group of 30 students is 170 cm (population mean μ₀ = 170), and the sample mean is 168 cm with a sample standard deviation of 10 cm.
t = (168 - 170) / (10 / √30) ≈ -2 / 1.826 ≈ -1.10
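Re-checking the height example from its summary statistics; SciPy (assumed available) is used only for the p-value.

```python
# One-sample t-test from summary statistics.
from math import sqrt
from scipy.stats import t

xbar, mu0, s, n = 168, 170, 10, 30

t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)

print(round(t_stat, 2), round(p_value, 3))   # t ≈ -1.10, p ≈ 0.28 → fail to reject H₀ at α = 0.05
```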

2. Independent Two-Sample t-test


Purpose: Compares the means of two independent groups.

Formula:
t = (x̄ ₁ - x̄ ₂) / √[(s₁² / n₁) + (s₂² / n₂)]
Where:

x̄ ₁, x̄ ₂ = sample means

s₁², s₂² = sample variances

n₁, n₂ = sample sizes

Example:
Comparing average test scores of two different teaching methods. Group 1
(n₁ = 25) has a mean score of 75 and standard deviation of 12, and Group 2
(n₂ = 30) has a mean score of 80 and standard deviation of 15.
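Assuming SciPy is available, the teaching-methods example can be run directly from its summary statistics; equal_var=False matches the unpooled formula shown above.

```python
# Independent two-sample t-test from summary statistics.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=75, std1=12, nobs1=25,
    mean2=80, std2=15, nobs2=30,
    equal_var=False,   # Welch-style test, no equal-variance assumption
)
print(round(t_stat, 2), round(p_value, 3))
```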

3. Paired Sample t-test


Purpose: Compares the means of two related groups (e.g., pre-test and
post-test scores).

Formula:
t = d̄ / (s_d / √n)
Where:

d̄ = mean of the differences

s_d = standard deviation of the differences

n = number of pairs

Example:

Comparing pre- and post-treatment scores of patients in a clinical study.

🧪 Analysis of Variance (ANOVA)


1. One-Way ANOVA
Purpose: Compares the means of three or more independent groups.

Formula:
F = (Between-group variance) / (Within-group variance)

Example:
Comparing the effectiveness of three different diets on weight loss, where
you have three groups with different diets.
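A one-way ANOVA sketch for the three-diet example; the weight-loss values are invented for illustration, and SciPy is assumed to be available.

```python
# One-way ANOVA: do mean weight losses differ across three diets?
from scipy.stats import f_oneway

diet_a = [2.1, 3.0, 2.5, 2.8, 3.2]   # invented weight loss (kg)
diet_b = [1.0, 1.5, 1.2, 0.8, 1.4]
diet_c = [2.0, 2.2, 1.8, 2.5, 2.1]

f_stat, p_value = f_oneway(diet_a, diet_b, diet_c)
print(round(f_stat, 2), round(p_value, 4))   # small p → at least one diet mean differs
```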

2. Two-Way ANOVA

Purpose: Examines two factors simultaneously, and their interaction effect
on the dependent variable.

Formula:

The formula for the F-statistic is similar but involves multiple sources of
variance (Factor A, Factor B, and Interaction).

Example:
Testing the effect of teaching methods (A/B/C) and time of day
(Morning/Evening) on test scores.

3. ANCOVA (Analysis of Covariance)


Purpose: Combines ANOVA and regression to compare group means while adjusting for one or more continuous covariates.

Example:

Comparing the effectiveness of different teaching methods on student performance while controlling for previous academic achievement.

📊 Chi-Square Test
Purpose: Tests the relationship between two categorical variables.

Formula (Chi-Square Test of Independence):


χ² = Σ [(O - E)² / E]
Where:

O = observed frequency

E = expected frequency

Example:

Testing if the distribution of voter preferences (for political parties) is independent of region.
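A chi-square test of independence on an invented region-by-party contingency table (SciPy assumed).

```python
# Chi-square test of independence: region vs. party preference.
from scipy.stats import chi2_contingency

observed = [
    [120, 90, 40],   # Region 1: counts for parties A, B, C (invented)
    [80, 100, 60],   # Region 2
]
chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(round(chi2_stat, 2), round(p_value, 4), dof)
```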

🧩 Nonparametric Tests
1. Mann-Whitney U Test

Purpose: A nonparametric test that compares differences between two
independent groups when the dependent variable is ordinal or continuous
but not normally distributed.

Hypothesis:

Null Hypothesis (H₀): The distributions of the two groups are identical.

Alternative Hypothesis (H₁): The distributions of the two groups are different.

2. Wilcoxon Signed-Rank Test


Purpose: A nonparametric test to compare two related samples (e.g.,
before and after measurements).

Hypothesis:

Null Hypothesis (H₀): The median of the differences is zero.

Alternative Hypothesis (H₁): The median of the differences is not zero.
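Quick sketches of both tests with invented data (SciPy assumed); alternative="two-sided" is passed explicitly to the Mann-Whitney test.

```python
# Mann-Whitney U (independent groups) and Wilcoxon signed-rank (paired data).
from scipy.stats import mannwhitneyu, wilcoxon

# Mann-Whitney U: two independent groups with ordinal/skewed scores (invented)
group1 = [3, 5, 4, 6, 7, 5]
group2 = [8, 9, 7, 6, 9, 8]
u_stat, p_u = mannwhitneyu(group1, group2, alternative="two-sided")

# Wilcoxon signed-rank: paired before/after measurements (invented)
before = [10, 12, 9, 11, 13, 10]
after = [12, 14, 10, 13, 15, 11]
w_stat, p_w = wilcoxon(before, after)

print(round(p_u, 3), round(p_w, 3))
```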

3. Spearman’s Rank Correlation Coefficient (ρ)


Purpose: Measures the strength and direction of association between two
ranked variables.

Formula:
ρ = 1 - [(6 Σ dᵢ²) / (n(n² - 1))]
Where:

dᵢ = difference in ranks for each pair of data

n = number of observations

Example:
Assessing the relationship between job satisfaction and employee
performance by ranking both variables.
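Spearman's ρ on invented satisfaction and performance ranks (SciPy assumed); the same value follows from the formula above with the rank differences dᵢ.

```python
# Spearman's rank correlation between two ranked variables.
from scipy.stats import spearmanr

satisfaction_rank = [1, 2, 3, 4, 5, 6]   # invented ranks
performance_rank = [2, 1, 4, 3, 6, 5]

rho, p_value = spearmanr(satisfaction_rank, performance_rank)
print(round(rho, 3), round(p_value, 3))   # ρ ≈ 0.829 for these ranks
```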

📚 Summary of Key Tests


Student's t-test: Used to compare the means of one or two groups.

ANOVA: Compares the means of three or more groups, or tests for interaction effects between two factors.

Chi-Square Test: Tests relationships between categorical variables.

Nonparametric Tests: Used when data doesn’t meet assumptions of parametric tests. Includes tests like the Mann-Whitney U and Wilcoxon signed-rank tests.

💡 Classroom Practice
t-test: Compare the average height of students in two classrooms.

ANOVA: Test if three types of training methods have different effects on student performance.

Chi-Square: Analyze if gender influences voting preferences.

Spearman’s Rank Correlation: Investigate the relationship between employee experience and salary.

📚 Research Project and Practical Applications

In this section, we will explore how advanced statistical methods can be applied in real-world research projects and practical healthcare applications.

📝 Research Project: Applying Advanced Statistical Methods

Objective
Individual or Group Research Project:

Students will apply advanced statistical methods to a real-world research problem.

The project will involve data collection, analysis, and interpretation of results.

Research Project Phases:


1. Topic Selection: Choose a research topic related to biostatistics or
healthcare (e.g., analyzing the effectiveness of a new drug, comparing
different treatment options).

2. Data Collection: Collect primary or secondary data, ensuring it’s relevant and sufficient to answer the research question.

3. Data Analysis: Use statistical methods like hypothesis testing, regression analysis, ANOVA, and more.

4. Interpretation of Results: Draw conclusions from the data, test hypotheses, and interpret the significance of the findings.

5. Presentation: Share the results with the class in a professional manner.

Key Statistical Methods to Use:


Regression Analysis: To model relationships between variables.

ANOVA (Analysis of Variance): To compare means across different groups.

Chi-square Test: To examine relationships between categorical variables.

Confidence Intervals and Hypothesis Testing: To estimate population
parameters and test assumptions.

🏥 Practical Applications of Advanced Biostatistics Techniques in Healthcare
Real-World Applications
Advanced biostatistics techniques are widely used in healthcare to improve
patient outcomes, optimize treatment strategies, and inform public health
policies. Some common applications include:

1. Clinical Trials: Using statistical methods to analyze the efficacy of drugs or interventions.

Example: Analyzing the effectiveness of a new vaccine using t-tests and confidence intervals.

2. Epidemiology: Applying statistical models to study the distribution and determinants of diseases in populations.

Example: Using regression models to identify risk factors for heart disease.

3. Public Health Research: Analyzing large datasets to guide health policy decisions.

Example: Multiple regression to predict hospital readmissions based on patient characteristics.

4. Healthcare Quality Improvement: Using data to optimize healthcare services.

Example: ANOVA to assess differences in treatment outcomes across various clinics.

Common Statistical Techniques Used in Healthcare:


Survival Analysis: To analyze time-to-event data (e.g., time to recovery or death).

Logistic Regression: To model the probability of a binary outcome (e.g., disease present or absent).

Cohort Studies and Case-Control Studies: Common in epidemiology to explore relationships between exposures and outcomes.

📊 Key Steps in Research Project Implementation


1. Formulating Hypotheses:

Null Hypothesis (H₀): Assumes no effect or difference.

Alternative Hypothesis (H₁ or Ha): Assumes there is an effect or difference.

2. Data Collection and Preprocessing:

Clean the dataset (remove outliers, handle missing data).

Visualize data distributions (e.g., histograms, box plots).

3. Performing Statistical Analysis:

Descriptive Statistics: Calculate mean, median, mode, standard deviation.

Inferential Statistics: Use t-tests, chi-square tests, and regression analysis for hypothesis testing and estimating relationships.

4. Interpreting Results:

Look at p-values and confidence intervals to draw conclusions.

Assess model assumptions (normality, homogeneity of variance, etc.).

5. Creating Visualizations:

Bar Charts, Histograms, and Scatter Plots to present data trends.

Box Plots for comparing distributions across groups.

🎤 Presentation and Discussion of Research Findings

Structure of Presentation:
1. Introduction: Provide background on the research question, objectives, and
the importance of the study.

2. Methods: Explain the data collection process, statistical methods used, and
any assumptions made.

3. Results: Present findings with the help of visual aids (charts, graphs).

4. Discussion: Interpret results, explain limitations, and suggest future research directions.

5. Conclusion: Summarize the key takeaways and implications of the findings.

Tips for Effective Presentation:


Clear Visuals: Use simple and easy-to-understand charts/graphs.

Data Interpretation: Focus on explaining what the numbers mean in the context of the research question.

Engage Your Audience: Keep the presentation interactive by asking questions or discussing practical implications of the findings.

📈 Example Project Workflow


Research Question: Does a new drug reduce recovery time in patients with flu
symptoms compared to a placebo?

1. Data Collection:

Group 1: Patients receiving the drug (n = 50).

Group 2: Patients receiving placebo (n = 50).

Measure recovery time (in days) for each patient.

2. Statistical Analysis:

Hypothesis: H₀: μ₁ = μ₂ (No difference in recovery times).

Test: Independent two-sample t-test to compare means.

Confidence Interval: Calculate a 95% CI for the difference in recovery times (a worked sketch follows after this list).

3. Results:

If the p-value is less than 0.05, reject H₀ (suggesting the drug has a
significant effect).

Present the confidence interval for the difference in recovery times.

4. Discussion:

If the results support the hypothesis, discuss the practical implications for treatment options.

Address any limitations, such as sample size or potential biases.
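An end-to-end sketch of the workflow above with simulated recovery times for drug vs. placebo (all values invented); SciPy and NumPy are assumed. It uses Welch's t-test and a simple-df confidence interval as one reasonable analysis, not necessarily the exact approach a real trial would take.

```python
# Simulated drug-vs-placebo comparison: two-sample t-test plus a 95% CI
# for the difference in mean recovery times.
import numpy as np
from scipy.stats import ttest_ind, t

rng = np.random.default_rng(0)
drug = rng.normal(loc=5.5, scale=1.5, size=50)      # recovery time in days (simulated)
placebo = rng.normal(loc=7.0, scale=1.5, size=50)

# Welch's t-test (does not assume equal variances)
t_stat, p_value = ttest_ind(drug, placebo, equal_var=False)

# 95% CI for the difference in means
diff = drug.mean() - placebo.mean()
se = np.sqrt(drug.var(ddof=1) / len(drug) + placebo.var(ddof=1) / len(placebo))
df = len(drug) + len(placebo) - 2        # simple df; Welch-Satterthwaite is more precise
margin = t.ppf(0.975, df) * se

print(round(p_value, 4), (round(diff - margin, 2), round(diff + margin, 2)))
```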

💡 Key Takeaways:
Research Projects allow students to apply statistical methods to real-world
problems.

Practical Applications of biostatistics are vast in healthcare, ranging from clinical trials to public health research.

Effective Presentation of research findings involves clear visuals and strong data interpretation.

The use of advanced statistical methods is crucial in making informed decisions in healthcare.

Practice Exercise:
Scenario: You are conducting a study to determine if a new heart
medication reduces cholesterol levels more effectively than the current
standard treatment. Create a research question, outline your hypothesis,
and suggest statistical methods for analysis.
