Basic Biostatistics
Fig. 1 Pie chart: number of deaths by cause among 25–34 and 35–44 year olds, United States, 2003
Fig. 2 Component band chart showing number of deaths by cause among 25–44 year olds, United States, 1997 and
Spot and rate maps
• Maps that show the geographical distribution of disease or other epidemiological events
• Rate maps show:
  • Differences in rates across geographical locations
  • Prevalence
  • Incidence
  • Mortality
• Spot and rate maps can present data in either static or interactive form, e.g. the World Health Chart
Table 2: Functions of spot and rate maps
Spot maps                               | Rate maps
Show the locations of individual cases  | Show the distribution of rates (i.e. disease rates) across different areas
Each point represents a single case     | Use colors or shading to show differences in rates across regions
Fig. 3 Spot map of bacterial meningitis cases, Upper West Region, 2018–2020
Fig. 4 Rate maps of the world showing incidence rates and mortality rates of thyroid cancer among women
Bar charts and line graphs
• Median
  • The middle value after all the measurements have been put in order
• Mode
  • The value that occurs most frequently in a sample
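The two definitions above can be illustrated with a minimal sketch using Python's standard `statistics` module. The blood pressure readings are made-up values for illustration only.

```python
import statistics

# Hypothetical sample of systolic blood pressure readings (mmHg)
readings = [118, 122, 122, 125, 130, 135, 141]

# Median: the middle value once the data are sorted
median_value = statistics.median(readings)

# Mode: the value that occurs most often
mode_value = statistics.mode(readings)

print("median:", median_value)
print("mode:", mode_value)
```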
Measures of variability
• Standard error (SE)
  • Measure of potential error
  • Estimates the efficiency, accuracy and consistency of a sample
  • The higher the SE, the lower the reliability
  • $SE(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• Standard deviation (SD)
  • Measures how far apart values are from the mean. A low SD indicates that the values are close to the mean
  • Square root of the variance
  • $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
• Variance
  • Measures the average degree to which each value differs from the mean
  • Square of the standard deviation
  • $\text{variance} = \sigma^2$
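As a rough worked example of the three measures of variability, the sketch below computes the variance, standard deviation and standard error by hand, using the n − 1 denominator from the standard deviation formula. The data values are invented for illustration.

```python
import math

# Hypothetical sample values (illustration only)
values = [4.0, 6.0, 8.0, 10.0, 12.0]
n = len(values)
mean = sum(values) / n

# Variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

# Standard error of the mean: SD divided by the square root of n
se = sd / math.sqrt(n)

print("variance:", variance)
print("standard deviation:", round(sd, 4))
print("standard error:", round(se, 4))
```

With a larger sample and the same spread, `se` shrinks because n appears in the denominator, which is why larger samples give more reliable estimates of the mean.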
Basic concepts of statistical inference
• t-tests
• Chi-squared tests
• Correlation
• Regression
T-tests
• Tests whether two means differ significantly; the null hypothesis is that there is no difference
• Helps to judge whether an observed difference could be due to chance
• There are different types of t-tests:
  • Independent samples t-test: compares the means of two separate and unrelated groups (e.g. comparing mean ages between two different populations)
  • Paired samples t-test: used for two sets of measurements that are paired. Takes the dependency between the measurements into account (e.g. comparing the mean blood pressure in the same population before and after medication)
  • One-sample t-test: compares the mean of a group to a known or hypothesized value (e.g. comparing the level of pesticides in a population with the government-approved limit)
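The three t-test types can be sketched with the standard library alone. This is a minimal illustration, not a full implementation: the function names are hypothetical, the independent samples version assumes equal variances (pooled variance), and only the t statistic and degrees of freedom are computed. The p-value would then come from a t table or statistical software.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a sample mean against mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1


def paired_t(before, after):
    """Paired t-test: a one-sample test on the within-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def independent_t(group_a, group_b):
    """Independent samples t-test, assuming equal variances (pooled)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / se, na + nb - 2


# Hypothetical blood pressure readings before and after medication
before = [140, 150, 145, 160, 155]
after = [135, 142, 140, 150, 150]
t_paired, df_paired = paired_t(before, after)
print("paired t:", round(t_paired, 2), "df:", df_paired)
```

A large absolute t value relative to the critical value for the given degrees of freedom suggests the difference is unlikely to be due to chance alone.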
Chi-squared tests for cross-tabulations
• The chi-squared test is a statistical analysis used to determine whether there is a significant association between two categorical variables
• To perform this test, a cross-tabulation (contingency table) is created
• χ² = Σ [(Observed frequency − Expected frequency)² / Expected frequency]
• After calculating the chi-squared statistic, it is compared to the critical value from the chi-squared distribution with (rows − 1) × (columns − 1) degrees of freedom
• If the calculated chi-squared statistic is greater than the critical value, the null hypothesis is rejected
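The calculation can be sketched as follows, assuming a small hypothetical 2 × 2 table of exposure against disease status. Expected frequencies are taken as (row total × column total) / grand total, and 3.841 is the critical value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = exposed / unexposed, columns = diseased / healthy
table = [[30, 70],
         [10, 90]]
stat, df = chi_squared(table)

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-squared critical value, df = 1
reject_null = stat > CRITICAL_VALUE_DF1_ALPHA_05
print("chi-squared:", stat, "df:", df, "reject null:", reject_null)
```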
Correlation
• Quantifies the degree to which two variables vary together
• Results relating to correlation can be:
  • Positive
  • Negative
  • No correlation
• Correlation is typically measured with a correlation coefficient
• The most common is the Pearson correlation coefficient, r
  • r ranges from -1 to +1
  • If r is close to +1, positive correlation
  • If r is close to -1, negative correlation
  • If r is close to 0, zero or weak correlation
• To visualize correlation, scatter plots are the best choice
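A minimal sketch of the Pearson correlation coefficient is shown below. It divides the sum of cross-products of deviations by the product of the square roots of the sums of squared deviations. The function name and data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical data: y rises with x in the first case and falls in the second
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_positive, 3), round(r_negative, 3))
```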
Regression
• Statistical method used to examine the relationship between a dependent variable and one or more independent variables
• Shows how a change in an independent variable is related to a change in the dependent variable
• Regression models help to estimate values of a dependent variable based on the independent variables
• Common regression models:
  • Linear regression
  • Logistic regression
  • Cox proportional hazards regression
Linear regression
• Linear regression assumes a straight-line relationship between the dependent and independent variable.
• Mathematically, this model is expressed as:
Y = b0 + b1*X + ε
where:
Y = the dependent variable
X = the independent variable
b0 = the y-intercept, which represents the predicted value of Y when X is zero
b1 = the slope of the line, indicating how much Y is expected to change for a one-unit increase in X
ε = the error term, the part of Y that the straight line does not explain
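One common way to estimate b0 and b1 is ordinary least squares, which picks the line that minimizes the sum of squared errors. The sketch below assumes that method; the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope: change in Y per one-unit increase in X
    b0 = mean_y - b1 * mean_x  # intercept: predicted Y when X is zero
    return b0, b1


# Hypothetical data lying exactly on the line Y = 1 + 2X
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
predicted_at_4 = b0 + b1 * 4
print("intercept:", b0, "slope:", b1, "prediction at X=4:", predicted_at_4)
```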
Logistic regression
• Analyses the relationship between a categorical dependent variable and one or more independent variables
• Used for situations where the dependent variable is binary (e.g. diseased or not diseased)
• In logistic regression, the relationship between the independent variables and the probability of the outcome is modeled using the logistic (sigmoid) function
• The odds are the ratio of the probability of success to the probability of failure. Logistic regression estimates odds ratios: exponentiating a coefficient gives the factor by which the odds change for a one-unit increase in that variable
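The link between the sigmoid function, the odds and the odds ratio can be checked numerically. In this sketch the coefficients b0 and b1 are made-up values rather than fitted estimates, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(b0, b1, x):
    """Probability of the outcome for a single predictor x."""
    return sigmoid(b0 + b1 * x)


def odds(p):
    """Odds: probability of success divided by probability of failure."""
    return p / (1.0 - p)


# Hypothetical coefficients (not fitted to real data)
b0, b1 = -1.0, 0.5

odds_at_2 = odds(predicted_probability(b0, b1, 2))
odds_at_3 = odds(predicted_probability(b0, b1, 3))

# The ratio of odds for a one-unit increase in x equals exp(b1)
odds_ratio = odds_at_3 / odds_at_2
print(round(odds_ratio, 4), round(math.exp(b1), 4))
```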
Survival analyses and Cox proportional hazards models
• Used to investigate the relationship between the survival time of patients and predictor variables (covariates)
• It is a multivariable statistical model
• h(t) = h0(t)*exp(b1x1 + b2x2 + ... + bpxp)
• In this model,
  • t represents survival time
  • h(t) is the hazard function, which is determined by the covariates (x1, x2, ..., xp)
  • b1, b2, ..., bp are the coefficients that measure the impact of the covariates
  • h0(t) is the baseline hazard, the hazard when all covariates are zero
• Censoring affects the computation of Cox proportional hazards models
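To make the role of the coefficients concrete, the sketch below evaluates the exp(b1x1 + ... + bpxp) part of the model, which is the hazard relative to the baseline h0(t). The coefficient values are assumptions for illustration, not estimates from any dataset, and fitting a Cox model is outside the scope of this sketch.

```python
import math


def relative_hazard(coefficients, covariates):
    """exp(b1*x1 + ... + bp*xp): the hazard relative to the baseline h0(t)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return math.exp(linear_predictor)


# Hypothetical coefficients for (treatment, age in decades)
coefficients = [-0.7, 0.3]

treated = relative_hazard(coefficients, [1, 6])
untreated = relative_hazard(coefficients, [0, 6])

# Hazard ratio for treatment at the same age: equals exp(-0.7), below 1,
# meaning a lower hazard in the treated group under these assumed values
hazard_ratio = treated / untreated
print(round(hazard_ratio, 4))
```

Because h0(t) cancels in the ratio, the hazard ratio between two patients does not depend on time, which is the proportional hazards assumption.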
Kaplan-Meier survival curves
• Used to display time-to-event data, especially survival data
• The survival proportion ranges from 1.0 (or 100%) to 0.0 (or 0%)
• Handles censored observations: participants contribute information up to the time they are censored
• Used in the medical field to analyze:
  • Effectiveness of treatments
  • Survival rate of participants
• How to create a Kaplan-Meier survival curve:
  • Identify the starting point
  • Observe the event
  • Calculate the probability of survival
  • Plot the curve
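The "calculate the probability of survival" step can be sketched with the product-limit calculation: at each event time, the running survival probability is multiplied by (1 − events / number at risk). The function name and follow-up data below are hypothetical, and an event flag of 0 marks a censored participant.

```python
def kaplan_meier(times, events):
    """Return (time, survival probability) pairs at each observed event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        observed = 0   # events at this time
        leaving = 0    # events plus censored participants at this time
        while i < len(data) and data[i][0] == current_time:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((current_time, survival))
        # Both events and censored participants leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; 0 = censored
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for time, prob in curve:
    print(time, round(prob, 3))
```

Plotting these points as a step function, starting from 1.0 at time zero, gives the survival curve.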
Kaplan-Meier survival curve
Meta-analysis
• Statistical analysis combining the results of separate but comparable studies
• Used to identify an overall trend
• Different from other study designs: no new data are collected
• Steps for a successful meta-analysis:
  • Formulating the problem and study design
  • Identifying relevant studies
  • Excluding poorly conducted studies or those with major methodological flaws
  • Measuring, combining and interpreting the results
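The text does not say how the results are combined. One widely used approach is fixed-effect inverse-variance weighting, in which each study's effect estimate is weighted by 1 / SE². The sketch below assumes that approach; the function name and study values are invented for illustration.

```python
import math


def pooled_effect(effects, standard_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    # More precise studies (smaller SE) receive larger weights
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three studies
effects = [0.20, 0.40, 0.30]
standard_errors = [0.10, 0.10, 0.20]
pooled, pooled_se = pooled_effect(effects, standard_errors)
print(round(pooled, 4), round(pooled_se, 4))
```

The pooled standard error is smaller than that of any single study, which shows how combining studies can support conclusions that no individual small study could.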
Reasons for the surge in meta-analysis
• Ethical reasons
• Cost issues
• The need for an overall idea of effects in different populations
• The need to draw conclusive judgements from aggregated studies when the sample size of a single study is too small