Biostats - PST 426.sister HO Fawole
AND BIOSTATISTICS
PST 426
• Probability
• Normal distribution
• Sampling methods
• Tests of hypothesis
• Measurement of health
Reduction, summarization and
presentation of data
• The objects of measurement are referred to as cases or observations.
❖ Nominal – describes data that take one of a finite set of possible values with no
particular order. Examples include blood group (O, A, B, AB), eye colour, marital status
and gender.
❖ Ordinal – describes data that are ranked or ordered. Examples include size of
container (small, medium, large) and fatigue level (none, mild, moderate, severe).
Likert-scale responses, with the options ‘strongly agree’, ‘agree’, ‘neither agree
nor disagree’, ‘disagree’ and ‘strongly disagree’, also form an ordered scale.
*Note that the differences between adjacent ranks in ordered data may not be equal, and
these differences cannot be quantified.
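As a small illustration of what an ordered scale does and does not allow, the sketch below sorts fatigue levels and finds the median level. The responses are made up for illustration, and the names `LEVELS`, `RANK` and `responses` are hypothetical; the rank numbers are position codes only, so subtracting them would not give a meaningful quantity.

```python
from statistics import median_low

# Ordered levels of the fatigue scale from the example above.
LEVELS = ["none", "mild", "moderate", "severe"]
RANK = {level: i for i, level in enumerate(LEVELS)}  # position codes, not quantities

# Hypothetical responses from seven patients (made up for illustration).
responses = ["mild", "severe", "none", "moderate", "mild", "mild", "severe"]

# Order is meaningful, so responses can be sorted and a median level found.
ordered = sorted(responses, key=RANK.get)
median_level = LEVELS[median_low([RANK[r] for r in responses])]

print(ordered)
print(median_level)
```

For nominal data such as blood group there is no such ordering, so only counts per category would be meaningful.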
Reduction, summarization and
presentation of data (contd)
[Figure: easiest way to tell whether data are metric or not (YES/NO decision flowchart)]
Adapted from Bowers (2014). Medical Statistics from Scratch: An Introduction for Health
Professionals
Reduction, summarization and
presentation of data (contd)
• A set of data can be difficult to interpret because
it contains a lot of information.
Ordinal data
Example of frequency tables

Discrete metric data (n = 100 mothers)

Parity (number of pregnancies)    Frequency (number of mothers)
0                                 49
1                                 18
2                                 17
3                                 11
4                                 2
5                                 1
6                                 1
7                                 0
8                                 0
9                                 0
10                                1

Continuous metric data (n = 100 mothers)

Birthweight (g)                   Frequency (number of mothers)
1500-1999                         2
2000-2499                         5
2500-2999                         27
3000-3499                         28
3500-3999                         27
4000-4499                         9
4500-4999                         2
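A minimal sketch of how such tables could be built from raw observations is given below. The raw values are hypothetical (they are not the data behind the tables above), and the 500 g class width is simply assumed to match the birthweight classes shown.

```python
from collections import Counter

# Hypothetical raw observations (made up; not the data behind the tables above).
parities = [0, 2, 1, 0, 0, 3, 1, 0, 2, 0]
birthweights_g = [2850, 3120, 3490, 2210, 3675, 2990, 3300, 4100, 3050, 2760]

# Discrete metric data: count how often each individual value occurs.
parity_freq = Counter(parities)

# Continuous metric data: count observations falling in 500 g class intervals.
WIDTH = 500
class_freq = Counter((w // WIDTH) * WIDTH for w in birthweights_g)

for value in sorted(parity_freq):
    print(value, parity_freq[value])
for lower in sorted(class_freq):
    print(f"{lower}-{lower + WIDTH - 1}", class_freq[lower])
```

Discrete values are counted one by one, whereas continuous values have to be grouped into classes before counting.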
Example of frequency tables

Open-ended groups (n = 100 mothers)

Parity (number of pregnancies)    Frequency (number of mothers)
0                                 49
1                                 18
2                                 17
3                                 11
4                                 2
5                                 1
≥6                                2
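The pooling of sparse upper values into one open-ended group can be sketched as follows, using the parity frequencies from the full table earlier in these notes. The cutoff of 6 and the names `CUTOFF` and `grouped` are assumptions made for illustration.

```python
# Parity frequencies from the full table earlier in these notes (n = 100 mothers).
parity_freq = {0: 49, 1: 18, 2: 17, 3: 11, 4: 2, 5: 1,
               6: 1, 7: 0, 8: 0, 9: 0, 10: 1}

CUTOFF = 6  # values at or above this are pooled into one open-ended group

# Keep individual values below the cutoff, then add the pooled group.
grouped = {str(k): f for k, f in parity_freq.items() if k < CUTOFF}
grouped[f">={CUTOFF}"] = sum(f for k, f in parity_freq.items() if k >= CUTOFF)

for label, freq in grouped.items():
    print(label, freq)
```

The total stays at 100 mothers; only the detail in the sparse upper tail is lost.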
Histogram
• Once the raw data have been tabulated as a
frequency table or distribution, a variety of
techniques are available for graphical
presentation of a given set of measurements.
• For most continuous data sets, the best
diagram to use is a histogram.
• Histograms have bars that are drawn to touch
each other, reflecting the continuity of
the data that make up the bars.
Histogram
Bar charts
• When the data are discrete and the
frequencies refer to individual values, they are
displayed graphically using bar charts
(bar graphs).
• A bar chart plots the frequency of each
category as a bar whose height represents
that frequency.
• Bar charts are drawn with a gap between
neighbouring bars so that they are easily
distinguished from histograms.
Frequency polygons
• For a frequency polygon, only the midpoints of
the tops of the bars are marked, and these
points are then joined by straight lines.
• Frequency polygons are particularly useful for
comparing two or more sets of data.
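As a rough sketch, the points of a frequency polygon can be computed from a grouped frequency table. The example uses the birthweight classes from the frequency table earlier in these notes and assumes the usual convention that each point sits above the midpoint of its class; drawing the lines is left to whatever plotting tool is available.

```python
# Birthweight classes (g) and frequencies from the frequency table earlier in these notes.
classes = [(1500, 1999, 2), (2000, 2499, 5), (2500, 2999, 27), (3000, 3499, 28),
           (3500, 3999, 27), (4000, 4499, 9), (4500, 4999, 2)]

# Each polygon vertex sits above the middle of its class, at the bar's height.
points = [((lower + upper) / 2, freq) for lower, upper, freq in classes]

for midpoint, freq in points:
    print(midpoint, freq)
```

Because only points and lines are drawn, several polygons can share one set of axes, which is what makes them convenient for comparing data sets.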
Frequency polygon
Summary measures and
measures of location
• In addition to the graphical techniques, it is often
useful to obtain quantitative summaries of
certain aspects of the data.
• Most simple summary measures can be
divided into two types: first, quantities that
are “typical” of the data and, second, quantities
that summarise the variability of the data.
• The former are known as measures of location
and the latter as measures of spread.
Sample mean:
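In the usual definition, the sample mean is the sum of the observations divided by the number of observations n. The sketch below applies this to the parity frequencies from the frequency table earlier in these notes, and also shows the median (another measure of location) and the standard deviation (a measure of spread) using Python's `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median, stdev

# Parity frequencies from the frequency table earlier in these notes (n = 100).
parity_freq = {0: 49, 1: 18, 2: 17, 3: 11, 4: 2, 5: 1, 6: 1, 10: 1}

# Expand the table back into individual observations.
observations = [value for value, freq in parity_freq.items() for _ in range(freq)]

n = len(observations)
sample_mean = sum(observations) / n  # sum of observations divided by n

print(n, sample_mean)
print(mean(observations), median(observations))  # measures of location
print(stdev(observations))                       # a measure of spread
```

For this table the mean is 1.14 pregnancies while the median is 1, a reminder that different measures of location can give different "typical" values for skewed data.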
• Significant difference
• Significant correlations
• Regression analysis
Parametric and non-
parametric tests
Parametric Test
• A parametric test is a hypothesis test that
supports general statements about the mean
of the parent population.
• It rests on the underlying assumption that the
variable is normally distributed and that the
mean is known or assumed to be known.
• The variables of interest in the population are
assumed to be measured on an interval
scale.
• It is based on the parameters of the normal curve.
• The data must meet these assumptions, or
parametric statistics should not be used.
Nonparametric Test
• A nonparametric test is a hypothesis test that
does not rest on assumptions about the underlying
distribution, i.e. it does not require the population’s
distribution to be described by specific parameters.
• Hence, it is alternatively known as a
distribution-free test.
• The test is mainly based on differences in medians.
• The test assumes that the variables are measured
on a nominal or ordinal level.
• It is used when the independent variables are
non-metric.
• Nonparametric statistics are not based on the parameters
of the normal curve.
• If the data violate the assumptions of the usual parametric
test, its nonparametric equivalent is used.
• Also consider using nonparametric equivalents when
the sample size is limited (e.g., n < 30) or when there are
outliers that cannot be removed.
• Although nonparametric tests are more flexible than
parametric tests, they are generally less powerful;
therefore, most statisticians recommend parametric
statistics when their assumptions are met.
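One reason outliers push the choice towards median-based methods can be seen in a small sketch. The measurements below are made up for illustration; the point is only that a single extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical measurements (made up for illustration).
sample = [4, 5, 5, 6, 7]
with_outlier = sample + [40]  # one extreme value added

print(mean(sample), median(sample))
print(mean(with_outlier), median(with_outlier))
```

Here the mean more than doubles when the outlier is added, while the median only moves from 5 to 5.5.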
Selecting statistical tests
• There is a wide range of statistical tests.
• The decision of which statistical test to use
depends on the research design, the
distribution of the data, and the type of
variable.
• In general, if the data are normally distributed,
parametric tests are preferred.
• If the data are not normally distributed,
non-parametric tests are selected.
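The rules of thumb in these notes can be collected into a small helper, sketched below. The function name `suggest_test_family` and its parameters are hypothetical, and the logic is only a restatement of the guidance above (normality, measurement level, n < 30, outliers), not a substitute for judgement about the research design.

```python
def suggest_test_family(normally_distributed: bool, n: int,
                        has_unremovable_outliers: bool = False,
                        metric: bool = True) -> str:
    """Rule of thumb from these notes; an illustrative sketch only."""
    if not metric:                 # nominal or ordinal measurements
        return "nonparametric"
    if not normally_distributed:   # normality assumption violated
        return "nonparametric"
    if n < 30 or has_unremovable_outliers:
        return "consider nonparametric"
    return "parametric"


print(suggest_test_family(normally_distributed=True, n=100))
print(suggest_test_family(normally_distributed=False, n=100))
print(suggest_test_family(normally_distributed=True, n=12))
```

The specific test within each family still depends on the research design and the type of variable.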
Common statistical tests and their uses