Bio Statistics 3
Lecture
Prepared by
Baneen Ahmed
DESCRIPTIVE STATISTICS:
Descriptive statistics are statistical summarizing methods that measure properties of a
numerical variable; these measures are computed either as sample
statistics or as population parameters.
-A descriptive measure computed from the data of a sample is called
a statistic.
-A descriptive measure computed from the data of a population is
called a parameter.
Descriptive measures are divided into three groups:
1- Measures of central tendency (measures of location)
2- Measures of dispersion
3- Skewness and kurtosis
Solving Steps:
First, arrange the data in ascending order.
Second, multiply each value by its frequency.
Third, apply the values to the mean formula:
$$\bar{x} = \frac{\sum f x}{\sum f}$$
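As a rough illustration, here is a minimal Python sketch of these three steps for a small hypothetical frequency table (the values and frequencies are made up for illustration and are not taken from the lecture's data):

```python
# Frequency-weighted mean: x_bar = sum(f * x) / sum(f)
# Hypothetical (value, frequency) pairs, for illustration only
data = [(3.0, 5), (2.8, 3), (3.5, 2), (3.2, 4)]

# Step 1: arrange the values in ascending order
data.sort(key=lambda pair: pair[0])

# Step 2: multiply each value by its frequency; Step 3: apply the mean formula
total = sum(x * f for x, f in data)
n = sum(f for _, f in data)
mean = total / n
print(f"mean = {mean:.3f}")   # sum(f*x) / sum(f)
```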
2. Median
An alternative measure of central location, perhaps second in
popularity to the arithmetic mean, is the median.
Suppose there are n observations in a sample. If these observations
are ordered from smallest to largest, then the median is defined as
follows:
Definition: The sample median is
(1) the ((n + 1)/2)th largest observation if n is odd;
(2) the average of the (n/2)th and (n/2 + 1)th largest observations if n is even.
Example: Compute the sample median for the birth weight data
Solution: First arrange the sample in ascending order
Since n=20 is even,
Median = average of the 10th and 11th largest observation =
(3245 + 3248)/2 = 3246.5 g
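A minimal Python sketch of this even/odd rule, assuming the definition above; the small samples below are illustrative, not the birth weight data:

```python
def sample_median(values):
    """Middle value if n is odd; average of the two middle values if n is even."""
    x = sorted(values)          # arrange in ascending order
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        return x[mid]                    # the (n+1)/2-th largest observation
    return (x[mid - 1] + x[mid]) / 2     # average of the n/2-th and (n/2+1)-th

print(sample_median([7, 1, 5, 3]))       # even n=4 -> (3 + 5)/2 = 4.0
print(sample_median([7, 1, 5, 3, 9]))    # odd  n=5 -> 5
```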
3. Mode:
It is the value of the observation that occurs with the greatest
frequency. A particular disadvantage is that, with a small number of
observations, there may be no mode. In addition, sometimes, there
may be more than one mode such as when dealing with a bimodal
(two-peaks) distribution. It is even less amenable (responsive) to
mathematical treatment than the median. The mode is not often used
in biological or medical data.
Find the modal values for the following data
a) 22, 66, 69, 70, 73. (no modal value)
b) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
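A small Python sketch of finding modal values with collections.Counter, handling the no-mode and multi-mode cases mentioned above:

```python
from collections import Counter

def modes(values):
    """Return the value(s) occurring with the greatest frequency, or [] if every value is unique."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                                   # no modal value
    return [v for v, c in counts.items() if c == top]

print(modes([22, 66, 69, 70, 73]))                                 # [] -> no modal value
print(modes([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))   # [3.0]
```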
Skewness: If extremely low or extremely high observations are
present in a distribution, then the mean tends to shift towards those
scores. Based on the type of skewness, distributions can be:
a) Negatively skewed distribution: occurs when the majority of
scores are at the right end of the curve and a few small scores are
scattered at the left end.
b) Positively skewed distribution: occurs when the majority of
scores are at the left end of the curve and a few extremely large scores
are scattered at the right end.
c) Symmetrical distribution: It is neither positively nor negatively
skewed. A curve is symmetrical if one half of the curve is the mirror
image of the other half.
In unimodal ( one-peak) symmetrical distributions, the mean, median
and mode are identical. On the other hand, in unimodal skewed
distributions, it is important to remember that the mean, median and
mode occur in alphabetical order when the longer tail is at the left of
the distribution or in reverse alphabetical order when the longer tail
is at the right of the distribution.
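A quick numerical illustration (with made-up values) of the reverse-alphabetical ordering mode < median < mean when the longer tail is at the right:

```python
from statistics import mean, median, mode

# Hypothetical positively skewed sample: most values small, a few large values in the right tail
sample = [2, 3, 3, 3, 4, 4, 5, 9, 15, 22]

print("mode   =", mode(sample))     # 3
print("median =", median(sample))   # 4.0
print("mean   =", mean(sample))     # 7
# mode < median < mean: reverse alphabetical order, longer tail at the right
```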
Example: Compute the 10th and 90th percentile for the birth weight
data.
Solution: Since 20 × 0.1 = 2 and 20 × 0.9 = 18 are integers, the 10th and
90th percentiles are defined by:
10th percentile = the average of the 2nd and 3rd largest values =
(2581 + 2759)/2 = 2670 g
90th percentile = the average of the 18th and 19th largest values =
(3609 + 3649)/2 = 3629 g
We would estimate that 80 percent of birth weights would fall
between 2670 g and 3629 g, which gives us an overall feel for the
spread of the distribution.
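A small Python sketch of this percentile rule, assuming the usual textbook handling of the non-integer case (take the next ordered value); the sample of twenty values below is illustrative, not the birth weight data:

```python
import math

def percentile(values, p):
    """p-th percentile (p in [0, 1]): if n*p is an integer, average the (n*p)-th and
    (n*p + 1)-th ordered values; otherwise take the ceil(n*p)-th ordered value
    (a common convention; the lecture states only the integer case)."""
    x = sorted(values)
    n = len(x)
    k = n * p
    if abs(k - round(k)) < 1e-9:        # n*p is (numerically) an integer
        k = int(round(k))
        return (x[k - 1] + x[k]) / 2
    return x[math.ceil(k) - 1]

# Illustrative sample of n = 20 values (not the lecture's data)
sample = list(range(1, 21))             # 1, 2, ..., 20
print(percentile(sample, 0.10))         # average of 2nd and 3rd values -> 2.5
print(percentile(sample, 0.90))         # average of 18th and 19th values -> 18.5
```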
Other quantiles that are particularly useful are the quartiles of the
distribution. The quartiles divide the distribution into four equal
parts.
The second quartile is the median. The interquartile range is the
difference between the first and the third quartiles.
To compute it, we first sort the data, in ascending order, then find
the data values corresponding to the first quarter of the numbers
(first quartile), and then the third quartile. The interquartile range
(IQR) is the distance (difference) between these quartiles.
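As a rough sketch, Python's statistics.quantiles can produce the three quartile cut points; note that several quartile conventions exist, and this built-in default may differ slightly from the averaging rule described above (the sample is illustrative):

```python
from statistics import quantiles

# Illustrative sample (not the lecture's data)
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = quantiles(sample, n=4)
iqr = q3 - q1                           # interquartile range
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```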
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \text{sample standard deviation}$$

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \text{population standard deviation}$$
This measure of variation is universally used to show the scatter of
the individual measurements around the mean of all the
measurements in a given distribution.
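A minimal Python sketch computing both quantities directly from these formulas, on made-up values:

```python
import math

def sample_sd(x):
    """s = sqrt( sum (x_i - x_bar)^2 / (n - 1) )"""
    n = len(x)
    x_bar = sum(x) / n
    return math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

def population_sd(x):
    """sigma = sqrt( sum (x_i - mu)^2 / N )"""
    N = len(x)
    mu = sum(x) / N
    return math.sqrt(sum((xi - mu) ** 2 for xi in x) / N)

values = [2.8, 3.0, 3.1, 3.3, 3.6]      # illustrative measurements
print(sample_sd(values), population_sd(values))
```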
$$CV = 100\% \times \frac{s}{\bar{x}}$$
The coefficient of variation is most useful in comparing the
variability of several different samples, each with different means.
This is because a higher variability is usually expected when the
mean increases, and the CV is a measure that accounts for this
variability.
The coefficient of variation is also useful for comparing the
reproducibility of different variables. CV is a relative measure free
from unit of measurement. CV remains the same regardless of what
units are used, because if the units are changed by a factor C, both
the mean and SD change by the factor C; the CV, which is the ratio
between them, remains unchanged.
Example: Compute the CV for the birth weight data when they are
expressed in either grams or ounces.
Solution: In grams, CV = 100% × (s / x̄).
If the data were expressed in ounces, x̄ = 111.71 oz and s = 15.7 oz, then
CV = 100% × (15.7 / 111.71) ≈ 14.1%,
the same value as in grams, because the unit conversion factor cancels in the ratio.
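A quick numerical check, in Python, that the CV is unchanged by a change of units, using the ounce values quoted above and an assumed conversion factor of about 28.35 g per ounce (the factor is an assumption, for illustration only):

```python
def cv(mean, sd):
    """Coefficient of variation: 100% * (s / x_bar)."""
    return 100 * sd / mean

mean_oz, sd_oz = 111.71, 15.7           # values quoted in the example
factor = 28.35                          # assumed grams per ounce (illustration only)
mean_g, sd_g = mean_oz * factor, sd_oz * factor

print(f"CV in ounces: {cv(mean_oz, sd_oz):.1f}%")   # ~14.1%
print(f"CV in grams : {cv(mean_g, sd_g):.1f}%")     # same value: the units cancel
```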