Bio Statistics 3

This document provides an overview of descriptive statistics. It discusses measures of central tendency including the mean, median, and mode. It defines these terms and provides examples of calculating each measure. The document also covers measures of dispersion such as range, interquartile range, standard deviation, and variance. It defines these statistical concepts and illustrates how to compute them using example data sets. Overall, the document serves as an introductory guide to foundational descriptive statistics techniques for summarizing and analyzing sample data.

Uploaded by

Moos Light


Biostatistics

Lecture
Prepared by
Baneen Ahmed
DESCRIPTIVE STATISTICS:
Descriptive statistics are summarizing methods that measure the
properties of a numerical variable. The resulting measures are
calculated as either sample statistics or population parameters.
-A descriptive measure computed from the data of a sample is called
a statistic.
-A descriptive measure computed from the data of a population is
called a parameter.
Descriptive measures are divided into three groups:
1- Measures of central tendency (measures of location)
2- Measures of dispersion
3- Skewness and kurtosis

MEASURES OF CENTRAL TENDENCY:

Three commonly used measures are:
1-the arithmetic mean (also known simply as the mean or average),
2-the median,
3-the mode.

-Definition of the mean

The mean is a number obtained by adding all the values in a
population or sample and dividing by the number of values that are
added.
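General Formula for the Mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
where $x_i$ is the $i$th value and $n$ is the number of values.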
Properties of the Mean
1. Uniqueness. For a given set of data there is one and only one
arithmetic mean.
2. Since each and every value in a set of data enters into the
computation of the mean, it is affected by each value. Extreme values,
therefore, have an influence on the mean and, in some cases, can so
distort it that it becomes undesirable as a measure of central
tendency.
Example:
Suppose five physicians are surveyed to determine their charges
for a certain procedure. Assume that they report these charges:
$75, $75, $80, $80, and $280.
The mean charge for the five physicians is found to be $118,
a value that is not very representative of the set of data as a whole.
The single atypical value ($280) had the effect of inflating the mean.
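
The arithmetic in this example can be checked with a short Python sketch. It uses the five charges given above; the comparison with the four typical charges is added here only for illustration and is not part of the lecture.

```python
# Charges reported by the five physicians in the example (US dollars).
charges = [75, 75, 80, 80, 280]

# Arithmetic mean: sum of the values divided by the number of values.
mean_charge = sum(charges) / len(charges)

# For comparison only: the mean of the four typical charges,
# leaving out the atypical $280 value.
typical = [c for c in charges if c != 280]
mean_typical = sum(typical) / len(typical)

print(mean_charge)   # 118.0
print(mean_typical)  # 77.5
```

Dropping the single extreme value moves the mean from $118 to $77.50, which shows how strongly one observation can pull it.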

If the values occur in frequencies, then the mean can be
calculated using the following formula:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
where $f_i$ is the frequency of the value $x_i$.

Solving Steps:
First, arrange the data in ascending order.
Second, multiply each value by its frequency.
Third, substitute the values into the mean formula:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
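
A minimal sketch of these steps in Python follows. The frequency table is hypothetical (the lecture's own table is not available here), and the function name `frequency_mean` is chosen only for illustration. The mean is the sum of each value times its frequency, divided by the sum of the frequencies.

```python
# Hypothetical frequency table (not from the lecture): value -> frequency.
freq_table = {2: 3, 4: 5, 6: 2}

def frequency_mean(table):
    """Mean of values that occur with frequencies: sum(f * x) / sum(f)."""
    # Step 1: work through the values in ascending order.
    ordered_values = sorted(table)
    # Step 2: multiply each value by its frequency and add the products.
    weighted_total = sum(x * table[x] for x in ordered_values)
    # Step 3: divide by the total number of observations.
    total_count = sum(table.values())
    return weighted_total / total_count

mean_value = frequency_mean(freq_table)
print(mean_value)
```

Here the weighted total is 2×3 + 4×5 + 6×2 = 38 over 10 observations, so the result is the same as averaging the ten individual values.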
2. Median
An alternative measure of central location, perhaps second in
popularity to the arithmetic mean, is the median.
Suppose there are n observations in a sample. If these observations
are ordered from smallest to largest, then the median is defined as
follows:
Definition: The sample median is
(1) The $\left(\frac{n+1}{2}\right)$th largest observation if n is odd.
(2) The average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest observations if n is even.
The rationale for these definitions is to ensure an equal number of
sample points on both sides of the sample median.
The median is defined differently when n is even and odd because it
is impossible to achieve this goal with one uniform definition. For
samples with an odd sample size, there is a unique central point; for
example, for a sample of size 7, the fourth largest point is the central
point in the sense that 3 points are smaller than it and 3 points are
larger. For samples with an even size, there is no unique central point
and the middle 2 values must be averaged. Thus, for a sample of size 8,
the fourth and the fifth largest points would be averaged to obtain the
median, since neither is the central point.
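
The two cases of the definition can be sketched in Python as below. The function name `sample_median` and the two small samples are assumptions made for illustration; they mirror the sizes 7 and 8 discussed above.

```python
def sample_median(values):
    """Sample median: the ((n+1)/2)th ordered point if n is odd,
    the average of the (n/2)th and (n/2 + 1)th ordered points if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5, 3, 2, 6, 4]       # hypothetical, n = 7
even_sample = [8, 7, 1, 5, 3, 2, 6, 4]   # hypothetical, n = 8

print(sample_median(odd_sample))   # 4   (the 4th point)
print(sample_median(even_sample))  # 4.5 (average of the 4th and 5th points)
```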

Example: Compute the sample median for the birth weight data.
Solution: First arrange the sample in ascending order.
Since n=20 is even,
Median = average of the 10th and 11th largest observations =
(3245 + 3248)/2 = 3246.5 g

Example: Consider the following data, which consist of white blood
counts taken on admission of all patients entering a small hospital on
a given day. Compute the median white blood count (× 10³).
Solution: First, order the sample as follows: 3, 5, 7, 8, 8, 9, 10, 12, 35.
Since n is odd, the sample median is given by the 5th, ((9+1)/2)th,
largest point, which is equal to 8 (that is, 8,000).
The principal strength of the sample median is that it is insensitive to
very large or very small values.
In particular, if the second patient in the above data had a white
blood count of 65,000 rather than 35,000, the sample median would
remain unchanged, since the fifth largest value is still 8,000.
In contrast, the arithmetic mean would increase dramatically from
10,778 in the original sample to 14,111 in the new sample.
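
The comparison can be reproduced with Python's standard `statistics` module. This is only a check of the figures quoted above; the counts are entered in thousands, as in the ordered sample.

```python
import statistics

# White blood counts (x 10^3) from the example, in ascending order.
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]

# The same sample with the 35 (35,000) reading replaced by 65 (65,000).
wbc_changed = [65 if x == 35 else x for x in wbc]

median_before = statistics.median(wbc)
median_after = statistics.median(wbc_changed)
mean_before = statistics.mean(wbc)
mean_after = statistics.mean(wbc_changed)

print(median_before, median_after)                 # 8 8
print(round(mean_before, 3), round(mean_after, 3)) # 10.778 14.111
```

The median stays at 8 (8,000) while the mean moves from about 10.778 to about 14.111, that is, from 10,778 to 14,111.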

The principal weakness of the sample median is that it is determined
mainly by the middle points in a sample and is less sensitive to the
actual numerical values of the remaining data points.

3. Mode:
It is the value of the observation that occurs with the greatest
frequency. A particular disadvantage is that, with a small number of
observations, there may be no mode. In addition, sometimes, there
may be more than one mode such as when dealing with a bimodal
(two-peaks) distribution. It is even less amenable (responsive) to
mathematical treatment than the median. The mode is not often used
in biological or medical data.
Find the modal values for the following data
a) 22, 66, 69, 70, 73. (no modal value)
b) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
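
A minimal sketch for finding modal values is shown below. The function name `modal_values` is an assumption, as is the convention of reporting no mode when every value occurs exactly once (this matches the answer given for data set a). The bimodal sample is hypothetical.

```python
from collections import Counter

def modal_values(values):
    """Return every value with the greatest frequency.
    Returns an empty list when no value repeats (no mode)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every observation occurs once, so there is no modal value.
        return []
    return [v for v, c in counts.items() if c == top]

data_a = [22, 66, 69, 70, 73]
data_b = [1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]
bimodal = [1, 1, 2, 2, 3]   # hypothetical sample with two modes

print(modal_values(data_a))   # []
print(modal_values(data_b))   # [3.0]
print(modal_values(bimodal))  # [1, 2]
```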
Skewness: If extremely low or extremely high observations are
present in a distribution, then the mean tends to shift towards those
scores. Based on the type of skewness, distributions can be:
a) Negatively skewed distribution: Occurs when the majority of
scores are at the right end of the curve and a few small scores are
scattered at the left end.
b) Positively skewed distribution: Occurs when the majority of
scores are at the left end of the curve and a few extreme large scores
are scattered at the right end.
c) Symmetrical distribution: It is neither positively nor negatively
skewed. A curve is symmetrical if one half of the curve is the mirror
image of the other half.
In unimodal (one-peak) symmetrical distributions, the mean, median
and mode are identical. On the other hand, in unimodal skewed
distributions, it is important to remember that the mean, median and
mode occur in alphabetical order (mean < median < mode) when the
longer tail is at the left of the distribution, or in reverse
alphabetical order (mode < median < mean) when the longer tail is at
the right of the distribution.
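
The direction of skew can be illustrated numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is a common choice but is an assumption here, since the lecture does not give a formula. The right-tailed sample is the white blood count data from the median example; the other two samples are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.
    Positive for a longer right tail, negative for a longer left tail.
    Assumes the values are not all identical (m2 > 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [3, 5, 7, 8, 8, 9, 10, 12, 35]   # white blood counts (x 10^3)
left_tailed = [-x for x in right_tailed]        # mirror image, hypothetical
symmetric = [1, 2, 3, 4, 5]                     # hypothetical

for sample in (right_tailed, left_tailed, symmetric):
    print(round(skewness(sample), 2),
          round(statistics.mean(sample), 2),
          statistics.median(sample))
```

For the right-tailed sample the coefficient is positive and the mean exceeds the median; for its mirror image both relations reverse; for the symmetric sample the coefficient is zero and the mean equals the median.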

Measures of dispersion (variation):

1. Range
The range is defined as the difference between the largest and
smallest observations in the data. It is the crudest measure of
dispersion. The range is a measure of absolute dispersion and as such
cannot be usefully employed for comparing the variability of two
distributions expressed in different units.

Range = xmax − xmin

where xmax = highest (maximum) value in the given distribution and
xmin = lowest (minimum) value in the given distribution.
In our example given above (the two data sets):
* The range of data in set 1 is 70-
* The range of data in set 2 is 53-
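
As a small illustration of the formula Range = xmax − xmin, the sketch below uses the white blood count sample from the median example, since the two data sets referred to above are not reproduced here. Removing the largest value is done only to show how strongly a single observation affects the range.

```python
# White blood counts (x 10^3) from the median example.
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]

# Range = maximum value minus minimum value.
data_range = max(wbc) - min(wbc)

# The same calculation with the single largest value left out.
without_largest = sorted(wbc)[:-1]
range_without_largest = max(without_largest) - min(without_largest)

print(data_range)             # 32
print(range_without_largest)  # 9
```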

- The pth percentile
is defined by:
(1) The (k+1)th largest sample point if np/100 is not an integer
(where k is the largest integer less than np/100)
(2) The average of the (np/100)th and (np/100 + 1)th largest
observations if np/100 is an integer.

The spread of a distribution can be characterized by specifying
several percentiles. For example, the 10th and 90th percentiles are
often used to characterize spread. Percentiles have the advantage
over the range of being less sensitive to outliers and of not being
much affected by the sample size (n).
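
The two-case definition of the pth percentile can be sketched as follows. The function name `percentile` and the 20-value sample are hypothetical; the sketch follows the definition above and not the interpolation rules used by statistical libraries, which may give slightly different answers.

```python
import math

def percentile(values, p):
    """pth percentile (0 < p < 100) by the two-case definition:
    (k+1)th ordered point if n*p/100 is not an integer (k = floor),
    otherwise the average of the (np/100)th and (np/100 + 1)th points."""
    if not values:
        raise ValueError("empty sample")
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    pos = n * p / 100
    if pos == int(pos):
        k = int(pos)
        # 1-based positions k and k + 1 are 0-based indices k - 1 and k.
        return (ordered[k - 1] + ordered[k]) / 2
    k = math.floor(pos)
    # 1-based position k + 1 is 0-based index k.
    return ordered[k]

sample20 = list(range(1, 21))             # hypothetical sample, n = 20
wbc = [3, 5, 7, 8, 8, 9, 10, 12, 35]      # white blood counts, n = 9

print(percentile(sample20, 10))  # 2.5  (average of the 2nd and 3rd points)
print(percentile(sample20, 90))  # 18.5 (average of the 18th and 19th points)
print(percentile(wbc, 50))       # 8    (np/100 = 4.5, so the 5th point)
```

With n = 20, the 10th and 90th percentiles fall in the integer case, the same situation as in the birth weight example.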

Example: Compute the 10th and 90th percentiles for the birth weight
data.
Solution: Since 20×0.1=2 and 20×0.9=18 are integers, the 10th and
90th percentiles are defined by
10th percentile = the average of the 2nd and 3rd largest values =
(2581+2759)/2 = 2670 g
90th percentile = the average of the 18th and 19th largest values =
(3609+3649)/2 = 3629 g
We would estimate that 80 percent of birth weights would fall
between 2670 g and 3629 g, which gives us an overall feel for the
spread of the distribution.
Other quantiles that are particularly useful are the quartiles of the
distribution. The quartiles divide the distribution into four equal
parts.
The second quartile is the median. The interquartile range is the
difference between the first and the third quartiles.
To compute it, we first sort the data in ascending order, then find
the data value that marks off the first quarter of the observations
(first quartile), and then the value that marks off the first three
quarters (third quartile). The interquartile range (IQR) is the
distance (difference) between these quartiles.

Example: Given the following data set (age of patients):-

find the interquartile range!

1. Sort the data from lowest to highest.
2. Find the bottom and the top quarters of the data.
3. Find the difference (interquartile range) between the two quartiles.

1st quartile = The {(n+1)/4}th observation = (2.25)th observation
= 21 + (23-21) × 0.25 = 21.5

3rd quartile = {3/4 (n+1)}th observation = (6.75)th observation
= 32 + (42-32) × 0.75 = 39.5

Hence, IQR = 39.5 − 21.5 = 18
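
The (n+1)/4 positioning rule with linear interpolation can be sketched as below. The ages listed are hypothetical: only the 2nd, 3rd, 6th and 7th ordered values (21, 23, 32, 42) are taken from the worked solution, and the remaining values are placeholders chosen so that n = 8. The function names are also assumptions.

```python
def value_at_position(ordered, pos):
    """Value at a 1-based, possibly fractional position in sorted data,
    using linear interpolation between neighbouring observations."""
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part of the position
    if frac == 0:
        return ordered[lower - 1]
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + (above - below) * frac

def quartiles(values):
    """First and third quartiles at positions (n+1)/4 and 3(n+1)/4.
    Assumes at least 3 observations so both positions fall inside the data."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3

# Hypothetical ages; positions 2, 3, 6 and 7 match the worked solution.
ages = [18, 21, 23, 24, 24, 32, 42, 59]

q1, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q3, iqr)   # 21.5 39.5 18.0
```

Different textbooks and software packages place the quartiles at slightly different positions, so other tools may report values that differ a little from these.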

The interquartile range is a preferable measure to the range because
it is less prone to distortion by a single large or small value. That is,
outliers in the data have little effect on the interquartile range. Also,
it can be computed when the distribution has open-end classes.

-Standard Deviation and Variance:

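Definition: The sample and population standard deviations, denoted
by S and σ (by convention) respectively, are defined as follows:

$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\text{sample variance}}$ = sample standard deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ = population standard deviation

where $\bar{x}$ and $n$ are the sample mean and sample size, and $\mu$
and $N$ are the population mean and population size.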
This measure of variation is universally used to show the scatter of
the individual measurements around the mean of all the
measurements in a given distribution.

Note that the sum of the deviations of the individual observations of a
sample about the sample mean is always 0.
The square of the standard deviation is called the variance. The
variance is a very useful measure of variability because it uses the
information provided by every observation in the sample and also it
is very easy to handle mathematically. Its main disadvantage is that
the units of variance are the square of the units of the original
observations.
Thus if the original observations were, for example, heights in cm,
then the units of variance of the heights are cm². The easiest way
around this difficulty is to use the square root of the variance (i.e.,
standard deviation) as a measure of variability.
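
A minimal sketch of the sample variance and standard deviation follows, using the n − 1 divisor for a sample. The heights are hypothetical and the function name `sample_variance` is an assumption. The sketch also checks that the deviations about the mean sum to zero, and compares the result with `statistics.variance` from the Python standard library, which also uses the n − 1 divisor.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

heights_cm = [150, 160, 165, 170, 180]   # hypothetical heights in cm

mean_height = sum(heights_cm) / len(heights_cm)
deviation_sum = sum(x - mean_height for x in heights_cm)

variance = sample_variance(heights_cm)   # units: cm squared
std_dev = math.sqrt(variance)            # units: cm

print(deviation_sum)                     # 0.0
print(variance)                          # 125.0
print(round(std_dev, 2))                 # 11.18
print(math.isclose(variance, statistics.variance(heights_cm)))  # True
```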

Example: Areas of sprayable surfaces with DDT from a sample of 15
houses are as follows (m²):

Find the variance and standard deviation of the above distribution.

The mean of the sample is 125 m².

Variance (sample) = $s^2 = \sum (x_i - \bar{x})^2 / (n-1)$

$= \{(x_1 - 125)^2 + (x_2 - 125)^2 + \dots + (x_{15} - 125)^2\} / 14$

= 178.71 (square metres)

Hence, the standard deviation = $\sqrt{178.71}$ = 13.37 m².

- The coefficient of variation:

The standard deviation is an absolute measure of deviation of
observations around their mean and is expressed with the same unit
of the data. Due to this nature of the standard deviation it is not
directly used for comparison purposes with respect to variability.
Therefore, it is useful to relate the arithmetic mean and SD together,
since, for example, a standard deviation of 10 would mean something
different conceptually if the arithmetic mean were 10 than if it were
1000. A special measure called the coefficient of variation, is often
used for this purpose.
Definition: The coefficient of variation (CV) is defined by:

$CV = 100\% \times \frac{S}{\bar{x}}$

The coefficient of variation is most useful in comparing the
variability of several different samples, each with different means.
This is because a higher variability is usually expected when the
mean increases, and the CV is a measure that accounts for this
variability.
The coefficient of variation is also useful for comparing the
reproducibility of different variables. CV is a relative measure free
from unit of measurement. CV remains the same regardless of what
units are used, because if the units are changed by a factor C, both
the mean and SD change by the factor C; the CV, which is the ratio
between them, remains unchanged.

Example: Compute the CV for the birth weight data when they are
expressed in either grams or ounces.

Solution: In grams, $\bar{x}$ = 3166.9 g, S = 445.3 g,

CV = 100% × $S/\bar{x}$ = 100% × 445.3/3166.9 = 14.1%
If the data were expressed in ounces, $\bar{x}$ = 111.71 oz, S = 15.7 oz,

then CV = 100% × $S/\bar{x}$ = 100% × 15.7/111.71 = 14.1%
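
The unit-free property can be checked with a short sketch that uses the means and standard deviations quoted in the solution. The function name `coefficient_of_variation` is an assumption.

```python
def coefficient_of_variation(mean, sd):
    """CV = 100% * (standard deviation / mean)."""
    return 100 * sd / mean

# Birth weight summary statistics quoted in the example.
cv_grams = coefficient_of_variation(3166.9, 445.3)    # mean and S in grams
cv_ounces = coefficient_of_variation(111.71, 15.7)    # mean and S in ounces

print(round(cv_grams, 1))    # 14.1
print(round(cv_ounces, 1))   # 14.1
```

Both results round to 14.1%, since converting grams to ounces rescales the mean and the standard deviation by the same factor. The small difference beyond the first decimal place comes from rounding in the quoted summary statistics.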

The third lecture has ended

I wish you all the best.
