L2-Types of Data, Central Tendency and Dispersion-2
Descriptive Statistics
MERITS OF STATISTICS
Summarization of data
Grouping and presentation
Facilitates comparison
Evaluation
DEMERITS OF STATISTICS
Concerned only with groups, neglecting individuals
2- Qualitative variables:
* NOMINAL, e.g. sex, blood group
* ORDINAL, e.g. social class, disease severity or grade
SOURCES OF ERROR
DURING DATA COLLECTION
1- Faulty technique
2- Defective instrument
3- Inter-observer error
4- Intra-observer error
5- Typographical error
Types of Variables:
Overview
Categorical Quantitative
Categories.
treatment groups
exposure groups
disease status
Categorical Variables
Dichotomous (binary) – two levels
Dead/alive
Treatment/placebo
Disease/no disease
Exposed/Unexposed
Heads/Tails
Pulmonary Embolism (yes/no)
Male/female
Categorical Variables
Nominal variables – named categories.
Order doesn’t matter!
The blood type of a patient (O, A, B, AB)
Marital status
Occupation
Categorical Variables
Ordinal variable – Ordered categories.
Order matters!
Staging in breast cancer as I, II, III, or IV
Birth order—1st, 2nd, 3rd, etc.
Level of education
Ratings on a scale from 1-5
Likert scale
Age in categories (10-19, 20-29, etc.)
Socioeconomic level
Quantitative Variables
Numerical variables; may be
arithmetically manipulated.
Counts
Time
Age
Height
Quantitative Variables
Discrete Numbers – a limited set of
distinct values, such as whole numbers.
Number of new AIDS cases in CA in a year
(counts)
Years of school completed
The number of children in the family (cannot have
half a child!)
The number of deaths in a defined time period
(cannot have a partial death!)
Roll of a die
Quantitative Variables
Continuous Variables - Can take on any
number within a defined range.
Time-to-event (survival time)
Age
Blood pressure
Serum insulin
Speed of a car
Income
Shock index (Kline et al.)
Looking at Data
How are the data distributed?
Where is the center?
What is the range?
What’s the shape of the distribution (e.g.,
Gaussian, binomial, exponential, skewed)?
Continuous variables
Box Plot
Histogram
Types of graphs
Line graph
Cumulative frequency curve
Bar graph
Histogram
Frequency polygon
Pie chart
Squares
Figures and shapes
Line graph
Used to plot quantitative data against time.
Another example: fever and fluid charts.
[Figure: line graph of incidence rate per 100,000 (y-axis, 0 to 25) by year (x-axis, 1996 to 2000); the rate differs from year to year]
Bar Chart
Used for categorical variables to
show frequency or proportion in
each category.
Translate the data from frequency
tables into a pictorial
representation…
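A minimal sketch of turning a frequency table into a bar chart, assuming Python. The blood-group observations are invented for illustration and are not data from these slides, and `text_bar_chart` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (blood group).
observations = ["O", "A", "A", "B", "O", "O", "AB", "A", "O", "B"]

# Frequency table: category -> number of observations.
frequency = Counter(observations)


def text_bar_chart(freq):
    """Return one line per category, drawing one '#' per observation."""
    lines = []
    for category, count in sorted(freq.items()):
        lines.append(f"{category:<3}{'#' * count} ({count})")
    return lines


for line in text_bar_chart(frequency):
    print(line)
```

Each bar's length is the category's frequency, which is the same idea a plotted bar chart uses.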
Bar Chart: categorical variables
[Figure: bar chart with the categories "yes" and "no"]
Bar Chart for SI categories
Much easier to extract information from a bar chart than from a table!
[Figure: bar chart of Number of Patients (y-axis, 0 to 200) for SI categories 1 to 10]
[Figure: bar chart of number of cases by age group in days (<=3, 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24), with separate bars for female and male; y-axis 0 to 12]
Pie chart
Used for qualitative data when the distribution is expressed in %
The pie: its slices are parts of the whole
[Figure: pie chart with slices of 11%, 21%, 22%, 27% and 19%]
[Figure: histogram of SI; x-axis 0.0 to 2.0]
Histogram
100 bins (too much detail)
[Figure: histogram of SI with 100 bins; x-axis 0.0 to 2.0, y-axis Percent]
Histogram
2 bins (too little detail)
[Figure: histogram of SI with 2 bins; x-axis 0.0 to 2.0, y-axis Percent]
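To see why the number of bins matters, the sketch below counts the same values with 2 bins and with 7 bins. The SI data behind the figures are not reproduced in these notes, so the eight ages are an assumed stand-in, and `bin_counts` is a hypothetical helper rather than a library function.

```python
def bin_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin, not in a new one.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


ages = [17, 19, 21, 22, 23, 23, 23, 38]  # assumed stand-in data

print(bin_counts(ages, 2))  # [7, 1]: the shape is hidden
print(bin_counts(ages, 7))  # [2, 2, 3, 0, 0, 0, 1]: the cluster and the gap show
```

With too few bins almost everything lands in one bar; with very many bins most bars hold zero or one value, so the choice is a trade-off.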
Some histograms from our
class data (n=18 so far…)
Starting with politics…
Feelings about math and
writing…
Measures of Central
Tendency
Basic Measures:
Arithmetic Mean
Median
Mode
Arithmetic Mean
Definition:
Sum of the values
divided by the number of values
Arithmetic Mean
Example:
The monthly incomes of 5 employees are:
100, 300, 400, 200, 500 L.E.
Calculate their arithmetic mean:
Arithmetic mean = sum of values / n
= (100 + 300 + 400 + 200 + 500) / 5
= 1500 / 5 = 300 L.E.
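The same calculation as a short Python sketch; `arithmetic_mean` is an illustrative name, not part of the original slides.

```python
incomes = [100, 300, 400, 200, 500]  # monthly incomes in L.E., from the example


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(arithmetic_mean(incomes))  # 300.0
```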
Arithmetic Mean
Example
The monthly incomes of 5 employees are:
100; 200; 300; 400; 1500 L.E.
Calculate their mean:
Arithmetic mean = sum of values / n
= (100 + 200 + 300 + 400 + 1500) / 5
= 2500 / 5 = 500 L.E.
N.B. Extreme values affect the arithmetic mean; in such cases the median can be used instead.
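A small sketch of this effect, assuming Python's standard `statistics` module. The first list is the earlier income example in sorted order; the second replaces 500 with the extreme value 1500.

```python
from statistics import mean, median

typical = [100, 200, 300, 400, 500]
with_extreme = [100, 200, 300, 400, 1500]

# One extreme value pulls the mean from 300 up to 500 ...
print(mean(typical), mean(with_extreme))

# ... while the median stays at 300 in both cases.
print(median(typical), median(with_extreme))
```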
Median
Definition:
The value that divides the
data into two equal sets after
arrangement in descending or
ascending order.
Median
To calculate the median you need to:
1. Arrange the values in ascending or descending order.
2. Determine the location of the median: (n + 1) / 2
* Odd n: the location is a whole number, so it points directly at one value
* Even n: the location falls midway between the two middle values
3. Determine the value of the median
* Odd n: the median is the value at that location
* Even n: the median is the sum of the two middle values divided by 2
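The three steps above can be sketched in Python as follows. `median_by_position` is a hypothetical name chosen for illustration, and the two test lists reuse numbers that appear elsewhere in these notes.

```python
def median_by_position(values):
    """Median via the three steps: sort, locate (n + 1) / 2, read the value."""
    ordered = sorted(values)              # step 1: arrange in ascending order
    n = len(ordered)
    location = (n + 1) / 2                # step 2: 1-based location
    if n % 2 == 1:
        # Odd n: the location is a whole number; take that value directly.
        return ordered[int(location) - 1]
    # Even n: the location lies between positions n/2 and n/2 + 1 (1-based).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2            # step 3: average the two middle values


print(median_by_position([100, 300, 400, 200, 500]))  # 300
print(median_by_position([2, 4, 3, 0, 1, 3]))         # 2.5
```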
Median
Example
Mode
Definition:
The most frequent value
Mode
Example
Number of children of some
families were 2, 4, 3, 0, 1, 3
Calculate the mode
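One way to work this example, sketched with Python's `collections.Counter`: count how often each value occurs and pick the most frequent one.

```python
from collections import Counter

children = [2, 4, 3, 0, 1, 3]  # number of children per family, from the example

counts = Counter(children)

# most_common(1) returns a list holding the (value, frequency) pair
# with the highest frequency.
mode_value, mode_frequency = counts.most_common(1)[0]

print(mode_value, mode_frequency)  # 3 2
```

Here 3 is the only value that occurs twice, so the mode is 3. If several values tied for the highest frequency, the data would have more than one mode, and this sketch would report only one of them.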
Measures of central
tendency
Mean
Median
Mode
Central Tendency
Mean – the average; the balancing point
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{17 + 19 + 21 + 22 + 23 + 23 + 23 + 38}{8} = 23.25$$
Mean
The mean is affected by extreme values
(outliers)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
[Figure: two dot plots on a scale of 0 to 10 marking each mean]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Central Tendency
Median – the exact middle value
Calculation:
If there are an odd number of
[Figure: two dot plots on a scale of 0 to 10; Median = 3 in both]
Central Tendency
Mode – the value that occurs most
frequently
Mode: example
Some data:
Age of participants: 17 19 21 22 23 23 23 38
Mode = 23 (occurs 3
times)
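For comparison, a brief sketch that computes all three measures for the age data with Python's standard `statistics` module.

```python
from statistics import mean, median, mode

ages = [17, 19, 21, 22, 23, 23, 23, 38]  # age of participants, from the slide

print(mean(ages))    # 23.25
print(median(ages))  # 22.5, the average of the two middle values 22 and 23
print(mode(ages))    # 23
```

The single large value 38 pulls the mean slightly above the median, while the mode simply reports the most common age.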
Measures of
Variation/Dispersion
Range
Percentiles/quartiles
Interquartile range
Standard deviation/Variance
Range
[Figure: histogram of AGE (Years); x-axis 0 to 100, y-axis Percent]
Quartiles
Q1, Q2 (the median) and Q3 split the ordered data into four parts, each holding 25% of the values.
minimum = 15, Q1 = 35, Median (Q2) = 49, Q3 = 65, maximum = 94
Interquartile range = Q3 – Q1 = 65 – 35 = 30
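A hedged sketch of computing quartiles and the interquartile range with Python's `statistics.quantiles`. The raw data behind the 15 / 35 / 49 / 65 / 94 summary are not given in these notes, so the eight ages below are an assumed stand-in.

```python
from statistics import median, quantiles

ages = [17, 19, 21, 22, 23, 23, 23, 38]  # stand-in data, not the slide's data

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# Python's default method is one of several conventions; other software
# may report slightly different quartiles for the same data.
q1, q2, q3 = quantiles(ages, n=4)
iqr = q3 - q1

print("Q1, Q2, Q3:", q1, q2, q3)
print("Interquartile range:", iqr)
```

Whatever convention is used, Q2 is the median and the interquartile range is Q3 minus Q1.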
Variance
Average (roughly) of squared
deviations of values from the mean
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$
Why squared deviations?
Adding deviations will yield a sum of
0.
Absolute values are tricky!
Squares eliminate the negatives.
Result:
Increasing contribution to the variance
as you go farther from the mean.
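A short sketch of this reasoning using the eight ages from the central tendency example: the plain deviations cancel out, the squared deviations do not.

```python
ages = [17, 19, 21, 22, 23, 23, 23, 38]
n = len(ages)
mean_age = sum(ages) / n                      # 23.25

deviations = [x - mean_age for x in ages]

# Positive and negative deviations cancel, so their sum is 0.
sum_of_deviations = sum(deviations)

# Squaring removes the signs; the value 38 alone contributes 14.75 ** 2.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance: divide by n - 1 rather than n.
variance = sum_of_squares / (n - 1)

print(sum_of_deviations)   # 0.0
print(sum_of_squares)      # 281.5
print(round(variance, 2))  # 40.21
```

Most of the 281.5 comes from the single age far from the mean (217.5625), which is the "increasing contribution" described above.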
Standard Deviation
Most commonly used measure of
variation
Shows variation about the mean
Has the same units as the original data
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$
Calculation Example: Sample Standard Deviation
Age data (n=8) : 17 19 21 22 23 23 23 38
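The worked steps for this example are not shown in the notes; a minimal Python sketch of the calculation, following the sample formula with n - 1 in the denominator, might look like this.

```python
import math
from statistics import stdev

ages = [17, 19, 21, 22, 23, 23, 23, 38]
n = len(ages)

mean_age = sum(ages) / n                                  # step 1: the mean
sum_of_squares = sum((x - mean_age) ** 2 for x in ages)   # step 2: squared deviations
variance = sum_of_squares / (n - 1)                       # step 3: sample variance
s = math.sqrt(variance)                                   # step 4: standard deviation

print(round(mean_age, 2), round(variance, 2), round(s, 2))  # 23.25 40.21 6.34

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(s, stdev(ages)))  # True
```

The result, about 6.34 years, is in the same units as the ages themselves.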
Estimation method: if the distribution is bell shaped, the range is …
[Figure: histogram of AGE (Years); x-axis 0 to 100, y-axis Percent]
[Output: Std. Deviation of age; Variation Section of SI]
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
[Figure: dot plots of Data B and Data C on a scale of 11 to 21; same mean, different spread]
Symbol Clarification
S = Sample standard deviation
(example of a “sample statistic”)
σ = Standard deviation of the
entire population (example of a
“population parameter”) or from a
theoretical probability distribution
X̄ = Sample mean
µ = Population or theoretical mean
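To illustrate the difference between S and σ, the sketch below uses Python's `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by n) on the same eight ages. Treating those ages as a complete population is an assumption made only for the comparison.

```python
from statistics import pstdev, stdev

ages = [17, 19, 21, 22, 23, 23, 23, 38]

s = stdev(ages)       # sample standard deviation S: denominator n - 1
sigma = pstdev(ages)  # population standard deviation: denominator n

print(round(s, 2))      # 6.34
print(round(sigma, 2))  # 5.93
```

Dividing by n - 1 gives a slightly larger value than dividing by n, and the gap shrinks as n grows.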
**The beauty of the normal curve:
68% of
the data