Data, Variables & Presentation
Fundamentals of Statistics
What Is Biostatistics (Health Statistics)?
• Biostatistics may be defined as the application of
statistical methods to medical, biological,
psychological, and public health problems.
• It is the scientific treatment of medical data
derived from groups of individuals or patients, and involves:
1. Collection of data
2. Presentation of the collected data
3. Analysis and interpretation of the results
4. Making decisions on the basis of such analysis
Role of Statistics
Statistics is used extensively in public health (PH) and
community medicine (CM).
Statistical methods give public health administrators
the foundation for understanding what is happening
to the population under their care, at both the
community and the individual level.
Reliable information helps to:
1. Assess community needs
2. Understand the socioeconomic and psychological
determinants of health
3. Plan experiments in health research and analyze
the results
4. Study the diagnosis and prognosis of diseases in
order to take effective action
5. Scientifically test the efficacy of new medicines
and …
Sources of Data
• Comprehensive
• Sample
Organizing Data
Line List or Line Listing
• Whether one is conducting routine
surveillance, investigating an outbreak, or
conducting a study, information should be
compiled in an organized manner.
•Organized like a spreadsheet with rows and
columns
•Each row is called a record or observation
•Each column is called a variable and contains
information such as gender, race, or date of birth (DOB).
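A line list can be sketched in Python as a list of records, one per row, where each record maps variable names (columns) to values. This is a minimal sketch: the `Record` fields and the three patients are hypothetical, invented only to illustrate the row/column structure.

```python
from dataclasses import dataclass, fields
from datetime import date

@dataclass
class Record:
    """One row (record/observation) of a hypothetical line list."""
    case_id: int
    sex: str      # nominal variable
    dob: date     # date of birth
    ill: bool     # ill or well (dichotomous)

# Hypothetical line list: each element is a row, each field is a column (variable).
line_list = [
    Record(1, "F", date(1990, 5, 2), True),
    Record(2, "M", date(1985, 11, 17), False),
    Record(3, "F", date(2001, 1, 30), True),
]

def variable_names(rows):
    """Return the column (variable) names of the line list."""
    return [f.name for f in fields(rows[0])]

def column(rows, name):
    """Return all values of one variable (column) across the records."""
    return [getattr(r, name) for r in rows]
```

For example, `variable_names(line_list)` lists the four columns and `column(line_list, "sex")` gives `['F', 'M', 'F']`.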
Variable
• A variable is any characteristic that
differs from person to person, such as
height, sex, dengue status (+/-), or fever
(present/absent).
• Most research begins with a general
question about the relationship between
two variables for a specific group of
individuals.
• Software: Epi Info, available as a free
download from the CDC website
Types of Variables
• Look at the columns and rows of a line list:
• Some values are numeric and some are descriptive.
• Variables can be classified into four types (levels of measurement):
1. Nominal-scale variable
2. Ordinal-scale variable
3. Interval-scale variable
4. Ratio-scale variable
Nominal & Ordinal: qualitative (categorical)
Interval & Ratio: quantitative (continuous, or discrete for counts)
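The four scales are cumulative: each level supports the operations of the levels below it plus one more. The sketch below encodes that idea with Python's `IntEnum`; the operation labels are this sketch's own shorthand, not a standard library or package API.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# The operation each level adds on top of the levels below it (illustrative labels).
ADDED_OPERATION = {
    Scale.NOMINAL: "equal / not equal (counts, mode)",
    Scale.ORDINAL: "greater / less than (ranks, median)",
    Scale.INTERVAL: "add / subtract (differences, mean)",
    Scale.RATIO: "multiply / divide (ratios)",
}

def allowed_operations(scale):
    """Return every operation meaningful at this level (cumulative)."""
    return [ADDED_OPERATION[s] for s in Scale if s <= scale]

def is_quantitative(scale):
    """Interval and ratio are quantitative; nominal and ordinal are qualitative."""
    return scale >= Scale.INTERVAL
```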
Nominal Scale Variable
• Values are categories without any numerical
ranking, such as alive or dead, ill or well,
vaccinated or unvaccinated, etc.
• A nominal variable with only two mutually
exclusive categories is also called a
dichotomous (binary) variable
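Because nominal categories have no order, the only meaningful "average" is the mode (the most frequent category). A minimal sketch using `collections.Counter`; the vaccination data are hypothetical.

```python
from collections import Counter

def nominal_mode(values):
    """Most frequent category; ties go to the category seen first."""
    return Counter(values).most_common(1)[0][0]

def is_dichotomous(values):
    """True if the variable takes exactly two distinct categories."""
    return len(set(values)) == 2

# Hypothetical vaccination status of six children.
status = ["vaccinated", "unvaccinated", "vaccinated",
          "vaccinated", "unvaccinated", "vaccinated"]
```

Here `nominal_mode(status)` is `"vaccinated"` and `is_dichotomous(status)` is `True`.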
Ordinal Scale Variable
Values can be ranked, but the intervals between them are not necessarily equal
Examples: cancer stages, socioeconomic status (SES),
customer satisfaction surveys
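Ordinal values can be sorted and compared, and a median is meaningful, but subtracting them is not. A minimal sketch with Python's `IntEnum`, where the integer codes encode order only; the stage labels and the five patients are hypothetical.

```python
from enum import IntEnum
import statistics

class Stage(IntEnum):
    """Cancer stage: integer codes encode order only, not distance."""
    I = 1
    II = 2
    III = 3
    IV = 4

# Hypothetical stages of five patients.
stages = [Stage.II, Stage.IV, Stage.I, Stage.III, Stage.II]

ranked = sorted(stages)                                # ranking is meaningful
median_stage = Stage(statistics.median_low(stages))   # median is meaningful

# Stage.IV - Stage.II evaluates to 2, but that does NOT mean "two units more
# disease": the gaps between stages are not necessarily equal.
```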
Likert-Type Scale: a Variant of the Ordinal Scale
The intervals between ordinal values cannot be
assumed to be equal. Likert-type scales are widely
used in surveys, polls, and questionnaires, for example:
• Socioeconomic class (SEC): rich, middle class, poor, etc.
• Frequency of occurrence: very often,
often, not often, not at all
• Degree of agreement: totally agree (TA),
agree, neutral, disagree, totally disagree (TD)
• Preferences: ranking brands such as HP, Apple,
Lenovo, Dell, Acer
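Since Likert categories are ordered but not evenly spaced, a frequency count per level and the median level are safer summaries than a mean. A minimal sketch; it assumes the TA/TD abbreviations stand for "totally agree/disagree", and the seven responses are hypothetical.

```python
from collections import Counter

# Agreement levels, lowest to highest (TA/TD expansions are assumed).
LEVELS = ["Totally disagree", "Disagree", "Neutral", "Agree", "Totally agree"]

def likert_summary(responses):
    """Return (frequency of each level in scale order, median level)."""
    counts = Counter(responses)
    freq = [(level, counts[level]) for level in LEVELS]
    # Median: order responses by scale position and take the middle one.
    ordered = sorted(responses, key=LEVELS.index)
    median = ordered[(len(ordered) - 1) // 2]  # lower median when n is even
    return freq, median

# Hypothetical answers from seven respondents.
answers = ["Agree", "Neutral", "Totally agree", "Agree",
           "Disagree", "Agree", "Totally disagree"]
```

For these answers the median level is "Agree", and each level's count is reported in scale order.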
Interval Scale Variable
• Measured in equally spaced units, but
without a true zero, such as temperature
in degrees Celsius
• The zero is arbitrary (not a true zero)
• Interval scales are sometimes useful in
statistics because they let you assign
numerical values to arbitrary
measurements, such as an opinion
• Quantitative
• Third level of measurement
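Because the zero of an interval scale is arbitrary, differences are meaningful but ratios are not. A short sketch using temperature (chosen here as an illustration): converting °C to kelvin, which has a true zero, shows why 20 °C is not "twice as hot" as 10 °C.

```python
def celsius_to_kelvin(c):
    """Convert °C (interval scale, arbitrary zero) to K (ratio scale, true zero)."""
    return c + 273.15

a, b = 10.0, 20.0   # two temperatures in °C

diff_c = b - a                                          # differences are meaningful
diff_k = celsius_to_kelvin(b) - celsius_to_kelvin(a)    # same difference, up to rounding
naive_ratio = b / a                                     # 2.0, but not a physical "twice"
true_ratio = celsius_to_kelvin(b) / celsius_to_kelvin(a)  # about 1.035
```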
Ratio Scale Variable
• An interval variable with a true zero, such as
height, weight, or duration of illness
• The ratio scale ranks highest of the
four “levels of measurement”
• The ratio scale has all the qualities of the
nominal, ordinal, and interval scales
• The ratio scale has a true zero, that is,
zero means complete absence of the quantity measured
• Common examples of ratio scales are
length, duration, and mass
• All statistical analyses (descriptive and
inferential) are possible
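With a true zero, ratios are meaningful, so statistics such as the coefficient of variation (standard deviation divided by the mean) can be used. A minimal sketch with Python's `statistics` module; the body weights are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample SD / mean, as a percentage; meaningful only on a ratio scale."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return 100 * statistics.stdev(values) / mean

weights_kg = [60.0, 70.0, 80.0]           # hypothetical body weights
cv = coefficient_of_variation(weights_kg)  # SD 10 / mean 70, about 14.3 %
ratio = weights_kg[2] / weights_kg[0]      # 80 kg is 4/3 of 60 kg: meaningful
```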
Summary
Level | Description | Attribute | Statistical methods used
1. Nominal | Classification into mutually exclusive categories based on presence or absence of a property | Sex; religion; dead/alive | Mode, chi-square
Presentation of Data
• Numerical presentation
• Graphical presentation
• Mathematical presentation
Numerical Presentation
Tabular presentation (simple or complex)
Simple Frequency Distribution Table (S.F.D.T.)
Title
Name of variable (units of variable) | Frequency | %
Category | |
Category | |
Category | |
Total | |
Distribution of 50 patients at the surgical department of
AAAAA hospital in May 2008 according to their ABO
blood groups
Blood group | Frequency | %
A | 12 | 24
B | 18 | 36
AB | 5 | 10
O | 15 | 30
Total | 50 | 100
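A simple frequency distribution table can be built from raw records by counting each category and converting counts to percentages of the total. In this minimal sketch the raw list is re-created from the blood group counts in the table purely for illustration; the original patient-level data are not available.

```python
from collections import Counter

def frequency_table(values, categories):
    """Return rows of (category, frequency, percent) plus a Total row."""
    counts = Counter(values)
    total = len(values)
    rows = [(c, counts[c], round(100 * counts[c] / total, 2)) for c in categories]
    rows.append(("Total", total, 100.0))  # percentages of all categories sum to 100
    return rows

# Raw blood groups re-created from the table's counts (illustrative only).
blood_groups = ["A"] * 12 + ["B"] * 18 + ["AB"] * 5 + ["O"] * 15

if __name__ == "__main__":
    for group, freq, pct in frequency_table(blood_groups, ["A", "B", "AB", "O"]):
        print(f"{group:<6}{freq:>5}{pct:>8}")
```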
Complex Frequency Distribution Table
Distribution of 20 lung cancer patients at the chest
department of AAAAA hospital and 40 controls in
May 2008 according to smoking
Smoking | Lung cancer cases No. | % | Controls No. | % | Total No. | %
Smoker | 15 | 75 | 8 | 20 | 23 | 38.33
Non-smoker | 5 | 25 | 32 | 80 | 37 | 61.67
Total | 20 | 100 | 40 | 100 | 60 | 100
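In the complex table, percentages are calculated within each column (cases, controls, total), which is what allows the smoking proportions of cases and controls to be compared. A minimal sketch reproducing those column percentages from the table's counts:

```python
def column_percentages(table):
    """{row: {column: count}} -> same shape with percent of each column total."""
    columns = next(iter(table.values())).keys()
    totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        label: {col: round(100 * row[col] / totals[col], 2) for col in columns}
        for label, row in table.items()
    }

# Counts from the lung cancer / smoking table.
smoking = {
    "Smoker":     {"Cases": 15, "Controls": 8,  "Total": 23},
    "Non-smoker": {"Cases": 5,  "Controls": 32, "Total": 37},
}
pct = column_percentages(smoking)
```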
Graphical Presentation
• Line Graph
• Frequency Polygon
• Frequency Curve
• Histogram
• Scatter Plot
• Pie Chart
• Statistical Map
Characteristics-Line Graph
• Line graphs are suitable for displaying
numerical (quantitative) data, typically over time.
• The slope of the line indicates the rate of
change of y over time.
• A horizontal straight line indicates no
change.
• An upward or downward straight line
indicates a constant rate of increase or
decrease.
• Two parallel lines indicate a similar rate of
change over time.
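On an arithmetic scale, the slope between two consecutive points is the change in y divided by the change in time. A minimal sketch that computes these slopes: equal consecutive slopes correspond to a straight segment (constant rate) and a zero slope to a horizontal segment (no change). The yearly counts are hypothetical.

```python
def slopes(times, values):
    """Rate of change of y between consecutive points: Δy / Δt."""
    points = list(zip(times, values))
    return [(y2 - y1) / (t2 - t1) for (t1, y1), (t2, y2) in zip(points, points[1:])]

years = [2015, 2016, 2017, 2018, 2019]
cases = [100, 120, 140, 140, 110]    # hypothetical yearly case counts
rates = slopes(years, cases)         # [20.0, 20.0, 0.0, -30.0]
```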
[Figure: Line Graph]
Frequency Polygon
Age (years) | Males | Females | Mid-point of interval
20- | 3 (12%) | 2 (10%) | (20+30) / 2 = 25
30- | 9 (36%) | 6 (30%) | (30+40) / 2 = 35
40- | 7 (28%) | 5 (25%) | (40+50) / 2 = 45
50- | 4 (16%) | 3 (15%) | (50+60) / 2 = 55
60-70 | 2 (8%) | 4 (20%) | (60+70) / 2 = 65
Total | 25 (100%) | 20 (100%) |
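A frequency polygon plots each group's percentage against the mid-point of its class interval and joins the points. A minimal sketch of the two calculations behind the table (mid-points and within-sex percentages), using its counts:

```python
def midpoints(bounds):
    """Mid-point of each class interval, given consecutive boundaries."""
    return [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

def percentages(counts):
    """Each count as a percentage of the group total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

bounds = [20, 30, 40, 50, 60, 70]   # age class boundaries (years)
males = [3, 9, 7, 4, 2]             # total 25
females = [2, 6, 5, 3, 4]           # total 20

# (x, y) points to join for each polygon.
points_m = list(zip(midpoints(bounds), percentages(males)))
points_f = list(zip(midpoints(bounds), percentages(females)))
```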
Frequency Polygon
[Figure: frequency polygons for males and females; y-axis: %; x-axis: age in years, plotted at interval mid-points 25, 35, 45, 55, 65]
Distribution of 45 patients at (place), in (time)
by age and sex
Frequency Curve
[Figure: frequency curves for males and females; x-axis: age in years (20-, 30-, 40-, 50-, 60-69); y-axis: frequency]
Merits: shows the skewness (+/-) and kurtosis of the distribution
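The skewness and kurtosis that a frequency curve shows visually can also be summarized numerically. This sketch uses the moment coefficients g1 and g2, which are one of several conventions; statistical packages may apply small-sample corrections and give slightly different values. The data are hypothetical.

```python
def shape_coefficients(values):
    """Return (skewness g1, excess kurtosis g2) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5    # > 0: positive (right) skew; < 0: negative (left) skew
    g2 = m4 / m2 ** 2 - 3  # > 0: heavier tails than a normal curve; < 0: lighter
    return g1, g2

symmetric = [1, 2, 3, 4, 5]        # hypothetical, symmetric
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, long right tail
```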
Characteristics of Histogram
• Shows the frequency distribution of
numerical data, either continuous
(such as height) or discrete (such as
mortality counts).
• Histograms can help visualise gaps in
the data, outliers, or other unusual
observations.
• In intervention epidemiology, histograms
are frequently used to present the
distribution of illness onsets over time
(the epidemic curve).
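For an epidemic curve, each case's onset date is assigned to a time bin (one day here) and the cases per bin are counted; the histogram bars are these counts, and days with zero cases show up as gaps. A minimal sketch with hypothetical onset dates:

```python
from collections import Counter
from datetime import date, timedelta

def epi_curve(onsets, start, end):
    """Count cases per day from start to end inclusive, keeping zero-case days."""
    counts = Counter(onsets)
    days = (end - start).days + 1
    return [(start + timedelta(d), counts[start + timedelta(d)]) for d in range(days)]

# Hypothetical onset dates of six cases.
onsets = [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 2),
          date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 4)]

if __name__ == "__main__":
    for day, n in epi_curve(onsets, date(2024, 3, 1), date(2024, 3, 5)):
        print(day.isoformat(), "#" * n)   # text histogram: one '#' per case
```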
[Figure: Histogram]
Characteristics-Bar Graph
Simple Bar Graph
[Figure: simple bar graph of the percentage distribution by marital status (single, married, divorced, widowed)]
Grouped Bar Graphs
[Figure: grouped bar graph of the percentage of males and females by marital status (single, married, divorced, widowed)]
Stacked Bar Graphs
Component Bar Graph
Characteristics-Pie Chart
• The circle conveys the whole (100%), and
pie charts are simple to use.
• They are best used for displaying statistical
information when there are no more than six
components.
• They are not useful when the values of the
components are similar, because it is
difficult to see the differences between
slice sizes.
• They can be very misleading if percentages
are based on a small number of cases.
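Each slice's angle is its share of the whole times 360° (equivalently, its percentage × 3.6°). A minimal sketch computing slice angles; the example reuses the ABO blood group counts (12, 18, 5, 15 of 50 patients) from the simple frequency distribution table.

```python
def slice_angles(counts):
    """Map each category to its pie-slice angle in degrees."""
    total = sum(counts.values())
    return {cat: 360 * n / total for cat, n in counts.items()}

blood = {"A": 12, "B": 18, "AB": 5, "O": 15}
angles = slice_angles(blood)   # A 86.4, B 129.6, AB 36.0, O 108.0 degrees
```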
Pie Chart
[Figure: Number of Deaths by Cause Among 25–34-Year-Olds, United States, 2003]
Doughnut chart
Reference
•Introduction to Epidemiology