Methods of Data Organization and Presentation
1. Frequency Distributions
Ordered array: A simple arrangement of individual observations in
the order of magnitude.
An ordered array becomes very difficult to prepare and read when the sample size is large.
12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67
The actual summarization and organization of
data starts from frequency distribution.
For nominal and ordinal data, frequency
distributions are often used as a summary.
Example:
distributed
For both discrete and continuous data, the
values are grouped into non-overlapping
intervals, usually of equal width.
a) Qualitative variable: Count the number of
cases in each category.
ICU Type    Frequency (How often)    Relative Frequency
Medical     12                       0.48
Surgical    6                        0.24
Cardiac     5                        0.20
Other       2                        0.08
Total       25                       1.00
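The counting behind the table above can be sketched in a few lines of Python. This is only an illustration: the list `icu_types` is a hypothetical reconstruction of the 25 raw observations from the counts in the table, and `frequency_table` is an illustrative helper name, not part of the original material.

```python
from collections import Counter

# Hypothetical raw data: 25 ICU admissions rebuilt from the counts in the table.
icu_types = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2


def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for a qualitative variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}


table = frequency_table(icu_types)
for category, (freq, rel) in table.items():
    print(f"{category:<10}{freq:>4}{rel:>8.2f}")
```

The relative frequency is simply the count in a category divided by the total number of cases, so the relative frequencies always add up to 1.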
Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender, 1 = male, 2 = female
b) Quantitative variable:
Select a set of continuous, non-overlapping
intervals such that each value can be placed in
one, and only one, of the intervals.
The first consideration is how many intervals to
include
For a continuous variable
(e.g., age), the frequency
distribution of the individual
ages is not very informative.
• We “see more” in the frequencies
of age values in “groupings”.
Here, 10-year groupings make
sense.
• This kind of frequency
distribution is called a grouped
frequency distribution.
The following are some rules that are
generally used to construct grouped
frequency distribution:
To determine the number of class intervals and the
width of each interval, Sturges' rule is used:
K = 1 + 3.322 (log n), with log taken to base 10
W = (L - S) / K
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
Example:
◦ Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 - 10)/6 = 4.67 ≈ 5
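The arithmetic of this example can be checked with a short Python sketch. The function names `sturges_classes` and `class_width` are illustrative, and rounding the width up with `math.ceil` is an assumption made here so that K classes are sure to cover the whole range.

```python
import math

# Leisure time (hours) per week for the 40 college students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]


def sturges_classes(n):
    """Number of class intervals, K = 1 + 3.322 * log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)


def class_width(largest, smallest, k):
    """Width of each class interval, W = (L - S) / K, before rounding."""
    return (largest - smallest) / k


k_raw = sturges_classes(len(leisure_hours))                   # about 6.32
k = round(k_raw)                                              # 6 classes
w_raw = class_width(max(leisure_hours), min(leisure_hours), k)  # about 4.67
w = math.ceil(w_raw)                                          # 5 hours per class

print(f"K = {k_raw:.2f} -> {k}, W = {w_raw:.2f} -> {w}")
```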
Time (Hours)    Frequency    Relative Frequency    Cumulative Relative Frequency
10-14           5            0.125                 0.125
15-19           11           0.275                 0.400
20-24           12           0.300                 0.700
25-29           7            0.175                 0.875
30-34           3            0.075                 0.950
35-39           2            0.050                 1.000
Total           40           1.000
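The grouped frequency table above can be rebuilt from the raw leisure-time data. The sketch below assumes whole-number data, so a class that starts at 10 with width 5 has the limits 10-14; `grouped_frequency` is an illustrative helper name.

```python
# Leisure time (hours) per week for the 40 college students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]


def grouped_frequency(values, start, width, k):
    """Return rows of (class label, frequency, relative freq, cumulative relative freq)."""
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1  # class limits for whole-number data
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((f"{lower}-{upper}", freq, freq / n, cumulative / n))
    return rows


rows = grouped_frequency(leisure_hours, start=10, width=5, k=6)
for label, freq, rel, cum_rel in rows:
    print(f"{label:<8}{freq:>4}{rel:>8.3f}{cum_rel:>8.3f}")
```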
Cumulative frequency: the running total obtained when the
frequency of a class is added to the frequencies of all preceding classes.
True limits: Are those limits that make an
interval of a continuous variable continuous in
both directions
Time (Hours)    True limit     Mid-point    Frequency
10-14           9.5 – 14.5     12           5
15-19           14.5 – 19.5    17           11
20-24           19.5 – 24.5    22           12
25-29           24.5 – 29.5    27           7
30-34           29.5 – 34.5    32           3
35-39           34.5 – 39.5    37           2
Total                                       40
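The true limits and mid-points in the table above follow a simple pattern, sketched below. The sketch assumes the data are recorded to the nearest whole hour, so each class limit is extended by half a unit (0.5) in both directions; the function names are illustrative.

```python
def true_limits(lower, upper, unit=1):
    """Extend the class limits by half a measurement unit on each side."""
    half = unit / 2
    return lower - half, upper + half


def midpoint(lower, upper):
    """Mid-point of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
summary = [(true_limits(lo, hi), midpoint(lo, hi)) for lo, hi in classes]

for (lo, hi), ((t_lo, t_hi), mid) in zip(classes, summary):
    print(f"{lo}-{hi}: true limits {t_lo} to {t_hi}, mid-point {mid}")
```

Because the upper true limit of one class equals the lower true limit of the next, the intervals leave no gaps along the scale.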
2. Statistical Tables
Table 1: Overall immunization status of children in Adami Tulu Woreda,
Feb. 1995
B) Two-way table:
Table 2: TT immunization by marital status of the women of childbearing age,
Assendabo town, Jimma Zone, 1996
Marital Status    Immunization Status                    Total
                  Immunized        Non Immunized
                  No.    %         No.    %
Table 3: Distribution of Health Professional by Sex and Residence
Profession    Sex       Urban         Rural          Total
                        No (%)        No (%)         No (%)
Doctors       Male      8 (10.0)      35 (21.0)      43 (17.7)
              Female    2 (3.0)       16 (10.0)      18 (7.4)
Nurses        Male      46 (58.0)     36 (22.0)      82 (33.7)
              Female    23 (29.0)     77 (47.0)      100 (41.2)
Total                   79 (100.0)    164 (100.0)    243 (100.0)
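The percentages in Table 3 are column percentages: each count is divided by the total of its own column. A minimal sketch of that calculation follows; the dictionary layout and the name `column_percentages` are illustrative choices. Note that the Urban and Rural percentages printed in the table appear to be rounded to whole numbers, so the one-decimal values computed here can differ slightly from them.

```python
# Counts taken from Table 3, keyed by (profession, sex).
counts = {
    ("Doctors", "Male"): {"Urban": 8, "Rural": 35},
    ("Doctors", "Female"): {"Urban": 2, "Rural": 16},
    ("Nurses", "Male"): {"Urban": 46, "Rural": 36},
    ("Nurses", "Female"): {"Urban": 23, "Rural": 77},
}


def column_percentages(counts):
    """Return column totals and each cell as a percentage of its column total."""
    columns = ["Urban", "Rural"]
    totals = {c: sum(row[c] for row in counts.values()) for c in columns}
    grand_total = sum(totals.values())
    percentages = {}
    for key, row in counts.items():
        cell = {c: round(100 * row[c] / totals[c], 1) for c in columns}
        cell["Total"] = round(100 * sum(row.values()) / grand_total, 1)
        percentages[key] = cell
    return totals, percentages


totals, percentages = column_percentages(counts)
for key, cell in percentages.items():
    print(key, cell)
```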
Guidelines for constructing tables
Keep them simple,
Limit the number of variables to three or fewer,
All tables should be self-explanatory,
Include a clear title telling what, when and where,
Clearly label the rows and columns,
State clearly the unit of measurement used,
Explain codes and abbreviations in a footnote,
Show totals,
If the data are not original, indicate the source in a footnote.
3. Diagrammatic Representation of Data
Importance of diagrammatic representation:
Well designed graphs can be powerful means
of communicating a great deal of information
Specific types of graphs include:
Nominal, ordinal data:
  Bar graph
  Pie chart
Quantitative data:
  Histogram
  Stem-and-leaf plot
  Box plot
  Line graph
1. Bar charts (or graphs)
Categories are listed on the horizontal axis
(X-axis)
Frequencies or relative frequencies are
shown on the vertical axis (Y-axis)
Method of constructing bar chart
All the bars must have equal width
The bars are not joined together (leave
equal distances between them)
All the bars should rest on the same line
There are different types of bar diagrams, the
most important ones are:
Simple bar chart
Multiple bar chart
Component (or sub-divided) Bar Diagram
1.1 Simple Bar charts (or graphs)
Example of simple bar diagram
[Figure: simple bar chart. Horizontal axis: Immunization status (Not immunized, Partially immunized, Fully immunized). Vertical axis: Number of children, 0 to 100.]
Example: Construct a simple bar chart for the following data.
Distribution of patients in hospital X by source of referral, 1999
[Figure: simple bar chart. Horizontal axis: Source of referral (Other hospital, GP, OPD, Casualty, Other). Vertical axis: No. of patients, 0 to 800. Bar labels shown in the chart: 769, 623, 256, 161 and 97.]
1.2 Multiple bar chart
In this type of chart the component figures
are shown as separate bars adjoining each
other
The height of each bar represents the actual
one variable
[Figure: multiple bar chart. Horizontal axis: Marital status (Married, Single, Divorced, Widowed). Vertical axis: Number of women, 0 to 350.]
1.3 Component (or sub-divided) Bar Diagram
Bars are sub-divided into component parts of
the figure.
These sorts of diagrams are constructed
I. Actual Component Bar Diagrams
Example of actual component bar diagrams
[Figure: actual component bar diagram. Horizontal axis: Marital status (Married, Single, Divorced, Widowed). Vertical axis: Number of women, 0 to 500.]
II. Percentage Component Bar Diagram
Example of percentage component bar diagram
[Figure: percentage component bar diagram. Horizontal axis: Marital status (Married, Single, Divorced, Widowed). Vertical axis: Number of women, scaled 0 to 100.]
2. Pie chart
Shows the relative frequency for each category
by dividing a circle into sectors, the angles of
which are proportional to the relative frequency.
Used for a single categorical variable
Use percentage distributions
Steps to construct a pie-chart
Construct a frequency table
Example: Distribution of deaths for females, in England
and Wales, 1989.
Distribution of cause of death for females, in England and Wales, 1989
[Figure: pie chart with the following sectors]
Circulatory system      42%
Neoplasms               30%
Respiratory system      13%
Others                  8%
Digestive system        4%
Injury and poisoning    3%
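Because the angle of each sector is proportional to the relative frequency, a category's angle is its share of the total multiplied by 360 degrees. The sketch below applies this to the percentages shown in the pie chart for deaths among females in England and Wales, 1989; `sector_angles` is an illustrative helper name.

```python
# Percentages read from the pie chart (cause of death, females, 1989).
shares = {
    "Circulatory system": 42,
    "Neoplasms": 30,
    "Respiratory system": 13,
    "Others": 8,
    "Digestive system": 4,
    "Injury and poisoning": 3,
}


def sector_angles(percentages):
    """Angle in degrees of each sector: 360 times the category's share of the total."""
    total = sum(percentages.values())
    return {name: 360 * value / total for name, value in percentages.items()}


angles = sector_angles(shares)
for name, angle in angles.items():
    print(f"{name:<22}{angle:>7.1f} degrees")
```

Each percentage point corresponds to 3.6 degrees, so the 42% share of circulatory disease takes up about 151 degrees of the circle.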
3. Histogram
Histograms are frequency distributions with
continuous class intervals that have been
turned into graphs.
To construct a histogram, we draw the
interval boundaries on a horizontal line and
the frequencies on a vertical line.
Non-overlapping intervals that cover all of
the data values must be used.
Bars are drawn over the intervals in such a
way that the areas of the bars are all
proportional in the same way to their interval
frequencies.
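The area rule matters most when the class intervals have unequal widths: the height of a bar is then the frequency divided by the interval width, not the frequency itself. The sketch below illustrates this with the leisure-time classes used earlier, where the last two classes (30-34 and 35-39) have been merged into one wider class purely as an assumption for illustration.

```python
# (lower true limit, upper true limit, frequency); the last class is twice as wide.
classes = [
    (9.5, 14.5, 5),
    (14.5, 19.5, 11),
    (19.5, 24.5, 12),
    (24.5, 29.5, 7),
    (29.5, 39.5, 5),  # merged 30-34 and 35-39 (assumption for illustration)
]


def bar_heights(classes):
    """Height of each bar = frequency / interval width, so that area = frequency."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


heights = bar_heights(classes)
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)]

for (lower, upper, freq), h in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {h:.2f} per hour")
```

Drawing the merged class with a height of 5 instead of 0.5 per hour would make it look ten times more crowded than it is, relative to the narrower classes.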
Example: Distribution of the age of women at the time of marriage
[Figure: histogram. Horizontal axis: Age group, with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5. Vertical axis: No of women, 0 to 40.]
Histogram for the ages of 2087 mothers with
<5 children, Adami Tulu, 2003
[Figure: histogram. Horizontal axis: N1AGEMOTH (age of mother), marked from 15.0 to 55.0 in steps of 5. Vertical axis: frequency, up to 700. N = 2087.]
Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective
groups are lost and difficult to reconstruct
4. Stem-and-Leaf Plot
A quick way to organize data to give a visual
impression similar to a histogram while
retaining much more detail on the data.
Serves the same purpose as a histogram and
reveals the presence or absence of symmetry.
Most effective with relatively small data sets.
Not suitable for reports and other
communications, but helps researchers to
understand the nature of their data.
Example
43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29,
36, 66, 72, 41
Stem | Leaf
2 | 2 8 9
3 | 4 6
4 | 1 3 7 9
5 | 1
6 | 1 6
7 | 2 7
8 | 2
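The plot in this example can be produced mechanically: the tens digit of each value is the stem and the units digit is the leaf. The following is a minimal sketch for two-digit whole numbers; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

values = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]


def stem_and_leaf(values):
    """Group sorted values by stem (tens digit), listing the leaves (units digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(values)
# Print every stem in the range, including any stem that has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the original observations can be read back from the plot: stem 4 with leaves 1 3 7 9 stands for 41, 43, 47 and 49.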
Steps to construct Stem-and-Leaf Plots
Example: 3031, 3101, 3265, 3260, 3245, 3200,
3248, 3323, 3314, 3484, 3541, 3649 (birth weight in g)
5. Frequency polygon
A frequency distribution can be portrayed
graphically in yet another way by means
of a frequency polygon.
To draw a frequency polygon we connect
the midpoints of the tops of the bars of
the histogram with straight lines.
The total area under the frequency
polygon is equal to the area under the
histogram
Useful when comparing two or more
frequency distributions by drawing them
on the same diagram
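The equal-area property can be checked numerically. The sketch below uses the mid-points and frequencies of the leisure-time classes from the frequency distribution section and assumes equal class widths, with the polygon closed on the horizontal axis at the mid-points of the empty classes on either side; under those assumptions the two areas agree. The function names are illustrative.

```python
midpoints = [12, 17, 22, 27, 32, 37]
frequencies = [5, 11, 12, 7, 3, 2]
width = 5


def polygon_points(midpoints, frequencies, width):
    """Polygon vertices, closed at zero one class width beyond each end."""
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


def trapezoid_area(points):
    """Area under straight-line segments joining consecutive points."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


points = polygon_points(midpoints, frequencies, width)
polygon_area = trapezoid_area(points)
histogram_area = width * sum(frequencies)

print(polygon_area, histogram_area)
```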
Frequency polygon for the ages of 2087 mothers
with <5 children, Adami Tulu, 2003
[Figure: frequency polygon. Horizontal axis: N1AGEMOTH (age of mother). Vertical axis: frequency, up to 700.]
It can also be drawn without erecting rectangles by joining
the top midpoints of the intervals representing the
frequency of the classes as follows:
[Figure: frequency polygon. Horizontal axis: Age, with class midpoints 12, 17, 22, 27, 32, 37, 42, 47. Vertical axis: No of women, 0 to 40.]
6. Ogive Curve (The Cumulative Frequency Polygon)
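Class interval    Frequency    Relative frequency (%)    Cumulative frequency    Cumulative relative frequency (%)
10-19             3            12                        3                       12
20-29             1            4                         4                       16
30-39             3            12                        7                       28
40-49             0            0                         7                       28
50-59             6            24                        13                      52
60-69             1            4                         14                      56
70-79             9            36                        23                      92
80-89             2            8                         25                      100
Total             25           100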
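The cumulative columns of this table are running totals of the frequencies, and they are what an ogive plots. The sketch below recomputes them. Pairing each cumulative percentage with the upper true limit of its class (limit plus 0.5) is an assumption made here for whole-number data.

```python
from itertools import accumulate

classes = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [3, 1, 3, 0, 6, 1, 9, 2]

# Running totals of the class frequencies.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]
cumulative_percent = [100 * c / n for c in cumulative]

# Points for the ogive: (upper true limit, cumulative relative frequency in %).
# Assumption: whole-number data, so the upper true limit is the upper limit + 0.5.
upper_true_limits = [int(label.split("-")[1]) + 0.5 for label in classes]
ogive_points = list(zip(upper_true_limits, cumulative_percent))

for label, c, pct in zip(classes, cumulative, cumulative_percent):
    print(f"{label:<8}{c:>4}{pct:>8.0f}%")
```

A class with zero frequency, such as 40-49 here, shows up on the ogive as a flat segment.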
Cumulative frequency of 25 ICU patients
7. Box and Whisker Plot
It is another way to display information when
the objective is to illustrate certain locations
(and skewness) in the distribution.
Can be used to display a set of discrete or
defined
A box is drawn with the top of the box at the
third quartile (75%) and the bottom at the
first quartile (25%).
The location of the mid-point (50%) of the
Position of a percentile = p(n+1)th observation, where
p = the required percentile expressed as a proportion
Arrange the numbers in ascending order
A. 1st quartile = 0.25 (n+1)th
B. 2nd quartile = 0.5 (n+1)th
C. 3rd quartile = 0.75 (n+1)th
D. 20th percentile = 0.2 (n+1)th
E. 15th percentile = 0.15 (n+1)th
The pth percentile is a value that is greater than or equal to p% of the
observations and less than or equal to the remaining (100 - p)%.
The pth percentile is:
Given a sample of size n = 60, find the 10th
percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
so the 10th percentile is taken as the average of the 6th and 7th ordered observations
◦ 10% of the observations are less than or equal to
this value and 90% of them are greater than or
equal to the value
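The rule used in this example can be sketched as a small function. It follows the convention shown above: compute the position p(n+1); if it is a whole number take that observation, otherwise take the average of the two neighbouring ordered observations. Other conventions interpolate between the neighbours instead, so statistical software may give slightly different values. The leisure-time data from the frequency distribution section are reused as sample input, and the name `percentile` is illustrative.

```python
def percentile(values, p):
    """p-th percentile (p as a proportion) using the p(n+1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = round(p * (n + 1), 9)  # rounding guards against float noise
    lower = int(position)             # whole part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    if position == lower:
        return ordered[lower - 1]     # positions are 1-based
    # Non-integer position: average the two neighbouring ordered observations.
    return (ordered[lower - 1] + ordered[lower]) / 2


leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

p10 = percentile(leisure_hours, 0.10)  # position 4.1
q1 = percentile(leisure_hours, 0.25)   # position 10.25
q2 = percentile(leisure_hours, 0.50)   # position 20.5
q3 = percentile(leisure_hours, 0.75)   # position 30.75
print(p10, q1, q2, q3)
```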
How can the lower quartile, median and upper quartile
be used to judge the symmetry of a distribution?
Box plots are useful for comparing two or
more groups of observations
8. Line graph
Useful for assessing the trend of a particular situation over time.
Helps in monitoring the trend of epidemics.
The time, in weeks, months or years, is marked along the
horizontal axis, and
values of the quantity being studied are marked on the vertical
axis.
Values for each category are connected by a continuous line.
Sometimes two or more lines are drawn on the same graph
using the same scale so that the plotted lines are
comparable.
No. of microscopically confirmed malaria cases by
species and month at Zeway malaria control unit, 2003
[Figure: line graph with three lines (Positive, P. falciparum, P. vivax). Horizontal axis: Months, Jan to Dec. Vertical axis: No. of confirmed malaria cases, 0 to 2100.]