
Methods of Data Organization and Presentation

This document discusses various methods for organizing and presenting data, including: 1. Frequency distributions which arrange data by observed values and counts. 2. Statistical tables which present data in rows and columns, including one-way, two-way, and higher order tables. 3. Diagrammatic representations such as graphs which visually depict numerical data patterns and trends. Organizing data through these methods facilitates understanding, comparison, and communication of large amounts of information.

Methods of Data Organization and Presentation

1
1. Frequency Distributions
 Ordered array: A simple arrangement of individual observations in
the order of magnitude.
 Very difficult with large sample size

12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67

2
 The actual summarization and organization of
data starts from frequency distribution.

 Frequency distribution: A table which has a list


of each of the possible values that the data can
assume along with the number of times each
value occurs.

3
 For nominal and ordinal data, frequency
distributions are often used as a summary.
 Example:

 The % of times that each value occurs, or the


relative frequency, is often listed
 Tables make it easier to see how the data are

distributed

4
 For both discrete and continuous data, the
values are grouped into non-overlapping
intervals, usually of equal width.

5
a) Qualitative variable: Count the number of
cases in each category.

- Example1: The intensive care unit type of 25


patients entering ICU at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other

6
ICU Type    Frequency (How often)    Relative Frequency
Medical     12                       0.48
Surgical     6                       0.24
Cardiac      5                       0.20
Other        2                       0.08
Total       25                       1.00
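
As a minimal sketch, the relative frequencies in the ICU table can be reproduced by dividing each count by the total. The variable names below are illustrative assumptions, not part of the original slides.

```python
# Counts copied from the ICU type table above.
icu_counts = {"Medical": 12, "Surgical": 6, "Cardiac": 5, "Other": 2}

# Relative frequency = category count / total number of observations.
total = sum(icu_counts.values())
relative = {category: count / total for category, count in icu_counts.items()}

for category, count in icu_counts.items():
    print(f"{category:<10}{count:>4}{relative[category]:>8.2f}")
print(f"{'Total':<10}{total:>4}{sum(relative.values()):>8.2f}")
```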

7
Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender, 1 = male, 2 = female

Gender Frequency (n) Relative Frequency


Male (1) 110 47.0%
Female (2) 124 53.0%
Total 234 100%

8
b) Quantitative variable:
 Select a set of continuous, non-overlapping
intervals such that each value can be placed in
one, and only one, of the intervals.
 The first consideration is how many intervals to
include

9
For a continuous variable
(e.g. – age), the frequency
distribution of the individual
ages is not so interesting.

10
• We “see more” in frequencies
of age values in “groupings”.
Here, 10 year groupings make
sense.
• This kind of frequency
distribution is called Grouped
frequency distribution

11
 The following are some rules that are
generally used to construct grouped
frequency distribution:

1. Determine the number of classes (k).
2. Determine the length or width of the class interval.
3. Determine the class limits.

12
To determine the number of class intervals and the corresponding width, we may use Sturges' rule:

$$K = 1 + 3.322\,\log_{10} n$$

$$W = \frac{L - S}{K}$$

where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value

13
Example:
◦ Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 - 10)/6 = 4.67 ≈ 5
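
The calculation above can be checked with a short sketch. It assumes the logarithm is base 10 and that the width is rounded up to the next whole hour so that the classes cover the largest value; the variable names are illustrative.

```python
import math

# Leisure time (hours) per week for the 40 students listed above.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)

# Number of classes: K = 1 + 3.322 * log10(n), rounded to a whole number.
k_exact = 1 + 3.322 * math.log10(n)
k = round(k_exact)

# Class width: (largest - smallest) / K, rounded up (an assumption here).
width_exact = (max(leisure_hours) - min(leisure_hours)) / k
width = math.ceil(width_exact)

print(f"n = {n}, K = {k_exact:.2f} -> {k}, W = {width_exact:.2f} -> {width}")
```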

14
Time (Hours)   Frequency   Relative Frequency   Cumulative Relative Frequency
10-14           5          0.125                0.125
15-19          11          0.275                0.400
20-24          12          0.300                0.700
25-29           7          0.175                0.875
30-34           3          0.075                0.950
35-39           2          0.050                1.000

Total          40          1.000
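
The grouped table can be rebuilt from the raw leisure-time data with a sketch like the following. It assumes the first class starts at 10 and uses the width of 5 and the 6 classes found earlier; names such as `rows` are illustrative.

```python
# Leisure time (hours) per week for the 40 students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

lower, width, k = 10, 5, 6   # starting limit, class width, number of classes
n = len(leisure_hours)

rows = []
cumulative = 0
for i in range(k):
    lo = lower + i * width
    hi = lo + width - 1                      # classes such as 10-14, 15-19
    freq = sum(lo <= x <= hi for x in leisure_hours)
    cumulative += freq
    rows.append((f"{lo}-{hi}", freq, freq / n, cumulative / n))

for label, freq, rel, cum_rel in rows:
    print(f"{label:<8}{freq:>4}{rel:>8.3f}{cum_rel:>8.3f}")
```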
15
 Cumulative frequency: The running total obtained by adding the
frequency of a class to the frequencies of all classes below it.

 Cumulative relative frequency: The


percentage of the total number of
observations that have a value either in that
interval or below it.

 Mid-point: The value of the interval which


lies midway between the lower and the
upper limits of a class.

16
 True limits: Are those limits that make an
interval of a continuous variable continuous in
both directions

 Used for smoothening of the class intervals

 Subtract 0.5 from the lower and add it to the


upper limit

17
Time (Hours)   True limit     Mid-point   Frequency
10-14          9.5 – 14.5     12           5
15-19          14.5 – 19.5    17          11
20-24          19.5 – 24.5    22          12
25-29          24.5 – 29.5    27           7
30-34          29.5 – 34.5    32           3
35-39          34.5 – 39.5    37           2

Total                                     40
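
A short sketch of how the true limits and mid-points in this table follow from the class limits, assuming whole-hour data so that 0.5 is the right adjustment; the list names are illustrative.

```python
# Class limits from the leisure-time table.
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

table = []
for lo, hi in class_limits:
    true_lo = lo - 0.5            # subtract 0.5 from the lower limit
    true_hi = hi + 0.5            # add 0.5 to the upper limit
    midpoint = (lo + hi) / 2      # midway between lower and upper limits
    table.append((true_lo, true_hi, midpoint))

for (lo, hi), (true_lo, true_hi, midpoint) in zip(class_limits, table):
    print(f"{lo}-{hi}: true limits {true_lo} to {true_hi}, mid-point {midpoint}")
```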

18
2. Statistical Tables

 A statistical table is an orderly and


systematic presentation of numerical data in
rows and columns.

 Rows (stubs) are horizontal and columns


(captions) are vertical arrangements.

19
2. Statistical Tables

A) Simple or one-way table:


 The simple frequency table is used when the
individual observations involve only a
single variable,
 whereas cross tabulation is used to
obtain the frequency distribution of one
variable by the subsets of another variable

20
Table 1: Overall immunization status of children in Adami Tulu Woreda,
Feb. 1995

Immunization status Number Percent

Not immunized 75 35.7

Partially immunized 57 27.1

Fully immunized 78 37.2

Total 210 100.0

Source: Fikru T et al. EPI Coverage in Adami Tulu. Eth J Health Dev 1997;11(2): 109-113

21
2. Statistical Tables

B) Two-way table:

 This table shows two characteristics and is


formed when either the caption or the stub is
divided into two or more parts.
 

22
Table 2: TT immunization by marital status of the women of childbearing age,
Assendabo town, Jimma Zone, 1996

                 Immunization Status
Marital Status   Immunized          Non Immunized      Total
                 No.      %         No.      %
Single            58     24.7       177     75.3        235
Married          156     34.7       294     65.3        450
Divorced          10     35.7        18     64.3         28
Widowed            7     50.0         7     50.0         14

Source: Mikael A. et al. Tetanus Toxoid immunization coverage among women of child bearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20

23
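
The row percentages in Table 2 can be reproduced from the counts alone. This is a sketch with illustrative names; it assumes percentages are taken within each marital status and rounded to one decimal place.

```python
# (immunized, non-immunized) counts per marital status, from Table 2.
counts = {
    "Single": (58, 177),
    "Married": (156, 294),
    "Divorced": (10, 18),
    "Widowed": (7, 7),
}

row_pct = {}
for status, (immunized, non_immunized) in counts.items():
    row_total = immunized + non_immunized
    row_pct[status] = (
        round(100 * immunized / row_total, 1),
        round(100 * non_immunized / row_total, 1),
        row_total,
    )

for status, (pct_imm, pct_non, row_total) in row_pct.items():
    print(f"{status:<10}{pct_imm:>6}{pct_non:>6}{row_total:>6}")
```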
2. Statistical Tables

C) Higher Order Table:

 When it is desired to represent three or more


characteristics in a single table.

24
Table 3: Distribution of Health Professional by Sex and
Residence

                      Residence
Profession  Sex       Urban No. (%)   Rural No. (%)   Total No. (%)
Doctors     Male       8 (10.0)        35 (21.0)       43 (17.7)
            Female     2 (3.0)         16 (10.0)       18 (7.4)
Nurses      Male      46 (58.0)        36 (22.0)       82 (33.7)
            Female    23 (29.0)        77 (47.0)      100 (41.2)
Total                 79 (100.0)      164 (100.0)     243 (100.0)

25
Guidelines for constructing tables
 Keep them simple,
 Limit the number of variables to three or less,
 All tables should be self-explanatory,
 Include clear title telling what, when and where,
 Clearly label the rows and columns,
 State clearly the unit of measurement used,
 Explain codes and abbreviations in the foot-note,
 Show totals,
 If data is not original, indicate the source in foot-note.

26
3. Diagrammatic Representation of
Data

 Pictorial representations of numerical data

27
Importance of diagrammatic representation:

1. Diagrams are more attractive than mere figures.
2. They give a quick overall impression of the data.
3. They are easier to remember than mere figures.
4. They facilitate comparison.
5. They help in understanding patterns and trends.

28
 Well-designed graphs can be a powerful means
of communicating a great deal of information

 When graphs are poorly designed, they not
only convey the message ineffectively, but they
are often misleading.

29
Specific types of graphs include:
For nominal, ordinal data:
 Bar graph
 Pie chart
For quantitative data:
 Histogram
 Stem-and-leaf plot
 Box plot
 Line graph

30
1. Bar charts (or graphs)
 Categories are listed on the horizontal axis
(X-axis)
 Frequencies or relative frequencies are

represented on the Y-axis (ordinate)


 The height of each bar is proportional to the

frequency or relative frequency of


observations in that category

31
Method of constructing bar chart
 All the bars must have equal width
 The bars are not joined together (leave

space between bars)


 The different bars should be separated by

equal distances
 All the bars should rest on the same line

called the base


 Label both axes clearly

32
1. Bar charts (or graphs)
There are different types of bar diagrams, the
most important ones are:
Simple bar chart
Multiple bar chart
Component (or sub-divided) Bar Diagram

33
1.1 Simple Bar charts (or graphs)

 It is a one-dimensional diagram in which the


bar represents the whole of the magnitude.
 The height or length of each bar indicates the

size (frequency) of the figure represented.

34
Example of simple bar diagram
[Figure: simple bar chart; x-axis: Immunization status (Not immunized, Partially immunized, Fully immunized); y-axis: Number of children, 0 to 100]
Fig. 1. Immunization status of Children in Adami Tulu Woreda, Feb. 1995

35
Example: Construct a simple bar chart for the following data.

Distribution of patients in hospital by source of referral


Source of referral        No. of patients   Relative freq. (%)
Other hospital               97               5.1
General practitioner        769              40.3
Out-patient department      623              32.7
Casualty                    256              13.4
Other                       161               8.5
Total                     1 906             100.0

36
Distribution of patients in hospital X by source of referral, 1999
[Figure: simple bar chart; x-axis: Source of referral; y-axis: No. of patients, 0 to 800; bars: Other hospital 97, GP 769, OPD 623, Casualty 256, Other 161]

37
1.2 Multiple bar chart
 In this type of chart the component figures
are shown as separate bars adjoining each
other
 The height of each bar represents the actual

value of the component figure.


 It depicts distributional pattern of more than

one variable

38
[Figure: multiple bar chart; x-axis: Marital status (Married, Single, Divorced, Widowed); y-axis: Number of women, 0 to 350; legend: Immunized, Not immunized]
Fig. 2 TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996

39
1.3 Component (or sub-divided) Bar
Diagram
 Bars are sub-divided into component parts of
the figure.
 These sorts of diagrams are constructed

when each total is built up from two or more


component figures. They can be of two kind:
I. Actual Component Bar Diagrams
II. Percentage Component Bar Diagram

40
I. Actual Component Bar Diagrams

 The overall height of the bars and the
individual component lengths represent
actual figures.

41
Example of actual component bar diagrams
[Figure: component bar chart; x-axis: Marital status (Married, Single, Divorced, Widowed); y-axis: Number of women, 0 to 500; legend: Immunized, Not immunized]
Fig. 3 TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996

42
II. Percentage Component Bar Diagram

 The individual component lengths
represent the percentage that each component
forms of the overall total.
 Note that a series of such bars will all be the
same total height, i.e., 100 percent.

43
Example of percentage component bar diagram
[Figure: percentage component bar chart; x-axis: Marital status (Married, Single, Divorced, Widowed); y-axis labelled "Number of women", scale 0 to 100; legend: Immunized, Not immunized]
Fig. 4 TT Immunization status by marital status of women 15-49 years, Asendabo town, 1996

44
2. Pie chart
 Shows the relative frequency for each category
by dividing a circle into sectors, the angles of
which are proportional to the relative frequency.
 Used for a single categorical variable
 Use percentage distributions

45
Steps to construct a pie-chart
 Construct a frequency table

 Change the frequency into percentage (P)

 Change the percentages into degrees, where:


degrees = (P / 100) × 360°

 Draw a circle and divide it accordingly

46
Example: Distribution of deaths for females, in England
and Wales, 1989.

Cause of death No. of deaths


Circulatory system 100 000
Neoplasm 70 000
Respiratory system 30 000
Injury and poisoning 6 000
Digestive system 10 000
Others 20 000

Total 236 000
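
As a sketch of the pie-chart steps applied to this table, each frequency is converted to a percentage and then to a sector angle out of 360 degrees. The dictionary and variable names are illustrative assumptions.

```python
# Number of deaths by cause, from the table above.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

# For each cause: (percentage of total, sector angle in degrees).
sectors = {
    cause: (100 * count / total, 360 * count / total)
    for cause, count in deaths.items()
}

for cause, (percent, angle) in sectors.items():
    print(f"{cause:<22}{percent:>6.1f}%{angle:>8.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no category was left out.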

47
Distribution of cause of death for females, in England and Wales, 1989
[Figure: pie chart; sectors: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive System 4%, Injury and Poisoning 3%]

48
3. Histogram
 Histograms are frequency distributions with
continuous class intervals that have been
turned into graphs.
 To construct a histogram, we draw the
interval boundaries on a horizontal line and
the frequencies on a vertical line.
 Non-overlapping intervals that cover all of
the data values must be used.

49
 Bars are drawn over the intervals in such a
way that the area of each bar is proportional
to the frequency of observations in that
interval.

50
Example: Distribution of the age of women at the time of marriage

Age group   15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number      11      36      28      13      7       3       2

[Figure: histogram titled "Age of women at the time of marriage"; x-axis: Age group, true limits 14.5-19.5 through 44.5-49.5; y-axis: No of women, 0 to 40]

51
Histogram for the ages of 2087 mothers with <5 children, Adami Tulu, 2003
[Figure: histogram; x-axis: N1AGEMOTH, 15.0 to 55.0; y-axis: frequency, 0 to 700; Mean = 27.6, Std. Dev = 6.13, N = 2087]

52
Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective
groups are lost and difficult to reconstruct

⇒ The other graphic display (stem-and-leaf plot) overcomes these problems

53
4. Stem-and-Leaf Plot
 A quick way to organize data to give visual
impression similar to a histogram while
retaining much more detail on the data.
 Similar to histogram and serves the same
purpose and reveals the presence or absence of
symmetry
 Are most effective with relatively small data sets
 Are not suitable for reports and other
communications, but
 Help researchers to understand the nature of
their data
54
Example
 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29,
36, 66, 72, 41

2 | 2 8 9
3 | 4 6
4 | 1 3 7 9
5 | 1
6 | 1 6
7 | 2 7
8 | 2

55
Steps to construct Stem-and-Leaf Plots

1. Separate each data point into stem and leaf components
• Stem = consists of one or more of the initial digits
of the measurement
• Leaf = consists of the rightmost digit
The stem of the number 483, for example, is 48 and
the leaf is 3.
2. Write the smallest stem in the data set in the
upper left-hand corner of the plot

56
Steps to construct Stem-and-Leaf Plots

3. Write the second stem (first stem +1) below


the first stem
4. Continue with the remaining stems until
you reach the largest stem in the data set
5. Draw a vertical bar to the right of the
column of stems
6. For each number in the data set, find the
appropriate stem and write the leaf to the
right of the vertical bar
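
The steps above can be sketched in a few lines. This version assumes non-negative whole numbers, takes the rightmost digit as the leaf, and sorts the leaves within each stem; `stem_and_leaf` is an illustrative name, and the data are the 15 values from the earlier example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} with the rightmost digit as the leaf."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 43 -> stem 4, leaf 3
        plot[stem].append(leaf)
    # List every stem from the smallest to the largest, even if empty.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


data = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```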

57
Example: 3031, 3101, 3265, 3260, 3245, 3200,
3248, 3323, 3314, 3484, 3541, 3649 (BWT in g)

Stem | Leaf             Number
30   | 31               1
31   | 01               1
32   | 65 60 45 00 48   5
33   | 23 14            2
34   | 84               1
35   | 41               1
36   | 49               1

58
5. Frequency polygon
 A frequency distribution can be portrayed
graphically in yet another way by means
of a frequency polygon.
 To draw a frequency polygon we connect
the mid-points of the tops of the bars of
the histogram with straight lines.
 The total area under the frequency
polygon is equal to the area under the
histogram
 Useful when comparing two or more
frequency distributions by drawing them
on the same diagram
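
A small sketch of the polygon's construction, using the age-at-marriage frequencies from the histogram example. It assumes the polygon is closed by adding one empty class at each end, which is what makes its area equal to that of the histogram; all names are illustrative.

```python
# Age-at-marriage classes and frequencies from the histogram example.
class_limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
frequencies = [11, 36, 28, 13, 7, 3, 2]
width = 5  # width of each class, measured between true limits

midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# Close the polygon with a zero-frequency point one class width beyond each end.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area under the polygon (trapezoid rule) versus area of the histogram bars.
polygon_area = sum(
    (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1)
)
histogram_area = width * sum(frequencies)

print(list(zip(xs, ys)))
print(polygon_area, histogram_area)
```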
59
Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003
[Figure: frequency polygon; x-axis: N1AGEMOTH, 15.0 to 55.0; y-axis: frequency, 0 to 700; Mean = 27.6, Std. Dev = 6.13, N = 2087]

60
It can also be drawn without erecting rectangles, by joining the
points plotted above the class midpoints at heights equal to the
class frequencies, as follows:

[Figure: frequency polygon titled "Age of women at the time of marriage"; x-axis: Age, midpoints 12 to 47; y-axis: No of women, 0 to 40]

61
6. Ogive Curve (The Cumulative Frequency Polygon)

 Sometimes it may be necessary to know the
number of items whose values are more or less
than a certain amount.
 We may, for example, be interested to know the
no. of patients whose weight is <50 Kg or >60
Kg.
 To get this information it is necessary to change
the form of the frequency distribution from a
‘simple’ to a ‘cumulative’ distribution.
 An ogive curve turns a cumulative frequency
distribution into a graph.
 Are much more common than frequency
polygons
62
Cumulative Frequency and Cum. Rel. Freq. of Age of 25 ICU Patients

Age Interval   Frequency   Relative Frequency (%)   Cumulative Frequency   Cumulative Rel. Freq. (%)
10-19           3          12                        3                      12
20-29           1           4                        4                      16
30-39           3          12                        7                      28
40-49           0           0                        7                      28
50-59           6          24                       13                      52
60-69           1           4                       14                      56
70-79           9          36                       23                      92
80-89           2           8                       25                     100

Total          25         100
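
The cumulative columns of this table are running totals, which can be sketched with `itertools.accumulate` from the Python standard library. The list names are illustrative.

```python
from itertools import accumulate

# Age intervals and frequencies of the 25 ICU patients, from the table above.
age_intervals = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
freq = [3, 1, 3, 0, 6, 1, 9, 2]

n = sum(freq)
cum_freq = list(accumulate(freq))            # running total of frequencies
rel_pct = [100 * f / n for f in freq]        # relative frequency (%)
cum_rel_pct = [100 * c / n for c in cum_freq]  # cumulative relative frequency (%)

for row in zip(age_intervals, freq, rel_pct, cum_freq, cum_rel_pct):
    print("{:<8}{:>4}{:>6.0f}{:>5}{:>6.0f}".format(*row))
```

Plotting `cum_freq` (or `cum_rel_pct`) against the upper limit of each interval gives the points of the ogive.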
63
Cumulative frequency of 25 ICU patients

64
7. Box and Whisker Plot
 It is another way to display information when
the objective is to illustrate certain locations
in the distribution and its skewness.
 Can be used to display a set of discrete or

continuous observations using a single


vertical axis – only certain summaries of the
data are shown
 First the percentiles of the data set must be

defined

65
 A box is drawn with the top of the box at the
third quartile (75%) and the bottom at the
first quartile (25%).
 The location of the mid-point (50%) of the

distribution is indicated with a horizontal


line in the box.
 Finally, straight lines, or whiskers, are drawn

from the centre of the top of the box to the


largest observation and from the centre of
the bottom of the box to the smallest
observation.

66
 Position of a percentile = p(n+1), where p = the required
percentile expressed as a proportion
 Arrange the numbers in ascending order
A. 1st quartile = 0.25 (n+1)th
B. 2nd quartile = 0.5 (n+1)th
C. 3rd quartile = 0.75 (n+1)th
D. 20th percentile = 0.2 (n+1)th
E. 15th percentile = 0.15 (n+1)th

67
 The pth percentile is a value that is ≥ p% of the
observations and ≤ the remaining (100-p)%.
 The pth percentile is:
◦ The observation at position p(n+1) if p(n+1)
is an integer
◦ The average of the (k)th and (k+1)th observations if p(n+1)
is not an integer, where k is the largest integer less
than p(n+1).
 If p(n+1) = 3.6, the average of the 3rd and 4th observations

68
 Given a sample of size n = 60, find the 10th
percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
= Average of 6th and 7th
◦ 10% of the observations are less than or equal to
this value and 90% of them are greater than or
equal to the value
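
The rule can be sketched as a small function. This follows the averaging rule stated on these slides rather than the linear interpolation that many software packages use; the function name `percentile` and the sample values 1 to 60 are hypothetical, chosen only to give a sample of size 60.

```python
import math


def percentile(data, p):
    """pth percentile (p as a proportion) using the p(n+1) position rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1)
    if position < 1 or position > n:
        raise ValueError("p(n+1) falls outside the range of the data")
    nearest = round(position)
    if math.isclose(position, nearest):
        # p(n+1) is an integer: take that observation (1-based position).
        return ordered[nearest - 1]
    # Otherwise average the kth and (k+1)th observations, k = floor(p(n+1)).
    k = math.floor(position)
    return (ordered[k - 1] + ordered[k]) / 2


sample = list(range(1, 61))          # hypothetical sample, n = 60
print(percentile(sample, 0.10))      # position 6.1: average of 6th and 7th
```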

69
How can the lower quartile, median and upper quartile
be used to judge the symmetry of a distribution?

1. If the distribution is symmetric, then the upper


and lower quartiles should be approximately
equally spaced from the median.

2. If the upper quartile is farther from the median


than the lower quartile, then the distribution is
positively skewed.

3. If the lower quartile is farther from the median


than the upper quartile, then the distribution is
negatively skewed.
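
The three rules can be written as a simple comparison of the two quartile distances. This is only a sketch: `judge_symmetry` and its `tolerance` argument (how close counts as "approximately equal") are hypothetical, and the quartile values in the usage lines are made up for illustration.

```python
def judge_symmetry(q1, median, q3, tolerance=0.0):
    """Classify a distribution from its lower quartile, median and upper quartile."""
    upper_gap = q3 - median      # distance from median to upper quartile
    lower_gap = median - q1      # distance from median to lower quartile
    if abs(upper_gap - lower_gap) <= tolerance:
        return "approximately symmetric"
    if upper_gap > lower_gap:
        return "positively skewed"
    return "negatively skewed"


# Made-up quartile values, one for each of the three cases.
print(judge_symmetry(10, 20, 30))
print(judge_symmetry(10, 12, 30))
print(judge_symmetry(10, 28, 30))
```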

70
71
Box plots are useful for comparing two or
more groups of observations

72
8. Line graph
 Useful for assessing the trend of a particular situation over time.
 Helps in monitoring the trend of epidemics.
 The time, in weeks, months or years, is marked along the
horizontal axis, and
 Values of the quantity being studied are marked on the vertical
axis.
 Values for each category are connected by a continuous line.
 Sometimes two or more lines are drawn on the same graph
using the same scale so that the plotted lines are
comparable.

73
No. of microscopically confirmed malaria cases by species and month at Zeway malaria control unit, 2003
[Figure: line graph; x-axis: Months (Jan to Dec); y-axis: No. of confirmed malaria cases, 0 to 2100; lines: Positive, P. falciparum, P. vivax]

74
75
