Chapter 1 Biostat Descriptive Statistics

Chapter 1 covers descriptive statistics, focusing on the organization, summarization, and presentation of data, as well as the classification of variables into categorical and quantitative types. It outlines various data collection methods, including primary and secondary data, and emphasizes the importance of clear and valid questionnaire design. The chapter also defines key statistical terms and measurement scales, providing a foundation for understanding biostatistics in health-related research.

Uploaded by

gbekele193

Chapter 1

Descriptive Statistics

04/17/25 1
Learning Objectives
At the end of this chapter, students will be able to:
• Define and identify the different types of data and
understand why we need to classify variables.
• Identify the different methods of data collection and the
criteria used to select a method of data collection.
• Define a questionnaire, identify the different parts of a
questionnaire and indicate the procedures for preparing a
questionnaire.
• Identify the different methods of data organization and
presentation.
• Understand the criteria for selecting a method to
organize and present data.
• Identify the different methods of data summarization.

Descriptive Statistics
• Techniques used to organize and
summarize a set of data in a concise way.
– Organization of data
– Summarization of data
– Presentation of data
• Numbers that have not been summarized
and organized are called raw data.

Organization of data
• arranging data in a way that makes it
easy to find and use.
• This can be done by creating a filing
system, using spreadsheets or
databases, or by using a data
visualization tool
• Summarization of data is the
process of reducing the amount of data
to its most important points.
• This can be done by creating a table of
contents, an executive summary,
or by using a data reduction tool.
Presentation of data
• the process of communicating data to
others in a way that is easy to
understand.
• This can be done by creating a report,
a presentation, or by using a data
visualization tool.

Definition of Terms
• Biostatistics is the application of
statistical techniques to scientific
research in health-related fields,
including medicine, biology, and public
health.
• It involves the collection, analysis, and
interpretation of biological data,
especially data relating to human
biology, health, and medicine

Descriptive statistics
• refers to the analysis, summary, and
communication of findings that
describe a data set.
• It involves measures of central
tendency (mean, median, mode) and
measures of variability (range,
variance, standard deviation)
• These measures are used to summarize and present
data concisely and meaningfully,
aiding in the understanding of the
central tendency, dispersion, and
shape of the distribution of a dataset
Statistics
• in the context of Biostatistics refers to the
application of statistical methods to biological,
medical, and health-related data.
• It involves the collection, analysis, presentation,
and interpretation of data specifically within
these fields.

Descriptive statistics include:
• Tables

• Graphs

• Numerical summary measures

- Measures of central tendency

- Measures of variability/Dispersion

• Before summarization and organization, we
need to know the types of variables and
measurement scales of our data.

• Before displaying or analyzing data, classify
the variables into their different types.

Variable
• Variable: A characteristic which takes
different values in different persons, places,
or things.
• Any aspect of an individual or object that is
measured (e.g., BP) or recorded (e.g., age,
sex) and takes any value.
• There may be one variable in a study or
many.
• E.g., A study of treatment outcome of TB
• Variables can be broadly classified
into:
– Categorical (or Qualitative) or
– Quantitative (or numerical variables).

• Categorical variable: A variable or
characteristic which cannot be measured in
quantitative form but can only be sorted by
name or category.

• Cannot be measured in the way we measure
height or weight.

• The notion of magnitude is absent or implicit.

• Quantitative variable: A variable that can
be measured (or counted) and expressed
numerically.

• E.g., height, weight, number of children, etc.

• Has the notion of magnitude.

Quantitative variable is divided into two:
1. Discrete: It can only have a limited number of
discrete values (usually whole numbers).
– E.g., the number of episodes of diarrhoea a child has
had in a year. You can’t have 12.5 episodes of diarrhoea
• Characterized by gaps or interruptions in the
values (integers).
• Both the order and magnitude of the values matter.
• The values aren’t just labels, but are actual
measurable quantities.

2. Continuous variable: It can have an
infinite number of possible values in any
given interval.
• Both the magnitude and the order of the
values matter
• Does not possess the gaps or interruptions
• Weight is continuous since it can take on
any number of values (e.g., 34.575 Kg).

SUMMARY: Types of variables

Variable
  Qualitative (categorical)
    Nominal (not ordered), e.g. ethnic group
    Ordinal (ordered), e.g. response to treatment
  Quantitative (measurement)
    Discrete (count data), e.g. number of admissions
    Continuous (real-valued), e.g. height


Scales of measurement
• Not all measurements are of the same kind.
• Measuring weight: e.g., 40 kg.
• Measuring the status of a patient on a scale:
“improved”, “stable”, “not improved”.
• There are four types of scales of
measurement.

1. Nominal scale:
• Data that represent categories or names. There is
no implied order to the categories of nominal
data.
• The simplest type of data, in which the values fall
into unordered categories or classes
• Consists of “naming” observations or classifying
them into various mutually exclusive and
collectively exhaustive categories
• Uses names, labels, or symbols to assign each
measurement.
• Each item must fit into exactly one category.
– Examples: Blood type, sex, race, marital status, etc.
Example of nominal scale:

Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other

• The numbers have NO meaning
• They are labels only

• If nominal data can take on only two
possible values, they are called
dichotomous or binary.
• So sex is not just nominal, it is
dichotomous (male or female).
• Yes/no questions
– E.g., cured from TB at 6 months of Rx

2. Ordinal scale:
• Assigns each measurement to one of a limited
number of categories that are ranked in terms of
order.
• The spaces or intervals between the categories are
not necessarily equal.
• Although non-numerical, can be considered to
have a natural ordering
– Examples: Patient status, cancer stages,
social class, etc.
Example of ordinal scale:

• Pain level:
1. None
2. Mild
3. Moderate
4. Severe

• The numbers have LIMITED meaning: 4>3>2>1 is all we
know apart from their utility as labels

3. Interval scale:
- In interval data the intervals between values are
the same.
- Measured on a continuum and differences
between any two numbers on a scale are of known
size.
Example: Temperature in °F on 4 consecutive days
Days: A B C D
Temp. (°F): 50 55 60 65
For these data, not only is day A with 50°F cooler
than day D with 65°F, but it is 15°F cooler.
- It has no true zero point. “0” is arbitrarily chosen
and doesn’t reflect the absence of temperature.
4. Ratio scale:
- Measurement begins at a true zero point and
the scale has equal space.
- Examples: Height, age, weight, BP, etc.
– The absence of negative numbers and
the presence of a true zero point are
key characteristics of a ratio scale.
– Ratio scales are the highest level of measurement.
Note on meaningfulness of “ratio”:
– Someone who weighs 80 kg is two times as
heavy as someone else who weighs 40 kg. This
is true even if weight had been measured in
other units.
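The difference between interval and ratio scales can be checked numerically. The sketch below is illustrative only: it reuses the 50°F and 65°F temperatures and the 40 kg and 80 kg weights from the slides, and the helper names and the pound conversion factor are assumptions introduced here.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def kg_to_lb(weight_kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return weight_kg * 2.20462


# Ratio scale: the ratio of two weights is the same in any unit.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

# Interval scale: the ratio of two temperatures changes with the unit,
# because the zero point is arbitrary.
ratio_f = 65 / 50
ratio_c = fahrenheit_to_celsius(65) / fahrenheit_to_celsius(50)

print("weight ratio in kg and lb:", ratio_kg, round(ratio_lb, 6))
print("temperature ratio in F and C:", round(ratio_f, 2), round(ratio_c, 2))
```

The weight ratio stays at 2 in both units, while the temperature "ratio" differs between Fahrenheit and Celsius, which is why ratios are only meaningful on a ratio scale.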
Degree of precision in measuring

Nominal

Ordinal

Interval

Ratio

Method of Data collection

Introduction
Before any statistical work can be done data must
be collected. Depending on the type of variable
and the objective of the study, different data
collection methods can be employed.
Data Collection Methods
Data collection techniques allow us to
systematically collect data about our objects of
study (people, objects, and phenomena) and
about the setting in which they occur.
• In the collection of data we have to be
systematic. If data are collected haphazardly, it
will be difficult to answer our research questions
in a conclusive way.
Various data collection techniques can be used,
such as:
• Observation
• Face-to-face interviews
• Self-administered interviews
• Postal or mail method and telephone interviews
• Using available information
• Focus group discussions (FGD)
• In-depth interviews
• Other data collection techniques: rapid
appraisal techniques, nominal group techniques,
Delphi techniques, life histories, case studies, etc.
Problems in gathering data
→Language barriers
→Lack of adequate time
→Expense
→Inadequately trained and experienced staff
→ Invasion of privacy
→Suspicion/doubt
→Bias (spatial(r/ship), project, person, season,
diplomatic, professional)
→Cultural norms (e.g. which may preclude
men interviewing women)
Choosing a Method of Data Collection
◊ Decision-makers need information that is
relevant, timely, accurate and usable.
◊ The cost of obtaining, processing and
analyzing these data is high.
◊ The challenge is to find ways, which lead to
information that is cost-effective, relevant,
timely and important for immediate use.
◊ The statistical data may be classified under
two categories, depending upon the sources.
◊ 1) Primary data 2) Secondary data

Primary Data
• Are those data, which are collected by the
investigator himself for the purpose of a specific
inquiry or study.
•Such data are original in character and are mostly
generated by surveys conducted by individuals or
research institutions.
•High response rates might be obtained since the
answers to various questions are obtained on the
spot.
• It permits explanation of questions concerning
difficult subject matter.

Secondary Data: When an investigator
uses data, which have already been
collected by others, such data are called
"Secondary Data".
Such data are primary data for the agency
that collected them, and become secondary
for someone else who uses these data for
his own purposes.
• Secondary data can be obtained from
journals, reports, government publications,
publications of professionals and research
organizations.
Secondary data are less expensive to collect
both in money and time.
These data can also be better utilized and
sometimes the quality of such data may be
better because these might have been
collected by persons who were specially
trained for that purpose.
On the other hand, such data must be used
with great care, because such data may also
be full of errors due to the fact that the
purpose of the collection of the data by the
primary agency may have been different from
the purpose of the user of these secondary
data.
Secondly, there may have been bias
introduced, the size of the sample may have
been inadequate, or there may have been
arithmetic or definition errors, hence, it is
necessary to critically investigate the validity
of the secondary data.
In general, the choice of methods of data
collection is largely based on the accuracy
of the information they yield.
In this context, ‘accuracy’ refers not only to
correspondence between the information and
objective reality - although this certainly
enters into the concept - but also to the
information’s relevance.
The selection of the method of data collection
is also based on practical considerations,
such as:
• The need for personnel, skills, equipment, etc.
• The acceptability of the procedures to the
subjects
• The probability that the method will provide
good coverage
• The investigator’s familiarity with a study
procedure may also be a valid consideration.
Types of Questions
Depending on how questions are asked and
recorded, we can distinguish two major
possibilities: open-ended questions and
closed questions.
Open-ended questions
Open-ended questions permit free responses
that should be recorded in the respondent’s own
words.
The respondent is not given any possible
answers to choose from.
Such questions are useful to obtain information
on: Facts with which the researcher is not very
familiar, Opinions, attitudes, and suggestions of
informants, or Sensitive issues.
For example
•“Can you describe exactly what the traditional birth
attendant did when your labor started?”
•“What do you think are the reasons for a high drop-
out rate of village health committee members?”
•“What would you do if you noticed that your daughter
(school girl) had a relationship with a teacher?”

Closed ended Questions
Closed questions offer a list of possible
options or answers from which the
respondents must choose.
When designing closed questions one should
try to:
• Offer a list of options that are exhaustive
and mutually exclusive
• Keep the number of options as few as possible.
Closed questions are useful if the range of
possible responses is known.

For example
1. “What is your marital status?”
a) Single
b) Married/living together
c) Separated/divorced/widowed
2. “Have you ever gone to the local village
health worker for treatment?”
a) Yes
b) No

• Closed questions may also be used if one is
only interested in certain aspects of an issue
and does not want to waste the time of the
respondent and interviewer by obtaining
more information than one needs.

• Closed questions may be used as well to get
the respondents to express their opinions by
choosing rating points on a scale.

Requirements of questions
• Must have face validity – that is, the question
that we design should be one that gives an
obviously valid and relevant measurement for
the variable.
• Must be clear and unambiguous – the way in
which questions are worded can ‘make or
break’ a questionnaire.
– They must be phrased in language that the
respondent is expected to understand, and that all
respondents will understand in the same way.
– To ensure clarity, each question should contain
only one idea; ‘double- barreled’ questions like ‘Do
you take your child to a doctor when he has a cold or
has diarrhea?’ are difficult to answer, and the
answers are difficult to interpret.
• Must not be offensive – whenever possible
it is wise to avoid questions that may offend
the respondent, for example those that deal
with intimate matters, those which may seem
to expose the respondent’s ignorance, and
those requiring him to give a socially
unacceptable answer.
• The questions should be fair - They should
not be phrased in a way that suggests a
specific answer, and should not be loaded.
– Short questions are generally regarded as
preferable to long ones.

• Sensitive questions - It may not be
possible to avoid asking ‘sensitive’
questions that may offend respondents,
e.g. those that seem to expose the
respondent’s ignorance. In such situations
the interviewer (questioner) should do it
very carefully and wisely

Methods of Data Organization and
Presentation

Methods of data organization and
presentation

• The data collected in a survey are called
raw data.
• In most cases, useful information is not
immediately evident from the mass of
unsorted data.
• Collected data need to be organized in
such a way as to condense the information
they contain in a way that will show
patterns of variation clearly.
Precise methods of analysis can be
decided upon only when the
characteristics of the data are understood.
For this primary objective, different
techniques of data organization and
presentation, such as ordered arrays, tables and
diagrams, are used.

Generally Summarizing and organizing data
can be achieved through:
1. Frequency Distributions
2. Graphical Representations
3. Measures of Central Tendency
4. Measures of variability

Frequency Distributions
o For data to be more easily appreciated and to draw
quick comparisons, it is often useful to arrange the
data in the form of a table, or in one of a number of
different graphical forms.
o When analyzing voluminous data collected from
say, a health center's records, it is quite useful to
put them into compact tables.
o Quite often, the presentation of data in a
meaningful way is done by preparing a frequency
distribution.
o If this is not done the raw data will not present any
meaning and any pattern in them (if any) may not
be detected.
Array
Array (ordered array) is a serial arrangement
of numerical data in an ascending or
descending order.
This enables us to know the range over
which the items are spread and also gives
an idea of their general distribution.
An ordered array is very difficult to use with a
large sample size. Hence it is an appropriate way of
presentation only when the data set is small
(usually fewer than 20 observations).
Ordered Array
12 19 27 36 42 59
15 22 31 39 43 61
17 23 31 41 44 65
18 26 34 41 54 67
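
As a small illustration of how an ordered array gives the range, the sketch below sorts the 24 values shown above. The raw order of the observations is not given in the slides, so the scrambled list here is only a stand-in for raw data.

```python
# The 24 values from the ordered array above, deliberately scrambled to
# stand in for raw data (the original raw order is not given).
raw = [41, 12, 67, 31, 19, 54, 15, 36, 22, 65, 17, 44,
       23, 61, 18, 43, 26, 59, 27, 42, 31, 41, 34, 39]

ordered = sorted(raw)                    # ascending ordered array
data_range = ordered[-1] - ordered[0]    # largest value minus smallest value

print(ordered)
print("smallest:", ordered[0], "largest:", ordered[-1], "range:", data_range)
```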

• The actual summarization and organization
of data starts from frequency distribution.

• Frequency distribution: A table which
has a list of each of the possible values
that the data can assume along with the
number of times each value occurs.

• For nominal and ordinal data, frequency
distributions are often used as a summary.
• Example:

• The % of times that each value occurs, or
the relative frequency, is often listed
• Tables make it easier to see how the data
are distributed
• For both discrete and continuous data,
the values are grouped into non-
overlapping intervals, usually of equal
width.

a) Qualitative variable: Count the number of
cases in each category.

- Example 1: The intensive care unit type of 25
patients entering ICU at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other

ICU Type    Frequency      Relative Frequency
            (How often)    (Proportionately often)

Medical     12             0.48
Surgical    6              0.24
Cardiac     5              0.20
Other       2              0.08

Total       25             1.00
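
The table above can be reproduced with a few lines of code. This is a minimal sketch: the individual patient records are not given in the slides, so the list below is rebuilt from the published counts and is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical raw records rebuilt from the counts in the table above.
icu_types = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2

counts = Counter(icu_types)   # frequency of each category
n = len(icu_types)            # total number of patients

relative = {category: frequency / n for category, frequency in counts.items()}

for category, frequency in counts.items():
    print(f"{category:<10}{frequency:>4}{relative[category]:>8.2f}")
print(f"{'Total':<10}{n:>4}{sum(relative.values()):>8.2f}")
```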

Example 2:
A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender, 1 = male, 2 = female

Gender Frequency (n) Relative Frequency


Male (1) 110 47.0%
Female (2) 124 53.0%
Total 234 100%

b) Quantitative variable:
- Select a set of continuous, non-overlapping
intervals such that each value can be placed
in one, and only one, of the intervals.
- The first consideration is how many intervals
to include

For a continuous variable
(e.g. – age), the frequency
distribution of the individual
ages is not so interesting.

• We “see more” in
frequencies of age
values in
“groupings”. Here,
10 year groupings
make sense.
• Grouped data
frequency
distribution

To determine the number of class intervals and the
corresponding width, we may use:

Sturges’ rule:

K = 1 + 3.322 (log n), with log taken to base 10
W = (L − S) / K

where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value

Example:
– Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 − 10)/6 = 4.67 ≈ 5
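
The calculation above can be checked with a short script. This is a sketch under the assumption that the logarithm is base 10 and that the width is rounded up to the next whole hour, as the worked example does; the variable names are introduced here for illustration.

```python
import math

# Leisure time (hours) per week for the 40 college students listed above.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)
k_exact = 1 + 3.322 * math.log10(n)     # Sturges' rule
k = round(k_exact)                      # number of class intervals
width_exact = (max(leisure_hours) - min(leisure_hours)) / k
width = math.ceil(width_exact)          # round up so every value is covered

print("n =", n)
print("K =", round(k_exact, 2), "rounded to", k)
print("W =", round(width_exact, 2), "rounded up to", width)
```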

Time       Frequency   Relative    Cumulative Relative
(Hours)                Frequency   Frequency

10-14      5           0.125       0.125
15-19      11          0.275       0.400
20-24      12          0.300       0.700
25-29      7           0.175       0.875
30-34      3           0.075       0.950
35-39      2           0.050       1.000

Total      40          1.00
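
The grouped table above can be rebuilt from the raw data. The following sketch tallies the 40 leisure-time values into the six classes and accumulates the relative frequencies; the variable names are illustrative.

```python
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
n = len(leisure_hours)

rows = []
cumulative = 0
for lower, upper in class_limits:
    # Count the observations that fall inside this class (limits inclusive).
    frequency = sum(lower <= x <= upper for x in leisure_hours)
    cumulative += frequency
    rows.append((f"{lower}-{upper}", frequency, frequency / n, cumulative / n))

for label, frequency, rel, cum_rel in rows:
    print(f"{label:<8}{frequency:>4}{rel:>8.3f}{cum_rel:>8.3f}")
```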
• Cumulative frequencies: When frequencies of
two or more classes are added.

• Cumulative relative frequency: The percentage
of the total number of observations that have a
value either in that interval or below it.

• Mid-point: The value of the interval which lies
midway between the lower and the upper limits
of a class.

• True limits: Are those limits that make an
interval of a continuous variable continuous
in both directions

• Used for smoothing of the class intervals

• Subtract 0.5 from the lower limit and add 0.5 to the
upper limit

Time
(Hours)    True limits    Mid-point   Frequency

10-14      9.5 – 14.5     12          5
15-19      14.5 – 19.5    17          11
20-24      19.5 – 24.5    22          12
25-29      24.5 – 29.5    27          7
30-34      29.5 – 34.5    32          3
35-39      34.5 – 39.5    37          2

Total                                 40
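
The true limits and mid-points in this table follow mechanically from the class limits. The sketch below assumes, as in the example, that the data are recorded in whole hours, so the gap between adjacent classes is 1 and half of it is 0.5.

```python
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

# Half the gap between one class's upper limit and the next class's lower limit.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2   # 0.5 for whole-hour data

table = []
for lower, upper in class_limits:
    true_lower = lower - half_gap
    true_upper = upper + half_gap
    midpoint = (lower + upper) / 2
    table.append((true_lower, true_upper, midpoint))
    print(f"{lower}-{upper}: true limits {true_lower} to {true_upper}, mid-point {midpoint}")
```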
Simple Frequency Distribution
• Primary and secondary cases of syphilis
morbidity by age, 1989
Age group    Cases
(years)      Number    Percent

0-14         230       0.5
15-19        4378      10.0
20-24        10405     23.6
25-29        9610      21.8
30-34        8648      19.6
35-44        6901      15.7
45-54        2631      6.0
>54          1278      2.9
Total        44081     100
Two Variable Table
• Primary and secondary cases of syphilis
morbidity by age and sex, 1989
Age group    Number of cases
(years)      Male      Female    Total

0-14         40        190       230
15-19        1710      2668      4378
20-24        5120      5285      10405
25-29        5304      4306      9610
30-34        5537      3111      8648
35-44        5004      1897      6901
45-54        2144      487       2631
>54          1147      131       1278
Total        26006     18075     44081
Tables can also be used to present three or
more variables.

Variable        Frequency (n)    Percent

Sex
  Male
  Female
Age (yrs)
  15-19
  20-24
  25-29
Religion
  Christian
  Muslim
Occupation
  Student
  Farmer
  Merchant
Guidelines for constructing tables
• Keep them simple,
• Limit the number of variables to three or less,
• All tables should be self-explanatory,
• Include clear title telling what, when and where,
• Clearly label the rows and columns,
• State clearly the unit of measurement used,
• Explain codes and abbreviations in the foot-note,
• Show totals,
• If data is not original, indicate the source in foot-note.
Diagrammatic Representation

• Pictorial representations of numerical data

Importance of diagrammatic representation:

1. Diagrams have greater attraction than
mere figures.
2. They give a quick overall impression of the
data.
3. They have greater memorizing value than
mere figures.
4. They facilitate comparison.
5. They are used to understand patterns and trends.
• Well designed graphs can be powerful
means of communicating a great deal of
information

• When graphs are poorly designed, they not
only convey the message ineffectively, but they
are often misleading.

Limitations of Diagrammatic Representation
1. The technique of diagrammatic representation
is used only for purposes of comparison.
It is not to be used when comparison is either
not possible or not necessary.
2. Diagrammatic representation is not an
alternative to tabulation. It only strengthens the
textual exposition of a subject, and cannot
serve as a complete substitute for statistical
data.
3. It can give only an approximate idea and as
such where greater accuracy is needed
diagrams will not be suitable.
4. They fail to bring to light small differences
Construction of graphs
The choice of the particular form among
the different possibilities will depend on
personal choices and/or the type of the
data.
Bar charts and pie charts are commonly
used for qualitative or quantitative discrete
data.
Histograms and frequency polygons are used
for quantitative continuous data.
There are, however, general rules that are
commonly accepted about construction of graphs:
1. Every graph should be self-explanatory and as simple as
possible.
2. Titles are usually placed below the graph and should
answer the questions: What? Where? When? How classified?
3. Legends or keys should be used to differentiate variables
if more than one is shown.
4. The axis labels should be placed to read from the left side
and from the bottom.
5. The units into which the scale is divided should be
clearly indicated.
6. The numerical scale representing frequency must start at
zero, or a break in the line should be shown.
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together (leave
space between bars)
• The different bars should be separated by
equal distances
• All the bars should rest on the same line
called the base
• Label both axes clearly

Specific types of graphs include:
For nominal and ordinal data:
• Bar graph
• Pie chart
For quantitative data:
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
• Others
1. Bar Chart
• Bar diagrams are used to represent and compare
the frequency distribution of discrete variables
and attributes or categorical series.

• When we represent data using a bar diagram, all
the bars must have equal width and the distance
between bars must be equal.

• There are different types of bar diagrams; the
most important ones are:

A. Simple bar chart:
• It is a one-dimensional diagram in which the
bar represents the whole of the magnitude.

• The height or length of each bar indicates
the size (frequency) of the figure
represented

[Figure: simple bar chart. Vertical axis: number of children (0 to 90); horizontal axis: immunization status (not immunized, partially immunized, fully immunized).]

Fig 1. Immunization status of children in x District, Jan 2014

B. Multiple bar chart
In this type of chart the component figures
are shown as separate bars adjoining each
other.
The height of each bar represents the
actual value of the component figure.
It depicts distributional pattern of more
than one variable
– Example of multiple bar diagrams: consider
the data on immunization status of women by
marital status.

Fig. 2 TT Immunization status by marital status of women
15-49 years, Asendabo town, 1996
There’s no reason why the bar chart can’t be
plotted horizontally instead of vertically.

[Figure: horizontal bar chart of percent (0 to 50) by type of source (CHA, HC, reading, training, campaign, anti-FGMC, CAT), with separate bars for male and female.]

Figure 1. Source of information on the complications of FGM and participation in RH
programs, Jijiga, 2004*. * FGMC = female genital mutilation committee; CAT = community
action team; HC = health centre; CHA = community health agent
Example: Construct a bar chart for the following data.

Distribution of patients in hospital by source of referral


Source of referral No. of patients Relative freq.
Other hospital 97 5.1
General practitioner 769 40.3
Out-patient department 623 32.7
Casualty 256 13.4
Other 161 8.5
Total 1 906 100.0

[Figure: bar chart. Distribution of patients in hospital X by source of referral, 1999. Vertical axis: number of patients (0 to 800); horizontal axis: source of referral. Bar heights: other hospital 97, GP 769, OPD 623, casualty 256, other 161.]
C. Component ( sub-divided) Bar
Diagram
Bars are sub-divided into component parts of the
figure.
These sorts of diagrams are constructed when each
total is built up from two or more component figures.
They can be of two kinds:
I) Actual Component Bar Diagrams: When the
overall height of the bars and the individual
component lengths represent actual figures.
• Example of actual component bar diagram: The
above data can also be presented as below.
II) Percentage Component Bar Diagram
• Where the individual component lengths
represent the percentage each component
forms of the overall total.
Note that a series of such bars will all be
the same total height, i.e., 100 percent.
• Example of percentage component bar
diagram
2. Pie chart
• Shows the relative frequency for each category by
dividing a circle into sectors, the angles of which
are proportional to the relative frequency.
• Used for a single categorical variable
• Use percentage distributions

Steps to construct a pie-chart
• Construct a frequency table

• Change the frequency into percentage (P)

• Change the percentages into degrees,
where: degrees = (Percentage / 100) × 360°

• Draw a circle and divide it accordingly
Example: Distribution of deaths for females, in England
and Wales, 1989.

Cause of death No. of death


Circulatory system 100 000
Neoplasm 70 000
Respiratory system 30 000
Injury and poisoning 6 000
Digestive system 10 000
Others 20 000
Total 236 000
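
The pie-chart steps can be applied to the table above as follows. This is a minimal sketch: it converts each count to a percentage and then to a sector angle, and the dictionary and variable names are illustrative.

```python
# Number of deaths by cause, taken from the table above.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())
angles = {}
for cause, count in deaths.items():
    percentage = 100 * count / total          # step 2: frequency to percentage
    angles[cause] = percentage / 100 * 360    # step 3: percentage to degrees
    print(f"{cause:<22}{percentage:>6.1f}%{angles[cause]:>8.1f} degrees")

print("sum of angles:", round(sum(angles.values()), 1))
```

The sector angles should always add up to 360 degrees, which is a quick check on the arithmetic.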

[Figure: pie chart. Distribution of cause of death for females, in England and Wales, 1989. Sector labels: circulatory system 42%, neoplasm 30%, respiratory system 13%, others 8%, digestive system 4%, injury and poisoning 3%.]
3. Histogram
• Histograms are frequency distributions with
continuous class intervals that have been turned
into graphs.

• To construct a histogram, we draw the interval
boundaries on a horizontal line and the
frequencies on a vertical line.

• Non-overlapping intervals that cover all of the
data values must be used.

• Bars are drawn over the intervals in such a
way that the areas of the bars are all
proportional in the same way to their
interval frequencies.

• The area of each bar is proportional to the
frequency of observations in the interval
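
When class widths are unequal, making area proportional to frequency means the bar height must be the frequency divided by the class width. The sketch below illustrates this with the age classes of the syphilis table earlier in the chapter; the open-ended last class is left out because it has no defined width, and treating ages as whole years is an assumption made here.

```python
# (lower, upper) age limits in whole years and number of cases per class.
classes = [((0, 14), 230), ((15, 19), 4378), ((20, 24), 10405), ((25, 29), 9610),
           ((30, 34), 8648), ((35, 44), 6901), ((45, 54), 2631)]

densities = []
for (lower, upper), cases in classes:
    width = upper - lower + 1        # class width in years
    density = cases / width          # bar height: cases per year of age
    densities.append(density)
    # Area of the bar (height times width) recovers the class frequency.
    print(f"{lower}-{upper}: width {width}, height {density:.1f}, area {density * width:.0f}")
```

Plotting raw counts instead of these heights would exaggerate the wider classes such as 35-44.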

Example: Distribution of the age of women at the time of marriage

Age group   15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number      11      36      28      13      7       3       2

[Figure: histogram. Age of women at the time of marriage. Vertical axis: number of women (0 to 40); horizontal axis: age group, with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5.]
Histogram for the ages of 2087 mothers with <5
children, Adami Tulu, 2003

[Figure: histogram of mothers' age (variable N1AGEMOTH). Horizontal axis: age from 15.0 to 55.0; vertical axis: frequency (0 to 700). Mean = 27.6, Std. Dev = 6.13, N = 2087.]
Two problems with histograms
1. They are somewhat difficult to construct
2. The actual values within the respective
groups are lost and difficult to reconstruct

• The other graphic display (stem-and-leaf
plot) overcomes these problems

4. Stem-and-Leaf Plot
• A quick way to organize data to give visual
impression similar to a histogram while retaining
much more detail on the data.
• Similar to histogram and serves the same purpose
and reveals the presence or absence of symmetry
• Are most effective with relatively small data sets
• Are not suitable for reports and other
communications, but
• Help researchers to understand the nature of their
data
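
A stem-and-leaf plot is easy to build by hand or in code. The sketch below is one possible construction, reusing the leisure-time data from the frequency distribution example and taking the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Leisure time (hours) per week for the 40 college students used earlier.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20,
    22, 14, 13, 10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19,
    21, 31, 16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

stems = defaultdict(list)
for value in sorted(leisure_hours):
    stems[value // 10].append(value % 10)   # tens digit = stem, units digit = leaf

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Each row reads like a sideways histogram bar, yet every original value can still be recovered from its stem and leaf.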

5. Frequency polygon
• A frequency distribution can be portrayed
graphically in yet another way by means of a
frequency polygon.
• To draw a frequency polygon we connect the mid-
point of the tops of the cells of the histogram by a
straight line.
• The total area under the frequency polygon is
equal to the area under the histogram
• Useful when comparing two or more frequency
distributions by drawing them on the same
diagram

Frequency polygon for the ages of 2087 mothers with <5
children, Adami Tulu, 2003
[Figure: frequency polygon. X-axis: age of mother (N1AGEMOTH), 15.0 to 55.0. Mean = 27.6, Std. Dev = 6.13, N = 2087.]

It can also be drawn without erecting rectangles, by plotting
the frequency of each class at the class midpoint and joining
the points, as follows:

Age of women at the time of marriage
[Figure: frequency polygon. X-axis: age (class midpoints 12 to 47); Y-axis: no. of women.]

6. Ogive Curve (The Cumulative
Frequency Polygon)
• Sometimes it may be necessary to know the
number of items whose values are more or less
than a certain amount.
• We may, for example, be interested to know the
no. of patients whose weight is <50 kg or >60 kg.
• To get this information it is necessary to change
the form of the frequency distribution from a
‘simple’ to a ‘cumulative’ distribution.
• An ogive curve turns a cumulative frequency
distribution into a graph.
• Ogives are much more common than frequency polygons.

Cumulative Frequency and Cum. Rel. Freq. of Age
of 25 ICU Patients

Age Interval  Frequency  Relative        Cumulative  Cumulative
                         Frequency (%)   Frequency   Rel. Freq. (%)
10-19              3          12              3           12
20-29              1           4              4           16
30-39              3          12              7           28
40-49              0           0              7           28
50-59              6          24             13           52
60-69              1           4             14           56
70-79              9          36             23           92
80-89              2           8             25          100
Total             25         100
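
The cumulative columns of this table can be reproduced from the simple frequencies alone. The following Python sketch is one possible way to do it; the variable names are illustrative, and the frequencies are those of the 25 ICU patients above.

```python
from itertools import accumulate

intervals = ["10-19", "20-29", "30-39", "40-49",
             "50-59", "60-69", "70-79", "80-89"]
frequencies = [3, 1, 3, 0, 6, 1, 9, 2]

total = sum(frequencies)
# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))
# Express the simple and cumulative frequencies as percentages of the total.
relative_pct = [100 * f / total for f in frequencies]
cumulative_pct = [100 * c / total for c in cumulative]

for row in zip(intervals, frequencies, relative_pct, cumulative, cumulative_pct):
    print("{:<6} {:>3} {:>5.0f} {:>4} {:>5.0f}".format(*row))
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency is always 100%.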

[Figure: Cumulative frequency of 25 ICU patients]

Example: Heart rate of patients admitted to hospital Y, 1998

Heart rate    No. of     Cumulative frequency,    Cumulative frequency,
              patients   Less than Method (LM)    More than Method (MM)
54.5-59.5         1               1                       54
59.5-64.5         5               6                       53
64.5-69.5         3               9                       48
69.5-74.5         5              14                       45
74.5-79.5        11              25                       40
79.5-84.5        16              41                       29
84.5-89.5         5              46                       13
89.5-94.5         5              51                        8
94.5-99.5         2              53                        3
99.5-104.5        1              54                        1
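
The two cumulative columns differ only in the direction of accumulation. A minimal Python sketch, with illustrative function names, applied to the heart-rate frequencies above:

```python
from itertools import accumulate


def cumulative_less_than(frequencies):
    """Accumulate from the lowest class upward (less than method, LM)."""
    return list(accumulate(frequencies))


def cumulative_more_than(frequencies):
    """Accumulate from the highest class downward (more than method, MM)."""
    return list(accumulate(reversed(frequencies)))[::-1]


heart_rate_freq = [1, 5, 3, 5, 11, 16, 5, 5, 2, 1]

lm = cumulative_less_than(heart_rate_freq)
mm = cumulative_more_than(heart_rate_freq)
print(lm)
print(mm)
```

Each LM value counts the patients below the upper boundary of its class, and each MM value counts the patients above the lower boundary of its class.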

Heart rate of patients admitted to hospital Y, 1998
[Figure: ogive curves for the less than method (LM) and more than method (MM). X-axis: heart rate (54.5 to 104.5); Y-axis: cumulative frequency (0 to 60).]

Percentiles (Quartiles)
• Suppose that 50% of a cohort survived at least 4
years.
• This also means that 50% survived at most 4
years.
• We say 4 years is the median.
• The median is also called the 50th percentile
• We write: P50 = 4 years.

• Similarly we could speak of other percentiles
(P25 means the 25th percentile):
– P0: The minimum
– P25: 25% of the sample values are less than or
equal to this value. 1st Quartile
– P50: 50% of the sample values are less than or
equal to this value. 2nd Quartile
– P75: 75% of the sample values are less than or
equal to this value. 3rd Quartile
– P100: The maximum

It is possible to estimate the values of percentiles from
a cumulative frequency polygon.
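
Reading a percentile off the ogive amounts to linear interpolation within the class where the cumulative frequency first reaches the required count. The sketch below is one way to do this, assuming observations are spread evenly within each class; the function name `grouped_percentile` is illustrative, and the data are the heart-rate frequencies from the ogive example.

```python
def grouped_percentile(boundaries, frequencies, p):
    """Estimate the p-th percentile (0 <= p <= 100) from grouped data.

    Interpolates linearly along the less-than cumulative frequency polygon.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    target = p / 100 * sum(frequencies)  # required cumulative count
    cumulative = 0
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        if freq > 0 and cumulative + freq >= target:
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    return boundaries[-1]


boundaries = [54.5, 59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5, 99.5, 104.5]
frequencies = [1, 5, 3, 5, 11, 16, 5, 5, 2, 1]

p50 = grouped_percentile(boundaries, frequencies, 50)
print(p50)  # 80.125
```

For the median, the required count is 27 of 54 patients; 25 patients lie below 79.5, so the estimate is 79.5 + (2/16) x 5 = 80.125.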

7. Scatter plot
• Most studies in medicine involve measuring
more than one characteristic, and graphs
displaying the relationship between two
characteristics are common in the literature.
• When both variables are qualitative, we can
use a multiple bar graph.
• When one of the characteristics is qualitative
and the other is quantitative, the data can be
displayed in box and whisker plots.

• For two quantitative variables we use
bivariate plots (also called scatter plots
or scatter diagrams).

• In the study on percentage saturation of
bile, information was collected on the
age of each patient to see whether a
relationship existed between the two
measures.

• A scatter diagram is constructed by drawing X- and Y-axes.
• Each point or dot (•) represents a pair of values measured
for a single study subject.

Age and percentage saturation of bile for women patients in
hospital Z, 1998
[Figure: scatter plot. X-axis: age (0 to 80); Y-axis: saturation of bile (0 to 160).]

• The graph suggests the possibility of a
positive relationship between age and
percentage saturation of bile in women.

8. Line graph
• Useful for assessing the trend of a particular situation
over time.
• Helps in monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the
horizontal axis, and
• Values of the quantity being studied are marked on the
vertical axis.
• Values for each category are connected by a continuous
line.
• Sometimes two or more lines are drawn on the same
graph using the same scale so that the plotted lines
are comparable.

No. of microscopically confirmed malaria cases by species
and month at Zeway malaria control unit, 2003
[Figure: line graph with three lines (Positive, P. falciparum, P. vivax). X-axis: months (Jan to Dec); Y-axis: no. of confirmed malaria cases (0 to 2100).]
