
MODULE 1 Introduction, Levels of Measurement, Frequency Distribution

Statistics is the science of collecting, organizing, and interpreting numerical data. It originated from data collected on populations in Egypt for tax purposes. Statistics can be used in medicine and nursing. In medicine, it is used to identify disease risk factors, compare drug efficacy, and evaluate health interventions. In nursing, it helps describe patient phenomena, identify symptom patterns, and determine if protocols need revision. Data can be qualitative like gender or quantitative with numerical values. Quantitative data is further divided into discrete data which are whole numbers, and continuous data which can include fractions or decimals.

Uploaded by

ShijiThomas

INTRODUCTION

The word statistics is derived from the Latin word ‘status’, meaning ‘political state’. It was called so because
the first statistics were the data collected on the total number of people residing in Egypt, used to estimate the
number of persons capable of paying tax to the King. The name later underwent many transformations
before arriving at the present term, ‘statistics’.
The word ‘statistics’ is used in two ways: as a plural noun and as a singular noun. In the plural
sense, it is defined as ‘a systematic collection of numerical information’. So the
year-wise collection of data relating to admissions to the BSc Nursing course, the annual
examination results, and the number of births and deaths in a country are all statistics. In this context, it
is often used as a synonym for data or quantitative information. In the singular sense, statistics
is a science: the ‘science of collecting, classifying, presenting and interpreting
data’ relating to any sphere of enquiry. Sir Ronald Fisher is regarded as the Father of modern statistics.
The application of statistical methods in the biological sciences is called biostatistics, and Sir Francis
Galton is known as the Father of Biostatistics.
DEFINITION
According to Croxton and Cowden: Statistics is defined as the science of collection, presentation,
analysis and interpretation of numerical data.
USE OF STATISTICS
1. In medicine
 To design statistical procedures for resolving questions relating to medical or public health
data
 To define and quantify the nature and extent of illness and deaths in the community
 To identify risk factors for disease and establish causation for existence of health problem
through statistical analysis
 To establish signs and symptoms of diseases by using various statistical methods
 To design, monitor, analyze, interpret and report results of clinical studies
 To evaluate outcome of health intervention
 To compare certain attributes of the two different populations
 To find the difference between efficacy of two drugs or vaccines or interventions
 To check the efficacy of biomedical equipment

2. In nursing
 Nurses can use statistics for describing phenomena, correlating the variables, finding the
effectiveness of nursing interventions and predicting outcomes in patients
 Helps the nurse to describe how one event or situation relates to another event or situation
 Understanding statistical methodologies is important for a nurse to incorporate empirical
findings into everyday nursing practice
 Helps nurses to identify patterns in signs and symptoms of patients thus enabling them to
take informed decisions and better respond to the patient’s medical status
 It helps the nurse to determine if commonly used interventions should be changed or
protocols revised
 Biostatistics can also be useful while dealing with allocation of limited resources or
bringing about a change in the nursing profession
 Every nurse whether pursuing a bachelors or postgraduate degree should have a basic
knowledge of statistics. This will boost their skills and confidence in delivering quality care
to patients
TERMINOLOGY
1. Data: Data are the basic building blocks of statistics and refer to the individual values
presented, measured or observed
2. Qualitative data: When the collected data are non-numerical or categorical in nature, they are
called qualitative data. Eg: gender, religion, marital status
3. Quantitative data: When the data are measured and collected as numerical values, they are called
quantitative data. They can be ordered or ranked. Eg: height, weight, Hb level
3.1 Discrete data: Data recorded in whole numbers are called discrete data. Eg: number of children in a
family, pulse rate, ESR, blood pressure, serum cholesterol
3.2 Continuous data: Data which can be measured in fractional or decimal values. Eg:
height, weight, body temperature
4. Population: Population is the collection of all individuals or items under consideration in a
study
5. Sample: Sample is that part of population from which the information is collected
6. Parameter: Any numerical value computed from the population is called parameter
7. Statistic: Any numerical value computed from the sample is known as a statistic
8. Parametric tests: A class of statistical tests that involve assumptions about the distribution of
the variables and the estimation of a parameter
9. Non-parametric tests: A class of tests that do not involve stringent assumptions about the distribution
of the variables
10. Variable: It is an attribute or number that describes an individual or a data item. The value of
the variable may vary from one entity to another
TYPES OF DATA AND ITS MEASUREMENT
TYPES OF DATA:
Data are facts or figures from which conclusions can be drawn. Data can be classified as either
numeric (quantitative) or non-numeric (qualitative)
1. Quantitative data consist of values that indicate counts or measurements. They are further
classified into discrete and continuous data
Discrete data are numeric data that can take only separate, countable values, typically
whole numbers. For eg: pulse rate is recorded only in whole numbers such as 72 beats
per minute or 80 beats per minute, and not in fractional values
Continuous data can take any value within a range, i.e., they are not limited to whole numbers.
For eg: age (1 year 6 months), salary (Rs 11,000, Rs 11,600), weight (1.1 Kg, 1.5 Kg)
2. Qualitative data consist of non-numeric values that can be placed into categories, and are commonly
termed categorical data
The type of data greatly affects the choice of analysis method. To carry out an analysis, the
variables have to be quantified by providing values and a suitable scale. There are four levels of
measurement, ranging from discrete to continuous scales
LEVELS OF MEASUREMENT
According to Stevens(1946)
 Nominal
 Ordinal
 Interval
 Ratio
These four measurement scales are listed in their hierarchical order of describing results, with
nominal scale being the least precise and the ratio scale being the most precise of them
1. Nominal scale
 Comes from the Latin root “nomen” meaning “ name”
 Simplest and lowest level of measurement
 Here the numbers are assigned as labels to represent categories or characteristics
 When the data are classified into two or more categories and there is no order or
difference in size of these categories, they can be labeled using nominal scale
 Eg: gender is classified into male, female, transgender etc
 The categories are mutually exclusive and exhaustive
 Mutually exclusive means that the categories must be distinct enough that no
observations will fall into more than one category
 Exhaustive means that there must be enough categories that all the observations will fall
into some category
 This scale lacks numeric order, magnitude or size
 Eg: gender, religion, marital status.
 A scale for gender might include Male 1, Female 2. In this scale the number is assigned
for the purpose of gender identification only and does not in any way represent
magnitude, order or size
 The scale uses numerals rather than words for statistical analysis
2. Ordinal scale
 Second level of measurement
 More than mere categorization, ranks the characteristics based on certain criteria
 Much more accurate than nominal in measuring the characteristics
 Classifies the data into categories as well as ranks them on a scale: for example, from poor
to excellent
 It lacks magnitude, size, equal intervals or an absolute zero point
 Categories must be mutually exclusive and exhaustive ie., an individual cannot be rated
with both mild and moderate anxieties
 Eg: subjective rating scales measuring satisfaction, pain, discomfort, depression, opinion
and Likert scale are considered ordinal
 Eg: age- young, middle aged, old; Health status- poor, average, good
3. Interval scale
 The level of measurement goes beyond order and classifies data based on magnitude
 Lacks a defined size or absolute zero point
 Its significant feature is that the numbered intervals between points are equidistant,
whether those intervals are measured in centimeters or degrees
 Allows for more mathematical manipulation of data
 Fahrenheit temperature scale is a good example of an interval scale, where the degrees are
calibrated from high to low. Each degree on the scale is equidistant from the next
degree; the zero point is not absolute but arbitrary, as a score below zero is actually
possible. The term ‘arbitrary zero’ means that zero on the scale does not refer to a complete
absence of the quantity but merely serves as an initial point
4. Ratio scale
 Final level of measurement scale
 Allows for most manipulation of data
 In this scale the level of measurement goes beyond order and magnitude and includes
absolute zero point
 Thus this scale has all three attributes viz. magnitude, equal intervals and absolute zero
point
 Represents continuous values
 Scores have equal distance between attributes on a scale and are based on a true zero point
 The best example for a ratio scale is height. The height scale has an order: 1 foot is less than
2 feet. It has a magnitude and the difference between 1 foot and 2 feet is the same as the
difference between 2 feet and 3 feet. It has also an absolute zero point, in the sense that 0
feet is the complete lack of height.
 Other examples are weight, pulse and BP

Interval and ratio scales quantify the data and hence are quantitative. In these scales, the
variables can be categorized, ranked, have equal intervals and represent a range of values.
As they can be measured on a scale, they are also called scale data
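The difference between an arbitrary zero and an absolute zero can be checked with a small calculation. The sketch below is only an illustration: the temperatures 40, 60 and 80 degrees Fahrenheit and the helper function names are assumptions chosen for the example, not values from these notes. It shows that equal intervals survive a change of temperature scale while ratios do not, whereas the ratio of two heights is the same in feet and in centimetres.

```python
# Illustrative values only: three temperatures in Fahrenheit (assumed for this example).
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def feet_to_cm(ft):
    return ft * 30.48

temps_f = [40, 60, 80]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Interval scale: equal gaps stay equal after changing the scale.
gap_low = temps_c[1] - temps_c[0]
gap_high = temps_c[2] - temps_c[1]

# But ratios are not preserved, because zero is arbitrary.
ratio_f = temps_f[2] / temps_f[0]   # 80 / 40
ratio_c = temps_c[2] / temps_c[0]   # same two temperatures in Celsius

# Ratio scale: 2 feet is twice 1 foot in any unit, because zero is absolute.
height_ratio_ft = 2 / 1
height_ratio_cm = feet_to_cm(2) / feet_to_cm(1)

print("Gaps in Celsius:", round(gap_low, 2), round(gap_high, 2))
print("Temperature ratio in F and C:", ratio_f, round(ratio_c, 2))
print("Height ratio in ft and cm:", height_ratio_ft, round(height_ratio_cm, 2))
```

So a statement such as "twice as hot" has no fixed meaning on an interval scale, while "twice as tall" is meaningful on a ratio scale.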
Summary of the properties of four levels of measurement
Nominal
Description: Data is categorized into categories, but cannot be arranged in any specific order
Characteristics: Contains no magnitude, just names
Examples: Eye color (brown/black), Gender (male/female)
Permissible statistics (descriptive): Frequencies, Percentage, Mode
Permissible statistics (inferential): Chi-square, Binomial test
Graphs: Bar, Pie

Ordinal
Description: Data is categorized and arranged in rank order, however the differences between data values cannot be established
Characteristics: Reflects only magnitude; does not contain equal intervals or an absolute zero; has rank order
Examples: Levels of depression (mild/moderate/severe), Stages of cancer (1/2/3/4)
Permissible statistics (descriptive): Frequencies, Percentage, Mode, Median
Permissible statistics (inferential): Rank-order correlation
Graphs: Bar, Pie

Interval
Description: Data is categorized and ranked with meaningful intervals between measurements; no zero point
Characteristics: Possesses magnitude and equal size of interval between data points, but no absolute zero
Examples: Temperature on centigrade scale (30°C/40°C/50°C), IQ scores (70/90/100)
Permissible statistics (descriptive): Frequencies (if discrete), Percentage, Mode (if discrete), Median, Mean, SD, Skewness, Kurtosis
Permissible statistics (inferential): ANOVA, Product moment correlation, t test
Graphs: Bar (if discrete), Pie (if discrete), Histogram

Ratio
Description: Data is categorized, ranked with meaningful intervals and has an inherent zero starting point
Characteristics: Possesses magnitude, equal intervals and an absolute zero
Examples: Height (2” / 4.1” / 6.3”), Weight (1 Kg / 2.3 Kg / 3.5 Kg)
Permissible statistics (descriptive): Mean, SD, Skewness, Kurtosis
Permissible statistics (inferential): Coefficient of variation, t-test
Graphs: Histogram
Conversion of levels:
It is possible to convert a higher level of measurement to a lower level, but the converse is
not possible
Eg: Consider the heights of 5 students at the ratio level. We can convert them to the interval level by subtracting
150 from all the observations. We can also convert them to the ordinal level by assigning ranks to the
heights, and to the nominal level by assigning the number ‘1’ to observations
< 150 cm and ‘2’ to observations >= 150 cm
Height (cm)
Ratio level   Interval level   Ordinal level   Nominal level
160           10               3               2
170           20               5               2
165           15               4               2
145           -5               1               1
150           0                2               2
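The conversions in this example can be reproduced with a few lines of code. This is a minimal sketch using the five heights from the example; the variable names are illustrative.

```python
# Heights (cm) of the five students from the example: ratio-level data.
heights_cm = [160, 170, 165, 145, 150]

# Interval level: subtract 150 from every observation, so zero becomes arbitrary.
interval = [h - 150 for h in heights_cm]

# Ordinal level: rank 1 = shortest, rank 5 = tallest (there are no ties here).
sorted_heights = sorted(heights_cm)
ordinal = [sorted_heights.index(h) + 1 for h in heights_cm]

# Nominal level: code 1 for heights below 150 cm, code 2 for 150 cm and above.
nominal = [1 if h < 150 else 2 for h in heights_cm]

for row in zip(heights_cm, interval, ordinal, nominal):
    print(row)
```

Each step discards information: the ranks cannot be turned back into heights, and the codes 1 and 2 cannot be turned back into ranks, which is why the converse conversion is not possible.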
TYPES OF STATISTICS
1. Descriptive statistics
 Consist of methods for organizing and summarizing information
 These statistics summarize the data from a sample using computation of averages(mean,
median, mode), measures of dispersion (variance, standard deviation, range, interquartile
range) and include construction of graphs, charts and tables
2. Inferential statistics
 Consist of techniques for measuring the reliability of conclusions about population
based on information obtained from a sample
 These statistics make inferences about a population using methods like estimation (point
estimation , interval estimation) and hypothesis testing based on probability theory
Descriptive and inferential statistics are inter-related in the sense that methods of descriptive
statistics are initially used to organize and summarize the sample information before
methods of inferential statistics are used to analyze the subject under investigation
While the descriptive statistics are initially used to organize and summarize the data, the
inferential statistics are later used to estimate parameters and test the hypothesis
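As a small illustration of descriptive statistics, the sketch below summarizes the 20 well-being scores that are listed in the frequency distribution section of this module. It uses Python's standard `statistics` module; the variance and standard deviation shown are the sample versions (divisor n - 1), which is an assumption about the intended formula.

```python
import statistics

# Well-being scores of 20 adolescents (from the frequency distribution section).
scores = [59, 59, 59, 15, 26, 59, 26, 15, 59, 48,
          40, 33, 15, 26, 59, 40, 33, 26, 40, 26]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)   # sample variance (divisor n - 1)
std_dev = statistics.stdev(scores)       # sample standard deviation

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Range:", value_range)
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))
```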

Types of statistics
1. Descriptive
Measures of central tendency: Mean, Median, Mode
Measures of dispersion: Standard deviation, variance, range, interquartile range
2. Inferential
Estimate: Point estimate, Confidence interval
Hypothesis: H0, H1, P value
ORGANIZATION AND PRESENTATION OF DATA
The collected data is usually in an unorganized form. It needs to be organized to conduct statistical
analysis. This organization can be done by way of classification, tabulation and graphical
presentation.
Classification of data
The process of organizing data into groups and classes based on certain attributes is termed as
classification of data. This organization eliminates irrelevant details and places data with common
attributes in one class by dividing the entire data into a number of groups of classes. In this process
data are combined into small number of class intervals followed by an indication of number of
cases in each class. This process of classifying data into groups is called frequency distribution
The range of each group of data is called class interval
Each class has a lower and an upper limit and the difference between the two is known as class
width. This process allows comparison among the categories of observation and highlights the
important aspects of data
Classification can be done either according to attributes or according to numerical characteristics
Classification according to attributes:
Here, data are classified based on descriptive characteristics such as gender, type of family, marital
status etc
These characteristics cannot be measured quantitatively; only their presence or absence in an
individual item can be observed
Classification according to numerical characteristics:
Here, data are classified based on numerical characteristics such as height, weight, blood pressure
etc. These are measured using some specific units
Height in ft   Frequency
1-2            2
2-3            5
3-4            8

Class width = 2 - 1 = 1

The class intervals are usually kept equal
Tabulation of data
It is the process of summarizing and presenting the data in the form of a table so as to aid easy
understanding. It should facilitate interpretation and subsequent analysis.
Frequency distribution uses the table format to organize the data, enabling the readers to
comprehend the basics of data distribution. This tabulation makes it easier to gather information
about central tendency, dispersion and outliers.
Frequency distribution: It is a table wherein the data are grouped into classes according to
common attributes, together with the number of values that fall into each class. It records the
frequency of occurrence of each value (or class of values) of a variable. It facilitates computation of various
statistical measures
Frequency distribution can be classified into two types
1. Univariate frequency distribution: It includes values of only one variable
2. Bivariate frequency distribution: It includes values of two variables
A frequency distribution can be further classified into three categories
a) Series of individual observation
 It refers to listing of items of each observation
 For eg: well being scores of 20 adolescents displayed individually refers to a series of
individual observation

59 59 59 15 26 59 26 15 59 48
40 33 15 26 59 40 33 26 40 26

b) Discrete frequency distribution


 Different values of a variable and their respective frequencies (ie., the number of
times each value occurs) are displayed side by side
 This is facilitated by listing the values from highest to lowest or vice-versa, using the
technique of tally bars to record the frequencies
 While the first column indicates all the values of the variable, the second column has
vertical bars called tally bars(/) to record the frequencies indicating number of times
the value has occurred
 For easy counting, blocks of five bars are put together with some space left in
between these blocks
 After placing tally bars for all the values, its corresponding value is recorded in the
third column
Example for discrete frequency distribution :

Marks   Tally bars   Frequency
15      ///          3
26      /////        5
33      //           2
40      ///          3
48      /            1
59      ///// /      6
Total                20
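The same discrete frequency distribution can be built by counting how often each value occurs in the 20 well-being scores. The sketch below uses `collections.Counter` from the Python standard library and prints tally bars in blocks of five; the layout of the printed table is only illustrative.

```python
from collections import Counter

# Well-being scores of 20 adolescents (series of individual observations).
scores = [59, 59, 59, 15, 26, 59, 26, 15, 59, 48,
          40, 33, 15, 26, 59, 40, 33, 26, 40, 26]

# Count the number of times each distinct value occurs.
frequency = Counter(scores)

print(f"{'Marks':<8}{'Tally bars':<12}{'Frequency':>9}")
for value in sorted(frequency):
    count = frequency[value]
    # Group tally bars in blocks of five, separated by a space.
    blocks, rest = divmod(count, 5)
    tally = " ".join(["/" * 5] * blocks + (["/" * rest] if rest else []))
    print(f"{value:<8}{tally:<12}{count:>9}")
print(f"{'Total':<20}{sum(frequency.values()):>9}")
```

In the listed scores the highest value, 59, occurs six times, and the frequencies add up to 20.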

c) Continuous frequency distribution


 It is an arrangement of values wherein each interval of the table includes the
frequency of occurrence of values within that interval.
 The groups into which the values of a variable are classified are known as
classes and the range of values within that class is called the class interval
 The size of the class interval is called the width of the class and the
boundaries of the class interval are known as class limits
 The arrangement of data into continuous classes with the corresponding
frequencies is known as continuous frequency distribution
 It is ideal to have 5 to 14 class intervals
 After finalizing the number of class intervals, the marking of tally bars is
done to record the number of values that fall into each class interval
 This is followed by recording the frequency in the adjoining column by
adding the tally bars.
 The width/size of the class is obtained by dividing the range of observations
with number of class intervals
 In the earlier mentioned example of well-being scores of adolescents, the range
of values is 59 - 15 = 44
 This is divided by the number of class intervals (here 5) to obtain the class
width: 44 / 5 = 8.8, rounded up to 10
Marks   Tally bars   No. of students (f)
15-25   ///          3
25-35   ///// //     7
35-45   ///          3
45-55   /            1
55-65   ///// /      6
Total                20
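The grouping steps can be sketched in code as follows. The number of classes (5) and the rounded class width (10) are taken from the example; starting the first class at the smallest score is an assumption that matches the table. Each class includes its lower limit and excludes its upper limit.

```python
# Well-being scores of 20 adolescents.
scores = [59, 59, 59, 15, 26, 59, 26, 15, 59, 48,
          40, 33, 15, 26, 59, 40, 33, 26, 40, 26]

num_classes = 5
value_range = max(scores) - min(scores)   # 59 - 15 = 44
raw_width = value_range / num_classes     # 44 / 5 = 8.8
width = 10                                # 8.8 rounded up to a convenient value, as in the text

start = min(scores)                       # first class starts at 15
counts = [0] * num_classes
for s in scores:
    # Class index: lower limit included, upper limit excluded.
    counts[(s - start) // width] += 1

for i, f in enumerate(counts):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {f}")
print("Total:", sum(counts))
```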

Guidelines for constructing class widths


 Class widths should be of equal size. Unequal class widths should
only be used to overcome large gaps in existing data
 Class intervals should be mutually exclusive and non-overlapping
 Open-ended class intervals should be avoided eg < 100, > 1000
Two methods of classifying the data according to class intervals are
1. Exclusive method:
 When the lower limit is included and the upper limit excluded, then it is an exclusive
class interval.
 The upper limit of a particular class becomes the lower limit of the next class
interval
 It is called “exclusive” as the values equal to the upper limit of the class are
excluded from that class and included in the next class
 These are used in case of continuous variable
 For eg: consider the exclusive class intervals 20- 40, 40-60 … etc. In the class
interval 20-40, 20 is included, but 40 is excluded and it is included in the next
interval 40-60
2. Inclusive method
 When both the upper and the lower class limits are included, then it is inclusive class
interval
 Here, the upper class limit of a particular class is 1 less than the lower limit of the
next class interval
 It is called “inclusive” as the values of both the upper and lower class limits are
included in the same class itself
 Used in case of discrete variable
 Eg: 20-39,40-59
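The two methods differ only in how the upper limit is treated. The sketch below, with hypothetical helper names `exclusive_class` and `inclusive_class`, finds the class of a value when classes start at 20 and have a width of 20, as in the examples above. The inclusive version assumes whole-number (discrete) data.

```python
def exclusive_class(value, start, width):
    """Return (lower, upper) with lower <= value < upper, e.g. 20-40, 40-60."""
    lower = start + ((value - start) // width) * width
    return lower, lower + width

def inclusive_class(value, start, width):
    """Return (lower, upper) with lower <= value <= upper, e.g. 20-39, 40-59.
    Assumes whole-number data, so the upper limit is 1 less than the next lower limit."""
    lower = start + ((value - start) // width) * width
    return lower, lower + width - 1

# The value 40 is excluded from 20-40 and falls in 40-60 under the exclusive method.
print(exclusive_class(39, 20, 20))
print(exclusive_class(40, 20, 20))

# Under the inclusive method, 39 and 40 are the end and start of neighbouring classes.
print(inclusive_class(39, 20, 20))
print(inclusive_class(40, 20, 20))
```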
Guidelines for constructing a frequency distribution
1. Type of classes:
 Each class should be clearly defined and non ambiguous
 They should be mutually exclusive and exhaustive so that each value of a
variable corresponds to only one class
2. Number of classes:
The factors that decide the number of classes are
 The total frequency or number of observations in the distribution
 The size of the magnitude of the values of the variable
 The desired accuracy
 The ease with which the various descriptive measures can be computed
3. Size of class interval
 The size of the class interval and the number of classes in a given distribution
are inversely proportional
 It is always desirable to have class intervals of equal magnitude
4. Class boundaries
 Class boundaries are the actual class limits (the minimum and maximum values a class interval
may contain) of a class interval
 In the exclusive type of classification, where the upper limit of one class is the lower limit
of the next, class boundaries and class limits are the same
 For eg: in 20-40, 40-60, 60-80, the upper limit of each class is excluded from that class, and
the class boundaries are the same as the class limits
 Whereas in the inclusive type of classification, where there is a gap between consecutive
classes, class boundaries and limits are different.
 For eg: in 20-39, 40-59, 60-79, class boundaries are obtained by subtracting d/2 from the
lower limit of each class and adding d/2 to the upper limit of each class, where d is the
gap between the upper limit of any class and the lower limit of the succeeding class. In the
example, d = 40 - 39 = 1, so the class boundaries of the class interval 20-39 are obtained by
subtracting 0.5 from the lower limit and adding 0.5 to the upper limit, ie., 19.5-39.5
5. Mid- value:
The mid value can be defined as the average of the upper and lower limits of a class
It is obtained by dividing the sum of the upper and lower class limits by 2. In the example, for the
class 15-25, it is (UCL + LCL) / 2 = (25 + 15) / 2 = 20
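Class boundaries and mid values are simple arithmetic on the class limits. The following sketch uses hypothetical helper names and the inclusive classes 20-39, 40-59, 60-79 from the guideline on class boundaries; it assumes the gap d is the same between all consecutive classes.

```python
def class_boundaries(limits):
    """Convert inclusive class limits [(lower, upper), ...] to class boundaries.
    Assumes classes are in ascending order with the same gap d between them."""
    d = limits[1][0] - limits[0][1]   # gap between upper limit and next lower limit
    half = d / 2
    return [(lower - half, upper + half) for lower, upper in limits]

def mid_value(lower, upper):
    """Average of the lower and upper class limits."""
    return (lower + upper) / 2

inclusive_limits = [(20, 39), (40, 59), (60, 79)]
boundaries = class_boundaries(inclusive_limits)
print(boundaries)          # first class becomes 19.5-39.5
print(mid_value(15, 25))   # mid value of the class 15-25
```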
6. Open end class
Class intervals wherein the lower limit of the first class or the upper limit of the last class or both
are not specified are termed as frequency distributions with open end classes. For eg: classes such
as weight less than 50 Kg, or height above 6 ft
Frequency tables:
They are of 3 types
 Ungrouped frequency table
 Grouped frequency table
 Cumulative frequency table
1. Ungrouped frequency table
If the values are discrete or distinct, we use an ungrouped frequency table. Here, the distinct
values are written in the first column, tally marks in the second column and
frequency in the third column
Example
Marks   Tally bars   Frequency
15      ///          3
26      /////        5
33      //           2
40      ///          3
48      /            1
59      ///// /      6
Total                20
2. Grouped frequency table
They are of two types
2.1 discrete grouped frequency table
Here, the class limits(upper and lower limits) are written in the first column, tally marks
in the second column and frequency in the third column.
Upper and lower limits are included in the same class
Class interval = (Upper limit – lower limit)+1
2.2 Continuous grouped frequency table
Here the class limits are written in the first column, tally marks in the second column
and frequency in the third column.
Lower boundary is included in each class and upper boundary is excluded
Class interval = Upper limit – lower limit
Example 1:
Marks   Tally bars   No. of students (f)
15-25   ///          3
25-35   ///// //     7
35-45   ///          3
45-55   /            1
55-65   ///// /      6
Total                20
Example 2: Convert the following discrete grouped frequency table into a continuous
frequency table
Class   Frequency
0-9     2
10-19   3
20-29   5
30-39   4
(Lower limit of second class – upper limit of first class) / 2 = (10 – 9) / 2 = 0.5
Subtract 0.5 from all lower limits and add 0.5 to all upper limits
Class         Frequency
-0.5 - 9.5    2
9.5 - 19.5    3
19.5 - 29.5   5
29.5 - 39.5   4
3. Cumulative frequency table
By the term cumulative, we mean increasing by successive addition. We can prepare
cumulative frequency table from a continuous grouped frequency table
There are two types of cumulative frequency tables:
 Less than cumulative frequency table (LCF table)
In LCF table, the upper boundaries are written in the first column and LCF in the
second column
 Greater than/ more than frequency table(GCF table)
In GCF table, lower boundaries are written in the first column and GCF in the
second column. Number of observations greater than each lower boundary is called
GCF
Example : Prepare cumulative frequency table for the following data

Class Frequency
0-10 2
10-20 3
20-30 5
30-40 4
40-50 1
Ans:

LCF Table                 GCF Table
Upper boundary   LCF      Lower boundary   GCF
10               2        0                15
20               5        10               13
30               10       20               10
40               14       30               5
50               15       40               1
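Both cumulative columns are running totals of the class frequencies, taken from the top for LCF and from the bottom for GCF. A minimal sketch using `itertools.accumulate` from the Python standard library, with the classes and frequencies of this example:

```python
from itertools import accumulate

# Continuous classes (lower boundary, upper boundary) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [2, 3, 5, 4, 1]

# Less than cumulative frequency: running total from the first class downwards.
lcf = list(accumulate(freq))

# Greater than cumulative frequency: running total from the last class upwards.
gcf = list(accumulate(reversed(freq)))[::-1]

print("Upper boundary  LCF")
for (_, upper), value in zip(classes, lcf):
    print(f"{upper:<15} {value}")

print("Lower boundary  GCF")
for (lower, _), value in zip(classes, gcf):
    print(f"{lower:<15} {value}")
```

The last LCF value and the first GCF value both equal the total number of observations, which is a quick check on the table.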
TABLE – TABULAR PRESENTATION OF DATA
A statistical table is the logical listing of related quantitative data in vertical columns and horizontal
rows and numbers, with sufficient explanatory and qualifying words, phrases and statements in the
form of titles, heading and foot notes to make clear the full meaning of the data and their origin.
Components of table:
1. Table number: Each table should be suitably numbered for proper identification and future
reference. The number should be quoted above or with the title of the table either in the center or in
the left side of the table
2. Table title: The table title should be suitable, clear, concise and self-explanatory and be given just
above the frame of the table. The title should be brief and in itself explain the what (what the data are
about), where (where the data are from), when (when the data occurred) and how (how the data are classified) in concise
language.
3. Caption: It refers to the column headings, which explains what each column represents. The
captions should be clearly defined and placed in the middle of the column
4. Stub: These define each row and are placed in the extreme left. They perform the same function
for the horizontal rows as the caption performs for the vertical column.
5. Body: It includes the crucial part of the collected information and is numerical in nature
6. Head note: It is a brief explanatory note about the contents in the table not covered in the title,
captions or stubs. It is optional and generally placed in brackets immediately after the title
7. Foot notes: These are written below the table for any clarification and are optional
8. Source: If any secondary data is used, the source of data should invariably be stated allowing the
reader for cross checking
Requirements of a good statistical table
 Purpose: A table should suit the purpose and objective of the statistical enquiry
 Preparation: A table should be simple, systematic, compact and logically organized
 Clarity: A table should be readily comprehensible and complete.
 Size: The table should be of manageable size. It should be neither very long nor very broad,
and should not be split across two pages
 Numbering: All tables should be numbered
 Approximation: Approximate to two decimal places and specify the units of measurement
 Presentation: A table should be presented in an appealing way and grab the attention of the reader
readily
 Units: The units should invariably be depicted in the table in the corresponding row or
column
 Totals and percentages: The totals and the corresponding percentages should find a place to the right or at
the bottom of the column
 Font: It is advisable to use capital and bold letters for headings, subheadings, stubs and captions
 Abbreviation: Use only accepted abbreviations; avoid use of the ditto (“) mark.

Types of tables
1. Frequency distribution table:
These tables represent the frequency and percentage distribution of the information collected,
where an attribute is grouped into number of classes ,which may vary between three and eight
Table 1: Frequency and percentage distribution of samples based on their age and sex
[n = 100]
Demographic variables   Frequency (f)   Percentage (%)
Age in years
40 – 50                 15              15
51 – 60                 18              18
61 – 70                 31              31
71 and above            36              36
Gender
Male                    75              75
Female                  24              24
Transgender             1               1

2. Contingency table
Tables that report the frequency distribution of two nominal variables simultaneously, and that
include the totals, are known as contingency tables. The categories should be mutually exclusive as
well as exhaustive. They are also known as cross tables, which present the frequency distribution of two or
more variables to establish the relationship or association among them. These tables could be 2 x 2,
2 x 3 or 3 x 3, depending on the number of categories of the variables on which the subjects are cross classified.
The number of subjects in a cell is called the cell frequency. These tables are generally used for the
chi-square test.
Table 2: Types of ventilation and daily bowel movements among patients
Bowel         Mode of ventilation          Total       Chi-square
movements     Spontaneous   Mechanical     frequency   value
              f (%)         f (%)
Present       391 (64.0)    32 (29.4)      423         45.87* (df = 1)
Absent        220 (36.0)    77 (70.6)      297
Total         611           109            720
*P < 0.05
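The figures in such a table can be recomputed from the four cell frequencies. The sketch below uses the counts from Table 2 and the usual Pearson chi-square formula without continuity correction, which is an assumption, since the notes do not say how the tabulated value was obtained. With these counts the uncorrected statistic works out to about 45.79, close to but not identical to the 45.87 printed in the table.

```python
# Cell frequencies from Table 2: rows = bowel movements (present, absent),
# columns = mode of ventilation (spontaneous, mechanical).
observed = [[391, 32],
            [220, 77]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Column percentages, as reported in brackets in the table.
col_percent = [[100 * observed[i][j] / col_totals[j] for j in range(2)]
               for i in range(2)]

# Pearson chi-square (no continuity correction; an assumption about the method).
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed[i][j] - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Row totals:", row_totals, "Column totals:", col_totals, "n:", n)
print("Column percentages:", [[round(p, 1) for p in row] for row in col_percent])
print("Chi-square:", round(chi_square, 2), "df:", df)
```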
3. Multiple response table
When classification of the cases is done into categories that are neither exclusive nor exhaustive ,
then it is called multiple response table. For example, a patient can have two or more complaints,
but only the major ones will be listed. In such cases, the sum total of frequencies would exceed the
total number of subjects and may lead to confusion. Therefore, the total number of subjects in case
of multiple response table is given in the base and from this the percentages can be calculated.
Table 3: Factors contributing to sleep deprivation among patients
[N = 60]
Factors                  f (%)
Blood sampling           35 (58.3)
Diagnostic tests         33 (55.0)
Medication               33 (55.0)
Vital signs monitoring   32 (53.3)
Noise                    32 (53.3)
Bright lights            30 (50.0)

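In a multiple response table each percentage is calculated on the number of subjects, not on the number of responses. A short sketch using the frequencies of Table 3:

```python
# Number of subjects and the frequency of each factor from Table 3.
n_subjects = 60
responses = {
    "Blood sampling": 35,
    "Diagnostic tests": 33,
    "Medication": 33,
    "Vital signs monitoring": 32,
    "Noise": 32,
    "Bright lights": 30,
}

# Percentages use the number of subjects as the base.
percentages = {factor: round(100 * f / n_subjects, 1)
               for factor, f in responses.items()}

for factor, pct in percentages.items():
    print(f"{factor:<24}{responses[factor]} ({pct})")

# The responses add up to more than the number of subjects,
# because one patient can report several factors.
total_responses = sum(responses.values())
print("Total responses:", total_responses, "from", n_subjects, "subjects")
```

Because the base is 60 subjects, the percentages do not add up to 100.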
4. Miscellaneous tables
These tables are used to present data other than frequency or percentage distribution, such as mean,
median, mode, range, standard deviation and so on

GRAPHICAL PRESENTATION OF DATA


A graphical presentation is another way of analyzing numerical data. Here, data are presented
using lines or curves drawn through the plotted coordinate points
Characteristics:
 Helps to quantify, sort and present data in an easily comprehensible manner
 Displays the relationship between variables by exhibiting the change in one variable
corresponding to the change in the other
 Provides complete picture at a glance
 More effective than presenting the data in tabular form
 Computer software packages help in presenting graphical data in a variety of ways
 A graph consists of two co-ordinate axes, one horizontal (known as the X-axis or abscissa) and the
other vertical (known as the Y-axis or ordinate). These two lines are perpendicular to each
other and intersect at “0”, also called the origin
 On the X axis the values to the right of the origin are positive and those to the left are
negative
 On the Y axis the values above the origin are positive and below the origin are negative
 The coordinate system is divided into four areas, called quadrants

Tips for constructing a graph:
• The graph should be drawn neatly and accurately
• The scale should be neither too big nor too small
• The graph should be self-explanatory
• To make graphs easily comprehensible, a combination of shades and colors can be used
• Vertical graphs should be preferred over horizontal graphs
• Graphs should not be made attractive or impressive at the cost of accuracy
• Footnotes may be added to remove ambiguities

Types of graphs
1. Qualitative data presentation
Qualitative data are presented in the form of bar diagrams, pie diagrams and pictograms
1.1 Bar graph:
o It is a convenient graphical device that is particularly used for displaying nominal or ordinal data. It
is an easy method for visual comparison of the magnitude of different frequencies
o The length of each bar represents the frequency of occurrence of the category
o A bar graph has two axes, x and y, wherein the x-axis represents the categories to be
compared and the y-axis represents the corresponding numerical values of the data
o According to the direction in which the bars are placed, bar charts can be vertical or
horizontal
o The width of the bars should be uniform throughout the diagram
o The gap between the bars should be uniform throughout
o Both axes should be labeled, with units if needed
Types of bar graphs:
1. Simple bar graph
• Presents data pertaining to only one variable
• The bars are of equal width but variable length
• Bars of equal width are drawn with a small gap (equal to, or not less than, half the bar width)
between two adjacent categories

Figure 1: Simple bar diagram showing favorite sports of students


2. Multiple bar graph
• It represents data pertaining to two or more variables
• The sets of data are interrelated
• Facilitates comparison between different categories
• Different colors or shades are used for each category

Figure 2: Multiple bar diagram showing pretest and post-test scores of students
3. Component / Subdivided bar graph
• Represents data in which the total magnitude is divided into components in proportion to their
share of the total
• This graph represents the variation in different components within each class and also between
different classes
• Also called a stacked bar chart
Figure 3: Component bar graph showing the favorite sports of boys and girls
4. Deviation bar diagram
• Used when the data contain both positive and negative values
• The x-axis is drawn in the middle, with positive values plotted above it and negative values
below it

Advantages and disadvantages of bar graph


Advantages
• Summarizes large data in visual form
• Clarifies trends better than tables
• Allows estimation of key values at a glance
• Shows each nominal or ordinal category in a frequency distribution
Disadvantages
• Requires additional written or verbal explanation
• Can be easily manipulated to give a false impression
• Fails to reveal causes and effects
1.2 Pie / sector diagram
A pie diagram is a circular representation of statistical data in which percentages are displayed as
sectors of a circle (pieces of the pie). It is a useful pictorial device for presenting discrete data of
qualitative characteristics such as age groups, gender, occupation etc.
Characteristics
• Used only for qualitative data
• Only one variable can be included in a chart
• Should not include too many categories (usually fewer than 6)
• The angle of each sector is proportional to the frequency of the category and is given by the
formula
Angle of the sector = (Frequency of the category / Total frequency) x 360°

Figure 4: Pie diagram depicting the marital status of the sample
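The sector angle formula can be sketched in a few lines of code. The marital status counts below are hypothetical, invented only to illustrate the arithmetic (they are not the data behind the figure), and the function name `sector_angles` is an assumption of this example.

```python
def sector_angles(frequencies):
    """Map each category to its sector angle in degrees.

    angle = frequency of the category / total frequency x 360
    """
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


# Hypothetical counts for a sample of 60 people (illustration only).
marital_status = {"Married": 30, "Single": 20, "Widowed": 6, "Divorced": 4}

angles = sector_angles(marital_status)
for category, angle in angles.items():
    print(f"{category:<10}{angle:6.1f} degrees")
```

Whatever the data, the angles should add up to 360°, which is a quick check on the calculation.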

Advantages and disadvantages of pie graph


Advantages
• Easily understandable
• Summarizes a large data set in visual form
• Shows areas proportional to the number of data points in each category
Disadvantages
• Reveals little information on central tendency and dispersion
• Fails to reveal cause and effect

1.3 Pictogram
• Also called a pictograph, picture chart or icon chart
• It is a popular form of data visualization in which statistical data are represented by symbolic
figures whose number matches the frequencies of the different categories
• Each picture or symbol may represent one or more units of data
Advantages and disadvantages of pictogram
Advantages
• More attractive compared to other diagrams
• Facts displayed in a pictogram are generally remembered longer than mere figures
Disadvantages
• Difficult to construct
• Difficult to compare
• Possibility of data getting distorted

2. Quantitative data presentation


1. Histogram
• It represents the frequency distribution of ordinal, interval or ratio level data using rectangles, where
the width of each bar represents the class interval and the height represents the corresponding frequency
• The data presented pertain to only one variable
• In a histogram the bars touch each other, indicating that the data are being presented on a
continuum; when the class intervals are equal, the bars are also of equal width
• The area of each rectangle is proportional to the frequency of the corresponding class interval and the
total area of the histogram is proportional to the total frequency of all class intervals
• A histogram differs from a bar diagram in that a bar diagram is one dimensional and only the
length of the bar is significant, while in a histogram both length and width matter
• When the class intervals are equal, frequency is taken on the y-axis, the variable on the x-axis, and
adjacent rectangles are constructed
• When the class intervals are unequal, a correction must be made: the height of each rectangle is
taken as the frequency density (frequency divided by class width), so that the area of the rectangle
remains proportional to the frequency
Fig 5: Histogram depicting the distribution of students based on their final exam score
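One common correction for unequal class intervals is to plot frequency density (frequency divided by class width) instead of raw frequency, so that the area of each rectangle still equals its frequency. The sketch below illustrates this with hypothetical classes invented for the example; the function name `frequency_density` is also an assumption.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 9)]


def frequency_density(lower, upper, frequency):
    """Height of the rectangle for one class: frequency / class width."""
    return frequency / (upper - lower)


heights = [frequency_density(lower, upper, f) for lower, upper, f in classes]

# Area = height x width recovers the original frequency of each class.
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)]

for (lower, upper, f), h in zip(classes, heights):
    print(f"{lower:>2}-{upper:<3} frequency={f:<3} width={upper - lower:<3} height={h:.2f}")
```

In this example the class 20-40 has the largest frequency, but the class 10-20 has the tallest rectangle because its observations are packed into a narrower interval.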

Advantages
• Shows the central tendency and dispersion of a data set
• Summarizes a large data set in visual form
• Permits a visual check of the accuracy and reasonableness of calculations
• Shows each interval in the frequency distribution
Disadvantages
• Requires additional written or verbal explanation

2. Frequency polygon
• It is the frequency graph that depicts the overall pattern of the frequency distribution
• It is obtained by joining the mid-points of the tops of the histogram bars
• Represents the frequency distribution of ordinal, interval or ratio data
• To draw a frequency polygon, plot the frequencies against the mid-values of each class,
then join the adjacent points with straight lines
• The figure obtained will not be closed, so assume two hypothetical classes with zero frequency,
one at each end
• Draw straight lines to the midpoints of these classes on the x-axis
• The resulting figure will be a closed polygon and is known as the frequency polygon
Fig 6: Frequency polygon depicting the marks of students
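The construction steps above can be sketched as a small function that returns the points to be joined. It assumes equal class widths, and the three classes used below are hypothetical values invented for illustration; the function name `polygon_points` is likewise an assumption.

```python
def polygon_points(classes):
    """Return the (mid-value, frequency) points of a frequency polygon.

    classes: list of (lower, upper, frequency) in ascending order,
    assumed here to have equal class widths.
    """
    width = classes[0][1] - classes[0][0]
    points = [((lower + upper) / 2, f) for lower, upper, f in classes]
    first_mid = points[0][0]
    last_mid = points[-1][0]
    # Hypothetical zero-frequency classes at each end close the polygon.
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


# Hypothetical grouped data (illustration only).
classes = [(10, 20, 4), (20, 30, 9), (30, 40, 6)]
points = polygon_points(classes)
print(points)
```

Joining these points in order with straight lines gives a figure that starts and ends on the x-axis.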
3. Frequency curve
• To draw a frequency curve, plot the frequencies against the mid-values of each class
• Then draw a smooth freehand curve through the adjacent points
• Both ends of the curve approach the x-axis and are taken to touch it only at infinity

Fig 7: Frequency curve depicting the age of the students in years


4. Cumulative frequency curve or Ogive
• The cumulative frequency curve corresponding to the less-than cumulative frequency table is called
the less-than ogive, and the curve corresponding to the greater-than cumulative frequency table is
called the greater-than ogive or more-than ogive
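As an illustration, the points for both ogives can be derived from a grouped frequency table as sketched below. The class boundaries and frequencies are hypothetical values invented for this example. Less-than cumulative frequencies are paired with upper class boundaries, and greater-than cumulative frequencies with lower class boundaries.

```python
from itertools import accumulate

# Hypothetical continuous classes 0-10, 10-20, 20-30, 30-40 (illustration only).
boundaries = [0, 10, 20, 30, 40]
frequencies = [2, 5, 8, 5]

# Less-than cumulative frequency: running total from the first class onward.
less_than = list(accumulate(frequencies))

# Greater-than cumulative frequency: running total from the last class backward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (class boundary, cumulative frequency).
less_than_points = list(zip(boundaries[1:], less_than))
greater_than_points = list(zip(boundaries[:-1], greater_than))

print("Less-than ogive:   ", less_than_points)
print("Greater-than ogive:", greater_than_points)
```

The less-than ogive rises to the total frequency, while the greater-than ogive starts at the total frequency and falls.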
5. Line graph
• It is used for representing time series data, that is, values of a variable recorded at successive
points in time

6. Box plot:
When the variable has extreme observations, a box plot is an ideal graphical way of visualizing the
data. It displays the distribution of the data based on the five-number summary: minimum, first
quartile, median, third quartile and maximum. This plot is also known as the box-and-whisker plot. In
a typical box plot, the top of the rectangle (box) indicates the third quartile, a line inside the box
indicates the median, and the bottom of the rectangle indicates the first quartile. A vertical line
extends from the top of the rectangle to indicate the maximum value and another line extends from
the bottom of the rectangle to indicate the minimum value. Values outside the range Q1 - 3 x IQR
to Q3 + 3 x IQR, where IQR = Q3 - Q1 is the interquartile range, are considered extreme outliers.
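A minimal sketch of the five-number summary and the extreme-outlier rule is given below. The data are hypothetical, invented for illustration. Quartiles can be computed by several conventions; the example assumes the "inclusive" method of Python's `statistics.quantiles`, so other textbooks or software may give slightly different quartile values.

```python
import statistics

# Hypothetical observations with one very large value (illustration only).
data = sorted([12, 15, 17, 19, 21, 22, 24, 28, 95])

# Quartiles by the "inclusive" convention (an assumption of this sketch).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

five_number_summary = (min(data), q1, median, q3, max(data))

# Extreme outliers lie below Q1 - 3 x IQR or above Q3 + 3 x IQR.
lower_fence = q1 - 3 * iqr
upper_fence = q3 + 3 * iqr
extreme_outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Five-number summary:", five_number_summary)
print("Fences:", lower_fence, upper_fence)
print("Extreme outliers:", extreme_outliers)
```

Here the value 95 lies above the upper fence and would be flagged as an extreme outlier.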
EXERCISES:
1. Construct a frequency distribution for the following data regarding daily protein intake (in
grams) of 60 adult males
65 49 36 50 49 24 68 27 36 46
16 25 12 27 27 30 25 57 22 74
23 38 22 47 63 38 19 31 23 44
43 49 31 79 45 51 32 28 42 27
24 23 43 42 28 28 12 16 42 51
32 21 69 55 25 30 28 43 12 65

2. The following data give the weights (kg) of 30 adult males. Prepare a frequency distribution
55 72 64 48 53 78 58 78 57 48
61 53 77 61 69 61 67 66 68 71
76 56 65 74 69 70 68 53 58 67
3. Convert the following discrete grouped frequency distribution into a continuous frequency
distribution
Class Frequency
10-18 2
20-28 3
30-38 5
40-48 4

4. Prepare an LCF and GCF table for the following data

Marks             0-5    5-10    10-15    15-20    20-25
No. of students   3      6       10       12       19

5. Prepare an LCF and GCF table for the following data

Marks             0-9    10-19    20-29    30-39    40-49
No. of students   3      5        6        4        2

6. Draw a histogram for the following data

Age      No. of males
15-20    15
20-25    20
25-30    40
30-35    30
7. Draw a histogram and a pie diagram for the following data. The frequency distribution of the
tuberculin reaction of 100 school children is given.

Reaction in mm    Frequency
4-6               10
6-8               13
8-10              19
10-12             35
12-14             14
14-16             9

8. Draw a frequency polygon for the following data

Age in yrs        10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of patients   6       25      25      20      16      12      10
