MODULE 1 Introduction, Levels of Measurement, Frequency Distribution
The word statistics is derived from the Latin word ‘status’, meaning ‘political state’. It was so called because
the first statistics were data collected on the total number of people residing in Egypt, for estimating the
number of persons capable of paying tax to the King. Later, the name went through many transformations
and finally took its present form, ‘Statistics’.
The word ‘statistics’ is used in two ways: as a plural noun and as a singular
noun. In the plural sense, it is defined as ‘a systematic collection of numerical information’. So the
year-wise collection of data relating to admissions to the BSc Nursing course, the annual
examination results, the number of births and deaths in a country: all are statistics. In this context, it
is often used as a synonym for data or quantitative information. When statistics is considered in the
singular sense, it is a science. It is the ‘science of collecting, classifying, presenting and interpreting
data’ relating to any sphere of enquiry. Sir Ronald Fisher is the Father of modern statistics.
The application of statistical methods in biological sciences is called biostatistics, and Sir Francis
Galton is known as the Father of Biostatistics
DEFINITION
According to Croxton and Cowden: Statistics is defined as the science of collection, presentation,
analysis and interpretation of numerical data.
USE OF STATISTICS
1. In medicine
To design statistical procedures for resolving questions relating to medical or public health
data
To define and quantify the nature and extent of illness and deaths in the community
To identify risk factors for disease and establish the causes of health problems
through statistical analysis
To establish signs and symptoms of diseases by using various statistical methods
To design, monitor, analyze, interpret and report results of clinical studies
To evaluate the outcome of health interventions
To compare certain attributes of two different populations
To find the difference between efficacy of two drugs or vaccines or interventions
To check the efficacy of biomedical equipment
2. In nursing
Nurses can use statistics for describing phenomena, correlating the variables, finding the
effectiveness of nursing interventions and predicting outcomes in patients
Helps the nurse to describe how one event or situation relates to another event or situation
Understanding statistical methodologies is important for a nurse to incorporate empirical
findings into everyday nursing practice
Helps nurses to identify patterns in signs and symptoms of patients thus enabling them to
take informed decisions and better respond to the patient’s medical status
It helps the nurse to determine whether commonly used interventions should be changed or
protocols revised
Biostatistics can also be useful while dealing with allocation of limited resources or
bringing about a change in the nursing profession
Every nurse, whether pursuing a bachelor's or postgraduate degree, should have a basic
knowledge of statistics. This will boost their skills and confidence in delivering quality care
to patients
TERMINOLOGY
1. Data: Data are the basic building blocks of statistics and refer to the individual values
presented, measured or observed
2. Qualitative data: When the collected data are non-numerical or categorical in nature, they are
called qualitative data. Eg: gender, religion, marital status
3. Quantitative data: When the data are measured and collected as numerical values, they are called
quantitative data. They can be ordered or ranked. Eg: height, weight, Hb level
3.1 Discrete data: Data that take only whole-number values are called discrete data. Eg: number of children in a
family, pulse rate, ESR, blood pressure, serum cholesterol
3.2 Continuous data: Data that can be measured in fractional or decimal values. Eg:
height, weight, body temperature
4. Population: Population is the collection of all individuals or items under consideration in a
study
5. Sample: A sample is the part of the population from which the information is collected
6. Parameter: Any numerical value computed from the population is called a parameter
7. Statistic: Any numerical value computed from the sample is known as a statistic
8. Parametric tests: A class of statistical tests that involve assumptions about the distribution of
the variables and the estimation of a parameter
9. Non-parametric tests: Tests that do not involve stringent assumptions about the distribution
of the variables
10. Variable: It is an attribute or number that describes an individual or a data item. The value of
the variable may vary from one entity to another
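The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The haemoglobin values below are made up for illustration and are not from any study; the sample size of 4 and the fixed random seed are likewise assumptions.

```python
import random
import statistics

# Hypothetical population: Hb levels (g/dL) of 10 patients (made-up values)
population = [11.2, 12.5, 13.1, 10.8, 14.0, 12.2, 13.6, 11.9, 12.8, 13.3]

# Parameter: a value computed from every member of the population
population_mean = statistics.mean(population)

# Sample: a part of the population (4 patients drawn at random;
# the seed is fixed only so that the draw is repeatable)
rng = random.Random(1)
sample = rng.sample(population, 4)

# Statistic: a value computed from the sample only
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", round(population_mean, 2))
print("Statistic (sample mean):", round(sample_mean, 2))
```

The sample mean will usually differ a little from the population mean, which is why inferential methods are needed to judge how reliable a statistic is as an estimate of a parameter.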
TYPES OF DATA AND ITS MEASUREMENT
TYPES OF DATA:
Data are facts or figures from which conclusions can be drawn. Data can be classified as either
numeric (quantitative) or non-numeric (qualitative)
1. Quantitative data consist of values that indicate counts or measurements. They are further
classified into discrete and continuous data
Discrete data are numeric data that have a countable number of possible values, i.e. they can take only
whole-number values. For eg: pulse rate takes only whole-number values such as 72 beats
per minute or 80 beats per minute, and not fractional values
Continuous data have an infinite number of possible values, i.e. they are not limited to whole-number values.
For eg: age (1 year 6 months), salary (Rs 11,000, Rs 11,600), weight (1.1 kg, 1.5 kg)
2. Qualitative data consist of non-numeric values that can be placed into categories, and are commonly
termed categorical data
The types of data greatly affect the choice of analysis method. To carry out an analysis, the
variables have to be quantified by providing values and a suitable scale. There are four levels of
measurement on a continuum of discrete and continuous scales
LEVELS OF MEASUREMENT
According to Stevens(1946)
Nominal
Ordinal
Interval
Ratio
These four measurement scales are listed in their hierarchical order of describing results, with
nominal scale being the least precise and the ratio scale being the most precise of them
1. Nominal scale
Comes from the Latin root “nomen” meaning “name”
Simplest and lowest level of measurement
Here the numbers are assigned as labels to represent categories or characteristics
When the data are classified into two or more categories and there is no order or
difference in size of these categories, they can be labeled using nominal scale
Eg: gender is classified into male, female, transgender etc
The categories are mutually exclusive and exhaustive
Mutually exclusive means that the categories must be distinct enough that no
observations will fall into more than one category
Exhaustive means that there must be enough categories that all the observations will fall
into some category
This scale lacks numeric order, magnitude or size
Eg: gender, religion, marital status.
A scale for gender might include Male 1, Female 2. In this scale the number is assigned
for the purpose of gender identification only and does not in any way represent
magnitude, order or size
The scale uses numerals rather than words for statistical analysis
2. Ordinal scale
Second level of measurement
More than mere categorization, ranks the characteristics based on certain criteria
Much more accurate than nominal in measuring the characteristics
Classifies the data into categories as well as ranks them on a scale: for example, from poor
to excellent
It lacks magnitude, size, equal intervals or an absolute zero point
Categories must be mutually exclusive and exhaustive, i.e. an individual cannot be rated
as having both mild and moderate anxiety
Eg: subjective rating scales measuring satisfaction, pain, discomfort, depression or opinion,
and the Likert scale, are considered ordinal
Eg: age: young, middle aged, old; health status: poor, average, good
3. Interval scale
The level of measurement goes beyond order and classifies data based on magnitude
Lacks a defined size or absolute zero point
Its significant feature is that the numbered intervals between points are equidistant,
whether those intervals are measured in centimeters or degrees
Allows for more mathematical manipulation of data
The Fahrenheit temperature scale is a good example of an interval scale, where the degrees are
calibrated from high to low. Each degree on the scale is equidistant from the next
degree; the zero point is not absolute but arbitrary, as a score below zero is actually
possible. The term ‘arbitrary zero’ does not refer to a complete absence of
the quantity but rather serves as an initial point
4. Ratio scale
Final level of measurement scale
Allows for most manipulation of data
In this scale the level of measurement goes beyond order and magnitude and includes
absolute zero point
Thus this scale has all three attributes viz. magnitude, equal intervals and absolute zero
point
Represents continuous values
Scores have equal distance between attributes on a scale and are based on a true zero point
The best example for a ratio scale is height. The height scale has an order: 1 foot is less than
2 feet. It has a magnitude and the difference between 1 foot and 2 feet is the same as the
difference between 2 feet and 3 feet. It has also an absolute zero point, in the sense that 0
feet is the complete lack of height.
Other examples are weight, pulse and BP
Interval and ratio scales quantify the data and hence are quantitative. In these scales, the
variables can be categorized and ranked, have equal intervals and represent a range of values.
As they can be measured on a scale, they are also called scale data
Summary of the properties of four levels of measurement
Nominal
Description: Data is categorized into categories, but cannot be arranged in any specific order
Characteristics: Contains no magnitude, just names
Examples: Eye color (brown/black); Gender (male/female)
Permissible descriptive statistics: Frequencies, Percentage, Mode
Permissible inferential statistics: Chi-square, Binomial test
Graphs: Bar, Pie
Ordinal
Description: Data is categorized and arranged in rank order; however, the differences between data values cannot be established
Characteristics: Reflects only magnitude; does not contain equal intervals or an absolute zero; has rank order
Examples: Levels of depression (mild/moderate/severe); Stages of cancer (1/2/3/4)
Permissible descriptive statistics: Frequencies, Percentage, Mode, Median
Permissible inferential statistics: Rank-order correlation
Graphs: Bar, Pie
Interval
Description: Data is categorized and ranked with meaningful intervals between measurements; no zero point
Characteristics: Possesses magnitude and equal size of interval between data points, but no absolute zero
Examples: Temperature on centigrade scale (30°C/40°C/50°C); IQ scores (70/90/100)
Permissible descriptive statistics: Frequencies (if discrete), Percentage, Mode (if discrete), Median, Mean, SD, Skewness, Kurtosis
Permissible inferential statistics: ANOVA, Product moment correlation, t test
Graphs: Bar (if discrete), Pie (if discrete), Histogram
Ratio
Description: Data is categorized and ranked with meaningful intervals, and has an inherent zero starting point
Characteristics: Possesses magnitude, equal intervals and an absolute zero
Examples: Height (2”/4.1”/6.3”); Weight (1 Kg/2.3 Kg/3.5 Kg)
Permissible descriptive statistics: Mean, SD, Skewness, Kurtosis
Permissible inferential statistics: Coefficient of variation, t-test
Graphs: Histogram
Conversion of levels:
It is possible to convert a higher level measurement to a lower level measurement, but the converse is
not possible
Eg: Consider the heights of 5 students at ratio level. We can convert them to interval level by subtracting
150 from all the observations. We can also convert them to ordinal level by assigning ranks to the
heights, and to nominal level by assigning the number ‘1’ to observations
< 150 cm and ‘2’ to observations ≥ 150 cm
Height (cm)
Ratio level    Interval level    Ordinal level    Nominal level
160            10                3                2
170            20                5                2
165            15                4                2
145            -5                1                1
150            0                 2                2
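The conversions in the height example can be reproduced with a short Python sketch. The variable names are illustrative, and the ranking step assumes no two students have the same height, which holds for these five values.

```python
# Heights of the 5 students in cm (ratio level), as in the table above
heights_cm = [160, 170, 165, 145, 150]

# Interval level: shift the origin by subtracting 150 from every observation
interval_level = [h - 150 for h in heights_cm]

# Ordinal level: rank 1 for the shortest student, rank 5 for the tallest
# (assumes there are no tied heights)
sorted_heights = sorted(heights_cm)
ordinal_level = [sorted_heights.index(h) + 1 for h in heights_cm]

# Nominal level: label 1 for heights below 150 cm, label 2 otherwise
nominal_level = [1 if h < 150 else 2 for h in heights_cm]

for row in zip(heights_cm, interval_level, ordinal_level, nominal_level):
    print(*row, sep="\t")
```

Each step discards information: the nominal labels cannot be turned back into ranks, and the ranks cannot be turned back into heights, which is why the converse conversion is not possible.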
TYPES OF STATISTICS
1. Descriptive statistics
Consist of methods for organizing and summarizing information
These statistics summarize the data from a sample using computation of averages(mean,
median, mode), measures of dispersion (variance, standard deviation, range, interquartile
range) and include construction of graphs, charts and tables
2. Inferential statistics
Consist of techniques for measuring the reliability of conclusions about population
based on information obtained from a sample
These statistics make inferences about a population using methods like estimation (point
estimation , interval estimation) and hypothesis testing based on probability theory
Descriptive and inferential statistics are inter-related in the sense that methods of descriptive
statistics are initially used to organize and summarize the sample information before
methods of inferential statistics are used to analyze the subject under investigation
While the descriptive statistics are initially used to organize and summarize the data, the
inferential statistics are later used to estimate parameters and test the hypothesis
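As a small illustration of descriptive statistics, the sketch below summarizes the five student heights used in the conversion example, using Python's standard `statistics` module. The mode is left out because every height occurs only once, and the quartiles depend on the calculation method (the module's default is assumed here).

```python
import statistics

# Heights (cm) of the 5 students from the conversion example
heights_cm = [160, 170, 165, 145, 150]

# Averages
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)

# Measures of dispersion
height_range = max(heights_cm) - min(heights_cm)
variance = statistics.variance(heights_cm)      # sample variance (divides by n - 1)
std_dev = statistics.stdev(heights_cm)          # sample standard deviation
q1, q2, q3 = statistics.quantiles(heights_cm, n=4)  # quartiles, default method
iqr = q3 - q1

print("Mean:", mean_height)
print("Median:", median_height)
print("Range:", height_range)
print("Variance:", variance)
print("SD:", round(std_dev, 2))
print("IQR:", iqr)
```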
Figure: Types of statistics (descriptive and inferential)
59 59 59 15 26 59 26 15 59 48
40 33 15 26 59 40 33 26 40 26
2
Subtract 0.5 from all lower limits and add 0.5 to all upper limits
Class          Frequency
-0.5 - 9.5     2
9.5 - 19.5     3
19.5 - 29.5    5
29.5 - 39.5    4
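The adjustment above can be sketched in Python. The original class limits (0-9, 10-19, 20-29, 30-39) are an assumption, inferred by reversing the stated rule, since they are not shown in the text. The correction factor is taken as half the gap between the upper limit of one class and the lower limit of the next.

```python
# Assumed original discrete classes (inferred, not shown in the text)
classes = [(0, 9), (10, 19), (20, 29), (30, 39)]
frequencies = [2, 3, 5, 4]

# Correction factor: half the gap between consecutive classes
gap = classes[1][0] - classes[0][1]   # 10 - 9 = 1
correction = gap / 2                  # 0.5

# Subtract the correction from lower limits and add it to upper limits
boundaries = [(lower - correction, upper + correction) for lower, upper in classes]

for (lower, upper), f in zip(boundaries, frequencies):
    print(f"{lower} - {upper}\t{f}")
```

With the correction applied, the upper boundary of each class coincides with the lower boundary of the next, so the classes form a continuous scale.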
3. Cumulative frequency table
By the term cumulative, we mean increasing by successive addition. We can prepare a
cumulative frequency table from a continuous grouped frequency table
There are two types of cumulative frequency tables:
Less than cumulative frequency table (LCF table)
In an LCF table, the upper boundaries are written in the first column and the LCF in the
second column. The number of observations less than each upper boundary is called the
LCF
Greater than / more than cumulative frequency table (GCF table)
In a GCF table, the lower boundaries are written in the first column and the GCF in the
second column. The number of observations greater than each lower boundary is called the
GCF
Example : Prepare cumulative frequency table for the following data
Class Frequency
0-10 2
10-20 3
20-30 5
30-40 4
40-50 1
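Ans:
Less than cumulative frequency table
Upper boundary    LCF
10                2
20                5
30                10
40                14
50                15
Greater than cumulative frequency table
Lower boundary    GCF
0                 15
10                13
20                10
30                5
40                1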
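The cumulative frequencies for the example data can be computed with a short Python sketch. It uses `itertools.accumulate` for the running totals; the variable names are illustrative.

```python
from itertools import accumulate

# Continuous grouped frequency table from the example
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [2, 3, 5, 4, 1]

upper_boundaries = [upper for _, upper in classes]
lower_boundaries = [lower for lower, _ in classes]

# LCF: running total starting from the first class
lcf = list(accumulate(frequencies))

# GCF: running total starting from the last class, then put back in class order
gcf = list(accumulate(reversed(frequencies)))[::-1]

print("Upper boundary\tLCF")
for boundary, value in zip(upper_boundaries, lcf):
    print(f"{boundary}\t{value}")

print("Lower boundary\tGCF")
for boundary, value in zip(lower_boundaries, gcf):
    print(f"{boundary}\t{value}")
```

The last LCF value and the first GCF value both equal the total frequency (15), which is a quick check that the table is correct.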
Types of tables
1. Frequency distribution table:
These tables represent the frequency and percentage distribution of the information collected,
where an attribute is grouped into a number of classes, which may vary between three and eight
Table 1
[n = 100]
                    Frequency (f)    Percentage (%)
Age in years
40 – 50             15               15
51 – 60             18               18
61 – 70             31               31
71 and above        36               36
Gender
Male                75               75
Female              24               24
Transgender         1                1
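The percentage column of such a table is obtained by dividing each frequency by the total and multiplying by 100. A minimal Python sketch using the gender counts from Table 1:

```python
# Frequencies for the gender attribute, taken from Table 1
gender_counts = {"Male": 75, "Female": 24, "Transgender": 1}

# Total number of subjects
n = sum(gender_counts.values())

# Percentage = (frequency / total) x 100
percentages = {category: 100 * f / n for category, f in gender_counts.items()}

for category, f in gender_counts.items():
    print(f"{category}\t{f}\t{percentages[category]:.1f}")
```

Because n is 100 in Table 1, the percentages happen to equal the frequencies; with any other total the two columns would differ.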
2. Contingency table
Tables that report the frequency distribution of two nominal variables simultaneously, and that
include the totals, are known as contingency tables. The categories should be mutually exclusive as
well as exhaustive. They are also known as cross tables, which present the frequency distribution of two or
more variables to establish the relationship or association among them. These tables could be 2 x 2,
2 x 3 or 3 x 3, depending on the number of categories of the variables on which the subjects are cross classified.
The number of subjects in a cell is called the cell frequency. These tables are generally used for the
chi-square test.
Table 2: Types of ventilation and daily bowel movements among patients
                   Mode of ventilation
Bowel movements    Spontaneous f (%)    Mechanical f (%)    Total frequency    χ² value
Present            391 (64.0)           32 (29.4)           423                45.87* (df = 1)
Absent             220 (36.0)           77 (70.6)           297
Total              611                  109                 720
*P < 0.05
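The chi-square value for a contingency table like Table 2 can be worked out from the cell frequencies. The sketch below assumes the ordinary (uncorrected) Pearson chi-square; the text does not say which variant the authors used. Each expected frequency is (row total × column total) / grand total.

```python
# Observed cell frequencies from Table 2
observed = [
    [391, 32],   # bowel movements present: spontaneous, mechanical
    [220, 77],   # bowel movements absent:  spontaneous, mechanical
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) x (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Chi-square:", round(chi_square, 2))
print("df:", df)
```

Under this assumption the result is about 45.79, close to the 45.87 printed in the table; the small difference may come from rounding or from a different calculation in the source. Either value is far above 3.84, the critical value for df = 1 at the 0.05 level, which agrees with P < 0.05.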
3. Multiple response table
When the cases are classified into categories that are neither exclusive nor exhaustive,
the table is called a multiple response table. For example, a patient can have two or more complaints,
but only the major ones will be listed. In such cases, the sum total of frequencies would exceed the
total number of subjects and may lead to confusion. Therefore, the total number of subjects in case
of multiple response table is given in the base and from this the percentages can be calculated.
Table 3: Factors contributing to sleep deprivation among patients
Factors N= 60
f(%)
4. Miscellaneous tables
These tables are used to present data other than frequency or percentage distribution, such as mean,
median, mode, range, standard deviation and so on
Types of graphs
1. Qualitative data presentation
Qualitative data are presented in the forms of bar diagram, pie diagram and pictogram
1.1 Bar graph:
o It is a convenient graphical device that is particularly useful for displaying nominal or ordinal data. It
is an easy method for visual comparison of the magnitude of different frequencies
o The length of the bar represents the frequency of occurrence of the category.
o A bar graph has two axes, x and y, wherein the x-axis represents the categories to be
compared and the y-axis represents the corresponding numerical values of the data
o According to the direction of placement of bars, it can be vertical bar charts and horizontal bar
charts
o The width of bars should be uniform throughout the diagram
o The gap between the bars should be uniform throughout
o Both the axes should be labeled with units (if needed)
Types of bar graphs:
1. Simple bar graph
Presents data pertaining to only one variable
The bars are of equal width but variable length
Boxes of equal width are marked with a little gap (equal or not less than half the class width)
between two classes
Figure 2: Multiple bar diagram showing pretest and post test scores of students
3. Component / Subdivided bar graph
Represents data in which the total magnitude is divided into components based on their ratio
This graph represents the variation in different components within each class and also between
different classes
Also called stacked chart
Fig 3: Component bar graph showing the favorite sports of boys and girls
4. Deviation bar diagram
Used when we have positive and negative values
Here we draw x axis in the middle, positive values above the x-axis and negative values
below the x-axis
1.3 Pictogram
1. Histogram
A histogram represents the frequency distribution of ordinal, interval or ratio level data using rectangles, where the
width of each bar represents the class interval and the height represents the corresponding frequency
The data presented pertains to only one variable
In a histogram the bars are of equal width and touch each other indicating that data are being
presented on a continuum
The area of rectangle is proportional to the frequency of the corresponding class interval and the
total area of histogram is proportional to the total frequency of all class intervals
A histogram differs from a bar diagram in that a bar diagram is one dimensional and only the
length of the bar is significant, while in a histogram both length and width matter
When the class intervals are equal, frequency is taken on y-axis, the variables on x-axis and adjacent
rectangles are constructed.
When the class intervals are unequal, a correction for unequal class intervals must be made
Fig 5: Histogram depicting the distribution of students based on their final exam score
Advantages
Shows the central tendency and dispersion of a data set
Summarizes a large dataset in a visual form
Permits a visual check of the accuracy and reasonableness of calculations
Shows each interval in the frequency distribution
Disadvantages
Requires additional written or verbal explanation
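The text does not say which correction to use for unequal class intervals. One common choice, assumed here, is to plot frequency density (frequency divided by class width) as the bar height, so that the area of each rectangle stays proportional to its frequency. The classes and frequencies below are hypothetical values chosen only for illustration.

```python
# Hypothetical classes with unequal widths (illustrative values only)
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [4, 6, 10, 8]

# Class width = upper boundary - lower boundary
widths = [upper - lower for lower, upper in classes]

# Frequency density = frequency / class width (used as the bar height)
densities = [f / w for f, w in zip(frequencies, widths)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}-{upper}\tfrequency={f}\tdensity={d}")
```

Multiplying each density by its class width gives back the frequency, which confirms that area, not height, carries the frequency when the widths differ.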
2. Frequency polygon
It is the frequency graph that depicts the overall pattern of the frequency distribution.
It is obtained by joining the mid-points of the tops of the histogram bars
Represents the frequency distribution of ordinal, interval or ratio data
To draw a frequency polygon, plot the frequencies corresponding to the mid-values of each class,
then join the adjacent points.
The figure obtained will not be closed. So assume two hypothetical classes with zero frequency, one at each end.
Draw straight lines to the midpoints of these classes on the x-axis.
The resulting figure will be a closed polygon and is known as frequency polygon
Fig 6: Frequency polygon depicting the marks of students
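The points of a frequency polygon can be listed with a short Python sketch before plotting. It reuses the class intervals 0-10 to 40-50 from the cumulative frequency example and assumes equal class widths, so each hypothetical end class is one class width beyond the real ones.

```python
# Grouped data reused from the cumulative frequency example
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [2, 3, 5, 4, 1]

# Mid-value of each class = (lower boundary + upper boundary) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Equal class width is assumed for the hypothetical end classes
width = classes[0][1] - classes[0][0]

# Add one zero-frequency class at each end so the polygon closes on the x-axis
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Joining these points in order with straight lines gives the closed polygon.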
3. Frequency curve
To draw a frequency curve, plot the points corresponding to the mid values of each class with
frequencies.
Then draw a smooth freehand curve through the adjacent points.
Both ends will touch the x-axis at infinity
4. Ogive
The frequency curve corresponding to the less than cumulative frequency table is called the less than ogive,
and the frequency curve corresponding to the greater than cumulative frequency table is called the greater than ogive or
more than ogive
5. Line graph
6. Box plot:
When the variable has extreme observations, the box plot is an ideal graphical way of visualizing data.
It displays the distribution of data based on the five-number summary: minimum, first quartile,
median, third quartile and maximum. It is also known as a box-and-whisker plot. In a typical
box plot, the top of the rectangle (box) indicates the third quartile, a line inside the box indicates
the median, and the bottom of the rectangle indicates the first quartile. A vertical line extends from
the top of the rectangle to indicate the maximum value and another line extends from the bottom of
the rectangle to indicate the minimum value. Values outside the range Q1 − 3 × IQR to Q3 + 3 × IQR
are considered extreme outliers.
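The five-number summary and the extreme-outlier limits can be computed with Python's standard `statistics` module. The data below are hypothetical values chosen for illustration. Quartiles can be calculated by several methods that give slightly different answers; the module's default method is assumed here.

```python
import statistics

# Hypothetical data set with one extreme observation (illustrative values only)
data = [10, 12, 14, 15, 16, 18, 19, 20, 22, 24, 60]

# Five-number summary
minimum = min(data)
maximum = max(data)
q1, median, q3 = statistics.quantiles(data, n=4)  # default quartile method

# Interquartile range and the limits for extreme outliers
iqr = q3 - q1
lower_limit = q1 - 3 * iqr
upper_limit = q3 + 3 * iqr

extreme_outliers = [x for x in data if x < lower_limit or x > upper_limit]

print("Minimum:", minimum)
print("Q1:", q1)
print("Median:", median)
print("Q3:", q3)
print("Maximum:", maximum)
print("Extreme outliers:", extreme_outliers)
```

Here the value 60 lies above Q3 + 3 × IQR and is flagged as an extreme outlier, while the rest of the data fall within the limits.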
EXERCISES:
1. Construct a frequency distribution for the following data regarding daily protein intake (in
grams) of 60 adult males
65 49 36 50 49 24 68 27 36 46
16 25 12 27 27 30 25 57 22 74
23 38 22 47 63 38 19 31 23 44
43 49 31 79 45 51 32 28 42 27
24 23 43 42 28 28 12 16 42 51
32 21 69 55 25 30 28 43 12 65
2. The following data give the weights (kg) of 30 adult males. Prepare a frequency distribution
55 72 64 48 53 78 58 78 57 48
61 53 77 61 69 61 67 66 68 71
76 56 65 74 69 70 68 53 58 67
3. Convert the following discrete grouped frequency distribution into a continuous frequency
distribution
Class Frequency
10-18 2
20-28 3
30-38 5
40-48 4
Reaction in mm Frequency
4-6 10
6-8 13
8-10 19
10-12 35
12-14 14
14-16 09