SCIENCES
STATISTICS
A statistic or datum is a measured or counted fact or piece of information stated as a single figure, such as a height, an age, a weight, the birth of a baby, etc. Statistics or data are the same facts stated in more than one figure, such as the heights of 10 persons, the blood pressures of 15 patients, dental caries in 5 children, etc. Statistics is the science of figures, and it can be applied in various fields/areas such as agriculture, economics, pharmacy, engineering, etc.
1. Physiology: To find out the correlation between two variables such as height and weight – whether weight increases or decreases as height increases or decreases.
2. Anatomy: To find out differences between means and proportions of normal measurements at two or more different places.
3. Pharmacology: To find out the action of a drug, to compare the actions of two drugs, to find out the relative potency of a new drug with respect to a standard drug, etc.
4. Medicine: To find out the efficacy of a particular drug, operation or line of treatment (i.e. comparison of cases and controls), to find out an association between two attributes such as cancer and smoking or malaria and social class, and to identify the signs and symptoms of a disease or syndrome.
5. Dentistry: To find out dental caries in a community or among school-going children.
6. Community Medicine: To test the usefulness of sera and vaccines in the field/community, i.e. the percentage of attacks or deaths among vaccinated subjects is compared with that among the unvaccinated. In epidemiological studies the role of causative factors is statistically tested; for instance, deficiency of iodine as an important cause of goitre in a community is confirmed by comparing the incidence of goitre before and after giving iodized salt. To find out different rates/ratios, and the prevalence and incidence rates of a disease in a community.
ROLE OF BIOSTATISTICS IN MEDICINE
WHAT IS STATISTICS?
The term “statistics” is used in two ways. First, it refers to the everyday use of the word: data – numerical observations, quantitative information.
Examples
1. Number of trained medical personnel in Maharashtra (district-wise). 2. Birth weights of babies born in a hospital/community. 3. Ages of patients seen at an orthopaedic clinic in a hospital. 4. Prevalence of oral cancers, per 1000 population, in Ahmednagar district. 5. Prevalence of physical disability in children < 14 years in a community. 6. Amount of creatinine in mg per litre in a 24-hour urine specimen.
Second, statistics refers to the discipline comprising statistical methods: the study of scientific methods of collecting, processing, reducing, presenting, analyzing and interpreting data, and of making inferences and drawing conclusions from numerical data.
WHAT IS BIOSTATISTICS?
The term “Biostatistics” can be understood as (1) statistics arising out of the biological sciences, including the fields of medicine and public health; and (2) the methods and principles used in dealing with statistics in the biological sciences, including medicine and public health, and in planning, conducting and analyzing investigations in these branches.
MAIN USES OF STATISTICAL METHODS
Three main uses of statistical methods are:
a) To collect data in the best possible and scientific way. This includes methods of: designing forms for data collection; organizing the collection procedure; designing and executing experiments/clinical trials; and conducting surveys in a population.
Examples
1. Collection of data from mothers about their breast feeding practices. 2. Systematic collection of data on births
and deaths. 3. Collection of data to compare the relative effects of ergometrine+oxytocin and ergometrine alone
in the third-stage management of obstetric labour. 4. Collection of data on industrial workers of a given
geographical area.
b) To describe the characteristics of a group or a situation: This is accomplished mainly by data reduction, data summary, and data presentation (classification, tabulation and graphs/diagrams). c) To analyze data and to draw conclusions from such analyses: This involves the use of various analytical techniques and the use of probability concepts in drawing conclusions.
The application of statistics is also useful in developing a critical thinking faculty, in order to be able to: think scientifically, logically and critically about medical problems; properly assess the available evidence for decision-making; be aware of possible risks associated with medical decisions; and identify decisions and conclusions that lack a scientific and logical basis.
Statistical principles and concepts are applied in various areas in medicine. Some examples are given below.
A) Handling of variation
Variation in a characteristic occurs when its value changes from subject to subject, or from time to time within
the same subject. Nearly all characteristics encountered in health care delivery, whether physiological,
biochemical or immunological, exhibit variation. The extent of this variability, biological or otherwise, is learnt by defining normal values and fixing normal limits.
Examples: Age, weight, height, blood pressure, cholesterol level, bilirubin, albumin, immunoglobulin levels,
platelet count, glucose level etc.
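Normal limits are commonly fixed as the mean plus or minus two standard deviations of the observed values. A sketch of this calculation, assuming a hypothetical set of systolic blood pressure readings:

```python
# Normal limits taken as mean ± 2 SD of hypothetical
# systolic blood pressure readings (mmHg).
from statistics import mean, stdev

bp = [118, 122, 120, 125, 119, 121, 124, 117, 123, 120]
m, s = mean(bp), stdev(bp)
lower, upper = m - 2 * s, m + 2 * s
print(f"mean = {m:.1f}, SD = {s:.2f}")
print(f"normal limits: {lower:.1f} to {upper:.1f} mmHg")
```

Values falling outside these limits would be flagged as outside the normal range for this (hypothetical) group.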
B) Diagnosis of patients’ ailments and community health
Diagnosis is the process whereby the health status of an individual, or group of individuals, and factors
producing it, is identified. The various disease categories, one distinct from the other, based on clustering of
signs, symptoms and magnitude of biochemical values, have often been established by procedures employing
implicit statistical methods.
In placing an individual or a community’s health status in one of these categories there is always some
uncertainty. It may happen that the stated signs and symptoms are not exactly the same as those listed for, and
defining, that category. Conversely, more than one category may have the same set of signs and symptoms
ascribed to it.
Statistical reasoning is often unconsciously employed when a doctor selects a disease category with the best
chance of being correct.
experience with similar patients or communities that had received the intervention. - Reports in the literature of
clinical trials or experiments to assess the relative efficacy of different drugs and other methods of treatment. -
Objective assessment of the health worker’s previous experiences.
The design, execution and analysis of medical experiments and intervention programs must employ sound
statistical principles and methods if the findings and conclusions are to be valid.
E) Public health, health administration, and planning
The major application here is in the use of data relating to illness in the population in order to make community
diagnosis. This requires knowledge of: characteristics such as size and age structure of the population; - the
health profile of the population, in terms of disease or risk factor distribution; - influence of environmental
factors; - use of vital statistics (data on births and deaths) In health administration and planning, use is also made
of data on the distribution of all levels of health care resources (need, availability, utilization etc.)
LIMITATIONS OF STATISTICS: Statistics, with its wide applications in almost every sphere of human
activity, is not without limitations. The following are some of its important limitations:
i) Statistics is not suited to the study of qualitative phenomena: Statistics, being a science dealing with sets of numerical data, is applicable only to those subjects of inquiry which are capable of quantitative measurement. As such, qualitative phenomena like honesty, poverty, intelligence, culture, status of health, etc., which cannot be expressed numerically, are not capable of direct statistical analysis. However, statistical techniques may be applied indirectly by first reducing the qualitative expressions to precise quantitative terms.
For example, the intelligence of a group of candidates can be studied on the basis of their scores in a certain test.
ii) Statistical laws are not exact: Unlike the laws of the physical and natural sciences, statistical laws are only approximations and not exact. On the basis of statistical analysis we can talk only in terms of probability and chance, and not in terms of certainty. Statistical conclusions are not universally true; they are true only on an average. For example, consider the statement: “It has been found that 20% of certain surgical operations by a particular doctor are successful.” The statement does not imply that if the doctor operates on 5 persons on any day and four of the operations prove fatal, the fifth must be a success. The fifth patient may also die of the operation, or of the five operations on any day 2 or 3 or even more may be successful. By the statement we mean that, as the number of operations becomes larger and larger, we should expect, on the average, 20% of the operations to be successful.
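This long-run behaviour can be illustrated with a small simulation. The sketch below assumes a hypothetical 20% success probability and shows that the observed proportion only settles near 0.20 as the number of operations grows:

```python
# Simulating operations with a 20% success probability: the observed
# success proportion approaches 0.20 only as n grows large.
import random

random.seed(1)  # fixed seed so the run is reproducible
results = {}
for n in (5, 100, 10_000):
    successes = sum(random.random() < 0.20 for _ in range(n))
    results[n] = successes / n
    print(n, results[n])
```

For small n (e.g. 5 operations) the observed proportion can be far from 0.20; for large n it is close to it, which is exactly the sense in which statistical laws hold "on the average".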
iii) Statistics does not study individuals: Statistics deals with aggregates of objects and does not give any specific recognition to the individual items of a series. Individual items, taken separately, do not constitute statistical data and are meaningless for any statistical inquiry. For example, selected data of a patient have limited value in statistics unless they are compared either with previous data of the same patient or with concurrent data of other patients. Thus, statistical analysis is better suited to problems where group characteristics are to be studied.
iv) Statistics is liable to be misused: Perhaps the most important limitation of statistics is that only experts should use it. As the saying goes, “Statistical methods are the most dangerous tools in the hands of the inexpert.” The use of statistical tools by inexperienced and untrained persons may lead to very fallacious conclusions. One of the greatest shortcomings of statistics is that figures do not bear on their face the label of their quality, and as such can be moulded and manipulated in any manner to support one’s line of argument.
COLLECTION OF DATA
Introduction: The success of any statistical investigation depends upon the availability of accurate and reliable data. Collection of data is a very basic activity in decision making. The following points should be considered by an investigator before starting data collection:
(1) Purpose, (2) Scope, (3) Limitations and (4) Degree of accuracy.
Primary and Secondary data: Data used in different studies are termed either ‘primary’ or ‘secondary’, depending upon whether they were collected specifically for the study in question or for some other purpose.
Primary data: data collected under the control and direct supervision of the investigator (the investigator collects the data himself) are called primary or direct data. (Direct or primary method)
Secondary data: data not collected by the investigator but derived from other sources are called secondary or indirect data. (Indirect or secondary method)
Sources of Primary data: Survey
Sources of Secondary data: Published and Unpublished
Published sources: National and international organizations which collect statistical data and publish their findings as statistical reports.
National Organizations: Census, Sample Registration System (SRS), National Sample Survey Organizations
(NSSO), National Family Planning Association (NFPA), Ministry of health, Magazines, Journals, Institutional
reports etc.
International Organizations: World Health Organization (WHO), United Nations Organizations (UNO),
UNICEF, UNFPA, World Bank etc.
Unpublished sources: Records maintained by various Govt. and private offices, studies made by research institutes, schools etc. These data are based on internal records; they provide authentic statistical data and are much cheaper than primary data.
Questionnaire method: A written list of questions, the answers to which are recorded by the respondent. There is no one to explain the meaning of the questions to respondents, so the questions must be clear and easy to understand, and the questionnaire should be developed in an interactive style. In this method the investigator draws up a questionnaire containing all the relevant questions which he wants to ask the respondents, and the answers are recorded accordingly.
Criteria for selecting the questionnaire method:
1.The nature of investigations 2. The geographical distribution of the study population
3. The type of study population
Questions may be formulated as: Open ended and Closed ended
In open-ended questions the possible responses are not given; respondents write the answers in their own words. In closed-ended questions the possible answers are set out in the questionnaire or schedule, and the respondent or investigator ticks the category that best describes the respondent's answer.
Examples:
Open - ended questions:
What is your current age? _____ Years
How would you describe your marital status? ________
What is your average annual income? _____
In your opinion, what are the qualities of a good administrator?
1. _____________
2. ______________
3. _______________
4. ________________
Examples: Closed - ended questions:
a. Indicate your age by placing a tick mark
1. under 15 years
2. 15-29 years
3. 30-44 years
c. What is your average annual income?
1. under Rs. 10000/-
2. Rs. 10000-19999/-
3. Rs. 20000-39999/-
4. Rs. 40000/- and above
In closed-ended questions the categories are developed in advance and no change is allowed later, so the investigator should be very certain of them. In open-ended questions there is a chance to develop the categories at the time of analysis. Closed-ended questions are extremely useful for eliciting factual information, and open-ended questions for seeking opinions, attitudes and perceptions. The choice between open-ended and closed-ended questions should be made according to the purpose for which a piece of information is to be used, the type of study population, the methods of communicating the findings and the relationship.
Open-ended questions provide in-depth information if used in an interview by an experienced interviewer. In a questionnaire, open-ended questions give respondents the opportunity to express themselves freely, resulting in a greater variety of information, and they eliminate the possibility of investigator bias. In closed-ended questions the possible responses are already categorized, so they are easy to analyze; however, there is a greater possibility of investigator bias, since the researcher may list only certain response patterns, and the ease of answering from a ready-made list of responses may create a tendency among some respondents and interviewers to tick a category without thinking through the issue.
Considerations in formulating questions: always use simple and everyday language; do not use ambiguous questions; do not ask double-barrelled questions; do not ask leading questions; do not ask questions that are based on presumptions.
Mail Questionnaire:
A list of questions (a questionnaire) is prepared and mailed to the respondents, who are expected to fill in the questionnaire and send it back to the investigator. This method can easily be adopted where the field of investigation is very vast and the respondents are spread over a wide geographical area. It can be adopted only where the respondents are literate and can understand written questions and answer them.
CLASSIFICATION OF DATA
The process of arranging data in different groups according to similarities. The process of classification can be compared with the process of sorting letters in a post office.
SIGNIFICANCE
Classification is fundamental to the quantitative study of any phenomenon. It is recognized as the basis
of all scientific generalization and is therefore an essential element in statistical methodology. Uniform definitions and uniform systems of classification are prerequisites for the advancement of scientific knowledge.
WHAT IS CLASSIFICATION?
Classification is a process of arranging a huge mass of heterogeneous data into homogeneous groups to
know the salient features of the data.
WHY CLASSIFICATION?
It facilitates comparison of data within and between classes. It renders the data more reliable, because homogeneous figures are separated from heterogeneous figures. It helps in proper analysis and interpretation of the data.
Objectives
1. To condense the mass of data in such a way that salient features can be readily noticed. 2. To compare two variables. 3. To prepare data that can be presented in tabular form. 4. To highlight the significant features of the data at a glance. 5. To reveal patterns. 6. To give prominence to important figures. 7. To enable analysis of the data. 8. To help in drafting a report.
CLASSIFICATION OF DATA
Common types of classifications are: 1. Geographical i.e. according to area or region 2. Chronological i.e.
according to occurrence of an event in time 3. Quantitative i.e. according to magnitude 4. Qualitative i.e.
according to attributes
1. Geographical:
In this type of classification, data are classified according to area or region, for example the state-wise distribution of sex ratio in India. The listing of individual entries is generally done in alphabetical order, or according to size to emphasize the importance of a particular area or region.
2. Chronological: When the data is classified according to the time of its occurrence, it is known as chronological
classification.
For example: Distribution of deaths over the last five years
-------------------------------------------------------
Year No. of deaths
-------------------------------------------------------
2001 241
2002 348
2003 412
2004 548
2005 698
3. Quantitative data: When the data are classified according to some characteristic that can be measured; continuous data can take all values of the variable.
Definition: any statistical data described both by measurement and by counting are called quantitative data. For example: height, weight, pulse rate, BP, BSL, age, RR, income, etc.
4. Qualitative data: When the data are classified according to some attributes (distinct categories) which are not capable of measurement. An attribute is divided into two classes, one possessing the attribute and the other not possessing it.
Definition: any statistical data described only by counting, not by measurement, are called qualitative data. For example: sex, blood group, births, deaths, number of patients suffering from a disease, socio-economic classification such as lower, middle and upper, numbers vaccinated and not vaccinated, etc.
-Technical terms for quantitative classification:
a. Variable: a quantity which changes its value is called a variable, e.g. age, height, weight, etc. Continuous variables: age, height, weight, etc. Discrete variables: population of a city, production of a machine, spare parts, etc.
b. Class limits: the lowest and highest values of a class are called its class limits.
c. Open-ended and closed-ended classes:

Open ended (Exclusive method)    Closed ended (Inclusive method)
0 - 10                           0 - 9
10 - 20                          10 - 19
20 - 30                          20 - 29
30 - 40                          30 - 39
40 - 50                          40 - 49
50 - 60                          50 - 59
d. Class frequency: the number of items belonging to the same class
e. Class magnitude or class interval: the length of class i.e. the difference between the upper limit and lower
limit of the class.
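Under the exclusive method shown above, a value equal to an upper class limit belongs to the next class. A small sketch of this rule in Python (class width of 10 assumed):

```python
# Assign a value to its exclusive-method class of width 10:
# classes are 0-10, 10-20, ... and a value of 10 falls in
# the class 10-20, not 0-10.
def exclusive_class(value, width=10):
    lower = (value // width) * width
    return f"{lower}-{lower + width}"

print(exclusive_class(10))  # 10-20
print(exclusive_class(39))  # 30-40
```

Under the inclusive method (0-9, 10-19, ...) the upper limit would instead belong to its own class.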
Frequency distribution: The way in which the items are spread out or distributed into the various classes is called the frequency distribution.
Two types of frequency distributions: Continuous and discrete
Formation of discrete frequency distribution table: By tally bars (tally marks), count how many times a particular value is repeated; this number is the frequency of that value.
Example: given below are the marks obtained by 20 students in a term-ending theory examination. Form the discrete frequency distribution.
10, 15, 10, 10, 15, 20, 15, 15, 20, 10, 20, 15, 20, 25, 15, 15, 15, 20, 15, 25

Marks   Tally bars   Frequency
10      ||||             4
15      |||| ||||        9
20      ||||             5
25      ||               2
Total                   20
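The tally count above can be reproduced directly with Python's `collections.Counter`:

```python
# Discrete frequency distribution of the 20 marks by direct counting.
from collections import Counter

marks = [10, 15, 10, 10, 15, 20, 15, 15, 20, 10,
         20, 15, 20, 25, 15, 15, 15, 20, 15, 25]
freq = Counter(marks)
for value in sorted(freq):
    print(value, freq[value])   # 10 4, 15 9, 20 5, 25 2
print("Total", sum(freq.values()))
```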
Formation of continuous frequency distribution table: By counting how many values fall into each class.
Example: given below are the weights in kg of 25 students. Form the continuous frequency distribution.
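The same counting idea extends to continuous data. Since the original list of 25 weights is not reproduced here, the sketch below uses ten made-up weights and exclusive classes of width 10:

```python
# Continuous frequency distribution: group hypothetical weights (kg)
# into exclusive classes 40-50, 50-60, 60-70.
from collections import Counter

weights = [42, 55, 48, 61, 53, 47, 58, 66, 50, 44]
freq = Counter((w // 10) * 10 for w in weights)
for lower in sorted(freq):
    print(f"{lower}-{lower + 10}: {freq[lower]}")
```

Each value is reduced to the lower limit of its class before counting, so a weight of exactly 50 kg falls in the class 50-60.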
PRESENTATION OF DATA
Objectives:
a) To arrange the data in such a way as to arouse the interest of a reader.
b) To make the data sufficiently concise without losing important details.
c) To present data in a simple form that enables the reader to form quick impressions and to draw conclusions, directly or indirectly.
d) To facilitate further statistical analysis.
Types of presentation:
i) Ordered array, ii) Tabulation, and iii) Drawings
Ordered array: When the data are simple and few, they can be presented by arranging them in an orderly manner. The order may be ascending or descending in magnitude if the data are quantitative; it may be alphabetical, or follow any other acceptable norm, if the data are qualitative.
Table method (Tabulation) A table is a systematic arrangement of statistical data in columns and rows. The
purpose of a table is to simplify the presentation and to facilitate comparison.
Role of tabulation: The significance of tabulation will be clear from the following points: It simplifies complex
data, It facilitates comparison, It gives identity to the data
PARTS OF A TABLE
(1) Table Number (2) Title of a table (3) Head Note (4) Caption (5) Stub (6) Body of a table (7) Foot note and
(8) Source note
Example:
Age in Years   Males   Females   Total
50-60            27      13        40
60-70            23       7        30
70-80            18       2        20
>80              10       0        10
Total            90      30       120
Cross tabulation (Two-way table): A frequency table involving at least two variables that have been
cross classified. This table furnishes information about two interrelated characteristics for a particular
phenomenon.
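A two-way table can be built by counting pairs of values. A minimal sketch with hypothetical patient records (sex cross-classified with treatment outcome; the records are made up for illustration):

```python
# Cross tabulation (two-way table) of sex against outcome,
# using made-up records.
from collections import Counter

records = [("M", "cured"), ("F", "cured"), ("M", "not cured"),
           ("M", "cured"), ("F", "not cured"), ("F", "cured")]
table = Counter(records)
for sex in ("M", "F"):
    counts = {out: table[(sex, out)] for out in ("cured", "not cured")}
    print(sex, counts)
```

Each cell of the two-way table is simply the count of one (sex, outcome) combination.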
1. The advantage, as well as the drawback, of graphs is that they give a quick impression. Graphs facilitate the understanding and comparison of strengths, correlations and trends.
2. Interpretation is done by rough translation of the points into actual figures. A change in the scale will
give a different pattern.
3. Graphs should be adjuncts to respective tables and not their substitutes.
4. While comparing, note the difference in scales.
5. They must be drawn following certain basic rules, which depend partly on convention, partly on mathematical considerations and partly on personal preference.
Types of Drawings:
For Quantitative Data: The following graphs are normally drawn for quantitative data.
a) Histogram, b) Frequency polygon, c) Frequency curve, d) Cumulative frequency curve (ogive), e) Line chart, and f) Scatter diagram
For Qualitative Data: The following diagrams are normally drawn for qualitative data.
(a) Bar diagram (simple, multiple & proportional), (b) Pie diagram, (c) Pictogram (picture diagram), (d) Contour map
Other types: Age-sex pyramid, Epidemic curve etc.
* Bar diagram: This is the most commonly used device for presenting categorical data. A bar diagram consists of a group of equidistant rectangles, one for each group of the data, in which the values are represented by the length or height of the rectangles. The bars should have uniform width and a common base line, and may be drawn vertically or horizontally.
Types of Bar diagrams:
I) Simple bar diagram: It is the simplest and most frequently used diagram for the comparison of two or more
items or values of a single variable or category of data. For example: The data relating to births of a region
during 91 - 95.
II) Multiple bar diagram: If two or more sets of inter-related variables are to be presented graphically, multiple
bar diagrams are used. For example: The data relating to births and deaths of a region during 91 - 95.
III) Proportionate (Component) bar diagram: Proportionate bar diagrams are used if the total magnitude of the given variable is to be divided into various parts or components. First a bar representing the total is drawn; it is then divided into various segments, each segment representing a given component of the total. A key index is given along with the diagram to explain the segments. Thus it is useful not only for presenting the several components of the variable but also enables us to make a comparative study.
IV) Pie diagram: A circle divided into various sectors or segments, representing certain proportions or percentages of the total, is known as a pie diagram. For example, with the help of a pie diagram we can exhibit information relating to the causes of death in children in a community (diarrhoea & enteritis, prematurity & atrophy, bronchitis & pneumonia). While laying out the sectors, it is common practice to begin with the largest component sector at the 12 o'clock position on the circle. The other component sectors are placed in clockwise succession in descending order of magnitude.
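Constructing the sectors of a pie diagram is simple arithmetic: each component's angle is its share of the total times 360 degrees. A sketch using hypothetical counts of child deaths by cause:

```python
# Pie diagram sector angles: (component / total) * 360 degrees,
# listed in descending order of magnitude (hypothetical counts).
deaths = {"Diarrhoea & Enteritis": 90,
          "Bronchitis & Pneumonia": 60,
          "Prematurity & Atrophy": 30}
total = sum(deaths.values())
for cause, n in sorted(deaths.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause}: {n / total * 360:.0f} degrees")
```

The angles always sum to 360 degrees, and sorting in descending order reproduces the clockwise layout convention described above.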
V) Pictogram: A pictogram is a technique of presenting statistical data through appropriate pictures. It is popularly used when the facts are to be presented to laymen and less educated audiences.
Other charts:
Epidemic curve: This gives the chronological distribution of the number of cases of a disease i.e., the
distribution in time.
Population Pyramid: Two histograms showing age distribution of a population separately for the males and
females, are put base-to-base.
quantitative data. As social sciences advance from qualitative stages, statistical methods will gain wide currency.
The broad aspects of data analysis are as follows:
Data Editing
Data classification or establishment of categories
Data coding
Data tabulation
Statistical analysis of data
Drawing inferences.
The discussion on statistical analysis of data is purposefully kept simple and preliminary, as it is considered quite essential for any student of research methodology to be well versed in these fundamental concepts. For an advanced treatment of any topic, students may refer to a specialized text on statistics.
DATA EDITING
This is the first step in processing the data gathered through the data collection instrument – the questionnaire. Editing refers to the process of examining the data for any obvious errors, omissions, inconsistencies and illegible recording, with the aim of rectifying them at an early stage. It calls for a careful scrutiny of the questionnaires to assess completeness, accuracy and uniformity. The quality of editing influences the convenience and speed of the later stages of data analysis. Data can be edited at two stages – field editing and central editing.
Field editing: Very often the interviewer records the responses, or his observations, during the course of administering the questionnaire in abbreviated form, or at times as an illegible scribble. It is therefore prudent that after each interview is over he should review the questionnaire to complete abbreviated or shorthand responses, rewrite illegible scribbles and correct any omissions. This type of editing is essential, as it is often difficult for the staff undertaking central editing to decipher every field investigator's notes exactly. It is preferable that the interviewer does this editing on the same day, or the next day at the most, so that the information is fresh and he can recall it with ease. If required, the interviewer may again contact the respondent and ask for another appointment. The interviewer should be clearly briefed not to engage in any guesswork.
Central editing: At times, questionnaires from all places of fieldwork are mailed to the central office in batches, or after the entire fieldwork is completed. If the number of questionnaires (the sample size) is small, a single qualified editor can do the task of editing; otherwise a team of persons may be employed. Inappropriate answers are struck off, answers recorded in wrong units are rectified, non-responses are segregated, etc. Such editing is generally done in red ink, to help distinguish between the recorded information and the editor's remarks. The persons doing the editing work must be conversant with the research study, the questionnaire, the interviewer instructions, the coding pattern to be followed, etc. The editors are usually instructed to put their signatures and dates at appropriate places. In any case, no original data must be erased. At the end of the editing process, any questionnaires that do not meet the criteria of completeness, accuracy and consistency are discarded.
DATA CODING
Coding refers to the process of assigning numerals or other symbols to the answers or responses. A coding
scheme or coding frame needs to be designed for every question, such that the responses fall in specific
categories. The categories, as said earlier, must be mutually exclusive and collectively exhaustive. Care is taken
to define every class along a single dimension or concept. Codification and classification are largely intertwined.
It helps the further process of data tabulation. Many times the questionnaire is pre-coded i.e. the responses are
already put into specific categories. The respondent himself may be asked to assign appropriate codes to his
responses. Coding can be done by the interviewer during the course of interview itself. This is usually the case
for dichotomous and multiple-choice questions. In case of open-ended questions, usually the coding is done by
an experienced coder, at the central office. An experienced researcher will give adequate thought to the various
aspects of coding, right at the time of designing the questionnaire. Proper coding helps tabulation and computer
entry of data.
Coding errors should always be kept to a minimum, if not completely eliminated. Training of the interviewers and coders helps in reducing inaccuracies. The rules of coding must be explained to the coders with appropriate examples, and they may also be given dummy data for practice. Any revisions, if required, are carried out in the
codes. As always, responses to open-ended questions or qualitative data are hard to code. It is important that a
given type of answer is assigned to a given category, appropriately and consistently.
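A coding frame as described above can be sketched as a simple mapping, with categories that are mutually exclusive and collectively exhaustive. The scheme, category names and code numbers below are hypothetical:

```python
# A hypothetical coding frame for marital status; unrecognized
# answers fall into the residual "other" category (code 9).
MARITAL_CODES = {"single": 1, "married": 2, "widowed": 3,
                 "divorced": 4, "other": 9}

def code_response(answer):
    # Normalize the raw answer before looking up its code,
    # so "Married" and " married " receive the same numeral.
    return MARITAL_CODES.get(answer.strip().lower(), MARITAL_CODES["other"])

print(code_response("Married"))   # 2
print(code_response("unknown"))   # 9
```

The residual category guarantees the frame stays collectively exhaustive, so every answer, however unexpected, is assigned some code consistently.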
DATA TABULATION
Tabulation refers to summarizing the assembled heap of data in the form of a matrix in a concise and logical
fashion. It reduces data to a compact form. Tabulation builds on the earlier processes of data editing,
categorization and coding. If the researcher is familiar with his study and its various dimensions, it is possible
for the researcher to draw tentative tabulations at the point of designing the questionnaire. He may use dummy
data for this purpose or may use secondary data from earlier studies. Also, pre-testing the questionnaire by
means of a pilot study presents some hints regarding the kind of appropriate tabulation required. Tabulation may
be done manually or by mechanical methods. These days, computerized tabulation is the norm. Manual or hand tabulation is practical when the length of the questionnaire (i.e. the number of questions) and the sample size are small. For large-scale commercial surveys, computerized tabulation is preferred.
Tabulation can be one-dimensional or bi-dimensional, i.e. simple or complex. Simple tabulation is uni-dimensional, i.e. it gives information about one or more groups of independent questions. Complex tabulation is bi-dimensional and gives information about one or more inter-related questions.
It is a good practice to give a short, clear and relevant title to every table. The table is given a unique number and
the table number and its title being placed above the table. These two help in easy and specific identification of
the table. The row and column headings are bold type faced and short and concise, along with their appropriate
units of measure. The source of data is acknowledged on the same page by way of a footnote. Gridlines make the
table more presentable and legible. Columns are numbered to facilitate quick and easy reference. Comparative
data is placed in adjacent columns (or rows). One category of data may be separated from the other by means of
thick lines or by leaving a column/row blank. Negative figures may have a 'minus' sign prefixed to them or may
be mentioned in parenthesis. Presentation of data should be logical and orderly, flowing from the more
significant one to the lesser one. The prime objective of tabulation must be to facilitate easy, fast and accurate processing and analysis.
Descriptive analysis describes the variation dimensions of the object under study. It gives a vivid picture of the
subject matter as regards size, composition, preferences, attitudes, etc. It may use one or more dimensions of
analysis. If the analysis is based on one variable it is called univariate analysis. Bivariate analysis deals with two variables and multivariate analysis with more than two variables. Multivariate analysis includes multiple regression analysis, multiple discriminant analysis, canonical analysis, multivariate analysis of variance and factor analysis. Inferential analysis is concerned with drawing inferences and conclusions from the
gathered mass of data. Inferential analysis consists of two areas – statistical estimation and hypothesis testing.
Statistical estimation involves estimating the population parameters from the sample statistics and is an inherent
aspect of any sample survey. This forms the subject matter of an independent chapter. Hypothesis testing refers
to the application of statistical techniques to accept or reject the proposed hypothesis at specific levels of
significance under assumed population parameters. This is discussed in detail in a later chapter.
MEASURES OF CENTRAL TENDENCY (Centering Constants)
Objectives:
(1) To find out one representative value. (2) To locate and summarize the entire set of varying values.
(3) To make decision concerning the entire set. (4) To compare different distributions.
Significance: Condensing the mass of data into one single value enables us to get an idea of the entire data. It also enables us to compare two or more sets of data. A measure of central tendency represents the whole distribution by a single, unique value.
Characteristics of good measure of central tendency:
It should be easy to understand. It should be simple to calculate. It should be based on all observations. It should
be uniquely defined. It should be capable of further algebraic treatment. It should not be unduly affected by
extreme values.
Important measures of central tendency or centering constants which are commonly used in medical science, are:
1. Mean (Average)
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean
1. Mean:
The ratio of the sum of all the values to the total number of observations in a series of data is called the Mean or Average.
General Formula:
If X1,X2,X3,X4,………..XN be the ‘N’ number of observations in a data then,
Mean = (X1 + X2 + X3 + X4 + ……… + XN) / N, i.e. Mean = X̄ (X bar) = ΣX / N
Merits of mean: It is simplest to understand and easy to compute. It is affected by the value of every item in
the series. It is the centre of gravity, balancing the values on either side of it. It is a calculated value, not based on position in the series.
2. MEDIAN:
The centre-most value in a series of data is called the Median. The median is the 50th percentile value, below which 50% of the values in the sample fall. It divides the whole distribution into two equal parts.
General formula:
Median = Size of the ((N+1)/2)th observation in a series of data arranged in ascending or descending order.
Merits of median: It is based only on the position of the values in the series, not on their magnitudes. Extreme values do not affect the median as strongly as the mean. It is the most appropriate average when dealing with qualitative data. The value of the median can be determined graphically, whereas the value of the mean cannot.
3. Mode The most commonly or frequently occurring observation in a series of data is called as Mode.
Relationship between Mean, Median & Mode: Mode = 3 Median – 2 Mean
Relation in their size: in a negatively skewed distribution, Mean < Median < Mode (the order reverses in a positively skewed distribution, and all three coincide in a symmetrical one).
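As a quick numerical check, the three averages can be computed with Python's standard statistics module; the ten DBP readings below are the same illustrative values used in the SD example later in this chapter.

```python
from statistics import mean, median, mode

# DBP (mm Hg) for 10 patients -- illustrative values
dbp = [70, 80, 94, 70, 58, 66, 78, 67, 82, 60]

print(mean(dbp))    # 72.5
print(median(dbp))  # average of the two middle values of the sorted list: 70.0
print(mode(dbp))    # the most frequent reading: 70
```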
MEASURES OF CENTRAL TENDENCY FOR CONTINUOUS FREQUENCY DISTRIBUTION
In case of continuous frequency distribution following formula can be used:
Mean = X̄ = A + (Σfd′ / N) × C
Where, A = assumed mean, d′ = (m − A) / C, m = mid points of the classes, C = size of the equal class interval
For example: Following data shows the distribution of Pulse rate / min for 210 cases. Find out Mean pulse
rate /min.
Pulse rate (X) No. of cases (f)
-------------------------------------------------------------------------------
65-70 15
70-75 41
75-80 58
80-85 47
85-90 32
90-95 17
Solution:
Pulse Rate (X) No. of Cases (f) Mid Points (m) d = m−A d' = (m−A)/C fd'
65-70 15 67.5 -10 -2 -30
70-75 41 72.5 -5 -1 -41
75-80 58 77.5=A 0 0 0
80-85 47 82.5 +5 +1 47
85-90 32 87.5 +10 +2 64
90-95 17 92.5 +15 +3 51
Total N=∑f=210 ∑fd'=91
Mean = X̄ = A + (Σfd′ / N) × C = 77.5 + (91 / 210) × 5 = 77.5 + 0.4333 × 5 = 77.5 + 2.167
Thus, Mean Pulse rate = 79.67 / minute
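The step-deviation calculation can be verified with a short Python sketch (class mid-points and frequencies taken from the table; A = 77.5 and C = 5 as assumed there; any difference from the worked figure is rounding only):

```python
mids  = [67.5, 72.5, 77.5, 82.5, 87.5, 92.5]   # class mid points (m)
freqs = [15, 41, 58, 47, 32, 17]               # number of cases (f)
A, C = 77.5, 5                                 # assumed mean and class interval
N = sum(freqs)                                 # 210
sum_fd = sum(f * (m - A) / C for f, m in zip(freqs, mids))  # Σfd' = 91
mean_pulse = A + sum_fd / N * C
print(round(mean_pulse, 2))  # 79.67
```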
MEDIAN
Me = L1 + {(N/2 − c.f.) / f} × C
Where, L1 = lower limit of the median class; the median class is the class whose cumulative frequency is the first to exceed N/2; c.f. = cumulative frequency of the class preceding the median class; N = Σf; f = frequency of the median class; and C = class interval.
For example: Following data shows the distribution of Weight in Kgs of 210 TB patients in a hospital. Find out the Median weight.
Solution:
Weight (X) No. of Pts (f) Cumulative freq (c.f.)
60-70 15 15
70-75 41 56
75-80 58 114
80-85 47 161
85-90 32 193
90-95 17 210
Total N=∑f=210
Me = L1 + {(N/2 − c.f.) / f} × C
Now, N/2 = 210/2 = 105. The c.f. just greater than 105 is 114, so the median class = 75-80.
Then, Me = 75 + {(105 − 56) / 58} × 5 = 75 + (49/58) × 5 = 75 + 4.22 = 79.22 Kg
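The same median-class logic can be expressed in a few lines of Python (class limits and frequencies from the table above):

```python
# (lower limit, upper limit, frequency) for each weight class
classes = [(60, 70, 15), (70, 75, 41), (75, 80, 58),
           (80, 85, 47), (85, 90, 32), (90, 95, 17)]
N = sum(f for _, _, f in classes)   # 210
half, cf = N / 2, 0                 # N/2 = 105
for lo, hi, f in classes:
    if cf + f >= half:              # first class whose cumulative frequency reaches N/2
        median_wt = lo + (half - cf) / f * (hi - lo)
        break
    cf += f
print(round(median_wt, 2))  # 79.22
```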
MODE
Mode = L1 + {(f1 − f0) / (2f1 − f0 − f2)} × C
Where, L1 = lower limit of the modal class, Modal class= Class having highest frequency f0= frequency of the
class preceding the modal class, and f1 = frequency of the modal class and f2 = frequency of the class succeeding
the modal class. And C = class interval
For example: Following data shows the distribution of Hb (gm%) of 400 anaemia cases in a village. Find out the Mode.
Hb 5.0-5.5 5.5-6.0 6.0-6.5 6.5-7.0 7.0-7.5 7.5-8.0
No. of cases 48 89 102 80 57 24
Mode = L1 + {(f1 − f0) / (2f1 − f0 − f2)} × C. The modal class is the class with the highest frequency (102), i.e. 6.0-6.5. Then, Mode = 6.0 + {(102 − 89) / (2 × 102 − 89 − 80)} × 0.5 = 6.0 + (13/35) × 0.5 = 6.0 + 0.186 = 6.19 gm%
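A sketch of the same mode formula, assuming the modal class is an interior class (so that both neighbouring frequencies exist):

```python
freqs  = [48, 89, 102, 80, 57, 24]         # cases per Hb class
lowers = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]    # lower class limits
C = 0.5                                    # class interval
i = freqs.index(max(freqs))                # modal class index (the 6.0-6.5 class)
f1, f0, f2 = freqs[i], freqs[i - 1], freqs[i + 1]
mode_hb = lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * C
print(round(mode_hb, 2))  # 6.19
```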
MEASURES OF VARIATION (DISPERSION)
Objective: The main objective of the measures of variation / variability / dispersion is to describe how the individual observations are dispersed around the mean.
Significance:
It determines the reliability of an average. It helps to determine the nature and cause of variation in order to control the variability. It permits comparison of two or more distributions with regard to their variability. It is of great importance in advanced statistical analysis, and it quantifies the variation in a distribution.
1. Range: This is a crude measure of variation since it uses only two extreme values.
Definition: It is defined as the difference between highest and lowest value in a set of data.
Symbolically, Range can be given as Range = X max. - X min Range is useful in quality control of drug,
maximum and minimum temperature in a case of enteric fever etc.
2. Interquartile Range: The difference between the third and first quartiles. Symbolically, Q = Q3 − Q1, where Q1 = first quartile and Q3 = third quartile. The interquartile range is superior to the range as it is not based on the two extreme values but on the middle 50% of the observations.
3. Mean Deviation:
The ratio of the sum of the absolute deviations of the individual observations from the mean to the number of observations, i.e. Mean Deviation = Σ|X − X̄| / N. Although the mean deviation is a good measure of variability, its use is limited. It is used to measure and compare variability among several sets of data.
4. Standard Deviation (SD):
Definition: SD is the root mean square deviation, i.e. it is the square root of the mean of the squared deviations of the individual observations from the mean. It is generally denoted by σ (sigma). The greater/smaller the value of the SD, the greater/smaller the variation among the data.
6. Take the square root of the result of step 5. Thus, SD = √{Σ(X − X̄)² / (N − 1)}
If the data represent a small sample of size N from a population, then it can be shown that the sum of the squared differences should be divided by (N − 1) instead of by N. However, for large sample sizes there is very little difference between using N − 1 and N in computing the SD. The SD is directly proportional to the variation in the data, i.e. if the value of the SD is more/less, the variation is more/less. A larger number of observations gives a more stable estimate of the SD, so it is better that the investigator takes a sufficiently large number of observations in any research study.
Example: Find out value of SD for the following data showing DBP (mm of Hg) for 10 NIDDM patients:
70, 80, 94, 70, 58, 66, 78, 67, 82, 60.
SN X (X-X) (X-X)²
-----------------------------------------------------------------------------------------------------------------------------------
1. 70 - 2.5 6.25
2. 80 + 7.5 56.25
3. 94 + 21.5 462.25
4. 70 - 2.5 6.25
5. 58 - 14.5 210.25
6. 66 - 6.5 42.25
7. 78 + 5.5 30.25
8. 67 -5.5 30.25
9. 82 +9.5 90.25
10. 60 -12.5 156.25
----------------------------------------------------------------------------------
Σ X = 725 Σ(X-X)² = 1090.5
X = Σ X / N =725/10 = 72.5
SD = √{Σ(X − X̄)² / (N − 1)} = √(1090.5 / 9) = √121.17 = 11.01
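The same sample SD (with the N − 1 divisor) is what Python's statistics.stdev computes:

```python
from statistics import mean, stdev

# DBP (mm Hg) for the 10 NIDDM patients in the worked example
dbp = [70, 80, 94, 70, 58, 66, 78, 67, 82, 60]
print(mean(dbp))             # 72.5
print(round(stdev(dbp), 2))  # 11.01 (stdev divides by N-1, as above)
```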
Coefficient of Variation (CV):
Definition: It is the ratio of the standard deviation (SD) to the mean, expressed as a percentage (%).
i.e. CV = SD / Mean x 100
Uses of CV: 1. It is applied to know the variation in the data. 2. To find out consistency and reliability of data. 3.
To compare two different variables.
Example: In a distribution mean weight is 76.4 kg with a SD of 7.7 and Mean DBP is 98.8 mm of Hg with SD as
10.5. Which variable is more consistent?
Solution: CV for weight = SD/Mean × 100 = 7.7/76.4 × 100 = 10.08%. CV for DBP = SD/Mean × 100 = 10.5/98.8 × 100 = 10.63%. Since the CV for DBP is greater than the CV for weight (10.63% > 10.08%), weight shows less variation than DBP. Thus, weight is a more consistent variable than DBP.
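The comparison can be scripted directly from the definition CV = SD/Mean × 100:

```python
def cv(sd, mean):
    """Coefficient of variation, as a percentage."""
    return sd / mean * 100

cv_wt  = cv(7.7, 76.4)    # weight
cv_dbp = cv(10.5, 98.8)   # DBP
print(round(cv_wt, 2), round(cv_dbp, 2))       # 10.08 10.63
# the variable with the smaller CV is the more consistent one
print("weight" if cv_wt < cv_dbp else "DBP")   # weight
```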
Example: Find the SD and C.V. for the earlier pulse-rate distribution of 210 cases.
Solution:
Pulse Rate (X) No. of Cases (f) Mid Point (m) d = m−A d' = (m−A)/C fd' fd'²
65-70 15 67.5 -10 -2 -30 60
70-75 41 72.5 -5 -1 -41 41
75-80 58 77.5=A 0 0 0 0
80-85 47 82.5 +5 +1 47 47
85-90 32 87.5 +10 +2 64 128
90-95 17 92.5 +15 +3 51 153
Total N = ∑f = 210, ∑fd' = 91, ∑fd'² = 429
Mean = X̄ = A + (Σfd′ / N) × C = 77.5 + (91/210) × 5 = 77.5 + 2.167, i.e. Mean pulse rate = 79.67 / minute
S.D. = √{[Σfd′² − (Σfd′)²/N] / (N − 1)} × C = √{[429 − (91)²/210] / 209} × 5 = √{(429 − 8281/210) / 209} × 5
= √{(429 − 39.43) / 209} × 5 = √(389.57 / 209) × 5 = √1.8640 × 5 = 1.365 × 5. Thus, SD of pulse rate = 6.83 / minute
Coefficient of Variation = SD / Mean × 100 = 6.83 / 79.67 × 100, C.V. = 8.57%
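The whole step-deviation computation of the mean, SD and CV can be checked in one short script (table values as above; any difference from the worked figures is rounding only):

```python
from math import sqrt

mids  = [67.5, 72.5, 77.5, 82.5, 87.5, 92.5]
freqs = [15, 41, 58, 47, 32, 17]
A, C = 77.5, 5
N = sum(freqs)
d   = [(m - A) / C for m in mids]                # step deviations d'
fd  = sum(f * x for f, x in zip(freqs, d))       # Σfd'  = 91
fd2 = sum(f * x * x for f, x in zip(freqs, d))   # Σfd'² = 429
mean_pr = A + fd / N * C
sd_pr = sqrt((fd2 - fd ** 2 / N) / (N - 1)) * C
print(round(mean_pr, 2), round(sd_pr, 2))        # 79.67 6.83
print(round(sd_pr / mean_pr * 100, 2))           # CV: 8.57
```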
Normal Curve:
If a histogram (area diagram) of such a normally distributed variable is constructed and smoothed, the resulting curve is called the Normal curve.
Characteristics of Normal Curve:
(1) It is bell shaped. (2) It is symmetrical around the mean. (3) Mean, Median and Mode coincide. (4) It has two points of inflection. (5) The area under the curve is always equal to one. (6) It never touches the base line.
[Figure: the Normal Curve]
SAMPLING
Introduction: Population: the complete set of individuals under study (census or complete enumeration). Sample: a subset, part or group of a population. The process of inferring something about a large group of elements by studying only a part of it is called Sampling. Sampling refers to selecting a part of the population so that some inference about the population can be made by studying the sample. Sampling is most frequently used in surveys; the purpose of a sample survey is to obtain information about the population by selecting and observing a limited number of units.
Sampling and Representativeness
Developing a Sampling Plan, Define the Population of Interest, Identify a Sampling Frame (if possible)
Select a Sampling Method, Determine Sample Size, Execute the Sampling Plan
The population of interest is entirely dependent on the Management Problem, Research Problem, and Research Design. Some bases for defining a population: geographical area, demographic profile, lifestyle, and awareness.
Sampling unit : Subject under observation on which information is collected Example: Children <5 years,
hospital discharges, health events, etc.
Sampling fraction: Ratio between the sample size and the population size Example: 100 out of 2000 (5%)
Sampling frame: Any list of all the sampling units in the population, e.g. a list of households, health care units, etc.
Sampling scheme: Method of selecting sampling units from the sampling frame, e.g. randomly, convenience sample, etc.
OBJECTIVES
There are two important objectives of the sampling which are :
1. To estimate the population “Parameter” from sample “Statistic” and 2. Testing of hypothesis
Parameter: any characteristic or value obtained from a population is referred to as a Parameter. For example: population mean, median, mode, proportion, etc.
Statistic: any characteristic or value obtained from a sample is referred to as a Statistic. For example: sample mean, median, mode, proportion, etc.
Why Sampling? The following are various reasons which make sampling desirable:
1. Time taken for the study: Results from a sample can be obtained much faster than from a complete enumeration.
2. Cost involved in the study: Sampling also brings substantial cost reductions compared with studying the whole population.
3. Physical impossibility of complete enumeration: In many situations the elements being studied get destroyed while being tested.
4. Practical infeasibility of complete enumeration: Quite often it is practically infeasible to do a complete enumeration due to many practical difficulties.
5. Adequate reliability of inference based on sampling: In many cases sampling provides adequate information, so that little additional reliability would be gained with complete enumeration in spite of the additional money and time.
6. Quality of data collected: For a large population, complete enumeration also suffers from the possibility of unreliable data collected by the investigator.
Types of Sampling
There are two basic types depending on who or what is allowed for selection of the sample which are:
(1) Probability sampling (2) Non-probability sampling
1. Probability Sampling: In this type the decision whether a particular element is included in the sample or not is made by chance alone. Each element in the population has some known, non-zero probability of being included in the sample. It can be time consuming, but it is possible to quantify the magnitude of the likely error in the inferences made.
2. Non-Probability Sampling: This method does not ensure a non-zero probability of inclusion for each element of the population. Samples may be picked based on the judgment or convenience of the investigator, which can introduce biases into the study. Such a sampling design belongs to the non-probability category.
Probability Sampling Methods:
1. Simple random sampling (2). Systematic random sampling (3) Stratified random sampling (4) Multi-stage
random sampling (5) Cluster random sampling (6) Multiphase random sampling
1. Simple random sampling
Principle: This is a process which ensures that each of the sample size ‘n’ has an equal probability of being
selected as the chosen sample.
Procedure: Randomly draw units. This method requires a list of all members of the population; a table of random numbers may also be used to select the sample.
Advantages: Simple, and the sampling error is easily measured.
Disadvantages: Needs a complete list of units, does not always achieve the best representativeness, and units may be scattered.
Example: To evaluate the prevalence of tooth decay among the 1200 children attending a school: list the children attending the school, number them from 1 to 1200, fix the sample size at 100 children, and randomly sample 100 numbers between 1 and 1200.
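The tooth-decay example can be simulated with the standard library's random.sample, which draws without replacement so every child has an equal chance of selection (the seed is fixed only to make the sketch reproducible):

```python
import random

random.seed(42)                         # reproducible draw for illustration
children = range(1, 1201)               # roll numbers of the 1200 children
sample = random.sample(children, 100)   # simple random sample of 100
print(len(sample), len(set(sample)))    # 100 distinct children
print(min(sample) >= 1 and max(sample) <= 1200)  # True
```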
2. Systematic random sampling: A list of the population must be available. Units are selected in a systematic way, i.e. every 5th, 7th, 10th or 15th house forms the sample.
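A systematic sample can be sketched the same way: pick a random start within the first interval, then take every k-th unit, where k = N/n:

```python
import random

random.seed(7)                    # reproducible start for illustration
N, n = 1200, 100                  # population and sample sizes (hypothetical)
k = N // n                        # sampling interval = 12
start = random.randint(1, k)      # random start within the first interval
sample = list(range(start, N + 1, k))
print(k, len(sample))             # 12 100
```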
3. Stratified random sampling: The population is divided into homogeneous strata (groups) in proportion to the size of the population; simple random sampling is then applied within each stratum.
Advantages: More precise if variable associated with strata, all subgroups represented, allowing separate
conclusions about each of them
Disadvantages: Sampling error difficult to measure, Loss of precision if very small numbers
Example: Determine vaccination coverage in a country: One sample drawn in each region Estimates calculated
for each stratum Each stratum weighted to obtain estimate for country (average)
4. Multi-stage random sampling: The sample is selected in multiple stages, e.g. state – districts – blocks – villages, etc.
Example: sampling unit = household; 1st stage: drawing areas or blocks; 2nd stage: drawing buildings or houses; 3rd stage: drawing households.
5. Cluster random sampling
A list of population must be available, Cumulative population can be determined, Useful in evaluation of
Universal Immunization Programme (UIP)
Principle: Random sample of groups (“clusters”) of units, In selected clusters, all units or proportion (sample) of
units included
Advantages: Simple as complete list of sampling units within population not required, Less travel/resources
required
Disadvantages : Imprecise if clusters homogeneous and therefore sample variation greater than population
variation (large design effect), Sampling error difficult to measure
6. Multiphase random sampling: Can be used in hospital based set up studies to diagnose a disease.
Example: In case of TB: Sputum test +ve / -ve, Chest X-ray +ve / -ve etc.
Place of sampling in descriptive surveys
(1). Define objectives (2).Define resources available (3).Identify study population (4). Identify variables to
study (5). Define precision required (6).Establish plan of analysis (questionnaire) (7). Create sampling frame
(8).Select sample (9). Pilot data collection (10). Collect data (11). Analyse data (12). Communicate results
(13). Use results
Conclusion: Probability samples are the best
SAMPLE SIZE
Example: If the hookworm prevalence rate was 30%, taking the allowable error L as 10% of p (L = 3), the sample size will be:
n = 4 × p × q / L²
n = 4 × 30 × 70 / (3 × 3)
n = 8400 / 9
n = 933
If the hookworm prevalence rate was 16% (L = 1.6), the sample size will be:
n = 4 × p × q / L²
n = 4 × 16 × 84 / (1.6 × 1.6)
n = 5376 / 2.56
n = 2100
Thus, if the prevalence rate is small, the required sample size will be much larger. Therefore, before deciding the sample size it is necessary to know the prevalence rate of the disease in the hospital or community set-up, since no statistical method can compensate for a badly planned experiment or an inadequate sample size.
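The n = 4pq/L² rule of thumb above (with p and q in percent and L the allowable error) is easy to wrap in a small helper:

```python
def sample_size(p, L):
    """n = 4pq / L^2, with p and q = 100 - p expressed in percent."""
    q = 100 - p
    return 4 * p * q / L ** 2

print(round(sample_size(30, 3)))     # 933
print(round(sample_size(16, 1.6)))   # 2100
```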
MEASURES OF CHANCE VARIATION
{Sampling Error or Standard Error (S.E.)}
The difference or deviation between the value of the 'Statistic' of a particular sample and the corresponding population 'Parameter' is known as a measure of chance variation, i.e. the Standard Error or Sampling Error (S.E.). It is used to find out the variation among samples. The main objective of the measures of chance variation is to estimate the population parameter from the sample statistic.
Factors controlling S.E.: the sample size (n) and the standard deviation (SD) of the observations.
Measures of chance variation for quantitative data: It is called the Standard Error of the Sample Mean (i.e. S.E.x̄). The SE of the sample mean is defined as the ratio of the SD to the square root of the sample size, i.e. S.E.x̄ = SD / √n
Confidence Limits (Confidence Interval): To estimate the population mean, the SE of the sample mean can be used.
The 95% and 99% confidence limits for the population mean are as follows:
95%CI = Sample mean ± 2 SEx
99% CI = Sample mean ± 3 SEx
Measures of chance variation for qualitative data: It is called the Standard Error of the Sample Proportion, i.e. S.E.p. The S.E. of a sample proportion is defined as the square root of the product of the positive proportion (p) and the alternative proportion (q) divided by the sample size (n),
i.e. S.E.p = √(p × q / n)
Confidence Limits (Confidence Interval) :
To estimate the population proportion, the SE of the sample proportion can be used. The 95% and 99% confidence limits for the population proportion are as follows:
95%CI=Sample proportion±2 SEp
99%CI=Sample proportion±3 SEp
FOR QUANTITATIVE DATA (EXAMPLE):
1. In a random sample of 136 college students the mean Weight in Kg was observed to be 45.5 with SD 8.9. Find
out 95% and 99% confidence limits for population mean weight.
Solution:
Given that mean Wt.= 45.5kg and SD = 8.9kg , n=136. Now to estimate population mean weight first find out
SE of mean as follows: SEx = SD/√n= 8.9/√136 = 8.9/11.66 = 0.76
Then, 95% Confidence limits for population mean weight is as follows:
95%CI = Sample mean ± 2 SEx = 45.5± 2* 0.76 = 45.5± 1.52 = 43.98kg to 47.02 kg
Thus, the population mean weight will lie in the range 43.98 to 47.02 kg in 95% of all the cases.
Now, 99% Confidence limits for population mean weight is as follows:
99%CI = Sample mean ± 3 SEx = 45.5± 3* 0.76 = 45.5± 2.28 = 43.22kg to 47.78 kg
Thus, population mean weight will lie in the range 43.22 to 47.78 kg in 99% of all the cases.
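The weight example can be reproduced numerically (the ±2 and ±3 multipliers are the rounded normal values used in this chapter; the tiny differences from the worked limits come only from rounding the SE to 0.76):

```python
from math import sqrt

n, xbar, sd = 136, 45.5, 8.9
se = sd / sqrt(n)                          # ≈ 0.76
lo95, hi95 = xbar - 2 * se, xbar + 2 * se
lo99, hi99 = xbar - 3 * se, xbar + 3 * se
print(round(se, 2))                        # 0.76
print(round(lo95, 2), round(hi95, 2))      # ≈ 43.97 47.03
print(round(lo99, 2), round(hi99, 2))      # ≈ 43.21 47.79
```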
Qualitative data (Example):
In a random sample of 150 school going children in an area, 20% were suffering from malaria. Find out the 95% and 99% confidence limits for the population proportion of malaria.
Given that n = 150, p = proportion of malaria = 20%, and q = 100 − 20 = 80%.
Then, S.E.p = √(p × q / n) = √(20 × 80 / 150) = √10.67 = 3.27
95% CI = p ± 2 S.E.p = 20 ± 2 × 3.27 = 20 ± 6.53, i.e. 13.47% to 26.53%
99% CI = p ± 3 S.E.p = 20 ± 3 × 3.27 = 20 ± 9.80, i.e. 10.20% to 29.80%
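The malaria example follows the same pattern with S.E.p = √(pq/n):

```python
from math import sqrt

n, p = 150, 20
q = 100 - p
se = sqrt(p * q / n)                               # ≈ 3.27
print(round(se, 2))
print(round(p - 2 * se, 2), round(p + 2 * se, 2))  # 95% CI ≈ 13.47 to 26.53
print(round(p - 3 * se, 2), round(p + 3 * se, 2))  # 99% CI ≈ 10.2 to 29.8
```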
TESTING OF HYPOTHESIS AND TESTS OF SIGNIFICANCE
HYPOTHESIS TESTING
INTRODUCTION
Hypothesis testing also referred to as 'Statistical decision-making' is an important aspect of research. Quite often,
in real life situations, we need to take decision about the population based on the information about the sample.
Very simply, hypothesis testing enables us to make probability statements about population parameter(s). A
hypothesis may not be proved absolutely, but it is accepted if it stands the test of critical objective analysis.
WHAT IS A HYPOTHESIS?
A hypothesis, in plain terms, is a tentative solution or answer to the research problem, which the researcher has
to test based on the available body of knowledge, or on knowledge that can be known. It is merely an
assumption or some supposition to be proved or disproved.
A hypothesis may be defined as a proposition or a set of propositions set forth as an explanation for the
occurrence of some specified group of phenomena either asserted merely as a provisional conjecture to guide
some investigation or accepted as highly probable in the light of established facts. Webster's New International
Dictionary of English Language defines the terms as 'a proposition, condition or principle which is assumed,
perhaps without belief, in order to draw out its logical consequences and by this method to test its accord with
facts which are known or may be determined'. Quite often a research hypothesis is a predictive statement that can be tested scientifically and relates an independent variable to some dependent variable.
USES OF A HYPOTHESIS
Hypothesis is a principal instrument in research. Its primary function is to suggest new experiments and
observations. Many experiments have hypothesis testing as their objective. Quite clearly hypothesis is a useful
aid to every researcher. If hypothesis is not formulated, even implicitly, the researcher cannot effectively
proceed with problem investigation. In the absence of such hypothesis, the researcher has little clue about what
to look for and in what specific order, during the data collection phase. In the light of a well-defined hypothesis,
the researcher can assess the relevance and usability of any data that he comes across. Lundberg brings forth the
value of hypothesis in the following words – 'The only difference between gathering data without a hypothesis
and gathering the data with a hypothesis is that in the latter case, we deliberately recognize the limitations of our
senses and attempt to reduce their fallibility by limiting our field of investigation so as to prevent greater
concentration of attention of particular aspects which past experience leads us to believe are insignificant for our
purpose'.
Thus hypothesis enables collecting relevant data and organizing them effectively. It prevents a blind search and
indiscriminate gathering of data which may later prove irrelevant to the problem under study.
Research can begin with a well formulated hypothesis or it may come out with a hypothesis as its end product.
Hypothesis is not given readymade to the researcher, it has to be formulated. What then are the essential
characteristics of a good hypothesis?
A sound hypothesis is generally a simple one. Simple, however, does not mean obvious. The more insight the researcher has about the problem, the simpler will be his hypothesis. This simplicity is termed 'Occam's Razor', after the English philosopher William of Occam, who remarked, '...... neither more, nor more onerous, causes are to be assumed than are necessary to account for the phenomena'.
A hypothesis must be clear and precise to allow reliable inferences.
A hypothesis must be capable of being tested. Science does not admit anything as valid knowledge until a satisfactory test of its validity has been completed. Very exacting proof and measurement are demanded, often by two or more persons, or by retest, of a hypothesis. A hypothesis is testable if other deductions can be made from it which, in turn, can be confirmed or disproved by observation.
Hypothesis should be focused in scope and be specific. Narrower hypothesis is more amenable to
testing.
Hypothesis must be in line with a substantial body of established facts.
Hypothesis must explain the facts that gave rise to the need for explanation. It must actually explain
what it claims to explain, it should have empirical reference.
TYPES OF HYPOTHESIS
Hypothesis can be classified in many ways, but classification based on the basis of their level of abstraction is
considered useful.
Good and Hatt classify hypothesis based on the levels of abstraction, into three categories:
At the lowest level of abstraction are hypotheses that state the existence of certain empirical uniformities, e.g. experienced graduates are likely to be better managers than freshers after completing their M.B.A. This type of hypothesis seems to invite scientific verification of common sense propositions.
Hypotheses that deal with 'complex ideal types' are next in the hierarchy. They go beyond the level of anticipating a simple empirical uniformity to purposeful distortions of empirical exactness. They aim at testing whether logically derived relationships between empirical uniformities hold. Their function is to create analytical tools, and they are termed 'ideal types' because they are removed from empirical reality.
Hypotheses at the highest level of abstraction are concerned with the formulation of a relation amongst analytic variables. They state possible variations or changes in a dependent variable when the independent variable varies in a certain fashion. They explain how one variable affects another.
Students must not have the misconception that any type of hypothesis is superior to the others, as each hypothesis has its own importance and utility. The higher level hypotheses are built on the lower level ones.
SOURCES OF HYPOTHESIS
Testing of Hypothesis:
Definition: A statement or pre-assumption about a population parameter or population distribution is called a hypothesis. If the population is large, there is no way of analyzing the population or testing the hypothesis directly. Instead, the hypothesis is tested on the basis of a random sample.
Types of hypothesis: Null hypothesis (H0) and Alternative hypothesis (H1)
Null hypothesis (H0): A hypothesis that there is no significant difference between population values and sample values, or between values from sample to sample, is called the Null hypothesis (H0).
Alternative hypothesis (H1): A complementary statement to the null hypothesis, i.e. a hypothesis that there is a significant difference between population values and sample values, or between values from sample to sample, is called the Alternative hypothesis (H1).
Type I and type II error: The conclusions of any research study depend on the evidence provided by a sample. Variation from one sample to another can never be eliminated unless the sample is as large as the population itself. It is therefore possible that the conclusions drawn are incorrect, which leads to error. There can be two types of error, as follows:
If we wrongly reject H0, when in reality H0 is true - the error is called as Type I error. Similarly, when we
wrongly accept H0 when H0 is false- the error is called as Type II error. Both these errors are bad and should
be reduced to minimum. However, they can be completely eliminated only when the full population is
examined- in this case there would be no practical utility of the testing procedure. In all testing of hypothesis
procedures, it is assumed that a type I error is more severe than a type II error and so needs to be controlled.
THE SIGNIFICANCE LEVEL: In all testing of hypothesis procedures it is assumed that a type I error is more serious, so the probability of a type I error needs to be explicitly controlled. This is done by specifying a significance level at which the test is conducted. The significance level thus sets a limit on the probability of a type I error, and test procedures are designed so as to get the lowest probability of a type II error. Two significance levels are commonly used, 5% and 1%: 5% is called the critical level of significance and 1% the higher level of significance. A test of hypothesis is designed for a significance level, and at the end of the test we can reject the null hypothesis at the 5% or 1% level of significance; this is expressed through the 'p value'. The 'p' value of a test is the probability of observing a sample statistic as extreme as the one observed if the null hypothesis is true.
TESTS OF SIGNIFICANCE: The term statistical significance is often encountered in scientific literature, and yet its meaning is still widely misunderstood. Determination of statistical significance is made by the application of a procedure called a test of significance. Tests of significance are useful for interpreting comparison results. For example, suppose that a clinician finds that in a small series of patients the mean response to treatment is greater for drug A than for drug B. Obviously the clinician would like to know whether the observed difference in this small series of patients will hold up for a population of such patients. In other words, he wants to know whether the observed difference is more than merely 'sampling error'. This assessment can be made with a statistical test of significance. To decide whether the null hypothesis is to be accepted or rejected, a test statistic is computed and compared with a critical value obtained from a set of statistical tables. When the test statistic exceeds the critical value, the null hypothesis is rejected and the difference is declared statistically significant. Any decision to reject the null hypothesis carries with it a certain risk of being wrong. This risk is called the significance level of the test.
1. Z-test for difference between two sample means:
Application: To find out the Standard Error of the difference between two sample means, i.e. S.E.(X̄1 − X̄2), e.g. to find out a significant difference between two different variables/groups, such as the efficacy of two drugs, the difference between two groups, etc.
Criteria:
Data must be quantitative. Data must be large (i.e. n > 30), and random samples must be selected from a normal population.
Steps involving in Test:
1. State the null hypothesis i.e. H0 and its alternative hypothesis i.e. H1
2. Find out the values of test statistic i.e. value of 'Z' as follows:
Z = (X̄1 − X̄2) / SE(X̄1 − X̄2), where SE(X̄1 − X̄2) = √[(SD1)²/n1 + (SD2)²/n2]
3. Determine the probability, i.e. the 'p' value, as follows:
If the calculated value of Z < 1.96 (the table value of Z at the 5% level of significance), then accept the null hypothesis H0 (i.e. p > 0.05). If the calculated value of Z > 1.96 (or 2.58), the table value of Z at the 5% (1%) level of significance, then reject the null hypothesis H0 (i.e. p < 0.05 or p < 0.01).
4. Thus, there is either no significant difference (p > 0.05), a significant difference (p < 0.05) or a highly significant difference (p < 0.01) between the two groups under study.
The required data for Z test will be
n1 n2
X1 X2
SD1 SD2
Example: In one area, a random sample of 500 persons had a mean Hb (gm%) of 9.8 with an SD of 1.5. In
another area, a random sample of 400 persons had a mean Hb of 8.6 with an SD of 1.9. Test whether there is any
significant difference between the mean Hb levels of the two groups.
Solution:
Given that, n1= 500 n2 = 400 X1 = 9.8 X2 = 8.6 SD1 = 1.5 SD2=1.9
The data are quantitative and large, so the Z test applies. H0: There is no significant difference
between the mean Hb levels of the two groups.
SE(X1 - X2) = √((SD1)²/n1 + (SD2)²/n2) = √((1.5)²/500 + (1.9)²/400) = √(2.25/500 + 3.61/400)
= √(0.0045 + 0.009025) = √0.013525 = 0.116. Then Z = (9.8 - 8.6)/0.116 = 10.345
Since the calculated value of Z > 2.58, the table value at the 1% level of significance, reject H0.
Thus, there is a highly significant difference between the mean Hb levels of the two groups (p < 0.01).
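As a check, the steps above can be sketched in plain Python (the helper name `z_test_two_means` is ours):

```python
import math

def z_test_two_means(x1, x2, sd1, sd2, n1, n2):
    """Z test for the difference between two large-sample means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE(X1 - X2)
    return (x1 - x2) / se

# haemoglobin example from the text
z = z_test_two_means(9.8, 8.6, 1.5, 1.9, 500, 400)
print(round(z, 2))  # about 10.32; > 2.58, so reject H0 at the 1% level
```

The text gets 10.345 because it rounds the SE to 0.116 before dividing; the conclusion is the same either way.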
2. Z-test for difference between two sample proportions
Application: To find out a significant difference between two sample proportions, using SE(p1 - p2).
Criteria: Data must be qualitative, data must be large (i.e. n > 30), and the samples must be random samples from a normal population.
Required data: p = positive proportion, q = alternative proportion (100 - p)
n1 n2
p1 p2
q1 q2
Steps involved in the test:
1. State the null hypothesis H0 and its alternative hypothesis H1.
2. Apply the Z test as follows: Z = (p1 - p2) / SE(p1 - p2), where SE(p1 - p2) = √(p1q1/n1 + p2q2/n2)
3. Determine the probability, i.e. the 'p' value, as follows: If the calculated value of Z < 1.96 (the table
value of Z at the 5% level of significance), accept the null hypothesis H0 (i.e. p > 0.05). If the calculated value
of Z > 1.96 (2.58), the table value of Z at the 5% (1%) level of significance, reject the null hypothesis H0 (i.e.
p < 0.05 or p < 0.01). Conclude accordingly: the difference between the two groups under study is not significant
(p > 0.05), significant (p < 0.05), or highly significant (p < 0.01).
Example: In School A, out of 900 students, 3% showed vitamin A deficiency. In school B, out of 700 students,
5% showed vitamin A deficiency. Test the significance by applying suitable test
Solution:
given that n1 = 900, n2=700
p1 = 3% p2= 5%
q1= 97% q2 = 95%
Null hypothesis (H0): There is no significant difference between the proportions of vitamin 'A' deficiency in the
two schools.
Z = (p1 - p2) / SE(p1 - p2), where SE(p1 - p2) = √(p1q1/n1 + p2q2/n2) = √(3×97/900 + 5×95/700)
= √(0.3233 + 0.6786) = 1.00
Now Z = (5 - 3)/1 = 2. Since Z > 1.96, the table value at the 5% level of significance, reject the null hypothesis.
Thus there is a significant difference between the proportions of vitamin A deficiency in the two schools (p < 0.05).
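The same arithmetic can be sketched in Python (the function name is ours; proportions are kept in percent, as in the text):

```python
import math

def z_test_two_proportions(p1, p2, n1, n2):
    """Z test for the difference between two sample proportions given in percent."""
    q1, q2 = 100 - p1, 100 - p2
    se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)  # SE(p1 - p2), in percentage points
    return abs(p1 - p2) / se

# vitamin A deficiency example: 3% of 900 students vs 5% of 700 students
z = z_test_two_proportions(3, 5, 900, 700)
print(round(z, 2))  # about 2.0; > 1.96, so significant at the 5% level
```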
3. Paired 't' test:
Application: To test the significance of the difference between paired (before and after) observations on the same
subjects, e.g. the effect of a drug or treatment.
Criteria: Data must be quantitative and small (n < 30), and the observations must be paired (related) samples.
Steps involved in the test:
1. State the null hypothesis H0 and its alternative hypothesis H1.
2. Find the value of the test statistic 't' as: t = x / S.E.x
Where x = mean of the differences between before and after observations, and S.E.x = Standard Error of that mean
difference, determined as S.E.x = SD/√n,
where SD is the Standard Deviation of the differences between before and after observations and n = number of pairs
of observations.
The required data for the paired t test are: n, X1 = before data, X2 = after data.
3. To determine the probability or 'p' value, first find the degrees of freedom (d.f.): d.f. = n - 1.
Find the table value of 't' at the 5% and 1% levels of significance at n - 1 d.f. for acceptance or rejection of the
null hypothesis:
If the calculated value of 't' < the table value of 't' at the 5% or 1% level of significance at n - 1 d.f., accept
the null hypothesis (H0) {reject H1}, i.e. not significant (p > 0.05). If the calculated value of 't' > the table
value of 't' at the 5% or 1% level of significance at n - 1 d.f., reject the null hypothesis (H0) {accept H1}, i.e.
significant (p < 0.05) or highly significant (p < 0.01).
Example:
The following data show the effect of a drug on the weight (kg) of 7 TB patients:
Calculation table for paired 't' test:
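The original calculation table did not survive here, so as an illustration the paired t steps can be sketched with
hypothetical before/after weights (the data and the function name below are ours, not the original table):

```python
import math

def paired_t(before, after):
    """Paired t test: t = mean(d) / (SD(d)/sqrt(n)), where d = after - before."""
    n = len(before)
    d = [a - b for a, b in zip(after, before)]
    mean_d = sum(d) / n
    # sample SD of the differences (n - 1 in the denominator)
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    t = mean_d / (sd / math.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# hypothetical before/after weights (kg) of 7 patients
before = [40, 42, 38, 45, 41, 39, 43]
after  = [42, 45, 39, 48, 44, 40, 46]
t, df = paired_t(before, after)
# t is about 6.36 with 6 d.f.; the table t at 5% for 6 d.f. is 2.447, so this change would be significant
```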
4. Unpaired 't' test: To compare the means of two small, independent samples, using
t = (X1 - X2) / S.E.(X1 - X2), where S.E.(X1 - X2) = √(SD1²/n1 + SD2²/n2)
and SD1 and SD2 are the Standard Deviations of the two different samples of sizes n1 and n2 respectively.
Example: The data are quantitative and small, and two different groups are given, so apply the unpaired 't' test as
follows:
Here, n1 = 15, n2 = 17
X1 = 4.2, X2 = 2.3
SD1 = 0.8, SD2 = 0.5
Null hypothesis (H0): There is no significant difference between the mean Apgar scores of newborns of normal
mothers and of high-risk mothers. Applying the unpaired 't' test:
S.E.(X1 - X2) = √(SD1²/n1 + SD2²/n2) = √((0.8)²/15 + (0.5)²/17) = √(0.64/15 + 0.25/17)
= √(0.0427 + 0.0147) = √0.0574 = 0.239
t = (X1 - X2) / S.E.(X1 - X2) = (4.2 - 2.3)/0.239 = 1.9/0.239 = 7.95
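A sketch of the same calculation (the function name is ours; the simple SE formula given here is used, though
textbooks often prefer a pooled-variance SE for small samples):

```python
import math

def unpaired_t(x1, x2, sd1, sd2, n1, n2):
    """Unpaired t test from summary statistics, with the simple SE formula used in the text."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # S.E.(X1 - X2)
    return (x1 - x2) / se, n1 + n2 - 2             # t statistic and degrees of freedom

# Apgar score example: normal vs high-risk mothers
t, df = unpaired_t(4.2, 2.3, 0.8, 0.5, 15, 17)
# t is about 7.93 with the unrounded SE, far above the 5% table value of about 2.04 at 30 d.f.
```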
5. Chi-square (χ²) test:
Criteria: Data must be qualitative, data must be large (n > 30), the expected frequency in any cell should not be
less than 5, and the samples must be random samples selected from a normal population.
Here R1, R2 and C1, C2 are the rows and columns respectively. Their corresponding totals are R1T, R2T, C1T,
C2T, and GT = grand total.
O1, O2, O3 and O4 are the observed values (frequencies) or actual values, and E1, E2, E3 and E4 are the expected
values (frequencies).
Steps involved in the test:
1. State the null hypothesis (H0) and its alternative hypothesis (H1).
2. Find the value of the test statistic χ² as follows:
χ² = Σ{(O-E)²/E} = (O1-E1)²/E1 + (O2-E2)²/E2 + (O3-E3)²/E3 + (O4-E4)²/E4
Where E1, E2, E3 and E4 are the expected values for the observed values O1, O2, O3 and O4 in each cell.
The values of E1, E2, E3 and E4 are calculated as follows:
E1 = R1T × C1T / GT
E2 = R1T × C2T / GT
E3 = R2T × C1T / GT
E4 = R2T × C2T / GT
3. To determine the probability values of chi-square at the 5% and 1% levels of significance, first find the degrees
of freedom (d.f.) as follows:
d.f. = (R - 1) × (C - 1), where R = number of rows and C = number of columns.
4. Then, to accept or reject the null hypothesis, find the table values of chi-square at the 5% and 1% levels of
significance and compare them with the calculated value of chi-square.
Example: A study of vaccination against measles was conducted in a village. Out of 500 vaccinated persons, 14
showed attacks of measles, and out of 400 not vaccinated, 27 showed attacks of measles. Apply a suitable test to
find out whether there is any significant association between vaccination and the attack rate of measles.
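The worked solution for this example is not reproduced above, so as a sketch, the 2×2 chi-square computation (with
cells taken as attacked/not attacked by vaccination status, and a helper name of our choosing) is:

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 contingency table [[O1, O2], [O3, O4]]."""
    (o1, o2), (o3, o4) = table
    r1, r2 = o1 + o2, o3 + o4          # row totals R1T, R2T
    c1, c2 = o1 + o3, o2 + o4          # column totals C1T, C2T
    gt = r1 + r2                       # grand total GT
    chi2 = 0.0
    for o, rt, ct in [(o1, r1, c1), (o2, r1, c2), (o3, r2, c1), (o4, r2, c2)]:
        e = rt * ct / gt               # expected frequency E = RT x CT / GT
        chi2 += (o - e) ** 2 / e
    return chi2

# measles example: rows = vaccinated / not vaccinated, columns = attacked / not attacked
chi2 = chi_square_2x2([[14, 486], [27, 373]])
print(round(chi2, 2))  # about 7.97; > 6.63 at 1 d.f., so the association is significant at the 1% level
```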
NON-PARAMETRIC TESTS IN MEDICAL RESEARCH
Non-parametric tests, or distribution-free methods, are applicable to all types of data: qualitative data (nominal
scaling), data in rank form (ordinal scaling), and data that have been measured more precisely (interval or
ratio scaling). Many non-parametric tests make it possible to work with very small samples. This is particularly
helpful to the medical researcher collecting pilot study data, especially when working with a rare disease. A large
number of non-parametric tests exist, but only a few of the better known and more widely used ones are discussed
here, with their applications in medical research and examples.
1. Sign Test: (for two related samples)
Example: A new drug 'A' was provided to 10 villages. The numbers of deaths due to malaria in these villages before
and after the provision of drug 'A' were observed as follows:
Village 1 2 3 4 5 6 7 8 9 10
-------------------------------------------------------------------------------------------------------------
Deaths prior to provide drug A 13 15 12 13 13 13 11 14 13 10
Deaths after drug A 11 12 10 12 10 9 8 12 15 14
Question: Can it be said that drug A has reduced significantly the deaths due to malaria in 10 villages?
Solution: In this non parametric test, we are interested in finding out the sign of the difference that occur in
number of deaths due to malaria before and after drug A.
By using data given, we can see that the difference can be assigned signs as:
13-11=2(+), 15-12=3(+), 12-10=2(+), 13-12=1(+), 13-10=3(+), 13-9=4(+), 11-8=3(+), 14-12=2(+),
13-15=-2(-), 10-14=-4(-)
We thus obtained 8 plus signs and 2 minus signs. If providing drug A had no effect, we should expect 5 plus and
5 minus signs, i.e. p = 0.5 and q = 0.5 in the Binomial distribution.
Now we must answer the question whether getting 8 plus signs out of 10 could occur by chance at the 0.05 level of
significance. For n = 10 and p = 0.5, the probability of getting 8 plus signs (reduction in
deaths) is
P = {n! / r!(n-r)!} × p^r × q^(n-r)  (Binomial distribution)
Where n = 10, r = 8, p = 0.5 and q = 0.5. Thus, P = {10! / 8! 2!} × (0.5)^8 × (0.5)^2 = 45 × (0.5)^10 = 45 × 0.0009766
= 0.044
This probability is less than 0.05, and hence we can say that after providing drug 'A' there was a significant
reduction in the deaths due to malaria in the 10 villages.
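The binomial calculation can be checked in a couple of lines (the function name is ours; note that a stricter sign
test would use the tail probability P(8 or more plus signs) rather than the point probability used here):

```python
from math import comb

def sign_test_point_p(n_plus, n):
    """Probability of exactly n_plus plus signs in n trials with p = q = 0.5."""
    return comb(n, n_plus) * 0.5 ** n

p = sign_test_point_p(8, 10)
print(round(p, 3))  # 45 * (0.5)**10 = 0.044, as in the text
```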
2. Wilcoxon Signed Rank Test:(Ordinal level of measurement for two related samples)
Example: A child psychologist wished to test whether nursery school attendance has any effect on children's
social perceptiveness. He obtained 8 pairs of identical twins. At random, one twin from each pair was assigned to
attend nursery school, while the other twin in each pair remained out of school.
Question: Can we comment on the difference in social perceptiveness between home and nursery school
children?
Solution: First we calculate the difference in social perceptiveness for each pair, and next we rank these
differences from 1 to 8, giving the highest rank to the largest absolute difference. Then we attach the sign of
each difference to its rank, as follows:
Pair Twin in school Twin at home Difference Rank of diff. Rank with less frequent sign
---------------------------------------------------------------------------------------------------------------------
1. 82 63 10 7
2. 69 42 27 8
3. 73 74 -1 -1 1
4. 43 37 6 4
5. 58 51 7 5
6. 56 43 13 6
7. 76 80 -4 -3 3
8. 85 82 3 2
T=4
----------------------------------------------------------------------------------------------------------------------
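The statistic T (the sum of the ranks with the less frequent sign) can be recomputed directly; the function below is
our own sketch and does not handle tied absolute differences (none occur in these data):

```python
def wilcoxon_T(x, y):
    """Wilcoxon signed-rank statistic: rank |d|, then T = smaller of the signed rank sums."""
    d = [a - b for a, b in zip(x, y) if a != b]     # zero differences are dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):          # rank 1 = smallest |d|
        ranks[i] = r
    pos = sum(r for r, v in zip(ranks, d) if v > 0)
    neg = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(pos, neg)

school = [82, 69, 73, 43, 58, 56, 76, 85]
home   = [63, 42, 74, 37, 51, 43, 80, 82]
print(wilcoxon_T(school, home))  # T = 4, matching the table
```

T is then compared with the critical value of T for n = 8 in a Wilcoxon table to reach a decision.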
3. Mann-Whitney U test:
(Ordinal level of measurement for independent samples)
The Mann-Whitney U test may be used to test whether two independent groups have been drawn from the same
population. It is one of the most powerful of the non-parametric tests, and a useful alternative to the 't' test
when the investigator wishes to avoid the t test's assumptions, or when the measurement in the research is weaker
than interval scaling.
Example: David and Jackson studied whether rats would generalize learned imitation when placed under a new
drive and in a new situation. Five rats were trained to imitate leader rats in a T-maze: they were trained to
follow the leaders when hungry in order to obtain a food incentive. The 5 rats were then transferred to a
shock-avoidance situation, where imitation of the leader rats would enable them to avoid an electric shock. Their
performance was compared with that of 4 controls that had no previous training to follow a leader. The
comparison is in terms of how many trials each rat took to reach a criterion of 10 correct responses in 10 trials.
The numbers of trials to criterion required by the Experimental (E) and Control(C) rats are as follows:
E rats: 78 64 75 45 82
C rats: 110 70 53 51
Solution: We arrange these scores in the order of their size, retaining the identity of each:
45 51 53 64 70 75 78 82 110
E C C E C E E E C
Then obtain U by the following formula
U = n1n2 + {n1 (n1+1) / 2} –R1
Where, n1 and n2 are sample sizes and R1 is the sum of the ranks assigned to the values of the first sample.
i.e. R1 = 26
Thus U = 9. From the table of probabilities for U with n1 = 5 and n2 = 4, the probability associated with U = 9 is
0.243. Since 0.243 > 0.05, the null hypothesis cannot be rejected: there is no significant evidence that rats
previously trained to follow a leader to a food incentive reach the criterion faster in the shock-avoidance
situation.
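The U computation above can be sketched as follows (our own helper; it assumes no tied scores, which holds for
these data):

```python
def mann_whitney_U(sample1, sample2):
    """U = n1*n2 + n1(n1+1)/2 - R1, where R1 = rank sum of sample1 in the pooled ordering."""
    pooled = sorted(sample1 + sample2)
    # rank = 1-based position in the pooled ordering (no ties in this example)
    r1 = sum(pooled.index(v) + 1 for v in sample1)
    n1, n2 = len(sample1), len(sample2)
    return n1 * n2 + n1 * (n1 + 1) // 2 - r1

E = [78, 64, 75, 45, 82]   # experimental rats
C = [110, 70, 53, 51]      # control rats
print(mann_whitney_U(E, C))  # U = 9, as in the text
```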
4. One-Sample Run Test: (Ordinal level of measurement for the one-sample case)
This test is based on the order or sequence in which the individual scores or observations were originally
obtained. The technique presented here is based on the number of runs which a sample exhibits, a run
being defined as a succession of identical symbols.
Example: In a study of the dynamics of aggression in young children, the investigator observed pairs of
children in a controlled play situation. 12 children who played together daily were observed. The median of
this set of scores, taken in the order in which they occurred, is 24.5. The following aggression scores,
expressed as pluses and minuses about the median, were observed:
Child: 1 2 3 4 5 6 7 8 9 10 11 12
Score: 31 23 21 43 51 22 12 26 43 75 2 3
Position: + - - + + - - + + + - -
(score w.r.t median)
All scores falling below the median are designated minus, and all above it plus. Thus r = 6 runs occurred in this
series. Reference to the table of critical values of 'r' in the run test, with r = 6, n1 = 6 and n2 = 6, shows
that r does not fall in the region of rejection; the decision, therefore, is that the sample scores occurred in
random order.
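Counting the runs can be automated with a short sketch (the helper name is ours; scores equal to the median are
dropped, as is usual for this test):

```python
def count_runs(scores, median):
    """Count runs of +/- signs of scores about the median; returns (runs, n_plus, n_minus)."""
    signs = ['+' if s > median else '-' for s in scores if s != median]
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return runs, signs.count('+'), signs.count('-')

# aggression scores of the 12 children, in the order observed; median = 24.5
scores = [31, 23, 21, 43, 51, 22, 12, 26, 43, 75, 2, 3]
r, n1, n2 = count_runs(scores, 24.5)
print(r, n1, n2)  # 6 6 6, matching the text
```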
5. Spearman Rank Correlation: (Ordinal level of measurement)
This test is applied for rank correlation. It is a measure of association which requires that both variables
be measured on at least an ordinal scale, so that the objects or individuals under study may be ranked in two
ordered series.
Example:
Patient No. : 1 2 3 4 5 6 7
Weight (Kg) (X) : 54 68 57 49 52 65 74
Systolic Blood Pressure (mm of Hg) (Y): 120 124 128 122 130 134 140
Rank of X: 3 6 4 1 2 5 7
Rank of Y: 1 3 4 2 5 6 7
Difference of ranks (d) : +2 +3 0 -1 -3 -1 0
Squares of diff. (d²) : 4 9 0 1 9 1 0
Then the value of Spearman's Rank Correlation coefficient can be calculated as follows:
rs = 1 - {6Σd²/(n³ - n)} = 1 - {6×24/(7×7×7 - 7)} = 1 - {144/336} = 0.571
With d.f. = n - 2 = 5, the table value of t at the 5% level is 2.015.
The calculated value of t is less than the table value at the 5% level of significance, so there is no significant
correlation between weight and systolic blood pressure for these 7 patients (p > 0.05).
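Recomputing from the ranks is a useful check: the first rank difference is 3 - 1 = +2, so Σd² = 24 and rs is about
0.57; the conclusion is unchanged, since the corresponding t is still below the 5% table value of 2.015 at 5 d.f.
A sketch (function name is ours):

```python
import math

def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation: rs = 1 - 6*sum(d^2) / (n^3 - n)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

rx = [3, 6, 4, 1, 2, 5, 7]  # ranks of weight
ry = [1, 3, 4, 2, 5, 6, 7]  # ranks of systolic BP
rs = spearman_rs(rx, ry)                            # 1 - 6*24/336, about 0.571
t = rs * math.sqrt((len(rx) - 2) / (1 - rs ** 2))   # t statistic for rs, d.f. = n - 2
```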
The tests above can be summarized by level of measurement and type of sample:
Level of      One-sample case       Two related samples        Two independent samples     Correlation
measurement
------------------------------------------------------------------------------------------------------
Ordinal       One-sample run test   Sign test;                 Mann-Whitney U test;        Spearman rank
                                    Wilcoxon matched-pairs     Median test;                correlation
                                    signed-ranks test          Kolmogorov-Smirnov test
Interval      ---------             Walsh test                 ---------                   ---------
CORRELATION AND REGRESSION ANALYSIS
Introduction: Correlation measures the degree of relationship between variables. When the relationship is of a
quantitative nature, the appropriate statistical tool for discovering and measuring it, and expressing it in a
brief formula, is known as correlation. Correlation is a statistical device which helps us analyze the covariance
of two or more variables.
TYPES OF CORRELATION: positive or negative; simple, partial and multiple; linear and non-linear.
Whether correlation is positive (direct) or negative (inverse) depends on the direction of change of the
variables. If both variables vary in the same direction, i.e. if one variable increases (decreases) the other also
increases (decreases), the correlation is positive. If one variable increases (decreases) while the other
decreases (increases), the correlation is negative.
METHODS OF CORRELATION
1. SCATTER DIAGRAM (DOT DIAGRAM)
2. KARL PEARSON'S COEFFICIENT OF CORRELATION
3. SPEARMAN'S RANK CORRELATION COEFFICIENT
Solution: Given that n = 7, X = weight and Y = Hb. The formula for Karl Pearson's correlation coefficient is
r = (ΣXY - N X Y) / [√{ΣX² - N(X)²} √{ΣY² - N(Y)²}], where X and Y denote the means of X and Y.
-----------------------------------------------------------------------------------------------------------
Weight (kg) (X) Hb (gm%) (Y) X² Y² XY
----------------------------------------------------------------------------------------------------------
53 12 2809 144 636
49 10 2401 100 490
54 11 2916 121 594
43 9 1849 81 387
45 12 2025 144 540
55 13 3025 169 715
44 10 1936 100 440
------------------------------------------------------------------------------------------------------------
ΣX= 343 ΣY=77 ΣX² =16961 ΣY² = 859 ΣXY=3802
X = ΣX/N =49 Y = ΣY/N= 11
Putting the values in the formula as follows:
r = Σ XY – N X Y / √ {ΣX²-N(X)²} √ {ΣY²-N(Y)²}
r = (3802 - 7×49×11) / [√{16961 - 7×(49)²} √{859 - 7×(11)²}] = (3802 - 3773) / [√(16961 - 16807) √(859 - 847)]
r = 29 / (√154 × √12) = 29 / √1848 = 29 / 42.99 = 0.67
Thus the correlation between weight and Hb level is positive.
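The same result follows from the raw data with a short sketch (the helper name is ours):

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r: (SXY - N*mx*my) / sqrt((SXX - N*mx^2) * (SYY - N*my^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * mx * my
    den = math.sqrt((sum(a * a for a in x) - n * mx * mx) *
                    (sum(b * b for b in y) - n * my * my))
    return num / den

weight = [53, 49, 54, 43, 45, 55, 44]   # X
hb     = [12, 10, 11,  9, 12, 13, 10]   # Y
print(round(pearson_r(weight, hb), 2))  # about 0.67, as in the text
```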
Solution: X-Marks in PSM Y-Marks in ENT N = 10
S.N. X Y Rx Ry D = Rx-Ry D²
---------------------------------------------------------------------------------
1. 33 28 4 5 -1 1
2. 22 17 7 9 -2 4
3. 20 29 8 4 4 16
4. 14 31 10 3 7 49
5. 29 38 5 1 4 16
6. 41 26 1 6 -5 25
7. 37 36 2 2 0 0
8. 25 21 6 8 -2 4
9. 18 14 9 10 -1 1
10. 34 24 3 7 -4 16
-------------------------------------------------------------------------------
∑D² = 132
Applying the formula for Spearman's Rank Correlation coefficient:
R = 1 - 6∑D²/(N³ - N) = 1 - {6×132 / (10³ - 10)} = 1 - {792 / (1000 - 10)} = 1 - {792/990} = 1 - 0.8 = 0.2
REGRESSION
Regression analysis reveals the average relationship between two variables and makes possible
estimation or prediction. The literal meaning of the term regression is the act of returning or going back.
Regression analysis is a statistical device with the help of which we can estimate (predict)
the unknown values of one variable from known values of another variable. The variable used to
predict the variable of interest is called the 'independent variable' and the variable we are trying to predict is
called the 'dependent variable'. The independent variable is denoted by 'X' and the dependent variable by 'Y'.
The analysis used here is called simple linear regression analysis; 'linear' means that the relationship takes the
form of a straight-line equation, Y = a + bX.
Line of regression
There are two lines of regression for analysis of two variables under study and to estimate the unknown value.
1. The line of regression of X on Y is given as:
(X - X̄) = bxy (Y - Ȳ)
Where X̄ and Ȳ are the means of X and Y respectively, and bxy = the regression coefficient of X on Y,
calculated as bxy = r σx / σy, where r is the correlation coefficient between X and Y, and σx and σy are the
SDs of X and Y respectively.
2. The line of regression of Y on X is given as:
(Y - Ȳ) = byx (X - X̄)
Where X̄ and Ȳ are the means of X and Y respectively, and byx = the regression coefficient of Y on X, calculated
as byx = r σy / σx, where r is the correlation coefficient between X and Y, and σx and σy are the SDs of X and Y
respectively.
To estimate X when Y is known the line of regression of X on Y can be used.To estimate Y when X is known
the line of regression of Y on X can be used.
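The two lines can be built from summary figures alone; in the sketch below the means, SDs and r are hypothetical
values chosen for illustration, and the function name is ours:

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) for X on Y (X = a + b*Y) and for Y on X (Y = a + b*X)."""
    bxy = r * sd_x / sd_y              # regression coefficient of X on Y
    byx = r * sd_y / sd_x              # regression coefficient of Y on X
    x_on_y = (mean_x - bxy * mean_y, bxy)
    y_on_x = (mean_y - byx * mean_x, byx)
    return x_on_y, y_on_x

# hypothetical summary figures, for illustration only
(x_a, x_b), (y_a, y_b) = regression_lines(50, 85, 4.0, 10.0, 0.8)
x_est = x_a + x_b * 85                 # estimate X when Y = 85
```

As a sanity check, estimating X at Y equal to the mean of Y returns exactly the mean of X (here 50), which both
regression lines must satisfy since each passes through the point of means.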
Example:
1. Find the correlation coefficient, construct the two lines of regression, and estimate X when Y = 85 for the
following data:
1. 52 88
2. 57 92
3. 48 78
4. 45 72
5. 49 74
6. 51 98
7. 53 94
Solution:
Given that, n=7, X= weight and Y=DBP
---------------------------------------------------------------------------------------------
X Y X² Y² XY
---------------------------------------------------------------------------------------------
52 88 2704 7744 4576
57 92 3249 8464 5244
48 78 2304 6084 3744
45 72 2025 5184 3240
49 74 2401 5476 3626
51 98 2601 9604 4998
53 94 2809 8836 4982
------------------------------------------------------------------------------------------
ΣX=355 ΣY= 596 ΣX² = 18093 ΣY² = 51392 Σ XY=30410
(Y - 85.71) = -0.092X + 4.66, i.e. Y = -0.092X + 4.66 + 85.71. Thus, Y = -0.092X + 90.37
Now, to estimate X (weight) when Y (DBP) is given as 85 use Line of regression of X (Weight) on Y (DBP) as follows:
X = -0.46Y +90.14 X estimate = - 0.46 * 85 +90.14 = - 39.10 + 90.14 Thus, X estimate = 51.04
Thus, the estimated weight=51.04 when DBP =85
HEALTH STATISTICS
I. DEFINITION
“Health Statistics" is a specialized branch of Statistics that relates to the application of numerical methods to
all matters that have direct or indirect influence upon or relationship with health and are required for health
planning, services and reporting. In other words, it includes all statistical information required for the
administration of a health agency like health care providers, recipients of health care and health seeking
behaviors, other infrastructure facilities like hospitals, Clinics (MCH, STD etc), Blood banks, health expenditure
etc, and also the statistics required to assess the health status of people like vital statistics, demography,
morbidity statistics, hospital statistics, and the socio-economic, political, spiritual and environmental factors
which influence health. Thus, health statistics are often described as the "eyes and ears" of public health.
The statistics can be used to answer the following questions, which every Public Health Personnel would
encounter while delivering the services -
How many people suffer from particular diseases, how often and for how long.
What demands these diseases place on the medical and public health resources; and what financial loss they
cause;
How fatal the different diseases are;
To what extent these diseases prevent people from carrying out their normal activities.
To what extent diseases are concentrated in particular groups of the population e.g., according to age, sex,
ethnic group, occupation or place of residence;
How far the above factors vary from time to time.
What is the effect of medical care and health services on the control of disease incidence.
Health status of persons and population in a given area, providing us with indices of vitality and health;
The physical, environmental and other conditions and factors having a more or less direct bearing on the
health status of the population - indices of social and environmental factors.
Health services and activities directed at the improvement of health conditions - indices of health activity
and facilities.
Systems organized on a national scale to obtain information continuously from each household or institution
that is census, registration of vital events, notification of diseases, disease surveillance registry (National
Cancer Registry, National Tuberculosis Registry), national population surveys (National Sample Surveys),
MIS of national health programmes.
Records of medical & health institutions providing service to the community.
Surveys or investigations conducted in response to the need for more detailed information. ex. Nutritional
surveys, epidemiological investigations, field trials of vaccines.
Miscellaneous eg. Physician case records, police record on accidents/ injuries/ suicides/ homicides etc,
meteorological data - temp, rainfall, humidity, air quality etc, morbidity records of industrial units, schools,
records of statutory bodies (DMER, MCI, DCI, FDI etc.) and information on social, economic or
occupational factors affecting health, health budget and expenditure etc.
V. DETAILS OF SOME IMPORTANT SOURCES:
Census: According to the United Nations, the census is defined as "a process of collecting, compiling and
publishing demographic, economic and social data pertaining to all persons in a country at a specified time."
The purpose of census is to provide required information for planning and administering developmental
activities, including health. The indices normally calculated for planning health services from census
include - birth rates, death rates, sickness rates, literacy rates, age at marriage, expectation of life, age and
sex composition, urban and rural distribution of population, language, place of birth and nationality, amount
of disability, fertility data (number of children born and remaining alive), distribution of population by
occupation, housing etc, and rate of increase of population.
Earlier, the responsibility for registration of these vital events rested with the village police Patil/revenue
official. After the Panchayat Raj Act of 1961, the responsibility shifted to the village secretary (gramsevak).
However, the completeness of registration has not improved. Subsequently, CHGs and Health Assistants (HAs) at the
peripheral level, the in-charge M.O.s of PHCs at the intermediate level, and DHOs at the district level have been
involved in collecting this information, though the event is registered by the gramsevak. In urban areas the
responsibility for registration lies with Municipal Councils or Corporations, as the case may be, while the CMO of
the municipal hospital also collects the information and forwards it to the concerned state authorities.
Usually, sickness rates are measured in terms of "persons", "illnesses" or "spells":
a) Incidence rate (spells): {Total no. of new spells of illness during a defined period / population exposed to
risk in the same period} × 1000
b) Incidence rate (persons): {Total no. of new persons who become ill at least once in a defined period /
population exposed to risk in the same period} × 1000
c) Period prevalence rate: {Total no. of new and old cases found during a specified period / population exposed to
risk in the same period} × 1000
d) Point prevalence rate: {Total no. of new and old cases found at a particular point of time / population exposed
to risk at the same point of time} × 1000
e) Fatality ratio: Total no. of deaths from a disease / No. of new cases of that disease
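All of the "per 1000" rates above share the same shape, which a one-line helper makes explicit; the counts below
are hypothetical figures for illustration only:

```python
def rate_per_1000(events, population_at_risk):
    """Generic morbidity rate: (events / population exposed to risk) x 1000."""
    return events / population_at_risk * 1000

# hypothetical figures, for illustration only
incidence_spells = rate_per_1000(45, 15000)    # 45 new spells of illness in a defined period
point_prevalence = rate_per_1000(120, 15000)   # 120 new + old cases at one point of time
print(incidence_spells, point_prevalence)      # rates per 1000 population at risk
```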
Often, routinely collected data from health service records and from other sources do not provide a complete
description of the population suitable for use in health service planning. On such occasions carefully planned
health surveys may be used to collect additional information.
A Health Survey
A health survey is a planned study to investigate the health characteristics of a population. It is used to
measure the total amount of illness in the population,
measure the amount of illness caused by a specified disease,
study the nutritional status of the population,
examine the utilization of existing health care facilities and demand for new ones,
measure the distribution in the population of a particular characteristic e.g, Hb level, practices of
brushing the teeth, habits etc.
examine the role and relationship of one or more factors in the etiology of a disease.
"Vital statistics" denotes facts systematically collected and compiled in numerical form relating to, or derived
from, records of vital events, namely live birth, death, foetal death, marriage, divorce, adoption, legitimation,
recognition, annulment, or legal separation. In essence, vital statistics are derived from legally registrable
events, without including population data or morbidity data.
1) To describe the level of community health. 2) To diagnose community ills and determine the met and unmet
health needs. 3) To disseminate reliable information on the health situation and health programmes. 4) To direct
or maintain control during the execution of a programme. 5) To develop procedures, definitions and techniques such
as recording systems, sampling schemes etc. 6) To undertake overall evaluation of health programmes and public
health work.
For instance, carefully compiled causes of death in a city can answer:
1) What is the leading cause of death in the city (malaria, TB etc.)? 2) At what age is mortality highest, and from
what disease?
3) Which sections of the city (women, children, or individuals following certain occupations) are the most
unhealthy, and what is the outstanding cause of death there? 4) How do cities compare in relation to their health
status and the health facilities available to cope with the problem?
5) Total Fertility Rate (TFR): The average number of children that would be born alive to a woman during her
lifetime if she were subjected to the age-specific fertility rates of a given year.
6) Gross Reproduction Rate (GRR): The average number of female live births that would be born to a woman during
her lifetime if she were subjected to the age-specific fertility rates of a given year.
7) Net Reproduction Rate (NRR): The average number of female live births that would be born to a woman during her
lifetime if she were subjected to the age-specific fertility and mortality rates of a given year.
B) Mortality Rates :
a) Crude death rate (CDR): To measure the decrease of population due to death, the rate commonly used is
the CDR.
CDR = {No. of deaths in a given area and period / Mid-year population} × 1000
b) Specific death rates: Specific death rates include age-specific (infant, neonatal, geriatric), sex-specific,
vulnerable-group-specific (maternal cases), disease-specific etc.
i) Infant Mortality Rate: This is one of the most sensitive indices of the health conditions of the general
population. It is a sensitive measure because a baby in its extrauterine life is suddenly exposed to a multitude of
new environmental factors, and its reactions to them are reflected in this rate. Under ideal conditions of social
welfare no normal baby should die.
IMR = (No. of deaths under 1 year of age / No. of live births) × 1000
Perinatal Mortality Rate: {Late foetal deaths (20 weeks or more) + deaths under one week / live
births + late foetal deaths} X 1000
vi) Maternal Mortality Rate: The risk of dying from causes associated with childbirth is measured by the
maternal mortality rate. MMR = {No. of deaths due to delivery, childbirth and the puerperium / No. of live
births} × 1000
vii ) Cause-of-death rate : This rate is calculated to understand, which cause/disease is commonly responsible
for mortality in the community/population.
i) Life Table: William Farr called the life table the "biometer" of the population. A life table is
composed of several sets of values showing how a group of infants, all supposed to be born at the same time and
experiencing specified mortality conditions, would gradually die out. Such tables can be constructed
separately for males/females, occupational groups, population segments, or geographical subdivisions of a country.
The table is constructed showing the survival and deaths occurring in a generation of 100,000 babies. On the basis
of the mortality rate operating at the time under study, the number of babies who would be alive at the first
birthday is estimated. By applying the mortality rate of the second year of life to the number surviving at the
end of the first year, we estimate the number surviving at the end of the second year, and similarly for later
ages. From these values we can also calculate the average lifetime a person can expect to live after any age.
ii) Physical Quality of Life Index (PQLI): This is the average of the Infant Mortality Rate, the literacy rate
and life expectancy at birth.
iii) Human Development Index (HDI): The HDI is based on three indicators: longevity, as measured by life
expectancy at birth; educational attainment, as measured by a combination of adult literacy (two-thirds weight)
and combined primary, secondary and tertiary enrolment ratios (one-third weight); and standard of living, as
measured by real GDP per capita (purchasing power parity, PPP).
COMPUTERS IN MEDICINE
INTRODUCTION:
The dictionary calls a computer an electronic device that stores, retrieves, and processes information. Thus, a
computer is a machine that can store large volumes of data and manipulate them using arithmetic and logical
methods.
Salient features
HISTORY :
The earliest version was developed by Blaise Pascal in 1642. The machine worked on wheels and gears and could
perform only additions.
In 1694 G. W. Leibniz devised a machine for other mathematical operations. Charles Babbage's (1833) analytical
engine remained on paper, but the concept was accepted after his death.
In 1946, the first electronic computer was developed by J. Presper Eckert and John Mauchly. It weighed 30 tons,
occupied 15,000 sq. ft, and could perform 300 multiplications per second.
Computer Generations
First generation - 1946; bulky, could carry out 5,000 basic arithmetic operations per second. Used vacuum tubes as
the main logical units.
Second generation - 1959; transistors replaced vacuum tubes, so computers occupied less space, consumed less
power, and were faster and more accurate.
The advent of the silicon chip, which could accommodate hundreds of transistors, made computers still smaller and
faster. Supercomputers carry out 500 lakh (50 million) instructions per second, but they still work too slowly to
approximate the higher forms of human thought involving rapid association and analysis of ideas.
CPU
Hardware is the physical components of the computer, the things you can see and touch.
Computer Languages :
Application in Medicine
In health care, computers were tested as early as the 1960s, primarily in biomedical research.
The last decade has seen the computer move out of the laboratory into the routine clinical environment.
Medical informatics comprises the theoretical and practical aspects of information processing and
communication, based on knowledge and experience derived from processes in medicine and health care.
Applications include:
- development and use of diagnostic models using truth tables, decision trees, multivariate statistics (Bayes'
theorem) and expert systems;
- recognition of objects and patterns in images and signals, as in X-ray and ECG interpretation and cell,
chromosome or cervical smear recognition;
- models of cardiovascular physiology in terms of mechanical (flows, pressures, volumes) and electrical
(depolarisation and repolarisation) parameters;
- epidemiology;
- expert systems using AI.