Statistics Notes
MEDICAL STATISTICS
SYLLABUS POINTS
Vital Statistics.
Medical Statistics – Dr. Suhas Kumar Shetty
BRANCHES OF STATISTICS
Descriptive statistics.
Inferential statistics.
DESCRIPTIVE STATISTICS
It refers to the various statistical measures used to describe the
characteristics of data. From this type of statistics we cannot conclude
INFERENTIAL STATISTICS
It refers to various statistical measures that are used to draw some valid
e.g. Tests of significance such as the t-test, F-test, z-test, chi-square test, etc.
APPLICATION OF STATISTICS
Science with statistical support will yield fruitful results (i.e. will achieve
its maximum outcome).
The science of statistics can be applied to any of the scientific fields like
on.
When the statistical methods or science of statistics are applied for public
Biostatistics or Biometry.
BIOSTATISTICS
variations.
VARIABLE
Variables.
BIOMETRY
Bio + Metry.
HEALTH STATISTICS
MEDICAL STATISTICS
VITAL STATISTICS
When statistics is applied in the field of demography (i.e. the study of
population) and its important events like – birth, death, mortality rate, fatality
Ayurveda deals with four types of Ayu, i.e. Hitayu, Sukhayu, Ahitayu and
Dukhayu.
So, it can be concluded that both biometry and Ayurveda deal with the
measurement of life.
Biometry can be applied in various fields of Ayurvedic research, such as
literary studies, pharmacological studies, clinical studies, survey studies, etc.
Some of the common applications of the Biostatistics are as follows –
TO SIMPLIFY OR CONDENSE HUGE DATA
Collection of the lakshanas of various diseases.
Collection of lakshanas as per Poorvaroopa, Roopa, Upadrava, Asadhya
lakshana, Arishta lakshana, etc. (i.e. Hetu kosha, Lakshana kosha)
Literary study on Prakriti – Collection of various factors about Prakriti and
classifying them according to the physical factors, psychological factors,
Shadanga shareera, etc.
Vyadhi Kshamatwa – Collecting the concept of Bala from various texts and
dividing it according to its basis of division, i.e. Sahaja bala, Kalaja bala and
Yuktikrita bala.
TO TEST THE HYPOTHESIS
To re-evaluate the concepts mentioned in the classics.
e.g. When a scholar planned a research work to evaluate the effect of
Kavalagraha in Mukhapaka with a given medicine but with varying durations of
Kavalagraha (i.e. 5 minutes, 10 minutes, 15 minutes, etc.). On the basis of the
statistical results obtained, the scholar can finally draw a conclusion and
standardize the duration of the Kavalagraha procedure for that condition.
TO STUDY THE RELATIONSHIP BETWEEN 2 OR MORE VARIABLES
This can be done with the help of the concept of correlation.
e.g. When a scholar planned a research work to evaluate the effect of
Kavalagraha in Mukhapaka with a given medicine but with varying durations of
Kavalagraha (i.e. 5 minutes, 10 minutes, 15 minutes, etc.). On the basis of the
statistical results obtained, the scholar can finally draw a conclusion and
standardize the duration of the Kavalagraha procedure for that condition.
Studies of the relationship between the number of cigarettes smoked per day
and the life span of smokers, etc. can also be undertaken.
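Correlation between two variables is commonly measured with Pearson's correlation coefficient r. The sketch below is illustrative only: the function name `pearson_r` is our own, and the cigarette and life-span figures are hypothetical numbers, not data from these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical figures, for illustration only
cigarettes_per_day = [0, 5, 10, 20, 30]
life_span_years = [80, 77, 74, 70, 65]
r = pearson_r(cigarettes_per_day, life_span_years)
print(f"r = {r:.3f}")
```

An r close to −1 indicates a strong negative relationship (more cigarettes, shorter life span in this made-up data). An r close to 0 indicates little linear relationship.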
TO PREDICT FUTURE EVENTS (i.e. to assess future trends)
LIMITATIONS OF STATISTICS
Statistics deals with quantitative characteristics rather than qualitative data.
e.g. Statistics can give the number of books in a library, but not the number
of good-quality books.
Statistics does not deal with an individual or a single observation; its results
hold true on average.
e.g. In class A, 3 students scored 35, 35 and 35 marks respectively. The
mean score of the class will be (35 + 35 + 35) / 3 = 105 / 3 = 35.
In class B, 3 students scored 78, 22 and 5 marks respectively. The mean
score of the class will be (78 + 22 + 5) / 3 = 105 / 3 = 35.
Though the average is the same in both groups, the individual values
differ. This is a limitation of statistics: it deals with the group, not with an
individual entity. The same average marks in both classes do not mean that all
the students scored similar marks. This limitation can, however, be overcome
by the concept of dispersion.
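The two classes can be compared with a short Python sketch using the standard `statistics` module: the mean is identical, but a measure of dispersion separates them. The population standard deviation (`pstdev`) is used here as one common choice of dispersion measure.

```python
import statistics

class_a = [35, 35, 35]
class_b = [78, 22, 5]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    mean = statistics.mean(marks)    # same in both classes
    sd = statistics.pstdev(marks)    # population standard deviation
    print(name, "mean =", mean, "SD =", round(sd, 2))
```

Class A has a standard deviation of 0 (every student scored the same), while class B has a large one, even though both means are 35.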
Statistical results may be distorted by various forms of research bias
(physical, biochemical, analytical, methodological, etc.), i.e. errors in
conducting research.
e.g. Errors made by researchers, errors in methodology, errors in analysis,
errors in the collection and calculation of data, etc.
Statistics can be misused, and wrong statistical methods can be used to
manipulate the conclusions.
e.g. “Fewer accidents are committed by females as compared to males.” Out of
1000 male riders, 15 had an accident. Out of 100 female riders, 3 had an
accident. Numerically, the number of accidents seems to be higher in males,
but it is wrong to make the above statement, because the two groups are not
of the same size. The incidence among male riders is 15 / 1000 = 1.5%, and
among female riders it is 3 / 100 = 3.0%. So, if we express the incidence per
1000 riders, males have 15 accidents while females have 30. It is clear that
female riders are more prone to accidents, so the above statement is
statistically wrong.
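The correction above amounts to comparing rates rather than raw counts. A minimal Python sketch (the helper name `incidence_per_1000` is our own) using the rider figures from the example:

```python
def incidence_per_1000(cases, population):
    """Number of cases per 1000 persons at risk."""
    return cases * 1000 / population


male = incidence_per_1000(15, 1000)   # 15 accidents among 1000 male riders
female = incidence_per_1000(3, 100)   # 3 accidents among 100 female riders
print("Males:", male, "per 1000 riders")
print("Females:", female, "per 1000 riders")
```

Multiplying before dividing keeps these integer inputs exact, so the output shows 15.0 for males and 30.0 for females.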
DATA
figures, numbers or a set of values recorded in one or more observational
queries.
OBSERVATIONAL UNITS
OBSERVATIONS
e.g. Measuring the Blood pressure is the event & the measured blood pressure
It should be – (CURA)²
Complete.
Comparable.
Up to date.
Understandable.
Reliable.
Relevant.
Accurate.
Available easily.
CLASSIFICATION OF DATA
CONTINUOUS DATA
Data collected using a measuring instrument and expressed as whole numbers,
fractions or decimals are called continuous data.
e.g. Weight of newborns in a hospital – 2.8 kg, 3.5 kg, etc.
Hb% of the patients – 8.6 gm%, 11.5 gm%, etc.
CLASSIFICATION OF DATA BASED ON FUNCTIONAL CLASSIFICATION
PRIMARY DATA
Data collected for the very first time, original in nature, under the control
and supervision of the medical investigator, are called primary data.
e.g. A research scholar collecting data for thesis work; the number of family
planning operations conducted in a P.H.C., etc.
SECONDARY DATA
Data which are not collected by the investigator but are derived from other
reliable sources are referred to as secondary data.
e.g. The D. H. O. collects information about the number of tuberculosis
patients in a district.
A doctor wants to study the relationship between smoking and heart disease
based on data published in Indian medical journals, etc.
RELIABLE SOURCE OF DATA
Data collected from a reliable source such as government offices, standard and
recognized institutes, national and international organizations, etc.
The National Level – Various ministries under the Government of India.
e.g. Ministry of Health and Family Welfare, Ministry of Women and Child
Development, etc.
The State Level – Various ministries running under the state Government
under the control of Central Government.
The District Level – District / Community hospitals running under the
control of state government respective ministries.
The Local Level – Recognized hospitals, NGOs, private organizations, etc.
Standard indexed journals and publications such as the BMJ, etc.
VARIABLE
A characteristic that takes on different values in different persons, places
or things.
CONSTANT
Quantities that do not vary in a given set of observational data. They do not
require statistical study. (e.g. S.D., S.E., mean, C.C. of a given set of data.)
POPULATION
All the elements, such as persons, things or measurements, in which we are
interested at a particular time.
SAMPLE
A part of the population, i.e. a group of sampling units.
SAMPLING UNIT
Each member of a population.
PARAMETER
Summary value or constant of a variable that describes the population, such
as the mean, C.C., etc.
STATISTIC
Summary value that describes the sample, such as its mean, S.D., S.E., etc.
PARAMETRIC TEST
A test in which population constants (parameters) such as the mean, variance,
C.C., etc. are used.
NON-PARAMETRIC TEST
Tests, such as the χ² test, in which no population constant is used. The data
need not follow any specific distribution, and no assumptions about it are made.
e.g. To classify values such as good, better and best.
COLLECTION OF DATA
DEFINITION
The various methods by which the necessary samples or data are collected for
the study in a systematic manner, depending upon the need / requirement of the
researcher.
SOURCE OF COLLECTION OF DATA
There are 3 main sources.
Experiments
Surveys
Records
EXPERIMENTS
Various experiments are conducted for investigation and fundamental
research based on the basic principles of particular science.
The data is collected with specific objectives and the results obtained are
used in the preparation of dissertation, thesis, research paper, journal articles,
etc.
SURVEY
It is used in epidemiological studies to find out the incidence or prevalence
of health or disease in a community.
Surveys provide useful information on –
Changing trends in health status, morbidity, mortality, etc.
Feedback, which helps to plan, alter or modify the policies run by the
Government or any other authority.
RECORDS
These are maintained for a long period of time in registers or books of the
concerned departments, like the Central Government, State Government, etc.
These are used for various purposes like Vital statistics, demography, etc.
METHODS OF COLLECTION OF DATA
It is important to identify whether the data are primary or secondary before
we start the collection. The important methods of collection of data are –
Observational
Interview
Questionnaire
Experimental
OBSERVATIONAL METHOD OF DATA COLLECTION
General (casual) observation does not count as scientific observation.
Observation is a scientific tool and a systematic method of collecting data
(i.e. in view of the objective of the researcher).
Types
Based on the systematic plan and organization of the researcher,
observation is divided into 2 categories –
Structured
Unstructured
STRUCTURED OBSERVATION
If data collection is done in a systematic manner, fulfilling all
pre-requisites, it is called Structured Observation.
Most research uses this type of observation.
UNSTRUCTURED OBSERVATION
If a systematic approach is not taken towards data collection, it is called
Unstructured Observation.
Types of Observation
Based on the involvement of the observer, observation is divided into –
Participant Observation
Non-participant Observation
PARTICIPANT OBSERVATION
When the observer becomes a part of the sample and understands its emotional,
socio-cultural and occupational background, it is called Participant
Observation.
e.g. A research scholar conducting research in his native area is doing
participant observation, because the observer is a native of that area and is
aware of the emotional, socio-cultural and occupational background of the
samples.
NON PARTICIPANT OBSERVATION
When the observer is not a part of the sample and does not understand its
emotional, socio-cultural and occupational background, it is called
Non-participant Observation.
In this type of observation, the chances of bias are greater.
e.g. An Indian research scholar conducting research in London, a setting
totally different from his own, is doing non-participant observation, because
the observer is not a part of that area and is not aware of the emotional,
socio-cultural and occupational background of the samples.
Benefits / Merits
Subjective bias is eliminated in participant observation.
Independent of the respondent's willingness.
Active co-operation of the respondent is not needed.
De-merits
Limited information.
Some unforeseen / hidden factors may interfere with the observation.
INTERVIEW METHOD
It is a form of interrogation / communication based on stimuli and response
or questions and answers.
It is of 2 types –
Direct personal investigation.
Indirect oral examination.
DIRECT PERSONAL INVESTIGATION
It is a form of investigation where the interviewer relies on the words of
the interviewee.
INDIRECT ORAL EXAMINATION
It is a form of examination where the information from the interview is
cross-checked with a related person.
e.g. Paediatric examination, Psychiatric examination, CBI investigations, etc.
Characteristics of Interviewer
Interviewer should be – polite, honest, sincere, impartial, technically
competent with the necessary practical experience, and friendly with the
interviewee.
Guidelines for interviewer
Interviewer should know the problem and be well prepared.
Always have a good setup. (Cool and calm)
Have friendly and informal talks.
Have curiosity and respect.
Ask well-phrased questions.
Should not hurt the interviewee.
The matter must be kept confidential.
Merits
More detailed information can be obtained.
Greater flexibility to restructure the questions.
De-merits
Respondent / Subjective bias.
Time consuming.
QUESTIONNAIRE METHOD
It is a method where the questions are given and the respondent is asked
to reply to them according to the instructions.
It is of 2 types –
Given
Posted
GIVEN
In this type of questionnaire method, a set of questions is prepared and
given to the respondent. Sufficient time is given to the respondent to answer
the questions.
POSTED
In this type of questionnaire method, a set of questions is prepared and
sent to a distant respondent. Sufficient time is given to the respondent to
answer the questions, and the respondent is asked to post the answers back to
the observer. This method has a low return rate.
GUIDELINES FOR QUESTIONNAIRES
Questions should be simple, clear, understandable and related to the topic
or problem.
Decide either closed end or open end or even both types of questions.
Maintain the sequence (order) of questions (i.e. From general to complex)
Questions should not relate to personal character / wealth.
Questions should not hurt the person.
Avoid questions which put too much strain on one’s memory or intellect
(i.e. questions should suit the qualification and I.Q. of the respondent).
Merit
Time saving.
Low cost.
Large sample can be taken.
Sufficient time to answer.
Best method for respondents who cannot be approached in person.
De-Merits
Can be used only with educated and co-operative respondents.
Low return rate, especially in the posted method.
Doubt whether the answers are the respondent's own version.
EXPERIMENTAL METHOD
The method in which various experiments or measuring instruments are
adopted for the collection of data is called the Experimental method.
Merits
An ideal objective parameter.
Beneficial in comparison.
Lack of subjective bias.
De-merits
Expensive.
Chance of observer bias.
Sometimes it may give false positive results.
Hence, it is very important to correlate the investigative values with the
clinical presentation.
It includes sorting, i.e. classification and presentation of data.
CLASSIFICATION
Definition
The grouping, arrangement or division of data based on similar or
dissimilar characteristics, to facilitate easy analysis and condensation of huge
data, is called classification of data.
Types
Based on the number of attributes / characteristics it is divided into 2 types.
Simple
Manifold
SIMPLE CLASSIFICATION
If the classification is based on a single attribute / characteristic, it is
called simple classification.
e.g. Classification based on any one attribute, such as age, sex, religion,
nutritional status, etc.
Table showing the number of patients in different age groups.
Sl. Age groups Number of patients
01. 10-20 15
02. 20-30 23
03. 30-40 24
MANIFOLD CLASSIFICATION
If the classification is based on 2 or more attributes, it is called
Manifold classification.
e.g. Classification based on two or more of age, sex, religion, nutritional
status, etc.
Table showing the number of patients according to sex, age groups and their
nutritional status.
Sl. Sex No. of Age No. of Nutritional No. of
Pt.’s Pt.’s status Pt.’s
01. Male Children Normal nutrition 08
26 Under nutrition 16
Over nutrition 02
Adulthood Normal nutrition 19
30 36 Under nutrition 12
Over nutrition 05
Adult Normal nutrition 32
48 Under nutrition 15
Over nutrition 01
02. Female Children Normal nutrition 19
26 Under nutrition 12
Over nutrition 05
Adulthood Normal nutrition 32
36 Under nutrition 15
Over nutrition 01
Adult Normal nutrition 08
48 Under nutrition 16
Over nutrition 02
Definition
Systematic representation of the data which have been collected and
classified, in the form of tables or drawings (graphs / diagrams), is called
presentation of data.
IDEAL PRESENTATION
It should be simple and systematic, to arouse interest.
It should be concise, but there should not be any omission / deletion of
data.
It should be arranged in a logical or chronological manner.
It should be useful for further analysis.
OBJECTIVES / USE OF PRESENTATION OF DATA
Easy and better understanding.
Helpful in future analysis.
Easy for comparison.
It gives first-hand information.
It is an attractive and appealing way of presentation.
Types of presentation
Presentation can be made mainly in 2 forms –
Tables (Tabulation / Frequency Distribution Tables, FDT)
Drawings (Graphical Presentation / Frequency Distribution Drawings, FDD)
TABULATION / FREQUENCY DISTRIBUTION TABLE / FDT / TABLES
The systematic presentation of data in rows and columns is called an FDT
(Frequency Distribution Table / Tabulation).
Tabulation is a process by which the data of a long series of observations are
systematically organized and recorded, so as to enable analysis and
interpretation.
CHARACTERISTICS OF FREQUENCY DISTRIBUTION TABLE (FDT)
It should be simple and clear cut.
The title of the Frequency Distribution Table (FDT) should be expressed in
appropriate terms.
The figures / numbers in the body of table should be arranged in logical
manner.
If several points are to be emphasized from the same data, make several
small tables.
LOWER LIMITS
It is a starting / first value of the class.
e.g. In the class 20-30, 20 is the lower limit of the particular class.
UPPER LIMIT
It is the last / ending value of the class.
e.g. In the class 20-30, 30 is the upper limit of the particular class.
CLASS MID POINT
It is a single representative value of the class, which is used for further
statistical calculations.
It is calculated by 2 methods –
1st method: $\text{Class midpoint} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$
2nd method: $\text{Class midpoint} = \frac{\text{Lower limit (of the class)} + \text{Lower limit (of the next class)}}{2}$
By the 1st method, the class midpoint of the class 20-30 will be –
(20 + 30) / 2 = 50 / 2 = 25.
By the 2nd method, for the classes 20-30, 30-40, the class midpoint of 20-30 will be –
(20 + 30) / 2 = 50 / 2 = 25.
Of these, the 2nd method of calculating the class midpoint is the better
way for inclusive type of tables.
CLASS FREQUENCY
The number of observations falling in a particular class is called the class
frequency.
The sum of all class frequencies gives the total number of observations.
e.g. The class frequency of 20-30 is 6.
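The two midpoint methods and the counting of class frequencies can be sketched in Python as below. The function names are our own, and the list of ages is hypothetical, chosen only to exercise the counting; classes are treated as exclusive (upper limit not included), as in the exclusive method described in these notes.

```python
def midpoint_limits(lower, upper):
    """1st method: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


def midpoint_next_lower(lower, next_lower):
    """2nd method: (lower limit + lower limit of the next class) / 2."""
    return (lower + next_lower) / 2


def class_frequencies(values, limits):
    """Count values per exclusive class lower <= v < upper."""
    counts = {(lo, hi): 0 for lo, hi in limits}
    for v in values:
        for lo, hi in limits:
            if lo <= v < hi:
                counts[(lo, hi)] += 1
                break
    return counts


print(midpoint_limits(20, 30))       # class 20-30
print(midpoint_next_lower(20, 30))   # classes 20-30, 30-40

ages = [12, 25, 29, 30, 31, 38, 22, 27, 20, 45]   # hypothetical ages
freq = class_frequencies(ages, [(10, 20), (20, 30), (30, 40), (40, 50)])
print(freq)
```

Note that 30 is counted in 30-40, not 20-30, and the class frequencies add up to the total number of observations (10).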
METHOD OF CONSTRUCTION OF CLASSES
There are 3 methods of constructing classes.
Exclusive
Inclusive
Open end method
EXCLUSIVE METHOD
The upper limit of the class is excluded (i.e. it is not a part of that
class). The upper limit of a class is the lower limit of the next class.
It is used for discrete or continuous data.
e.g. 0-10, 10-20, 20-30, etc. Here, the upper limit of one class continues as
the lower limit of the next class.
INCLUSIVE METHOD
The upper limit of the class is included (i.e. it is a part of the same class).
The upper limit of a class will not be the lower limit of the next class,
because it is included in the same class itself.
It is used for discrete data.
e.g. Number of children per family.
OPEN END
When the lower limit of the first class, the upper limit of the last class, or
both are not fixed, it is called the open end method.
It is used to accommodate a few extremely low or high values.
e.g. 0, 3, 5, 50, 20, 27, 26, 244487, 6, 89, 984526.
TYPES OF TABLES / FREQUENCY DISTRIBUTION TABLE
There are 3 common types of frequency distribution table (FDT).
Ordinary frequency distribution table (FDT)
Relative frequency distribution table (FDT)
Cumulative frequency distribution table (FDT)
ORDINARY FREQUENCY DISTRIBUTION TABLE (FDT)
It is a type of frequency distribution table (FDT) in which the observations /
classes are arranged with their respective frequencies.
Uses :
It is simple and gives an easy understanding of a large data set at a glance.
RELATIVE FREQUENCY DISTRIBUTION TABLE (FDT)
It is a type of frequency distribution table (FDT) in which the frequency of
each class is expressed as a fraction, decimal or percentage.
It is calculated by dividing the frequency of the class by the total number
of frequencies (i.e. the total number of observations).
Uses :
It facilitates the comparison of 2 or more sets of data.
It constitutes the basis of understanding the concept of probability.
CUMULATIVE FREQUENCY DISTRIBUTION TABLE (FDT)
It adds the frequency starting from the first class to the last class.
The cumulative frequency of a given class is the total of the frequencies of
all previous classes, including that particular class.
Uses
To calculate 'more than' and 'less than' values of a given observation / class.
For further statistical calculations like median.
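All three kinds of table can be produced from one list of class frequencies. This Python sketch reuses the waiting-time data of 20 patients that appear in the arithmetic mean section of these notes; `itertools.accumulate` gives the running total for the cumulative column.

```python
from itertools import accumulate

# Waiting-time classes (minutes) and number of patients
classes = ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30"]
freqs = [3, 2, 3, 5, 3, 4]

total = sum(freqs)
relative = [f / total for f in freqs]    # relative frequency = f / total
cumulative = list(accumulate(freqs))     # running total, first to last class

print(f"{'Class':>6} {'f':>3} {'Rel. f':>7} {'Cum. f':>7}")
for c, f, rf, cf in zip(classes, freqs, relative, cumulative):
    print(f"{c:>6} {f:>3} {rf:>7.2%} {cf:>7}")
```

The last cumulative frequency equals the total number of observations (20), and the relative frequencies add up to 1 (100%).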
Presentation of data in the form of a graph or diagram is known as drawing,
graphical presentation or Frequency Distribution Diagram.
Generally, graphs are used to represent quantitative data, whereas
diagrams are used to represent qualitative data.
GRAPH
These are commonly used frequency distribution drawings. They are of 6
types, viz. –
Histogram
Frequency polygon
Frequency curve
Line graph (Chart)
Cumulative frequency diagram (Ogive)
Dot or scattered diagram
HISTOGRAM
It is also called a Block Diagram. It is a type of area diagram in which the
variable or characteristic is plotted on the X axis (abscissa) and frequencies
are marked on the Y axis (ordinate).
A continuous series of rectangles is formed, which is called a
Histogram. The width of the bars may vary.
e.g. Mantoux test of 206 patients.
The result of the Mantoux test in 206 patients is as follows –
Result of the test (mm)   Number of patients
08 – 10                   24
10 – 12                   52
12 – 14                   42
14 – 16                   48
16 – 18                   12
18 – 20                   8
20 – 22                   14
22 – 24                   6
Total                     206
(Figure: Histogram showing the result of the Mantoux test in 206 patients. X axis (abscissa) = result of Mantoux test in mm, scale 1 cm = 2 mm; Y axis (ordinate) = number of patients, scale 1 cm = 10 patients.)
FREQUENCY POLYGON
Polygon means a figure with many angles. Joining with straight lines the
midpoints of the class intervals, plotted at the height of their frequencies
(after drawing the histogram), is called a frequency polygon.
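The points of a frequency polygon can be computed from the Mantoux table of the histogram example. This Python sketch assumes a class width of 2 mm. It also closes the polygon on the X axis at the midpoints of the empty classes on either side, a common convention that the notes do not state explicitly.

```python
# Mantoux test classes (lower limits, mm) and number of patients
lower_limits = [8, 10, 12, 14, 16, 18, 20, 22]
width = 2
freqs = [24, 52, 42, 48, 12, 8, 14, 6]

midpoints = [lo + width / 2 for lo in lower_limits]
# Close the polygon at zero frequency one class width before and after
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)
for x, f in points:
    print(x, f)
```

Joining these (midpoint, frequency) pairs with straight lines gives the frequency polygon.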
(Figure: frequency polygon; X axis = height in cms, Y axis = frequency.)
LINE GRAPH OR CHART
The points corresponding to each class or variable are marked against their
frequencies and joined by a smooth line.
It is used to represent a trend in the form of an increase, a decrease or a
fluctuation of the given data.
e.g. Population in millions over various decades. (It can be either ascending
or descending.)
(Figure: line graph; X axis = height in cms, Y axis = frequency.)
DOT DIAGRAM / SCATTERED DIAGRAM
It is generally used in correlation: when more than one variable is to be
compared, this type of diagram is used.
It is applicable when two variables are to be represented together.
One variable is represented on the X axis and the other on the Y axis, and
each pair of observations is plotted as a dot.
It is used in the context of correlation. Therefore, it is also called a
“Correlation Diagram.”
e.g. Height and Weight
(Figure: scatter diagram; X axis = height in cms.)
Diagrams are generally used to present qualitative or discrete data. The
commonly used diagrams are as follows –
01. Bar Diagram
02. Pie Diagram – Sector Diagram
03. Pictogram – Picture Diagram
04. Map Diagram – Spot Map
BAR DIAGRAM
Representation in the form of rectangles of uniform width, with spacing
between them, is called a Bar Diagram. The spacing between two bars should be
½ of the width of a rectangle.
Types of Bar Diagram
01. Vertical Bar Diagram
02. Horizontal Bar Diagram
In a horizontal bar diagram, the variable is represented on the Y axis; in a
vertical bar diagram, the variable is on the X axis and the frequency on the Y axis.
e.g. Attendance of Boys and Girls of 1st year PG class.
Bar diagram can be also classified as –
01. Simple bar diagram
02. Multiple bar diagram
03. Proportionate bar diagram
SIMPLE BAR DIAGRAM
When a single variable is represented as a set of rectangles, it is called a
simple bar diagram.
e.g. Height of Boys of 1st year PG class.
The following graph is an example of VERTICAL BAR DIAGRAM.
(Figure: vertical bar diagram; X axis = height in cms, Y axis = frequency.)
MULTIPLE BAR DIAGRAM
When variables are represented in sets of more than one bar, it is called a
multiple bar diagram.
e.g. Heights of boys in 1st, 2nd year PG.
(Figure: multiple bar diagram; X axis = height in cms, Y axis = frequency.)
PROPORTIONATE BAR DIAGRAM
Useful for comparison; it is represented by subdivisions within the same
rectangle.
e.g. Heights of boys in 1st,2nd and 3rd year PG classes.
(Figure: proportionate bar diagram; X axis = height in cms, Y axis = frequency.)
PIE DIAGRAM
It is also called a sector diagram. The frequencies are represented by a
circle, in which the angle of the sector for each class or observation is the
class frequency divided by the total number of observations, multiplied by 360.
$$\text{Angle of sector (in degrees)} = \frac{\text{Class frequency}}{\text{Total number of observations}} \times 360$$
e.g. Draw a pie diagram of the following data.
Prakriti   Frequency   Calculation       Degrees
Vata       12          12 / 36 x 360     120
Pitta      18          18 / 36 x 360     180
Kapha      6           6 / 36 x 360      60
Total      36                            360
(Figure: pie diagram with sectors V (12), P (18) and K (6).)
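The sector angles in the Prakriti example follow directly from the formula. A minimal Python sketch:

```python
prakriti = {"Vata": 12, "Pitta": 18, "Kapha": 6}
total = sum(prakriti.values())

# Angle of each sector = class frequency / total * 360
# (multiplied before dividing so these integer counts give exact degrees)
angles = {name: f * 360 / total for name, f in prakriti.items()}

for name, degrees in angles.items():
    print(f"{name}: {degrees:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the calculation column of the table.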
Measures of location
Major characteristics of a frequency distribution are –
Measures of central tendency (location, position, average)
Measures of scatter / degree of scatter (dispersion / variability / spread)
Extent of symmetry – if the data are asymmetrical, it is called “skewness,”
which can be of two types –
Positive skewness (right-sided)
Negative skewness (left-sided)
Measures of peakedness – if the distribution is abnormally peaked or flat, it
is called “kurtosis.”
Introduction
It is the most preferred and commonly used measure of central tendency.
It is also called the “Average.”
Definition
It is the sum of all individual observations divided by the total number of
observations.
Types of Series / Problems
There are 2 types of series –
Series
Formula: $\bar{x} = \sum x / n$
The arithmetic mean of the above given set of data can be calculated by 2
methods –
Direct Method
Step Deviation Method
DIRECT METHOD
Formula: $\bar{x} = \sum x / n$
$\bar{x} = \frac{2 + 4 + 7 + 3 + 5 + 6}{6} = \frac{27}{6} = 4.5$
So, the arithmetic mean of the above data is 4.5.
STEP DEVIATION METHOD
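Taking an assumed mean A = 10, the deviations d = x − A are −8, −6, −3, −7, −5
and −4, so $\sum d = -33$ and $\sum d / n = -33 / 6 = -5.5$.
$\bar{x} = A + \sum d / n = 10 + (-5.5) = 4.5$
So, the arithmetic mean of the above data, calculated by SDM, is 4.5.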
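Both calculations for the six observations can be checked in Python. The assumed mean A = 10 is inferred from the figures in the worked example; any value of A gives the same result.

```python
data = [2, 4, 7, 3, 5, 6]
n = len(data)

# Direct method: mean = sum(x) / n
direct = sum(data) / n

# Assumed-mean calculation: mean = A + sum(d) / n, with d = x - A
A = 10                               # assumed mean (inferred from the example)
deviations = [x - A for x in data]
shortcut = A + sum(deviations) / n

print(direct, shortcut)
```

Both methods print 4.5; the shortcut only saves effort when the numbers are large and A is chosen close to the true mean.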
Formula: $\bar{x} = \sum fx / n$
$\bar{x} = 135 / 50 = 2.7$, i.e. approximately 3 children per family.
So, the arithmetic mean of the above data is 2.7, i.e. about 3.
Formula: $\bar{x} = \sum fx / n$
e.g. Following are the waiting times (in minutes) of 20 patients to consult a
physician in a clinic –
Class     Frequency (f)   Class midpoint (x)   fx
0 – 5     3               2.5                  7.5
5 – 10    2               7.5                  15
10 – 15   3               12.5                 37.5
15 – 20   5               17.5                 87.5
20 – 25   3               22.5                 67.5
25 – 30   4               27.5                 110
Total     20                                   325
The arithmetic mean of the above given set of data can be calculated by 2
methods –
Direct Method
Step Deviation Method
DIRECT METHOD
Formula: $\bar{x} = \sum fx / n$
$\bar{x} = 325 / 20 = 16.25$, i.e. approximately 16 minutes per patient.
So, the arithmetic mean of the above data is 16.25 minutes (approximately
16 minutes).
STEP DEVIATION METHOD
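The step deviation method is not worked out here for the waiting-time data, so the following is a minimal Python sketch. It assumes the usual step deviation formula $\bar{x} = A + (\sum fd' / n) \times c$ with $d' = (x - A) / c$. The assumed mean A = 17.5 and class interval c = 5 are our choices, not values from the notes; the direct method is shown alongside for comparison.

```python
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]   # class midpoints (x)
freqs = [3, 2, 3, 5, 3, 4]                       # number of patients (f)
n = sum(freqs)

# Direct method: mean = sum(f * x) / n
direct = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Step deviation method: mean = A + (sum(f * d') / n) * c, where d' = (x - A) / c
A = 17.5   # assumed mean: one of the class midpoints (our choice)
c = 5      # class interval
step_devs = [(x - A) / c for x in midpoints]
step = A + sum(f * d for f, d in zip(freqs, step_devs)) / n * c

print(direct, step)
```

Here d' takes the small whole values −3 to 2, which is the point of the method for hand calculation; both methods give 16.25 minutes.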
01. The sum of the deviations from the arithmetic mean is always zero for a
given distribution,
i.e. $\sum (x - \bar{x}) = 0$.
Formula: $\bar{x} = \sum x / n$
$\bar{x} = \frac{10 + 12 + 11 + 14 + 15 + 13}{6} = \frac{75}{6} = 12.5$
So, the arithmetic mean of the above data is 12.5.
Deviations $(x - \bar{x})$:
10 – 12.5 = – 2.5
12 – 12.5 = – 0.5
11 – 12.5 = – 1.5
14 – 12.5 = 1.5
15 – 12.5 = 2.5
13 – 12.5 = 0.5
$\sum (x - \bar{x}) = -2.5 - 0.5 - 1.5 + 1.5 + 2.5 + 0.5 = 0$.
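The property can be checked numerically with the same six observations:

```python
data = [10, 12, 11, 14, 15, 13]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]   # x - mean for each observation
print(mean, deviations, sum(deviations))
```

For these values the sum is exactly 0.0. With other data, floating-point rounding can leave a tiny residue (for example 1e-15), so a tolerance should be used when checking the property in code.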
CAM: $\bar{x}_{1,2} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
e.g. A student scored 60% marks in SSLC and 70% in PUC, with 6 subjects in
each.
$\bar{x}_{1,2} = \frac{6 \times 60 + 6 \times 70}{6 + 6} = \frac{360 + 420}{12} = \frac{780}{12} = 65\%$
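The combined arithmetic mean of two groups can be written as a small Python function (the name `combined_mean` is illustrative):

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined arithmetic mean of two groups: (n1*mean1 + n2*mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


print(combined_mean(6, 60, 6, 70))   # SSLC 60%, PUC 70%, 6 subjects each
```

This prints 65.0. When the two groups differ in size, the larger group pulls the combined mean towards its own mean. For example, `combined_mean(1, 10, 3, 20)` gives 17.5, not 15.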
cases, not all observations have the same importance. When this is true,
It is calculated by
$\bar{x}_w = \frac{\sum wx}{\sum w}$
where w = weight given to each observation and $\bar{x}_w$ = weighted
arithmetic mean.
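A minimal Python sketch of the weighted arithmetic mean. The function name is illustrative, and the marks and weights in the usage line are hypothetical, for illustration only.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical example: marks in three papers, the third counting double
print(weighted_mean([60, 70, 80], [1, 1, 2]))
```

This gives 72.5, whereas the simple mean of the three marks is 70. The combined arithmetic mean is a special case of the weighted mean, with the group sizes as weights (`weighted_mean([60, 70], [6, 6])` gives 65.0).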
MERITS
Easy to understand.
Easy to calculate.
Every set of data has one and only one Arithmetic mean.
DEMERITS
It is called Q2 because it denotes the 2nd quartile, a positional value.
Introduction
It is the 2nd measure of central tendency. There are 3 quartiles: Q1, Q2,
A Q1 Q2 Q3 B
Definition
Median or 2nd quartile (Q2) divides the distribution into two equal parts i.e.
50% of the distribution is below the median & 50% is above the median.
CALCULATION
Type I Problem
A) When ‘n’ is odd (n – total number of observations)
If the total number of observations is odd, arrange the observations in
either ascending or descending order and calculate the median by the
following method –
$Q_2 = \left(\frac{n + 1}{2}\right)^{\text{th}}$ item
where Q2 is the median and n is the total number of observations.
e.g. The numbers of patients treated in the emergency room on 7 consecutive
days are 86, 49, 52, 43, 25, 11 and 31. Calculate the median.
Answer :
Arranging the observations in ascending order –
11, 25, 31, 43, 49, 52, 86
Total number of observations = 7, i.e. an odd number.
So, $Q_2 = \left(\frac{n + 1}{2}\right)^{\text{th}} = \left(\frac{7 + 1}{2}\right)^{\text{th}} = \left(\frac{8}{2}\right)^{\text{th}} = 4^{\text{th}}$ item, i.e. 43.
Hence, the median of the above data is 43 (i.e. the 4th item).
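The rule for odd n can be sketched in Python as below. The even-n branch, which averages the two middle items, is the usual convention and is added here as an assumption, since these notes only work through the odd case.

```python
def median(values):
    """Median (Q2) of a list of numbers."""
    ordered = sorted(values)   # ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item, converted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # Even n (not covered in these notes): mean of the two middle items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([86, 49, 52, 43, 25, 11, 31]))
```

For the emergency-room data this prints 43, the 4th item of the sorted list.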
Merits
Easy to understand.
Easy to calculate.
De-merits
Median is not based on all the observations (i.e. it gives only a positional
value).
exactly.
The dictionary meaning of mode is common, fashionable or usual. Mode is
the value which occurs most frequently (i.e. the maximum number of times) in a
given set of data and around which the other items of the set cluster (i.e. the
central point of concentration).
Type I :
Selection of Mode = The Observation having highest repetition.
Find out the mode of the following data.
10, 11, 12, 26, 20, 40, 20, 10, 12, 10.
As 10 occurs 3 times, more often than any other value, 10 is the mode.
But sometimes there can be no mode (e.g. 1, 2, 3, 4, 5, 6) or more than
one mode (e.g. 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, where both 1 and 4 occur 3 times).
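Type I mode selection can be sketched with `collections.Counter`. This minimal sketch assumes the convention that a data set in which no value repeats has no mode, and it returns every tied value when there is more than one mode. `modes` is a hypothetical helper. Note that in a list such as 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4 the value 4 occurs four times, so that list has a single mode; a bimodal list needs two values tied for the highest count.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency; [] means 'no mode'
    (assumed convention: if no value repeats, the data set has no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                      # every value occurs once, so nothing stands out
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([10, 11, 12, 26, 20, 40, 20, 10, 12, 10]))  # [10]
print(modes([1, 2, 3, 4, 5, 6]))                         # []
print(modes([1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4]))          # [4]
print(modes([1, 1, 1, 2, 2, 3, 3, 4, 4, 4]))             # [1, 4]
```

The standard library's `statistics.multimode` is similar, but it returns all values when none repeats, so the explicit "no mode" check above is a deliberate design choice.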
Type II :
Selection of Mode = The observation having the highest
frequency.
The following table shows the number of children per family.
Number of children per family Number of families
0 13
1 24
2 25
3 13
4 14
In this case, the observation which has the maximum frequency is taken as the Mode (z).
In the above series, 2 children per family has the maximum frequency, i.e. 25 families.
Hence, the mode of the above given set of data is 2.
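For Type II data, the frequency table can be searched directly for the observation with the highest frequency. This minimal sketch assumes the table is held in a dictionary; the variable names are illustrative only.

```python
# Number of children per family -> number of families (frequency)
children_per_family = {0: 13, 1: 24, 2: 25, 3: 13, 4: 14}

# The mode is the observation with the highest frequency
mode_value = max(children_per_family, key=children_per_family.get)
print(mode_value)   # 2
```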
Type III :
Selection of Modal class = The class containing the highest frequency
Formula – $$z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$
Where, z – is Mode, L1 – is the lower limit of the modal class,
f1 – is frequency of the modal class, f0 – is frequency of the previous class,
f2 – is frequency of the next class, c – is class interval.
If the modal class is the 1st class, f0 is taken as 0; if it is the last class, f2 is taken as 0.
e.g. The following table shows the age-wise distribution of 150 patients.
Age groups (years)   Frequency (f)
20 – 30              15
30 – 40              23
40 – 50              27
50 – 60              20    (f0, class before the modal class)
60 – 70              35    (f1, modal class, L1 = 60)
70 – 80              25    (f2, class after the modal class)
80 – 90              5
Formula – $$z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$
Where, z – is Mode, L1 – is the lower limit of the modal class,
f1 – is frequency of the modal class, f0 – is frequency of the previous class,
f2 – is frequency of the next class, c – is class interval.
Here, L1 = 60, f1 = 35, f0 = 20, f2 = 25 and c = 10.
$$z = 60 + \frac{35 - 20}{2 \times 35 - 20 - 25} \times 10 = 60 + \frac{15}{70 - 45} \times 10 = 60 + \frac{15}{25} \times 10 = 60 + \frac{150}{25} = 60 + 6 = 66$$
Mode (z) = 66.
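The Type III formula can be checked with a short function. This sketch assumes equal class intervals and a single modal class; ties go to the first class with the highest frequency. `grouped_mode` is a hypothetical name. The function applies the rule that f0 or f2 is 0 when the modal class is the first or last class. It would divide by zero only in the degenerate case f1 = f0 = f2.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of grouped data: z = L1 + (f1 - f0) / (2*f1 - f0 - f2) * c.
    f0 (or f2) is taken as 0 when the modal class is the first (or last) class."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])  # modal class index
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

lower = [20, 30, 40, 50, 60, 70, 80]      # lower limits of the age groups
freq = [15, 23, 27, 20, 35, 25, 5]        # number of patients in each group
print(grouped_mode(lower, freq, 10))      # 66.0
```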
Merits
Most representative value of a given set of data.
Easy to calculate.
Not affected by extreme values.
Mode can be found for both qualitative and quantitative data.
Easy to understand.
Most appropriate average for finding the ideal (most common) size.
De-merits
Sometimes there is no mode, or more than one mode, in a given distribution.
Not used for further mathematical calculations.
Not commonly used.
MEASURES OF VARIABILITY
Introduction
The measures of central tendency in the previous chapter provide a single representative value for a given set of data. But that alone may not be adequate to describe the complete data.
e.g. The table below shows the marks scored by 3 students in 6 subjects.
Subjects/ Students A B C
1st Subject 50 49 80
2nd Subject 50 51 20
3rd Subject 50 48 60
4th Subject 50 52 40
5th Subject 50 47 70
6th Subject 50 53 30
The arithmetic mean of each of the above students is the same, i.e. 50. But student A shows no variation, student B shows little variation and student C shows much more variation. This scatteredness can be measured by various measures of variability / dispersion.
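The point of the marks table can also be shown numerically. In this minimal sketch using the standard `statistics` module, all three students share the same mean, while the range and the standard deviation grow from A to C. The standard deviation is computed with `statistics.pstdev`, which divides by n.

```python
import statistics

marks = {
    "A": [50, 50, 50, 50, 50, 50],
    "B": [49, 51, 48, 52, 47, 53],
    "C": [80, 20, 60, 40, 70, 30],
}
for student, scores in marks.items():
    mean = statistics.mean(scores)
    spread = max(scores) - min(scores)        # range, the simplest measure of variability
    sd = statistics.pstdev(scores)            # standard deviation with divisor n
    print(student, mean, spread, round(sd, 2))
```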
Definition
Measures of variation / dispersion describe the spread or scatteredness of
the individual observations or items around the central tendency.
Significance
Gives complete idea / picture of data.
Gives information about scatteredness around the central tendency.
Useful for further calculations e.g. Test of significance, etc.
Helps in comparison of distribution.
Gives idea about the reliability of average value.
Methods of Dispersion
Commonly used methods are –
Range
Inter quartile range (IQR)
Semi inter-quartile Range / Quartile deviation (QD)
Mean deviation / Average deviation (MD)
Standard deviation (SD)
Range is defined as the difference between the highest and lowest values
in a set of data.
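Calculation
R = H – L
Where, R – is Range, H – is the highest value, L – is the lowest value.
e.g. Calculate the range of the following values:
8.8 gm%, 9.3 gm%, 10.5 gm%, 11.4 gm%, 14 gm%, 10.5 gm%.
Formula – R = H – L
R = 14 – 8.8 = 5.2 gm%.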
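The following Python sketch is a one-line check of R = H – L for the six values listed above. The result is rounded because 14 – 8.8 is not exact in binary floating point.

```python
values = [8.8, 9.3, 10.5, 11.4, 14, 10.5]      # gm%
highest, lowest = max(values), min(values)
value_range = highest - lowest                  # R = H - L
print(highest, lowest, round(value_range, 1))   # 14 8.8 5.2
```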
Merits
Easy to compare.
De-merits
INTER QUARTILE RANGE (IQR)
It is defined as the difference between the 3rd quartile and 1st quartile.
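IQR = Q3 – Q1, where Q1 = (n / 4)th item and Q3 = (3n / 4)th item of the observations arranged in ascending order, and n – is Number of observations.
e.g. Following are the weights of 10 students. Calculate the IQR.
84 Kg., 48 Kg., 39 Kg., 64 Kg., 78 Kg., 63 Kg., 38 Kg., 54 Kg., 60 Kg., 62 Kg.
Ascending order –
38, 39, 48, 54, 60, 62, 63, 64, 78, 84. (Even numbers method for Median)
$$Q_1 = \left(\frac{n}{4}\right)^{\text{th}} \text{ item} = \left(\frac{10}{4}\right)^{\text{th}} \text{ item} = 2.5^{\text{th}} \text{ item, taken as the } 3^{\text{rd}} \text{ item} = 48$$
$$Q_3 = \left(\frac{3n}{4}\right)^{\text{th}} \text{ item} = \left(\frac{3 \times 10}{4}\right)^{\text{th}} \text{ item} = 7.5^{\text{th}} \text{ item, taken as the } 8^{\text{th}} \text{ item} = 64$$
IQR = Q3 – Q1 = 64 – 48
IQR = 16.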
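Quartile positions follow different conventions in different textbooks and software, so this sketch states its assumption explicitly. The quartile is taken as the item at position k·n/4 rounded up, which reproduces Q1 = 48 and Q3 = 64 in the worked example. `quartile_by_position` is a hypothetical helper, intended for Q1 and Q3 only.

```python
import math

def quartile_by_position(values, k):
    """k-th quartile (k = 1 or 3) taken as the item at position ceil(k * n / 4)
    of the sorted data. Assumed convention: rounding the position up matches the
    worked example; other textbooks and libraries interpolate instead."""
    ordered = sorted(values)
    position = math.ceil(k * len(ordered) / 4)   # 1-based position of the quartile
    return ordered[position - 1]

weights = [84, 48, 39, 64, 78, 63, 38, 54, 60, 62]   # kg
q1 = quartile_by_position(weights, 1)   # 2.5th item, rounded up to the 3rd item
q3 = quartile_by_position(weights, 3)   # 7.5th item, rounded up to the 8th item
print(q1, q3, q3 - q1)                  # 48 64 16
```

For comparison, the standard library's `statistics.quantiles(weights, n=4)` interpolates at position (n + 1)p by default. For the same weights it returns Q1 = 45.75 and Q3 = 67.5, so the IQR depends on the convention used.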
Merits
Easy to calculate.
De-merits
Not based on all the observations (i.e. the lowest 25% and the highest 25% of the values are not
included).
QUARTILE DEVIATION (QD) / SEMI INTER-QUARTILE RANGE
It is a measure of variability.
It is half of the difference between the 3rd quartile and the 1st quartile, i.e. half of the IQR.
Formula – $$QD = \frac{IQR}{2} = \frac{Q_3 - Q_1}{2}$$
Where, QD – is Quartile Deviation, IQR – is Inter quartile range,
Q3 – is 3rd quartile, Q1 – is 1st quartile.
e.g. Following are the weights of 10 students. Calculate the QD.
84 Kg., 48 Kg., 39 Kg., 64 Kg., 78 Kg., 63 Kg., 38 Kg., 54 Kg., 60 Kg., 62 Kg.
Ascending order –
38, 39, 48, 54, 60, 62, 63, 64, 78, 84. (Even numbers method for Median)
Q3 – is 64.
Q1 – is 48.
IQR – is 16.
QD = 16 / 2
QD = 8.
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$
Where, Q3 – is 3rd quartile, Q1 – is 1st quartile.
e.g. Following are the weights of 10 students. Calculate the coefficient of QD.
84 Kg., 48 Kg., 39 Kg., 64 Kg., 78 Kg., 63 Kg., 38 Kg., 54 Kg., 60 Kg., 62 Kg.
Ascending order –
38, 39, 48, 54, 60, 62, 63, 64, 78, 84. (Even numbers method for Median)
Q3 – is 64.
Q1 – is 48.
$$\text{Coefficient of QD} = \frac{64 - 48}{64 + 48} \times 100 = \frac{16}{112} \times 100$$
Coefficient of QD = 14.28 %
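The quartile deviation and its coefficient follow directly from Q1 and Q3. This is a minimal sketch with hypothetical helper names. Rounded to two decimals the coefficient is 14.29%, while the hand calculation truncates it to 14.28%.

```python
def quartile_deviation(q1, q3):
    """Semi inter-quartile range: QD = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2

def coefficient_of_qd(q1, q3):
    """Coefficient of QD = (Q3 - Q1) / (Q3 + Q1) x 100, a unit-free measure."""
    return (q3 - q1) / (q3 + q1) * 100

q1, q3 = 48, 64                               # quartiles of the student weights (kg)
print(quartile_deviation(q1, q3))             # 8.0
print(round(coefficient_of_qd(q1, q3), 2))    # 14.29
```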
Merits
Easy and simple to understand.
Easy to calculate.
Not affected by extreme values.
Demerits
It is a positional value which is based on only two quartiles.
Not based on all the observations (the first 25% and the last 25% are not included).
MEAN DEVIATION / AVERAGE DEVIATION (MD)
Introduction
Definition
Mean deviation is the average of the deviations of the individual observations from a measure of central tendency (i.e. may be Mean, Mode, etc.), calculated by ignoring the mathematical signs.
Calculations
It is calculated by –
Formula – $$AD = \frac{\sum |x - \bar{x}|}{n}$$
Where, AD – is Average Deviation / Mean deviation, Σ – is summation,
| | – is Modulus (absolute value), x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
e.g. Number of students in a single class in different divisions:
10, 20, 30, 40, 50. Calculate the Average Deviation.
Step 1st : Calculate the arithmetic mean.
Formula – $$\bar{x} = \frac{\sum x}{n}$$
Where, x̄ – is Arithmetic mean, Σ – is summation,
x – is individual observations, n – is total number of observations.
$$\bar{x} = \frac{10 + 20 + 30 + 40 + 50}{5} = \frac{150}{5} = 30$$
Step 2nd : Calculate the Average Deviation.
It is calculated by –
Formula – $$AD = \frac{\sum |x - \bar{x}|}{n}$$
Where, AD – is Average Deviation / Mean deviation, Σ – is summation.
Calculate x – x̄:
10 – 30 = – 20, 20 – 30 = – 10, 30 – 30 = 0, 40 – 30 = 10, 50 – 30 = 20.
$$AD = \frac{20 + 10 + 0 + 10 + 20}{5} = \frac{60}{5} = 12$$
So, the average deviation of the given set of data is 12.
Coefficient of average deviation: $$CAD = \frac{AD}{\bar{x}} \times 100 = \frac{12}{30} \times 100 = 40\%$$
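The following is a minimal Python sketch of the average deviation about the arithmetic mean and its coefficient (CAD = AD / mean × 100), using the class-size data above. `mean_deviation` is a hypothetical helper.

```python
def mean_deviation(values):
    """Average (mean) deviation about the arithmetic mean: sum(|x - mean|) / n."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

students = [10, 20, 30, 40, 50]
mean = sum(students) / len(students)       # 30.0
ad = mean_deviation(students)              # 12.0
cad = ad * 100 / mean                      # coefficient of average deviation, in %
print(ad, cad)                             # 12.0 40.0
```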
Merits
Easy to calculate.
Easy to Understand.
De-merits
It ignores the mathematical signs (if the signs were not ignored, the sum of the deviations from the arithmetic mean would always be zero).
STANDARD DEVIATION (SD)
Introduction
It is the most widely used and the best measure of dispersion.
While calculating the Average deviation (AD), all the observations are taken into consideration, but the mathematical signs are ignored. Standard deviation (SD) overcomes this problem by squaring the deviations.
Definition
The Standard deviation is the square root of the quantity obtained by dividing the sum of the squared deviations of the observations from the arithmetic mean by the total number of observations.
Calculations
It is calculated by following ways –
Type I : Individual observation without frequency.
Formula – $$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
e.g. Following are the results of the ESR in mm for 1st hour observed in 5
individuals. Calculate the standard deviation.
2, 4, 6, 8, 10.
The above mentioned example comes under the Type I series of data
Step 1st : Calculate the arithmetic mean.
Formula – $$\bar{x} = \frac{\sum x}{n}$$
Where, x̄ – is Arithmetic mean, Σ – is summation,
x – is individual observations, n – is total number of observations.
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
Step 2nd : Calculate x – x̄.
2–6=–4
4–6=–2
6–6= 0
8–6= 2
10 – 6 = 4
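Step 3rd : Calculate Σ(x – x̄)².
(– 4)² + (– 2)² + 0² + 2² + 4² = 16 + 4 + 0 + 4 + 16 = 40.
Step 4th : Calculate the Standard Deviation (SD).
Formula – $$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
$$\sigma = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.8$$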
So the standard deviation of the above given set of data is 2.8.
Coefficient of Standard Deviation / Coefficient of Variation (CSD/CV)
Formula – $$CV = \frac{SD}{AM} \times 100$$
Where, SD – is Standard Deviation, AM – is Arithmetic Mean.
CSD / CV = 2.8 / 6 x 100
CV = 47%.
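The Type I steps can be reproduced in a few lines. This sketch follows the notes in dividing by n. The standard library's `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by n – 1 and would give a larger value (√10 ≈ 3.16) for these data. Without rounding, the CV is about 47.1%.

```python
import math
import statistics

esr = [2, 4, 6, 8, 10]                      # ESR in mm for 1st hour
n = len(esr)
mean = sum(esr) / n                         # 6.0
sd = math.sqrt(sum((x - mean) ** 2 for x in esr) / n)   # divisor n, as in the notes
cv = sd / mean * 100                        # coefficient of variation, in %

print(round(sd, 2), round(cv, 1))           # 2.83 47.1
print(round(statistics.pstdev(esr), 2))     # 2.83, same divisor n
```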
Type II : Individual observation with frequency.
Formula – $$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
f – is frequency, x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
e.g. Following table shows the number of children per family. Calculate the
standard deviation.
Number of Children (x) Number of families (f)
1 2
2 3
3 2
4 4
5 3
Total Number of Observations = 14. (Add all frequencies)
Step 1st : Calculate the arithmetic mean.
Formula – $$\bar{x} = \frac{\sum fx}{n}$$
Where, x̄ – is Arithmetic mean, Σ – is summation, f – is frequency,
x – is individual observations, n – is total number of observations.
$$\bar{x} = \frac{(2 \times 1) + (3 \times 2) + (2 \times 3) + (4 \times 4) + (3 \times 5)}{14} = \frac{45}{14} = 3.21$$
Step 2nd : Calculate the x – x .
1 – 3.21 = – 2.21
2 – 3.21 = – 1.21
3 – 3.21 = – 0.21
4 – 3.21 = 0.79
5 – 3.21 = 1.79
Step 3rd : Calculate the summation of f(x – x̄)².
(– 2.21)² = 4.88 x 2 = 9.76.
(– 1.21)² = 1.46 x 3 = 4.38.
(– 0.21)² = 0.04 x 2 = 0.08.
(0.79)² = 0.62 x 4 = 2.48.
(1.79)² = 3.20 x 3 = 9.60.
Σf(x – x̄)² = 9.76 + 4.38 + 0.08 + 2.48 + 9.60 = 26.3.
Step 4th : Calculate the Standard Deviation (SD).
Formula – $$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
f – is frequency, x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
$$\sigma = \sqrt{\frac{26.3}{14}} = \sqrt{1.87} = 1.36$$
So, the standard deviation of the given set of data is 1.36.
Coefficient of Standard Deviation / Coefficient of Variation
Formula = CV = SD / AM x 100.
Where, CV – Coefficient of Variation, SD – is Standard Deviation,
AM – is Arithmetic mean.
CV = 1.36 / 3.21 x 100.
Coefficient of Variation = 42.36%.
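The following sketch carries out the Type II (frequency) calculation. Because it keeps full precision at every step, it gives SD ≈ 1.37 and CV ≈ 42.7%. These are slightly above the hand-worked 1.36 and 42.36%, which come from rounding the intermediate values.

```python
import math

children = [1, 2, 3, 4, 5]       # x, number of children per family
families = [2, 3, 2, 4, 3]       # f, number of families
n = sum(families)                                            # 14
mean = sum(f * x for f, x in zip(families, children)) / n    # sum(fx) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(families, children)) / n
sd = math.sqrt(variance)
cv = sd / mean * 100
print(n, round(mean, 2), round(sd, 2), round(cv, 1))         # 14 3.21 1.37 42.7
```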
Type III : Grouped data (class intervals) with frequency.
Formula – $$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
f – is frequency, x – is Class midpoint, x̄ – is Arithmetic mean,
n – is Total number of observations.
e.g. Following are the number of patients according to the age groups. Calculate
the standard deviation.
Sl. Age groups No. of Pt.’s
01. 10 – 20 2
02. 20 – 30 1
03. 30 – 40 3
04. 40 – 50 4
Total number of Observations 10
Step 1st : Calculate the arithmetic mean.
Formula – $$\bar{x} = \frac{\sum fx}{n}$$
Where, x̄ – is Arithmetic mean, Σ – is summation, f – is frequency,
x – is class mid point, n – is total number of observations.
Class midpoint
Formula – $$C.M. = \frac{L.L. + U.L.}{2}$$
Where, C. M. – is Class midpoint, L.L. – is Lower limit of the class,
U.L. – is Upper limit of the class.
(10 + 20) / 2 = 15.
(20 + 30) / 2 = 25.
(30 + 40) / 2 = 35.
(40 + 50) / 2 = 45.
$$\bar{x} = \frac{(2 \times 15) + (1 \times 25) + (3 \times 35) + (4 \times 45)}{10} = \frac{340}{10} = 34$$
Step 2nd : Calculate the x – x .
15 – 34 = – 19
25 – 34 = – 9
35 – 34 = 1
45 – 34 = 11
Step 3rd : Calculate the summation of f(x – x̄)².
2 x (– 19)² + 1 x (– 9)² + 3 x (1)² + 4 x (11)² = 722 + 81 + 3 + 484
Σf(x – x̄)² = 1290.
Step 4th : Calculate the Standard Deviation (SD).
Formula – $$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
f – is frequency, x – is class midpoint, x̄ – is Arithmetic mean,
n – is Total number of observations.
$$\sigma = \sqrt{\frac{1290}{10}} = \sqrt{129} = 11.35$$
So, the standard deviation of the given set of data is 11.35.
Coefficient of Standard Deviation / Coefficient of Variation
Formula = CV = SD / AM x 100.
Where, CV – Coefficient of Variation, SD – is Standard Deviation,
AM – is Arithmetic mean.
CV = 11.35 / 34 x 100.
Coefficient of Variation = 33.38%.
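The following sketch handles grouped data, assuming each class is represented by its midpoint as in the notes. The unrounded SD is about 11.36 (the hand value 11.35 is truncated), and the CV is about 33.4%.

```python
import math

classes = [(10, 20), (20, 30), (30, 40), (40, 50)]   # age groups
patients = [2, 1, 3, 4]                               # f, number of patients
mids = [(lo + hi) / 2 for lo, hi in classes]          # class midpoints 15, 25, 35, 45
n = sum(patients)
mean = sum(f * m for f, m in zip(patients, mids)) / n
variance = sum(f * (m - mean) ** 2 for f, m in zip(patients, mids)) / n
sd = math.sqrt(variance)
print(mean, variance, round(sd, 2), round(sd / mean * 100, 1))   # 34.0 129.0 11.36 33.4
```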
Significance
STANDARD ERROR (SE)
Introduction
In medical investigations only a sample portion of the population is studied.
The sample results are bound to differ from the population results.
This difference or the error is measured by “Standard Error.”
The word error here means – “The difference between the true value of
a population parameter and estimated value provided by appropriate
sample statistics.”
Definition
Formula – $$SE = \frac{SD}{\sqrt{n}}$$
Calculation
e.g. Following are the results of the ESR in mm for 1st hour observed in 5 individuals. Calculate the standard error.
2, 4, 6, 8, 10.
The above mentioned example comes under the Type I series of data
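Step 1st : Calculate the arithmetic mean.
Formula – $$\bar{x} = \frac{\sum x}{n}$$
Where, x̄ – is Arithmetic mean, Σ – is summation,
x – is individual observations, n – is total number of observations.
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$
Steps 2nd to 4th : Calculate the Standard Deviation.
Formula – $$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Where, σ – is Standard Deviation, Σ – is Summation of,
x – is Individual observation, x̄ – is Arithmetic mean,
n – is Total number of observations.
Σ(x – x̄)² = 16 + 4 + 0 + 4 + 16 = 40.
$$\sigma = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.8$$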
So the standard deviation of the above given set of data is 2.8.
Step 5th : Calculate Standard Error
The standard error of the mean is the standard deviation of the sampling distribution of the sample mean; it is estimated by dividing the standard deviation by the square root of the sample size.
Formula – $$SE = \frac{SD}{\sqrt{n}}$$
Where, SE – is Standard error, SD – is Standard Deviation,
n – is the total number of observations.
$$SE = \frac{2.8}{\sqrt{5}} = \frac{2.8}{2.23} = 1.25$$
So, the Standard error of the given set of data is 1.25.
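The standard error can be computed the same way. This sketch uses the unrounded SD (√8), so it gives SE ≈ 1.26 rather than the hand-worked 1.25, which used SD = 2.8 and √5 ≈ 2.23. The loop at the end uses hypothetical sample sizes to show that, for a fixed SD, the SE falls as n grows.

```python
import math

esr = [2, 4, 6, 8, 10]
n = len(esr)
mean = sum(esr) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in esr) / n)   # SD with divisor n, as in the notes
se = sd / math.sqrt(n)                                   # SE = SD / sqrt(n)
print(round(se, 2))   # 1.26

# For the same SD, a larger (hypothetical) sample size gives a smaller SE
for size in (5, 20, 80):
    print(size, round(sd / math.sqrt(size), 2))
```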
Interpretation
The value of the standard error (SE) is directly proportional to the standard deviation (SD), i.e. the higher the SD, the higher the SE.
SE ∝ SD
Where, SE – is the Standard error, SD – is Standard deviation.
The value of the standard error (SE) is inversely proportional to the square root of the sample size, i.e. the larger the sample size, the smaller the SE.
SE ∝ 1 / √n
Based on number of variables, there are 3 types of statistical analysis. Viz.
01. Univariate analysis
02. Bivariate analysis
03. Multivariate analysis
Univariate Analysis – Statistical analysis involving only 1 variable is called Univariate analysis.
e.g. Mean, Mode.
Bivariate Analysis – Analysis involving 2 variables is called Bivariate Analysis.
e.g. Correlation, Regression analysis.
Multivariate Analysis – Analysis involving more than 2 variables is called Multivariate analysis.
e.g. Multiple correlation analysis, Multiple regression analysis.
CORRELATION
Definition – Correlation is the method of investigating the relationship between 2 variables, both of which are quantitative in nature.
Correlation analysis attempts to determine the degree of relationship between two variables.
e.g. Increase in advertisement and increase in sales.
Increase in family income and decrease in infant mortality rate.
TYPES
There are five types of correlation, arranged on the scale of the correlation coefficient (r) from –1 to +1:
PNC (r = –1) ..... INC ..... NC (r = 0) ..... IPC ..... PPC (r = +1)
Note : – Where, PPC – Perfect positive correlation, IPC – Imperfect positive correlation, PNC – Perfect negative correlation, INC – Imperfect negative correlation, NC – No correlation.
PERFECT POSITIVE CORRELATION
If the values of 2 variables vary in the same direction and in the same proportion, it is called Perfect Positive Correlation (PPC).
Here value of r will be +1.
e.g. Age and expenses.
[Figure: scatter diagram of y against x for perfect positive correlation]
NO CORRELATION
If there is no relationship between 2 variables, i.e. the values of the 2 variables do not vary either in the same direction or in proportion, it is called No Correlation.
e.g. Height of the students and marks scored in exams.
[Figure: scatter diagram of y against x, points scattered with no pattern]
METHODS OF CALCULATION
Dot / Scattered Diagram.
Karl Pearson’s Coefficient of Correlation.
Rank Correlation.
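Karl Pearson's Coefficient of Correlation –
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2] \times [n\sum y^2 - (\sum y)^2]}}$$
$$r = \frac{387150 - 384282}{\sqrt{[447100 - 443556] \times [335970 - 332929]}} = \frac{2868}{\sqrt{3544 \times 3041}} = \frac{2868}{\sqrt{10777304}} = \frac{2868}{3282.88} = 0.87$$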
The correlation in the above example is an Imperfect Positive Correlation, i.e. there is an imperfect positive correlation between height and weight
in the given example.
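The following sketch computes Karl Pearson's r using the computational formula r = (nΣxy – ΣxΣy) / √([nΣx² – (Σx)²][nΣy² – (Σy)²]). The raw height and weight data are not reproduced in this section, so the first print only checks the final arithmetic. The function is then applied to the age and systolic blood pressure data used in the regression example of these notes. `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r from raw pairs:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Final step of the height-weight example, using the sums worked out in the notes
print(round(2868 / math.sqrt(3544 * 3041), 2))   # 0.87

# Raw-data use: age (x) and systolic BP (y) of 5 patients from the regression example
age = [40, 50, 30, 20, 60]
sbp = [130, 150, 120, 110, 160]
print(round(pearson_r(age, sbp), 3))             # 0.991
```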
FORMULA FOR DIRECT METHOD
$$r = \frac{\text{Covariance of } x \text{ and } y}{\sigma_x \times \sigma_y}$$
$$r = \frac{3650 - 782}{\sqrt{[4700 - 1156] \times [3570 - 529]}} = \frac{2868}{\sqrt{3544 \times 3041}} = \frac{2868}{\sqrt{10777304}} = \frac{2868}{3282.88} = 0.87$$
The correlation in the above example is an IMPERFECT POSITIVE CORRELATION, i.e. there is an imperfect positive correlation between height and weight in the given example.
$$b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2}$$
Where, y is dependent variable,
ȳ is Arithmetic mean of y series.
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}$$
Where, dx and dy = deviated values of x and y from their respective Arithmetic means, Σ = summation.
CALCULATION OF CO-RELATION CO-EFFICIENT
USING REGRESSION EQUATION
$$r = \sqrt{b_{xy} \times b_{yx}}$$ (r takes the common sign of bxy and byx)
r = Co-relation co-efficient.
bxy = Regression co-efficient of x on y.
byx = Regression co-efficient of y on x.
e.g. Following are the age and systolic blood pressure of 5 patients. Calculate the systolic blood pressure of a patient aged 45 years, and the age of a patient whose systolic blood pressure is 180 mm of Hg. Also calculate the co-relation of x and y.
Age Systolic blood pressure in mm of Hg
(x) (y)
40 130
50 150
30 120
20 110
60 160
Answer :
Age (x)   SBP (y)   dx = x – x̄   d²x    dy = y – ȳ   d²y    dxdy
40        130       0             0      – 4           16     0
50        150       10            100    16            256    160
30        120       – 10          100    – 14          196    140
20        110       – 20          400    – 24          576    480
60        160       20            400    26            676    520
Σx = 200  Σy = 670  Σdx = 0       1000   Σdy = 0       1720   1300
x̄ = 200 / 5 = 40, ȳ = 670 / 5 = 134, bxy = 1300 / 1720 = 0.76, byx = 1300 / 1000 = 1.3
01. Regression equation of y on x [Calculation of dependent variable (y) based
on independent variable (x)]
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
Where,
y – Dependent variable = Systolic B.P., ȳ – is Arithmetic mean of y series,
byx – Regression co-efficient of y on x,
x – is independent variable = age 45 years, x̄ – is Arithmetic mean of x series.
byx is calculated by –
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}$$
Where, dx and dy = deviated values of x and y from their respective Arithmetic means, Σ = summation.
byx = (1300) / 1000
byx = 1.3
y – 134 = 1.3 x (45 – 40)
y – 134 = 1.3 x 5
y = 6.5 + 134
y = 140.5 mm of Hg.
The systolic blood pressure when his age is 45 years will be 140.5 mm of Hg.
02. Regression equation of x on y [Calculation of independent variable (x)
based on the dependent variable (y)]
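$$x - \bar{x} = b_{xy}(y - \bar{y})$$
Where, x = Independent variable = Age, x̄ = Arithmetic mean of x series,
bxy = Regression co-efficient of x on y.
bxy is calculated by –
$$b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{1300}{1720} = 0.76$$
Where, y is dependent variable = systolic blood pressure = 180 mm of Hg, ȳ is Arithmetic mean of y series, and dx and dy are the deviated values of x and y from their respective Arithmetic means.
x – 40 = 0.76 x (180 – 134)
x – 40 = 0.76 x 46
x = 34.96 + 40
x = 74.96 years.
The age of the patient when his systolic blood pressure is 180 mm of Hg will be about 75 years.
03. Co-relation co-efficient of x and y
$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.76 \times 1.3} = \sqrt{0.988} = 0.994$$
The co-relation co-efficient of x and y shows an imperfect positive correlation, very close to perfect positive correlation.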
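The regression example can be checked with a short sketch that works from deviations about the means, as in the table. It keeps bxy unrounded (1300 / 1720 ≈ 0.756), so the predicted age for 180 mm of Hg comes out near 74.8 years. With bxy rounded to 0.76, the same formula gives about 75 years. r is taken as the positive square root because both regression coefficients are positive here.

```python
import math

age = [40, 50, 30, 20, 60]            # x (years)
sbp = [130, 150, 120, 110, 160]       # y (mm of Hg)

mx, my = sum(age) / len(age), sum(sbp) / len(sbp)
dx = [x - mx for x in age]
dy = [y - my for y in sbp]
sdxdy = sum(a * b for a, b in zip(dx, dy))     # sum of dx*dy = 1300
byx = sdxdy / sum(a * a for a in dx)           # regression coefficient of y on x
bxy = sdxdy / sum(b * b for b in dy)           # regression coefficient of x on y

sbp_at_45 = my + byx * (45 - mx)               # y - y_mean = byx (x - x_mean)
age_at_180 = mx + bxy * (180 - my)             # x - x_mean = bxy (y - y_mean)
r = math.sqrt(bxy * byx)                       # both coefficients are positive here, so r > 0
print(byx, round(bxy, 3), sbp_at_45, round(age_at_180, 1), round(r, 3))
```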
Definition
Hypothesis
researcher or investigator.
It is of 2 types. Viz. –
Null hypothesis.
the aim of being rejection. This part takes a great role in implication of any rules
Comparing within the groups – Comparing the results before and after the
more groups.
01. Formulate the hypothesis. (i.e. both the Null and Research hypothesis)
sample size.
04. Calculation of sample mean, standard deviation, standard error and any of
05. Comparing the observed values with the table value of selected test of
significance.
‘t’ TEST / STUDENT’S ‘t’ TEST
Among all the tests of significance the most common is the z test because of
distribution / Normal distribution. But when the sample size is small (i.e.
less than 30), the data do not follow the normal distribution. Therefore, there was a need
The early work was done by W. S. Gosset in Ireland, who was working in a brewery. The company did not allow its employees to publish any research articles, so he published this test under the pen name "Student".
Therefore, this test became famous by the name of Student's ‘t’ test.
APPLICATIONS
When the sample size gets larger than (i.e. more than 30) the t distribution
TEST OF SIGNIFICANCE
groups.
occasions and time like before and after the intervention readings of the same
UNPAIRED ‘t’ TEST
Calculations
t = Difference in means of 2 groups / S. E. of the difference between the 2 means.
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{SE(\bar{x}_1 - \bar{x}_2)}$$
Where, $$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Where, t – is unpaired t value, x̄1, x̄2 – Arithmetic means of 1st and 2nd group, n1 & n2 – Sample sizes of 1st and 2nd group, SD1 & SD2 – Standard deviations of 1st and 2nd group.
Example : Following are the values of birth weight of high socio-economical
group and low socio-economical group. Find whether there is a significant
difference between 2 groups.
Given Values Gr. A (High S-E Status) Gr. B (Low S-E Status)
Degrees of freedom = n1 + n2 – 2.
The obtained ‘t’ value is 6.6. By comparing the obtained value with the
t23,0.05 = 2.07.
t23,0.02 = 2.50.
t23,0.01 = 2.81.
t23,0.001= 3.77
HOMEWORK
Problem
The following data gives the values of acidic reactions of solution (pH test)
Test whether there is a significant difference between the 2 groups at the 0.001 level of significance.
Group A Group B
7 6.8
7.8 7.4
7.9 7
8 7.2
7.6 7.4
7.4
Step 01 : Postulating Hypothesis.
Null Hypothesis – H0 : µA = µB. There is no significant difference between the acid test results of group A and group B at the 0.001 level of significance.
Research Hypothesis – H1 : µA ≠ µB. There is a significant difference between the acid test results of group A and group B at the 0.001 level of significance.
Step 02 : Selection of appropriate test of significance.
There are 2 independent groups and fewer than 30 samples (i.e. 11). So, the unpaired ‘t’ test should be applied.
Step 03 : Selection of level of significance.
The level of significance is given as 0.001.
Calculations
x1 (Group A)   x1 – x̄1               (x1 – x̄1)²   x2 (Group B)   x2 – x̄2               (x2 – x̄2)²
7.0            7.0 – 7.61 = – 0.61   0.3721        6.8            6.8 – 7.16 = – 0.36   0.1296
7.8            7.8 – 7.61 = 0.19     0.0361        7.4            7.4 – 7.16 = 0.24     0.0576
7.9            7.9 – 7.61 = 0.29     0.0841        7.0            7.0 – 7.16 = – 0.16   0.0256
8.0            8.0 – 7.61 = 0.39     0.1521        7.2            7.2 – 7.16 = 0.04     0.0016
7.6            7.6 – 7.61 = – 0.01   0.0001        7.4            7.4 – 7.16 = 0.24     0.0576
7.4            7.4 – 7.61 = – 0.21   0.0441
Σx1 = 45.7                           0.6886        Σx2 = 35.8                           0.2720
Mean =
$$\bar{x} = \frac{\sum x}{n}$$
Where, x̄ – is the Arithmetic mean, Σ – is the summation, x – is Individual observation, n – is the total number of observations.
Arithmetic Mean of group A = 45.7 / 6 = 7.61.
Arithmetic Mean of group B = 35.8 / 5 = 7.16.
Standard Deviation =
Formula
$$S.D. = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Where, S.D. – is the Standard Deviation, Σ – is the summation, x – is an individual observation, x̄ – is the arithmetic mean of the group.
SD1 Standard Deviation of Group A = √(0.6886 / 6) = 0.33.
SD2 Standard Deviation of Group B = √(0.2720 / 5) = 0.23.
Formula –
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{SE(\bar{x}_1 - \bar{x}_2)}, \qquad SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Where, t – is unpaired t value, x̄1, x̄2 – Arithmetic means of 1st and 2nd group, n1 & n2 – Sample sizes of 1st and 2nd group, SD1 & SD2 – Standard deviations of 1st and 2nd group.
$$SE = \sqrt{\frac{(6 - 1)(0.33)^2 + (5 - 1)(0.23)^2}{6 + 5 - 2} \times \left(\frac{1}{6} + \frac{1}{5}\right)} = \sqrt{\frac{(5 \times 0.1089) + (4 \times 0.0529)}{9} \times (0.16 + 0.2)}$$
$$= \sqrt{\frac{0.5445 + 0.2116}{9} \times 0.36} = \sqrt{\frac{0.7561 \times 0.36}{9}} = \sqrt{\frac{0.2721}{9}} = \sqrt{0.030} = 0.17$$
Step 04 : Calculate the ‘t’ value.
$$t = \frac{|7.61 - 7.16|}{0.17} = \frac{0.45}{0.17} = 2.64$$
Step 05 : Compare with the table values.
The obtained ‘t’ is 2.64. By comparing the obtained value with the table
value we can get following values. Viz. –
t9,0.001 = 4.78 (degrees of freedom = n1 + n2 – 2 = 6 + 5 – 2 = 9)
Step 06 : Drawing the conclusion on the basis of obtained and tabular
values for the corresponding values at different levels of significance.
The obtained ‘t’ value is 2.64, which is less than the table value at the 0.001 significance level (i.e. 4.78).
Therefore, the research hypothesis is rejected and the null hypothesis is accepted, i.e. there is no significant difference in the acidic reactions of group A and group B at the 0.001 level of significance.
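The following sketch applies the unpaired t test to the pH data. It uses the sample SD (divisor n – 1, `statistics.stdev`), which is what the (n – 1)SD² terms of the pooled formula assume. With that convention and no intermediate rounding, t comes out near 2.31 instead of the hand-worked 2.64. Either value is well below the tabled 4.78 for 9 degrees of freedom at the 0.001 level, so the conclusion is the same. If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b)` performs the same pooled-variance test.

```python
import math
import statistics

group_a = [7.0, 7.8, 7.9, 8.0, 7.6, 7.4]
group_b = [6.8, 7.4, 7.0, 7.2, 7.4]
n1, n2 = len(group_a), len(group_b)
sd1, sd2 = statistics.stdev(group_a), statistics.stdev(group_b)   # sample SD (divisor n - 1)

# Pooled standard error of the difference between the two means
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = abs(statistics.mean(group_a) - statistics.mean(group_b)) / se
df = n1 + n2 - 2
print(df, round(se, 3), round(t, 2))   # 9 0.198 2.31
```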
PAIRED ‘t’ TEST
$$t = \frac{|\bar{x} - \mu|}{SE}$$
Where, ‘t’ – is the paired ‘t’ value, x̄ – Arithmetic Mean of the differences, µ – Population mean under the null hypothesis (here 0), SE – is Standard Error.
e.g. Following are the results of systolic blood pressure before (BT) and after (AT) treatment with a hypotensive drug in 9 individuals. Test their significance.
BT    AT    x (BT – AT)   x – x̄          (x – x̄)²
122   120   2             2 – 3 = – 1    1
121   118   3             3 – 3 = 0      0
120   115   5             5 – 3 = 2      4
115   110   5             5 – 3 = 2      4
126   122   4             4 – 3 = 1      1
130   130   0             0 – 3 = – 3    9
120   116   4             4 – 3 = 1      1
125   124   1             1 – 3 = – 2    4
128   125   3             3 – 3 = 0      0
Σx = 27, x̄ = 27 / 9 = 3                  Σ(x – x̄)² = 24
STEP 01.: Formulation of Hypothesis.
Since the sample size is less than 30 and we have to test the significance within the same sample (before and after readings of the same individuals), we have to select the paired ‘t’ test.
Since the level of significance is not given, we take it as 0.05.
$$S.D. = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{24}{9}} = 1.63$$
$$S.E. = \frac{S.D.}{\sqrt{n}} = \frac{1.63}{\sqrt{9}} = \frac{1.63}{3} = 0.54$$
$$t = \frac{|\bar{x} - \mu|}{SE} = \frac{|3 - 0|}{0.54} = \frac{3}{0.54} = 5.55$$
Degrees of freedom = n – 1 = 9 – 1 = 8.
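t8,0.05 = 2.31.
t8,0.01 = 3.36.
The obtained value (5.55) is greater than the table value. So, we reject the null hypothesis and accept the research hypothesis, which states that the drug has a hypotensive effect at the 0.05 level of significance (the result is also significant at the 0.01 level).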
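The following sketch applies the paired t test to the before/after readings. It reproduces the notes' divisor n for the SD of the differences, giving t ≈ 5.51 without intermediate rounding. The more common sample SD (divisor n – 1) gives t ≈ 5.20. Both exceed the tabled 2.31 for 8 degrees of freedom at the 0.05 level.

```python
import math
import statistics

before = [122, 121, 120, 115, 126, 130, 120, 125, 128]   # BT, mm of Hg
after = [120, 118, 115, 110, 122, 130, 116, 124, 125]    # AT, mm of Hg
d = [b - a for b, a in zip(before, after)]               # x = BT - AT for each individual
n = len(d)
mean_d = sum(d) / n
mu = 0                                                   # mean difference under the null hypothesis

sd_n = math.sqrt(sum((x - mean_d) ** 2 for x in d) / n)  # divisor n, as in the worked example
t = abs(mean_d - mu) / (sd_n / math.sqrt(n))

# The more common sample SD (divisor n - 1) gives a slightly smaller t
t_sample = abs(mean_d - mu) / (statistics.stdev(d) / math.sqrt(n))
df = n - 1
print(d, round(t, 2), round(t_sample, 2), df)
```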
NORMAL DISTRIBUTION / NORMAL PROBABILITY CURVE (NPC)
‘z’ test / Normal curve test / Test of significance for larger sample.
But this does not occur in the field of biostatistics. In medical field, small
PROPERTIES OF NPC
[Figure: normal curve; 68.26% of the observations lie between x̄ – 1σ and x̄ + 1σ]
[Figure: normal curve; 95.44% of the observations lie between x̄ – 2σ and x̄ + 2σ]
If we further extend these vertical lines to +3 or –3 standard deviations from the Arithmetic mean, they will cover 99.74% of the total observations.
[Figure: normal curve; 99.74% of the observations lie between x̄ – 3σ and x̄ + 3σ]
‘z’ TEST
It is the most widely used test of significance for large samples (i.e. greater than 30).
It is based on the Normal distribution (NPC).
The normal distribution is named after Carl Friedrich Gauss (hence it is also called the Gaussian distribution).
SIGNIFICANCE / APPLICATION
Samples are randomly collected.
Data should be quantitative in nature.
Variables are normally distributed.
Sample size should be more than 30.
TYPES
There are 2 types of ‘z’ test.
One tailed ‘z’ test.
Two tailed ‘z’ test.
ONE TAILED ‘z’ TEST
If the distribution is considered on only one side, either less than or more than the Arithmetic mean, it is called a one tailed ‘z’ test.
CALCULATION
$$z = \frac{x - \bar{x}}{S.D.}$$
Where, x – Value for which the probability should be calculated.
x̄ – Arithmetic mean of the given distribution.
S. D. – Standard deviation.
e.g. A nurse supervisor has found that staff nurses, on average, complete a certain task in 10 minutes. The time required to complete the task is normally distributed with a standard deviation of 3 minutes. Calculate –
a) Proportion of nurses completing the task within 4 minutes.
b) Proportion of nurses requiring less than 5 minutes.
c) Probability that a nurse completes the task in between 3 and 6 minutes.
a) For the proportion of nurses completing the task within 4 minutes (i.e. for < 4 minutes):
$$z = \frac{x - \bar{x}}{S.D.}$$
Here, Arithmetic Mean is 10 and Standard deviation is 3.
$$z = \frac{4 - 10}{3} = \frac{-6}{3} = -2$$
‘p’ value (area to the left of z = – 2) = 0.0228.
In % = 2.28%.
Therefore, about 2.28% of nurses complete the task within 4 minutes.
b) For the proportion of nurses requiring less than 5 minutes (i.e. for < 5 minutes):
$$z = \frac{5 - 10}{3} = \frac{-5}{3} = -1.66$$
‘p’ value (area to the left of z = – 1.66) = 0.0485.
In % = 4.85%.
Therefore, about 4.85% of nurses complete the task in less than 5 minutes, and about 100 – 4.85 = 95.15% require more than 5 minutes.
c) For the probability that a nurse completes the task in between 3 and 6 minutes:
For x = 3:
$$z = \frac{3 - 10}{3} = \frac{-7}{3} = -2.33$$
‘p’ value (area to the left of z = – 2.33) = 0.0099.
For x = 6:
$$z = \frac{6 - 10}{3} = \frac{-4}{3} = -1.33$$
‘p’ value (area to the left of z = – 1.33) = 0.0918.
Probability = 0.0918 – 0.0099 = 0.0819.
In % = 8.19%
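The nurse example can be checked with `statistics.NormalDist` (standard library, Python 3.8 and later) instead of a printed z table. The code uses the exact z values: –5/3 for part (b), and –7/3 and –4/3 for part (c). Because a table lookup typically uses z rounded to two decimals, the code gives about 4.78% for part (b) and 8.14% for part (c), slightly different from the table-based figures.

```python
from statistics import NormalDist

task_time = NormalDist(mu=10, sigma=3)          # minutes

p_within_4 = task_time.cdf(4)                   # P(X < 4), z = -2
p_below_5 = task_time.cdf(5)                    # P(X < 5), z = -5/3
p_3_to_6 = task_time.cdf(6) - task_time.cdf(3)  # P(3 < X < 6)
print(round(p_within_4, 4), round(p_below_5, 4), round(1 - p_below_5, 4), round(p_3_to_6, 4))
```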
f11,3,0.05 = 3.59.
f11,3,0.01 = 6.22.
f ratio = 0.27.
Since, the obtained “f” ratio value is less than the table “f” value at
CHISQUARE TEST (χ²)
INTRODUCTION
The Greek letter “χ” is called “chi”. As the test statistic is “χ²”, the square of “χ”, it is called the “Chisquare test.”
It was first introduced by the famous statistician “Karl Pearson” in 1900.
It is used for categorical data with 2 or more categories, including dichotomous data (data with only 2 categories),
e.g. Boys and Girls, Yes and No, Rural and Urban, etc.
It is used to check the prevalence among the data.
APPLICATION / UTILITY
It evaluates whether the observed frequencies in a sample differ significantly from the expected frequencies. In other words, it is used to test whether a significant difference exists between the observed number of responses and the expected number of responses.
CALCULATIONS
It is the sum of the squared deviations of each observed frequency from its expected frequency, each divided by the corresponding expected frequency.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where, χ² – Chisquare value, O – Observed frequency, E – Expected frequency,
Σ – Summation.
INTERPRETATION
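If the difference between the observed and expected values is zero or small, there is no significant difference. But if the difference is large, there will be a statistically significant difference. Whether the difference is large enough is decided by comparing the calculated χ² value with the table value.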
e.g. A doctor has a hypothesis that headache is equally common among males and females during examinations. In a sample of 100 students, he finds 58 girls and 42 boys suffering from headache. Does the finding support or contradict his hypothesis?
STEP 01. : Formulation of Hypothesis.
Null hypothesis – There is no difference between the boys and girls suffering from headache. H0 : B = G.
Research Hypothesis – There is a significant difference between the boys and girls suffering from headache. H1 : B ≠ G.
STEP 02. : Selection of appropriate test of significance.
As we have to compare the observed and expected values, we have to select the χ² test.
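STEP 03. : Calculation of expected frequencies.
If headache were equally common among boys and girls, the expected frequency would be 100 / 2 = 50 for each group.
STEP 04. : Calculation of χ² value.
$$\chi^2 = \frac{(O_B - E_B)^2}{E_B} + \frac{(O_G - E_G)^2}{E_G} = \frac{(42 - 50)^2}{50} + \frac{(58 - 50)^2}{50} = \frac{(-8)^2}{50} + \frac{(8)^2}{50} = \frac{128}{50} = 2.56$$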
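STEP 05.: Comparison of obtained χ² value with table value.
Df = K – 1 = 2 – 1 = 1. The table value of χ² for 1 degree of freedom at the 0.05 level is 3.84.
As the obtained χ² value (2.56) is less than the table value, we have to accept the null hypothesis, which states that there is no significant difference between the boys and girls suffering from headache.
Thus, the statistics support the doctor's hypothesis that headache is equally common among boys and girls during examinations.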
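The following is a minimal sketch of the χ² calculation for the headache example, assuming equal expected frequencies under the null hypothesis. The critical value 3.84 (1 degree of freedom, 0.05 level) is typed in from a χ² table. If SciPy is available, `scipy.stats.chisquare([42, 58])` computes the same statistic together with a p-value.

```python
observed = {"boys": 42, "girls": 58}
total = sum(observed.values())
expected = {k: total / len(observed) for k in observed}   # equal split under H0: 50 each

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
critical_005 = 3.84          # tabled chi-square value for 1 df at the 0.05 level
print(chi_square, df, chi_square < critical_005)   # 2.56 1 True
```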
VITAL STATISTICS
INTRODUCTION
It is an important branch of biostatistics which is necessary for documentary and legal purposes.
In India, the office of the Registrar General of India (RGI) was established in the year 1951 for collecting vital statistics and conducting the census.
The registration of birth and death was made compulsory and uniform all
over India in 1969.
DEFINITION
The branch of biostatistics which deals with the important events of the life
like birth, death, marriage, etc is called as vital statistics.
USES OR SIGNIFICANCE OF THE VITAL STATISTICS
To describe the community health.
To diagnose the community illness.
To find the solutions for social problems.
To plan or modify health programmes.
For maintenance of records.
BASIC REPRESENTATION OF VITAL STATISTICS
It is expressed either in terms of rate or ratio.
RATE
It refers to those calculations that involve the frequency of occurrence of some event in a specific period.
It is calculated by –
$$\text{Rate} = \frac{a}{a + b} \times k$$
Where, a – is Frequency of the event during a specific period of time, a + b – is the number of persons exposed to the risk of the event, k – is a constant, generally taken as 1000.
RATIO
It is the proportion between 2 or more events.
e.g. Male and Female ratio in a class, Patient and Doctors ratio in a city, Student
and Teacher ratio in a college, etc.
All these can be expressed through 3 indices, viz. –
Mortality
Morbidity
Fertility
MORTALITY
Birth and death are unique events (i.e. each occurs only once in a lifetime). Hence, they are easy to record.
ACDR = Annual Crude Death Rate.
$$\text{ACDR} = \frac{\text{Total number of deaths during the year}}{\text{Total mid-year population}} \times 1000$$
AIMR = Annual Infant Mortality Rate.
$$\text{AIMR} = \frac{\text{Number of deaths within 1 year of birth}}{\text{Total number of live births during the year}} \times 1000$$
MORBIDITY
It is difficult to record morbidity. Hence, WHO has laid down a few guidelines for recording morbidity. They are, viz. –
Person
Illness
Spells of illness
Duration
FERTILITY
AFR = Annual Fertility Rate.
$$\text{AFR} = \frac{\text{Number of births during the year}}{\text{Number of females in reproductive age}} \times 1000$$
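The rate formula is the same for each of these indices; only the numerator and denominator change. The following minimal sketch uses a hypothetical helper `vital_rate` and made-up one-year figures, for illustration only.

```python
def vital_rate(events, population_at_risk, k=1000):
    """Rate = a / (a + b) x k: number of events per k persons exposed to the risk."""
    return events * k / population_at_risk

# Hypothetical one-year figures, for illustration only
deaths, mid_year_population = 450, 60_000
infant_deaths, live_births = 36, 1_200

acdr = vital_rate(deaths, mid_year_population)   # Annual Crude Death Rate per 1000
aimr = vital_rate(infant_deaths, live_births)    # Annual Infant Mortality Rate per 1000 live births
print(acdr, aimr)   # 7.5 30.0
```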