Test of Significance
Normal Distribution
Type of Errors
Hypothesis & Types
Test of Hypothesis
Test of Significance
Parametric Tests
Nonparametric Tests
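A random variable X is said to have a normal distribution
with parameters $(\mu, \sigma^2)$ if its density function is given by
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
$-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$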
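As a minimal sketch, the density above can be evaluated directly in Python. The function name normal_pdf and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    # 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative values: the standard normal (mu = 0, sigma = 1) at its centre
peak = normal_pdf(0.0, 0.0, 1.0)
# The density is symmetric about mu
left = normal_pdf(-1.5, 0.0, 1.0)
right = normal_pdf(1.5, 0.0, 1.0)
```

The density is highest at x = mu and symmetric around it, which is why left and right agree.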
                    Reality True        Reality False
Researcher True     Correct             Type II error (β)
Researcher False    Type I error (α)    Correct: Power (1 – β)
Type I Error (α): the probability that the difference shown occurred due to
chance.
α is known as the level of significance; it is the probability that the test
statistic falls in the critical region (region of rejection).
α = P [reject H0 when it is true]
Type II Error (β): the probability that a true difference of a stated magnitude
existed but the study did not pick it up as statistically significant.
β = P [accept H0 when it is false]
Power of Study (1 – β): the probability that, if a true difference of a stated
magnitude existed, the study would have picked it up as statistically
significant.
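To make the link between α, β and power concrete, here is a minimal sketch for a two-sided z test, using only Python's standard library. The z test setting, the function name z_test_power and the numbers (a true difference of 2.8 standard errors) are assumptions chosen for illustration, not values from the slides.

```python
from statistics import NormalDist

def z_test_power(delta, se, alpha=0.05):
    """Power (1 - beta) of a two-sided z test when the true difference is delta."""
    std = NormalDist()                      # standard normal distribution
    z_crit = std.inv_cdf(1 - alpha / 2)     # critical value, about 1.96 for alpha = 0.05
    shift = delta / se                      # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)

alpha = 0.05
# With no true difference, the rejection probability is the Type I error rate
reject_under_h0 = z_test_power(0.0, 1.0, alpha)
# Hypothetical true difference of 2.8 standard errors
power = z_test_power(2.8, 1.0, alpha)
beta = 1 - power                            # Type II error probability
```

Under these assumptions, a true difference of about 2.8 standard errors gives a power close to 80%, so β is close to 20%.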
A hypothesis is a statement of the predicted
relationship between two or more variables.
It is a definite statement about a population
parameter.
Steps for the t test:
Calculate the difference between the two means.
Calculate the SE of the difference of means. This gives the variation that
can occur purely by chance between the means of random
samples from the same population.
Find 't' from $t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$
$SE(\bar{x}_1 - \bar{x}_2)$:
Find the sum of squares for both groups.
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{SD_1^2 / n_1 + SD_2^2 / n_2}$
Calculate the pooled d.f. from the formula $n_1 + n_2 - 2$.
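Test statistic for an experiment comparing two samples of
equal size:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 (1/n_1 + 1/n_2)}}$
$S^2 = \frac{\sum (x - \bar{x}_1)^2 + \sum (x - \bar{x}_2)^2}{(n_1 - 1) + (n_2 - 1)}$
Two samples of unequal size:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2 \, \frac{n_1 + n_2}{n_1 n_2}}}$ where $S^2 = \frac{\sum (x - \bar{x}_1)^2 + \sum (x - \bar{x}_2)^2}{(n_1 - 1) + (n_2 - 1)}$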
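The pooled-variance statistic can be sketched in a few lines of Python. The function name pooled_t and the two small data sets are hypothetical, made up only to show the calculation.

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sum of squared deviations of each group around its own mean
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = (n1 - 1) + (n2 - 1)                 # pooled degrees of freedom, n1 + n2 - 2
    s2 = (ss1 + ss2) / df                    # pooled variance S^2
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))   # same as sqrt(S^2 * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se, df

# Hypothetical data, not from the slides
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]
t_value, df = pooled_t(group_a, group_b)
```

Because 1/n1 + 1/n2 equals (n1 + n2) / (n1 n2), the same function covers both the equal-size and unequal-size forms.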
A study was conducted to find out whether there is a significant
difference in the serum cholesterol of sedentary workers and hard
laborers. A representative sample of 150 sedentary workers had a
mean serum cholesterol of 230 mg% with an SD of 15
mg%; another representative sample of 100 hard laborers had a
mean serum cholesterol level of 210 mg% with an SD of 20
mg%. On the basis of this study, can you say that serum
cholesterol in the two groups is significantly different?
Group 1: n = 150, Mean = 230, SD = 15
Group 2: n = 100, Mean = 210, SD = 20
$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{(15 \times 15 / 150) + (20 \times 20 / 100)} = \sqrt{5.5} = 2.345$
$t = (230 - 210) / 2.345 = 20 / 2.345 = 8.53$
$t_{cal} = 8.53 > t_{tab} = 1.96$
Significant.
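The arithmetic of this example can be checked with a short Python sketch that works from the summary statistics given above. The helper name se_diff is an assumption made for illustration; the critical value 1.96 is the one quoted in the example.

```python
import math

def se_diff(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Summary statistics from the serum cholesterol example
n1, mean1, sd1 = 150, 230, 15   # sedentary workers
n2, mean2, sd2 = 100, 210, 20   # hard laborers

se = se_diff(sd1, n1, sd2, n2)            # square root of (1.5 + 4.0)
t_cal = (mean1 - mean2) / se
significant = abs(t_cal) > 1.96           # tabulated value quoted in the example
```

With samples this large the statistic is far above the tabulated value, so the conclusion does not depend on rounding in the intermediate steps.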
Calculate the difference in each set of paired observations.
Find the mean difference ($\bar{d}$).
Find $SE(\bar{d}) = \sqrt{S^2 / n}$
$t = \bar{d} / SE(\bar{d}) = \bar{d} / \sqrt{S^2 / n}$
Calculate df = (n-1)
Refer to the 't' table and find the probability of the calculated 't'
corresponding to the degrees of freedom.
Is there any significant difference between the pre-test and post-test scores?
Pre test: 6 8 8 6 5 9 6 7 6 6 4 8
Post test: 8 8 10 7 6 10 9 8 5 7 4 6
Solution: -
Diff (di): 2 0 2 1 1 1 3 1 -1 1 0 -2
$\bar{d} = \sum d_i / n = 0.75$, $s^2 = 1.84$
$t_{cal} = 0.75 / \sqrt{1.84 / 12}$
$t_{cal} = 1.91 < t_{tab} = 2.2$
Not significant
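The paired calculation can be reproduced with a minimal Python sketch using the pre-test and post-test scores from this example. Variable names are illustrative; the tabulated value 2.2 is the one quoted above for 11 degrees of freedom.

```python
import math

pre = [6, 8, 8, 6, 5, 9, 6, 7, 6, 6, 4, 8]
post = [8, 8, 10, 7, 6, 10, 9, 8, 5, 7, 4, 6]

# Difference within each pair (post minus pre)
diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
d_bar = sum(diffs) / n                                   # mean difference
s2 = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)      # sample variance of the differences
t_cal = d_bar / math.sqrt(s2 / n)
df = n - 1
significant = abs(t_cal) > 2.2                           # tabulated value quoted in the example
```

The test uses only the within-pair differences, so variation between subjects does not enter the standard error.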
One way ANOVA (Analysis of variance)
Two way ANOVA
Multivariate Analysis
Discriminant Analysis
Principal Component Analysis
Factor Analysis
Cluster Analysis
To find out if there is any correlation between the two
variables under study.
Karl Pearson, a British biometrician, developed a formula
for correlation.
The correlation coefficient between two random variables X
and Y is usually denoted by $r(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$
X – Independent Variable, Y – Dependent Variable
Assumptions: -
The variables x and y under study are linearly related.
Both variables follow a normal distribution.
They are not independent of each other but are related in a causal
fashion.
Range -1 to 1 usually represented as -100% (perfect negative
correlation) to 100% (perfect positive correlation)
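As a minimal sketch, the correlation coefficient can be computed from its definition as covariance divided by the product of the two standard deviations. The function name pearson_r and the data are hypothetical and chosen so that the two extreme values of the range are easy to see.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: Cov(X, Y) / (sd_X * sd_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data, not from the slides
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # y rises exactly in line with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # y falls exactly in line with x
```

A perfectly increasing straight line gives r = 1 and a perfectly decreasing one gives r = -1, the two ends of the range.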
The linear regression model: $Y = a + bX$,
where Y is the dependent variable and X is the independent
variable.
b is the slope, $b = cov(x, y) / var(x)$
$a = \bar{Y} - b\bar{X}$
a is the intercept, the value of Y when X = 0.
The slope b gives the change in Y for each unit change in X.
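The slope and intercept formulas can be sketched directly in Python. The function name fit_line and the data points are hypothetical; the points lie exactly on Y = 1 + 2X so the result is easy to verify by eye.

```python
def fit_line(xs, ys):
    """Least-squares line Y = a + b X using b = cov(x, y) / var(x) and a = mean(Y) - b * mean(X)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x            # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # intercept: value of Y when X = 0
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2 X
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
a, b = fit_line(x, y)
predicted = a + b * 10            # fitted value at X = 10
```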
U test (Mann-Whitney): tests the difference in rank scores of two different groups
Wilcoxon signed rank test: tests the within-pair difference in ranks of two
related groups
Median test: tests the difference between the medians of two different
groups
Kruskal-Wallis test: tests the difference in ranks of three or more
independent groups
Chi square test: tests for a difference between two categorical variables
McNemar chi square: tests the within-pair difference in proportions for paired
2x2 data
Pearson chi square test:
Qualitative variables with 2 or more categories
The number of observations in each cell of the table must be known
Frequencies in the different categories should be mutually
exclusive and exhaustive
$\chi^2 = \sum (O_i - E_i)^2 / E_i$
where $O_i$ = observed value
$E_i = R_i \times C_j / N$ (row total times column total, divided by the grand total)
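A minimal Python sketch of the chi square statistic for a contingency table follows. The function name chi_square and the 2x2 counts are hypothetical, invented only to show the arithmetic. The degrees of freedom, (rows - 1) x (columns - 1), are the standard rule and are not stated on the slide.

```python
def chi_square(table):
    """Pearson chi square statistic for a table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2x2 counts, not taken from the slides
counts = [[10, 20],
          [20, 10]]
statistic, df = chi_square(counts)
```

Here every expected count is 15, so each cell contributes 25 / 15 and the statistic is about 6.67 on 1 degree of freedom.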
Consider data where we want to find an association
between prenatal care received by the mother and the survival
status of infants at one month (Dead / Alive).
Level of significance: 5%
Not significant
Variable                  Research Question    Parametric                        Nonparametric
Quantitative Variables    A vs B               t test                            U test
                          Pre vs Post          Paired t test                     Signed rank test
                          A vs B vs C          ANOVA with multiple comparison    Kruskal-Wallis test
Qualitative Variables     Proportion           Z test