19 MAY - NR - Tests of Significance
In applied investigations, one often wants to compare some characteristic (mean, variance) of a
group with a specified value, or to compare it between two groups. In making such a comparison, one
cannot rely on the mere numerical magnitude of the index of comparison, such as the mean or variance.
The reason is that each group is represented by a sample of observations, and the numerical value will
change if another sample is drawn. These sampling fluctuations affect the observed differences
between groups and can conceal the real differences. Statistical science provides an objective procedure
for deciding whether an observed difference signifies a real difference between the groups. Such a
procedure is called a 'test of significance'.
A. Student's t-test
A t-test is a statistical hypothesis test in which the test statistic follows a Student's t distribution
under the null hypothesis. It is used when the test statistic would not follow a normal distribution,
typically because the population SD is unknown and must be estimated from the sample. The t-test is
used for small samples, for which the size 'n' is < 30.
The t-statistic was introduced in 1908 by William S. Gosset. Common uses of the t-test are:
1. A one sample test: to know whether the mean of a population has a value specified in a null
hypothesis.
2. A two sample test: to know whether the means of two populations are equal.
3. Another test is to see whether the difference between two responses measured on the same
statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer
patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor
size for many of the patients to be smaller following the treatment. This is often referred to as
the "paired" or "repeated measures" t-test.
4. A test of whether the slope of a regression line differs significantly from 0.
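For a one sample test, the test statistic is

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n-1}}, \qquad df = n - 1$$

Note: Here the SD is calculated with the divisor 'n' only:

$$S = SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

If the SD is calculated using

$$S = SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$

the formula for calculating 't' will be

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}$$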
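The two forms of the statistic give the same value of t. A short derivation is sketched below, writing S_n for the SD with divisor n and S_{n-1} for the SD with divisor n − 1. These subscripts are notation introduced only for this sketch and are not used elsewhere in these notes.

```latex
S_n^2 = \frac{\sum (x - \bar{x})^2}{n}, \qquad
S_{n-1}^2 = \frac{\sum (x - \bar{x})^2}{n-1}
\;\Longrightarrow\; n\,S_n^2 = (n-1)\,S_{n-1}^2

\frac{S_n}{\sqrt{n-1}} = \frac{S_{n-1}}{\sqrt{n}}
\;\Longrightarrow\;
\frac{\bar{x} - \mu_0}{S_n / \sqrt{n-1}} = \frac{\bar{x} - \mu_0}{S_{n-1} / \sqrt{n}}
```

So the choice of divisor does not matter, provided the matching formula for 't' is used.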
The calculated t value is compared with t2α for α = 0.05 or α = 0.01 if the alternative hypothesis used is
H1: μ < μ0 or H1: μ > μ0 [known as one tailed tests].
The calculated t value is compared with tα for α = 0.05 or α = 0.01 if the alternative hypothesis used
is H1: μ ≠ μ0 [known as two tailed tests].
From 't' tables, the values of tα for two tailed tests will be: for α = 0.05, tα will be t0.05; for α = 0.01,
tα will be t0.01.
From 't' tables, the values of t2α for one tailed tests will be: for α = 0.05, t2α will be t0.10; for α = 0.01,
t2α will be t0.02.
If the calculated t value (ignoring its sign) is < tα or t2α, as applicable, for 'df' (n-1), then H0 is accepted.
If the calculated t value (ignoring its sign) is > tα or t2α, as applicable, for 'df' (n-1), then H0 is rejected.
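The choice of table column and the decision rule can be sketched in Python. This is a minimal illustration, assuming a two tailed 't' table as described in these notes. The function names `table_column` and `decide` are hypothetical helpers written for this sketch, not part of any library.

```python
def table_column(alpha, tails):
    """Return the column of a two tailed 't' table to read.

    A two tailed test reads the alpha column directly; a one tailed
    test at level alpha reads the 2*alpha column (e.g. t0.10 for 5 %).
    """
    if tails == 2:
        return alpha
    if tails == 1:
        return 2 * alpha
    raise ValueError("tails must be 1 or 2")


def decide(t_calculated, t_table):
    """Compare the size of the calculated t with the table value."""
    return "reject H0" if abs(t_calculated) > t_table else "accept H0"


print(table_column(0.05, tails=1))   # column to read for a one tailed 5 % test
print(decide(2.5, 2.262))            # illustrative numbers only
```

The comparison uses the absolute value of t, following the convention in these notes of ignoring the sign.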
Always write the inference once the analysis is over.
H0: μ = μ0 (0); H1: μ > μ0 (0); df = n − 1 = 11; x̄ = 2.58, S = 2.95

$$t = \frac{\bar{x} - \mu_0}{S / \sqrt{n-1}} = \frac{2.58}{2.95} \times \sqrt{11} \approx 2.90$$

The 't' value from the table is 1.796. t > 1.796, hence H0 is rejected.
Example
It was observed from a random sample of size 5 that the mean was 51.4 mm and the SD was 6.37
mm. Test whether the mean of the population is 55 mm. (t0.05 = 2.78 for 4 df)
x̄ = 51.4, S = 6.37, n = 5, μ0 = 55
H0: μ = μ0 (55 mm); H1: μ ≠ μ0

$$t = \frac{51.4 - 55}{6.37 / \sqrt{4}} = -1.13, \qquad df = 4$$

Ignoring the sign, t = 1.13 < 2.78.
Hence H0 is accepted: the sample gives no evidence that the mean of the population differs from 55 mm.
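The calculation in this example can be checked with a few lines of Python. This is a sketch only, assuming the quoted SD of 6.37 mm was calculated with the divisor 'n', so that the √(n−1) form of the formula applies. The function name `one_sample_t` is a hypothetical helper.

```python
import math


def one_sample_t(mean, sd_n, n, mu0):
    """t statistic from summary values, where sd_n uses the divisor n."""
    return (mean - mu0) / (sd_n / math.sqrt(n - 1))


t = one_sample_t(mean=51.4, sd_n=6.37, n=5, mu0=55.0)
t_table = 2.78  # two tailed 5 % value for 4 df, as quoted in the example

# Compare the size of t (ignoring sign) with the table value.
decision = "reject H0" if abs(t) > t_table else "accept H0"
print(f"t = {t:.2f}, df = 4, decision: {decision}")
```

Run as written, this prints t = -1.13 and the decision "accept H0", which agrees with the hand calculation.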
Example
A certain type of rat shows a mean gain in weight of 6.5 g during the first 6 months of life. Twelve rats
were fed a particular diet from birth until they reached 6 months of age. The following weight gains (g)
were found in them.
5.5, 6.2, 5.4, 5.8, 6.4, 6.0, 6.2, 5.9, 6.7, 6.2, 6.1, 6.5
Is there any reason to believe that the diet causes a decrease from the normal gain? Test using the 5 %
level of significance. (One tailed test: t2α = t0.10 = 1.796 for 11 df)
H0: μ = μ0 (6.5 g); H1: μ < μ0
Ignoring the sign, t = 3.833, and it is > the t2α value of 1.796. Hence H0 is rejected.
Inference: The diet causes a decrease in weight gain.
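The working for this example is not shown step by step, so a short Python sketch of it is given here. It uses only the standard library and the divisor n − 1 form of the SD with the matching √n formula. The variable names are chosen for this illustration.

```python
import math
import statistics

# Weight gains (g) of the twelve rats fed the diet
gains = [5.5, 6.2, 5.4, 5.8, 6.4, 6.0, 6.2, 5.9, 6.7, 6.2, 6.1, 6.5]
mu0 = 6.5                      # normal mean gain stated in the example
n = len(gains)

mean = statistics.mean(gains)
sd = statistics.stdev(gains)   # sample SD, divisor n - 1
t = (mean - mu0) / (sd / math.sqrt(n))

t_table = 1.796                # one tailed 5 % value for 11 df
# H1 is mu < mu0, so a large negative t leads to rejection.
decision = "reject H0" if t < -t_table else "accept H0"
print(f"mean = {mean:.3f}, SD = {sd:.3f}, t = {t:.3f}, decision: {decision}")
```

The sketch gives a sample mean of 6.075 g and t of about -3.833, whose size matches the value quoted above.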
Exercises
1. A random sample of size 15 from a normal population yielded a mean value of 2.23 and a
variance of 7.33. Does this support the hypothesis that the population mean is ‘zero’? (t0.05 =
2.145 for 14 df)
2. Nine students are chosen at random from a group and their heights are found to be 63, 63, 64,
65, 69, 69, 70, 70, 71. Test whether the mean height may be taken to be 65. (t0.05 = 2.306 for 8
df)
3. A health survey revealed that the serum protein value of children is 7.0 g/100 ml. A group of 16
children showed a mean serum protein value of 7.4 g/100 ml with SD 0.56. Test the sample mean
against the population mean. (t0.05 = 2.131 for 15 df)
4. The observed weights of ten objects were 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Test whether the
population mean is 66. (t0.05 = 2.262 for 9 df)
A paired t-test is used to compare two population means where we have two samples in which
observations in one sample are paired with observations in the other sample. Examples of this are:
• Before-and-after observations on the same subjects (e.g. students’ diagnostic test results before and
after a module or course).
• A comparison of two different methods of measurement or two different treatments where the
measurements/treatments are applied to the same subjects (e.g. blood pressure measurements using a
stethoscope and a Dinamap).
Suppose a sample of ‘n’ students were given a test before studying a module and then again after
completing the module. We want to find out whether, in general, the teaching leads to improvements in
students’ knowledge/skills (i.e. test scores). We can use the results from our sample of students to draw
conclusions about the impact of this module in general. Let B = test score before the module, A = test
score after the module.
Let 'x' denote the differences. Take x = Value A − Value B if the 'A' values are generally greater than
the 'B' values. On the other hand, if the 'B' values are generally greater than the 'A' values, take
x = Value B − Value A. [Note this]
The hypotheses are set as: H0: μx = 0; H1: μx ≠ 0, where μx is the population mean of the differences.
1. Calculate the difference (x = A − B) OR (x = B − A) between the two observations in each pair, making
sure to keep the sign of each difference [positive values are taken as positive and negative values are
taken as negative].

$$t = \frac{\bar{x} - 0}{S / \sqrt{n-1}}, \qquad df = n - 1$$

Here x̄ and S are the mean and the SD (divisor 'n') of the differences.
Example
A group of 10 children was tested to find the number of digits they could memorize and repeat
after hearing them once. A test was given initially and their performance was noted. The children
were then given special training, and the test was conducted again after a week. The performance was
noted again. Is there any significant effect due to the training? (t0.05 = 2.262 for 9 df)
Test 1: 6 5 4 7 8 6 7 5 6 8
Test 2: 7 7 6 7 9 6 8 6 6 10
H0: μx = 0; H1: μx ≠ 0, where x = Test 2 − Test 1
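The working for this example can be sketched in Python as follows. This is an illustration only. It takes x = Test 2 − Test 1, since the second scores are generally higher, and uses the divisor 'n' form of the SD with the √(n−1) formula. The variable names are chosen for this sketch.

```python
import math
import statistics

test1 = [6, 5, 4, 7, 8, 6, 7, 5, 6, 8]     # scores before training
test2 = [7, 7, 6, 7, 9, 6, 8, 6, 6, 10]    # scores after training

# Signed difference for each child
x = [after - before for before, after in zip(test1, test2)]
n = len(x)

mean_x = statistics.mean(x)
sd_n = statistics.pstdev(x)                # SD with divisor n
t = (mean_x - 0) / (sd_n / math.sqrt(n - 1))

t_table = 2.262                            # two tailed 5 % value for 9 df
decision = "reject H0" if abs(t) > t_table else "accept H0"
print(f"mean difference = {mean_x}, t = {t:.3f}, decision: {decision}")
```

Under these assumptions the mean difference is 1 and t is about 3.873, which is greater than 2.262. H0 is therefore rejected, and the inference is that the training has a significant effect.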
Example
A sleeping drug and a neutral control preparation were administered to 10 patients selected from a
hospital. The numbers of hours they slept are given below. Test
whether the drug has any greater effect than the control preparation. Use the 5 % level of significance.
(t0.05 = 2.26 for 9 df)
Drug: 10.6 7.5 9 5.4 6.1 10.2 7.1 9.7 8.4 7.9
Control: 8.6 7.3 9.4 5.1 5.4 9 6.5 7.9 8.7 6.9
H0: μx = 0; H1: μx ≠ 0, where x = Drug − Control; t = 2.78; t0.05 = 2.26; df = 9. 't' is > t0.05. Hence H0 is rejected.
Inference: The drug has a significant effect on the hours of sleep.
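The value t = 2.78 can be reproduced by hand. The working below is a sketch that takes x = Drug − Control for each patient and uses the divisor 'n' form of the SD.

```latex
x:\; 2.0,\; 0.2,\; -0.4,\; 0.3,\; 0.7,\; 1.2,\; 0.6,\; 1.8,\; -0.3,\; 1.0

\sum x = 7.1, \qquad \bar{x} = \frac{7.1}{10} = 0.71, \qquad \sum x^2 = 10.91

\sum (x - \bar{x})^2 = \sum x^2 - n\,\bar{x}^2 = 10.91 - 5.041 = 5.869

S = \sqrt{\frac{5.869}{10}} \approx 0.766, \qquad
t = \frac{\bar{x} - 0}{S / \sqrt{n-1}} = \frac{0.71}{0.766 / \sqrt{9}} \approx 2.78
```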
1st test: 50 52 53 60 65 67 48 69 72 80
2nd test: 65 55 65 65 60 67 49 82 74 86
2. A weight reduction teaching was conducted. The weights before and after of 10 people were
Hypothesis testing is a method for testing a claim about a parameter in a population, using data measured
in a sample. Through it, we test a hypothesis by determining how likely it is that the sample statistic
could have been obtained from the population if the hypothesis were true.
The method of hypothesis testing can be summarized in five steps.
Step 1: State the hypotheses.
Step 2: Set the criteria for a decision.
Step 3: Decide the appropriate test criterion
Step 4: Calculate the value of the test statistic using the appropriate formula
Step 5: Decide
Step 1:
We begin by stating the value of a population mean in a null hypothesis, which we presume is true. The
null hypothesis (H0) is a statement about a population parameter, such as the population mean, that is
assumed to be true.
We then test whether the value stated in the null hypothesis is likely to be true or is contradicted by
the data. That is, we test whether the parameter is less than, greater than, or not equal to the value
stated in the null hypothesis.
One of these will be stated in the alternative hypothesis. The alternative hypothesis is thus a rival
hypothesis which states what we think is wrong about the null hypothesis.
Step 2:
To set the criteria for a decision, we state the level of significance for the test. In research studies,
this level is typically set at 5 % (0.05) or 1 % (0.01).
Step 3:
The test statistic is a mathematical formula which allows researchers to determine how likely the
sample outcome would be if the null hypothesis were true.
The value of the test statistic is used to make a decision regarding the null hypothesis.
Commonly used test criteria are:
Z test
't' test
Chi square test
F test
Step 4:
Obtain the value of the test statistic using the formula for the selected test criterion.
Step 5:
Take a decision.
For this, the calculated value of the statistic is compared with the corresponding value in the standard
statistical table.
If the calculated value of the statistic is LESS THAN OR EQUAL to the table value, the Null Hypothesis is accepted.
If the calculated value of the statistic is GREATER THAN the table value, the Null Hypothesis is rejected.
Parametric methods deal with the estimation of population parameters (like the mean).
Non-parametric methods are distribution free methods. They rely on the ordering (ranking) of
observations.
More specifically, the distribution of the data determines the choice between parametric and
non-parametric procedures.
If the populations can be assumed to be normally distributed, then parametric methods can be used.
If we are not sure, or we suspect that they are not normally distributed, then we use non-parametric
methods.
Categorical (nominal) or ordinal scale data demand the use of non-parametric methods.
Even for interval and ratio scale data, non-parametric methods must be used when population normality
cannot be assumed. The reverse does not hold: parametric methods cannot be used for nominal or
ordinal data.
3. Paired test for testing difference between two related samples – ‘Z’ test
3. Paired ‘t’ test for testing difference between two related samples.
‘F’ test:
2. Wilcoxon signed-rank test, which tests the difference between two related variables
4. Median test
5. Chi-square test
7. Mann-Whitney test
8. Kolmogorov-Smirnov test
9. McNemar test
Statistical Packages
Statistical packages are collections of software designed to aid statistical analysis. Most
quantitative and statistical analysis relies on statistical packages for its execution. An understanding of
statistical packages is essential to the correct and efficient application of many quantitative and
statistical methods.
Some important packages are STATISTICA, SPSS, SAS, MATLAB, Minitab, R software, StatistiXL and
Vassarstat.net.
A. SPSS PACKAGE
SPSS stands for Statistical Package for the Social Sciences. The first version of SPSS was released in 1968.
SPSS is now owned by IBM. IBM SPSS Statistics 21.0 was released in August 2012.
With SPSS we can analyse data in three basic ways.
Describe data using descriptive statistics.
Examine relationships between variables.
Compare groups to determine whether there are significant differences between them, for example with
the t-test or ANOVA.
There are two different windows in SPSS
Data Editor Window: In the data editor we can create variables, enter data and carry out statistical
functions.
Output Viewer Window: The output window shows the results produced by the analyses that were run.
Prepared By
Mr. Balasubramanian N K
External Faculty, Statistics