Module-6-Inferential-Statistics-Parametric-Test
DATA ANALYSIS
MODULE 4
INFERENTIAL STATISTICS - PARAMETRIC TEST
We use inferential statistics to infer from sample data what the population might look like, or to draw a conclusion about the probability that an observed difference between two groups is real. In other words, we use inferential statistics to generalize from our collected data to more general conditions. In inferential statistics, the statistical tests we apply depend on whether the data are normally distributed or not; these are the parametric tests and the non-parametric tests.
In the previous module, your teacher presented you with the different test statistics under
parametric and non-parametric tests. In this module, selected statistical tests under parametric
tests will be discussed first then followed by the selected statistical tests under non-parametric
tests. Manual computation with the use of statistical tests as well as on how to use SPSS to do
these tests will be presented.
PARAMETRIC TEST
The parametric tests are tests that require a normal distribution, or in which we assume that the data are normally distributed as mentioned above, and whose levels of measurement are expressed as interval or ratio data. We use a parametric test when the distribution is normal, i.e., when the skewness is equal to zero and the kurtosis is equal to 0.265.
We prefer the parametric tests because they are more powerful statistical tools than the non-parametric tests. But when do we use the parametric tests? We can use the different parametric tests when two major conditions are met: one, we assume that the data are normally distributed, and two, the data are interval or ratio data as stated above.
A. t-test
One of the most commonly used parametric tests in statistics is the t-test, also known as the “Student t-test”. In this section, we will learn about two forms of the t-test: the t-test for independent samples and the correlated t-test, also known as the paired t-test. We use these tests to determine whether the means of two quantitative groups differ significantly or not. In doing these tests, our assumption is that the data are normally distributed.
What is the t-test for independent samples? It is a test of the difference between two independent groups. This means that we are going to compare two means, say x̄1 against x̄2.
This test was introduced by William S. Gosset under the pen name “Student”, hence the t-test is also called the “Student t-test”. It is the test most commonly used by researchers.
According to Broto (2007), the t-test for independent samples is used when we compare the means of two independent groups and the distribution is normal, where Sk = 0 and Ku = 0.265. The test is used when the data are interval or ratio, with a sample size of less than 30.
However, as n becomes larger, the t-distribution approaches the z-distribution. Thus, the t-test can be used not only when n is less than thirty (30) but also when n is greater than or equal to thirty (30), provided the population standard deviation is not known.
In our previous lesson, the divisor n – 1 in the formulas for the variance and the standard deviation is what we call the degrees of freedom. The degrees of freedom is the number of values that are free to vary.
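The n – 1 divisor described above is the same one Python's built-in statistics module uses for the sample variance. As a quick sketch (plain Python, not part of the module's SPSS workflow; the data values are hypothetical):

```python
import statistics

data = [2, 4, 6]   # a small hypothetical sample, mean = 4
n = len(data)

# the sample variance divides the sum of squared deviations
# by n - 1 (the degrees of freedom), not by n
ss = sum((x - statistics.mean(data)) ** 2 for x in data)   # sum of squares = 8
sample_var = ss / (n - 1)                                  # 8 / 2 = 4.0

print(sample_var == statistics.variance(data))  # True: statistics.variance also uses n - 1
```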
The formula for the t-test for two independent samples is:

t = (x̄1 − x̄2) / √[ ((SS1 + SS2) / (n1 + n2 − 2)) (1/n1 + 1/n2) ]

where:
t = the t-value
x̄1, x̄2 = the means of the first and second group
SS1 = ∑x1² − (∑x1)²/n1, the sum of squares of group 1
SS2 = ∑x2² − (∑x2)²/n2, the sum of squares of group 2
n1, n2 = the sizes of the two groups
Example 1:
The following are the scores of 10 male and 10 female third-year Computer Science students of Prof. Alyssa Jandy Angulo in the preliminary examination in Statistics and Probability.
Problem: Is there a significant difference between the performance of the male and
female third year Computer Science students in the preliminary examination in Statistics and
Probability?
1. Hypotheses:
Ho: There is no significant difference between the performance of the male and female
third year Computer Science students in the preliminary examination in Statistics and
Probability.
Ha: There is a significant difference between the performance of the male and female
third year Computer Science students in the preliminary examination in Statistics and
Probability.
2. Level of Significance:
Based on the problem;
α = 0.05
df = n1 + n2 − 2 = 10 + 10 – 2 = 18
3. Compute the t-value using the t-test for two independent samples:
Male (x1)   (x1)²      Female (x2)   (x2)²
28          784        24            576
36          1296       18            324
34          1156       22            484
32          1024       10            100
8           64         20            400
24          576        6             36
24          576        14            196
20          400        4             16
18          324        12            144
34          1156       26            676
∑x1 = 258   ∑x1² = 7356   ∑x2 = 156   ∑x2² = 2952
x̄1 = 25.8                 x̄2 = 15.6

SS1 = ∑x1² − (∑x1)²/n1 = 7356 − (258)²/10 = 7356 − 66564/10 = 7356 − 6656.4 = 699.6

SS2 = ∑x2² − (∑x2)²/n2 = 2952 − (156)²/10 = 2952 − 24336/10 = 2952 − 2433.6 = 518.4

t = (x̄1 − x̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) (1/n1 + 1/n2) ]
  = (25.8 − 15.6) / √[ ((699.6 + 518.4)/18) (1/10 + 1/10) ]
  = 10.2 / √[ (67.67)(0.2) ]
  = 10.2 / 3.679
  = 2.773
4. Decision Rule:
Reject the Ho if |t-computed| is greater than the |t-critical| otherwise accept the Ho.
5. Conclusion:
Since the t-computed value of 2.773 is greater than the t-tabular value of 2.101 at the 0.05 level of significance with 18 degrees of freedom, the null hypothesis is rejected. This means that there is a significant difference between the performance of the male and female third-year Computer Science students in the preliminary examination in Statistics and Probability.
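The hand computation above can be checked in a few lines of plain Python (a verification sketch, not part of the module, which does this by hand and in SPSS). The data are the ten male and ten female scores from the example:

```python
import math

male   = [28, 36, 34, 32, 8, 24, 24, 20, 18, 34]
female = [24, 18, 22, 10, 20, 6, 14, 4, 12, 26]

def pooled_t(x1, x2):
    """t-test for two independent samples, pooled-variance form used in the module."""
    n1, n2 = len(x1), len(x2)
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n1   # SS1 = 699.6
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n2   # SS2 = 518.4
    pooled = (ss1 + ss2) / (n1 + n2 - 2)               # pooled variance
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))         # standard error of the difference
    return (sum(x1) / n1 - sum(x2) / n2) / se

print(round(pooled_t(male, female), 3))  # 2.773, matching the manual result
```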
Example 2.
To compare the batting average of two review centers for engineering board
examination, a graduate obtains the following data on the previous batch of examinees.
Test the hypothesis that RAR Review Center has a better track record than JTH Review
Center. Use a 5% (0.05) level of significance.
RAR JTH
Sample Size 10 12
Mean Rating 94.5 90
Standard Deviation 6.2 7.1
Solution:
1. Hypotheses:
Ho: The RAR Review Center has the same track record to JTH Review Center.
Ha: The RAR Review Center has a better track record than JTH Review Center.
2. Level of Significance:
α = 0.05
df = 𝑛1 + 𝑛2 − 2 = 10 + 12 – 2 = 20
3. Compute the t-value using the t-test for two independent samples:
Since only the standard deviations are given, the sums of squares are obtained as SS1 = (n1 − 1)s1² = (9)(6.2)² = 345.96 and SS2 = (n2 − 1)s2² = (11)(7.1)² = 554.51.

t = (x̄1 − x̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) (1/n1 + 1/n2) ]
  = (94.5 − 90) / √[ ((345.96 + 554.51)/20) (1/10 + 1/12) ]
  = 4.5 / √[ (45.02)(0.1833) ]
  = 4.5 / 2.87
  = 1.57

4. Decision Rule:
If the t-computed value is greater than or equal to the t-critical value, reject the Ho in favor of Ha.
5. Conclusion:
Since the t-computed value of 1.57 is less than the one-tailed t-critical value of 1.725 at the 0.05 level of significance with 20 degrees of freedom, the Ho is not rejected. There is not enough evidence that the RAR Review Center has a better track record than the JTH Review Center.
In our previous lesson, to test whether two independent samples differ significantly in their means, the t-test for two independent samples was used. This test statistic falls under the parametric tests, so we use it when the distribution of the data is normal. However, the manual computation for determining whether two independent samples differ significantly is very tedious. We can instead use SPSS to test whether two independent samples are significantly different or not. Let us take this example.
A teacher wants to determine whether the group of male students and female students in his class
“Data Analysis” performs differently based on the score of their midterm examination. Here is the data
based on the class record of a teacher.
Male   Female
28     24
36 18
34 22
32 10
8 20
24 6
24 14
20 4
18 12
34 26
Our research problem would be: “Is there a significant difference between the scores of the male and the female students?” So, our hypotheses would be:
Ho: There is no significant difference between the scores of male and female students.
Ha: There is a significant difference between the scores of male and female students.
Step 1. Make an appropriate variable to be used in the variable view of SPSS and enter the data in the data
view of SPSS.
Note: In this output view, we will focus on the middle part of the table (t = 2.773, df = 18, and Sig. (2-tailed) = 0.013) in deciding whether to reject or accept the null hypothesis.
Conclusion: Since the p-value of 0.013 is less than the level of significance of 0.05, we reject the null hypothesis in favor of the alternative hypothesis; therefore, there is a significant difference between the scores of the male and female students.
Section:________________________ Score:__________
Practice Exercises
t-test for Independent Sample (Two-tailed test)
Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
TTM 35 33 34 25 26 35 43 28 39 36 28 18 36 34 42
OBE 32 36 45 36 36 25 37 42 33 28 37 29 32 43
*TTM – traditional teaching method OBE – Outcome Based-Education
Test the null hypothesis that there is no significant difference between the
performance of the two groups of students under traditional and outcomes-based education
methods of teaching. Use t-test at 0.05 level of significance.
Note: Use both the manual computation and with the use of SPSS.
Another parametric test is the t-test for correlated samples. This test is applied to a single group of samples and can be used in the evaluation of a certain program or treatment. Since the t-test for correlated samples is another parametric test, its conditions must be met: the distribution should be normal, and the data should be interval or ratio.
This test is applied when the mean before is compared with the mean after in evaluating a group, as shown in this simple diagram:
Compare the pre-test and post-test and decide whether there is a significant or
non-significant difference between the pre-test and post-test using the t-test for correlated
sample known as the paired t-test.
The t-test for correlated samples is used to find out if a difference exists
between the before and after means. If there is a difference in favor of the post
test then the intervention or treatment or method is effective. However, if there is
no significant difference then the treatment or method is not effective.
The formula for the t-test for correlated samples is given by:

t = D̄ / √[ (∑D² − (∑D)²/n) / (n(n − 1)) ]

where:
D̄ = the mean of the differences between the pretest and the posttest
∑D = the sum of the differences between the pretest and the posttest
∑D² = the sum of the squares of the differences between the pretest and the posttest
n = the number of paired observations
Example 1.
During the first day of class, a professor conducted a 50-item pretest for his fifteen students in Statistics and Probability before the formal lessons of the said subject. After a semester, he gave a posttest to the same fifteen students using the same set of examinations that he gave in the pretest. He wants to determine whether there is a significant difference between the pretest and the posttest. The following are the results of the experiment. The professor uses the α = 0.05 level of significance.
Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pre-test 15 12 20 10 8 27 29 13 19 22 25 14 28 18 16
Post-test 20 18 25 25 20 35 43 28 29 37 46 27 33 37 28
Solution:
Problem: Is there a significant difference between the Pretest and Posttest of the fifteen students
in Statistics and Probability on the use of the teaching method by the Professor?
1. Hypotheses:
Ho: There is no significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability based on the teaching method used by the Professor.
Ha: There is a significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability based on the teaching method used by the Professor.
2. Level of Significance:
α = 0.05
df = n − 1 = 15 − 1 = 14
3. Computation:
Pretest (x1)   Posttest (x2)   D     D²
15 20 -5 25
12 18 -6 36
20 25 -5 25
10 25 -15 225
8 20 -12 144
27 35 -8 64
29 43 -14 196
13 28 -15 225
19 29 -10 100
22 37 -15 225
25 46 -21 441
14 27 -13 169
28 33 -5 25
18 37 -19 361
16 28 -12 144
∑D = −175   ∑D² = 2405
D̄ = ∑D/n = −175/15 = −11.67
Plug into the formula:

t = D̄ / √[ (∑D² − (∑D)²/n) / (n(n − 1)) ]
  = −11.67 / √[ (2405 − (−175)²/15) / (15)(14) ]
  = −11.67 / √[ (2405 − 2041.67) / 210 ]
  = −11.67 / √1.73
  = −11.67 / 1.32
  = −8.84
4. Decision Rule:
If |t-computed| is greater than the t-critical value, reject the Ho; otherwise, accept the Ho.
5. Conclusion:
Since |−8.84| is greater than the t-critical value of 2.145 at the 0.05 level of significance with 14 degrees of freedom, the null hypothesis is rejected. This means that there is a significant difference between the pretest and the posttest, in favor of the posttest; the teaching method used by the Professor is effective.
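The paired computation above can likewise be sketched in plain Python (a verification sketch; the module itself uses the formula by hand and SPSS). Carrying full precision through the arithmetic gives t ≈ −8.87; the manual solution rounds intermediate values and reports −8.84:

```python
import math

pre  = [15, 12, 20, 10, 8, 27, 29, 13, 19, 22, 25, 14, 28, 18, 16]
post = [20, 18, 25, 25, 20, 35, 43, 28, 29, 37, 46, 27, 33, 37, 28]

def paired_t(x1, x2):
    """t-test for correlated samples: t = D-bar / sqrt((sum D^2 - (sum D)^2/n) / (n(n-1)))."""
    d = [a - b for a, b in zip(x1, x2)]              # pretest minus posttest, as in the module
    n = len(d)
    sum_d = sum(d)                                    # -175
    sum_d2 = sum(v * v for v in d)                    # 2405
    d_bar = sum_d / n                                 # -11.67
    return d_bar / math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))

print(round(paired_t(pre, post), 2))  # -8.87 at full precision
```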
Example 2.
Solution:
Research Problem: Is the coach's claim that the times of the five runners will improve
before and after the training is true?
1. Hypotheses:
Ho: There is no significant difference on the average time of running before and after
the training.
Ha: There is a significant difference in the average time of running before and after
the training.
2. Level of Significance:
α = 0.05
df = n − 1 = 5 − 1 = 4
3. Computation:
D̄ = 0.096
Plug into the formula:

t = D̄ / √[ (∑D² − (∑D)²/n) / (n(n − 1)) ]
  = 0.096 / √[ (0.0694 − (0.48)²/5) / (5)(4) ]
  = 0.096 / √[ (0.0694 − 0.04608) / 20 ]
  = 0.096 / √0.001166
  = 0.096 / 0.0341
  = 2.815
4. Decision Rule:
If the t-computed value is greater than or is beyond the critical value, reject the Ho
otherwise accept the Ho.
5. Conclusion:
The t-computed value of 2.815 is beyond the t-critical value of 2.776 at the 0.05 level of significance with 4 degrees of freedom. The null hypothesis is therefore rejected in favor of the alternative hypothesis. This means that there is a significant difference in the average running time before and after the training.
In order to do the t-test for correlated samples in SPSS, bear in mind that the sample we are referring to here is a single group, and that we need to test two different measurements for every observation in the group, for example before treatment and after treatment.
Illustration:
1. Is there any difference in the number of heartbeats per minute before and after jogging?
2. Is there any difference between the scores of students in pre-test and in post-test?
3. Is there any difference in the temperature of rats before and after a shot of dosage of medicine?
These are just some possible problems for which the t-test for correlated samples could be used, and to test such hypotheses in a very easy way, SPSS is one useful piece of software.
Student  Pre-test  Post-test     Student  Pre-test  Post-test
A        23        40            L        44        40
B 10 27 M 32 30
C 25 26 N 29 43
D 18 37 O 45 43
E 35 40 P 21 29
F 33 46 Q 18 29
G 26 28 R 32 42
H 18 32 S 29 36
I 46 35 T 32 45
J 32 32 U 35 40
K 29 37 V 28 29
Ho: There is no significant difference in the pre-test and post-test scores of students in Data Analysis
Ha: There is a significant difference in the pre-test and post-test scores of students in Data Analysis
To test the hypothesis whether there is no significant difference or there is a significant difference, we will
do the paired t-test using SPSS.
Step 2: Click “Analyze”, then proceed to “Compare Means”; in the list that appears, look for “Paired-Samples T Test”.
Step 3: Click the “Paired-samples t-test” and another dialog box will appear just like below;
Step 4: Place the “pre-test” in variable 1 and “post-test” in variable 2 by highlighting and clicking the
arrow.
Note: The option button will serve what confidence level you want to choose. The 95% is the default of
SPSS.
Step 5: Then click “ok”. Once you click the ok button, the output view will appear.
In the output view, we focus on the “Sig.” value for the paired samples test. The sig value for this pair of data is 0.001, and this value is compared with the level of significance of 0.05. You will notice that the sig value is less than the level of significance. What does this mean? Analyzing and interpreting the result based on the sig value, we need to reject the null hypothesis in favor of the alternative hypothesis.
Section:________________________ Score:__________
Practice Exercises
t-test for Correlated Sample (Paired-t-test)
Note: Use both the manual computation, that is, with the use of the formula, and check whether the result is the same when using SPSS.
B. The z-test
The z-test is another parametric test used to compare means. The z-test can be applied in two ways: (1) the One-Sample Mean Test and (2) the Two-Sample Mean Test.
The tabular value of the z-test at 0.01 and 0.05 level of significance is
shown below:
                Level of Significance
Test            0.01        0.05
One-tailed      ±2.33       ±1.645
Two-tailed      ±2.575      ±1.960
The z-test for one sample group is used to compare the hypothesized population mean μ against the sample mean x̄. Using this test, we can determine whether the mean of a group differs from a specified value.
This procedure is based on the normal distribution. So for small samples, this procedure works best if the data were drawn from a normal distribution or one that is close to normal. Usually, we consider samples of size 30 or higher to be large samples. If the population standard deviation is not known, the sample standard deviation can be used as a substitute.
z = (x̄ − μ)√n / σ

where:
x̄ = the sample mean
μ = the hypothesized population mean
n = the sample size
σ = the population standard deviation
Example 1:
A Mathematics Professor claims that the average performance of his students is at least
86%. To be able to verify if his claim is true, he conducted an examination of his 40 students.
After the exam, he got a mean grade of 80%. With the standard deviation of 76%, is the claim of
the Professor true? Use the z-test at 0.05 level of significance.
Solution:
Problem: Is the claim of a Professor true that the average performance of his student is at
least 86%?
1. Hypotheses:
Ho: The average performance of the students is at least 86% (μ ≥ 86).
Ha: The average performance of the students is less than 86% (μ < 86).
2. Level of Significance:
α = 0.05
z = ± 1.645
3. Statistics:
z-test for a one-tailed test
Plug into the formula:

z = (x̄ − μ)√n / σ = (80 − 86)√40 / 76 = (−6)(6.32) / 76 = −0.4993 ≈ −0.49
4. Decision Rule:
If the z-computed value is greater than or beyond the z-tabular, reject the Ho.
5. Conclusion:
Since the z-computed value of −0.49 is not beyond the critical value of −1.645 at the 0.05 level of significance, the null hypothesis is not rejected; the claim that the average performance of the students is at least 86% stands.
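The one-sample z computation can be replicated in a few lines of Python, using the same givens as the example (a verification sketch, not the module's SPSS route):

```python
import math

x_bar, mu, n, sigma = 80, 86, 40, 76   # sample mean, population mean, n, population SD

# one-sample z: z = (x-bar - mu) * sqrt(n) / sigma
z = (x_bar - mu) * math.sqrt(n) / sigma
print(round(z, 4))  # -0.4993, matching the SPSS z-stat of -0.49931
```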
Example 2.
Based on the previous records, it was believed that the obesity among pupils aged 8-10
years old is less than 1.4% for boys. If 15 of the 500 boys aged 8-10 years old are obese, test the
claim that the proportion of obesity among male pupils is not less than 1.4%. Use a 5% (0.05)
level of significance.
α = 0.05
Step 4: Compute the test value and solve for the p-value
p̂ = X/n = 15/500 = 0.03

z = (p̂ − p) / √[ p(1 − p)/n ]
  = (0.03 − 0.014) / √[ (0.014)(0.986)/500 ]
  = 0.016 / 0.0053
  = 3.02
Subtracting the area A = 0.4987 from A = 0.5000, we get 0.0013; hence the p-value is 0.0013. Since the p-value of 0.0013 is less than the 0.05 level of significance, the null hypothesis is rejected: the proportion of obesity among male pupils is significantly higher than 1.4%.
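A quick Python check of the proportion test (a sketch; not part of the module). Keeping full precision in the standard error gives z ≈ 3.05, while the manual solution rounds the denominator to 0.0053 and gets 3.02; the one-tailed p-value comes from the normal CDF via the error function:

```python
import math

x, n, p0 = 15, 500, 0.014          # obese count, sample size, hypothesized proportion

p_hat = x / n                                        # 0.03
se = math.sqrt(p0 * (1 - p0) / n)                    # standard error under Ho
z = (p_hat - p0) / se                                # about 3.05 at full precision
p_value = 0.5 * (1 - math.erf(z / math.sqrt(2)))     # one-tailed area beyond z

print(round(z, 2), round(p_value, 4))
```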
How do we use SPSS if you want to use the z-test as your statistical tool? There will be a slight
difference if you would use SPSS to determine the test statistic under z-test since we need a syntax here.
We all know that using this test, we need to know if there is a significant difference between the
population mean and the sample mean where the population standard deviation is also known.
Let us explore this test in SPSS. To do this, let us use example number 1 that was illustrated previously under the z-test for one sample group. In example 1, the following are given: x̄ = 80, μ = 86, n = 40, and σ = 76.
Step 1. First, look for the website “how2stats”; once inside the website, look for the “One sample z-test” and click it. Another page will open.
Step 2. Highlight the syntax and copy it using either “Ctrl + C” or “Command + C”. Then open SPSS, click “File”, and look for “New” then “Syntax”.
Step 4. On the right side blank screen, you will paste the copied syntax taken from the website.
Here, on line number 14, you will see the default numbers 35, 105, 100, 15. These numbers represent your sample size, sample mean, population mean, and population standard deviation, respectively. If you have different values, change them based on your given sample size, sample mean, population mean, and population standard deviation. In our example the given are x̄ = 80 (sample mean), μ = 86 (population mean), n = 40 (sample size), and σ = 76 (population standard deviation).
Step 5. Put this in your syntax and then click “run” on the menu bar. Once you click the “run”,
SPSS generates the output.
Variable View
Data View
Step 6. Click the “run” icon and SPSS will generate the output.
On the output view, you will see in the “List” the values z-stat = −0.49931, p-value = 0.61756, and Cohen's d = −0.07895.
Now, if the Ho is “The average performance of the students is 86%”, as stated in our previous example, and the p-value of 0.61756 is greater than the level of significance of 0.05, our conclusion is that the Ho is not rejected: the claim that the average performance of the students is 86% is retained.
Section:________________________ Score:__________
Practice Exercises
z-test for One Sample Group
Note: In performing this problem, use both the manual computation and with the use of SPSS.
The z-test for two sample means is another parametric test. It is used to compare the means of two independent groups from which the samples were drawn. The samples must be drawn from normally distributed populations.
Ho: μ1 = μ2; (The population mean 1 is equal to the population mean 2)
Ha: μ1 ≠ μ2; (The population mean 1 is not equal to the population mean 2)
The formula for the z-test for two sample means is:

z = (x̄1 − x̄2) / √( s1²/n1 + s2²/n2 )

where:
x̄1, x̄2 = the means of sample 1 and sample 2
s1² = the variance of sample 1
s2² = the variance of sample 2
n1, n2 = the sizes of sample 1 and sample 2
Example:
1. Hypotheses:
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
2. Level of Significance:
α = 0.01
z = ± 2.575
Here:
x̄1 = 89
x̄2 = 83
s1² = 45
s2² = 40
n1 = 100
n2 = 100

3. Computation:

z = (x̄1 − x̄2) / √( s1²/n1 + s2²/n2 )
  = (89 − 83) / √( 45/100 + 40/100 )
  = 6 / √0.85
  = 6 / 0.92
  = 6.52
4. Decision Rule:
If the z-computed value is greater than or beyond the z-tabular, reject the Ho.
5. Conclusion:
Since the z-computed value of 6.52 is beyond the z-tabular value of 2.575 at the 0.01 level of significance, the null hypothesis is rejected. This means that the two population means are significantly different.
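The two-sample z computation above can be checked in plain Python (a sketch using the example's givens; at full precision the result is 6.51, which the manual solution rounds to 6.52 via 6/0.92):

```python
import math

x1_bar, x2_bar = 89, 83
var1, var2 = 45, 40        # sample variances s1^2, s2^2
n1, n2 = 100, 100

# two-sample z: z = (x1-bar - x2-bar) / sqrt(s1^2/n1 + s2^2/n2)
z = (x1_bar - x2_bar) / math.sqrt(var1 / n1 + var2 / n2)
print(round(z, 2))  # 6.51
```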
Section:________________________ Score:__________
Practice Exercises
z-test for Two Sample Means
The amount of a certain trace element in blood is known to vary with a standard
deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors.
Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm,
respectively. What is the likelihood that the population means concentrations of the element are
the same for men and women? Use α = 0.05 level of significance.
Note: In performing this problem, use both the manual computation and with the use of SPSS.
C. The F-Test
Earlier in this section we performed a t-test to compare a sample mean to an accepted value, or to compare two sample means. The F-test is another parametric test, also called the Analysis of Variance (ANOVA).
Ronald A. Fisher developed the F-test. This test is used when the variances of two
populations are equal or when the distribution is normal and the level of measurement is interval
or ratio.
The test can be two-tailed or one-tailed. The two-tailed test is used to test whether the variances are equal or not, while the one-tailed test is used in one direction, that is, to test whether the variance of the first population is greater than (or less than) the second population's variance, but not both.
There are three kinds of analysis of variance and these are (a) one-way analysis of
variance, (b) two-way analysis of variance and the (c) three-way analysis of variance. In this
section, one-way analysis of variance will be the focus of discussion.
The one-way Analysis of Variance (ANOVA) can be used for the case of a
quantitative outcome with a categorical explanatory variable that has two or more
levels of treatment. The term one-way, also called one-factor, indicates that there
is a single explanatory variable (“treatment”) with two or more levels, and only
one level of treatment is applied at any time for a given subject.
The sources of variations are between the groups, within the group itself
and the total variation. The degree of freedom for the total is the total number of
observations minus 1. The degree of freedom from the between groups is the total
number of groups minus 1. The degree of freedom for the within group is the
total degree of freedom (df) minus the between groups of degree of freedom (df).
Source of Variation   df                  SS     MS             F-Computed    F-Tabular
Between Groups        K − 1               BSS    MSB = BSS/df   F = MSB/MSW   See the F table at 0.05 (or the desired level of significance) with the between and within df
Within Groups         (N − 1) − (K − 1)   WSS    MSW = WSS/df
Total                 N − 1               TSS
Example 1.
The computer store is selling three different brands of cellular phone. The
manager of the store wants to determine if there is a significant difference in the
average sales of the three brands of cellular phone for five-day selling. The
following data are recorded:
BRAND
DAY     A(x1)   B(x2)   C(x3)   (x1)²   (x2)²   (x3)²
1       4       8       3       16      64      9
2       6       3       5       36      9       25
3       2       6       3       4       36      9
4       5       4       6       25      16      36
5       2       7       4       4       49      16
Total   ∑x1 = 19   ∑x2 = 28   ∑x3 = 21   ∑x1² = 85   ∑x2² = 174   ∑x3² = 95
        n1 = 5     n2 = 5     n3 = 5
Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the average sales of the three brands of cellular phone are equal.
Problem: Is there any significant difference in the average sales of the three
brands of cellular phone?
1. Hypotheses:
Ho: There is no significant difference in the average sales of the
three brands of cellular phone.
3. Computation:

CF = (∑x1 + ∑x2 + ∑x3)² / N = (19 + 28 + 21)² / (5 + 5 + 5) = (68)²/15 = 308.27

TSS = ∑x1² + ∑x2² + ∑x3² − CF = 85 + 174 + 95 − 308.27 = 354 − 308.27 = 45.73

BSS = (∑x1)²/n1 + (∑x2)²/n2 + (∑x3)²/n3 − CF = (19)²/5 + (28)²/5 + (21)²/5 − 308.27 = 317.20 − 308.27 = 8.93

WSS = TSS − BSS = 45.73 − 8.93 = 36.80
ANOVA Table
Source of Variation   df   SS      MS     F-Computed   F-Tabular
Between Groups        2    8.93    4.47   1.46         3.89
Within Groups         12   36.80   3.07
Total                 14   45.73

4. Decision rule:
If the F-computed value is greater than the F-tabular value, reject the Ho.
5. Conclusion:
Since the F-computed value of 1.46 is less than the F-tabular value of 3.89 at the 0.05 level of significance with 2 and 12 degrees of freedom, the null hypothesis is accepted. This means that there is no significant difference in the average sales of the three brands of cellular phone.
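The one-way ANOVA sums of squares above can be verified in plain Python with the five-day sales of the three brands (a verification sketch; the dictionary layout is my own, not the module's):

```python
brands = {
    "A": [4, 6, 2, 5, 2],
    "B": [8, 3, 6, 4, 7],
    "C": [3, 5, 3, 6, 4],
}

groups = list(brands.values())
all_obs = [v for g in groups for v in g]
n_total, k = len(all_obs), len(groups)

cf  = sum(all_obs) ** 2 / n_total                       # correction factor, 308.27
tss = sum(v * v for v in all_obs) - cf                  # total sum of squares
bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf    # between-groups sum of squares
wss = tss - bss                                         # within-groups sum of squares

msb = bss / (k - 1)
msw = wss / (n_total - k)
f = msb / msw
print(round(bss, 2), round(wss, 2), round(f, 2))  # 8.93 36.8 1.46
```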
Example 2.
Test scores in the entrance examination of the incoming Grade 11 students
from four strands of the Academic Track are shown in the next table. Is there a
difference in the mean scores among the strands?
Strands
A(x1)   B(x2)   C(x3)   D(x4)   (x1)²   (x2)²   (x3)²   (x4)²
40      50      50      35      1600    2500    2500    1225
30      47      45      45      900     2209    2025    2025
25      45      48      50      625     2025    2304    2500
27      48      50      26      729     2304    2500    676
31      38      38      36      961     1444    1444    1296
42      45              24      1764    2025            576
        39                              1521
∑x1 = 195   ∑x2 = 312   ∑x3 = 231   ∑x4 = 216   ∑x1² = 6579   ∑x2² = 14028   ∑x3² = 10773   ∑x4² = 8298
n1 = 6      n2 = 7      n3 = 5      n4 = 6
Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the mean scores among the strands are equal.
3. Computation:

CF = (∑x1 + ∑x2 + ∑x3 + ∑x4)² / N = (195 + 312 + 231 + 216)² / (6 + 7 + 5 + 6) = (954)²/24 = 37921.5

TSS = ∑x1² + ∑x2² + ∑x3² + ∑x4² − CF = 6579 + 14028 + 10773 + 8298 − 37921.5 = 39678 − 37921.5 = 1756.5

BSS = (195)²/6 + (312)²/7 + (231)²/5 + (216)²/6 − 37921.5 = 38692 − 37921.5 = 770.5

WSS = TSS − BSS = 1756.5 − 770.5 = 986
ANOVA Table
Source of Variation   df   SS       MS       F-Computed   F-Tabular
Between Groups        3    770.5    256.83   5.21         3.10
Within Groups         20   986      49.3
Total                 23   1756.5
4. Decision rule:
If the F-computed value is greater than the F-tabular value, reject the Ho.
5. Conclusion:
Since the F-computed value of 5.21 is greater than the F-tabular value of 3.10 at the 0.05 level of significance with 3 and 20 degrees of freedom, the null hypothesis is rejected. This means that there is a significant difference in the mean scores among the four strands.
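The same short Python sketch handles the unequal group sizes of this second example and reproduces the F-value in the ANOVA table (again a verification sketch, not the module's SPSS procedure):

```python
strands = {
    "A": [40, 30, 25, 27, 31, 42],
    "B": [50, 47, 45, 48, 38, 45, 39],
    "C": [50, 45, 48, 50, 38],
    "D": [35, 45, 50, 26, 36, 24],
}

groups = list(strands.values())
all_obs = [v for g in groups for v in g]
n_total, k = len(all_obs), len(groups)        # N = 24 observations, K = 4 strands

cf  = sum(all_obs) ** 2 / n_total             # correction factor, 37921.5
tss = sum(v * v for v in all_obs) - cf        # total SS, 1756.5
bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between-groups SS, about 770.5

f = (bss / (k - 1)) / ((tss - bss) / (n_total - k))
print(round(f, 2))  # 5.21, matching the ANOVA table
```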
Testing the hypothesis using manual computation, especially for the one-way analysis of variance, is a very lengthy process for the researcher. To make this statistical tool easier to perform in analyzing data, SPSS can carry out the mathematical computation.
Let us use the data of the first example presented under this test statistic. Based on this example, we will test the hypothesis of whether there is any significant difference in the average sales of the three brands of cellular phone, assuming that the researcher has already encoded all the data in SPSS.
Variable View
Data View
Remember that the dependent variable is the number of sales as scale variable while the
independent variable as the nominal variable is the brand of cellular phone.
Step 1. On the main menu bar, click “Analyze”, then look for the “Compare Means” then
look and click the “One-way Analysis of Variance”.
Step 3. On the above window, place the independent variable into factor and the
dependent variable into the dependent list using the arrow. In this example, the independent
variable is the nominal variable and that is the brand of cellular phone and the dependent variable
is the number of sales.
Step 4. Click the “option” icon and another window will appear. In this window you will
select important things that can be used in your analysis of data.
Step 5. Select the important things to be used in the analysis of data by clicking on the
following: descriptive, homogeneity of variance test, brown-forsythe test and the welch test.
You could also click the means plot in order to see the graph if there is a significant difference
among the means of the three brands.
Note: The output for the homogeneity of variance test, the Brown-Forsythe test, and the Welch test will be discussed in a later part of this course. Here we focus on the descriptive table, the ANOVA table, and the graph.
Section:________________________ Score:__________
Practice Exercises
One-Way Analysis of Variance
D. Pearson Product Moment Correlation
In other words, we are going to consider the relationship between two variables X and Y rather than predicting a value of Y.
(Scatter plot diagrams: positive correlation, negative correlation, and no correlation, r = 0)
Note that we use the r to determine the index of relationship between two
variables, the independent and the dependent variables.
Looking at the scatter plot diagram, if r = +1.0, this correlation is a perfect positive
correlation while if r = +0.6, this is a positive correlation. The trend of this line graph is going
upward. This indicates that as the value of x increases, the value of y also increases.
If the value of r = -1.0, this is what we call a perfect negative correlation but if r = -0.6,
this is something we call a negative correlation. In the negative correlation, it indicates that the
line graph is going downward. Here, as the value of x increases, y decreases.
Degree of Correlation
r = (n∑xy − ∑x∑y) / √[ (n∑x² − (∑x)²)(n∑y² − (∑y)²) ]

where:
n = the number of paired observations
∑x, ∑y = the sums of x and of y
∑xy = the sum of the products of x and y
∑x² = the sum of the squares of x
∑y² = the sum of the squares of y
Example:
Consider the pre-test and the post test in Statistics and Probability of the ten
students of CS 3201.
Student
Exam A B C D E F G H I J
Pre-test
56 70 60 85 75 87 72 89 75 86
(x)
Post-test
65 78 60 90 75 90 79 89 89 95
(y)
Solution:
1. Hypotheses:
Ho: There is no significant relationship between the pre-test and the post-test of the ten students.
Ha: There is a significant relationship between the pre-test and the post-test of the ten students.
2. Level of Significance:
α = 0.05
df = n – 2 = 10 – 2 = 8
r(0.05) = 0.6319 (tabular value)
3. Computation: (Statistics)
x y x2 y2 xy
56 65 3136 4225 3640
70 78 4900 6084 5460
60 60 3600 3600 3600
85 90 7225 8100 7650
75 75 5625 5625 5625
87 90 7569 8100 7830
72 79 5184 6241 5688
89 89 7921 7921 7921
75 89 5625 7921 6675
86 95 7396 9025 8170
∑x = 755   ∑y = 810   ∑x² = 58181   ∑y² = 66842   ∑xy = 62259
Apply the formula:

r = (n∑xy − ∑x∑y) / √[ (n∑x² − (∑x)²)(n∑y² − (∑y)²) ]
  = [(10)(62259) − (755)(810)] / √[ ((10)(58181) − (755)²)((10)(66842) − (810)²) ]
  = 11040 / √[ (11785)(12320) ]
  = 11040 / √145191200
  = 11040 / 12049.53
  = 0.92
4. Decision Rule:
If the computed r value is greater than the r tabular value, reject the
null hypothesis.
5. Conclusion:
Since the computed r-value of 0.92 is higher than the r-tabular value of 0.632 at the 0.05 level of significance with 8 degrees of freedom, we reject the null hypothesis. This means that there is a significant relationship between the pre-test and post-test of the ten CS students in Statistics and Probability. It implies that the higher the pre-test, the higher the post-test.
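The Pearson r formula above can be applied directly in plain Python with the ten pre-test and post-test scores (a verification sketch of the manual computation):

```python
import math

pre  = [56, 70, 60, 85, 75, 87, 72, 89, 75, 86]
post = [65, 78, 60, 90, 75, 90, 79, 89, 89, 95]

def pearson_r(x, y):
    """Pearson r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2)(n*Sy2 - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))   # 62259
    sx2 = sum(a * a for a in x)              # 58181
    sy2 = sum(b * b for b in y)              # 66842
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(pearson_r(pre, post), 3))  # 0.916, the r-value SPSS reports
```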
Another statistical test that SPSS can do is the Pearson product moment correlation, or simply Pearson r. Just as in our previous examples, we first need data to be analyzed that are already encoded in the data view of SPSS.
Basically, this test statistic is used to determine whether the x variable is correlated with or related to the y variable, whether positively or negatively; in other words, the Pearson correlation is one of the test statistics for analyzing whether one variable is associated with another. Usually we compare two quantitative variables, i.e., both groups of data are scale in nature. For this test statistic, we will use the data given in the example under this statistical test. Let us say the data on the pre-test and post-test are already encoded in the variable and data views of SPSS.
Step 1. On the menu bar, click “Analyze” then “Correlate” and look for “Bivariate” and
click the “Bivariate”.
Step 2. When you click the “Bivariate”, another window will appear.
Note: Here you will see the two groups, the pre-test and post-test, that we need to test for whether they are correlated, i.e., whether the pre-test has a relationship with or is associated with the post-test. You will also notice that the default is Pearson, two-tailed (meaning our assumption is no directional relationship), and that “Flag significant correlations” is checked, meaning SPSS will show us whether there is a significant correlation between the two variables or not.
Step 3. Transfer the two groups of variables, the pre-test and the post-test into the
variables box with the use of an arrow. If you want to have other test statistics to be used such as
mean and others, you may click the “option” button. Another window will appear.
Step 4. Once the window appears, you can click “means and standard deviations” as part of the analysis of your data, then click “continue” to go back to the first window. The next thing to do is click “OK”. When you click the “OK” button, the output view will be displayed.
With this output, we are now ready to analyze and interpret the result. As you can see at the
bottom of the correlation table, the coefficient is marked with two asterisks (**), meaning that the
pre-test and post-test have a relationship, or we could say that they are statistically related to
each other. We could conclude: "Since the p-value of 0.000 is less than the 0.01 level of
significance for the r-value of 0.916, we reject the null hypothesis in favor of the alternative
hypothesis that there is a significant relationship between the pre-test and the post-test." There is
a high positive correlation since r = 0.916, so we could say that the higher the pre-test, the higher
the post-test.
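For readers who want to verify the SPSS Pearson r by manual computation in code, here is a minimal Python sketch of the product-moment formula using raw sums. The function name and the pre-test/post-test scores below are illustrative, not the module's actual data set.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment r via the raw-sums formula:
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])"""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# Hypothetical pre-test and post-test scores (not the module's data)
pre = [10, 12, 15, 18, 20]
post = [14, 15, 19, 22, 25]
print(round(pearson_r(pre, post), 3))
```

An r near +1, as here, would be read the same way as the module's r = 0.916: a high positive correlation between pre-test and post-test.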
Section:________________________ Score:__________
Practice Exercises
Pearson Product-Moment Correlation
Find the value of r and interpret the result at the 0.05 level of significance, both by
manual computation using the test-statistic formula and with the use of SPSS.
Like other parametric tests, simple linear regression must also meet some conditions. First,
the data should be normally distributed, and the level of measurement should be expressed in
interval or ratio data.
Linear regression analysis consists of more than just fitting a straight line through a
cloud of data points. It consists of three stages: (1) analyzing the correlation and directionality
of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and
usefulness of the model.
In regression analysis, we are going to consider two variables. If two variables are
correlated, that is if the correlation coefficient (r) is significant, then it is possible to predict
or estimate the value of one variable from the knowledge of the other variable.
y = a + bx
Where:
x is the independent or predictor variable;
y is the dependent or criterion variable;
a is the y-intercept, the value of y when x = 0; and
b is the slope of the line.
Note that if the slope is positive, it means that as x increases, y also increases,
while if the slope is negative, it means that as x increases, y decreases. In
addition, if the slope is zero (0), y would be constant, i.e., y = a.
a = (Σy − bΣx) / n

b = [n(Σxy) − ΣxΣy] / [nΣx² − (Σx)²]
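The two formulas above translate directly into code. The following Python sketch computes the slope b first and then the intercept a from it; the function name and the data are illustrative assumptions, not part of the module's example.

```python
def least_squares(x, y):
    """Slope b and intercept a of the least-squares line y = a + b*x,
    using the raw-sums formulas:
      b = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2]
      a = (Sy - b*Sx) / n
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data lying exactly on y = 3 + 2x, so we expect a = 3, b = 2
a, b = least_squares([1, 2, 3, 4], [5, 7, 9, 11])
print(a, b)  # → 3.0 2.0
```

Note that b must be computed first, because the formula for a depends on it.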
Computation:

b = 11040 / 11785 = 0.94

a = (Σy − bΣx) / n

(Here, x = 79 is the pre-test score that will be used for the prediction.)
The computed value b = 0.94 indicates that the slope of the line is positive. We
could say that as the pre-test score increases, the post-test score also increases. Based on
the values of a and b, if the pre-test score of a certain student is 79, the predicted post-test
score would be 84.
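Prediction itself is just substitution into y = a + bx. In the sketch below, the slope b = 0.94 comes from the module's computation, while the intercept is back-solved from the module's stated prediction (a post-test of 84 at a pre-test of 79), so treat the value of a as illustrative rather than as the module's computed intercept.

```python
def predict(a, b, x):
    """Predicted y on the regression line y = a + b*x."""
    return a + b * x

b = 0.94           # slope computed in the module
a = 84 - b * 79    # intercept back-solved from the module's prediction (illustrative)
print(round(predict(a, b, 79)))  # → 84
```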
Once you are familiar with the use of SPSS, doing the test statistics for simple linear
regression will be easy for you. Just like the other test statistics previously presented, we need to
look on the menu bar of this software for the "Analyze" menu, where you will see the different
test statistics that you need. Click "Analyze" and scan the drop-down list for the target statistical
tool. Of course, to do this, you first need to have data in the data view of your
SPSS.
Step 1. On the menu bar, look for the “Analyze” and click this icon. Look for the “Regression”
and proceed to “Linear”. Once you click “Linear”, another window will appear.
Step 2. You have to identify the dependent and the independent variable. In this example, we will
place the independent variable (the pre-test) in the "Independent(s)" box and the dependent
variable (the post-test) in the "Dependent" box. Highlight each of these and click the arrow.
Step 3. If you want additional information such as descriptive statistics or other
details like the scatter-plot diagram, you can obtain them by clicking the appropriate
icon. If you click "Statistics", you will see another dialog box.
Here, the defaults are "Estimates" and "Model fit". The important details that we need
to look into for the analysis are "Confidence intervals", for which we commonly use the 95%
level of confidence, and "Descriptives". The other options are used for multiple regression
analysis. Then click "Continue".
Once you click the "Continue" button, you will return to the first dialog box. Then click "OK"
and the output view needed for our analysis of the data will appear.
On the output view, you will see the table for "Descriptive Statistics" as well as the table for
"Correlations" using the Pearson correlation. The numerical values for the mean, the standard
deviation, the Pearson correlation, and the p-value are the important details needed in the
analysis of our data.
Other important outputs needed in analyzing the data are as follows; you can also see
these in the output view of SPSS.
Note: Listen to your instructor on how to analyze the data using these important
numerical values. The numerical values that your instructor will use are as follows:
> R
> R-square
> Standard error of the estimate
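Those three quantities can also be computed by hand as a check on the SPSS output. The Python sketch below derives R, R-square, and the standard error of the estimate from the residual and total sums of squares; the formulas are the standard ones for simple linear regression, and the function name and sample data are illustrative.

```python
import math

def regression_summary(x, y):
    """R, R-square, and the standard error of the estimate for the
    least-squares line y = a + b*x (simple linear regression)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    # Residual and total sums of squares
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - sy / n) ** 2 for yi in y)
    r_square = 1 - sse / sst
    # R is reported here as the positive root; its sign follows the slope b
    r = math.sqrt(r_square)
    see = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return r, r_square, see

# Perfectly linear hypothetical data: expect R = 1, R-square = 1, SEE = 0
print(regression_summary([1, 2, 3, 4], [2, 4, 6, 8]))
```

R-square tells you the proportion of the variance in y explained by x, while the standard error of the estimate measures the typical size of a prediction error in the units of y.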
Another way to create a graph with SPSS is to use the "Graphs" menu, as you learned in a
previous lesson. On the menu bar, look for "Graphs", proceed to "Legacy Dialogs", then to
"Scatter/Dot", and follow the sequence of dialog boxes.
When you click the "Continue" button, the resulting graph will appear.
Section:________________________ Score:__________
Practice Exercises
Simple Linear Regression