
DATA ANALYSIS

MODULE 4
INFERENTIAL STATISTICS - PARAMETRIC TEST

In inferential statistics, the researcher tries to arrive at a conclusion. An inference is an educated guess or a meaningful prediction based on findings and conclusions.

We use inferential statistics to infer from sample data what the population might be, or to judge whether an observed difference between two groups is likely to be real rather than due to chance. In other words, inferential statistics lets us generalize from our collected data to more general conditions. The statistical tests we use depend on whether the data are normally distributed or not; these are the parametric tests and the non-parametric tests.

In the previous module, your teacher presented the different test statistics under the parametric and non-parametric tests. In this module, selected statistical tests under the parametric tests are discussed first, followed by selected tests under the non-parametric tests. Manual computation with the test formulas, as well as how to use SPSS to perform these tests, will be presented.

PARAMETRIC TEST

Parametric tests are tests that require a normal distribution, or at least the assumption that the data are normally distributed as mentioned above, and levels of measurement expressed as interval or ratio data. We use a parametric test when the distribution is normal, i.e., when the skewness is equal to zero and the kurtosis is equal to 0.265.

We prefer parametric tests because they are more powerful statistical tools than non-parametric tests. But when do we use them? We can use the different parametric tests when two major conditions are met: one, we can assume that the data are normal, and two, the data are interval or ratio, as stated above.

A. t-test

One of the most commonly used parametric tests in statistics is the t-test, also known as the "Student t-test". In this section we will study two forms of the t-test: the t-test for independent samples and the correlated t-test, also known as the paired t-test. We use these tests to determine whether the difference between the means of two quantitative groups is statistically significant. In doing these tests, we assume that the data are normally distributed.

1. t-test for Independent Sample (Two-tailed test)

What is the t-test for independent samples? It is a test of the difference between two independent groups; that is, we compare two means, say x̄1 against x̄2.

The test was introduced by William S. Gosset under the pen name "Student", hence the t-test is also called the "Student t-test". It is one of the tests most commonly used by researchers.

According to Broto (2007), the t-test for independent samples is used when we compare the means of two independent groups, the distribution is normal (Sk = 0 and Ku = 0.265), the data are interval or ratio, and the sample size is less than 30.

However, as n becomes larger, the t-distribution approaches the z-distribution. Thus, the t-test can be used not only when n is less than thirty (30) but also when n is greater than or equal to thirty (30), provided the population standard deviation is not known.

In our previous lesson, the divisor n − 1 in the formulas for the variance and the standard deviation was called the degrees of freedom. The degrees of freedom is the number of values that are free to vary.

The formula for the t-test for two independent samples is:

    t = (x̄1 − x̄2) / sqrt[ ((SS1 + SS2) / (n1 + n2 − 2)) · (1/n1 + 1/n2) ]

where:

t = the t-test value
x1 = the data of group 1
x2 = the data of group 2
x̄1 = the mean of group 1
x̄2 = the mean of group 2
SS1 = Σx1² − (Σx1)²/n1 = the sum of squares of group 1
SS2 = Σx2² − (Σx2)²/n2 = the sum of squares of group 2
n1 = the number of observations in group 1
n2 = the number of observations in group 2
n1 + n2 − 2 = the degrees of freedom (df)

Example 1:

The following are the scores of 10 male students and 10 female third year
Computer Science students of Prof. Alyssa Jandy Angulo in the preliminary examination in
Statistics and Probability.

Scores of Male and Female Third Year Computer Science students

in the Preliminary Examination in Statistics and Probability


Male Female
28 24
36 18
34 22
32 10
8 20
24 6
24 14
20 4
18 12
34 26

Prof. Alyssa Jandy Angulo wants to determine whether there is a significant difference between the performance of the male and female students in the preliminary examination. She uses the t-test for two independent samples at the 0.05 level of significance.

Solution:

Problem: Is there a significant difference between the performance of the male and
female third year Computer Science students in the preliminary examination in Statistics and
Probability?

1. Hypotheses:

Ho: There is no significant difference between the performance of the male and female
third year Computer Science students in the preliminary examination in Statistics and
Probability.

Ha: There is a significant difference between the performance of the male and female
third year Computer Science students in the preliminary examination in Statistics and
Probability.

2. Level of Significance:
Based on the problem;

α = 0.05

df = n1 + n2 − 2 = 10 + 10 − 2 = 18

t(0.05) = 2.101 (see table for critical values of t-test)

3. Compute the t-value using the t-test for two independent samples:

Male (x1)    x1²     Female (x2)    x2²
28           784     24             576
36           1296    18             324
34           1156    22             484
32           1024    10             100
8            64      20             400
24           576     6              36
24           576     14             196
20           400     4              16
18           324     12             144
34           1156    26             676
Σx1 = 258    Σx1² = 7356    Σx2 = 156    Σx2² = 2952
Mean x̄1 = 25.80              Mean x̄2 = 15.60

Solving for SS1 and SS2;


SS1 = Σx1² − (Σx1)²/n1 = 7356 − (258)²/10 = 7356 − 66564/10 = 7356 − 6656.4 = 699.6

SS2 = Σx2² − (Σx2)²/n2 = 2952 − (156)²/10 = 2952 − 24336/10 = 2952 − 2433.6 = 518.4

Plug in the formula:

t = (25.80 − 15.60) / sqrt[ ((699.6 + 518.4) / (10 + 10 − 2)) · (1/10 + 1/10) ]
t = 10.20 / sqrt[ (1218/18)(0.2) ]
t = 10.20 / sqrt(13.533)
t = 10.20 / 3.679
t = 2.773

4. Decision Rule:

Reject the Ho if |t-computed| is greater than the |t-critical| otherwise accept the Ho.

5. Conclusion:

Since the t-computed value of 2.773 is greater than the t-tabular value of 2.101 at the 0.05 level of significance with 18 degrees of freedom, the null hypothesis is rejected. This means that there is a significant difference between the performance of the male and female third year Computer Science students in the preliminary examination in Statistics and Probability.
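To double-check a manual computation like the one above, the same arithmetic can be scripted. Below is a minimal Python sketch (an illustration only, not part of the module's required manual/SPSS work); the variable names are ours.

    import math

    male = [28, 36, 34, 32, 8, 24, 24, 20, 18, 34]
    female = [24, 18, 22, 10, 20, 6, 14, 4, 12, 26]

    n1, n2 = len(male), len(female)
    mean1, mean2 = sum(male) / n1, sum(female) / n2

    # Sum of squares: SS = sum of x^2 minus (sum of x)^2 / n
    ss1 = sum(x ** 2 for x in male) - sum(male) ** 2 / n1
    ss2 = sum(x ** 2 for x in female) - sum(female) ** 2 / n2

    df = n1 + n2 - 2
    t = (mean1 - mean2) / math.sqrt(((ss1 + ss2) / df) * (1 / n1 + 1 / n2))
    print(round(t, 3), df)   # about 2.773 with 18 degrees of freedom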

Example 2.

To compare the batting average of two review centers for engineering board
examination, a graduate obtains the following data on the previous batch of examinees.
Test the hypothesis that RAR Review Center has a better track record than JTH Review
Center. Use a 5% (0.05) level of significance.

RAR JTH
Sample Size 10 12
Mean Rating 94.5 90
Standard Deviation 6.2 7.1

Solution:

1. Hypotheses:

Ho: The RAR Review Center has the same track record to JTH Review Center.

Ha: The RAR Review Center has a better track record than JTH Review Center.

2. Level of Significance:

α = 0.05
df = n1 + n2 − 2 = 10 + 12 − 2 = 20

t(0.05, one-tailed) = 1.725 (see table for critical values of t-test)

3. Plug in the formula:

Since only the standard deviations are given, first convert them to sums of squares: SS1 = (n1 − 1)s1² = 9(6.2)² = 345.96 and SS2 = (n2 − 1)s2² = 11(7.1)² = 554.51.

t = (x̄1 − x̄2) / sqrt[ ((SS1 + SS2) / (n1 + n2 − 2)) · (1/n1 + 1/n2) ]
t = (94.5 − 90) / sqrt[ ((345.96 + 554.51) / 20) · (1/10 + 1/12) ]
t = 4.5 / sqrt[ (45.02)(0.1833) ]
t = 4.5 / 2.87
t ≈ 1.57

4. Decision Rule:

If the t-computed is greater than or equal to the t-critical value, reject the Ho in favor of
Ha.

5. Conclusion:

Since the t-computed value of 1.57 is less than the t-tabular value of 1.725 at the 0.05 level of significance with 20 degrees of freedom, the null hypothesis is not rejected. This means that the data do not give sufficient evidence that the RAR Review Center has a better track record than the JTH Review Center.
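As a cross-check on Example 2, here is a minimal Python sketch of the same pooled t-test computed from the summary statistics (means, standard deviations, and sample sizes). It is offered only as an illustration; the variable names are ours.

    import math

    n1, mean1, sd1 = 10, 94.5, 6.2   # RAR Review Center
    n2, mean2, sd2 = 12, 90.0, 7.1   # JTH Review Center

    # Convert each standard deviation to a sum of squares: SS = (n - 1) * s^2
    ss1 = (n1 - 1) * sd1 ** 2
    ss2 = (n2 - 1) * sd2 ** 2

    df = n1 + n2 - 2
    t = (mean1 - mean2) / math.sqrt(((ss1 + ss2) / df) * (1 / n1 + 1 / n2))
    print(round(t, 2), df)   # about 1.57 with 20 degrees of freedom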

t-test for Two Independent Sample using SPSS

In our previous lesson, the t-test for two independent samples was used to test whether two independent samples are related and whether there is a significant difference between their means. This test statistic is a parametric test, so we use it when the distribution of the data is normal. Doing the manual computation to determine whether two independent samples differ significantly is tedious, however; we can use SPSS instead. Let us take this example.

A teacher wants to determine whether the group of male students and female students in his class
“Data Analysis” performs differently based on the score of their midterm examination. Here is the data
based on the class record of a teacher.

Male Students Female Students

28 24

36 18

34 22

32 10

8 20

24 6

24 14

20 4

18 12

34 26

Our research problem would be: "Is there a significant difference between the scores of the male students and the scores of the female students?" So, our hypotheses would be:

Ho: There is no significant difference between the scores of male and female students.
Ha: There is a significant difference between the scores of male and female students.

Note: For this example, we will be using a level of significance of 0.05.

Step 1. Make an appropriate variable to be used in the variable view of SPSS and enter the data in the data
view of SPSS.


Step 2. Go to “Analyze”; compare means then click the “Independent-Sample T Test”.

Then, another dialog box will appear.

Step 3: Highlight and transfer the “score” using the arrow to the “Test Variable(s) and the “sex” to the
“grouping variable”. When you transfer the “sex” into the grouping variables, click the “define groups”
since SPSS will ask you what groups are we going to test. We need to use the coding scheme. Here we
use 1 for male and 2 for female.

Step 4. Click "Continue". You will return to the previous dialog box, where the "OK" button is now enabled; click "OK" and the output view will appear.

Note: On this output view, we will be focusing more on the middle part of the table such as the t = 2.773,
df = 18 and the Sig. (2-tailed) = 0.013 in deciding whether we will be rejecting the null or accepting it.

Conclusion: Since the p-value of 0.013 is less than the level of significance of 0.05, we reject the null hypothesis in favor of the alternative hypothesis; therefore, there is a significant difference between the scores of the male and female students.
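If SPSS is not at hand, the same independent-samples t-test can also be reproduced in Python with the SciPy library. This is a minimal sketch, assuming SciPy is installed; it is only an alternative check on the SPSS output described above.

    from scipy import stats

    male = [28, 36, 34, 32, 8, 24, 24, 20, 18, 34]
    female = [24, 18, 22, 10, 20, 6, 14, 4, 12, 26]

    # Pooled (equal-variance) independent-samples t-test
    t_stat, p_value = stats.ttest_ind(male, female, equal_var=True)
    print(round(t_stat, 3), round(p_value, 3))   # about t = 2.773, p = 0.013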

Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
t-test for Independent Sample (Two-tailed test)

Consider the problem below:

A Professor wanted to find out whether the traditional method of teaching Mathematics in the Modern World differs significantly from the outcomes-based education method of teaching. Two sections having equal intelligence were selected. From one class, he
considered 15 students with whom he used the traditional method of teaching and from the other
class, he considered 14 students with whom he used the outcomes-based education method of
teaching. After a series of sessions, a 50-item test was given. The scores are shown in the table
below:

Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
TTM 35 33 34 25 26 35 43 28 39 36 28 18 36 34 42
OBE 32 36 45 36 36 25 37 42 33 28 37 29 32 43
*TTM – traditional teaching method OBE – Outcome Based-Education

Test the null hypothesis that there is no significant difference between the
performance of the two groups of students under traditional and outcomes-based education
methods of teaching. Use t-test at 0.05 level of significance.

Note: Use both the manual computation and with the use of SPSS.

2. The t-test for Correlated Samples or Paired t-test

Another parametric test is the t-test for correlated samples. This test is
applied only in one group of samples. This could be used in the evaluation of a
certain program or treatment. Since the t-test for correlated sample is another
parametric test, conditions must be met such as it should be in a normal
distribution and the use of interval or a ratio data.

This test is applied when the mean before an intervention is compared with the mean after it, as in the evaluation of a program for a group. This can be illustrated by the simple diagram below:

Pre-test (Before Intervention) → Intervention → Post-test (After Intervention)

Compare the pre-test and post-test and decide whether there is a significant or
non-significant difference between the pre-test and post-test using the t-test for correlated
sample known as the paired t-test.

The t-test for correlated samples is used to find out if a difference exists
between the before and after means. If there is a difference in favor of the post
test then the intervention or treatment or method is effective. However, if there is
no significant difference then the treatment or method is not effective.

The formula for the t-test for correlated samples is given by:

    t = D̄ / sqrt[ (ΣD² − (ΣD)²/n) / (n(n − 1)) ]

where:

t = the t-test for correlated samples
D̄ = the mean difference between the pretest and the posttest
ΣD = the sum of the differences between the pretest and the posttest
ΣD² = the sum of the squared differences between the pretest and the posttest
n = the sample size

Example 1.

During the first day of class, a professor conducted a 50-item pretest on his fifteen students in Statistics and Probability before the formal lessons of the subject began. After a semester, he gave a posttest to the same fifteen students using the same set of questions that he gave in the pretest. He wants to determine if there is a significant difference between the pretest and the posttest. The following is the result of the experiment. The professor uses the α = 0.05 level of significance.

Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pre-test 15 12 20 10 8 27 29 13 19 22 25 14 28 18 16
Post-test 20 18 25 25 20 35 43 28 29 37 46 27 33 37 28
Solution:

Problem: Is there a significant difference between the Pretest and Posttest of the fifteen students
in Statistics and Probability on the use of the teaching method by the Professor?

1. Hypotheses:

Ho: There is no significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability based on the teaching method used by the Professor.

Ha: There is a significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability based on the teaching method used by the Professor.

2. Level of Significance:

Based on the problem;


α = 0.05
df = 𝑛 − 1 = 15 - 1 = 14
t(0.05) = ±2.145 (see table for critical values of t-test)

3. Compute the t-value using the t-test for correlated samples:

Pretest Posttest
𝑥1 𝑥2 D D2
15 20 -5 25
12 18 -6 36
20 25 -5 25
10 25 -15 225
8 20 -12 144
27 35 -8 64
29 43 -14 196
13 28 -15 225
19 29 -10 100
22 37 -15 225
25 46 -21 441
14 27 -13 169
28 33 -5 25
18 37 -19 361
16 28 -12 144
ΣD = −175    ΣD² = 2405

D̄ = −175/15 = −11.67

Plug into the formula:

t = D̄ / sqrt[ (ΣD² − (ΣD)²/n) / (n(n − 1)) ]
t = −11.67 / sqrt[ (2405 − (−175)²/15) / (15)(14) ]
t = −11.67 / sqrt[ (2405 − 2041.67) / 210 ]
t = −11.67 / sqrt(1.73)
t = −11.67 / 1.32
t = −8.84

4. Decision Rule:

If the t-computed value is greater than or beyond the critical value, reject the Ho
otherwise accept the Ho.
5. Conclusion:

The t-computed value of -8.84 is beyond the t-critical value of -2.145 at the 0.05 level of significance with 14 degrees of freedom. The null hypothesis is therefore rejected in favor of the research hypothesis. This means that the posttest result is significantly higher than the pretest result, which implies that the Professor's method of teaching is effective.
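Here is a minimal Python sketch that mirrors the manual computation of the correlated (paired) t-test above; it is only an illustration, and the variable names are ours. The exact value differs slightly from −8.84 because the manual solution rounds the intermediate figures.

    import math

    pretest  = [15, 12, 20, 10, 8, 27, 29, 13, 19, 22, 25, 14, 28, 18, 16]
    posttest = [20, 18, 25, 25, 20, 35, 43, 28, 29, 37, 46, 27, 33, 37, 28]

    n = len(pretest)
    d = [x1 - x2 for x1, x2 in zip(pretest, posttest)]   # differences D = pretest - posttest

    d_bar = sum(d) / n
    sum_d2 = sum(di ** 2 for di in d)

    t = d_bar / math.sqrt((sum_d2 - sum(d) ** 2 / n) / (n * (n - 1)))
    print(round(t, 2))   # about -8.87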

Example 2.

A coach in a certain University wants to determine whether there is a difference in the average running time of his five players before and after the training. He claims that the times of his five runners will improve after the training. Is his claim true? Use 0.05 as the level of significance. Below are the data:

Average time before training Average time after training


Runner 𝑥1 𝑥2
1 1.46 1.42
2 1.52 1.47
3 1.74 1.6
4 1.77 1.73
5 1.83 1.62

Solution:

Research Problem: Is the coach's claim true that the times of the five runners improve after the training?

1. Hypotheses:

Ho: There is no significant difference on the average time of running before and after
the training.

Ha: There is a significant difference in the average time of running before and after
the training.

2. Level of Significance:

Based on the problem;


α = 0.05

df = 𝑛 − 1 = 5 - 1 = 4
t(0.05) = ±2.776 (see table for critical values of t-test)
3. Compute the t-value using the paired t-test or t-test for correlated samples:

Average time Average time


before training after training
Runner 𝑥1 𝑥2 D D2
1 1.46 1.42 0.04 0.0016
2 1.52 1.47 0.05 0.0025
3 1.74 1.6 0.14 0.0196
4 1.77 1.73 0.04 0.0016
5 1.83 1.62 0.21 0.0441
ΣD = 0.48    ΣD² = 0.0694

D̄ = 0.48/5 = 0.096

Plug into the formula:

t = D̄ / sqrt[ (ΣD² − (ΣD)²/n) / (n(n − 1)) ]
t = 0.096 / sqrt[ (0.0694 − (0.48)²/5) / (5)(4) ]
t = 0.096 / sqrt[ (0.0694 − 0.04608) / 20 ]
t = 0.096 / sqrt(0.001166)
t = 0.096 / 0.0341
t = 2.815

4. Decision Rule:

If the t-computed value is greater than or is beyond the critical value, reject the Ho
otherwise accept the Ho.

5. Conclusion:

The t-computed value of 2.815 is beyond the t-critical value of 2.776 at the 0.05 level of significance with 4 degrees of freedom. The null hypothesis is therefore rejected in favor of the alternative hypothesis. This means that there is a significant difference in the average running time before and after the training.

How to do t-test for correlated sample or paired t-test in SPSS?

To do the t-test for correlated samples in SPSS, bear in mind that the sample here refers to only one group, and that we measure two values for every observation in that group, for example before treatment and after treatment.

Illustration:

1. Is there any difference in the number of heartbeats per minute before and after jogging?
2. Is there any difference between the scores of students in pre-test and in post-test?
3. Is there any difference in the temperature of rats before and after a shot of dosage of medicine?

These are just some of the problems for which the t-test for correlated samples can be used, and SPSS is one of the software packages that makes testing such hypotheses easy.

Below is an example of data to analyze using SPSS.

Result of Pre-test and Post-test of Students in Data Analysis

Student Pre-test Post-test Student Pre-test Post-test

A 23 40 L 44 40

B 10 27 M 32 30

C 25 26 N 29 43

D 18 37 O 45 43

E 35 40 P 21 29

F 33 46 Q 18 29

G 26 28 R 32 42

H 18 32 S 29 36

I 46 35 T 32 45

J 32 32 U 35 40

K 29 37 V 28 29

Problem: Is there any significant difference in the pre-test and post-test scores of students in Data
Analysis?

Ho: There is no significant difference in the pre-test and post-test scores of students in Data Analysis

Ha: There is a significant difference in the pre-test and post-test scores of students in Data Analysis

To test the hypothesis whether there is no significant difference or there is a significant difference, we will
do the paired t-test using SPSS.

Step 1: First, enter the data in the data view of SPSS.


Step 2: Click “Analyze” then proceed to “Compare Means” and another dialog box will appear and look
for “Paired t-test”.

Step 3: Click the “Paired-samples t-test” and another dialog box will appear just like below;


Step 4: Place the “pre-test” in variable 1 and “post-test” in variable 2 by highlighting and clicking the
arrow.

Note: The option button will serve what confidence level you want to choose. The 95% is the default of
SPSS.

Step 5: Then click “ok”. Once you click the ok button, the output view will appear.

In the output view, focus on the "Sig. (2-tailed)" value for the paired samples test. The sig value for this pair of data is 0.001, and this value is compared with the level of significance of 0.05. Notice that the sig value is less than the level of significance. What does this mean? Interpreting the result based on the sig value, we need to reject the null hypothesis in favor of the alternative hypothesis.

So, what would be our conclusion? Since the p-value of 0.001 is less than the level of significance of 0.05, reject the null hypothesis in favor of the alternative hypothesis. Thus, there is a significant difference between the scores of the students in their pre-test and post-test.
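For readers without SPSS, the same paired t-test can be reproduced with SciPy. This is a minimal sketch under that assumption, using the pre-test/post-test data listed above; it is meant only as a check on the SPSS output.

    from scipy import stats

    pretest  = [23, 10, 25, 18, 35, 33, 26, 18, 46, 32, 29,
                44, 32, 29, 45, 21, 18, 32, 29, 32, 35, 28]
    posttest = [40, 27, 26, 37, 40, 46, 28, 32, 35, 32, 37,
                40, 30, 43, 43, 29, 29, 42, 36, 45, 40, 29]

    # Paired (correlated) samples t-test
    t_stat, p_value = stats.ttest_rel(pretest, posttest)
    print(round(t_stat, 2), round(p_value, 3))   # about t = -3.96, p = 0.001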
Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
t-test for Correlated Sample (Paired-t-test)

Consider the research problem below:


A Mathematics teacher in Secondary Education conducted a 50-item multiple choice
pretest in Trigonometry to his 24 second year students before the formal class started. He used
several methods of teaching to his students. Before the academic year ends, he gives a post test
on the same set of students. Based on the record, determine if there is a significant difference
between the pretest and posttest. The teacher uses α = 0.05 level of significance.

Student Pretest Posttest Student Pretest Posttest


1 15 35 13 10 29
2 20 43 14 9 30
3 12 33 15 22 30
4 24 30 16 18 26
5 18 27 17 24 35
6 35 35 18 22 40
7 29 30 19 17 29
8 37 40 20 16 32
9 30 35 21 27 36
10 18 20 22 22 33
11 28 36 23 35 43
12 30 30 24 16 38

Note: Use both the manual computation that is with the use of the formula and check if the
result is as the same in using the SPSS.


Table for the Critical Value of t-test


B. The z-test

The z-test is another test under parametric statistics which requires normality of distribution. It is applied to large samples (n > 30) when the population standard deviation (σ) is known. It is used to compare two means: the sample mean and the hypothesized population mean.

It is also used to compare two sample means taken from the same population. The z-test can be applied in two ways: (1) the one-sample mean test and (2) the two-sample mean test.

The tabular value of the z-test at 0.01 and 0.05 level of significance is
shown below:

Test         0.01      0.05
One-tailed   ±2.33     ±1.645
Two-tailed   ±2.575    ±1.960

1. The z-test for One Sample Group

The z-test for one sample group is used to compare the perceived population
mean μ against the sample mean 𝑥. Using this test, we can determine whether the mean
of a group differs from a specified value.
This procedure is based on the normal distribution, so for small samples it works best if the data were drawn from a normal distribution or one that is close to normal. Usually, we can consider samples of size 30 or higher to be large samples. If the population standard deviation is not known, the sample standard deviation can be used as a substitute.

The null hypothesis used for this test is as follow:

Ho: μ = μc ; (The population mean equals the hypothesized mean)

The alternative hypothesis that we could choose are as follows:

Ha: μ ≠ μc ; (The population mean differs from the hypothesized mean)


Ha: μ > μc ; (The population mean is greater than the hypothesized mean)
Ha: μ < μc ; (The population mean is less than the hypothesized mean)

The formula for the z-test for a one-sample group is:

    z = (x̄ − μ)√n / σ

where:

z = the z-test for one sample group
x̄ = the sample mean
μ = the hypothesized value of the population mean
σ = the population standard deviation
n = the sample size

Example 1:

A Mathematics Professor claims that the average performance of his students is at least
86%. To be able to verify if his claim is true, he conducted an examination of his 40 students.
After the exam, he got a mean grade of 80%. With the standard deviation of 76%, is the claim of
the Professor true? Use the z-test at 0.05 level of significance.

Solution:

Problem: Is the claim of a Professor true that the average performance of his student is at
least 86%?

1. Hypotheses:

Ho: The average performance of the students is 86%.

Ha: The average performance of the students is less than 86%.

2. Level of Significance:

α = 0.05
z(0.05, one-tailed) = −1.645

3. Statistics:
z-test for a one-tailed test

Here:
x̄ = 80 (sample mean)
μ = 86 (population mean)
n = 40 (sample size)
σ = 76 (population standard deviation)

Plug into the formula:

z = (x̄ − μ)√n / σ = (80 − 86)√40 / 76 = (−6)(6.32) / 76 = −37.92 / 76 = −0.49

4. Decision Rule:

If the z-computed value is greater than or beyond the z-tabular, reject the Ho.

5. Conclusion:

Since the z-computed value of -0.49 is not beyond the critical value of -1.645 at
0.05 level of significance, the research hypothesis that the average performance of the
students is 86% will be accepted.
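The one-sample z computation can also be checked with a short Python sketch; this is only an illustration (the module's own procedures are the manual formula and the SPSS syntax shown later), and the variable names are ours.

    import math
    from scipy.stats import norm

    x_bar, mu, sigma, n = 80, 86, 76, 40

    z = (x_bar - mu) * math.sqrt(n) / sigma
    p_two_tailed = 2 * norm.sf(abs(z))
    print(round(z, 2), round(p_two_tailed, 3))   # about z = -0.50, p = 0.618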

Example 2.

Based on the previous records, it was believed that the obesity among pupils aged 8-10
years old is less than 1.4% for boys. If 15 of the 500 boys aged 8-10 years old are obese, test the
claim that the proportion of obesity among male pupils is not less than 1.4%. Use a 5% (0.05)
level of significance.

Step 1: State the Ho and Ha

Ho: p = 0.014 and Ha: p > 0.014 (the claim being tested is that the proportion is not less than 1.4%)

Step 2: Determine the level of significance

α = 0.05

Step 3: Test Statistic

z = (p̂ − p) / sqrt[ p(1 − p)/n ]

Step 4: Compute the test value and solve for the p-value

Given: X = 15; n = 500; p = 0.014

First, find the value of p̂:

p̂ = X/n = 15/500 = 0.03

Then plug into the formula:

z = (0.03 − 0.014) / sqrt[ (0.014)(1 − 0.014)/500 ]
z = 0.016 / sqrt[ (0.014)(0.986)/500 ]
z = 0.016 / 0.0053
z = 3.02

The area of z = 3.02 under a normal curve is A = 0.4987.

Subtracting the area A = 0.4987 from A = 0.5000, we get 0.0013. Hence the p-value is 0.0013.

Step 5: Decision Rule and Conclusion

If p ≤ α, reject the null hypothesis; otherwise, do not reject the null hypothesis.

Since the p-value of 0.0013 is less than the level of significance α = 0.05, we reject the null hypothesis; hence we can say that the claim that the proportion of obesity among male pupils is not less than 1.4% is supported.
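Here is a minimal Python sketch of the same one-proportion z-test, offered only as a check; the small differences from the manual figures come from rounding.

    import math
    from scipy.stats import norm

    x, n, p0 = 15, 500, 0.014

    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = norm.sf(z)   # right-tailed p-value, since Ha: p > 0.014
    print(round(z, 2), round(p_value, 4))   # about z = 3.05, p = 0.0012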

One sample group Z-test using SPSS

How do we use SPSS if you want to use the z-test as your statistical tool? There will be a slight
difference if you would use SPSS to determine the test statistic under z-test since we need a syntax here.
We all know that using this test, we need to know if there is a significant difference between the
population mean and the sample mean where the population standard deviation is also known.

Let us explore this test in SPSS. To do this, let us use Example 1, which was illustrated previously under the z-test for one sample group. In Example 1, the following are given: x̄ = 80 (sample mean), μ = 86 (population mean), n = 40 (sample size), and σ = 76 (population standard deviation).

Step 1. First, look for the website for “how2stats” and if you are now inside the website, look for
the “One sample z-test” and click it. Another site will open.


Step 2. Highlight the syntax, make a copy of it by using the command either “control c” or “command c”.
Once you do this, proceed to open SPSS and click the file and look for “new” then “syntax”.

Step 3. Click “syntax” and another window will open.


Step 4. On the right side blank screen, you will paste the copied syntax taken from the website.

Here, on line number 14 of the syntax, you will see the default numbers 35, 105, 100, and 15. These represent the sample size, sample mean, population mean, and population standard deviation, respectively. If you have different values, change them according to your given sample size, sample mean, population mean, and population standard deviation. In our example the given values are x̄ = 80 (sample mean), μ = 86 (population mean), n = 40 (sample size), and σ = 76 (population standard deviation).

Step 5. Put this in your syntax and then click “run” on the menu bar. Once you click the “run”,
SPSS generates the output.


Variable View

Data View
Step 6. Click the “run” icon and SPSS will generate the output.


On the output view, you will see on the “List” the value of z-stat = -0.49931, the p-value =
0.61756 and the cohens-d = -0.07895.

Now, recall that the Ho in our previous example is "The average performance of the students is 86%". Since the p-value of 0.61756 is greater than the level of significance of 0.05, our conclusion is that the null hypothesis is not rejected: the claim that the average performance of the students is 86% is retained.


Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
z-test for One Sample Group

Consider the research problem below:

The manager of a certain review center is interested in determining whether the


review session affects the exam performance of the reviewer. In the previous examination, the
population mean of an exam is 68. With the standard deviation of 10 and a sample mean of 25,
does the review session affect the performance of the 15 students? Use α = 0.05 level of
significance.

Note: In performing this problem, use both the manual computation and with the use of SPSS.


2. The z-test for Two Independent Sample Means

The z-test for two sample means is another parametric test. It is basically used to compare the means of two independent groups from which samples were drawn. The samples must be drawn from normally distributed populations.

We use this z-test to find out whether there is a significant difference between two populations by comparing only the sample means.

The null hypothesis used for this test is as follow:

Ho: μ1 = μ2 ; (The population mean 1 equals the population mean 2)

The alternative hypothesis is as follow:

Ha: μ1 ≠ μ2; (The population mean 1 is not equal to the population mean 2)

The formula for the z-test for two sample means is:

    z = (x̄1 − x̄2) / sqrt( s1²/n1 + s2²/n2 )

where:

x̄1 = the mean of sample 1
x̄2 = the mean of sample 2
s1² = the variance of sample 1
s2² = the variance of sample 2
n1 = the size of sample 1
n2 = the size of sample 2

The tabular value of the z-test at 0.01 and 0.05 level of significance is
shown below:

Test         0.01      0.05
One-tailed   ±2.33     ±1.645
Two-tailed   ±2.575    ±1.960

Example:

An entrance examination was administered to incoming freshmen in the department of


Information Technology and the department of Computer Science with 100 students in each
department randomly selected. The mean scores of the given samples were 𝑥1= 89 and 𝑥2 = 83
and the variances of the test scores were 45 and 40 respectively. Is there a significant difference
between the two groups? Use 0.01 level of significance.

Problem: Is there any significant difference between the two groups?

1. Hypotheses:

Ho: There is no significant difference between the two groups.

Ha: There is a significant difference between the two groups.

2. Level of Significance:

α = 0.01
z = ± 2.575

3. Statistics:
z-test for a two-tailed test

Here:
x̄1 = 89
x̄2 = 83
s1² = 45
s2² = 40
n1 = 100
n2 = 100

Plug into the formula:

z = (x̄1 − x̄2) / sqrt( s1²/n1 + s2²/n2 )
z = (89 − 83) / sqrt( 45/100 + 40/100 )
z = 6 / sqrt(0.85)
z = 6 / 0.92
z = 6.52

4. Decision Rule:

If the z-computed value is greater than or beyond the z-tabular, reject the Ho.

5. Conclusion:

Since the z-computed value of 6.52 is beyond the z-tabular value of 2.575 at the 0.01 level of significance, we reject the null hypothesis and accept the alternative hypothesis that there is a significant difference between the two groups.
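A minimal Python sketch of the two-sample z-test follows, as an optional check on the manual computation; the variable names are ours.

    import math
    from scipy.stats import norm

    mean1, var1, n1 = 89, 45, 100   # Information Technology
    mean2, var2, n2 = 83, 40, 100   # Computer Science

    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    p_two_tailed = 2 * norm.sf(abs(z))
    print(round(z, 2), p_two_tailed)   # about z = 6.51; p is essentially zero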


Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
z-test for Two Sample Means

Consider the research problem below:

The amount of a certain trace element in blood is known to vary with a standard
deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors.
Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm,
respectively. What is the likelihood that the population mean concentrations of the element are the same for men and women? Use the α = 0.05 level of significance.

Note: In performing this problem, use both the manual computation and with the use of SPSS.


C. The F-Test

Earlier in this section we performed a t-test to compare a sample mean to an accepted value, or to compare two sample means. The F-test is another parametric test, also called the Analysis of Variance (ANOVA).

Ronald A. Fisher developed the F-test. This test is used when the variances of two populations are equal, when the distribution is normal, and when the level of measurement is interval or ratio.

The test can be two-tailed or one-tailed. The two-tailed test is used when we ask only whether the variances are unequal, while the one-tailed test is used in one direction, where the variance of the first population is hypothesized to be either greater than or less than the second population variance, but not both.

There are three kinds of analysis of variance and these are (a) one-way analysis of
variance, (b) two-way analysis of variance and the (c) three-way analysis of variance. In this
section, one-way analysis of variance will be the focus of discussion.

One-Way Analysis of Variance

The one-way Analysis of Variance (ANOVA) can be used for the case of a
quantitative outcome with a categorical explanatory variable that has two or more
levels of treatment. The term one-way, also called one-factor, indicates that there
is a single explanatory variable (“treatment”) with two or more levels, and only
one level of treatment is applied at any time for a given subject.

The ANOVA table has five columns and these are the sources of variation,
degrees of freedom, sum of squares, mean squares and the F-value, both the
computed and the tabular values.

The sources of variations are between the groups, within the group itself
and the total variation. The degree of freedom for the total is the total number of
observations minus 1. The degree of freedom from the between groups is the total
number of groups minus 1. The degree of freedom for the within group is the
total degree of freedom (df) minus the between groups of degree of freedom (df).

Source of Variation   df                    SS    MS       F-value
Between Groups        K − 1                 BSS   BSS/df   Computed: F = MSB/MSW
Within Groups         (N − 1) − (K − 1)     WSS   WSS/df   Tabular: see the F table at 0.05 (or the desired level of significance) with the between- and within-group df
Total                 N − 1                 TSS

Based on the table:

1. K is the number of columns (groups).
2. N is the total number of observations.
3. BSS is the between-groups sum of squares, i.e., the sum of (Σx)²/n over the groups minus the CF, where CF = (GT)²/N and GT is the grand total of all observations.
4. TSS is the total sum of squares of all observations minus the CF.
5. WSS is the within-groups sum of squares, i.e., the difference TSS minus BSS.
6. MSB, the mean square between, is equal to BSS/df (between).
7. MSW, the mean square within, is equal to WSS/df (within).
8. F = MSB/MSW.

The F-computed value must be compared with the F-tabular value at a given level of significance with the corresponding df's of BSS and WSS. If the F-computed value is greater than the F-tabular value, we reject the null hypothesis in favor of the research (alternative) hypothesis, which means that there is a significant difference between or among the means of the different groups.

Example 1.
The computer store is selling three different brands of cellular phone. The
manager of the store wants to determine if there is a significant difference in the
average sales of the three brands of cellular phone for five-day selling. The
following data are recorded:

DAY      A (x1)   B (x2)   C (x3)   (x1)²   (x2)²   (x3)²
1        4        8        3        16      64      9
2        6        3        5        36      9       25
3        2        6        3        4       36      9
4        5        4        6        25      16      36
5        2        7        4        4       49      16
Total    Σx1 = 19   Σx2 = 28   Σx3 = 21   Σx1² = 85   Σx2² = 174   Σx3² = 95

n1 = 5    n2 = 5    n3 = 5

Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the average sales of the three brands of cellular phone are equal.

Problem: Is there any significant difference in the average sales of the three
brands of cellular phone?

1. Hypotheses:
Ho: There is no significant difference in the average sales of the
three brands of cellular phone.

Ha: There is a significant difference in the average sales of the
three brands of cellular phone.

2. Level of significance: α = 0.05

3. Computation:

CF = (Σx1 + Σx2 + Σx3)² / (n1 + n2 + n3)
CF = (19 + 28 + 21)² / (5 + 5 + 5) = (68)²/15 = 308.27

TSS = Σx1² + Σx2² + Σx3² − CF
TSS = 85 + 174 + 95 − 308.27 = 45.73

BSS = (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 − CF
BSS = (19)²/5 + (28)²/5 + (21)²/5 − 308.27
BSS = 72.2 + 156.8 + 88.2 − 308.27 = 8.93

WSS = TSS − BSS = 45.73 − 8.93 = 36.8

ANOVA Table

Source of Variation   df    SS      MS      F-value (Computed)   F-value (Tabular)
Between Groups        2     8.93    4.465   1.457                3.890
Within Groups         12    36.8    3.07
Total                 14    45.73

4. Decision rule:

If the F-computed value is greater than the F-tabular value, reject the null hypothesis.

5. Conclusion:

Since the F-computed value of 1.457 is less than the F-tabular value of 3.890 at the 0.05 level of significance with 2 and 12 degrees of freedom, retain the null hypothesis that there is no significant difference in the average sales of the three brands of cellular phone.
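The same one-way ANOVA can be verified with SciPy. This is a minimal sketch assuming SciPy is available; it is only a cross-check on the manual table above.

    from scipy import stats

    brand_a = [4, 6, 2, 5, 2]
    brand_b = [8, 3, 6, 4, 7]
    brand_c = [3, 5, 3, 6, 4]

    # One-way analysis of variance across the three brands
    f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)
    print(round(f_stat, 3), round(p_value, 3))   # about F = 1.457, p = 0.271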

Example 2.
Test scores in the entrance examination of the incoming Grade 11 students
from four strands of the Academic Track are shown in the next table. Is there a
difference in the mean scores among the strands?

Let A = STEM B = ABM C = HUMSS D = GAS

Strands
A (x1)   B (x2)   C (x3)   D (x4)   (x1)²   (x2)²   (x3)²   (x4)²
40       50       50       35       1600    2500    2500    1225
30       47       45       45       900     2209    2025    2025
25       45       48       50       625     2025    2304    2500
27       48       50       26       729     2304    2500    676
31       38       38       36       961     1444    1444    1296
42       45                24       1764    2025            576
         39                                 1521
Σx1 = 195   Σx2 = 312   Σx3 = 231   Σx4 = 216   Σx1² = 6579   Σx2² = 14028   Σx3² = 10773   Σx4² = 8298

n1 = 6    n2 = 7    n3 = 5    n4 = 6

Perform the analysis of variance and test the hypothesis at 0.05 level of
significance that the mean scores among the strands are equal.

1. Hypotheses:

Ho: There is no significant difference in the mean scores of the incoming Grade 11 students from the four strands of the Academic Track.

Ha: There is a significant difference in the mean scores of the incoming Grade 11 students from the four strands of the Academic Track.

2. Level of significance: α = 0.05

3. Computation:

CF = (Σx1 + Σx2 + Σx3 + Σx4)² / (n1 + n2 + n3 + n4)
CF = (195 + 312 + 231 + 216)² / (6 + 7 + 5 + 6) = (954)²/24 = 37921.5

TSS = Σx1² + Σx2² + Σx3² + Σx4² − CF
TSS = 6579 + 14028 + 10773 + 8298 − 37921.5 = 1756.5

BSS = (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 + (Σx4)²/n4 − CF
BSS = (195)²/6 + (312)²/7 + (231)²/5 + (216)²/6 − 37921.5
BSS = 6337.5 + 13906.3 + 10672.2 + 7776 − 37921.5 = 770.5

WSS = TSS − BSS = 1756.5 − 770.5 = 986

ANOVA Table

Source of Variation   df    SS       MS       F-value (Computed)   F-value (Tabular)
Between Groups        3     770.5    256.83   5.21                 3.10
Within Groups         20    986      49.3
Total                 23    1756.5

4. Decision rule:

If the F-computed value is greater than the F-tabular value, reject the null hypothesis in favor of the alternative hypothesis.

5. Conclusion:

Since the F-computed value of 5.21 is greater than the F-tabular value of 3.10 at the 0.05 level of significance with 3 and 20 degrees of freedom between and within groups respectively, reject the null hypothesis in favor of the alternative hypothesis. Hence, there is a significant difference in the mean scores of the incoming Grade 11 students from the four strands of the Academic Track.

One-way Analysis of Variance using SPSS

Testing the hypothesis by manual computation, especially for a one-way analysis of variance, is a very lengthy process. To make it easier to apply this statistical tool, SPSS can do the mathematical computation for the researcher.

Let us use the data from the first example presented under this test statistic. Based on this example, we will test the hypothesis of whether there is any significant difference in the average sales of the three brands of cellular phone, assuming that the researcher has already encoded all the data in SPSS.

Variable View


Data View
Remember that the dependent variable is the number of sales (a scale variable), while the independent variable is the brand of cellular phone (a nominal variable).

Step 1. On the main menu bar, click "Analyze", then look for "Compare Means", then look for and click "One-Way ANOVA".

Step 2. Once you click the “One-way ANOVA”, another window will appear.

Step 3. On the above window, place the independent variable into factor and the
dependent variable into the dependent list using the arrow. In this example, the independent
variable is the nominal variable and that is the brand of cellular phone and the dependent variable
is the number of sales.

Step 4. Click the “option” icon and another window will appear. In this window you will
select important things that can be used in your analysis of data.


Step 5. Select the options to be used in the analysis of the data by checking the following: Descriptive, Homogeneity of variance test, Brown-Forsythe, and Welch. You can also check the Means plot in order to see from the graph whether there is a difference among the means of the three brands.

Note: The output for the homogeneity of variance test, the Brown-Forsythe test, and the Welch test will be discussed in a later part of this course. Here the focus is on the descriptive table, the ANOVA table, and the graph.

Step 6. Before you click the "Continue" button, make sure that the level of significance is the one you want; the default in SPSS is 0.05, that is, a confidence level of 0.95. Click "Continue" and then "OK". The output view will appear.

Note: In doing the analysis, use the descriptive table as well as the ANOVA table. The interpretation is the same as what we did in the previous manual example for this statistical analysis.

Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
One-Way Analysis of Variance

Consider the problem below:

A certain University conducted a sports competition. Three units, with 5 students per unit, competed to be chosen as the winner. The scores of the students are listed below according to their unit team.

Student Unit 1 Unit 2 Unit 3


1 35 29 45
2 23 10 34
3 44 15 48
4 35 23 38
5 20 25 34

Test if there is a significant difference in the scores obtained by the three


units. Use α = 0.05 level of significance.

Note: In performing this problem, use both the manual computation and with the use of SPSS.

D. The Pearson Product Moment Coefficient of Correlation ( r )

The Pearson product-moment correlation, symbolized as r, is a parametric measure of association for two variables. Basically, this test statistic is used for two quantitative variables (interval or ratio) or numerical information. It measures both the strength and the direction of a linear relationship. If one variable X is an exact linear function of another variable Y, a positive relationship exists if the correlation is 1 and a negative relationship exists if the correlation is -1. If there is no linear predictability between the two variables, the correlation is 0.

An example of analyzing data with this test statistic is determining whether a certain x-value is associated with a y-value: for instance, whether a higher IQ goes with a higher possible score of the student in an identified subject, or whether a higher income in a certain barangay goes with a lower crime rate.

In other words, we consider the relationship between two variables X and Y rather than predicting a value of Y.

Consider the scatter plot diagram below:


PERFECT POSITIVE CORRELATION; r = 1

PERFECT NEGATIVE CORRELATION; r = -1

POSITIVE CORRELATION

NEGATIVE CORRELATION

NO CORRELATION; r = 0

Note that we use the r to determine the index of relationship between two
variables, the independent and the dependent variables.

Looking at the scatter plot diagram, if r = +1.0, this correlation is a perfect positive
correlation while if r = +0.6, this is a positive correlation. The trend of this line graph is going
upward. This indicates that as the value of x increases, the value of y also increases.

If the value of r = -1.0, this is what we call a perfect negative correlation but if r = -0.6,
this is something we call a negative correlation. In the negative correlation, it indicates that the
line graph is going downward. Here, as the value of x increases, y decreases.

If r = 0.00, this is called no correlation or zero correlation. For this value of r, a trend line cannot be established as going either upward or downward. Here, we can say that there is no correlation between the two variables x and y.


Degree of Correlation

1. Correlation is measured on a scale of -1 to +1, where 0 indicates no correlation and values near -1 or +1 indicate high correlation. Both -1 and +1 are equally high degrees of correlation.

2. There is no absolute cutoff that tells when two variables have a low or a high degree of correlation; however, values of r close to -1 or +1 suggest a high degree of correlation, values close to 0 suggest little or no correlation, and values of about 0.7 to 0.8 (in absolute value) are often described as moderately high correlations (see the ranges below).

How do we interpret the computed r?

To interpret the computed correlation coefficient r, here are the ranges of correlation and their interpretation:

Between ±0.80 to ±0.99   High correlation
Between ±0.60 to ±0.79   Moderately high correlation
Between ±0.40 to ±0.59   Moderate correlation
Between ±0.20 to ±0.39   Low correlation
Between ±0.01 to ±0.19   Negligible correlation

Formula for the Pearson Product Moment Coefficient of Correlation r

The Pearson Product Moment Coefficient of Correlation, denoted by r, is determined by using the formula:

    r = [nΣxy − (Σx)(Σy)] / sqrt{ [nΣx² − (Σx)²][nΣy² − (Σy)²] }

where:

r = the Pearson Product Moment Coefficient of Correlation
n = the sample size
Σxy = the sum of the products of x and y
(Σx)(Σy) = the product of the sum of x and the sum of y
Σx² = the sum of the squares of x
Σy² = the sum of the squares of y

Example:

Consider the pre-test and the post test in Statistics and Probability of the ten
students of CS 3201.

Student          A    B    C    D    E    F    G    H    I    J
Pre-test (x)     56   70   60   85   75   87   72   89   75   86
Post-test (y)    65   78   60   90   75   90   79   89   89   95

Solve and analyze the value of r. Determine if there is a significant relationship


between the pre-test and post-test of the ten students of CS in Statistics and Probability.
Use α = 0.05 level of significance.

Solution:

Problem: Is there a significant relationship between the pre-test and post-test of


the ten students of CS in Statistics and Probability?

1. Hypotheses:

Ho: There is no significant relationship between the pre-test and post-test of the ten students of CS in Statistics and Probability.

Ha: There is a significant relationship between the pre-test and post-test of the ten students of CS in Statistics and Probability.

2. Level of Significance:

α = 0.05
df = n – 2 = 10 – 2 = 8
r(0.05) = 0.6319 (tabular value)

3. Computation: (Statistics)

Pearson Product Moment Coefficient of Correlation

x y x2 y2 xy
56 65 3136 4225 3640
70 78 4900 6084 5460
60 60 3600 3600 3600
85 90 7225 8100 7650
75 75 5625 5625 5625
87 90 7569 8100 7830
72 79 5184 6241 5688
89 89 7921 7921 7921
75 89 5625 7921 6675
86 95 7396 9025 8170
Σx = 755    Σy = 810    Σx² = 58181    Σy² = 66842    Σxy = 62259

Apply the formula:

r = [nΣxy − (Σx)(Σy)] / sqrt{ [nΣx² − (Σx)²][nΣy² − (Σy)²] }
r = [(10)(62259) − (755)(810)] / sqrt{ [(10)(58181) − (755)²][(10)(66842) − (810)²] }
r = (622590 − 611550) / sqrt[ (581810 − 570025)(668420 − 656100) ]
r = 11040 / sqrt[ (11785)(12320) ]
r = 11040 / sqrt(145191200)
r = 11040 / 12049.53
r = 0.92

Here, we can say that r = 0.92 is a high positive correlation.

4. Decision Rule:

If the computed r value is greater than the r tabular value, reject the
null hypothesis.

5. Conclusion:

Since the computed r-value of 0.92 is higher than the r-tabular value of 0.632 at the 0.05 level of significance with 8 degrees of freedom, we reject the null hypothesis. This means that there is a significant relationship between the pre-test and post-test of the ten students of CS in Statistics and Probability. It implies that the higher the pre-test, the higher also the post-test.
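The Pearson r above can also be verified with SciPy; this is a minimal sketch for checking purposes only.

    from scipy import stats

    pretest  = [56, 70, 60, 85, 75, 87, 72, 89, 75, 86]
    posttest = [65, 78, 60, 90, 75, 90, 79, 89, 89, 95]

    r, p_value = stats.pearsonr(pretest, posttest)
    print(round(r, 3), round(p_value, 4))   # about r = 0.916, with p well below 0.01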

Pearson Product Moment of Correlation using SPSS

Another statistical test that SPSS can do is the Pearson product-moment correlation, or simply Pearson r. Just as in our previous examples, we first need the data to be analyzed already encoded in the data view of SPSS.

Basically, this test statistic is used to determine whether the x variable is correlated with, or related to, the y variable, whether positively or negatively; in other words, the Pearson correlation is one of the test statistics for analyzing whether one variable is associated with another. Usually we compare two quantitative variables, that is, two groups of data that are scale in nature. For this demonstration we will use the data from the example given under this statistical test; let us say the pre-test and post-test data are already encoded in the variable and data views of SPSS.


Step 1. On the menu bar, click “Analyze” then “Correlate” and look for “Bivariate” and
click the “Bivariate”.

Step 2. When you click the “Bivariate”, another window will appear.


Note: Here you will see the two groups, the pre-test and the post-test, which we need to test for correlation, i.e., whether the pre-test is related to or associated with the post-test. You will also notice that the defaults are Pearson and two-tailed (our assumption is that there is no relationship), and that "Flag significant correlations" is checked, meaning SPSS will show whether or not the correlation between the two variables is significant.

Step 3. Transfer the two groups of variables, the pre-test and the post-test into the
variables box with the use of an arrow. If you want to have other test statistics to be used such as
mean and others, you may click the “option” button. Another window will appear.

Step 4. Once the window appears, you can check "Means and standard deviations" as part of the analysis of your data, then click "Continue" to go back to the first window. The next thing to do is simply click "OK". When you click the "OK" button, the output view will be displayed.


With this output, we are now ready to analyze and interpret the result. As you can see at the bottom of the correlation table, the coefficient is marked with two asterisks (**), meaning that the pre-test and post-test are statistically related to each other. We can conclude: since the p-value of 0.000 is less than the 0.01 level of significance for the r-value of 0.916, we reject the null hypothesis in favor of the alternative hypothesis, "There is a significant relationship between the pre-test and the post-test," and there is a high positive correlation since r = 0.916. So we can say that the higher the pre-test, the higher the post-test.

Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
Pearson Product Moment of Correlation

Consider the problem below:

A study was made by a manager of a certain Computer Company to determine the
relationship between the weekly sales and the promotional expenditures. The following
data were recorded:

Promotional Expenses     Weekly Sales
(in thousand pesos)      (in thousand pesos)
2.25                     3.45
1.75                     2.75
2.05                     3.05
3.02                     4.00
2.09                     3.25
3.79                     4.09
6.23                     7.23
3.45                     4.45
1.78                     2.89

Find the value of r and interpret the result at the 0.05 level of significance, both by
manual computation using the test statistic formula and with the use of SPSS.

Critical Values of Pearson Product-Moment of Correlation Coefficient
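
The critical-values table referred to above (presented as an image in the original module) can also be generated programmatically. A minimal Python sketch, assuming a two-tailed test at the 0.05 level of significance:

    from scipy import stats

    alpha = 0.05  # two-tailed level of significance
    print("df   critical r")
    for df in range(1, 21):
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        r_crit = t_crit / (t_crit ** 2 + df) ** 0.5
        print(f"{df:>2}   {r_crit:.3f}")   # e.g. df = 8 gives about 0.632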

E. The Simple Linear Regression

Simple linear regression is an extension of the Pearson product-moment correlation.


In simple linear regression, we predict scores on one variable from the scores on a second
variable. The variable we are predicting is called the criterion variable and is referred to as Y.
The variable we are basing our predictions on is called the predictor variable and is referred
to as X. When there is only one predictor variable, the prediction method is called simple
regression. In simple linear regression, the topic of this section, the predictions of Y when
plotted as a function of X form a straight line.

Like other parametric tests, it must also meet some conditions: the data should be
normally distributed, and the level of measurement should be expressed in interval or
ratio data.

Linear regression analysis consists of more than just fitting a straight line through a
cloud of data points. It consists of three stages: (1) analyzing the correlation and directionality
of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and
usefulness of the model.

The three major uses for regression analysis are:


(1) causal analysis,
(2) forecasting an effect, and
(3) trend forecasting.

In regression analysis, we are going to consider two variables. If two variables are
correlated, that is if the correlation coefficient (r) is significant, then it is possible to predict
or estimate the value of one variable from the knowledge of the other variable.

Suppose the advertising cost (x) and sales (y) are correlated, then we can predict the
future sales (y) in terms of advertising cost (x).

The formula for the simple linear regression analysis is:

𝑦 = 𝑎 + 𝑏𝑥

Where:
x is the independent or predictor variable

y is the dependent or criterion (predicted) variable

b is the slope of the line

a is the constant value (the intercept)

Note that if the slope is positive, it means that as x increases, y also increases,
while if the slope is negative, it means that as x increases, y decreases. In
addition, if the slope is zero (0), y becomes constant, i.e., y = a.

To solve for a and b, we have:

a = (∑y − b∑x) / n

b = [n(∑xy) − ∑x∑y] / [n∑x² − (∑x)²]
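
These two formulas translate directly into code. The short Python sketch below (an assumption of language; the module itself uses hand computation and SPSS) computes b and a from paired x and y lists using the same summation formulas; the sample data here are placeholders for illustration only.

    def fit_simple_linear_regression(x, y):
        """Return (a, b) for y = a + b*x using the summation formulas above."""
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        sum_x2 = sum(xi ** 2 for xi in x)

        b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
        a = (sum_y - b * sum_x) / n
        return a, b

    # Hypothetical illustration
    x = [1, 2, 3, 4, 5]
    y = [2.1, 4.0, 6.2, 7.9, 10.1]
    a, b = fit_simple_linear_regression(x, y)
    print(f"y = {a:.2f} + {b:.2f}x")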

Example: Using our previous example under the Pearson product-moment correlation, i.e., the
pre-test and post-test in Statistics and Probability of the ten students of CS 3201: if the
pre-test score of a certain student is 79, what would be his possible score on the post-test?

Computation:

Here x = 79

Solving for b first, we have:

b = [n(∑xy) − ∑x∑y] / [n∑x² − (∑x)²]
  = [(10)(62259) − (755)(810)] / [(10)(58181) − (755)²]
  = (622590 − 611550) / (581810 − 570025)
  = 11040 / 11785
  = 0.94

Solving for a, we get:

a = (∑y − b∑x) / n
  = [810 − (0.94)(755)] / 10
  = (810 − 709.7) / 10
  = 100.3 / 10
  = 10.03

Using the linear regression formula and solving for y, we get:

y = a + bx = 10.03 + (0.94)(79) = 10.03 + 74.26 = 84.29

y = 84.29, or 84 when rounded off to a whole number

Analysis and Conclusion:

The computed value of b = 0.94 indicates that the slope of the line is positive. We
could say that as the pre-test score increases, the post-test score also increases. Based
on the values of a and b, if the pre-test score of a certain student is 79, his possible
post-test score would be 84.
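
The manual computation can be checked with a few lines of Python using the summary values quoted above (n = 10, ∑x = 755, ∑y = 810, ∑xy = 62259, ∑x² = 58181); only the choice of language is an assumption here, the numbers come from the example itself.

    # Summary values from the pre-test/post-test example
    n      = 10
    sum_x  = 755      # sum of pre-test scores
    sum_y  = 810      # sum of post-test scores
    sum_xy = 62259    # sum of x*y products
    sum_x2 = 58181    # sum of squared pre-test scores

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    print(f"b = {b:.4f}")   # 0.9368, which rounds to the 0.94 used in the text
    print(f"a = {a:.4f}")   # about 10.27; the text obtains 10.03 because it rounds b to 0.94 first

    x = 79                  # pre-test score of the student
    y_hat = a + b * x
    print(f"predicted post-test = {y_hat:.2f}")   # about 84.28, still 84 when rounded, as in the manual solution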

Simple Linear Regression using SPSS

Once you are familiar with the use of SPSS, running the test statistics for simple linear
regression will be easy for you. Just like the other test statistics previously presented, look for
the “Analyze” menu on the menu bar, where you will see the different statistical procedures
available. Click “Analyze” and go down the drop-down menu to find the target statistical tool.
Of course, to do this, you first need to have data in the data view of SPSS.

Let us use the data from the illustrative example under the Pearson product-moment
correlation and follow the steps in doing this statistical analysis, assuming that the data are
already encoded in SPSS.

Step 1. On the menu bar, look for “Analyze” and click it. Look for “Regression”
and proceed to “Linear”. Once you click “Linear”, another window will appear.


Step 2. You have to identify the dependent and the independent variables. In this example, we
place the independent variable (the pre-test) in the “Independent(s)” box and the dependent
variable (the post-test) in the “Dependent” box. Highlight each of these and click the arrow.

Step 3. If you want additional information such as descriptive statistics or a scatter plot
diagram, you can request these details by clicking the appropriate button. If you click
“Statistics”, you will see another dialog box.


Here, the defaults are “Estimates” and “Model fit”. The important details that we need for the
analysis are the “Confidence intervals” (we normally use the 95% level of confidence) and
“Descriptives”. The other options are used mainly for multiple regression analysis. Then
click “Continue”.

Once you click the “Continue” button, you will return to the first dialog box. Then click “OK”,
and the output view needed for our analysis of the data will appear.


In the output view, you will see the “Descriptive Statistics” table as well as the “Correlations”
table showing the Pearson correlation. The mean, standard deviation, Pearson correlation value,
and p-value are the important details needed in the analysis of our data.
Other important outputs needed in analyzing the data are as follows, and you can also see these
in the output view of SPSS.


Note: Listen to how your instructor analyzes the data using the important numerical values
from the output. The key numerical values your instructor will use are the following (a short
Python sketch that produces analogous values is given after the list):

1. On the “Model Summary” table

>R
> R-square
> Standard error of estimate

2. On the ANOVA table


> Sum of Square
> degree of freedom (df)
> p-value (sig)

3. On the “Coefficients” table.

> Unstandardized Coefficient B (Pre-test)
> Standardized Coefficient Beta (Pre-test)
> p-value (Pre-test); if you want to determine the t-value, you can also read it from this row.
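
As a rough counterpart to these SPSS tables, the Python sketch below (an assumption; the module itself only demonstrates SPSS) fits the same kind of model with statsmodels and prints R, R-square, the standard error of the estimate, the ANOVA F-test, and the coefficients with their p-values. The score lists are hypothetical placeholders, not the actual class data.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical placeholder data; replace with the actual pre-test/post-test scores
    pre_test  = np.array([70, 72, 75, 78, 74, 80, 82, 77, 73, 74])
    post_test = np.array([75, 78, 80, 85, 79, 86, 88, 83, 77, 79])

    X = sm.add_constant(pre_test)        # adds the intercept term (the constant a)
    model = sm.OLS(post_test, X).fit()

    print("R                    :", np.sqrt(model.rsquared))
    print("R-square             :", model.rsquared)
    print("Std error of estimate:", np.sqrt(model.mse_resid))
    print("ANOVA F, p-value     :", model.fvalue, model.f_pvalue)
    print(model.params)    # unstandardized coefficients: constant and slope (B)
    print(model.pvalues)   # p-value for each coefficient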

Another important output that you could see is a graph.


Another way to create a graph with SPSS is to use the “Graphs” menu, as you learned in our
previous lesson. On the menu bar, look for “Graphs”, proceed to “Legacy Dialogs”, then to
“Scatter/Dot”, and follow the sequence of dialog boxes.


If you click the “Continue” button, the resulting graph will appear.
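
Outside SPSS, a comparable scatter plot with a fitted regression line can be drawn with matplotlib. This is only a sketch with hypothetical placeholder data, not the class scores.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical placeholder data; replace with the actual pre-test/post-test scores
    x = np.array([70, 72, 75, 78, 74, 80, 82, 77, 73, 74])   # pre-test
    y = np.array([75, 78, 80, 85, 79, 86, 88, 83, 77, 79])   # post-test

    b, a = np.polyfit(x, y, 1)        # slope and intercept of the least-squares line

    plt.scatter(x, y, label="observed scores")
    plt.plot(x, a + b * x, label=f"y = {a:.2f} + {b:.2f}x")
    plt.xlabel("Pre-test")
    plt.ylabel("Post-test")
    plt.legend()
    plt.show()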


Name: _________________________ Date:___________

Section:________________________ Score:__________

Practice Exercises
Simple Linear Regression

Consider the problem below:

A study was made by a manager of a certain Computer Company to determine the
relationship between the weekly sales and the promotional expenditures. The following
data were recorded:

Promotional Expenses     Weekly Sales
(in thousand pesos)      (in thousand pesos)
2.25                     3.45
1.75                     2.75
2.05                     3.05
3.02                     4.00
2.09                     3.25
3.79                     4.09
6.23                     7.23
3.45                     4.45
1.78                     2.89

Plot the linear regression and find the possible weekly sales if the promotional
expenses are 4.80 (thousand pesos), using both manual computation and SPSS.
