Data Management - Part 3

The document discusses correlation analysis and linear regression. It defines correlation analysis and explains how to construct scatter plots, compute the Pearson correlation coefficient r, and interpret the strength and direction of correlations. Examples are provided to demonstrate how to determine if a relationship exists between two variables and describe it.

Uploaded by

fjkb7yqn5b

Data Management

(Part 3)
Mathematics in the Modern World

Mathematics Area
De La Salle Lipa
Contents
1) Correlation Analysis
• Constructing Scatter Plot
• Describing Relationships Using the Pearson Product-Moment Correlation Coefficient
• Testing the Significance of the Pearson Product-Moment Correlation Coefficient r
2) Linear Regression
Correlation Analysis
Correlation analysis is the process of describing the relationship between two variables.

The correlation coefficient is the statistic used to report whether two variables are correlated and, if so, to what extent and in which direction.

The correlation coefficient ranges from -1 to +1, where 0 means no correlation, +1 means perfect positive correlation, and -1 means perfect negative correlation.
• The relationship between two variables can be described by constructing a scatter plot.

• A scatter plot is a graphical way of showing the correlation between two variables. The horizontal axis represents one variable and the vertical axis represents the other variable.
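
As a rough illustration of how a scatter plot is built, the sketch below draws one on a character grid using only the Python standard library. The function name `text_scatter` and the employee and coffee figures are hypothetical and are not the data from the slides; a plotting library would normally be used instead.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a rough character-grid scatter plot of the (x, y) pairs."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when all values on an axis are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Horizontal position comes from the first variable.
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger y means a smaller row index.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical data (NOT from the slides): employees per branch and cups served.
employees = [5, 8, 10, 12, 15, 20]
cups = [12, 20, 22, 30, 34, 45]
plot = text_scatter(employees, cups)
print(plot)
```

Each asterisk is one (x, y) pair. With these made-up numbers the marks climb from the lower left to the upper right.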
Constructing a Scatter Plot
EXAMPLE
A company with six branches provides free coffee to its employees. A manager wants to find out whether there is a relationship between the number of cups of coffee provided and the number of employees in the offices. The table shows the data needed. Determine if there is a relationship between the number of employees and the number of cups of coffee.

Notice that the points on the scatter plot do not lie on one line. However, the points closely follow a straight line. This line is called a trend line.

The relationship between two variables is described in terms of strength and direction.
Types of Correlation according to
DIRECTION
In terms of direction, the relationship between two variables may be
positive, negative, or zero.
Positive Correlation
• A positive correlation exists if high values in one
variable are associated with high values in another
variable. Similarly, low values in one variable are
associated with low values in the other variable.
• If a positive correlation exists, then the points on
the scatter plot closely follow a straight line
slanting up to the right.
Negative Correlation
• A negative correlation exists if high values in one
variable are associated with low values in another
variable. Similarly, low values in one variable are
associated with high values in the other variable.
• If a negative correlation exists, then the points on
the scatter plot closely follow a straight line slanting
down to the right.
Zero/No Correlation
• A zero/no correlation exists when high values in one variable are associated with either high or low values in another variable.
• If a zero/no correlation exists, then the points on the scatter plot are randomly scattered. The points do not closely follow a straight line.
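
The direction can also be checked numerically instead of by eye. The sketch below is one possible approach, not taken from the slides: it looks at the sign of n∑xy − ∑x∑y, which is the numerator of the usual Pearson formula and therefore has the same sign as r. The function name `direction` and the sample lists are hypothetical.

```python
def direction(xs, ys):
    """Classify the direction of a linear relationship from paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # This quantity has the same sign as the correlation coefficient r.
    signed = n * sum_xy - sum_x * sum_y
    if signed > 0:
        return "positive"
    if signed < 0:
        return "negative"
    return "zero"


# Hypothetical data, chosen only to show each case.
print(direction([1, 2, 3, 4], [2, 4, 5, 9]))   # high x goes with high y
print(direction([1, 2, 3, 4], [9, 5, 4, 2]))   # high x goes with low y
print(direction([1, 2, 3], [1, 0, 1]))         # no linear trend
```

Real data almost never give exactly zero, so in practice a value of r close to zero is read as negligible rather than tested for equality.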
Types of Correlation according to STRENGTH

A perfect correlation exists when all the points on the scatter plot lie on a straight line. When the points on the scatter plot do not lie on a straight line, the relationship may be very high, high, moderately high, low, negligible or zero.
Try This! ☺ (Practice Exercise 1)
The following data show the attitude scores and Mathematics achievement of a group of students.
a. Construct a scatter plot for the given data.
b. Describe the relationship between attitude and achievement in Mathematics in terms of direction and strength based on the scatter plot.

ATTITUDE SCORE (X)    ACHIEVEMENT IN MATHEMATICS (Y)
48                    22
48                    19
47                    20
46                    20
46                    17
43                    21
42                    21
42                    19
41                    17
Describing Relationships Using the
Pearson Product-Moment Correlation Coefficient
• The scatter plot is not accurate enough to describe the strength and direction of the relationship between two variables. A more analytical and more accurate approach is to compute the correlation coefficient (r).

• r is a number between -1 and 1 that describes both the strength and the direction of correlation.
• If the value of r is 1, 0, or -1, we interpret it as follows: r = 1 means perfect positive correlation, r = 0 means zero (no) correlation, and r = -1 means perfect negative correlation.

• The following scale is used to interpret the other values of r.

The value of the correlation coefficient is usually expressed to 4 decimal places, but for the sake of interpretation, we round it off to 2 decimal places.
Example:
Value of r    Interpretation
-0.45         Moderate negative correlation
0.66          Moderately high positive correlation
0.83          High positive correlation
-0.35         Low negative correlation
0.58          Moderate positive correlation
Correlation Analysis
Pearson Product-Moment Correlation Coefficient (r)
• The most common correlation measure
• Used for continuous data and when the scores are normally distributed
• To compute the PPMC coefficient manually, we use the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where:
x = value of variable x
y = value of variable y
n = number of sample points/observations
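
The manual computation can be sketched in Python as follows. This is a minimal illustration that assumes the standard computational form of the Pearson formula (sums of x, y, xy, x², y²). The function name `pearson_r` and the test lists are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from paired observations, using the sums-based formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when one variable never changes; r is undefined then.
        raise ValueError("r is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical data lying exactly on a line, so r is +1 or -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The guard on the denominator matters because a constant variable makes the formula divide by zero.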
Pearson Product-Moment Correlation Coefficient
Example
A store manager wishes to find out whether there is a relationship between the age of the employees and the number of sick days they incur each year. Calculate the correlation coefficient (r) and describe the relationship in terms of strength and direction.

Solution:

Employee    x      y     xy      x^2      y^2
A           18     16    288     324      256
B           26     12    312     676      144
C           39     9     351     1521     81
D           48     5     240     2304     25
E           53     6     318     2809     36
F           58     2     116     3364     4
Sum         242    50    1625    10998    546

We get the sums by adding up the numbers in each column, and we use these sums in the formula later on.

Substituting the column sums into the formula, with n = 6:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{6(1625) - (242)(50)}{\sqrt{\left[6(10998) - 242^2\right]\left[6(546) - 50^2\right]}} = \frac{-2350}{\sqrt{(7424)(776)}} \approx -0.9791 \approx -0.98$$

The value rounded to 2 decimal places (-0.98) is used only for interpretation.

Hence, there is high negative correlation between the age of employees and the number of sick days they incur each year.
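
The column sums and the value of r for this example can be double-checked with a short script. This is only a verification sketch using the ages and sick days from the table; the variable names are illustrative.

```python
from math import sqrt

# Data from the worked example: age (x) and sick days per year (y).
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(sick_days)
sum_xy = sum(x * y for x, y in zip(ages, sick_days))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in sick_days)

# Same sums-based Pearson formula used in the manual solution.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))  # 4 decimal places for reporting
print(round(r, 2))  # 2 decimal places for interpretation
```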
How to solve for r-value using calculator?
❑ For Casio Fx 570ms
STEPS:
1. Reset (Shift – Mode – All - =)
2. Mode (1 or 2 times) – Reg – Lin
3. Encode the data in pairs separated by a comma, then press M+ (ex. 18, 16 – M+)
4. Find the value of r (Shift – SVAR – r - =)
Please watch the video on Pearson's Correlation Coefficient Using Calculator Casio Fx 570ms on this link: https://www.youtube.com/watch?v=YXHPuEmouzM

❑ For Casio fx-991Es PLUS
STEPS:
1. Reset (Shift – 9 – All - Yes)
2. Set up Linear Regression Mode (Mode – Stat – A+Bx)
3. Encode the data as they appear in the given table.
4. Find the value of r (Shift – Stat – r -)
Please watch the video on how to compute the Pearson r coefficient using Casio fx 991ES Plus on this link: https://www.youtube.com/watch?v=Zrxz7jgFVos
Try This! ☺ (Practice Exercise 2)
Describe the strength and direction of the correlation of the following pairs of variables. Compute the value of the correlation coefficient (r).

1. A language teacher is interested in finding out whether students who are good in English are also good in Filipino. The following sample data have been obtained. Determine if there is a relationship between performances in English and in Filipino.

Score in English (X)    Score in Filipino (Y)
16                      4
14                      6
10                      4
9                       8
8                       7
8                       8
7                       10
6                       9
4                       14
2                       12

2. A mathematics teacher would like to find out if the students' attitude toward mathematics is related to their achievement in the subject. The following sample data have been obtained. Determine if there is a relationship between the two variables.

Attitude Score (X)    Achievement in Math (Y)
48                    22
48                    19
47                    20
46                    20
46                    17
43                    21
42                    21
42                    19
41                    17
Testing the Significance
• The correlation coefficient tells us the strength and direction of the relationship between two variables based on the sample data.
• However, we are not sure whether the relationship really exists in the population from which the sample was obtained.
• The relationship is only known to be true for the sample used.
• If the correlation is significant, then we can conclude that the relationship really exists in the population.
• It is possible that even a very high correlation is due to chance alone, so there is a need to test the significance of the correlation coefficient.
1. Compute the correlation coefficient (r).
2. Test the significance of the correlation coefficient, following the steps in hypothesis testing.
The existence of correlation between two variables can be ascertained by testing the significance using the t-test. The test statistic for testing the significance of r is given by:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

where r = correlation coefficient
n = sample size
df = n - 2
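
A minimal sketch of this test statistic in Python, assuming the standard form t = r·sqrt((n − 2)/(1 − r²)). The function name `t_statistic` is hypothetical. The sample call uses r = 0.1042 and n = 7, the values from the soft drink example in this document.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing whether a Pearson r differs from zero (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation (|r| = 1)")
    return r * sqrt((n - 2) / (1 - r * r))


# r and n taken from the soft drink example; df = 7 - 2 = 5.
t = t_statistic(0.1042, 7)
print(round(t, 4))
```

The computed value is then compared with the critical value from the t-table at df = n - 2.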
Testing the Significance
Example
A soft drink distributor is interested in finding out whether the number of cases of soft drinks ordered is related to the travel time for their delivery. The following data have been obtained from past experience. Test the significance of the correlation coefficient at the 0.05 level of significance.

No. of Cases of Soft Drinks (X)    Travel Time in Minutes (Y)
24                                 21
6                                  3
16                                 6
64                                 15
10                                 21
25                                 61
35                                 20

1. Compute r.
r = 0.1042
For the sample data, there is a negligible correlation between the number of cases of soft drinks ordered and the travel time for their delivery.
2. Testing the Significance (Use the 5 steps in Hypothesis Testing)
Step 1) Ho: There is no significant relationship between the number of cases of soft drinks ordered and the travel time for their delivery. (r = 0)
Ha: There is a significant relationship between the number of cases of soft drinks ordered and the travel time for their delivery. (r ≠ 0)
Step 2) Use the t-test to test the significance of r. Identify the critical values (CV).
Level of significance = 0.05; 2-tailed
df = n – 2 = 7 – 2 = 5
Tabular values or CV = ±2.571
Note that since we are using the t-test to test the significance of r, we use the t-table to identify the critical values.
Step 3) Compute the test value.

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.1042\sqrt{\frac{7-2}{1-(0.1042)^2}} \approx 0.2343$$
Step 4) Make a decision whether to accept or reject the null hypothesis.
⮚ If |t computed| > CV, reject Ho and accept Ha.
Since the absolute value of the computed t (0.2343) is less than the absolute value of the tabular or critical value (2.571), accept the null hypothesis.
Step 5) Make a conclusion.
There is no significant relationship between the number of cases ordered and the travel time for their delivery.
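
The whole example can be replayed in a few lines of Python as a check. This sketch rounds r to 4 decimal places before computing t, which matches the reported 0.2343; carrying the unrounded r through gives a value that differs slightly in the fourth decimal place. The critical value 2.571 is copied from the document's t-table lookup, not computed.

```python
from math import sqrt

# Data from the example: cases ordered (x) and travel time in minutes (y).
cases = [24, 6, 16, 64, 10, 25, 35]
minutes = [21, 3, 6, 15, 21, 61, 20]

n = len(cases)
sum_x, sum_y = sum(cases), sum(minutes)
sum_xy = sum(x * y for x, y in zip(cases, minutes))
sum_x2 = sum(x * x for x in cases)
sum_y2 = sum(y * y for y in minutes)

# Part 1: compute r with the sums-based Pearson formula, reported to 4 places.
raw_r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = round(raw_r, 4)

# Part 2: test the significance with t = r * sqrt((n - 2) / (1 - r^2)).
t = r * sqrt((n - 2) / (1 - r ** 2))
critical_value = 2.571  # from the t-table: df = 5, two-tailed, alpha = 0.05
reject_null = abs(t) > critical_value

print(r, round(t, 4), reject_null)
```

Because `reject_null` is false, the script reaches the same decision as Step 4.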
Regression Analysis
• It is designed to help us determine the probability that our inferences are sound.
• It helps us test the degree to which the dependent variable is affected by the independent variable.
• If there is no significant linear correlation, do not use the regression equation to make predictions.
Regression Equation
It is an error-free equation used to predict the value of y. It is an equation for perfect correlations.
Formula: y = a + bx (exact relationship)
a – the value of the mean of Y when X = 0, hence the name intercept
b – the amount of change in the mean of Y (positive or negative, depending on the sign) for every unit increase in the value of X, hence the name slope
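
The slides do not show how a and b are obtained from data. The sketch below assumes the usual least-squares estimates, b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) and a = (∑y − b∑x) / n, and applies them to the age and sick-day data from the earlier example purely as an illustration. The function name `fit_line` is hypothetical, and the significance of that correlation was not tested in the document.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / spread_x
    a = (sum_y - b * sum_x) / n
    return a, b


# Age (x) and sick days per year (y) from the store manager example.
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]

a, b = fit_line(ages, sick_days)
print(round(a, 4), round(b, 4))

# Predicted sick days for a hypothetical 40-year-old employee.
print(round(a + b * 40, 2))
```

The negative slope agrees with the negative correlation found earlier: each additional year of age is associated with fewer predicted sick days.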
Try This! ☺ (Practice Exercise 3)
The following table shows the assessed values and the selling prices of eight
houses, constituting a random sample of all houses sold recently in a Metropolitan
Area. Compute the correlation coefficient of these two variables then test its
significance at 0.05. Construct a linear regression equation and draw a conclusion
based on the results.
Try This! ☺ (Practice Exercise 4)
The following table shows data that describe the test scores of students
in Mathematics in relation to their test scores in Physics. Compute the
correlation coefficient of these two variables then test its significance at 0.01.
Construct a linear regression equation and draw a conclusion based on the
results.
