Topic 7.1_Correlation and Simple Linear Regression

The document discusses correlation analysis, simple linear regression, and model building in psychology, emphasizing the relationship between variables. It explains the coefficient of correlation (r), its interpretation, and how to use least squares to determine a regression equation. An example involving sales calls and copier sales illustrates the concepts and calculations involved in establishing a predictive model.

Uploaded by

tc458gxq6p
Copyright
© © All Rights Reserved

PSY2032 Statistical Methods in Psychology I

Topic 7

Correlation,
Simple Linear Regression, and
Model Building

[7.1] What is Correlation Analysis?

[7.2] The Coefficient of Correlation (r)

[7.3] Simple Linear Regression


7.3.1 Least Squares Principle
7.3.2 Drawing the Line of Regression
7.1 What is Correlation Analysis?
 Example: Is there a relationship between the number of hours a student
studies for an exam and the score earned?

 Correlation Analysis is the study of the relationship between variables. It is a
group of techniques to measure the association between two variables.

 The basic idea of correlation analysis is to report the association between two
variables. The usual first step is to plot the data in a scatter diagram.

Example
Copier Sales of America sells copiers to businesses of all sizes throughout the United
States and Canada. Ms. Marcy Bancer was recently promoted to the position of
national sales manager. At the upcoming sales meeting, the sales representatives
from all over the country will be in attendance. She would like to impress upon
them the importance of making that extra sales call each day. She decides to gather
some information on the relationship between the number of sales calls and the
number of copiers sold.
She selected a random sample of 10 sales representatives and determined the
number of sales calls they made last month and the number of copiers they sold.
The sales information is reported in the following table.
What observations can you make about the relationship between the number of sales
calls and the number of copiers sold? Develop a scatter diagram to display the
information.
Sales Representative    Number of Sales Calls (X)    Number of Copiers Sold (Y)
Tom Keller                         20                            30
Jeff Hall                          40                            60
Brian Virost                       20                            40
Greg Fish                          30                            60
Susan Welch                        10                            30
Carlos Ramirez                     10                            40
Rich Niles                         20                            40
Mike Kiel                          20                            50
Mark Reynolds                      20                            30
Soni Jones                         30                            70
 Based on the information in the table, Ms. Bancer suspects there is a relationship
between the number of sales calls made in a month and the number of copiers
sold.
 Soni Jones sold the most copiers last month, and she was one of three
representatives making 30 or more sales calls.
 Susan Welch and Carlos Ramirez made only 10 calls last month. Ms. Welch
had the lowest number of copiers sold among the sampled representatives
(30, tied with Tom Keller and Mark Reynolds).
 The implication is that the number of copiers sold is related to the number of sales
calls made. As the number of sales calls increases, it appears the number of
copiers sold also increases.
 Common Practice to draw the scatter diagram
 Independent variable (number of sales calls) – Horizontal or X-axis
 Dependent variable (copiers sold) - Vertical or Y-axis
Independent variable – the variable that provides the basis for estimation.
It is the predictor variable = the number of sales calls
Dependent variable – the variable that is being predicted or estimated
= the number of copiers sold
 The scatter diagram shows graphically that the sales representatives who make
more calls tend to sell more copiers.
 Note that while there appears to be a positive relationship between the two
variables, not all the points fall on a line.
 In the following section you will measure the strength and direction of this
relationship between two variables by determining the coefficient of correlation.
7.2 The Coefficient of Correlation (r)
 Coefficient of Correlation – describes the strength of the relationship between
two sets of interval-scaled or ratio-scaled variables.
ρ  Population coefficient of correlation
r  Sample coefficient of correlation
 -1.00 ≤ r ≤ +1.00
 A correlation coefficient of -1 or +1 indicates perfect correlation.
 If there is absolutely no linear relationship between the two sets of variables,
Pearson's r is zero.
 A coefficient of correlation r close to 0 (say 0.08) shows that the linear
relationship is quite weak. The same conclusion is drawn if r = -0.08.
 Coefficients of -0.91 and +0.91 have equal strength; both indicate very strong
correlation between the two variables. Thus the strength of the correlation does
not depend on the direction (either + or -).

In Excel: =CORREL(x, y)
 Coefficient of Correlation (r) – A measure of the strength of the linear
relationship between two variables.
 The sample coefficient of correlation is identified by the lower-case letter (r).
 It shows the direction and strength of the linear (straight line) relationship
between two variables.
Sales Representative   Calls (X)   Sales (Y)   X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)
Tom Keller                20          30         -2      -15          30
Jeff Hall                 40          60         18       15         270
Brian Virost              20          40         -2       -5          10
Greg Fish                 30          60          8       15         120
Susan Welch               10          30        -12      -15         180
Carlos Ramirez            10          40        -12       -5          60
Rich Niles                20          40         -2       -5          10
Mike Kiel                 20          50         -2        5         -10
Mark Reynolds             20          30         -2      -15          30
Soni Jones                30          70          8       25         200
X̄ = 22    Ȳ = 45    ∑(X − X̄)(Y − Ȳ) = 900

$$
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, S_x S_y} = \frac{900}{(10 - 1)(9.189)(14.337)} = 0.759
$$
 The coefficient is positive, which confirms our reasoning based on the scatter diagram, and it is
fairly close to 1, so the association is strong.
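The calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper names `sample_sd` and `pearson_r` are illustrative and do not come from the slides.

```python
from math import sqrt

# Copier Sales of America sample: sales calls (X) and copiers sold (Y)
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def pearson_r(x, y):
    """Sample correlation: sum of cross-deviations over (n - 1) * Sx * Sy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_sd(x) * sample_sd(y))

r = pearson_r(calls, copiers)
print(round(sample_sd(calls), 3), round(sample_sd(copiers), 3), round(r, 3))
# prints: 9.189 14.337 0.759
```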
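Positive Correlation
[Figure: scatter diagram divided into four quadrants by lines through the mean point (X̄, Ȳ):
quadrant 1 upper left, quadrant 2 upper right, quadrant 3 lower left, quadrant 4 lower right.
Each point (x, y) has deviations (x − X̄, y − Ȳ).
Source: Clare Morris, Quantitative Approaches in Business Studies, 6/e, © Clare Morris 2003]

In quadrant [2], both (x − X̄) and (Y − Ȳ) are positive, so their product is positive (+ × + = +),
while in quadrant [3] both are negative, so their product is again positive (− × − = +).
With positive correlation most points fall in these two quadrants, so the products
(x − X̄)(Y − Ȳ) will nearly all be positive, as will the sum ∑(x − X̄)(Y − Ȳ) over all the points.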
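To see the quadrant argument at work on the copier data, the following sketch counts how many cross-products are positive and how many are negative. It assumes the same ten observations as the earlier table; the variable names are illustrative only.

```python
# Copier Sales of America sample: sales calls (X) and copiers sold (Y)
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

x_bar = sum(calls) / len(calls)      # 22.0
y_bar = sum(copiers) / len(copiers)  # 45.0

# One cross-product (x - x_bar)(y - y_bar) per sales representative
products = [(x - x_bar) * (y - y_bar) for x, y in zip(calls, copiers)]

positive = sum(1 for p in products if p > 0)  # points in quadrants 2 and 3
negative = sum(1 for p in products if p < 0)  # points in quadrants 1 and 4
total = sum(products)

print(positive, negative, total)
# prints: 9 1 900.0
```

Nine of the ten products are positive, so the sum is clearly positive, which matches the positive value of r.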
Negative Correlation
[Figure: four-quadrant scatter diagram showing negative correlation]
In quadrant [1], (x − X̄) is negative and (Y − Ȳ) is positive (− × + = −), while in
quadrant [4], (x − X̄) is positive and (Y − Ȳ) is negative (+ × − = −), so the products
(x − X̄)(Y − Ȳ) and the sum ∑(x − X̄)(Y − Ȳ) over all the points will be negative.

No Linear Relationship – Zero Correlation
[Figure: four-quadrant scatter diagram showing no correlation]
For no correlation, the points are fairly uniformly scattered throughout all four
quadrants, so the products (x − X̄)(Y − Ȳ) will be fairly evenly balanced between
positive and negative. Thus, when we sum them, the positives and negatives will tend
to balance out, so that the total will be close to zero.
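Sample Correlation Coefficient (r)

$$
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, S_x S_y} = \frac{\sum XY - n\bar{X}\bar{Y}}{(n - 1)\, S_x S_y}
$$
Covariance: the quantity ∑(X − X̄)(Y − Ȳ)/(n − 1) is the sample covariance of X and Y.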
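The two numerators are equal. A short derivation, using only the definitions of the means (∑X = nX̄ and ∑Y = nȲ), may help:

$$
\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{Y}\sum X - \bar{X}\sum Y + n\bar{X}\bar{Y} \\
  &= \sum XY - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
  &= \sum XY - n\bar{X}\bar{Y}
\end{aligned}
$$

As a check on the copier data, ∑XY = 10,800 and nX̄Ȳ = 10(22)(45) = 9,900, so the difference is 900, the same value obtained from the deviations table.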

Correlation and Causation


 If there is a strong relationship between two variables, we are tempted to assume
that an increase or decrease in one variable causes a change in the other variable.
 However, a strong correlation does not by itself imply causality. Strong correlations
with no causal link are called spurious correlations.
 What we can conclude when we find two variables with a strong correlation is
that there is a relationship or association between the two variables, not that a
change in one causes a change in the other.

Pearson Correlation – SPSS


https://www.youtube.com/watch?v=VOI5IlHfZVE
Exercise
Employee   Annual Salary ($000) (Y)   Years of Experience (X1)   Years of Postsecondary Education (X2)
1                   54.9                        5.5                            4.0
2                   60.5                        9.0                            6.0
3                   58.9                        6.0                            5.0
4                   59.0                        8.0                            5.5
5                   57.5                        6.5                            5.0

Which factor (X1 or X2) has a higher correlation with Annual Salary (Y)?

Correlation matrix:
        Y           X1          X2
Y       1
X1      0.813164    1
X2      0.962216    0.924995    1
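The correlation matrix for the exercise can be checked with a few lines of Python. This sketch uses the algebraically equivalent form r = Sxy / sqrt(Sxx · Syy), in which the (n − 1) factors cancel; the function and variable names are illustrative, not part of the slides.

```python
from math import sqrt

salary = [54.9, 60.5, 58.9, 59.0, 57.5]   # Y, annual salary ($000)
experience = [5.5, 9.0, 6.0, 8.0, 6.5]    # X1, years of experience
education = [4.0, 6.0, 5.0, 5.5, 5.0]     # X2, years of postsecondary education

def pearson_r(x, y):
    """Sample correlation via sums of squared and cross deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r_exp = pearson_r(salary, experience)
r_edu = pearson_r(salary, education)
r_x1x2 = pearson_r(experience, education)

print(f"r(Y, X1) = {r_exp:.4f}")
print(f"r(Y, X2) = {r_edu:.4f}")
print(f"r(X1, X2) = {r_x1x2:.4f}")
```

The values agree with the matrix above, so years of postsecondary education (X2) has the higher correlation with annual salary.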
7.3 Simple Linear Regression
 In this section we wish to develop an equation to express the linear relationship
between two variables.
 The technique used to develop the equation and provide the estimates is called
regression analysis.
 Regression Equation – An equation that expresses the linear relationship between
two variables.

7.3.1 Least Squares Principle


 The scatter diagram is reproduced with a line drawn with a ruler through the dots
to illustrate that a straight line would probably fit the data.
 Disadvantage: the line's position is based in part on the judgment of the
person drawing it.
 All the lines except line A seem to be reasonable.
 Judgment is eliminated by determining the regression line using a mathematical
method called the least squares principle. This method gives us the "best-fitting"
line.
 Least Squares Principle – Determining a regression equation by minimizing
the sum of the squares of the vertical distances between the actual Y values and
the predicted values ŷ.

The Least Squares Line


 The point (X = 3, Y = 8) deviates by 2 from the line, found by 10 − 8, where 10 is
the value on the line at X = 3
 The deviation squared is 4
 The squared deviation for the point X = 4, Y = 18 is 16
 The squared deviation for the point X = 5, Y = 16 is 4
 The sum of the squared deviations is 24, found by
4 + 16 + 4
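The three points above can be used to illustrate the least squares principle. The sketch below fits a line with the standard closed form b = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)², a = ȳ − b·x̄, which is equivalent to b = r·Sy/Sx. The fitted line ŷ = −2 + 4x is not stated in the slides; it is computed here, and it reproduces the deviations listed above. The comparison line is an arbitrary choice for illustration.

```python
points = [(3, 8), (4, 18), (5, 16)]

def sum_squared_deviations(a, b, pts):
    """Sum of squared vertical distances between each y and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Least squares slope and intercept
b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
    (x - x_bar) ** 2 for x, _ in points
)
a = y_bar - b * x_bar

best = sum_squared_deviations(a, b, points)
other = sum_squared_deviations(0.0, 3.5, points)  # an arbitrary alternative line

print(a, b, best, other)
# prints: -2.0 4.0 24.0 24.5
```

Any other line, such as the alternative tried here, gives a larger sum of squared deviations than 24.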
 General form of linear regression equation

$$
\hat{y} = a + bx
$$
Where
 ŷ read Y hat, is the predicted value of the Y variable for a selected X value.

 a is the Y-intercept. It is the estimated value of Y when X = 0.


Another way to put it is: “a” is the estimated value of Y where the regression
line crosses the Y-axis when X is zero.

 b is the slope of the line, or the average change in ŷ for each change of one
unit (either increase or decrease) in the independent variable x.

 x is any value of the independent variable that is selected.


The formulas for a and b are:

 Slope of the regression line:

$$
b = r \frac{S_y}{S_x}
$$
 r is the correlation coefficient
 Sy is the standard deviation of Y (the dependent variable)
 Sx is the standard deviation of X (the independent variable)

 Y intercept:

$$
a = \bar{Y} - b\bar{X}
$$
 Ȳ is the mean of Y (the dependent variable)
 X̄ is the mean of X (the independent variable)
Recall the example involving Copier Sales of America. The sales manager gathered
information on the number of sales calls made and the number of copiers sold for a
random sample of 10 sales representatives. As a part of her presentation at the upcoming
sales meeting, Ms. Bancer, the sales manager, would like to offer specific information
about the relationship between the number of sales calls and the number of copiers sold.

Use the least squares method to determine a linear equation to express the relationship
between the two variables. What is the expected number of copiers sold by a
representative who made 20 calls?

Sales Representative   Calls (X)   Sales (Y)   X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)
Tom Keller                20          30         -2      -15          30
Jeff Hall                 40          60         18       15         270
Brian Virost              20          40         -2       -5          10
Greg Fish                 30          60          8       15         120
Susan Welch               10          30        -12      -15         180
Carlos Ramirez            10          40        -12       -5          60
Rich Niles                20          40         -2       -5          10
Mike Kiel                 20          50         -2        5         -10
Mark Reynolds             20          30         -2      -15          30
Soni Jones                30          70          8       25         200
X̄ = 22    Ȳ = 45    ∑(X − X̄)(Y − Ȳ) = 900
Solution

The calculations necessary to determine the regression equation are:

$$
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, S_x S_y} = \frac{900}{(10 - 1)(9.189)(14.337)} = 0.759
$$

$$
b = r \left( \frac{S_y}{S_x} \right) = 0.759 \left( \frac{14.337}{9.189} \right) = 1.1842
$$

$$
a = \bar{Y} - b\bar{X} = 45 - (1.1842)(22) = 18.9476
$$

 Thus, the regression equation is ŷ = 18.9476 + 1.1842x, and it can be shown on the
scatter diagram.
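The same steps can be sketched in Python with the standard library's `statistics` module, where `stdev` is the sample standard deviation. Because the script carries full precision, the intercept comes out as 18.9474 and not the 18.9476 obtained by hand from the rounded slope 1.1842. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Copier Sales of America sample: sales calls (X) and copiers sold (Y)
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
x_bar, y_bar = mean(calls), mean(copiers)
s_x, s_y = stdev(calls), stdev(copiers)  # sample standard deviations

# Correlation coefficient from the cross-deviations
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(calls, copiers))
r = cross / ((n - 1) * s_x * s_y)

# Slope and intercept of the least squares line
b = r * s_y / s_x
a = y_bar - b * x_bar

print(f"r = {r:.3f}, b = {b:.4f}, a = {a:.4f}")
# prints: r = 0.759, b = 1.1842, a = 18.9474
```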
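7.3.2 Drawing the Line of Regression

[Figure: scatter diagram of sales calls (X) against copiers sold (Y) with the regression line
ŷ = 18.9476 + 1.1842x; the point on the line at X = 20 is ŷ = 42.6316]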
 The a value of 18.9476 is the point where the regression line crosses the Y-axis. A
literal translation is that if no sales calls are made, that is, X = 0, 18.9476 copiers
will be sold.

 The b value of 1.1842 means that for each additional sales call made, the sales
representative can expect to increase the number of copiers sold by about 1.2.

 The regression equation is ŷ = 18.9476 + 1.1842x. If a salesperson makes 20 calls,
he or she can expect to sell ŷ = 18.9476 + 1.1842(20) = 42.6316 copiers.
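Once the coefficients are known, prediction is a one-line function. The sketch below uses the rounded coefficients from the slides; `predict_copiers` is a hypothetical helper name, and the other call counts are chosen only to show how the estimate changes within the range of the sample (10 to 40 calls).

```python
A = 18.9476  # intercept from the worked solution
B = 1.1842   # slope from the worked solution

def predict_copiers(calls):
    """Estimated copiers sold for a given number of sales calls."""
    return A + B * calls

for calls in (10, 20, 30, 40):
    print(calls, round(predict_copiers(calls), 4))
```

For 20 calls this gives 42.6316, matching the hand calculation.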

How to Use SPSS: Simple Linear Regression


https://www.youtube.com/watch?v=xp4Sffz5bbA
