
Regression and Correlation

Regression and Correlation PPT

Uploaded by

Sejo Gonzales

Correlation and Regression Analysis
Correlation

Correlation refers to the departure of two random variables from independence.

The Pearson product-moment correlation (PPMC) is the most widely used statistic for measuring the degree of relationship between linearly related variables.

The correlation coefficient is defined as the covariance of the two variables divided by the product of their standard deviations.
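
Written out, the definition above takes the following form. The symbols cov(X, Y), σ_X, and σ_Y are notation chosen here for the covariance and the two standard deviations; they do not appear on the slides.

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
```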
Pearson product-moment correlation

Pearson’s product-moment correlation coefficient, or simply the correlation coefficient (Pearson’s r), is a measure of the strength of the linear association between two variables.

• Developed by Karl Pearson.

• The value of the correlation coefficient varies between +1 and –1.
Correlation Coefficient

[Figure: six scatter plots of Y Variable against X Variable: Perfect Positive Correlation (r = 1.00), Perfect Negative Correlation (r = -1.00), Positive Correlation (r = 0.80), Negative Correlation (r = -0.80), Zero Correlation (r = 0.00), Non-Linear Correlation (r = 0.00)]

Pearson product-moment correlation

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

Test of Significance

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$

df = N – 2
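
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `pearson_r` and `t_statistic` and the five data points are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the raw-sums (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.4f}, t = {t:.4f}, df = {len(xs) - 2}")
```

Note that `t_statistic` divides by zero when r is exactly ±1, so a perfect correlation would need separate handling.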
Correlation Coefficient & Strength of Relationships

0.00 – no correlation, no relationship
±0.01 to ±0.20 – very slight correlation, almost negligible relationship
±0.21 to ±0.40 – slight correlation, definite but small relationship
±0.41 to ±0.70 – moderate correlation, substantial relationship
±0.71 to ±0.90 – high correlation, marked relationship
±0.91 to ±0.99 – very high correlation, very dependable relationship
±1.00 – perfect correlation, perfect relationship
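
The table above can be turned into a small lookup. The sketch below is only one possible encoding: the function name `describe_strength` is hypothetical, and rounding r to two decimal places before looking it up is an assumption made because the table's ranges are stated to two decimals.

```python
def describe_strength(r):
    """Map a correlation coefficient to the descriptive label in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # Assumption: round to two decimals, matching the table's precision.
    size = round(abs(r), 2)
    if size == 0.00:
        return "no correlation"
    if size <= 0.20:
        return "very slight correlation"
    if size <= 0.40:
        return "slight correlation"
    if size <= 0.70:
        return "moderate correlation"
    if size <= 0.90:
        return "high correlation"
    if size <= 0.99:
        return "very high correlation"
    return "perfect correlation"

print(describe_strength(0.93))
print(describe_strength(-0.80))
```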


Assumptions

Subjects are randomly selected and independently assigned to groups.

Both populations are normally distributed.


Procedure for the Pearson Product-Moment Correlation Test

• Set up the hypotheses.
  H0: ρ = 0 (The correlation in the population is zero.)
  H1: ρ ≠ 0, ρ > 0, or ρ < 0 (The correlation in the population is different from zero.)

• Calculate the value of Pearson’s r.

• Calculate the t value.

• Make the statistical decision for hypothesis testing.
  If tcomputed ≤ tcritical, do not reject H0.
  If tcomputed > tcritical, reject H0.
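
The decision step can be sketched as a small function. This is an illustration under stated assumptions: the name `decide` and its `alternative` parameter are hypothetical, and the critical value is taken from a t table by the caller rather than computed.

```python
def decide(t_computed, t_critical, alternative="two-sided"):
    """Compare a computed t with a tabled critical value (a positive number)."""
    if alternative == "two-sided":      # H1: rho != 0
        reject = abs(t_computed) > t_critical
    elif alternative == "greater":      # H1: rho > 0
        reject = t_computed > t_critical
    elif alternative == "less":         # H1: rho < 0
        reject = t_computed < -t_critical
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return "reject H0" if reject else "do not reject H0"

# Critical values for df = 10 at the 0.05 level, as quoted in the examples.
print(decide(8.00, 1.812, alternative="greater"))
print(decide(1.00, 2.228))
```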
Example 1: Pearson r

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot

[Scatter plot: Sales (Y) against Temperature (X)]
Solution 1:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ > 0
There is a positive correlation between atmospheric temperature and total sales of fruit shake.
Step 2: Level of significance is α = 0.05.

Step 3: df = n – 2 = 12 – 2 = 10, and the one-tailed t critical value is 1.812.

Step 4: Compute Pearson’s r.

Table

Day     X (temp)   Y (sales)   X²       Y²        XY
1       79         147         6,241    21,609    11,613
2       76         143         5,776    20,449    10,868
3       78         147         6,084    21,609    11,466
4       84         168         7,056    28,224    14,112
5       90         206         8,100    42,436    18,540
6       83         155         6,889    24,025    12,865
7       93         192         8,649    36,864    17,856
8       94         211         8,836    44,521    19,834
9       97         209         9,409    43,681    20,273
10      85         187         7,225    34,969    15,895
11      88         200         7,744    40,000    17,600
12      82         150         6,724    22,500    12,300
Total   1,029      2,115       88,733   380,887   183,222

ΣX = 1,029    ΣY = 2,115    ΣX² = 88,733    ΣY² = 380,887    ΣXY = 183,222
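Computation of Pearson’s r

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

$$= \frac{12(183{,}222) - (1{,}029)(2{,}115)}{\sqrt{\left[12(88{,}733) - (1{,}029)^2\right]\left[12(380{,}887) - (2{,}115)^2\right]}}$$

$$= \frac{22{,}329}{\sqrt{(5{,}955)(97{,}419)}} = 0.93$$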

Atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is strongly associated with an increase in total sales of fruit shake.
Solution 1:
Step 5: Decision rule.

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} = \frac{0.93\sqrt{12-2}}{\sqrt{1-(0.93)^2}} = \frac{0.93(3.16227766)}{\sqrt{1-0.8649}} = \frac{2.940918224}{0.367559519} = 8.00$$

Since the computed t (8.00) is greater than the critical value (+1.812), reject H0.

Step 6: Conclusion.
We can conclude that there is evidence of a significant association between the atmospheric temperature and the total sales of fruit shake.
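
As a check on the arithmetic in Example 1, the sketch below recomputes r and t from the column totals of the table. The variable names and the helper `t_from_r` are chosen here for illustration. It also evaluates t with the unrounded r, to show how much the rounding of r to 0.93 affects the result.

```python
import math

# Column totals from the Example 1 table.
n = 12
sum_x, sum_y = 1029, 2115
sum_x2, sum_y2, sum_xy = 88733, 380887, 183222

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

def t_from_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_rounded = t_from_r(round(r, 2), n)  # uses r rounded to 0.93, as on the slide
t_full = t_from_r(r, n)               # uses the unrounded r

print(f"r = {r:.4f}")
print(f"t with r rounded to 2 decimals = {t_rounded:.2f}")
print(f"t with unrounded r = {t_full:.2f}")
```

With the unrounded r the statistic comes out somewhat below 8.00, but it is still far above the critical value of 1.812, so the decision does not change.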
Example 2: Spearman Rank

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected with the results given as follows:

Day                   1    2    3    4    5    6    7    8    9    10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)   147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot

[Scatter plot: Rank of Y against Rank of X]
Solution 2:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ > 0
There is a positive correlation between atmospheric temperature and total sales of fruit shake.
Step 2: Level of significance is α = 0.05.

Step 3: df = n – 2 = 12 – 2 = 10, and the one-tailed t critical value is 1.812.

Step 4: Compute Spearman’s ρ.


Table

Day     X     Y     RX    RY     D      D²
1       79    147   10    10.5   –0.5   0.25
2       76    143   12    12     0      0
3       78    147   11    10.5   0.5    0.25
4       84    168   7     7      0      0
5       90    206   4     3      1      1
6       83    155   8     8      0      0
7       93    192   3     5      –2     4
8       94    211   2     1      1      1
9       97    209   1     2      –1     1
10      85    187   6     6      0      0
11      88    200   5     4      1      1
12      82    150   9     9      0      0
Total                            0      ΣD² = 8.5
Computation of ρ

$$\rho = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6(8.5)}{12(12^2-1)} = 1 - \frac{51}{12(143)} = 1 - 0.03 = 0.97$$
Atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is strongly associated with an increase in total sales of fruit shake.
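
The ranking and the ΣD² formula can be reproduced with a short Python sketch. The helper names `average_ranks` and `spearman_rho` are made up for this illustration. Ranks run from the largest value (rank 1) downward, and tied values share the average of their ranks, matching the 10.5 given to the two sales figures of 147 in the table.

```python
def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1        # 1-based position of first tie
        count = ordered.count(v)            # number of tied values
        ranks[v] = first + (count - 1) / 2  # mean of the tied positions
    return [ranks[v] for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from the sum of squared rank differences."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

rho = spearman_rho(temperature, sales)
print(f"rho = {rho:.4f}")
```

The ΣD² formula is exact only when there are no ties; with the single tie in this data it is the same approximation the slides use.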
Solution 2:

Step 5: Decision rule.

$$t = \frac{\rho\sqrt{N-2}}{\sqrt{1-\rho^2}} = \frac{0.97\sqrt{12-2}}{\sqrt{1-(0.97)^2}} = \frac{0.97(3.16227766)}{\sqrt{1-0.9409}} = \frac{3.06740933}{0.243104915} = 12.62$$

Since the computed t (12.62) is greater than the critical value (+1.812), reject H0.

Step 6: Conclusion.
We can conclude that there is evidence of a significant association between the atmospheric temperature and the total sales of fruit shake.
Simple Regression Equation

Regression analysis is a simple statistical tool used to model the dependence of a variable on one (or more) explanatory variables.

Simple linear regression is the least squares estimator of a linear regression model with a single predictor (one independent variable).

The least squares method determines a regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
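
In symbols, the least squares criterion described above can be written as follows. The notation (b_0 for the intercept, b_1 for the slope, Ŷ_i for the predicted value at X_i) is assumed here for illustration.

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{N} \bigl( Y_i - \hat{Y}_i \bigr)^2,
\qquad \hat{Y}_i = b_0 + b_1 X_i
```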
Assumptions of Linear Regression Equation

Linearity – The mean of each error component is zero.

Independence of Error Terms – The errors are independent of each other.

Normally Distributed Error Terms – Each error component (random variable) follows an approximate normal distribution.

Homoscedasticity – The variance of the error components is the same for each value of the independent variable.
Concept of Linear Regression

Linear regression is the simplest representation of a relationship in a two-variable model. The two variables are usually expressed in the form of an equation.

Example:
1. The quantity demanded is related to price.
2. The quantity produced in a factory is related to the production cost.
3. The number of students in a given semester may be related to the tuition fee.
4. The expenditures of a certain household are related to income, and so on.

The simplest representation of the estimation of these relationships is given by the linear equation

Y = a + bX
Steps to obtain a’ and b’
1. Rewrite the paired values in vertical columns.
2. Get the sum of the values in each column.
3. Multiply each value of x by its corresponding y value to obtain the xy column.
4. Add the values in the xy column to get Σxy.
5. Square the x values to get the x² column.
6. Add the values in the x² column to obtain Σx².
7. Apply the formula:
b’ =
a’ =
8. Solve for the value of a’ by using either of the 2 equations.
Estimating the Coefficient

Predicted or fitted value of Y:

$$\hat{Y} = b_1 X + b_0$$

Slope of the regression line:

$$b_1 = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$$

Intercept of the regression line:

$$b_0 = \bar{Y} - b_1 \bar{X}$$
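
A minimal Python sketch of the slope and intercept formulas follows. The function name `fit_line` is hypothetical, and the four data points are made up so that they lie exactly on the line Y = 1 + 2X, which makes the expected coefficients easy to see.

```python
def fit_line(xs, ys):
    """Return (b0, b1): least squares intercept and slope from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)  # mean of Y minus slope times mean of X
    return b0, b1

# Made-up points lying exactly on Y = 1 + 2X (not from the slides).
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

The slope formula divides by zero when every X value is the same, since no line can be fitted in that case.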
Determine the linear regression equation y = a + bx

Year    Cost (X)   Sales (Y)   X²      XY
1995    15         38.0        225     570
1996    30         53.3        900     1599
1997    16         60.0        256     960
1998    39         72.0        1521    2808
1999    20         40.0        400     800
2000    36         47.5        1296    1710
2001    45         82.0        2025    3690
2002    10         21.5        100     215
2003    13         25.0        169     325
Total   224        439.3       6,892   12,677


ANSWER:
Slope: b’ = 1.32
Intercept: a’ = 15.86
Y = a + bX
Y = 15.8629 + 1.32X (linear regression equation)
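
The answer can be checked from the column totals of the cost and sales table. The sketch below is illustrative only; the variable names are chosen here, and the formulas are the raw-sums slope and the mean-based intercept.

```python
# Column totals from the cost/sales table (nine years, 1995 to 2003).
n = 9
sum_x, sum_y = 224, 439.3
sum_x2, sum_xy = 6892, 12677

# Slope from raw sums, then intercept from the two means.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

print(f"b = {b:.4f}")
print(f"a = {a:.4f}")
```

The intercept 15.8629 is obtained with the unrounded slope; the slope is then rounded to 1.32 when the equation is written.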
Example # 2
Consider the linear equation obtained in Example # 1. Given y’ = 15.86 + 1.32X, where x and y’ are the annual cost and the expected annual sales, respectively, determine the expected sales when the cost is
a. P23.54
b. P16.20
ANSWER:

Y = 15.8629 + 1.32X (linear regression equation)

a. Y = 15.8629 + 1.32(23.54)
   Y = 46.94

b. Y = 15.8629 + 1.32(16.20)
   Y = 37.25
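
The two predictions can be reproduced with a one-line function. This is a sketch: the name `predict_sales` is hypothetical, and the default coefficients are the rounded values used in the answer (a = 15.8629, b = 1.32).

```python
def predict_sales(cost, a=15.8629, b=1.32):
    """Expected annual sales from the fitted line Y = a + bX."""
    return a + b * cost

for cost in (23.54, 16.20):
    print(f"cost = {cost:.2f} -> expected sales = {predict_sales(cost):.2f}")
```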
