MODULE 11
Linear Regression and Correlation
Lesson 1. Coefficient of Correlation
The coefficient of correlation measures the strength of the association between two
variables. It describes the strength of the relationship between two sets of interval-scaled or
ratio-scaled variables.
Correlation Analysis. A group of techniques to measure the strength of the association
between two variables.
Dependent Variable. The variable that is being predicted or estimated.
Independent Variable. A variable that provides the basis for estimation. It is the predictor
variable.
To use the coefficient of correlation, both variables must be measured on at least an interval scale. The coefficient has the following characteristics:
1. The coefficient of correlation can range from -1.00 to +1.00.
2. If the correlation between two variables is zero (0), there is no linear association between them.
3. A value of +1.00 indicates a perfect positive correlation, and -1.00 a perfect negative correlation.
4. A positive sign means there is a direct relationship between the variables, and a negative sign means there is an inverse relationship.
5. It is designated by the letter r and found by the following equation:
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} $$
Example 1. To illustrate the step-by-step procedure for obtaining a correlation coefficient, let us examine the relationship between years of school completed (X) and prejudice (Y) as found in the following sample of 10 respondents.

Respondent     X     X²     Y     Y²     XY
A             10    100     1      1     10
B              3      9     7     49     21
C             12    144     2      4     24
D             11    121     4     16     44
E              6     36     5     25     30
F              8     64     4     16     32
G             14    196     1      1     14
H              9     81     2      4     18
I             10    100     3      9     30
J              2      4    10    100     20
Sum (Σ)       85    855    39    225    243
Using the above-cited formula:
Step 1. Determine the mean of X and Y.
$\bar{X} = 85/10 = 8.5$; $\bar{Y} = 39/10 = 3.9$
Step 2. Square X and Y and determine the sum of each.
$\sum X^2 = 855$; $\sum Y^2 = 225$
Step 3. Multiply X and Y, then get their sum.
$\sum XY = 243$
Step 4. Get the sum of X and the sum of Y.
$\sum X = 85$; $\sum Y = 39$
Step 5. Substitute the data into the formula.
$$ r = \frac{10(243) - (85)(39)}{\sqrt{[(10)(855) - (85)^2][(10)(225) - (39)^2]}} $$
$$ = \frac{2{,}430 - 3{,}315}{\sqrt{[8{,}550 - 7{,}225][2{,}250 - 1{,}521]}} $$
$$ = \frac{-885}{\sqrt{[1{,}325][729]}} = \frac{-885}{\sqrt{965{,}925}} = \frac{-885}{982.81} = -0.90 $$
The result shows a very high negative correlation. This means that there is an inverse relationship between the two variables: as years of school completed increase, prejudice decreases.
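The substitution in Step 5 can be cross-checked with a few lines of Python. This is a minimal sketch rather than part of the original module: the function name `pearson_r_from_sums` is hypothetical, and its inputs are the sums reported in Steps 2 to 4. Note that 10 × 225 = 2,250 in the second bracket of the denominator.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r, using only the column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Sums for Example 1 (years of school completed vs. prejudice)
r = pearson_r_from_sums(n=10, sum_x=85, sum_y=39,
                        sum_x2=855, sum_y2=225, sum_xy=243)
print(round(r, 2))
```

Working from the sums mirrors the hand calculation, so each intermediate value (numerator, bracketed terms) can be compared with the worked steps.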
Example 2. Computation of Correlation Coefficient for Job Aptitude Test (X) and Job Performance (Y)

Worker         X       X²      Y       Y²       XY
A             65    4,225     90    8,100    5,850
B             60    3,600     95    9,025    5,700
C             62    3,844     82    6,724    5,084
D             59    3,481     87    7,569    5,133
E             58    3,364     80    6,400    4,640
F             53    2,809     75    5,625    3,975
G             50    2,500     60    3,600    3,000
H             48    2,304     69    4,761    3,312
I             45    2,025     60    3,600    2,700
J             40    1,600     72    5,184    2,880
Sum (Σ)      540   29,752    770   60,588   42,274
Step 1. Determine the mean of X and Y.
$\bar{X} = 540/10 = 54$; $\bar{Y} = 770/10 = 77$
Step 2. Determine the square of X and Y and determine the sum of each.
$\sum X^2 = 29{,}752$; $\sum Y^2 = 60{,}588$
Step 3. Determine the sum of the product of X and Y.
$\sum XY = 42{,}274$
Step 4. Get the sum of X and the sum of Y.
$\sum X = 540$; $\sum Y = 770$
Step 5. Substitute the data from Steps 1 to 4 into the formula.
$$ r = \frac{10(42{,}274) - (540)(770)}{\sqrt{[(10)(29{,}752) - (540)^2][(10)(60{,}588) - (770)^2]}} $$
$$ = \frac{422{,}740 - 415{,}800}{\sqrt{[297{,}520 - 291{,}600][605{,}880 - 592{,}900]}} $$
$$ = \frac{6{,}940}{\sqrt{[5{,}920][12{,}980]}} = \frac{6{,}940}{8{,}765.93} = .792 $$
The result reveals a very high positive correlation, which means that there is a direct relationship between the two variables. That is, as X increases, Y increases also, and as X decreases, Y decreases also.
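The five steps of Example 2 can also be sketched in Python, starting from the raw scores instead of the finished table. This is an illustrative sketch, not the module's own method; the variable names are assumptions, and the X and Y lists are the values for workers A to J as read from the table (worker D is taken as X = 59, which agrees with its square of 3,481).

```python
import math

# Job aptitude test scores (X) and the paired Y values for workers A to J
x = [65, 60, 62, 59, 58, 53, 50, 48, 45, 40]
y = [90, 95, 82, 87, 80, 75, 60, 69, 60, 72]

n = len(x)
sum_x, sum_y = sum(x), sum(y)                 # Step 4
sum_x2 = sum(v * v for v in x)                # Step 2
sum_y2 = sum(v * v for v in y)                # Step 2
sum_xy = sum(a * b for a, b in zip(x, y))     # Step 3

# Step 5: substitute the sums into the computational formula for r
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 3))
```

The first printed line should match the Sum (Σ) row of the table, which is a quick way to catch a mistyped score before interpreting r.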
Lesson 2. Testing the Significance of the Correlation Coefficient
The correlation coefficient can be tested to find out whether the computed value could have come from a population of paired observations with zero correlation. To do this, we formulate the null hypothesis and the research hypothesis and then compute the t ratio.
$H_0$: $\rho = 0$ (the correlation in the population is zero)
$H_1$: $\rho \neq 0$ (the correlation in the population is different from zero)
This is a two-tailed test. The following formula will be used:
t test for the coefficient of correlation:
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \text{with } n-2 \text{ degrees of freedom} \quad (11\text{-}2) $$
Using the 5% level of significance, the decision rule states that if the computed t falls in the area between plus 2.306 and minus 2.306, the null hypothesis is not rejected. (The value 2.306 is found in the table of critical values of t at the 5% level of significance with 8 degrees of freedom.)
For r = .792, we substitute into the formula:
$$ t = \frac{.792\sqrt{10-2}}{\sqrt{1-(.792)^2}} = \frac{.792(2.828)}{\sqrt{1 - .627}} = \frac{2.240}{.611} = 3.67 $$
Based on the above computation, t = 3.67 is higher than the critical t of 2.306 with 8 degrees of freedom. This means that t falls in the rejection region. Thus, $H_0$ is rejected at the 5% level of significance.
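The decision rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `t_for_correlation` is hypothetical, and the critical value 2.306 is copied from the text rather than looked up programmatically.

```python
import math

def t_for_correlation(r, n):
    """t ratio for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.792, 10
critical_t = 2.306  # two-tailed, 5% level, 8 degrees of freedom (from the text)

t = t_for_correlation(r, n)
reject_h0 = abs(t) > critical_t  # outside the band from -2.306 to +2.306

print(round(t, 2), reject_h0)
```

Comparing the absolute value of t with the critical value handles both tails at once, so the same check works for a negative r.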
EXERCISE 13
Coefficient of Correlation
Name ____________ Rating ________
Course & Curriculum Year ____________ Date ________
For the following problems:
a. Determine the dependent and independent variables.
b. Compute the coefficient of correlation.
c. Interpret the strength of the correlation coefficient.
1. The following sample observations were randomly selected.
X: 4 5 3 6 10
Y: 4 6 5 7 7
2. ACC Appliance Stores has outlets in several places in Albay. The sales manager plans to air a camcorder television commercial on selected local stations at least twice prior to a gigantic sale starting on Saturday and ending Sunday. She plans to get the data for Saturday-Sunday camcorder sales at the various outlets and pair them with the number of times the advertisement was shown on the local TV stations. The basic purpose of this research is to find whether there is any relationship between the number of times the advertisement was aired and camcorder sales. The pairings are:

Location of TV Station    Number of Airings    Saturday-Sunday Sales (thousands)
Daraga                            4                  [illegible]
Legazpi City                      2                  [illegible]
Bitano, Legazpi City              5                  21
Washington Drive                  6                  [illegible]
Tabaco City                       3                  [illegible]
Total                            20                  85
3. The owner of Dana Motors wants to study the relationship between the age of a car and its selling price. Listed below is a random sample of 12 used cars sold at Dana Motors during the last year.

Car    Age (in years)    Selling Price (Php 000)
1            9                  8.4
2            7                  6.0
3       [illegible]             3.6
4       [illegible]             4.0
5       [illegible]             5.0
6            7                 10.0
7            8                  7.6
8           11                  8.0
9           10                  8.0
10           2                  6.0
11           6                  8.6
12           6                  8.0
4. The Production Dept. of ABC Electronics wants to explore the relationship between the number of employees who assemble a subassembly and the number produced. As an experiment, two employees were assigned to assemble the subassemblies. They produced 15 during a one-hour period. Then four employees assembled them. They produced 25 during a one-hour period. The complete set of paired observations follows.

Number of Assemblers    One-Hour Production (Units)
2                               15
4                               25
[illegible]                     10
[illegible]                     40
[illegible]                     30
5. The city council of Prince City is considering increasing the number of police in an effort to reduce crime. Before making the final decision, the council asks the Chief of Police to survey other cities of similar size to observe the relationship between the number of police and the number of crimes reported. The Chief gathered the following information.

City     Number of Police    Number of Crimes
A              15                   7
B              17                  13
C              25                   5
D          [illegible]              7
E          [illegible]              7
F              12              [illegible]
G              11                  19
H              22                   6
Total         146                  95
Lesson 3. Linear Regression
Regression Analysis is a technique employed in predicting values of one variable (Y) from knowledge of values of another variable (X).
Regression Equation is an equation that defines the relationship between two variables.
Simple regression is the most elementary regression model. It involves two variables, in which one variable is predicted by another variable. In simple regression the variable to be predicted is called the dependent variable and is designated as Y. The predictor is called the independent variable, or explanatory variable, and is designated as X. The equation for simple regression is
$$ \hat{Y} = a + bX \quad (11\text{-}3) $$
where $\hat{Y}$ = the predicted value of Y
a = the Y-intercept
b = the slope of the line
X = any value of the independent variable that is selected
Formulas for the slope and the intercept of the regression line:
Slope of the regression line:
$$ b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} \quad (11\text{-}4) $$
Y-axis intercept:
$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} \quad (11\text{-}5) $$
Example 1. Predict the cost of flying a commercial plane using a regression line.
Airline Cost Data
Number of Passengers    Cost ($1,000)
61                         4,280
63                         4,080
67                         4,420
69                         4,170
70                         4,480
74                         4,300
76                         4,820
81                         4,700
86                         5,110
91                         5,130
95                         5,640
97                         5,560
Step 1. Determine $\sum XY$, $\sum X$, $\sum Y$, and $\sum X^2$.

X        Y          X²         XY
61     4,280      3,721     261,080
63     4,080      3,969     257,040
67     4,420      4,489     296,140
69     4,170      4,761     287,730
70     4,480      4,900     313,600
74     4,300      5,476     318,200
76     4,820      5,776     366,320
81     4,700      6,561     380,700
86     5,110      7,396     439,460
91     5,130      8,281     466,830
95     5,640      9,025     535,800
97     5,560      9,409     539,320
ΣX = 930    ΣY = 56,690    ΣX² = 73,764    ΣXY = 4,462,220

Step 2. Compute b.
$$ b = \frac{12(4{,}462{,}220) - (930)(56{,}690)}{12(73{,}764) - (930)^2} = \frac{53{,}546{,}640 - 52{,}721{,}700}{885{,}168 - 864{,}900} = \frac{824{,}940}{20{,}268} = 40.70 $$
Step 3. Determine a.
$$ a = \frac{56{,}690}{12} - (40.70)\frac{930}{12} = 4{,}724.167 - 3{,}154.25 = 1{,}569.917 $$
Step 4. Formulate the regression equation.
$$ \hat{Y} = 1{,}569.917 + 40.7X $$
Step 5. Predict the cost ($\hat{Y}$) given the number of passengers (X).

X:      55           68           34           85
Ŷ:   3,808.417    4,337.517    2,953.717    5,029.417

Ŷ = 1,569.917 + 40.7(55) = 3,808.417
Ŷ = 1,569.917 + 40.7(68) = 4,337.517
Ŷ = 1,569.917 + 40.7(34) = 2,953.717
Ŷ = 1,569.917 + 40.7(85) = 5,029.417
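Steps 1 to 5 can be sketched in Python as a check on the hand computation. This is an illustrative sketch only: `fit_line` is a hypothetical helper, and the data are the twelve passenger and cost pairs from the airline table. Because the code keeps the unrounded slope (about 40.7016) instead of rounding it to 40.70 before computing a, its intercept comes out near 1,569.79 rather than 1,569.917, and the predictions differ slightly from the hand-computed ones.

```python
# Airline cost data: number of passengers (X) and cost (Y)
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4280, 4080, 4420, 4170, 4480, 4300,
        4820, 4700, 5110, 5130, 5640, 5560]

def fit_line(x, y):
    """Return (a, b) for the least-squares line Y-hat = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

a, b = fit_line(passengers, cost)
print(round(a, 3), round(b, 4))

# Step 5: predicted cost for the passenger counts used in the example
for x_new in (55, 68, 34, 85):
    print(x_new, round(a + b * x_new, 1))
```

Carrying the full-precision slope through to the intercept avoids compounding the rounding error, which matters most when the X values are far from zero.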
Example 2. A sales manager wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 10 representatives and determines the number of sales calls each representative made last month and the number of copiers sold. The sample information is shown below.

Table: Sales Calls and Copiers Sold for Ten Salespersons

Sales Rep    Sales Calls (X)    Copiers Sold (Y)      X²        Y²        XY
Aldo               20                 30             400       900       600
Ben                40                 60           1,600     3,600     2,400
Carlo              20                 40             400     1,600       800
David              30                 60             900     3,600     1,800
Eddie              10                 30             100       900       300
Flor               10                 40             100     1,600       400
Godwin             20                 40             400     1,600       800
Hermie             20                 50             400     2,500     1,000
Intong             20                 30             400       900       600
Juan               30                 70             900     4,900     2,100
Sum (Σ)           220                450           5,600    22,100    10,800
Step 1. Determine $\sum XY$, $\sum X$, $\sum Y$, and $\sum X^2$.
$\sum XY = 10{,}800$; $\sum X = 220$; $\sum Y = 450$; $\sum X^2 = 5{,}600$
Step 2. Compute b.
$$ b = \frac{10(10{,}800) - (220)(450)}{10(5{,}600) - (220)^2} = \frac{108{,}000 - 99{,}000}{56{,}000 - 48{,}400} = \frac{9{,}000}{7{,}600} = 1.1842 $$
Step 3. Determine a.
$$ a = \frac{450}{10} - (1.1842)\frac{220}{10} = 45 - 26.05 = 18.95 $$
Step 4. Formulate the regression equation.
$$ \hat{Y} = 18.95 + 1.1842X $$
Step 5. Predict the copiers sold ($\hat{Y}$) given the following sales calls.

X:    45      47      50      54      58      60      65      70       75
Ŷ:  72.24   74.61   78.16   82.90   87.63   90.00   95.92   101.84   107.77

If X = 45, then Ŷ = 18.95 + 1.1842(45) = 72.239
If X = 47, then Ŷ = 18.95 + 1.1842(47) = 74.607
If X = 50, then Ŷ = 18.95 + 1.1842(50) = 78.160
If X = 54, then Ŷ = 18.95 + 1.1842(54) = 82.897
If X = 58, then Ŷ = 18.95 + 1.1842(58) = 87.634
If X = 60, then Ŷ = 18.95 + 1.1842(60) = 90.002
If X = 65, then Ŷ = 18.95 + 1.1842(65) = 95.923
If X = 70, then Ŷ = 18.95 + 1.1842(70) = 101.844
If X = 75, then Ŷ = 18.95 + 1.1842(75) = 107.765
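The same computation can be sketched in Python starting from the column sums of the sales-calls table. This is a minimal sketch with assumed names (`predict` is a hypothetical helper). It keeps the unrounded slope and intercept, so its predictions may differ from the hand-computed ones in the third decimal place.

```python
# Column sums from the sales-calls table (n = 10 sales representatives)
n, sum_x, sum_y, sum_x2, sum_xy = 10, 220, 450, 5600, 10800

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * sum_x / n                                 # intercept

def predict(calls):
    """Predicted copiers sold for a given number of sales calls."""
    return a + b * calls

print(round(a, 2), round(b, 4))
for calls in (45, 47, 50, 54, 58, 60, 65, 70, 75):
    print(calls, round(predict(calls), 2))
```

Note that the slope uses the sum of X squared (5,600), not the sum of Y squared (22,100); mixing the two is an easy slip when reading the table.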
Lesson 4. Using Regression to Develop a Forecasting Trend Line
Example 1. Ten-year Sales Data for Huntsville Chemicals

Year (X)    Sales (Y)        X²             XY
2000           7.84      4,000,000      15,680.00
2001          12.26      4,004,001      24,532.26
2002          13.11      4,008,004      26,246.22
2003          15.78      4,012,009      31,607.34
2004          21.29      4,016,016      42,665.16
2005          25.68      4,020,025      51,488.40
2006          23.80      4,024,036      47,742.80
2007          26.43      4,028,049      53,045.01
2008          29.16      4,032,064      58,553.28
2009          33.06      4,036,081      66,417.54
ΣX = 20,045    ΣY = 208.41    ΣX² = 40,180,285    ΣXY = 417,978.01
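$$ b = \frac{10(417{,}978.01) - (20{,}045)(208.41)}{10(40{,}180{,}285) - (20{,}045)^2} = \frac{4{,}179{,}780.1 - 4{,}177{,}578.45}{401{,}802{,}850 - 401{,}802{,}025} = \frac{2{,}201.65}{825} = 2.6687 \approx 2.67 $$
$$ a = \frac{208.41}{10} - (2.67)\frac{20{,}045}{10} = 20.84 - 5{,}352.02 = -5{,}331.18 $$
Because a is computed with the slope rounded to 2.67, the same rounded slope is used in the trend line:
$$ \hat{Y} = -5{,}331.18 + 2.67X $$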
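The trend line can be recomputed in Python from the yearly data as a check. This is a sketch under stated assumptions: the helper `trend` and the forecast year 2010 are illustrative and do not come from the module. Because X is a calendar year (around 2,000), the intercept is very sensitive to rounding of the slope: with the unrounded slope (about 2.6687) the intercept is about -5,328.50, whereas rounding the slope to 2.67 first gives -5,331.18.

```python
# Ten-year sales data for Huntsville Chemicals
years = list(range(2000, 2010))
sales = [7.84, 12.26, 13.11, 15.78, 21.29,
         25.68, 23.80, 26.43, 29.16, 33.06]

n = len(years)
sum_x, sum_y = sum(years), sum(sales)
sum_x2 = sum(x * x for x in years)
sum_xy = sum(x * y for x, y in zip(years, sales))

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * sum_x / n                                 # intercept

def trend(year):
    """Trend-line value of sales for a given year."""
    return a + b * year

print(round(b, 4), round(a, 2))
print(round(trend(2010), 2))  # illustrative forecast one year past the data
```

Whichever rounding is used, the slope and the intercept should come from the same value of b; otherwise the line no longer passes through the point of means (2,004.5, 20.841).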
EXERCISE 14
Regression Analysis
Name ____________ Rating ________
Course & Curriculum Year ____________
1. The McDonald's Corporation would like to find out if there is a relationship between the price of a Big Mac and the net hourly wages of workers around the world and, if there is a relationship, how strong it is. Predict the hourly wages from the price of the Big Mac.

Country           Big Mac Price (U.S. $)    Net Hourly Wage (U.S. $)
Argentina                1.42                       1.70
Australia                1.86                       7.80
Brazil                   1.48                       2.05
Britain                  3.14                      12.30
Canada                   2.21                       9.35
Chile                    4.98                       2.80
China                    4.20                       2.40
Czech Republic           1.96                       2.40
Denmark                  4.09                      14.40
Euro Area                2.98                       9.59
Hungary                  2.49                       3.00
Indonesia                1.84                       1.50
Japan                    2.18                      13.60
Malaysia                 1.33                       3.40
Mexico                   2.18                       2.00
New Zealand              2.22                       6.80
Philippines              2.24                       1.20
Poland                   1.62                       2.20
Russia                   1.32                       2.60
Singapore                1.85                       5.40
South Africa             1.85                       3.90
South Korea              2.70                       5.90
Sweden                   3.60                      10.90
Switzerland           [illegible]                  17.80
Thailand                 1.38                       1.10
Turkey                   2.34                       3.20
United States            2.71                   [illegible]
2. The National Highway Association is studying the relationship between the number of bidders on a highway project and the winning (lowest) bid for the project. Of particular interest is whether the number of bidders increases or decreases the amount of the winning bid.

Project    Number of Bidders (X)    Winning Bid (Php M) (Y)
1                  9                    [illegible]
2                  9                        8.0
3                  3                        9.7
4                 10                    [illegible]
5                  5                        7.7
6                 10                        8.5
7                  7                        8.3
8                  1                        5.5
9                  6                       10.3
10                 6                        8.0
11                 4                    [illegible]
12                 7                        9.4
13            [illegible]                   8.6
14                 7                        8.4
15                 6                        7.8

a. Determine the regression equation. Interpret the equation. Do more bidders tend to increase or decrease the amount of the winning bid?
b. Estimate the amount of the winning bid if there were seven bidders.
c. Determine the coefficient of correlation.
d. Determine the coefficient of determination. Interpret its result.
3. Mr. William Profit is studying companies going public for the first time. He is particularly interested in the relationship between the size of the offering and the price per share. A sample of 15 companies that recently went public revealed the following information.

Company    Size (Php M) (X)    Price/Share (Y)
1                9.0               10.8
2               94.4            [illegible]
3               27.3            [illegible]
4              179.2            [illegible]
5           [illegible]         [illegible]
6               97.9               11.2
7               83.5               14.0
8               70.0               10.7
9              160.7            [illegible]
10              96.5               10.6
11              83.0               10.5
12              23.5               10.3
13              58.7               10.7
14              93.8               11.0
15              34.4               10.8

a. Determine the regression equation. Interpret the equation.
b. Determine the coefficient of correlation.
c. Determine the coefficient of determination. Interpret the result.
d. Do you think Mr. Profit should be satisfied with using the size of the offering as the independent variable? (Justify your answer.)
4. The Balde Shipping Co., of the Province of Albay, is studying the relationship between the distance a shipment must travel and the length of time, in days, it takes the shipment to arrive at the destination. To investigate, Mr. Balde selected a random sample of 20 shipments made last month. Shipping distance is the independent variable and shipping time is the dependent variable. The results are as follows:

Shipment    Distance (km) (X)    Shipping Time (Days) (Y)
1                 656                     5
2                 853                     4
3                 646                     6
4                 783                    11
5                 610                     8
6             [illegible]                10
7                 785                     9
8                 639                     9
9                 762                    10
10                762                     9
11                862                     7
12                679                     5
13                835                    13
14                607                     3
15                665                     3
16                647                     7
17                685                    10
18                720                     8
19                652                     6
20                828                    10