Module 11 Linear Regression and Correlation

Uploaded by

Jonar Briones
Copyright © All Rights Reserved
MODULE 11 Linear Regression and Correlation

Lesson 1. Coefficient of Correlation

The coefficient of correlation measures the strength of the linear association between two sets of interval-scaled or ratio-scaled variables.

Correlation Analysis. A group of techniques to measure the strength of the association between two variables.
Dependent Variable. The variable that is being predicted or estimated.
Independent Variable. A variable that provides the basis for estimation. It is the predictor variable.

To use the coefficient of correlation, both variables must be at least on the interval scale of measurement. The coefficient has the following characteristics:

1. The coefficient of correlation can range from -1.00 to +1.00.
2. If the correlation between two variables is zero (0), there is no linear association between them.
3. A value of +1.00 indicates a perfect positive correlation, and -1.00 a perfect negative correlation.
4. A positive sign means there is a direct relationship between the variables, and a negative sign means there is an inverse relationship.
5. It is designated by the letter r and found by the following equation:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

Example 1. To illustrate the step-by-step procedure for obtaining a correlation coefficient, let us examine the relationship between years of school completed (X) and prejudice (Y) as found in the following sample of 10 respondents.

Respondent     X     X²     Y     Y²     XY
A             10    100     1      1     10
B              3      9     7     49     21
C             12    144     2      4     24
D             11    121     4     16     44
E              6     36     5     25     30
F              8     64     4     16     32
G             14    196     1      1     14
H              9     81     2      4     18
I             10    100     3      9     30
J              2      4    10    100     20
Sum (Σ)       85    855    39    225    243

Using the above-cited formula:

Step 1. Determine the mean of X and Y.
$\bar{X} = 85/10 = 8.5$; $\bar{Y} = 39/10 = 3.9$

Step 2. Square X and Y and determine the sum of each.
$\sum X^2 = 855$; $\sum Y^2 = 225$

Step 3. Multiply X and Y, then get their sum.
$\sum XY = 243$

Step 4. Get the sum of X and the sum of Y.
$\sum X = 85$; $\sum Y = 39$

Step 5. Substitute the data into the formula.

$$r = \frac{10(243) - (85)(39)}{\sqrt{[10(855) - (85)^2][10(225) - (39)^2]}} = \frac{2{,}430 - 3{,}315}{\sqrt{[8{,}550 - 7{,}225][2{,}250 - 1{,}521]}} = \frac{-885}{\sqrt{(1{,}325)(729)}} = \frac{-885}{982.81} \approx -0.90$$

The result shows a very high negative correlation. This means that there is an inverse relationship between the two variables.

Example 2. Computation of the Correlation Coefficient for Job Aptitude Test (X) and Job Performance (Y)

Worker         X      X²      Y      Y²      XY
A             65   4,225     90   8,100   5,850
B             60   3,600     95   9,025   5,700
C             62   3,844     82   6,724   5,084
D             59   3,481     87   7,569   5,133
E             58   3,364     80   6,400   4,640
F             53   2,809     75   5,625   3,975
G             50   2,500     60   3,600   3,000
H             48   2,304     69   4,761   3,312
I             45   2,025     60   3,600   2,700
J             40   1,600     72   5,184   2,880
Sum (Σ)      540  29,752    770  60,588  42,274

Step 1. Determine the mean of X and Y.
$\bar{X} = 540/10 = 54$; $\bar{Y} = 770/10 = 77$

Step 2. Determine the square of X and Y and determine the sum of each.
$\sum X^2 = 29{,}752$; $\sum Y^2 = 60{,}588$

Step 3. Determine the sum of the product of X and Y.
$\sum XY = 42{,}274$

Step 4. Get the sum of X and the sum of Y.
$\sum X = 540$; $\sum Y = 770$

Step 5. Substitute into the formula the data from steps 1 to 4.

$$r = \frac{10(42{,}274) - (540)(770)}{\sqrt{[10(29{,}752) - (540)^2][10(60{,}588) - (770)^2]}} = \frac{422{,}740 - 415{,}800}{\sqrt{[297{,}520 - 291{,}600][605{,}880 - 592{,}900]}} = \frac{6{,}940}{\sqrt{(5{,}920)(12{,}980)}} = \frac{6{,}940}{8{,}765.93} \approx 0.792$$

The result reveals a very high positive correlation, which means that there is a direct relationship between the two variables: as X increases, Y increases, and as X decreases, Y decreases also.

Lesson 2. Testing the Significance of the Correlation Coefficient

The correlation can be tested to find out whether the computed correlation coefficient comes from a population of paired observations with zero correlation. To do this we formulate the null hypothesis and the research hypothesis, and then compute the t ratio.

$H_0: \rho = 0$ (the correlation in the population is zero)
$H_1: \rho \neq 0$ (the correlation in the population is different from zero)

This is a two-tailed test.
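As a check on the hand computations in Lesson 1, the raw-sum formula for r can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pearson_r` and the variable names are assumptions made here, and the data are the ten (X, Y) pairs of Example 2, with the X values read so that they agree with the column totals ΣX = 540 and ΣX² = 29,752.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed with the raw-sum formula used in Lesson 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator


# Example 2 data (hypothetical variable names): aptitude score X, job score Y
aptitude = [65, 60, 62, 59, 58, 53, 50, 48, 45, 40]
rating = [90, 95, 82, 87, 80, 75, 60, 69, 60, 72]

r = pearson_r(aptitude, rating)
print(round(r, 3))  # 0.792
```

The function mirrors Steps 2 to 5 of the worked examples, so intermediate sums can be printed and compared with the hand-computed table when a result disagrees.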
The following formula will be used:

t test for the coefficient of correlation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \text{with } n-2 \text{ degrees of freedom} \qquad (11\text{-}2)$$

Using the 5% level of significance, the decision rule states that if the computed t falls in the area between plus 2.306 and minus 2.306, the null hypothesis is not rejected. (The value 2.306 is found in the table of critical values at the 5% level of significance with 8 degrees of freedom.)

For r = .792 and n = 10 we substitute into the formula:

$$t = \frac{.792\sqrt{10-2}}{\sqrt{1-(.792)^2}} = \frac{.792(2.828)}{.6105} = 3.67$$

Based on the above computation, t = 3.67 is higher than the critical t of 2.306 with 8 degrees of freedom. This means that t falls in the rejection region. Thus, $H_0$ is rejected at the 5% level of significance, and the correlation in the population is taken to be different from zero.

EXERCISE 13 Coefficient of Correlation

Name ________ Rating ________
Course & Curriculum Year ________ Date ________

For the following problems:
a. Determine the dependent and independent variables.
b. Determine the coefficient of correlation.
c. Interpret the strength of the correlation coefficient.

1. The following sample observations were randomly selected.

X:   4   5   3   6   10
Y:   4   6   5   7    7

2. ACC Appliance Stores has outlets in several places in Albay. The sales manager plans to air a camcorder television commercial on selected local stations at least twice prior to a gigantic sale starting on Saturday and ending Sunday. She plans to get the information for Saturday-Sunday camcorder sales at the various outlets and pair them with the number of times the advertisement was shown on the local TV stations. The basic purpose of this research is to find whether there is any relationship between the number of times the advertisement was aired and camcorder sales. The pairings are:

Location of TV Station    Number of Airings    Saturday-Sunday Sales (thousands)
Daraga                            4                  [illegible]
Legazpi City                      2                  [illegible]
Bitano, Legazpi City              5                  21
Washington Drive                  6                  [illegible]
Tabaco City                       3                  [illegible]
Total                            20                  85

3. The owner of Dana Motors wants to study the relationship between the age of a car and its selling price. Listed below is a random sample of 12 used cars sold at Dana Motors during the last year.

Car    Age (in years)    Selling Price (Php 000)
 1           9                  8.4
 2           7                  6.0
 3      [illegible]             3.6
 4      [illegible]             4.0
 5      [illegible]             5.0
 6           7                 10.0
 7           8                  7.6
 8      [illegible]             8.0
 9          10                  8.0
10           2                  6.0
11           6                  8.6
12           6                  8.0

4. The Production Dept. of ABC Electronics wants to explore the relationship between the number of employees who assemble a subassembly and the number produced. As an experiment, two employees were assigned to assemble the subassemblies. They produced 15 during a one-hour period. Then four employees assembled them. They produced 25 during a one-hour period. The complete set of paired observations follows:

Number of Assemblers    One-Hour Production (Units)
        2                         15
        4                         25
   [illegible]                    10
   [illegible]                    40
   [illegible]                    30

5. The city council of Prince City is considering increasing the number of police in an effort to reduce crime. Before making the final decision, the council asks the Chief of Police to survey other cities of similar size to observe the relationship between the number of police and the number of crimes reported. The Chief gathered the following information.

City     Number of Police    Number of Crimes
A              15              [illegible]
B              17                  13
C              25                   5
D         [illegible]               7
E         [illegible]               7
F              12              [illegible]
G              11                  19
H              22                   6
Total         146                  95

Lesson 3. Linear Regression

Regression Analysis is a technique employed in predicting values of one variable (Y) from knowledge of values of another variable (X).

Regression Equation is an equation that defines the relationship between two variables.

Simple regression is the most elementary regression model. It involves two variables, in which one variable is predicted by another variable. In simple regression the variable to be predicted is called the dependent variable and is designated as Y. The predictor is called the independent variable, or explanatory variable, and is designated as X.
The equation for simple regression is

$$\hat{Y} = a + bX \qquad (11\text{-}3)$$

where
$\hat{Y}$ = the predicted value of Y
a = the Y-intercept
b = the slope of the line
X = any value of the independent variable that is selected

Slope of the regression line:

$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

Y-axis intercept:

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

Example 1. Predict the cost of flying a commercial plane using a regression line.

Airline Cost Data

Number of Passengers    Cost ($1,000)
        61                 4,280
        63                 4,080
        67                 4,420
        69                 4,170
        70                 4,480
        74                 4,300
        76                 4,820
        81                 4,700
        86                 5,110
        91                 5,130
        95                 5,640
        97                 5,560

Step 1. Determine $\sum XY$, $\sum X$, $\sum Y$, and $\sum X^2$.

  X         Y        X²          XY
 61      4,280     3,721      261,080
 63      4,080     3,969      257,040
 67      4,420     4,489      296,140
 69      4,170     4,761      287,730
 70      4,480     4,900      313,600
 74      4,300     5,476      318,200
 76      4,820     5,776      366,320
 81      4,700     6,561      380,700
 86      5,110     7,396      439,460
 91      5,130     8,281      466,830
 95      5,640     9,025      535,800
 97      5,560     9,409      539,320
ΣX = 930   ΣY = 56,690   ΣX² = 73,764   ΣXY = 4,462,220

Step 2. Compute b.

$$b = \frac{12(4{,}462{,}220) - (930)(56{,}690)}{12(73{,}764) - (930)^2} = \frac{53{,}546{,}640 - 52{,}721{,}700}{885{,}168 - 864{,}900} = \frac{824{,}940}{20{,}268} = 40.70$$

Step 3. Determine a.

$$a = \frac{56{,}690}{12} - (40.70)\frac{930}{12} = 4{,}724.167 - 3{,}154.25 = 1{,}569.917$$

Step 4. Formulate the regression equation.

$$\hat{Y} = 1{,}569.917 + 40.7X$$

Step 5. Predict the cost ($\hat{Y}$) given the number of passengers (X).

X      55           68           34           85
Ŷ   3,808.417    4,337.517    2,953.717    5,029.417

Ŷ = 1,569.917 + 40.7(55) = 3,808.417
Ŷ = 1,569.917 + 40.7(68) = 4,337.517
Ŷ = 1,569.917 + 40.7(34) = 2,953.717
Ŷ = 1,569.917 + 40.7(85) = 5,029.417

Example 2. A sales manager wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 10 representatives and determines the number of sales calls each representative made last month and the number of copiers sold. The sample information is shown below.
Table: Sales Calls and Copiers Sold for Ten Salespersons

Sales Rep    Sales Calls (X)    Copiers Sold (Y)     X²       Y²       XY
Aldo               20                 30             400      900      600
Ben                40                 60           1,600    3,600    2,400
Carlo              20                 40             400    1,600      800
David              30                 60             900    3,600    1,800
Eddie              10                 30             100      900      300
Flor               10                 40             100    1,600      400
Godwin             20                 40             400    1,600      800
Hermie             20                 50             400    2,500    1,000
Intong             20                 30             400      900      600
Juan               30                 70             900    4,900    2,100
Sum (Σ)           220                450           5,600   22,100   10,800

Step 1. Determine $\sum XY$, $\sum X$, $\sum Y$, and $\sum X^2$.
$\sum XY = 10{,}800$; $\sum X = 220$; $\sum Y = 450$; $\sum X^2 = 5{,}600$

Step 2. Compute b.

$$b = \frac{10(10{,}800) - (220)(450)}{10(5{,}600) - (220)^2} = \frac{108{,}000 - 99{,}000}{56{,}000 - 48{,}400} = \frac{9{,}000}{7{,}600} = 1.1842$$

Step 3. Determine a.

$$a = \frac{450}{10} - (1.1842)\frac{220}{10} = 45 - 26.05 = 18.95$$

Step 4. Formulate the regression equation.

$$\hat{Y} = 18.95 + 1.1842X$$

Step 5. Predict the copiers sold ($\hat{Y}$) given the following sales calls.

X     45      47      50      54      58      60      65      70       75
Ŷ   72.24   74.61   78.16   82.90   87.63   90.00   95.92   101.84   107.77

If X = 45, then Ŷ = 18.95 + 1.1842(45) = 72.239
If X = 47, then Ŷ = 18.95 + 1.1842(47) = 74.607
If X = 50, then Ŷ = 18.95 + 1.1842(50) = 78.160
If X = 54, then Ŷ = 18.95 + 1.1842(54) = 82.897
If X = 58, then Ŷ = 18.95 + 1.1842(58) = 87.634
If X = 60, then Ŷ = 18.95 + 1.1842(60) = 90.002
If X = 65, then Ŷ = 18.95 + 1.1842(65) = 95.923
If X = 70, then Ŷ = 18.95 + 1.1842(70) = 101.844
If X = 75, then Ŷ = 18.95 + 1.1842(75) = 107.765

Lesson 4. Using Regression to Develop a Forecasting Trend Line

Example 1.
Ten-Year Sales Data for Huntsville Chemicals

Year (X)    Sales (Y)       X²            XY
  2000         7.84      4,000,000     15,680.00
  2001        12.26      4,004,001     24,532.26
  2002        13.11      4,008,004     26,246.22
  2003        15.78      4,012,009     31,607.34
  2004        21.29      4,016,016     42,665.16
  2005        25.68      4,020,025     51,488.40
  2006        23.80      4,024,036     47,742.80
  2007        26.43      4,028,049     53,045.01
  2008        29.16      4,032,064     58,553.28
  2009        33.06      4,036,081     66,417.54
ΣX = 20,045   ΣY = 208.41   ΣX² = 40,180,285   ΣXY = 417,978.01

$$b = \frac{10(417{,}978.01) - (20{,}045)(208.41)}{10(40{,}180{,}285) - (20{,}045)^2} = \frac{4{,}179{,}780.10 - 4{,}177{,}578.45}{401{,}802{,}850 - 401{,}802{,}025} = \frac{2{,}201.65}{825} = 2.6687$$

$$a = \frac{208.41}{10} - (2.6687)\frac{20{,}045}{10} = 20.841 - 5{,}349.41 = -5{,}328.57$$

$$\hat{Y} = -5{,}328.57 + 2.6687X$$

Because the X values (years) are large, b is carried to four decimal places when computing a; rounding b to 2.67 first would shift the intercept by more than two units.

EXERCISE 14 Regression Analysis

Name ________ Rating ________
Course & Curriculum Year ________

1. The McDonald's Corporation would like to find out if there is a relationship between the price of a Big Mac and the net hourly wages of workers around the world and, if there is a relationship, how strong it is. Predict the hourly wages by the price of the Big Mac.

Country           Big Mac Price (U.S.$)    Net Hourly Wage (U.S.$)
Argentina                 1.42                     1.70
Australia                 1.86                     7.80
Brazil                    1.48                     2.05
Britain                   3.14                    12.30
Canada                    2.21                     9.35
Chile                     4.98                     2.80
China                     4.20                     2.40
Czech Republic            1.96                     2.40
Denmark                   4.09                    14.40
Euro Area                 2.98                     9.59
Hungary                   2.49                     3.00
Indonesia                 1.84                     1.50
Japan                     2.18                    13.60
Malaysia                  1.33                     3.40
Mexico                    2.18                     2.00
New Zealand               2.22                     6.80
Philippines               2.24                     1.20
Poland                    1.62                     2.20
Russia                    1.32                     2.60
Singapore                 1.85                     5.40
South Africa              1.85                     3.90
South Korea               2.70                     5.90
Sweden                    3.60                    10.90
Switzerland            [illegible]                17.80
Thailand                  1.38                     1.10
Turkey                    2.34                     3.20
United States             2.71                    14.30

2. The National Highway Association is studying the relationship between the number of bidders on a highway project and the winning (lowest) bid for the project. Of particular interest is whether the number of bidders increases or decreases the amount of the winning bid.
Project    Number of Bidders (X)    Winning Bid (Php M) (Y)
   1                9                   [illegible]
   2                9                       8.0
   3                3                       9.7
   4               10                   [illegible]
   5                5                       7.7
   6               10                       8.5
   7                7                       8.3
   8           [illegible]                  5.5
   9                6                      10.3
  10                6                       8.0
  11                4                   [illegible]
  12                7                       9.4
  13                7                       8.6
  14                7                       8.4
  15                6                       7.8

a. Determine the regression equation. Interpret the equation. Do more bidders tend to increase or decrease the amount of the winning bid?
b. Estimate the amount of the winning bid if there were seven bidders.
c. Determine the coefficient of correlation.
d. Determine the coefficient of determination. Interpret its result.

3. Mr. William Profit is studying companies going public for the first time. He is particularly interested in the relationship between the size of the offering and the price per share. A sample of 15 companies that recently went public revealed the following information.

Company    Size (Php M) (X)    Price/Share (Y)
   1             9.0                10.8
   2            94.4             [illegible]
   3            27.3             [illegible]
   4           179.2             [illegible]
   5         [illegible]         [illegible]
   6            97.9                11.2
   7            83.5                14.0
   8            70.0                10.7
   9           160.7             [illegible]
  10            96.5                10.6
  11            83.0                10.5
  12            23.5                10.3
  13            58.7                10.7
  14            93.8                11.0
  15            34.4                10.8

a. Determine the regression equation. Interpret the equation.
b. Determine the coefficient of correlation.
c. Determine the coefficient of determination. Interpret the result.
d. Do you think Mr. Profit should be satisfied with using the size of the offering as the independent variable? (Justify your answer.)

4. The Balde Shipping Co., of the Province of Albay, is studying the relationship between the distance a shipment must travel and the length of time, in days, it takes the shipment to arrive at the destination. To investigate, Mr. Balde selected a random sample of 20 shipments made last month. Shipping distance is the independent variable and shipping time is the dependent variable. The results are as follows:

Shipment    Distance (km) (X)    Shipping Time (Days) (Y)
    1             656                     5
    2             853                     4
    3             646                     6
    4             783                    11
    5             610                     8
    6         [illegible]                10
    7             785                     9
    8             639                     9
    9             762                    10
   10             762                     9
   11             862                     7
   12             679                     5
   13             835                    13
   14             607                     3
   15             665                     3
   16             647                     7
   17             685                    10
   18             720                     8
   19             652                     6
   20             828                    10
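For checking answers to the regression exercises, the slope and intercept formulas of Lesson 3, together with r and the coefficient of determination r², can be sketched in Python as follows. The function names `fit_line` and `pearson_r` are hypothetical helpers introduced here, not part of the module, and the data are the sales calls and copiers sold from Example 2 of Lesson 3.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when X has no variation")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # Y-axis intercept
    return a, b


def pearson_r(xs, ys):
    """Pearson r from the raw-sum formula of Lesson 1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )


# Lesson 3, Example 2 (hypothetical variable names)
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
copiers = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

a, b = fit_line(calls, copiers)
r = pearson_r(calls, copiers)

print(round(b, 4), round(a, 2))  # 1.1842 18.95
print(round(a + b * 45, 2))      # 72.24, predicted copiers for 45 calls
print(round(r, 3), round(r * r, 3))  # 0.759 0.576
```

Here r² is simply the square of r, which is how the coefficient of determination is usually obtained in simple regression; keeping a and b unrounded until the final step avoids the small discrepancies that appear when rounded coefficients are reused.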
