
FHMM1034 Tutorial 6-Correlation Regression 201910

This document provides information and questions about regression analysis and correlation for a mathematics tutorial. It includes sample data sets and asks students to: 1) Calculate product-moment correlation coefficients, regression lines, and predicted values for several data sets relating two variables. 2) Interpret correlation coefficients, regression lines, and coefficients of determination. 3) Check assumptions and interpret results for linear regression analyses of different relationships.

Uploaded by

JUN JET LOH
Copyright © All Rights Reserved

FHMM 1034 Mathematics III TUTORIAL 6 1

UNIVERSITI TUNKU ABDUL RAHMAN


ACADEMIC SESSION 201910
FHMM1034 MATHEMATICS III
FOUNDATION IN SCIENCE

TUTORIAL 6 : REGRESSION AND CORRELATION

1. The following data show the IQ and the score in an English test of a sample of
10 pupils taken from a mixed ability class.
The English test was marked out of 50 and the range of IQ values for the class
was 80 to 140.

Pupil A B C D E F G H I J
IQ(x) 110 107 127 100 132 130 98 109 114 124
English(y) 26 31 37 20 35 34 23 38 31 36

(a) Estimate the product-moment correlation coefficient for the class.


(b) Find the equation of the regression line of y on x
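As a cross-check on parts (a) and (b), the sketch below uses the usual computational formulas SXX = Σx² − (Σx)²/n, SYY = Σy² − (Σy)²/n and SXY = Σxy − (Σx)(Σy)/n, then r = SXY/√(SXX·SYY) and the line y = a + bx with b = SXY/SXX and a = ȳ − b·x̄. It is a minimal standard-library Python sketch; the helper name `summary_stats` is our own and not part of the tutorial.

```python
from math import sqrt

def summary_stats(x, y):
    """Return (Sxx, Syy, Sxy) via Sxx = Σx² − (Σx)²/n, etc."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x) - sx * sx / n
    syy = sum(yi * yi for yi in y) - sy * sy / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    return sxx, syy, sxy

iq = [110, 107, 127, 100, 132, 130, 98, 109, 114, 124]       # x
english = [26, 31, 37, 20, 35, 34, 23, 38, 31, 36]           # y

sxx, syy, sxy = summary_stats(iq, english)
r = sxy / sqrt(sxx * syy)        # product-moment correlation coefficient
b = sxy / sxx                    # slope of the regression line of y on x
a = sum(english) / len(english) - b * sum(iq) / len(iq)   # a = ȳ − b·x̄

print(f"Sxx = {sxx:.1f}, Syy = {syy:.1f}, Sxy = {sxy:.1f}")
print(f"r = {r:.4f}; y = {a:.3f} + {b:.4f}x")
```

Carrying full precision gives a ≈ −11.796; the answer key's −11.798 comes from rounding b to 0.3727 before computing a.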

2. The relationship between the number of hours of revision a day, x, and the
marks, y, obtained by 9 students for a paper in an examination is shown in the
following table.

x 4 5 6 7 8 9 10 11 12
y 55 60 53 70 72 68 80 90 85

(a) Find the linear correlation coefficient.


(b) Find the equation of the regression line of y on x.
(c) Find the expected mark of a student who revises 8.5 hours a day for
the paper.

3. Let Pearson’s correlation coefficient between variables x and y for a random
sample be r.
(a) What does r measure?
(b) State the range of possible values of r.
(c) What is the effect of a change in the unit of measurement of either
variable on the value of r?
A sample of 10 data points may be summarized as follows:
Σ(x − x̄)² = 600.1,  Σ(y − ȳ)² = 444.4,  Σ(x − x̄)(y − ȳ) = 466.2
Calculate Pearson’s correlation coefficient between x and y. Comment on your
answer.
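The summary sums given above are exactly SXX, SYY and SXY, so r = SXY/√(SXX·SYY). The sketch also illustrates part (c): multiplying x by a positive constant k (a change of units; k = 100 here is a hypothetical choice) multiplies SXX by k² and SXY by k, and k cancels in r.

```python
from math import sqrt

# Summary statistics given in the question (n = 10)
sxx, syy, sxy = 600.1, 444.4, 466.2
r = sxy / sqrt(sxx * syy)
print(f"r = {r:.4f}")

# Part (c): rescale x by k > 0 (hypothetical unit change, e.g. m -> cm).
# Sxx scales by k**2 and Sxy by k, so r is unchanged.
k = 100.0
r_rescaled = (k * sxy) / sqrt(k ** 2 * sxx * syy)
print(f"r after rescaling x = {r_rescaled:.4f}")
```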

4. The equation of the line of regression of the monthly wages, RM y, on the
work experience, x years, of workers in a factory is given by y = a + 42x,
with 0 ≤ x ≤ 12.
(a) Give the interpretation of the constant a in the equation.
(b) State whether the equation can be used to predict the monthly wage of
a worker who has a work experience of 25 years. Give reasons for your
answer.
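Part (b) turns on extrapolation: the line was fitted only for 0 ≤ x ≤ 12. A small guard function makes that restriction explicit. This is a hypothetical sketch: the function name `predict_wage` and the intercept value 1500 are assumptions for illustration only, since the question does not give a.

```python
def predict_wage(x, a, b=42.0, x_min=0.0, x_max=12.0):
    """Predict the monthly wage (RM) from experience x (years) using y = a + b*x.

    Raises ValueError when x lies outside the range used to fit the line,
    because the linear relationship is not known to hold there.
    """
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the fitted range [{x_min}, {x_max}]; "
            "extrapolation is unreliable"
        )
    return a + b * x

a_hypothetical = 1500.0   # assumed value for illustration; a is not given
print(predict_wage(10, a_hypothetical))   # within range: 1920.0
try:
    predict_wage(25, a_hypothetical)      # part (b): 25 years
except ValueError as err:
    print(err)
```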

5. The following table shows the length of time that six persons have been
working at an automobile inspection station and the number of cars each of
them checked between noon and 1 o’clock on a given day.

No. of weeks employed, x    Cars checked, y
 5                          16
 1                          15
 7                          19
 9                          23
 2                          14
12                          21

(a) Find the regression line that will enable us to predict y in terms of x.
(b) If a person has worked at the inspection station for 10 weeks, how
many cars can we expect him or her to inspect during the given time
period?
(c) What percentage of the variation in the number of cars checked can
be attributed to differences in the length of time that the persons have
been working at the inspection station?
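Because the data are small integers, the whole calculation can be done in exact rational arithmetic with Python's `fractions.Fraction`, which shows that the slope is exactly 67/88 and the coefficient of determination exactly 4489/5632 before any rounding. This is a sketch using the deviation form SXX = Σ(x − x̄)², SYY = Σ(y − ȳ)², SXY = Σ(x − x̄)(y − ȳ).

```python
from fractions import Fraction

weeks = [5, 1, 7, 9, 2, 12]        # x: weeks employed
cars = [16, 15, 19, 23, 14, 21]    # y: cars checked
n = len(weeks)

# Exact rational arithmetic avoids rounding in the intermediate steps
x_bar = Fraction(sum(weeks), n)
y_bar = Fraction(sum(cars), n)
sxx = sum((x - x_bar) ** 2 for x in weeks)
syy = sum((y - y_bar) ** 2 for y in cars)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, cars))

b = sxy / sxx                       # exact slope
a = y_bar - b * x_bar               # exact intercept
r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination

print(f"(a) b = {b} ≈ {float(b):.4f}, a = {a} ≈ {float(a):.4f}")
print(f"(b) 10 weeks: {float(a + b * 10):.2f} cars")
print(f"(c) r^2 = {r_squared} ≈ {100 * float(r_squared):.2f}%")
```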

6. The following table gives information on ages and cholesterol levels for a
random sample of 10 men.

Age                 58   69   43   39   63   52   47   31   74   36
Cholesterol level  189  235  193  177  154  191  213  175  198  181

(a) Taking age as an independent variable and cholesterol level as a
dependent variable, compute SXX, SYY and SXY.
(b) Calculate r and r² and explain what they mean.
(c) Find the regression of cholesterol level on age.
(d) Plot the scatter diagram and the regression line
(e) Predict the cholesterol level of a 60-year-old man.
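The sketch below works through parts (a), (b), (c) and (e) with the computational formulas, printing SXX, SYY and SXY in the order the question asks for them. For part (d), two points on the line (at the youngest and oldest ages in the sample) are enough to draw it by hand. The helper name `predict` is our own.

```python
from math import sqrt

age = [58, 69, 43, 39, 63, 52, 47, 31, 74, 36]             # x, independent
chol = [189, 235, 193, 177, 154, 191, 213, 175, 198, 181]  # y, dependent
n = len(age)

# Computational formulas: Sxx = Σx² − (Σx)²/n, and similarly for Syy, Sxy
sxx = sum(x * x for x in age) - sum(age) ** 2 / n
syy = sum(y * y for y in chol) - sum(chol) ** 2 / n
sxy = sum(x * y for x, y in zip(age, chol)) - sum(age) * sum(chol) / n

r = sxy / sqrt(sxx * syy)
b = sxy / sxx
a = sum(chol) / n - b * sum(age) / n

def predict(x):
    """Predicted cholesterol level at age x from the fitted line."""
    return a + b * x

print(f"(a) Sxx = {sxx:.1f}, Syy = {syy:.1f}, Sxy = {sxy:.1f}")
print(f"(b) r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"(c) y = {a:.3f} + {b:.4f}x")
# (d) two points at the ends of the observed age range fix the line
for x in (min(age), max(age)):
    print(f"    line passes through ({x}, {predict(x):.1f})")
print(f"(e) age 60: {predict(60):.2f}")
```

Carrying full precision gives a ≈ 162.785; the answer key's 162.78 comes from rounding b to 0.5433 first.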

7. The following table gives information on the incomes (in thousands of RM)
and charitable contributions (in hundreds of RM) for a random sample of 10
households.

Income 33 23 82 47 26 71 28 39 58 17
Contributions 10 4 29 23 3 28 8 16 18 1

(a) Taking income as an independent variable and charitable contributions
as a dependent variable, compute SXX, SYY and SXY.
(b) Calculate r and r² and explain what they mean.
(c) Find the regression of contributions on income.
(d) Plot the scatter diagram and the regression line
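To avoid recomputing the same sums for each question, the calculation can be wrapped in one reusable function that returns every quantity at once. This is a sketch; the names `Fit` and `least_squares` are hypothetical helpers introduced here.

```python
from math import sqrt
from typing import NamedTuple

class Fit(NamedTuple):
    sxx: float
    syy: float
    sxy: float
    r: float   # Pearson's correlation coefficient
    a: float   # intercept
    b: float   # slope

def least_squares(x, y):
    """Least squares line y = a + b x and Pearson's r from paired data."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx
    return Fit(sxx, syy, sxy, sxy / sqrt(sxx * syy), sum(y) / n - b * sum(x) / n, b)

income = [33, 23, 82, 47, 26, 71, 28, 39, 58, 17]       # RM thousands
contributions = [10, 4, 29, 23, 3, 28, 8, 16, 18, 1]     # RM hundreds

fit = least_squares(income, contributions)
print(f"(a) Sxx = {fit.sxx:.2f}, Syy = {fit.syy:.0f}, Sxy = {fit.sxy:.0f}")
print(f"(b) r = {fit.r:.4f}, r^2 = {fit.r ** 2:.4f}")
print(f"(c) y = {fit.a:.3f} + {fit.b:.4f}x")
```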

8. The following table gives information on GPAs and starting salaries (rounded
to the nearest hundred RM) of seven college graduates.

GPA                        2.90  3.81  3.20  2.42  3.94  2.05  2.25
Starting salary (RM '00)     23    28    23    21    32    19    22

(a) With GPA as an independent variable and starting salary as a
dependent variable, compute SXX, SYY and SXY.
(b) Find the least squares regression line.
(c) Interpret the meaning of the values of a and b calculated in part (b).
Comment on the reliability of the value of a.
(d) Plot the scatter diagram and the regression line.
(e) Calculate r. Interpret the results.
(f) Calculate also the coefficient of determination and briefly explain what
it means.
(g) Predict the starting salary when the GPA is 3.5.
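The salaries are recorded in hundreds of RM, so predictions must be multiplied by 100 to read in RM, and the intercept a is a prediction at GPA = 0, far below the observed range. The sketch below handles both points; `predict_rm` is a hypothetical helper name, and the range warning is our own addition for illustration.

```python
from math import sqrt

gpa = [2.90, 3.81, 3.20, 2.42, 3.94, 2.05, 2.25]
salary = [23, 28, 23, 21, 32, 19, 22]   # starting salary in hundreds of RM
n = len(gpa)

sxx = sum(x * x for x in gpa) - sum(gpa) ** 2 / n
syy = sum(y * y for y in salary) - sum(salary) ** 2 / n
sxy = sum(x * y for x, y in zip(gpa, salary)) - sum(gpa) * sum(salary) / n

b = sxy / sxx
a = sum(salary) / n - b * sum(gpa) / n
r = sxy / sqrt(sxx * syy)

def predict_rm(g):
    """Predicted starting salary in RM (the fitted line works in RM hundreds)."""
    if not min(gpa) <= g <= max(gpa):
        print(f"warning: GPA {g} is outside the observed range {min(gpa)} to {max(gpa)}")
    return 100 * (a + b * g)

print(f"(a) Sxx = {sxx:.4f}, Syy = {syy:.2f}, Sxy = {sxy:.2f}")
print(f"(b) y = {a:.3f} + {b:.3f}x")
print(f"(e) r = {r:.4f}, (f) r^2 = {r * r:.4f}")
print(f"(g) GPA 3.5 -> RM {predict_rm(3.5):.2f}")
predict_rm(0)   # the intercept a is an extrapolation, so this prints the warning
```

Full precision gives r ≈ 0.9281 and about RM 2711.19 for part (g); the answer key's figures differ in the last digit, presumably because intermediate values such as SXX = 3.364 were rounded.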

9. Eight students, randomly selected from a large class, were asked to keep a
record of the hours they spent studying before the midterm examination. The
following table gives the number of hours these eight students studied before
the midterm and their midterm scores.

Hours of study 15 7 12 8 18 6 9 11
Midterm score 97 78 87 92 89 57 74 69

(a) Do the midterm scores depend on hours of study or do hours of study
depend on the midterm scores? Do you expect a positive or a negative
relationship between these two variables?
(b) Taking hours of study as an independent variable and midterm scores as
a dependent variable, compute SXX, SYY and SXY.
(c) Find the least squares regression line.
(d) Interpret the meaning of the values of a and b calculated in part (c). Is the
predicted value of a reliable? Give a reason.
(e) Plot the scatter diagram and the regression line.
(f) Calculate r. Interpret the results.
(g) Calculate also the coefficient of determination. Interpret the results.
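Here the sums are computed in deviation form, SXX = Σ(x − x̄)² and so on, which is algebraically equal to the computational formulas. The sketch also checks the two judgements asked for in parts (a) and (d): the sign of the slope, and whether x = 0 lies inside the observed range of study hours.

```python
from math import sqrt

hours = [15, 7, 12, 8, 18, 6, 9, 11]      # x: hours of study (independent)
score = [97, 78, 87, 92, 89, 57, 74, 69]  # y: midterm score (dependent)
n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(score) / n

# Deviation form of the sums of squares and products
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))

b = sxy / sxx
a = y_bar - b * x_bar
r = sxy / sqrt(sxx * syy)

print("(a) relationship is", "positive" if b > 0 else "negative")
print(f"(b) Sxx = {sxx:.3f}, Syy = {syy:.3f}, Sxy = {sxy:.3f}")
print(f"(c) y = {a:.3f} + {b:.3f}x")
# (d) a is the prediction at x = 0; is that inside the observed range?
print(f"    x = 0 inside observed range? {min(hours) <= 0 <= max(hours)}")
print(f"(f) r = {r:.4f}, (g) r^2 = {r * r:.4f}")
```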

10. Consider the following pairs of measurements:

x  5  3  -1  2  7  6  4
y  4  3   0  1  8  5  3
(a) Construct a scatter diagram for these data.
(b) Calculate r. Interpret the results.
(c) Explain why it is necessary to plot a scatter diagram even though the
calculated value of the correlation coefficient r is high.
(d) Calculate also the coefficient of determination. Interpret the results.
(e) Find the least squares line.
(f) Interpret the slope of the least squares line.

(g) Predict the value of y if x = 20. Is the predicted value reliable? Give a
reason.
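The sketch below fits the line, flags the part (g) prediction at x = 20 as an extrapolation, and illustrates part (c) with a hypothetical data set that is exactly quadratic (y = x² for x = 1 to 10): it still gives a high r, which is why a scatter diagram should be checked for curvature before trusting a linear model. The helper name `fit` is our own.

```python
from math import sqrt

def fit(x, y):
    """Return (a, b, r) for the least squares line y = a + b x."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx
    return sum(y) / n - b * sum(x) / n, b, sxy / sqrt(sxx * syy)

x = [5, 3, -1, 2, 7, 6, 4]
y = [4, 3, 0, 1, 8, 5, 3]
a, b, r = fit(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, y = {a:.4f} + {b:.4f}x")

# Part (g): x = 20 lies far outside the observed x values (-1 to 7)
x_new = 20
inside = min(x) <= x_new <= max(x)
print(f"prediction at x = {x_new}: {a + b * x_new:.4f}",
      "" if inside else "(extrapolated, unreliable)")

# Part (c): hypothetical, exactly quadratic data still give a high r
xs = list(range(1, 11))
_, _, r_curved = fit(xs, [v * v for v in xs])
print(f"r for y = x^2 on x = 1..10: {r_curved:.4f}")
```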

11. A student conducted an experiment to determine the specific heat capacity of
water. He recorded the time (x) and the temperature (y) of the water as it
cooled down. The data were recorded as follows:

Time (x) in seconds      0   30   60   90  120  150  180  210  240  270  300
Temperature (y) in °C   40   37   35   34   33   32   31   30   28   27   26

(a) Determine the values of SXX, SYY and SXY.
(b) Find the least squares line.
(c) Interpret the y-intercept and slope of the least squares line.
(d) Calculate the correlation coefficient. Interpret the result.
(e) Calculate the coefficient of determination. Interpret the result.
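In this experiment y falls as x rises, so SXY, the slope and r are all negative. The sketch computes the quantities for parts (a) to (e) and, as an extra reading of the slope, converts it from °C per second to °C per minute.

```python
from math import sqrt

time_s = list(range(0, 301, 30))                    # 0, 30, ..., 300 seconds
temp_c = [40, 37, 35, 34, 33, 32, 31, 30, 28, 27, 26]
n = len(time_s)

sxx = sum(t * t for t in time_s) - sum(time_s) ** 2 / n
syy = sum(v * v for v in temp_c) - sum(temp_c) ** 2 / n
sxy = sum(t * v for t, v in zip(time_s, temp_c)) - sum(time_s) * sum(temp_c) / n

b = sxy / sxx        # °C per second; negative because the water is cooling
a = sum(temp_c) / n - b * sum(time_s) / n
r = sxy / sqrt(sxx * syy)

print(f"(a) Sxx = {sxx:.0f}, Syy = {syy:.4f}, Sxy = {sxy:.0f}")
print(f"(b) y = {a:.1f} + ({b:.4f})x")
print(f"(c) cooling rate about {-b * 60:.2f} °C per minute")
print(f"(d) r = {r:.4f}, (e) r^2 = {r * r:.4f}")
```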

12. In an experiment to determine the total resistance of resistors in different
configurations (in parallel and in series), the voltage V across the resistors and
the current I are measured. It is known that the values of V depend on the
current that flows in the circuit. The data obtained are tabulated as follows:

Current, I   0.16  0.20  0.24  0.28  0.32  0.36
Voltage, V   2.80  3.20  3.60  4.20  4.80  5.20

(a) According to linear regression theory, what are the assumptions made
about
(i) the relationship between I and V?
(ii) the current reading?
(b) Find the least squares line.
(c) Calculate the correlation coefficient. Interpret the result.
(d) Calculate the coefficient of determination. Interpret the result.
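Since V depends on I, the regression is of V on I. Under Ohm's law (V = IR) the slope of that line estimates the total resistance; the units (amperes and volts, giving ohms) are our assumption, as the table does not state them. A minimal sketch:

```python
from math import sqrt

current = [0.16, 0.20, 0.24, 0.28, 0.32, 0.36]   # I (assumed amperes)
voltage = [2.80, 3.20, 3.60, 4.20, 4.80, 5.20]   # V (assumed volts)
n = len(current)

sxx = sum(i * i for i in current) - sum(current) ** 2 / n
syy = sum(v * v for v in voltage) - sum(voltage) ** 2 / n
sxy = sum(i * v for i, v in zip(current, voltage)) - sum(current) * sum(voltage) / n

b = sxy / sxx
a = sum(voltage) / n - b * sum(current) / n
r = sxy / sqrt(sxx * syy)

print(f"(b) V = {a:.4f} + {b:.4f}I")
print(f"(c) r = {r:.4f}, (d) r^2 = {r * r:.4f}")
# Under Ohm's law the slope estimates the total resistance
# (in ohms if I is in amperes and V in volts; units assumed).
print(f"estimated resistance about {b:.2f}")
```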

Answers:

1. (a) r = 0.7452 (b) y = −11.798 + 0.3727x

2. (a) 0.9219 (b) y = 35.397 + 4.367 x (c) 72.52

3. (a) r measures the strength of the linear relationship between the paired
x- and y-values in a sample.
(b) −1 ≤ r ≤ 1.
(c) The change in units of the variables has no effect on r.
r = 0.9028.
There is a strong positive linear relationship between x and y.

4. (a) RM a is the expected monthly wage of a worker with no work
experience.
(b) No. x = 25 lies far outside the range of x used to fit the line
(0 ≤ x ≤ 12), so the prediction is unreliable.

5. (a) ŷ = 13.4318 + 0.7614x (b) 21 (c) 79.71%

6. (a) SXX = 1895.6, SYY = 4396.4, SXY = 1029.8
(b) r = 0.3567, r² = 0.1273
r = 0.3567 indicates that cholesterol level and age have a weak positive
linear relationship.
r² indicates that 12.73% of the variation in the cholesterol level can be
explained by the variation in the age of the men.
(c) ŷ = 162.78 + 0.5433x
(d) [Scatter diagram of cholesterol level against age, with the regression line]

(e) 195.38

7. (a) SXX = 4248.40, SYY = 964, SXY = 1920
(b) r = 0.9487, r² = 0.9
Contributions and income have a strong positive linear relationship.
90% of the variation in contributions can be explained by the
variation in income.
(c) ŷ = -5.162 + 0.4519x

(d) [Scatter diagram of contributions against income, with the regression line]

8. (a) SXX = 3.364, SYY = 120, SXY = 18.65
(b) ŷ = 7.712 + 5.543x
(c) If a student has a GPA of zero, the expected starting salary is RM 771.
If the GPA increases by 1 unit, the starting salary is expected to
increase by RM 554.30.
a is unreliable as GPA = 0 lies outside the observed range
2.05 ≤ GPA ≤ 3.94.
(d) [Scatter diagram of starting salary (RM '00) against GPA, with the regression line]

(e) 0.9282. The GPA and the starting salary (in RM '00) have a strong
positive linear relationship.
(f) 0.8616. About 86.16% of the variation in the starting salary can be explained
by the variation in the GPA.
(g) RM 2711.21

9. (a) Midterm scores depend on hours studied, and we expect a positive
relationship between these two variables.

(b) SXX = 119.5, SYY = 1251.875, SXY = 237.75
(c) ŷ = 58.98 + 1.99x
(d) When a student does not study at all (zero hours of study), the predicted
average midterm score is 58.98.
When the number of hours of study increases by 1 hour, the midterm
score is expected to increase by 1.99 marks, on average.
a is unreliable as zero hours of study lies outside the observed
range 6 ≤ hours of study ≤ 18.

(e) [Scatter diagram of midterm score against study hours, with the regression line]

(f) 0.6147. There is a moderately strong, positive, linear correlation
between hours of study and midterm scores.
(g) 0.3778. 37.78% of the variation in the midterm scores can be
explained by the variation in the hours of study.

10. (a) [Scatter diagram of y against x]

(b) 0.9365 or 0.9364
There is a strong positive linear correlation between x and y.
(c) A high value of r does not necessarily mean that the relationship between
x and y is linear, and the value of r can easily be affected by outliers.

(d) 0.8770 or 0.8768
About 87.68% of the variation in y is explained by the variation in x.
(e) ŷ = 0.0196 + 0.9178x
(f) When x increases by 1 unit, y is expected to increase by 0.9178 units.
(g) 18.3756.
The prediction is unreliable because x = 20 lies far outside the observed
range of values of x.

11. (a) SXX = 99000, SYY = 184.9091, SXY = −4230
(b) ŷ = 38.5 − 0.0427x
(c) The y-intercept, 38.5 °C, is the estimated temperature of the water at the
start of timing (x = 0).
The slope, −0.0427, means the temperature of the water drops by about
0.0427 °C per second on average.
(d) −0.9887.
There is a strong negative correlation between the time and the
temperature of the water as it cooled.
(e) 0.9774.
97.74% of the variation in temperature can be explained by the passage of time.

12. (a) (i) There is a linear relationship between V and I.
(ii) The variable I is the independent variable that can be measured
without errors.
The errors associated with different observations are
independent of one another.
(b) V̂ = 0.7352 + 12.4286I
(c) 0.9968. There exists a strong positive correlation between V and I.
(d) 0.9936. 99.36% of the variation in V is due to variation in I.
