Workshop 5 Correlation and Simple Linear Regression
Note: Excel’s Data Analysis add-in may need to be enabled before it can be used. It is found on the
right-hand side of the ‘Data’ tab. If it is not there, click the Office button (top left corner; the ‘File’
tab in Excel 2010 and later) and select ‘Excel Options’ (‘Options’ in later versions). On this screen
choose ‘Add-ins’, then press the ‘Go’ button next to ‘Manage Excel Add-ins’. Check the box next to
‘Analysis ToolPak’ and press OK; it should then be available on the ‘Data’ tab.
1. A mail order company has collected data from some customers about their annual salary and
has linked it with the amount spent by the customers in a year. The data is shown below.
(Adapted from: Albright, S.C., Winston, W.L. & Zappe, C. (2002) “Data Analysis and
Decision Making with Microsoft Excel”.)
a. Draw a ‘Scatter graph’ of the data (decide which is the dependent variable first!).
b. Find the correlation between salary and amount spent using the CORREL function in Excel and
interpret the result. (Ans: 0.939)
c. Use Excel’s SLOPE and INTERCEPT functions to find the Least Squares regression
equation where the dependent (y) variable is amount spent and the independent (x) variable
is salary. (Ans: Slope = 0.028886, Intercept = –132.904, Equation: y = –132.904 + 0.028886x)
d. Use the Excel Data Analysis add-in to find the answers to questions b and c above.
e. Use the equation from c. to predict the amount spent by someone with a salary of $50,000.
Is this a valid prediction and why? (Ans: $1311.40)
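For reference, Excel’s SLOPE, INTERCEPT and CORREL evaluate the standard least-squares and Pearson formulas below, where x̄ and ȳ are the sample means of x and y. The last line is a worked check of part e, substituting into the equation quoted in the answer to part c.

```latex
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x},
\qquad
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}

\hat{y} = -132.904 + 0.028886 \times 50\,000 = -132.904 + 1444.3 = 1311.396 \approx 1311.40
```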
2. Data has been collected from four theme parks around the country. Shown below is the
cumulative amount of investment in the parks over the past 4 years and annual profits.
a. Draw a ‘Scatter graph’ of the data (decide which is the dependent variable first!).
b. Find the correlation between investment and profits using Excel and interpret the result.
(Ans: 0.570542)
c. Use Excel’s SLOPE and INTERCEPT functions to find the Least Squares regression
equation where the dependent (y) variable is profits and the independent (x) variable is
investment. (Ans: Slope = 1.930815, Intercept = 2831136.434, Equation: y =
2831136.434 + 1.930815x)
d. Use the Excel Data Analysis add-in to find the answers to questions b and c above.
e. Use the equation from c. to predict the profits for a park with an investment of £500,000.
(Ans: £3,796,543.93)
3. A trainee manager wondered whether the length of time his trainees revised for an exam had
any effect on their results. A random sample of trainees were asked to estimate how long they
spent revising to the nearest hour. After the exam he investigated the relationship between the
two variables.
Trainee                  A   B   C   D   E   F   G   H   I   J
Revision time (hours)    4   9  10  14   4   7  12  22   1  17
Exam mark (%)           31  58  65  73  37  44  60  91  21  84
a. Draw a ‘Scatter graph’ of the data (decide which is the dependent variable first!).
b. Fit a straight line to these data (for example, add a linear trendline to the scatter graph).
c. Find the correlation coefficient. (Ans: 0.976)
d. Find the regression model. (Ans: y = 21.6926 + 3.4707x)
e. Predict the exam mark for a trainee who revises for 15 hours. Is this a valid prediction
and why? (Ans: 74%)
f. Predict the exam mark for a trainee who revises for 35 hours. Is this a valid prediction
and why? (Ans: 143%)
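To check the answers to question 3 outside Excel, here is a minimal Python sketch (standard library only) that applies the least-squares formulas SLOPE, INTERCEPT and CORREL are based on to the trainee data above. The helper names `fit_line` and `predict` are hypothetical, chosen for illustration, and the range check is just one simple way to flag an extrapolation for parts e and f.

```python
# Question 3 data: revision time (hours) and exam mark (%) for trainees A to J
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
marks = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Hypothetical helper: return (slope, intercept, r) by least squares,
    the quantities Excel's SLOPE, INTERCEPT and CORREL report."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def predict(slope, intercept, x_new, x_observed):
    """Hypothetical helper: return the fitted y and whether x_new lies
    inside the observed x range (False means extrapolation)."""
    within_range = min(x_observed) <= x_new <= max(x_observed)
    return intercept + slope * x_new, within_range


slope, intercept, r = fit_line(hours, marks)
print(f"r = {r:.3f}")
print(f"y = {intercept:.4f} + {slope:.4f}x")
for h in (15, 35):
    mark, within_range = predict(slope, intercept, h, hours)
    kind = "interpolation" if within_range else "extrapolation"
    print(f"{h} hours -> {mark:.0f}% ({kind}; observed range {min(hours)}-{max(hours)} hours)")
```

It prints r = 0.976 and y = 21.6926 + 3.4707x, matching the answers to c and d. The 15-hour prediction (74%) lies inside the observed 1 to 22 hours, while the 35-hour prediction (143%, an impossible mark) is flagged as an extrapolation.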
4. Five new brands of washing powder have been the subject of advertising campaigns. The
amount spent and the resulting sales are shown below.
a. Draw a ‘Scatter graph’ of the data (decide which is the dependent variable first!).
b. Find the correlation between advertising budget and sales using Excel and interpret the
result. (Ans: 0.843779)
c. Use Excel’s SLOPE and INTERCEPT functions to find the Least Squares regression
equation where the dependent (y) variable is sales and the independent (x) variable is the
advertising budget. (Ans: Slope = 21.27211, Intercept = 1225396.665, Equation: y =
1225396.665 + 21.27211x)
d. Use the Excel Data Analysis add-in to find the answers to questions b and c above.
e. Use the equation from c. to predict the sales for an advertising budget of £70,000. (Ans:
£2,714,444.37)
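The prediction answers to questions 2e and 4e can be checked by substituting into the equations quoted in their part c. This Python sketch uses the standard `decimal` module rather than floats: 2714444.365 has no exact binary representation, so a float result could round the half-penny either way, whereas `Decimal` keeps the quoted digits exact. The helper names `predict` and `to_pence` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP


def predict(intercept, slope, x):
    """Fitted value y = intercept + slope * x, using exact decimal arithmetic."""
    return Decimal(intercept) + Decimal(slope) * Decimal(x)


def to_pence(value):
    """Round to 2 decimal places, with halves rounded up."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Coefficients are passed as strings so they are used exactly as quoted.
profit_2e = to_pence(predict("2831136.434", "1.930815", 500000))  # question 2e
sales_4e = to_pence(predict("1225396.665", "21.27211", 70000))    # question 4e

print(f"2e: £{profit_2e:,.2f}")
print(f"4e: £{sales_4e:,.2f}")
```

The two lines printed are £3,796,543.93 and £2,714,444.37, agreeing with the answer key.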
5. The following data show the gaming revenue and the hotel revenue, in millions of dollars,
for ten Las Vegas casino hotels (Cornell Hotel and Restaurant Quarterly).
a. Develop a scatter diagram for these data with hotel revenue as the independent variable.
b. Does there appear to be a linear relationship between the two variables?
c. Calculate the correlation coefficient and interpret the result. (Ans: 0.9306)
d. Develop the estimated regression equation relating gaming revenue to hotel revenue. (Ans:
y = 113.5 + 0.938x)
e. Suppose that the hotel revenue was $500 million. What is an estimate of the gaming
revenue? (Ans: $582.76 million)
f. Suppose the hotel revenue was $66.4 million. What would the predicted gaming revenue be?
How does that compare with the actual Primadonna gaming revenue? (Ans: $175.83 million)
g. What is the estimate of the gaming revenue if the hotel revenue is $10 million and $1000
million? Are these valid predictions? If not, why not? (Ans: $122.90 million, $1052.0 million)
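Substituting into the rounded equation from d gives slightly smaller values than the answers to e to g (for example 582.50 rather than 582.76). A plausible explanation, inferred from the arithmetic rather than stated in the sheet, is that the answer key used the unrounded SLOPE and INTERCEPT values. The short Python sketch below, in which `gaming_revenue` is a hypothetical helper, makes the gap visible; referring to the full-precision SLOPE and INTERCEPT cells in Excel avoids it.

```python
# Rounded coefficients from the answer to part d: y = 113.5 + 0.938x ($ million).
# Full-precision values would come from Excel's SLOPE and INTERCEPT cells.
INTERCEPT = 113.5
SLOPE = 0.938


def gaming_revenue(hotel_revenue):
    """Predicted gaming revenue ($ million) from hotel revenue ($ million)."""
    return INTERCEPT + SLOPE * hotel_revenue


for hotel in (500, 66.4, 10, 1000):
    print(f"hotel ${hotel}m -> predicted gaming ${gaming_revenue(hotel):.2f}m")
```

The rounded equation gives 582.50, 175.78, 122.88 and 1051.50, compared with 582.76, 175.83, 122.90 and 1052.0 in the answer key.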
a. Compute the correlation coefficient between takings and floor space. (Ans: 0.4316)
b. Compute the regression line suitable for predicting takings from floor space. What would
you expect the takings of a store with 800 m² of floor space to be?
(Ans: Takings = –452202 + 862.787 × Floorspace. If Floorspace = 800 m², the predicted
takings are £238,028 to the nearest pound.)
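As a worked check of part b, substituting 800 m² into the quoted regression line:

```latex
\widehat{\text{Takings}} = -452202 + 862.787 \times 800 = -452202 + 690229.6 = 238027.6 \approx 238028
```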
a. Calculate the correlation coefficient between takings and number of staff, and comment on
the result. (Ans: 0.202)