
SCI 1020 - wk2

Worksheet

Uploaded by

Luk Ee Ren
Copyright
© All Rights Reserved

SCI1020: Introduction to Statistical Reasoning

WEEK 2: LINEAR REGRESSION


EXPLORING DATA: Relationship between Two Quantitative Variables
Student's Name: Tutorial Day/Time:

PRELIMINARY READING: D S Moore et al, “The Basic Practice of Statistics”, Chs 4-5.


On completion of this workshop you should be able to:
1. Produce a scatterplot of quantitative data with appropriate explanatory and response axes;
2. Recognise a linear pattern and the general formula for a straight line;
3. Calculate a predicted value given the equation of the linear regression line;
4. Add a linear line of best fit to data using MS EXCEL, and describe the regression line (equation and correlation);
5. Assess the closeness of fit using the least-squares criterion as reflected in the correlation coefficient;
6. Obtain residual values and interpret their size and distribution about the line in the form of a residual plot;
7. Find or calculate and interpret the squared correlation, r².

PRELIMINARY QUESTIONS:
These problems are to help you engage with the lecture material, and also to make sure that everyone is
up to speed before the workshop starts. Please make sure you do them before class each week!
Q.1 State in your own words what is meant by each of the terms listed below. Be specific.

Term Definition

Explanatory variable

Response Variable

Association

Correlation

Regression line

Residual

Q.2 What is the general equation of a straight line? Define all the terms in the equation.

Week 2 Copyright 2021: Monash University Page | 1


Q.3 Do Q5.2 from Moore et al text, p.130.
What is the regression line equation based on the description of the trend in this example?

WORKSHOP PROBLEMS:
Q.4 Demonstration of correlation and least squares regression.
a) Go to the website http://digitalfirst.bfwpub.com/stats_applet/stats_applet_5_correg.html (note
that the apparent spaces in the URL are underscores, "_").
Create a scatterplot with a linear trend (similar to plot #1 below). Observe the size of the correlation
coefficient for different scatter patterns. Use “Draw your own line” to draw a line of best fit.
Change the intercept and slope, trying to minimise the sum of the squares of the residuals as shown
by the “relative SS” value. Compare your line with the “Show least-squares line” option, which places
the line by calculation. No written answers are required here; just observe the values.
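The idea behind the applet exercise can be sketched in Python. This is only an illustration under stated assumptions: the five data points and the hand-drawn guess (intercept 0.5, slope 1.8) are invented, the helper name `sum_squared_residuals` is hypothetical, and the plain sum of squares computed here is not necessarily the same quantity as the applet's “relative SS”.

```python
import numpy as np

# Hypothetical illustrative data (not taken from the applet or the textbook).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(intercept, slope, x, y):
    """Sum of squared vertical distances between the data and a line."""
    predicted = intercept + slope * x
    return float(np.sum((y - predicted) ** 2))

# A hand-drawn guess at the line of best fit (assumed values).
guess_ss = sum_squared_residuals(0.5, 1.8, x, y)

# Least-squares line: for degree 1, np.polyfit returns [slope, intercept].
ls_slope, ls_intercept = np.polyfit(x, y, 1)
ls_ss = sum_squared_residuals(ls_intercept, ls_slope, x, y)

# Correlation coefficient for the same data.
r = np.corrcoef(x, y)[0, 1]

print(f"guessed line SS       = {guess_ss:.3f}")
print(f"least-squares line SS = {ls_ss:.3f}")
print(f"r = {r:.3f}")
```

Whatever intercept and slope are tried for the guess, its sum of squares cannot fall below that of the least-squares line, which is what the applet comparison is meant to show.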
b) Describe the relationship in the x-y data plotted below:
[Figure: three scatterplots.
Plot 1. Quiz score vs chocolate consumption: Quiz score (%) against Daily chocolate consumption (g).
Plot 2. Change in pulse rate with exercise: Pulse rate after exercise (beats per minute) against Pulse rate before exercise (beats per minute).
Plot 3. Measured radioactive decay: Counts per minute against Time (mins).]

Identify the association (positive/negative/none) and correlation (strong/moderate/weak/none) present.


PLOT 1 2 3

Association

Correlation
Estimate r (if appropriate)

REGRESSION ANALYSIS FOR LINEAR DATA


Also see “Introduction to Excel” Section 2.7 pp.13-15.
Using Excel to produce a scatterplot of the data and add a LINEAR line of best fit:
• Plot the response variable (y-axis) against the explanatory variable (x-axis). Excel treats the left-hand column as x;
• Select the chart layout that has the line and fx so that you obtain the equation of the line of best fit;
• Note the correlation coefficient, r, and its square, R²;
• Note the coefficients, which are the intercept and slope for the equation;
• Obtain the full regression analysis including a residual plot by using Data Analysis/Regression. Note
that it asks for the y-data column first, and you need the data in columns, not rows;
• Check the appropriateness of linearity by interpreting the residual plot. Describe the scatter or pattern
in the residual plot: a random scatter of residuals, positive and negative, along the added line of best fit
indicates that a linear model IS appropriate. This is important. Data can often look linear, but a closer check
often reveals that a different trend is present!
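For readers who want to cross-check the Excel steps above outside Excel, the same quantities can be sketched in Python with NumPy. This is an assumption-laden illustration, not part of the worksheet's method: the data are invented (y is exactly x squared) to show how a high correlation can coexist with a clearly patterned set of residuals.

```python
import numpy as np

# Hypothetical data, invented for illustration: y is exactly x squared,
# so the true trend is curved even though a straight line fits it closely.
x = np.arange(1, 9, dtype=float)
y = x ** 2

# Least-squares line; for degree 1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Correlation coefficient r and its square.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# Residual = observed y minus the y predicted by the line.
residuals = y - (intercept + slope * x)

print(f"line: y = {intercept:.2f} + {slope:.2f} x")
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
for xi, ei in zip(x, residuals):
    print(f"x = {xi:.0f}  residual = {ei:+.2f}")
```

Here r is above 0.95, yet the residuals are positive at both ends and negative in the middle. That curved pattern, rather than a random scatter, is the sign that a straight line is not the right model for these made-up data.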

Q.5 Do Q4.29 (7th ed: Q4.28) from Moore et al, p.121 and with the same data, Q5.39, p.155.
Download the data set “Sparrowhawk” from the Moodle page/Part 1: Exploring Data.
Produce the scatterplot of the relationship in the (x,y) data.
Describe the association between New Adults arriving and the percentage of returning birds:

What is the general strength of the association?

EXTRA: What is the R-squared value?


What does this R-squared value specifically tell us about this association?

ALSO: Apply linear regression analysis using Excel. Include the residual plot.
TUTOR CHECK OF PLOTS: Scatterplot and Residual plot
(Not done in class? You must attach your plots printed out)
ALSO: Describe the residual plot. Is there any trend: is there a curve of data about the line of best
fit OR are the data points randomly scattered either side along the linear trend line?
What does this tell you about fitting a linear model to these data?

Moore Q5.39 a) What is the equation of the linear model for this relationship?
Do not use x and y designations but replace them with a descriptive notation for the variables.

Moore Q5.39 b) For these sparrowhawk data:


Value of the Slope =
What does the slope actually indicate (use the actual value to explain size of influence of x on y)?

Moore Q5.39 c) Use the model to predict the number of new adults if 45% of adults from the previous
year return:

EXTRA: Verify the value of the residual for the data point where x = 45. Show the full
calculation of the residual.
Residual at each x = (observed y value − predicted ŷ value)

Check your value against the Excel output for x = 45.
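The arithmetic of a prediction and a residual can be laid out as a short Python sketch. Everything numeric here is a placeholder chosen for easy checking: the intercept (10.0), slope (2.0), x value (3.0) and observed y (17.5) are hypothetical and are not the sparrowhawk values, and the function names `predict` and `residual` are illustrative only.

```python
def predict(intercept, slope, x):
    """Predicted value (y-hat) from a regression line y-hat = intercept + slope * x."""
    return intercept + slope * x

def residual(observed_y, predicted_y):
    """Residual = observed y value minus the value predicted by the line."""
    return observed_y - predicted_y

# Placeholder numbers only (assumed for illustration, not from the data set).
intercept, slope = 10.0, 2.0
x_value, observed_y = 3.0, 17.5

predicted_y = predict(intercept, slope, x_value)
res = residual(observed_y, predicted_y)

print(f"predicted y = {predicted_y}")
print(f"residual    = {res}")
```

A positive residual means the observed point lies above the line, and a negative one means it lies below.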

Q.6 Appropriateness? Causation? Application? ….


Explain at least two cautions that you should observe when interpreting an x-y
relationship using linear regression analysis.
Consider Moore et al, Chapter 5 summary, 8th ed, pp.152-153 (7th ed: pp 151-152).

From this exercise, you should make sure you are able to:
• draw a scatterplot
• obtain a line of best fit for linear data and identify its equation
• obtain a residual plot
• obtain the correlation coefficient
• state what each of the items above tells you about the data.

MARK : /10

