
Chapter 9: Correlation & Simple Linear Regression

This document discusses correlation and simple linear regression. It explores the relationship between waist circumference and body fat percentage through correlation analysis and linear regression modeling. Specifically, it examines whether there is a linear relationship between waist and fat, how strong the relationship is, and how well fat can be predicted from waist measurements. Key aspects covered include correlation coefficients, the coefficient of determination, the assumptions of linear regression models, and the interpretation of regression coefficients and goodness of fit.

Uploaded by

Kadir Ozcan
Copyright © All Rights Reserved

Chapter 9: Correlation & Simple Linear Regression


Waist & Fat
• Example 9.3.1 (p. 413)
• Expecting a “linear relationship”
• Research questions:
• Waist is easy to measure and can be measured accurately
• Fat is not easy to measure, and its accuracy is hard to judge
• It helps if we can establish a relationship between Waist & Fat
• Does a linear relationship exist between Waist & Fat?
• How strong is this linear relationship?
• How well can we predict Fat from Waist?
Correlation
• Linear relationship between two continuous variables
• “Scatter plot”
• Correlation coefficient R(X, Y)
• -1 ≤ R ≤ 1: strength & direction of the linear relationship between X & Y
• 0 < R < 1: positive linear correlation; Y tends to increase as X increases
• -1 < R < 0: negative linear correlation; Y tends to decrease as X increases
• R = 1 or R = -1: perfect linear correlation
• R = 0: no linear correlation
• Coefficient of determination R²
• % of variability in Y (or X) that can be explained by X (or Y)
• Estimation & testing
• Assumption: (X, Y) come from a bivariate normal population
• H0: ρ = 0 vs. H1: ρ ≠ 0, where ρ is the population correlation estimated by R
• SAS proc corr
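As a rough illustration of what proc corr computes, here is a minimal Python sketch of the sample correlation R, its square R², and the usual t statistic for H0: ρ = 0, t = R·√(n−2)/√(1−R²) on n − 2 degrees of freedom. The function names and the five data points are hypothetical toy values, not the Waist & Fat data; the p-value itself would come from a t table or from SAS.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient R(X, Y)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-products and sums of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0; compare with a t distribution on n - 2 df."""
    if abs(r) >= 1:
        raise ValueError("t statistic is undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical toy data (NOT the Waist & Fat data from Example 9.3.1)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"R = {r:.4f}, R^2 = {r ** 2:.4f}, t = {corr_t_statistic(r, len(x)):.4f}")
```

For these toy values R ≈ 0.7746, R² = 0.6 and t ≈ 2.1213 on 3 degrees of freedom.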
Simple Linear Regression
• Linear relationship between two continuous variables
• Predicting the value of Y based on X
• X: covariate (Waist)
• Y: response (Fat)
• Relationship, not causality!
• A man with a larger waist circumference also tends to have higher body fat, but that does not mean a larger waist circumference causes higher body fat!
Linear Regression: Model
• Model: $Y = \beta_0 + \beta_1 x + \varepsilon$
• β0: intercept
• β1: slope; indicator of the linear relationship
• ε: random error
• Regression line: $\mu_{Y|x} = \beta_0 + \beta_1 x$
• Y values from different x are independent
• Do Waist & Fat satisfy these assumptions?


Linear Reg.: Research Topics
• Explore data: are the assumptions satisfied?
• Use a scatter plot
• Estimate model: what is the quantitative relationship?
• Evaluate model: Is the relationship strong? Is the prediction good?
• SAS proc reg
• Run Dropbox/Regression/Chp9 SAS
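The "estimate model" step can be sketched with the closed-form least-squares estimates b1 = Sxy/Sxx and b0 = ȳ − b1·x̄, which are the values that minimize SSE. This is a minimal stdlib Python sketch, not the course's SAS program; the function names and toy data are hypothetical.

```python
def least_squares_fit(x, y):
    """Least-squares estimates (b0, b1) that minimize SSE = sum((y - b0 - b1*x)^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # estimated slope
    b0 = mean_y - b1 * mean_x   # estimated intercept; line passes through (mean_x, mean_y)
    return b0, b1


def predict(b0, b1, x0):
    """Point on the fitted regression line at x = x0."""
    return b0 + b1 * x0


# Hypothetical toy data (NOT the Waist & Fat data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
# Moving x by dx moves the fitted mean by b1 * dx
dx = 2
print(b0, b1, predict(b0, b1, 3 + dx) - predict(b0, b1, 3))
```

For the toy data b1 = 0.6 and b0 = 2.2 (up to floating-point rounding), so a change of Δx = 2 shifts the fitted value by 1.2.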
Linear Reg.: Linear Relationship
• Model interpretation
• For every Δx increase in the predictor X, the mean response Y changes by β1·Δx (an increase when β1 > 0)
• Always in terms of the change!
• Existence of relationship: H0: β1 = 0 vs. H1: β1 ≠ 0
• {H0 not rejected}: “Based on our data, no evidence supports a linear relationship between Y & X. Other relationships might exist.”
• {H0 rejected}: “Our data supports a linear relationship between Y & X.”
• Does the data support a linear relationship between Waist & Fat?
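A minimal sketch of the test of H0: β1 = 0, assuming the usual simple-linear-regression t statistic t = b1 / SE(b1) with SE(b1) = √(MSE/Sxx) and n − 2 degrees of freedom. The function name and toy data are hypothetical; proc reg reports the corresponding t value and p-value directly.

```python
import math


def slope_t_test(x, y):
    """t statistic for H0: beta1 = 0 vs. H1: beta1 != 0 in simple linear regression.

    Returns (b1, se_b1, t, df); the p-value comes from a t distribution on df = n - 2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)            # estimate of the error variance
    se_b1 = math.sqrt(mse / sxx)   # standard error of the slope
    return b1, se_b1, b1 / se_b1, n - 2


# Hypothetical toy data (NOT the Waist & Fat data)
b1, se_b1, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, se_b1, t, df)
```

Here t ≈ 2.1213 on 3 df, which is below the two-sided 5% critical value t(0.975, 3) ≈ 3.182, so H0 would not be rejected for this toy sample. In simple linear regression this t equals the correlation t statistic for H0: ρ = 0.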
Linear Reg.: Model Strength
• ANOVA in the linear regression model
• SST: total variation in Y; SST = SSR + SSE
• SSR: variation explained by the linear regression
• SSE: unexplained (error/residual) variation
• Least-squares estimates minimize SSE
• Coefficient of determination R² = SSR / SST
• R² is reported with the SAS “Analysis of Variance” output
• 0 ≤ R² ≤ 1; a larger R² means a stronger model
• R² = [corr(X, Y)]², i.e. R = |corr(X, Y)|, if the model has only one X
• What % of the variance in Fat does the Fat ~ Waist model explain?
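The decomposition SST = SSR + SSE and R² = SSR/SST can be checked numerically. This is a minimal sketch on hypothetical toy data (not the Waist & Fat data); the function name is illustrative only.

```python
def regression_anova(x, y):
    """SST, SSR, SSE and R^2 = SSR/SST for a least-squares simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation in Y
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # explained by the regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation
    return sst, ssr, sse, ssr / sst


# Hypothetical toy data (NOT the Waist & Fat data)
sst, ssr, sse, r2 = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sst, ssr, sse, r2)
```

For the toy data SST = 6, SSR = 3.6 and SSE = 2.4 (up to rounding), so R² = 0.6, which matches the square of the sample correlation R ≈ 0.7746 for the same points.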


Linear Reg: Estimation & Prediction
• “Narrower” interval: 100(1-α)% confidence (estimation) interval of $\mu_{Y|X}$
• “Wider” interval: 100(1-α)% prediction interval of Y|X

• People often use the words “confidence” or “estimation” for parameters
• … and use “prediction” for future observations / subjects
• Given x = x0, I have 95% confidence that the confidence (estimation) interval will cover the mean of Y|x = x0
• I have 95% confidence that the next y corresponding to x0 will fall in the prediction interval
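Why the prediction interval is wider can be seen from the half-widths, assuming the standard textbook formulas: the confidence interval for the mean uses √(1/n + (x0 − x̄)²/Sxx), while the prediction interval adds 1 under the square root for the new observation's own error. This sketch uses only the standard library, so the t critical value with n − 2 df must be supplied by the caller (for example from a t table); the function name and toy data are hypothetical.

```python
import math


def mean_ci_and_pi(x, y, x0, t_crit):
    """Fitted value at x0 plus half-widths of the CI for mu_{Y|x0} and the PI for a new Y.

    t_crit is a caller-supplied t quantile on n - 2 df (e.g. t_{0.975, n-2} for 95%).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))               # residual standard error
    h = 1 / n + (x0 - mean_x) ** 2 / sxx       # uncertainty of the fitted mean at x0
    y_hat = b0 + b1 * x0
    ci_half = t_crit * s * math.sqrt(h)        # estimation (confidence) interval
    pi_half = t_crit * s * math.sqrt(1 + h)    # prediction interval: extra "1" for new error
    return y_hat, ci_half, pi_half


# Hypothetical toy data (NOT the Waist & Fat data); t_{0.975, 3} is about 3.182
y_hat, ci_half, pi_half = mean_ci_and_pi([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(y_hat, ci_half, pi_half)
```

At x0 = 3 the fitted value is 4.0, the 95% confidence half-width is about 1.27 and the prediction half-width is about 3.12. The extra 1 inside the square root means the prediction interval is always the wider one.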
Summary
• A regression model describes the conditional distribution of Y|X = x, or certain characteristics of it, as a function of the explanatory variables x
• We estimate such models on the basis of samples of pairs of random variables (Y, X)
• It is convenient to assume that a regression model consists of signal and noise, i.e. a deterministic part and an error term
Extra: Dummy Coding
• Use (k-1) dummy variables for a k-level categorical predictor

• Study the effect of Gender


• Define one dummy variable Gender: =0 (male); =1 (female)

• Iris sepal length (Short, Medium & Long)


• Wrong: Iris = 0 (Short), 1 (Medium), 2 (Long); this imposes an “equally spaced” assumption on the levels
• Correct: define two dummy variables IrisM & IrisL, i.e. Short is the “reference”

         Short   Medium   Long
IrisM      0       1       0
IrisL      0       0       1
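The (k−1)-dummy scheme above can be sketched as a small helper that builds one 0/1 column per non-reference level. The function name `dummy_code` and its arguments are hypothetical; in SAS the same columns would typically be created in a DATA step or via a CLASS statement in procedures that support it.

```python
def dummy_code(values, levels, reference):
    """Return one 0/1 column per non-reference level (k - 1 columns for k levels)."""
    if reference not in levels:
        raise ValueError("reference level must be one of the levels")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    non_reference = [lvl for lvl in levels if lvl != reference]
    # The reference level ("Short" here) is coded as all zeros
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in non_reference}


# Reproduces the IrisM / IrisL table with Short as the reference level
columns = dummy_code(["Short", "Medium", "Long"], ["Short", "Medium", "Long"], reference="Short")
print(columns)  # Medium plays the role of IrisM, Long the role of IrisL
```

Coding the two-level Gender variable the same way (reference = male) gives the single 0/1 dummy described earlier.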
Homework
• Preview questions of Two-way Table

• In linear regression, why do you think the prediction interval is wider than the confidence (estimation) interval?
• Answers can be found in Chapter 9

• Read the story of Bigfoot & UFO sightings


Part I: Page 378, Exercise 15

ANOVA
Source of Variation   SS            df   MS            F             P-value       F crit
Rows                  8.20068125    15   0.546712083   27.54621796   1.82574E-13   2.014803691
Columns               0.053654167    2   0.026827083   1.351688955   0.274112996   3.315829501
Error                 0.5954125     30   0.019847083
Total                 8.849747917   47
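The internal arithmetic of this table can be checked: each MS is SS/df, each F is the source's MS divided by the Error MS, and the SS and df add up to the Total row. The sketch assumes this is spreadsheet-style output for a two-way ANOVA without replication (inferred from the Rows/Columns/Error layout, not stated in the slides); the helper name is hypothetical, and the P-value and F crit columns need an F distribution, which the Python standard library does not provide.

```python
def anova_from_ss(ss, df, error="Error"):
    """Derive MS = SS/df and F = MS/MS_error for each non-error source."""
    ms = {src: ss[src] / df[src] for src in ss}
    f = {src: ms[src] / ms[error] for src in ss if src != error}
    return ms, f


# SS and df copied from the table above
ss = {"Rows": 8.20068125, "Columns": 0.053654167, "Error": 0.5954125}
df = {"Rows": 15, "Columns": 2, "Error": 30}
ms, f = anova_from_ss(ss, df)
print(f["Rows"], f["Columns"])            # compare with the F column
print(sum(ss.values()), sum(df.values()))  # compare with the Total row
```

The recomputed F values agree with the table (about 27.546 and 1.352), and the components sum to SS ≈ 8.8497 on 47 df.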
