
Simple Linear Reg Ex 1

Simple Linear Regression

Uploaded by

prioofficial21
Copyright
© © All Rights Reserved

Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
• Dependent variable (Y) = house price in $1000s
• Independent variable (X) = square feet

Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Graphical Presentation
• House price model: scatter plot

Regression Using Excel
• Tools / Data Analysis / Regression

Excel Output

The regression equation is:

\[ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) \]

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
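
The intercept and slope in the Excel output can be reproduced without Excel. The following is a minimal Python sketch, assuming only the ten sample observations; the function name `fit_line` and the variable names are illustrative and do not come from the slides.

```python
# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit_line(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squared x deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(sizes, prices)
print(f"intercept b0 = {b0:.5f}")
print(f"slope     b1 = {b1:.5f}")
```

The printed values should agree with the Coefficients column of the Excel output to the displayed precision.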

Graphical Presentation
• House price model: scatter plot and regression line, with slope = 0.10977 and intercept = 98.248

Interpretation of the Intercept, b0

• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
• Here, no houses had 0 square feet, so b0 = 98.24833 has no meaningful interpretation

Interpretation of the Slope Coefficient, b1

• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
• Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size

Measures of Variation

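• Total variation is made up of two parts:

\[ SST = SSR + SSE \]

Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares

\[ SST = \sum_i (y_i - \bar{y})^2, \qquad SSR = \sum_i (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_i (y_i - \hat{y}_i)^2 \]

where:
\(\bar{y}\) = Average value of the dependent variable
\(y_i\) = Observed values of the dependent variable
\(\hat{y}_i\) = Predicted value of y for the given \(x_i\) value

Measures of Variation
(continued)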

• SST = total sum of squares
  • Measures the variation of the yi values around their mean, \(\bar{y}\)
• SSR = regression sum of squares
  • Explained variation attributable to the linear relationship between x and y
• SSE = error sum of squares
  • Variation attributable to factors other than the linear relationship between x and y
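
To make the decomposition concrete, the three sums of squares can be computed directly from their definitions. This is a minimal Python sketch using the ten sample observations; the variable names are illustrative assumptions, not notation from the slides.

```python
# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
sxx = sum((x - x_bar) ** 2 for x in sizes)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted value of y for each observed x.
fitted = [b0 + b1 * x for x in sizes]

sst = sum((y - y_bar) ** 2 for y in prices)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(prices, fitted))   # unexplained variation

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The three values correspond to the Total, Regression and Residual rows of the SS column in the Excel ANOVA table.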

Coefficient of Determination, R²
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted as R²

\[ R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]

note: \( 0 \le R^2 \le 1 \)

Examples of Approximate r² Values
[Figures: scatter plots of Y against X for each case]
• r² = 1: Perfect linear relationship between X and Y. 100% of the variation in Y is explained by variation in X
• 0 < r² < 1: Weaker linear relationships between X and Y. Some but not all of the variation in Y is explained by variation in X
• r² = 0: No linear relationship between X and Y. The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

Excel Output
R Square = 0.58082, so 58.08% of the variation in house prices is explained by variation in square feet

Correlation and R²
• The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:

\[ R^2 = r^2 \]

where r is the sample correlation between x and y
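
The identity can be checked numerically on the house price sample. The Python sketch below is an illustration under the assumption that R² is computed as SSR/SST and r as the usual sample correlation; the variable names are illustrative.

```python
import math

# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
sxx = sum((x - x_bar) ** 2 for x in sizes)
syy = sum((y - y_bar) ** 2 for y in prices)  # this is SST

# Sample correlation between square feet and price.
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination from the fitted line: SSR / SST.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in sizes)
r_squared = ssr / syy

print(f"r         = {r:.5f}")
print(f"r squared = {r * r:.5f}")
print(f"R squared = {r_squared:.5f}")
```

In the Excel output, r appears as Multiple R and R² as R Square.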

Estimation of Model Error Variance

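• An estimator for the variance of the population model error is

\[ \hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2} \]

where \( e_i = y_i - \hat{y}_i \) is the residual for observation i

• Division by n - 2 instead of n - 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one

• \( s_e = \sqrt{s_e^2} \) is called the standard error of the estimate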
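
As a worked check, the standard error of the estimate can be computed from the residuals of the fitted line. This is a minimal Python sketch on the ten sample observations, with illustrative variable names.

```python
import math

# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
      / sum((x - x_bar) ** 2 for x in sizes))
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - predicted y_i.
residuals = [y - (b0 + b1 * x) for x, y in zip(sizes, prices)]
sse = sum(e * e for e in residuals)

# Two estimated parameters (b0 and b1), so divide by n - 2.
error_variance = sse / (n - 2)
s_e = math.sqrt(error_variance)

print(f"s_e^2 = {error_variance:.4f}")
print(f"s_e   = {s_e:.5f}")
```

The variance corresponds to the Residual MS in the ANOVA table and s_e to the Standard Error in the Regression Statistics block.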

Excel Output
Standard Error = 41.33032 in the Regression Statistics is the standard error of the estimate, se

Comparing Standard Errors
se is a measure of the variation of observed y values from the regression line

[Figure: two panels plotting Y against X]

The magnitude of se should always be judged relative to the size of the y values in the sample data

For example, se = $41.33K is moderately small relative to house prices in the $200K - $300K range

Inferences About the Regression Model

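• The variance of the regression slope coefficient (b1) is estimated by

\[ s_{b_1}^2 = \frac{s_e^2}{\sum_i (x_i - \bar{x})^2} = \frac{s_e^2}{(n-1)\,s_x^2} \]

where:
\(s_{b_1}\) = Estimate of the standard error of the least squares slope
\(s_e = \sqrt{SSE/(n-2)}\) = Standard error of the estimate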


Excel Output
Standard Error = 0.03297 in the Square Feet row is sb1, the estimated standard error of the slope

Inference about the Slope: t Test
• t test for a population slope
• Is there a linear relationship between X and Y?
• Null and alternative hypotheses
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does exist)
• Under H0, the test statistic

\[ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2 \]

where:
b1 = regression slope coefficient
β1 = hypothesized slope
sb1 = standard error of the slope

Inference about the Slope: t Test
(continued)

Estimated Regression Equation:

\[ \text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) \]

House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

The slope of this model is 0.1098

Does square footage of the house affect its sales price?

Inferences about the Slope: t Test Example

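H0 : β1 = 0
H1 : β1 ≠ 0

From Excel output:

              Coefficients    Standard Error    t Stat     P-value
Intercept     98.24833        58.03348          1.69296    0.12892
Square Feet   0.10977         0.03297           3.32938    0.01039

With b1 = 0.10977 and sb1 = 0.03297:

\[ t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]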

Inferences about the Slope: t Test Example
(continued)
H0 : β1 = 0
H1 : β1 ≠ 0

Test Statistic: t = 3.329

From Excel output:

              Coefficients    Standard Error    t Stat     P-value
Intercept     98.24833        58.03348          1.69296    0.12892
Square Feet   0.10977         0.03297           3.32938    0.01039

d.f. = 10 - 2 = 8
α/2 = .025, so the critical value is t8,.025 = 2.3060

[Figure: t distribution with rejection regions of area α/2 = .025 below -2.3060 and above 2.3060; the test statistic 3.329 falls in the upper rejection region]

Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price

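
The test can be retraced step by step from the raw data. The Python sketch below assumes the critical value t8,.025 = 2.3060 quoted on the slide (it is taken as a constant, not computed), and the variable names are illustrative.

```python
import math

# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in sizes)

# Least squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices)) / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sizes, prices))
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(sxx)

# Test statistic under H0: beta1 = 0.
hypothesized_slope = 0.0
t_stat = (b1 - hypothesized_slope) / s_b1

# Critical value for 8 d.f. and alpha/2 = .025, as quoted on the slide.
T_CRITICAL = 2.3060
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"s_b1 = {s_b1:.5f}")
print(f"t    = {t_stat:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```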
Inferences about the Slope: t Test Example
(continued)
H0 : β1 = 0
H1 : β1 ≠ 0

From Excel output:

              Coefficients    Standard Error    t Stat     P-value
Intercept     98.24833        58.03348          1.69296    0.12892
Square Feet   0.10977         0.03297           3.32938    0.01039

P-value = 0.01039

This is a two-tail test, so the p-value is P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)

Decision: P-value < α so Reject H0
Conclusion: There is sufficient evidence that square footage affects house price

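
The two-tail probability can be approximated without statistical tables. The sketch below is one possible approach, not the method Excel uses: it integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `two_tail_p_value` are illustrative, and the t statistic is copied from the Excel output.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tail_p_value(t_stat, df, steps=2000):
    """Approximate P(T > |t|) + P(T < -|t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    # Each tail has area 0.5 - central_area because the density is symmetric.
    return 2 * (0.5 - central_area)


p_value = two_tail_p_value(3.32938, df=8)
alpha = 0.05
print(f"p-value = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```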
Confidence Interval Estimate for the Slope

Confidence Interval Estimate of the Slope:

\[ b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1} \]

d.f. = n - 2

Excel Printout for House Prices:

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977         0.03297           3.32938    0.01039    0.03374      0.18580

At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Confidence Interval Estimate for the Slope
(continued)

Since the house price variable is in units of $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size

This 95% confidence interval does not include 0.

Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
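
The interval limits follow from three numbers already on the slides. This short Python sketch takes b1 and sb1 from the Excel output and the critical value t8,.025 = 2.3060 from the t test slide as given constants; the variable names are illustrative.

```python
# Values copied from the Excel output and the t test slide.
b1 = 0.10977        # estimated slope, in $1000s per square foot
s_b1 = 0.03297      # standard error of the slope
t_critical = 2.3060  # t value for 8 d.f. and alpha/2 = .025

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")

# Convert from $1000s to dollars per square foot.
print(f"In dollars: ${lower * 1000:.2f} to ${upper * 1000:.2f} per square foot")

# The interval excludes 0, which matches rejecting H0 at the .05 level.
excludes_zero = lower > 0 or upper < 0
print("Interval excludes 0:", excludes_zero)
```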

F-Test for Significance
• F Test statistic:

\[ F = \frac{MSR}{MSE} \]

where

\[ MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]

and F follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom

(k = the number of independent variables in the regression model)


Excel Output
• F = MSR / MSE = 18934.9348 / 1708.1957 = 11.0848, with 1 and 8 degrees of freedom
• Significance F = 0.01039 is the p-value for the F-test

F-Test for Significance
(continued)

H0 : β1 = 0
H1 : β1 ≠ 0
α = .05
df1 = 1, df2 = 8

Critical Value: F.05 = 5.32

Test Statistic:

\[ F = \frac{MSR}{MSE} = 11.08 \]

[Figure: F distribution with the rejection region of area α = .05 to the right of F.05 = 5.32]

Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price

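
The F statistic can be rebuilt from the ANOVA sums of squares. The Python sketch below takes SSR, SSE and the critical value F.05 = 5.32 from the slides as given constants. It also checks a known property of simple regression: with one independent variable, F equals the square of the slope's t statistic. Variable names are illustrative.

```python
# Values copied from the Excel ANOVA table and the F test slide.
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error (residual) sum of squares
n = 10             # observations
k = 1              # independent variables
f_critical = 5.32  # F value for alpha = .05 with 1 and 8 d.f.
t_stat = 3.32938   # t statistic for the slope

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(f"MSR = {msr:.4f}")
print(f"MSE = {mse:.4f}")
print(f"F   = {f_stat:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}")
print("Reject H0" if f_stat > f_critical else "Do not reject H0")
```

Because F = t² here, the F test and the two-tail t test for the slope reach the same decision and share the p-value 0.01039.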
Predictions Using Regression Analysis
Predict the price for a house with 2000 square feet:

\[ \text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) = 98.25 + 0.1098(2000) = 317.85 \]

The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850

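
The prediction is a direct substitution into the fitted equation. The Python sketch below uses a hypothetical helper, `predict_price`, and evaluates it twice: once with the rounded coefficients used on the slide and once with the five-decimal coefficients from the Excel output, to show how much rounding moves the answer.

```python
def predict_price(square_feet, b0, b1):
    """Predicted house price in $1000s for a given size in square feet."""
    return b0 + b1 * square_feet


# Rounded coefficients, as used on the slide.
rounded = predict_price(2000, b0=98.25, b1=0.1098)

# Coefficients to five decimals, from the Excel output.
full = predict_price(2000, b0=98.24833, b1=0.10977)

print(f"Rounded coefficients: {rounded:.2f} ($1000s)")
print(f"Excel coefficients:   {full:.2f} ($1000s)")
```

The two results differ by well under $100 on a price above $300,000, so the rounded equation is adequate for this illustration. Note that 2000 square feet lies inside the observed range of sizes (1100 to 2450).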
Graphical Analysis
• The linear regression model is based on minimizing
the sum of squared errors
• If outliers exist, their potentially large squared errors
may have a strong influence on the fitted regression
line
• Be sure to examine your data graphically for outliers
and extreme points
• Decide, based on your model and logic, whether the
extreme points should remain or be removed
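
One simple way to support the graphical check is to list the observations with the largest residuals. The Python sketch below is a rough screen under stated assumptions, not a formal outlier test: it divides each residual by se and ranks observations by absolute residual. The variable names are illustrative.

```python
import math

# Sample data from the house price table (price in $1000s, size in square feet).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sizes = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
      / sum((x - x_bar) ** 2 for x in sizes))
b0 = y_bar - b1 * x_bar

# Residuals and the standard error of the estimate.
residuals = [y - (b0 + b1 * x) for x, y in zip(sizes, prices)]
s_e = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Rank observations from largest to smallest absolute residual.
ranked = sorted(range(n), key=lambda i: abs(residuals[i]), reverse=True)
largest = ranked[0]

for i in ranked[:3]:
    print(f"size={sizes[i]:>5} price={prices[i]:>4} "
          f"residual={residuals[i]:8.2f} residual/s_e={residuals[i] / s_e:6.2f}")
```

In this sample the largest residual belongs to the 1425 square foot house priced at 319, at roughly 1.6 times se, so no observation stands out as extreme by this rough measure.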

Summary
• Introduced the linear regression model
• Reviewed correlation and the assumptions of linear
regression
• Discussed estimating the simple linear regression
coefficients
• Described measures of variation
• Described inference about the slope
• Addressed estimation of mean values and prediction
of individual values
