11 SimpleRegression

This document provides an introduction to simple linear regression. It presents a model for simple linear regression using an intercept (β0) and slope (β1) coefficient. It provides an example using advertising expenditures and sales data from a teddy bear company. Scatterplots and least squares estimators are used to estimate the intercept and slope coefficients. The coefficients are then interpreted. Measures of variation like R-squared, standard error, and inferences about the slope coefficient using t-tests are also discussed. Finally, the document touches on prediction using the regression model.

Uploaded by Jawwad Ahmed

DISC 203 – PROBABILITY & STATISTICS

SIMPLE LINEAR REGRESSION

Lecturer: Muhammad Asim


SIMPLE LINEAR REGRESSION MODEL

Yi = β0 + β1Xi + εi

where Yi is the dependent variable, Xi the independent variable, β0 the intercept, β1 the slope coefficient, and εi the random error term. β0 + β1Xi is the deterministic component; εi is the random error component.
EXAMPLE
 You are a marketing analyst for Teddy Bears. You gather the
following data and want to find a simple relationship between
advertising and sales.

Advertising – Sales Data

Month   Advertising Expenditure x ($100)   Sales Revenue y ($1,000)
1       1                                  1
2       2                                  1
3       3                                  2
4       4                                  2
5       5                                  4
SCATTERGRAM: SALES VS. ADVERTISING

[Scatterplot of Sales Revenue ($1,000) against Advertising Expenditure ($100) for the five data points; the points suggest a positive, roughly linear trend.]
LEAST SQUARES ESTIMATORS
 Prediction equation: ŷi = β̂0 + β̂1xi

 Sample slope: β̂1 = SSxy / SSxx

 Sample y-intercept: β̂0 = ȳ − β̂1x̄

where SSxy = Σ(xi − x̄)(yi − ȳ), SSxx = Σ(xi − x̄)², and SSyy = Σ(yi − ȳ)².
COMPUTATIONS – LEAST SQUARES LINE
xi (adv)   yi (sales)   (xi − 3)²    (xi − 3)(yi − 2)
1          1            4            2
2          1            1            1
3          2            0            0
4          2            1            0
5          4            4            4
∑xi = 15   ∑yi = 10     SSxx = 10    SSxy = 7
x̄ = 3      ȳ = 2
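The hand computation above can be checked with a short script (a sketch; the function and variable names are mine, not from the slides):

```python
# Least-squares estimates for the advertising-sales data above.
def least_squares(x, y):
    """Return (b0_hat, b1_hat) for the fitted line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx           # sample slope: SSxy / SSxx = 7/10
    b0 = y_bar - b1 * x_bar      # sample y-intercept
    return b0, b1

x = [1, 2, 3, 4, 5]   # advertising expenditure ($100)
y = [1, 1, 2, 2, 4]   # sales revenue ($1,000)
b0, b1 = least_squares(x, y)
print(b0, b1)  # approximately -0.1 and 0.7
```

So the fitted line is ŷ = −0.1 + 0.7x, matching the sums SSxx = 10 and SSxy = 7 in the table.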
COEFFICIENT INTERPRETATIONS

1. Slope (β̂1 = 0.7)
• Sales revenue (y) is expected to increase by $700 for each $100 increase in advertising (x), over the sampled range of advertising expenditures from $100 to $500.

2. y-Intercept (β̂0 = −0.1)
• Since x = 0 is outside the range of the sampled values of x, the y-intercept has no meaningful interpretation.
MEASURES OF VARIATION

 SST = total sum of squares
 Measures the variation of the yi values around their mean, ȳ
 SSR = regression sum of squares
 Explained variation attributable to the linear relationship between x and y
 SSE = error sum of squares
 Variation attributable to factors other than the linear relationship between x and y
MEASURES OF VARIATION

 Total variation is made up of two parts:

SST = SSR + SSE

Total Sum of Squares: SST = Σ(yi − ȳ)²
Regression Sum of Squares: SSR = Σ(ŷi − ȳ)²
Error Sum of Squares: SSE = Σ(yi − ŷi)²

where:
ȳ = average value of the dependent variable
yi = observed values of the dependent variable
ŷi = predicted value of y for the given xi value
Advertising Expenditure x ($100), Sales Revenue y ($1,000):

x    y    ŷ = b0 + b1x   y − ŷ    (y − ŷ)²
1    1    0.6             0.4     0.16
2    1    1.3            −0.3     0.09
3    2    2.0             0.0     0.00
4    2    2.7            −0.7     0.49
5    4    3.4             0.6     0.36

n = 5, SSE = Σ(y − ŷ)² = 1.10
COEFFICIENT OF DETERMINATION, R2

 The coefficient of determination is the portion of


the total variation in the dependent variable that is
explained by variation in the independent variable
 The coefficient of determination is also called R-
squared and is denoted as R2

R² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ R² ≤ 1
R2 INTERPRETATION

R2 = SSR/SST = 0.82

Interpretation: About 82% of the sample variation in


Sales can be explained by Advertising Expenditures,
using the linear regression model.

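The decomposition and the R² value can be verified numerically (a sketch, assuming the fitted line ŷ = −0.1 + 0.7x from the earlier slides):

```python
# Variation decomposition for the advertising-sales example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7                 # least-squares estimates from earlier

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
r2 = ssr / sst
print(sst, ssr, sse, r2)  # approximately 6.0, 4.9, 1.1, 0.817
```

Note that SST = SSR + SSE (6.0 = 4.9 + 1.1), and R² ≈ 0.82 as stated above.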
STANDARD ERROR OF THE
REGRESSION MODEL

s² = SSE / (n − 2)

where n − 2 is the degrees of freedom for error.

 We refer to s as the standard error of the regression model
 s measures the spread of the distribution of y values about the least squares line
 We expect most of the observed y-values to lie within 2s of their respective least squares predicted values
CALCULATING S2 AND S

s² = SSE / (n − 2) = 1.1 / (5 − 2) = .36667

s = √.36667 = .6055
We would expect most of the observed revenues to fall within 2s, about $1,211, of the least squares line.
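The arithmetic on this slide can be reproduced directly (a sketch; SSE = 1.1 and n = 5 are taken from the example):

```python
import math

# Standard error of the regression model for the example.
sse, n = 1.1, 5
s2 = sse / (n - 2)      # estimated error variance
s = math.sqrt(s2)       # standard error of the regression
print(s2, s)            # approximately 0.3667 and 0.6055
```

Doubling s gives roughly 1.21 (in $1,000 units), the "within 2s" band mentioned above.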
Yi = β0 + β1Xi + εi
MAKING INFERENCES ABOUT SLOPE
 E(y) = β0 + β1x
 H0: β1 = 0
 Ha: β1 ≠ 0
 If β1 = 0, then x has no influence on y.
 If we reject H0, we say that x has a statistically significant effect on y.
 To test the null, we need to know the sampling distribution of β̂1
MAKING INFERENCES ABOUT THE SLOPE β1

 Sampling distribution of β̂1, for large n:

β̂1 ~ N(β1, σβ̂1), where σβ̂1 = σ / √SSxx

 Typically we approximate σβ̂1 = σ / √SSxx by sβ̂1 = s / √SSxx

 So, when n is large, we use a z-statistic ~ N(0, 1)
 When n is small, we typically use a t-statistic ~ t(n − 2)
 For large n, the distributions of the z and t statistics are almost the same
MAKING INFERENCES ABOUT THE SLOPE β1

A Test of Model Utility: Simple Linear Regression

One-Tailed Test: H0: β1 = 0; Ha: β1 < 0 (or Ha: β1 > 0)
Two-Tailed Test: H0: β1 = 0; Ha: β1 ≠ 0

Test statistic: t = β̂1 / sβ̂1 = β̂1 / (s / √SSxx), where s² = SSE / (n − 2)

Rejection region (one-tailed): t < −tα (or t > tα when Ha: β1 > 0)
Rejection region (two-tailed): |t| > tα/2

where tα and tα/2 are based on (n − 2) degrees of freedom
EXAMPLE
 We estimated a simple relationship between advertising
and sales based on a sample of 5 observations. Is the true
relationship statistically significant at the .05 level of
significance?

TEST OF SLOPE COEFFICIENT – SOLUTION

 H0: β1 = 0
 Ha: β1 ≠ 0
 α = .05
 df = 5 − 2 = 3
 Critical values: ±3.182 (rejection regions t < −3.182 and t > 3.182, with .025 in each tail)
 Decision: Reject H0 at α = .05
 Conclusion: There is evidence of a relationship
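The solution slide does not print the test statistic itself; it can be filled in numerically (a sketch using b1, s, and SSxx from the earlier slides):

```python
import math

# t statistic for H0: beta1 = 0 in the advertising-sales example.
b1, s, ss_xx = 0.7, 0.6055, 10
se_b1 = s / math.sqrt(ss_xx)   # estimated standard error of the slope
t = b1 / se_b1
print(t)  # approximately 3.66
```

Since 3.66 > 3.182, the observed t falls in the rejection region, consistent with the decision above.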
MAKING INFERENCES ABOUT THE SLOPE β1

 Confidence interval for β1: β̂1 ± tα/2 · sβ̂1

 [0.090, 1.309]

 We can be 95% confident that the true mean increase in monthly sales revenue per additional $100 of advertising expenditure is between $90 and $1,310.
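The interval [0.090, 1.309] can be reproduced as follows (a sketch; the inputs are the estimates from earlier slides):

```python
import math

# 95% confidence interval for beta1 in the advertising-sales example.
b1, s, ss_xx = 0.7, 0.6055, 10
t_crit = 3.182                      # t_{.025} with n - 2 = 3 df
se_b1 = s / math.sqrt(ss_xx)
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(lo, hi)  # approximately 0.091 and 1.309
```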
PREDICTION WITH REGRESSION
MODELS
 Types of predictions
 Point estimates
 Interval estimates
 What is predicted
 Population mean response E(y) for given x
 Point on population regression line
 Individual response (yi) for given x

WHAT IS PREDICTED

[Graph: the fitted line ŷ = β̂0 + β̂1x with, at x = xp, both the predicted mean response E(y) = β0 + β1x (a point on the population regression line) and a predicted individual response y.]
USING THE MODEL FOR ESTIMATION AND PREDICTION

 100(1 − α)% confidence interval for the mean value of y at x = xp:

ŷ ± tα/2 · s · √(1/n + (xp − x̄)² / SSxx)

 100(1 − α)% prediction interval for an individual new value of y at x = xp:

ŷ ± tα/2 · s · √(1 + 1/n + (xp − x̄)² / SSxx)

 where tα/2 is based on (n − 2) degrees of freedom
EXAMPLE
 Find a 95% confidence interval for the mean monthly
sales when the store spends $400 on advertising.

EXAMPLE
 Predict the monthly sales for next month if $400 is spent
on advertising. Use a 95% prediction interval.
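Both questions can be answered with the interval formulas above (a sketch; all inputs are the estimates from earlier slides):

```python
import math

# 95% CI for the mean response and 95% PI for an individual response
# at xp = 4 ($400 of advertising).
b0, b1 = -0.1, 0.7
s, n, x_bar, ss_xx = 0.6055, 5, 3.0, 10.0
t_crit = 3.182                  # t_{.025}, 3 df
xp = 4.0

y_hat = b0 + b1 * xp            # point prediction: 2.7, i.e. $2,700
d2 = (xp - x_bar) ** 2 / ss_xx
half_ci = t_crit * s * math.sqrt(1 / n + d2)
half_pi = t_crit * s * math.sqrt(1 + 1 / n + d2)
print(y_hat - half_ci, y_hat + half_ci)  # CI: approximately (1.64, 3.76)
print(y_hat - half_pi, y_hat + half_pi)  # PI: approximately (0.50, 4.90)
```

The prediction interval for an individual month is wider than the confidence interval for the mean, since it must also absorb the error variance of a single new observation.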

CONFIDENCE INTERVALS VS. PREDICTION INTERVALS

[Graph: the fitted line ŷ = β̂0 + β̂1x with confidence bands for the mean of y and wider prediction bands for individual y values; both bands are narrowest at x = x̄.]
REGRESSION RESULTS IN R

[Slide shows R regression output for the example; not reproduced here.]
ESTIMATOR: UNBIASEDNESS (ACCURACY) AND
MINIMUM VARIANCE (PRECISION OR EFFICIENCY)

Unbiasedness (accuracy) and minimum variance


(precision/efficiency) are desirable properties of the probability
distribution of the sample statistic.
MODEL ASSUMPTIONS

So far we have only estimated the deterministic component. Now we turn our attention to the random error ε. We first need some modeling assumptions…

Assumption 1: E(ε|x) = E(ε) = 0
The mean of the probability distribution of ε is 0. This implies that the mean value of y for a given value of x is β0 + β1x:
y = β0 + β1x + ε
Since E(ε|x) = E(ε) = 0,
E(y|x) = β0 + β1x
Sometimes this is just written as E(y) = β0 + β1x.
MODEL ASSUMPTIONS

Assumption 2: Homoskedasticity
• The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say σ², for all values of x.
• When this assumption does not hold, we say we have a problem of heteroskedasticity.
MODEL ASSUMPTIONS

Assumption 3: Normality
The probability distribution of ε is normal.

Assumption 4: No Autocorrelation
The values of ε associated with any two observed values of y are independent; that is, the value of ε associated with one value of y has no effect on the values of ε associated with other y values.
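The four assumptions can be made concrete with a tiny simulation (illustrative only, not from the slides; the parameter values are arbitrary choices of mine):

```python
import random

# Simulated data satisfying all four assumptions by construction:
# errors are independent draws from a normal distribution with mean 0
# and constant variance, regardless of x.
random.seed(1)
beta0, beta1, sigma = -0.1, 0.7, 0.6
x = [1, 2, 3, 4, 5] * 40                    # 200 observations
eps = [random.gauss(0, sigma) for _ in x]   # normal, mean 0, constant var
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]
print(sum(eps) / len(eps))  # sample mean of the errors: close to 0
```

Violating any one ingredient (a nonzero error mean, variance that grows with x, skewed errors, or errors carried over between observations) breaks the corresponding assumption.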
DISC 203 – PROBABILITY & STATISTICS

Practice Set – Simple Regression
EXERCISE 1

 For five popular cars, the following information on engine size and mileage ratings was recorded:

Engine Size, x (cubic inches)   Mileage Rating, y
144                             28
232                             21
306                             23
388                             17
414                             15

 Test the null hypothesis that x contributes no information for the prediction of y against the alternative hypothesis that these variables are linearly related with a slope significantly different from zero. Use a significance level of 5%. Interpret results.
EXERCISE 1

 H0: β1 = 0
 Ha: β1 ≠ 0
 α = .05
 df = 5 − 2 = 3
 Critical values: ±3.182 (.025 in each tail)
 Test statistic: t = −4.337
 Decision: Reject H0 at α = .05
 Conclusion: There is evidence of a relationship

Since size is statistically significant, we can interpret its value: if size increases by 10 cubic inches, the mileage rating decreases by 0.43 points on average. As the p-value = 0.0226 < 0.05, the effect of size on mileage is statistically significant. We are 95% confident that an increase in size of 10 cubic inches decreases the mileage rating by between 0.11 and 0.74 points.
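The test statistic can be recomputed from the raw data (a sketch; the helper function is mine, built from the formulas on the earlier slides):

```python
import math

# t statistic for H0: beta1 = 0 from raw (x, y) data.
def slope_t(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy          # shortcut: SSE = SSyy - b1*SSxy
    s = math.sqrt(sse / (n - 2))      # standard error of the regression
    return b1 / (s / math.sqrt(ss_xx))

t = slope_t([144, 232, 306, 388, 414], [28, 21, 23, 17, 15])
print(t)  # approximately -4.34
```

Since |−4.34| > 3.182, H0 is rejected at α = .05, matching the solution above.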
EXERCISE 2

 A real estate broker has been collecting data on home sales so that he can investigate the relationship between the dependent variable, y = value of the purchased home, and the independent variable, x = annual family income of buyer. Data from six recent sales are shown in the following table:

Annual family income, x (thousands of dollars)   Value of home, y (thousands of dollars)
15.2                                             33.8
17.4                                             48.9
22.0                                             49.5
24.6                                             61.0
29.8                                             63.8
38.0                                             92.5

Test the null hypothesis that x contributes no information (at α = 0.01) for the prediction of y against the alternative that home value y tends to increase as the annual family income x increases.
EXERCISE 2

 H0: β1 = 0
 Ha: β1 > 0
 α = .01
 df = 6 − 2 = 4
 Critical value: 3.747 (.01 in the upper tail)
 Test statistic: t = 7.6
 Decision: Reject H0 at α = .01
 Conclusion: There is evidence of a positive relationship

For a one-tailed test, use half the (two-tailed) p-value. Since 0.000805 < 0.01, we reject H0 at the 1% significance level.
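As with Exercise 1, the test statistic can be checked from the raw data (a sketch; the helper is repeated so the snippet is self-contained):

```python
import math

# t statistic for H0: beta1 = 0 from raw (x, y) data.
def slope_t(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy          # shortcut: SSE = SSyy - b1*SSxy
    s = math.sqrt(sse / (n - 2))
    return b1 / (s / math.sqrt(ss_xx))

x = [15.2, 17.4, 22.0, 24.6, 29.8, 38.0]   # income ($1,000s)
y = [33.8, 48.9, 49.5, 61.0, 63.8, 92.5]   # home value ($1,000s)
t = slope_t(x, y)
print(t)  # approximately 7.60
```

Since 7.60 > 3.747, H0 is rejected in favor of a positive slope, matching the solution above.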
