QBM101 Chapter10

Uploaded by

bonachinh111
QBM 101 Business Statistics

Department of Business Studies


Faculty of Business, Economics & Accounting
HELP University
SUBJECT OUTLINE:
• Module 1: Introduction; organizing and graphing data; numerical descriptive measures
• Module 2: Probability; discrete random variables; continuous random variables and the normal distribution
• Module 3: Sampling distributions; estimation; hypothesis testing
• Module 4: Simple linear regression

CHAPTER 10:
SIMPLE LINEAR REGRESSION
• 10.1 Simple linear regression
• 10.2 Standard deviation of errors and coefficient of determination
• 10.3 Inferences about B
• 10.4 Linear correlation
• 10.5 Regression analysis: A complete example
• 10.6 Interpretation of Excel output

A regression model is a mathematical equation
that describes the relationship between two or
more variables. A simple regression model
includes only two variables: one independent and
one dependent. The dependent variable is the one
being explained, and the independent variable is
the one used to explain the variation in the
dependent variable.

A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.

Regression: describing the nature of the relationship between variables (positive or negative, linear or nonlinear).
Correlation: determining whether a relationship between variables exists.

Questions: Are the two variables related? If so, how strong is the relationship? What kind of relationship is it? What predictions can be made?
Examples: height and weight of humans; number of cigarettes smoked vs. weights of infants; time spent studying and exam marks.
• Dependent variable (DV) (y, the one being explained) vs. independent variable (IV) (x, used to explain the variation)
• Simple (only 1 IV) vs. multiple (> 1 IV) regression
• Linear (straight-line relationship) vs. nonlinear regression

SIMPLE LINEAR REGRESSION ANALYSIS

In the regression model y = A + Bx + ε, A is called the y-intercept or constant term, B is the slope, and ε is the random error term. The dependent and independent variables are y and x, respectively.

In the model ŷ = a + bx, a and b, which are calculated using sample data, are called the estimates of A and B, respectively.

SCATTER PLOT/DIAGRAM
ERROR SUM OF SQUARES (SSE)
The error sum of squares, denoted SSE, is

\[ \text{SSE} = \sum e^2 = \sum (y - \hat{y})^2 \]

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.
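
To make the least squares criterion concrete, the short Python sketch below evaluates SSE for a few candidate lines. The four data points are hypothetical (they are not the lecture's data). The line ŷ = 0.5 + 1.4x is the least squares line for those points, so nudging a or b in either direction should give a larger SSE.

```python
def sse(xs, ys, a, b):
    """Error sum of squares for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

# SSE of the least squares line for the toy data (a = 0.5, b = 1.4).
best = sse(xs, ys, 0.5, 1.4)

# Nearby lines: shift the intercept or the slope slightly.
candidates = [(0.6, 1.4), (0.4, 1.4), (0.5, 1.5), (0.5, 1.3)]
for a, b in candidates:
    print(f"a={a}, b={b}: SSE={sse(xs, ys, a, b):.4f} (least squares: {best:.4f})")
```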
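Least squares/best-fit line:
\[ \hat{y} = a + bx \]
\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} \]
\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} \]
\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} \]
\[ b = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\bar{x} \]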
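
For readers who want to see where the formulas for a and b come from, a brief derivation sketch follows. It is not part of the original slides and assumes only basic calculus: SSE is minimized by setting its partial derivatives with respect to a and b to zero.

```latex
\begin{aligned}
\text{SSE}(a, b) &= \sum (y - a - bx)^2 \\
\frac{\partial\,\text{SSE}}{\partial a} = -2\sum (y - a - bx) = 0
  &\;\Rightarrow\; a = \bar{y} - b\bar{x} \\
\frac{\partial\,\text{SSE}}{\partial b} = -2\sum x\,(y - a - bx) = 0
  &\;\Rightarrow\; \sum xy - a\sum x - b\sum x^2 = 0 \\
\text{Substituting } a = \bar{y} - b\bar{x}:\quad
\sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}
  &= b\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)
  \;\Rightarrow\; b = \frac{SS_{xy}}{SS_{xx}}
\end{aligned}
```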
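Least squares/best-fit line:

\[ \bar{x} = \frac{\sum x}{n} = \frac{386}{7} = 55.1429, \qquad \bar{y} = \frac{\sum y}{n} = \frac{108}{7} = 15.4286 \]
\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714 \]
\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 23058 - \frac{(386)^2}{7} = 1772.8571 \]
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = 0.2525 \]
\[ a = \bar{y} - b\bar{x} = 15.4286 - (0.2525)(55.1429) = 1.5050 \]
\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]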
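
The hand calculation can be cross-checked with a few lines of Python. This is a minimal sketch that works from the summary sums quoted in the example (n = 7, Σx = 386, Σy = 108, Σxy = 6403, Σx² = 23058). The function name `least_squares_from_sums` is made up for illustration.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b, a) for y_hat = a + b*x, computed from summary sums."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return b, a

# Sums from the worked example (income and food expenditure, in hundreds).
b, a = least_squares_from_sums(n=7, sum_x=386, sum_y=108, sum_xy=6403, sum_x2=23058)
print(f"b = {b:.4f}, a = {a:.4f}")
```

Because the code carries b at full precision instead of rounding it to 0.2525 first, its intercept may differ from the hand-calculated 1.5050 in the third decimal place.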
Least squares/best-fit line (estimation and its reliability):
\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]
Estimate the amount of food expenditures when the income is $6100 (x = 61, in hundreds of dollars).
\[ \hat{y} = 1.5050 + 0.2525(61) = 16.9075 \text{ hundred} = \$1690.75 \]
Error: \( e = y - \hat{y} = 16 - 16.9075 = -0.9075 \) hundred \( = -\$90.75 \)
Estimate the amount of food expenditures when the income is $6000.
\[ \hat{y} = 1.5050 + 0.2525(60) = 16.655 \text{ hundred} = \$1665.50 \]
The estimation is reliable because \( 60 \in (33, 83) \), the range of x values in the sample.
Estimate the amount of food expenditures when the income is $2000.
\[ \hat{y} = 1.5050 + 0.2525(20) = 6.555 \text{ hundred} = \$655.50 \]
The estimation is not reliable because \( 20 \notin (33, 83) \); this is extrapolation.
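
The reliability check can be folded into the prediction step. The sketch below is illustrative only: the helper name `predict` is hypothetical, and treating the endpoints 33 and 83 as inside the reliable range is an assumption, since the slides write the range as (33, 83).

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, reliable) for the fitted line y_hat = a + b*x.

    `reliable` is True only when x lies within the observed range of x;
    outside that range the prediction is an extrapolation.
    Assumption: the endpoints x_min and x_max count as within range.
    """
    y_hat = a + b * x
    return y_hat, x_min <= x <= x_max

# Coefficients and observed income range (in hundreds) from the example.
A_HAT, B_HAT = 1.5050, 0.2525
for income in (61, 60, 20):
    y_hat, reliable = predict(A_HAT, B_HAT, income, x_min=33, x_max=83)
    note = "within range" if reliable else "extrapolation"
    print(f"income {income}: y_hat = {y_hat:.4f} hundred ({note})")
```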
ERROR OF PREDICTION
Least squares/best-fit line (interpretation of regression coefficients):

\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]
y-intercept, \( a = 1.5050 \):
A family with RM0 income is predicted to spend RM1.5050 hundred = RM150.50 on food.
Slope coefficient, \( b = 0.2525 \):
For every one-unit (RM100) increase in income, food expenditure increases by RM0.2525 hundred = RM25.25 on average.
Degrees of Freedom for a Simple Linear Regression Model

The degrees of freedom for a simple linear regression model are

df = n - 2
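Standard deviation of errors:

\( \sigma_\varepsilon \) is estimated by \( s_e \):
\[ s_e = \sqrt{\frac{\text{SSE}}{n-2}}, \quad \text{where } \text{SSE} = \sum (y - \hat{y})^2, \qquad df = n - 2 \]
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n-2}} \]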
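Standard deviation of errors:

\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = 0.2525 \]
\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714 \]
\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 1792 - \frac{(108)^2}{7} = 125.7143 \]
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n-2}} = \sqrt{\frac{125.7143 - (0.2525)(447.5714)}{7-2}} = 1.5939 \]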
Coefficient of determination (COD)

\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}}, \qquad 0 \le r^2 \le 1 \]
\[ b = 0.2525, \quad SS_{xy} = 447.5714, \quad SS_{yy} = 125.7143 \]
\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{0.2525(447.5714)}{125.7143} = 0.899 = 89.9\% \]
Interpretation: 89.9% of the total variation in household food expenditures can be explained by the variation in incomes, and the remaining 10.1% is due to randomness and other variables.
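
Both quantities from Section 10.2 can be reproduced from the sums of squares. The following is a small sketch using the values quoted in the example. The function name `error_std_and_r2` is an illustrative choice, and the shortcut SSE = SS_yy − b·SS_xy is the one used in the slides.

```python
import math

def error_std_and_r2(n, ss_yy, ss_xy, b):
    """Return (s_e, r_squared) using the shortcut SSE = SS_yy - b*SS_xy."""
    sse = ss_yy - b * ss_xy
    s_e = math.sqrt(sse / (n - 2))   # n - 2 degrees of freedom
    r_squared = b * ss_xy / ss_yy    # share of variation in y explained by x
    return s_e, r_squared

# Values from the food expenditure example.
s_e, r2 = error_std_and_r2(n=7, ss_yy=125.7143, ss_xy=447.5714, b=0.2525)
print(f"s_e = {s_e:.4f}, r^2 = {r2:.3f}")
```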
Coefficient of correlation (COC)

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}, \qquad -1 \le r \le 1 \]
\[ SS_{xx} = 1772.8571, \quad SS_{xy} = 447.5714, \quad SS_{yy} = 125.7143 \]
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{447.5714}{\sqrt{(1772.8571)(125.7143)}} = 0.9481 \]
Interpretation: state the sign (positively or negatively correlated) and the strength (very weak, average/moderate, strong, or very strong).
\( r = 0.9481 \): very strong and positively correlated
Other example:
\( r = -0.1111 \): very weak and negatively correlated
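
As a quick numerical check, the correlation coefficient can be computed directly from the three sums of squares. This sketch uses the values from the food expenditure example. The sign of r always matches the sign of SS_xy, because the denominator is positive.

```python
import math

def correlation(ss_xy, ss_xx, ss_yy):
    """Linear correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r = correlation(ss_xy=447.5714, ss_xx=1772.8571, ss_yy=125.7143)
print(f"r = {r:.4f}, r^2 = {r * r:.3f}")
```

Squaring r gives the coefficient of determination, which offers a second route to the same COD.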
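Hypothesis test (HT) about the slope coefficient, B
\[ \text{Test statistic: } t_{calc} = \frac{b - B}{s_b}, \qquad df = n - 2, \qquad s_b = \frac{s_e}{\sqrt{SS_{xx}}} \]
\[ H_0: B = 0 \]
\[ H_1: B \ne 0 \text{ (two-tailed test)}; \quad B > 0 \text{ (positive) or } B < 0 \text{ (negative) (one-tailed test)} \]
\( \sigma_\varepsilon \) is unknown, so use the t distribution.

Test at the 1% significance level whether the slope of the regression line is positive.
\[ H_0: B = 0, \qquad H_1: B > 0 \text{ (one-tailed test)} \]
\[ \alpha = 0.01, \qquad df = n - 2 = 7 - 2 = 5 \]
\[ s_b = \frac{s_e}{\sqrt{SS_{xx}}} = \frac{1.5939}{\sqrt{1772.8571}} = 0.0379 \]
\[ t_{calc} = \frac{b - B}{s_b} = \frac{0.2525 - 0}{0.0379} = 6.662 \]
\[ t_{critical} = t_{\alpha,\,n-2} = t_{0.01,\,5} = 3.365 \]
\[ t_{critical} = 3.365 < t_{calc} = 6.662 \]
Reject \( H_0 \). There is sufficient evidence to conclude that the slope is positive, that is, food expenditure increases with income.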
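
The critical value approach above can be sketched in Python as follows. The critical value 3.365 is entered by hand, as read from a t table in the example. The function name `slope_t_statistic` is illustrative.

```python
import math

def slope_t_statistic(b, s_e, ss_xx, b_null=0.0):
    """Return (t_calc, s_b) for testing H0: B = b_null."""
    s_b = s_e / math.sqrt(ss_xx)  # standard deviation of b
    return (b - b_null) / s_b, s_b

# Values from the food expenditure example.
t_calc, s_b = slope_t_statistic(b=0.2525, s_e=1.5939, ss_xx=1772.8571)

T_CRITICAL = 3.365               # t(0.01, df = 5), taken from a t table
reject_h0 = t_calc > T_CRITICAL  # right-tailed test, H1: B > 0
print(f"s_b = {s_b:.4f}, t_calc = {t_calc:.3f}, reject H0: {reject_h0}")
```

The code does not round s_b to 0.0379 before dividing, so its t value may differ slightly from the hand-calculated 6.662. The decision is unaffected.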
Regression Analysis: A Complete Example

A random sample of eight drivers, all insured with the same company and holding similar minimum required auto insurance policies, was selected from a small city. The following table lists their driving experience (in years) and monthly auto insurance premiums (in dollars).

[Data table not reproduced.]

(a) Identify the IV and DV. Do you expect a positive or negative relationship?
(b) Compute \( SS_{xx} \), \( SS_{yy} \), and \( SS_{xy} \).
(c) Find the least squares regression line.
(d) Interpret the regression coefficients in (c).
(e) Calculate the COC and COD. Interpret their meanings.
(f) Predict the monthly premium for a driver with 10 years of experience. Comment on the reliability of the estimation.
(g) Compute the standard deviation of errors.
(h) Test at the 5% significance level whether B is negative.
(a) IV: Driving experience; DV: Monthly auto insurance premium.
A negative linear relationship is expected.
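
(b)
\[ \bar{x} = \frac{\sum x}{n} = \frac{90}{8} = 11.25, \qquad \bar{y} = \frac{\sum y}{n} = \frac{474}{8} = 59.25 \]
\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 4739 - \frac{(90)(474)}{8} = -593.5 \]
\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 1396 - \frac{(90)^2}{8} = 383.5 \]
\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 29{,}642 - \frac{(474)^2}{8} = 1557.5 \]

(c)
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{-593.5}{383.5} = -1.5476 \]
\[ a = \bar{y} - b\bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605 \]
\[ \hat{y} = a + bx = 76.6605 - 1.5476x \]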

(d) \( \hat{y} = a + bx = 76.6605 - 1.5476x \)
y-intercept, \( a = 76.6605 \):
A driver with 0 years of driving experience is predicted to pay a monthly premium of $76.66.
Slope coefficient, \( b = -1.5476 \):
For every extra year of driving experience, the monthly premium decreases by $1.55 on average.

(e) COC:
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{-593.5}{\sqrt{(383.5)(1557.5)}} = -0.7679 \]
A moderately strong negative correlation.
COD:
\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(-1.5476)(-593.5)}{1557.5} = 0.5897 \]
Alternative: \( r^2 = (-0.7679)^2 = 0.5897 \)
58.97% of the variation in monthly premium can be explained by driving experience, whereas the remaining 41.03% is due to randomness and other unaccounted factors.

(f) \( \hat{y}(10) = 76.6605 - 1.5476(10) = \$61.18 \)
The estimation is reliable because \( 10 \in (2, 25) \).

(g)
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n-2}} = \sqrt{\frac{1557.5 - (-1.5476)(-593.5)}{8-2}} = 10.3199 \]

(h) \( H_0: B = 0, \qquad H_1: B < 0 \)
\[ \alpha = 0.05, \qquad df = n - 2 = 8 - 2 = 6 \]
\[ t_{calc} = \frac{b - B}{s_b} = \frac{-1.5476 - 0}{10.3199 / \sqrt{383.5}} = \frac{-1.5476 - 0}{0.5270} = -2.937 \]
\[ t_{critical} = -t_{\alpha,\,df} = -t_{0.05,\,6} = -1.943 \]
\[ t_{calc} = -2.937 < t_{critical} = -1.943 \]
Reject \( H_0 \). There is sufficient evidence to conclude that the slope is negative.
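
Parts (b) to (h) of the complete example can be reproduced end to end from the summary sums used in part (b). This is a minimal sketch, not a general regression routine. The critical value is entered by hand from a t table, as in the manual solution.

```python
import math

# Summary sums from part (b): n = 8 drivers, x = experience, y = premium.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 8, 90, 474, 4739, 1396, 29642

# (b) Sums of squares
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

# (c) Least squares line
b = ss_xy / ss_xx
a = sum_y / n - b * sum_x / n

# (e) Correlation, (f) prediction at x = 10, (g) standard deviation of errors
r = ss_xy / math.sqrt(ss_xx * ss_yy)
y_hat_10 = a + b * 10
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))

# (h) Left-tailed test of H0: B = 0 against H1: B < 0
s_b = s_e / math.sqrt(ss_xx)
t_calc = (b - 0) / s_b
T_CRITICAL = -1.943  # -t(0.05, df = 6), taken from a t table

print(f"SSxy = {ss_xy}, SSxx = {ss_xx}, SSyy = {ss_yy}")
print(f"b = {b:.4f}, a = {a:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"y_hat(10) = {y_hat_10:.2f}, s_e = {s_e:.4f}, s_b = {s_b:.4f}, t = {t_calc:.3f}")
print("Reject H0" if t_calc < T_CRITICAL else "Do not reject H0")
```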
The hypothesis test on B can be
performed using the p-value approach,
using the output obtained from
statistical software.
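
To illustrate what the software does behind the scenes, the sketch below approximates a one-tailed p-value by numerically integrating the t density with Simpson's rule. This is a stand-in written for illustration, not Excel's method. The function names `t_pdf` and `upper_tail_p` are hypothetical, and the inputs are the t statistic and degrees of freedom from part (h) of the complete example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """Approximate P(T > |t|) using Simpson's rule on [0, |t|].

    `steps` must be even. The area from 0 to |t| is subtracted from 0.5,
    which is the total area to the right of zero.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Left-tailed test from part (h): t_calc = -2.937 with df = 6.
# By symmetry, P(T < -2.937) equals P(T > 2.937).
p_value = upper_tail_p(-2.937, df=6)
ALPHA = 0.05
print(f"p-value = {p_value:.4f}; reject H0: {p_value < ALPHA}")
```

The decision rule is to reject the null hypothesis when the p-value is smaller than the significance level, which should agree with the critical value approach.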
EXCEL OUTPUT

Source: http://www.excel-easy.com/examples/regression.html
[Slides showing Excel regression output; screenshots not reproduced.]
SUMMARY
• Identify IV (x) and DV (y)
• Calculate SS of xx, yy, and xy
• Determine the best-fit line
• Calculate and interpret regression coefficients
• Calculate and interpret COC and COD
• Estimate and comment on its reliability
• Hypothesis test on B (critical value approach using manual calculation, or p-value approach from the Excel output)
• Finding missing values from the given Excel output
