BCOR 3750 Linear Regression Models

The document discusses measures of association, including covariance and correlation, which assess the linear relationship between two variables. It explains the coefficient of determination (R²) for evaluating regression models and the least squares method for estimating regression equations. Additionally, it covers the significance of coefficients through p-values and confidence intervals to determine the meaningfulness of relationships in regression analysis.

Measures of Association

Covariance
• Measure of linear association between two variables, X and Y.
• Average of the products of the x-deviations and y-deviations from their respective means
• As absolute value of the covariance increases, the strength of the
linear association between X and Y increases
• A positive covariance indicates a direct relationship, while a
negative covariance indicates an inverse relationship between X
and Y
◦ Excel function: = COVARIANCE.P(data range 1, data range 2) or
COVARIANCE.S(data range 1, data range 2)
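As an aside (not from the original slides), the same quantities that Excel's COVARIANCE.P and COVARIANCE.S return can be sketched in Python with NumPy; the x and y data below are made up for illustration:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical X values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical Y values

cov_p = np.cov(x, y, bias=True)[0, 1]     # population covariance (divide by n), like COVARIANCE.P
cov_s = np.cov(x, y, bias=False)[0, 1]    # sample covariance (divide by n-1), like COVARIANCE.S
print(cov_p, cov_s)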
Measures of Association
Correlation
• Measure of linear association between two variables, X and Y.
• Measured by the Pearson correlation coefficient
• Does not depend upon units of measurement: −1 ≤ ρxy ≤ +1
◦ Covar(X,Y) = ρxy σx σy

◦ Excel function: = CORREL(data range 1, data range 2)
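For illustration (a sketch using the same made-up data as above), Python's Pearson correlation matches Excel's CORREL and satisfies the relation Covar(X,Y) = ρxy σx σy:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient, like CORREL
cov_p = np.cov(x, y, bias=True)[0, 1]    # population covariance
r_check = cov_p / (x.std() * y.std())    # Covar(X,Y) / (sigma_x * sigma_y)
print(r, r_check)                        # the two values agree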


Measures of Association
Correlation
• The correlation coefficient is in the range −1 (perfect negative correlation) to +1 (perfect positive correlation)
• If the correlation is close to zero, there is little or no linear relationship between the two variables (a nonlinear relationship may still exist)
Possible Regression Lines in Simple Linear Regression
[Figure: example regression lines illustrating positive, negative, and no linear relationship between X and Y]
Assessing the Fit of a Simple Linear Regression
Model
❖ The Coefficient of Determination (R²):
• Proportion of the variation in Y in the sample that can be explained by the linear relationship in the regression equation
• 0 ≤ R² ≤ 1
❖ How is R² calculated?
• R² = SSR/SST: the ratio of the sum of squares due to regression (SSR) to the total sum of squares (SST) evaluates the goodness of fit of the estimated regression equation
• In simple linear regression, R² also equals (ρxy)², the square of the correlation coefficient (see the sketch below)
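A minimal sketch (hypothetical data) of computing R² as SSR/SST and checking that it equals (ρxy)² in simple linear regression:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)             # least squares slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)    # sum of squares due to regression
r_squared = ssr / sst
print(r_squared, np.corrcoef(x, y)[0, 1] ** 2)   # matches the squared correlation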
Least Squares Method
◦ A procedure for using sample data to find the
estimated regression equation:
ith residual: the error made when using the regression model to estimate the mean value of the dependent variable for the ith observation, denoted eᵢ = yᵢ − ŷᵢ. The least squares estimates minimize the sum of squared residuals:
min Σ(yᵢ − ŷᵢ)² = min Σeᵢ², summing over i = 1 to n

• n is the number of observations included in the sample

➢ The Least Squares Method is performed when you insert a trendline in a scatter plot or use Excel's Data Analysis Regression menu option
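What the trendline/Regression tool computes can be sketched directly from the closed-form least squares formulas (data again hypothetical):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical dependent variable

# closed-form least squares estimates for simple linear regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)                     # residuals e_i = y_i - y_hat_i
print(b0, b1, np.sum(e ** 2))             # intercept, slope, and the minimized sum of squared errors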
The Estimation Process in Simple Linear Regression
Estimation
Sample Statistics are used to estimate Population Parameters
◦ X̄ is used to estimate the population mean, μ
◦ b is used to estimate β
Problem: Different samples provide different estimates of
the population parameter
A sampling distribution describes the likelihood of
different sample estimates that can be obtained from a
population
Comparing the Population with its
Sampling Distribution
[Figure: probability distributions comparing the population (N = 4; values A = 18, B = 20, C = 22, D = 24; μ = 21, σ = 2.236) with the distribution of sample means for samples of size n = 2 (μX̄ = 21, σX̄ = 1.58)]
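The figure's numbers can be reproduced with a short sketch that enumerates every sample of size n = 2 drawn with replacement from the population {18, 20, 22, 24}:

import itertools
import numpy as np

population = np.array([18, 20, 22, 24])                  # values A, B, C, D
samples = list(itertools.product(population, repeat=2))  # all 16 ordered samples of size 2, with replacement
sample_means = np.array([np.mean(s) for s in samples])

print(population.mean(), population.std())       # mu = 21, sigma ~ 2.236
print(sample_means.mean(), sample_means.std())   # mu_xbar = 21, sigma_xbar ~ 1.58 = sigma / sqrt(2)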
Properties of Summary Measures
μX̄ = μ
◦ i.e., X̄ is unbiased
Standard error (standard deviation) of the sampling distribution when sampling with replacement:
σX̄ = σ / √n
◦ As n increases, σX̄ decreases
◦ Sampling more decreases the uncertainty in the estimate for μ
Significant Coefficients
If the dependent variable y does not change when x₁ changes, the true value of the slope would be 0:
H₀: β₁ = 0
To test whether this is true, look at the p-value for an independent variable's coefficient:

              Coefficients    Standard Error   t Stat     P-value
Intercept     1636.414726     451.4953308      3.624433   0.015149
X Variable 1  1.486633657     0.164999212      9.009944   0.000281

The p-value is compared against the significance level α, which is the probability of rejecting H₀ when it is actually true (the probability of a Type I error); a p-value below α leads us to reject H₀.
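The t Stat and P-value for X Variable 1 can be reproduced from the coefficient and its standard error; this sketch assumes the seven observations mentioned on the next slide, so the degrees of freedom are n − 2 = 5:

from scipy import stats

coef = 1.486633657                          # slope estimate b1 from the output above
se = 0.164999212                            # its standard error
df = 7 - 2                                  # n - 2 degrees of freedom (assumed n = 7)

t_stat = coef / se                          # ~ 9.0099
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value ~ 0.000281
print(t_stat, p_value)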
P-value & Significance Level α
The p-value measures the probability of obtaining a sample (and an estimate of b₁) at least as extreme as the one observed if β₁ = 0
In this example, there is only about a 0.03% chance that we would have seen this sample of seven store sales/sizes and estimated our b₁ if there were no relationship between X (sq ft) and Y ($1000 sales) (i.e., β₁ = 0)
The slope for store size is statistically significant at a level below 1%! Lower significance levels are better, because they mean the b₁ estimate cannot easily be attributed to chance.
>> A low p-value indicates that you can reject the null hypothesis that the slope is 0 and conclude that this X is a meaningful addition to your model, since changes in X are associated with changes in Y.
Significant Coefficients at 5% Level
In addition to assessing the p-value, look at the
confidence interval for an independent variable’s
coefficient:
              Coefficients    Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept     1636.414726     451.4953308      3.624433   0.015149   475.80903    2797.02042
X Variable 1  1.486633657     0.164999212      9.009944   0.000281   1.06248968   1.91077763

If the lower and upper limits of the 95% confidence interval have the same sign, 0 is not in the interval => we can reject H₀: β₁ = 0 at the 5% significance level! (See the sketch below.)
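The Lower 95% / Upper 95% limits follow from b1 ± t(0.025, df) × SE; a sketch assuming df = 5 as above:

from scipy import stats

coef = 1.486633657
se = 0.164999212
df = 5

t_crit = stats.t.ppf(0.975, df)                  # ~ 2.571
lower, upper = coef - t_crit * se, coef + t_crit * se
print(lower, upper)                              # ~ 1.0625 and 1.9108; 0 is not in the interval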
