03 Statistics in Regression Analysis

- The document discusses commonly used statistics for regression analysis: R-squared, t-test, and F-test.
- R-squared measures how well the regression line fits the data, ranging from 0 to 1. A higher R-squared indicates a better fit.
- The t-test allows testing if a slope coefficient is statistically different from zero. It compares the t-statistic to the t-critical value.
- The F-test jointly tests if all slope coefficients are simultaneously equal to zero by comparing the F-statistic to the F-critical value.

Uploaded by

Xin Ni
Copyright
© All Rights Reserved
COMMONLY USED STATISTICS FOR REGRESSION ANALYSIS


SEEQ2023
BASIC ECONOMETRIC
STATISTICS IN REGRESSION

• Suppose we run a regression using annual income as the dependent
variable, with age, height and years of education as the independent
variables.
• Let's say the estimated slope coefficient for height is -1000.
• Questions:
– Does this mean taller people tend to make less money?
– How do we know that the -1000 estimate is not just a coincidence?
– What is the chance that, in reality, there is no relationship at all
between height and income?
– Even though we have -1000 as the slope estimate, what is the chance
that the true value of the slope is zero?
– What is the chance that none of our independent variables have a
relationship with income?
STATISTICS IN REGRESSION

• In econometric analysis, there are many statistics we can use to
show that our regression results are robust (a good fit).
• In this course, we will deal with three commonly used statistics:
– The R-squared test.
– The t-test.
– The F-test.
R-squared test

• Also called the coefficient of determination.
• A measure of “goodness of fit” for a regression.
• Goodness of fit: how well the fitted regression line fits a set of data.
• Figure 3.1: if all the observations lay on the regression line, we
would obtain a “perfect fit”, but this is rarely the case.
• Generally, some residuals $\hat{u}_i$ are positive and some are negative.
• We hope that these residuals around the regression line are as small
as possible.
• The coefficient of determination $R^2$ is a summary measure of
how well the sample regression line (SRL) fits the data.
R-squared test
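• To compute $R^2$, proceed as follows. Recall:

$Y_i = \hat{Y}_i + \hat{u}_i$

or in the deviation form, with $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y}$,

$y_i = \hat{y}_i + \hat{u}_i$

• Squaring the equation above on both sides and summing over the
sample:

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$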
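The squaring-and-summing step can be written out as a short derivation. This is a sketch under the usual OLS assumptions: the lowercase symbols denote deviations from the sample mean ($y_i = Y_i - \bar{Y}$, $\hat{y}_i = \hat{Y}_i - \bar{Y}$), and the cross-product term drops out because fitted values and residuals are uncorrelated in a least-squares fit with an intercept. The notation is assumed here, since the slide's own equations did not survive.

```latex
\begin{align*}
\sum y_i^2 &= \sum (\hat{y}_i + \hat{u}_i)^2 \\
           &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2 \sum \hat{y}_i \hat{u}_i \\
           &= \sum \hat{y}_i^2 + \sum \hat{u}_i^2
              \qquad \text{since } \sum \hat{y}_i \hat{u}_i = 0 .
\end{align*}
```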
R-squared test

• Where:

$\sum y_i^2 = \sum (Y_i - \bar{Y})^2$. Total sum of squares (TSS). Total variation of
the actual Y values about their sample mean.

$\sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2$. Explained sum of squares (ESS). Variation of the
estimated Y values about their mean.

$\sum \hat{u}_i^2$. Residual sum of squares (RSS). Residual or
unexplained variation of the Y values about the regression line.
R-squared test

• Thus:
TSS = ESS + RSS

• The total variation in the observed Y values about their mean can
be partitioned into two parts:
– one attributable to the regression line (ESS)
– one attributable to random forces (RSS), because not all actual Y
observations lie on the fitted line.
R-squared test

• Dividing both sides by TSS:

$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$

• We now define $R^2$ as:

$R^2 = \frac{ESS}{TSS}$
R-squared test

• or, alternatively, as

$R^2 = 1 - \frac{RSS}{TSS}$

• $R^2$ defined this way is the (sample) coefficient of determination, a
commonly used measure of the goodness of fit of a regression
line.
• $R^2$ measures the proportion or percentage of the total variation
in Y explained by the regression model.
R-squared test

• Two properties may be noted:

– It is a non-negative quantity.
– Its limits are $0 \le R^2 \le 1$.

• $R^2 = 1$ means a perfect fit.
• $R^2 = 0$ means there is no relationship between the left-hand
variable and the right-hand variable.
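The decomposition TSS = ESS + RSS and the two equivalent formulas for R-squared can be checked numerically. The following is a minimal sketch for a regression with a single regressor, written in plain Python; the data values and the function names `fit_simple_ols` and `sums_of_squares` are made up for illustration and do not come from the slides.

```python
# Illustrative data only (not from the slides):
# X = years of education, Y = annual income in thousands of dollars.
X = [8, 10, 12, 14, 16, 18]
Y = [22, 27, 30, 38, 41, 49]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the fitted simple regression."""
    b0, b1 = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ess = sum((fi - y_bar) ** 2 for fi in fitted)          # explained variation
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # residual variation
    return tss, ess, rss


tss, ess, rss = sums_of_squares(X, Y)
r2_from_ess = ess / tss        # R^2 = ESS / TSS
r2_from_rss = 1 - rss / tss    # R^2 = 1 - RSS / TSS

print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"R^2 (ESS/TSS) = {r2_from_ess:.4f}, R^2 (1 - RSS/TSS) = {r2_from_rss:.4f}")
```

Both formulas should agree up to rounding, and a data set whose points all lie exactly on a line (for example `[1, 2, 3]` against `[2, 4, 6]`) gives RSS = 0 and therefore an R-squared of 1.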
t-test
• The t-test is the most common test in econometrics.
• This test helps us assess the chance that the true value of a slope
is zero.
• If this chance is small, so that the true slope is most likely
something other than zero, then we can put more trust in our
slope estimate.
• The t-test allows us to conduct a separate hypothesis test on
each slope estimate.
• That is, we can test each slope estimate to see whether its true slope
is zero.
t-test : Hypothesis Testing
• Hypothesis testing allows us to draw conclusions about
regression estimates and about the idea we are investigating.
• E.g., the owner of a store wants to know how prices, sales
and advertising affect revenue.
• Model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$

where Y = store revenue (dollars), X1 = price of a set of typical
items (dollars), X2 = sales of a set of typical items (dollars), and X3 =
amount spent on advertising (dollars).
t-test : Hypothesis Testing
• Let's say we want to know whether advertising spending
affects revenue.
• Our hypotheses are (two-sided test), where $\beta_3$ is the slope on advertising (X3):

$H_0: \beta_3 = 0$
$H_A: \beta_3 \neq 0$

• H0 = null hypothesis. Advertising does not significantly affect store
revenue.
• HA = alternative hypothesis. Advertising significantly affects
store revenue.
t-test : Hypothesis Testing

• Suppose instead we want to know whether advertising spending
increases revenue.
• For a one-sided test, our hypotheses are, where $\beta_3$ is the slope on advertising (X3):

$H_0: \beta_3 \le 0$
$H_A: \beta_3 > 0$

• H0 = null hypothesis. Advertising does not significantly increase store
revenue.
• HA = alternative hypothesis. Advertising significantly increases
store revenue.
t-test : Hypothesis Testing

• Suppose instead we want to know whether advertising spending
decreases revenue.
• For a one-sided test, our hypotheses are, where $\beta_3$ is the slope on advertising (X3):

$H_0: \beta_3 \ge 0$
$H_A: \beta_3 < 0$

• H0 = null hypothesis. Advertising does not significantly decrease
store revenue.
• HA = alternative hypothesis. Advertising significantly
decreases store revenue.
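The three forms of the hypothesis lead to three slightly different rejection rules. The sketch below is one way to express them; the function name `reject_null` and the t-statistic of -2.5 are hypothetical, and the critical values 1.980 (two-tailed) and 1.658 (one-tailed) are the standard t-table entries for the 5% level with 120 degrees of freedom.

```python
def reject_null(t_stat, t_crit, alternative):
    """Decide whether to reject H0 about a single slope coefficient.

    alternative:
        "two-sided" -> HA: beta != 0, reject when |t| > t_crit
        "greater"   -> HA: beta > 0,  reject when t > t_crit
        "less"      -> HA: beta < 0,  reject when t < -t_crit
    t_crit is the positive critical value from the t table: the two-tailed
    value for "two-sided", the one-tailed value otherwise.
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "less":
        return t_stat < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical t-statistic, not taken from the slides.
t_stat = -2.5

# Standard t-table values at the 5% level with 120 degrees of freedom.
T_CRIT_TWO_TAILED = 1.980
T_CRIT_ONE_TAILED = 1.658

print(reject_null(t_stat, T_CRIT_TWO_TAILED, "two-sided"))
print(reject_null(t_stat, T_CRIT_ONE_TAILED, "greater"))
print(reject_null(t_stat, T_CRIT_ONE_TAILED, "less"))
```

A negative t-statistic can never reject H0 in favour of "advertising increases revenue", however large it is in absolute value, which is why the direction of the alternative has to be fixed before looking at the estimate.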
t-test : perform the test
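• Now, we want to know whether advertising spending affects
revenue (two-sided test).
• The general formula for the t-statistic:

$t_k = \frac{\hat{\beta}_k - \beta_{H_0}}{SE(\hat{\beta}_k)}$

• or, for the advertising coefficient $\beta_3$ with $\beta_{H_0} = 0$:

$t_3 = \frac{\hat{\beta}_3}{SE(\hat{\beta}_3)}$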

t-test : perform the test
• If the value of $\hat{\beta}_3$ = 1.14 and $SE(\hat{\beta}_3)$ = 0.50, then based on our
two-sided hypothesis, the value of the t-statistic is:

$t_3 = \frac{1.14}{0.50} = 2.28$

• The value 2.28 is called the t-statistic.
• We must now get the t-critical value from the t table to
decide whether we can reject or fail to reject H0 at
the significance level we have chosen.
t-test : perform the test
• If n = 124 and we choose the 5% significance level, the two-sided
t-critical value with n – k = 124 – 4 = 120 degrees of freedom is about 1.98.
• That means |t-statistic| > t-critical at the 5% significance level,
so we can reject H0.
• This means that advertising significantly affects
revenue.
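The worked example can be reproduced in a few lines. This is a sketch: the estimate 1.14, the standard error 0.50 and n = 124 come from the slides, k = 4 is taken from the deck's F-test example, and the two-tailed 5% critical value of 1.980 for 120 degrees of freedom is read from a standard t table rather than computed. The function name `t_statistic` is an assumption for illustration.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return (estimate - hypothesized) / std_error


beta_hat = 1.14   # estimated advertising coefficient (from the slides)
std_err = 0.50    # its standard error (from the slides)
n, k = 124, 4     # observations and coefficients, including the intercept
df = n - k        # degrees of freedom for the t table

# Two-tailed 5% critical value for 120 degrees of freedom (standard t table).
T_CRIT = 1.980

t_stat = t_statistic(beta_hat, std_err)
reject = abs(t_stat) > T_CRIT

print(f"df = {df}, t-statistic = {t_stat:.2f}, t-critical = {T_CRIT}")
print("Reject H0" if reject else "Fail to reject H0")
```

The decision here does not hinge on the third decimal of the table value: any critical value below 2.28 leads to rejecting H0.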
F-test
• Used to test a joint null hypothesis, that is, a hypothesis involving more
than one slope coefficient. (The t-test can be used only on a null
hypothesis involving one slope coefficient.)
• The F-test for the revenue model can be:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_A$: $H_0$ is not true

• H0: all slope coefficients are simultaneously zero.
• HA: at least one of them is not zero (not all slope
coefficients are simultaneously zero).
F-test

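• To calculate the F-statistic:

$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$

• Where k = number of coefficients in the model (including the intercept)
and n = number of observations. (k – 1) = numerator degrees of freedom
and (n – k) = denominator degrees of freedom.
• If F-statistic > F-critical, we reject H0; otherwise, we fail to
reject it.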
F-test

• If n = 124, k = 4, and F-statistic = 65.26

• The F-critical value from the F table at the 5% significance level:
df numerator (4 – 1) = 3
df denominator (124 – 4) = 120
F-critical = 2.68.

• The F-statistic > F-critical.

• This means that H0 is rejected: at least one of the
slope coefficients is not zero.
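The F-test decision can be sketched in code using the standard relationship between the overall F-statistic and R-squared, F = [R²/(k – 1)] / [(1 – R²)/(n – k)]. The slides give F = 65.26 but no R-squared, so the value of about 0.62 recovered below is only what that F-statistic implies, not a figure stated in the source; the function names are assumptions for illustration.

```python
def f_statistic(r_squared, n, k):
    """Overall F-statistic from R^2, with k coefficients including the intercept."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return (r_squared / (k - 1)) / ((1.0 - r_squared) / (n - k))


def implied_r_squared(f_stat, n, k):
    """Invert the formula above: the R^2 that corresponds to a given F."""
    numerator = f_stat * (k - 1)
    return numerator / (numerator + (n - k))


n, k = 124, 4     # from the slides
f_stat = 65.26    # from the slides
F_CRIT = 2.68     # 5% level, df = (3, 120), from the slides' F table lookup

df_num, df_den = k - 1, n - k
r2 = implied_r_squared(f_stat, n, k)
reject = f_stat > F_CRIT

print(f"df = ({df_num}, {df_den}), implied R^2 = {r2:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Writing F in terms of R-squared makes the link between the two statistics explicit: a regression that explains none of the variation (R² = 0) has F = 0, and F grows without bound as R² approaches 1.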
