Non-normality

The linear regression model assumes that the error term ui is normally distributed.
This assumption is critical when the sample size is relatively small, because the commonly
used significance tests, such as t and F, are based on the assumption of normality. The
OLS estimates themselves are not affected by non-normality; the problem is with the inference.

Causes

- Presence of extreme values or outliers

- Insufficient data (there is no rule of thumb, but it is ideal to have more than 30
observations)

- Two or more processes overlapping

- Subset of the main sample

- Data follow some other distribution

How can one detect non-normality?

It is thus important that we check whether the error term is normally distributed.
There are two types of methods:

- Informal methods: Normal Q-Q plot of the residuals (in Gretl, select the residual
variable, click on Variable and then Normal Q-Q Plot). In Figure 1 you can see some
patterns in the normal Q-Q plot: b, c, d and e show a non-normal pattern, while a
shows a normal pattern. Technically, a normal Q-Q plot compares the distribution
of a data set to the normal distribution. The line represents perfect quantile matching:
if the distributions were perfectly matched, all quantile points would lie on the
line. In Figure 2, since most of the points lie on the line (except at the extremes),
there is no reason to worry about normality.

- Formal methods: tests of normality

  – Doornik-Hansen test (χ²-goodness-of-fit test for the normal distribution)
  – Shapiro-Wilk test
  – Lilliefors test (Kolmogorov-Smirnov test for the normal distribution)
  – Jarque-Bera test

The first three tests are studied in the Statistics II course.
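The informal Q-Q check described above can also be reproduced outside Gretl. As a minimal sketch (using simulated data as a stand-in for actual regression residuals), scipy's probplot computes the sample quantiles against theoretical normal quantiles, plus the fitted line through them; points close to that line indicate normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
residuals = rng.normal(0, 1, 200)  # stand-in for regression residuals

# probplot returns (theoretical quantiles, ordered data) and the
# slope/intercept/correlation of the least-squares line through them
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# For (approximately) normal data the quantile points hug the line,
# so the correlation r is close to 1
print(round(r, 3))
```

Passing `plot=plt` (with matplotlib) would draw the same Q-Q plot that Gretl produces.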


The Jarque-Bera test is a large sample test and may not be appropriate in small
samples. The test statistic is given by

JB = n [ S²/6 + (K − 3)²/24 ] ∼ χ²(2)
where n is the sample size, S = skewness coefficient, and K = kurtosis coefficient.
The null hypothesis is the joint hypothesis that S = 0 and K = 3 ⇔ Normal
distribution.
If the computed JB statistic exceeds the critical value χ²₂;₁₋α, we reject the null hypothesis
that the error term is normally distributed; otherwise, we cannot reject it. Obviously, we
can also make the decision with the p-value.
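The formula above can be checked by hand. In this sketch (again with simulated data standing in for residuals), the JB statistic is computed from the sample skewness and kurtosis and compared with scipy's built-in test, which uses the same formula:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
u = rng.normal(0, 1, 500)  # stand-in for regression residuals

n = len(u)
S = stats.skew(u)                      # skewness coefficient (normal -> 0)
K = stats.kurtosis(u, fisher=False)    # kurtosis coefficient (normal -> 3)
JB = n * (S**2 / 6 + (K - 3)**2 / 24)  # the formula above

stat, pvalue = stats.jarque_bera(u)
# The hand-computed statistic should match scipy's; compare the statistic
# with the chi-square(2) critical value, or simply use the p-value
print(round(JB, 4), round(stat, 4))
```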

Possible solutions

- Remove outliers

- Increase the sample size

- Take transformations of the variables (logarithms, squares, square roots...)

- Consider other regression models that take into account the lack of normality
(Generalized linear models, GLM)
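The transformation remedy can be illustrated with a small sketch (the lognormal sample here is a hypothetical stand-in for right-skewed residuals or variables): the Shapiro-Wilk test rejects normality for the raw data but not, typically, for its logarithm:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # heavily right-skewed

# Shapiro-Wilk on the raw data vs. its log transform
_, p_raw = stats.shapiro(x)
_, p_log = stats.shapiro(np.log(x))

# p_raw is tiny (normality rejected); log(x) is exactly normal here,
# so p_log should usually be well above the usual significance levels
print(p_raw < 0.05, p_log > p_raw)
```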

And if we still have problems with the normality assumption, what do we do?

That is up to you. If you still find (minor) problems in the residuals after following all
the above recommendations, you have to decide how accurate you want your model to be.
George Box said: "Basically all models are wrong, but some are useful". So, in general, a
decent model is better than no model at all.
Figure 1: Patterns in a normal Q-Q plot

Figure 2: Example of a normal Q-Q plot
