Quantitative Analysis
Solomon Tsehay (PhD)
Nominal and ordinal scales
• A nominal scale classifies elements into two or more categories.
• It indicates only that the elements are different, not that they differ in order or magnitude.
• Nominal data reflect classification characteristics but do not indicate any mathematical or quantitative differences.
• When data are nominal, it is meaningless to compute means, standard deviations, correlation coefficients, etc.
• On the other hand, an ordinal scale possesses the property of magnitude.
• It classifies scores by the algebra of inequalities (a < b < c), i.e., the categories are distinct (a ≠ b, b ≠ c, etc.) and their order is meaningful.
• Ordering, ranking, or rank ordering is involved. Examples: the ranking of people by height, weight, etc.
• The median, rank-order correlations, and percentiles can be applied.
• In ordinal scales, the numbers attached to values might indicate a ranking or
ordering of the values
• All purely qualitative data are nominal.
• All categorical data in which there is no difference in value among the categories are nominal.
• All categorical data in which there is an implied ranking (for example, high-medium-low) are ordinal.
• Any question that asks respondents to rank-order responses is also ordinal.
Descriptive Analysis
When two random variables X and Y are not independent, it is frequently of interest to
assess how strongly they are related to one another.
If both variables tend to deviate in the same direction (both go above their means or below their
means at the same time), then the covariance will be positive.
If the opposite is true, the covariance will be negative.
If X and Y are not strongly related, the covariance will be near 0.
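In symbols, using the standard definition (the formula itself is not stated in these notes):
$$\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E[XY] - \mu_X \mu_Y$$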
Correlation coefficient
• The units of covariance Cov(X, Y) are "units of X times units of Y". This makes covariances hard to compare across variables.
• If we change scales, the covariance changes as well. Correlation is a way to remove the scale from the covariance.
• Definition: The correlation coefficient between X and Y is defined by
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
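As an illustration, both quantities can be computed in Stata (x and y are placeholder variable names, not from these notes):

    correlate x y, covariance   // variance-covariance matrix of x and y
    correlate x y               // scale-free correlation matrix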
Normal distribution
[Figure: normal curve; approximately 95% of values lie within two standard deviations of the mean and approximately 99.7% within three]
A normal distribution is fully described by its two parameters, µ and σ². We can find the probability of X lying within a certain interval from the mathematical formula.
A linear combination (function) of two (or more) normally distributed random variables is itself normally distributed.
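The mathematical formula in question is the normal density; in standard notation:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad P(a \le X \le b) = \int_a^b f(x)\,dx$$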
[Figure: PDF and CDF of the standard normal distribution, with a central area of 0.9050 and tail areas of 0.0475 on each side]
T TEST
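A minimal Stata sketch of the common forms of the t test (x, y, and group are placeholder names):

    ttest x == 0         // one-sample test of H0: mean(x) = 0
    ttest x, by(group)   // two-sample test comparing means across two groups
    ttest x == y         // paired test of two variables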
CHI SQUARE TEST
Definition:
$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}$$
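A minimal Stata sketch of the chi-square test of independence (row and col are placeholder variable names):

    tabulate row col, chi2 expected   // cross-tabulation with chi-square test and expected counts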
Equality of variances
• If pairs of samples are taken from a normal population, the ratio of the variances of the samples in each pair will always follow the same distribution.
• Sample variances collected in a number of different ways follow this same distribution, the F-distribution.
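A minimal Stata sketch of the variance-ratio (F) test (x, y, and group are placeholder names):

    sdtest x, by(group)   // test of equal variances across two groups
    sdtest x == y         // compare the variances of two variables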
One-way ANOVA
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_c$$
$$H_1: \text{Not all } \mu_j \text{ are the same}$$
[Figure: distributions of groups 1, 2, and 3 when H0 is true versus when H1 is true]
• Test Statistic:
$$F = \frac{MSA}{MSW}$$
• MSA is the mean square among groups
• MSW is the mean square within groups
• Degrees of Freedom:
$$df_1 = c - 1, \qquad df_2 = n - c$$
Summary Table
Source of Variation | Degrees of Freedom | Sum of Squares  | Mean Squares (Variance) | F Statistic
Among (Factor)      | c - 1              | SSA             | MSA = SSA/(c - 1)       | MSA/MSW
Within (Error)      | n - c              | SSW             | MSW = SSW/(n - c)       |
Total               | n - 1              | SST = SSA + SSW |                         |
SUMMARY
Application using STATA
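One possible Stata illustration of the one-way ANOVA above (y and group are placeholder names):

    oneway y group, tabulate   // ANOVA summary table plus per-group summary statistics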
Regression
• The multiple linear regression model,
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$
is a linear combination of the measurements that are used to make predictions, plus a constant.
• $\beta_0$ is the intercept and $\beta_j$ is the slope for the jth variable $X_j$, which is the average increase in Y when $X_j$ is increased by one unit and all other X's are held constant.
Assumptions for estimation of parameters using multiple linear
regression
• The regression model is linear in the coefficients and the error term
• The error term has a population mean of zero
• No correlation between the error term and the independent variables
• Observations of the error term are uncorrelated with each other
• The error term has a constant variance (no heteroscedasticity)
• The error term is normally distributed
• The RSS is also called the sum of squared errors (SSE), where
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• We see that the MLE for β is the one that minimizes the RSS. Thus, we estimate the parameters using ordinary least squares (OLS), which is identical to the MLE: choose $\beta_0$ through $\beta_k$ so as to minimize the RSS.
• (The details of the estimation will be discussed in class)
Accuracy of the Model: R²
• The proportion of variability in Y that can be explained using X:
$$R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
• Total sum of squares (TSS) measures the total variance in the response Y.
• It is thought of as the amount of variability inherent in the response before the regression
is performed.
• Note that RSS measures the amount of variability that is left unexplained after performing the regression. R² is always between 0 (no fit) and 1 (perfect fit).
F test
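Assuming this heading refers to the overall significance test, the standard statistic (in terms of the TSS and RSS defined above) is:
$$F = \frac{(TSS - RSS)/k}{RSS/(n - k - 1)} \sim F_{k,\,n-k-1} \quad \text{under } H_0: \beta_1 = \cdots = \beta_k = 0$$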
Tests on Individual Regression Coefficients
• For the individual regression coefficient:
• H0: βj = 0
• H1: βj ≠ 0
• Let $C_{jj}$ be the j-th diagonal element of $(X'X)^{-1}$. The test statistic is
$$t_0 = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$$
• This test assesses the contribution of $x_j$ given the other regressors in the model
Example
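A minimal Stata sketch; the t column of the output corresponds to the $t_0$ statistic above (y, x1, x2 are placeholder names):

    regress y x1 x2   // OLS fit; reports a t statistic and p-value for each coefficient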
Post Estimation Tests
Multicollinearity…
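A common Stata diagnostic, assumed here to be the intended check, is the variance inflation factor, run immediately after regress:

    regress y x1 x2
    estat vif         // VIF above 10 is a common rule-of-thumb flag for multicollinearity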
Autocorrelation
• The AR(1) model of autocorrelation assumes that the disturbance in time period t (the current period) depends upon the disturbance in time period t − 1 (the previous period), as formalized below.
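In symbols, the AR(1) disturbance process is
$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad |\rho| < 1$$
In Stata, after declaring the data as time series with tsset, estat bgodfrey following regress provides a test for such serial correlation.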
• Model selection
• Use AIC and BIC
• The model with the lowest AIC and BIC is the best model
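In Stata, AIC and BIC for the most recently fitted model are reported by estat ic (y, x1, x2 are placeholder names):

    regress y x1 x2
    estat ic          // reports AIC and BIC for the fitted model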
Omitted variables
• Regress y on the $X_i$s plus the square, cube, and fourth power of the fitted values $\hat{y}$
• Use an F test on the added powers
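The procedure described is the Ramsey RESET test, which Stata implements directly after regress (placeholder names):

    regress y x1 x2
    estat ovtest      // RESET: F test on powers of the fitted values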
Application using STATA
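One possible sketch for this application, also checking the constant-variance assumption listed earlier (placeholder names):

    regress y x1 x2
    estat hettest     // Breusch-Pagan test of the constant-variance assumption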
Dummy variables model
Application using STATA
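A minimal sketch using Stata's factor-variable notation (wage, female, and educ are placeholder names):

    regress wage i.female educ   // i.female enters female as a 0/1 dummy variable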