
Applied Quantitative Analysis

Solomon Tsehay(PhD)
Quantitative Analysis

• We use numbers to examine research problems.
• We collect data and estimate parameters to describe variables and establish relationships among them.
• Using a sample, we make inferences about the population.
• The data type determines the kind of quantitative analysis we deploy.
• The quality of your data is imperative for producing sound parameter estimates.
• Parameters enable us to characterize the behavior of your data.
• A variable is a concept or perception that takes on different values and that can be measured.
• There are different types of variables: independent, dependent, extraneous and intervening.
• Independent variables are responsible for bringing about change in a phenomenon or situation.
• A dependent variable is a variable that depends on other variables: the outcome of the changes brought about by changes in an independent variable.
• An intervening variable links the independent and dependent variables; it acts as a bridge between them.
• A continuous variable is a variable that can assume any value within a given range.

Nominal and ordinal scales
• A nominal scale classifies elements into two or more categories.
• It indicates that the elements are different, but not according to order or magnitude.
• Nominal data reflects classification characteristics but does not indicate any mathematical or quantitative differences.
• When the data is nominal, it is meaningless to compute means, standard deviations, correlation coefficients, etc.
• An ordinal scale, on the other hand, possesses the property of magnitude.
• It classifies scores using the algebra of inequalities (a < b < c), i.e., the categories can be ranked against one another.
• Ordering, ranking, or rank ordering is involved. Examples: ranking people by height, weight, etc.
• The median, rank-order correlations, and percentiles can be applied.
• In ordinal scales, the numbers attached to values indicate a ranking or ordering of the values.
• All qualitative research is nominal.

• All categorical data in which there is no difference in value among the categories is nominal.

• All categorical data in which there is an implied ranking (for example, high-medium-low) is ordinal.

• Any question that asks respondents to rank order responses is also ordinal.

• Any question in which the response is a number, or can be interpreted as a number with equal intervals between the data points, is ratio.

Descriptive Analysis
When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another. The covariance Cov(X, Y) = E[(X − µX)(Y − µY)] measures this:

• If both variables tend to deviate in the same direction (both go above their means or below their means at the same time), the covariance will be positive.
• If the opposite is true, the covariance will be negative.
• If X and Y are not strongly related, the covariance will be near 0.
Correlation coefficient
• The units of covariance Cov(X, Y) are "units of X times units of Y". This makes covariances hard to compare.
• If we change scales, the covariance changes as well. Correlation is a way to remove the scale from the covariance.
• Definition: the correlation coefficient between X and Y is ρ = Cov(X, Y) / (σX σY).
• Equivalently, ρ is the covariance of the standardizations of X and Y.
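As an illustration with made-up numbers, covariance and correlation can be computed with NumPy's built-in estimators `np.cov` and `np.corrcoef`:

```python
import numpy as np

# Hypothetical paired observations of X and Y
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 9.0, 11.0])

# Sample covariance: in "units of X times units of Y"
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation removes the scale: the covariance of the standardized variables
rho = np.corrcoef(x, y)[0, 1]

print(cov_xy)          # 13.0 (positive: X and Y deviate in the same direction)
print(round(rho, 4))   # close to 1: a strong positive linear relationship
```

Note that rescaling either variable (say, measuring X in centimeters instead of meters) would change the covariance but leave ρ unchanged.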


Standard or z score
• A z score indicates distance from the mean in standard deviation units.
• Formula: z = (X − X̄) / S for a sample, or z = (X − µ) / σ for a population.
• Converting to standard or z scores does not change the shape of the
distribution: standardizing does not make a non-normal distribution normal.
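A quick sketch with hypothetical scores: after standardizing, the z scores have mean 0 and standard deviation 1, but the relative spacing (the shape) of the data is unchanged:

```python
import numpy as np

# Hypothetical test scores
scores = np.array([50.0, 60.0, 70.0, 80.0, 90.0])

# z = (X - mean) / standard deviation (sample SD here)
z = (scores - scores.mean()) / scores.std(ddof=1)

# The transformation is linear, so equally spaced scores stay
# equally spaced: the shape of the distribution does not change.
print(np.round(z, 3))
```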
• There are four distributions which are widely used in statistics:
• The normal distribution
• The t distribution
• The chi-square (χ²) distribution
• The F distribution
–  –  –   2 
68% (approx.)

95% (approx.)

99.7% (approx.)

The normal distribution curve is symmetrical around its mean value µX


Normal distribution

A normal distribution is fully described by its two parameters, µX and σ². From the mathematical formula we can find the probability of X lying within a certain interval.

A linear combination (function) of two (or more) normally distributed random variables is itself normally distributed.
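For instance, such an interval probability can be computed from the normal CDF. A minimal sketch using only the standard library (`math.erf`), with made-up parameters µ = 100 and σ = 15:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), expressed via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical X ~ N(100, 15^2): probability of lying within mu +/- 1.96*sigma
mu, sigma = 100.0, 15.0
p = normal_cdf(mu + 1.96 * sigma, mu, sigma) - normal_cdf(mu - 1.96 * sigma, mu, sigma)
print(round(p, 3))  # approximately 0.95
```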
[Figure: PDF and CDF of the normal distribution, showing a shaded central area of 0.9050 with 0.0475 in each tail.]
TTEST

CHI SQUARE TEST
Definition:
χ² = Σ (Observed Count − Expected Count)² / Expected Count
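A small goodness-of-fit sketch with invented counts (100 rolls of a die tested against a fair-die expectation):

```python
# Chi-square goodness-of-fit: 100 hypothetical die rolls
# tested against the expected counts for a fair die.
observed = [18, 22, 16, 14, 12, 18]
expected = [100 / 6] * 6

# chi2 = sum over categories of (Observed - Expected)^2 / Expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = 6 - 1 = 5; the 5% critical value is about 11.07, so this
# sample gives no evidence against the fair-die hypothesis.
print(round(chi2, 2))
```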
Equality of variances

• If pairs of samples are taken from a normal population, the ratios of the variances of the samples in each pair will always follow the same distribution.
• Sample variances collected in a number of different ways follow this same distribution, the F-distribution.
H0: µ1 = µ2 = … = µc
H1: Not all µj are the same

(For example, with c = 3 groups, H0 states µ1 = µ2 = µ3; under H1 at least one group mean differs.)
• Test Statistic:

  F = MSA / MSW

• MSA is the mean square among groups
• MSW is the mean square within groups

• Degrees of Freedom:

  df1 = c − 1
  df2 = n − c
Summary Table

Source of Variation | Degrees of Freedom | Sum of Squares  | Mean Squares (Variance) | F Statistic
Among (Factor)      | c − 1              | SSA             | MSA = SSA/(c − 1)       | MSA/MSW
Within (Error)      | n − c              | SSW             | MSW = SSW/(n − c)       |
Total               | n − 1              | SST = SSA + SSW |                         |
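The computations in the table can be sketched by hand for three hypothetical groups (all numbers invented):

```python
# One-way ANOVA by hand for c = 3 hypothetical groups (n = 12 total)
groups = [
    [6.0, 8.0, 4.0, 5.0],
    [8.0, 12.0, 9.0, 11.0],
    [13.0, 9.0, 11.0, 8.0],
]

n = sum(len(g) for g in groups)
c = len(groups)
grand_mean = sum(sum(g) for g in groups) / n

# Among-group (SSA) and within-group (SSW) sums of squares
ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msa = ssa / (c - 1)        # df1 = c - 1
msw = ssw / (n - c)        # df2 = n - c
f_stat = msa / msw
print(round(f_stat, 2))
```

Note that SSA + SSW reproduces SST, the total sum of squared deviations from the grand mean, exactly as in the summary table.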
SUMMARY
Application using STATA
Regression

• The multiple linear regression model is Y = β0 + β1X1 + β2X2 + … + βkXk + ε.
• This is a linear combination of the measurements that are used to make predictions, plus a constant.
• No matter the source of the Xj's, the model is linear in the parameters.
• β0 is the intercept and βj is the slope for the jth variable Xj, which is the average increase in Y when Xj is increased by one unit and all other X's are held constant.
Assumptions for estimation of parameters using multiple linear regression

• The regression model is linear in the coefficients and the error term
• The error term has a population mean of zero
• All independent variables are uncorrelated with the error term
• Observations of the error term are uncorrelated with each other (no autocorrelation)
• The error term has a constant variance (no heteroscedasticity)
• The error term is normally distributed

• (The details will be discussed in class)


• Note that RSS stands for the residual sum of squares: RSS = Σi (yi − ŷi)².

• The RSS is also called the sum of squared errors (SSE).

• The MLE for the coefficients is the one that minimizes the RSS. Thus, we estimate the parameters using ordinary least squares (OLS), which under normal errors is identical to the MLE: choose β0 through βk so as to minimize the RSS.
• (The details of the estimation will be discussed in class)
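A minimal OLS sketch (with invented data) using NumPy's least-squares solver, which chooses the coefficients that minimize the RSS:

```python
import numpy as np

# Hypothetical data: Y depends roughly linearly on one regressor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.0, 5.9, 8.2, 9.8, 12.1])

# Design matrix with an intercept column; OLS = minimize the RSS
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta
rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1.0 - rss / tss

print(np.round(beta, 3))   # [intercept, slope]
print(round(r2, 4))
```

The same fit is what Stata's `regress` command reports; the quantities `rss`, `tss` and `r2` anticipate the R² discussion below.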
Accuracy of the Model: R²
• R² is the proportion of variability in Y that can be explained using X: R² = 1 − RSS/TSS.

• The total sum of squares, TSS = Σi (yi − ȳ)², measures the total variance in the response Y.
• It can be thought of as the amount of variability inherent in the response before the regression is performed.
• RSS measures the amount of variability that is left unexplained after performing the regression.
• R² is always between 0 (no fit) and 1 (perfect fit).
F test
Tests on Individual Regression Coefficients
• For an individual regression coefficient:
• H0: βj = 0
• H1: βj ≠ 0
• Let Cjj be the j-th diagonal element of (X′X)⁻¹. The test statistic is

  t0 = β̂j / √(σ̂² Cjj) = β̂j / se(β̂j) ~ t(n − k − 1)

• This is a partial or marginal test because the estimate of any regression coefficient depends on all of the other regression variables.

• This test is a test of the contribution of xj given the other regressors in the model.
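The statistic can be computed directly from the matrix formulas above; a sketch with invented data (in practice Stata's `regress` output reports these t statistics automatically):

```python
import numpy as np

# t test on an individual OLS coefficient:
#   t0 = beta_hat_j / se(beta_hat_j),
#   se(beta_hat_j) = sqrt(sigma2_hat * C_jj), C_jj = j-th diagonal of (X'X)^-1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.8])

X = np.column_stack([np.ones_like(x), x])   # n x (k+1), here k = 1
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - p)        # residual variance, df = n - k - 1

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
t_stats = beta / se                          # one t statistic per coefficient
print(np.round(t_stats, 2))
```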

Example
Post Estimation Tests
Multicollinearity…
Autocorrelation

• AR(1) autocorrelation assumes that the disturbance in time period t (the current period) depends upon the disturbance in time period t − 1 (the previous period): ut = ρu(t−1) + εt.

• The DW test is a measure of first-order autocorrelation.

• The DW test is constructed to test the null and alternative hypotheses regarding the temporal autocorrelation coefficient ρ.
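The DW statistic itself is easy to compute from regression residuals; a sketch with invented residual series (a value near 2 suggests no first-order autocorrelation, near 0 positive, near 4 negative):

```python
import numpy as np

def durbin_watson(residuals):
    """DW statistic: d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Slowly decaying residuals: strong positive autocorrelation, d well below 2
e_pos = np.array([1.0, 0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1])
# Alternating residuals: strong negative autocorrelation, d close to 4
e_neg = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

print(round(durbin_watson(e_pos), 2))
print(round(durbin_watson(e_neg), 2))
```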
Heteroscedasticity
Breusch Pagan test
• Ignoring heteroscedasticity, apply OLS to the original model; then regress the squared residuals on the independent variables. Under the null of homoscedasticity, n·R² from this auxiliary regression follows a χ² distribution.
Omitted Variables

• Model selection:
• Use AIC and BIC
• The model with the lowest AIC and BIC is the best model

Omitted variables test (Ramsey RESET):
• Regress y on the Xi's plus the square, cube, and fourth power of the fitted values ŷ
• Use an F test on the added terms
Application using STATA
Dummy variables model
Application using STATA
