Research Methodology in Finance MGMG 522: Session #1: Charn Soranakom, PH.D
1-1
Steps in Writing a Research Paper
1. Select a topic
2. Review the relevant literature
1-3
You should also observe the trend in research
by studying recently published articles in the
top journals.
That way you will learn about the trend, i.e.,
which hot topics are being studied these
days.
You should not spend your valuable time
studying and writing on “dead” topics.
These are some of the leading journals in the
field of Finance.
– Journal of Finance (JF)
– Journal of Business (JB)
– Journal of Financial and Quantitative Analysis (JFQA)
– Review of Financial Studies (RFS), and
– Journal of Financial Economics (JFE)
– [See, “Finance Journal Rankings”, 2001]
1-4
Other Sources for Journal Articles
Social Science Research Network (SSRN) has
downloadable working papers, forthcoming
papers, and published papers at www.ssrn.com
Journal of Finance gives a free preview of the
forthcoming papers at
www.afajof.org/journal/forthcoming.asp
Review of Financial Studies allows you to view
abstracts of past and current papers at
rfs.oxfordjournals.org
Charles A. Dice Center has a list of journal
websites, which can be found at
www.cob.ohio-state.edu/fin/journal/jofsites.htm
1-5
Research Areas in Finance
This is a list of possible topics (not comprehensive).
Mergers and Acquisitions
Mutual Fund Performance
Ownership Structure
Dividend Policy
Share Repurchases
Earnings and Stock Returns
Capital Structure
Efficient Markets (using Event Studies)
Investment Techniques
Derivatives
IPO
Insider Trading
Portfolio Management
Behavioral Finance
Market Microstructure
1-6
2. Literature Review
In this step, you will try to form a
knowledge base.
This step will require a lot of reading.
1-9
5. Select Your Tool
Almost all of your analysis will be performed
using a statistical or econometric software
package.
– SPSS, Minitab, EViews, Limdep, and SAS
You should be familiar with the software package
you use.
Generally, the software package will not make a
mistake. It is usually the person who makes the
mistake.
Software packages differ from one another.
If you don’t understand something,
you should consult the manual. It is
dangerous to use a tool when you don’t know how it
operates.
If you input bad or erroneous data,
you will not get the correct result. Remember
GIGO (garbage in, garbage out).
1-10
6. Do the Analysis
When you have a big data set or many data
files, you should have or develop a system to
keep track of your data files.
Remember to back up your data.
You should make printouts (hard copies)
periodically. Better safe than sorry. If
you lose your files to a power outage,
data corruption, or a virus, you can still recover
your data from the printouts in the unlikely
event that your backup files also fail.
Keep track of any changes you have made.
Date or number your documents.
Don’t count on your memory. Write things down instead.
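The advice to back up and date your files can be automated. The following is a minimal sketch in Python, not part of the course material: the function name `backup_with_timestamp`, the file name `returns.csv`, and its contents are all hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_with_timestamp(data_file: Path, backup_dir: Path) -> Path:
    """Copy data_file into backup_dir under a date-stamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{data_file.stem}_{stamp}{data_file.suffix}"
    shutil.copy2(data_file, target)  # copy2 also keeps the file's timestamps
    return target


# Usage on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "returns.csv"
    source.write_text("date,ret\n2001-01-02,0.01\n")
    copy = backup_with_timestamp(source, Path(tmp) / "backups")
    copy_matches = copy.read_text() == source.read_text()
    print(copy.name, copy_matches)
```

Because every copy carries its own time stamp, earlier versions are never overwritten, which also gives you a simple record of when each change was made.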
1-11
7. Interpret the Results
The analysis you have performed will need
to be converted into words that explain
what you’ve discovered.
This part draws heavily on your
communication skills. A good paper
explains the results clearly to the
audience.
Be clear, concise, and complete.
1-14
Types of Research Papers
1. Theoretical Paper
This type of paper is concerned with
developing a model to explain economic or
financial phenomena. It usually requires a
set of assumptions and is involved with
mathematical models. Some of these models
can be tested empirically, while some cannot.
2. Empirical Paper
This type of paper requires data for the
analysis. The relationships and explanations
have already been developed or known in
previous papers. Your job is to gather the
data and test such relationships.
1-15
General Format of an Empirical Paper
1. Introduction and Literature review
– Tell the author’s motivation for doing this research.
– Review the literature (what has been done or found).
– Tell how this paper is different from the others.
2. Hypothesis development
– This part usually refers to or restates previous findings.
– Tell what is to be tested in this paper and what theory backs up the
hypothesis.
3. Data and Analysis
– Tell where the data came from and the tools for data analysis.
4. Results
– State your results or what you find in your analysis.
– This part should answer your hypothesis stated earlier in the
“Hypothesis Development” section of your paper.
5. Conclusion
– This is a very important part of your paper.
– It should briefly summarize your whole paper.
– It should describe your paper in greater detail than your abstract.
– Some authors also provide
– a direction for future research
– a word of caution or limitations
1-16
General Format of a Theoretical Paper
1. Introduction and Literature review
– Similar to that of an empirical paper.
2. Assumptions of the model
– There are many factors that influence or explain
economic behaviors.
– We need to discard what is not important and
focus only on what is under study.
– Therefore, we need to state some assumptions to
exclude the things we don’t consider.
3. The Model
– Explain economic behaviors with relationships,
mathematical formulas, or theory.
4. Results
– State your results.
5. Conclusion
– similar to that of an empirical paper.
1-17
Types of Data
Primary data
– Data you collect yourself from interviews or
observations.
Secondary data
– Data that already exist in print or electronic format.
– Some are freely available, some are available from data
vendors and require subscription or per-usage fees.
– Examples of data in electronic format
CompuStat (from S&P)
DataStream
CRSP (from the Center for Research in Security Prices, University of
Chicago)
SET SMART (from Stock Exchange of Thailand)
– Examples of data in print format
SET publications
Newspaper
Magazines
Company annual reports to shareholders
Data can also be categorized as time-series or cross-
sectional.
1-18
SET SMART
A web-based database
There are two versions
– Investor version (available for purchase): Non real-time
data, plus limited historical data
– Enterprise version: Only historical data, dating back to
1975
Real-time price data are available free of charge
at www.set.or.th
Enterprise version can only be accessed from our
campus at http://set.oz.cmmu.net (may require
initial set-up)
1-19
Other Considerations
Plagiarism
Referencing system and format
(Ch. 1 & 2)
1-21
What is Econometrics?
Econometrics tries to model and
quantify actual economic
phenomena.
Economics provides us with the
theory that is abstract in nature.
Econometrics uses real data and
quantifies the theory.
1-22
Uses of Econometrics
1. To model economic phenomena
– What X’s explain Y?
– How X’s affect Y and what are their magnitudes?
– For example,
Economic theory: Demand = f(Price, Income)
Econometrics: Demand = 100-0.5*Price+5*Income
2. To test hypotheses about economic theory
– How strong is the relationship between each X and Y?
– Is it significant?
– Is it what we expected?
– Is the model a good fit overall?
3. To forecast the future
– What is the value of Y, given X’s?
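As a small illustration of the forecasting use, the estimated demand equation from the example above can be evaluated for given values of the X’s. This is a sketch only: the price and income values are hypothetical, and the coefficients are the illustrative ones from the slide, not estimates from real data.

```python
def demand(price: float, income: float) -> float:
    """Estimated equation from the slide: Demand = 100 - 0.5*Price + 5*Income."""
    return 100 - 0.5 * price + 5 * income


# Forecast Y for hypothetical values of the X's.
forecast = demand(price=20, income=10)
print(forecast)  # 100 - 10 + 50 = 140.0

# The coefficient on Price is the change in Demand for a one-unit
# change in Price, with Income held fixed.
price_effect = demand(price=21, income=10) - demand(price=20, income=10)
print(price_effect)  # -0.5
```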
1-23
Econometric Approaches
Given a data set to work with, we
could use many different approaches
to come up with a model or
equation.
One of many econometric
approaches is regression analysis.
By far, regression analysis is the
most widely used approach.
1-24
What is Regression Analysis?
Regression analysis is a statistical method
that tries to specify a single equation that
describes what, how, and how much X’s
affect Y.
Y is called a dependent variable.
X’s are called independent variables.
Although Y is called a dependent variable,
it doesn’t mean that Y is dependent on
X’s, or X’s cause Y.
Regression analysis does not imply cause
and effect or causality.
1-25
Single-Equation Linear Models
A single-equation linear model takes the form,
Y = β0 + β1X
It is called “single-equation” because there are no
other equations that specify relationships
between X and Y.
It is called “linear” because X is linearly related to
Y. In other words, if you were to plot a graph
between X and Y, you would get a straight line.
β0 is called the intercept. It tells the value of Y if
X is zero. We normally do not care much about
β0 .
Our interest is mainly in β1, which is called the
coefficient. It measures the direction and the
magnitude of the change in Y if X changes by one
unit.
1-26
What does “linear” mean?
A regression equation can be linear in coefficients or
linear in variables.
Linear in coefficients: all β’s must be in their simplest
form (sometimes, after a transformation).
Linear in variables: all X’s must be in their simplest
form (sometimes, after a modification). A plot of X
against Y must be a straight line.
Q: Which of the following are linear in coefficients
and which are linear in variables?
Y = β0 + β1X
Y = β0 + β1lnX
lnY = β0 + β1X
Y = β0 + β1X1 + β2X2 + β3X1X2
Y = e^β0 · X1^β1 · X2^β2
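A worked transformation may help with the last equation in the list. Assuming it is the multiplicative form with β0, β1, and β2 as exponents (a reading of the slide, not something it states in words), taking natural logs of both sides gives an equation that is linear in coefficients:

```latex
\begin{aligned}
Y &= e^{\beta_0} X_1^{\beta_1} X_2^{\beta_2} \\
\ln Y &= \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2
\end{aligned}
```

This is what “sometimes, after a transformation” means: the equation is not linear in coefficients as written, but it becomes so once Y, X1, and X2 are replaced by their logs.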
1-27
What do we mean by “linear” regression?
We mean that the regression equation
must be “linear in coefficients”.
The regression equation DOES NOT have
to be “linear in variables.”
Therefore, our regression equation can be,
ln Y = β0 + β1X³, and still satisfy the
linearity in coefficients condition.
Linearity in coefficients condition will
always be satisfied if you can write the
regression equation in the form,
f(Y) = β0 + β1f(X).
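The point can be checked numerically. The sketch below assumes hypothetical, noise-free data generated from ln Y = 1 + 0.5·X³, transforms both variables, and then applies the closed-form OLS estimates for a single regressor to the transformed data. The helper name `ols_simple` and all numbers are illustrative assumptions.

```python
import math


def ols_simple(x, y):
    """Closed-form OLS intercept and slope for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data that are strongly nonlinear in the raw variables.
x_raw = [0.5, 1.0, 1.5, 2.0, 2.5]
y_raw = [math.exp(1.0 + 0.5 * x ** 3) for x in x_raw]

# Transform: f(Y) = ln Y and f(X) = X^3, then run an ordinary linear fit.
f_y = [math.log(y) for y in y_raw]
f_x = [x ** 3 for x in x_raw]
b0, b1 = ols_simple(f_x, f_y)
print(round(b0, 6), round(b1, 6))  # close to 1.0 and 0.5
```

The regression never sees X or Y directly, only the transformed columns, which is why linearity in variables is not required.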
1-28
Stochastic or Random Error Term
Usually the relationship between X and Y is not
exact.
So, there will usually be some variation in Y that
cannot be explained completely by X.
Causes of this random error are
– Omitted variables,
– Measurement error,
– Incorrect functional form, or
– Purely random error.
Regression analysis only allows an additive
random error term like, Y = β0 + β1X + ε.
Regression analysis does not allow a
multiplicative random error term like, Y = β0 +
β1X + εX.
1-29
About a Regression Equation
A typical regression equation is written as,
Y = β0 + β1X + ε
β0 + β1X is the deterministic term.
ε is the stochastic or random error term.
Y = β0 + β1X + ε is the “population” regression
equation, which is unknown.
Given a value of X, we can estimate only the
expected value of Y, not the true value of Y.
Our job is to estimate the “population”
regression equation with a sample set of data.
As a result, we will get the “estimated”
regression equation like, Ŷ = β̂0 + β̂1X
1-30
Estimate a Regression Equation
Y1 = β0 + β1*X1 + ε1
Y2 = β0 + β1*X2 + ε2
Y3 = β0 + β1*X3 + ε3
Y4 = β0 + β1*X4 + ε4
Y5 = β0 + β1*X5 + ε5
… … … …
… … … …
Yn = β0 + β1*Xn + εn
1-31
Estimation
True Regression Equation      Estimated Regression Equation
β0                            β̂0 or b0
β1                            β̂1 or b1
ε                             e
Y = β0 + β1X + ε              Y = β̂0 + β̂1X + e, or
                              Y = b0 + b1X + e
1-32
Estimation (continue…)
ε is not observable.
But we can observe e (called residuals).
ei = Yi − Ŷi
ei = Yi − β̂0 − β̂1Xi
There are many ways to come up with the
estimates for β0 and β1,
given data on Xi and Yi.
OLS minimizes
$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$
1-34
Why use OLS?
1. OLS is quite easy to use.
2. Minimizing the sum of the squared
residuals is a theoretically sound
objective.
* Other objectives could be to minimize Σei or Σ|ei|, but
there are problems with these objectives (for example, positive
and negative residuals cancel in Σei).
3. OLS estimates possess a number of
useful properties.
– The regression line passes through the means of X & Y.
– The sum of the residuals is exactly zero.
– Under some restrictions, OLS is the best estimator.
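The first two properties, and the reason Σei is a poor objective, can be checked on a small example. The sketch below uses hypothetical data and compares the OLS line with a horizontal line through the mean of Y. Both lines have residuals that sum to zero, but only the OLS line minimizes the sum of squared residuals.

```python
# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for one regressor.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar


def residuals(intercept, slope):
    """Residuals e_i = Y_i - (intercept + slope * X_i)."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


e_ols = residuals(b0, b1)
e_flat = residuals(y_bar, 0.0)  # horizontal line through the mean of Y

sum_e_ols = sum(e_ols)    # zero up to rounding
sum_e_flat = sum(e_flat)  # also zero: the sum of residuals cannot tell the lines apart
sse_ols = sum(e ** 2 for e in e_ols)
sse_flat = sum(e ** 2 for e in e_flat)
passes_through_means = abs((b0 + b1 * x_bar) - y_bar) < 1e-9

print(round(sum_e_ols, 9), round(sum_e_flat, 9))
print(round(sse_ols, 6), round(sse_flat, 6), passes_through_means)
```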
1-35
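How do we estimate β0 and β1?
The objective function is,
$$\min \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2$$
Setting the partial derivatives to zero,
$$\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2 = \sum_{i=1}^{n} 2\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)(-1) = 0$$
$$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2 = \sum_{i=1}^{n} 2\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)(-X_i) = 0$$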
1-36
Normal Equations
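$$\sum_{i=1}^{n} Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} X_i Y_i = \hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2$$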
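Estimates
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$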
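The estimates can be computed directly from these formulas and then checked against the two normal equations. The following is a sketch with hypothetical data, and the function name `ols_estimates` is an illustrative choice.

```python
def ols_estimates(x, y):
    """Return (b0, b1) from the closed-form OLS formulas for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data for illustration only.
x = [2, 4, 6, 8]
y = [3, 7, 5, 10]
b0, b1 = ols_estimates(x, y)
print(round(b0, 6), round(b1, 6))  # 1.5 and 0.95 for these data

# Both normal equations should hold at the estimates (gaps near zero).
n = len(x)
gap_1 = sum(y) - (n * b0 + b1 * sum(x))
gap_2 = sum(xi * yi for xi, yi in zip(x, y)) - (
    b0 * sum(x) + b1 * sum(xi ** 2 for xi in x)
)
print(round(gap_1, 9), round(gap_2, 9))
```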
1-37
TSS, ESS, and RSS
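$$\text{Total Sum of Squares, } TSS = \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2$$
$$\text{Explained Sum of Squares, } ESS = \sum_{i=1}^{n} \left(\hat{Y}_i - \bar{Y}\right)^2$$
$$\text{Residual Sum of Squares, } RSS = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$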
1-38
For a given data set, TSS cannot
change and does not depend on the
estimation method.
OLS is guaranteed to find the
estimates of β0 and β1
that minimize RSS.
We usually delegate the task of
finding the estimates of β0 and β1 to
the computer, especially for a
multiple regression analysis.
1-39
Coefficients in a Multiple Regression
The coefficient in a multiple regression
measures how much Y will change if the
independent variable in question changes by
one unit, holding constant the other
independent variables included in the
regression equation.
This is equivalent to the partial derivative of
Y with respect to each X.
Multiple regression analysis technically
allows us to hold constant the influences
of other variables and focus on the
influence of just one variable at a time
(where otherwise impossible).
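A small numerical sketch can show what “holding constant” buys you. It assumes hypothetical, noise-free data generated from Y = 1 + 2·X1 − 3·X2, with X1 and X2 correlated. The estimates come from solving the normal equations in matrix form, an extension of the single-regressor case that the slides do not derive. The helper names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols_multiple(columns, y):
    """OLS with an intercept; returns [b0, b1, ...] for the given regressors."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: Y = 1 + 2*X1 - 3*X2 exactly, X1 and X2 move together.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 1, 2, 2, 3, 4]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]

both = ols_multiple([x1, x2], y)  # holds X2 constant when measuring X1
only_x1 = ols_multiple([x1], y)   # ignores X2 entirely
print([round(v, 6) for v in both])     # close to [1.0, 2.0, -3.0]
print([round(v, 6) for v in only_x1])  # slope on X1 is about 0.2, not 2
```

With X2 left out, the slope on X1 absorbs part of the effect of X2. Including both variables recovers the effect of each one separately.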
1-40
Goodness of Fit Test: R2
We want the regression equation to
fit the data well.
One way to measure the overall
goodness of fit is to compare ESS to
TSS.
That measure is R2.
R2 = ESS/TSS = 1 – (RSS/TSS)
0 ≤ R2 ≤ 1
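To make the two expressions for R2 concrete, the sketch below fits a one-regressor OLS line to hypothetical data, computes TSS, ESS, and RSS, and evaluates R2 both ways. The two values agree because TSS = ESS + RSS for an OLS fit that includes an intercept.

```python
def ols_simple(x, y):
    """Closed-form OLS intercept and slope for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b1 * x_bar, b1


# Hypothetical data for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
b0, b1 = ols_simple(x, y)

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((yh - y_bar) ** 2 for yh in y_hat)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(round(tss, 6), round(ess, 6), round(rss, 6))       # 5.0 3.2 1.8
print(round(r2_from_ess, 6), round(r2_from_rss, 6))      # 0.64 0.64
```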
1-41
In a time-series type of data, R2 of
0.9 may be considered a good fit.
On the other hand, in a cross-
sectional type of data, R2 of 0.5 may
be considered a good fit.
How high R2 must be to be
considered a good fit is very
subjective.
1-42
Adj-R2
When you add one more variable to the
regression equation, your R2 value will either stay
the same or increase. There is no way that the
R2 value will decline. Why?
Each time we add one independent variable to
the regression equation, we lose one more
degree of freedom.
To adjust for the loss of degrees of freedom as a
result of the addition of independent variables,
we calculate the adjusted R2, or Adj-R2.
$$\text{Adj-}R^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-K-1}$$
n = # of observations, K = # of independent variables.
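The adjustment is easy to compute by hand or in code. The following sketch implements the formula above. The function name `adjusted_r2` and the R2 values in the example are hypothetical, chosen to show a case where adding a variable raises R2 slightly but lowers Adj-R2.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adj-R2 = 1 - (1 - R2) * (n - 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: a third regressor raises R2 from 0.600 to 0.605 (n = 30).
before = adjusted_r2(0.600, n=30, k=2)
after = adjusted_r2(0.605, n=30, k=3)
print(round(before, 4), round(after, 4))  # 0.5704 0.5594
```

Here the small gain in R2 does not make up for the lost degree of freedom, so Adj-R2 falls.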
1-43
Cautions about R2 and Adj-R2
Both R2 and Adj-R2 measure the overall
goodness of fit only.
Obtaining a high value of R2 or Adj-R2 for
the regression equation is desirable, but it
is not all that matters!
Comparing two competing regression
models on the basis of R2 or Adj-R2 can be
misleading and dangerous.
You should evaluate the competing
equations with other criteria (which will be
discussed in a later session).
1-44