factorModelTutorial Handout PDF
Workshop Overview
• About Me
• Brief Introduction to R in Finance
• Factor Models for Asset Returns
• Estimation of Factor Models in R
• Factor Model Risk Analysis
• Factor Model Risk Analysis in R
• Modeling Interest Rates in R (brief discussion)
About Me
• Robert Richards Chaired Professor of
Economics at the University of Washington
– Adjunct Professor of Applied Mathematics,
Finance, and Statistics
• Co-Director of MS Program in Computational
Finance and Risk Management at UW
• BS in Economics and Statistics from UC
Berkeley
• PhD in Economics from Yale University
© Eric Zivot 2011
Brief Introduction to R in Finance
• R is a language and environment for statistical computing and
graphics
• R is based on the S language originally developed by John
Chambers and colleagues at AT&T Bell Labs in the late 1970s and
early 1980s
• R (sometimes called "GNU S") is free open source software
licensed under the GNU general public license (GPL 2)
• R development was initiated by Robert Gentleman and Ross Ihaka
at the University of Auckland, New Zealand
• R is formally known as The R Project for Statistical Computing
• www.r-project.org
• Data Manipulation
• Data Visualization
S Language Implementations
• R is the most recent and
full-featured implementation
of the S language
• Original S - AT&T Bell
Labs
• S-PLUS (S plus a GUI)
• Statistical Sciences, Inc.
– Mathsoft, Inc., Insightful,
Inc., Tibco, Inc.
• R - The R Project for
Statistical Computing
R Timeline
Recognition for Software Excellence
The R Foundation
• The R Foundation is the non-profit organization
located in Vienna, Austria which is responsible
for developing and maintaining R
– Hold and administer the copyright of R software and
documentation
– Support continued development of R
– Organize meetings and conferences related to
statistical computing
R Homepage
• http://www.r-project.org
• List of CRAN mirror
sites
• Manuals
• FAQs
• Mailing Lists
• Links
CRAN – Comprehensive R
Archive Network
• http://cran.fhcrc.org
• CRAN Mirrors
– About 75 sites
worldwide
– About 16 sites in US
• R Binaries
• R Packages
• R Sources
• Task Views
CRAN Task Views
• Organizes 2600+ R
packages by application
• Relevant tasks for
financial applications:
– Finance
– Time Series
– Econometrics
– Optimization
– Machine Learning
R-Sig-Finance
• https://stat.ethz.ch/mailman/listinfo/r-sig-finance
• Nerve center of the R
finance community
• Daily must read
• Exclusively for
Finance-specific
questions, not general R
questions
Other Useful R Sites
• R Seek: R-specific search site
– http://www.rseek.org/
• R Bloggers: aggregation of about 100 R blogs
– http://www.r-bloggers.com
• Stack Overflow: excellent developer Q&A forum
– http://stackoverflow.com
• R Graph Gallery: examples of many possible R graphs
– http://addictedtor.free.fr/graphiques
• Blog from David Smith of Revolution
– http://blog.revolutionanalytics.com
• Inside-R: R community site by Revolution Analytics
– http://www.inside-r.org
Set Options and Load Packages
# set output options
> options(width = 70, digits=4)
Berndt Data
# load Berndt investment data from fEcofin package
> data(berndtInvest)
> class(berndtInvest)
[1] "data.frame"
> colnames(berndtInvest)
[1] "X.Y..m..d" "CITCRP" "CONED" "CONTIL"
[5] "DATGEN" "DEC" "DELTA" "GENMIL"
[9] "GERBER" "IBM" "MARKET" "MOBIL"
[13] "PANAM" "PSNH" "TANDY" "TEXACO"
[17] "WEYER" "RKFREE"
# create data frame with dates as rownames
> berndt.df = berndtInvest[, -1]
> rownames(berndt.df) = as.character(berndtInvest[, 1])
Berndt Data
> head(berndt.df, n=3)
CITCRP CONED CONTIL DATGEN DEC
1978-01-01 -0.115 -0.079 -0.129 -0.084 -0.100
1978-02-01 -0.019 -0.003 0.037 -0.097 -0.063
1978-03-01 0.059 0.022 0.003 0.063 0.010
Estimation Results
> cbind(beta.hat, diagD.hat, R.square)
beta.hat diagD.hat R.square
CITCRP 0.66778 0.004511 0.31777
CONED 0.09102 0.002510 0.01532
CONTIL 0.73836 0.020334 0.11216
DATGEN 1.02816 0.011423 0.30363
DEC 0.84305 0.006564 0.33783
DELTA 0.48946 0.008152 0.12163
GENMIL 0.26776 0.003928 0.07919
GERBER 0.62481 0.005924 0.23694
IBM 0.45302 0.002546 0.27523
MOBIL 0.71352 0.004105 0.36882
PANAM 0.73014 0.015008 0.14337
PSNH 0.21263 0.011872 0.01763
TANDY 1.05549 0.011162 0.31986
TEXACO 0.61328 0.004634 0.27661
WEYER 0.81687 0.004154 0.43083
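The slides do not show how beta.hat, diagD.hat and R.square were computed. A minimal sketch of one way to obtain such quantities is given here, using simulated returns rather than the Berndt data; the asset names, the true betas, and the helper names X.mat, G.hat and E.hat are assumptions made for illustration. It fits the single index regression for all assets at once by least squares and takes the residual variance with T - 2 degrees of freedom.

```r
# Single index model by multivariate least squares (simulated data,
# NOT the Berndt data; all inputs below are hypothetical).
set.seed(123)
n.obs <- 120
asset.names <- c("A", "B", "C")
true.beta <- c(A = 0.5, B = 1.0, C = 1.5)
market.mat <- matrix(rnorm(n.obs, mean = 0.01, sd = 0.05), ncol = 1,
                     dimnames = list(NULL, "MARKET"))
eps <- matrix(rnorm(n.obs * 3, sd = 0.03), n.obs, 3)
returns.mat <- market.mat %*% t(true.beta) + eps   # T x N returns
colnames(returns.mat) <- asset.names

# Regress every asset on a constant and the market in one step
X.mat <- cbind(1, market.mat)
G.hat <- solve(crossprod(X.mat), crossprod(X.mat, returns.mat))  # 2 x N
beta.hat <- G.hat[2, ]                        # slope for each asset
E.hat <- returns.mat - X.mat %*% G.hat        # T x N residuals
diagD.hat <- colSums(E.hat^2) / (n.obs - 2)   # residual variances
# R-square = 1 - SSE/SST for each asset
R.square <- 1 - (n.obs - 2) * diagD.hat /
  ((n.obs - 1) * diag(var(returns.mat)))
print(round(cbind(beta.hat, diagD.hat, R.square), 4))
```

In the table above, diagD.hat for CITCRP (0.004511) equals the square of the residual standard deviation reported later for the same regression (0.06716), which is consistent with this degrees-of-freedom choice.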
> par(mfrow=c(1,2))
> barplot(beta.hat, horiz=T, main="Beta values", col="blue",
+ cex.names = 0.75, las=1)
> barplot(R.square, horiz=T, main="R-square values", col="blue",
+ cex.names = 0.75, las=1)
> par(mfrow=c(1,1))
[Figure: side-by-side horizontal bar plots titled "Beta values" (axis 0.0 to 1.0) and "R-square values" (axis 0.0 to 0.4), one bar per asset from CITCRP to WEYER]
Compute Single Index Covariance
# compute single index model covariance/correlation
> cov.si =
+ as.numeric(var(market.mat))*beta.hat%*%t(beta.hat) +
+ diag(diagD.hat)
> cor.si = cov2cor(cov.si)
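To make the structure of the single index covariance matrix concrete, the following sketch evaluates the same expression on three hypothetical assets. The betas, residual variances and market variance are made-up numbers, not estimates from the slides. Off-diagonal entries are beta_i * beta_j * var(market), and each diagonal entry adds the asset's residual variance.

```r
# Single index covariance on made-up inputs (illustration only)
beta.hat <- c(A = 0.5, B = 1.0, C = 1.5)          # hypothetical betas
diagD.hat <- c(A = 0.004, B = 0.002, C = 0.010)   # hypothetical residual variances
sig2.mkt <- 0.0025                                # hypothetical market variance

cov.si <- sig2.mkt * beta.hat %*% t(beta.hat) + diag(diagD.hat)
dimnames(cov.si) <- list(names(beta.hat), names(beta.hat))
cor.si <- cov2cor(cov.si)

cov.si["A", "B"]   # 0.5 * 1.0 * 0.0025
cov.si["A", "A"]   # 0.5^2 * 0.0025 + 0.004
print(round(cor.si, 3))
```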
[Figure: plot of the single index model correlation matrix cor.si for the 15 assets]
Sample Correlation Matrix
[Figure: plot of the sample correlation matrix for the 15 assets]
Single Index Weights
[Figure: bar plot of single index weights by asset; vertical axis 0.0 to 0.3]
Sample Weights
[Figure: bar plot of sample weights by asset; vertical axis 0.0 to 0.3]
List Output
> names(reg.list)
[1] "CITCRP" "CONED" "CONTIL" "DATGEN" "DEC"
[6] "DELTA" "GENMIL" "GERBER" "IBM" "MOBIL"
[11] "PANAM" "PSNH" "TANDY" "TEXACO" "WEYER"
> class(reg.list$CITCRP)
[1] "lm"
> reg.list$CITCRP
Call:
lm(formula = si.formula, data = reg.df)
Coefficients:
(Intercept) MARKET
0.00252 0.66778
> summary(reg.list$CITCRP)
Call:
lm(formula = si.formula, data = reg.df)
Residuals:
Min 1Q Median 3Q Max
-0.16432 -0.05012 0.00226 0.04351 0.22467
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.00252 0.00626 0.40 0.69
MARKET 0.66778 0.09007 7.41 2.0e-11 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
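The construction of reg.list is not shown on the slides. One plausible way to build such a list is to loop over the asset names and store one lm object per asset, as in this sketch on simulated data. The data frame sim.df and its two assets are hypothetical; the names reg.df and si.formula are chosen to mirror the lm() call printed above.

```r
# Build a named list of single index regressions (simulated data)
set.seed(42)
n.obs <- 60
asset.names <- c("A", "B")                 # hypothetical assets
MARKET <- rnorm(n.obs, mean = 0.01, sd = 0.05)
sim.df <- data.frame(MARKET = MARKET,
                     A = 0.7 * MARKET + rnorm(n.obs, sd = 0.03),
                     B = 1.2 * MARKET + rnorm(n.obs, sd = 0.04))

reg.list <- list()
for (i in asset.names) {
  reg.df <- sim.df[, c(i, "MARKET")]                 # one asset plus the market
  si.formula <- as.formula(paste(i, "~", "MARKET"))  # e.g. A ~ MARKET
  reg.list[[i]] <- lm(si.formula, data = reg.df)
}
names(reg.list)
class(reg.list$A)
coef(reg.list$A)
```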
Plot Actual and Fitted Values:
Time Series
# use chart.TimeSeries() function from
# PerformanceAnalytics package
[Figure: time series plot of actual and fitted values, Jan 78 to Jan 87; x-axis: Date; legend: Fitted, Actual]
Plot Actual and Fitted Values:
Cross Section
[Figure: cross-section plot of actual and fitted values; x-axis: MARKET]
Extract Regression Information 1
## extract beta values, residual sd's and R2's from list
## of regression objects by brute force loop
> reg.vals = matrix(0, length(asset.names), 3)
> rownames(reg.vals) = asset.names
> colnames(reg.vals) = c("beta", "residual.sd",
+ "r.square")
> for (i in names(reg.list)) {
+ tmp.fit = reg.list[[i]]
+ tmp.summary = summary(tmp.fit)
+ reg.vals[i, "beta"] = coef(tmp.fit)[2]
+ reg.vals[i, "residual.sd"] = tmp.summary$sigma
+ reg.vals[i, "r.square"] = tmp.summary$r.squared
+ }
Regression Results
> reg.vals
beta residual.sd r.square
CITCRP 0.66778 0.06716 0.31777
CONED 0.09102 0.05010 0.01532
CONTIL 0.73836 0.14260 0.11216
DATGEN 1.02816 0.10688 0.30363
DEC 0.84305 0.08102 0.33783
DELTA 0.48946 0.09029 0.12163
GENMIL 0.26776 0.06268 0.07919
GERBER 0.62481 0.07697 0.23694
IBM 0.45302 0.05046 0.27523
MOBIL 0.71352 0.06407 0.36882
PANAM 0.73014 0.12251 0.14337
PSNH 0.21263 0.10896 0.01763
TANDY 1.05549 0.10565 0.31986
TEXACO 0.61328 0.06808 0.27661
WEYER 0.81687 0.06445 0.43083
Extract Regression Information 2
# alternatively use R apply function for list
# objects - lapply or sapply
extractRegVals = function(x) {
# x is an lm object
beta.val = coef(x)[2]
residual.sd.val = summary(x)$sigma
r2.val = summary(x)$r.squared
ret.vals = c(beta.val, residual.sd.val, r2.val)
names(ret.vals) = c("beta", "residual.sd",
"r.square")
return(ret.vals)
}
> reg.vals = sapply(reg.list, FUN=extractRegVals)
Regression Results
> t(reg.vals)
beta residual.sd r.square
CITCRP 0.66778 0.06716 0.31777
CONED 0.09102 0.05010 0.01532
CONTIL 0.73836 0.14260 0.11216
DATGEN 1.02816 0.10688 0.30363
DEC 0.84305 0.08102 0.33783
DELTA 0.48946 0.09029 0.12163
GENMIL 0.26776 0.06268 0.07919
GERBER 0.62481 0.07697 0.23694
IBM 0.45302 0.05046 0.27523
MOBIL 0.71352 0.06407 0.36882
PANAM 0.73014 0.12251 0.14337
PSNH 0.21263 0.10896 0.01763
TANDY 1.05549 0.10565 0.31986
TEXACO 0.61328 0.06808 0.27661
WEYER 0.81687 0.06445 0.43083
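The transpose in t(reg.vals) is needed because sapply() binds the vector returned for each list element as a column. The sketch below shows the shapes involved on a toy list of two fits on simulated data; fit.list and sim.df are hypothetical names. It also shows vapply() as a stricter alternative that checks the length and type of each result.

```r
# Shape of sapply() output for a function returning a named vector
set.seed(1)
MARKET <- rnorm(50, mean = 0.01, sd = 0.05)
sim.df <- data.frame(MARKET = MARKET,
                     A = 0.7 * MARKET + rnorm(50, sd = 0.03),
                     B = 1.2 * MARKET + rnorm(50, sd = 0.04))
fit.list <- list(A = lm(A ~ MARKET, data = sim.df),
                 B = lm(B ~ MARKET, data = sim.df))

extractRegVals <- function(x) {
  # x is an lm object
  c(beta = unname(coef(x)[2]),
    residual.sd = summary(x)$sigma,
    r.square = summary(x)$r.squared)
}

vals <- sapply(fit.list, FUN = extractRegVals)
dim(vals)      # statistics in rows, fits in columns
dim(t(vals))   # one row per fit, as in the printed table

# vapply() fails loudly if a result is not a numeric vector of length 3
vals2 <- vapply(fit.list, extractRegVals, FUN.VALUE = numeric(3))
```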
Industry Factor Model
# create loading matrix B for industry factor model
> n.stocks = ncol(returns.mat)
> tech.dum = oil.dum = other.dum =
+ matrix(0,n.stocks,1)
> rownames(tech.dum) = rownames(oil.dum) =
+ rownames(other.dum) = asset.names
> tech.dum[c(4,5,9,13),] = 1
> oil.dum[c(3,6,10,11,14),] = 1
> other.dum = 1 - tech.dum - oil.dum
> B.mat = cbind(tech.dum,oil.dum,other.dum)
> colnames(B.mat) = c("TECH","OIL","OTHER")
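As a cross-check on the index positions used above, the same loading matrix can be built from a vector of industry labels. This is only a sketch: asset.names is assumed to follow the order printed by names(reg.list) earlier in the handout, and industry, ind.levels and B.alt are hypothetical names. Every asset should load on exactly one industry.

```r
# Alternative construction of the industry loading matrix from labels
asset.names <- c("CITCRP", "CONED", "CONTIL", "DATGEN", "DEC",
                 "DELTA", "GENMIL", "GERBER", "IBM", "MOBIL",
                 "PANAM", "PSNH", "TANDY", "TEXACO", "WEYER")
industry <- rep("OTHER", length(asset.names))
names(industry) <- asset.names
industry[c(4, 5, 9, 13)] <- "TECH"          # same positions as tech.dum
industry[c(3, 6, 10, 11, 14)] <- "OIL"      # same positions as oil.dum

ind.levels <- c("TECH", "OIL", "OTHER")
B.alt <- 1 * outer(industry, ind.levels, "==")   # 0/1 dummy matrix
dimnames(B.alt) <- list(asset.names, ind.levels)

rowSums(B.alt)   # each asset belongs to exactly one industry
colSums(B.alt)   # number of assets per industry
```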
Multivariate Least Squares
Estimation of Factor Returns
# returns.mat is T x N matrix, and fundamental factor
# model treats R as N x T.
> returns.mat = t(returns.mat)
# multivariate OLS regression to estimate K x T matrix
# of factor returns (K=3)
> F.hat =
+ solve(crossprod(B.mat))%*%t(B.mat)%*%returns.mat
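With a loading matrix made of non-overlapping 0/1 industry dummies, the OLS factor return for an industry in a given period is simply the equal-weighted average return of the assets in that industry. The sketch below checks this on simulated data; the six stocks, their industry assignment and the returns are all hypothetical.

```r
# OLS factor returns with industry dummies (simulated data)
set.seed(7)
n.stocks <- 6
n.obs <- 24
B.mat <- cbind(TECH  = c(1, 1, 0, 0, 0, 0),
               OIL   = c(0, 0, 1, 1, 0, 0),
               OTHER = c(0, 0, 0, 0, 1, 1))
rownames(B.mat) <- paste0("S", 1:n.stocks)

# returns arranged as N x T, as in the fundamental factor model
returns.mat <- matrix(rnorm(n.stocks * n.obs, sd = 0.05),
                      n.stocks, n.obs,
                      dimnames = list(rownames(B.mat), NULL))

F.hat <- solve(crossprod(B.mat)) %*% t(B.mat) %*% returns.mat   # K x T
rownames(F.hat) <- colnames(B.mat)

# equal-weighted average of the TECH stocks in each period
tech.avg <- colMeans(returns.mat[B.mat[, "TECH"] == 1, ])
all.equal(as.numeric(F.hat["TECH", ]), as.numeric(tech.avg))
```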
OLS estimates of industry factors
[Figure: time series panels of the OLS factor return estimates; surviving panel labels: TECH, OTHER; x-axis: Index]
GLS Factor Weights
> t(H.hat)
TECH OIL OTHER
CITCRP 0.0000 0.0000 0.19918
CONED 0.0000 0.0000 0.22024
CONTIL 0.0000 0.0961 0.00000
DATGEN 0.2197 0.0000 0.00000
DEC 0.3188 0.0000 0.00000
DELTA 0.0000 0.2233 0.00000
GENMIL 0.0000 0.0000 0.22967
GERBER 0.0000 0.0000 0.12697
IBM 0.2810 0.0000 0.00000
MOBIL 0.0000 0.2865 0.00000
PANAM 0.0000 0.1186 0.00000
PSNH 0.0000 0.0000 0.06683
TANDY 0.1806 0.0000 0.00000
TEXACO 0.0000 0.2756 0.00000
WEYER 0.0000 0.0000 0.15711
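The code that produced H.hat is not shown on the slides. A common form of the GLS weight matrix, assumed here, is H = (B' D^-1 B)^-1 B' D^-1, where D is the diagonal matrix of residual variances. Under that assumption each industry's weights are proportional to the inverse residual variances of its members and sum to one, which matches the pattern in the table above (for example, the four TECH weights sum to about 1). The inputs below are made up for illustration.

```r
# GLS factor weights under the assumed form H = (B' D^-1 B)^-1 B' D^-1
B.mat <- cbind(TECH  = c(1, 1, 0, 0, 0, 0),
               OIL   = c(0, 0, 1, 1, 0, 0),
               OTHER = c(0, 0, 0, 0, 1, 1))
rownames(B.mat) <- paste0("S", 1:6)          # hypothetical stocks
diagD.hat <- c(0.004, 0.008, 0.002, 0.002, 0.010, 0.005)  # made-up variances

Dinv.hat <- diag(1 / diagD.hat)
H.hat <- solve(t(B.mat) %*% Dinv.hat %*% B.mat) %*% t(B.mat) %*% Dinv.hat
dimnames(H.hat) <- list(colnames(B.mat), rownames(B.mat))   # K x N

print(round(t(H.hat), 4))
rowSums(H.hat)   # each industry's weights sum to 1
# GLS factor returns would then be H.hat %*% returns.mat (N x T returns)
```

In this toy case S1 has half the residual variance of S2, so it receives twice the weight within TECH (2/3 versus 1/3).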
[Figure: three panels comparing OLS and GLS estimates of the industry factor returns; legend: OLS, GLS; x-axis: Index]
Industry Factor Model Covariance
# compute covariance and correlation matrices
> cov.ind = B.mat%*%var(t(F.hat.gls))%*%t(B.mat) +
+ diag(diagD.hat)
> cor.ind = cov2cor(cov.ind)
# plot correlations using plotcorr() from ellipse
# package
> rownames(cor.ind) = colnames(cor.ind)
> ord <- order(cor.ind[1,])
> ordered.cor.ind <- cor.ind[ord, ord]
> plotcorr(ordered.cor.ind,
+ col=cm.colors(11)[5*ordered.cor.ind + 6])
[Figure: plotcorr() plot of ordered.cor.ind, the industry factor model correlation matrix]
Industry Factor Model Summary
> ind.fm.vals
TECH OIL OTHER fm.sd residual.sd r.square
CITCRP 0 0 1 0.07291 0.05468 0.4375
CONED 0 0 1 0.07092 0.05200 0.4624
CONTIL 0 1 0 0.13258 0.11807 0.2069
DATGEN 1 0 0 0.10646 0.07189 0.5439
DEC 1 0 0 0.09862 0.05968 0.6338
DELTA 0 1 0 0.09817 0.07747 0.3773
GENMIL 0 0 1 0.07013 0.05092 0.4728
GERBER 0 0 1 0.08376 0.06849 0.3315
IBM 1 0 0 0.10102 0.06356 0.6041
MOBIL 0 1 0 0.09118 0.06839 0.4374
PANAM 0 1 0 0.12222 0.10630 0.2435
PSNH 0 0 1 0.10601 0.09440 0.2069
TANDY 1 0 0 0.11159 0.07930 0.4950
TEXACO 0 1 0 0.09218 0.06972 0.4279
WEYER 0 0 1 0.07821 0.06157 0.3802
[Figure: two bar plots of weights by asset; the first has vertical axis 0.00 to 0.15, the second is titled "Sample Weights" with vertical axis 0.0 to 0.3]
Statistical Factor Model: Principal
Components Method
# continue to use Berndt data
> returns.mat = as.matrix(berndt.df[, c(-10, -17)])
# use R princomp() function for principal component
# analysis
> pc.fit = princomp(returns.mat)
> class(pc.fit)
[1] "princomp"
> names(pc.fit)
[1] "sdev" "loadings" "center" "scale" "n.obs"
[6] "scores" "call"
# loadings = eigenvectors; scores = principal components
Eigenvalue Scree Plot
> plot(pc.fit)
[Figure: scree plot of pc.fit; y-axis: Variances, 0.00 to 0.05]
Loadings (eigenvectors)
> loadings(pc.fit) # pc.fit$loadings
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
CITCRP 0.273
CONED
CONTIL 0.377 -0.824 -0.199 0.157 0.144 -0.191
DATGEN 0.417 0.152 0.277 -0.329 0.287 -0.497
DEC 0.305 0.129 0.202 -0.141 0.368
DELTA 0.250 0.179 0.258 0.242 0.481
GENMIL 0.133 0.128 0.249 0.117
GERBER 0.167 -0.199 -0.418 0.349
IBM 0.146 0.142
MOBIL 0.155 0.248 -0.241 -0.459 -0.155
PANAM 0.311 0.365 -0.630 0.227 -0.343 -0.390 -0.197
PSNH -0.527 -0.692 0.249 0.360
TANDY 0.412 0.207 0.188 0.323 0.356 0.385 -0.564
TEXACO 0.132 0.245 -0.219 -0.430 -0.325
WEYER 0.265 0.131 -0.128 -0.111 0.152 0.291
Principal Component Factors
> head(pc.fit$scores[, 1:4])
Comp.1 Comp.2 Comp.3 Comp.4
1978-01-01 -0.28998 0.069162 -0.07621 0.0217151
1978-02-01 -0.14236 -0.141967 -0.01794 0.0676476
1978-03-01 0.14927 0.113295 -0.09307 0.0326150
1978-04-01 0.35056 -0.032904 0.01128 -0.0168986
1978-05-01 0.10874 0.004943 -0.04640 0.0612666
1978-06-01 -0.06948 0.041330 -0.06757 -0.0009816
[Figure: time series plot of Comp.1, Jan 78 to Jan 87; x-axis: Date; y-axis: Value]
Direct Eigenvalue Computation
> eigen.fit = eigen(var(returns.mat))
> names(eigen.fit)
[1] "values" "vectors"
> names(eigen.fit$values) =
+ rownames(eigen.fit$vectors) = asset.names
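The princomp() fit and the direct eigen decomposition agree only up to two conventions worth knowing: princomp() uses divisor n for the covariance matrix while var() uses n - 1, and eigenvectors are determined only up to sign. The sketch below illustrates both points on simulated data; the matrix X and its column names are hypothetical stand-ins for returns.mat.

```r
# Compare princomp() with eigen(var(.)) on simulated data
set.seed(99)
n.obs <- 100
mix <- matrix(c(1, 0.5, 0.2, 0.1,
                0, 1,   0.3, 0.2,
                0, 0,   1,   0.4,
                0, 0,   0,   1), 4, 4)     # arbitrary mixing matrix
X <- matrix(rnorm(n.obs * 4), n.obs, 4) %*% mix
colnames(X) <- paste0("S", 1:4)

pc.fit <- princomp(X)
eigen.fit <- eigen(var(X))

# variances differ by the factor (n - 1)/n
ratio <- pc.fit$sdev^2 / eigen.fit$values
print(ratio)

# loadings equal the eigenvectors up to sign
load.mat <- unclass(loadings(pc.fit))
max.diff <- max(abs(abs(load.mat) - abs(eigen.fit$vectors)))

# scores are the centered data projected on the loadings
scores.chk <- scale(X, center = TRUE, scale = FALSE) %*% load.mat
```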
Centered and Uncentered Principal Component Factors
[Figure: time series plot of Comp.1 and Comp.1.uc, Jan 78 to Jan 87; x-axis: Date; y-axis: Value]
[Figure: time series plot of Comp.1.uc and MARKET, Jan 78 to Jan 87; ρ = 0.77; x-axis: Date; y-axis: Value]
Factor mimicking weights
[Figure: bar plot of factor mimicking weights by asset; vertical axis 0.00 to 0.12]
Regression Results
> cbind(beta.hat, diagD.hat, R.square)
beta.hat diagD.hat R.square
CITCRP 0.9467 0.002674 0.59554
CONED 0.1542 0.002444 0.04097
CONTIL 1.3085 0.015380 0.32847
DATGEN 1.4483 0.007189 0.56176
DEC 1.0586 0.004990 0.49664
DELTA 0.8685 0.005967 0.35704
GENMIL 0.4602 0.003336 0.21808
GERBER 0.5803 0.006284 0.19058
IBM 0.5084 0.002378 0.32318
MOBIL 0.5387 0.005229 0.19600
PANAM 1.0785 0.012410 0.29168
PSNH 0.2918 0.011711 0.03096
TANDY 1.4300 0.007427 0.54746
TEXACO 0.4591 0.005480 0.14455
WEYER 0.9195 0.003583 0.50904
[Figure: side-by-side horizontal bar plots titled "Beta values" and "R-square values", one bar per asset from CITCRP to WEYER]
Principal Components Correlations
[Figure: correlation matrix plot for the 15 assets]
[Figure: two bar plots of weights by asset, each with vertical axis 0.0 to 0.3; the second is titled "Sample Weights"]