Cheat Sheet: Summarize Data Estimate Models, 1/2
their equivalent expression in R.

References for importing/cleaning data, manipulating variables, and other basic commands include Hanck et al. (2019), Econometrics with R, and Wickham and Grolemund (2017), R for Data Science.

Example data comes from Wooldridge, Introductory Econometrics: A Modern Approach. Download Stata data sets here. R data sets can be accessed by installing the `wooldridge` package from CRAN.

All R commands are written in base R, unless otherwise noted.

Setup

Note: While it is common to create a `log` file in Stata to store the commands and output of Stata sessions, the equivalent does not exist in R. A more savvy approach in R is to create an R Markdown file to capture code and output.

Stata:
ssc install outreg2 // install `outreg2` package. Note: unlike R packages, Stata packages do not have to be loaded each time once installed.

R:
install.packages("wooldridge") # install `wooldridge` package
library(wooldridge) # attach `wooldridge` package (needed in each new session)
data(package = "wooldridge") # list datasets in `wooldridge` package
data("wage1") # load `wage1` dataset into session
?wage1 # consult documentation on `wage1` dataset

Tip: The {AER} package will automatically load other useful dependent packages, including {car}, {lmtest}, and {sandwich}, which are used for many of the commands listed in this cheat sheet.

Summarize Data

... environment simultaneously, and hence must be specified with each function call. Note: R does not have an equivalent to Stata’s `codebook` command.

Stata:
browse // open browser for loaded data
describe // describe structure of loaded data
summarize // display summary statistics for all variables in dataset
list in 1/6 // display first 6 rows
tabulate educ // tabulate `educ` variable frequencies
tabulate educ female // cross-tabulate `educ` and `female` frequencies

R:
View(wage1) # open browser for loaded `wage1` data
str(wage1) # describe structure of `wage1` data
summary(wage1) # display summary statistics for `wage1` variables
head(wage1) # display first 6 (default) rows of data
tail(wage1) # display last 6 rows
table(wage1$educ) # tabulate `educ` frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female) # cross-tabulate `educ` and `female` frequencies, name table columns

Estimate Models, 1/2

Stata:
reg wage educ // simple regression of `wage` by `educ` (results printed automatically)
reg wage educ if nonwhite==1 // add condition with if statement
reg wage educ exper, robust // multiple regression using HC1 robust standard errors
reg wage educ exper, cluster(numdep) // use clustered standard errors

R:
mod1 <- lm(wage ~ educ, data = wage1) # simple regression of `wage` by `educ`, store results in `mod1`
summary(mod1) # print summary of `mod1` results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite==1, ]) # add condition with if statement
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata") # multiple regression with HC1 (Stata default) robust standard errors, use {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep) # use clustered standard errors

Tip: An alternate way to compute robust standard errors in R for any models not covered by the {estimatr} package is to load the {AER} package and run:
coeftest(mod1, vcov. = vcovHC, type = "HC1")

MLE (Logit/Probit/Tobit) example data: `mroz`

Stata:
logit inlf nwifeinc educ // estimate logistic regression
probit inlf nwifeinc educ // estimate probit regression
tobit hours nwifeinc educ, ll(0) // estimate tobit regression, lower-limit of y censored at zero

R:
mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz) # estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz) # estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz) # estimate tobit regression, lower-limit of y censored at zero, use {AER} package

Postestimation, 1/2 example data: `wage1`
Note: Postestimation commands in Stata apply to the most recently run estimation commands.
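In R, the corresponding functions take the fitted model object as an explicit argument, so several models can be kept and queried side by side. A minimal sketch of that difference, using the built-in `mtcars` data as a stand-in for `wage1` (an assumption, so the example runs without extra packages); the object names are illustrative only:

```r
# Built-in `mtcars` stands in for `wage1` (assumption, so no package is needed)
m_small <- lm(mpg ~ wt, data = mtcars)        # first model
m_large <- lm(mpg ~ wt + hp, data = mtcars)   # second model, fitted later

# Each postestimation call names the model it acts on,
# so the earlier model is still usable after the later one is fitted
yhat_small <- predict(m_small)     # fitted values from the first model
e_large    <- residuals(m_large)   # residuals from the second model

head(yhat_small)
head(e_large)
```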
Stata:
reg wage educ // estimation used for the following post-estimation commands
predict yhat // get predicted values from last estimation, store as `yhat`
predict e, res // get residuals from last estimation, store as `e`

R:
mod1 <- lm(wage ~ educ, data = wage1) # estimation used for the following post-estimation commands
yhat <- predict(mod1) # get predicted values
e <- residuals(mod1) # get residual values

2SLS example data: `mroz`

Stata:
ivreg lwage (educ = fatheduc), first // show results of first stage regression
etest first // test IV and endogenous variable
ivreg lwage (educ = fatheduc) // show results of 2SLS directly

R:
modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz) # estimate 2SLS with {AER} package
summary(modiv, diagnostics = TRUE) # get diagnostic tests of IV and endogenous variable

Basic plots example data: `wage1`

Stata:
hist(wage) // histogram of `wage`
hist(wage), by(nonwhite) // histograms of `wage` by `nonwhite`
scatter(wage educ) // scatter plot of `wage` by `educ`
twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
graph box wage, by(nonwhite) // boxplot of `wage` by `nonwhite`

R:
hist(wage1$wage) # histogram of `wage`
plot(y = wage1$wage, x = wage1$educ) # scatter plot of `wage` by `educ`
abline(lm(wage1$wage ~ wage1$educ), col = "red") # add fitted line to scatterplot
boxplot(wage1$wage ~ wage1$nonwhite) # boxplot of `wage` by `nonwhite`

Stata:
tab numdep, gen(numdep) // create dummy variables for `numdep`
recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") (41/max = 3 "41+ years"), gen(experlvl) // recode `exper` and gen new variable

R:
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep") # create dummy variables for `numdep`, use {fastDummies} package
wage1$experlvl <- 3 # recode `exper`
wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1
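The R recode above stores plain numeric codes and drops the value labels that Stata's `recode` attaches. One way to keep them is base R's `cut()`, which bins a numeric vector into a labelled factor. A minimal sketch, using a small hypothetical `exper` vector in place of `wage1$exper`:

```r
# Hypothetical experience values standing in for wage1$exper
exper <- c(1, 20, 21, 40, 41, 51)

# cut() closes bins on the right by default,
# so the intervals are (0, 20], (20, 40], (40, Inf)
experlvl <- cut(
  exper,
  breaks = c(0, 20, 40, Inf),
  labels = c("1 to 20 years", "21 to 40 years", "41+ years")
)

table(experlvl)        # frequency of each labelled level
as.integer(experlvl)   # underlying integer codes of the factor
```

With these breaks, values of 0 or below fall outside the first interval and become `NA`, so the breaks would need adjusting if the data contain them.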