
Cheat Sheet: Summarize Data Estimate Models, 1/2

This cheat sheet provides Stata commands for econometric analysis and their equivalent expressions in R. It summarizes how to import and manipulate data, estimate linear regression models with robust standard errors, and estimate logistic and tobit models. Example data comes from Wooldridge's introductory econometrics textbook and can be accessed in R by installing the wooldridge package. The cheat sheet demonstrates how to estimate models, display results, and perform post-estimation tests in both Stata and R.

Uploaded by

Gerald
Copyright: © All Rights Reserved

Stata to R :: CHEAT SHEET

Introduction

This cheat sheet summarizes common Stata commands for econometric analysis and provides their equivalent expressions in R. References for importing/cleaning data, manipulating variables, and other basic commands include Hanck et al. (2019), Econometrics with R, and Wickham and Grolemund (2017), R for Data Science.

Example data comes from Wooldridge, Introductory Econometrics: A Modern Approach. Download Stata data sets here. R data sets can be accessed by installing the `wooldridge` package from CRAN.

All R commands are written in base R, unless otherwise noted.

Setup

Note: While it is common to create a `log` file in Stata to store the commands and output of Stata sessions, there is no direct equivalent in R. A savvier approach in R is to create an R Markdown file to capture code and output.

Stata: ssc install outreg2 // install `outreg2` package. Note: unlike R packages, Stata packages do not have to be loaded each time once installed.

R: install.packages("wooldridge") # install `wooldridge` package
R: data(package = "wooldridge") # list datasets in `wooldridge` package
R: library(wooldridge) # load `wooldridge` package (needed once per session)
R: data(wage1) # load `wage1` dataset into session
R: ?wage1 # consult documentation on `wage1` dataset

Tip: The {AER} package will automatically load other useful dependent packages, including {car}, {lmtest}, and {sandwich}, which are used for many of the commands listed in this cheat sheet.

Summarize Data (example data: `wage1`)

Where Stata only allows one to work with one data set at a time, multiple data sets can be loaded into the R environment simultaneously, and hence the data set must be specified with each function call. Note: R does not have an equivalent to Stata's `codebook` command.

Stata: browse // open browser for loaded data
R: View(wage1) # open browser for loaded `wage1` data

Stata: describe // describe structure of loaded data
R: str(wage1) # describe structure of `wage1` data

Stata: summarize // display summary statistics for all variables in dataset
R: summary(wage1) # display summary statistics for `wage1` variables

Stata: list in 1/6 // display first 6 rows
R: head(wage1) # display first 6 rows (default)
R: tail(wage1) # display last 6 rows

Stata: tabulate educ // tabulate `educ` variable frequencies
R: table(wage1$educ) # tabulate `educ` frequencies

Stata: tabulate educ female // cross-tabulate `educ` and `female` frequencies
R: table("yrs_edu" = wage1$educ, "female" = wage1$female) # cross-tabulate `educ` and `female` frequencies, name table columns

Estimate Models, 1/2

OLS (example data: `wage1`)

Stata: reg wage educ // simple regression of `wage` by `educ` (results printed automatically)
R: mod1 <- lm(wage ~ educ, data = wage1) # simple regression of `wage` by `educ`, store results in `mod1`
R: summary(mod1) # print summary of `mod1` results

Stata: reg wage educ if nonwhite==1 // add condition with if statement
R: mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite==1, ]) # add condition by subsetting rows

Stata: reg wage educ exper, robust // multiple regression using HC1 robust standard errors
R: mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata") # multiple regression with HC1 (Stata default) robust standard errors, use {estimatr} package

Stata: reg wage educ exper, cluster(numdep) // use clustered standard errors
R: mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep) # use clustered standard errors

Tip: An alternate way to compute robust standard errors in R, for any models not covered by the {estimatr} package, is to load the {AER} package and run:
R: coeftest(mod1, vcov. = vcovHC, type = "HC1")

MLE (Logit/Probit/Tobit) (example data: `mroz`)

Stata: logit inlf nwifeinc educ // estimate logistic regression
R: mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz) # estimate logistic regression

Stata: probit inlf nwifeinc educ // estimate probit regression
R: mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz) # estimate probit regression

Stata: tobit hours nwifeinc educ, ll(0) // estimate tobit regression, lower-limit of y censored at zero
R: mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz) # estimate tobit regression, lower-limit of y censored at zero, use {AER} package

Postestimation, 1/2 (example data: `wage1`)

Note: Postestimation commands in Stata apply to the most recently run estimation commands.
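In R, by contrast, post-estimation functions take the fitted model object as an explicit argument, so several models can be kept and queried side by side. The following is a minimal base R sketch of that difference. It uses simulated data as a stand-in for `wage1`; the data frame `d`, the model names `mod_a` and `mod_b`, and the coefficients used to generate `wage` are assumptions made for illustration only.

```r
# Simulated stand-in for `wage1` (illustrative values, not the Wooldridge data)
set.seed(1)
d <- data.frame(educ  = sample(8:18, 200, replace = TRUE),
                exper = sample(1:40, 200, replace = TRUE))
d$wage <- 1 + 0.5 * d$educ + 0.1 * d$exper + rnorm(200)

# Two fitted models can coexist in the session
mod_a <- lm(wage ~ educ, data = d)
mod_b <- lm(wage ~ educ + exper, data = d)

# Each post-estimation call names the model it applies to
yhat_a <- predict(mod_a)    # predicted values from mod_a
e_a    <- residuals(mod_a)  # residuals from mod_a
yhat_b <- predict(mod_b)    # predicted values from mod_b

# Sanity check: fitted values plus residuals reproduce the outcome
check <- all.equal(unname(yhat_a + e_a), d$wage)
check
```

Because nothing depends on which model was run last, there is no R counterpart to re-running an estimation just to make it the active one.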
Stata: reg wage educ // estimation used for the following post-estimation commands
R: mod1 <- lm(wage ~ educ, data = wage1) # estimation used for the following post-estimation commands

Stata: predict yhat // get predicted values from last estimation, store as `yhat`
R: yhat <- predict(mod1) # get predicted values

Stata: predict e, res // get residuals from last estimation, store as `e`
R: e <- residuals(mod1) # get residual values

Basic plots (example data: `wage1`)

Stata: hist wage // histogram of `wage`
R: hist(wage1$wage) # histogram of `wage`

Stata: hist wage, by(nonwhite) // histograms of `wage` by `nonwhite`

Stata: scatter wage educ // scatter plot of `wage` by `educ`
R: plot(y = wage1$wage, x = wage1$educ) # scatter plot of `wage` by `educ`

Stata: twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
R: abline(lm(wage1$wage ~ wage1$educ), col = "red") # add fitted line to scatterplot

Stata: graph box wage, by(nonwhite) // boxplot of `wage` by `nonwhite`
R: boxplot(wage1$wage ~ wage1$nonwhite) # boxplot of `wage` by `nonwhite`
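The `robust` option in Stata and `se_type = "stata"` in {estimatr}, both used in the OLS examples of Estimate Models, 1/2, request HC1 standard errors. As a rough illustration of what that means, the sketch below computes HC1 by hand in base R: the sandwich estimator (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k). It runs on simulated data rather than `wage1`, and the names `d`, `hc0`, and `hc1` are made up for this example.

```r
# Simulated data with heteroskedastic errors (illustrative, not `wage1`)
set.seed(42)
n <- 300
d <- data.frame(educ = rnorm(n, 12, 2), exper = rnorm(n, 15, 5))
d$wage <- 1 + 0.5 * d$educ + 0.05 * d$exper + rnorm(n, sd = 0.2 * d$educ)

mod <- lm(wage ~ educ + exper, data = d)
X <- model.matrix(mod)  # n x k design matrix, including the intercept
e <- residuals(mod)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^-1
meat  <- crossprod(X * e)     # X' diag(e^2) X: row i of X is scaled by e_i
hc0   <- bread %*% meat %*% bread
hc1   <- hc0 * n / (n - k)    # HC1 = HC0 with a degrees-of-freedom correction

se_classic <- sqrt(diag(vcov(mod)))  # conventional OLS standard errors
se_hc1     <- sqrt(diag(hc1))        # heteroskedasticity-robust (HC1)
cbind(se_classic, se_hc1)
```

If the {sandwich} package is installed, `sandwich::vcovHC(mod, type = "HC1")` should return the same matrix as `hc1`, which gives a quick way to check the hand computation.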

CC BY SA Anthony Nguyen • @anguyen1210 • mentalbreaks.rbind.io • version 1.0.0 • Updated: 2019-10


Create/Edit Variables (example data: `wage1`)

Note: Where Stata only allows one to work with one data set at a time, multiple data sets can be loaded into the R environment simultaneously, hence the data set must be specified for each command.

Stata: gen exper2 = exper^2 // create `exper` squared variable
R: wage1$exper2 <- wage1$exper^2 # create `exper` squared variable

Stata: egen wage_avg = mean(wage) // create average wage variable
R: wage1$wage_avg <- mean(wage1$wage) # create average wage variable

Stata: drop tenursq // drop `tenursq` variable
R: wage1$tenursq <- NULL # drop `tenursq`

Stata: keep wage educ exper nonwhite // keep selected variables
R: wage1 <- wage1[ , c("wage", "educ", "exper", "nonwhite")] # keep selected variables

Stata: tab numdep, gen(numdep) // create dummy variables for `numdep`
R: wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep") # create dummy variables for `numdep`, use {fastDummies} package

Stata: recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") (41/max = 3 "41+ years"), gen(experlvl) // recode `exper` and gen new variable
R: wage1$experlvl <- 3 # recode `exper`
R: wage1$experlvl[wage1$exper < 41] <- 2
R: wage1$experlvl[wage1$exper < 21] <- 1

Estimate Models, 2/2

Panel/Longitudinal (example data: `murder`)

Stata: xtset id year // set `id` as entities (panel) and `year` as time variable
Stata: xtdescribe // describe pattern of xt data
Stata: xtsum // summarize xt data
Stata: xtreg mrdrte unem, fe // fixed effects regression

R: plm::is.pbalanced(murder$id, murder$year) # check panel balance with {plm} package
R: modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder) # estimate fixed effects ("within") model
R: summary(modfe) # display results

Instrumental Variables (2SLS) (example data: `mroz`)

Stata: ivregress 2sls lwage (educ = fatheduc), first // show results of first stage regression
Stata: estat firststage // test IV and endogenous variable
Stata: ivregress 2sls lwage (educ = fatheduc) // show results of 2SLS directly

R: modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz) # estimate 2SLS with {AER} package
R: summary(modiv, diagnostics = TRUE) # get diagnostic tests of IV and endogenous variable
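To make the two stages behind the 2SLS commands concrete, the following sketch runs them by hand in base R. It uses simulated data, not `mroz`: the variables `z`, `x`, `y`, `u` and the true coefficient of 0.8 are assumptions chosen for illustration. The manual second stage reproduces the IV point estimate, but its reported standard errors are not valid because they ignore that `xhat` was itself estimated, which is one reason to use a dedicated routine such as `AER::ivreg()` in practice.

```r
# Simulated IV setup (illustrative, not the `mroz` data)
set.seed(7)
n <- 500
z <- rnorm(n)                # instrument (plays the role of `fatheduc`)
u <- rnorm(n)                # unobserved confounder
x <- 0.6 * z + u + rnorm(n)  # endogenous regressor (role of `educ`)
y <- 0.8 * x - u + rnorm(n)  # outcome (role of `lwage`)

first  <- lm(x ~ z)          # first stage: regress x on the instrument
xhat   <- fitted(first)      # part of x explained by the instrument
second <- lm(y ~ xhat)       # second stage: regress y on fitted x

b_ols  <- unname(coef(lm(y ~ x))["x"])   # naive OLS slope, biased by u
b_2sls <- unname(coef(second)["xhat"])   # 2SLS point estimate
b_wald <- cov(z, y) / cov(z, x)          # same estimate as a covariance ratio

c(ols = b_ols, tsls = b_2sls, wald = b_wald)
```

With one instrument and one endogenous regressor, the 2SLS slope equals the ratio cov(z, y) / cov(z, x), so `b_2sls` and `b_wald` agree up to floating-point error.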

Statistical tests / diagnostics (example data: `wage1`)

Stata: reg lwage educ exper // estimation used for examples below
R: mod <- lm(lwage ~ educ + exper, data = wage1) # estimation used for examples below

Stata: estat hettest // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
R: lmtest::bptest(mod) # Breusch-Pagan / Cook-Weisberg test for heteroskedasticity using the {lmtest} package

Stata: estat ovtest // Ramsey RESET test for omitted variables
R: lmtest::resettest(mod) # Ramsey RESET test

Stata: ttest wage, by(nonwhite) // independent group t-test, compare means of same variable between groups
R: t.test(wage ~ nonwhite, data = wage1) # independent group t-test

Interactions, categorical/continuous variables (example data: `wage1`)

In Stata, it is common to use special operators to specify the treatment of variables as continuous (`c.`) or categorical (`i.`). Similarly, the `#` operator denotes different ways to return the interaction of those variables. Here we show some common uses of these operators as well as their R equivalents.

Stata: reg lwage i.numdep // treat `numdep` as a factor variable
R: lm(lwage ~ as.factor(numdep), data = wage1) # treat `numdep` as factor

Stata: reg lwage c.educ#c.exper // return interaction term only
R: lm(lwage ~ educ:exper, data = wage1) # return interaction term only

Stata: reg lwage c.educ##c.exper // return full factorial specification
R: lm(lwage ~ educ*exper, data = wage1) # return full factorial specification

Stata: reg lwage c.exper##i.numdep // return full, interact continuous and categorical
R: lm(lwage ~ exper*as.factor(numdep), data = wage1) # return full, interact continuous and categorical

Post-estimation, 2/2 (example data: `wage1`)

Note: Postestimation commands in Stata apply to the most recently run estimation commands.

Stata: reg lwage educ c.exper##c.exper // estimation used for following post-estimation commands
R: mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1) # Note: in R, mathematical expressions inside a formula call must be isolated with `I()`

Stata: estimates store mod1 // stores the last estimation results in memory as `mod1`

Stata: margins // get average predictive margins
R: margins::prediction(mod1) # get average predictive margins with {margins} package

Stata: margins, dydx(*) // get average marginal effects for all variables
R: m <- margins::margins(mod1) # get average marginal effects for all variables

Stata: marginsplot // plot marginal effects
R: plot(m) # plot marginal effects

Stata: margins, dydx(exper) // average marginal effects of experience
R: summary(m) # get detailed summary of marginal effects

Stata: margins, at(exper=(1(10)51)) // average predictive margins over `exper` range at 10-year increments
R: margins::prediction(mod1, at = list(exper = seq(1, 51, 10))) # predictive margins over `exper` range at 10-year increments

Stata: estimates restore mod1 // loads `mod1` back into working memory
Stata: estimates table mod1 mod2 // display table with stored estimation results
R: stargazer::stargazer(mod1, mod2, type = "text") # use {stargazer} package, with `type = "text"` to display results within R. Note: `type` can also be changed for LaTeX and HTML output.
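As a quick check on how R expands the formula operators `:`, `*`, and `I()` used above, the sketch below fits the same kinds of formulas and inspects the resulting coefficient names. The data frame `d` is a simulated, made-up stand-in for `wage1`, so only the term names matter here, not the estimates.

```r
# Simulated stand-in for `wage1` (illustrative values only)
set.seed(3)
d <- data.frame(educ = rnorm(100, 12, 2), exper = rnorm(100, 15, 5))
d$lwage <- 0.5 + 0.08 * d$educ + 0.01 * d$exper + rnorm(100, sd = 0.3)

# `:` keeps only the interaction term (Stata: c.educ#c.exper)
n_int  <- names(coef(lm(lwage ~ educ:exper, data = d)))

# `*` expands to main effects plus interaction (Stata: c.educ##c.exper)
n_full <- names(coef(lm(lwage ~ educ * exper, data = d)))

# I() protects arithmetic, so exper^2 enters as a squared regressor
n_sq   <- names(coef(lm(lwage ~ educ + exper + I(exper^2), data = d)))

# Without I(), `^` is a formula operator and exper^2 collapses to exper
n_nosq <- names(coef(lm(lwage ~ educ + exper + exper^2, data = d)))

n_int   # "(Intercept)" "educ:exper"
n_full  # "(Intercept)" "educ" "exper" "educ:exper"
n_sq    # "(Intercept)" "educ" "exper" "I(exper^2)"
n_nosq  # "(Intercept)" "educ" "exper"
```

The last case is the one that tends to cause silent mistakes: the model runs without error but contains no squared term.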
