Cheat Sheet: Summarize Data Estimate Models, 1/2

This cheat sheet provides Stata commands for econometric analysis and their equivalent expressions in R. It summarizes how to import and manipulate data, estimate linear regression models with robust standard errors, and estimate logistic and tobit models. Example data comes from Wooldridge's introductory econometrics textbook and can be accessed in R by installing the wooldridge package. The cheat sheet demonstrates how to estimate models, display results, and perform post-estimation tests in both Stata and R.


Stata to R :: CHEAT SHEET

Introduction

This cheat sheet summarizes common Stata commands for econometric analysis and provides their equivalent expressions in R. References for importing/cleaning data, manipulating variables, and other basic commands include Hanck et al. (2019), Econometrics with R, and Wickham and Grolemund (2017), R for Data Science.

Example data comes from Wooldridge, Introductory Econometrics: A Modern Approach. The Stata data sets are available for download; the R data sets can be accessed by installing the `wooldridge` package from CRAN.

All R commands are written in base R, unless otherwise noted.

Setup

Note: While it is common to create a `log` file in Stata to store the commands and output of Stata sessions, there is no direct equivalent in R. A more flexible approach in R is to create an R Markdown file to capture code and output.

Stata:
ssc install outreg2 // install `outreg2` package. Note: unlike R packages, Stata packages do not have to be loaded each time once installed.

R:
install.packages("wooldridge") # install `wooldridge` package
library(wooldridge) # attach the package (needed in each new session)
data(package = "wooldridge") # list datasets in `wooldridge` package
data(wage1) # load `wage1` dataset into session
?wage1 # consult documentation on `wage1` dataset

Tip: The {AER} package will automatically load other useful dependent packages, including {car}, {lmtest}, and {sandwich}, which are used for many of the commands listed in this cheat sheet.

Summarize Data (example data: `wage1`)

Where Stata only allows one to work with one data set at a time, multiple data sets can be loaded into the R environment simultaneously, and hence the data set must be specified with each function call. Note: R does not have an equivalent to Stata's `codebook` command.

Stata:
browse // open browser for loaded data
describe // describe structure of loaded data
summarize // display summary statistics for all variables in dataset
list in 1/6 // display first 6 rows
tabulate educ // tabulate `educ` variable frequencies
tabulate educ female // cross-tabulate `educ` and `female` frequencies

R:
View(wage1) # open browser for loaded `wage1` data
str(wage1) # describe structure of `wage1` data
summary(wage1) # display summary statistics for `wage1` variables
head(wage1) # display first 6 (default) rows of data
tail(wage1) # display last 6 rows
table(wage1$educ) # tabulate `educ` frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female) # cross-tabulate `educ` and `female` frequencies, naming the table dimensions

Basic plots (example data: `wage1`)

Stata:
histogram wage // histogram of `wage`
histogram wage, by(nonwhite) // histogram of `wage` by `nonwhite`
scatter wage educ // scatter plot of `wage` by `educ`
twoway (scatter wage educ) (lfit wage educ) // scatter plot with fitted line
graph box wage, by(nonwhite) // boxplot of `wage` by `nonwhite`

R:
hist(wage1$wage) # histogram of `wage`
plot(y = wage1$wage, x = wage1$educ) # scatter plot of `wage` by `educ`
abline(lm(wage1$wage ~ wage1$educ), col = "red") # add fitted line to scatterplot
boxplot(wage1$wage ~ wage1$nonwhite) # boxplot of `wage` by `nonwhite`

Estimate Models, 1/2

OLS (example data: `wage1`)

Stata:
reg wage educ // simple regression of `wage` by `educ` (results printed automatically)
reg wage educ if nonwhite==1 // add condition with if statement
reg wage educ exper, robust // multiple regression using HC1 robust standard errors
reg wage educ exper, cluster(numdep) // use clustered standard errors

R:
mod1 <- lm(wage ~ educ, data = wage1) # simple regression of `wage` by `educ`, store results in `mod1`
summary(mod1) # print summary of `mod1` results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite == 1, ]) # add condition by subsetting rows
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata") # multiple regression with HC1 (Stata default) robust standard errors, use {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep) # use clustered standard errors

Tip: An alternate way to compute robust standard errors in R for models not covered by the {estimatr} package is to load the {AER} package and run:
coeftest(mod1, vcov. = vcovHC, type = "HC1")

MLE (Logit/Probit/Tobit) (example data: `mroz`)

Stata:
logit inlf nwifeinc educ // estimate logistic regression
probit inlf nwifeinc educ // estimate probit regression
tobit hours nwifeinc educ, ll(0) // estimate tobit regression, lower-limit of y censored at zero

R:
mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz) # estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz) # estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz) # estimate tobit regression, lower-limit of y censored at zero, use {AER} package

Postestimation, 1/2 (example data: `wage1`)

Note: Postestimation commands in Stata apply to the most recently run estimation command.

Stata:
reg wage educ // estimation used for the following post-estimation commands
predict yhat // get predicted values from last estimation, store as `yhat`
predict e, res // get residuals from last estimation, store as `e`

R:
mod1 <- lm(wage ~ educ, data = wage1) # estimation used for the following post-estimation commands
yhat <- predict(mod1) # get predicted values
e <- residuals(mod1) # get residual values
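The post-estimation step can be sketched in base R without any packages. The example below is a minimal illustration on a small made-up data frame: `toy`, `mod_toy`, and all values are assumptions for illustration, not the `wage1` data. It shows that `predict()` and `residuals()` return separate vectors that have to be attached to the data frame explicitly, whereas Stata's `predict` writes a new variable into the loaded data set.

```r
# Hypothetical toy data standing in for `wage1` (values are made up)
toy <- data.frame(
  wage = c(3.1, 3.2, 3.0, 6.0, 5.3, 8.8, 11.0, 5.0),
  educ = c(11, 12, 11, 8, 12, 16, 18, 12)
)

mod_toy <- lm(wage ~ educ, data = toy)  # analogue of `reg wage educ`

yhat <- predict(mod_toy)                # analogue of `predict yhat`
e    <- residuals(mod_toy)              # analogue of `predict e, res`

# Fitted values plus residuals reproduce the observed outcome
check <- all.equal(unname(yhat + e), toy$wage)
print(check)

# Results live in separate objects; attach them to the data explicitly
toy$yhat <- yhat
toy$e    <- e
head(toy)
```

Because the model is stored as an object, the same calls work on any earlier model at any point in the session, not only on the most recent estimation.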

CC BY SA Anthony Nguyen • @anguyen1210 • mentalbreaks.rbind.io • version 1.0.0 • Updated: 2019-10

Create/Edit Variables (example data: `wage1`)

Note: where Stata only allows one to work with one data set at a time, multiple data sets can be loaded into the R environment simultaneously, hence the data set must be specified for each command.

Stata:
gen exper2 = exper^2 // create `exper` squared variable
egen wage_avg = mean(wage) // create average wage variable
drop tenursq // drop `tenursq` variable
keep wage educ exper nonwhite // keep selected variables
tab numdep, gen(numdep) // create dummy variables for `numdep`
recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") (41/max = 3 "41+ years"), gen(experlvl) // recode `exper` and generate new variable

R:
wage1$exper2 <- wage1$exper^2 # create `exper` squared variable
wage1$wage_avg <- mean(wage1$wage) # create average wage variable
wage1$tenursq <- NULL # drop `tenursq`
wage1 <- wage1[ , c("wage", "educ", "exper", "nonwhite")] # keep selected variables
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep") # create dummy variables for `numdep`, use {fastDummies} package
wage1$experlvl <- 3 # recode `exper` (these three lines go together)
wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1

Estimate Models, 2/2

Panel/Longitudinal (example data: `murder`)

Stata:
xtset id year // set `id` as entities (panel) and `year` as time variable
xtdescribe // describe pattern of xt data
xtsum // summarize xt data
xtreg mrdrte unem, fe // fixed effects regression

R:
plm::is.pbalanced(murder$id, murder$year) # check panel balance with {plm} package
modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder) # estimate fixed effects ("within") model
summary(modfe) # display results

Instrumental Variables (2SLS) (example data: `mroz`)

Stata:
ivreg lwage (educ = fatheduc), first // show results of first stage regression
etest first // test IV and endogenous variable
ivreg lwage (educ = fatheduc) // show results of 2SLS directly

R:
modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz) # estimate 2SLS with {AER} package
summary(modiv, diagnostics = TRUE) # get diagnostic tests of IV and endogenous variable
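To illustrate what the 2SLS commands above compute, here is a minimal base-R sketch that runs the two stages by hand on simulated data. Everything in it is an assumption for illustration: the variables `z`, `x`, `y`, `u` and their coefficients are made up and merely play the roles of `fatheduc`, `educ`, and `lwage`; this is not how `AER::ivreg` is implemented.

```r
set.seed(42)                     # reproducible made-up data
n <- 500
z <- rnorm(n)                    # instrument (role of `fatheduc`)
u <- rnorm(n)                    # unobserved confounder
x <- 0.8 * z + u + rnorm(n)      # endogenous regressor (role of `educ`)
y <- 1 + 0.5 * x + u + rnorm(n)  # outcome (role of `lwage`)

# Stage 1: regress the endogenous regressor on the instrument
stage1 <- lm(x ~ z)
x_hat  <- fitted(stage1)

# Stage 2: regress the outcome on the stage-1 fitted values
stage2 <- lm(y ~ x_hat)
b_2sls <- unname(coef(stage2)["x_hat"])

# With one instrument and one endogenous regressor,
# the 2SLS slope equals cov(z, y) / cov(z, x)
b_ratio <- cov(z, y) / cov(z, x)

# Plain OLS for comparison; it ignores the confounder `u`
b_ols <- unname(coef(lm(y ~ x))["x"])

print(c(ols = b_ols, two_stage = b_2sls, ratio = b_ratio))
```

The point estimate from the manual second stage matches the ratio formula, but the standard errors reported by `summary(stage2)` are not valid 2SLS standard errors, which is one reason to use a dedicated routine in practice.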

Statistical tests / diagnostics (example data: `wage1`)

Stata:
reg lwage educ exper // estimation used for examples below
estat hettest // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat ovtest // Ramsey RESET test for omitted variables
ttest wage, by(nonwhite) // independent group t-test, compare means of same variable between groups

R:
mod <- lm(lwage ~ educ + exper, data = wage1) # estimation used for examples below
lmtest::bptest(mod) # Breusch-Pagan / Cook-Weisberg test for heteroskedasticity using the {lmtest} package
lmtest::resettest(mod) # Ramsey RESET test
t.test(wage ~ nonwhite, data = wage1) # independent group t-test

Interactions, categorical/continuous variables (example data: `wage1`)

In Stata, it is common to use special operators to specify the treatment of variables as continuous (`c.`) or categorical (`i.`). Similarly, the `#` and `##` operators denote different ways to return the interaction of those variables. Here we show some common uses of these operators as well as their R equivalents.

Stata:
reg lwage i.numdep // treat `numdep` as a factor variable
reg lwage c.educ#c.exper // return interaction term only
reg lwage c.educ##c.exper // return full factorial specification
reg lwage c.exper##i.numdep // return full, interact continuous and categorical

R:
lm(lwage ~ as.factor(numdep), data = wage1) # treat `numdep` as factor
lm(lwage ~ educ:exper, data = wage1) # return interaction term only
lm(lwage ~ educ*exper, data = wage1) # return full factorial specification
lm(lwage ~ exper*as.factor(numdep), data = wage1) # return full, interact continuous and categorical

Post-estimation, 2/2 (example data: `wage1`)

Note: Postestimation commands in Stata apply to the most recently run estimation command.

Stata:
reg lwage educ c.exper##c.exper // estimation used for following post-estimation commands
estimates store mod1 // stores in memory the last estimation results to `mod1`
margins // get average predictive margins
margins, dydx(*) // get average marginal effects for all variables
marginsplot // plot marginal effects
margins, dydx(exper) // average marginal effects of experience
margins, at(exper=(1(10)51)) // average predictive margins over `exper` range at 10-year increments
estimates restore mod1 // loads `mod1` back into working memory
estimates table mod1 mod2 // display table with stored estimation results

R:
mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1) # Note: in R, mathematical expressions inside a formula call must be isolated with `I()`
margins::prediction(mod1) # get average predictive margins with {margins} package
m <- margins::margins(mod1) # get average marginal effects for all variables
plot(m) # plot marginal effects
summary(m) # get detailed summary of marginal effects
margins::prediction(mod1, at = list(exper = seq(1, 51, 10))) # predictive margins over `exper` range at 10-year increments
stargazer::stargazer(mod1, mod2, type = "text") # use {stargazer} package, with `type = "text"` to display results within R. Note: `type` can also be changed for LaTeX and HTML output.
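As a rough illustration of what an average marginal effect means for the quadratic specification above, the following base-R sketch computes it by hand on simulated data and cross-checks it against a numerical derivative. The data frame `toy`, the model `mod_q`, and the coefficients used to simulate `lwage` are all assumptions made up for this example; the sketch is not a description of how the {margins} package works internally.

```r
set.seed(1)                      # reproducible made-up data
n <- 200
toy <- data.frame(
  educ  = sample(8:18, n, replace = TRUE),
  exper = sample(1:40, n, replace = TRUE)
)
toy$lwage <- 0.5 + 0.08 * toy$educ + 0.04 * toy$exper -
  0.0007 * toy$exper^2 + rnorm(n, sd = 0.3)

mod_q <- lm(lwage ~ educ + exper + I(exper^2), data = toy)
b <- coef(mod_q)

# Marginal effect of `exper` per observation: b_exper + 2 * b_exper2 * exper
me_exper  <- b["exper"] + 2 * b["I(exper^2)"] * toy$exper
ame_exper <- mean(me_exper)      # average marginal effect

# Cross-check: central-difference derivative of the predictions
h  <- 1e-5
up <- transform(toy, exper = exper + h)
dn <- transform(toy, exper = exper - h)
ame_numeric <- mean(
  (predict(mod_q, newdata = up) - predict(mod_q, newdata = dn)) / (2 * h)
)

print(c(analytic = ame_exper, numeric = ame_numeric))
```

Writing the squared term as `I(exper^2)` inside the formula is what lets `predict()` recompute it when `exper` is perturbed; a precomputed `exper2` column would stay fixed and the numerical check would no longer match.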

