Stataguide
Kurt Schmidheiny
Fall 2014 Universität Basel
1 Introduction
4 Additions to Stata
7 Data Manipulation
8 Descriptive Statistics
9 Graphs
10 OLS Regression
11 Log Files
12 Do-Files
1 Introduction
This guide introduces the basic commands of Stata. More commands are
described in the respective handouts.
All commands are shown using specific examples. Stata commands are
set in Courier; example-specific datafiles, variables, etc. are set in
italics while built-in Stata functions and operators are upright.
When you start Stata, you will see the following windows: the Command
window where you type in your Stata commands, the Results window
where Stata results are displayed, the Review window where past Stata
commands are displayed and the Variables window which lists all the
variables in the active datafile.
The active datafile can be browsed (read-only) in the Browser window,
which is activated from the menu Data/Data browser or by the command
browse
The Editor window lets you edit data either by typing directly into the
editor window or by copying and pasting from spreadsheet software:
edit
Since version 8, Stata has implemented every command (except the
programming commands) as a dialog that can be accessed from the menus.
This makes commands you are using for the first time easier to learn, as
the proper syntax for the operation is displayed in the Review window.
3 Short Guides to Microeconometrics
help correlate
search covariance
4 Additions to Stata
where the option clear removes a previously opened data set from the
Stata memory.
Stata provides a long series of example datasets at
http://www.stata-press.com/data/r12/. These datasets can be opened
directly by, for example,
save mynewdata
dir
The current directory can be set to a particular drive and directory, for
example by
cd "/Users/kurt/Documents"
cd "C:\Users\kurt\Documents"
There are many ways to import data into Stata. Since version 12, data
can be conveniently imported from Excel with the menu File/Import/Excel
spreadsheet or the command import excel.
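As a rough sketch of the import excel route, assuming a hypothetical
workbook mydata.xlsx whose first row holds the variable names and whose
data sit on a sheet named Sheet1 (both names are placeholders, not files
from this guide):

```stata
* mydata.xlsx and the sheet name are placeholders for illustration
import excel using "mydata.xlsx", sheet("Sheet1") firstrow clear
* check that variable names and storage types were read as intended
describe
```

The option firstrow treats the first row as variable names, and clear
drops any dataset currently in memory.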
The following section shows a reliable way that can also be used in older
versions.
Prepare the data in Excel for conversion:
• Make sure that missing data values are coded as empty cells or as
numeric values (e.g., 999 or -1). Do not use character values (e.g., -,
N/A) to represent missing data.
• Make sure that there are no commas in the numbers. You can
change this under the Format menu by selecting Cells... .
• Make sure that variable names are included only in the first row of
your spreadsheet. Variable names should be 32 characters or less,
start with a letter and contain no special characters except ‘_’.
Under the File menu, select Save As... . Then choose Text (Tab
delimited) under Save as type. The file will be saved with a .txt extension.
Start Stata, then issue the following command:
insheet using mydata.txt, clear
where mydata.txt is the name of the tab-delimited file. The option clear
removes a previously opened data set from the memory.
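In Stata 13 and later, tab-delimited text files of this kind can also be
read with import delimited. A minimal sketch, assuming the same
placeholder file name mydata.txt:

```stata
* mydata.txt is the tab-delimited file saved from Excel (placeholder name)
import delimited using "mydata.txt", delimiters("\t") varnames(1) clear
* check that variable names and storage types were read as intended
describe
```

Here varnames(1) tells Stata that the first row contains the variable
names.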
A Short Guide to Stata 12 6
7 Data Manipulation
webuse lifeexp
generate gnppc2 = gnppc ^ 2
generates a new variable gnppc2 with the square of the gross national
product per capita (gnppc).
generates a new dummy variable rich taking the value one if the gnp is
greater than or equal to 20000, zero if it is below and missing if it is unknown.
does the same. Note that Stata returns true (1) for the conditional statement
gnppc >= 20000 if gnppc is missing, because missing values are treated as
larger than any number. This is a very unfortunate feature of Stata and
the source of many errors.
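The pitfall can be made visible with a short comparison. This is only a
sketch: the variable name rich_naive and the exact commands are
assumptions chosen for illustration, not necessarily the commands used
in this guide.

```stata
webuse lifeexp, clear
* naive version: observations with missing gnppc are coded 1, not missing
generate rich_naive = gnppc >= 20000
* guarded version: observations with missing gnppc stay missing
generate rich = gnppc >= 20000 if !missing(gnppc)
* cross-tabulate both versions; the missing option shows where they differ
tabulate rich_naive rich, missing
```

Any observation that appears with rich_naive equal to 1 and rich missing
is a country whose GNP is unknown but which the naive version counts as
rich.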
The command egen extends the functionality of generate. For example
egen mgnppc = mean(gnppc)
creates a new variable mgnppc containing the (constant) mean of gnp for
all observations (countries). See section 13 for more functions in egen.
Both the generate and the egen command allow the by varlist prefix
which repeats the command for each group of observations for which the
values of the variables in varlist are the same. For example,
sort region
by region: egen reggnppc = mean(gnppc)
generates the new variable reggnppc containing for each country the mean
of gnp across all countries within the same world region.
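The sort and by steps can also be combined with the bysort prefix. A
minimal sketch using the same dataset; the summarize call is added only
to inspect the result:

```stata
webuse lifeexp, clear
* bysort sorts by region and then applies egen within each region
bysort region: egen reggnppc = mean(gnppc)
* inspect the regional means, one block of output per region
by region: summarize reggnppc
```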
The recode command is a convenient way to exchange the values of
ordinal variables. For example,
drop if region == 3
keep in 6/20
sorts the countries within regions by life expectancy. The order of
variables in the current dataset is changed with, for example,
8 Descriptive Statistics
webuse lifeexp
summarize lexp gnppc
reports the frequency counts for safewater. The missing option requests
that the number of missing values is also reported.
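The command line belonging to this description is not shown here. A
tabulate call of the following form would match it; treat it as an
assumption rather than the guide's own example:

```stata
webuse lifeexp, clear
* frequency counts for safewater, with missing values as a separate row
tabulate safewater, missing
```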
9 Graphs
webuse lifeexp
scatter lexp gnppc
draws a scatter plot of the variable lexp (y-axis) against gnppc (x-axis).
draws a scatter plot with regression line. A histogram (scaled as a
density by default; the option fraction gives relative frequencies) is
called with, for example,
histogram gnppc
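One possible way to produce the two graphs described above is sketched
below. The twoway overlay with lfit is an assumption about how the
regression line is drawn, not necessarily the command the guide had in
mind.

```stata
webuse lifeexp, clear
* scatter plot of lexp against gnppc with a fitted OLS line overlaid
twoway (scatter lexp gnppc) (lfit lexp gnppc)
* histogram with relative frequencies on the y-axis
histogram gnppc, fraction
```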
10 OLS Regression
The multiple linear regression model is estimated by OLS with the regress
command. For example,
webuse auto
regress mpg weight displacement
where only cars heavier than 3000 lb are considered. The Eicker-Huber-
White covariance is reported with the option robust.
F-tests for one or more restrictions are calculated with the post-estimation
command test.
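The three variations described above could look as follows. This is a
sketch under the assumption that the same model as in the first example
is used; the particular restrictions tested are chosen for illustration
only.

```stata
webuse auto, clear
* restrict the estimation sample to cars heavier than 3000 lb
regress mpg weight displacement if weight > 3000
* same model on the full sample with Eicker-Huber-White standard errors
regress mpg weight displacement, robust
* joint F-test that both slope coefficients are zero
test weight displacement
* F-test of a single linear restriction: equal coefficients
test weight = displacement
```

The test command always refers to the most recently estimated model, here
the regression with robust standard errors.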
11 Log Files
A log file keeps a record of the commands you have issued and their results
during your Stata session. You can create a log file with, for example
where mylog.txt is the name of the resulting log file. The append option
adds more information to an existing file, whereas the replace option
erases anything that was already in the file. Full logs are recorded in one
of two formats: SMCL (Stata Markup and Control Language) or text
(ASCII). The default is SMCL, but the option text changes that.
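The options combine as in the following sketch, which assumes the log
file name mylog.txt used above; the summarize call only serves to
produce some output to record.

```stata
* open a plain-text log, overwriting any existing file of the same name
log using mylog.txt, text replace
webuse auto, clear
summarize mpg weight
log close
* reopen the same file later and add to it instead of overwriting
log using mylog.txt, text append
describe
log close
```

A log must be closed before another one can be opened under the default
log name, which is why each log using is paired with a log close.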
A command log contains only your commands, for example
view mylog.txt
You can temporarily suspend, resume or stop the logging with the
commands:
log on
log off
log close
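Note that, in the order of the sentence above, log off suspends, log on
resumes and log close stops the logging. A short session might look like
this, again assuming the placeholder file name mylog.txt:

```stata
log using mylog.txt, text replace
webuse auto, clear
* suspend logging: the next command is not recorded
log off
describe
* resume logging: the following output is recorded again
log on
summarize mpg weight
* stop logging and close the file
log close
```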
12 Do-Files
do mydofile.do
You can also click on the Do current file icon in the do-file editor to run
the do file you are currently editing.
Comments are indicated by a * at the beginning of a line. Alternatively,
what appears inside /* */ is ignored. The /* and */ comment delimiters
have the advantage that they may be used in the middle of a line.
* this is a comment
generate x = 2*y /* this is another comment*/ + 5
Hitting the return key tells Stata to execute the command. In a do-file,
a return ends every line, which restricts each command to a single line
with a maximum of 255 characters. In many cases, (long) commands are
more clearly arranged on multiple lines. You can tell Stata that commands
span more than one line by changing the command delimiter to a semicolon
with
#delimit ;
⇒ Note that lines with comments also need to be terminated by ‘;’.
Otherwise the following command will not be executed.
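A small do-file fragment illustrates the semicolon delimiter. It is a
sketch only: the regression is borrowed from the OLS section, and
#delimit cr, which switches back to the return key as delimiter, is
included on the assumption that the rest of the do-file uses ordinary
line endings.

```stata
#delimit ;
* with the semicolon delimiter, this comment must end with a semicolon ;
webuse auto, clear ;
regress mpg
    weight
    displacement ;
#delimit cr
* from here on, each line is again one command
summarize mpg
```

The #delimit command is only understood inside do-files, not in the
Command window.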
&    and                 |    or
!    not                 ~    not
>    greater than        <    less than
>=   greater or equal    <=   smaller or equal
==   equal               !=   not equal
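The operators are typically combined in if qualifiers. A brief sketch
with the lifeexp data; the thresholds and region codes are picked for
illustration only.

```stata
webuse lifeexp, clear
* & (and): countries in region 1 with life expectancy above 75
count if region == 1 & lexp > 75
* | (or): countries in region 1 or region 2
count if region == 1 | region == 2
* ! and != (not): two equivalent ways to exclude region 3
count if !(region == 3)
count if region != 3
```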
fill(numlist)
creates a variable of ascending or descending numbers or complex
repeating patterns. See help numlist for the numlist notation.
mean(varname)
creates a constant containing the mean of varname.
rowmax(varlist)
gives the maximum value in varlist for each observation (row). Equals
max(var1, var2, ...) in the generate command.
rowmean(varlist)
creates the (row) means of the variables in varlist for each observation
(row).
rowmin(varlist)
gives the minimum value in varlist for each observation (row).
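The functions can be tried out on a tiny dataset typed in by hand. The
variable names x1, x2, x3 and the values below are made up for
illustration.

```stata
clear
input x1 x2 x3
1 5 3
4 2 6
end
* row-wise functions: one result per observation
egen xmax  = rowmax(x1 x2 x3)
egen xmin  = rowmin(x1 x2 x3)
egen xmean = rowmean(x1 x2 x3)
* column mean of x1, repeated as a constant in every row
egen xbar = mean(x1)
* ascending sequence 1, 2, ... continued from the pattern given
egen seq = fill(1 2)
* rowmax() agrees with max() in generate
generate xmax2 = max(x1, x2, x3)
assert xmax == xmax2
list
```

For the first row this gives a maximum of 5, a minimum of 1 and a row
mean of 3; for the second row 6, 2 and 4. The constant xbar is 2.5 in
both rows.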