2023 Stata Lab Session
Narayani Lasala-Blanco
Stata is composed of four sub-windows: Review, Variables, Results, and Command Windows:
• Command Window (Command Line): All commands can be typed in this window. If you
type in a command and press the “Enter” key, the command will be executed. We will rarely
type commands directly here. Most of the time we will be using a do-file (more on this
below).
• Variables Window: The names of variables contained in the opened dataset will appear here.
• Results Window: The results of any command you execute will appear in this window.
• Review Window: The commands you have executed appear in this window
(upper left corner).
The next thing you do to start your statistical analysis is to bring a data file into Stata. To open
up a data file, click on the “Open” icon which is located in the upper-left corner of the Stata
window as shown below.
A Windows Explorer window will appear (the directory it shows may
be different from the example shown below). Open the (GSS) data files from wherever you saved
them (Documents, Downloads, etc.).
Choose the data you want to read into Stata. We will use the “nes08” data file here.
Once you register and are able to log in, you will see the links to download the data in .dta
(Stata) format. Download the dataset and go back to Stata.
Once you are in Stata, click Open… and select the downloaded dataset.
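The same step can also be typed in the Command window or placed at the top of a do-file. The lines below are a minimal sketch, assuming the downloaded file is called nes08.dta and sits in Stata's current working directory. Both the file name and its location are assumptions, so adjust the path to wherever you saved the file.

```stata
* Show the current working directory, so you know where Stata is looking
pwd

* Load the dataset; the "clear" option discards any data already in memory
* (assumes nes08.dta is in the current working directory)
use "nes08.dta", clear

* List the variables to confirm that the file loaded
describe
```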
Error Message?
If you are working on a computer outside the University Lab, you may see a memory error message
when you try to bring a file into Stata:
This error happens due to a shortage of memory when the dataset is very large. If you see this
message, temporarily increase Stata’s memory to 4 megabytes (or more) by typing
the following command. (In Stata 12 and later, memory is managed automatically and this step should not be needed.)
. set memory 4m
3. Create a “do-file”
You are required to submit the commands you used for assignments. As we progress in the
course you will see it is easy to get confused about which files correspond to which day of work.
A do-file is a command file that lets you submit several commands to Stata at the same time. It is
important to create a do-file because, if you need to make changes to your statistical analysis, a
do-file simplifies this process. A do-file should contain a list of all (that is, all) the Stata
commands you use in your data analysis, preferably from start (after bringing the data file into
Stata) to finish.
To start a new do-file within the Stata program, click on the envelope icon on the toolbar or choose
“New Do-file” from the Window pulldown menu (Window: Do-file Editor: New Do-file).
The following window will appear. Save it like any other file by going to File > Save as…
Work on your command window as demonstrated further below. Every time you execute a
command successfully simply select it, copy and paste the command in the do-file window. For
example, in the above screen we have successfully brought the NES 2008 data into Stata (notice
the variables are in the Variables window).
Tip #1: It is quite useful to drag and rearrange the do-file and the remaining Stata windows.
If you have them side by side (the Stata window on one side and the Do-file Editor window on
the other), it is easier to take advantage of editing the do-file as you go.
2. A do-file (which will contain all the commands you used for your analysis). You may
also have an “output” or “results” file from your data analysis, as a separate file from 3.
3. A Word or other word processing file, where you will paste your tables (copy the tables
from Stata and paste them into Word using Courier font, size 8).
4. A Word or other word processing file, where you will write your paper (5 pages
maximum of writing), which may have two appendixes: one with the (tables of the)
results you obtained which are not on the main body of the paper, and the second with all
the commands you executed, that is, a copy of your do-file.
In this lab session we will copy and paste each command executed correctly onto the do file.
You can enlarge the Variable and Review windows by clicking and dragging the border line of
each window. The captured picture shows the resized (enlarged) Review and Variable windows.
In the Variable window, you will see the labels for the variables which were hidden before due to
the window size. Resizing allows you to look at the variable labels, if they are provided by the
data-creator.
In the Review window, you see the commands you executed. In the image shown above, the
command for opening the dataset appears. Even if you use the Menu-Bar option instead of typing
in the commands, Stata automatically converts the Menu-Bar activities into commands and
records the commands.
For this example, we use “V085084”, which represents the survey respondent’s self-reported
liberal/conservative political ideology. To find detailed information about this variable, use the
“codebook” command. Type into the Command window:
. codebook V085084
Caution!
Stata is “case sensitive.” Stata distinguishes uppercase letters from lowercase letters. That is, “A”
and “a” are not the same. In our example, the variable name is “V085084”, which starts with
“V”, not with “v”. If you type “codebook v085084” instead of “codebook
V085084”, Stata will return an error (for example, “variable v085084 not found”) instead of the codebook output.
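To see the case-sensitivity rule in action without loading the NES file, the sketch below builds a tiny made-up dataset containing one variable named V085084 (the three values are invented for illustration). It uses “capture”, which suppresses the error and stores the return code in _rc, so the lines can run in a do-file without stopping.

```stata
* Toy data: one variable whose name starts with an uppercase "V"
clear
input V085084
1
2
3
end

* Lowercase name: capture hides the error and keeps the return code in _rc
capture codebook v085084
display "return code with lowercase name: " _rc

* Correct case: the command runs normally
codebook V085084
```

A nonzero return code on the first attempt means Stata could not find a variable with that exact spelling.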
Notice at the bottom, only some example values and labels are shown (instead of all values and
labels). You can tell because the range of values is “[-9,7]”, i.e. -9 to 7, but it only shows labels
for values -7, 2, 4, and 5. Use the “labelbook” command to see all labels (this dataset does not
have it)
*Note: typing just “codebook” or “labelbook” without a specific variable name requests
detailed information for ALL variables. You do not want to do this!
The information given by “codebook” and “labelbook” does not include univariate statistics.
To see the frequency distribution of the variable, use the “tabulate” command (abbreviated “tab”):
. tab V085084
or
. tabulate V085084
In the Results window, we see this variable has seven substantive categories. That is, there are
eleven answer codes: seven are “substantive”, and the other four are Refused (coded -9), Don’t
know (coded -8), Haven’t.. (coded -7), and No Post Election IW (coded -2).
The seven substantive categories are: “1” if the respondent is “extremely liberal”, “2” if the
respondent is a “liberal”, and so on. The first four categories (“Refused”, “Don’t know”,
“Haven’t..”, and “No post election..”) will be treated as “missing”, which means the response data is
missing. In some cases categories are assigned negative values or very high values, so that they
won’t be confused with substantive categories.
To get univariate statistics such as mean and standard deviation, use the “summarize” command:
. sum V085084
or
. summarize V085084
The mean for these categories is 1.126184. However, this is highly distorted due to including the
“missing” categories. We have to recode them, so that Stata designates them as “missing” cases
to exclude in statistical calculations.
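A small made-up example shows how much the negative “missing” codes can pull the mean. The variable x and its five values below are hypothetical and are not taken from the NES data.

```stata
* Hypothetical toy data: two "missing" codes (-9, -8) and three real answers
clear
input x
-9
-8
2
4
6
end

* Mean over all five values: (-9 - 8 + 2 + 4 + 6) / 5 = -1
summarize x
display r(mean)

* Mean over the substantive values only: (2 + 4 + 6) / 3 = 4
summarize x if x > 0
display r(mean)
```

The “if x > 0” condition is only a quick check. Recoding the codes to “.” is the cleaner fix, because every later command then excludes those cases automatically.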
Tip #3: When you copy commands from the Results window and paste them into your
do-file, be careful NOT to include the “.” at the beginning of the line. Stata will not
recognize the command.
Always verify that you did not get an error message in the Results window before copying and
pasting a command.
Suppose you type “Codebook” instead of “codebook”. Since Stata is case sensitive, you will
get an error message.
You can type the command again correctly and then paste it into your do-file. Always type
commands in the Command window first and then paste them into the do-file.
In the screens above we proceed to select and copy the commands into the do-file window. All
successful commands typed into the Command window (in this and previous handouts in bold
and preceded by “.”) should be in your do file.
Note that, by default, a command may not run over more than one line in the do-file.
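If a command is too long to read comfortably on one line, Stata does let you continue it inside a do-file by ending the line with three slashes (///). This works in do-files only, not in the Command window. The sketch below uses made-up data and a hypothetical new variable name, ideol_demo.

```stata
* Toy data so the example runs on its own
clear
input V085084
-9
2
5
end

* One recode command split over two lines; "///" joins them into one command
recode V085084 (-9/-2 = .), ///
    generate(ideol_demo)
```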
Tip #4: To add notes to your do-file use asterisks. (You can use just one *, or any number
of them, for various levels of emphasis. Because Stata reads each new line as a new
command, make sure to use an asterisk on each line if you have an extended comment.)
The following code includes a note and then a command:
Stata will ignore any line that begins with an asterisk. This lets you put notes and reminders
into the text of your do-file. Including notes also helps others understand what you are trying to
do in your analysis.
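A commented do-file fragment might look like the following sketch. The toy data and the wording of the notes are made up for illustration.

```stata
* Toy data so the example runs on its own
clear
input V085084
1
4
7
end

* Frequency table of self-reported ideology
* (a second line of notes needs its own asterisk)
tabulate V085084
```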
Here, we recode categories “-9” to “-2” to “.”, which is the symbol for “missing” in Stata. Use
the “recode” command.
Type the name of the variable you want to recode after the “recode” command. Then, in parentheses,
list the values of the original variable you want to recode on the left side of the equal sign and the
value you want to replace them with on the right-hand side. To keep the original variable intact,
create a new variable with the recoded values by using the “gen” option. Put the name for the newly
recoded variable in the parentheses. Here, we name it “ideol”.
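The command itself is not reproduced in this copy of the handout, so the lines below are a sketch reconstructed from the description above, not necessarily the exact command used in the lab. The seven values in the toy dataset are made up so the example runs without the NES file.

```stata
* Toy data standing in for the NES variable
clear
input V085084
-9
-8
-7
-2
1
4
7
end

* Recode every value from -9 through -2 to missing (.),
* and store the result in a new variable so V085084 stays intact
recode V085084 (-9/-2 = .), generate(ideol)

* Missing cases appear in the table only when "missing" is requested
tabulate ideol, missing
```

With the real data, the same recode command would be run right after the dataset is loaded.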
Stata reports 735 differences from the original variable because there were 735 observations in
categories -9 to -2 together. This is roughly a third (about 32%) of the original 2322 observations.
. tab ideol
Notice that the “missing” values (-2, -7, -8, and -9) have disappeared. The “tab” command excludes
missing cases automatically. To include the missing cases, add “missing” at the end of the
command line.
. tab ideol, missing
How does the new mean look? Does it look about right?
. sum ideol
Move the cursor onto the command in the Review Window that you want to re-run, and double-
click the line. A single-click will make the command just appear in the Command Window.
Now that we have treated missing values properly, let’s look at some other statistics for this
variable. To get “Standard Error of the Mean”, type:
. ci ideol
This command shows not only the standard error of the mean but also the 95% confidence
interval of the estimated mean. (This statistic will be covered in a later class.)
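In recent Stata releases, the documented form of this command is “ci means”. If “ci ideol” gives an error on your computer, the sketch below may be the form to use. The five values are made up for illustration.

```stata
* Toy data standing in for the recoded ideology variable
clear
input ideol
1
3
4
4
6
end

* Mean, standard error, and 95% confidence interval of the mean
ci means ideol
```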
To draw a histogram of the variable, type:
. hist ideol
or
. histogram ideol
7. Recoding – Dichotomization
The “recode” command for this step can be written in either of two equivalent ways.
Values “1,” “2,” and “3” are recoded as “1” (Liberal), and values “4” through “7” are recoded as
“0” (Non-Liberal). Missing values are recoded to “.”, as they originally were. Notice that you
can use the “/” term in “recode” as a shortcut, i.e. “1/3” is equivalent to “1 2 3” in this command.
We name the resulting dichotomous variable “liberal”.
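The two commands are not reproduced in this copy of the handout. The sketch below shows what the two equivalent forms could look like, assuming the recode starts from “ideol” (it could equally start from “V085084”). The toy values and the second variable name, liberal2, are hypothetical.

```stata
* Toy data standing in for the recoded ideology variable
clear
input ideol
1
2
3
4
7
.
end

* Range shortcut: 1/3 means 1, 2, and 3
recode ideol (1/3 = 1) (4/7 = 0), generate(liberal)

* Equivalent form listing the values one by one
recode ideol (1 2 3 = 1) (4/7 = 0), generate(liberal2)

* The two versions should agree on every observation
assert liberal == liberal2
```

Values that no rule mentions, such as “.”, are left unchanged by “recode”, so missing cases stay missing.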
. codebook liberal
Usually it is convenient to give variables short names—like we did for variable “V085084” when
we generated “ideol” and “liberal”—as we will type these names many times. It is also
convenient to have more complete descriptions of the data attached to the dataset so that when
you come back a year later you can remember what everything means. The “label” and “note”
commands are used for this purpose.
The “label” command is used to label the values of the variables, e.g.,
. tab liberal
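The labelling commands are not reproduced in this copy of the handout. As a sketch of how the “label” command is typically used, the lines below label the variable and its values. The label name liberal_lbl, the label text, and the toy data are all hypothetical.

```stata
* Toy data standing in for the dichotomous variable
clear
input liberal
1
0
1
.
end

* Describe the variable itself
label variable liberal "Liberal (1) vs. non-liberal (0)"

* Define a set of value labels (the name liberal_lbl is hypothetical)
label define liberal_lbl 0 "Non-Liberal" 1 "Liberal"

* Attach the value labels to the variable
label values liberal liberal_lbl

* The table now shows the labels instead of 0 and 1
tabulate liberal
```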
You may also use the “note” command to add a note to any variable in the dataset.
This allows you to have more complete descriptions of the data attached to the dataset, so that
when you come back a year later you can remember what everything means. For example:
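The example is not reproduced in this copy of the handout. A minimal sketch of the “note” command, with made-up note text and toy data, could be:

```stata
* Toy data standing in for the dichotomous variable
clear
input liberal
1
0
end

* Attach a free-text note to the variable (the wording is illustrative)
note liberal: Recoded from V085084 (1 = values 1 to 3, 0 = values 4 to 7)

* Display the notes attached to the variable
notes list liberal
```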
9. Save
Once you have executed all the commands required to get the appropriate statistics for
completing Computer Exercise #2 (and copied and pasted each onto the do-file window) your
do-file for this exercise should look like this:
Save your do file (File-> Save as).
To test your do-file, first type “clear” in the Command window. This will clear all datasets from Stata.
Open the dataset again (as shown earlier in this handout). Once the dataset is loaded, go to the
Do-file Editor window and click the “Do” icon on its toolbar. This will execute in batch mode all
the commands saved in the do-file.
Stata should execute all the commands and you should be able to retrieve all statistics from the
Results window. If error messages appear, you should revise your do-file.
If you want to modify your do-file in a later session, open the Do-file Editor as shown at the
beginning of the handout (Window > Do-file Editor > New Do-file). An untitled do-file will pop up.
Then open the File menu, click Open…, and search for your “stata lab 1.do” do-file (or
whatever name you have given it).
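Both steps also have typed equivalents. The sketch below assumes the do-file is saved as “stata lab 1.do” in Stata's current working directory. The location is an assumption, so add the full path if the file is stored elsewhere.

```stata
* Run every command in the do-file, in order
* (assumes "stata lab 1.do" is in the current working directory)
do "stata lab 1.do"

* Open the same do-file in the Do-file Editor for further changes
doedit "stata lab 1.do"
```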