STATAforEconWorkshop2
Erika Meucci
We are going to be working interactively; at the end you will be expected to create a do file that
reproduces what you have done. A good idea is to keep the Do-file Editor open and add each command to it
as you execute it. Don’t forget to add a comment explaining each step.
Create a new folder for this workshop – call it something useful like StataWorkshop02.
Download the Do-files and the data files and save them to that location.
Launch Stata
Change the working directory to the newly created folder
Open the Do-files when asked to
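The working directory can also be changed from the Command window. A minimal sketch; the folder path below is hypothetical and should be replaced by wherever you created StataWorkshop02:

```stata
* Hypothetical path: replace it with the location of your own StataWorkshop02 folder.
* The quotes are needed whenever the path contains spaces.
cd "C:\Users\yourname\Documents\StataWorkshop02"

* Display the current working directory to confirm the change
pwd

* List the Stata data files stored in that folder
dir *.dta
```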
Stata data files have the extension *.dta. These files should be opened only with Stata. Double-clicking
a *.dta file will also start Stata and load the file. There are several ways to open, or load, Stata data
files; we will explain a couple of them.
With Stata started, change your working directory to the folder where you have stored the Stata data files.
In the Command window type use lifeexp and press Enter.
If you already have a data file open and have changed it in some way, Stata will refuse to load the new
file and reply with an error message.
This feature prevents you from losing changes to a data file you may wish to save. If this happens, you
can either save the previous data file [more on this below] or enter clear in the Command window.
The clear command clears what is in Stata’s memory. If you want to clear memory and open the data file
in one step, enter
use lifeexp, clear
To open a Stata data file click the Open (use) icon on the toolbar
Locate the file you wish to open, select it, and click Open.
In the Review window the implied command is shown.
In Stata, opening a data file is achieved with the use command. The path of the data file is shown in
quotes; the quotes are necessary if the path name includes spaces. The option clear indicates that any
existing data is cleared from memory. You can also work with data files that are online, for example
use http://stats.idre.ucla.edu/stat/stata/notes/hsb2
The examples that follow use this hsb2 data file, which contains, among others, the variables female,
write and math.
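The loading options above can be put side by side. A sketch, assuming lifeexp.dta sits in the working directory; the quoted path is a hypothetical example of a location containing spaces. lifeexp is also one of the example datasets shipped with Stata, so sysuse can load it from any directory.

```stata
* Load lifeexp.dta from the current working directory, discarding any data in memory
use lifeexp, clear

* The same file addressed by a full path; quotes are required because of the spaces
* (hypothetical path: adjust it to your own folder)
use "C:\My Stata Files\StataWorkshop02\lifeexp.dta", clear

* lifeexp ships with Stata, so sysuse finds it regardless of the working directory
sysuse lifeexp, clear
```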
Also shown are variable Labels, if they are present, along with the Type of variable and its Format.
Labels are useful and can be easily added, changed or deleted. For example, if you type label variable
female followed by a label in double quotes in the Command window, Stata creates the label, overwriting
any existing label for female.
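A minimal sketch of adding, checking and removing a variable label, assuming the hsb2 data are in memory. The label text is only an illustration, not necessarily the one used in the workshop.

```stata
* Attach (or overwrite) a label for female; the label text is an illustrative example
label variable female "Female student (1 = female, 0 = male)"

* describe shows the variable's storage type, display format and label
describe female

* Store the label in a local macro and display it
local lbl : variable label female
display "`lbl'"

* Omitting the label text removes the existing label
label variable female
```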
Instead of the command approach, you can use the pull-down menus as follows:
Data > Data Utilities > Label Utilities > Label Variable.
In the resulting dialog box, you can alter the existing label by choosing Attach a label to a variable,
choosing the variable from the Variable: drop-down list and typing in the New variable label. Click OK.
In the dialog box you can also choose to Remove a label.
As we saw last class, there are a few things you should do each time a data file is opened, or when you
begin a new problem. First, enter into the Command window
describe
This produces a summary of the variables in the data file, information about them, and their labels.
Next, enter
summarize
This lists the number of observations, mean, standard deviation, minimum and maximum of each variable.
If you want information about the median and other percentiles, use the detail option (abbreviated d):
summarize varname, detail
For example,
sum write, d
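summarize, detail also leaves its results behind as stored results in r(), which is handy when you need a single percentile in a later command. A short sketch, assuming hsb2 is in memory:

```stata
* Detailed statistics for the writing score
summarize write, detail

* List every result that summarize stored
return list

* Use individual stored results: the median and the interquartile range
display "median = " r(p50)
display "interquartile range = " r(p75) - r(p25)
```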
The Stata help system is one of its most powerful features. Click on Help on the Stata menu and select
Contents.
Each of the blue words is linked to further screens. You should explore these to get a feel for what is
available.
To search, select Help > Search. In the dialog box that opens there are several search options. To search
all the Stata documentation and Frequently Asked Questions (FAQs), simply type in a phrase describing
what you want to find; it does not have to be a specific Stata command. For example, let’s search for
Summary Statistics.
Up comes a list of topics that might be of interest. Once again blue terms are links. Click on Summarize.
The resulting Viewer box shows the command syntax, which can be used when typing commands in the
Command window, and many options.
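The same keyword search can be started from the Command window with the search command, which looks through Stata's documentation and FAQ entries. For example:

```stata
* Keyword search of the documentation and FAQs; the phrase need not be a command name
search summary statistics
```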
1.4.2 Using command search
If you know the name of the Stata command you want help with, click Help > Stata Command.
In the resulting dialog box type in the name of the command and click OK. The equivalent command is
help summarize
If you know the name of the command you want, but do not recall details and options, a dialog box can be
opened from the Command window. For example, if you wish to summarize the data using the dialog
box, enter
db summarize
Or, enter
help summarize
and, in the Viewer that opens, click on the blue link or on Dialog to obtain the dialog box.
Stata commands have a common syntax. The name of the command, such as summarize, comes first.
The terms in brackets [ ] are optional command components that can be used.
• [varlist] is the list of variables for which the command is used.
• [if] is a condition imposed on the command.
• [in] specifies range of observations for the command.
• [weight] when some sample observations are to be weighted differently than others.
• [, options] command options go here.
For more on these options use a Keyword Search for Command syntax, then click Language.
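The components can be combined in a single command. A sketch, assuming hsb2 is in memory; weights are left out because hsb2 has no weight variable.

```stata
* command   varlist      if qualifier     in range   , options
summarize   write math   if female == 1   in 1/100   , detail

* Only [varlist] and [, options] used
summarize write, detail

* With no varlist the command applies to every variable in memory
summarize
```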
Remark: An important fact to keep in mind when using Stata is that it is case sensitive, both for commands
and for variable names. Lower case and capital letters have different meanings; since Stata considers x to
be different from X, it is easy to make programming errors.
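A quick way to see case sensitivity in action, assuming hsb2 is in memory (its variable is write, all lower case). capture suppresses the error so a do-file keeps running, and _rc holds the return code; 111 is Stata's "variable not found" code.

```stata
* Works: the variable name matches exactly
summarize write

* Fails: Stata treats Write and write as different names
capture summarize Write
display "return code: " _rc
```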
Consider the following examples using the syntax features. In each case type the command into the
Command window and press Enter.
For example,
summarize if female == 1
computes the simple summary statistics for the females in the sample. The variable female is 1 for
females and 0 for males. In the “if statement” [called an “if qualifier” by Stata] equality is indicated by
“==”.
summarize if write >= 40
computes simple summary statistics for those in the sample whose writing score (write) is greater than or
equal to 40.
summarize write in 1/50, detail
computes detailed summary statistics for the variable write in the first 50 observations.
If you see --more-- at the bottom left of the Results window, Stata has paused because the window is full.
Click --more-- or press the space bar for more results to appear.
At this point you may be wondering “How am I supposed to know all this?” Luckily you do not have to know
it all now, and learning comes with repeated use of Stata. One great tool is the combination of pull-down
menus and the Review window. Suppose we want detailed summary statistics for female write scores in
the first 100 observations. While you may be able to guess from previous examples how to do this, let’s
use the point and click approach. Select Statistics > Summaries, tables, and tests > Summary and
descriptive statistics > Summary statistics from the pull-down menu.
In the resulting dialog box, specify the variable we want to include (write) and select the option to
display additional statistics. Then click on the by/if/in tab at the top.
On this tab, type the if condition female == 1 in the If box. Check the box next to Use a range of
observations and use the selection boxes to choose observations 1 to 100. Then click OK.
Stata echoes the command and produces detailed summary statistics for the women in the first 100
observations.
In the Review window is the list of commands we have typed. You will also find the list of commands
generated using the dialogs. After experimenting for just a few minutes you will learn the syntax for the
command summarize. Suppose you want to change the last command to include observations 1 to 150.
You can type the command
summarize write if female == 1 in 1/150, detail
into the Command window, but Stata offers us a much easier option. In the Review window, simply click
on the previous command. Instantly, it appears in the Command window.
Simply edit this command, changing 100 to 150, then press Enter.
To edit a previously used command, click on that command in the Review window. The past command
will appear in the Command window, where it can be edited and executed. Not only do you obtain new
results, but the modified command now appears as the last item in the Review window.
1.6 SAVING YOUR WORK
When you carry out a long Stata session you will want to save your work.
One option is to highlight the output in the Results window, then right-click.
This gives you options to copy (Ctrl+C) the output as text, and then paste it into a document using the
shortcut (Ctrl+V) or by clicking the paste icon.
If you paste into a word processing document you may find that the nicely arranged Stata results become
a ragged, hard-to-read mess.
This is because the word processor changes the font. While you may be using Times New Roman for
standard text, use a fixed-width font such as Courier New for Stata output. You may have to reduce the
font size to 8 or 9 to make it fit.
As we saw in the last session, Stata offers a better alternative. In addition to having results in the Results
window in Stata, it is a very good idea to have all results written (echoed) to an output file, which Stata
calls a log file. You can begin a log file by entering a command as we saw last session or by clicking on
the Log Begin/Close/Suspend/Resume icon on the Stata toolbar.
In the resulting dialog box, the log file can be named and the type of log file selected. The default is a
formatted log file with the extension *.smcl, which stands for Stata Markup and Control Language. It is
usually better to save it as a plain-text log file (*.log) by choosing that option for the file type. Give the
file a meaningful name, and recall that it will be located in the directory we have made the default.
Click Save.
This dialog box can also be reached from the Stata menu by clicking File > Log > Begin.
The command log using, with the file name in quotes, appears in the Results window.
To close the log, click the same Log Begin/Close/Suspend/Resume icon you used to open the log file.
In the resulting dialog box select Close log file and press OK.
To view the log file, click on File > Log > View. In the dialog box enter (or browse for) the file
name and click OK. The log file file1.log opens in what is called the Stata Viewer, from which you can
print the entire log file by clicking the printer icon.
Advantages of the formatted (*.smcl) log file include the ability to view the formatted output and to print
it easily. A disadvantage of *.smcl files is that they cannot be easily viewed without having Stata open.
They are like *.html files in that while they are text files, they also include lots and lots of formatting
commands.
You can translate *.smcl log files into simple text files. On the Stata menu select File > Log > Translate.
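Log files can also be controlled from the Command window. The command
log using file1, text
will open file1.log in the current directory. Variations of this command are
log using file1, text replace
which will open the log file and replace one by the same name if it exists, and
log using file1, text append
which will open an existing log file and add new results at the end. The command
log close
closes the log file.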
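The log commands can be strung together in a do-file. A minimal sketch, assuming hsb2 is in memory; file1 is the name used above, while file2 is a hypothetical second log kept in SMCL format and then converted to plain text with translate.

```stata
* Start (or overwrite) a plain-text log called file1.log
log using file1, text replace
summarize write, detail
log close

* Later on, add more results to the end of the same file
log using file1, text append
describe
log close

* A formatted (SMCL) log, translated afterwards into plain text
log using file2, smcl replace
summarize math
log close
translate file2.smcl file2.log, replace

* Show the plain-text log in the Results window
type file1.log
```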
It is a good idea to examine the data to see the magnitudes of the variables and how they appear in the
data file. There are a number of icons on the Stata toolbar; hovering the mouse pointer over each icon
reveals its use. Click on Data Browser.
The data browser is a spreadsheet view. Use the slide bar at the bottom and the one on the right to view
the entire data array. The browser allows you to scroll through the data, but not to edit any of the entries.
This is a good feature that ensures we do not accidentally change a data value.
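The Data Browser can also be opened from the Command window, and list prints observations directly in the Results window. A short sketch, assuming hsb2 is in memory:

```stata
* Open the Data Browser showing only two variables
browse write math

* Print the first five observations of the same variables in the Results window
list write math in 1/5

* Note: the edit command opens the Data Editor instead, which does allow changes
```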
Do-files are very convenient once you have pointed and clicked enough that the commands you want to
execute appear in the Review window. If you have been following along on the computer with the
examples we have been doing, then your Review window is a clutter of commands right now. Let’s copy
those commands to a new Do-file called WS02.do (or whatever you want), or to your existing do file if
you have already started one. The extension *.do is recognized by Stata and should be used. Right-click
in the Review window, and on the pull-down menu click Select All. After all commands are selected,
right-click again and choose Send to Do-file Editor.
The Do-file Editor opens. To save this file click on File > Save as and enter the file name WS02.do.
The Stata Do-file Editor is a simple text editor that allows you to edit the command list so that it includes
only the commands you want to keep. A cleaned-up Do-file typically eliminates some commands,
rearranges others, and adds some new ones. When it runs, it presumes that the log file is new, that you
have saved and cleared any previous work, and that the working directory has been specified.
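A sketch of what a cleaned-up WS02.do might look like, assuming the working directory already points to StataWorkshop02 and that the UCLA address used earlier still serves hsb2 (it may have moved since this handout was written). The log name WS02 and the output file hsb2_new are hypothetical choices.

```stata
* WS02.do: reproduce the steps of Workshop 2
* Assumes the working directory is the StataWorkshop02 folder

* Close any log that is still open, ignoring the error if none is
capture log close

* Start a fresh plain-text log
log using WS02, text replace

* Load the data, discarding anything in memory
use http://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Inspect the data
describe
summarize
summarize write, detail

* Summary statistics with qualifiers
summarize if female == 1
summarize if write >= 40
summarize write if female == 1 in 1/100, detail

* New variables
generate write2 = write^2
generate lwrite = log(write)
generate elwrite = exp(lwrite)

* Drop and rename
drop write2
rename math mathematics

* Save under a new name so the original file is not overwritten
save hsb2_new, replace

log close
```

Besides the Do icon, the file can be run by typing do WS02 in the Command window.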
To execute this series of commands click the Do icon on the Do-file Editor toolbar.
The results appear in the Results window and will be written to the specified log file.
The Do-file Editor has some useful features. Several Do-files can be open at once, and the Do-file Editor
can be used to open and edit any text file. By highlighting several commands in the Do-file and selecting
Do Selected Lines, you can execute just those parts of the Do-file. Of course the data file must be
open before you execute the selected lines.
Stata offers a wide variety of functions that can be used to create new variables, and commands that let
you alter the variables you have created. In this section we examine some of these capabilities.
To create a new variable use the generate command. Let’s start with the pull-down menu. Click on Data
> Create or change data > Create new variable on the Stata menu. A dialog box will
open.
Alternatively, in the Command window, enter db generate to open the dialog box. In the dialog box
you must fill in
• New variable name: choose something logical, informative and not too long.
• Contents of new variable: this is a formula (no equal sign required) that is a mathematical expression.
In our example, write2 is a new variable that will be the square of write. The operator “^” is the symbol
Stata uses for “raise to a power”, so write^2 is the square of write, write^3 would be write cubed, and so on.
Click OK. In the Results window (and Review window) we see that the command implied by the menu
process is
generate float write2 = write^2
In this command, float is automatically added by the menu-driven process; it describes the storage type
of the variable being created and stands for floating point. Type help data types if you are curious. The
type is optional, so we can instead enter
generate write2 = write^2
The command can also be shortened to
gen write2 = write^2
These are alternative ways of writing the same command. If write2 already exists, Stata will refuse to
create it again, so drop it first or choose a new name.
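When re-running commands, it helps to remove a variable before recreating it, and the storage type can be requested explicitly. A sketch, assuming hsb2 is in memory; write2d is a hypothetical name for a double-precision copy.

```stata
* Remove write2 if it exists; capture ignores the error if it does not
capture drop write2

* Create it again; float is the default storage type
generate write2 = write^2

* Request double precision explicitly for a second copy (hypothetical name)
generate double write2d = write^2

* Compare the storage types of the two variables
describe write2 write2d
```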
Suppose in the process of creating a new variable you forget the exact name of the function. This happens
all the time. To illustrate let us create a new variable lwrite which will be the natural logarithm of write.
Go through the steps in Section 2.1 until you reach the generate dialog box. Type in the name of the new
variable, and then click Create to open the Expression builder.
In the Expression builder dialog box you can locate a function by choosing a category, scrolling down the
function list while keeping an eye on the definitions at the bottom until you locate the function you need.
Double-click on the function log(), and it will appear in the Expression builder window.
Now fill in the name of the variable write in place of “x” and click OK.
In the generate dialog box you will now find the correct expression for the natural logarithm of write in
the Contents of new variable space. Click OK.
The command will be executed, creating the new variable lwrite which shows up in the Variables window.
Stata echoes the command to the Results window
generate float lwrite = log(write)
and to the Review window. The simple direct command is
gen lwrite = log(write)
To drop a variable from the variable list, click on Data > Create or change data >
Keep or drop variables.
To rename a variable, click Data > Data utilities > Rename groups of variables.
Suppose we want to rename math as mathematics. Then fill in the dialog box.
The drop and rename commands are simple to enter directly, and are
drop write2
rename math mathematics
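The same commands can be checked and extended from the Command window. A sketch, assuming hsb2 is in memory with write2 already created. keep is the mirror image of drop (everything not listed is removed), so here it runs between preserve and restore, which put the data back afterwards.

```stata
* Drop one variable and rename another
drop write2
rename math mathematics

* confirm stops with an error if the named variable does not exist
confirm variable mathematics

* Temporarily keep only a few variables, then restore the full data
preserve
keep id female write mathematics
describe
restore
```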
Stata shortcut: When you have a list of variables to type, it is easier to type the command name, here
drop, and then click on the names of the variables in the Variables window. As they are selected they
appear in the Command window.
Stata has a long list of mathematical and statistical functions that are easy to use. Type help functions in
the Command window. We will be using math functions and density functions extensively.
Click on math functions. Scrolling down the list you will see many functions that may be new to you.
Examples include exp(), ln() (equivalently log()), sqrt() and abs().
Note that the exponential function is exp. Create elwrite = exp(lwrite) and use the Stata browser to
compare the values of write and elwrite. They are identical (up to rounding error) because the exponential
function is the antilog of the natural logarithm: lwrite is the logarithm of write, and elwrite is the antilog
of lwrite. The function log(write) is the natural logarithm, and so is ln(write). We often use the notation
ln(x) to denote the natural logarithm.
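A sketch of these functions in use, assuming hsb2 is in memory and lwrite has been created; sqwrite and diff are hypothetical names. reldif(x, y) returns the relative difference |x - y|/(|y| + 1), which allows for the small rounding error that comes from storing elwrite as a float.

```stata
* Antilog of the natural logarithm: should reproduce write
generate elwrite = exp(lwrite)

* Square root of write (hypothetical variable name)
generate sqwrite = sqrt(write)

* ln() and log() are the same function, so this displays 0
display ln(10) - log(10)

* Compare write and elwrite side by side for the first five students
list write lwrite elwrite in 1/5

* Relative difference between write and elwrite for each observation
generate double diff = reldif(write, elwrite)
summarize diff
```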
2.6 SAVING THE STATA DATA FILE AND ENDING THE SESSION
At this point it would be a good idea to save the Stata data file, since it has been changed by adding
several variables. Click File > Save as.
So that you do not write over the original data, save the data file under a new name.
The Stata command is save, followed by the new file name. To end the session, close the log file with
log close
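A minimal sketch of ending the session from the Command window, assuming the modified hsb2 data are in memory; hsb2_new is a hypothetical file name.

```stata
* Save under a new name; the replace option lets the command be rerun safely
save hsb2_new, replace

* Close the log; capture avoids an error if no log is open
capture log close
```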
QUESTIONS: