software material

Uploaded by

amanueco21
Copyright
© © All Rights Reserved

1. Introduction to Stata

What is Stata?
• It is a multi-purpose statistical package used to explore, summarize, and analyze datasets.
• It can handle and manipulate large datasets (e.g. millions of observations), and it has ever-growing capabilities for panel and time-series regression analysis.
• A dataset is a collection of several pieces of information called variables, usually arranged in columns.
• A variable can take one or several values, recorded for one or several cases (observations).

Stata Interface
I. Stata Windows

Generally, Stata has 4 main windows:

Command window: used to submit commands to Stata. It supports basic text editing, copying and pasting, and a command history.

Results window: contains all the commands and their textual results.

Review window: shows the history of commands that have been entered. It displays successful commands in black and unsuccessful commands, along with their error codes, in red.

Variables window: shows the list of variables in the dataset, along with selected properties of the variables.

II. The Stata Tool Bars
The toolbar contains buttons that provide quick access to Stata's more commonly used features.

The toolbar buttons and their functions:

Open: opens a Stata dataset. Click on the button to open a dataset with the Open dialog.
Save: saves the Stata dataset currently in memory to disk.
Print: displays a list of windows. Select a window name to print its contents.
Log: begins a new log or closes, suspends, or resumes the current log.
Viewer: opens the Viewer or brings a Viewer to the front of all other windows.
Graph: brings a Graph window to the front of all other windows.
Do-file Editor: opens the Do-file Editor or brings a Do-file Editor to the front of all other windows.
Data Editor (Edit): opens the Data Editor or brings the Data Editor to the front of the other Stata windows.
Data Editor (Browse): opens the Data Editor in browse mode.
Variables Manager: opens the Variables Manager.
Clear -more- Condition: tells Stata to continue when it has paused in the middle of long output.
Break: stops the current task in Stata.

III. Stata Menus and dialogs

• Stata's Data, Graphics, and Statistics menus provide point-and-click access to almost every command in Stata.
• The dialogs for many commands have the by/if/in and Weights tabs.
• These tabs provide access to Stata's commands and qualifiers for controlling the estimation sample and dealing with weighted data.

Getting Started

• If you are using Stata version 11 or earlier and you want to read in a big dataset, then before reading in your data you must tell Stata to make enough computer memory available for it.
• If you get a message while using Stata 11 or earlier that there is not enough memory (for example, "no room to add more observations…"), then you need to set the memory higher manually.
• First remove any data currently in memory by typing
• clear or drop _all
• Then, to set the memory to a large enough amount, type
• set mem 700m (or something higher)

How to Read Data into Stata?


To load files in Excel format into Stata, follow one of the following 2 procedures:

1. Click on "File" on the menu bar. In the File drop-down menu, click on "Import" and then choose "Excel spreadsheet"; or

2. Open the Data Editor by typing "edit" or by clicking the Data Editor button on the toolbar. Then copy the data from Excel, right-click in any cell in the Data Editor, and choose Paste.
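
The menu route above also has a command equivalent. The following is a minimal sketch, assuming Stata 12 or later (where import excel is available). The file name auto_demo.xlsx is hypothetical; it is created here from the auto dataset that ships with Stata only so that the example can run on its own.

```stata
* Create a small Excel file to import (hypothetical file name)
sysuse auto, clear
export excel using "auto_demo.xlsx", firstrow(variables) replace

* Read the spreadsheet back in; firstrow treats row 1 as variable names
import excel using "auto_demo.xlsx", firstrow clear
describe
```

With your own spreadsheet, only the import excel line is needed, with the file name replaced by the location of your file.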

Saving data into Stata


• If the dataset is new or has just been imported from another format, go to File > Save As, or just type: save filename
• To save a dataset that is already in use (overwriting the original data file):
• 1. select File > Save; or
• 2. click on the Save button; or
• 3. type save, replace in the Command window
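
As a short illustration of the two saving commands, here is a sketch that uses the auto dataset shipped with Stata; the file name mydata is a hypothetical choice.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear
* Save it under a new name; replace overwrites mydata.dta if it already exists
save "mydata", replace
* Reload the saved file, then save it again under the same name
use "mydata", clear
save, replace
```

Typed without a file name, save, replace writes to the file that was most recently opened with use.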

Log file

• A log file is simply a record of your Results window. It records all commands and all textual output as it happens.
• Thus, it keeps your lab notebook for you as you work.
• Because it writes the file to disk while it writes the Results window, it also protects you from disastrous failures, be they power failures or computer crashes.

How to create it?


File>Log>Begin
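
A log can also be started and closed from the Command window or a do-file. This is a minimal sketch; the file name session1.log is hypothetical.

```stata
* Start a plain-text log, replacing any earlier copy of the file
log using "session1.log", text replace
* Everything from here on is recorded in the log
sysuse auto, clear
summarize price mpg
* Close the log when finished
log close
```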

Do-file
• A do-file is a file containing a list of commands for Stata to run (called a batch file or a script in other settings). The Do-file Editor gets its name from this term.
• The Do-file Editor has advanced features that can help in writing such files; it can also be used to build up a series of commands that can then be submitted to Stata all at once.
• The Do-file Editor can be opened either by clicking on the Do-file Editor toolbar button or by typing doedit in the Command window.

2. DATA MANAGEMENT

Loading Data into Stata


Things to know about entering data in Stata

• A period ('.') represents a missing numeric value.
• Press Tab or Return to input a missing numeric value.
• Press Tab or Return to input a missing value for a string variable.
• Stata will not allow empty columns or rows in the middle of your dataset.

Easy steps to load your data in Stata

Say you have an Excel file named datamgmt.

• Open the data in Excel.
• Open the Data Editor by typing "edit" or by clicking the Data Editor button on the toolbar.
• Then copy the data from Excel, right-click in any cell in the Data Editor, and choose Paste.
• To save the file, type: save filename
• e.g.: save datamgmt

Naming variables
• Variable names can have up to 32 characters, but many commands print only 12, and shorter names are easier to type.
• Stata names are case sensitive: Age and age are different variables!
• It pays to develop a convention for naming variables and to stick to it.

• It helps to use short lowercase names and single words or abbreviations rather than multi-word names. For example, use effort or fpe to represent a variable called family_planning_effort or familyPlanningEffort, although all four names are legal.
• Note the use of underscores to separate words.

Renaming variables

• Variables can be renamed using the following Stata syntax:
• rename old_variable_name new_variable_name
For example: rename female sex

Labeling variables
• Variables can be labeled using the following Stata syntax:
• label variable var1 "description"
• where var1 is the variable to be labeled and description is the label of var1.
• The various levels of a categorical variable can be labeled using the following two Stata commands together:
• label define var1 1 "name of the first category" 2 "name of the second category"
• label values var1 var1
where 1 and 2 are the levels of the categorical variable. In label define, var1 is the name given to the set of value labels (here the same as the variable name). In label values, the first var1 is the categorical variable and the second is that set of labels.
Example: A variable called gender has two categories: 1 for male and 2 for female.
• The categories of gender can be labeled as follows:
• label define gender 1 "male" 2 "female"
• label values gender gender
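
The renaming and labeling commands above can be tried on the auto dataset that ships with Stata. In this sketch the new names mileage and expensive, and the 6000 price cutoff, are hypothetical choices made only for illustration.

```stata
sysuse auto, clear
* Rename mpg to mileage and give it a description
rename mpg mileage
label variable mileage "Mileage (miles per gallon)"
* Create a 0/1 variable, then define a set of value labels and attach it
generate expensive = price > 6000
label define expensive 0 "cheap" 1 "expensive"
label values expensive expensive
* The frequency table now shows the labels instead of 0 and 1
tabulate expensive
```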

Generating (creating) new variables from existing variable(s)

• The most common command for creating new variables is generate.
• Syntax: generate newvar = expression
where newvar is the name of the new variable.
• Example: Generate a variable called income which is the sum of farm income (fincome) and nonfarm income (nfincome):

generate income = fincome + nfincome

Note that generate takes a single equals sign (=); the double equals sign (==) is used only to test equality.

To generate the natural logarithm of X: gen newvar = ln(X)

To generate the square root of X: gen newvar = sqrt(X)

To generate the natural exponential of X: gen newvar = exp(X)
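
A short sketch of these transformations, using the auto dataset shipped with Stata; the new variable names are hypothetical.

```stata
sysuse auto, clear
* Natural logarithm, square root, and exponential of existing variables
generate lnprice = ln(price)
generate sqrtweight = sqrt(weight)
generate expgear = exp(gear_ratio)
* Check the results
summarize lnprice sqrtweight expgear
```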

Keeping and dropping variables
• Your dataset may contain variables you are not interested in or do not want to analyze.
• It is a good idea to get rid of these first; that way, they will not use up valuable memory and will not inadvertently sneak into your analysis.
• You can tell Stata either to keep what you want or to drop what you do not want; the end result will be the same.
The syntax is
• keep varlist (the variables to remain)
• drop varlist (the variables to remove)
• keep if var >= 0 (keeps the observations that satisfy the condition)
• drop if var < 0 (removes the observations that satisfy the condition)
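
The four forms can be sketched on the auto dataset shipped with Stata; the cutoff values are arbitrary and chosen only for illustration.

```stata
sysuse auto, clear
* Keep four variables, then drop one of them
keep make price mpg rep78
drop rep78
* Keep or drop observations according to a condition
keep if price >= 4000
drop if mpg < 15
* Count the observations that remain
count
```

These commands change only the data in memory; the file on disk is unaffected until you save.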

Examining the Data


• It is important to examine your data when you first read it into Stata: check that all the variables and observations are present and in the correct format.
• The browse and edit commands open a pop-up window in which you can examine the raw data.
• To examine the data within the Results window, use the list command.
• Note: listing the entire dataset is only feasible if it is small.
• If the dataset is large, you can use some options to make the output of list more tractable.
• The list command displays the values of all the variables, or of the variables you name.
• Syntax: list varlist
• list varlist, options
• where varlist is the list of variables to be listed, and options is any one or a combination of the options associated with the list command. You can also add the if or in qualifiers, for example:
1. list x if x > 65, or list x if x >= 25
2. list x if gender == 1 lists the values of x only for the observations whose gender is coded 1. Note the double equals sign.
3. list x in 1/5 lists only the first 5 observations.
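
The if and in qualifiers can be tried on the auto dataset shipped with Stata. This is a sketch; the variables and cutoffs are only illustrative.

```stata
sysuse auto, clear
* First five observations of three variables
list make price mpg in 1/5
* Only cars with mileage above 30
list make mpg if mpg > 30
* Only observations where foreign equals 1 (note the double equals sign)
list make price if foreign == 1
```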

Assert

• With large datasets, it is often impossible to check every single observation using list or browse.
• Additional commands to examine data are described in the following.
• A first useful command is assert, which verifies whether a certain statement is true or false.
• Syntax: assert expression
For example, you might want to check whether all values in the math variable are nonnegative, as they should be:
• assert math >= 0 or, equivalently, assert !(math < 0)
• If the statement is true, assert does not yield any output on the screen.
• If it is false, assert gives an error message and the number of contradictions.
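
A small sketch of both outcomes, using the auto dataset shipped with Stata. The last check is deliberately false for some cars; capture is used so that the error does not stop a do-file.

```stata
sysuse auto, clear
* True for every observation, so these produce no output
assert price >= 0
assert !missing(price)
* False for some observations; capture traps the error,
* and _rc then holds a nonzero return code
capture assert mpg > 20
display _rc
```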

Describe
The describe command produces a summary of the dataset in memory or of the data stored in a Stata-format dataset.
• Describe data in memory: describe
• describe varlist, memory_options
• Describe data in a file:
• describe varlist using "location and name of the file", file_options

Summarize
This provides summary statistics, such as means, standard deviations, and so on.

• Syntax: summarize or
• summarize, detail

Tabulate
The tabulate command is a versatile command that can be used, for example, to produce a frequency table of one variable or a cross-tab of two variables.

• Syntax: tabulate varname, options
• Syntax: tabulate varname1 varname2, options

Inspect
• The inspect command is a way to eyeball the distribution of a variable, since its output includes a mini-histogram.
Syntax: inspect varlist
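
The four examining commands described above (describe, summarize, tabulate, and inspect) can be compared side by side. This sketch uses the auto dataset shipped with Stata; the choice of variables is only illustrative.

```stata
sysuse auto, clear
* Structure of the whole dataset, then of two variables
describe
describe price mpg
* Basic and detailed summary statistics
summarize price mpg
summarize price, detail
* One-way frequency table and two-way cross-tab
tabulate foreign
tabulate rep78 foreign
* Quick look at the distribution of one variable
inspect mpg
```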

Correlations
• Correlation measures the association or relationship between variables.
• The correlate command displays the correlation matrix or covariance matrix for a group of variables.
• Syntax: corr varlist

How do you get the correlations and see whether they are significant (pairwise correlation)?

Syntax: pwcorr varlist, star(#), where # is the significance level (e.g. star(0.05) for the 5% level; 0.01 and 0.10 work the same way)
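
A brief sketch of both commands on the auto dataset shipped with Stata. The sig option, which is not mentioned above, is an addition here; it prints the significance level under each coefficient.

```stata
sysuse auto, clear
* Correlation matrix, then covariance matrix
correlate price mpg weight
correlate price mpg weight, covariance
* Pairwise correlations, starred when significant at the 5% level
pwcorr price mpg weight, sig star(.05)
```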

3. Application to Cross-sectional Analysis

Hypothesis Testing
ttest varname == #: Test the hypothesis that the mean of a variable is equal to some number; type the number in place of the # sign.
ttest varname1 == varname2: Test the hypothesis that the mean of one variable equals the mean of another variable (a paired test on the same observations).
ttest varname, by(groupvar): Test the hypothesis that the mean of a single variable is the same in two groups. The groupvar must take exactly two distinct values, one for each group. For example, groupvar might be gender, to see if the mean of a variable is the same for males and females.

Confidence Intervals
ci varname: Confidence interval for the mean of varname (based on the t distribution). In Stata 14.1 and later, the syntax is ci means varname.
ci varname, level(#): Confidence interval at #%. For example, use 99 for a 99% confidence interval.
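
The tests and intervals above can be sketched on the auto dataset shipped with Stata. The hypothesized mean of 20 is arbitrary, and the ci lines assume Stata 14.1 or later; in earlier releases, type ci mpg instead.

```stata
sysuse auto, clear
* One-sample test: is the mean of mpg equal to 20?
ttest mpg == 20
* Two-group test: is mean mpg the same for domestic and foreign cars?
ttest mpg, by(foreign)
* 95% and 99% confidence intervals for the mean (Stata 14.1+ syntax)
ci means mpg
ci means mpg, level(99)
```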

How to generate dummy variable


We can generate dummy variables by using the tabulate (tab) command with its generate (gen) option. Say the variable "Race" is a categorical variable with 4 categories; we can generate a dummy variable for each category by using the following syntax:
tab Race, gen(Race_dummy)
This creates Race_dummy1 to Race_dummy4, one for each category.
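
The auto dataset shipped with Stata has no Race variable, so this sketch applies the same syntax to rep78, a categorical variable with five categories; the stub rep_dummy is a hypothetical name.

```stata
sysuse auto, clear
* One dummy per category of rep78: rep_dummy1 to rep_dummy5
tabulate rep78, generate(rep_dummy)
* Inspect the new variables
describe rep_dummy*
```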

OLS Regression
regress yvar xvarlist: Regress the dependent variable yvar on the independent variables xvarlist. For example: regress y x or regress y x1 x2 x3.
regress yvar xvarlist, robust: Regress, but this time compute robust standard errors.
regress yvar xvarlist, robust level(#): Regress with robust standard errors, and this time change the confidence interval to #% (e.g. use 99 for a 99% confidence interval).

OLS Regression with dummy variable(s)

regress yvar xvarlist i.Race: Regress the dependent variable (yvar) on the continuous independent variables (xvarlist) and the categorical independent variable (Race). For example: regress y x i.Race, or regress y x1 x2 x3 i.Race
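
The regression variants above can be run on the auto dataset shipped with Stata. This is only a sketch: the model has no substantive meaning, and rep78 stands in for the categorical variable Race.

```stata
sysuse auto, clear
* Plain OLS, then robust standard errors, then 99% confidence intervals
regress price mpg weight
regress price mpg weight, robust
regress price mpg weight, robust level(99)
* Add a categorical regressor with the i. prefix (Stata 11 or later)
regress price mpg weight i.rep78
```

With the i. prefix, Stata creates the dummies itself and omits one category as the base.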

Post-Estimation Commands
Commands described here work after OLS regression.
predict yhat: After a regression, create a new variable, with the name you enter here (yhat in this example), that contains for each observation the predicted value of the dependent variable.
predict newvar, residuals: After a regression, create a new variable, with the name you enter in place of newvar, that contains for each observation its residual.

Post-Estimation Tests
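1. Heteroskedasticity test (Breusch-Pagan)
Syntax: hettest (estat hettest in current versions of Stata)
2. Functional form (specification error) test (Ramsey RESET)
Syntax: ovtest (estat ovtest in current versions)
3. Multicollinearity test (variance inflation factors)
Syntax: vif (estat vif in current versions)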
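
The post-estimation commands above can be sketched as one sequence on the auto dataset shipped with Stata. The names yhat and ehat are hypothetical, and the estat forms of the tests are used here.

```stata
sysuse auto, clear
regress price mpg weight
* Predicted values and residuals from the regression just run
predict yhat
predict ehat, residuals
* Heteroskedasticity, functional form, and multicollinearity checks
estat hettest
estat ovtest
estat vif
```

All of these refer to the most recent regress, so they must be run before another model is estimated.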

Logistic (Logit) Regression


logit yvar xvarlist: Regress a binary dependent variable (yvar) on the independent variables (xvarlist). For example: logit y x or logit y x1 x2 x3.
logit yvar xvarlist, or: Regress a binary dependent variable (yvar) on the independent variables (xvarlist), but this time report odds ratios (or) instead of coefficients. For example: logit y x1 x2 x3, or
logit yvar xvarlist, robust: Regress a binary dependent variable (yvar) on the independent variables (xvarlist), but this time compute robust standard errors.

Logistic Regression with dummy variable(s)


logit yvar xvarlist i.Race: Regress a binary dependent variable (yvar) on the continuous independent variables (xvarlist) and the categorical independent variable (Race).
For example: logit y x1 x2 x3 i.Race

If you are interested in computing the odds ratios:
logit y x1 x2 x3 i.Race, or
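
A sketch of the logit variants on the auto dataset shipped with Stata, where foreign is a 0/1 variable. The categorical regressor pricegrp is a hypothetical variable built here with xtile (price terciles) to stand in for Race.

```stata
sysuse auto, clear
* Coefficients, then odds ratios, then robust standard errors
logit foreign mpg weight
logit foreign mpg weight, or
logit foreign mpg weight, robust
* Build a three-category variable and include it with the i. prefix
xtile pricegrp = price, nquantiles(3)
logit foreign mpg i.pricegrp, or
```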
