Introduction to Stata Software, MaU, 2022
Tolasa Alemayehu
Economics Department
Mattu University
Nov, 2022
Mattu, Ethiopia
A PLAN (CONTENT) OF TRAINING
1. Introduction
The Stata Interface
Exploring and Examining Datasets
Storing Commands and Outputs
2. Data management
Creating, Modifying and Defining Variables
Appending and Merging Datasets
Collapsing Data Sets
3. Describing Data
Summary Statistics
Statistical Tests
Graphics
Stata was first released in 1985.
Stata is not an abbreviation but rather a
[Figure: the Stata interface, showing the toolbar and the 5 windows: Results, Variables, Review, Properties, and Command]
The Stata Interface: Windows, Toolbar, Menus, and Dialogs
Windows
The Stata windows give you all the key information about
the data file you are using, recent commands, and the results
of those commands.
The five main windows are the Review, Results, Command, Variables, and Properties windows.
~    not
==   equal
~=   not equal
!=   not equal
>    greater than
>=   greater than or equal
<    less than
<=   less than or equal
&    and
|    or
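The operators above can be tried out in if conditions. The following is a minimal sketch, assuming the auto example dataset that ships with Stata (it is not part of the training data); the thresholds are arbitrary illustrations.

```stata
* Load the example dataset shipped with Stata (an assumption of this sketch)
sysuse auto, clear

* == tests equality; a single = is used only for assignment
count if foreign == 1

* != and ~= both mean "not equal"
count if foreign != 1
local n_not_equal = r(N)

* ~ negates a whole expression, so this counts the same cars as above
count if ~(foreign == 1)
local n_negated = r(N)

* & requires both conditions to hold; | requires at least one
count if price > 5000 & mpg >= 25
count if price < 4000 | mpg > 30
```

Both counts stored in the locals refer to the same observations, which is a quick way to confirm that the two spellings are equivalent.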
1.2.2. Examining dataset
Using the command window:
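a. Stata file (.dta): use command
b. Excel file (.xlsx): import excel command
c. CSV file (.csv): import delimited command (insheet in older versions of Stata)
d. SPSS file (.sav): usespss command (user-written), or import spss in Stata 16 and later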
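As an illustration of these loading commands, here is a minimal sketch that writes the auto example dataset to CSV and Excel and reads it back. The file names demo_auto.csv and demo_auto.xlsx are hypothetical and are created in the current working directory; they are not files from the training.

```stata
* Start from the example dataset shipped with Stata
sysuse auto, clear

* Stata file: save to a temporary .dta file and read it back with use
tempfile demo_dta
save `demo_dta'
use `demo_dta', clear

* CSV file: export, then read back with import delimited
export delimited using "demo_auto.csv", replace
import delimited using "demo_auto.csv", clear

* Excel file: the first row holds the variable names
export excel using "demo_auto.xlsx", firstrow(variables) replace
import excel using "demo_auto.xlsx", firstrow clear

* Remove the hypothetical demo files again
erase "demo_auto.csv"
erase "demo_auto.xlsx"
```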
Log file: Stata can save the log file in one of two different formats (.smcl or plain-text .log).
of file
• We can also combine the if and in qualifiers
• use q1a hhid hhsize cons using ERHScons1999: loads only the listed variables from the dataset
out a command
• command if exp
Examples:
– list hhid q1a food if food > 1200: lists the data if food is greater than 1200
– list if q1a < 6: lists the cases where region (q1a) is 1 through 5
– browse hhid q1a food if food >= 1200: browses the data if food is at least 1200
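The if and in qualifiers can be sketched on the auto example dataset that ships with Stata, used here as a stand-in because the training data are not available. The variables and cut-offs are illustrative assumptions.

```stata
sysuse auto, clear

* if restricts a command to observations that satisfy a condition
list make price mpg if price > 10000

* in restricts a command to a range of observation numbers
list make price in 1/5
count in 1/5
local n_first_five = r(N)

* if and in can be combined: among the first 20 cars, keep those with mpg >= 25
list make price mpg in 1/20 if mpg >= 25
```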
Storing data
save
save, replace
Examples
save "C:\Users\eea\Desktop\SD\verion1.dta"
save "C:\Users\eea\Desktop\SD\verion2.dta", replace
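The role of the replace option can be seen in a minimal sketch. The file name demo_version1.dta is hypothetical and is written to the current working directory rather than to the path shown above.

```stata
sysuse auto, clear

* Create the file (replace makes the sketch rerunnable)
save "demo_version1.dta", replace

* Saving again without replace is refused because the file already exists;
* capture noisily shows the error message but lets the do-file continue
capture noisily save "demo_version1.dta"
local rc_no_replace = _rc
display "return code without replace: `rc_no_replace'"

* With replace, the existing file is overwritten
save "demo_version1.dta", replace

* Remove the hypothetical demo file
erase "demo_version1.dta"
```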
Getting help in Stata
• help: The help command gives you information about any Stata
command or topic
• help [command]
For example,
• help tabulate: gives a description of the tabulate command
• help summarize: gives a description of the summarize command
• search: a keyword search, useful when you do not know the name of
a Stata command
Example: search ols
hsearch: not restricted to keywords
E.g. hsearch weak instruments
net search: searches the internet (requires an internet connection)
◦ capture log
◦ log using commands
Create a shorter way of writing your directories
◦ Note
Notes can be written for variables
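A minimal sketch of how logging, a directory shortcut, and variable notes might fit together. The macro name demo_dir, the log file name, and the note text are hypothetical, and the auto example dataset stands in for the training data.

```stata
* Close any log that is still open; capture ignores the error if none is open
capture log close

* Store a directory in a global macro so later paths can be written as $demo_dir
* (here it simply points at the current working directory)
global demo_dir "`c(pwd)'"

* Start a plain-text log inside that directory
log using "$demo_dir/demo_session.log", text replace

sysuse auto, clear

* Attach a note to a variable, then display the notes for that variable
notes price: Hypothetical note used for illustration.
notes price

log close
```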
of an existing variable.
The syntax is the same:
◦ rename variable
◦ label variable
◦ keep/drop and order/sort
◦ label define/values
rename: This command renames variables
apply keep
However, if there are many variables to keep and only few to
Examples
◦ drop pwhole_mixed pretail_mixed
◦ keep pwhole_white pretail_white pwhole_red pretail_red
Note: The two commands are the same in this case
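The equivalence of keep and drop in such a case can be checked with a minimal sketch on the auto example dataset. The variables and the new name miles_per_gallon are illustrative assumptions, not the price variables from the training data.

```stata
sysuse auto, clear

* Work with four variables only
keep make price mpg weight

* rename old_name new_name
rename mpg miles_per_gallon

* Route 1: drop the one variable we do not want
preserve
drop weight
ds
local after_drop `r(varlist)'
restore

* Route 2: keep the three variables we do want
keep make price miles_per_gallon
ds
local after_keep `r(varlist)'

* Both routes leave the same variables in the dataset
display "after drop: `after_drop'"
display "after keep: `after_keep'"
```

Which command to use is a matter of convenience: name whichever list is shorter.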
sort: This command arranges the observations of the current
data in ascending order based on the values of the variables
listed
Variable ordering: This command helps us to organize
variables in a way that makes sense by changing the order of
the variables
order x y z: Puts x first y second z third
sort x : Puts data in ascending order of the variable x
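A minimal sketch of order and sort on the auto example dataset (an assumption of this sketch). It also uses gsort, a separate built-in command not covered above, to show descending order.

```stata
sysuse auto, clear

* order moves the listed variables to the front of the dataset
order price mpg weight

* sort arranges observations in ascending order of price
sort price
list price mpg weight in 1/5

* gsort with a minus sign sorts in descending order
gsort -price
list price mpg weight in 1/5
```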
Appending datasets
Often we don’t have all the information that we need in one dataset,
and we have to append or merge two or more datasets into one.
There are several types of appending and merging datasets…
As long as the variables in the
files are the same and the only
thing you need to do is to add
observations, this is vertical
combination.
For this we use the append
command.
Appending data files
◦ concatenates two datasets, that is, sticks them together
vertically, one after another
use "$final\tprice_addis.dta", clear
append using "$final\tprice_dire.dta"
save "$final\tprice_all.dta", replace
◦ The append command does not require that the two
datasets contain the same variables.
◦ But it is highly recommended to use an identical list of
variables with the append command, to avoid missing
values for variables that exist in only one dataset
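The effect of appending files with different variable lists can be sketched as follows. The sketch splits the auto example dataset into two temporary files (hypothetical stand-ins for the price files above) and drops one variable from the second file before appending.

```stata
sysuse auto, clear
tempfile domestic_part foreign_part

* First file: domestic cars, all variables
preserve
keep if foreign == 0
save `domestic_part'
restore

* Second file: foreign cars, with rep78 removed
keep if foreign == 1
drop rep78
save `foreign_part'

* Stack the second file under the first; source is 0 for the file in
* memory and 1 for the appended file
use `domestic_part', clear
append using `foreign_part', generate(source)

* rep78 is missing for every appended observation because the second
* file did not contain that variable
count if source == 1
count if source == 1 & missing(rep78)
```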
Defining Variables
label define: This command gives a name to a set of value
labels. For example, instead of numbering the regions, we can
assign a label to each region. The syntax is:
label define lblname # "label" # "label" # "label" [, add modify]
Where: lblname is the name given to the set of value labels
◦ # are the value numbers
◦ "label" are the value labels
◦ add means add these value labels to the existing set
◦ modify means to change these value labels in the existing set
Note that:
You can use the abbreviation "label def"
The double quotation marks are only necessary if there are
spaces in the labels
Stata will not let you define an existing label unless you say
"modify" or "add"
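A minimal sketch of defining and attaching value labels. The variable region, the label name region_lbl, and the label texts are hypothetical; label values is the command that attaches a defined set of labels to a variable.

```stata
clear

* A tiny hypothetical dataset with a numeric region code
input byte region
1
2
3
end

* Define a set of value labels, then attach it to the variable
label define region_lbl 1 "North" 2 "South" 3 "East"
label values region region_lbl

* add extends the existing set with a new value
label define region_lbl 4 "West", add

* modify changes the label of an existing value
label define region_lbl 1 "Northern zone", modify

label list region_lbl
tabulate region
```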
label values
Statistics
Stata
Exercises
3.1. Basic Descriptive Statistics Using Stata
• summarize
– The summarize command produces statistics on continuous
skewness = $E[(X - \mu)^3] / \sigma^3$
= measure of asymmetry (lack of symmetry) of a distribution
kurtosis = $E[(X - \mu)^4] / \sigma^4$
= measure of mass in tails = measure of probability of large values
kurtosis = 3: normal distribution
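Skewness and kurtosis are reported by summarize with the detail option. Here is a minimal sketch, assuming the auto example dataset and its price variable as stand-ins for the training data.

```stata
sysuse auto, clear

* detail adds percentiles, variance, skewness, and kurtosis to the output
summarize price, detail

* The same statistics are stored in r() for later use
display "mean     = " r(mean)
display "skewness = " r(skewness)
display "kurtosis = " r(kurtosis)
```

A kurtosis well above 3 would point to heavier tails than the normal distribution.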
◦ histogram cons
◦ histogram cons, normal
kernel density
◦ kdensity cons
◦ kdensity cons, normal
4. Regression Analysis Using Stata
Steps in Empirical Analysis
Structure of Economic Data
Regression Models
data)
◦ Distribution is the same over time
Weak/covariance vs strong stationarity
Data are random
Survey design
Nature of data
Properties of the OLS Estimator
Under assumptions (A1)–(A4), the OLS
Additional
The relationship of interest is linear
Data are stationary
Data are random
Violations of GM Assumptions
GM assumptions can be violated for a
variety of reasons
Example 1: The assumption that there is
usually heteroskedastic
stationary
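One common way to check for heteroskedasticity after OLS, and to guard against it, can be sketched as follows. The model (price on mpg and weight in the auto example dataset) is purely illustrative and is not taken from the training; the Breusch-Pagan test and robust standard errors are one possible choice among several.

```stata
sysuse auto, clear

* Ordinary least squares
regress price mpg weight

* Breusch-Pagan / Cook-Weisberg test; the null hypothesis is constant variance
estat hettest

* Same coefficients, with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
```

A small p-value from estat hettest would suggest that the constant-variance assumption is violated, in which case the robust standard errors are the safer ones to report.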