Basic Tutorial Stata PDF
1 Basic Statistics
summarize: gives us summary statistics
After opening the data file, running summarize will give us summary statistics, including
number of observations, mean, standard deviation, minimum, and maximum, for all of
the variables in the data file.
summarize
For more detailed summary statistics, such as percentiles, skewness, and kurtosis, we can add the detail option.
summarize iq, detail
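As a small sketch of how these numbers can be reused, summarize also leaves its results behind in r( ), for example r(N), r(mean), and, with detail, r(p50) for the median. The five observations below are made up for illustration and are not the tutorial's data file.

```stata
* Hypothetical toy data (not the tutorial's data set)
clear
input iq wage
 90  600
100  800
110 1000
120 1200
130 1400
end

* Detailed summary statistics for one variable
summarize iq, detail

* Results stored by summarize can be reused right away
display "mean = " r(mean) ", median = " r(p50) ", N = " r(N)
```

Typing return list right after summarize shows everything that was stored.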
tabstat: displays table of summary statistics
Running tabstat without options simply reports the mean of each listed variable.
tabstat wage kww educ
The statistics we can request in statistics( ) include the following: mean (mean), median
(median), count (count of nonmissing observations), n (same as count), sum (sum), max
(maximum), min (minimum), range (range = max - min), sd (standard deviation), and
variance (variance).
Adding the by( ) option displays the statistics separately for each unique value of the
variable named in by( ).
tabstat wage kww educ, by(married) statistics(mean median sd count)
The top panel where married = 0 shows the statistics of people who are not married.
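The following sketch runs the same kind of command on a tiny made-up data set, so the grouping is easy to check by hand. The seven observations are hypothetical. The second tabstat call illustrates the range statistic from the list above.

```stata
* Hypothetical toy data: three unmarried and four married people
clear
input wage educ married
 500 12 0
 700 12 0
 900 16 0
 800 12 1
1000 14 1
1200 16 1
1400 18 1
end

* One block of statistics per value of married, plus a total
tabstat wage educ, by(married) statistics(mean median sd count)

* range is reported as max - min
tabstat wage, statistics(min max range)

* Cross-check one group with summarize
summarize wage if married == 0
```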
2 Data Management
browse: opens data editor to browse the data set
Through the Data Editor you can see how the data set is structured and check whether
your data management steps produced the result you intended.
browse opens the editor in read-only mode. The related edit command opens it in a mode
where you can change the values of observations, but I would not suggest doing so for this
class or for your academic career. There are better ways to manage values of observations,
such as the generate and replace commands described below.
list: lists values of variables
Adding variable names after the command lists the values of only those variables.
list wage
(This will list all observations, which in our case means 935 observations. Unless you
would like to stare at a long series of numbers, you can click the "stop" button at the top
of the Stata window to stop the listing.)
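One way to avoid a long listing is to restrict list with in or if. The sketch below uses a made-up data set with 935 rows. The variable id and the wage values are hypothetical and only serve to make the example self-contained.

```stata
* Hypothetical data with 935 observations
clear
set obs 935
generate id = _n
generate wage = 300 + 3*id

* List only the first five observations
list wage in 1/5

* List only the observations that satisfy a condition
list id wage if wage > 3100

* Count how many observations satisfy the condition
count if wage > 3100
```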
generate: creates or changes contents of variable
You can create a new variable using this command. The following example creates a
new variable called lnwage with natural log values of wage.
generate lnwage = ln(wage)
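You can change the values of an existing variable by using the replace command. The
following example assumes a variable called wage2 already exists and sets it to the square
of wage.
replace wage2 = wage^2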
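A minimal sketch of generate and replace working together, on three made-up wage values. It assumes, for illustration only, that wage2 is meant to hold the square of wage. The source does not state this.

```stata
* Hypothetical toy data
clear
input wage
 500
1000
2000
end

* generate creates a variable that does not exist yet
generate lnwage = ln(wage)
generate wage2 = wage

* replace overwrites a variable that already exists
* (assumption: wage2 should hold the square of wage)
replace wage2 = wage^2

list wage lnwage wage2
```

Running generate on a name that already exists is an error, and so is running replace on a name that does not exist, which protects against overwriting a variable by accident.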
(Be careful not to drop variables that you are using for your exercise. If you have
accidentally dropped the variables you need, clear the memory and reopen the dataset.)
You can eliminate observations by adding an if condition to drop. The following command
eliminates the observations whose wage is greater than 3000. (Suppose you consider
people with a wage above 3000 to be outliers.)
drop if wage > 3000
(Again, be careful with this. Please clear the memory and reopen the original data set
before you work on your homework.)
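As an alternative to clearing the memory and reopening the file, Stata's preserve and restore commands can wrap a temporary drop. This is a sketch on made-up values, not a step from the original tutorial.

```stata
* Hypothetical toy data
clear
input wage
 800
1500
2900
3200
4100
end

count

* Keep a copy of the data in its current state
preserve

* Temporarily remove the observations treated as outliers
drop if wage > 3000
count
summarize wage

* Bring back the full data set
restore
count
```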
clear: clears memory
graph twoway: creates twoway graphs of scatter plots, line plots, etc.
Because this is a twoway graph, you can use it to examine the relationship between two
variables with a scatter plot. The first variable you put after scatter will be on the y-axis
and the second variable will be on the x-axis. This matches the ordering we will see in the
next section, where the dependent variable comes before the independent variables.
graph twoway scatter wage educ
[Figure: scatter plot of wage (y-axis, 0 to 3000) against educ (x-axis, 8 to 18)]
You can also draw two different plots in one graph. While scatter draws a scatter plot,
lfit draws the fitted line from a linear regression of the first variable on the second. We
can overlay these two plots using the following command:
graph twoway (scatter wage educ) (lfit wage educ)
[Figure: scatter plot of wage against educ with the fitted line overlaid; y-axis 0 to 3000, x-axis educ 8 to 18]
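The overlay can be tried on simulated data, as in the sketch below. The data-generating numbers (a slope of 60, noise with standard deviation 150) are arbitrary assumptions, and the axis titles and legend labels are optional additions that do not appear in the original command.

```stata
* Hypothetical simulated data
clear
set obs 200
set seed 2024
* Integer years of schooling between 8 and 18
generate educ = 8 + floor(11*runiform())
generate wage = 200 + 60*educ + rnormal(0, 150)

* Scatter plot with the fitted line drawn on top
graph twoway (scatter wage educ) (lfit wage educ), ///
    ytitle("wage") xtitle("educ") ///
    legend(order(1 "Observed" 2 "Fitted values"))
```

The /// at the end of a line continues the command on the next line, which works in a do-file but not when typing in the Command window.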
3 Regression
regress: runs a linear regression
When using regress, put the dependent variable first and the independent variable(s)
after it. Suppose you want to estimate the following regression specification:
$wage = \beta_0 + \beta_1 educ + u$
The corresponding command puts wage first and educ second:
regress wage educ
The output reports the estimates of $\beta_0$ and $\beta_1$ with their standard errors,
t-statistics, and 95% confidence intervals, along with $R^2$ and other statistics for this
regression.
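The sketch below runs this regression on simulated data and then reads a few of the reported quantities back from Stata's stored results. The true coefficients used to simulate the data (150 and 60) are arbitrary assumptions chosen for illustration.

```stata
* Hypothetical simulated data: wage = 150 + 60*educ + u
clear
set obs 500
set seed 42
generate educ = 8 + floor(11*runiform())
generate u = rnormal(0, 100)
generate wage = 150 + 60*educ + u

* Dependent variable first, independent variable second
regress wage educ

* Estimates and fit statistics stored by regress
display "slope on educ = " _b[educ]
display "intercept     = " _b[_cons]
display "R-squared     = " e(r2)
display "observations  = " e(N)
```

With 500 observations the estimated slope should be close to, but not exactly, the value of 60 used in the simulation.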
For multiple regression (more than one independent variable), you can just add more
independent variables after the dependent variable. For example, if you want to run a
regression on the model