Chapter Three
• Menus allow users to get results without needing to know the command
syntax.
• The top row is a Menu bar with commands. Below the menu bar is a Tool bar with
buttons.
• When you open Stata for the first time, it looks as shown below.
Stata Environment
Review window
• This window (in the upper left corner with a white background) lists all the recent
commands.
• If you click on one of the commands, it appears in the Command window and can be
executed by pressing the “Enter” key.
Variables window
• When you open a Stata data file, this window lists the variables in the file.
• If you create new variables, they will be added to the list of variables.
• You can insert a variable into the Stata Command window by clicking on it in the
Variables window.
• The Stata results window does not keep all output generated. It will keep about 300-600
lines of the most recent output, deleting earlier output. If you want to store output in a
file, you must use the log command.
Command window
• This window (at the bottom with a white background) allows you to enter
commands which will be executed as soon as you press the “Enter” key.
• If you click on a variable in the Variables window, it will appear in the Command window.
Results window
• This window (on the right with a black background) shows all recent commands,
output, and error messages.
Green: General information and the frame and headings of output tables
Blue: Commands or error messages that can be clicked on for more information
• The data editor window looks like a spreadsheet and shows the data in
memory.
• Do not open this window unless you are certain that you need to modify your data.
• If this window is open, you cannot execute any commands or make changes to the data.
Menu bar commands
File: Open files, save files, record commands and results in a log, import and export data files, print files, exit Stata.
Edit: Copy and paste text or tables. Change preferences on how Stata looks, including the colors and layout of windows. For example, you can create a window layout that you like and save it.
Data: Describe data, view data, add labels to variables, create new variables, delete variables, and combine two datasets.
Graphics: Create and save many types of graphs from data in memory.
Statistics: Calculate descriptive statistics, run many types of regression analysis, and perform other statistical analysis.
• Graphics: This drop-down menu enables you to make graphs and manipulate them as you like.
• Statistics: This menu contains some of the most important tools, including a wide range of regression and simulation analyses.
Tool bar buttons
Button: What the button does
Open folder: Open a new or existing data file
Diskette: Save the data file in memory to the hard disk
Transferring Data Into Stata
• Copying and pasting from Excel spreadsheet into the Stata editor
• One of the easiest methods for getting data into Stata is using the Stata data editor, which
resembles an Excel spreadsheet.
• If your data is already typed into an Excel spreadsheet, you can copy and paste it into the Stata data editor.
Manually enter data into the Stata Data Editor
• Type edit in the Command window and press <Enter>, or click on the Data Editor icon on the toolbar. Then enter the following data:
• 1 15845 female
• 2 74500 male
• 3 31000 male
• 4 22000 female
• 5 20323 male
• Then close the Data Editor by clicking on X. You now have three variables listed, var1, var2, and var3.
• Give these variables more meaningful names using the following commands:
• rename var1 id
• rename var2 income
• rename var3 sex
• You will see in the Variables window that the names of the variables have changed
accordingly.
• The next step is to label the variables. Type and run the following three lines from
your Do-file:
• label variable id "Respondent Identification Code"
• label variable income "Respondent basic income"
• label variable sex "Sex of respondent"
• Finally, save the data as “income”.
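The steps above can be collected into one do-file. This is only a sketch: it assumes the five observations have already been typed into the Data Editor, and the describe command and the replace option on save are additions that are not part of the original steps.

```stata
* Sketch of a do-file for the renaming, labelling and saving steps
* (assumes var1, var2 and var3 already exist in memory)
rename var1 id
rename var2 income
rename var3 sex

label variable id "Respondent Identification Code"
label variable income "Respondent basic income"
label variable sex "Sex of respondent"

describe                 // check the new names and labels
save income, replace     // writes income.dta to the current working directory
```

The replace option lets the do-file be rerun without stopping because income.dta already exists.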
Opening existing Stata files
• Existing Stata format data files have the file extension .dta.
• In that case, we can directly open the file we want using the File→Open menu.
• When you open Stata, you will see a menu bar across the top, a tool bar with
buttons, and windows (the number of windows open depends on which windows
were open the last time Stata was used).
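A .dta file can also be opened from the Command window instead of the menu. The sketch below uses lifeexp, an example dataset shipped with Stata; the file name mydata.dta is hypothetical.

```stata
sysuse lifeexp, clear        // load an example dataset installed with Stata
describe                     // list the variables now in memory

* To open your own file, give its name (hypothetical here) to the use command:
* use "mydata.dta", clear
```

The clear option drops whatever data are currently in memory before the new file is loaded.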
Entering data via the command window
• Type input in the command window.
• After input, type the sequence of variable names separated by blanks. Current versions of Stata allow names of up to 32 characters; very old releases limited them to eight.
• For example enter the following data
• input id age str8 race expenditure
• 1 22 white 5000
• 2 43 black 1500
• 3 25 white 6500
• 4 51 black 2500
• 5 29 oriental 3100
• end
• For missing data, enter a period (.) for a numeric variable or an empty string ("") for a string variable.
• When we finish entering the data, type end; that will complete the data entry.
• Then save the data as “expenditure”
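As a sketch of how missing values are entered with input, the do-file below uses made-up rows (they are not the data listed above) with one missing age and one missing race.

```stata
clear
input id age str8 race expenditure
1 22 white 5000
2 .  black 1500
3 25 ""    6500
end

list                          // the missing age shows as . and the missing race as blank
save expenditure, replace     // writes expenditure.dta to the working directory
```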
Remark
• You can enter commands in any of three ways:
• Menus: choose the command from the menu bar, without needing to know the command syntax.
• Manually: type the first command in the Command window and execute it, then the next, and so on.
• Do-file: type up a list of commands in a “do-file”, essentially a computer programme, and execute the do-file.
Stata’s basic operators
• + Addition
• - Subtraction
• * Multiplication
• / Division
• ^ Raise to a power
• > Greater than
• < Less than
• >= Greater than or equal to
• <= Less than or equal to
• == Equal to
• ~= or != Not equal to
• & And
• | Or
• ! or ~ Not
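A short sketch of the operators in use. It assumes the auto example dataset shipped with Stata, which is not part of this chapter; the variables price, mpg, rep78, foreign and make belong to that dataset.

```stata
sysuse auto, clear

display 2 + 3 * 4^2                      // ^ is evaluated before *, and * before +
count if price > 5000 & foreign == 1     // both conditions must hold
count if mpg <= 20 | rep78 != 3          // either condition may hold
list make price if !(price < 10000)      // ! negates the condition in parentheses
```

Note that Stata treats a missing numeric value as larger than any number, so conditions such as rep78 != 3 are also true for observations where rep78 is missing.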
Stata basic functions
abs(x) absolute value of x, |x|
exp(x) exponential of x, e^x
ln(x) or log(x) natural logarithm of x
log10(x) base 10 logarithm of x
sqrt(x) square root of x
round(x) rounds x to the nearest integer, e.g. round(5.8) = 6
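The functions can be tried directly with display, or used inside generate to build new variables. The sketch below assumes the auto example dataset for the last two lines; lnprice is a hypothetical variable name.

```stata
display abs(-3.5)
display exp(1)
display ln(100)
display log10(100)
display sqrt(16)
display round(5.8)

sysuse auto, clear
generate lnprice = ln(price)     // natural log of price as a new variable
summarize price lnprice
```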
• tabulate, tab1, and tab2 are three related commands that produce frequency tables for discrete variables. They can produce
• One-way frequency tables (tables with the frequency of one variable) or
• Two-way frequency tables (tables with a row variable and a column variable).
• tabulate or tab produces a frequency table for one or two variables
• tab1 produces a one-way frequency table for each variable in the variable list
• tab2 produces all possible two-variable tables from the list of variables
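A minimal sketch of the three commands, assuming the auto example dataset (foreign, rep78 and headroom are variables in that dataset, not in the data used in this chapter):

```stata
sysuse auto, clear

tabulate foreign                 // one-way table of one variable
tabulate foreign rep78           // two-way table: foreign in rows, rep78 in columns
tab1 foreign rep78               // a separate one-way table for each variable
tab2 foreign rep78 headroom      // every pairwise two-way table (three tables here)
```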
• bysort sex: sum wage // for each sex, give summary statistics on wage
• bysort sex: tab educ // for each sex, give the frequency table of educ
save, if qualifier
• save: This command saves the data in memory.
• The syntax is: save [filename] [, replace]
• If you do not give a file name, it will use the current name.
• The if qualifier carries out the command only for the records that satisfy some condition.
• Syntax: command if exp
sum wage if educ==12
tab sex if educ==12
list wage if educ<12
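The same pattern can be tried on a dataset that ships with Stata. This sketch assumes the auto example dataset instead of the wage data above, and auto_copy is a hypothetical file name.

```stata
sysuse auto, clear

summarize price if foreign == 1      // summary of price for foreign cars only
tabulate rep78 if price > 5000       // frequency table restricted to expensive cars
list make mpg if mpg < 15            // list only the low-mileage cars

save auto_copy, replace              // save the data in memory under a new name
```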
in, help
• The in qualifier carries out a command only for the records selected by observation number.
• The syntax is: command in range
• list var1 in 10 // give the value of var1 in observation number 10
• summarize in 10/20 // give the mean, minimum, and maximum of all variables for observations 10-20
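A short sketch of in with the auto example dataset (an assumption; any dataset in memory would do):

```stata
sysuse auto, clear

list make price in 10            // observation 10 only
summarize price mpg in 10/20     // observations 10 through 20
list make in -5/l                // the last five observations (l stands for last)
```

Because in refers to the position of observations in the file, the result depends on how the data are currently sorted.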
• help command gives you information about any Stata command or topic.
• Syntax: help command
• help tabulate // gives a description of the tabulate command
• help summarize // gives a description of the summarize command
log using
• This command creates a file with a copy of all the commands and
output from Stata.
• The first time you open a log, you must give a name to the new file to
be created.
• The syntax is: log using filename [, append replace]
• log using xx // saves output to a new file called xx
• log using xx, replace // saves output to an existing file, replacing its contents
• log using xx, append // saves output to an existing file, adding to its contents
Log off, log on, log close
• log off
• This command temporarily turns off the logging of output, so that any subsequent
output is not copied to the log file.
• This is useful if you want to save some of the output but not all.
• “log off” only works after a “log using” command.
• log on
• This command is used to restart the logging, copying any new output to the log file
that was already defined.
• “log on” only works after a “log using” and a “log off” command.
• log close
• This command is used to turn off the logging and save the file.
• How are “log off” and “log close” different? “log off” lets you turn logging back on easily with “log on,” continuing to use the same log file, whereas “log close” ends logging and saves the file.
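The logging commands fit together as in the sketch below. The file name session1 and the use of the auto example dataset are assumptions for illustration; the text option, which asks for a plain-text log, is also an addition.

```stata
log using session1, text replace   // open a plain-text log (hypothetical file name)

sysuse auto, clear
summarize price                    // written to the log

log off
list make in 1/5                   // shown on screen but not written to the log
log on

tabulate foreign                   // written to the log again
log close                          // stop logging and close the file
```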
Section-4 CREATING NEW VARIABLES AND ADDING LABELS
• Focus on how we create new variables and how to label them.
• generate
• replace
• tab …, generate
• using operators
• using functions
• recode
• label variable
• label define
• label values
generate
• replace wage = lnwage if wage > 1000 // replaces wage with lnwage where wage exceeds 1000
• replace wage = 25 in 107 // sets wage to 25 in observation 107
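A sketch of generate and replace working together, assuming the auto example dataset rather than the wage data; lnprice and price_k are hypothetical variable names.

```stata
sysuse auto, clear

generate lnprice = ln(price)         // new variable from a function
generate price_k = price / 1000      // new variable from an operator

* replace changes the values of a variable that already exists
replace price_k = 15 if price_k > 15 & !missing(price_k)   // cap at 15
replace mpg = 25 in 7                                      // edit one observation
```

generate fails if the variable already exists, and replace fails if it does not, which guards against overwriting a variable by accident.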
Dichotomize continuous variable
• Use lifeexp.dta
• Group life expectancy into two groups
• lexp >= 70: betterlife
• lexp < 70: worthlife
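One possible way to carry out this grouping is sketched below. The variable name lifegroup and the label name lifegrp are hypothetical; the category labels are the ones given above.

```stata
sysuse lifeexp, clear

* 1 if lexp is 70 or more, 0 if below 70, missing if lexp is missing
generate byte lifegroup = (lexp >= 70) if !missing(lexp)

label define lifegrp 0 "worthlife" 1 "betterlife"
label values lifegroup lifegrp
label variable lifegroup "Life expectancy group"

tabulate lifegroup
```

The if !missing(lexp) condition matters because Stata treats missing values as larger than any number, so lexp >= 70 alone would place missing values in the upper group.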
tabulate …, summarize()
• This command creates one- and two-way tables that summarize a continuous variable.
• The command tabulate by itself gives frequencies and percentages in each cell (cross-tabulations).
• The syntax is: tabulate varname1 [varname2] [if exp] [in range], summarize(varname3) [options]
• Where varname1 is a categorical row variable, varname2 is a categorical column variable (optional), and varname3 is the continuous variable summarized in each cell.
• Consider the lifeexp data
• tabulate region country, summarize(gnppc)
• tabulate region country, summarize(gnppc) means
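A one-way version is easier to read than the two-way table, because country takes a different value in almost every row. A minimal sketch with the same lifeexp data:

```stata
sysuse lifeexp, clear

tabulate region, summarize(gnppc)          // mean, standard deviation and frequency by region
tabulate region, summarize(lexp) means     // show only the means
```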
sort
• This command sorts the records in the file according to the values of the specified variables.
• Syntax: sort varlist
• sort sex hhsize // sorts the data file in order of sex and hhsize
• sort lnwage
merge
• This command combines two files with different variables into one
file.
• The merge command combines files horizontally (side to side).
• Syntax: merge [varlist] using filename
• Where varlist is the list of key variable(s) that are in both data files
• filename is the data file that the current data set will be merged with.
• input id str8 sex age income // first data set, saved as trail1
• input id hhsize expe // second data set, saved as trail2
• Sort both files by id and save them.
• use trail1
• merge id using trail2 // the two files are merged together
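The merge steps can be run end to end as in the sketch below. The data values are made up for illustration. The sketch uses the merge 1:1 form introduced in Stata 11; older releases use merge id using trail2 as shown above.

```stata
* First data set (hypothetical values)
clear
input id str8 sex age income
1 female 25 15845
2 male   40 74500
3 male   33 31000
end
sort id
save trail1, replace

* Second data set (hypothetical values)
clear
input id hhsize expe
1 4 5000
2 6 1500
3 3 6500
end
sort id
save trail2, replace

* Combine the two files side by side, matching on id
use trail1, clear
merge 1:1 id using trail2
tabulate _merge          // shows which observations matched
list
```

The _merge variable created by merge records whether each observation came from the first file, the second file, or both.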
Graphs
• sysuse lifeexp
• histogram popgrowth // histogram of popgrowth
• histogram popgrowth, normal // histogram of popgrowth with a normal curve overlaid
• histogram popgrowth, bin(5) // histogram with 5 bars
• scatter popgrowth gnppc lexp // scatter plot of popgrowth and gnppc against lexp
• scatter popgrowth lexp, by(region) // scatter plots of popgrowth against lexp for each region
• graph bar gnppc lexp popgrowth // bar graph of the means of gnppc, lexp, and popgrowth
• graph bar (sum) gnppc lexp popgrowth // bar graph of the sums of gnppc, lexp, and popgrowth
• graph bar lexp, by(region)
• Histogram: a bar chart showing the distribution of values of one
variable.
• Scatter: This command generates a two-way scatter plot, showing a dot for each observation.
• graph bar lexp, over(region) over(country) // across two categorical variables
• graph box lexp // box plot
• graph matrix gnppc lexp popgrowth
• hist lexp, discrete
• twoway (scatter gnppc popgrowth)
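Graphs can also be saved from a do-file. The sketch below is one way to do it; the file names are hypothetical, and the title() option and the fitted line (lfit) are additions not used in the examples above.

```stata
sysuse lifeexp, clear

histogram lexp, normal title("Life expectancy")
graph export lexp_hist.png, replace          // save the current graph as a PNG image

twoway (scatter lexp gnppc) (lfit lexp gnppc)
graph save lexp_gnppc, replace               // save in Stata's own .gph format
```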
Section-8 ANALYSIS
• Focus on examining relationships
• Cross tab
• Correlation
• The one-sample t-test assumes that the data are reasonably normally distributed.
• ttest salary == 2000 // test whether the mean salary of employees is 2000
• signtest salary = 2000 // a nonparametric counterpart
• ttest salbegin == 3000
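The employee data used above are not distributed with Stata, so the sketch below repeats the same tests on the auto example dataset; the test value of 20 is arbitrary.

```stata
sysuse auto, clear

histogram mpg, normal      // rough visual check of the normality assumption
ttest mpg == 20            // one-sample t-test: is the mean of mpg equal to 20?
signtest mpg = 20          // nonparametric counterpart based on the median
```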
The independent samples t-test
• Any number of grouping variables can be stratified into cells that precisely
define your comparison groups.
• Compares the means of one variable for two groups of cases.
• For example, we may be interested in comparing the blood pressure of patients by gender.
• ttest salary, by(minority)
• ttest salary, by(gender)
• ttest prevexp, by(gender)
• ttest prevexp, by(gender) unequal // two-sample t test with unequal variances
• ranksum prevexp, by(gender) // two-sample Wilcoxon rank-sum (Mann-Whitney) test
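A runnable sketch of the same comparisons, assuming the auto example dataset in place of the employee data (foreign is a two-group variable in that dataset). The sdtest line is an addition, included as one way to judge whether the unequal option is needed.

```stata
sysuse auto, clear

ttest mpg, by(foreign)             // two-sample t-test assuming equal variances
sdtest mpg, by(foreign)            // test whether the two variances are equal
ttest mpg, by(foreign) unequal     // t-test allowing unequal variances
ranksum mpg, by(foreign)           // nonparametric alternative
```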
What if the independent variable has more than two categories?
• ANOVA is used to test the null hypothesis that several population means are equal.
• It is an extension of the two independent samples t-test.
• It examines the variability of the observations within each group and the variability between the group means.
• Based on these two estimates of variability, we draw conclusions about the population means.
• Depending on the design of the experiment, the ANOVA partitions the total
variation into a number of parts such as Treatment, Block or Error.
• Compare means across more than two levels of the independent variable
• We have one continuous dependent variable (interval/ratio data), and
• one nominal or ordinal independent variable with more than two levels/categories
• Main question: Are the averages of the quantitative variable the same across the groups?
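In Stata, a one-way ANOVA of this kind can be run with the oneway command. The sketch assumes the auto example dataset, with mpg as the continuous dependent variable and rep78 (five repair-record categories) as the grouping variable.

```stata
sysuse auto, clear

oneway mpg rep78, tabulate       // ANOVA table plus the mean of mpg in each group
oneway mpg rep78, bonferroni     // adds pairwise comparisons with Bonferroni adjustment
```

The F test in the output answers the main question above; the pairwise comparisons indicate which groups differ.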
Why Not Just Use t-tests?
Since a t-test considers two groups at a time, it becomes tedious when many groups are present
Conducting multiple t-tests can also lead to severe inflation of the Type I error rate (false positives) and is not recommended
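The size of this inflation can be illustrated with a short calculation. With k groups there are k(k-1)/2 pairwise t-tests; if each is run at the 5% level and the tests are treated as independent (a simplifying assumption), the chance of at least one false positive is 1 - (1 - 0.05)^m, where m is the number of tests. For five groups:

```stata
display comb(5, 2)                       // number of pairwise tests among 5 groups
display 1 - (1 - 0.05)^comb(5, 2)        // chance of at least one false positive
```

With ten tests the second line gives roughly 0.40, far above the nominal 0.05.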