
Stata Application Course: Regression

The document provides an overview of using Stata for regression analysis. It covers importing and exploring data, processing and preparing data for analysis, conducting basic regressions, and exporting results. Key topics include importing different data types, viewing dataset structure and variables, generating and transforming variables, descriptive statistics, ordinary least squares regression, panel regression, and saving results.

Uploaded by

Mengdi Shi
Copyright
© All Rights Reserved

Stata: Regression

Xinyan Yao
Outline

1. Data Importing: different types of data

2. Data Exploration: dataset structure and variable type

3. Data Processing: handle and describe data

4. Basic Regression

5. Results Exporting: export regression results and save data
Environment Setup
- Change the working directory
cd /Users/sija/Desktop/0627-STATA
- Clear memory data
clear all
- Stop pausing for --more-- messages
set more off
- Check the working directory
pwd
- Check all files in the current working directory
dir
1 Data Importing
Ø Raw data
- TXT/CSV: example.txt / example.csv
insheet [varlist] using filename [, options]
insheet using "example.csv", delimiter(",") clear
- EXCEL: example.xls / example.xlsx
import excel using "example.xlsx", firstrow clear
- STATA: example.dta
use example.dta, clear
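In Stata 13 and later, import delimited covers the same ground as insheet. The following is a minimal sketch, not part of the original slides. It assumes the shipped auto dataset as a stand-in for real raw data, and writes a hypothetical example.csv first so the import has a file to read.

```stata
* Write a CSV file first so the import below has something to read
* (auto.dta is a stand-in for real raw data; example.csv is a hypothetical file name)
sysuse auto, clear
export delimited using "example.csv", replace

* Read the CSV back: comma-delimited, variable names taken from the first row
import delimited using "example.csv", delimiters(",") varnames(1) clear
describe
```

With a real file, only the import delimited line is needed.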
1 Data Importing
Ø Example data
- Dataset installed with Stata
sysuse auto.dta, clear
- Dataset from the Stata website
webuse charity, clear
- Dataset from other websites
use "http://fmwww.bc.edu/RePEc/bocode/c/CardKrueger1994.dta", clear
1 Data Importing
Ø Artificial data
set obs 100 //generate 100 observations
gen id = _n //generate the "id" variable
gen x = invnormal(uniform()) //generate variables using random numbers
gen y = 10 + 0.5 * invnormal(uniform())
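A variant of the commands above, sketched with the newer rnormal() function and a seed so that the draws are reproducible. The seed value and the dependence of y on x are illustrative assumptions, not taken from the slide.

```stata
clear
set seed 12345                     // assumed seed, any integer works
set obs 100                        // generate 100 observations
gen id = _n                        // observation identifier
gen x = rnormal()                  // standard normal draws
gen y = 10 + 0.5 * x + rnormal()   // y depends on x plus noise (illustrative)
summarize x y
```

Because y is built from x here, a later regression of y on x has a known slope to recover.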
1 Data Importing
Ø Other methods
- Clicking
- Paste data in the Data Editor
1 Data Importing
Ø Append/Merge datasets
- Vertical
(different units, same variables)
append using auto0
- Horizontal
(same units, different variables)
merge 1:1 id using auto1
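The files auto0 and auto1 are not shown on the slides, so the sketch below builds its own pieces from the shipped auto dataset. The id variable and the tempfile names are hypothetical. Run it as one do-file, because tempfile names are local macros.

```stata
sysuse auto, clear
gen id = _n                      // hypothetical identifier for the 1:1 merge
tempfile top bottom extra

* Vertical pieces: same variables, different observations
preserve
keep in 1/37
save `top'
restore
preserve
keep in 38/74
save `bottom'
restore

* Horizontal piece: same observations, different variables
preserve
keep id price mpg
save `extra'
restore

* Stack the two vertical pieces, then add the variables back by id
use `top', clear
append using `bottom'
drop price mpg
merge 1:1 id using `extra'
tab _merge                       // check how many observations matched
```

The _merge variable created by merge records whether each observation came from the master data, the using data, or both.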
2 Data Exploration
Ø Dataset structure
- Cross-Sectional Data: data collected by observing various
subjects (like individuals, firms, countries), at the same point
in time. E.g., the population and GDP of China’s provinces in
1991.
2 Data Exploration
Ø Dataset structure
- Time Series Data: data collected by observing one subject at
different points of time. E.g., the population and GDP of
Beijing in 1991-2000.
2 Data Exploration
Ø Dataset structure
- Pooled Cross-Sectional Data: data collected by randomly
sampling cross sections of individuals at different points of
time. E.g., the population and GDP of Beijing and Tianjin in
1991, and that of Chongqing and Shanghai in 1992.
2 Data Exploration
Ø Dataset structure
- Panel Data: data collected by tracking the same group of
subjects over time. E.g., the population and GDP of Beijing
and Tianjin in 1991-2000.
2 Data Exploration
Ø Variable type
- Numeric variables: byte (small integer), int, long (long integer),
float, double (double-precision floating point)
- String variables: str1 – str244 (strn occupies n characters of
storage space; Stata 13 and later allow up to str2045, plus strL)
- Date variables: a variable imported with values such as "2009-08-01"
or "2009/8/1" is treated as a string variable. The date() function
converts it to a date variable.
date("1/15/08","MDY",2019) = 15jan2008
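A short sketch of the conversion described above. The variable names datestr and d are hypothetical, and the two input values are the examples from the slide.

```stata
clear
input str10 datestr
"2009-08-01"
"2009/8/1"
end

gen d = date(datestr, "YMD")   // convert the string to a Stata date (days since 01jan1960)
format d %td                   // display the number as a date
list
```

The mask "YMD" gives the order of year, month and day in the string; date() accepts either separator.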
2 Data Exploration
Ø View the data
- Variable type and explanation
des _all

help data types
help format
2 Data Exploration
Ø View the data
- Variable’s detailed information
codebook make
2 Data Exploration
Ø View the data
- Variable’s value range
codebook length //examine the value range and basic statistics of the variable length
2 Data Exploration
Ø View the data
- Variable’s value range (for categorical variables)
tab make //list all values of the variable make and their frequencies
2 Data Exploration
Ø View the data
- Variable value for some subjects
list in 1

list make in 1/3, noobs


2 Data Exploration
Ø View the data
- Examine the missing value
codebook var1 //examine the number of missing values

tab var1, m //treat missing values like other values
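The commands above use the placeholder var1. A runnable sketch with the shipped auto dataset, where rep78 is chosen here as a stand-in because it contains missing values:

```stata
sysuse auto, clear
* rep78 stands in for var1 on the slide
count if missing(rep78)     // number of observations with a missing value
misstable summarize         // overview of every variable that has missing values
tab rep78, missing          // frequency table that includes the missing category
```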


3 Data Processing
Ø Change storage type of variable
- Change within the numeric type
recast double mpg //change int type to double type
3 Data Processing
Ø Change storage type of variable
- Change between different types
tostring length, gen(length_str) //change number to string
destring length_str, gen(length_num) //change string to number
gen length_num2 = real(length_str) //alternative: real() converts a string to a number
encode make, gen(makeid) //for a categorical variable, change a string variable to a labeled numeric variable
decode makeid, gen(makeid_str) //labeled numeric to string
3 Data Processing
Ø Generate new variables
gen lnprice = log(price) //generate the natural logarithm
gen price2 = price ^ 2 //generate the squared term
gen inter = price * foreign //generate the interaction term (make is a string variable, so the numeric variable foreign is used)
gen highprice = (price > 6500) //generate a dummy variable
Ø Replace the value of variables
replace weight = weight * 100 //replace the value with 100 times the initial value
3 Data Processing
Ø Rename variables
rename (lnprice inter) (ln_price price_foreign)

Ø Drop/Keep variables or observations

drop ln_price price2 //drop variables
drop in 1/3 //drop the first to the third observations
drop if price < 5000 //drop observations that meet the condition
keep make price mpg //keep only these variables
keep in 1/20 //keep the first 20 observations
keep if mpg < 22 //keep observations that meet the condition
3 Data Processing
Ø Basic operators in Stata
3 Data Processing
Ø Add labels to dataset/variable
label data "Automobile Data 1978" //add a label to the dataset
label variable price "sales price" //add a label to the variable
3 Data Processing
Ø Add labels to variable values
label define foreign_label 1 "F" 0 "D" //define a value label named "foreign_label"
label values foreign foreign_label //label the variable foreign with the value label "foreign_label"
3 Data Processing
Ø Store the data
pwd //check the current working directory
save autodata, replace //save the data as autodata.dta
3 Data Processing
Ø Descriptive analysis
- Summary statistics of numeric variables
sum price length //summary statistics of price and length

tabstat price length, stats(mean sd min p50 max) c(s) f(%6.2f) //display summary statistics in one table
3 Data Processing
Ø Descriptive analysis
- Summary statistics by category
tab foreign, sum(price) // summary statistics of price for each group defined by foreign
3 Data Processing
Ø Descriptive analysis
- Two-way table of association measures
tab make price // the default option is frequency counts
4 Basic Regression
Ø Ordinary Least Squares
reg price i.foreign trunk length weight //regression with price as the dependent variable
qui reg price i.foreign trunk length weight //run the regression without displaying the results
est sto model1 //store the results as "model1"
4 Basic Regression
Ø Regression
- ANOVA table: between group differences, within group differences, and total variance of price
4 Basic Regression
Ø Regression

- t test
test length //test the null hypothesis that the coefficient on length is zero
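After regress, test reports an F statistic. For a single coefficient it equals the square of the t statistic in the regression table, which the sketch below checks. The scalar names are hypothetical, and the regression is the one used elsewhere in these slides.

```stata
sysuse auto, clear
qui reg price i.foreign trunk length weight

test length                                   // Wald test of H0: coefficient on length = 0
scalar F_len = r(F)                           // F statistic returned by test
scalar t_len = _b[length] / _se[length]       // t statistic computed by hand

display "t^2 = " scalar(t_len)^2 "   F = " scalar(F_len)
```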


4 Basic Regression
Ø Regression with options
reg price i.foreign trunk length weight, noconstant
// regression without the intercept term
reg price i.foreign trunk length weight, beta
// report standardized coefficients
reg price i.foreign trunk length weight, level(99)
// report the 99% confidence interval
reg price i.foreign trunk length weight, robust
// report heteroskedasticity-robust standard errors
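The export step later in the slides refers to stored results model1 through model5, but only model1 is stored explicitly. One way the five sets of results could be stored is sketched below. The pairing of names to options is an assumption based on the order of the options above.

```stata
sysuse auto, clear

qui reg price i.foreign trunk length weight
est sto model1                                  // baseline OLS
qui reg price i.foreign trunk length weight, noconstant
est sto model2                                  // no intercept
qui reg price i.foreign trunk length weight, beta
est sto model3                                  // standardized coefficients
qui reg price i.foreign trunk length weight, level(99)
est sto model4                                  // 99% confidence level
qui reg price i.foreign trunk length weight, robust
est sto model5                                  // robust standard errors

est dir                                         // list the stored results
```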
4 Basic Regression
Ø Panel Regression
xtset idcode year // declare the data to be a panel; idcode is the panel (unit) variable and year is the time variable
xtreg ln_wage age grade south tenure i.year, fe robust // fixed effects model that also controls for year fixed effects and reports robust standard errors
est sto result // store the results
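The slides do not name the dataset. The sketch below assumes the nlswork example data from the Stata website, which contains the variables used above, and needs an internet connection for webuse. It also illustrates that for xtreg with fe, the robust option gives standard errors clustered on the panel variable. The scalar names are hypothetical.

```stata
* Assumption: the nlswork example dataset, loaded from the Stata website
webuse nlswork, clear
xtset idcode year

qui xtreg ln_wage age grade south tenure i.year, fe robust
scalar se_robust = _se[age]

qui xtreg ln_wage age grade south tenure i.year, fe vce(cluster idcode)
scalar se_cluster = _se[age]

display "robust: " scalar(se_robust) "   clustered on idcode: " scalar(se_cluster)
```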
4 Basic Regression
Ø Fitted values
predict price_hat, xb // fitted values of the dependent variable price
predict res_hat, residual // residuals

display _b[length]
display _b[length] + invttail(e(df_r),0.025) * _se[length]
display _b[length] - invttail(e(df_r),0.025) * _se[length]
// construct the 95% confidence interval by hand
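A self-contained sketch that runs the OLS regression from the earlier slide, repeats the calculation above, and sets the hand-computed interval next to the one Stata reports in r(table). The scalar and matrix names are hypothetical.

```stata
sysuse auto, clear
qui reg price i.foreign trunk length weight
matrix T = r(table)                 // keep Stata's own coefficient table

predict price_hat, xb               // fitted values
predict res_hat, residuals          // residuals

* 95% confidence interval for the coefficient on length, computed by hand
scalar lb = _b[length] - invttail(e(df_r), 0.025) * _se[length]
scalar ub = _b[length] + invttail(e(df_r), 0.025) * _se[length]

display "by hand:  [" scalar(lb) ", " scalar(ub) "]"
display "reported: [" T[rownumb(T,"ll"), colnumb(T,"length")] ", " T[rownumb(T,"ul"), colnumb(T,"length")] "]"
```

The matrix is saved right after the regression because later commands can overwrite r().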
5 Results Exporting
Ø Export descriptive analysis
asdoc pwcorr price length, star(.05) save(corr1.doc) // export the correlation matrix (Word .doc)

logout, save(corr2) word replace: pwcorr price length, star(.05) // export the correlation matrix (.rtf)
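The export commands in this section (asdoc, logout, esttab, outreg2) are user-written and do not ship with Stata. Assuming an internet connection and that the packages are still hosted on SSC under these names, a one-time installation could look like this:

```stata
* One-time installation of the user-written commands used in this section
ssc install asdoc, replace
ssc install logout, replace
ssc install estout, replace     // the estout package provides esttab
ssc install outreg2, replace
```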
5 Results Exporting
Ø Export descriptive analysis
asdoc sum price length, stat(N mean sd min max) dec(3) replace save(sum1.doc) // export summary statistics (Word .doc)

logout, save(sum2) word replace: tabstat price length, stats(N mean sd min max) c(s) f(%6.2f) // export summary statistics (.rtf)
5 Results Exporting
Ø Export regression results
esttab model* using regression1.rtf, replace r2 ar2 star(* 0.10 ** 0.05 *** 0.01) b(3) se(3) mtitle("reg" "reg noconstant" "reg beta" "reg level99" "reg r") nonumber // export results "model1"-"model5" (.rtf)
outreg2 [model1 model2 model3 model4 model5] using regression2.xls, replace r2 adjr2 dec(3) // export results "model1"-"model5" (Excel .xls)
5 Results Exporting
Ø Export regression results (panel regression)
outreg2 [result] using result.xls, replace keep(age south tenure) dec(3) addtext(Individual FE, YES, Year FE, YES) // export results (Excel .xls), reporting only the coefficients of age, south and tenure, and noting that the fixed effects have been controlled for
