Stata Applied Course: Regression
Xinyan Yao
Outline
1. Data Importing
2. Data Exploration
3. Data Processing
4. Basic Regression
5. Results Exporting: export regression results and save data
Environment Setup
- Change the working directory
cd /Users/sija/Desktop/0627-STATA
- Clear memory data
clear all
- Stop pausing for --more-- messages
set more off
- Check the working directory
pwd
- List all files in the current working directory
dir
1 Data Importing
➢ Raw data
- TXT/CSV: example.txt / example.csv
insheet [varlist] using filename [, options]
insheet using "example.csv", comma clear
- Excel: example.xls / example.xlsx
import excel using "example.xlsx", firstrow clear
- Stata: example.dta
use example.dta, clear
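In Stata 13 and later, insheet is superseded by import delimited. The following is a minimal sketch of the same CSV import, assuming Stata 13 or newer. The file name auto_demo.csv is hypothetical; it is written from the built-in auto dataset first so that the example runs without any external file.

```stata
* Sketch: write a CSV from the built-in auto data, then read it back
sysuse auto, clear
export delimited using "auto_demo.csv", replace   // hypothetical file name

* varnames(1) tells Stata that row 1 holds the variable names
import delimited using "auto_demo.csv", varnames(1) clear
describe
```

Value-labeled variables such as foreign are exported as their labels by default, so they come back as strings after the round trip.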
➢ Example data
- Dataset installed with Stata
sysuse auto.dta, clear
- Dataset from the Stata website
webuse charity, clear
- Dataset from other websites
use http://fmwww.bc.edu/RePEc/bocode/c/CardKrueger1994.dta, clear
➢ Artificial data
set obs 100 //generate 100 observations
gen id = _n //generate the "id" variable
gen x = invnormal(uniform())
gen y = 10 + 0.5 * invnormal(uniform()) //generate variables from random numbers
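Random draws change on every run unless a seed is set. The sketch below repeats the simulation with a fixed seed and the newer rnormal() function; the seed value and the variable name e are arbitrary choices made for illustration.

```stata
* Sketch: reproducible artificial data
clear
set seed 12345            // arbitrary seed, chosen only for illustration
set obs 100               // generate 100 observations
gen id = _n
gen x = rnormal()         // standard normal draw, like invnormal(uniform())
gen e = rnormal(0, 0.5)   // noise with mean 0 and standard deviation 0.5
gen y = 10 + e            // same form as y above: 10 plus 0.5 times a standard normal
summarize x y
```

Running the block twice with the same seed produces identical values of x and y.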
➢ Other methods
- Use the menus (File > Import)
- Paste data into the Data Editor
➢ Append/Merge datasets
- Vertical (different units, same variables)
append using auto0
- Horizontal (same units, different variables)
merge 1:1 id using auto1
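The two operations can be tried without the auto0 and auto1 files. The sketch below splits the built-in auto dataset into pieces and puts them back together; the id variable and the temporary file names bottom and extra are hypothetical.

```stata
* Sketch: split auto.dta into pieces, then append and merge them back
sysuse auto, clear
gen id = _n                        // hypothetical unique identifier
tempfile bottom extra              // illustrative temporary file names

* Rows 38-74 without mpg, used in the append step
preserve
keep if id > 37
drop mpg
save "`bottom'"
restore

* mpg for every car, used in the merge step
preserve
keep id mpg
save "`extra'"
restore

* Start from rows 1-37 without mpg
keep if id <= 37
drop mpg

append using "`bottom'"            // vertical: adds observations
merge 1:1 id using "`extra'"       // horizontal: adds variables
tabulate _merge                    // 3 = matched in both files
```

After a merge, checking _merge shows whether every observation found a match before the variable is dropped.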
2 Data Exploration
➢ Dataset structure
- Cross-Sectional Data: data collected by observing various
subjects (like individuals, firms, countries), at the same point
in time. E.g., the population and GDP of China’s provinces in
1991.
- Time Series Data: data collected by observing one subject at
different points of time. E.g., the population and GDP of
Beijing in 1991-2000.
- Pooled Cross-Sectional Data: data collected by randomly
sampling cross sections of individuals at different points of
time. E.g., the population and GDP of Beijing and Tianjin in
1991, and that of Chongqing and Shanghai in 1992.
- Panel Data: data collected by tracking the same group of
subjects over time. E.g., the population and GDP of Beijing
and Tianjin in 1991-2000.
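A panel like the Beijing and Tianjin example has one row per subject and year. The sketch below builds an empty panel of that shape and declares it with xtset; the id values 1 and 2 stand in for the two cities, and no real population or GDP figures are included.

```stata
* Sketch: a two-subject, ten-year panel skeleton
clear
set obs 2
gen id = _n                        // 1 and 2 stand in for two cities
expand 10                          // 10 rows per subject
bysort id: gen year = 1990 + _n    // years 1991-2000 within each subject
xtset id year                      // declare the panel structure
xtdescribe                         // show the pattern of observed years
```

Dropping the id dimension (a single subject) would give a time series, and keeping a single year would give a cross section.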
➢ Variable type
- Numeric variables: byte (small integer), int, long (long
integer), float, double (double-precision floating point)
- String variables: str1 – str2045 and strL in Stata 13 and
later (str1 – str244 in earlier versions); strn holds strings
of up to n bytes
- Date variables: values imported as "2009-08-01" or "2009/8/1"
are stored as strings. The date() function converts them to a
numeric date, which a %td format displays as a date. The third
argument (2019) is the latest year a two-digit year may map to.
date("1/15/08","MDY",2019) = 15jan2008
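As a small illustration of the conversion, the sketch below types in the two string values from the text and converts them. The variable names datestr and d are hypothetical.

```stata
* Sketch: convert string dates to a Stata date variable
clear
input str10 datestr
"2009-08-01"
"2009/8/1"
end
gen d = date(datestr, "YMD")   // numeric: days since 01jan1960
format d %td                   // display the number as a date
list
```

Both spellings parse to the same date because date() accepts different separators for a given element order.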
➢ View the data
- Variable type and explanation
des _all
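A few other built-in commands are commonly used alongside describe for a first look at a dataset. The sketch below applies them to the auto dataset; the choice of variables is only for illustration.

```stata
* Sketch: a first look at a dataset
sysuse auto, clear
describe                      // storage type, display format, and labels
list make price mpg in 1/5    // first five observations
summarize price mpg           // count, mean, sd, min, max
codebook foreign              // values and value labels of one variable
```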
3 Data Processing
➢ Change storage type of variable
- Change within the numeric type
recast double mpg //change int type to double type
- Change between different types
tostring length, gen(length_str) //change number to string
destring length_str, gen(length_num) //change string to number
gen length_num2 = real(length_str) //alternative: convert with real()
encode make, gen(makeid) //for a categorical variable, change string to labeled numeric
decode makeid, gen(makeid_str) //change labeled numeric back to string
➢ Generate new variables
gen lnprice = log(price) //generate the natural logarithm
gen price2 = price ^ 2 //generate the squared term
gen inter = price * foreign //generate the interaction term (make is a string, so it cannot be multiplied)
gen highprice = (price > 6500) //generate dummy variable
➢ Replace the value of variables
replace weight = weight * 100 //replace the value with 100 times the initial value
➢ Rename variables
rename (lnprice inter) (ln_price price_foreign)
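One caution about dummy variables built from a comparison: Stata treats missing values as larger than any number, so an expression such as (x > c) evaluates to 1 for missing x. price has no missing values in the auto dataset, but rep78 does. The sketch below shows the usual guard; the variable name rep_high and the cutoff 3 are hypothetical.

```stata
* Sketch: keep missing values missing when building a dummy
sysuse auto, clear
gen highprice = (price > 6500) if !missing(price)
gen rep_high = (rep78 > 3) if !missing(rep78)   // rep78 has missing values
tabulate rep_high, missing
```

Without the if qualifier, the cars with missing rep78 would be coded as 1.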
4 Basic Regression
➢ Ordinary Least Squares
reg price i.foreign trunk length weight //regression with price as the dependent variable
qui reg price i.foreign trunk length weight //do not report the regression results
est sto model1 //store the results as "model1"
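Stored results become useful once there is more than one model. The sketch below stores two specifications and shows them side by side with the built-in estimates table command; the second specification, which adds mpg, is a hypothetical example and not part of the course material.

```stata
* Sketch: store two models and compare them side by side
sysuse auto, clear
quietly regress price i.foreign trunk length weight
estimates store model1
quietly regress price i.foreign trunk length weight mpg   // hypothetical second model
estimates store model2
estimates table model1 model2, b(%9.3f) se(%9.3f) stats(N r2)
```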
➢ Regression between group differences
display _b[length] //coefficient on length
display _b[length] + invttail(e(df_r),0.025) * _se[length] //upper bound
display _b[length] - invttail(e(df_r),0.025) * _se[length] //lower bound
// construct the 95% confidence interval manually
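The manual interval can be checked against the one regress itself computes, which is available in the r(table) matrix (Stata 12 and later). The sketch below is one way to make that comparison; the scalar names ci_lo and ci_hi and the matrix name T are hypothetical.

```stata
* Sketch: compare the manual 95% CI with the one stored by regress
sysuse auto, clear
quietly regress price i.foreign trunk length weight
matrix T = r(table)               // rows: b, se, t, pvalue, ll, ul, ...
scalar tcrit = invttail(e(df_r), 0.025)
scalar ci_lo = _b[length] - tcrit * _se[length]
scalar ci_hi = _b[length] + tcrit * _se[length]
display "manual:  [" ci_lo ", " ci_hi "]"
display "regress: [" T[5, colnumb(T, "length")] ", " T[6, colnumb(T, "length")] "]"
```

The two lines should print the same bounds, since regress uses the same t critical value with e(df_r) degrees of freedom.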
5 Results Exporting
➢ Export descriptive analysis
asdoc pwcorr price length, star(.05) save(corr1.doc)
// export the correlation matrix to a Word file; asdoc is a user-written command (ssc install asdoc)
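For the other items named in the outline, exporting regression results and saving data, the sketch below uses only built-in commands. It is one possible approach and not necessarily the one used in the course: the table is written to a plain-text log, and the data are saved in Stata and CSV formats. All file names and the log name export_demo are hypothetical.

```stata
* Sketch: export a regression table and save the data with built-in commands
sysuse auto, clear
quietly regress price i.foreign trunk length weight
estimates store model1

* Write the coefficient table to a plain-text log file
log using "reg_results.log", text replace name(export_demo)
estimates table model1, b(%9.3f) se(%9.3f) stats(N r2)
log close export_demo

* Save the data in Stata format and as CSV
save "auto_export.dta", replace
export delimited using "auto_export.csv", replace
```

User-written packages such as asdoc produce formatted Word output directly; the log-based approach trades formatting for having no installation step.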