Bio624 Class1handout
Biostatistics 624 © 2011 by JHU Biostatistics Dept. Sun, 27 Mar 2011 (6:47p) CLASS 1 - 1
Class 1 - Introduction; Overview of Stata -- LECTURE NOTES
— Website and Schedule
— Lecture Notes #1
— e-Quiz #1 (due Fri, 8 Apr 2011)
! Model checking: analysis of residuals, measures of leverage and influence
! Special topics: methods for missing data; reliability, inter-rater
  agreement, diagnostic tests, reference intervals, sample size,
  regression for survey samples
! Students who master the course contents will be able to:
— Design a tabular or graphical display of a dataset that makes apparent
  the association between explanatory variables and the response
— Choose a specific linear, logistic, log-linear, or survival regression
  model appropriate to address a scientific question and correctly
  interpret the meaning of its parameters.
— Use several models for the analysis of a dataset to effectively answer
  the main scientific questions
— Understand how longitudinal data differ from cross-sectional data and
  why special regression methods are sometimes needed for their analysis
! The course contents, schedule, and procedures are summarized in course
  website pages:
— “Home” page: organizational details
— “Schedule” page: classes, e-quizzes, exam, project
! Web site URL:
! Some parts of the course site require a Userid and Password, which are
Userid: bio624
Password: theedge
0. Title
1. Abstract (structured)
2. Introduction
3. Methods (including sample size
considerations)
4. Results (including at least one figure and one
table)
5. Discussion
6. Appropriate other tables, figures, etc
4.4 Data analysis project (cont'd)
! Possible sources for datasets:
— An important part of the project is to identify and gain access to an
  appropriate dataset
— The best dataset is one that you are familiar with from past work and
  that you can use to address questions that have not been addressed before
— Next best is a dataset from an advisor or colleague, ideally one whose
  subject matter is of interest to you
— It is OK to use datasets from other classes or the MPH capstone project
  if they include enough material to support a regression analysis; if in
  doubt, ask an instructor from this class
— Online datasets. There are numerous datasets online that could be used
  for a project. Some links to possible sources for datasets are posted on
  the course website (“Other links” on the home page):
  http://www.biostat.jhsph.edu/courses/bio624/misc/datasets.htm
— Government and institutional websites (a few are listed below) contain
  an enormous amount of data, but will require some exploration to find
  downloadable, raw data suitable for analysis:
  www.fedstats.gov  FEDSTATS (federal statistics locator)
  www.cdc.gov  Centers for Disease Control, including the National Center
  for Health Statistics
  www.cdc.gov/nchs/datawh/ftpserv/ftpdata/ftpdata.htm  NCHS public use
  data files
  www.census.gov  US Census Bureau
  http://www.sph.emory.edu/bios/bioslist.html#database
— Some textbooks have collections of datasets that may be suitable for
  further analysis. Again, if you decide to use one of these datasets,
  make sure to consult the source paper(s) for the dataset and attach them
  with the supporting materials for your project report
  LC Hamilton, Statistics with Stata
  www.stata.com/bookstore/swsdl.html
  Duxbury publishing website - site contains datasets from health
  statistics textbooks: Click “Data Library”:
  http://www.thomsonedu.com/statistics/discipline_content/dataLibrary.html
  Hosmer and Lemeshow: Applied Survival Analysis:
  ftp://ftp.wiley.com/public/sci_tech_med/survival/
  Hosmer and Lemeshow: Applied Logistic Regression Analysis: Datasets are
  contained in the University of Massachusetts Datasets Archive, which
  contains links to other data resources (make sure to type the URL
  exactly as given below and then scroll down to the list of datasets by
  type of analysis - DO NOT USE the low birthweight dataset)
  http://www-unix.oit.umass.edu/~statdata/statdata/
  Moore and McCabe: Introduction to the Practice of Statistics (IPS),
  arguably the best introductory statistics text available. The applets
  help master statistical concepts. The datasets will require finding the
  source and documentation papers
  http://www.whfreeman.com/ips/
! The Stata website (www.stata.com) has a good Support section,
  especially the FAQs
set memory 800m, permanently
! Stata has lots of on-line help available -- all sections of the
  written documentation are on-line in “abbreviated” form (sometimes too
  abbreviated, especially for statistical techniques)
! A good way to access on-line help is via the Help pull-down menu, the
  portal to all Stata Help including the complete set of manuals in
  well-indexed PDF format.
! If you know the name of the command, you can access online help via
  the help command. For example, to get help for the summarize command:
help summarize
Note, upper right: dialog: summarize
- Nearly every Stata command has a dialog screen to construct the command
Note: [R] summarize -- Summary statistics
- Nearly every Stata command has an [R] link to the PDF Documentation entry
! The Stata Journal is a refereed journal and is published quarterly
  with articles about statistics, data analysis, teaching methods, and
  effective use of Stata’s language
! Net courses on Stata. These range in length from a few to 12 weeks.
  They are done via e-mail. There is a charge for the courses.
! Prices vary for academic institutions, businesses, and students.
  Prices also depend on whether the system will be used on a network and
  how many users there will be
! Manuals are purchased separately - some are available in the JHMI
  bookstore
! Subscriptions to the Stata Journal also cost extra; the journal comes
  in both hard copy and PDF format
! Stata has no annual renewal fee, unlike some other statistical
  packages such as SAS, and offers regular free updates containing fixes
  and extensions
! The Stata web site, www.stata.com, has the latest prices and
  information on how to purchase items
! BSPH has a GRADPLAN for students to purchase the latest version of
  Stata. Online ordering is at www.stata.com/gpdirect
2. Select a fixed space font -- one of the larger Stata fonts or
   fixedsys are good choices
3. Make sure the font size is at least 9
4. IMPORTANT – save the windowing preferences or the changes disappear:
1. Do nothing, all files up to date
2. Update both the executable and ado files
! One extra step is required to install a new executable:
Click: update swap
! After installing an update, you can find out what has been added or
  changed by typing:
help whatsnew
! In Stata, a “Dataset” is “Data” plus labels, formats, notes, and
  characteristics
— Using and saving data from disk
use, save
append, merge
compress
— Inputting data into Stata
input
edit
infile
infix
insheet
— Data manipulation
generate, replace
recode
egen
rename
drop, keep
sort
encode, decode
order
by
reshape
— Keeping track of your work
log
notes
— Convenience
display
table
tabstat
tab_chi (use findit tab_chi for install/help)
5.13 A special do-file – profile.do
! When Stata begins, it looks for a file named profile.do, containing
  commands that are to be executed as Stata starts
! In particular, Stata looks for the profile.do file in c:\data, among
  other places, so you can execute a set of commands every time you start
  Stata by placing them in a text file named profile.do, which you store
  in c:\data
! The profile.do file recommended for this course is as follows and can
  be downloaded from various places on the e-Quizzes page on the course
  website:
5.14 How to start Stata and set the working directory
! You would usually store the log(s) in the same folder with
your data files related to your work
In this tutorial we show you how to enter your data into Stata.
5.17 Stata tutorial on data input (cont'd)
infix
a transfer program
-------------------------- --------------------------------------
Then you save your data by using save
-------------------------------------------------------------------------------
edit is the easiest way to enter a small amount of data. You type
. clear (to drop any data in memory)
. edit (to enter the spreadsheet editor)
Only Stata for Windows and Stata for Macintosh users can use edit. We are
not going to demonstrate it here. See the Getting Started manual or just
try it. input is available on all versions of Stata:
-------------------------------------------------------------------------------
. clear
. input id mpg weight price
id mpg weight price
1. 1 22 2930 4099
2. 2 17 3350 4749
3. 3 22 2640 3799
4. 4 20 3250 4816
5. 5 15 4080 7827
6. end
-------------------------------------------------------------------------------
input continues to accept observations until you type 'end'. Once you have
some data in memory, typing input by itself adds new observations:
-------------------------------------------------------------------------------
. input
id mpg weight price
6. 6 26 2230 4453
7. end
-------------------------------------------------------------------------------
Another way to enter this data would be to type it into a wordprocessor or an
editor, save it in a file, and then read the file. We have such a file:
-------------------------------------------------------------------------------
. type "h:\stata\auto1.raw"
make, mpg,weight, price
AMC Concord, 22, 2930, 4099
AMC Pacer, 17, 3350, 4749
-------------------------------------------------------------------------------
Our file has the variable names at the top (that is not required) and we used
commas to separate values one from the other. To read this, we can type:
-------------------------------------------------------------------------------
. clear
. insheet using "h:\stata\auto1.raw"
. list
make mpg weight price
1. AMC Concord 22 2930 4099
2. AMC Pacer 17 3350 4749
3. AMC Spirit 22 2640 3799
4. Buick Century 20 3250 4816
5. Buick Electra 15 4080 7827
-------------------------------------------------------------------------------
It's easy. insheet will read comma- or tab-delimited files, so it will read
text files created by spreadsheet and database programs.
-------------------------------------------------------------------------------
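As a self-contained illustration of the insheet step described above, the following sketch writes a two-row comma-separated file and reads it back. The file name auto1_demo.raw is hypothetical and is created in the current working directory; the comma, names, and clear options are spelled out here as an assumption about how one might make the call explicit, and are not taken from the tutorial. Newer Stata releases also provide import delimited for the same task.

```stata
* Sketch only: auto1_demo.raw is a hypothetical file name, written to the
* current working directory so that the example is self-contained.
clear
capture file close fh
file open fh using "auto1_demo.raw", write text replace
file write fh "make,mpg,weight,price" _n
file write fh "AMC Concord,22,2930,4099" _n
file write fh "AMC Pacer,17,3350,4749" _n
file close fh

* comma: values are comma-separated; names: first line holds variable names
insheet using "auto1_demo.raw", comma names clear
list
describe
```

Running describe after reading a file is a quick check that string and numeric columns were typed as intended.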
-------------------------------------------------------------------------------
If your values are separated by blanks rather than commas or tabs, you use
infile to read it. Here is such a file:
-------------------------------------------------------------------------------
. type "h:\stata\autodata.raw"
"AMC Concord" 22 2930 4099
"AMC Pacer" 17 3350 4749
"AMC Spirit" 22 2640 3799
"Buick Century" 20 3250 4816
"Buick Electra" 15 4080 7827
. clear
. infile str14 make mpg weight price using "h:\stata\autodata"
(5 observations read)
. list in 1/2
-------------------------------------------------------------------------------
Finally, if you have a formatted file, you use infile or infix to read it:
-------------------------------------------------------------------------------
. type "h:\stata\auto3.raw"
AMC Concord
2229304099
AMC Pacer
1733504749
AMC Spirit
2226403799
Buick Century
2032504816
Buick Electra
1540807827
. clear
. infix 1: str make 1-18 2: mpg 1-2 weight 3-6 price 7-11
> using "h:\stata\auto3.raw"
(5 observations read)
. list
Saving data
-----------
After you have entered data into Stata, you can save it. The command is:
save filename
If you do not specify the extension for the filename, Stata assumes the
extension '.dta'. For instance, we could type 'save auto' to save this data.
It would be saved in the file auto.dta. The command to retrieve previously
saved data is:
use filename [, clear]
Thus, the next time we want to use auto.dta, we could type 'use auto' or 'use
auto, clear'. Sometimes 'use auto' will work, but 'use auto, clear' will
always work. Stata stores data in memory. The clear option tells Stata that
it's okay to drop the data in memory in order to retrieve the new data.
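The difference between 'use auto' and 'use auto, clear' can be seen in a short sketch. The dataset name auto_demo and the variable mpg2 are hypothetical, chosen only for this illustration; the point is that a plain use is refused when the data in memory have unsaved changes, while use with the clear option discards them.

```stata
* Sketch only: auto_demo and mpg2 are hypothetical names.
clear
input id mpg
1 22
2 17
end
save auto_demo, replace

* Change the data in memory without saving
generate mpg2 = mpg^2

* Plain "use" is refused because unsaved changes would be lost;
* capture traps the error so the do-file keeps running
capture use auto_demo
display "return code from plain use: " _rc

* With the clear option, the data in memory are dropped and the file is read
use auto_demo, clear
describe
```

After the final command, mpg2 is gone because it was never saved to auto_demo.dta.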
6.1 What are and why use do-files ...
! “Do-files” make re-running a series of commands very easy – one step
! “Do-files” for particular tasks can be copied and modified to perform
  similar tasks – “do-files” serve as templates for future work
! This program simply displays the message “Hello Mom” -- an easy way
  to try the do-file approach
! The name of the program file will be mom.do
! Store the program in a folder: My Documents\bio624
! To create a program file:
Click: Start
Click: Stata icon
Click: Do-editor icon (envelope)
Note: You can also use NOTEPAD, WORDPAD or even WORD -- anything that
allows files to be read and written in “text” format
Make sure you press [Enter] after typing the line
Click mom.do on the Task Bar
! Make the fixes (change to “Hello Mother Dear”) and then (IMPORTANT)
  save the file
Click File / Save
or (as above),
Click: Do current file icon (in do-file editor)
! Repeat the “Edit - Run” cycle until done or tired
! This program is a little more complicated – try it for fun and
  practice in making do-files
! Open Stata by clicking profile.do in MyDocuments\bio624
! Input faculty IQ data and summarize it
! The name of the program will be blah.do
! The program is in folder: MyDocuments\bio624
6.5 Another program (cont'd)
do blah.do
input sno IQ
1 138
2 142
3 136
4 124
5 158
6 108
7 116
8 128
9 125
10 88
end
list
summarize IQ , detail
log close
Type: MyDocuments\bio624\blah.do
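The listing above ends with log close, but the line that opens the log is not shown in this handout. A minimal sketch of a complete do-file along the same lines follows; the log file name blah_demo.log and the opening capture log close and clear lines are assumptions added for illustration, not the course's own file.

```stata
* Sketch only: blah_demo.log is a hypothetical log file name.
capture log close                 // close any log left open by an earlier run
log using blah_demo.log, text replace

clear                             // drop any data in memory before input
input sno IQ
1 138
2 142
3 136
4 124
5 158
6 108
7 116
8 128
9 125
10 88
end

list
summarize IQ, detail
display "Mean IQ = " r(mean)

log close
```

Opening the log with the replace option lets the do-file be re-run during the "Edit - Run" cycle without an error about an existing log file.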
Click Intercooled... on the Task Bar Paste into the do-file editor (or into Notepad or Wordpad)
do blah.do
8. Stat /Transfer for importing/exporting data
or,
! In many cases, you can Copy/Paste the data from the outside source
  into the Stata Data Editor, which transfers the data in simple cases
! The best option for translating data into or from Stata format is to
  use a “transfer program” such as Stat/Transfer, which is available in
  the PC Labs on the 3rd floor
Select SAS for Windows/OS2 from the input File Type selection
box
Click Browse ; locate and select the file SAS file ex3-1.sd2
for the input File Specification box
— These can be “pasted” into the Stata Data Editor, which often is
a very quick way to transfer data into Stata
! Data Source: The data comes from Exercise 3 on p.45 from the
well-written textbook Practical Statistics for Medical
Research (Chapman & Hall) by Douglas Altman
! Data sheet:
9. Example 1: exploratory analysis of data from Altman’s
Exercise 3-1 (cont'd)
alt3-1ex.dat,
which contains the raw data, one line (row) per patient
Id Number sno
1 2 44 1560 1.0 0
2 2 65 1310 1.2 0
3 2 58 850 1.2 0
4 2 57 1250 1.7 0
5 2 51 950 1.8 0
6 2 64 850 1.8 0
7 2 33 1200 1.9 0
8 2 61 1390 2.0 0
9 2 49 1450 2.3 0
10 2 67 3300 2.8 0
11 2 39 2760 2.8 0
12 2 42 860 3.4 0
13 2 35 1810 3.4 0
14 2 31 1310 3.8 0
15 2 37 1250 3.8 0
16 2 43 1210 4.2 0
17 2 39 1460 4.9 0
18 2 53 2310 5.4 0
19 2 44 1360 5.9 0
20 2 41 1910 6.2 0
21 2 72 910 12.0 0
22 2 61 1410 18.8 0
23 2 48 2460 47.0 0
24 2 59 1350 70.0 0
25 2 72 810 80.0 1
26 2 59 1460 80.0 1
27 2 71 760 80.0 1
28 2 53 910 80.0 1
1 1 53 360 2.0 0
2 1 74 2010 2.0 0
3 1 29 1390 2.0 0
4 1 53 660 3.0 0
5 1 67 1135 3.5 0
6 1 67 510 5.3 0
7 1 54 410 5.7 0
8 1 51 910 6.5 0
9 1 57 360 13.0 0
10 1 62 1260 13.0 0
11 1 51 560 13.9 0
12 1 68 1135 14.7 0
13 1 50 1410 15.4 0
14 1 38 1110 15.7 0
15 1 61 960 16.6 0
16 1 59 1310 16.6 0
17 1 68 910 16.6 0
18 1 44 1235 22.0 0
19 1 57 2950 22.3 0
20 1 49 360 33.2 0
21 1 49 1935 47.0 0
22 1 63 1660 61.0 0
23 1 29 435 65.0 0
24 1 53 310 65.0 0
25 1 53 310 80.0 1
26 1 49 410 80.0 1
27 1 42 690 80.0 1
28 1 44 910 80.0 1
29 1 59 1260 80.0 1
30 1 51 1260 80.0 1
31 1 46 1310 80.0 1
32 1 46 1350 80.0 1
33 1 41 1410 80.0 1
34 1 39 1460 80.0 1
35 1 62 1535 80.0 1
36 1 49 1560 80.0 1
37 1 53 2050 80.0 1
! See boxcox in the Stata reference manual for more details and
examples
! Label variables
! List data
! Get boxplots
! NOTE: The do-file and data file are on the website as alt3-1ex.do and
alt3-1ex.dat
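The labeling, listing, and boxplot steps above can be sketched on a few rows of the raw data. This is not the course's alt3-1ex.do: the rows are the first three of each group from the data listing, the variable labels are copied from the output headings in this handout, and the coding 1 = Yes, 2 = No for react is inferred from the listing in which react equal to 2 is displayed as No.

```stata
* Sketch only: a six-row subset typed inline instead of reading alt3-1ex.dat.
clear
input sno react age sadose si censor
1 2 44 1560 1.0 0
2 2 65 1310 1.2 0
3 2 58  850 1.2 0
1 1 53  360 2.0 0
2 1 74 2010 2.0 0
3 1 29 1390 2.0 0
end

* Variable labels, as they appear in the output headings
label variable sno    "Study No."
label variable react  "Adverse Reaction"
label variable age    "Age in years"
label variable sadose "Dose of SA (mg)"
label variable si     "Sulphoxidation Index"

* Value labels: coding inferred from the listing (an assumption)
label define reactlbl 1 "Yes" 2 "No"
label values react reactlbl

list, sepby(react)
graph box age, over(react)
```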
.
. * Turn off MORE feature
.
. set more off
.
.
.
. * Input data
.
9.5 Log Showing Commands and Output (cont'd)
.
.
.
. * Variable labels
. label variable sno "Study No."
.
. label values react reactlbl
.
.
.
.
. * Save Stata dataset
.
. save alt3-1ex.dta, replace
file alt3-1ex.dta saved
.
.
. * List data for checking
.
. list in 1/10
+-------------------------------------------+
| sno react age sadose si censor |
|-------------------------------------------|
1. | 1 No 44 1560 1 0 |
2. | 2 No 65 1310 1.2 0 |
3. | 3 No 58 850 1.2 0 |
4. | 4 No 57 1250 1.7 0 |
5. | 5 No 51 950 1.8 0 |
|-------------------------------------------|
6. | 6 No 64 850 1.8 0 |
7. | 7 No 33 1200 1.9 0 |
8. | 8 No 61 1390 2 0 |
9. | 9 No 49 1450 2.3 0 |
10. | 10 No 67 3300 2.8 0 |
+-------------------------------------------+
.
.
.
. * Descriptive Statistics
.
. summarize , detail
Study No.
-------------------------------------------------------------
Percentiles Smallest
1% 1 1
5% 2 1
10% 4 2 Obs 65
25% 9 2 Sum of Wgt. 65
Adverse Reaction
-------------------------------------------------------------
Percentiles Smallest
1% 1 1
5% 1 1
10% 1 1 Obs 65
25% 1 1 Sum of Wgt. 65
Dose of SA (mg)
-------------------------------------------------------------
Percentiles Smallest
1% 310 310
5% 360 310
10% 410 360 Obs 65
25% 860 360 Sum of Wgt. 65
50% 1260 Mean 1249.538
Largest Std. Dev. 622.3134
75% 1460 2460
90% 2010 2760 Variance 387274
95% 2460 2950 Skewness .9572716
99% 3300 3300 Kurtosis 4.426923
Sulphoxidation Index
-------------------------------------------------------------
Percentiles Smallest
1% 1 1
5% 1.7 1.2
10% 1.9 1.2 Obs 65
25% 3.4 1.7 Sum of Wgt. 65
censor
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 65
25% 0 0 Sum of Wgt. 65
50% 0 Mean .2615385
Largest Std. Dev. .4428926
75% 1 1
90% 1 1 Variance .1961538
95% 1 1 Skewness 1.085217
99% 1 1 Kurtosis 2.177696
.
.
.
. * Stem and leaf
. stem age
. stem sadose
Stem-and-leaf plot for sadose (Dose of SA (mg))
0*** | 310,310,360,360,360
0*** | 410,410,435,510,560
0*** | 660,690,760
0*** | 810,850,850,860,910,910,910,910,910,950,960
1*** | 110,135,135
1*** | 200,210,235,250,250,260,260,260,310,310,310,310,350,350,360,390,390
1*** | 410,410,410,450,460,460,460,535,560,560
1*** | 660
1*** | 810,910,935
2*** | 010,050
2*** | 310
2*** | 460
2*** | 760
2*** | 950
3*** |
3*** | 300
. stem si
0** | 10,12,12,17,18,18,19,20,20,20,20,23,28,28,30,34,34,35,38,38,42,49
0** | 53,54,57,59,62,65
1** | 20,30,30,39,47
1** | 54,57,66,66,66,88
2** | 20,23
2** |
3** | 32
3** |
4** |
4** | 70,70
5** |
5** |
6** | 10
6** | 50,50
7** | 00
7** |
8** | 00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
.
.
.
. * Scatterplots Matrix
. graph box age, over (react) t1(AGE BOXPLOTS) t2(" ") l1(A
> GE) b1(REACTION)
(file alt3-1ex\boxplot1.gph saved)
.
. graph export alt3-1ex\scatmat.wmf,replace
(file C:\jt\bio624\2004\progs\alt3-1ex\scatmat.wmf written in Windows Metafile format)
[Figure: SCATTERPLOT MATRIX of Adverse Reaction, Age in years, Dose of SA (mg), and Sulphoxidation Index]
.
.
. * Dot diagram
.
. sort react
.
. dotplot age , by (react) t1(AGE DOTPLOT) l1(AGE) b1(REAC
> TION)
(file alt3-1ex\dotplot1.gph saved)
. graph export alt3-1ex\dotplot1.wmf,replace
(file C:\jt\bio624\2004\progs\alt3-1ex\dotplot1.wmf written in Windows Metafile format)
[Figure: AGE DOTPLOT. Vertical axis: Age in years (AGE); horizontal axis: Adverse Reaction (REACTION), categories Yes and No]
.
. dotplot sadose, by (react) t1(SA DOSE DOTPLOT) l1(SADOSE M
> G) b1(REACTION)
(file alt3-1ex\dotplot2.gph saved)
[Figure: SA DOSE DOTPLOT. Vertical axis: Dose of SA (mg) (SADOSE MG); horizontal axis: Adverse Reaction (REACTION), categories Yes and No]
.
. dotplot si, by (react) t1(SI DOSE DOTPLOT) l1(SI)
> b1(REACTION)
(file alt3-1ex\dotplot3.gph saved)
. graph export alt3-1ex\dotplot3.wmf,replace
(file C:\jt\bio624\2004\progs\alt3-1ex\dotplot3.wmf written in Windows Metafile format)
[Figure: SI DOSE DOTPLOT. Vertical axis: Sulphoxidation Index (SI); horizontal axis: Adverse Reaction (REACTION), categories Yes and No]
. list age if react==1 & ( (age >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (age <=( r(l_F) - 1.5*(r(u_F
> ) - r(l_F)))) )
.
.
. lv age if react==2 ,generate
# 28 Age in years
---------------------------------
M 14.5 | 52 | spread pseudosigma
F 7.5 | 41.5 51.25 61 | 19.5 14.65586
E 4 | 37 52 67 | 30 13.28402
D 2.5 | 34 52.75 71.5 | 37.5 13.11905
C 1.5 | 32 52 72 | 40 11.51282
1 | 31 51.5 72 | 41 10.41174
| |
| | # below # above
inner fence | 12.25 90.25 | 0 0
outer fence | -17 119.5 | 0 0
. list age if react==2 & ( (age >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (age <=( r(l_F) - 1.5*(r(u_F)
> - r(l_F)))) )
.
.
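The fence values in the lv output for age in the react==2 group can be checked by hand from the fourths (the F row): the spread is the upper fourth minus the lower fourth, the inner fences lie 1.5 spreads beyond the fourths, and the outer fences 3 spreads beyond. The sketch below redoes that arithmetic with scalars; the scalar names are chosen for illustration and mirror the r(l_F) and r(u_F) results used in the list commands.

```stata
* Fourths copied from the F row of the lv output for age, react==2
scalar l_F = 41.5
scalar u_F = 61
scalar spread = u_F - l_F          // matches the "spread" column: 19.5

* Inner fences: 1.5 spreads beyond the fourths
display "inner fences: " l_F - 1.5*spread "  " u_F + 1.5*spread

* Outer fences: 3 spreads beyond the fourths
display "outer fences: " l_F - 3*spread "  " u_F + 3*spread
```

The results agree with the inner fence (12.25, 90.25) and outer fence (-17, 119.5) rows of the lv table, and the list commands flag any observation at or beyond the inner fences.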
# 37 Dose of SA (mg)
---------------------------------
M 19 | 1135 | spread pseudosigma
F 10 | 560 985 1410 | 850 657.2313
E 5.5 | 385 997.5 1610 | 1225 563.183
D 3 | 360 1185 2010 | 1650 563.0501
C 2 | 310 1180 2050 | 1740 512.0124
B 1.5 | 310 1405 2500 | 2190 587.8463
1 | 310 1630 2950 | 2640 633.4493
| |
| | # below # above
inner fence | -715 2685 | 0 1
outer fence | -1990 3960 | 0 0
. list sadose if react==1 & ( (sadose >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (sadose <=( r(l_F) - 1
> .5*(r(u_F) - r(l_F)))) )
+--------+
| sadose |
|--------|
37. | 2950 |
+--------+
.
. lv sadose if react==2 , generate
# 28 Dose of SA (mg)
---------------------------------
M 14.5 | 1330 | spread pseudosigma
F 7.5 | 930 1220 1510 | 580 435.9179
E 4 | 850 1580 2310 | 1460 646.489
D 2.5 | 830 1720 2610 | 1780 622.7175
C 1.5 | 785 1907.5 3030 | 2245 646.157
1 | 760 2030 3300 | 2540 645.0197
| |
| | # below # above
inner fence | 60 2380 | 0 3
outer fence | -810 3250 | 0 1
. list sadose if react==2 & ( (sadose >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (sadose <=( r(l_F) - 1
> .5*(r(u_F) - r(l_F)))) )
+--------+
| sadose |
|--------|
26. | 2460 |
27. | 2760 |
28. | 3300 |
+--------+
.
.
. lv si if react==1 ,generate
# 37 Sulphoxidation Index
---------------------------------
M 19 | 22.3 | spread pseudosigma
F 10 | 13 46.5 80 | 67 51.80529
E 5.5 | 4.4 42.2 80 | 75.6 34.75644
D 3 | 2 41 80 | 78 26.61691
C 2 | 2 41 80 | 78 22.95228
B 1.5 | 2 41 80 | 78 20.93699
1 | 2 41 80 | 78 18.71555
| |
| | # below # above
inner fence | -87.5 180.5 | 0 0
outer fence | -188 281 | 0 0
. list si if react==1 & ( (si >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (si <=( r(l_F) - 1.5*(r(u_F)
> - r(l_F)))) )
. lv si if react==2 ,generate
# 28 Sulphoxidation Index
---------------------------------
M 14.5 | 3.8 | spread pseudosigma
F 7.5 | 1.95 8.675 15.4 | 13.45 10.10879
E 4 | 1.7 40.85 80 | 78.3 34.6713
D 2.5 | 1.2 40.6 80 | 78.8 27.5675
C 1.5 | 1.1 40.55 80 | 78.9 22.70904
1 | 1 40.5 80 | 79 20.06164
| |
| | # below # above
inner fence | -18.225 35.575 | 0 6
outer fence | -38.4 55.75 | 0 5
. list si if react==2 & ( (si >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (si <=( r(l_F) - 1.5*(r(u_F)
> - r(l_F)))) )
+----+
| si |
|----|
23. | 47 |
24. | 70 |
25. | 80 |
26. | 80 |
27. | 80 |
|----|
28. | 80 |
+----+
.
.
.
.
.
.
. * Boxplots
.
. sort react
.
. graph box age, over (react) t1(AGE BOXPLOTS) t2(" ") l1(A
> GE) b1(REACTION)
(file alt3-1ex\boxplot1.gph saved)
[Figure: AGE BOXPLOTS. Vertical axis: Age in years (AGE); horizontal axis: REACTION, categories Yes and No]
.
. graph box sadose, over (react) t1(SA DOSE BOXPLOTS) t2("
> ") l1(DOSE MG) b1(REACTION)
(file alt3-1ex\boxplot2.gph saved)
. graph export alt3-1ex\boxplot2.wmf,replace
(file C:\jt\bio624\2004\progs\alt3-1ex\boxplot2.wmf written in Windows Metafile format)
[Figure: SA DOSE BOXPLOTS. Vertical axis: Dose of SA (mg) (DOSE MG); horizontal axis: REACTION, categories Yes and No]
.
. graph box si, over (react) t1(SI DOSE BOXPLOTS) t2(" ") l
> 1(SI) b1(REACTION)
[Figure: SI DOSE BOXPLOTS. Vertical axis: Sulphoxidation Index (SI); horizontal axis: REACTION, categories Yes and No]
.
.
* Shapiro-Wilk Test for Normality
.
. swilk age sadose si
Shapiro-Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+-------------------------------------------------
age | 65 0.98503 0.868 -0.307 0.62061
sadose | 65 0.92756 4.199 3.107 0.00094
si | 65 0.82921 9.901 4.964 0.00000
.
.
.
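To practice reading swilk results without the course data, the same test can be run on the auto dataset that ships with Stata. This is only a sketch: auto.dta is not part of this example's data, and the variable lnprice is a name introduced here. It also shows that swilk leaves the W statistic and p-value in r() for later use.

```stata
* Sketch only: uses Stata's shipped auto dataset, not the course data.
sysuse auto, clear

* Test a right-skewed variable on its original scale
swilk price
display "W = " r(W) "   p = " r(p)

* Repeat after a log transformation and compare the two results
generate lnprice = ln(price)
label variable lnprice "Log of price"
swilk lnprice
display "W = " r(W) "   p = " r(p)
```

A small p-value, as for sadose and si in the log above, is evidence against normality; comparing W before and after a transformation is one informal way to judge whether the transformation helped.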
[Figure: AGE Q-Q PLOT. Age in years (AGE) against Inverse Normal. Grid lines are 5, 10, 25, 50, 75, 90, and 95 percentiles]
[Figure: Q-Q plot for SA DOSE]
.
. qnorm si , grid b1(SI Q-Q PLOT) l1(SI)
[Figure: SI Q-Q PLOT. Sulphoxidation Index (SI) against Inverse Normal. Grid lines are 5, 10, 25, 50, 75, 90, and 95 percentiles]
.
.
* Box-Cox method to choose transformation to normality
.
. * nolog option suppresses iterations - nothing to do with logarithms
.
. boxcox age , nolog
Fitting comparison model
Fitting full model
Number of obs = 65
LR chi2(0) = 0.00
Log likelihood = -248.73918 Prob > chi2 = .
------------------------------------------------------------------------------
age | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/theta | 1.028826 .527121 1.95 0.051 -.004312 2.061964
------------------------------------------------------------------------------
------------------------------------------------------------------------------
sadose | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/theta | .4100593 .1929563 2.13 0.034 .031872 .7882467
------------------------------------------------------------------------------
Number of obs = 65
LR chi2(0) = 0.00
Log likelihood = -285.74575 Prob > chi2 = .
------------------------------------------------------------------------------
si | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/theta | .0403967 .1055843 0.38 0.702 -.1665448 .2473382
------------------------------------------------------------------------------
---------------------------------------------------------
Test Restricted LR statistic P-Value
H0: log likelihood chi2 Prob > chi2
---------------------------------------------------------
theta = -1 -333.2825 95.07 0.000
theta = 0 -285.81928 0.15 0.701
theta = 1 -319.4322 67.37 0.000
---------------------------------------------------------
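In the output above, the estimated theta for si is about 0.04 and the test of theta = 0 is not rejected (p = 0.701), which points to a log transformation. The sketch below illustrates why: it assumes the usual Box-Cox form (y^theta - 1)/theta, which approaches ln(y) as theta approaches 0, and applies it to six si values taken from the data listing. The variable names si_bc and si_ln and the rounded theta are choices made for this illustration.

```stata
* Sketch only: six si values copied from the raw data listing.
clear
input si
1.0
2.8
5.9
18.8
47.0
80.0
end

* Box-Cox transform with theta rounded from the estimate in the log
scalar theta = 0.04
generate si_bc = (si^theta - 1)/theta

* Log transform, the limiting case theta = 0
generate si_ln = ln(si)

list, clean
```

The two transformed columns stay close over the whole range of si, so the simpler and more interpretable log scale is a reasonable working choice here.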
.
.
.
.
.
. * Close the log -- may want to use for production runs
. *log close
10. Example 2: input and display of data from
Altman’s exercise 3-2
! Data: These data are found on p.47 of Altman (Exercise 3.2). The
data concern airplane accidents (counts, rates/1000, and rates
per 100,000 flight hours) and how they relate to the occupation of
the pilot.
! NOTE: The script file and data file are on the class disk as
alt3-2ex.do and alt3-2ex.dat
10.2 Raw data — text file on disk (cont'd)
! Explore this simple dataset with several graphs using the graph
command
.
.
. * Turn off MORE feature
.
. set more off
.
.
.
. * Input data, embedded blanks in string
.
. infix str occup 1-29 accid 30-34 rate1 40-44 rate2 50-54 using alt3-2ex.dat
(13 observations read)
.
.
.
. * Variable labels
. label variable occup "Occupation"
.
. * List data for checking
.
. list
+-----------------------------------------------------+
| occup accid rate1 rate2 |
|-----------------------------------------------------|
1. | Professional pilots 1302 15.9 .2 |
2. | Lawyers 57 11 1.5 |
3. | Farmers 166 10.1 1.3 |
4. | Sales representatives 137 9 1.2 |
5. | Physicians 76 8.7 1.8 |
|-----------------------------------------------------|
6. | Mechanics and repairmen 44 6.9 1.5 |
7. | Policemen and detectives 48 6.6 1.8 |
8. | Managers and administrators 643 6 .7 |
9. | Engineers 125 4.7 1.1 |
10. | Teachers 43 4.2 1.1 |
|-----------------------------------------------------|
11. | Housewives 29 3.7 3.2 |
12. | Academic students 188 3.2 3.7 |
13. | Armed Forces Members 111 1.6 .7 |
+-----------------------------------------------------+
.
.
.
. * Code occupations for graphs
. encode occup, gen(occup1)
.
.
.
. * Make shorter labels for graphs
.
. #delimit ;
delimiter now ;
. label define occuplab 1 "Acad" 2 "Armed For" 3 "Engin"
> 4 "Farm" 5 "Housewife" 6 "Law"
> 7 "Mgrs" 8 "Mech" 9 "MD"
> 10 "Police" 11 "Pro Pilot" 12 "Sales"
> 13 "Teach" ;
. #delimit cr
delimiter now cr
.
. label values occup1 occuplab
.
.
.
.
. * Save as Stata dataset
.
. save alt3-2ex.dta, replace
file alt3-2ex.dta saved
.
.
. * Bar graph, See Figure 1
.
. sort occup1
.
. graph hbar accid , over(occup1,sort(1)) ytitle(" ") l1(OCCUPAT
> ION) b1(No. of Accidents) t1 (AIRPLANE ACCIDENTS)
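[Figure 1: horizontal bar chart, title AIRPLANE ACCIDENTS; left title OCCUPATION; bottom title No. of Accidents. Bars in ascending order: Housewife, Teach, Mech, Police, Law, MD, Armed For, Engin, Sales, Farm, Acad, Mgrs, Pro Pilot]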
.
.
.
. * Bar graph, See Figure 2
.
. graph hbar rate1 , over(occup1,sort(1)) ytitle(" ") l1(OCCUPAT
> ION) b1(Rate per 1000 Pilots) t1 (AIRPLANE ACCIDENTS)
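[Figure 2: horizontal bar chart, title AIRPLANE ACCIDENTS; left title OCCUPATION; bottom title Rate per 1000 Pilots (axis 0 to 15). Bars in ascending order: Armed For, Acad, Housewife, Teach, Engin, Mgrs, Police, Mech, MD, Sales, Farm, Law, Pro Pilot]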
.
. * Bar graph See Figure 3
.
. graph hbar rate2 , over(occup1,sort(1)) ytitle(" ") l1(OCCUPAT
> ION) b1(Rate per 100000 hrs) t1 (AIRPLANE ACCIDENTS)
(file alt3-2ex\fig3.gph saved)
. graph export alt3-2ex\fig3.wmf,replace
(file C:\jt\bio624\2004\progs\alt3-2ex\fig3.wmf written in Windows Metafile format)
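[Figure 3: horizontal bar chart, title AIRPLANE ACCIDENTS; left title OCCUPATION; bottom title Rate per 100000 hrs (axis 0 to 4). Bars in ascending order: Pro Pilot, Armed For, Mgrs, Engin, Teach, Sales, Farm, Law, Mech, MD, Police, Housewife, Acad]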
.
.
. * Scatterplot See Figure 4
.
. graph twoway scatter rate1 rate2, mlabel(occup1) t1(AIRPLANE ACCIDENT RATES)
[Figure 4: scatterplot, title AIRPLANE ACCIDENT RATES; y-axis Rate per 1000 (0 to 15); x-axis Rate per 100,000 hr (0 to 4); points labeled by occupation]
.
.
. log close
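The short labels defined in the log rely on encode numbering the occupations in alphabetical order of the original strings (1 = Academic students, ..., 13 = Teachers), so a mistyped position in label define would silently mislabel a bar. A minimal sketch for checking the mapping, assuming the dataset alt3-2ex.dta saved in the log is in the current directory:

```stata
use alt3-2ex.dta, clear
* Show the mapping from numeric code to short label
label list occuplab
* Original string next to the underlying numeric code
list occup occup1, nolabel
* Original string next to the short label actually attached
list occup occup1
```

Each short label should sit beside the occupation it abbreviates; if not, the label define statement needs to be reordered.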
11. Common data analysis applications
! For simplicity of illustration, the rheumatoid arthritis data
introduced earlier will be used in all the examples, some of
which may be contrived or inappropriate
! The examples shown below assume that the Stata dataset has
been loaded into the work space through input of the raw data
or by loading a saved dataset (e.g., use alt3-1ex\alt3-1ex.dta)
summarize age sadose si , detail
11.2 Stem-and-leaf charts
11.3 Boxplots
! Variables:
— Subgrouping: reac
— Analysis: age
sort react
11.4 Confidence interval for a mean
! Calculate a 95% confidence interval for the mean value of a
variable
! Variable: age
! Command:
ci age
! Calculate a 95% confidence interval for the proportion positive in a
binomial distribution. Stata calculates exact binomial limits.
Note: Stata can also calculate limits for the mean of a Poisson
distribution using the poisson option of the ci or cii commands.
. cii 65 17
! Poisson example (27 deaths, 645 person-years):
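The confidence-interval commands described in this part of the notes can be collected into one short sketch. The syntax shown is the one in use when these notes were written; newer Stata releases (14 and later) expect a keyword after ci or cii, which is given in the comments. The Poisson line assumes the exposure (person-years) is entered first and the event count second.

```stata
* Mean of age (Stata 14+: ci means age)
ci age
* Exact binomial limits, 17 positives out of 65 (Stata 14+: cii proportions 65 17)
cii 65 17
* Exact Poisson limits, 27 deaths in 645 person-years (Stata 14+: cii poisson 645 27)
cii 645 27, poisson
```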
! Used to test equality of means. It comes in 3 forms:
— Test that variable has a mean equal to a specific #: this is
the one-sample t-test
— Test that variable1 has the same mean as variable2: this
is the paired t-test
— Test that variable has the same mean within two groups
defined by a grouping variable groupvar: this is the
two-sample t-test
Note: Stata gives p-values for the t-tests, but also gives 95%
confidence intervals on means and differences in means
! Variables: age with reac as the subgrouping variable
! Commands:
— Paired t-test: (Stupidly, for illustration) test mean sadose = si
ttest sadose = si
— Two-sample t-test: Test age means are equal within reaction
groups
! Immediate forms of commands can be used as a “calculator” to
get t-test given summary data on n, and the observed means
and standard deviations (sd):
— One-sample test (n=24, observed mean=62.6, sd=15.8; test
mean=75)
! Use to test equality of proportions within two subgroups
Note: Stata gives the 2x2 chi-square test and p-value. It also
gives the Fisher’s exact test p-value
! Commands:
! Immediate forms of commands can be used as a “calculator” to
test equality of proportions in a 2x2 table. Enter the rows of the
table separated by a “\” character:
tabi 24 24 \ 13 4 , chi2 exact
11.8 Correlation
! Obtain either the Pearson’s or Spearman’s (rank) estimated
correlation coefficient of two measured responses x and y
! Variables: age and si
! Commands:
Note: Pairs of correlations among a set of variables may be
obtained by specifying the list of variables. E.g., to obtain
age-sadose, age-si, and sadose-si correlations:
corr age sadose si
! Estimate simple linear model relating a measured response
(dependent) variable y to a fixed covariate (independent)
variable x: y = α + βx + ε
Stata produces an analysis of variance, p-values, coefficient
estimates, standard errors, and 95% confidence intervals
graph export alt3-1ex\lreg.wmf,replace
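The t-test and correlation commands described on this page can be sketched as follows. This is a minimal illustration using only the variables named in these notes (age, sadose, si, and the grouping variable reac); it assumes the rheumatoid arthritis dataset is already loaded and that reac takes exactly two values.

```stata
* Paired t-test (contrived): mean of sadose equals mean of si
ttest sadose = si
* Two-sample t-test: age means are equal within reaction groups
ttest age, by(reac)
* Immediate one-sample t-test: n=24, mean=62.6, sd=15.8, hypothesized mean=75
ttesti 24 62.6 15.8 75
* Pearson and Spearman (rank) correlations of age and si
corr age si
spearman age si
```

The immediate form ttesti needs no dataset in memory, which is what makes it usable as a calculator on published summary statistics.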
! Used to test equality of means within two or more subgroups;
usually 3 or more, as the t-test is usually used for 2 groups
! Command:
oneway si reac
help regress
! Also see Stata User’s Guide Chapters 26 and 35 (in the handout
for Part 1) for more details on fitting regression models
! Use ologit for logistic regression for ordered responses with more
than 2 categories
! Use mlogit for logistic regression for responses with more than 2
categories (not ordered)
help logistic
help clogit
help ologit
help mlogit
! Also see Stata User’s Guide Chapters 26 and 35 (in the handout
for Part 1) for more details on fitting regression models
11.13 Epidemiologic calculations - epitab
help epitab
For convenience, the Help text is included below
. help epitab
-------------------------------------------------------------------------------
help for epitab, ir, iri, cs, csi, cc, cci, mcc, mcci (manual: [R] epitab)
-------------------------------------------------------------------------------
Description
-----------
ir is used with incidence rate (incidence density or person-time) data; point
estimates and confidence intervals for the incidence rate ratio and difference
are calculated along with attributable or prevented fractions for the exposed
and total population. iri is the immediate form of ir; see help immed.
Also see help nbreg, help poisson and help stcox for related commands.
cs is used with cohort study data with equal follow-up time per subject and,
in some cases, cross-sectional data. Risk is then the proportion of subjects
who become cases. Point estimates and confidence intervals for the risk dif-
ference, risk ratio, and (optionally) the odds ratio are calculated along with
attributable or prevented fractions for the exposed and total population. csi
is the immediate form of cs; see help immed. Also see help logistic and help
glogit for related commands.
mcc is used with matched case-control data. McNemar's chi-squared, point esti-
mates and confidence intervals for the difference, ratio, and relative differ-
ence of the proportion with the factor, along with the odds ratio, are calcu-
lated. mcci is the immediate form of mcc; see help immed. Also see help
clogit for a related command.
Options
-------
exact requests Fisher's exact P be calculated rather than the chi-squared and
its significance level. We recommend specifying exact whenever samples are
small. A conservative rule-of-thumb for 2x2 tables is to specify exact
when the least-frequent cell contains fewer than 1,000 cases. Note that
exact does not affect whether exact confidence intervals are calculated;
commands always calculate exact confidence intervals where they can unless
tb or woolf is specified.
by(varname) specifies that the tables are stratified on varname. Within-
stratum statistics are shown then combined with Mantel-Haenszel weights.
If estandard, istandard, or standard() is also specified (see below), the
weights specified are used in place of Mantel-Haenszel weights.
woolf requests that the Woolf approximation, also known as the Taylor expan-
sion, be used for calculating the standard error of the odds ratio. Other-
wise, the Cornfield approximation is used. The Cornfield approximation
takes substantially longer (a few seconds) to calculate than the Woolf
approximation. This standard error is used in calculating a confidence
interval for the odds ratio. (For matched case-control data, exact con-
fidence intervals are always calculated.)
estandard external weights are the person-time for the unexposed (ir),
the total number of unexposed (cs), or the number of unexposed controls
(cc).
istandard internal weights are person-time for the exposed (ir), the total
number of exposed (cs), or the number of exposed controls (cc). istandard
can be used, among other things, to produce standardized mortality
ratios (SMRs).
standard(varname) allows user-specified weights. varname must contain
a constant within stratum and be nonnegative. The scale of varname is
irrelevant.
The basic syntax (ignoring options) for iri is "iri #a #b #N1 #N2".
For example:
The basic syntax (ignoring options) for ir is "ir case_var ex_var time_var".
case_var contains the number of cases represented by an observation. ex_var
contains 0 if the observation represents unexposed and nonzero (e.g., 1) if the
observation represents exposed. time_var contains the exposure time (e.g.,
person-years) represented by the observation. ir obtains the table by summing
across observations. Observations with missing values are not used.
. list
cases exposed time
1. 20 1 14000
2. 21 1 14010
3. 15 0 19017
. ir cases exposed time, level(90)
(output omitted)
. gen wgt=1
. ir deaths exposed pyears, by(agegrp) standard(wgt)
Exposed Unexposed
------------+---------------------
Cases | a b
Noncases | c d
. csi 7 12 9 2, exact
The basic syntax (ignoring options) for cs is "cs case_var ex_var". case_var
contains 0 if the observation represents a noncase and nonzero (e.g., 1) if it
represents a case. ex_var contains 0 if the observation represents unexposed
and nonzero (e.g., 1) if it represents exposed. Frequency weights are
allowed.
. list
case exp pop
1. 0 0 2
2. 0 1 9
3. 1 0 12
4. 1 1 2
5. 1 1 5
. gen wgt=1
. cs case exposed [freq=pop], by(age) standard(wgt)
cc and cci work just like cs and csi. They differ in that they report the
odds ratio rather than the risk ratio.
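As a worked check of what csi and cci report, the measures for the table in the csi example (a=7, b=12, c=9, d=2, laid out as in the Exposed/Unexposed table) can be computed by hand with display. This is only arithmetic on the four cell counts, offered as a sketch of the definitions rather than of the confidence-interval methods.

```stata
* Risk in exposed = a/(a+c); risk in unexposed = b/(b+d)
display 7/(7+9)
display 12/(12+2)
* Risk difference and risk ratio
display 7/(7+9) - 12/(12+2)
display (7/(7+9)) / (12/(12+2))
* Odds ratio = (a*d)/(b*c)
display (7*2)/(12*9)
* Same table passed to the immediate commands
csi 7 12 9 2, exact
cci 7 12 9 2, exact
```

The hand calculations give risks of 0.4375 and about 0.857, a risk difference of about -0.42, a risk ratio of about 0.51, and an odds ratio of about 0.13; the point estimates printed by csi and cci should agree with these.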
Also see
--------
11.14 Sample size and power calculations
help sampsi
! Also see the free sample size software from Dupont and Plummer
– “Other Links” on the course website Home page
For convenience, the Help text is given below:
. help sampsi
-------------------------------------------------------------------------------
help for sampsi (manual: [R] sampsi)
-------------------------------------------------------------------------------
Description
-----------
Options
-------
alpha(#) specifies the significance level of the test; the default is
alpha(.05). (More correctly, the default is 1-level/100 from set level,
see help level.)
n1(#) specifies the size of the first (or only) sample and n2(#) specifies
the size of the second sample. If specified, sampsi reports the power
calculation. If not specified, sampsi computes sample size.
ratio(#) is an alternative way to specify n2() in two-sample tests. In a
two-sample test, if n2() is not specified, n2() is assumed to be
n1()*ratio(). That is, ratio() = n2()/n1(). The default is
ratio(1).
sd1(#) and sd2(#) are the standard deviations for comparison of means. If
not specified, comparison of proportions is assumed. In two-sample
cases, if only sd1() is specified, sd2() is assumed to equal sd1().
Examples
--------
Compute power with n1 = n2, sd1 = sd2, and alpha = 0.01 one-sided:
Compute power:
. sampsi 0.5 0.6, n1(200) onesam
Also see
--------
Manual: [R] sampsi
On-line: help for immed
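A minimal sketch of the two uses of sampsi described in the options above: leaving out n1() requests a sample size, while supplying it requests power. The proportions, means, and standard deviation are made-up illustrative values, not data from these notes. In Stata 13 and later the power command supersedes sampsi; the last line shows a corresponding call, on the assumption that the reader has such a version.

```stata
* Sample size: two-sample comparison of proportions 0.25 vs 0.40 with 90% power
sampsi 0.25 0.40, power(0.90)
* Power: two-sample comparison of means, 50 per group, common sd of 10
* (n2 defaults to n1*ratio with ratio(1); sd2 defaults to sd1)
sampsi 100 105, sd1(10) n1(50)
* Stata 13+: the same sample-size question with the power command
power twoproportions 0.25 0.40, power(0.90)
```

The two sample-size answers need not match exactly, because sampsi applies a continuity correction to comparisons of proportions by default.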