Using SPSS: Prepared by Pam Schraedley January 2002
This document was prepared using SPSS versions 8 and 10 for the PC. Mac versions or other
PC versions may look slightly different, but most instructions here should still work.
Table of contents
1. Getting Started
1.1 Entering data from scratch
Defining Variables (SPSS version 8)
Defining Variables (SPSS version 10)
1.2 Importing data from Excel
2. Getting your data in shape
2.1 Calculating variables
2.2 The If button
2.3 Recoding Variables
Recoding into Same Variables
Recoding into Different Variables
Special case: Median (or tertile or quartile) splits
2.4 Select cases
2.5 Merging files
Adding cases
Adding variables
3. Analyzing your data
3.1 Independent Samples t-test
3.2 Paired t-test
3.3 Oneway simple ANOVA
3.4 Chi square contingency test
3.5 Correlations (simple and partial)
3.6 Regression
3.7 ANOVA models and GLM
Repeated Measures
3.8 Reliability
4. Taking a look at your data
4.1 Checking the numbers
Frequencies
Tables
4.2 Graphing and plotting
Scatterplots
Histograms
Bar charts
5. Output
5.1 Organizing your output
5.2 Results Coach
6. Using Syntax
6.1 The Paste function
6.2 Creating a Session Journal
7. For more information
1. Getting started
1.1 Entering data from scratch:
You will first want to create a template into which to enter data by defining variables. This is
done differently in SPSS 8 and SPSS 10, and is the most commonly used feature that differs
between the 2 versions.
Defining variables (SPSS version 8)
Under the Data tab, click Define Variable.
Important note about entering data in SPSS:
SPSS likes it best when all of the data for one subject are on one line. For doing paired
t-tests, repeated measures ANOVAs and complicated ANOVA designs, your life will be easier if you
enter your data this way. If you generally have a huge matrix of data for each subject in which
this would be prohibitive, maybe SPSS is not the stats package for you.
Type your variable name into the Variable Name Box (circled in red above). Variable names
must be 8 characters or fewer. Specify the variable Type by clicking on Type (circled in green
above). Numeric is the default, but date and string are other common types. If numeric, you can
specify the number of decimal places here. Specify whether your variable is scale, ordinal, or
nominal (circled in purple above). It has to be scale if you want to do things like add and
average it or to do typical statistics like t-tests. Specify labels for your variables by clicking on
Labels (circled in blue above). I strongly recommend that you do this. Many a grad student
has come back to their data a year later and had no idea what boms47 meant. Here is the Labels
dialog box:
Specify a Variable label (i.e. tell yourself that boms47 is the Brat-o-meter Scale, question #47,
hairpulling). Enter this information in the Variable label box (circled in red). Then specify
value labels if appropriate. For example, entering 1 may mean the person responded with "I
never pull people's hair"; 2 means "I pull people's hair occasionally"; 3 means "I pull people's
hair often"; etc. In that case, you would enter 1 in the Value box above (circled in green), enter
"I never pull people's hair" into the Value label box (circled in blue), then click Add (red arrow).
Your new value label will appear in the box circled in purple. Do that for value=2, 3, and so on
until you have all of your values entered. Then click Continue to return to the Define Variable
dialog box.
Click OK in the Define Variable dialog box, and that variable will be created. If you want to do
a whole slew of similar ones of these (e.g. boms1 - boms50), there may be easier ways. You can
do one, and then copy and paste the syntax to create all of your variables. I'll explain how to do
this in the Using Syntax section below.
Defining variables (SPSS version 10)
The good news is that defining variables got much easier in SPSS version 10.
At the opening screen, you will see two tabs at the bottom of the grid (circled in red below): You
start out in the Data View tab. You can click on the Variable View tab to define variables.
Once in Variable View, you can enter a variable name in the first
column, labeled name. Here, I have entered our old boms47 into the
first column, and all of the defaults have filled themselves in:
At the red arrow, you can see all of the characteristics of your variable
that can be specified, including our old Label (meaning variable label)
and Values (meaning value labels). Again, I strongly recommend that
you use variable and value labels. If you click in the Values box for
your variable, you will see 3 little dots in a box (circled in red below).
Clicking on those dots will pop up the Value labels dialog box (circled in
green below). You can add value labels using this dialog box in the
same way you did in version 8.
Once you have your data in Excel 4.0 format, open SPSS. Click on File→Open→Data (red
arrow below), which will open the Open File dialog box. Under "Files of type" choose Excel
(*.xls) (green arrow below) to show your Excel 4.0 file. Choose your file (note: it must not be
currently open in Excel or you will not be able to open it in SPSS).
Once you are finished with your If condition, clicking Continue will return you to the Compute
Variable dialog box, or whatever box you were in prior to clicking the If button.
For example, let's say that boms46 is a reverse coded item (e.g. 1 is "I am a big brat"; 2 is "I am a
medium sized brat"; etc.), so 1 becomes 4, 2 becomes 3, and so on.
Recoding into Same Variables
To the left is the Recode into Same Variables dialog box. I have clicked boms46 over into the
Numeric Variables box to be recoded. You can send over more than one variable at a time if they
will need the same recoding operation (e.g. do all of your reverse coded items at once). I will
then click the Old and New Values button (circled in red). You will also see our old friend the If
button, which will let you specify the conditions under which you want to recode.
Clicking on Old and New Values (above) brings up the Old and New Values dialog box below.
Entering a 2 into the Old Value box (red arrow) and a 3 into the New Value box (green arrow)
and then clicking the Add button (circled in red) will make all 2s in the boms46 column change
to 3s. Once you add them, they will appear in the Old → New box where the 1 → 4 already
appears. You can also change a range of numbers using the three Range options (outlined in
purple) or change all remaining values to some value. For example, you could recode all other
values to system missing by clicking All other values on the left (inside the purple square) and
System missing on the right (blue arrow), then clicking Add. Or change missing values to zeros
by clicking System missing on the left (above the purple box) and entering zero in the New Value
box on the right (green arrow), then clicking Add. Don't forget to click Add. (It's easy to forget.)
When you're done, click Continue to go back to the Recode dialog box. Then click OK.
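As a rough illustration of what the Old → New list does, here is a minimal sketch in Python (not SPSS syntax). The response values are made up, None stands in for system missing, and the helper name recode is hypothetical.

```python
# Hypothetical responses to boms46 on a 1-4 scale; None stands in for system missing.
boms46 = [1, 2, 4, 3, None, 2]

# Old -> New pairs, as they would be listed in the Old and New Values dialog box.
reverse_map = {1: 4, 2: 3, 3: 2, 4: 1}


def recode(values, mapping, else_value=None):
    """Return a recoded copy; any value not listed in the mapping becomes else_value."""
    return [mapping.get(value, else_value) for value in values]


recoded = recode(boms46, reverse_map)
print(recoded)  # [4, 3, 1, 2, None, 3]
```

Passing else_value plays the role of the "All other values" option: anything you did not list explicitly ends up with that one value.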
Values button (circled in blue to left). This will send you to an identical dialog box to that for
Recode into Same Variables (above). Follow those instructions to recode your variable(s), then
click Continue and OK. Your new variable will be at the end of your dataset.
Special case: Median (or tertile or quartile) splits
One common form of recoding is to divide your variable values into two groups, split at the
median (or into four quartile groups, etc.). To do this in SPSS, there is a secret function in Rank
Cases. Click on Transform→Rank Cases (circled in red below). This will bring up the Rank
Cases dialog box (below). Click over the variable(s) you want to recode (circled in green below)
then click the Rank Types button (circled in blue below).
You should leave the default of Assign Rank 1 to Smallest Value (red arrow above) unless you
want your highest values to be assigned a value of 1 in your new recoded variable. Clicking on
Rank Types (circled in blue above) will get you to the Rank Cases: Types dialog box below.
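To make the idea of a median split concrete, here is a minimal sketch in Python (not SPSS syntax). The scores are made up, and the rule that values equal to the median go into the lower group is an assumption; SPSS may place tied values differently.

```python
import statistics

# Hypothetical total scores; these are not data from this guide.
bomstot = [12, 18, 25, 31, 31, 40, 44, 52]

median = statistics.median(bomstot)

# 1 = at or below the median, 2 = above the median (the tie rule is an assumption).
nbomstot = [1 if score <= median else 2 for score in bomstot]

print(median)
print(nbomstot)
```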
When you filter cases, a diagonal line will go through the case
number as shown to the right (column indicated by the red arrow)
for cases that are being filtered out. That is, for cases that are NOT
selected. In this case, I selected cases if boms45 <= 2, so all 3s
and 4s are filtered. Any analyses I do at this point will not include
any subjects who scored a 3 or 4 on boms45. Don't forget to
Select all cases again when you are done. Incidentally, this also
creates a variable called FILTER_$ in your dataset that takes a
value of 1 if the case is selected and 0 if it is filtered out. You can
ignore that variable if you like, but sometimes it can be useful.
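The filter variable is just a 1/0 flag per case. A minimal Python sketch of the same logic (not SPSS syntax; the values are made up and the name filter_flag is a stand-in for FILTER_$):

```python
# Hypothetical boms45 responses for five cases.
boms45 = [1, 3, 2, 4, 2]

# 1 = selected (boms45 <= 2), 0 = filtered out, mirroring the filter variable.
filter_flag = [1 if value <= 2 else 0 for value in boms45]

# Later analyses would only see the selected cases.
selected = [value for value, keep in zip(boms45, filter_flag) if keep == 1]

print(filter_flag)  # [1, 0, 1, 0, 1]
print(selected)     # [1, 2, 2]
```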
Adding cases
To add cases to an existing data file, go to Data→Merge Files→Add Cases (circled in red below).
That will pop up the Add cases: Read file window shown below. Click on the file that contains
the cases you need to add. In this case, that is boms3.sav.
Adding variables
Adding variables is a little more tricky. You will need a variable with the same name in both
files (for example, id). Before you start, you have to sort BOTH data files in ascending order by
that variable, which SPSS calls a key variable. For example, you will see in the case to the right
that the variable id (our key variable) is NOT sorted (red arrow to right). Merging to add
variables will not work in this case. To sort by id, click on Data→Sort Cases (circled in green
below). This will pop up the Sort Cases dialog box. As you can see, I have clicked over id into the
Sort by box (red arrow below), and it is sorted in ascending order (the default). Once you do this
to both data files, you are ready to merge and add variables.
Go into one of your data files, and click Data→Merge Files→Add Variables (circled in red
below). This will pop up the Add Variables: Read File window. Choose the (sorted) file that has
the additional variables you want to add to your current (sorted) data file. In this case that is
boms2 (green arrow below). Click OK.
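Conceptually, adding variables matches rows from the two files on the key variable. Here is a minimal Python sketch of that matching (not SPSS syntax). The records and the age variable are made up, and note that this sketch uses a dictionary lookup, so unlike SPSS it does not actually need the second file to be sorted.

```python
# Hypothetical contents of the current file and the file holding the extra variable.
current = [
    {"id": 3, "bomstot": 41},
    {"id": 1, "bomstot": 35},
    {"id": 2, "bomstot": 28},
]
other = [
    {"id": 2, "age": 24},
    {"id": 1, "age": 31},
    {"id": 3, "age": 27},
]

# Sort the current file by the key variable, ascending, as in Sort Cases.
current_sorted = sorted(current, key=lambda row: row["id"])

# Index the second file by the key variable so each row can be looked up by id.
other_by_id = {row["id"]: row for row in other}

# Combine the variables of matching rows; unmatched ids simply gain no new variables.
merged = [{**row, **other_by_id.get(row["id"], {})} for row in current_sorted]

for row in merged:
    print(row)
```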
arrow button) over to the Excluded Variables box. This is a good way to clean up your dataset so
you are only looking at the variables you need. But make sure you keep your original data
somewhere so you don't have to re-enter it.
The window to the left above shows an outline of all of your output. I like to rename the tests so
I can see what I've done. For example, I would call this "T-test of bomstot by gender" (rather than
just "T-test"). I'll show you how to do that later in the Output section. You can see that SPSS has
spit out the two categories (female and male; red arrow above), the N for each group (green
arrow above) and the mean for each group (blue arrow above) as well as the standard deviation
and the standard error. Woohoo, boys are brattier than girls according to the means, but is it
significant? Levene's test for equality of variances (outlined in red above) is not significant, so the
variances can be assumed to be equal. In that case, you use the first line of results (in purple
above). If Levene's test had been significant, we would use the lower line of results (in
orange above). You can see the t value, degrees of freedom, and p value in the green box above,
and the 95% confidence interval for the difference in the blue box. In this case, men and women
are not significantly different on the Brat-O-Meter Scale, t(28)=-1.529, p=.137.
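For readers who want to see where the "equal variances assumed" t value and its degrees of freedom come from, here is a minimal Python sketch (not SPSS syntax) of the pooled-variance calculation. The scores are made up and the function name pooled_t is hypothetical; the sketch stops short of the p value, which needs the t distribution.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, n_a + n_b - 2


# Hypothetical scores; these are not the data behind the output described above.
female = [2, 4, 6]
male = [5, 7, 9]

t_value, df = pooled_t(female, male)
print(round(t_value, 3), df)
```

The sign of t only reflects which group is listed first, which is why a negative t like the one reported above is nothing to worry about.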
3.2 Paired t-test
To do a paired t-test in SPSS, we will use the Time 1 vs. Time 2 bomstot variables. This will test
whether people's brattiness differed between the first time point (let's say, right before a visit to
see parents) and the second (right after the same visit). Go to Analyze→Compare Means→Paired
Samples T-Test (circled in red below). This will pop up the Paired samples t-test dialog box below.
Click on the 2 variables that you want to compare (here bomstot and bomstot2; green arrows below)
then click the arrow button (circled in blue below). This will pair those two variables. Again,
the Options button only allows you to change the percentage on your confidence interval, and the
default is 95%.
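Under the hood, a paired t-test is a one-sample test on the difference scores. Here is a minimal Python sketch (not SPSS syntax) with made-up Time 1 and Time 2 scores; the function name paired_t is hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired t-test computed from the difference scores."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


# Hypothetical Time 1 and Time 2 totals for four subjects, one subject per position.
bomstot = [10, 12, 14, 16]
bomstot2 = [9, 10, 13, 12]

t_value, df = paired_t(bomstot, bomstot2)
print(round(t_value, 3), df)
```

This is also why the data need to be entered with one subject per line: each difference has to come from the same person.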
dataset (or 99s that someone entered as a missing value). Next, Levene's test for
homogeneity of variances is not significant (circled in red above) so equal variances can be
assumed. Next is a typical ANOVA table (outlined in green above) including SS, df, Mean
Squares, F, and p. This analysis is not significant (probably because the data are completely
random). Because you do not have a significant main effect, you should stop here, but we will
look at the output from the contrasts and post-hocs anyway as a learning exercise. In real life,
you do not look at these tests if your main ANOVA is not significant. The blue box above shows
the contrast coefficients; this is just a double-check. Next you have the contrast tests.
Because Levene's test above was not significant, you can use the first row of numbers (assume equal
variances). This table includes the contrast value (blue arrow above), the t value (purple arrow
above), df (orange arrow above), and significance (pink arrow above). In this case, the contrast
value was 3.30 and was not significant, t(27)=-.774, p=.446. Finally we come to the multiple
comparisons. In the blue box above, you can see the mean difference for each pairwise
comparison and the significance value. When a difference is significant, the mean difference is
starred. The purple box above shows the confidence intervals for the difference; these all
include zero, confirming that our differences are not significant.
3.4 Chi square contingency test
This is the question about SPSS that I have fielded more than any other question. This oft-used
test is just not where you would think. As an example, we can examine whether gender is
associated with scoring above or below the median on the bomstot variable (using our median
split nbomstot). Go to Analyze→Descriptive Statistics→Crosstabs (circled in red below). Click
your two categorical variables into the Row and Column boxes (it doesn't matter which goes into
which). Then click the Statistics button (green arrow to right). This will pop up the Crosstabs:
Statistics box below. Check the Chi-square box (red arrow below) to perform the Chi-square test
on your contingency table. Click Continue, then OK.
Also notice that the Crosstabs: Statistics box is where you would go to perform a Kappa
reliability test (blue arrow above). Kappa is the reliability statistic used when two raters make
categorical judgments rather than continuous ratings.
Below is the Chi-square test output. First, you'll see a Case Processing Summary (circled in
green to left). This will pop up in many of the statistics you do. It's good to check that you have
the expected number of cases included and are not missing large portions of data. Next is the
crosstab, or contingency table (red arrow to left). Finally the Chi-square test is reported (in blue
box to left). The Pearson Chi-square on the first line is the typical test used for data of this sort.
Notice that SPSS will warn you if you have expected cell counts lower than 5 (purple arrow to
left). This test should not be used in that case.
Note that the Chi-square model fit test is under Analyze→Nonparametric Tests→Chi-square.
This is a different test, one in which you assign expected values to cells and test the goodness of
fit of that model. The fact that these are called the same thing has tricked many an SPSS user.
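To show what the Pearson Chi-square on a contingency table is doing, here is a minimal Python sketch (not SPSS syntax). The counts are made up and the function name is hypothetical. It also reports the smallest expected cell count, which is the quantity behind the "lower than 5" warning; the p value is left out because it needs the chi-square distribution.

```python
def pearson_chi_square(table):
    """Return (chi_square, df, smallest_expected_count) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, smallest_expected


# Hypothetical counts: rows = gender, columns = below/above the median.
observed = [[10, 5], [5, 10]]

chi_square, df, smallest_expected = pearson_chi_square(observed)
print(round(chi_square, 3), df, smallest_expected)
```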
3.5 Correlations (simple and partial)
Simple correlations are a piece of cake in SPSS. You can do a whole slew of 'em if you want.
Go to Analyze→Correlate→Bivariate (circled in red below). Click over all of the variables that
you want to correlate. In this case, we have age, bomstot and bomstot2 (Time 1 and Time 2
brattiness). SPSS will compute all pairwise correlations. That's it; just click OK.
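"All pairwise correlations" means one Pearson r for every pair of variables you clicked over. A minimal Python sketch of that idea (not SPSS syntax), with made-up values for the three variables and a hypothetical helper pearson_r:

```python
import math
import statistics
from itertools import combinations


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four subjects; these are not data from this guide.
data = {
    "age": [20, 25, 30, 35],
    "bomstot": [40, 38, 30, 28],
    "bomstot2": [38, 37, 33, 30],
}

# One correlation for every pair of variables.
correlations = {
    (a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```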
3.6 Regression
The linear regression function in SPSS covers a lot of ground. Go to Analyze→Regression→Linear
(circled in red below). That will pop up the Linear regression dialog box shown below.
Enter your dependent measure (here we used bomstot) into the Dependent box (red arrow
below). Enter your independent variable(s) (here age) into the Independent(s) box (green arrow
below). You can enter more than one independent variable here. Choose a regression method if
you are using more than one independent variable using the pulldown menu (blue arrow below).
Enter is the default and is standard linear regression, but you can also use stepwise regression,
either forward or backward, enter (and remove) variables in blocks using the Previous and Next
buttons, etc. This is a very versatile dialog box. Of more common use are the Statistics, Save,
and Options buttons.
The Statistics button (outlined in green to left) brings up the Statistics window below. Checking
the Estimates box (red arrow below) gives you estimates for your regression coefficients (or
betas). Checking the Model fit box (green arrow below) gives you an R² for the regression model.
Checking R squared change will tell you the change in R² if each variable (when you have more
than one independent variable) is removed. Finally, checking Casewise diagnostics (purple arrow
below) will give you information on outliers outside a range that you specify (here 2 standard
deviations).
Clicking the Save button (outlined in purple above) allows you to save residuals of various kinds
from your regression in a column in your dataset (outlined in red to left below). This is useful in
examining residuals to look for patterns and in computing corrected means. Finally, clicking
the Options button (outlined in orange above) allows you to remove the constant from your
regression (forcing it to go through zero) by unchecking the Include constant box (orange arrow
to right below). It also gives some options for Stepwise regression.
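For the single-predictor case, the coefficients, R² and residuals that SPSS reports can be worked out by hand. Here is a minimal Python sketch (not SPSS syntax) of ordinary least squares with one independent variable; the data are made up and the function name simple_regression is hypothetical.

```python
import statistics


def simple_regression(x, y):
    """Return (slope, intercept, r_squared, residuals) for a least-squares line."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted by the fitted line.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_residual = sum(e ** 2 for e in residuals)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_residual / ss_total, residuals


# Hypothetical data: age predicting bomstot for four subjects.
age = [20, 25, 30, 35]
bomstot = [40, 38, 30, 28]

slope, intercept, r_squared, residuals = simple_regression(age, bomstot)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

The residuals list corresponds to what the Save button would add to your dataset as a new column of unstandardized residuals.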
you to assign some of your covariates as categorical. Output will also include a Chi-square
goodness of fit test (to test the goodness of your prediction) and a table of predicted values. A
full treatment of logistic regression is beyond the scope of this guide, but it is fairly
straightforward to use the SPSS functionality if you read and understand a chapter or so on the
statistical test that you are performing.
3.7 ANOVA models and GLM
SPSS offers pretty much any kind of ANOVA model you can think of. Let's start with a univariate
ANOVA. Actually, the univariate GLM encompasses ANCOVA as well. Go to Analyze→General Linear
Model→Univariate (circled in red to right). Click your dependent measure (continuous) into the
Dependent Variable box (outlined in green to right). Click over any fixed factors (ordinary ANOVA
factors, i.e. categorical variables) into the Fixed Factor(s) box (outlined in blue to right). Enter
any random effects factors (such as region, classroom, etc.; check a statistics textbook if you are
not sure) into the Random Factor(s) box (outlined in purple to right). Finally enter any continuous
predictors, or covariates, into the Covariate(s) box (outlined in orange to right). There is
generally some confusion about the meaning of the word covariate. Many people use covariate to
mean "a variable I don't care about", as in "I'll just covary out SES." But in statistical and SPSS
terms, a covariate is simply a continuous predictor. You CAN use this method to "covary out" age in
the above example, but you would use the exact same technique if you were interested in the effect
of age as well as your factor effects. Whew, now we have all of our factors and covariates in place,
but there's more. Click on the Model button (red arrow above) to specify anything less than a fully
crossed model. For example, let's say that we are interested in main effects of gender, birth order,
and age, as well as the interaction of gender and age, but no other interactions. We click on Model,
which pops up the Univariate: Model dialog box below to left. Click on the Custom radio button (red
arrow below to left) to specify a custom model. You will see that I have already sent over main
effects for gender and family and am about to send over the main effect of age (green arrow below to
left). Simply click on the effect you want to send over, then click the arrow button (outlined in
purple below to left). On the panel below and to the right, you can see I have sent over the
main effect of age, and also the interaction effect of age by gender (orange arrow below to right).
To do this just click on both age and gender, then while both are highlighted, click the arrow
button (outlined in purple below to left). Once you have the custom model you want, click
Continue.
Repeated measures
To give a full example of the functionality of the Repeated measures GLM, I have added 4 new
variables to our dataset. They are: bomsfam1, bomsfam2, bomsfrd1, and bomsfrd2. These
assess the family and friend subscales of the BOMS scale at Time 1 and Time 2. These will help
me to show an example of a fully crossed within-subjects design.
To run a repeated measures ANOVA, go to Analyze→General Linear Model→Repeated Measures
(circled in red to right). This will pop up the Repeated Measures: Define Factor(s) dialog box
below. Here, you enter each within-subjects factor in your design (saving your between subjects
factors for later). I have already entered the subscale (family vs. friends) factor (pink arrow to
right). To enter the time factor (Time 1 vs. Time 2), enter time in the Within-subject factor name
box (purple arrow to right), then enter the number of levels for this factor (blue arrow to right),
then click Add (green arrow to right). Once you have added all of your within-subjects factors,
click the Define button (orange arrow to right).
This will pop up the Repeated measures dialog box below. Here you can enter your between
subjects factors (here, birth order, blue arrow below) and covariates (here, age, orange arrow
below). You also need to define your within subjects variables at this point.
Sphericity Assumed row in your ANOVA table (red arrow to right). Otherwise, in most cases, you
can assume Sphericity. In fact, in most cases, all rows within a cell of this table will look the
same. This table also gives information on the error terms for each group of tests; most
importantly, the MSE for these tests (green arrows to right). Next, SPSS prints out tests of
within-subjects contrasts (red arrow on next page). It does this even if you don't request it, and
uses linear trend contrasts as a default. These tend not to be useful to most people. You can
ignore this table too. Finally, you get to your between subjects effects ANOVA table (purple
arrow on next page).
You can see that Repeated measures GLM outputs quite a bit of material. You will probably
want to tidy this output up a little, which will be demonstrated in the Output section of this guide.
You can also see that we have a significant 3-way interaction in these data (subscale*time*family
above), thus showing that Type I error will give you a significant result every so often even when
nothing is going on.
3.8 Reliability
Another common analysis is to determine alpha reliability, either for scale or questionnaire
items or among raters or coders. In either case, the items (or people) to be compared must be
entered in columns and the subjects or observations must be entered in the rows. If you have
your data entered backwards, there is a transpose function in Excel's Paste Special window. In
this case, we will use our old BOMS items and determine reliability. Here we will look at
boms1-boms10. Go to Analyze→Scale→Reliability Analysis (circled in red below). This will
pop up the Reliability Analysis dialog box below. Click over all of your items or coders (here,
boms1-boms10) into the Items box. Make sure your Model is set to Alpha (orange arrow
below). You can also set this Model to split-half or some other forms of reliability. If you like,
you can press the Statistics button (outlined in green below). That will take you to a dialog box
in which you can do item analysis (e.g. get the alpha with each item of the scale deleted to see if
any items are pulling your alpha down, etc.). Otherwise, just press OK to see your alpha.
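To show why the items need to be in columns and the subjects in rows, here is a minimal Python sketch (not SPSS syntax) of the standard Cronbach's alpha formula: k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The responses are made up (three items instead of ten, to keep it short) and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(rows):
    """Return Cronbach's alpha; each row is one subject, each column is one item."""
    k = len(rows[0])
    # Variance of each item (column) across subjects.
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    # Variance of each subject's total score.
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four subjects (rows) by three items (columns).
responses = [
    [1, 2, 1],
    [2, 3, 3],
    [3, 3, 4],
    [4, 4, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Re-running the function with one column dropped gives the "alpha if item deleted" figure mentioned above.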
Tables
Tables are also a good way to get a quick look at what's going on in your data in preparation for
graphing. Go to Analyze→Reports→Case Summaries (circled in red below). Click over the variable
you want statistics for in your table (see green arrow below), and click over any grouping
variables (see blue arrow below). Here, we will look at means and standard errors for bomstot by
birth order. I prefer to uncheck the Display cases box (orange arrow below) because I don't want a
frequency table; I just want the summaries. You could leave that checked if you wanted a frequency
table at the same time.
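The summary table boils down to a mean and a standard error per group. A minimal Python sketch of that calculation (not SPSS syntax); the scores and the birth order labels are made up.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (birth order, bomstot) pairs, one per subject.
records = [
    ("first born", 30),
    ("first born", 34),
    ("first born", 38),
    ("later born", 40),
    ("later born", 44),
]

# Collect the scores for each level of the grouping variable.
groups = defaultdict(list)
for birth_order, score in records:
    groups[birth_order].append(score)

# Mean and standard error of the mean (sd / sqrt(n)) for each group.
summary = {
    name: (statistics.mean(scores), statistics.stdev(scores) / math.sqrt(len(scores)))
    for name, scores in groups.items()
}

for name, (mean, std_error) in summary.items():
    print(name, mean, round(std_error, 3))
```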
OK, it's pretty picture time. You can use scatterplots to get an idea about the relationship
between two variables, histograms to get an idea about the distribution of your variables, and bar
charts to help interpret interactions or to show your results to your friends and family (I include
grant reviewers in this category).
Scatterplots
To create a scatterplot, go to Graphs→Scatter (circled in red below). This will pop up the
Scatterplot dialog box in which you select a style of scatterplot. A simple scatterplot (red arrow
below) will serve most people's purposes. Choose your style then click Define. This will pop up
the Simple Scatterplot dialog box below. Choose your X and Y axes from your variable list
(green arrows below). Click on Titles (outlined in blue below) to add titles to your scatterplot.
You will probably not need to click on Options (outlined in orange below).
Histograms
We saw one way to create histograms using the Frequencies function in the last section. You can
also create them another way. Go to Graphs→Histogram (circled in red below). This pops up
the histogram dialog box (to left). Click over the variable you want to graph. You can click on
the Titles button to add titles. You can check the Display normal curve box (green arrow to left)
if you want a normal curve superimposed on your histogram.
below). Note that the Other summary function radio button must be clicked in order to create
this kind of bar chart. You could, instead, do a bar chart on number of cases, or percentage,
using one of the other radio buttons. You can change the summary function from mean (the
default) to median or some other function by clicking the Change summary button (purple arrow
below). Again, you can add titles by using the Titles button. In this case, I do generally click the
Options button and deselect (uncheck) the Display groups defined by missing values checkbox.
If you don't do this, you will get an extra group for anyone who is missing values in your dataset
and it gets in the way, in my opinion. Once you are done, click OK to see your bar chart.
5. Output
5.1 Organizing your output
As we mentioned before, some of these analyses spit out large amounts of output that you don't
really need. In addition, a happy day of data analysis can leave you with more tests than you can
handle, so keeping things organized is the goal of this section.
We have been kind of ignoring the left-hand side of the output window, the organizational part.
You can see in the output to right that it is hard from the output window to know exactly what
analyses were done. The first big help is to rename the tests. Instead of "T-test", report WHAT the
t-test was on. You can also click on the little minuses to temporarily hide analyses. Finally, you can
To the left is the Results Coach. Simply hit the Next button (green arrow to left) to cycle
through the information given by the coach. This is a very helpful feature.
6. Using syntax
There are two simple ways to start using syntax. Either you can save a specific analysis by using
the Paste function, or you can log your entire session in a Session Journal.
6.1 The Paste function
All of the functions that you can use in SPSS to compute variables, do statistics, and create
graphs have a little button near the Cancel and OK buttons called Paste. Here is an example
from the Univariate ANOVA case (red arrow below). Hitting the Paste button instead of the OK
button will paste the syntax associated with the action you are about to perform into a syntax
window (which will pop up automatically). Below you will see the syntax associated with this
analysis. To run this syntax, highlight the part you wish to run (all of it in this case) and then hit
the Arrow button (orange arrow below).