Using SPSS - Statistical Package for the Social Sciences
Introduction —
SPSS was launched in 1968. In 2009, IBM acquired SPSS. SPSS is a software package
for editing & analysing all sorts of data. The data may come from almost any source:
scientific research, a customer database, etc.
Uses of SPSS —
Although SPSS was initially developed for social scientists only, it is now
widely used by students & scholars of life sciences, physical sciences, chemical sciences &
earth sciences. We mention below a few uses of SPSS:
1. Statistical Analysis – SPSS is used for different statistical analyses such as basic
descriptive statistics, association between attributes, correlation and regression models,
analysis of variance, etc.
2. Manipulating Data – SPSS is also used for manipulating data with the help of functions
like recoding data, computing new variables, merging & aggregating data sets, etc.
3. Generating Tables & Graphs – Using SPSS one can generate desired tables using the
function called custom tables. SPSS can also be used to draw diagrams such as bar
diagrams, pie diagrams, pictograms, etc. One can also draw graphs such as line diagrams,
histograms, frequency polygons, scatter diagrams, etc.
Different Windows in SPSS –
SPSS has basically five different windows, each associated with a particular SPSS file type:
1. Data Editor Window – It opens at start-up & is used to enter & store data in a spreadsheet
format. The data editor window has two views:
(a) One is data view.
(b) The other is variable view.
2. Output Viewer Window – This window opens automatically when one
executes an analysis or creates a graph, using a dialog box or command syntax to
execute a procedure. It contains the results of all statistical analyses & graphical displays
of data.
3. Syntax Editor Window – It is a text editor where one composes SPSS commands &
submits them to the SPSS processor.
4. Dialog Box – These are associated with menus & submenus & provide the options for the
analyses & calculations requested by users. There are two types of dialog boxes:
(i) Menu Dialog Box – Opens from main menu & is closed when one clicks OK or
Cancel.
(ii) Submenu Dialog Box – Opens from options within a menu dialog box & is closed
when one clicks Continue or Cancel.
5. Chart Boxes – These show charts & graphs in the output viewer window, which can be
processed separately.
Types of Variables/Attributes —
Variables are broadly classified as scale variables & categorical variables. Ungrouped
numerical data, e.g., age, height, weight, etc., represent scale variables. Categorical
variables may again be classified as nominal & ordinal. Nominal variables are those
categorical variables whose categories do not follow any ordering, e.g., Gender (Male,
Female, Transgender). Ordinal variables are those categorical variables whose categories
follow some ordering, e.g., Income categories (Low, Middle, High).
                         Variables
Ordinal                          Nominal
(They follow some order)         (They do not follow any order)
Income Categories                Gender
Low                              M
Middle                           F
High                             T
Start-up & Data Editor Window —
To start SPSS one must click START, point to All Programs, point to SPSS for Windows, & click
IBM SPSS 20.0. A new window will then appear. To start the data editor for creating a new
data set one must click File, point to New, & click Data. A data editor window will then
appear, which displays the contents of the working data set arranged in a spreadsheet format,
with variables in columns & cases in rows. There are 2 sheets in the window:
1. Data View, which is visible when the data editor opens.
2. Variable View, which can be accessed by clicking the Variable View tab. It contains
information about the variables stored with the data set.
In the worksheet, each row represents a case (an individual or entity) & each column represents
an attribute or a variable. Data can be typed in or read from stored files using the data
editor window. In addition, descriptions of variables & values may be entered & saved in the
data editor window.
Data Description Consists of Multiple Items —
1. Naming a Variable – Default names of variables in SPSS are VAR00001, VAR00002, etc. These
names can be replaced with user-defined names. Older versions limited names to 8
alphanumeric characters; recent versions allow up to 64. Variable names should start
with a letter & may also contain digits. Blank spaces are not allowed in a variable name.
The underscore ( _ ) may be used, but most other special characters are not allowed.
2. Defining Type of Variable – The variables may be numeric, string or date, etc.
3. Defining Width of Variable – The width of the variable is defined here. By default, a numeric
variable has a total width of 8 characters, including 2 decimal places. A string variable
has a default width of 8 characters.
4. Assigning Labels to Variables – One can attach descriptive labels to the variables named in
item 1 above. These labels, rather than the variable names, appear in the output, easing
interpretation of results.
5. Assigning Values – Labels can be assigned to the individual values of a variable (usually
a categorical variable).
6. Missing Values – It allows one to define which values of a variable should be treated as
missing.
7. Columns, Align & Measure – Generally default values & type are kept.
Graphical Representation Using SPSS —
SPSS can be used to prepare a number of different graphs/charts. A few important ones are
mentioned below:
1. Bar – Here the X-axis represents a categorical variable & the Y-axis represents some scale
variable. There should be only one 𝑦-value for each 𝑥-value. The lengths of the bars
represent the values of the scale variable. Bar diagrams may be classified as
simple, clustered (multiple), and stacked (sub-divided).
2. Line Diagram – Here both the X-axis & the Y-axis are used for scale variables. There should
be only one 𝑦-value for each 𝑥-value. Here the (𝑥, 𝑦) points may or may not be joined
by line segments.
3. Pie Diagram – A circle is segmented into portions whose areas or arc lengths represent
𝑦-values. Either categorical 𝑥-values or scale 𝑦-values may be used to label the
segments.
4. Scatter Diagram – This is similar to line diagram with the exception that there may be
more than one 𝑦 – value for some or all 𝑥 – values.
5. Histogram – A special type of bar diagram where 𝑦-values are frequencies, proportions
or cumulative values thereof.
6. Box Plot – A box plot is a convenient way of graphically depicting groups of numerical
data through five summary measures, namely the smallest observation, lower quartile (𝑄1),
median (𝑄2), upper quartile (𝑄3) & the largest observation. The box plot is a convenient
way of comparing two or more sets of data graphically.
7. Stem-and-Leaf Plot – It is a device for representing quantitative data in a graphical format
similar to a histogram, to assist in visualising the shape of a distribution. A basic
stem-and-leaf plot contains two columns separated by a dotted vertical line. The left column
contains the stems & the right column contains the leaves.
These charts may be plotted from raw data or computed data, as well as
from some statistics thereof. To draw them in SPSS, one must click Graphs, then point to
Interactive, then click on the graph type, e.g., Bar, Pie, Box Plot, etc.
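The stem-and-leaf layout described in item 7 can be sketched outside SPSS in a few lines of
plain Python. This is only an illustration: the function name and the marks are hypothetical,
and the sketch assumes non-negative whole numbers with tens as stems and units as leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot (stems = tens, leaves = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    rows = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows


# Hypothetical marks, used only for illustration.
marks = [41, 47, 52, 55, 58, 63, 63, 69, 70, 84]
for row in stem_and_leaf(marks):
    print(row)
```

Read sideways, the rows give the same impression of shape as a histogram with class width 10.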
Inserting & Deleting Cases & Variables —
To insert a new variable one has to click the variable name at the top of the column where
the new variable is to be inserted. The column is then highlighted, and one must click Data
and then click Insert Variable.
To insert a new case, one has to click on the row number. The row is then highlighted. Next
one must click Data & then click Insert Case.
To delete a variable one has to click the variable name & press the Delete key on the
keyboard. Alternatively, one may go to Edit and then click Clear.
To delete a case one has to click the row number & then press the Delete key on the keyboard.
The same thing can be done using Edit & then Clear.
Re-coding a Variable —
Sometimes one needs to modify the values of some categorical variable or wants to define a
new categorical variable from a scale variable. This can be done with the help of an option
called recoding of a variable. To do this one must click Transform, then click Recode
into Same Variables or Recode into Different Variables. A dialog box is displayed where old
values (or a range of values) can be replaced by new values.
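The idea of recoding a scale variable into a new categorical variable can be illustrated
outside SPSS with a short Python sketch. The cut-off values, the function name and the incomes
below are hypothetical; they only show how ranges of old values map to new values.

```python
def recode_income(income):
    """Map a scale value to an ordinal category (cut-offs are hypothetical)."""
    if income < 20000:
        return "Low"
    if income < 50000:
        return "Middle"
    return "High"


# Hypothetical incomes; the original values are kept and a new variable is created,
# which corresponds to recoding into a different variable.
incomes = [12000, 35000, 80000, 20000]
income_cat = [recode_income(x) for x in incomes]
print(income_cat)
```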
Computing New Variables —
Often one needs to compute values of a new variable using some mathematical formula
involving some other variables. This is known as computing a new variable. To perform this
one needs to click Transform & then click Compute. A menu dialog box is then displayed,
where one has to type the name of the new variable in the Target Variable box. The Numeric
Expression box will contain the expression defining the variable being computed.
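As a rough illustration of what a target variable and a numeric expression amount to, the
Python sketch below computes a body mass index for each case. The variable names, the data
and the choice of formula are assumptions made only for this example.

```python
# Hypothetical cases; each dictionary plays the role of one row of the data editor.
cases = [
    {"weight_kg": 50.0, "height_m": 1.60},
    {"weight_kg": 81.0, "height_m": 1.80},
]

for case in cases:
    # Target variable: bmi. Numeric expression: weight_kg / height_m ** 2.
    case["bmi"] = case["weight_kg"] / case["height_m"] ** 2

print([round(case["bmi"], 2) for case in cases])
```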
Sorting of Cases —
Sorting arranges rows of data in ascending or descending order of one or more variables. To
carry out sorting one must click Data, then click Sort Cases. A dialog box is displayed,
where the sort keys may be typed or selected using the options provided.
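Sorting on more than one key can be sketched in Python as follows. The cases and variable
names are hypothetical; the example sorts by gender in ascending order and, within each
gender, by age in descending order.

```python
# Hypothetical cases, used only for illustration.
cases = [
    {"name": "A", "gender": "F", "age": 30},
    {"name": "B", "gender": "M", "age": 25},
    {"name": "C", "gender": "F", "age": 22},
    {"name": "D", "gender": "M", "age": 41},
]

# First key: gender ascending. Second key: age descending (negated numeric value).
sorted_cases = sorted(cases, key=lambda c: (c["gender"], -c["age"]))
print([c["name"] for c in sorted_cases])
```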
Selecting Cases —
One can analyse a specific subset of data by selecting only the cases in which one is
interested. The Select Cases menu option will either temporarily or permanently remove the
cases one does not want from the data set. To do it, one must click Data, then click Select
Cases. A dialog box appears, displaying a list of the variables in the active data file
together with several selection options. Selecting any one of these options produces a
secondary dialog box, which prompts one for the particular specifications one is interested
in. For example, the If option leads to the "If condition is satisfied" dialog box, where
filtering can be done as per the wishes of the user. The Filtered option excludes the
unselected cases from subsequent analyses until the All cases option is selected again.
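The difference between filtering cases and deleting them can be illustrated with a small
Python sketch. The cases and the condition (females aged 25 or more) are hypothetical. The
full data set is left untouched, which mirrors the temporary nature of a filter.

```python
# Hypothetical cases, used only for illustration.
cases = [
    {"name": "A", "gender": "F", "age": 30},
    {"name": "B", "gender": "M", "age": 25},
    {"name": "C", "gender": "F", "age": 22},
    {"name": "D", "gender": "M", "age": 41},
]

# Condition to be satisfied: gender is F and age is at least 25.
selected = [c for c in cases if c["gender"] == "F" and c["age"] >= 25]

print([c["name"] for c in selected])  # cases used in the analysis
print(len(cases))                     # the original data set is unchanged
```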
Working with SPSS : Descriptive Statistics —
Several summary or descriptive statistics are available under the Descriptives option of the
Descriptive Statistics menu. To carry this out one must click Analyse, then
point the cursor to Descriptive Statistics and then click Descriptives. A menu dialog box
showing the names of variables on the left is displayed. The variables in the box on the
right are those for which descriptive statistics are to be computed. To view the available
descriptive statistics, one has to click Options, leading to a submenu dialog box where one
clicks on the desired boxes for obtaining these statistics in the output. After selecting the
statistics desired, output can be generated by first clicking Continue in the Options submenu
dialog box, then clicking OK in the Descriptives dialog box. The selected statistics will be
displayed in an output viewer window.
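For comparison, a few of the usual descriptive statistics can be computed with the Python
standard library. The ages below are hypothetical, and the standard deviation uses the
divisor n - 1, i.e., the sample standard deviation.

```python
import statistics

# Hypothetical ages, used only for illustration.
ages = [21, 25, 30, 24]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": statistics.mean(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation (divisor n - 1)
}
print(summary)
```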
Working with SPSS : Frequencies —
In general, frequencies are obtained for categorical variables only; one is usually not
interested in obtaining frequencies of a scale variable. To obtain frequencies of one or
more variables, one must click Analyse, then point to Descriptive Statistics & then click
Frequencies. A menu dialog box displays the names of variables on the left. The variables in
the box on the right are those for which frequencies are calculated. After selecting a
particular categorical variable, one has to click OK. A frequency distribution table of the
selected categorical variable will be displayed in an output viewer window. Several options
are available in the Frequencies dialog box, such as Statistics, Charts, etc.
Statistics produces a submenu dialog box with additional statistics like median, mode,
quartiles, percentiles, etc.
Charts produces another submenu dialog box allowing one to graphically examine data in
formats like bar chart, pie chart, histogram, etc.
If one unticks the Display frequency tables box, the frequency tables will not be displayed.
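A frequency distribution table of a categorical variable can be sketched in Python as below.
The income categories are hypothetical data; the table lists each category with its frequency
and percentage.

```python
from collections import Counter

# Hypothetical values of an ordinal variable, used only for illustration.
income = ["Low", "High", "Middle", "Low", "Middle", "Low"]

counts = Counter(income)
n = len(income)

# One row per category: (category, frequency, percent).
table = [
    (cat, counts[cat], round(100 * counts[cat] / n, 1))
    for cat in ["Low", "Middle", "High"]
]
for row in table:
    print(row)
```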
Importing a File in SPSS —
Sometimes, instead of entering data in an SPSS worksheet, a data set saved in another
spreadsheet program such as MS-Excel can be imported into an SPSS worksheet. For that purpose
one has to follow a few simple steps in SPSS. Open a blank SPSS file. Click File, then point
to Import Data, then click Excel, then select the already saved Excel file. Open the Excel
file, then click OK. A new SPSS data file opens up.
Chi – Square Distributions —
Independence of Attributes –
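We go to Analyse, then point to Descriptive Statistics, then go to Crosstabs. The chi-square
statistic is requested by ticking Chi-square in the Statistics submenu dialog box.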
         Under Weight   Normal   Obese   Highly Obese   Total
M             1            2       1          1            5
F             0            0       5          0            5
Total         1            2       6          1           10
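$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency & $E_i$ is the expected frequency of the $i$-th cell.
The expected frequency of a cell is the product of its row total & column total divided by
the grand total $N$. For e.g.,
$$E_{11} = \frac{\text{Total of row 1} \times \text{Total of column 1}}{N}$$
$$E_{24} = \frac{\text{Total of row 2} \times \text{Total of column 4}}{N}$$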
p – value :
p – value < 0.05 – 𝐻0 is rejected at the 5% level of significance; otherwise 𝐻0 is not rejected.
p – value < 0.01 – 𝐻0 is rejected at the 1% level of significance; otherwise 𝐻0 is not rejected.
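The expected frequencies and the chi-square statistic for the gender by weight-status table
above can be checked by hand with a short Python sketch. The variable names are illustrative,
and the p-value is not computed here since that needs the chi-square distribution function.

```python
# Observed frequencies from the table above (columns: Under Weight, Normal,
# Obese, Highly Obese).
observed = [
    [1, 2, 1, 1],  # M
    [0, 0, 5, 0],  # F
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of a cell = row total x column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square = sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected[0][0], expected[1][3], round(chi_square, 3), df)
# prints: 0.5 0.5 6.667 3
```

With 3 degrees of freedom the tabulated 5% critical value is 7.815, so a statistic of about
6.667 would not lead to rejection of 𝐻0 at the 5% level. Several expected frequencies here are
small, so the chi-square approximation is rough for this table.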
t – Distribution —
One Sample t – test –
Example 1 :
• The calorie intakes of a randomly chosen sample of 20 boys of a school are recorded.
• We are interested in testing whether the sample is consistent with a population average
calorie intake of 2000.
• We set up the null hypothesis, 𝐻0 : μ = 2000
against the alternative hypothesis, 𝐻1 : μ ≠ 2000
We go to Analyse, then point to Compare Means, then click on One Sample t – test.
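The statistic behind the one-sample t-test is t = (sample mean - μ0) / (s / √n) with n - 1
degrees of freedom. A minimal Python sketch follows. The calorie values are hypothetical (and
only 5 in number, to keep the arithmetic easy to follow), and the function name is
illustrative.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical calorie intakes, used only for illustration.
calories = [1900, 2100, 2050, 1950, 2200]
t_stat, df = one_sample_t(calories, 2000)
print(round(t_stat, 3), df)
```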
t – test for Difference of Means –
Example 2 :
• Gains in weight (in kg) of a particular animal fed on two diets A and B are given.
• We are interested in testing whether the two diets differ significantly as regards their
effect on increase in weight.
• The null hypothesis : there is no significant difference between the mean increases in
weight due to diets A & B.
We go to Analyse, then point to Compare Means, then go to Independent Samples t – test.
(We generally assume equal variances.)
We select gain in weight as the test variable and diet as the grouping variable.
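Under the equal-variance assumption, the test statistic uses a pooled variance. The Python
sketch below shows the calculation; the weight gains are hypothetical and the function name is
illustrative.

```python
import math
import statistics


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


# Hypothetical gains in weight (kg) under the two diets.
diet_a = [10, 12, 14]
diet_b = [8, 9, 10, 9]
t_stat, df = pooled_t(diet_a, diet_b)
print(round(t_stat, 3), df)
```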
Paired t – test –
We go to Analyse, then point to Compare Means, then go to Paired Samples t – test.
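A paired t-test is a one-sample t-test applied to the differences within each pair, with
𝐻0 : mean difference = 0. The Python sketch below illustrates this; the before and after
values are hypothetical and the function name is illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same four subjects before and after a treatment.
before = [60, 62, 58, 65]
after = [63, 64, 59, 69]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```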
F – Distribution —
Equality of Several Means –
Example 3 :
• In an experiment the factor A has 5 levels and the observations at each level are as
shown below :
𝐴1 – 55, 56, 54, 60
𝐴2 – 63, 60, 56, 60, 64, 62, 68, 41, 58, 40
𝐴3 – 63, 90, 72, 74, 77, 55, 81, 70, 80, 74
𝐴4 – 54, 50, 52, 55, 62, 57, 56, 61, 72
𝐴5 – 70, 57, 61, 42, 46, 53, 72, 48, 68, 50
• Here our null hypothesis 𝐻0 is that the effects of the different levels of A are equal.
We go to Analyse, then point to Compare Means, then go to One-Way ANOVA.
We select the observations in the Dependent List and the levels in Factor.
All the observations (series) should be entered in one column only, with the level of each
observation recorded in a second column.
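The F statistic of the one-way ANOVA for Example 3 can be sketched by hand in Python. The
data are the observations listed above; the variable names are illustrative, and the p-value
is not computed since that needs the F distribution function.

```python
# Observations for the five levels of factor A, as listed in Example 3.
groups = {
    "A1": [55, 56, 54, 60],
    "A2": [63, 60, 56, 60, 64, 62, 68, 41, 58, 40],
    "A3": [63, 90, 72, 74, 77, 55, 81, 70, 80, 74],
    "A4": [54, 50, 52, 55, 62, 57, 56, 61, 72],
    "A5": [70, 57, 61, 42, 46, 53, 72, 48, 68, 50],
}

all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)
grand_mean = sum(all_values) / n_total

# Between-groups sum of squares: group size x squared deviation of the group mean.
ss_between = sum(
    len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
)
# Within-groups sum of squares: squared deviations from each group's own mean.
ss_within = sum(
    (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
)

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(df_between, df_within, round(f_stat, 3))
```

The single list `all_values` corresponds to entering every observation in one column, with
the dictionary keys playing the role of the factor column.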
Correlation —
We go to Analyse, then point to Correlate, then go to Bivariate.
The correlation coefficient between two variables is positive and tends to 1 when the values
of 𝑥 & 𝑦 increase together, i.e., move in the same direction. Correlation should be
calculated for two scale variables.
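The behaviour of the correlation coefficient can be illustrated with a small Python sketch of
Pearson's r. The data are hypothetical and the function name is illustrative; the first pair
of variables moves in the same direction and the second in opposite directions.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two scale variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, used only for illustration.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]
opposite_direction = [10, 8, 6, 4, 2]

print(pearson_r(x, same_direction))      # perfect positive relationship
print(pearson_r(x, opposite_direction))  # perfect negative relationship
```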