All Lectures

This document provides an overview of different types of questions and variables that can be used in SPSS data entry. It discusses closed and open questions, including dichotomous, multiple response, and Likert scale questions. For variables, it outlines the levels of measurement including nominal, ordinal, interval and ratio scales. Different examples are provided for each type of question and variable. The goal is to introduce concepts necessary to properly define variables in SPSS.

Uploaded by

Abdirahman

SPSS data entry

Training session 2
Objectives
• To describe opening and closing SPSS
• To introduce the look and structure of SPSS
• To introduce the data entry windows: Data View and Variable View
• To outline the components necessary to define a variable
• To introduce the SPSS online tutorial
Uses for SPSS
• Data management
• Data analysis
Data management
• Defining variables
• Coding values
• Entering and editing data
• Creating new variables
• Recoding variables
• Selecting cases
Data analysis
• Univariate statistics
• Bivariate statistics
• Multivariate statistics
Opening SPSS
• Double click the SPSS icon on the desktop

OR

• Start/Programs/SPSS for Windows/SPSS


• The following introductory screen should appear:
The Data View window

• Cell edit field
• Cell information
• View tabs
• Status bar/boxes
Data View
• Rows represent cases or observations, that is, the objects on which
data have been collected:
• For example, rows represent the contents of a single treatment data
collection form, the information on an individual
• Columns represent variables or characteristics of the object of
interest:
• For example, each column contains the answers to the questions on the
treatment data collection form: age, gender, primary drug of use, etc.
Data Editor
• Data Editor comprises two screens:
• Data View: the previous screen
• Variable View: used to define the variables
• To move between the two:
• Use the View tab at the bottom of the screen
OR
• Ctrl + T
OR
• View/Variables from the Data View window
• View/Data from the Variable View window
Variable View
The data entry process
• Define your variables in Variable View
• Enter the data, the values of the variables, in Data
View
Definition of variables
10 characteristics are used to define a variable:

Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, Measure
Name
• Each variable must have a unique name of not more than 8 characters
and starting with a letter
• Try to give meaningful variable names:
• Describing the characteristic: for example, age
• Linking to the questionnaire: for example, A1Q3
• Keep the names consistent across files
Type
• Internal formats:
  • Numeric
  • String (alphanumeric)
  • Date
• Output formats:
  • Comma
  • Dot
  • Scientific notation
  • Dollar
  • Custom currency
Numeric
• Numeric variables:
• Numeric measurements
• Codes
• Definition of the size of the variable
String (alphanumeric)
• String variables contain words or characters; strings can include
numbers, but the numbers are treated as characters, so mathematical
operations cannot be applied to them
• The maximum size of a string variable is 255 characters
Date
• The input format for date variables must be defined, such as
DD/MM/YYYY, MM/DD/YYYY or MM/DD/YY
• Computers store dates as numbers from a base date; in SPSS, dates
are stored as the number of seconds from 14 October 1582
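As a rough illustration of this storage scheme, the sketch below (Python rather than SPSS syntax, purely to show the arithmetic; it ignores time-of-day and the calendar subtleties around the 1582 changeover) computes a date's stored value as seconds elapsed since 14 October 1582:

```python
from datetime import date

# SPSS stores dates internally as seconds since 14 October 1582.
SPSS_EPOCH = date(1582, 10, 14)

def spss_date_value(d):
    # Seconds elapsed from the epoch to date d (time-of-day ignored).
    return (d - SPSS_EPOCH).days * 86400

# One day after the epoch is one day's worth of seconds:
print(spss_date_value(date(1582, 10, 15)))  # 86400
```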
Example
• Create two variables:
• ID: the unique identifier, which will be alphanumeric
with a maximum of 8 characters
• Age: the age of the respondent measured in years, a
discrete variable ranging between 10 and 100
Click here
Click on the String radio button and change the characters to the size of the variable, 8 in this case. Click OK.
Click on the Type column in the second row and define a numeric variable with a maximum size of
3 and no decimal places.
Click on OK to continue.
Note that a number of default values have been entered into the remaining columns.
Labels
• Descriptors for the variables
• Maximum 255 characters
• Used in the output
Variable labels added
Values
• Value labels are descriptors of the categories of a variable
• Coding
Missing
• Defines missing values
• The values are excluded from some analysis
• Options:
• Up to 3 discrete missing values
• A range of missing values plus one discrete missing value
Click in the Missing Values column to obtain the dialogue box below. Enter the value 999 for Age.
Missing values added
Columns and Align
• Columns sets the amount of space reserved to display the contents of
the variable in Data View; generally the default value is adequate
• Align sets whether the contents of the variable appear on the left,
centre or right of the cell in Data View
• Numeric variables are right-hand justified by default and string
variables left-hand justified by default; the defaults are generally
adequate
Measure
• Levels of measurement:
• Nominal
• Ordinal
• Interval
• Ratio
• In SPSS, interval and ratio are designated together
as Scale
• The default for string variables is Nominal
• The default for numeric variables is Scale
Returning to Data View, the first two column headings will reflect the two variables created: ID and Age. Here the first six
observations have been entered.
Exercise: define the necessary variables and enter the following data
Saving the file
• The file must always be saved in order to save the work that has been
done to date:
• File/Save
• Move to the target directory
• Enter a file name
• Save
Summary
• Data Editor • Variable definition
• Data View • Name
• Variable View • Type
• File/Save • Width
• Decimals
• Label
• Values
• Missing
• Columns
• Align
• Measure
Training session 4

Types of question and types of variable
Objectives
• Define a range of classifications for questions and variables
• Discuss the use of levels of measurement in defining variables in SPSS
Types of question
• Closed, open
• “Factual” and attitudinal
Closed questions
• The respondent selects from a list of mutually exclusive and
collectively exhaustive answers
• The answers are pre-coded
Example
• Has the patient been in treatment prior to this episode?
☐ Yes (1)
☐ No (0)
Example
• In the last 30 days, how many times (if any) have you had 5 or more
drinks in a row?
☐ None
☐ 1
☐ 2
☐ 3-5
☐ 6-9
☐ 10 or more
“Other” Category
• An option on all but the simplest closed questions
• Ensures the list of options are exhaustive
• Allows flexibility in response
• Post-coded rather than pre-coded
Example
• Type of centre:
☐ Specialized treatment centre
☐ Therapeutic community
☐ General hospital
☐ Psychiatric hospital/unit
☐ Other (specify): …………………………..
Dichotomous questions
• A subset of closed questions
• There are only two possible answers
• The answers are mutually exclusive and collectively exhaustive
Examples
1. Gender:
☐ Male
☐ Female
2. Has the patient been in treatment prior to this episode?
☐ Yes
☐ No
Multiple-response questions
• The question allows more than one response
• The categories are not mutually exclusive
• Frequently, a grouping of dichotomous closed questions
Example
• Mode of ingestion of primary substance
(X all that apply):
☐ Swallow
☐ Smoke
☐ Snort
☐ Inject
☐ Other (specify): ……………………………….
Likert Scales
• A type of closed question
• Designed to measure attitudes
Example
• Do you disapprove of people doing each of the
following:
• Trying marijuana once or twice
☐ Don’t disapprove
☐ Disapprove
☐ Strongly disapprove
☐ Don’t know
• Smoking marijuana occasionally
• (options repeated)
Open questions
• There are no constraints on the respondent’s answer
• The answers cannot be predicted before the questionnaires are
presented
• The answers must be coded after the questionnaires are collected
Examples
1. Q30. Which new drugs or new patterns of use have been
reported?
2. Q13. Indicate primary substance of abuse, that is, the most
frequently used
3. Other (specify): ……………………..
Exercise: discussion
• Do Open or Closed questions appear more frequently in the
questionnaires used by your specific focal group? Give
reasons/possible explanations for these choices.
Response types
• Factual/attitudinal
• Direct/indirect
Types of variable
• Levels of measurement
• Types of variation
• Categorical vs. continuous
Levels of measurement
• Nominal
• Ordinal
• Interval
• Ratio
Nominal
• The data describe an attribute
• The set of possible values the variable can
contain are mutually exclusive and collectively
exhaustive categories
• The categories cannot be objectively measured
against each other
Examples: nominal data
• Gender: male and female
• Location: urban and rural
• Religion: Christian, Hindu, Muslim, Jew
• Race: white, black, coloured, mixed
• Referral source: self, employer, court
Ordinal
• The data are broken into categories that can be
ranked
• It is not possible to quantify the difference
between the categories
Example: ordinal
• Level of education:
None
Primary
Secondary
Tertiary
Interval
• The data are measured on a continuous scale,
not simply ranked
• The units of measurement are constant
• There is no absolute 0
Example: interval
• Temperature:
• Fahrenheit or Celsius
• Measured on a continuous scale
• No absolute 0
Ratio
• The data are measured on a continuous scale, not simply ranked
• The units of measurement are constant
• There is an absolute 0
Examples: ratio
• Age
• Income
• Temperature on the Kelvin Scale
Types of variation
• Nominal: equal categories
• Ordinal: ordered categories
• Interval and ratio: a continuous scale
Types of variation
• Qualitative: nominal
• Quantitative: interval and ratio
• Quantitative and qualitative: ordinal
Exercise:
identify the levels of measurement (make 10 respondents)

• Name of treatment centre
• Highest level of education completed
• Referral source
• Employment status
• Gender
• Current marital status
• Age
• How old was the patient when they first began using drugs regularly?
• Home language
• Region of permanent residence
Level of measurement in SPSS
• Nominal
• Ordinal
• Scale
Exercise: measure
• Return to Ex1.sav and set the level of measurement for the variables
ID, DRUG, AGE and COND
• Save the file
Summary
• Question types:
  • Closed/Open
  • Factual/Attitudinal
• Variable types:
  • Levels of measurement
  • Discrete (categorical)/continuous
  • Quantitative/qualitative
Training session 4

Data analysis: frequencies


Objectives
• Introduce univariate, descriptive statistics as the first step in a process
of data analysis, starting from exploration and moving towards more
sophisticated techniques
• Distinguish between frequencies and relative frequencies
• Introduce frequency and probability distributions as data models
Descriptive Statistics
• Univariate
• Categorical data
• Continuous data
SPSS Descriptive Statistics
• Analyse/Descriptive Statistics/Frequencies
• Analyse/Descriptive Statistics/Explore
• Analyse/Descriptive Statistics/Descriptives
Frequency vs relative frequency
• “The frequency of any value of a variable is the number of times that
value occurs in the data; that is, a frequency is a count. The relative
frequency of any value is the proportion or fraction or percent of all
observations that have that value.”
(D. S. Moore, Statistics: Concepts and Controversies, 5th ed. (New York, W. H. Freeman Press,
2000)).
Frequency distribution/probability
distribution
• Frequency distribution: all possible values of the variable and their
associated counts
• Probability distribution: all possible values of the variable and their
associated probabilities (relative frequencies)
Percentages
• Let:
  • f1 = the number of cases in category 1
  • n = the total number of cases
• The percentage of cases in category 1:

  (f1 / n) × 100%
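The frequency/percentage arithmetic can be sketched in Python (illustrative answers, not values from the course dataset):

```python
from collections import Counter

# Ten hypothetical referral-source answers
answers = ["Self", "Self", "Employer", "Self", "Welfare",
           "Self", "Employer", "Welfare", "Self", "Self"]

counts = Counter(answers)      # frequencies: counts of each value
n = len(answers)               # total number of cases
percent = {k: v / n * 100 for k, v in counts.items()}  # relative frequencies

print(counts["Self"])   # 6    (the frequency f1)
print(percent["Self"])  # 60.0 (f1 / n * 100 %)
```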
Exercise: frequency of referral
• Construct a frequency table for referral source in the file main.sav
Referral
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Self/Fam/Friends        586     37.3           38.0                38.0
        Employer                195     12.4           12.7                50.7
        Health Pro              194     12.3           12.6                63.3
        Religious Grp            65      4.1            4.2                67.5
        Hosp/Clinic              53      3.4            3.4                70.9
        Welfare                 252     16.0           16.4                87.3
        Courts/Corrections      100      6.4            6.5                93.8
        School                   64      4.1            4.2                97.9
        Unknown                  32      2.0            2.1               100.0
        Total                  1541     98.1          100.0
Missing System                   30      1.9
Total                          1571    100.0
Frequencies: Format button
Referral
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Self/fam/friends        586     37.3           38.0                38.0
        Welfare                 252     16.0           16.4                54.4
        Employer                195     12.4           12.7                67.0
        Health pro              194     12.3           12.6                79.6
        Courts/corrections      100      6.4            6.5                86.1
        Religious grp            65      4.1            4.2                90.3
        School                   64      4.1            4.2                94.5
        Hosp/clinic              53      3.4            3.4                97.9
        Unknown                  32      2.0            2.1               100.0
        Total                  1541     98.1          100.0
Missing System                   30      1.9
Total                          1571    100.0
Frequencies: Charts button
[Bar chart "Referral": Percent (0 to 50) on the y-axis; categories: Self/Fam/Friends, Welfare, Employer, Health Pro, Courts/Corrections, Religious Grp, School, Hosp/Clinic, Unknown]
Frequencies: Statistics button
Referral Statistics
N     Valid      1541
      Missing      30
Mode                 1
Frequencies: syntax
FREQUENCIES
  VARIABLES=refsourc
  /FORMAT=DFREQ
  /STATISTICS=MODE
  /BARCHART PERCENT
  /ORDER=ANALYSIS.
Exercise: frequencies
• Generate a frequency table and bar chart for each of the following
variables and comment:
• Race
• Education
• Employment
• Save the output and the syntax file
Frequency: Race
Race
                    Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Coloured          722     46.0           52.8                52.8
        White             520     33.1           38.0                90.8
        African           109      6.9            8.0                98.8
        Asian              17      1.1            1.2               100.0
        Total            1368     87.1          100.0
Missing System            203     12.9
Total                    1571    100.0
Frequency: Education
Education
                            Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Secondary                 978     62.3           64.3                64.3
        Primary                   332     21.1           21.8                86.2
        Tertiary                  189     12.0           12.4                98.6
        None/pre-primary           21      1.3            1.4               100.0
        Total                    1520     96.8          100.0
Missing System                     51      3.2
Total                            1571    100.0
Frequency: Employment
Employment
                            Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Working full-time         571     36.3           36.6                36.6
        Not working               569     36.2           36.4                73.0
        Student/pupil             240     15.3           15.4                88.3
        Working part-time          68      4.3            4.4                92.7
        Pensioner                  34      2.2            2.2                94.9
        Disabled                   33      2.1            2.1                97.0
        Housewife                  28      1.8            1.8                98.8
        Other                      18      1.1            1.2                99.9
        Apprentice                  1       .1             .1               100.0
        Total                    1562     99.4          100.0
Missing System                      9       .6
Total                            1571    100.0
Summary
• Frequencies and relative frequencies
• Frequency distributions and probability distributions
• Format/ordering
• Bar charts
• Statistics/mode
Data analysis: cross-tabulation
Training session 5
Objectives
• To introduce cross-tabulation as a method of investigating the
relationship between two categorical variables
• To describe the SPSS facilities for cross-tabulation
• To discuss a range of simple statistics to describe the relationship
between two categorical variables
• To reinforce the range of SPSS skills learnt to date
Bivariate analysis
• The relationship between two variables
• A two-way table:
• Rows: categories of one variable
• Columns: categories of the second variable
Gender
                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Male          1251     79.6           79.9                79.9
        Female         314     20.0           20.1               100.0
        Total         1565     99.6          100.0
Missing System           6       .4
Total                 1571    100.0
Mode of ingestion Drug 1
                   Frequency  Percent  Valid Percent  Cumulative Percent
Valid   Swallow          794     50.5           51.0                51.0
        Smoke            634     40.4           40.7                91.7
        Snort             62      3.9            4.0                95.6
        Inject            30      1.9            1.9                97.6
        12.00              2       .1             .1                97.7
        15.00              1       .1             .1                97.8
        23.00             10       .6             .6                98.4
        24.00             11       .7             .7                99.1
        25.00              5       .3             .3                99.4
        34.00              4       .3             .3                99.7
        234.00             5       .3             .3               100.0
        Total           1558     99.2          100.0
Missing System            13       .8
Total                   1571    100.0

The values 12.00 to 234.00 are out-of-range values (note that none of the digits are > 5).
Cleaning Mode1
• Save a copy of the original
• Recode the out-of-range values into a new value (for example, 12, 15,
23, 24, 25, 34, 234 into the value 8)
• Set the new value as a user-defined missing value (for example, 8 is
declared a missing value and given the label “Out-of-range”).
Mode of ingestion Drug 1
                        Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Swallow              794     50.5           52.2                52.2
         Smoke                634     40.4           41.7                93.9
         Snort                 62      3.9            4.1                98.0
         Inject                30      1.9            2.0               100.0
         Total               1520     96.8          100.0
Missing  Out-of-range          38      2.4
         System                13       .8
         Total                 51      3.2
Total                        1571    100.0
Mode of ingestion Drug1 * Gender cross-tabulation (Count)

                                Gender
                          Male   Female   Total
Mode of      Swallow       600      194     794
ingestion    Smoke         553       77     630
Drug1        Snort          44       17      61
             Inject         20       10      30
Total                     1271      298    1515

(The body cells are the joint frequencies; the Total column holds the row totals, the Total row holds the column totals, and 1515 is the grand total.)
Percentages
• The difference in sample size for men and women makes comparison
of raw numbers difficult
• Percentages facilitate comparison by standardizing the scale
• There are three options for the denominator of the percentage:
• Grand total
• Row total
• Column total
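The three denominator choices can be sketched in Python using one cell of the Mode1 * Gender table (males who swallow: count 600, row total 794, column total 1271, grand total 1515); rounded values may differ slightly from the slide tables if the underlying totals differ:

```python
# Swallow/Male cell from the Mode1 * Gender cross-tabulation
cell, row_total, col_total, grand_total = 600, 794, 1271, 1515

pct_of_grand = cell / grand_total * 100  # "% of Total" (joint percentage)
pct_of_row = cell / row_total * 100      # "% within Mode of ingestion Drug1"
pct_of_col = cell / col_total * 100      # "% within Gender"

print(round(pct_of_grand, 1))  # 39.6
print(round(pct_of_row, 1))    # 75.6
print(round(pct_of_col, 1))    # 47.2
```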
Mode of ingestion Drug1 * Gender cross-tabulation

                                         Gender
                                    Male   Female    Total
Mode of    Swallow  Count            600      194      794
ingestion           % of Total     39.6%    12.8%    52.4%
Drug1      Smoke    Count            553       77      630
                    % of Total     36.5%     5.1%    41.6%
           Snort    Count             44       17       61
                    % of Total      2.9%     1.1%     4.0%
           Inject   Count             20       10       30
                    % of Total      1.3%      .7%     2.0%
Total               Count           1271      298     1515
                    % of Total     80.3%    19.7%   100.0%

(The body percentages form the joint distribution of Mode1 and Gender; the Total column is the marginal distribution of Mode1, and the Total row is the marginal distribution of Gender.)
Marginal distribution
Gender
Mode of ingestion Drug1 * Gender cross-tabulation

                                                          Gender
                                                    Male   Female    Total
Mode of    Swallow  Count                            600      194      794
ingestion           % within Mode of ingestion     75.6%    24.4%   100.0%
Drug1      Smoke    Count                            553       77      630
                    % within Mode of ingestion     87.8%    12.2%   100.0%
           Snort    Count                             44       17       61
                    % within Mode of ingestion     72.1%    27.9%   100.0%
           Inject   Count                             20       10       30
                    % within Mode of ingestion     66.7%    33.3%   100.0%
Total               Count                           1271      298     1515
                    % within Mode of ingestion     80.3%    19.7%   100.0%

(Each row shows the distribution of Gender conditional on Mode1.)
Mode of ingestion Drug1 * Gender cross-tabulation

                                          Gender
                                    Male   Female    Total
Mode of    Swallow  Count            600      194      794
ingestion           % within Gender 49.3%    65.1%    52.4%
Drug1      Smoke    Count            553       77      630
                    % within Gender 45.4%    25.8%    41.6%
           Snort    Count             44       17       61
                    % within Gender  3.6%     5.7%     4.0%
           Inject   Count             20       10       30
                    % within Gender  1.6%     3.4%     2.0%
Total               Count           1271      298     1515
                    % within Gender 100.0%  100.0%   100.0%

(Each column shows the distribution of Mode1 conditional on Gender.)
Choosing percentages
• “Construct the proportions so that they sum to one within the
categories of the explanatory variable.”
Source: (C. Marsh, Exploring Data: An Introduction to Data Analysis for Social Scientists
(Cambridge, Polity Press, 1988), p. 143.)
[Clustered bar chart of Mode1 by Gender, with bars labelled n=600, n=553, n=194, n=77, n=44, n=20, n=17, n=10]
Dimensions

Definitions of vertical
and horizontal variables
Two-by-two tables
• Tables with two rows and two columns
• A range of simple descriptive statistics can be applied to two-by-two
tables
• It is possible to collapse larger tables to these dimensions
Gender * White pipe cross-tabulation

                                   White pipe
                                 Yes       No     Total
Gender   Male    Count           290      961      1251
                 % within Gender 23.2%   76.8%   100.0%
         Female  Count            22      292       314
                 % within Gender  7.0%   93.0%   100.0%
Total            Count           312     1253      1565
                 % within Gender 19.9%   80.1%   100.0%

As proportions:
                     White pipe
                   Yes       No
Gender  Male    0.2318   0.7682
        Female  0.0701   0.9299

Relative risk
• Divide the probabilities for “success”:
• For example:
P(Whitpipe=Yes|Gender=Male)=0.2318
P(Whitpipe=Yes|Gender=Female)=0.0701
Relative risk is 0.2318/0.0701=3.309
• The proportion of males using white pipe was over three times
that of females
Odds
• The odds of “success” are the ratio of the probability of “success” to
the probability of “failure”
• For example:
- For males the odds of “success” are
0.2318/0.7682=0.302
- For females the odds of “success” are
0.0701/0.9299=0.075
Odds ratio
• Divide the odds of success for males by the odds of success for
females
• For example: 0.302/0.075=4.005
• The odds of taking white pipe as a male are four times those for a
female
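The relative risk and odds ratio above can be verified with a short Python sketch (purely to show the arithmetic; SPSS produces these through Crosstabs), using the counts from the Gender * White pipe table:

```python
# 2x2 counts from the Gender * White pipe cross-tabulation
male_yes, male_no = 290, 961
female_yes, female_no = 22, 292

p_male = male_yes / (male_yes + male_no)          # P(Yes | Male)   ~ 0.2318
p_female = female_yes / (female_yes + female_no)  # P(Yes | Female) ~ 0.0701

relative_risk = p_male / p_female
odds_ratio = (p_male / (1 - p_male)) / (p_female / (1 - p_female))

print(round(relative_risk, 3))  # 3.309
print(round(odds_ratio, 3))     # 4.005
```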
Risk estimate

                                          Value   95% Confidence interval
                                                     Lower       Upper
Odds ratio for Gender (Male / Female)     4.005      2.547       6.299
For cohort white pipe = Yes               3.309      2.184       5.012
For cohort white pipe = No                 .826       .791        .862
N of valid cases                           1565

(The "Yes" cohort row is the relative risk of "success"; the "No" cohort row is the relative risk of "failure".)
Exercise 1: cross-tabulations
• Create and comment on the following cross-tabulations:
• Age vs Gender (Row total)
• Region vs Gender (Column total)
• School vs Gender (Grand total)
• Year of study vs School (All three)
• Suggest other cross-tabulations that would be useful
Exercise 2: cross-tabulation
• Construct a dichotomous variable for age: Up to 24 years and Above
24 years
• Construct a dichotomous variable for the primary drug of use:
Alcohol and Not Alcohol
• Create a cross-tabulation of the two new variables and interpret
• Generate Relative Risks and Odds Ratios and interpret
Summary
• Cross-tabulations
• Joint frequencies
• Marginal frequencies
• Row/Column/Total percentages
• Relative risk
• Odds
• Odds ratios
• Working with relationships between two variables
[Scatterplot: Size of Teaching Tip ($0 to $80, x-axis) vs Stats Test Score (0 to 100, y-axis)]
Correlation & Regression
• Univariate & Bivariate Statistics
• U: frequency distribution, mean, mode, range, standard deviation
• B: correlation – two variables
• Correlation
• linear pattern of relationship between one variable (x) and another variable (y) – an
association between two variables
• relative position of one variable correlates with relative distribution of another variable
• graphical representation of the relationship between two variables
• Warning:
• No proof of causality
• Cannot assume x causes y
Scatterplot!
• No Correlation
• Random or circular assortment
of dots
• Positive Correlation
• ellipse leaning to right
• GPA and SAT
• Smoking and Lung Damage

• Negative Correlation
• ellipse leaning to left
• Depression & Self-esteem
• Studying & test errors
Pearson’s Correlation Coefficient
• “r” indicates…
• strength of relationship (strong, weak, or none)
• direction of relationship
• positive (direct) – variables move in same direction
• negative (inverse) – variables move in opposite directions
• r ranges in value from –1.0 to +1.0
-1.0 0.0 +1.0
Strong Negative No Rel. Strong Positive

•Go to website!
– playing with scatterplots
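A minimal Python sketch of how r behaves at the extremes, using the standard product-moment formula with made-up data:

```python
import math

def pearson_r(x, y):
    # Product-moment correlation: covariance / (sd_x * sd_y)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # ~ 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # ~ -1.0 (perfect negative)
```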
Practice with Scatterplots

r = .__ __ r = .__ __

r = .__ __
r = .__ __
Correlations

Miles walked
per day Weight Depression Anxiety
Miles walked per day Pearson Correlation 1 -.797** -.800** -.774**
Sig. (2-tailed) .002 .002 .003
N 12 12 12 12
Weight Pearson Correlation -.797** 1 .648* .780**
Sig. (2-tailed) .002 .023 .003
N 12 12 12 12
Depression Pearson Correlation -.800** .648* 1 .753**
Sig. (2-tailed) .002 .023 .005
N 12 12 12 12
Anxiety Pearson Correlation -.774** .780** .753** 1
Sig. (2-tailed) .003 .003 .005
N 12 12 12 12
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Samples vs. Populations
• Sample statistics estimate Population parameters
• M tries to estimate μ
• r tries to estimate ρ (“rho”, a Greek letter, not “p”)
•r correlation for a sample
• based on the limited observations we have
•ρ actual correlation in population
• the true correlation
• Beware Sampling Error!!
• even if ρ=0 (there’s no actual correlation), you might get r =.08 or r = -.26 just by
chance.
• We look at r, but we want to know about ρ
Hypothesis testing with Correlations
• Two possibilities
• Ho: ρ = 0 (no actual correlation; The Null Hypothesis)
• Ha: ρ ≠ 0 (there is some correlation; The Alternative Hyp.)
• Case #1 (see correlation worksheet)
• Correlation between distance and points r = -.904
• Sample small (n=6), but r is very large
• We guess ρ < 0 (we guess there is some correlation in the pop.)
• Case #2
• Correlation between aiming and points, r = .628
• Sample small (n=6), and r is only moderate in size
• We guess ρ = 0 (we guess there is NO correlation in pop.)
• Bottom-line
• We can only guess about ρ
• We can be wrong in two ways
Reading Correlation Matrix Correlationsa

Time spun
Total ball Distance before Aiming Manual College grade Confidence
toss points from target throwing accuracy dexterity point avg for task
Total ball toss points Pearson Correlation 1 -.904* -.582 .628 .821* -.037 -.502
Sig. (2-tailed) . .013 .226 .181 .045 .945 .310
N 6 6 6 6 6 6 6
Distance from target Pearson Correlation -.904* 1 .279 -.653 -.883* .228 .522
Sig. (2-tailed) .013 . .592 .159 .020 .664 .288
N 6 6 6 6 6 6 6
Time spun before Pearson Correlation -.582 .279 1 -.390 -.248 -.087 .267
throwing Sig. (2-tailed) .226 .592 . .445 .635 .869 .609
N
6 6 6 6 6 6 6

Aiming accuracy Pearson Correlation .628 -.653 -.390 1 .758 -.546 -.250
Sig. (2-tailed) .181 .159 .445 . .081 .262 .633
N 6 6 6 6 6 6 6
Manual dexterity Pearson Correlation .821* -.883* -.248 .758 1 -.553 -.101
Sig. (2-tailed) .045 .020 .635 .081 . .255 .848
N 6 6 6 6 6 6 6

College grade point avg  Pearson Correlation  -.037   .228  -.087  -.546  -.553      1  -.524
                         Sig. (2-tailed)       .945   .664   .869   .262   .255      .   .286
                         N                        6      6      6      6      6      6      6
Confidence for task      Pearson Correlation  -.502   .522   .267  -.250  -.101  -.524      1
                         Sig. (2-tailed)       .310   .288   .609   .633   .848   .286      .
                         N                        6      6      6      6      6      6      6
*. Correlation is significant at the 0.05 level (2-tailed).
a. Day sample collected = Tuesday

Annotation: for Total ball toss points and Distance from target, r = -.904 with p = .013, the probability of getting a correlation this size by sheer chance. Reject Ho if p ≤ .05.
The same correlation matrix is shown again with two further annotations: the N in each cell is the sample size (here 6), and the highlighted result is reported as r(4) = -.904, p < .05.
Predictive Potential
• Coefficient of Determination
• r²
• Amount of variance accounted for in y by x
• Percentage increase in accuracy you gain by using the regression line to make
predictions
• Without correlation, you can only guess the mean of y
• [Used with regression]
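r² can be read directly as the reduction in squared error when predicting from the regression line rather than from the mean of y. A Python sketch with made-up numbers (not the worksheet data):

```python
# Made-up sample, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
a0 = my - b * mx

ss_total = sum((c - my) ** 2 for c in y)  # error when guessing the mean of y
ss_resid = sum((c - (b * v + a0)) ** 2 for v, c in zip(x, y))  # error using the line

r_squared = 1 - ss_resid / ss_total
print(round(r_squared, 2))  # 0.64: the line accounts for 64% of the variance in y
```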
Limitations of Correlation
• linearity:
• can’t describe non-linear relationships
• e.g., relation between anxiety & performance
• truncation of range:
• underestimate strength of relationship if you can’t see the full range of x values
• no proof of causation
• third variable problem:
• could be 3rd variable causing change in both variables
• directionality: can’t be sure which way causality “flows”
Regression
•Regression: Correlation + Prediction
• predicting y based on x
• e.g., predicting….
• throwing points (y)
• based on distance from target (x)
• Regression equation
• formula that specifies a line
• y’ = bx + a
• plug in a x value (distance from target) and predict y (points)
• note
• y = actual value of a score
• y’ = predicted value
• Go to website! – Regression Playground
Regression Graphic – Regression Line
(See the correlation & regression worksheet)

[Scatterplot of Total ball toss points (0 to 120) vs Distance from target (8 to 26) with the fitted regression line; Rsq = 0.6031. Reading from the line: if x = 18 then y’ = 47; if x = 24 then y’ = 20.]
Regression Equation
(See the correlation & regression worksheet)
• y’= bx + a
• y’ = predicted value of y
• b = slope of the line
• x = value of x that you plug-in
• a = y-intercept (where the line crosses the y-axis)
• In this case….
• y’ = -4.263(x) + 125.401

• So if the distance is 20 feet


• y’ = -4.263(20) + 125.401
• y’ = -85.26 + 125.401
• y’ = 40.141
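The worked prediction above can be written as a small Python function (slope and intercept taken from the worksheet example; the function name is just for illustration):

```python
def predict_points(distance, b=-4.263, a=125.401):
    # y' = bx + a: predicted ball-toss points from distance
    return b * distance + a

print(round(predict_points(20), 3))  # 40.141
```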
SPSS Regression Set-up
• “Criterion”: the y-axis variable, what you’re trying to predict
• “Predictor”: the x-axis variable, what you’re basing the prediction on
• Note: never refer to the IV or DV when doing regression
Getting Regression Info from SPSS
(See the correlation & regression worksheet)

Model Summary
Model     R     R Square   Adjusted R Square   Std. Error of the Estimate
1       .777a     .603           .581                  18.476
a. Predictors: (Constant), Distance from target

y’ = b(x) + a
y’ = -4.263(20) + 125.401

Coefficients(a)
                          Unstandardized Coefficients   Standardized Coefficients
Model                         B         Std. Error              Beta              t      Sig.
1   (Constant)             125.401        14.265                                8.791    .000
    Distance from target    -4.263          .815               -.777           -5.230    .000
a. Dependent Variable: Total ball toss points

(a = 125.401, the constant; b = -4.263, the slope.)
Predictive Ability
• Mantra!!
• As variability decreases, prediction accuracy ___
• if we can account for variance, we can make better predictions
• As r increases:
• r² increases
• “variance accounted for” increases
• the prediction accuracy increases
• prediction error decreases (distance between y’ and y)
• Sy’ decreases
• the standard error of the residual/predictor
• measures overall amount of prediction error
• We like big r’s!!!
Drawing a Regression Line by Hand
Three steps
1. Plug zero in for x to get a y’ value, and then plot this value
• Note: It will be the y-intercept

2. Plug in a large value for x (just so it falls on the right end of the
graph) and plot the resulting point
3. Connect the two points with a straight line!
Chi-square Test of
Independence
Training session 6
Chi-square Test of Independence
• The chi-square test of independence is probably the most frequently used
hypothesis test in the social sciences.

• In this exercise, we will use the chi-square test of independence to evaluate


group differences when the test variable is nominal, dichotomous, ordinal, or
grouped interval.

• The chi-square test of independence can be used for any variable; the group
(independent) and the test variable (dependent) can be nominal, dichotomous,
ordinal, or grouped interval.

SW318 Social Work Statistics Slide 142


Independence Defined
• Two variables are independent if, for all cases, the classification of a case into a
particular category of one variable (the group variable) has no effect on the
probability that the case will fall into any particular category of the second
variable (the test variable).

• When two variables are independent, there is no relationship between them. We
would expect the frequency breakdowns of the test variable to be similar for
all groups.



Independence Demonstrated
• Suppose we are interested in the relationship between gender and attending
college.
• If there is no relationship between gender and attending college and 40% of our
total sample attend college, we would expect 40% of the males in our sample to
attend college and 40% of the females to attend college.
• If there is a relationship between gender and attending college, we would expect
a higher proportion of one group to attend college than the other group, e.g. 60%
to 20%.
Displaying Independent and Dependent Relationships
When the variables are independent, the proportion in both groups is close to the
same size as the proportion for the total sample. When group membership makes a
difference, the dependent relationship is indicated by one group having a higher
proportion than the proportion for the total sample.

[Bar charts: “Proportion Attending College.” Independent relationship between
gender and college: males 40%, females 40%, total 40%. Dependent relationship
between gender and college: males 60%, females 20%, total 40%.]
Expected Frequencies
• Expected frequencies are computed as if there is no difference between the
groups, i.e. both groups have the same proportion as the total sample in each
category of the test variable.
• Since the proportion of subjects in each category of the group variable can differ,
we take group category into account in computing expected frequencies as well.
• To summarize, the expected frequencies for each cell are computed to be
proportional to both the breakdown for the test variable and the breakdown for
the group variable.
Expected Frequency Calculation
The data from “Observed Frequencies for Sample Data” is the source for
information to compute the expected frequencies. Percentages are computed
for the column of all students and for the row of all GPA’s. These percentages
are then multiplied by the total number of students in the sample (453) to
compute the expected frequency for each cell in the table.
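The calculation the slide describes reduces to a one-line formula. A sketch in Python: only the grand total of 453 comes from the slide; the 200 and 250 marginal totals are hypothetical stand-ins:

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count under independence for one cell:
    (row proportion) * (column proportion) * grand total,
    which simplifies to row_total * column_total / grand_total."""
    return row_total * column_total / grand_total

# Hypothetical marginals for the 453-student example:
# 200 students in one GPA row, 250 students in one group column
print(round(expected_frequency(200, 250, 453), 1))  # 110.4
```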
Expected Frequencies versus Observed Frequencies
• The chi-square test of independence plugs the observed frequencies and
expected frequencies into a formula which computes how the pattern of
observed frequencies differs from the pattern of expected frequencies.
• Probabilities for the test statistic can be obtained from the chi-square probability
distribution so that we can test hypotheses.
Independent and Dependent Variables
• The two variables in a chi-square test of independence each play a specific role.
• The group variable is also known as the independent variable because it has an influence on
the test variable.
• The test variable is also known as the dependent variable because its value is believed to be
dependent on the value of the group variable.
• The chi-square test of independence is a test of the influence or impact that a
subject’s value on one variable has on the same subject’s value for a second
variable.
Step 1. Assumptions for the Chi-square Test
• The chi-square test of independence can be used for any level variable, including
interval level variables grouped in a frequency distribution. It is most useful for
nominal variables for which we do not have another option.
• Assumption: no cell has an expected frequency less than 5.
• If this assumption is violated, the chi-square distribution will give us
misleading probabilities.
Step 2. Hypotheses and alpha
• The research hypothesis states that the two variables are dependent or related.
This will be true if the observed counts for the categories of the variables in the
sample are different from the expected counts.
• The null hypothesis is that the two variables are independent. This will be true if
the observed counts in the sample are similar to the expected counts.
• The amount of difference needed to make a decision about difference or
similarity is the amount corresponding to the alpha level of significance, which
will be either 0.05 or 0.01. The value to use will be stated in the problem.
Step 3. Sampling distribution and test statistic
• To test the relationship, we use the chi-square test statistic, which
follows the chi-square distribution.
• If we were calculating the statistic by hand, we would have to
compute the degrees of freedom to identify the probability of the test
statistic. SPSS will print out the degrees of freedom and the
probability of the test statistic for us.
Step 4. Computing the Test Statistic
• Conceptually, the chi-square test of independence statistic is computed by
summing, for each cell in the table, the squared difference between the observed
and expected frequencies divided by the expected frequency for the cell.
• We identify the value and probability for this test statistic from the SPSS statistical
output.
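The conceptual formula can be written directly. A sketch with a hypothetical 2x2 table, flattened to lists of cell counts:

```python
def chi_square_statistic(observed, expected):
    """Chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 2x2 table, cells flattened to lists
observed = [30, 20, 10, 40]
expected = [20, 30, 20, 30]
print(round(chi_square_statistic(observed, expected), 2))  # 16.67
```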
Step 5. Decision and Interpretation
• If the probability of the test statistic is less than or equal to the probability of the
alpha error rate, we reject the null hypothesis and conclude that our data
supports the research hypothesis. We conclude that there is a relationship
between the variables.

• If the probability of the test statistic is greater than the probability of the alpha
error rate, we fail to reject the null hypothesis. We conclude that there is no
relationship between the variables, i.e. they are independent.
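The decision rule can be sketched as a small helper; the p-values fed to it below come from the deck’s two worked examples (the second is reported only as p < 0.001, so 0.0005 is a stand-in):

```python
def decide(p_value, alpha=0.05):
    """Chi-square decision rule: reject the null when p <= alpha."""
    if p_value <= alpha:
        return "reject null: the variables are related"
    return "fail to reject null: the variables are independent"

print(decide(0.244))   # first worked example: fail to reject
print(decide(0.0005))  # second worked example (p < 0.001): reject
```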
Which Cell or Cells Caused the Difference
• We are only concerned with this procedure if the result of the chi-square test was
statistically significant.
• One of the problems in interpreting chi-square tests is the determination of
which cell or cells produced the statistically significant difference. Examination of
percentages in the contingency table and expected frequency table can be
misleading.
• The residual, or the difference between the observed frequency and the
expected frequency, is a more reliable indicator, especially if the residual is
converted to a z-score and compared to a critical value equivalent to the alpha
for the problem.
Standardized Residuals
• SPSS prints out the standardized residual (converted to a z-score) computed for
each cell. It does not produce the probability or significance.
• Without a probability, we will compare the size of the standardized residuals to
the critical values that correspond to an alpha of 0.05 (+/-1.96) or an alpha of
0.01 (+/-2.58). The problems will tell you which value to use. This is equivalent to
testing the null hypothesis that the actual frequency equals the expected
frequency for a specific cell versus the research hypothesis of a difference greater
than zero.
• There can be 0, 1, 2, or more cells with statistically significant standardized
residuals to be interpreted.
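The comparison can be sketched in Python; the cell counts below are hypothetical:

```python
import math

def standardized_residual(observed, expected):
    """Standardized residual (z-score) for one cell: (O - E) / sqrt(E)."""
    return (observed - expected) / math.sqrt(expected)

# Hypothetical cell: 30 observed where 20 were expected
z = standardized_residual(30, 20)
print(round(z, 2))    # 2.24: positive, so the cell is over-represented
print(abs(z) > 1.96)  # True: significant at alpha = 0.05
```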
Interpreting Standardized Residuals
• Standardized residuals that have a positive value mean that the cell was
over-represented in the actual sample, compared to the expected frequency, i.e.
there were more subjects in this category than we expected.
• Standardized residuals that have a negative value mean that the cell was
under-represented in the actual sample, compared to the expected frequency, i.e.
there were fewer subjects in this category than we expected.
Interpreting Cell Differences in a Chi-square Test - 1

A chi-square test of independence of the relationship between sex and marital
status finds a statistically significant relationship between the variables.
Interpreting Cell Differences in a Chi-square Test - 2

Researchers often try to identify which cell or cells are the major contributors to
the significant chi-square test by examining the pattern of column percentages.

Based on the column percentages, we would identify cells on the married row and
the widowed row as the ones producing the significant result because they show
the largest differences: 8.2% on the married row (50.9%-42.7%) and 9.0% on the
widowed row (13.1%-4.1%).
Interpreting Cell Differences in a Chi-square Test - 3

Using a level of significance of 0.05, the critical value for a standardized residual
would be -1.96 and +1.96. Using standardized residuals, we would find that only the
cells on the widowed row are the significant contributors to the chi-square
relationship between sex and marital status.

If we interpreted the contribution of the married marital status, we would be
mistaken. Basing the interpretation on column percentages can be misleading.
Chi-Square Test of Independence: post hoc practice problem 1

This question asks you to use a chi-square test of independence and, if significant, to do a post
hoc test using 1.96 as the critical value.

First of all, the level of measurement for the independent and the dependent variable can be
any level that defines groups (dichotomous, nominal, ordinal, or grouped interval). “Degree of
religious fundamentalism” [fund] is ordinal and “sex” [sex] is dichotomous, so the level of
measurement requirements are satisfied.
Chi-Square Test of Independence: post hoc test in SPSS (1)

You can conduct a chi-square test of independence with the Crosstabs procedure
in SPSS by selecting:

Analyze > Descriptive Statistics > Crosstabs…
Chi-Square Test of Independence: post hoc test in SPSS (2)

First, select and move the variables for the question to the “Row(s):” and
“Column(s):” list boxes.

The variable mentioned first in the problem, sex, is used as the independent
variable and is moved to the “Column(s):” list box. The variable mentioned second
in the problem, [fund], is used as the dependent variable and is moved to the
“Row(s):” list box.

Second, click on the “Statistics…” button to request the test statistic.
Chi-Square Test of Independence: post hoc test in SPSS (3)

First, click on “Chi-square” to request the chi-square test of independence.

Second, click on the “Continue” button to close the Statistics dialog box.
Chi-Square Test of Independence: post hoc test in SPSS (4)

Now click on the “Cells…” button to specify the contents in the cells of the
crosstabs table.
Chi-Square Test of Independence: post hoc test in SPSS (5)

First, make sure both “Observed” and “Expected” in the “Counts” section of the
“Crosstabs: Cell Display” dialog box are checked.

Then, in the “Residuals” section, select “Unstandardized” and “Standardized”
residuals and click on the “Continue” and “OK” buttons.
Chi-Square Test of Independence: post hoc test in SPSS (6)

In the Chi-Square Tests result table, SPSS also tells us that “0 cells have expected
count less than 5 and the minimum expected count is 70.63”.

The sample size requirement for the chi-square test of independence is satisfied.
Chi-Square Test of Independence: post hoc test in SPSS (7)

The probability of the chi-square test statistic (chi-square=2.821) was p=0.244,
greater than the alpha level of significance of 0.05. The null hypothesis that
differences in "degree of religious fundamentalism" are independent of differences
in "sex" is not rejected.

The research hypothesis that differences in "degree of religious fundamentalism"
are related to differences in "sex" is not supported by this analysis.

Thus, the answer for this question is False. We do not interpret cell differences
unless the chi-square test statistic supports the research hypothesis.
Chi-Square Test of Independence: post hoc practice problem 2

This question asks you to use a chi-square test of independence and, if significant, to do a post
hoc test using -1.96 as the critical value.

First of all, the level of measurement for the independent and the dependent variable can be
any level that defines groups (dichotomous, nominal, ordinal, or grouped interval). [empathy3]
is ordinal and [sex] is dichotomous, so the level of measurement requirements are satisfied.
Chi-Square Test of Independence: post hoc test in SPSS (8)

You can conduct a chi-square test of independence with the Crosstabs procedure
in SPSS by selecting:

Analyze > Descriptive Statistics > Crosstabs…
Chi-Square Test of Independence: post hoc test in SPSS (9)

First, select and move the variables for the question to the “Row(s):” and
“Column(s):” list boxes.

The variable mentioned first in the problem, [sex], is used as the independent
variable and is moved to the “Column(s):” list box. The variable mentioned second
in the problem, [empathy3], is used as the dependent variable and is moved to the
“Row(s):” list box.

Second, click on the “Statistics…” button to request the test statistic.
Chi-Square Test of Independence: post hoc test in SPSS (10)

First, click on “Chi-square” to request the chi-square test of independence.

Second, click on the “Continue” button to close the Statistics dialog box.
Chi-Square Test of Independence: post hoc test in SPSS (11)

Now click on the “Cells…” button to specify the contents in the cells of the
crosstabs table.
Chi-Square Test of Independence: post hoc test in SPSS (12)

First, make sure both “Observed” and “Expected” in the “Counts” section of the
“Crosstabs: Cell Display” dialog box are checked.

Then, in the “Residuals” section, select “Unstandardized” and “Standardized”
residuals and click on the “Continue” and “OK” buttons.
Chi-Square Test of Independence: post hoc test in SPSS (13)

In the Chi-Square Tests result table, SPSS also tells us that “0 cells have expected
count less than 5 and the minimum expected count is 6.79”.

The sample size requirement for the chi-square test of independence is satisfied.
Chi-Square Test of Independence: post hoc test in SPSS (14)

The probability of the chi-square test statistic (chi-square=23.083) was p<0.001,
less than or equal to the alpha level of significance of 0.05. The null hypothesis that
differences in "accuracy of the description of feeling protective toward people
being taken advantage of" are independent of differences in "sex" is rejected.

The research hypothesis that differences in "accuracy of the description of feeling
protective toward people being taken advantage of" are related to differences in
"sex" is supported by this analysis.

Now, you can examine the post hoc test using the given critical value.
Chi-Square Test of Independence: post hoc test in SPSS (15)

The residual is the difference between the actual frequency and the expected
frequency (58-79.2=-21.2).

When converted to a z-score, the standardized residual (-2.4) was smaller than the
critical value (-1.96), supporting a specific finding that among survey respondents
who were male, there were fewer who said that feeling protective toward people
being taken advantage of describes them very well than would be expected.

The answer to the question is true.
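The slide’s arithmetic can be checked directly in Python, using the counts it reports (58 observed, 79.2 expected):

```python
import math

observed, expected = 58, 79.2          # counts reported on the slide
residual = observed - expected          # unstandardized residual
std_residual = residual / math.sqrt(expected)
print(round(residual, 1))      # -21.2
print(round(std_residual, 1))  # -2.4, below the critical value of -1.96
```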
Steps in solving chi-square test of independence: post hoc problems - 1

The following is a guide to the decision process for answering homework
problems about chi-square test of independence post hoc problems:

Are the dependent and independent variables nominal, ordinal, dichotomous,
or grouped interval?
• No: incorrect application of a statistic.
• Yes: continue to the next step.
Steps in solving chi-square test of independence: post hoc problems - 2

Compute the chi-square test of independence, requesting standardized residuals
in the output.

Are any expected cell counts less than 5?
• Yes: incorrect application of a statistic.
• No: continue.

Is the p-value for the chi-square test of independence <= alpha?
• No: the answer is False.
• Yes: continue to the next step.
Steps in solving chi-square test of independence: post hoc problems - 3

Identify the cell in the crosstabs table that contains the specific relationship in
the problem.

Is the value of the standardized residual for the specified cell larger (smaller)
than the positive (negative) critical value given in the problem?
• No: the answer is False.
• Yes: continue.

Is the relationship correctly described?
• No: the answer is False.
• Yes: the answer is True.
Exercise: Chi-square
• Create and comment on the following chi-square tests:
• Age vs Gender
• Region vs Gender
• School vs Gender
• Year of study vs School
Ch 12
1-way ANOVA SPSS example
Part 2 - Nov 15th
SPSS Example for 1-way ANOVA
• Harassment data set with school district employees
• “School” variable indicates work setting
• 1=elementary school
• 2=middle school
• 3=high school
• “Harassment in 1997” indicates harassment experiences from ’96-’97
• Does the work setting influence harassment experiences?
(cont.)
• Get “anova_class” file from link on webpage
• In SPSS menus: Analyze > Compare Means > One-way ANOVA
• Then, “Dependent List” can indicate as many dependent variables as you’d
like…here “Harassment in 1997”
• In “Factor” indicate the ‘grouping’ variable on which you’ll compare
Harassment means… here “School”
(cont.)
• Click the “Options” button at bottom, click the box for Descriptives
under “Statistics”, hit continue…
• Click the “Post Hoc” button at bottom, click the box for “Bonferroni”,
hit continue…
• (this will give you output for follow-up comparisons in case your overall
ANOVA is significant; if it’s not, you’ll ignore these comparisons)
• Now hit “OK” to run the analysis
Output
• You’ll have 3 sections of output…
• The 1st reports the group harassment means for elementary, middle, and
high school employees
• You’ll need to look back at this to help w/ interpretation in case your ANOVA is
significant!
• 2nd gives the overall ANOVA F test – for the null hypothesis of “no group
differences”
• Notice the MS Between and MS Within; the F statistic is your F obtained value,
and next to that is the “Sig” value
• If the “Sig” value is < .05 (or .01, depending on alpha), reject the null and
conclude there are significant group differences
(cont.)
• But where are the signif group differences? Which groups differ?
• 3rd section gives follow-up comparisons (Bonferroni – but remember to
use .05/3 = .017 as your new comparison alpha level)
• Check each row for which pairs are being compared, then its “sig” value
• If “sig” < .017 (or whatever your new alpha is), reject the null of equal group
means; conclude those 2 group means differ
• Which schools significantly differ in harassment experiences?
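The pieces of output the slides walk through (MS Between, MS Within, F) can be reproduced by hand. A sketch in Python; the three groups of scores below are made up for illustration, not the anova_class data:

```python
def one_way_anova(*groups):
    """One-way ANOVA by hand: returns (MS Between, MS Within, F)."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total N
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)                # df between = k - 1
    ms_within = ss_within / (n - k)                  # df within = N - k
    return ms_between, ms_within, ms_between / ms_within

# Made-up harassment scores for elementary, middle, and high school employees
elementary = [2, 3, 2, 4]
middle = [5, 6, 5, 4]
high = [8, 7, 9, 8]
msb, msw, f = one_way_anova(elementary, middle, high)
print(round(msb, 2), round(msw, 2), round(f, 2))  # 27.75 0.75 37.0
# For the Bonferroni follow-ups, each pairwise comparison would be tested
# against the adjusted alpha of .05 / 3 = .017, as the slides describe
```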