Lecture 7 Crosstabs

This document provides information on crosstabs, which are tables used to assess relationships between two variables. It discusses what a crosstab is, provides an example of a 2x2 crosstab comparing gender and TV watching, and offers guidance on properly analyzing, interpreting, and presenting crosstab results. Key points covered include percentage calculations for crosstabs, issues to consider when operationalizing variables, and matters of data analysis such as small sample sizes and determining meaningful differences. The document concludes with instructions for an assignment involving hypothesis testing using SPSS crosstabs.

Uploaded by

ibmr

Crosstabs

How do we assess the relationship between two variables?
(We'll bring in more variables later.)
There are various ways, especially with interval-level data; one of the most common ways is with crosstabs.
"Crosstab" is a contraction of "cross tabulation."
Also called a contingency table.
What is a (simple) crosstab?
A table based on two variables, where the cell entries are the counts or percentages of cases that fall in both that row's category and that column's category.
So, also called a bivariate frequency table.

WARNING:
There's going to be a lot coming at you (in class).
It requires paying attention, thinking (wow!).
But it only involves percentages, so no complicated statistics (yet).

An example: gender & TV watching

We begin with the observations (persons).
A file of about a thousand people would have data like this (except that Gender and Watching TV would be coded using numbers).
You would create the crosstab (presumably using SPSS; these are real bears to do by hand).

[Data listing: columns are Observation number, Gender, and Watch (some TV program); the last observation is number 1004]
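To make the tallying step concrete, here is a minimal Python sketch of what a crosstab routine does with such a file. The raw data file is not part of these slides, so the `records` list is a hypothetical stand-in rebuilt from the cell counts of the 2x2 table in this lecture, and `crosstab` is an illustrative helper, not an SPSS function.

```python
from collections import Counter

# Hypothetical stand-in for the data file: one (gender, watch) pair per person,
# rebuilt from the cell counts reported in the lecture's 2x2 table.
records = (
    [("Female", "Watch")] * 331
    + [("Male", "Watch")] * 170
    + [("Female", "~Watch")] * 210
    + [("Male", "~Watch")] * 293
)

def crosstab(pairs):
    """Count the cases that fall in each (gender, watch) cell."""
    return Counter(pairs)

cells = crosstab(records)

# Print the counts with watching in the rows and gender in the columns.
for watch in ("Watch", "~Watch"):
    print(watch, cells[("Female", watch)], cells[("Male", watch)])
```

Each person lands in exactly one cell, so the four cell counts add back up to the number of observations.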

Are Women More Likely to Watch Than Men?

You might want to ask the question: are women more likely to watch this particular TV program than men are?
So, you display the data in a crosstab (in this case a 2x2 table).
But how do you read it?

            Female    Male
Watch       331       170
~Watch      210       293

Almost without exception, you want to look at percentages rather than numbers of cases.
But, which way to percentage?
Add to 100% within categories of the iv.
In the way that makes sense for the question at hand. (Requires thinking.)

We can % either way

(by rows)
          F             M             Total
W         331 (66%)     170 (34%)     501 (100%)
~W        210 (42%)     293 (58%)     503 (100%)
Total     541 (54%)     463 (46%)     1004 (100%)

(by columns)
          F             M             Total
W         331 (61%)     170 (37%)     501 (50%)
~W        210 (39%)     293 (63%)     503 (50%)
Total     541 (100%)    463 (100%)    1004 (100%)
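The two ways of percentaging can be reproduced with a short Python sketch. The nested-dictionary layout and the function names `row_percentages` and `column_percentages` are assumptions made for illustration; only the counts come from the table above.

```python
# Cell counts from the 2x2 table: rows are W / ~W, columns are F / M.
counts = {
    "W": {"F": 331, "M": 170},
    "~W": {"F": 210, "M": 293},
}

def row_percentages(table):
    """Percentage so that each row adds to 100%."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total) for col, n in cells.items()}
    return result

def column_percentages(table):
    """Percentage so that each column adds to 100%."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: round(100 * n / col_totals[col]) for col, n in cells.items()}
        for row, cells in table.items()
    }

by_rows = row_percentages(counts)
by_cols = column_percentages(counts)
print(by_rows)
print(by_cols)
```

The counts are identical in both versions; only the denominator changes, which is why the same cell can read 66% in one table and 61% in the other.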

Note
I've percentaged the total row or column.
Not necessary, but often useful, and it's done automatically by SPSS.
I've used whole numbers. Nothing about creating tables defines the accuracy level. Don't overdo accuracy.

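Which way is correct?

Recall:
The dv is the one we're trying to explain.
The iv is the one used to explain the dv.
The question we are asking is: are women more likely to watch this particular TV program than men are?
Here gender is the iv and watching is the dv, so percentage within gender (by columns): 61% of women watch, compared with 37% of men.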

Burning question
Does it make any difference which variable makes up the rows, and which the columns?
No. There's no agreed-upon convention for whether the iv goes in the rows or columns (text notwithstanding). BUT
if you switch row and column variables, then which percentages are right (for your question) will also change.
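A small Python sketch of this point, using the gender and TV counts from earlier in the lecture. The helpers `transpose` and `percent_within_rows` are hypothetical names introduced here. Once gender (the iv) is moved into the rows, it is the row percentages that add to 100% within the iv.

```python
def transpose(table):
    """Swap the row variable and the column variable of a nested-dict table."""
    flipped = {}
    for row, cells in table.items():
        for col, n in cells.items():
            flipped.setdefault(col, {})[row] = n
    return flipped

def percent_within_rows(table):
    """Percentage so that each row adds to 100%."""
    out = {}
    for row, cells in table.items():
        total = sum(cells.values())
        out[row] = {col: round(100 * n / total) for col, n in cells.items()}
    return out

# Layout used in the lecture: watching in the rows, gender (the iv) in the columns.
watch_by_gender = {"W": {"F": 331, "M": 170}, "~W": {"F": 210, "M": 293}}

# Same data with gender moved to the rows.
gender_by_watch = transpose(watch_by_gender)

# With the iv in the rows, row percentages are the ones that answer the question.
within_gender = percent_within_rows(gender_by_watch)
print(within_gender["F"]["W"], within_gender["M"]["W"])  # prints: 61 37
```

The answer to the question (61% of women versus 37% of men) does not change with the layout; what changes is whether you read it off the row or the column percentages.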

More analytical matters

Illustration: Do different careers attract different partisans? (class survey)
Are Democrats or Republicans more likely to go into:
Law?
Politics?
Business?
Academia?

Party ID * Career
Party ID * career Crosstabulation

                                              career
Party ID                           law     politics  business  academic/edu  Total
Republican   Count                 6       2         5         1             14
             % within Party ID     42.9%   14.3%     35.7%     7.1%          100.0%
Democrat     Count                 10      10        2         2             24
             % within Party ID     41.7%   41.7%     8.3%      8.3%          100.0%
Independent  Count                 2       5         4         2             13
             % within Party ID     15.4%   38.5%     30.8%     15.4%         100.0%
Other        Count                 2       0         1         1             4
             % within Party ID     50.0%   0.0%      25.0%     25.0%         100.0%
Don't Know   Count                 2       0         2         0             4
             % within Party ID     50.0%   0.0%      50.0%     0.0%          100.0%
Total        Count                 22      17        14        6             59
             % within Party ID     37.3%   28.8%     23.7%     10.2%         100.0%

Recode? Note the small number of cases in some rows. Also, are Independent and Other really different?
Don't knows (DKs)? Delete them or combine them with other rows?

More analytical matters (cont.)

These are analytical matters.
Don't make meaningless combinations just because of small Ns.
Keep in or delete "don't knows" depending on your reasoning about them.
Suppose we decide to keep DKs and combine the three smallest categories.
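The recode just described can be sketched in Python. The counts come from the Party ID by career crosstab; the `recode` mapping and the helper names `combine_rows` and `percent_within_rows` are assumptions for illustration (in SPSS you would recode the variable and rerun the crosstab).

```python
careers = ["law", "politics", "business", "academic/edu"]

# Cell counts from the original crosstab, one list per Party ID row.
counts = {
    "Republican": [6, 2, 5, 1],
    "Democrat": [10, 10, 2, 2],
    "Independent": [2, 5, 4, 2],
    "Other": [2, 0, 1, 1],
    "Don't Know": [2, 0, 2, 0],
}

# Hypothetical recode: fold the three smallest categories into "other".
recode = {
    "Republican": "republican",
    "Democrat": "democrat",
    "Independent": "other",
    "Other": "other",
    "Don't Know": "other",
}

def combine_rows(table, mapping):
    """Add together the counts of all rows that map to the same new label."""
    combined = {}
    for old_label, cells in table.items():
        running = combined.setdefault(mapping[old_label], [0] * len(cells))
        for i, n in enumerate(cells):
            running[i] += n
    return combined

def percent_within_rows(table):
    """Percentage each row (to one decimal) so it adds to about 100%."""
    return {
        label: [round(100 * n / sum(cells), 1) for n in cells]
        for label, cells in table.items()
    }

recoded = combine_rows(counts, recode)
percents = percent_within_rows(recoded)
for label in recoded:
    print(label, recoded[label], percents[label])
```

Note that the percentages are recomputed from the combined counts; averaging the old row percentages would give the wrong answer because the rows have different Ns.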

Recoded
Party ID * career Crosstabulation

                                             career
Party ID                          law     politics  business  academic/edu  Total
republican  Count                 6       2         5         1             14
            % within Party ID     42.9%   14.3%     35.7%     7.1%          100.0%
democrat    Count                 10      10        2         2             24
            % within Party ID     41.7%   41.7%     8.3%      8.3%          100.0%
other       Count                 6       5         7         3             21
            % within Party ID     28.6%   23.8%     33.3%     14.3%         100.0%
Total       Count                 22      17        14        6             59
            % within Party ID     37.3%   28.8%     23.7%     10.2%         100.0%

Table is simpler, easier to read.
More meaningful because it doesn't make distinctions we aren't really interested in.

Final analytical matter

How much of a difference is enough to be meaningful?
Important question.
For now, see Weisberg et al. reading, pp. 211-12.
You might want to look at this when doing your data assignment.

Presentation matters
DO NOT USE SPSS OUTPUT DIRECTLY
Reformat as necessary
Provide meaningful labels
Give it a title
Show row ns and %s, not cell counts
Can be easily done in MS Word (maybe other ways as well)

Table 1. Career Interests by Party Identification

                _________________Career_______________
Partisanship    Law       Politics   Business   Education   Total (n)
Republican      42.9%     14.3       35.7       7.1         100.0% (14)
Democrat        41.7%     41.7       8.3        8.3         100.0% (24)
Other           28.6%     23.8       33.3       14.3        100.0% (21)
Total           37.3%     28.8       23.7       10.2        100.0% (59)
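As one of the "other ways" of producing a presentation table, here is a rough Python sketch that turns the recoded counts into rows shaped like Table 1: percentages only, with the row n in the last column. The function names `format_row` and `format_table` and the column widths are illustrative assumptions, not part of the lecture.

```python
careers = ["Law", "Politics", "Business", "Education"]

# Recoded cell counts (rows: partisanship, columns: career).
counts = {
    "Republican": [6, 2, 5, 1],
    "Democrat": [10, 10, 2, 2],
    "Other": [6, 5, 7, 3],
}

def format_row(label, cells):
    """Render one row as percentages plus '100.0% (n)', hiding the cell counts."""
    n = sum(cells)
    pcts = [f"{100 * c / n:.1f}" for c in cells]
    pcts[0] += "%"  # only the first percentage carries the % sign, as in Table 1
    body = "".join(f"{p:<11}" for p in pcts)
    return f"{label:<14}{body}100.0% ({n})"

def format_table(title, columns, table):
    """Build the title, header, one line per row, and a Total line."""
    header = f"{'Partisanship':<14}" + "".join(f"{c:<11}" for c in columns) + "Total (n)"
    lines = [title, header]
    for label, cells in table.items():
        lines.append(format_row(label, cells))
    totals = [sum(col) for col in zip(*table.values())]
    lines.append(format_row("Total", totals))
    return "\n".join(lines)

table_text = format_table("Table 1. Career Interests by Party Identification", careers, counts)
print(table_text)
```

The row total is written as 100.0% by construction; with one-decimal rounding the displayed percentages may add to 99.9 or 100.1, which is normal.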


Data Analysis #1
Due one week from today (done individually, not in pairs)

Directions are on the syllabus

Describe a hypothesis (NES or GSS).
Tell us why this hypothesis makes sense.
Thoughtfulness is rewarded. ("Dems more often voted for Gore" is not that thoughtful.)
Discuss the operationalization of your concepts.
Tell us how you operationalized your variables, but also why.
Tell us about measurement problems.

Generate an SPSS crosstab to test your hypothesis.
Percentage the table properly.
Presentation as noted earlier.
Explain the table: is your hypothesis supported?
(More than "yes" or "no" is required.)
Note possible alternative explanations.

Usually 3 pp. (double-spaced) + table
Table should go on a separate page.
Writing is important.
Use clear, straightforward prose (you are not writing a novel).
Proper grammar; correct spelling, punctuation, and capitalization; typo-free.
