
MktRes MARK4338 Lecture5 004


Frequency Distribution and Cross-Tabulation

© 2007 Prentice Hall 15-1


Chapter Outline
1) Frequency Distribution
2) Statistics Associated with Frequency Distribution
i. Measures of Location
ii. Measures of Variability
iii. Measures of Shape
3) Cross-Tabulations
i. Two Variable Case
ii. Three Variable Case
iii. General Comments on Cross-Tabulations
4) Statistics for Cross-Tabulation: Chi-Square
Internet Usage Data
Table 15.1
Respondent                      Internet  Attitude Toward       Usage of Internet
Number      Sex    Familiarity  Usage     Internet  Technology  Shopping  Banking
1           1.00   7.00         14.00     7.00      6.00        1.00      1.00
2           2.00   2.00         2.00      3.00      3.00        2.00      2.00
3           2.00   3.00         3.00      4.00      3.00        1.00      2.00
4           2.00   3.00         3.00      7.00      5.00        1.00      2.00
5           1.00   7.00         13.00     7.00      7.00        1.00      1.00
6           2.00   4.00         6.00      5.00      4.00        1.00      2.00
7           2.00   2.00         2.00      4.00      5.00        2.00      2.00
8           2.00   3.00         6.00      5.00      4.00        2.00      2.00
9           2.00   3.00         6.00      6.00      4.00        1.00      2.00
10          1.00   9.00         15.00     7.00      6.00        1.00      2.00
11          2.00   4.00         3.00      4.00      3.00        2.00      2.00
12          2.00   5.00         4.00      6.00      4.00        2.00      2.00
13          1.00   6.00         9.00      6.00      5.00        2.00      1.00
14          1.00   6.00         8.00      3.00      2.00        2.00      2.00
15          1.00   6.00         5.00      5.00      4.00        1.00      2.00
16          2.00   4.00         3.00      4.00      3.00        2.00      2.00
17          1.00   6.00         9.00      5.00      3.00        1.00      1.00
18          1.00   4.00         4.00      5.00      4.00        1.00      2.00
19          1.00   7.00         14.00     6.00      6.00        1.00      1.00
20          2.00   6.00         6.00      6.00      4.00        2.00      2.00
21          1.00   6.00         9.00      4.00      2.00        2.00      2.00
22          1.00   5.00         5.00      5.00      4.00        2.00      1.00
23          2.00   3.00         2.00      4.00      2.00        2.00      2.00
24          1.00   7.00         15.00     6.00      6.00        1.00      1.00
25          2.00   6.00         6.00      5.00      3.00        1.00      2.00
26          1.00   6.00         13.00     6.00      6.00        1.00      1.00
27          2.00   5.00         4.00      5.00      5.00        1.00      1.00
28          2.00   4.00         2.00      3.00      2.00        2.00      2.00
29          1.00   4.00         4.00      5.00      3.00        1.00      2.00
30          1.00   3.00         3.00      7.00      5.00        1.00      2.00
Frequency Distribution

• In a frequency distribution, one variable is considered at a time.

• A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.


Frequency Distribution of Familiarity
with the Internet

Table 15.2

Value label       Value   Frequency (N)   Percentage   Valid percentage   Cumulative percentage
Not so familiar   1        0                0.0          0.0                 0.0
                  2        2                6.7          6.9                 6.9
                  3        6               20.0         20.7                27.6
                  4        6               20.0         20.7                48.3
                  5        3               10.0         10.3                58.6
                  6        8               26.7         27.6                86.2
Very familiar     7        4               13.3         13.8               100.0
Missing           9        1                3.3

TOTAL                     30              100.0        100.0
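As a rough illustration, the counts and percentages in Table 15.2 can be reproduced from the Familiarity column of Table 15.1. The sketch below is only one way to do it; the function name `frequency_distribution`, the 1 to 7 scale argument, and the constant names are assumptions made for this example. Valid and cumulative percentages are based on the 29 non-missing responses.

```python
from collections import Counter

# Familiarity column of Table 15.1, respondents 1 to 30.
# The value 9 is the missing-value code shown in Table 15.2.
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
MISSING_CODE = 9


def frequency_distribution(values, scale=range(1, 8), missing=MISSING_CODE):
    """Return rows of (value, frequency, pct, valid pct, cumulative valid pct)."""
    counts = Counter(values)
    n_total = len(values)                   # all respondents, including missing
    n_valid = n_total - counts[missing]     # respondents with a usable answer
    rows = []
    running = 0                             # cumulative count of valid answers
    for value in scale:
        freq = counts[value]
        running += freq
        rows.append((value,
                     freq,
                     round(100 * freq / n_total, 1),
                     round(100 * freq / n_valid, 1),
                     round(100 * running / n_valid, 1)))
    return rows


rows = frequency_distribution(FAMILIARITY)
for row in rows:
    print(row)
```

Accumulating the count rather than the rounded percentages keeps the final cumulative figure at exactly 100.0.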


Frequency Histogram
Fig. 15.1
[Figure: histogram of frequency (vertical axis, 0 to 8) by familiarity rating (horizontal axis, 2 to 7).]
Statistics for Frequency Distribution:
Measures of Location
• The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where
$X_i$ = observed values of the variable $X$
$n$ = number of observations (sample size)

• The mode is the value that occurs most frequently. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.


Statistics for Frequency Distribution:
Measures of Location

• The median of a sample is the middle value when the data are arranged in ascending or descending order.
• If the number of data points is even, the median is the midpoint between the two middle values. The median is the 50th percentile.
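To make the three measures of location concrete, here is a minimal sketch that applies Python's standard `statistics` module to the 29 valid familiarity ratings from Table 15.1. Dropping respondent 10, whose value 9 is the missing code, is an assumption that follows Table 15.2; the variable names are chosen for this example only.

```python
import statistics

# Valid familiarity ratings from Table 15.1
# (respondent 10, coded 9 = missing, is left out).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

mean = statistics.mean(familiarity)      # sum of the values divided by n
mode = statistics.mode(familiarity)      # most frequently occurring value
median = statistics.median(familiarity)  # middle value of the sorted data

print(f"mean={mean:.3f} mode={mode} median={median}")
```

With an odd number of observations (29), the median is simply the 15th value in sorted order.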
Statistics for Frequency Distribution:
Measures of Variability

• The range measures the spread of the data: it is the difference between the largest and smallest values.
• The variance is the mean squared deviation from the mean. The variance can never be negative.
• The standard deviation is the square root of the variance.
• The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage: $CV = (s_x / \bar{X}) \times 100$.
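The measures of variability can be sketched the same way for the valid familiarity ratings in Table 15.1. One assumption to flag: `statistics.variance` and `statistics.stdev` compute the sample versions, dividing by n - 1 rather than n, which is a common convention but is not spelled out on the slide.

```python
import statistics

# Valid familiarity ratings from Table 15.1
# (respondent 10, coded 9 = missing, is left out).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

value_range = max(familiarity) - min(familiarity)  # largest minus smallest
variance = statistics.variance(familiarity)        # sample variance (n - 1)
std_dev = statistics.stdev(familiarity)            # square root of the variance
cv_percent = 100 * std_dev / statistics.mean(familiarity)

print(f"range={value_range} variance={variance:.3f} "
      f"std_dev={std_dev:.3f} CV={cv_percent:.1f}%")
```

The coefficient of variation is only meaningful when the variable is measured on a ratio scale, so for a rating scale like this one it serves purely as an arithmetic illustration.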
Statistics for Frequency Distribution:
Measures of Shape
• Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other, that is, the tendency for one tail of the distribution to be heavier than the other.
• Kurtosis is a measure of the relative peakedness or flatness of the frequency distribution curve. The kurtosis of a normal distribution is zero.
- If kurtosis > 0, the distribution is more peaked than a normal distribution.
- If kurtosis < 0, the distribution is flatter than a normal distribution.
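One simple way to put numbers on skewness and kurtosis is through central moments. The sketch below uses the plain moment-based definitions, with 3 subtracted from the kurtosis ratio so that a normal distribution scores zero, as stated above. This is an assumption about the formula: statistical packages often apply small-sample adjustments, so their output may differ slightly. The helper names are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Third central moment scaled by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Valid familiarity ratings from Table 15.1
# (respondent 10, coded 9 = missing, is left out).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

print(f"skewness={skewness(familiarity):.3f}")
print(f"excess kurtosis={excess_kurtosis(familiarity):.3f}")
```

For these ratings both values come out negative, which corresponds to a slightly heavier lower tail and a flatter-than-normal shape.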


Skewness of a Distribution
Fig. 15.2
[Figure: two panels, (a) Symmetric Distribution and (b) Skewed Distribution, each marking the mean, median, and mode.]
Cross-Tabulation
• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.

• Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.


Gender and Internet Usage
Table 15.3

                       Gender
Internet Usage    Male    Female    Row Total
Light (1)          5       10        15
Heavy (2)         10        5        15
Column Total      15       15
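The cell counts in Table 15.3 can be rebuilt from the Sex and Internet Usage columns of Table 15.1. Two assumptions in this sketch are not stated on the slides: sex is taken to be coded 1 = male and 2 = female, and a usage value above 5 is treated as heavy. They are chosen because together they reproduce the counts in Table 15.3. The names `HEAVY_CUTOFF` and `cross_tabulate` are hypothetical.

```python
from collections import Counter

# Sex and Internet Usage columns of Table 15.1, respondents 1 to 30.
SEX = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

HEAVY_CUTOFF = 5  # assumption: usage above this value counts as heavy


def cross_tabulate(row_labels, col_labels):
    """Count how often each (row label, column label) pair occurs."""
    return Counter(zip(row_labels, col_labels))


usage_group = ["Heavy" if u > HEAVY_CUTOFF else "Light" for u in USAGE]
gender = ["Male" if s == 1 else "Female" for s in SEX]  # assumed coding
table = cross_tabulate(usage_group, gender)

for usage in ("Light", "Heavy"):
    print(usage, table[(usage, "Male")], table[(usage, "Female")])
```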


Two Variables Cross-Tabulation
• Since two variables have been cross-classified, percentages could be computed either columnwise, based on column totals (Table 15.4), or rowwise, based on row totals (Table 15.5).

• The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable. With gender as the independent variable, the correct way of calculating percentages is as shown in Table 15.4.
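The two directions of percentaging can be sketched from the counts in Table 15.3. The nested-dictionary layout and the function names below are assumptions made for this example. Because every marginal total in Table 15.3 happens to be 15, the two directions give the same figures here; with unequal totals they would differ.

```python
# Cell counts from Table 15.3, laid out as counts[usage][gender].
COUNTS = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}


def column_percentages(counts):
    """Percentage of each column total (here, within each gender)."""
    col_totals = {}
    for row in counts.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: round(100 * n / col_totals[c], 1) for c, n in row.items()}
            for r, row in counts.items()}


def row_percentages(counts):
    """Percentage of each row total (here, within each usage group)."""
    return {r: {c: round(100 * n / sum(row.values()), 1)
                for c, n in row.items()}
            for r, row in counts.items()}


by_column = column_percentages(COUNTS)
by_row = row_percentages(COUNTS)
print(by_column)
print(by_row)
```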


Internet Usage by Gender
Table 15.4

                  Gender
Internet Usage    Male      Female
Light             33.3%     66.7%
Heavy             66.7%     33.3%
Column total      100%      100%


Gender by Internet Usage
Table 15.5

          Internet Usage
Gender    Light     Heavy     Total
Male      33.3%     66.7%     100.0%
Female    66.7%     33.3%     100.0%


Introduction of a Third Variable in Cross-Tabulation
Fig. 15.7
[Figure: flowchart. The original two variables show either some association or no association. In each case a third variable is introduced. The four possible outcomes shown are: a refined association between the two variables, no association between the two variables, no change in the initial pattern, and some association between the two variables.]
3 Variables Cross-Tab:
Refine an Initial Relationship
• As can be seen from Table 15.6, 52% of unmarried respondents fell in the high-purchase category, compared with 31% of married respondents.
• Do unmarried respondents purchase more fashion clothing?
• A third variable, the buyer's sex, was introduced.
• As shown in Table 15.7,
- 60% of unmarried females fell in the high-purchase category, compared with 25% of married females;
- 40% of unmarried males fell in the high-purchase category, compared with 35% of married males.

• Unmarried respondents are more likely to fall in the high-purchase category than married ones, and this effect is much more pronounced for females than for males.
Purchase of Fashion Clothing by
Marital Status
Table 15.6

Purchase of               Current Marital Status
Fashion Clothing          Married    Unmarried
High                      31%        52%
Low                       69%        48%
Column totals             100%       100%
Number of respondents     700        300


Purchase of Fashion Clothing by
Marital Status and Gender
Table 15.7
Purchase of                             Sex
Fashion Clothing     Male                       Female
                     Married    Not Married     Married    Not Married
High                 35%        40%             25%        60%
Low                  65%        60%             75%        40%
Column totals        100%       100%            100%       100%
Number of cases      400        120             300        180
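As a consistency check, the pooled percentages in Table 15.6 can be recovered by weighting the subgroup percentages in Table 15.7 by their numbers of cases. The sketch below is illustrative only; the dictionary layout and the function name `pooled_high_share` are assumptions made for this example.

```python
# Table 15.7: (sex, marital status) -> (share in high-purchase category, cases)
TABLE_15_7 = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Not married"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Not married"): (0.60, 180),
}


def pooled_high_share(table, marital_status):
    """High-purchase percentage for one marital status, pooled over sex."""
    high = 0.0
    total = 0
    for (_sex, status), (share, cases) in table.items():
        if status == marital_status:
            high += share * cases   # respondents in the high-purchase category
            total += cases
    return 100 * high / total


married = pooled_high_share(TABLE_15_7, "Married")
unmarried = pooled_high_share(TABLE_15_7, "Not married")
print(f"married: {married:.1f}%  unmarried: {unmarried:.1f}%")
```

The pooled figures round to the 31% and 52% reported in Table 15.6, while the subgroup figures show that most of the gap comes from the female respondents.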


3 Variables Cross-Tab:
Initial Relationship was Spurious
• Table 15.8 shows that 32% of those with college degrees own an expensive automobile, compared with 21% of those without.

• Income may also be a factor.

• In Table 15.9, when the data for the high-income and low-income groups are examined separately, the association between education and ownership of expensive automobiles disappears.

• The initial relationship observed between these two variables was spurious.
Ownership of Expensive
Automobiles by Education
Level
Table 15.8
Own Expensive      Education
Automobile         College Degree    No College Degree
Yes                32%               21%
No                 68%               79%
Column totals      100%              100%
Number of cases    250               750


Ownership of Expensive Automobiles
by Education Level and Income Levels
Table 15.9
Own Expensive                                  Income
Automobile               Low Income                            High Income
                         College Degree   No College Degree    College Degree   No College Degree
Yes                      20%              20%                  40%              40%
No                       80%              80%                  60%              60%
Column totals            100%             100%                 100%             100%
Number of respondents    100              700                  150              50
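A short worked calculation may help show where the spurious association comes from. Weighting the ownership rates in Table 15.9 by the number of respondents in each income group recovers the pooled figures of Table 15.8:

```latex
\begin{aligned}
\text{College degree:} \quad
  & \frac{0.20 \times 100 + 0.40 \times 150}{100 + 150}
    = \frac{80}{250} = 32\% \\
\text{No college degree:} \quad
  & \frac{0.20 \times 700 + 0.40 \times 50}{700 + 50}
    = \frac{160}{750} \approx 21\%
\end{aligned}
```

Within each income group the ownership rate is the same for both education levels (20% or 40%). The pooled gap arises only because 150 of the 250 degree holders are in the high-income group, against 50 of the 750 respondents without a degree.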


3 Variables Cross-Tab:
Reveal Suppressed Association
• Table 15.10 shows no association between desire to travel abroad and age.
• In Table 15.11, sex was introduced as the third variable.
• Controlling for the effect of sex reveals the suppressed association between desire to travel abroad and age within the separate categories of males and females.
• Since the association between desire to travel abroad and age runs in the opposite direction for males and females, the relationship between these two variables is masked when the data are aggregated across sex as in Table 15.10.
Desire to Travel Abroad by Age
Table 15.10

Desire to Travel Abroad    Age
                           Less than 45    45 or More
Yes                        50%             50%
No                         50%             50%
Column totals              100%            100%
Number of respondents      500             500


Desire to Travel Abroad by
Age and Gender
Table 15.11



Three Variables Cross-Tabulations
No Change in Initial Relationship

• Consider the cross-tabulation of family size and the tendency to eat out frequently in fast-food restaurants, as shown in Table 15.12. No association is observed.

• When income was introduced as a third variable in the analysis, Table 15.13 was obtained. Again, no association was observed.


Eating Frequently in
Fast-Food Restaurants by Family Size
Table 15.12



Eating Frequently in Fast-Food Restaurants
by Family Size and Income
Table 15.13


Statistics Associated with
Cross-Tab: Chi-Square
• H0: there is no association between the two variables.
• Use the chi-square statistic to test H0.
• H0 is rejected when the calculated value of the test statistic is greater than the critical value of the chi-square distribution.


Statistics for Cross-Tab: Chi-Square
• From Table 3 in the Statistical Appendix, the probability of exceeding a chi-square value of 3.841, with 1 degree of freedom, is 0.05.
• The calculated chi-square is 3.333. Since this is less than the critical value of 3.841, the null hypothesis cannot be rejected.
• Thus, the association is not statistically significant at the 0.05 level.
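The chi-square calculation can be sketched in a few lines. Applied to the counts in Table 15.3, the Pearson statistic below (expected count = row total × column total / n, with no continuity correction) comes out at the 3.333 quoted above. The function name and the layout of the observed table are assumptions made for this example, and the critical value is simply the 3.841 quoted on the slide rather than one computed here.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, degrees_of_freedom


# Table 15.3: rows are Light and Heavy usage, columns are Male and Female.
OBSERVED = [[5, 10],
            [10, 5]]
CRITICAL_VALUE = 3.841  # 0.05 level, 1 degree of freedom, as quoted above

statistic, df = chi_square(OBSERVED)
reject_h0 = statistic > CRITICAL_VALUE
print(f"chi-square={statistic:.3f} df={df} reject H0: {reject_h0}")
```

Every expected count is 7.5 here, so each of the four cells contributes 6.25 / 7.5 to the statistic.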


Cross-Tabulation in Practice
While conducting cross-tabulation analysis in practice, it is useful to proceed along the following steps.
1. Test the null hypothesis that there is no association between the variables using the chi-square statistic. If you fail to reject the null hypothesis, then the data provide no evidence of a relationship.
2. If H0 is rejected, determine the strength of the association.
3. If H0 is rejected, interpret the pattern of the relationship by computing the percentages in the direction of the independent variable, across the dependent variable.
