Postmidterm Session PPTs
Variable View window: Label
• Label
Variable View window: Values
• Values
Defining the value labels
Data Input: Data View window
Output Viewer
Syntax editor
The Four Windows: Script Window
• Script Window
Provides the opportunity to write full-blown programs in a
BASIC-like language; a text editor for script composition.
Saved script files use the extension “sbs.”
• Step 1: Define variables;
• Step 2: Data input
• Step 3: Data cleaning
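Steps 1 and 3 can also be done from the Syntax editor instead of Variable View. A minimal sketch, assuming a small file with the gender and height variables from the practice exercise (names, labels, and the missing-value code are illustrative):

VARIABLE LABELS gender 'Gender of respondent'
  /height 'Height in feet'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES height (9).
EXECUTE.

Pasting syntax like this makes variable definitions repeatable and easy to document.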
Opening SPSS
• Start → All Programs → SPSS Inc → SPSS 18.0 → SPSS 18.0
• Typos
• Missing values
Cleaning your data – missing data
Practice 1
• How would you put the following information into
SPSS?
Name      Gender   Height
JAUNITA   2        5.4
SALLY     2        5.3
DONNA     2        5.6
SABRINA   2        5.7
JOHN      1        5.7
MARK      1        6
ERIC      1        6.4
BRUCE     1        5.9
• File
• Edit
• View
• Data
• Transform
• Analyze
• Graphs
• Utilities
• Add-ons
• Many of the statistical methods that
we will apply require the assumption
that a variable or variables are
normally distributed.
• There are both graphical and statistical
methods for evaluating normality.
[Figure: histogram of TOTAL TIME SPENT ON THE INTERNET (Mean = 10.7, Std. Dev. = 15.35, N = 93) and a normal Q-Q plot of Expected Normal against Observed Value.]
Tests of Normality

                                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                   Statistic   df   Sig.        Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET   .246        93   .000        .606        93   .000

a. Lilliefors Significance Correction
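This table can also be produced from the Syntax editor with EXAMINE; a minimal sketch, assuming the variable is named nettime (the name is illustrative):

EXAMINE VARIABLES=nettime
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES.

The NPPLOT keyword requests the Kolmogorov-Smirnov and Shapiro-Wilk tests along with the normal Q-Q plot. Here both significance values are .000, so normality is rejected and a transformation may be considered.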
• Three common transformations are:
• the logarithmic transformation,
• the square root transformation, and
• the inverse transformation.
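Each transformation can be created as a new variable with COMPUTE; a minimal sketch, again assuming a positively skewed variable named nettime (the +1 guards against taking the log or inverse of zero):

COMPUTE lgtime = LG10(nettime + 1).
COMPUTE sqtime = SQRT(nettime).
COMPUTE invtime = 1 / (nettime + 1).
EXECUTE.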
• Reliability: The degree of stability exhibited
when a measurement is repeated under identical
conditions
• Analyze > Scale > Reliability Analysis
• Select items for analysis
• Click “Statistics” and check “item” and
“scale if item deleted”
• Click continue
• Click OK
• See Output
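Equivalently, from the Syntax editor (item names are illustrative):

RELIABILITY
  /VARIABLES=item1 item2 item3 item4 item5
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL.

/SUMMARY=TOTAL is what produces the “scale if item deleted” columns used below.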
• Case Processing Summary – N is the number of test takers
• Reliability Statistics – Cronbach’s Alpha is our statistic: .50–.60
marginal, .61–.70 good, .71–.85 very good
• Item Statistics – average response for all test takers
• Item-Total Statistics – use to determine which items stay and which
get dropped
• Descriptive Statistics
• Graphical & Statistical methods
• Inferential Statistics
• Hypothesis & Hypothesis testing
• Type I and II errors
• High School and Beyond Study
http://www.psypress.com/ibm-spss-intro-stats
• hbsdata.sav
• 75 respondents
• Variables: grades, mathematics achievement, demographics
(father’s & mother’s education), mathematics attitude (14 items), etc.
•Descriptive Analysis
•Statistical methods (statistics)
•Graphical methods
Levels of Scale Measurement and Suggested Descriptive Statistics
• Understand the nature of the relationship between two variables.
• Test the distribution of a continuous variable for normality.
• Basics of SPSS
• Data cleaning and transformation
• Normality and Reliability test
• Recode
• Select cases
• Compute
• Research situations
• Collapsing categories
• Interval to nominal data
• Reverse coding
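In syntax, these recoding situations map onto the RECODE command; a minimal sketch (variable names, scale ranges, and cut-points are illustrative):

* Collapsing an education variable into three categories.
RECODE faed (1 thru 3=1) (4 thru 7=2) (8 thru 10=3) INTO faedRevis.
* Reverse coding a 1-4 attitude item.
RECODE item04 (1=4) (2=3) (3=2) (4=1) INTO item04r.
EXECUTE.

Recoding INTO a new variable keeps the original intact, which makes the recode easy to check.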
Data manipulation – compute new
variable
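A minimal syntax sketch for computing a new composite variable, assuming the 14 mathematics attitude items are named item01–item14 (names illustrative):

COMPUTE mathatt = MEAN(item01 TO item14).
EXECUTE.

MEAN() averages whatever items are non-missing, which is often preferable to SUM() when respondents skip items.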
Data manipulation – select
cases
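A minimal syntax sketch for selecting cases without discarding them, assuming gender is coded 1 = male (the coding is illustrative):

COMPUTE filter_male = (gender = 1).
FILTER BY filter_male.
* ... analyses here apply to males only ...
FILTER OFF.

FILTER temporarily hides the other cases; SELECT IF (gender = 1) would delete them permanently.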
Test of Association: Chi-square test
applications
•Goodness of fit of distribution
•Test of independence of
attributes
•Test of homogeneity
χ² = Σ (Oi − Ei)² / Ei

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

Eij = (Ri × Cj) / n

Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
d.f.=(R-1)(C-1)
Which soap-powder name do shoppers like
best?
Number of shoppers picking each name (observed frequencies):

      Washo   Scruba   Musty   Stainzoff   Beeo   Total
O:    40      35       5       10          10     100

If all five names were equally popular, the expected frequency for each would be 100 / 5 = 20.
Example

χ² = Σ (O − E)² / E

      Washo   Scruba   Musty   Stainzoff   Beeo   Total
O:    40      35       5       10          10     100
E:    20      20       20      20          20     100
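Completing the arithmetic (a worked step, not shown on the slides):

χ² = (40−20)²/20 + (35−20)²/20 + (5−20)²/20 + (10−20)²/20 + (10−20)²/20
   = 20 + 11.25 + 11.25 + 5 + 5 = 52.5

With d.f. = 5 − 1 = 4, the .05 critical value is 9.49, so shoppers’ preferences clearly differ across the five names.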
•Testing hypothesis about
relationship between two
categorical variables.
Is there an association between gender (male or female) and
soap powder (Washo, Musty, etc.)?
• Data for a random sample of 100 shoppers, 70
men and 30 women
• This gives a 2 x 5 contingency table.
• To calculate expected frequencies, apply Eij = (Ri × Cj) / n from above; e.g., the expected number of men choosing Washo is (70 × 40) / 100 = 28.
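A minimal syntax sketch for this test, assuming the variables are named gender and powder (names illustrative):

CROSSTABS
  /TABLES=gender BY powder
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.

/CELLS=COUNT EXPECTED prints the observed and expected frequency in each cell of the 2 × 5 table.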
Test of Difference: ANOVA, MANOVA, MANCOVA
• ANOVA means Analysis of Variance
• ANOVA: compare means of two or more
levels of the independent variable
• The partitioning of the total sum of squares of deviations: the total
sum of squares of deviations of the DV is split into components
attributable to independent variable 1, independent variable 2,
independent variable 3, and error.
• F-Test
• Used to determine whether there is more variability in the scores
of one sample than in the scores of another sample.
• Variance components are used to compute F-ratios
• SSE (error sum of squares), SSB (between-groups sum of squares), SST (total sum of squares)
• Research design
• Between-subjects design*: different individuals are assigned to
different groups (level of independent variable).
• Within-subjects design: all the participants are exposed to several
conditions.
• Data considerations
• Independent variable (factor variable)
is categorical.
• Dependent variable should be
quantitative (interval level of
measurement).
• Variances on dependent variable are
equal across groups.
• Research question: Is there a difference in sedentary behavior
across students in different grades?
• One independent variable: Grade with
4 levels: 9th, 10th, 11th, and 12th grade.
• One dependent variable: sedentary
behavior (hours spent watching TV)
• Running a one-way between-subjects ANOVA with SPSS.
• Select Analyze → General Linear Model → Univariate
• Move Q81 (sedentary behavior) into Dependent Variable
• Move Q3r (grade) into Fixed Factor(s)
• Click Post Hoc
• Post Hoc Comparisons
• A follow-up analysis, run when the F ratio for the independent
variable is significant, to determine which groups differ from which
• It assesses mean differences between all combinations of pairs of
groups (6 comparisons for 4 grades)
• Check Tukey checkbox
• Click Continue
• Options
Click Continue
Then click OK
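The point-and-click steps above correspond to this syntax (variable names as used in these slides):

UNIANOVA q81 BY q3r
  /POSTHOC=q3r(TUKEY)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=q3r.

/PRINT=ETASQ requests the partial eta-squared effect size reported in the Results paragraph below.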
• SPSS output
• If the overall F is statistically significant, which pair of means
are significantly different?
• SPSS output
• Results
The one-way ANOVA showed a statistically significant difference
across grade levels in sedentary behavior, F(3, 15709) = 26.86,
p < .01, partial η² = .01. A Tukey HSD test indicated that 9th
(M = 3.91, SD = 1.76) and 10th (M = 3.83, SD = 1.76) graders spent
more time watching TV on an average school day than 11th
(M = 3.65, SD = 1.71) and 12th (M = 3.61, SD = 1.71) graders did
(p < .01).
• Is there a statistically significant difference
among the three father’s-education groups on
maths achievement score?
• The two-way ANOVA compares the mean differences between
groups that have been split on two independent variables
(called factors).
• Do math grades and gender have a statistically significant
effect on math achievement score?
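A minimal syntax sketch for the two-way design, assuming variables named mathach, grades, and gender (names illustrative):

UNIANOVA mathach BY grades gender
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=grades gender grades*gender.

The /DESIGN line requests both main effects and the grades-by-gender interaction.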
MANOVA
[Figure: histogram of units of alcohol per week (Mean = 8.03, Std. Dev. = 12.952, N = 30).]
Ranks
• The practical difference between parametric and NP methods
is that NP methods use the ranks of values rather than the
actual values
• E.g.
1, 2, 3, 4, 5, 7, 13, 22, 38, 45 – actual
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 – rank
Median
• The median is the value above and below which 50% of the
data lie.
• If the data are ranked in order, it is the middle value
• In symmetric distributions the mean and median are the same
• In skewed distributions the median is more appropriate
• Cross-Tabulation (Contingency) Table
• A joint frequency distribution of observations on two or more variables.
• χ2 Distribution
• Provides a means for testing the statistical significance of a
contingency table.
• Involves comparing observed frequencies (Oi) with expected
frequencies (Ei) in each cell of the table.
• Captures the goodness- (or closeness-) of-fit of the observed
distribution with the expected distribution.
Z-Test for Differences of Proportions

Z = ((p1 − p2) − (π1 − π2)) / Sp1−p2

p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1
minus hypothesized population proportion 2
Sp1−p2 = pooled estimate of the standard error of the
difference in proportions
[Table: parametric tests (e.g., Pearson’s correlation) and their non-parametric equivalents.]
Wilcoxon tests
• Frank Wilcoxon was a chemist in the USA who wanted to
develop a test similar to the t-test but without the requirement
of a Normal distribution
• Presented his paper in 1945
• Wilcoxon Signed Rank ≡ paired t-test
Wilcoxon Signed Rank Test
• NP test relating to the median as measure of central
tendency
• The ranks of the absolute differences between the data
and the hypothesised median are calculated
• The ranks for the negative and the positive differences
are then summed separately (W- and W+ resp.)
• The minimum of these is the test statistic, W
The Wilcoxon Signed Rank Test
Example
• Example using SPSS:
A group of 10 patients with chronic anxiety receive
sessions of cognitive therapy. Quality of Life scores
are measured before and after therapy.
Wilcoxon Signed Rank Test example

QoL Score
Before   After   Diff   Rank   −/+
6        9       3      5.5    +
5        12      7      10     +
3        9       6      9      +
4        9       5      8      +
2        3       1      4      +
1        1       0      3      tied
3        2       −1     2      −
8        12      4      7      +
6        9       3      5.5    +
12       10      −2     1      −

(2 negative ranks, 7 positive ranks, 1 tie)
Wilcoxon Signed Rank Test (using SPSS)
• Are mother’s and father’s education levels
different?
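A minimal syntax sketch, assuming father’s and mother’s education are stored as faed and maed (names illustrative):

NPAR TESTS
  /WILCOXON=faed WITH maed (PAIRED).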
Wilcoxon Signed Rank
Test example
p < 0.05
Mann-Whitney test ≡ Wilcoxon Rank Sum (H. B. Mann)
• Used when we want to compare two unrelated or
INDEPENDENT groups
• For parametric data you would use the unpaired
(independent) samples t-test
• The assumptions of the t-test were:
1. The distribution of the measure in each group is approximately
Normally distributed
2. The variances are similar
Example
Plot histograms
Stem and leaf plot
Box-plot
Q-Q or P-P plot
[Figure: boxplots of units of alcohol per week by gender (Male vs. Female), with outlying cases flagged.]
Example (3)
Are those distributions symmetrical?
Definitely not!
Mann-Whitney (NS)
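A minimal syntax sketch, assuming the variables are named alcohol and gender with codes 1 and 2 (names and codes illustrative):

NPAR TESTS
  /M-W=alcohol BY gender(1 2).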
Chi-squared test
• Used when comparing 2 or more groups of categorical
or nominal data (as opposed to measured data)
• Already covered!
• In SPSS, the one-sample Chi-squared test compares observed vs.
expected frequencies in a single categorical variable
More than 2 groups
• So far we have been comparing 2
groups
• If we have 3 or more independent
groups and the data are not Normal we
need the NP equivalent to ANOVA
• If independent samples use Kruskal-
Wallis
• Same assumptions as before
More than 2 groups
(a) Kruskal-Wallis test:
• Similar to the Mann-Whitney test, except it
enables you to compare three or more groups
rather than just two.
• Different subjects are used for each group.
(b) Friedman's Test:
• Similar to the Wilcoxon test, except you can use
it with three or more conditions.
• Each subject does all of the experimental
conditions.
Kruskal-Wallis test
Example
Does it make any difference to students’
comprehension of statistics whether the lectures
are given in English, Spanish or French?
Step 1:
•Rank the scores, ignoring which group they belong to.
•Lowest score gets lowest rank.
•Tied scores get the average of the ranks they would otherwise
have obtained.
Step 2:
Find "Tc", the total of the ranks for each group.
Find H.
H = [12 / (N(N + 1))] × Σ (Tc² / nc) − 3(N + 1)

Σ (Tc² / nc) = 20²/4 + 40.5²/4 + 17.5²/4 = 586.62

H = [12 / (12 × 13)] × 586.62 − 3 × 13 = 6.12
Step 4:
Degrees of freedom are the number of groups minus one. Here, d.f. = 3 - 1 = 2.
Step 5:
Assessing the significance of H depends on the number of participants and the
number of groups.
Here, H is 6.12.
For 2 d.f., a Chi-Square of 5.99 has a probability of p = .05 of occurring by chance.
Our H is bigger than 5.99, and so even less likely to occur by chance!
H is 6.12, p < .05
Conclusion:
The three groups differ significantly; the language in which statistics is taught
does make a difference to the lecturer's intelligibility.
NB: the test merely tells you that the three groups differ; inspect the group
means or medians to decide how they differ.
English M = 22.25, Spanish M = 32.25, French M = 21.50.
Looks like lectures are more intelligible in Spanish than in either English or
French (which are similar to each other).
Using SPSS for the Kruskal-Wallis test:
Analyze > Nonparametric tests > Legacy Dialogs > k independent samples...
Independent measures -
one column gives scores, another column identifies which group each score belongs to:
“1” for “English”, “2” for “Spanish”, “3” for “French”.
Kruskal-Wallis test
(using SPSS)
• Are there statistically significant
difference among the three father’s
education groups on the competence
scale?
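A minimal syntax sketch, assuming the competence scale is named competence and the recoded father’s-education group is faedRevis, coded 1–3 (names illustrative):

NPAR TESTS
  /K-W=competence BY faedRevis(1 3).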
Using SPSS for the Kruskal-Wallis test:
SPSS output for the Kruskal-Wallis test:
Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3
Interrater Reliability
The extent to which the scores counted by different coders
correlate with each other. Agreement between coders can be
indexed with Cohen’s Kappa.

Aggression Code   Coder 1   Coder 2
Hit boy A         1         3
Hit boy B         3         3
Hit girl A        3         2
Hit girl B        1         1
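A minimal syntax sketch for Cohen’s Kappa on two coders’ ratings (variable names illustrative):

CROSSTABS
  /TABLES=coder1 BY coder2
  /STATISTICS=KAPPA.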
• Applied not to one item, but to groups of items
that are thought to measure different aspects of
the same concept
[Diagram: three forms of reliability — Test-Retest Reliability (the same Questionnaire 1 items administered twice), Reliability as Internal Consistency (agreement among Items 1–3 within Questionnaire 1), and Equivalent-Forms Reliability (Questionnaire 1 vs. Questionnaire 2).]
Interrater Reliability
• Cronbach’s alpha model
• Celebrity data
• Trustworthiness: honest, trustworthy, reliable,
sincere, dependable
• Expertise: Knowledgeable, qualified, experienced,
expert
• Attractiveness: beautiful, classy, elegant, sexy
• Analyze > Scale > Reliability Analysis
• Select items for analysis
• Click “Statistics” and check “item” and
“scale if item deleted”
• Click continue
• Click OK
• See Output
• Case Processing Summary – N is the number of test takers
• Reliability Statistics – Cronbach’s Alpha is our statistic: .50–.60
marginal, .61–.70 good, .71–.85 very good
• Item Statistics – average response for all test takers
• Item-Total Statistics – use to determine which items stay and which
get dropped
• Validity is measured in many forms
• Face validity
• Predictive validity
• Content validity
• Construct validity
• The extent to which the measured variable
appears to be an adequate measure of the
conceptual variable.
[Diagram: the measured variable — the item “I don’t like Japanese,” rated from 1 (Strongly Disagree) to 8 (Strongly Agree) — stands in for the conceptual variable “Discrimination towards Japanese.”]
• The degree to which the measured variable
appears to have adequately sampled from the
potential domain of questions that might relate to
the conceptual variable of interest
• Example conceptual domains: Sympathy; Verbal Aptitude
• Alternative Hypothesis
• Statement that indicates the
opposite of the null hypothesis.
• Relational hypotheses
• Examine how changes in one variable vary with
changes in another.
• Hypotheses about differences between groups
• Examine how some variable varies from one
group to another.
• Hypotheses about differences from some
standard
• Examine how some variable differs from some
preconceived standard.
• Univariate Statistical Analysis
• Tests of hypotheses involving only one
variable.
• Testing of statistical significance
• Bivariate Statistical Analysis
• Tests of hypotheses involving two variables.
• Multivariate Statistical Analysis
• Statistical analysis involving three or more
variables or sets of variables.
General purpose:   Explore relationships between variables  |  Description
Statistics:        t-test, ANOVA, correlation, regression   |  Mean, percentage, range
• The specifically stated hypothesis is derived from
the research objectives.
• A sample is obtained and the relevant variable is
measured.
• The measured sample value is compared to the
value either stated explicitly or implied in the
hypothesis.
• If the value is consistent with the hypothesis,
the hypothesis is supported.
• If the value is not consistent with the
hypothesis, the hypothesis is not supported.
• Significance Level
• p-value
• Significance Level
• A critical probability associated with a statistical
hypothesis test that indicates how likely an inference
supporting a difference between an observed value
and some statistical expectation is true.
• The acceptable level of Type I error.
• p-value
• Probability value, or the observed or computed
significance level.
• p-values are compared to significance levels to test
hypotheses.
• Higher p-values equal more support for the null hypothesis.
p-Values and Statistical Tests
• Type I Error
• An error caused by rejecting the null
hypothesis when it is true.
• Has a probability of alpha (α).
• Practically, a Type I error occurs when the
researcher concludes that a relationship or
difference exists in the population when in
reality it does not exist.
• Type II Error
• An error caused by failing to reject the null
hypothesis when the alternative hypothesis is
true.
• Has a probability of beta (β).
• Practically, a Type II error occurs when a
researcher concludes that no relationship or
difference exists when in fact one does exist.
Type I and Type II Errors in Hypothesis Testing
• Type of question to be answered
• Number of variables involved
• Level of scale measurement
• Parametric Statistics
• Involve numbers with known, continuous
distributions.
• Appropriate when:
• Data are interval or ratio scaled.
• Sample size is large.
• Nonparametric Statistics
• Appropriate when the variables being analyzed
do not conform to any known or continuous
distribution.
t test
• t-test
• A hypothesis test that uses the t-
distribution.
t = (X̄ − μH) / SX̄

t = t-test statistic
X̄ = sample mean
μH = hypothesized population mean
SX̄ = standard error of the sample mean
[Figure: sampling distribution centered on the hypothesized population mean, with the observed mean of 4.78 marked on a 1 (Not Important) to 5 (Very Important) scale.]
• Is the mean SAT score statistically
different from presumed population
mean 500?
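A minimal syntax sketch for this one-sample test, assuming the score variable is named sat (name illustrative):

T-TEST
  /TESTVAL=500
  /VARIABLES=sat.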
[Figure: mean importance ratings for two branches, “Grandville” (3.70) and “Branson” (4.78), on a 1 (Not Important) to 5 (Very Important) scale, for the item “How important is the Patronage Refund Program to you as a member/borrower with FCS?”]
• Do male and female students differ in
regard to their average maths
achievement score?
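A minimal syntax sketch, assuming math achievement is named mathach and gender is coded 0 = male, 1 = female (names and codes illustrative):

T-TEST GROUPS=gender(0 1)
  /VARIABLES=mathach.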
• Paired-samples t-test: permits the comparison of separate
questionnaire items from the same group of
respondents
• Correlation coefficient
• A statistical measure of the covariation, or association, between
two at-least interval variables.
• Covariance
• Extent to which two variables are associated systematically with
each other.
rxy = ryx = Σi=1..n (Xi − X̄)(Yi − Ȳ) / √[ Σi=1..n (Xi − X̄)² × Σi=1..n (Yi − Ȳ)² ]
• Pearson product-moment correlation coefficient (r)
• Ranges from +1 to -1
• Perfect positive linear relationship = +1
• Perfect negative (inverse) linear relationship = -1
• No correlation = 0
• Correlation coefficient for two variables (X,Y)
EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns
• When two variables covary, they display concomitant
variation.
• This systematic covariation does not in and of itself establish
causality.
• e.g., Rooster’s crow and the rising of the sun
• Rooster does not cause the sun to rise.
EXHIBIT 23.3
•Bivariate
•Partial
Spearman Rank Correlation
Coefficient (rs)
It is a non-parametric measure of correlation.
This procedure makes use of the two sets of
ranks that may be assigned to the sample
values of X and Y.
Procedure: rank the X values and the Y values separately, then
correlate the ranks; with d the difference between paired ranks,
rs = 1 − 6Σd² / (n(n² − 1)).
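A minimal syntax sketch (variable names illustrative):

NONPAR CORR
  /VARIABLES=x y
  /PRINT=SPEARMAN TWOTAIL NOSIG.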
R² = Explained variance / Total variance
• Correlation matrix
• The standard form for reporting
correlation coefficients for more than
two variables.
• Statistical Significance
• The procedure for determining statistical
significance is the t-test of the
significance of a correlation coefficient.
EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example
• Simple (Bivariate) Linear Regression
• A measure of linear association that investigates
straight-line relationships between a continuous
dependent variable and an independent variable that
is usually continuous, but can be a categorical dummy
variable.
• The Regression Equation (Y = α + βX )
• Y = the continuous dependent variable
• X = the independent variable
• α = the Y intercept (regression line intercepts Y axis)
• β = the slope coefficient (rise over run)
[Figure: scatter plot of Y against X with the fitted regression line Ŷ = â + β̂X.]
• Parameter Estimate Choices
• β is indicative of the strength and direction of the
relationship between the independent and dependent
variable.
• α (Y intercept) is a fixed point that is considered a
constant (how much Y can exist without X)
• Standardized Regression Coefficient (β)
• Estimated coefficient of the strength of relationship
between the independent and dependent variables.
• Expressed on a standardized scale where higher absolute
values indicate stronger relationships (range is from -1 to
1).
• Parameter Estimate Choices
• Raw regression estimates (b1)
• Raw regression weights have the advantage of retaining the scale
metric—which is also their key disadvantage.
• If the purpose of the regression analysis is forecasting, then raw
parameter estimates must be used.
• This is another way of saying when the researcher is interested only
in prediction.
• Standardized regression estimates (β)
• Standardized regression estimates have the advantage of a constant
scale.
• Standardized regression estimates should be used when the
researcher is testing explanatory hypotheses.
EXHIBIT 23.5 The Advantage of Standardized Regression Weights
EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins
• OLS
• Guarantees that the resulting straight line will produce the least
possible total error in using X to predict Y.
• Generates a straight line that minimizes the sum of squared deviations
of the actual values from this predicted regression line.
• No straight line can completely represent every dot in the scatter
diagram.
• There will be a discrepancy between most of the actual scores (each
dot) and the predicted score, Ŷ.
• Uses the criterion of attempting to make the least amount of total
error in prediction of Y from X.
The equation means that the predicted value for any value
of X (Xi) is determined as a function of the estimated intercept
coefficient, plus the estimated slope coefficient times Xi, plus
some error.
• Statistical Significance Of Regression Model
• F-test (regression)
• Determines whether more variability is explained by the regression or
unexplained by the regression.
• Statistical Significance Of Regression Model
• ANOVA Table:
• R2
• The proportion of variance in Y that is explained by X
(or vice versa)
• A measure obtained by squaring the correlation
coefficient; that proportion of the total variance of a
variable that is accounted for by knowing the value of
another variable.
R² = 3,398.49 / 3,882.40 = 0.875
EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued
23–
305
EXHIBIT 23.8 Simple Regression Results for Building Permit Example
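A minimal syntax sketch that produces output like Exhibit 23.8, assuming the variables are named sales and permits (names illustrative):

REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT sales
  /METHOD=ENTER permits.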
• Sample size
• Multicollinearity of independent variables
• Linearity
• Absence of outliers
• Homoscedasticity
• Normality
• Standard multiple regression
• Hierarchical regression
• Stepwise regression
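The three strategies differ only in how predictors enter the equation; a minimal syntax sketch (variable names illustrative):

* Standard: all predictors entered at once.
REGRESSION /DEPENDENT y /METHOD=ENTER x1 x2 x3.
* Hierarchical: blocks entered in an order chosen by the researcher.
REGRESSION /DEPENDENT y /METHOD=ENTER x1 x2 /METHOD=ENTER x3.
* Stepwise: SPSS selects the entry order statistically.
REGRESSION /DEPENDENT y /METHOD=STEPWISE x1 x2 x3.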
LEARNING OUTCOMES
Data Reduction or Dimension Reduction technique
• Data reduction tool
• Represents correlated variables with a smaller set
of “derived” variables.
• Factors are formed that are relatively
independent of one another.
• Two types of “variables”:
• latent variables: factors
• observed variables
Example: What’s Peter Griffin Like?
[Diagram: a latent variable (LV) with arrows pointing to its observed variables (OV).]
Factor Analysis:
Concept: Covariance
• It’s about the level of association
between a set of variables!
• A correlation coefficient is a
standardised covariance.
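In symbols (a standard identity, not shown on the slide):
rxy = Cov(X, Y) / (sX × sY), i.e., the covariance divided by the two standard deviations.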
• Types:
• Exploratory factor analysis (EFA)—performed when the
researcher is uncertain about how many factors may exist among
a set of variables.
• Confirmatory factor analysis (CFA)—performed when the
researcher has strong theoretical expectations about the factor
structure before performing the analysis.
EXHIBIT 24.6 A Simple Illustration of Factor Analysis
• Interdependency technique
• Also known as FA
• Explores the structure among a set of manifest (observed)
variables; no a priori reasoning is imposed regarding the
interrelationships between the observed variables.
Characteristics of EFA
• EFA seeks to resolve a large set of measured (manifest)
variables in terms of relatively few categories known as
factors which could be termed as constructs.
• There is no criterion or predictor subsets like those in
Multiple Regression since it is an interdependency
multivariate technique.
• It examines the overall association amongst variables
• It is based on linear correlation and assumes data on a
metric scale (interval or ratio). However, ordinal-scale
data are very often used.
• Subjectivity is involved in naming the factor (latent
variable) or the constructs.
• Factor
• Factor Loadings
• Communality
• Eigenvalue
• KMO Statistic
• Bartlett’s Test of Sphericity
A few important terms used in
Exploratory Factor Analysis
• Factor: It is an underlying dimension that accounts for
several observed variables.
• Bartlett Test of Sphericity:
used to test the hypothesis that the correlation matrix is an identity matrix (all
diagonal terms are 1 and all off-diagonal terms are 0).
If the value of the test statistic for sphericity is large and the associated
significance level is small, it is unlikely that the population correlation matrix
is an identity.
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:
is an index for comparing the magnitude of the observed correlation
coefficients to the magnitude of the partial correlation coefficients.
The closer the KMO measure is to 1, the better the sampling adequacy (.8
and higher is great, .7 is acceptable, .6 is mediocre, less than .5 is
unacceptable).
• Extraction refers to the process of obtaining the underlying
factors or components.
To decide how many factors are needed to represent the data,
examine the Total Variance Explained table.
The examination of the Scree plot
provides a visual of the total
variance associated with each
factor.
• Two types
• Orthogonal
• Oblique
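A minimal syntax sketch tying these pieces together — KMO and Bartlett’s test, extraction, scree plot, and an orthogonal (varimax) rotation — assuming 14 attitude items named item01–item14 (names illustrative):

FACTOR
  /VARIABLES item01 TO item14
  /PRINT INITIAL KMO EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION VARIMAX.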
• CFA is used to test whether the factor
structure matches the results of the
original study
Structural Equations
Modeling
Nature of SEM
•Manifest variables
•Latent variables
•Error
•Disturbance
•Exogenous Variables
•Endogenous or dependent
variable
Basic Terminologies of SEM (Contd..)
•Single-headed arrow
•Double-headed arrow
•Rectangle or Square
•Circle or Ellipse
Components of SEM
[Diagram: a latent variable RB (relationship benefits), with indicators RB-1 (Reduced anxiety), RB-2 (Recognition), and RB-3 (Special discount), predicting a latent variable CL (customer loyalty), with indicators L-2 (Recommend to others), L-3 (First choice to purchase), and L-4 (Do not switch).]
Zi = b1X1i + b2X2i + … + bnXni

Z = b1X1 + b2X2 + b3X3 = 0.069X1 + 0.013X2 + 0.0007X3
Discriminant analysis using
SPSS
• Analyze → Classify → Discriminant
• Grouping Variable → Define Range → 0, 1 → OK
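The equivalent syntax, assuming the conflict group is coded 0/1 and the predictors are named workhours, children, and experience (names illustrative):

DISCRIMINANT
  /GROUPS=conflict(0 1)
  /VARIABLES=workhours children experience
  /ANALYSIS ALL
  /STATISTICS=RAW TABLE.

/STATISTICS=RAW TABLE prints the unstandardized function coefficients and the classification results.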
Example: Family Conflict and
working hours
The work-family conflict of an individual is influenced by
• The number of working hours, since an increase in office hours
would increase stress levels and leave less time for family affairs
• The number of children, since an increase in the number of
children leads to more responsibilities at home, thereby
encroaching on office responsibilities
• The years of work experience, since an increase in work
experience leads to more involvement and responsibilities at work
Work Family Conflict = -9.098 + 0.520 (no of work hours) + 0.467
(no of children) + 0.409 (no of yrs of work experience)
• This indicates that the variable “number of working hours” has the
greatest discriminant weight.
• This can be confirmed with the standardized discriminant
function.
• Wilks’ lambda is found to be 0.637 and the F value is
statistically significant at 0.001, indicating good discriminating
power of the discriminant equation.
This table summarizes the
analysis dataset in terms of
valid and excluded cases.
In this example, 39 out of 40
observations are valid and
taken into consideration.
EXHIBIT 24.7 Clusters of Individuals on Two Dimensions
Distance measures for individual observations
• With many characteristics (e.g. income, age, consumption habits, brand
loyalty, purchase frequency, family composition, education level, ..), it
becomes more difficult to define similarity with a single value
• The best-known measure of distance is the Euclidean
distance, which is the concept we use in everyday life for
spatial coordinates.
Model:
Data: each object is characterized by a set of numbers
(measurements);
e.g., object 1: (x11, x12, … , x1n)
object 2: (x21, x22, … , x2n)
: :
object p: (xp1, xp2, … , xpn)
Distance: Euclidean distance, dij = √[ Σk=1..n (xik − xjk)² ]
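As a worked illustration (not from the slides): for two objects measured on two characteristics, i = (1, 2) and j = (4, 6),
dij = √((1 − 4)² + (2 − 6)²) = √(9 + 16) = 5.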