Postmidterm Session PPTs

The document provides an overview of SPSS, a statistical software package used for data manipulation and analysis, detailing its features such as the data editor, output viewer, and syntax editor. It covers essential processes including data input, cleaning, normality testing, reliability analysis, and various statistical tests such as ANOVA and chi-square tests. It also emphasizes the importance of understanding variable distributions and the assumptions required for different statistical methods.

• Gaining an understanding of the data
• Identifying outliers
• Examining variables (for normal distribution)
What is SPSS?

• One of the most popular statistical packages; it can perform highly complex data manipulation and analysis with simple instructions
The Windows:
Data editor
Output viewer
Syntax editor
Data Editor
Opening SPSS
• The default window will have the data editor
• There are two sheets in the window:
1. Data view 2. Variable view
Define Variables: Variable View
window
Variable View window: Type
• Type
Variable View window: Width
• Width
Variable View window: Decimals
• Decimals

Example: 3.14159265
Variable View window: Label
• Label
Variable View window: Values
• Values
Defining the value labels
Data Input: Data View window
Output Viewer
Syntax editor
The Four Windows: Script Window
• Script Window
Provides the opportunity to write full-blown programs in a BASIC-like language. Text editor for syntax composition. The extension of the saved file will be ".sbs".
• Step 1: Define variables;
• Step 2: Data input
• Step 3: Data cleaning

Opening SPSS
• Start → All Programs → SPSS Inc → SPSS 18.0 → SPSS 18.0
• Typos
• Missing values

Cleaning your data – missing data
Practice 1
• How would you put the following information into
SPSS?
Name      Gender   Height
JAUNITA      2       5.4
SALLY        2       5.3
DONNA        2       5.6
SABRINA      2       5.7
JOHN         1       5.7
MARK         1       6.0
ERIC         1       6.4
BRUCE        1       5.9
• File
• Edit
• View
• Data
• Transform
• Analyse
• Graph
• Utilities
• Add-ons
• Many of the statistical methods that
we will apply require the assumption
that a variable or variables are
normally distributed.
• There are both graphical and statistical
methods for evaluating normality.

• Graphical methods include the histogram and normality plot.
To compute the statistics
needed for evaluating the
normality of a variable, select
the Explore… command from
the Descriptive Statistics
menu.
First, click on the variable to be included in the analysis to highlight it.

Second, click on the right arrow button to move the highlighted variable to the Dependent List.
To select the statistics for the
output, click on the
Statistics… command button.
First, click on the Descriptives checkbox to select it. Clear the other checkboxes.

Second, click on the Continue button to complete the request for statistics.
To select the diagnostic charts
for the output, click on the
Plots… command button.
First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.

Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.

Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.

Finally, click on the Continue button to complete the request.
Click on the OK button to
complete the specifications
for the analysis and request
SPSS to produce the
output.
Histogram

[Histogram: TOTAL TIME SPENT ON THE INTERNET, frequency vs. hours (0 to 100); Std. Dev = 15.35, Mean = 10.7, N = 93]


[Normal Q-Q plot of TOTAL TIME SPENT ON THE INTERNET: expected normal value against observed value]
Tests of Normality

                                   Kolmogorov-Smirnov(a)      Shapiro-Wilk
                                   Statistic   df   Sig.   Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET     .246      93   .000     .606      93   .000

a. Lilliefors Significance Correction
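As a cross-check outside SPSS, the same two normality tests can be run in Python with scipy. This is a minimal sketch on simulated data; the variable name and values are placeholders, and scipy's plain Kolmogorov-Smirnov test does not apply SPSS's Lilliefors correction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
internet_hours = rng.exponential(scale=10.7, size=93)   # skewed placeholder sample

# Shapiro-Wilk test of normality
w_stat, p_sw = stats.shapiro(internet_hours)

# K-S test against a standard normal after standardizing;
# note this is NOT Lilliefors-corrected, so p differs slightly from SPSS
z = (internet_hours - internet_hours.mean()) / internet_hours.std(ddof=1)
d_stat, p_ks = stats.kstest(z, "norm")

print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_sw:.4f}")
print(f"Kolmogorov-Smirnov D = {d_stat:.3f}, p = {p_ks:.4f}")
```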
• Three common transformations are:
• the logarithmic transformation,
• the square root transformation, and
• the inverse transformation.
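A quick sketch of the three transformations with numpy; the sample values are hypothetical, and the domain restrictions are noted in the comments:

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 40.0])  # positively skewed values
log_x  = np.log10(x)   # logarithmic transformation (values must be > 0)
sqrt_x = np.sqrt(x)    # square root transformation (values must be >= 0)
inv_x  = 1.0 / x       # inverse transformation (values must be != 0)
```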
• Reliability: The degree of stability exhibited when a measurement is repeated under identical conditions.

• Validity: How well a survey measures what it sets out to measure.
• Cronbach’s alpha model
• Celebrity data
• Trustworthiness: honest, trustworthy, reliable,
sincere, dependable
• Expertise: Knowledgeable, qualified, experienced,
expert
• Attractiveness: beautiful, classy, elegant, sexy

• Analyze > Scale > Reliability Analysis
• Select items for analysis
• Click “Statistics” and check “item” and
“scale if item deleted”
• Click continue
• Click OK
• See Output

• Case Processing Summary – N is the number of test takers
• Reliability Statistics – Cronbach's alpha is our statistic: .50-.60 marginal, .61-.70 good, .71-.85 very good
• Item Statistics – average response for all test takers
• Item-Total Statistics – use to determine which items stay or get dropped
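For readers working outside SPSS's Reliability Analysis dialog, Cronbach's alpha can be computed directly from its definition. This is a hand-rolled sketch; the small response matrix is invented for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = scale items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# hypothetical responses to five items (e.g. the trustworthiness adjectives)
items = np.array([[5, 4, 5, 4, 5],
                  [3, 3, 4, 3, 3],
                  [4, 4, 4, 5, 4],
                  [2, 3, 2, 2, 3]])
print(f"alpha = {cronbach_alpha(items):.2f}")
```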

• Descriptive Statistics
• Graphical & Statistical methods
• Inferential Statistics
• Hypothesis & Hypothesis testing
• Type I and II errors
• High School and Beyond Study
  http://www.psypress.com/ibm-spss-intro-stats
• hbsdata.sav
• 75 respondents
• Variables: grades, mathematics achievement, demographics (father's & mother's education), mathematics attitude (14 items)…
• Descriptive Analysis
• Statistical methods (statistics)
• Graphical methods

Levels of Scale Measurement and Suggested Descriptive Statistics
• Understand nature of relationship between two variables.
• To test distribution of continuous variable
for normality
• Basics of SPSS
• Data cleaning and transformation
• Normality and Reliability test
• Recode
• Select cases
• Compute

• Research situations
• Collapsing categories
• Interval to nominal data
• Reverse coding

Data manipulation – compute new
variable

Data manipulation – select
cases
Test of Association: Chi-square test
applications
• Goodness of fit of a distribution
• Test of independence of attributes
• Test of homogeneity

χ² = Σi (Oi − Ei)² / Ei

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

Eij = (Ri × Cj) / n

Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
d.f. = (R − 1)(C − 1)
Which soap-powder name do shoppers like
best?
Number of shoppers picking each name
(observed frequencies):
           Washo   Scruba   Musty   Stainzoff   Beeo   Total
Observed:    40      35       5        10        10     100

Expected frequency for each category = total observations / number of categories = 100 / 5 = 20.
Example
 O  E 
2 2

  E
Washo Scruba Musty Stainzoff Beeo total
O: 40 35 5 10 10 100
E: 20 20 20 20 20
100

(O-E): 20 15 -15 -10 -10

(O-E) 22 400 225 225 100 100


O  E 
E 20 11.25 11.25 5 5
Tables show how likely various values of 2 are to occur
by chance. e.g.:
probability level:
• d.f. .05 .01 .001
• 1 3.84 6.63 10.83
• 2 5.99 9.21 13.82
• 3 7.81 11.34 16.27
• 4 9.49 13.28 18.46
• 5 11.07 etc. etc.
• 52.5 is bigger than 18.46, a value of 2 which will
occur by chance less than 1 times in a 1000 (p<.001).
• The chi-square test for goodness-of-fit uses
frequency data from a sample to test hypotheses
about the shape or proportions of a population.
• Each individual in the sample is classified into
one category on the scale of measurement.
• The data, called observed frequencies, simply
count how many individuals from the sample are
in each category.
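The soap-powder goodness-of-fit test worked above can be reproduced with scipy; a minimal sketch:

```python
from scipy.stats import chisquare

observed = [40, 35, 5, 10, 10]      # Washo, Scruba, Musty, Stainzoff, Beeo
expected = [20] * 5                 # 100 shoppers spread over 5 names
chi2, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.1f}, p = {p:.2g}")   # chi-square = 52.5, p < .001
```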
•Testing hypothesis about
relationship between two
categorical variables.

Is there an association between gender (male or female) and soap powder (Washo, Musty, etc.)?
• Data for a random sample of 100 shoppers: 70 men and 30 women
• This gives a 2 x 5 contingency table.
• To calculate expected frequencies:

  E = (row total × column total) / grand total
Using exactly the same formula as before, we get 2 =
52.94.

d.f. = (number of rows - 1) * (number of columns - 1).


We have two rows and five columns,
so d.f. = (2-1) * (5-1) = 4 d.f.

Use the same table to assess the chances of obtaining a


Chi-Squared value as large as this by chance; again p<
.001.

Conclusion: our observed frequencies are significantly


different from the frequencies we would expect to obtain
if there were no association between the two variables:
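A sketch of the same test of independence in Python. The slides give only the marginals (70 men, 30 women, and the name totals), so the individual cell counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows: men (n = 70), women (n = 30); columns: the five soap-powder names
table = np.array([[35, 30, 1, 2, 2],
                  [ 5,  5, 4, 8, 8]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3g}")   # df = 4
```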
• Do males and females differ on whether they
have high or low maths grade?
• Z-Test for Differences of Proportions
• Tests the hypothesis that proportions are significantly different for two independent samples or groups.
• Requires a sample size greater than thirty.
• The hypothesis is Ho: π1 = π2, which may be restated as Ho: π1 − π2 = 0.
Test of Difference: ANOVA, MANOVA, MANCOVA
• ANOVA means Analysis of Variance
• ANOVA: compare means of two or more levels of the independent variable
• The partitioning of the total sum of squares of deviations:

[Diagram: the total sum of squares of deviations of the DV is partitioned into components for independent variable 1, independent variable 2, independent variable 3, and error.]
F-Test
• Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
• Variance components are used to compute F-ratios
• SSE, SSB, SST

F = variance between groups / variance within groups
• Research design
• Between-subjects design*: different individuals are assigned to
different groups (level of independent variable).
• Within-subjects design: all the participants are exposed to several
conditions.

• Data considerations
• Independent variable (factor variable)
is categorical.
• Dependent variable should be
quantitative (interval level of
measurement).
• Variances on the dependent variable are equal across groups
• Research question: Is there a difference in
sedentary behavior across different grade
students?
• One independent variable: Grade with
4 levels: 9th, 10th, 11th, and 12th grade.
• One dependent variable: sedentary
behavior ( How many hours watch TV)

• Running a one-way between-subjects ANOVA with SPSS.
• Select Analyze General Linear Model Univariate
• Move Q81
• Move Q3r
• Click Post Hoc

• Post Hoc Comparisons
• This analysis assesses mean differences between all combinations of pairs of groups (6 comparisons)
• Used if the F ratio for the independent variable is significant
• To determine which groups differ from which
• It is a follow-up analysis
• Check the Tukey checkbox
• Click Continue
• Options
In the Display box, check:
• Descriptive statistics
• Estimates of effect size
• Homogeneity test

Click Continue, then click OK.
• SPSS output

• SPSS output

Levene's test is about equal variances. p = .48 means homogeneous variances across the four groups.
• SPSS output

There was a significant difference across the four grades (Q3r) in Q81, with grade accounting for 1% of the total variance in Q81.
• If the overall F is statistically significant, which pair of means
are significantly different?

• SPSS output

• Results
The one-way ANOVA showed there was a statistically significant difference across grade levels in sedentary behavior, F(3, 15709) = 26.86, p < .01, partial η² = .01. A Tukey HSD test indicated that 9th (M = 3.91, SD = 1.76) and 10th (M = 3.83, SD = 1.76) graders spent more time watching TV on an average school day than 11th (M = 3.65, SD = 1.71) and 12th (M = 3.61, SD = 1.71) graders did (p < .01).
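A sketch of the same style of one-way ANOVA in Python; the four samples are small hypothetical stand-ins for the grade groups, not the study data:

```python
from scipy.stats import f_oneway

# hypothetical hours-of-TV samples for the four grade levels
grade9  = [4.1, 3.9, 4.5, 3.6, 4.0]
grade10 = [3.9, 3.7, 4.1, 3.8, 3.6]
grade11 = [3.5, 3.7, 3.4, 3.8, 3.9]
grade12 = [3.4, 3.6, 3.7, 3.5, 3.8]

F, p = f_oneway(grade9, grade10, grade11, grade12)
print(f"F = {F:.2f}, p = {p:.3f}")
# For follow-up Tukey HSD comparisons, see
# statsmodels.stats.multicomp.pairwise_tukeyhsd.
```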

• Is there a statistically significant difference between the three father's-education groups on maths achievement score?
• The two-way ANOVA compares the mean differences between
groups that have been split on two independent variables
(called factors).

• The primary purpose of a two-way ANOVA is to


understand if there is an interaction between the two
independent variables on the dependent variable.

• Do math grades and gender seem to have a statistically significant effect on math achievement score?

MANOVA

• Multivariate Analysis of Variance, usually known as MANOVA, can be considered an extension of ANOVA.
• In MANOVA there is more than one dependent variable, and the independent variables are categorical.
• MANOVA is categorized as a dependence multivariate technique that measures differences in two or more metric dependent variables with respect to a set of categorical (non-metric) independent variables.
Characteristics of MANOVA
• In MANOVA the dependent variable is more than one
and the independent variables are categorical.
• The general specification of MANOVA is as follows:
  Y1 + Y2 + … + Yn = X1 + X2 + … + Xn
  where the Yi are measured on a metric scale of measurement and the Xi are measured on a non-metric scale of measurement.
• The output of MANOVA is the difference in the dependent variables across the groups defined by the non-metric independent variables.
DATA STRUCTURE AND SAMPLE SIZE IN
MANOVA
• MANOVA techniques are used in many experimental designs where treatments are measured and represented (quantified) on a non-metric scale of measurement.
• The scale is classificatory or nominal, and these variables are considered the independent variables in MANOVA.
• The differences in the metric dependent variables across the groups, that is, across the independent variables, are what MANOVA highlights.
• However, it may be mentioned here that in many practical situations data on an ordinal scale of measurement, also referred to as non-metric data, are used on the dependent side along with the metric data.
DATA STRUCTURE AND SAMPLE SIZE IN
MANOVA

• The sample size of every empirical research study has a bearing on the validity, reliability, and power to generalize the outcome. The sample size refers to the number of observations to be included in the exercise.
• In MANOVA the sample size requirement may relate to the individual groups and to the total sample size.
• Some thumb rules for sample sizes to conduct MANOVA are as follows:
SAMPLE SIZE FOR MANOVA

1. The minimum number of observations in each group should be greater than the number of dependent variables used in MANOVA. This is the minimum requirement; it is better to have at least 20 observations for each group.
2. The total sample size is the sum of the number of observations in each group. For example, suppose we are trying to find the differences of technology in production across a few manufacturing plants. If we have three technologies in the exercise and 20 observations for each group, the total will be 60 observations. This could be considered a good sample size for the analysis.
3. The sample sizes across the groups should be more or less equal for better power of generalization of the results.
Steps in Estimation of MANOVA
(Using SPSS)
• The steps relate to the collection of a relevant data set suitable for MANOVA, tabulating it in a proper format, and following the sequence of options in the software package.
• It may be mentioned again that MANOVA uses multiple variables on both sides, dependent and independent, so the data should be arranged in that order.
• On the dependent side, the variables included are measured on a ratio, interval, or ordinal scale; the independent variables are normally measured on a nominal or classificatory scale.
• These data are tabulated accordingly and entered in the spreadsheet of the statistical package. After data entry, the following sequence is used to estimate MANOVA.
1. Select Analyze. Select General Linear Model and then Multivariate, followed by MANOVA.
• A screen appears from which Y1 and Y2 can be selected as dependent variables and X1 and X2 as fixed factors (independent variables). Choose Descriptive Statistics, Estimates of Effect Size, Observed Power, and Homogeneity Tests from the Display block and select Continue.
Interpretation of output
• Interpretation of the output for MANOVA depends on the objectives of the researcher and the options chosen during estimation. A few outputs and their interpretations are as follows:
1. SPSS MANOVA output: descriptive statistics. This output gives the mean and SD of both dependent variables across the independent variables. The differences may be observed from this table of output.
2. Box's test of equality of covariance matrices. This statistic tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
3. Other relevant statistics for interpretation are Pillai's trace and Wilks' lambda. These are used to test the null hypotheses of equality of the means and variances across the groups. The significance of the F-values is used to test the hypothesis.
4. Levene's test (F-test) is used to test the null hypothesis that the error variance of the dependent variable is equal across groups.
Post Hoc Test
• In MANOVA we test whether the dependent variables vary across the independent variables.
• Post hoc tests are statistical tests of mean differences performed after the statistical tests mentioned earlier have been performed.
• Post hoc tests are used to find differences among all possible combinations of the groups used in the analysis. They give a lot of information on the pairs of variables through multiple statistical tests.
• Type I error can be inflated in such cases, so the researcher uses a high degree of confidence for testing the hypotheses. For example, significance levels close to zero may be chosen when accepting or rejecting the null hypotheses.
• If you want to know from the data whether the individual
dependent variable varies across the independent variables, select
Post Hoc. A table (box) appears.
• Send the independent variables to Post Hoc Test For.
• Select LSD from Equal Variances Assumed Block.
• Select Continue. The main screen appears again.
• Select OK on the main screen.
• Output window opens.
MANCOVA

• In ANOVA and MANOVA, metric dependent variables and non-metric independent variables are used. The independent variables are normally measured on the nominal scale, signifying the groups.
• ANOVA and ANCOVA differ in that in the latter, metric independent variables are also used.
• The same logic applies in the case of MANOVA.
• We know there can be more than one dependent variable and one or more categorical variables in MANOVA.
• If we introduce a metric variable on the independent side, the technique becomes MANCOVA.
MANCOVA

• MANCOVA can be considered an extension of MANOVA where, along with the categorical independent variables, some metric independent variables (covariates) are also used.
• In this context, the concern of the researcher is the number of covariates that can be used in a model.
• There is a thumb rule that suggests the maximum number of covariates.
• The interpretation of the statistics is similar to MANOVA.
Non-parametric tests
• ‘Non-parametric’ tests were developed for
these situations where fewer assumptions have
to be made
• Sometimes called Distribution-free tests
• Non Parametric tests STILL have assumptions
but are less stringent
• NP tests can be applied to Normal data but
parametric tests have greater power IF
assumptions met
A positively skewed distribution

[Histogram: units of alcohol per week (0 to 50) vs. frequency; Mean = 8.03, Std. Dev. = 12.952, N = 30]
Ranks
• Practical differences between
parametric and NP are that NP
methods use the ranks of values
rather than the actual values
• E.g.
1,2,3,4,5,7,13,22,38,45 - actual
1,2,3,4,5,6, 7, 8, 9,10 - rank
Median
• The median is the value above and below which 50% of the
data lie.
• If the data is ranked in order, it is the middle value
• In symmetric distributions the mean and median are the
same
• In skewed distributions, median more appropriate
• Cross-Tabulation (Contingency) Table
• A joint frequency distribution of observations on two or more variables.
• χ² Distribution
• Provides a means for testing the statistical significance of a contingency table.
• Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.
• Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.
• Do males and females differ on whether they
have high or low maths grade?
Z-Test statistic for differences in large random samples:

Z = [(p1 − p2) − (π1 − π2)] / S(p1−p2)

p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2
S(p1−p2) = pooled estimate of the standard error of the difference in proportions
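A minimal sketch of this z-test computed from the formula above with a pooled standard error; the counts are hypothetical:

```python
import numpy as np
from scipy.stats import norm

x1, n1 = 45, 100            # successes / sample size, group 1
x2, n2 = 30, 100            # successes / sample size, group 2
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))   # S(p1-p2)
z = (p1 - p2) / se                                    # (pi1 - pi2) = 0 under H0
p_value = 2 * norm.sf(abs(z))                         # two-tailed p
print(f"z = {z:.2f}, p = {p_value:.3f}")
```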
Parametric / Non-parametric

Parametric Tests                  Non-parametric Tests
Paired sample t-test              Paired: Wilcoxon signed-rank
2 independent samples t-test
One-way Analysis of Variance
Pearson's correlation
Wilcoxon tests
• Frank Wilcoxon was a chemist in the USA who wanted to develop a test similar to the t-test but without the requirement of a Normal distribution
• Presented his paper in 1945
• Wilcoxon signed-rank test ≡ paired t-test
Wilcoxon Signed Rank Test
• NP test relating to the median as the measure of central tendency
• The ranks of the absolute differences between the data and the hypothesised median are calculated
• The ranks for the negative and the positive differences are then summed separately (W− and W+ respectively)
• The minimum of these is the test statistic, W
The Wilcoxon Signed Rank Test
Example
• Example using SPSS:
A group of 10 patients with chronic anxiety receive
sessions of cognitive therapy. Quality of Life scores
are measured before and after therapy.
Wilcoxon signed-rank test example

QoL score
Before   After   Diff   Rank   −/+
  6        9       3     5.5    +
  5       12       7     10     +
  3        9       6      9     +
  4        9       5      8     +
  2        3       1      4     +
  1        1       0      3    tied
  3        2      −1      2     −
  8       12       4      7     +
  6        9       3     5.5    +
 12       10      −2      1     −

W− = 2, W+ = 7, 1 tied
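The same example can be checked with scipy's implementation, a minimal sketch using the before/after scores from the table (the zero difference is dropped by the function's default settings):

```python
from scipy.stats import wilcoxon

before = [6, 5, 3, 4, 2, 1, 3, 8, 6, 12]
after  = [9, 12, 9, 9, 3, 1, 2, 12, 9, 10]
stat, p = wilcoxon(before, after)
print(f"W = {stat:.1f}, p = {p:.3f}")
```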
Wilcoxon Signed Rank
Test(using SPSS)
• Are mother’s and father’s education levels
different?
Wilcoxon Signed Rank
Test example
p < 0.05
Parametric / Non-parametric

Parametric Tests                  Non-parametric Tests
Paired sample t-test              Paired: Wilcoxon signed-rank
2 independent samples t-test      Mann-Whitney test (note: sometimes called the Wilcoxon rank-sum test!)
One-way Analysis of Variance
Pearson's correlation
Mann-Whitney test ≡ Wilcoxon rank-sum test (H. B. Mann)
• Used when we want to compare two unrelated or INDEPENDENT groups
• For parametric data you would use the unpaired (independent) samples t-test
• The assumptions of the t-test were:
1. The distribution of the measure in each group is approximately Normally distributed
2. The variances are similar
Example

The following data show the number of beverage units consumed per week, collected in a survey:

Men (n = 13): 0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0
Women (n = 14): 0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0

Is the amount greater in men compared to women?
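A sketch of this comparison with scipy's Mann-Whitney U test, one-sided in the direction "men drink more":

```python
from scipy.stats import mannwhitneyu

men   = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]
U, p = mannwhitneyu(men, women, alternative="greater")
print(f"U = {U:.1f}, p = {p:.3f}")
```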
Example
How would you test whether the
distributions in both groups are
approximately Normally distributed?

• Plot histograms
• Stem-and-leaf plot
• Box-plot
• Q-Q or P-P plot
[Boxplots of units of alcohol per week (0 to 50) by gender, male vs. female, with outlying cases marked]
Example (3)
Are those distributions symmetrical? Definitely not!

They are both highly skewed, so not Normal. If a transformation is still not Normal, then use a non-parametric test – Mann-Whitney.

The plots suggest that males tend to have a higher intake than females.
Mann-Whitney on SPSS
•Do boys and girls differ
significantly on maths grade?
Normal approximation (NS)
Mann-Whitney (NS)
Chi-squared test
• Used when comparing 2 or more groups of categorical
or nominal data (as opposed to measured data)
• Already covered!
• In SPSS Chi-squared test is test of observed vs.
expected in single categorical variable
Parametric / Non-parametric

Parametric Tests                  Non-parametric Tests
Paired sample t-test              Paired: Wilcoxon signed-rank
2 independent samples t-test      Mann-Whitney test (note: sometimes called the Wilcoxon rank-sum test!)
One-way Analysis of Variance      Kruskal-Wallis
Pearson's correlation
More than 2 groups
• So far we have been comparing 2
groups
• If we have 3 or more independent
groups and data is not Normal we need
NP equivalent to ANOVA
• If independent samples use Kruskal-
Wallis
• Same assumptions as before
More than 2 groups
(a) Kruskal-Wallis test:
• Similar to the Mann-Whitney test, except it
enables you to compare three or more groups
rather than just two.
• Different subjects are used for each group.
(b) Friedman's Test:
• Similar to the Wilcoxon test, except you can use
it with three or more conditions.
• Each subject does all of the experimental
conditions.
Kruskal-Wallis test
Example
Does it make any difference to students’
comprehension of statistics whether the lectures
are given in English, Spanish or French?

Group A: lectures in English;


Group B: lectures in Spanish
Group C: lectures in French.

DV: student rating of lecturer's intelligibility on


100-point scale ("0" = "incomprehensible").

Ratings - so use a nonparametric test.


English            Spanish            French
raw     rank       raw     rank       raw     rank
 20      3.5        25      7.5        19      1.5
 27      9          33     10          20      3.5
 19      1.5        35     11          25      7.5
 23      6          36     12          22      5

M = 22.25          M = 32.25          M = 21.50
SD = 3.59          SD = 4.99          SD = 2.65

Step 1:
•Rank the scores, ignoring which group they belong to.
•Lowest score gets lowest rank.
•Tied scores get the average of the ranks they would otherwise
have obtained.
Step 2:
Find "Tc", the total of the ranks for each group.
Tc1 (the total for the English group) is 20.
Tc2 (for the Spanish group) is 40.5.
Tc3 (for the French group) is 17.5.

Step 3:
Find H:

H = [12 / (N(N + 1))] × Σ(Tc² / nc) − 3(N + 1)

N is the total number of subjects;
Tc is the rank total for each group;
nc is the number of subjects in each group.

Σ(Tc² / nc) = 20²/4 + 40.5²/4 + 17.5²/4 = 586.62

H = [12 / (12 × 13)] × 586.62 − 3 × 13 = 6.12
Step 4:
Degrees of freedom are the number of groups minus one. Here, d.f. = 3 - 1 = 2.

Step 5:
Assessing the significance of H depends on the number of participants and the
number of groups.

(a) If you have 3 groups and N in each group is 5 or less:


Special tables exist for small sample sizes – but you really should run more
participants!

(b) If N in each group is larger than 5:


Treat H as Chi-Square.
H is statistically significant if it is larger than the critical value of Chi-Square for
these d.f.

Here, H is 6.12.
For 2 d.f., a Chi-Square of 5.99 has a probability of p = .05 of occurring by chance.
Our H is bigger than 5.99, and so even less likely to occur by chance!
H is 6.12, p < .05

Conclusion:
The three groups differ significantly; the language in which statistics is taught does make a difference to the lecturer's intelligibility.

NB: the test merely tells you that the three groups differ; inspect the group means or medians to decide how they differ.
English M = 22.25, Spanish M = 32.25, French M = 21.50.

It looks like lectures are more intelligible in Spanish than in either English or French (which are similar to each other).
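A sketch of the same test with scipy. Because scipy applies the tie correction, H comes out at about 6.19 (SPSS's value) rather than the uncorrected hand-worked 6.12:

```python
from scipy.stats import kruskal

english = [20, 27, 19, 23]
spanish = [25, 33, 35, 36]
french  = [19, 20, 25, 22]
H, p = kruskal(english, spanish, french)
print(f"H = {H:.2f}, p = {p:.3f}")   # H ~ 6.19, p ~ .045
```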
Using SPSS for the Kruskal-Wallis test:
Analyze > Nonparametric tests > Legacy Dialogs > k independent samples...

Independent measures -
one column gives scores, another column identifies which group each score belongs to:
“1” for “English”, “2” for “Spanish”, “3” for “French”.
Kruskal-Wallis test
(using SPSS)
• Is there a statistically significant difference among the three father's-education groups on the competence scale?
Using SPSS for the Kruskal-Wallis test :
SPSS output for Kruskal-Wallis test:

Ranks
                 language   N    Mean Rank
intelligibility  English    4       5.00
                 Spanish    4      10.13
                 French     4       4.38
                 Total     12

Test Statistics(a,b)
               intelligibility
Chi-Square          6.190
df                  2
Asymp. Sig.         .045
a. Kruskal Wallis Test
b. Grouping Variable: language

Students' intelligibility ratings were significantly affected by which language statistics was taught in, χ²(2) = 6.19, p = .04.
• Used to predict a discrete outcome based on variables which
are discrete, continuous or mixed
• Types of Hypothesis
• Hypothesis-testing procedure
• Significance Levels and p-values
• Decision Making Risks
• Parametric versus Nonparametric
Tests
• Reliability is assessed in many forms
• Test-retest reliability
• Interrater reliability
• Alternate-form reliability
• Internal consistency reliability
Test-retest reliability example (same respondent, two administrations):

     Questionnaire 9/20                             Questionnaire 9/27
4    I feel I do not have much to be proud of   4   I feel I do not have much to be proud of
3    On the whole, I am satisfied with myself   4   On the whole, I am satisfied with myself
2    I certainly feel useless at times          1   I certainly feel useless at times
1    At times I think I am no good at all       1   At times I think I am no good at all
4    I have a number of good qualities          4   I have a number of good qualities
3    I am able to do things as well as others   4   I am able to do things as well as others
Alternate-form reliability
Example: Assessment of depression
Circle one item
Version A:
During the past 4 weeks, I have felt downhearted:
Every day 1
Some days 2
Never 3

Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3
Interrater Reliability
The extent to which the scores counted by coders correlate with each other.

How do you measure interrater reliability? Cohen's Kappa.

Aggression Code   Coder 1   Coder 2
Hit boy A            1         3
Hit boy B            3         3
Hit girl A           3         2
Hit girl B           1         1
• Applied not to one item, but to groups of items
that are thought to measure different aspects of
the same concept
Reliability as Internal Consistency

[Diagram: correlations among Items 1-3 of Questionnaire 1 at one administration illustrate internal consistency; the same items re-administered later illustrate test-retest reliability; correlations between Questionnaire 1 and Questionnaire 2 (Items 1-3) illustrate equivalent-forms reliability; agreement between coders illustrates interrater reliability.]
• Cronbach’s alpha model
• Celebrity data
• Trustworthiness: honest, trustworthy, reliable,
sincere, dependable
• Expertise: Knowledgeable, qualified, experienced,
expert
• Attractiveness: beautiful, classy, elegant, sexy

• Analyze > Scale > Reliability Analysis
• Select items for analysis
• Click “Statistics” and check “item” and
“scale if item deleted”
• Click continue
• Click OK
• See Output

• Case Processing Summary – N is the number of test takers
• Reliability Statistics – Cronbach's alpha is our statistic: .50-.60 marginal, .61-.70 good, .71-.85 very good
• Item Statistics – average response for all test takers
• Item-Total Statistics – use to determine which items stay or get dropped

• Validity is measured in many forms
• Face validity
• Predictive validity
• Content validity
• Construct validity
• The extent to which the measured variable
appears to be an adequate measure of the
conceptual variable.
Example item: "I don't like Japanese" (Strongly Disagree 1 2 3 4 5 6 7 8 Strongly Agree)

[Diagram: the measured variable appears, on its face, to reflect the conceptual variable "discrimination towards Japanese".]
• The degree to which the measured variable appears to have adequately sampled from the potential domain of questions that might relate to the conceptual variable of interest
[Diagram: the conceptual variable "intelligence" should be sampled across its domain, e.g. verbal aptitude and math aptitude, but not sympathy.]

• Predictive validity
• Concurrent validity
• Most valuable and most difficult measure
of validity
• Basically, it is a measure of how
meaningful the scale or instrument is
when it is in practical use
• Convergent validity
• Discriminant validity
• Null Hypothesis
• Statement about the status quo.
• No difference in sample and
population.

• Alternative Hypothesis
• Statement that indicates the
opposite of the null hypothesis.
• Relational hypotheses
• Examine how changes in one variable vary with
changes in another.
• Hypotheses about differences between groups

• Examine how some variable varies from one
group to another.
• Hypotheses about differences from some
standard
• Examine how some variable differs from some
preconceived standard.
• Univariate Statistical Analysis
• Tests of hypotheses involving only one
variable.
• Testing of statistical significance
• Bivariate Statistical Analysis
• Tests of hypotheses involving two variables.
• Multivariate Statistical Analysis
• Statistical analysis involving three or more
variables or sets of variables.
General purpose:      Explore relationships between variables              Description
Specific purpose:     Compare groups    Strength of association            Summarize data
Type of hypothesis:   Difference        Associations                       Descriptive
Statistics:           t-test, ANOVA     Correlation, regression            Mean, percentage, range
• The specifically stated hypothesis is derived from
the research objectives.
• A sample is obtained and the relevant variable is
measured.
• The measured sample value is compared to the
value either stated explicitly or implied in the
hypothesis.
• If the value is consistent with the hypothesis,
the hypothesis is supported.
• If the value is not consistent with the
hypothesis, the hypothesis is not supported.
• Significance Level
• p-value

• Significance Level
• A critical probability associated with a statistical
hypothesis test that indicates how likely an inference
supporting a difference between an observed value
and some statistical expectation is true.
• The acceptable level of Type I error.

• p-value
• Probability value, or the observed or computed
significance level.
• p-values are compared to significance levels to test
hypotheses.
• Higher p-values mean weaker evidence against the null hypothesis.
p-Values and Statistical Tests
• Type I Error
• An error caused by rejecting the null
hypothesis when it is true.
• Has a probability of alpha (α).
• Practically, a Type I error occurs when the
researcher concludes that a relationship or
difference exists in the population when in
reality it does not exist.
• Type II Error
• An error caused by failing to reject the null
hypothesis when the alternative hypothesis is
true.
• Has a probability of beta (β).
• Practically, a Type II error occurs when a
researcher concludes that no relationship or
difference exists when in fact one does exist.
Type I and Type II Errors in Hypothesis Testing
• Type of question to be answered
• Number of variables involved
• Level of scale measurement

• Parametric Statistics
• Involve numbers with known, continuous
distributions.
• Appropriate when:
• Data are interval or ratio scaled.
• Sample size is large.
• Nonparametric Statistics
• Appropriate when the variables being analyzed do not conform to any known or continuous distribution.
t-test
• A hypothesis test that uses the t-distribution.

t = (x̄ − μH) / sx̄

t = test statistic
x̄ = sample mean
μH = hypothesized population mean
sx̄ = standard error of the mean


t-test: Comparing Group Means
• One-sample t-test (univariate testing)
• Independent samples t-test (bivariate analysis)
• Paired sample t-test (bivariate analysis)
• High School and Beyond Study
  http://www.psypress.com/ibm-spss-intro-stats
• hbsdata.sav
• 75 respondents
• Variables: grades, mathematics achievement, demographics (father's & mother's education), mathematics attitude (14 items)…
[Figure: hypothesized population mean and its sampling distribution, with the observed mean 4.78 on a 1 (Not Important) to 5 (Very Important) scale]
• Is the mean SAT score statistically
different from presumed population
mean 500?
[Figure: "Grandville" mean 3.70 vs. "Branson" mean 4.78 on a 1 (Not Important) to 5 (Very Important) scale]

"How important is the Patronage Refund Program to you as a member/borrower with FCS?"
• Do male and female students differ in
regard to their average maths
achievement score?
Paired samples t-test
• Permits the comparison of separate questionnaire items from the same group of respondents
• Allows hypothesis tests that responses to two different questions were identical
• Do students' fathers and mothers differ in education level?
• Reliability: The degree of stability exhibited when a measurement is repeated under identical conditions.

• Validity: How well a survey measures what it sets out to measure.
• Apply and interpret simple bivariate and partial
correlations
• Interpret a correlation matrix
• Understand simple (bivariate) regression
• Understand the least-squares estimation
technique
• Interpret regression output including the tests of
hypotheses tied to specific parameter coefficients
• Measures of Association
• The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.
• Pearson product-moment correlation analysis
• Simple regression can accommodate either less-than-interval or interval independent variables, but the dependent variable must be continuous.
• Correlation coefficient
• A statistical measure of the covariation, or association, between two at-least-interval variables.
• Covariance
• Extent to which two variables are associated systematically with each other.

rxy = ryx = Σi (Xi − X̄)(Yi − Ȳ) / √[ Σi (Xi − X̄)² · Σi (Yi − Ȳ)² ]
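A sketch of the same coefficient with scipy; the x and y values are hypothetical:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
```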
• Pearson product-moment correlation coefficient (r)
• Ranges from +1 to -1
• Perfect positive linear relationship = +1
• Perfect negative (inverse) linear relationship = -1
• No correlation = 0
• Correlation coefficient for two variables (X,Y)

EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns

• When two variables covary, they display concomitant
variation.
• This systematic covariation does not in and of itself establish
causality.
• e.g., Rooster’s crow and the rising of the sun
• Rooster does not cause the sun to rise.

EXHIBIT 23.3

•Bivariate
•Partial
Spearman Rank Correlation
Coefficient (rs)
It is a non-parametric measure of correlation. This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.

Procedure:
• Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
• Rank the values of Y from 1 to n.
• Compute di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
• Square each di and compute Σdi², the sum of the squared values.
Apply the following formula:

rs = 1 − (6 Σdi²) / (n(n² − 1))

The value of rs denotes the magnitude and nature of the association, with the same interpretation as simple r.
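A sketch showing the rank formula above and scipy's spearmanr agreeing on hypothetical, tie-free data:

```python
import numpy as np
from scipy.stats import spearmanr, rankdata

x = np.array([10, 20, 30, 40, 50])
y = np.array([1, 3, 2, 5, 4])
d = rankdata(x) - rankdata(y)          # rank differences di
n = len(x)
rs_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))   # assumes no ties
rs_scipy, p = spearmanr(x, y)
print(rs_formula, rs_scipy)            # identical when there are no ties
```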
• Coefficient of Determination (R²)
• A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
• Measures the part of the total variance of Y that is accounted for by knowing the value of X.

R² = Explained variance / Total variance
• Correlation matrix
• The standard form for reporting
correlation coefficients for more than
two variables.
• Statistical Significance
• The procedure for determining statistical
significance is the t-test of the
significance of a correlation coefficient.
EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example
• Simple (Bivariate) Linear Regression
• A measure of linear association that investigates
straight-line relationships between a continuous
dependent variable and an independent variable that
is usually continuous, but can be a categorical dummy
variable.
• The Regression Equation: Y = α + βX
• Y = the continuous dependent variable
• X = the independent variable
• α = the Y intercept (where the regression line intercepts the Y axis)
• β = the slope coefficient (rise over run)
[Figure: scatter plot of Y (80 to 130) against X (80 to 170) with the fitted regression line Ŷ = â + β̂X]
• Parameter Estimate Choices
• β is indicative of the strength and direction of the
relationship between the independent and dependent
variable.
• α (Y intercept) is a fixed point that is considered a
constant (how much Y can exist without X)
• Standardized Regression Coefficient (β)
• Estimated coefficient of the strength of relationship
between the independent and dependent variables.
• Expressed on a standardized scale where higher absolute
values indicate stronger relationships (range is from -1 to
1).
• Parameter Estimate Choices
• Raw regression estimates (b1)
• Raw regression weights have the advantage of retaining the scale
metric—which is also their key disadvantage.
• If the purpose of the regression analysis is forecasting, then raw
parameter estimates must be used.
• This is another way of saying when the researcher is interested only
in prediction.
• Standardized regression estimates (β)
• Standardized regression estimates have the advantage of a constant
scale.
• Standardized regression estimates should be used when the
researcher is testing explanatory hypotheses.

EXHIBIT 23.5 The Advantage of Standardized Regression Weights

EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins

• OLS
• Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.
• Generates a straight line that minimizes the sum of squared deviations of the actual values from the predicted regression line.
• No straight line can completely represent every dot in the scatter diagram.
• There will be a discrepancy between most of the actual scores (each dot) and the predicted score.
• Uses the criterion of attempting to make the least amount of total error in predicting Y from X.
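A minimal sketch of a least-squares fit in Python; the x (permits) and y (sales) values are invented, not the book's data:

```python
from scipy.stats import linregress

x = [80, 95, 100, 110, 125, 140, 160]    # independent variable (e.g. permits)
y = [85, 93, 99, 105, 118, 128, 142]     # dependent variable (e.g. sales)
fit = linregress(x, y)
print(f"Y-hat = {fit.intercept:.2f} + {fit.slope:.3f} X, "
      f"R^2 = {fit.rvalue**2:.3f}")
```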
The equation means that the predicted value for any value of X (Xi) is determined as a function of the estimated slope coefficient, plus the estimated intercept coefficient, plus some error.
• Statistical Significance Of Regression Model
• F-test (regression)
• Determines whether more variability is explained by the regression or
unexplained by the regression.

• Statistical Significance Of Regression Model
• ANOVA Table:

• R²
• The proportion of variance in Y that is explained by X (or vice versa)
• A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable that is accounted for by knowing the value of another variable.

R² = 3,398.49 / 3,882.40 = 0.875
EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued

EXHIBIT 23.8 Simple Regression Results for Building Permit Example

• Sample size
• Multicollinearity of independent variables
• Linearity
• Absence of outliers
• Homoscedasticity
• Normality
• Standard multiple regression
• Hierarchical regression
• Stepwise regression
LEARNING OUTCOMES

• Experiment design and computations: MANOVA, MANCOVA, ANCOVA
• Cluster analysis: a classification technique
• Interpret analysis results in SPSS
Data
Reduction or Dimension
Reduction technique
• Data reduction tool
• Represents correlated variables with a smaller set
of “derived” variables.
• Factors are formed that are relatively
independent of one another.
• Two types of “variables”:
• latent variables: factors
• observed variables
Example: What’s Peter Griffin Like?
What’s Peter Griffin Like?

• Well, he sticks his tongue in fans.
• He mixes his cereal with Red Bull.
• He litters all around.
Example: Psychology

• How has John been feeling recently?
• He feels sad all the time
• He talks of committing suicide
• He has lost interest in activities he used to enjoy
• No motivation
Factor Analysis: Concept

[Diagram: a latent variable (LV) underlying several observed variables (OV)]
Factor Analysis:
Concept: Covariance
• It’s about the level of association
between a set of variables!

• A correlation coefficient is a
standardised covariance.
• Types:
• Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.
• Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.
EXHIBIT 24.6 A Simple Illustration of Factor Analysis

• Interdependency technique
• Also known as FA
• Explores the structure among a set of manifest (observed) variables; no a priori reasoning is assumed regarding the interrelationships between the observed variables.
Characteristics of EFA
• EFA seeks to resolve a large set of measured (manifest)
variables in terms of relatively few categories known as
factors which could be termed as constructs.
• There is no criterion or predictor subsets like those in
Multiple Regression since it is an interdependency
multivariate technique.
• It examines the overall association amongst variables
• It is based on linear correlation and assumes data on a metric scale (interval or ratio). However, ordinal-scale data are very often used.
• Subjectivity is involved in naming the factor (latent
variable) or the constructs.
• Factor
• Factor Loadings
• Communality
• Eigenvalue
• KMO Statistic
• Bartlett's Test of Sphericity
A few important terms used in
Exploratory Factor Analysis
• Factor: It is an underlying dimension that accounts for
several observed variables.

• Factor Loadings: These values appear in the component matrix. They explain how closely the variables are related to each of the factors discovered. They are also known as factor-variable correlations.

• Communality (h²): It shows how much of each variable is accounted for by the underlying factor(s) taken into consideration. It is the sum of the squared factor loadings on all extracted factors for a variable. Thus, there is a difference between the initial and extracted communalities.
• Factor analysis usually proceeds in four steps:
• 1st step: the correlation matrix for all variables is computed
• 2nd step: factor extraction
• 3rd step: factor rotation
• 4th step: make final decisions about the number of underlying factors
• Correlation matrix
• Bartlett Test of Sphericity: tests the hypothesis that the correlation matrix is an identity matrix
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy
• Bartlett Test of Sphericity:
• Used to test the hypothesis that the correlation matrix is an identity matrix (all diagonal terms are 1 and all off-diagonal terms are 0).
• If the value of the test statistic for sphericity is large and the associated significance level is small, it is unlikely that the population correlation matrix is an identity.
• If the hypothesis that the population correlation matrix is an identity cannot be rejected because the observed significance level is large, the use of the factor model should be reconsidered.
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:
• An index for comparing the magnitude of the observed correlation coefficients to the magnitude of the partial correlation coefficients.
• The closer the KMO measure is to 1, the more sizeable the sampling adequacy (.8 and higher is great, .7 is acceptable, .6 is mediocre, less than .5 is unacceptable).
• Reasonably large values are needed for a good factor analysis. Small KMO values indicate that a factor analysis of the variables may not be a good idea.
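For readers outside SPSS, Bartlett's test of sphericity can be computed from its standard chi-square approximation; a hand-rolled sketch on simulated data:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X: np.ndarray):
    """X: cases-by-variables data matrix. Returns (statistic, p-value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 1] += 0.8 * X[:, 0]          # induce some correlation between variables
print(bartlett_sphericity(X))
```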
• Extraction refers to the process of obtaining the underlying factors or components.
• To decide how many factors we need to represent the data, we use two statistical criteria: the eigenvalues and the scree plot.

Total Variance Explained
                 Initial Eigenvalues                  Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
    1       3.046      30.465          30.465      3.046      30.465          30.465
    2       1.801      18.011          48.476      1.801      18.011          48.476
    3       1.009      10.091          58.566      1.009      10.091          58.566
    4        .934       9.336          67.902
    5        .840       8.404          76.307
    6        .711       7.107          83.414
    7        .574       5.737          89.151
    8        .440       4.396          93.547
    9        .337       3.368          96.915
   10        .308       3.085         100.000
Extraction Method: Principal Component Analysis.
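A sketch of obtaining the eigenvalue ("Total") and percent-of-variance columns with scikit-learn's PCA on standardized, simulated data; the matrix X is hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                     # placeholder data matrix
pca = PCA().fit(StandardScaler().fit_transform(X))
eigenvalues = pca.explained_variance_              # Kaiser rule: keep > 1
pct = pca.explained_variance_ratio_ * 100          # "% of Variance" column
print(np.round(eigenvalues, 3))
print(np.round(np.cumsum(pct), 1))                 # "Cumulative %" column
```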
• The examination of the scree plot provides a visual of the total variance associated with each factor.
• The steep slope shows the large factors.
• Two types
• Orthogonal
• Oblique
• CFA is used to test whether the factor
structure matches the results of the
original study
Structural Equations
Modeling
Nature of SEM
• Structural Equation Modeling (SEM) is a multivariate technique that incorporates analysis using both dependency and interdependency techniques and tries to validate a structure.
Types of SEM

SEM can broadly be divided into two types:
• covariance-based SEM (CB-SEM), and
• the partial least squares approach to SEM (PLS-SEM).
Basic Terminologies of SEM

• Manifest variables
• Latent variables
• Error
• Disturbance
• Exogenous variables
• Endogenous (dependent) variables
Basic Terminologies of SEM (Contd..)

• Measurement model: the part of the specified model that relates indicators to latent variables. It is the factor-analytic part of SEM.
• Structural model: the part of the model that relates variables or factors to one another. It is the regression part of SEM.
Basic Terminologies of SEM (Contd)

• Confirmatory Factor Analysis (CFA): an extension of EFA. It provides quantitative measures of the reliability and validity of the constructs used in the model. These constructs may be the outcome of an EFA.
• Covariance structure: the relationships between the variables based on their variances and covariances.
• Mean structure: the means (intercepts) estimated in the model.
Diagrams used in SEM

• Single-headed arrow
• Double-headed arrow
• Rectangle or square
• Circle or ellipse
Components of SEM

• SEM is estimated in different stages. It consists of two major components:
• the measurement model (CFA), and
• the structural model (path analysis).
Statistical Tests in SEM

• Goodness of fit is used to find how close the estimated covariance matrix is to the observed covariance matrix.
• Validity and reliability of the measurement model are tested with the relevant statistics.
• Significance and meaningfulness of the structural relationships between the variables are tested.
Assessing the Measurement Model

Check the goodness of fit of the measurement model. This is checked with a few statistics, with the following rules of thumb for goodness of fit:
• CMIN/df: a value less than 2 is preferred, but between 2 and 5 is considered acceptable.
• CFI, GFI, and NFI: a value equal to or greater than 0.90 is preferred.
• PCFI, PGFI, and PNFI: a value equal to or greater than 0.50 is preferred.
• RMSEA: a value less than 0.08 is preferred.
[Path diagram: relationship-benefit indicators RB-1 (reduced anxiety), RB-2 (recognition), RB-3 (special discount), and RB-4 (better price) load on the latent construct RB, which predicts the latent construct CL (loyalty), measured by L1 (positive WOM), L2 (recommend to others), L3 (first choice to purchase), L4 (do not switch), L5 (wait for the product), L6 (overlook minor …), and L7 (encourage friends).]
To calculate discriminant scores, the linear function used is:

Zi = b1 X1i + b2 X2i + … + bn Xni

Z = b1 X1 + b2 X2 + b3 X3 = 0.069 X1 + 0.013 X2 + 0.0007 X3
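A sketch of estimating discriminant coefficients in Python with scikit-learn's LDA. Its coef_ values are decision-function weights, closely related to but scaled differently from SPSS's unstandardized discriminant weights, and the data below are simulated:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))    # columns: e.g. work hours, children, experience
y = (X @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=40) > 0).astype(int)

lda = LinearDiscriminantAnalysis().fit(X, y)
print("weights b1..b3:", lda.coef_.ravel())
print("intercept:", lda.intercept_)
print("predicted group of first case:", lda.predict(X[:1]))
```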
Discriminant analysis using
SPSS
• Analyse - Classify - Discriminant
• Group Variables – Define Range – 0, 1 – OK
Example: Family Conflict and
working hours
The work-family conflict of an individual is influenced by:
• The number of working hours, since an increase in office hours increases stress levels and leaves less time for family affairs
• The number of children, since an increase in the number of children leads to more responsibilities at home, thereby encroaching on office responsibilities
• The years of work experience, since an increase in work experience leads to more involvement and responsibilities at work

Work-Family Conflict = −9.098 + 0.520 (no. of work hours) + 0.467 (no. of children) + 0.409 (no. of years of work experience)
• This indicates that the variable "no. of working hours" has the greatest discriminant weight.
• This can be confirmed with the standardized discriminant function.
• Wilks' lambda is found to be 0.637 and the F value is statistically significant at 0.001, indicating good discriminating power of the discriminant equation.
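As a quick illustration with hypothetical values: for a respondent working 10 hours a day, with 2 children and 5 years of work experience, the discriminant score would be

$Z = -9.098 + 0.520(10) + 0.467(2) + 0.409(5) = -9.098 + 5.200 + 0.934 + 2.045 = -0.919$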
This table summarizes the analysis dataset in terms of valid and excluded cases. In this example, 39 out of 40 observations are valid and taken into consideration.

The discriminating power of the three independent variables in dividing the cases into the two groups (jobs) is 63.7%. The significance of the discriminant function is also very high.
This table indicates that 79.5% of the valid cases have been correctly classified using the three independent variables.
The variable working hours has the greatest discriminating power.

Work experience has the maximum shared variance with the function.
Interdependence Method

• Cluster analysis
• A multivariate approach for grouping observations based on similarity among measured variables.

EXHIBIT 24.7 Clusters of Individuals on Two Dimensions
Distance measures for individual observations

• To measure similarity between two observations, a distance measure is needed.
• With a single variable, similarity is straightforward.
• Example: income - two individuals are similar if their income levels are similar, and the level of dissimilarity increases as the income gap increases.
• Multiple variables require an aggregate distance measure.
• With many characteristics (e.g. income, age, consumption habits, brand loyalty, purchase frequency, family composition, education level, …), it becomes more difficult to define similarity with a single value.
• The best-known measure of distance is the Euclidean distance, which is the concept we use in everyday life for spatial coordinates.
Model:
Data: each object is characterized by a set of numbers (measurements);
e.g., object 1: (x11, x12, …, x1n)
      object 2: (x21, x22, …, x2n)
      ⋮
      object p: (xp1, xp2, …, xpn)

Distance: the Euclidean distance between objects i and j is

$d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{in} - x_{jn})^2}$
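For instance, for two hypothetical respondents measured on three standardized variables, say (1, 2, 3) and (4, 6, 3):

$d = \sqrt{(1-4)^2 + (2-6)^2 + (3-3)^2} = \sqrt{9 + 16 + 0} = \sqrt{25} = 5$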
Cluster Analysis Using SPSS

• An example using demographic variables in a study on expenditure on clothes, with about 400 sample respondents in a market segment of a city.
• Variables such as age, occupation, income, and expenditure on clothing have been used to carve out the clusters.
• Either hierarchical clustering or K-means clustering can be used; a minimal syntax sketch follows.
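A K-means sketch in SPSS syntax, assuming hypothetical metric variable names for the measures above (a categorical variable such as occupation would first need recoding; the hierarchical alternative is the CLUSTER command):

* Hypothetical names: age, income, clothing_spend; 3 clusters requested.
QUICK CLUSTER age income clothing_spend
  /CRITERIA=CLUSTER(3) MXITER(20)
  /SAVE CLUSTER
  /PRINT INITIAL ANOVA.
* SAVE CLUSTER appends each case's cluster membership to the data file.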
The summary of analysis for the study is as follows:

• Cluster 1 - Number of respondents: 160
Students (Postgraduate/Management), age 20-30 yrs, annual spending on clothes Rs 4,000/-, household income Rs 8-12 lakh

• Cluster 2 - Number of respondents: 201
Service, age more than 40 yrs, annual spending on clothes Rs 8,000/-, household income Rs 8-10 lakh

• Cluster 3 - Number of respondents: 35
Students (graduate), age 20-30 yrs, household income Rs 8-10 lakh, annual spending on clothes Rs 3,500/-

A few cases could not be classified.
Data Reduction or Dimension Reduction Technique

• A data reduction tool
• Represents correlated variables with a smaller set of “derived” variables
• Factors are formed that are relatively independent of one another
• Two types of “variables”:
• latent variables: factors
• observed variables
Example: What’s Peter Griffin Like?

• Well, he sticks his tongue in fans.
• He mixes his cereal with Red Bull.
• He litters all around.
Example: Psychology

• How has John been feeling recently?
• He feels sad all the time.
• He talks of committing suicide.
• He has lost interest in activities he used to enjoy.
• He has no motivation.
Factor Analysis: Concept

[Diagram: a latent variable (LV) underlying several observed variables (OV).]
Factor Analysis: Concept - Covariance

• It’s about the level of association between a set of variables!
• A correlation coefficient is a standardised covariance.
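In symbols, for two variables X and Y with standard deviations $s_X$ and $s_Y$:

$r_{XY} = \dfrac{\mathrm{cov}(X, Y)}{s_X\, s_Y}$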
• Types:
• Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.
• Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.

EXHIBIT 24.6 A Simple Illustration of Factor Analysis
• An interdependency technique
• Also known as FA
• The structure among a set of manifest (observed) variables is explored; no a priori reasoning is considered regarding the interrelationships between the observed variables.
Characteristics of EFA
• EFA seeks to resolve a large set of measured (manifest) variables in terms of relatively few categories, known as factors, which could be termed constructs.
• There are no criterion or predictor subsets like those in multiple regression, since it is an interdependency multivariate technique.
• It examines the overall association among the variables.
• It is based on linear correlation and assumes data on a metric scale (interval or ratio). However, ordinal-scale data are very often used.
• Subjectivity is involved in naming the factors (latent variables) or the constructs.
• Factor
• Factor loadings
• Communality
• Eigenvalue
• KMO statistic
• Bartlett’s test of sphericity
A Few Important Terms Used in Exploratory Factor Analysis

• Factor: an underlying dimension that accounts for several observed variables.

• Factor loadings: these values appear in the component matrix. They explain how closely the variables are related to each one of the factors discovered. A loading is also known as a factor-variable correlation.

• Communality (h²): shows how much of each variable is accounted for by the underlying factor(s) taken into consideration. It is the sum of a variable’s squared factor loadings on all extracted factors; hence the initial and extracted communalities differ.
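In symbols, for variable i with loadings $\lambda_{i1}, \dots, \lambda_{im}$ on the m extracted factors:

$h_i^2 = \lambda_{i1}^2 + \lambda_{i2}^2 + \dots + \lambda_{im}^2$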
• Factor analysis usually proceeds in four steps:
• 1st step: the correlation matrix for all variables is computed
• 2nd step: factor extraction
• 3rd step: factor rotation
• 4th step: final decisions are made about the number of underlying factors
• Correlation matrix
• Bartlett’s test of sphericity: tests the hypothesis that the correlation matrix is an identity matrix
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy
• Bartlett’s test of sphericity:
 used to test the hypothesis that the correlation matrix is an identity matrix (all diagonal terms are 1 and all off-diagonal terms are 0).
 if the value of the test statistic for sphericity is large and the associated significance level is small, it is unlikely that the population correlation matrix is an identity.
 if the hypothesis that the population correlation matrix is an identity cannot be rejected because the observed significance level is large, the use of the factor model should be reconsidered.
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:
 is an index for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients.
 the closer the KMO measure is to 1, the better the sampling adequacy (.8 and higher is great, .7 is acceptable, .6 is mediocre, and less than .5 is unacceptable).
 reasonably large values are needed for a good factor analysis; small KMO values indicate that a factor analysis of the variables may not be a good idea.
• Extraction refers to the process of obtaining the underlying factors or components.
 To decide on how many factors we need to represent the data, we use 2 statistical criteria:
 Eigenvalues, and
 The scree plot.

Total Variance Explained

            Initial Eigenvalues                  Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
 1          3.046   30.465           30.465        3.046   30.465           30.465
 2          1.801   18.011           48.476        1.801   18.011           48.476
 3          1.009   10.091           58.566        1.009   10.091           58.566
 4           .934    9.336           67.902
 5           .840    8.404           76.307
 6           .711    7.107           83.414
 7           .574    5.737           89.151
 8           .440    4.396           93.547
 9           .337    3.368           96.915
10           .308    3.085          100.000

Extraction Method: Principal Component Analysis.
 The examination of the scree plot provides a visual of the total variance associated with each factor.

 The steep slope shows the large factors.
• Two types of rotation:
• Orthogonal (e.g. varimax)
• Oblique (e.g. promax)
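Tying the four steps together, a minimal EFA sketch in SPSS syntax, assuming ten hypothetical items v1 to v10; it requests the KMO and Bartlett statistics, a principal-components extraction with the eigenvalue-greater-than-1 rule, a scree plot, and a varimax (orthogonal) rotation:

* Hypothetical item names v1 to v10 - edit to match your file.
FACTOR
  /VARIABLES v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
  /PRINT INITIAL KMO EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION VARIMAX.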
• CFA is used to test whether the factor
structure matches the results of the
original study