
Chapter 14

Analyzing
Results

Denny Vell M. Devaras, RPm


Chapter Objectives
● Learn how to select the appropriate statistical tests
● Understand the concepts behind the chi-square test and how to compute it
● Understand the differences between the two types of t tests
● Understand the concept of variance in an experiment
● Learn how different components of variance can be compared in analysis of
variance to detect significant treatment effects
● Learn how to interpret F ratios for multiple group and factorial experiments
Levels of Measurement
4 Levels of Measurement (NOIR)

● Nominal Scale – classifies items into distinct categories that have no quantitative relationship to one another
● Ordinal Scale – reflects differences only in magnitude, where magnitude is measured in the form of ranks
● Interval Scale – measures magnitude, or quantitative size, and has equal intervals between values
● Ratio Scale – has equal intervals between all its values and an absolute zero point
Parameters of Data Analysis

● How many independent variables are there?
● How many treatment conditions are there?
● Is the experiment run within or between subjects?
● Are the subjects matched?
● What is the level of measurement of the dependent variable?
One Independent Variable

Choosing a test by the level of measurement of the DV and by the design (two versus more than two treatment conditions; independent versus matched/within-subjects groups):

● Interval or Ratio DV
○ Two independent groups: t-test for independent groups
○ Two matched groups (or within subjects): t-test for matched groups
○ Multiple independent groups: One-way ANOVA
○ Multiple matched groups (or within subjects): One-way ANOVA (repeated measures)
● Ordinal DV
○ Two independent groups: Mann-Whitney U-test
○ Two matched groups (or within subjects): Wilcoxon Test
○ Multiple independent groups: Kruskal-Wallis Test
○ Multiple matched groups (or within subjects): Friedman Test
● Nominal DV
○ Two independent groups: Chi-square Test
○ Multiple independent groups: Chi-square Test
Two Independent Variables
Factorial Designs

● Interval or Ratio DV
○ Independent groups: Two-way ANOVA
○ Matched groups (or within subjects): Two-way ANOVA (repeated measures)
○ Independent and matched groups (or between and within subjects): Two-way ANOVA (mixed)
● Nominal DV
○ Independent groups: Chi-square Test
Statistics for Two-group
experiments
Example:
Imagine that a researcher wants to test whether subjects can be
induced to make errors on a test question by first presenting them
with a task designed to prime, or elicit, a certain incorrect
response—a homophone (like the “cherry pit”—>”Brad Pitt”
experiment by Burke et al., 2004).
Example:
Subjects in the experimental group are asked the following priming
question by the experimenter: “What do the letters T-O-P-S spell?”

Subjects will answer, “Tops.”

Then they are asked the following test question: “What does a car do at a
green light?”

The experimenter hopes subjects will give the wrong answer, “Stops,” instead of the correct answer, “Goes.”
Example:
The control group could be asked a different question.

One question that would not be expected to produce very much interference with the correct answer to the test question might be this: “What do the letters C-A-R spell?”

After the subject responds, the same test question is asked, and the subjects’ responses, correct or incorrect, are recorded.
Example:
First, let’s answer the questions earlier.

1. There is one independent variable (priming).


2. There are two treatment conditions (priming versus no priming).
3. The experiment is run between subjects. (There are different subjects
in each treatment condition.)
4. The subjects are not matched.
5. The dependent variable is measured by a nominal scale.
Chi-square
Test
Chi-square (χ²)
Chi-square – a non-parametric, inferential statistic that tests whether the frequencies of responses in our sample represent certain frequencies in the population
● The chi-square does not assume that the population has certain parameters, such as a normal distribution of scores or approximately equal variances in the two groups
● A chi-square (χ²) test determines whether the frequencies of responses in our sample represent frequencies expected in the population
● It is commonly used for nominal data
Chi-square (χ²)
● To conduct a chi-square test, data are first organized in the form of a 2 × 2 contingency table
○ Frequency counts are tabulated and placed in the appropriate cells
● The chi-square compares the frequencies obtained in the experiment, called obtained frequencies (O), with expected population frequencies, called expected frequencies (E), to test the null hypothesis.
Hypothetical data from 20 experimental and 20 control subjects from the priming experiment

Outcome of the study (expected frequencies in parentheses)

                      Correct Responses   Incorrect Responses   Row Total
Experimental Group         4 (12)              16 (8)              20
Control Group             20 (12)               0 (8)              20
Column Total              24                   16                  40
Computation of Chi-square (χ²)
● First, obtain the expected frequency (E) for each of the four cells in the table using the following formula:
  E = (row total × column total) / grand total
● Once you have the expected frequencies, you can calculate χ² using this formula:
  χ² = Σ [(O − E)² / E]

Computation of Chi-square (χ²)
● Substituting our data, we would have the following:
  χ² = (4 − 12)²/12 + (16 − 8)²/8 + (20 − 12)²/12 + (0 − 8)²/8
     = 5.33 + 8.00 + 5.33 + 8.00
     ≈ 26.67
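As a minimal sketch (the frequencies come from the table above, but the use of Python with NumPy/SciPy is an assumption, not part of the lesson), the same value can be computed by hand and checked against a library routine:

    import numpy as np
    from scipy import stats

    # Obtained frequencies (O) from the priming experiment
    # (rows: experimental, control; columns: correct, incorrect responses)
    observed = np.array([[4, 16],
                         [20, 0]])

    # Expected frequency for each cell: E = (row total * column total) / grand total
    row_totals = observed.sum(axis=1, keepdims=True)   # [[20], [20]]
    col_totals = observed.sum(axis=0, keepdims=True)   # [[24, 16]]
    grand_total = observed.sum()                       # 40
    expected = row_totals * col_totals / grand_total   # [[12, 8], [12, 8]]

    # Chi-square: sum of (O - E)^2 / E over the four cells
    chi2_by_hand = ((observed - expected) ** 2 / expected).sum()
    print(chi2_by_hand)                                # ~26.67

    # Same test via SciPy; correction=False matches the by-hand formula
    # (SciPy applies Yates' continuity correction to 2x2 tables by default)
    chi2, p, df, exp = stats.chi2_contingency(observed, correction=False)
    print(chi2, df, p)                                 # chi2 ~26.67, df = 1

    # Critical value for alpha = .05 with df = 1 (the 3.84 used later in the slides)
    print(stats.chi2.ppf(0.95, df=1))                  # ~3.84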
Chi-square (χ²)
● To know if the value is large enough to reject the null hypothesis, we must compare the value we obtained (χ² obtained) with the critical value needed to reject H0
● The critical value can be found in the χ² table of critical values
● To select the appropriate χ² distribution from the table, however, we need to first understand the concept of degrees of freedom.
Degrees of Freedom (df)
● The number of members of a set of data that can vary or change value without changing the value of a known statistic for those data
○ If we know the mean of the data, the degrees of freedom tell us how many members of that set of data could change without altering the value of the mean

Example:
Imagine that a phone number is a set of data with a mean of 5. It has seven digits, so to produce a mean of 5, the total of the seven digits must equal 35. Suppose that the first six digits are 6, 7, 4, 3, 9, and 2. Can you find the last digit? (The first six digits sum to 31, so the last digit must be 35 − 31 = 4; only that final value is not free to vary.)
Degrees of Freedom (df)
● The degrees of freedom in the distribution of a statistic vary in a way
related to the number of subjects sampled
● We also compute degrees of freedom differently for different test
statistics
○ For the analysis of a 2 × 2 contingency table, there is only 1 df
○ Formula: df = (number of rows − 1) × (number of columns − 1)
○ df = (2 − 1) × (2 − 1) = 1
Degrees of Freedom (df)
● The way we compute the degrees of freedom can
also vary with different applications of the same
statistic
● This is the reason why critical values of test statistics
are always presented and organized by degrees of
freedom rather than by number of subjects
Interpreting
the
Chi-square
Interpreting the Chi-square
● Once we know the df, we can look up the critical value in the table
○ For a significance level of .05 with 1 df, the critical value is 3.84
○ Our obtained value (about 26.67) exceeds 3.84, so we can reject the null hypothesis
Interpreting the Chi-square

If we were writing up the results of this experiment, we could now say the following:
● The research hypothesis was supported
● As predicted by the research hypothesis, there was a significant difference between the experimental and the control condition
● Subjects who received the T-O-P-S prime were much more likely to give the incorrect response “Stops” than were subjects who were given the control prime.
Disadvantages of using Statistical Packages
● These packages will still run tests designed for interval or ratio data even if your data are nominal
● These programs will not tell you how to interpret your results; you need enough knowledge of statistics to interpret the data yourself
Cramer’s Coefficient Phi (Φ)

● Cramer’s coefficient phi (Φ) is an estimate of the degree of association between the two categorical variables tested by χ²
● It is simple to calculate once you have computed χ²: Φ = √(χ² / (N × (S − 1)))
● N = number of observations; S = the smaller of the number of rows or columns (in the priming experiment, N = 40 and S = 2)
Cramer’s Coefficient Phi (Φ)

Cohen (1988) suggests the following criteria for interpreting the size of Φ:
● Φ = .10 (a small degree of association)
● Φ = .30 (a medium degree of association)
● Φ = .50 (a large degree of association)
● Like r², Φ² can be interpreted as the proportion of variance shared by the two variables (it can be reported as an estimate of the effect size in the experiment)
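As a minimal sketch (Python is an assumption here, and the χ² value is the one obtained above for the priming experiment), the coefficient and its square follow directly from the formula:

    import math

    chi2 = 26.67   # obtained chi-square from the priming experiment
    N = 40         # number of observations
    S = 2          # smaller of the number of rows or columns

    # Cramer's phi: degree of association between the two categorical variables
    phi = math.sqrt(chi2 / (N * (S - 1)))
    print(round(phi, 2))       # ~0.82, a large degree of association by Cohen's criteria
    print(round(phi ** 2, 2))  # ~0.67, proportion of variance shared (effect-size estimate)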
T Test
Example:
1. There is one independent variable (“fun”).
2. There are two treatment conditions (“fun” versus “no fun”).
3. The experiment is run between subjects. (Each treatment condition
has different subjects.)
4. The subjects are not matched.
5. The dependent variable is measured by a ratio scale (time).
T-test
● The t test is a statistic that relates differences between treatment means to the amount of variability expected between any two samples of data from the same population
● It is used to analyze the results of a two-group experiment with one independent variable and interval or ratio data
● The exact probabilities for each value of t have been calculated, but the distribution of these values changes depending on the number of subjects in the samples
Effects of Sample Size
● Sample size is important for t-test because the exact
shape of the distribution of t changes depending on
the size of the samples
● Sample size is also important because of the
assumptions we make whenever we apply t
Why are large samples needed for the t test?
● Larger samples make it easier to reject H0 because the critical value of t gets smaller as sample size (and degrees of freedom) increases
● With fewer subjects, we have a greater chance of making a Type II error
Degrees of Freedom and Critical Value of the t Test
● We select the appropriate t distribution based on degrees of freedom rather than the sample size.
● The critical value of t is the value we need to exceed to reject H0 at our chosen significance level
○ In terms of probability, this means that the computed value of t is so extreme that it could have occurred by chance less than 5% of the time
● Fewer degrees of freedom mean more variability between samples
● Critical values of t also change depending on the type of hypothesis (directional or non-directional)
Direction of the Hypothesis

● One-tailed is for directional; two-tailed is for non-directional
○ Solve for the df for the time-estimation experiment
○ Which direction would our experiment have?
○ Check the critical values of t for the one-tailed test and for its degrees of freedom
Computation of t-value
We could now say the following:
● The research hypothesis was supported
● There was a significant difference between the group that was having
fun and the group that was not having fun
● As predicted by the hypothesis, subjects who were having fun gave
significantly shorter time estimates than did subjects who were not
having fun
Computation of the effect size
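As a sketch only, with made-up time estimates (the data, the Python/SciPy tooling, and the choice of Cohen's d as the effect-size measure are all assumptions for illustration), a two-group t test and an effect size could be computed like this:

    import numpy as np
    from scipy import stats

    # Hypothetical time estimates (in minutes); shorter estimates are expected for the "fun" group
    fun = np.array([8, 10, 9, 7, 11])
    no_fun = np.array([12, 14, 13, 15, 11])

    # Independent-groups t test with pooled (equal) variances
    t, p_two_tailed = stats.ttest_ind(fun, no_fun, equal_var=True)
    p_one_tailed = p_two_tailed / 2   # directional hypothesis, and the means fall in the predicted direction
    print(t, p_one_tailed)

    # Cohen's d from the pooled standard deviation (one common effect-size estimate)
    n1, n2 = len(fun), len(no_fun)
    pooled_sd = np.sqrt(((n1 - 1) * fun.var(ddof=1) + (n2 - 1) * no_fun.var(ddof=1)) / (n1 + n2 - 2))
    d = (fun.mean() - no_fun.mean()) / pooled_sd
    print(d)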
Confidence Intervals (CI)
● When reporting our data, we also include confidence intervals for the data we obtain in an experiment.
○ A confidence interval is a range of values above and below our sample mean that is likely to contain the population mean at a stated probability level (usually 95% or 99%).
Confidence Intervals (CI)
● Mean = 14.2 and SD = 2.9; df = 4 and t critical = 2.776
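A sketch using the values on this slide (df = 4 implies n = 5; the standard M ± t × SD/√n form and the Python code are assumptions for illustration):

    import math

    mean, sd, n = 14.2, 2.9, 5           # df = n - 1 = 4
    t_crit = 2.776                       # two-tailed critical t at alpha = .05 with df = 4

    margin = t_crit * sd / math.sqrt(n)  # ~3.60
    lower, upper = mean - margin, mean + margin
    print(round(lower, 1), round(upper, 1))  # roughly 10.6 to 17.8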
t Test for Matched Groups
● When we analyze the data from two matched groups of subjects or from a two-group within-subjects experiment, we need a different statistical procedure.
● If we did the statistical tests for these experiments in the same way as for an independent-groups experiment, we would overestimate the amount of variability in the population sampled.
● We should remember that subjects are not all the same
○ Some scores vary because subjects differ from one another
○ Even the same group of subjects would give varying responses, but not as much as two groups of different samples would
t Test for Matched Groups
● The matched-groups t test also applies to interval and ratio data and requires the assumption that the population sampled is normally distributed on the dependent variable
● Because this test is used to evaluate data from an experiment in which the treatment groups are not independent, the computations are handled differently
Example:
Dan Wegner and his colleagues have studied the effects of “thought
suppression” for a number of years. They have found that when people
try to suppress a thought (i.e., keep it out of consciousness), the thought
can take on obsessive qualities: It can be much more likely to pop into
our head than thoughts that have not been suppressed. Wegner’s classic
“White Bear” study (Wegner, 1989) is a good example.
Example:
When subjects were told that they were not to think about a white
bear—no matter what—thoughts of a white bear were much more likely
to come to consciousness than if subjects were not told to suppress the
thought of a white bear.

In a subsequent study, Wegner found that trying to keep a thought secret from someone else has the same qualities as trying to suppress it; trying to keep it secret actually makes the thought more likely to automatically pop into our heads!
Example:
These findings led to a within-subjects experiment (Wegner, Lane, &
Dimitri, 1994) that tested the allure of secret relationships. The
researchers predicted that the most alluring old flames would be those
that were most secret. The researchers asked people to recall as many
as five old flames, and to rank them from the one they still thought about
the most to the one they thought about the least. Among other questions,
the researchers asked people to rate how secret each relationship had
been at the time.
Example:
The researchers then compared the level of secrecy for each participant’s most-thought-about past partner (“hot flame”) and their least-thought-about past partner (“cold flame”). The researchers found that
relationships with hot flames had been significantly more secret than
relationships with cold flames!
Example:
● This experiment is similar to the time-estimation example in that we
are looking at one independent variable with two treatment conditions
(“hot flame” and “cold flame”).
● Secrecy was measured using an interval scale (0 to 5).
● The secrecy experiment, however, was run with only one group of
subjects, making it a within-subjects experiment.
● The appropriate statistical test is therefore a t test for matched groups
(also called a within-subjects t test).
Computation of t test for matched groups
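As a minimal sketch with made-up secrecy ratings (the data and the SciPy tooling are assumptions for illustration), a matched-groups (within-subjects) t test could be run like this:

    from scipy import stats

    # Hypothetical secrecy ratings (0-5) given by each subject for their
    # "hot flame" and "cold flame"; the two lists are paired by subject
    hot_flame  = [4, 5, 3, 4, 5, 4, 3, 5]
    cold_flame = [2, 3, 1, 3, 2, 2, 1, 3]

    # Paired (within-subjects) t test; df = number of pairs - 1 = 7
    t, p = stats.ttest_rel(hot_flame, cold_flame)
    print(t, p)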
Degrees of Freedom in Within-Subjects Designs
● Notice that using the within-subjects procedure affects the critical value of t.
● When we use a within-subjects design, we end up with fewer degrees of freedom (the number of pairs of scores minus 1, rather than the total number of subjects minus 2)
● The critical value of t needed to reject the null hypothesis increases as the degrees of freedom get smaller
● The fewer degrees of freedom we have, the more difficult it will be to reject the null hypothesis
Degrees of Freedom in Within-Subjects Designs
● If we measure the responses of different subjects, we are likely to get much more variability than if we measure the same subjects or matched subjects
● Using a within-subjects or matched-groups design lowers the amount of variability in the data
● When we reduce variability among individual subjects, we make the denominator of the t formula smaller.
● We lower the degrees of freedom for the experiment, but we also lower the amount of variability produced by factors other than the independent variable
Analysis of
Variance
ANOVA
● Analysis of Variance (ANOVA) – a statistical procedure used to evaluate differences among three or more treatment means
● ANOVA breaks up the variability in the data into component parts
● Each part represents variability produced by different influences in the experiment
○ Within-groups variability – the degree to which the scores of subjects in the same treatment group differ from one another
○ Between-groups variability – the degree to which the scores of different treatment groups differ from one another
Sources of Variability
● When we run an experiment, we would like to be able to show that the pattern of data obtained was caused by the experimental manipulation.
● One common source of variability is individual differences.
○ We use random assignment or matching in each experiment so
that these differences do not confound the results of the
experiment
● Some differences between scores will be the result of procedures we
did not handle well in the experiment.
○ Extraneous variables of all kinds can produce more variability,
causing changes in subjects’ behavior that we might not detect
Sources of Variability
● Error – the variability within and between treatment groups that is not
produced by the changes in the independent variables
● Another major source of variability in data is the experimental
manipulation
○ we expect our treatment conditions to create variability in the
responses of subjects who are tested under different levels of an
independent variable
Sources of Variability
● Within-groups variability is the extent to which subjects’ scores differ
from one another under the same treatment conditions
○ The factors that we call error explain the variability that we see
within groups.
● Between-groups variability is the extent to which group performance
differs from one treatment condition to another.
○ Between-groups variability is made up of error and the effects of
the independent variable.
F-ratio
● F-ratio – the ratio of the variability observed between the treatment groups to the variability observed within treatment groups
● Theoretically, if the independent variable had no effect, the F ratio should equal 1
● The larger the effect of the independent variable is, the larger the F ratio should be.
One-way between-subjects ANOVA
● One-way Between-Subjects Analysis of Variance (ANOVA) – a statistical procedure used to evaluate a between-subjects experiment with three or more levels of a single independent variable
Example:
Strahilevitz and Loewenstein (1998) conducted an interesting three-group experiment to look at the effect of “ownership history” on the value we place on objects. They hypothesized that mere ownership of an object increases our perceptions of its value, and the longer we own it, the more we value it.

To test their hypothesis, they created an experimental situation in which subjects were required to place a cash value on several items (e.g., a mug, a key chain, a T-shirt, and a box of candy). The experiment was set up to last about 50 minutes.
Example:
To create a 50-minute experiment, all subjects filled out questionnaires (which were not really part of the experiment) before they were asked to evaluate the set of items.

Strahilevitz and Loewenstein’s experiment had three conditions: no ownership, brief ownership, and long ownership. In one condition of the experiment, subjects were not given any of the items (no ownership).
Example:
In a second condition, subjects were given the mug as a gift for participating, but it wasn’t given to them until just before they were asked to place a value on all of the items (brief ownership). In the third condition, subjects were given the mug at the beginning of the experiment as a gift for participating (long ownership).

The researchers found that their hypothesis was supported: Subjects placed a significantly higher value on the mug if they owned it than if they didn’t, and the highest value was placed on the mug when subjects had owned it for a longer time.
Example:
● The example has one independent variable: length of ownership
● The IV has three levels: 0, 1, or 5 minutes
● Dependent variable: the value subjects would give to the owned object
● Level of measurement: ratio scale
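As a minimal sketch of how a one-way between-subjects ANOVA could be run for a three-group design like this one (the dollar values below are made up, and the Python/SciPy tooling is an assumption; the following slides explain the logic of MSw, MSb, and F):

    from scipy import stats

    # Hypothetical cash values (in dollars) placed on the mug by each group
    no_ownership    = [2.0, 2.5, 3.0, 2.0, 2.5]
    brief_ownership = [3.5, 4.0, 3.0, 4.5, 3.5]
    long_ownership  = [5.0, 5.5, 4.5, 6.0, 5.0]

    # One-way between-subjects ANOVA: F compares between-groups to within-groups variability
    f, p = stats.f_oneway(no_ownership, brief_ownership, long_ownership)
    print(f, p)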
Within- and Between-Groups Variability
● Variance – represents the average squared deviation from the mean; ANOVA uses the term mean square (MS) to denote variability or variance
● Mean square within groups (MSw) – represents the portion of the variability in the data that is produced by the combination of the sources that we call error
● Mean square between groups (MSb) – represents the amount of variability produced by both error and treatment effects in the experiment
Interpreting the Results
● When we compute F, we test only the overall pattern of treatment means
● The ANOVA doesn’t identify specific differences between each pair of means
● After a significant F has been computed, further statistics are needed to determine which groups are really different from each other
○ Post hoc tests – used to make pair-by-pair comparisons of the different groups (examples: Tukey and Scheffé); a sketch follows this list
○ A priori comparisons – tests between specific treatment groups that were anticipated or planned before the experiment was conducted
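Continuing the one-way ANOVA sketch above, a post hoc Tukey test on the same hypothetical mug valuations could be run with statsmodels (the data and the choice of package are assumptions, not part of the lesson):

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Same hypothetical mug valuations as in the one-way ANOVA sketch
    values = np.array([2.0, 2.5, 3.0, 2.0, 2.5,    # no ownership
                       3.5, 4.0, 3.0, 4.5, 3.5,    # brief ownership
                       5.0, 5.5, 4.5, 6.0, 5.0])   # long ownership
    groups = ["none"] * 5 + ["brief"] * 5 + ["long"] * 5

    # Pair-by-pair comparisons of the three groups at a familywise alpha of .05
    result = pairwise_tukeyhsd(values, groups, alpha=0.05)
    print(result)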
Interpreting the Results
Now we can say the following:
1. The research hypothesis was supported.
2. As predicted, there was a significant difference in object valuation
between groups that differed by length of time of ownership.
3. Objects were valued higher after brief ownership than if they had never
been owned, and long ownership produced higher object value than brief
ownership did.
Calculating the effect size (η²)
● η² represents the proportion of variance in all the scores that can be accounted for or explained by the treatment
● η² ≥ .1379 may be considered a large treatment effect
● η² can be computed easily from the information obtained from the ANOVA
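Assuming the usual definition η² = SS between groups ÷ SS total, a tiny sketch of the computation from the sums of squares in an ANOVA summary table (the SS values below are hypothetical):

    # Hypothetical sums of squares from an ANOVA summary table
    ss_between = 24.0
    ss_within = 36.0
    ss_total = ss_between + ss_within

    # Proportion of the total variance explained by the treatment
    eta_squared = ss_between / ss_total
    print(eta_squared)   # 0.40 here; values around .14 or larger are considered large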
Statistical Control for Differences between groups
● Moderating variable - one that can moderate, or change, the influence
of the independent variable
● Analysis of Covariance (ANCOVA) - can be used to control statistically
for potential moderating variables
● ANCOVA works by removing the variance produced by scores on the
moderating variable, called the covariate, from the variance in the
ANOVA that was produced by error
Statistical Control for Differences between groups
● ANCOVA can be used to accomplish the following two objectives
(Keppel, 1982):
○ to refine estimates of experimental error
○ to adjust treatment effects for any differences between the
treatment groups that existed before the start of the experiment
● ANCOVA is very difficult to compute by hand, but including a covariate
is a simple option in most computer statistics programs
One-way repeated measures ANOVA
● One-way Repeated Measures Analysis of Variance (ANOVA) – used to analyze the effects in a multiple-group experiment testing one independent variable with a within-subjects design
Example
● Suppose we conducted a simple experiment in which subjects were
presented with three different lists of words and were asked to recall
as many as they could from each list. One list contained only positively
valenced words (e.g., smile, treasure, love). A second contained
negatively valenced words (e.g., frown, burglar, hate). The third
contained neutral words (e.g., house, car, hat). We predicted that
valence would influence the number of words that subjects recalled
from each list
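A minimal sketch of how this one-way repeated measures ANOVA could be run with statsmodels’ AnovaRM (the recall scores and the choice of package are assumptions for illustration):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Hypothetical recall scores: each of 5 subjects sees all three word lists
    data = pd.DataFrame({
        "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
        "valence": ["positive", "negative", "neutral"] * 5,
        "recall":  [12, 11, 8, 14, 13, 9, 11, 12, 7, 13, 12, 8, 12, 10, 9],
    })

    # One-way repeated measures ANOVA; valence is the within-subjects factor
    result = AnovaRM(data, depvar="recall", subject="subject", within=["valence"]).fit()
    print(result)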
Between Subjects Factorial Experiment
● Factorial experiments - designed to look at the effects of more than
one independent variable at a time and at the interaction between
variables
● We evaluate both main effects produced by each factor and the
interactions between factors
Example
● Assume we have set up and run an experiment to explore the
relationship between word frequency and recall. Half the subjects saw
words that appear often in the English language (high-frequency
words), and half saw words that are relatively uncommon (low-
frequency words).
● From searching the literature, we predicted that high-frequency words
would be recalled better than low-frequency words.
● We also tested another factor in the same experiment: cueing.
Example
● Besides evaluating the effect of frequency, we also manipulated the
testing procedures so that half the subjects were asked simply to recall
the words they saw on the original list; the other half were given cues
to aid them in remembering the words they saw.
● For instance, suppose subjects saw the word camel on the original list.
If they were in the “no-cue” condition, they were simply asked to recall
the word.
● If they were in the “cue” condition, we provided the name of the
category the word belongs to—animal.
● Category cues were given for all words on the list. Cueing has also
been shown to aid word recall.
Between-Subjects Factorial Experiment
● When we have a factorial design, determining treatment effects is more complex than it is with one independent variable
● Main effects: whether word frequency or category cues significantly affect the ability to recall words
● Interaction: whether the effect of word frequency on recall changes depending on whether cues are given—or whether the effect of cues differs depending on whether the word to be recalled is relatively frequent or infrequent
Two-way ANOVA
● The procedures and formulas for a two-way between-subjects ANOVA are based on the same set of assumptions as the one-way ANOVA procedures
● In our example, the experimenter has chosen to use high and low word frequencies and two levels of the category-cue variable—cues versus no cues

Note: Even if we no longer calculate ANOVA by hand, it is still important to understand the logic behind ANOVA in order to select the appropriate ANOVA model, to set up the follow-up tests, and to interpret the results
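As a sketch of how the two-way between-subjects ANOVA for the frequency × cueing example might be run in software (the recall scores and the statsmodels tooling are assumptions, not part of the lesson):

    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    # Hypothetical recall scores for a 2 x 2 between-subjects factorial design
    data = pd.DataFrame({
        "frequency": ["high"] * 8 + ["low"] * 8,
        "cueing":    (["cue"] * 4 + ["no_cue"] * 4) * 2,
        "recall":    [14, 15, 13, 16, 11, 12, 10, 12,
                      12, 13, 11, 12, 8, 9, 7, 9],
    })

    # Fit the factorial model; C() treats each predictor as a categorical factor,
    # and the * includes both main effects and the interaction
    model = ols("recall ~ C(frequency) * C(cueing)", data=data).fit()

    # ANOVA table with F ratios for both main effects and the interaction
    print(anova_lm(model, typ=2))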
Calculating Two-way ANOVA
● Step 1: We begin by computing the within-groups variability (MSw)
○ MSw represents variability produced by individual differences, extraneous variables, and other sources of error in the experiment
● Step 2: Compute the total sum of squares between groups (SSb)
○ SSb represents all of the variability we have among the treatment groups
○ To complete the ANOVA, it will be necessary to divide the SSb into its main components: the parts associated with each factor (SS1 and SS2) and the part associated with the interaction (SS1×2)
Calculating Two-way ANOVA
● Step 3: When we evaluate the main effect of one independent variable, we treat the data as if that variable is the only one in the experiment.
● Step 4: The variability associated with the interaction of the two independent variables is simply what remains after the main effects of the independent variables have been taken into account.
○ The variability between groups that is not explained by either independent variable can be explained by their interaction (SS1×2)
○ The sum of squares for the interaction is found by simple subtraction: SS1×2 = SSb − SS1 − SS2
Calculating Two-way ANOVA
● Step 5: We can summarize the calculations in a summary table
Evaluating F-ratios
● In our example, we reject the null hypothesis
● Because there are only two levels of factor 1 (word frequency), we do not need post hoc or a priori comparisons to pinpoint the significant difference
● When we look up the critical value for factor 2 (cueing), we will find that the main effect is significant at p < .01
● We can interpret the effects by simply comparing the two group means
● The computed F for the interaction is 0 (not significant)
● This tells us that the variability between groups can be explained by the effect of either word frequency or category cues acting separately on subjects’ scores
Evaluating F-ratios
● If the interaction had been significant, we would be limited in what we
could conclude about the main effects in this experiment.
● If there is a significant interaction, it is more useful to discuss the
impact of the independent variables in combination with each other
● A significant interaction means that the impact of one independent
variable differs depending on the value of the other.
Calculating Effect Sizes
● For factorial designs, we calculate an effect size for each significant effect (main effects and interactions)
● We can use the same formula used in the one-way ANOVA
Repeated Measures and
Mixed Factorial Designs
● Total variability is broken down into component parts representing
treatment and error and an F-ratio is computed
● The obtained F values for main effects and interactions are compared
with critical values needed to reject the null hypothesis.
Example
● Siiter and Ellison (1984) were interested in studying stereotypes of a
“police personality.”
● As these researchers have reported, there is a commonly held belief
that law enforcement officers are a breed apart: “The police, it is said,
like the rich, are different from you and me”
Example
● Siiter and Ellison tested the stereotype in two groups of male subjects:
62 undergraduates and 62 law enforcement officers. Type of group (a
subject variable) was the two-level between-subjects factor
● All subjects filled out a standard scale of authoritarianism twice: once
for their own beliefs and once as they felt a typical member of the
other group might respond.
● Target of the responses (self or other group) was the two-level within-
subjects factor
Interpreting Significant Effects
● To interpret the set of results from this study, we would want to say that
both main effects and the interaction were significant, but we would
focus the interpretation on the significant interaction.
Interpreting Significant Effects
● The null hypothesis for this interaction would be that the four groups
were sampled from the same population; the significant interaction
simply tells us that they probably were not
● The results of the post hoc tests can be seen (as subscripts) in Table
14.11
● Results indicated that the students and the police evidenced similar
authoritarianism levels
Interpreting Significant Effects
● We can look at how accurate the judgments of the police and students were:
○ The college men rated police officers significantly higher in
authoritarianism than police rated themselves (119.00 vs. 94.97);
○ whereas the police officers’ judgments about the students were
quite similar to their actual scores (88.15 vs. 87.13).
● Taken as a whole, the results suggested that only the students held a
negative stereotype about police, greatly overestimating how
authoritarian they were
Questions?
