Week4 2 Testing

Uploaded by

seyfelizeliha

Using and Interpreting Information about Test Reliability

Chapter 7

05/10/24 1
Aim
Use the information about reliability to evaluate, interpret and improve psychological tests.
Reliability information alone is NOT enough.
Relationship between reliability and validity

Using the Reliability Coefficient
The reliability coefficient (RC) provides a relative measure of the accuracy of test scores.
It does not indicate how accurate test scores really are in absolute terms.
Example: a score of 110 on an intelligence test. Is it really higher than the average score (i.e. 100)?
How much variability should we expect on the basis of measurement error? The RC does not say this in concrete terms!
So, we need to know the size of the standard error of measurement.

Using the Reliability Coefficient vs SEM
Reliability coefficients are most useful in
comparing the scores produced by different tests.
The standard error of measurement (SEM) is
more useful when interpreting test scores.

SEM – Standard Error of Measurement
SEM – a measure of how much an individual's observed score is likely to differ from that individual's true score.

The SEM depends on 2 factors:
The reliability of the test (rxx)
The variability of test scores (the standard deviation, σX)

SEM – Standard Error of Measurement
If a spelling test has a reliability coefficient of .84 and a standard deviation of 10, then

$\mathrm{SEM} = \sigma_X \sqrt{1 - r_{xx}} = 10 \sqrt{1 - .84} = 10 \times .4 = 4$
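The spelling-test example can be checked with a short sketch. It assumes the standard formula SEM = SD × √(1 − rxx); the function name is illustrative and is not part of the slides.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - r_xx) (standard formula, assumed here)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Spelling test from the slide: reliability .84, standard deviation 10.
sem = standard_error_of_measurement(sd=10, reliability=0.84)
print(round(sem, 2))  # 4.0
```

With the same standard deviation, a reliability of 1.0 gives an SEM of 0, and a reliability of 0 gives an SEM equal to the standard deviation.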

SEM – Standard Error of Measurement
For testing purposes, the SEM is more useful than reliability coefficients.
We can use the SEM to create a confidence interval around a test taker's score.
95% confidence interval = observed score ± 1.96 × SEM
As reliability increases, the SEM decreases.
As a test becomes more reliable, we can feel more confident that an individual's observed score is close to the individual's true score.

SEM – Standard Error of Measurement

A score of 100 (the mean), a standard error of measurement of 4.7

4.7 x 1.96 = 9.2 (half-width of the 95% confidence interval)
The range is 90.8 to 109.2
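The interval arithmetic above can be sketched as a small helper. The function name and the default multiplier of 1.96 (for a 95% confidence level) are illustrative choices, not taken from the slides.

```python
def confidence_interval(score: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds of score +/- z * SEM."""
    half_width = z * sem
    return score - half_width, score + half_width


# Slide example: score of 100, SEM of 4.7.
low, high = confidence_interval(100, 4.7)
print(f"{low:.1f} to {high:.1f}")  # 90.8 to 109.2
```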

Confidence Intervals
Confidence intervals give a range that is likely to contain the examinee's true score at a stated level of confidence.
Confidence intervals are calculated from the SEM, which in turn depends on the SD of the scores and the reliability of the test.
As reliability increases, the SEM and confidence intervals get smaller.

Confidence Intervals
Example of Confidence Interval:
“Johnny’s FSIQ is 113 (between 108 and 118
with 95% confidence).”
The SEM and confidence intervals remind us
that scores are not perfect.

SEM – Standard Error of Measurement
If a person's true score is 110 on a test with a standard error of measurement of 3.7 and a mean of 100, we would expect 95% of the person's test scores to fall within
a) 102.75 - 117.25
b) 92.75 - 107.25
c) 90.75 - 120.25
d) 100 - 110
The interval is centred on the true score of 110, not on the mean of 100; the standard error is 3.7
3.7 x 1.96 = 7.25
The range is 102.75 to 117.25, so the answer is (a)

Reliability and Validity
A reliable test is NOT necessarily valid.
A test can be reliable and yet not valid.

Reliability and Validity
Measurement error decreases the observed correlation between two tests, X and Y (in other words, the validity of predictions).
‘Correction for attenuation’
A method of estimating the true correlation between X and Y, given the observed correlation between two unreliable measures of X and Y.
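A minimal sketch of the correction for attenuation, assuming the standard formula (observed correlation divided by the square root of the product of the two reliabilities). The function name and the input values are hypothetical and are not taken from the slides.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between true scores on X and Y.

    r_xy: observed correlation between the two measures
    r_xx, r_yy: reliabilities of the two measures
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values for illustration only.
estimate = correct_for_attenuation(r_xy=0.40, r_xx=0.70, r_yy=0.80)
print(round(estimate, 3))
```

The corrected estimate is never smaller than the observed correlation, and the two are equal only when both measures are perfectly reliable.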

Reliability and Validity
If the reliability of the tests is increased, the validity of the tests would also be expected to increase.

The aim is to increase the correlation between two tests
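One way to see why higher reliability allows higher validity: under classical test theory, the observed correlation between two tests cannot exceed the square root of the product of their reliabilities. The sketch below assumes that standard bound; the function name and the reliability values are hypothetical.

```python
import math


def max_validity(r_xx: float, r_yy: float) -> float:
    """Upper bound on the observed correlation between tests X and Y."""
    return math.sqrt(r_xx * r_yy)


# Hypothetical reliabilities: raising r_xx raises the ceiling on validity.
for r_xx in (0.5, 0.7, 0.9):
    print(r_xx, round(max_validity(r_xx, 0.8), 3))
```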


Reliability and Validity

Example (a) shows what an unreliable test would look like. Example (b) shows what a reliable but invalid test would look like. It is similar to a rifle that has its sights misaligned. The high degree of reliability is shown by the consistency of the strikes. The lack of validity is shown by the fact that the shots are missing their target, the bullseye. For example, a job satisfaction test given to unskilled workers may measure literacy skills rather than job satisfaction if the test is written in complex language. In psychometric terms, the test is not measuring what it was intended to measure. Example (c) is what a valid and reliable test would look like: the shots hit the mark and they hit it consistently.

Special Issues
Speed test vs. power tests
Speed: a test in which the items are trivially easy but time is strictly limited, e.g. 60 seconds for a 100-item test.

Power: a test with no time limit, e.g. a 20-item test.

Special Issues
Speed test vs. power tests
A pure speed test should have an odd-even split-half reliability of about 1.0, so split-half estimates are inflated for speeded tests.

The most useful method of assessing the reliability of highly speeded tests is the test-retest method.

A participant may be slow on a speed test and unable to finish all the questions in time. Some test items may be poorly constructed, but some participants never reach them.
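A small sketch of why the odd-even split-half coefficient is close to 1.0 on a pure speed test. It assumes every item a person reaches is answered correctly, so the odd-item and even-item scores can differ by at most one point. The numbers of items reached are hypothetical.

```python
def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical number of items each person reached on a 100-item speed test.
items_reached = [20, 35, 41, 50, 62, 77, 83, 90]

# Every reached item is correct, so the half scores follow directly.
odd_scores = [(n + 1) // 2 for n in items_reached]   # items 1, 3, 5, ...
even_scores = [n // 2 for n in items_reached]        # items 2, 4, 6, ...

r_odd_even = pearson(odd_scores, even_scores)
print(round(r_odd_even, 4))
```

The near-perfect correlation reflects working speed only, which is why a test-retest estimate is more informative here.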

Selecting a Reliability Coefficient
If a test is to be administered multiple times:
Test-Retest Reliability
Tests to be administered one time:
Homogeneous content – coefficient alpha
Heterogeneous content – split-half coefficient

How Reliable Should Tests Be?

Lower levels of reliability are acceptable when tests are used for preliminary rather than final decisions.

Reliability of Composite and Difference Scores
Composite scores
Formed when scores are combined into a composite.
For example, IQs are typically composite scores.
The reliability of a composite score is typically higher than the reliability of the individual scores in the composite.
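A hedged sketch of how composite reliability can be estimated. It assumes k standardized, equally weighted component tests, so the composite variance is k + (k² − k) × mean intercorrelation and the error variance is k × (1 − mean reliability). The function name and the input values are hypothetical.

```python
def composite_reliability(k: int, mean_reliability: float,
                          mean_intercorrelation: float) -> float:
    """Reliability of the sum of k standardized, equally weighted tests."""
    error_variance = k * (1.0 - mean_reliability)
    total_variance = k + (k * k - k) * mean_intercorrelation
    return 1.0 - error_variance / total_variance


# Hypothetical: three subtests, each with reliability .70, intercorrelating .40.
r_composite = composite_reliability(k=3, mean_reliability=0.70,
                                    mean_intercorrelation=0.40)
print(round(r_composite, 3))  # 0.833
```

In this example the composite is more reliable than any single component, and the gain shrinks as the components correlate less with each other.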

Reliability of Composite and Difference Scores
Difference scores
Calculated as the difference between two scores.
The reliability of a difference score is typically lower than the reliability of the individual scores.
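A minimal sketch of the usual formula for the reliability of a difference score, assuming the two tests have equal variances. The function name and the input values are hypothetical and are not taken from the slides.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of D = X - Y for two tests with equal variances.

    r_xx, r_yy: reliabilities of the two tests
    r_xy: correlation between the two tests
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


# Hypothetical: two tests with reliability .80 each that correlate .60.
r_diff = difference_score_reliability(r_xx=0.80, r_yy=0.80, r_xy=0.60)
print(round(r_diff, 2))  # 0.5
```

The more highly the two tests correlate, the lower the reliability of their difference, because the difference then consists mostly of measurement error.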
