Reliability & Validity
What is Reliability?
Reliability: Consistency and dependability.
If a measurement device or procedure consistently
assigns the same score to individuals or objects with
equal values, the device is considered reliable.
Researchers must establish the reliability of their
measurement devices in order to be certain that
they are obtaining a systematic and consistent
record of the variation in X and Y.
Types of Reliability
Several types:
Test-retest reliability and alternate-forms reliability
Inter-item reliability (internal consistency)
Split-half reliability
Inter-rater (scorer) reliability
Test-retest Reliability
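Measure the same participants twice, using the same procedures both times, and compare the two sets of scores.
This approach may not be useful when participants are able to recall their previous responses; alternate-forms reliability addresses this by using an equivalent version of the measure on the second occasion.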
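A minimal sketch of how test-retest reliability is commonly quantified: correlate each participant's first and second scores using Pearson's r. The scores are hypothetical, and `pearson_r` is an illustrative helper written out by hand rather than a library call. Values of r near 1 mean the measure ranks participants consistently across the two sessions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to each variance).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five participants,
# tested twice with the same procedure.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}")
```

With these made-up scores r comes out at about 0.97, which would suggest high test-retest reliability.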
Inter-item reliability
Inter-item reliability: The degree to which different
items measuring the same variable attain
consistent results.
Scores on different items designed to measure the same construct should be highly correlated. This is also called internal consistency.
Example: Math tests often ask you to solve several problems of the same type. Because your scores on these questions all reflect your ability to solve that type of problem, they should agree with one another, and the test would have high inter-item reliability.
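The slide does not name a statistic, but one widely used index of internal consistency is Cronbach's alpha, which compares the variance of the individual items with the variance of the total score. The sketch below assumes a hypothetical table of five students answering four problems of the same type; `cronbach_alpha` is an illustrative helper, not a library function.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))  # regroup so each tuple holds one item's scores
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # each respondent's total
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical: five students x four problems of the same type, each scored 0-5.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Students who do well on one problem tend to do well on the others here, so alpha is high (about 0.97).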
Inter-rater reliability
When observers must use their own judgment to interpret the events they are observing (including live or videotaped behaviors and written answers to open-ended interview questions), scorer reliability must be measured.
Have different observers take measurements of the same responses; the agreement between their measurements is called inter-rater reliability. Their results can be compared statistically to quantify the scorers' reliability.
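One way to compare raters statistically (an assumption; the slide does not specify a method) is to report percent agreement alongside Cohen's kappa, which corrects agreement for the amount expected by chance: kappa = (p_o - p_e) / (1 - p_e). The two observers' codes below are hypothetical, and the helper functions are illustrative.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of cases")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: for each category, the probability that both raters
    # would pick it independently, given how often each rater used it.
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(rater_a) | set(rater_b))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two observers watching the same 10 videotaped episodes.
rater_a = ["agg", "agg", "not", "not", "agg", "not", "not", "agg", "not", "not"]
rater_b = ["agg", "not", "not", "not", "agg", "not", "agg", "agg", "not", "not"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa     = {cohens_kappa(rater_a, rater_b):.2f}")
```

The observers agree on 8 of 10 episodes, but because both coded most episodes as "not", part of that agreement could arise by chance, so kappa (about 0.58) is noticeably lower than raw agreement.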
Types of Validity
Validity: the extent to which we are actually studying the variables that we wish to study
Construct validity
Face validity
Content validity
Criterion validity -- 2 types:
Predictive validity
Concurrent validity
Construct Validity
Do my dependent variables actually
measure the hypothetical construct that I
want to test?
Does my IQ test really measure intelligence, and nothing else?
Do my procedures actually measure learning (without being influenced by motivation)?
Does my personality test really measure personality traits without also picking up fatigue?
Face Validity
The consensus (usually among experts in the field) that a measure represents a particular concept. It is the least stringent type of validity. Because most psychological variables require indirect measures (like the intelligence example above), the validity of an operational definition may not be self-evident.
Does rate of eating really reflect hunger?
In rats, does the rate of lever pressing actually measure learning?
Does talking measure extroversion?
Does GPA or SAT score really reflect intelligence?
Content Validity
Does the content of our measure fairly reflect the
content of the thing we are measuring?
Example: Do the questions on an exam accurately
reflect what you have learned in the course, or were
the exam questions sampled from only a subsection of the material?
A test to measure your knowledge of mathematics
should not be limited to addition problems, nor
should it include questions about French literature.
It should cover the entire range of appropriate math problems you are trying to measure.
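A rough way to check the question on this slide is to tag each exam question with the course topic it tests and count how much of the syllabus the questions cover. The topics, question tags, and the `topic_coverage` helper below are hypothetical, chosen to mirror the addition-only and French-literature example above; this is an illustrative coverage count, not a standard content-validity index.

```python
def topic_coverage(course_topics, exam_items):
    """Compare the topics an exam tests against the topics the course covered.

    course_topics: list of topics taught in the course.
    exam_items: dict mapping each exam question to the topic it tests.
    Returns (fraction of course topics tested, untested topics, off-topic questions).
    """
    tested = {topic for topic in exam_items.values() if topic in course_topics}
    untested = [t for t in course_topics if t not in tested]
    off_topic = [q for q, topic in exam_items.items() if topic not in course_topics]
    return len(tested) / len(course_topics), untested, off_topic


# Hypothetical syllabus and exam.
course_topics = ["addition", "subtraction", "multiplication", "division", "fractions"]
exam_items = {
    "Q1": "addition",
    "Q2": "addition",
    "Q3": "addition",
    "Q4": "French literature",
}

coverage, untested, off_topic = topic_coverage(course_topics, exam_items)
print(f"coverage: {coverage:.0%}")
print("untested topics:", untested)
print("off-topic questions:", off_topic)
```

This exam tests only one of five course topics and includes a question from outside the course, so it would have poor content validity.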
Criterion Validity
A powerful indicator of the validity of a
measure is its ability to accurately predict
performance on other, independent outcome
measures (referred to as criterion measures).
The extent to which your SAT score predicts your college GPA is an indication of the SAT's criterion validity.
There are two approaches to criterion
validity: Concurrent validity and Predictive
validity.
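Criterion validity is commonly summarized as a validity coefficient: the correlation between the measure and the criterion. The sketch below uses made-up SAT scores and later college GPAs for six students, so it illustrates predictive validity; for concurrent validity the criterion would be collected at the same time as the measure. All numbers are hypothetical, and `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between a measure (x) and a criterion (y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical: SAT scores at admission and the same students' later college GPA.
sat = [1100, 1250, 1400, 1000, 1320, 1180]
gpa = [2.9, 3.2, 3.7, 2.6, 3.3, 3.4]

print(f"validity coefficient r = {pearson_r(sat, gpa):.2f}")
```

A strong positive r in real data would support the SAT's predictive validity; with only six invented students, this is purely a demonstration of the calculation.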