Reliability and Validity
Y520
Strategies for Educational Inquiry
Robert S Michael
Measurement-1
Reliability
■ The observed score on an instrument can be divided into two
parts:
• Observed score = true score + error
• An instrument is said to be reliable if it accurately reflects the true
score, and thus minimizes the error component.
• The reliability coefficient is the proportion of true variability to the total
observed (or obtained) variability.
• For example, a reliability coefficient of .85 means that 85% of
the variability in observed scores is presumed to represent true
individual differences and 15% of the variability is due to random
error.
■ Reliability is a correlation computed between two events:
• Repeated use of the instrument (stability)
• Similarity of items (internal consistency)
• Equivalence of 2 instruments (equivalence)
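The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming that true scores and errors are uncorrelated so that their variances add (the usual classical test theory assumption); the function name and the example variances are hypothetical.

```python
def reliability_coefficient(true_variance, error_variance):
    """Proportion of observed variance that is true-score variance.

    Assumes observed variance = true variance + error variance,
    i.e. true scores and errors are uncorrelated.
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    observed_variance = true_variance + error_variance
    if observed_variance == 0:
        raise ValueError("observed variance is zero; the ratio is undefined")
    return true_variance / observed_variance


# Hypothetical variances chosen to reproduce the .85 example:
# 85 units of true variance and 15 units of error variance.
example = reliability_coefficient(85.0, 15.0)
print(round(example, 2))
```

In practice the true variance is never observed directly, which is why reliability is estimated from correlations such as the three listed above.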
Stability: Test-Retest
■ Test-retest reliability: The same group of respondents completes the
instrument at two different points in time. How stable are the
responses?
• The correlation coefficient between the 2 sets of scores describes
the degree of reliability.
• For cognitive domain tests, correlations of .90 or higher are good.
For personality tests, .50 or higher. For attitude scales, .70 or higher.
■ Problems with test-retest reliability procedures:
• Differences in performance on the second test may be due to the first
test - i.e., responses actually change as a result of the first test.
• Many constructs of interest actually change over time independent of
the stability of the measure.
• The interval between the 2 administrations may be too long and the
construct you are attempting to measure may have changed.
• The interval may be too short. Respondents remember their earlier
answers, which inflates the reliability estimate.
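As a rough sketch of the test-retest computation, the Pearson correlation between the two administrations can be calculated as follows. The scores and the names `time1`, `time2`, and `pearson_r` are made up for illustration and do not come from any real instrument.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have no variability; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five respondents at two points in time.
time1 = [10, 12, 14, 16, 18]
time2 = [11, 12, 15, 15, 19]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 2))
```

A coefficient computed this way is then compared with the rough benchmarks given above (for example, .90 or higher for cognitive tests).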
Alternate Forms
■ Involves using differently worded questions to measure the same
construct.
• Questions or items are reworded to produce two items that are
similar but not identical.
• Items must focus on the same exact aspect of behavior with the
same vocabulary level and same level of difficulty.
• Items must differ only in their wording.
■ Reliability is the correlation between the responses to the pairs of
questions.
■ Alternate forms reliability is said to avoid the practice effects that can
inflate test-retest reliability (i.e., respondents recalling how they
answered the identical item on the first test administration).
Internal Consistency - Homogeneity
■ Example:
• The Rand 36-item Health Survey measures 8 dimensions of
health. One of these dimensions is physical function.
• Instead of asking just one question, “How limited are you in
your day-to-day activities?”, Rand found that asking 10
questions produced more reliable results and conveyed a
better understanding of “physical function.”
Internal Consistency: Rand Example
■ The following questions are about activities you might do during a typical
day. Does your health now limit you in these activities? If so, how much?
(Response options are: limited a lot, limited a little, not limited at all).
• Vigorous activities, such as running, lifting heavy objects,
participating in strenuous sports.
• Moderate activities, such as moving a table, pushing a vacuum
cleaner, bowling, or playing golf.
• Lifting or carrying groceries.
• Climbing several flights of stairs.
• Climbing one flight of stairs.
• Bending, kneeling, or stooping.
• Walking more than a mile.
• Walking several blocks.
• Walking one block.
• Bathing or dressing yourself.
■ Cronbach’s alpha:
• Indicates degree of internal consistency.
• Is a function of the number of items in the scale and the degree of
their intercorrelations.
• Ranges from 0 to 1 (a value of exactly 1 is never seen in practice).
• Measures the proportion of variability that is shared among items
(covariance).
• When items all tend to measure the same thing they are highly
related and alpha is high.
• When items tend to measure different things they have very little
correlation with each other and alpha is low.
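To make these properties concrete, here is a minimal sketch of one common way to compute Cronbach's alpha, using the variance form k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrix is invented for illustration (three items coded 1 to 3, in the spirit of the "limited a lot / a little / not at all" options), and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances throughout:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses])
                      for i in range(n_items)]
    # Variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have no variability")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents x 3 items, each coded 1-3.
data = [
    [3, 3, 2],
    [2, 2, 2],
    [1, 1, 1],
    [3, 2, 3],
    [1, 2, 1],
]

alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

When every item gives each respondent the same score, the items share all of their variability and this function returns 1; the more the items disagree, the lower the value.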
Cronbach’s Alpha
■ Conceptual Formula:
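One widely used conceptual form is the standardized alpha, offered here as an assumption about the formula intended rather than a reproduction of it. It expresses alpha in terms of the two quantities named earlier: the number of items, written here as k, and the mean inter-item correlation, written here as \bar{r} (both symbols are chosen for this sketch).

```latex
\[
  \alpha = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}
\]

% Worked example with assumed values: k = 10 items, mean inter-item
% correlation \bar{r} = 0.30.
\[
  \alpha = \frac{10 \times 0.30}{1 + 9 \times 0.30}
         = \frac{3.0}{3.7}
         \approx 0.81
\]
```

The example shows how a scale built from 10 modestly correlated items can still reach a fairly high alpha, because alpha rises with both the number of items and their intercorrelations.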
Validity
Content Validity
Criterion Validity
Construct Validity
Inferences in Measurement
■ Length of table
■ Weight of person
■ Speed of car
■ Temperature
■ Humidity
■ Wind chill index
■ Discomfort Index (Smog Index / Pollen Index)
■ Intelligence
■ Anxiety
Inventing Constructs
■ Temperature
■ Wind Chill
■ Discomfort Index
■ Intelligence
■ Extroversion
Example:
Intelligence and its Measurement
Definitions:
Intelligence and its Measurement (a)
Definitions:
Intelligence and its Measurement (b)
Definitions:
Intelligence and its Measurement (c)
Definitions:
Intelligence and its Measurement (d)