Validity and Reliability
There are two aspects of the quality of a measurement instrument (questionnaire or observation):
- Validity asks whether we measure what we want to measure, i.e. whether the instrument
measures the underlying construct (no systematic error in the measurements).
- Reliability concerns the accuracy of the measurements: whether we get the same result when
the measurement is repeated (no influence of random/non-systematic errors).
There are different types of validity (for questionnaires) and two ways of assessing validity:
- Theoretical: face validity and content validity.
- Empirical (statistics): criterion validity and construct validity.
- Face validity asks whether, at first sight, the test measures what it intends to measure. For
example, a test of mathematical ability should contain math items. Other types of validity
are still needed.
- Content validity examines whether the test items are representative of the construct. For this, we
have to identify the relevant dimensions/aspects of the construct that should be
reflected in the items, and the irrelevant aspects that should not be, as judged by (clinical)
experts and by consulting the literature. For example, different aspects of problem behavior (of
a child) should be measured using different items that pertain to different aspects, and we
have to avoid irrelevant aspects like “do you beat your child?”.
- Criterion validity examines whether the scores are related to an external criterion, for example a
test of the school maturity of a child (a small numerical sketch follows after this list). There are two types:
o Concurrent validity shows the relation with a “measurement” taken at the same
moment. For example, the teacher’s opinion regarding the maturity of the child.
o Predictive validity shows the relation with a “measurement” taken later in time. For
example, school results in the first grade.
- Construct validity explores the degree to which a test measures the construct it claims, or
purports, to be measuring.
o Convergent validity: the scores should correlate with other measures of the same or
a closely related construct.
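To make the empirical (statistical) side concrete, the following is a minimal sketch, with invented data, of how criterion validity could be checked for the school-maturity example: concurrent validity as the correlation of the test scores with a teacher rating collected at the same moment, and predictive validity as the correlation with first-grade results obtained later. All numbers and variable names are purely illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data for 8 children
maturity_test  = np.array([55, 62, 48, 70, 66, 52, 59, 74])          # school-maturity test scores
teacher_rating = np.array([3, 4, 2, 5, 4, 3, 3, 5])                  # teacher's judgement, same moment
first_grade    = np.array([6.0, 7.5, 5.5, 8.5, 8.0, 6.5, 7.0, 9.0])  # school results one year later

r_concurrent, _ = pearsonr(maturity_test, teacher_rating)   # concurrent validity
r_predictive, _ = pearsonr(maturity_test, first_grade)      # predictive validity
print(f"concurrent validity r = {r_concurrent:.2f}")
print(f"predictive validity r = {r_predictive:.2f}")
```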
Reliability
Reliability refers to the accuracy and consistency of the measurements, i.e. the extent to which
they are free of non-systematic/random errors.
Random errors show up, for example, as fluctuations in reaction times caused by accidental factors
that yield inaccurate measurements: there could be problems with the measurement device
(the computer), or the participant may be distracted for a moment. The goal is to control these
accidental factors. Errors also occur in questionnaires when, for example, a participant skips a
question or does not understand it as the researcher intended.
Still, there are different degrees of inaccuracy of a measurement instrument, which we can quantify
by a correlation. There are two types:
a) Internal consistency of the measurement instrument (consistency of the different
items), typically quantified by Cronbach's alpha. It is a function of the number of items in a test,
the average covariance between item pairs, and the variance of the total score (a
computational sketch follows after this list).
b) Stability of the measurement instrument: test-retest reliability and parallel reliability.
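As an illustration of point a), here is a minimal sketch of the usual formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of the item variances / variance of the total score), applied to an invented score matrix; the data and the function name are only illustrative.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = item_scores.shape[1]                               # number of items
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical answers of 5 respondents to 4 Likert-type items
scores = np.array([[4, 5, 4, 4],
                   [2, 2, 3, 2],
                   [5, 4, 5, 5],
                   [3, 3, 3, 4],
                   [1, 2, 2, 1]])
print(round(cronbach_alpha(scores), 2))
```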
a) Test-retest reliability.
It involves administering the test twice (at different times) to the same people. The higher the
correlation between the two administrations, the greater the reliability (a small numerical sketch
follows after the two types below).
A problem is choosing the time interval, because we don't know how long it should be. If it is too
short, the estimate is biased because people remember their answers; if it is too long, people can
change (the construct can change over time, less so for cognitive skills). There can also be a
learning effect.
So this is not the best option.
b) Parallel reliability.
It consists of administering two parallel tests at the same time to the same individuals. The higher
the correlation between the parallel tests, the greater the reliability.
The advantage is that there is no time effect. However, the tests must truly be parallel. For example, if
we measure reading ability by the number of words a child can read in one minute (ordered
from easy to difficult), the parallel test should have the same order. This is not possible for all
constructs.
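Both test-retest and parallel reliability come down to the same computation: the correlation between two sets of scores from the same individuals. A minimal sketch with invented scores:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores of the same 6 people on two administrations (or two parallel forms)
first_administration  = np.array([12, 18, 25, 30, 22, 15])
second_administration = np.array([14, 17, 27, 29, 20, 16])

r, _ = pearsonr(first_administration, second_administration)
print(f"reliability estimate r = {r:.2f}")
```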
2.2. OBSERVATIONS/JUDGEMENTS
a) Inter-rater reliability: proportion of agreement between two raters regarding the same set of items.
b) Intra-rater reliability: proportion of agreement within one rater when evaluating the same items twice.
Example: case of dichotomous (binary) items, cross-tabulating the judgements of the two raters
(e.g. rater 1 in rows, rater 2 in columns):

            passed   failed   total
  passed      87       23      110
  failed      14       95      109
  total      101      118      219
Ratio of number of agreements to total number of items: (87 + 95) / (87 + 23 + 95 + 14) = 182 /
219 = .8311
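The same proportion of agreement can be checked directly from the table; a short sketch in Python with the numbers above:

```python
import numpy as np

# Agreement table from the example above: rows = first rater, columns = second rater
table = np.array([[87, 23],
                  [14, 95]])

agreement = np.trace(table) / table.sum()            # (87 + 95) / 219
print(f"proportion of agreement = {agreement:.4f}")  # 0.8311
```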
Test manuals always contain information regarding validity and reliability. When constructing your
own instrument, you should give arguments supporting the validity and reliability of the instrument.
• Face/content validity: evaluated by experts.
• Criterion/construct validity: relation with other tests or with information regarding the
construct under study.
• Reliability: administer the test twice to the same people.