Validity and Reliability

Validity and reliability are two important aspects of measurement quality. Validity explores whether an instrument measures the intended construct, while reliability assesses the accuracy of the measurement. There are different types of validity, including face, content, criterion, and construct validity. Reliability can be assessed via internal consistency, or via test-retest or parallel versions of an instrument. Measurement errors can be systematic or random, and reliability quantifies the degree of inaccuracy due to random errors.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views

Validity and Reliability

Validity and reliability are important aspects of measurement quality. Validity explores if an instrument measures the intended construct, while reliability assesses the accuracy of measurement. There are different types of validity including face, content, criterion, and construct validity. Reliability can be assessed via internal consistency and test-retest or parallel versions of instruments. Measurement errors can occur systematically or randomly, and reliability quantifies inaccuracies.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

QUANTITATIVE RESEARCH METHODS

4. Validity and reliability

There are two aspects of the quality of a measurement instrument (questionnaire or observation):
- Validity explores whether we measure what we want to measure; in other words, whether the instrument measures the underlying construct (no systematic error in the measurements).
- Reliability investigates the accuracy with which the instrument measures the construct: whether we get the same result when the measurement is repeated (no influence of random/non-systematic errors).

There are different types of validity (questionnaires) and two ways of assessing validity:
- Theoretical: face validity and content validity.
- Empirical (statistics): criterion validity and construct validity.

- Face validity studies whether the test, at first sight, measures what it intends to measure. For example, in a test of mathematical ability you would expect math items. But other types of validity are needed.
- Content validity studies whether the test items are representative of the construct. For this, we have to identify the relevant dimensions/aspects of the construct that should be reflected in the items, and the irrelevant aspects that should not be, as judged by (clinical) experts and by consulting the literature. For example, different aspects of problem behavior (of a child) should be measured using different items that pertain to different aspects, and we have to avoid irrelevant aspects like “do you beat your child?”.

- Criterion validity looks at whether the scores are related to an external criterion. For example, a test regarding the school maturity of a child. There are two types:
o Concurrent validity shows the relation with a “measurement” at the same moment. For example, the teacher's opinion regarding the maturity of the child.
o Predictive validity shows the relation with a “measurement” later in time. For example, school results in the first grade.
- Construct validity explores the degree to which a test measures what it claims, or purports, to be measuring. There are two types:
o Convergent validity shows that different measurement instruments of the same construct (or of related constructs) should be related: there should be a large positive correlation between them.
o Discriminant validity shows that measurement instruments of different constructs should not be (or should be less) related. For example, problem behavior and parenting behavior (see the sketch below).
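
To make this concrete, here is a minimal sketch of how convergent and discriminant validity could be checked with correlations; the scores are hypothetical, invented only for illustration:

    import numpy as np

    # Hypothetical scores of five children on three instruments: two questionnaires
    # measuring problem behavior and one questionnaire measuring parenting behavior.
    problem_a = np.array([3, 7, 5, 9, 2])
    problem_b = np.array([4, 6, 5, 8, 3])
    parenting = np.array([5, 5, 6, 5, 5])

    # Convergent validity: two instruments for the same construct should correlate highly.
    print(np.corrcoef(problem_a, problem_b)[0, 1])  # large positive correlation

    # Discriminant validity: instruments for different constructs should correlate weakly.
    print(np.corrcoef(problem_a, parenting)[0, 1])  # correlation near zero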

Validity concerns systematic errors; reliability concerns non-systematic/random errors, i.e., the accuracy and consistency of the measurements.

Errors such as fluctuations in reaction times are committed because of the presence of accidental factors that yield inaccurate measurements: there could be problems with the measurement device (computer), or the participant could be distracted for a moment. The goal is to control these accidental factors. Errors also occur with questionnaires, for example when a question is skipped or is not understood as the researcher intended.
Still, there are different degrees of inaccuracy of a measurement instrument, which we can quantify by a correlation. There are two types:
a) Internal consistency of the measurement instrument (consistency of the different items): Cronbach's alpha. It is a function of the number of items in a test, the average covariance between item pairs, and the variance of the total score (see the sketch below).
b) Stability of the measurement instrument: test-retest reliability and parallel reliability.
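
A minimal sketch of how Cronbach's alpha could be computed from the item variances and the variance of the total score, assuming the responses are stored in a NumPy array (the function name and data are illustrative):

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a (respondents x items) matrix of scores."""
        k = scores.shape[1]                              # number of items
        item_variances = scores.var(axis=0, ddof=1)      # variance of each item
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the total score
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical responses of five people to three items.
    scores = np.array([[2, 3, 3],
                       [4, 4, 5],
                       [1, 2, 2],
                       [5, 4, 4],
                       [3, 3, 4]])
    print(cronbach_alpha(scores))  # about 0.92: the items are internally consistent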


2.1. TYPES OF RELIABILITY

a) Test-retest reliability.
It involves administering the test twice (at different times) to the same people: the higher the correlation between the two administrations, the higher the reliability (a computational sketch is given after point b).
There may be a problem with the time interval, because we don't know how long it should be. If it is too short, the results are biased because people remember their answers; but if it is too long, people can change (the construct can change over time, less so for cognitive skills). In addition, there could be a learning effect.
For these reasons, this is not always the best option.
b) Parallel reliability.
It consists of administering two parallel tests at the same moment (to the same individuals): the higher the correlation between the parallel tests, the higher the reliability.
The advantage is that there is no time effect. However, the tests must be parallel. For example, if we measure reading ability by the number of words a child can read in one minute (ordered from easy to difficult), the parallel test should have the same order. This is not possible for all constructs.
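
Both forms quantify reliability as a correlation between two sets of scores. A minimal sketch, with hypothetical scores invented for illustration and using SciPy's Pearson correlation:

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical scores of six people on two administrations of the same test
    # (for test-retest) or on two parallel versions (for parallel reliability).
    first = np.array([12, 18, 15, 20, 9, 14])
    second = np.array([13, 17, 16, 19, 10, 15])

    # The reliability estimate is the correlation between the two sets of scores.
    r, _ = pearsonr(first, second)
    print(f"Reliability estimate: r = {r:.2f}")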

2.2. OBSERVATIONS/JUDGEMENTS

a) Inter-rater reliability: proportion of agreement between two raters regarding the same set of items.
b) Intra-rater reliability: proportion of agreement within one rater when evaluating the same items twice.
Example: case of dichotomous (binary) items. Each cell counts the items that received the given pair of judgements (e.g., by two raters):

                         Rating 2: passed   Rating 2: failed   Total
    Rating 1: passed            87                 23            110
    Rating 1: failed            14                 95            109
    Total                      101                118            219

Ratio of number of agreements to total number of items: (87 + 95) / (87 + 23 + 95 + 14) = 182 /
219 = .8311
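
A minimal sketch of this computation, with the agreement table stored as a NumPy array:

    import numpy as np

    # Agreement table from the example: rows = first rating, columns = second rating.
    table = np.array([[87, 23],
                      [14, 95]])

    # Proportion of agreement: items on the diagonal (both ratings alike) over all items.
    agreement = np.trace(table) / table.sum()
    print(f"Proportion of agreement: {agreement:.4f}")  # .8311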

Test manuals always contain information regarding validity and reliability. When constructing your own instrument, you should give arguments to support the validity and reliability of the instrument:
• Face/content validity: evaluated by experts.
• Criterion/construct validity: relation with other tests or with information regarding the construct under study.
• Reliability: administer the test twice to the same people (or use parallel versions).

