
Types of Validity

Validity may seem a unified concept but, according to Ferguson (2006), it has different aspects, reflecting different ways of viewing validity. Alderson, Clapham and Wall (1995) support this view and suggest that it is useful to validate tests in several ways, since each type of validation provides additional evidence. In this section I will broadly follow this approach, dividing the types of validity into two main groups, internal validity and external validity, before concluding with construct validity as an umbrella for all other forms.

2.1 Internal Validity:
There are three kinds of internal validity: face validity, content (or rational) validity and response validity.
2.1.1 Face Validity:
Face validity refers, as Alderson, Clapham and Wall (1995:172) mention, to a test's 'surface
credibility or public acceptability'. Bachman (1990:307) states that 'face validity is the
appearance of real life'. Thus, face validity is normally assessed by people who are not
necessarily experts.

The objective of such an assessment, Alderson, Clapham and Wall (1995:172) say, is to ensure
that the test is considered face valid, since this belief leads users to take the test seriously
and to perform to the best of their ability.

2.1.2 Content Validity:
Content validity is, according to Kerlinger (1973:458), cited in Alderson, Clapham and Wall
(1995:173), 'the representativeness or sampling adequacy of the content (the substance, the
matter, the topic) of a measuring instrument'. Usually, content validity is based on experts'
assessment.
In the light of this definition, content validity can be assessed, Ferguson (2006:2) suggests,
by attempting to answer the following questions:
 Is what candidates are asked to do relevant to their future work?
 Is there a match between the characteristics of the test takers and the characteristics
of the target language use situation?
 Does the test test what is contained in the syllabus?
 How relevant is the test content to the needs of the students and to the syllabus?
 Does the test content offer a good basis for inferences about candidates' ability in the
target language use domain?

The key issue for content validity, according to Messick (1999), is the specification of what is
to be assessed. Thus, a common way of carrying out such an assessment, Ferguson (2006:2)
notes, is to analyse the actual test content, using a prepared checklist or instrument, and
then compare it with a statement of what the content should be.

Content validity has useful applications. For example, a rating scale can be developed on
which experts rate the test according to set criteria. The Test Method Characteristics (TMC)
scale was adapted by Clapham (1992), cited in Alderson, Clapham and Wall (1995:174), when
she evaluated the content of three reading comprehension tests by asking three teachers to
rate aspects of the test input.

In practice, judging content validity is not easy. McNamara (2000) makes this point, giving
the example of a test of the ability to read academic texts, where the question is likely to
arise: does it matter from which academic field the texts are drawn?

2.1.3 Response Validity:

This type of validation has emerged as a result of the growing range of qualitative
techniques, such as self-observation on the part of test takers, used, as Alderson, Clapham
and Wall (1995) state, to establish exactly how test takers respond to test items and why.

How candidates respond to test components cannot be established simply. According to
Ferguson (2006), however, research methods such as think-aloud protocols have shown
promise, and the answers they yield may help to identify what the test is actually testing,
as opposed to what the testers think it is testing.

2.2 External Validity:

There are three types of external validity: concurrent, predictive and consequential validity.
2.2.1 Concurrent Validity:

Concurrent validity, as in Alderson, Clapham and Wall (1995:177), 'involves the comparison
of the test scores with some other measure of the same candidates taken at roughly the same
time as the test'. The other measure should, according to Ferguson (2006:3), already be
known to be valid. For example:
 Scores on an older test whose validity is already established.
 Scores on a parallel version of the same test.
 Teachers' rankings and estimates of students' language ability.

Alderson, Clapham and Wall (1995) and Ferguson (2006) note that when a previously
validated test is used as the measure for a new one, a high correlation, say .90, is taken as
evidence of the new test's validity. However, correlation alone cannot establish that two
tests measure the same language abilities. Shohamy (1994), for example, examined this claim
empirically, comparing a direct and a semi-direct oral test, and concluded that
"… concurrent validation, using correlation, cannot provide sufficient evidence that two
tests actually test the same language and therefore comparable…" (Shohamy 1994:120).

2.2.2 Predictive Validity:

Although this type is similar to the previous one, there is a significant difference between
them in terms of when the external measures are collected. In predictive validity, as
Alderson, Clapham and Wall (1995) and Ferguson (2006) point out, the measures are
gathered later, after the test has been given.

Predictive validity, according to Alderson, Clapham and Wall (1995) and Ferguson (2006),
receives special attention in the field of proficiency testing (e.g. IELTS, TOEFL, etc.),
where the aim is to predict how someone will do in the future (e.g. at university or in a job).

Nevertheless, there are two difficulties, described by Ferguson (2006:3) as follows:

i. The first is the truncated sample problem. Because the test has already been used to
screen out low-scoring applicants, there are no longer low-scoring students in the
sample. The sample is thus restricted to a narrower range of scores, and the resulting
correlation is accordingly depressed.
ii. The second major problem is the suitability of the criterion itself. Clearly, academic
results are influenced by more than language ability, which means that one would not
necessarily expect a high correlation coefficient.

2.2.3 Consequential Validity:

Messick (1996:251) states that consequential validity "…includes evidence and rationales for
evaluating the intended and unintended consequences of score interpretation and use in both
the short- and long-term, especially those associated with bias in scoring and interpretation,
with unfairness in test use, and with positive or negative washback effect on teaching and
learning".

Most work on consequential validity focuses on washback, a term which Alderson & Wall
(1993:117) consider common in language teaching and testing and which "…can be related
to 'influence'. If the test is poor, then the washback may be felt to be negative…"

2.3 Construct Validity

As I have mentioned above, construct validity is a superordinate form of validity (see
Bachman and Palmer (1996), Messick (1996), Alderson, Clapham and Wall (1995)).
Bachman and Palmer (1996:21) state that construct validity "…refers to the extent to which
we can interpret a given test score as an indicator of the ability(ies) or construct(s) we want
to measure". Such an interpretation should be based on evidence that the test score reflects
the area(s) of language ability we want to measure (see Bachman and Palmer (1996),
Ferguson (2006)), in order to ensure that test scores mean what we expect them to mean
(see Alderson, Clapham and Wall (1995)).
2.3.1 Assessing construct validity:

There are many ways of assessing construct validity. Ferguson (2006:4) summarises them as
follows:

i. Studying the internal correlations between sub-tests. The rationale is that the point of
having different components in a test (e.g. reading, grammar, writing) is that they
measure something different from each other, so we should expect the correlations
between sub-tests to be fairly moderate, say between 0.3 and 0.6. If the correlation
between two sub-tests were very high, we might wonder whether they were testing the
same thing and whether, in turn, one of them was redundant. It is also necessary to
correlate each sub-test with the overall test score; according to classical theory, these
correlations might be expected to be higher, around 0.7.

ii. Comparing the test with theory: according to some writers, construct validation may also
involve assessing the extent to which the test is successfully based on its underlying
theory, in other words, whether it is a successful operationalisation of that theory. This
is assessed by experts who, having examined the test and been informed of the
underlying theory, reach an informed judgment as to its construct validity.

iii. Comparison with students' biodata and psychological characteristics: another form of
construct validation involves comparing test performance with biodata from test takers.
The aim is to detect any bias for or against a particular group of students defined in
terms of age, nationality, first language, etc. An alternative is to compare test scores
with theoretically relevant psychological measures.

There are also other ways of assessing construct validity, such as multitrait-multimethod
analysis with convergent-divergent validation, and factor analysis (e.g. Bachman & Palmer
1989), but, according to Alderson, Clapham and Wall (1995) and Ferguson (2006), these
methods are complex and involve sophisticated statistical procedures. They are therefore
beyond the concerns of this short essay.

2.3.2 Some difficulties facing construct validity:


Fulcher (1999:225) points out that there are two major threats to construct validity and
score interpretation:

i. The first is construct under-representation, where the test fails to represent the
construct it is supposed to be measuring.
ii. The second is construct-irrelevant variance, where the test ignores the construct we
wish to measure and instead captures something unrelated, even though such a test may
nonetheless be reliable.

Most validity research, Fulcher (1996) says, attempts to reduce the negative impact of these
two threats. Such concerns, according to Messick (1996), are linked to negative washback
(mentioned above under consequential validity). For instance, if a test under-represents an
important construct, teachers might overemphasise the well-represented constructs and
downplay others.

2.4 Suitable data for test validity:

To indicate which data are suitable for each type of test validity, Alderson, Clapham and
Wall (1995:193-194) provide a useful checklist:

Face validity:
Questionnaires to, and interviews with, candidates, administrators and other users.

Content validity:
a) Compare test content with specifications/syllabus.
b) Questionnaires to, and interviews with, 'experts' such as teachers, subject specialists
and applied linguists.
c) Expert judges rate test items and texts according to precise criteria.

Response validity:
Students introspect on their test-taking procedures, either concurrently or retrospectively.

Concurrent validity:
a) Correlate students' test scores with their scores on another test.
b) Correlate students' test scores with teachers' rankings.
c) Correlate students' test scores with other measures of ability, such as teachers' ratings.

Predictive validity:
a) Correlate students' test scores with their scores on tests taken some time later.
b) Correlate students' test scores with success in final exams.
c) Correlate students' test scores with other measures of their ability taken some time
later, such as teachers' assessments.
d) Correlate students' test scores with the success of later placement.

Construct validity:
a) Correlate each sub-test with the other sub-tests.
b) Correlate each sub-test with the total test score.
c) Correlate each sub-test with the total minus self.
d) Compare students' test scores with students' biodata and psychological characteristics.
e) Multitrait-multimethod studies.
f) Factor analysis.
