Context Validity: Cyril J. Weir
Discussant: Ms. Mary Ann S. De Castro
Kelly (1927: 14) noted, ‘The problem of validity is that of whether a test really measures what it purports to measure.’

Lado (1961: 321) similarly asked, ‘Does a test measure what it is supposed to measure? If it does, it is valid.’

Cronbach (1971: 463) took a similar position: ‘Every time an educator asks “but what does the instrument really measure?”’
Validity
Messick’s unified view of validity
Validity is broadly defined as nothing less than an evaluative summary of both the evidence for and the actual – as well as the potential – consequences of score interpretation and use. (Messick 1995)
Validity
The role of context as a determinant of communicative language ability is paramount. The context must be acceptable to the candidates and expert judges as a suitable milieu for assessing particular language abilities.
Weir on Context
Every attempt should be made within the constraints of the test situation to approximate to situational authenticity (Douglas 2000, O’Sullivan 2004).
Context
Context validity is concerned with
the extent to which the choice of
tasks in a test is representative of
the larger universe of tasks of which
the test is assumed to be a sample.
Context Validity
This coverage relates to linguistic and to interlocutor demands made by the task(s), as well as the conditions under which the task is performed, arising from both the task itself and its administrative setting.
Context Validity
TASK SETTING
Rubric
The test rubric should be candidate-friendly, intelligible, comprehensive, explicit, brief, simple and accessible. The rubric should not be more difficult than the text or task.
Task Setting
Khalifa (2003: 73) suggests a number of
questions which might be asked of a rubric:
• Is the rubric clear on what students have to do?
• Is the rubric written in short, simple sentences?
• Is the rubric grammatically correct?
• Is the rubric spelled correctly?
• Is the rubric in the First Language (L1) or the Target Language (TL)?
• Is the rubric familiar to the students?
• Is the rubric clear about the amount of time to spend on each part of the
task?
• Will the task require different types of response? If so, it may be
necessary to provide separate specific instructions for each type of
response required.
Is the rubric accurate and
accessible?
1. Purpose – test takers should be given a clear, unequivocal idea in the rubric of what the requirements of the task are, so that they can choose the most appropriate strategies and determine what information to target in the text in comprehension activities, or to activate in productive tasks.
Purpose
[It] is clear . . . that the writing needs of different groups of second
language learners are quite varied in terms of both cognitive
demands and communicative function. In developing appropriate
writing tests for these different populations then, it will be important to
keep these differences in mind. (Weigle 2002: 12)
Weigle on designing
appropriate tests
In the case of reading we may often read
something just because it looks as though it
might interest us – at other times, because
of the assumed usefulness of the text.
Purpose
In reading tests we need to match purposes to
appropriate text types and vice versa.
e.g. ADVERTISEMENT – to see if something is worth buying
ARTICLES – to find information for an assignment
BIBLIOGRAPHY – to find a reference to an article or book
Purpose
In constructing tests it is important to include texts and activities that mirror as closely as possible those the candidates are likely to meet in their future target situations.
Purpose
In a spoken language test the purpose of the speakers
will help to define the structure and focus of the
interaction, as well as some outcome towards which the
participants will be required to work.
Purpose
Is the purpose of the test made unequivocally clear
for the candidate?
Is it an appropriate purpose?
2. Response format – the choice you make about format will critically affect the cognitive processing that the task elicits.
Alderson et al. on response technique
The response format [test method] used for testing language ability may itself
affect the student’s score. . . Since the effects of the response format tend to
be unpredictable, it can be a potential source of construct-irrelevant variance.
The best advice that can be offered is to ensure that more than one response
format for testing any ability is used. (1995: 44–5)
Response format
There are obviously differences occasioned by using
Multiple-Choice Questions (MCQ) as opposed to
short-answer questions (SAQ) (see Weir 1990).
Hughes (2003: 75–8) lists a number of problems associated with MCQ.
Response format
Is there any evidence that the test response format
is likely to affect the test
performances?
3. Known criteria – candidates should be given a clear idea of how they will be judged.
Known Criteria
Are the criteria to be used in the
marking of the test explicit for the
candidates and the markers?
4. Weighting – concerned with the assignment of a different number of maximum points to a test item, task or component in order to change its relative contribution in relation to other parts of the same test.
Weighting
Are any weightings for different test
components adequately justified?
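The arithmetic behind weighting can be sketched in a few lines; the component names, maxima and weights below are hypothetical, not taken from any actual test.

```python
# Minimal sketch of weighting: scaling each component's raw score to a
# weighted maximum changes its relative contribution to the total.
# All scores, maxima and weights here are hypothetical.

def weighted_total(raw_scores, max_points, weights):
    """Sum each component's proportion correct times its weighted maximum."""
    total = 0.0
    for name, raw in raw_scores.items():
        proportion = raw / max_points[name]   # proportion of raw marks gained
        total += proportion * weights[name]   # contribution after weighting
    return total

# Reading weighted at twice the listening paper (2:1 contribution).
raw = {"reading": 18, "listening": 12}
maxima = {"reading": 20, "listening": 20}
weights = {"reading": 40, "listening": 20}

print(weighted_total(raw, maxima, weights))  # 0.9*40 + 0.6*20 = 48.0
```

Note how the reading paper, though marked out of the same raw maximum, contributes twice as much to the total once weighted – exactly the kind of choice that, per the slide above, needs to be justified.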
5. Order of items – consider closely the order in which the questions should come. In a test of careful reading, it is usual for questions on a text to follow a serial order, as the evidence suggests that this is the way we construct meaning (see Kintsch 1998, Urquhart and Weir 1998).
Order of Items
In reading, the order of items in each section must reflect the way such skills and strategies are deployed in normal processing for the particular reading purpose.
Order of Items
In the case of a listening test, the items should ask for information in the same order in which it occurs in the passage; if not, it may confuse test takers, which could lead to unreliable performance (see Buck 2001: 119, 138).
Order of Items
In a speaking or writing test there may be logical or affective reasons for the order in which tasks occur.
Order of Items
Are the items and tasks in a test in a
justifiable order?
6. Time constraints – in testing reading and listening, it is important to consider the time constraints for the processing of text and for answering the items set on it.
The test developer has to sequence the texts and tasks, and ensure
there is enough time allowed for all activities; if time allotment is not
carefully planned, it may result in unpredictable performance.
Time Constraints
Alderson on speed
Timed readings, especially in computer-based test settings, might provide useful diagnoses of developing automaticity, and thought needs to be given to measuring the rate at which readers read, as well as to their comprehension of the text. Speed should not be measured without reference to comprehension, but at present comprehension is all too often measured without reference to speed. (2000: 30)
Time Constraints
Is the timing for each part of the test (e.g., preparation and completion) appropriate?
TASK DEMANDS
1. Discourse Mode
Task Demands
B. Reading – test developers decide what text types are appropriate for a particular test population through needs analysis of the students’ target situations, and through careful examination of the texts (and tasks) used in other tests and teaching materials aimed at the target population. The texts should be reasonably authentic: they should either be taken from the target-language use situation or possess salient characteristics of target-language use texts.
Task Demands
D. Channel of communication – again, one has to resort to the demands that will be placed on candidates in the future target situation to inform judgments on the conditions that should obtain. Decisions would have to be made on the nature and amount of non-verbal information that is desirable, e.g., graphs, charts, diagrams, etc.
Task Demands
2. Nature of information in the text – whether the information in the text is abstract (ethics, love, etc.) or concrete (the objects in a room, for example) is relevant to the appropriateness of the test. Abstract information may in itself be cognitively as well as linguistically more complex and more difficult to process.
Task Demands
Lexical.
Texts with more high-frequency vocabulary tend to be easier than texts
with more low-frequency vocabulary. In listening, low-frequency lexical
items are less likely to be recognized or more likely to be misheard (see Bond
and Garnes 1980).
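The frequency effect described above can be illustrated with a toy coverage measure; the wordlist and example sentence below are hypothetical, and real work would use a corpus-derived frequency list.

```python
# Toy sketch: proportion of a text's tokens found in a high-frequency
# wordlist -- higher coverage suggests an easier text. The wordlist and
# example sentence are hypothetical, not from any real frequency list.

def coverage(text, high_frequency_words):
    tokens = text.lower().split()
    known = sum(1 for t in tokens if t in high_frequency_words)
    return known / len(tokens)

wordlist = {"the", "cat", "sat", "on", "mat"}
print(round(coverage("The cat sat on the ornate mat", wordlist), 3))  # 0.857
```

Here the single low-frequency item ("ornate") pulls coverage below 1.0; texts with many such items would score lower and, by the argument above, tend to be harder.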
Structural
Listener familiarity with a speaker’s preferred syntactic patterns may influence
understanding at a global level. A speaker’s use of long, complex syntactic
constructions, more characteristic of written style, may prevent the listener
from using ‘normal’ syntactic expectations for understanding spoken language
(see Rost 1990: 49–50). Texts with less complex grammar tend to be easier than texts
with more complex grammar. Berman (1984) considers how opacity and heaviness of
sentence structures sometimes may lead to increased difficulty in processing.
Functional
Function is a term used to describe the illocutionary force of what is said.
Examples of communicative functions might be where a speaker has to
persuade, advise, describe, etc. (see O’Sullivan et al. 2002).
Task Demands
Interlocutors
Speech rate
The speed at which the speaker delivers speech; the most common measure of speech rate is words per minute (wpm), though syllables per unit time are a better measure whenever precision is necessary. Buck (2001: 38) argues that research results generally support the commonsense belief that the faster the speech, the more difficult it is to comprehend.
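Both measures just mentioned are simple ratios; the word, syllable and duration figures in this sketch are hypothetical.

```python
# Minimal sketch of the two speech-rate measures: words per minute (the
# common measure) and syllables per second (the more precise one).
# The word, syllable and duration figures below are hypothetical.

def words_per_minute(word_count, duration_seconds):
    return word_count / (duration_seconds / 60.0)

def syllables_per_second(syllable_count, duration_seconds):
    return syllable_count / duration_seconds

# A 30-second listening extract of 75 words and 105 syllables.
print(words_per_minute(75, 30))       # 150.0 wpm
print(syllables_per_second(105, 30))  # 3.5 syllables per second
```

Syllable-based rates avoid the distortion wpm suffers when a speaker uses many long or many short words, which is why syllables are preferred when precision matters.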
Variety of accent
The best speakers for any test are speakers typical of the target-language use situation
(Buck 2001). The bottom line is that they have clear accessible pronunciation
and intonation.
Acquaintanceship
The more relaxed the candidate is, the greater the sample of language
that may be elicited. The tester needs to consider who the candidates will be
using English with in their future target situations (see Brown and Yule 1983).
Number
The number of participants in an interaction, or the number of things being referred to in a picture description or discussed, has an effect (Brown et al., 1984).
Task Demands
SETTING ADMINISTRATION