Principles of Language Assessment
Assessment
Practicality
Reliability
Validity
Authenticity
Washback
Practicality
An effective test is practical
Is not excessively expensive
Stays within appropriate time constraints
Is relatively easy to administer
Has a scoring/evaluation procedure that is specific and time-efficient
Reliability
A reliable test is consistent and dependable. If you give the same test to the same students on two different occasions, it should yield similar results.
Student-related reliability
Rater reliability
Test administration reliability
Test reliability
Rater Reliability
Human error, subjectivity, and bias may
enter into the scoring process.
Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly because of lack of attention to scoring criteria, inexperience, inattention, or even preconceived bias toward particular "good" and "bad" students.
Test Reliability
The test itself can lower reliability if:
The test is too long
Test items are poorly written or ambiguous
Validity
A test is valid if it actually assesses the objectives and what has been taught.
Content validity
Criterion validity (tests objectives)
Construct validity
Consequential validity
Face validity
Content Validity
A test has content validity if the teacher can clearly define the achievement that he or she is measuring
A test of tennis competency that asks someone
to run a 100-yard dash lacks content validity
If a teacher uses the communicative approach to teach speaking but then uses the audiolingual method to design test items, the test will lack content validity
Criterion-related Validity
The extent to which the objectives of the test
have been measured or assessed. For instance,
if you are assessing reading skills such as
scanning and skimming information, how are
the exercises designed to test these objectives?
In other words, the test is valid if the objectives taught are the objectives tested and the items actually test these objectives.
Construct Validity
A construct is an explanation or theory that
attempts to explain observed phenomena
If you are testing vocabulary and the lexical objective is to use the lexical items for communication, asking students to write definitions of those items will not match the construct of communicative language use
Consequential Validity
Consequential validity encompasses all the consequences of a test, including:
Its accuracy in measuring intended criteria
Its impact on the preparation of test-takers
Its effect on the learner
The social consequences of a test's interpretation and use (e.g., an exit exam for pre-basic students at El College, the College Board)
Face Validity
Face validity refers to the degree to which a test looks right and appears to measure the knowledge or ability it claims to measure. Face validity is likely to be high if learners encounter:
A well-constructed, expected format with familiar tasks
A test that is clearly doable within the allotted time limit
Directions that are crystal clear
Tasks that relate to the course (content validity)
A difficulty level that presents a reasonable challenge
Authenticity
The language in the test is as natural as possible
Items are contextualized rather than isolated
Topics are relevant and meaningful for learners
Some thematic organization to items is provided
Tasks represent, or closely approximate, real-world tasks
Washback
Washback refers to the effects tests have on instruction in terms of how students prepare for the test. "Cram" courses and "teaching to the test" are examples of such washback.
In some cases, students may also learn while working on a test or assessment.
Washback can be positive or negative