
Resume Material: Principles of Language Assessment

A. Validity
Validity is divided into several aspects. The first is the content validity of the test. A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. The test would have content validity only if it included a proper sample of the relevant structures. We would not expect an achievement test for intermediate learners to contain just the same set of structures as one for advanced learners. In order to judge whether or not a test has content validity, we need a specification of the skills or structures, etc. that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test; there may simply be too many things for all of them to appear in a single test. A comparison of test specification and test content is the basis for judgments as to content validity. Ideally these judgments should be made by people who are familiar with language teaching and testing but who are not directly concerned with the production of the test in question.
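To make that comparison concrete, here is a minimal sketch in Python of checking a test specification against the content of a draft test. The skill labels and data are entirely hypothetical; this illustrates the idea, not a real validation procedure.

    # Compare a (hypothetical) test specification with the structures a
    # draft test actually samples, to inform a content-validity judgment.
    specification = {
        "past simple", "present perfect", "conditionals",
        "passive voice", "reported speech", "modal verbs",
    }
    test_content = {"past simple", "present perfect", "passive voice"}

    covered = specification & test_content
    missing = specification - test_content

    print(f"Coverage: {len(covered) / len(specification):.0%}")  # 50%
    print("Under-represented areas:", sorted(missing))

The areas reported as missing are exactly those at risk of the harmful backwash effect described below.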
Why is content validity important? Firstly, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure, i.e. to have construct validity. A test in which major areas identified in the specification are under-represented, or not represented at all, is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect: areas that are not tested are likely to become areas ignored in teaching and learning.

The second aspect is criterion-related validity. This form of evidence relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated.
There are essentially two kinds of criterion-related validity: concurrent validity and
predictive validity. Concurrent validity is established when the test and the criterion are
administered at about the same time. To exemplify this kind of validation in achievement
testing, let us consider a situation where course objectives call for an oral component as part
of the final achievement test.

From the point of view of content validity, this will depend on how many of the functions are tested in the component, and how representative they are of the complete set of functions included in the objectives. Every effort should be made when designing the oral component to give it content validity.

The second kind of criterion-related validity is predictive validity. This concerns the
degree to which a test can predict candidates’ future performance. An example would be how
well a proficiency test could predict a student’s ability to cope with a graduate course. The
criterion measure here might be an assessment of the student’s English as perceived by his or
her supervisor at the university, or it could be the outcome of the course (pass/fail etc).
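Criterion-related validity is commonly reported as a correlation (a validity coefficient) between test scores and the criterion measure. As a hedged sketch, the Python fragment below computes a Pearson correlation between proficiency-test scores and later supervisor ratings; all of the numbers are invented for illustration.

    import math

    def pearson(x, y):
        """Pearson product-moment correlation between two score lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    # Proficiency test scores for ten candidates (hypothetical).
    test_scores = [52, 61, 70, 45, 88, 67, 73, 58, 91, 64]
    # Criterion: supervisors' later ratings of coping ability, 1-10 (hypothetical).
    supervisor_ratings = [5, 6, 7, 4, 9, 6, 8, 5, 9, 7]

    validity = pearson(test_scores, supervisor_ratings)
    print(f"Predictive validity coefficient: {validity:.2f}")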

The third point is that investigations of a test's content validity and criterion-related validity provide evidence for its overall, or construct, validity. One could imagine a test that was meant to measure reading ability, the specifications for which included reference to a variety of reading sub-skills, including, for example, the ability to guess the meaning of unknown words from the context in which they are met. Concurrent validation might reveal a strong relationship between students' performance on the test and their supervisors' assessment of their reading ability. But one would still not be sure that the items in the test were 'really' measuring the sub-skills listed in the specifications.

Two principal methods are used to gather such information: think-aloud and retrospection. In the think-aloud method, test takers voice their thoughts as they respond to the item. In retrospection, they try to recollect what their thinking was as they responded. The problem with the think-aloud method is that the very voicing of thoughts may interfere with what would be the natural response to the item. The drawback of retrospection is that thoughts may be misremembered or forgotten. Despite these weaknesses, such research can give valuable insights into how items work.

In these circumstances, the following things are recommended. Firstly, write explicit specifications for the test which take account of all that is known about the constructs that are to be measured, and make sure that a representative sample of their content is included in the test. Secondly, whenever feasible, use direct testing. If for some reason it is decided that indirect testing is necessary, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be employed (this may often result in disappointment, another reason for favoring direct testing). Thirdly, make sure that the scoring of responses relates directly to what is being tested. Finally, do everything possible to make the test reliable. If a test is not reliable, it cannot be valid.

B. Reliability

It is possible to quantify the reliability of a test in the form of a reliability coefficient. Reliability coefficients are like validity coefficients: they allow us to compare the reliability of different tests. A test with the ideal reliability coefficient of 1 would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. A test with a reliability coefficient of zero (and let us hope that no such test exists!) would give sets of results quite unconnected with each other, in the sense that the score someone actually got on a Wednesday would be no help at all in attempting to predict the score he or she would get if they took the test the day after. It is between the two extremes of one and zero that genuine test reliability coefficients are to be found.
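As a minimal sketch of what such a coefficient means in practice, the Python fragment below computes a test-retest reliability coefficient from two administrations of the same test to the same candidates. The scores are invented, and statistics.correlation requires Python 3.10 or later.

    from statistics import correlation

    # Scores from the same ten candidates on Wednesday and again on
    # Thursday (hypothetical data).
    wednesday = [48, 55, 62, 70, 73, 59, 81, 66, 90, 52]
    thursday  = [50, 54, 64, 69, 75, 57, 80, 68, 88, 53]

    r = correlation(wednesday, thursday)
    print(f"Test-retest reliability coefficient: {r:.2f}")  # close to 1

    # A coefficient near 1 means Wednesday's score strongly predicts
    # Thursday's; near 0 it would be no help at all.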

In fact, the reliability coefficient that is to be sought will depend also on other considerations, most particularly the importance of the decisions that are to be taken on the basis of the test. The more important the decisions, the greater the reliability we must demand: if we are to refuse someone the opportunity to study overseas because of their score on a language test, then we have to be pretty sure that their score would not have been much different if they had taken the test a day or two earlier or later.

In order to make a test reliable, consider these factors:
1) Take enough samples of behavior.
2) Do not allow candidates too much freedom.
3) Write unambiguous items.
4) Provide clear and explicit instructions.
5) Ensure that tests are well laid out and perfectly legible.
6) Ensure that candidates are familiar with the format and testing techniques.
7) Provide uniform and non-distracting conditions of administration.
8) Use items that permit scoring which is as objective as possible.
9) Make comparisons between candidates as direct as possible.
10) Provide a detailed scoring key.
11) Train scorers.
12) Agree acceptable responses and appropriate scores at the outset of scoring.
13) Identify candidates by number, not name.
14) Employ multiple, independent scoring (see the sketch below).
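As an illustration of factors 11 to 14, here is a hedged sketch in which two trained scorers mark the same scripts independently, their agreement is checked, and the reported mark is the average of the two. Candidate numbers and scores are hypothetical, and statistics.correlation requires Python 3.10 or later.

    from statistics import correlation, mean

    scripts = ["001", "002", "003", "004", "005"]  # candidates by number, not name
    rater_a = [14, 11, 17, 9, 15]                  # marks out of 20
    rater_b = [13, 12, 16, 10, 15]

    # Inter-rater agreement: a high correlation suggests the scoring key
    # and rater training are working; a low value flags items to re-examine.
    print(f"Inter-rater correlation: {correlation(rater_a, rater_b):.2f}")

    # Reported mark: the mean of the two independent scores per script.
    final = {s: mean(pair) for s, pair in zip(scripts, zip(rater_a, rater_b))}
    print(final)  # e.g. {'001': 13.5, ...}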
In connection with validity and reliability, we could argue that to be valid a test must provide consistently accurate measurements. It must therefore be reliable. A reliable test, however, may not be valid at all. There will always be some tension between reliability and validity. The tester has to balance gains in one against losses in the other.
