
Name : Risa Fairus Zumar Nafisa

NIM : 204200043
Class : TBI B
Language Assessment
Chapter 1
ASSESSMENT CONCEPTS AND ISSUES

Assessment is “appraising or estimating the level or magnitude of some attribute of a person” (Mousavi, 2009, p. 35). In educational practice, assessment is an ongoing process that encompasses a wide range of methodological techniques. Whenever a student responds to a question, offers a comment, or tries a new word or structure, the teacher subconsciously appraises the student’s performance. A good teacher never ceases to assess students, whether those assessments are incidental or intended.

Tests, on the other hand, are a subset of assessment, a genre of assessment techniques. They are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated. In scientific terms, a test is a method of measuring a person's ability, knowledge, or performance in a given domain.

A test measures an individual's ability, knowledge, or performance. Testers need to understand who the test-takers are. A test measures performance, but the results imply the test-taker’s ability or, to use a concept common in the field of linguistics, competence. Most language tests measure one’s ability to perform language, that is, to speak, write, read, or listen to a subset of language.

Measurement is the process of quantifying the observed performance of classroom learners. Bachman (1990) cautioned us to distinguish between quantitative and qualitative descriptions of student performance.

Evaluation does not necessarily entail testing; rather, evaluation is involved when the results of a test (or other assessment procedure) are used to make decisions (Bachman, 1990, pp. 22-23). Evaluation involves the interpretation of information. Simply recording numbers or making check marks on a chart does not constitute evaluation.

Assessment and learning. Although tests can be useful devices, they are only one among many procedures and tasks that teachers can ultimately use to assess (and measure) students. For optimal learning to take place, students in the classroom must have the freedom to experiment, to try out their own hypotheses about language without feeling that their overall competence is being judged in terms of those trials and errors. In the same way, tournament tennis players must, before a tournament, have the freedom to practice their skills with no implications for their final placement on the day of the tournament itself.

Informal assessment can take a number of forms, starting with incidental, unplanned comments and responses, along with coaching and other impromptu feedback to the student. Examples include putting a smiley face on homework or saying “Nice job!” or “Good work!” Informal assessment is virtually always nonjudgmental, in that you as a teacher are not making ultimate decisions about the student’s performance.

Formal assessments are exercises or procedures specifically designed to tap into a storehouse
of skills and knowledge. They are systematic, planned sampling techniques constructed to give
teacher and student an appraisal of student achievement. To extend the tennis analogy, formal
assessments are the tournament games that occur periodically in the course of a regimen of
practice.
Formative assessment involves evaluating students in the process of “forming” their competencies and skills, with the goal of helping them continue that growth process. The key to such formation is the delivery (by the teacher) and internalization (by the student) of appropriate feedback on performance, with an eye toward the future continuation (or formation) of learning.

Summative assessment aims to measure, or summarize, what a student has grasped and
typically occurs at the end of a course or unit of instruction. A summation of what a student
has learned implies looking back and taking stock of how well that student has accomplished
objectives, but it does not necessarily point to future progress. Final exams in a course and
general proficiency exams are examples of summative assessment. Summative assessment
often, but not always, involves evaluation (decision making).

In norm-referenced tests, each test-taker’s score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank. The purpose of such tests is to place test-takers in rank order along a mathematical continuum.
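
To make these statistics concrete, here is a minimal sketch (my own illustration, not part of the original text) that computes the mean, median, standard deviation, and a simple percentile rank in Python; the score values are invented.

from statistics import mean, median, stdev

scores = [52, 61, 67, 70, 73, 75, 78, 81, 84, 90]  # hypothetical class scores

def percentile_rank(score, all_scores):
    # percentage of scores falling below the given score
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

print("mean:", mean(scores))                            # average score
print("median:", median(scores))                        # middle score
print("standard deviation:", round(stdev(scores), 2))   # extent of variance
print("percentile rank of 78:", percentile_rank(78, scores))

A norm-referenced interpretation would then report each test-taker's standing relative to these group statistics rather than against fixed course objectives.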

Criterion-referenced tests, on the other hand, are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives. Classroom tests involving students in only one course and connected to a particular curriculum are typical of criterion-referenced testing.

TYPES AND PURPOSES OF ASSESSMENT


Achievement tests are (or should be) limited to particular material addressed in a curriculum within a specific time frame and are offered after a course has focused on the objectives in question.

The purpose of a diagnostic test is to identify aspects of a language that a student needs to develop or that a course should include. A test of pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum.

Placement tests are designed to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student’s performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging.

A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Many commercially produced proficiency tests (the TOEFL, for example) include a sample of writing as well as oral production performance.

An aptitude test is designed to measure capacity or general ability to learn a foreign language a priori (before taking a course) and to predict ultimate success in that undertaking. Language aptitude tests were ostensibly designed to apply to the classroom learning of any language.

Integrative Approaches. The discrete-point approach presupposed a decontextualization that was proving to be inauthentic. So, as the profession emerged into an era emphasizing communication, authenticity, and context, new approaches were sought, most notably the integrative testing advocated by John Oller (1979).

Communicative Language Testing drew on a model of language competence consisting of organizational and pragmatic competence, respectively subdivided into grammatical and textual components and into illocutionary and sociolinguistic components.

Traditional and “Alternative” Assessment. Research and practice during the 1990s provided compelling arguments against the notion that all people and all skills could be measured by traditional tests. The result was the emergence of what came to be labeled alternative assessment.

Performance-Based Assessment. A characteristic of many (but not all) performance-based language assessments is the presence of interactive tasks, hence another term, task-based assessment, for such approaches. J. D. Brown (2005) noted that this is perhaps not so much a synonym for performance-based assessment as a subset thereof, in which the assessment focuses explicitly on “particular tasks or task types” (p. 24) in a curriculum.

Dynamic assessment (DA) is a pro-learning form of assessment conceptually based on Vygotskian approaches to education. DA, as its name suggests, contrasts sharply with traditional assessment, which is static or stable over time. In DA, learner abilities are instead considered malleable, not fixed.

Tests of pragmatics have primarily been informed by research in interlanguage and cross-cultural pragmatics (Bardovi-Harlig & Hartford, 2016; Blum-Kulka, House, & Kasper, 1989; Kasper & Rose, 2002; Stadler, 2013). Much of pragmatics research has focused on speech acts (e.g., requests, apologies, refusals, compliments, advice, complaints, agreements, and disagreements).

Technological innovation has been widely applied to language learning and teaching. It is no surprise, then, that an overwhelming number of language courses use some form of computer-assisted language learning (CALL) or mobile-assisted language learning (MALL) to achieve their goals, as recent publications show (H. D. Brown, 2007b; Chapelle, 2005; Chapelle & Jamieson, 2008; de Szendeffy, 2005).

Chapter 2

PRINCIPLES OF LANGUAGE ASSESSMENT


Practicality refers to the logistical, down-to-earth, administrative issues involved in making,
giving, and scoring an assessment instrument. These include “costs, the amount of time it takes
to construct and to administer, ease of scoring, and ease of interpreting/reporting the results”
(Mousavi, 2009, p. 516).

A reliable test is consistent and dependable. If you give the same test to the same student or
matched students on two different occasions, the test should yield similar results.

Student-Related Reliability. The most common learner-related issue in reliability is caused by temporary illness, fatigue, a “bad day,” anxiety, and other physical or psychological factors, which may make an observed score deviate from one’s “true” score. Also included in this category are such factors as a test-taker’s test-wiseness, or strategies for efficient test-taking (Mousavi, 2009, p. 804).

Rater Reliability. Human error, subjectivity, and bias may enter into the scoring process. Inter-rater reliability is achieved when two or more scorers yield consistent scores for the same test. Failure to achieve inter-rater reliability could stem from lack of adherence to scoring criteria, inexperience, inattention, or even preconceived biases. Lumley (2002) provided some helpful hints to ensure inter-rater reliability.
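
One common way to gauge inter-rater reliability is to correlate the scores that two raters assign to the same set of performances. The sketch below is my own illustration with invented scores, not a procedure taken from Lumley (2002).

from math import sqrt

rater_a = [4, 3, 5, 2, 4, 3, 5, 4]  # hypothetical essay scores from rater A
rater_b = [4, 3, 4, 2, 5, 3, 5, 3]  # the same essays scored by rater B

def pearson(x, y):
    # Pearson correlation coefficient between two equally long score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print("inter-rater correlation:", round(pearson(rater_a, rater_b), 2))

A coefficient close to 1.0 suggests the raters are applying the scoring criteria consistently; a low coefficient points to the sources of disagreement listed above.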

Test Administration Reliability. Unreliability may also result from the conditions in which
the test is administered.

Test Reliability. In classroom-based assessment, test unreliability can be caused by many factors, including rater bias. This typically occurs with subjective tests with open-ended responses (e.g., essay responses) that require a judgment on the part of the teacher to determine correct and incorrect answers. Objective tests, in contrast, have predetermined fixed responses, a format that of course increases their test reliability.
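
As a quick contrast with subjective scoring, the following sketch (an invented example, not from the source text) scores a small objective test against a predetermined answer key; because the correct responses are fixed in advance, any scorer, human or machine, arrives at the same total.

answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}        # hypothetical fixed responses
student_answers = {1: "B", 2: "C", 3: "A", 4: "C"}   # one hypothetical test-taker

score = sum(1 for item, key in answer_key.items() if student_answers.get(item) == key)
print(f"score: {score}/{len(answer_key)}")           # every scorer obtains the same result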

Validity. By far the most complex criterion of an effective test and arguably the most important
principle is validity, “the extent to which inferences made from assessment results are
appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund,
1998, p. 226).

Content-Related Evidence. If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior measured, it can claim content-related evidence of validity, often popularly referred to as content-related validity (e.g., Hughes, 2003; Mousavi, 2009).

Criterion-Related Evidence. A second form of evidence of the validity of a test may be found
in what is called criterion-related evidence, also referred to as criterion-related validity, or the
extent to which the “criterion” of the test has actually been reached.

Construct-Related Evidence. A third kind of evidence that can support validity, but one that
does not play as large a role for classroom teachers, is construct-related validity, commonly
referred to as construct validity.

Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its effect on the preparation of test-takers, and the (intended and unintended) social consequences of a test’s interpretation and use.

Face validity refers to “the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers” (Mousavi, 2009, p. 247).
Authenticity. A fourth major principle of language testing is authenticity, a concept that is
difficult to define, especially within the art and science of evaluating and designing tests.
Bachman and Palmer (1996) defined authenticity as “the degree of correspondence of the
characteristics of a given language test task to the features of a target language task” (p. 23)
and then suggested an agenda for identifying those target language tasks and for transforming
them into valid test items.

Washback. A facet of consequential validity is “the effect of testing on teaching and learning”
(Hughes, 2003, p. 1), otherwise known in the language assessment field as washback. Messick
(1996, p. 241) reminded us that the washback effect may refer to both the promotion and the
inhibition of learning, thus emphasizing what may be referred to as beneficial versus harmful
(or negative) washback. Alderson and Wall (1993) considered washback an important enough
concept to define a washback hypothesis that essentially elaborated on how tests influence both
teaching and learning. Cheng, Watanabe, and Curtis (2004) devoted an entire anthology to the
issue of washback, and Spratt (2005) challenged teachers to become agents of beneficial
washback in their language classrooms. (See Cheng, 2014, for a more recent discussion of this
topic.)

Applying Principles to Classroom Testing. The five principles of practicality, reliability, validity, authenticity, and washback go a long way toward providing useful guidelines both for evaluating an existing assessment procedure and for designing one of your own. Quizzes, tests, final exams, and standardized proficiency tests can all be scrutinized through these five lenses.

Maximizing Both Practicality and Washback. In many circumstances, assessment techniques that strive to provide greater washback, and that because of their authenticity usually carry greater content validity, require considerable time and effort on the part of the teacher and the student. But practicality, as seen in earlier sections, may come at the expense of washback and authenticity. Here we have an age-old challenge to teachers and test designers: the dilemma of maximizing both practicality and washback. The relationship can be depicted in a hypothetical graph that shows practicality/reliability on one axis and washback/authenticity on the other (Figure 2.1). Notice the presumed negative correlation: as a technique increases in its washback and authenticity, its practicality and reliability tend to decline. Conversely, the greater the practicality and reliability, the less likely you are to achieve beneficial washback and authenticity. Three types of assessment are illustrated on the regression line.
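
As a rough visualization of that presumed negative correlation, the sketch below plots invented ratings for three hypothetical assessment types; neither the values nor the labels are taken from Figure 2.1.

import matplotlib.pyplot as plt

techniques = ["multiple-choice quiz", "essay exam", "oral portfolio conference"]
practicality_reliability = [9, 5, 2]   # hypothetical 1-10 ratings
washback_authenticity = [2, 6, 9]      # hypothetical 1-10 ratings

plt.plot(practicality_reliability, washback_authenticity, "o--")
for name, x, y in zip(techniques, practicality_reliability, washback_authenticity):
    plt.annotate(name, (x, y))
plt.xlabel("practicality / reliability")
plt.ylabel("washback / authenticity")
plt.title("Presumed negative correlation (hypothetical values)")
plt.show()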
