Testing and Evaluation By: Abla Ben Bellal
2.1.2. Summative Assessment: used at the end of the term, semester, or year
in order to measure what has been achieved both by groups and by
individuals.
3. DEFINITION OF EVALUATION:
It is the process of making an overall judgment about one’s work or a whole
school’s work.
Evaluation is typically a broader concept than assessment, as it focuses on the
overall, or summative, experience.
5. TYPES OF TESTS:
6.1. PRACTICALITY
Validity and reliability alone are not enough to build a test; the test should also be
practical in terms of time, cost, and energy. With regard to time and energy, tests should
be efficient to construct, administer, and evaluate; they must also be affordable. A valid
and reliable test is of little use if it cannot be administered in remote areas because it
requires an expensive computer (Heaton, 1975: 158-159; Weir, 1990: 34-35;
Brown, 2004: 19-20).
6.2. RELIABILITY
A reliable test is consistent and dependable. A number of sources of unreliability
may be identified:
a. Student-related Reliability
A test may yield unreliable results because of temporary factors affecting the test-taker,
such as illness, fatigue, a “bad day”, or lack of sleep the night before.
6.3. VALIDITY
A. Content-related Validity
A test is said to have content validity when it actually samples the subject matter about
which conclusions are to be drawn, and requires the test-taker to perform the behavior
being measured. For example, speaking ability is tested through speaking performance, not
through a paper-and-pencil test. Content validity can be identified when we can clearly
define the achievement being measured.
It can be achieved through direct performance testing. For example, to test pronunciation,
the teacher should require the students to pronounce the target words orally.
Two questions are used to apply content validity in classroom tests:
1. Are classroom objectives identified and appropriately framed? The objectives should
include a performance verb and a specific linguistic target.
2. Are lesson objectives represented in the form of test specifications? A test should
have a structure that follows logically from the lesson or unit being tested. It can be
designed by dividing the objectives into sections, offering students a variety of item
types, and giving appropriate weight to each section.
B. Criterion-related Validity
The extent to which the “criterion” of the test has actually been reached. It can be best
demonstrated through a comparison of result of an assessment with result of some other
measure of the same criterion.
Criterion-related validity usually falls into two categories:
1. Concurrent Validity: the test results are supported by other concurrent performance
beyond the assessment itself (e.g., a high score on an English final exam supported by
actual proficiency in English).
2. Predictive Validity: the test predicts a test-taker’s likelihood of future success
(e.g., a placement test predicting performance in the course that follows).
C. Construct-related Validity
Construct validity asks: “Does the test actually tap into the theoretical construct
as it has been defined?” An informal construct validation of virtually every
classroom test is both essential and feasible. For example, the scoring analysis of an
interview may include pronunciation, fluency, grammatical accuracy, vocabulary use, and
sociolinguistic appropriateness; together these form the theoretical construct of oral
proficiency. Construct validity is a major issue in validating large-scale standardized
tests of proficiency.
D. Consequential Validity
Consequential validity includes all the consequences of a test: its accuracy in measuring
the intended criteria, its impact on test-takers’ preparation, its effect on the learner,
and the social consequences of the test’s interpretation and use. One aspect of
consequential validity that draws special attention is the effect of test-preparation
courses and manuals on performance.
E. Face Validity
Face validity is the extent to which students view the assessment as fair, relevant, and
useful for improving learning; in other words, students perceive the test to be valid. A
test will be perceived as valid if it samples the actual content of what the learners have
achieved or expect to achieve. Moreover, the psychological state of the test-taker
(confidence, anxiety) is an important factor in reaching peak performance.
A test with high face validity has the following characteristics:
• A well-constructed, expected format with familiar tasks.
• Clearly doable within the allotted time.
• Clear and uncomplicated test items.
• Crystal-clear directions.
• Tasks that relate to the students’ course work.
• A difficulty level that presents a reasonable challenge.
6.5. WASHBACK
In general terms, washback means the effect of testing on teaching and learning. In
large-scale assessment, it refers to the effects that tests have on instruction in terms
of how students prepare for the test. In classroom assessment, washback means the
beneficial information that washes back to the students in the form of useful
diagnoses of strengths and weaknesses.
To enhance washback, teachers should comment generously and specifically
on test performance, respond to as many details as possible, praise strengths, criticize
weaknesses constructively, and give strategic hints to improve performance.
Teachers should use classroom tests as learning devices through which
washback is achieved. Students’ incorrect responses can become windows of insight
into further work, and their correct responses should be praised, especially when they
represent accomplishments in a student’s interlanguage.
Washback enhances a number of basic principles of language acquisition: intrinsic
motivation, autonomy, self-confidence, language ego, interlanguage, and strategic
investment, among others.
Washback also implies that students have ready access to the teacher to
discuss the feedback and evaluation they have been given.
7. WAYS OF TESTING:
These approaches are not focused directly on the language items themselves, but on how
the scores the students obtain are interpreted.
Norm-Referenced Tests (proficiency & placement tests):
Norm-referenced tests are standardized tests designed to compare and rank test-takers in
relation to one another. This type of test reports whether test-takers performed better or
worse than a hypothetical average student. They are designed to measure global language
abilities, such as overall English language proficiency or academic listening ability, and
each student’s score is interpreted relative to the scores of all other students who took
the test. Students know the general format of the questions, but not the language points
or content to be tested by those questions.

Criterion-Referenced Tests (achievement & diagnostic tests):
Criterion-referenced tests are designed to measure students’ performance against a fixed
set of criteria or learning standards, that is, written descriptions of what students are
expected to know and be able to do at a specific stage of their education. CRTs provide
information on whether students have attained a predetermined level of performance called
“mastery”. Students can predict both the question formats on the test and the language
points to be tested. Teaching to such a test should help teachers and students stay on
track; besides, the results should provide useful feedback on the effectiveness of
teaching and learning processes.
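The contrast between the two score interpretations can be sketched in a few lines of Python. The names, the scores, and the 70-point mastery cut-off below are purely hypothetical, chosen only to illustrate the distinction:

```python
# Hypothetical scores (not from the source), interpreted in two ways.
scores = {"Amel": 72, "Karim": 58, "Lina": 90, "Youcef": 65}

# Norm-referenced interpretation: each student is ranked relative to the
# other test-takers; the score only has meaning in comparison with theirs.
ranking = sorted(scores, key=scores.get, reverse=True)
print("Ranking:", ranking)

# Criterion-referenced interpretation: each student is compared with a fixed
# mastery threshold, regardless of how the other students performed.
MASTERY = 70  # hypothetical cut-off for "mastery"
mastery_report = {name: score >= MASTERY for name, score in scores.items()}
print("Mastery:", mastery_report)
```

Note that the same raw score of 72 places Amel second in the ranking but simply “at mastery” under the criterion-referenced reading; the two interpretations answer different questions about the same performance.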
*It involves the knowledge of grammar and how it can be applied in written and oral language; the
knowledge of when to speak and what to say in an appropriate situation; and knowledge of verbal and
non-verbal communication. All these types of knowledge should be successfully used in a given situation.
*Without a context, a communicative language test would not function. The context should be as close
to real life as possible, in order to help the student feel him/herself in a natural environment.
*The student has to possess some communicative skills, that is, how to behave in a certain situation, how
to apply body language, etc.
*Communicative language testing involves the learner’s ability to operate with the language s/he knows
and apply it in a certain situation s/he is placed in. S/he should be capable of behaving in a real-life
situation with confidence and be ready to supply the information required by that situation.
Therefore, we can speak of communicative language testing as testing the student’s ability to
behave as he or she would in everyday life. We evaluate their performance.
Spolsky (1975) identifies three stages in the recent history of language testing: 1) the
pre-scientific, 2) the psychometric-structuralist, and 3) the psycholinguistic-
sociolinguistic.
Discrete-point tests were the result of the coalescence of two fields, structural
linguistics and psychometrics.
The psychometric-structuralist movement was important because, for the first time,
language test development followed scientific principles. In addition, Brown (1996)
maintains that psychometric-structuralist tests could be easily handled by trained
linguists and language testers. As a result, statistical analyses were used for the first
time. Interestingly, psychometric-structuralist tests are still very much in evidence
around the world, but they have been supplemented by what Carroll (1972)
called integrative tests.
With the attention of linguists inclined toward generativism and psychologists toward
cognition, language teachers adopted the cognitive-code learning approach for
teaching a second and/or foreign language. Language professionals began to believe
that language is more than the sum of the discrete elements being tested during the
psychometric-structuralist movement (Brown, 1996; Heaton, 1991; Oller, 1979).
The criticism came largely from Oller (1979), who argued that competence is a unified
set of interacting abilities that cannot be separated and tested adequately in isolation.
This movement certainly has its roots in the argument that language is creative.
Beginning with the work of sociolinguists like Hymes (1967), it was felt that the
development of communicative competence depended on more than simple
grammatical control of the language; communicative competence also hinges on the
knowledge of the language appropriate for different situations.
Tests typical of this movement were the cloze test and dictation, both of which assess
the students’ ability to manipulate language within a context of extended text rather
than in a collection of discrete-point questions. The possibility of testing language in
context led to further arguments that linguistic and extralinguistic elements of
language are interrelated and relevant to human experience and operate in
orchestration.
Consequently, these broader views of language, language use, language teaching, and
language acquisition widened the scope of language testing, bringing about a challenge
that Canale (1984) articulated as the shift in emphasis from language form to language
use. This shift of focus placed new demands on language teaching as well as on language
testing.