
ADAPTIVE TESTING & COMPUTER BASED ADMINISTRATION

Introduction

Assessment in education has always been a very important issue, and each
school has its own approach.

Schools can find it challenging to embed a whole-school or departmental approach to assessment that is clear, consistent and coherent: What is the purpose of the assessment? What will progression within each subject look like? What range of assessment strategies will be used? How will progress be measured? How will the evidence be shared?

Increasingly, schools are turning towards adaptive tests as part of their whole-school approach. It is very likely that assessment, diagnosis, prognosis, and placement by computer may soon completely replace paper-and-pencil testing.

Adaptive tests can help schools establish a baseline measure of ability in a way that is inclusive and accommodates the full range of student abilities.

Brief History of Adaptive Testing

The idea of adaptive testing dates back to 1905 and Binet's original intelligence scales (forerunners of the Stanford-Binet). These were designed to diagnose cognitive development in young children.
Understanding that students who were unable to answer an easy question were
unlikely to be able to answer a difficult one, Binet tailored the tests he gave by
rank ordering the items in terms of difficulty. He used different stopping rules for
ending the test session based on the pattern of a student’s responses. With
developments in technology, adaptive testing became feasible in large-scale
assessment.

The need for more objectivity in testing gradually led pedagogues to the use of computers as precise measurement tools. The first uses of computers in testing were called computer-aided testing (Larson and Madsen, 1985). Computers were used as word processors equipped with a dictionary and/or a thesaurus, and students were able to consult these reference sources via computer during their writing tests.

Computers were also used for fast computation of grades, assisting testers with their calculations. Computer experts then computerized paper-and-pencil tests, turning them into electronic tests. Such tests, however, showed no real difference from conventional tests, except that they were administered through a non-standard medium.

Since computer assisted testing did not offer a real advantage over traditional paper-and-pencil tests, further research was conducted. One such project, financed by the US Department of Education, developed the first computerized adaptive language test (CALT), first administered in 1986 at Brigham Young University. One major difference between computer assisted testing and computer adaptive testing is that the latter tailors the test to the student's level.

What is a Test?

According to Anastasi (1968), a test is essentially an objective and standardized measure of a sample of behaviour. To Ipaye (1980), a test is a set of tasks or questions intended to elicit particular types of behaviour when presented under standardized conditions, and expected to yield scores that have desirable psychometric properties.

What is Adaptive Test?

According to Larson (1989), the function of an adaptive test is to present items to an examinee according to the correctness of his or her previous responses. If a student answers an item correctly, a more difficult item is presented; conversely, if an item is answered incorrectly, an easier item is given. In short, the test “adapts” to the examinee’s level of ability. The computer’s role is to evaluate the student’s response, select an appropriate succeeding item, and display it on the screen. The computer also notifies the examinee of the end of the test and of his or her level of performance.


What is Test Administration?

Test administration is concerned with the physical and psychological setting in which students or test takers take the test, such that test takers are able to do their best (Airisian, 1994).

What is Computer Adaptive Testing?

Computer Adaptive Testing (CAT) is a form of computer-based test that adapts to the examinee’s ability level. For this reason, it has also been called ‘tailored testing’ or ‘personalized assessment.’ Thus, CAT is a form of computer-administered test in which the next item or set of items selected to be administered depends on the correctness of the test taker’s responses to the most recent items administered.

What is Computer based Test Administration?

Computer-based test administration is the process of administering a test to students or individuals by means of a computer system. It is synonymous with computer-based testing (CBT). Computer adaptive testing is one form of computer-based test (CBT) administration.

How CAT works

CAT successively selects questions for the purpose of maximizing the precision of
the exam based on what is known about the examinee from previous
questions. From the examinee's perspective, the difficulty of the exam seems to
tailor itself to their level of ability. For example, if an examinee performs well on
an item of intermediate difficulty, they will then be presented with a more
difficult question. Or, if they performed poorly, they would be presented with a
simpler question. Compared to static multiple choice tests that nearly everyone has experienced, with a fixed set of items administered to all examinees, computer-adaptive tests require fewer test items to arrive at equally accurate scores. (Of course, there is nothing about the CAT methodology that requires the items to be multiple-choice; but just as most exams are multiple-choice, most CAT exams also use this format.)

The basic computer-adaptive testing method is an iterative algorithm with the following steps:

1. The pool of available items is searched for the optimal item, based on the current estimate of the examinee's ability.
2. The chosen item is presented to the examinee, who then answers it correctly or incorrectly.
3. The ability estimate is updated, based upon all prior answers.
4. Steps 1–3 are repeated until a termination criterion is met.

Nothing is known about the examinee prior to the administration of the first
item, so the algorithm is generally started by selecting an item of medium, or
medium-easy, difficulty as the first item.
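
A minimal sketch of this loop, assuming a simple Rasch (one-parameter logistic) item response model, a small hypothetical item bank of difficulty values, and a crude grid-search ability estimate (an operational CAT would use a properly calibrated pool and maximum-likelihood or Bayesian scoring):

import math
import random

def p_correct(theta, b):
    """Rasch model: probability of a correct response at ability theta for an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses):
    """Crude grid-search maximum-likelihood estimate from (difficulty, correct) pairs."""
    grid = [x / 10.0 for x in range(-40, 41)]          # theta values from -4.0 to 4.0
    def log_likelihood(theta):
        ll = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            ll += math.log(p if correct else 1.0 - p)
        return ll
    return max(grid, key=log_likelihood)

def run_cat(item_bank, true_theta, max_items=10):
    """Administer items adaptively: pick the item whose difficulty is closest to the current
    ability estimate (for the Rasch model this is also the most informative item), simulate a
    response, update the estimate, and repeat."""
    available = list(item_bank)
    responses = []
    theta_hat = 0.0                                    # start with an item of medium difficulty
    for _ in range(max_items):
        item = min(available, key=lambda b: abs(b - theta_hat))
        available.remove(item)
        correct = random.random() < p_correct(true_theta, item)   # simulated examinee
        responses.append((item, correct))
        theta_hat = estimate_theta(responses)
    return theta_hat

bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]    # hypothetical difficulties
print(run_cat(bank, true_theta=1.2))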

As a result of adaptive administration, different examinees receive quite different tests. The psychometric technology that allows equitable scores to be computed across different sets of items is item response theory (IRT). IRT is also the preferred methodology for selecting optimal items, which are typically selected on the basis of information rather than difficulty per se.

In the United States, the Graduate Management Admission Test (GMAT) is currently primarily administered as a computer-adaptive test. A list of active CAT programs is found at the International Association for Computerized Adaptive Testing, along with a list of current CAT research programs and a near-inclusive bibliography of all published CAT research.

A related methodology called multistage testing (MST) or CAST is used in the Uniform Certified Public Accountant Examination.


Advantages of CAT

The long-term benefits of computer adaptive testing over conventional paper-and-pencil tests remain to be determined. What, then, are the immediate advantages of computer adaptive tests? Several have been identified (Dandonoli, 1989; Larson, 1989; Stansfield, 1990; Madsen, 1986, 1991):

i. Tests are individualized. Since CAT tailors and adapts to the examinee’s level, results are interpreted with respect to a specific level of ability, not in relation to the performance of a particular group of individuals. Even if the item bank is determined on the basis of a norm-referenced assessment procedure, the use of the computer adaptive version is criterion-referenced, since students are evaluated in reference to a certain level of ability, not in relation to the group.

ii. Computer adaptive tests are shorter. Since computer adaptive tests adapt to the examinee’s level, questions that are well above or below the examinee’s ability level are not presented. As such, a CAT can be administered in a shorter period of time while still yielding precise information about the student’s level. Madsen (1991) reported that “over 80% of students required fewer than 50% of the reading items normally administered on a paper-and-pencil test” (Madsen 1991, 250).

iii. Computer adaptive tests create a more positive attitude toward tests. Again, since computer adaptive testing is shorter, due to its ability to focus on the examinee’s level, students feel less bored with questions that are too easy and less frustrated with questions that are too difficult, since such questions are not presented. Madsen (1986) indicated that among students taking both a paper-and-pencil test and a computer adaptive test, 81% expressed a more positive attitude toward CAT.


iv. Immediate report of test results: The instantaneous computation of scores allows immediate reporting of test results once the test is over (Dandonoli, 1989). This is an advantage both for students (who are usually eager to know how well they did) and for testers (who do not have to spend time correcting and grading).

v. Self-pacing: Computer adaptive tests are not limited in time, although some programs take into consideration the time spent by examinees per test item as part of the test result. Even so, students are generally not directly pressured by time, and so time-related anxiety does not affect test results. It has been argued, however, that a lack of uniform distribution of time across items can strongly affect the validity of CAT (Henning, 1991).

vi. Measurement precision: The superiority of CAT to conventional tests has been established in terms of high reliability and validity. Since a high reliability coefficient statistically implies a low standard error of measurement (defined as the standard deviation of error scores; see the formula after this list), the measurements produced by computer adaptive testing are correspondingly precise.

vii. Improved test security: Since the ability level of every examinee is
different, and since every examinee is given an individualized test, no
information that would directly help other students can be passed
around.

viii. CAT may be used to detect the early onset of disabilities or diseases: In scientific and research-based fields with large target populations, CAT may be used to detect the early onset of disabilities or diseases. The use of CAT in these fields has grown greatly in the past decade; once not accepted in medical facilities and laboratories, CAT is now encouraged within the scope of diagnostics.
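
The classical test theory relationship behind the claim in point (vi), where sigma_X is the standard deviation of observed scores and r_XX' is the reliability coefficient (a standard formula, not specific to CAT):

% Standard error of measurement in classical test theory
\[
  SEM = \sigma_X \sqrt{1 - r_{XX'}}
\]

As the reliability coefficient approaches 1, the standard error of measurement approaches 0, which is why a highly reliable CAT yields precise ability estimates.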

Disadvantages (Canale 1986; Carton et al. 1991; Lange 1990; Tung 1986)

i. Incapability of CAT to handle open-ended questions: Today’s software cannot handle the systematic scoring of all possible open-ended questions. CAT is limited to such test formats as multiple-choice questions, cloze activities, and jumbled sentences.

ii. A questionable construct validity: Language competence, for example, is the result of more than one construct, e.g., grammatical competence, textual competence, illocutionary competence, and sociolinguistic competence (Bachman, 1991). It has been argued that the unidimensionality of Item Response Theory raises the fundamental issue of construct validity for CAT. How feasible is it to test a multidimensional construct such as language ability with a unidimensional approach such as Item Response Theory (Lange 1990; Tung, 1986)?

iii. Test design takes a tremendous amount of time: In order to develop a computer adaptive test, a large sample of test items first needs to be calibrated; time is then needed to sequence items along a continuum of difficulty; and finally a pilot test is required prior to administering the final version of the computer adaptive test.

Although CAT’s advantages contribute a great deal to improving testing, the limitations mentioned above led Canale (1986) to wonder whether computer adaptive testing is truly a “threat or a promise.”

Other Issues

Pass-Fail


In many situations, the purpose of the test is to classify examinees into two or
more mutually exclusive and exhaustive categories. This includes the common
"mastery test" where the two classifications are "pass" and "fail," but also
includes situations where there are three or more classifications, such as
"Insufficient," "Basic," and "Advanced" levels of knowledge or competency. The
kind of "item-level adaptive" CAT described in this paper is most appropriate for
tests that are not "pass/fail" or for pass/fail tests where providing good feedback
is extremely important. Some modifications are necessary for a pass/fail CAT,
also known as a computerized classification test (CCT). For examinees with true
scores very close to the passing score, computerized classification tests will result
in long tests, while those with true scores far above or below the passing score will have the shortest exams.

For example, a new termination criterion and scoring algorithm must be applied
that classifies the examinee into a category rather than providing a point
estimate of ability. There are two primary methodologies available for this. The
more prominent of the two is the sequential probability ratio test (SPRT). This
formulates the examinee classification problem as a hypothesis test that the
examinee's ability is equal to either some specified point above the cutscore or
another specified point below the cutscore. Note that this is a point hypothesis formulation rather than the composite hypothesis formulation that is conceptually more appropriate: a composite formulation would state that the examinee's ability is in the region above the cutscore or in the region below the cutscore.
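
A minimal sketch of the SPRT decision rule under a Rasch item response model. The two hypothesis points theta_low and theta_high, the error rates alpha and beta, and the item difficulties are hypothetical values chosen for illustration; Wald's standard decision bounds are used:

import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_decision(responses, theta_low, theta_high, alpha=0.05, beta=0.05):
    """responses: list of (item_difficulty, correct) pairs.
    Returns 'pass', 'fail', or 'continue' using Wald's sequential probability ratio test."""
    log_ratio = 0.0
    for b, correct in responses:
        p_hi = p_correct(theta_high, b)
        p_lo = p_correct(theta_low, b)
        if correct:
            log_ratio += math.log(p_hi / p_lo)
        else:
            log_ratio += math.log((1.0 - p_hi) / (1.0 - p_lo))
    upper = math.log((1.0 - beta) / alpha)    # exceed this bound -> classify above the cutscore
    lower = math.log(beta / (1.0 - alpha))    # fall below this bound -> classify below the cutscore
    if log_ratio >= upper:
        return "pass"
    if log_ratio <= lower:
        return "fail"
    return "continue"

# Hypothetical: cutscore at theta = 0, hypothesis points at -0.5 and +0.5
print(sprt_decision([(0.0, True), (0.3, True), (0.6, True), (0.2, True)], -0.5, 0.5))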

A confidence interval approach is also used: after each item is administered, the algorithm determines the probability that the examinee's true score is above or below the passing score. For example, the algorithm may continue until the 95% confidence interval for the true score no longer contains the passing score. At that point, no further items are needed because the pass-fail decision is already 95% accurate, assuming that the psychometric models underlying the adaptive testing fit the examinee and test. This approach was originally called "adaptive mastery testing," but it can be applied to non-adaptive item selection and to classification situations with two or more cutscores (the typical mastery test has a single cutscore).
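
A sketch of the confidence-interval stopping rule, assuming the ability estimate and its standard error have already been produced by the scoring step; the function name and the 1.96 multiplier (an approximate 95% interval) are illustrative assumptions:

def classification_decision(theta_hat, sem, cutscore, z=1.96):
    """Stop and classify once the approximate 95% confidence interval
    around the ability estimate no longer contains the cutscore."""
    lower, upper = theta_hat - z * sem, theta_hat + z * sem
    if lower > cutscore:
        return "pass"
    if upper < cutscore:
        return "fail"
    return "continue"   # interval still contains the cutscore: administer another item

print(classification_decision(theta_hat=0.8, sem=0.3, cutscore=0.0))   # hypothetical values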

As a practical matter, the algorithm is generally programmed to have a minimum and a maximum test length (or a minimum and maximum administration time). Otherwise, it would be possible for an examinee with ability very close to the cutscore to be administered every item in the bank without the algorithm making a decision.

The item selection algorithm utilized depends on the termination criterion. Maximizing information at the cutscore is more appropriate for the SPRT because it maximizes the difference in the probabilities used in the likelihood ratio. Maximizing information at the ability estimate is more appropriate for the confidence interval approach because it minimizes the conditional standard error of measurement, which decreases the width of the confidence interval needed to make a classification.

Practical constraints of adaptivity

ETS researcher Martha Stocking has quipped that most adaptive tests are actually barely adaptive tests (BATs) because, in practice, many constraints are imposed upon item choice. For example, CAT exams must usually meet content specifications; a verbal exam may need to be composed of equal numbers of analogies, fill-in-the-blank and synonym item types. CATs typically have some form of item exposure constraints, to prevent the most informative items from being over-exposed. Also, on some tests, an attempt is made to balance surface characteristics of the items, such as the gender of the people in the items or the ethnicities implied by their names. Thus CAT exams are frequently constrained in which items they may choose, and for some exams the constraints may be substantial and require complex search strategies (e.g. linear programming) to find suitable items.

A simple method for controlling item exposure is the "randomesque" or strata method. Rather than selecting the most informative item at each point in the test, the algorithm randomly selects the next item from the next five or ten most informative items. This can be used throughout the test, or only at the beginning. Another method is the Sympson-Hetter method, in which a random number is drawn from U(0,1) and compared to a k_i parameter determined for each item by the test user. If the random number is greater than k_i, the next most informative item is considered.
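
A sketch of both exposure-control ideas, assuming a hypothetical item bank in which each item carries an information value at the current ability estimate and, for Sympson-Hetter, an exposure-control parameter k:

import random

def randomesque_select(items, info, n=5):
    """Randomesque/strata method: pick at random from the n most informative items
    rather than always taking the single most informative one."""
    top = sorted(items, key=info, reverse=True)[:n]
    return random.choice(top)

def sympson_hetter_select(items, info, k):
    """Sympson-Hetter method: walk down the items in order of information; an item is
    administered only if a uniform random draw does not exceed its exposure parameter k[item]."""
    for item in sorted(items, key=info, reverse=True):
        if random.random() <= k[item]:
            return item
    return None   # every candidate was suppressed this time (rare with sensible k values)

# Hypothetical bank: item id -> information at the current ability estimate
information = {"A": 2.1, "B": 1.8, "C": 1.5, "D": 0.9, "E": 0.4}
exposure_k = {"A": 0.3, "B": 0.6, "C": 0.9, "D": 1.0, "E": 1.0}
print(randomesque_select(list(information), information.get, n=3))
print(sympson_hetter_select(list(information), information.get, exposure_k))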

Wim van der Linden and colleagues have advanced an alternative approach called shadow testing, which involves creating entire shadow tests as part of selecting items. Selecting items from shadow tests helps adaptive tests meet selection criteria by focusing on globally optimal choices (as opposed to choices that are optimal for a given item).

Multidimensional

Given a set of items, a multidimensional computer adaptive test (MCAT) selects those items from the bank according to the estimated abilities of the student, resulting in an individualized test. MCATs seek to maximize the test's accuracy based on multiple simultaneously examined abilities (unlike a CAT, which evaluates a single ability), using the sequence of items previously answered (Piton-Gonçalves and Aluisio, 2012).

COMPONENTS OF CAT

There are five (5) technical components in building a CAT (the following is
adapted from Weiss & Kingsbury, 1984). It should be noted that this list does not
include practical issues, such as item pretesting or live field release.

1. Calibrated item pool
2. Starting point or entry point
3. Item selection algorithm
4. Scoring procedure
5. Termination criterion

Calibrated item pool


A pool of items must be available for the CAT to choose from. Such items can be
created in the traditional way (i.e., manually) or through Automatic Item
Generation. The pool must be calibrated with a psychometric model, which is
used as a basis for the remaining four components. Typically, item response
theory is employed as the psychometric model. One reason item response theory
is popular is because it places persons and items on the same metric (denoted by
the Greek letter theta), which is helpful for issues in item selection.
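
For illustration, a calibrated pool can be thought of as a set of items whose IRT parameters sit on the same theta metric as examinee ability. The sketch below assumes a three-parameter logistic (3PL) model with discrimination a, difficulty b, and pseudoguessing parameter c; the parameter values are hypothetical:

import math

def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function: probability that an
    examinee at ability theta answers the item correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical calibrated pool: each item carries (a, b, c) on the same theta metric as examinees
item_pool = [
    {"id": 1, "a": 1.2, "b": -0.5, "c": 0.20},
    {"id": 2, "a": 0.8, "b": 0.0, "c": 0.25},
    {"id": 3, "a": 1.5, "b": 1.0, "c": 0.20},
]
for item in item_pool:
    print(item["id"], irf_3pl(0.0, item["a"], item["b"], item["c"]))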

Starting Point

In CAT, items are selected based on the examinee's performance up to a given point in the test. However, the CAT is obviously not able to make any specific estimate of examinee ability when no items have been administered. So some other initial estimate of examinee ability is necessary. If some previous information regarding the examinee is known, it can be used, but often the CAT simply assumes that the examinee is of average ability - hence the first item often being of medium difficulty.

Item selection algorithm

As mentioned previously, item response theory places examinees and items on the same metric. Therefore, if the CAT has an estimate of the examinee's ability, it is able to select the item that is most appropriate for that estimate. Technically, this is done by selecting the item with the greatest information at that point. Information is a function of the discrimination parameter of the item, as well as the conditional variance and pseudoguessing parameter (if used).
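
A sketch of maximum-information item selection using the standard 3PL item information function; the pool, parameter values, and current ability estimate are hypothetical, and the response function is repeated so the snippet stands alone:

import math

def irf_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Standard 3PL item information function."""
    p = irf_3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * ((p - c) / (1.0 - c)) ** 2 * (q / p)

def select_item(pool, theta_hat):
    """Pick the (unadministered) item with the greatest information at the current estimate."""
    return max(pool, key=lambda it: info_3pl(theta_hat, it["a"], it["b"], it["c"]))

pool = [{"id": 1, "a": 1.2, "b": -0.5, "c": 0.2},
        {"id": 2, "a": 1.5, "b": 0.4, "c": 0.2},
        {"id": 3, "a": 0.9, "b": 1.2, "c": 0.2}]
print(select_item(pool, theta_hat=0.3)["id"])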

Scoring Procedure

After an item is administered, the CAT updates its estimate of the examinee's ability level. If the examinee answered the item correctly, the CAT will likely estimate their ability to be somewhat higher, and vice versa. This is done by using the item response function from item response theory to obtain a likelihood function of the examinee's ability. Two methods for this are called maximum likelihood estimation and Bayesian estimation.
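
A sketch of a Bayesian update (expected a posteriori, EAP) over a grid of ability values, assuming Rasch items and a standard normal prior; the grid range, prior, and response values are illustrative assumptions:

import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses):
    """responses: list of (item_difficulty, correct). Returns the posterior mean of theta
    over a grid, using a standard normal prior; maximum likelihood estimation would instead
    take the grid point with the highest likelihood and drop the prior weight."""
    grid = [x / 10.0 for x in range(-40, 41)]
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta ** 2)            # unnormalised N(0, 1) density
        likelihood = 1.0
        for b, correct in responses:
            p = p_correct(theta, b)
            likelihood *= p if correct else (1.0 - p)
        weights.append(prior * likelihood)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

print(eap_estimate([(0.0, True), (0.5, True), (1.0, False)]))   # hypothetical responses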

Termination criterion

The CAT algorithm is designed to repeatedly administer items and update the estimate of examinee ability. This will continue until the item pool is exhausted unless a termination criterion is incorporated into the CAT. Often, the test is terminated when the examinee's standard error of measurement falls below a certain user-specified value; this is why scores from a CAT can be uniformly precise or "equiprecise." Other termination criteria exist for different purposes of the test, for example if the test is designed only to determine whether the examinee should "pass" or "fail," rather than to obtain a precise estimate of ability.
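
A sketch of the standard-error stopping rule, assuming Rasch items; under IRT the standard error of the ability estimate is approximately one over the square root of the test information, and the threshold, minimum, and maximum lengths below are hypothetical choices:

import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def should_stop(theta_hat, administered_difficulties, se_threshold=0.30,
                min_items=5, max_items=30):
    """Stop when the standard error of measurement at the current estimate falls below a
    user-specified threshold, subject to minimum and maximum test lengths."""
    n = len(administered_difficulties)
    # For the Rasch model, test information is the sum of P(1 - P) over administered items
    info = sum(p_correct(theta_hat, b) * (1.0 - p_correct(theta_hat, b))
               for b in administered_difficulties)
    sem = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    if n < min_items:
        return False
    return sem <= se_threshold or n >= max_items

print(should_stop(0.4, [-0.5, 0.0, 0.2, 0.5, 0.8, 1.0]))   # hypothetical administered items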

Validity and Reliability of CAT as a measuring tool

As mentioned earlier in this paper, one advantage of computer adaptive testing is its high reliability coefficient, due to the fact that measurement errors are reduced to a minimum. The reliability coefficient reflects score consistency under test-retest conditions (Madsen, 1991).

Whereas reliability essentially deals with measurement errors and test scores, validity goes beyond measurement issues and deals with test interpretation and use. Reliability is, of course, a prerequisite for validity, but not sufficient in itself to make a test valid. In examining validity, we raise questions such as: “Does the test measure the ability and content which are supposed to be measured?”, “Is the test biased?”, “What is the impact of a specific test design?”

… While reliability is a quality of scores themselves, validity is a quality of test interpretation and use…


… Judging the extent to which an interpretation or use of a given test score is valid thus requires the collection of evidence supporting the relationship between the test score and an interpretation or use (Bachman 1991, 25; 243).

The issue of validity related to computer adaptive language testing has been
addressed through its many facets:

 Content validity
 Concurrent validity
 Predictive validity
 Construct validity
 Face validity
 Test bias

Content Validity

A test is said to have content validity if its content is a representative sample of what it is intended to measure (Bachman 1991). Research shows that CAT content validity can be achieved (Kaya-Carton, et al., 1991).

Considering the possible lack of “inter-consultant” reliability in some assessment situations, another approach to probing validity is to pilot test with a large sample of subjects to obtain a range of difficulty indices for calibration purposes: “The calibration of the items resulting from applying IRT [Item Response Theory] provides evidence of items content validity” (Kaya-Carton, et al. 1991, 276).

Concurrent Validity

In examining concurrent validity, differences in test performance among groups are analyzed, or correlations among various measures of a given ability are established (Bachman 1991). Kaya-Carton, et al. (1991, 277) reported the concurrent validity of their computer adaptive test of French reading proficiency in these terms:


The computer adaptive test had been administered to individuals whose French reading proficiency had been diagnosed independently, either by the oral interview method or by expert judgment… the test results agree with the independent judgments, and suggest concurrent validity for the test.

Kaya-Carton, et al. suggest that concurrent validity is more systematically examined by means of correlations across various measures; they underline the fact that concurrent validity established across groups may, indeed, be questionable, since groups may vary in many ways.

Another aspect that may jeopardize concurrent validity for CAT is the fact that
test items may actually function differently, depending on whether a test is
administered in a paper-and-pencil mode or via computer: “A potential threat to
test validity centers around the possibility that test items may actually function
differently depending on the mode of presentation” (Henning 1991, 214).

Predictive Validity

When examining predictive validity, we look at the degree to which a test can
predict students’ future performance (Bachman 1991). In Larson’s report (1989)
of a Spanish computerized adaptive placement exam, the predictive validity was
established by inquiring – once courses were underway – about the proportion of
students who had been placed appropriately:

Only three of the [179] teachers interviewed reported that their students
had been placed too high. The majority of those who indicated their
student(s) had been poorly placed said the placement should have been
one course higher, meaning that, for the most part, the errors in placement
seemed to be conservative (Larson 1989, 284).

Overall, in this study, 79.9 percent of the teachers indicated that their students had been appropriately placed, which represents a high level of predictive validity.

Construct Validity


When examining construct validity, we look at how well a test measures the
ability which is supposed to be measured (Bachman 1991). In their report of a
computer adaptive reading proficiency test, Kaya-Carton, et al. indicated that the
first step for establishing construct validity was to provide an operational
definition of what reading proficiency is, in order to isolate the elements to be
measured. However, as mentioned earlier in this paper, since language
proficiency is the result of more than one construct, the unidimensionality of CAT
raises the issue of compatibility between the potential of computer adaptive
testing and the assessment of language performance (Lange 1990; Madsen 1991;
Canale 1986; Tung 1986). Canale suggests that the unidimensionality of Item
Response Theory seriously limits the construct validity of CAT.

Face Validity

Face validity is the public acceptability of a test as valid, a belief which is not
grounded on its theoretical value but rather on its surface credibility (Bachman
1991). Keeping this definition in mind, CAT is actually developing strong face
validity from the very fact that computers are known to be objective and accurate
graders.

The face validity of CAT, however, has been put into question for various reasons
(Henning 1991):

 Pacing differences: Most CATs so far are not uniformly paced, and
examinees take various amounts of time to answer test questions. For this
reason, the face validity of CAT becomes questionable: the lack of
homogeneous pacing is sometimes viewed as unfairness instead of
individualization.

 Test length: Since a test stops when an examinee’s ability level has been accurately estimated, tests often vary in length across examinees. Again, this may raise a sense of unfairness.


If one person’s score is based on 25 items while another person’s score on the same test is based on 35 items, even though the ability estimates derived can be shown to have identical accuracy attached to them, there is a potential threat to face validity. Any given examinee could object that the same number of opportunities (items) was not afforded to every examinee (Henning 1991, 218).

In order to mitigate such objections, students, teachers, and the public in general
should be clearly informed that CAT is criterion-referenced, not norm-referenced.
The difference is crucial. Norm-referenced tests are interpreted in relation to the
results of a group which constitute the reference point as well as the norm;
criterion-referenced tests, however, are interpreted with respect to a specific level
of ability, an approach which is part of computer adaptive language testing based
on the recognition that every foreign language student is potentially different.

Test Bias Related to CAT in Particular

Test bias can be detected by investigating errors not directly related to the ability being tested (Bachman 1991). In computer adaptive testing, test bias has been reported to stem essentially from computer anxiety: CAT can be biased in favour of examinees who are already familiar with computers. Henning (1991) points out, however, that the extent to which computer anxiety differs from test anxiety is not clear.

Implications for future practices in CAT

Is computer adaptive testing a real improvement? Does it fit today’s education philosophy? Does it allow assessment in relatively authentic situations?

Since computer adaptive testing tailors tests to the student’s level, it certainly allows teachers to make a better assessment of the student’s personal ability.


Computers allow individualized reports, not only in terms of scores but also in terms of strengths and weaknesses. What did the student accomplish? What was the student expected to accomplish? To answer these questions, CAT provides both a summative and a formative evaluation: it not only assesses the students’ progress in terms of grades, but also evaluates how effective both the students’ learning and the instructor’s teaching have been. If weaknesses are similar across examinees, instructional procedures may have to be changed. If weaknesses are specific to some students, those students’ learning approaches may need to be looked at. In either case, computers are certainly excellent tools for locating this type of information in a short period of time.

Since most conventional tests are nonreciprocal in nature, CAT certainly improves testing conditions through its interactional potential.

Computer adaptive testing is still, as it were, in its infancy and will hopefully follow the same route as computer assisted instruction (CAI): just as CAI has moved from electronic workbooks to the era of live-action simulations, CAT will probably move from computer adaptive testing toward “computer adaptive task-based assessment.”

CONCLUSION

Computer adaptive testing is a relatively new computer-based testing approach which offers substantial advantages, such as the opportunity for examinees to pace themselves; the ability to individualize tests and make them shorter; to promote a more positive attitude toward tests; to report test results immediately (i.e. in real time); to measure with precision; and to improve test security.

CAT also offers the advantage of tailoring tests to the student’s level of ability. The item sequencing of CAT is based on a continuum of difficulty; items are calibrated according to their difficulty index (also called the item easiness index), and CAT tests are administered according to pre-established indices of increment and decrement. However, CAT is fraught with some challenges, such as the handling of open-ended questions and construct validity.


REFERENCES

Computerized Adaptive Testing. Retrieved on 15th July, 2021 from https://en.wikipedia.org/wiki/Conputerized_adaptive_testing

Henning, G. (1991). Validating an Item Bank in a Computer Assisted or Computer Adaptive Test: Using Item Response Theory for the Process of Validating CATs. In P. Dunkel (Ed.), Computer Assisted Language Learning and Testing: Research Issues and Practice. New York, NY: Newbury House.

Madsen, H. S. (1986). Evaluating a Computer Adaptive ESL Placement Test. CALICO Journal, 4, 21-50.

Osa-Edo, G. (2016). Effective Study Habits for Excellent Performance, Revised Edition.

Thissen, D., & Mislevy, R. J. (2000). Testing Algorithms. In H. Wainer (Ed.), Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates.

Tung, P. (1986). Computerized Adaptive Testing: Implications for Language Test Developers. In C. W. Stansfield (Ed.), Technology and Language Testing. Washington, DC: TESOL.

