Language Testing
CHAPTER TWO
CHARACTERISTICS OF A
GOOD TEST
1- Validity
2- Reliability
3- Practicality
Any test that we use must be:
Appropriate in terms of our objectives (validity).
Dependable in the evidence it provides (reliability).
Applicable to our particular situation (practicality).
Without any one of these three qualities, a test would be a poor test.
RELIABILITY
1- The meaning of reliability
2- Types of estimates of reliability.
3- Estimating the reliability of speeded tests
4- The question of satisfactory reliability.
5- The standard error of measurement.
1- The meaning of reliability.
Reliability is the stability of test scores. A test cannot measure anything well
unless it measures consistently.
To have confidence in a measuring instrument, we would need to be assured
that approximately the same results would be obtained in each of these cases:
If we tested a group on Tuesday instead of Monday.
If we gave two parallel forms of the test to the same group on Monday and on
Tuesday.
If we scored a particular test on Tuesday instead of Monday.
If two or more competent scorers scored the test independently.
Two types of consistency or reliability
3- Split-half. Reliability is estimated from a single administration of one
form of the test by dividing the items into two halves and obtaining two
scores for each individual.
4- Rational equivalence. Reliability here is estimated from a single
administration of one form. We are concerned with inter-item consistency as
determined by the proportion of persons who pass and who don’t pass each
item.
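The two single-administration estimates can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it assumes items scored 1 (pass) or 0 (fail), an odd/even split of the items, the Spearman-Brown correction for the split-half estimate, and the Kuder-Richardson formula 20 (KR-20) as the rational-equivalence estimate. The response matrix and the function names are hypothetical.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(responses):
    """Odd/even split-half estimate, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

def kr20(responses):
    """KR-20, using the population variance of total scores
    (an assumption; some texts use the sample variance)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # p = proportion passing an item, (1 - p) = proportion not passing it
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical data: 5 examinees (rows) x 4 items (columns)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))  # 0.88
print(round(kr20(responses), 2))                    # 0.8
```

The two estimates differ because the split-half figure depends on which items fall in each half, whereas KR-20 uses the pass/fail proportions of every item.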
Speeded Tests.
In a speeded test, the items are easy but the time limit is short. Neither the
split-half nor the rational equivalence method should be used with speeded tests.
Test-retest and parallel forms are the methods best adapted to estimating the
reliability of speeded tests.
Satisfactory Reliability
A reliability coefficient of 1.00 indicates a perfectly reliable test.
A standard test used to make individual diagnoses should have a reliability of
at least 0.90.
Homemade tests would run somewhat lower, in the 0.70s or 0.80s.
Reliability can be increased by lengthening the test, provided that the
additional material is similar in quality and difficulty to the original.
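The effect of lengthening a test can be estimated with the Spearman-Brown formula, which the notes do not name but which is the usual way to quantify this point. The sketch below assumes the added material is equivalent to the original, as required above; the function names and the starting reliability of 0.70 are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test is made `length_factor` times
    as long with material equivalent to the original."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def required_length_factor(reliability, target):
    """How many times longer the test must be to reach `target` reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))

# A homemade test with reliability 0.70 (hypothetical figure)
print(round(spearman_brown(0.70, 2), 2))             # 0.82 when doubled
print(round(required_length_factor(0.70, 0.90), 2))  # 3.86 times as long for 0.90
```

Under these assumptions, doubling the test raises the estimate from 0.70 to about 0.82, while reaching 0.90 would take a test nearly four times as long.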
The Standard Error of Measurement
An obtained score on any test consists of the “True” score plus a certain
amount of test error.
A student may score 60 on an English entrance test and 55 when retested with
an equivalent form of the test. A five-point decrease is probably not
statistically significant.
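The size of the test error can be made concrete with the usual formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability). The notes give neither a standard deviation nor a reliability for the entrance test, so the values below (SD of 10, reliability of 0.90) are purely hypothetical, and the sketch assumes both forms have the same SEM.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def standard_error_of_difference(sem):
    """Standard error of the difference between two scores that
    share the same SEM (assumption: equivalent forms)."""
    return math.sqrt(2) * sem

# Hypothetical values; not taken from the notes
sd, reliability = 10, 0.90
sem = standard_error_of_measurement(sd, reliability)
se_diff = standard_error_of_difference(sem)

print(round(sem, 2))          # 3.16
print(round(se_diff, 2))      # 4.47
print(round(5 / se_diff, 2))  # 1.12 standard errors
```

With these assumed values, the drop from 60 to 55 is only about 1.1 standard errors of the difference, which is consistent with treating it as chance variation.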
In short, reliability refers simply to the precision with which the test measures.
No matter how high the reliability coefficient, it is by no means a guarantee
that the test measures what the test user wants to measure.
VALIDITY
What precisely does the test measure?
How well does the test measure?
A test must be based on a sound analysis of the skill or skills we wish to
measure.
There must be sufficient evidence that test scores correlate fairly highly with
actual ability in the skill area being tested. Then we can assume that the test
is valid for our purposes.
Types of Validity:
1- Content Validity
2- Empirical Validity
3- Face Validity
Content Validity:
If a test is designed to measure mastery of a specific skill or the content of a
particular course of study, we should expect the test to be based upon a
careful analysis of the skill or an outline of the course.
In choosing a test, we cannot simply accept the title which the authors have
given it, for titles very often are misleading.
We should expect the test makers to be able to provide us with information
about the specific materials or skills being tested, and the basis for their
selection.
Empirical validity:
The best way to check on the actual effectiveness of a test is to determine how
test scores are related to some independent, outside criterion, such as marks
given at the end of a course or teacher ratings.
If there is a high correlation between test scores and a trustworthy external
criterion, we are justified in putting our confidence in the empirical validity of
the test.
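The correlation between test scores and an external criterion can be computed as a Pearson coefficient. The following is a minimal sketch: the scores, the end-of-course marks, and the function name are invented for illustration, and a real validation study would need a far larger group.

```python
from statistics import mean

def validity_coefficient(test_scores, criterion):
    """Pearson correlation between test scores and a criterion measure."""
    mx, my = mean(test_scores), mean(criterion)
    cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion))
    sx = sum((x - mx) ** 2 for x in test_scores) ** 0.5
    sy = sum((y - my) ** 2 for y in criterion) ** 0.5
    return cov / (sx * sy)

# Hypothetical data for five students
test_scores = [1, 2, 3, 4, 5]   # scores on the test being validated
course_marks = [2, 1, 4, 3, 5]  # independent criterion, e.g. end-of-course marks

print(round(validity_coefficient(test_scores, course_marks), 2))  # 0.8
```

The same calculation serves for either kind of empirical validity; what changes is when the criterion measure is collected.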
Two kinds of empirical validity:
Predictive
Concurrent.
If we use a test of English as a second language to screen university
applicants and then correlate test scores with grades made at the end of
the first semester, we are attempting to determine the predictive validity
of the test.
If we follow up the test immediately by having an English teacher rate
each student’s English proficiency on the basis of his class performance
during the first week and correlate the two measures, we are seeking to
establish the concurrent validity of the test.
Empirical Validity depends on the reliability of the test and the criterion
measure.
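One way to see this dependence is the classical test theory result that an observed validity coefficient cannot exceed the square root of the product of the two reliabilities. The sketch below assumes that model holds; the reliability figures and the function name are hypothetical.

```python
import math

def max_validity(test_reliability, criterion_reliability):
    """Upper bound on the observed test-criterion correlation
    under classical test theory (assumption)."""
    return math.sqrt(test_reliability * criterion_reliability)

# Hypothetical reliabilities: test 0.81, teacher ratings 0.64
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```

Under this assumption, an unreliable criterion such as inconsistent ratings caps the validity coefficient however good the test itself may be.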
Face Validity
By face validity we simply mean the way the test looks to the examinees.
Its importance should not be underestimated.
Content must be relevant and appropriate.
Test makers must always keep face validity in mind.
Practicality
Practicality refers to:
1- Economy.
2- Ease of administration and scoring.
3- Ease of interpretation.
Economy
Cost per copy, and whether the test books are reusable.
Number of scorers and administrators needed.
Time allowed for administration and scoring.
Ease of administration and scoring.
Full test directions provided.
Test requirements (mechanical devices, rooms available, number of
examinees).
Whether the test is scored subjectively or objectively.
Ease of Interpretation
If a standard test is being adopted, examine the date of publication, check
whether there is an up-to-date test manual with information about both
reliability and validity, and check whether the test items are appropriate.