Test Construction and Administration
Test Construction
(II) Introduction: Testing is the most widely used of all assessment methods and has been
at the centre of discussion and debate among educators for years. We may use the
evidence from tests to make statements about student competence or to make decisions about
the next aspect of teaching for particular students. Testing consists of four primary steps: test
construction, test administration, scoring and analysing the test. Each of these steps can result
in various test forms and elicit a variety of valuable outcomes.
Types of Test
Broad Categories
Formative – Short-term assessment such as classroom assessment techniques (CAT)
Summative – Long-term assessment such as comprehensive final exams
• Proficiency tests check learner levels against general standards. They provide a broad
picture of knowledge and ability.
Short-term progress tests check how well students have understood or learned the material
covered in specific units or chapters. They enable the teacher to decide if remedial or
consolidation work is required.
Long-term progress tests check the learners’ progress over the course. They enable the
students to judge how well they have progressed. Administratively, they are often the sole
basis of decisions to promote to a higher level.
b) Validity
Validity indicates how well an assessment measures what it is supposed, or claims, to measure.
Approaches to Validity
i. Face Validity: the extent to which a test is subjectively viewed as covering the concept
it purports to measure. It refers to the transparency or relevance of a test as it appears to
test participants.
ii. Content Validity: assesses whether a test is representative of all aspects
of the construct.
iii. Construct Validity: evaluates whether a measurement tool represents the thing we are
interested in measuring. It is central to establishing the overall validity of a method.
iv. Criterion Validity: evaluates how closely your test results correspond to the results of a
different test.
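One common way to express how closely two sets of results correspond is a correlation coefficient. The sketch below is only an illustration: the function name `pearson_r` and the sample scores are hypothetical, and it assumes each student has one score on the new test and one on the comparison test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / (spread_x * spread_y)


# Hypothetical scores for five students on the new test and a comparison test
new_test = [55, 62, 70, 81, 90]
comparison_test = [50, 65, 68, 85, 88]
print(round(pearson_r(new_test, comparison_test), 2))
```

A value close to 1 suggests the two tests rank students similarly; a value near 0 suggests little correspondence.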
c) Reliability
Reliability tells you how consistently a method measures something. You should get the
same results when you apply the same method to the same sample under the same
conditions. If not, the method of measurement may be unreliable. There are four main types
of reliability. Each can be estimated by comparing different sets of results produced by the
same method.
i. Test-retest reliability: measures the consistency of results when you repeat the same test
on the same sample at a different point in time. You use it when you measure something
that you expect to stay constant in your sample.
ii. Interrater reliability: (also called interobserver reliability) measures the degree of
agreement between different people observing or assessing the same thing. You use it
when data is collected by researchers assigning ratings, scores or categories to one or
more variables.
iii. Parallel forms reliability: measures the correlation between two equivalent versions of a
test. You use it when you have two different assessment tools or sets of questions
designed to measure the same thing.
iv. Internal consistency: assesses the correlation between multiple items in a test intended
to measure the same construct.
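Internal consistency is often summarised with Cronbach's alpha. The following is a minimal sketch, not a prescribed procedure: the function name `cronbach_alpha` and the score table are hypothetical, and it assumes every student answered every item.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per student, one column per item.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item (column) across students
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each student's total score
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical marks of four students on three items (0 to 5 each)
marks = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(marks), 2))
```

Values closer to 1 indicate that the items behave consistently with one another.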
Weight to Content: Indicates the various aspects of the content to be tested and the
weight to be given
Weight to Difficulty Level: Indicates weight to be given to different levels of
questions
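The two weightings above can be combined into a simple blueprint grid. The sketch below is an assumption-laden illustration: the topic names, difficulty labels, percentages and the function name `allocate_items` are all hypothetical, and leftover items are handed to the cells with the largest rounding remainders so the counts add up to the planned test length.

```python
def allocate_items(total_items, content_weights, difficulty_weights):
    """Return {(topic, level): number_of_items}.

    Both weight dictionaries hold whole-number percentages that sum to 100.
    """
    if sum(content_weights.values()) != 100:
        raise ValueError("content weights must sum to 100")
    if sum(difficulty_weights.values()) != 100:
        raise ValueError("difficulty weights must sum to 100")

    counts = {}
    remainders = {}
    for topic, cw in content_weights.items():
        for level, dw in difficulty_weights.items():
            share = total_items * cw * dw  # scaled by 100 * 100
            counts[(topic, level)] = share // 10000
            remainders[(topic, level)] = share % 10000

    # Give the items lost to rounding down to the largest remainders
    leftover = total_items - sum(counts.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for cell in ranked[:leftover]:
        counts[cell] += 1
    return counts


# Hypothetical 40-item paper
content = {"Anatomy": 50, "Pharmacology": 30, "Ethics": 20}
difficulty = {"easy": 30, "average": 50, "difficult": 20}
for cell, n in allocate_items(40, content, difficulty).items():
    print(cell, n)
```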
Domains of Learning/Levels of Assessment
Example of a blueprint (cognitive domain)
Depth of knowledge: Some question types are better at tapping higher-order thinking
skills, such as analysing or synthesising, while others are better for surface level
recall.
Processing speed: Some question types are more easily processed and can be more
quickly answered. This can impact the timing of the test and the distribution of
students’ effort across different knowledge domains.
All test items should:
Assess the achievement of learning outcomes for the unit and/or course
Measure essential concepts and their relationship to that unit and course
Align with your teaching and learning activities and the emphasis placed on concepts
and tasks
Measure the appropriate level of knowledge
Vary in levels of difficulty (factual recall and demonstration of knowledge,
application and analysis, and evaluation and creation)
Categories of Test Items: There are two general categories of test items.
1. Objective Items – students select the correct response from several alternatives or supply
a word or short phrase answer. These items are easier to create for the lower-order levels of
Bloom’s taxonomy (recall and comprehension), although they can also be designed to test
higher-order thinking (apply and analyse). Objective test items include:
a) Multiple choice: provides an excellent pre-assessment indicator of student knowledge and a
source for a post-test discussion.
Use Multiple Choice Questions to Assess:
Information recall
Application
Evaluation
Understanding concepts
Advantages
Easy to score
Increase reliability
May lower test anxiety
Require little instruction
Manageable for beginning learners who cannot produce a lot
Can cover many content areas on a single exam
Disadvantages
Often test literacy skills: “if the student reads the question carefully, the
answer is easy to recognise even if the student knows little about the subject.”
Provide unprepared students with the opportunity to guess, and with correct
guesses, they get credit for things they don’t know
Expose students to misinformation that can influence subsequent thinking
about the content
Take time and skill to construct (especially good questions).
Tips for developing MCQs
Avoid using the same correct answer option for each question
Keep answer options to a minimum. Too many become confusing
Keep question text clear and to the point
Try to keep all answers the same length
Avoid using “all of the above” - too obvious
Avoid using “none of the above.”
Keep distractors plausible
Randomise answer options - this prevents candidates from memorising the letters
When using numbers, keep answer options in a logical order
Avoid using double negatives
Keep repeated words/phrases such as “Did you know” in the question only, not in the answer options.
Poor example:
A nurse is assessing a client who has pneumonia. Which of these assessment findings
indicates that the client does not need to be suctioned?
a) Diminished breath sounds
b) Absence of adventitious breath sounds
c) Inability to cough up sputum
d) Wheezing following bronchodilator therapy
Good example:
Which of these assessment findings, if identified in a client who has pneumonia, indicates
that the client needs to be suctioned?
a) Absence of adventitious breath sounds
b) Respiratory rate of 18 breaths per minute.
c) Inability to cough up sputum.
d) Wheezing before bronchodilator therapy.
Make the distractors mutually exclusive.
Poor example:
How long does a biennial plant generally live?
a) It dies after the second year
b) It lives for many years
c) It lives for more than one year
d) It needs to be replanted every two years
Good example:
How long does a biennial plant generally live?
a) One year
b) Two years
c) Several years
Make distractors approximately equal in length. Students often select the longest
option as the correct answer.
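The tip on randomising answer options can be sketched in a few lines. This is only an illustration: the function name `shuffle_options` and the sample question are hypothetical. Options made of numbers are better left in their logical order, as noted in the tips, so they would be exempt from shuffling.

```python
import random


def shuffle_options(options, correct_index, rng=None):
    """Shuffle answer options and report where the correct one ended up."""
    rng = rng or random.Random()
    order = list(range(len(options)))
    rng.shuffle(order)  # order[new_position] = old_position
    shuffled = [options[old] for old in order]
    new_correct_index = order.index(correct_index)
    return shuffled, new_correct_index


# Hypothetical item: "Which organ pumps blood around the body?"
options = ["Heart", "Liver", "Kidney", "Lung"]
shuffled, key = shuffle_options(options, correct_index=0)
for letter, text in zip("abcd", shuffled):
    print(f"{letter}) {text}")
print("Answer key:", "abcd"[key])
```

Returning the new position of the correct answer keeps the answer key in step with each shuffled version of the paper.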
b) True-false: True and False Questions consist of a question and two answer options. More
often than not, the answer options used are 'True and False'. You can however use other
options, such as 'Yes' and 'No', 'I Agree' and 'I Disagree'.
Also known as: TF, binary choice questions, objective
Use True and False Questions to Assess:
Recognizing facts
Reflection of materials learned
Knowledge check
Question Usage Ideas:
Statement Analysis
Feedback
Item Analysis
Pre Tests
Surveys
Advantages of True and False Questions:
Can customize to use 'Yes' and 'No' or 'I Disagree' and 'I Agree'
Easy to grade on paper
Automatically graded online
Can be answered quickly by Test takers
Large range of content can be tested
Questions are easy to create
Disadvantages of True and False Questions:
Takes time to create good questions
There's a 50% chance of candidates getting the question correct by guessing
Hard to determine who knows the material and who doesn't
Can be “too easy”
Candidates can just check an answer without any comprehension of the question
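The effect of the 50% guessing chance on a whole test can be worked out with the binomial distribution. The sketch below is illustrative only: the function name `prob_at_least` and the pass marks are hypothetical, and it assumes every answer is an independent blind guess.

```python
from math import comb


def prob_at_least(n_items, min_correct, p_guess=0.5):
    """Probability of getting at least min_correct of n_items right by guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Chance of reaching 7 out of 10 true/false items purely by guessing
print(prob_at_least(10, 7))  # 0.171875

# The same 70% pass mark on a 50-item test is far harder to reach by guessing
print(prob_at_least(50, 35))
```

Under these assumptions, adding more items is one way to reduce the share of candidates who pass by guessing alone.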
Tips for developing true/false questions
Keep question text to a minimum
Add more 'false' questions than 'true'. Candidates tend to choose 'true' more than they
do 'false'.
Use your own wording
Avoid using double negatives
Use only one fact/statement per question
Keep the statement either all true or all false - no in between
Be clear with your wording
Keep both true and false statements the same length
Don't use too many items per question. Again, you are testing the material, not
searchability.
Add clear instructions
Keep matches (right side) plausible
Shuffle matches and clues
Consider limiting the number of items to 10 or less.
2. Subjective Items – students present an original answer. These types of items are easier to
use for the higher-order levels of Bloom’s taxonomy (apply, analyse, synthesise, create,
evaluate). Subjective test items such as essay questions are best used to assess:
Comprehension of material learned
Writing skills
Evaluation
Analysis
User's ability to organise facts and ideas
Vocabulary
Problem Solving
Question Usage Ideas:
Gain Feedback
Gather information
Comparison of two items
Discussion
a) Essay: There are two main categories of essay items: short response (also referred to
as restricted or brief) and extended response.
i) Short Response
Usually require students to respond to an open-ended prompt using a few words to a
few sentences.
Short Response items are more focused and constrained than extended response
questions. For example, a short response might ask a student to “write an example”,
“list three reasons”, or “compare and contrast two techniques”.
Advantages
Quick and easy to grade
Quick and easy to write
Disadvantages
Encourage students to memorise terms and details so that their understanding of
the content remains superficial
It can be challenging to develop a key that can accommodate a variety of responses.
b) Performance Testing
An assessment of individual performance in a systematic way.
It requires an examinee to perform a task or activity rather than simply answering questions
referring to specific parts.
The purpose is to ensure greater fidelity to what is being tested. It can be individual or
group. Some performance tests are simulations.
An example is the Objective Structured Clinical Examination (OSCE).
Advantages
It can be used to assess from multiple perspectives
Direct observation of student ability
Can be scored holistically or analytically
Active student engagement
Authentic assessment of ability
Assess transfer of skills and integration of content
Encourages time on academics outside of class
Provide a dimension of depth not available in the classroom
Promote student creativity
Can be summative or formative
Place faculty more in a mentor role than as a judge
Provide an avenue for student self-assessment and reflection
Can be embedded within courses
Most valid way of assessing skill development
Disadvantages
It can be very time consuming
It can be costly
Relies heavily on student initiative and drive
It relies heavily on specific skill sets of students
Ratings and Results can be subjective
It can be intimidating to students
Requires careful design and training of raters
Sample of behaviour or performance may not be typical, especially if observers are
present
In summary, objective and subjective test items are both suitable for measuring most
learning outcomes and are often used in combination. Both types can be used to test
comprehension, application of concepts, problem-solving, and ability to think critically.
However, certain types of test items are better suited than others to measure particular
learning outcomes. Learning outcomes that require a student to ‘demonstrate’ may be better
measured by a performance test item. In contrast, work requiring the student to ‘evaluate’
may be better measured by an essay or short answer test item.
TEST ADMINISTRATION
(I) Introduction: Test administration procedures are developed for an exam program to help
reduce measurement error and increase the likelihood of fair, valid, and reliable assessment.
Consistent, standardised administration of the exam allows you to compare examinees'
scores directly, even though the examinees may have taken their tests on different dates, at
various sites, and with different proctors.
(IV) Methods of Test Administration
(i) Paper-and-Pencil Tests (PPT)
• PPTs are a general group of assessment tools in which candidates read questions
and respond in writing.
• One of the most common and systematic ways of gathering information about the
learners’ behaviour and performance.
• Assess the level of knowledge and ability or skill qualifications.
• Because many candidates can be assessed simultaneously with a paper-and-pencil
test, such tests are an efficient method of assessment.
Advantages
• Economical in terms of time and money
• Provides an opportunity to obtain detailed feedback for both the teachers and learners
because the responses are being recorded.
• A large number can be tested at the same time
• Allows one to test the students under uniform conditions because examination time can
be strictly controlled.
• Cover a wider area of the syllabus than performance and oral tests.
• All students answer the same question paper; hence comparison of the results can be
made effectively.
Disadvantages
• The results may be influenced by external factors like sickness, stress etc.
• High cost associated with the process.
• Not eco-friendly: a lot of paper is needlessly wasted in the traditional evaluation
process.
• Computerised Adaptive Testing (CAT)
• Computerised Simulations and Multimedia
The number of test items administered will differ: more are administered to
candidates whose knowledge or ability is close to the passing point, and fewer to
candidates who clearly pass or fail.
A CCT requires several components:
• An item bank calibrated with a psychometric model selected by the test designer
• A starting point
• An item selection algorithm
• A termination criterion and scoring procedure
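The four components listed above can be put together in a short sketch. Everything here is an assumption made for illustration, not a description of any particular examination system: it takes CCT to mean a computerised classification test, uses the Rasch model to calibrate a hypothetical item bank, starts at the cut score, selects the unused item whose difficulty is closest to the cut score, and stops with a sequential probability ratio test (SPRT) or when a maximum number of items is reached.

```python
from math import exp, log


def p_correct(ability, difficulty):
    """Rasch model: probability that a candidate answers an item correctly."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def classify(item_bank, answer_fn, cut=0.0, delta=0.5,
             alpha=0.05, beta=0.05, max_items=30):
    """Return ("pass" or "fail", number_of_items_used).

    item_bank: {item_id: difficulty}; answer_fn(item_id) -> True if correct.
    """
    upper = log((1 - beta) / alpha)   # decide "pass" at or above this
    lower = log(beta / (1 - alpha))   # decide "fail" at or below this
    low_ability, high_ability = cut - delta, cut + delta
    remaining = dict(item_bank)
    llr = 0.0                         # running log-likelihood ratio
    used = 0
    while remaining and used < max_items:
        # Item selection: difficulty closest to the cut score
        item_id = min(remaining, key=lambda i: abs(remaining[i] - cut))
        b = remaining.pop(item_id)
        used += 1
        if answer_fn(item_id):
            llr += log(p_correct(high_ability, b) / p_correct(low_ability, b))
        else:
            llr += log((1 - p_correct(high_ability, b))
                       / (1 - p_correct(low_ability, b)))
        # Termination criterion
        if llr >= upper:
            return "pass", used
        if llr <= lower:
            return "fail", used
    # Out of items: fall back on the sign of the evidence so far
    return ("pass" if llr >= 0 else "fail"), used


# Hypothetical bank of 30 items, all with difficulty at the cut score
bank = {f"q{i}": 0.0 for i in range(30)}
print(classify(bank, lambda item_id: True))   # candidate who answers all correctly
print(classify(bank, lambda item_id: False))  # candidate who answers all wrongly
```

In this sketch a candidate who answers consistently is classified after only a few items, while one whose answers alternate between right and wrong keeps receiving items until the bank or the item limit runs out.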
Advantages of CCT/CMT: Promote retentiveness and mastery skills
Disadvantage of CCT/CMT: Not suitable for a large number of students
Limitations of Simulation or Modelling
• Mistakes may be made in the programming or rules of the simulation or model.
• The cost of a simulation model can be high.
• The cost of running several different simulations may be high.
• Time may be needed to make sense of the results.
• People’s reactions to the model or simulation might not be realistic or reliable
(ii) Oral Testing: An oral test is a test that is answered orally (verbally). The teacher or oral
test assessor asks the student a question verbally, and the student answers in spoken words.
Advantages
The oral test provides direct contact between the examiner and the examinee.
More than one examiner can assess more than one candidate simultaneously
Provides an opportunity to evaluate the strong and weak areas of each learner.
Provides an opportunity to question the candidate about how she arrived at the
answer.
Provides the examiner with an opportunity to clarify the question in case the
candidate has not understood.
Disadvantages
It depends heavily on the examiner's experience and their ability to retain in their
minds an accurate impression of the standard required.
Lacks standardisation; hence the results of the test cannot be compared across candidates
Expensive because an examiner cannot examine more than fifteen students in a day.
Lacks objectivity; the examiner can be affected by
factors external to the test.
Lacks a precise definition of the criteria for awarding a satisfactory rating
In conclusion, test construction and administration are critical components of effective
testing/assessment to evaluate and improve learning performance and quality educational
outcomes.
Group Activity
Participants (in groups) will be asked to discuss with group members the type of CBT that
can be adopted for the NMCN Professional examination, including:
• Strategies of implementation
• Perceived challenges and
• Suggestions on the way forward
Then, a group report should be submitted, or group presentations can be conducted.