
Monitoring and Evaluation Final Paper

A test item bank is a collection of test items and related information used for testing programs. It allows teachers to reuse test items, improves scoring reliability through computerized scoring, and provides diagnostic feedback to students. While computerized testing has advantages like large group administration, computer equipment may not always be available or functioning properly. Validity refers to how well a test measures what it is intended to measure. There are different types of validity including internal and external validity. Reliability is the degree to which a test provides consistent results and measures what it aims to measure regardless of when it is administered. Following guidelines around question structure, language, independence, and distractors can help improve the quality of multiple choice questions.



Monitoring and Evaluation

Short Questions

1. Define test item bank?

A test item bank is a term for a source of test items that belong to a testing program, as well as
all information pertaining to those items. In most applications of testing and assessment, the
items are in multiple-choice format, but any format can be used.

Advantages of a test bank:

 Teachers do not have to write new items for every examination.

 Reliability in scoring; computers are more accurate.

 Traditional time limits are not necessary.

 Diagnostic feedback can be provided very quickly to each student on those items
answered incorrectly.

 Large group of students can be tested at the same time.

 Easy to find subjects.

Disadvantages of a test bank:

 Computer equipment may not always be available, or in working order.

 The graphics capabilities of many computers may be limited.

 Computer anxiety is another potential disadvantage.

 Differences in the degree to which students are familiar with using computers or
typewriter keyboards may lead to discrepancies in their performances on computer-assisted
or computer-adaptive tests.

2. Types of essay items?

 Restricted Response Essay Items

An essay item that poses a specific problem for which a student must recall proper
information, organize it in a suitable manner, derive a defensible conclusion, and express it
within the limits of the posed problem, or within a page or time limit, is called a restricted
response essay type item. The statement of the problem specifies response limitations that
guide the student in responding and provide evaluation criteria for scoring.

Restricted Response Essay Items are usually used to:

 Analyze relationship

 Compare and contrast positions

 State necessary assumptions

 Identify appropriate conclusions

 Explain cause-effect relationship

 Organize data to support a viewpoint

 Evaluate the quality and worth of an item or action

 Integrate data from several sources

 Extended Response Essay Items

An essay type item that allows the student to determine the length and
complexity of the response is called an extended-response essay item. This type of essay is most
useful at the synthesis or evaluation levels of the cognitive domain. Extended response items
are used when we are interested in determining whether students can organize, integrate,
express, and evaluate information, ideas, or pieces of knowledge.

Synthesis and evaluation levels; a lot of freedom in answers.

3. Advantages of norm- or criterion-referenced tests?

Advantages of criterion-referenced tests:

 Students are not competing with each other.

 Students are thus more likely to actively help each other learn.

 A student's grade is not influenced by the caliber of the class.

Advantages of norm-referenced tests:

 They are easy for instructors to use.

 They work well in situations requiring severe differentiation among students.

 They are generally appropriate in large courses.

4. Administration of a test:

The test is ready. All that remains is to get the students ready and hand out the test. Here are
some suggestions to help your students psychologically prepare for the test.

1. Maintain a positive attitude for achievement

2. Maximize achievement motivation

3. Equalize advantages to all the students

4. Provide easy, comfortable and proper seats

5. Provide a proper system of lighting, temperature, ventilation, and water

6. Clarify all the rules and regulations of the exam center/hall

7. Rotate distributions

8. Remind the students to check their copies

9. Monitor students continuously

10. Minimize distractions

11. Give time warnings properly

12. Collect test uniformly

13. Count the answer sheets, seal them in a bag, and hand them over to the concerned authority.

5. Define Validity and reliability?

Validity:

Validity refers to how well a test measures what it is supposed (claimed) to measure; it is the
degree to which a test measures what it is intended to measure.

Types of validity:

 Internal validity

 External validity

Reliability:

Reliability is the degree to which an assessment tool produces stable and consistent results:
the degree to which a test measures consistently, whatever it measures. Reliability is a
measure of the stability or consistency of a test score. It is the ability of a test or research
findings to be repeatable.

Long Questions

1. Rules for writing multiple-choice or objective-type items?

There are several rules we can follow to improve the quality of this type of written
examination.

Examine only the important facts!


Make sure that every question examines only the important knowledge.
Avoid overly detailed questions; each question has to be relevant to the previously set
instructional goals of the course.

Use simple language!


Use simple language, taking care of spelling and grammar. Spelling and
grammar mistakes (unless you are testing spelling or grammar) only confuse students.
Remember that you are examining knowledge about your subject and not language skills.

Make the questions brief and clear!


Clear the text of the body of the question of all superfluous words
and irrelevant content; this helps students understand exactly what is expected of them. It is
desirable to formulate the question in such a way that the main part of the text is in the body
of the question without being repeated in the answers.

Form the questions correctly!


Make sure that the formulation of the question does not (indirectly) hide the
key to the correct answer. Students (adept at solving tests) will be able to recognize it easily and
will find the right answer because of the word combination, grammar, etc., and not because
of their knowledge.

Take into consideration the independence of questions!


Be careful not to repeat content and terms related to the same theme,
since the answer to one question can become the key to solving another.

Offer uniform answers!


All offered answers should be uniform, clear, and realistic. For example,
an obviously unlikely answer, or uneven text length across the different answers, can point to the
right answer; such a question does not test real knowledge. The position of the key should be
random. If the answers are numbers, they should be listed in ascending order.

Avoid asking negative questions!


If you use negative questions, the negation must be emphasized by using
CAPITAL letters, e.g. "Which of the following IS NOT correct..." or "All of the following
statements are true, EXCEPT..."

Avoid distracters of the form "All the answers are correct" or "None of
the answers is correct"!

Teachers use these statements most frequently when they run out of
ideas for distracters. Students, knowing what is behind such questions, are rarely misled by
them. Therefore, if you do use such statements, sometimes use them as the key answer.
Furthermore, if a student recognizes that there are two correct answers (out of 5 options),
they will be able to conclude that the key answer is the statement "all the answers are
correct", without knowing the accuracy of the other distracters.

Distracters must be Significantly Different from the Right Answer (key)!


Distracters which only slightly differ from the key answer are bad
distracters. Good or strong distracters are statements which themselves seem correct, but are
not the correct answer to a particular question.

Offer an Appropriate Number of Distracters!


The greater the number of distracters, the lower the possibility that a
student could guess the right answer (key). In higher education tests, questions with 5 answers
are used most often (1 key + 4 distracters); that means a student is 20% likely to guess
the right answer.

2. Validity and reliability of a test and their types?

Reliability of a test:

Definition

Reliability is the degree to which an assessment tool produces stable and consistent results:
the degree to which a test measures consistently, whatever it measures.

Reliability is a measure of the stability or consistency of a test score. It is the ability of a test or
research findings to be repeatable. For example, a medical thermometer is a reliable tool that
would measure the correct temperature each time it is used. In the same way, a reliable math
test will accurately measure mathematical knowledge for every student who takes it,
and reliable research findings can be replicated over and over.

So it is the extent to which an experiment, test, or any measuring procedure shows the same
result on repeated trials.

Types of reliability

Split-Half Method:
A measure of consistency where a test is split in two and the scores for each half of the test are
compared with one another. A test is split into two halves, e.g. odds and evens; if the scores for
the two halves are similar, then the test is reliable.

Split-half reliability is a form of internal consistency reliability. It assesses the internal
consistency of a test, such as psychometric tests and questionnaires. It measures the extent to
which all parts of the test contribute equally to what is being measured.

This is done by comparing the results of one half of a test with the results from the other half. A
test can be split in half in several ways, e.g. first half and second half, or by odd and even
numbers. If the two halves of the test provide similar results, this would suggest that the test
has internal reliability.

The reliability of a test could be improved by using this method. For example, any items on
separate halves of a test which have a low correlation (e.g. r = .25) should either be removed or
re-written.

The split-half method is a quick and easy way to establish reliability. However, it can only be
effective with large questionnaires in which all questions measure the same construct. This
means it would not be appropriate for tests which measure different constructs.
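
To make this concrete, here is a minimal Python sketch of the split-half computation. The score matrix is hypothetical, and the Spearman-Brown step shown at the end is a standard correction for estimating full-length reliability, although it is not named above.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical data: rows = students, columns = items (1 = correct, 0 = incorrect).
    scores = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 1, 1, 1],
    ])

    # Split the test into odd- and even-numbered items and total each half.
    odd_half = scores[:, 0::2].sum(axis=1)
    even_half = scores[:, 1::2].sum(axis=1)

    # Correlate the two half-test scores.
    r_half, _ = pearsonr(odd_half, even_half)

    # Spearman-Brown correction: estimate the reliability of the full-length test.
    full_reliability = (2 * r_half) / (1 + r_half)
    print(f"half-test r = {r_half:.2f}, corrected reliability = {full_reliability:.2f}")

If the corrected value is low, the low-correlation items mentioned above are candidates for removal or rewriting.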

Test-Retest Method:
Test-retest reliability is the degree to which scores are consistent over time. It indicates score
variation that occurs from testing session to testing session as a result of errors of
measurement. Each time the test is carried out, the results should be the same. Test-retest
reliability is a measure of reliability obtained by administering the same test twice over a
period of time to a group of individuals. The scores from Time 1 and Time 2 can then be
correlated in order to evaluate the test for stability over time. It applies to the same people at
different times.

The test-retest method assesses the external consistency of a test. Examples of appropriate
tests include questionnaires and psychometric tests. It measures the stability of a test over
time.

For example, a test designed to assess student learning in psychology could be given to a group
of students twice, with the second administration perhaps coming a week after the first. The
obtained correlation coefficient would indicate the stability of the scores.

A typical assessment would involve giving participants the same test on two separate
occasions. If the same or similar results are obtained, then external reliability is established.
The disadvantage of the test-retest method is that it takes a long time for results to be
obtained. Beck et al. (1996) studied the responses of 26 outpatients on two separate therapy
sessions one week apart; they found a correlation of .93, therefore demonstrating high
test-retest reliability of the depression inventory. The timing of the test is important: if the
duration is too brief, then participants may recall information from the first test, which could
bias the results. Alternatively, if the duration is too long, it is feasible that the participants
could have changed in some important way, which could also bias the results.

Inter-Rater Method


This refers to the degree to which different raters give consistent estimates of the same
behavior. Inter-rater reliability can be used for interviews. It applies to different raters scoring
the same test.

Independent judges (two or more) score the test or experiment; the data are then compared to
find out how consistent the raters' estimates were.

Note: it can also be called inter-observer reliability when referring to observational research.
Here, researchers observe the same behavior independently (to avoid bias) and compare
their data. If the data are similar, then the measure is reliable.

Where observer scores do not correlate significantly, reliability can be improved.

For example, if two researchers are observing 'aggressive behavior' of children at nursery, they
would both have their own subjective opinion regarding what aggression comprises. In this
scenario, it would be unlikely they would record aggressive behavior the same way, and the
data would be unreliable.

However, if they were to operationalize the behavior category of aggression, this would be
more objective and make it easier to identify when a specific behavior occurs.
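
The text above does not prescribe a particular agreement statistic, but a common choice for quantifying inter-rater reliability is Cohen's kappa, which corrects raw agreement for chance. A short Python sketch with hypothetical ratings:

    import numpy as np

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected agreement between two raters over the same observations."""
        rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
        observed = np.mean(rater_a == rater_b)  # raw proportion of agreement
        # Expected agreement if each rater labelled independently at their own base rates.
        categories = np.union1d(rater_a, rater_b)
        expected = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)
        return (observed - expected) / (1 - expected)

    # Two observers coding the same ten episodes as aggressive (1) or not (0).
    rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
    print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # 1.0 = perfect agreement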

Equivalent-Forms or Alternate-Forms Reliability


Two tests that are identical in every way except for the actual items included are used when it
is likely that test takers will recall responses made during the first session and when alternate
forms are available. The two scores are correlated; the obtained coefficient is called the
coefficient of stability or coefficient of equivalence.

Internal Consistency Reliability


It is a measure of reliability used to evaluate the degree to which different
test items that investigate/examine the same construct produce similar results. It can prove if
students have or have not mastered a particular subject.

The internal consistency method provides a unique estimate of reliability for the given test
administration. This measures the consistency of data within the same experiment. The most
popular internal consistency reliability estimate is given by Cronbach's alpha. Internal
consistency determines how all items on the test relate to all other items. The Kuder-Richardson
formula is another estimate of reliability that is essentially equivalent to the average of the
split-half reliabilities computed for all possible halves. It applies to different questions
measuring the same construct.

Validity of a test

Definition
Validity refers to how well a test measures what it is supposed (claimed) to measure; it is
the degree to which a test measures what it is intended to measure.

Aspects of validity
There are two aspects of validity:

Internal validity

Definition
The degree to which observed differences on the dependent variable are a direct result
of manipulation of the independent variable, not some other variable; the instruments or
procedures used in the research measured what they were supposed to measure.

Example:
As part of a stress experiment, people are shown photos of war atrocities. After the
study, they are asked how the pictures made them feel, and they respond that the pictures
were very upsetting. In this study, the photos have good internal validity as stress producers.

External validity
The degree to which results are generalizable or applicable to groups and
environments outside the experimental settings.

The results can be generalized beyond the immediate study. In order to have external validity,
the claim that spaced study (studying in several sessions ahead of time) is better than cramming
for exams should apply to more than one subject (e.g., to math as well as history). It should also
apply to people beyond the sample in the study.

Types of Validity

Content validity
Content validity refers to the connections between the test items and the subject-related tasks.
The test should evaluate only the content related to the field of study in a manner sufficiently
representative, relevant, and comprehensible.

When we want to find out if the entire content of the behavior/construct/area is represented in
the test, we compare the test tasks with the content of the behavior. This is a logical method, not
an empirical one.

Example
If we want to test knowledge on American Geography it is not fair to have most questions
limited to the geography of New England.

Construct validity
It is used to ensure that the measure actually measures what it is intended to measure and not
other variables. Using a panel of "experts" familiar with the construct is a way in which this type
of validity can be assessed. The experts can examine the items and decide what each specific
item is intended to measure. Students can be involved in this process to obtain their feedback.

For Example

A women's studies program may design a cumulative assessment of learning throughout the
major. If the questions are written with complicated wording and phrasing, this causes the test
to inadvertently become a test of reading comprehension, rather than a test of women's studies.
It is important that the measure actually assesses the intended construct, rather than an
extraneous factor. It implies using the construct correctly (concepts, ideas, notions).

Construct validity seeks agreement between a theoretical concept and a specific measuring
device or procedure.

For example
A test of intelligence nowadays must include measures of multiple intelligences, rather than
just logical-mathematical and linguistic ability measures. Construct validity is the degree to
which a test measures an intended hypothetical construct: the assessment actually measures
what it is designed to measure. A actually is A.

Criterion-related validity
Also referred to as instrumental validity, it states that the criteria should be clearly defined by
the teacher in advance. It has to take into account other teachers' criteria to be standardized,
and it also needs to demonstrate the accuracy of a measure or procedure compared to
another measure or procedure which has already been demonstrated to be valid. It has two
further types.

Types of Criterion-related validity


The types of Criterion-related validity are given below:

Concurrent validity
It is the degree to which the scores on a test are related to the scores on another,
already established, test administered at the same time, or to some other valid criterion
available at the same time.

For example
A new simple test is to be used in place of an old, cumbersome one that is
considered useful; measurements are obtained on both at the same time. Logically, predictive
and concurrent validation are the same; the term concurrent validation is used to indicate that
no time elapsed between measures. This assessment correlates with other assessments that
measure the same construct. A correlates with B.

Concurrent validity is a statistical method using correlation, rather than a logical method.
Examinees who are known to be either masters or non-masters on the content measured by
the test are identified before the test is administered. Once the tests have been scored, the
relationship between the examinees' status as either masters or non-masters and their
performance (i.e., pass or fail) on the test is estimated. This type of validity provides evidence
that the test is classifying examinees correctly. The stronger the correlation is, the greater the
concurrent validity of the test is.
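
As a rough illustration of this mastery-classification check, one can correlate known master/non-master status with the test's pass/fail decisions; for binary data, Pearson's r is the phi coefficient. The data below are hypothetical:

    from scipy.stats import pearsonr

    status = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]   # 1 = known master, 0 = non-master
    passed = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]   # 1 = passed the test, 0 = failed

    phi, _ = pearsonr(status, passed)
    print(f"phi = {phi:.2f}")  # the stronger the correlation, the greater the concurrent validity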

Predictive validity
It estimates the relationship of test scores to an examinee's future performance as
a master or non-master. Predictive validity considers the question: how well does the test
predict examinees' future status as masters or non-masters? For this type of validity, the
correlation that is computed is based on the test results and the examinee's later performance.
This type of validity is especially useful for test purposes such as selection or admissions. This
assessment predicts performance on a future assessment. A predicts B.

3. Essay type tests and their types?


Essay questions are supply or constructed response type questions and can
be the best way to measure students' higher-order thinking skills, such as applying,
organizing, synthesizing, integrating, evaluating, or projecting, while at the same time providing
a measure of writing skills. The student has to formulate and write a response, which may be
detailed and lengthy. The accuracy and quality of the response are judged by the teacher.

Essay questions provide a complex prompt that requires written responses, which can vary in
length from a couple of paragraphs to many pages. Like short answer questions, they provide
students with an opportunity to explain their understanding and demonstrate creativity, but
make it hard for students to arrive at an acceptable answer by bluffing. They can be
constructed reasonably quickly and easily, but marking these questions can be time-consuming
and grade agreement can be difficult.

Essay questions differ from short answer questions in that the essay questions are less
structured. This openness allows students to demonstrate that they can integrate the course
material in creative ways. As a result, essays are a favoured approach to test higher levels of
cognition, including analysis, synthesis, and evaluation. However, the requirement that the
students provide most of the structure increases the amount of work required to respond
effectively. Students often take longer to compose a five-paragraph essay than they would to
compose a paragraph answer to a short answer question.

Essay items can vary from very lengthy, open-ended, end-of-semester term
papers or take-home tests that have flexible page limits (e.g. 10-12 pages, no more than 30
pages, etc.) to essays with responses limited or restricted to one page or less. Essay questions
are used both as formative assessments (in classrooms) and summative assessments (on
standardized tests). There are two major categories of essay questions: short response (also
referred to as restricted or brief) and extended response.

Types of Essay Tests


Essay tests may be divided into many types. Monroe and Carter (1993) divide essay
tests into many categories, such as: selective recall (basis given); evaluative recall (basis given);
comparison of two things on a single designated basis; comparison of two things in general;
decisions for or against; cause and effect; explanation of the use or exact meaning of some
word, phrase, or statement; summary of some unit of the textbook or an article; analysis;
statement of relationships; illustration or examples; classification; application of rules, laws, or
principles to new situations; discussion; statement of an author's purpose in the selection or
organization of material; criticism as to the adequacy, correctness, or relevance of a printed
statement or of a classmate's answer to a question on the lesson; reorganization of facts;
formulation of new problems and questions; new methods of procedure, etc.

1. Restricted Response Essay Items


An essay item that poses a specific problem for which a student must recall
proper information, organize it in a suitable manner, derive a defensible conclusion, and
express it within the limits of the posed problem, or within a page or time limit, is called a
restricted response essay type item. The statement of the problem specifies response
limitations that guide the student in responding and provide evaluation criteria for scoring.

More consistent scoring, outlines parameters of responses

Example 1:
List the major similarities and differences in the lives of people living in Islamabad and
Faisalabad.

Example 2:
Compare the advantages and disadvantages of the lecture teaching method and the
demonstration teaching method.

When Should Restricted Response Essay Items be used?

Restricted Response Essay Items are usually used to:

Analyze relationship

Compare and contrast positions

State necessary assumptions

Identify appropriate conclusions

Explain cause-effect relationship

Organize data to support a viewpoint

Evaluate the quality and worth of an item or action

Integrate data from several sources

2. Extended Response Essay Items


An essay type item that allows the student to determine the length and
complexity of the response is called an extended-response essay item. This type of essay is most
useful at the synthesis or evaluation levels of the cognitive domain. Extended response items
are used when we are interested in determining whether students can organize, integrate,
express, and evaluate information, ideas, or pieces of knowledge.

Synthesis and evaluation levels; a lot of freedom in answers.

Example:
Identify as many different ways to generate electricity in Pakistan as you can. Give the
advantages and disadvantages of each. Your response will be graded on its accuracy,
comprehensiveness, and practicality. Your response should be 8-10 pages in length and will be
evaluated according to the rubric (scoring criteria) already provided.

Composed by ZAIN UL ABIDEEN
