PSY 311 Week 2

The document discusses various types of tests used in education, including psychological tests, non-standardized tests, interest inventories, personality inventories, and projective devices. It emphasizes the importance of validity and reliability in testing, as well as the challenges posed by cultural biases in standardized tests. The document also differentiates between measurement and evaluation, highlighting that measurement provides a description of performance while evaluation assigns value or judgment to that performance.

Uploaded by

gladyswanjiran

TOPIC TWO

TESTS, FUNCTIONS AND OTHER ISSUES

Expected Learning Outcomes

By the end of the lesson, the learner should be able to:
i. Illustrate different types of tests used in education.
ii. Assess the purpose of tests in the teaching and learning process.
iii. Examine different techniques for measuring the projection of people’s behaviour.

INTRODUCTION
Psychological variables or characteristics are best measured using
psychometric tools referred to as tests or psychological tests. The term
test is broad and can include any measurement tool or device used to
measure psychological attributes such as intelligence, aptitude,
personality, interests, and other variables of interest to a
psychologist.

Tests are used to determine whether students have learned what they
were expected to learn, or the level or degree to which students have
learned the material. They may be used to measure learning progress
and achievement and to evaluate the effectiveness of educational
programs. Tests are instruments or tools used to collect behavioral
data (psycho-physical characteristics or attributes) such as
intelligence, personality, attitudes, or emotions, to name but a few.

Types of Tests
There are different types of tests, which include:
1). Psychological Tests and Inventories
As data-gathering devices, psychological tests are among the most
useful tools of educational research, for they provide the data for most
experimental and descriptive studies in education. In school surveys for
the past several decades, achievement tests have been used extensively
in the appraisal of instruction. Because tests yield quantitative
descriptions or measures, they make possible more precise analysis than
can be achieved through subjective judgment alone. There are many
ways of classifying psychological tests as seen earlier. One distinction
is made between performance tests and paper-and-pencil tests.
Performance tests, usually administered individually, require that
subjects manipulate objects or mechanical apparatus while their actions
are observed and recorded by the examiner. Paper-and-pencil tests,
usually administered in groups, require the subjects to mark their
responses on a prepared sheet.

A second distinction is between power tests and timed (speed) tests.
Power tests have no time limit, and the subjects attempt progressively
more difficult tasks until they are unable to continue successfully.
Timed or speed tests usually involve the element of power, but in
addition, they limit the time the subjects have to complete certain tasks.

Another distinction is that made between non-standardized, teacher-made
tests and standardized tests. The test that the classroom teacher
constructs is likely to be less expertly designed than that of the
professional, although it is based upon the best logic and skill that the
teacher can command and is usually “tailor-made” for a particular
group of pupils.

Which type of test is used depends on the test’s intended purpose. The
standardized test is designed for general use. The items and the total
scores have been carefully analyzed, and validity and reliability have
been established by careful statistical controls. Norms have been
established based upon the performance of many subjects of various
ages living in many different types of communities and geographic
areas. Not only has the content of the test been standardized, but the
administration and scoring have been set in one pattern so that those
subsequently taking the tests will take them under like conditions. As
far as possible, the interpretation has also been standardized.

Although it would be inaccurate to claim that all standardized tests
meet optimum standards of excellence, the test authors have attempted
to make them as sound as possible in the light of the best that is known
by experts in test construction, administration, and interpretation.

2). Non-standardized or teacher-made tests are designed for use with
a specific group of persons. Reliability and validity are not usually
established. However, more practical information may be derived from
a teacher-made test than from a standardized one because the test is
given to the group for whom it was designed and is interpreted by the
teacher or test-maker.
Note
The fact that some individuals from culturally different backgrounds
may not score well on external tests has led to charges of
discrimination against members of underprivileged groups. The case has
been made that most of these tests do not accurately predict academic
achievement because their contents are culturally biased. Efforts are
being made to develop culture-free tests that eliminate this undesirable
quality. However, it is extremely difficult to eliminate culture totally
and develop one test that is equally fair for all. Since knowledge
develops within a culture, it is virtually impossible to test knowledge
without bringing an aspect of culture into it. Even the act of
measuring requires a language (for example English or Kiswahili), so
English or Kiswahili culture enters into our thinking and into the
interpretation of our thoughts. Measurement is mentioned here because
in tests we measure learning outcomes or potentialities.

3). Interest Inventories
Interest inventories attempt to yield a measure of the types of activities
that an individual has a tendency to like and to choose. There are
instruments designed to measure interest.
Interest blanks or inventories are examples of self-report instruments in
which individuals note their own likes and dislikes. These self-report
instruments are really standardized interviews in which the subjects,
through introspection, indicate feelings that may be interpreted in terms
of what is known about interest patterns.

4). Personality Inventories
Personality tests measure emotional patterns, motivations, values, and
other enduring features of an individual’s psychological makeup.
Personality tests can measure attitudes and opinions – that is, the
individual’s orientation to or assessment of other persons or things.
Such attitude measures may help to draw inferences about personality,
since such attitudes may reveal something about one’s perceptual style.

Personality tests can range from single items to elaborate, multi-scale
instruments. The widely used Internal-External scale (I-E scale),
developed by Rotter, illustrates the multiple-item personality
questionnaire. Rotter thought behaviour depended on the belief that it
will yield a reward. According to Rotter, individuals differ in their
generalized expectancies, that is, their beliefs about the locus (source)
of control of reinforcement. I-type people, or “internals”, believe strongly
that they control their own fate, and E-type people, or “externals”,
believe that they are not in control of their own fate. The remaining
people fall between these two extremes. Rotter’s I-E scale is a
29-item paper-and-pencil test (including 6 filler items designed to
conceal the purpose of the test and lower reactivity). Each item consists
of a pair of statements of belief about the locus of control. The subject
selects the statement from each pair that most agrees with his or her
own general point of view and receives one point for each answer
scored in the external direction.
Internal consistency (inter-item reliability) for the I-E scale has ranged
in the .60s and .70s. Test-retest reliability (with an interval of one to
two months between testings) has ranged from the .50s to the .70s. The
I-E has correlated negatively with measures of social desirability. That
is, the external locus of control belief statements appear less socially
desirable than the internal ones. To test its construct validity,
researchers have given the I-E to people known independently to differ
on the construct of alienation. Rotter’s theory predicts that externals
would feel more alienated or powerless. Compared to externals,
internals, as measured by the I-E, show more signs of being actively
aware of and involved in their environment (e.g., being concerned about
their health and doing something about it, or being activists).
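The test-retest figures quoted above are correlation coefficients between two administrations of the same scale. The following is a minimal sketch in Python, assuming the Pearson product-moment correlation is the coefficient meant. The function name pearson_r and the scores for six respondents are hypothetical and do not come from Rotter's studies. Scores run from 0 to 23 because only the 23 non-filler items are scored.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical I-E scores (0-23, higher = more external) for six people,
# tested twice about one month apart. Illustrative data only.
first_testing = [8, 12, 5, 15, 10, 7]
second_testing = [9, 10, 7, 14, 12, 6]

r = pearson_r(first_testing, second_testing)
print(f"test-retest reliability r = {r:.2f}")
```

A coefficient near 1 would mean respondents kept almost the same rank order across the two testings; lower values mean scores shifted between testings.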
Personality questionnaires can be constructed in different ways. One
approach selects items depending on how well they agree with each
other. This approach employs factor analysis to make sure that a
common factor underlies all of the items of the scale. If an item fails to
agree with this common factor, it is excluded in favour of one that
does. Factor analysis has shown that the 24 yes/no extraversion (E) items
of the Eysenck Personality Inventory (EPI) all share something in common
with the other items of this scale (Dooley, 2004).
An alternative to factor analysis, the empirical criterion approach,
selects test items according to their ability to discriminate previously
identified groups. For example, the developers of the Minnesota
Multiphasic Personality Inventory (MMPI) used the empirical criterion
approach.

The empirical criterion approach has opened the MMPI to some
criticisms. A subject’s scale score takes its meaning from its relation to
the criterion group’s score. That being the case, culture and time have
to be considered. For instance, the original group may have been tested
in the 1940s and may bear little resemblance to people in the 2000s,
when computers have come to dominate every aspect of life.

Personality scales are usually self-report instruments. The individual
checks responses to certain questions or statements. These instruments
yield scores which are assumed or have been shown to measure certain
personality traits or tendencies.
Because of individuals’ inability or unwillingness to report their own
reactions accurately or objectively, these instruments may be of limited
value. Part of this limitation may be due to the inadequate theories of
personality upon which some of these inventories have been based. At
best, they provide data that are useful in suggesting the need for further
analysis. Some have reasonable empirical validity with particular
groups of individuals but prove to be invalid when applied to others.
For example, the initial version of the MMPI (Minnesota Multiphasic
Personality Inventory) proved valuable in yielding scores that correlate
highly with the diagnoses of psychiatrists in clinical situations. But when
applied to college students, its diagnostic value proved disappointing.
The tendency to withhold embarrassing responses and to express those
that are socially acceptable, the emotional involvement of individuals
with their own problems, and lack of insight all limit the effectiveness of
personal and social-adjustment scales. Some psychologists believe that
the projective type of instrument offers greater promise, for these
devices attempt to disguise their purpose so completely that the subject
does not know how to appear in the best light.

5). Projective Devices
A projective instrument enables subjects to project their internal
feelings, attitudes, needs, values, or wishes onto an external object. Thus
the subjects may unconsciously reveal themselves as they react to
the external object. The use of projective devices is particularly helpful in
counteracting the tendency of subjects to try to appear in their best
light, to respond as they believe they should. Projection may be
accomplished through a number of techniques:
1. Association. The respondent is asked to indicate what he or she
sees, feels, or thinks when presented with a picture, cartoon, ink
blot, word or phrase. The Thematic Apperception Test, the
Rorschach Ink Blot Test, and various word-association tests are
familiar examples.
2. Completion. The respondent is asked to complete an incomplete
sentence or task. A sentence-completion instrument may include
such items as:
My greatest ambition is ______
My greatest fear is ______
I most enjoy ______
I dream a great deal about ______
I get very angry when ______
If I could do anything I wanted it would be to ______
3. Role-playing. Subjects are asked to improvise or act out a situation
in which they have been assigned various roles. The researcher
may observe such traits as hostility, frustration, dominance,
sympathy, insecurity, prejudice, or the absence of such traits.
4. Creative or Constructive. Permitting subjects to model clay, finger
paint, play with dolls, play with toys, or draw or write imaginative
stories about assigned situations may be revealing. The choice of
colour, form, words, the sense of orderliness, evidence of tensions,
and other reactions may provide opportunities to infer deep-seated
feelings.

Just like good tests, good inventories need to have a high
degree of both validity and reliability.
Note
Kuder-Richardson formula. This formula is a mathematical procedure that yields the
average of all possible split-half correlations (Cronbach, 1951).
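As an illustration of the note above, here is a minimal sketch of Kuder-Richardson formula 20 (KR-20) for items scored right (1) or wrong (0). It assumes the usual textbook form, KR-20 = (k / (k - 1)) * (1 - sum of p*q / variance of total scores), where k is the number of items, p is the proportion of examinees answering an item correctly and q = 1 - p. The use of the population variance (dividing by n) is an assumption; some texts divide by n - 1. The function name kr20 and the response data are hypothetical.

```python
def kr20(responses):
    """KR-20 reliability for 0/1 item scores.

    responses: list of rows, one row per examinee, one 0/1 entry per item.
    """
    n = len(responses)           # number of examinees
    k = len(responses[0])        # number of items
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    # Variance of the examinees' total scores (population form, assumed).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores must vary")

    # Sum over items of p * q, the variance of each 0/1 item.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical answers of five examinees to a four-item test.
answers = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(answers):.2f}")
```

Values closer to 1 indicate that the items hang together, so that a different split of the items into halves would rank the examinees in much the same way.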

6). Economy
Tests that can be given in a short period of time are likely to gain the
cooperation of the subject and to conserve the time of all those
involved in test administration. The matter of expense of administering
a test is often a significant factor if the testing program is being
operated on a limited budget.
Ease of administration, scoring, and interpretation is an important
factor in selecting a test, particularly when expert personnel or an
adequate budget are not available. Many good tests are easily and
effectively administered, scored, and interpreted by the classroom
teacher, who may not be an expert.

7). Interest
When psychological tests are used in educational research, one should
remember that standardized test scores are only approximate measures
of the trait under consideration. This limitation is inevitable and may
be ascribed to a number of possible factors:
i. Errors inherent in any psychological test: no test is completely
valid or reliable.
ii. Errors that result from poor test conditions, inexpert or careless
administration or scoring of the test, or faulty tabulation of test
scores.
iii. Inexpert interpretation of test results.
iv. The choice of an inappropriate test for the specific purpose in mind.

8). Finding Self-Report Measures. New measures are difficult to
come by. Although researchers can develop new measures
for their constructs, doing so is not an easy task, and they should first
search for the best existing tests. Using existing measures
can save a large amount of test development time. Moreover, using
existing measures improves the comparison of different studies. When
studies use the same measure, differences in their outcomes can be
traced to design and sample differences rather than to measurement
differences.

Definition of ‘test’:
A test is a systematic procedure for measuring a sample
of behaviour (a psychological variable). Systematic
procedure indicates that a test is constructed,
administered, and scored (or marked) according to
prescribed rules, which must be followed to the letter.
Test items are systematically chosen to fit the test
specifications, the same or equivalent items are
administered to all persons (examinees), and the
directions and time limits are the same for all persons
taking the test. The use of predetermined rules (a
marking scheme) for evaluating (scoring) responses
assures agreement between different persons who might
score (mark) the test; in other words, consistency or
reliability is ensured.
Using standard procedures ensures comparability among
the examinees and uniformity in all aspects of the
testing. The test should not favour any individual or
group of individuals unfairly. A test
should not have any kind of bias.
A second important term in the definition is behaviour.
In the strictest sense, a test measures only test-taking
behaviour, that is, the responses a person (examinee)
makes to the test items. Here we are talking about
psychological variables and, as we know, these cannot be
measured directly; rather, we infer the characteristic
(trait) from the examinee’s responses to the given test items.
We have to measure their manifestations since they are
not tangible.
If the behaviour exhibited (manifested) on the test
adequately mirrors the construct (trait) being measured,
the test will provide useful information. Here we are
talking about the validity of the test, i.e. the test measuring
what it is supposed to measure. If the test does not
adequately reflect the underlying characteristic,
inferences made from test scores will be in error, which
is why validity is important.
A test contains only a sample of all possible items. No
test is so comprehensive that it includes every
possible item that might be developed to measure the
behaviour domain (or population or universe); e.g. a
driver’s test will not test how you drive at night, on
a slippery wet road, or in heavy rain. Thus
any particular test is better thought of as a sample of all
possible items.
Because a test contains only a sample of all possible items, two
problems arise.
1. We must ensure that the questions or items
on the test are a representative sample
of all possible questions or items. [Validity]
2. Would an examinee get the same score if he or she were
given a different set of sampled items from the
same domain? [Reliability]
A test is a measuring instrument. Thus we need to define measurement.
Measurement is the assigning of numbers to individuals in a
systematic way as a means of representing the properties
of the individuals, such that those with more of the
property being measured score higher and those with
less score lower.

Difference between Measurement and Evaluation


Measurement answers the question, how much? That is,
measurement provides a description of a person’s
(examinee’s) performance. It does not provide
judgment; it says nothing about the worth or
value of the performance. If we put value or worth or
judgment on it, then we are evaluating. We are going
beyond description and attempting to answer the
question, how good? This is evaluation. A mark or
score like 40 out of 50 is measurement. If we say it is
B+, then this is an evaluation, since a judgment has been
made on the value of the mark or score in terms of how
good it is. That is, objective description is
measurement, while subjective judgment of quality is
evaluation.
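The distinction can be sketched in a few lines of Python, using the 40 out of 50 example. The grade boundaries below are hypothetical and were chosen only so that 80% falls in the B+ band; a real institution would set its own, which is precisely where the judgment enters. The function names measure and evaluate are likewise illustrative.

```python
def measure(raw_score, max_score):
    """Measurement: describe performance as a percentage, with no judgment."""
    return 100 * raw_score / max_score


# Hypothetical grade boundaries: (minimum percentage, grade), highest first.
GRADE_BOUNDARIES = [(85, "A"), (80, "B+"), (70, "B"), (60, "C"), (50, "D")]


def evaluate(percentage, boundaries=GRADE_BOUNDARIES):
    """Evaluation: attach a value judgment (a grade) to a measured score."""
    for minimum, grade in boundaries:
        if percentage >= minimum:
            return grade
    return "E"  # below every boundary


percentage = measure(40, 50)   # how much?
grade = evaluate(percentage)   # how good?
print(percentage, grade)
```

Changing the boundaries changes the grade but not the percentage: the measurement stays the same while the evaluation depends on the standard applied.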

Uses / Purpose of Tests:


Explicit uses of tests:
1. Selection:
Selection is done in academic settings, in business and
in other sectors offering jobs, where there are more
qualified applicants than opportunities. That is, in
the selection situation there are more applicants than can
be accepted (or employed or hired), and a decision has
to be made on whom to accept. The role of the test is to
identify the most promising applicants (or candidates or
examinees), i.e. those with the greatest probability of
success. In the simplest case, the decision is either to
accept or reject. In Kenya, for university entrance, there
are clear cut-off points, which are strictly
adhered to. Once they are laid down, one cannot complain.
Hence we seem to have very little interest in those who
are rejected (or left out). Socioeconomic status, poor
health, poor facilities and background, or other adverse
factors may contribute to a person being left out. Many
such factors are assumed to be uniform for all. In other words,
the assumption is that nobody is favoured, yet we know this
is not true.
2. Placement:
There are several individuals and several alternative
courses of action; for instance, in universities there are
several departments and each has its requirements. In
general, each person is to be assigned to a program using
certain criteria.
3. Diagnosis:
It involves comparing an individual’s performance in
several areas in order to determine relative strengths and
weaknesses. Generally, diagnostic procedures are
instituted when an individual is having difficulty in some
area. Once the areas of disability are identified, a
program of remediation can be undertaken. For
example, if a child has a problem in reading or in doing
word problems in mathematics, you may give a test
covering phonetics, word meaning (vocabulary),
sentence meaning, paragraph meaning and reading rate,
so as to identify the particular weaknesses or strengths
of the child that need appropriate action.
4. Hypothesis Testing:
In psychological research, tests are often used for
hypothesis testing. What is a hypothesis? This is
dealt with here only briefly (for details see the appropriate
section). A hypothesis is a speculative
statement, or an educated guess, which you may wish to
test in order to accept or reject it. For instance, we
can manipulate our subjects in a certain way (perhaps
varying the degree of manipulation) and then give a test
to find out the effect of the manipulation. This is a type of
experimental study or design. In a correlational study
(design) we have cases of natural manipulation: we may
look at performance at a
certain time or under certain conditions. We study what
has taken place and then we make inferences. Using
various methods, such as keeping other variables constant or
eliminating them analytically, we are able
to study the effect of the variable manipulated.
Tests can also be used for hypothesis building. We may
find a difference in performance between the 80s and 90s and
then go on to hypothesize what the reason could be:
maybe a drop in socioeconomic status, the 8-4-4
educational system, or a combination of these and others.

Psychologists and educators (even lay people) use tests to
make many deductions or build hypotheses. For
instance, Muthoni got a very good Division I in the O-level
examination, but failed to obtain university
entrance after A-level. Why? Perhaps Muthoni went to do
science for A-level because of parental pressure. Her
father wanted her to be a doctor, but she did not have
much interest in the sciences (Biology and the like).
Muthoni could have done very well if she had taken Arts
(Humanities). Or Muthoni may have lost her father just
before the exams, and the trauma may have contributed
to her poor performance in the A-level exams.
5. Evaluation:
Evaluation may be formative or summative.
A teacher can use a test to find not only the weak students
but also his or her own weaknesses, topics not well understood, etc.
Thus classroom examinations and tests are often used
to evaluate the instructional method or the teacher.
All of these uses involve some decision. In selection, the
decision is whether to accept or reject an applicant. In
placement, it is where the candidate fits best in terms of
ability and skills, while in diagnosis it is which remedial
treatment to use after finding out the weakness.
In hypothesis testing, usually using statistics, you need to
decide whether to reject or accept the hypothesis. In evaluation,
it is what grade to give a student or how effective a
procedure is. Judging effectiveness at the end is summative
evaluation, while evaluation carried out at the beginning
or during instruction (e.g. to check entry behaviour) is
formative evaluation.
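Two of the decisions just described, selection and diagnosis, can be sketched in Python. This is an illustration under simple assumptions, not a description of any actual admission or remedial procedure: selection is taken to mean accepting the highest scorers until the available places are filled, and diagnosis is taken to mean ordering one learner's sub-test scores from weakest to strongest, assuming all sub-tests are reported on the same percentage scale. The function names select and diagnose, the applicant names and all scores are hypothetical.

```python
def select(scores, places):
    """Selection: accept the `places` highest scorers and reject the rest.

    scores: dict mapping applicant name to test score.
    Returns (accepted, rejected), each ordered from highest to lowest score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:places], ranked[places:]


def diagnose(profile):
    """Diagnosis: list one learner's sub-skills from weakest to strongest.

    profile: dict mapping sub-skill name to a score on a common scale.
    """
    return sorted(profile, key=profile.get)


# Hypothetical applicants competing for two places.
applicants = {"Achieng": 72, "Baraka": 65, "Chebet": 81, "Daudi": 58}
accepted, rejected = select(applicants, places=2)
print("accepted:", accepted)
print("rejected:", rejected)

# Hypothetical reading profile for one child (percentage per sub-test).
reading_profile = {
    "phonetics": 45,
    "word meaning": 70,
    "sentence meaning": 62,
    "paragraph meaning": 38,
    "reading rate": 55,
}
print("weakest first:", diagnose(reading_profile))
```

The selection sketch looks only at the score, which mirrors the point made earlier that cut-offs treat background factors as if they were uniform for all applicants.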
We know how seriously we take tests. We belong to a
culture which overrates exams. You get a lot of respect
if you are an A student, Division I, first class or a Ph.D.
scholar. If you do badly in your tests, you seldom
get a chance to say why you obtained a low
score. Research on tests shows ‘ability’ is important in
doing well in a test, but accounts for less than 50%.
Other factors count too, such as difficulty of items, quality of
instructions, personality variables, socioeconomic
status, linguistic variables, etc.

7 ULTIMATE FUNCTIONS OF TESTS IN EDUCATION
As a teacher, when you set a test for your
class, what do you do with the results? How
does the test help you in the teaching and
learning situation?

Nwana (1981) gives the following functions of tests:

i. Motivate pupils to study.
ii. Determine how much the pupils have learned.
iii. Determine the pupils’ special difficulties.
iv. Determine the pupils’ special abilities.
v. Determine the strengths and weaknesses of the teaching method.
vi. Determine the adequacy or otherwise of instructional resources.
vii. Determine the extent of achievement of the objectives.

To Motivate the Pupils to Study

Tests are regularly used to motivate pupils to learn. Pupils
study hard for their weekly, end-of-term or end-of-year
promotion examinations.

To determine how much pupils have learned

One of the functions of tests is to find out the extent to which
the content has been covered or mastered by the testees. For
instance, suppose you teach a topic in your class and give a test at
the end. If many of your students score high marks, this is an
indication that they have understood the topic well. But if they score
very low marks, it suggests that the topic has not been understood and
that you need to do more teaching. The results of the test help you
decide whether to move to the next topic or repeat the current
topic.

To Determine the Pupils’ Special Difficulties

Tests can be constructed and administered to students in order to
determine their particular problems, so that appropriate
corrective action can be taken. This identification of
weaknesses and strengths on the part of the students is the
diagnostic use of tests. It supports the desirable effort to give pupils,
individually or in groups, remedial attention. Can you think of any such
tests? Attempt this activity before you continue.

To Determine the Pupils’ Special Abilities

Tests can be used as a measure to indicate what a person, a
group of persons, or students can do. These can be measures of
aptitude (capacity or ability to learn) and measures of achievement
or attainment, obtained using aptitude tests and
achievement tests. The major concern of the class teacher is
the achievement test, which he or she is expected to use to promote
learning and bring about purposeful and desirable changes in the
students entrusted to him or her.

To Determine the Strengths and Weaknesses of the Teaching Methods

The results of classroom tests provide empirical evidence for the
teacher to know how effective his or her teaching methods are.
Test results are used as a self-evaluation instrument. They can also be
used by others to evaluate the teacher. If the results are not
encouraging, the teacher may decide to review the teaching
methods with a view to modifying them or changing to others.

To Determine the Adequacy or Otherwise of Instructional Materials

A good teacher makes use of a variety of teaching aids for
illustrations and demonstrations. Effective use of these instructional
resources helps to improve students’ understanding of the lesson.
Topics which look abstract can be made concrete by the
use of these materials. Tests can therefore be used to determine the
effectiveness, adequacy or otherwise of these teaching aids.

To Determine the Extent of Achievement of the Objectives

There are goals and objectives set for schools. Every school is
expected to achieve these goals and objectives through its
instructional programmes. The results of tests given to students are
used to evaluate how well the instructional programmes have
helped in the achievement of the goals and objectives.
