MODULE 2 Lesson 1-4
II. Introduction
When teachers use instructional objectives, it becomes very clear what should be on
the test. Objectives indicate the behaviors and skills that students should be able to
demonstrate after preparing for class, listening to the lecture, and completing
homework and assignments. The desired behaviors from the objectives translate
directly into the items for the test.
Prior to the construction of a paper and pencil test to be used in the measurement of
cognitive learning, teachers have to answer the question, "How good must a
measuring instrument be?" The following qualities have to be looked into:
Validity
Validity may be defined as the degree to which a test measures what it purports to
measure, and how adequately it does so. It is considered a very important quality when
preparing or selecting an instrument for use. The validity of a measuring instrument such
as a test must always be considered in relation to the purpose for which it is intended and
should always be specific to some definite situation. The test must be valid,
relevant, appropriate, and adequate.
Validity may be classified under four types, namely: content validity, construct validity,
concurrent validity, and predictive validity.
Content Validity
Content validity refers to the content and format of the measuring instrument. It
refers to the appropriateness and the adequacy of the content of the course, or of
its objectives. Content validation essentially involves the systematic examination of the
test content to determine whether it covers a representative sample of the behavior
domain to be measured. The domain under consideration should be fully described
in advance, and not after the test has been prepared. These domains of behavior
include the cognitive, affective, and psychomotor.
Construct Validity
The construct validity of a test refers to the extent to which the test measures a
theoretical trait (Calmorin, 1994). Kerlinger (1973) emphasizes that “construct
validation is preoccupied with theory, theoretical constructs, and scientific
empirical inquiry involving the testing of hypothesized relations."
Concurrent Validity
Reliability
Establishing the reliability of the test is mainly done through statistical estimates.
Reliability estimates provide an idea of how much variations to expect. Such estimates
are called reliability coefficients. A reliability coefficient expresses the relationship
between test scores of the same individuals on the same instrument at two different times,
or even between two parts of the same instrument. Unlike other uses of the correlation
coefficient, a reliability coefficient ranges from 0.000 to 1.000.
The test-retest method involves the administering of the same test twice to the same
group after a sufficient lapse of time. The reliability coefficient is computed to determine
the relationship between the two scores obtained. Reliability coefficients are affected by
the length of time that elapses between the administrations of the two tests. The longer
the time interval, the lower the reliability coefficient is likely to be, since there is a greater
likelihood that factors such as unlearning and forgetting, among others, may occur and
result in a low correlation between the two sets of scores. On the other hand, if the
interval is short, the examinees may recall their previous responses, which tends to make
the correlation high. In checking for evidence of test-retest reliability, an appropriate
time interval should be selected. Generally, an interval of about two weeks has been
considered most appropriate in educational measurement, since this time interval may
eliminate the "memory effect" as well as the "maturity effect" on the part of the examinees.
The Spearman rank correlation or Spearman rho may be used to correlate the scores for
this method.
ρ = 1 − (6ΣD²) / [N(N² − 1)]
Where:
ρ = Spearman rank correlation coefficient (rho)
ΣD² = sum of the squared differences between paired ranks
N = number of paired scores
Example:
ρ = 1 − (6ΣD²) / [N(N² − 1)]
  = 1 − 6(3.50) / [10(10² − 1)]
  = 1 − 21/990
  = 0.98
The rho (ρ) obtained is 0.98 or 98%, a very high correlation value; therefore, the
achievement test in Mathematics is reliable.
In the equivalent-forms (parallel-forms) method, two equivalent forms of the test are
administered to the same group. The correlation between the scores on these two forms,
representing the reliability coefficient of the test, is then calculated. A high coefficient
indicates that the test is reliable and that the two forms are measuring the same thing.
The split-half method involves administering the test to a group of examinees only
once, but dividing the test items into two halves using the “odd-even” scheme, that is,
divide the test into odd and even items. The two halves of the test must be similar but not
identical in content, number of items, difficulty, averages or means and variability.
In this method, each examinee is given two scores, one on the even and the other
on the odd items in one test. The correlation between the scores obtained on the two
halves represents the reliability coefficient of a half test. To obtain the reliability coefficient
of the whole test, the Spearman-Brown formula is used.
Formula:
rwt = 2rht / (1 + rht)
Where:
rwt = reliability coefficient of the whole test
rht = reliability coefficient of the half test
Example:
If the correlation coefficient of a half test obtained is 0.85, determine the reliability
coefficient of the total test.
rwt = 2rht / (1 + rht)
    = 2(0.85) / (1 + 0.85)
    = 1.70 / 1.85
    = 0.92 or 92%
The value of rwt is 0.92 or 92%, a very high relationship, indicating that the whole
test is reliable.
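As a sketch, the Spearman-Brown correction above can be written in Python (the function name is illustrative):

```python
def spearman_brown_whole(r_half: float) -> float:
    """Reliability of the whole test from the reliability of a half test
    (Spearman-Brown formula: rwt = 2rht / (1 + rht))."""
    return (2 * r_half) / (1 + r_half)

# Half-test correlation of 0.85, as in the example above
print(round(spearman_brown_whole(0.85), 2))  # 0.92
```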
The Kuder-Richardson formulas provide reliability estimates from a single
administration of the test.
Formula # 20: rtt = [n / (n − 1)] [(V − Σpiqi) / V]
Where:
rtt = reliability coefficient
n = total number of test items
V = variance of the test scores
Σpiqi = sum of the products of the proportions of examinees who passed (pi)
and failed (qi) item i
Formula # 21: rtt = [n / (n − 1)] [(nV − x̄(n − x̄)) / (nV)]
Where:
rtt = reliability coefficient
n = total number of test items
V = variance of the test scores
x̄ = mean of the test scores
Example:
A 50-item test was administered to a group of students. The test scores were
found to have a mean (x̄) = 45 and a variance (V) = 25. Estimate the reliability
coefficient of the test.
rtt = [n / (n − 1)] [(nV − x̄(n − x̄)) / (nV)]
    = [50/49] [(50(25) − 45(50 − 45)) / (50(25))]
    = 1.02 [(1250 − 225) / 1250]
    = 0.84
Thus, the reliability coefficient estimated for scores on this test is 0.84 or 84%.
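A minimal Python sketch of Formula # 21 (KR-21), with an illustrative function name:

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """Kuder-Richardson Formula 21 reliability estimate from the number
    of items, the mean, and the variance of the test scores."""
    return (n_items / (n_items - 1)) * (
        (n_items * variance - mean * (n_items - mean)) / (n_items * variance)
    )

# Worked example from the text: 50 items, mean 45, variance 25
print(round(kr21(50, 45, 25), 2))  # 0.84
```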
In evaluating reliability coefficients, there are two steps that can be used.
1. Compare the obtained reliability coefficients with the extremes that are possible. A
coefficient of 0.00 indicates no relationship, therefore, no reliability at all, while a
coefficient of 1.00 indicates the maximum possible coefficient that can be obtained.
2. Compare the obtained coefficients with the coefficients usually obtained for tests of
the same type. For example, many classroom tests have been reported to have
reliability coefficients of 0.70 or higher; higher values are much preferred.
The reliability of a test (or any measuring instrument) generally can be improved by
increasing the length of the test provided the following conditions are met.
1. The test items to be added must have about the same level of difficulty as the
original ones, and
2. The test items to be added must have the same content, or must be measures of the
same factors or skills as the original ones.
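The effect of lengthening a test can be estimated with the general Spearman-Brown prophecy formula, rk = kr / (1 + (k − 1)r), where k is the factor by which the test is lengthened. This extends the half-test correction given earlier; a sketch, with an illustrative function name:

```python
def spearman_brown_prophecy(r_original: float, k: float) -> float:
    """Predicted reliability when a test is lengthened k times,
    assuming the added items meet the two conditions above."""
    return (k * r_original) / (1 + (k - 1) * r_original)

# Doubling (k = 2) a test with reliability 0.70
print(round(spearman_brown_prophecy(0.70, 2), 2))  # 0.82
```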
Usability
Usability means the degree to which the measuring instrument can be satisfactorily
used by teachers, supervisors, and school administrators without undue expenditure of
time, money, and effort. In other words, usability means practicability. There are five
factors that determine usability:
Lesson 2
Measures of Central Tendency: Mean, Median and Mode
II. Introduction
The measure of central tendency of a given set of observations is the score value
around which the whole set of observations or scores tends to cluster. It is
represented by a single number which summarizes and describes the whole set.
The most commonly used measures of central tendency are the mode, the median,
and the mean (or arithmetic average).
The Mean
The arithmetic mean may be defined as the arithmetic average. It is the sum of the
individual scores divided by the number of scores. It is a computed average, and its
magnitude is influenced by every score value in the set. It is the location measure
most frequently used, but it can be misleading when the distribution contains
extremely large or small values.
The symbol for the sample mean is X̄ (read as "bar X"), and the symbol for the
population mean is the Greek letter mu (µ).
Mean of a sample:
X̄ = ΣXi / n
Where:
Xi = variable / score
Σ = summation / total
n = number of scores
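The formula can be sketched in Python:

```python
def mean(scores):
    """Arithmetic mean: the sum of the individual scores divided by
    the number of scores (X-bar = sum of Xi / n)."""
    return sum(scores) / len(scores)

# Nine sample scores
print(round(mean([23, 24, 25, 25, 26, 27, 28, 29, 30]), 2))  # 26.33
```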
The Median
The median is the value on the score scale that separates the top half of the
distribution from the bottom half. It is the midpoint of the distribution.
For distributions having an even number of arrayed scores, the median is the
average of the two middlemost values; for distributions having an odd number
of arrayed scores, the median is the middlemost value. The median is the
most appropriate locator of center since it has resistance to extreme values. It is a
positional average; hence, its value depends on its position relative to the number
of scores in the array (or the number of scores in the distribution). The median is
sometimes denoted by Me or Mdn. We will refer to it here as Mdn.
To find the median, first array the scores in either ascending or descending order
of magnitude.
23, 24, 25, 25, 26, 27, 28, 29, 30 (ascending order)
Then find the median. Since there is an odd number of scores (9), the median is
the middlemost score. It is 26.
To find the median, array the scores in either ascending or descending order of
magnitude.
Then find the median. Since there is an even number of scores (8), the median is
the average of the two middlemost scores.
Mdn = (26 + 27) / 2 = 26.5
When the number of scores in an arrayed arrangement is even and one of the two
middlemost scores occurs two or more times, the median is equal to the average
of the identical scores and the score of immediately preceding it.
Mdn = (27 + 27 + 26 + 25) / 4 = 26.25
When the number of scores in an arrayed arrangement is odd, and the middlemost
score occurs two or more times, the median is the average of the middlemost score
and the other identical score(s) and its/their counterpart(s) which either precede
or follow the middlemost score.
Mdn = (26 + 26 + 25) / 3 = 25.67
Remember:
It must be noted that the median is a point, and not necessarily a score, on the scale
of measurement. It may fall on a score, as when n is odd, or it may fall between
values; hence, the median may or may not be a variate. The median is not a variate
if the two middlemost scores are not equal, but if the two middlemost scores are
equal, the median is a variate. (A variate means the actual value of a score.)
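The odd- and even-n rules can be sketched in Python (the special tie-averaging conventions described above are not implemented; this follows the standard middlemost-score rule):

```python
def median(scores):
    """Median of a set of scores: the middlemost value when n is odd,
    or the average of the two middlemost values when n is even."""
    arrayed = sorted(scores)  # array the scores in ascending order
    n = len(arrayed)
    mid = n // 2
    if n % 2 == 1:
        return arrayed[mid]
    return (arrayed[mid - 1] + arrayed[mid]) / 2

print(median([23, 24, 25, 25, 26, 27, 28, 29, 30]))  # 26 (odd n)
print(median([23, 24, 25, 26, 27, 28, 29, 30]))      # 26.5 (even n)
```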
The Mode
The mode is the measure of central tendency that is easiest to find. It is the score
or the point on the scale of measurement that has a frequency larger than that of
any other score in the distribution. It is the score that occurs most frequently; it
corresponds to the highest point in the frequency polygon and can be found by
mere inspection.
To find the mode, rearrange the scores from the highest to the lowest. The mode
is the score that occurs most frequently.
97, 96, 96, 96, 93, 90, 89, 88, 86, 85
To determine the crude mode from a score frequency distribution, first arrange the
scores in either ascending or descending order of magnitude, writing the score
only once even for score(s) that occur several times.
89, 95, 98, 92, 89, 95, 86, 83, 80, 80, 92, 92, 89, 83, 89, 89
First, arrange the scores in descending order of magnitude, then tally, and write
the frequency of each score.
Score Tally f
98 I 1
95 II 2
92 III 3
89 IIII I 5
86 II 2
83 II 2
80 I 1
The score 89 occurs most frequently (f = 5); hence, the crude mode is 89.
However, a score frequency distribution may have more than one mode. It is
bimodal when two different scores have the same highest frequency, and
multimodal when more than two different scores have the same highest frequency.
It is also possible that a distribution of scores may not have any mode at all. The
mode is a rough measure of central location.
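Finding the mode(s), including the bimodal and multimodal cases, can be sketched with a frequency count:

```python
from collections import Counter

def modes(scores):
    """All scores sharing the highest frequency: one value for a single
    mode, two for a bimodal set, more for a multimodal set."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, f in counts.items() if f == top)

# The score frequency distribution tallied above
print(modes([89, 95, 98, 92, 89, 95, 86, 83, 80, 80, 92, 92, 89, 83, 89, 89]))  # [89]
```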
1. The mean is the most frequently used measure of location since it reflects every
value and has the characteristics of simplicity, uniqueness, and stability from
sample to sample in a distribution. However, when the distribution contains very
large or very small values, it can be misleading. The median, on the other hand, is
the most appropriate locator of the central measure since it is the midpoint of the
distribution; it is not influenced by extreme values, large or small, but by the
number of scores in a given set.
2. Main characteristics:
a. Mean
i. The mean is the arithmetic average of the measurements.
ii. It lies between the largest and smallest measurements of a set of
test scores.
iii. It is influenced by extreme scores.
iv. There is only one mean in a set of test scores.
b. Median
i. The median is the central value; 50% of the test scores lie above it
and 50% fall below it.
ii. It lies between the largest and smallest measurements of a set of
test scores
iii. It is not influenced by extreme scores.
iv. There is only one median for a set of test scores.
c. Mode
i. The mode is the most frequent score in an array.
ii. It is not influenced by extreme values.
iii. There can be more than one mode for a set of scores. If there are
two modes, the set of scores is bimodal, for three or more, it is multi-
modal.
3. In a symmetrical distribution (normal curve) where there is only one mode, the
mean, the median and the mode have equal values and coincide at the highest
point of the polygon and they all lie at the axis of symmetry.
Lesson 3
Measurement of Learning in the Cognitive Domain
II. Introduction
Lorin Anderson, a former student of Bloom, and David Krathwohl revisited the
cognitive domain in the mid-nineties and made some changes, with perhaps
the three most prominent ones being (Anderson, Krathwohl, Airasian,
Cruikshank, Mayer, Pintrich, Raths, Wittrock, 2000):
• changing the names in the six categories from noun to verb forms
• rearranging them as shown in the chart below
• creating processes and levels of knowledge matrix
III. Content / Concept
There are three domains of behavior measured and assessed in schools. The
most commonly assessed, however, is the cognitive domain. The cognitive
domain deals with the recall and recognition of knowledge and development of
intellectual abilities and skills (Bloom, et al, 1956). It is further subdivided into
six hierarchical levels, namely: Remembering, Understanding, Applying,
Analyzing, Evaluating, and Creating.
Prior to the construction of a paper and pencil test to be used in the measurement
of cognitive learning, teachers have to answer the following questions:
Teachers use two types in assessing student learning in the cognitive domain:
objective test, and essay test (Reyes, 2000).
An objective test is a kind of test wherein there is only one answer to each item.
An essay test is one wherein the test taker has the freedom to respond to a
question based on how he feels it should be answered.
Example 2:
• Key List Test – a test wherein the student has to examine paired
concepts based on a specified set of criteria (Olivia, 1998).
Example:
Instruction: Examine the paired items in column 1 and column 2. On
the blank before each number, write:
A = if the item in column 1 is an example of an item in column
2;
B = if the item in column 2 is a synonym of the item in column 1;
C = if item in column 2 is an opposite of the item in column 1;
and
D = if items in columns 1 and 2 are not related in any way.
Column 1 Column 2
____1. Capitalism economic system
____2. Labor incentive capital intensive
____3. Planned economy command economy
____4. Opportunity cost demand and supply
____5. Free goods economic goods
B. Essay Test
• Remembering
Explain how Siddharta Gautama became Buddha
• Understanding
What does it mean when a person has crossed the Rubicon?
• Applying
Cite three instances showing the application of the Law of Supply and
Demand.
• Analyzing
Analyze the annual budget of your college as to categories of funds,
sources of funds, major expenditures, and needs of your college.
• Evaluating
Are you in favor of the political platform of the Liberal Party? Justify your
answer.
• Creating
Propose solutions that can address the landfill problems in the
Philippines.
Bloom's definition and sample verbs for each level:

Remembering
Definition: Exhibit memory of previously learned material by recalling facts, terms,
basic concepts, and answers.
Verbs: Choose, Define, Find, How, Label, List, Match, Name, Omit, Recall, Relate,
Select, Show, Spell, Tell, What, When, Where, Which, Who, Why

Understanding
Definition: Demonstrate understanding of facts and ideas by organizing, comparing,
translating, interpreting, giving descriptions, and stating main ideas.
Verbs: Classify, Compare, Contrast, Demonstrate, Explain, Extend, Illustrate, Infer,
Interpret, Outline, Relate, Rephrase, Show, Summarize, Translate

Applying
Definition: Solve problems in new situations by applying acquired knowledge, facts,
techniques and rules in a different way.
Verbs: Apply, Build, Choose, Construct, Develop, Experiment with, Identify,
Interview, Make use of, Model, Organize, Plan, Select, Solve, Utilize

Analyzing
Definition: Examine and break information into parts by identifying motives or
causes. Make inferences and find evidence to support generalizations.
Verbs: Analyze, Assume, Categorize, Classify, Compare, Conclusion, Contrast,
Discover, Dissect, Distinguish, Divide, Examine, Function, Inference, Inspect, List,
Motive, Relationships, Simplify, Survey, Take part in, Test for, Theme

Evaluating
Definition: Present and defend opinions by making judgments about information,
validity of ideas, or quality of work based on a set of criteria.
Verbs: Agree, Appraise, Assess, Award, Choose, Compare, Conclude, Criteria,
Criticize, Decide, Deduct, Defend, Determine, Disprove, Estimate, Evaluate,
Explain, Importance, Influence, Interpret, Judge, Justify, Mark, Measure, Opinion,
Perceive, Prioritize, Prove, Rate, Recommend, Rule on, Select, Support, Value

Creating
Definition: Compile information together in a different way by combining elements
in a new pattern or proposing alternative solutions.
Verbs: Adapt, Build, Change, Choose, Combine, Compile, Compose, Construct,
Create, Delete, Design, Develop, Discuss, Elaborate, Estimate, Formulate, Happen,
Imagine, Improve, Invent, Make up, Maximize, Minimize, Modify, Original,
Originate, Plan, Predict, Propose, Solution, Solve, Suppose, Test, Theory
Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing, Abridged Edition. Boston, MA: Allyn and Bacon.
Lesson 4
Evaluation of Learning in the Psychomotor and Affective Domains
As pointed out in the previous lesson, there are three domains of learning
objectives that teachers have to assess. While it is true that achievement in the
cognitive domain is what teachers measure most frequently, the students' growth
in the non-cognitive domains of learning should also be given equal emphasis. This
lesson expounds different ways by which learning in the psychomotor and
affective domains can be assessed and evaluated.
The psychomotor domain is focused on the processes and skills involving the
mind and the body (Eby & Kujawa, 1994). It is the domain of learning which
classifies objectives dealing with physical movement and coordination (Arends,
1994; Simpson, 1966). Thus, objectives in the psychomotor domain require
significant motor performance. Playing a musical instrument, singing a song,
drawing, dancing, putting a puzzle together, reading a poem, and presenting a
speech are examples of skills developed in the aforementioned domain of
learning.
• Imitation
This is the ability to carry out the basic rudiments of a skill when given
directions and under supervision. At this level, the total act is not
performed skillfully. Timing and coordination of the act are not yet refined.
• Manipulation
This is the ability to perform a skill independently. The entire skill can be
performed in sequence. Conscious effort is no longer needed to perform
the skill, but complete accuracy has not been achieved yet.
• Precision
This is the ability to perform an act accurately, efficiently, and
harmoniously. Complete coordination of the skill has been acquired. The
skill has been internalized to such an extent that it can be performed
unconsciously.
• Articulation
This is the ability to coordinate and adapt a series of actions to achieve
harmony and internal consistency.
• Naturalization
Mastering a high level performance until it becomes second-nature or
natural, without needing to think much about it.
Imitation
Examples:
• Copying a work of art
• Performing a skill while observing a demonstrator
Key Words:
• Copy
• Follow
• Mimic
• Repeat
• Replicate
• Reproduce
• Trace
Manipulation
Examples:
• Being able to perform a skill on one's own after taking lessons or reading about it
• Following instructions to build a model
Key Words:
• Act
• Build
• Execute
• Perform
Precision
Examples:
• Working and reworking something so it will be "just right"
• Performing a skill or task without assistance
• Demonstrating a task to a beginner
Key Words:
• Calibrate
• Demonstrate
• Perfectionism
• Master
Articulation
Examples:
• Combining a series of skills to produce a video that involves music, drama, color, sound, etc.
• Combining a series of skills or activities to meet a novel requirement
Key Words:
• Adapt
• Construct
• Combine
• Create
• Customize
• Modify
• Formulate
Naturalization
Examples:
• Maneuvering a car into a tight parallel parking spot
• Operating a computer quickly and accurately
• Displaying competence while playing the piano
• Michael Jordan playing basketball
Key Words:
• Create
• Design
• Develop
• Invent
• Manage
• Do naturally
Measuring the Acquisition of Motor and Oral Skills
There are two approaches that teachers can use in measuring the acquisition of
motor and oral skills in the classroom:
This is an assessment approach in which the learner performs the desired skill in
the presence of the teacher. For instance, in a Physical Education class, the
teacher can directly observe how students dribble and shoot the
basketball. In this approach, the teacher observes the performance of a
student, gives feedback, and keeps a record of the student's performance, if
appropriate.
For example, a teacher who required his students to make an oral report
on a research project they undertook describes the factors which go into an ideal
presentation. What the teacher may consider in grading the report
includes the following: knowledge of the topic, organization of the
presentation of the report, enunciation, voice projection, and
enthusiasm. The ideal presentation has to be described, and the teacher
has to comment on each of these factors. A student whose presentation
closely matches the ideal described by the teacher would receive a
perfect mark.
This is another approach that teachers can use in assessing students' mastery of
skills: the evaluation of student products. For example, projects in the different
learning areas may be utilized in assessing students' progress. Student products
include drawings, models, construction paper products, etc.
There are four steps to consider in making use of this type of performance
assessment:
Portfolio assessment also needs to consider the setting in which the students’
performance will be gathered. Shall it be a written portfolio? Shall it be a
portfolio of oral or physical performances, science experiments, artistic
productions and the like? Setting has to be looked into, since arrangements
have to be made on how the desired performance can be properly collected.
A rating scale is a series of categories arranged in order of quality.
It can be helpful in judging skills, products, and procedures. According to Reyes
(2000), there are three steps to follow in constructing a rating scale:
• Identify the qualities of the product to be assessed. Create a scale for each
quality or performance aspect.
• Arrange the scales either from positive to negative or vice-versa.
• Write directions for accomplishing the rating scale.
Student Teacher____________________________________Date_____________
Subject __________________________________
Rate the student teacher on each of the skill areas specified below. Use the
following code: 5=outstanding, 4=very satisfactory, 3=satisfactory, 2=fair, 1=needs
improvement.
Encircle the number corresponding to your rating.
5 4 3 2 1 Audience Contact
5 4 3 2 1 Enthusiasm
5 4 3 2 1 Use of questions
5 4 3 2 1 Use of reinforcement
The rater would simply check the items that occurred during the conduct of group
experiment.
Another type of checklist requires a yes or no response. The yes is checked when
the action is done satisfactorily; the no is checked when the action is done
unsatisfactorily. Below is an example of this type of checklist.
Name___________________________________________Date_______________
Objectives in the affective domain are concerned with emotional development. Thus,
affective domain deals with attitudes, feelings, and emotions. Learning intent in this
domain of learning is organized according to the degree of internalization. Krathwohl
and his colleagues (1964) identified five levels of learning in the affective domain.
Although it is difficult to assess learning in the affective domain, there are some tools
that teachers can use in assessing learning in this area. Some of these tools are the
following: attitude scale, questionnaire, simple projective techniques, and self-
expression techniques.
Name____________________________________________Date______________
Response to the items is based on the response code provided in the attitude scale.
A value ranging from 1 to 5 is assigned to the options provided. The value of 5 is
usually assigned to the option “strongly agree” and 1 to the option “strongly
disagree”. When a statement is negative, however, the assigned values are usually
reversed. The composite score is determined by adding the scale values and
dividing the sum by the number of statements or items.
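The scoring rule can be sketched in Python; the function and parameter names are illustrative:

```python
def composite_score(responses, reverse_items):
    """Composite attitude score on a 1-5 scale: negatively worded items
    are reverse-scored (5 becomes 1, 4 becomes 2, ...), then the values
    are summed and divided by the number of items."""
    scored = [
        (6 - value) if i in reverse_items else value
        for i, value in enumerate(responses)
    ]
    return sum(scored) / len(scored)

# Five responses; the item at index 2 is a negative statement
print(composite_score([5, 4, 2, 5, 3], reverse_items={2}))  # 4.2
```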
• Semantic differential
This is another type of response on a questionnaire. It is usually a five-
point scale showing polar or opposite adjectives. It is designed so that
attitudes, feelings, and opinions can be measured by degrees from
very favorable to very unfavorable. Given below is an example of a
questionnaire employing the aforementioned response type.
• Likert scale
This is one of the most frequently used styles of response in attitude
measurement. It is oftentimes a five-point scale anchored by the options
"strongly agree" and "strongly disagree". An example of this is:
Name_______________________________________Date________
Student Leaders: