
Unit 3

QUALITIES OF ASSESSMENT TOOLS


Prof. Lorelie G. Sabando

CONTENT
1. Validity
2. Reliability
3. Fairness
4. Positive Consequences
5. Practicality and Efficiency

MODULE OUTCOMES
In this module you will be able to:
1. discuss validity and reliability;
2. compute and interpret the validity coefficient and reliability coefficient;
3. discuss fairness, positive consequences, and practicality and efficiency.

Lesson 1: VALIDITY
Lesson Outcomes:
At the end of the lesson, the learners must have:
1. discussed validity and its types; and
2. computed and interpreted the validity coefficient

ACTIVATE:

The quality of the assessment instruments and methods used in education is very
important, since the evaluation and judgment that a teacher makes about a student are
based on the information obtained using these instruments. Accordingly, teachers
follow a number of procedures to ensure that the entire assessment is valid and reliable.

ACQUIRE:

Validity refers to how accurately a method measures what it is intended to measure.
Types of Validity
1. Face Validity.
It refers to the outward appearance of the test. It is the lowest form of test
validity.
Face validity tells us nothing about what a test actually measures; it refers to how
test takers perceive the attractiveness and appropriateness of a test. Why then is it
important? If test takers consider the test to have face validity, they may put forth a
more conscientious effort to complete it. If a test does not have face validity, they
might hurry through it and take it less seriously.
2. Content Validity.
A type of validation that refers to the relationship between the test and the
instructional objectives; it establishes that the test content measures what it is
supposed to measure. Things to remember about content validity:
a. The evidence of the content validity of a test is found in the table of specifications.
b. This is the most important type of validity for a classroom teacher.
c. There is no coefficient for content validity. It is determined judgmentally by experts,
not empirically.
3. Criterion-related Validity. This is used to predict future or current performance; it
correlates test results with another criterion of interest.
Example: If a physics program designed a measure to assess cumulative
student learning throughout the major, the new measure could be correlated with a
standardized measure of ability in this discipline. The higher the correlation between the
established measure and the new measure, the more faith teachers can have in the
new assessment tool.
a. Concurrent validity.
The criterion and the predictor data are collected at the same time. This
type of validity is appropriate for tests designed to assess a student's current
criterion status, or when you want to diagnose a student's status; it makes for a
good diagnostic screening test. It is established by correlating the criterion and
the predictor using the Pearson product-moment correlation coefficient or other
statistical tools.

b. Predictive validity.
A type of validation that refers to the extent to which a student's current
test result can be used to estimate accurately the outcome of the student's
performance at a later time. It is appropriate for tests designed to assess a
student's future status on a criterion.
4. Construct Validity.

This is used to ensure that the measure actually measures what it is intended
to measure (the construct), and not other variables. Using a panel of experts who are
familiar with the construct is one way in which this type of validity can be assessed. The
experts examine the items and decide what each specific item is intended to
measure. Students can be involved in this process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment of learning
throughout the major. If the questions are written with complicated wording and phrasing,
the test can inadvertently become a test of reading comprehension rather than a test of
women’s studies. It is important that the measure actually assesses the intended
construct, rather than an extraneous factor.

A test has construct validity if it accurately measures a theoretical, non-observable
construct or trait. The construct validity of a test is worked out over a period of time on
the basis of an accumulation of evidence. There are a number of ways to establish
construct validity.

Two methods of establishing a test’s construct validity are convergent/divergent
validation and factor analysis.

a. Convergent/divergent validation. A test has convergent validity if it has a high
correlation with another test that measures the same construct. By contrast, a
test’s divergent validity is demonstrated through a low correlation with a test that
measures a different construct. Note: this is the only case where a low correlation
coefficient (between two tests that measure different traits) provides evidence of
high validity.
b. Factor analysis. Factor analysis is a complex statistical procedure which is
conducted for a variety of purposes, one of which is to assess the construct
validity of a test or a number of tests.

Important Things to Remember about Validity

1. Validity refers to the decisions we make, and not to the test itself or to the
measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never totally
absent or absolutely perfect.
3. A validity estimate, called a validity coefficient, refers to a specific type of
validity. It ranges between 0 and 1.
4. Validity can never be finally determined; it is specific to each administration
of the test.

Factors Affecting the Validity of a Test Item


1. The test itself.
2. The administration and scoring of a test.
3. Personal factors influencing how students respond to the test.
4. Validity is always specific to a particular group.

Reasons That Reduce the Validity of the Test Item


1. Poorly constructed test items
2. Unclear directions
3. Ambiguous test items
4. Vocabulary that is too difficult
5. Complicated syntax
6. Inadequate time limit.
7. Inappropriate level of difficulty
8. Unintended clues
9. Improper arrangement of test items
Ways to Improve Validity
1. Make sure your goals and objectives are clearly defined and achievable.
Expectations of students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally,
have the test reviewed by faculty from other schools to obtain feedback from an
outside party.
3. Get students involved; have the students look over the assessment for
troublesome wording, or other difficulties.
4. If possible, compare your measure with other measures, or data that may be
available.

Validity Coefficient
The validity coefficient is the computed value of the Pearson product-moment
correlation coefficient, $r_{xy}$.
Pearson Product Moment Correlation Coefficient ($r_{xy}$) formula:

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$

where:
r – correlation coefficient
n – the number of “pairs” in your data
x – values of the x-variable in the sample
x̄ – mean of the values of the x-variable
y – values of the y-variable in the sample
ȳ – mean of the values of the y-variable
∑x – the sum of all the x scores
∑y – the sum of all the y scores
∑x² – the sum of the squared x scores (square each x score before adding them up)
∑y² – the sum of the squared y scores (square each y score before adding them up)
∑xy – the sum of the products of each paired x and y score; this is called the cross-product

In theory, the validity coefficient, like any correlation, ranges from 0 to 1. In
practice, most validity coefficients are small, usually ranging from 0.3 to 0.5; few exceed
0.6 to 0.7. Hence, there is a lot of room for improvement in most of our
psychological measurements.

Another way of interpreting the findings is to consider the squared correlation
coefficient, $(r_{xy})^{2}$, called the coefficient of determination. The coefficient of
determination indicates how much variation in the criterion can be accounted for by
the predictor (the teacher's test). For example, if the computed value is $r_{xy} = 0.75$,
the coefficient of determination is $(0.75)^{2} = 0.5625$; that is, 56.25% of the variance in
student performance can be attributed to the test, and the remaining 43.75% of student
performance cannot be attributed to the test results.
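The computation is easy to script. Below is a minimal Python sketch (an illustrative addition, not part of the original module; the function name and sample scores are hypothetical) that computes the Pearson r for paired scores and the coefficient of determination:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists x and y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores: teacher-made test (x) vs. an established criterion (y).
x = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
y = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

r = pearson_r(x, y)
print(f"validity coefficient r = {r:.2f}")            # r = 0.91
print(f"coefficient of determination = {r * r:.4f}")  # proportion of criterion variance explained
```

The same helper applies to every Pearson-based computation in Lesson 2 below.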
Lesson 2: RELIABILITY

Lesson Outcomes:
At the end of the lesson, the learners must have:
1. discussed reliability and its types; and
2. computed and interpreted reliability coefficient of a test.

ACTIVATE:

Reliability and validity are closely related. To better understand this relationship, let's
step out of the world of testing and onto a bathroom scale.

If the scale is reliable, it tells you the same weight every time you step on it, as long as
your weight has not actually changed. However, if the scale is not working properly, this
number may not be your actual weight. If that is the case, this is an example of a scale
that is reliable, or consistent, but not valid. For the scale to be valid and reliable, not only
does it need to tell you the same weight every time you step on it, but it also has to
measure your actual weight.

Switching back to testing, the situation is essentially the same. A test can be reliable,
meaning that the test-takers will get the same score no matter when or where they take
it, within reason of course. But that doesn't mean that it is valid, or measuring what it is
supposed to measure. A test can be reliable without being valid. However, a test cannot
be valid unless it is reliable.

In order for assessments to be sound, they must be free from bias and distortion.
Reliability and validity are two concepts that are important for defining and measuring
bias and distortion. Reliability refers to the extent to which assessments are consistent.
Instruments such as classroom tests and national standardized exams should be
reliable: it should not make any difference whether a student takes the assessment in
the morning or afternoon, one day or the next.

ACQUIRE:

Types of Reliability
1. Test-retest Method.

Test-retest is a type of reliability determined by administering the same test twice
to the same group of students, with a time interval between the two administrations. The
two sets of test scores are correlated using the Pearson product-moment correlation
coefficient (r), and this correlation coefficient provides a measure of stability. It indicates
how stable the test results are over a period of time. The formula is:

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$
Why is it important?
Many factors can influence your results at different points in time: for example,
respondents might experience different moods, or external conditions might affect their
ability to respond accurately.

Test-retest reliability can be used to assess how well a method resists these
factors over time. The smaller the difference between the two sets of results, the higher
the test-retest reliability.

Test-retest reliability example

• A test designed to assess student learning in psychology could be given to a
group of students twice, with the second administration perhaps a week after the
first. The obtained correlation coefficient would indicate the stability of the scores.

Improving test-retest reliability

• When designing tests or questionnaires, try to formulate questions, statements
and tasks in a way that won’t be influenced by the mood or concentration of
participants.
• When planning your methods of data collection, try to minimize the influence of
external factors, and make sure all samples are tested under the same
conditions.
• Remember that changes can be expected to occur in the participants over time,
and take these into account.

2. Equivalent or Parallel Forms Reliability


A type of reliability determined by administering two different but equivalent forms
of the test (also called alternate forms) to the same group of students in close succession.
The equivalent forms are constructed to the same set of specifications, that is, similar in
content, type of items, and difficulty. The results of the test scores are correlated using
the Pearson product-moment correlation coefficient (r), and this correlation coefficient
provides a measure of the degree to which generalization about the performance of
students from one assessment to another is justified. It measures the equivalence of the
tests.

Why is it important?


If you want to use multiple different versions of a test (for example, to avoid
respondents repeating the same answers from memory), you first need to make sure
that all the sets of questions or measurements give reliable results.
Parallel forms reliability example
• If you want to evaluate the reliability of a critical thinking assessment, you might
create a large set of items that all pertain to critical thinking and then randomly
split the questions into two sets, which would represent the parallel forms.

Improving parallel forms reliability

• Ensure that all questions or test items are based on the same theory and
formulated to measure the same thing.

3. Inter-rater Reliability
It is a measure of reliability used to assess the degree to which different judges
or raters agree in their assessment decisions. Inter-rater reliability is useful because
human observers will not necessarily interpret answers the same way; raters may
disagree as to how well certain responses or material demonstrate knowledge of the
construct or skill being assessed.

Why is it important?


People are subjective, so different observers’ perceptions of situations and
phenomena naturally differ. Reliable research aims to minimize subjectivity as much as
possible so that a different researcher could replicate the same results.

Interrater reliability example

• Inter-rater reliability might be employed when different judges are evaluating the
degree to which art portfolios meet certain standards. It is especially useful when
judgments are relatively subjective, as in evaluating artwork as opposed to math
problems.

4. Internal Consistency Reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce similar results.

• Split-half Method. Administer the test once and score two equivalent halves of
the test. To split the test into halves that are equivalent, the usual procedure
is to score the even-numbered and the odd-numbered test items separately.
This provides two scores for each student. The two sets of scores are
correlated, and the Spearman-Brown formula is applied; the resulting
coefficient provides a measure of internal consistency. It indicates the degree
to which consistent results are obtained from the two halves of the test. The
formula is $r_{ot} = \frac{2r_{oe}}{1 + r_{oe}}$; its details will be discussed in
later lessons.
• Kuder-Richardson Formula. Administer the test once, score the total test,
and apply the Kuder-Richardson formula. The Kuder-Richardson 20 (KR-20)
formula is applicable only in situations where students' responses are scored
dichotomously, and is therefore most useful with traditional test items scored
as right or wrong, true or false, or yes or no. KR-20 reliability estimates
indicate the degree to which the items in the test measure the same
characteristic. (It is a statistical procedure used to estimate coefficient alpha;
a correlation coefficient is given.) Another formula for testing the internal
consistency of a test is KR-21, a simpler variant that requires only the number
of items, the mean, and the variance of the total scores, under the added
assumption that all items are of equal difficulty.

Why is it important?

When you devise a set of questions or ratings that will be combined into an overall
score, you have to make sure that all of the items really do reflect the same thing. If
responses to different items contradict one another, the test might be unreliable.

Improving internal consistency

• Take care when devising questions or measures: those intended to reflect the
same concept should be based on the same theory and carefully formulated.

Factors Affecting Reliability of a Test


1. Length of the test
2. Moderate item difficulty
3. Objective scoring
4. Heterogeneity of the student group
5. Limited time

RELIABILITY COEFFICIENT
Reliability coefficient is a measure of the amount of error associated with the test
scores.
Description of Reliability Coefficient
a. The range of the reliability coefficient is from 0 to 1.0.
b. An acceptable value is 0.60 or higher.
c. The higher the value of the reliability coefficient, the more reliable the overall
test scores.
d. Higher reliability indicates that the test items measure the same thing, for
example, knowledge of solving number problems in algebra.

1. Pearson Product Moment Correlation Coefficient (rxy)


$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$

2. Spearman-Brown Formula
$$ r_{ot} = \frac{2r_{oe}}{1 + r_{oe}} $$

where:
$r_{ot}$ = reliability of the original (whole) test
$r_{oe}$ = correlation between scores on the odd- and even-numbered items

3. KR-20 and KR-21 Formulas


The KR-20 and KR-21 formulas are also known as the Kuder-Richardson formulas.

$$ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^{2}}\right) $$

where:
k = number of items
p = proportion of the students who got the item correct (index of difficulty)
q = proportion of the students who got the item wrong (q = 1 − p)
s² = variance of the total scores

$$ KR_{21} = \frac{k}{k-1}\left[1 - \frac{\bar{x}(k - \bar{x})}{ks^{2}}\right] $$

where:
k = number of items
x̄ = mean of the total scores
s² = variance of the total scores
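To make the KR formulas concrete, here is a short Python sketch (an addition for illustration; the data layout and function names are assumptions) that computes KR-20 from a students-by-items matrix of 1/0 scores and KR-21 from summary statistics alone:

```python
from statistics import variance

def kr20(item_matrix):
    """KR-20 from a students x items matrix of dichotomous (1 = right, 0 = wrong) scores."""
    n = len(item_matrix)                 # number of students
    k = len(item_matrix[0])              # number of items
    totals = [sum(row) for row in item_matrix]
    s2 = variance(totals)                # sample variance of total scores
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n   # proportion correct on item j
        sum_pq += p * (1 - p)                        # p * q for item j
    return (k / (k - 1)) * (1 - sum_pq / s2)

def kr21(k, mean_score, var_score):
    """KR-21 from the number of items, the mean, and the variance of total scores."""
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * var_score))

# Using the summary statistics from Example 4 below (40 items, mean 27, variance 70.14):
print(f"KR-21 = {kr21(40, 27, 70.14):.2f}")  # KR-21 = 0.90
```

Note that KR-21 needs no item-level data at all, which is why it is convenient for classroom use.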

Interpreting Reliability Coefficient


1. The group variability will affect the size of the reliability coefficient. A higher
coefficient results from a heterogeneous group than from a homogeneous
group. As group variability increases, reliability goes up.
2. Scoring reliability limits test score reliability. If tests are scored unreliably, error
is introduced. This will limit the reliability of the test scores.
3. Test length affects test score reliability. As the length increases, the test's
reliability tends to go up.
4. Item difficulty affects test score reliability. As test items become very easy or
very hard, the test's reliability goes down.
Levels of Reliability Coefficient

Reliability Coefficient | Interpretation
Above 0.90 | Excellent reliability
0.81 – 0.90 | Very good for a classroom test
0.71 – 0.80 | Good for a classroom test. There are probably a few items that need to be improved
0.61 – 0.70 | Somewhat low. The test needs to be supplemented by other measures (more tests) to determine grades
0.51 – 0.60 | Suggests need for revision of the test, unless it is quite short (ten or fewer items). Needs to be supplemented by other measures (more tests) for grading
0.50 and below | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision
Source: ScorePak®: Item Analysis

APPLY:
Let us discuss the steps in solving for the reliability coefficient using the different
methods of establishing the reliability of tests, through the following examples.

Example 1: Prof. Gwen administered a test to her 10 students in Elementary Statistics class
twice, with a one-day interval. The test given after one day was exactly the same test given
the first time. The scores below were gathered in the first test (FT) and second test (ST).

Using the test-retest method, is the test reliable? Show the complete solution.

Student FT ST
1 36 38
2 26 34
3 38 38
4 15 27
5 17 25
6 28 26
7 32 35
8 35 36
9 12 19
10 35 38
Using the Pearson r formula, find ∑x, ∑y, ∑xy, ∑x², ∑y².

Student FT (x) ST (y) xy x² y²

1 36 38 1 368 1 296 1 444
2 26 34 884 676 1 156
3 38 38 1 444 1 444 1 444
4 15 27 405 225 729
5 17 25 425 289 625
6 28 26 728 784 676
7 32 35 1 120 1 024 1 225
8 35 36 1 260 1 225 1 296
9 12 19 228 144 361
10 35 38 1 330 1 225 1 444

n = 10 ∑x = 274 ∑y = 316 ∑xy = 9 192 ∑x² = 8 332 ∑y² = 10 400

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$

$$ r_{xy} = \frac{10(9\,192) - (274)(316)}{\sqrt{[10(8\,332) - (274)^{2}][10(10\,400) - (316)^{2}]}} $$

$$ r_{xy} = \frac{91\,920 - 86\,584}{\sqrt{(83\,320 - 75\,076)(104\,000 - 99\,856)}} $$

$$ r_{xy} = \frac{5\,336}{\sqrt{(8\,244)(4\,144)}} = \frac{5\,336}{\sqrt{34\,163\,136}} = \frac{5\,336}{5\,844.92} = 0.91 $$

Analysis:
The reliability coefficient using the Pearson r is 0.91, which means the test has very
high reliability. The scores of the 10 students on the test administered twice with a
one-day interval are consistent. Hence, the test has very high reliability.
Example 2: Prof. Glenn administered a test to his 10 students in his Mathematics class
twice, with a one-week interval. The test given after one week was a parallel form of
the test given the first time. The scores below were gathered in the first test (FT) and
the second, parallel test (PT). Using the equivalent or parallel forms method, is the test
reliable? Show the complete solution using the Pearson r formula.

Student FT PT

1 12 20
2 20 22
3 19 23
4 17 20
5 25 25
6 22 20
7 15 19
8 16 18
9 23 25
10 21 24

Using the Pearson r formula, find ∑x, ∑y, ∑xy, ∑x², ∑y².

Student FT (x) PT (y) xy x² y²

1 12 20 240 144 400


2 20 22 440 400 484
3 19 23 437 361 529
4 17 20 340 289 400
5 25 25 625 625 625
6 22 20 440 484 400
7 15 19 285 225 361
8 16 18 288 256 324
9 23 25 575 529 625
10 21 24 504 441 576
n = 10 ∑x = 190 ∑y = 216 ∑xy = 4 174 ∑x² = 3 754 ∑y² = 4 724

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$

$$ r_{xy} = \frac{10(4\,174) - (190)(216)}{\sqrt{[10(3\,754) - (190)^{2}][10(4\,724) - (216)^{2}]}} $$

$$ r_{xy} = \frac{41\,740 - 41\,040}{\sqrt{(37\,540 - 36\,100)(47\,240 - 46\,656)}} = \frac{700}{\sqrt{(1\,440)(584)}} = \frac{700}{\sqrt{840\,960}} = \frac{700}{917.04} = 0.76 $$

Analysis:
The reliability coefficient using the Pearson r is 0.76, which means the test has high
reliability. The scores of the 10 students on the test administered twice with a one-week
interval are consistent. Hence, the test has high reliability.

Example 3: Prof. Edwin Santos administered a test to his 10 students in his Chemistry
class. The test was given only once. The students' scores on the odd items (O) and
even items (E) were gathered below. Using the split-half method, is the test reliable?
Show the complete solution.

Odd (x) Even (y)


15 20
19 17
20 24
25 21
20 23
18 22
19 25
26 24
20 18
18 17

Use the formula $r_{ot} = \frac{2r_{oe}}{1 + r_{oe}}$ to find the reliability of the whole test.
Find ∑x, ∑y, ∑xy, ∑x², ∑y² to solve for the correlation between the odd and even test items.
Odd (x) Even (y) xy x² y²
15 20 300 225 400
19 17 323 361 289
20 24 480 400 576
25 21 525 625 441
20 23 460 400 529
18 22 396 324 484
19 25 475 361 625
26 24 624 676 576
20 18 360 400 324
18 17 306 324 289
∑x = 200 ∑y = 211 ∑xy = 4 249 ∑x² = 4 096 ∑y² = 4 533

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[\,n\sum x^{2} - (\sum x)^{2}\,][\,n\sum y^{2} - (\sum y)^{2}\,]}} $$

$$ r_{xy} = \frac{10(4\,249) - (200)(211)}{\sqrt{[10(4\,096) - (200)^{2}][10(4\,533) - (211)^{2}]}} $$

$$ r_{xy} = \frac{42\,490 - 42\,200}{\sqrt{(40\,960 - 40\,000)(45\,330 - 44\,521)}} = \frac{290}{\sqrt{(960)(809)}} = \frac{290}{\sqrt{776\,640}} = \frac{290}{881.27} = 0.33 $$
Find the reliability of the whole test using the formula:

$$ r_{ot} = \frac{2r_{oe}}{1 + r_{oe}} = \frac{2(0.33)}{1 + 0.33} = \frac{0.66}{1.33} = 0.50 $$
Analysis:

The reliability coefficient using the Spearman-Brown formula is 0.50, which indicates
questionable reliability. Hence, the test items should be revised.
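To check Example 3 programmatically, the following Python sketch (an illustrative addition, reusing the pearson_r idea from the Lesson 1 sketch) correlates the odd and even half-scores and then applies the Spearman-Brown correction:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

odd = [15, 19, 20, 25, 20, 18, 19, 26, 20, 18]    # scores on odd-numbered items
even = [20, 17, 24, 21, 23, 22, 25, 24, 18, 17]   # scores on even-numbered items

r_oe = pearson_r(odd, even)     # half-test correlation, about 0.33
r_ot = 2 * r_oe / (1 + r_oe)    # Spearman-Brown correction for the full test
print(f"r_oe = {r_oe:.2f}, r_ot = {r_ot:.2f}")  # r_oe = 0.33, r_ot = 0.50
```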
Example 4: Ms. Kaitlin administered a 40-item test in English to her Grade VI pupils in
Miagao Central Elementary School. Below are the scores of 15 pupils; find the reliability
using the Kuder-Richardson 21 (KR-21) formula.

Student Score (x)


1 16
2 25
3 35
4 39
5 25
6 18
7 19
8 22
9 33
10 36
11 20
12 17
13 26
14 35
15 39

Solve for the mean and the variance of the scores using the table below.

Student Score (x) x²

1 16 256
2 25 625
3 35 1 225
4 39 1 521
5 25 625
6 18 324
7 19 361
8 22 484
9 33 1 089
10 36 1 296
11 20 400
12 17 289
13 26 676
14 35 1 225
15 39 1 521
n = 15 ∑x = 405 ∑x² = 11 917

Variance formula: $s^{2} = \dfrac{n\sum x^{2} - (\sum x)^{2}}{n(n-1)}$

Applying the formula, we have:

$$ s^{2} = \frac{15(11\,917) - (405)^{2}}{15(15-1)} = \frac{178\,755 - 164\,025}{15(14)} = \frac{14\,730}{210} = 70.14 $$

Mean formula: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{405}{15} = 27$
Solve for the reliability coefficient using the Kuder-Richardson 21 formula.

$$ KR_{21} = \frac{k}{k-1}\left[1 - \frac{\bar{x}(k - \bar{x})}{ks^{2}}\right] = \frac{40}{40-1}\left[1 - \frac{27(40 - 27)}{40(70.14)}\right] $$

$$ KR_{21} = 1.0256\left[1 - \frac{351}{2\,805.60}\right] = 1.0256(1 - 0.1251) = 1.0256(0.8749) = 0.90 $$
Analysis:
The reliability coefficient using the KR-21 formula is 0.90, which means the test has
very good reliability; it is very good for a classroom test.
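Example 4 can likewise be verified in a few lines of Python (an illustrative addition; statistics.variance uses the same n − 1 denominator as the variance formula above):

```python
from statistics import mean, variance

scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
k = 40                   # number of items in the test

x_bar = mean(scores)     # 27
s2 = variance(scores)    # about 70.14

kr21 = (k / (k - 1)) * (1 - x_bar * (k - x_bar) / (k * s2))
print(f"mean = {x_bar}, variance = {s2:.2f}, KR-21 = {kr21:.2f}")  # KR-21 = 0.90
```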

ASSESS:

A. Identify the type of Reliability that is being described.

Measures the consistency of…


a. The same test conducted by different people
b. The individual items of a test.
c. Different versions of a test which are designed to be equivalent.
d. The same test over time.

B. Using what you’ve learned about reliability, brainstorm possible answers to this
problem.

1. Mr. Cruz teaches Intermediate Algebra. Parents are very enthusiastic about the
school’s high standards and international focus, and students are generally eager
to excel in their classes. Mr. Cruz administers achievement tests to his students
every quarter, and he often finds himself explaining the results to parents and
students.
How could Mr. Cruz handle each of the situations below?
a. I heard someone say that the test they took had different questions (Set A)
from mine (Set B). It sounds like theirs was easier. I don’t think that’s fair.
Could I take the test again?
b. I do not think the score that my child got on this test is accurate.
Could you please rescore it?
c. I was having a bad day and I know I did not do as well as I could
have on the test. Could I re-take it?

C. Compute the following and interpret the result:


Using your validated 20-item multiple-choice test from Unit 3, Lesson 1,
administer the test to 20 respondents (assuming you followed the guidelines in conducting
a test). The test will be given only once.
Due to COVID-19 pandemic, you can just have your classmates as your respondents.
a. Using the split-half method, is the test reliable? Show the complete solution.
Note: The students' scores on the odd items (O) and even items (E) will be
collected.

b. Find the reliability using Kuder-Richardson 21(KR-21) formula.


Lesson 3: FAIRNESS
Lesson Outcomes:
At the end of the lesson, the learners must have discussed fairness and its importance
in assessment.
ACTIVATE:

As far back as assessments have been used in education, there have been students
who stated that the assessment was ‘unfair’. Take a look at the picture on the left.
What can you say about the situation? For you, what makes an assessment ‘fair’ or
‘unfair’?

ACQUIRE:

Fairness means the test item should not have any biases. It should not be
offensive to any examinee subgroup. A test can only be good if it is fair to all the
examinees.
An assessment procedure needs to be fair. This means many things. First,
students need to know exactly what the learning targets are and what method of
assessment will be used. If students do not know what they are supposed to be
achieving, then they could get lost in the maze of concepts being discussed in class.
Likewise, students have to be informed how their progress will be assessed in order to
allow them to strategize and optimize their performance.
Second, assessment has to be viewed as an opportunity to learn rather than an
opportunity to weed out poor and slow learners. The goal should be that of diagnosing
the learning process rather than grading the learning product.
Third, fairness also implies freedom from teacher stereotyping. Some
examples of stereotyping include: boys are better than girls in Mathematics or girls are
better than boys in language. Such stereotyped images and thinking could lead to
unnecessary and unwanted biases in the way that teachers assess their students.

No one wants to use an assessment tool with obvious stereotyping or offensive material,
of course. But it's easy to assess in ways that inadvertently favor some students over others.
Effective assessment processes yield evidence and conclusions that are meaningful,
appropriate, and fair to all relevant subgroups of students (Lane, 2012; Linn, Baker, & Dunbar,
1991). The following tips minimize the possibility of inequities.
1. Don't rush.
Assessments that are thrown together at the last minute invariably include flaws that
greatly affect the fairness, accuracy, and usefulness of the resulting evidence.

2. Plan Your Assessments carefully.


Aim not only to assess your key learning goals but to do so in a balanced, representative
way. If your key learning goals are that students should understand what happened during a
certain historical period and evaluate the decisions made by key figures during that period, for
example, your test should balance questions on basic conceptual understanding with questions
assessing evaluation skills.

3. Aim for Assignments and Questions That Are Crystal clear.


If students find the question difficult to understand, they may answer what they think is
the spirit of the question rather than the question itself, which may not match your intent.

4. Guard Against Unintended bias.


A fair and unbiased assessment uses contexts that are equally familiar to all and uses
words that have common meanings to all. A test question on quantitative skills that asks
students to analyze football statistics might not be fair to women, and using scenarios involving
farming may be biased against students from urban areas, unless you are specifically assessing
student learning in these contexts.

5. Ask a Variety of People with Diverse Perspectives to Review Assessment tools.

This helps ensure that the tools are clear, that they appear to assess what you want
them to, and that they don't favor students of a particular background.

6. Try Out Large-Scale Assessment tools.


If you are planning a large-scale assessment with potentially significant consequences,
try out your assessment tool with a small group of students before launching the large-scale
implementation. Consider asking some students to think out loud as they answer a test
question; their thought processes should match up with the ones you intended. Read students'
responses to assignments and open-ended survey questions to make sure their answers make
sense, and ask students if anything is unclear or confusing.

(Excerpted and adapted from Assessing Student Learning: A Common Sense Guide, 3rd
Edition by Linda Suskie. Copyright © 2018, Wiley.)

Practices that lead to fairness in these areas are considered separately below. Fairness
in assessment arises from good practice in four phases of testing:

a. Writing
b. Administering
c. Scoring
d. Interpreting assessments.
a. Writing assessments. Base assessments on course objectives. Students expect a
test to cover what they have been learning. They also have a right to a test that neither
“tricks” them into wrong answers nor rewards them if they can get a high score through
guessing or bluffing.

Cover the full range of thinking skills and processes. Assuming instruction has included
higher order thinking, an assessment based on that instruction should prompt students
to use the material intellectually, not merely repeat memorized knowledge.
Further, if a teacher’s tests cover only memorization, the students will emphasize only
memorization of facts in their preparation.

Cover course content proportionally to coverage in instruction. The content areas on the
test should be representative of what students have studied. The best guide to
appropriate proportions is the relative amounts of instructional time spent on those
topics.

Test what is important for students to know and be able to do rather than isolated trivia.
The best guide to the content most appropriate for the test is to cover what is important
for students to come away with from the course. When writing a test, ask yourself
whether each task is what other teachers would agree is important when teaching that
course. Better, ask a colleague to review your draft test.

Avoid contexts and expressions that are more familiar and/or intriguing to some
students than to others. One challenge in writing tests is to make sure none of your
students are advantaged or disadvantaged because of their different backgrounds. For
example, music, sports, or celebrity-related examples might be appealing to some
students but not others. Language or topics should not be used if they are better known
or more interesting to some students than to others. If that proves impossible, then at
least make sure the items that favor some students are balanced with others that favor
the rest.

b. Giving assessments. Make sure students have had equal opportunities to learn the
material on the assessment. Whether or not students have learned as much as they
can, at least they should have had equal chances to do so. If some students are given
extra time or materials that are withheld from others, the others likely will not feel they
have been treated fairly.

Announce assessments in plenty of time for students to prepare for them. Since
students’ learning styles differ, some will keep up to date with their studying and others
will prefer to put in extra effort when it is most needed. Surprise assessments reward
the former and punish the latter. But these styles are not part of the material to be
learned. Not only is it more fair to announce assessments in advance, it also serves as
a motivator for students to study.

Make sure students are familiar with the formats they will be using to respond. If some
students are not comfortable with the types of questions on an assessment, they will not
have an equal chance to show what they can do. If that might be the case, some
practice with the format beforehand is recommended to help them succeed.

Give plenty of time. Most tests in education do not cover content that will eventually be
used under time pressure. Thus, most assessments should reward quality instead of
speed. Only by allowing enough time so virtually all students have an opportunity to
answer every question will the effects of speed be eliminated as a barrier to
performance.

c. Scoring assessments. Make sure the rubric used to score responses awards full
credit to an answer that is responsive to the question asked as opposed to requiring
more information than requested for full credit. If the question does not prompt the
knowledgeable student to write an answer that receives full credit, then it should be
changed. It is unfair to reward some students for doing more than has been requested
in the item; not all students will understand the real (and hidden) directions since they
have not been told.

d. Interpreting assessments. Base grades on summative, end-of-unit assessments


rather than formative assessments that are used to make decisions about learning as it
is progressing. The latter are intended as diagnostic and to be used to help accomplish
learning. Since grades certify attainment, they should be determined based on
assessments made after learning has taken place.

Base grades on several assessment formats. Since students differ in their preferred
assessment formats, some are advantaged by selected-response tests, others by essay
tests, others by performance assessments, and still others by papers and projects.

Base grades on several assessments over time. As with assessment formats, grades
should also depend on multiple assessments taken at different times.

Make sure that factors known to have caused atypical performance for a student are
used to minimize the weight given to the student’s score on that assessment. If it is
known that a student has not done his or her best, then basing a grade or other
important decision on that assessment is not only unfair; it is inaccurate.

Retrieved from:
https://wps.ablongman.com/ab_slavin_edpsych_8/38/9954/2548362.cw/content/index.html

APPLY:

Example:
Suppose a high-stakes math test that must be passed contained a lot of word
problems based on competitive sports examples that many more boys than girls were
familiar with. The girls may perform lower than the boys because they are less familiar
with the sports contexts of the word problems, not because they are less skilled in math
(Popham 128).

Suppose that in most test items in an exam, men are shown in high-paying,
respected careers, and women are portrayed in low-paying jobs. Many women would be
offended by this and may perform less well than they normally would have (Popham
128 - 129).

ASSESS:

Answer the following:


1. What is Fairness? Why is Fairness important in Assessment?

2. If a question asks about attending an amusement park that costs a lot, have
all your students had the chance to visit one? Is it fair to ask such questions of
economically challenged students who have not had that experience? Why?

3. If a question states or insinuates within its content that girls are not good at
sports, some students will be upset while taking the test. Can you take a test
well while being upset and offended? Why?
Lesson 4: POSITIVE CONSEQUENCES, PRACTICALITY, AND
EFFICIENCY
Lesson Outcomes:
At the end of the lesson, the learners must have discussed positive consequences and
practicality and efficiency in assessment.

ACTIVATE:

In your own words, explain the meaning of the quotation on the left.

ACQUIRE:
A. Positive Consequences

Positive consequences, or positive reinforcements, are still responses to an
action, but this time, the response is to something good that the student is doing. It can
still be a learning opportunity, but the response has to be something that will make the
students want to continue the behaviors that you want to see more of.
For example, when you are teaching students to raise their hand to answer a
question, you can give them a piece of candy or a high five when they do so. The
positive consequence of a candy or high five will reinforce the positive behavior of
raising their hand after they answer a question.
Oftentimes, when we give a negative consequence in the classroom, we are
reinforcing a behavior because we are giving that outburst attention, which may be
exactly what the student wants. We can reverse this idea by giving praise to a student
who is on task, showing kids that they get attention when they are following the rules
and acting within the boundaries you set as the teacher.

Quality Positive Consequences for Teachers and Students


• Students will learn and study.
• There are positive consequences on student motivation.
• The relationship between student & teacher is strengthened.
• Teachers focus teaching towards the assessment.
• Better decisions are made about student needs.
• Teachers get accurate perceptions of others.
What are the positive consequences of assessment?
1. Students will be accustomed to reforming their study habits to reflect the
teacher’s assessment style (essay questions will demand a different style of
studying).
2. Appropriate assessment may increase student motivation due to the fact that the
student will understand the assessment style and the feedback provided by the
teacher.
3. Motivation will increase by having appropriate assessment designs restructured.
4. The teacher will structure the instruction to reflect the impending assessment
designs.

B. Practicality and Efficiency

Practicality and Efficiency of Assessment of Student Learning


Teachers need to be familiar with the tools of Assessment. In the development
and use of classroom assessment tools, certain issues must be addressed in relation to
the following important criteria.
1. Purpose and Impact. How will the assessment be used and how will it
impact instruction and the selection of curriculum?
2. Validity and Fairness. Does it measure what it intends to measure? Does it
allow students to demonstrate both what they know and what they are able to do?
3. Reliability. Is the data that is collected reliable across applications within the
classroom, school, and district?
4. Significance. Does it address content and skills that are valued by and
reflect current thinking in the field?
5. Efficiency. Is the method of assessment consistent with the time available in
the classroom setting?

According to Rehman (2007), usability or practicality is another important
characteristic of a good test. It deals with all the practical considerations that go into the
decision to use a particular test. While constructing or selecting a test, practical
considerations must be taken into account. Rehman (2007) gives the following five
practical considerations.

1. Ease of administration
The test should be easy to administer. For this purpose, it should have simple and
clear instructions, few subtests, and an appropriate (not too long) administration time.

2. Time required for administration

Appropriate time should be provided to take the test; if the time is reduced, the
reliability of the test is also reduced. A safe procedure is to allocate as much time as the
test requires to provide reliable and valid results. Between 20 and 60 minutes is a fairly
good time for each individual score yielded by a published test.
3. Ease of interpretation and application
Another important aspect of the practicality of test scores is the interpretation and
application of test results. If results are misinterpreted, that will be harmful to the
students. On the other hand, if they are misapplied or not applied at all, then the test
is useless.

4. Availability of equivalent forms


Equivalent forms of a test help to verify the test scores. Retesting at the same time on
the same domain of learning eliminates the memory factor among the students. The
availability of equivalent forms of the test should be kept in mind while
constructing/selecting a test.

5. Cost of testing
A test should be economical in terms of preparation, administration and scoring.

Importance of practicality of a test


Teachers, particularly untrained teachers, can easily administer tests which have
been constructed with practicality in mind. Parents can be informed with correct test
results, which they will use in decision making about their children, if these practical
considerations have been taken care of while constructing the test. Economical tests
may save unnecessary expenses on stationery, print materials, photocopies, and so on.
True interpretations of test scores will be used by the students in their own plans and
decisions (Linn & Gronlund, 2000).

Limitations
• There are chances of giving wrong directions to students by untrained teachers while
constructing or administering the tests.
• If time is reduced for taking test, the reliability of the test is reduced.
(Retrieved from: https://acadstuff.blogspot.com/2017/06/practicalityusability-characteristic-of_73.html)

APPLY:
Give one example of a positive consequence in assessment.
Give one example where practicality and efficiency are considered in assessment.

ASSESS:

Answer the following questions:


1. What is a positive consequence? Why is it important in assessment?
2. What is practicality and efficiency? Why is it important in assessment?

UNIT REFERENCES

Airasian, P. W. (2000). Assessment in the classroom: A concise approach (2nd ed.).
USA: McGraw-Hill Companies, Inc.

De Guzman, E. S., & Adamos, J. L. (2015). Assessment of Learning 1. Adriana
Printing Company.

Gabuyo, Y. A. (2014). Assessment of Learning I: Textbook and Reviewer. Quezon
City: Rex Printing Company, Inc.

McMillan, J. H. (2017). Classroom Assessment: Principles and Practice for Effective
Standards-Based Instruction (7th ed.). USA: Pearson.

Navarro, R. L., Santos, R. G., & Corpuz, B. B. (2017). Assessment of Learning 1
(3rd ed.). Quezon City, Phils.: Lorimar Publishing, Inc.

Russell, M. K., & Airasian, P. W. (2011). Classroom Assessment: Concepts and
Applications (7th ed.). McGraw-Hill Education.
