Statistical Tools Used in Establishing Validity and Reliability


RELIABILITY: DEFINITION

 Reliability refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.

RELIABILITY COEFFICIENT

The reliability coefficient is a measure of the amount of error associated with the test scores.

Description of the reliability coefficient:
a. The minimum acceptable value is 0.70, but higher values are preferable.
b. The higher the value of the reliability coefficient, the more reliable the overall test score.
c. Higher reliability indicates that the test items measure the same thing.
LEVEL OF RELIABILITY COEFFICIENT

Reliability Coefficient   Interpretation
Above 0.90                Excellent reliability.
0.81–0.90                 Very good for a classroom test.
0.71–0.80                 Good for a classroom test; a few items probably need to be improved.
0.61–0.70                 Somewhat low. The test needs to be supplemented by other measures (more tests) to determine grades.
0.51–0.60                 Suggests a need for revision of the test, unless it is quite short (ten or fewer items). Needs to be supplemented by other measures (more tests) for grading.
0.50 and below            Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

SOURCE: SCOREPAK®: ITEM ANALYSIS
CORRELATION INTERPRETATION

r             Interpretation
0.00          Zero correlation
0.01–0.20     Negligible correlation
0.21–0.40     Low or slight correlation
0.41–0.70     Moderate relationship
0.71–0.90     High relationship
0.91–0.99     Very high relationship
1.00          Perfect correlation
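Where a computed r has to be labeled in code, a small helper implementing the table above can be handy (a minimal sketch; the function name, the boundary handling, and the use of the absolute value are my reading of the ranges, not part of the original table):

def interpret_r(r):
    """Map a correlation coefficient to the verbal label in the table above.
    Uses the absolute value, so negative correlations get the same labels."""
    r = abs(r)
    if r == 0:
        return "Zero correlation"
    if r <= 0.20:
        return "Negligible correlation"
    if r <= 0.40:
        return "Low or slight correlation"
    if r <= 0.70:
        return "Moderate relationship"
    if r <= 0.90:
        return "High relationship"
    if r < 1.0:
        return "Very high relationship"
    return "Perfect correlation"

print(interpret_r(0.91))  # Very high relationship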
RELIABILITY: TEST-RETEST

 A type of reliability determined by administering the same test twice to the same group of students, with a time interval between the two administrations.
 The results of the two administrations are correlated using the Pearson product-moment correlation coefficient (r); this correlation coefficient provides a measure of stability.
 It indicates how stable the test results are over a period of time.
 The formula is

$r_{xy} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
Example 1: Prof. Henry administered a test to the 10 students in his Elementary Statistics class twice, with a one-day interval. The test given after one day was exactly the same test given the first time. The scores below were gathered on the first test (FT) and the second test (ST). Using the test-retest method, is the test reliable?

Student   FT   ST
1         36   38
2         26   34
3         38   38
4         15   27
5         17   25
6         28   26
7         32   35
8         35   36
9         12   19
10        35   38
Solution:

Student   FT(x)   ST(y)   xy     x²     y²
1         36      38      1368   1296   1444
2         26      34      884    676    1156
3         38      38      1444   1444   1444
4         15      27      405    225    729
5         17      25      425    289    625
6         28      26      728    784    676
7         32      35      1120   1024   1225
8         35      36      1260   1225   1296
9         12      19      228    144    361
10        35      38      1330   1225   1444
n = 10    Σx = 274   Σy = 316   Σxy = 9192   Σx² = 8332   Σy² = 10400
Substitute into the formula:

$r_{xy} = \frac{10(9192) - (274)(316)}{\sqrt{[10(8332) - 274^2][10(10400) - 316^2]}}$

$r_{xy} = 0.91$

Analysis:
The reliability coefficient using the Pearson r is 0.91, which indicates very high reliability. The scores of the 10 students on the test administered twice with a one-day interval are consistent. Hence, the test has very high reliability.
Correlation in Microsoft Excel
[Screenshot: the worksheet correlation formula and the resulting reliability coefficient.]
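For readers without Excel, here is a minimal Python sketch of the same computation, applying the raw-score Pearson formula above to the Example 1 data (the function name is mine):

from math import sqrt

def pearson_r(x, y):
    """Raw-score Pearson product-moment correlation coefficient."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Example 1: first-test (FT) and second-test (ST) scores of the 10 students
ft = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
st = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]
print(round(pearson_r(ft, st), 2))  # 0.91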
RELIABILITY: EQUIVALENT FORM

 A type of reliability determined by administering two different but equivalent forms of the test (also called parallel or alternate forms) to the same group of students in close succession.
 The results of the two forms are correlated using the Pearson product-moment correlation coefficient (r); this correlation coefficient provides a measure of the degree to which generalization about students' performance from one assessment to another is justified.
 It measures the equivalence of the tests.
 The formula is

$r_{xy} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
Example 2: Prof. Glenn administered a test to the 10 students in his Biology class twice, with a one-week interval. The test given after one week was the parallel form of the test given the first time. The scores below were gathered on the first test (FT) and the second, parallel test (PT). Using the equivalent (parallel) form method, is the test reliable?

Student   FT   PT
1         12   20
2         20   22
3         19   23
4         17   20
5         25   25
6         22   20
7         15   19
8         16   18
9         23   25
10        21   24
Solution:

Student   FT(x)   PT(y)   xy    x²    y²
1         12      20      240   144   400
2         20      22      440   400   484
3         19      23      437   361   529
4         17      20      340   289   400
5         25      25      625   625   625
6         22      20      440   484   400
7         15      19      285   225   361
8         16      18      288   256   324
9         23      25      575   529   625
10        21      24      504   441   576
n = 10    Σx = 190   Σy = 216   Σxy = 4174   Σx² = 3754   Σy² = 4724
Substitute into the formula:

$r_{xy} = \frac{10(4174) - (190)(216)}{\sqrt{[10(3754) - 190^2][10(4724) - 216^2]}}$

$r_{xy} = 0.76$

Analysis:
The reliability coefficient using the Pearson r is 0.76, which indicates high reliability. The scores of the 10 students on the two forms administered one week apart are consistent. Hence, the test has high reliability.
Correlation in Microsoft Excel
[Screenshot: the worksheet correlation formula and the resulting reliability coefficient.]
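For Example 2, the same check can be done with the Pearson correlation built into Python's standard library (statistics.correlation, available in Python 3.10+):

from statistics import correlation  # Pearson product-moment r, Python 3.10+

ft = [12, 20, 19, 17, 25, 22, 15, 16, 23, 21]  # first test
pt = [20, 22, 23, 20, 25, 20, 19, 18, 25, 24]  # parallel test
print(round(correlation(ft, pt), 2))  # 0.76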
RELIABILITY: SPLIT-HALF METHOD

 Administer the test once and score two equivalent halves of the test.
 To split the test into halves that are equivalent, the usual procedure is to score the even-numbered and the odd-numbered test items separately.
 The results of the two halves are correlated using the Pearson product-moment correlation coefficient (r) and then adjusted with the Spearman-Brown formula.
 It provides a measure of internal consistency.
 The formulas are

$r_{xy} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

$r_{ot} = \frac{2r_{oe}}{1 + r_{oe}}$

Note: $r_{xy} = r_{oe}$, the correlation between the odd and even halves; $r_{ot}$ is the estimated reliability of the whole test.
Example 3: Prof. Maxene administered a test to her 10 students in her Chemistry class. The test was given only once. The students' scores on the odd items (O) and even items (E) were gathered as shown below. Using the split-half method, is the test reliable?

Odd(x)   Even(y)
15       20
19       17
20       24
25       21
20       23
18       22
19       25
26       24
20       18
18       17
Solution:

Odd(x)   Even(y)   xy    x²    y²
15       20        300   225   400
19       17        323   361   289
20       24        480   400   576
25       21        525   625   441
20       23        460   400   529
18       22        396   324   484
19       25        475   361   625
26       24        624   676   576
20       18        360   400   324
18       17        306   324   289
Σx = 200   Σy = 211   Σxy = 4249   Σx² = 4096   Σy² = 4533
Substitute into the formula:

$r_{oe} = r_{xy} = \frac{10(4249) - (200)(211)}{\sqrt{[10(4096) - 200^2][10(4533) - 211^2]}}$

$r_{oe} = r_{xy} = 0.33$

To find the reliability of the whole test, use the Spearman-Brown formula:

$r_{ot} = \frac{2r_{oe}}{1 + r_{oe}} = \frac{2(0.33)}{1 + 0.33}$

$r_{ot} = 0.50$

The reliability coefficient is 0.50, which indicates questionable reliability. Hence, the test items should be revised.
Correlation and Spearman-Brown formula in Microsoft Excel
[Screenshot: the worksheet correlation formula (r_oe), the Spearman-Brown formula, and the resulting reliability coefficient.]
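A sketch of the whole split-half procedure for Example 3 in Python: the raw-score Pearson formula on the half-test scores, followed by the Spearman-Brown step given above (the function name is mine):

from math import sqrt

def pearson_r(x, y):
    # Raw-score Pearson formula, as given above.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Example 3: per-student scores on the odd-numbered and even-numbered items
odd = [15, 19, 20, 25, 20, 18, 19, 26, 20, 18]
even = [20, 17, 24, 21, 23, 22, 25, 24, 18, 17]

r_oe = pearson_r(odd, even)     # half-test correlation, about 0.33
r_ot = 2 * r_oe / (1 + r_oe)    # Spearman-Brown full-test reliability, about 0.50
print(round(r_oe, 2), round(r_ot, 2))  # 0.33 0.5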
RELIABILITY: KUDER-RICHARDSON FORMULA

 Administer the test once.
 Score the total test and apply the Kuder-Richardson formula.
 The Kuder-Richardson 20 (KR-20) formula is applicable only in situations where students' responses are scored dichotomously. It is most useful with traditional test items that are answered as right or wrong, true or false, or yes or no.
 The KR-20 reliability estimate indicates the degree to which the items in the test measure the same characteristic.
 Another formula for estimating the internal consistency of a test is the KR-21 formula, a simpler approximation of KR-20 that rests on the assumption that all items are of equal difficulty.
Formula:

$KR_{21} = \frac{k}{k-1}\left[1 - \frac{\bar{x}(k - \bar{x})}{k s^2}\right]$

where
k ------- total number of items
x̄ ------- mean
s² ------- variance

In solving for the value of KR21, first find the mean (x̄) and the variance (s²):

$s^2 = \frac{n(\sum x^2) - (\sum x)^2}{n(n-1)}$

$\bar{x} = \frac{\sum x}{n}$

where
n ------- sample size (the number of students who took the exam)
Example 4: Prof. Maricar administered a 40-item test in English to her Grade VI pupils at Cataggaman Elementary School. Below are the scores of 15 pupils. Find the reliability coefficient using the Kuder-Richardson (KR-21) formula.

Student   Score(x)   Student   Score(x)
1         16         11        20
2         25         12        17
3         35         13        26
4         39         14        35
5         25         15        39
6         18
7         19
8         22
9         33
10        36
Solve for the mean and variance of the scores using the table below.

Student   Score(x)   x²      Student   Score(x)   x²
1         16         256     11        20         400
2         25         625     12        17         289
3         35         1225    13        26         676
4         39         1521    14        35         1225
5         25         625     15        39         1521
6         18         324
7         19         361
8         22         484
9         33         1089
10        36         1296
n = 15    Σx = 405   Σx² = 11917
Solution:

$s^2 = \frac{15(11917) - (405)^2}{15(15-1)}$

$s^2 = 70.14$

$\bar{x} = \frac{\sum x}{n} = \frac{405}{15} = 27$

$KR_{21} = \frac{40}{40-1}\left[1 - \frac{27(40 - 27)}{40(70.14)}\right]$

$KR_{21} = 0.90$

The reliability coefficient using the KR-21 formula is 0.90, which means the test has very good reliability. In other words, the test is very good for a classroom test.
MEAN AND VARIANCE IN MICROSOFT EXCEL
[Screenshot: the worksheet variance and mean formulas with their resulting values.]

KR-21 IN MICROSOFT EXCEL
[Screenshot: the worksheet KR-21 formula and the resulting reliability coefficient.]
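A Python sketch of the KR-21 computation for Example 4, using the mean and variance formulas given above (the function name is mine):

def kr21(scores, k):
    # KR-21 from total scores; k is the total number of items.
    n = len(scores)
    mean = sum(scores) / n  # x-bar = (sum of x) / n
    var = (n * sum(s * s for s in scores) - sum(scores) ** 2) / (n * (n - 1))
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

# Example 4: scores of the 15 pupils on the 40-item test
scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
print(round(kr21(scores, 40), 2))  # 0.90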
Formula:

$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^2}\right)$

where
k ------- total number of items
p ------- difficulty index ($p = \frac{n}{N}$)
q ------- $1 - p$
s² ------- variance
Steps in Solving the Reliability Coefficient Using KR-20

1. Solve the difficulty index of each item: $p = \frac{n}{N}$, where n is the number of students who got the correct answer on the item and N is the number of students who answered the item.
2. Solve the value of q for each item: $q = 1 - p$.
3. Find the product of the p and q columns.
4. Find the summation $\sum pq$.
5. Solve the variance of the scores: $s^2 = \frac{n(\sum x^2) - (\sum x)^2}{n(n-1)}$, where n here is the number of items.
6. Solve the reliability coefficient using the KR-20 formula:

$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{s^2}\right)$
Example 5: Mr. Ancheta administered a 20-item true-or-false test in his Math class of 40 students. The table below gives, for each item, the number of students (x) who got the correct answer. Find the reliability coefficient using the KR20 formula and interpret the computed value; also solve for the coefficient of determination.

Item Number   x    Item Number   x
1             25   11            36
2             36   12            35
3             28   13            19
4             23   14            39
5             25   15            28
6             33   16            33
7             38   17            19
8             15   18            37
9             23   19            36
10            25   20            25

x = number of students who got the correct answer
Item Number   x    p      q      pq         x²
1             25   0.625  0.375  0.234375   625
2             36   0.9    0.1    0.09       1296
3             28   0.7    0.3    0.21       784
4             23   0.575  0.425  0.244375   529
5             25   0.625  0.375  0.234375   625
6             33   0.825  0.175  0.144375   1089
7             38   0.95   0.05   0.0475     1444
8             15   0.375  0.625  0.234375   225
9             23   0.575  0.425  0.244375   529
10            25   0.625  0.375  0.234375   625
11            36   0.9    0.1    0.09       1296
12            35   0.875  0.125  0.109375   1225
13            19   0.475  0.525  0.249375   361
14            39   0.975  0.025  0.024375   1521
15            28   0.7    0.3    0.21       784
16            33   0.825  0.175  0.144375   1089
17            19   0.475  0.525  0.249375   361
18            37   0.925  0.075  0.069375   1369
19            36   0.9    0.1    0.09       1296
20            25   0.625  0.375  0.234375   625
Totals:       Σx = 578   Σpq = 3.38875   Σx² = 17698
Solution:

$s^2 = \frac{20(17698) - (578)^2}{20(20-1)}$

$s^2 = 52.31$

$KR_{20} = \frac{20}{20-1}\left(1 - \frac{3.38875}{52.31}\right)$

$KR_{20} = 0.98$

The reliability coefficient $KR_{20} = 0.98$ means that the test has very high, or excellent, reliability.

Coefficient of determination = $(0.98)^2 = 0.9604 = 96.04\%$

96.04% of the variance in the students' performance can be attributed to the test.
KR-20 in Microsoft Excel
[Screenshot: the worksheet columns for p, q, and pq, the KR-20 formula, and the resulting reliability coefficient.]
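A Python sketch of the KR-20 computation for Example 5. It follows the slide's convention that the variance is taken over the per-item counts of correct answers, with n equal to the number of items (the function name is mine):

def kr20_from_counts(x, N):
    # x[i] = number of students (out of N) who answered item i correctly.
    k = len(x)                               # number of items
    p = [xi / N for xi in x]                 # difficulty index of each item
    sum_pq = sum(pi * (1 - pi) for pi in p)  # summation of p*q over the items
    # Variance over the per-item counts, with n = number of items,
    # exactly as in the slide's worked example.
    var = (k * sum(xi * xi for xi in x) - sum(x) ** 2) / (k * (k - 1))
    return (k / (k - 1)) * (1 - sum_pq / var)

# Example 5: number of correct answers per item for the 40 students
x = [25, 36, 28, 23, 25, 33, 38, 15, 23, 25,
     36, 35, 19, 39, 28, 33, 19, 37, 36, 25]
kr20 = kr20_from_counts(x, 40)
print(round(kr20, 2))                 # 0.98
print(round(round(kr20, 2) ** 2, 4))  # 0.9604, the coefficient of determination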
IMPROVING RELIABILITY

1. Test length. In general, a longer test is more reliable than a shorter one because longer tests sample the instructional objectives more adequately.
2. Spread of scores. A group of students with heterogeneous ability will produce a larger spread of test scores than a group with homogeneous ability.
3. Item difficulty. In general, tests composed of items of moderate or average difficulty (0.30–0.70) will have more influence on reliability than tests composed primarily of easy or very difficult items.
4. Item discrimination. In general, tests composed of more discriminating items will have greater reliability than those composed of less discriminating items.
5. Time limits. Adding a time factor may improve reliability for lower-level cognitive test items. For higher-level cognitive test items, the imposition of a time limit may defeat the intended purpose of the items.
VALIDITY: DEFINITION

 Validity is the degree to which the assessment instrument measures what it intends to measure.
 It also refers to the usefulness of the instrument for a given purpose. It is the most important criterion of a good assessment instrument.
VALIDITY: TYPES (JUST A REVIEW!)

 Content Validity
 Criterion-related Validity
   - Concurrent Validity
   - Predictive Validity
 Construct Validity
   - Convergent Validity
   - Divergent Validity
VALIDITY: CONTENT VALIDITY

 A type of validation that refers to the relationship between a test and the instructional objectives.
 The evidence of the content validity of a test is found in the Table of Specifications.
 This is the most important type of validity for a classroom teacher.
 There is no coefficient for content validity. It is determined by experts judgmentally, not empirically.
VALIDITY: CRITERION-RELATED VALIDITY

It is established statistically: a set of scores obtained with the measuring instrument is correlated with the scores obtained on another, external predictor or measure. It takes two forms:
 concurrent validity
 predictive validity
CONCURRENT VALIDITY

 The criterion and the predictor data are collected at the same time.
 This type of validity is appropriate for tests designed to assess a student's current status; it makes for a good diagnostic screening test.
 It is established by correlating the criterion and the predictor using the Pearson product-moment correlation coefficient or other statistical measures of correlation.
PREDICTIVE VALIDITY

 A type of validation that refers to a measure of the extent to which a student's current test result can be used to estimate accurately the outcome of the student's performance at a later time.
 It is appropriate for tests designed to assess a student's future status on a criterion.
 It is very important in psychological testing, for instance when psychologists want to predict responses, behaviors, outcomes, performance, and the like.
 Regression analysis can be used to predict the criterion from a single predictor or from multiple predictors, as in the sketch below.
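To illustrate that last bullet, a minimal sketch of simple linear regression used predictively; the admission-score and grade figures are hypothetical, invented purely for illustration (statistics.linear_regression requires Python 3.10+):

from statistics import linear_regression  # Python 3.10+

# Hypothetical data, for illustration only: admission test scores (predictor)
# and first-semester grades (criterion) for eight students.
admission = [72, 80, 65, 90, 85, 70, 78, 88]
grades = [2.4, 2.9, 2.1, 3.6, 3.2, 2.3, 2.8, 3.4]

# Fit grade = slope * admission + intercept on the observed pairs.
slope, intercept = linear_regression(admission, grades)

# Predict the criterion (a future grade) from a new student's test score.
new_score = 82
print(round(slope * new_score + intercept, 2))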
VALIDITY: CONSTRUCT VALIDITY

 A type of validation that refers to the measure of the extent to which a test measures theoretical, unobservable qualities such as intelligence, math achievement, performance anxiety, and the like, on the basis of evidence gathered over a period of time.
 It is established through intensive study of the test or measurement instrument, using convergent and divergent validation.
CONVERGENT VALIDITY

 A type of construct validation wherein a test has a high correlation with another test that measures the same construct.
DIVERGENT VALIDITY

 A type of construct validation wherein a test has a low correlation with a test that measures a different construct.
 In this case, high validity occurs only when there is a low correlation coefficient between the tests that measure different traits.
VALIDITY: SAMPLE PROBLEM

Teacher James develops a 45-item test and wants to determine whether it is valid. He takes another test that is already acknowledged for its validity and uses it as the criterion. He administers the two tests to his 15 students. The table below shows the results of the two tests. Is the test developed by Teacher James valid? Find the validity coefficient using the Pearson r.

Teacher James test (x)   Criterion test (y)
12                       16
22                       25
23                       31
25                       25
28                       29
30                       28
33                       35
42                       40
41                       45
37                       40
26                       33
44                       45
36                       40
29                       35
37                       41
[Screenshot: the worksheet correlation formula and the resulting validity coefficient; the validity of the test is high.]
FACTORS AFFECTING VALIDITY

Unclear directions. Directions that do not clearly indicate how to respond to the tasks and how to record the responses tend to reduce validity.

Reading vocabulary and sentence structure that are too difficult. Vocabulary and sentence structure that are too complicated for the students turn the test into an assessment of reading comprehension, thus altering the meaning of the assessment results.

Ambiguity. Ambiguous statements in assessment tasks contribute to misinterpretation and confusion. Ambiguity sometimes confuses the better students more than it does the poor students.
Inadequate time limits. Time limits that do not provide students with enough time to consider the tasks and provide thoughtful responses can reduce the validity of interpretations of results. Rather than measuring what a student knows or is able to do in a topic given adequate time, the assessment may become a measure of the speed with which the student can respond. For some assessments (e.g., a typing test), speed may be important. However, most assessments of achievement should minimize the effects of speed on student performance.

Overemphasis on easy-to-assess aspects of the domain at the expense of important but hard-to-assess aspects (construct underrepresentation). It is easy to develop test questions that assess factual knowledge or recall and generally harder to develop ones that tap conceptual understanding or higher-order thinking processes such as the evaluation of competing positions or arguments. Hence, it is important to guard against underrepresentation of tasks getting at the important but more difficult-to-assess aspects of achievement.
Test items inappropriate for the outcomes being measured. Attempting to measure understanding, thinking skills, and other complex types of achievement with test forms that are appropriate only for measuring factual knowledge will invalidate the results.

Poorly constructed test items. Test items that unintentionally provide clues to the answer tend to measure the students' alertness in detecting clues as well as their mastery of the skills or knowledge the test is intended to measure.

Test too short. If a test is too short to provide a representative sample of the performance we are interested in, its validity will suffer accordingly.
Improper arrangement of items. Test items are typically arranged in order of difficulty, with the easiest items first. Placing difficult items first in the test may cause students to spend too much time on these and prevent them from reaching items they could easily answer. Improper arrangement may also influence validity by having a detrimental effect on student motivation.

Identifiable patterns of answers. Placing correct answers in some systematic pattern (e.g., T, T, F, F or B, B, B, C, C, C, D, D, D) enables students to guess the answers to some items more easily, and this lowers validity.
VALIDITY AND RELIABILITY

What if a test is so hard that no respondent could correctly answer even a single item? The scores would still be consistent, but not valid.

If a test measures what it is supposed to measure, it is valid; but a reliable test can consistently measure the wrong thing and be invalid. Reliability is necessary but not sufficient for establishing validity.

 A valid test is always reliable, but a reliable test is not always valid.
