Properties of Assessment Method: Validity

This document discusses various types of validity and reliability in assessment methods. It describes content validity, criterion-related validity including concurrent and predictive validity, construct validity, and face validity. It also covers different measures of reliability such as test-retest, parallel forms, internal consistency using KR21 and Spearman Brown, and discusses how to interpret reliability coefficients. Fairness, practicality, and ethics in assessment are important considerations as well.

Properties of Assessment Method: Validity

Validity
- The appropriateness, correctness, meaningfulness, and usefulness of the specific conclusions a teacher draws about the teaching-learning situation
- (For testing) The extent to which an instrument measures what it intends to measure
Types of Validity

Content Validity
- Evidence that the test items represent the proper domain
- Teachers should give emphasis to: the adequacy of students' experience, and coverage of sufficient material to assess the domain

How do we establish Content Validity?

1. Show that the number of items for each content area matches the relative importance of these areas, as reflected in a survey of the domain.
2. Show that the content of the test matches what was found in the survey of the domain.
3. Have the match confirmed by a subject-matter expert.
Example:
Professor Elle G. Beaty gave a preliminary examination for her Test and Measurement class covering the content discussed over the last three weeks.
Item Validity:
Rate each item (Item No. 1-7) against the following criteria:
1. Material covered sufficiently
2. Students have prior experience with the type of task
3. Most students are able to answer them correctly
4. Decision
Entire Test:
For each skill, compare the estimated percent of instruction with the percentage of test items covering it (a quick numeric check of this match is sketched below):
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
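A minimal sketch of this blueprint check in Python; all percentages here are assumed for illustration, and the 5-point tolerance is an arbitrary choice, not a standard:

# Compare the share of instruction each skill received with the share of
# test items that cover it. A close match supports content validity.

instruction_pct = {  # estimated percent of instruction (assumed values)
    "Knowledge": 20, "Comprehension": 20, "Application": 25,
    "Analysis": 15, "Synthesis": 10, "Evaluation": 10,
}
items_pct = {  # percentage of items covered in the test (assumed values)
    "Knowledge": 30, "Comprehension": 25, "Application": 20,
    "Analysis": 10, "Synthesis": 10, "Evaluation": 5,
}

for skill in instruction_pct:
    gap = items_pct[skill] - instruction_pct[skill]
    flag = "OK" if abs(gap) <= 5 else "check coverage"
    print(f"{skill:13s} instruction {instruction_pct[skill]:3d}%  "
          f"items {items_pct[skill]:3d}%  gap {gap:+3d}% -> {flag}")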
Face Validity
- The superficial appearance of the test; the mere appearance that a test measures its target construct
- Tests whose purpose is clear, even to naïve respondents, are said to have high face validity. Accordingly, tests whose purpose is unclear have low face validity (Nevo, 1985).
- Usually established by the test takers themselves, for example through a Likert-scale rating

CRITERION-RELATED VALIDITY
The use of an established criterion to validate a new measurement of the construct you are interested in. The test and the criterion are theoretically related.
Asks: what is the relationship between a test and a criterion (an external source) that the test should be related to?

Example of Criterion-Related Validity:
• An existing measurement procedure for depression (valid and reliable) is available, but it is too long (say, 100 items) and its response rate is low; a shorter new test can be validated against it.
FORMS OF CRITERION-RELATED VALIDITY

CONCURRENT VALIDITY - the relationship between test scores and another currently obtainable benchmark
EXAMPLE: Scores obtained from a class achievement test correlate highly with scores on the school-wide achievement test (a correlation sketch follows below).

PREDICTIVE VALIDITY - the relationship between test scores and a future standard
How well does the test predict future performance?
EXAMPLE: College entrance exams
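A minimal sketch of how a concurrent validity coefficient might be computed, assuming made-up scores for six students on both tests; the pearson helper implements the product-moment formula shown later in this deck:

from statistics import mean

def pearson(x, y):
    # Pearson product-moment correlation between two score lists.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

class_test = [78, 85, 62, 90, 70, 88]   # hypothetical class achievement scores
school_test = [75, 88, 60, 93, 68, 85]  # same students, school-wide test

print(f"concurrent validity coefficient: {pearson(class_test, school_test):.2f}")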
CONSTRUCT VALIDITY
Seeks agreement between a theoretical concept and a specific measuring procedure.
The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory.

Example: A test that tries to establish the degree of egocentrism across different age groups
Reliability
- The extent to which an assessment is consistent; its dependability and stability
- The degree of freedom from measurement error, i.e., the consistency of test scores
- The degree to which test scores are free from errors of measurement
Factors that can cause measurement error:
- Poorly worded questions
- Poor test-taking instructions
- Test-taker anxiety
- Distractions in the testing room
Test-Retest Reliability
- The relationship between scores from one test given at two different administrations
- Measures stability
- The same measuring instrument is administered twice to the same group of people, and the correlation coefficient between the two sets of scores is determined.
LIMITATIONS OF THE TEST-RETEST METHOD:
1. When the time interval is short, respondents may recall their previous responses, and this tends to make the correlation coefficient high.
2. When the time interval is long, factors such as unlearning and forgetting, among others, may occur and may result in a low correlation for the measuring instrument.
3. Regardless of the time interval, testing conditions such as noise, temperature, lighting, and other factors may affect the correlation coefficient of the measuring instrument.
Inter-Rater Reliability
- The degree to which different raters or observers give consistent scores to the same performance
Parallel/Alternate/Equivalent Forms Reliability
- The relationship between scores from two similar versions of the same test
- A challenge in this kind of reliability is assuring that both forms of the test use the same or very similar directions, format, and number of questions, and are equal in difficulty and content.
Split-Half Reliability
- Correlating one half of the test against the other half (a sketch follows below)
- Requires only one form and one administration of the test; the test is split in half, and the scores on one half are correlated with the scores on the other half.
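A minimal split-half sketch, assuming a hypothetical matrix of right/wrong (1/0) item responses and an odd-even split; any other split into equal halves would work the same way:

from statistics import mean

def pearson(x, y):
    # Pearson product-moment correlation between two score lists.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical responses: rows = students, columns = items (1 = correct).
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 1],
]

odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8

r_half = pearson(odd_half, even_half)
print(f"split-half correlation: {r_half:.2f}")
# This coefficient reflects only half the test's length; the Spearman-Brown
# formula (later in this deck) corrects it back to full length.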
Internal Consistency
- Reliability measured statistically by going "within the test"
- How scores on individual items relate to each other or to the test as a whole
- Example: Individuals who score high on a test of depression should, on average, respond to all items on the test in a manner that indicates depressive ideation.

Kuder-Richardson KR21

KR21 = [K / (K - 1)] × [1 - M(K - M) / (K × V)]

K = total number of items
M = mean of the scores
V = variance of the test scores (the sum of squared differences between each score and the mean, divided by n - 1), i.e., the square of the standard deviation
n = number of test takers

(Best for dichotomous items, e.g., multiple choice with one right answer)
So how do we interpret the scores? (Index of Reliability)

• .50 or below = Questionable reliability
• .50 - .60 = May need revision; other measures may need to be supplemented
• .60 - .70 = Somewhat low; some items may need improvement
• .70 - .80 = Good for a classroom test; few items need revision
• .80 - .90 = Very good for a classroom test
• .90 and above = Excellent reliability (usually standardized tests)
Example: Try to find the KR21 Reliability Index
• 8 students took a 10-item multiple choice test; their scores were (a worked computation follows below):

Student: 1  2  3  4  5  6  7  8
Score:   7  7  8  9  3  4  5  6
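A minimal sketch that plugs the eight scores above into the KR21 formula, using the n - 1 variance defined earlier (Python's statistics.variance):

from statistics import mean, variance  # variance() divides by n - 1

def kr21(k, scores):
    # KR21 = [K/(K-1)] * [1 - M(K-M)/(K*V)], as given above.
    m, v = mean(scores), variance(scores)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * v))

scores = [7, 7, 8, 9, 3, 4, 5, 6]  # the eight students' scores
print(f"KR21 = {kr21(10, scores):.2f}")
# About 0.47: questionable reliability per the interpretation scale above.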
Pearson Product-Moment Correlation
• In our case, X = one person's score on the first half of items, X̄ = the mean score on the first half of items, Y = one person's score on the second half of items, Ȳ = the mean score on the second half of items.

r_xy = Σ(X - X̄)(Y - Ȳ) / √{ [Σ(X - X̄)²] × [Σ(Y - Ȳ)²] }
Spearman-Brown Prophecy Formula
• A degree of correlation between the two sets of scores must first be established to determine the index of reliability via Spearman-Brown.

SB = (2 × r_half) / (1 + r_half)

r_half = reliability of half of the test (the Pearson correlation between the two halves)
Try for Yourself
• A 50-item test was administered to a group of 20 students. The mean score was 35 and the standard deviation was 5.5. What is the KR21 index of reliability? (A sketch for checking both answers follows below.)
• Compute the internal consistency (using Spearman-Brown) for the following split-half scores: x = 1, 3, 4, 4; y = 2, 5, 5, 8
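A minimal sketch for checking both answers; kr21_from_summary and pearson are hypothetical helper names, and the formulas are the ones given above:

from statistics import mean

def kr21_from_summary(k, m, sd):
    # KR21 from summary statistics: K items, mean M, standard deviation SD.
    v = sd ** 2  # variance is the square of the standard deviation
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * v))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Exercise 1: 50 items, mean 35, standard deviation 5.5.
print(f"KR21 = {kr21_from_summary(50, 35, 5.5):.2f}")  # about 0.67

# Exercise 2: split-half scores, corrected with Spearman-Brown.
x, y = [1, 3, 4, 4], [2, 5, 5, 8]
r_half = pearson(x, y)
sb = (2 * r_half) / (1 + r_half)
print(f"r_half = {r_half:.2f}, Spearman-Brown = {sb:.2f}")  # about 0.87 and 0.93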
Can a test be reliable and not valid? (Yes: a test can yield consistent scores while still measuring the wrong thing.)
Fairness, Practicality, Efficiency
• Assessment should be fair
• Assessment should be viewed as an opportunity to learn
• Assessment should be free from stereotyping
• Assessment should be practical, easy to use, and should not take too much time
Ethics in Assessment
• Sexual fantasies
• Sensitive information
• Using unreliable tests
• "Will any physical or psychological harm come to anyone as a result of this assessment/testing?"
• Confidentiality of results
• Deception
