
Scoring and Interpretation - Part 1 (Unit 3)

The document discusses various methods used in test construction and scoring, including: 1. The process of test construction involves test conceptualization, construction, tryout, item analysis, and revision. Test conceptualization determines what the test will measure and who will use it. 2. Scaling assigns numbers to test responses and can include norm-referenced scales, criterion-referenced scales, ratio scales, comparative scales, and non-comparative scales like rating scales. 3. Methods of absolute scaling include Thurstone scaling to establish item difficulty and the method of equal appearing intervals to develop interval scales of statements judged on their intensity.

Uploaded by

Rania Abdullah

Scoring and Interpretation

(Unit 3)
Process of Test Construction
Process
1. Test conceptualization

2. Construction

3. Tryout

4. Item analysis

5. Revision
Test conceptualization
• Different from already existing tests

• New phenomenon

• New theory

• New skills

• New disorder
Preliminary questions
• What is the test designed to measure?

• Objectives?

• Need of the test?

• Test user?

• Test taker?

• Content area?

• Administration?
Cont…
• Format?

• More than one form?

• Training for interpretation and administration?

• Type of responses required? E.g., instructions for test takers with disabilities

• Beneficiaries?

• Any potential harm from administration?

• Meaning of test score?


Type of test

• Norm-referenced
– Age based
– Grade based
– Norms based e.g. stanine

• Criterion-referenced
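
Norm-referenced scores such as z-scores, T-scores, and stanines are derived from a norm group's mean and SD. The sketch below illustrates the usual conversions; the function names are hypothetical, the use of the population SD is an assumption, and the stanine rule (round 2z + 5, then limit to 1-9) is a common approximation rather than a lookup against a published norm table.

```python
import math
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to z-scores using the norm group's mean and SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the norm group
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(x - m) / sd for x in raw_scores]


def t_score(z):
    """T-score scale: mean 50, SD 10."""
    return 50 + 10 * z


def stanine(z):
    """Stanine scale: mean 5, SD about 2, limited to the whole numbers 1-9."""
    rounded = math.floor(2 * z + 5 + 0.5)  # round half up
    return max(1, min(9, rounded))


# Hypothetical raw scores from a small norm group.
raw = [40, 50, 60]
for x, z in zip(raw, z_scores(raw)):
    print(x, round(z, 2), round(t_score(z), 1), stanine(z))
```

A real test manual would supply its own norm tables; the formulas here only show how the three score types relate to one another.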
Scaling
• How a measuring device is designed and calibrated, and
the way numbers are assigned to different
amounts of the traits, attributes, etc. being measured

• Thurstone (1925) --- absolute scaling: “a
method to study item difficulty across samples
of test takers who differ in ability”.
Pure science

T-scores/z-scores
Temperature

Positions

Male/female
Cont…
• Ratio scales
– Money
– Weight
– Grip (neurological abilities)
Guttman Equal appearing interval
Comparative
Paired comparisons
Choosing between the two stimuli in each pair to study preferences or
priorities
A higher score is given for choosing the option that more judges agreed on
(e.g., moral dilemma --- cheating in an exam or
cheating in paying taxes)
– (MSCEIS) --- Target, consensus, judges
– Edwards Personal Preference Schedule (15 preferences, 225 pairs)
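
Scoring paired comparisons against judges' agreement can be sketched as follows. This is a minimal illustration, not the scoring procedure of any published test: the function names, the pair labels, and the judge data are all hypothetical, and ties between judges are resolved arbitrarily (first option counted).

```python
from collections import Counter


def majority_key(judge_choices):
    """Build a scoring key: for each pair, the option most judges chose.

    judge_choices maps a pair (tuple of two options) to the list of
    options picked by the individual judges.
    """
    key = {}
    for pair, votes in judge_choices.items():
        # most_common(1) returns the top option; ties go to the one seen first
        key[pair] = Counter(votes).most_common(1)[0][0]
    return key


def score_test_taker(responses, key):
    """Count the pairs on which the test taker agrees with the judges' key."""
    return sum(1 for pair, choice in responses.items() if key.get(pair) == choice)


# Hypothetical data: three judges decide which act in each pair is more justified.
judges = {
    ("exam", "taxes"): ["taxes", "exam", "exam"],
    ("exam", "lying"): ["lying", "lying", "exam"],
}
key = majority_key(judges)
responses = {("exam", "taxes"): "exam", ("exam", "lying"): "exam"}
print(score_test_taker(responses, key))
```

The test taker earns a point only on the first pair, where the chosen option matches the judges' majority.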
Cont…

Rank Order
Sorting tasks -
judging each stimulus in comparison with the others
(e.g., ranking 30 moral values from 1 to 30)
– Categorical scaling --- placing stimuli (moral
dilemmas) into three piles (e.g., never justified,
sometimes justified, always justified)
Non Comparative Scaling
• A testtaker is presumed to have more or less of the characteristic
being measured as a function of the test score.

• Scaling helps in assigning a number to each response

• Continuous scaling
– Age in years, education in years
• Rating scale
Summated rating
– Likert-type scale (Likert, 1932) --- ordinal scale
– 5-point “strongly agree to strongly disagree”
– Guttman Scale (1944, 1947) (Scalogram analysis)
• Ordinal measure
• Items are arranged sequentially from weaker to stronger
expression of an attitude, belief, or feeling being
measured (e.g., attitude towards suicide)
• Scalogram analysis --- graphic mapping of a testtaker’s
responses (e.g., in consumer psychology --- buying one
product would definitely lead to buying another related product, e.g.,
fishing-related)
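
The two rating approaches above can be illustrated with a short sketch. Everything here is hypothetical: the function names, the item labels, and the responses. Reverse-keying negatively worded items is a common practice that the slides do not mention, so it is included only as an optional assumption. The Guttman check tests for a perfectly cumulative pattern only; it is not a full scalogram analysis.

```python
def summated_score(responses, reverse_keyed=(), points=5):
    """Sum Likert-type item ratings into one total score.

    responses maps item name -> rating from 1 to `points`.
    Items listed in reverse_keyed are flipped (1 <-> points) before summing.
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"rating out of range for {item}: {value}")
        total += (points + 1 - value) if item in reverse_keyed else value
    return total


def is_guttman_pattern(endorsements):
    """Check for a perfect cumulative (Guttman) response pattern.

    endorsements is a list of booleans ordered from the weakest to the
    strongest statement. In a perfect pattern, once a statement is
    rejected, no stronger statement is endorsed.
    """
    seen_rejection = False
    for endorsed in endorsements:
        if not endorsed:
            seen_rejection = True
        elif seen_rejection:
            return False
    return True


print(summated_score({"q1": 5, "q2": 1, "q3": 3}, reverse_keyed={"q2"}))
print(is_guttman_pattern([True, True, False, False]))
print(is_guttman_pattern([True, False, True]))
```

In the last call the respondent rejects the second statement but endorses a stronger one, so the pattern is not cumulative.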
Semantic Differential
METHOD OF ABSOLUTE SCALING
Calculate the mean and SD of the judges' ratings for each statement

The SD should be small, which reflects homogeneous
responses from the judges

Items are ordered by their mean value.
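
As a rough sketch of these three steps, the snippet below computes the mean and SD of judges' ratings for each statement and orders the statements by mean. The function name, statement labels, and ratings are hypothetical, and the sample SD is an assumption (the slides do not say which SD formula is used).

```python
from statistics import mean, stdev


def scale_statements(judge_ratings):
    """Return (statement, mean, SD) rows ordered by mean rating.

    judge_ratings maps each statement to the list of ratings
    given by the individual judges.
    """
    rows = [
        (statement, mean(ratings), stdev(ratings))  # sample SD (assumption)
        for statement, ratings in judge_ratings.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical ratings from four judges on a 1-9 scale.
ratings = {
    "A": [2, 3, 2, 3],
    "B": [8, 9, 8, 9],
    "C": [1, 9, 5, 5],
}
for statement, m, sd in scale_statements(ratings):
    print(statement, round(m, 2), round(sd, 2))
```

Statement C has a much larger SD than A or B, meaning the judges disagreed about it, so it would be a weaker candidate for the final scale.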


Cont….
• Russo (1994)
– 216 items
– 3 depression inventories
– Judges
• 527 undergrad students
• 37 clinical faculty
Cont…
The present results suggest that if the original scoring is used
for the three scales examined here, then the distinctions
between well-being and absence of depression as well as
between moderate and severe will be difficult to make. Such
imprecision will make it difficult to assess the efficacy of
treatments for depression, because a lack thereof must be a
function of added measurement error due to ordinal measures.
Such error could also wreak havoc in longitudinal studies,
especially in those in which memory is involved.
(Russo, 1994)
Cont…
• Scale of equal-appearing intervals (Thurstone,
1929)
• Interval scale
• Steps in developing the scale
1. Collect a large number of positive and negative statements
2. Judges evaluate each statement, e.g., on a 1-9 scale
(e.g., how strongly a statement justifies suicide),
treating the distances between scale points as equal ---
treating the scale as interval
3. The mean and SD of the judges' ratings are calculated for each
statement
4. Inclusion of an item depends upon the quality of the item,
confidence that the item sorting represents equal
intervals, and the mean and SD. The smaller the SD of the
judges' ratings, the better the item.
Cont…
5. Final administration depends upon the objective.
Participants are asked to rate the statements according to
their own attitudes. The values of the items that the
respondent selects (based on the judges' ratings) are
averaged to produce a score.
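
Step 5 can be sketched in a few lines: a respondent's score is the average of the judges' scale values for the statements the respondent selects. The function name, statement labels, and scale values below are hypothetical.

```python
from statistics import mean


def score_respondent(scale_values, endorsed):
    """Average the judges' scale values of the statements a respondent selects.

    scale_values maps each retained statement to the judges' mean rating.
    endorsed lists the statements the respondent agreed with.
    """
    values = [scale_values[statement] for statement in endorsed]
    if not values:
        raise ValueError("respondent selected no statements")
    return mean(values)


# Hypothetical scale values derived from judges' ratings on a 1-9 scale.
scale_values = {"s1": 1.5, "s2": 4.0, "s3": 8.5}
print(score_respondent(scale_values, ["s2", "s3"]))
```

Because the scale values come straight from the judges' ratings, the resulting score is already on the 1-9 scale and needs no further transformation.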

This is an example of the direct estimation variety; other
methods involve indirect estimation. Scores need not be
transformed into any other format.

The method of scaling depends upon the variable being
measured, the target population, and the preference of the test
developer.
Multidimensional scaling
• is used when an item taps more than one dimension of response
(e.g., harmful or beneficial use of marijuana)
• Factor analysis vs. multidimensional scaling
