Psych Assessment Chapter 4
Chapter 4
Some Assumptions About
Psychological Testing and
Assessment
Assumption 1: Psychological Traits and States Exist
A trait has been defined as “any distinguishable, relatively enduring way in
which one individual varies from another”.
States also distinguish one person from another but are relatively less
enduring.
The definitions of trait and state we are using also refer to a way in which one
individual varies from another; attributions of a trait or state term are therefore relative.
Assumption 2: Psychological Traits and States Can Be
Quantified and Measured
Once it’s acknowledged that psychological traits and states do exist, the specific traits
and states to be measured and quantified need to be carefully defined.
Test developers and researchers, much like people in general, have many different ways
of looking at and defining the same phenomenon.
Measuring traits and states by means of a test entails developing not only appropriate
test items but also appropriate ways to score the test and interpret the results.
The test score is presumed to represent the strength of the targeted ability or trait or
state and is frequently based on cumulative scoring.
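Cumulative scoring can be illustrated with a minimal sketch; the five item responses and the 0-to-4 scoring range below are hypothetical, not taken from any actual test:

```python
def cumulative_score(item_scores):
    """Sum the item scores; under cumulative scoring, a higher total is
    taken to reflect more of the targeted trait, state, or ability."""
    return sum(item_scores)

# Hypothetical responses to a five-item scale, each item scored 0-4
responses = [3, 2, 4, 1, 3]
total = cumulative_score(responses)  # 13
```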
Assumption 3: Test-Related Behavior Predicts Non-Test-
Related Behavior
The objective of the test is to provide some indication of other aspects of the examinee’s
behavior.
The tasks in some tests mimic the actual behaviors that the test user is attempting to
understand.
In some forensic (legal) matters, psychological tests may be used not to predict behavior
but to postdict it —that is, to aid in the understanding of behavior that has already taken
place.
Assumption 4: Tests and Other Measurement Techniques
Have Strengths and Weaknesses
Competent test users understand a great deal about the tests they use. They
understand, among other things, how a test was developed, the circumstances
under which it is appropriate to administer the test, how the test should be
administered and to whom, and how the test results should be interpreted.
Competent test users understand and appreciate the limitations of the tests
they use as well as how those limitations might be compensated for by data
from other sources.
Assumption 5: Various Sources of Error Are Part of the
Assessment Process
In assessment, error does not refer to a mistake. Error traditionally refers to the
long-standing assumption that factors other than what a test attempts to measure
will influence test performance; error is thus an expected component of the
measurement process.
Assumption 6: Testing and Assessment Can Be Conducted in a Fair
and Unbiased Manner
One source of fairness-related problems is the test user who attempts to use a
particular test with people whose background and experience differ from the
background and experience of the people for whom the test was intended.
Assumption 7: Testing and Assessment Benefit Society
Although tests and testing are sometimes criticized, a world without tests would
most likely be more a nightmare than a dream.
Most of all, a good test would seem to be one that measures what it
purports to measure.
RELIABILITY
The criterion involves the consistency of the measuring tool: the precision with
which the test measures and the extent to which error is present in measurements.
In theory, a perfectly reliable measuring tool measures in the same way every time.

VALIDITY
Psychological tests, like other tests and instruments, are reliable to varying
degrees. In addition to being reliable, a test must also be reasonably accurate:
the test should be valid, meaning it must measure what it purports to measure.
Questions Regarding the Validity of a Test
A. Item focused:
i. Do the items adequately sample the range of areas that must be sampled to
adequately measure the construct?
ii. How do individual items contribute to or detract from the test’s validity?
B. Grounds related to interpretation
iii. What do these scores really tell us about the targeted construct?
iv. How are high scores on the test related to test takers’ behavior?
v. How are low scores on the test related to test takers’ behavior?
vi. How do scores on this test relate to scores on other tests purporting to measure the
same construct?
vii. How do scores on this test relate to scores on other tests purporting to measure the
opposite construct?
Other Considerations
A good test is one that trained examiners can administer, score, and interpret with a
minimum of difficulty.
A good test is a useful test, one that yields actionable results that will ultimately
benefit individual testtakers or society at large.
If the purpose of a test is to compare the performance of the testtaker with the
performance of other testtakers, then a “good test” is one that contains adequate
norms. Also referred to as normative data, norms provide a standard with which the
results of measurement can be compared.
Norms
Norm-referenced testing and assessment is a method of evaluation and a way of
deriving meaning from test scores by evaluating an individual testtaker’s score and
comparing it to the scores of a group of testtakers.
In a psychometric context, norms are the test performance data of a particular group
of testtakers that are designed for use as a reference when evaluating or interpreting
individual test scores.
Norming refers to the process of deriving norms; the term may be modified to
describe a particular type of norm derivation.
Norming a test, especially with the participation of a nationally
representative normative sample, can be a very expensive
proposition. For this reason, some test manuals provide what are
variously known as user norms or program norms, which “consist
of descriptive statistics based on a group of testtakers in a given
period of time rather than norms obtained by formal sampling
methods”
Sampling to Develop Norms
The process of administering a test to a representative sample of testtakers for
the purpose of establishing norms is referred to as standardization or test
standardization.
Sampling
In the process of developing a test, a test developer has targeted some defined
group as the population for which the test is designed. This population is the
complete universe or set of individuals with at least one common, observable
characteristic.
The test developer can obtain a distribution of test responses by administering
the test to a sample of the population—a portion of the universe of people
deemed to be representative of the whole population.
A sample may be stratified so that key subgroups of the population are
proportionately represented. If such stratified sampling were also random (that is,
if every member of each stratum had the same chance of being included in the
sample), then the procedure would be termed stratified-random sampling.
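The procedure can be sketched in code; the stratum labels and the 10% sampling fraction below are illustrative assumptions, not prescribed values:

```python
import random

def stratified_random_sample(population, stratum_of, fraction):
    """Group the population into strata, then draw a random sample of the
    given fraction from each stratum, so every stratum is represented."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample

# Hypothetical population: 10 rural and 30 urban testtakers
population = [("rural", i) for i in range(10)] + [("urban", i) for i in range(30)]
sample = stratified_random_sample(population, lambda m: m[0], 0.1)  # 1 rural + 3 urban
```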
Two other types of sampling procedures are purposive sampling and incidental
sampling. If we arbitrarily select some sample because we believe it to be
representative of the population, then we have selected what is referred to as
a purposive sample. An incidental (or convenience) sample is one that is simply
convenient or available for use.

Non-probability sampling is sampling in which not every member of the
population has an equal chance of being chosen for the sample.
Different Sampling Procedures

Probability Sampling:
Simple Random Sampling
Systematic Random Sampling
Stratified Random Sampling
Clustered Random Sampling

Non-Probability Sampling:
Purposive Sampling
Snowball Sampling
Convenience or Incidental Sampling
Quota Sampling
Developing Norms for A Standardized Test
Having obtained a sample, the test developer administers
the test according to the standard set of instructions that
will be used with the test. The test developer also
describes the recommended setting for giving the test.
Types of Norms
1. Percentile Norms
2. Age Norms
3. Grade Norms
4. National Norms
5. National Anchor Norms
6. Subgroup Norms
7. Local Norms
1. Percentiles
A percentile is an expression of the percentage of people whose score on a test or
measure falls below a particular raw score.
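This definition translates directly into a short computation; the score distribution below is made up for illustration:

```python
def percentile_rank(all_scores, raw_score):
    """Percentage of scores in the distribution that fall below raw_score."""
    below = sum(s < raw_score for s in all_scores)
    return 100 * below / len(all_scores)

# Hypothetical distribution of 10 raw scores
scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 31]
percentile_rank(scores, 25)  # 60.0: six of the ten scores fall below 25
```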
2. Age Norms
Also known as age-equivalent scores, age norms indicate the average performance of
different samples of testtakers who were at various ages at the time the test was
administered.

3. Grade Norms
Designed to indicate the average test performance of testtakers in a given school grade, grade
norms are developed by administering the test to representative samples of children over a
range of consecutive grade levels
Both grade norms and age norms are referred to more generally as developmental norms, a
term applied broadly to norms developed on the basis of any trait, ability, skill, or other
characteristic that is presumed to develop, deteriorate, or otherwise be affected by
chronological age, school grade, or stage of life.
4. National Norms
As the name implies, national norms are derived from a normative sample that was nationally
representative of the population at the time the norming study was conducted.

5. National Anchor Norms
National anchor norms permit comparison of scores on different tests: using the
equipercentile method, the equivalency of scores on different tests is calculated
with reference to corresponding percentile scores.
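A minimal sketch of the equipercentile idea follows, assuming two hypothetical score distributions; operational score equating uses smoothing and much larger samples than this:

```python
def percentile_rank(all_scores, raw_score):
    # percentage of scores in the distribution falling below raw_score
    return 100 * sum(s < raw_score for s in all_scores) / len(all_scores)

def equipercentile_equivalent(score_a, dist_a, dist_b):
    """Map a score on test A to the score on test B whose percentile rank
    in B's distribution is closest to score_a's rank in A's distribution."""
    target = percentile_rank(dist_a, score_a)
    return min(sorted(set(dist_b)),
               key=lambda s: abs(percentile_rank(dist_b, s) - target))

# Hypothetical distributions: test B scores run twice as high as test A's
dist_a = list(range(1, 101))       # scores 1..100
dist_b = [2 * s for s in dist_a]   # scores 2..200
equipercentile_equivalent(50, dist_a, dist_b)  # 100
```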
6. Subgroup Norms
A normative sample can be segmented by any of the criteria initially used in selecting subjects
for the sample. What results from such segmentation are more narrowly defined subgroup
norms.
7. Local Norms
Typically developed by test users themselves, local norms provide normative information with
respect to the local population’s performance on some test.
Fixed Reference Group Scoring Systems
Norms provide a context for interpreting the meaning of a test score.
Another type of aid in providing a context for interpretation is termed a
fixed reference group scoring system.
Here, the distribution of scores obtained on the test from one group of
testtakers—referred to as the fixed reference group—is used as the basis
for the calculation of test scores for future administrations of the test.
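A simplified sketch of the idea, with a made-up reference distribution and an arbitrary 500/100 scale; the actual transformation any given test uses is set by its publisher:

```python
import statistics

def fixed_reference_scale(raw, reference_scores, scaled_mean=500, scaled_sd=100):
    """Convert a raw score to a scaled score using the mean and standard
    deviation of the FIXED reference group, not of the current testtakers."""
    m = statistics.mean(reference_scores)
    sd = statistics.pstdev(reference_scores)
    return scaled_mean + scaled_sd * (raw - m) / sd

# Hypothetical fixed reference group (mean 50); raw scores from any later
# administration are scaled against this same group
reference = [40, 50, 50, 60]
fixed_reference_scale(50, reference)  # 500.0, the reference-group mean
```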
Norm-Referenced Versus Criterion-Referenced Evaluation
One way to derive meaning from a test score is to evaluate the test score in relation to
other scores on the same test. This approach to evaluation is referred to as norm-
referenced.
Another way to derive meaning from a test score is to evaluate it on the basis of
whether or not some criterion has been met. We may define a criterion as a standard
on which a judgment or decision may be based.
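The contrast can be sketched as two interpretations of the same score; the score distribution and the cutoff of 75 below are hypothetical:

```python
def norm_referenced_rank(score, group_scores):
    """Interpret the score by its standing relative to other testtakers:
    the percentage of the group scoring below it."""
    return 100 * sum(s < score for s in group_scores) / len(group_scores)

def criterion_referenced_pass(score, cutoff):
    """Interpret the score by whether a fixed standard (criterion) is met."""
    return score >= cutoff

group = [50, 60, 70, 80, 90]
norm_referenced_rank(80, group)    # 60.0: above 3 of the 5 testtakers
criterion_referenced_pass(80, 75)  # True: the cutoff of 75 is met
```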
So, in selecting a test for use, the responsible test user does some
advance research on the test’s available norms to check on how
appropriate they are for use with the targeted testtaker population.