
Of Tests and Testing

Chapter 4
Some Assumptions About
Psychological Testing and
Assessment
Assumption 1: Psychological Traits and States Exist
A trait has been defined as “any distinguishable, relatively enduring way in
which one individual varies from another”.

States also distinguish one person from another but are relatively less
enduring.

Samples of behavior may be obtained in a number of ways, ranging from direct observation to the analysis of self-report statements or pencil-and-paper test answers.
The term psychological trait, much like the term trait alone, covers a wide
range of possible characteristics. Thousands of psychological trait terms can be
found in the English language.

Among them are psychological traits that relate to intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, personality in general, and specific personality traits.

For our purposes, a psychological trait exists only as a construct—an informed, scientific concept developed or constructed to describe or explain behavior. We can’t see, hear, or touch constructs, but we can infer their existence from overt behavior.
Overt behavior refers to an observable action or the product of an observable
action, including test- or assessment-related responses.

The phrase relatively enduring in our definition of trait is a reminder that a trait is not expected to be manifested in behavior 100% of the time.

The definitions of trait and state we are using also refer to a way in which one individual varies from another. Attributions of a trait or state term are relative.
Assumption 2: Psychological Traits and States Can Be
Quantified and Measured
Once it’s acknowledged that psychological traits and states do exist, the specific traits
and states to be measured and quantified need to be carefully defined.

Test developers and researchers, much like people in general, have many different ways
of looking at and defining the same phenomenon.

Measuring traits and states by means of a test entails developing not only appropriate
test items but also appropriate ways to score the test and interpret the results.

The test score is presumed to represent the strength of the targeted ability or trait or
state and is frequently based on cumulative scoring.
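Cumulative scoring can be illustrated with a minimal sketch. The five-item test and its 0/1 response pattern below are hypothetical; the point is only that the total score is the sum of credit earned across items.

```python
# Cumulative scoring sketch: the test score is the sum of credit
# earned across items, so a higher total is taken to reflect more
# of the targeted trait, state, or ability.
def cumulative_score(item_responses):
    """Sum per-item credit (here 0 or 1 per item) into one total score."""
    return sum(item_responses)

# Hypothetical response pattern on a five-item test: 1 = correct, 0 = incorrect.
responses = [1, 0, 1, 1, 1]
print(cumulative_score(responses))  # 4
```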
Assumption 3: Test-Related Behavior Predicts Non-Test-
Related Behavior
The objective of the test is to provide some indication of other aspects of the examinee’s
behavior.

The tasks in some tests mimic the actual behaviors that the test user is attempting to
understand.

In some forensic (legal) matters, psychological tests may be used not to predict behavior
but to postdict it —that is, to aid in the understanding of behavior that has already taken
place.
Assumption 4: Tests and Other Measurement Techniques
Have Strengths and Weaknesses
Competent test users understand a great deal about the tests they use. They
understand, among other things, how a test was developed, the circumstances
under which it is appropriate to administer the test, how the test should be
administered and to whom, and how the test results should be interpreted.

Competent test users understand and appreciate the limitations of the tests
they use as well as how those limitations might be compensated for by data
from other sources.
Assumption 5: Various Sources of Error Are Part of the
Assessment Process
In assessment, error does not refer to a mistake; error traditionally refers to something that is not only expected but is actually a component of the measurement process.

More specifically, error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.

Because error is a variable that must be taken into account in any assessment, we often speak of error variance, that is, the component of a test score attributable to sources other than the trait or ability measured.
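The idea of error variance can be sketched with a small simulation under the classical assumption that an observed score is a true score plus an error component unrelated to the trait. All values below are hypothetical simulated data, not results from any real test.

```python
import random

random.seed(0)

# Classical test theory sketch: observed score = true score + error,
# where the error component is unrelated to the measured trait.
true_scores = [random.gauss(50, 10) for _ in range(10_000)]  # hypothetical trait levels
errors = [random.gauss(0, 5) for _ in range(10_000)]         # hypothetical error component
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# With error independent of the trait, observed-score variance is
# approximately true-score variance plus error variance.
print(round(variance(observed)), round(variance(true_scores) + variance(errors)))
```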
Assumption 6: Testing and Assessment Can Be Conducted
in a Fair and Unbiased Manner
Today all major test publishers strive to develop instruments that are fair when
used in strict accordance with guidelines in the test manual.

One source of fairness-related problems is the test user who attempts to use a
particular test with people whose background and experience are different from
the background and experience of people for whom the test was intended.
Assumption 7: Testing and Assessment Benefit Society

Yet a world without tests would most likely be more a nightmare than a dream.

• In a world without tests or other assessment procedures, personnel might be hired on the basis of nepotism rather than documented merit.
• In a world without tests, teachers and school administrators could arbitrarily place
children in different types of special classes simply because that is where they
believed the children belonged.
• In a world without tests, there would be a great need for instruments to diagnose
educational difficulties in reading and math and point the way to remediation.
• In a world without tests, there would be no instruments to diagnose
neuropsychological impairments.
• In a world without tests, there would be no practical way for the military to screen
thousands of recruits with regard to many key variables.
What’s a “good test”?
The criteria for a good test would include clear instructions for
administration, scoring, and interpretation.

Most of all, a good test would seem to be one that measures what it
purports to measure.

Test users often speak of the psychometric soundness of tests, two key aspects of which are reliability and validity.
PSYCHOMETRIC SOUNDNESS

RELIABILITY
This criterion involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements. In theory, a perfectly reliable measuring tool measures in the same way every time.

VALIDITY
Psychological tests, like other tests and instruments, are reliable to varying degrees. In addition to being reliable, a test must also be reasonably accurate. For the test to be valid, it must measure what it purports to measure.
Questions Regarding the Validity of a Test
A. Item focused:
i. Do the items adequately sample the range of areas that must be sampled to adequately measure the construct?
ii. How do individual items contribute to or detract from the test’s validity?
B. Grounds related to interpretation:
iii. What do these scores really tell us about the targeted construct?
iv. How are high scores on the test related to testtakers’ behavior?
v. How are low scores on the test related to testtakers’ behavior?
vi. How do scores on this test relate to scores on other tests purporting to measure the same construct?
vii. How do scores on this test relate to scores on other tests purporting to measure the opposite construct?
Other Considerations
• A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty.
• A good test is a useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large.

If the purpose of a test is to compare the performance of the testtaker with the
performance of other testtakers, then a “good test” is one that contains adequate
norms. Also referred to as normative data, norms provide a standard with which the
results of measurement can be compared.
Norms
Norm-referenced testing and assessment may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to the scores of a group of testtakers.

In a psychometric context, norms are the test performance data of a particular group
of testtakers that are designed for use as a reference when evaluating or interpreting
individual test scores.

A normative sample is that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers.

Norming refers to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation.
Norming a test, especially with the participation of a nationally
representative normative sample, can be a very expensive
proposition. For this reason, some test manuals provide what are
variously known as user norms or program norms, which “consist
of descriptive statistics based on a group of testtakers in a given
period of time rather than norms obtained by formal sampling
methods”
Sampling to Develop Norms
The process of administering a test to a representative sample of testtakers for
the purpose of establishing norms is referred to as standardization or test
standardization.

A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.

Sampling
In the process of developing a test, a test developer has targeted some defined
group as the population for which the test is designed. This population is the
complete universe or set of individuals with at least one common, observable
characteristic.
Sampling to Develop Norms
The test developer can obtain a distribution of test responses by administering
the test to a sample of the population—a portion of the universe of people
deemed to be representative of the whole population.

The process of selecting the portion of the universe deemed to be representative of the whole population is referred to as sampling.

A test developer may first stratify the population, dividing it into subgroups (strata) so that each is proportionately represented in the sample. If selection within each stratum were random (that is, if every member of the population had the same chance of being included in the sample), then the procedure would be termed stratified-random sampling.
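A stratified-random draw can be sketched as follows. The population of students, the grade-level strata, and the sizes below are all invented for illustration.

```python
import random

random.seed(1)

# Stratified-random sampling sketch: the population is divided into
# strata (hypothetical school grades), and members are drawn at random
# from each stratum in proportion to the stratum's share of the population.
population = {
    "grade_4": [f"g4_{i}" for i in range(600)],
    "grade_5": [f"g5_{i}" for i in range(300)],
    "grade_6": [f"g6_{i}" for i in range(100)],
}
total = sum(len(members) for members in population.values())
sample_size = 50

sample = []
for stratum, members in population.items():
    k = round(sample_size * len(members) / total)  # proportional allocation
    sample.extend(random.sample(members, k))       # random draw within the stratum

print(len(sample))  # 50: 30 from grade_4, 15 from grade_5, 5 from grade_6
```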
Sampling to Develop Norms
Two other types of sampling procedures are purposive sampling and incidental
sampling. If we arbitrarily select some sample because we believe it to be
representative of the population, then we have selected what is referred to as
a purposive sample.

DIFFERENT SAMPLING PROCEDURES

Probability sampling is a type of sampling in which every member of the population of interest has an equal chance of being included in the sample.

Non-probability sampling is a type of sampling in which not every member of the population has an equal chance of being chosen for the sample.
Different Sampling Procedures

Probability Sampling:
• Simple Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Clustered Random Sampling

Non-Probability Sampling:
• Purposive Sampling
• Snowball Sampling
• Convenience or Incidental Sampling
• Quota Sampling
Developing Norms for A Standardized Test
Having obtained a sample, the test developer administers
the test according to the standard set of instructions that
will be used with the test. The test developer also
describes the recommended setting for giving the test.
Types of Norms

1. Percentile Norms
2. Age Norms
3. Grade Norms
4. National Norms
5. National Anchor Norms
6. Subgroup Norms
7. Local Norms
Types of Norms

1. Percentiles
A percentile is an expression of the percentage of people whose score on a test or
measure falls below a particular raw score.

Intimately related to the concept of a percentile as a description of performance on a test is the concept of percentage correct.

A percentile is a converted score that refers to a percentage of testtakers.

Percentage correct refers to the distribution of raw scores—more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.
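The distinction can be made concrete with a short sketch, using hypothetical raw scores for ten testtakers: a percentile locates a score within a group, while percentage correct describes a score against the test itself.

```python
# Percentile vs. percentage correct, using hypothetical raw scores.
scores = [12, 15, 15, 18, 20, 22, 25, 27, 28, 30]  # raw scores of 10 testtakers

def percentile_rank(raw, distribution):
    """Percentage of testtakers whose score falls below the given raw score."""
    below = sum(1 for s in distribution if s < raw)
    return 100 * below / len(distribution)

def percentage_correct(n_correct, n_items):
    """Items answered correctly, multiplied by 100 and divided by total items."""
    return 100 * n_correct / n_items

print(percentile_rank(25, scores))  # 60.0: 6 of the 10 testtakers scored below 25
print(percentage_correct(25, 30))   # about 83.3: 25 of 30 items answered correctly
```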
Types of Norms
2. Age Norms
Also known as age-equivalent scores, age norms indicate the average performance of
different samples of testtakers who were at various ages at the time the test was
administered.

3. Grade Norms
Designed to indicate the average test performance of testtakers in a given school grade, grade
norms are developed by administering the test to representative samples of children over a
range of consecutive grade levels

Both grade norms and age norms are referred to more generally as developmental norms, a
term applied broadly to norms developed on the basis of any trait, ability, skill, or other
characteristic that is presumed to develop, deteriorate, or otherwise be affected by
chronological age, school grade, or stage of life.
Types of Norms
4. National Norms
As the name implies, national norms are derived from a normative sample that was nationally
representative of the population at the time the norming study was conducted.

5. National Anchor Norms
An equivalency table for scores on two different tests, or national anchor norms, could provide the tool for such a comparison. Just as an anchor provides some stability to a vessel, so national anchor norms provide some stability to test scores by anchoring them to other test scores.

Using the equipercentile method, the equivalency of scores on different tests is calculated
with reference to corresponding percentile scores.
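A minimal sketch of equipercentile equating: a score on Test A is mapped to the Test B score occupying the same percentile position. The two score distributions below are hypothetical, and the index arithmetic is a simplification of the tabled procedure.

```python
# Equipercentile equating sketch: scores on two tests are declared
# equivalent when they fall at the same percentile in their respective
# distributions. Both distributions here are hypothetical.
test_a = sorted([40, 45, 50, 55, 60, 65, 70, 75, 80, 85])
test_b = sorted([18, 20, 22, 24, 26, 28, 30, 32, 34, 36])

def equate(score_a, dist_a, dist_b):
    """Map a Test A score to the Test B score at the same percentile position."""
    rank = sum(1 for s in dist_a if s <= score_a) / len(dist_a)
    index = min(int(rank * len(dist_b)), len(dist_b)) - 1
    return dist_b[max(index, 0)]

print(equate(70, test_a, test_b))  # 30: the Test B score at the same percentile as 70
```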
Types of Norms
6. Subgroup Norms
A normative sample can be segmented by any of the criteria initially used in selecting subjects
for the sample. What results from such segmentation are more narrowly defined subgroup
norms.

7. Local Norms
Typically developed by test users themselves, local norms provide normative information with
respect to the local population’s performance on some test.
Fixed Reference Group Scoring Systems
Norms provide a context for interpreting the meaning of a test score.
Another type of aid in providing a context for interpretation is termed a
fixed reference group scoring system.

Here, the distribution of scores obtained on the test from one group of testtakers—referred to as the fixed reference group—is used as the basis for the calculation of test scores for future administrations of the test.
Norm-Referenced Versus Criterion-Referenced Evaluation
One way to derive meaning from a test score is to evaluate the test score in relation to
other scores on the same test. This approach to evaluation is referred to as norm-
referenced.

Another way to derive meaning from a test score is to evaluate it on the basis of
whether or not some criterion has been met. We may define a criterion as a standard
on which a judgment or decision may be based.

Criterion-referenced testing and assessment may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.
Norm-Referenced Versus Criterion-Referenced Evaluation
The criterion in criterion-referenced assessments typically derives from the
values or standards of an individual or organization.

Because the focus in the criterion-referenced approach is on how scores relate to a particular content area or domain, the approach has also been referred to as domain- or content-referenced testing and assessment.
Norm-Referenced Versus Criterion-Referenced Evaluation
In norm-referenced interpretations of test data, a usual area of focus is how an
individual performed relative to other people who took the test.

In criterion-referenced interpretations of test data, a usual area of focus is the testtaker’s performance:
• what the testtaker can or cannot do;
• what the testtaker has or has not learned;
• whether the testtaker does or does not meet specified criteria for inclusion in some
group, access to certain privileges, and so forth.

Because criterion-referenced tests are frequently used to gauge achievement or mastery, they are sometimes referred to as mastery tests.
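The two interpretive approaches can be contrasted on one hypothetical raw score. The norm group and the mastery cutoff below are invented for illustration; real criteria would come from the values or standards of an individual or organization.

```python
# One raw score, two interpretations (all data hypothetical).
raw_score = 42                                            # score on a 50-item test
peer_scores = [30, 35, 38, 40, 41, 43, 45, 46, 48, 50]   # hypothetical norm group
mastery_cutoff = 40                                       # hypothetical preset criterion

# Norm-referenced: where does the score stand relative to other testtakers?
percentile = 100 * sum(1 for s in peer_scores if s < raw_score) / len(peer_scores)

# Criterion-referenced: does the score meet the standard set in advance?
mastery = raw_score >= mastery_cutoff

print(f"{percentile:.0f}th percentile")          # 50th percentile
print("mastery" if mastery else "non-mastery")   # mastery
```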
Culture and Inference
It is incumbent upon responsible test users not to lose sight of culture
as a factor in test administration, scoring, and interpretation.

So, in selecting a test for use, the responsible test user does some
advance research on the test’s available norms to check on how
appropriate they are for use with the targeted testtaker population.

In interpreting data from psychological tests, it is frequently helpful to know about the culture of the testtaker, including something about the era or “times” that the testtaker experienced.
