Week 5: Of Tests and Testing

Overview:
1. The chapter discusses the assumptions and myths of psychological testing, including that traits and states exist and can be measured, that test behavior predicts non-test behavior, and that tests have strengths and weaknesses.
2. It explains the characteristics of a good test, including clear instructions, economy of time and money, and reliability and validity of measurement.
3. It also covers norms, sources of error in testing, and the goal of conducting testing and assessment in a fair and unbiased manner that benefits society.


Chapter 4: Of Tests and Testing

Objectives:
After the completion of the chapter, students should be able to:
1. Discuss the myths and realities in psychological assessment
2. Understand the psychometric properties of a good test
3. Understand the nature of norms
Chapter Topics:
A. Some assumptions about psychological testing and assessment
B. What’s a good test?
C. Norms

A. Some Assumptions about Psychological Testing and Assessment


Assumption 1: Psychological Traits and States Exist
Traits  A trait has been defined as “any distinguishable, relatively
enduring way in which one individual varies from
another.”
 A trait is considered to be part of an individual’s
personality and therefore a long-term characteristic of an
individual that shows through their behavior, actions, and
feelings. It is seen as a characteristic, feature, or quality of
an individual.
 For example, someone who says “I am a confident person”
or “I am just an anxious person” is stating that these
attributes are part of who they are.
States  Also distinguish one person from another but are relatively
less enduring.
 A state, on the other hand, is a temporary condition that a
person experiences for a short period of time. After the
state has passed, they return to another condition.
 For example, someone who says “I am feeling quite
confident about this interview” or “I feel nervous about
doing this” is describing a state.

Psychological Trait  The term psychological trait, much like the term trait alone,
covers a wide range of possible characteristics.
 Among them are psychological traits that relate to
intelligence, specific intellectual abilities, cognitive style,
adjustment, interests, attitudes, sexual orientation and
preferences, psychopathology, personality in general, and
specific personality traits.
Construct  A psychological trait exists only as a construct
 It is an informed, scientific concept developed or
constructed to describe or explain behavior.
Overt Behavior  Refers to an observable action or the product of an
observable action, including test- or assessment-related
responses.
Assumption 2: Psychological Traits and States Can Be Quantified and Measured
 Measuring traits and states by means of a test entails
developing not only appropriate test items but also
appropriate ways to score the test and interpret the
results.
 For many varieties of psychological tests, some number
representing the score on the test is derived from the
examinee’s responses.
Cumulative Scoring  Is based on the assumption that the more often the test
taker responds in a particular direction keyed by the test
manual as correct or consistent with a particular trait,
ability, or state, the higher the test taker is presumed to
be on the targeted trait, ability, or state.
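As a concrete illustration of the cumulative scoring model, here is a minimal Python sketch using a hypothetical five-item answer key (not the scoring procedure of any published test): each response that matches the keyed direction adds one point, and the raw score is the sum.

```python
# Minimal sketch of cumulative scoring on a hypothetical five-item test.
# Each response matching the keyed direction contributes one point;
# the raw score is simply the sum of keyed responses.
answer_key = ["T", "F", "T", "T", "F"]   # hypothetical keyed responses
responses = ["T", "F", "F", "T", "F"]    # one test taker's answers

raw_score = sum(1 for resp, key in zip(responses, answer_key) if resp == key)
print(f"Raw score: {raw_score} out of {len(answer_key)}")  # Raw score: 4 out of 5
```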
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
 The objective of the test is to provide some indication of
other aspects of the examinee’s behavior.
 The tasks in some tests mimic the actual behaviors that
the test user is attempting to understand. By their nature,
however, such tests yield only a sample of the behavior
that can be expected to be emitted under non-test
conditions. The obtained sample of behavior is typically
used to make predictions about future behavior, such as the
work performance of a job applicant.
Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
 Competent test users understand and appreciate the
limitations of the tests they use, as well as how those
limitations might be compensated for by data from other
sources. The requirement that test users know the tests
they use and are aware of the tests’ limitations is
emphasized repeatedly in the codes of ethics of
associations of assessment professionals.
Assumption 5: Various Sources of Error are Part of the Assessment Process
Error  In everyday conversation, we use the word error to refer to
mistakes, miscalculations, and the like.
 In the context of assessment, however, error refers to the
long-standing assumption that factors other than what a
test attempts to measure will influence performance on
the test.
Error Variance  The component of a test score attributable to sources
other than the trait or ability measured.
 The element of variability in a score that is produced by
extraneous factors, such as measurement imprecision,
and is not attributable to the independent variable or other
controlled experimental manipulations.
Classical or True Score Theory  It is a theory of testing based on the idea that a person's
observed or obtained score on a test is the sum of a true
score (error-free score) and an error score. Generally
speaking, the aim of classical test theory is to understand
and improve the reliability of psychological tests.
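In symbols, the classical model is commonly summarized by the equations below (a standard statement of the theory rather than a formula given in this handout; the variance decomposition holds under the usual assumption that true scores and error are uncorrelated):

X = T + E
σ²(X) = σ²(T) + σ²(E)

where X is the observed (obtained) score, T is the true (error-free) score, and E is the error component.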

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
 Today, all major test publishers strive to develop
instruments that are fair when used in strict accordance
with guidelines in the test manual.
 However, despite the best efforts of many professionals,
fairness-related questions and problems do occasionally
arise. One source of fairness-related problems is the test
user who attempts to use a particular test with people
whose background and experience are different from the
background and experience of people for whom the test
was intended.
Assumption 7: Testing and Assessment Benefit Society
 A world without tests would most likely be more a
nightmare than a dream.
 Without tests or other assessment procedures, people
could present themselves as surgeons, bridge builders, or
airline pilots regardless of their background, ability, or
professional credentials, and decisions could be made on
the basis of nepotism rather than demonstrated competence.

B. What is a “Good Test?”


 Logically, the criteria for a good test would include clear
instructions for administration, scoring, and
interpretation. It would also seem to be a plus if a test
offered economy in the time and money it took to
administer, score, and interpret it. Most of all, a good test
would seem to be one that measures what it purports to
measure.
 Beyond simple logic, there are technical criteria that
assessment professionals use to evaluate the quality of
tests and other measurement procedures. Test users often
speak of the psychometric soundness of tests, two key
aspects of which are reliability and validity.
Reliability  The criterion of reliability involves the consistency of the
measuring tool: the precision with which the test
measures and the extent to which error is present in
measurements. In theory, the perfectly reliable measuring
tool consistently measures in the same way.
Validity  A test is considered valid for a particular purpose if it does,
in fact, measure what it purports to measure.
Other Considerations  A good test is one that trained examiners can
administer, score, and interpret with a minimum of
difficulty. A good test is a useful test; one that yields
actionable results that will ultimately benefit individual
testtakers or society at large. In “putting a test to the test,”
there are a number of different ways to evaluate just how
good a test really is (see this chapter’s Everyday
Psychometrics).
 If the purpose of a test is to compare the performance of
the testtaker with the performance of other testtakers, a
good test is one that contains adequate norms.

C. Norms
Norms  Are the test performance data of a particular group of
testtakers that are designed for use as a reference when
evaluating or interpreting individual test scores.
Norm-referenced testing and assessment  A method of evaluation and a way of deriving meaning
from test scores by evaluating an individual testtaker’s
score and comparing it to scores of a group of testtakers.
In this approach, the meaning of an individual test score is
understood relative to other scores on the same test. A
common goal of norm-referenced tests is to yield
information on a testtaker’s standing or ranking relative to
some comparison group of testtakers.

Normative sample  Is that group of people whose performance on a
particular test is analyzed for reference in evaluating the
performance of individual testtakers.
Norming  Refers to the process of deriving norms. Norming may be
modified to describe a particular type of norm derivation.
 Race Norming
 Is the controversial practice of norming on the basis of
race or ethnic background. Race norming was once
engaged in by some government agencies and private
organizations, and the practice resulted in the
establishment of different cutoff scores for hiring by
cultural group.
User norms or program norms  “Consist of descriptive statistics based on a group of
testtakers in a given period of time rather than norms
obtained by formal sampling methods.”
Sampling to Develop Norms
Standardization or Test Standardization  The process of administering a test to a representative
sample of testtakers for the purpose of establishing norms.
 A test is said to be standardized when it has clearly
specified procedures for administration and scoring,
typically including normative data. To understand how
norms are derived, an understanding of sampling is
necessary.

Sample  A portion of the universe of people deemed to be
representative of the whole population.
Sampling  The process of selecting the portion of the universe
deemed to be representative of the whole population.
 Stratified Sampling
 A method of sampling that involves the division of a
population into smaller sub-groups known as strata.
In stratified random sampling, or stratification, the
strata are formed based on members' shared
attributes or characteristics such as income or
educational attainment.
 Stratified-random Sampling
 If such sampling were random (that is, if every
member of the population had the same chance of
being included in the sample), then the procedure
would be termed stratified-random sampling.
 Purposive Sample
 (Also known as judgment, selective, or subjective
sampling) is a sampling technique in which the
researcher relies on his or her own judgment when
choosing members of the population to participate in the
study.
 Incidental Sample or Convenience Sample
 Is a type of non-probability sampling method where
the sample is taken from a group of people easy to
contact or to reach. For example, standing at a mall or
a grocery store and asking people to answer
questions would be an example of a convenience
sample.
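To make the sampling distinctions above more concrete, here is a minimal Python sketch of stratified-random sampling; the population, the stratum labels, and the per-stratum sample size are all hypothetical choices made for illustration.

```python
import random

# Hypothetical population of 1,000 people, each tagged with a stratum label
# (the stratum could be educational attainment, income bracket, region, etc.).
population = [(person_id, random.choice(["HS", "College", "Graduate"]))
              for person_id in range(1000)]

def stratified_random_sample(pop, per_stratum, seed=0):
    """Draw `per_stratum` members at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person_id, stratum in pop:
        strata.setdefault(stratum, []).append(person_id)
    return {stratum: rng.sample(members, min(per_stratum, len(members)))
            for stratum, members in strata.items()}

sample = stratified_random_sample(population, per_stratum=20)
for stratum, members in sample.items():
    print(stratum, "->", len(members), "people sampled")
```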

Types of Norms
Percentile Norms  Are the raw data from a test’s standardization sample
converted to percentile form.
 Percentile
 Is an expression of the percentage of people whose
score on a test or measure falls below a particular raw
score.
 Percentage Correct
 Refers to the distribution of raw scores—more
specifically, to the number of items that were
answered correctly multiplied by 100 and divided by
the total number of items.
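The following minimal Python sketch (with hypothetical raw scores) contrasts a percentile, which locates a score within the normative sample, with percentage correct, which merely reports the proportion of items answered correctly.

```python
# Hypothetical normative sample of raw scores on a 50-item test.
norm_scores = [22, 25, 28, 30, 31, 33, 35, 36, 38, 40, 41, 43, 45, 46, 48]

def percentile_rank(score, norms):
    """Percentage of scores in the normative sample falling below `score`."""
    below = sum(1 for s in norms if s < score)
    return 100 * below / len(norms)

def percentage_correct(num_correct, num_items):
    """Items answered correctly, expressed as a percentage of all items."""
    return 100 * num_correct / num_items

raw = 40
print(percentile_rank(raw, norm_scores))   # 60.0 -> roughly the 60th percentile
print(percentage_correct(raw, 50))         # 80.0 -> 80% of items answered correctly
```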
Age Norms  Also known as age-equivalent scores
 Indicate the average performance of different samples of
testtakers who were at various ages at the time the test
was administered.
Grade Norms  Are developed by administering the test to representative
samples of children over a range of consecutive grade
levels (such as first through sixth grades).
 Developmental Norms
 A term applied broadly to norms developed on the basis
of any trait, ability, skill, or other characteristic that is
presumed to develop, deteriorate, or otherwise be
affected by chronological age, school grade, or stage of
life.

National Norms  Are derived from a normative sample that was nationally
representative of the population at the time the
norming study was conducted. In the fields of
psychology and education, for example, national norms
may be obtained by testing large numbers of people
representative of different variables of interest such as
age, gender, racial/ethnic background, socioeconomic
strata, geographical location (such as North, East, South,
West, Midwest), and different types of communities within
the various parts of the country (such as rural, urban,
suburban).
National Anchor Norms  Provide a tool for comparing scores on one test with
scores on another test.
 Just as an anchor provides some stability to a vessel, so
national anchor norms provide some stability to test
scores by anchoring them to other test scores.
Local Norms  Provide normative information with respect to the local
population’s performance on some test.
 For example, a local company’s personnel director might find
some nationally standardized test useful in making
selection decisions but might deem the norms published in
the test manual to be far afield of local job applicants’
score distributions.
Subgroup Norms  A normative sample can be segmented by any of the
criteria initially used in selecting subjects for the sample.
What results from such segmentation are more
narrowly defined subgroup norms.
Fixed Reference Group Scoring Systems
Fixed Reference Group Scoring Systems  The distribution of scores obtained on the test from one
group of testtakers—referred to as the fixed reference
group—is used as the basis for the calculation of test
scores for future administrations of the test.
 Perhaps the test most familiar to college students that
exemplifies the use of a fixed reference group scoring
system is the SAT.
Norm-Referenced versus Criterion-Referenced Evaluation
Criterion  Standard on which a judgment or decision may be based.
Criterion-Referenced Testing and Assessment  May be defined as a method of evaluation and a way of
deriving meaning from test scores by evaluating an
individual’s score with reference to a set standard.
Some examples:
 To be eligible for a high-school diploma, students
must demonstrate at least a sixth-grade reading level.
 To earn the privilege of driving an automobile, would-
be drivers must take a road test and demonstrate their
driving skill to the satisfaction of a state-appointed
examiner.
 To be licensed as a psychologist, the applicant must
achieve a score that meets or exceeds the score
mandated by the state on the licensing test.
Norm-Referenced Test  Refers to standardized tests that are designed to
compare and rank test takers in relation to one
another. Norm-referenced tests report whether test takers
performed better or worse than a hypothetical average
student, which is determined by comparing scores against
the performance results of a statistically selected group of
test takers, typically of the same age or grade level, who
have already taken the exam.
 On a norm-referenced test, an individual’s percentile rank
is calculated according to the performance of their peers.
 For example, a raw score of 40 might correspond to the
97th percentile, meaning the testtaker scored higher than
97% of the comparison group.

Correlation and Inferences


Correlation  Is an expression of the degree and direction of
correspondence between two things.
Coefficient of Correlation  The coefficient of correlation is the numerical index that
expresses this relationship: It tells us the extent to which
X and Y are “co-related.”
Pearson r  Also known as the Pearson correlation coefficient and
the Pearson product-moment coefficient of correlation.
 Most widely used to measure correlation.
 The formula used to calculate a Pearson r from raw scores
is:
r = [N(ΣXY) − (ΣX)(ΣY)] / √{[N(ΣX²) − (ΣX)²][N(ΣY²) − (ΣY)²]}
Coefficient of Determination  The coefficient of determination (r²) is an indication of how
much variance is shared by the X- and the Y-variables.
 The calculation of r2 is quite straightforward. Simply
square the correlation coefficient and multiply by 100; the
result is equal to the percentage of the variance accounted
for.
 For example, if you calculated r to be .90, then r² would be
equal to .81. The number .81 tells us that 81% of the
variance is accounted for by the X- and Y-variables. The
remaining variance, equal to 100(1−r2), or 19%, could
presumably be accounted for by chance, error, or
otherwise unmeasured or unexplainable factors.
Spearman Rho  Also called the rank-order correlation coefficient, the
rank-difference correlation coefficient, or simply
Spearman’s rho.
 This coefficient of correlation is frequently used when the
sample size is small (fewer than 30 pairs of
measurements) and especially when both sets of
measurements are in ordinal (or rank-order) form.
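As a worked illustration of the coefficients above, here is a minimal Python sketch using hypothetical paired scores; NumPy and SciPy are assumed to be available (they are not mentioned in the chapter), and the data are invented for demonstration only.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements, e.g., scores on two different tests.
x = np.array([10, 12, 15, 17, 20, 22, 25, 27])
y = np.array([8, 11, 14, 15, 21, 20, 26, 28])

pearson_r, _ = stats.pearsonr(x, y)      # degree and direction of linear relationship
r_squared = pearson_r ** 2               # proportion of variance shared by X and Y
spearman_rho, _ = stats.spearmanr(x, y)  # rank-order (ordinal) correlation

print(f"Pearson r       = {pearson_r:.2f}")
print(f"r^2             = {r_squared:.2f}")  # proportion of variance accounted for
print(f"Spearman's rho  = {spearman_rho:.2f}")
```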
Graphic Representations of Correlation
Scatterplot  A scatterplot is a simple graphing of the coordinate
points for values of the X-variable (placed along the
graph’s horizontal axis) and the Y-variable (placed along
the graph’s vertical axis).
 Scatterplots are useful because they provide a quick
indication of the direction and magnitude of the
relationship, if any, between the two variables.
Regression  May be defined broadly as the analysis of relationships
among variables for the purpose of understanding how
one variable may predict another.
 Simple Regression
 Involves one independent variable (X), typically referred
to as the predictor variable, and one dependent variable
(Y), typically referred to as the outcome variable.
 Multiple Regression
 The use of more than one score to predict Y requires
the use of a multiple regression equation.
 Equation takes into account the intercorrelations among
all the variables involved. The correlation between each
of the predictor scores and what is being predicted is
reflected in the weight given to each predictor.
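Below is a minimal sketch of simple regression with hypothetical predictor and outcome scores: a least-squares line Y′ = a + bX is fitted and then used to predict the outcome for a new predictor value (multiple regression extends this by adding further predictors, each with its own weight).

```python
import numpy as np

# Hypothetical predictor (X) and outcome (Y) scores for six people.
X = np.array([2, 4, 5, 7, 9, 11])
Y = np.array([50, 57, 60, 68, 74, 80])

# Least-squares fit of the simple regression line Y' = a + bX.
b, a = np.polyfit(X, Y, deg=1)  # polyfit returns [slope, intercept] for deg=1

def predict(x_new):
    """Predicted outcome score for a new predictor value."""
    return a + b * x_new

print(f"Y' = {a:.2f} + {b:.2f}X")
print(f"Predicted Y for X = 8: {predict(8):.1f}")
```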
Meta-Analysis  Refers to a family of techniques used to statistically
combine information across studies to produce single
estimates of the statistics being studied.
 A key advantage of meta-analysis over simply reporting a
range of findings is that, in meta-analysis, more weight can
be given to studies that have larger numbers of subjects.

References:
Cohen, R. J., & Swerdlik, M. E. (2009). Psychological Testing and Assessment: An Introduction to
Tests and Measurement (7th ed.).
