PSYCH ASSESSMENT Notes
TESTING vs. ASSESSMENT
● OBJECTIVE. Testing: to obtain a numerical gauge of an ability or attribute. Assessment: to answer a referral question, solve a problem, or make a decision through evaluation.
● ROLE OF EVALUATOR. Testing: the tester is not key to the process. Assessment: the assessor is key to the process of selecting tests and other tools of evaluation.
● OUTCOME. Testing: yields a score or series of test scores. Assessment: entails a logical problem-solving approach that brings to bear many sources of data.
● DURATION. Testing: shorter; a few minutes to hours. Assessment: longer; a few hours to days or more.
● SOURCES OF DATA. Testing: one person, the test-taker. Assessment: often includes collateral sources such as relatives or friends.
● QUALIFICATION. Testing: knowledge of tests and testing procedures (e.g., psychometricians). Assessment: knowledge of testing, assessment methods, and the specialty area being assessed.
INDIVIDUAL DIFFERENCES
● Charles Darwin. This revolutionary notion aroused interest in the comparison between humans and animals.
● Francis Galton (1869). The classification of people according to their “natural gifts” and the attempt to ascertain their “deviation from an average” are said to be the precursors of contemporary tools. He also pioneered the use of the coefficient of correlation.

WAYS TO CLASSIFY TESTS
● According to the qualification and training of the test user
● According to the number of test takers
● According to the variable being measured

QUALIFICATION AND TRAINING OF TEST USER
● This includes the three-tier classification of psychological tests:
  ○ Type A includes achievement tests.
  ○ Type B includes group intelligence tests and objective personality tests.
  ○ Type C includes individual intelligence tests, projective tests, and diagnostic tests.

NUMBER OF TEST TAKERS
● Individual Test
● Group Test

VARIABLE BEING MEASURED
● The variable being measured can be either ability or typical performance.
● Ability is measured through 3 different types of tests: achievement, aptitude, and intelligence tests.
● Typical performance is measured through personality inventories (objective and projective), interest, and values.

ABILITY
1. Achievement Tests
   Achievement tests measure previous learning.
2. Aptitude Tests
   Aptitude tests measure an individual’s potential for learning or acquiring a specific skill.
   When is aptitude testing needed?
   ○ When the selection ratio (no. of people needed / no. of applicants) is low (e.g., 5 openings for 100 applicants gives a selection ratio of 0.05).
   ○ When the success ratio (no. of those capable among those selected / no. of accepted applicants) is low.
3. Intelligence Tests
   Intelligence tests measure problem-solving abilities, adaptation to change, and abstract thinking.
   Pros of intelligence testing include:
   ○ IQ predicts success in a variety of areas better than any other measure.
   ○ It provides a standardized way of comparing a child’s performance with that of other children.
   ○ It provides a profile of cognitive strengths and weaknesses.
   Cons of intelligence testing include:
   ○ IQs are misused as a measure of innate capacity.
   ○ A single IQ does not do justice to the multidimensional nature of intelligence.
   ○ IQs are of limited value in predicting non-test and non-academic intellectual ability.
   ○ It can be culturally biased against ethnic minorities.
   ○ Nonconventional, original, or novel responses are penalized on intelligence tests.
   Misconceptions of IQ testing include:
   ○ It measures innate intelligence.
   ○ IQs are fixed, immutable, and never change.
   ○ It provides perfectly reliable scores.
   ○ It measures everything we need to know about a person’s intelligence.
   ○ All intelligence tests measure the same thing.
   ○ IQs obtained from different tests are interchangeable.
   ○ A battery of tests can tell us everything we need to know to make judgments about a person’s competence.

TYPICAL PERFORMANCE
1. Personality Inventory
   Personality inventories include the projective and objective personality tests.
   Assumptions of personality inventories:
   ○ It assumes that each person is consistent to some extent; we have coherent traits and active patterns that arise repeatedly.
   ○ It assumes that each person is distinctive to some extent; behavioral differences exist between individuals.
   Personality inventories have construction strategies:
   ○ Theory-Guided Inventories are constructed around a theory of personality.
   ○ Factor-Analytically Derived Inventories
   ○ Criterion-Keyed Inventories
   Objective personality inventories are standardized, have a limited number of responses, and have high reliability and validity.
   Projective personality inventories measure wishes, intrapsychic conflicts, desires, and unconscious motives. They rely on subjective interpretation and clinical judgment, and they have low reliability and validity.
2. Interest
   Interest is a feeling or preference concerning activities, ideas, and objects. It is of high variability, engaged in for a goal, highly changeable, and can be correlated with personality traits.
   Interest inventories…
   ○ measure the direction and strength of interest.
   ○ assume that even though interests are unstable, they have a certain stability, or else they could not be measured.
   ○ Stability is said to start at 17 years old.
   ○ Broad lines of interest are more stable, while specific lines of interest are more unstable.
3. Attitude
   ○ Attitude is a learned predisposition to respond positively or negatively to a certain subject.
   ○ The approval or disapproval (moral judgment) is what differentiates it from interest.
   ○ It consists of three aspects (Affective, Behavioral, and Cognitive).
   Attitude inventories are…
   ○ Direct observation of how a person behaves in relation to certain things
   ○ Attitude questionnaires or scales (Bogardus Social Distance Scale, 1925)
   ○ Reliability is good but not as high as that of tests of ability.
   ○ Attitude measures have not generally correlated very highly with actual behavior.
   ○ Specific behaviors, however, can be predicted from measures of attitude toward the specific behavior.
4. Values
   Values define what a person thinks is important and refer to the importance, utility, or worth attached to particular activities and objectives.
   Values inventories…
   ○ purport to measure generalized and dominant interests.
   ○ Validity is extremely difficult to determine by statistical methods.
   ○ The only observable criterion is overt behavior.
   ○ Employed less frequently than interest inventories in vocational counseling and career decision-making.

SELF-REPORT INVENTORIES
● Self-report relies upon the test taker’s awareness and honesty.
● It is the best method to measure internal states; things only the person themselves can be aware of and judge.

OTHER INFO
● Projective and objective personality tests differ in the following: definiteness of task, response mechanism, product, analysis of results, and emphasis on critical validation.
● People are not always good judges of their ability.
● Self-report provides an estimate.
● Clinicians include self-report measures as part of their initial examinations of presenting clients.
● Self-report measures are frequently subject to self-censorship.
● People know their responses are being measured and wish to be seen in a favorable light (self-serving bias).
● Items are frequently included to measure the extent to which people provide socially desirable responses.

SETTINGS FOR PSYCHOLOGICAL ASSESSMENT

EDUCATION SETTING
● Intelligence tests and achievement tests are used from an early age.
● From kindergarten on, tests are used for placement and advancement.
● Educational institutions have to make admissions and advancement decisions regarding students.
● Used to assess students for special education programs; also used in diagnosing learning difficulties.
● Guidance counselors use instruments for advising students.
● Used to investigate the school curriculum.

PERSONNEL SETTING
● Tests are used to assess: training needs, workers’ performance in training, success in training programs, management development, leadership training, and selection.
● For example, the Myers-Briggs Type Indicator is used extensively to assess managerial potential. Type testing is used in the hope of matching the right person with the job they are most suited for.

MILITARY SETTING
● For proper selection of military recruits and proper placement in military duties.

GERIATRIC SETTING
● Assessment for the aged.

GOVERNMENT AND ORGANIZATIONAL CREDENTIALING
● Promotional purposes
● Licensing
● Certification
● General credentialing of professionals

FORENSIC SETTING
● Evaluate the mental health of people charged with crime/s
● Investigating malingering cases in court
● Criminal profiling
● Making child custody/annulment/divorce decisions

CLINICAL SETTING
● Tests of psychological adjustment and tests which can classify and/or diagnose patients are used extensively.
● Psychologists generally use a number of objective and projective personality tests.
● Neuropsychological tests, which examine basic mental functions, also fall into this category. Perceptual tests are used for detecting and diagnosing brain damage.
● “Tests do not diagnose, people do!”

USES OF TESTS
○ Formative evaluation refers to evaluation conducted during or after instruction.
○ Summative evaluation refers to evaluation conducted at the end of a unit or a specified period of time.

RESEARCH
● For example, a neuropsychologist wished to investigate the hypothesis that low-level lead absorption causes behavior deficits in children.

STEPS IN CLINICAL PSYCHOLOGICAL ASSESSMENT
1. Deciding what is being assessed
2. Determining the goals of assessment
3. Selecting standards for making decisions
4. Collecting assessment data
5. Making decisions and judgments
6. Communicating results
AFFECT
● Visible moment-to-moment emotional tone that can be observed
● Based on nonverbal behavior and outward expression of emotion
● Content or type: facial expression, eye contact, tone of voice, body posture, movement
● Range and duration
● Appropriateness: evaluated based on speech content and the context of the subject or life situation
● Depth or intensity
● May or may not be congruent with mood

MOOD
● Pervasive and sustained emotion that colors the person’s perception of the world
● Internal, subjective, verbal self-report of mood state

SPEECH
● Rate, volume, quantity, quality
● Provides a basis to evaluate thought process and content

THOUGHT PROCESS
● The client’s form of thinking, the way in which ideas and associations are put together: logical, coherent, illogical, incomprehensible
  ○ Overabundance or poverty of ideas
  ○ Rapid thinking, flight of ideas, slow thinking
  ○ Vague or empty, relevant or irrelevant
  ○ Goal-directed thinking, clear cause-and-effect relations in the patient’s explanations
  ○ Loose associations, tangentiality, circumstantiality
  ○ Neologisms, word salad, clang associations

THOUGHT CONTENT

SENSORIUM
● Assesses brain function, including intelligence, capacity for abstract thought, and level of insight and judgment
  ○ Consciousness
  ○ Orientation
  ○ Concentration & attention
  ○ Memory
  ○ Reading & writing
  ○ Visuospatial ability
  ○ Abstract thinking
  ○ Fund of knowledge

ORIENTATION
● Person, place, time, situation (O x 4)
● Assesses confusion or disorientation, which is usually associated with an organic process
● Loss of awareness usually occurs in this sequence: situation, sense of time, sense of place, identity (person)

CONSCIOUSNESS
● Assessed along a continuum from alert to comatose
● Disturbances usually indicate organic brain impairment
  ○ Alert, confused, clouded, stuporous, unconscious

MEMORY
● The ability to recall experiences.
● Types of memory:
  ○ Remote memory (e.g., childhood) can be integrated into the psychosocial history portion of the intake
  ○ Recent past memory (e.g., recall of important events from the past few months)
  ○ Recent memory (e.g., dinner the previous evening)
  ○ Immediate memory (retention and recall of information upon exposure)
● Concentration and attention (effort given to focusing on parts of an experience)
● Reading and writing
● Visuospatial ability
● Abstract thinking is the ability to deal with concepts, explain similarities between objects, and interpret simple proverbs
● Fund of Knowledge. Intelligence is related to vocabulary and general fund of knowledge
● Take into account the patient’s educational level and socioeconomic status

TOPIC 5: Nature and Uses of Psychological Tests
Psychological Assessment (PSY 9) SEM 1
SIMILARITIES
● All psychological tests require an individual to perform a behavior.
● The behavior performed is used to measure some personal attribute, trait, or characteristic.
● The behavior performed may also be used to predict outcomes.

DIFFERENCES
● The behavior they require the test taker to perform
● The attribute they measure
● How they are administered and formatted
RAW SCORES
● Raw score is the numerical description of
the performance of an individual. Raw scores
obtained via psychological tests are
commonly interpreted by reference to norms.
NORMS
● Norms represent the average test performance (i.e., what is typical or normal) of the individuals within a standardization/normative sample (the reference group whose scores are the standard of comparison for future test takers).
● Norms determine what is normal for a target/specific population (e.g., tests normed for the 5th-grade level are targeted solely at 5th graders).

DERIVED SCORES
● Derived scores are raw scores transformed into a number that more precisely illustrates an individual’s exact position relative to the normative group.
● Derived scores also make the scores meaningful.
● Purpose of derived scores:
  ○ They indicate the individual’s relative standing in the normative sample.
  ○ They provide a comparable measure that permits a direct comparison of the individual’s performance on different tests or with different people.
  ○ Derived scores are expressed in two ways: developmental scores, which pertain to the developmental level attained (e.g., age and grade equivalents), and scores of relative standing, which give the relative position within a specific group.

PROPERTIES OF A NORMAL CURVE
● Bell-shaped. The top of the curve shows the mean, mode, and median of the data collected. Its standard deviation depicts the bell curve’s relative width around the mean.
● Bilaterally symmetrical. The two sides of the bell curve are equal.
● Asymptotic. The curve approaches the horizontal axis but never meets it.
● Mean = Median = Mode. An equal mean, median, and mode results in a normal distribution. Otherwise, the curve will be skewed.
● Unimodal. There is only one clear peak.

SCALES OF MEASUREMENT

PROPERTIES OF SCALE
● Magnitude is the property of “moreness”. Numbers are ordered from smaller to larger.
● Equal intervals. A difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.
● Absolute zero. Nothing of the property being measured exists. The data have the possibility of being zero.
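As a concrete illustration of norms and derived scores described above, here is a minimal sketch (the normative sample and the function name are made up for illustration) that converts a raw score into two derived scores, a z-score and a percentile rank, relative to a normative sample:

```python
from statistics import mean, stdev

# Hypothetical normative sample of raw scores (illustrative values only).
norm_sample = [42, 45, 47, 50, 50, 51, 53, 55, 58, 60]

def derived_scores(raw, sample):
    """Return the z-score and percentile rank of `raw` relative to `sample`."""
    m, sd = mean(sample), stdev(sample)
    z = (raw - m) / sd                                              # standing in SD units
    percentile = 100 * sum(s <= raw for s in sample) / len(sample)  # % scoring at or below raw
    return z, percentile

z, pct = derived_scores(55, norm_sample)
print(f"z = {z:.2f}, percentile rank = {pct:.0f}")
```

Either derived score expresses the test taker's relative standing in the normative group, which a raw score by itself does not.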
PRIMARY SCALES OF MEASUREMENT

NOMINAL
● A non-parametric measure that is also called a categorical variable. Simple classification, e.g., sex (male/female), nationality (Filipino, Korean, American).
● It cannot be arranged in any particular order.
● Best measure of central tendency: mode.
● Best measure of spread: none.

ORDINAL
● A non-parametric measure wherein cases are ranked or ordered.
● Best measure of central tendency: mode and median.
● Best measure of spread: IQR.

INTERVAL
● A parametric measure which includes equal intervals, wherein the difference between two values is meaningful.
● Best measure of central tendency: mean, median, mode.
● Best measure of spread: range, variance, SD, and IQR.

RATIO
● A parametric measure; this scale is similar to interval but includes a true zero point, and relative proportions make sense.
● Best measure of central tendency: mean, median, mode.
● Best measure of spread: range, variance, SD, and IQR.

SEATWORK ANSWERS
● Socioeconomic status - ordinal
● Amount you spend per month on entertainment - ratio
● Month of your birthday - nominal
● Birthplace - nominal
● Number of subjects you’re taking - ratio
● Hair color - nominal
● Amount of rainfall in a month - ratio
● Number of texts you send on your phone each day - ratio
● Days of the week - nominal
● Dress sizes - interval
● Current world rankings of basketball players - ordinal
● Average number of words recalled in a memory task - ratio
● Military rank - ordinal
● Marital status - nominal
● IQ score - interval

SCALING TECHNIQUES

COMPARATIVE AND NON-COMPARATIVE SCALES OF MEASUREMENT

COMPARATIVE SCALES

PAIRED COMPARISON
● A comparative technique in which a respondent is presented with two objects at a time and asked to select one object according to some criterion. The data obtained are ordinal in nature.
● It aims to compare two objects at a time based on preference.

RANK ORDER
● Respondents are presented with several items simultaneously and asked to rank them in order of priority. This is an ordinal scale that describes the favored and unfavored objects but does not reveal the distance between the objects.
● This yields a better result when comparisons are required between the given objects.

CONSTANT SUM
● A ratio scale where respondents are asked to allocate a constant sum of units such as points, rupees, or chips among a set of stimulus objects with respect to some criterion.
● Respondents might be asked to divide a constant sum to indicate the relative importance of the attributes.

Q-SORT TECHNIQUE
● It uses a rank order procedure to sort objects based on similarity with respect to some criterion. The important characteristic is that it is more important to make comparisons among the different responses of one respondent than to compare responses between different respondents.

OTHER INFO
● Paired Comparison compares two objects at a time. Rank Order ranks particular objects. Constant Sum assigns points to given objects. Q-Sort Technique sorts objects into given criteria.

NON-COMPARATIVE SCALES

ITEMIZED RATING SCALE
● A scale having numbers or brief descriptions associated with each category. The categories are ordered in terms of scale position, and the respondents are required to select one of the limited number of categories that best describes the object being rated.
● Likert Scale.
  ○ Respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object.
● Semantic Differential Scale.
  ○ This is a seven-point rating scale with end points associated with bipolar labels (such as good and bad, complex and simple) that have semantic meaning. It can be used to find whether a respondent has a positive or negative attitude toward an object.
● Stapel Scale.
  ○ Originally developed to measure the direction and intensity of an attitude simultaneously. Modern versions of the Stapel scale place a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives. The modified Stapel scale places a single adjective in the center of an even number of numerical values.
● Guttman Scale.
  ○ Also known as cumulative scaling or scalogram analysis. It is an ordinal scale with a number of statements placed in a hierarchical order. The order is arranged so that if a respondent agrees with a statement, they will also agree with all of the statements that fall below it in extremity.

DESCRIPTIVE STATISTICS

MEASURES OF CENTRAL TENDENCY

MEAN
● Sum of the scores divided by the number of scores (average)
● Sensitive to extreme scores and to the exact values of all the scores in the distribution
● Use when the data are approximately normally distributed and do not have extreme outliers.

MEDIAN
● With the scores arranged in rank order, it is the centermost score if the number of scores is odd. If the number of scores is even, it is the average of the two centermost scores.
● Less sensitive than the mean to extreme scores
● Use when the data are skewed or contain extreme outliers.

MODE
● Most frequent score in the distribution
● Unimodal (one mode) & bimodal (two modes)
● Use when the goal is to identify the most common values in the dataset, especially for categorical/discrete data.
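As a quick illustration of the three measures just described (a minimal sketch; the scores are invented for the example), Python's statistics module computes each directly:

```python
from statistics import mean, median, mode

scores = [70, 72, 75, 75, 78, 80, 98]   # hypothetical test scores; 98 is an outlier

print(mean(scores))    # ~78.29 - pulled upward by the extreme score 98
print(median(scores))  # 75     - the centermost score, less affected by 98
print(mode(scores))    # 75     - the most frequent score
```

Note how the single extreme score pulls the mean upward while leaving the median and mode unchanged, which is why the median is preferred for skewed data.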
APPROPRIATE USE ACCORDING TO TYPE OF DATA
BEING USED
● Nominal – Mode
● Ordinal – Median
● Interval/Ratio (Normal) – Mean
● Interval/Ratio (Skewed) – Median
INTERQUARTILE RANGE
● An ordinal statistic of variability equal to the
difference between the third and first
quartile points in a distribution that has
been divided into quartiles.
● Q3 – Q1 = IQR
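A minimal sketch of the Q3 − Q1 computation on invented scores (quartile conventions differ slightly across software, so exact values may vary):

```python
from statistics import quantiles

scores = [2, 4, 5, 7, 8, 10, 12, 15]   # hypothetical scores
q1, q2, q3 = quantiles(scores, n=4)    # the three quartile cut points
print(q3 - q1)                         # IQR; about 7.25 with Python's default convention
```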
VARIANCE
● A measure of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.

STANDARD DEVIATION
● A measure of variability equal to the square root of the averaged squared deviations about the mean.
● E.g., Ms. Salas administered a test about acceleration to her physics class. If the scores of the students assume a hypothetical normal distribution with a mean = 70 and SD = 15, a score 2 standard deviations above the mean would be 100.

SKEWNESS
Skewness pertains to a lack of symmetry. In a data set, it is important for the scores to be normally distributed rather than skewed.

POSITIVE SKEW
● Positive skew means that there are more low scores than high scores. The scores also reach the test floor, which is the lower limit of the test.
● On a graph, the curve leans toward the left.
● A positively skewed test means that the test is too difficult, which does not make it a good test.

NEGATIVE SKEW
● Negative skew means that there are more high scores than low scores. The scores also reach the test ceiling, which is the upper limit of the test.
● On a graph, the curve leans toward the right.

EXAMPLE
● Dr. Aranas administered a 100-item exam about tests to a class of 40 students. To his surprise, only 5 students passed, with scores of 80, 82, 85, 90, and 99, while the others obtained scores within the range of 20 to 35. The shape of the distribution with this set of scores is positively skewed, and the most advisable measure of central tendency is the median.

KURTOSIS
Kurtosis is the degree of peakedness or flatness of a distribution, i.e., how the scores are dispersed around the center.

PLATYKURTIC
● The data have the highest dispersion, making the curve flat.
● Negative excess kurtosis; CK < 3

MESOKURTIC
● The data follow a normal curve.
● Normal distribution; CK = 3

LEPTOKURTIC
● The data have the lowest dispersion, making the curve more peaked.
● Positive excess kurtosis; CK > 3

STANDARD SCORES

Z-SCORE
● Also known as the standard score, a statistical measure that quantifies how many standard deviations a data point is away from the mean of a data set
● Determines the proportion of the total area greater than, in between, or less than an empirical value (using the z-score table)
  ○ It transforms the original score into units of standard deviation. It allows us to compare scores that are on the same scale.

LINEAR TRANSFORMATION

Example.
In the exam, the mean grade = 78 and the SD = 10.
1. Find the z-scores of students whose grades are 93 and 62.
2. Find the grades of students whose z-scores are 0.6 and 1.2.
(Worked solutions are shown in the sketch at the end of this section.)

PERCENTILES
● Percentiles are the expression of the percentage of people whose score on a test or measure falls at or below a particular raw score.
● A measure of relative performance, at the ordinal level.

EXAMPLE
● Maia and Mia are seatmates who recently received their scores for their exam. Their professor gave their scores in terms of different standard scores. Assuming a normal distribution with a mean = 70 and SD = 8, Maia received a z-score of 0.8 while Mia has a T-score of 60.
● Therefore, Mia received a higher raw score than Maia.

TOPIC 2: Test Analysis
Psychological Assessment (PSY 9) SEM 1

A “GOOD” TEST
● The psychometric properties of a good test include:
  ○ Norms
  ○ Reliability
  ○ Validity
● Basically, it can accurately differentiate high scorers and low scorers.
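Worked arithmetic for the two example problems above and for the Maia and Mia comparison (a sketch using only the means and SDs stated in the notes; the notes themselves do not show the answers, and the T-score conversion assumes the standard T = 10z + 50 scale):

```latex
% z = (X - mean)/SD  and  X = mean + z*SD
\[
z_{93} = \frac{93 - 78}{10} = 1.5, \qquad z_{62} = \frac{62 - 78}{10} = -1.6
\]
\[
X_{z=0.6} = 78 + (0.6)(10) = 84, \qquad X_{z=1.2} = 78 + (1.2)(10) = 90
\]
% Maia vs. Mia (mean = 70, SD = 8):
\[
\text{Maia: } X = 70 + (0.8)(8) = 76.4 \qquad
\text{Mia: } z = \frac{60 - 50}{10} = 1.0,\; X = 70 + (1.0)(8) = 78
\]
```

The same arithmetic confirms the Ms. Salas example: a score 2 SDs above a mean of 70 with SD = 15 is 70 + 2(15) = 100.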
RELIABILITY
● Reliability pertains to the stability and
consistency of the measurement
● Reliability Coefficient (r) is the ratio
between true score variance on a test and
total variance
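Expressed as a formula (standard classical test theory notation; the symbols are not spelled out in the notes):

```latex
\[
r_{xx} \;=\; \frac{\sigma^2_T}{\sigma^2_X} \;=\; \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\]
% sigma^2_T = true score variance, sigma^2_E = error variance,
% sigma^2_X = total observed score variance
```

The closer r is to 1, the more the observed score variance reflects true differences between test takers rather than measurement error.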
RELIABILITY ESTIMATES
TEST-RETEST RELIABILITY
● Coefficient of Stability typically measures
stable variables e.g., traits
● It compares the scores of individuals who
have been measured twice by the instrument
● This is not applicable for tests involving reasoning and ingenuity
● A longer interval will result in a lower correlation, while a shorter interval will result in a higher correlation
● The ideal time interval for test-retest reliability is 2-4 weeks
● The sources of error variance are time sampling error and test administration error

PARALLEL/ALTERNATE FORMS RELIABILITY
● Coefficient of Equivalence, where the same persons are tested with one form on the first occasion and with another, equivalent form on the second.
  ○ Parallel Form: Different versions of the test developed to be as similar as possible in terms of content, format, and difficulty.
  ○ Alternate Form: Different versions of the test that allow for slight variability in terms of content and wording, as long as they remain conceptually equivalent.
  ○ Immediate alternate forms. The source of error variance is content sampling.
  ○ Delayed alternate forms. The sources of error variance are item sampling and content sampling.
● Its advantage is that it prevents practice effects, but it is impractical (disadvantage).

SPLIT HALF RELIABILITY
● Coefficient of Internal Consistency. Two scores are obtained for each person by dividing the test into equivalent halves.
  ○ Odd-Even Split: Divided based on item numbers (even-numbered items are correlated with odd-numbered items)
  ○ Top-Bottom Split: Divided based on content and difficulty (the first half is correlated with the second half)
● The reliability of the test is directly related to the length of the test
● The source of error variance is content/item sampling error

INTER-RATER RELIABILITY
● It pertains to the degree of agreement between raters on a measure
● The source of error variance is inter-scorer/interpretation sampling error

INTERNAL CONSISTENCY
● It assesses the correlation between multiple items in a test that are intended to measure the same construct
  ○ Homogeneous Test: a test that measures one variable
● The source of error variance is content sampling error

APPROPRIATE MEASURES USED TO ESTIMATE ERROR
● Inter-scorer & interpretation sampling error: Inter-rater reliability
● Time sampling error: Test-retest reliability
● Content sampling error: Parallel/alternate forms reliability or split-half reliability

RELIABILITY RANGES

LOW RELIABILITY
● Increase the number of items
  ○ This provides more opportunities to assess the construct of interest. In general, longer tests tend to have higher reliability.
● Use factor analysis and item analysis
● Use the correction for attenuation formula (written out below)

FACTORS AFFECTING RELIABILITY
1. Test Format
2. Test Difficulty
3. Test Objectivity
4. Test Administration
5. Test Scoring
6. Test Economy
7. Test Adequacy
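Two formulas connected to the low-reliability remedies listed above (standard forms; the notes name the correction for attenuation but do not write it out, and the Spearman-Brown formula is added here as the usual way to estimate the effect of adding items):

```latex
% Correction for attenuation: estimated correlation between true scores,
% where r_xy is the observed correlation between two measures and
% r_xx, r_yy are their reliabilities.
\[
r_{\text{true}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
\]

% Spearman-Brown prophecy formula: reliability of a test lengthened n times.
\[
r_{\text{new}} = \frac{n\, r_{xx}}{1 + (n - 1)\, r_{xx}}
\]
```

For example, doubling the length of a test with r = .60 gives an estimated reliability of (2)(.60) / (1 + .60) = .75.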
VALIDITY
● Validity pertains to the judgment or estimate of how well a test measures what it purports to measure in a particular context.

TRINITARIAN VIEW OF VALIDITY
● The trinitarian view of validity is an approach that considers criterion-oriented (predictive), content, and construct validity for the assessment of test validity.

TYPES OF VALIDITY

FACE VALIDITY
● Face validity is the least stringent type of validity: whether a test looks valid to test users, examiners, and examinees.
● An IQ test has good face validity because it clearly measures memory, verbal reasoning, etc., while a word association test has poor face validity because what it measures is not clear.

CONTENT VALIDITY
● Content validity pertains to whether the test covers the behavior domain to be measured, which is built through the choice of appropriate content areas, questions, tasks, and items.
● It considers the adequacy of representation of the conceptual domain the test is designed to cover.
● “Do the test items (content) reflect the behavior being measured?”
● Determination of content validity is often made by expert judgment. A panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification.
● Issues arising from a lack of content validity:
  ○ Construct underrepresentation: failure to capture important components of a construct (e.g., an English test which contains only vocabulary items but no grammar items will have poor content validity).
  ○ Construct-irrelevant variance: happens when scores are influenced by factors irrelevant to the construct (e.g., test anxiety, reading speed, reading comprehension, illness).
● Example: The professor announced that his class’ 100-item exam would cover 10 chapters of the reference. Tina only studied 5 chapters but still managed to get the highest score. Therefore, the exam lacked content validity.

CRITERION VALIDITY
● A criterion is a standard against which a test or a test score is evaluated.
● “How similar/dissimilar would the score be if we compare it to another test related to what is being measured?”
● It indicates the test’s effectiveness in estimating an individual’s behavior in a particular situation.
● It tells how well a test corresponds with a particular criterion (a small computational sketch appears after the Test Fairness section below).
● Types of criterion validity include:
  ○ Concurrent Validity is the extent to which test scores may be used to estimate an individual’s present standing on a criterion.
  ○ Predictive Validity pertains to how scores on a test can predict future behavior or scores on another test taken in the future.
  ○ Incremental Validity is related to predictive validity; it is defined as the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.

CONSTRUCT VALIDITY
● A construct is an informed scientific idea developed or hypothesized to describe or explain a behavior.
● “Does the construct we want to measure really exist?”
● A test designed to measure a construct must estimate the existence of an inferred, underlying characteristic based on a limited sample of behavior.
● A test has good construct validity if there is an existing psychological theory which can support what the test items are measuring.
● Evidence of construct validity includes:
  ○ Evidence of homogeneity: the test measures a single construct.
  ○ Evidence of changes with age: scores increase or decrease as a function of age, the passage of time, or experimental manipulation.
  ○ Evidence of pretest-posttest change: scores on a defined construct differ between pretest and posttest after careful manipulation.
  ○ Convergent evidence: the test correlates highly with other variables with which it should correlate.
  ○ Discriminant evidence: the test does not correlate significantly with variables from which it should differ.

TEST FAIRNESS
● Test fairness is the extent to which a test is used in an impartial, just, and equitable way.
● Factors influencing test validity:
  ○ Appropriateness of the test
  ○ Directions/Instructions
  ○ Reading comprehension level
  ○ Item difficulty
  ○ Test construction factors
  ○ Length of test
  ○ Arrangement of items
  ○ Patterns of answers
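Criterion-related validity, as well as convergent and discriminant evidence, is typically examined through the correlation between test scores and another measure. A minimal sketch with made-up data (requires Python 3.10+ for statistics.correlation):

```python
from statistics import correlation

# Hypothetical validation data: an aptitude test and a later criterion.
test_scores = [55, 60, 62, 70, 75, 80, 88]          # predictor (test)
job_ratings = [2.1, 2.8, 2.5, 3.4, 3.9, 4.1, 4.6]   # criterion (supervisor ratings)

r = correlation(test_scores, job_ratings)  # Pearson r used as a validity coefficient
print(round(r, 2))  # a high positive r supports criterion-related validity
```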
TEST BIAS AND RATING ERRORS
● A rating error is a judgment resulting from the intentional or unintentional misuse of rating scales.

SEVERITY/STRICTNESS ERROR
● A type of rating error in which the ratings are consistently overly negative, particularly with regard to the performance or ability of the participants.
● It is caused by the rater’s tendency to be too strict or negative and thus to give undeservedly low scores.

LENIENCY/GENEROSITY ERROR
● The rater is lenient and goes “too easy” on the person they are rating, so all scores will be very high.

CENTRAL TENDENCY ERROR
● The central tendency bias causes some raters to score every question on a scale near the center. A rating of “3” on a 5-point scale for every question is a clear example of the central tendency bias at play.

PROXIMITY ERROR
● This occurs when a rating made on one item or dimension of a rating scale affects the rating of the following or nearest item(s)/dimension(s).

PRIMACY ERROR
● This refers to the tendency to recall information presented at the start of a list better than information at the middle or end.

BARNUM EFFECT
● This refers to when individuals believe that generic information, which could apply to anyone, applies specifically to themselves.

SOCIAL DESIRABILITY
● Social desirability is the tendency for people to present themselves in a generally favorable fashion.

SOCIAL ACQUIESCENCE
● The response set called acquiescence refers to one’s tendency to respond with “true” or “yes” answers to questionnaire items regardless of what the item content is.

NON-ACQUIESCENCE
● This is the exact opposite of acquiescence bias, where the participant seeks to disagree with every statement or question the researcher makes.

FAKING-GOOD
● It refers to a behavior in which subjects present themselves in a favorable manner, endorsing desirable traits and rejecting undesirable ones.

FAKING-BAD
● It refers to attempts to appear worse than is actually the case.

EXAMPLES

HOMEWORK

TOPIC 3: Test Development
Psychological Assessment (PSY 9) SEM 1
TEST REVISION
● Based on the results of test tryout and item analysis, a test developer modifies a test’s content or format for the purpose of improving the test’s effectiveness as a tool of measurement.

ITEM RESPONSE THEORY
● Item Response Theory, also called Latent Trait Theory (“modern psychometrics”), is a statistical framework used to model and analyze the relationship between individuals’ responses to test items and their underlying ability or trait.
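The notes do not name a particular IRT model; as one common illustration (an example added here, not taken from the notes), the one-parameter logistic (Rasch) model expresses the probability of a correct response as a function of the person's ability theta and the item's difficulty b:

```latex
\[
P(X = 1 \mid \theta) \;=\; \frac{1}{1 + e^{-(\theta - b)}}
\]
```

When ability equals item difficulty (theta = b), the probability of a correct response is .50; the further theta rises above b, the closer that probability gets to 1.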
TOPIC 4: Theories & Measurement of Intelligence
Psychological Assessment (PSY 9) SEM 1

INTELLIGENCE
● Francis Galton. Intelligent people are those equipped with the best sensory abilities.
● Alfred Binet. Intelligence is the ability to solve problems. It has four components: reasoning, judgment, memory, and abstraction.
● David Wechsler. Intelligence is the aggregate capacity of an individual to act purposefully, to think rationally, and to deal with the environment effectively.
● Jean Piaget. Intelligence is conceived of as a kind of evolving biological adaptation to the world. A person moves through the stages of cognitive development through the interaction of one’s biological factors and learning.
● Intelligence is the ability to solve problems we encounter in our environment using our senses and other mental abilities to survive or live a good life.

PERSPECTIVES
● Factor Analytic Theories identify the ability or groups of abilities that constitute intelligence. What constitutes intelligence?
● Information Processing Theories identify the specific mental processes that constitute intelligence. How does intelligence work?

CHALLENGING TRADITIONAL VIEWS: Intelligence as Multiple Abilities

LOUIS THURSTONE’S MULTIPLE-FACTOR THEORY
● It argues that the different aspects of an individual are distinct enough that multiple abilities must be considered.

RAYMOND CATTELL’S FLUID AND CRYSTALLIZED INTELLIGENCE
● Fluid intelligence is about dealing with new problems, not influenced by past learning and culture. This can include identifying new patterns and new solutions.
● Crystallized intelligence is about using learned skills, knowledge, and experiences influenced by past learning and culture.

CATTELL-HORN-CARROLL MODEL
● This model is called the Three-Stratum Theory of Cognitive Abilities, which holds that intelligence is made up of 3 layers: the top level (general intelligence), the second level (consisting of 8 broad abilities), and the last level (many narrower factors linked to each ability).

CULTURAL CONSIDERATIONS
● Norms. Norms must always be considered.
  ○ There is no IQ test that is culture-free.
  ○ Culture loading is the extent to which an item favors a particular culture. E.g., “Who was the first prime minister of Canada?” has a high culture loading.
● Language Barrier. Little to no verbal items (e.g., CFIT, Raven’s) means less culture loading.
● Street Smart. This is a characteristic of a person who knows their way around the streets.
● Flynn Effect. A shorthand reference to the progressive rise in intelligence test scores that is expected to occur in every generation. It implies that as time goes by, the average intelligence score of humanity increases.