Module 2 PSYCH 3140
Prepared by:
ELIZABETH S. SUBA, Ph.D., RPsy, RPm, RGC
ANGELO R. DULLAS, MA Clinical Psych
E-mail Address:
[email protected]
MODULE 2
Overview
In this module, we introduce the principles of psychological assessment and psychological
testing, their definitions, and their basic concepts. You are expected to define these principles
and concepts, primarily the statistical foundation of modern psychometrics. The outline of this
chapter is as follows.
P S Y C H O L O G I C A L A S S E S S M E N T | 15
1. Scales of Measurement
2. Statistical Interpretation of Tests Scores (Raw and Derived Scores)
3. Measures of Central Tendency
4. Measures of Variability
5. Norms
5.1 Linear and Non-Linear Transformation
5.2 Types of Norms
6. Test Reliability
6.1 General Model of Reliability
6.2 Test-retest
6.3 Alternate Form
6.4 Split Half Reliability
6.5 Kuder Richardson
6.6 Standard Error of Measurement
7. Test Validity
7.1 Content Validity
7.2 Criterion-related Validity
7.3 Construct Validity
8. Item Analysis
I. Objectives:
Descriptive Statistics - procedures used to summarize and describe a set of data in quantitative
terms, where complete population data are available.
Inferential Statistics - procedures used in drawing inferences about the properties and
characteristics of populations from sample data. Inferences are logical deductions about things
that cannot be observed directly.
Scales of Measurement
A measurement scale differentiates people from each other on any one variable.
Variable- a factor, property, attribute, characteristic, or behavior dimension along which people
or objects differ.
Physical dimension- length or weight
SCALES OF MEASUREMENT
Nominal
Description:
- Numbers are used to classify and identify people or objects according to category labels.
Examples:
- Gender can be categorized as “male” or “female”; we can choose to give all females a
“score” of 1 and all males a “score” of 2.
- We can administer an IQ test to a group of people and reclassify their scores as “below
average”, “average”, or “above average”.
Limitations:
- They do not provide very precise information about individual differences; they do not
really quantify the test-taker’s performance.
- They indicate the presence or absence of a property but not the extent or amount of a
property. Compare: IQ = 102, IQ = 108. Example: IQ = Average.
Note: when we transform scores to a nominal scale, our information becomes more general and
less precise.
Limitation:
Ordinal
Ex. height
The difference between 60 and 65 inches, a 5-unit difference, is exactly the same as the
difference between 40 and 45 inches.
Note: Scores on most psychological tests are designed to represent interval scales of
measurement.
Application:
1. Person A demonstrates a higher level of anxiety than Person B, who in turn is more anxious
than Person C. The scores permit us to determine the relative extent of anxiety in these three
people.
2. The difference between a score of 55 and a score of 65 for Persons A and B is equivalent to
the difference between 45 and 55 for Person B and Person C.
NOTE:
As we move from nominal scale to interval and ratio scales, we increase the precision of
the measurement process.
Interval and ratio scales with their equal units are most appropriate for comparing people,
for the study of individual differences.
Types of Scores
Transformed Scores or Derived Scores- scores resulting from the transformation of raw scores
into other scales in order to facilitate analysis and interpretation.
• in a linear transformation the original (raw) score and transformed scores will be related
in a linear manner.
1. Central Location or Central Tendency- refers to a value or measure near the center of the
distribution which represents the average score of the group.
Mean- the arithmetic average or the value obtained by adding together a set of measurements
and then dividing by the number of measurements in the set.
Median- the middlemost score, or the score above and below which 50% of the scores fall. It is
sometimes referred to as the 50th percentile, the 5th decile, and the second quartile.
Mode- the score that occurs most frequently in a set of test scores, or the score obtained by the
greatest number of people. When test scores are grouped into intervals, the mode is the midpoint
of the interval containing the largest number of scores.
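The three measures above can be illustrated with a minimal sketch. The scores are hypothetical, chosen only so that each measure is easy to verify by hand.

```python
import statistics

# Hypothetical raw scores for a small group of test takers.
scores = [10, 15, 15, 18, 20, 22, 26]

mean = statistics.mean(scores)      # sum of the scores divided by their count
median = statistics.median(scores)  # middlemost score (50th percentile)
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean and the median are both 18, while the mode is 15 because that score occurs twice.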
2. Variation- refers to the extent of the clustering about a central value or the dispersion of
scores around a given point. If all scores are close to the central value, their variation will
be less than if they tend to depart more markedly from the central values.
Range- the simplest measure; the difference between the largest and smallest score.
Positively Skewed- if the larger frequencies tend to be concentrated toward the low end of the
variable and the smaller frequencies toward the high end. Few high scores and many low scores.
Mean is larger than the median.
Negatively skewed- the larger frequencies are concentrated toward the high end of the scale and
the smaller frequencies toward the low end. Many high scores and few low scores. The median is
larger than the mean.
Example: If a test is easy, the scores would cluster at the high end of the scale and tail off
toward the low end.
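The relationship between the mean and the median in skewed distributions can be checked with a short sketch. The function name `skew_direction` and both score lists are hypothetical, and comparing the mean with the median is only a rule of thumb, not a formal skewness statistic.

```python
import statistics

def skew_direction(scores):
    """Rule-of-thumb skew check based on mean versus median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"   # a few high scores pull the mean upward
    if mean < median:
        return "negative"   # a few low scores pull the mean downward
    return "symmetric"

# Hypothetical difficult test: many low scores, few high scores.
hard_test = [2, 3, 3, 4, 4, 4, 5, 9, 10]
# Hypothetical easy test: the mirror image, many high scores and few low ones.
easy_test = [12 - s for s in hard_test]

print(skew_direction(hard_test))
print(skew_direction(easy_test))
```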
Normal Curve- if the distribution is symmetrical, bell shaped and the larger frequencies are
clustered around the average. The mean, median and mode coincide.
Norm-Referenced Test
❑ when an individual’s score is compared with the scores of other individuals who have taken
the test, often called the standardization sample or normative group.
NORMS- refer to the performance of the standardization sample used in the process of
standardizing the test; empirically established and presented in tabular form.
- raw scores are converted to some form of derived scores or norms.
Two essential points should be stressed:
KINDS OF NORMS
Developmental Norms - indicate how far along the normal developmental path the individual
has progressed (Anastasi & Urbina, 1997).
1. Age norms- Age equivalent is the median score on a test obtained by persons (standardization
sample) of a given chronological age.
Mental Age score of an examinee corresponds to the chronological age of the subgroup in the
standardization group whose median is the same as that of the examinee.
2. Grade norms or equivalents- often used in interpreting educational achievement tests. Grade
norms are found by computing the mean or median raw score obtained by students at a given
grade level.
For example: if the average number of problems solved correctly on an Arithmetic test by
the fourth graders in the standardization sample is 23, then a raw score of 23 corresponds to a
grade equivalent of 4.
Percentile- scores that are expressed in terms of the percentage of persons in the standardization
sample who fall below a given raw score. Also called percentile rank.
For example, if 28% of the subjects scored below 15 problems correct on a
Mathematical Ability test, then a raw score of 15 corresponds to the 28th percentile.
Limitations: inequality of their units, especially at the extremes of the distribution.
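A percentile rank can be computed directly from the definition above. In this sketch the function name `percentile_rank` and the normative sample of 25 scores are hypothetical; the sample is built so that 7 of the 25 scores (28%) fall below a raw score of 15.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the normative sample scoring below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

# Hypothetical standardization sample: 7 scores below 15, 18 scores at or above 15.
norm_sample = [5, 7, 9, 10, 12, 13, 14,
               15, 15, 16, 16, 17, 17, 18, 18, 19, 19,
               20, 20, 21, 22, 23, 24, 25, 26]

print(percentile_rank(15, norm_sample))
```

Some test manuals also count half of the scores tied with the raw score; this sketch follows only the "fall below" definition given in the text.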
Standard Scores
Nonlinear or Normalized standard scores- expressed in terms of a distribution that has been
transformed to fit a normal curve.
T Scores
Obtained by multiplying the normalized standard score by 10 and adding the result to 50 (a
negative standard score therefore yields a T score below 50).
Has a fixed mean of 50 and a standard deviation of 10.
A score of 50 corresponds to the mean, a score of 60 to 1 SD above the mean, and so forth.
Some test developers prefer T-scores because they eliminate the decimals and positive
and negative signs of z-scores.
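The T-score conversion can be sketched as follows. The function names are hypothetical. The normalizing step assumes the percentile rank is strictly between 0 and 100, and it uses the standard library's `NormalDist` to find the z score that cuts off that percentage of a normal curve.

```python
from statistics import NormalDist

def t_score(z):
    """Convert a standard (z) score to a T score (mean 50, SD 10)."""
    return 50 + 10 * z

def normalized_z(percentile_rank):
    """z score below which the given percentage of a normal curve falls."""
    return NormalDist().inv_cdf(percentile_rank / 100)

print(t_score(1))                  # 1 SD above the mean
print(t_score(-1.5))               # 1.5 SD below the mean
print(t_score(normalized_z(50)))   # a score at the 50th percentile
```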
Stanines
▪ Range from 1 to 9, with a mean of 5 and a standard deviation of 1.96 except for the stanines
of 1 and 9.
▪ Raw scores are converted to stanines by having the lowest 4 percent of the individuals
receive a stanine score of 1, the next 7 percent receive a stanine of 2, the next 12 percent
receive a stanine of 3, and then just keep progressing through the group.
▪ The disadvantage is that the stanines represent a range of scores, and sometimes people
do not understand that one number represents various raw scores.
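The stanine assignment described above can be sketched in a few lines. The text gives the first three bands (4, 7, and 12 percent); the remaining widths used here (17, 20, 17, 12, 7, 4) are the conventional stanine percentages. The function name and the handling of scores that fall exactly on a boundary are assumptions made for illustration.

```python
# Percentage of the group assigned to stanines 1 through 9.
BAND_WIDTHS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

def stanine(percent_below):
    """Stanine for a person with the given percentage of the group below them."""
    cumulative = 0
    for value, width in enumerate(BAND_WIDTHS, start=1):
        cumulative += width
        if percent_below < cumulative:
            return value
    return 9  # reached only when percent_below is 100

for pr in (2, 5, 28, 50, 97):
    print(pr, "->", stanine(pr))
```

Because each stanine covers a band of percentages, many different raw scores can share the same stanine, which is the disadvantage noted above.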
Deviation IQs
• is a standard score with a mean of 100 and an SD that approximates the SD of the Stanford-
Binet IQ distribution. It resembles an IQ scale because of the use of 100.
• the deviations from the mean are converted into standard scores, which typically have a
mean of 100 and a standard deviation of 15.
• an extension of the ratio IQ (intelligence quotient) used in early intelligence tests.
Deviation IQs are now preferred over the ratio IQ.
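The contrast between the two kinds of IQ can be sketched as follows. The ratio IQ formula (mental age divided by chronological age, times 100) is the classical definition and is not spelled out in this module. The function names and the example values are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Classical ratio IQ: mental age over chronological age, times 100."""
    return 100 * mental_age / chronological_age

def deviation_iq(raw_score, norm_mean, norm_sd, iq_sd=15):
    """Deviation IQ: a standard score rescaled to mean 100 and SD iq_sd."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + iq_sd * z

print(ratio_iq(10, 8))            # mental age 10, chronological age 8
print(deviation_iq(60, 50, 10))   # raw score 1 SD above the age-group mean
```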
CORRELATIONAL STATISTICS
Correlation is concerned with determining the extent to which two sets of measures such
as intelligence test scores and school grades are related.
Correlation coefficient – a numerical index that describes the magnitude and direction of the
relationship between two variables. (Aiken, 2000) It may be either Positive or Negative.
-The accuracy with which a person’s score on measure Y can be predicted from his or her
score on measure X depends on the magnitude of the correlation between the two
variables.
-The closer the correlation coefficient is to an absolute value of 1.00 (either +1.00 or
–1.00), the smaller the average error made in predicting Y scores from X scores.
For example,
If the correlation between tests X and Y is close to +1.00, it can be predicted with
confidence that a person who makes a high score on variable X will also make a high score
on variable Y and a person who makes a low score on X will also obtain a low score on Y.
On the other hand, if the correlation is close to – 1.00 what could be your prediction?
-The fact that two variables are significantly correlated facilitates predicting performance
on one from performance on the other, but it provides no direct information on whether
the two variables are causally connected.
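A correlation coefficient can be computed by hand with the product-moment formula. This is a minimal sketch; the function name `pearson_r` and the paired intelligence test scores and school grades are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired measures for five students.
iq_scores = [95, 100, 105, 110, 120]
grades = [78, 80, 85, 88, 93]

r = pearson_r(iq_scores, grades)
print(round(r, 2))
```

A value close to +1.00, as in this made-up data set, means grades can be predicted from IQ scores with little error, but it says nothing about whether one causes the other.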
Simple Linear Regression – procedure for determining the algebraic equation of the best-fitting
line for predicting scores on a dependent variable from one independent variable. The
product moment correlation coefficient, which is a measure of the linear relationship between
two variables is actually a by-product of the statistical procedure for finding the equation of the
straight line that best fits the set of points representing the paired X-Y values.
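The best-fitting straight line can be found with the ordinary least-squares formulas. This sketch is only an illustration; the function name `fit_line` and the paired X-Y values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired values that lie exactly on the line Y = 2X + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_y = slope * 5 + intercept   # predicted Y for X = 5
print(slope, intercept, predicted_y)
```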
Multiple Regression Analysis – an extension of simple linear regression analysis to two or more
independent variables, with Y as the criterion variable and X1, X2, and X3 as the independent
variables.
➢ The chi-square for independence – it seeks to find out whether two nominal variables A and
B are independent of each other, or whether an association exists between them.
Example: is there an association between a manager’s annual salary rate (high,
moderate, low) and his or her educational attainment?
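The chi-square statistic for a contingency table can be sketched as follows. The function name and the observed counts are hypothetical. The sketch returns only the statistic and its degrees of freedom; judging significance would still require a chi-square table or a statistics package.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = salary rate (high, moderate, low),
# columns = educational attainment (bachelor's, graduate degree).
salary_by_education = [[5, 15],
                       [12, 8],
                       [18, 2]]

chi2, df = chi_square_independence(salary_by_education)
print(round(chi2, 2), df)
```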
➢ Reliability – refers to the consistency of scores obtained by the same person when retested
with the same test or with an equivalent form of the test on different occasions.
➢ Validity
Item Analysis- process of statistically reexamining the qualities of each item of the test. It
includes Item Difficulty Index and Discrimination Index.
P S Y C H O L O G I C A L A S S E S S M E N T | 30
TEST RELIABILITY
▪ Refers to the accuracy or consistency of measurement or the degree to which test scores
are consistent, dependable, repeatable and free from errors or free from bias.
▪ Broadly, test reliability indicates the extent to which individual differences in test scores
are attributable to “true differences in the characteristics under consideration and the
extent to which they are attributable to chance errors”
Other things being equal, the longer the test, the more reliable it will be.
Lengthening a test, however, will only increase its consistency in terms of content sampling, not
its stability over time. The effect that lengthening or shortening a test will have on its reliability
coefficient can be estimated by means of the Spearman-Brown formula.
The Spearman-Brown formula is used to correct split-half reliability estimates.
- It provides a good estimate of what the reliability coefficient would be if the two halves were
increased to the original length of the instrument.
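The Spearman-Brown correction can be written as a one-line function. The formula itself is standard; the function name, the parameter `length_factor` (how many times longer the new test is), and the example coefficients are hypothetical.

```python
def spearman_brown(r, length_factor=2):
    """Estimated reliability when a test is made length_factor times as long."""
    return length_factor * r / (1 + (length_factor - 1) * r)

# Correcting a hypothetical split-half correlation of .70 to full length.
print(round(spearman_brown(0.70), 2))

# Estimated effect of cutting a test with reliability .80 to half its length.
print(round(spearman_brown(0.80, length_factor=0.5), 2))
```

With the default factor of 2, the function gives the usual split-half correction; other factors estimate the effect of lengthening or shortening a test.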
The mean of this hypothetical score distribution is the person’s true score on the test. If a
client took a test 100 times, we would expect that one of those test scores would be his
or her true score.
Depending on the confidence level that is needed, the standard error of measurement can be
used to predict where a score might fall 68%, 95%, or 99.7% of the time (within 1, 2, or 3 SEM
of the obtained score, respectively).
The formula for calculating the standard error of measurement (SEM) is:
SEM = s√(1 − r)
where s represents the standard deviation and r is the reliability coefficient.
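The SEM formula and the score bands built from it can be sketched as follows. The function names and the example values (a standard deviation of 15, a reliability of .91, and an obtained score of 100) are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = s * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def score_band(obtained, sem, n_sems=1):
    """Range extending n_sems standard errors on either side of the obtained score."""
    return obtained - n_sems * sem, obtained + n_sems * sem

sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(round(sem, 2))

# Bands of 1 and 2 SEM around an obtained score of 100.
for n in (1, 2):
    low, high = score_band(100, sem, n)
    print(n, round(low, 1), round(high, 1))
```

The more reliable the test, the smaller the SEM and the narrower the band around the obtained score.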
(Whiston, 2000)
TEST VALIDITY
The degree to which a test measures what it purports (what it is supposed) to measure when
compared with accepted criteria. (Anastasi and Urbina, 1997).
TYPES OF VALIDITY
CONTENT
Purpose/Description:
- To compare whether the test items match the set of goals and objectives.
- To determine if the test items are representative of the defined universe or content domain
that they are supposed to measure.
- Concern is on test items (content), objectives, and format.
Procedure:
- Compare the test blueprint with the school, course, and program objectives and goals.
- Have a panel of experts in the content area (e.g., teachers, professors) do the following:
examine whether the items represent the defined universe or content domain, and utilize
systematic observation of behavior (observe the skills and competencies needed to perform a
given task).
Types of Tests:
- Survey achievement tests
- Criterion-referenced tests
- Essential skills tests
- Minimum-level skills tests
- State assessment tests
- Professional licensing exams
- Aptitude tests
- Achievement tests
- Certification tests
CRITERION-RELATED (Predictive)
Purpose/Description:
- The criterion measure is to be obtained in the future.
- The goal is to have test scores accurately predict the criterion performance identified.
Procedure:
- Correlate test scores with the criterion measure obtained after a period of time.
- Ex. Predictive validities of admission tests.
Types of Tests:
- Scholastic aptitude tests
- General aptitude batteries
- Prognostic tests
- Readiness tests
- Intelligence tests
Validity Coefficient – the correlation between the scores on an instrument and the criterion
measure.
ITEM ANALYSIS
A general term for procedures designed to assess the utility or validity of a set of test items.
• Validity concerns the entire instrument, while item analysis examines the qualities of each
item.
• done during test construction and revision; provides information that can be used to revise
or edit problematic items or eliminate faulty items.
• the item difficulty index reflects the proportion of people getting the item correct, calculated
by dividing the number of individuals who answered the item correctly by the total number of
people.
• the item difficulty index can range from .00 (meaning no one got the item correct) to 1.00
(meaning everyone got the item correct).
• item difficulty actually indicates how easy the item is, because it provides the proportion of
individuals who got the item correct.
Example: in a test where 15 of the students in a class of 25 got the first item on the test
correct:
p = 15 / 25 = .60
• the desired item difficulty depends on the purpose of the assessment, the group taking the
instrument, and the format of the item.
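The item difficulty calculation can be sketched in code. The function name and the 0/1 coding of responses are assumptions; the data reproduce the class of 25 students in which 15 answered the first item correctly.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

# 15 correct answers and 10 incorrect answers on the first item.
first_item = [1] * 15 + [0] * 10

p = item_difficulty(first_item)
print(p)
```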
▪ the item discrimination index is calculated by subtracting the proportion of examinees in the
lower group from the proportion of examinees in the upper group who got the item correct or
who endorsed the item in the expected manner.
▪ item discrimination indices can range from + 1.00 (all of the upper group got it right and
none of the lower group got it right) to – 1.00 (none of the upper group got it right and all
of the lower group got it right)
▪ the determination of the upper and lower groups will depend on the distribution of scores.
If the distribution is normal, use the upper 27% for the upper group and the lower 27% for the
lower group (Kelley, 1939). For small groups, Anastasi and Urbina (1997) suggest a range of
25% to 33% for the upper and lower groups.
▪ In general, negative item discrimination indices in particular, and also small positive indices,
are indicators that the item needs to be eliminated or revised.
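The upper-lower group procedure can be sketched as follows. The function name, the rounding rule for the group size, and the scores of the ten examinees are hypothetical; the default fraction of .27 follows the 27% rule mentioned above.

```python
def discrimination_index(total_scores, item_responses, fraction=0.27):
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Indices of examinees ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_responses[i] for i in upper) / group_size
    p_lower = sum(item_responses[i] for i in lower) / group_size
    return p_upper - p_lower

# Hypothetical total scores for ten examinees and their answers to one item.
totals = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

d = discrimination_index(totals, item)
print(d)
```

In this made-up example all three examinees in the upper group answered correctly and none in the lower group did, so the index reaches its maximum of +1.00.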
• item response theory (IRT) rests on the assumption that the performance of an examinee on a
test item can be predicted by a set of factors called traits, latent traits, or abilities.
• using IRT, we get an indication of an individual’s performance based not on the total score,
but on the precise items the person answers correctly.
• it suggests that the relationship between examinees’ item performance and the underlying
trait being measured can be described by an item characteristic curve.
Item characteristic curve. A graph, used in item analysis, in which the proportion of examinees
passing a specified item is plotted against total test scores.
• Item response curve is constructed by plotting the proportion of respondents who gave the
keyed response against estimates of their true standing on a uni-dimensional latent trait
or characteristic. An item response curve can be constructed either from the responses of
a large group of examinees to an item, or if certain parameters are estimated from a
theoretical model
Rasch Model – one parameter (item difficulty) model for scaling test items for purposes of
item analysis and test standardization.
- The model is based on the assumption that indexes of guessing and item discrimination
are negligible parameters. As with other latent trait models, the Rasch model relates
examinees’ performances on test items (percentage passing) to their estimated standings
on a hypothetical latent-ability trait or continuum.
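The Rasch model is usually written as a logistic function of the difference between a person's ability and the item's difficulty. The sketch below uses that standard form, which is not spelled out in this module. The function name and the ability values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one-parameter (Rasch) model."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# Points on the item characteristic curve for an item of difficulty 0.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(rasch_probability(theta, 0.0), 2))
```

When ability equals item difficulty the probability of passing is .50, and it rises toward 1.00 as ability increases, which traces out the item characteristic curve described above.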
References
Anastasi, Anne and Urbina, Susana (1997). Psychological Testing. 7th edition, New York: McMillan
Publishing.
Aiken, Lewis R. (2000) Psychological Testing and Assessment. Boston: Allyn and Bacon Inc.
Cohen, Ronald Jay & Swerdlik, Mark E. (2010). Psychological Testing and Assessment. New York:
McGraw-Hill Companies, Inc.
Cronbach, Lee J. 1984. Essentials of Psychological Testing. 4th edition. Harper and Row
Publishers. New York.
Del Pilar, Gregorio H. (2015) Scale Construction: Principles and Procedures, Workshop powerpoint
presentation. AASP-PAP, 2015, Cebu City
Drummond, Robert J. (2000). Appraisal Procedure for Counselors and Helping Professional. 4th
edition, New Jersey: Prentice Hall.
Dullas, A.R. (2018). The Development of Academic Self-efficacy Scale for Filipino Junior High School
Students. Frontiers in Education, Educational Psychology section. Front. Educ. 3:1. DOI:
10.3389/feduc.2018.00019
Friedenberg, Lisa (1995). Psychological Testing: Design, Analysis and Use. Boston: Allyn and
Bacon Inc.
Groth-Marnat, Gary (2009) Handbook of Psychological Assessment 5th edition. John Wiley and
Sons Inc.
Kaplan, Robert M. and Saccuzzo, Dennis P. (1997) Psychological Testing: Principles, Applications,
and Issues. 4th edition, California: Brooks/Cole Publishing Company.
Kellerman, Henry and Burry, Anthony (1991) Handbook of Psychological Testing. 2nd edition,
Boston: Allyn and Bacon Inc.
Murphy, Kevin R. and Davidshofer, Charles O. (1998) Psychological Testing: Principles and
Application. New Jersey: Prentice Hall Inc.
Newmark, Charles S. (1985) Major Psychological Assessment Instruments. Boston: Allyn and
Bacon.
Orense, Charity and Jason Parena (2014) Lecture in Psychological Assessment, Review Manual in
RGC Licensure Examination, Assumption College, Makati.
Suba, Elizabeth S. (2014) Lecture (powerpoint) in Psych 140 Psychological Assessment, CLSU,
Nueva Ecija.
Suba, Elizabeth S. (2013) Lecture (powerpoint) in GU 722 Psychological Assessment , CLSU, Nueva
Ecija
Walsh, W. Bruce and Betz, Nancy E. (1995) Tests and Assessment. New Jersey: Prentice Hall Inc.
Morrison, J. (2014). DSM-5 Made Easy. The Clinician’s Guide to Diagnosis. The Guilford Press.
New York.
Nolen-Hoeksema, S. (2014). Abnormal Psychology (6th Ed.). Mcgraw-Hill. New York, NY.
Sarason, I.G. & Sarason, B.R. (2005). Abnormal Psychology. The Problem of Maladaptive Behavior
(11th Edition). Pearson Prentice Hall. New Jersey.
Others:
1. Manual of psychological tests
2. Psychological Resources Center – test brochures and test descriptions.
3. www.AssessmentPsychology.com