
Essentials of a Good Psychological Test
Last updated:
25 Jul 2004
Reliability - overview
Types of reliability
How reliable should tests be?
Validity
Types of validity
Sources of invalidity
Generalizability
Standardization
Recommended Links
Reliability - overview
Reliability is the extent to which a test is repeatable and yields consistent scores.
Note: In order to be valid, a test must be reliable; but reliability does not guarantee validity.
All measurement procedures have the potential for error, so the aim is to minimize it. An
observed test score is made up of the true score plus measurement error.
The goal of estimating reliability (consistency) is to determine how much of the variability in
test scores is due to measurement error and how much is due to variability in true scores.
Measurement errors are essentially random: a person's test score might not reflect their true
score because they were sick, hungover, anxious, in a noisy room, etc.
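
To make the true-score-plus-error idea concrete, here is a minimal Python sketch (not from the original notes; all numbers are invented) that simulates observed scores as true scores plus random error, and recovers reliability as the proportion of observed-score variance due to true scores:

```python
# Minimal simulation of classical test theory: observed = true + error.
# All values are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
true_scores = rng.normal(loc=100, scale=15, size=n)  # stable individual differences
error = rng.normal(loc=0, scale=7, size=n)           # random measurement error
observed = true_scores + error                       # observed score = true + error

# Reliability = proportion of observed-score variance due to true scores
reliability = true_scores.var() / observed.var()
print(f"Estimated reliability: {reliability:.2f}")   # ~ 15**2 / (15**2 + 7**2) = 0.82
```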
Reliability can be improved by:
getting repeated measurements using the same test and
getting many different measures using slightly different techniques and methods.
- e.g. consider how university assessment of grades involves several sources. You would not
consider one multiple-choice exam question to be a reliable basis for testing your knowledge
of "individual differences". Many questions are asked in many different formats (e.g., exam,
essay, presentation) to help provide a more reliable score.
Types of reliability
There are a number of ways to estimate a test's reliability. I'll mention a few of them now:
1. Test-retest reliability
The test-retest method of estimating a test's reliability involves administering the test to the
same group of people at least twice. Then the first set of scores is correlated with the
second set of scores. Correlations range between 0 (low reliability) and 1 (high reliability)
(highly unlikely they will be negative!)
Remember that change might be due to measurement error, e.g. if you use a tape measure to
measure a room on two different days, any differences in the result are likely due to
measurement error rather than a change in the room size. However, if you measure children's
reading ability in February and then again in June, the change is likely due to changes in
children's reading ability. Also, the actual experience of taking the test can have an impact
(called reactivity): with a history quiz, for example, people might look up the answers
afterwards and do better next time, or might simply remember their original answers.
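
As a minimal sketch (with made-up scores), test-retest reliability is just the Pearson correlation between two administrations of the same test:

```python
# Test-retest reliability: correlate scores from two administrations
# of the same test to the same people. Scores are hypothetical.
import numpy as np

time1 = np.array([12, 18, 25, 30, 22, 15, 28, 20])  # first administration
time2 = np.array([14, 17, 26, 29, 24, 13, 27, 21])  # same people, retested later

r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {r:.2f}")  # near 1 = highly consistent scores
```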
2. Alternate Forms
Administer Test A to a group and then administer Test B to the same group. The correlation
between the two sets of scores is the estimate of the test's reliability.
3. Split Half reliability
The correlation between scores on one half of the items and scores on the other half.
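Because each half is only half the length of the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula. A minimal sketch with hypothetical item responses:

```python
# Split-half reliability with the Spearman-Brown correction.
# Item responses (1 = correct, 0 = incorrect) are hypothetical.
import numpy as np

items = np.array([  # rows = people, columns = items
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
])

odd = items[:, 0::2].sum(axis=1)      # score on odd-numbered items
even = items[:, 1::2].sum(axis=1)     # score on even-numbered items

r_half = np.corrcoef(odd, even)[0, 1]
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown step-up
print(f"Half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```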
4. Inter-rater Reliability
Compare scores given by different raters, e.g. for important work in higher education (e.g.
theses), there are multiple markers to help ensure accurate assessment by checking inter-
rater reliability.
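
One common chance-corrected agreement index for categorical ratings is Cohen's kappa. A minimal sketch with hypothetical pass/fail judgments from two markers:

```python
# Cohen's kappa: agreement between two raters, corrected for the
# agreement expected by chance. Ratings are hypothetical.
import numpy as np

rater1 = np.array(["pass", "fail", "pass", "pass", "fail", "pass", "pass", "fail"])
rater2 = np.array(["pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass"])

p_obs = np.mean(rater1 == rater2)  # raw proportion of agreement
# chance agreement from each rater's marginal proportions
cats = np.unique(np.concatenate([rater1, rater2]))
p_exp = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in cats)

kappa = (p_obs - p_exp) / (1 - p_exp)
print(f"Agreement = {p_obs:.2f}, Cohen's kappa = {kappa:.2f}")
```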
5. Internal consistency
Internal consistency is commonly measured as Cronbach's alpha (based on inter-item
correlations), which ranges between 0 (low) and 1 (high). The greater the number of similar
items, the greater the internal consistency. That's why you sometimes get very long scales
asking a question in a myriad of different ways: if you add more items, you get a higher
Cronbach's alpha. Generally, an alpha of .80 is considered a reasonable benchmark.
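
A minimal sketch of Cronbach's alpha computed directly from its formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), using hypothetical Likert-type responses:

```python
# Cronbach's alpha from an item-response matrix (responses are hypothetical).
import numpy as np

items = np.array([  # rows = people, columns = items on the same scale
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of people's total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```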
How reliable should tests be? Some reliability guidelines
.90 = high reliability
.80 = moderate reliability
.70 = low reliability
High reliability is required when:
tests are used to make important decisions
individuals are sorted into many different categories based upon relatively small
individual differences, e.g. intelligence.
(Note: Most standardized tests of intelligence report reliability estimates around .90, i.e. high.)
Lower reliability is acceptable when:
tests are used for preliminary rather than final decisions
tests are used to sort people into a small number of groups based on gross individual
differences, e.g. height or sociability/extraversion.
(Note: For most testing applications, reliability estimates around .70 are usually regarded as
low. Since the reliability coefficient estimates the proportion of true-score variance, a
reliability of .70 implies that about 30% of the variability in scores is measurement error.)
Reliability estimates of .80 or higher are typically regarded as moderate to high (approx. 20%
of the variability in test scores is attributable to error).
Reliability estimates below .60 are usually regarded as unacceptably low.
Levels of reliability typically reported for different types of tests and measurement devices are
reported in Table 7-6 of Murphy and Davidshofer (2001, p. 142).
Validity
Validity is the extent to which a test measures what it is supposed to measure.
Validity is a subjective judgment made on the basis of experience and empirical indicators.
Validity asks "Is the test measuring what you think it's measuring?"
For example, we might define "aggression" as an act intended to cause harm to another
person (a conceptual definition) but the operational definition might be seeing:
how many times a child hits a doll
how often a child pushes to the front of the queue
how many physical scraps he/she gets into in the playground.
Are these valid measures of aggression? i.e., how well does the operational definition match
the conceptual definition?
Remember: In order to be valid, a test must be reliable; but reliability does not guarantee
validity, i.e. it is possible to have a highly reliable test which is meaningless (invalid).
Note that where validity coefficients are calculated, they will range between 0 (low) and 1
(high).
Types of Validity
Face validity
Face validity is the least important aspect of validity, because validity still needs to be
directly checked through other methods. All that face validity means is:
"Does the measure, on the face it, seem to measure what is intended?"
Sometimes researchers try to obscure a measure's face validity - say, if it's measuring a
socially undesirable characteristic (such as modern racism). But the more practical point is to
be suspicious of any measures that purport to measure one thing, but seem to measure
something different, e.g. political polls - a politician's current popularity is not necessarily a
valid indicator of who is going to win an election.
Construct validity
Construct validity is the most important kind of validity.
If a measure has construct validity it measures what it purports to measure.
Establishing construct validity is a long and complex process.
The various qualities that contribute to construct validity include:
criterion validity (includes predictive and concurrent)
convergent validity
discriminant validity
To create a measure with construct validity, first define the domain of interest (i.e. what is
to be measured), then design measurement items which adequately sample that domain.
Then a scientific process of rigorously testing and modifying the measure is undertaken.
Note that in psychological testing there may be a bias towards selecting items which can be
objectively written down, etc., rather than other indicators of the domain of interest (i.e. a
source of invalidity).
Criterion validity
Criterion validity consists of concurrent and predictive validity.
Concurrent validity: "Does the measure relate to other manifestations of the construct
the device is supposed to be measuring?"
Predictive validity: "Does the test predict an individual's performance in specific
abilities?"
Convergent validity
It is important to know whether a test returns similar results to other tests which purport
to measure the same or related constructs.
Does the measure match with an external 'criterion', e.g. behaviour or another, well-
established, test? Does it measure it concurrently and can it predict this behaviour?
Observations of dominant behaviour (criterion) can be compared with self-report
dominance scores (measure)
Trained interviewer ratings (criterion) can be compared with self-report dominance
scores (measure)
Discriminant validity
Important to show that a measure doesn't measure what it isn't meant to measure - i.e. it
discriminates.
For example, discriminant validity would be evidenced by a low correlation between a
quantitative reasoning test and scores on a reading comprehension test, since reading ability
is an irrelevant variable in a test designed to measure quantitative reasoning.
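
A minimal sketch of both checks at once (all scores are hypothetical): the new test should correlate highly with an established test of the same construct (convergent) and weakly with a test of an irrelevant construct (discriminant):

```python
# Convergent and discriminant validity as a pair of correlations.
# All test scores are hypothetical.
import numpy as np

quant_new = np.array([55, 62, 70, 48, 66, 59, 74, 51])  # new quantitative test
quant_est = np.array([53, 65, 72, 45, 63, 61, 70, 50])  # established quantitative test
reading = np.array([62, 57, 59, 63, 58, 61, 60, 56])    # reading comprehension test

convergent = np.corrcoef(quant_new, quant_est)[0, 1]    # should be high
discriminant = np.corrcoef(quant_new, reading)[0, 1]    # should be low
print(f"Convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```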
Sources of Invalidity
Unreliability
Response sets = a psychological orientation or bias towards answering in a particular
way:
Acquiescence: the tendency to agree, i.e. to say "yes". Hence the use of half
negatively and half positively worded items (but there can be semantic
difficulties with negative wording).
Social desirability: the tendency to portray oneself in a positive light. Try to design
questions so that social desirability isn't salient.
Faking bad: purposely saying "no" or looking bad if there's a 'reward' (e.g.
attention, compensation, social welfare, etc.).
Bias
Cultural bias: does the psychological construct have the same meaning from one
culture to another; how are the different items interpreted by people from
different cultures; actual content (face) validity may be different for different
cultures.
Gender bias may also be possible.
Test Bias
Bias in measurement occurs when the test makes systematic errors in
measuring a particular characteristic or attribute, e.g. many say that most
IQ tests may well be valid for middle-class whites but not for blacks or
other minorities. In interviews, which are a type of test, research shows
that there is a bias in favour of good-looking applicants.
Bias in prediction occurs when the test makes systematic errors in
predicting some outcome (or criterion). It is often suggested that tests used
in academic admissions and in personnel selection under-predict the
performance of minority applicants. Also, a test may be useful for predicting
the performance of one group, e.g. males, but be less accurate in predicting
the performance of females.
Generalizability
Just a brief word on generalizability. Reliability and validity are often discussed separately but
sometimes you will see them both referred to as aspects of generalizability. Often we want to
know whether the results of a measure or a test used with a particular group can be
generalized to other tests or other groups.
So, is the result you get with one test, let's say the WISC-III, equivalent to the result you
would get using the Stanford-Binet? Do both these tests give a similar IQ score? And do the
results you get from the people you assessed apply to other kinds of people? Are the results
generalizable?
So a test may be reliable and it may be valid but its results may not be generalizable to other
tests measuring the same construct nor to populations other than the one sampled.
Let me give you an example. If I measured the levels of aggression of a very large random
sample of children in primary schools in the ACT, I may use a scale which is perfectly reliable
and a perfectly valid measure of aggression. But would my results be exactly the same had I
used another equally valid and reliable measure of aggression? Probably not, as it's difficult to
get a perfect measure of a construct like aggression.
Furthermore, could I then generalize my findings to ALL children in the world, or even in
Australia? No. The demographics of the ACT are quite different from those in Australia and
my sample is only truly representative of the population of primary school children in the
ACT. Could I generalize my findings of levels of aggression for all 5-18 year olds in the ACT?
No, because I've only measured primary school children, and their levels of aggression are
not necessarily similar to the levels of aggression shown by adolescents.
Standardization
Standardization: Standardized tests are:
administered under uniform conditions. i.e. no matter where, when, by whom or to
whom it is given, the test is administered in a similar way.
scored objectively, i.e. the procedures for scoring the test are specified in detail so that
any number of trained scorers will arrive at the same score for the same set of
responses. So, for example, questions that need subjective evaluation (e.g. essay
questions) are generally not included in standardized tests.
designed to measure relative performance, i.e. they are not designed to measure
ABSOLUTE ability on a task. In order to measure relative performance, standardized
tests are interpreted with reference to a comparable group of people: the
standardization, or normative, sample. e.g. the highest possible grade in a test is 100. A
child scores 60 on a standardized achievement test. You may feel that the child has not
demonstrated mastery of the material covered in the test (absolute ability), BUT if the
average of the standardization sample was 55, the child has done quite well (RELATIVE
performance) - see the sketch below.
The normative sample should (for hopefully obvious reasons!) be representative of the target
population - however this is not always the case, thus norms and the structure of the test
would need to be interpreted with appropriate caution.
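
A minimal sketch of this relative interpretation, using the example figures above (score 60, normative mean 55) plus an assumed normative standard deviation of 10:

```python
# Interpreting a score relative to a normative sample via a z-score and
# (assuming normally distributed scores) a percentile. The SD of 10 is
# an assumption for illustration.
import math

child_score = 60
norm_mean = 55   # mean of the standardization (normative) sample
norm_sd = 10     # assumed standard deviation of the normative sample

z = (child_score - norm_mean) / norm_sd
percentile = 0.5 * (1 + math.erf(z / math.sqrt(2))) * 100  # standard normal CDF
print(f"z = {z:.2f}, roughly the {percentile:.0f}th percentile")  # z = 0.50, ~69th
```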
Recommended Links
What are the essentials of a good psychological testing report on an older adult?
(American Psychological Association)
Factors influencing internal and external validity (Campbell & Stanley, 1963)
How to choose tools, instruments, & questionnaires for intervention research &
evaluation (James Neill, 2004)
