


Integrity Tests: Facts and Unresolved Issues
Wayne J. Camara and Dianne L. Schneider

This article describes two independent reports, recently completed by the American Psychological Association (APA) and the U.S. Congress Office of Technology Assessment, that examined many scientific and measurement issues concerning integrity testing (e.g., validity, criterion relevance). Background data are offered on a variety of tests collected by a survey of test publishers, providing a view of the industry's scope (e.g., test audience, user screening, score reporting) not available elsewhere. In the light of APA's Ethical Principles of Psychologists and Code of Conduct (APA, 1992), unresolved concerns are addressed that have a wide range of implications for the profession of psychology, the testing industry, and public policy (e.g., cutting scores, user screening and training, and test marketing practices).

Author note: Nadine M. Lambert served as action editor for this article. Wayne J. Camara, American Psychological Association, Washington, DC; Dianne L. Schneider, George Mason University. We gratefully acknowledge the assistance of Lewis Goldberg, G. Harris, Paul Sackett, and three anonymous reviewers who provided comments on drafts of this article. We also gratefully acknowledge the contributions of psychologists who served on APA's Task Force on the Prediction of Dishonesty and Theft in Employment Settings and who contributed to the OTA study on this subject. Correspondence concerning this article should be addressed to Wayne J. Camara, Science Directorate, American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242. Electronic mail may be sent to [email protected].

Integrity tests, or honesty tests as they are often referred to, are paper-and-pencil instruments for personnel selection that are used to predict dishonesty or counterproductivity. These tests are composed of items that query job applicants about their attitudes toward theft and inquire about any past thefts. Over the past decade, integrity testing has greatly increased and has become a mainstream selection practice for a wide variety of jobs in which employees have access to cash or merchandise or perform security functions (Goldberg, Grenier, Guion, Sechrest, & Wing, 1991; Guastello & Rieke, 1991; O'Bannon, Goldinger, & Appleby, 1989).

In the past five years, integrity testing has received more scrutiny by the media, policymakers, and the public than has any other class of psychological tests. Much of this increased interest and use of integrity tests followed the passage of the Employee Polygraph Protection Act of 1988. Many of the technical flaws and misuses of polygraphs for employment screening (see Saxe, Dougherty, & Cross, 1985; U.S. Congress, 1983) have, in part, increased skepticism that any psychological instrument can accurately identify individuals who will be dishonest in a specific work environment (Saxe, 1991). Concerns about the reliance on integrity test scores in personnel selection decisions, the potential restriction of opportunities for job applicants, invasion of individuals' privacy, labeling, and basic psychometric issues of validity spurred investigations by Congress and the American Psychological Association (APA). The U.S. Congress Office of Technology Assessment (OTA) completed a two-year study of integrity testing that included several psychologists in advisory and consultant capacities (U.S. Congress, 1990). In addition, a task force of the APA, established in conjunction with Division 5 (Division of Evaluation, Measurement, and Statistics) and Division 14 (The Society for Industrial and Organizational Psychology), completed a 20-month study of integrity tests (Goldberg et al., 1991), herein referred to as the APA report.

Our purpose is threefold. First, we discuss the increased demand for integrity testing and the findings reported by OTA and APA concerning the scientific underpinnings of these tests. Second, we present descriptive data concerning the development and use of integrity tests. Finally, we discuss unresolved issues surrounding current practices in integrity testing and their implications for public policy and the profession of psychology, in light of APA's Ethical Principles of Psychologists and Code of Conduct (APA, 1992).

Demand for Data on Integrity Tests

Estimates of annual economic losses to American businesses from employee theft range from $15 billion to $25 billion (Shephard & Duston, 1987), with as much as 30% of all business failures attributed to employee theft (American Management Association, 1977). With such compelling data and with the virtual prohibition against polygraph use in employment selection (see Footnote 1), there is little doubt as to why an estimated 6,000 organizations administer up to 5,000,000 integrity tests annually (Sackett & Harris, 1984; U.S. Congress, 1990).

Footnote 1. The Employee Polygraph Protection Act of 1988 prohibits the use of polygraphs in employment screening. Exemptions exist for federal, state, and local government employees; national defense and security agencies; and security personnel in nuclear power plants, water supply facilities, public transportation, and financial securities businesses.
Although use of integrity tests had become rampant by 1988, information about them remained in short supply. At the onset of the OTA and APA studies, only a handful of research studies had been published in traditional peer-reviewed journals in psychology and measurement. Furthermore, many integrity tests had been derived from the polygraph or developed by polygraphers (Sackett, Burris, & Callahan, 1989) as opposed to mainstream psychological tests, which are typically developed and researched by psychologists. In addition to general concerns (e.g., individuals' privacy, labeling, and the potential for test misuse) common to many other forms of assessments, critics of integrity tests voiced additional concerns about the technical adequacy and validity of these tests, much as they had with the polygraph.

Congressional concern over integrity tests emanated primarily from hearings during the mid-1980s on polygraph use and resulted in a 1989 request that the OTA undertake a study. Paralleling these concerns on Capitol Hill, psychologists cited the lack of consensus among researchers concerning the adequacy of the scientific evidence and ethical use of procedures for predicting dishonesty and theft in employment settings. In 1989, APA established a task force and asked its members to "gather and examine data relevant to the scientific and social-policy considerations associated with the development and use of predictors of dishonesty, theft, and related behaviors" (Goldberg et al., 1991, Appendix A).

To collect data for this task force, we developed a survey and request for technical materials (e.g., tests, manuals), published and unpublished research reports, and additional information (e.g., marketing brochures, test user training materials) from integrity test publishers. The task of identifying and locating integrity tests and integrity test publishers proved to be difficult because many integrity tests had been marketed only recently and such tests were not yet listed in traditional testing references. This still appears to be a problem: As of 1992, only nine integrity tests had been reviewed in the existing test references and reviews, such as the Tenth Mental Measurements Yearbook (Conoley & Kramer, 1989) or Test Critiques (Keyser & Sweetland, 1991). In addition to searching reviews and references for tests, we reviewed paid advertisements in personnel, law enforcement, and security publications; exhibitor directories from security and retail conventions; and databases for magazine, newspaper, and research articles. Announcements were also placed in the APA Monitor and APA divisional newsletters.

Forty-six separate publishers or developers of integrity tests were identified through this search. Each publisher was sent a 26-item survey, along with a cover letter explaining the purpose of the study. Publishers were informed that the APA was conducting a study of integrity tests and their use, and they were asked to complete the survey and supply additional testing materials (e.g., test booklets, manuals, marketing materials, norms) and research reports that were available on their instruments. The survey included items on test design, the construct(s) being measured, sales and administration policies, and scoring and interpretation procedures. The cover letter ensured the confidentiality of data on sales volume and client base. Nonrespondents were mailed a second survey. Some integrity test publishers communicated directly with us to request more information or express their concerns about the study.

Eventually, 30 (65%) of the 46 publishers responded to the request for information in some form. Information on 37 separate instruments from these 30 publishers was reviewed; from these, the task force decided that 34 tests from 29 different publishers were relevant for inclusion in the study. In contrast, Sackett and Harris (1984) identified 24 integrity tests. The increase in the number of available integrity tests during this four-year interval, from 24 to 46, illustrates the substantial growth in this testing area. Few publishers provided all of the requested data and materials. Of the 34 tests included in the APA study, tests and completed surveys were furnished for 24 tests, and test manuals were provided for 22 tests.

Integrity Tests and Their Use Today

Data from the questionnaires provide a unique profile of the use of integrity tests that has not been previously reported. This includes information from test publishers on the construct of interest, the intended test audience, and user services. Professional and policy issues arising from this examination of integrity testing are discussed later.

Test Titles

Researchers, employers, test users, policymakers, and the public are principally concerned with the question, What do these tests measure? In 1984 the prevalent instruments were typically labeled as honesty tests (Sackett & Harris, 1984). Today, publishers use a variety of labels for the constructs measured by integrity tests (see Table 1).

Table 1
Publisher Self-Reports of Constructs Measured by 24 Integrity Tests

Construct             No. of tests
Counterproductivity   15
Honesty                9
Job performance        9
Attitudes              8
Integrity              6
Reliability            4
Other (a)             12

Note. Most respondents checked more than one option.
(a) Includes absenteeism/tardiness, admissions of dishonesty and drug abuse, credibility, dependability/conscientiousness, emotional stability, interview guide, managerial/sales/clerical potential, predictiveness, probability of voluntary short-term turnover, service orientation, stress tolerance, substance abuse, substance abuse resistance, and supervisor ratings of performance.


Although most test publishers and researchers continue to refer to these instruments as integrity tests, Goldberg et al. (1991) noted that most of these constructs can plausibly be subsumed under the broad concept of trustworthiness. Regardless of the diversity of the construct, concerns remain that individuals failing the test (i.e., falling below some cutting score) will be typecast as dishonest. Critics say labeling is a particular concern with these tests because of the negative connotation of the dishonesty construct.

Test User Screening and Training

Of publishers responding to the APA survey, 64% reported that they do not use test user qualification forms to screen potential purchasers, with 56% of these using no formal screening practices of any kind. Those publishers who reported using any quality assurance methods before selling their tests primarily relied on some form of test user training (e.g., self-study, video, or face-to-face) or an agreement by the purchaser to adhere to "sound testing practices," some of which are detailed in the testing materials. Only 5% of the publishers required either a graduate degree or specialized licensure or certification for test users.

Test Scoring and Reporting

The majority of tests are computer-scored by the test publisher (by modem, mail-in, or telephone) or scored on-site using software provided to the user by the publisher. Only three tests offer hand-scoring services that permit test users to determine the scoring of any subtests or to develop individualized interpretations; tests offering these services are not usually considered integrity tests but are considered more general measures of personality (e.g., Personnel Reaction Blank; Gough, 1954).

After the tests are scored by the publisher, employers receive results in various formats and levels of detail. Table 2 illustrates the methods used to report scores to employers. The majority of publishers report subtest scores and a total score. Publishers of overt tests (see Sackett & Harris, 1984) generally report standard (or percentile) scores in terms of risk levels (e.g., for theft, substance abuse). One test publisher uses a standardized written interview rather than a structured test and does not provide any scores or data but reports applicants' responses verbatim.

Table 2
Information on Scores Provided to Integrity Test Users

Scores                                  No. of tests
  Subtest                               12
  Total                                  8
  Other (a)                             14
Derived scores
  Percentile                            11
  Raw                                   12
  Standard                               5
  Other (b)                              6
Reference group (for derived scores)
  Local                                  5
  National                              17
  Other                                  7
Available norms
  Gender                                 5
  Industry                               4
  Occupation                             5
  Race                                   5
  Other                                  5

Note. Based on surveys of 24 tests. Most respondents checked more than one option.
(a) Includes discriminant equation results, interpretations, risk zones for subscales, scores on independent scales, and "simply tell how applicant responded."
(b) Includes no score, risk ranges.

Publishers of 60% of the tests in this sample, and nearly all of the overt integrity tests, provide cutoff scores to employers in some form. Nine publishers responding to the survey indicated that they provide a general cutting score, with six others reporting that a specific cutting score could be provided. Publishers of 10 tests said that they could assist a user in setting a cutting score, and 11 indicated that cut scores were inappropriate for most uses of the test.

Computer-generated narrative reports, which vary in the amount of detail provided, are available for nearly all tests. Publishers typically offer very basic reports (containing scale scores and risk levels), with more detailed reports (e.g., specific cut scores by industry, critical items) offered at additional cost. Unlike clinical personality assessments, integrity tests are not accompanied by detailed narrative reports because users are concerned with estimating risk of theft, not with more complex phenomena requiring clinical interpretation.

Reports on Integrity Tests: Many Contrasts, Few Comparisons

The APA and OTA studies differed in foci, particularly their intended purposes, evidentiary bases, and conclusions regarding the implications of using integrity tests for employment screening. OTA's report was written to guide Congress in determining what, if any, legislative action on integrity tests was warranted. It focused on public policy issues, such as potential errors in selection decisions, the effect these tests have on labeling, and individuals' reduced employment opportunities—concerns that might generalize to all types of personnel selection processes. In contrast, APA's report was produced by psychologists for psychologists and test users. It targeted scientific and technical issues in integrity testing (e.g., defining the construct; identifying appropriate criteria; and examining the validity, reliability, and utility of the tests) and dealt with policy indirectly, by examining only those issues for which psychology can offer special expertise.

The most obvious—and consequential—difference between the reports is the evidentiary basis for their conclusions. OTA narrowed its research base to five predictive validity studies that used detected theft or a close proxy as the criterion. Reasoning that such studies provide the most compelling line of evidence, OTA dismissed studies using contrasted groups, self-reports of theft, organizational-level counterproductivity and shrinkage studies, and predictive studies using less direct theft criteria that had been cited in other reviews (e.g., O'Bannon et al., 1989; Sackett et al., 1989).


Each of the five remaining studies was then reviewed in detail and subsequently criticized for methodological flaws (e.g., low base rate for criteria, classification errors) common to most solitary studies (Goldberg et al., 1991). OTA decided that these reports were inconclusive in supporting or dismissing the assertion that integrity tests can reliably predict dishonest behavior in the workplace.

APA's task force reviewed nearly 300 studies using a wide spectrum of validity designs. Goldberg et al. (1991) agreed that methodological flaws could be found in all studies because of difficulties inherent in conducting applied research, especially when using detected theft in the workplace as the criterion. However, their conclusions were based on a pattern of evidence that emerged from examining a large number of independent validation studies that used substantially different designs, which may have collectively compensated for the methodological flaws found in any one type of design. The APA report concluded that "for those few tests for which validity information is available, the preponderance of the evidence is supportive of their predictive validity" (Goldberg et al., 1991, p. 26).

Finally, the APA and OTA reports differ in the magnitude of the criticisms they directed at integrity tests and the resultant implications for public policy decisions. APA did not evaluate the tests "against absolute levels of validity" (Goldberg et al., 1991, p. 7) but comparatively, against alternative selection measures. The report noted a range of social and scientific drawbacks of alternatives to integrity tests. Unstructured interviews have much lower reliability and validity than do integrity tests. Structured integrity interviews (whether conducted in person, by telephone, or by computer) contain items that are generally identical to those found in paper-and-pencil integrity tests. Although such interviews offer a useful means for an applicant to explain any contradictory or particularly troublesome responses on integrity tests, they appear to have limited validity and utility as stand-alone screening instruments. Background checks are expensive and are becoming less accurate and informative as employers, because of legal concerns, shy away from providing more than a verification of employment dates. Surveillance efforts and more comprehensive background investigations, of the type often used by intelligence agencies, are extremely costly and invasive. The use of such procedures for screening a large number of applicants for part-time employment in retail or service positions may be considered too offensive by many applicants. Indeed, Stone, Stone, and Hyatt (1989) found that subjects rated lie detectors, drug tests, medical exams, and background checks as more invasive than written integrity tests.

Several studies have discussed the utility of honesty tests when used in combination with additional selection tools (Goldberg et al., 1991). Although there is no evidence to suggest that integrity tests are widely used to predict individuals' total job performance, the APA report notes evidence for the utility of honesty tests for selection in some job families. Of greater interest is the incremental validity such tests may offer above that derived from cognitive ability tests and other commonly used selection methods. The OTA study and the three previous qualitative reviews of integrity testing (O'Bannon et al., 1989; Sackett et al., 1989; Sackett & Harris, 1984) did not consider evidence for total job performance in evaluating the utility of integrity tests.

We have not identified any studies that indicated the incremental validity gained from incorporating integrity tests in a selection system. Ones, Viswesvaran, and Schmidt (1993) reported a mean operational validity of .34 with overall job performance and .47 with counterproductive behaviors. It is probable that integrity tests tap some aspects of job performance not captured by cognitive ability tests and unstructured interviews, yet estimates of the magnitude of any increase in validity or utility have not been examined across tests and jobs.
for an applicant to explain any contradictory or partic- Notwithstanding these basic differences concerning
ularly troublesome responses on integrity tests, they ap- the validity and utility of integrity tests, we did find several
pear to have limited validity and utility as a stand-alone points on which the two reports do agree. First, there is
screening instrument. Background checks are expensive little agreement on the behaviors of interest and the pre-
and are becoming less accurate and informative as em- cise definition of integrity. The construct is overly broad
ployers shy from providing more than a verification of and ill defined, making evidence of construct validity
employment dates because of legal concerns. Surveillance more necessary and burdensome to produce. Second,
efforts and more comprehensive background investiga- there is overreliance on cut scores. Problems of labeling
tions, of the type often used by intelligence agencies, are and false positive errors that have dogged integrity tests
extremely costly and invasive. The use of such procedures are exacerbated by the use of cutting scores. Both reports
for screening a large number of applicants for part-time recommend that the standard error of measurement and
employment in retail or service positions may be consid- overlapping score ranges be reported, but they note that
ered too offensive by many applicants. Indeed, Stone, tests users may not be qualified to interpret such data.
Stone, and Hyatt (1989) found that subjects rated lie de- Finally, there is the proprietary nature of testing. It is
tectors, drug tests, medical exams, and background checks unlikely that the credibility and quality of research will
as more invasive than written integrity tests. increase until publishers open up the research process
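The base rate argument is easier to follow with a concrete four-cell computation. The sketch below is a minimal illustration with hypothetical detection rates (a 75% chance that a thief fails and a 50% chance that an honest applicant fails); with a 3% base rate these assumed rates happen to reproduce a false positive share close to OTA's 95.6% figure, and the same hypothetical test fares very differently at the self-report base rates of 28% and 62%.

```python
def false_positive_share(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Fraction of failing examinees who are actually honest (false positives).

    base_rate: proportion of applicants who would actually steal
    hit_rate: P(fail | thief); false_alarm_rate: P(fail | honest)
    Both rates are hypothetical values used only for illustration.
    """
    fail_thief = base_rate * hit_rate                  # true positives
    fail_honest = (1 - base_rate) * false_alarm_rate   # false positives
    return fail_honest / (fail_thief + fail_honest)

# Hypothetical operating characteristics, held fixed across base rates.
HIT, FA = 0.75, 0.50
for p in (0.03, 0.28, 0.62):
    share = false_positive_share(p, HIT, FA)
    print(f"base rate {p:4.0%}: {share:5.1%} of failures are honest applicants")
```

Raising the hypothetical hit rate or lowering the false alarm rate shrinks both error types at once, which is Martin and Terris's point: the false positive rate tracks the test's validity rather than being an independent defect of integrity tests.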
Notwithstanding these basic differences concerning the validity and utility of integrity tests, we did find several points on which the two reports do agree. First, there is little agreement on the behaviors of interest and the precise definition of integrity. The construct is overly broad and ill defined, making evidence of construct validity more necessary and burdensome to produce. Second, there is overreliance on cut scores. Problems of labeling and false positive errors that have dogged integrity tests are exacerbated by the use of cutting scores. Both reports recommend that the standard error of measurement and overlapping score ranges be reported, but they note that test users may not be qualified to interpret such data. Finally, there is the proprietary nature of testing. It is unlikely that the credibility and quality of research will increase until publishers open up the research process and encourage independent research.

Professional and Public Policy Implications

The continued use of integrity tests for employment screening raises professional and policy issues for psychology. As professionals, psychologists have a responsibility to reduce the misuse of integrity tests.


Each of the following issues can best be discussed in light of several principles (2.02, 2.05, 2.06, and 2.08) concerning assessment in the Ethical Principles of Psychologists and Code of Conduct (APA, 1992).

Two of these principles address a psychologist's responsibility to prevent misuse of assessments, and of information from assessments, by unqualified persons. Principle 2.02(b) instructs psychologists to refrain from releasing test results or data to persons not qualified to use such information. This presents several dilemmas for the 56% of integrity test publishers who report using no formal screening practices for test users.

Demands for cutting scores by employers and the willingness of test publishers to produce them were cited as a major problem by both APA and OTA. Publishers often appear to have set cutting scores for some tests on purely arbitrary grounds, then created failing and passing categories based on this scoring dichotomy (Goldberg et al., 1991). Scrivner (1991) pointed out that the integrity testing industry's own association specifies in its guidelines (Association of Personnel Test Publishers [APTP], 1990) that test users should know how a cutting score was derived. Yet, given the complexities of proper procedures for setting cutting scores (see Rorer, Hoffman, & Hsieh, 1966), it is unrealistic to impose such responsibilities on the untrained persons who administer integrity tests (Cascio, Alexander, & Barrett, 1988). Similarly, APA's report recommended that the standard error of measurement and score ranges be reported but noted that the typical test user may not be qualified to interpret anything more than dichotomous categories (e.g., hire vs. do not hire). When the competence level of test users is not known, test publishers will have difficulty inferring the appropriate amount and level of information that should be provided.
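As a minimal illustration of the score-band reporting that both reports recommend, the sketch below computes the standard error of measurement (SEM = SD * sqrt(1 - reliability), the standard psychometric formula) and flags scores whose 95% band straddles the cut score. The score scale, reliability, and cut value are hypothetical and do not describe any particular test or publisher.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical score scale (SD 10), internal consistency .81, and a
# publisher-style cut score of 45 -- all illustrative values.
SD, RELIABILITY, CUT = 10.0, 0.81, 45.0
error = sem(SD, RELIABILITY)  # 10 * sqrt(0.19) ~= 4.36

for observed in (30.0, 44.0, 60.0):
    lo, hi = observed - 1.96 * error, observed + 1.96 * error  # 95% band
    decision = "fail" if observed < CUT else "pass"
    flag = "band overlaps cut; decision uncertain" if lo < CUT < hi else "clear"
    print(f"score {observed:4.1f} -> {decision:4s} | 95% band [{lo:5.1f}, {hi:5.1f}] | {flag}")
```

Reporting the band alongside the pass/fail label conveys exactly the measurement uncertainty that, as both reports note, a dichotomous cut score hides from untrained users.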
Principle 2.06 (APA, 1992) prohibits psychologists from promoting (e.g., through marketing or sales) the use of tests by unqualified persons. Integrity test publishers might note that test administration is often straightforward and that interpretation requires minimal understanding of categorical (e.g., low-, moderate-, or high-risk) or dichotomous data. One might argue that the personnel responsible for the administration and use of integrity tests manage other facets of the employment screening process, such as interviews, reference checks, and possibly some cognitive ability tests. Yet other principles and professional standards (e.g., APA, 1985) suggest that the use of tests and test data requires a level of technical knowledge of statistics, measurement, and assessment that many integrity test users lack. For example, when psychologists serve as test users, they should (a) be familiar with the reliability, validity, standardization, or outcome studies of the techniques (Principle 2.04[a]); (b) recognize limits to the certainty of predictions made about individuals (Principle 2.04[b]); (c) consider various test factors and individual characteristics that might limit the accuracy of interpretations, including automated interpretations (Principles 2.04[c] and 2.05); (d) select scoring and interpretation services on the basis of evidence of validity and other considerations (Principle 2.08[b]); and (e) retain responsibility for the interpretation and use of assessments, including automated services (Principle 2.09). Although these principles are relevant for all test users, the Ethical Principles can be enforced only with APA members. Similarly, psychologists and all testing professionals responsible for the development, marketing, and sale of assessment products should consider these criteria when determining the qualifications of all test users.

Most integrity tests are classified as proprietary by their publishers. The APTP (1990) distinguishes between proprietary and nonproprietary tests (see Footnote 2). Nonproprietary tests are sold to users whom the publisher deems qualified to administer and interpret the test; publishers sell the test, scoring keys, and supporting materials to users possessing the necessary background (e.g., a doctoral degree) or experience (e.g., specialized training). In contrast, proprietary test users purchase a package deal from a publisher, who provides continuous client service. Publishers maintain technical responsibility for norming, scoring, validating, and interpreting tests for the user and for providing ongoing psychological expertise to clients (APTP, 1990). Users are responsible for test administration, security of test scores, and final decisions based on the test scores. According to the APTP, "users of proprietary tests are not required to be credentialed by the publisher, although in general the more sophisticated users will make better use of such tests than users who do not have a comprehensive understanding of a test's psychometric properties" (p. 6). However, publishers with minimal or no screening procedures for test purchases and test services cannot evaluate the competence of users or the risk of misuse of their services.

Footnote 2. Although integrity tests are not the only type of proprietary test, we use APTP's definitions of proprietary and nonproprietary for their convenience and clarity.

A working group of the Joint Committee on Testing Practices (Eyde, Moreland, Robertson, Primoff, & Most, 1988) made no distinctions between proprietary and nonproprietary tests in terms of potential test misuse or in qualifications for test use. Certainly, ensuring that test users are trained appropriately is a common problem in many forms of assessment: clinical, educational, and employment. The Standards for Educational and Psychological Testing (APA, 1985) state that "the ultimate responsibility for test use lies with the test user" (p. 41). However, the risk of misuse can be reduced if those responsible for the sale of test materials and the provision of services take some minimal steps, such as screening test users, requiring organizations that purchase products and services to limit access to only those persons considered competent by the publisher, and requiring test use training for organizations that employ persons who do not satisfy appropriate criteria for test use.


Publishers are justified in making distinctions about the level of competence required for using different types of test data. Few would argue that users who derive a diagnosis and treatment plan for a client using data from multiple scales on a personality test require substantially more expertise and training than users who receive data reported in dichotomous (e.g., hire vs. do not hire) categories on a preemployment assessment report. However, the responsibilities for proper test use (e.g., Eyde et al., 1993) and the importance of decisions based on assessment data may be no less important in employment settings than in clinical practice. Kay (1991) doubted that the unsophisticated test user will ever be competent enough to transform continuous test data into dichotomous selection decisions. He argued that many of the problems attributed to integrity tests result from end users who are largely unqualified to evaluate and interpret test scores. Kay urged publishers to adopt guidelines for the sale and use of integrity tests, similar to those currently used in qualifying users of other broad-bandwidth personality inventories.

Psychologists are ethically bound (APA, 1992) to consider various test factors (e.g., situational variables, surveillance systems), individual characteristics, and limitations of tests when making recommendations and to ensure the integrity and security of tests and results. However, most users of integrity and employment tests are not psychologists (a situation similar to the administration and interpretation of most large-scale educational testing programs). Personnel managers, security specialists, and supervisors have no such ethical responsibilities and, in addition, probably lack familiarity with the Ethical Principles of Psychologists and Code of Conduct (APA, 1992) and the Standards for Educational and Psychological Testing (APA, 1985). Goldberg et al. (1991) acknowledged this problem and called for more concrete, comprehensive training for test users and for screening of potential users. These aspects of psychological testing are generally unregulated in employment settings. Test publishers currently represent the final check in ensuring that only qualified persons administer and interpret tests.

Currently, Massachusetts is the only state that bans integrity tests, although Rhode Island prohibits the use of these tests as the sole basis for employment decisions. Both laws went into effect before the studies completed by OTA and APA. Since then, legislation aiming to ban or restrict the use of integrity and other forms of personality tests for preemployment screening has been introduced in at least five other states. However, not one piece of legislation has yet been signed into law. Inquiries about the attitudes and past conduct of applicants and employees have been a major concern of legislators seeking to restrict these tests; however, their misuse by unqualified users has also been cited as a rationale for such legislation (H. A. Fernandez, personal communication, June 18, 1991; Oregon Psychological Association, 1991).

Conclusion

Researchers have generally agreed that there is insufficient evidence to reach definitive conclusions on some aspects of integrity testing and that the underlying construct of integrity itself is poorly understood (Goldberg et al., 1991; Sackett et al., 1989; Scrivner, 1991; U.S. Congress, 1990). However, there is also general agreement that integrity tests can predict a number of outcomes of interest to employers and that they have levels of validity comparable to many other kinds of tests used in employment settings (Goldberg et al., 1991; Ones et al., 1993; Sackett et al., 1989).

The Appendix summarizes the recommendations of the task force and offers its conclusions regarding areas in need of further research, development, and improvement. These include scientific problems, such as establishing construct validity, and extend to issues such as safeguarding the proper use of test scores and verifying that tests live up to publishers' promotional claims. In total, these recommendations demonstrate a commitment to protecting the rights of test takers and to ensuring that honest persons are not wrongfully denied job opportunities.

Integrity testing would benefit if publishers encouraged more research on potential moderators of the validity of integrity tests, on divergent and convergent validity, and on comparisons with other measures. Integrity testing could benefit from the growing body of research on personality constructs by establishing empirical linkages to the Big Five personality constructs (see Barrick & Mount, 1991; Goldberg, 1990). Test publishers could benefit from greater openness in encouraging independent researchers to evaluate their tests. Employers could benefit from more skepticism of marketing claims and from independent and professional expertise in measurement and selection before purchasing tests and establishing comprehensive selection systems that evaluate all relevant aspects of job performance, rather than focusing solely on risk of internal theft. Finally, the public and the profession of psychology could benefit from increased screening and training for test users, who could take a more proactive role in ensuring that tests are used appropriately by organizations. Increased scrutiny of test use and test users can minimize misuse of assessments, increase the validity and utility of tests, and deter further regulation of employment testing at the local, state, and federal levels.

Although technical and scientific questions about integrity tests (e.g., how well they work, what they measure, and why they work) may not be answered to the complete satisfaction of behavioral scientists, a general consensus on their validity and utility has emerged. This approval is, of course, limited to properly documented tests. The fate of integrity testing, and perhaps all personality tests and preemployment tests, may eventually be decided by legislatures and the courts. We psychologists should willingly participate in such public policy debates.

REFERENCES

American Management Association. (1977). Summary overview of the "state of the art" regarding information gathering techniques and level of knowledge in three areas concerning crimes against business. Washington, DC: National Institute of Law Enforcement and Criminal Justice, Law Enforcement Administration.



American Psychological Association. (1985). Standards for educational and psychological testing. Washington, DC: Author.

American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611.

Association of Personnel Test Publishers. (1990). Model guidelines for pre-employment integrity testing programs. Washington, DC: Author.

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.

Cascio, W. F., Alexander, R. A., & Barrett, G. V. (1988). Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Personnel Psychology, 41, 1-24.

Conoley, J. C., & Kramer, J. J. (Eds.). (1989). The tenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements.

Employee Polygraph Protection Act of 1988, 29 U.S.C. § 2001 et seq.

Eyde, L. D., Moreland, K. L., Robertson, G. J., Primoff, E. S., & Most, R. B. (1988). Test user qualifications: A data-based approach to promoting good test use (Report of the Test User Qualifications Working Group of the Joint Committee on Testing Practices). Washington, DC: American Psychological Association.

Eyde, L. D., Robertson, G. J., Krug, S. E., Moreland, K. L., Robertson, A. G., Shewan, C. M., Harrison, P. L., Porch, B. E., Hammer, A. L., & Primoff, E. S. (1993). Responsible test use: Case studies for assessing human behavior (Report of the Joint Committee on Testing Practices). Washington, DC: American Psychological Association.

Goldberg, L. R. (1990). An alternative "description of personality": The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216-1229.

Goldberg, L. R., Grenier, J. R., Guion, R. M., Sechrest, L. B., & Wing, H. (1991). Questionnaires used in the prediction of trustworthiness in pre-employment selection decisions: An APA task force report. Washington, DC: American Psychological Association.

Gough, H. G. (1954). Personnel Reaction Blank. Palo Alto, CA: Consulting Psychologists Press.

Guastello, S. J., & Rieke, M. L. (1991). A review and critique of honesty test research. Behavioral Sciences and the Law, 9, 501-523.

Hollinger, R. C., & Clark, J. P. (1983). Theft by employees. Lexington, MA: Lexington Books.

Kay, G. C. (1991). Casting stones at integrity testing, not at integrity tests. Forensic Reports, 4, 163-169.

Keyser, D. J., & Sweetland, R. C. (Eds.). (1991). Test critiques (Vol. 8). Austin, TX: Pro-Ed.

Martin, S. L., & Terris, W. (1990). The four-cell classification table in personnel selection: A heuristic gone awry. The Industrial/Organizational Psychologist, 27, 49-55.

O'Bannon, R. M., Goldinger, L. A., & Appleby, G. S. (1989). Honesty and integrity testing: A practical guide. Atlanta: Applied Information Resources.

Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance [Monograph]. Journal of Applied Psychology, 78, 679-703.

Oregon Psychological Association. (1991). [Insert distributed with newsletter]. OPA Newsgram, 70(8).

Rorer, L. G., Hoffman, P. J., & Hsieh, K. (1966). Utilities as base-rate multipliers in the determination of optimum cutting scores for the discrimination of groups of unequal size and variance. Journal of Applied Psychology, 50, 364-368.

Sackett, P. R., Burris, L. R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491-529.

Sackett, P. R., & Harris, M. M. (1984). Honesty testing for personnel selection: A review and critique. Personnel Psychology, 37, 221-245.

Saxe, L. (1991). Lying: Thoughts of an applied social psychologist. American Psychologist, 46, 409-415.

Saxe, L., Dougherty, D., & Cross, T. (1985). The validity of polygraph testing: Scientific analysis and public controversy. American Psychologist, 40, 355-366.

Scrivner, E. (1991). Integrity testing: A new frontier for psychology. Forensic Reports, 4, 75-89.

Shephard, I. I., & Duston, R. L. (1987). Workplace privacy: Employee testing, surveillance, wrongful discharge, and other areas of vulnerability. Washington, DC: Bureau of National Affairs.

Slora, K. B. (1989, August). An empirical approach to determining employee deviance base rates. Paper presented at the 97th Annual Convention of the American Psychological Association, New Orleans, LA.

Stone, E. F., Stone, D. L., & Hyatt, D. (1989, April). Personnel selection procedures and invasion of privacy. In R. M. Guion (Chair), Privacy in organizations: Personnel selection, physical environment, and legal issues. Symposium presented at the Fourth Annual Conference of the Society for Industrial and Organizational Psychology, Boston.

U.S. Congress, Office of Technology Assessment. (1983). Scientific validity of polygraph testing (OTA-TM-H-15). Washington, DC: U.S. Government Printing Office.

U.S. Congress, Office of Technology Assessment. (1990). The use of integrity tests for pre-employment screening (OTA-SET-442). Washington, DC: U.S. Government Printing Office.

APPENDIX

Sample Recommendations Offered to Publishers by the APA Task Force on the Prediction of Dishonesty and Theft in Employment Settings

Construct Validity

Each test publisher should define the construct(s) measured by the test.

Each test publisher should provide direct evidence bearing on the test's construct validity. Such evidence should include examinations of convergent and discriminant validity to identify moderators and alternative explanations for individual differences.

Investigators should devote increased research attention to issues such as whether scores from one end of the continuum are more valid than are scores from the other end.

Correlations with other measures should be regularly sought and reported.

Item analyses should be carried out routinely, with aggregate results reported at a minimum.


Cut Score Use

Both test score and criterion distributions are continuous. Basing decisions on degree of risk rather than category designation reduces the incidence of decision errors.

Publishers need to provide clear and detailed information on how cutting scores (or boundaries between zones) are derived.

Publishers need to provide evidence when the same cutting score is applied to different businesses and individual companies.

Marketing and Promotional Materials

Publishers should adopt and enforce standards ensuring that the promotional claims made by each testing organization rest on a firm empirical foundation.

Each publisher should carefully consider the aims for each of its tests, determine which aims it can support by what evidence, and then examine its promotional material to make sure that its claims are justified.

The development of nontechnical guidelines for users of all kinds of preemployment screening tests would assist potential users in evaluating claims made on behalf of competing tests.

Increasing Openness of Integrity Test Evaluation

Independent researchers should be given access to these tests for use in studies of honesty and related concepts, and test publishers should offer to make databases available for secondary or meta-analyses.

Publishers should do all in their power to ensure that their tests receive adequate reviews.

Test publishers should commit themselves to making available results of all, even negative, research on tests.

Test publishers should encourage comparative studies between integrity tests, as well as with other types of tests, to increase understanding of the underlying constructs and incremental validity.

Proper Test Use and Training of Test Users

Publishers should do more to ensure that test users are knowledgeable about appropriate and inappropriate uses of tests and test scores. Publishers themselves should take increased responsibility for ensuring proper test use by their clients.

Publishers should institute specific procedures for handling test materials and determining that users are in compliance with acknowledged principles for the protection of privacy.

All publishers should undertake analyses of their instruments' compliance with published test standards.

For foreign translations, it is recommended that (a) the equivalence of any test provided in two languages be demonstrated and (b) such a test not be distributed for use in making selection decisions before determination of its reliability and validity in each language.
