Ethical Issues in Assessment
Wayne J. Camara
Tests and assessments in the USA have taken on additional burdens as their uses have been greatly expanded by educators, employers, and policy makers. Increased demands are often placed on the same assessment, by different constituencies, to serve varied purposes (e.g., instructional reform, student accountability, quality of teaching and instruction). Such trends have raised considerable concerns about the appropriate use of tests and test data, and testing is under increased scrutiny in education, employment, and health care. This paper distinguishes among the legal, ethical, and professional issues recently emerging from the increased demands on assessments and also identifies unique issues emanating from computer-based modes of test delivery and interpretation. Efforts by professional associations to improve assessment practices and emerging issues concerning the proper use of assessments in education are reviewed.
Finally, a methodology for identifying consequences associated with test use and a taxonomy for evaluating
the multidimensional consequential outcomes of test use within a setting are proposed.
Individuals are first exposed to tests and assessments very early in their school years in the United
States. By the age of 18, assessments have already
played a significant role in the life decisions of many
young adults such as graduation, promotion and retention, college admissions, placement, and scholarship awards. Tests and assessments are also widely
used in the psychological and educational screening
of children and adults, for career and vocational assessment, for certification and licensing of individuals for a number of occupations, and for the selection and placement of workers within government
and private sector organizations. Given the diverse
and important use of tests and assessments, measurement professionals have become increasingly
concerned with questions of validity, fairness, intended use(s) and consequences related to the appropriate use of educational and psychological assessments.
The ethical and legal conduct of lawmakers, celebrities, athletes, and professionals from all areas (e.g., business, investment, marketing, law) has attracted headlines and the attention of the mass media. In recent years, the U.S. and many European countries have fixed on such ethical and legal issues involving responsible conduct and obligations to the public. Attention has also focused
on the use of tests and test results in education, em-
ployment, and health care settings (Berliner & Biddle, 1995). The real and perceived misuses of assessments and assessment results have become one of
the most challenging dilemmas facing measurement
professionals and test users today. Abuses have
been widely reported in preparing students to take
tests and in the use and misuse of data resulting
from large-scale testing programs. Elaborate and
costly test cheating operations have been disclosed
by federal agencies (Educational Testing Service,
1996), test preparation services have employed confederates to allegedly steal large pools of items from
computer-based admissions testing programs, instances of students and employees being provided
with actual test items before an administration have
been reported, as have unauthorized extension of
time limits, falsification of answer sheets and score
reports, and violations of the confidentiality of test
data (Schmeiser, 1992). Misuses of test data in high
stakes programs abound and the accuracy and marketing tactics of test publishers have been criticized
in some (Sackett, Burris, & Callahan, 1989; Sackett
& Harris, 1984).
Professional conduct and responsibilities in the use of assessments can be ordered within three levels:
(1) legal issues, (2) ethical issues, and (3) professional issues. The practices and behaviors within these
three levels are certainly interrelated, yet this cate-
more common issues and concerns of test use unaddressed (e.g., validity of the measure when disparate impact is not present). Numerous federal and state laws and executive orders have implications for employment testing, primarily through prescribed standards for equal employment opportunity (Camara, 1996), but also for the assessment of individuals with disabilities, the handling and retention of personnel records, and restrictions on the use of certain pre-employment techniques (e.g., the Employee Polygraph Protection Act of 1988). The general consensus among industrial psychologists is
that the Civil Rights laws that emerged in the 1960s have been a major stimulus for improved pre-employment assessment practices. Employers became better educated about the technical, professional, and legal issues involved in the use of testing out of necessity, and while there is some evidence that regulations initially decreased the use of employment testing, today such tests are used by a higher proportion of organizations than ever (Deutsch, 1988).
The first formal ethics code for any profession using assessments was adopted by the American Psychological Association (APA) in 1952. Eighteen of the more than 100 ethical principles from this Code (APA, 1953) addressed the use of psychological tests and diagnostic aids, covering the following issues of test use: (1) qualifications of test users (3 principles); (2) responsibilities of the psychologist sponsoring test use (4 principles); (3) responsibilities and qualifications of test publisher representatives (3 principles); (4) readiness of a test for release (1 principle); (5) the description of tests in manuals and publications (5 principles); and (6) security of testing materials (2 principles).
Codes from the Canadian and British Psychological Associations came later, as did those from other
European nations (Lindsay, 1996). In the past decade, many other professional associations have
adopted ethical standards and professional codes
which cover measurement and assessment issues.
These trends have resulted from the increased public awareness of ethical issues, the variety of new
proposed and actual uses for assessments, the increased visibility given to assessments for accountability purposes, and a commitment from the professions to safeguard the public (Eyde & Quaintance, 1988; Schmeiser, 1992). Ethical standards of the American Counseling Association and the APA are unique in that these associations maintain formal enforcement mechanisms that can result in member suspension and expulsion, respectively. In 1992, the American Educational Research Association (AERA) adopted ethical standards, followed in
1995 by the National Council on Measurement in Education's (NCME) Code of Professional Responsibilities in Educational Measurement. Several other organizations, such as the Society for Industrial and Organizational Psychology (SIOP) and regional I-O organizations, formally adopted APA's most recent ethics code for their members for educational purposes, without any enforcement mechanisms.
Laws which affect testing primarily strive to protect certain segments of the public from specific abuses. Ethical standards and codes attempt to establish a higher normative standard for a broader range of professional behaviors. For example, APA's ethical standards note that:
. . . in making decisions regarding their professional behavior, psychologists must consider this
Ethics Code, in addition to applicable laws and
psychology board regulations. If this Ethics Code
establishes a higher standard of conduct than is required by law, psychologists must meet the
higher ethical standard. If the Ethics Code standard appears to conflict with the requirements of
law, then psychologists make known their commitment to the Ethics Code and take steps to resolve the conflict in a responsible manner (APA,
1992, p. 1598).
Coinciding with this increased attention to ethical codes has been a dramatic increase in professional and technical standards for assessment, which are described later. The Standards for Educational
and Psychological Testing (AERA, APA, & NCME,
1985) are the most widely cited document addressing technical, policy, and operational standards for
all forms of assessments that are professionally developed and used in a variety of settings. Four separate editions of these standards have been developed by these associations and a fifth edition is currently under development. However, numerous
other sets of standards have been developed to address more specific applications of tests or aimed at
specific groups of test users. Standards have been
developed for: (1) specific uses such as the validation and use of pre-employment selection procedures (Society for Industrial and Organizational
Psychology, 1987), integrity tests (ATP, 1990), licensing and certification exams (Council on Licensure,
Enforcement, and Regulation, 1993; Council on Licensure, Enforcement and Regulation & National
Organization for Competency Assurance, 1993),
educational testing (Joint Committee on Testing
Practices, 1988); (2) for specific groups or users such
as classroom teachers (AFT, NCME, NEA, 1990),
and test takers (Joint Committee on Testing Practices, 1996), and (3) for specific applications such as
performance assessments, adapting and translating
tests (Hambleton, 1994), and admissions testing
(College Entrance Examination Board, 1988; National Association of Collegiate Admissions Counselors, 1995).
Professional standards, principles, and guidelines are more specific and generally oriented toward more technical issues, guiding test users in specific applications and uses of assessments. First, technical issues concerning the development, validation, and use of assessments are addressed in standards. Validity is the overarching technical requirement for assessments; however, additional professional
and social criteria have been considered in evaluating assessments, such as: (1) how useful the test is
overall, (2) how fair the test is overall, and (3) how
well the test meets practical constraints (Cole &
Willingham, 1997). These criteria are directed at
both the intended and unintended uses and consequences of assessments. Existing standards guide
test users in the development and use of tests and
assessments; however, these standards may rarely reach and influence test users not associated with a profession.* For example, most employers are unaware of the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 1987), and certainly the vast majority of educational administrators and policy makers who determine how to use
tests and cite test results in making inferences about
the quality of education have never viewed a copy
of any of the various standards in educational measurement and testing.
Professional standards developed by groups such
as APA and AERA do not appear in publications
commonly read by employers, educators and policy
makers. Many standards are written at a level where
they may be incomprehensible to such individuals
even if they had access to them. Finally, in many instances, members of the professional associations
which develop and release standards themselves
may not have and use copies of the standards and
may have had little exposure to the standards and
other new topics in testing, measurement, and sta-
* Often professional standards have been cited by courts in case law and have influenced assessment practices in these ways.
tested, the more likely instances of test misuse will
occur.
With the expanded uses of and increased focus on assessment has come renewed criticism of the misuses and negative consequences of assessments. Professionals in measurement and testing are increasingly struggling with how best to improve proper test use and to both inform and influence an increasingly diverse group of test users who may have no formal training in testing and measurement but still have legitimate claims for using test results in a wide variety of ways. Such groups have attempted to address the legal, ethical, and professional concerns with additional codes of conduct, technical standards, workshops, and case studies. However, most of these efforts rarely reach beyond members of the specific professional association or clients/users of a specific assessment product.
Clearly such efforts are essential for improving the proper use of assessments and appropriate understanding of assessment results. Yet these initiatives will not generally reach the secondary users who may insist on using assessment results as the sole determinant of high school graduation, as the basis for rewards and sanctions to schools and teachers, and as the primary indicator of equity, educational improvement, or student achievement. Unfortunately, efforts aimed at only one segment of a much more expansive population of test users may not go far enough, fast enough, to improve assessment practice.
Associations have attempted to cope with this new, more expansive group of secondary test users by developing broader and simpler forms of standards, such as the Code of Fair Testing Practices in Education, which condenses the primary standards from a 100-page document into a four-page booklet that encourages duplication. Other efforts have involved collaboration with broader groups, such as the National Education Association, to develop codified guidelines or standards. However, such efforts have had no consistent impact across situations because there are few common linkages, different priorities and expectations for assessments, and little common understanding between primary and secondary test users.
Relatively few efforts have been focused on the undergraduate and graduate programs which train teachers and measurement specialists. Because universities and colleges differ in the types of programs offered, the titles of courses and course sequences, and even the departments in which such programs are housed, targeting and reaching educational programs broadly presents a number of substantial logistical
obstacles. Often it is difficult to identify the faculty
and administrators responsible for such programs
and to effect systematic changes in their training
programs.
For these and other reasons, Haney and Madaus (1991) state that test standards have had little direct impact on test publishers' practices and even less impact on test use. They note that professional codes and standards primarily serve to enhance the prestige, professional status, and public-relations image of the profession rather than narrow the gap between standards and actual practice. How do we resolve these issues? Given the increased social-policy implications of testing, some have argued that greater legal regulation, litigation, or enforcement of technical standards by an independent auditing agency presents a potential mechanism for reducing the misuse of assessments (Haney, 1996; Haney, Madaus, & Lyons, 1993; Madaus, 1992).
However, such mechanisms may have little impact
on many of the most visible misuses of assessments
because it is often legislative and executive
branches of state and federal government who advance expanded and often inappropriate use of assessments. Because test use is so expansive and
abuses are so diverse, solutions which address only
one element or one audience (e. g., test developer,
teacher) may not be equipped to resolve the majority of instances where assessments are misused.
There are also often positive consequences when
validated assessments are appropriately used:
- merit as a guide for decision making (selecting the most qualified candidate or making awards based on relevant performance)
- efficiency (a relatively quick and effective means of collecting a large amount of data across a range of skills/competencies)
- quality control (certification or licensure)
- protection of the public (e.g., preventing negligent hiring in critical occupations)
- objectivity in making comparisons and decisions among individuals or against established criteria
- cost effectiveness and utility
Consideration of how the social ramifications of assessments affect validity has been summarized by Cronbach (1988), who stated that validity research is essentially a system which considers personal, institutional, and societal goals as they relate to inferences
derived from test scores. If validity is established
through evidence that supports inferences regarding specific uses of a test, then intended and unintended consequences of test interpretation and use
should be considered in evaluating validity (Messick, 1989). Test developers and test users need to
anticipate negative consequences that might result
from test scores, potential corruption of tests, negative fallout from curriculum coverage, and how
teachers and students spend their time (Linn, 1993).
While there is some consensus that the consequences of test use must become an important criterion in evaluating tests within education, this view
is not generally held in other settings (e. g., personnel, clinical).
Before consequences can be integrated as a component in evaluating assessments, a taxonomy or model is required. Such a taxonomy must consider both positive and negative impacts and consequences associated with test use. The impact, consequences, and feasibility of alternative procedures (e.g., biographical data, open admissions vs. selection) must also be considered. Further complicating such a taxonomy is the knowledge that different stakeholders will have widely differing views on these issues. After the consequences have been identified, their probability of occurrence, the weight (positive or negative) associated with each consequence, and the level at which the consequence occurs (i.e., individuals, organizations, or society) must be determined.
This taxonomy borrows terminology and processes from expectancy theory (Vroom, 1964), where the weight of a consequence is similar to the valence and its probability is related to the instrumentality.
[Table 1: a grid for classifying consequences by level (individual, e.g., student; organization, e.g., school; societal, e.g., community) and by direction (positive, harmful).]
[Figure 1: each summative consequence is evaluated on a valence (strength of the consequence, -10 to +10) and an instrumentality (probability the consequence will occur, 0 to 1.0), with individual and organizational consequences illustrated.]
fore embarking on a new or revised testing program.
Most consequences will have multiple impacts on
individuals (e. g., test taker, teacher), organizations
(e. g., schools, business), and society (e. g., community, state). Steps 5 and 6 require individuals, often
with very diverse views, to arrive at a consensus or
common judgment about the probabilities and
strength (and direction) of consequences. The literature on standard setting may be of assistance in
structuring a more explicit process.
Table 1 illustrates how consequences may be
identified and classified through a consensus process. A list of potential consequences would be developed and classified within each of the nine boxes
(step 4). Once all potential consequences are identified, each consequence is fully evaluated to determine its valence and instrumentality as illustrated in
Figure 1. Step 7 in the process would have key stakeholders determine the overall summative consequences on individuals, organizations and society
before a final decision is reached on the desirability
and appropriateness of an assessment program or
proposed use for assessments.
Such a taxonomy would not ensure that test misuse is minimized, but it would help to raise awareness of the diverse range of issues that emerge
across different stakeholders and constituency
groups who are involved in a high stakes assessment
program. The absence of literature proposing models or taxonomies to identify and contrast consequences associated with test use leaves the test developer and user with little or no guidance in improving professional conduct and appropriate use
of assessments.
quiring much greater time and effort for clinical
judgment and interpretation of computer-generated clinical interpretations than is usually the case.
Specifically, two or more identical soil readings,
blood chemistries, meteorological conditions or
MMPI profiles may require different interpretations depending on the natural or human context in
which each is found . . . use of the same objective
finding (e.g., an IQ of 120 or a 2-7 MMPI codetype) may be quite different if the unique patient is a 23-year-old individual being treated for a first acute, frankly suicidal episode than if the unique patient is a 52-year-old truck driver . . . applying for total disability (Matarazzo, 1986, pp. 20-21). Eyde and
Kowal (1987) explained that computer-based testing provides greater access to tests and expressed
concern about the qualifications of such expanded
test users.
Technological innovations and the increased
pressure for accountability in health care services
may also be creating a different demand and market
for clinical and counseling assessments. Assessments in all areas can be and are delivered directly to consumers. The availability of a take-home CD-ROM IQ test for children and adults, marketed to the general public by Pro-Ed, a psychological testing and assessment publisher, has raised these same ethical and professional issues for psychologists. The CD-ROM test comes with an 80-page
manual which informs parents of some of the theories of testing, how to administer the test, and how
to deal with test results (New York Times, January
22, 1997). In such instances, when tests are delivered by the vendor directly to the test taker, there is no traditional test user. The test takers or their parents, who have no training and little knowledge of testing, must interpret the results, which increases the risk of test misuse.
Computer-adaptive testing (i.e., assessments in which the examinee is presented with items or tasks matched to his or her ability or skill level) is increasingly used today for credentialing and licensing examinations and admissions testing programs. Several
unique concerns arise even when computer-based
tests are administered under controlled conditions,
such as in the above instances. First, issues of equity
and access arise because these computer-based testing programs often charge substantially higher testing fees, required to offset the additional expenses incurred for test development and delivery, and often have more limited geographical testing locations. Second, familiarity with technology and with completing assessments on computer may be related to test performance. Research has demon-
measurement and has not become another educational fad, as some had predicted. Several large assessment programs had sought to replace their standardized testing programs with fully performance-based or portfolio systems. Today it appears that the model state assessment program will combine such constructed-response tasks with more discrete, selected-response (e.g., multiple choice, grid-in) test items. Employing multiple measures allows educators to gain the benefits of more in-depth and applied performance tasks that increase curricular validity, as well as the increased reliability and domain coverage that selected-response items offer. However, a number of legal, ethical, and professional concerns emerge with any high-stakes assessment program, whether the decisions made primarily affect the student or the school.
Single assessments, whether norm-referenced multiple-choice assessments or more performance-based assessments, do not serve multiple high-stakes needs well (CRESST, 1995). Often key proponents of large-scale assessments support multiple uses but actually have very different priorities among these uses. Kirst and Mazzeo (1996) explain that when one such state assessment system moved from a design concept to an operational testing program, it became clear that not all the proposed uses and priorities for the design could be accommodated. When priorities of key stakeholders could not be met, support for the program decreased.
Phillips (1996) identified legal criteria which apply to such expanded uses of assessments for high
stakes purposes. These criteria have been modified
and supplemented with several additional criteria
which reflect a range of issues:
Adequate advance notification of the standards required of students. To ensure fairness, students and parents should be notified several years in advance of the standards to which students will be held. Students and teachers should be provided
with the content standards (knowledge and skills required) and performance standards (level of performance). Sample tasks, model answers, and released items should be provided and clear criteria
should be established when high stakes (e. g., graduation) uses are associated with the test.
Evidence that students had an opportunity to learn.
The critical issue is whether students had adequate
exposure to the knowledge and skills included on
the assessment or whether they are being asked to
demonstrate competency on content or in skills that
they were not exposed to in school. Phillips (1996)
notes that such curricular validity can often be demonstrated through survey responses from teachers showing that students had, on average, more than one opportunity to learn each skill tested.
Evidence of opportunity for success. This challenge emerges when major variations from standardization occur. It assumes that all students are familiar with the types of tasks on the assessment and with the mode of administration (e.g., computer-based testing); that all have the same standardized administrative and scoring procedures and equipment (violated, e.g., when some students have access to a calculator or superior laboratory equipment in completing the assessment); and that outside assistance (e.g., on group tasks or student work produced over time, where parents and others could unduly offer assistance) could not affect performance on the assessment. Variations in these and other conditions can present an unfair advantage to some students.
Assessments reflect current instructional and curricular practices. If assessments are designed to reflect exemplary instructional or curricular practices which are not reflected in the actual practices of many schools, as is often the desire of educators who hope to use the assessment to drive changes, a fundamental fairness requirement may not be met. The same challenge could be brought where teachers do not receive the professional development needed to implement new instructional or assessment practices (e.g., use of a graphing calculator) required on the assessment, or in end-of-course assessments where the teacher lacks appropriate credentials for the subject area.
While these concerns apply to most educational
assessments, they move from professional issues to
legal and ethical concerns when assessments are
used to make high stakes decisions. Additional ethical and professional issues which have been associated with various high stakes educational assessments may also affect other types of testing programs in other settings. Only a few of these issues
are briefly addressed below.
Overreliance or exclusive reliance on test scores. Test performance should be supplemented with all relevant and available information to form a coherent profile of students when making individual high-stakes decisions (e.g., admissions, scholarships). Student performance on tests should be interpreted within the larger context of other relevant indicators of performance. In admissions decisions, students' grades, courses, and test scores are gener-
considered when making simplistic comparisons
among schools, districts, and other units. When these
issues are not adequately considered by test developers and test users, serious professional and ethical
issues arise.
Exclusion of students from large-scale testing programs. Most large-scale national assessment programs which use aggregate-level data (school, district, state) to monitor educational progress and permit comparisons systematically exclude large proportions of students with limited English proficiency and students with disabilities (McGrew, Thurlow, & Spiegel, 1993). Often school staff determine which students may be excluded from such national and state testing programs, and there is variation across schools in exclusion rates and in the application of criteria for excluding students. Paris, Lawton, Turner, and Roth (1991) have also demonstrated that low achievers are often excluded by some schools or districts, which has the effect of artificially raising district test scores. Such practices introduce additional error into analyses, complicate accurate policy studies, affect the rankings resulting from the test data, and introduce a basic unfairness in the use of test data (National Academy of Education, 1992).
Conclusion
This paper has attempted to distinguish among legal and regulatory mandates, ethical issues, and professional responsibilities, all of which concern the appropriate use of tests and test data. Numerous efforts
have been undertaken by testing professionals and
professional organizations to improve responsible
use of tests, yet often these efforts are judged to
have fallen short. As tests are used by an increasing
number of users with a variety of objectives (e. g.,
policy makers, state and local education officials,
business) the potential for misuse of tests increases
and efforts to educate and monitor test users become less effective. Existing testing standards and
specialty guidelines and other forms of addressing
the responsible use of tests are discussed. The potential consequences of testing and assessment are
reviewed and a taxonomy has been proposed to aid
test users in addressing the multiple and multidimensional consequences resulting from test use
with various key stakeholder groups. Finally, this paper provides a more detailed review of the professional concerns arising from the migration of tests
to computer-based platforms and the increased demands placed on assessments in U.S. education.
The value of assessment is often related to its impact. Individual appraisals should bring to bear all
relevant information to describe and explain important qualities, minimize problems, promote growth
and development, and increase the validity of important decisions (e. g., course placement, admissions, certification, selection) (Scheuneman & Oakland, in press). National, state, and local testing programs should provide comprehensive data that can
supplement other sources of information, both informing us of students' skills and knowledge today and documenting their growth in learning over time.
Legal, ethical, and professional concerns with assessment are difficult to distinguish. All such issues
concern the proper use of assessment and the probable consequences of using assessments. Consequences of testing are in the eye of the beholder.
The same assessment which presents several potential benefits to some groups (e. g., policy makers,
community, business) may also result in negative
consequences to individuals (e.g., test takers, students). A paradigm is needed to assist test users in identifying and evaluating the potential consequences that result from test use and the consequences which would result from alternative practices (the use of more subjective processes, or collecting no data).
Additional attention to the consequences of testing,
and how these are determined and evaluated by the
various stakeholders is essential to reduce the misuse of testing and improve assessment practices
among the increasingly diverse types of individuals
using tests and results from testing.
Résumé
Tests and assessments have taken on additional demands as their use has been extended by teachers, employers, and decision makers. Growing demands are often placed on these same assessments by various constituencies to serve varied purposes (e.g., instructional reform, student accountability, the quality of teaching methods). These trends have raised considerable concern about the appropriate use of tests and their results, and testing practices are under ever closer scrutiny in education, employment, and health care. This article distinguishes the legal, ethical, and professional problems that have recently emerged from increased demands for assessment, and it identifies the specific problems tied to computerized test administration and interpretation. The author reviews the efforts undertaken by professional associations to improve assessment practices, as well as the problems concerning the appropriate use of assessments in education. Finally, he proposes a methodology for identifying the consequences tied to test use and a taxonomy for evaluating the multidimensional consequences of test use in a given context.
Author's address:
Dr. Wayne J. Camara
The College Board
19 Hawthorne Drive
Princeton Junction, NJ 08550
USA
E-mail: [email protected]
References
Aiken, L., West, S. G., Sechrest, L., Reno, R. R., Roediger, H. L., III, Scarr, S., Kazdin, A. E., & Sherman, S. J. (1990). Graduate training in statistics, methodology, and measurement in psychology: A survey of PhD programs in North America. American Psychologist, 45, 721–734.
Deutsch, C. H. (October 16, 1988). A mania for testing
spells money. New York Times.
Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991).
Quality control in the development and use of performance assessments. Applied Measurement in Education,
4(4), 289–303.
Educational Testing Service (October 31, 1996). Test cheating scheme used encoded pencils, complaint charges. ETS Access. Princeton, NJ: Author.
Employee Polygraph Protection Act of 1988, 29 U.S.C. Sec. 2001 et seq.
Everson, H. E. (in press). A theory-based framework for
future college admissions tests. In S. Messick (Ed.), Assessment in higher education. Hillsdale, NJ: Erlbaum.
Eyde, L. D., & Kowal, D. M. (1987). Computerized test
interpretation services: Ethical and professional concerns regarding U.S. producers and users. Applied Psychology: An International Review, 36, 401–417.
Eyde, L. D., & Quaintance, M. K. (1988). Ethical issues
and cases in the practice of personnel psychology. Professional Psychology: Research and Practice, 19(2), 148–154.
Fairtest (Summer, 1996). Cheating cases reveal testing mania. Fairtest Examiner, 9, 3–4.
Hambleton, R. K. (1994). Guidelines for adapting psychological and educational tests: A progress report. European Journal of Psychological Assessment, 10, 229–244.
Haney, W. (1996). Standards, schmandards: The need for
bringing test standards to bear on assessment practice.
Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Haney, W., & Madaus, G. C. (1991). In R. K. Hambleton & J. C. Zaal (Eds.), Advances in educational and psychological testing (pp. 395–424). Boston, MA: Kluwer.
Haney, W., Madaus, G. C., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston, MA:
Kluwer.
Helms, J. E. (1992). Why is there no study of cultural
equivalence in standardized cognitive ability testing?
American Psychologist, 47, 1083–1101.
Joint Committee on Testing Practices. (1988). Code of fair
testing practices in education. Washington, DC: Author.
(Copies may be obtained from NCME, Washington,
DC)
Joint Committee on Testing Practices. (1996). Rights and
responsibilities of test takers (Draft). Washington, DC:
Author.
Kirst, M. W., & Mazzeo, C. (1996). The rise and fall of state
assessment in California, 1993–96. Kappan, 22, 319–323.
Lindsay, G. (1996). Ethics and a changing society. European Psychologist, 1, 85–88.
Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15(1), 1–16.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex,
performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Madaus, G. F. (1992). An independent auditing mechanism for testing. Educational Measurement: Issues and
Practice, 11, 26–31.
Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14–24.
McGrew, K. S., Thurlow, M. L., & Spiegel, A. N. (1993).
An investigation of the exclusion of students with disabilities in national data collection programs. Educational Evaluation and Policy Analysis, 15, 339–352.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York:
Macmillan.
Mills, C. N., & Stocking, M. L. (1996). Practical issues in
large-scale computerized adaptive testing. Applied
Measurement in Education, 9, 287–304.
National Academy of Education (1992). Assessing student
achievement in the states: The first report of the National
Academy of Education panel on the evaluation of the
NAEP trial state assessment: 1990 Trial State Assessment. Stanford, CA: Stanford University, National
Academy of Education.
National Association of Collegiate Admissions Counselors (1995). NACAC commission on the role of standardized testing in college admissions. Author.
New York Times (January 22, 1997). One of the newest
take-at-home tests: IQ. New York Times.
Paris, S. G., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective of standardized achievement testing. Educational Researcher, 20(5), 12–20.
Phillips, S. E. (1996). Legal defensibility of standards: Issues and policy perspectives. Educational Measurement: Issues and Practice, 15, 5–13.
Russell, M., & Haney, W. (1997). Testing writing on computers: An experiment comparing student performance on tests conducted via computer and via paper-and-pencil. Education Policy Analysis Archives, 5(3), 1–18.
Sackett, P. R., Burris, L. R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491–529.
Sackett, P. R., & Harris, M. M. (1984). Honesty testing for
personnel selection: A review and critique. Personnel
Psychology, 32, 487–506.
Scheuneman, J. D., & Oakland, T. (in press). High stakes
testing in education.
Schmeiser, C. B. (1992). Ethical codes in the professions.
Educational Measurement: Issues and Practice, 11(3),
5–11.
Society for Industrial and Organizational Psychology
(1987). Principles for the validation and use of personnel
selection procedures. Bowling Green, OH: Author.
Vroom, V. H. (1964). Work and motivation. New York:
Wiley.