A Comparative Study of ESL Writers' Performance in a Paper-Based and a Computer-Delivered Writing Test
Abstract
1. Introduction
The concept of authenticity has been central to the study of language testing in
the past several decades (Bachman & Palmer, 1996; Douglas, 2000; Lewkowicz,
2000; Messick, 1994; O’Malley & Pierce, 1995; Widdowson, 1979). It has, how-
ever, been mostly associated with the nature of test tasks. Messick (1994) advocates
the view originally proposed by Arter and Spandel (1992) that authentic assess-
ment purports to describe the processes and strategies test takers use to perform
their task so as to manage the work demands properly. Accordingly, “test admin-
istrators must provide realistic contexts for the production of student work by
having the tasks and processes, as well as the time and resources, parallel those in
the real world” (p. 18). O’Malley and Pierce (1995) also stress that authenticity in
various subject assessments must be taken into account at each stage of test admin-
istration, from constructing and implementing a test, to the grading and reporting
procedures.
Side by side with the concept of authenticity lies the concept of fairness. Ac-
cording to the APA/AERA/NCME Standards (1999, p. 74), “fairness requires that
all examinees be given a comparable opportunity to demonstrate their standing on
the construct(s) the test is intended to measure. Just treatment also includes such
factors as appropriate testing conditions and equal opportunity to become familiar
with the test format, practice materials, and so forth.” Disallowing computers in a
writing test may deprive habitual computer writers of an equal chance to perform
as well as they do in their ordinary writing circumstances. Indeed, whatever the inconsistencies in research findings on performance differences between the two writing media, habitual computer writers have expressed frustration and embarrassment when given a paper-and-pencil test, as described by Collier and Werier (1995).
The act of writing has been addressed from various perspectives. Most importantly, it involves complex cognitive processes comprising versatile, recursive stages such as planning, translating or formulating, and reviewing (Hayes & Flower, 1981).
Increasing rater reliability has been an important but unresolved issue in writing assessment (Huot, 1990; Lumley & McNamara, 1995; McNamara, 1996). Still, in the modern era when interpretative approaches pervade, Moss (1994) argues that an individual rater's subjective, context-bound judgment should not be suppressed for the sake of agreement with other raters. From any perspective, however, the often-reported handwriting effect on raters, which lowers reliability, is not welcome.
Many studies have consistently addressed the association of poor handwriting
with lower marks from raters (e.g., Chase, 1986; Markham, 1976). Earlier studies
reported that raters are likely to assign lower grades to typed essays than to their
handwritten counterparts (e.g., Arnold et al., 1990; Bridgeman & Cooper, 1988;
Sweedler-Brown, 1991). Even with rater training focused on handwriting impact,
raters still showed this same tendency when grading handwritten and typed essays,
as shown in Powers, Fowles, Farnum, and Ramsey’s study (1994). The authors
attributed this tendency to the seemingly lengthier image of handwritten papers, the greater conspicuity of errors in typed essays, and raters' raised expectations for word-processed essays. The research thus consistently supports the existence of a handwriting effect on raters. Investigating how, and to what extent, handwriting affects readers is therefore important not only for the current research inquiry itself but also for the validity of the current findings.
3. Method
3.1. Subjects
The subjects were solicited from the three consecutive regular paper-and-pencil
based ESL Placement Tests (EPT) at the University of Illinois at Urbana-Champaign
in Fall 2001. The EPT is administered to place incoming international students into
an appropriate level of ESL courses and employs the traditional paper-and-pencil
based testing mode. Upon the completion of each regular EPT, I advertised a
chance to take the computer-delivered EPT. As an incentive, the better grade between the regular and the computer EPT would be considered for the final placement decision. Of the 75 students who signed up, a total of 42 subjects, 16 females and 26 males, participated in the computer-delivered test. Five were undergraduate
students and 37 were graduate students. The subjects came from nine different
native language backgrounds and from various departments.
The subjects in the study took the regular EPT essay test. In the regular,
paper-based EPT essay test, they wrote an essay after listening to a 10-minute
lecture and reading a topic-relevant article. Fifty minutes were given for reading
the article and writing their essay. The computer-delivered EPT was administered
within a maximum of three days from the regular paper-based EPT in a computer
lab with 28 Macintosh and 24 IBM stations. The topic for the computer EPT was
‘Brain Specialization,’ and the topics for the paper EPTs were ‘Ethics’ and ‘Trade.’
The specific disciplines from which the three topics were generated are economics for 'Trade,' neurolinguistics for 'Brain Specialization,' and philosophy for
‘Ethics.’ The writing prompts for the three topics (videotaped lectures and the
related reading articles) currently used in the EPT essay test have been empirically
validated as parallel across various subgroups of examinees, despite the topics originating in distinctly different disciplines (Liu, 1997).
For the computer-delivered essay test, I was able to use the EPT essay test-
ing software, previously constructed by another researcher (Kim, 2002). In the
test, the subjects first read the test instructions on the computer, which had been
transcribed from the cover page of the paper test booklet. They then watched the
videotaped lecture on the computer with headphones on. The listening stimulus
was computer-delivered to give all examinees the same aural input. As in the regular
EPT, the video was shown only once and note-taking was allowed during listening.
After listening to the lecture, subjects were given 50 minutes to read the hard-copy
article and to write their essay on the computer. In their writing, subjects could
use word processing functions such as copy and paste, delete, undo, and indenta-
tion. The final essays were submitted to the server. Subsequent to the test, subjects
were asked to complete a questionnaire (Appendix A) on their writing habits and
processes.
As described, I designed the procedure for the computer-delivered EPT to be as similar as possible to the regular paper-and-pencil based EPT, except for the differ-
ence in the writing medium and the individually delivered video prompt.
The subjects’ essays produced in the paper and computer EPT were first graded
holistically at the operational scoring session held on the day of the test, and later
analytically graded for this research purpose. The operational EPT scoring pro-
cedure is as follows: Each essay is independently rated by two raters based on
holistic grading benchmarks. All raters are ESL teaching assistants with more
than a semester of teaching experience, and all participate in a mandatory training
session. The essays are marked with one of four placement levels: too low, ESL 400, ESL 401, and exempt at the graduate level, and too low, ESL 113L, ESL 113U, and ESL 114 at the undergraduate level.1 In the case of a one-level score dif-
ference between the two ratings for a particular essay, raters discuss the essay to
reach a consensus. A score discrepancy of two levels is resolved by a third rater.
Appendix B gives the holistic benchmarks used in the operational EPT grading.
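To make this adjudication rule concrete, the following minimal Python sketch mirrors the procedure just described; the function name, the 1-4 numeric coding of placement levels, and the discuss/third_rater callables are illustrative assumptions rather than part of the operational EPT scoring software.

# Hypothetical coding of placement levels (graduate scale): 1 = too low,
# 2 = ESL 400, 3 = ESL 401, 4 = exempt.
def adjudicate(rating1, rating2, discuss, third_rater):
    """Return a final placement from two independent holistic ratings."""
    gap = abs(rating1 - rating2)
    if gap == 0:
        return rating1                    # raters agree: the common rating stands
    if gap == 1:
        return discuss(rating1, rating2)  # one-level gap: raters confer to reach consensus
    return third_rater()                  # two-level gap: a third rater resolves the score

# Example: a one-level disagreement settled by discussion at ESL 401 (coded 3).
final_level = adjudicate(2, 3, discuss=lambda a, b: 3, third_rater=lambda: None)
print(final_level)  # 3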
After the computer-delivered test, I retrieved the 42 computer-written essays from the server and printed them for scoring. In advance of the computer-delivered testing date, I had word-processed the 42 paper-written essays produced in the regular paper-based EPT.
1 The score 'too low' is rarely assigned to essays. There is no 'exempt' score for undergraduate students because the university strictly regulates the rhetoric requirement for all undergraduate students. For international undergraduate students, taking the sequential classes ESL 114 and ESL 115 is considered to fulfill the rhetoric requirement.
Table 1
Inter-rater reliability
Type of essays                Reliability in the holistic measurement    Reliability in the feature analysis
Handwritten essays .60 .79
Transcribed essays .86 .70
Computer-generated essays .81 .68
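The excerpt does not state which coefficient underlies the reliabilities in Table 1. A minimal sketch, assuming a simple Pearson correlation between the two raters' scores for the same essays (the rating vectors below are hypothetical, not study data):

import numpy as np
from scipy.stats import pearsonr

# Hypothetical paired holistic ratings (levels coded 1-4) for the same essays.
rater_a = np.array([2, 3, 3, 4, 2, 3, 1, 4])
rater_b = np.array([2, 3, 2, 4, 3, 3, 2, 4])

r, p_value = pearsonr(rater_a, rater_b)  # inter-rater reliability as Pearson r
print(f"inter-rater reliability r = {r:.2f} (p = {p_value:.3f})")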
As noted, all the essays in this study were given one of four placement levels.
For statistical purposes, the scores from ‘too low’ to ‘exempt’ were converted to
a numerical scale of one to four, respectively. Table 2 presents the descriptive
statistics of the three types of essay scores in the holistic measurement.
The repeated-measures ANOVA detected no significant mean difference in the
holistic scores of the three types of essays. Since in the placement tests a more
meaningful interpretation can be drawn from the discrepant cases of placement
results within individuals on different occasions, a comparison was made of place-
ment results per subject assessed by the different essay types. Table 3 compares the placement frequencies of the computer EPT with those of the paper EPT.
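A one-way repeated-measures ANOVA of this kind can be sketched in Python with statsmodels as follows; the subject identifiers and scores are hypothetical, and the software actually used in the study is not reported in this excerpt.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one holistic score (coded 1-4) per subject per essay type.
data = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "essay_type": ["paper", "transcribed", "computer"] * 4,
    "score": [2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 4],
})

# Within-subject factor: essay type (paper, transcribed, computer).
result = AnovaRM(data, depvar="score", subject="subject", within=["essay_type"]).fit()
print(result)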
Table 2
Descriptive statistics for the holistic measurement (n = 42)
Essay type Mean SD Minimum Maximum
Table 3
Frequency of placement result in the computer EPT and paper EPT
                        Computer
Paper          ESL 400   ESL 401   Exempt   Total
ESL 400        14        3         2        19
ESL 401        4         15        4        23
Exempt         0         0         0        0
Total          18        18        6        42
Note. Scores of the five undergraduate students were included in the analysis by mapping each undergraduate placement onto the corresponding graduate-level placement. For example, the undergraduate placement result 'ESL 114' corresponds to the graduate score 'exempt.'
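For readers who wish to reproduce the count of discrepant placements, the off-diagonal cells of Table 3 identify the subjects placed differently by the two tests. The short sketch below assumes the row/column orientation used in the reconstruction of Table 3 above; the agreement count itself is the same if the axes are transposed.

import numpy as np

# Table 3 cross-tabulation (order: ESL 400, ESL 401, exempt);
# rows read as paper-EPT placements, columns as computer-EPT placements.
table3 = np.array([
    [14,  3, 2],
    [ 4, 15, 4],
    [ 0,  0, 0],
])

agreements = np.trace(table3)    # subjects placed identically on both tests
total = table3.sum()             # 42 subjects
discrepant = total - agreements  # 13 subjects, roughly a third of the sample
print(f"identical placements: {agreements}/{total}; discrepant: {discrepant}")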
Table 4
Frequency of placement results between the handwritten essays and the transcribed essays
Subjects' survey responses suggested that the topic in the second test seemed more difficult than that in the first, regular
EPT. As topic is widely perceived as one of the significant factors affecting writ-
ers, it appeared that a conceivably more difficult topic diminished the advantage
of using familiar writing tools. In sum, although the higher mean score of the
computer-delivered test did not result in a statistically significant difference from
the paper-based EPT, an examination of individual cases substantiates performance
differences in tests with different writing media.
Although the holistic scores on the transcribed version of essays were no longer
important for the practical placement decision, from an empirical point of view a
comparison of scores of the pairs of transcribed essays and original handwritten
essays would shed light on the validity of the current study. As readers are one
of the inevitable sources of error in writing assessments, and text appearance has
been often reported to affect raters, an examination of the handwriting effect on
raters is necessary to check the validity of the assigned scores.
Table 4 displays the holistic scores between pairs of original handwritten es-
says and their transcribed counterparts. In the measurement of the 42 transcribed
essays, 8 essays were scored higher by one level, whereas 6 were marked one
level lower than the handwritten counterparts. Thus, the mean difference between
the transcribed and handwritten versions was slight, with the transcribed essays
being marked slightly higher. This result contradicts outcomes from earlier studies such as Arnold et al. (1990) and Powers et al. (1994), in which transcribed essays received lower marks than their handwritten counterparts.
The contrary finding in the current study is likely due to the severe time constraints imposed on raters. As scores must be available by the next business day after testing, raters need to finish their work between 11:00 a.m. and 5:00 p.m. on the test day. Furthermore, there is a great influx of EPT test takers each
fall semester, and the limited number of raters must handle 80–100 essays in
this limited time. Under such time pressure, as Sloan and McGinnis (1978) also
reported, readers evaluating many essays as rapidly as possible tended to assign
lower scores to messy handwritten essays than to neat ones because raters who
had to pass essays quickly to another reader were not able to devote a sufficient
amount of time to deciphering messy handwriting.
Indeed, our raters agreed in the follow-up interview that when they read severely
illegible essays, they had been more likely to give low scores to the essays. The
raters added that illegible essays represented 15–20% of the total number of essays, and that such essays interrupted the smooth flow of reading and impaired their focus
on content. Therefore, in a time-constrained testing condition for writers and raters
alike, handwriting may play a negative role.
Regardless of the non-significant statistical outcome, the placement results in
the handwritten essays and transcribed counterparts in the discrepant 14 cases
would warrant substantial consideration. About a third of the subjects, who might have been placed at a different level had their essays been written on a computer, would be disserved by their writing class placement simply due to rater error. Extending
these results to the entire EPT population, this is a harmful factor impairing the
validity of the EPT. Furthermore, this result prompts a reconsideration of the
comparison of results between computer essays and handwritten essays analyzed
above. Admittedly, the handwriting effect evident in the 14 cases suggests that the better scores on the computer-written essays might have been confounded with that effect, in addition to reflecting subjects' real performance differences.
To conclude, for a third of all the essays, the measurement of the transcribed
essays differed from that of the handwritten counterparts. This finding calls our
attention to rater training. It strongly suggests that test administrators need to be
alert and train raters to focus on content alone and to immunize themselves against text appearance. Rater training will be re-addressed in a later discussion.
Descriptive statistics for the feature analysis are presented in Table 5. The feature
scores of each essay were the average of the marks assigned by two raters. Table 5
shows that across all features, essays in the computer-based EPT displayed the
highest means, and the essay pairs of paper and transcribed versions had similar
means.
The repeated-measures MANOVA revealed highly significant test type effects
(P < .001) for the features (Wilks’ Lambda = 0.504, F = 4.187 with df =
8). In repeated-measures MANOVA, it is important to satisfy the assumption of sphericity, that is, equality of the variances of the differences for all pairs of levels of the repeated-measures factor. This is tested by Mauchly's sphericity test. The absence of sphericity calls for an adjustment of the degrees of freedom in the follow-up univariate tests, which is reflected in the non-integer df values reported in Table 6.
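As a rough consistency check on the reported multivariate statistic (a sketch, assuming the single-sample Hotelling form of the within-subject test with p = 4 features x 2 within-subject contrasts = 8 response variables and N = 42 subjects), the exact F transformation of Wilks' Lambda gives

F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{N - p}{p} = \frac{1 - 0.504}{0.504} \cdot \frac{42 - 8}{8} \approx 4.18, \qquad df = (p,\, N - p) = (8,\, 34),

which is consistent with the reported F = 4.187 and its 8 numerator degrees of freedom.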
Table 5
Descriptive statistics in the feature analysis (n = 42)
Paper EPT Transcribed Computer EPT
Feature M SD M SD M SD
Table 6
Univariate ANOVA for each feature
Source   Measure                  SS      df      MS      F        P > F
Essay    Organization             2.90    1.71    1.70    11.42    .000
         Content                  2.20    1.75    1.25    9.18     .001
         Source use               2.53    1.49    1.70    5.93     .009
         Linguistic expression    1.23    1.62    1.62    5.33     .011
Table 7
Tests of pairwise contrasts for the four features
Feature Essay Mean difference P>F
As far as the feature scores were concerned, subjects performed better overall on the computer-delivered essay test than on the paper-based test. The degree of significance of the mean difference was seen to be highest for the organization feature, followed by content.
Predictably, organization was enhanced in the computer-based domain, likely
due to the extended time for planning and review as in Kellogg’s study (1994). This
claim was supported by subjects’ responses to the survey item 21, which asked if
they could organize better on the computer EPT. Approximately 60% of subjects
marked one of the two positive scale points for this item.2 Additionally, half of the subjects
thought that reviewing the text was easier than in the paper EPT.
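The agreement rates quoted in this discussion follow the convention noted in footnote 2, that is, the share of responses falling on the two positive points of the five-point scale. A minimal Python sketch with hypothetical item 21 responses (the actual response vector is not reported in the article):

import numpy as np

# Hypothetical responses to item 21 ("I wrote better because I was able to
# organize better") on the 1-5 scale; for illustration only.
item21 = np.array([5, 4, 3, 4, 2, 5, 4, 3, 4, 1])

agreement_rate = np.mean(item21 >= 4)  # proportion marking 4 or 5 (the positive points)
print(f"agreement rate: {agreement_rate:.0%}")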
Contrary to the findings of Freedman and Clarke’s study (1988) in which com-
puter writers made no improvement in the overall content due to the greater efforts
they devoted to correcting local and surface errors, this study discovered enhanced
content with computer-writing. The improved content might be ascribed to rea-
sons similar to those influencing the organization feature. Better structuring and
reviewing of the text appeared to result in better content. It is also possible that, as
Rodrigues (1985) demonstrated, the highly readable screen display might encour-
age more reading of the text, resulting in more in-depth revision. Further, over half
the subjects said that they could write more sentences and thus expected better re-
sults. Although a quantitative increase in text does not necessarily indicate a qualitative improvement, adding text may tend to enrich the content overall.
The enhancement of the linguistic expression feature with computer-writing
seems to imply that subjects did not pay attention merely to the local level of
errors. Sixty percent of the subjects answered positively that in the computer EPT
their writing was enhanced by the opportunity for correction and revision. Further,
owing to a simpler revision process than in paper-based writing, subjects were able to improve their writing samples by replacing awkward expressions during the revising process.
On the other hand, it is very hard to explain the more effective use of sources in
the computer-written essay. The computer-writing process, facilitated throughout
the task by word-processing tools, might have contributed to greater use of the sources. Perhaps because of the reduced physical labor involved in writing and revising by
computer, the subjects were able to read and re-read the given article. In addition,
the individually delivered video prompt could give rise to a more focused listening
activity, better comprehension, and therefore more facility with using the sources.
Finally, some discussion is required to explain the fact that while the holistic
grading did not yield significant mean differences between computer-delivered
and paper-based essays, analytic rating resulted in significant improvement of all
features with the computer-delivered EPT. It is likely that impressionistic holistic
ratings measured different things from analytic assessment. Although the feature
analysis forms were developed based on a close consideration of the holistic benchmarks, raters' judgments on the holistic scale may encompass something not scrutinized in the single features. Or, as Hayes, Hatch, and Silk (2000), Huot (1990), and Vaughan (1993) cast doubt on the reliability of holistic measurement, the differences might be ascribable to greater rater error involved in holistic assessment.
2 Subsequently, when survey results are discussed, the agreement rate is measured as marks on the two positive scale points of the five-point items.
Given the moderate inter-rater reliability for all essay types with feature anal-
ysis, and the nature of feature analysis involving a more thorough and focused
reading, the results of the feature analysis deserve more emphasis in evaluating
writers’ performance on both the computer-delivered and paper-and-pencil based
essay tests. Taken together, the study outcomes suggest that subjects perform
better on the computer EPT than they do on the paper-based EPT.
Several limitations of the study should be noted. First of all, admittedly, the
better performance on the computer EPT might have been confounded by the order effects present in the study. Indeed, some subjects commented on the survey that the time was more manageable in the second test. Also, the second test could not avoid the effects of subjects' increased familiarity with the test format and of practice. Regrettably,
because of logistical problems involved in the subject solicitation procedure, I
could not counter-balance the order of test occasions in the current study.
Second, although I tried to minimize the effect of topics, the subjects’ different
perceptions of topic difficulty in the two tests might have confounded the findings.
This point was also supported by the subjects’ responses to the survey. Still, it is not
clear, and is open to further study, whether or not the level of difficulty perceived
by writers indeed affects test performance, and to what extent topics affect writers’
performance.
Finally, because the subjects were all volunteers, and their number was small,
generalization of the study findings needs special care. When soliciting subjects,
I placed emphasis on their familiarity with computer-writing. Hence, the subject
group was not representative of EPT test takers as a whole, but probably of habitual
computer writers. Nevertheless, as revealed in the survey, it could not be assumed
that the subjects were randomly selected from a population of computer writers,
because some of them admitted that they were not habitual computer writers but rather took the computer test for other reasons. Therefore, the study results would not be applicable to all computer writer groups or to the general population of EPT test takers.
6. Further implications
This study brings rater training to particular attention. To date, the rater recali-
brating procedure for most placement tests has not included a lesson on the possible impact of text appearance. Given the significantly lower figure of inter-rater reliability in
3 The enhanced EPT is designed based on the notion of process-oriented writing. Students are given
approximately five and a half hours for their writing activities including brainstorming, peer-feedback,
self-evaluation, and rewriting.
Acknowledgments
Grateful thanks are expressed to the EPT Supervisor, Dr. Fred Davidson. I am
also grateful to two anonymous Assessing Writing reviewers for their valuable
comments. The remaining errors are solely my responsibility.
References
Akyel, A., & Kamisli, S. (1999). Word processing in the EFL classroom: Effects on writing strategies,
attitudes, and products. In: M. C. Pennington (Ed.), Writing in an electronic medium: Research
with language learners (pp. 27–60). Houston: Athelstan.
American Educational Research Association, American Psychological Association, & National Coun-
cil on Measurement in Education. (1999). Standards for educational and psychological testing.
Washington DC: American Educational Research Association.
Arnold, V., Legas, J., Obler, S., Pacheco, M. A., Russell, C., & Umbdenstock, L. (1990). Do students get
higher scores on their word-processed papers? A study in scoring hand-written vs. word-processed
papers. Unpublished paper, Rio Hondo College, Whittier, CA.
Arter, J., & Spandel, V. (1992). Using portfolios of student work in instruction and assessment. Edu-
cational Measurement: Issues and Practice, 11 (1), 36–44.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University
Press.
Benesch, S. (1987). Word processing in English as a second language: A case study of three non-native
college students. (Available ERIC: ED281383.)
Bridgeman, B., & Cooper, P. (1988). Comparability of scores on word-processed and handwritten
essays on the Graduate Management Admissions Test. Paper presented at the Annual Meeting of
the American Educational Research Association, San Diego, CA.
Bridwell-Bowles, L., Johnson, P., & Brehe, S. (1987). Composing and computers: Case studies of
experienced writers. In: A. Matsuhashi (Ed.), Writing in real time: Modeling composing processes
(pp. 81–107). Norwood, NJ: Ablex.
Burley, H. (1994). Postsecondary novice and better than novice writers: The effects of word processing
and a very special computer assisted writing lab. Paper presented at South-western Educational
Research Association Meeting, San Antonio, TX.
Chadwick, S., & Bruce, N. (1989). The revision process in academic writing: From pen & paper to
word processor. Hong Kong Papers in Linguistics and Language Teaching, 12.
Chapelle, C. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–
272.
Chase, C. I. (1986). Essay test scoring: Interaction of relevant variables. Journal of Educational Mea-
surement, 23, 33–41.
Cho, Y. S. (2001). Examining a process-oriented writing assessment in a large-scale ESL testing
context. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Collier, R. (1983). The word processor and revision strategies. College Composition and Communica-
tion, 34, 149–155.
Collier, R., & Werier, C. (1995). When computer writers compose by hand. Computers and Composi-
tion, 12, 47–59.
Daiute, C. (1985). Writing & computers. Addison-Wesley.
Dorner, J. (1992). Authors and information technology: New challenges in publishing. In: M. Sharples
(Ed.), Computers and writing: Issues and implementation (pp. 5–14). Dordrecht: Kluwer Academic
Publishers.
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge University
Press.
Freedman, A., & Clarke, L. (1988). The effect of computer technology on composing processes and
written products of grade 8 and grade 12 students (Education and Technology Series). Toronto:
Ontario Department of Education.
Grejda, G. F., & Hannafin, M. J. (1992). Effects of word processing on sixth graders' holistic writing
revisions. Journal of Educational Research, 85, 144–149.
Haas, C. (1989). How the writing medium shapes the writing process: Effects of word processing on
planning. (Available ERIC: EJ388 596.)
Harrington, S. (2000). The influence of word processing on English placement test results. Computers and Composition, 17, 197–210.
Hawisher, G. E. (1987). The effects of word processing on the revision strategies of college freshmen.
Research in the Teaching of English, 21, 145–159.
Hayes, J. R., & Flower, L. S. (1981). Identifying the organization of writing processes. In: L. W. Gregg &
E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Hayes, J. R., Hatch, J. A., & Silk, C. M. (2000). Does holistic assessment predict writing performance?
Written Communication, 17, 3–26.
Huot, B. (1990). Reliability, validity, and holistic scoring: What we know and what we need to know.
College Composition and Communication, 41, 201–213.
Kellogg, R. T. (1994). The psychology of writing. New York, NY: Oxford University Press.
Kellogg, R. T. (1996). A model of working memory in writing. In: C. M. Levy & S. Ransdell (Eds.),
The science of writing: Theories, methods, individual differences, and applications (pp. 57–72).
Mahwah, NJ: Lawrence Erlbaum.
Kim, J. (2002). Computer-delivered ESL Writing Placement Test at the University of Illinois at
Urbana-Champaign. Unpublished masters thesis, University of Illinois at Urbana-Champaign.
Kroll, B. M. (1985). Social-cognitive ability and writing performance: How are they related. Written
Communication, 2, 293–305.
Lam, F. S., & Pennington, M. C. (1995). The computer vs. the pen: A comparative study of word
processing in a Hong Kong secondary classroom. Computer-Assisted Language Learning, 7, 75–
92.
Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language
Testing, 17, 43–64.
Li, J., & Cumming, A. (2001). Word processing and second language writing: A longitudinal case study.
In: R. Manchon (Ed.), Writing in the L2 classroom: Issues in research and pedagogy. International
Journal of English Studies, 1, 127–152.
Liu, H. (1997). Constructing and validating parallel forms of performance-based writing prompts in an
academic setting. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training.
Language Testing, 12, 54–71.
Markham, L. R. (1976). Influences of handwriting quality on teacher evaluation of written work.
American Educational Research Journal, 13, 277–283.
McNamara, T. (1996). Measuring second language performance. London: Longman.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance
assessment. Educational Researcher, 23, 13–23.
Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23, 5–12.
Norman, D. A. (1989). Cognitive artifacts. In: J. M. Carroll (Ed.), Designing interaction: Psychology
at the human-computer interface (pp. 17–38). New York: Cambridge University Press.
O'Malley, J. M., & Pierce, L. V. (1995). Authentic assessment for English language learners: Practical approaches for teachers. Addison-Wesley.
Pennington, M. C. (1996). The computer and the non-native writer: A natural partnership. Cresskill,
NJ: Hampton Press.
Pennington, M. C. (2003). The impact of the computer in second language writing. In: B. Kroll (Ed.),
Exploring the dynamics of second language writing (pp. 287–310). Cambridge, UK: Cambridge
University Press.
Phinney, M., & Khouri, S. (1993). Computers, revision, and ESL writers: The role of experience.
Journal of Second Language Writing, 2, 257–277.
Powers, D. E., Fowles, M., Farnum, M., & Ramsey, P. (1994). Will they think less of my handwritten
essay if others word process theirs? Effects on essay scores of intermingling handwritten and
word-processed essays. Journal of Educational Measurement, 31, 220–233.
Rhodes, B. K., & Ives, N. (1991). Computers and revision — Wishful thinking or reality? (Available
ERIC: ED 331 045.)
Rubin, D. L. (1984). Social cognition and written communication. Written Communication, 1, 211–245.
Schwartz, H., Fitzpatrick, C., & Huot, B. (1994). The computer medium in writing for discovery. Com-
puters and Composition, 11, 137–149.
Sharples, M. (1994). Computer support for the rhythms of writing. Computers and Composition, 11,
217–226.
Sloan, C. A., & McGinnis, L. (1978). The effect of handwriting on teachers’ grading of high school
essays. (Available ERIC: ED 220 836.)
Spandel, V., & Stiggins, R. J. (1980). Direct measures of writing skill: Issues and applications. Portland,
OR: Northwest Regional Educational Development Laboratory.
Sweedler-Brown, C. O. (1991). Computers and assessment: The effect of typing versus handwriting
on the holistic score of essays. Research and Teaching in Developmental Education, 8, 5–14.
Vaughan, C. (1993). Holistic assessment: What goes on in the rater’s mind? In: Hamp-Lyons, L.
(Ed.), Assessing second language writing in academic contexts. Norwood, NJ: Ablex Publishing
Corporation.
Warschauer, M. (1996). Motivational aspects of using computers for writing and communication. In:
Warschauer, M. (Ed.), Telecollaboration in foreign language learning (pp. 29–46). Honolulu,
HI: University of Hawaii Second Language Teaching and Curriculum Center.
Widdowson, H. G. (1979). Explorations in applied linguistics. Oxford: Oxford University Press.
Wolfe, E. W., Bolton, S., Feltovich, B., & Niday, D. M. (1996). The influence of student experience
with word processors on the quality of essays written for a direct writing assessment. Assessing Writing, 3, 123–147.
Zimmermann, R. (2000). L2 writing: Subprocesses, a model of formulating and empirical findings.
Learning and Instruction, 10, 73–99.
Appendix A. Questionnaire

Directions: This questionnaire is given to you to find out about your general writing processes and test-taking experiences. Read each question carefully and choose the option that best represents your opinion.
(Response scale: 1 = strongly disagree, 5 = strongly agree)

1. I liked the computer EPT better than the paper-and-pencil based EPT. 1 2 3 4 5
2. I could write better in the computer EPT than in the paper EPT. 1 2 3 4 5
3. I felt more comfortable at the computer EPT. 1 2 3 4 5
4. The computer EPT provided an easier writing situation than the paper EPT. 1 2 3 4 5
5. Writing on a computer represents my habitual writing situation better than the paper EPT. 1 2 3 4 5
6. Most of the time, I type on a computer when I cope with my writing workload. 1 2 3 4 5
7. I usually use both writing methods (writing on paper and on a computer) when I compose an essay. 1 2 3 4 5
8. I think my writing process differs when I write on a computer from when I write on paper. 1 2 3 4 5
9. I usually write better when I write on a computer. 1 2 3 4 5
10. In my case, the writing tool does not make any difference in terms of the writing quality. 1 2 3 4 5
11. I think the difficulty of the topic was the same in the two tests. 1 2 3 4 5
12. At the computer EPT, I spent more time organizing ideas than at the paper EPT. 1 2 3 4 5
13. I spent more time translating my ideas into text than at the paper EPT. 1 2 3 4 5
14. I spent more time reviewing the text than at the paper EPT. 1 2 3 4 5
15. At the computer EPT, I spent more time correcting spelling and grammatical errors than at the paper EPT. 1 2 3 4 5
16. Do you feel that you are misplaced based on your score for the paper test? 1 2 3 4 5
17. Do you expect that you will be placed more appropriately by the computer test than by the paper test? 1 2 3 4 5
18. If you answered 'YES' to question 17, why do you feel that way?
Only if you think you wrote better at the computer-delivered EPT, please proceed to the next items. Otherwise please skip to item 25.

19. I wrote better because the given topic was easier than the one before. 1 2 3 4 5
20. I wrote better because I was able to write more sentences. 1 2 3 4 5
21. I wrote better because I was able to organize better. 1 2 3 4 5
22. I wrote better because I was able to correct more errors. 1 2 3 4 5
23. I wrote better because reviewing the text was easier. 1 2 3 4 5
24. I wrote better because functions in a computer such as 'copy and paste' promote easier text development. 1 2 3 4 5
25. If you have any comments or suggestions regarding the computer EPT, write them below.
Appendix B. EPT holistic grading benchmarks

Too low
• Insufficient length
• Extremely bad grammar
• Doesn’t write on assigned topic; doesn’t use any information from the
sources
• Majority of essay directly copied
• Summary of source content marked by inaccuracies
ESL 400
• May contain an Intro, Body, and Conclusion (generally simplistic)
• Does not flow smoothly; hard to follow ideas
• Lacks development and/or substantial content
• May be off topic
• Little or no use of sources to develop ideas
• Lacks synthesis (of ideas in the two sources or of source and the student’s
own ideas)
• Summary of source content contains minor inaccuracies (details) and
possibly major inaccuracies (concepts)
• Lacks sophistication in linguistic expression; little sentence variety and
sentence complexity not mastered
Directions: There are four statements for each feature. Please mark the statement that you think best represents the essay with respect to that feature.
ORGANIZATION
_____ No organization of ideas, insufficient length to ascertain organization
_____ May contain an Intro, Body & Conclusion (generally simplistic); lack of paragraph and essay cohesion (does not flow smoothly, hard to follow ideas)
_____ Usually contains an introduction, body, conclusion (reasonable attempt); some
paragraph and essay cohesion
_____ Clear plan; excellent introduction, body, conclusion; cohesion at paragraph and
essay levels.
CONTENT
_____ Doesn’t write on assigned topic
_____ May be off topic; lacks development and/or substantial content
_____ Some development or elaboration of ideas; some use of sources to develop ideas; writes on topic
_____ Main idea developed well. Substantive content and effective elaboration
LINGUISTIC EXPRESSION
_____ Extremely bad grammar; totally incomprehensible. No sentence variety or
complexity
_____ Grammatical/lexical inaccuracies are frequent and impede understanding; awkwardness. Lacks sophistication in linguistic expression; little sentence variety and sentence complexity not mastered
_____ Some grammatical/lexical errors, but still comprehensible; some sentence variety and complexity; neither simplistic and awkward nor smooth and sophisticated
_____ Strong linguistic expression in terms of grammar, vocabulary, and style; may be sophisticated, with lexical and sentence variety and complexity
USE OF SOURCES
_____ Doesn't use any information from the sources; majority of essay directly copied; summary of source content marked by inaccuracies
_____ Lacks synthesis (of ideas in the two sources or of source and the student's own ideas); attempts at paraphrase are generally unskillful and inaccurate; summary of source content contains minor inaccuracies (details) and possibly major inaccuracies (concepts)
_____ Some synthesis (of ideas in the two sources or of source and the student's own ideas); summary of source content may contain minor inaccuracies (details); moderately successful paraphrase
_____ Good synthesis; summary of source content usually accurate; effective, skillful paraphrase