Studies in Educational Evaluation: Yu-Ju Hung
ARTICLE INFO
Keywords: Peer assessment; Oral presentation; Alternative assessment; Young learner; EFL

ABSTRACT
While studies of peer assessment (PA) of both written and oral performance are more common in higher education settings, particularly in first-language contexts, PA’s potential for familiarizing elementary-level students with assessment criteria, empowering them to gain ownership of their learning, and developing their motivation and collaborative skills is less well understood. This study investigated the implementation of group PA of oral performance in English-as-a-foreign-language (EFL) classes in Taiwan. A mixed methodology research design integrated analysis of teacher- and peer-assessment ratings for each presenting group, post-assessment survey data, and an instructor interview, documenting the perceptions of and attitudes toward PA of 130 upper elementary students (ages 10–12) and their instructor. The results show that the ratings by fifth and sixth graders, but not fourth graders, were significantly correlated with those of the instructor. Practical and research implications for future implementation of PA of oral performance are discussed.
https://doi.org/10.1016/j.stueduc.2018.02.001
Received 7 May 2017; Received in revised form 21 January 2018; Accepted 11 February 2018
0191-491X/ © 2018 Elsevier Ltd. All rights reserved.
Y.-j. Hung Studies in Educational Evaluation 59 (2018) 19–28
collaborative activities to uncover tensions and conflicts that need to be addressed for future practice.

2. Benefits and challenges of peer assessment practice

Overall, PA has been studied primarily in higher education (e.g., De Grez, Valcke, & Roozen, 2012; Falchikov, 2001; Hughes & Large, 1993; Langan et al., 2005, 2008). In English-as-a-foreign-language (EFL) settings, PA research has focused on writing in tertiary schools (e.g., Tsai & Chuang, 2013; Suzuki, 2009; Min, 2006). Research on group PA of oral skills in EFL settings at the elementary level is limited. Therefore, the following is a discussion of the general characteristics of PA and its implementation in a variety of settings, leading to the rationale for the present study.

2.1. Definition of PA

In line with social constructivism’s focus on social interaction as a vehicle for acquiring new knowledge and skills (Vygotsky, 1978), Strijbos and Sluijsmans (2010) defined PA as “an educational arrangement where students judge a peer's performance quantitatively and/or qualitatively and which stimulates students to reflect, discuss and collaborate” (p. 265). Thus, PA fulfills the double duty of assessment and learning (Boud, 2000). Furthermore, the present study extends the current PA literature to include between-group PA, in which students evaluate their classmates’ performances via discussing and giving group ratings and comments. The process engages students in high level cognitive and discursive processes such as questioning, negotiating, and articulating their thoughts (Kollar & Fischer, 2010).

2.2. Benefits of PA

2.2.1. As an alternative assessment

Studies affirming the reliability and potential learning benefits of PA as an alternative assessment in L2 contexts have primarily involved adult learners (Falchikov & Goldfinch, 2000; Topping, 2003; van Zundert, Sluijsmans, & van Merriënboer, 2010). In a study of Taiwanese EFL university students in an oral training course, who received PA training and decided on evaluation criteria with the instructor, Chen (2006, p. 7) found that PA and TA were highly correlated (r = 0.87). The students reported that the practice helped them as performers to develop speaking skills and think critically, and as assessors to gain confidence in evaluation and commentary. Nevertheless, some proposed a lower ratio for student ratings in their final grades, implying a lack of trust in their fellow students’ assessment and a preference for the authority of the instructor in grading. Learners’ reliance on instructor as the sole assessor and lower confidence in assessing may make them reluctant to render strong judgements of peers. Shimura (2006) implemented PA at a Tokyo university, in which each student’s short presentation was assessed by three peers, who were only briefly introduced to the practice of PA. The students’ and instructor’s mean scores were not significantly different, but the instructor’s standard deviation was much larger, suggesting that students are less willing to assign extremely high or low scores.

Such evidence of student reticence while evaluating peers underscores the need for careful planning and preparation before implementing PA (Cheng & Warren, 2005), including setting clearly itemized criteria (Chen, 2006) and sufficiently training student assessors (Saito, 2008) to assure they take into account a balance of criteria rather than focus excessively on interactional features such as gestures and facial expressions (Shimura, 2006). PA along with peer feedback has also been found to increase student-teacher agreement. Patri (2002) examined practice of PA in small working groups in a Hong Kong university. Each group member’s oral presentation was assessed by the other members, who gave ratings either with or without accompanying verbal feedback. The results showed higher agreement between PA and TA in the groups with verbal feedback, leading to the conclusion that formulating and delivering verbal feedback enhanced students’ assessing ability.

2.2.2. As support for interaction and collaboration

In addition to providing a source of assessment and feedback to sharpen students’ oral performance skills, PA can develop collaborative and teamwork skills (Riley, 1995). In a study of Dutch secondary school students, van Gennip, Segers, and Tillema (2010) investigated how participation in PA affected interpersonal variables. Survey results indicated changes in psychological safety, value diversity, and trust in the peer as an assessor, showing that the students perceived it to be safe to take interpersonal risks in a group and could more easily accommodate different opinions among group members to reach a consensus. The findings stressed positive effects of PA practice on developing some interpersonal variables important for group work. However, the reasons underlying why PA contributed to making a change in the variables required further investigation.

Another value of PA-related collaboration is that objectivity can be developed in group discussion. Peng (2010) investigated high-intermediate and low-intermediate classes in a Taiwanese university that practiced both group-to-group PA of oral presentations and within-group PA to assess each member’s contributions to the group’s work. The results showed an increase in favorable attitudes toward PA and strong reliability of PA scores at both proficiency levels. The students reported that PA helped increase their participation and their motivation for learning, and the group discussion helped them to be more objective in group-to-group PA, which decreased their tendency to over-mark, a common issue raised in previous studies (Sluijsmans, Dochy, & Moerkerke, 1998). However, student ratings in within-group PA were much higher than in group-to-group PA, perhaps due to a friendship effect.

2.3. Challenges of PA

Despite positive reactions to doing PA, collaboration may also involve issues. Hung, Samuelson, and Chen (2016) explored relationships between peer-, self- (SA), and teacher assessments of sixth graders’ (age 12) English oral presentations in Taiwan. After each student’s presentation, the student did SA while the other students did PA in groups. The comparisons of ratings indicated strong correlation between PA and TA and moderate correlation between SA and TA. The researchers argued that playing the role of the teacher motivated the student assessors to assess fairly and improved their own presentations by reflecting on their classmates’ performances in group discussions. However, some learners were still concerned that grades assigned by peers were not fair and a few group members dominated the grading process.

The issues of lack of proficiency and violating friendship norms have been found to be the downside of PA. In a case study of four Hong Kong secondary EFL students who were considered weak in English and less confident in themselves, Mok (2010) found that assessing others frustrated the students because they felt that peers were not qualified to evaluate each other. They also reported that the evaluation form, which provided smiley, neutral, or sad faces for each category, did not facilitate cognitive engagement as specific criteria for feedback might. In a study of Japanese university students’ perceptions of PA in an EFL public speaking course, White (2009) also found that although students recognized the advantages of PA, some found it uncomfortable. The students rehearsed providing PA for mini-presentations and then conducted PA during the mid-term and final presentations. The results showed that, while the majority of the students considered PA to be fair and helpful in planning and delivering their own presentations, some voiced concerns about inconsistent application of criteria, and some were concerned about lowering their peers’ grades.

Bryant and Carless (2010) found that Hong Kong elementary students still viewed their teachers as the only trustworthy assessors in a
test-dominated school culture. In this study, fifth grade students used criteria to judge their classmates’ writings and give comments. Although awareness that others would read their writings encouraged them to take their writing seriously, some students considered their own language proficiency as too low to assess others’ writing and tended to avoid giving negative comments that might upset their classmates. On the other hand, they considered peer feedback unhelpful when not sufficiently critical.

In sum, research on the use of group PA with elementary English learners is scarce. Tangential studies indicate that benefits of the approach pertain largely to the interactive environment PA fosters while challenges are complex, including issues of confidence, legitimacy, and group unanimity. To shed more light on the implementation of PA with young learners, this study investigated how group PA was practiced as formative assessment of the oral performance of EFL learners in a Taiwanese elementary school, guided by two research questions:

RQ1 How do Taiwanese elementary EFL students’ group ratings of peers’ oral performance compare with their teacher’s ratings?

RQ2 How do Taiwanese elementary EFL students and their teacher perceive the benefits and the challenges of group PA?

3. Research method

This section begins with a description of the setting and participants and moves on to discuss the study procedures, data sources and methods of collection, sequence of PA activities, and data analysis.

3.1. Setting

The setting was a public elementary school (grades 1–6) in southern Taiwan, established in 1996 to serve a new high income suburban community. Students’ parents were usually supportive when the school teachers implemented innovative teaching approaches. This school was regarded as a high-performing school as its teachers and students had received several government achievement awards. Students were divided into 30 classes of 20–35 members, depending on the total enrollment in an academic year, about 800 during the study. The ethnic distribution was approximately 90% Taiwanese, and 10% Hakka (an ethnic Chinese group comprising 15–20% of Taiwan’s population) or the second or third generation of the immigrants from provinces in China or other countries. Therefore, all were native speakers of Chinese (Mandarin) and not second language learners, as might be the case in other settings where immigrants are involved.

Although national educational policy mandates the study of English from the fifth grade, local policy required all students to start studying English in the second grade. Also the local government developed the assigned textbooks, The Enjoy series (Shen et al., 2001), which, in keeping with the focus of the new curriculum on speaking and listening (Chern, 2002), featured daily conversations on various topics. The students attended two 40-min English classes every week.

3.2. Participants: teacher and students

Miss Lin (pseudonym), the EFL teacher of the focal classes, completed the PA activities in five fourth through sixth grade classes (ages 10–12). All classes having the same instructor eliminated any issue with teacher variability. Eight students did not complete the study due to absences or lack of parental consent. The remaining 130 students were included in this study. Though all the students were at beginning or near beginning proficiency levels, the teacher further distinguished levels in each class based on test scores and performance the previous year. These stratifications were used in creating groups for the study. Each group of 5–6 members comprised students of mixed proficiency levels and was assigned a higher achiever as the group leader (see Table 1 for additional demographic details).

Table 1
Participants’ Demographic Data.
Grade | 4th Grade | 5th Grade | 6th Grade | Total
# of Classes | 2 | 2 | 1 | 5
Gender (M / F) | 33 / 34 | 22 / 25 | 15 / 1 | 70 / 60
# of Students | 67 | 47 | 16 | 130
# of Groups | 12 | 8 | 3 | 23

3.3. Group seating arrangement

To encourage daily communications, the instructor arranged the seats in the classroom so that students faced each other in groups who worked together to do their role plays and PA. Her instructional approach involved a lot of teacher-student and student–student interaction, role-plays, and English games and songs, to which students responded enthusiastically. Unlike other English classes, which relied solely on summative written tests, Miss Lin conducted PA as formative oral evaluation in which students conducted group role plays using conversations in the textbook as models.

3.4. Study procedures

The original design of the present study contained both PA and SA. Due to the fact that the results of SA were not conclusive and the report of PA could be more focused and complete, this paper reports on PA only. The researcher designed the PA procedures with Miss Lin, who had attended a two-day workshop on formative assessments held by the MOE. Miss Lin then practiced PA in a pilot demonstration for six weeks and followed the same procedures with some revision in the formal study for another six weeks. An important alteration in procedures from the pilot to the formal study was changing the evaluation rubrics from a binary to a five-point scale because students had difficulty making binary decisions about fluency levels and other criteria. The 12-week intervention took place in the five target classes for a total of 120 class periods, with one class hour at most dedicated to the PA activity each week. The scheduled course content was carried on in the rest of class time.

Immediately after the intervention, students voluntarily completed a Chinese version of a questionnaire made up of 11 closed and 3 open-ended questions. Also, the researcher conducted an approximately 90-min semi-structured instructor interview to explore Miss Lin’s reflections on the implementation of PA and her concerns. The interview was conducted in Chinese and recorded and transcribed.

3.5. Data sources

Data included the students’ and teacher’s ratings, the post-assessment student survey, classroom observations, and an instructor interview. Because students were involved in developing evaluation criteria to help them develop a sense of ownership (Topping, 2009), the rubrics varied slightly across the classes. The teacher created the following criteria for rubrics based on the students’ suggestions (indicated within parentheses): for fourth and sixth grade classes, which were the same, fluency (no unnecessary pauses), pronunciation (saying words and sentence intonation correctly), collaboration (smooth turn-taking), and creativity (interesting content); for both fifth grade classes, use of props, performance (acting and gesturing), clarity (saying words and sentences correctly and loudly), and collaboration (smooth turn-taking) as criteria. The fifth grade classes’ rubrics differed on one criterion in that one used courtesy (including an opening and a closing) and the other content (including more sentences/information).
Table 2
Coding Scheme for the Qualitative Data.
Category | Theme (code) | Sub-code
benefits | 1 positive affect | 1-A I like scoring others
benefits | 2 cognition/metacognition | 2-A I watch other’s performance
benefits | 2 cognition/metacognition | 2-B I get comments to improve myself
benefits | 2 cognition/metacognition | 2-C I improve my speaking ability
benefits | 3 increased interaction | 3-A I discuss with others
benefits | 3 increased interaction | 3-B I interact with classmates more
challenges | 4 difficulty of accepting comments | 4-A I don’t like that classmates have different opinions about ratings
challenges | 4 difficulty of accepting comments | 4-B I don’t like being criticized
challenges | 4 difficulty of accepting comments | 4-C I think that others give biased grades
challenges | 5 lack of collaboration | 5-A Classmates don’t collaborate
challenges | 5 lack of collaboration | 5-B Classmates don’t participate in discussion
challenges | 6 poor time management | 6-A Insufficient time for PA
challenges | 6 poor time management | 6-B Would like to allow more time for PA

Table 4
Descriptive Statistics for Peer Group Ratings and Instructor Ratings.
Group | Rating | Mean | SD | Min | Max | N
All Peer Groups | Peer Rating | 16.9 | 2.9 | 5 | 25 | 23
All Peer Groups | Instructor Rating | 19.9 | 3.8 | 11 | 25 | 23
4th Grade | Peer Rating | 17.8 | 1.4 | 11 | 25 | 12
4th Grade | Instructor Rating | 21.3 | 2.6 | 16 | 25 | 12
5th Grade | Peer Rating | 15.5 | 4.3 | 9 | 25 | 8
5th Grade | Instructor Rating | 18.4 | 4.5 | 13 | 25 | 8
6th Grade | Peer Rating | 17.5 | 1.3 | 9 | 22 | 3
6th Grade | Instructor Rating | 18.3 | 5.5 | 13 | 24 | 3
The minimum and maximum scores refer to original peer group ratings instead of mean ratings to reflect differentiation of student assessment.
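The "within one standard deviation" comparison that Section 4.1 applies to Table 4 can be checked from the table alone. The sketch below is illustrative only. It assumes the criterion means that the gap between the peer mean and the instructor mean is no larger than the instructor's standard deviation (the paper does not spell out the exact rule), and the names `TABLE_4` and `within_one_sd` are hypothetical.

```python
# Means and SDs transcribed from Table 4 (group-level ratings).
TABLE_4 = {
    "All": {"peer_mean": 16.9, "teacher_mean": 19.9, "teacher_sd": 3.8},
    "4th": {"peer_mean": 17.8, "teacher_mean": 21.3, "teacher_sd": 2.6},
    "5th": {"peer_mean": 15.5, "teacher_mean": 18.4, "teacher_sd": 4.5},
    "6th": {"peer_mean": 17.5, "teacher_mean": 18.3, "teacher_sd": 5.5},
}


def within_one_sd(row):
    """Assumed rule: the gap between the means is at most the instructor's SD."""
    gap = abs(row["peer_mean"] - row["teacher_mean"])
    return gap <= row["teacher_sd"]


# Map each grade label to True (inside the band) or False (outside it).
agreement = {grade: within_one_sd(row) for grade, row in TABLE_4.items()}

for grade, ok in agreement.items():
    print(grade, "within one SD" if ok else "outside one SD")
```

Under this reading, only the fourth-grade groups fall outside the band, which matches the pattern reported in Section 4.1.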
Table 5
Students' Perceptions of PA (all grades).
Item | SA (5) | A (4) | N (3) | D (2) | SD (1)
1. Generally speaking, I like PA activities. | 77 (59.2%) | 44 (33.8%) | 6 (4.6%) | 0 (0%) | 3 (2.3%)
2. I think that PA is a fair evaluation method. | 83 (63.8%) | 35 (26.9%) | 7 (5.4%) | 2 (1.5%) | 3 (2.3%)
3. I think that students in my grade can evaluate each other. | 75 (57.7%) | 38 (29.2%) | 15 (11.5%) | 1 (0.8%) | 1 (0.8%)
4. PA helps me learn English. | 74 (56.9%) | 38 (29.2%) | 14 (10.8%) | 2 (1.5%) | 2 (1.5%)
5. PA helps me understand the teacher’s requirements better. | 71 (54.6%) | 39 (30.0%) | 16 (12.3%) | 4 (3.1%) | 0 (0%)
6. PA stimulates me to try harder to speak English. | 57 (43.8%) | 42 (32.3%) | 22 (16.9%) | 6 (4.6%) | 3 (2.3%)
7. PA improves my English oral presentation skills. | 74 (56.9%) | 37 (28.5%) | 10 (7.7%) | 7 (5.4%) | 2 (1.5%)
8. PA increases interactions between the teacher and the students. | 79 (60.8%) | 32 (24.6%) | 13 (10.0%) | 5 (3.8%) | 1 (0.8%)
9. PA helps me develop a sense of participation. | 70 (53.8%) | 41 (31.5%) | 11 (8.5%) | 4 (3.1%) | 4 (3.1%)
10. PA activities increase interactions between students. | 78 (60.0%) | 40 (30.8%) | 8 (6.2%) | 3 (2.3%) | 1 (0.8%)
11. Being graded by peers motivates me to participate more in my group. | 73 (56.2%) | 33 (25.4%) | 15 (11.5%) | 4 (3.1%) | 5 (3.8%)
Open Ended Questions:
Q1. What do you like about the PA activity?
Q2. What do you NOT like about the PA activity?
Q3. How do you want the PA activity to be changed?
N = 130; Cronbach's alpha coefficient = .873.
SA = strongly agree; A = agree; N = neutral; D = disagree; SD = strongly disagree.

3.7.3. Qualitative data

The instructor interview and students’ responses to the three open-ended questions expressing their attitudes toward this learning experience were analyzed recursively (Bogdan & Biklen, 1998). To ensure inter-coder agreement, a bottom-up procedure was adopted, beginning with an open coding scheme that identified all preferences and challenges reported by the teacher and the students. After the researcher and an invited coder had agreed upon the sub-codes and developed themes (codes), each coded all data independently. A Kappa measure indicated that agreement was greater than 0.88, an acceptable level of inter-rater reliability (Landis & Koch, 1977). Agreement on discrepant codes was reached through discussion, and frequencies of responses were calculated to indicate the importance of each sub-code of responses (Seale & Silverman, 1997). Table 2 shows the coding scheme for the qualitative data. The researcher’s class observations and students’ rubrics were used as supplementary data.

To ensure the trustworthiness, triangulation (Guba & Lincoln, 1989) was employed to establish that the findings were accurately represented (Creswell, 2008). The conclusion for the first research question was drawn from correlations between teacher’s and students’ group ratings and students’ reports in response to Q2, “I think that PA is a fair evaluation method,” and Q3, “I think that students in my grade level have the ability to evaluate each other.” Major data sources for the second research question were the students’ survey responses, their responses to the open-ended questions, and the teacher interview.

4. Results

4.1. Comparing student and instructor ratings

As shown in Table 3, overall, the student groups’ ratings were significantly correlated (r = .79, p < .01) with their teacher’s. The mean scores of the peer groups’ ratings were within one standard deviation of the teacher’s ratings (see Table 4), indicating agreement among the ratings (Kwan & Leung, 1996).

Table 3
Correlation between Peer Group Ratings and Instructor Ratings.
Instructor | All Peer Groups | 4th Grade | 5th Grade | 6th Grade
Pearson Correlation | .79 | .52 | .95 | .93
Sig. (2-tailed) | .000 | .084 | .000 | .246
N | 23 | 12 | 8 | 3

However, Table 3 also shows that only the fifth graders’ ratings correlated significantly with the instructor’s ratings (r = .95, p < .01), while the fourth and sixth graders’ ratings did not. In the case of the sixth graders, the low population (N = 3) calls into question the validity of statistical analysis. Also the mean differences between the peer groups and the instructor were within one standard deviation for fifth and sixth graders but not for fourth graders (see Table 4), which might be due to the younger age of fourth graders and their more limited familiarity with the teacher’s standards.
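The significance values in Table 3 can be approximately reproduced from the rounded correlations and group counts alone. The following is a sketch under stated assumptions, not the author's procedure: it assumes the usual two-tailed test of zero correlation under bivariate normality (the paper does not say which software or test produced the "Sig." row), and `pearson_p_value` is a hypothetical helper written with the Python standard library only.

```python
import math


def pearson_p_value(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0, given sample r from n >= 3 pairs.

    Under H0 the density of r is proportional to (1 - r^2) ** ((n - 4) / 2).
    Substituting r = sin(theta) turns this into cos(theta) ** (n - 3),
    which is smooth on [0, pi/2] and is integrated here with Simpson's rule.
    """
    if n < 3 or not -1.0 <= r <= 1.0:
        raise ValueError("need n >= 3 and -1 <= r <= 1")
    power = n - 3
    lower, upper = math.asin(abs(r)), math.pi / 2
    h = (upper - lower) / steps  # steps must be even for Simpson's rule

    def f(theta):
        return math.cos(theta) ** power

    total = f(lower) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    upper_tail = total * h / 3

    # Normalising constant B(1/2, (n - 2) / 2) of the null density.
    beta = math.gamma(0.5) * math.gamma((n - 2) / 2) / math.gamma((n - 1) / 2)
    return 2 * upper_tail / beta


# r and N as printed in Table 3 (r is rounded to two decimals there).
for label, r, n in [("All", 0.79, 23), ("4th", 0.52, 12),
                    ("5th", 0.95, 8), ("6th", 0.93, 3)]:
    print(f"{label}: r = {r:.2f}, N = {n}, p = {pearson_p_value(r, n):.3f}")
```

Because Table 3 reports r to two decimals, the recomputed values differ slightly from the printed ones (for the sixth grade, about .24 rather than .246). With N = 3 there is a single degree of freedom, so even r = .93 is far from significance, which is the point the text makes about the small sixth-grade sample.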
Table 6
Student Survey Mean Scores.
Grade | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11
All Grades | 4.62 | 4.60 | 4.51 | 4.43 | 4.40 | 4.06 | 4.60 | 4.40 | 4.26 | 4.49 | 4.40
4th Grade | 4.39 | 4.39 | 4.33 | 4.36 | 4.31 | 4.15 | 4.16 | 4.39 | 4.25 | 4.40 | 4.15
5th Grade | 4.62 | 4.60 | 4.51 | 4.43 | 4.40 | 4.06 | 4.60 | 4.40 | 4.26 | 4.49 | 4.40
6th Grade | 4.44 | 4.56 | 4.56 | 4.38 | 4.44 | 4.06 | 4.31 | 4.50 | 4.63 | 4.69 | 4.38

Friendship marking (Peng, 2010; Pond, Ul-Hag, & Wade, 1995;
White, 2009) and insufficient differentiation in PA (Murphy & Cleveland, 1995; Shimura, 2006), which have been found in assessments by adult learners, do not appear in this study. As shown in Table 4, the mean scores of peer ratings were below the instructor’s mean scores in all three grades, indicating that students did not grant favors, and the range of the students’ ratings was greater than the instructor’s in all three grades, suggesting they were discerning in their evaluations.

4.2. Perceptions of and attitudes towards peer assessment

The second research question addressed participants’ perceptions of and attitudes towards PA, its benefits, and its challenges.

4.2.1. Students’ perceptions

The students perceived the group PA activity as a positive learning experience. The mean score for Q1–Q11 across all three grades was 4.4 (see Tables 5 and 6), and 93% of the students specifically reported they liked PA activities (see Table 5, Q1). Their open-ended responses (Table 7a) indicated that their reasons for liking PA activities were, in order of frequency, being able to be “little teachers” who gave scores to classmates, watching others’ performances, receiving feedback to improve their performance, improving their speaking ability, and interacting with others more than in traditional classrooms.

Students’ self-evaluations of their ability to assess peers echoed the agreement between the instructor’s and students’ assessments. About 90% of the students considered PA a fair evaluation approach (Q2) and about 87% agreed that students in their grade level had the ability to evaluate each other (Q3) (Table 5). The researcher also observed that students’ comments were usually similar when peer groups shared their opinions orally in the whole class reflection. The young learners’ confidence that they could evaluate their peers’ oral performance suggests that the responsibility they were given in PA could empower them to take ownership of their own learning.

However, whereas students enjoyed the authority the assessor role gave them as little instructors (Li, 1998; LoCastro, 1996), some still doubted that their peers could evaluate them. In the open-ended responses, nine students expressed the opinion that their classmates were negatively biased (see Table 7b). As one student stated, “some people like to give other people very low ratings.” Congruent with Mok’s (2010) and Roskams’s (1999) conclusions, these students were unsure about the fairness of PA. Perhaps they felt that their peers gave them unfairly critical feedback. This might also reflect the entrenched Asian perception of teachers as the sole legitimate assessors.

In terms of effects of PA on their learning, the students reported that it helped them learn English (Q4), understand the teacher’s requirements better (Q5), try harder to speak English (Q6), and improve their English oral presentation skills (Q7) (see Table 5), although there was more disagreement with Q6 and Q7 than with Q4 and Q5, suggesting that the benefits of PA were less apparent in their English development than in their reflections on their peers’ performance. Also, because PA was conducted in the learners’ L1, they must have realized that it would not be conducive to their L2 learning.

Students’ responses were generally positive toward group collaboration (see Table 5, Q8–Q11) as they were expected to discuss and agree on points within their group. Nevertheless, there was more disagreement with Q9, “PA helps me develop a sense of participation” and Q11, “Being graded by peers motivates me to participate more in my group” than with the other items. In the open-ended responses shown in Table 7b, 25 students reported they did not like classmates expressing different opinions (4th grade: N = 18, 27%; 5th grade: N = 7, 15%; 6th grade: N = 0, 0%). Others felt that their classmates did not work together (4th grade: N = 9, 13%; 5th grade: N = 1, 2%; 6th grade: N = 0, 0%) or participate in discussions (4th grade: N = 10, 15%; 5th grade: N = 10, 21%; 6th grade: N = 0, 0%), and some indicated that they disliked being criticized by their peers (4th grade: N = 9, 13%; 5th grade: N = 1, 2%; 6th grade: N = 0, 0%). These challenges were mostly raised by the fourth graders and some by the fifth graders. Moreover, although only three students complained that they did not have enough time, 79 suggested that the activity needed more time (4th grade: N = 38, 57%; 5th grade: N = 29, 62%; 6th grade: N = 12, 75%). This finding suggests the importance of allotting enough time for PA. As the researcher observed, the instructor originally allowed about three minutes for one group discussion, but it usually took five minutes. To
Table 7a
Coding Scheme for Students' Perceptions of the Benefits of PA (all grades).
Theme | Sub-code | Sample responses | Frequency
1 Positive affect | 1-A I like scoring others | “I like to score myself and also score other classmates.” (S46_5th Grade) “When doing ratings, because doing ratings lets us know our own weakness and lets us know how to learn.” (S102_4th Grade) “Being a little teacher.” (S10_4th Grade) | 62
2 Cognition/Meta-cognition | 2-A I watch other’s performance | “I can watch performances. Because some people want good grades, they make their performance very interesting.” (S100_4th Grade) | 23
2 Cognition/Meta-cognition | 2-B I get comments to improve myself | “When we discuss and assess, we can know our own errors and change them.” (S80_5th Grade) | 15
2 Cognition/Meta-cognition | 2-C I improve my speaking ability | “PA motivates me to speak more English.” (S20_4th Grade) | 8
3 Increased interaction | 3-A I discuss with others | “I enjoy the feeling that everyone can discuss together. Everyone interacts with each other. This makes everyone do ratings pleasantly.” (S15_4th Grade) “I can discuss with classmates and interact with them well.” (S98_4th Grade) | 19
3 Increased interaction | 3-B I interact with classmates more | “I enjoy the feeling that everyone can discuss together. Everyone interacts with each other. This makes everyone do ratings pleasantly.” (S15_4th Grade) | 5
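The summary figures quoted in Section 4.2.1 (the share of students agreeing with Q1 to Q3 and the overall mean of about 4.4) can be recomputed from the response counts in Table 5. The sketch below is only an illustration of that arithmetic. The names `TABLE_5`, `percent_agree`, and `item_mean` are hypothetical, and it assumes that "agreement" means the "strongly agree" and "agree" columns combined.

```python
# Response counts per item from Table 5, ordered SA, A, N, D, SD.
TABLE_5 = {
    1: (77, 44, 6, 0, 3),
    2: (83, 35, 7, 2, 3),
    3: (75, 38, 15, 1, 1),
    4: (74, 38, 14, 2, 2),
    5: (71, 39, 16, 4, 0),
    6: (57, 42, 22, 6, 3),
    7: (74, 37, 10, 7, 2),
    8: (79, 32, 13, 5, 1),
    9: (70, 41, 11, 4, 4),
    10: (78, 40, 8, 3, 1),
    11: (73, 33, 15, 4, 5),
}
WEIGHTS = (5, 4, 3, 2, 1)  # SA = 5 ... SD = 1, as in the table header


def percent_agree(counts):
    """Percentage of respondents choosing 'strongly agree' or 'agree'."""
    return 100 * (counts[0] + counts[1]) / sum(counts)


def item_mean(counts):
    """Mean Likert score for a single item."""
    return sum(w * c for w, c in zip(WEIGHTS, counts)) / sum(counts)


# Mean of the eleven item means (every item has the same N = 130).
overall_mean = sum(item_mean(c) for c in TABLE_5.values()) / len(TABLE_5)

for item in (1, 2, 3):
    print(f"Q{item}: {percent_agree(TABLE_5[item]):.1f}% agree")
print(f"Mean across Q1-Q11: {overall_mean:.2f}")
```

Run as written, this gives roughly 93%, 91%, and 87% agreement for Q1 to Q3 and an overall mean of about 4.4, in line with the figures quoted in the text.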
Table 7b
Coding Scheme for Students' Perceptions of the Challenges of PA (all grades).
Theme | Sub-code | Sample responses | Frequency
4 Difficulty of accepting comments | 4-A I don’t like that classmates have different opinions about ratings | “Some difficulties that I encounter is when doing ratings, some people say 5 and some say 1. We get angry easily.” (S109_4th Grade) | 25
4 Difficulty of accepting comments | 4-B Doesn’t like being criticized | “I don’t like other people to criticize me about the parts that I don’t do well.” (S46_5th Grade) “Control criticizing language. Don’t be too mean.” (S27_4th Grade) | 10
4 Difficulty of accepting comments | 4-C Thinks that others give biased grades | “Some people like to give other people very low ratings.” (S110_4th Grade) | 9
5 Lack of collaboration | 5-A Classmates don’t collaborate | “Some people don’t participate in the activity and keep forgetting their script.” (S36_5th Grade) | 20
5 Lack of collaboration | 5-B Classmates don’t participate in discussion | “I like everything except that some people don’t participate in the discussion.” (S130_4th Grade) | 12
6 Poor time management | 6-A Insufficient time for PA | “Too little time for discussion.” (S75_5th Grade) | 3
6 Poor time management | 6-B Would like to allow more time for PA | “When we discuss together in class, it influences time. I hope we can have more time.” (S78_5th Grade) | 79
move the process along smoothly, the instructor would constantly remind the learners to complete the discussion and decide on the group ratings. In examining the qualitative data, the mean scores of the fourth graders’ responses on almost all the items were slightly lower than those of the fifth and sixth graders (see Table 6), suggesting the approach was more challenging for younger learners.

4.2.2. Teacher’s perceptions

Table 8 summarizes Miss Lin’s interview responses. In congruence with the students’ general satisfaction with the experience, she perceived that the students seemed to enjoy rating their peers and that PA enhanced their engagement in the group role plays. By attentively observing and discussing the oral presentations they were evaluating, they became familiar with the criteria for a good presentation to apply to their own performances. For example, she witnessed a student reminding his group members not to make the same mistakes they had observed in another presentation (see Table 8). She also appreciated the increased participation and discussion afforded by the activity:

    As students assess each other, they learn what they should do and how much they should participate. Learning doesn’t belong just to the teacher, but to students. Teachers are just facilitators. When students have ownership, they participate actively. In my students’ discussion, I found they actually had their own ideas and opinions. Those students also did a good job in their performance. For a better group performance, some students even voluntarily guided the students who lagged behind.

However, not every group collaborated successfully, and in some cases high-achievers monopolized the roles of leading discussions and helping others, while some group members remained passive and perhaps frustrated. Moreover, Miss Lin observed that some students attributed their low ratings to vindictiveness, as occurred when one group of fourth graders accused another group during a break. Miss Lin also recognized that students’ inappropriate feedback created tensions across the three grades. Although she demonstrated constructive comments, pointed out good PA performances, and gave examples of well-chosen vocabulary and congenial intonation patterns, students sometimes gave harsh comments, which naturally offended the recipients. Feelings were also hurt when peers criticized the performance of specific students instead of giving more holistic feedback to the group, as the teacher did. For example, one fourth-grade peer group commented, “Clear, but needs to be more fluent. Tim didn’t participate,” while the instructor’s comment was, “Each student expressed sentences clearly, but they needed to work more on teamwork.” The student peers specifically mentioned the name of the bad performer while the teacher avoided it but still indicated the same problem. This is no doubt one difference between discourse by adults and children: children tend to be more direct, whereas adults tend to take feelings into account and couch things more diplomatically. Disagreement could also heat up within groups. Miss Lin considered such issues as part of the learning process. Wrapping up an activity with class discussion, the instructor specifically asked, “Do you like to be criticized harshly? Of course not. Then all of us should learn how to give comments.” The whole group discussion was conducted in L1 to ensure students’ understanding.

    I told them if that one student gives [a rating of] 4 and another gives 1, they can argue about it. Discussion is a process in which everyone can share their opinion and reach a consensus. Then, that is the final rating from this group…. Sometimes they would really have a heated discussion. I would just smile because I knew that they would learn how to do it eventually.

As a result of this experience, Miss Lin decided to focus on teaching her students how to give and receive comments diplomatically in the next round of PA.
Table 8
Coding Scheme for Instructor’s Perceptions of PA.
Code Examples
1 Positive affect “Basically students enjoy this activity. From rating others, they learn what to pay attention to.”
2 Cognition/Meta-cognition “When observing the previous group’s performance, the students that haven’t done their role play whisper in their groups to remind each other not to make the same mistakes.”
3 Increased interaction “Discussion activates students’ learning mechanism and their active participation mechanism.”
4 Difficulty of accepting comments “Students might make some revenge remarks, or they are given low grades…This is also process of learning.”
“You need to give positive comments first. Then give suggestions…Don’t use language like mom is blaming kids.”
5 Lack of collaboration “Every group needs to include students at different levels…but students with higher achievement may not like to help students with lower achievement. This needs to be educated.”
6 Poor time management “Spending time on learning is more important than completing the test.”
“Peer assessment can be used with second graders, too. Training is very important. Use simpler language to lower graders.”
5. Discussion

In this study, group PA was found to be a generally successful social process of learning to assess and assessing to learn, especially for the fifth and sixth graders, demonstrated by the significant correlation between the fifth graders’ and teacher’s ratings, and in the students’ self-reports of the fairness of PA and their ability to assess each other (Q2–Q3 in student survey). The survey and qualitative data indicate that both the instructor and the students found that PA engages students in class, positively affects learning, and empowers students to take control of their learning. Although the learning of English skills is less evident in this one PA trial practice, the students are found to develop assessment skills and, implicitly, presentation skills through observing and reflecting upon their peers’ performances. Although some learners, particularly the fourth graders, appeared to be frustrated, the instructor regarded it as a part of the learning process and used these conflicts as teaching moments. The teacher’s guidance and student interaction play an essential role in this process. These findings accord with the Vygotskian focus on the role of social interaction in learning and the notion of the “zone of proximal development (ZPD)” (Vygotsky, 1978, p. 86), in which a more knowledgeable person provides guidance as learners develop higher levels of understanding. In a social constructivist framework, the teacher as the guide, the peer assessor as “little teacher,” and the student presenter as learning model can work together to make PA a student-centered, interactional learning occasion.

The 86% student agreement that PA increased teacher-student interactions (Q8 in student survey) reflected the teacher’s cultivation of a student-centered classroom and provision of timely guidance such as mini lessons along the way. Her professionalism echoed Mok’s (2010) emphasis on the teacher’s methodological and psychological preparation for PA activities and ability to help alleviate tensions and conflicts. For example, she used her observation that one student gave top ratings to his best friend’s group as a teaching opportunity to remind students to behave like teachers and be fair in their evaluations, which would affect final grades.

About 93% of the students reported that they enjoyed doing PA (Q1 in student survey), suggesting that with further practice PA could be even more successful. Scoring others, which empowered them as “little teachers” in classrooms where teachers still hold sole authority, was the most frequently given reason why they enjoyed it, suggesting that they gained confidence in their ability to assess and were developing a sense of responsibility. Responses to the survey (Q5–Q7) supported the two-way benefits of PA, as assessors learned from observing their peers’ performances while assessees learned from peer feedback. Most relevant to language learning, some students reported that the PA activity motivated them to speak English more in their role plays, which is consistent with Pope’s (2005) finding that being assessed by their peers triggers students to perform more actively.

The group PA discussions were conducted in students’ L1 due to their low English proficiency, so they would not be inhibited from expressing themselves because of language barriers. The dialogue produced with group members served as a catalyst for their learning as English learners as well as assessors. As Wells (2007) argued, dialogue is the most powerful way of mediating activities that facilitate the co-construction of knowledge, a view in keeping with the social constructivist claim that learning occurs through interaction, conversation, and experimentation (Mooney, 2000). Group PA in particular fosters a linguistically rich environment with meaningful conversation and purposeful peer-to-peer interaction, in contrast with teacher-centered classrooms in which students are unlikely to have sufficient opportunities to speak. Learners could compare and check their understandings of English language and presentation skills used in role plays through discussions in group PA. The process also enables them to position themselves in relation to the performance of their peers (Smyth, 2004; Topping, 2009). In other words, watching other groups perform undoubtedly helped them when their turn came to perform. Moreover, as seen in Peng’s (2010) study, small group PA creates a psychologically safe space for interpersonal risk-taking, so problems common to adult learners, such as friendship marking and range restriction, can be avoided (van Gennip et al., 2010). The findings in this study suggest that student assessors might be more objective in group PA than individual PA, although further research needs to be done to assess this claim as well as the long-term learning outcome of repeated PA activities.

Nevertheless, the group interaction in collaborative PA should not be idealized (Kollar & Fischer, 2010; Simpson, Mercer, & Majors, 2010). Discussion and collaboration were major challenges for this group of students, most of whom were used to teacher-centered instruction in classes other than English. Although the majority of the students reported that the PA activity increased their participation and interaction, in responding to the open-ended questions, 25 students expressed discomfort when their classmates had different opinions about ratings, and 32 complained about some classmates’ lack of participation in group discussions. One student, unused to making a case for his opinion, was observed to withdraw into silence or chat off-topic when the group did not accept his idea or invite him to elaborate, showing that not all young students may be ready to learn from peers.

Also, peer comments might be expressed or taken as negative criticism rather than constructive feedback. In the student survey, ten students stated that they did not like being criticized, and nine thought that other students gave biased grades. These views were expressed more frequently by fourth graders, who had less experience with group discussion in English classes and therefore less socialization in the dialogic learning that is at the center of the process of group rating (Morita, 2000). To accommodate different levels of readiness, teachers must provide appropriate modeling and scaffolding to raise students’ appreciation of different perspectives and ways to negotiate disagreements. While fourth graders’ level of maturity itself might be a possible reason for the lack of significant correlation between their and their instructor’s ratings, the instructor believed this should not be an inhibiting factor and suggested adjusting the procedure by using simpler language, switching the five-point scale to a three-point scale, and providing descriptors for scale options to help student assessors make decisions.

Constructive feedback is a key element in the process of assessment for learning (Carless, 2007; Kollar & Fischer, 2010). Conversely, inappropriate comments harm the process. It is essential for teachers to model appropriate comments and assist students in learning how to give and take comments in a positive manner. Giving praise is not in itself effective. As Hattie and Timperley (2007) have stated, effective feedback consists of “information relative to a task or performance goal, often in relation to some expected standard, to prior performance, and/or to success or failure on a specific part of the task” (p. 89). Olson et al. (2012) demonstrated the effectiveness of explicit teaching, modeling and instruction of cognitive and meta-cognitive strategies for English language learners in mainstream classes in the U.S. Students can learn to give comments on peers’ oral or written performances using sentence starters such as, “I like ___ because…” or “This could be more effective if ….” Instructors can use think-alouds to demonstrate how, when, and why to use them. These strategies help students give constructive feedback and avoid conflicts. Only when students work together well can they realize the benefits of collaborative learning.

The limitations of this study are those inherent in action research, which features a particular setting and participants and allows for changes in the research design as informed by ongoing practice. These included changes in the rubric from the binary choices of the pilot study to the five-point Likert items in the main study and differences in rubrics among the classes resulting from students’ input into assessment criteria. Although the analysis of correlations between the students’ and teacher’s ratings was based on the mean ratings across the criteria and an analysis of each criterion could not be carried out, this disadvantage
is offset by the value of allowing variations in the rubrics to reflect the needs and desires of the students. These variations have been accounted for in the analysis of the data and do not, we believe, weaken the conclusion concerning the benefits of this aspect of collaborative assessment. Another limitation may be that the training for PA, in which the teacher explained its advantages to the students, might have influenced their later evaluations of the approach. Pleasing the teacher at this age is still typical. What the teacher says and favors very likely matters with these young learners. The students’ positive feedback could possibly have resulted from the observer-expectancy effect, at least in part. Because the research was conducted in actual classrooms, we prioritized providing a positive learning experience and so determined that the risk of bias was acceptable because the students needed clarification as to the purpose, rationale and procedures of the intervention. We also believed that the six-week gap between the training and survey was sufficient time to allow the students’ actual experiences to overshadow the initial statements made by the instructor. Also, multiple data sources were triangulated, minimizing the effects of these limitations. Finally, because teacher action research frequently addresses a question having to do with the impact of a particular intervention or policy (Lytle & Cochran-Smith, 1992) on a particular situation, results must be applied to other settings with caution.

6. Conclusion

This study explored the potential benefits and challenges of implementing PA of oral performance with young EFL learners who have low-intermediate or beginner proficiency levels. Little research has focused on the use of PA with younger learners. We found that group PA of oral performance can be an engaging and pedagogically sound approach to language learning and assessment, with some modifications of PA procedures recommended for younger learners. The Taiwanese students in this study responded positively to engaging in this assessment activity with their peers. Participating in PA activities in small groups helped them engage without being responsible for assessment decisions individually. Teachers can support these activities by helping students to facilitate their group discussion and by providing thorough training for students before beginning the activity. This study has demonstrated that PA of oral performance deserves greater research attention, not only as a formative assessment approach but also as a way to provide oral language practice and help students develop interactive and self-monitoring skills.

Acknowledgments

I thank Professor Peter Van Petegem and two anonymous reviewers for their constructive comments on an earlier draft of this paper.

References

Bogdan, R. C., & Biklen, S. K. (1998). Qualitative research for education: An introduction to theory and methods (3rd ed.). Boston, MA: Allyn and Bacon.
Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society. Studies in Continuing Education, 22, 151–167.
Bryant, D. A., & Carless, D. R. (2010). Peer assessment in a test-dominated setting: Empowering, boring, or facilitating examination preparation? Educational Research for Policy and Practice, 9, 3–13. http://dx.doi.org/10.1007/s10671-009-9077-2.
Butler, Y. G., & Zeng, W. (2014). Young foreign language learners’ interactions during task-based paired assessments. Language Assessment Quarterly, 11, 45–75. http://dx.doi.org/10.1080/15434303.2013.869814.
Butler, Y. G. (2011). The implementation of communicative and task-based language teaching in the Asia-Pacific region. Annual Review of Applied Linguistics, 31, 36–57. http://dx.doi.org/10.1017/S0267190511000122.
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical implications. Innovations in Education and Teaching International, 44, 57–66.
Chen, Y.-M. (2006). Peer and self-assessment for English oral performance: A study of reliability and learning benefits. English Teaching & Learning, 30(4), 1–22.
Cheng, W., & Warren, M. (2005). Peer assessment of language proficiency. Language Testing, 22, 93–121. http://dx.doi.org/10.1191/0265532205lt298oa.
Chern, C. (2002). English language teaching in Taiwan today. Asia-Pacific Journal of Education, 22, 97–105. http://dx.doi.org/10.1080/0218879020220209.
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement. Boston: McGraw Hill.
Creswell, J. W. (2008). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (3rd ed.). Upper Saddle River, NJ: Pearson/Merrill Prentice Hall.
De Grez, L., Valcke, M., & Roozen, I. (2012). How effective are self- and peer assessment of oral presentation skills compared with teachers' assessments? Active Learning in Higher Education, 13, 129–142. http://dx.doi.org/10.1177/1469787412441284.
Denzin, N. K., & Lincoln, Y. (2000). Qualitative research. Thousand Oaks: Sage.
Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70, 287–322. http://dx.doi.org/10.3102/00346543070003287.
Falchikov, N. (2001). Learning together: Peer tutoring in higher education. London: Routledge Falmer.
Guba, E. G., & Lincoln, Y. (1989). Fourth generation evaluation. Newbury Park, CA: Sage.
Harding, L. (2014). Communicative language testing: Current issues and future research. Language Assessment Quarterly, 11, 186–197. http://dx.doi.org/10.1080/15434303.2014.895829.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77, 81–112. http://dx.doi.org/10.3102/003465430298487.
Huang, H.-T. D., Hung, S.-T. A., & Plakans, L. (2016). Topical knowledge in L2 speaking assessment: Comparing independent and integrated speaking test tasks. Language Testing, 35, 27–49. http://dx.doi.org/10.1177/0265532216677106.
Huang, S.-C. (2016). Understanding learners’ self-assessment and self-feedback on their foreign language speaking performance. Assessment & Evaluation in Higher Education, 41, 803–820. http://dx.doi.org/10.1080/02602938.2015.1042426.
Hughes, I. E., & Large, B. J. (1993). Staff and peer-group assessment of oral communication skills. Studies in Higher Education, 18, 379–385.
Hung, Y.-J., Samuelson, B. L., & Chen, S.-C. (2016). Relationships between peer- and self-assessment and teacher assessment of young EFL learners’ oral presentations. In M. Nikolov (Ed.). Assessing young learners of English: Global and local perspectives (pp. 317–328). New York: Springer.
Johnson, R. B., & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14–26. http://dx.doi.org/10.3102/0013189x033007014.
Kaufman, D. (2004). Constructivist issues in language learning and teaching. Annual Review of Applied Linguistics, 24, 303–319. http://dx.doi.org/10.1017/S0267190504000121.
Knapp, K., Seidlhofer, B., & Widdowson, H. G. (2009). Handbook of foreign language communication and learning. New York: Mouton de Gruyter.
Kollar, I., & Fischer, F. (2010). Peer assessment as collaborative learning: A cognitive perspective. Learning and Instruction, 20, 344–348. http://dx.doi.org/10.1016/j.learninstruc.2009.08.005.
Kwan, K.-P., & Leung, R. (1996). Tutor versus peer group assessment of student performance in a simulation training exercise. Assessment & Evaluation in Higher Education, 21, 205–214. http://dx.doi.org/10.1080/0260293960210301.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Langan, A. M., Wheater, C. P., Shaw, E. M., Haines, B. J., Cullen, W. R., Boyle, J. C., ... Preziosi, R. F. (2005). Peer assessment of oral presentations: Effects of student gender, university affiliation and participation in the development of assessment criteria. Assessment & Evaluation in Higher Education, 30, 21–34. http://dx.doi.org/10.1080/0260293042000243878.
Langan, A. M., Shuker, D. M., Cullen, W. R., Penney, D., Preziosi, R. F., & Wheater, C. P. (2008). Relationships between student characteristics and self-, peer and tutor evaluations of oral presentations. Assessment & Evaluation in Higher Education, 33, 179–190. http://dx.doi.org/10.1080/02602930701292498.
Lantolf, J. P. (Ed.). (2000). Sociocultural theory and second language learning. New York: Oxford University Press.
Li, D. (1998). It's always more difficult than you plan and imagine: Teachers' perceived difficulties in introducing the communicative approach in South Korea. TESOL Quarterly, 32, 677–703. http://dx.doi.org/10.2307/3588000.
LoCastro, V. (1996). English language education in Japan. In H. Coleman (Ed.). Society and the language classroom (pp. 148–168). Cambridge: Cambridge University Press.
Lytle, S., & Cochran-Smith, M. (1992). Teacher research as a way of knowing. Harvard Educational Review, 62, 447–474.
Min, H. T. (2006). The effect of trained peer review on EFL students’ revision types and writing quality. Journal of Second Language Writing, 15, 118–141. http://dx.doi.org/10.1016/j.jslw.2006.01.003.
Mok, J. (2010). A case study of students' perceptions of peer assessment in Hong Kong. ELT Journal, 65, 230–239. http://dx.doi.org/10.1093/elt/ccq062.
Mooney, C. G. (2000). Theories of childhood: An introduction to Dewey, Montessori, Erickson, Piaget & Vygotsky. St. Paul, MN: Redleaf Press.
Morita, N. (2000). Discourse socialization through oral classroom activities in a TESL graduate program. TESOL Quarterly, 34, 279–310. http://dx.doi.org/10.2307/3587953.
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Olson, C. B., Kim, J. S., Scarcella, R., Kramer, J., Pearson, M., van Dyk, D. A., ... Land, R. E. (2012). Enhancing the interpretive reading and analytical writing of mainstreamed English learners in secondary school: Results from a randomized field trial using a cognitive strategies approach. American Educational Research Journal, 49, 323–355. http://dx.doi.org/10.3102/0002831212439434.
Palinscar, A. S. (2005). Social constructivist perspectives on teaching and learning. In H.
Daniels (Ed.). An introduction to Vygotsky (pp. 285–314). (2nd ed.). New York: Routledge.
Pan, Y., & Newfields, T. (2011). Teacher and student washback on test preparation evidenced from Taiwan’s English certification exit requirements. International Journal of Pedagogies and Learning, 6, 260–272. http://dx.doi.org/10.5172/ijpl.2011.6.3.260.
Patri, M. (2002). The influence of peer feedback on self- and peer-assessment of oral skills. Language Testing, 19, 109–131. http://dx.doi.org/10.1191/0265532202lt224oa.
Peng, J. (2010). Peer assessment in an EFL context: Attitudes and correlations. Paper presented at the 2008 second language research forum.
Pond, K., Ul-Hag, R., & Wade, W. (1995). Peer review: A precursor to peer assessment. Innovation in Education and Training International, 32, 314–323.
Pope, N. K. L. (2005). The impact of stress in self- and peer assessment. Assessment & Evaluation in Higher Education, 30, 51–63. http://dx.doi.org/10.1080/0260293042000243896.
Riley, S. M. (1995). Peer responses in an ESL writing class: Student interaction and subsequent draft revision (Unpublished Doctoral Dissertation). Tallahassee, FL, USA: Florida State University.
Roskams, T. (1999). Chinese EFL students' attitudes to peer feedback and peer assessment in an extended pairwork setting. RELC Journal, 30, 79–123. http://dx.doi.org/10.1177/003368829903000105.
Saito, H. (2008). EFL classroom PA: Training effects on rating and commenting. Language Testing, 25, 553–581. http://dx.doi.org/10.1177/0265532208094276.
Seale, C., & Silverman, D. (1997). Ensuring rigour in qualitative research. European Journal of Public Health, 7, 379–384. http://dx.doi.org/10.1093/eurpub/7.4.379.
Shen, C., Lin, F., Xu, Y., Guo, W., Guo, F., Chen, M., ... Liu, S. (2001). Enjoy. Tainan, Taiwan: Bureau of Education of Tainan City Government.
Shih, C.-M. (2010). The washback of the general English proficiency test on university policies: A Taiwan case study. Language Assessment Quarterly, 7, 234–254. http://dx.doi.org/10.1080/15434301003664196.
Shimura, M. (2006). Peer- and instructor assessment of oral presentations in Japanese university EFL classrooms: A pilot study. Waseda Global Forum, 3, 99–107.
Simpson, A., Mercer, N., & Majors, Y. (2010). Douglas Barnes revisited: If learning floats on a sea of talk, what kind of talk? And what kind of learning? English Teaching: Practice and Critique, 9(2), 1–6.
Sluijsmans, D., Dochy, F., & Moerkerke, G. (1998). Creating a learning environment by using self-, peer and co-assessment. Learning Environments Research, 1, 292–319.
Smyth, K. (2004). The benefits of students learning about critical evaluation rather than being summatively judged. Assessment & Evaluation in Higher Education, 29, 369–378.
Strijbos, J.-W., & Sluijsmans, D. (2010). Unravelling peer assessment: Methodological, functional, and conceptual developments. Learning and Instruction, 20, 265–269. http://dx.doi.org/10.1016/j.learninstruc.2009.08.002.
Suzuki, M. (2009). The compatibility of L2 learners' assessment of self- and peer revisions of writing with teachers' assessment. TESOL Quarterly, 43, 137–148. http://dx.doi.org/10.1002/j.1545-7249.2009.tb00233.x.
Tashakkori, A., & Teddlie, C. (2010). Sage handbook of mixed methods in social & behavioral research. Los Angeles: SAGE Publications.
Topping, K. J. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68, 249–276. http://dx.doi.org/10.3102/00346543068003249.
Topping, K. J. (2003). Self and peer assessment in school and university: Reliability, validity and utility. In M. Segers, F. Dochy, & E. Cascallar (Eds.). Optimizing new modes of assessment: In search of qualities and standards (pp. 55–87). Dordrecht, The Netherlands: Kluwer Academic.
Topping, K. J. (2009). Peer assessment. Theory into Practice, 48, 20–27. http://dx.doi.org/10.1080/00405840802577569.
Tsai, Y.-C., & Chuang, M.-T. (2013). Fostering revision of argumentative writing through structured peer assessment. Perceptual and Motor Skills, 116, 210–221. http://dx.doi.org/10.2466/10.23.pms.116.1.210-221.
van Gennip, N. A. E., Segers, M. S. R., & Tillema, H. H. (2010). Peer assessment as a collaborative learning activity: The role of interpersonal variables and conceptions. Learning and Instruction, 20, 280–290. http://dx.doi.org/10.1016/j.learninstruc.2009.08.010.
van Zundert, M., Sluijsmans, D., & van Merriënboer, J. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20, 270–279.
Vongpumivitch, V. (2012). English-as-a-foreign-language assessment in Taiwan. Language Assessment Quarterly, 9, 1–10. http://dx.doi.org/10.1080/15434303.2012.649592.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. MIT Press: Cambridge, MA.
Wells, G. (2007). Semiotic mediation: Dialogue and the construction of knowledge. Human Development, 50, 244–274.
White, E. (2009). Student perspectives of peer assessment for learning in a public speaking course. Asian EFL Journal, 33, 1–36.
Wu, J. R. W. (2012). GEPT and English language teaching and testing in Taiwan. Language Assessment Quarterly, 9, 11–25. http://dx.doi.org/10.1080/15434303.2011.553251.
Ye, X., (2001). Alternative assessment, In: Y. Shi, (Ed.), [English teaching and assessment in primary and middle schools]. Ministry of Education, Taipei, Taiwan, 42–73.