
Research Notes

Issue 42/November 2010

A quarterly publication reporting on research, test development and validation

Senior Editor & Editor


Dr Hanan Khalifa & Dr Ivana Vidaković

Editorial Board
Dr Nick Saville, Director, Research and Validation Group, Cambridge ESOL
Roger Johnson, Director, Assessment and Operations Group, Cambridge ESOL

Production Team
Caroline Warren, Research Support Administrator
Rachel Rudge, Production Controller
George Hammond, Design

Printed in the United Kingdom by Océ (UK) Ltd.


CAMBRIDGE ESOL : RESEARCH NOTES : ISSUE 42 / NOVEMBER 2010

Research Notes
Contents
Editorial Notes
Developing a model for investigating the impact of language assessment: Nick Saville
Construct validation of the Reading module of an EAP proficiency test battery: Hanan Khalifa
Comparing proficiency levels in a multi-lingual assessment context: Karen Ashton
Testing financial English: Specificity and appropriacy of purpose in ICFE: Angela Wright
The expression of affect in spoken English: Mark Elliott
Peer–peer interaction in a paired Speaking test: The case of FCE: Evelina D Galaczi
Second language acquisition of dynamic spatial relations: Ivana Vidaković
Demonstrating cognitive validity of IELTS Academic Writing Task 1: Graeme Bridges
Qualification and certainty in L2 writing: A learner corpus study: Sian Morgan
Prompt and rater effects in second language writing performance assessment: Gad S Lim
Computer-based and paper-based writing assessment: A comparative text analysis: Lucy Chambers
A study of the context and cognitive validity of a BEC Vantage Test of Writing: Hugh Bateman
Models of supervision – some considerations: Juliet Wilson
A framework for analysing and comparing CEFR-linked certification exams: Marylin Kies
IRT model fit from different perspectives: Muhammad Naveed Khalid
Conferences and publications

Editorial Notes
Welcome to issue 42 of Research Notes, our quarterly publication reporting on matters
relating to research, test development and validation within Cambridge ESOL.
This special issue of Research Notes shares with the readers summaries of doctoral and
Master’s theses by Cambridge ESOL staff. The issue is organised according to skill area and
domain of interest. It begins with Nick Saville’s paper on an expanded impact model intended
to provide a more effective way of understanding how language examinations impact on
society. In the area of reading, Hanan Khalifa investigates the construct validity of the reading
module of an EAP test battery using qualitative and quantitative research methods. Also using
a mixed-method approach, Karen Ashton compares reading proficiency levels of secondary
school learners of German, Japanese and Urdu, while Angela Wright examines context validity
of the ICFE test of Reading. If your interests lie in the area of speaking, you may want to read
Mark Elliott’s paper on affective factors in oral communication, Evelina Galaczi’s summary of
her thesis on paired test format and Ivana Vidaković’s summary on learning how to express
motion in a second language and factors affecting second language acquisition. In the area of
writing, we would like to introduce to you Graeme Bridges’ paper on cognitive validity of IELTS,
Sian Morgan’s paper on qualification and certainty in L2 writing, Gad Lim’s work on prompt
and rater effect in assessing writing, Lucy Chambers’ summary on comparability issues
between paper-based and computer-based modes of assessment and Hugh Bateman’s work
on context and cognitive validity of a BEC Writing paper. Finally, Juliet Wilson discusses
models of teaching supervision, Marylin Kies proposes a framework for assessing and
comparing examinations linked to the CEFR and Muhammad Naveed Khalid investigates IRT
model fit from a variety of perspectives.
We finish this issue by reporting on the conference season and events Cambridge ESOL
supported. Laura Cope and Tamsin Walker report on the IACAT conference (June 2010) on
computerised adaptive testing. Martin Nuttall describes the ALTE events and Lynda Taylor
provides a brief on the three latest volumes in the SiLT series.

©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.

Developing a model for investigating the impact of language assessment
NICK SAVILLE RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

Introduction
This summary is based on a doctoral thesis submitted to the University of
Bedfordshire (UK) in 2009. Financial support for the PhD was provided by
Cambridge ESOL and it was supervised by Professor Cyril Weir.

The main research question was:
What are the essential components of an action-oriented model of impact that
would enable the providers of high-stakes language examinations to investigate
the impact of their examinations within the educational contexts in which they
are used?

The thesis was based on the premise that there is no comprehensive model of
language test or examination impact and how it might be investigated within
educational contexts by a provider of high-stakes examinations. It, therefore,
addressed the development of such a model from the perspective of Cambridge
ESOL as a provider of English language tests and examinations in over 150
countries.

The starting point was a discussion of examinations within educational
processes generally and the role that examinations boards, such as Cambridge
ESOL, play within educational systems. The historical context and assessment
tradition were an important part of this discussion.

In the literature review, the effects and consequences of language tests and
examinations were discussed with reference to the better known concept of
washback and how impact can be defined as a broader notion operating at both
micro and macro levels. This was contextualised within the assessment
literature on validity theory and the application of innovation theories
within educational systems.

The stance in this work reflected the author’s own interests and
responsibilities in developing a model of impact to guide practice within the
organisation. His voice as participant, reviewer and developer of the impact
model, as well as his relationships with other participants and researchers,
were an important feature of this work and its methodological framework.
Starting in the early 1990s a series of projects were carried out to implement
an approach to impact which had begun to emerge in Cambridge ESOL at that time.

Methodologically, the research was based on a meta-analysis which was employed
in order to describe and review three impact projects. These three projects
had been carried out by researchers based in Cambridge to implement an
approach to test impact which had emerged as part of the test development and
validation procedures adopted by Cambridge ESOL. A differentiating feature
compared with research being conducted elsewhere was the emphasis on actions
and activities which would allow Cambridge ESOL to ‘work for positive impact’
and to avoid negative consequences for test users.

Based on the analysis, the main outcome of the thesis was an expanded model of
impact designed to provide examination providers with a more effective ‘theory
of action’. When applied within Cambridge ESOL, this model allows anticipated
impacts of the English language examinations to be monitored more effectively
and leads to well-motivated improvements to the examination systems. Wider
applications of the model in other assessment contexts were also suggested.

The concept of impact in language assessment
Impact is relatively new in the field of language assessment and has only
fairly recently appeared in the literature as an extension of washback. Both
terms were discussed in the literature review. Broadly speaking, impact is the
superordinate concept covering the effects and consequences of tests and
examinations throughout society, whereas washback is more limited and refers
to the influence of tests and examinations in teaching and learning contexts.

The literature review covered relevant work in applied linguistics, assessment
and education, mainly focusing on a 15-year period up to 2004. The notion of
washback which was developed in the 1990s to take account of changing views of
validity in language testing provided a useful basis for building an expanded
model of impact. Much of the research in the language testing literature,
however, had been small-scale projects and no systematic programme had been
initiated and carried out by staff within a major examination provider.

From washback to impact
The literature review summarised the developments of washback and impact
models starting with Alderson & Wall (1993) and ending with Green’s (2003)
washback model. See for example Cheng, Watanabe & Curtis (2004) for a useful
overview.

The dimensions of the washback models which emerged in the 1990s can be
summarised in the following seven points.

The test features: Surface features of the test were the main focus, for
example item types and formats (e.g. multiple choice). Content validity,
especially in terms of authenticity, had become an important issue. In test
validation (evidence of validity) the unitary concept of validity was
beginning to be adopted, in particular through the influence of Bachman (see
below).


The context: There was one main context which was the focus of attention: the
school and classroom (i.e. the micro context). The test-taking context was
typically not separated from the school context where the teaching and
learning takes place. Although some wider contextual features (macro context)
were starting to be discussed, these were not yet a major focus.

The participants: The main participants were taken to be the teacher and the
learners in the classroom/school context. There was a limited focus on other
participants, such as materials writers, or participants from the wider
context (e.g. parents).

The outcomes: Outcomes were seen as changes attributable to the introduction
of the test: behaviour of participants – actions, activities, performance in
the target language; views and attitudes of participants; decisions to make
changes to the curriculum/syllabus and to develop new materials and methods
(products).
The processes involved in bringing about the outcomes were not well understood
nor well represented in the model. For example, the processes whereby the test
features influenced the content and methods of the teachers were not
understood. Some evidence existed to suggest that content but not the teaching
methodology was affected, but when these effects occurred, how they actually
came about and what factors influenced the strength of the effects was not
included in the model.

The researcher: The washback researcher was typically an academic, not usually
involved in the test development process as a participant, nor as a
participant in the teaching/learning context itself (i.e. an outsider).

The research methods: No clear impact methodology, instrument validation
procedures or validated instruments had been established, but qualitative
methods were emerging in addition to survey techniques for data collection.
The need to problematise washback in terms of hypotheses had been recognised.

The timeline: In the washback model, the timeline was implied but not
explicitly focused on. The need for comparative data – before/after – had led
to a focus on time-series designs and an appeal to insights from innovation
theory. Innovation theory, in relation to Wall’s (1999, 2005) work using
Henrichsen’s (1989) hybrid model of diffusion/implementation, suggests that
each period of an educational innovation has its own antecedents, processes
and consequences. The investigation of ‘antecedent conditions’ is Henrichsen’s
version of the baseline study (see also Saville 2003). The consequences,
therefore, are the changes which are brought about as a result of the new
processes which have been introduced.

Cheng (1997, 2005), Green (2003, 2007) and Wall (1999, 2005) looked at
different aspects of washback and had begun to focus more broadly on impact
issues. However, there had been no serious attempt to bring all the features
of impact together within a comprehensive model which would allow the complex
relationships to be examined across broader educational and societal contexts.

Locating impact research within Cambridge ESOL
A fundamental concern in the thesis was how impact-related research can be
integrated into operational processes. For Cambridge ESOL, impact research
needed to combine theoretical substance with practical applications and to
become an integral part of the operational test development and validation
processes.

In placing impact within a validation framework, the work of Bachman was
influential, especially his series of seminars delivered in Cambridge in
1990–1. He was one of the first language testers to discuss impact as a
‘quality’ of a test and suggested that impact should be considered within the
overarching concept of test usefulness (Bachman & Palmer 1996). The
development of ‘useful tests’ involves the balancing of four qualities:
validity, reliability, impact and practicality – the VRIP features as they
became known in Cambridge.

In an internal working paper, Milanovic & Saville (1996) first set out ideas
on an expanded concept of test impact to meet the needs of Cambridge ESOL.
They addressed the question of how examinations can be developed with
appropriate systems in place to monitor and evaluate their impact.

Aware of the work of Hughes (1989) and others (e.g. Bailey 1996) who used
checklists of behaviours to encourage positive washback, Milanovic & Saville
(1996) proposed four maxims to support working practices:

Maxim 1: PLAN
Use a rational and explicit approach to test development
Maxim 2: SUPPORT
Support stakeholders in the testing process
Maxim 3: COMMUNICATE
Provide comprehensive, useful and transparent information
Maxim 4: MONITOR and EVALUATE
Collect all relevant data and analyse as required

The statements were deliberately designed to be short and memorable, to
capture the key principles and what is most relevant, and in so doing to
provide a basis for decision-making and action planning.

Under Maxim 1 there was a requirement to plan effectively and for the
organisation to adopt a rational and explicit model for managing the test
development processes in a cyclical and iterative way. Maxim 2 focused on the
requirement to provide adequate support for the stakeholders involved in the
many processes associated with international examinations. Maxim 3 focused on
the importance of communication and of providing useful and transparent
information to the stakeholders and Maxim 4 on the requirement to collect
relevant data and to carry out analyses as part of the iterative process
model.

By conceptualising impact within VRIP-based validation processes, there was an
explicit attempt to integrate impact research into ongoing procedures for
accumulating validity evidence. The Cambridge perspective on impact was framed
by these considerations and provided the starting point for the model
developed in the thesis.


Locating impact within educational systems
The thesis focused broadly on how impact operates within educational systems
and the literature on educational reform and management of change was
particularly relevant. An understanding of how socio-political change
processes work within education was also considered to be crucial (Fullan
1991).

Several concepts emerged from the literature and were explored:
• a definition of stakeholders and the roles they play in many varied contexts
  where language learning and assessment operate
• a view of educational systems as complex and dynamic in which planned
  innovations are difficult to implement successfully
• an understanding of how change can be anticipated and how change processes
  related to assessment systems can be successfully managed through the agency
  of an examination provider
• the critical importance of the evidence collected as part of the validation
  system and as the basis for claims about validity.

It has been suggested that educational processes take place within complex
dynamic systems with interplay between many sub-systems and ‘cultures’ and
where understanding the roles of stakeholders as participants is a critical
factor (e.g. Fullan 1993, 1999, Thelen & Smith 1994, Van Geert 2007).

The thesis situated the discussion of impact within the work of researchers
who focus on how change can be managed successfully within educational
systems. Figure 1 illustrates macro and micro contexts within society; it
shows how diversity and variation between contexts tend to increase as the
focus moves from the macro context to the multiple micro contexts at the local
level (i.e. schools, classes, groups, individual teachers and learners).

Figure 1: Context in education – A complex dynamic system
[Diagram not reproduced. Labels: macro context – country (culture, politics,
L1, role of L2, model of L2), region (urban/rural, wealthy/poor), community
(demographic make up), sector (public/private), cycle (primary, middle,
upper); micro context and culture – school, class, group, learner and teacher;
individual differences (demographic, socio-psychological, strategic, prior
knowledge/learning); increasing variation.]

Understanding the nature of context within educational systems and the roles
of stakeholders in those contexts are clearly important considerations for an
examination board like Cambridge ESOL (see Saville 2003:60).

Using case studies as meta-data
A range of data collection and analysis techniques needs to be employed in
impact-related research. These were discussed with reference to the literature
on social research. Ways in which quantitative and qualitative approaches can
be effectively combined in mixed-method designs were noted and the validation
of instruments was illustrated.

Three case studies formed the central part of the thesis.
• Case 1 was the survey of the impact of IELTS (the International English
  Language Testing System). This was the starting point for the impact model;
  it set out the conceptualisation of impact and described the design and
  validation of suitable instruments to investigate it, as applied within four
  Impact Projects as part of an ongoing programme of validation following the
  1995 revision. This case included a description of the IELTS development and
  the underlying constructs, the nature of the impact data which was targeted
  and the necessary instrumentation to collect that data. The lessons learned
  were summarised in relation to the developing model and how they informed
  the next phase of development in Case 2.
• Case 2 was the Italian Progetto Lingue 2000 (PL2000) Impact Study. This
  impact study was an application of the original model within a macro
  educational context and described an initial attempt at applying the
  approach within a state educational context, i.e. the Italian state system
  of education and a government reform project intended to improve standards
  of language education at the turn of the 21st century – the Progetto Lingue
  2000. The impact of the reforms generally and the specific role of external
  examinations provided by Cambridge ESOL formed the basis of this case. This
  study provided greater
  focus on the contextual variables and the roles and responsibilities of
  particular stakeholder groups and individuals within the educational system
  (see Hawkey 2006).
• Case 3 was the Florence Learning Gains Project (FLGP). Still within Italy,
  this project built directly on the PL2000 case and was an extension and
  re-application of the model within a single school context (i.e. at the
  micro level). It focused on individual stakeholders in one language teaching
  institution, namely teachers and learners preparing for a range of English
  language examinations at a prestigious language school in Florence. The
  complex relationships between assessment and learning/teaching in a number
  of language classrooms, including the influence of the Cambridge
  examinations, were examined against the wider educational and societal
  milieu in Italy. The micro level of detail, as well as the longitudinal
  nature of the project conducted over an academic year, were particularly
  relevant in this case.

The analysis and discussion in each case study was broadly structured around
the seven features of the washback model which had emerged by the end of the
1990s, as noted above.

The revised model of impact
Insights from the three case studies were assembled into an expanded model;
this meta-framework builds on Milanovic & Saville’s maxims (1996), and
constitutes an action-oriented approach with four inter-related dimensions
(see Figure 2).

Figure 2: Revised model
[Diagram not reproduced. Panels: Stance (perspective of UK examinations board;
influenced by critical realism, contemporary pragmatism); Reconceptualising
impact (taking account of theories of knowledge, socio-cognitive theory,
constructivism, theories of change); Impact by design (impacts, positive and
negative, anticipated in design phase); Impact research (methodology used to
find out what happens; procedural basis for knowing about effects and
consequences); Theory of action (remedial action taken when needed on the
basis of impact evidence); Key considerations (centrality of language
construct and theories of language learning: a socio-cognitive model, learning
understood as change, effective communication; impact research incorporated
into routine validation processes; mixed method designs used with impact
‘toolkit’ to collect quantitative and qualitative data; importance of the
timeline with iterative cycles of review and revisions implemented over time);
Emergent aspects of validity (improved understanding of the meaning of
language assessment in context and of the effects and consequences on systems
and people).]

Dimension 1: re-conceptualise the place and role of impact study within the
assessment enterprise, vis-à-vis societal systems generally and language
education specifically.

The re-conceptualisation of test impact draws on theories in the social
sciences and goes beyond the work in applied linguistics and measurement. It
is based on a 21st century world view and takes into account recent
ontological and epistemological developments.

It extends the epistemological influences which guided Messick and his
predecessors in the development of validity theory in the second half of the
20th century. Messick explicitly referred to the philosophical perspectives of
Leibniz, Locke, Kant, Hegel and Singer, and to the influences of their
rationalism and logical positivism on the nature of scientific enquiry in the
20th century (Messick 1989:30). In moving beyond Messick into the 21st
century, the influence of post-modernism cannot be ignored, but for
examinations boards and language test providers an epistemology which can
provide the basis for action is required.

The ontological approach suggested draws on ‘critical realism’ in the social
sciences (e.g. Sayer 1984, 2000) and contemporary views on pragmatism derived
from the philosophy which originated with Peirce in the late 19th century.
This realist stance underpins the suggested re-conceptualisation of impact and
the other dimensions of the meta-framework:

a. Anticipating and managing change over time is a key aspect of impact
   research, noting the importance of timescales and the timeline (change over
   time, planned and unpredicted) with recurrent cycles (before/during/after).
   The recent educational literature on management
   of innovation suggests mechanisms which can be put in place to anticipate
   and achieve desirable outcomes through change processes. Fullan (1993:19),
   for example, suggests that the solution to achieving productive educational
   change ‘lies in developing better ways of thinking about, and dealing with,
   inherently unpredictable processes’. His work also points up the social
   dimension of education and the relevance of theories of social systems and
   practices to assessment which have also been a focus of attention in
   language testing circles in recent years (e.g. McNamara & Roever 2006).
b. Socio-cognitive theories which place importance on both social and
   cognitive considerations are particularly relevant to the conceptualisation
   of language constructs (e.g. Weir 2005).
   The research methodologies needed to investigate the impact of examinations
   in their socio-cultural contexts indicate that insights from
   socio-cognitive theory might also be helpful in understanding how language
   learning and preparation for examinations takes place in formalised
   learning contexts. The literature on social psychology may also be relevant
   as social psychologists seek to explain human behaviour in terms of the
   interaction between mental state and social context; this is an important
   aspect of impact at the micro level.
c. Constructivism is important for the re-conceptualisation of impact for two
   reasons: first because contemporary approaches to teaching and learning in
   formal contexts now appeal to constructivist theories; second because it
   underpins the research paradigm which is most often appropriate to finding
   out what goes on in contexts of test use, as seen in the case studies.
d. Contemporary theories of knowledge and of language learning need to play a
   more prominent role in the study of impact. For example, from the learner’s
   perspective, affective factors are vital for motivation, and feedback from
   tests that highlights strengths positively tends to lead to better learning
   (assessment for learning).

These considerations are relevant in designing language assessment systems
with learning-oriented objectives, and whether these objectives have been met
is a concern in impact research.

Dimension 2: introduce the concept of ‘impact by design’ into the planning and
operationalisation of language assessments by examination providers.

The concept of ‘impact by design’ is a key feature of the expanded impact
model. This means designing tests which have the potential for positive
impacts, including well-defined focal constructs supported by contemporary
theories of communicative language ability, language acquisition and
assessment (cf. the socio-cognitive model). It takes an ex ante approach to
anticipating the possible consequences of a given policy ‘before the event’.

‘Impact by design’ builds on Messick’s idea (1996) of achieving ‘validity by
design as a basis for washback’. The importance of the rational model of test
development and validation with iterative cycles is a necessary condition for
creating construct-valid tests and for the development of successful systems
to support them.

At the heart of this is the adequate specification of the focal construct
which is crucial for ensuring that the test is appropriate for its purpose and
contexts of use (and to counter the twin threats to validity – construct
under-representation and construct-irrelevant variance – noted by Messick
(1996:252)).

This is a necessary condition for achieving the anticipated outcomes, but it
is not sufficient and only provides the ‘latent potential’ for validity in
use. For Cambridge ESOL, impact by design highlights the importance of
designing and implementing assessment systems which extend the design features
beyond the technical validities related to the construct, and incorporate
considerations explicitly related to the social and educational contexts of
test use.

As time passes following the introduction of an examination, new contexts of
use arise and new users acquire a stake in the examination. As this extension
of ‘ownership’ happens, there is a risk of ‘drift’ away from the original
intentions of the test developers; for example, the intended relationship
between use of test results and the test construct may begin to change over
time due to influences in the wider educational context. The potential for
negative impact is likely to increase when the original construct is no longer
suitable for the decisions which the new users are making. In other words, the
examination is no longer ‘fit for purpose’ and so corrective action of some
kind needs to be taken.

Similarly, consequences – intended and unintended – emerge after the test has
been ‘installed’ into real-life contexts of use which are not uniform and are
constantly changing as a result of localised socio-political and other
factors. The overall validity of an assessment system, therefore, is an
emergent property resulting from a test interacting with contexts over time.

‘Impact by design’ is therefore not strictly about prediction; a more
appropriate term might be ‘anticipation’. In working with stakeholders,
possible impacts on both micro and macro levels can be anticipated as part of
the design and development process. Where negative consequences are
anticipated, potential remedial actions or mitigations can be planned in
advance. So, for example, if ‘construct drift’ is a risk, it can be
anticipated and appropriate tolerances set before test revisions are required.
This approach is congruent with the concept of social impact assessment, a
form of policy-oriented social research.

Dimension 3: re-organise validation procedures to incorporate impact research
into operational activities to provide the basis for knowing about and
understanding how well an assessment system works in practice with regard to
its impact.

It is essential to know what happens when a test is introduced into its
intended contexts of use; this should constitute a long-term validation plan,
as required by the impact by design concept.

Finding out and understanding needs to be a routine
preoccupation within the operational procedures and should be problematised within a research agenda which allows for impact-related research studies to be conducted where appropriate.
The emergentist approach noted above encourages impact researchers to develop an ‘impact toolkit’ of methods and approaches to ‘finding out’ (e.g. to carry out analyses of large-scale aggregated data, as well as micro-analyses of views, attitudes and behaviours in local settings). The quantitative analysis of macro-level group data can capture overall patterns and trends, while the qualitative analysis of multiple single cases enables the impact researchers to monitor variability in local settings and to work with the ‘ecological’ features of context.
While not rejecting experimental methods, an expanded model of impact looks to ‘real world’ research paradigms to provide tools which can shed light on what happens in testing contexts. Constructivist approaches to social research include mixed methods and quasi-experimental designs, as shown in the three cases reviewed in this thesis. Case studies are especially useful for investigating impact at the micro level and for understanding the complexities of interaction between macro-level policies and implementation in local settings. Without such methods it is difficult to find out about and understand how the interaction of differing beliefs and attitudes can lead to consensus or to divergence and diversity.
It is important for examination boards to modify their validation procedures in order to collect, store and access the necessary data and greater attention should be given to the planning and resourcing for this area of validation.

Dimension 4: develop an appropriate theory of action which enables examination providers to work with stakeholders to achieve the intended objectives, to avoid negative consequences and to take remedial action when necessary.
The ability to change systems to improve educational outcomes or mitigate negative consequences associated with the examinations is ultimately the most important dimension of the model. Anticipating impacts and finding out what happens in practice are not enough if improvements do not occur as a result; a theory of action is therefore required to guide practice.
Examples of theory of action are found in the literature on educational reform and school improvements, especially in the USA. Such examples provide support for the ways in which the four dimensions of the expanded model fit together in practice (e.g. Resnick and Glennan 2002). A theory of action provides planners and practitioners with the capacity to act in social contexts, to determine what needs to be done and when/how to do it. Being prepared to change and to manage change is critical to a theory of action. The challenge for the examination provider is to ‘harness the forces of change’ in order to get the relevant stakeholders working together to achieve better assessment outcomes.
Some of the dilemmas which arise in assessment contexts can only be dealt with if a wide range of stakeholders agrees to manage them in ways which they find acceptable. As Fullan (1999:xx) puts it: ‘Top-down mandates and bottom-up energies need each other.’

Conclusion

The outcome of the thesis is an expanded model which was designed to help Cambridge ESOL and other examination providers to address the challenge of finding out and understanding how their examinations impact on society. Concrete and relevant applications for investigating the impact of language assessment at micro and macro levels within the routine work of the examinations board were also suggested.

References

Alderson, J C and Wall, D (1993) Does washback exist? Applied Linguistics 14, 115–129.
Bachman, L (1990) Fundamental considerations in language testing, Oxford: Oxford University Press.
Bachman, L and Palmer, A (1996) Language Testing in Practice, Cambridge: Cambridge University Press.
Bailey, K M (1996) Working for washback: a review of the washback concept in language testing, Language Testing 13 (3), 257–279.
Cheng, L (1997) The Washback Effect of Public Examination Change on Classroom Teaching: An impact study of the 1996 Hong Kong Certificate of Education in English on the classroom teaching of English in Hong Kong secondary schools, unpublished PhD thesis, University of Hong Kong.
Cheng, L (2005) Changing Language Teaching through Language Testing: A washback study, Cambridge: Cambridge ESOL and Cambridge University Press.
Cheng, L and Watanabe, Y with Curtis, A (Eds) (2004) Washback in language testing: Research contexts and methods, Mahwah, NJ: Lawrence Erlbaum Associates.
Fullan, M (1991) The New Meaning of Educational Change (2nd ed.), London: Cassell.
Fullan, M (1993) Change Forces: Probing the Depths of Educational Reform, London: the Falmer Press.
Fullan, M (1999) Change Forces: The Sequel, London: the Falmer Press.
Green, A (2003) Test Impact and EAP: a comparative study in backwash between IELTS preparation and university pre-sessional courses, unpublished PhD thesis, the University of Surrey at Roehampton.
Green, A (2007) IELTS Washback in Context: Preparation for academic writing in higher education, Cambridge: Cambridge ESOL and Cambridge University Press.
Hawkey, R (2006) The theory and practice of impact studies: Messages from studies of the IELTS test and Progetto Lingue 2000, Cambridge: Cambridge ESOL/Cambridge University Press.
Henrichsen, L E (1989) Diffusion of innovations in English language teaching: The ELEC effort in Japan, 1956–1968, New York: Greenwood Press.
Hughes, A (1989) Testing for language teachers, Cambridge: Cambridge University Press.
McNamara, T and Roever, C (2006) Language Testing: the Social Dimension, Oxford: Blackwell.
Messick, S (1989) Validity, in Linn, R L (Ed) Educational measurement (3rd ed), New York: Macmillan, 13–103.
Messick, S (1996) Validity and washback in language testing, Language Testing 13 (3), 241–256.
Milanovic, M and Saville, N (1996) Considering the Impact of Cambridge EFL Examinations, Manuscript Internal Report, Cambridge: Cambridge ESOL.
Resnick, L B and Glennan, T K (2002) Leadership for learning: A theory of action for urban school districts, in Hightower, A M, Knapp, M S, Marsh, J A and McLaughlin, M W (Eds), School Districts and Instructional Renewal, New York: Teachers College Press, 160–172.
Saville, N (2003) The process of test development and revision within Cambridge EFL, in Weir, C and Milanovic, M (2003) (Eds) Continuity and Innovation: Revising the Cambridge Proficiency in English Examination 1913–2002, Cambridge: Cambridge ESOL/Cambridge University Press.
Sayer, A (1984) Method in Social Science: A Realist Approach, London: Routledge.
Sayer, A (2000) Realism and Social Science, London: Sage.
Thelen, E and Smith, L B (1994) A Dynamic Systems Approach to the Development of Cognition and Action, Cambridge, MA: The MIT Press.
Van Geert, P (2007) Dynamic systems in second language learning: Some general methodological reflections, Bilingualism: Language and Cognition 10, 47–49.
Wall, D (1999) The impact of high-stakes examinations on classroom teaching: a case study using insights from testing and innovation theory, unpublished PhD thesis, Lancaster University.
Wall, D (2005) The Impact of High-Stakes Testing on Classroom Teaching: A Case Study Using Insights from Testing and Innovation Theory, Cambridge: Cambridge ESOL and Cambridge University Press.
Watanabe, Y (1997) The Washback Effects of the Japanese University Entrance Examinations of English – Classroom-based Research, unpublished PhD thesis, University of Lancaster.
Watanabe, Y (2004) Teacher factors mediating washback, in Cheng, L and Watanabe, Y (Eds) with Curtis A, Washback in language testing: Research contexts and methods, Mahwah, NJ: Lawrence Erlbaum Associates, 19–36.
Weir, C J (2005) Language Testing and Validation: An Evidence-based Approach, Basingstoke: Palgrave Macmillan.

Construct validation of the Reading module of an EAP proficiency test battery
HANAN KHALIFA RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This summary is based on a doctoral thesis submitted to the University of Reading (UK) in 1997. The PhD was supervised by Professor Cyril Weir.

Research purpose
The research sought to establish the construct validity of the Reading module of an English for Academic Purposes (EAP) Graduate Proficiency Test (GPT) Battery developed by the ESP Center of Alexandria University in Egypt. It investigated the componential nature of the reading construct and the effect of background knowledge on test performance. Only full consideration of these two issues would substantiate validation of the Reading module.

Research questions
The Reading module of the Egyptian Graduate Proficiency Test Battery (GPT) was intended to measure global and local comprehension. In the study, ‘global comprehension’ refers to understanding propositions at the macro-structure level of the text and ‘local comprehension’ refers to understanding propositions at the micro-structure level. The former is concerned with the relationships between ideas represented in complexes of propositions or paragraphs which tend to be logical or rhetorical (see Vipond 1980), whereas the latter is concerned with the relationships between individual sentences or concepts which tend to be mechanical or syntactical. Reading at the global level involves skimming to establish the gist of the text, search reading to locate information on a pre-determined topic, and careful reading to understand explicitly and implicitly stated main ideas. Reading at the local comprehension level is operationalised by scanning to locate specific information, and reading carefully to infer the meaning of lexical items and identify pronominal referents. Global and local comprehension levels are characterised by two different rates of reading. Operations like skimming, search reading, and scanning require a faster reading rate than those involving careful reading at microlinguistic level. Weir & Urquhart (1998) refer to the former as expeditious reading operations (whereby the reader processes text quickly, selectively and efficiently) while they refer to the latter as slow careful reading operations.
On reviewing empirical evidence provided by product- and process-oriented studies, it became apparent that there is a case for and against the multi-divisible nature of reading. Product-oriented studies like that of Berkoff (1979), Carver (1992), Davis (1968), Guthrie & Kirsch (1987) and process-oriented studies (e.g. Anderson, Bachman, Perkins & Cohen 1991, Cohen 1984, Hosenfeld 1977, Nevo 1989) have provided empirical evidence for the separability of skills. On the other hand, product-oriented studies (e.g. Lunzer, Waite & Dolan 1979, Rosenshine 1980, Rost 1993, Thorndike 1973) and process-oriented studies like that of Alderson (1990a & b) have provided evidence that reading is a single holistic process. What is most significant in all of these studies is the occurrence of vocabulary as a second factor (also referred to as word meaning, verbal reasoning, word knowledge, semantic difficulty).
The contradiction in findings seemed to be due to sample selection and methodology used. First, process-oriented studies researched at that time highlighted the absence of a working definition of the operations used in tests, hence, disagreement among experts on what skill each item tested. Second, most of the product-oriented studies did not take into account the ability to process text quickly, i.e. the tests used do not exhibit a wide coverage of putative
EAP reading operations. Third, most of the studies favouring a unitary concept have been carried out on young learners and in L1 contexts. The case might, therefore, be different if the sample used were adult non-native speakers who are spread out across a range of language proficiencies. It may well be that for such a sample a distinction between lower-order skills and higher-order skills is valid (see Clarke 1980, Eskey & Grabe 1988).
The fact that the Reading module was intended to measure a variety of reading operations, and that some studies provided evidence for the emergence of certain operations as factors separate from a general reading competence one, provided the rationale for the formulation of the first research question.

Research Question 1:
Within each of the three Reading tests of the GPT Reading module are the components of these tests testing different reading operations as claimed by the module designers?

The starting point for the second research question was Weir & Porter’s (1994) suggestion, based on reviewing some empirical data, that tests which include items testing local lower-order skills might discriminate against the micro-linguistically disadvantaged but otherwise competent reader. Similarly, Alderson & Lukmani’s (1989) study has shown that weaker students tended to cope quite well with the text and questions at the global level but this was not matched by their performance on questions focusing on microlinguistic items at the local level. Thus, the researcher set out to investigate whether candidates were disadvantaged by the inclusion of any of the subtests, hence, the formulation of the second research question where group and individual performances are considered.

Research Question 2:
(A) Do groups at different levels of proficiency perform the same across the four components of each test in the GPT Reading module?
(B) Do individuals perform the same across the four components of each test in the GPT Reading module?

The discussion of the nature of the reading construct posed another question: if sub-skills exist, do they interact with other factors such as text organisation or readers’ familiarity with test content? It seemed quite obvious that drawing inferences can be easy when the reader has adequate background knowledge about the topic. When discussing reading comprehension, we cannot discuss just the interaction between the reader and the reading operations, but also the interaction between the reader and the text, in other words, the role of readers’ background knowledge in text comprehension. Several studies (e.g. Alderson & Urquhart 1983, 1985 & 1988, Ausubel 1960, Clapham 1994 & 1996, Erickson & Molloy 1983, Ja’far 1992, Jensen & Hansen 1995, Kattan 1990, Koh 1985, Moy 1975, Peretz & Shoham 1990, Shoham, Peretz & Vorhaus 1987, Tan 1990) have investigated the effect of content familiarity on candidates’ performance in EAP reading tests. Data emerging from these studies gives some tentative indication that there is a relation between candidates’ background knowledge in their academic discipline and their performance on EAP reading comprehension tests.
When developing the specifications for the GPT, designers debated whether there should be separate academic modules for the disciplines involved. Ultimately, they decided that the Reading module would have texts covering three broad academic discipline areas: (1) Arts, Social Sciences, Administrative and Business Studies (ASAB); (2) Sciences (SS); and (3) Dentistry, Medicine, and Health Sciences (DMHS). This decision was based on three views. First, if one were to design discipline-specific modules for all disciplines it would clearly be a very large undertaking. Second, variation within a discipline area inevitably meant that one module was by no means specific for all the candidates doing that module. Third, there is as yet no body of evidence to support EAP testing claims that candidates are disadvantaged if they take a test which is not in the area of their discipline. The grouping of disciplines into three broad areas and classification of candidates accordingly were based on the lists supplied by the Student Affairs Divisions in Alexandria (Egypt) and Reading (UK) universities.
The third research question explored the value of including subject-specific reading tests in EAP testing. What is meant by subject specific is ‘specific to the broad discipline areas’, for example, specific to the area of Science disciplines.

Research Question 3:
Will postgraduate candidates in three broad discipline areas perform better on a Reading Comprehension test whose content is on a topic that is related to their own broad discipline area than on a Reading Comprehension test whose content is on a topic that is related to another broad discipline area, given that the texts are of approximately comparable difficulty?

Studies in ESP testing examined at the time also appeared to suggest that other factors are at play and that these factors seemed to be influencing the results or leading to conflicting results. We could divide these factors into two types: test-related factors, such as sample size, sample linguistic homogeneity, and sample academic level; and text-related factors, such as text specificity, text difficulty, and topic familiarity. Thus, the fourth research question attempted to find out which of these factors contributes most to candidates’ performance on EAP Reading Comprehension tests.

Research Question 4:
Which contributes more to candidates’ EAP reading proficiency scores: topic familiarity, topic/text ease, or L2 proficiency level?

Research methods
Quantitative and qualitative research methods were used to investigate the above research questions. This included: mindmapping, introspection procedures, feedback questionnaires and statistical analysis.

Instruments
To ensure that the reading construct as defined by the test designers was adequately captured by the test items, the items were matched against mindmaps of the text
produced by subject and language experts. This procedure was used to justify the existence of the test items, and to re-categorise the items under four subtests. A panel of language and subject experts was asked to provide mindmaps of the texts and identify key lexical words. They went through the operations the items were supposed to test. A synthesis of information was then collected from the mindmaps. Items which did not feature in this consensus or on which expert judges widely disagreed were marked for possible exclusion from the tests.
The mindmapping procedure was followed by an introspection activity. The first part of this activity was used to establish whether each item measured what it was designed to measure. Another group of language and subject experts and a group of proficient subject students were asked to introspect on what skill(s) they use in answering the items. The second part of the activity consisted of retrospection interviews with subject students. Interviews were conducted to clarify those cases where candidates had arrived at the same response via a process different from the expected one, and to ask why candidates had left an item unanswered or had used more than one skill. The introspection procedure was a way of gaining insights into how readers arrive at their answers and of determining if test items were testing what they claimed to test.
The module was then administered and data was subjected to classical and Rasch analyses. Decisions on which items to exclude or retain depended on the pooling of evidence from three different data sources: meaning and lexical consensus, introspection proforma, and item analysis.
In order to investigate research questions 2(A) and 4, it was necessary to have a common measure of proficiency so that candidates could be placed into language levels. Thus, a vocabulary and grammar test which was part of the Test of English for Educational Purposes (TEEP) (see Weir 1988) was used. Candidates were divided into three levels in accordance with Egyptian universities’ proficiency level requirements for admission to postgraduate courses.
In order to investigate research question 4, two sets of questionnaires were used to find out about text specificity, topic/text ease, and topic familiarity. The subject lecturers’ questionnaire was used to find out how they assessed the specificity, familiarity, and difficulty of the Reading module texts on a 4-point scale (high, medium, low, not at all) according to their knowledge of their students’ level of proficiency and of the discipline knowledge they thought their students might use in answering the items. The term ‘specific’ here was used to indicate how specific the topic was, how specific the vocabulary used in the text, and how specific the non-linear information given in the text were to their postgraduate students. Familiarity was defined in terms of the topic and the rhetorical organisation of the texts and tasks required to answer the test items. Difficulty was seen in terms of language in a text and item difficulty.
The test takers’ feedback questionnaire was used to find out about perceived topic familiarity, and perceived topic ease/test bias. A 3-point scale was used for those items. It should be pointed out that the questionnaires were administered to test takers immediately after they had finished the tests. Since candidates did not take any two tests immediately after each other, there is no reason to believe that in answering the questionnaires candidates were comparing texts.

Participants
Candidates who participated in this research comprised two sub-samples: linguistically heterogeneous and linguistically homogeneous EAP learners. The homogeneous sample consisted of 973 non-native speakers of English registering for postgraduate courses at Alexandria University in Egypt. Candidates here share the same L1 background (i.e. Arabic). They were classified into the three broad discipline areas described above. The heterogeneous sample consisted of 355 non-native speakers of English. These were registering for postgraduate courses at Reading University in England. Candidates in this sample had different L1s (e.g. Chinese, French, Japanese, Danish, Italian, Turkish). Candidates were classified into two broad discipline areas: Arts and Sciences. There is no Medical group in the UK sample since Reading University does not provide courses for candidates in this group.
Forty-five subject lecturers (of near native proficiency in English) who were teaching postgraduates in Alexandria University in Egypt participated in the study. Lecturers in Arts disciplines were teaching at the faculties of Arts, Fine Arts, Commerce and Tourism. Science disciplines lecturers were teaching at Agriculture, Engineering and Science faculties. Lecturers from the Medical disciplines were teaching at the faculties of Dentistry, Medicine, Nursing and Pharmacy. No data was collected from subject lecturers in the UK due to practical constraints.

Results and discussion

Research question 1
In order to investigate the first research question, qualitative data from introspection proforma and retrospection interviews as well as quantitative data from subtests’ inter-correlations and factor analysis were collected from Egyptian and UK pre-sessional samples taking a single test: the Arts Test, the Science Test, or the Medicine Test.
All three tests exhibited low inter-correlations between subtests measuring global and local comprehension, and between subtests requiring expeditious and careful reading. Factor analysis gave an indication that the tests were not operating uni-dimensionally. It showed the consistent presence of at least a second factor. It also appeared to suggest that candidates behave differently on the operations being tested: a clear factor structure showing a distinction between expeditious and careful reading occurred across a range of samples of EAP candidates taking different tests. This is in line with Guthrie & Kirsch’s (1987) and Carver’s (1992) findings that made a case for differentiating between reading to
comprehend explicitly stated ideas and reading to locate specific information.
Similarly, introspection proforma and retrospective interviews indicated that the operations the subject students reported using to answer the test items differed according to the subtest they were answering. For example, in the scanning subtest, students reported rapid inspection of the text; going backward and forward in the text looking for specific words, dates, etc. In contrast, in the reading carefully subtest students reported slow inspection of the text; observance of the linearity and sequencing of the text. They read and reread in order to establish more clearly and accurately the comprehension of main ideas.
On the whole, the answer to this research question is ‘Yes’. Findings from qualitative and quantitative research methods appear to support the test designers’ claim that the tests are measuring separable subskills, and lend support to the argument for the existence of separate reading operations. They, therefore, contradict the oft-expressed view that reading is a unitary construct.

Research question 2
For research question 2, group and individual performances in the linguistically homogeneous and heterogeneous samples in single and paired data sets were looked at. The Grammar Test was used as a measure of candidates’ general language ability and to classify them into high, middle, and low proficiency level groups. Cross-tabulations were used. The intention was to compare the performances of individuals who passed and those who failed in each of the GPT Reading module tests. Research findings provided evidence for significant differential performance on the components of the tests.
In most cases candidates perform better on global items than on local items. This seems to be in line with the findings of Alderson & Lukmani’s (1989) study. Similarly, most of the evidence shows that candidates of different ability levels seem to perform better on items requiring slow careful reading than those requiring expeditious reading. This is in line with Beard (1972) and Weir (1983) whose studies into students’ abilities indicate that ‘for many readers reading quickly and efficiently posed greater problems than reading carefully and efficiently’ (Weir 1998). This draws attention to Weir & Urquhart’s (1998) call for ‘paying attention to expeditious reading strategies in both teaching and testing’. It should be noted that candidates of different proficiency levels performed the worst on scanning, with the low-level groups being the most severely disadvantaged by the inclusion of scanning items in a Reading Comprehension test.
The results of cross-tabulations for individual performances affirmed those reported for the group data. The most interesting finding, however, came from the paired data sets. These showed that, across two tests, not a single individual performed consistently better on local than on global comprehension components, or on expeditious than on slow careful reading components. In contrast, the results showed a number of individuals who consistently passed on global and failed on the local comprehension parts of both tests, and others who consistently passed on slow careful but failed on the expeditious reading parts.
Overall, the findings indicate that candidates perform differentially on the subtests. Some appear to be disadvantaged by the expeditious reading subtests compared to the careful reading ones, while others appear to be disadvantaged by the local comprehension subtests compared to the global ones. In certain individuals, however, the case may be more marked in terms of global and local or expeditious and slow. Individuals vary in their profile of proficiency – where local comprehension might be weaker than global comprehension and expeditious reading weaker than slow reading, for instance. Furthermore, these differences may vary considerably with level of candidates and according to text. It is clear from this data that a serious case can be made for the profiling of abilities in each of the skill operations; otherwise false conclusions may be drawn about candidates’ reading ability.

Research question 3
There seemed to be no straightforward answer to this research question. The findings showed that the evidence is mixed. For the entire test population, no significant difference was observed between the performance of the different discipline groups. Candidates did not seem to either suffer or profit from taking Reading tests in different discipline areas. This finding is compatible with those of Carrell (1983) and Clapham (1993, 1994, 1996).
When looking at group performances in the paired data sets, significant differences were found. Both discipline groups (Arts and Sciences) of the linguistically heterogeneous sample appeared to suffer when taking the Science Test and profit when taking the Arts Test. In contrast, each of the three groups of the linguistically homogeneous sample (Arts, Sciences, Medicine) appeared to be at an advantage when taking the Science Test and at a disadvantage when taking the Arts Test. This picture was confirmed when considering individual performances in the paired data sets.
In considering the findings of this research question, it should be noted that the value of using a homogeneous sample is that candidates share the same L1, similar instructional background, or previous learning experiences, that is, variables that were not controlled for in the heterogeneous sample and might have neutralised the subject effect for this sample. It should also be noted that the texts in the GPT Reading module were selected from academic journals in the appropriate broad discipline areas. They were expected to be appropriate and specific to the relevant Reading module and, therefore, by implication to be unsuitable for or unfamiliar to candidates in other disciplines. However, the evidence provided by this research showed that, in some cases, this is not necessarily the case. One possible explanation could be that studying in one particular discipline area does not mean that candidates are ignorant about other disciplines or unfamiliar with other rhetorical structures. They may well read books and articles in disciplines outside their own academic field.
The findings of the third research question seem to indicate that if there is to be one test catering for
candidates from different disciplines for reasons of practicality, then there is evidence coming out of the heterogeneous sample to suggest the selection of a humanities-based test. However, there is evidence provided by the homogeneous sample to suggest that candidates would suffer if they take a humanities-based test and profit if they take a science-based one. This implies that the argument for ESP testing remained unproven at the time the study was conducted.

Research question 4
In order to investigate this research question, subject lecturers’ and test takers’ views on topic familiarity and on topic/text ease were considered. Here, only the paired data set of the Egyptian sample was used. A multiple linear regression analysis was also conducted to find out what proportion of the total variability in candidates’ scores can be explained by proficiency level, topic familiarity, and topic/text ease, as well as which combination of these variables most accurately predicts candidates’ performance or which do not add much to the prediction. The Grammar Test was used as a way of determining proficiency levels.
There is some evidence to suggest that subject lecturers’ views on topic familiarity, on the whole, seem to be a good indicator of candidates’ test performance. For example, when subject lecturers said their students were more familiar with the Science Test topic than with the Arts Test topic, this was reflected in the students’ better performance on the Science Test. Thus, in selecting texts it seems worthwhile to collect such views in order to reduce test bias by eliminating unfamiliar test topics. There is also some evidence to suggest that test takers’ subjective evaluation of the relative difficulty of a Reading text (as measured by their views on topic/text ease) is not always a good indicator of their actual performance on Reading Comprehension tests. This finding is consistent with Carrell’s suggestion that ‘non-native readers appear not to have good sense of how easy or difficult a text is for them to understand’ (Carrell 1983:183).
The results of multiple regression analysis, which was run separately on the Arts and Science Tests, showed that topic familiarity and proficiency level contributed the most to candidates’ reading proficiency scores. Proficiency level appeared to have a much stronger effect on candidates’ scores on the Arts Test than did topic familiarity, whereas it was the other way round in the Science Test. This seems to imply the existence of text effect. One can only speculate on the reasons, though. It might be possible that candidates resorted to their knowledge of the language because they found the Arts text more difficult than the Science one. On the other hand, it might be possible that candidates found the Science text more specific than the Arts one so they turned to their knowledge of the topic and capitalised on their familiarity with the topic. The contribution of topic/text ease was the least marked. In fact, this variable did not contribute much to the regression equation. These findings are compatible with Mohammed & Swales (1984) and Zuck & Zuck (1984) who found that topic familiarity is often a greater predictor of comprehension ability than are text-based linguistic factors such as syntactic ease.

Conclusion

Separability of reading operations
It seemed that, irrespective of texts or item difficulty, differential performances on the components of the Reading tests occurred. This has implications for teaching and testing EAP reading at least within the Egyptian context.
Firstly, the findings of this study seemed to suggest that training in expeditious reading strategies may still be inadequate in the EAP classroom. It seems that the tendency in these classes is to focus on careful reading, while expeditious reading in the sense of efficient and quick reading seems to be neglected to a large extent. If the aim is to have efficient readers, then the teaching tasks should also include the practice of expeditious reading operations. If students are trained to use various expeditious operations to deal with different reading tasks, they will be able to cope with similar real world reading tasks better. As Weir & Urquhart (1998) suggest, different passages could be used for teaching expeditious and careful reading to make students aware of the flexibility of using different approaches to different texts and different tasks.
Secondly, although expeditious reading operations have been incorporated into reading materials, they have been, to a large extent, overlooked by test designers whose focus at the time of conducting the study was mainly on careful reading. The study, with its literature review and data analyses drawn from various sources, reflected the need to include a subtest forming expeditious reading operations in EAP tests. Similarly, in view of construct validity, test designers can hardly ignore such a need.
Thirdly, the profiling of abilities seemed fairer than reporting results as a composite score. In other words, in a case like the GPT reading tests, the aim should be to produce a profile of: the ability to read expeditiously at the global level, the ability to read expeditiously at the local level, the ability to read carefully at the global level, and the ability to read carefully at the local level.

Number of texts
In terms of task effect, the evidence provided by the present study forces us to accept that, despite rigorous test development procedures, item/component difficulty may vary from one test to another; and that some tests will simply be easier to access than others. This might be due to factors like rhetorical organisation, macro-structure, and so on. The only real solution, therefore, is to develop clearer procedures for identifying these factors, or to use a range of texts for testing each component.
In terms of text effect, the evidence supplied is mixed and it appears that the nature of the research sample is playing an important role. For example, the UK pre-sessional data suggests that if there has to be one text, a humanities-based one would be the least disadvantageous. However, data from the Egyptian homogeneous sample suggests that candidates would be better off with a science text. One can only speculate that this mixed evidence might be due to differences in instructional background. There seems to be a need, therefore, to consider carefully candidates’ previous


education when deciding on the number and nature of texts to be used in an EAP Reading test.
Given the above mixed evidence, it would probably be safer to use a variety of texts from the broad discipline areas. If either one text or more than one is opted for, there should be systematic ways followed in text selection. The next section describes the basis on which texts should be selected.

Selection of texts
In selecting texts, the importance of face validity cannot be ignored if just one academic Reading test is opted for. We cannot ignore what subject lecturers and test takers said regarding text specificity and topic familiarity. In addition, it might be hard to get an approval from university authorities to use a test which seemingly does not look subject specific. The danger also exists that under examination conditions some students might be upset by the apparently unfamiliar material. In turn, they might not do as well as they should have.
On the other hand, if there is a need to create parallel EAP Reading tests, it seems quite impossible to find texts which are similar in terms of specificity, difficulty, and familiarity unless either a general academic text or a very highly specific one is opted for. If the latter is chosen, then the number of candidates who would be sitting for such a test is inevitably limited. In addition, tests which are too specialised may assess subject matter knowledge in a particular field more than the reading ability of the candidates, and thus individuals who happen to have less subject matter knowledge might be discriminated against. Thus one is forced to choose texts which are equally comprehensible for, and generally accessible to candidates in all fields within the broad discipline areas. They should come from an academic source and have an academic nature. The rhetorical structure could be argumentative or Introduction-Methods-Results-Discussion (IMRD), the former being more suitable to humanities-oriented candidates and the latter to scientifically-oriented candidates.
In other words, in developing Reading tests which cater for a large number of candidates, there is a need to ensure that the chosen topic is fairly familiar to all candidates so as to avoid bias caused by topic familiarity. Several texts of different topics might be used to counter-balance the topic-familiarity effect. The level of difficulty of the test should also be taken into account. The texts also would have to be submitted to subject specialists and students to check that no discipline is advantaged over another. These factors appear to be crucial to test designers to get stable, reliable, and meaningful results. Thus what seems to be needed is the development of a mechanism to screen texts for difficulty, familiarity, and specificity.

Triangulation of data sources
In empirically validating the GPT Reading module, information was collected from a variety of sources: experts’ mindmapping consensus, experts’ and subject students’ introspection proforma, subject students’ retrospection interviews, and item statistical analyses. Following such a procedure provided a sound basis for the final version of the tests in the Reading module. The mindmapping consensus eliminated idiosyncrasies that existed in content selection. The introspection proforma appeared to enhance the probability that the required operations were being tested. The retrospection interviews illuminated, to some extent, how the behaviour test items produced may equate with the behaviour identified in the theory-based model.

References
Alderson, J C (1990a) Testing Reading Comprehension Skills (Part One), Reading in a Foreign Language 6 (2), 425–438.
Alderson, J C (1990b) Testing Reading Comprehension Skills: Getting Students to Talk about Taking a Reading Test (Part Two), Reading in a Foreign Language 7 (1), 465–503.
Alderson, J C and Lukmani, Y (1989) Cognition and Reading: Cognitive Levels as Embodied in Test Questions, Reading in a Foreign Language 5 (2), 253–270.
Alderson, J C and Urquhart, A H (1983) The Effect of Students’ Background Discipline on Comprehension: a Pilot Study, in Hughes, A and Porter, D (Eds) Current Developments in Language Testing, London: Academic Press, 121–138.
Alderson, J C and Urquhart, A H (1985) The Effect of Students’ Academic Discipline on Their Performance on ESP Reading Tests, Language Testing 2 (2), 192–204.
Alderson, J C and Urquhart, A H (1988) This Test is Unfair: I’m not an Economist, in Carrell, P L, Devine, J and Eskey, D E (Eds) Interactive Approaches to Second Language Reading, Cambridge: Cambridge University Press, 168–183.
Anderson, N J, Bachman, L, Perkins, K and Cohen, A (1991) An Exploratory Study into the Construct Validity of a Reading Comprehension Test: Triangulation of Data Sources, Language Testing 8 (1), 41–66.
Ausubel, D P (1960) The Use of Advance Organisers in the Learning and Retention of Meaning Material, Journal of Educational Psychology 51, 267–272.
Beard, R (1972) Teaching and Learning in Higher Education, Harmondsworth: Penguin Books Ltd.
Berkoff, N A (1979) Reading Skills in Extended Discourse in English as a Foreign Language, Journal of Research in Reading 2 (2), 95–107.
Carrell, P L (1983) Some Issues in Studying the Role of Schemata or Background Knowledge in Second Language Comprehension, Reading in a Foreign Language 1 (2), 81–92.
Carver, R P (1992) Reading Rate: Theory, Research and Practical Implications, Journal of Reading 36 (2), 84–95.
Clapham, C (1993) Is ESP Justified? in Douglas, D and Chapelle, C (Eds) A New Decade of Language Testing Research, TESOL, 257–271.
Clapham, C (1994) The Effect of Background Knowledge on EAP Reading Test Performance, unpublished PhD thesis, University of Lancaster.
Clapham, C (1996) The Development of IELTS: A Study into the Effect of Background Knowledge on Reading Comprehension, Cambridge: University of Cambridge Local Examinations Syndicate.
Clarke, M A (1980) The Short-circuit Hypothesis of ESL Reading – or When Language Competence Interferes with Reading Performance, The Modern Language Journal 64 (2), 104–109.
Cohen, A D (1984) On Taking Tests: What the Students Report, Language Testing 1 (1), 70–81.
Davis, F B (1968) Research in Comprehension in Reading, Reading Research Quarterly 3 (4), 499–545.


Erickson, M and Molloy, J (1983) ESP Test Development for Engineering Students, in Oller, J W (Ed.) Issues in Language Testing Research, Rowley, Mass.: Newbury House, 280–300.
Eskey, D E and Grabe, W (1988) Interactive Models for Second Language Reading: Perspectives on Instruction, in Carrell, P L, Devine, J and Eskey, D E (Eds) Interactive Approaches to Second Language Reading, Cambridge: Cambridge University Press, 223–236.
Guthrie, J T and Kirsch, I S (1987) Distinctions Between Reading Comprehension and Locating Information in Text, Journal of Educational Psychology 79, 220–228.
Hosenfeld, C (1977) A Preliminary Investigation of the Reading Strategies of Successful and Nonsuccessful Second Language Learners, System 5 (2), 110–123.
Ja’far, W M (1992) The Interactive Effects of Background Knowledge on ESP Reading Comprehension Proficiency Tests, unpublished PhD thesis, University of Reading.
Jensen, C and Hansen, C (1995) The Effect of Prior Knowledge on EAP Listening Test Performance, Language Testing 12 (1), 99–119.
Kattan, J (1990) The Construction and Validation of an EAP Test for Second Year English and Nursing Majors at Bethlehem University, unpublished PhD thesis, University of Lancaster.
Koh, M Y (1985) The Role of Prior Knowledge in Reading Comprehension, Reading in a Foreign Language 3 (1), 375–380.
Lunzer, E, Waite, M and Dolan, T (1979) Comprehension and Comprehension Tests, in Lunzer, E and Gardner, K (Eds) The Effective Use of Reading, London: Heinemann Educational, 37–71.
Mohammed, M A H and Swales, J M (1984) Factors Affecting the Successful Reading of Technical Instructions, Reading in a Foreign Language 2 (2), 206–217.
Moy, R (1975) The Effect of Vocabulary Clues, Content Familiarity and English Proficiency on Cloze Scores, unpublished PhD thesis, University of California.
Nevo, N (1989) Test-taking Strategies on a Multiple-choice Test of Reading Comprehension, Language Testing 6 (2), 199–215.
Peretz, A S and Shoham, M (1990) Testing Reading Comprehension in LSP: Does Topic Familiarity Affect Assessed Difficulty and Actual Performance?, Reading in a Foreign Language 7 (1), 447–455.
Rosenshine, B V (1980) Skill Hierarchies in Reading Comprehension, in Spiro, R J, Bruce, B C and Brewer, W F (Eds) Theoretical Issues in Reading Comprehension, Hillsdale, NJ: Erlbaum, 535–554.
Rost, D H (1993) Assessing the Different Components of Reading Comprehension: Fact or Fiction? Language Testing 10 (1), 79–92.
Shoham, M, Peretz, A S and Vorhaus, R (1987) Reading Comprehension Tests: General or Subject Specific?, System 15 (1), 81–88.
Tan, S H (1990) The Role of Prior Knowledge and Language Proficiency as Predictors of Reading Comprehension among Undergraduates, in de Jong, J H A L and Stevenson, D K (Eds), Individualising the Assessment of Language Abilities, Clevedon, PA: Multilingual Matters, 214–224.
Thorndike, R L (1973) Reading as Reasoning, Reading Research Quarterly 9, 135–147.
Vipond, D (1980) Micro- and Macro-processes in Text Comprehension, Journal of Verbal Learning and Verbal Behaviour 19, 276–296.
Weir, C J (1983) Identifying the Language Problems of Overseas Students in Tertiary Education in the United Kingdom, unpublished PhD thesis, University of London.
Weir, C J (1988) The Specification, Realisation and Validation of an English Language Proficiency Test, ELT Documents: 127, Modern English Publications: The British Council.
Weir, C J (1998) The Testing of Reading in a Second Language, Language Testing & Assessment 7, Kluwer: Dordrecht.
Weir, C J and Porter, D (1994) The Multi-Divisible or Unitary Nature of Reading: The Language Tester between Scylla and Charybdis, Reading in a Foreign Language 10 (2), 1–19.
Weir, C J and Urquhart, A H (1998) Reading in a Second Language: Process and Product, Longman.
Zuck, L V and Zuck, J G (1984) The Main Idea: Specialists and Non-specialist Judgements, in Pugh, A K and Ulijn, J M (Eds), Reading for Professional Purposes: Studies and Practices in Native and Foreign Languages, London: Heinemann Educational, 130–145.
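
As a supplementary illustration of the multiple linear regression reported under Research question 4, the following minimal Python sketch regresses a reading score on three predictors (proficiency level, topic familiarity and topic/text ease) and reports the proportion of variance explained. It is not the analysis used in the study: the function names, the rating scales and all data values are invented for illustration, and ordinary least squares via the normal equations is simply assumed as the fitting method.

```python
# Illustrative sketch only: names and data are hypothetical, not from the study.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, scores):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(scores) / len(scores)
    ss_res = sum((y - f) ** 2 for y, f in zip(scores, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    return beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented records: (grammar score, topic familiarity 1-5, text ease 1-5).
    candidates = [(10, 1, 3), (12, 2, 1), (15, 3, 2), (18, 1, 4),
                  (20, 4, 2), (22, 2, 5), (25, 5, 3), (27, 3, 4)]
    reading_scores = [11, 14, 19, 17, 25, 22, 31, 28]

    _, full_r2 = fit_ols(candidates, reading_scores)
    # Refit without topic/text ease to see how much that predictor adds.
    reduced = [(g, f) for g, f, _ in candidates]
    _, reduced_r2 = fit_ols(reduced, reading_scores)
    print(f"R^2 with all three predictors: {full_r2:.3f}")
    print(f"R^2 without topic/text ease:   {reduced_r2:.3f}")
```

Comparing R² for the full model with R² for a model that omits one predictor gives a rough indication of how much that predictor adds to the prediction, which is the kind of question the study asked of topic/text ease.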

Comparing proficiency levels in a multi-lingual assessment context
KAREN ASHTON RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a doctoral thesis submitted to the Faculty of Education, Cambridge University (UK) in 2008. The research was funded by Cambridge ESOL. The PhD was supervised by Dr Neil Jones and Dr Edith Esch. The PhD research focused on Cambridge ESOL’s Asset Languages assessments.
This mixed-methods PhD explores and compares the reading proficiency of secondary school learners of German, Japanese and Urdu in England with the aim of investigating and shedding light upon the feasibility of relating learners of different languages and contexts to the same framework. This research has important implications within education, particularly given the use of frameworks such as the National Curriculum for Modern Foreign Languages (DfES and QCA 1999) and the increasing use of the Common European Framework of Reference (CEFR hereafter) (Council of Europe 2001) both within England and Europe.
‘Can Do’ statements are commonly used, and are being promoted for wider adoption (see Council of Europe 2008), in educational assessment to describe the level of a learner’s reading proficiency. However, there is no research as to how, or whether, such ‘Can Do’ frameworks can be applied to all languages, particularly non-Latin script or community languages. The majority of research in this area has focused on learners of English, although the few single language research studies undertaken indicate that reading in languages like Japanese and Urdu requires different processing strategies from reading in alphabetic languages


such as German for learners with English as their first language. Existing research has also failed to relate findings to proficiency level, making it impossible to compare findings across studies.
This thesis employed a mixed-methods approach, using self-assessment ‘Can Do’ surveys and think-aloud protocols, to compare the reading proficiency of secondary school learners of German, Japanese and Urdu in England. Findings show that statistically the same three factors best represent learners’ understanding of reading proficiency across all three languages. However, there are also strong differences. For example, the difficulty of script acquisition in Japanese impacts on learners’ understanding of the construct, while learners of both Japanese and Urdu were unable to scan texts in the way learners of German were able to. Urdu learners under-rated their ability, not taking into account the wide range of natural contexts in which they use Urdu outside the classroom. The findings also illustrate how Urdu learners use their spoken knowledge of Urdu as a resource when reading. Finally, this research demonstrates that the construct of reading in the National Curriculum for Modern Foreign Languages is not endorsed by any of the learner groups, which is worrying for language education and assessment within England and raises the need for further research.

References
Council of Europe (2001) Common European Framework of Reference for Languages: Learning, Teaching, Assessment, Cambridge: Cambridge University Press.
Council of Europe (2008) Recommendations of the Committee of Ministers to member states on the use of the Common European Framework of Reference for Languages (CEFR) and the promotion of plurilingualism, Strasbourg, Adopted by the Committee of Ministers on 2 July 2008.
DfES and QCA (1999) The National Curriculum for England: Modern Foreign Languages, London: DfEE/QCA.

Testing financial English: Specificity and appropriacy of purpose in ICFE
ANGELA WRIGHT BUSINESS MANAGEMENT GROUP, CAMBRIDGE ESOL

This short summary is based on a Master’s thesis submitted to Anglia Ruskin University in 2007. The research was funded by Cambridge ESOL.
Developers of tests of languages for specific purposes are faced with the challenge of creating tests which allow for an appropriate interaction between subject knowledge and language ability in relation to the target language use domain. This dissertation was completed while the International Certificate in Financial English (ICFE) was under development and set out to establish the extent to which the Reading paper meets this challenge. The research aimed to establish the degree of specificity of the ICFE Reading paper, to try and identify the characteristics that make it specific, and to find out how appropriate it is as a testing instrument for people working in or intending to work in the financial domain. There were three stages in this research, each comparing ICFE to tests of General and Business English at the same level (CEFR levels B2/C1). In the first stage, a questionnaire was administered to both subject specialists and non-specialists. It was designed to measure the subject specificity and appropriacy of the texts used in ICFE. In the second stage, a questionnaire was given to testing specialists only. It was designed to measure the degree of specificity of various aspects of context validity in ICFE in comparison to Business and General tests. The third stage involved a corpus study which aimed to identify some of the characteristics of the core language of Financial English, by comparing Financial English texts to Business and General English texts. The results taken together suggest that ICFE might be placed nearer the more specific end of the ‘specificity continuum’ than the General and Business English tests, and that although there is considerable fuzziness between Financial and Business English, distinct linguistic differences were found between Financial and General English and the beginning of a core Financial lexis was identified. It was found that the degree of specificity of ICFE made it appropriate as a testing instrument in relation to the target domain. For more details on one of the aspects of this study see Wright (2008).

References
Wright, A (2008) A corpus-informed study of specificity in Financial English: the case of ICFE Reading, Research Notes 31, 16–21.


The expression of affect in spoken English


MARK ELLIOTT ASSESSMENT AND OPERATIONS GROUP, CAMBRIDGE ESOL

This paper is based on a Master’s thesis submitted to King’s College London (UK) in 2008. The thesis was supervised by Susan Maingay and Dr Nick Andon.
When we speak, we do not merely transfer information from one individual to another; we also give expression to a whole range of emotions, attitudes and evaluations. This phenomenon, ‘pervasive, because no text or utterance is ever absolutely free from it [and] elusive, because it may be difficult to say exactly what it is that gives the text or utterance that certain quality’ (Dossena & Jucker 2007:7), is known as affect.
At present, affect tends to sit on the periphery of models of language and language proficiency, treated as an ‘optional overlay of emotion’ (Thompson & Hunston 2000:20) to the expression of ‘core’ informational meaning.
Affect can be broken down into two core areas: emotion and attitudes. Emotion covers feelings such as anger and happiness, while attitudes are an individual’s opinions of the world, formed through predisposition, experience and ideology, and which colour his or her perceptions. Attitudes are realised in language by evaluation (Thompson & Hunston 2000), which essentially consists of good or bad value judgements. Evaluation ‘does not occur in discrete items but can be identified across whole phrases, or units of meaning, and ... is cumulative’ (Hunston 2007:39).
Affect can be expressed towards many different objects. These are most likely to be previous utterances, the proposition being made, agents implicated within the proposition, the listener or the speaker; there could, however, be still more.
Many different resources are employed in the expression of affect, and they interact in complex and sometimes unpredictable ways. To reflect this, this study is grounded in a complex systems view of language (Larsen-Freeman & Cameron 2008). The study considers how different elements of language interact within a specific context to create affective meaning.

Complex systems theory and language
‘Tidy explanations survive as long as all that has to be explained is the meaning of sentences invented by armchair linguists’ (Coates 1990:62).
Coates captures one of the tensions at the heart of applied linguistics. By focusing on small, manageable areas of the language and producing clear, tidy explanations, we can lose sight of the fact that real-life language simply does not behave in this fashion. In reality, the production of meaning is a highly complex process involving the interaction of a variety of components: lexis, grammar, phonology, discourse-level features, paralinguistic and non-verbal features and, crucially, context. Indeed, language exhibits many, if not all, of the properties of a complex dynamic system (Larsen-Freeman & Cameron 2008), and treating it as such provides a suitable framework for investigating the expression of affect.
Complex systems involve a large number of components interacting, often in a non-linear fashion (i.e. when a change in input results in a disproportionate change in output). Complex systems exhibit certain key features. Let us consider these, following Larsen-Freeman & Cameron (2008), with examples of how they relate to language:
1. Heterogeneity of elements and agents: the elements or agents in a complex system are often extremely diverse, and can be processes rather than entities, or even complex subsystems. Although the components may be diverse, they are interconnected – change in one component affects others. Language elements include phonetic and phonological features, lexis, grammar and discourse-level features; agents include users of the language (at an individual level) and society (at a higher level).
2. Dynamics: complex systems are in a permanent state of flux. Change takes place on scales (time) and levels (size): change may occur at the level of the whole system, a subsystem within it, or only a very small part of it. Different levels and scales influence each other upwards and downwards. Languages change on both micro levels (such as the introduction of a new word) and macro levels (such as changes in the formation of tenses), and both over short and long scales.
3. Non-linearity: due to the interconnected nature of the elements in a complex system, change can result which is out of proportion to the external stimulus. An example of this is the famous ‘butterfly effect’ (weather is an example of a complex system). Some language innovations spread rapidly through a language while others are ignored. Similarly, a slight change of intonation could render a completely different interpretation to an utterance.
4. Openness: complex systems are open. They can – and must – take on new elements and energy in order to remain in a state of dynamic stability, where the system is stable but not static or fixed. New words are constantly being created, either to label new developments in society and the world (the source of external energy), or from other languages through ‘borrowed’ words.
5. Adaptation: many complex systems are adaptive, meaning that change in one part of the system leads to change in the system as a whole, as it adapts to the new situation. Although languages are in constant flux, the basic requirement of intelligibility dictates that the language incorporates changes by adapting to new circumstances without losing its overall integrity.
6. The importance of context: context is crucial when considering complex systems – indeed, the context


within which a system operates cannot be considered separate from the system itself; it actually forms a part of the system. For example, no utterance in any language can be fully interpreted without consideration of the context it was uttered in, such as who uttered it, to whom and in what situation.
7. Constructions: construction grammar (Goldberg 2003) provides a model of grammar which is consistent with complex systems theory, and within which we shall frame this study. Constructions range from morphemes through words and chunks, up to abstract grammatical structures. Constructions carry inherent semantic or discoursal functions, rather than being ‘empty’ syntactic shells for meaning-carrying words. These semantic meanings can change over time – for example the be going to construction originally only denoted movement: I’m going to the shops (literally), but developed its present future meaning, as in: I’m going to buy some bread there (Perez 1990).

Discourse and complex systems
We try to understand language in use ‘by looking at what the speaker says against the background of what he might have said but did not, as an actual in the environment of a potential’ (Halliday 1978:52). This Systemic Functional viewpoint is echoed in a complex systems approach, where discourse is ‘action in complex dynamic systems nested around the microgenetic moment of language using’ (Larsen-Freeman & Cameron 2008:163). Individuals adapt their utterances to take into account all relevant contextual features.
In discourse, different scales and levels interact to create complex systems phenomena we have already encountered: self-organisation (the progression of the discourse), emergence (of meaning and new semiotic entities within the discourse) and reciprocal causality (between the interlocutors, and between the speakers and the discourse itself). The expression of affective meaning can be viewed as an emergent phenomenon from the interaction of the elements and agents of the complex system of discourse.

Affective resources
Speakers use a range of resources within the language to create affective meaning: lexis, grammar, phonology, discourse-level features and context. We will term these affective resources, and consider them in turn.

Lexis
Individual lexemes
Some words and phrases serve purely affective functions; brilliant, for example, has no ideational meaning beyond the evaluative. However, the affective meaning of an utterance is not determined by lexis alone. The utterance That was brilliant could convey its ‘natural’ semantic meaning, but in a different context and with sarcastic intonation, it could also convey precisely the opposite meaning. As Vološinov (1986:68) notes regarding the malleability of language: ‘What is important for the speaker about a linguistic form is not that it is always a stable and self-equivalent signal, but that it is an always changeable and adaptable sign.’ This is not to negate the importance of lexis, but merely to underline that it is one of several affective resources employed in an utterance; this holds true of all affective resources. In analysing a text, we need to consider the interaction of the affective resources.
There are other lexemes which encode ideational meanings whilst also expressing an affective connotation; these often exist in apposition to more affectively neutral alternatives. For example, the words dog, doggie, cur and mutt all have the same ideational referent, but encode rather different affective connotations.

Semantic prosody
A form of connotation can exist at another level through semantic prosody – how ‘a given word or phrase may occur most frequently in the context of other words or phrases which are predominantly positive or negative in their evaluative orientation’ (Channell 2000:38). In this way, connotations of collocants are ‘inherited’ by the word or phrase, often lending them an affective meaning which can develop across a text or texts. Corpus analysis of semantic prosodies has produced some interesting, not always intuitive, results – the phrase par for the course, for example, almost exclusively appears in cases of negative evaluation, so although it may not directly encode a negative connotation, it carries a negative semantic prosody (ibid.).

Grammar
Affective constructions
Wierzbicka (1987) argues that certain constructions encode specific affective meanings that cannot be accounted for by reference to conversational implicature alone. I will term such constructions, which encode an affective meaning either instead of or in addition to an ideational meaning, affective constructions. A simple example of an affective construction is the What’s X doing Y? construction which expresses incongruity, e.g. What’s this scratch doing on the table? (Kay & Fillmore 1999).
Other constructions, particularly focusing constructions, may contribute to the expression of affect indirectly. For example, non-defining which-clauses, particularly continuative ones, have been shown to encode an evaluative function in the majority of cases (Tao & McCarthy 2001). The use of such marked forms may be considered a case of grammatical metaphor (see below).

Grammatical metaphor
‘A meaning may be realised by a selection of words that is different from that which is in some sense typical or unmarked. From this end, metaphor is variation in the expression of meanings’ (Halliday 1994:341).
Halliday’s concept of grammatical metaphor, analogous to the concept of lexical metaphor, holds that grammatical choices are made in the production of any utterance, and that such choices are meaningful. Halliday uses the term

©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.

congruent to describe typical or unmarked forms – a congruent form can be viewed as ‘the one that is most functionally transparent or motivated’ (Veltman 2003:321). Grammatical metaphor can encode affective meaning; when a speaker employs an incongruent form which does not encode any additional ideational meaning, an affective motivation is likely to be inferred.

Semantic prosody – collostructions
The concept of collocation can be extended to constructions as collostructions (Stefanowitsch & Gries 2003) by considering the strength of attraction between a construction and its collexemes (lexis which appears in slots within the construction). Collostructional analysis shows that the concept of semantic prosody, by extension, also applies to constructions; for example, collostructional analysis of the construction N waiting to happen shows that it features strong negative lexical association, overwhelmingly favouring accident and disaster as collexemes (ibid.).

Features of spoken grammar
Spoken grammar differs from that of the written language and some of these differences have a bearing on the expression of affective meaning. For example, subject ellipsis, a feature of informal spoken English, frequently encodes affective meanings (Nariyama 2006). An elided utterance has a more subjective, evaluative nature (Zwicky 2005), as illustrated by the first sentence below:

Odd that Mary never showed up.
It is odd that Mary never showed up.

Similarly, the flexible word order of spoken English often serves evaluative functions. Carter and McCarthy (1995:151) note that tails (right-dislocated phrases) tend to occur ‘with phatic, interpersonal functions, usually in contexts of attitudes and evaluations’, for example: ‘Good winter wine that’.

Phonology and prosody
Phonemic modification
At the smallest phonological level, the modification of individual phonemes contributes to affective meaning. On a global level, anger (or heavily negative evaluation) increases the accuracy of articulation, while sadness reduces it (Kienast, Paeschke & Sendlmeier 1999).
Vowel duration also seems to be influenced, with happiness producing a particular lengthening effect on (stressed) vowels, followed by sadness and anger (a slight lengthening effect); conversely, fear produces a shortening effect (Kienast et al 1999).
Consonants are also modified when expressing emotions and strong attitudes. For example, a link between plosive and fricative sounds and the expression of affect, in particular aggression, has been noted (Walsh 1968) – ‘spitting out’ or ‘hissing’ words. A similar effect on the duration of voiced fricatives to that on stressed vowels has been observed, although in this case anger tends to cause a slight shortening (ibid.).

Intonation
It is notoriously difficult to establish any concrete rules regarding the use of intonation for affective purposes; although a relationship between intonation and affective meaning clearly exists, different speakers have their own ways of exploiting intonation patterns to produce affective results (Jenkins 2000). There appear to be norms at some level, however, although such norms vary from dialect to dialect (Tarone 1973).

Voice quality
The quality of a speaker’s voice – whether it is neutral, tense, breathy, whispery, harsh or creaky – is an important contributor to affective meaning. Again, the processes at work are complex, and voice quality combines with other phonological and prosodic features such as speech rate to create overall effect (Gobl & Chasaide 2003).

Other prosodic features
Marked stress, pauses and other features, including those outlined above, combine to create phonological metaphor, which operates in a similar manner to grammatical metaphor (Veltman 2003).

Discourse-level features
Presupposition
Beyond what is directly said in a text lies a whole set of presuppositions, which together form a presupposed world, in which ‘the narrator has given form to an idea of what an agent and an action are, and of what an expected succession of events is’ (Marsen 2006:261). Within the presupposed world, identities are ascribed to agents by means of presupposition and relationships between agents and entities are constructed. These identities and relationships can provide the key to discovering the evaluative message of a text.

Implicature
Lexical choices (e.g. young versus old) reveal evaluative judgements; such choices are motivated, and imply ‘an association between these signs of identity and the actions that are ascribed to the agent’ (Marsen 2006:254). For example, an utterance such as ‘gangs of black youths were mugging elderly white women’ (Mumford & Power 2003:206) implies a connection between the identity of the agents as black and youths and their action of mugging.

Conversational implicature
Grice’s (1975) Co-operative Principle, with its maxims, can explain much ‘unstated’ evaluation. Grice posits a set of unwritten conversation rules, or maxims, under the headings of quantity, quality, relation and manner. When a speaker flouts a maxim, the listener must deduce the reason for the speaker’s flouting of the maxim – this is a conversational implicature. Such conversational implicata are often attitudinal or affective.
One feature of conversational implicata is that they avoid direct expression of the speaker’s position and are therefore more difficult to challenge: ‘conversational implicata are not part of the meaning of the expressions to the employment of which they attach’ (ibid:58).


Dialogism
As speakers (or writers) are aware of their position in an ongoing dialogue, they position themselves with respect to previous statements and anticipate future responses. This process is known as dialogism (Vološinov 1986).
The degree to which speakers acknowledge the validity of differing viewpoints (heteroglossia) or refuse to acknowledge them at all (monoglossia) itself expresses affective meaning (Martin & White 2005), and increases or decreases the interpersonal cost of challenging a position (ibid.).

Context
Sociolinguistic considerations
The expression of affect is not a sociolinguistic phenomenon. Sociolinguistics describes how external sociological factors influence and constrain language; affect, on the other hand, is intensely personal and internal. However, sociolinguistic factors constitute a key element in determining how affect is encoded, and how its expression is interpreted.
In many formal contexts, for example a business meeting, it is not considered appropriate to behave in an overtly emotional manner, so the affective resources at a speaker’s disposal are circumscribed. However, this does not mean that speakers do not express attitudes; rather that the ‘rules of the game’ change. The result is to amplify the affective resources used – what would be considered mild in another context would be interpreted more strongly. Conversely, a group of young British men talking in a pub will often use strongly affective language without encoding a particularly strong affective meaning, and will interpret each other’s utterances accordingly.
Other sociolinguistic and contextual factors – relationships in terms of familiarity, age, gender and power – also affect the nature of the expression and interpretation of affective utterances. Thus sociolinguistic and contextual factors act as ‘filters’ in the expression and interpretation of affective judgements, as illustrated in Figure 1.

Figure 1: Sociolinguistic and contextual filters in the expression of affect
[Diagram labels, top to bottom: Speaker’s affective judgement; Sociolinguistic and contextual factors (Business meeting: less use of affective resources; Friends in pub: more use of affective resources); Realisation of speaker’s affective judgement; Listener’s awareness of sociolinguistic and contextual factors; Listener’s interpretation of speaker’s affective judgement]

‘The history of a sentence’
Another important aspect of context is what Halliday (2003) described as the history of a sentence. A sentence can be placed in a historical context from different aspects.
Intratextual history refers to the placing of the sentence in relation to the progression of the discourse as a whole. Schematic nuances are developed, and ideational meanings previously expressed create a framework within which the sentence is interpreted.
Development history is ‘the prior semiotic experience of those who enact it, as performers or receivers’ (ibid:365). Development history can refer to the experience of an individual, a group or even all of humankind, and is the process by which many words and phrases develop affective connotations over time according to their usage within a particular speech community.

Other contextual features
Perhaps the most important factor in determining the type of affective resources deployed in an exchange will be the personalities of the agents involved. Different people express themselves differently, with more or less affective expression, or with a tendency to use more positive or negative expression than others; equally importantly, some people will adapt their utterances more according to the personality and behaviour of the other participant(s) in the exchange, or conform more to sociolinguistic norms, than others. An understanding of the nature of the participants is therefore important for a reliable analysis.
The mode of the interaction will have effects. A telephone call will require different resources from a one-to-one conversation over a cup of coffee, due to the relative availability of non-verbal resources such as gestures and facial expressions.

Methodology
The data was analysed in terms of the affective resources discussed above and how they interact to produce the affective meanings expressed in the text. The discussion presented here is summarised and narrow in scope; it does not refer to all the resources employed. For a fuller discussion, see Elliott (2008).

Context and medium
The data is taken from a BBC current affairs radio phone-in programme from 2007, featuring questions to Nick Clegg


MP, leader of the British political party the Liberal Democrats (prior to his becoming Deputy Prime Minister). The sample features a question from a female caller about immigration policy, specifically whether Clegg would ‘close the borders’ of the UK.
The interaction patterns within the sample are complex. While the speaker is ostensibly addressing Clegg with her question, she has another audience – the radio listeners. Indeed, it could be argued that the listeners are her primary audience, since the speaker’s motivation for phoning in to such a programme seems to be to make a point rather than to make a genuine enquiry of Mr Clegg.
The medium of a radio phone-in affects the exchange. The lack of visual contact prevents the use of non-verbal communication, which means that the language itself carries all the affective meaning.
From a sociolinguistic perspective, the setting of a radio phone-in, and the position of Clegg as a senior politician, are likely to have the following effects:
• the dual audience means there are two sets of sociolinguistic norms at play – those between the speakers and the radio audience, and those between the speakers and each other
• the ‘exposed’ nature of the discussion, conducted in such a public forum, is likely to lead to circumspection, since the speakers will not want to appear unreasonable.

Agents
Nick Clegg has been an MEP, Liberal Democrat spokesperson for Europe (2005–06) and Home Affairs spokesperson (2006–07). In the past, he has described the issue of immigration as ‘the dog-pit of British politics – a place only the political rottweilers are happy to enter’ and argued for a ‘liberal managed immigration system’ (Clegg 2007). The caller is Mary, a woman from Coventry. The programme was hosted by Victoria Derbyshire, a BBC Radio presenter.

Discussion
The analysis focused on the following extended turn by the caller, although the previous (and subsequent) parts of the discussion were also considered.

‘Um … We have open borders within Europe. Millions of people can come in here potentially. Um … (unclear) I want to ask you, when did you, or any of the other two leaders, ask the people of this country if this is what they want? It’s not your country. Will you close the borders within Europe if we find that we are totally swamped? Our culture and our way of life have changed beyond belief, people are scared of the extent of immigration, I believe one in four in Boston, Lincolnshire is an immigrant. Would you close our borders to people from Europe, let alone the rest of the world, if the people of this country became so distressed at … you know, I just want to know – would you close the borders, or are you so keen on Europe that you don’t care how many people come here?’

The text reveals multiple objects of evaluation:
• Immigration and immigrants. Immigrants are subdivided into those from Europe and those from the rest of the world.
• ‘The people of this country’ (or just ‘people’), consistently positioned as victims in the text: ‘people are scared of the extent of immigration’; ‘when did you … ask the people of this country if this is what they want?’ The speaker positions herself with this group, which includes the listeners.
• Politicians – specifically party leaders, and in particular Clegg himself, consistently evaluated negatively.

Throughout her discourse, the speaker employs bare assertions – statements with no hedging employed – to create a strongly monoglossic feel, not acknowledging any alternative viewpoints. The evaluation builds through the text; we will consider two utterances of particular interest in depth (for full analysis, see Elliott 2008).

Utterance 1:
‘Will you close the borders within Europe if we find that we are totally swamped? Our culture and our way of life have changed beyond belief, people are scared of the extent of immigration …’

The speaker uses the strongly negative term ‘swamped’, which has an interesting developmental history. It has a particular resonance in British political discourse on immigration – Margaret Thatcher was accused of racism when she used the term in 1979, and further controversy was caused in 2002 by the then Home Secretary David Blunkett’s use of the term. The term ‘swamped’ is so loaded as to create a qualitatively different feel to the discourse in affective terms. ‘Beyond belief’ serves a similarly strong role.

Utterance 2:
‘Would you close our borders to people from Europe, let alone the rest of the world, if the people of this country became so distressed at …’

Use of the ‘let alone’ construction posits a scalar relationship between Europe and the rest of the world (Fillmore, Kay & O’Connor 1988), which would naturally be interpreted in terms of the relative desirability of immigration from the two parts of the world; this scalar relationship is reinforced by marked stress and intonation accorded to both ‘let alone’ and ‘rest’.
Here, ‘so’ is heavily marked, with marked stress, a markedly low fall, heavy sibilance on the consonant /s/ and an elongated diphthong /əu/, conveying an impression of anger (Kienast, Paeschke & Sendlmeier 1999, Walsh 1968).
The utterance is left unfinished, which naturally raises the question of how it would finish; grammatically, completion with a that-clause to create a cause-and-effect relationship is suggested. We can only speculate as to what the unexpressed effect would be, but we can note the following:
• The cause ‘if the people ... became so distressed at ...’ evokes a fairly extreme set of circumstances, which naturalises an expectation that the response would be proportionally strong.
• The impression of an extreme response from the British people is reinforced by the fact that the utterance remains unfinished. After producing some strong, direct statements, the speaker feels unable to articulate these


consequences. She then appears to backtrack – ‘you know, I just want to know …’ – suggesting a reasonable position on the part of the speaker, especially with the use of ‘just’ (with a low intonational fall).

We cannot know how the speaker intended to complete the utterance, but what is important is the interpretation that the unfinished utterance, in conjunction with previous utterances, naturalises – the perceived attitude. This seems to be that the consequences of the people of Britain becoming so distressed are rather dark – too dark to be spelled out on a radio programme.
As can be seen, the utterances need to be considered in the light of the full text, plus surrounding turns and the wider context, to realise how the interaction of the different affective resources creates the full evaluative effect.

Global overview
• The use of noun phrases (‘the people of this country’, ‘people’) and pronouns (‘we’, ‘our’) throughout to position the people of Britain as victims of both immigration and the politicians Mary holds responsible. The use of the noun phrase ‘the people of this country’ is interesting; concordance analysis shows that it almost exclusively occurs in political rhetoric, and that it carries a strong positive semantic prosody (Elliott 2008).
• The repeated use of bare assertions (often in conjunction with subjective statements) lends a monoglossic feel to the whole turn: the speaker does not acknowledge alternatives. This is reinforced by (phonologically) prosodic features such as a rapid speech rate for such utterances and low final falls in intonation.
• The evaluation builds throughout the turn, reaching a peak with the unfinished utterance, as the layers of evaluation interact to reinforce each other and amplify the effect.
• The complex interaction patterns and multiple audiences have an effect on the speaker as she attempts to tailor her message to the different audiences and conform to different sociolinguistic norms simultaneously (it may have been an inability to reconcile these with the intended message that led the speaker to abort the utterance).

What is particularly striking is how different affective resources interact to produce the overall effect, and how the evaluation is dependent on previous utterances (and previous texts, as in the case of ‘swamped’). An analysis focusing on only one or two of these areas, or on individual utterances in isolation, would not be able to account fully for the extremely strong affective meaning expressed throughout.

Conclusions
We have seen that different elements of language combine to create affective meaning in a highly interrelated manner, but that some individual elements can create a particularly strong effect which reverberates throughout the whole text. Even what is not said can often contribute greatly to the overall effect, as the unfinished utterance exemplifies.
The text we examined was a telephone-based exchange with a whole host of other contextual and sociolinguistic factors in play relating to participants, medium, (multiple) audiences and interaction patterns. The last two points in particular raise interesting questions for future research regarding their effects, since they apply whenever more than two people are involved in an exchange, even in a passive listening role.
These reflections raise questions regarding models of language and language proficiency – affective meaning, a central plank of communication, and often its main motivation, is underrepresented in current models and is a prime candidate for in-depth exploration, which would enrich our understanding of language as a whole. Similarly, the study of its progression as a key part of language proficiency could reap dividends, with consequences for language assessment – although obstacles such as the high context-sensitivity and deeply personal nature of affective communication are by no means easy to overcome within an assessment context.

References and further reading
Carter, R and McCarthy, M (1995) Grammar and the spoken language, Applied Linguistics 16 (2), 141–158.
Channell, J (2000) Corpus-based analysis of evaluative lexis, in Hunston, S and Thompson, G (Eds) Evaluation in Text: Authorial Stance and the Construction of Discourse, Oxford: Oxford University Press, 38–55.
Clegg, N (2007) Immigration in the 20th Century, speech at Liberal Democrats Conference 2007, retrieved from http://www.nickclegg.org.uk/index.php?option=com_content&task=view&id=219&Itemid=45.
Coates, J (1990) Modal meaning: the semantic-pragmatic interface, Journal of Semantics 7, 53–63.
Dossena, M and Jucker, A (2007) Introduction, Textus XX, 7–16.
Elliott, M (2008) The Expression of Affect in Spoken English: a case study, unpublished MA thesis, King’s College London.
Fillmore, C, Kay, P and O’Connor, M (1988) Regularity and idiomaticity in grammatical constructions: the case of let alone, Language 64 (3), 501–538.
Gobl, C and Chasaide, A (2003) The role of voice quality in communicating emotion, mood and attitude, Speech Communication 40, 189–212.
Goldberg, A (2003) Constructions: a new theoretical approach to language, Trends in Cognitive Sciences 7 (5), 219–224.
Grice, H (1975) Logic and Conversation, in Cole, P and Morgan, J (Eds) Syntax and Semantics Volume 3: Speech Acts, London: Academic Press, 41–58.
Halliday, M (1978) Language as a Social Semiotic, London: Arnold.
Halliday, M (1994) An Introduction to Functional Grammar (2nd ed.), London: Arnold.
Halliday, M (2003) On Language and Linguistics (edited by Webster, J), London: Continuum.
Hunston, S (2007) Using a corpus to investigate stance quantitatively and qualitatively, in Englebretson, R (Ed.) Stancetaking in Discourse, Amsterdam: John Benjamins, 27–48.
Jenkins, J (2000) The Phonology of English as an International Language, Oxford: Oxford University Press.
Kay, P and Fillmore, C (1999) Grammatical constructions and linguistic generalizations: the what’s X doing Y? construction, Language 75 (1), 1–33.


Kienast, M, Paeschke, A and Sendlmeier, W (1999) Articulatory reduction in emotional speech, EUROSPEECH ‘99, 117–120.
Larsen-Freeman, D and Cameron, L (2008) Complex Systems and Applied Linguistics, Oxford: Oxford University Press.
Marsen, S (2006) How to mean without saying: presupposition and implication revisited, Semiotica 160, 243–263.
Martin, J and White, P (2005) The Language of Evaluation: Appraisal in English, Basingstoke: Palgrave Macmillan.
Mumford, K and Power, A (2003) East Enders, Bristol: The Policy Press.
Nariyama, S (2006) Pragmatic information extraction from subject ellipsis in informal English, Proceedings of the 3rd Workshop on Scalable Natural Language Understanding, 1–8.
Perez, A (1990) Time in motion: grammaticalisation of the be going to construction in English, La Trobe University Working Papers in Linguistics 3, 49–64.
Stefanowitsch, A and Gries, S (2003) Collostructions: investigating the interaction of words and constructions, International Journal of Corpus Linguistics 8 (2), 209–243.
Tao, H and McCarthy, M (2001) Understanding non-restrictive which-clauses in spoken English, which is not an easy thing, Language Sciences 23, 651–677.
Tarone, E (1973) Aspects of intonation in Black English, American Speech 48 (1–2), 29–36.
Thompson, G and Hunston, S (2000) Evaluation: An introduction, in Hunston, S and Thompson, G (Eds) Evaluation in Text: Authorial Stance and the Construction of Discourse, Oxford: Oxford University Press, 1–27.
Veltman, R (2003) Phonological metaphor, in Simon-Vandenbergen, A-M, Taverniers, M and Ravelli, L (Eds) Grammatical Metaphor, Amsterdam: John Benjamins, 311–335.
Vološinov, V (1986) [1929] Marxism and the Philosophy of Language, Cambridge, MA: Harvard University Press.
Walsh, M (1968) Explosives and spirants: primitive sounds in cathected words, Psychoanalytic Quarterly 37, 199–211.
Wierzbicka, A (1987) Boys will be boys: ‘radical semantics’ vs. ‘radical pragmatics’, Language 63 (1), 95–114.
Zwicky, A (2005) Saying more with less, Language Log, retrieved from http://158.130.17.5/~myl/languagelog/archives/2005_03.html.

Peer–peer interaction in a paired Speaking test: The case of FCE
EVELINA D GALACZI RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a doctoral thesis submitted to Columbia University, New York City (US) in 2004. The PhD was supervised by Professor James Purpura.

This discourse-based study, which was undertaken as part of a doctoral degree, investigated paired test taker discourse in the First Certificate in English (FCE) Speaking test. Its primary aim was to focus on fundamental conversation management concepts, such as overall structural organisation, turn-taking, sequencing, and topic organisation of the paired test taker interaction. The analysis highlighted global patterns of interaction in the peer test taker dyads and salient discourse features of interaction. The three distinct patterns of interaction which emerged were termed ‘collaborative’, ‘parallel’, and ‘asymmetric’. The patterns of interaction were distinguished based on the dimensions of mutuality and equality, and were conceptualised as continua ranging from high to low. In addition, the dimension of conversational dominance, operationalised as ‘participatory’, ‘sequential’, and ‘quantitative’, was found to intersect with the dimensions of mutuality and equality, leading to sub-groups within each interactional pattern of high or low conversational dominance. The second goal of the study was to investigate a possible relationship between the patterns of peer–peer interaction and the FCE score for ‘interactive communication’ (IC). The aim was to understand more accurately the relationship between the discourse generated by the task and the scores for ‘interactive communication’, and to provide some validity evidence for the IC scores. The results showed that the high scorers mostly oriented to a collaborative pattern of interaction, while the low scorers generally oriented to a parallel pattern of interaction, as would have been expected. The significance of the study lies in the deeper understanding it provides of paired oral test interaction in the FCE and the construct of conversation management. This study also holds implications for FCE examiner training as it provides insights which could lead to more accurate and consistent assessment of FCE candidate output. A further contribution of the present study is the recommendations it provides for the performance descriptors used for ‘interactive communication’ in the FCE assessment scales, which would ultimately lead to a fairer test. For more details on this study, see Galaczi (2003, 2008).

References
Galaczi, E D (2003) Interaction in a paired speaking test: the case of the First Certificate in English, Research Notes 14, 19–23.
Galaczi, E D (2008) Peer–Peer Interaction in a Speaking Test: The Case of the First Certificate in English Examination, Language Assessment Quarterly 5 (2), 89–119.


Second language acquisition of dynamic spatial relations
IVANA VIDAKOVIĆ RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a doctoral thesis submitted to the University of Cambridge (UK) in 2006. The PhD was supervised by Dr Henriëtte Hendriks.

The aim of this thesis is to shed light on the nature of adult second language acquisition, factors guiding the acquisition and the ways in which these factors interact. This is achieved through exploring how English learners of Serbian and Serbian learners of English acquire another way of expressing dynamic spatial relations (motion) in a second (foreign) language.

Talmy (1985) divides languages into:
a. satellite-framed, typically encoding Path in satellites and Manner in motion verbs (e.g. The bottle floated out) and
b. verb-framed, typically encoding Path in motion verbs and Manner, if expressed at all, outside the verb (e.g. La botella salió flotando – The bottle exited floating).

English and Serbian were both classified as satellite-framed languages within Talmy’s typology. However, recent research revealed that Serbian differs to a certain extent from English as to where Manner and Path are typically expressed (Filipović 2002), and as to the frequency of expression of Manner. Therefore, Filipović (2002) reclassified Serbian placing it midway in the continuum satellite-framed>Serbian>verb-framed. The contribution of the non-acquisition part of the thesis resides in providing further support for the reclassification of Serbian, based on the analysis of the spoken mode of language use and systematic examination of attention to Manner (as reflected in the frequency of Manner mention). The findings show that:
a. when they want to express Manner in boundary-crossing situations (e.g. entering, exiting, crossing), Serbian native speakers most frequently opt for the verb-framed pattern of expressing Path in the verb and Manner outside it when using their mother tongue, and
b. they omit Manner information considerably more frequently than English native speakers when speaking in their mother tongue, even when Manner is not inferable from the context.

Using the Interlanguage approach, the main, acquisition-related part of the thesis examines how lower-intermediate, upper-intermediate and advanced learners express motion at a given stage of the acquisition process, how their linguistic means develop and what factors influence the acquisition. According to this approach, which has proved fruitful for analysing the acquisition process of beginners, learners’ interlanguage and its development over time are systematic. This systematicity cannot be directly related to either the first or the second language. The acquisition paths exhibit similarities across different (first – L1, and second – L2) language pairings, being influenced mostly by universal, and only marginally by language-specific factors, since the interlanguage of beginners is syntactically and semantically a very simple system. Previous studies on higher-level learners, whose interlanguages are more complex syntactically and semantically, document mostly language-specific influences. The present thesis set out to investigate whether universal characteristics of learners’ development persist among learners beyond the beginning stage, or whether only language-specific influences hold sway, how all of them manifest and what their scope is.

Since the learners examined are beyond the beginning stage, the over-arching hypothesis was that language-specific influences would be stronger than among beginners and acquisition paths not so homogenous, yet factors other than first or second language may bring out similarities in the interlanguages and acquisition paths of learners with different first and second languages.

One of the contributions of the present thesis resides in showing that even the interlanguage of learners beyond the beginning stage shows similarities unrelated to the first or second language, and also that it exhibits a rich interplay of both language-specific (L1/L2) and universal factors. For example, both English and Serbian learners mostly prefer the satellite-framed, English pattern (e.g. run into X) to the verb-framed pattern favoured by Serbian native speakers when using their L1 (e.g. go running into X). In this way, learners resort to the economy-of-form strategy¹ opting for a pattern that is more economical by being shorter, syntactically simpler and thus easier for processing (production/understanding). It is in the domain of linguistic attention to Manner that a language-specific influence (L1 influence) is at its strongest at times, being clearly visible even among the advanced English and Serbian learners. In addition, the findings reveal that L2 learners undergo not only linguistic reorganisation, but also a change in the degree of linguistic attention to Manner (increasing/decreasing frequency of Manner mention) with increasing proficiency levels.

Besides theoretical implications for the field of second language acquisition, this thesis has also practical implications for teaching the linguistic devices expressing dynamic spatial relations in the two languages. For more details on this study see Filipović & Vidaković (2010).

¹ This term was first used in Vidaković (2006).


References

Filipović, L (2002) Verbs in motion expressions: structural perspectives, unpublished PhD dissertation, University of Cambridge.
Filipović, L and Vidaković, I (2010) Typology in the L2 classroom: Second language acquisition from a typological perspective, in Pütz, M and Sicola, L (Eds) Cognitive Processing in Second Language Acquisition, Amsterdam/Philadelphia: John Benjamins, 269–293.
Talmy, L (1985) Lexicalization patterns: Semantic structure in lexical forms, in Shopen, T (Ed.) Language typology and syntactic description: Grammatical categories and the lexicon, Cambridge: Cambridge University Press, 57–149.
Vidaković, I (2006) Second Language Acquisition of Dynamic Spatial Relations, unpublished PhD dissertation, University of Cambridge.

Demonstrating cognitive validity of IELTS Academic Writing Task 1
GRAEME BRIDGES ASSESSMENT AND OPERATIONS GROUP, CAMBRIDGE ESOL

Introduction

This paper is based on a Master’s thesis submitted to Anglia Ruskin University, Cambridge, UK, in October 2008. The research was funded by Cambridge ESOL. The MA was supervised by Dr Sebastian Rasinger.

This study further examines the validity of IELTS Academic Writing Task 1, the first of two compulsory tasks that are designed to test the writing ability of those wishing to study or work in the medium of English. The study makes use of Weir’s (2005) socio-cognitive validity framework and focuses on cognitive validity by investigating the appropriateness of the cognitive processes required to complete IELTS Academic Writing Task 1. As a secondary research goal, the processes required to address two different kinds of visual input employed in Task 1 – a graph and a diagram – are compared.

The study uses two research instruments – a verbal protocol technique and a questionnaire – which together provide qualitative and quantitative data.

The findings demonstrate that Task 1 does engage those cognitive processes that are deemed essential in the target language use domain. The study reveals that this task is essentially a knowledge telling exercise, so some processes, especially organising, are under-represented with this task type. It also shows that there seem to be some differences in IELTS candidate perception regarding the data and diagram task types, although such differences are not statistically significant.

Although a relatively small-scale research project, the study not only provides further evidence of cognitive validity of this task type but also raises questions for further research.

Literature review

The socio-cognitive approach
Cambridge ESOL has for almost the last 20 years used the VRIP (Validity, Reliability, Impact and Practicality) approach to validating its tests (Saville 2003) with validity ‘generally considered to be the most important quality’ (ibid:65). In this framework, validity, although seen as a unitary concept, is configured on the basis of three types of evidence:
• construct: the extent to which the test scores reflect the test takers’ underlying language knowledge and abilities based on a model of communicative language ability (see Bachman 1990)
• content: the extent to which the content of the test represents the target language use domain
• criterion: the extent to which the test scores are correlated with an external criterion that measures the same knowledge and abilities.

Weir’s approach reconfigures construct validity along three dimensions – context, cognitive processing and scoring – and shows how they interact with each other thereby demonstrating the unitary nature of validity (Weir 2005, Weir, O’Sullivan, Jin & Bax 2007). In this model, the construct does not just reflect the underlying traits of communicative language ability but is the result of trait, context and score. The ‘trait-based’ or ‘ability’ approach to assessment is thus reconciled with the ‘task-based’ or ‘performance’ approach. An interactionalist position (Chapelle 1998:43) is thus adopted whereby the construct resides in the interaction between the underlying cognitive ability and the context of use – hence the socio-cognitive model (see Shaw & Weir 2007:2).

Figure 1 depicts how the components that make up construct validity join together both temporally and conceptually. The arrows indicate the relationship between the components with the timeline running from bottom to top. The test taker characteristics box connects directly to the cognitive and context validity boxes because ‘these individual characteristics will directly impact on the way the individuals process the test task set up in the context validity box’ (Weir 2005:51).

In this framework cognitive validity involves collecting both a priori evidence on the mental processing activated by the test and a posteriori evidence involving statistical analysis of scores following test administration. As this study concentrates on just considering a priori evidence, score analysis does not form part of the methodology.


Figure 1: Construct validity components (Shaw & Khalifa 2007)
[Diagram with boxes labelled, from bottom to top: Test taker characteristics; Context validity and Cognitive validity; Test construct; Scoring validity]

The elements making up construct validity can be seen to be symbiotically related in that decisions taken in terms of task context will impact on the processing that takes place in task completion. Likewise scoring criteria where known to the test taker will impinge on cognitive processing. Taken together ‘the more evidence collected on each of the components of this framework, the more secure we can be in our claims for the validity of a test’ (Weir 2005:47).

Models of second language writing
Before the 1960s writing was often conceptualised as transcribed speech and was viewed as ‘decontextualised’ (Ellis 1994:188) and product-oriented with final texts seen as ‘autonomous objects’ where various elements were organised according to a ‘system of rules’ (Hyland 2002:6). Writing is now seen as essentially a communicative act. A written text is therefore viewed as discourse in that the writer attempts to engage the reader using linguistic patterns influenced by a variety of social constraints and choices (writer’s goals, relationship with audience, content knowledge, etc.). Any model of writing needs to account for these contextual factors and see writing as a social act.

A model of writing also needs to take account of the internal processing writers undertake. A recent model from Field (2004) is based upon information processing principles from psycholinguistic theory. He provides a detailed account of the stages a writer proceeds through:
• macro-planning: ideas gathering and identifying major constraints (genre, readership, goals)
• organisation: ordering ideas and identifying relationships between them
• micro-planning: focusing on the part of the text (paragraph and sentence) about to be produced
• translation: converting propositional content held in abstract form to linguistic form
• monitoring: checking mechanical accuracy and overall coherence
• revising: adjusting text as a result of monitoring.

These stages of executive processing are the basis of Shaw and Weir’s (2007) conceptualisation of the cognitive validity component of the socio-cognitive framework for writing and as such inform the methodology outlined below.

A parallel strand of research focuses not on the stages of the writing process per se but how these relate to different levels of language proficiency. Eysenck & Keane (2005:418) argue that it is the planning process that differentiates the skilled from the unskilled writer. Scardamalia & Bereiter (1987) describe two major strategies, knowledge telling and knowledge transforming, which occur mainly at the planning stage and help to identify the processing of skilled writers and the less able. In knowledge telling the writer plans very little and is concerned mainly with generating content from remembered existing resources in terms of content, task and genre. In knowledge transforming the skilled writer considers the complexities of a task as well as content, audience, register and other relevant factors in written communication.

IELTS-related writing research
As a high-stakes test IELTS has always attracted attention from researchers including those who have focused just on the writing component. Much of the research has been generated by the IELTS partners themselves thus demonstrating their commitment to the continual improvement of the test (see for example Taylor & Falvey (2007) for a collection of IDP and British Council joint-funded research reports on IELTS Writing). In 2005, the assessment criteria and rating scales were revised in IELTS Writing largely as a consequence of these and other research findings. Many of the inevitable criticisms that a high-stakes test such as IELTS attracts were addressed in 2005 but some issues concerning cognitive validity still remain.

Of the two tasks in IELTS Academic Writing most research has been conducted on Task 2, the short essay. Being the longer of the two in terms of time allocation (40 minutes) and word length (250 words) it generates a greater sample of L2 writing. There have therefore been several a posteriori studies on Task 2 candidate scripts (see Mayor, Hewings, North, Swann & Coffin 2006). Task 2 also carries the heavier weighting in scoring, one of the justifications for Moore & Morton’s (2006) a priori study on test task authenticity. Weir et al (2007) were the first to use a specially designed cognitive validity-based questionnaire in their study of comparability of word-processed and pen & paper IELTS writing. In that study, they compared candidate scores on two Task 2 prompts (a posteriori) and carried out a quantitative and qualitative analysis of the questionnaire responses (a priori). This questionnaire forms the basis of one of the research instruments used in my study.

Task 1 on the other hand has generated relatively less research interest and apart from some internal Cambridge ESOL validation studies, it has always been researched alongside Task 2. Of greatest relevance to the present study


is Mickan, Slater and Gibson’s (2000) a priori study examining the readability of test prompts (Task 1 and 2) and test-taking behaviours of intending IELTS candidates using verbal protocol analysis. This study essentially focused on the context validity parameters of task input emphasising the ‘socio-cultural influences on candidates’ demonstration of their writing ability’ (ibid:29). As many aspects of IELTS Writing have evolved since this study, including the rubric, it would be interesting to see how candidates perceive Task 1 now.

Methodology outline

From the above literature review I have located a research area where much has already been explored. However, at the time of writing the thesis, cognitive validity of IELTS Task 1 had not been investigated before, to my knowledge, and there had not been an attempt to apply both qualitative and to some extent quantitative methodologies in one study to generate a priori evidence supporting the cognitive validity of just the Task 1 in IELTS Academic Writing.

Two research instruments were employed (see Table 1 below). Firstly, verbal protocol analysis (VPA) was utilised with four IELTS preparation students as they wrote a response to one of two Academic Writing prompts: one data prompt represented as a graph and one diagrammatic prompt (see Figures 2 and 3). The aim was to provide rich qualitative data on the cognitive processes undertaken by IELTS candidates. The same four students then responded to a questionnaire which sought to further elicit their thought processes. By distributing this questionnaire to 56 other students, quantitative as well as qualitative data was generated.

Table 1: Data collection methods
Methodology               Instrument                                                  Participants
Qualitative               ‘Think aloud’ verbal protocols (concurrent/non-mediated)    4 IELTS candidates at various levels of proficiency and L1 background
Quantitative/Qualitative  Cognitive-processing questionnaire                          4 candidates above + 56 other candidates of varying levels and L1 background

Figure 2: Data input task – cinema attendance
You should spend about 20 minutes on this task.
The graph below gives information about cinema attendance in Australia between 1990 and the present, with projections to 2010.
Summarise the information by selecting and reporting the main features, and make comparisons where relevant.
Write at least 150 words.
[Line graph. Vertical axis: % of age groups attending cinema at least once a year (0–100). Horizontal axis: Year (1990–2010). Series: 14–24 year olds, 25–34 year olds, 35–49 year olds, 50+ year olds]
(Source: IELTS Scores Explained 2006)

Figure 3: Diagrammatic input task – brick manufacturing
You should spend about 20 minutes on this task.
The diagram below shows the process by which bricks are manufactured for the building industry.
Summarise the information by selecting and reporting the main features, and make comparisons where relevant.
Write at least 150 words.
[Process diagram: Brick manufacturing]
*Clay: type of sticky earth that is used for making bricks, pots, etc.
(Source: IELTS Scores Explained 2006)

‘Think aloud’ verbal protocols
Verbal protocol analysis is an introspective technique that is well-suited to obtain evidence of cognitive processing as part of construct validation. A participant is asked to ‘talk aloud’ or ‘think aloud’ as they carry out a task with their utterances comprising the ‘protocol’. ‘Verbal protocol’ is the data gathered under these conditions. These verbalisations can be seen as an accurate record of the participant’s thought processes. It is important to stress, however, that ‘individuals cannot report their own cognitive processes’ and that it is for the researcher to ‘infer cognitive processes and attended information’ (Green 1998:4). In other words participants are required to verbalise their thoughts and not the processes leading to those thoughts.


In the study the participants were asked to verbalise their thoughts concurrently as they wrote their responses to one of two non-live IELTS Academic Writing Task 1s. Their verbalisations were audio-recorded generating a set of protocols making up a body of qualitative data. Concurrent reports are generally regarded as more reliable than retrospective reports in that data is not reliant on recovering thoughts from memory. These reports were supplemented by field notes (e.g. instances of underlining, crossing out and insertions were recorded). As it was important not to interfere with the thought processes that were being explored, a non-intrusive approach was used (i.e. ‘non-mediated’) where prompting only occurred during long pauses and included the request to ‘keep talking’ or occasionally ‘speak louder’. A quick debriefing at the end of the recording session took place where subjects were asked to comment on the task and the research procedure. After this, the participants were asked to complete a questionnaire.

After the data was collected, the recordings were transcribed and the data was segmented so that each segment corresponded to a single thought process. The unit of analysis for segmentation was sometimes a word, phrase, clause, sentence or even 2–3 sentences. Each segment was delineated with a ‘/’ and timed. In order to facilitate analysis, a coding scheme was developed by focusing on one protocol at a time and attempting to describe each segment as a thought process.

This involved four iterations of re-coding until a scheme was established that accounted for all four sets of protocols. Green (1998:70) emphasises that it is important at this stage to keep ‘any theoretical assumptions to a minimum’ as otherwise there is the danger of ignoring those verbalisations that are inconsistent with a particular hypothesis.

The coding that finally emerged consisted of each protocol being divided into three phases – pre-writing, writing and post-writing – labelled PreW, W and PostW respectively. Each thought process was then assigned a number so that PreW1 for example referred to the process of ‘Reading (part of) the introductory background to the visual input’. As well as code labels and length of time, comments from the field notes were also collated. For example, the beginning of a participant’s protocol was presented as follows:

Segment  Time   Verbal protocol                                                                                               Code   Length of time  Comments
001      00.00  OK. Writing Task 1. You should spend about 20 minutes on this task/                                           PreW3  00.08
002      00.08  The diagram below shows the process by which bricks are manufactured for the building industry/               PreW1  00.10           Underlines ‘bricks’ on task
003      00.18  Summarise the information by selecting and reporting the main features, and make comparisons where relevant/  PreW3  00.17           Underlines ‘make comparisons’ and circles ‘main features’ on task

Two participants (1 and 3) thought aloud as they wrote a response to the data input task (Figure 2) and two participants (2 and 4) responded to the diagrammatic Academic Writing Task 1 (Figure 3).

All were IELTS preparation students at Anglia Ruskin University in Cambridge. More demographic information on the participants is provided in Table 2 below.

Table 2: Demographic data of participants
Participant  Nationality/Language  Gender  Age  Reasons for taking IELTS
1            Swiss French          F       20   Hoping to do BA in Business Management in London
2            Mongolian             F       29   Hoping to do an MA in Modern Society and Global Transformation at Cambridge
3            Korean                M       23   Wants to do BA in Sports Management at Loughborough
4            French                M       20   Wants to improve English while in UK for a year

Cognitive processing questionnaire
For this part of the study, I adapted the 38-item cognitive processing questionnaire (CPQ) designed by Weir et al (2007:321). The questions are grouped to reflect the cognitive processes that writers are hypothesised to undergo and are identified in the CPQ as one of Field’s (2004:329) six stages outlined previously. For example, question 21 (see below) is one of several that focus on the translation phase:

I felt it was easy to express ideas using the correct sentences.
1. Strongly disagree 2. Disagree 3. No view 4. Agree 5. Strongly agree

Each stage is represented by at least four questions in order to enhance the reliability of the questionnaire, as a single question is always susceptible to bias.

A further advantage of this procedure is its unidimensionality in that all the questions measure in the same direction. Each item can therefore be scored from 1 to 5 (except Question 12 which elicited a yes/no response). The higher the score, the more favourable is the attitude. This in turn means that a frequency count can be carried out for the number and percentage of respondents who choose each option of each question. The mean value of responses to each question can then be calculated to reveal the tendency of the responses with the proviso that a minimum number of 30 respondents are sourced.

For those four who participated in the think aloud procedure, this questionnaire was administered afterwards in order to avoid the possible contamination of the protocols. As well as to these four participants, I distributed this questionnaire to several language schools that run IELTS preparation courses in order to generate some quantitative data.

A total of 60 IELTS preparation students of varied nationalities studying in the UK (44 students) and Hong Kong (16 students) wrote a response to either the data


input or diagrammatic writing tasks (see Table 3). They then completed one of two questionnaires depending on the task they had responded to.

Table 3: Breakdown of respondents by language institute and task type
IELTS preparation course provider                                   No of respondents
                                                                    Data  Diagram  Total
Eurocentres, Cambridge                                              4     4        8
Anglia Ruskin University, Cambridge                                 8     10       18
St Giles, central London                                            9     9        18
Centre for Language in Education, Hong Kong Institute of Education  8     8        16
Total                                                               29    31       60

Table 4b: Writing phase (17 minutes 33 seconds/75.81% of overall time on task)
Code  Coding category                                                    Length of time  Frequency
W2    Converting ideas into text                                         07.54           22
W1    Rehearsing a linguistic form before writing                        01.58           8
W11   Interpreting feature(s) of visual input                            01.28           5
W14   Reviewing grammatical/lexical correctness after writing some text  01.22           4
W4    Previewing a concept before writing some text                      00.54           1
W17   Making a goal statement                                            00.50           4
W13   Reviewing grammatical/lexical correctness while writing some text  00.37           3
W3    Attempting to retrieve a linguistic form from memory               00.25           2
W7    Reading (part of) the standard instructions                        00.25           1
W15   Reviewing informational content while writing some text            00.11           1
W20   Monitoring the word count                                          00.05           1
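
The timing figures in Tables 4a, 4b and 4c can be reproduced with a short calculation. The sketch below is illustrative only and is not part of the study. It assumes that the tables' mm.ss notation means minutes and seconds, takes the phase totals from the three table captions and the overall time on task of 23.09 reported for Participant 1, and uses the first three coded segments of the protocol excerpt as sample input. The function names (to_seconds, rank_codes, phase_share) are hypothetical.

```python
from collections import defaultdict


def to_seconds(mmss: str) -> int:
    """Convert the tables' 'mm.ss' notation (e.g. '17.33') to seconds."""
    minutes, seconds = mmss.split(".")
    return int(minutes) * 60 + int(seconds)


def rank_codes(segments):
    """Sum time and count instances per code, ranked by time spent."""
    totals = defaultdict(lambda: [0, 0])  # code -> [seconds, frequency]
    for code, length in segments:
        totals[code][0] += to_seconds(length)
        totals[code][1] += 1
    rows = [(code, secs, freq) for code, (secs, freq) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def phase_share(phase: str, overall: str) -> float:
    """Percentage of the overall time on task taken up by one phase."""
    return round(100 * to_seconds(phase) / to_seconds(overall), 2)


# First three coded segments of the protocol excerpt: (code, length of time).
excerpt = [("PreW3", "00.08"), ("PreW1", "00.10"), ("PreW3", "00.17")]
ranked = rank_codes(excerpt)

# Phase totals from the captions of Tables 4a-4c; overall time from the text.
phase_totals = {"PreW": "03.41", "W": "17.33", "PostW": "01.55"}
shares = {name: phase_share(t, "23.09") for name, t in phase_totals.items()}

print(ranked)  # codes ranked by total seconds, with frequency
print(shares)  # percentage of overall time per phase
```

The shares are computed from the caption totals rather than from the table rows, because the category times listed in a table do not necessarily add up to the phase total (the rows of Table 4b, for instance, sum to 16 minutes 9 seconds).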
Data collection and analysis

‘Think aloud’ verbal protocols
From each of the protocols collected, the instances where a coding category was applied were ranked in order of time spent. This was supplemented with data on the frequency of instances so that together these rather crude measures could provide some indication of the prevalence of certain thought processes. This information was collated for each writing phase for each participant.

For the purposes of exemplification, the findings of each writing phase based on the verbal protocol of Participant 1 are summarised in Table 4a, Table 4b and Table 4c. Of the 23 minutes 9 seconds she took to complete the task, she spent 3 minutes 41 seconds planning her response (see Table 4a). There is evidence of macro-planning in that she clarified the task requirements by reading the task-specific rubric (PreW1 and PreW2) and the graphical input (PreW5). She attempted to interpret the data (PreW8) and summarise it (PreW9). This was the only protocol where there was evidence of topic definition (PreW10) where the writer generates ideas by utilising world knowledge. However, at no time did she write any notes although she did claim in the debriefing that she made notes in her head.

Table 4a: Pre-writing phase (3 minutes 41 seconds/15.91% of overall time on task)
Code    Coding category                                                       Length of time  Frequency
PreW10  Defining the topic                                                    00.46           3
PreW5   Reading (part of) the visual input                                    00.36           4
PreW8   Interpreting feature(s) of visual input                               00.31           5
PreW9   Summarising feature(s) of visual input                                00.23           2
PreW2   Re-reading (part of) the introductory background to the visual input  00.13           1
PreW6   Re-reading (part of) the visual input                                 00.11           1
PreW1   Reading (part of) the introductory background to the visual input     00.10           1
PreW7   Previewing potential linguistic form(s)                               00.07           1

Just over 75% of the time (17 minutes 33 seconds) was spent actually writing (see Table 4b), of which she spent 7 minutes 54 seconds engaging in translating – the actual conversion of abstract ideas to linguistic form (W2, 22 instances). The second most common thought process was rehearsing a linguistic form before writing. There were eight instances of this (W1) which generally occurred before the actual putting of pen to paper. There were however some overt examples of micro-planning where the writer broke off mid-sentence, tried to find a phrase to continue the sentence, went back to the task and read the instructions and then made a goal statement, previewed an idea before finally writing. This highlights the dynamic nature of writing where the text becomes part of the context thus compelling the writer to revisit the task, the instructions, goals and their memory before they can continue encoding their thoughts.

As well as micro-planning there are also examples of monitoring during (W13) and after writing some text (W14). While writing there were occasions where the writer self-corrected some errors e.g. The graph illustrate illustrates erm/ (W13). This is an example of low level monitoring involving mechanical accuracy such as punctuation, spelling and syntax. However, the monitoring that occurred after some text had been written does require more attentional resources as it involves checking cohesion between sentences and within sentences e.g. the writer in her final paragraph prepared to write ‘To conclude’, realised that the previous paragraph began with ‘To conclude’ so replaced it with ‘To compare’ some 3 minutes after originally beginning the penultimate paragraph.

The degree of monitoring however did not seem to extend to any consideration of the reader or to goals set earlier. Nevertheless there is evidence of an evolving orientation towards goals. There are four instances of this where the writer prompts herself: to make a difference, write one more sentence then a conclusion, draw a comparison and put it in my conclusion (W17).

The sheer complexity of writing is further evidenced with this participant in that she prompted herself twice to retrieve a linguistic form from her long-term memory (W3), felt the need to read the standard instructions for the first time (W7), reviewed the informational content of a piece of


text (W15) and was aware of the need to monitor the word count (W20).

This participant was one of only two subjects who devoted any time to the post-writing phase (see Table 4c) although she had to prompt herself to do this (PostW1). She mostly spent the time correcting errors (PostW3) although there were a couple of instances where she read her script making no corrections (PostW2). In her debriefing she thought that her response was short and that she didn’t have enough time to count the number of words.

Table 5: Stages involved in writing and questions designed to elicit candidates’ behaviour
Stages                 Question No.
Macro-planning         1–9
Organising             10–15
Micro-planning         16–19
Translating            20–26
Monitoring & Revising  27–38
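
The questionnaire scoring described for the CPQ (a frequency count per option, a mean per question, and the percentage of agreement obtained by adding agree and strongly agree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's own analysis: the stage ranges are copied from Table 5, the response data are invented for the example, and the names STAGES, stage_of and summarise are hypothetical. Question 12, which elicited a yes/no response, would need separate handling and is not covered here.

```python
from collections import Counter

# Question ranges per stage, as listed in Table 5 (38 items in total).
STAGES = {
    "Macro-planning": range(1, 10),
    "Organising": range(10, 16),
    "Micro-planning": range(16, 20),
    "Translating": range(20, 27),
    "Monitoring & Revising": range(27, 39),
}


def stage_of(question: int) -> str:
    """Return the writing stage a question number belongs to."""
    for stage, questions in STAGES.items():
        if question in questions:
            return stage
    raise ValueError(f"question {question} is outside the 38-item range")


def summarise(responses, min_n=30):
    """Summarise 1-5 Likert responses to a single question."""
    n = len(responses)
    counts = Counter(responses)
    percentages = {opt: round(100 * counts[opt] / n, 1) for opt in range(1, 6)}
    # Percentage of agreement: 'Agree' (4) plus 'Strongly agree' (5).
    agreement = round(100 * (counts[4] + counts[5]) / n, 1)
    # The text only treats the mean as indicative with 30 or more respondents.
    mean = round(sum(responses) / n, 2) if n >= min_n else None
    return {"n": n, "percentages": percentages,
            "agreement": agreement, "mean": mean}


# Invented responses from 30 hypothetical respondents to question 21.
q21 = [5] * 6 + [4] * 12 + [3] * 6 + [2] * 4 + [1] * 2
summary = summarise(q21)

print(stage_of(21))
print(summary)
```

Returning no mean for samples below 30 mirrors the proviso stated for the CPQ; the threshold is exposed as a parameter so that it can be changed.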

Table 4c: Post-writing phase (1 minute 55 seconds/8.28% of overall time on task)
Code    Coding category          Length of time  Frequency
PostW3  Editing (part of) text   01.02           3
PostW1  Making a goal statement  00.18           2
PostW2  Reading (part of) text   00.17           2

Overall there is strong evidence from this and from the other three participants that all but one of the cognitive processes outlined in Weir (2005) and Field (2004) are being employed. The only process where there was very little evidence was of organising – this was also the case in Mickan et al’s (2000) study which concentrated on Task 2, a longer task requiring knowledge transforming skills. Perhaps even more so for Task 1, candidates are unlikely to write notes or mentally plan an outline. What was striking from all the participants was the perception that there was not enough time so perhaps organising the response was sacrificed due to that. However, from the participants’ scripts and also from some of their goal statements there was still some evidence of the provisional outlining of ideas.

The findings based on verbal reports of all four participants showed that there did not seem to be any striking dissimilarities in thought processes between those taking the data input task as opposed to the diagram. Differences were largely based on writing competence with the more skilled writers such as Participant 1 engaging more in macro-planning and monitoring than the less skilled (for more information see Bridges 2008).

Not surprisingly, the protocols collected in this study provide stronger evidence of knowledge telling than knowledge transforming. Task 1 is after all designed to facilitate the transfer of assembled information from a visual input to a verbal written output.

From the frequency data collected, the percentage of agreement for each question was obtained by adding up the percentage of those expressing agreement and strong agreement. This was done by task type and as a total and is presented in Tables 6, 7, 8, 9 and 10 overleaf. Preceding each table is a summary of the data highlighting the main findings with some tentative speculation as to the reasons for the results.

Macro-planning
In the goal-setting part of this stage (questions 1–5, see Table 6) there is generally quite high to very high agreement among the respondents. It does seem that many of these preparation students do read the instructions very carefully and attempt to interpret both these and the visual input so that they can meet the task requirements. This seems to be especially true of those who responded to the diagrammatic input.

A very low proportion of candidates seem to utilise world knowledge or consider the genre constraints when responding to Academic Writing Task 1s. Regarding the question of topic knowledge (Q6), it could be argued that low levels of agreement are actually a good thing as IELTS Writing tasks should not be seen to favour candidates from any specific discipline. Tasks have to be about something but not at a level where specialised knowledge would create bias.

Of more concern perhaps is the low level of knowledge about this task type which is a 150-word descriptive summary (cf. question 8 in Table 6 overleaf). Interestingly, more candidates, albeit very marginally, seemed to be more familiar with the diagrammatic task type than the data input.

Organising
A not particularly clear picture emerges from this sample during this organising stage (see Table 7). For questions 10
It must be emphasised, however, that as this study and 11, which elicit information on whether the writer starts
involved just four participants it should be seen as to generate their ideas after the macro-planning phase
exploratory and any conclusions drawn are tentative. There above, it seems that about a third of the students report
are also drawbacks with the methodology of VPA itself that they engage in these activities.
which need to be considered in any conclusion. Questions 12 and 13 reveal that just over half do plan an
outline either on paper or as mental notes and that just
Cognitive processing questionnaire over 50% have thought of their ideas before they plan their
The design of the two questionnaires was aimed at outline. These ideas may well be incomplete (see question
investigating, through participants’ self-reports, the extent 10) or not well-organised (question 11) but there does
of the cognitive processes they employ in responding to two seem to be some provisional organisation of ideas.
types of the Academic Writing Task 1. Table 5 below Not surprisingly, as 51.7% reported that they thought of
summarises the different stages and the questions most of their ideas before planning an outline, only 29%
designed to elicit respondent behaviour. mostly thought of ideas while planning an outline. An

equally low percentage thought of their ideas in English. This is not altogether surprising. L2 writers, especially unskilled ones, may experience a heavy cognitive load in simply encoding their thoughts as they write so are unlikely to plan for writing in English.

Micro-planning
This level of planning takes place as the text evolves at both the paragraph and sentence level while also taking into account decisions made in macro-planning. Perhaps the most interesting finding is the substantial difference in responses between those who responded to the diagrammatic task, of whom 58.1% thought it was easy to put their ideas in good order, and the data task respondents, of whom only 17.2% thought it was easy (see question 19 in Table 8). It could be surmised that the diagrammatic task does offer more scaffolding than the data task although interestingly more data task respondents reported being able to put their ideas or content in good order (46.7% to 31.1%, see question 17) but that of course does not necessarily mean it was easy.

©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.

Table 6: Macro-planning (Agreement with questions 1–9)

Question agree or strongly agree


—————————————————————————————————
Data Diagram Total
(n=29) (n=31) (n=60)

1 I FIRST read the instructions very slowly considering the significance of each word in it. 55.2% 77.5% 66.7%

2 I thought of WHAT I was required to write after reading the instructions and visual input. 79.3% 80.6% 80.0%

3 I thought of HOW to write my response so that it would respond well to the instructions. 79.3% 71.0% 75.0%

4 I thought of HOW to satisfy readers or examiners. 55.2% 42.0% 48.3%

5 I was able to understand the instructions for this writing test completely. 69.0% 80.7% 75.0%

6 I know A LOT about this topic, i.e., I have enough ideas to write about this topic. 24.1% 16.1% 20.0%

7 I felt it was easy to produce enough ideas for the Task 1 from memory. 17.2% 35.5% 26.6%

8 I know A LOT about this task type, i.e. I know how to write a descriptive summary of data 24.1% 25.8% 25.0%
(chart, diagram, table)/diagrams (process, map, plan).

9 I know A LOT about other types of IELTS Academic Writing Task 1s e.g., diagrams 27.5% 32.2% 30.0%
(process, map, plan)/data (chart, diagram, table).
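
As a hedged illustration of how agreement figures of this kind can be derived, the sketch below adds together the ‘agree’ and ‘strongly agree’ responses for a single question and then pools the two task groups to obtain a total. The response labels, the function name `percent_agreement` and the individual answers are assumptions made for illustration; the article reports only the resulting percentages, not the raw responses.

```python
from collections import Counter

# Hypothetical response labels; the article does not list the scale wording.
AGREEING = {"agree", "strongly agree"}


def percent_agreement(responses):
    """Return the percentage of responses that are 'agree' or 'strongly agree'."""
    counts = Counter(responses)
    agreeing = sum(counts[label] for label in AGREEING)
    return 100 * agreeing / len(responses)


# Invented answers to one question, sized like the two task groups.
data_group = (["strongly agree"] * 6 + ["agree"] * 10
              + ["disagree"] * 9 + ["strongly disagree"] * 4)      # n = 29
diagram_group = (["strongly agree"] * 9 + ["agree"] * 15
                 + ["disagree"] * 5 + ["strongly disagree"] * 2)   # n = 31

data_pct = percent_agreement(data_group)
diagram_pct = percent_agreement(diagram_group)
# The total pools all 60 respondents rather than averaging the two percentages.
total_pct = percent_agreement(data_group + diagram_group)

print(f"Data {data_pct:.1f}%  Diagram {diagram_pct:.1f}%  Total {total_pct:.1f}%")
```

Pooling rather than averaging matters because the two groups differ in size (29 and 31). The totals reported in Table 6 appear consistent with pooling, for example the 66.7% total for question 1, although the article does not state this explicitly.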

Table 7: Organising (Agreement with questions 10–15)

Question agree or strongly agree


—————————————————————————————————————————
Data Diagram Total
(n=29 for Q10–11) (n=31 for Q10–11) (n=60 for Q10–11)
(n=15 for Q12–15) (n=16 for Q12–15) (n=31 for Q12–15)

10 Ideas occurring to me at the beginning tended to be COMPLETE. 34.5% 32.2% 33.4%

11 Ideas occurring to me at the beginning were well ORGANISED. 31.0% 45.1% 38.3%

12 I planned an outline on paper or in my head BEFORE starting to write.* 51.8% 51.6% 51.7%

13 I thought of most of my ideas for the task BEFORE planning an outline. 60.0% 43.8% 51.7%

14 I thought of most of my ideas for the task WHILE I planned an outline. 33.3% 25.1% 29.0%

15 I thought of the ideas only in ENGLISH. 33.3% 25.1% 29.0%

*As respondents only had to answer Yes or No to this item, % agreement is based on those who answered ‘yes’.

Table 8: Micro-planning (Agreement with questions 16–19)

Question agree or strongly agree


—————————————————————————————————————————
Data Diagram Total
(n=15 for Q16–18) (n=16 for Q16–18) (n=31 for Q16–18)
(n=29 for Q19) (n=31 for Q19) (n=60 for Q19)

16 I was able to prioritise the ideas. 40.0% 37.6% 38.7%

17 I was able to put my ideas or content in good order. 46.7% 31.1% 38.8%

18 Some ideas had to be removed while I was putting them in good order. 40.0% 37.6% 38.7%

19 I felt it was easy to put ideas in good order. 17.2% 58.1% 38.4%


Table 9: Translating (Agreement with questions 20–26)

Question agree or strongly agree


————————————————————————————————
Data Diagram Total
(n=29) (n=31) (n=60)

20 I felt it was easy to express ideas using the appropriate words. 17.2% 45.1% 31.7%

21 I felt it was easy to express ideas using the correct sentences. 24.1% 25.8% 25.0%

22 I thought of MOST of my ideas for the summary WHILE I was actually writing it. 41.3% 64.6% 53.3%

23 I was able to express my ideas by using appropriate words. 13.8% 61.3% 38.4%

24 I was able to express my ideas using CORRECT sentence structures. 20.6% 45.2% 33.3%

25 I was able to develop any paragraph by putting sentences in logical order in the paragraph. 31.0% 64.5% 48.3%

26 I was able to CONNECT my ideas smoothly in the whole response. 13.8% 41.9% 28.4%

Translating
It is at this stage that decisions made at macro-planning move from the abstract to the concrete where the writer encodes their ideas into the written form. It is at this stage where the L2 writer may face particular problems dependent on their language resources. The questions in Table 9 therefore relate to the ease or otherwise of this process of conversion of abstract ideas (mostly thought in L1) to the linguistic form of L2.

An initial glance would suggest that most of this sample did not find the translating stage easy. They also did not think that they were able to express their ideas appropriately and accurately. For questions 20 and 21 the levels of disagreement were quite high – 53.4% and 45% respectively – which suggests that this cohort was not particularly proficient or confident in their lexical and grammatical knowledge of English.

The figures were only marginally higher for questions 23 and 24 which focused on the ability as opposed to the ease with which they expressed their ideas lexically and grammatically. For both questions it was those who responded to the diagram task that expressed markedly higher levels of agreement suggesting that these candidates found the data task much more challenging lexically. This was perhaps partly due to the lack of lexical support when compared with the brick task (e.g. digger, clay, mould, etc. are on the question paper). Regarding responses to questions 25 and 26 just under 50% (and 64.5% of the diagram respondents) thought they were able to connect ideas within each paragraph but far fewer felt that they were able to organise the information as a whole (28.4%). Only 13.8% of the data respondents expressed agreement which suggests that building a coherent response to a graph showing quite a number of variables may be more challenging than a process task where the structure is almost self-evident.

Monitoring and revising
When a writer reviews at the sentence, paragraph or whole text level this involves the process of monitoring. If writing at any of these levels is found unsatisfactory, the writer is likely to revise, which could involve correcting a typographical error at one extreme to a wholesale re-draft of the whole text at the other. So although the socio-cognitive model presents these as two separate stages, these processes are so inextricably linked that for the purposes of analysis questions 27 to 38 were used to elicit information on both types of revision (see Table 10).

Monitoring is a very demanding activity so it is likely that the lower-level checking of mechanical accuracy of spelling, punctuation and syntax (questions 32–35) will exceed the higher-level checking of how the text fits in with the goals established in macro-planning and the text produced so far (questions 28–31). The figures below, however, do not seem to bear this out with both types of monitoring exhibiting fairly similar levels of agreement.

Questions 27 and 36–38 show much lower levels of agreement. These focus more on revision after the text as a whole has been written. Only 28% tried to take into account the word count (question 27) constraints or wrote a redraft (question 36). Slightly more reviewed any statements or thoughts that they had removed (33% in question 37) and just over a quarter of these candidates thought it easy to review and revise the whole response. It seems that as the rubrics recommend just 20 minutes for the completion of Task 1 it is time constraints that are probably the main factor in these low levels of agreement.

An interesting finding from the questionnaire data is the degree of difference in agreement between those responding to the data and the diagram task, although as the numbers involved are quite low any conclusions must be treated as very tentative. With 29 and 31 respondents respectively there is, however, a statistical procedure that could be used to see if there was any significant difference between the two tasks.

From sampling the distribution of differences between means a t-test for independent samples with equal variance revealed no significant difference in the distribution of differences between means between the two groups (t=1.792, df=72, p=0.005). Thus there is no evidence to suggest that the means between the two groups are different across the two task types, indicating that there is little difference in the perception of candidates between these two task types.
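
To make the procedure concrete, the following is a minimal sketch of an independent-samples t-test with a pooled (equal) variance. The article does not state exactly which values entered the test, so the sketch uses only the agreement percentages for questions 20–26 from Table 9 as input; it is an illustration of the calculation, not a reproduction of the reported statistic, and the function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for an independent-samples t-test assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Agreement percentages for questions 20-26 only, copied from Table 9.
diagram = [45.1, 25.8, 64.6, 61.3, 45.2, 64.5, 41.9]
data = [17.2, 24.1, 41.3, 13.8, 20.6, 31.0, 13.8]

t, df = pooled_t(diagram, data)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t value would then be compared with the critical value of the t distribution for the given degrees of freedom, or converted to a p-value with a statistics package, to decide whether the difference between the two groups is significant at the chosen level.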


Table 10: Monitoring and revising (Agreement with questions 27–38)

Question agree or strongly agree


———————————————————————————
Data Diagram Total
(n=29) (n=31) (n=60)

27 I tried NOT to write more than the required number of words in the instructions. 31.0% 25.8% 28.3%

28 I reviewed the correctness of the contents and their order WHILE writing this response. 44.8% 45.1% 45.0%

29 I reviewed the correctness of the contents and their order AFTER finishing this response. 44.8% 48.4% 46.6%

30 I reviewed the appropriateness of the contents and their order WHILE writing this response. 41.4% 45.2% 43.3%

31 I reviewed the appropriateness of the contents and their order AFTER finishing this response. 48.3% 42.0% 45.0%

32 I reviewed the correctness of sentences WHILE writing this response. 51.8% 51.7% 51.6%

33 I reviewed the correctness of sentences AFTER finishing this response. 44.8% 41.9% 43.4%

34 I reviewed the appropriateness of words WHILE writing this response. 51.7% 54.8% 53.4%

35 I reviewed the appropriateness of words AFTER finishing this response. 44.8% 41.9% 43.4%

36 I was able to write a draft response in this test, then wrote the response again neatly 37.9% 19.4% 28.4%
within the given time.

37 After finishing the summary I also thought for a while of those statements or thoughts I removed. 37.9% 29.0% 33.4%

38 I felt it was easy to review or revise the whole response. 24.1% 29.0% 26.7%

Conclusion and recommendations
For the cognitive processes required to complete the Task 1 in IELTS Academic Writing to be deemed appropriate, they need to replicate those thought processes that test takers will need to utilise in the future target language use situation. This study demonstrates that there is evidence of a large variety of the cognitive processes being employed, although organising does not seem to be activated as much as the other processes. This is perhaps because ultimately the completion of Task 1 requires a knowledge-telling strategy even with very proficient writers. Unskilled writers are likely to plan less with each sentence generating the content of the next piece of text in a linear non-reflective manner. Skilled writers on the other hand may find that re-shaping the content from a visual input is not particularly demanding. They may adopt problem-solving strategies involved in knowledge transformation such as organising, but knowledge telling may be successful with very straightforward Task 1s.

In order to follow up this study and to furnish further evidence of cognitive validity to support the use of IELTS Academic Writing Task 1 the following research projects could be initiated:
• Further verbal protocol analysis where each informant would verbalise their thoughts on both data and diagram input tasks. Comparisons were limited in my study as the task variable was confounded by the participant variable.
• Keystroke logging of responses during VPA as subjects type their responses. This kind of research will become increasingly relevant as the IELTS partners plan to offer computer-based variations on the traditional pen and paper administrations they currently offer. Keystroke logging provides a more accurate record of when and where writers pause and together with concurrent protocols potentially offers richer data.
• Analysis of linguistic features of scripts from the VPA participants to gain further insight into levels of processing in terms of rhetorical and content parameters.

IELTS has always been a research-led enterprise and so these and other studies are likely to come to fruition in one form or another. As a high-stakes test it is important that IELTS continues to demonstrate validity. It is hoped that this small-scale study using a relatively recent theoretical framework contributes in some way to the validity argument supporting the use of IELTS as a means of assessing the writing ability of those wishing to study or work in the medium of English.

References
Bachman, L (1990) Fundamental Considerations in Language Testing, Oxford: Oxford University Press.
Bridges, G (2008) Demonstrating further evidence of cognitive and context validity for Task 1 of the IELTS Academic Writing Paper using a socio-cognitive validity framework, unpublished MA dissertation, Anglia Ruskin University.
Chapelle, C (1998) Construct definition and validity inquiry in SLA research, in Bachman, L and Cohen, A (Eds) Second Language acquisition and language testing interfaces, Cambridge: Cambridge University Press, 32–70.
Ellis, R (1994) The Study of Second Language Acquisition, Oxford: Oxford University Press.
Eysenck, M and Keane, M (2005) Cognitive Psychology (5th edition), Hove: Psychology Press.
Field, J (2004) Psycholinguistics: the Key Concepts, London: Routledge.
Green, A (1998) Verbal protocol analysis in language testing research, Cambridge: UCLES/Cambridge University Press.
Hyland, K (2002) Teaching and Researching Writing, London: Longman.
IELTS Scores Explained DVD (2006), Cambridge: Cambridge ESOL Publications.
Mayor, B, Hewings, A, North, S, Swann, J and Coffin, C (2006) A linguistic analysis of Chinese and Greek L1 scripts for IELTS Academic Writing Task 2, in Taylor, L and Falvey, P (Eds) IELTS Collected Papers: Research in speaking and writing assessment, Cambridge: Cambridge ESOL/Cambridge University Press, 250–315.
Mickan, P, Slater, S and Gibson, C (2000) Study of Response Validity of the IELTS Writing Subtest, in Tulloh, R (Ed.) IELTS Research Reports Volume 3, Canberra: IELTS Australia, 29–48.
Moore, T and Morton, J (2006) Authenticity in the IELTS Academic Writing test: a comparative study of Task 2 items and university assignments, in Taylor, L and Falvey, P (Eds) IELTS Collected Papers: Research in speaking and writing assessment, Cambridge: Cambridge ESOL/Cambridge University Press, 197–249.
Saville, N (2003) The process of test development and revision within UCLES EFL, in Weir, C and Milanovic, M (Eds) Continuity and innovation: revising the Cambridge Proficiency in English Examination 1913–2002, Cambridge: UCLES/Cambridge University Press, 57–120.
Scardamalia, M and Bereiter, C (1987) Knowledge telling and knowledge transforming in written composition, in Rosenberg, S (Ed.) Advances in Applied Psycholinguistics, Volume 2: Reading, writing and language learning, Cambridge: Cambridge University Press, 142–175.
Shaw, S and Khalifa, H (2007) Deconstructing the Main Suite tests to understand them better, Cambridge ESOL presentation to internal staff.
Shaw, S and Weir, C (2007) Examining Writing: Research and practice in assessing second language writing, Cambridge: Cambridge ESOL/Cambridge University Press.
Taylor, L and Falvey, P (2007) IELTS Collected Papers: Research in speaking and writing assessment, Cambridge: Cambridge ESOL/Cambridge University Press.
Weir, C (2005) Language Testing and Validation: an evidence-based approach, Basingstoke: Palgrave Macmillan.
Weir, C, O’Sullivan, B, Jin Yan and Bax, S (2007) Does the computer make a difference? Reaction of candidates to a computer-based versus a traditional hand-written form of the IELTS Writing component: effects and impact, in Taylor, L (Ed.) IELTS Research Report Volume 7, IELTS Australia and British Council, 311–347.

Qualification and certainty in L2 writing: A learner corpus study
SIAN MORGAN CAMBRIDGE ESOL ORAL EXAMINER, UNIVERSITY OF MODENA AND REGGIO EMILIA, ITALY

Summary
This paper is based on a Master’s thesis in TESOL submitted to Sheffield Hallam University (UK) in 2006. The research was funded by Cambridge ESOL. The MA was supervised by Dr Mary Williams.

The ability to express qualification and certainty is considered to be an important interpersonal skill which enables writers to avoid absolute statements and express caution in anticipation of criticism. Acknowledging the existence of possible alternative voices (Hyland 2005) plays a central role in building reader–writer relationships. It is important therefore that second language learners acquire flexible control of this skill in order for their writing to be successful. This paper describes a classroom research project carried out with second year language students at the University of Modena and Reggio Emilia (for more information see Morgan 2008). A small corpus of argumentative writing was compiled and examined to explore how this student population expressed qualification and certainty in their writing. The findings mirror those from previous studies of L2 writers: the students in this study rely on a small pool of modal verbs, overuse informal devices typical of spoken discourse, and tend to overstate their commitment to propositions. Implications for second language (L2) writing pedagogy and testing are discussed and some suggestions for consciousness-raising activities and form-focused practice are given.

Introduction
In recent years Corpus Linguistics (CL) has allowed us to examine authentic native English and observe linguistic and lexical patterns which occur typically in different writing contexts and discourse communities. This ‘new perspective on the familiar’ (Hunston 2002:3) can also have useful applications in language teaching and pedagogy. One developing field of enquiry in corpus linguistics is the analysis of Computer Learner Corpora (CLC), which allows us to assemble authentic learner output and compare it to authentic, native-speaker (NS) data from a similar field or domain.

Such comparison can highlight what kind of features occur in L2 writing, and which of these occur most frequently. It can also give us information on misuse: what errors occur typically at which level. Equally interesting are the insights it gives us about the phenomenon of under-use, which does not lead to errors, but to under-representation of words or structures (Van Els, Bongaerts, Extra, van Os & Janssen-van Dieten 1984:63). By observing items which are avoided or distributed differently to comparable NS language, we are able to get a picture of
aspects of language which present difficulties for specific groups of learners at different points on the interlanguage continuum. This information can yield insights about range, complexity and typical performance at different proficiency levels. In fact, CLC is currently being used in the English Profile project to describe in more detail linguistic and lexical features of learner output (McCarthy 2009). With C1 and C2 levels, where advanced language performance may reveal clusters of different features (Jarvis, Grant, Bikowski & Ferris 2003:399), corpus analysis may help us understand how these are distributed over student populations. Regardless of level however, if we are able to identify typical errors or avoidance strategies which still need to be addressed, we can then try to feed work on these areas into our teaching.

Focus of the study
This study was prompted by a previous investigation by Hyland and Milton (1997) into the way Hong Kong students express qualification and certainty in their writing. The authors believe that flexible use of linguistic devices to mitigate and boost statements is crucial to academic discourse for the following reasons:
Mitigators or ‘hedges’ allow writers to:
• avoid absolute statements
• acknowledge the presence of alternative voices
• express caution in anticipation of criticism.
Amplifiers or ‘boosters’ allow writers to:
• demonstrate confidence and commitment in a proposition
• mark their involvement and solidarity with the reader.

My own experience of working with Italian students suggests that they have firm control of amplifiers but are less likely to mitigate their statements. For example, several years ago one student, Chiara¹, wrote a well-structured and supported, generally accurate essay on the subject of teenage pregnancies in Britain, and was disappointed at receiving a slightly lower mark than she had expected. This was because she had failed to navigate the ‘area between Yes and No’ (Halliday 1985:335), and used only categorical statements with inappropriate strength of claim, resulting in what Milton (1999:230) has called ‘over zealous emphasis’. If Chiara had qualified her statements more, in order to ‘recognise alternative voices’ (Hyland 2005:52) her essay would have been more persuasive. According to Hyland (2005:24):

‘… meaning is not synonymous with ‘content’ but dependent on all the components of a text. …both propositional and metadiscoursal elements occur together … each element expressing its own ‘content’: one concerned with the world, and the other with the text and its reception.’ (bold added)

Equally importantly, as well as its central function in establishing the tone and style of academic writing, the ability to express qualification and certainty is considered to be an important politeness strategy in speech and writing. Salager-Meyer (1995) considers hedges and boosters to be ‘a significant communicative resource for student writers at any proficiency level’. Hyland & Milton (1997:186) also comment on this important area of pragmatic competence, and argue that these devices influence the reader’s assessment of ‘both referential and affective aspects of texts’ (bold added). In spoken discourse too, increasing attention has been paid to the pragmatic importance of hedging strategies. Carter (2005:68) suggests that they have an important interpersonal function in keeping lines of communication open; Hyland (2005) refers to this elsewhere as ‘opening up a discursive space’ in written discourse.

All of this seems to suggest that flexible use of modal devices is important both as an interpersonal feature and as a communication strategy in L2 production in general. It is because of their all-pervasive nature in many types of discourse, as well as their significance in academic writing, that I decided to carry out a preliminary study using learner corpora to investigate the frequency and occurrence of these devices in my own local teaching context.

The research question was the following: how do undergraduate students express qualification and certainty in their argumentative writing, and what type of devices do they use most frequently?

¹ A pseudonym

Student profile and methods
Although the learner corpus used is very small, Granger (1998a) suggests that small corpora compiled by teachers of their own students’ work can yield useful insights into a group profile of learner language. Clearly, for any corpus to be useful it is essential to have clear design criteria; in the case of learner language it is particularly important to control for the many different types of learner language and situations, taking into account variables such as the following:

Table 1: Variables to control for in learner corpora design

Language        Learner
medium          age
genre           sex
topic           L1
task            level
task setting    learning context

(Adapted from Granger 1998b:9)

The students in this project formed a relatively homogenous group in terms of age, level and language learning background. The 50 students involved were in their second year of a degree in European languages and culture at the University of Modena and Reggio Emilia. This was a predominantly female student population (42 female and 8 male) whose language level ranged from high B2 to low C1, as measured by their results in the first year exam.

The study was conducted with this group of high-intermediate students as it was hoped that their firm
©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.
C A M B R I D G E E S O L : R E S E A R C H N OT E S : I SS U E 4 2 / N OV E M B E R 2 0 1 0 | 35

control of grammatical and lexical resources would free … so work experience can really help you to grow…
them up to reflect upon how modal or epistemic devices … that’s why you’re really interested on it.

could be used to achieve different rhetorical purposes. … world of sport has really changed today …
… the meeting are really serious and …
The aim was to observe how they hedged or boosted their
… have turned out to be really appreciated …
statements; therefore the focus here was on
appropriateness, rather than accuracy. Expert NS and non-native-speaker (NNS) writers, in a
The data used are two small corpora based on student similar argumentative task, might have achieved this
writing produced at the end of the first and second emphasis more formally, for example, by replacing really
semesters. CORPUS 1 was compiled of two short important with crucial, really help you with be of
argumentative writing tasks submitted in the first considerable help, and really appreciated with very much
semester. The handwritten scripts were later keyed into the appreciated.
computer verbatim by the students themselves. I then corrected typographical errors only and analysed the texts using Wordsmith Tools text retrieval software to examine the type and frequency of hedges and boosters occurring in the scripts. A further manual analysis was conducted to disambiguate any items. Corpus 2 was compiled from two further assignments submitted at the end of the second semester and a similar analysis was carried out.

Findings and analysis

In order for a learner corpus to be meaningful it needs to be compared to some kind of norm. For this, I used Hyland & Milton’s (1997:196) taxonomy of the most frequently appearing epistemic items in academic discourse, and observed which of these items occurred in these two learner corpora.

Informal items

The students in this population also used a considerable number of informal items which were not cited in Hyland & Milton’s (1997) taxonomy (see Table 2).

Table 2: Top 10 epistemic devices which occurred in this study

CORPUS 1        No.   %      CORPUS 2        No.   %
can             112   4.8%   all             63    2.1%
all             73    3.2%   can             38    1.3%
everyone        27    1.2%   every           21    0.7%
every           26    1.2%   especially      20    0.7%
really          20    0.9%   according to    19    0.6%
in my opinion   19    0.8%   in my opinion   16    0.5%
especially      19    0.8%   sort of         13    0.4%
must            17    0.7%   really          12    0.4%
extremely       16    0.7%   show            11    0.4%
completely      12    0.5%   completely      10    0.3%

This mirrors previous findings (Hinkel 2005, Hyland & Milton 1997, Milton 1999) which suggest that L2 writers rely more on items from spoken language and conversational discourse. For example, in this particular study, there is an overuse of informal items such as really, which functions as an intensifier:

… not obligatory, they are really important.
… to find people who really like travelling and …
… met outside of school will really help you in your
… it doesn’t really concern only …
… on these facts to be really effective; teachers …

Predominance of central modals

The same central modal verbs will, should, would, could, and the epistemic verb think, appeared in the top 10 tokens of both Corpus 1 and Corpus 2 (see Table 3).

Table 3: Occurrence of central modals in Corpus 1 and 2 (raw figures)

             CORPUS 1   CORPUS 2
No. words    23,470     29,560
No. texts    99         75
could        60         30
couldn’t     1          0
may          7          9
might        5          4
should       39         20
shouldn’t    0          0
would        81         29
wouldn’t     0          0
will         123        31
won’t        0          2
Total        316        125

Again this mirrors previous findings (Hinkel 2005, Hyland & Milton 1997) that both NS and NNS use the same pool of items in their writing, albeit with different frequency patterns. This may be partly developmental or interlingual; it may also be a result of teaching, or the large amounts of attention devoted to these items in textbooks (Hyland & Milton 1997:189). It does, however, seem to suggest that modal verbs are more automatically retrievable or easier to manipulate for NNS writers than lexical modal devices, modal nouns or adverbs.

Epistemic verbs

After central modals, the next most frequent items were epistemic verbs such as think, know and believe, together with usuality markers such as always and usually. Hinkel’s (2005) finding that think rather than believe is preferred by NNS writers is replicated here, with think appearing in third and second position in Corpus 1 and 2 respectively (Table 4). Several studies have confirmed this overuse of I think as a popular sentence builder in L2 writing, occurring three to five times more frequently in NNS writing compared to NS writing (Granger 1998a).
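
Because the two corpora differ in size (23,470 and 29,560 words), raw counts such as those in Table 3 are easier to compare once they are normalised by corpus length. The following Python sketch is not part of the original study, which used Wordsmith Tools; the helper names and the choice of a per-1,000-word rate are assumptions made purely for illustration. The word totals and raw counts are those given in Table 3.

```python
# Illustrative sketch only: normalising raw modal counts by corpus size.
# Word totals and raw counts come from Table 3; the per-1,000-word rate and
# the helper names are assumptions, not the study's own procedure.

CORPUS_WORDS = {"Corpus 1": 23_470, "Corpus 2": 29_560}

RAW_COUNTS = {
    "will":   {"Corpus 1": 123, "Corpus 2": 31},
    "would":  {"Corpus 1": 81,  "Corpus 2": 29},
    "could":  {"Corpus 1": 60,  "Corpus 2": 30},
    "should": {"Corpus 1": 39,  "Corpus 2": 20},
    "may":    {"Corpus 1": 7,   "Corpus 2": 9},
    "might":  {"Corpus 1": 5,   "Corpus 2": 4},
}


def per_thousand(count, total_words):
    """Return occurrences per 1,000 running words."""
    return 1000 * count / total_words


def normalised_table(counts, sizes):
    """Map each item to its per-1,000-word rate in every corpus."""
    return {
        item: {name: per_thousand(n, sizes[name]) for name, n in by_corpus.items()}
        for item, by_corpus in counts.items()
    }


if __name__ == "__main__":
    for item, rates in normalised_table(RAW_COUNTS, CORPUS_WORDS).items():
        cells = "  ".join(f"{name}: {rate:5.2f}" for name, rate in rates.items())
        print(f"{item:<8}{cells}")
```

Normalising in this way matters here because Corpus 2 contains more words than Corpus 1 but was drawn from fewer texts, so raw totals alone can mislead.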

©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.

Table 4: Top 10 modal devices occurring in Corpus 1 and 2 (percentages)

CORPUS 1    No.   %      CORPUS 2   No.   %
will        123   5%     in fact    41    1.4%
would       81    3.5%   think      34    1.2%
think       61    2.7%   will       31    1.2%
could       60    2.6%   could      30    1.1%
should      39    1.7%   would      29    0.9%
always      26    1.1%   always     29    0.9%
in fact     24    1%     quite      28    0.9%
know        19    0.8%   should     20    0.7%
usually     18    0.8%   clear      16    0.5%
possible    15    0.6%   believe    15    0.5%

Predominance of boosters

It is also interesting to note that boosters (which have an amplifying function) rather than hedges (mitigating function) predominate in the list of 10 most frequently occurring devices in this corpus. This may be a result of a mother tongue (L1) fingerprint on L2, although this hypothesis would need to be researched further for an Italian L1 context. Past learning experience or instruction where students are encouraged to express their views assertively may also be a contributory factor.

Sentence position

Previous studies of complexity in L2 writing have found that, possibly because of the multiple demands of the composing process, learners frequently default to safe usages such as thing instead of topic/issue/question. In this corpus, too, the same phenomenon occurs when expressing opinions. For instance, many students in this corpus relied on personal subjectivity markers such as in my opinion, what Hasselgren (1994) might describe as a ‘lexical teddy bear’. This is illustrated in the examples below:

… instead of having a walk with a friend. In my opinion, it would be better spend …
… “real” encounter takes place. In my opinion, to deal with this issue …
… action proposing these two projects. In my opinion, proposal number one is …
… the Car Park and the city centre, yet in my opinion this may be revealed as …
… threatened or highly endangered. In my opinion, we have led our planet …
The first proposal is, in my opinion, a great solution for …
… and stressful sport activity. In my opinion the secret for staying fit …
… the health side to doing sports. In my opinion practicing sports, and …
… too much traffic and much noise. In my opinion a good solution for …

It is also interesting to observe where these hedges are used, and to speculate whether positioning can strengthen writer commitment. In this corpus, in my opinion occurs often at sentence-initial position, in contrast to NS writing where it occurs frequently in a subordinate or clause-initial position. It is arguable whether in my opinion at sentence-initial position has a mitigating or amplifying effect on writer commitment, and whether this effect might change if it were embedded or inserted at clause-initial position.

It may be that NNS writers prefer to use fixed phrases in sentence-initial position because they are often presented in school textbooks in this way, and this makes them implicitly available for uptake by students. This might be something we want to draw students’ attention to when using published materials.

Compound hedges

Despite the predominance of boosters in this learner corpus, there were also some clear attempts to qualify assertions. For example, some students tried to combine devices in a ‘compound hedge’ (Salager-Meyer 1995:155), not always with harmonic results. Nevertheless, it is interesting to note that such clusters, typical of expert or NS writers, also occurred in this corpus (see examples below). This seems to indicate an increasing awareness of the reader–writer relationship in this high-intermediate student population.

If it is possible for me to make a suggestion, my advice would be to try to reduce the number of cars circulating
… or rather I would say that I feel the need to express my opinion concerning …
Personally, I think that imposing a daily “congestion charge” could be a good idea. …
This restriction seems to me not quite right …

Some researchers (e.g. Hyland & Milton 1997) have found that students who modify their statements with more tentative expressions tend to have a higher level of general language proficiency. Others, instead, suggest that although greater linguistic competence is an important prerequisite, it does not automatically imply the parallel development of pragmatic competence (Bardovi-Harlig & Dörnyei 1998:234).

Possible reasons for lack of control of modal devices

Even in this small study of a relatively homogeneous student population there was some variation both in the degree of formality and in the degree of tentativeness. This may be linked to one or more of the following factors:

• language level (even within this relatively homogeneous student population)
• writing competence (as opposed to language competence)
• incomplete register control
• individual differences in communicative style
• cultural differences in rhetorical style.

Discussion and implications for teaching and testing

The findings of this initial experiment indicate that many high-intermediate students in this study used modality markers to express qualification and certainty. Like other


populations previously studied (Hinkel 2005, Hyland & Milton 1997), they tend to overstate rather than hedge their assertions, possibly in a bid to ‘sell’ their ideas, and often default to informal items (e.g. really), creating a degree of writer visibility which may not be appropriate in all types of writing. Also, the narrow range of modal auxiliaries which learners tend to rely on at this stage may not be adequate as they progress to more complex, pragmatically sensitive writing events in future contexts, both academic and professional. Therefore it is important to make learners aware that there is a wider spectrum of linguistic choices available for these purposes and to provide opportunities for them to encounter such alternatives in context.

A further consideration is the improvement of stylistic proficiency, which is an important objective as students progress along the writing continuum. The increasing internationalisation of higher education means that, in order to gain access to English-medium university courses, students need to obtain advanced English language qualifications such as International English Language Testing System (IELTS), CAE (Cambridge English: Advanced²) or CPE (Certificate of Proficiency in English). Testing criteria for these exams, based on the Common European Framework of Reference (CEFR) descriptors, include lexical resources and interactive communication. To meet the required level for C1 and C2, students need to use a wide range of lexis accurately and appropriately to perform interpersonal functions and meet the testing criteria. Therefore a strong learner training component in exam preparation classes could provide learners with strategies to extend their range of lexis and discover alternatives to certain default usages or ‘islands of reliability’ (Dechert 1984:227).

² Previously known as Certificate in Advanced English

For second language learners, increasing their stock of lexis is a particular challenge (Schmitt 2008:329). Research on advanced students’ vocabulary (Ringbom 1998:43) has shown that learners at this level consistently use the 100 most frequent words more often than NS writers. Rundell & Granger (2007) report corpus findings demonstrating that learners writing academic texts use the discourse marker besides about 15 times more frequently than native speakers writing in the same mode. Such findings highlight how expanding lexical resources is a key priority for learners, and how vocabulary acquisition should concern not only content words, but also a range of lexis to perform interpersonal functions such as agreeing, disagreeing or expressing opinion. For example, the findings of this particular study suggest that these students need to develop their repertoire of alternatives to central modals. Sinclair’s (1991) idiom (rather than open choice) principle holds that meaning is attached to the whole phrase rather than the individual parts of it, so teachers may want to draw students’ attention to prefabricated modal chunks (lexical phrases) as they are encountered, as well as individual tokens (modal verbs).

As well as providing opportunities for intentional learning of vocabulary, we need to provide opportunities for incidental learning of vocabulary (Schmitt 2008:353). Students may benefit from exposure to appropriate text models through extensive reading of a variety of text types. In this way they can explore contextualised examples of these devices, notice how they occur typically in discourse, and reflect on their function in each context. For example, the hedges that predominate in the abstract and discussion sections of an academic article are polypragmatic in that they express a degree of uncertainty and therefore humility towards the academic community. Apprentice texts written by advanced-level students (Flowerdew 2000) can also be an excellent source of reading texts for students of slightly lower levels. Attention can be drawn to hedging devices, which are often lexically invisible to learners (Lowe 1996:30), and the possible purpose of these can then be discussed. For example, they may be used to express caution in anticipation of criticism, to show politeness and modesty towards the academic community and wider readership, or to open up a dialogical space, among others.

The following are some suggestions for form-focused instruction and consciousness-raising (CR) activities:

• remove hedges from texts and ask students to discuss the resulting effect on the reader
• ask students to explore the function of multi-word items which naturally occur in the target discourse such as it would seem that, to my knowledge, to some extent or the more informal on the whole in their reading (and notice that they are sometimes embedded in the clause and not in sentence-initial position)
• ask students to distinguish statements in a text which report facts and those which are unproven
• ask students to rewrite an academic essay (which uses hedges and boosters) in a popular journalistic style (which doesn’t) or vice versa (Hyland 2005)
• design persuasive tasks of various kinds on sensitive topics, anticipating the potentially critical views of the reader (Hyland 2005)
• ask students to reformulate texts to accommodate different audiences, and compare the before and after effect on the audience.

Conclusion

This has been a preliminary investigation into an area of learner language which is receiving increasing attention from discourse analysts. The study should be regarded as a point of departure rather than arrival, and the findings are intended to be representative of a specific student population only. Clearly, it would benefit from further quantitative and qualitative analysis and replication in other student populations. Nevertheless, it has thrown up interesting insights about how the students in this setting navigate the ‘area of meaning between Yes and No’, which I have since used to inform my teaching. What it suggests is that we may need to adopt a more systematic approach to raising students’ awareness of these interpersonal features in building reader–writer relationships and fostering effective communication in general. In this way, unlike Chiara in her essay on teenage pregnancies, they can learn to acknowledge the presence of ‘alternative voices’.
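
For anyone replicating the study in another student population, the sentence-position observation about in my opinion could be checked semi-automatically. The Python sketch below is an illustration under stated assumptions, not the procedure used in this study (which relied on Wordsmith Tools and manual analysis): the function names, the punctuation-based test and the two-way initial/embedded split are all hypothetical. The sample strings are four of the learner extracts quoted in the Sentence position section.

```python
import re
from collections import Counter

# Four learner extracts quoted in the article (ellipses as printed).
SAMPLES = [
    "… threatened or highly endangered. In my opinion, we have led our planet …",
    "The first proposal is, in my opinion, a great solution for …",
    "… the Car Park and the city centre, yet in my opinion this may be revealed as …",
    "… and stressful sport activity. In my opinion the secret for staying fit …",
]


def classify_position(text, phrase="in my opinion"):
    """Label each occurrence of `phrase` as 'initial' or 'embedded'.

    Hypothetical rule: an occurrence is sentence-initial when nothing precedes
    it or the preceding text ends in '.', '!' or '?'. Anything else counts as
    embedded. Real corpus work would need manual checking of edge cases.
    """
    labels = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        before = text[:match.start()].rstrip()
        if before == "" or before.endswith((".", "!", "?")):
            labels.append("initial")
        else:
            labels.append("embedded")
    return labels


def position_counts(texts, phrase="in my opinion"):
    """Tally initial versus embedded occurrences across several texts."""
    tally = Counter()
    for text in texts:
        tally.update(classify_position(text, phrase))
    return tally


if __name__ == "__main__":
    print(position_counts(SAMPLES))
```

A rule this simple cannot distinguish clause-initial from mid-clause uses, so it would only provide a first pass before manual disambiguation.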


References and further reading

Bardovi-Harlig, K and Dörnyei, Z (1998) Do language learners recognize pragmatic violations? Pragmatic versus grammatical awareness in instructed L2 learning, TESOL Quarterly 32 (2), 233–262.

Carter, R (2005) What is a frequent word?, paper presented at the international IATEFL conference, Cardiff, 5–9 April, 2005.

Dechert, H (1984) Second language production: Six hypotheses, in Dechert, H, Mohle, D and Raupach, M (Eds) Second Language Productions, Tubingen: Gunter Narr Verlag, 211–223.

Flowerdew, L (2000) Using a genre-based framework to teach organizational structure in academic writing, ELT Journal 54 (4), 369–378.

Granger, S (1998a) Prefabricated patterns in advanced ELT writing: collocations and formulae, in Cowie, A P (Ed.) Phraseology: theory, analysis, and applications, Oxford: Clarendon Press, 145–160.

Granger, S (1998b) The computer learner corpus: a versatile new source of data for SLA research, in Granger, S (Ed.) Learner English on Computer, New York: Pearson Education, 3–18.

Granger, S (2002) A Bird’s eye view of Learner Corpus Research, in Granger, S, Hung, J and Petch-Tyson, S (Eds) Computer Learner Corpora, Second Language Acquisition and Foreign Language teaching, Amsterdam/Philadelphia: John Benjamins Publishing Company, 3–33.

Halliday, M (1985) An Introduction to Functional Grammar, London: Arnold.

Hasselgren, A (1994) Lexical teddy bears and advanced learners: a study into the way Norwegian students cope with English vocabulary, International Journal of Applied Linguistics 4, 237–58.

Hinkel, E (2005) Hedging, inflating and persuading, Applied language learning 15 (1–2), 29–53.

Hunston, S (2002) Corpora in Applied Linguistics, Cambridge: Cambridge University Press.

Hyland, K (2000) Hedges, Boosters and Lexical Invisibility: Noticing Modifiers in Academic Texts, Language Awareness 9 (4), 179–197.

Hyland, K (2005) Metadiscourse, London/New York: Continuum.

Hyland, K and Milton, J (1997) Qualification and Certainty in L1 and L2 Students’ Writing, Journal of Second Language Writing 6 (2), 183–205.

Jarvis, S, Grant, L, Bikowski, D and Ferris, D (2003) Exploring multiple profiles of highly rated learner compositions, Journal of Second Language Writing 12, 377–403.

Lowe, G (1996) Intensifiers and Hedges in Questionnaire items and the Lexical Invisibility Hypothesis, Applied linguistics, 17 (1), 1–37.

McCarthy, M (2009) English Profile. TESOL Talk from Nottingham, retrieved from http://portal.lsri.nottingham.ac.uk/SiteDirectory/TTfN/default.asp

Milton, J (1999) Lexical thickets and electronic gateways, in Candlin, C N and Hyland, K (Eds) Writing: texts, processes and practices, London: Longman, 221–244.

Morgan, B S (2008) The space between Yes and No: how Italian students qualify and boost their statements, in Palawek, M (Ed.) Investigating English Language Learning and Teaching, Poznan-Kalisz: Adam Mickiewicz University, 267–278.

Ringbom, H (1998) Vocabulary frequencies in advanced learner English: A cross-linguistic approach, in Granger, S (Ed.) Learner English on Computer, London & New York: Addison Wesley Longman, 41–52.

Rundell, M and Granger, S (2007) From Corpus to confidence, retrieved from: http://www.macmillandictionaries.com/MED-Magazine/August2007/46-Feature_CorporatoC.htm

Salager-Meyer, F (1995) I Think That Perhaps You Should: A Study of Hedges in Written Scientific Discourse, Journal of TESOL France 2, 127–143.

Schmitt, N (2008) Review article: Instructed second language vocabulary learning, Language Teaching Research 12, 329–363.

Sinclair, J M (1991) Corpus Concordance Collocation, Oxford: Oxford University Press.

Van Els, T, Bongaerts, T, Extra, G, van Os, C, and Janssen-van Dieten, A M (1984) Applied Linguistics and the Learning and Teaching Languages, Edward Arnold: London.


Prompt and rater effects in second language writing performance assessment

GAD S LIM RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a doctoral thesis submitted to the University of Michigan, Ann Arbor (US) in 2009. The PhD was supervised by Professor Diane Larsen-Freeman.

Performance assessments have become the norm for evaluating language learners’ writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test takers’ scores, which can affect the validity, reliability and fairness of these tests.

This study uses data from the Michigan English Language Assessment Battery (MELAB), including all official ratings given over a period of over four years (n=29,831), to examine these issues related to scoring validity. It uses the multi-facet extension of Rasch methodology to model this data, producing measures on a common, interval scale. First, the study investigates the comparability of prompts that differ on topic domain, rhetorical task, prompt length, task constraint, expected grammatical person of response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters’ ratings, whether these are affected by time and by raters’ experience and language background. It also considers whether raters alter their rating behaviour depending on their perceptions of prompt difficulty and of test takers’ prompt selection behaviour.

The results show that test takers’ scores reflect actual ability in the construct being measured as operationalised in the rating scale, and are generally not affected by a range of prompt dimensions, rater variables, test taker characteristics, or interactions thereof. It can be concluded that scores on this test and others like it have score validity and, assuming that other inferences in the validity argument are similarly warranted, can be used as a basis for making appropriate decisions. Further studies to develop a framework of task difficulty and a model of rater development are proposed.
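
The multi-facet extension of Rasch methodology mentioned in this summary places test takers, prompts and raters on one logit scale. As a rough illustration only, the Python sketch below computes rating-category probabilities under a common textbook formulation of the many-facet rating scale model, in which the log-odds of category k over category k−1 equal ability minus prompt difficulty minus rater severity minus a category threshold. The function names, the parameter values and this particular parameterisation are assumptions; the summary gives neither the model specification nor any estimates from the thesis.

```python
import math


def category_probabilities(ability, prompt_difficulty, rater_severity, thresholds):
    """Probabilities of rating categories 0..K for one test taker, prompt and rater.

    All arguments are in logits; thresholds[k-1] is the step from category
    k-1 to category k. The parameterisation is an assumed textbook form.
    """
    # Cumulative sums of (ability - difficulty - severity - threshold_k);
    # category 0 corresponds to the empty sum, i.e. 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + ability - prompt_difficulty - rater_severity - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probabilities):
    """Expected rating given a list of category probabilities."""
    return sum(k * p for k, p in enumerate(probabilities))


if __name__ == "__main__":
    thresholds = [-1.0, 0.0, 1.0]  # hypothetical values for a 4-category scale
    lenient = category_probabilities(0.5, 0.0, -0.5, thresholds)
    severe = category_probabilities(0.5, 0.0, 0.5, thresholds)
    print(round(expected_score(lenient), 2), round(expected_score(severe), 2))
```

In this formulation the same test taker on the same prompt receives a lower expected rating from a more severe rater, which is the kind of rater effect the study set out to detect.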

Computer-based and paper-based writing assessment: A comparative text analysis

LUCY CHAMBERS RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a Master’s thesis submitted to the Faculty of Arts, Law and Social Sciences, Anglia Ruskin University in 2007. The research was funded by Cambridge ESOL. The MA was supervised by Dr Sebastian Rasinger.

This MA research focused on Cambridge ESOL’s Preliminary English Test (PET).

In 2007 Cambridge ESOL was starting to launch computer-based versions of many of its paper-based tests. Thus it was important that the issues of comparability between administration modes were explored. This study focuses on the skill of writing and builds on research from overall score and writing sub-element score comparability studies. Unlike the majority of current research, which focuses on score comparability, this study focuses on the comparability of text and linguistic features. Features studied include lexical range and sophistication, text length and organisation and surface features such as capitalisation and punctuation.

The study is set within an ESOL assessment environment and is in two parts. The first part is a qualitative analysis of a small sample of scripts that also acts as a pilot for part two. Tasks from Cambridge ESOL’s Preliminary English Test (PET) are used and the resulting scripts from paper-based and computer-based administrations analysed.

In the second and main part of the study scripts produced from a live PET administration were studied. Two samples of texts were chosen; these samples were matched on candidates’ proficiency and the country in which they sat the exam. A number of linguistic and text features were analysed. Texts were found to be comparable in text length, surface features and lexical error rates. However, there were differences in lexical variation and in the number of sentences and paragraphs produced. It is recommended that these results be considered a starting point from which to further explore text-level differences across writing modes, covering additional first languages, proficiency levels and writing genres. Results from this and future studies can help inform rater training and provide information for teachers and candidates. For more details on this study see Chambers (2008).

References

Chambers, L (2008) Computer-based and paper-based Writing assessment: a comparative text analysis, Research Notes 34, 9–15.


A study of the context and cognitive validity of a BEC Vantage Test of Writing

HUGH BATEMAN ASSESSMENT AND OPERATIONS GROUP, CAMBRIDGE ESOL

This short summary is based on a Master’s thesis submitted to Anglia Ruskin University in 2008. The research was funded by Cambridge ESOL. The MA was supervised by Dr Sebastian Rasinger.

This study applied Weir’s (2005) socio-cognitive framework to investigate context and cognitive validity of the Writing component of a test of English in a business context. Cognitive validity was investigated primarily through a small-scale, qualitative study which used verbal protocol analysis to establish whether one of the test tasks activated the same cognitive processes as similar tasks in the real-life workplace. Cognitive validity was found to be high. All three subjects displayed the same five stages of cognitive processing in completing the test task and the real-life task. However, there was no evidence in either task of a sixth stage identified in the above framework, in which writers organise ideas in a pre-linguistic form. It seems probable that the lack of an organisation phase is related to the brevity of the tasks rather than their English for Specific Purposes (ESP) nature. The fine-grained processing operations of all three subjects were very similar for both tasks in the translation and monitoring phases. Two of the three subjects displayed very similar micro-planning operations on both tasks; however, micro-planning of the third subject’s test task was influenced by a desire not to exceed the word limit specified in the task. Consideration of the word limit also influenced one subject’s macro-planning of the test task, and all three subjects engaged in considerably more macro-planning for the test task than for their real-life task. However, there was no evidence that macro-planning was affected by completing the test task on paper rather than on computer. All three subjects engaged in similar revising activity on both tasks. There was no evidence that the limits on major revisions to wording or structure that apply when handwriting a test task resulted in cognitive processing operations different from those in a word-processed task. For a summary of the part of the study that investigated the test’s context validity (specifically, the linguistic demands the test made of the candidates who took it), see Bateman (2009).

References

Bateman, H (2009) Some evidence supporting the alignment of an LSP Writing test to the CEFR, Research Notes 37, 29–34.

Weir, C J (2005) Language Testing and Validation: An Evidence-Based Approach, Basingstoke: Palgrave Macmillan.

Models of supervision – some considerations

JULIET WILSON CUSTOMER SERVICES GROUP, CAMBRIDGE ESOL

This short summary is based on the report submitted as part of the requirements for an MA in TESOL at the University of London in 1996. The thesis was supervised by Dr John Norrish.

My report Models of Supervision – some considerations was concerned with aspects of teacher supervision. After summarising various historical approaches to teacher supervision and feedback, I outlined some of the factors which need to be taken into account when evaluating the potential of these different models, including the education/training debate, the issue of teacher evaluation and the resulting roles that a supervisor may be called upon to carry out. The report went on to consider two case studies – my experiences as a trainer on a pre-service certificate course at a Further Education College in London, and as a supervisor at a secondary school in Malta during the practicum of the Teacher Education and Training module of the MA. I explored the limitations and successes of these two experiences and showed how the work I did in Malta modified my view of the supervisory process and led me to draw some tentative conclusions about the advantages of a non-evaluative, co-operative approach to teacher training.


A framework for analysing and comparing CEFR-linked certification exams

MARYLIN KIES CENTRE EXAMINATION MANAGER, UNIVERSITY OF SIENA, ITALY

This short summary is based on a Master’s dissertation submitted to the University of London Institute of Education in 2009. It was supervised by Dr Amos Paran.

Communication and transparency are fundamental ideals underlying the Council of Europe Common European Framework of Reference (CEFR). The CEFR has facilitated communication immensely, as teachers, students, publishers, policy makers and examination boards all now make reference to the CEFR levels. Transparency, however, presents a greater challenge, at least regarding language certification. Although test users may presume that exams pegged to the same CEFR level are ‘in some way equivalent’ or at ‘exactly the same level’ (COE 2009:4), this is not necessarily so, as the Council of Europe (COE) encourages diversity. Moreover, interpretation of the CEFR specifications varies considerably and no overseeing authority monitors claims of linkage. As students and aspiring employees normally choose the certification exams recognised, required or offered by institutions and employers, these latter must set their policies wisely.

This study suggested how institutional and professional test users may analyse and compare certification exams linked to the CEFR using three sets of criteria: Weir’s (2005) socio-cognitive validity framework to evaluate overall test validity, the CEFR scales to evaluate the extent to which these are addressed in test tasks and the COE’s (2009) Manual for relating language examinations to the CEFR to assess the validity of linkage to a CEFR level. To illustrate this procedure, two 4-skills B1 certification exams in English for speakers of other languages were compared: Cambridge ESOL’s Preliminary English Test and Trinity College London’s Integrated Skills in English 1. The resulting analysis revealed that even exams that are similar in terms of their characteristics, aims and recognition might not equally satisfy an institutional or professional test user’s requirements.

References

Council of Europe, Language Policy Division (2009) Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR) A Manual, Strasbourg: Language Policy Division.

Weir, C J (2005) Language Testing and Validation: An Evidence-Based Approach, Basingstoke: Palgrave Macmillan.

IRT model fit from different perspectives


MUHAMMAD NAVEED KHALID RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

This short summary is based on a doctoral thesis submitted second problem pertains to the importance of DIF, i.e. the
to the University of Twente (Netherlands) in 2010. The PhD effect size, and related problem of defining a stopping rule
was supervised by Professor Cees A W Glass. for the searching procedure. Simulations show that the
importance of DIF and the stopping rule can be based on
The chapters in this thesis are self-contained; hence they the estimate of the difference between the means of the
can be read separately. ability distributions of the studied groups of respondents.
In Chapter 2, item bias or differential item functioning The searching procedure is stopped when the change in this
(DIF) is seen as a lack of fit to an IRT model. It is shown that effect size becomes negligible.
inferences about the presence and importance of DIF can Chapter 3 presents the measures for evaluating the most
only be made if DIF is sufficiently modelled. This requires a important assumptions underlying unidimensional item
process of so-called test purification where items with DIF response models such as subpopulation invariance, form of
are identified using statistical tests and DIF is modelled item response function, and local stochastic independence.
using group-specific item parameters. In the present study, These item fit statistics are studied in two frameworks. In a
DIF is identified using a Lagrange multiplier statistic. The frequentist MML framework, LM tests for model fit based on
first problem addressed is that the dependency of these residuals are studied. In the framework of LM model tests,
statistics might cause problems in the presence of relatively the alternative hypothesis clarifies which assumptions are
large number DIF items. However, simulation studies show exactly targeted by the residuals. The alternative framework
that the power and Type I error rate of a step wise procedure is the Bayesian one. The PPCs is a much used Bayesian
where DIF items are identified one at a time are good. The model checking tool because it has an intuitive appeal, and

©UCLES 2010 – The contents of this publication may not be reproduced without the written permission of the copyright holder.
42 | C A M B R I D G E E S O L : R E S E A R C H N OT E S : I SS U E 4 2 / N OV E M B E R 2 0 1 0

is simple to apply. A number of simulation studies are presented that assess the Type I error rates and the power of the proposed item fit tests in both frameworks. Overall, the LM statistic performs better in terms of power and Type I error rates.

Chapter 4 presents fit statistics that are used for evaluating the degree of fit between the chosen psychometric model and an examinee’s item score pattern. A person fit statistic reflects the extent to which the examinee answered test questions according to the assumptions and description of the model. Frequentist tests such as the LM test and tests with Snijders’ correction (which take into account the estimation of the ability parameter) are compared with PPCs. Simulation studies are carried out using a number of fit statistics in a number of combinations in both frameworks.

In Chapter 5, a method based on structural equation modelling (or, more specifically, confirmatory factor analysis) for examining measurement equivalence is presented. Top-down and bottom-up approaches were evaluated for constructing nested models. A comprehensive comparative simulation study is carried out to explore the factors that have an impact on performance in detecting DIF items.
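
The stepwise procedure summarised for Chapter 2 above, in which DIF items are identified one at a time and then modelled with group-specific parameters, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `stepwise_dif`, the caller-supplied `statistic` function, the toy numbers and the stopping rule (a fixed critical value, here the 5% chi-square value for one degree of freedom) are all assumptions made for illustration only.

```python
from typing import Callable, Hashable, List, Sequence


def stepwise_dif(
    items: Sequence[Hashable],
    statistic: Callable[[Hashable, List[Hashable]], float],
    critical_value: float,
) -> List[Hashable]:
    """Flag DIF items one at a time.

    statistic(item, flagged) is a hypothetical, caller-supplied function that
    returns a DIF test statistic for `item`, computed under a model in which
    the items in `flagged` already have group-specific parameters.
    """
    flagged: List[Hashable] = []
    while True:
        candidates = [i for i in items if i not in flagged]
        if not candidates:
            break
        # Re-evaluate every remaining item given the items flagged so far.
        scores = {i: statistic(i, flagged) for i in candidates}
        worst = max(scores, key=scores.get)
        if scores[worst] <= critical_value:
            break  # no remaining item exceeds the critical value
        flagged.append(worst)
    return flagged


def toy_statistic(item, flagged):
    # Made-up values: "q3" has strong DIF; "q1" only appears biased
    # until "q3" has been given group-specific parameters.
    if item == "q3":
        return 12.0
    if item == "q1":
        return 1.0 if "q3" in flagged else 5.0
    return 0.5


# Assumed critical value: chi-square, 1 degree of freedom, alpha = 0.05.
print(stepwise_dif(["q1", "q2", "q3", "q4"], toy_statistic, critical_value=3.84))
```

In the toy data, testing all items in a single pass would flag both "q1" and "q3", whereas the one-at-a-time search flags only "q3". This is the kind of dependency between statistics that the stepwise approach is meant to handle.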

Conferences and publications

IACAT conference

LAURA COPE AND TAMSIN WALKER, RESEARCH AND VALIDATION GROUP, CAMBRIDGE ESOL

The first conference of the International Association for Computerized Adaptive Testing (IACAT) was held from 7 to 9 June 2010 in Arnhem, the Netherlands. The conference was hosted by the Research Center for Examination and Certification (RCEC), a partnership between Cito and The University of Twente. Around 130 delegates from over 30 countries attended, representing a wide range of interests: research, education, assessment, medical and commercial. Experience of CAT testing ranged from those who were attending due to an initial interest, to organisations which already employed CAT tests, to psychometricians who specialised in CAT.

Of the three workshops which were run on the first morning, Research and Validation attended ‘Item Selection, Exposure Control, and Test Specifications in CAT’, given by Bernard Veldkamp of RCEC. This described the use of linear programming – mathematically defining requirements as a function which is then solved for an optimal solution – to provide an optimal set of test items which conform to a set of test requirements. These requirements can be: quantitative, such as the item difficulty; categorical, such as the task type; or logical, for instance, sets of items which cannot be used in the same test. Requirements can be specified from item level through to multiple-test level. The workshop included practical exercises in the formal specification of requirements. Once the requirements are defined, software packages are able to provide solutions within a split second. For CAT tests, an optimal linear test, the ‘shadow test’, is assembled online after each candidate response. After taking into account those items already used, the next item is picked from this set of items rather than the whole item pool.

Brian Bontempo from Mountain Measurement, USA, delivered an interesting presentation entitled ‘The theoretical issues that are important to operational adaptive testing’, the emphasis of which was the requirement for more research on operational and practical solutions to CAT problems, rather than theoretical work which is not implementable. The presentation examined item pool usage and argued that despite all of the research that has been focused on designing exposure control mechanisms, item exposure is only important practically if item parameter drift occurs. The question ‘how much exposure is too much?’ was left unanswered; however, the need to continually monitor items after calibration to check for item parameter drift was emphasised. The need to maximise the usage of items in terms of both over- and under-exposure was also stressed, since a lot of research has focused on the prevention of item over-exposure. The presentation made the point that lowering the difficulty of items can improve test performance (psychologically) – this was a popular theme across a number of presentations. Finally, the question ‘what should you do if the computer crashes mid-test?’ was contemplated, with possible suggestions being to restart the test, start from the point at which the test stopped, or start the test again the next day.

A useful overview on ‘How to make adaptive testing more efficient’ was given in a keynote presentation by Wim van der Linden. He suggested individualising the start of a test using collateral information, such as using data from previous instruction, or asking the candidate for a self-rating assessment. The use of covariates such as response times can also speed up convergence of the ability estimate. Improving the efficiency of item writing by rule-based item generation (cloning) was suggested; the efficiency of item calibration can then be improved by pretesting item families, rather than all individual items. Approaches to the optimal assembly of item pools, such as item pool rotation (which helps with the issues of both over- and under-exposure), and the idea of creating a pool as a set of test forms, each of which meets test requirement constraints, were covered.

ALTE events

Participants from as far afield as Chile, Libya and Qatar, together with others from the Czech Republic, Denmark,


Germany, Spain and the UK signed up for the ALTE summer testing courses which took place from 20 to 24 September, and from 27 September to 1 October. These courses were hosted by the Basque Government, ALTE’s Basque member, at the Royal Academy of the Basque Language in Bilbao. The first course was an Introductory Course in Language Testing run by Professor Cyril Weir and Dr Lynda Taylor, and the second was an Introduction to Testing Reading run by Dr Hanan Khalifa and Dr Ivana Vidaković from Research and Validation.

Later in the year, ALTE’s 39th meeting and conference will take place at the Charles University in Prague from 10 to 12 November. As at previous meetings, the first two days will include a number of Special Interest Group meetings, and workshops for ALTE members and affiliates, and the third day will be an open conference day for anyone with an interest in language testing. The theme of the conference is ‘Fairness and Quality Management in Language Testing’ and the speakers at the conference will include Professor Antony Kunnan and Dr Piet van Avermaet, as well as Dr Neil Jones, Juliet Wilson and Mike Gutteridge from Cambridge ESOL. Juliet, Mike and Dittany Rose will also run workshops on the two days prior to the conference day.

Just prior to the Prague conference, ALTE is launching the first of its Tier 3 language testing courses with a 2-day course on The Application of Structural Equation Modelling (SEM) in Language Testing Research on 8 and 9 November. The course will be run by Dr Ardeshir Geranpayeh from Research and Validation. This is an advanced course in language testing (ALTE Tier 3) and is aimed at experienced and knowledgeable language testing professionals. The Tier 3 courses complement the Foundation Courses (Tier 1) and Introductory Courses (Tier 2) which are already well established. Following the conference, on 13 November, ALTE will continue its programme of Foundation Courses when Annie Broadhead will run a general introduction to language testing.

The call for papers for the ALTE 4th International Conference to be held in Kraków, Poland from 7 to 9 July 2011 is already open and will run until the end of January 2011. We encourage you to submit a paper for the conference, and reflecting ALTE’s commitment to multi-lingualism, papers can be submitted in English, French, German, Italian, Polish and Spanish. The theme of the conference is ‘The Impact of Language Frameworks on Assessment, Learning and Teaching viewed from the perspectives of policies, procedures and challenges’ and the plenary speakers are Professor Lyle Bachman, Professor Giuliana Grego Bolli, Dr Neil Jones, Dr Waldemar Martyniuk, Dr Michaela Perlmann-Balme and Professor Elana Shohamy.

For further information about these events and other ALTE activities, please visit the ALTE website – www.alte.org

Studies in Language Testing

The last 12 months have seen the publication of three more titles in the Studies in Language Testing series, published jointly by Cambridge ESOL and Cambridge University Press. Volume 29, authored by Hanan Khalifa and Cyril J Weir, is entitled Examining Reading: Research and practice in assessing second language reading. This volume develops a theoretical framework for validating tests of second language reading ability. The framework is then applied through an examination of the tasks in Cambridge ESOL Reading tests from a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors show how an understanding and analysis of the framework and its components can assist test developers to operationalise their tests more effectively, especially in relation to the key criteria that differentiate one proficiency level from another.

Key features of the book include: an up-to-date review of the relevant literature on assessing reading; an accessible and systematic description of the different proficiency levels in second language reading; and a comprehensive and coherent basis for validating tests of reading. This volume is a rich source of information on all aspects of examining reading ability. As such, it will be of considerable interest to examination boards wishing to validate their own reading tests in a systematic and coherent manner, as well as to academic researchers and graduate students in the field of language assessment more generally. This is a companion volume to the previously published Examining Writing (Shaw & Weir 2007).

Volume 31, co-edited by Lynda Taylor and Cyril J Weir, is entitled Language Testing Matters: Investigating the wider social and educational impact of assessment – Proceedings of the ALTE Cambridge Conference, April 2008. It explores the social and educational impact of language testing and assessment, at regional, national and international level, by bringing together a collection of 20 edited papers based on presentations given at the 3rd international conference of the Association of Language Testers in Europe (ALTE) held in Cambridge in April 2008.

The selected papers focus on three core strands addressed during the conference. Section One considers new perspectives on testing for specific purposes, including the key role played by language assessment in the aviation industry, in the legal system, and in migration and citizenship policy. Section Two contains insights on testing policy and practice in the context of language teaching and learning in different parts of the world, including Africa, Europe, North America and Asia. Section Three offers reflections on the impact of testing among differing stakeholder constituencies, such as the individual learner, educational authorities, and society in general.

Key features of the volume include: up-to-date information on the impact of language testing and assessment in a wide variety of social and educational contexts worldwide; accounts of recent research into the profiling of language proficiency levels and into cheating in tests; insights into new areas for testing and assessment, e.g. teacher certification, examinations in L2 school systems, testing of intercultural competence; discussion of the relationships among different test stakeholder constituencies.

With its broad coverage of key issues, combining theoretical insights and practical advice, this volume is a valuable reference work for academics, employers and policy-makers in Europe and beyond. It is also a useful resource for postgraduate students of language testing and for practitioners, i.e. teachers, teacher educators,


curriculum developers, materials writers, and anyone seeking greater understanding of the social and educational impact of language assessment.

July 2010 saw the publication of another title in the Studies in Language Testing series, published jointly by Cambridge ESOL and Cambridge University Press. Volume 32, by Toshihiko Shiotsu, is entitled Components of L2 Reading: Linguistic and processing factors in the reading test performances of Japanese EFL learners.

This latest volume investigates the linguistic and processing factors underpinning the reading comprehension performance of Japanese learners of English. It describes a comprehensive and rigorous empirical study to identify the main candidate variables that impact on reading performance and to develop appropriate research instruments to investigate these. The study explores the contribution to successful reading comprehension of factors such as syntactic knowledge, vocabulary breadth and reading speed in the second language.

Key features of the book include: an up-to-date review of the literature on the development and assessment of L1 and L2 reading ability; practical guidance on how to investigate the L2 reading construct using multiple methodologies; and fresh insights into interpreting test data and statistics, and into understanding the nature of L2 reading proficiency. This volume will be a valuable resource for academic researchers and postgraduate students interested in investigating reading comprehension performance, as well as for examination board staff concerned with the design and development of reading assessment tools. It will also be a useful reference for curriculum developers and textbook writers involved in preparing syllabuses and materials for the teaching and learning of reading.

Information on all the volumes published in the SiLT series is available at: www.CambridgeESOL.org/what-we-do/research/silt.html
