Multimodality and Cognitive Linguistics
2015. John Benjamins Publishing Company. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
Special issues of established journals tend to circulate within the orbit of the
subscribers of those journals. For the Benjamins Current Topics series a number
of special issues of various journals have been selected containing salient topics of
research with the aim of finding new audiences for topically interesting material,
bringing such material to a wider readership in book format.
For an overview of all books published in this series, please see
http://benjamins.com/catalog/bct
Volume 78
Multimodality and Cognitive Linguistics
Edited by María Jesús Pinar Sanz
These materials were previously published in Review of Cognitive Linguistics 11:2
(2013).
doi 10.1075/bct.78
Cataloging-in-Publication Data available from Library of Congress:
lccn 2015021640 (print) / 2015028826 (e-book)
isbn 978 90 272 4266 2 (Hb)
isbn 978 90 272 6801 3 (e-book)
María Jesús Pinar Sanz is a Lecturer in Linguistics and Discourse Analysis at the
University of Castilla-La Mancha (Spain). Her research interests are in multimod-
al discourse analysis and, more specifically, in aspects related to the analysis of
election campaigns, political advertising and ethnic humour. She has published
several articles on the generic structure of political ads, ethnic humour and the
relationship between the verbal and visual elements not only in political texts but
also in children’s narratives.
Lorena Pérez Hernández, PhD, has worked as an Associate Professor at the Uni-
versity of La Rioja (Spain) since 2001. She is an Associate Editor of The Metaphor
and Metonymy Bibliography (John Benjamins), and a member of the editorial
board of Journal of English Studies. Her research has been published in inter-
national journals such as Metaphor and Symbol, Journal of Pragmatics, Applied
Linguistics, and Language and Communication. Since 1996, she has collaborated as a
linguistics consultant with the marketing company Lexicon Branding, Co. (San
Francisco, USA).
functional linguistics since the early 1990s. He is currently head of the doctoral
training research group on the ‘Textuality of Film’ at the University of Bremen, as
well as several third-party funded projects on the application of linguistic meth-
ods to filmic analysis.
Charlotte Fofo Lomotey is a PhD student and a Research Assistant in the Applied
Linguistics Laboratory at Texas A&M University-Commerce. She received her
MPhil. (Applied Linguistics) at the University of Education, Winneba, in Ghana.
Prior to coming to Commerce, she taught Linguistics courses at the UEW. Her
research interests include dialectal differences, ELF/ESL, Language Documenta-
tion, Discourse and Prosody.
Salvatore Attardo holds a PhD in English Linguistics from Purdue University and
is Professor of Linguistics and Dean of the College of Humanities, Social Sciences,
and Arts at Texas A&M University-Commerce. His research is focused primarily
on humor studies and pragmatics.
This volume includes 13 papers dealing with Multimodality and Cognitive Lin-
guistics. The introduction provides an overview of three of the main approaches
dealing with multimodality – Cognitive Linguistics and multimodal metaphors
(Forceville & Urios-Aparisi, 2009), social semiotics and systemic functional lin-
guistics, and multimodal interactional analysis (Jewitt, 2009, p. 29). The paper
summarizes the contributions to the volume, highlighting the main objectives
and conclusions of each of the papers.
1. Introduction
doi 10.1075/bct.78.01pin
© 2015 John Benjamins Publishing Company
Among the current scholars from different persuasions dealing with multi-
modality, we find Gibbons (2011), who provides a set of critical tools for analyz-
ing the cognitive impact of multimodal literature. Bateman (2011) presents an
approach to analyzing page-based documents that combine text, graphics and
pictures in different layouts. In turn, Jewitt (2009) surveys a variety of theoretical
approaches which have looked at multimodal communication and representa-
tion, including visual studies, anthropology and socio-linguistics, among other
disciplines. The papers in Ventola and Moya (2009) discuss the relationship be-
tween the discourses that “tell” and visuals that “show”. Jones and Ventola (2008)
explore the ways in which multimodality influences the work of linguists, lin-
guistic description and application. O’Halloran (2011) proposes a distinct mul-
timodal studies field as both the mapping of a domain of enquiry, and as the site
of the development of theories, descriptions and methodologies specific to and
adapted for the study of multimodality. Corpus Linguistics and multimodality
are addressed in Knight (2013), who looks at possible directions in the construction
and use of multimodal corpus linguistics. Similarly, Sindoni (2013) reconsiders
underlying linguistic and semiotic frameworks of analysis of spoken and written
discourse, in keeping with a multimodal corpus linguistics theoretical frame-
work. Finally, within Cognitive Linguistics, the papers compiled in Forceville and
Urios-Aparisi (2009) discuss metaphors drawing on combinations of visuals, lan-
guage, gestures, sound, and music.
What distinguishes this volume from existing work in the field is that it
brings together contributions from three of the main approaches dealing with
multimodality – Cognitive Linguistics and multimodal metaphors (Forceville &
Urios-Aparisi, 2009), social semiotics and systemic functional grammar, and
multimodal interactional analysis (Jewitt, 2009, p. 29) – highlighting the
importance of multimodal resources and showing the close relationship between
this field of study and Cognitive Linguistics as applied to a variety of genres,
including comics, films, cartoons, and visuals in tapestry, to name a few.
The present volume is structured in three parts. The first one is rooted in
Cognitive Linguistics and focuses on non-verbal and multimodal metaphor – for
a state-of-the-art panorama, see the papers in Forceville & Urios-Aparisi (2009).
The second part follows Hallidayan Systemic Functional Linguistics in a double
perspective: Social semiotic multimodality and multimodal discourse analysis –
for detailed discussion see Jewitt (2009); and the third part draws upon Norris’
model of multimodal interaction (Jewitt, 2009; Norris, 2004, 2011).
CMT is ultimately a language-based theory. The author intends to aid CMT and
multimodality scholarship by examining the concept HOME in a variety of instantiations
of the metaphor PURPOSIVE ACTIVITY IS MOVEMENT TOWARD A DESTINATION
in animation films. His contribution shows that (a) analysing the metaphor
under study presupposes understanding “home” as a symbol; (b) animation has
medium-specific affordances to implement the metaphor; (c) the metaphor com-
bines embodied and cultural dimensions.
In “Woven emotions: Visual representations of emotions in Medieval English
textiles”, Díaz Vera explores how the same conceptual metaphors underlie the
expression of Old English emotions in both the language and the visual modes.
The author analyzes the pictorial representations of emotions in the Bayeux Tapestry,
an 11th-century embroidered cloth that narrates and depicts the events that
led up to the Norman Conquest of England and the invasion itself. His analysis
shows that (1) Anglo-Norman artists used a well-organised set of visual stimuli to
convey emotion-related meanings in a patterned way, that (2) the same idealized
conceptual models are shared by verbal and visual modalities and that (3) where-
as verbal expressions of emotions regularly draw on non-embodied behavioural
concepts, visual representations show a clear preference for embodied container
concepts.
In “Approaching the utopia of a global brand: the relevance of image schemas
as multimodal resources for the branding industry”, Pérez Hernández explores
the relevance of image schemas and related multimodal image schematic meta-
phors and metonymies in the branding industry. The author argues that image
schemas represent an efficient cognitive tool for the purpose of creating global
brands, as they have an experiential basis and are largely pervasive across cultures
and languages. In addition, she argues that the universal nature of image schemas
can be maximized through their multimodal expression. Her main claim is that
the multimodal and systematic use of image schemas in the process of brand cre-
ation and the final output may provide branding professionals with an inventory
of sound and ready-to-use multimodal resources for the design of global brands.
Diana E. Popa’s contribution “Multimodal metaphors in political entertain-
ment” attempts to shed light on the issue of multimodal metaphor in political
entertainment, with special attention to the ways in which the verbal, visual and
auditory modalities employed contribute to the construal of the multimodal met-
aphor and the functions of multimodal metaphors in animated political cartoons.
The paper deals with the way entertaining politics relies on multimodal metaphors
to (a) explain the significance of real-life events and characters by
means of imaginary scenarios, (b) persuade people, (c) propagate a critical stance
towards somebody or something, and (d) provide information about political is-
sues, events and players that no other medium could openly transmit.
The second part of the volume delves into multimodality and its relationship with
Cognitive Linguistics and Systemic Functional Linguistics (henceforth SFL). Al-
though it may be argued that the latter approach “is in many respects too heavily
biased by its roots in linguistics” (Forceville, 2010, p. 59), recent studies show
that there have been significant advances in the development of other modalities
apart from language in Systemic Functional Linguistics (Böck & Pachler, 2013;
Jewitt, 2009; Jones & Ventola, 2008; O’Halloran, 2011; Ventola & Moya, 2009,
among others). The papers in this section deal with the construction of visual
metaphor from a Social Semiotic Approach, the use of semiotic metaphors and
visual metonymies within the framework of Systemic Functional Linguistics, the
cognitive mechanisms involved in the creation and interpretation of multimodal
texts and the way in which different semiotic channels provide different kinds of
information.
The first contribution in this section, “The visual representation of metaphor:
A social semiotic approach”, by Feng and O’Halloran, combines El Refaie’s (2003)
views of visual metaphor as “the pictorial expression of metaphorical thinking”
and Carroll’s (1996) and Forceville’s (1996) definition of visual metaphors in terms
of “their surface realization or formal characteristics”. While Feng and O’Halloran
agree with both definitions, they further ask (a) how metaphors are visually ex-
pressed or realized, (b) what the metaphor resources in visual images are, and
(c) how these resources work to construct metaphor. The authors consider that
visual images do not build spatial relations but are complex metafunctional con-
structs (Halliday & Matthiessen, 2004), integrating representational, interactive
and compositional meanings (Kress & van Leeuwen, 2006). In their approach, the
metafunctional resources are seen as metaphor potential, and the authors explore how
these construct pictorial metaphors. Feng and O’Halloran employ the social semiotic
theory of intersemiotic relations to explain the complex image-text interaction
in visual metaphor, and conclude that (a) social semiotic visual grammar
can provide a comprehensive account of the visual construction of metaphor, and
(b) conceptual metaphor theory lends epistemological status to such a grammar.
While Feng and O’Halloran explore the way visual metaphors are construct-
ed, in “Visual metonymy in children’s picture books”, Moya explores visual me-
tonymies, the other main trope within Cognitive Linguistics alongside metaphor.
After outlining the main features of the concept of visual metonymy, the author
examines the discourse functions of the metonymies and interprets the data in
functional terms. The data make it evident how visual metonymies are useful
strategies to convey representational meaning and create engagement in picture
books. All in all, the aim of the paper is to show how the use of visual metonymies
The third part focuses on Cognitive Linguistics and multimodal interaction and is
based on the analysis of human interaction. Multimodal interaction is concerned
with what individuals express and how others react to or perceive it in interaction
(Norris, 2004, p. 4; Norris, 2011). Norris studies the embodied (language, gesture
and gaze) and disembodied modes (music, print and layout) used by people in
interaction. She claims that human interaction cannot be explained if the human
mind is not taken into consideration since “a person always thinks, perceives,
and/or feels something when interacting with others, and at least some of these
thoughts, perceptions, and/or feelings are communicated through a person’s ac-
tions” (Norris, 2004, p. xi). The papers in this section will show that different
modes of communication are structured in different ways. Thus, some of the is-
sues explored are the perception and marking of humour in conversational texts,
human perception and the intersubjective coordinated patterns that move hu-
mans to interaction.
In López-Varela’s “Intermedial Cognitive Semiotics: some examples of multimodal
cueing in virtual environments”, intermediality is studied in the light of
cognitive-semiotic concerns and insights from digital environments. Human perception
is re-examined, as is the role of shared attention in communication. The paper
explores spatial and temporal cueing (eye contact and the sonic modality) from
a task-oriented and socially interactive perspective that highlights their importance
in intersubjective communication. The importance of multimodal mirror-neuron
mappings, index assignment and pointers in cognition and discourse, and the
role of affective phenomena in engaging intersubjectivity are highlighted. The ex-
amples proposed explore vision and sound in online collaboration and show the
importance of mediating channels on the spatiotemporal axis of perception. This
is relevant as the slightest cues can have significant impact on communicative
situations.
Attardo, Pickering, and Taherzadeh’s paper “Multimodality in conversational
humor” addresses the issue of multimodal markers of humour in conversational
texts. Their paper seeks to determine whether humor is marked in the texts under
analysis and how. In particular, the paper examines the hypothesis put forth in
Attardo, Pickering, and Baker (2011) that the only consistent marker of conver-
sational humor is smiling. A further hypothesis is investigated: the “marking” of
humor consists of a combination of multimodal items such as prosodic, gestural
and facial expressions having the purpose of framing the interaction as humor-
ous, rather than mechanically “marking” the occurrence of humor. This paper fills
a gap in the field, since very little has been written about the prosodic and multi-
modal markers of humor, with the exception of the markers of irony. The tentative
conclusion is that the prosodic features investigated and smiling/laughter are not
markers, both because they are not consistently associated with the phenomenon
and because they lack integration.
Finally, in “Image schemas and mimetic schemas in cognitive linguistics and
gesture studies”, Cienki explores how cognitive linguistics and gesture studies ap-
proach the study of schemas. This author states that, rather than simply applying
the theoretical constructs from cognitive linguistics to gestures as data, gesture
research raises new questions for schema research and provides new
insights into the role of schemas in cognition. The distinction between image and
mimetic schemas is looked into as well as the way in which each kind supports
lexical semantic analysis of a different kind. The paper explores to what degree
image schemas provide a useful explanatory tool for researching the concrete,
physically embodied details of gestures. The author considers that the research on
‘mimetic schemas’ has a great potential for thinking about some known phenom-
ena of gesture in a new way, and thus schema research provides a useful means
to analyze behavior in another modality involved in spoken language use, namely
the visual.
The collection of papers in this volume is a step towards developing the relationship
between multimodality in its three main forms – social semiotic analysis
and multimodal discourse analysis, multimodal interactional analysis and mul-
timodal metaphors – and cognitive linguistics. Attention has been focused on
language and images, but also on gestures, posture, gazes or voice quality, among
others, how they interplay and the final effect of the interaction of the different
modes, since they are sometimes integrated in unprecedented ways, enacting new
interactional patterns and new systems of interpretation. The contributors have
discussed the need to integrate multimodality in its various forms with cognitive
linguistics in a variety of genres and situations and have highlighted the impor-
tance of studying contextual, socio-cultural backgrounds in both verbal and non-
verbal manifestations.
References
Attardo, S., Pickering, L., & Baker, A. (2011). Prosodic and multimodal markers of humor in
conversation. Pragmatics and Cognition, 19(2), 224–247. DOI: 10.1075/pc.19.2.03att
Bateman, J. (2011). Multimodality and genre: A foundation for the systematic analysis of multi-
modal documents. Basingstoke: Palgrave.
Böck, M., & Pachler, N. (2013). Multimodality and social semiosis: Communication, meaning
making, and learning in the work of Gunther Kress. London/New York: Routledge.
Carroll, N. (1996). A note on film metaphor. Journal of Pragmatics, 26, 809–822.
DOI: 10.1016/S0378-2166(96)00021-5
Cognitive Linguistics
and multimodal metaphor
Elisabeth El Refaie
Cardiff University
This article uses examples of multimodal metaphors from three different genres
in order to develop a new understanding of the nature of creativity in metaphor.
I argue that multimodality provides distinctive opportunities for metaphor cre-
ativity by exploiting the unique affordances of the different semiotic modes and
the possibility of combining them in unexpected ways. Such innovation at the
level of representation may encourage novel thought patterns, I suggest, even
in such cases where the underlying metaphorical mappings are relatively con-
ventional. The notion of “cross-modal resonances” is introduced to emphasize
the role of unconscious, preverbal, intuitive understanding and the emotions in
producing and interpreting creative multimodal metaphors.
1. Introduction
doi 10.1075/bct.78.02elr
© 2015 John Benjamins Publishing Company
2009; Kress & van Leeuwen, 2001), I suggest that multimodality dramatically in-
creases the opportunity for creativity at the level of representation, by exploit-
ing the distinct characteristics and meaning potentials of the various modes and
their combinations. The resulting “cross-modal resonances”, I argue, may encour-
age new insight, but this insight is often of a preverbal, emotional, and intuitive
nature, rather than involving logical processes of mapping knowledge from one
conceptual domain to another.
The second section of this article uses several examples to illustrate and de-
velop my central arguments. In keeping with the idea that creativity enables us
to break out of the mental prison imposed by entrenched patterns of thinking,
my examples of the multimodal metaphors used in an autobiographical comic,
a television commercial, and a political cartoon all use the source domain of in-
carceration to re-conceptualize a particular area of human experience. While the
underlying metaphorical mappings of my examples are all rather conventional,
the particular ways in which they are expressed are unique and highly creative.
death in terms of sleeping – “To sleep? Perchance to dream!” (III, 1, 65) – William
Shakespeare can be said to be using a conventional mapping but extending it to
include the concept of death as a form of nightmare. Elaboration refers to cases
where existing elements of a conventional metaphor are specified in an unusual
way, for example, when a dangerous but exhilarating period in a person’s life is
described as tobogganing down a sheer mountain face. Novel combinations of
two or more metaphors provide another important means of achieving creativity
in poetry.
According to Lakoff & Turner (1989, pp. 89–96), most examples of metaphor
creativity fall into the three categories described above, but they also discovered
some cases of “one-shot” image metaphors, which work by superimposing one
image onto another in the mind of the reader. The mental imagery that results
from such metaphors can be more or less creative, depending on whether or not
it involves the mapping of visual properties that are normally perceived to be
similar. When, for instance, a woman’s body is likened to an hourglass, the result-
ing mental image is conventional, whereas comparing it to the trunk of a weeping
willow would result in a more original visual mapping.2
Lakoff & Turner’s claims regarding the essentially conventional basis of most
poetic metaphors have been criticized for failing to pay sufficient attention to the
essential role of linguistic and textual choices in people’s perceptions of meta-
phor creativity. One of the main premises of critical approaches towards language
and discourse is that our representations of the world have constitutive effects,
in the sense that they shape the way we attend to and understand social reality
(e.g. Mayr, 2008). If we apply the same argument to the analysis of metaphor, we
need to accept that “how we think about a given topic is altered by the metaphors
we regularly hear and employ, so that our mental representations are not whol-
ly antecedent to and independent of metaphorical talk” (Camp, 2006, p. 160).
Thus, the idea that we can limit our consideration of metaphor creativity to the
level of conceptual mappings becomes untenable. Instead, we must accept that
the particular form of a metaphor may have profound effects upon the ways in
which it is conceived and understood. As Semino (2008) points out, even the
most entrenched conceptual metaphor may be reactivated by a uniquely creative
word or by novel patterns of related expressions that may occur within a particu-
lar passage, throughout a text, or across different texts. Metaphor creativity thus
“needs to be considered both in terms of the novelty or otherwise of underlying
conceptual mappings, and in terms of the salience and originality of individual
metaphorical choices and patterns” (Semino, 2008, p. 54).
The opportunities for creativity are vastly increased in the case of multimodal
metaphors. Because of differences in the potentials and limitations of their mate-
rial properties and the way they have been used over many generations in specific
cultures and contexts, semiotic modes have developed distinct ways of expressing
similar meanings, as well as displaying a tendency towards specialization (Kress,
2009; Kress & van Leeuwen, 2001). Language is perhaps more suited to the rep-
resentation of actions and causality, for instance, while the spatial organization of
images may “lend itself with greater facility to the representation of elements and
their relation to each other” (Kress, 2000, p. 147). In Western culture, the word-
image distinction has also long been freighted with strong value judgments, with
writing being associated with high culture and learning, and images with popular
culture and illiteracy.
For these reasons, meanings from one mode cannot be translated exactly into
another: “No text has the exact same set of meaning-affordances as any image.
No image or visual representation means in all and only the same ways that some
text can mean” (Lemke, 2002, p. 304). It is this essential “incommensurability” of
semiotic modes that allows genuinely new meanings to be created through their
combination. According to Lemke (1998, p. 92), when several semiotic modes are
brought together, the possible meanings are thus multiplied rather than simply
added together.
The affordances of the different modes and the meanings that can be created
through their combination clearly have a bearing on the creation and interpreta-
tion of metaphors. For example, pictures are able to exploit more effectively than
language the visual similarities that may exist between the size, shape, texture, or
color of two entities. Consequently, the “image metaphors” that Lakoff & Turner
(1989) have identified as the most creative instances of poetic metaphor are not
at all unusual in visual genres.3 Indeed, metaphors of the CONCRETE IS CONCRETE
variety are prevalent in all non-verbal semiotic modes, where perceptual resem-
blance (i.e., looking, sounding, smelling, feeling similar) often provides the cue
for constructing metaphorical meaning (Forceville, 2009, p. 27).
Apart from perceptual resemblance, we can distinguish between at least the
following other forms that non-verbal or multimodal metaphors may take: Fu-
sion/simultaneous cueing refers to cases where the source and the target of a meta-
phor are fused together into one amalgamated whole, or, in a temporal mode such
as sound or moving images, presented at the same moment. Many multimodal
metaphors are also based on some form of incongruity, where an object or living
being appears to be out of place in its context, thereby challenging expectations
of the “real” or “natural”. The term perceptual echo, finally, may be used to refer to
instances of metaphorical meaning that emerges from the representation of one
entity in a way that strongly calls to mind a different entity (terms adapted and
extended from Forceville, 2009).
The multiplication of meanings that can be achieved through the combination
of semiotic modes is also often exploited in the construction and interpretation of
multimodal metaphors. For instance, one mode may represent the source and tar-
get domains of a metaphor on its own, rendering the other mode(s) wholly or at
least partially redundant. More often, however, the other modes have an augment-
ing or modifying effect, in that they are able to reinforce metaphorical meaning, or
change it by drawing in additional connotations. In complementary relationships
between the semiotic modes, each supplies essential information about either the
source or the target of a metaphor. The many possibilities of evoking metaphorical
meanings through the combination of different semiotic modes, each with
its own unique affordances, thus provide countless opportunities for metaphor
creativity at the level of representation, which, I suggest, may also enable new
connections to be made at the conceptual level.
Rachel Giora’s work (Giora, 1999; Giora et al., 2004) provides a useful way of
theorizing the role of creativity at the expression level of metaphors. In a series
of empirical studies, Giora et al. (2004) discovered that the degree of pleasure
we derive from both verbal and visual metaphors depends on whether they are
“optimally innovative.” Optimal innovation involves the automatic recovery of a
familiar, salient meaning, which is at the forefront of our mind due to frequency,
familiarity, conventionality, and/or prototypicality, while also inviting a non-sa-
lient, qualitatively different interpretation: “It is not a sheer surprise, then, that is
pleasing, but a somewhat novel response assignable to or involving a salient, alto-
gether different response that is gratifying” (Giora et al., 2004, p. 118). This strat-
egy can be considered as a way of “foregrounding” or “estranging” entrenched
salient responses, thereby opening metaphors up to new meaning (Giora et al.,
2004, p. 120). The choice of an unusual form to express a metaphor may thus
be seen as an effective way of foregrounding conceptual mappings that are so
entrenched that they may otherwise be overlooked. This allows the creators of
multimodal metaphors to achieve optimal innovation and encourage increased
mental activity on the part of their audiences.
As mentioned above, the mental activity involved in understanding meta-
phors is described by CMT scholars as a form of “mapping.” This notion, which
is itself a metaphor, suggests a rather mechanistic thought process, whereby each
element of a set is matched up with an element in another set, following certain
logical rules of correspondence. In my view, the intuitive flash of understanding
involved in the interpretation of creative metaphors is captured better through
Max Black’s (1979, p. 20) concept of “resonance,” which, in its most common lit-
eral meaning, refers to the “sound produced by a body vibrating in sympathy
with a neighbouring source of sound” (Collins English Dictionary, p. 1376).4 In
the following discussion I will use the term “multimodal resonances” to describe
the way creative multimodal metaphors are often grasped intuitively and imag-
inatively, through a process that involves a sort of sympathetic vibration, both
between the source and target domain and between the distinct semiotic modes
that are used to represent a metaphor.
At first sight, the metaphor Satrapi employs in this example seems to follow a
highly conventional pattern, since it is not at all unusual for people to understand
the emotional experience of being in an unhappy, constraining relationship in
terms of the more concrete, physical experience of incarceration. On closer in-
spection, however, it turns out to be a lot more creative than it would have been if
the author had expressed the same idea in a purely verbal form.
One way of determining how creative a metaphor is, Hanks (2006) suggests,
is by examining the degree to which the two concepts share semantic properties.
Metaphors that bring together a concrete and an abstract concept, for example,
are always more resonant than those where target and source are closely related,
because the reader must work harder to establish a relevant interpretation. Images
differ from words in that it is simply not possible to represent abstract meaning
visually without recourse to some form of symbolism or metaphor (El Refaie,
2009a, p. 177). In our example (Plate 1) the target domain – the concept of an un-
happy marriage – could not have been expressed literally in pictorial form at all.
The full metaphorical meaning of the drawing only emerges in unison with the
verbal narration, which supplies the target of the metaphor and thus performs a
complementary function. To be precise, the same source domain of incarceration
is first used to represent Marjane’s feelings towards the couple’s shared apartment,
but then, in a second process of mapping, the target of the metaphor shifts onto
the more abstract level of her marriage to an Iranian man and all the cultural
expectations that this entails. So, in the space of just one panel, the conceptual do-
mains that are being brought together in this metaphor have been slightly modi-
fied, moving from a closer to a more distant semantic relation between source and
target, and thereby, according to Hanks’s (2006) theory, acquiring increasing
resonance. This may invoke in the reader a range of associations between the home,
personal relationships, and cultural traditions on the one hand and incarceration
on the other.
Metaphor creativity is also related to meaning change over time. In verbal
metaphors, expressions whose meaning was originally acquired through a figura-
tive process gradually lose their metaphorical meaning as they become lexical-
ized, moving from being “active,” i.e. having no fixed meaning and being able to
generate lots of different resonances, through “inactive,” where metaphorical
meaning may still be switched on in particular contexts, to “dead,” where the original
metaphorical meaning of a word is no longer accessible to the average speaker of a
language (Goatly, 1997, pp. 30–38). A similar mapping between moral constraints
and physical confinement can thus be expressed in a conventional form, by talk-
ing about being caught or trapped in a relationship, but it can also be discussed in
more unusual language, by describing a marriage as being thrown into a deep, dark
dungeon, for instance.
clearly played a key role in the emergence of the prison scenario: Without the
perceptual resemblance cues provided by the grill-like visual appearance of the
cereals, the concept of locking hunger up inside a foodstuff would probably not
have occurred to the producers of this advert. It would also have made a lot less
sense to audiences and is unlikely to have achieved optimal relevance.
My final example, a political cartoon by the American artist Clay Bennett, il-
lustrates the creative potential of multimodal metaphors particularly well. Gener-
ally, the purpose of a political cartoon is to represent an aspect of social, cultural,
or political life in a way that condenses reality and transforms it in a striking,
original, and/or humorous way (El Refaie, 2009b). In one of Bennett’s cartoons
(Plate 2), the letters spelling out the name that was given to the 2003 invasion of
Iraq by the United States military, “Operation Iraqi Freedom,” have become part
of the picture plane. A shadowy figure looks out at us through the gaps formed
by the letters, as if peering through the barred windows of a prison. It is just pos-
sible to make out the words “Abu Ghraib Prison” on the man’s shirt.
Part of the creative impact of the cartoon comes from the way Bennett ex-
ploits and subverts ingrained assumptions about the differences between the ver-
bal and the visual mode. On the face of it, the distinction between words and
images is perfectly straightforward, but it is becoming increasingly clear that the
boundaries between the two modes are in fact fuzzy. As Mitchell’s (2009, p. 118)
concept of imagetext suggests, much visual art includes writing in some form,
and, by the same token, all texts “incorporate visuality quite literally the moment
they are written or printed in visible form.” The size, weight, form, and regularity
of type often convey a vast amount of connotative meaning. In Bennett’s cartoon
(Plate 2) the typography suggests the kind of stenciled writing, much favored by
the military, which can be found on crates and packing cases. It also demonstrates
that written words can assume more explicit pictorial qualities, thereby remind-
ing us of the artificiality of the word/image dichotomy and, by extension, of all
rigidly dichotomous thinking.
The most interesting thing about this cartoon is that its central metaphor is
impossible to translate into words. We might say that it represents a political slo-
gan as a form of prison in order to castigate the hypocrisy inherent in the professed
aim of the US government and military to liberate the Iraqi people while robbing
many individuals of their freedom and human rights. However, such paraphrases
do not even come close to the shock of recognition that people are likely to expe-
rience upon viewing this cartoon. In my view, this is because the metaphor works
not so much by initiating logical thinking, but rather by encouraging pre-verbal,
intuitive cross-modal resonances in the mind of the reader.
4. Conclusion
Artists, authors and scientists often describe how their most creative insights seem
to come out of nowhere, when they are least expecting them; indeed, conscious-
ness actually seems to hamper their ability to form novel connections (Gibbs,
2011; Tardif & Sternberg, 1988). Similarly, the interpretation of metaphors typi-
cally relies on our human ability to grasp creative metaphorical meaning intui-
tively and imaginatively, often involving several of our senses. For this reason, the
full breadth of a creative metaphor’s meanings cannot be captured by trying
to translate it into an explicit comparative statement:
When the metaphor is paraphrased or replaced, whatever had been extralingual,
unconscious, and therefore potentially new and alive in the collision of these two
entities gets reconstructed, this time in terms only of what is familiar. The point
of metaphor is to bring together the whole of one thing with the whole of another,
so that each is looked at in a different light. (McGilchrist, 2009, p. 117)
Notes
1. Conceptual blending theory (Fauconnier & Turner, 2003) is heavily indebted to Koestler’s
work.
2. There is a lot of anecdotal and empirical evidence for the important role of visualization
in creative thinking (Finke, Ward, & Smith, 1992, pp. 45–63). Other empirical research (Gibbs
& Bogdonovich, 1999) has apparently confirmed Lakoff & Turner’s (1989) intuition that “one-
shot” image metaphors do not usually persist as part of people’s ordinary conceptualization of
their experience.
3. A large proportion of metaphors in advertising (Forceville, 2009, p. 28) and film (Rohdin,
2009, p. 422) are of this type, for example.
4. Black (1979, p. 27) originally used “resonance” to describe instances of metaphor that “sup-
port a high degree of implicative elaboration.” Hanks (2006) has also adopted this terminol-
ogy.
References
Black, M. (1979). More about metaphor. In A. Ortony (Ed.), Metaphor and thought (pp. 19–43).
Cambridge: Cambridge University Press.
Camp, E. (2006). Metaphor in the mind: The cognition of metaphor. Philosophy Compass,
1(2), 154–170. Available at: http://upenn.academia.edu/ElisabethCamp/Papers/1047417/Metaphor_in_the_Mind_The_Cognition_of_Metaphor1 (accessed 10.11.2012).
DOI: 10.1111/j.1747-9991.2006.00013.x
Collins English Dictionary (2005). 7th edition. Glasgow: HarperCollins.
Crisp, P. (2008). Between extended metaphor and allegory: Is blending enough? Language and
Literature, 17(4), 291–308. DOI: 10.1177/0963947008095960
de Bono, E. (1990). Lateral thinking: A textbook of creativity. London: Penguin.
El Refaie, E. (2003). Understanding visual metaphor: The example of newspaper cartoons. Vi-
sual Communication, 2(1), 75–96. DOI: 10.1177/1470357203002001755
El Refaie, E. (2009a). Metaphor in political cartoons: Exploring audience responses. In
C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 173–196). Berlin/New
York: Mouton de Gruyter.
El Refaie, E. (2009b). Multiliteracies: How readers interpret political cartoons. Visual Commu-
nication, 8(2), 181–205. DOI: 10.1177/1470357209102113
El Refaie, E. (2012). Autobiographical comics: Life writing in pictures. Jackson: University Press
of Mississippi. DOI: 10.14325/mississippi/9781617036132.001.0001
Fauconnier, G., & Turner, M. (2003). The way we think: Conceptual blending and the mind’s
hidden complexities. New York: Basic Books.
Finke, R.A., Ward, T.B., & Smith, S.M. (1992). Creative cognition: Theory, research, and
applications. Cambridge/London: Massachusetts Institute of Technology.
Fludernik, M. (Ed.). (2011). Beyond cognitive metaphor theory: Perspectives on literary meta-
phor. New York/London: Routledge.
Forceville, C. (2009). Non-verbal and multimodal metaphor in a cognitivist framework: Agen-
das for research. In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 19–
42). Berlin/New York: Mouton de Gruyter. DOI: 10.1515/9783110215366
Forceville, C., & Urios-Aparisi, E. (Eds.). (2009). Multimodal metaphor. Berlin/New York:
Mouton de Gruyter. DOI: 10.1515/9783110215366
Gibbs, R.W. Jr. (2011). Are ‘deliberate’ metaphors really deliberate? Metaphor and the Social
World, 1(1), 26–52. DOI: 10.1075/msw.1.1.03gib
Gibbs, R.W. Jr. & Bogdonovich, J. (1999). Mental imagery in interpreting poetic metaphor. Met-
aphor and Symbolic Activity, 14(1), 37–44. DOI: 10.1207/s15327868ms1401_4
Giora, R. (1999). On the priority of salient meanings: Studies of literal and figurative language.
Journal of Pragmatics, 31(7), 919–929. DOI: 10.1016/S0378-2166(98)00100-3
Giora, R., Fein, O., Kronrod, A., Elnatan, I., Shuval, N., & Zur, A. (2004). Weapons of mass dis-
traction: Optimal innovation and pleasure ratings. Metaphor and Symbol, 19(2), 115–141.
DOI: 10.1207/s15327868ms1902_2
Goatly, A. (1997). The language of metaphors. London/New York: Routledge.
DOI: 10.4324/9780203210000
Hanks, P. (2006). Metaphoricity is gradable. In A. Stefanowitsch & S. Gries (Eds.), Corpus-based
approaches to metaphor and metonymy (pp. 17–35). Berlin: Mouton de Gruyter.
Koestler, A. (1964). The act of creation. New York: Penguin Books.
Kövecses, Z. (2002). Metaphor: A practical introduction. Oxford: Oxford University Press.
Kövecses, Z. (2005). Metaphor in culture: Universality and variation. Cambridge: Cambridge
University Press. DOI: 10.1017/CBO9780511614408
Kress, G. (2000). Text as the punctuation of semiosis: Pulling at some of the threads. In U.H.
Meinhof & J. Smith (Eds.), Intertextuality and the media: From genre to everyday life
(pp. 132–154). Manchester: Manchester University Press.
Kress, G. (2009). What is mode? In C. Jewitt (Ed.), The Routledge handbook of multimodal
analysis (pp. 54–67). London/New York: Routledge.
Kress, G., & van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary
communication. London: Arnold.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago/London: University of Chicago
Press.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago/
London: University of Chicago Press. DOI: 10.7208/chicago/9780226470986.001.0001
Lemke, J.L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text. In J.R.
Martin & R. Veel (Eds.), Reading science: Critical and functional perspectives (pp. 87–113).
London: Routledge.
Lemke, J.L. (2002). Travels in hypermodality. Visual Communication, 1(3), 299–325.
DOI: 10.1177/147035720200100303
Mayr, A. (2008). Language and power: An introduction to institutional discourse. London/New
York: Continuum.
McGilchrist, I. (2009). The master and his emissary: The divided brain and the making of the
western world. New Haven: Yale University Press.
Mitchell, W.J.T. (2009). Beyond comparison. In J. Heer & K. Worcester (Eds.), A comics studies
reader (pp. 116–123). Jackson: University Press of Mississippi.
Musolff, A. (2006). Metaphor scenarios in public discourse. Metaphor and Symbol, 21(1), 23–
38. DOI: 10.1207/s15327868ms2101_2
Ortony, A. (1979). Metaphor, language, and thought. In A. Ortony (Ed.), Metaphor and thought
(pp. 1–16). Cambridge: Cambridge University Press.
Rohdin, M. (2009). Multimodal metaphor in classical film theory from the 1920s to the 1950s.
In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 403–428). Berlin/
New York: Mouton de Gruyter.
Satrapi, M. (2006). Persepolis. London: Jonathan Cape.
Semino, E. (2008). Metaphor in discourse. Cambridge: Cambridge University Press.
Steen, G. (2008). The paradox of metaphor: Why we need a three-dimensional model of meta-
phor. Metaphor and Symbol, 23(4), 213–241. DOI: 10.1080/10926480802426753
Tardif, T.Z., & Sternberg, R.J. (1988). What do we know about creativity? In R.J. Sternberg
(Ed.), The nature of creativity: Contemporary psychological perspectives (pp. 429–440).
Cambridge: Cambridge University Press.
Thibodeau, P.H., & Boroditsky, L. (2011). Metaphors we think with: The role of metaphor in
reasoning. PLoS ONE, 6(2), 1–11. Available at: http://www.plosone.org (accessed 11.10.2011).
Charles Forceville
University of Amsterdam
1. Introduction
The journal Metaphor and Symbol (formerly called Metaphor and Symbolic Activ-
ity) has in the more than 25 years of its existence been true to one half of its name
by publishing a vast number of papers with the word (or root) “metaphor” in the
title. In fact, a count of titles including the word or root “metaphor-” at least once
in the first 25 volumes (1986–2010) yields no fewer than 275 instances. By contrast,
“symbol” or one of its derivations occurs only 8 times in that same period.1
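A count of this kind is easy to reproduce. The following is a minimal sketch, assuming the titles are available as a plain list of strings; the function name `count_root` and the three sample titles are hypothetical illustrations, not the journal's actual data.

```python
import re

def count_root(titles, root):
    """Count the titles that contain the given root at least once (case-insensitive)."""
    pattern = re.compile(re.escape(root), re.IGNORECASE)
    return sum(1 for title in titles if pattern.search(title))

# Hypothetical sample titles, for illustration only
sample_titles = [
    "Metaphorical reasoning in everyday talk",
    "Symbolism and gesture",
    "On mixed metaphors",
]

metaphor_count = count_root(sample_titles, "metaphor")
symbol_count = count_root(sample_titles, "symbol")
print(metaphor_count, symbol_count)
```

Because the pattern matches the root anywhere in the title, derivations such as "metaphorical" or "symbolism" are counted, and a title that contains the root more than once is still counted only once.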
Since the very name of the journal suggests that “metaphor” and “symbol” are
closely related tropes, this is a somewhat surprising finding. Perhaps one reason
for the scarcity of work on symbolic activity is that Conceptual Metaphor Theory
(CMT), with its mission to lay bare structural “metaphors we live by” (Lakoff &
doi 10.1075/bct.78.03for
given (sub)cultural community, to stand for a target domain A (see e.g., Beckson
& Ganz, 1975, p. 246; Wales, 2001, p. 379). Some well-known examples are the
following: a rose stands for love, a cross for suffering, a skull for death, an hour-
glass for mortality. In these examples, B stands in a metonymical relationship
to A, where in contrast to metaphor both target and source belong to the same
semantic domain. The lover gives (red) roses to his beloved; Christ died on the
cross; the skull is a part of the human body’s remains after death; and the hour-
glass visualizes the passing of time. I suspect that most symbols are rooted in
metonymy rather than in arbitrary convention.
To avoid terminological confusion, it is crucially important to emphasize
that I use the word “symbol” for a sign that typically exemplifies a non-arbitrary
relation between signifier (e.g., cross) and signified (suffering). By contrast, in
his famous triad “icon-index-symbol” Charles Sanders Peirce reserves the word
“symbol” precisely for signs with an arbitrary relation between signifier and sig-
nified. In his theory, the verbal signifier “dog” refers to the signified (or concept)
dog purely by an arbitrary convention adopted in the English language. Given
my view that a symbol is a special kind of metonym, my “symbol” is closer to
Peirce’s “index.” Whether symbolism can also be based on an arbitrary link be-
tween source and target is a difficult question. After all, what appears as arbitrary
may once have been a motivated, metonymic connection that is now no longer
retrievable. (Has the metonymic motivation for the one-time symbolizing of gay-
ness by wearing a single earring been lost, or was it a symbol arising out of an
arbitrary convention in the first place?)
If this reasoning is correct, we could say that – always within a given cultural
group – in symbolism one metonym of a concept has become so salient at the ex-
pense of other (existing or unrealized or possible) metonyms of that concept that
this privileged metonym suffices to evoke that concept on its own, even with no
or minimal context. The test for this is to provide the members of a cultural group
(country, club, party, gang …) with the word for the metonym (“cross,” “rose”)
and ask them to provide connotations. If the members of the group significantly
often mention the same connotations, the metonym can be said to serve as a sym-
bol for that salient connotation (“[Christ’s] suffering,” “[romantic] love”).
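The test described above can be sketched as a simple tally. This is only an illustration under stated assumptions: the function name `salient_connotation`, the 0.5 cut-off standing in for "significantly often," and the sample responses are all hypothetical, and a real study would need a proper significance test.

```python
from collections import Counter

def salient_connotation(responses, threshold=0.5):
    """Return (connotation, share) if one connotation is named by more than
    `threshold` of the respondents, otherwise None.
    The threshold is an assumed cut-off, not a value given in the text."""
    if not responses:
        return None
    counts = Counter()
    for connotations in responses:
        # Normalize spelling and count each connotation once per respondent
        counts.update({c.strip().lower() for c in connotations})
    if not counts:
        return None
    connotation, n = counts.most_common(1)[0]
    share = n / len(responses)
    return (connotation, share) if share > threshold else None

# Hypothetical answers from four group members to the prompt "rose"
rose_responses = [
    ["love", "garden"],
    ["Love", "thorns"],
    ["love"],
    ["perfume"],
]
print(salient_connotation(rose_responses))
```

Counting per respondent rather than per mention keeps one talkative group member from inflating a connotation's share.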
3. House/home as symbol
animals or unfriendly fellow humans. A suitable house thus helps human beings
to survive. Inasmuch as houses are often places where humans live together in
groups, often as (extended) families, houses are typically places where they live
out, or perform, a large part of what they consider their identities. Intimate re-
lationships flourish or derail in houses, one entertains friends there, and people
are born, love, and die in houses. The connotations of a house as a place where
one can be oneself, where important events take place, and where one feels safe,
adhere more specifically to the concept that, in English, is referred to by the word
“home”: “home” = “house” + positive connotations. This transpires from expres-
sions such as “my home is my castle,” “home is where the heart is,” “there’s no
place like home,” “make yourself at home,” and “East, west, home’s best.” In short,
human beings strive to have some sort of house-as-home. (Even though other
languages do not make the house/home distinction, I will henceforward assume
that the positive connotations adhering to “house” are widespread, possibly uni-
versal; I will thus go on using the word “home” as shorthand for “house + positive
connotations”). The material conditions of such a home can differ: it can be made
of stone, wood, clay, ice, or cloth; it can be a hut or a mansion; and while homes
are usually man-made, existing natural conditions such as caves or bowers can be
adapted to function as homes, too.
The house-as-home is often used as a symbol for safety, intimacy with kin
and friends, and thus for experiencing a sought-after identity. In this paper I will
examine the metaphor SEARCHING FOR ONE’S IDENTITY IS LOOKING FOR A HOME,
which is a special case of the more general metaphor PURPOSIVE ACTIVITY IS
MOVEMENT TOWARD A DESTINATION. The popular version of the latter is X IS A
JOURNEY – where X can for instance be LIFE, A RELATIONSHIP, or A CAREER. The
JOURNEY metaphor is probably one of the most deep-rooted metaphors in human
thinking (see Forceville, 2006, 2011a, 2011b; Forceville & Jeulink, 2011; Johnson,
1987; Katz & Taylor, 2008; Ritchie, 2008; Yu, 2009). Here, my central claim is that
the search for a/the home has such strong symbolical connotations that artistic
discourses exemplifying it evoke the metaphor SEARCHING FOR ONE’S IDENTITY
IS LOOKING FOR A HOME.
In Max Black’s (1979) terms, SEARCHING FOR ONE’S IDENTITY IS LOOKING FOR
A HOME would be a “strong” metaphor: it is emphatic in that it would be very diffi-
cult to replace the “home” part of the source domain by another concept without
affecting the potential mappings from source to target, and it is resonant in that
it allows for a wide range of mappings. These mappings in most contexts do not
consist of isolated features, but of structured networks of features in which the
relations between the features are co-mapped (this is discussed in terms of “struc-
ture mapping” by Dedre Gentner; see e.g., Gentner & Jeziorski, 1993, p. 448; I
will here assume that connotations, too, can be part and parcel of “structure map-
pings”). It is to a considerable extent the mappability of these relations between
the pertinent features in the source domain that makes the metaphor emphatic.
Thus the source domains LOOKING FOR/GOING TO CHURCH/THE OFFICE/THE MU-
SEUM, for instance, while all potentially giving rise to emphatic metaphors in their
own right, cannot serve as replacements for LOOKING FOR A HOME because what
people typically do at home is very different from what they do at these other
buildings. Another way of putting this is that the symbolic connotations (if any)
evoked by these other buildings do not coincide with those of “home.”
4. Case studies
As in Forceville and Jeulink (2011) and Forceville (2011b), the case studies ana-
lyzed here are animation films. One reason for focusing on film is CMT’s central
tenet of “embodied cognition,” which entails that humans typically conceptualize
the abstract in terms of the concrete – where the concrete is that which is percepti-
ble or pertains to the body’s motor functions (Forceville 2011a, p. 282). However,
hitherto the CMT framework has rarely been applied to film (exceptions
are Coëgnarts & Kravanja, 2012; Fahlenbrach, 2007, 2010; Forceville, 2006, 2011a;
Kappelhoff & Müller, 2012; Yu, 2009). Animation is a particularly interesting type
of film because the visuals of animation usually are entirely made (rather than the
result of registering a pro-filmic reality), and can for instance easily make use of
transformation and exaggeration without requiring a realistic motivation. They
are thus to an unusually large extent under the control of the creator. This enables
the exploitation of embodied schemata such as BALANCE, PATH and CONTAIN-
MENT for metaphorical purposes more easily than for instance in live-action film.
Since in terms of resources (money, time), the making of animation is moreover a
costly procedure, it is a medium that requires careful planning of each detail that
is to end up in the final film. Perhaps more than in live-action photography or
film, in animation (as in comics) we are encouraged to find each single element
meaningful. A further reason for the focus on animation films is that they are
often short (say, between 1 and 15 minutes): meaning appears in condensed form.
Finally, short animation films often have no language, so that demonstrating how
structural metaphors are the motor for their interpretation helps show that “the
locus of metaphor is thought, not language” (Lakoff, 1993, p. 204).
Summary
After a long trip, Hoppity the grasshopper returns to the small “Lowlands”
world where his fellow bugs live, commenting “There’s still no place like
home.” However, the Lowlands, a patch of urban garden in the middle of a
metropolis, is under threat by “the human ones,” who carelessly drop their
garbage on the bugs’ houses and disturb their territory. Hoppity is shocked:
“Nobody’s safe in their own homes – or out of them. How long has this been
going on? … There’s only one thing that we can do, we’re in a groove, we got
to move.” Together with Mr. Bumble, he goes scouting for a new place, but on
this expedition Mr. Bumble is almost drowned. He is rescued by the lady of
the house, who says, “There you are, Mr. Bumble, this is where you belong,
right out here in the garden.” Eventually Hoppity finds the bug community’s
new home: in the garden next to a cottage on top of a skyscraper.
The variety of the central metaphor at work here could be formulated as SURVIVAL
IS LOOKING FOR A HOME. That the garden where the bug community will settle is
where they “belong” was anticipated by the lady of the house’s rescue of Mr. Bee.
The notion of looking for a safe house runs through the entire film. Hoppity is in
love with Honey Bee, but his rival, Bagley Beetle, puts pressure on Mr. Bee to let
him, Beetle, marry his daughter by promising that the two of them can live with
him in the vase-house that adorns the fence surrounding the Lowlands. Tellingly,
this vase-house is located higher than the houses of the other bugs; and it is no
less significant that the place where the bugs eventually find their new abode is
high up, exemplifying the metaphor GOOD IS UP (Plate 1). The animation medium
is well-suited to depict, with due exaggerations, the embodied strains and
difficulties of climbing toward a craved-for home on a high location.
Plate 1. The bug community tries to find a safer place to live, higher up.
Still from Hoppity Goes to Town
Summary
Arrietty is a miniature girl who lives with her parents in the basement below a
country house, unbeknownst to its owners. Normal, big people are considered
dangerous enemies (like the “human ones” in Hoppity Goes to Town) intent
on getting rid of miniature people. The family survives because the father
every now and then undertakes a nightly expedition to the big people’s home
to “borrow” things they don’t need or won’t miss, such as a lump of sugar, or
a lost pin. However, once the ill-disposed servant Haru has discovered the
family’s existence and whereabouts, Arrietty and her parents need to move
house.
As in Hoppity Goes to Town, the home where the protagonists live is no longer safe.
Moving towards a new home, then, is again primarily a matter of SURVIVAL IS LOOK-
ING FOR A HOME, but by extension the new home is where the miniature people
can be themselves and peacefully live out their true identity. In the film’s final shot
(Plate 2) the family is seen travelling down a stream toward the sunshine, in search
of a new home. The film medium, with its depth-of-field, here optimally exploits the
spatial dimensions of the journey via the TIME IS SPACE metaphor: the family travels
not just forward into the distance; they travel towards the future.
Plate 2. Final shot: Arrietty’s family travels down a stream in a tea kettle boat in search
of a new home. Still from Arrietty the Borrower
Summary
Shmendrick, living in the small Polish village of Chelm, has “a thirst for more
knowledge” – as his voice-over tells us – and leaves wife and children for
Warsaw “to see the big city.” On the way he takes a nap and waking up, without
realizing, takes the same road back. He is surprised to find a village which is
precisely like Chelm, with people very much resembling those he knew in
Chelm and a woman and children shockingly similar to his own family. Only
he himself is not there … After some qualms, he decides to stay, believing that
his alter ego is now in the village that he left, and that in the end “perhaps the
entire world is simply one enormous Chelm.”
Plate 3. After his nap, Shmendrick thinks he walks on to Warsaw, while actually
retracing his steps back to Chelm. Still from The Village of Idiots
Plate 4. Shmendrick’s dream of transporting his house and the rest of the village from
one place to another. Still from The Village of Idiots
Although Shmendrick’s ostensible goal is to gain knowledge and see the big city,
the idea of “going home” as going to the place where one can live out one’s true
identity is strongly present. Unlike Shmendrick himself, the audience knows that
he has simply returned home. The idea of making a journey toward where you
are already in order to find your identity is cued in an interesting manner in the
opening of the film: we see Shmendrick on the roof of his house with a pile of soles
with holes in them. He addresses one of them, and says, “an old sole [punning on
“old soul,” ChF] must have travelled far, having seen many places.” He puts corks
in the holes of the soles, and then hammers the soles over the holes in the roof. In
this context, that is, the soles are metonymically tied to both shoes, and therefore
to journeying, and to the home – the symbol of identity and the destination of
the journey. Again, spatial dimensions are made to work metaphorically: in Plate
3 Shmendrick walks toward a future that is simultaneously his past. The visual-
ization of Shmendrick’s dream, too, draws on movement: the journey toward his
“new self ” is a circular one, and shows him carrying his native village, the locus of
his home and thus his identity, on his back (Plate 4).
Summary
A boy, collecting bottle caps on the beach, runs into a large machine-like
creature with which he plays. At the end of the day he realizes it has nowhere
to go and decides to take care of it. After investigating this “Lost Thing,” his
scientifically minded friend Pete says that he “didn’t think the Lost Thing
came from anywhere, and didn’t belong anywhere either.” The boy then takes
it home. His parents are not very interested and he installs the Lost Thing in
the shed behind the house, where it “seemed happy.” But this can only be a
temporary solution, and the next day he takes the creature to the “Federal
Department of Odds and Ends.” However, in this depressingly dark building,
the Lost Thing would just be stored away and forgotten. A cleaner advises
the boy to take his ward to a place he can find by following a wobbly arrow
sign. Eventually, they arrive at “what seemed to be the right place, in a dark
little gap, off some anonymous little street.” After opening a door, a brightly
lit world appears where all kinds of oddly-shaped “lost things” happily play
around. The boy takes leave of his friend, who from now on lives in this haven
for lost things.
Plate 5. The bright and happy world which will become the home of the Lost Thing.
Still from The Lost Thing
Plate 6. The Lost Thing moves to screen-right, whereas other people move to screen-left.
Still from The Lost Thing (thanks to reviewer 1 for pointing this out)
While the place the Lost Thing ends up living in is not, in the strict sense, a house,
the world in which it is finally “home” has house-like qualities: its entrance is a
door-like porch, and it is partly “enclosed” (Plate 5). The fact that the Lost Thing is obviously
too big to live in a normal house (as transpires from its size when it sits on the roof
of Pete’s house, and when it occupies too much space in the parental living room)
further supports the idea that the bright world is its new, and definitive home.
Here, too, spatial dimensions are important. In the search for a home, the Lost
Thing generally moves screen-right (Plate 6), and the camera’s panning (= mov-
ing horizontally) reinforces this left-right direction. Thus, the past is screen-left,
and the future is screen-right.
Summary
In this wordless film (which won an Oscar in 2009), an old man lives alone in
a house that stands in a sea, along with many other houses (Plate 7). But due to
persistent rain the water keeps rising, so at regular intervals he needs to add
a new floor to his house (Plate 8). Each floor is separated from a lower one by
a trapdoor. One time, he drops his pipe, which floats down through the open
trapdoor. He swims down in a diver’s suit to retrieve it, but then decides to
go even further down, through more trapdoors. At each underwater floor he
relives the period of his life spent there – with his ageing wife, with his young
wife, the birth of their daughter, his marriage … – the shift to memory-status
being signalled by a warm yellow glow as opposed to the real, greyish blue of
the underwater world. At the bottom of the sea we see how he meets his future
wife and together with her builds the first story of their house. He also finds a
wine glass there. When he is up again, he pours two glasses of wine, toasting his
now dead wife.
Plate 7. All houses stand in the sea; their inhabitants live in the top story.
Still from La Maison en Petits Cubes
Plate 8. The old man regularly needs to build a new story on his house to be safe from
the rising water. Still from La Maison en Petits Cubes
The home symbolizes the old man’s identity, each floor representing an episode in
his life. No words are needed, because we understand this, again, thanks to the visu-
ally presented metaphor TIME IS SPACE (see Forceville, 2011b; Forceville & Jeulink,
2011). Interestingly, TIME/SPACE is represented on a vertical, not the more custom-
ary horizontal dimension. In this orientation, PAST IS DOWN and FUTURE IS UP. So
the man needs to literally descend into his past (cf. “digging into the past”). The
homes the man is diving into are earlier versions of the home he is currently living
in. In order to understand the film, we need to recruit both the REMEMBERING THE
PAST IS GOING DOWNWARDS metaphor as a specific instantiation of the TIME IS SPACE
metaphor, and the home as symbol of identity. The old man thus literally dives into
his earlier identities as husband, father, young man, and child. These two conceptual
schemas (TIME-AS-SPACE and HOME-FOR-IDENTITY) are productive throughout the
film. The higher the story the old man builds, the older he is (the fact that, as Plate 7
shows, some other houses are under sea level suggests their owners are now dead).
The rising sea level thus exemplifies the inexorable progress of time; the moment
the man can no longer summon the strength or will to build a new story on top of
his house, he will drown in the sea of time. The TIME IS SPACE metaphor is also sup-
ported by the fact that, in the first scene, we see the man fishing through the trap-
door in his house, presumably angling for memories of the past. It is also telling that
each time he has to move to a higher story, he takes part of his furniture with him.
But as we can witness during his diving to lower floors, he also left some furniture
behind – a chair, the bed in which his wife was ill and possibly died and, lower down
yet, a couch where he remembers photographing his daughter and son-in-law with
their child. Moreover, his initial motivation for diving down is that he lost his pipe,
and although he at first considers the option of buying a new pipe from a travelling
salesman, he rather dives down in the hope of retrieving his beloved old pipe. These
events reinforce the idea that the home and the objects used in it are closely related
to the man’s identity: the bed is tied to his identity as married man, and when his
wife is dead he no longer wants the bed; but the pipe is part of an older identity he is
not yet ready to relinquish.
5. Discussion
The five animation films discussed all draw on the house-as-home as the sym-
bolical locus of literal survival and, by extension, of identity. Inasmuch as hu-
man beings (or their anthropomorphized animal or fabled counterparts) are
typically in search of their identity, it is unsurprising that the structural
metaphor PURPOSIVE ACTIVITY IS MOVEMENT TOWARD A DESTINATION has as one
of its recurring manifestations SEARCHING FOR ONE’S IDENTITY IS LOOKING FOR
A HOME.
While the two feature-length mainstream films discussed, Hoppity Goes to
Town and Arrietty, primarily emphasize the search for a new home as a strategy
for literal survival, here too, there are overtones of the home as symbol of iden-
tity. For instance, it wouldn’t feel right for Mr. Bee and Honey Bee to go and live
in Beetle’s house, since the price would be Honey’s forced marriage with Beetle,
whereas she loves Hoppity. Living in Beetle’s house would be a violation of Honey
Bee’s identity. In Arrietty, the grandfather of the big people’s family long ago made
a doll’s house for the miniature people, hoping that one day they would realize
that some human beings are friendly to them, accept his present, and start living
there. That is, he wanted to provide a home allowing them to live out their identity
peacefully. By contrast, in the three short art animations, the home shifts from
being primarily a resort of protection against physical harm to being the locus of
identity in a more spiritual sense.
In both Hoppity Goes to Town and La Maison en Petits Cubes the movement
takes place along a vertical dimension, the UP/DOWN orientation being
important here. But it is important to realize that the source domain here is
linked to different target domains. In Hoppity there is little doubt that both
FUTURE IS UP and GOOD IS UP (and PAST/BAD IS DOWN) apply. By contrast, in
La Maison, only FUTURE IS UP (and PAST IS DOWN) applies; a healthy reminder
that a given source domain – here spatial image schemas – can occur with
different targets (Kövecses, 2010, p. 136, calls this the “scope” of metaphors;
see also Hampe, 2005).
The case studies show that the metaphors governing the animation films,
while crucial for the stories, are no more than very basic templates for the so-
phisticated refinements that can only be appreciated by audiences familiar with
symbols, intertexts, and genres: basic, embodied templates acquire rich meaning
only by being enhanced and nuanced by aesthetic and cultural details. It is im-
portant to be aware of the continuum from deep-rooted, embodied, presumably
universal image schemas and metaphors, via culturally specific knowledge, to the
idiosyncrasies of unique texts. Cognitivist scholars should never forget that the
convention of writing conceptual metaphors in small capitals enables unequivocal
references to the CONCEPTUAL level of metaphor as distinct from its verbal level –
but that this is no more than convenient shorthand whose precise formulation
is of little consequence. I completely agree with Pettersson who, in a demonstra-
tion of conceptual metaphors’ role in poetry, warns that a healthy development
of CMT scholarship requires sensitivity to stylistic elements: “In terms of cog-
nitive literary theory […] one ignores essential thematic and formal qualities if
one reduces literary works to cognitive patterns or techniques” (Pettersson, 2011,
p. 108) – a point that pertains no less to the animations discussed here. There is
always the danger that the small-capitals version of conceptual metaphors is taken
as a somehow “correct” rendering of what happens in the mind. But if Lakoff and
Johnson are right – as I think they are – that metaphors are “primarily a matter of
thought and action and only derivatively a matter of language” (Lakoff & Johnson,
1980, p. 153), the verbal rendering of metaphors’ conceptual level is no more than
an approximation of our minds’ activities. A discourse, particularly an artistic
story, can be informed by or even depend on certain metaphors, but it can never
be reduced to them. So in the end, analysis of conceptual metaphors in artistic
discourse requires the analyst’s attentive and sensitive eye and ear not only to the
skeletal metaphors and symbols that structure it, but also to the medium-specific
stylistic and narrative choices made by its maker to present them afresh. The same
principle, incidentally, pertains to the genre of political cartoons (see Bounegru
& Forceville, 2011). Much more is going on in any multimodal discourse than
whatever can be captured by the conceptual metaphor that may trigger its central
strategy of interpretation.
6. Concluding remarks
In this paper I have made the following points – all of which require further in-
vestigation and empirical (dis)confirmation:
Further work on the topic discussed here could branch out in several directions.
In the first place, it is worthwhile to test other animations featuring LOOKING FOR
A HOME in light of the claims made here. We may for instance ask: are there other
target domains to which LOOKING FOR A HOME is systematically connected? A
systematic investigation of the direction of movement as well as of the vehicle
of movement is also worth pursuing. My hunch is that walking or other ways of
progressing depending on protagonists’ own muscle activity is privileged over
transportation in cars, planes, trains, motorboats, etc., since this reinforces the
physical and existential nature of LOOKING FOR A HOME.
It is to be expected that the home will not only feature as a symbol for identity
in journeys toward it, but also in building, repairing, extending, and changing the
home. Are there animation films which feature these other activities pertaining to
house-as-home, and if so, how are these metaphorically exploited? It could also be
insightful to investigate different kinds of buildings. I could imagine that X IS GO-
ING TO/BUILDING/REPAIRING A HOUSE/A CHURCH/A CASTLE/A MUSIC HALL might
occur, and that, given the symbolic potential of these buildings, they might func-
tion in conceptual metaphors as well. If so, it would be interesting to see whether
they are perhaps systematically linked to specific target domains.
Of course, there is no reason to limit such investigations to animation films.
Many live-action road movies and comics, too, feature the SEARCHING FOR ONE’S
IDENTITY IS LOOKING FOR A HOME metaphor non-verbally and multimodally –
and the alternatives suggested in the preceding paragraph (different buildings,
different activities) are no less worth examining. Indeed one reviewer of this
Acknowledgments
I thank Marloes Jeulink for insights about Hoppity Goes to Town and The Village
of Idiots, and Galen Campbell for alerting me to, and discussing, The Lost One. I
am furthermore deeply indebted to the thoughtful and detailed comments and
criticisms by two anonymous reviewers of an earlier draft of this paper.
Notes
1. The 25 volumes comprise 474 papers and book reviews. Occurrences of the two keywords
were counted in the titles of the reviewed books if these were indicated in the online database
at http://www.tandfonline.com/toc/hmet20/current (accessed January 2012). The word “irony”
occurred in 30, and the root “figur-” in 28 titles. The root “metonym-” appeared fewer than 10
times.
References
Beckson, K., & Ganz, A. (1975). Literary terms: A dictionary. New York: Farrar, Straus and
Giroux.
Black, M. (1979). More about metaphor. In A. Ortony (Ed.), Metaphor and thought (pp. 19–43).
Cambridge: Cambridge University Press.
Brown, D.E. (2010 [1991]). The universal people. In B. Boyd, J. Carroll & J. Gottschall (Eds.),
Evolution, literature & film (pp. 83–95). New York: Columbia University Press.
Bounegru, L., & Forceville, C. (2011). Metaphors in editorial cartoons representing the global
financial crisis. Visual Communication, 10(2), 209–229.
DOI: 10.1177/1470357211398446
Coëgnarts, M., & Kravanja, P. (2012). From thought to modality: A theoretical framework for
analysing structural-conceptual metaphor and image metaphor in film. Image [&] Narra-
tive, 13(1), 96–11.
Following Forceville (2005, 2011), in this paper I show that the same conceptual
models underlie the expression of Old English emotions in both the language
and the visual modes. Kövecses (2000, 2005) and Stefanowitsch (2004, 2006)
have shown that verbal expressions and idioms used to describe emotions can
be traced back to a limited number of conceptual metaphors. In the light of
these findings, I will analyze here the pictorial representations of emotions
in the Bayeux Tapestry, an 11th century embroidered cloth that narrates and
depicts the events that led up to the Norman Conquest of England and the
invasion itself. The tapestry, which has been described as an example of early
narrative art (McCloud, 1993, pp. 12–14), shows hundreds of human figures in
an astounding range of poses and circumstances.
My analysis of the set of pictorial signals used in the Anglo-Norman
Bayeux Tapestry to represent emotion types such as ‘anger’, ‘grief ’ and ‘fear’
shows that (1) Anglo-Norman artists used a well-organized set of visual stimuli
to convey emotion-related meanings in a patterned way, that (2) the same ide-
alised conceptual models are shared by verbal and visual modalities, and that
(3) whereas verbal expressions of emotions regularly draw on non-embodied,
behavioural concepts, visual representations show a clear preference for embod-
ied container concepts.
The analysis and description of the different ways language mediates our concep-
tualization of emotions has received a growing amount of attention by researchers
doi 10.1075/bct.78.04dia
46 Javier E. Díaz Vera
The multimodal corpus used in this study is based on Foys’ (2003) digital edition
of the Bayeux Tapestry (henceforth BT; Fowke, 1913; Messant, 1999; Musset, 2005;
Stenton, 1957; Wilson, 1985), an embroidered cloth illustrating, in 32 scenes, the
events leading to the 1066 Norman Conquest of England and the invasion itself.
The BT provides a fitting corpus for the study proposed here; in fact, the tapestry
depicts up to 626 human figures in different actions and poses, accompanied by
57 Latin texts narrating the historical facts. Although the tapestry was commis-
sioned and produced after the Norman Conquest, it was probably designed and
constructed in England by Anglo-Saxon artists (Coatsworth, 2005) and, conse-
quently, is expected to illustrate Anglo-Saxon artistic traditions and techniques.
The BT is referred to by Scott McCloud in Understanding comics (1993) as
an example of early narrative art, whereas British comic book artist Bryan Talbot
(2007, p. 5) describes it as “the first known British comic strip”. In fact, just like a
comic strip, the BT is a hybrid medium having a verbal side tied to a visual side in
order to convey narrative, seeking synergy by using both the visual (non-verbal)
and the verbal sides in interaction.
The complete list of pictorial signals for fear, anger and grief consists of nine-
teen different tokens. Of these, seven signals indicate facial expressions (eyes,
eyebrows and mouth), whereas ten correspond to bodily gesture (head, neck,
shoulders, trunk, arms, hands and legs) and the last two refer to body size.1
3.1.1 Eyes
I have analyzed three different types of eyes in this study: bulging eyes, semi-
closed eyes and tightly closed eyes. Bulging eyes are the most commonly used pic-
torial signal for fear in the BT (32 occurrences) and, less frequently, they are also
used in order to express anger (4 occurrences) and grief (1 occurrence). They
denote an enlarged, black pupil located on the edge of wide-open eyes, normally
delimited by two thick, black lines: upper (eyebrow) and lower (pouch). Plate 1
illustrates two different types of bulging eyes used in the BT: black pupil with round
hole (left) and black pupil without hole (right).
For example, in scene 12 (panel 45), we find an image of King Harold strug-
gling to get to the shore of river Couesnon, carrying one foot-soldier on his back
and pulling another soldier by his hand. Both King Harold and one of the two
Plate 2. Et hic transierunt flumen Cosnonsis hic Harold dux trahebat eos de arena
(scene 12, panel 45, detail from the Bayeux Tapestry – By special permission of the City
of Bayeux)
soldiers are depicted with bulging eyes, as can be seen in Plate 2. Based on the
context of the scene (i.e., two persons trying to escape from drowning), one can
confidently argue that this pictorial signal is being used in order to express fear
on the part of these two characters.
Semi-closed eyes (6 occurrences) as a sign of grief are indicated by a lowered
upper eyelid, which is normally represented by a straight, horizontal line. Finally,
tightly closed eyes illustrating anger (7 occurrences) are indicated by a long line
marking an enlarged pouch, beside or under the pupil, which is represented by a
small dot, as in the two characters on the left in Plate 3, where a messenger informs
an enraged Duke William of Harold’s usurpation of the English throne.
3.1.2 Eyebrows
Raised eyebrows (as in Plate 4) are very frequently associated with fear (29 oc-
currences) and grief (11 occurrences) in the tapestry, whereas no occurrence of
Plate 3. Willelm dux iussit naves edificare (scene 24, panel 80, detail from the Bayeux
Tapestry – By special permission of the City of Bayeux)
anger has been found. In the first case, they are normally combined with bulging
eyes, whereas in the second they are accompanied by semi-closed eyes. Raised
eyebrows are thicker and longer than non-raised eyebrows, and have an inverted-
C shape.
In scene 4, panel 10, Harold’s ship is crossing the channel driven solely by
the wind (as indicated by a billowing sail and empty oarlocks). The Latin text
notes that the wind filled the ship’s sails. Contemporary records indicate that
Harold’s ship was blown off course by a heavy storm, and that he was shipwrecked
in Picardy, where he was taken captive by Guy I. The panel depicts fourteen men
trying to regain control of the ship: as can be seen in Plate 5, raised eyebrows are
used for all the characters but one, whose face is hidden behind a rope. Based on
the context of the scene (i.e., a shipwreck), it can be argued that this pictorial
signal is being used in order to express fear on the part of these characters.
Frowned eyebrows are preferred in the BT in order to express anger (10 oc-
currences). In this paper, eyebrows count as frowned when they are joined to each
other (frontal view) or to the upper part of the nose (view from the side).
Plate 5. Mare navigavit et velis vento plenis venit in terra Widonis comitis (scene 4,
panel 10, detail from the Bayeux Tapestry – By special permission of the City of Bayeux)
3.1.3 Mouth
A very thin, closed mouth is frequently found in the BT in order to express fear
(13 occurrences). The upper lip covers the lower lip, as if the character were biting
it with his upper teeth. Less frequently, we find a large mouth represented by a
thin, curved line. For example, the two characters to the right in scene 8, panel 23
(see Plate 6) represent two messengers of Duke William of Normandy, who im-
plore Guy I to release Harold. Closed, thin mouths are combined here with raised
Plate 6. Ubi nuntii Willelmi ducis venerunt ad Widone(m) (scene 8, panel 23, detail from
the Bayeux Tapestry – By special permission of the City of Bayeux)
Plate 7. Ubi Harold 7 Wido parabolant (scene 7, panel 20, detail from the Bayeux
Tapestry – By special permission of the City of Bayeux)
eyebrows and hunched shoulders (see below), indicating a mixture of fear and
respect.
Where a thin, closed mouth indicates anger (13 occurrences), the line is
frequently curved downwards. In two different cases (as, for example, the rep-
resentation of Guy de Ponthieu in the centre of Plate 7), the lower lip protrudes
markedly as the upper one recedes.
Finally, grief is indicated by a large, closed mouth (5 occurrences) represent-
ed by an inverted C-shaped line running from cheek to cheek.
3.2 Gesture
3.2.1 Head
A downcast face is frequently used in the BT as an indication of grief (15 occur-
rences). The notion of looking downwards is normally reinforced by the position
of the hair, the eyes and, less frequently, the hands and shoulders of the character
affected by grief.
3.2.2 Neck
An exaggeratedly long, upright neck is used in different scenes (5 occurrences) in
order to indicate anger. This is especially frequent in those cases where a charac-
ter expresses his anger towards a social inferior and can consequently be consid-
ered a symbol of superiority.
Plate 8. Reversus est ad anglicam terram et venit ad Edwardu(m) regem (scene 19,
panel 64, detail from the Bayeux Tapestry – By special permission of the City of Bayeux)
3.2.3 Shoulders
Hunched shoulders are used in the BT in order to indicate fear (9 occurrences).
The neck and the head are horizontally aligned with the shoulders, which are
emphatically raised. As described above, downward inclination of the body or
the head is considered a sign of fear, veneration, submission and reverence in
the Anglo-Saxon world (Díaz Vera, 2011). In fact, Anglo-Saxon writers use the
predicates feallan and creopan in order to refer to the physical expression of these
feelings, showing the existence of a close connection between them. This is clearly
illustrated by Plate 8, where Harold humbly stands before King Edward of Eng-
land, who shows his dissatisfaction with him.
Lowered shoulders (sometimes accompanied by lowered arms) are an indica-
tor of grief (7 occurrences). In some cases, lowered shoulders are exaggeratedly
narrow (that is, as wide as the character’s head, or even less).
3.2.4 Trunk
Turning the trunk backwards (face to the left, body to the right, or both face and
body to the left) is also used in the BT to indicate that someone is escaping or hid-
ing from the source of fear (6 occurrences in all). For example, scene 13, panel 47
depicts a Breton soldier trying to escape from the fortress of Dol, which is being
attacked by the Normans.
Plate 9. Et venerunt ad Dol et Conan fuga vertit (scene 13, panel 47, detail from
the Bayeux Tapestry – By special permission of the City of Bayeux)
3.2.5 Arms
Fear can also be indicated by an upper arm emphatically close to the body, where-
as the lower arm is slightly separated from it, sometimes placed around the belly,
or with the hand on the chest (26 occurrences). This can be clearly seen in the
group of characters represented in the famous comet scene (scene 22, panel 74;
see Plate 10 below), whose right arms and left upper arms are completely adhered
to their bodies as a sign of fear provoked by the spectacular vision of a comet
Plate 10. Isti mirant(ur) stella(m) (scene 22, panel 74, detail from the Bayeux Tapestry –
By special permission of the City of Bayeux)
Plate 11. Portatur corpus Eadwardi regis ad ecclesiam Sci Petri ap(osto)li hic Eadwardus
rex in lecto alloquit(ur) fideles et hic defunctus est (scene 20, panel 68–69, detail from
the Bayeux Tapestry – By special permission of the City of Bayeux)
(frequently identified with Halley’s comet; Wright, 1999) with a tail flying through
the English skies in the early months of the year 1066.
Moreover, widely extended arms (5 occurrences) are used in the BT as an
indication of anger. The upper arm is clearly separated from the trunk and
horizontally aligned with the shoulders, as can be seen in Duke William’s portrait in
Plate 3 above.
Plate 12. Hic ceciderunt qui erant cum Haroldo (scene 38, panel 166, detail from
the Bayeux Tapestry – By special permission of the City of Bayeux)
3.2.7 Limbs
Hanging limbs are used to indicate total paralysis of the body in a situation of
extreme fear in three different panels. For example, in scene 38, panel 166, a
Norman soldier holds an unarmoured Anglo-Saxon by the hair and prepares to
decapitate him. As can be seen in Plate 12, both the arms and the legs of the
Anglo-Saxon soldier are hanging, as well as his head.
Body scale size in the BT is highly conditioned by the physical size of the panels
and by the number of characters represented on each one of them. Furthermore,
different body sizes can also be used in order to try to represent the effects of vi-
sual perspective. However, it has been noted that, in the tapestry, someone who is
scared is at times represented in a much smaller scale, as compared to the other
characters in the same scene (13 occurrences). This is the case of the Anglo-Saxon
woman and her child trying to escape from their burning house in scene 30, pan-
el 118 (see Plate 13).
Similarly, anger can be indicated by a relative increase in body size, especial-
ly when the angry character is a social superior (see, for example, King Edward’s
portrait in Plate 8 above).
Table 1 indicates the number of occurrences of each signal as a possible indi-
cator of fear, anger or grief in the BT.
Plate 13. Hic domus incenditur (scene 30, panel 118, detail from the Bayeux Tapestry –
By special permission of the City of Bayeux)
Table 1. Pictorial signals of fear, anger and grief in the Bayeux Tapestry

Category           Pictorial signal in BT      Fear  Anger  Grief
Facial expression  bulging eyes                  32      4      1
                   tightly closed eyes            –      7      –
                   semi-closed eyes               –      –      6
                   raised eyebrows               29      –     11
                   frowned eyebrows               –     10      –
                   small, closed mouth           13     13      –
                   large, closed mouth            –      –      5
Gesture            downcast face                  –      –     15
                   long neck                      –      5      –
                   hunched shoulders              9      –      –
                   lowered shoulders/arm(s)       –      –      7
                   trunk backwards (left)         6      –      –
                   arm(s) stuck to body          26      –      –
                   extended arm(s)                –      5      –
                   open hand(s)                   –      –      9
                   pointing with the finger      12      5      –
                   hanging limbs                  3      –      –
Body size          smaller scale size            13      –      –
                   bigger scale size              –      6      –
Total                                           143     55     54
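The column totals in Table 1, and the signals that serve more than one emotion, can be checked with a short tally. The following is only a minimal sketch: the counts are transcribed from Table 1 (with empty cells read as zero), while the dictionary layout and the function names `column_totals` and `shared_signals` are illustrative assumptions and not part of the original study.

```python
# Occurrence counts transcribed from Table 1, ordered as (fear, anger, grief).
# Empty cells in the table are treated as 0. Names below are hypothetical.
EMOTIONS = ("fear", "anger", "grief")

TABLE_1 = {
    "bulging eyes": (32, 4, 1),
    "tightly closed eyes": (0, 7, 0),
    "semi-closed eyes": (0, 0, 6),
    "raised eyebrows": (29, 0, 11),
    "frowned eyebrows": (0, 10, 0),
    "small, closed mouth": (13, 13, 0),
    "large, closed mouth": (0, 0, 5),
    "downcast face": (0, 0, 15),
    "long neck": (0, 5, 0),
    "hunched shoulders": (9, 0, 0),
    "lowered shoulders/arm(s)": (0, 0, 7),
    "trunk backwards (left)": (6, 0, 0),
    "arm(s) stuck to body": (26, 0, 0),
    "extended arm(s)": (0, 5, 0),
    "open hand(s)": (0, 0, 9),
    "pointing with the finger": (12, 5, 0),
    "hanging limbs": (3, 0, 0),
    "smaller scale size": (13, 0, 0),
    "bigger scale size": (0, 6, 0),
}


def column_totals(table):
    """Sum the occurrences of all signals for each emotion."""
    return {
        emotion: sum(row[i] for row in table.values())
        for i, emotion in enumerate(EMOTIONS)
    }


def shared_signals(table):
    """Return, alphabetically, the signals attested for more than one emotion."""
    return sorted(
        name
        for name, row in table.items()
        if sum(1 for count in row if count > 0) > 1
    )


if __name__ == "__main__":
    print(column_totals(TABLE_1))
    print(shared_signals(TABLE_1))
```

On these figures, only four of the nineteen signals are attested for more than one emotion, which is in line with the observation that a signal is interpreted in combination with others rather than in isolation.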
The pictorial signals analysed here are all highly coherent with existing folk
models for emotions (Díaz Vera, 2011; Gevaert, 2002; Kövecses, 1990). In fact, most of
the Old English metaphors described in earlier research on the verbal expression
of emotions have a non-verbal counterpart in the BT. It is widely known
that many of our everyday words for emotions stem from a combination of two
ontological metaphors: THE BODY IS A CONTAINER FOR EMOTIONS and EMOTIONS
ARE TEMPERATURE CHANGES. Within this context, bulging eyes (indicating the
three emotions analyzed here) can be interpreted as a sign of the interior pressure
aspect of the FLUID IN CONTAINER metaphor. Alternatively, bulging eyes could
indicate increased body heat, which is in any case related to the SUBSTANCE IN
CONTAINER metaphor. Similarly, tightly closed eyes can suggest either the pressure
on the body-container in the stage of suppression or a bodily accompaniment of
released anger (Forceville, 2005, p. 81). Semi-closed eyes, which are used in the
BT as an indicator of grief, can be interpreted as a signal of lack of vitality
(Kövecses, 1990, p. 25).
Raised eyebrows indicate fear and grief in the tapestry, and can also be
interpreted as a representation of the interior pressure aspect encoded in the
SUBSTANCE IN CONTAINER metaphor. Furthermore, a mouth emphatically closed (13
occurrences each for fear and anger) could indicate that the character is
preventing his fear or his anger from coming out of the body/container, whereas a
mouth emphatically wide and large is, again, an indicator of lack of vitality.
4.2 Gesture
Downcast face and lowered shoulders and arms are another manifestation of the
lack of vitality that accompanies grief, and square with the general SADNESS IS
DOWN metaphor.
As for trunk backwards and hunched shoulders, both are to be connected to the
figurative expressions FEAR IS TURNING BACK and FEAR IS BECOMING SMALLER.
Furthermore, arms stuck to the body indicate that the body is rigid and hence
square with the FEAR IS CHANGE OF FLEXIBILITY metonymy (Díaz Vera, 2011).
Less frequently, paralysis is represented by hanging limbs in the BT.
Long neck and extended arms can be metaphorically motivated by the mapping
ANGER IS AN OPPONENT IN A STRUGGLE (Kövecses, 1990, p. 21), as they imply
an increase in body size. Similarly, pointing towards the source of anger with the
index finger of the right hand is a sign of aggressive behaviour, whereas pointing
towards the source of fear with the index finger of the left hand indicates a
defensive reaction and, consequently, illustrates the FEAR IS AN OPPONENT metaphor.
Finally, open hands are another indicator of lack of vitality, a recurrent metaphor
for grief.
Finally, as in the case of hand and finger position, body size is to be related to
the aggressive and defensive behaviours that accompany, respectively, anger and
fear, and squares well with the ANGER/FEAR IS AN OPPONENT metaphors.
5. Conclusion
In this paper, I have examined the different ways anger, fear and grief are rep-
resented in the Bayeux Tapestry. As a result of this analysis, I have reconstructed
a whole set of pictorial signals used to represent these three emotions in the tap-
estry. The pictorial signals described in this chapter are commensurate with the
results of previous research on the linguistic expression of emotions (Díaz Vera,
2011; Gevaert, 2002; Kövecses, 2000). This set of pictorial signals explains how
medieval ‘readers’ of the BT were able to identify in a simple and straightforward
way what kind of emotion is expressed in each panel from only pictorial signals
of the types described here. Furthermore, this set constitutes a more or less rudi-
mentary sign-system where facial expressions and gesture play an important role
in helping observers to cue emotions. It should also be noted that, apparently,
none of these pictorial signals alone is able to convey the idea of fear, anger or
grief. It is the combination of different pictorial signals in the same character that
is used by medieval embroiderers in order to suggest one of these three emotions.
Conversely, as has been seen here, many of the signals described in this paper are
shared by more than one emotion, which indicates that a particular signal is not
necessarily reserved for the expression of one concrete emotion but may, in com-
bination with other signals, suggest a completely different emotion.
How precise and universal this sign-system is can only be assessed after the
model presented here has been applied to a wider number of visual represen-
tations of emotions in different historical periods and cultural areas, and the
multiple relationships between these visual representations and the correspond-
ing linguistic expressions for emotions have been reconstructed (with special
attention to metaphoric and metonymic expressions). The primary aim of this
research was to outline, on the basis of a selection of pictorial signals used in the
Bayeux Tapestry, how these signals can be distinguished, classified and interpret-
ed. I hope that this analysis has demonstrated how theorists of visual metaphor
and metonymy may benefit from the systematic examination of the use of picto-
rial signals for the expression of emotions. Furthermore, cognitivists in various
disciplines should find the existence of stable sets of pictorial signals and their use
as communication systems pertinent to their research.
Notes
1. Only four of these pictorial signals appear in Forceville’s analysis of the representation of
anger in the Asterix album La Zizanie (Forceville, 2005, pp. 75–77): bulging eyes, tightly closed
eyes, tightly closed mouth and arm position.
References
Abbott, M., & Forceville, C.J. (2011). Visual representations of emotion in manga: loss of control
is loss of hands in Azumanga Daioh volume 4. Language and Literature, 20(2), 91–112.
DOI: 10.1177/0963947011402182
Coatsworth, E. (2005). Stitches in time: Establishing a history of Anglo-Saxon embroidery. In
R. Netherton & G.R. Owen-Crocker (Eds.), Medieval clothing and textiles, Vol. 1 (pp. 1–27).
Woodbridge: Boydell & Brewer.
Danaher, D. (1998). Peirce’s semiotic and conceptual metaphor theory. Semiotica, 119(1/2), 171–
207.
Díaz Vera, J.E. (2011). Reconstructing the Old English cultural model for fear. Atlantis, 33(1),
85–103.
Eerden, B. (2009). Anger in Asterix: The metaphorical representation of anger in comics and
animated films. In C.J. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 246–
264). Berlin: Mouton de Gruyter.
Fesmire, S.A. (1994). Aerating the mind: The metaphor of mental functioning as bodily func-
tioning. Metaphor and Symbolic Activity, 9, 31–44. DOI: 10.1207/s15327868ms0901_2
Forceville, C.J. (2005). Visual representations of the idealized cognitive model of anger in the
Asterix album La Zizanie. Journal of Pragmatics, 37, 69–88.
DOI: 10.1016/j.pragma.2003.10.002
Forceville, C.J. (2011). Pictorial runes in Tintin and the Picaros. Journal of Pragmatics, 43, 875–
890. DOI: 10.1016/j.pragma.2010.07.014
Fowke, F.R. (1913). The Bayeux Tapestry: A history and description. London: G. Bell & Sons.
Foys, M.K. (2003). The Bayeux Tapestry digital edition. Woodbridge: Boydell & Brewer.
Gevaert, C. (2002). The evolution of the lexical and conceptual field of anger in Old and Middle
English. In J.E. Díaz Vera (Ed.), A changing world of words: Studies in English historical
lexicology, lexicography and semantics (pp. 275–299). Amsterdam: Rodopi.
Increasingly global markets impose strains on the branding industry for the
design of trademarks with a worldwide appeal. This paper explores the potential
benefits of the exploitation of embodied schemata for this purpose. A corpus of
international automobile brands is analyzed in search of the image schemas at
work in the conceptualization of different car categories (i.e. minis, family cars,
sports cars, and off-road 4 × 4s). Our findings evince that, together with other
well-known strategies (e.g. sound symbolism), multimodal image schemas can
be added to the inventory of branding tools which help to imbue brands with a
globally comprehensible semantics. In the context of branding, it is also attested
that the structure of the general schemas is fleshed out through their interaction
with the most salient attributes of the target product/service named by a partic-
ular brand, rather than in relation to other contextual or cultural facts.
1. Introduction
Creating a new brand for the already crowded present-day market is a complex
task, as shown by the increasing number of branding and naming companies
offering such services. New brands must not only comply with basic marketing
needs (e.g. ease of pronunciation/spelling, lack of unfortunate semantic
associations, a catchy and distinctive character, etc.), but they are also
expected to be meaningful to consumers worldwide. This paper explores the role
of image schemas and related multimodal image schematic metaphors and me-
tonymies in the process of creating globally-valid trademarks.
doi 10.1075/bct.78.05per
62 Lorena Pérez Hernández
the construction of gender spaces, as well as with the purpose of introducing sex-
ism, in advertising. In a similar vein, Felices Lago and Cortés de los Ríos (2009)
make use of image schemas in their account of the dominant values in print ad-
vertisements announcing different types of environmentally friendly products
and services from energy (oil, electricity, etc.) and heavy industry corporations.
In this paper, we further contend that image schemas, and image-schematic
mappings (Johnson, 1987; Lakoff, 1987; Lakoff & Johnson, 1999), by virtue of
their aforementioned pre-conceptual, experiential, and cross-cultural nature, also
qualify as an efficient, yet still largely overlooked, tool for the more specific pur-
pose of creating global brands (Pérez Hernández, 2011). In addition, this paper
argues that the universal nature of image schemas can be maximized through
their multimodal expression. In fact, the systematic exploitation of multimodal
image schemas in the process of brand design is expected (1) to ease the gen-
eration of global brands and (2) to increase their suggestiveness and conceptual
richness, both through the fixed implications of the internal logic of the sche-
mas, and also via metaphoric and metonymic extensions making use of those
embodied schemata as their source domains. With this aim in mind, a corpus of
international automobile brands/logos is analyzed in search of the image schemas
at work in their conceptualization; the extent of their use and their multimodal
nature are assessed; and a description of the semantic shades added by each image
schema is offered.
In so doing, our aim is to add one more strategy to the inventory of multimod-
al resources for the design of internationally appealing and meaningful brands.
Additionally, our research into the image-schematic foundations of brands al-
lows a parallel incursion into the current debate about the situatedness of image
schemas themselves (Gibbs, 1999; Johnson, 2005; Kimmel, 2005, 2008; Sinha &
Jensen de Lopez, 2000). In the context of branding, our findings suggest that the
structure of the general schemas is fleshed out through their interaction with the
most salient attributes of the target product/service named by a particular brand,
rather than in relation to other contextual or cultural facts.
The outline of this paper is as follows. Section 2 tackles the issue of the defini-
tion and scope of image schemas, and image-schematic mappings. Additionally,
the current debate about the universal versus contextually-situated nature of em-
bodied schemata is revisited from the perspective of the branding industry and
the specific needs of brand designers. Section 3 describes the corpus of analysis.
Section 4 offers a description of the main image schemas at work in four automo-
bile categories. Finally, Section 5 considers the implications of our findings for the
process of brand design and provides some suggestions for future research.
In this paper we take sides with the original definition of image schemas as men-
tal patterns of bodily, perceptual, including kinetic, experience (Johnson, 1987;
Lakoff, 1987):1 a category of “recurring patterns of our sensory-motor experience
by means of which we can make sense of that experience and reason about it”
(Johnson, 2005, pp. 18–19). As pointed out by Lakoff and Johnson (1999), this
type of meaning structure constitutes a sort of “cognitive unconscious”, operating
beneath the level of our perceptive awareness. Furthermore, given that all human
bodies share several quite specific sensory-motor capacities constrained by their
own size and constitution, as well as by the common traits of the diverse environ-
ments they inhabit, image schemas are expected to be the same across different
cultures. Thus, through the relative bilateral symmetry of our bodies we experi-
ence the image schema of left-right; our constant interaction with containers
leads us to structure such experiences in terms of the container schema; etc.
It is precisely their directly meaningful, preconceptual, and universal nature
that emerges as an interesting trait for the branding industry. If the essential
embodied semantics of image schemas could somehow be incorporated into the
content of brands, the latter would by default convey a basic, cross-culturally
shared meaning, which could be straightforwardly apprehended by consumers
worldwide.
This global semantics of trademarks would then be necessarily enriched and
completed through their embeddedness within specific cultural, emotional, and
interactional experiences. As Gibbs (1999), Johnson (2005), and Kimmel (2005,
2008), among others, have rightly argued, attending only to the structure of image
schemas as recurring patterns of organism-environment sensory-motor
interactions and ignoring their nonstructural, more qualitative aspects stemming from
socio-cultural facts, prevents us from accounting for their variation both across
cultures and in situated cognition. Different instantiations of the container
schema, for example, illustrate the need to “put flesh on image-schematic skel-
etons” (Johnson, 2005, p. 27). More specifically, there is a varied range of feel-
ings associated with the actual experience of containment: from the coziness and
warmth caused by a tight hug to the constraint prompted by the confines of a
small room; and from the freedom or, in some cases, fear people may feel when
leaving a closed area to the phobia or the sense of autonomy which can derive
from entering an open territory.
The present account of the image-schematic foundation of brands does not
entertain as one of its goals the description of related “situated image schemas”.2
Since the branding industry focuses on the generation of globally understandable
messages, our analysis is geared towards those universal aspects of image schemas
serving this purpose. Nevertheless, it would be unwise to overlook the mounting
evidence in favor of the situated and context-sensitive dimension of embodied
schemata. As far as the branding industry is concerned, our findings evince that
it is possible to strike a balance between the use of too general schemas devoid of
anything context-bound, on the one hand, and the positing of isolated, culturally-
tied, situated schemas devoid of a universally shared semantic core, on the other.
This can be achieved by limiting the situated enrichment of the initial general
schemas to those facts deriving from the key attributes of the product/service un-
der consideration. In other words, the flesh needed to complete the image-sche-
matic skeleton upon which a particular brand is built stems from the interaction
between the structure of the image schema at work and the concrete ‘affordances’
offered by the target product which the brand names. By way of illustration, the
force image schema, in the context of automobile brands, may be instantiated
differently in relation to a sports car, in which case those aspects of the schema
related to ‘speed’ are activated; or in relation to a jeep, whose affordances bring to
the forefront those features of the force schema related to the notion of ‘power’.3
3. Data selection
(3) Sports cars are medium-size, usually two-seat, two-door vehicles designed for
high speed driving. Their focal affordances include those of “speed”, “smooth
driving”, and “high performance”. E.g. Audi TT.
(4) Off-road 4 × 4s: Basic key attributes characterizing off-road 4 × 4s are “power”,
“solidness”, and “robustness”. E.g. Land Rover.
Minis display characteristically small dimensions, and therefore, “size” and the
related attribute schema (small-big) become essential to their marketing. In
spite of the fact that some cultures may share primary metaphors which assign an
In all cases, the positive affordances of small cars are still preserved, but the nega-
tive ones (i.e. lack of space/comfort) may somehow be challenged by the interac-
tion of the image schema of attribute (hinted at by the diverse visual features
of the brand name) and the ICM of Size, on the one hand, and the positive in-
terpretation which is, by default, expected in a marketing/branding context, on
the other.
As shall become apparent in Sections 4.2 and 4.3, the attribute schema will
be found at work in different, sometimes diametrically opposed, car categories
such as minis, family cars, and 4 × 4s. The image schema of attribute describes
basic properties of objects related to their size, strength, weight, and tempera-
ture, among others. Most of these properties involve a scale between two extreme
points (i.e. big-small). Which of these gets to be activated, or to what degree,
will be determined by the intrinsic properties of the target product in interaction
with the cue provided by the visual inputs included in its brand and/or marketing
campaign. This cognitive process, known as parametrization (Ruiz de Mendoza,
2011), consists in adapting the basic conceptual layout provided by the expression
to other textual and contextual clues (i.e. in the case of branding, the relevant con-
text is provided by the nature of the target product itself). If we consider a wine
brand such as Imperial, for instance, it is our knowledge that emperors lived in a
world of luxury that allows us to interpret this wine as a high quality product in
terms of taste and aroma. If the same brand were to be used to name a horse or a
car instead, its parametrization would trigger different interpretations, probably
along the lines of a pure breed, competitive horse, and of a luxurious and expen-
sive car, respectively. The same brand (i.e. linguistic cue) is parametrized differ-
ently depending on the product that it names. Likewise, the use of the attribute
image schema as a branding strategy may be parametrized as required depending
on the nature of the target product (i.e. car category). Different visual inputs will
serve as cues for the achievement of a felicitous parametrization.
Specific brands and logos of family cars stand out as the most neutral in our cor-
pus. They make a standard use of capitalization and typeface size. Alternations
of capital/lowercase letters or the exclusive use of small letters observed in
Section 4.1 are not recurrent features of the brands in this segment.6 Thirty-four out
of a total of 55 family car brands are rendered in full capital letters, a feature which
is largely compatible with the key attribute “space”, typical of this car category,
and which points to a potential exploitation of the image schema of attribute
(big-small). The rest are presented with an initial capital letter as is mandatory
for proper names, thus making no relevant contribution to the semantics of the
brand as far as the key attribute of “space” is concerned.
In most cases, branding specialists rely on the manufacturers’ general brand
for the commercialization of family cars. When considering the embodied foun-
dations of general automobile trademarks, our analysis reveals that the most
recurrent image schema is that of container, one of the most ubiquitous and
Plate 2. BMW, Honda, and Lexus logos. Logos reproduced with permission.
All Lexus logos, trademarks, service marks and copyrights are solely and exclusively
owned by Toyota Motor Corporation.
“Speed”, “smooth driving”, and “high performance” are some of the focal features
of sports cars. Our corpus reveals that the first of them finds its way into the
brands/logos of this car category through a metonymic exploitation of the image
schema of force. As Johnson points out (1987, p. 43), “our experience of force
usually involves the movement of some object (mass) through space in some di-
rection. In other words, force has a vector quality, a directionality”. In addition,
“forces have degrees of power or intensity” (Johnson, 1987, p. 43). As a car moves,
it traces a path that can be described by a force vector. In turn, the speed of the
moving car is a reliable indicator of the intensity of the force it exerts. The vector
element of the force image schema is visually cued in many of the sports car
brands in our corpus. In some cases this is done by means of a subtle transforma-
tion of the traditionally circular/oval frames of car logos (see Plate 2) into other
types of closed areas incorporating some kind of pointed element resembling a
vector (e.g. >).
Plate 3 is a schematic representation of the type of logo frame found in some
of the most representative manufacturers of sports cars, including Porsche,
Lamborghini, Gumpert, and Ferrari. Other sports car companies, such as McLaren,
Maserati, Corvette, and Tesla, also display logos with some kind of vector image
built into them. Plate 4 offers a schematic representation of the different graphics
representing force vectors in the aforementioned trademarks.
Nevertheless, the most pervasive and subtle strategy for the metonymic
activation of the force schema in two-seater brands is by far the use of italics.
Eighteen out of 29 brands in this category display letters whose upper part is slightly
tilted towards the right, thus suggesting a forward movement. Honda CR-Z and
Hyundai Veloster are just two examples (Plate 5).
Plate 5. Honda CR-Z and Hyundai Veloster logos. Reproduced with permission.
The Honda CR-Z logo displays another recurrent feature of sports car
trademarks, namely the use of elongated letters, especially in their horizontal strokes.7
Such visual lengthening of the horizontal lines in car brand names iconically
cues the path image schema, to which the force schema itself is subsidiary, and
which, in the context under consideration, may evoke the felicitous notion of
movement towards a destination. Several other brands achieve the same effect by
linking two or several characters of the brand name with an underlying line which
again visually suggests a path. Veloster above is a good example, together with
other well-known trademarks like Ferrari, Lotus Evora, and Lexus LF-A. Opel GT
combines both strategies, so that the elongated upper parts of the G and T charac-
ters are virtually fused into a long curved stroke resembling an open and smooth
curve. As a result, the brand logo is especially apt for suggesting the third of the
essential attributes of sports cars, that is, their high performance in curved paths.
Chrysler Crossfire deserves closer attention. This brand name and its associ-
ated logo point to the image schematic gestalt of force through a double met-
onymic mapping of the effect for cause and attribute for entity types,
respectively. This brand name exhibits horizontal strokes lingering after each of
its characters as a sort of tail or hair blowing in the wind. Thus, the visual effects
of speed (hair/clothes lingering backwards in the wind) stand for their cause (i.e.
speed). In turn, speed is but one of the intrinsic characteristics of the smooth
movement of an entity (a car in the case under consideration) along a path. As
a result, this logo succeeds in conveying the central key attributes of sports cars
(i.e. “speed” and “smooth movement”) in a novel and singular way, consequently
increasing the distinctiveness of its associated brand.
Plate 7. Honda CR-V and CR-Z brand logos. Reproduced with permission.
Plate 8. Broader vertical strokes and squared circular characters in off-road brand names.
All in all, non-italicized, sturdy, capital letters turn up as pervasive features
of 4 × 4 brands. These visual cues help to parametrize the schema of attribute
(big-small) in a way compatible with the nature of 4 × 4s, and in turn, to highlight
their characteristic dimensions and robustness. Finally, an interesting exploitation
of the path and force image schemas at work in off-road brands involves the use
of broken letters.
Discontinuous paths or gaps along a path block the force exerted by the
moving vehicle and are considered impediments to travel. The broken letters in
Hyundai Tucson and the discontinuous, but still straight path running along the
Chevrolet Captiva logo visually activate these notions, which are also defining
characteristics of the type of bumpy, irregular roads on which 4 × 4s are expected
to perform. Such a visual allusion to the difficulties of the path, together with the
robustness of the chosen typeface, conveys a clear positive message as
to the good performance of these vehicles in the adverse driving circumstances
under consideration.
5. Conclusions
While it is not possible to make full generalizations on the basis of the analysis of
a limited number of brands, the four case studies presented in Section 4 license
the formulation of the following observations, which are subject to confirmation
through further research:
1. Incorporating image schemas into the design of brands grounds them in em-
bodied experience and, as a result, may make them more extensively mean-
ingful across cultures. By way of illustration, consider the incorporation of the
container schema in many of the car brands in our corpus. Humans with
different cultural backgrounds will all share a certain general understanding
of the semantics of those brands, since they all have a common experience
of what it means for something/someone to be inside or outside a container,
especially in relation to the affordances offered by the target product (i.e. ve-
hicle).
2. Image schemas have been shown to be incorporated into car brands mainly
through visual means. Due to its iconic, cross-cultural nature, the use of the
visual mode, either in isolation or in conjunction with others (linguistic, au-
ditory) is expected to result in the generation of brands with a wider global
scope. Special attention has, therefore, been paid to the visual aspects of the
brands analyzed in this study, including their associated logos and the font
type, size and shape of their brand names.
3. Just as generic image schemas operate within bodies which interact with en-
vironments offering specific affordances, the specific schemata found in the
semantic make-up of car brands have been proved to be largely compatible
with the “affordances” offered by each specific car category. The interaction
between the schema and the product affordances gives rise to enriched,
contextually-situated, but still globally-meaningful brands.
Further research should look for empirical confirmation of the theoretical find-
ings of the present analysis on the image-schematic foundation of car brands. In
this respect, a survey is at present being conducted among a group of multicultural
consumers in order to assess (1) to what extent car brands with specific image
schemas built into their semantics are more straightforwardly associated with
the related key attributes identified by the present study, and (2) whether such
an association is equally established by all consumers independently of their cultural
and linguistic backgrounds.
Another area of interest for future studies would be to examine how image
schemas in combination with primary metaphors may allow branding specialists
to boost their brands with additional shades of suggestive meaning tied to more
abstract domains of experience.8 Given their grounding in sensory-motor physi-
cal experience, such metaphorical extensions of the meaning of brands would still
retain a conveniently cross-cultural shared semantic core.
Acknowledgements
The research on which this paper is based has been funded by the Ministry of Sci-
ence and Innovation, Spain, Project No. FFI2010-17610. This research has been
carried out within the Center for Research in the Applications of Language (CRAL),
University of La Rioja (Spain).
Notes
1. This view has found support in simultaneous research carried out within psychology and
other related fields (Damasio, 1994; Finke, Pinker, & Farah, 1989). It differs, however, from
more recent characterizations of image schemas by authors such as Sweetser (1990), Turner
(1991), and Clausner and Croft (1999). Some of these accounts envisage image schemas as
structures which are too complex to count as basic dimensions of perceptual representation.
Thus, Turner (1991, pp. 176–177) has identified image schemas for notions such as a “cup” or
a particular “phoneme”. Other proposals include within the category of image schemas certain
nonperceptual representations (i.e. schemas which are not tied to any particular aspect of bodi-
ly experience). The schemas of “complexity”, “ceasing to exist” (Turner, 1991), or “sharpness”
(Clausner & Croft, 1999) fall within this class.
3. Image schemas do not merely exist in our brains, but rather they operate within bodies of
a particular physiological make-up which interact with environments that offer very specific
“affordances” (Gibson, 1979). An “affordance”, according to Gibson, is a pattern of potential
engagement and interaction with parts of our environment. A chair, for instance, “affords” sit-
on-ability for human beings.
In the context of marketing, the affordances offered by those products/services named by
brands are closely linked to their key attributes. Thus, a car, for example, affords driving-abil-
ity. In turn, different cars will afford diverse types of driving-ability: fast driving, safe driving,
power driving etc. Such specific affordances help to flesh out the full semantic interpretation of
the initial image schemas at work in the semantic make-up of their related brand names.
4. The choice of automobile brands as data for this study was motivated by the extensive and
varied number of image schematic visual inputs they display, as well as by the straightforward
connection observed between the basic semantics of the image schemas of attribute, force,
and container, on the one hand, and the target attributes of the four car categories under
scrutiny, on the other.
5. The full classification of cars by Euro NCAP can be accessed through its web page at
[http://www.euroncap.com]. For the full list of manufacturers whose brands have been included in
this research, see Appendix 1. Due to space constraints and copyright issues we cannot include
a visual image for every brand logo included in the analysis. The vast majority of them can be
found at the web page Brands of the World available at [http://www.brandsoftheworld.com/].
6. Only one exception (i.e. cee’d) has been found in a list of 55 brands of family cars. This low
frequency of occurrence contrasts sharply with the generalized use of this type of strategy
observed within the category of minis.
7. This trait is remarkably evident in brands like Porsche, Lotus Elise, Mitsubishi Eclipse, Jaguar
XJ and XK, which unfortunately cannot be reproduced here due to lack of permission.
8. Johnson (2005, pp. 24–27) has amply dealt with the pivotal role of image schemas in ab-
stract reasoning. Together with Lakoff and Johnson (1980, 1999), Lakoff (1987), and Lakoff and
Núñez (2000), among others, he argues in favor of an embodied logic, which recruits body-
based image-schematic structures for the understanding of abstract concepts and the drawing
of inferences. Such mappings, labeled as “primary metaphors”, provide perceptual anchors to
target notions which can be equally basic and central to human interaction and understanding
as their corresponding embodied source domains, but which are not directly apprehensible
through our perceptual systems (Grady, 1997, 2005).
References
Clausner, T.C., & Croft, W. (1999). Domains and image schemas. Cognitive Linguistics, 10, 1–
31. DOI: 10.1515/cogl.1999.001
Cortés de los Ríos, M.E. (2001). Nuevas perspectivas lingüísticas en la publicidad impresa
anglosajona. Almería: Servicio de Publicaciones de la Universidad de Almería.
Damasio, A. (1994). Descartes’ error. New York: Grosset/Putnam.
Felices Lago, A., & Cortés de los Ríos, M.E. (2009). A cognitive-axiological approach to print
eco-advertisements in The Economist: The energy sector under scrutiny. Revista de Lingüística
y Lenguas Aplicadas, 4, 59–78. DOI: 10.4995/rlyla.2009.735
Finke, R., Pinker, S., & Farah, M. (1989). Reinterpreting visual patterns in mental imagery.
Cognitive Science, 13, 41–78.
Forceville, C. (1998). Pictorial metaphor in advertising. London: Routledge.
Forceville, C. (2007). Multimodal metaphor in ten Dutch TV commercials. Public Journal of
Semiotics, 1, 19–51.
Forceville, C. (2008). Pictorial and multimodal metaphor in commercials. In E.F. McQuarrie
& B.J. Phillips (Eds.), Go figure!: New directions in advertising rhetoric (pp. 272–310).
Armonk, NY: M.E. Sharpe.
Gibbs, R.W. (1999). Taking metaphor out of our heads and putting it into the cultural world. In
R. Gibbs & G. Steen (Eds.), Metaphor in cognitive linguistics (pp. 145–166). Philadelphia/
Amsterdam: John Benjamins. DOI: 10.1075/cilt.175.09gib
Gibson, J. (1979). The ecological approach to visual perception. Boston: Houghton-Mifflin.
Grady, J. (1997). Foundations of meaning: Primary metaphors and primary scenes. Ph.D. disser-
tation at the University of Berkeley.
Grady, J. (2005). Image schemas and perception. In B. Hampe (Ed.), From perception to mean-
ing: Image schemas in cognitive linguistics (pp. 35–56). Berlin: Mouton de Gruyter.
DOI: 10.1515/9783110197532.1.35
Hidalgo-Downing, L., & Kraljevic-Mujic, B. (2011). Multimodal metonymy and metaphor as
complex discourse resources for creativity in ICT advertising discourse. In F. Gonzálvez
García, S. Peña Cervel & L. Pérez Hernández (Eds.), Metaphor and metonymy revisited
beyond the Contemporary Theory of Metaphor (pp. 153–178). Special issue of Review of
Cognitive Linguistics. Amsterdam: John Benjamins. DOI: 10.1075/bct.56.08hid
Johnson, M. (1987). The body in the mind: The bodily basis of meaning, imagination, and reason.
Chicago: The University of Chicago Press.
Johnson, M. (2005). The philosophical significance of image schemas. In B. Hampe (Ed.), From
perception to meaning: Image schemas in cognitive linguistics (pp. 15–34). Berlin: Mouton
de Gruyter. DOI: 10.1515/9783110197532.1.15
Kimmel, M. (2005). Culture regained: Situated and compound image schemas. In B. Hampe
(Ed.), From perception to meaning: Image schemas in cognitive linguistics (pp. 285–312).
Berlin: Mouton de Gruyter. DOI: 10.1515/9783110197532.4.285
Kimmel, M. (2008). Properties of cultural embodiment: Lessons from the anthropology of the
body. In R.M. Frank, R. Dirven, T. Ziemke & E. Bernárdez (Eds.), Body, language and
mind. Vol. 2. Sociocultural situatedness (pp. 77–108). Berlin: Mouton de Gruyter.
Klink, R.R. (2001). Creating meaningful new brand names: A study of semantics and sound
symbolism. Journal of Marketing: Theory and Practice, 9, 27–34.
Koller, V. (2009). Brand images: Multimodal metaphors in corporate branding. In C.J. Forceville
& E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 45–73). Berlin: Mouton de Gruyter.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind.
Chicago: University of Chicago Press. DOI: 10.7208/chicago/9780226471013.001.0001
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge
to western thought. New York: Basic Books.
Lakoff, G., & Núñez, R. (2000). Where Mathematics comes from: How the embodied mind brings
Mathematics into being. New York: Basic Books.
Lowrey, T.M., & Shrum, L.J. (2007). Phonetic symbolism and brand name preference. Journal
of Consumer Research, 34, 406–414. DOI: 10.1086/518530
Núñez Perucha, B. (2003). Esquemas de imágenes y modelos populares: Un estudio del lenguaje
de la victimización en textos narrativos en lengua inglesa. Logroño: AESLA.
Ortiz, M.J. (2010). Visual rhetoric: Primary metaphors and symmetric object alignment. Meta-
phor and Symbol, 25(3), 162–180. DOI: 10.1080/10926488.2010.489394
Peña Cervel, M.S. (2000). A cognitive approach to the image schematic component in the
metaphorical expression of emotions in English. PhD Dissertation. Universidad de La Rioja.
Pereltsvaig, A. (2011). “What’s in a name?”. Languages of the World (October 6, 2011).
http://languagesoftheworld.info/etymology/whatsinaname.html.
Pérez Hernández, L. (2011). Cognitive tools for successful branding. Applied Linguistics, 32,
369–388. DOI: 10.1093/applin/amr004
Ruiz de Mendoza, F.J. (2011). Metonymy and cognitive operations. In R. Benczes, A. Barcelona
& F.J. Ruiz de Mendoza (Eds.), What is metonymy?: An attempt at building a consensus view
on the delimitation of the notion of metonymy in Cognitive Linguistics (pp. 45–76). Amster-
dam/Philadelphia: John Benjamins. DOI: 10.1075/hcp.28
Sinha, C., & Jensen de Lopez, K. (2000). Language, culture and the embodiment of spatial cog-
nition. Cognitive Linguistics, 11, 17–41.
Sweetser, E. (1990). From etymology to pragmatics: Metaphorical and cultural aspects of seman-
tic structure. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511620904
Turner, M. (1991). Reading minds: The study of English in the age of Cognitive Science. Princeton:
Princeton University Press.
Umiker-Sebeok, J. (1996). Power and the construction of gendered spaces. International Review
of Sociology, 6(3), 389–403. DOI: 10.1080/03906701.1996.9971211
Velasco Sacristán, M.S., & Cortés de los Ríos, M.E. (2009). Persuasive nature of image schemat-
ic devices in advertising: Their use for introducing sexisms. Revista Alicantina de Estudios
Ingleses, 22, 239–270.
Appendix
Alfa Romeo, Audi, BMW, Chevrolet, Chrysler, Citroën, Dacia, Daewoo, Daihatsu, Dodge,
Ferrari, Fiat, Ford, Honda, Hyundai, Jaguar, Jeep, Kia, Lancia, Land Rover, Lexus, Lotus,
Mazda, Mercedes Benz, MG, MINI, Mitsubishi, Nissan, Opel, Peugeot, Porsche, Renault,
Toyota, VOLVO, Volkswagen.
Diana E. Popa
Dunarea de Jos University of Galati
1. Introduction
Metaphors are the essential core of human thought and creativity (Bronowski,
1972, p. 108). Seen as a central feature of human cognition that has evolved with
the development of language (Pinker, 1993), the ability to conceptualize one en-
tity in terms of another allows us to communicate through metaphor (Hart &
Long, 2011, p. 53).
From Black (1962) to Richards (1936) and Lakoff and Johnson (1980), theo-
rists have asserted metaphor’s irreducible cognitive force and the fact that many
of our actions are based on our metaphorical conceptions. A metaphor has such
cognitive force not because it provides new information about the world but be-
cause it reconceptualizes information that is already available (Kittay, 1989; Mio
& Katz, 1996). Moreover, we most often use metaphors to understand and experi-
ence the intangibles of a culture (Kövecses, 2005, p. 2). Metaphors, therefore, are
integral to public life, particularly to politics, not only at the verbal level but also
visually (Punter, 2007, p. 43).
doi 10.1075/bct.78.06pop
1. the ways in which the verbal, visual and auditory modalities employed con-
tribute to the construal of the multimodal metaphor; and
2. the functions of multimodal metaphors in political entertainment, particu-
larly in animated political cartoons, as they are constructed in the examples
under analysis.
detail. Sections 4, 5 and 6 are dedicated to outlining the criteria used to select
the corpus and the sample for the current study as well as the analytical method
employed. Finally, Section 7 includes concluding remarks and a discussion of po-
tential topics for further research in the field of multimodal metaphor.
Since the writings of Walter Lippmann (1922/1965), it has come to seem obvi-
ous that politics is too complex and abstract to be experienced directly by ordi-
nary people. For many people, it is easier to access the political world (Mio, 1997,
p. 114) through television, which has been the site where politics and public life
are played out and where the meanings of public life are generated and debated
(Craig, 2004, p. 4). Political entertainment in the media, particularly in television,
can therefore serve two distinct, but not necessarily mutually exclusive, purposes
in addition to its overtly expressed mission to entertain. The first possible purpose
is to oversimplify politics and its internal processes so that it can reach a much
broader audience, perhaps even reaching the passively uninterested. The second
possible purpose is to deliberately manipulate public opinion or, at the least, to
influence public perceptions of certain politicians and their political acts. Either
of these effects, however, may stimulate citizen engagement with politics only tex-
tually rather than in an organizational or participatory manner (see Jones, 2010;
Popa, 2011a).
Political entertainment is no exception to the uses-and-gratifications model
developed by Pierce, Beatty, and Hagnar (1982), which asserts that people get
what they want to get out of the media. Therefore, if people watch political en-
tertainment to understand politics and politicians and their actions, humorous
political programs will have an effect on their political awareness. If people con-
sume political entertainment to confirm their preexisting opinions, the programs
will not have any effect on their opinions. However, there may be people who
enjoy political entertainment purely for its entertainment value. For such view-
ers, political entertainment functions as an institutionalized genre that exposes
politicians’ foibles and hypocrisy but also guarantees that the target of the humor
will not truly be injured. The relief function of such conventionally permitted
laughter is also supported by the fact that the target of the humor is a member of
a different class than the audience. Thus, the humorous effect of political enter-
tainment is triggered by both the internal humor devices of a program and, more
importantly, by the very fact that politicians, as a class, are being disparaged. In
this way, humor strengthens in-group solidarity for those who share the same
political views.
At a time when television and the Internet are predominantly visual media, politi-
cal cartoons have adapted to a much more dynamic format that can easily com-
bine bright, funny cartoons with hard-hitting political analysis to appeal to all
types of audiences.
As Keefe (2004) and Fiore (2004) note, the attraction of this genre lies in its
graphics: the motion, sound, color and interactivity of animated political
cartoons set them apart from their print siblings. The goal of any cartoon, whether
print or animated, is, as Fiore (2004, p. 5) notes, to get a message across in an
engaging and entertaining way. The message comes first and the humor second,
but ideally, these elements affect the viewer simultaneously.
Moss (2007) highlights the fact that, despite technological progress, the
themes of political cartoons have remained unaltered; they all involve domes-
tic politics, social themes and, in some cases, foreign affairs. Most commonly,
cartoons address a current political issue or event, a social trend, or a famous
personality in a way that takes a stand or presents a particular point of view (El
Refaie, 2009a, pp. 184–185).
Tsakona (2009, p. 1172) notes that, because of their condensed form and the
interaction between language and image, cartoons are often considered to be a
direct and easily processed means of communicating a message and influencing
public opinion. As part of televised political culture, animated political cartoons
are a medium that combines satirical verbal discourse with a visual representation
of opinion.
In the specific context of Romania, but in an analysis that can easily be ex-
tended to other cultures, Popa (2011a, p. 140) identifies four distinct functions of
political animated cartoons:
As Teng (2009, p. 207) notes, one of the generic conventions of cartooning is the
inclusion of a critical stance toward a particular socio-political situation, event
or person. Thus, an individual seeking or occupying a position of authority, the
institution housing that position, the policies that the individual promotes or the
political system as a whole may all be targeted by animated political cartoons.
However, cartoons are not themselves obligated to provide a viable solution for
the problem identified.
At a functional level, print and animated cartoons operate in similar ways;
however, with respect to their internal mechanisms, there are noticeable differ-
ences between the genres. In print cartoons, pictorial metaphor and its more
complex form, the visual pun (as the concept is understood by Attardo and Chabanne
(1992), Hempelmann and Samson (2007) and Mitchell (2007)), seem to
be the most common mechanisms of humor; in animated political cartoons, one
of the most common mechanisms is the multimodal metaphor. This is not surpris-
ing given that, as Edelman (1971, p. 65) argues, metaphors are generally devices
for simplifying and giving meaning to complex or confusing sets of observations
that evoke concern. Because the field of politics is complex and confusing, the
use of multimodal metaphor in animated political cartoons is a way of explaining
the significance of real-life events and characters through an imaginary scenario
(El Refaie, 2009b, p. 176; Schilperoord & Maes, 2009). An imaginary scenario that
relies on an extended metaphor (see, among others, Crisp, 2005 and references
therein) that, in turn, sets up blended spaces (Fauconnier & Turner, 2003, p. 39)
results in both the creation of new meaning and the processing of information.
This effect may occur because metaphors, on the surface, simply draw a compari-
son between one thing and another, but in a more subtle way, they usually imply
an entire narrative and a prescription for action (Stone, 1988, p. 118).
The prescription for action may also be determined by the fact that meta-
phor, whether multimodal or monomodal, is argumentative (Moss, 2007;
Schilperoord & Maes, 2009). Consequently, metaphor can facilitate the persuasion
process through the exchange of information, further influencing the attitudes of
the audience. Though not explicitly, the argumentative nature of metaphor is also
an effect of its ability to convey both cognitive and emotional meaning within a
single framework. Cognitively, metaphor creates a feeling of enlightenment
during the processing of information; emotionally, it indirectly influences
attitudes by establishing a connection between the sender and the receiver
of the information.
4. The corpus
The sample analyzed in this paper was chosen from a larger corpus consisting
of 14 episodes of the 2007–2008 season of The Animated Planet Show (originally
titled Animat Planet Show). The script from 230 minutes of 2D computer anima-
tion was manually transcribed, stored and archived. Although the program was
discontinued at the end of 2008, most of the 2007–2008 season is still freely avail-
able at www.youtube.com.
As with all print and animated political cartoons, the show is meant to be
amusing, but its purpose is not solely humor for its own sake. Political cartoons
include a stance toward a particular socio-political situation, event or person, and
The Animated Planet Show is no exception. However, the show can only aim to af-
fect points of view, beliefs and perspectives on those socio-political affairs, not to
change or influence behavior. Thus, as with non-animated political cartoons, this
Romanian television program operates on two distinct levels: it tells an imaginary
story about a make-believe world, and, concomitantly, it refers to real-life events
and characters. The relationship between these levels of meaning is essentially
metaphorical.
We decided to select our examples from this particular corpus because the
program was representative of its genre in terms of structure and was successful
according to its record high ratings during its run (for a detailed analysis of the
corpus as well as a comprehensive description of the television phenomenon in-
ternationally, see Popa, 2011a, pp. 140–145).
5. Sample selection
The level of abstraction in the examples to be analyzed was perhaps less than that
found in most printed cartoons for the following reasons:
a. establishing the real-world referents for this animated cartoon was facilitated
by its producers in that all public figures were represented by a complex mix-
ture of real images and drawings, with photographs of their faces attached to
caricatured representations of their bodies;
b. the audience did not have to impose a narrative on the cartoon because the
narrative was explicitly and overtly constructed by a narrator or character in
the show; and
c. the text–image relationship was much more easily comprehended by the au-
dience in conjunction with simultaneous modes of cognitive processing for
sound, music, color, movement, and other modes, which were important re-
sources for signaling narrative sequencing.
The sample was taken from the episode aired on May 11, 2008. The entire episode
was 20 minutes long, which was the average duration for episodes of this pro-
gram. The examples given here are taken from one of the mini-narratives within
the episode; the mini-narrative has a duration of approximately 8 minutes and 50
seconds, including the time allocated for the passage from one mini-narrative to
another.
The root metaphor is POLITICS IS A FOREST. This metaphor is rich with connotations,
powerful and flexible in application, yet familiar to all. The rhetorical
fantasia (Edwards, 2001) of the animal tale is used to domesticate the political
world (Conners, 2005). The program relies on animal metaphors that are deeply
rooted in popular culture. As Haslam, Loughnan, and Sun (2011, p. 312) note,
the animal kingdom is a bountiful source domain and provides a rich metaphori-
cal vocabulary. Moreover, animal metaphors are particularly vivid and therefore
reduce ambiguity and increase the likelihood that the intended meaning of a criti-
cal stance will be understood.
In the following section, we focus in more detail on the analysis of the multi-
modal-metaphor examples, paying special attention to the way the verbal, visual,
and auditory modalities employed contribute to the construction of the multi-
modal metaphor.
construct the animal representation of each of the politicians included in the ani-
mal tale set in the forest.
As Sommer and Sommer (2011, p. 243) show, when used as metaphors for
human personalities, the generic terms ‘animal’ and ‘beast’ generally have exclu-
sively negative connotations. To refer to someone as ‘an animal’ or ‘a beast’ im-
plies that they are an ugly, uncouth, unpleasant individual. Sommer and Sommer
(2011) take their argument further and assert that most zoomorphs are negative
and that animals and beasts, including birds and insects, are uncomplimentary
when applied to people. In the specific context of animated political cartoons,
animal metaphors identify attributes that are socio-politically disapproved of. It
could be said that, under the circumstances, zoomorphs are “nothing but a polite
form of a more mean-spirited joke or putdown” (Pollio, 1996, p. 233). However,
animal metaphors should not be perceived by their targets as offensive because
their illocutionary force is satirical; in other words, they are part of a television
genre that qualifies as an institutionalized form of humor. Institutionalized hu-
mor has an acknowledged release function involving defunctionalization and lib-
eration from social constraints (Popa, 2011b).
The metaphorical representations in animated political cartoons can be clas-
sified as complimentary representations (examples 1, 4, and 9 in Table 1), uncom-
plimentary representations (examples 2, 5, 7, 8, and 10) (see Sommer & Sommer,
2011; Haslam et al., 2011) or ambivalent representations (examples 3 and 6).
In the first example (Plate 1), the animal metaphor was of the auditory-pic-
torial type. Pictorially, the target was cued using the same technique that was
used for all of the animal characters, namely, a caricatured body in the shape of
a particular animal to which the animators attached a photograph of the face of
a politician. The auditory modality used a stereotypical sound representing an
animated hopping hare.
The high-priority semantic markers in all animal metaphors profile areas of
general similarity among animals’ features; they are significant in determining
the cognitive/conceptual meaning of those animal metaphors (Ọlátéjú, 2005). In
this case, the high-priority semantic markers are [+animate] and [+animal], or in
some cases, variations such as [+animate] or [+bird] and [+animal].
The low-priority semantic markers contribute minimally to the cognitive/
conceptual meaning of the animal metaphors but contribute significantly to their
secondary or metaphorical meanings. In other words, low-priority semantic
markers determine the connotative interpretations of animal metaphors. In the
first example, the relevant low-priority semantic markers are [+speedy], [+un-
skilled] and [+novice].
As stated above, all animal-metaphor interpretation relies heavily on contex-
tual and cultural factors. Although the hare metaphor could be interpreted as
indicating someone is a novice player in the political campaign, in this case, it in-
dicated not that the individual was unskilled but that he was a winning candidate
who would outrun all of the other candidates in the Bucharest mayoral elections.
This interpretation was later verbally mediated by the dinosaur, Iliescu. Because
the hare was a relatively complimentary use of an animal metaphor, its critical
stance was not established by its mere presence but was constructed later through
the implied contrast with other, more uncomplimentary, animal metaphors such
as the dinosaur.
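The interplay of high- and low-priority semantic markers in the hare example can be sketched as a small data structure. This is an illustrative sketch only, not part of the original analysis: the class `AnimalMetaphor`, the function `connotations`, and the idea of passing in a set of markers ruled out by context are assumptions introduced here to make the reasoning explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnimalMetaphor:
    """Hypothetical container for the markers discussed in the text."""
    source: str               # animal used as the source domain
    high_priority: frozenset  # markers that fix the cognitive/conceptual meaning
    low_priority: tuple       # markers that carry connotative interpretations


def connotations(metaphor, ruled_out_by_context):
    """Return the low-priority markers that remain available in a context.

    `ruled_out_by_context` stands in for the contextual and cultural
    knowledge that blocks some connotations (an assumption of this sketch).
    """
    return [m for m in metaphor.low_priority if m not in ruled_out_by_context]


# Markers taken from the discussion of the first example (the hare).
hare = AnimalMetaphor(
    source="hare",
    high_priority=frozenset({"animate", "animal"}),
    low_priority=("speedy", "unskilled", "novice"),
)

# In the episode the hare stands for a likely winner, so the
# "unskilled" and "novice" readings are set aside.
print(connotations(hare, {"unskilled", "novice"}))
```

The sketch keeps the two kinds of marker apart because, on the account above, only the low-priority markers are filtered by context; the high-priority markers are shared by all the animal metaphors in the sample.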
In the second example (Plate 2) as well, the target and the source were rep-
resented partially using different modes. The target was shown in a caricatured
cricket body, playing a guitar and singing at the same time. Crickets are not a
highly zoomorphic species; therefore, the use of a cricket in the analyzed episode
relied heavily on viewers’ prior pragmatic knowledge regarding internal political
affairs. However, the mapped connotation that can be inferred from the cultural
reference may be that of a deceiver. The politician was being criticized for his
empty promises and ineffective actions.
The third example (Plate 3) was of the verbo-pictorial type. The target was rep-
resented partially in the verbal mode and partially in the pictorial mode; the verbal
mode anchors introduced the character along with the other characters in the situ-
ation. The two domains involved in this metaphor would likely have been recog-
nizable without the verbal message, but the verbal aspect confirmed the identity of
the target. The hedgehog is a highly zoomorphic species in Romanian culture, and
its association with a politician is not especially surprising. The ambivalent nature
of the hedgehog metaphor as it was used in this cartoon – i.e., that it had both
positive and negative connotations constructed in the verbal and pictorial modalities
– rendered it more complex in this case. Even if the pictorial representation
tended to associate the hedgehog with a positive character, the verbal message,
“What do you mean? You just hang in there and the rest doesn’t matter,” representing
the hedgehog’s attitude in the face of danger, depicted it in a rather unfavorable
light. The implied connotations were of self-protection and obstinacy to facilitate
a point that the cartoonists wanted to make through the animal metaphors they
introduced in the episode.
Plate 4. THE POLITICIAN IS AN ANIMAL: lion, bulldog and squirrel, and THE POLITICIAN IS A BIRD:
woodpecker (Animat Planet Show, 2008)
Examples 4, 5, 6 and 7 (Plate 4 above) were all introduced at the same time;
they were all of the sonic-pictorial type. Except in example 7, the target and source
would still have been identifiable even if the sonic factor were ignored. The audi-
tory modality involved stereotypical cartoon background music, which perhaps
reinforced the general situational theme rather than adding to the representations
of the specific zoomorphs. Given this analysis, it may be that examples 4, 5 and
6, as they would have been initially perceived by the audience, might qualify as
monomodal pictorial metaphors. The only complimentary zoomorph was the lion
(example 4). The target of this metaphor was not just any politician, but the President
himself (THE PRESIDENT IS A LION). The particular connotations mapped
from the source to the target would be those of being a notable person and being a
powerful, aggressive player. Although initially complimentary in nature, the lion
zoomorph was later used to express criticism, when the other characters emphasized
the lion’s intent to rule over not only the jungle but the forest as well (where
THE COUNTRY IS A JUNGLE and THE CAPITAL IS A FOREST).
paper but that cognitive meaning and connotative interpretation would be im-
paired.
In conclusion, a considerable amount of research has been carried out that
has highlighted the need to acknowledge the existence of multimodal metaphor.
Research in this area should perhaps be extended further to explore the effects
of multimodal metaphor in other forms of representation, such as postmodern-
ist theater. In the case of animated political cartoons, we would like to pursue a
cross-cultural investigation. Empirical data and case studies on instances in which
multimodal metaphor has been used for purposes other than simplification, ar-
gumentation or the propagation of a critical stance would provide the necessary
evidence to support the claim that multimodal metaphor is indeed operative and
that it is therefore a viable concept when compared with other methodologi-
cal tools.
Primary source
References
Attardo, S., & Chabanne, J.C. (1992). Jokes as a text type. Humour, 5, 165–176.
Black, M. (1962). Models and metaphors. Ithaca, NY: Cornell University Press.
Bosman, J., & Hagendoorn, L. (1991). Effects of Literal and Metaphorical Persuasive Messages.
Metaphor and Symbolic Activity, 6(4), 271–292. DOI: 10.1207/s15327868ms0604_3
Bounegru, L., & Forceville, C. (2011). Metaphors in editorial cartoons representing the global
financial crisis. Visual Communication, 10(2), 209–229. DOI: 10.1177/1470357211398446
Bronowski, J. (1972). Science and human values. New York: Harper Torchbooks.
Conners, J. (2005). Visual representations of the 2004 presidential campaign: Political cartoons
and popular culture references. American Behavioral Scientist, 49, 479–487.
DOI: 10.1177/0002764205280920
Craig, G. (2004). The media, politics and public life. Australia: Allen and Unwin Academic.
Crisp, P. (2005). Allegory, blending and possible situations. Metaphor and Symbol, 20(2), 115–
131. DOI: 10.1207/s15327868ms2002_2
Edelman, M. (1971). Politics as symbolic action: Mass arousal and quiescence. Chicago: Markham.
Edwards, J. (2001). Running in the shadows in Campaign 2000: Candidate metaphors in
editorial cartoons. American Behavioral Scientist, 44, 2140–2151.
DOI: 10.1177/00027640121958249
El Refaie, E. (2009a). Multiliteracies: how readers interpret political cartoons. Visual Commu-
nication, 8(2), 181–205. DOI: 10.1177/1470357209102113
Pollio, H. (1996). Boundaries in humour and metaphor. In J.S. Mio & A. Katz (Eds.), Metaphor:
Implications and applications (pp. 231–253). Mahwah, NJ: Lawrence Erlbaum Associates.
Popa, D. (2007). Media and the public sphere. In Y. Pasedeos (Ed.), International dimensions of
mass media research (pp. 493–504). Athens: Athens Institute for Education and Research.
Popa, D. (2011a). Political satire dies last: A study on democracy, opinion formation, and po-
litical satire. In V. Tsakona & D. Popa (Eds.), Studies in political humour (pp. 137–166).
Amsterdam/Philadelphia: John Benjamins. DOI: 10.1075/dapsac.46.10pop
Popa, D. (2011b). Political humour and the ritual of rebellion in computer-mediated
communication. Mélanges francophones, 5(6), 344–362.
Punter, D. (2007). Metaphor. New York: Routledge.
Raskin, V. (1988). Sophisticated jokes. In F. Shaun, D. Hughes & V. Raskin (Eds.), WHIMSY VII.
W. Lafayette (pp. 125–127). IN-Tempe, AZ: Purdue University International Society of
Humor Studies.
Richards, I.A. (1936). The philosophy of rhetoric. London: Oxford University Press.
Schilperoord, J., & Maes, A. (2009). Visual metaphoric conceptualization in editorial cartoons.
In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 213–240). Berlin/
New York: Mouton de Gruyter.
Sommer, R., & Sommer, B.A. (2011). Zoomorphology: Animal metaphors for human personality.
Anthrozoös, 24(3), 237–248. DOI: 10.2752/175303711X13045914865024
Stone, D.A. (1988). Policy paradox and political reason. Glenview, IL: Scott, Foresman.
Teng, N.Y. (2009). Image alignment in multimodal metaphor. In C. Forceville & E. Urios-
Aparisi (Eds.), Multimodal metaphor (pp. 197–212). Berlin/New York: Mouton de Gruyter.
Thompson, S. (1996). Politics without metaphors is like a fish without water. In J.S. Mio &
A. Katz (Eds.), Metaphor: Implications and applications (pp. 185–201). Mahwah, NJ:
Lawrence Erlbaum Associates.
Tsakona, V. (2009). Language and image interaction in cartoons: Towards a multimodal theory
of humour. Journal of Pragmatics, 41, 1171–1188. DOI: 10.1016/j.pragma.2008.12.003
1. Introduction
doi 10.1075/bct.78.07fen
100 Dezheng Feng and Kay L. O’Halloran
semioticians, how metaphors are visually expressed. In other words, we are in-
terested in the visual mechanisms which are used to construct metaphors. From
this perspective, Carroll (1996) and Forceville (1996) can be viewed as efforts
to describe the visual realization of conceptual metaphors, which accord with
El Refaie’s (2003) cognitive definition.
However, the descriptions are inadequate, as “there seems to be a whole range
of different forms through which metaphorical concepts can be expressed visu-
ally” (El Refaie, 2003, p. 80). We argue that the inadequacy is due to the lack
of understanding and systematic description of meaning making mechanisms in
visual images. As a result, cognitive linguists attribute the understanding of vi-
sual metaphors to the situational/cultural context, but pay less attention to the
text-internal mechanisms of visual images. From a semiotic point of view, while
acknowledging the role of context and human cognition, we argue that visual
images themselves are constructed in certain ways to cue metaphors. Therefore,
our aim is to provide a systematic account of the visual mechanisms for the re-
alization of metaphor, based on Kress and van Leeuwen’s (2006) social semiotic
visual grammar. This endeavor complements Feng’s (2011a) explanation of visual
grammar with conceptual metaphor theory.
The data analyzed in this study comprise 100 car advertisements, which
were chosen for their creative use of visual images (cf. Forceville, 1996). The main
conceptual framework of the social semiotic approach is presented in Section 2.
Then we discuss how metaphors are visually realized in representational mean-
ing structures in Section 3. The metaphorical meaning of interactive resources is
investigated in Section 4, and in Section 5 we examine how conceptual metaphors
are realized in the composition of visual images. Finally, we describe how a social
semiotic visual grammar can provide a comprehensive account of the visual con-
struction of metaphors and how conceptual metaphor theory lends epistemologi-
cal status to such a grammar in Section 6.
Alongside Forceville’s (1994, 1996) theory of pictorial metaphor, the field of mul-
timodal semiotics emerged, building on Halliday’s (1994) social semiotic theory
of language (known as systemic functional linguistics). In this approach, language
is modeled as sets of inter-related systems of choices which are metafunctionally
organized. The “systemic” principle regards grammar as systems of paradigmatic
choices which are represented as system networks. The “functional” principle
states that language simultaneously provides resources for constructing three
metafunctions: ideational meaning, interpersonal meaning and textual meaning.
In this sub-section, we propose a social semiotic model for the “object is object”
metaphor which is the focus of Forceville (1994, 1996). As mentioned above,
Forceville’s “metaphorical subject” and “pictorial context” are seen as belonging to
one unified grammatical unit in representational meaning structures. Represen-
tational meaning in visual images is modeled in terms of processes, participants
and circumstances, and each image is a configuration of choices from these three
categories. In narrative structures, the metaphorical subject relates to other ele-
ments through actional, verbal, or mental processes; in conceptual structures, it
relates to other elements through relational processes in the form of taxonomic
relations (classification processes), part-whole relations (analytical processes) or
identifying relations (symbolic processes).
Defamiliarization metaphors are mainly constructed by anomaly, or un-
conventionality, of visual elements in the representational structure, in a similar
manner to the colligational interpretation of metaphor in language (Goatly, 1997,
p. 111). As there is variation in the conventions associated with different process
structures, it is important to examine the different types of anomaly. In what fol-
lows, we shall investigate how metaphors are realized in actional, classificational
and analytical processes.
In actional processes, conventional participants (i.e. actors or goals) or cir-
cumstantial elements associated with certain actions are substituted by uncon-
ventional ones, with the former as the source domain and the latter as the target
domain. For example, in a car advertisement in Feng (2011a, p. 63), the car is
worn on a man’s wrist like a watch. Apparently, the car takes the place of a watch,
which results in colligational anomaly. By taking the place of a watch, the car
adopts its attributes, constituting the metaphor CAR IS WATCH. The medium of
an action (e.g. the tools which are used to perform the action, see Halliday, 1994,
p. 154) can also be substituted. In Forceville’s (1994, p. 10) example, a person is
killing himself by pointing a gas nozzle at his head. The metaphor GAS NOZZLE
IS GUN is constructed because the gas nozzle adopts the role of a gun. El Refaie’s
(2003, p. 79) example in which a group of Kurdistan refugees are holding the flag
with its inscription “New Kurdistan” can also be explained with participant
substitution. Conventionally, the army carries the flag and claims sovereignty after
conquering a place. Here, the conventional actor is substituted by refugees, which
results in the metaphor REFUGEES ARE ARMY/INVADERS.
In classificational processes, metaphor is constructed in two ways. First, en-
tity a is an unconventional member of a category whose conventional member is
entity b. As a result, a borrows the salient features of b and the metaphor a is b is
formed. For example, in a car advertisement from Qilu Evening Paper on 19 July
2008, the image shows five athletes ready to run a 100-meter race, but the middle
track is occupied by a car. As a result, the car adopts the most salient feature of the
athletes, that is, speed. In Teng’s (2009, p. 198) example, where an American newspaper
is put among horror books on a bookshelf labeled “horror”, the resultant
metaphor AMERICAN NEWS IS LIKE HORROR NOVELS is another case of this type of
realization. Second, two entities may be put together unconventionally to form a
covert category (Kress & van Leeuwen, 2006). The formation of covert categories
requires a crucial visual feature – that is, symmetry in composition, such as equal-
ity in size, framing and arrangement (Kress & van Leeuwen, 2006, p. 79). This
process is similar to the visual simile in Forceville’s (1996) categorization, but the
conceptualization of two juxtaposing entities as forming an unconventional co-
vert category helps to explain the metaphorical mapping – the unconventionality
of the category alerts us to the metaphor and being members of the same category
makes the mapping of attributes possible. However, the source and target do-
mains cannot be structurally determined in this case because they are represented
on the premise that they are equal, and we have to draw upon other cues like the
linguistic context and the discourse purpose. The advertisement in Plate 1 is a
good case in point. The minivans are juxtaposed with weight-lifting champions.
They form a covert category by being identical in number and arrangement. Since
it is an advertisement for the minivan, the minivan is the target and the metaphor
thus formed is MINIVANS ARE WEIGHT-LIFTING CHAMPIONS. The salient feature of
the athletes, that is, strength, is mapped onto the minivans.
Plate 1. Wuling minivan, from Qilu Evening Paper, 19 July 2008, A10
(reproduced with permission)
Anomaly in analytical processes occurs when there is an unconventional part
in the whole. This can happen in two ways. First, the unconventional part a takes
the place of the conventional part b and hence inherits its salient features. The
well-known example from Forceville (1996, p. 110), which shows a man’s torso
with a suit but with the tie substituted by a shoe, illustrates this type of metaphor.
By taking the place of the tie, the shoe inherits the salient features of the tie and
the metaphor SHOE IS TIE is formed (see Forceville, 1996, p. 10 for detailed analysis).
However, there are also rare cases in which the substituted part is the target,
as in Forceville’s (1996, p. 123) car advertisement in which the life buoys take the
place of car tires. The metaphor formed is CAR TIRES ARE LIFE BUOYS, in which the
unconventional part is the source. This is where the structural cue of realization
contradicts the contextual cue, and we have to resort to the latter to identify the
metaphor.
Second, an entity (or part of it) is superimposed on another entity (or part
of it). The superimposition may or may not change the conventional identity of
the entity. If it doesn’t, the superimposed entity becomes an unconventional part
of the whole, and as in the case of substitution, the unconventional part is the
target. However, this case differs from substitution, because the superimposed
entity inherits the attributes of the whole of which it forms a part. We can call this
type Superimposition 1 (S1). An example is found in Yus (2009, p. 162), where a
saucepan has an image of the continents of the earth superimposed upon it. The
superimposition doesn’t change the identity of the saucepan. As part of the saucepan,
the earth inherits one of its attributes, that is, warming up gradually.
The superimposed entity may also change the identity of the original image
and they together form an unconventional whole, or a hybrid, similar to the for-
mation of covert categories in classificational processes. In this case, the superim-
posing part is the source and its salient features are added to the whole. We shall
call this type Superimposition 2 (S2). For example, in an advertisement in which
a pair of butterfly wings is added to a motorbike, the salient features of butterflies
such as beauty and lightness are projected onto the motorbike, which produces
the metaphor motorbike is butterfly. However, this example can also be seen
as the motorbike substituting the body of the butterfly, which results in the same
metaphor. Yus (2009, p. 164) provides a similar example in which dice dots are
superimposed on a ballot box. The superimposing part is the source which lends
attributes to the entity it is superimposed upon. It can also be interpreted as the
ballot box substituting the body of the dice. Either way, the ballot box borrows
the features of the dice and results in the metaphor ballot box is dice, which
further stands for election is gambling.
[Figure 1: Narrative anomaly (participant substitution, circumstance substitution); member substitution; unconventional covert category; Analytical anomaly (part substitution, part superimposition)]
To summarize, we have examined the visual mechanisms for realizing metaphors
in representational structures, as illustrated in Figure 1. The slanted arrow
denotes realization. For example, visual anomaly in narrative structures is
“realized” as participant substitution and circumstance substitution. The analysis
shows that the source and target domains of defamiliarization metaphors can
mostly be identified by examining the way they are represented (i.e., anomaly in
different process structures). However, since metaphors are not constructed by
decontextualized visual components, representational resources alone may not
be able to specify the source and the target. Moreover, the cues from the process
of construction may contradict the more explicit contextual cues of interpretation.
Such awareness of context makes our approach social semiotic, whereby
representational anomalies are seen as resources for metaphor realization, rather
than as rigid semiotic codes. In this sense, our framework only describes the met-
aphor potential in the representational structure, without claiming that structural
anomalies are able to determine all visual metaphors independently.
Plate 2. Toyota Camry, from The Straits Times, 4th October, 2008, C2
(reproduced with permission)
Cognitive studies of visual metaphor mostly focus on what is in the image, instead
of how the image is represented. In social semiotic terms, only representational
resources are investigated, while interactive and compositional resources remain
largely implicit. As an exception, El Refaie (2009) discusses the visual realization of
orientational metaphors in political cartoons, associating spatial orientations with
concepts like power and time. However, a systematic account of the metaphori-
cal meaning of spatial orientations in visual images is not yet available. Kress and
van Leeuwen’s (2006) semiotic model of interactive and compositional meaning
resources provides a comprehensive framework for systemizing such metaphors.
Building on Feng (2011a, 2011b), we discuss the visual realization of metaphor in
interactive and compositional meaning structures in Sections 4 and 5.
Interactive and compositional meaning resources construct conventional
metaphors. The mappings between the source and target domains in conventional
metaphors are not based on similarities, but on correlations derived from our ba-
sic experience of the world (Lakoff & Johnson, 1980, p. 155). The interpretation
of such metaphors does not depend on immediate context, but on physical and
cultural experiences that are common to human beings in general or to specific
cultural communities (Lakoff & Johnson, 1980, p. 14). Therefore, to prove the
validity of the conventional metaphors realized by interactive and compositional
resources, we need to provide their experiential bases.
Interactive meanings include contact, social distance and subjectivity. Accord-
ing to Kress and van Leeuwen (2006), contact is realized by gaze, social distance
by shot distance and subjectivity by camera angle (see also Dyer, 1989; Messaris,
1994). Since gaze and camera angle converge in most cases (i.e. gaze normally
converges with front angle and absence of gaze with oblique angle), we shall only
discuss the resources for social distance and subjectivity, under the term camera
positioning.
From a cognitive perspective, the relation between camera positioning and
interactive meaning is the metaphorical mapping between the source domain and
the target domain (Feng, 2011a). This mapping can be considered as a master
metaphor which entails all the sub-mappings between camera positioning and in-
teractive meaning. In this way, Kress and van Leeuwen’s (2006) descriptive gram-
mar is reformulated as a conceptual metaphor system that is visually realized, as
shown in Figure 2.
Close relation is close shot      Image power is low angle       Involvement is frontal view
                                  Equality is eye-level angle
Distant relation is long shot     Viewer power is high angle     Detachment is back view
Figure 2. The visual realization of metaphor in interactive resources (Feng, 2011b, p. 27)
To prove the validity of the metaphor system, we need to provide experiential
bases for the mappings. The metaphorical meaning of camera positioning is premised
on the iconic nature of visual images. That is, shot distance reproduces the
structural features of physical distance in real life and camera angle reproduces
features of the ways we look at and interact with people. The basis of the mapping
between physical distance (hence shot distance) and social distance is well estab-
lished in the study of proxemics (e.g. Hall, 1969) and will not be elaborated here.
The mapping between image-viewer power relation and vertical camera angle is
based on the structural features of real-life situations in which we “look up” to
powerful people and “look down” upon weak people (Messaris, 1994, p. 9). The
mapping between involvement and horizontal camera angle is based on real life
situations where we face the person we want to interact with and gaze at him/her,
and turn our face (gaze) away if we don’t want to interact.
Through these experiential bases, it can be argued that these metaphors do
exist and are conventionalized in our ordinary conceptual system. However, these
conventional or default interpretations of camera positioning may be overridden
by other factors in specific contexts. For example, Dick (2005, p. 53) points out
that sometimes film scripts require a high or low angle shot for the sake of consis-
tency rather than for symbolism. For this reason, social semiotic interpretations
are often criticized for being too rigid, while in reality the connections are fluid
and subject to change. From the cognitive perspective, this is because certain se-
miotic choices (e.g. low angle) are not motivated by the default experiential basis,
but by other factors (e.g. intertextual and discursive consistency). In such cases,
the overriding factors are usually more salient and point to one specific interpre-
tation. In the social semiotic approach, we do not consider camera positioning, as
well as composition, as rigid semiotic rules, but as resources for making meaning.
Ambiguity may arise as a result, particularly as metaphors are by their very nature
open to more than one interpretation (El Refaie, 2009, p. 182).
In the corpus of 100 print car advertisements, 67% use a high camera angle,
which suggests the advertisers’ intention to build consumer power (Feng, 2011a).
Lakoff and Johnson (1980, p. 126) point out that linguistic forms are endowed
with content by virtue of spatial metaphors. This observation is certainly appli-
cable to visual images where space plays an even more important role across a
larger number of dimensions. For example, visual semiotic resources include
the spatial positioning of different elements, their relative size, and the distance
between them. These semiotic resources construct compositional meanings of
information value, salience and framing (Kress & van Leeuwen, 2006). The infor-
mation values of given/new, ideal/real and important/unimportant are realized by
the spatial orientations of left/right, up/down and central/marginal respectively.
Salience and framing are not abstract concepts and will not be discussed, but the
size and the distance between elements are included in the visual metaphor sys-
tem, as shown in Figure 3.
Similar to interactive meanings, visual compositional meanings are also de-
rived from our embodied experience. Given is left/new is right is based on
the experience that in most cultures, people write and read from left to right, so
we take the left as given information and the right as new. In ideal is up, “ideal”
has two different but related entailments, that is, desirable and unrealistic (Feng,
2011a, p. 59). Desirable is up is synonymous with the well-established metaphor
good is up (Lakoff & Johnson, 1980) and will not be further explained. Unreal-
istic is up uses a different sense of “up” – that is, high. It is difficult or unrealistic
to get things that are too high (e.g. stars). Therefore, ideal things, while desirable,
6. Conclusion
this study has demonstrated that the integration of social semiotics and cognitive
metaphor theory is significant for the understanding and explanation of visual
semiosis. Therefore, we conclude with the hope that these two theoretical ap-
proaches will be combined in further explorations of multimodal discourse.
Acknowledgement
The research for this article was supported by the Interactive Digital Media Pro-
gram Office (IDMPO) in Singapore under the National Research Foundation’s
(NRF) Interactive Digital Media R&D Program (Grant Number: NRF2007IDM-
IDM002-066).
References
Kress, G., & van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.).
London: Routledge.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Messaris, P. (1994). Visual literacy: Image, mind and reality. Boulder: Westview Press.
Morris, R. (1993). Visual rhetoric in political cartoons: A structuralist approach. Metaphor and
Symbolic Activity, 8(3), 195–210. DOI: 10.1207/s15327868ms0803_5
O’Halloran, K.L. (2005). Mathematical discourse: Language, symbolism and visual images.
London: Continuum.
O’Toole, M. (2010). The language of displayed art (2nd ed.). London: Routledge.
Teng, N.Y. (2009). Image alignment in multimodal metaphor. In C. Forceville & E. Urios-
Aparisi (Eds.), Multimodal metaphor (pp. 197–211). Berlin: Mouton de Gruyter.
Yus, F. (2009). Visual metaphor versus verbal metaphor: A unified account. In C. Forceville &
E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 147–172). Berlin: Mouton de Gruyter.
This article aims to explore how the use of visual metonymies in picture books
contributes to children’s understanding of stories and, in turn, attracts their at-
tention towards relevant aspects of the plot. The two picture books selected for
analysis are Gorilla, by Browne, and The Tale of Peter Rabbit, by Potter, intended
for children under 9 years of age.
A multimodal and cognitive perspective is adopted here to apply the non-
verbal trope of visual metonymy to the two picture books that form the sample
texts (Forceville, 2009, 2010; Forceville & Urios-Aparisi, 2009).
The results of the analysis show that visual metonymies are essentially used
in children’s tales to create narrative tension in certain stages of the plot and, in
turn, to establish a bond between the represented participants and the child-
viewer.
Two fundamental tropes within cognitive linguistics are the metaphor and the
metonymy, defined as phenomena of thought used to conceptualise reality by
means of the relationships that are established between a source and a target do-
main. Cognitive scholars have been interested essentially in the metaphor as a
figure of thought which may be used to represent abstract entities in terms of
more concrete phenomena, but in recent decades the verbal metonym has also
been a focus of study by renowned linguists such as Barcelona (2000), Dirven and
Pörings (2002), Gibbs (1994), Panther and Thornburg (2003), Ruiz de Mendoza
(2000, 2002, 2011) and Taylor (2002), among others. While these authors have
based their research on verbal manifestations of language, Forceville (1996, 2002,
2006, 2009, 2010) has extensively studied the potential of visual metaphor and
metonymy in multimodal discourses.
doi 10.1075/bct.78.08moy
© 2015 John Benjamins Publishing Company
116 A. Jesús Moya Guijarro
In his several works on multimodal tropes, Forceville (1996, 2006, 2009, 2010)
has emphasized repeatedly that visual tropes may behave differently depending
on the genre in which they are used. Therefore, it is important for the theorization
of visual metaphors and metonymies to consider their occurrence in a variety of
genres and analyze how these create meaning in specific contexts. Specifically,
this article aims to explore the occurrence of visual metaphors/metonymies in
children’s picture books in order to study how visual tropes may contribute to
children’s understanding of the stories and, in turn, draw their attention towards
relevant aspects of the plot.
Forceville (1996, 2002) has analysed the manifestations of non-verbal met-
aphors in advertising discourse within a cognitive linguistic framework, high-
lighting that conceptual metaphors can be manifested not only verbally, but also
non-verbally and multimodally. He defines a multimodal metaphor as a cognitive
process in which the target and the source domains are represented completely
or predominantly in different modes (Forceville, 2006, p. 384). Through meta-
phor an entity or concept provides mental access to another phenomenon which
belongs to a different conceptual domain. In the sample texts that are studied in
this article, no visual metaphors have been identified. This is most likely due to
the fact that they are picture books intended for children under nine (Cerrillo &
Yubero, 2007; Moya & Ávila, 2009), whose cognitive abilities are still in the process
of development. Consequently, these young readers are not yet cognitively mature
enough to understand the communicative potential of
metaphorical constructions.
Nevertheless, the illustrators have used monomodal visual metonymies to
transmit relevant aspects of the narrative content of the stories to the young readers
through the visual mode (Forceville, 2009). Two well-known 20th century
children’s picture books, which respond to a standard of literary quality (Cerrillo
& Yubero, 2007) have been selected as sample texts to study the trope of visual
metonymy. In the selected stories, the texts, as well as the illustrations, play a
fundamental role in the construction of the plot, as is usually the case in tales
intended for young children between 0 and 9 years of age (Cerrillo & Yubero,
2007; Hunt, 2004; Moya & Ávila 2009). The two books selected for analysis are
Gorilla (Browne, 2002[1983]), written and illustrated by Browne, and The Tale of
Peter Rabbit (Potter, 2002[1902]), written and illustrated by Potter. These stories
are contemporary classics, that is, works that can be considered models to imitate
due to their notable literary quality. They have remained successful among
children for generations after they were written.
The article is structured in the following way. After the introduction, in
Section 2 the main features of the concept of visual metonymy are outlined.
modes. He explains the necessity of dealing with non-verbal tropes as follows: “If,
as Lakoff and Johnson famously claim, ‘the essence of metaphor is understand-
ing and experiencing one kind of thing in terms of another’ (1980, p. 5), it is
inevitable that scholars investigate other modes/modalities than language alone”
(Forceville, 2010, p. 57). In line with Lakoff and Johnson (1980, p. 36), Forceville
(2009, p. 69) considers that metonymy fulfils a referential function as it involves
a cognitive process by means of which, in a specific context, we use an entity to
stand for another that belongs to the same conceptual domain. In his analysis of
advertising discourse, Forceville (2009, p. 69) demonstrates that context is crucial
to understand the metonymic relationship that is established between a target and
a source domain.
Following Bartsch (2002, pp. 50–51) and Warren (2002, p. 123), Forceville
(2009, pp. 57–58) also acknowledges that there is always a reason for a speaker/
writer to use a metonymy in a specific context and that this reason can be ex-
plained in terms of relevance (Sperber & Wilson, 1995) and communicative in-
tentions (Gibbs, 1999). In fact, metonymies are frequently used to highlight some
aspect of the message and attract the reader’s attention to relevant parts of a mul-
timodal ensemble. Forceville (2009, p. 58) affirms that, “the choice of metonymic
source makes salient one or more aspects of the target that otherwise would not,
or not as clearly, have been noticeable, and thereby makes accessible the target
under a specific perspective […].” The use of a metonymy often implies a change
in salience or perspective.
Along this line of thought, Yu (2009) and Urios-Aparisi (2009) have also
studied the non-verbal manifestations of metaphor and metonymy in educa-
tional TV advertisements and TV commercials, respectively. In turn, Hidalgo &
Kraljevic-Mujic (2011) have also analyzed the role of multimodal metaphors and
metonymies in ICT advertising discourse. In addition, Pérez (2011, 2013a, 2013b,
2014) attempts to theorise the concept of metonymy in combination with
notions from picture/film theory by analysing greenwashing and environmental
campaigns in advertising discourse. Although she is in the initial stages of
her research, by adopting Ruiz de Mendoza & Díez’s (2002) and Urios-Aparisi’s
(2009) accounts of metaphor-metonymy interaction as well as Forceville’s (2002)
multimodal proposal on pictorial metaphor, she has found that metaphor and
metonymy play a key role in the construction of persuasive meaning in advertising
discourse. The current article attempts to contribute to the research on non-verbal
metonymy in discourse by exploring the potential of utilizing this trope in
a sample of two picture books intended for children. My interest resides essential-
ly in researching how visual metonymy contributes to representing the narrative
reality that visual artists try to transmit to their young readers in picture books.
3. The analysis
3.1 Gorilla
Gorilla by Anthony Browne is the first picture book analysed in this article.
The story is constructed on the basis of the relationship established between
Hannah, the protagonist, her real father and a gorilla, who acts as a stand-in
father. Hannah is a lonely girl who is looking forward to seeing a real gorilla. Her
father, however, does not have enough time to take her to the zoo. The night be-
fore her birthday, something amazing happens. During the night, the toy gorilla
she gets as a birthday present comes to life and takes Hannah to the zoo to see the
gorillas. In the morning her real father also invites Hannah to go to the zoo, add-
ing a happy ending to the story.
The trope of visual metonymy plays a key role in the construction of the
narrative reality transmitted in this picture book. Sometimes, the protagonists
referred to in the verbal language are reflected in the illustrations through part/
whole metonymies. Eight visual metonymies have been identified in Gorilla, and
they help to emphasize some important aspects of the story’s plot. In each of
these, the target and the source belong to the same conceptual domain. In double
spread 9, for example, reproduced here as Plate 1, there are two close-ups that
show the faces of two primates. The primates are represented through a visual metonymy
as one part (their heads) stands for the whole (the animals themselves),
attracting the reader’s attention in a special way. The use of visual metonymy in
these demand images, those that establish eye-contact between the represented
participants and the viewer (Kress & van Leeuwen 2006, p. 118), encourages the
child-reader’s empathy with the two characters depicted, in this case an orang-
utan and a chimpanzee. The face of the primates is the focus of the two illustra-
tions and through them the illustrator reveals their sad faces behind the bars of
their cages. Although they are wild creatures, the illustrator gives them humanised
features as they demonstrate a certain sadness in their eyes. This may be due to
their lack of freedom. In both cases, Browne uses frames within frames since the
Represented Participants (RPs) are presented within square and rectangular en-
closures and covered by bars which in turn are metonyms for cages. The pictures
clearly reflect that both animals are looking directly at the viewer, seeking support
and perhaps even imploring the reader to free them from their imprisonment. As
Gill (2002, pp. 57–58) acknowledges, “the effect of their direct engagement with the
viewer is evocative, with the viewer almost feeling the animals beseeching them
to free them. The fact the animals are depicted at close range enhances this effect,
since the reader is positioned to engage with them on an intimate level.” In picture
books, this type of metonymic demand image is not common, as its utilization
usually interrupts the development of the narrative plot. However, Browne has
used them to achieve a strong engagement between the RP and the child, and
forge the identification of the latter with the animals.
Apart from these two part-whole metonymies (face-character), which show
the sad faces of the two primates locked in cages, there are also another two close-
ups in double spread 13 that reveal the face of Hannah smiling with the gorilla. In
this case, the reference to their faces is again a metonymy for the characters them-
selves, who communicate through visual language the way they feel. Here Hannah
is looking at a toy gorilla and the viewer can only see the character’s head. Hannah
is lying down in bed almost covered up to her head. Only her hair, one of her eyes
and her nose can be seen. Through this part-whole metonymy, Browne places
emphasis on the visual contact between Hannah and the gorilla, since the met-
onymic representation of the characters establishes the gazes that are exchanged
between them. This clearly contrasts with the lack of visual and verbal communi-
cation that characterizes the relationship between the main character and her real
father, as they never make eye contact with each other (Gill, 2002; Moya, 2011a).
In this way, the metonymy suggests an intimate/personal relationship that pro-
duces the effect of close proximity between Hannah and her substitute father, the
gorilla. Thus, while Hannah, the primates at the zoo and the gorilla are sometimes
depicted showing their feelings to the child-reader, Hannah’s father never appears
in a close-up or frees his emotions. In fact, he is never depicted in a demand
robe for Hannah, represents the protagonist in a more abstract way. The function
of the trope in this specific case is to accentuate the sensation of speed (a flash of
her red robe speeding down the stairs is enough to refer to Hannah in her totali-
ty), supported by the material process “rush”, referred to in the verbal component:
“Hannah rushed downstairs to tell her father what had happened.” The blur of red
reveals that the character is Hannah, since in the following illustration printed on
the same double spread, Hannah appears sitting on a chair in front of her father
with a red robe on.
I will now identify the metonymies that play a key role in the construction of real-
ity in the thirty-two double spreads that are contained in The Tale of Peter Rabbit,
in which verbal and visual elements are intertwined in the verso and the recto of
the pages that make up the story. Before identifying the visual metonymies found
in them, I will briefly refer to the plot of the story. Peter disobeys his mother and
trespasses on Mr McGregor’s garden where he greedily enjoys a feast of vegetables
until he feels sick. He is almost caught by the owner of the farm, Mr McGregor,
and runs the risk of losing his life on several occasions. After some risky encoun-
ters with Mr McGregor, Peter manages to escape from the garden and comes back
home where his mother gives him a dose of chamomile tea to relieve his pain.
Meanwhile, Peter’s sisters, Flopsy, Mopsy, and Cotton-tail, who have been good
girls, have bread, milk and blackberries for supper.
The first metonymy is identified in the first illustration of the tale and fulfils
the discourse function of introducing the main character to the reader, making a
specific feature of his personality evident. In the first double spread, the rabbits are
located near the tree trunk where they live. Mrs Rabbit, the mother of the litter,
directs her gaze directly at the viewer, inviting him into the story and introduc-
ing her children, as the verbal language also does through a there-construction:
“ONCE UPON A TIME there were four little Rabbits, and their names were –
Flopsy, Mopsy, Cotton-tail and Peter”. In this scene, although both the verbiage
and the illustration are essential to the creation of the story, the illustration
conveys more relevant information about Peter’s personality than the words do
(Moya, 2010). Peter is represented through a visual metonymy: Peter’s tail is metonymic
for Peter. The picture clearly reflects that Flopsy, Mopsy and Cotton-tail
do not share the same personality as the main character. While his sisters show
their heads to the reader, Peter is playing in the burrow and only his backside can
be seen. He is absorbed in his own world and reveals a different attitude. Thus, the
visual component seems to anticipate the rebel nature of the main character, who
later trespasses on Mr. McGregor’s garden, not heeding his mother’s advice.
Another interesting metonymy, the ears of the rabbit for Peter, is found in
double spread 19. The illustration reflects that Mr McGregor and Peter are in the
shed. Peter is hidden in a watering can. Only his ears can be seen. Presumably
the part-whole metonymy, ears for Peter, shows that the rabbit, aware of the risk
he is running, is trying to hide from the old man, who wants to capture him and
probably turn him into a rabbit pie. At least this was what happened to his father
when he broke into the farmer’s fields in the past. Mrs. Rabbit reminds Peter of
this ordeal before she sends her litter into the fields: “Your Father had an accident
there; he was put in a pie by Mrs. McGregor” (double spread 3). In turn, this
metonymy puts the child-viewer in a position of dominance as he knows more
about what is going on than either Peter, the main character, or Mr McGregor.
The enemy of the rabbit, Mr McGregor, unlike the child-viewer, does not know
where the protagonist is hidden (Moya, 2010). Through this metonymy, Potter
manages to put the child-viewer on the protagonist’s side, since the main charac-
ter involves him directly in the plot. This picture book was initially intended for
children of the English middle class in the Victorian era. This period was charac-
terised by strict and conservative manners in court and in children’s education. In
line with the moralising literature addressed to children that Potter was familiar
with, the author, who also doubles as illustrator, probably followed the ideolog-
ical requirements of the Victorian period. And, indeed, some moralistic values
predominate in the verbal narrative: whilst Peter’s sisters behave properly and,
consequently, are rewarded at the end of the tale with a nice supper, Peter, after
having disobeyed his mother, is almost killed by Mr McGregor and ends up with a
stomach-ache. However, the tale seems to be more than a story in which a charac-
ter has exposed himself to a risky situation by disobeying his mother’s advice. As
has been observed in the critical literature (Carpenter, 1989, p. 279; Scott, 2001,
p. 29), Potter’s voice seems to be that of a rebel in defence of liberty and natural
instinct (Moya, 2010). What the author may not be able to express in words is re-
flected in the visual component of the story. Despite Peter’s disobedience, Potter’s
visual techniques move the reader to be on the side of the defenceless and scared
rabbit and also perhaps to wish for his escape from his oppressor. Therefore, the
narrative function of this second metonymy is to encourage the child’s empathy
towards the main character in the story.
Finally, illustration 20, reproduced here as Plate 4, provides another metonymy,
also used by Potter to win the child-viewer over to the protagonist’s side.
The large foot of Mr McGregor is just about to step on the little rabbit.
Once again there is a metonymy that reflects a part (Mr McGregor’s boot) for a
whole (the old man), while at the same time intensifying the narrative tension
and the immediacy of the threat. Peter is about to be stepped on by his aggressor
near a small window, also reflected metonymically. The fact that Mr McGregor is
reflected through metonymy gives a greater sense of tension in the plot, as the
young reader can contemplate how close the protagonist, the little rabbit, is to
being trapped by the adult. If Mr McGregor had been represented completely, the
image, without a doubt, would offer less narrative tension and the danger would
be less imminent. In turn, the part-whole metonymy, foot for Mr. McGregor,
is probably used by Potter to dehumanize the old man, who is depicted more
as an entity of destruction than as a human being. The studs on the sole of Mr
McGregor’s boot contribute to representing the farmer as a dangerous man who
is utterly devoid of human feelings.
4. Conclusions
luck. In this manner Potter encourages the readers to empathise with the protagonist
of the story, since she makes them accomplices in the predicament
that Peter is in at this narrative moment. At other times, the protagonist, Peter, is
represented through a visual metonymy in order to create narrative tension. Evi-
dence of this is the metonymy Mr. McGregor’s boot for Mr McGregor, when he is
just about to step on Peter before he escapes from the toolshed where he tried to
hide to avoid being caught by the old man. This metonymy intensifies the narra-
tive tension in the story as it shows that Peter is just about to be stepped on by one
of the old man’s big feet.
To the two main functions of visual metonymies already referred to, another
function may be added. The utilization of visual metonymies may encourage the
interaction of the child-viewer with the adult in various ways, be it teacher, parent
or grand-parent. The kind of metonymic depiction used in the tales studied in
this article may lead to questions such as: (i) Where is Peter in this picture? What is
this (shoe) on the right-hand side of the picture, and whose is it? (The Tale of Peter
Rabbit), and (ii) What is this red blob in the left-hand part of the illustration? Is
it Hannah or someone else, and how do you know? (Gorilla). This way the child
will be socialized into the reading experience and will adopt an active role in the
understanding of visual language. In addition, by being involved in this dialogic
linguistic experience the child reinforces his apprenticeship into the dialogic na-
ture of language, where the mutual completion of utterances plays an important
role (Purver, Howes, Healey, & Gregoromichelaki, 2009). Thus, the pictures in
the two tales do not serve to merely illustrate the stories, as they do in books for
older children, but to involve the child-viewer in dialogic interaction with the
adult (Moya, 2011b).
The analysis carried out also reveals that, although Forceville (2009, 2010)
distinguishes different types of metonymies (producer for product, object for
user, controller for controlled, institution for people responsible, the place for the
institution, the place for the event, etc.), the most common, at least in our sample
texts, is that in which a part stands for the whole. The part-whole metonymy is
based on the premise that we perceive paintings or photographs not by looking
at their individual elements, but discursively, that is, their elements are contem-
plated as arranged into a visual syntax and constituting a whole (Bal, 2006, p. 69).
Thus, when we look at a visual composition, any element can act metonymically
and one of its parts can stand for the whole. The characters are not always repre-
sented totally in the visual mode; in some stages of the tales, only some parts of
their bodies are drawn to refer to them in general. These visual representations
emphasise some key aspects of the plot of the stories, making them more notice-
able, as the analyses carried out in Section 3 demonstrate.
In addition, the results of the study also show that, in agreement with
Forceville’s (2009, p. 71) findings, the non-verbal metonymies found in the sample
of picture books are of the type source-in-target rather than target-in-source me-
tonymies. Ruiz de Mendoza (2000) and Ruiz de Mendoza and Díez (2002) propose
a distinction between target-in-source and source-in-target metonymies. Whilst
in the former a superordinate or matrix domain (source) stands for a subdomain
(target), in the latter a subdomain (source) establishes a metonymic relationship
with a matrix domain (target) through an expansion process. Ruiz de Mendoza
and Díez (2002, pp. 495–499) point out that the first type of metonymy involves a
“domain reduction” while the latter, source-in-target metonymy, is understood in
terms of “domain expansion.” Picture books seem to adapt to the model of source-
in-target domain, as parts of the body of the represented participants depicted in
the stories stand for their whole. By means of these “domain expansions” (Ruiz
de Mendoza & Díez, 2002, pp. 495–499), the illustrator highlights some relevant
aspects of the plot and creates a bond between the characters of fiction and the
child-viewer.
Writers and illustrators need to know how visual metonymies may be ex-
ploited to create interaction and represent the narrative reality without surpass-
ing the cognitive capacities of the children for whom the stories are written and
illustrated. Knowing how to create narrative tension and how to generate interac-
tion between the characters in tales and the child-viewer by means of visual me-
tonymies can stir children’s interest in picture books and, in turn, help educators
encourage children to read.
Notes
2. Close-ups show the head and shoulders or even less; sometimes only the face is visible
(Kress & van Leeuwen, 2006).
4. It is interesting to note that while Hannah and her father are often depicted from a high
angle, the gorilla is the only represented participant that is shown from a low angle, suggest-
ing the power he has over Hannah. The use of high angles proves Hannah’s vulnerability at
home when she feels lonely and is not capable of attracting her father’s attention. The vertical
angle transmits two types of power relationships, that between the RPs and the viewer, and that
between the RPs within an image (Kress & van Leeuwen, 2006). We can be positioned from a
high, low or eye-level angle. When the point of view is arranged upwards or downwards along a
vertical axis, an increase or a diminution of power over the RPs can be experienced: the viewer
has power over the RP if it is projected from a high angle. However, the RP has power over the
viewer if seen from a low angle. Finally, RPs aligned at eye level angles have equal power status
with their viewers (Kress & van Leeuwen, 2006).
References
Barcelona, A. (Ed.). (2000). Metaphor and metonymy at the crossroads: A cognitive perspective.
Berlin/New York: Mouton de Gruyter.
Bartsch, R. (2002). Generating polysemy: Metaphor and metonymy. In R. Dirven & R. Pörings
(Eds.), Metaphor and metonymy in comparison and contrast (pp. 49–74). Berlin/New York:
Mouton de Gruyter.
Bal, M. (2006). Reading ‘Rembrandt’: Beyond the word-image opposition. Amsterdam: Amster-
dam University Press.
Browne, A. (2002[1983]). Gorilla. London: Walker Books.
Carpenter, H. (1989). Excessively impertinent bunnies: The subversive element in Beatrix Pot-
ter. In G. Avery & J. Briggs (Eds.), Children and their books: A celebration of the work of Iona
and Peter Opie (pp. 271–298). Oxford: Clarendon Press.
Cerrillo, P., & Yubero, S. (2007). Qué leer y en qué momento [What to read and when]. In
P. Cerrillo & S. Yubero (Eds.), La formación de mediadores para la promoción de la lectu-
ra. Segunda Edición [Training specialists in the promotion of reading. Second edition]
(pp. 285–293). Cuenca: Servicio de Publicaciones de la Universidad de Castilla-La Mancha.
Dirven, R., & Pörings, R. (Eds.). (2002). Metaphor and metonymy in comparison and contrast.
Berlin/New York: Mouton de Gruyter.
Forceville, C. (1996). Pictorial metaphors in advertising. London/New York: Routledge.
DOI: 10.4324/9780203272305
Forceville, C. (2002). The identification of target and source in pictorial metaphors. Journal of
Pragmatics, 34, 1–14. DOI: 10.1016/S0378-2166(01)00007-8
Forceville, C. (2006). Non-verbal and multimodal metaphor in a cognitive framework. Agen-
das for research. In G. Kristiansen, M. Achard, R. Dirven & F. Ruiz de Mendoza (Eds.),
Cognitive linguistics: Current applications and future perspectives (pp. 379–402). Berlin/
New York: Mouton de Gruyter.
Forceville, C. (2009). Metonymy in visual and audiovisual discourse. In E. Ventola & A.J. Moya
(Eds.), The world told and the world shown: Multisemiotic Issues (pp. 57–74). Basingstoke/
New York: Palgrave Macmillan.
Forceville, C. (2010). Why and how study metaphor, metonymy, and other tropes in multimodal
discourse? In R. Caballero & M.J. Pinar (Eds.), Ways and modes of human communication
(pp. 57–76). Cuenca: Ediciones de la Universidad de Castilla-La Mancha.
Forceville, C., & Urios-Aparisi, E. (Eds.). (2009). Multimodal metaphor. Berlin/New York:
Mouton de Gruyter. DOI: 10.1515/9783110215366
Gibbs, R. (1994). The poetics of mind: Figurative thought, language, and understanding.
Cambridge: Cambridge University Press.
Gibbs, R. (1999). Intentions in the experience of meaning. Cambridge: Cambridge University
Press.
Gill, T. (2002). Visual and verbal playmates: An exploration of visual and verbal modalities in
children’s picture books. Unpublished B.A. (Honours), University of Sydney, Australia.
Hidalgo, L., & Kraljevic-Mujic, B. (2011). Multimodal metonymy and metaphor as complex
discourse resources for creativity in ICT advertising discourse. Review of Cognitive Lin-
guistics, 9(1), 153–178. DOI: 10.1075/rcl.9.1.08hid
Hunt, P. (Ed.). (2004). International companion encyclopedia of children’s literature. Second Edi-
tion. Volume I. London: Routledge.
Radden, G., & Kövecses, Z. (1999). Towards a theory of metonymy. In K.U. Panther &
G. Radden (Eds.), Metonymy in Language and Thought. Amsterdam: John Benjamins.
DOI: 10.1075/hcp.4.03rad
Kress, G., & van Leeuwen, T. (2006 [1996]). Reading images. The grammar of visual design.
London: Routledge.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind.
Chicago: University of Chicago Press. DOI: 10.7208/chicago/9780226471013.001.0001
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge
to western thought. New York: Basic Books.
Moya, A.J. (2010). A multimodal analysis of Tale of Peter Rabbit within the interpersonal meta-
function. ATLANTIS, 32(1), 123–140.
Moya, A.J. (2011a). Engaging readers through language and pictures. A case study. Journal of
Pragmatics, 43(12), 2982–2991. DOI: 10.1016/j.pragma.2011.05.012
Moya, A.J. (2011b). A bimodal and systemic-functional study of Dear Zoo within the textual
metafunction. Revista Canaria de Estudios Ingleses, 62, 123–138.
Moya, A.J., & Ávila, J.A. (2009). Thematic progression of children’s stories as related to different
stages of cognitive development. Text and Talk, 29(6), 755–774.
DOI: 10.1515/TEXT.2009.038
Panther, K.U., & Radden, G. (Eds.), (1999). Metonymy in language and thought. Amsterdam/
Philadelphia: John Benjamins. DOI: 10.1075/hcp.4
Panther, K.U., & Thornburg, L.L. (Eds.), (2003). Metonymy and pragmatic inferencing. Amster-
dam/Philadelphia: John Benjamins. DOI: 10.1075/pbns.113
Pérez, P. (2011). Don’t be so green. Analysis of the interaction between multimodal metaphor
and metonymy in greenwashing advertisements. Available at: http://sites.google.com/site/
metaphormetaphor2011/abstractscontributedspeakers/metaphorandmedia.
Logroño: University of La Rioja.
In this paper we show that some notions from the textual organisation of verbal
texts also appear to give insights into the organisation of films. In particular, the
beginnings of films are suggested to operate as indicators of those films’ ‘meth-
od of development’ and so serve to set up expectations for guiding hypotheses
and selective attention during film viewing. By means of a small exploratory
study, we demonstrate that film beginnings exhibit differing organisational fea-
tures that correlate with the overall narrative strategies pursued in the films as
a whole. These features may then function as useful indicators for viewers con-
cerning just what interpretative challenges they will face later in the film.
It is a commonplace that one of the open problems in dealing with strongly mul-
timodal artefacts such as film is their reliance on combinations of very different
information channels – channels traditionally listed in the case of film as spo-
ken language, written language, visual image, music and ambient sound. This be-
comes even more challenging when we move further away from a sensory-based
view of the information contributions and consider instead, or in addition, the
multitude of semiotic modes potentially involved, such as gesture, proxemics,
colour schemes, clothing, spatial relationships, and many more (Metz, 1974;
Monaco, 2009). Given this, it is clear that viewers must be quite selective
in their allocation of attention: the dynamic unfolding of audiovisual representa-
tions in real-time would otherwise overwhelm the viewer rather than giving rise
to the broadly similar responses to film actually observed. That selectivity should
play a role here is in line with results in perceptual psychology that indicate that
doi 10.1075/bct.78.09bat
132 John A. Bateman and Chiaoi Tseng
“perception is selective: we attend to objects that bear salient meaning for certain
goals” (Gibson, 1979), a notion which has now also been applied in psychologi-
cally-oriented approaches to film (Anderson, 1996; Persson, 2003; Smith, Levin,
& Cutting, 2012; Wuss, 2009). Moreover, because of the nature of film as a designed
artefact, there is good reason to believe that the perceptual guidance that
films exhibit is in many respects intended: that is, film makers explicitly construct
films precisely so that the attention of viewers is directed along paths that contrib-
ute to desired affects.
A further research question that must be raised is just how this selection and
guidance of attention comes about. Naturally the audiovisual properties of the
filmic material being experienced can guide attention in various ways – Smith
and Henderson (2008), for example, show how various aspects of movement at-
tract viewers’ attention at a very early stage in processing. The role of particular
distinctive combinations of properties for heightening suspense or emotional af-
fect has also been considered (Carroll, 2008; Grodal, 1999). Both facets build on
a long assumed connection between film perception and natural perception (cf.
Münsterberg, 1916). While by no means denying that this connection plays a sig-
nificant role when interpreting film, the phenomenon that will be explored in this
article is rather different. We will suggest that films not only combine contribu-
tions to meaning analogous to sensory perception in the real world but that this
combination is itself strongly guided by further mechanisms that are reminis-
cent of aspects of discourse organisation revealed in the study of natural language
texts. These two perspectives on the mechanisms involved in film interpretation
and understanding have been characterized in Bateman and Schmidt (2012,
p. 142) in terms of two contrasting families of ‘codes’: the reality codes, which
build on film’s audiovisual iconic nature as perceptually real and bring to bear
all the potential interpretative practices available in the interpretation of real-life
events, and the representation codes, which are specific to artefacts which employ
textual structuring for pro-actively shaping and guiding audience response and
interpretation.
The former family of codes has naturally been given most attention by re-
searchers concerned with psychological processes and the cognitive modelling
of film; indeed, as Smith et al. (2012) make clear, the connection drawn between
film and natural perception has been cited as one of the principal motivations
for psychologists to concern themselves with film at all. In this paper, however,
we will suggest that the other family of codes, although often overlooked, is also
highly relevant for explaining viewers’ responses to films. It is not only the mo-
ment-by-moment assessment of a perceptual input that is significant, but also the
entire scaffold of potential interpretations constituting textual organisation that
must be considered.
text’s recipient negotiate the complex textual structures being constructed. This
signposting function is most pronounced in more complex, written language,
since it is there that there is most need for guidance concerning how a text is go-
ing to develop – there is little opportunity in written text to interrupt and reorient
oneself as to where the text producer is attempting to go.
Transferring this kind of organisation back to the situation with film is sug-
gestive for several reasons. Given the broader range of material available for sig-
nalling structural relationships in film, one logical hypothesis is that this might
be put to good effect, particularly at the beginnings of films, for establishing a
scaffold of expectations for recipients concerning how the film is going to unfold
subsequently. These expectations could then be used to pre-structure interpreta-
tions and to serve as a convenient source of guidance concerning where attention
should be paid, thus marking a bridge, or cross-over point, between the textual
organisation provided by the representation codes and the operation of the per-
ceptual system.
A special role for the beginnings of films has often been suggested in film
theory. As Bordwell, for example, states:
The sequential nature of narrative makes the initial portions of a text crucial for
the establishment of hypotheses. A character initially described as virtuous will
tend to be considered so even in the face of some contrary evidence; the initial
hypotheses will be qualified but not demolished unless very strong evidence is
brought forward. (Bordwell, 1985, p. 38)
This function of the initial portions of a film is also often seen in terms of psy-
chological processes such as the primacy effect and priming. This is presumed to
charge the material which is first encountered with a particular salience for pro-
viding a frame for interpretation for what follows (cf. Luchins & Luchins, 1962).
The broadest and most detailed account of film beginnings to date is that set
out by Hartmann (2009). Hartmann identifies several perspectives that have been
taken on film beginnings and provides extensive discussion and examples, each
offering different insights on the functions being performed for viewers when a
film begins. These perspectives include: the ‘point of attack’ for the exposition, i.e.,
that particular event, moment, etc. chosen to lead into the narrative; an exempla-
ry microcosm reflecting the world of the film; a point of equilibrium/disturbance
leading into the narrative arc as set out by Propp (1968) and subsequently sug-
gested as a standard model for film screenplays by Vogler (1998); a densely coded
matrix of connections symptomatic for the rest of the film and so functional for
the viewers’ subsequent comprehension processes; a textual prelude, similar in
role to preludes or overtures in music; the ‘threshold’ between everyday reality
and the world of the film; ‘instructions for use’ or training material for how to
interpret the film; and, finally, the place where a ‘communication contract’ is made
between producer and audience on what they are to expect.
Several of these perspectives clearly overlap with the functions suggested for
macro-themes. Just as macro-themes set out expectations for methods of devel-
opment, the use of film beginnings as ‘training material’ or the dense establish-
ment of the techniques to be employed for the remainder of some film similarly
establish clear predictions of what kind of development strategies that film will
employ and so may help direct the interpretative hypotheses that viewers enter-
tain. Moreover, early on in film semiotics, theorists such as Metz (1974) and Heath
(1975) suggested that each film, as an aesthetic artefact, creates its own ‘system’
as well as deploying semiotic codes established external to the film. Building on
this, Bordwell (1985, p. 38), Hartmann (2009, p. 106) and others suggest that it is
in precisely this respect that film beginnings may come to play a central narrative
function. This is echoed very closely in the role claimed for macro-themes above:
film beginnings as macro-themes may then also be taken as providing guidance
into the filmic system of each individual film just as in text they provide guidance
and predictions concerning the development of each individual text.
One of the motivations for accepting macro-theme/hyper-theme analyses for
natural language texts, however, is that it is possible to find repercussions of this
textual structure in the actual linguistic forms and structures that are selected.
Unless there were some identifiable consequences in the observed linguistic ma-
terial, there would be no grounds for assuming that macro-themes represent a
significant linguistic abstraction. In language, this works particularly through the
deployment of marked themes and other textual constructions as suggested in
our example of preposed spatial prepositional phrases above. The question therefore
arises of whether similar arguments can be made for presumed filmic
macro-themes. In the rest of this paper we will accordingly explore this further,
suggesting how a linguistically-motivated characterisation of filmic organisation
indeed allows us to see filmic macro-themes at work.
To carry out an empirical study, we need first to identify the filmic properties that
we are to analyse. Here we draw on previous work in which we have argued that
there are several quite specific kinds of discourse organisation at work in film,
sometimes similar to those of language, sometimes interestingly distinct. We have
described two of these kinds of discourse organisation at length elsewhere: filmic
discourse relations (Bateman, 2007) and filmic cohesion (Tseng, 2008, 2012); a
brief overview and application of both these aspects of filmic discourse is also
[Figure 1. The first four shots of Alfred Hitchcock’s The Birds (1963). Panel labels legible in the figure: S2, S3a, S3b, S4]
provided in Tseng and Bateman (2012). For current purposes, we will simply sug-
gest by example the kinds of textual organisations that these aspects of filmic dis-
course organisation employ so that the subsequent discussion can be followed.
We illustrate both kinds of analysis briefly with respect to the opening of
Alfred Hitchcock’s The Birds, as shown in Figure 1. The first shot (S1) is a complex
tracking shot following a character that gradually comes into closer focus against
the backdrop of busy streets, going behind a poster of San Francisco (S1c) along
the way. She looks up at the sky, seeing massed birds (S2) and then proceeds into
a pet store (S3). The fragment ends with her going up some stairs inside the pet
store (S4).
Filmic discourse relations are relations postulated to hold between film seg-
ments in terms of temporality, spatiality, epistemic status and audiovisual struc-
tural dependence. They draw on the notion of conjunctive relations proposed for
verbal language by Martin (1983) and as extended for the moving audiovisual
image by van Leeuwen (1991). Building on this, Bateman (2007) and Bateman
and Schmidt (2012) argue that relations of this kind can also be taken as the basis
for constructing filmic discourse structures, although the specificity of the filmic
medium necessitates some changes with respect to the relations employed for ver-
bal language. These relations are seen to operate at a discourse level of analysis,
which means that the description may well cut across shot-boundaries; in this
respect, analyses employing filmic discourse relations are somewhat different to
traditional notions of inter-shot relations pursued in treatments of filmic montage
and relate more to notions of ‘events’ as explored in the cognitive study of film
(cf. Zacks, Speer, & Reynolds, 2009) or to ‘subphases’ in the discourse account of
Baldry and Thibault (2006). The discourse relation approach has also now been
taken further by Wildfeuer (2013), who provides a more formal framework for
capturing discourse relations for film that relates directly to the formal account
[Figure 2 diagram: fragments S1a, S1b, S1c, S1d, S2, S3a, S3b, S4, linked by arcs
labelled with temporal and spatial relations; labels printed below the fragments
include “temporal: continuous”, “spatial: ?”, “spatial: broadening” and
“hypotactic: insert projection”]
Figure 2. The filmic discourse relations holding among the first four shots of Alfred
Hitchcock’s The Birds (1963). Hypotactically embedded fragments (inserts) are marked
with square brackets and their relation types are shown below the main line of
fragments; other relations are shown above, labelling the arcs
Fragment Melanie Setting (a): SF street view Setting (b): petshop Birds
S1a [v] [v] (squawking)
S2 [v] [v]
Figure 3. The filmic cohesive ties holding between the first four shots of Alfred
Hitchcock’s The Birds (1963) organised into cohesive chains. [v] indicates visual
elements, (…) indicates aural elements, and “…” indicates printed or verbal linguistic
elements
other passers-by, fire hydrants or whatever else may be visible in any particular
shot) at first, because the fact that they do not reappear means that they will not
participate in cohesive ties. The backgrounded presentation of “Davidson’s Pet
Shop” already in fragment S1d is of this nature; had the main character simply
walked on following S1d, the cohesive chain for this named pet shop as part of the
setting would never have been established. The building of ties into chains and,
subsequently, the description of chains that are brought together in particular
actions or events, thus reveals quite clearly what the film itself is constructing as
textually significant for its development.
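The step from ties to chains described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the annotation procedure used in the study: the function name `build_chains`, the element labels and the toy annotation of The Birds fragments are all assumptions made for the example. The one principle taken from the text is that an element which never reappears cannot participate in a cohesive tie and so forms no chain.

```python
from collections import defaultdict

def build_chains(fragments):
    """Group occurrences by element and keep only elements that recur.

    `fragments` is a list of (fragment_id, [(element, mode), ...]) pairs,
    where mode is "visual", "aural" or "verbal" (hypothetical labels).
    """
    occurrences = defaultdict(list)
    for fragment_id, elements in fragments:
        for element, mode in elements:
            occurrences[element].append((fragment_id, mode))
    # A single occurrence cannot form a tie, so it yields no chain.
    return {e: occ for e, occ in occurrences.items() if len(occ) >= 2}

# Toy annotation, invented for illustration only.
fragments = [
    ("S1a", [("Melanie", "visual"), ("street", "visual"), ("birds", "aural")]),
    ("S1d", [("Melanie", "visual"), ("pet shop", "verbal"), ("passer-by", "visual")]),
    ("S2", [("birds", "visual")]),
    ("S3a", [("Melanie", "visual"), ("pet shop", "visual")]),
]

chains = build_chains(fragments)
for element, occ in chains.items():
    print(element, occ)
```

In this toy annotation the passer-by and the street drop out because they occur once, whereas the pet shop forms a chain only because it is picked up again in S3a, which mirrors the point made about “Davidson’s Pet Shop” above.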
Our study now applies these analytic schemes to a selection of films in order to
explore whether the structuring effect of the opening sections of a film can be
seen in how these resources are deployed. For this we employ a small film corpus
and annotate their beginnings according to the organisations suggested by our
framework. The corpus consists of 20 films selected to include both ‘mainstream’,
traditional narrative organisations and some less traditional, non-linear narrative
structures. The purpose of including films varying in this way is to see whether
the very different ‘methods of development’ that such films make use of can al-
ready be detected in the film beginnings.
The non-linear films selected have been variously described. They are
sometimes characterised as ‘puzzle films’ (Buckland, 2009), ‘forking narratives’
(Bordwell, 2002), ‘multiple draft’ films (Branigan, 2002) and similar. Less extreme
cases are commonly described using the narratological concept of ‘unreliable nar-
ration’ (e.g., Booth, 1961; Hansen, 2007; Koch, 2011). There is therefore consid-
erable variation within the groupings discussed, as well as many open questions
concerning their definitions and demarcation. The idea behind our selection is
that for the majority of non-linear films, it is nevertheless the case that the film is
considered narratively coherent by viewers. This raises the question as to how this
is achieved and the role that a film’s beginning may play in bringing this about.
If a film’s beginning functions in any way similarly to macro-themes, it would
be expected that guidance for following even non-traditionally structured films
would still be forthcoming.
The film corpus is listed in Table 1, grouped into two subcorpora reflecting a
pre-theoretical classification according to their linearity. An analysis of the first 5
minutes was undertaken for each film. Within the discourse relation dimension,
relations between shots were classified according to whether they were spatial,
Table 1. Film corpus used in the exploratory study; for production details, the reader is
referred to the invaluable Internet Movie Database (IMDB): http://www.imdb.com
Non-linear
Blind Chance (1981) Hana-bi (1997)
Run Lola Run (1998) Code: Unknown (2000)
Memento (2000) Vanilla Sky (2001)
Oldboy (2003) Reconstruction (2003)
2046 (2004) The Fountain (2006)
Linear
The Thin Red Line (1998) The Sixth Sense (1999)
Donnie Darko (2001) Black Hawk Down (2001)
Synecdoche, New York (2006) The Prestige (2006)
Once (2006) Juno (2007)
My Blueberry Nights (2007) Mr Brooks (2007)
Moreover, since non-linear films are still generally perceived as coherent by view-
ers, it might be expected that there is actually a trade-off between the two filmic
discourse organisations: the very fact of non-linear narrative will mean that the
discourse relations, and particularly scene transitions, will violate chronological
development and, as a consequence, the cohesive ties established may need to
This means that in order to maintain coherence it should be the case that the re-
spective beginnings of non-linear films signal to the viewer that non-linear, more
cohesively based interpretative schemes are to be applied during the film.
The results of performing the filmic discourse relation analysis on the beginnings
of the 20 films of the corpus are suggested graphically in Figure 4. To improve
reliability the analysis in each case adopted shot boundaries as a common level
of granularity throughout. The graph shows how many transitions between shots
needed to be taken before the discourse relations were reliably identifiable, order-
ing the films analysed along the horizontal axis according to the increasing delay
holding before reliable identification becomes possible. For the last film in the
list, Christoffer Boe’s Reconstruction (2003), the uncertainty lasts a considerable
time into the film, persisting well beyond what might reasonably be considered
the ‘beginning’. For the linear films, the discourse relations are in fact never seri-
ously in doubt. Each shot follows on the other in the manner that we saw with
The Birds. This distribution is certainly compatible with our hypotheses 1 and
2 above.
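The measure plotted for each film, the number of shot transitions taken before the discourse relations become reliably identifiable, can be made concrete with a small Python sketch. It assumes that each transition in a film's opening has been annotated with a yes/no judgement of whether its discourse relation is reliably identifiable; the function name `identification_delay`, the film labels and all values below are hypothetical and are not data from the study.

```python
def identification_delay(reliable_flags):
    """Return the position of the last transition whose relation is NOT
    reliably identifiable, i.e. how many transitions pass before the
    relations stay reliable; 0 means they were never in doubt."""
    delay = 0
    for position, reliable in enumerate(reliable_flags, start=1):
        if not reliable:
            delay = position
    return delay

# Invented toy annotations: one boolean per shot transition.
toy_corpus = {
    "film_A": [True, True, True, True],     # never in doubt
    "film_B": [False, False, True, True],   # settles after two transitions
    "film_C": [False, True, False, False],  # still uncertain at the end
}

delays = {title: identification_delay(flags) for title, flags in toy_corpus.items()}

# Order the films by increasing delay, as on the horizontal axis of the graph.
ordered = sorted(delays, key=delays.get)
print(ordered, delays)
```

Under this reading, a film whose relations are never in doubt receives a delay of 0, and a film such as the hypothetical `film_C` receives a delay equal to the length of the analysed window.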
[Figure 4: graph showing, for each film of the corpus, the number of shot transitions
needed before the discourse relations are reliably identifiable; the films are
ordered along the horizontal axis by increasing delay]
Figure 5. Filmic discourse relations in the opening sequence of Blind Chance (1981)
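[Figure 6: cohesive chains in the opening of Blind Chance (scenes C, D and H),
including the verbal elements “Daniel” (called by Witek), “Witek” (called by
Daniel) and “He is my son Witek” (spoken by father, scene H)]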
Turning to the cohesive analysis, there are several possibilities for describ-
ing the various textures that are constructed. Indeed, in both linear and non-
linear films, cohesive structures are well established from the beginning of each
film. Considering again the opening from Blind Chance we find with respect to its
cohesive analysis a relatively dense interweaving of reoccurring filmic elements.
This is shown graphically in Figure 6. We suggest here that it is particularly sig-
nificant that the texture is created cross-modally: i.e., there are reoccurrences of
visuals and verbal elements which are bound together, often again employing es-
tablished continuity techniques. Thus, the boy that is shown in Scene B is identi-
fied as ‘Witek’ by being called verbally in Scene C. This verbal identification is
then picked up again in Scene D. From this we can see that cohesive structures
tracking the main characters’ identities are strongly established and may then be
available as a counterbalance to the severe uncertainty in the filmic discourse rela-
tions that apply.
When we explore the distribution of visual cohesion and visual cohesion
combined with verbal cohesion across the entire corpus (cf. Table 2), a
systematic trade-off between cohesion and underspecified discourse relations is
supported. The results show a suggestive difference between how cohesion patterns
operate in the linear and non-linear films. Although both linear and non-linear
films adopt cohesive patterns relying on visual and verbal ties, the non-linear
films rely much less on purely visual elements: multimodal cohesive ties, such as
naming, are employed in addition in order to carry the viewer over material for
which it is difficult, or impossible, to find visual relations. The sample size
here is too small to offer statements of statistical significance, but the trend
is clear: the openings of non-linear films appear to exhibit patterns of filmic
discourse relations and filmic cohesion that distinguish them from the linear
films, thus supporting hypothesis 3. This difference may well provide a more or
less explicit message to the viewer that a different kind of method of
development than usual is to be expected for the film that is to follow.
Table 2. Cohesion results across the beginnings of the films of the corpus
Cohesive texture                                              Linear  Non-linear
Cohesive chains with specific verbal cues specifying names
and identities of characters                                    5        9
Cohesive chains with no verbal cues but with visual
reappearance to confirm main characters                         5        1
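The caution about sample size can be illustrated with a short calculation on the counts in Table 2. The sketch below assumes that the four cells count films (ten linear, ten non-linear) and applies a two-sided Fisher's exact test, computed directly from the hypergeometric distribution. The choice of test and the function name `fisher_exact_two_sided` are assumptions made for this illustration, not part of the original analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Rows: linear, non-linear. Columns: verbal cues, visual reappearance only.
p_value = fisher_exact_two_sided(5, 5, 9, 1)
print(f"linear: {5/10:.0%} verbal, non-linear: {9/10:.0%} verbal, p = {p_value:.3f}")
```

Under these assumptions the test gives p of about 0.14: half of the linear openings but nine in ten of the non-linear openings use verbal cues, yet with ten films per group this difference does not reach conventional significance thresholds, which is consistent with treating the result as a trend.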
5. Conclusion
As a summary of the analytic results reported above, we can therefore suggest the
following general film properties:
In addition, all three hypotheses above were supported, although further studies
need to be undertaken to evaluate the extent to which the tendencies revealed are
robust with respect to larger samples. Nevertheless, the apparent availability of
clear indicators for the kinds of filmic narrative strategies that are going to be em-
ployed within a film suggests that this source of guidance needs to be taken into
consideration whenever exploring viewers’ interpretative activities when watch-
ing film.
References
Anderson, J.D. (1996). The reality of illusion: An ecological approach to cognitive film theory.
Carbondale/Edwardsville: Southern Illinois University Press.
Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge: Cambridge University
Press.
Baldry, A., & Thibault, P. (2006). Multimodal transcription and text analysis. London: Equinox.
Bateman, J.A. (2007). Towards a grande paradigmatique of film: Christian Metz reloaded. Se-
miotica, 167(1/4), 13–64.
Bateman, J.A., & Schmidt, K.H. (2012). Multimodal film analysis: How films mean. London:
Routledge.
Booth, W. (1961). Rhetoric of fiction. Chicago: Chicago University Press.
Bordwell, D. (1985). Narration in the fiction film. Madison, WI: University of Wisconsin Press.
Bordwell, D. (2002). Film futures. SubStance, 31(1), 88–104. DOI: 10.1353/sub.2002.0004
Branigan, E. (2002). Nearly true: forking plots, forking interpretations. A response to David
Bordwell’s ‘Film Futures’. SubStance, 31(1), 105–114.
Buckland, W. (Ed.). (2009). Puzzle films: Complex storytelling in contemporary cinema.
Chichester, U.K.: Wiley-Blackwell.
Carroll, N. (2008). The philosophy of motion pictures. Oxford: Oxford University Press.
Fries, P.H. (1995). Themes, methods of development, and texts. In R. Hasan & P.H. Fries (Eds.),
On subject and theme: A discourse functional perspective (pp. 317–359). Amsterdam: John
Benjamins. DOI: 10.1075/cilt.118.10fri
Ghadessy, M. (Ed.). (1995). Thematic development in English texts. London: Pinter Publishers.
Gibson, J.J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Grodal, T. (1999). Emotions, cognitions, and narrative patterns in film. In C. Plantinga & G.M.
Smith (Eds.), Passionate views: Film, cognition, and emotion (pp. 127–145). Baltimore/
London: The Johns Hopkins University Press.
Halliday, M.A.K. (1967). Notes on transitivity and theme in English – Parts 1 and 2. Journal of
Linguistics, 3, 37–81 and 199–244. DOI: 10.1017/S0022226700012949
Halliday, M.A.K., & Matthiessen, C.M.I.M. (2004). An introduction to functional grammar.
Third edition. London: Edward Arnold.
Hansen, P.K. (2007). Reconsidering the unreliable narrator. Semiotica, 165(1/4), 227–246.
Hartmann, B. (2009). Aller Anfang: Zur Initialphase des Spielfilms. Marburg: Schüren.
Heath, S. (1975). Film and system: terms of analysis. Part I. Screen, 16(1), 7–77.
Koch, J. (2011). Unreliable and discordant film narration. Journal of Literary Theory, 5(1), 57–
80. DOI: 10.1515/JLT.2011.006
Luchins, A.S., & Luchins, E.H. (1962). Primacy-recency in communications reflecting attitudes
toward segregation. Journal of Social Psychology, 58, 357–369.
DOI: 10.1080/00224545.1962.9712387
Martin, J.R. (1983). Conjunction: the logic of English text. In J.S. Petöfi & E. Sözer (Eds.), Micro
and macro connexity of discourse (pp. 1–72). Hamburg: Helmut Buske Verlag.
Martin, J.R. (1992). English text: Systems and structure. Amsterdam: John Benjamins.
DOI: 10.1075/z.59
Metz, C. (1974). Language and Cinema. The Hague: Mouton. Translated by D.J. Umiker-Sebeok.
DOI: 10.1515/9783110816044
Monaco, J. (2009). How to read a film: Movies, media and beyond. 30th edition. Oxford, U.K.:
Oxford University Press.
Münsterberg, H. (1916). The photoplay: A psychological study. New York: D. Appleton and
Company.
Persson, P. (2003). Understanding cinema. Cambridge: Cambridge University Press.
DOI: 10.1017/CBO9780511497735
Propp, V. (1968). The morphology of the folktale. Austin, Texas: University of Texas Press. Orig-
inally published in Russian in 1928.
Smith, T.J., & Henderson, J.M. (2008). Edit blindness: The relationship between attention and
global change blindness in dynamic scenes. Journal of Eye Movement Research, 2(2)(6),
1–17.
Smith, T.J., Levin, D.T., & Cutting, J.E. (2012). A window on reality: Perceiving edited moving
images. Current Directions in Psychological Science, 21(2), 107–113.
DOI: 10.1177/0963721412437407
Tseng, C. (2008). Cohesive harmony in filmic text. In L. Unsworth (Ed.), Multimodal semiotics:
Functional analysis in contexts of education (pp. 87–104). London: Continuum.
Tseng, C. (2012). Audiovisual texture in scene transition. Semiotica, 192, 123–160.
Tseng, C., & Bateman, J.A. (2012). Multimodal narrative construction in Christopher Nolan’s
Memento: A description of method. Journal of Visual Communication, 11(1), 91–119.
DOI: 10.1177/1470357211424691
van Leeuwen, T. (1991). Conjunctive structure in documentary film and television. Continuum:
Journal of Media and Cultural Studies, 5(1), 76–114. DOI: 10.1080/10304319109388216
Vogler, C. (1998). The writer’s journey: Mythic structure for writers. Studio City, CA: Michael
Wiese Productions.
Wildfeuer, J. (2013). Film discourse interpretation: Towards a new paradigm for multimodal film
analysis. London: Routledge.
Wuss, P. (2009). Cinematic narration and its psychological impact: functions of cognition, emo-
tion and play. Newcastle: Cambridge Scholars.
Zacks, J.M., Speer, N.K., & Reynolds, J.R. (2009). Segmentation in reading and film comprehension.
Journal of Experimental Psychology: General, 138, 307–327. DOI: 10.1037/a0015305
Digital stories are a very recent multimedia practice by which ordinary people
construct short narratives on personal affairs combining voice, images and
sometimes music. This paper contributes to the description of this new emer-
gent genre from both a multimodal and a cognitive point of view, by exploring
how diverse semiotic channels in digital storytelling provide different kinds of
information (factual, emotional, cultural, etc.) which are finally integrated to
construct the global meaning of the narrative. For this purpose, we combine
Kress and Van Leeuwen’s (1996) scholarly work related to multimodal repre-
sentation, with the use of some notions of the Mental Spaces and Conceptual
Integration theory (Dancygier, 2008; Fauconnier & Turner, 2002). The results of
this study are of interest to those concerned with the representational and com-
municational modes of semiotic resources in storytelling.
1. Introduction1
doi 10.1075/bct.78.10alo
© 2015 John Benjamins Publishing Company
Isabel Alonso, Silvia Molina and María Dolores Porto
aspects of how to synchronise images, voice and background music, or even mat-
ters of copyright when using published material for their stories.
This paper’s main aim is to contribute to the multimodal description of this
new emerging genre from a cognitive standpoint. To our knowledge, this combined
approach of multimodal analysis and cognitive linguistics has not yet been explored
in previous research on digital storytelling. Thus, our objective is to
explore how both the Internet and the technical resources for integrating sound,
image and text affect the receivers’ interpretation of these digital narratives and
add an interpersonal (emotional) build-up to the receiver’s mental image of the
stories. For this purpose, we have analysed thirty randomly chosen digital stories
from the websites of several organizations and projects.
From a cognitive perspective, narratives must be regarded as more than a
mere succession of causal/temporal events. Some recent theories see narratives as
a complex network of mental spaces which combine and blend in order to reach
a final emergent mental representation (Dancygier, 2008; Semino, 2009; Turner,
2008). Listeners and readers are thus not mere recipients of the stories, but active
participants in the co-construction of the narrative. Multimodality adds com-
plexity to both the production and interpretation of digital stories. In a very short
time span, the information provided by three different channels – verbal, visual
and auditory – must be processed and combined to construct a global representa-
tion of the story. Therefore, the interaction between the different modes and the
way it affects the processing of the narrative is essential for the analysis of the
stories and for the understanding of the effects they produce on the receivers. The
content of the narrative, that is, the instructions given to the interpreter in order
to construct the mental representation of the story, is distributed across those
three channels, or “information tracks” (Herman, 2009, p. 80), and contains not
only factual information, but also cultural and emotional aspects that play a major
role in the final representation. The present analysis can thus be situated
within the recent trend of transmedial narratology (Herman, 2004; Ryan, 2005),
which claims that the constraints and affordances associated with a given medium
affect the narrative and the construction of the receiver’s mental image.
In the first section of this work, we present some of the main features of digital
stories as a new emergent genre. This is followed by a description of the sample,
where we explain the methodological tools we have combined in order to account
for the multiple dimensions of digital stories. On the one
hand, we carried out a joint application of Unsworth and Cléirigh’s model (2009,
pp. 151–163) and Kress & van Leeuwen’s (2006) grammar of visual design in order
to advance understanding of how images and language interact to construct
ideational meaning. On the other, we make use of some notions of the Mental
Spaces and Conceptual Integration theory as applied to narratives (Dancygier,
2008; Fauconnier & Turner, 2002; Porto & Romano, 2010; Semino, 2009; Turner,
2008) in order to explain how the different modes are integrated with one another
and also with the socio-cultural knowledge of participants to construct a global
meaning of the narrative. We believe these cognitively and functionally oriented
tools are complementary, since both seek to explain linguistic forms in terms of their
meaning, in strict relation with usage and cognition. After introducing the study’s
theoretical framework, we present the analysis of the three modes in Section 4 and
examine in detail how the different meanings they provide are integrated in
Section 5.
Digital storytelling was initiated at the Center for Digital Storytelling (CDS) in
California in the early 1990s. Digital stories are made in workshops where people
can learn both about the software and technical aspects of the process and about
effective ways to construct their story. The genre and workshop methods have
spread to large parts of the world, and we can read, for example, how projects have
been carried out in Australia, Wales, and in South African secondary schools.
It is possible to achieve a general view of the main features of the genre by
reading the various guides available on websites that publish digital stories.2 In
these tutorials, potential storymakers are advised on the several steps needed to
create their own story, from script to publication. Among the main features that
define this new emergent genre, the following can be highlighted:
– The textual narrative is usually told in first person or a fictional third person
and is frequently well structured in the traditional schema of abstract, orien-
tation, complication, resolution and coda (Labov & Waletzky, 1967). The text
is the main, leading mode and images and sounds tend to be supportive. All
tutorials consulted reveal that the verbal narrative is always the first step in
the construction of the multimodal story.
– Most images are still, but there are some advanced digital storymakers who
include short video clips in their work, or at least add some zooming and travelling
to the images. Images can range from drawings to family photos (colour, sepia
or black and white), as well as collages, common cultural signs (the red
ribbon for AIDS), symbols and visual metaphors.
– Quite often, digital stories contain personal images of the narrators themselves,
as children or at the present time. Guides for digital storytelling encourage
authors to use this kind of image, partly because of the personal content
of the stories, but also in order to avoid any conflicts of copyright. Images
mainly provide the emotional and cultural content of the digital story.
– The narrating voice is another defining feature of digital stories, as it
lends a sense of veracity to the story that enhances its emotional content.
In fact, the voice conveys information about the author, such as gender, age,
socio-cultural background and status, etc. At the same time, the general
tone of the narratives is usually neutral, as if trying to sound objective about the
narrated events. In addition, the voice also provides rhythm and pace in the
display of images.
– The soundtrack is not always present. It can vary from commercial songs to
traditional, folk music or just some background music or sounds. It frames
the whole story in a context and can also help to mark shifts in the narrative.
Moreover, as it tends to become louder at the end of the story, it may act as a
coda for the story.
The stories that make up our sample were uploaded by both non-governmental
and non-profit organizations which periodically hold digital storytelling workshops
in various parts of the world to encourage people to share their
experiences.3
This section describes the data and the combination of methodological tools used
to depict multimodal digital stories from a cognitive point of view. For the pur-
pose of this study, thirty digital narratives were analysed. As for their selection, no
variables of sex, age, or cultural origin were considered. No attention was given
to particular topics either. The sample was selected at random from various pub-
lic, well-known digital storytelling web sites across continents to guarantee that
different cultural approaches to narratives are considered (see the Annex for the
complete list of stories). Consequently, our sample does not constitute a formal
corpus, but a first approach to the genre which is expected to provide a prelimi-
nary multimodal characterization of digital storytelling from a cognitive point
of view. Each digital narration was coded with a number and the acronym of the
organization that promoted its creation, e.g. Engender Health: EH. The
audio was also transcribed.
Concerning the article’s methodology, we drew some basic concepts from
Mental Spaces and Conceptual Integration Theory. Very recently, these
theories have been applied to the study of discourse (Oakley & Coulson, 2008)
and more specifically to narratives, both fictional (Dancygier, 2008; Semino,
2009) and non-fictional (Porto & Romano, 2010). According to these theories,
the meaning construction of a narrative is a process by which several input mental
spaces, i.e. “partial assemblies constructed as we think and talk, for purposes of
local understanding” (Fauconnier, 1997, pp. 1–2), are set up and then activated or
deactivated as the story unfolds by means of several devices such as linguistic and
pragmatic markers that act as space builders. Some elements in these input spac-
es are then cross-mapped and selectively projected onto a blended space whose
structure is emergent, i.e. dynamically elaborated and not just deriving from the
inputs. In addition, the selection of the elements that will be projected onto the
final blend depends on the generic space, which is not given by the narrative, but
belongs to the shared knowledge of the participants and is activated by the dis-
course. Therefore, the information, not only factual, but also modal, emotional,
cultural, etc. provided by the input spaces is just the prompt for the construction
of the global final meaning of the narrative – the emergent story. In Section 4
readers will find further explanations on the theory and examples taken from the
stories analysed.
This process becomes far more complex in multimodal narratives, where the
process of conceptual integration can be seen as working at different levels:
iii. The three narratives, each of them a different “information track” (Herman,
2009), must be integrated into the one and only emergent narrative to which
they all contribute.
In this framework, we decided to carry out a qualitative study that depicts the
main characteristics of the verbal, visual and acoustic channels as they are made
manifest in digital stories and describes how these three individual channels in-
teract to construct the meaning of this narrative genre.
For the multimodal study of the verbal mode, apart from the oral text, we
drew on two signifying systems proposed by Kress and van Leeuwen (1996) in
their grammar of visual design: salience and framing. Salience is determined on
the basis of the identification of visual cues – size, sharpness of focus, amount
of detail or texture shown, tonal contrast, colour contrast, placement in the visual
field, perspective, and any cultural symbolism associated with the image – that
enhance verbal language. Besides all these elements, we also looked at the meaning
of any written text on screen (titles, introductory slides and credits),
and described its position on the image, its font (e.g. white fonts on black) and
its size. We also paid attention to edited images, for example, bubbles on
photographs, cartoons, etc. As for the notion of framing, it is essential for unveiling
the narrative structure of digital stories, since framing devices can connect and
disconnect different narrative elements. So we analysed the structure of our stories,
looking for borders between elements, connective vectors, repetition of shapes,
colours, etc., which provide textual coherence to the story.
Regarding the visual mode, we first identified the type of image which accom-
panies the verbal mode in our stories (photos, cartoons, drawings, collages, etc.)
and its main features (edited or not, public, private, etc.). As we will show, most
slides are static or pictorial images. To identify the construction of ideational or
representational meaning through images, we relied on Unsworth and Cléirigh’s
(2009) model. It is a systemic functionally oriented tool (Halliday & Matthiessen,
2004) which provides a complete basis for the modelling of image-text relations
that jointly construct experiential meaning. From the point of view of the im-
age that visualises the language, Unsworth and Cléirigh (pp. 156ff.) classify pic-
tures into three types: (a) images which elaborate the (verbalised or unverbalised)
qualities of the main participants and/or events in the story; (b) pictures which
visualise the (verbalised or unverbalised) parts of the main participants and/or
events in the story; and (c) pictures which visualise (verbalised or unverbalised)
geographical locations.
Identifying the construction of other meanings, like emotional or cultural,
was also very important in our analysis. As we will see in Sections 4 and 5, the
integration between image and text in these narratives through metaphors often
has a persuasive function, that is, to appeal to the listener.
Finally, regarding the acoustic mode, we analysed the songs and melodies that
go with the stories to unveil their contribution to the narrative meaning in the
digital narration. Although the narrators’ voices are also acoustic, we regard them
as belonging to the verbal mode, as these are oral stories. Thus the description
of the gender of the voices, types of accents, etc. will be found in the verbal mode. As for
the acoustic mode, we distinguished between the music that creates a context for
the narration (introducing the story, setting the cultural context) and the music
which marks a significant moment in the story, or a change in context.
4. Multimodal analysis
(O’Halloran & Lim Fei, 2009, p. 145). The stark contrasts of the black and white
colours enhance the message against bullying and allow the receiver to focus on
its different expressions.
As for the visual channel, the narratives are structured via a sequence of static
and very often private images which belong to the narrator’s family archive and
are used to speak about him/herself, his/her family and his/her past and present
personal experiences. The narrator usually appears at the very centre of most
images, as the nucleus of information to which other elements (other people,
objects, etc.) are ancillary. In many instances, these photographs are black and
white or sepia, which reinforces the idea of times long gone (SC5, SC6), and they
alternate with colour images referring to present times. In some other cases (EH8,
SC2, SC5), black and white photographs bring forth dark emotions (poverty,
depression) versus colour pictures, which represent positive emotions (enthusiasm
and joy).
Edited images with cartoon speech bubbles on photographs (CN3) or cartoons
(UM1) are very few, but their use, again, is multimodally marked since
they attract the audience’s attention to the sometimes socio-culturally determined
message they convey. This is the case of the slide taken from EH8 (see Plate 1),
which reproduces graffiti of a man beating his wife with a cartoon speech bubble
on the picture saying: I paid lobola for her.
Graffiti is generally considered a youthful form of artistic expression that rebels
against authority. We believe the narrator uses it to challenge the traditional
African notion of masculinity and to transmit this message among male African
youngsters. As for the speech bubble, “lobola” is a set amount paid by a prospective
husband to the bride’s family among certain peoples in Southern Africa.
It is clear then that images can perform different functions in digital stories.
Also, from the structural point of view, images guide the listener as signposts
through the story. Picture repetitions are used to provide textual coherence to
the story (WL1, SC8). At other times, they are used for the opposite purpose, to
signal a change of topic. That is, in some cases, the same image appears edited in
different ways in order to show a change in the topic. The preservation of
the main features guarantees the identification of the topic although the addition
of elements (bubbles, other images such as graffiti, different background…) may
suggest a change, a step forward (see Plate 2).
It is not only pictures that play a distinctive role in structuring the story; most
of the analysed narrations are also framed by the background music. In some
stories, music goes on all the time, from beginning to end, merely accompany-
ing the story without extra meaning (EH8). Indeed, it can even go unnoticed. In
some other cases, it plays only at the beginning and at the end (EH5). And as with
pictures, music can be used by narrators to highlight a significant change in the
setting of the story (SC3, EH3…).
Apart from structuring, images perform other functions in digital stories.
Photographs also help the audience to visualise all the narrative elements. Most
pictures elaborate the unverbalised qualities of the main participants in the story, for
example, information about the people’s ethnicity (SC6, SC7, CN3, WL1, WL2),
about their condition (women with HIV in WL2, a lesbian and immigrant in
SC8, a transgender child in SC1), or about some other circumstances that sur-
round the narrator’s life: poverty and rural life (CN3), etc. This type of intermodal
identification between text and image is defined by Unsworth and Cléirigh (2009)
as intensive identification. Less frequently, we have also found cases of circum-
stantial identification, by which the image visualises (unverbalised) geographical
locations. In the narratives from the channels “living in Bristol” or “stories from
India” or “stories from Namibia”, the location is established from the very first
moment. However, in some other cases the verbal mode does not usually give a
proper physical setting for the story, since, as already pointed out, these stories are
condensed in a few minutes and reduced to their essence. So, images provide this
kind of information, sometimes intentionally, i.e. when a landscape or a map is
shown (SC8), and some other times unintentionally, by showing some cues in the
background of a photograph, e.g. the style of buildings (UM1), or the way people
dress (pendant with Ethiopia’s map in BS1, narrator’s Peruvian hat in SC8, Indian
clothes in EH6).
Finally, images can be used by digital narrators to effectively transmit additional
emotional and/or cultural meanings that are not conveyed by the oral
mode. Thus, pictures of celebrations and different lifestyles (e.g. children’s birth-
days (SC7), photos of weddings (UM1), etc.) help to situate the audience in a
given socio-cultural context. This is reinforced by features such as the accents
or English varieties spoken by the narrators and/or the music which frames the
story: we can find rap music in the story of an urban black boy who breaks the law
(SC2) or traditional folk music in stories about immigration (SC8, CN1), or the
sound of drums in a story of a Rastafarian living in Bristol (BS1). Gospel music
appears in stories of black people that speak of hope and overcoming difficulties
(CN3, WL1).
Special mention should be made of the presence of metaphors in our sample
stories. The integration of verbal and visual modes gives rise to the production
of metaphors which are created by narrators to verbalise new ideas and emotions
(EH3, CN3, SC6 among others). Metaphors are used in digital stories when
an unknown or difficult situation needs explanation and other linguistic devices
prove insufficient (Cameron, 1999; Lakoff & Turner, 1989), or else in order to en-
able the compression of complex ideas and meanings in such a brief format.
Similarly, symbols can accompany the text to intensify its meaning, or be
added to it if the verbal mode does not mention the concept represented by the
symbol. In Nelao’s story (EH1), for example, the HIV ribbon, a group of ques-
tion marks, or the icon of sound crossed out to mean silence are all displayed at
different points of the narrative in order to convey the struggles that the narrator
faces living with HIV. Also, at the macro level, the symbolic use of colours, red,
black and white, has a role in establishing visual coherence during the narration:
red refers to passion (Hodge & Kress, 1988), broken hearts to HIV, black indicates
worries, and white evokes innocence. Other bright colours, appearing at the end of
the story, show a shift towards happiness and joy.
5. Integrating meanings
After having described the different modes and the way they interact, we will
now show how the verbal, visual and acoustic meanings are integrated into a final
global construction of the story. Since this integration works at several levels, the
number of mental spaces created during the production and interpretation of a
story is extremely high. Thus, in order to glimpse some of the hundreds of mental
spaces that compose the network and of the processing strategies that take place
during the process of understanding a digital multimodal narration, we will show
the analysis of a fragment of one of the stories at the micro level and then extend
the results to the macro level of integration of the whole narrative.
James’ story, Rock Bottom (EH3), tells how the narrator, after many love expe-
riences, found a girl he really loved and wanted a stable relationship with her, but
she finally broke up with him. He then became aware of the pain he had inflicted
upon his previous girlfriends and now feels ready to be a different kind of man
and to challenge traditional gender roles. Figure 1 presents the description of two
slides in this story.
In the 7th slide, the verbal mode constitutes input space 1. It provides, on the
one hand, the basic factual information: he grew up without a father figure and
was brought up by his mother, and on the other, some emotional meaning: he
regrets the lack of a father figure. The photo of the lions prompts the construction
of input space 2, conveying unmistakable cultural meanings, since lions are
representative of an African setting. Also, the attitude of the animals transmits feelings
of comfort and protection, as they are sitting together under the sunlight, resting
but watching attentively. The tinkling music matched with these words and photo
conveys mental images of childhood, innocence and light-heartedness,
which are part of input space 3 (see Plate 3).
[Figure: conceptual integration network for the 7th slide. Generic space: parenthood;
lions are strong; lions protect their brood; father and mother bring up their children
together; children must feel safe. Other labels in the diagram: African setting; A father
is strong and supportive; A father provides protection; A father transmits a sense of
peace; Narrator did not feel protected and supported by a male parent.]
The generic space activated by the elements of these three input spaces in-
cludes several cognitive and cultural models not explicit in the story: an abstract
schema of parenthood and what it involves; the image of lions as strong ani-
mals and the common knowledge that lions fiercely protect their brood;
the conventional models that children are brought up by both parents,
male and female and that children must feel safe and happy.
By cross-mapping the information provided by the input spaces, as enabled
by the commonalities evidenced by the models activated in the generic space, the
narrator as a child is projected onto the image of the lion brood and the absent
father figure onto the lion; the strength and fierceness commonly associated with
lions is matched with the support and care of parents for their children; the stillness
of the lions in the image corresponds to the peace of a child who feels secure;
and all of this is framed by a general feeling of safety, peace and innocent happiness
prompted by the music.
Thus, a final blended space is constructed, with an emergent structure that
integrates elements from the input spaces, selectively projected and fine-tuned
by the previous knowledge of the discourse participants. That blend shows the
lion as a metaphor for a good father and states that the narrator feels he would
have become a better man if he had had a strong and supportive father, who had
protected and guided him as a boy and stayed by him and granted him a happy
and carefree childhood.
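The integration steps described for this slide can be restated schematically. The
following is only an illustrative sketch, not part of the analytic method itself: the
schema tags attached to each element, the names Element, cross_map and blend, and the
threshold used to imitate selective projection are all our own hypothetical
simplifications.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Element:
    label: str          # content contributed by one mode
    schemas: frozenset  # generic-space models the element can instantiate


# Generic space: shared cultural and cognitive models named in the analysis.
GENERIC = frozenset({"parenthood", "strength", "protection", "safety"})

# One input space per mode. The schema tags are illustrative assumptions.
INPUTS = {
    "verbal": [
        Element("narrator as a child", frozenset({"parenthood", "safety"})),
        Element("absent father figure", frozenset({"parenthood", "protection"})),
    ],
    "visual": [
        Element("lion brood", frozenset({"parenthood", "safety"})),
        Element("adult lion", frozenset({"parenthood", "strength", "protection"})),
        Element("African setting", frozenset()),
    ],
    "acoustic": [
        Element("tinkling music", frozenset({"safety"})),
    ],
}


def cross_map(inputs, generic):
    """Pair elements from different input spaces that share a generic schema."""
    mappings = []
    for (_, elems_a), (_, elems_b) in combinations(inputs.items(), 2):
        for a in elems_a:
            for b in elems_b:
                shared = a.schemas & b.schemas & generic
                if shared:
                    mappings.append((a.label, b.label, sorted(shared)))
    return mappings


def blend(mappings, min_shared=2):
    """Crude stand-in for selective projection: keep strongly linked pairs."""
    return [(a, b) for a, b, shared in mappings if len(shared) >= min_shared]


if __name__ == "__main__":
    for pair in blend(cross_map(INPUTS, GENERIC)):
        print(pair)
```

Under these assumptions, the two counterpart pairs that survive projection are the
narrator as a child with the lion brood and the absent father figure with the adult
lion, which mirrors the mappings described above. Emergent structure, such as the
narrator's feeling that he would have become a better man, is not derivable from
such a lookup and remains a matter of interpretation.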
The next slide is just black, which means that the lions are gone, and the father
image is missing, and so are the feelings of safety and protection provided by him.
Besides, black means darkness, which stops us from seeing, from knowing what
comes next, and so it is culturally associated with cognitive models of fear and
loneliness. Also there is a change in the music, indicating that childhood is over
and there is a new stage in the narration of James’s life.
The whole story is composed of thirty-nine slides, and the blends for each
of these also act as input spaces in the whole network that will generate the final
blend of the narrative. Moreover, the mappings between input spaces can also be
done between different slides. For instance, the African setting is provided not
only by the image of the lions in the 7th frame, but also by the slides that display
his house on a street in Africa (4th, 5th and 6th), another (33rd) with a symbolic
image of a pregnant womb with a map of Africa on it and several others where
black African people appear in different attitudes – laughing (34th), attending
a lesson (20th), at a graduation ceremony with a white robe (9th) – or just some
landscapes (26th, 35th…). Therefore, the whole visual mode can be regarded as a
narrative itself where several input spaces are cross-mapped for the construction
of a blended space. Similarly, the verbal and the acoustic modes blend into spaces
that contribute to the final emergent one for the whole narrative.
6. Concluding remarks
In this paper, the meaning potential of thirty multimodal digital narratives has
been explored from both a multimodal perspective (verbal, pictorial and musical
resources) and a cognitive approach (Mental Spaces and Conceptual Integration
Theory). The oral channel has proved to be the leading mode, supported by im-
ages and music. Visual metaphors, cultural symbols and identifications provide
additional meanings beyond the mere factual content of the texts. The music,
when present, frames and contextualises the whole story. A highly complex mental
image of the story is achieved by integrating factual, cultural and emotional
meanings in an emergent blend that converts a minor, private story, often
culturally specific and only individually relevant, into a universal, exemplary
story. Our analysis has thus attempted to show how different meanings and modes
integrate to convey a very complex and cognitively demanding message to society,
because digital stories are typically personal events narrated by an individual, but
when posted on a website, intend to become universal, reaching thousands of
people who may feel mirrored in them. It is in this sense that digital stories con-
stitute a good example of narratives as a tool for thinking, or a cognitive artifact
(Herman, 2003), that is, something used by humans for the purpose of aiding,
enhancing or improving cognition.
We are cautious about the interpretations provided here on the basis of the
study of the sample. Further research on a more extended corpus of digital stories
in English is needed. In-depth interviews with digital narrators about their
narrative processes, and with viewers, are also encouraged in order to test
whether the explanations provided in this paper can be confirmed. The
importance of parameters such as individual narrative style should likewise be
studied to determine their influence on narrative structure. It is important to
remember that the stories were created by non-professional, non-expert
storytellers who, on the one hand, followed the instructions of monitors in
workshops and, on the other, may have felt constrained by the technical
equipment or the software they used to create the stories, or even by their
unequal abilities at the different stages of the process (editing images,
reading aloud, playing music, etc.).
However, the fact that some of the features described and analysed here were
not intended by the narrator does not invalidate the conclusions on how those
features affect interpretation and influence the construction of the final
emergent story. For example, in SC5, instead of background music there is some-
one humming a song first and later singing it aloud. This is probably the result of
some advice given by the workshop organisers about the use of original music or
sounds, or even of people singing instead of using copyrighted material. But the
point is that this humming makes the whole story far more personal and the emo-
tional meaning attached to it stronger than it would have been with pre-recorded,
commercial music.
Despite these limitations, it is hoped that the multimodal and cognitive
analysis will contribute to deepening our understanding of the complex ways in
which different modes and resources combine to form meaning in these dynamic
digital stories.
Notes
1. This paper forms part of a research project on Narration, Discourse and Cognition financed
by the Spanish Ministry of Science and Innovation (FFI2009-13582). The authors would like to
thank Profs. Anne McCabe, Carmen Pena and the two anonymous reviewers for their review of
and constructive comments on an earlier version of this chapter.
2. http://www.storycenter.org/cookbook.pdf,
http://www.bbc.co.uk/wales/audiovideo/sites/yourvideo/pdf/aguidetodigitalstorytellingbbc.pdf,
http://thecollaboratory.wikidot.com/creating-a-digital-story,
http://net.educause.edu/ir/library/pdf/ELI08167B.pdf,
http://www.inms.umn.edu/elements.
References
Kress, G., & van Leeuwen, T. (1996) (2nd edition 2006). Reading images: The grammar of visual
design. London: Routledge.
Labov, W., & Waletzky, J. (1967). Narrative analysis: Oral versions of personal experience. In
J. Helm (Ed.), Essays on the verbal and visual arts (pp. 12–44). Seattle, WA: University of
Washington Press.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor.
Chicago: University of Chicago Press. DOI: 10.7208/chicago/9780226470986.001.0001
O’Halloran, K., & Lim Fei, V. (2009). Sequential visual discourse frames. In E. Ventola & A.J.
Moya Guijarro (Eds.), The world told and the world shown: Multisemiotic issues (pp. 139–
156). London: Palgrave Macmillan.
Oakley, T., & Coulson, S. (2008). Connecting the dots: Mental spaces and metaphoric language
in discourse. In T. Oakley & A. Hougaard (Eds.), Mental spaces in discourse and interaction
(pp. 27–50). Amsterdam: John Benjamins. DOI: 10.1075/pbns.170.02cou
Porto, D., & Romano, M. (2010). Conceptual integration in natural oral narratives. Actes des
journées d’étude Narratology and the new social dimension of narrative (01–02 Février
2010). At http://narratologie.ehess.fr/index.php?681 (last accessed 8th March 2013).
Ryan, M.L. (2005). On the theoretical foundations of transmedial narratology. In J.C.
Meister (Ed.), Narratology beyond literary criticism: Mediality, disciplinarity (pp. 1–23).
Berlin/New York: Walter de Gruyter. DOI: 10.1515/9783110201840.1
Semino, E. (2009). Text worlds. In G. Brône & J. Vandaele (Eds.), Cognitive poetics: Goals, gains
and gaps (pp. 33–37). Berlin: Mouton de Gruyter.
Turner, M. (2008). Conceptual integration. In D. Geeraerts & H. Cuyckens (Eds.), The Oxford
handbook of cognitive linguistics (pp. 264–293). Oxford: Oxford University Press.
Unsworth, L., & Cleirigh, C. (2009). Multimodality and reading: The construction of meaning
through image-text interaction. In C. Jewitt (Ed.), The Routledge handbook of multimodal
analysis (pp. 151–163). London: Routledge.
164 Isabel Alonso, Silvia Molina and María Dolores Porto
Code  Title                                            Link
SC2   Sacrificios                                      http://www.storycenter.org/stories/index.php?cat=4
SC3   Privilege                                        http://www.storycenter.org/stories/index.php?cat=3
SC4   Bruises                                          http://www.storycenter.org/stories/index.php?cat=3
SC5   Memories of a Political Prisoner from Worcester  http://www.storycenter.org/stories/
SC6   The Balcony                                      http://www.storycenter.org/stories/
SC7   Mixed Race Me                                    http://www.storycenter.org/stories/index.php?cat=7
SC8   My Shoes                                         http://www.storycenter.org/stories/
TL1   Stop this Madness                                http://www.bbc.co.uk/humber/telling_lives/ram_files/roz_carr.ram
TL2   Whatever happened to Miss Pears?                 http://www.bbc.co.uk/humber/telling_lives/humber_intermediary4.shtml
TL3   McGuire                                          http://www.bbc.co.uk/humber/telling_lives/40_humber_intermediary.shtml
UM1   Real Men Do Housework                            http://www.umbc.edu/oit/newmedia/studio/digitalstories/projects.php?movie=ELC054_S09ByungchangKim.flv
WL1   The day I Made Him Stop                          http://www.youtube.com/user/WomenCrossingtheLine#p/search/0/hneAZCEl5v4
WL2   Power, HIV and the Feminist Movement Building    http://www.youtube.com/user/WomenCrossingtheLine#p/u/1/KLI4N0XelSM
Asunción López-Varela
Universidad Complutense Madrid
doi 10.1075/bct.78.11lop
For Charles Sanders Peirce and the Harvard School of Pragmatism, a sign
is an object that stands for another so that an experience of the former affords
knowledge of the latter in some respect or capacity. This includes sounds, im-
ages, gestures, scents, tastes, textures, words, etc. The sign creates in the mind a
more developed sign, a mental effect or thought that Peirce calls ‘interpretant’
and which gives the sign significance or meaning, becoming in turn a sign in a
dynamic process ad infinitum. Interestingly, for Peirce, “cognition involves something
represented, that of which we are conscious, and some action or passion of
the self whereby it becomes represented. The former shall be termed the objective,
the latter the subjective, elements of cognition” (Peirce, 1991, p. 46; my emphasis).
Alongside the deictic aspects of intermedial semiotics, this paper seeks to unveil
some of the intersubjective aspects of this ‘passion of the self’.
The most basic mechanism for selecting information is to process stimuli
from a limited portion of space. This function is mediated by spatial attention,
which captures information through different sensory modalities
(sight, sound, etc.). For example, when trying to follow a conversation in a noisy
environment, attending to relevant lip movements and gestures may be as impor-
tant as attending to the speaker’s voice (Magosso, Serino, Pellegrino, & Ursino,
2010). It has been found (Blake, Sobel, & James, 2004; Kemmerer, 2006) that spa-
tiotemporal contiguity in the processing of information across sensory modalities
facilitates cross-modal coordination, so that, for instance, the sense of balance
located in the vestibular portion of the human inner ear can provide informa-
tion about the position of the human body. The senses of hearing and smell also
signal positions of other objects in relation to the body, even when visibility is
excluded.
Recent research on mirror neuron structures appears to indicate that different
senses encode proto-objects in similar ways. These neurons, located fundamen-
tally in brain areas that involve sense perception and motor activities (premo-
tor cortex, primary somatosensory cortex, inferior parietal cortex), respond not
only to one’s own bodily acts, but to actions performed by others, thus providing
mechanisms for learning by means of imitation. Jenson and Iacoboni (2011) argue
that mirror neurons form the neural basis of the human capacity for emotional
engagement, with emotions having a very strong impact on cognition and
representation. However, despite the excitement generated by these findings, no
consensus has been reached on the kind of mirror neuron activity that supports
cognitive functions such as understanding the intention of others, and the imita-
tion of their actions (Churchland, 2012). Therefore, some neuroscience aspects
discussed here remain at the level of hypothesis.
Researchers seem to agree, however, that from one unconscious fixation
to another, working memory retains only certain basic properties and relative
attention and sensorial response. For instance, activity elicited by touch in the
somatosensory cortex increases when volunteers have just been looking at
their arm. Other experiments have revealed cross-modal interactions between
vision and touch and vision and audition. To give other examples, sound can in-
fluence the perceived roughness of a touched surface (Guest, Catmur, Lloyd, &
Spence, 2002), and touch can influence visual perception of surface texture and
surface slant (Blake et al., 2004, p. 397; Ernst, Banks, & Bulthoff, 2000). Some
people experience specific colours in relation to differences in musical pitch, so
that musical patterns are converted into visual experience (Critchley, 1977).
As mentioned above, selective attention is the first step in multimodal
perception. It is followed by an analysis of information to produce separable
data (colour, shape, etc.), incorporation of the experience in the brain by
means of body-schemas, generalization of features from repetition of the
experience, abstraction and establishment of a concept-structure by eliminating
accidental features found in differing circumstances and, finally, the relation
of this concept-structure to specific patterns in order to activate a functional
response. Due to space constraints, the following section of this paper will
offer a glimpse of how visual cueing captures attention in fictional/virtual
contexts such as graphic novels. The last part will explore multimodal cueing in
online communication.
3. Intermedial attunement
The expression ‘attunement’ originates in the field of music and refers to the mea-
sures taken to solve discrepancies and dissonances among different instruments
so that they can play in tune. Attunement is achieved by means of a correlation of
simultaneity. In the visual mode, simultaneity is favoured within spatial contiguity,
so that the use of text and image in written formats (e.g. in comics and graphic
novels) helps convey information more easily. Along with simultaneity, temporal
contiguity is also important in multimodal crossings between, for instance,
vision and sound, as the online example analyzed in this paper shows.
In the case of human hearing and its relation to speech, several attempts have
been made to relate sound deixis, motor action and task-oriented phoneme selec-
tion (Abry, Vilain, & Schwartz, 2004). There is evidence that mirror neuron map-
pings have allowed connections between pointing and language (Gallese, 2006;
Iacoboni, 2005). Auditory sensation involves the ability to perceive and localize
sound sources in space by means of acoustic waves, vibrations and pulses that act
on the body. Once intensity, frequency and duration are sensed, auditory percep-
tion requires a higher level of cognitive processing in association with previous
experience, and the formation of auditory images, that is, psychological represen-
tations of sounds exhibiting internal coherence. Attention correlates with fluctua-
tions in loudness, pitch, perceived duration, spatiality and timbre.
In hearing, deictic pointers also hold a position as mental indexes, carrying
information about sound sources, their interrelationships, and the environment
(i.e. sound movement in the horizontal and vertical plane, distance, depth, etc.).
The difference in pitch sensation and degree of correlation between the left and
right ear (binaural hearing) improves sound perception (for some species of
birds, such as owls, hearing asymmetry is the main localization cue in the vertical
direction). Distance can also be located on this basis, since distant sounds have a
lower frequency than the sounds radiated from proximal sound sources. Motion
is located by changes in the angle of sound perception. In humans, the timbre of
a continuous sound can be perceived at 60 m and its variations are the basis for
sound classification in music. In the absence of vision, humans can also deter-
mine a person’s size by changes in pitch sensation and by calculating the mass of
air displaced when the person walks (for further information see Lopez-Poveda,
Palmer, & Meddis, 2010).
In human-to-human communication, the case of speech sound is different
from the perception of all other complex sounds because it is cognitively tied
to the process of speech production (where the controlled stream of air is
processed by the lungs, the larynx, the vocal folds and tract, and the articulators
located in the mouth cavity − tongue, teeth, and lips − whose movements divide
the vocal tract into a series of resonance tubes and cavities). If the speech sounds
are unfamiliar to the listener (e.g., an unknown foreign language), the coupling
between perception and production may be lost, and the sounds are not recognized
as language. At a party, for instance, where speech might not be audible, a listener
might be able to hear his or her own name because awareness depends not only
on sonic qualities but also on co-operating semiotic sources such as attention,
gestures, and the cultural and emotional aspects of communication.
The case of human-to-human communication by means of computer interac-
tion has particular characteristics because it does not take place in spatial conti-
guity (and sometimes not in temporal simultaneity, as we shall see below). The
digital medium of computer communication increases the weight of touch, vision
and sound. Tactile information can also relate to visual information from distant
objects because tools (in this case the keyboard, the mouse or touch-screens)
are used to extend possible reaching space (Bouchardon & López-Varela, 2011;
Maravita, Spence, Kennett, & Driver, 2002).
The notion of interactivity in digital environments means that performativity
is about physical ‘actions’ operated by participants (clicking the mouse or touch-
ing the screen; looking through the Webcam; speaking through the microphone),
not just mental interpretations or readings. But not all digital formats are
modifiable, and when they are, only certain manipulations are allowed. Some formats,
such as pdf, protected images, certain graphics, and some sound and video files,
do not allow copy-paste, TV capture, or Photoshop editing. Some applications (Apple
products such as iPad, iPod Touch, iPhone, or the Android Open Source Project,
AOSP) encourage the use of touch, as does Braille translation software, which
converts visual text to tactile format for visually impaired users (among screen-reading
software and text-to-speech programs, the most widely used is ‘Jaws’).
There is also software that allows the incorporation of biophysical environmental
elements (temperature, light, body heat, eye-tracking movements and other touch
and motor activities).
Since the early 1990s, the widespread use of computer-mediated communication
on the World Wide Web has encouraged arguments in favour of the potential of
these technologies to connect people and empower social groups. In the 21st
century, user/participant interactions have increased dramatically, enabled by
social software applications such as blogs and platforms like Twitter or Facebook.
The debate on the power of such networks upon public opinion remains open,
and research on the workings of online interpersonal relations and communica-
tion may shed some light upon the ongoing discussions.
until it focuses on the face of a man. The speaker is obviously sitting in front of a
webcam that follows his movements. He talks through a head-set. An audience
familiar with the language would identify his speech as French. He explains that
sometimes human reactions cannot be anticipated. As he talks, his gaze moves
deictically towards the right where another window opens and a female voice
speaks: “et moi” (English transcription on the bottom-right). For a non-bilingual
speaker, this doubling effect in two different languages undoubtedly contributes
to the estrangement of the situation. The cueing process only works if one is able
to recognize the language and the words uttered as French, and identify them
with the English transcription. The focus of the webcams begins to fluctuate again
and to move in several directions, as if attempting to point towards a particular
place. The sonic modality sets the audience’s expectations to rest: “I don’t know
what to do”. This sentence provides reassurance that communication has not been
interrupted and that cooperation continues. The uncanny effect returns when si-
lence is again prolonged and the camera moves around the male speaker. “How to
manage a fragmented relation” (une relation entre-coupé) says the female voice
as the screen goes black once again (except for the English transcription). She
laughs. Information exchange between the two voices seems to continue, but it is
not perceived as a dialogue.
In this piece, interruptions take place at the visual and sonic level. If one of the
two perceptual modes works, the audience is able to maintain connection. How-
ever, both channels are intermittently interrupted, and the uncanny communica-
tion gaps create a desire to fill the voids by performing some kind of intervention,
for instance moving the cursor of the video-recording to see what happens next.
At approximately minute 4, both begin to speak as if they are unable to hear each
other. It first sounds like two simultaneous monologues, but one soon realizes that
they are answering each other, although the synchrony of the responses is slightly
delayed. This shows the importance of temporal contiguity.
The speakers reflect on their previous relationship. The fact that they know
each other comes as a relief and a guarantee of perhaps greater connection between
them, even if the technological context does not enable it. Reassurance is
achieved when all modal cues are activated, that is, when the two windows are
displayed simultaneously and their faces are visible, showing occasional eye-contact
with the audience, when sound is synchronized, and when there are signs
of emotional connection such as smiles or laughs between the speakers. In the
absence of touch, physical contact is achieved through mutual gaze, agreement in
speech (i.e. emphasis on the word ‘sweetness’), and acts such as slowly stroking
the teddy-bear. All of these gestures convey tranquillity and reassurance.
Communication in Abrahams’ piece is not broken but syncopated. This cre-
ates occasional frustration in the speakers. They discuss who conveys more ‘hope’,
but they do not seem to achieve consensus. However, this is not felt as a com-
munication breach. Anxiety arises from the interruptions that take place in the
channel, rather than being the result of the speakers’ actions and discourses. This
is probably due to the fact that the audience perceives them as friends. Neverthe-
less, weariness grows as the channel fails to enable connectivity.
At minute 8, when the male speaker blows the smoke of his cigar into the
microphone, the audience experiences a synaesthetic moment triggered by the
blowing sound and the visual perception of smoke. Both deictic cues create a
mental association in which one is almost able to smell the cigar. The
last part of the video explores other aspects that impair or facilitate online com-
munication. A very important one is speed. Digital communication increases
the speed of information processing at all levels. However, this is not necessarily
translated into an improvement in communication. The last section of the project
shows how rapid changes create distractions and anxiety because the audience is
unable to track down information accurately.
In summary, digital environments allow easy access to tracks
and segments of information by means of instantaneous captures of sound, text
and image which can be frozen for subsequent inspection, for instance, in video-
capture, as in the example analyzed here. However, navigation through sound
perception – whether speech or music – involves a kind of representation of the
complete piece (this is also true of text, which loses its coherence and narrativity
when too many links are introduced in hypermedia environments). Unlike visual
images which are held in space, sound has a temporal structure where deictic cues
are processed in their continuity as well as in their simultaneity.
5. Conclusions
The paper has traced the communicative impact of multimodal mappings, index
assignment and pointers both in cognition and discourse. It has also introduced
the potential role of affective phenomena and intersubjectivity in engaging coop-
eration. After a succinct reference to visual indexes and the role of eye-contact
in capturing attention and conveying empathic responses in Al Davison’s auto-
biographical graphic novel The Spiral Cage, the paper has offered a more detailed
reading of Annie Abrahams’ “A Fragmented Relation”, which explores both vision
and sound in online-collaboration. The case study shows the importance of me-
diating channels (whether analogue or digital) upon the spatiotemporal axis of
perception, with the slightest cues (spatial visual frames in the case of the graphic
novel, and time lag in the digital setting) having a significant impact on communicative
situations.
References
Grishakova, M. (2012). On cognitive and semiotic functions of shifters. Chinese Semiotic Stud-
ies, 8(2), 227–238. DOI: 10.1515/css-2012-0041
Guest, S., Catmur, C., Lloyd, D., & Spence, C. (2002). Audiotactile interactions in roughness
perception. Experimental Brain Research, 146(2), 161–171.
DOI: 10.1007/s00221-002-1164-z
Iacoboni, M. (2005). Understanding others: imitation, language, empathy. In S. Hurley &
N. Chater (Eds.), Perspectives on imitation: From cognitive neuroscience to social science
(pp. 77–99). Cambridge, MA: MIT Press.
Jenson, D., & Iacoboni, M. (2011). Literary biomimesis: Mirror neurons and the ontological
priority of representation. California Italian Studies, 2(1). <http://escholarship.org/uc/item/3sc3j6dj>.
Kemmerer, D. (2006). The semantics of space: integrating linguistic typology and cognitive
neuroscience. Neuropsychologia, 44, 1607–1621. DOI: 10.1016/j.neuropsychologia.2006.01.025
Kurylowicz, J. (1964). The inflectional categories of Indo-European. Heidelberg: Carl Winter
Universitätsverlag.
Lopez-Poveda, E., Palmer, A.R., & Meddis, R. (2010). The neurophysiological bases of auditory
perception. New York: Springer Verlag. DOI: 10.1007/978-1-4419-5686-6
López-Varela, A. (2010). Exploring intercultural relations from the intersubjective perspectives
offered through creative art in multimodal formats. Lexia, 5–6, 125–147.
López-Varela, A. (2011). Multimodal metaphor and intersubjective experiences: the impor-
tance of eye-contact in Davison’s graphic novel ‘The Spiral Cage’ and in Annie Abrahams
Net-Project ‘On Collaboration’. In L. Masucci & G. Di Rosario (Eds.), Lavori del Convegno
Palazzo degli Artista Italiani (pp. 307–324). Naples: Oficina di Letterature Electrónica.
Maravita, A., Spence, Ch., Kennett, S., & Driver, J. (2002). Tool-use changes multimodal spatial
interactions between vision and touch in normal humans. Cognition, 83, 25–34.
DOI: 10.1016/S0010-0277(02)00003-3
Magosso, E., Serino, A., Pellegrino, G.D., & Ursino, M. (2010). Crossmodal links between vision
and touch in spatial attention: A computational modeling study. Computational Intelligence
and Neuroscience. DOI: 10.1155/2010/304941 http://www.hindawi.com/journals/cin/2010/304941/
Peirce, C.S. (1906). Prolegomena to an apology for pragmaticism, The Monist, 5(4), 492–546.
Reprinted in C. Hartshorne & A. W. Weiss (Eds.), Collected papers of C. S. Peirce. Vol-
ume V (pp. 530–572). Cambridge, MA: Harvard University Press.
DOI: 10.5840/monist190616436
Peirce, C.S. (1932). Division of signs. In C. Hartshorne & A.W. Weiss (Eds.), Collected papers of
C. S. Peirce. Volume II. Cambridge, MA: Harvard University Press.
Peirce, C.S. (1991). On the nature of signs. In J. Hoopes (Ed.), Peirce on signs. Chapel Hill: Uni-
versity of North Carolina Press.
Pylyshyn, Z.W. (2000). Situating vision in the world. Trends in Cognitive Sciences, 4(5),
197–207. DOI: 10.1016/S1364-6613(00)01477-7
Rensink, R. (2000). The dynamic representation of scenes. Visual Cognition, 7, 17–42.
DOI: 10.1080/135062800394667
Searle, J.R. (1995). The construction of social reality. London: Allen Lane.
Schmidt, R.A. (1982). Motor control and learning: A behavioural emphasis. Champaign, IL: Hu-
man Kinetics Publishers Inc.
Schmidt, R.A. (1991). Motor learning and performance. Champaign, IL: Human Kinetics Pub-
lishers Inc.
Tresilian, J. (2012). Sensorimotor control and learning: An introduction to the behavioral neuro-
science of action. London: Palgrave Macmillan.
Sperber, D., & Wilson, D. (1986). Relevance: Communication and cognition. Oxford: Blackwell.
Stolorow, R.D., & Atwood, G.E. (1992). Contexts of being: The intersubjective foundations of
psychological life. Hillsdale, NJ & London: The Analytic Press.
Wade, N.J. (2010). Pioneers in eye movement. i-Perception, 1, 33–68.
<http://i-perception.perceptionweb.com/fulltext/i01/i0389.pdf>. DOI: 10.1068/i0389
Wind, J. (1989). Studies in language origins. Amsterdam/Philadelphia: John Benjamins.
DOI: 10.1075/z.los1
Zlatev, J. (2008). The co-evolution of intersubjectivity and bodily mimesis. In J. Zlatev, T.
Racine, C. Sinha & E. Itkonen (Eds.), The shared mind: Perspectives on intersubjectivity
(pp. 215–244). Amsterdam/Philadelphia: John Benjamins. DOI: 10.1075/celcr.12.13zla
The paper presents the analysis of the humor found in four dyadic conversa-
tions. The results of the conversational data match those of previous studies
(Pickering et al., 2009): no differences were found in volume or speech-rate
between humorous pause units and non-humorous ones. Similarly, pauses
were not found to mark humorous turns. However, the result that punch-lines
showed lower pitch than non-humorous parts of the text was not replicated:
humorous pause units showed no significant differences in pitch from non-hu-
morous ones. Smiling is found to mark humor only in a general sense of “setting
the frame” and is not integrated (i.e., co-extensive) with the humor.
1. Introduction
The purpose of this paper is to begin to address the issue of multimodal1 mark-
ers of humor in conversational humor. We present an extension of the results we
found in previous studies that show that speakers do not consistently mark punch
lines prosodically or with pauses. We also present, as a first tentative hypothesis, a
view that smiling and laughter co-occurring with humorous turns may mark local
discourse as being in a playful, humorous frame.
The field is in clear need of research, as very little has been written about the
prosodic and multimodal markers of humor, with the exception of the markers
of irony (which, for the purposes of this study, we consider to be a case of humor;
the issue is, needless to say, much more complex; see Hidalgo Downing & Iglesias
Recuero, 2009). Moreover, what little research is extant on the subject of markers
of humor and irony is primarily focused and based on laboratory data and does
doi 10.1075/bct.78.12att
182 Salvatore Attardo et al.
not concern itself with discourse data. For a review of the extant literature on markers
of humor, see Pickering et al. (2009) and Attardo, Pickering, and Baker (2011);
for markers of irony, see for example Attardo (2000) and Rockwell (2006).
Thus, for example, in Pickering et al. (2009), the first study to systematically
test various hypotheses about prosodic marking of humor in canned jokes, we
presented evidence showing that untrained speakers did not mark as prosodically
salient the punch lines of jokes. In fact, we found that punch lines, which occur at
the end of narrative texts, were produced with significantly lower pitch, consistent
with their occurrence at the end of a paratone. Moreover, we found that speakers
did not mark a significant number of punch lines with pauses, nor did they con-
sistently use smiling and laughter to do so.2
In a follow-up study (Attardo et al., 2011), we found that humor in a conversational
setting, as opposed to the narrative setting of the jokes studied in
Pickering et al. (2009), did not display the significantly lower pitch, as we would
have expected, since the humor did not occur at the end of a narrative. The study
in Attardo et al. (2011), however, was only a pilot, consisting of the analysis of one
dyadic conversation, making broader generalizations impossible.
The present study broadens the analysis to include four dyadic conversations,
in which speakers were instructed to tell each other a joke and then were left free
to converse for five minutes. The setup allowed for the collection of ten minutes
(both sides of the conversation were recorded) of high quality sound files and
video files of the faces and torso of the speakers, for each conversation. There were
39 instances of humor in the corpus. A detailed description of the setup for the
data collection and of the transcription conventions can be found in Attardo et al.
(2011). Syllables in all caps are stressed. A full discussion of the data is forthcom-
ing. We gratefully acknowledge the help of several students on this project.
For the purposes of this study, we treated smiling and laughter as points on
a continuum of intensity for mirthful displays. This is purely a matter of conve-
nience and we take no position as to whether smiling and laughter are genetically
related. The continuum can be visualized as ranging from a very low intensity
smile, to a high intensity smile, and then transitioning into laughter. We also as-
sume as given the distinction between canned narrative (prepared, rehearsed)
jokes, which end in a punch line, which is the locus of the humor, and discourse
The difference between canned jokes (rehearsed narratives with a humorous ending,
the punch line) and conversational humor (improvised, often non-narrative)
has been mentioned. Despite these differences, humor is assumed to be salient
(Giora, 2003), whether it occurs within a narrative or in an isolated turn of a
conversation. Thus, when we decided to examine how speakers marked
the salient passages of narrative humor (canned jokes), we expected that the se-
mantically and pragmatically salient punch line would be marked with the pro-
sodic counterparts of salience (high pitch, high volume, slow speech rate) and
we expected to find the punch lines marked by significant pauses and marked
by smiling and/or laughter. The view of laughter and smiling as reactive to
humor was rejected early on by the pioneering work of Jefferson (1979), which
showed that laughter could precede and accompany the production of humor to
signal its presence and effectively “invite” more laughter.
However, the results of the Pickering et al. (2009) study contradicted most of these
expectations. The punch lines of narrative jokes were found to be significantly
lower in pitch than the rest of the text. There was no significant difference in
volume between the punch lines and the rest of the texts. There also was no
significant difference in speech rate, although speakers delivered punch lines more
slowly than the rest of the text. There were no significantly longer pauses around punch
lines. Finally, only 60% of punch lines were accompanied by laughter and/or smiling
voice.
The results just described naturally bring about the question of whether they
apply to conversational, improvised humor as well. The ongoing study of a small
corpus of dyadic conversations was designed to answer this question.
2.1 Pitch
2.2 Volume
We also did not find any significant difference between the volume of the
humorous pause units and the volume of the non-humorous pause units. The mean
volume was: Humor: 74.77; Non-humor: 74.75. The standard deviation was:
Humor: 6.79; Non-humor: 6.50. The two-tailed p value is 0.9437, considered not
significant, using the Mann-Whitney test.
2.3 Speech rate
We measured the speech rate in syllables per second. The mean speech rate was
0.36 sps for serious speech and 0.24 for humorous speech. This difference is not
significant (Mann-Whitney two-tailed p = 0.3014).
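The kind of comparison reported for volume and speech rate can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function name `mann_whitney_u` is hypothetical, the p value comes from the normal approximation with a tie correction and no continuity correction (an assumption; statistical packages may use exact tables for small samples), and the sample values are invented placeholders, not measurements from the corpus.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Hypothetical helper for illustration. Returns (U, p), where U is the
    smaller of the two U statistics.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    # Pool the observations, remembering which group each one came from.
    pooled = sorted(
        [(value, 0) for value in sample_a] + [(value, 1) for value in sample_b],
        key=lambda pair: pair[0],
    )
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        run_length = j - i + 1
        tie_term += run_length**3 - run_length
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation is tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the normal curve
    return u, p


# Invented placeholder values standing in for per-unit volume measurements.
humor_volume = [74.1, 68.3, 80.2, 77.5, 71.9, 76.0]
non_humor_volume = [73.8, 69.0, 79.5, 78.1, 72.4, 75.2]
u_stat, p_value = mann_whitney_u(humor_volume, non_humor_volume)
print(f"U = {u_stat}, two-tailed p = {p_value:.4f}")
```

A rank-based test of this kind makes no assumption that the measurements are normally distributed, which is one reason it is a common choice for small samples of pause units.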
2.4 Pauses
As discussed in detail in Pickering et al. (2009) and Attardo and Pickering (2011),
the traditional folk-theory of humor predicts that punch lines should be set apart
from the preceding text by noticeable pauses. A pause was operationalized as
a “substantial” pause > .6 seconds (Brown, Currie, & Kenworthy, 1980, p. 56),
since shorter pauses are not generally noticeable and hence would not function as
markers. Significantly, a pause of .6 seconds is roughly double the mean length of
pauses in three out of four conversations, in our data. The fourth conversation has
a mean length of pauses of .7 seconds, but this is clearly an exception, as the two
speakers had some difficulty getting started with the conversation.
Since we used pause-based units, obviously every unit is preceded by a pause.
Therefore we tested the prediction that all instances of humor are preceded by a
substantial pause in two ways: using a strict definition, i.e., whether the punch/jab
line itself was immediately preceded by a substantial pause, and using a broader
definition, i.e., whether the unit in which the humor occurs was preceded by a
substantial pause.
Our results match those in Pickering et al. (2009): no jab line (including irony)
occurs immediately after a substantial pause (strict definition). 11 out of 39
(roughly 28%) instances of humor occur in a pause-based unit preceded by a
substantial pause (broad definition). These results clearly falsify the folk-theory of
humor’s claim that humor should be preceded by a substantial pause.
The following example shows that even when a substantial pause precedes the
pause-based unit, the pause does not mark the jab
line.
233 P 0.3
234 P //a Pep Boys and an Aldi’s GROcery [100] [73] store //
235 0.81
236 M //wh:::ich [((laughs)) that’s very IRISH [245] [80] [((laughs))] //
237 0.32
Note how speaker M begins his turn with an elongated vowel, a bout of laughter,
followed by a sentence, in which the direct object is the jab line. Thus, in a strict
definition of jab line, the pause should have been between “that’s very” and “Irish,”
whereas there is no audible pause at all. It is notable how laughter occurs before
and after the sentence in which the jab line occurs. The presence of laughter is not
unusual, as it is found in about 50% of the instances of humor.
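The broad definition applied to the excerpt above can be sketched as follows. This is an illustrative encoding under stated assumptions, not the authors' procedure: the names `SUBSTANTIAL_PAUSE`, `Unit`, `build_units`, and `preceded_by_substantial_pause` are hypothetical, the bracketed measurements in the transcript are omitted, and the alternating list of pauses and units is only one possible way to represent a pause-based transcript.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold for a "substantial" pause, in seconds (Brown, Currie, & Kenworthy, 1980).
SUBSTANTIAL_PAUSE = 0.6


@dataclass
class Unit:
    line: int
    speaker: str
    text: str
    preceding_pause: Optional[float]  # None if no pause was recorded before the unit


def build_units(rows):
    """Pair each pause-based unit with the pause that immediately precedes it.

    `rows` mixes pause durations (floats) and (line, speaker, text) tuples,
    in transcript order.
    """
    units = []
    last_pause = None
    for row in rows:
        if isinstance(row, float):
            last_pause = row
        else:
            line, speaker, text = row
            units.append(Unit(line, speaker, text, last_pause))
            last_pause = None
    return units


def preceded_by_substantial_pause(unit):
    """Broad definition: is the whole unit preceded by a pause longer than 0.6 s?"""
    return unit.preceding_pause is not None and unit.preceding_pause > SUBSTANTIAL_PAUSE


# Pause durations and units taken from transcript lines 233-237.
excerpt = [
    0.3,
    (234, "P", "a Pep Boys and an Aldi's GROcery store"),
    0.81,
    (236, "M", "wh:::ich ((laughs)) that's very IRISH ((laughs))"),
    0.32,
]
units = build_units(excerpt)
for unit in units:
    print(unit.line, unit.speaker, preceded_by_substantial_pause(unit))
# prints:
# 234 P False
# 236 M True
```

The sketch covers only the broad definition. The strict definition would require the position of pauses inside a unit (here, between “that’s very” and “Irish”), which this simple representation does not record.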
Table 2 summarizes the results for conversational humor, contrasted with the
Pickering et al. (2009) study on canned jokes.
A fuller discussion of these data would take too much space, in this context,
so we will limit ourselves to explaining the only difference between the two sets
of data, namely the absence of a significantly lower pitch for the conversational
humor, as compared with the canned humor.
– They define pauses as interruptions of the flow of speech longer than .3 seconds;
for the reasons discussed in Pickering et al. (2009), we used substantial
pauses (> .6 s).
– They use “intonation phrases” for which boundaries are, among other things,
pauses (Archakis et al., 2010, p. 195). Hence intonation phrases have a higher
probability of being preceded and/or followed by pauses than other
non-pause-based units.
– They selected “semi-randomly” 100 serious intonation phrases from narratives;
this may have reduced the probability that they be framed by pauses,
unlike jab lines. The only way of avoiding potential bias is either to utilize all
non-humorous turns or to select a random sample.
Let us note that, even when looking at only four conversations, the conversations
differ in interesting ways, at a very direct impressionistic level. Some of
the conversations are very easy going and the participants are clearly enjoying
themselves, whereas in others they are much less relaxed. This is reflected in the
quantity of smiling and laughter. For example, Tamara and Mary smile and laugh
almost continuously throughout the conversation, while Carmen and Marina
smile and laugh a lot less. In this context, we will not try to characterize these
differences and instead we will focus on the intensity of smiles and their use in
back channels.
Figure 1. Mary and Tamara’s facial expressions during the jab line occurring on line 215
of the transcription “I’ve screwed up Easy Mac and stuff like that, you know?”
Clearly, the high intensity smiles do not mark the punch line (in its strict
definition) in the sense of accompanying it or of announcing it. The function of
smiling and laughter seems to be that of providing clues that lead to the framing
of segments of the exchange as humorous.
It should be noted that some speakers also smile and laugh in turns where no
humor takes place, most notably Tamara and Mary. Thus smiling is obviously not
a clear-cut marker of humorous intent. Smiles can have many different meanings
(Ekman, 1985). Conversely, there are instances of humor not accompanied by
smiling (for example, Carmen’s ironical turns).
The occurrence of laughter when no humor occurs can possibly be explained
by the framing of a multi-turn conversational stretch as humorous. Once the
speakers have agreed that the current exchange is in the humor frame, subsequent
utterances that have no humorous value in and of themselves are “interpreted” as
humorous (i.e., the speakers react with laughter). The presence of laughter itself
is an ostensive marker of humor (the speakers would be “reasoning” along the
lines of “we are laughing, we must be being funny”). Significantly, these laughing
stretches do not continue indefinitely but eventually decay: the speakers return to
neutral expressions or low intensity smiles fairly rapidly.
From the exchanges in our data exemplified in Figure 1, it is fairly clear that speak-
ers use reduced (low intensity) smiles to back channel and to “test” the humorous
intention (see stills 1–3 of Figure 1), whereas they break out in full smiles (high
intensity) either to spontaneously show enjoyment (often followed by laughter) or
to signify uptake of the humor.
Smiling has been likened to a back channel (Brunner, 1979). Obviously, not all
smiles are back channel devices and, regardless of that fact, it remains to be
determined what message is being sent through the back channel. Traditionally, back
channel messages signify that the attention of the hearer is engaged and provide
clues to the uptake of the message in the predominant channel. We wish to put
forward the hypothesis that back channel smiling during humorous turns has a
richer, more specific value.
Specifically, we hypothesize that back channel smiling indicates agreement
with the humor. “Agreement” is here used as a technical term, introduced in Hay
(2001). Hay (2001, p. 67) distinguished four levels of uptake of a humorous in-
stance in discourse: recognition, understanding, appreciation, and agreement.
The first three levels are self-explanatory: the hearer must become aware of the in-
tention of the speaker to be funny (there are exceptions, which need not concern
us in this context), must understand the humor, and must appreciate it (i.e., enjoy
it). The fourth is more complex: agreement implies “agreement with the message,
including any attitudes, presuppositions or implicatures contained in the humor.”
(Hay, 2001, p. 72). Significantly, the levels are the object of a scalar implicature:
agreement implies appreciation, which in turn implies understanding, and the
latter implies recognition. For the record, we simplify Hay’s sophisticated discus-
sion, but not so much as to distort her argument.
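The implicational ordering among the four levels can be made concrete as an ordered scale. The following is an illustrative encoding only: the names `Uptake` and `implied_levels` are hypothetical and do not come from Hay (2001), and the numeric values merely record the order of the levels.

```python
from enum import IntEnum


class Uptake(IntEnum):
    """Four levels of uptake, ordered from weakest to strongest."""
    RECOGNITION = 1
    UNDERSTANDING = 2
    APPRECIATION = 3
    AGREEMENT = 4


def implied_levels(level):
    """Return the given level together with every lower level it implies."""
    return [lower for lower in Uptake if lower <= level]


print([lvl.name for lvl in implied_levels(Uptake.APPRECIATION)])
# prints: ['RECOGNITION', 'UNDERSTANDING', 'APPRECIATION']
```

On this encoding, a hearer who displays agreement is thereby taken to display all four levels, whereas a hearer who displays only understanding commits to nothing about appreciation or agreement.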
Our hypothesis, then, is that by smiling during and after the production of
humor, the hearer signals not only their attention to the message, but that they
“get it” and moreover that they agree with the speaker, in the sense above. We wish
to stress that we are not arguing that the above is the only meaning of smiling in
conversation. On the contrary, we argue that it is one of many meanings it may
convey. Our work should be seen as exploratory in nature, as it explores discursive
exchanges and avoids some methodological problems, such as cherry
picking of the data, which plague other approaches, as we discuss below.
4. Methodological issues
This is a very complex issue, which we cannot hope to tackle in all its complex-
ity. Let us just note two general points: (1) a sufficiently large corpus will yield
enough examples to illustrate just about any theory. A study should analyze either
all the data or a representative sample thereof. (2) Investigating the occurrence of
a feature and its correlation to another is not sufficient. For example, we know that
smiling and laughter occur also when no humor is present. Therefore, the mere
occurrence of smiling and/or laughter cannot be said to mark a discourse passage
as humorous. Even if every instance of humor in one’s corpus co-occurred with
smiling/laughter, the occurrence of smiling and/or laughter outside of humorous turns
would still need to be explained and/or discriminated somehow from smiling/laughter
in humorous turns, if one wants to speak of “marking.”
There have been attempts to circumvent this problem by coining the tauto-
logical term “laughable” (a laughable is a stretch of discourse in which laughter
occurs; Glenn, 2003) but clearly the concept has no explanatory power at all (see
Attardo, 2005).
We are deliberately avoiding, at this point, the issue of what units of discourse
get “framed.” There are different approaches, ranging from framing as humorous
entire conversations, to single turns. We might label the former “global” and the
latter “local.” Clearly, the hypothesis outlined in this paper is that the framing of
discourse as humorous (play frame) is local. Further research will need to exam-
ine this hypothesis.
Notes
1. The term “multimodality” refers to multiple modes of communication being used concur-
rently. Thus speech uses the auditory channel and smiling (primarily) the visual channel. In this
paper we only consider a few aspects of multimodality in what is also known as “paralanguage”
(Trager, 1958). Multimodality can include gestures, the posture of the body during speech, the
way the speakers locate themselves in relation to each other, their clothing, etc. Clearly, only
a few features can be analyzed at a time. In this article we focus on prosody and smiling and
laughter.
2. One of the anonymous referees asks us to speculate as to why speakers do not mark punch
lines. Our best guess, at this time, would be that they do not feel the need to do so, insofar as
their hearers can be relied upon to recognize the humorous intention.
References
Archakis, A., Giakoumelou, M., Papazachariou, D., & Tsakona, V. (2010). The prosodic framing
of humour in conversational narratives: Evidence from Greek data. Journal of Greek Lin-
guistics, 10, 187–212. DOI: 10.1163/156658410X531375
Attardo, S. (1994). Linguistic theories of humor. Berlin: Mouton de Gruyter.
Attardo, S. (2000). Irony markers and functions: Towards a goal-oriented theory of irony and
its processing. RASK, 12, 3–20.
Attardo, S. (2001). Humorous texts. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110887969
Attardo, S. (2005). Review of Glenn (2003). HUMOR: International Journal of Humor Research,
18(4), 422–429.
Attardo, S., & Pickering, L. (2011). Timing in the performance of jokes. HUMOR: International
Journal of Humor Research, 24(2), 233–250.
Attardo, S., Pickering, L., & Baker, A. (2011). Prosodic and multimodal markers of humor in
conversation. Pragmatics and Cognition, 19(2), 224–247. DOI: 10.1075/pc.19.2.03att
Bavelas, J.B., & Chovil, N. (2000). Visible acts of meaning. An integrated message model of lan-
guage use in face-to-face dialogue. Journal of Language and Social Psychology, 19, 163–194.
DOI: 10.1177/0261927X00019002001
Brown, G., Currie, K.L., & Kenworthy, J. (1980). Questions of intonation. Baltimore, MD: Uni-
versity Park Press.
Brunner, L.J. (1979). Smiles can be back channels. Journal of Personality and Social Psychology,
37(5), 728–734. DOI: 10.1037/0022-3514.37.5.728
Coulthard, M., & Montgomery, M. (Eds.). (1981). Studies in discourse analysis. London:
Thomas Litho Press.
Eisterhold, J., Attardo, S., & Boxer, D. (2006). Reactions to irony in discourse: evidence for the
least disruption principle. Journal of Pragmatics, 38(8), 1239–1256.
DOI: 10.1016/j.pragma.2004.12.003
Ekman, P. (1985). Telling lies: Clues to deceit in the marketplace, politics, and marriage.
New York, NY: Norton.
Ekman, P., Davidson, R.J., & Friesen, W.V. (1990). Duchenne’s smile: Emotional expression and
brain physiology II. Journal of Personality and Social Psychology, 58, 342–353.
DOI: 10.1037/0022-3514.58.2.342
Giora, R. (2003). On our mind: Salience, context and figurative language. Oxford: Oxford Uni-
versity Press. DOI: 10.1093/acprof:oso/9780195136166.001.0001
Glenn, P. (2003). Laughter in interaction. Cambridge: Cambridge University Press.
DOI: 10.1017/CBO9780511519888
Grice, P.H. (1957). Meaning. The Philosophical Review, 66(3), 377–388. DOI: 10.2307/2182440
Harker, L., & Keltner, D. (2001). Expressions of positive emotion in women’s college yearbook
pictures and their relationship to personality and life outcomes across adulthood. Journal
of Personality and Social Psychology, 80(1), 112–124. DOI: 10.1037/0022-3514.80.1.112
Hay, J. (2001). The pragmatics of humor support. HUMOR: International Journal of Humor
Research, 14(1), 55–82. DOI: 10.1515/humr.14.1.55
Hidalgo Downing, R., & Iglesias Recuero, S. (2009). Humor e ironía: una relación compleja. In
L. Ruiz Gurillo & X.A. Padilla García (Eds.), Dime cómo ironizas y te diré quién eres: Una
aproximación pragmática a la ironía (pp. 423–455). Frankfurt: Peter Lang.
Hirst, D., & Di Cristo, A. (1998). Intonation systems: A survey of twenty languages. Cambridge:
Cambridge University Press.
Jefferson, G. (1979). A technique for inviting laughter and its subsequent acceptance declina-
tion. In G. Psathas (Ed.), Everyday language: Studies in Ethnomethodology (pp. 79–96).
New York: Irvington.
Owren, M.J., & Bachorowski, J.A. (2003). Reconsidering the evolution of non-linguistic com-
munication: the case of laughter. Journal of Nonverbal Behavior, 27, 183–200.
DOI: 10.1023/A:1025394015198
Pickering, L. (1999). An analysis of prosodic systems in the classroom discourse of native speaker
and nonnative speaker teaching assistants. Unpublished dissertation. University of Florida.
Pickering, L., Corduas, M., Eisterhold, J., Seifried, B., Eggleston, A., & Attardo, S. (2009). Pro-
sodic markers of saliency in humorous narratives. Discourse Processes, 46, 517–540.
DOI: 10.1080/01638530902959604
Rees, C.E., & Monrouxe, L.V. (2010). I should be lucky ha ha ha ha: The construction of power,
identity and gender through laughter within medical workplace learning encounters. Jour-
nal of Pragmatics, 42, 3384–3399. DOI: 10.1016/j.pragma.2010.05.004
Rockwell, P.A. (2006). Sarcasm and other mixed messages: The ambiguous ways people use
language. Lewiston, NY/Queenston/Lampeter: Edwin Mellen.
Ruiz-Gurillo, L. (2009). ¿Cómo se gestiona la ironía en la conversación? RILCE, 23(2), 363–377.
Trager, G.L. (1958). Paralanguage: A first approximation. Studies in Linguistics, 13, 1–12.
Tsakona, V. (2011). Irony beyond criticism: Evidence from Greek parliamentary discourse.
Pragmatics and Society, 2(1), 57–86. DOI: 10.1075/ps.2.1.04tsa
Wennerstrom, A. (2001). The music of everyday speech. Oxford: Oxford University Press.
Wittgenstein, L. (1953). Philosophical investigations. New York: The Macmillan Company.
Alan Cienki
Vrije Universiteit (VU) / Free University
1. Introduction
The idea of schemas certainly did not originate with the field of cognitive linguistics.
While research on schemas has its origins in various disciplines, the work which
was among the most influential in the early years of cognitive linguistics (as noted
in Lakoff, 1987) came from cognitive psychology (originating most notably with
Rumelhart’s [1980] schema theory), computer science (dating back to Minsky’s
[1975] frames with defaults and Schank and Abelson’s [1977] scripts), and, of
course, philosophy: Johnson (1987) discusses Kant ([1781] 1968) as the particu-
lar inspiration for his idea of ‘image schemas’, and he notes (Johnson, 2005) the
doi 10.1075/bct.78.13cie
predecessor of this notion in the works of James (1890), Dewey ([1925] 1958),
and Merleau-Ponty ([1945] 1962).
Within cognitive linguistics, Langacker (1987, p. 132) provides the follow-
ing definition: “The notion of schematicity pertains to levels of specificity, i.e. the
fineness of detail with which something is characterized; … A schema is thus
abstract relative to its … elaborations in the sense of providing less information
and being compatible with a broader range of options…”. He concludes that the
fact that we are able to conceptualize situations at different levels of specificity,
and express some characterizations of these different levels linguistically, means
schematicity has huge implications for how language is structured and used: “The
linguistic significance of this ability is hard to overstate” (Langacker, 1987, p. 135).
Various categories of schemas have been proposed in cognitive linguistics
within different frameworks of analysis. These include the construct of schema as
it is used in cognitive grammar – “an abstract characterization that is fully com-
patible with all the members of the category it defines” (Langacker, 1987, p. 371)
and as discussed in terms of syntactic constructional schemas (Goldberg, 1995;
Tomasello, 1992). One of the most influential uses of the term has been in the
work on image schemas (Hampe, 2005; Johnson, 1987; Lakoff, 1987). Another
proposal, that of mimetic schemas (Zlatev, 2005, and elsewhere), is a more recent
addition to the field.
The latter two particular notions will be the focus of this article, as they are
two types which have particular significance for research in a different but related
field – the study of spontaneous gesture with speech. While scholarship on gesture
dates back at least to the time of the Roman orators (Kendon, 2004, Ch. 3),
research on it from a linguistic point of view was suppressed in the Anglo-American
tradition by the dominance of a modular view of language, promulgated
within generative theories of linguistics. This view entailed a separation, much
lamented since, of the study of the mind from that of the body.
Work on gesture from a more psychological perspective only took hold among
(cognitive) linguists in the 1990s and thereafter upon the publication of McNeill
(1992) and the development of this work by psycholinguists.
Now we see an enthusiastic in-corpor-ation of gesture research into cognitive
linguistics, evidenced by its inclusion as a topic in cognitive linguistic conferences
and journals. This move, however, raises some interesting questions for concepts
that have been developed for cognitive linguistics. How do they apply to a broad-
er notion of language, beyond that found in written and spoken words? The as-
sumption has been that the embodied approach of cognitive linguistics should
be readily amenable to co-speech embodied behavior. The research on schemas
provides a good case study to ascertain the relevance of, and possible problems
2. Image schemas
Though having implicit origins in Lakoff & Johnson (1980), as discussed below,
the construct of image schemas was explicitly proposed in cognitive linguistics
in the same year by Johnson (1987) and Lakoff (1987). It was most notably char-
acterized as follows: “An image schema is a recurring, dynamic pattern of our
perceptual interactions and motor programs that gives coherence and structure
to our experience” (Johnson, 1987, p. xiv). Some examples include path1 and
related notions such as cycle; container and patterns related to containers and
areas, such as full-empty and center-periphery; and force relations such as
compulsion, attraction, and enablement. Johnson (1987, p. 126) provides a
list of some 27, including those listed above, which he counts as among the most
important, but it is not meant to be an exhaustive list, and others have been pro-
posed in subsequent research, such as straight (Cienki, 1998a) and self-mo-
tion and animate motion for animals and caused motion and inanimate
motion for artifacts (Mandler, 1992).
As Oakley (2007, p. 214) sums it up, the basic notion grew as an instrumental
part of the epistemology and moral philosophy that Johnson developed as well as
part of Lakoff ’s articulation of a theory of categorization. Lakoff provides two fun-
damental questions which gave rise to (among other ideas) the theorizing about
image schemas: “What kind of preconceptual structure is there to our experience
that could give rise to conceptual structure?” and “How can abstract concepts and
abstract reason be based on bodily experience?” (Lakoff, 1987, p. 267). Thus the
notion of image schemas arose within the establishment of a new philosophical
basis for a particular strand of cognitive science, one which led to Lakoff and
Johnson’s (1999) treatise on “the embodied mind and its challenge to Western
thought.”
Various attempts have been made to refine or qualify the notion of what im-
age schemas are, or the scope of what they encompass, including accounting for
their static and dynamic qualities, and the differing levels at which their schema-
ticity is relevant (Cienki, 1997; Quinn, 1991). One of the most important of these
sets of distinctions has been Grady’s differentiation between patterns which are
claimed to constitute “mental representations of fundamental units of sensory ex-
perience” (Grady, 2005, p. 44, emphasis in original) and fundamental units which
“relate to our interpretations of and responses to the world, our assessments of the
physical situations we encounter, their nature and their meaning” (Grady, 2005,
p. 47). Grady proposes limiting the former under the term ‘image schemas’ and
acknowledging the different status of the latter under a separate term; he proposes
‘response schemas’ (Grady, 2005, p. 46). While his category of image schemas in-
cludes, for example, center-periphery, container, and balance, others such
as cycle, scale, and process fall under his response schemas.
Within linguistics, image schemas have proven to be a useful notion in theo-
ries of grammar (Langacker, 1987), in psycholinguistic research (Gibbs & Colston,
1995), and most of all in numerous applied linguistic studies, particularly those
accounting for the polysemy of individual or related words or constructions and
semantic change (see Oakley, 2007, pp. 219–223 for an overview). The relevance of
image schemas in these linguistic analyses has been taken by many cognitive lin-
guists as supporting evidence for the reality of image schemas in some way on the
cognitive and experiential levels, even if the status being claimed for that reality is
still acknowledged to be complex and subject to revision (Gibbs, 2005).
In sum we can say that the thinking about image schemas developed primar-
ily from philosophical reasoning within a subgroup of cognitive science oriented
toward the analysis of conceptual structure, and its successful uptake in linguistic
analyses bolstered (a) the credence in image schemas as a cognitive construct of
language users and (b) the attractiveness of image schemas as an explanatory tool
which linguists could use, particularly for semantic analyses. Interestingly, despite
the claims about the embodied basis of image schemas, the theorizing about them
began in the 1980s without reference to gesture. One major reason
could be that gesture studies only gained wide reception in cognitive psychology,
and later in cognitive linguistics, after the publication of McNeill’s Hand and
Mind in 1992. However, another important factor is the research methodology
used: much of the fundamental work on image schemas was based on intuitive
analysis of linguistic examples – phrases which were often invented by the authors
as plausible, but which were not drawn from corpora of actual usage.
If we consider talk as used in context by hearing, seeing co-participants, the
canonical encounter (Clark, 1973) of human communication is face to face inter-
action. This has led Kendon (1980, 2004) to approach gesture and speech as two
aspects of the process of utterance and McNeill (1985, 1992) to claim that gesture
and language are one system. Numerous heirs of their research tradition in gesture
studies now consider gesture not as nonverbal “body language” but rather as
co-verbal behavior. In light of the important role that image schemas have played
in cognitive linguistic theory and analysis, to what degree do image schemas pro-
vide a useful explanatory tool for researching the concrete, physically embodied
details of gestures?
same gesture type accompanying use of the progressive aspect in English, ex-
pressed with be + -ing, e.g., “there’s something going on in the city that…”, sug-
gesting that it can play a grammatical role and not simply occur with or in place
of lexical items. Williams (2008) argues that the path image schema – or more
precisely, the source-path-goal image schematic structure – underlies gestures
involving tracing, usually with one’s extended finger(s). Tracing of a path can sup-
port the speaker’s cognition but also, in contexts of demonstrating something or
when teaching, it can be used to guide the addressee’s conceptualization. Part
of the basis of arguing for straight as an image schema (Cienki, 1998a) had
to do with the distinctive recurring pattern of experience of muscular tension
and control involved in effortful, non-curvilinear movement of body parts. While
discussed in isolation in the studies mentioned above, image schematic patterns
often co-occur in gesture, e.g., movements involving path and iteration, or
path, straight, and up-down (Bressem, 2008). This resonates with claims about
the co-occurrence of certain image schemas in other aspects of our experience
(Cienki, 1997).
The study of schemas in cognitive linguistics has some close ties with research on
conceptual metaphors. With regard to image schemas, some might even argue that
we can see some circularity in the history of their development as a construct in
relation to metaphor studies. Even though image schemas were not named in the
1980 book as such, Oakley (2007, p. 214) notes that “[t]he locus classicus of image
schema theory is Lakoff and Johnson’s (1980) conceptual theory of metaphor.”
Generalizing over patterns of metaphors found in language led to conclusions
about underlying conceptual metaphors (mappings of target domain in terms
of source domain) that provided the structure for the linguistic expressions, and
the kinds of source domains that showed up in the most fundamental
types of conceptual metaphors (such as more is up) provided answers to the questions
posed by Lakoff (above). Indeed the convention of using small capital letters
to name image schemas follows naturally from many of them having been named
in Lakoff and Johnson (1980) as common metaphoric source domains.
The process of deriving image schemas from the analysis of metaphors in lan-
guage and then justifying those image schemas through later application of them
in the analysis of metaphors in linguistic data relies on reasoning that has been
critiqued by psychologists as circular (Gibbs & Colston, 1995, p. 354). However,
as we see below, the vicious cycle of reasoning – that verbal metaphoric expres-
sions provide evidence for conceptual metaphors and that we know that because
The research discussed above all concerns the behavior of adult speakers. Image
schemas have been claimed to play an important role in early development as well
(e.g., Gibbs & Colston, 1995; Mandler, 1992, 2005), for example, as patterns which
infants may realize and thereby be capable of generalizing across perceptions. Im-
age schemas might therefore be expected to appear in children’s early gestural
behavior. However, Andrén (2008) argues that at least up to 27 months of age, chil-
dren “are not doing abstract and refined image schema-like gestures of the kind
that can be seen in adults until, possibly, the very end of [that time] period.” This
suggests that “performing refined schema-like gestures is not simply a question of
these abstracted image schematic structures of thought ‘spontaneously’ coming
out of the hands in the form of expression” (Andrén, 2008). Andrén supports the
position that less abstract schemas of actions provide a more fruitful option for
characterizing young children’s gestures, and that later in development, schematic
patterns more like those of image schemas become relevant in the structuring of
gestures (and thought). The importance, for children and adults, of patterns on
a less schematic level than that of image schemas has led Zlatev and colleagues
(see below) to investigate what they call mimetic schemas, patterns which relate
more closely to the basic level of categorization (Rosch, Mervis, Gray, Johnson,
& Boyes-Braem, 1976) of actions in human experience than image schemas do.
3. Mimetic schemas
The notion of mimetic schemas had a rather different starting point than that of
image schemas. One difference is that it arose from discussion among a team of
researchers coming from a variety of interrelated theoretical perspectives: “an in-
terdisciplinary group of linguists, semioticians, cognitive scientists and philoso-
phers” who have taken both “a phylogenetic and ontogenetic perspective” (Zlatev,
2005, p. 315), importantly including developmental research in the scope of their
work. In addition, the research has been multimodal in nature from the start,
concerning both the audio and visual modalities by considering the interrelation-
ship between language, gestures, and pictures (Zlatev, 2005, p. 315). Finally, it is
more recent in origin than the theory of image schemas. It has been developed
in Zlatev, Persson, & Gärdenfors (2005a, 2005b) and Zlatev (2005, 2007a, 2007b),
building on a key concept of bodily mimesis from Donald (1991). Therefore mi-
metic schemas were developed with image schemas as background knowledge, in
fact in comparison and contrast with them (Zlatev, 2005, §3).
Let us consider the specifics of mimetic schemas. Some examples that Zlatev
(2005, p. 317) proposes are EAT, SIT, KISS, HIT, PUT IN, TAKE OUT, RUN, CRAWL, FLY,
and FALL.² The difference from image schemas is clear, in that while the following
properties are possible descriptors of image schemas, they are definitional char-
acteristics of mimetic schemas. Zlatev (2005, p. 318) characterizes mimetic sche-
mas as bodily, representational (not just abstract patterns), dynamic, accessible to
consciousness, specific (relating to bodily acts), and pre-reflectively shared (since
they derive from culturally salient actions). As opposed to the potentially static
nature of some image schemas (Cienki, 1997), mimetic schemas are all about ac-
tions, and thus dynamic. In this way, they concern a different level of specificity
than image schemas. Thus, while each mimetic schema applies to a narrower range
of phenomena, it is also more information-rich (to return to the quote from
Langacker, 1987, with which we began). In this regard, they are argued to provide
a strong basis for language development in children. Zlatev (2005, pp. 327–328) notes
the close correspondence of the claimed mimetic schemas to the first verbs that
Tomasello (1992) observed from an English-speaking child between the ages of
16 and 24 months, such as hammer, kick, jump, swim, get-out. Note, however, that
detailed research has yet to be carried out on the metaphoric use of verbs express-
ing mimetic schemas, although there is interesting potential for this topic, given
that the greater specificity of mimetic schematic structures raises questions about
the ways and contexts in which they might be extended metaphorically.
While the notion of mimetic schemas is still new and has not yet been explic-
itly employed in gesture studies as a construct, some existing studies implicitly
support further exploration of it. Of particular interest here is work which con-
cerns gestures involving schematized versions of manual actions. Calbris’ (2003)
study, for example, concerns a family of gestures involving a flat hand making a
tense, straight movement either down or horizontally across. The horizontal vari-
ants, with the palm facing down, are often used by speakers of French, as Calbris
observes, but also in many other European cultures, when refusing or negating
something. One can see the possible origins in action via the kind of sweeping
motion made when removing small unwanted objects (such as dust particles or
water droplets) from a flat surface by wiping it. We can see how a mimetic rep-
resentation of this could be used in other contexts in which no physical object
was present that required sweeping. In this sense, the unwanted or refused idea is
metaphorically wiped away.
This kind of gesture is what is described by Müller (1998a, 1998b) as the
mode of representation in which the hand imitates or enacts an action it would
actually do, such as when one depicts writing with a pen by moving one’s empty
hand horizontally in the air with the hand shape gripped as if holding a pen. Sim-
ilarly, Streeck (2009) describes handling and mimesis (depicting action) as two
forms by which gestures can depict. These kinds of gestures appear to represent
mimetic schemas through motor patterns that are informationally rich. In some
contexts, enactment gestures are performed with a different kind of function than
a referential depiction of some action of the hand. Teßendorf and Ladewig (2008),
for instance, discuss the brushing away gesture used by Spanish speakers (but
again, also observed elsewhere) in which the slightly curled fingers of one hand
quickly flick outward, often two or three times; the same gestural form used to
brush something small and unwanted (such as crumbs or lint) off one’s clothes
is also sometimes performed in the air, not against any surface. In these cases, it can play
the role of indicating dismissiveness towards an idea that has been mentioned. In
this sense, the schematic action takes on a pragmatic function. In this regard it is
4. Discussion
We see from this overview how aspects of the current state of play in both sche-
ma research and gesture research manifest themselves more saliently as they are
brought into contact with each other. Image schemas and mimetic schemas have
been argued to perform different kinds of functions in cognitive terms for lan-
guage users, and have proven to be useful analytically for the researcher as tools
for linguistic analysis on different levels. Similarly, for gesture research, the two
notions provide different tools for analyzing, and different levels of explanation
for, gesture forms and functions.
For example, we saw above that while both types of schemas provide patterns
which can be used in gestures as source domains of metaphors, the target domains
of the metaphors involved appear to be different in the two cases. Metaphorically
used gestures based on image schemas seem to relate to ideas on the general level
of types of processes, reasoning, or behavior, while those based on mimetic sche-
mas, at least in the examples considered above, concerned more particular ideas,
like negation or dismissiveness. This could have to do with different types of sche-
maticity involved in the gestures expressing the source domains: with simple mo-
tions with less specific handshapes being characteristic of the image-schematic
type of gestures (such as PATH and CYCLE), and with handshapes more specifically
associated with basic level actions in the case of mimetic-schematic type gestures
(such as WIPE and BRUSH AWAY). The greater schematicity of the gestures realizing
image schemas may allow for a wider variety of possible metaphoric extensions,
while the information richness of the mimetic schemas in gestures may constrain
their scope for metaphoric extension. However, confirmation of this hypothesis
will have to await further research on image schematic and mimetic schematic
structures in gestures as source domains for metaphors.
One question that arises from the discussion above is: at what cognitive level
are these schemas operating? Lakoff and Johnson (1999) place image schemas on
the level of the “cognitive unconscious,” though, as Zlatev (2005, p. 322) observes,
Johnson (2005, p. 22) qualifies this by saying that the level at which image sche-
mas have meaning for us “typically operates beneath the level of our conscious
awareness” (emphasis added). Mimetic schemas, however, with their greater level
Acknowledgements
This article has its origins in a commentary for a theme session on “Motivation
in gesture: Image and motor schemas and their metaphorical extensions” at the
Third International Conference of the German Cognitive Linguistics Association,
held in Leipzig in September 2008. I am grateful to Irene Mittelberg and Cornelia
Müller for having organized that panel, to Doris Schönefeld for valuable discus-
sion of ideas leading to this paper, to María Jesús Pinar Sanz for undertaking
the compilation of this special issue, and to two anonymous reviewers for their
helpful comments.
Notes
1. I will follow the convention in the literature of using small capital letters to indicate names
of image schemas.
2. The convention of also identifying mimetic schemas with small capital letters that Zlatev
uses will be followed here.
References
Cienki, A. (2005). Image schemas and gesture. In B. Hampe (Ed.), From perception to meaning:
Image schemas in cognitive linguistics (pp. 421–441). Berlin: Mouton de Gruyter.
DOI: 10.1515/9783110197532.5.421
Cienki, A. (2008). Why study metaphor and gesture? In A. Cienki & C. Müller (Eds.), Metaphor
and gesture (pp. 5–25). Amsterdam: John Benjamins. DOI: 10.1075/gs.3
Cienki, A. (2012). Usage events of spoken language and the symbolic units we (may) abstract
from them. In J. Badio & K. Kosecki (Eds.), Cognitive processes in language (pp. 149–158).
Bern: Peter Lang.
Clark, H. (1973). Space, time, semantics, and the child. In T.E. Moore (Ed.), Cognitive develop-
ment and the acquisition of language (pp. 27–63). New York: Academic Press.
Dewey, J. ([1925] 1958). Experience and nature. New York: Dover Publications. [Original edi-
tion: Chicago/London: Open Court].
Donald, M. (1991). Origins of the modern mind: Three stages in the evolution of culture and
cognition. Cambridge, MA: Harvard University Press.
Emanatian, M. (1997). The spatialization of judgment. In W.A. Liebert, G. Redeker, & L. Waugh
(Eds.), Discourse perspective in cognitive linguistics (pp. 131–147). Amsterdam: John Benja-
mins. DOI: 10.1075/cilt.151.11ema
Fricke, E. ([2008] 2012). Grundlagen einer multimodalen Grammatik des Deutschen:
Syntaktische Strukturen und Funktionen. Habilitationsschrift, Frankfurt/Oder:
Europa-Universität Viadrina. Republished as Grammatik multimodal (Berlin: Walter de Gruyter).
Gibbs, R.W., Jr. (2005). The psychological status of image schemas. In B. Hampe (Ed.), From
perception to meaning: Image schemas in cognitive linguistics (pp. 113–135). Berlin: Mouton
de Gruyter. DOI: 10.1515/9783110197532.2.113
Gibbs, R.W., Jr., & Colston, H. (1995). The psychological reality of image schemas and their
transformations. Cognitive Linguistics, 6, 347–378. DOI: 10.1515/cogl.1995.6.4.347
Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure.
Chicago: University of Chicago Press.
Grady, J. (2005). Image schemas and perception: Refining a definition. In B. Hampe (Ed.), From
perception to meaning: Image schemas in cognitive linguistics (pp. 35–55). Berlin: Mouton
de Gruyter. DOI: 10.1515/9783110197532.1.35
Hampe, B. (Ed.). (2005). From perception to meaning: Image schemas in cognitive linguistics.
Berlin: Mouton de Gruyter. DOI: 10.1515/9783110197532
Harrison, S. (2009). Grammar, gesture, and cognition: The case of negation in English. PhD dis-
sertation. Bordeaux, France: Université Michel de Montaigne.
James, W. (1890). The principles of psychology. New York: Dover. DOI: 10.1037/11059-000
Johnson, M. (1987). The body in the mind: The bodily basis of meaning, imagination, and reason.
Chicago: University of Chicago Press.
Johnson, M. (2005). The philosophical significance of image schemas. In B. Hampe (Ed.), From
perception to meaning: Image schemas in cognitive linguistics (pp. 15–33). Berlin: Mouton
de Gruyter. DOI: 10.1515/9783110197532.1.15
Kant, I. ([1781] 1968). Critique of pure reason. Translated by N.K. Smith. New York: St. Martin’s
Press.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M.R.
Key (Ed.), The relation between verbal and nonverbal communication (pp. 207–227). The
Hague: Mouton.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University
Press. DOI: 10.1017/CBO9780511807572
Roth, W.M., & Lawless, D.V. (2002). How does the body get into the mind? Human Studies, 25,
333–358. DOI: 10.1023/A:1020127419047
Rumelhart, D.E. (1980). Schemata: The building blocks of cognition. In R.J. Spiro, B.C. Bruce
& W.F. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 38–58). Hillsdale,
NJ: Erlbaum.
Schank, R.C., & Abelson, R.P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Streeck, J. (2009). Gesturecraft: The manu-facture of meaning. Amsterdam: John Benjamins.
DOI: 10.1075/gs.2
Teßendorf, S., & Ladewig, S. (2008). The brushing-aside and the cyclic gesture–reconstructing
their underlying patterns. Talk presented at the third conference of the German Cognitive
Linguistics Association, Leipzig, Germany, September 2008.
Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge:
Cambridge University Press. DOI: 10.1017/CBO9780511527678
Traugott, E.C. (1988). Pragmatic strengthening and grammaticalization. Proceedings of the
Fourteenth Annual Meeting of the Berkeley Linguistics Society, 406–416.
Williams, R.F. (2008). Path schemas in gesturing for thinking and teaching. Talk presented at
the third conference of the German Cognitive Linguistics Association, Leipzig, Germany,
September 2008.
Zlatev, J. (2005). What’s in a schema? Bodily mimesis and the grounding of language. In B.
Hampe (Ed.), From perception to meaning: Image schemas in cognitive linguistics (pp. 313–
342). Berlin: Mouton de Gruyter. DOI: 10.1515/9783110197532.4.313
Zlatev, J. (2007a). Language, embodiment and mimesis. In T. Ziemke, J. Zlatev & R. Frank
(Eds.), Body, language and mind, vol. 1: Embodiment (pp. 297–337). Berlin: Mouton de
Gruyter.
Zlatev, J. (2007b). Intersubjectivity, mimetic schemas and the emergence of language. Intellec-
tica, 2–3(46–47), 123–152.
Zlatev, J., Persson, T., & Gärdenfors, P. (2005a). Bodily mimesis as the missing link in human
cognitive evolution. LUCS 121. Lund: Lund University Cognitive Studies.
Zlatev, J., Persson, T., & Gärdenfors, P. (2005b). Triadic bodily mimesis is the difference. Com-
mentary to Tomasello, et al. Behavioral and Brain Sciences, 28, 720–721.
DOI: 10.1017/S0140525X05530127