How Multimedia Can Improve Learning and Instruction
Preparation of this chapter was supported by Grant N000141262046 from the Office of Naval
Research.
“When the handle is pulled up, the piston moves up, the inlet valve opens, the outlet valve closes, and air enters the lower part of the cylinder.”
“When the handle is pushed down, the piston moves down, the inlet valve closes, the outlet valve opens, and air moves out through the hose.”
Figure 18.1 Frames from narrated animation on how a bicycle tire pump works
Limited capacity principle: Only a few items can be processed in a channel at any
one time (Baddeley, 1992; Sweller, Ayres, & Kalyuga, 2011). This is
reflected in the working memory box in the middle column of Figure 18.2.
Active processing principle: Meaningful learning requires appropriate cognitive
processing during learning, including attending to relevant information,
mentally organizing it into a coherent structure, and integrating it with
relevant prior knowledge (Mayer, 2009; Wittrock, 1989). This is reflected
in the arrows for selecting, organizing, and integrating in Figure 18.2.
The boxes in Figure 18.2 represent memory stores and the arrows represent
cognitive processes during learning. The first box in Figure 18.2 consists of the
multimedia instructional message, which consists of words and pictures. The second
box represents sensory memory – spoken words are held briefly in auditory sensory
memory whereas pictures and printed words are held briefly in visual sensory
memory. If the learner pays attention, as indicated by the selecting arrows, some of
the words and images are transferred to working memory for further processing
within a system that has limited processing capacity in each channel. In working
memory, the learner can arrange words (including printed words transformed from
the visual channel) into a verbal model and images into a pictorial model, as
indicated by the organizing arrows. The final box is long-term memory, which
contains a permanent storehouse of knowledge. The learner activates relevant prior
knowledge and brings it into working memory, where it is connected with the
incoming information and where the verbal and pictorial models are connected, as
indicated by the integrating arrows.
Overall, meaningful learning occurs when the learner engages in appropriate
cognitive processing during learning, including selecting relevant words and images
from the multimedia message for further processing in working memory, mentally
organizing the words into a coherent structure (or verbal model) and the images into
a coherent structure (or pictorial model), and integrating the verbal and pictorial
representations with each other and with relevant prior knowledge activated from
long-term memory. The main challenge in instructional design is to guide learners to
engage in these processes while not overloading their limited processing capacity in
each channel of working memory. This challenge can be addressed by designing
multimedia instruction in ways that minimize extraneous processing (i.e., cognitive
processing that does not support the instructional objective, which can be caused by
poor instructional design), manage essential processing (i.e., cognitive processing
aimed at representing the presented material in working memory, which depends on
the complexity of the material for the learner), and foster generative processing (i.e.,
cognitive processing aimed at making sense of the material, which depends on the
learner’s motivation to exert effort). In short, designing effective multimedia instruction
requires not only presenting the relevant material but also guiding the learner’s
cognitive processing of the material.
Note. ES = median effect size based on Cohen’s d; No. = number of positive effects out of total number of
comparisons.
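The two quantities defined in this note can be illustrated with a minimal Python sketch. It assumes the common pooled-standard-deviation form of Cohen's d, which the chapter does not specify, and the function names and sample scores are hypothetical rather than data from any cited study.

```python
from math import sqrt
from statistics import mean, median, stdev


def cohens_d(treatment, control):
    """Cohen's d using a pooled standard deviation (an assumed variant)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


def summarize(effect_sizes):
    """Return (median ES, number of positive effects, total comparisons)."""
    positive = sum(1 for d in effect_sizes if d > 0)
    return median(effect_sizes), positive, len(effect_sizes)


# Hypothetical transfer-test scores for one comparison.
d = cohens_d([2, 4, 6], [1, 3, 5])

# Hypothetical effect sizes from three comparisons.
es, no_positive, no_total = summarize([0.5, -0.1, 0.9])
```

Here `summarize` returns the median effect size together with the count of positive effects out of the total number of comparisons, which mirrors the ES and No. entries described in the note.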
have added two sentences at the end of the paragraph that present an interesting but
irrelevant fact (which can be called a seductive detail). Students learned better when
seductive details were excluded from the virus lesson (d = 0.80). Overall, across
twenty-three of twenty-three experimental comparisons, students performed better
on transfer tests when extraneous material was excluded, yielding a median effect
size of d = 0.86, which is considered a large effect. Thus, more learning occurs when
less is presented, that is, when the instructional message is kept as simple as possible.
Some possible boundary conditions are that the coherence principle applies most
strongly for learners with low working memory capacity, when the lesson is presented
at a fast pace not under the learner’s control, and when the extraneous material
is highly distracting (Rey, 2012).
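The verbal labels attached to effect sizes in this chapter (for example, d = 0.86 as large) appear to follow Cohen's conventional benchmarks of 0.2, 0.5, and 0.8. The short Python sketch below makes that mapping explicit; the function name and the "negligible" label are assumptions added for illustration, not terms used in the chapter.

```python
def cohen_label(d):
    """Label an effect size using Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label for effects below 0.2


# Median effect sizes reported in this chapter.
labels = {d: cohen_label(d) for d in (0.86, 0.75, 0.41)}
```

Under these cutoffs, a value such as 0.79 falls just below the threshold for a large effect, and 0.41 falls between the small and medium benchmarks.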
The signaling principle (also called the cueing principle) is that people learn better
when essential material is highlighted (Mayer, 2009; Mayer & Fiorella, 2014; van
Gog, 2014). Highlighting of printed text can involve the use of color, underlining,
bold, italics, font size, font style, or repetition. Highlighting of spoken text can
involve speaking louder or with more emphasis. Highlighting of graphics includes
the use of arrows, color, flashing, and spotlights. For example, in a narrated
Figure 18.3 Do people learn better when we add interesting but extraneous text?
slideshow lesson on how airplanes achieve lift, signaling involved adding headings
such as “Wing Shape: Curved Upper Surface Is Longer,” and emphasizing key
words, such as the emboldened words in the following phrase: “surface on top of
the wing is longer than on the bottom.” Mautone and Mayer (2001) reported better
transfer test performance for students who learned from a signaled multimedia lesson
than from a nonsignaled lesson (d = 0.65). Overall, there was a positive signaling
effect in twenty-four of twenty-eight published experimental comparisons, yielding
a median effect size of d = 0.41, which is considered in the small to medium range.
Some possible boundary conditions are that the signaling effect can be stronger for
low-knowledge learners (Naumann et al., 2007), when the graphics are complex
(Jeung, Chandler, & Sweller, 1997), and when signaling is used sparingly (Stull &
Mayer, 2007).
The spatial contiguity principle is that people learn better when printed words are
placed near to rather than far from corresponding graphics (Ayres & Sweller, 2014;
Ginns, 2006; Mayer & Fiorella, 2014). For example, Figure 18.4a shows a version of
a lesson on car braking systems with the words presented as a caption at the bottom
of the page or screen (i.e., separated presentation) whereas Figure 18.4b shows the
words placed near the part of the graphic they describe (i.e., integrated presentation).
Johnson and Mayer (2012) reported that students performed substantially better on
transfer tests when they received integrated presentations rather than separated
Figure 18.4 Which instructional method leads to better learning about braking
systems?
presentations, even though the words and graphics were identical in both treatments
(d = 0.73). Overall, there was a positive effect for spatial contiguity in twenty-two out
of twenty-two published experiments, yielding a median effect size of d = 1.22,
which is a large effect. Some possible boundary conditions are that the spatial
contiguity effect can be stronger when learners are low in prior knowledge (Mayer
et al., 1995) and when the material is complex (Ayres & Sweller, 2014).
The temporal contiguity principle is that people learn better from a narrated
lesson when the spoken words are presented simultaneously with the corresponding
graphics such as drawings, animation, or video (Ginns, 2006; Mayer & Fiorella,
2014). In successive presentation, the spoken words are presented before (or after)
the graphics are presented. In nine out of nine published experimental comparisons,
students performed better on transfer tests with simultaneous rather than successive
presentations, yielding a median effect size of d = 1.22, which is a large effect. Some
possible boundary conditions are that the temporal contiguity principle is diminished
when the material is very simple (Ginns, 2006), when the material is presented in
very short chunks (Mayer et al., 1999; Moreno & Mayer, 1999; Schuler et al., 2012),
and when the lesson is slow-paced or under learner control (Michas & Berry, 2000).
The redundancy principle is that people learn better from narration and graphics
than from narration, graphics, and redundant printed text (Adesope & Nesbit, 2012;
Kalyuga & Sweller, 2014; Mayer & Fiorella, 2014). For example, Figure 18.5a
shows a slide from a lesson on lightning that includes animation and narration,
whereas Figure 18.5b shows a slide that includes animation, narration, and onscreen
text that duplicates the narration. Mayer, Heiser, and Lonn (2001) reported that
students performed better on transfer tests when they received a narrated animation
rather than a narrated animation with redundant onscreen text (d = 0.77). Overall, in
sixteen of sixteen published experiments, people performed better on transfer tests
when redundant onscreen text was excluded rather than included, with a median
effect size of d = 0.86, which is a large effect. Some important boundary conditions
are that the redundancy principle may not apply when no graphics are presented
(Moreno & Mayer, 2002), only a few key words are printed on the screen (Mayer &
Johnson, 2008), or the onscreen text is worded differently than the spoken text (Yue,
Bjork, & Bjork, 2013).
The next three principles in Table 18.1 are aimed at managing essential processing
(i.e., cognitive processing for mentally representing the essential material in working
(a) Animation and Narration: “As the air in this updraft cools, water vapor condenses into water droplets and forms a cloud.”
(b) Animation, Narration, and On-Screen Text: “As the air in this updraft cools, water vapor condenses into water droplets and forms a cloud.”
Figure 18.5 Which instructional method leads to better learning from an online
slideshow?
memory). When the material is complex for the learner, the amount of essential
processing required to mentally represent the material may overload working memory
capacity. In this case, the learner needs to be able to manage his or her processing
capacity in a way that allows for representing the essential material. Three techniques
for accomplishing this goal are breaking the essential material into manageable
parts (i.e., segmenting), learning about the names and characteristics of key elements
before the lesson is presented (i.e., pretraining), and presenting words in spoken form
rather than printed form (i.e., modality).
The segmenting principle calls for breaking a multimedia lesson into manageable
parts (Mayer & Pilegard, 2014). For example, rather than presenting a 2.5-minute
narrated animation on lightning formation as a continuous presentation, suppose we
break it into sixteen segments, each about 10 seconds long with about one sentence,
and allow the learner to click on a CONTINUE key to go to the next segment.
A sample slide is shown in Figure 18.6. This design allows the learner to digest one
step in the process of lightning formation before going on to the next one. Mayer and
Chandler (2001) found that students performed better on transfer tests when they
received segmented rather than continuous lessons on lightning formation, with an
effect size of d = 1.13. The segmenting principle was supported in ten of ten
published experiments, yielding a median effect size of d = 0.79, which is nearly
a large effect. Concerning boundary conditions, the segmenting principle may apply
more strongly for students with low working memory capacity (Lusk et al., 2009)
and for students who are low-achieving (Ayres, 2006).
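The arithmetic behind the segmenting example can be checked with a small Python sketch: a 2.5-minute (150-second) lesson divided into sixteen segments gives about 9.4 seconds per segment, consistent with "about 10 seconds." The sketch assumes equal-length segments, which is a simplification (the original segments were each about one sentence long), and the function name is hypothetical.

```python
def segment_boundaries(total_seconds, n_segments):
    """Split a lesson into equal-length (start, end) segments, in seconds.

    Equal lengths are an assumption for illustration; in practice each
    segment would end at a sentence boundary, where the learner clicks
    CONTINUE to move on.
    """
    length = total_seconds / n_segments
    return [(i * length, (i + 1) * length) for i in range(n_segments)]


# 2.5-minute narrated animation broken into sixteen segments.
bounds = segment_boundaries(150, 16)
segment_length = bounds[0][1] - bounds[0][0]  # 9.375 seconds
```

Each tuple marks a point where playback would pause until the learner chooses to continue.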
The pretraining principle calls for teaching students about the names and characteristics
of key elements before presenting the multimedia lesson (Mayer & Pilegard,
2014). For example, before presenting a narrated animation depicting how a car’s
braking system works, students can be presented with a diagram of the braking system
showing the key parts – e.g., brake pedal, piston, wheel cylinders, and brake shoes – as
Figure 18.6 Do people learn better when a CONTINUE button is added after
each segment?
shown in Figure 18.7. When the learner clicks on a part, such as the piston, the
computer shows that the part is called a piston and tells the learner that the piston can
move forward and back. Mayer, Mathias, and Wetzell (2002) found that students who
received this pretraining before the multimedia lesson performed better on transfer
Figure 18.7 Do people learn better when they receive pretraining in the names
and characteristics of the key elements?
tests than those who received no pretraining (d = 0.86). In thirteen of sixteen published
experiments, pretrained learners performed better on transfer tests than non-pretrained
learners, with a median effect size of d = 0.75, which is in the medium range.
An important boundary condition is that the pretraining principle may apply to low-
knowledge but not high-knowledge learners (Pollock, Chandler, & Sweller, 2002).
The modality principle is that people learn better from multimedia presentations
when the words are spoken rather than printed (Low & Sweller, 2014; Mayer &
Pilegard, 2014). The rationale is that the visual channel may become overloaded by
having to process both graphics and printed words, but processing capacity in the
visual channel can be freed up when the words are spoken and therefore processed in
the verbal channel. For example, Figure 18.8(a) shows a frame from a narrated
animation on lightning whereas Figure 18.8(b) shows a frame from the same lesson
with words printed on the screen as a caption. Mayer and Moreno (1998) found strong
evidence that students performed better on transfer tests when the words were spoken
rather than printed for this fast-paced animation that was presented under system
control (d = 1.49). The modality principle is the most studied of all the multimedia
design principles, with positive effects found in fifty-three of sixty-one published
experiments, yielding a median effect size of d = 0.76, which is in the medium range.
Some of the boundary conditions identified in the literature are that the effect can be
eliminated when the lesson is self-paced (Tabbers, Martens, & van Merrienboer, 2004)
or when the verbal segments are long and complex for learners (Schuler et al., 2012).
The final three principles in Table 18.1 are intended to foster generative processing,
that is, cognitive processing aimed at making sense of the presented material.
Even if cognitive capacity is available, learners may not be motivated to use it to
process the material deeply. Social cues can help motivate learners to engage in
deeper processing because people tend to want to understand what a communication
partner is telling them. Thus, principles based on social cues are intended to make
learners feel as if they are in a conversation with the instructor, that is, they feel that
(a) Animation and Narration (b) Animation, Narration, and On-Screen Text
Figure 18.8 Which instructional method leads to better learning from an online
slideshow?
Table 18.2 Portions of nonpersonalized and personalized text from a narrated animation on how
the human respiratory system works
Nonpersonalized Version
“During inhaling, the diaphragm moves down creating more space for the lungs, air enters through the
nose or mouth, moves down through the throat and bronchial tubes to tiny air sacs in the lungs . . . ”
Personalized Version
“During inhaling, your diaphragm moves down creating more space for your lungs, air enters through
your nose or mouth, moves down through your throat and bronchial tubes to tiny air sacs in your
lungs . . . ”
the instructor is a social partner. This approach yields the newest of the multimedia
design principles, including using conversational language (personalization principle),
using an appealing human voice (voice principle), and using humanlike gestures
(embodiment principle).
The personalization principle is that people learn better from a multimedia lesson
when the words are in conversational style rather than formal style (Ginns, Martin, &
Marsh, 2013; Mayer, 2014b). For example, Table 18.2 shows a portion of the words
from a lesson on how the human respiratory system works presented in third-person
form (e.g., “the lungs”) or in first- and second-person form (e.g., “your lungs”).
Students performed better on a transfer test when the words were in conversational
style (i.e., in first- and second-person form), with an effect size of d = 0.79 (Mayer
et al., 2004). Overall, there were positive effects in fourteen of seventeen published
experiments on personalization (including polite vs. direct wording), yielding
a median effect size of d = 0.79, which is nearly a large effect. Concerning boundary
conditions, the personalization principle works best for less knowledgeable learners
(McLaren, DeLeeuw, & Mayer, 2011a, 2011b; Wang et al., 2008) and lower achieving
learners (Yeung et al., 2009) as well as with shorter lessons (Ginns et al., 2013).
The voice principle is that people learn better from multimedia lessons involving
spoken words when the narrator has an appealing human voice rather than a machine
voice or an unappealing voice (Mayer, 2014b). In five out of six experimental
comparisons, people learned better from narrated animations – such as a 2.5-minute
animated presentation on lightning formation (Mayer, Sobko, & Mautone, 2003) –
when the words were spoken in an appealing human voice rather than in a machine
voice or in an unappealing human voice, yielding a median effect size of d = 0.74.
An important boundary condition is that the positive impact of a human voice can be
overturned by the use of negative social cues such as presenting an onscreen agent
that does not engage in humanlike gesturing (Mayer & DaPra, 2012).
The embodiment principle is that people learn better from multimedia lessons in
which an onscreen agent or instructor uses humanlike gestures (Mayer, 2014b). For
example, Mayer and DaPra (2012) presented students with a narrated slideshow
lesson on how solar cells work in which an onscreen animated pedagogical agent
stood next to the slide (as shown in Figure 18.9) and either displayed humanlike
gestures or did not move during the lesson. Students learned better when the
Figure 18.9 Do people learn better when an onscreen agent uses humanlike
gestures or stands still?
References
Adesope, O. O. & Nesbit, J. C. (2012). Verbal redundancy in multimedia learning
environments: A meta-analysis. Journal of Educational Psychology, 104, 250–263.
Anderson, L. W., Krathwohl, D. R., Airasian, P. W., et al. (2001). A taxonomy for learning,
teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives.
New York: Longman.
Ayres, P. (2006). Impact of reducing intrinsic cognitive load on learning in a mathematical
domain. Applied Cognitive Psychology, 20(3), 287–298.
Ayres, P. & Sweller, J. (2014). The split attention principle in multimedia learning.
In R. E. Mayer (ed.), The Cambridge handbook of multimedia learning, 2nd edn
(pp. 206–226). New York: Cambridge University Press.
Baddeley, A. D. (1992). Working memory. Science, 255, 556–559.
Butcher, K. R. (2014). The multimedia principle. In R. E. Mayer (ed.), The Cambridge
handbook of multimedia learning, 2nd edn (pp. 174–205). New York: Cambridge
University Press.
Clark, R. C. & Mayer, R. E. (2016). e-Learning and the science of instruction, 4th edn.
Hoboken, NJ: Wiley.
Clark, R. E. (2001). Learning from media. Greenwich, CT: Information Age Publishing.
Comenius, J. A. (1887). Orbis pictus. Syracuse, NY: Bardeen.
Cuban, L. (1986). Teachers and machines: The classroom use of technology since 1920.
New York: Teachers College Press.
Gee, J. P. (2003). What video games have to teach us about learning and literacy. New York:
Palgrave Macmillan.
Ginns, P. (2006). Integrating information: A meta-analysis of spatial contiguity and temporal
contiguity effects. Learning and Instruction, 16, 511–525.
Ginns, P., Martin, A. J., & Marsh, H. W. (2013). Designing instructional text for conversational
style: A meta-analysis. Educational Psychology Review, 25, 445–472.
Harskamp, E. G., Mayer, R. E., & Suhre, C. (2007). Does the modality principle for
multimedia learning apply to science classrooms? Learning and Instruction, 17,
465–477.
Hattie, J. (2009). Visible learning. New York: Routledge.
Issa, N., Mayer, R. E., Schuller, S., Wang, E., Shapiro, M. B., & DaRosa, D. A. (2013).
Teaching for understanding in medical classrooms using multimedia design
principles. Medical Education, 47, 388–396.
Issa, N., Schuller, M., Santacaterina, S., Shapiro, M., Wang, M., Mayer, R. E., &
DaRosa, D. A. (2011). Applying multimedia design principles enhances learning
in medical education. Medical Education, 45, 818–826.
Jeung, H., Chandler, P., & Sweller, J. (1997). The role of visual indicators in dual sensory
mode instruction. Educational Psychology, 17, 329–343.
Johnson, C. & Mayer, R. E. (2012). An eye movement analysis of the spatial contiguity effect
in multimedia learning. Journal of Experimental Psychology: Applied, 18, 178–191.
Kalyuga, S. (2014). The expertise reversal principle in multimedia learning. In R. E. Mayer
(ed.), The Cambridge handbook of multimedia learning, 2nd edn (pp. 576–597).
New York: Cambridge University Press.
Kalyuga, S. & Sweller, J. (2014). The redundancy principle in multimedia learning.
In R. E. Mayer (ed.), The Cambridge handbook of multimedia learning, 2nd edn
(pp. 247–262). New York: Cambridge University Press.
Levin, J. R. & Mayer, R. E. (1993). Understanding illustrations in text. In B. K. Britton,
A. Woodworth, & M. Binkley (eds.), Learning from textbooks (pp. 95–113).
Hillsdale, NJ: Lawrence Erlbaum.
Low, R. & Sweller, J. (2014). The modality principle in multimedia learning. In R. E. Mayer
(ed.), The Cambridge handbook of multimedia learning, 2nd edn (pp. 227–246).
New York: Cambridge University Press.
Lusk, D. L., Evans, A. D., Jeffrey, T. R., Palmer, K. R., Wikstrom, C. S., & Doolittle, P. E.
(2009). Multimedia learning and individual differences: Mediating the effects of
working memory capacity with segmentation. British Journal of Educational
Technology, 40(4), 636–651.
Mautone, P. D. & Mayer, R. E. (2001). Signaling as a cognitive guide in multimedia learning.
Journal of Educational Psychology, 93, 377–389.
Mayer, R. E. (2008). Multimedia literacy. In D. J. Leu, J. Coiro, M. Knobel, & C. Lankshear
(eds.), Handbook of research on new literacies (pp. 359–377). Mahwah, NJ:
Lawrence Erlbaum.
(2009). Multimedia learning, 2nd edn. New York: Cambridge University Press.
(ed.) (2014a). The Cambridge handbook of multimedia learning, 2nd edn. New York:
Cambridge University Press.
(2014b). Principles based on social cues in multimedia learning: Personalization, voice,
image, and embodiment principles. In R. E. Mayer (ed.), The Cambridge handbook
of multimedia learning, 2nd edn (pp. 345–368). New York: Cambridge University
Press.
(2014c). Computer games for learning: An evidence-based approach. Cambridge, MA:
MIT Press.
Mayer, R. E. & Anderson, R. B. (1991). Animations need narrations: An experimental test of a
dual-coding hypothesis. Journal of Educational Psychology, 83, 484–490.
Mayer, R. E. & Chandler, P. (2001). When learning is just a click away: Does simple user
interaction foster deeper understanding of multimedia messages? Journal of
Educational Psychology, 93, 390–397.
Mayer, R. E. & DaPra, C. S. (2012). An embodiment effect in computer-based learning with
animated pedagogical agents. Journal of Experimental Psychology: Applied, 18,
239–252.
Mayer, R. E., Fennell, S., Farmer, L., & Campbell, J. (2004). A personalization effect in
multimedia learning: Students learn better when words are in conversational style
rather than formal style. Journal of Educational Psychology, 96, 389–395.
Mayer, R. E. & Fiorella, L. (2014). Principles for reducing extraneous processing in
multimedia learning: Coherence, signaling, redundancy, spatial contiguity, and temporal
Rey, G. D. (2012). A review of research and a meta-analysis of the seductive detail effect.
Educational Research Review, 7, 216–237.
Schank, R. C. (2002). Designing world-class e-learning. New York: McGraw-Hill.
Schuler, A., Scheiter, K., Rummer, R., & Gerjets, P. (2012). Explaining the modality effect in
multimedia learning: Is it due to a lack of temporal contiguity with written text and
pictures? Learning and Instruction, 22, 92–102.
Stull, A. & Mayer, R. E. (2007). Learning by doing versus learning by viewing: Three
experimental comparisons of learner-generated versus author-provided graphic
organizers. Journal of Educational Psychology, 99, 808–820.
Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. New York: Springer.
Tabbers, H. K., Martens, R. L., & van Merrienboer, J. J. G. (2004). Multimedia instructions
and cognitive load theory: Effects of modality and cueing. British Journal of
Educational Psychology, 74, 71–81.
van Gog, T. (2014). The signaling (or cueing) principle in multimedia learning. In R. E. Mayer
(ed.), The Cambridge handbook of multimedia learning, 2nd edn (pp. 263–278).
New York: Cambridge University Press.
Wang, N., Johnson, W. L., Mayer, R. E., Rizzo, P., Shaw, E., & Collins, H. (2008).
The politeness effect: Pedagogical agents and learning outcomes. International
Journal of Human-Computer Studies, 66, 98–112.
Wittrock, M. C. (1989). Generative processes of comprehension. Educational Psychologist,
24, 345–376.
Yeung, A., Schmid, S., George, A. V., & King, M. M. (2009). Using the personalization
hypothesis to design e-learning environments. In M. Gupta-Bhowon, S. Jhaumer-
Laulloo, H. K. L. Wah, & P. Ramasami (eds.), Chemistry education in the ICT age
(pp. 287–300). Berlin: Springer.
Yue, C. L., Bjork, E. L., & Bjork, R. A. (2013). Reducing verbal redundancy in multimedia
learning: An undesired desirable difficulty? Journal of Educational Psychology,
105, 266–277.