Early Reading Instruction
What Science Really Tells Us about How to Teach Reading
Diane McGuinness
A Bradford Book
The MIT Press Cambridge, Massachusetts London, England
© 2004 Diane McGuinness
All rights reserved. No part of this book may be reproduced in any form by
any electronic or mechanical means (including photocopying, recording, or
information storage and retrieval) without permission in writing from the
publisher.
This book was set in Janson and Rotis Semi-sans on 3B2 by Asco Typesetters,
Hong Kong.
Printed and bound in the United States of America.
10 9 8 7 6 5 4 3 2 1
Contents
Preface
Introduction
1 Why English-Speaking Children Can’t Read
Glossary
References
Author Index
Five thousand years ago, Egyptian and Sumerian scholars designed the
first full-fledged writing systems. Though these systems were radically
different in form, the Egyptians marking consonants and whole-word
category clues, and the Sumerians marking syllables, both were complete
and self-contained. Any name, any word, or any word yet to come, could
be immediately assigned the appropriate symbols representing that word’s
phonology.
Schools were established for the sons of the elite—the rulers, priests,
administrators, and wealthy farmers, plus the obviously gifted—and not
much changed in this regard until the nineteenth century, when the
universal-education movement began gathering momentum. Up to this
point, no one kept track of which children were more or less successful in
mastering this extraordinary invention. But with children sorted by age,
and every child in attendance, individual differences in learning rate and
skill were hard to ignore. In most European countries, individual differ-
ences were minor, and when problems did occur, they impacted reading
fluency and reading comprehension. In English-speaking countries, indi-
vidual differences were enormous. Some children were learning to read
quickly but others were not learning to read at all, despite years of teach-
ing. This applied across the board—to decoding, spelling, fluency, and
comprehension. Was this failure due to the teaching method, the nature
of the written code itself, or something inherent in the child?
Answering this question took most of the twentieth century, and now
that the answers are in, there are some huge surprises. Reading and spell-
ing are easy to teach if you know how to do it. Influential theories driving
much of the research over the past 30 years are not supported by the data.
Meanwhile, the volume of research has snowballed to such an extent that
the quantity of studies has become unmanageable. The huge and formi-
be noted that this key does not conform to the International Phonetic
Alphabet (IPA). Instead, it represents the most common spelling in En-
glish for each phoneme. The IPA is a particularly poor fit to the English
spelling system compared to other European alphabets, which are more
directly tied to the Latin sound-symbol code. As such, the IPA is confus-
ing to people unfamiliar with it. For example, the IPA marks the sound ah
with the letter a. In English, this letter typically stands for the sounds /a/
(cat) or /ae/ (table), while ah is marked with the letter o (hot), which is the
symbol for the sound /oe/ in the IPA. This muddle exists for most vowel
spellings.
A glossary of terms is provided at the end of the book.
/sh/ shop sh
/th/ thin th
/th/ then th
/zh/ azure —
/ks/ tax x
/kw/ quit qu
Vowels
Sound As in Basic code spelling
/a/ had a
/e/ bed e
/i/ hit i
/o/ dog o
/aw/ law aw
/u/ but u
/ae/ made a-e
/ee/ see ee
/ie/ time i-e
/oe/ home o-e
/ue/ cute u-e
/oo/ look oo
/oo/ soon oo
/ou/ out ou
/oi/ oil oi
Vowel + r
/er/ her er
/ah/–/er/ far ar
/oe/–/er/ for or
/e/–/er/ hair air
There are nine vowel + r phonemes. All but one (/er/) are diphthongs:
two sounds elided that count as one vowel. Those listed above have special
spellings and need to be specifically taught. The remainder use more
conventional spellings and can be taught as ‘‘two sounds’’: /eer/ /ire/ /ure/
/oor/ /our/ as in deer, fire, cure, poor, our.
Language Development and Learning to Read
Contents
Preface
I Does Phonological Awareness Develop?
1 The Origin of the Theory of Phonological Development
2 Development of Receptive Language in the First Year of
Life
3 Speech Perception After One
4 Links: Auditory Analysis, Speech Production, and
Phonological Awareness
5 Young Children’s Explicit Awareness of Language
6 What is Phoneme Awareness and Does It Matter?
II Expressive Language, Reading, and Academic Skill
7 The Development of Expressive Language
8 The Impact of General Language Skills on Reading and
Academic Success
III Direct Tests of the Language-Reading Relationship
9 An Introduction to Reading Research: Some Pitfalls
10 Auditory and Speech Perception and Reading
11 General Language and Reading: Methodological Issues
12 Vocabulary and Reading
13 Verbal Memory and Reading
14 Syntax and Reading
15 Naming Speed and Reading
16 Slow Readers: How Slow Is Slow?
17 Summary: What Do We Know for Sure?
Introduction
call the many-word problem. This is the fact that, even if a logical or
reasonable instructional method for spelling were introduced into the
classroom, it would never be possible to teach children how to spell every word
in the English language. Because our spelling system is so unpredictable,
what cues or kinds of exposure determine how people learn to spell? The
answers to this question are new and surprising.
Early Reading Instruction is largely an inductive analysis of the histori-
cal evidence and the empirical research on reading instruction. Because
reading methods are complex and the field is controversial, I will be
addressing methodology in more depth than is usually necessary. In part
this is because there is no mechanism in the field for screening out invalid
research. Examples of good and bad research alike can be found in every
journal (even in the top-rated flagship journals) and are very hard to tell
apart. This is just as true of the high-profile studies cited by everyone in
the field, as for research on major breakthroughs that nobody knows
about. This is particularly an issue for chapters 5 through 8, which review
the research screened by the National Reading Panel. To make this
material as accessible as possible, I have provided summaries at the end of
each chapter.
words are mainly based on those units (CVCVCV ), like the languages of
India and Japan. An English word with this structure is potato. Consonants
only (CCCC) are used for languages where consonant sequences carry
the meaning load and vowels indicate changes in grammar, like Semitic
languages. Phonemes (individual consonants and vowels) are used for all
languages with a highly complex syllable structure and no common pho-
nological patterns, like European languages. History shows these phono-
logical units are never mixed in any writing system.
European languages are written in an alphabet, because they cannot
be written any other way. This is a fact, and there is nothing we can do
about it. The evidence reviewed in this book shows that when you follow
the principles by which writing systems are constructed and teach the
English writing system appropriately, 4-year-olds can easily learn to read
in about 10 to 12 weeks. It makes no sense to continue teaching reading
the way we do.
in the sense that it is obvious how they work, making them easy to teach
and learn.
Highly opaque alphabet codes, like English, have multiple spellings
for the same phoneme (be, see, sea, scene, field, deceit, radio, marine, lucky,
key), and multiple decodings for the same letter or digraph (soup, soul,
shout, should, rough, cough, journey). Opaque writing systems can be very
hard to teach unless teachers use a structured method that mitigates these
difficulties. Poor teaching of a difficult code creates enormous confusion
and can lead to reading problems and failure.
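The two kinds of opacity described above can be sketched as a small lookup table. This is only an illustration: the example words come from the lists above, but the spelling label attached to each word (for instance e-e for scene) and the names EE_SPELLINGS, OU_WORDS, spellings_for, and is_transparent are assumptions made for this sketch.

```python
# Example words are from the text; the spelling attached to each word
# is an assumed label for illustration only.
EE_SPELLINGS = {
    "be": "e", "see": "ee", "sea": "ea", "scene": "e-e", "field": "ie",
    "deceit": "ei", "radio": "i", "marine": "i-e", "lucky": "y", "key": "ey",
}

# Words from the text in which the digraph "ou" is decoded in different ways.
OU_WORDS = ["soup", "soul", "shout", "should", "rough", "cough", "journey"]


def spellings_for(word_to_spelling):
    """Return the distinct spellings used across the example words."""
    return sorted(set(word_to_spelling.values()))


def is_transparent(phoneme_to_spellings):
    """True when every phoneme maps to exactly one spelling."""
    return all(len(spellings) == 1 for spellings in phoneme_to_spellings.values())


if __name__ == "__main__":
    ee = spellings_for(EE_SPELLINGS)
    print(len(ee), "spellings for one phoneme:", ee)
    print("transparent?", is_transparent({"/ee/": ee}))
    print("words sharing the digraph 'ou':", OU_WORDS)
```

A transparent code would pass the is_transparent check for every phoneme; the single entry for /ee/ is already enough to fail it.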
If success in learning to read is intimately bound to the form of the
script, no universal laws can be applied to it. In other words, reading is not
a natural aptitude—‘‘a property of the child’’—and children who fail to
learn to read do so mainly because of environmental causes, not biological
causes.
An Opportunity Lost
In the early nineteenth century, Isaac Pitman, a self-taught linguist and
author of the famous shorthand method, hit on a solution for how to teach
our complex alphabet code. Spelling reformers had been clamoring for
changes to our spelling system since the sixteenth century, with no suc-
cess. John Hart (An Orthographie, 1569) and Richard Mulcaster (The First
Part of the Elementarie, 1582) advocated sweeping reforms almost to the
point of scrapping everything and starting over. Mulcaster called for a new
phonetic spelling system—a transparent alphabet. Although nothing like
this ever happened, Mulcaster did manage to eliminate some oddities,
such as the unnecessary letter doublings and e’s at the ends of words
(greate, shoppe). He was responsible for standardizing the letter e as a dia-
critic, a symbol that signals a pronunciation, as in the words came, time,
fume, home.
No true spelling reform of the type that Hart and Mulcaster advo-
cated ever succeeded, despite numerous efforts by highly influential peo-
ple over the centuries. And while Samuel Johnson was able to standardize
spelling in his famous dictionary in 1755, he standardized the spellings for
words, but not the spellings for phonemes.
Pitman’s solution was to adapt his shorthand method for use in be-
ginning reading instruction by setting up what I call an artificial transpar-
the 40 English phonemes. He developed this into a classroom program
called Phonotypy with A. J. Ellis in 1847. An artificial transparent alphabet
levels the playing field for English children vis-à-vis their continental
cousins, and provides the novice reader with critical information about
how alphabet codes work:
units.
All codes are reversible. Reading and spelling (decoding/encoding) are
dren’s language development and cannot think past it. According to this
theory, children gradually develop an awareness of speech sounds in the
order: words, syllables, phonemes—a process alleged to take 7 to 8 years.
This has led to the proliferation of phonological-awareness training
programs that are, at best, a waste of time (see chapters 5 and 6). The theory
that phonological awareness develops from larger to smaller units of
sound, or even that it develops at all, is contradicted by the evidence from
the mainstream research on early language development (see Language
Development and Learning to Read).
We have the knowledge to teach every child to read, write, and spell
at an amazingly high level of skill. So far, this knowledge has not been
made available to educators, legislators, parents—or to many researchers.
The vast quantity of invalid and unreliable research clogging the databases
makes it almost impossible for anyone interested in the field to ferret out
what is accurate and important and what is not. The National Reading
Panel has made an important start on this problem, and the primary goal
of Early Reading Instruction is to try to finish the task.1
1. Readers interested in the early history of spelling reform and reading in-
struction should consult Scragg 1974; Morris 1984; Balmuth 1992; McGuin-
ness 1997c, 1998b.
1 WHY ENGLISH-SPEAKING CHILDREN CAN’T READ
difficulty with the German alphabet code. It turned out their problem was
reading too slowly. But slow is a relative term. How slow is slow?
To find out, Wimmer collaborated with an English researcher
(Wimmer and Goswami 1994) to compare normal 7- and 9-year-olds from
Salzburg and London. The results were startling. The Austrian 7-year-
olds read comparable material as rapidly and fluently as the English 9-
year-olds, while making half as many errors. Yet the Austrian 7-year-olds
had had 1 year of reading instruction, while the English 9-year-olds had
been learning to read for 4 or 5 years. Equal speed and half the errors in
one-quarter of the learning time is an eightfold increase in efficiency!
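The eightfold figure can be checked with a line of arithmetic. The sketch below assumes that "efficiency" means relative speed divided by the product of relative errors and relative learning time; that definition, and the function name efficiency_gain, are assumptions made here to reproduce the calculation, not a measure defined in the studies.

```python
def efficiency_gain(speed_ratio, error_ratio, time_ratio):
    """Relative efficiency under the assumed definition:
    speed / (errors * learning time), each expressed as a ratio
    of the Austrian group to the English group."""
    return speed_ratio / (error_ratio * time_ratio)


if __name__ == "__main__":
    # Equal speed (1.0), half the errors (0.5), one-quarter the time (0.25).
    print(efficiency_gain(speed_ratio=1.0, error_ratio=0.5, time_ratio=0.25))
```

The one-quarter figure corresponds to 1 year of instruction against 4; taking the 5-year end of the range would make the gain larger still.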
Wimmer and his colleagues (Landerl, Wimmer, and Frith 1997) got
the same extraordinary results when they compared their worst readers
(incredibly slow) with English children identified as ‘‘dyslexic’’ (incredibly
inaccurate). The children were asked to read text consisting of nonsense
words. The so-called Austrian slow readers were not only more accurate
than the English ‘‘dyslexics,’’ but they read twice as fast. The average
Austrian ‘‘slow reader’’ would be able to read a 500-word passage in about
10 minutes, misreading only 7 percent of the words. The average English
‘‘dyslexic’’ would read only 260 words in this time, and misread 40 percent
of the words. It seems the expression ‘‘worst reader’’ is relative as well.
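The comparison in this paragraph can be laid out numerically. The figures are the ones quoted above; the "words read correctly" measure and the function names are assumptions added here to make the gap easier to see.

```python
def words_per_minute(words_read, minutes):
    """Average reading rate."""
    return words_read / minutes


def correct_words(words_read, percent_misread):
    """Words read correctly; integer percentages keep the arithmetic exact."""
    return words_read * (100 - percent_misread) / 100


if __name__ == "__main__":
    groups = {
        "Austrian slow readers": (500, 7),   # words in 10 minutes, % misread
        "English dyslexics": (260, 40),
    }
    for name, (words, misread) in groups.items():
        rate = words_per_minute(words, 10)
        print(name, rate, "words/min,", correct_words(words, misread), "correct")
    print("speed ratio:", round(500 / 260, 2))
```

On these figures the Austrian group reads a little under twice as many words in the same time, and roughly three times as many words correctly.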
An even more dramatic study was reported from Italy. Cossu, Rossini,
and Marshall (1993) tested Down’s syndrome children with IQs in the 40s
(100 is average) on three difficult reading tests. They scored around 90
percent correct, breezing through Italian words like sbaliare and funebre.
However, they could not comprehend what they read, and they failed
miserably on tests of phoneme awareness, the skill that is supposed to be
essential to decoding.
‘‘Come, come, John. See me. I can swing. Come and see.’’
Phonics lessons came in late or not at all, and made no sense. This ap-
proach was the platform for ‘‘basal readers’’ (U.S.) or ‘‘reading schemes’’
(U.K.), products of the educational publishing houses. Basal readers
dominated from the 1930s until the late 1970s. In the mid-1960s, a survey
showed that basal readers were used in 95 percent of classrooms in the
United States. Many people still remember Dick and Jane or Janet and
John.
The extreme dullness and repetitiveness of the basal-reader method,
plus other precipitating factors, eventually led to a backlash. Basal readers
were swept away by a third whole-word method that came to be known
as ‘‘whole language’’ (U.S.) or ‘‘real books’’ (U.K.). The theory behind
whole language is that with minimal guidance, children can teach them-
selves to read naturally. They do this by following along as the teacher
reads stories written in natural language, and by reading on their own
while using all their ‘‘cuing systems.’’ These include everything from
guessing words based on context and the illustrations, to sight-word mem-
Nouvelle Eclecticism
In the 1990s, reading researchers and directors of research agencies, sup-
ported by state and national politicians, launched a campaign to rescue
children from whole language, claiming they wanted a return to phonics.
But after nearly a century, no one was quite sure what phonics was.
Instead, what they proposed was not phonics, but a new kind of eclecti-
cism. In the past, eclecticism referred to a teacher’s habit of mixing dif-
ferent approaches and materials in the mistaken belief that children have
different learning styles. This form of eclecticism is individualistic and
haphazard.
‘‘New eclecticism’’ is based on the notion of developmental gradualism,
a consequence of the myth that children become more phonologically
aware as they grow older. Children begin by learning whole words by
sight, then move on to syllables (clapping out beats), then to word families
(words with rhyming endings like fight, might, sight), with the goal of be-
ing eased into an awareness of phonemes, a process taking a year or two,
if it is completed at all. This is not just a passing whim. It is the method
promoted by people in charge of research funding in the United States.
age. By 9 months they can tell the difference between legal and illegal
code, but how to shed 100 years of unsubstantiated beliefs about how to
The Translators
Translators take over once the code has been cracked and come from a
variety of disciplines. Linguists and grammarians must unravel the pho-
nological, grammatical, and structural properties of the language before a
complete translation is possible. When translators work with the vocabu-
lary of a dead language, such as ancient Egyptian, Sumerian, or Baby-
lonian, they must see the same word in different contexts, in a variety of
texts, to gain any real insight into what the word implies. Once vocabulary
and grammar are tentatively worked out, the translator needs a historian’s
talent for framing a synthesis that accurately represents the customs, the
political system, the economy, and the feelings of the people about all
manner of things, including their sense of themselves.
The historical reconstruction of an ancient civilization is critical for
an understanding of how writing systems develop. Without knowing any-
thing about the Sumerian economy (that it was agricultural on a large
scale, with a temple distribution system and individual ownership of land),
or the Sumerian religious practices (that priests and city administrators
played a joint and important role in economic matters), or the Sumerian
legal system (that it was an effective system of justice), it would have been
difficult to discern how and why the writing system developed the way it
did.
Function
A writing system codes spoken language into a permanent form so that it
can transcend time and space. But the most important aspect of a writing
system is its purpose. For one thing, it should make life better. A writing
system that has no effect (being too difficult for most people to learn)
or that makes life worse (generating thousands of bureaucratic forms) is
hardly worth the bother.
Writing systems make it possible to permanently record important
things that are hard to remember, such as rules and laws, and case deci-
sions about breaches of those laws. They can record events of critical
importance to everyone, such as migrations, battles, and other historical
events, as well as disasters like floods, drought, and crop failure. They can
record intentions of good faith, as in a business transaction or in mar-
riage vows. And should disagreements arise, family members, magistrates,
or judges do not have to rely on a person’s word, or on the testimony
of witnesses who may or may not remember or be telling the truth.
Recorded accounts of how land and inheritance disputes were settled in
the Middle Ages, when most parties and witnesses were illiterate, make
fascinating reading (Clanchy 1993). A writing system, plain and simple,
makes civilization work, and without a writing system, it cannot work.
Structure
The texts and documents unearthed from civilizations with the earliest
writing systems in the world, Sumer and Egypt, show that writing systems
originated in the same way for the same reasons. They began as systems
for accounting, inventory control, bills of lading, and invoices. This is why
protowriting (which is not a system at all) has so many symbols that are
stylized pictures or icons. Much more is needed to qualify as a writing
system. A true writing system must represent the entire language, and to
do so it has to meet certain fundamental requirements. Coulmas (1989)
specified three: economy, simplicity, and unequivocality. I would add compre-
hensiveness to the list.
Economy means that the number of symbols used for the system must
and probably would never work. Unfortunately, there is not, and can
A true writing system did not appear until around 3200–3000 b.c.
The most primitive form of this system used pictograms, abstract symbols
standing for words (logographs), and category determiners for nouns.
This changed rapidly due to the nature of the language. Sumerian was
an ‘‘agglutinative’’ language consisting mainly of one-syllable words with
these syllable types: CV, VC, CVC. In this type of language, one-syllable
words are combined (‘‘glued’’) into longer words and phrases. For exam-
ple, the plural was indicated by duplication: ‘‘I see man-man.’’ Because the
Sumerians used the same symbol for the sound of the word no matter what
role it played in the sentence, the writing system essentially functioned
like a syllabary—a sound-based system.
The shift from meaning to sound is not obvious and has misled
scholars down through the years. Proportional counts of pictograms and
logographs made from the vast number of clay tablets recovered from
archaeological digs show that the logograph count has declined over time.
In part, this is because scholars initially believed all early writing systems
were logographic, leading them to see more logograph symbols than were
actually there. In part, it was because the Sumerians shed logographs as
time went by. For example, Falkenstein (1964) reported that among the
total signs dating back to 3000 b.c. (pictograms, logographs, classifiers),
2,000 were logographs. In subsequent estimates, the logograph tally
was much smaller, and it shrank dramatically over the centuries: 1,200
in 3000 b.c., 800 in the period 2700–2350 b.c., and 500 in 2000 b.c.
(Coulmas 1989). Yet Michalowski (1996), a leading Sumerian scholar,
believes that the Sumerians may have had few, if any, logographs almost
from the beginning. He calculated that the entire corpus of symbols (all
types) was around 800 by 3200 b.c., stating that the system was virtually
complete at this time. A writing system with 800 signs is consistent with a
syllabary.
In 2500 b.c., the Semitic Akkadians adopted Sumerian for their ad-
ministrative language, along with the writing system (much as Europeans
borrowed Latin). Over time, it was adapted to the Akkadian language
and became a syllabary of about 300 signs (Cooper 1996). By the reign of
King Sargon I, in 2350 b.c., any use of logographs had virtually ceased
(Civil 1973).
We can trace a similar development in China. Early Chinese writing
was discovered late in the nineteenth century on a set of oracle bones
dating to the Shang dynasty in roughly 1200 b.c. The characters, which
about 1,860. Children learn half the kanji symbols in elementary school
the choice of a phonetic unit for a writing system has been suggested by a
number of authors (Mattingly 1985; Coulmas 1989; Katz and Frost 1992;
McGuinness 1997c), but it has not been explored in much depth. I have
proposed (McGuinness 1997c) that there are four types of sound-based
writing systems, not two. These four basic types derive from the syllable
structure and phonotactics of individual languages. The choice of a sound
unit is based on the principle of least effort—a trade-off between two fac-
tors: economy (the ease with which the sound units can be learned), and
perception (the ease of discrimination, or ‘‘naturalness,’’ of the units to be
adopted). This is assuming the other constraints apply as well.
This represents a departure from conventional wisdom on this issue.
Nevertheless, it is worth exploring because it provides answers to some
quite fundamental puzzles, the first being: If Gelb was so certain that a
syllabary is an inevitable step en route to alphabets, where are the syl-
labaries he refers to? So far, we have met only three (Sumerian, Akkadian,
and Chinese). There are the Babylonian and Assyrian syllabaries (lan-
guages that were dialects of Akkadian), the Hittite syllabary, plus the clas-
sical script for Annamese (Vietnamese) borrowed from China. Seven or so
syllabaries in 5,000 years is not an impressive number. Only one, Annam-
ese, was shed for an alphabet, and this was not made official until 1912. A
more compelling argument against Gelb’s linear evolutionary model is
that it cannot explain the many examples where an alphabet was rejected
in favor of something more syllable-like. If Gelb was correct about writing
systems evolving into higher forms, then evolution could go both forward
and backward in time and so would not count as ‘‘evolution.’’
To understand the implications of this new classification scheme, we
need to look at the structure of the four types of writing systems and how
they were designed. In particular, we are going to look at the critical role
that awareness of the phonetic/phonemic level of language played in how
the writing system was set up. This illustrates the fact that no scholars
could have designed a writing system unless they knew the phoneme cor-
pus of their language, and raises the following question: If they knew this,
why didn’t they always opt for an alphabetic writing system?
netic information. The English words straight and I are both one-syllable
C    V             C    V            C
     a    o             a     o
b    ba   bo       b    bab   bob    b
t    ta   to       b    bat   bot    t
                   t    tat   tot    t
                   t    tab   tob    b
This example is intended to make the point that when your logo-
graphic solution crashes into the ceiling of human memory, you have to
find a better way, and it must be complete. A partial writing system is no
use. The question is, would it be more efficient to begin by assigning
symbols to 35 phonemes as a memory device, then set up a matrix and
work it through, or to do it randomly, designing each syllable symbol as
you went along? We have no clues from Chinese documents as to what
was done, but we do have clues from Sumer in the thousands of clay
tablets found in temples, palaces, and schools dating back to the fourth
millennium b.c. (Kramer 1963, [1956] 1981; Michalowski 1996).
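The matrix approach raised in this paragraph can be sketched in a few lines. The example uses the two consonants and two vowels from the small matrix above; the function name syllable_matrix and the choice to generate CV, VC, and CVC types together are assumptions for illustration, not a reconstruction of what any ancient scribe actually did.

```python
from itertools import product


def syllable_matrix(consonants, vowels):
    """Enumerate every CV, VC, and CVC syllable from a phoneme inventory."""
    cv = [c + v for c, v in product(consonants, vowels)]
    vc = [v + c for v, c in product(vowels, consonants)]
    cvc = [c1 + v + c2 for c1, v, c2 in product(consonants, vowels, consonants)]
    return {"CV": cv, "VC": vc, "CVC": cvc}


if __name__ == "__main__":
    matrix = syllable_matrix(consonants="bt", vowels="ao")
    for syllable_type, syllables in matrix.items():
        print(syllable_type, len(syllables), syllables)
```

Working from the phoneme inventory guarantees that no syllable is overlooked, which is the completeness the paragraph asks for. The number of CVC syllables grows with the square of the consonant count, so the size of the finished syllabary can be estimated before a single symbol is designed.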
Michalowski (1996) reported that digs in ancient Uruk revealed an
abrupt transition, which has been dated to between 3200 and 3000 b.c.
The early phase of protowriting had no particular order or structure. At
the point where writing proper begins, suddenly order appears. Having
observed this material, he wrote: ‘‘The structure and logic of the system
indicate that it was invented as a whole and did not develop gradually’’
(p. 35). Michalowski emphasized the fact that from the beginning there
was a concern for the structured transmission of the system.
Kramer reached a similar conclusion in his analysis of Sumerian
tablets written for and by schoolchildren (about 2000 b.c.). The children
were introduced to the syllable signs in a systematic way: all the CV syl-
lables, all the VC syllables, then the CVC syllables, each memorized
independently. Next, they learned lists of words in semantic categories.
Written symbols were not learned in a random order incidentally as part
of reading meaningful text, and it is highly unlikely they were designed
that way.
These examples show that the Chinese and the Sumerians were just as
likely to have been aware of the phonemic structure of their languages as
not. And if they were aware of it, they could have easily
observed that an alphabet was extremely economical. Nevertheless, they
chose not to use it.
CV Diphone Systems When a language has more than three or four sylla-
ble types, a syllabary will not work for the simple reason that it would
breach the limits of human memory. According to the received wisdom
on this issue, the only other option is an alphabet. But this is not remotely
what happened. The most common sound-based unit adopted for all
later used in commerce by the Greeks. Next come the writing systems
based on the Indic Brahmi script, developed sometime after the fifth cen-
tury b.c. This script led to an amazing number of offspring.1
It is assumed that the Indic Pali script, or one like it, led to the
creation of the two Japanese diphone systems, hiragana and katakana, via
Buddhist missionaries. Other diphone systems include Ethiopian (a
Semitic language), circa fourth century a.d., and the Han’gul system
developed during the Korean writing reform in the fifteenth century.
There is also the writing system of the Cherokee Indians, and a most
interesting diphone hybrid from Persia, discussed in more detail below.
The important message is that alphabets may dominate in Western
cultures, but they do so for linguistic (phonotactic) reasons, not because
they are inherently superior to all other forms.
Evidence on how these diphone systems were set up and designed
provides another surprise. At the time the Brahmi script was created, In-
dian scholars had already designed symbols to represent each phoneme in
their language. These ‘‘alphabet symbols’’ were set up in a fixed order and
grouped by place of articulation, showing their sophisticated knowledge of
phonetics. The same ‘‘alphabetical order’’ was used for dictionaries. There
is speculation that this ‘‘alphabet chart’’ was used for novices to learn to
chant mantras with precise pronunciation. But it is highly likely the al-
phabet chart was used to design the Brahmi diphone script, as shown by
how it was constructed. Each consonant plus the vowel /ah/ (the primary
vowel in Indic languages) was assigned a different symbol. Next, these
symbols were systematically modified for each vowel change, as shown in
figure 2.1.
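The construction principle described here, a base symbol per consonant that is then modified systematically for each vowel, can be illustrated with a modern descendant script. The sketch below uses Devanagari characters as a stand-in, because Devanagari is one of the Brahmi offspring and its characters are widely available. It does not reproduce Brahmi glyphs. The romanized keys, the three consonants chosen, and the function name diphone_table are assumptions for illustration.

```python
# Devanagari is used here only as a convenient stand-in for the
# "base symbol plus systematic modification" design; it is not Brahmi.
BASE_SYMBOLS = {"k": "\u0915", "t": "\u0924", "m": "\u092e"}

# An empty modifier leaves the base symbol, which already carries the
# primary vowel; the other entries are Devanagari vowel signs.
VOWEL_MODIFIERS = {"a": "", "aa": "\u093e", "i": "\u093f", "u": "\u0941"}


def diphone_table(bases, modifiers):
    """Build every consonant-vowel symbol by modifying each base symbol."""
    return {
        c + v: symbol + sign
        for c, symbol in bases.items()
        for v, sign in modifiers.items()
    }


if __name__ == "__main__":
    for name, symbol in diphone_table(BASE_SYMBOLS, VOWEL_MODIFIERS).items():
        print(name, symbol)
```

Only the base symbols and the vowel modifications have to be learned separately; every CV symbol follows from combining them.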
1. The Brahmi offspring include the writing systems for Kusan, Gupta, the
Devanagari script used for Sanskrit, Hindi, and Nepali, plus the scripts for
Siamese, Burmese, Kavi or Sinhalese, Bengali, Assamese, Tibetan, Mongolian,
Kashmiri, Balinese, Madurese, Tamil, Central Indian, Punjabi, and Malaya-
lam, as well as the important Pali scripts designed for the Prakrit languages
associated with Buddhism. These scripts traveled east as Buddhism spread,
giving rise to the diphone systems of Sri Lanka, Burma, Thailand, Cambodia,
and Indonesia.
Judging from the evidence on how these systems were set up, and the
large number of people who adopted and designed them (about 200 of
these scripts have been attested in India alone), it would have been obvious
to ancient scholars that an alphabet could work as a writing system. Yet an
alphabet was never adopted for this purpose.
The Han’gul writing system from Korea is an even clearer example,
because its design is more transparent. First, each vowel and consonant
was assigned a symbol and these were set up in a matrix, 10 vowels across
the top and 14 consonants down the side. At each junction of consonant
and vowel, they needed a symbol for all the CV pairs in the Korean lan-
guage. Instead of designing 140 new symbols like everyone had done
before them, they had a better idea. Rather than wasting the alphabet
symbols, these symbols were fused into pairs, as shown in figure 2.2.
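The 14-by-10 matrix can be reproduced with the way modern Unicode encodes Han'gul syllables, in which each syllable's code point is computed from the positions of its consonant and vowel. This is a sketch of the present-day encoding, not of the fifteenth-century design process. The index lists assume the 14 basic consonants and 10 basic vowels of the modern alphabet, and the names LEAD_INDEX, VOWEL_INDEX, fuse, and cv_matrix are hypothetical.

```python
# Positions of the 14 basic consonants among Unicode's 19 initial consonants.
LEAD_INDEX = [0, 2, 3, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18]

# Positions of the 10 basic vowels among Unicode's 21 medial vowels.
VOWEL_INDEX = [0, 2, 4, 6, 8, 12, 13, 17, 18, 20]

SYLLABLE_BASE = 0xAC00  # first precomposed Han'gul syllable
VOWEL_COUNT = 21        # medial vowels in the Unicode arrangement
TAIL_COUNT = 28         # final-consonant slots, including "none"


def fuse(lead, vowel):
    """Return the precomposed CV syllable for the given component positions."""
    return chr(SYLLABLE_BASE + (lead * VOWEL_COUNT + vowel) * TAIL_COUNT)


def cv_matrix():
    """Rows are consonants, columns are vowels, as in the matrix described."""
    return [[fuse(lead, vowel) for vowel in VOWEL_INDEX] for lead in LEAD_INDEX]


if __name__ == "__main__":
    for row in cv_matrix():
        print(" ".join(row))
```

All 140 cells are built from the same 24 components, so no cell needs a symbol of its own.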
There is one more example from Old Persia, during the reign of
Darius I (522–486 b.c.), the most curious so far. It is written in cuneiform
symbols, no doubt borrowed from the Babylonian script. This is a hybrid
system of 36 symbols. There are 13 consonant-only signs that are used
with 3 vowel signs (an alphabet). In addition, there are 20 different con-
sonant signs that include an inherent vowel and vary according to which
vowel is indicated (CV diphones). Another surprise is that the ‘‘alpha-
betical’’ order of these signs is identical to the one used in India, with
consonants grouped by place of articulation. Our alphabetical order
derives from the West Semitic (Phoenician) system and is quite differ-
ent. Whether this alphabetical order is original to Persia or to India is
unknown, because both writing systems appeared around the same time.
Figure 2.2
Diphone symbols for combinations of vowels and consonants in Han’gul
failed to use it for a writing system. We now have three clear examples of
Alphabets When a language has too many syllable types (English has 16,
not counting plurals), when it lacks a basic diphone structure CVCVCV
or any other simple structure, or when it does not have the ‘‘consonant-
root’’ nature of Semitic languages, it must be written some other way. At
this point we are nearly out of sound-unit options, and the only other way
is down—down one more level below the CV unit, to the individual con-
sonants and vowels (phonemes).
All writing systems based on the phoneme are called alpha-bets after
the first two letter names (alpu, beth) in the Old Phoenician consonantal
script. The Greeks borrowed these symbols in the eighth century b.c. to
set up the first alphabetic writing system. They used the 22 Phoenician
system are transparent to the learner, and that the elements of the system
are mastered. Methods that ask the child to ‘‘guess’’ how the writing sys-
tem works (whole-language) are utterly irresponsible.
Third, no writing system was ever based on the whole word, nor were
whole-word symbols (logographs) ever more than a minute fraction of any
writing system. Languages have too many words. Human memory for
abstract symbols overloads at about 2,000 symbols, and even achieving this
takes many years. A reading method either totally or partially based on
whole-word memorization of sight words will cause the majority of chil-
dren to fail. These facts also refute the notion proposed by some cognitive
psychologists, that people ultimately read all words by sight. It cannot
be done. (For a description of these theories and what they imply, see
chapter 11.)
Fourth, all writing systems are based on one of four specific mean-
ingless phonological units, specific because phonological units are not mixed.
Teaching methods that train children to be aware of the particular sound
unit for the writing system, and only that unit, and that show them how
these sounds are represented by the symbols, will be effective. Teaching a
potpourri of other sound units (words, syllables, syllable segments, word
families) that have nothing to do with the writing system will lead to con-
fusion and failure for many children.
Fifth, writing systems are designed to fit the phonological structure
(phonotactics) of the languages for which they were written. The choice
of the sound-unit basis of the writing system is not arbitrary. Civilizations
adopted an alphabet solution out of necessity and not from choice, be-
cause no other solution would work. The Greeks, for example, already
had a diphone writing system (Linear B), which they used for commerce.
But it was useless for representing the Greek language, with its complex
syllable structure. All Linear B texts used by the Greeks consist of bills of
lading, inventory lists, and invoices. And it is quite obvious, viewing these
texts, that the CV diphone units of Linear B map extremely badly to the
Greek language (see Robinson 1995).
Finally, history shows that alphabets tend to be avoided if possible.
Scholars from different cultures, separated widely in time and place,
designed alphabets for other reasons, but failed to use them for a writing
system. The record shows that a larger, clearly audible, phonological unit
was the preferred solution, especially if the number of these units was well
below the magic 2,000 limit. People are not normally aware of phonemes,
1. Make sure the complete structure of the writing system has been
worked out (or thoroughly understood) before a method of instruction is
developed.
2. Teach the specific sound units that are the basis for the code. (Do not
teach other sound units that have nothing to do with the code.)
3. Teach the arbitrary, abstract symbols that represent these sounds.
These symbols constitute the code.
4. Teach the elements of the system in order from simple to complex.
5. Ensure that the student learns that a writing system is a code and that
codes are reversible.
6. Make sure that encoding (spelling) and decoding (reading) are con-
nected at every level of instruction via looking (visual memory), listening
(auditory memory), and writing (kinesthetic memory).
raphy. At the end of first grade, the children scored 80 percent correct on
this test, a value that remained unchanged through third grade. College
students scored 90 percent correct, failing the words with the most ob-
scure Finnish spellings. In other words, it takes a year or less for Finnish
children to be able to read and spell nearly as well as the average college
student.
As noted in chapter 1, Wimmer (1993) found that the worst readers
in the city of Salzburg, Austria, scored close to 100 percent on a difficult
test of reading accuracy and did nearly as well in spelling. Wimmer and
Landerl (1997) compared Austrian and English children on a spelling test
of English and German words balanced for complexity of spelling pat-
terns. The English children made twice the spelling errors (of all types)
as the Austrian children. More importantly, 90 percent of the errors made
by the Austrian children were legal (phonologically accurate), compared
to only 32 percent for the English children.
Geva and Siegel (2000) reported on 245 Canadian children who were
learning to read and write English and Hebrew at the same time. English
was the first language in most cases. Hebrew is written with a consonant
cuing system where symbols represent consonants, and diacritic marks
are used for vowels when texts are difficult, as for beginning readers. The
consonant symbols and diacritics are nearly 100 percent consistent,
making Hebrew a highly transparent writing system. By the end of first
grade, the children scored 79 percent correct on a Hebrew reading test,
but only 44 percent correct on the English version of the test. Children
did not reach the 80 percent competency level in English until fifth grade,
and by this time they scored 90 percent correct in Hebrew.
Writing systems that evolve over long periods of time tend to be
opaque, lacking consistency between sounds and symbols. This happens
for two reasons. The first has to do with the arbitrary nature of the pro-
cess; the designers are pioneers and there is no prior model of a code or a
writing system. The process proceeds by trial and error, and as insight is
gradually gained, no writing reform occurs to correct early mistakes.
The second reason is historical accident. A country (England) with a
transparent writing system (Anglo-Saxon or Old English) is conquered by
a people who speak a different language and have a different way of writ-
ing (spelling) the same or similar sounds (Norman French). The language
destroys the logic of the writing system. Confusion can start to creep in
Chapter 3
quite early and unravel students long before they get anywhere close to
176 alternative spellings (these spellings would never be taught anyway,
because teachers do not know what they are).
Here is a typical example of a common error. Mrs. Jones is keen to
include phonics principles in her teaching. She works entirely from visual
logic: letter to sound. She teaches the letters k, c, and the digraph ck, but
not at the same time. She says something like: ‘‘The letter see says /k/.’’
(Mrs. Jones is careful not to say ‘‘kuh,’’ which is good.) She writes the
words cat, cup, car, and cow on the board. A week or so later, she says:
‘‘This letter is kay, and it says /k/.’’ She writes the words keg, keep, and kill
on the board. Several weeks later, she says: ‘‘The two letters see-kay say
/k/.’’ She writes the words back, duck, and sick on the board.
If children remember the first lesson (which many will not), they
may think they probably got something mixed up. What they heard a
week ago was not really /k/, but something else they cannot remember.
Other children will assume that the sound /k/ that the letter c says is a
different /k/ from the one the letter k says, and a different /k/ from the
one the letters ck say. Young children make this kind of mistake a lot.
They believe, or actually hear, the same phoneme as sounding different
depending on where it comes in a word. To some children the /b/ in bat
sounds different from the /b/ in cab. Acoustically, they are different. The
/b/ in bat is considerably more bombastic than the /b/ in cab. These subtle
distinctions are known as allophones (variants on the same sound). But
our writing system is based on phonemes, not allophones. If children are
taught the 40 phonemes in the first place, this kind of confusion could
never occur.
Here is what Mrs. Jones should have said: ‘‘Today we’re going to
learn the sound /k/. There’s more than one way to spell this sound. I am
going to teach you some patterns to help you remember when to use each
spelling.’’
Now the message is clear. The children will not think that they are
losing their minds, or that they are too stupid to learn.
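The difference between the two lessons can be sketched as a change in how the same material is indexed. The snippet below is a minimal illustration using only the words from Mrs. Jones's lessons; the names `LESSONS` and `group_by_sound` are hypothetical. It regroups her three letter-first lessons under the single sound they share.

```python
from collections import defaultdict

# Mrs. Jones's lessons in the order taught: (spelling, sound, example words).
LESSONS = [
    ("c", "/k/", ["cat", "cup", "car", "cow"]),
    ("k", "/k/", ["keg", "keep", "kill"]),
    ("ck", "/k/", ["back", "duck", "sick"]),
]


def group_by_sound(lessons):
    """Reindex letter-first lessons so each sound lists all its spellings."""
    by_sound = defaultdict(dict)
    for spelling, sound, words in lessons:
        by_sound[sound][spelling] = words
    return dict(by_sound)


SOUND_TO_SPELLINGS = group_by_sound(LESSONS)

for sound, spellings in SOUND_TO_SPELLINGS.items():
    print(sound, "can be spelled:", ", ".join(spellings))
```

Taught letter-first, the child meets three separate facts weeks apart. Indexed by sound, there is one entry with three spellings, which is the message of the revised lesson.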
Given the fact that the English spelling code is highly complex, one
would imagine that considerable effort has been expended to work out its
structure by linguists, curriculum designers, and researchers studying how
children learn to read and spell. This has not happened. The nature and
A Tower of Babble
Confusion in the field begins with basic terminology like orthography,
spelling rules, and regular spelling. The word orthography appears in almost
every research report on spelling and in much of the literature on reading
as well. Orthography means ‘‘standardized spelling,’’ ortho meaning uni-
form or standard, and graphy, written signs or symbols. We have a stan-
dardized spelling system thanks to Samuel Johnson, but knowing correct
spelling tells us nothing about the structure of the spelling code. Yet
researchers frequently confuse the word orthography with this structure
and with ‘‘spelling rules.’’ Many believe that ‘‘orthographic rules’’ govern
why and how words are spelled in particular ways, despite the fact that no
one is quite certain what these rules are. Yet it is precisely these ‘‘rules’’
that children are supposed to grasp intuitively and internalize as they work
their way through pages of print and spelling tests over the years. Here is
a summary of the terminology problem. When the basic terminology of
a discipline is misused, or never properly defined, the researchers do not
understand what they are studying.
Rules
Clymer (1983) wrote an insightful and entertaining paper on spelling
rules, or as he put it, on ‘‘spelling generalizations.’’ In his words, ‘‘We
were careful not to call the generalizations ‘rules,’ for all our statements
had a number of exceptions. As the class finally formulated a generaliza-
tion regarding the relationships of letters, letter position, and sounds, such
defensive phrasing as ‘most of the time,’ ‘usually,’ and ‘often’ appeared as
protective measures’’ (p. 113).
Regular Spelling
The expression ‘‘regular spelling’’ is used all the time in publications on
reading, as well as in descriptions of reading and spelling tests. The casual
use of this expression, and the fact that it is never defined, implies that
everyone is supposed to know what ‘‘regular spelling’’ means. Some authors
refer to ‘‘regular grapheme-phoneme spellings,’’ others to ‘‘grapheme-
Orthography
Wagner and Barker (1994) unearthed 11 definitions of orthography in
publications by leading reading researchers. Each definition was different.
Three were circular, using the word orthographic or orthography to define
orthography. According to Szeszulski and Manis (1990, 182), for example,
‘‘Orthographic coding allows direct access to a mental lexicon for familiar
words based on their unique orthography.’’ Other authors stressed the
function and structure of the spelling code, the ‘‘general attributes of the
writing system,’’ which included things like ‘‘structural redundancies’’
(Vellutino, Scanlon, and Tanzman 1994) or similar structural features
(Leslie and Thimke 1986; Jordan 1986). Some incorporated phonological
processing into the definition (Foorman and Liberman 1989; Ehri 1980;
Goswami 1990; Olson et al. 1994), while others took pains to exclude it,
stressing that orthography was different from ‘‘phonological mediation’’
(Stanovich and West 1989). Perfetti (1984, 47) came closest to the real
meaning, defining orthographic ability as ‘‘the knowledge a reader has about
permissible letter patterns.’’
1. There are not enough letters in the alphabet for the 40 phonemes in
the English language. To solve this problem, the Anglo-Saxons followed
the Romans’ lead by reusing letters in combination (digraphs) to stand for
a single phoneme (sh in ship).
2. There are only 6 vowel letters for approximately 23 English vowel
sounds (15 vowels, and 8 vowel + r vowels). The multiplicity of vowel
digraphs and phonograms (igh in high) makes vowels particularly difficult
to read and spell.
3. There are multiple ways to spell the same sound: spelling alternatives.
4. There are multiple ways to ‘‘read’’ the same letter(s): code overlaps.
5. The connection between spelling alternatives and code overlaps is not
straightforward. This creates the false appearance of two independent
letters, both small and great . . . then the Italics . . . then the sounds of
the vowels; not pronouncing the double letters a and u separately, but
only the sound that those letters united express . . . then the double letters.
All this a child should know before he leaves the Alphabet and begins
to spell’’ (Webster 1783, 28). The speller consists mainly of word lists
organized by syllable length (up to seven syllables) and by types of suffixes.
The only phonological structure imposed on these lists was the use of
word families (rhyming endings). There was no attempt to provide the
common patterns of sound-to-symbol relationships in our spelling code.
No change to this basic format occurred in the six editions of the speller
over 100 years. (For a thorough discussion of Webster’s analysis of the
problem, see McGuinness 1997c, 1998b.)
Venezky
Since that time, surprisingly few attempts to assess the structural elements
and limits of the code have been made. Venezky (1970, 1995, 1999), also
using a letter-to-sound approach, analyzed the incidence and variety of
code overlaps (point 4 on the list above). As he put it, ‘‘Orthography con-
cerns letters and spellings, the representation of speech in writing. That’s
not exactly what this book is about, however. Here only one direction in
the speech-writing relationship—that from writing to speech—is stressed’’
(Venezky 1999, 3).
Venezky set himself the task of discovering all possible ways a partic-
ular letter or letter combination can be decoded, using a corpus of 20,000
words. For example, he identified 17 different ways to decode the letter o.
There are 48 ways to decode the five (single) vowel letters. However,
Venezky tends to give equal weight to all options, which only serves to
highlight the number of exceptions. He did not write, for example: ‘‘Most
of the time, the letter o is decoded three ways—/o/, /oe/, and /u/, as in
hot, told, among.’’ Instead, common and weird spellings alike are set forth
(sapphire, catarrh), and while these curiosities are interesting, they lead the
reader to believe that the English spelling system is beyond redemption.
Venezky did not work out the probability structure of these code
overlaps, but he did offer two classification systems. In the first system, letters
were categorized functionally as relational units, markers, or silent. Rela-
tional units signal a particular decoding (a phoneme); markers refer to let-
ters used as diacritics; silent letters serve no function. However, letters are
not confined to just one category (i.e., are not mutually exclusive), so it is
From this exercise we learn that the most common initial consonant
and ‘do you know a word in which it has a different sound?’ ’’ (p. 232,
Some vowel sounds were counted twice: /er/ (her) and /ur/ (fur); /o/
(soft) and /o/ (odd). There were numerous errors in classifying vowel
spelling alternatives. The most serious stemmed from misunderstanding
the role of e as a diacritic marker for consonants in words like juice, dense,
live, siege, judge, and soothe. The e was erroneously coded with the vowel,
producing several nonexistent spelling alternatives like oo-e (choose) and
ui-e (juice), plus hundreds of misclassified words where the e works with
the consonant but was coded as working with the preceding vowel: e-e
(license), o-e (dodge), and i-e (massive).
There are 23 consonants in English (25 if one includes the letter
symbols for consonant clusters: x /ks/ and qu /kw/). Hanna et al. listed 31.
These included /ks/ and /kw/, plus silent h (honest) and several phonemes
that do not exist, marked with a glottal stop /'l/, /'m/, and /'n/, as in table,
chasm, and pardon. These are two-syllable words. Every syllable, by defi-
nition, must have a vowel, not a glottal stop. These words are pronounced
ta-bəl, kazzəm, and pardən, with a schwa vowel in the second syllable.
Hanna et al. uncovered a total of 174 spellings for 52 phonemes (93
for consonants and 81 for vowels), including the spellings for the Latin
and Greek layers of the language. The spelling alternatives were classified
in turn according to whether they appeared in a stressed or unstressed
syllable. However, English spelling is largely unrelated to syllable stress.
The schwa vowel always appears in an unstressed syllable, but knowing
this does not tell you which of its six spellings to use. Next, Hanna and
colleagues classified multisyllable words according to whether a spelling
appeared in the initial, medial, or final syllable, another exercise in futility.
Spelling patterns are affected by phoneme position within a syllable, not
between syllables in multisyllable words. All this greatly increased the level
of complexity of their results.
There were errors in data entry. People were trained to classify words
by phoneme and by spelling alternative. Initially, there was a high success
rate with a trial corpus of 565 words. This level of accuracy was not
count as one vowel: /ou/ = ah-oo (out); /oi/ = oh-ee (oil). Vowel + r vowels are
most affected by dialect, and in some dialects the /r/ portion is not sounded.
There is considerable disagreement among linguists as to how many vowel + r
vowels there are.
McGuinness
The most recent attempt to classify the spelling code is my own
(McGuinness 1992, 1997a, 1997c, 1998a, 1998b). This classification is
based on a strict sound-to-print orientation like that of Hanna et al. But
the goals and the procedure were different, in that the purpose of my
classification was similar to Webster’s—to systematize and illuminate
the spelling code so that it could be taught. The process began with the
phonotactic structure of the English language (legal phoneme sequences
lable constituted a real word, or was part of a real word, its spelling was
recorded. Based on this analysis, it was estimated that there are over
55,000 legal English syllables. Contrast this to the 1,277 legal syllables in
Chinese, and you have the reason our writing system is an alphabet and
not a syllabary.
Basic Code The first step in any classification process is to establish limits
or endpoints. Endpoints create a frame or boundary. This is essential for
codes, because they must have a pivot point, around which the code can
reverse. This endpoint or pivot is the finite number of phonemes in the
language. Once the student knows 40 phonemes, there are no more to
learn.
The next step is to set up a basic code (an artificial, transparent
alphabet) using the most probable spelling for each phoneme, as defined
in the section ‘‘Regular Spelling’’ above. A basic code can easily be taught
to 4- or 5-year-olds in a relatively short period of time (Johnston and
Watson 1997, forthcoming; also see chapter 5, this volume), though this
should not be mistaken for the complete code. A basic code makes it pos-
sible to read and write a large number of common one- and two-syllable
words. It is transparently reversible, so children can see and experience the
logic of a writing system. The idea of a basic code is not new (Ellis 1870;
Dale 1898), and it is common to several phonics programs today, particu-
larly in the United Kingdom. But nearly all programs stop here, or at best,
teach only a fraction of the remainder of the code.
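What "transparently reversible" means can be shown with a toy model. The sketch below is an assumption-laden illustration, not the basic code used in any program: the `BASIC_CODE` table is a hypothetical fragment with one spelling per phoneme, and the function names are invented. Encoding (spelling) maps sounds to symbols; decoding (reading) runs the same table backward.

```python
# Hypothetical fragment of a basic code: exactly one spelling per phoneme.
BASIC_CODE = {
    "sh": "sh",  # a digraph standing for a single phoneme, as in "ship"
    "s": "s",
    "b": "b",
    "p": "p",
    "t": "t",
    "a": "a",
    "i": "i",
}


def encode(phonemes):
    """Spell a word: replace each phoneme with its one basic-code spelling."""
    return "".join(BASIC_CODE[p] for p in phonemes)


def decode(word):
    """Read a word: match the longest known spelling at each position."""
    spelling_to_phoneme = {s: p for p, s in BASIC_CODE.items()}
    longest = max(len(s) for s in spelling_to_phoneme)
    phonemes, i = [], 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in spelling_to_phoneme:
                phonemes.append(spelling_to_phoneme[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no basic-code spelling at position {i} of {word!r}")
    return phonemes


print(encode(["sh", "i", "p"]))
print(decode("ship"))
```

Because each phoneme has one spelling and each spelling one phoneme, the table inverts cleanly. Once a phoneme acquires several spellings, as in the advanced code, encoding requires a choice that this toy table cannot make.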
The Advanced Spelling Code The major hurdle in our writing system is
mastering the multiple spellings for each phoneme. This is the reason
English-speaking children have so much difficulty learning to read and
spell. It is the ‘‘advanced code’’ that causes the major problems for poor
readers, and teaching this turns out to be far more important during
remediation than training phoneme-awareness skills (C. McGuinness, D.
McGuinness, and G. McGuinness 1996).
Because of its complexity, the advanced code itself needs a classifi-
cation scheme with limits and boundaries. All classification systems are
somewhat arbitrary, and whether this one is best for the purpose remains
an empirical question. The first hurdle is to find out which spelling alter-
this strategy can harden into a habit that can be difficult to break
(McGuinness 1997b).
For these reasons, the sight-word category was reserved for common
words where one or more phonemes have a unique spelling that is hard
to decode without direct instruction. There are almost no words where
every phoneme has an unpredictable spelling. By this criterion, there are
remarkably few true sight words. The following sight words and special-
group words did not fit a major spelling category in a large corpus of words
of English/French origin. There are approximately 100 sight words:
Final /k/—c or ch arc, tic, ache, stomach Group: /k/—lk baulk, caulk,
chalk, stalk, talk, walk
/t/—bt Group: debt, doubt, subtle
Final /th/—th smooth (final voiced /th/ is usually spelled the, as in breathe,
clothe)
Final /v/—f of
Initial h is not sounded: honest, honor, hour
Initial /h/—wh who, whom, whose, whole
h hail
hale halo
haste
hasty
hate hatred hay
haze hey
i inflate
j jade jail jay
l label
labor
lace ladle
lady
laid
lain
lake
lame
lane
late
lathe lazy lay
m mace
made maid
mail
maim
main
maize major
make
male
mane manger
maple
mate may
maze
n nail
name
nape nasal
native
nature
naval
navel
navy neighbor
o obey
Source: McGuinness 1997a.
chips away at the complexity, by accounting for when and where specific
The suffix -sion usually attaches to word stems ending in /s/: access,
compress, concuss, confess, convulse, depress, digress, discuss, and so on, though
other word stems take this spelling as well: admit/admission, ascend/
ascension.
Memorize the words that use the -tian, -cion, -cean, and -shion
spellings.
What is more, -tion and -sion get around. They move as a unit, another
reason to teach them as a unit. They pop up as spellings for ‘‘zhun’’
(equation, vision), and for ‘‘chun’’ (digestion, question, suggestion). In all, there
are 38 Latin and 2 Old French suffix spellings that need to be taught. This
brings the total of spelling alternatives to 172, plus 4 extra ‘‘Greek’’ con-
sonant spellings at this level, a total of 176. This is surprisingly close to the
tally of 174 reached by Hanna et al., but for quite different reasons.
Teaching these Latin suffixes requires a different instructional ap-
proach, and keeping the suffixes intact has a number of spin-offs. One is
that it makes long, scary words like advantageous and unconscious easy to
read and spell. Knowing that the formidable -geous or -scious spellings
are merely ‘‘jus’’ or ‘‘shus’’ in disguise makes them far less daunting. Once
these suffixes are demystified, they can be identified first, making the rest
of the word easy to decode and spell: ad-van-ta-/geous, un-con/scious.
The front ends of Latin-derived words are remarkably well behaved and
usually spelled in basic code, or with a highly probable spelling alternative.
A second spin-off is that suffixes attach to root words in predictable
ways. When -tion attaches to a root word, the most common form
involves adding the letter a to the root: inform-information, limit-limitation.
When a word stem ends in e, the e is dropped: agitate-agitation,
create-creation. A large family of -ate words follows this pattern, another reason to
group words when teaching these suffixes.
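The two attachment patterns just described are regular enough to state as a rule. The function below is a minimal sketch covering only those two patterns and the four example words given here; `attach_tion` is a hypothetical name, and real English has many stems this rule would get wrong.

```python
def attach_tion(stem: str) -> str:
    """Attach the -tion suffix using the two patterns described in the text.

    Stems ending in -ate drop the final e before the suffix;
    other stems take a linking letter a.
    """
    if stem.endswith("ate"):
        return stem[:-1] + "ion"   # agitate -> agitat + ion
    return stem + "ation"          # inform -> inform + a + tion


for stem in ["inform", "limit", "agitate", "create"]:
    print(stem, "->", attach_tion(stem))
```

Grouping words by which branch of the rule they follow is the programmatic counterpart of teaching the -ate family together.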
Greek words came into the language via philosophy, medicine, and
science, originally in Greek, and later as transliterations with special
‘‘Greek’’ spellings. If one steers clear of specialist or technical words,
the Greek invasion is remarkably less intrusive than people believe. Very
few common words use these spellings, and only eight of these spellings
appear in familiar words. These words can be listed on a single page. The
spellings include ch for /k/ (chorus), y for medial /i/ (myth) or /ie/ (cycle),
ph for /f/ (dolphin), and rarer spellings like pn for /n/ (pneumatic), rh for
Level 0: /b/, /d/, /t/, /m/, /n/, /h/, /w/, /a/, /e/, and /i/
Level 1: /f/, /l/, /g/, /k/, /s/, /t/, /p/, /o/, and /u/
Level 1 words (see above) include three spellings for the sound /k/: c,
k, and ck, as seen in the words club, ask, and back. Do the children learn
these as three spellings for the same sound, or as three different sounds?
At level 2, children learn to decode words with the vowel spellings a-e
(lane), i-e (fine), and o-e (tone). At level 4 they learn alternative spellings
for two of these vowels (/ae/ and /oe/): ai (hail) and ay (day); oa (soap) and
ow (snow). The ow spelling is also a code overlap for the phoneme /ou/
(cow), so it is not clear which sound is meant. The question is, do children
learn the level 4 vowels as new sounds, or do they learn that ai, ay, oa, and
ow are alternative spellings for two sounds they learned at level 2? And if
Lesson 1 The ancient scholars who designed the first writing systems be-
gan by using the same logic people had used to set up accounting systems:
one symbol for each word. This attempt failed quickly, irrevocably, and
universally. Scholars were forced to abandon the word for a sound unit
below the level of the word due to the extreme limitations of human
memory for mastering sound-symbol pairs. The average person has an
upper memory limit of about 2,000 of these pairs, no matter which sound
Chapter 4
unit is chosen. This is an ultimate limit, a memory ceiling, which does not
improve with further training. Thus, this type of paired-associate learning
obeys the law of diminishing returns.
We have abundant evidence, dating back over 5,000 years, that a
whole-word, meaning-based writing system does not work, never did
work, and never will work.
The evidence from the NRP report provides incontrovertible support
for this conclusion. Whole-word teaching methods lead to consistently
lower reading test scores than methods that emphasize phoneme-
grapheme correspondences.
There is another problem with whole-word methods: they are highly
misleading. Children and adults alike are strongly biased in favor of
linguistic meaning. A whole-word (sight-word) reading method is very
appealing to children, especially because memorizing letter sequences and
‘‘word shapes’’ is quite easy early on. This gives children the false im-
pression that they are learning to read. But it is just a matter of time be-
fore this strategy begins to implode. Whole-word memorization starts to
fail toward the end of first or second grade, depending on the children’s
vocabulary and visual-memory skills, and unless they figure out a better
strategy, their reading will not improve (McGuinness 1997b).
Lesson 2 Only four types of sound units have been adopted for the writ-
ing systems of the world: the syllable, the CV diphone, consonants only,
and the phoneme. Which unit is chosen depends on the phonotactic or
phonetic structure of the language. These sound units are never mixed. If
more than one unit were adopted, the writing system would be
highly ambiguous and extremely difficult to learn.1
A reading method must teach the sound for which the writing system
was designed, and no other unit. This rules out ‘‘eclectic’’ or ‘‘balanced’’
reading methods that teach whole words, syllables, syllable fragments like
rhyming endings, and phonemes. This is tantamount to teaching four
1. Japan is an exception since they added the Roman alphabet to their two CV
diphone scripts.
Lesson 3 Ancient scholars avoided adopting the phoneme as the basis for
a writing system. Yet there is abundant evidence they were well aware of
the phonemic structure of their language and used this to set up the writ-
ing system and to design dictionaries. The fact that phonemes are harder
to isolate or segment than larger phonological units appears to be the
primary reason, perhaps the only reason, why every civilization today does
not have an alphabetic writing system. For this reason, it makes sense to
teach children to segment (and blend) phonemes if they have an alpha-
betic writing system. A method that includes this type of instruction ought
to be more effective than one that does not. This too is confirmed by the
NRP report. Reading methods that include phoneme-analysis tasks are far
more successful than methods that do not.
Here is the message so far: If you have an alphabetic writing system, you
must teach an alphabetic writing system. There is no use pretending you have
something else.
Lesson 4 The English alphabet code is highly opaque. There are two
ways to mitigate this problem and help children manage this complexity.
The first is to ensure that children understand the direction in which the
code is written, from each sound in speech to its spelling. For an opaque
writing system to function as a code, it must be anchored in the finite
number of sounds of the language and not in the letters or letter combi-
nations of the spelling patterns. Unless this is done, the code nature of the
writing system is obscured, and the code cannot reverse. A code that can-
not reverse will not function as a code.
By contrast, transparent codes are relatively easy to learn and to teach.
The second way to help children master an opaque alphabet code is to set
up a temporary ‘‘artificial transparent alphabet’’ or basic code. This reveals
the nature or logic of an alphabetic writing system, making it ‘‘transpar-
ent’’ or accessible to a child. It also provides a platform, a foundation,
from which the code can expand, and spelling alternatives can be pegged
onto the system without changing the logic. Reading programs based on
these two principles ought to work better than programs that are not. So
far, no analysis, including the NRP review, has focused on these possibil-
ities: a sound-to-print orientation, and teaching via an artificial transpar-
ent alphabet.
We begin this analysis here, and it continues in the following chapter.
It is of considerable interest to follow the history of these new ideas, and
track the programs that meet these guidelines from the nineteenth century
to the present time. This way we can evaluate how a prototype reading
program based on these principles fares in well-conducted research.
Flesch in Why Johnny Can’t Read (Flesch [1955] 1985). It was later revised
children, the classroom, the teacher, the parents, and so forth. Thus, it
writing as entirely separate processes. This meant that for most beginning
A Program Analysis
Chall did an analysis of program content and sequence, plus an in-depth
account of three programs. The basal-reader programs as a group featured
very few words in their stories and ‘‘readers.’’ Neither spelling regularity
nor word length was considered important. Letter-sound correspondences
were introduced late, and children learned slowly, gradually being exposed
to the word from whole to part, a method known as analytic or intrinsic
phonics. These lessons do not begin until around second grade and con-
tinue for several years. Details on two basal-reader programs are pre-
sented in the following section.
The programs described as synthetic phonics were much more variable.
Chall described these methods as having reading vocabularies based on
spelling regularity and word length. Early words were simple and short,
gradually increasing in complexity. The emphasis was on mastering letter-
sound correspondences, and new words were introduced rapidly. To this
end, children were taught to blend and segment sounds in words and to
connect sounds to letters.
A third group, the linguistic programs, had several features in com-
mon. First, they were written by linguists. There was little emphasis on
meaning for obvious reasons, and the focus was on the alphabet, especially
letter names. The reading vocabulary consisted of predictable, short words
set out in lists that abounded with word families: cat, rat, sat, mat. Chil-
barrage, Chall could not find a single statement about the fact that the
Classroom Observations
The third phase of Chall’s investigation included visits to over 300 class-
rooms in kindergarten through third grade. The observations consisted of
A Research Summary
The last component of Chall’s review was an analysis of classroom re-
search on reading methods. All the comparison studies between look-say
and phonics were published between 1912 and 1940, and will not be dis-
cussed further. The findings from the studies were summarized in a series
of tables. The method that produced the best outcome on tests of word
recognition, comprehension, spelling, and so on was indicated by its
initials. Thus, if look-say did ‘‘better’’ on a particular test, an LS was
reported. (Synthetic phonics = SP; intrinsic or ‘‘basal’’ phonics = IP.)
Words and phrases like better, higher, and ‘‘had an advantage’’ were used
to describe these outcomes. Better was not defined and no numerical data
were provided.
What kind of data did Chall rely on? The majority of the studies
reported outcomes in average scores. It appears that no statistical analysis
of the data occurred before the 1960s, as shown by the fact that ‘‘no dif-
ference’’ was indicated by E (for equal) in all studies prior to the 1960s,
but as NS (not significant) from the 1960s on. The tables are peppered
with SP’s, showing a huge ‘‘advantage’’ for synthetic phonics. In the
comparisons listed from first through sixth grade, SP was ‘‘better’’ 68
times, IP only 11, and there were 34 draws (29 E’s and 5 NS’s), giving the
impression that synthetic phonics was the overwhelming winner. This is
very misleading.
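To see what these tallies do and do not convey, the counts reported above can be turned into proportions. The short Python sketch below is an illustration added here, not part of Chall's analysis, and its variable names are assumptions. It shows only how often each method came out ahead; it carries no information about the size of any difference, which is the heart of the problem.

```python
# Tallies as reported in the text for first through sixth grade.
tallies = {
    "SP better": 68,
    "IP better": 11,
    "E (equal)": 29,
    "NS (not significant)": 5,
}

total = sum(tallies.values())

# Share of all comparisons falling in each category, in percent.
shares = {label: round(100 * count / total, 1) for label, count in tallies.items()}

for label, pct in shares.items():
    print(f"{label}: {tallies[label]} of {total} comparisons ({pct}%)")
```

A count of ‘‘wins’’ treats a one-point difference and a twenty-point difference as the same event, so a large share of SP entries cannot by itself establish a large advantage.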
The absence of numerical data, the reliance on average scores to de-
termine better, and the failure to define what better means, make it impos-
sible to draw any conclusions from this review. Chall leaves us with a basic
contradiction. If synthetic phonics is vastly superior, as her tables seemed
to show, then the quality of teaching is far less important than the
method. In fact, teaching skill pales into insignificance compared to the
overwhelming superiority of synthetic phonics according to the tables. Yet
Chall’s classroom observations indicated that the teacher’s ability can
override the method, at least in making the lesson exciting and stimulat-
ing. On the one hand, the message from the classroom is that the teacher
matters as much as or more than the method. On the other hand, Chall’s
presentation of the research seemed to show that the method matters far
Table 4.1
Total N 9,141
Study. Refers to the Study as a whole, including all Methods and Projects.
Method. The particular Method being compared to basal-reader in-
struction. Five different Methods were investigated.
Data Collection
Baseline Measures. Demographic data were collected on the children, the
teachers, the community, and the school. For the children, these were age,
sex, months of preschool experience, and days absent. Teacher variables
included sex, age, degrees, certification, years of experience, marital status,
number of children, attitude toward teaching reading, days absent, and
supervisor-rated teacher effectiveness. Community-/school-based infor-
mation included median education of adults, median income, population,
type of community (urban, rural, and so on), classroom size, length of
school day/year, per-pupil costs, and so forth.
These data can be dismissed rather quickly. None of the community
and school variables were more than modestly correlated with the chil-
dren’s reading test scores. ‘‘Teacher experience’’ was the only marginally
relevant teacher variable (r = .30). Sex was found to be a strong predictor,
and for this reason, it was included as a variable in all further statistical
analyses.
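A correlation is easier to judge once it is squared, because r squared gives the proportion of variance that two measures share. The Python sketch below is an added illustration (the function name is an assumption, not something from the report). It suggests why r = .30 counts as only marginally relevant: it corresponds to roughly 9 percent of shared variance. The values .50 and .86 are included for comparison.

```python
def variance_explained(r):
    """Percent of variance shared by two measures whose correlation is r."""
    return 100 * r ** 2

for r in (0.30, 0.50, 0.86):
    print(f"r = {r:.2f} -> about {variance_explained(r):.0f}% of variance")
```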
The children took a battery of tests thought to be predictive of sub-
sequent reading skills. These included auditory- and visual-discrimination
tasks, tests of letter-name knowledge, reading readiness, and IQ. When
this testing was completed at each school, the 140-day clock started
running.
Outcome Measures. At the end of the 140 days, children were tested once
more on various reading tests. They were given the five subtests of the
Stanford Reading Test, which had just been renormed. This is a group-
administered test. The subtests are word reading (word recognition), para-
graph meaning (comprehension), vocabulary (receptive vocabulary), spelling
(spelling dictation), and word-study skills (tests of auditory perception as
well as decoding skill).
In addition to the Stanford Reading Test, four individually adminis-
tered tests were given to a subsample of children, randomly selected from
every classroom. The total number of children in this sample was 1,330.
The tests were the Gilmore Tests of Accuracy and Rate, the Fry Word
List (decodable, regularly spelled words), and the Gates Word List (based
Correlational Analyses
The first question was whether scores on the baseline tests correlated with
(predicted) scores on the Stanford Reading Test seven months later.
There were complete data on 8,500 children. Correlations were computed
separately for each Method, and also for Basal-reader Classrooms com-
bined across all Projects and Methods.
from .45 to .62), all other correlations were very high indeed, ranging
from .61 to .86. Clearly, the group tests and the individual tests were
measuring the same skills. This shows that both the individual data and
the group data were valid and reliable measures of reading skill.
A particularly important result was the almost-perfect correlation
(.86) between the Fry Word test (a test of decoding regularly spelled
words) and the Gates Word test (a test of visual recognition of mainly
irregularly spelled words or sight words). This was surprising, because
people assumed, as many still do, that decoding and sight-word memori-
zation require different skills. Children trained to memorize sight words
(basal-reader groups) were expected to be good on the Gates and worse
on the Fry. Conversely, children taught to decode regularly spelled words
(phonics only/few sight words) were expected to do the opposite. These
patterns would produce low to moderate correlations. Instead, perfor-
mance on one test almost perfectly predicted performance on the other.
There is only one interpretation: children who are good decoders are
good decoders, no matter how words are spelled. Children who rely
mainly on visual memory (nondecoders) are not only poor decoders but
fare just as badly on sight words.
Results from the Stanford Reading Test (Group Administered). At this point,
in this otherwise impeccably controlled research, several statistical blun-
ders took place that essentially voided most of the results. The authors
decided to use mean scores from each classroom instead of individual
scores, one for boys and one for girls. This reduces the data from 20–30
children in a classroom to a prototypic boy and girl. Next, the data were
analyzed with analysis-of-variance (ANOVA) statistics for the five Meth-
ods separately. Reducing data to means invalidates the use of ANOVA
statistics, because the mathematics is based on variances derived from nor-
mal distributions of individual scores, not on group means—hence its
name.2
2. There is currently a theory that the unit of instruction must also be the unit
of analysis. If the ‘‘treatment’’ is the whole class, then the measure must be the
things happened. First, the focus of the study changed. Now all the vari-
ance (variability in each outcome measure) was due to the variability be-
tween classrooms, instead of variability between children within classrooms.
This turns a study comparing children learning different Methods into a
study comparing Classrooms (teaching skill perhaps?). Second, this repre-
sents a huge loss of statistical power. Table 4.1 illustrates the contrast be-
tween the actual number of children in the study and the number of scores
used for the data analysis, a tenfold reduction (100 to 10).3
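The loss of information can be made concrete with simulated data. The Python sketch below is an added illustration under stated assumptions: the scores are invented, and the number of classrooms, the class size (10 boys and 10 girls), and the score distributions are all hypothetical. It collapses each classroom to one mean for boys and one for girls, as described above, and compares what is left with the individual scores.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_CLASSROOMS = 10    # hypothetical number of classrooms
PUPILS_PER_SEX = 10  # hypothetical: 10 boys and 10 girls per classroom

individual_scores = []
classroom_means = []
for _ in range(N_CLASSROOMS):
    classroom_effect = random.gauss(0, 2)  # classrooms differ a little
    for sex in ("boys", "girls"):
        # Invented test scores for the pupils of one sex in one classroom.
        scores = [random.gauss(20 + classroom_effect, 6)
                  for _ in range(PUPILS_PER_SEX)]
        individual_scores.extend(scores)
        # One mean per sex per classroom replaces all of those scores.
        classroom_means.append(statistics.mean(scores))

n_individual = len(individual_scores)
n_means = len(classroom_means)
sd_individual = statistics.stdev(individual_scores)
sd_means = statistics.stdev(classroom_means)

print(f"scores analyzed: {n_individual} individual vs {n_means} means")
print(f"spread (SD): {sd_individual:.1f} individual vs {sd_means:.1f} means")
```

With these assumed sizes, 200 scores shrink to 20, the same tenfold reduction noted in the text, and the variability among children within a classroom disappears from the analysis altogether.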
I will examine the Basal-reader-versus-i.t.a. comparison to illustrate
the kinds of problems this created. There were large main effects of sex
(girls better, p < .001), effects of Method (i.t.a. better on the Word subtest,
Basal readers better on Spelling), and effects of Projects (some did
better than others). However, there were also highly significant Project ×
Method (treatment) interactions on every test except Vocabulary. Sometimes
Basal Classrooms did better and sometimes i.t.a. Classrooms did
better, depending on which Project they were in. This is the ultimate
muddle in methods research, the experimenter’s worst nightmare: ‘‘Now
you see it here, now you see it there, now you don’t see it anywhere.’’
The investigators tried to reduce or eliminate these interactions by
covariance analyses (ANCOVA) on the assumption that the erratic re-
sults occurred because children started out with different levels of skill.
ANCOVA helps to level the playing field by equalizing scores on the
whole class as well (mean scores). This is a doubtful practice, and, as far as I
am aware, unknown to Bond and Dykstra at that time. A full analysis of this
problem is presented in appendix 1.
3. The authors provide no explanation for why they reduced the data to class-
room means and analyzed the methods comparisons one at a time. When this
project was carried out, computers were physically big and computationally
small. The data and the program-code information went onto punch cards.
To compare 15 projects in one analysis requires a four-factor design with 300
cells! It is unlikely anything this complex could have been handled by com-
puters at that time. Mean scores may have been used for the same reason.
ANOVAs are simple to compute by hand (by calculator) if the data set is small
(a few hundred), but nearly impossible when numbers run into the thousands.
1. The only skill that would directly ‘‘cause’’ subsequent reading perfor-
mance is initial reading performance, and this was not measured.
2. Correlates of reading scores cannot be used to infer causality, and in
any case the baseline measures are correlated with each other, so it is im-
possible to know what is causing what.
3. One valid covariate in this study was IQ, but IQ was not used as a sep-
arate covariate in the ANCOVA analyses.
The authors do not report what kind of data were used for the covariance
analyses, but the tables reveal that mean scores were used again, which
voids the use of ANCOVA statistics. (You cannot do covariance analyses
with mean scores.) The tables also reveal that grossly inflated degrees of
freedom were used on all analyses. (These problems are discussed more
fully in appendix 1.)
Despite these statistical manipulations, the Project × Method interactions
did not go away. Something else was causing this effect. The researchers
decided to reanalyze the data for each Project separately (see
their table 23). Two Projects had strong across-the-board results favoring
i.t.a. over the Basal-reader groups. Three Projects essentially found no
differences between the two methods, except for spelling. Basal-reader
classes had significantly higher spelling scores. On closer inspection, the
explanation was obvious. Many of the children in i.t.a. classes were still
using nonstandard script, and scoring did not allow for this. However, it
was not at all clear why two Projects produced a strong i.t.a. advantage on
the remaining tests, and three Projects did not. The authors could not
explain this result. Using means instead of individual data in the analysis
A New Look at Old Data. There are 92 tables in this report but no table
summarizing the entire project. In view of the inappropriate use of the
data in the statistical analyses, a summary of combined data from these
more than 9,000 children is in order. It will be far more informative and
more valid than its predecessor. I calculated the grand means across all
Classrooms within a particular Method. These results are shown in tables
4.2 and 4.3. The tables illustrate the number of children who contributed
Basal + phonics (1,002)       20.9  1.8   20.5  1.8   21.1  1.8   10.8  1.9   35.3  1.8
Basal (1,523)                 20.0  1.7   20.7  1.8   21.2  1.8   12.1  2.0   36.6  1.9
Language experience (1,431)   21.5  1.8   21.1  1.8   22.1  1.9   12.3  2.0   37.3  1.9
Basal (597)                   19.1  1.7   19.2  1.7   21.5  1.9   10.8  1.9   36.3  1.9
Linguistic (760)              19.0  1.7   15.8  1.6   19.6  1.7    9.3  1.7   33.8  1.8
Basal (525)                   19.6  1.7   19.6  1.7   22.2  1.9   10.8  1.9   36.1  1.9
Lippincott (488)              26.6  2.2   24.4  1.9   23.7  2.2   14.1  2.2   41.4  2.2
Basal total (4,405)           19.6  1.7   19.2  1.7   21.4  1.8   10.8  1.9   35.6  1.9
Table 4.3
to these means. With numbers this large, statistical tests are not necessary.
One can confidently assume a normal distribution, and the mean is the
most accurate measure of this distribution if standard deviations are low.
Fortunately, table 75 in the Bond and Dykstra paper provided the stan-
dard deviations for each subtest on the Stanford Reading Test for every
Method, and for all Basal-reader groups combined. These standard devi-
ations were computed correctly on the children’s individual test scores. All
were low and extremely consistent, indicating the excellent psychometric
properties of the Stanford tests as well as outstanding testing and data
collection. Thus we know we are dealing with normal distributions, and
combining means across Classrooms is a valid indication of what hap-
pened in this study. The table values represent only the nontransformed
(noncovaried) data.
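The arithmetic behind a grand mean can be sketched briefly. The Python example below is an added illustration, not the procedure documented in the report: it assumes that each classroom mean is weighted by the number of children who contributed to it, and the three classrooms and their scores are hypothetical.

```python
def grand_mean(groups):
    """Weighted mean of group means; groups is a list of (n_children, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# Hypothetical classrooms: (number of children, classroom mean score).
classrooms = [(25, 20.0), (30, 22.0), (20, 18.5)]

weighted = grand_mean(classrooms)
unweighted = sum(mean for _, mean in classrooms) / len(classrooms)

print(f"weighted grand mean:  {weighted:.2f}")
print(f"simple mean of means: {unweighted:.2f}")
```

Weighting by class size makes the result equal to the mean of all the individual scores, so larger classrooms count for more than smaller ones. A simple average of classroom means would not have this property.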
Table 4.2 provides the grand means for the Stanford subtests, plus
grade-level conversions (in decimals, not months). The expected grade
level for these children at the end of the Project was 1.7 (first grade, eighth
month). The Basal-reader groups consistently scored at or near this level
across all measures. Because the Stanford tests had recently been normed
this was the only Stanford test on which i.t.a. children excelled, nor did
they mention that this effect did not occur in three out of the five Projects.
Rather than pursue these issues further, especially in view of the many
problems with data handling and statistical analysis, the following is a
summary of what this study demonstrated.
The Results. Sex differences favoring girls were very large and consistent
and appeared on every test in every comparison.
Correlations between the baseline measures (IQ, phoneme discrimi-
nation, letter-name knowledge) and the Stanford Reading tests were
modest, with the best predictors accounting for about 25 percent of the
variance. However, there was a high degree of overlap (shared variance)
between these baseline measures that was not controlled.
The fact that phoneme discrimination predicted 20–25 percent of the
variance in subsequent reading scores was a new finding, and needed to be
followed up.
The correlations between group and individual reading tests were
very high, indicating good test administration and good validity.
There was unequivocal evidence that good decoders can decode both
regularly and irregularly spelled words with the same facility, and that
sight-word memorization does not work (see the scores of the Basal-
reader children on the Fry decoding test in table 4.3).
Classroom means were used in the ANOVAs for the Stanford tests,
and incorrect degrees of freedom were used on about 75 percent of the
analyses. As a consequence, grand means have greater validity. These
showed that children who were taught with the Lippincott method scored
six months above grade on nearly all the Stanford tests, and were superior
to all other method groups on all tests, and there were large effect sizes on all
In paragraph 13, Bond and Dykstra wrote that these two programs
‘‘encourage pupils to write symbols as they learn to recognize them and to
associate them with sounds’’ (1997, 416).
This is the first time in the report that the reader has some idea of the
specific characteristics of the Lippincott program, and it is the first time
that Bond and Dykstra give any indication that they are aware of what
happened in this study. From what we know today, had these suggestions
been followed up with appropriate research, we might be one or two de-
cades ahead of where we are now, and we might have prevented the
unnecessary suffering of hundreds of thousands of children who have
struggled to learn to read. But these suggestions were not followed up.
One cannot help wondering why.
might have missed. In the conclusion to their study, they wrote: ‘‘Evi-
teachers and methods are not mutually exclusive. Both could be studied at
prising that no one was inspired to follow them up. Here are some of them:
ten Projects: two each for Basal reader, i.t.a., Language experience, Lin-
et al. 1977) proved difficult. Data had not been collected on the same
measures, and testing was not synchronized in time nor reported in a
similar fashion.
This research lacked the advance planning and uniform controls
imposed by Bond and Dykstra in their study, and only the vaguest con-
clusions could be drawn. There was a tendency for the academic approach
to be most successful, and within this group the direct-instruction method
known as DISTAR was most effective. DISTAR was designed at the
University of Oregon by Engelmann and Bruner (1969) and was intended
for disadvantaged children in small-group instructional settings. It re-
quires minimal teacher training, because lessons are scripted and teachers
are strongly encouraged not to deviate from the script.
DISTAR includes a variety of components, including math and lan-
guage. The reading program is phonics oriented and features a modified
script. Letters vary in size depending on their perceived importance in
decoding, or are marked to indicate pronunciation. Children do not learn
to segment but to read the ‘‘slow way’’ by stretching the sounds. The
long-term effects of the complete DISTAR program were reviewed by
Becker and Gersten in 1982. They provided an analysis of the follow-on
data for fifth and sixth graders collected in 1975. These children had been
in a three-year DISTAR program for reading, language, and mathematics,
beginning in first grade in 1969–70. When DISTAR training ended at
third grade, children had gained over 20 standard-score points on the
Wide Range Achievement Test (WRAT) for reading, 9 points in spelling,
and 7.5 points in arithmetic. Percentile scores were 67, 40, and 45 (50 is
the national average). These are good results.
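Percentiles and standard scores are two views of the same distribution, and one can be converted to the other. The Python sketch below is an added illustration resting on an assumption the text does not state: that standard scores are normally distributed with a mean of 100 and a standard deviation of 15, a common convention for achievement tests. Under that assumption, it converts the three reported percentiles into approximate standard scores.

```python
from statistics import NormalDist

# Assumed scale (not stated in the text): mean 100, standard deviation 15.
scale = NormalDist(mu=100, sigma=15)

# Percentiles reported at the end of third grade.
percentiles = {"reading": 67, "spelling": 40, "arithmetic": 45}

# inv_cdf maps a cumulative proportion to the matching score on the scale.
standard_scores = {
    subject: round(scale.inv_cdf(pct / 100), 1)
    for subject, pct in percentiles.items()
}

for subject, score in standard_scores.items():
    print(f"{subject}: percentile {percentiles[subject]} -> standard score {score}")
```

On this assumed scale the 50th percentile corresponds to a standard score of 100, so the reading result sits above the national average and the other two sit slightly below it.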
There was a further follow-on at fifth and sixth grade after the DIS-
TAR children had been transferred to schools with normal instruction.
Compared to a matched control group, they were significantly ahead in
WRAT reading in 23 out of 31 comparisons. At fifth grade, DISTAR
children were superior in one-third of the comparisons on the Metropol-
itan Achievement Test (MAT) battery, and equal to the controls on the
remaining two-thirds. By sixth grade this advantage had dwindled to
around 10–20 percent of comparisons.
However, when compared to national norms, these children did not
do well. When Becker and Gersten plotted the average percentiles and
standard scores for the children across six grades, the DISTAR children
It is the ultimate irony that Chall’s book and Bond and Dykstra’s study
had the opposite effect from the one they intended. Bond and Dykstra’s
analysis of the data failed to reveal what really happened (that is, the su-
periority of the Lippincott program on all measures), and seemed to show
that the success of a reading method was completely unpredictable. This
message undermined Chall’s assessment of the research (that synthetic
phonics was the clear and unassailable winner) and gave more weight to
her observations on the impact of the teacher. Taken together, the pro-
jects pointed to the same conclusion: teachers matter more than the method.
Dykstra (1968b) even suggested this in a follow-up report, expressing
concern that teachers needed to be monitored in future research.1
The net effect of Bond and Dykstra’s project was to virtually shut
down applied research on classroom reading methods. If the teacher mat-
ters more than the method, the method is largely irrelevant. If basal-
reader programs are no worse than anything else, there is no reason to
change. And unless some way can be found to neutralize or stabilize the
teacher’s input into the process, any future research on methods is a waste
of time. There was no point in funding this kind of research when a study
of over 9,000 children failed to show anything definitive. The fact that
only 38 research reports on reading methods passed the final screening in
the recent National Reading Panel survey says it all. And of these 38
reports, half are tutoring programs for older readers. Twenty valid studies
years, one reason we have made so little progress in establishing solid sci-
entific evidence on how to teach children to read.
The 1960s projects had even more ripple effects, opening the door for
the whole-language movement. Chall’s book (in company with Flesch’s
book in 1955) unmasked the basal programs. The facts were there for all
to see, complete with virtual pages from those boring readers. By the end
of third grade, children had been exposed to a reading vocabulary of a
mere 1,500 words. They did not learn to spell. They did no writing of any
kind until second grade.
This was the antithesis of what should happen, according to the
founders of the whole-language movement, who believed that learning to
read was as ‘‘natural’’ as learning to talk. Children should learn to read by
reading and use stories written in natural language, not in the stilted and
repetitive style of the Dick and Jane readers. This way, children could
apply all their linguistic skills (vocabulary, syntax, sensitivity to context) to
understand what they read. According to Goodman (1967), reading is a
‘‘psycholinguistic guessing game,’’ where the main goal is to follow the
gist of the story. Accuracy is largely irrelevant.
On the face of it, the ‘‘natural-language’’ approach sounded like a
better solution, and had the added bonus of being fun—fun for the
teacher whose main task was to read interesting stories from real chil-
dren’s literature out of Big Books, and fun for children who got to listen
and ‘‘read along’’ in little books. It was fun for teachers to encourage
creative writing and watch in admiration as children invented their own
spelling system, and fun for children to ‘‘write stories’’ regardless of
whether anything they wrote could be deciphered. It was motivating for
children to believe they were learning to read, write, and spell, despite the
fact they were not. The whole-language movement was the third (and
final) whole-word method of the twentieth century, and it took the
English-speaking world by storm—with catastrophic consequences (see
chapter 1).
The founders of whole language were not alone in believing that
natural-language development had something to do with learning to read.
This has been a major theme in reading research as well. Most of the
research on spelling derives from the belief that children go through
sight words, the lower their reading scores. Time spent reading aloud (to
The average age was six years at midyear when the training and the ob-
servations began. Observations followed the format developed by Evans
and Carr. Every child in every classroom was monitored for a brief time
period on numerous occasions, and the observations continued for several
months.
Overall activity patterns were similar in the two types of classrooms.
Children went out to play for the same amount of time, spent the same
amount of time transitioning between activities, and were interrupted
equally often. However, children in the ‘‘balanced’’ classes spent a sig-
nificantly larger proportion of the time in non-literacy-related activities
(35 percent versus 28 percent). Of the ten literacy activities coded, there
were significant differences on four, with JP children participating sig-
nificantly more often in each one. JP children spent 10 percent of the
language period on phonics-related activities (explained below), compared
to 2 percent for the other children. They spent more time on auditory
phoneme-awareness tasks (7 percent versus 3.7 percent), and more time
memorizing sight words and learning about grammar, though the last two
activities were infrequent in all classrooms. There were no differences be-
tween the two types of classroom in the time spent on the remaining cat-
egories: learning concepts of print, like the order of words on the page
and order of pages, reading aloud or silently, ‘‘pretend writing’’ sentences,
copying letters/words/sentences and writing them from memory, learning
letter names, and vocabulary lessons. Vocabulary work took up by far the
most time in both types of classrooms (18 percent and 20 percent).
The children took five standardized reading and spelling tests at the
end of the school year. Children in the JP classrooms were significantly
ahead on every test. When scores on these tests were correlated with time
on task for the various activities, only two activities were significantly (and
positively) related to reading and spelling scores. These were phonics
activities (range r = .48 to .62), and copying/writing letters and words
(range r = .50 to .55). The correlations between time spent on auditory
phoneme analysis (no letters) and scores on all five reading measures were
essentially zero. Although no other values were significant, the following
activities were consistently negatively correlated with all five reading and
spelling tests, with values for r ranging from -.20 to -.31: learning letter
dences into memory has been investigated by Hulme and his colleagues
(Hulme 1981; Hulme and Bradley 1984; Hulme, Monk, and Ives 1987).
They compared learning speed for mastering phoneme-grapheme cor-
respondences for copying, using alphabet cards, or using letter tiles.
Children learned much faster when they wrote the letters. Hulme and
colleagues (1987) concluded that motor activity promotes memory, and this
assists children in learning to read. But this is only part of the answer.
Copying letters forces you to look carefully and hold this image in mind
while you are writing. This, plus the act of forming the strokes, makes it
clear how letters differ. (See McGuinness 1997c for an analysis of how
copying assists memory.)
Cunningham and Stanovich (1990b) reported the same effect for
spelling accuracy. First graders memorized spelling lists using three dif-
ferent methods: copying by hand, utilizing letter tiles, and typing the lists
on a computer keyboard. Children spelled twice as many words correctly
when they learned by copying the letters than with letter tiles or typing.
They also found that saying letter names while writing letters had no im-
pact on spelling performance.
The most surprising result to emerge from the observational studies
was the large number of activities that were either nonproductive (zero
relationship to reading) or actually detrimental (negative correlations).
Negative correlations can mean either of two things: time wasted (a trade-
off between learning one thing at the expense of another), or a negative
outcome, like the creation of a maladaptive strategy. Correlational re-
search can never prove causality, but it is hard to imagine that vocabulary
activities and listening to stories are ‘‘bad’’ for children. Here, the time
trade-off argument makes sense, especially because vocabulary work and
listening to stories took up more time than any other literacy activity in
most classrooms. (The big question is whether time spent on vocabulary
work actually improves vocabulary. We will come back to this question in
chapter 8.)
On the other hand, we know that time spent memorizing sight words
can cause a negative outcome by promoting a strategy of ‘‘whole-word
guessing.’’ This is where children decode the first letter phonemically and
guess the rest of the word based on its length and shape. This strategy
is highly predictive of subsequent reading failure (McGuinness 1997b). It is
Spelling Helps Reading, But Seeing Misspelled Words Is Bad for Spelling
There were other important discoveries during this 30-year period. Two
sets of studies, in particular, bear directly on the importance of writing
and spelling practice. One group of studies looked at the impact of spelling
Invented Spelling
There is little research on the consequences of invented spelling, because
spelling is assumed to follow natural developmental stages. When children
are encouraged to ‘‘just write’’ and ‘‘invent’’ their own spelling system, the
most common pattern that emerges (other than complete randomness) is
letter-name spelling (so, for example, the word far would be spelled fr), as
for spelling research. This type of test appears in well-known test batteries
Chapter 5
how to spell, and so writing out lists of all possible spellings, carefully
code.
Teach phonemes only—no other sound units.
lesson.
Link writing (spelling) and reading to ensure children learn that the al-
and most consistent success. How well do these programs align with the
Other problems came to light as the NRP reviewed the 75 studies in more
depth, such as missing control groups, too limited a focus or time span,
incorrect or inadequate statistical analysis, inappropriate outcome mea-
sures, and duplicate studies (or data) found in another publication. As a
result, only 38 research reports (a total of 66 individual comparisons)
passed the final screening—a sad state of affairs.
Yet even this dismal showing is better than the research database on
whole language. In 1989, Stahl and Miller attempted to review the whole-
language/language experience research and provide a quantitative synthe-
sis of the research findings. The search was exhaustive and dated back to
1960. It included all the obvious databases, dissertation abstracts, and bib-
liographies. Personal letters were sent to the major figures in the field,
asking for information and help. Apart from Bond and Dykstra’s research
on the language experience method, only 46 studies were found, and only
17 had sufficient statistical data to compute effect sizes. These 46 studies
constituted the total pool of research on the whole-language and language
experience methods. Contrast this with the 1,027 studies found by the
NRP. The 46 studies compared whole language/language experience to
various basal-reader programs.
Because Stahl and Miller located so few studies, none could be ex-
cluded, regardless of the numerous methodological problems they de-
tected and the fact that most of these studies had never passed peer review.
They were obliged to report the outcomes in a table similar to the one
used by Chall. The table showed that 58 comparisons were not significant,
26 favored whole language/language experience, and 16 favored basal
readers. On closer inspection, the advantage to whole language was en-
tirely on nonreading tests, like ‘‘concepts of print,’’ and only in the
kindergarten classes. Of the 17 programs with sufficient statistical infor-
mation to compute effect sizes, only 4 were published papers. Of these 4,
whole language had marginally higher effect sizes on ‘‘print concepts’’ and
A Quick Overview
With these reservations in mind, we will explore what the NRP’s meta-
analysis showed about the global effectiveness of the different types of
reading programs. For the most part, the studies in the database compared
a phonics-type program with something else. It was rare to find two
phonics programs being compared. Taking all cases (66 ES values), the
mean ES after training was .41 for combined reading scores. At follow up
(62 cases), it was .44 (see table 5.1). The positive value shows an advan-
tage of phonics methods over contrasting methods. These effect sizes are
a composite value that represents every reading measure, on every age
group, on normal and special populations alike, and for all types of re-
search designs. It is interesting to see what happens when a subset of
studies is subtracted from the mix. A general summary of ES values for the
studies as a whole is shown in table 5.1.

Table 5.1
National Reading Panel: Reading-instruction effect sizes for phonics versus other

                        N cases  Reading  N cases  Spelling  N cases  Comprehension
All studies
  Immediate testing       65       .41
  End-of-year testing     62       .44
K and 1st grade only      30       .55      29       .67       20        .29
2nd to 6th grade          32       .27      13       .09       11        .12

I separated the studies on beginning readers (kindergartners and first
graders) from the studies on older readers (children receiving remedial
tutoring). For beginning readers, the ES value increases for phonics
methods (30 cases, ES = .55). For older, poor readers, it is reduced substantially
(32 cases, ES = .27). The first value is moderately large and significant,
and the second is neither. Does this mean that phonics-type
programs do not work for older, poor readers? Well, no, it does not. But
it does mean that a large proportion of the remedial programs in this
particular database were unsuccessful. For instance, programs based on the
Orton-Gillingham model were particularly ineffective (10 cases produced
an ES value of only .23).
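As a rough arithmetic check on the two subgroup values, a case-weighted average of .55 (30 cases) and .27 (32 cases) can be compared with the overall figure. The sketch below assumes simple weighting by number of cases, which may not be how the NRP pooled its effect sizes; the function name is illustrative only.

```python
def weighted_mean_es(groups):
    """Average effect sizes, weighting each by its number of cases."""
    total_cases = sum(n for n, _ in groups)
    return sum(n * es for n, es in groups) / total_cases

# (number of cases, mean ES) for beginning and older readers, as given in the text
subgroups = [(30, 0.55), (32, 0.27)]
overall = weighted_mean_es(subgroups)
print(round(overall, 2))  # 0.41
```

Under that weighting assumption the result rounds to .41, the same as the immediate-testing value reported for all studies combined.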
Because my analysis will focus on beginning reading instruction, we
can unravel this further by examining what the ES value of .55 represents.
The 30 individual comparisons that contributed to this value tell us, by
and large, that a phonics-type program produces a 0.5-standard-deviation
advantage over a non-phonics-type program. Not all phonics programs
were equally successful. Particularly weak were programs described as rime
analogy. These programs teach larger sound units, like word families, and
encourage children to make analogies to other words by swapping word
parts (cr-own, fr-own, d-own). The average effect size was .28 for this group
of studies, not significantly different from the comparison programs.
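For readers unfamiliar with the metric, an effect size of this kind is a standardized mean difference: the gap between two group means divided by a standard deviation. The sketch below assumes the common pooled-standard-deviation form; the exact formula used by the NRP is not spelled out in this section, and the scores are invented for illustration.

```python
import math

def effect_size(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_var = (
        (n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_treat + n_ctrl - 2)
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)

# Hypothetical standardized-test scores (not from any study in the text):
# phonics group mean 107.5, comparison group mean 100, both with SD 15.
es = effect_size(107.5, 15.0, 40, 100.0, 15.0, 40)
print(round(es, 2))  # 0.5, i.e. a half-standard-deviation advantage
```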
A Glossary
Visual Phonics The short version (a) teaches the 26 ‘‘sounds’’ of the
26 letters of the alphabet. The long version (b) teaches 40 to 256+
‘‘sounds’’ of the letters, digraphs, and phonograms.
The Prototype Fits (b) above, plus the other components of the pro-
totype listed earlier. There are no programs that fit all the elements of the
prototype in the NRP database, though some come close.
Lippincott and i.t.a. The Bond and Dykstra 1967 study included two
reading programs close to the prototype: the initial teaching alphabet
(i.t.a.) and the Lippincott program. (See chapter 4 for descriptions of these
programs.) Bond and Dykstra’s study was not included in the NRP data-
base because of their 1970 cut-off date. I am adding it here, because the
effect sizes represent over 3,000 children, providing the most accurate
estimate of the power of these programs. I computed the effect sizes for
i.t.a. and Lippincott on the Stanford Reading Test subtests. (‘‘Word
study’’ is a measure of phonics knowledge.) The comparison groups were
the basal-reader classrooms (see chapter 4).
Table 5.2 contrasts the effectiveness of these two programs. The
Lippincott program has much higher effect sizes than the i.t.a. program
and is superior to the basal programs in all respects (ES = 1.12 for reading,
.61 for spelling, .62 for phonics knowledge, and .57 for reading comprehension).
Perhaps having to learn, then unlearn, a special script (i.t.a.)
wastes time and/or causes confusion. More recent studies on Lippincott
that are in the NRP data pool are shown in table 5.2. Effect sizes are
smaller but generally confirm the Bond and Dykstra results, which are far
and away the most accurate reflection of the strength of this program.
Table 5.2
16) but no common sight words are included. Spelling activities are indi-
rect in the sense that children build and alter nonsense words using letter
tiles. The program takes, on average, 80 hours for poor readers to reach
normal reading levels.
Because of the population for which it was written, the Lindamood
program requires some modification and flexibility to teach beginning
readers. The exercises on phoneme analysis need pruning, and lessons
on the 40+ phonemes and their basic-code spelling need to be speeded
Table 5.3
McGuinness, McGuinness, and Donohue 1995
N = 42
Grade 1 .30 1.66
Torgesen et al. 1999
N = 180
K to grade 1 .32 .71 .25
Grade 2 .48 .89 .43 .53
N = 138
Tests: Woodcock Reading Mastery (reading, comprehension), WRAT (spelling).
lated with age, vocabulary, auditory and visual short-term memory, and
Open Court There is only one study in the NRP database on the Open
Court program. This is unfortunate, because this program has received a
good deal of national attention, and because it is a good fit to the proto-
type, at least according to the description provided by Foorman et al.
(1997, 67):
is regarded as the key strategy for applying the alphabetic principle, and,
therefore, 8–10 new words are blended daily; 4) Dictation activities move
from letter cards to writing words sound by sound, to whole words (by lesson
17), to whole sentences (by lesson 27); 5) Shared reading of Big Books; 6)
Text anthologies (with uncontrolled vocabulary), plus workbooks are intro-
duced in the middle of first grade, when all sound/spellings have been intro-
duced; and 7) Writing workshop activities are available in individual and small
group formats.
There are some peculiarities in this program, such as the extensive use of
color-coded text (both background and print) to mark consonants and
three types of vowel spellings. There is little empirical support for the use
of color-coded text, and there is a risk that children will come to rely on
the colors to the exclusion of noticing the specific print features. Trans-
ferring to normal text may cause difficulties. The program includes some
aspects of whole language, which can muddy the waters.
The second-grade program starts with a review of the sound-to-letter
correspondences, includes more blending exercises, and adds two new
anthologies. There is no mention of how spelling is taught after the basic
code has been mastered.
The study by Foorman et al. (1997, 1998) was ambitious, involv-
ing 375 children (1997), reduced to 285 (1998), who received tutoring
through Title 1 in addition to a classroom program. There were three
control programs: whole language with no special teacher training, whole
language with teacher training, and an embedded-phonics program with
teacher training. Open Court teachers (and tutors) were taught by Open
Court trainers. The embedded-phonics program was developed locally,
and was a visually driven method based on letter patterns and featuring an
‘‘onset-rime-analogy’’ type of instruction.
These children were scattered among 70 classrooms in all (70
teachers). The three methods were taught to the whole class, but only the
tutored children participated in the study, about 3 to 8 children per classroom.
Classroom time for literacy activities was 90 minutes daily, and tutoring
was provided for 30 minutes per day. Whether these periods overlapped is
unknown. This was a complex study design in which the tutorial method
sometimes matched the classroom program and sometimes did not. To
complicate matters further, the tutors had been trained the previous year
Table 5.4
Jolly Phonics Jolly Phonics was developed by Sue Lloyd (1992), a classroom
teacher in England who devoted many years to perfecting this program.
It takes its name from the publisher, Chris Jolly. Jolly Phonics
meets nearly all the requirements of the prototype, and goes beyond it in
some respects. Jolly Phonics is the product of what can happen when
popular myths of how to teach reading are challenged.
First to go was the myth that reading is hard to teach. Second to go
was the notion that a linguistic-phonics program cannot be taught to
the whole class at the same time. Third to go was the age barrier. Jolly
Phonics is taught to 4-year-olds. Fourth to go was the belief that young
children cannot pay attention for more than about 10–15 minutes at a
time. Fifth to go was the related belief that if young children are kept at a
task for longer than about 15 minutes, they become bored and frustrated
and are unable to learn. Sixth to go was the idea that teachers need ex-
tensive training to teach the alphabet code properly.
Lloyd’s initial goal was to reduce the lessons to the essential elements
and present them at an optimum rate, as quickly and as in depth as pos-
sible. Undoubtedly, her greatest insight was in figuring out what these
elements are. Certainly nothing in teacher training provides any useful
information on this issue. The next questions were how these elements
should be taught and how early and quickly they could be taught, given
the appropriate sequence and format. She discovered that young chil-
dren forget what they have learned when lessons are spaced too far apart.
This necessitates constant reteaching and review, wasting an extraordinary
amount of time.
Lloyd discovered that a comprehensive reading program can be
taught to young children in a whole-class format if three conditions are
met. First, the lessons should be fun and stimulating and engage all the
children. Second, sufficient backup materials for individual work have to
be available to support what is taught in the lessons. Third, parents need
to be involved enough to understand the program and know how to sup-
port their child at home. When lessons are enjoyable and when children
see that they and their classmates are actually learning to read, they have
no trouble paying attention for up to an hour.
Lloyd found clever and ingenious ways to engage the whole class and
keep them interested. She invented simple action patterns to accompany
learning each phoneme. Children say each phoneme aloud accompanied
Table 5.5
Table 5.6
Jolly Phonics effect sizes
Phoneme segmentation   BAS reading   Young reading   Nonword decoding   Spelling   Comprehension
Johnston and Watson 1997
N = 53
JP vs. analytic phonics
Immediate (5:1) 1.52 .90
1st posttest (5:4) 3.27 1.0
They did not believe that young children could learn a phoneme and letter
per day, or endure hour-long sessions. There was a compromise. Lessons
were reduced to 20 minutes or less, and total recommended hours
were nearly cut in half. As noted in the discussion of the observational
study, the JP program was taught along with a mix of other language
activities, reducing time still further. We can see the impact of the slower
delivery of the program, plus the lack of focus, in the lower effect sizes
(table 5.6). Sumbler commented that after working with the program,
some teachers realized that children could learn this material much more
quickly and had begun speeding up the training.
There were three follow-on studies among this group. Johnston and
Watson (1997) followed the children for an additional year, to 7.5 years
old. JP children maintained their gains and were about one year ahead of
the control group and national norms. Furthermore, one-third of the
control children scored more than one standard deviation below the mean,
but only 9 percent of the JP children scored this low. Stuart’s one-year
follow-up produced much the same results.
Johnston and Watson (2003) followed up the ‘‘Wee County’’ children
for an additional year. However, because of the initial success of the pro-
gram, school officials insisted that the entire county be switched to FPF,
and group differences were eliminated by the end of the next year. Now
all the children in the Wee County scored one year above national norms.
Further follow-on studies (see Johnston and Watson 2003) showed that
the advantage for all the children increased over time to two years above
national norms by age 9.5 on reading (decoding), and one year ahead in
spelling. Reading comprehension was only marginally above norms (see
table 5.7).
None of the prototype programs include anything other than the
most rudimentary attempt to teach the advanced spelling code. Yet spell-
ing scores were surprisingly high, certainly much higher than the na-
tional norms. Because norms are based on the current status quo, this
tells us what a parlous state spelling instruction is in. Merely teaching the
basic code the right way around, getting the logic straight, and adding a
dozen or so of the 136 remaining spelling alternatives makes an enormous
difference.
Several of the studies in this group measured phoneme awareness in
addition to the reading and spelling tests. Table 5.6 shows the enormous
Table 5.7
Potpourri
Before we leave the assessment of reading methods, I want to report on
some of the other programs that made it into the NRP database. A brief
synopsis of these studies provides a startling glimpse of the confusion and
variability that abound in research on reading methods, and in the use of
the term phonics.
Teachers received additional training and continuing help from the re-
Conclusions
How valid and reliable is a meta-analysis for research of this complexity
where nearly every study uses a different method, involves different types
and ages of children, employs different research designs, and utilizes dif-
ferent measures of reading and spelling competence? I should point out
that nearly all reading and spelling measures in these studies were prop-
erly constructed, normed, standardized tests. This does at least ensure
that the effect sizes (when computed accurately) are statistically valid and
reliably represent group differences on these measures. This is a bonus,
because standardized tests are very much the exception in the phoneme-
awareness training studies reviewed in the next chapter. Nevertheless, effect
sizes will vary for other reasons, such as study design and extraneous factors.
I have taken special care to analyze each study separately in detail and
to compute effect sizes myself (many times). This revealed that the computations
in the NRP report were not always accurate. Sometimes this was
due to misunderstanding the study design. For example, one study in the
database (Martinussen and Kirby 1998) had nothing to do with comparing
reading methods. Instead, it focused on whether teaching different strat-
egies for learning the same reading program made any difference. Results
were not significant for reading, and there were huge floor effects on the
Woodcock reading tests with standard deviations three times higher than
the means. Yet effect sizes were computed anyway.
Besides noting anomalies like this, I was concerned about the fact that
my values sometimes deviated significantly from those in the NRP’s
tables, especially since we were using the identical formula. In part, this
was because I was more interested in individual test scores, whereas the
NRP was more likely to collate the data from several tests. However, in
other cases, there were gross computational errors. For example, Griffith,
Klesius, and Kromrey (1992) (see previous section) reported no significant
differences between two contrasting reading methods on every reading
and spelling test except one. ‘‘No significant differences’’ translates into
low or zero effect sizes, which is what I found. The NRP, however, re-
ported large effect sizes in every case (some higher than 1.0) in favor of
whole language. Errors like these bias the general meta-analysis results (in
this case away from phonics and toward whole language).
In view of the nature of meta-analysis as a statistical tool, and the
problems outlined above, the overall summary of the panel was disap-
pointing. They relied exclusively on the summary tables and argued from
global effects to conclusions. There were sweeping generalizations like the
following: ‘‘Phonics instruction failed to exert a significant impact on the
reading performance of low-achieving readers in second through sixth
grades’’ (National Reading Panel, 2000, 2-133). This is a dangerous
statement. It implies that no phonics instruction works for poor readers or
that none of the programs in the NRP database was effective for this pop-
ulation, neither of which is true. The statement also gives the false im-
pression that all remedial-phonics programs found their way into their
data pool, when this was far from the case. In fact, the most successful
remedial-reading programs today were missing (see D. McGuinness
1997c; C. McGuinness, D. McGuinness, and G. McGuinness 1996).
As for beginning reading instruction, the panel concluded that the
meta-analysis value (ES = .44), the one we started with at the beginning
of this section, ‘‘provided solid support for systematic phonics’’ (p. 2-132),
program continue?’’ (It takes one semester to teach the basic code.)
their speech before age 3 (Chaney 1992). (These studies are reviewed in
Chapter 6
Phoneme-Awareness Training
ysis occurs below the level of conscious awareness, children (or illiterate
adults) must be taught to pay attention to the phonemic level of speech to
learn an alphabetic writing system. This does not necessarily require ex-
plicit awareness (‘‘awareness that’’), as Cossu, Rossini, and Marshall’s (1993)
research on children with Down’s syndrome has shown. These children
were able to learn the sound-symbol correspondences of the Italian alpha-
bet code by simple matching and repetition, and could read (decode)
real and nonsense words at fairly high levels of skill. Yet they could not
comprehend what they read and failed dismally on phoneme-awareness
tests.
Another powerful component of the myth is that phoneme awareness
is difficult to teach. This is said to be because phonemes are coarticulated
and hard to tease apart, and because consonants cannot be produced in
isolation from a vowel. Thus the best rendering of a segmented word
like cat is ‘‘kuh-aa-tuh.’’ Neither statement is true. Only five of the 40+
English phonemes are hard to produce in isolation. These are the voiced
consonants: /b/, /d/, /g/, /j/, and /l/. But even they can be managed by
keeping the vowel extremely brief. No other consonants need to be pro-
duced with a vowel, and certainly not /k/ and /t/.
Because the evidence shows that children have good phoneme sensi-
tivity, and isolating phonemes in words is not nearly as difficult as re-
searchers seem to believe, do we really need special phoneme-awareness
training programs? More to the point, children do need to be made aware
of phonemes to use an alphabetic writing system and to learn to match
each phoneme to its letter symbols. But this is what a linguistic-phonics
program teaches. What special benefit, then, does a phoneme-awareness
training program confer?
Poppy is a pet.
The best pet yet.
Notice that the sound /p/ appears in all positions in a word. After the
children learn to hear this sound, and to say the sound in first, middle, and
last positions, they see the letter p and are told that this letter represents
the sound /p/. They practice tracing, copying, and writing each new letter
until its shape is firmly embedded in memory. As they write, they say the
sound the letter stands for.
Once a few sounds are introduced, they are combined to make real
words (not nonsense words). This way children get to ‘‘meaning’’ as soon
as possible and understand the value and purpose of the exercise. Real
words can be written down, sound by sound, and retrieved (decoded) from
print. Phonemes and their basic-code spellings are introduced as quickly
as children can learn them, every lesson building on the one before, until
all 40 phonemes and their spellings are mastered. Spelling alternatives
follow next.
Children learn to segment and blend phonemes by seeing and writing
letters. They learn that phonemes occur one after the other over time, and
that letters are sequenced one after the other over space (left to right).
Reading and writing (spelling) are integrated in the lessons so that chil-
dren learn the code nature of our writing system.
These are the basic building blocks of a good linguistic-phonics
program.
The central question is whether a phonological-training component
adds anything to what I have just described. Will it enhance phoneme
analysis, reading, and spelling skills beyond the programs that fit the pro-
totype outlined in the previous chapter? To prove that it does, it would
have to produce still larger effect sizes for reading, spelling, and phoneme
awareness than the successful phonics programs alone, with ES values well
above 1.0.
To answer this question, I will draw on the efforts of the National
Reading Panel once again. Their focus was on phoneme awareness. To
meet the selection criteria, the study had to include training in phoneme
awareness. The panel unearthed 1,962 articles referenced by this term,
and they were screened by these criteria:
training.
Statistics had to be adequate to compute effect sizes.
Table 6.1
spelling (ES = .59). But when the training is purely auditory (subtracting
all studies where letters were used), the impact on reading and spelling is
substantially reduced (ES values shrink to .38 and .34). Also, if reading is
measured by a standardized test instead of an experimenter-designed test,
the impact of phoneme-awareness training on reading and spelling shrinks
as well (ES = .33 and .41). In-house tests inflate effect sizes, because they
measure what was taught.
Another meta-analysis of phonological-awareness training studies was
provided by the Dutch team of Bus and Van IJzendoorn (1999), using a
ness, excluding training on rhyme and syllable segmenting. They reported
an effect size of 1.04 for the impact of phoneme-awareness training on
phoneme tests, and .44 for reading. However, follow-on studies showed
the impact of phoneme-awareness training on reading was nil (ES = .16).
Bus and Van IJzendoorn commented on the enormous variability be-
tween these studies. In an attempt to reduce the ‘‘noise’’ in the data, they
recomputed the meta-analysis using only the studies from the United
States where children were randomly assigned to groups or matched.
Surprisingly, the effect size for the impact of phoneme training on phoneme
awareness declined (ES = .73), while it rose considerably for reading
(ES = .70). Bus and Van IJzendoorn were overly optimistic about this
result: ‘‘The training studies settle the issue of the causal role of phono-
logical awareness in learning to read: Phonological training reliably en-
hances phonological and reading skills. About 500 studies with null results
in the file drawers of disappointed researchers would be needed to turn
the current results into nonsignificance’’ (p. 411).
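A "file drawer" figure of this kind is usually obtained with a fail-safe N calculation. The sketch below uses Rosenthal's formula as an assumption about the general approach; this section does not say which method Bus and Van IJzendoorn used, and the z scores are invented for illustration.

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: the number of unretrieved studies averaging
    z = 0 needed to pull the combined result down to the one-tailed
    significance threshold z_crit."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k

# Hypothetical z scores for ten studies (illustration only).
z_scores = [2.1, 1.8, 2.5, 0.9, 3.0, 1.2, 2.2, 1.7, 2.8, 1.8]
print(round(fail_safe_n(z_scores)))
```

The figure depends entirely on the z scores fed in, so if the individual effect sizes are inflated or miscomputed, the fail-safe N is inflated with them.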
We will see shortly that a more careful reading of these U.S. studies
will turn these results into ‘‘nonsignificance.’’
The NRP’s data pool reflected a similarly diverse group of studies
with different types of training, different tests measuring different things,
and different populations in different countries. Eleven studies were
carried out in non-English-speaking countries, including Israel, which
does not even have an alphabetic writing system. Some of these countries
have transparent alphabets (Finland, Germany, Spain, Sweden, Norway),
and some do not (Denmark). In countries with a transparent alphabet,
standardized reading tests measure reading speed (fluency), not accuracy,
while the reverse is true for countries with opaque alphabets. Subtract-
ing foreign-language studies increases the connection between phoneme
awareness and reading for studies in English-speaking countries (English:
ES = .63; foreign: ES = .36). Eliminating the computer-training studies,
which were singularly unsuccessful, had the same impact (teachers: ES =
.55; computers: ES = .32).
The NRP summary tables illustrate shifts in effect sizes when one
variable at a time is subtracted. However, tables were also provided for
each study, listing individual effect sizes along with information on im-
portant descriptors, like the number of children in the study, the country
where it was conducted, the number of hours the training lasted, and so
forth. This makes it possible to subtract studies that obscure a true picture
of the findings. To this end, I recomputed the average ES value for read-
ing and spelling (see the bottom row of table 6.1) after eliminating studies
with these characteristics:
Foreign-language studies
Computer-training studies
Total N less than 20 children
Training lasting less than 3 hours
Studies on older poor readers (grade 3 or higher)
The new ES values are now much larger: ES = .74 for reading and
ES = 1.01 for spelling. These results are more in line with the results
from the good phonics programs, and in better agreement with Bus and
Van IJzendoorn’s stripped-down meta-analysis of U.S. studies. This seems to
show that phoneme-awareness training has a strong impact on reading
and spelling. But does it?
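The exclusion procedure described above amounts to filtering the study list and re-averaging what remains. The sketch below uses invented study records and field names (they are not the NRP's data or table layout) and an unweighted mean, which is an assumption about how the averages were formed.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Study:
    name: str
    english: bool          # conducted in an English-speaking country
    computer_based: bool   # training delivered by computer
    n_children: int        # total N
    training_hours: float
    grade: int             # 0 = kindergarten
    reading_es: float

def keep(study):
    """Apply the five exclusion criteria listed in the text."""
    return (
        study.english
        and not study.computer_based
        and study.n_children >= 20
        and study.training_hours >= 3
        and study.grade < 3
    )

# Hypothetical records for illustration only.
studies = [
    Study("A", True, False, 60, 10.0, 0, 0.80),
    Study("B", False, False, 45, 8.0, 1, 0.30),  # foreign-language study
    Study("C", True, True, 30, 5.0, 1, 0.20),    # computer training
    Study("D", True, False, 12, 6.0, 0, 0.10),   # N below 20
    Study("E", True, False, 50, 12.0, 1, 0.70),
]

retained = [s for s in studies if keep(s)]
print([s.name for s in retained])                       # ['A', 'E']
print(round(mean(s.reading_es for s in retained), 2))   # 0.75
```

With these made-up numbers the average rises once the weaker studies are dropped, which is the same direction of change reported for the recomputed values.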
Two pieces of information were missing from the NRP’s tables on
the individual studies. The first is whether letters were included in the
training, and the second, which reading tests were employed. After careful
reading of the remaining studies in the data pool, and recomputing many
of the effect sizes, I again found that the NRP’s effect sizes were not always
accurate, and I will comment on any discrepancies as we go along.
NRP’s effect sizes were often based on the combined scores of in-
house and standardized tests. Reviewing the individual studies, I found
that in-house tests lacked reliability estimates and were highly specific to
what was taught. This will grossly inflate effect sizes, giving a false picture
of what these training studies accomplished. By contrast, the effect sizes
for reading instruction were based on standardized tests (see the previous
chapter). If one wants to know the true impact of a phoneme-awareness
program on reading, the same rigorous measures of reading and spelling
must apply. To examine this problem more closely, we have to turn to the
studies themselves.
In the remainder of this chapter, I will be reviewing those studies
left in the data pool after the studies in the categories listed above
awareness training with or without letters, and examine the outcomes for
experimenter-designed tests versus standardized tests. I will also examine
the impact of various types of training, such as programs using larger or
smaller phonological units, and how, precisely, phoneme awareness is
taught. To this end, I have included a few well-designed foreign-language
studies that investigated neutral factors such as the impact of teaching
phoneme awareness with or without letters.
A Glossary
Before we move on to the analysis of the individual studies, I want to clear
up some terminology problems. There are several types of phoneme-
awareness skills. Suffice it to say that there are no standardized tests of
these skills, and few tests have norms. The descriptive terms for these
skills vary as well. Here is a glossary of how I intend to use terms.
Discrimination. Telling phonemes apart. ‘‘Are the sounds /v/ and /v/ the
same or different?’’
Identification. Being able to identify (say the sound) when asked for a
phoneme in a particular location in a word (first, middle, final position).
‘‘What is the first sound in cat?’’
Sequencing. Being able to say phonemes in the order in which they appear
in a word. There are three teaching methods:
this view, when we read, we rapidly translate letters into phonemes and
blend them into the word. When we spell, we say the word, segment the
sounds, translate each sound into a letter or letters, and write it down.
This is not what happens.
Children see an unfamiliar word: sting.
To read the word, they sound out each phoneme (segmenting): /s/ /t/
/i/ /ng/. Then they blend the sounds into the word and check the out-
come. It is quite common for beginning readers and poor readers to seg-
ment correctly and blend incorrectly: /s/ /t/ /i/ /ng/—sing.
To spell the word sting the children say the word, hear each segment
in sequence, and blend the segments into the word as they write. Seg-
menting and blending are intimately connected in both reading and
spelling. Even when reading and spelling are efficient, and it seems like
processing is instantaneous, it is not. The same sound-by-sound analysis
continues, only at a phenomenal speed. This is what is meant by automa-
ticity. Our brains operate in the realm of milliseconds, while conscious re-
flection operates in the realm of seconds.
We become aware of the interplay of these two processes when we see
a word we cannot read or start to write a word we cannot spell. My spell-
ing occasionally falters when I write technical words on the board, an in-
teresting phenomenon caused by the 90 degree rotation from the normal
plane and by the wider visual angle. To solve this I have to consciously
slow down, segment each sound, then deliberately blend the sounds vo-
cally as I write.
As I review the individual studies in the next section, the litmus test
for evaluating the relevance of a phoneme-awareness training program
will be the evidence from the Jolly Phonics research that 4-year-old chil-
dren can learn the 40 sounds of English and their basic-code spellings in
about 11 weeks of whole-class teaching. At the end of the school year,
these children were one year advanced on standardized reading and spell-
ing tests compared to the control groups, and compared to national
norms. The average effect sizes for the JP studies were 1.60 for phoneme
segmenting, 1.0 for reading, and 1.42 for spelling. Follow-on data showed
that an effect size of 1.0 for reading held up for 3 years, the longest time
measured. To make the case for a separate phoneme-awareness training
regime, it would have to produce results better than this.
Phonological-awareness programs differ in a number of ways, including
the size and the type of phonological units taught, how much of the focus
is on larger versus smaller units, how many phonemes are taught, and how
long it takes to teach them. Many programs in my selected data pool teach
larger units of sound as well as phonemes. What works best?
were added. Now there was a strong impact on decoding words and non-
However, scoring was well above chance for the 264 older children.
With age, IQ, and memory controlled, rhyme scores (middle and final
sounds) did not predict performance on standardized reading and spelling
tests. Alliteration (first sound) was a significant predictor, accounting for 5
to 8 percent of the variance. But there was a problem. Alliteration pre-
dicted math test scores even better (10 percent of the variance)! Whatever
the alliteration test is measuring, it is not specific to reading. In short,
nothing was found in this study. The ability to detect a contrasting pho-
neme (the ‘‘odd one’’) in initial, middle, and final position in three-sound
words does not predict reading and spelling test scores.
Problems with the Sound Categorization test were reported by
Schatschneider et al. (1999), who were strongly critical of tests where
guessing plays a major role. In their normative study on 945 children in
kindergarten through second grade, they investigated the overlap (redun-
dancy) of seven phoneme-awareness tasks, one of which was the Sound
Categorization task. Six phoneme tasks were highly correlated with each
other. The Sound Categorization was an outlier in this analysis. Nor did it
‘‘load’’ on a general factor of phoneme awareness at high levels like the
other tasks. It was also extremely unreliable, producing a wide range of
scores (high standard deviations). This led the authors to conclude that
‘‘an inspection of the difficulty parameters for the Sound Categorization
subtest revealed that item difficulty is highly dependent on where the target
word is placed in the string of words. This dependence, coupled with the
low discrimination parameters associated with these items, indicates that
this subtest is a relatively poor indicator of phonological awareness’’
(Schatschneider et al. 1999, 448). Despite Bradley and Bryant’s disap-
pointing results, they did not abandon their hypothesis. Instead, the data
were interpreted to fit the hypothesis, both in the conclusions to this study
and in the introduction to their training study, which opens with this line:
‘‘Children who are backward in reading are strikingly insensitive to rhyme
and alliteration’’ (Bradley and Bryant 1983, 419).
The training study is presented in the next section, because it primarily
demonstrates the benefit of using letters during phonological training.
Overall, these studies show that teaching larger phonological units
has little impact on reading and spelling skill. There is no evidence,
either from correlational studies or from training studies, that children
need to be eased into phonemes from larger units of sound. Large-scale
and others (Share et al. 1984; Yopp 1988; McGuinness, McGuinness, and
Donohue 1995) show that phoneme-analysis skills are strongly correlated
with reading and spelling, while syllable segmenting and rhyming are not.
Williams found that children in the control group could segment and
manipulate syllables just as easily as the children who had spent weeks
practicing this.
ing or segmenting).
Group I worked exclusively in the auditory mode. Group II did the
auditory tasks and worked with plastic letters, as described below:
In the second half of the training sessions Group II was taught with the help
of plastic alphabet letters as well. Whenever a new sound category was intro-
duced, it was demonstrated first with the help of the picture cards in the nor-
mal way. But the child then made each word in the set with plastic letters. (All
of the children had by this time attended school for at least two years, and
they were quite familiar with the alphabet.) (Bradley and Bryant 1985, 88)
Group III (control) had an equal number of lessons with the same picture
cards, but sorted them into semantic categories, like animals, furniture,
and so forth. A final control group (Group IV) had no special training.
If these English children had been in school for ‘‘at least two years’’
and ‘‘were quite familiar with the alphabet,’’ they would be reading at
some level. The groups should have been matched for reading skill, and
they were not.
The training was not how it was portrayed. Children did not ‘‘categorize’’
sounds; they did phoneme-identity tasks. Nor were children
trained in phonological awareness prior to learning to read. We have no
idea what the children were taught in the classroom. This study cannot
test the authors’ central hypothesis that training in phonological aware-
ness ‘‘causes’’ reading, because reading was well underway, and children’s
reading skill was not controlled.
Furthermore, plastic letters were not simply added to the mix as sup-
port for phoneme identification. They were used primarily for spelling
dictation. After the children had identified each phoneme in a word set,
they were asked to spell the words with plastic letters, which they selected
from a box. The authors commented that many children learned to ‘‘with-
hold certain letters’’ (not put them back in the box) when the word sets
shared phonemes in common: sand, band, land.
After 2 years of training (and 4 years of classroom reading instruc-
tion), the children were tested on standardized reading and spelling tests.
Group II (phoneme identification plus spelling with plastic letters) was the
only group to score in the normal age range (8 years). They were significantly
of all groups in spelling, and had a 2-year advantage over the untreated
control group. Group I (phoneme awareness only) was superior to the
untreated control group (6 months ahead in reading, and 1 year ahead in
spelling) but not to the semantic-categorizing group. The two control groups
did not differ from each other.
The NRP used a special formula to compute effect sizes for this study
because standard deviations were not provided. These effect sizes were
unusually high, especially for Group II (phoneme + letters): ES = 1.17
for reading compared to the semantic-categorizing group, and ES = 1.53
compared to the no-treatment controls. Effect sizes for spelling were even
higher: ES = 1.59 and 2.18. It strains credulity that 10 hours of lessons
spread over 2 years would generate such large effects.
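To see why missing standard deviations matter, recall that a conventional effect size is the difference between two group means divided by the pooled standard deviation. The sketch below shows only that textbook calculation. It is not the NRP's special formula, which is not described here, and the function name and the scores are hypothetical.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical scores (not data from Bradley and Bryant): two groups of 13 children
d = cohens_d(mean_a=100.0, sd_a=10.0, n_a=13, mean_b=90.0, sd_b=10.0, n_b=13)
print(round(d, 2))  # 1.0
```

Without the two standard deviations the denominator cannot be computed directly, so any effect size has to be estimated by some indirect route.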
A number of factors challenge the validity of these results and/or the
NRP’s calculations. First, there was the failure to control for initial reading
skill. Second, there was the small sample size (13 children per group).
Third, the effect sizes computed by the NRP were far too large. Even in
the comparison between Group I (no letters) and the no-treatment con-
trol group, ES values were .86 and 1.0. Recall that Group I was trained
with pictures, saw no printed words or letters, identified phonemes in the
auditory mode only, did no phoneme sequencing of any type, and never
read or spelled anything. In the NRP’s meta-analysis (see the previous
chapter), the ES values for this type of training were .38 for reading and
.34 for spelling compared to no-treatment control groups. Thus the ma-
jority of studies are distinctly at odds with this result.
Fourth, interpretation of the results is problematic, even if they were
accurate. The main effect appears to be a spelling effect, not a phoneme-
awareness-plus-letters effect. If phoneme-identification training was really
effective, then the phoneme-only group would have been superior to the
semantic-categorizing group, but it was not. Nowhere in the description
of these exercises did children ever read any words. Even for Group II, the
only reading was indirect, a consequence of spelling with plastic letters. If
these data are valid, this study has demonstrated the impact of spelling
practice on reading skill, supporting findings by Ehri and Wilce (1987)
and Uhry and Shepherd (1993).
Unless this study is replicated and these issues addressed, Bradley and
Bryant’s results cannot be relied on. So far, this has not occurred.
nemes in conjunction with letters is better than learning with none. But
there are two ways to train phoneme awareness without letters. One is to
do this in the auditory mode alone, and the other is to use unmarked
tokens or counters to represent each sound. The token method does seem
to have some success.
Cunningham (1990) divided kindergartners and first graders into
three groups. She provided a program similar to Williams’s program,
with a range of activities to teach phoneme analysis, segmenting, and
blending. Tokens, but no letters, were used. Children were taught in
groups for 10–15 minutes twice a week, for 10 weeks (5 hours). One group
was taught with a matter-of-fact ‘‘skill-and-drill’’ approach, and a second
group with a more cognitive ‘‘metalevel’’ approach, in which explanations
and goals were provided. A third group got no special training.
The results were puzzling. In kindergarten, the ‘‘meta’’ and the
‘‘skills’’ groups scored much higher than the controls on three phoneme-
awareness tasks (ES values all above 1.0), and did better on the Metropolitan
Reading Test as well (ES = .57 and .43). The ‘‘meta’’ and ‘‘skills’’
groups were very much alike. They did not differ in reading, or on the
Lindamood Auditory Conceptualization (LAC) test of phoneme discrimi-
nation/manipulation, or on the Sound Categorization test, although the
‘‘meta’’ group was superior on the difficult phoneme-deletion test
(ES = .81).
The first-grade results were different. Although the ‘‘skills’’ group far
outshone the controls on all the phoneme-awareness tasks (ES range =
.76 to 1.46), this apparently had no impact on reading (ES = .09). The
‘‘meta’’ group did even better compared to controls on the phoneme tasks
(ES range = .83 to 2.08), and this did impact reading (ES = .53).
Something external to the study may be responsible for the strange
result, in which a high degree of expertise in phoneme awareness confers
no benefit for reading in one group of children but does in another.
Perhaps reading test scores were nonnormally distributed for these two
groups, or something was going on in the classroom that affected these
results. Otherwise, one would have to argue, despite the vast amount of
evidence to the contrary, that phoneme awareness taught in first grade has
no effect on reading unless you receive a ‘‘metalevel’’ explanation for why
you are learning it! This seemed to be Cunningham’s argument. She
years old and nonreaders. They were divided into three groups and given
20-minute lessons four times a week for 7 weeks (about 9 hours). The first
group did phonological exercises, learned letter names and sounds for nine
letters (a, m, t, i, s, r, f, u, and b), practiced segmenting and blending, and
did ‘‘word assembly’’ with tokens and letter tiles. Children in this group
never used more than two letter tiles (plus blank tiles) in any one session,
and so never saw the words spelled in full. The second group focused
on general language activities (vocabulary, categorizing), but also learned
the nine letter names and sounds. There was no phonological training
for this group. The third group participated in their ordinary classroom
lessons.
An in-house reading test was designed with words composed of the
letters that were taught. Recall that none of the groups had seen these
words, though group 1 had seen portions of them. Group 1 read 10.9
words correctly, group 2, 3.9 words, and the control group, 2.2 words.
The effect sizes comparing group 1 and the other groups were substantial
(ES = .71 and .98). However, this did not transfer to a standardized
reading test. This was reported as a ‘‘significant difference,’’ yet there were
large floor effects on the reading test, which puts statistical analysis off
limits.
The definitive study on this issue was carried out in Germany by
Schneider and his colleagues (2000). It is notable for the extensive and
informative test batteries that were employed. These provide a clear pic-
ture of which phoneme skills are easy or hard to teach (a direct test of the
developmental theory). And, by adding an extra experimental group, they
were able to pin down the value of teaching phoneme awareness alone,
letter-sound knowledge alone, or a combination of the two. This was a
complex study, and before I get into the details, I need to alert the reader
to certain factors that influence how these findings can be interpreted.
First, in Germany, where kindergarten originated, it was intended to
be precisely that: a ‘‘children’s garden,’’ where children play and interact
socially. From a research standpoint, this is useful, because children in the
control group are really taught nothing about how to read, and parents are
discouraged from teaching reading at home. Second, parents and teachers
in Germany have strong feelings about the value of this practice. The
authors reported that many parents and teachers were opposed to intro-
ducing any type of training at the kindergarten level, and obtaining
was no writing in this component. The combination group that had both
types of training learned fewer letters and spent less time on the phono-
logical tasks. This group had 20 weeks of training.
Special tests were designed to measure phonological-awareness skills,
along with verbal short-term memory, naming speed for colors and pic-
tures of objects, and measures of early literacy such as letter knowledge
and word recognition. These tests were given prior to the start of training
and after training was completed. Pretraining scores on the phonological
tasks were poor for everyone, and when children had to choose among
alternatives, performance was no better than chance. Test scores were
reliable after training was completed in July.
It is informative to look at the order of difficulty of these tasks. This
order was nearly identical for all groups regardless of the type of training.
From easy to hard, the order was: (1) Identify (say) an initial phoneme in a
word. (2) Blend isolated phonemes into a word. (3) Segment phonemes in
a word. (4) Identify the odd one out in a set of rhyming words that varied
in the final phoneme (part of the rhyme). (5) Delete phonemes. Say the
word that remains after the initial phoneme is deleted. (6) Carry out the
‘‘alliteration’’ portion of the odd-one-out task. Children did not score
above chance on this test.
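Whether a score is ‘‘above chance’’ on a forced-choice task depends on the number of alternatives and the number of items. The sketch below works through the arithmetic for guessing alone. The item count and the number of alternatives are assumptions chosen for illustration, not the values used by Schneider et al., and the function name is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more correct out of n items when each guess succeeds with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumed for illustration: 10 items, 4 alternatives per item (chance = .25)
n_items, chance = 10, 0.25
expected_by_guessing = n_items * chance            # items correct from guessing alone
p_five_or_more = prob_at_least(5, n_items, chance)  # how often pure guessing yields 5 or more
print(expected_by_guessing, round(p_five_or_more, 3))
```

Under these assumptions a child who guesses on every item still gets a quarter of the items right on average, which is why raw scores on such tests say little until they are compared with the guessing baseline.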
These results support Chaney’s findings reported earlier (Chaney
1992). In her study, blending phonemes was the second-easiest phono-
logical task for 3-year-olds, and the alliteration tasks were the most diffi-
cult out of 22 tasks. The results also support Schatschneider et al.’s (1999)
finding that the Sound Categorization task is one of the most difficult
among a variety of phoneme-awareness tests. As seen above, even the
phoneme-deletion test, the most difficult of the phoneme tests, was easier
than the alliteration task.
Schneider et al. found a large impact of training on phoneme-
awareness skill for both the PA group and the PA+LS group. Both groups
did significantly better than the normal control group, with the PA group
having a decided advantage. However, none of the groups had much suc-
cess reading simple words.
In the fall of the same year, children entered first grade and were
taught to read in the usual way. The authors do not describe the reading
Table 6.2
Effect sizes from Schneider, Roth, and Ennemoser 2000: kindergartners at risk vs.
normal controls, three types of training
                         Phoneme     Phoneme    Initial
Kindergarten             synthesis   analysis   phoneme
Pretest
  PA+LS vs. controls       .45         1.0        1.12
Posttest
  PA+LS vs. controls       .34         .61        .90
  PA only vs. controls     .55         .96        1.18
                         Reading
                         speed       Spelling   Comprehension
End grade 1
  PA+LS vs. PA only        .17         .27
  PA+LS vs. LS only        .22         .49
  PA+LS vs. controls       .38         .17
End grade 2
  PA+LS vs. PA only        .31         .30        .19
  PA+LS vs. LS only        .48         .48        .20
  PA+LS vs. controls       .30         .32        .50
Note: Tests used: German standardized tests on reading fluency, spelling, and
comprehension. PA = phoneme-awareness training. LS = letter-sound training.
spelling, except that the PA+LS group was also statistically superior to the
LS group, especially at the second testing. In other words, the PA+LS
combined training had largely eliminated the difference between these at-risk
children and the normal children. Training in phoneme awareness alone, or
learning 12 letter sounds alone, did not erase this gap.
I calculated the ES values for the most successful group (PA+LS)
versus all other groups. We can see that this group is ahead of the PA
group, and especially the LS group, in reading, spelling, and comprehen-
sion. Although the ES values are consistently negative in the comparisons
to the normal control group, most values are not statistically significant. It
would be interesting to see what happens in the future. Differences in IQ
may come to matter more over time, especially because the training did
not eliminate the difference between the normal children and the at-risk
(low-IQ) children on the comprehension test.
This study showed that phoneme-awareness exercises alone, or
sound-symbol associations alone, have little impact on learning to read. At
least this is the case in countries with a transparent alphabet, where
teaching is appropriate. But we are left wondering what would have hap-
pened if all the phonemes had been taught (and their letters) and not just
12 of them, and if the letter-sound group had had the same 20 weeks of
training that the other two groups received.
These at-risk children, apart from having lower IQs, started off at
a distinct disadvantage compared to the normal children on several of
the readiness measures. They knew half as many letter-sound corre-
spondences as the normal children, and had lower phoneme-identity and
phoneme-segmenting scores. These things have to be taught. It may be
the case that the ‘‘normal’’ children entering kindergarten had more of a
boost at home than their parents let on. (Wimmer reported that, in Aus-
tria, some first graders know all the letter-sound correspondences and
some know none, even though parents are told to teach nothing.)
Although programs combining phoneme and letter training always
produce a greater impact on reading test scores than either alone, if we ask
whether any of these combination programs does a better job than a good
linguistic-phonics program, the answer is still no. The only exception here
is the study above, which measured fluency and comprehension for chil-
dren learning a transparent alphabet.
edge, both groups doing well (24 correct). Likewise, both groups did well
on the Woodcock Word ID subtest, and phoneme-identity training played
no role (ES = .21), nor did the training affect an in-house spelling test
(ES = .09). However, the trained group did outshine the controls in
decoding on an in-house nonword test (ES = 1.2). This result suggests
that the training had oriented the children toward the alphabet principle.
At this point in the data analysis, the authors chose to re-sort the
children into two new groups (regardless of the prior group assignment)
based on whether they passed or failed the phoneme-identity test at the
end of kindergarten. (The pass/fail cutoff was 66 percent correct.) The
new groups were compared statistically on various reading and spelling
tests. When groups no longer reflect who did or did not get trained, there
is no way to measure the outcome of training! Furthermore, this brings
uncontrolled factors into play, raising the following question: Why would
some children trained on ‘‘phoneme identity’’ fail a phoneme-identity test,
and some of the untrained children pass? IQ and home environment be-
come important, and neither variable was controlled. The new results
were highly significant, which was not the case previously.
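The logic of this objection can be illustrated with simulated data. In the sketch below, training is built to raise phoneme-identity scores but to have no effect at all on reading, while an uncontrolled ‘‘ability’’ factor influences both. All numbers are invented for illustration, the median cutoff is an assumption (the study used 66 percent correct), and nothing here reproduces the actual data.

```python
import random
import statistics

def cohens_d(a, b):
    """Standardized mean difference with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

random.seed(0)
children = []
for i in range(2000):
    trained = i < 1000
    # Stand-in for uncontrolled factors such as IQ or home environment
    ability = random.gauss(0, 1)
    # By construction, training helps the phoneme-identity test only
    phoneme_identity = ability + (0.5 if trained else 0.0) + random.gauss(0, 1)
    reading = ability + random.gauss(0, 1)
    children.append((trained, phoneme_identity, reading))

# Comparison 1: reading scores by original assignment (trained vs. untrained)
by_training = cohens_d([c[2] for c in children if c[0]],
                       [c[2] for c in children if not c[0]])

# Comparison 2: reading scores after re-sorting children by pass/fail on the phoneme test
cutoff = statistics.median([c[1] for c in children])
by_pass_fail = cohens_d([c[2] for c in children if c[1] >= cutoff],
                        [c[2] for c in children if c[1] < cutoff])

print(round(by_training, 2), round(by_pass_fail, 2))
```

In this simulation the first comparison stays near zero, as it must, while the second produces a sizable ‘‘effect’’ that reflects only the shared ability factor and not the training.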
The reassignment of children to groups based on their ability explains
the extremely high effect sizes reported by the NRP for this group of
studies. The NRP averaged the results together with those of the original
groups, enormously inflating effect sizes for this study as a whole. (ES =
1.61 for reading and 3.14 for phoneme awareness, by far the largest values
we have seen.) This grossly exaggerates the impact of this training program,
as well as the meta-analysis values as a whole. We know these values
are false from a table of first-order correlations from the 1993 publication,
which showed that the strongest contributors to reading and spelling test
scores were blending skills and letter-sound knowledge, not phoneme-
identity skills.
In subsequent follow-on reports on these same children, similar re-
sults were found at the end of grades 1 and 2, and at grade 5 (Byrne and
Fielding-Barnsley 1995, 2000). In-house tests were used to measure read-
ing and spelling in the early grades. The two groups did not differ on
these tests, with the exception of a small but significant effect for nonword
decoding. At the fifth-grade level, most tests were standardized tests. The
trained group, now 11 years old, outscored the control group on the
Woodcock Word Attack test (p < .04), though the effect size was small
phoneme-awareness training regime has a special benefit over and above a
good linguistic-phonics program. The really effective reading programs
reviewed in the previous chapter have far more impact than anything we
have seen here. Surprisingly, the strongest support for a positive impact of
special phoneme-awareness training comes from the European studies
(Schneider and colleagues; Lie) in countries with transparent alphabets,
and where traditional classroom instruction is likely to be similar to lin-
guistic phonics. However, neither Schneider and associates nor Lie reveal
what specifically was being taught in the classroom.
for the special kindergarten program (ES = 1.08). However, this knowl-
1. The same spelling for the same sound can occur in more than one of these
syllable types. The spelling o for the sound /oe/ occurs in both ‘‘open’’ and
‘‘closed’’ syllables: go, most. Ditto ee as in fee and feed; ie as in die and died; ue as
in cue and fuel; oo as in too and food. The same word fits more than one syllable
type: goat (‘‘vowel team’’ and ‘‘closed’’), care (‘‘final e,’’ ‘‘vowel + r’’), soar
is a sight-word, plus basal-reader, plus disconnected-phonics curriculum.
Teachers also used parts of the DISTAR program. Children read
trade books from classroom libraries. This is a typical eclectic mix.
Both groups of children were taught in a whole-class format. And
both groups received whole-class spelling instruction using the Scott-
Foresman spelling program, which is vaguely phonics oriented (‘‘regular’’
CVC words first, followed by the ‘‘long-vowel’’ words, blends, and
digraphs, gradually increasing in length and complexity).
The teachers in the experimental program received 13 hours of train-
ing. No special training was given to the teachers of the control group.
The comparison between the two programs showed that the experi-
mental group had much higher phoneme-segmenting and letter-sound
knowledge scores (ES values = 1.0), as would be expected. However, they
were only marginally better on standardized reading and spelling tests
(ES = .35 for word recognition and .38 for spelling). The children were
followed up at the end of second grade, and standardized reading and
spelling tests were given again. Effect sizes for reading improved slightly, in
favor of the experimental group: ES = .44 for word-recognition tests and
.46 for word attack. However, the ES value for spelling was zero. These
values are not particularly remarkable for 2 years of special teaching.
This study had a major design flaw if one of the goals was to evaluate
the impact of the kindergarten phoneme-awareness training. It is missing
a critical control group, and possibly two. This would be a group of chil-
dren who did not receive the kindergarten phoneme-awareness program
but did receive the first-grade phonics program. To be absolutely certain
that the kindergarten program was relevant, a fourth control group is also
needed. This group would receive the kindergarten program but be
switched to Scott-Foresman at first grade. Instead, the control group got
neither the kindergarten program nor the new reading program. In view
of this, one would expect to find a much more substantial advantage for
the experimental group, especially if both the kindergarten program and
(‘‘vowel team,’’ ‘‘vowel + r’’). Furthermore, this classification scheme does not
fit multisyllable words at all, which make up about 80 percent of the words in
English.
the first-grade program were effective. Instead, effect sizes were modest,
group (phonological awareness + reading + linking activities) got a Mulligan
stew of reading methods: a variant of whole language in Reading
Recovery lessons, some type of phonics in the classroom, multisound-unit
phonological tasks, a variety of phoneme-analysis tasks, plus exercises to
‘‘link’’ phonological knowledge and the alphabet code.
Perhaps this mix was too confusing, because neither the combined
group (predicted to be superior to all groups) nor any of the other groups
did particularly well. No differences between the groups were found on an
in-house word-recognition test or on the British Ability Scales, a standard-
ized test of word recognition. There was a small but consistent advantage
for the combined PA + read group on the remaining standardized tests, as
shown in table 6.3. The other groups did not differ from one another.
Table 6.3
Effect sizes: Data from Hatcher, Hulme, and Ellis 1994
                                Neale      Neale           Schonell
                                Accuracy   Comprehension   Spelling
Testing age 8:1
  Read + PA vs. read only         .45        .41             .31
  Read + PA vs. phonemes only     .40        .52             .14
  Read + PA vs. controls          .52        .61             .33
The comparisons between the combined group and the other three
groups produced similar effect sizes (see table 6.3), with ES values in a
modest .30–.45 range. This is not much to show for 20 hours of one-on-one
help. In other words, neither Reading Recovery nor phonological-awareness
training on its own provided any greater benefit to these poor readers
than what was taught in the phonics classroom, which was not much.
More revealing were the combined PA + read group’s age-equivalent
scores on the standardized tests at beginning and end of training, and at
follow-up 9 months later. These scores are shown at the bottom of table
6.3. The children began the study scoring 1.5 years below age norms on
all tests. The table lists the years:months these children lag behind age
norms for reading and spelling at two testing times (ages 8:1 and 8:8
years). The discrepancies between these scores and national age norms
actually increased over time.
For reasons unknown—the mix of methods, the training programs
themselves, poor teacher training or monitoring—these programs were
not effective in getting the children caught up. Other intervention pro-
grams work much better, and in a much shorter space of time (see D.
McGuinness 1997c, 1998b; C. McGuinness, D. McGuinness, and G.
McGuinness 1996). The rule of thumb for successful remediation is to
remain faithful to the prototype, and to avoid teaching skills that have
nothing to do with an alphabetic writing system.
Finally, Brennan and Ireson (1997) carried out a study that, despite
methodological problems, is the most direct test of whether phonological-
awareness training provides any particular advantage over a good phonics
program. They adapted a Lundberg-type program for kindergartners (age
range 4:10 to 6:1 years) attending an American school in England. The
children spent 3 months on phonological units above the level of the
phoneme. These included listening to nonverbal sounds, playing rhyming
games, clapping out syllable beats, and using markers to represent syllables
in multisyllable words. In the middle of the third month, children began
learning phonemes in initial position, and subsequently in all positions
over the course of the school year. Lessons were mainly in the auditory
mode only (no letters). However, there was a major confound in this
study. The phonological program took up 15–20 minutes of a 2-hour
language-arts period, during which children learned letter names and
sounds, copied letters, and wrote ‘‘stories’’ using invented spelling.
groups. One group was using a published phonics program (Success in
Kindergarten) designed to integrate learning sounds and letters. Copying
and writing words was a particular emphasis. The third group was taught
‘‘Letterland’’ characters (letters drawn to look like animals) to teach letter
sounds. This is a visually driven approach that focuses on letter shapes and
‘‘the sounds the letters make.’’ Lessons include hands-on activities to do
with memorizing letter forms, such as tracing letters in the sand tray and
making letters out of play dough. Children recite rhymes related to the
target letter.
Both the phonological training group and the Success in Kinder-
garten group were considerably advanced on standardized reading and
spelling tests compared to the Letterland group at the end of the year.
However, the phonological-awareness group was not superior to the Suc-
cess in Kindergarten group. When I compared the two programs, effect
sizes were in favor of Success in Kindergarten: ES = .38 for Schonell
reading, .56 for ‘‘high-frequency words,’’ and .23 for Schonell spelling.
(These values are different from the NRP’s effect sizes.)
Brennan and Ireson do not report which phonemes were taught for
any group, or how they were taught, and there is the fact that about 90
minutes of other reading activities were ongoing in the classrooms. In
contrast to the situation in most other European countries, children in the
United Kingdom are taught to read in kindergarten. The only clear result
was the extremely poor showing of the Letterland program. There is no
way to tell whether the differences between the other two programs had
anything to do with specific characteristics of these programs.
Conclusions
As a general observation, one of the most consistent findings to
emerge from these studies is that phoneme-identification and phoneme-
sequencing (segmenting/blending) training are the only phoneme-analysis
skills that consistently impact reading test scores. This confirms the evi-
dence from the correlational research. Helfgott (1976) was among the first
to discover that segmenting skill for CVC words was the highest correlate to
reading 1 year later (r = .72) among a variety of phoneme-awareness skills.
The definitive study was that of Yopp (1988), who measured
kindergartners’ performance on 11 phonological-awareness tests. She
investigated the statistical overlap between these tests and correlated each
test score with the time to learn to read novel words. Yopp’s findings
mirror the training studies. The highest correlates of learning rate were
‘‘sound isolation’’ (phoneme identification) (r = .72), and phoneme-
sequencing tasks (blending and segmenting) (average r = .67). Auditory
discrimination (r = .27) and rhyming skills (r = .47) did not predict
learning rate. Phoneme-deletion tests were too difficult for this age group.
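One way to compare these correlations is to square them, which gives the proportion of variance two measures share. The short sketch below applies that standard conversion to the correlations quoted above for Yopp (1988). The dictionary labels and the function name are mine, introduced only for illustration.

```python
# Correlations with learning rate, as quoted in the text for Yopp (1988)
correlations = {
    "sound isolation (phoneme identification)": 0.72,
    "blending and segmenting (average)": 0.67,
    "rhyming": 0.47,
    "auditory discrimination": 0.27,
}

def variance_explained(r):
    """Proportion of variance shared by two measures: the squared correlation."""
    return r ** 2

for task, r in correlations.items():
    print(f"{task}: r = {r:.2f}, shared variance = {variance_explained(r):.0%}")
```

On this conversion, a correlation of .72 corresponds to roughly half the variance in learning rate, while .27 corresponds to well under a tenth.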
For those knowledgeable about factor analysis, phoneme-
identification and phoneme-sequencing tasks loaded on the same factor
(factor I) at values ranging from .76 to .89. Auditory-discrimination and
rhyming skill loaded on none, and phoneme-deletion tests loaded on a
separate factor (the Rosner test loaded on factor II at .94).
It should be noted that phoneme identification and sequencing are
precisely the skills that are trained in a linguistic-phonics program as a
matter of course.
Despite the overall agreement between these studies, most studies
reviewed in this chapter have a number of design flaws that make it diffi-
cult to know exactly how or whether a phonological training program
impacts reading and spelling. There is little attempt to discern which of
the many phonological tasks are truly necessary. Often there is no de-
scription of which phonemes and letters (if used) are taught. And when
this information is provided, too few sound-symbol correspondences are
included in the program, and far too much time spent teaching them. On
the whole, the majority of programs grossly underestimate what 5-year-
old children (or even younger children) can learn. One is struck by the
fact that the balance is exactly backward between the number of phonemes
taught (and their spellings), and the time spent on larger phonological
units. The unwritten assumption seems to be that teaching phonemes and
letters is hard for young children, but teaching lots of ‘‘unnatural’’ and
conflicting phonological tasks is not. This appears to be one of the many
legacies of the phonological-development myth.
There is no convincing evidence that special programs for teaching
phoneme awareness and letter-sound correspondences provide any additional
benefit over using a good linguistic-phonics program at the outset.
This was not the conclusion of the NRP, which seemed to take it for
granted that phoneme-awareness skills are so hard to learn that separate
Phoneme-Awareness Training
introduction to the report that Texas and California have prescribed the
inclusion of phoneme-awareness training as part of early reading instruc-
tion in the context of praising these two states for their forward-looking
policies. (We have already seen what happened the last time California
mandated a curriculum.)
If special phoneme-awareness training is essential, precisely which
of the many training programs reviewed here should teachers adopt?
And what is the evidence that any of these programs confers a benefit
beyond what a good linguistic-phonics program confers? So far there is
none.
The summary to the NRP report on phoneme awareness (‘‘Implica-
tions for Reading Instruction’’) shows that the authors have not considered
the overlap between phoneme-awareness training and a linguistic-phonics
program. For example, in a question-and-answer section, the NRP ad-
dressed the question of whether phoneme-awareness training helps chil-
dren learn to read and spell. The panel stated that teaching children to
manipulate phonemes ‘‘transfers and helps them learn to read and spell.
PA training benefits not only word reading but also reading comprehen-
sion. PA training contributes to children’s ability to read and spell for
months, if not years, after the training has ended’’ (p. 2-40).
As to which phoneme-awareness method has the greatest impact on
learning to read, they wrote: ‘‘Teaching students to segment and blend
benefits reading more than a multiskilled approach. Teaching students to
manipulate phonemes with letters yields larger effects than teaching stu-
dents without letters. . . . Teaching children to blend the phonemes repre-
sented by letters is the equivalent of decoding instruction’’ (p. 2-41). No,
it is identical to decoding (reading) instruction.
As for spelling: ‘‘Teaching children to segment phonemes in words
and represent them with letters is the equivalent of invented spelling in-
struction’’ (p. 2-41). No, it is identical to proper spelling instruction.
Spelling should never be ‘‘invented.’’
The authors of the NRP report appeared to be satisfied with the
quality of the studies and the validity of the large effect sizes reported in
their analysis—effect sizes that, as we have seen, are grossly inflated due to
invalid in-house tests and other methodological anomalies. Nowhere is
this more evident than in a statement that misrepresents the fact that
effect sizes based on standardized tests are marginal (i.e., ES = .33 for
have been carried out on English-speaking children who read slowly and
Chapter 7
Reading Fluency
measured by the number of words uttered per unit of time in normal
conversations. The natural speaking rate for English is about 250 to 300
words per minute (wpm).
Because this rapid speaking rate makes conversation possible, one
would imagine that for readers to process meaning at the brain’s preferred
rate (to comprehend what they read), they should be able to decode at the
same rate as people speak. What might this optimal reading rate be? In his
review of eye-movement research, Rayner (1998) described a study with
this goal. College students were identified by their excellent performance
on a reading-comprehension test. They were asked to read passages at
their optimum reading rate. The fastest reader was clocked at 380 wpm
and the slowest at 230 wpm. The average for this group of expert readers
was 308 wpm. This is about the same as the optimal speaking rate, and
even this takes years to accomplish. The normal second-grade reader
reads about 90 wpm. This jumps to 150 wpm at fourth grade. By sixth
grade, the average reader (200 wpm) is closing in on the rate of the
slowest college reader.
The variability across age, and the individual differences in reading
speed, mean that defining a ‘‘slow reader’’ is not going to be easy. The
problem is compounded when English children are compared to children
who learn a transparent alphabet. As noted earlier, Wimmer, in collabo-
ration with English colleagues (Wimmer and Goswami 1994; Landerl,
Wimmer, and Frith 1997), compared normal children from Salzburg and
London. The Austrian 7-year-olds with 1 year of school read as fast as
English 9-year-olds with 4 years of school, making half the number of
errors, an eightfold increase in efficiency. When the worst readers in
Salzburg (very slow) were pitted against the worst readers in London (very
inaccurate), the Salzburg children read the same material twice as fast while
misreading only 7 percent of the words. The English children not only
read more slowly, but misread 40 percent of the words. ‘‘Slowness,’’ it
seems, is a function of the writing system, not a property of the child.
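The ‘‘eightfold’’ figure is not derived in the text. One plausible reading, offered here as an assumption rather than as the author's stated calculation, is that it multiplies the fourfold difference in years of schooling by the twofold difference in errors. A minimal Python sketch of that arithmetic follows; the function and parameter names are illustrative only.

```python
# Assumption: "efficiency" combines the ratio of years of schooling with the
# ratio of error rates, for two groups reading at the same speed.
def efficiency_ratio(years_a, years_b, errors_a_relative_to_b):
    """Return how many times more efficient group A is than group B."""
    schooling_factor = years_b / years_a          # 4 years vs. 1 year -> 4.0
    accuracy_factor = 1 / errors_a_relative_to_b  # half the errors -> 2.0
    return schooling_factor * accuracy_factor


# Austrian 7-year-olds (1 year of school, half the errors) compared with
# English 9-year-olds (4 years of school).
austrian_vs_english = efficiency_ratio(
    years_a=1, years_b=4, errors_a_relative_to_b=0.5
)
print(austrian_vs_english)  # 8.0
```

Under this reading, either factor alone would understate the gap: the schooling difference gives a factor of four and the error difference a factor of two.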
If ‘‘slow reading’’ is relative, tied to a particular writing system and
method of instruction rather than to age or innate ability, there are no
guidelines for determining an optimum reading rate other than anecdotal
reports of the teachers or complaints of individual children. To complicate
matters, the content of the reading material also determines reading rate.
Difficult material is read more slowly, with many more regressive fixa-
words that begin with those particular letters, have that shape, fit that slot,
and relate to the meaning of the story. Human brains are especially good
at associative pattern matching, and superb at anticipating meaning, a
phenomenon first reported by William James. Today, this is known as
analysis by synthesis or top-down processing. Listeners continually anticipate
words that are coming up next in a speaker’s utterance. This is the rea-
son that puns, other forms of word play, and sudden or odd shifts of
context are surprising and amusing, because certain words mismatch our
expectations.
Of course, people usually do not do this consciously. No one reading
this book is aware that they see print they are not looking at. Nor can we
monitor the fact that our brain is busily putting this ‘‘nonseeing’’ to good
use. How then can we get slow readers to make use of this peripheral
information to speed reading along? One way would be to train this fac-
ulty directly, by setting up a series of exercises for slow readers to practice
making peripheral glances or to fixate straight ahead while trying to iden-
tify blurred shapes on the right. However, it turns out that there is a much
simpler solution, one that produces the same results without the need to
make an unconscious process conscious.
the impact of ‘‘sustained silent reading’’ or a similar approach. There was
no evidence to support the idea that having children read for a fixed
period of time, inside or outside the classroom, made any difference to
vocabulary, reading comprehension, reading attitude, word recognition, or
performance on general achievement tests and standardized tests. The
NRP authors concluded as follows: ‘‘None of these studies attempted to
measure the effect of increased reading on fluency. Instead, most of these
studies considered the impact of encouraging more reading on overall
reading achievement as measured by standardized and informal tests. It
would be difficult to interpret this collection of studies as representing
clear evidence that encouraging students to read more actually improves
reading achievement’’ (p. 3-26).
They went on to stress that the poor quality of this research provides
no definitive proof, one way or the other, that a regime of scheduled or
controlled silent reading helps reading achievement. Given the fact that
the outcome measures included the whole spectrum of reading compe-
tence (except fluency), this is a scathing indictment. This pattern has been
observed in every topic area so far. It is clear that we urgently need a
proper database for scientific research on reading. ERIC has certainly
never fulfilled that function, and now, it appears, neither does Psych-
INFO. The NRP’s discovery that only three of the studies in their review
were methodologically sound is also an indictment of the research journals
themselves and their editorial boards. Teachers need our help. Teachers
should be able to find out how or whether time spent reading makes a
difference to reading skills.
Rereading
Teachers have known for a century or more that rereading text will in-
crease reading speed. Slow readers read faster after they have read the
same story or passage many times. This technique has such an ancient
history that E. B. Huey described it in his famous book on reading in
1908. However, the central problem with this technique is whether
improved speed on one story will transfer to another story. If fluency is
specific to only one passage or story, the rereading technique is worthless.
The NRP data search on rereading was much like the previous one,
covering a wide range of programs with fanciful names like ‘‘echo reading’’
and ‘‘neurological impress.’’ The same tireless search was carried out, with
sight word, rereading) in a completely crossed research design, in which
second-grade children were trained in one, or none, or any possible com-
bination of these three methods. Dahl found that context-based training
and the method of rereading were equally likely to enhance reading accu-
racy, but that rereading was most likely to increase reading speed. Sight-
word practice on the 800 isolated words that constituted the text had no
effect on either accuracy or speed.
While these studies were ongoing, Chomsky (1976) developed a
rereading technique in which children read along with an audiotape. She
reported that children improved in reading speed and accuracy, but there
was no hard evidence to back up this claim.
Since this early work, a number of important issues have come to
light, and there has been increasing scrutiny of the assumptions about
what rereading actually achieves. Because so many factors are critical in
optimizing the effects of rereading, I will outline them here before dis-
cussing the more recent research.
Here are the questions that need to be addressed in doing research on
rereading or in designing an effective rereading program:
deemed a success. First and second, reading speed should increase with no
loss of accuracy (or an increase in accuracy) as readers read faster. Third,
oral rereading should produce increasingly appropriate phrasal boundaries
and inflection (prosody). Fourth, comprehension should improve. Finally,
there should be transfer effects. Reading speed should increase from
one story to the next. If a criterion reading speed is set, this should be
achieved more quickly with each new story. Accuracy and comprehension
should improve with each new story as well. Getting children to read
‘‘fast’’ achieves nothing if they are inaccurate, fail to comprehend what
they read, and do not stay ‘‘fast’’ from one story to the next.
Sorting out these issues turned out to be more difficult than people
imagined. It is easy to get slow readers, accurate or inaccurate, to read
much faster in a short space of time (within an hour). It is not easy to im-
prove comprehension of that same passage or to show transfer effects. We
may not have all the answers to the questions raised above, because indi-
vidual researchers study different problems, and the children in the studies
vary in age and reading skill, but we are getting close.
Setting a target criterion seems, on the face of it, a better approach
than using an arbitrary number of repetitions—in other words, letting
children reread without a goal. Children report they like rereading and
enjoy having a goal. Because the studies vary, there is no direct proof of
this assertion. What is important is that if a goal is set, children are capa-
ble of meeting it. This raises the question of where to set the goal, because
rereading stops when the goal is met.
sion, twice a week, for a total of about 21 days (210 minutes) spaced out
over 3 months. The initial speed on the fifth and final story was 70 wpm, a
clear improvement over story 1 (transfer effect). The final speed on story 5
was the same as on story 1 (92 wpm) because rereading stopped when the
children met the target goal. (Would they have improved further if the
target goal had been shifted higher for each story?) Accuracy showed
excellent transfer. The average error score was 11 on the first reading of
story 1, and this dropped to 7.6 on the first reading of story 5, a highly
significant effect (p < .01).
Rashotte and Torgesen (1985) were interested in the impact of word
repetition on transfer. The children were 8.5 to 12 years old. No target
goal was set. Instead, improvement was measured by how much faster the
child read after a fixed number of rereadings. Children’s initial reading
speed ranged from 31 to 62 wpm, with an average of 50 wpm. Because all
the stories were at a second-grade difficulty level, reading accuracy was
already good, and there was scarcely any room for improvement.
The children read about 15 minutes each day, for a total of 7 days
(105 minutes). They were divided into three groups, and each group read
slightly different materials. Two groups read various passages from the
same story, rereading the same set of passages four times each session. For
group 1, 60 common words were repeated many times across all the passages.
For group 2, the same story and the same passages were read, but the
common words were replaced by synonyms and there was little word
overlap. Both groups read passages from the same story on 28 occasions.
Group 3 read 28 different stories, an important control for the impact of
‘‘reading a lot.’’
The two rereading groups increased their speed by 34 wpm to 84
wpm from the first to the last session. The reading speed for the children
who read 28 different stories improved by only 5 wpm, proof that reread-
ing increases speed but that ‘‘reading a lot’’ does not. When they looked at
transfer in terms of speed, accuracy, and comprehension, only speed was
found to transfer from one story to the next, and this effect was stronger
for group 1, where passages contained the repeating words.
There were differences between the two studies that make them hard
to compare. Superficially it looks like the gains in speed (93 and 84 wpm)
were similar. But text difficulty varied. Herman’s training was spread over
3 months; Rashotte and Torgesen’s was concentrated into 7 days. In both
Rashotte reported no gains, but this may be because the stories were too
easy and comprehension was nearly perfect to start with. Herman did find
gains and transfer, but her comprehension measure was indirect and
vague: ‘‘errors in context sensitive word substitutions.’’ This measure
sheds no light on whether story content was understood or remembered.
Nevertheless, there is agreement on certain facts. Rereading is effec-
tive almost immediately and children can reach a target speed when re-
quired to do so. Reading speed shows a transfer effect, particularly when
there is some overlap in the words. Rereading is considerably more effec-
tive than reading the same number of different stories. Reading for speed
does not occur at the expense of accuracy but actually enhances it. Im-
provement in accuracy is much easier to demonstrate when story content
is close to reading level or grade level (not too easy). Like speed, im-
provements in accuracy will transfer from one story to the next. Measures
of comprehension were confounded in both studies, and better methods
were called for.
In 1987, Dowhower published the most comprehensive study to date.
She investigated transfer effects for speed, accuracy, and comprehension,
along with measures of prosody. She also investigated two types of re-
reading experience. One group of children read stories out loud without
assistance, and the other group read along with an audiotaped version of
the same story. The target goal in both cases was 100 wpm.
Dowhower was interested in beginning readers who were just making
a transition from word-by-word decoding to more fluent reading. She
screened 89 beginning second-grade students and selected 17 who fit the
profile of ‘‘accurate but slow’’ readers. Children read the stories aloud,
whether working on their own or with an audiotape. The group working
with the audiotape listened to the story first, then rehearsed it out loud,
with the goal of being able to read in synchrony with the tape.
The study design was complex. There were five stories (numbered
1–5), all at a second-grade level and 400 words long. There were two
additional stories (200 words each) at the same level of difficulty: story A
was read only once at the very first session (baseline), and story B was read
only once at the very last session (the final transfer test). The rereading
part of the study began at session 2. The children read the first 200 words
of story 1 over and over until they reached the target of 100 wpm. This
how close the children were to achieving the target speed. Reliability
checks were made throughout by using a second observer. When the
children reached the target speed, they were asked to read the second half
of the story—the last 200 words (the first transfer test). This transfer test
preserved story context. Word overlap between the two halves of the story
was not controlled.
When the children finished reading the transfer passage (once), they
were asked to choose another story, and the process began again. This
continued until all five stories were read at the target speed of 100 wpm.
At the very last session, the children read the unrelated story (story B—
final transfer) one time, and the experiment ended. Each child met with
Dowhower for 15 minutes most days of the week, and this continued for
as long as necessary to achieve the target goal on each story. The study
lasted a total of about 7 weeks, approximately 7.5 hours per child. This is
twice as long as the longest study above (Herman).
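The criterion procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dowhower's scoring method: the function names and the example timings are hypothetical, and speed is computed simply as words read divided by minutes elapsed.

```python
def words_per_minute(word_count, seconds):
    """Reading rate in words per minute for one timed reading."""
    return word_count * 60.0 / seconds


def trials_to_criterion(reading_times_sec, word_count=200, target_wpm=100):
    """Return the 1-based trial on which the target speed is first met.

    reading_times_sec holds the time in seconds for each successive
    rereading of the same passage. Returns None if the target is never met.
    """
    for trial, seconds in enumerate(reading_times_sec, start=1):
        if words_per_minute(word_count, seconds) >= target_wpm:
            return trial
    return None


# Hypothetical timings for successive rereadings of a 200-word passage.
example_times = [300, 240, 170, 130, 118]
print(trials_to_criterion(example_times))  # 5
```

In this made-up example the first reading takes 300 seconds (40 wpm), and the 100 wpm target is first met on the fifth reading, at 118 seconds. Counting trials to criterion in this way is what allows savings to be compared from one story to the next.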
Gains and transfer effects were measured for speed, accuracy, and
comprehension. Comprehension was measured by asking different ques-
tions about each story on the first and last trials. Statistical comparisons
were made between the initial and final reading of the same story, the
initial reading and transfer portion of the same story, all five initial read-
ings, all five final readings, and the readings of stories A and B. Also,
measures of prosody were coded from the child’s tapes, and included
things like inappropriate pauses, reading phrase length, and intonation.
The results of the study are shown in table 7.1.
All contrasts in the table are significant for both groups of children,
and there were no differences between reading alone or with an audiotape
on any measures of speed, accuracy, or comprehension. Speed improved
on the transfer passage (the last 200 words) from story 1 to story 5 by about
10 wpm. Accuracy was already good at the outset (the children were
chosen for being accurate) and only got better. The results for compre-
hension were particularly impressive, improving from 57 percent (story 1)
to 72 percent (story 5).
Transfer effects were also high when the children switched to a new
story, as shown in the comparison of the first reading of story 1 to the first
reading of story 5. Both groups began at 41 wpm and improved to 58–65
wpm, which is close to normal for beginning second graders. Again,
accuracy was good at the outset and there was not much room for
improvement. Comprehension improved markedly even though the story
contexts differed, from 57 percent correct (story 1) to 66–79 percent
correct (story 5).
Table 7.1
The most impressive result was the savings across the five stories in
how many trials (rereadings) it took to reach the criterion of 100 wpm.
Both groups had great difficulty achieving this target goal on the first
story, taking an average of 15 attempts. By story 5, the number of at-
tempts was down to 4.5. What was once difficult had now become easy.
There was a large improvement between story A and story B, the
stories read only once at the beginning and end of sessions. Speed in-
creased from about 36 wpm to 65 wpm for both groups. Accuracy in-
creased from 178 words correct to 190 correct out of 200 words.
Comprehension on story B was 81 percent correct.
The systematic effect of transfer across all sessions is illustrated in
figure 7.1, which shows the progression from the very first story (story
A), through each initial reading of stories 1 through 5, to the final story
(story B).
Nearly all measures of prosody showed significant improvement as
well for both groups. However, the children reading with an audiotape
improved more noticeably (and significantly) on several measures, show-
ing that ‘‘reading with expression’’ improves more easily with a model. It
is interesting that having a model did not have a differential effect on
measures of speed, accuracy, or comprehension. One would imagine that
hearing a story read with expression would assist comprehension by en-
hancing meaning.
Dowhower recommended that teachers use the rereading technique
(either method), because it obviously works and children like it. She sug-
gested using the assisted method first (audiotape), especially for children
who read extremely slowly, and shifting to unassisted reading when chil-
dren reach 60 wpm, when they seem to do better on their own. She ob-
served that very slow readers were less frustrated working with the tape
than working alone. She also pointed out that this is not a quick fix. The
children did not make significant gains at the transition from story 1 to
story 2, except marginally for speed. She had several suggestions for fur-
ther research, such as looking at the relationship between prosody and
comprehension, examining the different populations of children who
Figure 7.1
Mean scores for rate (WPM), accuracy, and comprehension for the first reading (trial 1) at the initial pretest
(Story B). From Dowhower, 1987, p. 400.
should be used as a ‘‘crash course’’ in getting children up to speed or as
part of normal classroom instruction, interspersed with other lessons.
I want to add another suggestion. All these extremely slow readers
dutifully achieved 100 wpm, and this got easier and easier with each story.
When target goals are set, the children can never go beyond them. A
moving target might produce longer-lasting results. For example, by the
third story, the children needed only seven trials to get from a starting
speed of 55 wpm to 100 wpm. Achieving 100 wpm, however, did not have
much impact on the starting speeds of the next two stories (60 wpm and
61 wpm), despite the fact that the children could read at 100 wpm with
good comprehension and no loss of accuracy. If the bar is raised a little
each time, children might start coming closer to 100 wpm on the first
reading.
Levy and colleagues got the usual rereading benefits in reading speed
and also found that students could do the secondary task equally well as
they read faster and faster. Misspelled words were easier to detect (at
around 85 percent accuracy) than words that breached rules of syntax or
semantics (around 70 percent). An error in word spacing or an abrupt
change of font produced some interesting effects. They found that reading
speed would slow even though subjects did not consciously notice the
change. Apparently when a detail is inessential (does not affect proof-
reading or comprehension), it impinges at an unconscious level but does
not emerge into consciousness.
Levy and associates demonstrated that readers process perceptual and
linguistic/cognitive information in parallel, and they do this so efficiently
that much of the lower-level processing takes place below conscious
awareness. Good readers are mainly aware of content and meaning, but
they can easily become aware of perceptual detail with no loss of under-
standing if their attention is directed to it.
Levy wondered whether this effect could be demonstrated with chil-
dren, including those with poor reading skills. A similar study was carried
out on 144 children in the third through the fifth grades (Levy, Nicholls,
and Kohen 1993). Good and poor readers read one story four times and
had to locate spelling and word errors. The instructions to the children
stressed speed, and they were aware they were being timed with a stop-
watch. After the fourth rereading, they transferred to a new story, and
transfer effects were measured. The stories varied in difficulty level (easy,
medium, hard).
Poor readers read much more slowly, but both reader groups in-
creased their reading speed by about the same amount, around 50 seconds
faster from the first to the fourth reading. Transfer effects for reading
speed were proportionally the same, and were quite large. The children’s
success rate in spotting errors was similar to the college students’, and
detection patterns were the same as well. It was easier to spot spelling
mistakes than notice errors that violated meaning. Error-detection rates
increased modestly for both good and poor readers across rereadings, but
improvement was inconsistent. Comprehension transfer did not occur
until the fourth and fifth grades, and this varied as a function of story
difficulty.
each grade level, and for both good and poor readers, was achieved with-
out ‘guessing’ or ‘sampling’ the printed pages. Word recognition became
more efficient, not attenuated, as fluency was acquired’’ (Levy, Nicholls,
and Kohen 1993, 321).
The studies by Levy and her colleagues added a new dimension to the
rereading literature by showing that not only do adults and children im-
prove in accuracy as they read faster, but they can also carry out a sec-
ondary task (proofreading) with no loss of accuracy. Furthermore, even
poor readers can do this, and though they were worse overall, they had
slightly higher gains in both speed and accuracy than normal readers did.
Levy and associates attempted to manipulate text difficulty in this
study and did not succeed, perhaps because they relied on the publisher’s
criterion for difficulty (‘‘grade level’’) instead of objective criteria, such as
word frequency, word length, and so forth. In two subsequent studies with
Faulkner (Faulkner and Levy 1994, 1999), this problem was remedied.
The 1994 study is the most important study on rereading since Dow-
hower’s investigation, because it nails down the final elements that deter-
mine what makes rereading successful.
The 1994 study contained four different experiments that manipu-
lated the type of transfer task and text difficulty. The subjects were third,
fourth, and sixth graders and college students, divided into good and poor
readers. Poor readers were selected on the basis of accuracy (word recog-
nition) and not reading speed. The first two studies involved an in-depth
analysis of the impact on transfer of story context and word overlap (rep-
etition of the same words). Each task consisted of two readings: the initial
story and a transfer story. There were four kinds of transfer stories: the
same story (rereading), a story with a high overlap of the same words (word
overlap), a story with a high overlap in content/context but little overlap in
words (paraphrase), and a story unrelated in content or words (unrelated).
The analysis focused on the reading rates and accuracy for the trans-
fer story only. The results were the same regardless of grade level. Re-
reading the same story produced the fastest times and the fewest errors for
everyone compared to the other types of transfer stories. When the re-
maining transfer stories were compared to each other, good readers im-
proved most on the paraphrased version, and got no boost from word
overlap. Poor readers benefited from both, but most from the overlapping
words.
The results showed that when the text is difficult, word repetition is
helpful and content overlap is less so. When the text is easy, similar con-
tent enhances reading speed and accuracy, and specific words are not as
important. In other words, people are more likely to read difficult text at
the level of the word (more focus on decoding), and process meaning less
well. When people read at an optimum difficulty level, they read for
meaning alone, and particular words do not matter as much.
If this is true, then story difficulty ‘‘causes’’ reading speed and deter-
mines the amount of meaning extracted from the text. This corresponds
to the research on eye-movement control. Rayner (1986) found that text
difficulty was the most likely cause of erratic eye-movement patterns. But
does this always hold? The children in these studies were reading stories
at their grade level. As a result, the stories were easy for the good readers
but quite difficult for the poor readers. For this reason, Faulkner and
Levy, in a second part of the study, varied text difficulty by tying it to
reading skill, comparing good and poor readers across the age span. They
used the same four types of transfer stories as before: rereading, word
overlap, paraphrase, and unrelated.
They found that text difficulty level alone could make good readers
look exactly like poor readers, reading at the same slow rate and level of
accuracy. However, while poor readers did indeed read much faster when
stories were easy for them, they never read at rates remotely like those of
good readers. In one comparison, second-grade good readers and sixth-
grade poor readers were both given a fourth-grade story to read. The two
groups of children read at the same rate (120 and 117 wpm). But when the
fourth-grade good readers read a very easy story 2 years below their grade
level, reading speed soared to 151 wpm. In another study by Faulkner and
Levy (1999), fourth-grade good readers read material at or above grade
level at the rate of 159 wpm on the first reading and 182 wpm on the
second. Yet poor readers at the college level read easy material at the rate
of only 142 wpm.
These were not training studies. Children read stories far fewer times
(only twice in the last set of experiments), and results will not be compa-
rable to the true rereading studies. It is clear that a couple of rereadings
will not be sufficient to bring poor readers up to speed even on simple
speed, it only goes part of the way in accounting for poor readers’ ex-
tremely slow reading speed. Something else is going on.
Poor readers may get caught in a text-difficulty trap, in which reading
slowly becomes a way of life. This would happen if they got off to a bad
start and began lagging behind their peers. The reasons would include
such things as slow oculomotor development, language delays, lack of
early experience with letter shapes and sounds, missing school through
illness, poor instruction, and so on. Once behind, children who stay at
grade level will always be reading material at a difficulty level beyond their
comfort zone. Their reading speed will be slow and will remain slow even
if decoding accuracy improves to average or better. This pattern would
explain Wimmer’s ‘‘slow readers,’’ or at least a portion of them.
Summary
The good news is that children do not have to stay slow, and we now have
the formula for success. The formula is the same for slow readers (accu-
rate or inaccurate) and beginning readers alike.
Set target reading speeds well above the child’s level. So far, no one
has reported failing to achieve target levels 50 to 60 wpm higher than the
child’s baseline speed. I believe targets should be reset each time the
child’s reading speed improves by a certain amount. The final goal is to
have the first reading of the story at a normal or superior rate for the child’s
age. At this point, rereading exercises can cease. Children need to have
multiple rereading experiences (many stories), not just a few. Dowhower’s
time frame of about 7 hours’ work over 7 weeks was effective and seems
optimal. Time pressure was not excessive, and sessions were not spread so
far apart that there were no carryover effects. The ultimate goal is the
desired target speed on the first reading, and this goal determines how
long the rereading sessions last.
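The target-setting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 50 wpm increment comes from the range quoted above, while the function names, the 20 wpm reset step, and the sample numbers are hypothetical, because the amount of improvement that should trigger a new target is left unspecified.

```python
def words_per_minute(word_count, seconds):
    """Reading rate for one timed reading of a passage."""
    return word_count * 60.0 / seconds


def next_target(baseline_wpm, current_target=None, increment=50, reset_step=20):
    """Return the target speed for the next set of rereadings.

    increment:  how far above baseline the target sits (the text cites 50-60 wpm).
    reset_step: hypothetical amount of first-reading improvement that triggers
                a new target; the text does not give a figure.
    """
    if current_target is None:
        return baseline_wpm + increment
    # Baseline speed implied by the current target.
    old_baseline = current_target - increment
    if baseline_wpm - old_baseline >= reset_step:
        return baseline_wpm + increment  # enough improvement: raise the target
    return current_target  # not enough change yet: keep the same target


# Hypothetical child: a 300-word story read in 4 minutes on the first reading.
baseline = words_per_minute(300, 240)
target = next_target(baseline)
# A later first reading of a new story at 100 wpm.
new_target = next_target(100.0, current_target=target)
```

Rereading would stop once the first reading of a new story reaches the rate that is normal for the child's age, which is a value the teacher would supply.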
The difficulty level of the text is critically important, because speed is
tied to difficulty level. Very slow readers should start off reading passages
at or just above their reading level, not their grade level. Once reading
speed improves, stories should increase in difficulty. Passages with over-
lapping words are best for struggling readers and very young readers, and
are most likely to produce carryover (transfer) effects from one story to
the next. Overlapping context helps as well. This creates a situation in
continue.
This is far from the last word on this topic, because more research is
needed to sort out the best way to apply these new findings in the class-
room and in remedial settings. But the technique of rereading has finally
come of age. Nevertheless, fundamental research questions remain. What
causes children with good reading instruction to be slow, as found in the
Salzburg studies? So far, the evidence is pointing to a low verbal IQ and a
weak verbal memory. But other critical variables have not been studied,
such as the rate of oculomotor development and the nature of the skills
the child acquired prior to going to school.
8
VOCABULARY AND COMPREHENSION
INSTRUCTION
Printed words are oral language written down. Print is a filter through
which people can exchange oral messages across space and time. To make
these distant conversations possible, two things are necessary: you must
understand the spoken language, and you must know how the code works
(decoding accuracy and fluency).
Decoding and fluency are the gateway to reading comprehension,
but they do not work in isolation from a child’s vocabulary and oral-
comprehension skills. This can be observed in a number of ways. A word
might be accurately decoded (vampire) but have ‘‘nowhere to go,’’ because
the child does not know what it means. A word that is not in a child’s
vocabulary (sympathy) might lead to a distorted, though ‘‘legal,’’ decoding
(sim-pa-thigh?). A word that is known may fail to be decoded correctly due
to anomalies in the spelling code, as when glacier is read as ‘‘glassier.’’
Comprehension means more than a good vocabulary. It involves a
number of core language skills, such as the ability to use syntax to antici-
pate words in a sentence and assign unknown words to the appropriate
part of speech. It includes an aptitude for monitoring context and making
inferences on the basis of background knowledge, as well as familiarity
with oral or literary forms (genres). Children with good oral comprehen-
sion who read the phrase ‘‘the bunnies huddled in the dense green grass’’
may not know the meaning of the words huddle and dense, but they will
know that huddle is something the bunnies are doing (‘‘verb’’), and dense is
a property of the grass (‘‘adjective’’). They know this implicitly because of
where these words occur in the phrase.
Other comprehension problems can arise from the nature of the com-
munication or the text itself. Young children have particular trouble with
a ‘‘story grammar’’—the special sequential structure of a story and its
fictional nature. This is surprising in view of the fact that children hear so
many stories. Yet when children are asked to ‘‘tell a story,’’ most cannot
initiate or order the structural elements until around the age of 8 (Hudson
and Shapiro 1991). They routinely omit the ‘‘flags’’ that signal a story
beginning and ending (‘‘once upon a time’’; ‘‘they lived happily ever
after’’). They fail to provide a setting or any fictional characters. (Young
children’s stories are invariably autobiographical, with the children them-
selves in the title role.) They fail to create a problem or obstacle to carry
the story forward (story line), which means there is no resolution (story
apex). And despite what parents think, young children cannot retell a story
they have heard scores of times. They invariably get the story sequence
scrambled (Nelson 1998).
Reading comprehension is the end game of learning to read and nec-
essarily involves everything that comes before it: a good vocabulary and
good oral-comprehension skills, plus accurate and fluent decoding skills.
If children do badly on a reading-comprehension test, any of these four
things, alone or in combination, could be the culprit. Children who get
low scores on a reading-comprehension test solely because they cannot
decode are very different from children who have low scores due to weak
oral-comprehension skills.
In English-speaking countries, reading researchers have focused much
more attention on decoding than on reading comprehension. In large part,
this is a consequence of the enormous number of children who fail at this
level. Nevertheless, the ultimate purpose of being able to read is to un-
derstand the message conveyed by the print. This is certainly the primary
goal of most teachers, even in the earliest grades, as has been seen many
times in this book. So far, we have learned that time spent on verbal lan-
guage skills is time taken away from learning how to decode and spell. But
we also know that decoding and basic spelling skills can be learned quickly
if they are taught appropriately. And there are excellent techniques to
improve reading fluency.
The tests used in the National Assessment of Educational Progress
(NAEP) to estimate reading competency in the United States measure
reading comprehension, not decoding accuracy (Mullis, Campbell, and
Farstrup 1993; Campbell et al. 1996). When NAEP reported in 1993 and
1996 that 43 percent of fourth graders in America were ‘‘functionally
illiterate,’’ this did not mean these children could not decode (though that
may have been true as well). It meant they could not locate information or
the same Woodcock tests, but modified the listening version of the test to
1. As a general rule, if correlations are real, the better the test, the higher the
values will be. This is a function of the items on the test, their difficulty level,
and the number of items at each level. A good test increases the likelihood of
a normal distribution, which will increase correlational values. The Peabody is
considered one of the best tests available for oral and reading comprehension.
Vocabulary Instruction
There is a popular theory that listening to stories and ‘‘reading a lot’’
causes vocabulary and comprehension skills to improve. This is assumed
to be true because written text contains more complex and rare words than
appear in everyday conversations. Haynes and Ahrens (1988) found that
children’s literature contains 50 percent more rare words than prime-time
television or college students’ conversations. When adult readers encoun-
ter unknown words, they try to work out meaning from syntax, context,
and word derivation. This is a lifelong process that is never completed.
Thus there are two propositions one could hold about the rela-
tionship between vocabulary and reading. One is that ‘‘vocabulary causes
reading,’’ because the more words are stored in memory, the easier it is to
decode them. The second is that ‘‘reading causes vocabulary,’’ because if
you ‘‘read a lot,’’ you learn more new words. Both lines of reasoning could
be correct (and probably are), and this has important consequences for
reading instruction in the classroom. (We have already seen that ‘‘reading
a lot’’ does not cause fluency or decoding accuracy.) But there are more
fundamental issues.
Vocabulary Development
Research on vocabulary instruction has to build on basic knowledge of
how vocabulary skills develop. So far, there has been a disconnect between
The sheer quantity of parents’ verbal input (total number of words per
hour) predicts a child’s spoken vocabulary later in time. (Other research
has shown that speech must be child directed and not adult directed.)
The quality of the communicative style (its richness, as well as the type
of feedback the child receives) was a stronger predictor of the child’s ver-
bal development than socioeconomic status was. This was seen in the data
for individual children, where socioeconomic status mattered much less
than how the mother interacted with her infant.
Five key communicative styles were identified. These are, in order of
importance:
was not in the story (‘‘foils’’). They had to point to one of four pictures
that best fit the meaning of the sentence (25 percent correct by chance).
The success rate for the foil words (words not in the story) was just over 3
words. The success rate for the target words was 4.4 words, not much
better. Neither score was significantly above chance (5.6 correct is significant
at p = .05). Only the high-vocabulary group scored this well, and
just barely.
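The chance cutoff quoted above can be approximated with a short calculation. The sketch below uses a normal approximation to the binomial distribution, which is an assumption on my part about how such a cutoff might be derived. The number of test items is not given in this passage, so the value of 11 is also an assumption, chosen for illustration because it yields a cutoff close to 5.6.

```python
import math


def chance_cutoff(n_items, p_chance=0.25, z=1.96):
    """Expected score from guessing, and the score needed to exceed it.

    n_items:  number of four-picture items (assumed; not stated in the text).
    p_chance: probability of a correct guess (one of four pictures).
    z:        critical value for p = .05 under a normal approximation
              (an assumed choice of test).
    """
    expected = n_items * p_chance
    sd = math.sqrt(n_items * p_chance * (1 - p_chance))
    return expected, expected + z * sd


expected, cutoff = chance_cutoff(11)
print(f"expected by guessing: {expected:.2f}, cutoff: {cutoff:.2f}")
```

Under these assumptions, a child who guesses on every item would average just under 3 correct, which is in line with the foil-word score, and the 4.4 target-word average falls well short of the cutoff.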
There is no evidence from this study that the majority of children
learned any words, despite the authors’ conclusions to the contrary. Even
the high-vocabulary group learned only two or three words.
Senechal and Cornell (1993) reasoned that if parents or teachers
used a more interactive style, this would enhance memory for new words.
There were 80 four-year-olds and 80 five-year-olds in the study. All were
middle or upper middle class. The design of the study was similar to the
one above, with target words embedded in a story. The story-reading
session (only one) took about 30 minutes. The target words were odd and
unfamiliar (as they were in Robbins and Ehri’s study): angling, corridor,
elderly, gazing, infant, lineman, reposing, sash, satchel, snapshot. All had a
familiar synonym: fishing, hall, old, looking, baby, repairman, resting, window,
purse/bag, picture.
The children were tested initially on their knowledge of the target
words and their synonyms. All children knew the familiar synonyms, but
none knew the target words. The next day, they met with an experimenter
(individually) and listened to the story. The story was read in one of four
ways, which varied in how much repetition or questioning was involved.
Immediately after this, the child took recall and recognition tests, and one
week later, the recognition test was given again.
We can dispense with the four treatments, because they made no dif-
ference whatsoever to either age group. Children remembered just as well
(or badly) when they heard the story read with no help or with expla-
nations of the target words. We can also dispense with the data for the
recall-memory test, because children did not do what they were supposed
to do. When they were shown a picture representing a target word, in-
stead of saying the word from the story, they gave the common word (the
synonym) almost exclusively. Only 18 of the 160 children provided any
target words throughout testing.
These results show that a handful of children learned one to two words
from initial testing to follow-up, but most children were just guessing.
Even viewed in the most generous way, it takes 30 minutes per individual
child to add one word to their vocabulary, and even then children will not
use the word spontaneously, making it hardly worth the effort.
The target words in this and the previous study were decidedly odd.
Children add words to their vocabulary so they can say something they
could not say otherwise. There is no reason they would prefer the archaic/
technical/literary target words in these stories as replacements for com-
mon synonyms they already know. This may be one explanation why
these experiments did not work.
If experimenters reading to a child have no impact on vocabulary ac-
quisition, would preschool teachers do any better? And if so, what type of
teacher-child interaction works best? These were the questions addressed
by Dickinson and Smith (1994). This study only serves to illustrate just how
difficult an assignment this was. This was a naturalistic study in which the
teacher’s style of interaction during story time was uncontrolled. It was
coded after the fact from videotapes in 25 different Head Start classrooms.
At the end of the year, 25 children (5 years old) were randomly selected
from these classrooms and tested on vocabulary and story comprehension.
The videos were coded for the amount of talk during story time for
teachers and the children in each class, plus styles or types of interaction.
The investigators coded 21 different measures, which were combined
to represent three types of classroom interactions. The first was labeled
co-constructive (5 classrooms), in which a high number of analytic con-
versations took place between the teacher and the children, prior to,
during, and after reading the story. The second type was called didactic-
interactional (10 classrooms). Talk was limited generally, and what talk
did occur consisted of repetition (saying a sentence again) or answering a
question. The final type was performance-oriented (10 classrooms). Here,
talk largely occurred prior to and following reading the story, and the
story was read with little interruption. The preamble to reading the story
was often extended, analytic, and evaluative. When the story was finished,
questions were asked about story recall and understanding. Sometimes this
involved reconstructing the entire story piece by piece.
This was Dickinson and Smith’s interpretation of what the data
showed. But a table of probability values indicated that only five measures
discriminated among the classrooms at a conservative p < .01. Most of
these measures had to do with how much talk was going on in the classroom
by both teachers and children prior to, during, and after story reading.
The only other discriminating measure was the proportion of teacher and
child clarifications about the story. Using these more conservative mea-
sures, classroom ‘‘style’’ boiled down to two things: the total amount of
talk by teachers and children, and how this talk was distributed between
the prestory and poststory phases and during the story.
Vocabulary and story-comprehension measures were compared for
the three types of classrooms. Children in the performance-oriented
classrooms had significantly higher vocabulary scores than children in the
didactic-interactional classrooms (p < .01). (One assumes no other comparisons
were significant because no other values were provided.) No differences
were found for story-comprehension scores.
There was a fatal design flaw in this study. Vocabulary was not measured
before the children entered the study. Without this baseline, there is no way to
know whether the children’s vocabulary levels ‘‘caused’’ the (animated and
extended) type of interaction that went on in the classroom, which
seems highly likely, or whether what went on in the classroom ‘‘caused’’
the vocabulary. The authors opted for the second interpretation and did
after the 5-day book-reading period, and also 6 weeks later. Only the
children who were taught the meanings of the target words scored sig-
nificantly above chance. They did well on both books immediately after
training and also 6 weeks later, scoring around 50 percent correct. The
control group that heard the books read for the same amount of time with
no explanation did no better than the control group that had never heard
the books, both groups scoring at chance.
These results showed that simply listening to a story does not impact
vocabulary acquisition. If new words are explained and synonyms provided
in context, there is some success. Nevertheless, even a 5-day training
period was not long enough for the children to remember more than half
of 10 new words. Meanwhile, they would have acquired 50 words on their
own without any training.
The NRP Report
The NRP subgroup’s analysis of the training studies, 14
years on, was disappointing. It would have been helpful to have an update,
perhaps adopting Stahl and Fairbanks’s classification scheme and compar-
ing this to more recent research. Instead, the NRP declared all research
null and void for purposes of a meta-analysis. The initial screening criteria
were publication in a scientific journal, inclusion of a control group that
was either matched or randomly assigned, and a proper statistical analysis.
The initial search turned up 197 papers on ‘‘vocabulary’’ plus ‘‘instruc-
tion,’’ and after screening, 50 studies remained in the pool. On further
analysis of these 50 studies, the panel decided that no research met the
NRP criteria that explicitly addressed measurement issues. In the execu-
tive summary of this report, the reason for ruling out a meta-analysis was a
‘‘heterogeneous set of methodologies, implementations, and conceptions
of vocabulary instruction’’ (p. 4-3).
Instead, the panel provided brief descriptions of 40 studies, set out
in various categories. Ten studies overlapped categories and appeared in
more than one place. Included among them were the studies I presented
above on storybook reading, where the data were invalid due to the failure
to control for guessing. The panel was unaware of this problem and their
report on these studies is inaccurate (see p. 4-21).
The NRP stated that Senechal and Cornell showed that ‘‘a single
book reading significantly improved children’s expressive vocabulary.’’
However, there was no significant effect for receptive vocabulary in this
study, and even Senechal and Cornell reported that there was no impact
on expressive vocabulary, because they found that children used few target
words to label the illustrations . . . and there was not enough variability in
the data to conduct statistical tests.
The NRP stated that Robbins and Ehri’s method ‘‘helped teach chil-
dren meanings of unfamiliar words.’’ Yet the data were invalid in this
study as well. They claimed that Dickinson and Smith showed that ‘‘the
amount of child-initiated analytic talk was important for vocabulary gains’’
when it did nothing of the sort. Gains were never measured in this study
(no baseline). The panel’s final summary of these 40 studies was even
more troubling, because these and other inaccurate conclusions were gen-
eralized further.
work outside the classroom. One classroom was designated ‘‘rich only,’’
reason relates to Lloyd’s (1992) discovery that fast and intense training is
more effective for young children. Because McKeown et al. changed three
variables that relate to timing—the number of words taught (24 versus
104), the exposure time for each word (shorter), and the length of the
training (shorter)—there is more than one explanation for why all three
methods (even the traditional method) produced almost perfect receptive-
vocabulary scores (95 percent correct) for high-exposure words, and nearly
as high (85 percent correct) for the low-exposure words. Was this due to
learning fewer words, the compressed learning time, or both? The impact
of the ‘‘rich’’ teaching approach versus the traditional approach appeared
only on measures of productive vocabulary, which suggests that the teach-
ing method matters most in enhancing recall memory.
These studies suggest that any method that calls attention to meaning
and engages the student cognitively produces gains in receptive vocabulary.
Exposure duration and intensity of learning impact how well and
how much is remembered. This general hypothesis may or may not be
accurate, but it is certainly worth further study.
Jenkins, Matlock, and Slocum (1989) also found a ‘‘frequency’’ and
‘‘method’’ effect in a short-term study on vocabulary instruction. There
were 135 fifth-grade children in the study, primarily middle class, with ex-
ceptionally high vocabulary and reading-comprehension scores on stan-
dardized tests. The goal was to teach 45 target words. The children were
taught with either of two methods. In one, word meanings were taught
directly, and in the other, meaning was derived from context. In addition,
words were seen once, three times, or six times. Training took place over
9, 11, or 20 days, depending on the amount of exposure. Children spent
15 minutes per day learning these words, about 5 hours for the long-
exposure group (6.5 minutes per word).
‘‘Training for meaning’’ included memorizing definitions, using the
target words in a sentence, and substituting synonyms for the target words.
‘‘Context training’’ involved applying a sequence of strategies: (1) substi-
tute a word or expression for an unknown word, (2) check other context
clues to support this choice, (3) determine whether the substitution fits all
context clues and if not, (4) revise the word and start again.
Knowledge of the target words was tested for both productive and
receptive vocabulary. The context-learning group did not do well on
any of the tests at any exposure condition (low, medium, or high). They
scored about 1 correct for finding synonyms for isolated words, and were
Summing Up
The results from these well-executed studies are very consistent. Fre-
quency of exposure to new words makes a big difference only if students
have some guidance (instruction) and gain a deeper understanding of what
these words mean. Deriving meaning from context analysis is not effective.
It appears to be too abstract, even for bright fourth graders. This finding
contradicts the major tenet of whole language, that children can easily
derive meaning from the pictures and context clues and do this while they
teach themselves to read.
Vocabulary can be taught, and there is solid evidence on which
teaching methods have value. We also see that repetition is critical, and
that short-term (intense) teaching works better than lessons spread out
over a long period of time. As to whether teaching new vocabulary is
‘‘worth it,’’ the fact that these abstract words would be unlikely to be
acquired spontaneously suggests that this is a good idea. Using the right
method and approach, the cost is only around 5 minutes per word. Stahl
and Fairbanks pointed out that learning just 300 words a year will increase
vocabulary size about 10 percent. This is around two new words per day
of classroom days (175 days). However, there is considerable debate about
the effectiveness of teaching isolated words rather than learning new vo-
cabulary in the context of general comprehension training. As we will see
in the next section, a good comprehension program dramatically enhances
vocabulary, even when this is not a specific feature of the lessons.
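The arithmetic behind these figures can be checked with a brief calculation. The inputs (300 words, 175 classroom days, about 5 minutes per word, a 10 percent gain) are taken from the paragraph above. The derived quantities and variable names are my own illustration, and they assume the 5-minute cost applies equally to every word.

```python
WORDS_PER_YEAR = 300      # from Stahl and Fairbanks, as cited above
CLASSROOM_DAYS = 175      # classroom days per year, as cited above
MINUTES_PER_WORD = 5      # approximate teaching cost per word, as cited above
GROWTH_PERCENT = 10       # stated gain in vocabulary size

# New words needed per classroom day (the text rounds this to about two).
words_per_day = WORDS_PER_YEAR / CLASSROOM_DAYS

# Teaching time implied by the per-word cost (derived, not stated in the text).
minutes_per_day = words_per_day * MINUTES_PER_WORD
hours_per_year = WORDS_PER_YEAR * MINUTES_PER_WORD / 60

# Starting vocabulary size implied if 300 words is a 10 percent gain.
implied_vocabulary = WORDS_PER_YEAR * 100 / GROWTH_PERCENT

print(f"{words_per_day:.2f} words/day, {minutes_per_day:.1f} min/day, "
      f"{hours_per_year:.0f} hours/year, baseline of {implied_vocabulary:.0f} words")
```

On these figures the daily cost is under 10 minutes of class time, which is the sense in which the per-word cost is described as small.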
Because these methods are for the purpose of impacting reading com-
A Meta-analysis
Apart from these concerns, there are problems with how reading com-
prehension is measured in these studies. Unlike the research on vocabu-
lary instruction where specific words are taught and tested, training in
reading comprehension must generalize to other text to prove the validity
of the method. The results of a large review show that it does not. Rosen-
shine, Meister, and Chapman (1996) did a meta-analysis on 26 studies that
dealt with ‘‘questioning’’ types of instruction. As they put it, ‘‘Teaching
students to ask questions may help them become sensitive to important
points in a text and thus monitor the state of their reading comprehen-
sion’’ (p. 183).
The method had to include a large proportion of time spent generat-
ing questions to help students understand a passage for the method to be
included in their analysis. Also included were studies using ‘‘reciprocal
teaching,’’ in which both teacher and students collaborate to interpret a
passage. Rosenshine and colleagues excluded all studies where the children
were tested on the same passage they had been trained on. The overall
result was an effect size of .86 for experimenter-generated tests—test
passages that had a similar structure to the passage on which the children
were trained. However, this training did not generalize, and there was
little transfer to standardized measures of comprehension (ES = .36).
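For readers unfamiliar with effect sizes, the sketch below shows one common way such a value is computed: the difference between the trained and untrained group means, divided by a pooled standard deviation. This passage does not say which formula the meta-analysis used, so the choice of formula is an assumption, and the scores are invented purely for illustration.

```python
import math
from statistics import mean, stdev


def effect_size(treatment, control):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical comprehension scores, not data from any study cited here.
trained = [14, 16, 15, 18, 17]
untrained = [13, 15, 14, 17, 16]

es = effect_size(trained, untrained)
print(f"effect size: {es:.2f}")
```

An effect size of .86 therefore means the trained group averaged nearly one standard deviation above the controls, whereas .36 is a little over a third of a standard deviation.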
The studies were broken down further into five different types. One
was ‘‘signal words’’ in which the student is prompted by words like who,
what, where, when, why, and how. The second type involved ‘‘generic
questions.’’ Here the student is trained to ask a variety of questions, such
as how two things are alike or different, what the main idea is, or how
events or actions are related to one another. The third type was ‘‘main
idea only,’’ in which children find the main idea and then ask questions
about it. ‘‘Question types’’ comprised the fourth category. Here students
are first directed to find specific information, then to relate two or more
pieces of information, and finally to answer questions where information
must be inferred or deduced by logic and background knowledge. (The
NAEP tests are based on this approach.) The fifth type encompassed
questions about story grammar like those reviewed earlier: ‘‘Who is the
main character in this story?’’
On the experimenter-designed tests, the ‘‘generic-questions’’ approach
generated the largest effect sizes, followed by the ‘‘signal-words’’ tech-
nique. Finding the ‘‘main idea’’ did not fare well. Other types included too
few studies to make effect sizes meaningful. As for performance on stan-
dardized tests, the ‘‘signal-words’’ group produced an effect size of .36,
and the ‘‘question-type’’ group an effect size of zero. Even more interest-
ing, Rosenshine, Meister, and Chapman found no evidence for the impact
of length of training. Paradoxically, studies with positive results had fewer
sessions overall (4 to 25) than those with nonsignificant results (8 to 50).
This fact alone calls most of this research into question. With a good
method, learning time ought to translate into better learning, not worse.
The authors commented on the contrast between experimenter-
designed tests and standardized tests. Experimenter-designed tests were
more highly structured, with a clear ‘‘main idea’’ and obvious sup-
porting detail. Standardized test passages were more typical of nor-
mal text, without such a clear and obvious structure. It should also be
noted that standardized tests are normed and control for age, whereas
experimenter-designed tests are not. Experimenter-designed tests usually
employ multiple-choice questions and are subject to guessing. As we have
3. I did not include cases where there was one study only per method in these
effect sizes.
Overall, this is a very gloomy analysis from all points of view. I do not
was devoted to teaching this program. Also, in the case of children with
Block uncovered eight strategies that are important and need to be taught.
These reflect such things as basic cognitive operations, analytic thinking,
decision strategies, problem solving, metacognitive analysis (awareness of
one’s current state of knowledge), creative thinking, plus skills for working
in groups, and skills for working effectively alone. Sixteen lessons were
designed to teach these strategic skills, and these lessons were carefully
field tested across all grade levels on a large number of children prior to
this study.
The lessons were structured so that one critical-thinking technique
was introduced per lesson, plus strategies for improving comprehension.
This constituted part 1 of the lesson. In part 2, children selected reading
material to apply this new knowledge. They could choose from a large
school (92 percent reporting ‘‘useful,’’ versus zero for the controls). The
General Conclusions
The past few chapters have shown that there are some remarkable instruc-
tional methods for almost every type of reading skill, methods that pro-
duce close to 100 percent success for every child. This is the good news.
Wouldn’t it be exciting if everyone knew about these methods, especially
teachers? Unfortunately, due to the enormous volume of published and
unpublished research, these excellent methods are very hard to locate.
It is difficult to be neutral about the fact that there is such a vast
quantity of poor research (‘‘junk science’’) clogging the reading-research
databases. In the real world of science, the most rigorous and most im-
portant studies tend to find their way to the top journals. Here, quite the
opposite is true. The flagship journals, of which there are only two, are
just as likely to publish research that is methodologically flawed as not.
In a deeply opaque writing system like the English alphabet code, most
phonemes have multiple spellings. Only eight are reliable, and another
six, relatively so. But even here, half of the ‘‘predictable’’ phonemes have
single- and double-letter spellings: /b/, /d/, /l/, /p/, /t/, /g/, /m/, and
/n/ (as in cab/ebb, lad/ladder, curl/hill, tap/tapped, bat/batter, fog/egg, ham/
hammer, win/winning). Some phonemes can be spelled nine or ten different
ways. Because spelling requires recall memory and reading requires
only recognition memory, spelling is much more difficult than reading. It
is easy to read a word like hill, but quite another matter to remember
whether to double the l when you spell it (boil, ball, deal, will, pal, pull, bail,
doll).
As we saw in chapter 3, there have been four attempts, historically, to
systematize the English spelling code, and I was only able to locate three
studies where these systems were tested empirically. Typically, spelling
instruction consists of lists of random words that go up on the board on
Monday for the spelling test on Friday. The yardstick of spelling difficulty
is syllable length, as well as ‘‘regular’’ versus ‘‘rare’’ spellings; instruction
seldom emphasizes the structural elements of the code, such as the spelling
patterns linked to phoneme position within words.
Research on classroom spelling programs is so rare that there was no
section on spelling in the NRP report. Graham (2000) managed to locate
only 60 studies on spelling by scouring the journals back to the 1920s.
These studies compared the two dominant approaches: ‘‘natural’’ learning
(self-taught spelling) versus ‘‘traditional’’ instruction, consisting mainly of
random word lists. Most of these studies were methodologically flawed,
but the general message was that children cannot teach themselves to spell
simply by reading or through creative writing, and that ‘‘traditional’’
programs were superior. As for what these programs contained, little was
Chapter 9
said. Graham noted that, apart from rote memorization, spelling lessons
were made more enjoyable by ‘‘including student choice in the selection
of spelling words and methods of study, guided discovery in learning the
patterns underlying the spelling of words, opportunities to work with
peers, and use of games’’ (p. 245).
How ‘‘student choice’’ and ‘‘guided discovery’’ are supposed to work
in the absence of any knowledge on this subject was not explained.
The fact that many children do learn to spell is, therefore, a bit of a
mystery, and how children succeed at this task has been the central ques-
tion in research on spelling. This question occupies us for the next two
chapters, and there are some surprising answers.
The ‘‘how’’ question is approached differently depending on the
researcher’s background knowledge. For the most part, researchers who
study spelling have little or no understanding of how writing systems work
and no knowledge of the structure of the English spelling code. Because
of this, spelling research is based on a set of implicit assumptions. It is
assumed that it is ‘‘natural’’ for children to teach themselves to spell, and
that spelling skill proceeds in stages. It is assumed that children learn to
spell by reinventing the spelling code (invented spelling). Using this logic,
a poor speller is someone with a developmental delay or a deficit. Yet if
children are not taught something as complex as the English spelling
code in a structured and meaningful fashion, how can anyone learn it? To
someone with greater knowledge, the good speller seems unnatural and the
poor speller seems normal.
There is a deeper issue here, which I call the many-word problem. Even if
the world’s best spelling program could be devised, it would never be
possible to teach the spelling of every word. Because the English spelling
system is so opaque, and only a handful of phoneme-to-grapheme corre-
spondences are consistently reliable, the only way it can be mastered is
through its probability structure, the reoccurring regularities in spelling
patterns as outlined in chapter 3. For the brain to set up this structure, it
needs exposure to thousands of examples of correctly spelled words. A
good spelling program can jump-start this process by grouping words with
these redundant patterns, but it will never succeed in teaching every word.
There will always be words that a fluent English reader is unable to spell,
words that have to be looked up in a dictionary.
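The idea of a probability structure can be pictured as a table showing how often each spelling stands for a given phoneme. The sketch below is illustrative only: the five-word list, the phoneme labels, the hand-made segmentations, and the function name are all assumptions made for the example, not data from this book. A realistic estimate would need thousands of words.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each word is segmented by hand into
# (phoneme, spelling) pairs. The labels are illustrative assumptions.
TOY_WORDS = {
    "train": [("t", "t"), ("r", "r"), ("ae", "ai"), ("n", "n")],
    "play":  [("p", "p"), ("l", "l"), ("ae", "ay")],
    "cake":  [("k", "c"), ("ae", "a-e"), ("k", "k")],
    "rain":  [("r", "r"), ("ae", "ai"), ("n", "n")],
    "sign":  [("s", "s"), ("ie", "i"), ("n", "gn")],
}

def spelling_probabilities(words):
    """Return {phoneme: {spelling: relative frequency}} for the corpus."""
    counts = defaultdict(Counter)
    for pairs in words.values():
        for phoneme, spelling in pairs:
            counts[phoneme][spelling] += 1
    return {
        phoneme: {s: n / sum(c.values()) for s, n in c.items()}
        for phoneme, c in counts.items()
    }

probs = spelling_probabilities(TOY_WORDS)
print(probs["ae"])  # relative frequency of each spelling of /ae/
print(probs["n"])   # relative frequency of each spelling of /n/
```

With so few words the numbers mean nothing in themselves. The point is only that the table is built from exposure to correctly spelled words, and that it becomes more reliable as the number of examples grows.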
Spelling Predictors
There are three important factors that will impact spelling scores regard-
less of method.
significantly by region, the sex difference did not. (No region sex inter-
but they do not dominate the field of reading research to nearly the same
time
Failure to let the structure of the spelling code dictate the sequence
To put this in context, let’s begin in the real world. Molly’s mother began
teaching her the alphabet at age 3. Molly learned to chant letter names in
alphabetical order. She learned to match letter names to letter shapes. She
was cycled through a ‘‘letter-of-the-week’’ regime in which a new mag-
netic letter appeared on the refrigerator door each Monday. By the time
Molly entered school at age 5, she and most of the children in her middle-
class neighborhood had a fair-to-good knowledge of letter shapes and
letter names. The children could name the letters they were shown, point
to letters that were named, and some children (mostly girls) could write
Molly was never taught these things because neither her mother nor
plateau, or did the teacher teach past-tense spellings in sixth grade? There
seem to be two stages for mastering the ‘‘add -ing rules’’ (drop e, double
the consonant). For some reason it is easier to master the ‘‘drop e, add -ing
rule,’’ though certainly not for everyone. Only 70 percent succeeded by
second grade. Following this, there was no change for the next 4 years. The
letter-doubling trick appears to be much later ‘‘developmentally.’’
These peculiar results are just as likely to reflect what was going on in
the classroom. Because no information was provided about this, these
results are uninterpretable. In any case, they do not support a stage model
that places ‘‘structural’’ or ‘‘derivational’’ spelling errors at a single stage.
In fact, no stage model can explain these data, not even one that pro-
posed a different stage for every type of plural and every type of verb
transformation.
A classification process must follow standard scientific principles;
otherwise this work does not count as science. Evidence for a stage model
requires that at least these criteria be met:
Error scores are the basis for the coding scheme, but errors provide no
of children ‘‘at each stage.’’ Stages can only be demonstrated by shifts for
individual children to prove that spelling stages are not mixed. That is,
children could not be in a ‘‘visual’’ stage once they were in a ‘‘phonetic’’
stage.
1. Morris and Perney’s scheme is different from Henderson’s and from that of
Beers and Beers, yet Morris formerly worked with Henderson.
Any stage model should pass at least three tests. One is that all, or very nearly
all, the children should clearly belong to one of the stages in each session. . . .
Second, the developmental stages should be related to external criteria: The
children at more advanced stages should be the older or educationally more
successful children in the sample. The third test is the most stringent and
unfortunately the least often applied. . . . Children should move in one direc-
tion but not in the other. (p. 642)
Despite this clear statement, the descriptive language in the report is quite
at odds with the notion of stages. The language reflects children’s slowly
emerging awareness of spelling conventions. The researchers wrote that at
first children ‘‘ignore’’ spelling conventions, then ‘‘they begin to realize’’
there is an -ed spelling convention but cannot apply it, later they ‘‘grasp
[its] grammatical significance’’ but misapply it to irregular verbs, and
finally they ‘‘learn about exceptions.’’
This gradual process of becoming aware of spelling patterns is not a
description of anything ‘‘stagelike.’’ It is a description of learning.
This was a longitudinal study with 363 children tracked from second
through fourth grade. They were given a spelling-dictation test in which
ten words were regular past-tense verbs ending in /d/ or /t/ that take the
-ed spelling (load, loaded; wilt, wilted), ten were irregular past-tense verbs
ending in /d/ or /t/ (found, felt), and ten were common nouns ending in
/d/ or /t/ (bird, belt).
The spelling test was given three times: at the start of the study, 7
who were initially at stage 2 behaved even less stagelike, 43 got stuck
there, 5 went backward to stage 1, none went up one stage, 5 went up two,
and 6 went up three. Children’s movement from one stage to the next is
scarcely progressive or orderly. The follow-up testing produced similar
results. Of the 58 children initially assigned to stage 1, 15 children (25
percent) were still there 20 months later.
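The movement counts given above for the children who started at stage 2 can be tallied to show how little of the change is stepwise. This is a minimal sketch: the dictionary simply restates the five counts quoted in the text, the total is their sum, and the category labels are chosen here for illustration.

```python
# Counts quoted above for children initially at stage 2.
# Keys are the change in stage number between testing sessions.
stage_change_counts = {-1: 5, 0: 43, 1: 0, 2: 5, 3: 6}

def summarize(counts):
    """Return the total and the share of children in each kind of movement."""
    total = sum(counts.values())
    return {
        "total": total,
        "stayed": counts.get(0, 0) / total,
        "went_backward": sum(n for d, n in counts.items() if d < 0) / total,
        "advanced_one_stage": counts.get(1, 0) / total,
        "skipped_stages": sum(n for d, n in counts.items() if d > 1) / total,
    }

summary = summarize(stage_change_counts)
for key, value in summary.items():
    print(key, value if key == "total" else round(value, 2))
```

On a strict stage model one would expect the "advanced_one_stage" share to dominate among children who moved at all. In these counts it is zero.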
A second hypothesis in this study was that knowledge of grammar
(past tense) would directly impact children’s ability to spell past-tense
verbs. However, their in-house grammar test was far too advanced for
7- to 9-year-olds. Test scores were near zero and changed little over 20
months. This did not prevent the authors from doing statistics on the
data. The authors interpreted their findings as follows:
These generalizations are at the heart of our new model of the development of
spelling. This proposes that a child’s first step in spelling is to adopt a phonetic
spelling strategy; the next step is to notice and to try to incorporate exceptions
to these rules, but without a complete understanding of their grammatical
basis; the next step is to understand fully this grammatical basis for some of
the spelling patterns that do not fit well with the letter-sound rules; and the
final step is to learn about the exceptions to the grammatically based rules.
(p. 647)
adopted Gentry’s (1981) classification system, and used Morris and Per-
jective time is not instantaneous in neural time. People are not aware of
There are more longitudinal studies in the literature today than there
were in 1992, but the data continue to show that stage models are false.
Stage models of spelling will continue to be limited empirically because
they are circular and logically untenable.
spellers that are absent in poor spellers, using word lists that feature par-
ticular types of spellings. Poor spellers will, by definition, be worse on
every type of test (they are selected to be worse). The goal is to show dif-
ferential performance on the tests, a worse performance on some tests than
on others. For instance, one might anticipate that differences would be
small to nonexistent on one-syllable ‘‘regularly’’ spelled words, but greater
on words where spellings obey a convention or ‘‘rule’’ (past-tense end-
ings) or are determined in some way by morphology (language structure).
The validity of this research is entirely dependent on whether the
word lists actually measure what they purport to measure. Knowledge of
the structure of the spelling code is imperative for in-house spelling tests
to be valid. Otherwise, there is no way to know whether performance is
due to a visual or phonetic strategy, to ignorance of the orthographic or
morphological spelling ‘‘rules,’’ or to insufficient exposure to print. Un-
fortunately, few researchers have any knowledge of this structure.
As an example of this problem, it is assumed that morphological
spelling clues like those bequeathed to us by Samuel Johnson in 1755 are
of great benefit to readers and spellers alike (Johnson [1755] 1773).
Linguists, echoed by reading researchers, often point to ‘‘linguistic’’
connections between word forms and the spelling code. But they are highly
selective. The word sign is said to contain the gn spelling because it is
morphologically related to signal and signature. This may be true, but it
does not matter a fig unless this morphological clue is consistent and pre-
dicts these transformations: deign, deignal, deignature; reign, reignal, reign-
ature; design, designal, designature; impugn, impugnal, impugnature; benign,
benignal, benignature. As you can see, it does not. In order to use this
‘‘morphological clue,’’ a person would have to remember that the gn in
sign occurs because of signal or signature, ‘‘BUT NOT’’ (a blocking rule—
or exception to a rule you have to remember) for other words with the gn
spelling. It is far simpler to remember that gn is a spelling alternative for
the sound /n/ (except for signal and signature), and be done with it. There
are 15 common words with this spelling.
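The consistency test applied to sign can be expressed as a simple lookup: attach the same two endings to each gn word and ask whether the result is a real word. The sketch below assumes a tiny hand-made lexicon and hypothetical names (LEXICON, clue_holds). It illustrates the argument and is not a linguistic analysis.

```python
# Hypothetical mini-lexicon of attested words, for illustration only.
LEXICON = {"sign", "signal", "signature",
           "deign", "reign", "design", "impugn", "benign"}

GN_WORDS = ["sign", "deign", "reign", "design", "impugn", "benign"]
SUFFIXES = ["al", "ature"]

def clue_holds(base, lexicon=LEXICON, suffixes=SUFFIXES):
    """True if every base + suffix form is an attested word in the lexicon."""
    return all(base + suffix in lexicon for suffix in suffixes)

results = {word: clue_holds(word) for word in GN_WORDS}
print(results)
```

In this toy lexicon only sign passes, which is the point made above: a clue that works for one word in six has to be stored as an exception, not as a rule.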
Words on the lists were mainly multisyllable words and were balanced
across levels for syllable length and frequency in print.
that words like blunder, alternate, and unemployment are more transparent
and easier to spell than words like diphtheria, sergeant, annihilate, and
pygmy. And these words were noticeably different from Level 2 words.
The problem arose in the contrast between Level 2A and 2B words. For
example, letter doubling was supposed to work by an ‘‘orthographic rule’’
at Level 2A in words like sobbing, clannish, and thinned. Yet these words
were not qualitatively different from words of ‘‘morphophonemic’’ origin,
such as omitted, regrettable, and equipped (Level 2B). Thus the distinction
between Levels 2A and 2B was based on bogus phonics-type rules.
The results showed that Level 1 was easiest, Level 2 next easiest, and
Level 3 hardest, for good and poor spellers alike. Both groups had the
same error patterns, and in the same proportions, phonetic substitutions
being the most common errors (around 88 percent), confirming the find-
ings of Varnhagen and colleagues. The two groups parted company on
Levels 2A and 2B. Poor spellers were equally bad on both word lists, but
good spellers spelled the morphophonemic (2B) words more accurately
than the orthographic (2A) words (contrary to expectation).
Fischer and associates also tested good and poor spellers on a
nonsense-word spelling-recognition test. The students had to choose
which of two words was most likely to be spelled correctly. Poor spellers
had the most difficulty when an added prefix or suffix required a modifi-
cation to the root word.
When they looked at other possible contributors to these results, the
poor spellers were found to score well below good spellers on the WRAT
reading test and on the Stanford reading-comprehension test. The groups
did not differ in vocabulary (WAIS vocabulary), showing that these
students’ spelling problems were not related to verbal IQ. And equally
important, poor spellers did not do worse on a visual-memory test for
abstract visual patterns, evidence that visual memory is not the source of
the poor spellers’ difficulties.
Fischer and colleagues surmised that because the greatest discrepancy
between good and poor spellers appeared on the ‘‘morphological’’ spelling
test (Level 2B), ‘‘linguistic sensitivity’’ was at the root of their reading and
spelling difficulties. However, it is just as likely that these students are poor
spellers because they are poor readers and do not read (or write) fre-
quently enough to observe the more difficult spelling patterns very often.
were no differences between good and poor spellers on either visual task
for speed or for errors. Once more, the evidence shows that poor spellers
do not have visual-processing problems.
These are important results. First, they show that it takes only half the
time to judge whether a pair of words is the same as it does to decide
whether a random string of the same letters is the same. This means that
processing speed improves as a function of exposure, and the redundant
syllable patterns in speech, and how these are represented by the spelling
code, make this possible.
Holmes and Ng also found that poor spellers were far more likely to
have low vocabulary scores, but did not differ in nonverbal IQ. The stu-
dents took the Author Recognition Test (Stanovich and West 1989) to
estimate exposure to print. The task is to check off famous authors’ names,
which are mixed with names of unknown persons. The differences were
enormous. Good spellers correctly identified an average of 18 authors out
of 40, poor spellers only 7.5 (pure guessing), proof that poor spellers do
not read nearly as much as good spellers.
These studies provide an excellent profile of poor spellers at the
college level. They score well below good spellers on standardized reading
tests; they also read far less and have a weaker vocabulary. They have no
visual-perception problems, but they do have unusual visual-scanning
patterns for reading multisyllable words, focusing most attention on the
outer segments of the word. None of this, of course, sheds much light on
cause. For example, a weak vocabulary or poor decoding skills might
depress interest in ‘‘reading a lot,’’ so that print exposure is a result of
reading skill, and any ‘‘causal’’ link to spelling is indirect.
Research on Children
Waters, Bruck, and Malus-Abramovitz (1988) adapted this research design
to study children. They set up five types of spelling lists suitable for chil-
dren age 8 to 12 years (see pp. 172–173). Precise descriptions of each type
were provided. Here they are in slightly reduced form:
Regular words. ‘‘Must contain spellings that directly reflect the surface
phonology of the word, and which can be derived through the application
of sound-spelling correspondences. Sounds have few spelling alternatives’’
(original emphasis). Later in the article, they referred to regular words as
Shankweiler, and Liberman’s word lists, but there were also flaws in the
word lists used by Waters and associates, making it difficult to interpret
the data. The most straightforward description of the word lists is that
they increase in difficulty with each level as they depart from a surface
phoneme-grapheme relationship.
This study involved 158 children at four grade levels (3 through 6),
consisting of the top and bottom thirds in spelling performance. Poor
spellers were also very poor readers. Results for all groups combined
(main effects) showed that high-frequency words were easier to spell than
low-frequency words (frequency in print), Regular words were spelled
more accurately, Regular* and Orthographic levels did not differ from
each other, and the remaining levels differed significantly from each other
in the expected direction. Grade level was also significant.
However, while poor spellers did worse overall, there was no differ-
ential impact as a function of the types of spelling words. This result is
particularly interesting, not only for purposes of this study, but because a
stage model would predict that these spelling ‘‘levels’’ would be acquired
by good and poor spellers at different times or rates. This was one of three
requirements of a stage model as specified by Nunes, Bryant, and Bindman
(1997).
The patterns across age were very revealing. First, good spellers made
systematic gains across all spelling levels between grades 3 and 4, while
gains for poor spellers during this time period were virtually nil. Second,
the size of the gains was comparable for all categories of spelling words
with the exception of ‘‘Strange words.’’ Third, not only did poor spellers
score far below good spellers on every test, but they made proportionately
fewer gains at every grade level. They are late out of the starting gate and
run more slowly as the race progresses.
This is more evidence against developmental stages (not that Waters
and colleagues were interested in stages). A stage model would predict
variations in acceleration as a function of age and the category of words
the child was asked to spell, assuming these categories were meaningful.
Instead, there were similar rates of improvement across all levels.
Waters and associates also compared the performance of the same
children on a spelling-recognition task. The children had to choose the
correct spelling from three foils (trane, train, trayn). Poor spellers had a
Summary
Poor spellers are more likely to have a limited vocabulary and weak
decoding skills, and do not read as much as good spellers do. However, as
a group they are not differentially worse than good spellers on tests
category ought to disappear. And if it did, the only category left would be
Correct spelling!
Treiman discovered that letter-name knowledge can have a negative
impact on children’s spelling accuracy. A study was designed to find out
whether children knew the letter names in their own name, and whether
this would generalize to knowing letter-sound correspondences (Treiman
and Broderick 1998). Kindergarten and first-grade children were asked to
identify letters in the alphabet by name and by sound. Children knew far
more letter names than sounds. Only the letter name for the initial letter
in the child’s first name was consistently (and significantly) likely to be
known, and this was true for both age groups. However, knowing this
letter name did not ensure that these children knew the first sound in their
names.
The second experiment was a replication of the first, with a writing
component added. Preschoolers (4:3 to 5:9 years old) were tested for
letter-printing accuracy, plus letter-name recognition. Scores were sig-
nificantly higher for the letter that began the child’s first name, but, again,
this did not lead to knowledge of the sound the letter stood for. Despite
the fact that young children are very familiar with the shape and ‘‘name’’
of the first letter of their name, knowing this provides no clue to the
sound it represents even by first grade. The supposed ‘‘generalizability’’
effect has been the only argument for the importance of teaching letter
names.
Treiman and Tincoff (1997) designed a special spelling test to pin
down letter-name spelling errors in multisyllable words. Kindergartners
and first graders were asked to spell nonsense words like tuzzy, tuggy, and
tuzzigh. The first word ends in a letter name—zee—and the others do not
(ghee and zigh). They found that children commonly spelled open syllables
(CV-CV) with single consonant letters (b, z, d, g). These ‘‘letter-name’’
errors were three times more likely to appear in the kindergartners’ spell-
ings than in first graders’ spellings. This shows that letter names are
something a child has to unlearn to be able to spell. Treiman and Tincoff
observed that the error patterns showed that children were processing
words at the level of the syllable, not the phoneme, and they were match-
ing letters and sounds at the level of the syllable as well. They emphasized
the significance of this fact: ‘‘These letter-name spellings reveal that the
alphabetic principle is fragile for beginning spellers. Children find it dif-
eralized to the sounds inside the names. Learning letter names was no
more beneficial than memorizing names of geometric shapes or cartoon
characters! In fact, it was no more beneficial than being taught nothing
at all.
The message is clear: Discourage and eliminate the use of letter names and
encourage the teaching of phoneme-grapheme correspondences.
10
THE MANY-WORD PROBLEM:
MORE TO SPELLING THAN MEETS THE I
The studies reviewed in the previous chapter failed to show either how
children learn to spell, or why some children fail and others do not. The
concept of stages of spelling development is untenable on both logical and
scientific grounds. The notion that poor spellers lack some linguistic
knowledge or insights that would allow them to access ‘‘orthographic
rules’’ or ‘‘morphological levels’’ of the spelling code has not been
substantiated. Poor spellers do worse across the board and have the most
difficulty with rare, irregularly spelled words that appear infrequently in
print—words whose spelling patterns do not reflect ‘‘orthographic rules’’
or ‘‘morphological structure.’’ The most parsimonious explanation of the
poor spellers’ problem is lack of exposure to print. I address the reasons
this might be the case later.
One of the major problems with these research efforts is that they
reflect two main sources of confusion. The first relates to these funda-
mental questions: What processing skills are involved in mastering our
spelling system? Is it phoneme awareness, knowledge of phoneme-
grapheme correspondences, rote visual memory, memory for redundant
visual patterns, or all the above? The second source of confusion is igno-
rance of the structure of the English spelling code. Unless we can get
beyond these basic holes in knowledge, we are unlikely to make much
headway toward the ultimate goal, which ought to be figuring out how we
teach children to spell. The question ‘‘What’s wrong with people who
can’t spell?’’ is, after all, a rather foolish question when researchers do not
know the structure of the spelling code or how to teach it.
An analysis of the English spelling code was provided in chapter 3.
This showed that the spelling code contains multiple components or
The initial research (Stanovich and West 1989) was designed to look
The students were divided into skilled and less skilled readers on the
oped a new test called the Title Recognition Test (TRT). The children
Deaf and hearing children find legal spelling patterns much easier to
remember and spell. Not only this, but deaf children’s visual rote-memory
skills are not superior to those of hearing children. It appears that deaf and
hearing children alike rely on both types of visual memory—pure rote
memory and memory for redundant orthographic patterns—and they do
so to the same extent.
A qualitative analysis of spelling errors on this task revealed that
hearing children made mainly phonological errors, and deaf children
hardly any. Their errors were visual and included letter reversals, and
missing or substituted vowels and consonants. Deaf children made vowel
and consonant errors in the same proportion, whereas the hearing chil-
dren’s errors mostly involved vowel spellings.
This is strong evidence that profoundly deaf children learn to spell by
relying on the statistical properties of visual spelling patterns. These chil-
dren had no phonological skills and no special advantage in rote visual
memory. Yet they were able to spell nearly as well as grade 5 readers on
some tasks. The other important result was that the hearing children were
using a combination of three different skills: phonological processing for
phoneme-grapheme relationships, pure rote visual memory, and memory
for the statistical probabilities of ‘‘orthographic’’ patterns in words. It
is this combination of skills that allows us to remember words and spell
them accurately, and that accounts for why spelling improves with age. As
Aaron pointed out, deaf children are at a distinct disadvantage: ‘‘A lack
of acoustic phonological skills sets an upper limit at about the fourth
grade level, beyond which spelling skills may not progress. This may be so
because beyond a certain level, children encounter many cognate verbs,
adverbs, and multi-syllable words whose spelling is influenced by mor-
phophonemic conventions’’ (p. 18).
A related finding on a very different population of children was
reported by Siegel, Share, and Geva (1995) in Canada. They tested 257 poor
readers who scored below the 25th percentile on the WRAT reading test,
and 342 normal readers who scored above the 35th percentile.
The task was a spelling-recognition exercise: look at pairs of nonwords
and select the one that looked ‘‘most like a word.’’ The children were also
tested on the Word Attack subtest of the Woodcock. The normal and poor
readers differed on the Word Attack test in the expected direction. But the
poor readers were superior (statistically) on the spelling-recognition task,
and the stored phonological representation of the word. The other path
involves translating from letters to individual phonemes and assembling
them into a word. The translation is said to proceed via grapheme-phoneme
correspondence rules (GPC rules). This route bypasses semantic memory
and links to the output phase just prior to ‘‘saying’’ the word. GPC rules
are the decoding counterpart of phonics-type spelling rules.
The concept of GPC rules had its roots in the work of Chomsky and
Halle (1968), who proposed that phonology and morphology were linked
by a set of correspondence rules. If a rule failed in a particular instance,
this violation would be marked (presumably by the brain) with a ‘‘blocking
rule’’ (‘‘BUT NOT’’). Venezky’s (1970, 1999) attempt to discover ‘‘rules’’
of the spelling-to-sound code (chapter 3) fits this line of thinking. The
dual-route model holds that GPC rules can be revealed by a complete
analysis of printed words of the type Venezky embarked on. These rules
are presumed to be deduced by the learner (or taught) and will explicitly
guide the learning process. The dual-route model also includes the dictum
that once a GPC analysis of a word is sufficiently fast, the word is trans-
ferred to the lexical path and becomes a sight word, recognized instantly
(which, as chapter 2 showed, is impossible).
A similar idea was developed to explain the behavioral data on adult
readers and children learning to read. It was believed that people have two
(equally efficient) reading styles, reading whole words by sight (logo-
graphically), or reading phonetically. Baron and Strawson (1976) chris-
tened them ‘‘Chinese’’ and ‘‘Phoenician’’ readers. This reflects a mistaken
belief that the Chinese (or anyone else) have a logographic writing system,
and that people can learn to read by memorizing whole words as if they
were telephone numbers.
The dual-route, Chinese/Phoenician models had their detractors, but
none as important as Robert Glushko. His pioneering work is directly
linked to the latest computer models of reading based on ‘‘parallel dis-
tributed processing’’ (Seidenberg and McClelland 1989; Plaut et al. 1996).
These models mimic what is thought to occur in the brain when someone
learns to read. These are statistical models. They ‘‘learn’’ by virtue of
processing the structural redundancies in the input, along with the feed-
back from the environment about success rates.
Teachers have known it for at least two centuries as a word family, and
more recently it was rechristened the rime. I will refer to the visual repre-
sentation of the VC or VCC letter sequences in one-syllable words as the
orthographic rime and the phonological ‘‘decoding’’ of that rime as the
phonological rhyme. It is very important to keep this distinction straight, or
this research will seem even more complex than it already is.
rime, deaf, leaf, and sheaf. This selective scoring biases the error rate in the
2; and (4) whether or not the response times followed prediction. The
table illustrated some disconcerting facts about these kinds of word lists.
In the first place, it is tempting for experimenters to drop out words that
go against the prediction and retain those that do. I am not saying that this
is a conscious act or that Glushko did this; otherwise all word pairs would
have gone in the prediction direction. But he did have a better hit rate for
the reused word pairs than for the new word pairs he created: 16 out of
26 old pairs going with the prediction (62 percent), versus only 3 out of
7 new pairs (43 percent). The response-time data represent an additional
concern, because the values for the same word pairs fluctuate wildly from
experiment 1 to experiment 2.
Glushko himself appeared to have considerable doubts about his clas-
sification scheme. In the third experiment, he reclassified words into three
categories. He used the word have to explain how this worked. Have is both
an exception word and inconsistent, in that the spelling ave is only decoded
/av/ in this word, whereas it is usually decoded /aev/ (cave, gave). Thus, have
is the only word with this pronunciation. The word gave, on the other
hand, is no longer regular by the old scheme, but inconsistent by virtue of
the existence of have. Another group of words were both regular and con-
sistent, having lots of neighbors and no competitors. Two sets of consistent
words were derived from the exception and inconsistent words by changing
a single consonant (haze and wade), and these became the control words.
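The three-way classification just described can be sketched as a lookup over words that share an orthographic rime. Everything in the example is an illustrative assumption: the seven-word lexicon, the rime transcriptions (written in the notation used above, with /aez/ added by analogy), and the rule that the most common pronunciation of a rime counts as the ‘‘regular’’ one. It is not Glushko’s procedure.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (orthographic rime, rime pronunciation).
LEXICON = {
    "have": ("ave", "av"),
    "gave": ("ave", "aev"),
    "cave": ("ave", "aev"),
    "wave": ("ave", "aev"),
    "haze": ("aze", "aez"),
    "gaze": ("aze", "aez"),
    "maze": ("aze", "aez"),
}

def classify(word, lexicon=LEXICON):
    """Label a word by how its rime is pronounced across its neighbors."""
    rime, sound = lexicon[word]
    # Tally every pronunciation of this rime found in the lexicon.
    sounds = Counter(s for r, s in lexicon.values() if r == rime)
    if len(sounds) == 1:
        return "regular and consistent"
    dominant = sounds.most_common(1)[0][0]
    if sound == dominant:
        return "regular but inconsistent"
    return "exception and inconsistent"

for w in ("have", "gave", "haze"):
    print(w, "->", classify(w))
```

Note that a single word such as have is enough to move every other -ave word out of the consistent category, which is why the classification depends so heavily on which words the experimenter chooses to count.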
In general, Glushko’s predictions for this complex set of contrasts were
borne out statistically, but it is doubtful that the categorization process fits
his activation theory, or that the findings are even supported by the data.
First of all, the pairs of regular/consistent control words (haze, wade) were
supposed to be read at more or less the same speed, having been derived
from the same word (gave), both having lots of neighbors. But only 66
percent of these word pairs were processed at comparable speeds, when all
should have been.
Furthermore, because the activation theory predicts that neighbors
will automatically be activated together and boost the likelihood of a par-
ticular reading of a word, regular/inconsistent words like wave (lots of
consistent neighbors) ought to be read more quickly than an exception
word like have (all neighbors inconsistent). This, after all, is the basis for
the consistency effect. Yet reaction time was 492 ms for have and 528 ms
for wave. Similarly, the exception word love took 472 ms to read, and the
At this point, I must address the problem created by the notion that
Juel and Solso
Glushko’s article was followed by a more quantitative attempt
using bigram (letter-pair) frequencies in a series of studies by Juel
and Solso (Solso and Juel 1980; Juel and Solso 1981; Juel 1983). They
provided an informative and thoughtful analysis of the problem, though
this was largely a failed attempt because bigram frequencies are a very bad
fit to the spelling code. In 60 percent of the words in their lists, at least
one bigram unit was discordant with the word’s phonology.
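One plausible reading of ‘‘discordant’’ is that a letter pair cuts across a multi-letter spelling such as sh or tch. The sketch below lists the bigrams in a word and flags those that split such a spelling. The segmentations are supplied by hand, the function names are hypothetical, and this is not necessarily the criterion applied to Juel and Solso’s lists.

```python
def bigrams(word):
    """All adjacent letter pairs in a word, left to right."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def splitting_bigrams(graphemes):
    """Bigrams that cut across a multi-letter grapheme.

    `graphemes` is a hand-made segmentation such as ["sh", "i", "p"].
    """
    # Record which grapheme each letter of the word belongs to.
    owner = []
    for index, grapheme in enumerate(graphemes):
        owner.extend([index] * len(grapheme))
    word = "".join(graphemes)
    split = []
    for i in range(len(word) - 1):
        left, right = owner[i], owner[i + 1]
        crosses = left != right
        multi = len(graphemes[left]) > 1 or len(graphemes[right]) > 1
        if crosses and multi:
            split.append(word[i:i + 2])
    return split

print(bigrams("ship"))
print(splitting_bigrams(["sh", "i", "p"]))
print(splitting_bigrams(["c", "a", "tch"]))
```

In ship the pair hi joins the second letter of one spelling to the next spelling, so a count of hi says nothing about any phoneme in the word.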
Juel and Solso’s analysis did go some way toward identifying impor-
tant structural elements of the spelling code. They identified two types of
information contained in the spelling patterns. Their language has been
changed to some extent to be consistent with the use of terms in this book.
These categories tie letter sequences to positions within words, but do not
specify what the reader does with them. This led to a third orthographic
category:
As can be seen, this is essentially a visual logic, and does not include the
probability structure of the spelling code from the phoneme out. Thus,
there are two missing variables on the list. One is the frequency of occur-
rence in print of a particular spelling for a particular phoneme. The other
is the probability or likelihood that spelling alternatives for the same pho-
neme occur in a large corpus of words. This problem was never solved in
future research, and remains unsolved as this book goes to press.
There were several interesting findings in this work. Juel (1983)
reported that the probability that a bigram consistently represents a pho-
neme was much more important for beginning readers (second graders).
Orthographic versatility was more important to older children (fifth
grade) and adults, and they read ‘‘versatile’’ letter pairs (bigrams that
appear in lots of positions in lots of words) much faster. This result seems
counterintuitive: one would expect positional knowledge of spelling
patterns (chat, catch) to be extremely important and to improve with age,
not positional stability to become less beneficial. However, this may be
an artifact of the bigram technique and of their word lists.
After investigating the instructional implications of these findings,
she reported that first graders taught by a letter-to-sound method learned
the letter-sound relationships faster for ‘‘versatile’’ letter pairs (those that
appear in many slots in words) than for nonversatile letter pairs. Juel
(1983, 325) had this to say:
children were asked to read real and nonsense words, and were scored for
only enemy is spook. The cow in cown and how in hown may be responsible
for the children’s preference for /ou/ versus /oe/ readings, which garnered
nearly 100 percent of responses.
In analyzing these responses, Zinna and colleagues concluded that
‘‘the influence of the initial segment appears to account for most of the
variability’’ (p. 474). They suggested that because people read from left
to right, the first sequence activates a word: prou = proud instead of
proup = soup. But this explanation does not hold up consistently, not even
for proup, which was read proop 50 percent of the time. In short, children’s
decoding did not obey either the CV onset or the VC ‘‘rime,’’ and seemed
to involve the whole word.
And they were quite clear about this direction of the relationship: ‘‘The
consonant that follows the vowel helps to specify its pronunciation’’
(p. 108). They referred to this as a ‘‘special dependency.’’
These statements are troubling, because the goal is so circumscribed,
suggesting a lack of objectivity in the quest to classify orthographic
structure.
The article was in three parts. The first was a tabulation of the fre-
quency count of the ‘‘neighbors’’ of the orthographic units: C, V, C, CV,
VC in 1,329 CVC words. These units were analyzed correctly, in that
each phoneme could be represented by one to four letters: sit, soap, sight,
The results for individual graphemes paint a rather bleak picture of the En-
glish writing system. . . . English is not very regular. For vowels especially,
a single grapheme often maps onto several phonemes. . . . If we incorporate
large orthographic and phonological units into our description of the English
writing system, however, the picture becomes more encouraging. The pro-
nunciation of orthographic units that contain a vowel grapheme and a final con-
sonant grapheme are more consistent than the pronunciations of single vowel
graphemes. (p. 112; emphasis added)
There are three problems with this and the statement above. First, it is no
more likely that the VC consistency is due to the consonant controlling
the vowel pronunciation, as the authors claim, than to the vowel controlling
the consonant spelling (co-occurrence is not cause). Second, these
1,329 common CVC words constitute less than 1 percent (about 0.7 percent)
of the 200,000 words in a college dictionary, scarcely a sufficient number
of words to advocate incorporating larger orthographic and phonological
units in descriptions of our writing system. Third, there is no evidence for,
and considerable evidence against, the assumption that people’s behavior will
follow this particular statistical pattern in these particular words, as earlier
results have shown. The obvious next step was to find out if it did. Students
from two universities (27 from Wayne State and 30 from McGill)
were asked to read the 1,329 CVC words as quickly and accurately as
possible.2
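As a quick check on the scale involved, the share of the dictionary covered by the CVC word list can be worked out directly from the two figures quoted in the text (1,329 words and a 200,000-word dictionary). The variable names below are illustrative only.

```python
# Share of a 200,000-word college dictionary covered by the 1,329 CVC words.
cvc_words = 1_329
dictionary_words = 200_000

share = cvc_words / dictionary_words   # proportion between 0 and 1
share_percent = 100 * share            # the same value expressed as a percentage

print(f"proportion: {share:.4f}")
print(f"percent:    {share_percent:.2f}%")
```

The word list is well under 1 percent of the dictionary, which is the point of the objection.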
I will present the results that were consistent between the two student
groups. Unless results can be generalized across similar populations, they
have no validity. Of the 42 variables entered into a regression analysis,
only those shown in the accompanying table accounted for significant
2. Reaction time (voice-onset time) and error scores were averaged across
subjects for each word, for the two groups of students separately. Thus, there
were a total of 1,329 reaction-time scores, one for each word. These were used
in a multiple-regression analysis to look at the relationship between the speed
involved in saying a word and the various properties of that word. In effect,
‘‘words’’ were substituted for subjects in the statistical analysis, and all the
‘‘variance’’ (variability) came from the words and not from the students. This
is a very unusual procedure.
amounts of variance for both the Wayne State and McGill students. These
The table illustrates two things. First, the figures in the bottom row reflect
a serious problem with the technology used to measure voice-onset la-
tency. Variation in the physical act of producing the initial phoneme in a
word accounted for the largest amount of variance in voice-onset time and
was quite different for the two groups. In essence, this is a huge source of
noise in the data. As such it should have been subtracted or set aside, yet
the authors added it in their analysis.
Second, whatever it is that accounts for the speed it takes to read a
word, this study has not found it. Word frequency or familiarity will not
play a big role in college students’ ability to read high-frequency, simple
CVC words, and we see that it does not here. But even so, it accounted for
more variance than anything else that was measured. The number of let-
ters per phoneme and the consistency rating for the initial consonant both
accounted for more variance than the VC rime, which accounted for less
than 1 percent in the Wayne State data and 1.3 percent for the McGill
students. Although these tiny values were ‘‘significant’’ because the
1,329 words acted as ‘‘subjects’’ in the study, they have no practical
significance.
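The words-as-subjects procedure can be made concrete with a minimal sketch. It assumes one voice-onset time per student per word, averages across students to get one score per word, and then relates that score to a single word property. All trial values, the choice of predictor (number of letters), and the function names are invented for illustration; the original analysis used 42 variables in a multiple regression, not the single-predictor calculation shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trials: (student, word, voice-onset time in ms). Values are invented.
trials = [
    ("s1", "sit", 480), ("s2", "sit", 500),
    ("s1", "soap", 510), ("s2", "soap", 530),
    ("s1", "sight", 540), ("s2", "sight", 560),
    ("s1", "beach", 520), ("s2", "beach", 550),
]
# Hypothetical word property used as the single predictor: number of letters.
letters = {"sit": 3, "soap": 4, "sight": 5, "beach": 5}

def item_means(trials):
    """Average reaction time across students, giving one score per word."""
    by_word = defaultdict(list)
    for _student, word, rt in trials:
        by_word[word].append(rt)
    return {word: mean(rts) for word, rts in by_word.items()}

def r_squared(xs, ys):
    """Proportion of variance in ys accounted for by a straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

means = item_means(trials)
words = sorted(means)
xs = [letters[w] for w in words]
ys = [means[w] for w in words]
variance_explained = r_squared(xs, ys)

# The number of observations equals the number of words, not the number of students.
print(len(ys), "observations;", f"R^2 = {variance_explained:.3f}")
```

Because the unit of analysis is the word, a list of 1,329 words supplies 1,329 observations, and even a very small share of variance can pass a significance test.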
Treiman and colleagues reached the following conclusion:
In all other cases there was little or no connection. This means that the
so, one would imagine that everything had been nailed down, and that all
the head scratching that Glushko suffered through was over and done
with. Not so.
occur), and the fact that phonology is wrapped into the process—a process
3. In the definitive experiment, the following constraints had to apply for the
‘‘orthographic rime theory’’ to work. Words were selected that fit two main
categories: words with high-frequency friends (rime segments that appear
often in a lot of words) and words with low-frequency friends. They were
divided further into words with consistent and inconsistent rimes (one possible
decoding versus more than one). The inconsistent words were split further
into words with high-frequency enemies (rimes with different pronunciations
that appeared often in print), and those with low-frequency enemies (infre-
quent in print). Each word was then matched to a control word on everything
except consistency. Also controlled were a variety of other measures such as
word frequency and word length.
More Worms
We need to put Jared, McRae, and Seidenberg’s results in perspective,
especially because they underpin the rationale for the latest computer
models on reading from this group (Plaut et al. 1996). It is insufficient to
argue merely that these studies did or did not prove a consistency theory
about word-recognition speed and accuracy, because all the constraints
that made this statement possible should be part of the equation. An ac-
curate statement of what happened in this study follows.
Given words of equal frequency in print, all one syllable long and
containing the same number of letters, the same bigram-frequency count,
and the same orthographic-error score, consistent words (rimes pro-
nounced only one way) are read faster and more accurately than inconsis-
tent words (rimes pronounced more than one way) only if the inconsistent
words have enemies seen frequently in print. This effect is most marked if
the frequency of friends is low as well. This result is not affected by the
simple tally of friends or enemies.
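The bookkeeping behind this statement can be sketched as follows. The sketch assumes a tiny lexicon in which each word carries its rime spelling, a code for how that rime is pronounced, and a print-frequency count. The pronunciation codes, the frequency values, and the function name are all hypothetical; they are not taken from the word lists used in the study.

```python
# Hypothetical mini-lexicon: word -> (rime spelling, rime pronunciation, print frequency).
# Pronunciation codes and frequency counts are invented for illustration.
LEXICON = {
    "treat": ("eat", "eet", 40),
    "feat":  ("eat", "eet", 5),
    "heat":  ("eat", "eet", 60),
    "great": ("eat", "aet", 300),
    "wave":  ("ave", "aev", 30),
    "save":  ("ave", "aev", 50),
    "have":  ("ave", "av", 1000),
    "soap":  ("oap", "oep", 20),
}

def rime_profile(word, lexicon=LEXICON):
    """List a word's friends and enemies and sum their print frequencies."""
    spelling, sound, _ = lexicon[word]
    friends, enemies = [], []
    for other, (o_spelling, o_sound, _freq) in lexicon.items():
        if other == word or o_spelling != spelling:
            continue  # only words sharing the rime spelling are neighbors
        (friends if o_sound == sound else enemies).append(other)
    return {
        "friends": sorted(friends),
        "enemies": sorted(enemies),
        "friend_frequency": sum(lexicon[w][2] for w in friends),
        "enemy_frequency": sum(lexicon[w][2] for w in enemies),
        "consistent": not enemies,  # consistent = rime pronounced only one way
    }

print(rime_profile("treat"))
print(rime_profile("soap"))
```

In this toy lexicon, treat has two friends but one high-frequency enemy (great), which is the condition under which the consistency effect was reported. Soap has no neighbors at all, so a rime-based account has nothing to say about it.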
There are two ways to look at this outcome. Here is the authors’
version, presented in light of the fact that they controlled all the relevant
variables listed above: ‘‘It is very likely that the observed effects are due to
the one stimulus property that systematically differed between groups:
consistency of spelling-sound correspondences’’ (p. 707).
I was more struck by the fact that as the controls kept mounting, the
corpus of words became so small that the investigators had to repeat word
lists twice to have enough words to do the study. Thus, even if these ef-
Words that are similarly spelled and similarly pronounced (e.g., rhyming
neighbors such as FEAT and TREAT) have similar effects on the weights:
therefore exposure to one word improves performance on the other. . . .
Words that are orthographically similar but phonologically dissimilar have
mutually inhibitory effects: training on TREAT has a negative impact on the
weights relative to GREAT and vice versa. The net effect of the entire en-
semble of learning experiences is poorer performance on inconsistent items
compared to entirely consistent ones. ( Jared, McRae, and Seidenberg 1990,
709)
Throughout this paper we have assumed, as others have before us, that pro-
nunciation is largely determined by properties of neighborhoods defined in
terms of word bodies. . . . However, there are some words in English that have
word bodies found in no other words (e.g. SOAP is the only OAP word). . . .
Performance on SOAP, for example, may be affected by exposure to words
such as SOAK and SOAR. Therefore, our studies should not be taken as
indicating that only neighborhoods defined in terms of word-bodies are rele-
vant to naming. (p. 711)
But what if there is a simpler explanation? What if the word soap is read
/s/ /oe/ /p/
For some reason the alphabet code has gone astray in this research!
results than the one they provided. This has to do with the fact that the
words in their consistent and inconsistent lists contained different vowels
and unequal ratios of the same vowels. When words on these two lists are
balanced for counts of the same vowel sounds, and their spellings are
examined, rime-based explanations fall apart, a problem similar to the one
with Treiman et al.’s word lists.
My analysis was extensive, and I provide one small example here,
using the words from experiment 3 ( Jared, McRae, and Seidenberg 1990,
698). The words on the consistent and inconsistent lists had only 6 vowels
in common out of a total of 19 vowels. The vowels on the consistent lists
were more likely to be ‘‘simple’’ (short) vowels: /a/, /e/, and /i/. That is,
they tended to be the vowels that had the fewest spelling alternatives and
that were the easiest to decode. As to what this has to do with rimes or
rhymes, the answer is nothing. But we can probe deeper. The alleged
function of the orthographic rime is that the final consonant provides a
clue to the pronunciation of the vowel sound. Did the final consonant
help in these word lists?
We can take as an example the /oe/ sound, which was used the most
on any list, for a total of nine words. Here is a list of the /oe/ words from
the ‘‘consistent’’ list on the left, with the total number of words that are
read consistently ( friends) in brackets. They are paired with words on the
‘‘inconsistent’’ list—that is, words with different pronunciations of the
same orthographic rime (enemies), with the number of enemies in brackets.
Counts are based on a corpus of 3,000 words.
space’’ in which, at any given time, the pattern of activity across all units
engaged in processing corresponds to this single point. This is similar to
a ‘‘node’’ in a network that contains all the information about a particular
input-output relationship. Furthermore, ‘‘A set of initial patterns that
settle to this same final pattern corresponds to a region around the at-
tractor, called its basin of attraction. To solve a task, the network must
learn connection weights that cause units to interact in such a way that the
appropriate interpretation of each input is an attractor whose basin con-
tains the initial pattern of activity for that input’’ (p. 82). Whether neural
networks behave like attractors or work by superposition and Fourier
analysis/synthesis (Pribram 1991), or both, is a problem for the future.
What is interesting now is how well the attractor model mimics human
performance. For one thing, it can ‘‘decode’’ nonwords never seen before
just as well as humans do. This had been a failing of the earlier models.
Another bonus of this model is that it can be directly compared to the
reaction-time data. Previous models could only be compared to error
scores.
Two properties of the model are of greatest interest here. The first is
that the mutual redundancies of orthographic ‘‘rimes’’ and phonological
‘‘rhymes’’ are mapped together in an attractor basin. The authors explain
it this way: ‘‘The reason is that, in learning to map orthography to pho-
nology, the network develops attractors that are componential—they have
substructure that reflects common sublexical correspondence between or-
thography and phonology. This substructure applies not only to most
words but also to nonwords, enabling them to be pronounced correctly’’
(p. 83; original emphasis).
The second property of interest is that an attractor model can be sys-
tematically and continuously degraded until its ‘‘boundary state’’ for
mapping a particular feature(s) of the input begins to collapse. This allows
for a direct measure of each feature’s robustness, as well as of which fea-
tures work together to produce a correct outcome (the correct pronunci-
ation of a word).
The model (a computer program) was set the task of learning how to
read 3,000 words. These words varied in things like neighborhood consistency
(friends/friends + enemies), word length, number of phonemes in
the word, and a measure of word frequency. The results for the time it
took to learn to read a word showed that all four factors accounted for
Figure 10.1
The amount of activity in the attractor network (boundary state) for initial consonants (onset),
vowel, and final consonants (coda) necessary for the computer to read words correctly. In words with
regular spellings (trip) all phonemes are context free (decoding is not influenced by surrounding spellings).
In words with irregular spellings (dead/read, break/beach) the vowel cannot be decoded independently of
the surrounding consonants. From Plaut et al. 1996. Copyright APA. Reprinted with permission.
These studies provide strong support for a third way to decode print in
addition to knowledge of phoneme-grapheme correspondences and rote
visual memory. This third way is a gift of the brain’s capacity to automat-
ically encode the statistical redundancies of any kind of pattern. The third
way was first proposed by Glushko, and has been operationalized mathe-
matically in research by Plaut et al. Aaron and colleagues provided further
proof in their discovery that deaf children rely on visual statistical redun-
dancies to read and spell.
We have come full circle. Chapters 1 and 2 showed that knowledge
of the English spelling code is the key to understanding how English-
speaking peoples learn to spell and how they decode (read) orthographic
patterns. This knowledge is as important for designing spelling and
It is our good fortune that the National Reading Panel’s fishing expedition
led to such a harvest of treasures. We now have precise knowledge about
how to teach every child to decode, read fluently, and comprehend what
they read. No child needs to be left behind. This is exciting news indeed.
It has been long in coming. We have been held up by two major road-
blocks: the difficulty of unearthing properly controlled research on suc-
cessful programs amidst the vast wasteland of publications on this topic,
and the difficulty of identifying the basic elements of reading instruction
that consistently lead to success. The NRP has gone some distance toward
solving the first problem. And I hope this book will help in solving the
second.
The NRP’s analysis showed that research on reading instruction is
still being cast in the phonics versus whole-word (‘‘reading wars’’) frame-
work. In virtually all 66 ‘‘cases’’ in the NRP database, the control group
was taught with a whole-word method. These methods were identified
in the NRP tables as ‘‘whole word,’’ ‘‘basal,’’ ‘‘whole language,’’ or
‘‘unspecified’’—meaning ‘‘regular classroom program’’ (i.e., whole word).
Few studies contrasted two types of phonics program.
It is time to move on. The goal of this chapter is to help us do that. First,
I want to summarize the important discoveries presented in this book, and
second, I would like to make some suggestions for how to move reading
research into the twenty-first century. I will begin with the prototype.
This is the first set of objective guidelines for reading instruction based
on the historical and scientific evidence. They establish the ground rules,
though they are still incomplete. Much more research is needed to pin
down specific details, like the order (if any) in which phonemes should be
taught, how fast they should be taught, what type of lessons work best,
and which materials, special exercises, and activities matter, and which do
Chapter 11
not.
1. Make sure the complete structure of the writing system has been
worked out (or thoroughly understood) before a method of instruction is
developed.
2. Teach the specific sound units that are the basis for the code. (Do not
Visual Phonics The short version (a) teaches the 26 ‘‘sounds’’ of the
26 letters of the alphabet. The long version (b) teaches the 27 to 200+
‘‘sounds’’ of the letters, digraphs, and phonograms.
We can rule out ‘‘junk phonics’’ at the outset based on the evidence
from the NRP report. Children taught with basal-reader phonics gener-
ally did poorly compared to children taught with other types of phonics
programs. We can eliminate whole-to-part and multisound phonics pro-
grams as well, based on the historical evidence and on their dismal show-
ing in the NRP report.
This leaves visual phonics and linguistic phonics. There is consider-
able confusion about the nature of these two types of programs, especially
the distinction between teaching from the letter versus teaching from the
phoneme, and the issue of which one is ‘‘synthetic’’ phonics. The NRP
reading committee, along with most U.S. phonics advocates, either do not
recognize this distinction or do not know it exists.
Let’s clarify the distinction once more. Linguistic phonics anchors the
code in the sounds of the language, and bases initial reading instruction
on a basic code. This avoids several pitfalls created by our highly opaque
spelling system. The code is temporarily set up with a manageable number
of units (40+ phonemes), and this number does not change as the lessons
progress. Children see that an alphabet code works from a finite number
of speech sounds to their individual spellings. They observe that the code
can reverse (all codes are reversible mapping systems by definition). Later,
spelling alternatives can easily be pegged onto the system without chang-
ing the logic: ‘‘There’s another way to spell this sound. I’ll show you some
patterns to help you remember when to use this spelling.’’
Visual phonics does the opposite. Because letters and letter sequences
(digraphs, phonograms) drive the code and phonemes do not, this deprives
the code of any anchor or endpoint. As the number of ‘‘sounds’’ begins to
expand (there are over 250 ‘‘sounds’’ that letters ‘‘can make’’), there is no
way to get back to the 40 phonemes that are the basis for the code. The
child is soon awash in letter patterns and there is no discernible structure,
no limit, no logic, and no pivot point around which the code can reverse.
But what is ‘‘clear’’—and clear to whom? And what is ‘‘major’’? In the first
Given that at least five programs reviewed by the NRP follow all or
nearly all of these guidelines (minus the complete spelling code), the dif-
ferences in the effectiveness of these programs have to do with specific
features such as pacing and depth, special activities, and curriculum mate-
rials. Research varying these details, element by element, would be ex-
tremely time consuming and tedious, but not all elements are in question.
Let’s begin with the definitive proof about what children can learn, at
what age, and in what time period, to provide a baseline or set of criteria
for what can be accomplished.
One important ‘‘rule’’ needs to be stressed. If your goal is to teach the
alphabet code, all components of the prototype must be taught at the same
time, because each component bolsters the remainder. It is counterpro-
ductive to ease children into reading by teaching these elements slowly in
disconnected and unrelated bits.
The youngest children in these studies were in the Jolly Phonics/Fast
Phonics First classrooms in the United Kingdom. They were between 4:8
and 5:0 years old at the start of training. The training period lasted from
10 to 16 weeks depending on the study, and the total number of hours
in whole-class lessons varied from 26 to 60. In these training sessions,
children learned the 40+ phonemes, their basic-code spellings, and how
to write every letter and digraph for the 40 phonemes. Children learned
to identify phonemes in all positions in a word, segment and blend pho-
nemes in words, and read, write, and spell common words. Immediately
after training, these children were 8 months above U.K. age norms on
standardized reading and spelling tests. The control groups scored exactly
at age norms. These gains increased at the second and third testings to 1
year above norms, and by the fourth and fifth grades children were 2 years
above norms on tests of decoding. Effect sizes were consistently around
1.0 or higher compared to the children learning via analytic phonics.
Now we have a useful baseline. If children this young can learn to
read and spell 40 phonemes and their common spellings in 10 to 16
weeks, certainly any older child can do the same. We also know from
Lloyd’s account of the development of Jolly Phonics that children are
eager learners and willing to work hard. Because these programs involve
only the elements in the prototype and the learning period is so com-
pressed, children see gains almost immediately. They begin reading words
after the first six sounds are introduced. This is highly motivating, espe-
Jolly Phonics Actions The interesting difference among the U.K. studies
is the shorter training time in the Scottish study (26 hours versus 50 to
60) while achieving similar gains that held up well over time. A large
part of the savings came from omitting the Jolly Phonics ( JP) actions.
On the other hand, Sumbler in Canada reported that of all the phonics
activities they assessed, the time spent learning and using JP actions pro-
duced the highest correlation to reading test scores. The problem here is
that JP actions are confounded with lessons on phoneme-identification as
well as segmenting and blending activities, as they are taught and used
simultaneously. Nevertheless, Sumbler’s data suggest that JP actions
matter, while Johnston and Watson’s data suggest they do not.
Nellie Dale, who developed the first successful linguistic-phonics pro-
gram, warned teachers to ‘‘never teach anything you have to discard
later.’’ Does this warning apply to the JP actions, or might learning them
be helpful even though they will be discarded? This is an interesting re-
search question from a number of perspectives. Lloyd believes that the
action patterns supply a ‘‘motor’’ component in a multisensory type of
learning, and this helps automate learning sound-symbol correspondences.
One could argue, as well, that the action patterns help focus attention on
the differences between the phonemes, increasing auditory discrimination.
(Children must listen carefully to know which action pattern to use.) I
pointed out in chapter 5 that the actions may help the teacher during
whole-class instruction by providing visual signals for who is participating
and who is not, or who is ‘‘getting it’’ and who is not.
1. At the end of his first week in kindergarten, my son had this tearful com-
plaint: ‘‘I’ve been in school a whole week. It seemed like a hundred years, and I
haven’t even learned to read!’’
originated with Dale. Dale had the onerous task of engaging 70 children
from the front of the room. She reported that focusing attention on the
parts of the mouth, tongue, and vocal cords that produce each phoneme
makes the phoneme easier to hear and identify, and therefore faster to
learn. It also makes it easier to tell phonemes apart (auditory discrimina-
tion improves).
Unlike the JP action patterns, speech-motor patterns are executed
spontaneously. Because there is no need to learn them, only to be made
aware of them, this will take up little extra time. Dale used these patterns
strictly for auditory support as each phoneme was taught. In the Linda-
mood program, however, the motor patterns form a major part of training
and are taught in depth. Each motor pattern for 44 phonemes is given a
special name, which children have to memorize. For example, a bilabial
plosive (/b/, /p/) is called a lip popper. The voiced plosive (/b/) is called a
noisy lip popper.2
We now have the ingredients for an interesting and important study
on whether these ‘‘add-ons’’ assist learning. A study could be designed in
which there are three groups of beginning readers all taught with the same
linguistic-phonics program. One group learns the basic program only, a
second group learns this plus Jolly Phonics actions, and the third group,
the basic program plus speech-motor patterns (à la Dale). Any benefit
from these additional components would be expected to affect learning
rate (the time it takes to master the 40+ phoneme-grapheme associations)
and automaticity (the speed of matching a letter to a phoneme and vice versa).
Other important measures, of course, would include reading, spelling, and
phoneme awareness (segmenting, blending).
The automaticity issue is extremely important. Lloyd (1992) observed
that the most persistent difference between children was the time it took
to ‘‘automate’’ the phoneme-grapheme associations. Without automaticity,
children have difficulty blending sounds into words (auditorally or from
2. Memorizing these names and which phonemes they apply to is not easy, as
anyone who has used the Lindamood program can attest. Using them during
lessons is even harder: ‘‘In this word the ‘quiet lip popper’ comes after the
‘little skinny,’ not after a ‘scraper.’ ’’
print), because they lose track of phoneme sequence and which phoneme
Reading Fluency
The research on reading fluency is a curious mix of nonsense and bril-
liance. The vast majority of publications uncovered by the NRP were
devoted to the premise that ‘‘reading a lot’’ increases accuracy, reading
speed, and comprehension. Research (or quasi-research) on this topic
turned out to be so methodologically flawed that only 14 studies out of the
1,000 or so screened by the panel had any merit. These studies showed
essentially nothing. There is a central problem with this premise, because,
without a highly sophisticated research design, there is no way to account
for reading habits, and no way to tell whether children read more because
they are fluent readers, or whether they turn into fluent readers because
they read a lot. The best research on this topic has been provided by
Stanovich and West (1989), and even this study could not solve all the
research-design problems. This study and similar studies are reviewed in
Language Development and Learning to Read.
Fortunately, the ‘‘reading a lot’’ issue is a red herring, because there is
a much better way to ensure that children read fluently, accurately, and
with understanding. This is through the technique of rereading. I want to
provide a brief summary of this work along with suggestions for future
research.
Due to the large individual variation in the rate at which the
phoneme-grapheme associations become automated, a small portion of
children have difficulty decoding at sufficient speed to comprehend what
increase from one story to the next. If a criterion reading speed is set, this
should be achieved more quickly with each new story. Accuracy and com-
prehension should improve with each new story as well.
I have provided a list of suggestions, based on the data, to help ensure
that rereading is a success. Here they are again. Target reading speeds
should be set well above the child’s initial reading speed. So far, no one
has failed to achieve targets 50 to 60 wpm higher than the child’s baseline. It is
my personal view that targets should be reset each time the child’s reading
speed improves by some critical amount. The final goal should be to have
the first reading of a story at a normal or superior rate for the child’s age. At
this point, rereading exercises can cease.
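The arithmetic behind these suggestions is simple enough to sketch. The example assumes reading speed is measured as words read divided by elapsed minutes, and that the target is the baseline plus a fixed gain of 50 wpm (the lower end of the range mentioned above). The passage length, timing, age norm, and function names are hypothetical.

```python
def words_per_minute(word_count, seconds):
    """Reading speed for one timed reading of a passage."""
    return word_count * 60 / seconds

def target_speed(baseline_wpm, gain_wpm=50):
    """Target set a fixed amount above the child's baseline speed."""
    return baseline_wpm + gain_wpm

def rereading_can_cease(first_reading_wpm, age_norm_wpm):
    """Stop once a FIRST reading of a new story reaches the norm for the child's age."""
    return first_reading_wpm >= age_norm_wpm

# Hypothetical child: a 120-word passage read in 3 minutes on the first attempt.
baseline = words_per_minute(word_count=120, seconds=180)
target = target_speed(baseline)

print(f"baseline: {baseline:.0f} wpm, target: {target:.0f} wpm")
print("finished:", rereading_can_cease(baseline, age_norm_wpm=100))
```

Resetting the target amounts to calling `target_speed` again with the child's new first-reading speed as the baseline.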
Children need multiple rereading experiences (across many stories),
not just a few. Dowhower’s time frame of about 7 hours of work over 7
weeks was effective and seems optimal. Time pressure was not excessive,
and sessions were not spread so far apart that there were no carryover
effects. The ultimate goal is the desired target speed on the first reading,
and this goal determines how long the rereading sessions last.
The difficulty level of the text is critically important, because speed
is tied to difficulty level. Very slow readers should start off reading passages
at or just above their reading level, not their grade level. Once reading
speed improves, stories should increase in difficulty. Passages with over-
lapping words are best for struggling readers and very young readers, and
are most likely to produce carryover (transfer) effects from one story to
the next. Overlapping context helps as well. This creates a situation in
which transfer is successful and boosts confidence and motivation to
continue.
ES = .97. But this fell substantially when standardized tests only were
The eight strategies that Block believed were crucial for improving chil-
dren’s comprehension skills and increasing progress toward these goals
New Frontiers
CVCC only) for 2,000 repetitions. The number of times each word ap-
Computer Errors?
Computer sees    Computer says    Researchers say
BLEAD, WEAD      bled, wed        bleed, weed
BOST, SOST       boast            bossed
Here are the actual probabilities (options) for making analogies to real
words:
BLEAD, WEAD:
  Analogy to onset/vowel (CV)
    as /e/   (none)
    as /ee/  bleach, bleak
BOST, SOST:
  Analogy to onset/vowel
    as /oe/  bold, both
    as /o/   Bob, bog, bond, box
  Analogy to rime
    as /oe/  ghost, host, most, post
    as /o/   cost, frost, lost
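The options for BOST and SOST can be turned into proportions with a short tally. This sketch uses only the real-word analogies listed above and treats each listed word as one equally weighted option. That weighting is an assumption made for illustration, not the way the network model computes probabilities.

```python
# Real-word analogies for BOST/SOST, copied from the list above.
analogies = {
    "onset/vowel": {
        "/oe/": ["bold", "both"],
        "/o/": ["Bob", "bog", "bond", "box"],
    },
    "rime": {
        "/oe/": ["ghost", "host", "most", "post"],
        "/o/": ["cost", "frost", "lost"],
    },
}

def reading_proportions(analogies):
    """For each unit, the share of listed analogies that favor each vowel reading."""
    proportions = {}
    for unit, readings in analogies.items():
        total = sum(len(words) for words in readings.values())
        proportions[unit] = {
            reading: len(words) / total for reading, words in readings.items()
        }
    return proportions

proportions = reading_proportions(analogies)
for unit, shares in proportions.items():
    print(unit, {reading: round(share, 2) for reading, share in shares.items()})
```

On this tally the onset/vowel analogies favor /o/ (4 of 6), while the rime analogies lean toward /oe/ (4 of 7), so neither reading of the nonword can be scored as an outright error.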
3. The authors did state that if they scored the errors on the basis of whether
any word in the original corpus of 3,000 had the ‘‘same body’’ (rime) as the
nonword, the computer’s errors reduced to 9. But the network model is based
on statistical redundancy (probability), not isolated cases, so this does not
solve the problem.
Misconceptions about sight word reading persist. One is that only irregularly
his basic findings were replicated and extended. Biemiller (1977/1978) re-
ported that second and third graders took longer to name words than
to name letters, but that the speeds were identical by fourth grade and
remained so through college, while overall naming speed increased at each
grade. Reading words in context was much faster than reading words in
isolation, and this effect appears by second grade.
Samuels and colleagues speculated that if a word is read as a single
unit (holistically), then word length would not matter (see Samuels,
Laberge, and Bremer 1978; Samuels, Miller, and Eisenberg 1979). They
found that word length (three to six letters) did have a strong effect on
naming speed at second grade, and continued to do so at higher grades
when words were over five or six letters long, but there was no word-
length effect for college students. They concluded that younger readers
process words letter by letter, whereas college students read ‘‘holistically.’’
This was interpreted as some kind of ‘‘activation’’ process: ‘‘What is ap-
parently being changed as one progresses through the grade levels is a
reduction of the contribution of activation of each letter code to the word
code’’ (Samuels, Laberge, and Bremer 1978, 719).
However, as Samuels, Laberge, and Bremer noted, ‘‘activation units’’
cannot explain how college students process all words at the same speed
regardless of length. Instead, they proposed that ‘‘there must be a tradeoff
in the activation threshold with word length, such that the letter codes of
three-letter words produce more activation than the letter codes of the six-
letter words’’ (p. 719; emphasis added).
To such complexities do our theories lead us!
I have surveyed a mere sprinkling of the many studies over the past
120 years in which researchers concluded that if you can read a word as
fast as you can name a letter, this is proof that the word has become a
‘‘sight word,’’ processed holistically, instantly, and automatically.
No one stopped to consider that the same results could reflect the
limits of motor processing (speech-output rate) across the developmental
span, not visual-processing speed. In other words, the similarity in speeds
to name letters and words may be caused by limits on the time it takes to
program an output and produce it, not how long it takes to perceive or
recognize it.
The idea that you can infer something about perception, cognition,
and brain processing from a single measure of simple response time is
extremely naive. This problem was recognized early in the twentieth cen-
A century later, we know that the brain processes all incoming sen-
sory signals but only alerts us to what is relevant after processing it. We can
ask our brain to hold something ‘‘in mind’’ if we choose to focus on it,
but usually at a metabolic cost (Pribram and McGuinness 1975). Further-
more, conscious awareness of external events can be incredibly slow,
especially when they are novel and unexpected. When President Reagan
was shot, a film recorded nearly a minute of silence before any reaction
was seen or heard (brains were churning while bodies were frozen in
inaction).
There is another brain principle at work. With experience and learn-
ing, neural processing increases in efficiency (in terms of precision, speed,
and neural organization) to the point where multiple or ‘‘parallel’’ pro-
cessing goes on automatic pilot, outside our awareness. There are in-built
properties of neural systems that allow for such rapid processing that we
can never become aware of it, no matter how hard we try. People do not
hear or feel phoneme coarticulation in speech. Yet we know it is there all
the same. We do not hear the microsecond differences in the arrival
times of sounds at each ear that allow us to localize objects in space.
Our sense that we read whole words instantly by sight via some
direct pipeline from the eye to meaning is an illusion. No matter how
much something ‘‘seems like’’ it happens instantly, it does not. Con-
scious awareness and brain processing run on different clocks. The brain
can map grapheme-phoneme correspondences, analyze patterns of ortho-
graphic redundancies, register degrees of word familiarity, perceive con-
text cues, and work out possible decodings of odd or unpredictable
spellings, in parallel. Everything else the brain does works this way. In
ordinary conversations, we map phoneme sequences into words, process
syntax and semantics, perceive quality and tone of voice, note vocal in-
flection, and watch mouth movements, facial expressions, body posture,
and mannerisms all at the same time. The brain has an amazing capacity
for ‘‘multiplexing’’ or ‘‘multitasking’’ hundreds or thousands of operations
simultaneously. Brains are redundant parallel processors even when they
do not need to be.
Your Brain Can Read Before You Know What It Has Read
We have barely
begun to scratch the surface of how the brain processes the printed page.
Recent breakthroughs have shown that readers’ expectations based on
syntax and semantics govern how the text is read, and what they look for
We propose that although the coding of structure and the coding of meaning
The author viewed the ‘‘structural frame’’ as local in the sense of the im-
mediate context. The location of a particular word that disambiguates
meaning changes the depth of the perceptual analysis in surrounding
words, even when it ‘‘appears in mind’’ before it is read and/or is expected
in the future.
This research shows that reading is far more complex and more amaz-
ing than anyone could possibly imagine. It highlights the fact that this
complexity, which, in these studies, required visual search, decoding, the
processing of meaning, and syntactic analysis, is also accompanied by an
attempt to carry out a global structural analysis of every sentence and
anticipate which words need to be briefly scanned or receive full attention.
And this goes on completely outside conscious awareness. A good reader is only
conscious of meaning. The print on the page is all but invisible.
We are not the conductors of this symphony. Our brain is. This is
why it is foolish to imagine that anyone can decree how a particular word
is read by the brain, simply because reading seems automatic, and then
presume to know what this process entails.
Appendix 1: How Nations Cheat on
In 2001, the media in the United Kingdom reported that English students
outscored Sweden in a recent international reading survey, citing PISA
2000 (Programme for International Student Assessment, OECD 2001).
Yet a mere four years earlier, in another OECD study on twelve coun-
tries, Sweden came first in the 16-to-25 age group while the United
Kingdom, Australia, New Zealand, Ireland, and the United States barely
outdid Poland. Contradictory results also appeared in a recent study on
nine-year-olds from thirty-two countries, PIRLS 2001 (Progress in International
Reading Literacy Study, April 2003). England was third in this study after
Sweden and the Netherlands. The United States was ninth. The apparent
spurt in literacy rates for England and the United States is not mirrored
by well-conducted domestic studies, such as the National Assessment of
Educational Progress (NAEP).
In the most recent NAEP report on fourth-grade children (NAEP
2002, U.S. Department of Education 2003; see the USDE Web site), 38 percent
of the children were ‘‘below basic’’ (functionally illiterate), 32 percent
were at ‘‘basic level,’’ 23 percent were ‘‘proficient,’’ and 6 percent were
‘‘advanced.’’ These values are similar to earlier results of 43 percent, 33
percent, 20 percent, and 4 percent, respectively. Certainly, a 38 percent
functional illiteracy rate is unlikely to catapult the United States into ninth
place internationally. NAEP studies are extremely rigorous. In NAEP
1992 (Mullis et al. 1993), 140,000 children were tested on an individual
basis. Test items were secured, and outside testers monitored protocols
and compliance rates.
Earlier international reading studies sponsored by the OECD were
also well conducted. Yet a senior official on this project told me that
One country withdrew from the study because they did not like their
results.
Today, the situation has deteriorated to the point where results are
meaningless. PISA 2000 is the most recent of the OECD studies, involv-
ing thousands of 15-year-olds from thirty-two countries. Pages 232 to
236 of the report provide information on population sampling along with
tables on exclusion and compliance rates (tables A3.1, p. 232, and A3.2,
p. 235). Compliance was monitored in the following ways: (1) Target
schools representing a demographic sample were provided to local school
districts. (2) If a school would not or could not participate, a replacement
school from a second list had to be chosen. To avoid selection bias, the
compliance rate at step 1 was set at 85 percent minimum. (3) A fixed
number of students per class had to participate based on classroom size
(usually about 35 children), with a minimum compliance of 80 percent
student participation.
Exclusion rates refer to children in the population who cannot be
tested because of mental retardation, being blind or deaf, and so forth.
The guidelines were: ‘‘The percentage of 15-year-olds excluded within
schools had to be less than 2.5 percent of the nationally desired target
population.’’ The last part was in boldface type. This section of the re-
port also expressly stated that ‘‘special education’’ was not a criterion for
exclusion.
The PISA consortium, a body of various government officials and
statistics experts, was apparently unable to enforce even these lenient
guidelines. Failure to comply with exclusion rates was ignored completely.
Poor compliance rates were explained away with the statement (p. 236)
that countries with poor compliance rates supplied documentation that
adequately explained why rates were low! The Netherlands, with a 27
percent compliance rate, was the only country dropped from the study.
The compliance rates for the remaining worst offenders are provided
below. The score in column 1 must be 85 percent to meet the require-
ments of the study. If not, reading scores will be unreliable, especially
when the percentage of missing schools is added to the percentage of
missing students. Even with replacement schools, the United States tested
only 55 percent of the students it should have tested.
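How the two kinds of shortfall combine can be sketched in a few lines. The two minimums come from the rules described above. The rates passed in are hypothetical, and treating the shortfalls as compounding multiplicatively is an assumption of this sketch; the text describes adding the missing percentages, which gives a similar figure when shortfalls are small.

```python
SCHOOL_MIN = 0.85   # minimum school participation at step 1 (from the text)
STUDENT_MIN = 0.80  # minimum student participation within schools (from the text)


def check_compliance(school_rate, student_rate):
    """Return threshold flags and the share of targeted students actually tested."""
    return {
        "schools_ok": school_rate >= SCHOOL_MIN,
        "students_ok": student_rate >= STUDENT_MIN,
        # Assumption: school and student shortfalls compound multiplicatively.
        "coverage": round(school_rate * student_rate, 3),
    }


# Hypothetical rates for illustration only; not taken from the PISA tables.
print(check_compliance(0.85, 0.80))  # a country that exactly meets both minimums
print(check_compliance(0.60, 0.90))  # poor school compliance, good student turnout
```

Under this assumption, even a country that exactly meets both minimums tests only about two-thirds of the students in its target sample.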
I compiled lists representing the true facts about exclusion and compliance
rates for all countries. Composite reading test scores are included, taken
from table 2.3a, p. 253 in the report.
percent:
Canada 4.9% 534
Denmark 3.1% 497
France 3.5% 505
Sweden 4.7% 516
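The rows above can be checked against the 2.5 percent guideline quoted earlier. This is a minimal sketch that assumes the percentage column is the within-school exclusion rate and the final column is the composite reading score; the function name is hypothetical.

```python
EXCLUSION_LIMIT = 2.5  # percent; exclusions "had to be less than" this (from the text)

# (country, percent, composite reading score) as listed above.
# Assumption: the percentage column is the within-school exclusion rate.
rows = [
    ("Canada", 4.9, 534),
    ("Denmark", 3.1, 497),
    ("France", 3.5, 505),
    ("Sweden", 4.7, 516),
]


def over_limit(rows, limit=EXCLUSION_LIMIT):
    """Return (country, rate, excess over the limit) for each country at or above the limit."""
    return [(country, pct, round(pct - limit, 1))
            for country, pct, _score in rows if pct >= limit]


for country, pct, excess in over_limit(rows):
    print(f"{country}: {pct}% excluded, {excess} points over the limit")
```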
Control over sampling and compliance rates was also a problem in the
Analysis of Covariance
The analysis of covariance (ANCOVA) is justified when subjects differ
widely in a particular skill at the beginning of a study where the same skill
is being trained, or when subjects differ widely on something likely to con-
tribute to this skill, such as IQ or vocabulary. In other words, covari-
ance would be used where it would be logical to pair or match subjects
between groups, or when a difference between groups occurs that does
not seem due to chance. Even if one assumes that Bond and Dykstra’s
study met these requirements, one must be able to answer the following
question: Did the groups consistently differ on the baseline measures?
Unfortunately, this question cannot be answered, because the incorrect
data (means again) were used for the statistical analyses. Furthermore,
when comparing the statistics for all projects combined, versus individual
project analyses, the results from the baseline tests suffered from a ‘‘now
you see it now you don’t’’ effect. In the basal-reader versus i.t.a. compar-
isons, letter-name knowledge was significantly better for i.t.a. Yet when
the five projects were analyzed separately, the letter-name effect disap-
peared for every group. However, a group difference in phoneme discrimi-
nation appeared that had not been there before. These results are typical
Misuse of Statistics
used in a statistical analysis.
ANCOVA is based on correlational statistics with the twist that it
allows you to make all groups comparable on the baseline measures, and
then adjust the outcome measure accordingly. The mathematics requires
the data to be linear and normally distributed, with similar variances on
each test. By reducing the data to means, none of these requirements can be
met. In addition, the computations in an ANCOVA analysis must include
values for sampling error and residual variance (the square of the standard-
deviation estimate), neither of which can be computed with means.
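The loss of information caused by reducing data to means can be seen in a small sketch. The scores below are invented for illustration and have nothing to do with the original study; the point is only that residual (within-class) variance is computed from individual scores and cannot be recovered from class means.

```python
from statistics import mean, variance

# Hypothetical raw scores for three classrooms (illustration only).
classrooms = {
    "A": [12, 18, 25, 31, 9],
    "B": [20, 22, 19, 35, 14],
    "C": [8, 27, 30, 16, 24],
}

raw_scores = [s for scores in classrooms.values() for s in scores]
class_means = [mean(scores) for scores in classrooms.values()]

# Residual (within-class) variance requires the individual scores.
within_class = mean(variance(scores) for scores in classrooms.values())

print("variance of raw scores:    ", round(variance(raw_scores), 2))
print("variance of class means:   ", round(variance(class_means), 2))
print("mean within-class variance:", round(within_class, 2))
```

With only the three class means in hand, the within-class term is simply unavailable, and the spread of the means greatly understates the spread of the children's scores.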
Degrees of Freedom
Degrees of freedom translates to the number of subjects in a study, minus
the levels or conditions under each treatment or factor, minus 1. In com-
puting ANOVA statistics, the total variance for treatments is divided by
the number of levels for that treatment (for i.t.a. versus basals, df = 1), and
the total variance for subjects across treatments (within-subjects error) is
N − 1 − 1. If the degrees of freedom for the error term is very large (lots
of subjects), then the final value for within-subjects error (the denomina-
tor in the F ratio) is very small. This, in turn, would make the final F value
too large and more likely to be significant. In this study, individual proj-
ects were analyzed one at a time, and the degrees of freedom should have
represented only the classrooms in that project. Instead, the value for the
degrees of freedom represented all the projects in a method group. In the
i.t.a. comparison, the degrees of freedom for the error term was five times
larger than it should have been for every comparison. And the same prob-
lem continued throughout.
As an illustration, analyzing each project separately reduces the design
to a two-factor random-group design (gender × treatment). Project 1 had
32 classrooms (64 means), so the degrees of freedom would be 1 (treatment),
1 (sex), and 61 (subjects). The table values given, however, were 1 and 292,
which is clearly incorrect. This was not a typographical error, because the
same mistake appeared on all other tables where individual projects were
analyzed separately (see tables 27, 38, 49, 60, and 71). The same mistake
appeared in the analyses for the individual tests as well. This means one
cannot rely on the results for any of the outcome measures.
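The arithmetic for Project 1 can be checked with a short sketch. It follows the counting rule given in the text (observations minus each factor's degrees of freedom, minus 1) and ignores any interaction term, which is an assumption of this sketch; the function name is hypothetical.

```python
def error_df(n_observations, factor_levels):
    """Error degrees of freedom: N minus each factor's df, minus 1.

    Follows the rule described in the text; an interaction term is not
    subtracted (an assumption of this sketch).
    """
    factor_dfs = [levels - 1 for levels in factor_levels]
    return n_observations - sum(factor_dfs) - 1


# Project 1: 32 classrooms, one mean per sex per classroom = 64 means.
n_means = 32 * 2
df_error = error_df(n_means, factor_levels=[2, 2])  # treatment (2) x sex (2)
print(df_error)  # 61

reported_df = 292  # value printed in the original tables
print(round(reported_df / df_error, 1))  # 4.8, close to the fivefold inflation noted above
```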
Appendix 3: An Analysis of Word
Low CV/High VC
bar, jar, beer, dear, rear, turn, ball, tall, lock, cook, loss, bob, mob, heap, thin
High CV/High VC
deck, pick, suck, Dutch, mess, gum, gun, met, wet, file, game, hope, rode, role,
sung
Low CV/Low VC
bear, pull, push, wash, watch, none, ton, won, cough, rough, lose, dog, fog, death,
bade
High CV/Low VC
word, worm, chose, dose, lease, pose, gas, yes, doll, shall, hood, mood, wood, limb,
mead
1. One list contains six vowel + r words: bar, jar, beer, dear, rear, and turn.
The /r/ is a vowel in its own right (/r/ in her). When it is combined with
another vowel (VV ) as in for, this creates a diphthong: /oe/-/er/. A diph-
thong is taught as one unit (a digraph or phonogram) and not as two
separate phonemes (/oe/ and /r/), which would make no sense. The words
listed above are actually CV words, not CVC words, and so have no VC
unit. The letter r cannot ‘‘control’’ the pronunciation of the vowel because
lable words (weapon). The VC unit is not productive here in view of the
1. There are far more irregularly spelled words on the low-VC lists. I
have mentioned bear, which is ‘‘irregular’’ compared to dear and rear.
There is shall (one of the renegades in the all family), as well as none, ton,
and won, with uncommon spellings of the vowel /u/. Cough and rough both
appear on the same list, causing confusion because ou stands for different
vowel sounds, and gh is a low-probability spelling for /f/. Did children
have trouble reading these words because of their irregular spellings, be-
cause of the fickleness of ough (bough, bought, though, through), or because
the VC ‘‘rime was too inconsistent’’ to provide a clue to how to pro-
nounce the vowel, as they claim?
2. These lists contained many words in which the initial consonant con-
trols the vowel spelling: pull, push, wash, watch, word, worm. As noted
above, /p/ (and /b/) take the u spelling for the vowel /oo/, ^ as in pull, push,
pudding, pulley, pullet, put (and in bull, bullet, bully, bush, butcher). The sound
/w/ takes the a spelling for the vowel sound /o/: wad, waffle, wan, wand,
wander, want, wash, wasp, watch, water, also true for /sw/ blends (swab,
swamp, swap, swat) and even for /skw/ blends (squab, squabble, squall,
squander, squash, squat). The sound /w/ takes the or spelling for the vowel
sound /er/, as in word, work, world, worm, worry, worse, worst, worth, worthy.
It is noteworthy that four of these strongly CV-controlled spellings were
not on the high-CV consistency list, showing that these lists reflect little
understanding of the spelling code.
3. Four words use the Old French spelling se for /s/ and /z/: chose, dose,
pose, and lose (choze, doass, poze, and looze), and are unstable for both
consonant and vowel, thus being highly likely to cause decoding errors.
To add to the confusion, uncommon final /s/ words were also on the list:
gas, yes. They are part of a tiny group of words ending in /s/ (not plural)
that use the s and not ss spelling (bus, gas, us, this, yes).
4. The words hood, mood, and wood were on the same list. This creates
confusion because this is the main spelling (basic-code spelling) for two
vowels (nothing to do with ‘‘control’’ by a final consonant), as in food,
mood, noodle, poodle / good, hood, stood, wood.
5. Two other words seemed designed to cause reading errors—limb and
mead—because few children will have heard of these words.
6. Finally, the words dog and fog are on these lists. These are part of the
‘‘context-free’’ CVC group, and it is not clear why the og spelling is ‘‘in-
consistent’’ with any other pronunciation.
Glossary
basic code Used during initial reading instruction. Each phoneme in the
language is represented by its most common or least ambiguous spelling.
binomial test A statistical test that computes a numerical score that will
exceed chance (guessing) at a specified level of probability. The computa-
tion takes into account the number of items on a test and the number of
alternatives for each response.
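As a rough illustration of this entry, the cutoff score can be computed directly from the binomial distribution. This is a sketch under stated assumptions: pure guessing with equal probability for each alternative, a one-tailed test, and a default significance level of .05; the function name is hypothetical.

```python
from math import comb


def chance_cutoff(n_items, n_alternatives, alpha=0.05):
    """Smallest number correct k such that P(X >= k) <= alpha under pure guessing.

    Returns n_items + 1 if even a perfect score does not reach the alpha level.
    """
    p = 1 / n_alternatives
    tail = 0.0
    cutoff = n_items + 1
    # Accumulate the upper tail, starting from a perfect score and working down.
    for k in range(n_items, -1, -1):
        tail += comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        if tail > alpha:
            break
        cutoff = k
    return cutoff


print(chance_cutoff(10, 2))  # 9: on a 10-item, two-choice test, 9 correct exceeds chance
```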
code Any system in which arbitrary symbols are assigned to units within
a category. The number symbols 1–10 represent units of quantity. Letters
represent units of speech sounds (phonemes).
letter to indicate pronunciation.
diphthong A vowel sound that elides two vowels in rapid succession and
counts as one vowel (/e/ + /ee/ = /ae/ in late).
factor loading A final value assigned to a test after a factor analysis has
been carried out. This represents the power of that test to represent a
factor in correlational values.
floor effects When a test or task is so difficult that most of the scores
are at zero.
hiragana One of two systems for marking diphones used in the Japa-
nese writing system.
kanji Symbols standing for whole words used in the Japanese writing
system.
katakana One of two systems for marking diphones used in the Japanese
writing system.
another type.
phoneme The smallest unit of speech that people can hear; corresponds
to consonants and vowels.
phonics A generic term for any reading method that teaches a relation-
ship between letters and phonemes.
stand when spoken by others.
rime A technical term for the final portion of a word that ‘‘sounds like’’
other words (rhymes), for example and in band, bland, brand, hand.
sight words Printed words that children are asked to memorize visually
as random strings of letters. A true sight word contains rare spelling patterns
(‘‘yot’’ = yacht).
word family A group of words that share the same ending sounds, that
are spelled the same, and that rhyme (bright, night, fight, sight).
Aaron, P. G., Keetay, V., Boyd, M., Palmatier, S., and Wacks, J. 1998. Spell-
ing without phonology: A study of deaf and hearing children. Reading and
Writing: An Interdisciplinary Journal, 10, 1–22.
Berninger, V. W., Vaughan, K., Abbott, R. D., Brooks, A., Abbott, S. P.,
Rogan, L., Reed, E., and Graham, S. 1998. Early intervention for spelling
problems: Teaching functional spelling units of varying size with a multiple-
connections framework. Journal of Educational Psychology, 90, 587–605.
Blachman, B. A., Tangel, D. M., Ball, E. W., Black, R., and McGraw, C. K.
1999. Developing phonological awareness and word recognition skills: A two-
year intervention with low-income, inner-city children. Reading and Writing:
An Interdisciplinary Journal, 11, 239–273.
References
Theory, and Classroom Practice. Jossey-Bass Education Series, no. 20. San Fran-
cisco: Jossey-Bass.
Block, C. C., and Mangieri, J. N. 1997. Reason to Read: Thinking Strategies for
Life through learning. Pearson Learning report, no. 20.
Block, C. C., and Pressley, M., eds. 2001. Comprehension Instruction. New
York: Guilford Press.
Block, C. C., Gambrell, L., and Pressley, M. 1997. Training the Language
Arts: Expanding Thinking through Student-Centered Instruction. Boston: Allyn
and Bacon.
Bond, C. L., Ross, S. M., Smith, L. J., and Nunnery, J. A. 1995–1996. The
effects of the Sing, Spell, Read, and Write program on reading achievement
of beginning readers. Reading Research and Instruction, 35, 122–141.
Bond, G. L., and Dykstra, R. 1967. The cooperative research program in first-
grade reading instruction. Reading Research Quarterly, 2, 1–142.
Bradley, L., and Bryant, P. E. 1985. Rhyme and Reason in Reading and Spelling.
Ann Arbor: University of Michigan Press.
Brady, S., Fowler, A., Stone, B., and Winbury, N. 1994. Training phonologi-
cal awareness: A study with inner-city kindergarten children. Annals of Dys-
lexia, 44, 26–59.
Brett, A., Rothlein, L., and Hurley, M. 1996. Vocabulary acquisition from lis-
tening to stories and explanations of target words. Elementary School Journal,
96, 415–422.
Brown, R., Pressley, M., Van Meter, P., and Schuder, T. 1996. A quasi-
experimental validation of transactional strategies instruction with low-
achieving second-grade readers. Journal of Educational Psychology, 88, 18–37.
Bruck, M., and Waters, G. S. 1990. An analysis of the component spelling and
reading skills of good readers–good spellers, good readers–poor spellers, and
poor readers–poor spellers. In T. H. Carr and B. A. Levy, Reading and Its De-
velopment: Component Skills Approaches, 161–206. New York: Academic Press.
Calfee, R. C., and Henry, M. K. 1985. Project READ: An inservice model for
training classroom teachers in effective reading instruction. In J. V. Hoffman,
ed., Effective Teaching of Reading: Research and Practice, 199–229. Newark, DE:
International Reading Association.
Cattell, J. M. 1886. The time taken up by cerebral operations. Mind, 11, 220–
242, 377–392, 524–538.
Chall, J. 1967. Learning to Read: The Great Debate. New York: McGraw-Hill.
Chall, J., and Feldman, S. 1966. First grade reading: An analysis of the inter-
actions of professed methods, teacher implementation and child background.
The Reading Teacher, 19, 569–575.
Chang, Kwang-chih. 1963. The Archeology of Ancient China. New Haven, CT:
Yale University Press.
Chomsky, C. 1976. When you still can’t read in the third grade: After decod-
ing, what? Language Arts, 53, 288–296.
Chomsky, N., and Halle, M. 1968. The Sound Pattern of English. New York:
Harper and Row.
Civil, M. 1973. The Sumerian writing system: Some problems. Orientalia, 42,
21–34.
Coltheart, M., Curtis, B., Atkins, P., and Haller, M. 1993. Models of reading
Cossu, G., Rossini, F., and Marshall, J. C. 1993. When reading is acquired but
phonemic awareness is not: A study of literacy in Down’s syndrome. Cognition,
46, 129–138.
Coulmas, F. 1989. The Writing Systems of the World. Oxford: Blackwell.
Daniels, P. T., and Bright, W. 1996. The World’s Writing Systems. New York:
Oxford University Press.
influence of fresh orthographic information on spelling. Reading and Writing:
An Interdisciplinary Journal, 9, 483–498.
Ehri, L. C. 1989a. Knowledge and its role in reading acquisition and reading
disability. Journal of Learning Disabilities, 22, 356–365.
Ehri, L. C. 1989b. Movement into word reading and spelling: How spelling
contributes to reading. In J. Mason, ed., Reading and Writing Connections,
65–81. Boston: Allyn and Bacon.
In C. Hulme and R. M. Joshi, eds., Reading and Spelling: Development and Dis-
orders. Hillsdale, NJ: Erlbaum.
Ehri, L. C., and Wilce, L. S. 1987. Does learning to spell help beginners learn
to read words? Reading Research Quarterly, 22, 47–65.
Engelmann, S., and Bruner, E. 1969. Distar reading program. Chicago: Science
Research Associates.
Faulkner, H. J., and Levy, B. A. 1994. How text difficulty and reader skill in-
teract to produce differential reliance on word and content overlap in reading
transfer. Journal of Experimental Child Psychology, 58, 1–24.
Faulkner, H. J., and Levy, B. A. 1999. Fluent and nonfluent forms of trans-
fer in reading: Words and their message. Psychonomic Bulletin and Review, 6,
111–116.
Fisher, R., and Craik, F. I. M. 1977. The interaction between encoding and
retrieval operations in cued recall. Journal of Experimental Psychology: Human
Learning and Memory, 3, 701–711.
Flesch, R. [1955] 1985. Why Johnny Can’t Read. 3rd ed. New York: Harper
and Row.
Foorman, B. R., Francis, D. J., Beeler, T., Winikates, D., and Fletcher, J. M.
1997. Early interventions for children with reading problems: Study designs
and preliminary findings. Learning Disabilities, 8, 63–71.
Foorman, B. R., Francis, D. J., Fletcher, J. M., Schatschneider, C., and Mehta,
P. 1998. The role of instruction in learning to read: Preventing reading failure
in at-risk children. Journal of Educational Psychology, 90, 37–55.
Gersten, R., Darch, C., and Gleason, M. 1988. Effectiveness of a direct in-
struction academic kindergarten for low-income students. Elementary School
Journal, 89, 227–240.
Geva, E., and Siegel, L. S. 2000. Orthographic and cognitive factors in the
concurrent development of basic reading skills in two languages. Reading and
Writing: An Interdisciplinary Journal, 12, 1–30.
Graham, S. 2000. Should the natural learning approach replace spelling in-
struction? Journal of Educational Psychology, 92, 235–247.
Greenberg, S. N., and Koriat, A. 1991. The missing-letter effect for common
function words depends on their linguistic function in the phrase. Journal of
Experimental Psychology: Learning, Memory and Cognition, 17, 1051–1061.
Griffith, P. L., Klesius, J. P., and Kromrey, J. D. 1992. The effect of phonemic
awareness on the literacy development of first grade children in a traditional or
a whole language classroom. Journal of Research in Childhood Education, 6, 85–92.
Hanna, P. R., Hanna, J. S., Hodges, R. E., and Rudorf, E. H. 1966. Phoneme-
Grapheme Correspondences as Cues to Spelling Improvement. Washington, DC:
U.S. Department of Health, Education, and Welfare, Office of Education.
Hart, B., and Risley, T. R. 1995. Meaningful Differences. Baltimore: Paul H.
Brookes.
Hatcher, P. J., Hulme, C., and Ellis, A. W. 1994. Ameliorating early reading
failure by integrating the teaching of reading and phonological skills: The
phonological linkage hypothesis. Child Development, 65, 41–57.
Healy, A. F. 1976. Detection errors on the word the: Evidence for reading
units larger than letters. Journal of Experimental Psychology: Human Perception
and Performance, 2, 235–242.
York: Academic Press.
Henderson, E. H., and Beers, J. W., eds. 1980. Developmental and Cognitive
Aspects of Learning to Spell: A Reflection of Word Knowledge. Newark, DE: Inter-
national Reading Association.
Hoover, W. A., and Gough, P. B. 1990. The simple view of reading. Reading
and Writing: An Interdisciplinary Journal, 2, 127–160.
Hudson, J. A., and Shapiro, L. R. 1991. From knowing to telling: The devel-
Hulme, C., Monk, A., and Ives, S. 1987. Some experimental studies of multi-
sensory teaching: The effects of manual tracing on children’s paired associate
learning. British Journal of Developmental Psychology, 5, 299–307.
Jacoby, L. L., and Hollingshead, A. 1990. Reading student essays may be haz-
ardous to your spelling: Effects of reading incorrectly and correctly spelled
words. Canadian Journal of Psychology, 44, 345–358.
Jared, D., McRae, K., and Seidenberg, M. S. 1990. The basis of consistency
effects in word naming. Journal of Memory and Language, 29, 687–715.
Johnson, S. [1755] 1773. A Dictionary of the English Language. 4th ed. London:
Strahan.
Johnston, R. S., and Watson, J. 1997, July. Developing reading, spelling and
phonemic awareness skills in primary school children. Reading, 37–40.
Johnston, R. S., and Watson, J. 2003. Accelerating reading and spelling with
synthetic phonics: A five year follow up. Interchange 4, ISSN 1478-6788, 1–8.
Edinburgh: Scottish Executive Education Department.
Joshi, R. M., Williams, K. A., and Wood, J. R. 1998. Predicting reading com-
prehension from listening comprehension: Is this the answer to the IQ de-
bate? In C. Hulme and R. M. Joshi, eds., Reading and Spelling: Development and
Disorders, 319–327. Mahwah, NJ: Erlbaum.
Juel, C., and Solso, R. L. 1981. The role of orthographic and phonic structure
in word identification. In M. L. Kamil and A. J. More, eds., Perspectives in
Reading Research and Instruction: 30th Yearbook. Washington, DC: National
Reading Conference.
Jusczyk, P. W. 1998. The Discovery of Spoken Language. Cambridge, MA: MIT
Press.
Katz, L., and Frost, R. 1992. The reading process is different for different
orthographies: The orthographic depth hypothesis. In Orthography, Phonology
and Meaning, 67–83. Amsterdam: North-Holland.
Klesius, J. P., Griffith, P. L., and Zielonka, P. 1991. Whole language and tra-
ditional instruction comparison: Overall effectiveness and development of the
alphabetic principle. Reading Research and Instruction, 30, 47–61.
Krevisky, J., and Linfield, J. L. 1990. The Awful Speller’s Dictionary. New York:
Random House.
Landerl, K., Wimmer, H., and Frith, U. 1997. The impact of ortho-
graphic consistency on dyslexia: A German-English comparison. Cognition, 63,
315–334.
Larson, S. C., and Hammill, D. D. 1994. Test of Written Spelling. Austin, TX:
Pro-Ed.
Leslie, L., and Thimke, B. 1986. The use of orthographic knowledge in be-
ginning reading. Journal of Reading Behavior, 18, 229–241.
Levy, B. A., Nicholls, A., and Kohen, D. 1993. Repeated readings: Process
benefits for good and poor readers. Journal of Experimental Child Psychology, 56,
303–327.
Liberman, I. Y., Shankweiler, D., Liberman, A. M., Fowler, C., and Fisher,
F. W. 1974. Explicit syllable and phoneme segmentation in the young child.
Journal of Experimental Child Psychology, 18, 201–212.
Lie, F. 1991. Effects of a training program for stimulating skill in word analy-
sis in first-grade children. Reading Research Quarterly, 26, 234–250.
Lloyd, S. 1992. The Phonics Handbook. Essex, England: Jolly Learning Ltd.
Lundberg, I., Frost, J., and Petersen, O. 1988. Effects of an extensive program
for stimulating phonological awareness in preschool children. Reading Research
Quarterly, 23, 263–284.
Lundberg, I., Olofsson, A., and Wall, S. 1980. Reading and spelling skills in
the first school years predicted from phonemic awareness skills in kindergar-
ten. Scandinavian Journal of Psychology, 21, 159–173.
Lysynchuk, L. M., Pressley, M., D’Ailly, H., Smith, M., and Cake, H. 1989.
A methodological analysis of experimental studies of comprehension strategy
instruction. Reading Research Quarterly, 24, 458–470.
Mattys, S. L., Jusczyk, P. W., Luce, P. A., and Morgan, J. L. 1999.
Phonotactic and prosodic effects on word segmentation in infants. Cognitive
Psychology, 38, 465–494.
McArthur, T., ed. 1992. The Oxford Companion to the English Language.
Oxford: Oxford University Press.
Schuster/Free Press.
McKeown, M. G., Beck, I. L., Omanson, R. C., and Pople, M. T. 1985. Some
effects of the nature and frequency of vocabulary instruction on the knowledge
and use of words. Reading Research Quarterly, 20, 522–535.
Meyer, L. A., Stahl, S. A., Linn, R. L., and Wardrop, J. L. 1994. Effects of
reading storybooks aloud to children. Journal of Educational Research, 88, 69–85.
grade reading achievement. Elementary School Journal, 84, 441–457.
Muter, V., and Snowling, M. 1997. Grammar and phonology predict spelling
in middle childhood. Reading and Writing: An Interdisciplinary Journal, 9,
407–425.
Nagy, W. E., and Herman, P. 1987. Breadth and depth of vocabulary
knowledge: Implications for acquisition and instruction. In M. McKeown and
M. Curtis, eds., The Nature of Vocabulary Acquisition, 19–36. Hillsdale, NJ:
Erlbaum.
Nation, K., and Snowling, M. 1997. Assessing reading difficulties: The validity
and utility of current measures of reading skill. British Journal of Educational
Psychology, 67, 359–370.
Nunes, T., Bryant, P., and Bindman, M. 1997. Morphological spelling
strategies: Developmental stages and processes. Developmental Psychology, 33,
637–649.
Olson, R., Forsberg, H., Wise, B., and Rack, J. 1994. Measurement of word
recognition, orthographic, and phonological skills. In G. R. Lyon, ed., Frames
of Reference for the Assessment of Learning Disabilities: New Views on Measurement
Issues, 229–277. Baltimore: Brookes.
Pearson, D. 1997. The first grade studies: A personal reflection. Reading
Research Quarterly, 32, 428–432.
Pintner, R., Rinsland, H. D., and Zubin, J. 1929. The evaluation of
self-administering spelling tests. Journal of Educational Psychology, 20,
107–111.
Pitman, J., and St. John, J. 1969. Alphabets and Reading. London: Pitman.
Plomin, R., Fulker, D. W., Corley, R., and DeFries, J. C. 1997. Nature,
nurture, and cognitive development from 1 to 16 years: A parent-offspring
adoption study. Psychological Science, 8, 442–448.
Rashotte, C. A., and Torgesen, J. K. 1985. Repeated reading and reading
fluency in learning disabled children. Reading Research Quarterly, 20,
180–188.
processes. Memory and Cognition, 4, 443–448.
Rayner, K. 1986. Eye movements and the perceptual span in beginning and
skilled readers. Journal of Experimental Child Psychology, 41, 211–236.
Samuels, S. J., Laberge, D., and Bremer, C. D. 1978. Units of word
recognition: Evidence for developmental changes. Journal of Verbal Learning
and Verbal Behavior, 17, 715–720.
Samuels, S. J., Miller, N. L., and Eisenberg, P. 1979. Practice effects on the
unit of word recognition. Journal of Educational Psychology, 71, 514–520.
Schatschneider, C., Francis, D. J., Foorman, B. R., Fletcher, J. M., and Mehta,
P. 1999. The dimensionality of phonological awareness: An application of
item response theory. Journal of Educational Psychology, 91, 439–449.
Share, D. L., Jorm, A. F., Maclean, R., and Matthews, R. 1984. Sources of
individual difference in reading acquisition. Journal of Educational Psychology,
76, 1309–1324.
Siegel, L. S., Share, D., and Geva, E. 1995. Evidence for superior
orthographic skills in dyslexics. Psychological Science, 6, 250–254.
Silberberg, N., Iversen, I., and Goins, J. 1973. Which remedial method works
best? Journal of Learning Disabilities, 6, 18–22.
Solso, R. L., and Juel, C. 1980. Positional frequency and versatility of bigrams
for two- through nine-letter English words. Behavior Research Methods and
Instrumentation, 12, 297–343.
Stahl, S. A., and Fairbanks, M. M. 1986. The effects of vocabulary instruction:
A model-based meta-analysis. Review of Educational Research, 56, 72–110.
Stahl, S. A., and Miller, P. D. 1989. Whole language and language experience
approaches for beginning reading: A quantitative research synthesis. Review of
Educational Research, 59, 87–116.
Stebbins, L., St. Pierre, R. G., Proper, E. L., Anderson, R. B., and Cerva, T. R.
1977. Education as Experimentation: A Planned Variation Model. Vols. IVA–D.
Cambridge, MA: Abt Associates.
Stuart, M. 1999. Getting ready for reading: Early phoneme awareness and
phonics teaching improves reading and spelling in inner-city second language
learners. British Journal of Educational Psychology, 69, 587–605.
Treiman, R., and Tincoff, R. 1997. The fragility of the alphabetic
principle: Children’s knowledge of letter names can cause them to spell
syllabically rather than alphabetically. Journal of Experimental Child
Psychology, 64, 425–451.
diverse elementary schools: Decoding and word meaning. Journal of
Educational Psychology, 82, 281–290.
Wimmer, H., and Landerl, K. 1997. How learning to spell German differs
from learning to spell English. In C. A. Perfetti, L. Rieben, and M. Fayol,
eds., Learning to Spell, 81–96. Mahwah, NJ: Erlbaum.
Author Index
Halle, M., 288
Hammill, D. D., 70
Hanna, P. R., 53–55, 64–65, 267
Hart, B., 217–222, 331
Hatcher, P. J., 182–184
Hay, J., 77
Haynes, D. P., 215
Healy, A. F., 345
Helfgott, J. A., 345
Henderson, E. H., 255, 257, 275
Henry, M. K., 68–69, 71
Herman, P. A., 198–200, 216–217
Ho, Ping-ti, 21
Hodges, R. E., 65
Hohn, W. E., 170
Hollingshead, A., 118–119, 205–206
Holmes, V. M., 269–270, 282
Hoover, W. A., 213
Howard, M. P., 133
Hudson, J. A., 212
Huey, E. B., 195
Huggins, A. W. F., 340
Hulme, C., 114, 182–184, 274
Hurley, M., 225–226
Ireson, J., 184–185
Iversen, I., 132
Ives, S., 114
Jacoby, L. L., 118–119
Jared, D., 306–311, 313
Jeffrey, W. E., 277–278
Jenkins, J. R., 232–233, 331
Jensen, H., 23
Johnson, S., xvi, 43, 266
Johnson-Glenberg, M. C., 238–239
327
Jordan, T. R., 45
Joshi, R. M., 213–214
Juel, C., 213, 297–298
Jusczyk, P. W., 23
Kaminska, Z., 119–120
Karlgren, B., 21
Katz, L., 24
Kirby, J., 150
Kirsner, K., 205
Klesius, J. P., 128, 147–148, 150, 321
Kohen, D., 206–208
Koriat, A., 345–347
Kramer, S. N., 26
Krevisky, J., 119
Kromrey, J. D., 128, 148, 150, 321
Laberge, D., 342
Landerl, K., 2, 40, 55, 191
Larson, S. C., 70
Leinhardt, G., 146–147
Leslie, L., 45
Levy, B. A., 205–208, 329
Liberman, A. M., 6
Liberman, D., 45
Liberman, I. Y., 6, 153, 266–269, 172, 290, 298–300
Lie, F., 178–179
Lindamood, C. H., 77, 131–135, 325–326
Lindamood, P. C., 77, 131–135, 325–326
Linfield, J. L., 119
Lloyd, S., 138–139, 141, 232, 324–327
Logan, G. D., 115, 121
Rosenshine, B., 235–236, 238, 240
Rossini, F., 2, 155
Roth, E., 171–175
Rothlein, L., 225–226
Ruddy, M. G., 18
Ryan, E. B., 235
Saffran, J. R., 7, 23
Samuels, S. J., 196–197, 277–278, 342
Scanlon, D. M., 45, 115
Schatschneider, C., 165, 173, 188
Schmandt-Besserat, D., 19
Schneider, W., 171–175, 179
Schvaneveldt, R. W., 18
Scragg, D. G., xviii, 76
Seidenberg, M. S., 288, 305–315, 336
Senechal, M., 222–223
Shanahan, T., 250, 282
Shankweiler, D. P., 266–269, 272, 290, 298–300
Shapiro, L. R., 212
Share, D. L., 166, 286
Shepherd, M. J., 70, 116–117, 168
Siegel, L. S., 40, 55, 286
Silberberg, N., 132
Slater, W. H., 216
Slocum, T. A., 232–233, 331
Smith, A. A., 69–70, 116, 334
Smith, M. W., 223–225, 228
Snowling, M., 214, 249
Solso, R. L., 297–298
St. John, J., 77
Stahl, S. A., 123, 226–227, 233
Stanovich, K. E., 45, 114, 270, 274–275, 280–283, 328
Stebbins, L., 104
Steffler, D. J., 264
Stuart, M., 139–143
Sumbler, K., 111–113, 141–144, 325, 340
Szeszulski, P. A., 45
Tanzman, M. S., 45
Thimke, B., 45
Tincoff, R., 117, 276–277
Torgesen, J. K., 133–135, 146, 199–200
Treiman, R., 117, 275–277, 300–305, 310
Uhry, J. K., 70, 116–117, 168
van Ijzendoorn, M. H., 158–160
Varnhagen, C. K., 262–264, 268, 275, 343
Vellutino, F. R., 45–46, 115
Venezky, R. L., 39, 41, 48–52, 55, 65–70, 288, 290, 321
Vihman, M. M., 153
Wagner, R. K., 45
Walcutt, C. C., 78
Wall, S., 163, 166
Waters, G. S., 270–273, 282, 287
Watson, J., 139–145, 325–327
Webster, N., 47, 55, 78, 82
Wessels, J. M. I., 7, 23
West, R. F., 45, 270, 281–283, 328
White, T. G., 216
Wilce, L. S., 70, 116, 168
Williams, J. P., 163, 166
Williams, K. A., 213–214
Willows, D. M., 235
Wimmer, H., 1, 2, 40, 55, 174, 175, 189–191, 277
Wingo, C., 77
Subject Index
Functional illiteracy
  countries and, 1, 6, 322
  rate of, 1, 5
Function words in reading, 344–347
Head Start, 103, 217, 223
Homophones, 16
International reading surveys, 1, 349–353
Jolly Phonics actions, 138–139, 141, 325–326
Kindergarten, origin of, 171
LAC test of phoneme awareness, 169
Languages, impact on writing system
  Akkadian, 20
  Arabic, 19
  Aramaic, 30
  Chinese, 21, 56
  Egyptian, 19
  German, 3
  Greek, 34
  Hebrew, 19
  Indian, xv, 27–28
  Japanese, xv, 22
  Sumerian, 20
Language development, 153
Language types and writing systems
  Hamito-Semitic, 19, 23, 31
  Indo-European, 27, 30
  Tonal languages, 21
Letter names and learning to read, 184, 251, 323
Letter-sound correspondences training, 166, 170, 175, 178–179, 182
Ligature as diacritic, 27
265, 327
Many-word problem, xiv, 248, 279–316
Mapping systems, 11–12, 15, 320
Memorization
  letters as aid, 170
  limits to, 18–19, 22, 26, 34, 74, 318
Memory
  promoted by, 114
  recall, 37, 70, 116, 232, 247, 273
  recognition, 37, 70, 116, 247, 273
  rote visual in reading, 279, 283, 284–287, 289
  sight-word reading and, 251
Meta-analysis, 124–125
Missing letter effect, 345
Montessori, Maria, 37, 275
Morphology, 62
Moveable alphabet, 131
Naming speed, 277, 327, 341–342
National Assessment of Educational Progress (NAEP), 1, 212–213, 236
National Reading Panel, viii, xiii, xiv, xviii, 70, 73–75, 107, 247, 317–320, 322–334
  comprehension training, 215, 234, 237, 331
  fluency training, 193–196, 328
  phoneme awareness training, 153–188
  reading instruction, 121–152, 323–324
  vocabulary training, 226, 228, 330
OECD (1995, 1997), 1, 349–353
Opaque alphabet, xvi, 3, 13, 39–40, 46, 75, 159, 247, 250
Paired-associate learning, 18, 19, 74,
  visual phonics, 130, 147, 149, 151,
  instruction
  onset-rime analogy, 126, 129, 141
  syllable analysis, 180
Phonotactics, 7, 23, 34, 55–56, 74, 280, 291, 318
Piaget’s stages, 253–254
Prefixes/suffixes and spelling, 62–63, 268
Project Follow Through, 103–105
Prototype for reading instruction, 38, 73, 82, 111, 121, 127, 184, 317–319, 323–324
  programs fitting prototype, 131–145, 152
Proto-writing, 14
PsychINFO database, 194–195
Reading comprehension
  measures of
    cloze test, 214
    factor analysis, 214
    functional illiteracy, 212–213
    in-house tests, 235–238
    meta-analysis, 235–237
    NAEP tests, 212, 236
    standardized tests, 212–214, 236–239, 241, 244
    treatment fidelity, 243
  skills. See also Story comprehension
    decoding accuracy and, 211–214, 235
    oral (listening) comprehension and, 211–214, 235
    reading fluency and, 211–213, 235
    vocabulary and, 211
  training methods
    Block comprehension program, 242–244
    length of training, 236
    multiple strategy technique, 238
    National Reading Panel review, 215, 234, 237
    reciprocal teaching programs, 235, 238–241
    summary of, 330–333
    visualizing/verbalizing program, 238–240
Reading frequency (print exposure)
  impact on accuracy, 282–283
  tests of, 281–286
  for training fluency, 194–195, 199, 328
  word familiarity and, 344
Reading instruction
  Austria, 175, 210
  California, 5
  Canada, 40, 139, 141
  England, 40, 138, 191
  Finland, 39–40
  Germany, 4, 39, 171–173
  Italy, 39
  Norway, 39, 178
  Scotland, 139, 141
  Sweden, 39
  Texas, 187
Reading methods. See also Phonics, teaching approaches
  basal readers, 4–5, 77, 80–83, 86, 92, 94, 97, 102–103, 108, 128, 147, 152, 216, 317
  basal + phonics, 87, 90–91, 94, 103
  eclectic or balanced, 5, 6, 74, 84, 111–113, 146
  language experience, 87, 90–91, 94, 110–111, 123–124
193, 196–201, 203, 205, 207–209
  proofreading and, 205–208
  target goal, 196–205
  text difficulty and, 207–210
  training studies, 194–210, 328–330
  transfer effects, 196–206
Root words, 54–55, 267–268
Rosner test of phoneme awareness, 186
Sex differences
  in reading, 88, 92, 98, 103
  in spelling, 92, 98, 103
Sight words
  functional sight vocabulary, 340
  late-stage sight-word reading, 339–343
  myth of, 9, 34, 343
  as reading method, 288
  traditional sight word lists, 52
  true sight words, 57–59
Silent letters, 57
Speech perception, 153, 155
Spelling
  alternatives, 46–47, 54, 83, 130–131, 139, 151, 270–271, 298, 310, 320–321
  categories of spelling words
    rare spellings (exception words), 267, 269, 271, 279
    regular spelling, 43, 44–45, 56–57, 247, 266–267, 270–271, 273
  errors, 115–121
  linguistic/visual features of spelling code used in cognitive research
    bigram/trigram frequencies, 285, 287, 291, 297–298, 308
    checked vowel “rule,” 289
    268, 271, 289
    grapheme-phoneme correspondence “rules” (GPCs), 44–45, 288–290, 306, 308, 344
    linguistic structure and, 265–266, 271, 279
    morphological structure and, 266–269, 271, 274, 279, 290
    orthographic “rules,” 279–280, 290, 340
    orthographic structure, 279–281, 291, 297, 305–306, 313, 315, 344
    orthography defined, 17, 43, 45–46, 48–49, 55
    phoneme-grapheme correspondences, 279, 285, 288–290, 326, 328
    phonotactic structure and spelling, 280, 291
    spelling generalizations, 43–44
    spelling “rules,” 43–44, 50, 119, 251, 266, 268, 274
  methods based on structure of the code, 55–65, 69–70
  methods in schools
    invented spelling, 108, 116–117, 120, 184, 187, 248, 252
    letter-name spelling, 116–117
    miscellaneous, 120–121
    research on traditional programs, 247–248
    rule-based, 43–44, 50, 119
  predictors
    reading skill, 250
    sex, 249–250
    verbal IQ, 249, 268, 282
  probability structure of spelling code
  availability of, 79, 85
  invalid methods of, 91–93, 98
  valid methods of, 94–97
Story
  comprehension, 224
  grammar, 211, 236
  recall, 230–231
Sumerian schools, 38, 115
Test construction, 188
Transparent alphabets, xv, 2–3, 32, 39–40, 55, 75–76, 159, 175, 178–179, 191, 277
Treatment fidelity, 100
Universal education, xviii
Vocabulary
  acquisition, 215–219
    children’s literature and, 215
    heritability/verbal IQ, 219
    oral comprehension, 215
    productive vocabulary size, 216, 233
    reading and, 215
    television and, 215
    word derivation, 215
  training
    classroom lessons and, 112–113
    dangers of whole language, 220
    deducing meaning from context, 232–233
    frequency of exposure and, 229–233
    Head Start, 217, 233, 331
    listening to stories as method, 221–226
    meta-analysis of classroom research, 226–227
    221–223, 225, 229
    National Reading Panel report, 226, 228
    productive vocabulary and, 222, 230–232, 331
    receptive vocabulary and, 223, 230, 232
    standardized tests and, 221, 227, 229
    successful programs, 229–235
    teacher-child interaction, 224
Voice-onset latency/response time, 293–294, 296, 303, 312
Webster, Noah, 37, 44, 82
Word families, 5, 7, 9, 46, 62, 82–83, 126, 318
Word play in teaching, 154
Writing systems
  ancient
    Akkadian, 11, 20
    Anglo-Saxon, 39, 40–41, 46
    Aramaic, 11, 30
    Assyrian, 24
    Babylonian, 11, 13, 24, 29–30
    Chinese, 18, 20–21, 25–26
    Crete, 11, 27
    Egyptian, 11, 13–16, 18–19, 23, 31
    Greek, 28, 31, 32, 34, 39, 41
    Hittite, 6, 11, 24
    Indian, 27–29
    Japanese, 21–22, 33
    Korean, 28–29, 30
    Mayan, 11, 15–16, 27
    Norman French, 40–41
    Old English, 40
    Old Persian, 28–30