Technical Reference: Marcia Invernizzi - Connie Juel - Linda Swank - Joanne Meier
PALS-K Technical Reference
For questions about PALS-K, please contact:
Phonological Awareness Literacy Screening (PALS)
1-888-UVA-PALS (1-888-882-7257) or (434) 982-2780
Fax: (434) 982-2793
e-mail address: [email protected] • Web site: http://pals.virginia.edu
©2004–2015 by The Rector and The Board of Visitors of the University of Virginia. All Rights Reserved.
Section II
7 Description of PALS-K
7 Domains
7 Scoring
8 Forms
Section III
11 Item Development and Field-Testing
11 Phonological Awareness Tasks
11 Rhyme and Beginning Sound Awareness
13 Literacy Tasks
13 Alphabet Knowledge
14 Letter-Sound Knowledge
15 Concept of Word
17 Word Recognition in Isolation
18 Feedback from the Field
18 Outside Review
19 Advisory Review Panel
19 External Review
Section IV
20 Establishing Summed Score Criteria and Benchmarks
21 Benchmarks and Discriminant Analysis (DA)
Section V
22 Technical Adequacy
23 Broad Representation of Students
23 Pilots
23 Summary Statistics
24 Reliability
26 Test-retest Reliability
26 Subtask Reliability
26 Inter-rater Reliability
29 Internet Data Entry Reliability
29 Validity
29 Content Validity
30 Criterion-related Validity
31 Construct Validity
35 Differential Item Functioning
Section VI
36 Summary
Section VII
37 References
Section VIII
39 Endnotes
Section I
Phonological Awareness Literacy Screening for
Kindergarten (PALS-K)
instrument (validity and reliability). In preparing this Technical Reference, we followed the Standards for Educational and Psychological Testing (1999), prepared jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). Explicit instructions for the administration and scoring of PALS-K are included in a separate PALS-K Administration and Scoring Guide. The results for the statewide screening for each cohort are available in separate annual reports.

Background

The Phonological Awareness Literacy Screening for Kindergarten (PALS-K) is the state-provided screening tool for the Virginia Early Intervention Reading Initiative (EIRI), and is designed for use in kindergarten. The purpose of the EIRI is to reduce the number of children with reading problems through early detection and to accelerate their learning of research-identified emergent and early literacy skills.

Virginia's Early Intervention Reading Initiative (EIRI)

The 1997 Virginia Acts of Assembly, Chapter 924, Item 140, initially established the Early Intervention Reading Initiative. The state initiative allocated funds to help participating school divisions identify children in need of additional instruction and to provide early intervention services to students with diagnosed needs. Participating school divisions were allowed to implement the initiative in either kindergarten or first grade.

In the 2000-01 legislative session, the Governor and the General Assembly provided funding to expand the EIRI to third grade. Participating school divisions are now required to screen students in kindergarten through third grade either with a diagnostic assessment approved by the Virginia Department of Education or with PALS, the state-provided instrument. Many of the same conditions from the earlier initiative apply:
• All students in kindergarten through second grade must be screened annually;
• All students not meeting the benchmark for their grade level must receive, in addition to regular classroom instruction, intervention services;
• All students in kindergarten through second grade who receive intervention services must be assessed again during the first screening period following the intervention. (Note that third-grade students are only screened in the fall if they are new to Virginia schools, or if they received intervention services over the summer; spring screening for third-graders is optional);
• All screening results must be reported to the PALS Office at the University of Virginia via the PALS website (pals.virginia.edu).

In 2002, the Virginia Department of Education changed the screening period for the EIRI from fall to spring. Also, a high benchmark was added for first- and second-grade students clearly performing above grade-level expectations. Students attaining this high benchmark would no longer need to be screened for the EIRI. These changes enhance the EIRI by:
• allowing intervention services for all students in first, second, and third grades to start at the beginning of the school year or during the summer;
• eliminating the problem created by fall screening for year-round schools and schools that start before Labor Day;
• allowing Title I to use PALS as their screening instrument for reading services, thereby eliminating the use of a second screening;
• reducing the amount of time required for screening.

An EIRI timeline for PALS screening is shown in Table 1.
Section II
Description of PALS-K
Scoring
Section III
Item Development and Field-Testing
In this section we describe the various tasks of PALS-K:
• Rhyme and beginning sound awareness;
• Alphabet knowledge;
• Letter-sound awareness;
• Letter sounds;
• Spelling;
• Concept of word;
• Word recognition in isolation.

We also describe feedback we receive from experts in the field.

PALS-K evolved from the McGuffey Reading Center's Test of Early Word Knowledge (EWK), which later became McGuffey's Assessment of Literacy Acquisition (ALA). Both of these early literacy assessment procedures have been adapted, expanded, and applied in early intervention settings across the country, most notably by Darrell Morris. Morris' Early Reading Screening Inventory (ERSI) (see Perney, Morris, & Carter, 1997) has been used extensively across the country and includes many of the same tasks contained in PALS-K.

The tasks presented in PALS-K are a representative sample of tasks found in other measures of early literacy. Items were selected because of their previous history in phonological awareness and early literacy research, and because of their correlation with Virginia's Standards of Learning. Item selection and field-testing procedures for the original and revised versions of PALS-K are described below.

Phonological Awareness Tasks

within spoken words. The research literature on phonological awareness identifies two skills significantly related to reading outcomes: (a) rhyme awareness, and (b) individual phoneme awareness.4 Items in PALS-K were selected to represent these two categories of sound awareness and to meet three attributes of measurement. First, the items selected needed to be of moderate difficulty for kindergarten children. Second, the items selected needed to have strong predictive relationships to reading outcomes. Measures of rhyme awareness and phonemic awareness are well documented as predictive of reading outcome.5 Third, the selected items needed to be adaptable to group assessment procedures. Because the format for both tasks subsumed under Phonological Awareness (Rhyme Awareness and Beginning Sound Awareness) is similar, the following section describes the development of these tasks concurrently.

Rhyme and Beginning Sound Awareness

New format. Traditional measures of phonological awareness typically assess students in an individual format, using oral assessment procedures. In this way, obtaining phonological awareness data on an entire class can become a lengthy and time-consuming process. The items on the PALS-K Group Rhyme Awareness and Group Beginning Sound Awareness tasks allow teachers to assess students in small groups of five or fewer. Only those students who exhibit difficulty in the group screening require individual follow-up to gather more detailed information about which sound units present difficulty for a given student.
nological awareness, thus establishing predictive outcomes; and (b) pictures were easily recognizable and represented age-appropriate vocabulary. The first criterion was met by selecting stimulus words from past prediction studies.6 We met the second criterion by selecting picture templates previously used successfully with preschool and primary-age children, and by having an artist draw similar renderings of pictures. The pictures represent one-syllable, high-frequency words appropriate for kindergarten children.7 We included only single-syllable words with concrete meanings that could be represented pictorially.

Field review. The PALS-K pictures and teacher administration instructions were reviewed by a panel of primary classroom teachers, elementary administrators, university researchers, and Virginia Department of Education personnel in assessment, early childhood, and elementary instruction. Following approval by the 15-member panel, the phonological awareness measures were then piloted with 50 kindergarten and first-grade children in two school divisions in different parts of the state, while classroom teachers and administrative personnel observed. Following the first administration, classroom teachers and administrative personnel were trained to re-administer the phonological awareness tasks. Within a three-week period, they retested the same students for preliminary test-retest reliability data. Following the re-administration, teachers and administrators provided oral and written feedback on the instructions and on students' performance. They also provided their own reactions to the procedure and suggested changes. Their suggested changes were submitted to the 15-member panel for final approval and incorporation into PALS-K. This set of procedures resulted in the current PALS-K phonological awareness tasks, Rhyme Awareness and Beginning Sound Awareness.

Field testing. The phonological awareness items were administered to 53,425 kindergartners and first-graders in the fall of 1997 and to 65,619 kindergartners and first graders in the fall of 1998. Four types of picture revisions resulted from an analysis of the 1997 and 1998 samples. First, controversial pictures were changed to reflect more appropriate items. For example, the picture of the pipe in the Group Beginning Sound Awareness task was eliminated and replaced with a picture of a bus. Second, ambiguous pictures were redrawn to provide greater clarity. For example, the picture of the rock was redrawn to look more like a rock. Third, unfamiliar pictures were replaced with more common items. For example, the picture of the fountain pen was replaced with a picture of the more common ballpoint pen. Fourth, random sound relations among pictures in the same row were eliminated, so that no sound within the name of the target picture occurred in any position in any other picture within the row. For example, the picture of the tie was changed to a picture of a bell so as not to prompt attention inadvertently to the /t/ sound at the end of the target picture heart. The order of the pictures was also changed in some cases to ensure that correct responses were distributed randomly across items; thus, scores would not be biased if, for example, a child simply chose the first picture in each row.

Additional testing. Further pilot data on individual items were collected in Fall 2001 with 1,855 kindergarten children for Group Rhyme Awareness and 1,862 kindergarten children for Group Beginning Sound Awareness. In Spring 2004 data on individual items were collected from 1,417 kindergarten children for Group Rhyme Awareness and 1,227 kindergarten children for Group Beginning Sound Awareness. These phonological awareness tasks and items within these tasks were examined with regard to (a) item-to-total correlations, (b) Cronbach's alpha (an index of internal consistency based on the average correlation of items within a task),8 and (c) item means (level of difficulty). Items were considered for removal if they had low item-to-total correlations, were too easy or difficult (i.e., nearly all students responded correctly or nearly all students missed the item), or if scales yielded alpha coefficients less than .80. In these pilot samples, item-to-total correlations for each item were moderate
item encountered was a lower-case b, a letter frequently confused with lower-case d. On the current PALS-K, the first item encountered is an m. Inter-rater reliabilities for the Lower-Case Alphabet Recognition task have been consistently high (r = .99, p < .01).

Letter-Sound Knowledge

In addition to naming the letters of the alphabet, emergent readers must develop knowledge of letter sounds and learn to apply that knowledge. The ability to produce the sounds represented by individual letters in isolation is difficult for young children, and requires explicit awareness of individual phonemes. PALS-K assesses both children's knowledge of letter sounds and their application of that knowledge in two tasks: Letter Sounds and Spelling.

Letter Sounds. In the Letter Sounds task, children are asked to touch each letter and say the sound it represents. Only the lax (or short) vowel sound for each vowel is scored as correct, and only the hard sound for C and G is scored as correct. Children are prompted for "the other sound" a letter makes in cases where they provide a long vowel sound or the soft sounds for C or G. Inter-rater reliabilities for the Letter Sounds task have been consistently high: r = .98 to .99 (p < .01). Because research has shown that kindergartners recognize more upper-case than lower-case letters, knowledge of grapheme-phoneme correspondence is assessed using upper-case letters in PALS-K. In the first cohort of the EIRI, all of the upper-case letters were used, with the exception of X and Q, since neither of these letters can be pronounced in isolation. Qu was substituted for Q and Sh took the place of X. Negative feedback from the first PALS-K administration regarding Qu prompted the elimination of this item in the 1998 edition. Ch, a more frequently occurring digraph, replaced Qu, and Th replaced M, which became the letter used as an example in the directions.

Spelling. Application of letter-sound knowledge in invented spelling tasks is an excellent predictor of word recognition in young children14 and among the best predictors of word analysis and word synthesis.15 In the first cohort of Virginia's EIRI, 35,518 kindergarten and 16,136 first-grade students attempted to spell five consonant-vowel-consonant (CVC) words in the fall of the academic year. In the second year, 50,949 kindergartners and 14,670 first graders attempted to spell the same five high-frequency words. In both samples, children's spellings were scored for the number of phonemes represented. The Spelling task has consistently been a reliable discriminator of children in need of additional instruction in phonological awareness and early literacy skills in both kindergarten and first grade. Inter-rater reliabilities have remained high for all statewide samples: r = .99 (p < .01).

In Spring 2001, two sets of five new spelling words were piloted among 847 kindergartners in 22 different school divisions across the Commonwealth of Virginia. Then, in Fall 2001, two additional sets of five spelling words were piloted among 1,980 kindergartners in 52 different school divisions across the Commonwealth of Virginia. The piloted items were all high-frequency CVC words.

Words for the piloted spelling inventories were selected from a pool of words used in previous research in the Virginia Spelling Studies.16 Specific words were selected by frequency of occurrence and by each word's linguistic attributes. That is, words were selected to elicit responses to particular speech sounds and high-frequency CVC phonograms typically encountered in print early on. Five words were selected for each form. All pilots assessed student performance on the representation of beginning, middle, and ending speech sounds and the total number of words spelled correctly. In scoring each word, students received a point for the phonetic representation of the beginning, middle, and ending sound. Another point was awarded if the entire word was spelled correctly. In this way, students were credited for phonetic representation of individual phonemes regardless of whole-word spellings.
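The scoring rubric described above (one point each for representing the beginning, middle, and ending sound, plus one point for a correct whole-word spelling) can be sketched as follows. The scoring grid here is a tiny hypothetical stand-in: the actual PALS-K grid of acceptable phonetic substitutions is derived from the developmental spelling literature and is more extensive.

```python
# Hypothetical mini scoring grid: acceptable letters for the beginning,
# middle, and ending phoneme of each target CVC word. Illustrative only.
GRID = {
    "map": (("m",), ("a",), ("p",)),
    "net": (("n",), ("e", "a"), ("t",)),  # 'a' as a plausible short-e substitution
}

def score_word(target, attempt):
    """One point per phoneme represented (beginning, middle, ending),
    plus one point if the whole word is spelled correctly: 0-4."""
    begin, middle, end = GRID[target]
    a = attempt.lower()
    points = 0
    if a[:1] in begin:                        # beginning sound represented
        points += 1
    if any(m in a[1:-1] for m in middle):     # middle sound represented
        points += 1
    if len(a) >= 2 and a[-1:] in end:         # ending sound represented
        points += 1
    if a == target:                           # whole word spelled correctly
        points += 1
    return points
```

A five-word list scored this way yields 0 to 20 points, e.g. `sum(score_word(w, s) for w, s in zip(words, spellings))`; a child writing "mp" for "map" still earns two points for representing the beginning and ending phonemes.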
Individual words from all pilot lists were analyzed using the following criteria:
• teacher feedback;
• item means (level of difficulty);
• item-to-total correlations;
• Cronbach's alpha.

Words were considered for removal if they received negative feedback from more than two teachers in the pilot sample, if they were too easy or difficult, if they had low item-to-total correlations, or if a given spelling list had an alpha less than .80. None of the piloted spelling words from the Spring 2001 pilot warranted replacement based on empirical grounds. Spelling lists had alpha coefficients greater than .90; all item-to-total correlations were in the range of .49 to .72; and all piloted words were of acceptable difficulty. However, 32% of the teachers participating in the pilot study voiced concerns over the word jog because of perceived unfairness regarding j and g in the same word, so this word was removed. The piloted spelling lists and the original spelling lists were significantly correlated (r = .70, p < .001).

Spelling lists from the Fall 2001 pilot also had alpha coefficients greater than .90; all item-to-total correlations were in the range of .49 to .80; and all piloted words showed evidence of acceptable difficulty. Although both piloted lists were acceptable on these criteria, one word list was consistently superior on all criteria (e.g., higher alpha), and it was selected for use in the PALS-K materials. Teacher feedback indicated that kindergarten students were confused when sentences were provided with the spelling words; therefore, we did not include spelling sentences with the PALS-K materials.

Following selection of the spelling words from the Spring 2001 pilot, we examined students' spelling samples (n = 354) to determine the most common phonetic substitutions made by kindergarten students. Where students consistently represented a particular letter, we compared these phonetic representations with the developmental spelling literature in order to verify the accuracy of the spelling scoring grid.

In Spring 2002, we also piloted an expanded 12-word spelling list that was not only longer than the current PALS-K spelling list but also included a more complete analysis of spelling features. Our aim in this pilot was to determine whether we could better approximate the PALS 1–3 spelling task that first graders would encounter, and more importantly to assess whether we could further strengthen the relationship between PALS-K scores from spring of kindergarten, and PALS 1–3 scores for first graders in the fall. Our analyses of these pilot data suggested that the enhanced spelling task added little to the prediction of first grade scores based on PALS-K scores; that is, it offered little or no statistical benefit over the present Spelling task. A change in the spelling task for this purpose was not warranted based on this pilot.

A final additional list of five spelling words was piloted in Spring 2003 with 1,565 kindergarten students. Again this spelling list and individual words were examined using the same criteria as for earlier lists: item difficulty, item-to-total correlations, and Cronbach's alpha. This piloted list met all statistical criteria, correlated well with the Spring 2003 PALS-K spelling list (r = .88, p < .001) and received no negative teacher feedback; thus no changes were necessary.

Concept of Word

Concept of word refers to the emergent reader's ability to match spoken words to written words as s/he reads.17 Research has shown that a stable concept of word in text can facilitate a child's awareness of the individual sounds within words. Until children can point to individual words accurately within a line of text, they will be unable to learn new words while reading or to attend effectively to letter-sound cues at the beginning of words in running text. The ability to fully segment all the phonemes within words appears to follow concept of word attainment.18 Children with a solid concept of word will recognize words they didn't know prior to reading a memorized or familiar text, even when these words are presented out of context.
Development of the Concept of Word task. In 1997, 34,848 kindergarten and 3,586 first grade students were administered the Concept of Word finger-point reading task. Qualitative feedback from the field indicated that some children were unfamiliar with the content of the text used that year, which featured a farmer and a goat. Although familiarity with the story content would not have affected the outcome of the measure, the content was changed in the 1998 version of PALS to a standard nursery rhyme. Further feedback from the PALS Internet survey indicated a teacher preference for the book format of 1997 and for a more familiar nursery rhyme. As a result, PALS-K uses simple nursery rhymes presented in a book format, one line to a page. The administration and scoring of the PALS Concept of Word task remained unchanged for the first four cohorts of Virginia's EIRI.

Field testing. Teachers providing feedback from the 2000-01 school year requested another nursery rhyme to alternate from fall to spring. In response to this need, multiple nursery rhymes were field-tested with 1,405 end-of-year kindergartners and first-graders in Spring and Fall 2001. A new procedure for administration of the Concept of Word task was also piloted at the same times. The new procedure involved pre-testing words from the rhyme prior to the finger-point reading exercise. The same words were post-tested after the finger-pointing exercise, to see if any words were "picked up" in the process. Words identified in the post-test that were not known in the pre-test are words learned by virtue of participating in the Concept of Word task itself. Among pilot sample scores from Spring 2001 (n = 276), the post-test subscore at the end of kindergarten was found to be significantly correlated with the preprimer Word Recognition in Isolation scores that children earned the following fall (i.e., at the beginning of first grade) (r = .79).

From these pilots, nursery rhymes were selected for use if they received positive feedback from the pilot teachers and yielded reliability coefficients of .80 or higher. Reliability was assessed for pre- and post-test word lists using Cronbach's alpha. Pre-test word list alphas ranged from .76 (n = 162) to .90 (n = 402) and post-test word list alphas ranged from .81 (n = 161) to .93 (n = 421). Therefore, no words needed to be replaced in the pre- and post-test word lists. Care was taken to mix the order of words in the word lists so that these lists did not reflect the order of word presentation in the rhyme itself. Selection of target words in the Word Identification portion was based on both the position of words in the sentence as well as word difficulty. For instance, in each poem some words from the beginning, middle, and end of the lines were assessed. Additional modifications were made to the test layout and to some illustrations accompanying the rhymes, based on teacher feedback.

Changes. In Spring 2003, minor changes were made to the Concept of Word task to enhance the relationship between PALS-K and PALS 1–3, particularly across the spring kindergarten to fall first grade screenings. First, the Pre-test Word List task was eliminated. This was based on both teacher feedback and statistical analyses suggesting that the pretest word list added little to the predictive validity of Concept of Word in relation to later PALS scores. Second, the Concept of Word Post-test Word List score was included in the PALS-K Summed Score beginning in Spring 2003. This decision was based on statistical analyses suggesting that the Post-test Word List significantly enhances the predictive validity of PALS-K in relation to PALS 1–3 scores. The Pointing and Word Identification subtasks in Concept of Word remained the same.

Concept of Word poems. In Fall 2003 three Concept of Word poems were piloted (Baa Baa Black Sheep, Little Teapot, and Little Turtle). In Spring 2004 two additional Concept of Word poems were piloted (Little Bo Peep and Little Boy Blue). In each case, we again examined (a) teacher feedback, (b) the internal consistency of items within subtasks, and (c) the relationship (correlation) between the piloted tasks and the regular PALS-K Concept of Word tasks administered at the same time. From the Fall pilot,
Little Turtle was chosen based on the combination of very positive teacher feedback, strong and significant correlation between the pilot word list and the PALS-K Concept of Word list (r = .84, p < .01, n = 1,776), and acceptable internal consistency (Cronbach's alpha = .84).

From the Spring 2004 pilot, Little Bo Peep was chosen, again based on positive teacher feedback, significant correlation between the piloted COW word list and the PALS-K COW word list (r = .76, p < .01, n = 1,280), and acceptable internal consistency (Cronbach's alpha = .88).

Word Recognition in Isolation

Word lists. Since the capacity to obtain meaning from print depends so strongly on accurate, automatic recognition of words,19 PALS-K provides three optional word lists to gauge advancing kindergartners' progress throughout the year: preprimer (pre-1), primer (1.1), and first grade (1.2). Each list represents a random sample from a database of words created from a variety of sources.

Word lists for PALS-K were developed in conjunction with the preprimer through third-grade word lists for PALS 1–3. Originally, word lists were generated from a database of words created from three of the most frequently used basal readers in Virginia. These included the Harcourt Brace Signature series and the Scott Foresman series from 1997 and 1999. Then, words from the first-, second-, and third-grade lists from the EDL Core Vocabularies in Reading, Mathematics, Science, and Social Studies (1997) were added to the database. The EDL Core Vocabularies provides a core reading vocabulary by grade, comprised of words derived from a survey of nine basal reading series. Words from the 100 Most Frequent Words in Books for Beginning Readers20 were added to the preprimer, primer, and first-grade word pools.

Database expanded. After the first year, the database was expanded to include words from grade-level lists in spelling and vocabulary books. These included words from Teaching Spelling,21 A Reason for Spelling,22 A Combined Word List,23 and A Basic Vocabulary of Elementary School Children.24 The database now includes all of these words, plus the words from graded word lists located in informal reading inventories and other well-known published assessments with grade-level lists. Words were added to the database from the Qualitative Reading Inventory II (QRI-II),25 The Stieglitz Informal Reading Inventory (1997), the Bader Reading and Language Inventory (1998), the Decoding Skills Test,26 the Ekwall/Shanker Reading Inventory,27 the Book Buddies Early Literacy Screening (BBELS),28 and The Howard Street Tutoring Manual.29 The validity of each word's grade-level placement was cross-checked for consistency within frequency bands in The American Heritage Word Frequency Book.30 Words on the preprimer and primer word lists appear in at least three of the above sources. Words on the first-grade word list appear in at least two of the above sources and are unique to that specific grade level.

Preprimer, primer, and first-grade word lists were piloted in Spring 2001 with 427 students in 39 kindergarten classrooms in different schools and school divisions within the Commonwealth of Virginia. Two additional sets of word lists were piloted in Fall 2001 with 311 kindergarten students in 31 kindergarten classrooms. Furthermore, different forms of the PALS word lists have been piloted over the past five years with over 7,500 students in 246 first-, 194 second-, and 80 third-grade classrooms from over 55 different school divisions across all eight regions of the Commonwealth of Virginia. Student scores generated from all pilot studies were used to assess the reliability and validity of the word lists.

Analysis. Individual words and word lists were analyzed using the following criteria: (a) teacher feedback, (b) item means (level of difficulty), (c) item-to-total correlations, and (d) Cronbach's alpha. Words and/or word lists were considered for removal if they received negative feedback from more than two teachers in the pilot sample, were too easy or too difficult, had low item-to-total correlations, or had alpha coefficients lower than .80. Words with
low item-to-total correlations, little to no variance in response patterns, and/or negative feedback from teachers were substituted with words that had higher item-to-total correlations, moderate variance, and more positive teacher feedback. In a few isolated cases, plural endings were changed to singular. Currently, three sets of word lists with good evidence of reliability and validity are rotated across PALS screening windows.

Feedback from the Field

In addition to the formal feedback we solicit from reviewers, the PALS office continually seeks informal feedback from the field. During each spring screening window we post a survey on the PALS website (pals.virginia.edu) to solicit feedback from teachers in the field. For example, response rates to specific questions on the Spring 2001 survey ranged from 200 to 800 teachers who participated in the EIRI and who screened their students with either PALS-K or PALS 1–3. In Spring 2001, approximately 533 teachers rated PALS-K tasks on the ease of administration and scoring, the clarity of directions, and the information gained from the screening. Open-ended comments were also invited.

The results from the survey and qualitative comments responding to an open-ended question were consistent with comments received through the toll-free phone line. That is, with regard to clarity of directions, ease of administration and scoring, and information gained from screening, most teachers (81% to 98% across all subtasks) rated PALS-K tasks good (4) to excellent (5) on a rating scale of one to five. In 2003, 2,011 teachers responded to a brief survey designed primarily to assess the usefulness of various PALS reports and Web site features. Between 71% and 80% of respondents rated class reports, class summary sheets, score history reports, and student summary reports as "very useful;" 2% or fewer of respondents rated any of these reports as "not useful."

Outside Review

The Code of Fair Testing Practices in Education (1988) defines the obligations of professionals who undertake the process of creating an assessment instrument. Included among these obligations are procedures that minimize the potential for bias or stereotyping. The potential for bias can be minimized if assessment tools are carefully evaluated.31 Procedures that protect against inappropriate instrument content include the use of an advisory review panel and an external evaluation.

Advisory Review Panel

To evaluate the appropriateness of PALS' content, we sought the opinions of an advisory review panel, appointed by the Virginia Department of Education. The panel consisted of primary grade teachers, reading specialists, speech teachers, instructional coordinators, and educators from the Commonwealth of Virginia. Members of this panel are listed in Table 4, along with their affiliations at the time they reviewed PALS-K.

The advisory review panel evaluated the materials that are part of the PALS-K screening. Members completed review forms in which they appraised (a) the content of the assessment, (b) the content of the teacher's manual, (c) the directions for administration and scoring, (d) the content of the screening instrument, and (e) the graphic qualities of the materials. The review panel was further asked to suggest changes or deletions of items. Additionally, a measurement professional from outside the University of Virginia reviewed PALS-K materials to verify that the items and directions for administering and scoring met the minimum standards of assessment.

External Review

In addition to the opinions of the advisory review panel, the Virginia Department of Education sought the opinions of several external reviewers, all of whom are national experts in the fields of reading, communication sciences, or psychology. The first PALS technical manual and report,32 detailing the psychometric qualities of PALS and first-year results, as well as PALS materials and teacher's manuals, were sent to prominent researchers whose charge was to determine the technical soundness of PALS as a valid and reliable instrument for the EIRI. The opinions of these outside reviewers were presented to the Virginia Department of Education in March 1999. The judgments of these reviewers were favorable. Copies of the reviews can be obtained from the Virginia Department of Education. External reviewers are listed in Table 5. An additional, independent review of PALS can be found in Early Reading Assessment.33
Section IV
Establishing Summed Score Criteria and Benchmarks
In the following sections, we describe the process through which PALS-K benchmarks were established.

Criteria and benchmarks for PALS-K were derived from several sources:
• nine years of research using similar tasks with struggling readers in a central Virginia early intervention program;
• statewide kindergarten and first-grade PALS data generated from the first eleven cohorts of Virginia's EIRI;
• data gathered from pilot samples between 2000 and 2004 with approximately 4,000 kindergarten students in the Commonwealth of Virginia;
• theoretical assumptions based on the reading research literature.

Benchmarks reflect raw scores for each PALS task, based on available data sources. The sum of these benchmark scores for the core variables equals the Summed Score criterion. These benchmarks and criteria are re-evaluated based on analyses of each year's statewide PALS-K data and results of ongoing pilot studies.

In November 2002 we conducted a formal standard-setting procedure to verify PALS benchmarks. Standard setting refers to the process used by instrument developers to help establish, or in this case to verify, benchmarks or levels of performance that reflect 'minimal competence.' In standard setting, expert judges evaluate each individual task or item and state whether they believe that the student who is minimally competent would respond correctly. We assembled panels of experts in reading from throughout the Commonwealth (one panel of 20 judges was invited for each grade level, K through 3). Each panel of judges spent a full day in Charlottesville evaluating individual task items from all PALS materials.

We evaluated standard-setting judges' mean scores for PALS tasks against two sources of information: our current benchmarks, and statewide data from the most recent screening windows. In virtually all cases, standard-setting judges' scores were comparable to current benchmarks (i.e., within one standard deviation), and moreover fell at approximately the bottom quartile, which has traditionally been the approximate range of students identified by PALS-K. For these reasons, we decided that standard-setting judges' evaluations supported PALS benchmarks as appropriate.

Criteria and benchmarks for the 1997–99 PALS (the previous version of PALS, used with kindergarten and first-grade students only) were derived from more than six years of research using PALS measures on more than 750 at-risk children in a central Virginia early intervention program.34 Since this research was limited to central Virginia, new benchmarks were established based on a statewide sample of 37,072 kindergarten children in the Commonwealth of Virginia in the fall of 1997. These benchmarks were confirmed through analyses conducted using scores from 50,949 kindergartners in Fall 1998; 53,256 kindergartners in Fall 1999; 74,054 kindergartners in Fall 2000; 50,127 kindergartners in Spring 2001; 65,036 kindergartners in Fall 2001; 74,928 kindergartners in Spring 2002; and 83,099 kindergartners in Fall 2003.
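The relationship between per-task benchmarks and the Summed Score criterion described above can be sketched as follows. The benchmark values and student scores below are made up for illustration; they are not actual PALS-K benchmarks.

```python
# Hypothetical illustration of the Summed Score criterion logic:
# per-task benchmark scores are summed to form the criterion, and a
# student whose core-task scores sum to less than the criterion is
# identified as needing additional instruction.
# (All numbers below are invented, not actual PALS-K values.)

benchmarks = {"Rhyme": 5, "Beginning Sound": 5, "Alphabet": 12,
              "Letter Sounds": 4, "Spelling": 2}
criterion = sum(benchmarks.values())      # Summed Score criterion

student_scores = {"Rhyme": 4, "Beginning Sound": 6, "Alphabet": 10,
                  "Letter Sounds": 3, "Spelling": 1}
summed_score = sum(student_scores.values())

identified = summed_score < criterion     # below criterion -> identified
print(criterion, summed_score, identified)
```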
The core variables representing the emergent-literacy construct measured by PALS-K underwent principal component analysis (PCA) in order to obtain a single-factor score for each child. This was done for the ultimate purpose of calculating quartiles for that single index. The factor loadings from which the factor scores were created were relatively homogeneous, suggesting that each subtask similarly influenced the factor score. As a result, a different and simpler index was created: the scores obtained by each child on the core tasks were summed. To assess the validity of the Summed Scores, they were correlated with the more precise factor scores. That correlation coefficient was 0.99, indicating that the Summed Score captured nearly all of the information in the factor score. Quartiles were then calculated for the Summed Scores for kindergarten using the statewide sample.

Benchmarks reflect milestones established in part by using means and standard deviations from students NOT in the bottom quartile. Benchmarks were determined initially by subtracting one standard deviation from the mean score for students ranking above the bottom quartile and by making further adjustments based on modal data for each task. Finally, we always evaluate benchmarks subjectively to make certain that decisions we have made empirically reflect sensible targets that are consistent with literacy acquisition theory.

Benchmarks and Discriminant Analysis (DA)

To verify PALS-K benchmarks statistically, we subject statewide data to discriminant analyses (DA). This allows us to assess the extent to which PALS variables reliably discriminate between groups of students who are or are not identified as needing additional services, based on their PALS-K Summed Score. The primary goal of DA is to isolate statistically the dimensions on which groups differ, based on a set of variables (i.e., PALS-K subtask scores).

Discriminant function analyses based on the subtasks included in the Summed Score correctly classified 96% of students in Fall 2009 and 98% of students in Spring 2010 as Identified or Not-identified, based on their subtask scores. This suggests that the task scores used to create the Summed Score produce a discriminant function (a linear combination of these variables) that classifies students as Identified or Not-identified, using mathematical measures to isolate the dimensions that distinguish the groups. The abstract (or mathematical) classifications have consistently demonstrated a very high correspondence to PALS classification. Since the inception of PALS, similar DA analyses have consistently classified 93% to 98% of students correctly as Identified or Not-identified.
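The comparison between PCA factor scores and simple Summed Scores described in this section can be sketched with synthetic data; the near-perfect correlation below is a property of the simulated unitary construct, not a PALS-K statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "subtask" scores: 5 correlated tasks for 1,000 students,
# all driven by one latent ability (mimicking a unitary construct).
ability = rng.normal(size=1000)
scores = np.column_stack(
    [ability + rng.normal(scale=0.5, size=1000) for _ in range(5)]
)

# First principal component of the correlation matrix -> factor score.
z = (scores - scores.mean(0)) / scores.std(0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
pc1 = z @ eigvecs[:, -1]          # loadings of the largest eigenvalue

# Simpler index: just sum the raw task scores.
summed = scores.sum(axis=1)

# When loadings are fairly homogeneous, the two indices are nearly
# interchangeable, as the reported .99 correlation suggests.
r = abs(np.corrcoef(pc1, summed)[0, 1])
print(round(r, 3))
```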
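A two-group linear discriminant of the kind used in these verifications can be illustrated with Fisher's classic formulation (pooled within-group covariance and the difference of group means). The data are synthetic, so the high agreement below reflects the simulation, not a PALS-K result.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic subtask scores for two groups (Identified vs Not-identified)
# separated on a latent dimension; numbers are illustrative only.
n_id, n_not = 200, 800
identified = rng.normal(loc=-1.5, scale=1.0, size=(n_id, 5))
not_identified = rng.normal(loc=0.5, scale=1.0, size=(n_not, 5))
X = np.vstack([identified, not_identified])
y = np.array([1] * n_id + [0] * n_not)   # 1 = Identified

# Fisher's linear discriminant: weight vector from the pooled
# within-group covariance and the difference of group means.
m1, m0 = identified.mean(0), not_identified.mean(0)
S1 = np.cov(identified, rowvar=False)
S0 = np.cov(not_identified, rowvar=False)
Sp = ((n_id - 1) * S1 + (n_not - 1) * S0) / (n_id + n_not - 2)
w = np.linalg.solve(Sp, m1 - m0)

# Classify at the midpoint of the projected group means.
proj = X @ w
cut = (m1 @ w + m0 @ w) / 2
pred = (proj > cut).astype(int)

accuracy = (pred == y).mean()
print(round(accuracy, 2))
```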
Section V
Technical Adequacy
In this chapter, we provide an overview of the students who have made up the PALS pilot and statewide samples, and then describe the technical adequacy of PALS-K in terms of validity and reliability.

Standards for test construction, evaluation, and documentation, as outlined in the Standards for Educational and Psychological Testing (1999), prepared by the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME), were carefully followed throughout the development of PALS-K. We made special efforts to satisfy all the major criteria for acquiring and reporting technical data. In addition, we have attended carefully to the assessment criteria spelled out in various policy initiatives (e.g., the Reading First requirement of the No Child Left Behind Act, Race to the Top). Specifically, Reading First guidelines suggest that assessment tools should serve four assessment purposes: (a) screening, (b) diagnosis, (c) progress monitoring, and (d) outcome evaluation. Moreover, states are encouraged to use assessments that target five core reading areas: (a) phonemic awareness, (b) phonics, (c) fluency, (d) vocabulary, and (e) comprehension. PALS-K addresses these areas in the direct and instructionally relevant assessment of these literacy fundamentals (see the conceptual framework of PALS-K in Table 3) through the subtasks of Rhyme Awareness, Beginning Sound Awareness, Alphabet Recognition, Letter Sounds, Spelling, Concept of Word, and Word Recognition in Isolation.

In subsequent sections, we describe the technical adequacy of PALS-K in terms of reliability and validity. First, we provide an overview of the students who have participated in PALS-K pilot and statewide samples.

Broad Representation of Students

Tasks, items, and benchmarks used in PALS-K are derived from analyses of PALS scores from more than 600,000 kindergarten students in schools participating in Virginia's EIRI between 1997 and 2006. The first ten cohorts of the EIRI provide ten statewide samples representing a diverse population.35 Table 6 lists the total number of students screened with PALS-K in the sixteenth cohort of Virginia's EIRI (Fall 2012) by gender, socioeconomic status (SES), and race/ethnicity.
2002; thus spring samples prior to 2002 are smaller. As displayed in Table 9, PALS-K identification rates have trended downward since 2003, to about 12% of kindergartners screened identified as needing additional instruction in Fall 2011. In Table 10, the discrepancy between means and standard deviations for Identified and Not-identified groups highlights the clear distinction between these groups.

We examine and summarize PALS-K scores each year for indices of central tendency, internal consistency, and item reliability. We also conduct factor analyses and discriminant function analyses to assess the validity of PALS-K tasks. The following sections contain a brief description of the technical adequacy of PALS-K in terms of reliability (the consistency of scores) and validity (the extent to which PALS-K is supported as a true measure of the construct of emergent reading).

Reliability

Reliability coefficients provide information about the consistency of test scores. Reliability may be assessed
Table 10 Means and Standard Deviations for Summed Scores for Kindergarten Students Identified and Not-identified

Date         ID Mean (sd)    Not-ID Mean (sd)
Fall 1998 16.93 (6.65) 55.24 (16.34)
Fall 1999 16.97 (6.72) 55.52 (16.47)
Fall 2000 17.07 (6.80) 56.78 (16.58)
Fall 2001 18.85 (7.21) 67.62 (23.34)
Fall 2002 17.27 (6.80) 58.29 (16.99)
Fall 2003 16.82 (6.96) 60.50 (19.39)
Fall 2004 17.08 (6.89) 61.09 (19.12)
Fall 2005 17.00 (6.90) 63.31 (19.49)
Fall 2006 16.76 (7.04) 64.07 (19.36)
Fall 2007 17.08 (6.79) 64.82 (19.53)
Fall 2008 16.99 (6.91) 65.95 (19.27)
Fall 2009 17.22 (6.85) 66.82 (19.61)
Fall 2010 17.11 (6.86) 67.06 (19.48)
Fall 2011 17.36 (6.71) 68.13 (19.57)
Fall 2012 17.00 (7.01) 67.50 (19.40)
Fall 2013 17.40 (6.69) 67.88 (19.41)
Data are not provided here for the initial PALS cohort in 1997, as identification was based not on a Summed Score criterion but on passing a certain number of subtasks within each domain. Scores in Fall 2001 were higher because Concept of Word was included in the Summed Score.
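The group separation visible in Table 10 can be summarized as a standardized mean difference. This calculation is our illustration, not part of the PALS-K analyses; because group sizes are not shown in the table, it uses a simple average of the two group variances rather than a sample-size-weighted pooled variance.

```python
import math

# Fall 2013 Summed Score means and standard deviations from Table 10.
mean_id, sd_id = 17.40, 6.69       # Identified group
mean_not, sd_not = 67.88, 19.41    # Not-identified group

# Standardized difference using the average of the two variances
# (equal weighting assumed, since group ns are not tabulated).
pooled_sd = math.sqrt((sd_id**2 + sd_not**2) / 2)
d = (mean_not - mean_id) / pooled_sd
print(round(d, 2))  # a very large separation between the two groups
```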
by comparing the scores of individuals taking the same test on different occasions (test-retest reliability), by having them take equivalent forms of the test (equivalent forms reliability), or, when it is not practical to assess individuals on two separate occasions, by examining the internal consistency of the scale (e.g., split-half reliability). Reliability evaluates the error of measurement or the "true score" variance. We assess three aspects of PALS' reliability: test-retest reliability, internal consistency (subtask reliability), and the consistency and accuracy of scoring (inter-rater reliability). Internal consistency was assessed using Cronbach's alpha;36 these results are reported in the following sections. Inter-rater reliability was assessed by having tasks scored and tabulated by multiple raters.

Test-retest Reliability
Test-retest reliability was assessed in Fall 2002 with a sample of 473 students. In this study, teachers administered PALS-K a second time to a randomly selected sample of their students. These students were tested again at least one week, but no more than two weeks, after their initial screening. We then computed Pearson correlations between scores on the two administrations as an indicator of test-retest reliability. Test-retest reliabilities, which ranged from .78 to .95, are presented in Table 11.

Subtask Reliability
Reliabilities for PALS subtasks were determined for gender, SES, race/ethnicity, and region using data generated from statewide samples from 1998 to 2007. Task reliabilities were determined using Cronbach's alpha.37 Table 12 displays the alpha coefficients for PALS-K tasks by gender, SES, and race/ethnicity, based on statewide samples from fall screenings from 1998 through 2007. We do not report spring screening results here for years prior to 2002, as during those years only students identified in the fall were required to be rescreened in the spring. Thus, spring samples prior to Spring 2002 were not representative of statewide samples overall.

Reliabilities for PALS subtasks by gender and race/ethnicity are presented in Table 13 for Fall 2007 and later. These reflect the new categories used by the Virginia Department of Education to describe children's ethnicity. We also began consistently disaggregating SES to a finer degree starting Fall 2007, using deciles (10 equal groups) instead of quartiles (4 equal groups). For Fall 2008 and Spring 2009, Cronbach's alpha averaged .86 (range = .78 to .88) across the ten decile groups of school-level SES (based on free or reduced lunch counts).

Inter-rater Reliability
Inter-rater reliability coefficients provide evidence that different individuals score a particular task the same way. To determine the inter-rater reliability of PALS-K, scores for various PALS-K tasks from two different raters (or scorers) were compared. The most extensive assessments of inter-rater reliability were conducted in Fall 1997 and Spring 1999. In these studies, one person administered the PALS-K subtasks while a second person observed and scored the tasks simultaneously but independently. Each person
Date          Black    White    Hispanic    Asian    American Indian/Alaska Native    Native Hawaiian/Other Pacific Islander    Ethnicity Unspecified
Fall 2008 .87 .88 .88 .88 .88 .88 .88
Spring 2009 .84 .81 .84 .81 .82 .85 .82
Fall 2009 .88 .88 .88 .89 .89 .89 .88
Spring 2010 .85 .83 .86 .85 .82 .83 .83
Fall 2010 .88 .88 .88 .87 .88 .89 .88
Spring 2011 .83 .82 .84 .83 .83 .86 .89
Fall 2011 .86 .86 .85 .86 .87 .86 .89
Spring 2012 .84 .84 .85 .85 .84 .85 —
Fall 2012 .86 .86 .86 .88 .87 .88 —
Spring 2013 .85 .84 .86 .86 .83 .82 —
Fall 2013 .86 .86 .86 .87 .87 .87 —
Spring 2014 .86 .85 .87 .86 .86 .86 —
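Cronbach's alpha values like those tabulated above can be computed from a respondents-by-items score matrix. A minimal sketch with synthetic data (not PALS-K item data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1 - item_vars / total_var)

# Synthetic responses: 10 items driven by one latent trait plus noise,
# so the items are internally consistent (illustrative data only).
rng = np.random.default_rng(2)
trait = rng.normal(size=500)
items = trait[:, None] + rng.normal(scale=1.0, size=(500, 10))

alpha = cronbach_alpha(items)
print(round(alpha, 2))
```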
administering or scoring tasks in these studies experienced the same training provided to all teachers using PALS: they read the PALS teacher's manual and viewed the PALS training video prior to administration. As shown in Table 14, inter-rater reliability coefficients were consistently high (range: .96–.99), suggesting that PALS-K can indeed be administered and scored reliably.

Table 14 Inter-rater Reliabilities Expressed as Pearson Correlation Coefficients for PALS Tasks

PALS-K Task                      Date         Correlation (n)
Rhyme Awareness                  Fall 1997    K & 1: .99 (n = 134)
Rhyme Awareness                  Spring 1999  K & 1: .96 (n = 154)
Beginning Sound Awareness        Fall 1997    K & 1: .99 (n = 122)
Beginning Sound Awareness        Spring 1999  K & 1: .99 (n = 154)
Alphabet Recognition             Fall 1997    K & 1: .99 (n = 122)
Alphabet Recognition             Spring 1999  K & 1: .99 (n = 154)
Letter Sounds                    Fall 1997    K & 1: .99 (n = 121)
Letter Sounds                    Spring 1999  K & 1: .98 (n = 154)
Spelling                         Fall 1997    K & 1: .99 (n = 130)
Spelling                         Spring 1999  K & 1: .99 (n = 154)
Concept of Word (total score)    Fall 2001    K: .97 (n = 110)
Word Recognition in Isolation    Fall 2000    *Preprimer: .99 (n = 51); *Primer: .99 (n = 52); *First Grade: .98 (n = 45)

p < .01 for all correlations; * indicates level of word list. Inter-rater reliability for word lists was assessed in Fall 2000 using students in first through third grades.

Internet Data Entry Reliability
In the Commonwealth of Virginia, teachers enter PALS scores via the Internet into a password-protected, securely encrypted database. The reliability of score entry into the Internet database is checked regularly against a randomly selected sample of the original hand-scored Class Summary Sheets. In Spring 2003, we compared a 10% sample of these Class Summary Sheets against the PALS database. Based on a sample of 5,931 students' score entries, which consisted of 74,612 individual data points, we found 708 errors, reflecting an overall Internet data entry accuracy of 99.1%.
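The accuracy figure above follows directly from the reported error count and number of data points:

```python
# Data-entry accuracy from the Spring 2003 comparison reported above.
data_points = 74_612
errors = 708

accuracy = 1 - errors / data_points
print(f"{accuracy:.1%}")  # 99.1%
```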
line-by-line behavioral criteria was included. A more detailed explanation of the content validity of PALS-K tasks can be found in this technical reference under Item Development.

Criterion-related Validity
Criterion-related validity determines whether assessment scores are related to one or more outcome criteria.39 There are two types of criterion-related validity: predictive, in which an assessment is used to predict future performance; and concurrent, in which assessment results are compared to performance on a different assessment administered at approximately the same time. Both forms of validity have been assessed for PALS-K.

Predictive Validity. The predictive validity of PALS-K has been assessed in two ways. First, PALS scores from the fall were compared with Stanford Achievement Test40 scores obtained during the spring of the same school year. When PALS was developed, the administration of the Stanford-9 was required in the Commonwealth of Virginia in grades 3, 5, 8, and 11. In addition, the kindergarten and first-grade versions of the Stanford-9 contain three subtests that are similar, though not identical, to several PALS tasks: Sounds and Letters, Word Reading, and Sentence Reading. In Fall 1998, 74 kindergartners from one school division were screened with PALS. None of the students were provided additional instruction apart from that which all students receive during the school year. The same 74 students were given the Stanford-9 at the end of the school year, in Spring 1999. Fall PALS Summed Scores and all PALS subtask scores were significantly correlated with spring Stanford-9 scaled scores (p < .001). The correlation between fall PALS Summed Scores and spring Stanford-9 Total Reading scaled scores was .70.

Significant amounts of variance in the kindergarten Stanford-9 Total Reading scaled scores were explained by the five core PALS subtasks collectively (Rhyme Awareness, Beginning Sound Awareness, Alphabet Recognition, Letter Sounds, and Spelling). In regression equations, the proportion of variance explained by the total model was 50% (p < .001), and the adjusted R2 was .47. PALS fall Summed Scores also predicted spring Stanford-9 scaled scores for all three Stanford-9 subtests: Sentence Reading, Word Reading, and Letters and Sounds (p < .001). The adjusted R2 for Stanford-9 Word Reading was .54 (p < .001).

A second assessment of predictive validity involves an examination of the relationship between PALS-K scores from a current administration and future PALS scores. For example, we found significant (p < .001 in all cases) and medium to medium-high correlations between kindergarten students' Summed Scores from Fall 2000 and later PALS-K scores from spring of their kindergarten year (r = .56), as well as with PALS 1–3 Entry Level scores from the fall (r = .67) and spring (r = .53) of their first grade year. The shared variance evident in these correlations offers some evidence of the predictive power of PALS-K Summed Scores relative to future PALS scores.

We also examined the predictive power of individual subtask scores from PALS-K in Fall 2000 relative to future PALS-K and PALS 1–3 scores. Regression equations using the subtask scores making up the PALS-K Summed Score (Rhyme Awareness, Beginning Sound Awareness, Alphabet Recognition, Letter Sounds, and Spelling) as independent variables yielded adjusted R2 values of .33 in predicting Spring 2001 Summed Scores, .45 in predicting Fall 2001 PALS 1–3 Entry Level Scores, and .30 in predicting Spring 2002 PALS 1–3 Entry Level Scores. Thus the amount of variation in future PALS scores that could be predicted from PALS-K subtask scores ranged from 30% to 45% for the subsequent spring and the following fall and spring of first grade.

In a study of the predictive validity of PALS-K (n = 61,124), discriminant analysis was used to assess the relationship between the 2012 Reading SOL scores in the spring of third grade and the students' spring PALS-K scores three years earlier. The discriminant analysis correctly classified 85% of students according to their pass-fail status on the SOL.
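Adjusted R2 values like those reported in this section can be reproduced from R2, the sample size, and the number of predictors. Here the n = 74 sample and five core subtasks from the study above are used; the result lands near the reported .47, with the small difference attributable to rounding of the reported R2.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Five core subtasks predicting Stanford-9 Total Reading for the n = 74
# kindergartners described above, with R^2 = .50 as reported.
adj = adjusted_r2(0.50, n=74, p=5)
print(round(adj, 2))
```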
The Reading SOL scores and the spring PALS-K Summed Scores were moderately correlated (r = .43). The individual tasks that had the highest correlation with the SOL scores were Concept of Word (r = .43), Spelling (r = .35), and letter-sound knowledge (r = .33). Although approximately 77,000 kindergartners had been assessed using PALS-K in the spring of 2009, only 80% of the students had 2012 Reading SOL scores three years later. The students who remained in the sample had higher spring PALS-K scores (Ms = 94.2 vs. 82.8, d = 0.97) and were more likely to be White (58% vs. 50%) compared to the kindergarten students who had taken PALS-K and were not matched in the sample. With the limitations noted above, these correlations provide some evidence that PALS-K scores may account for a portion of the variation in a child's SOL scores three years later.

Concurrent Validity. Concurrent validity is the extent to which outcomes obtained from a particular measure are consistent with some independent standard.41 The independent standard against which PALS was compared was the Stanford-9 (1997). Again the three Stanford-9 subtests that are similar to PALS tasks (Sounds and Letters, Word Reading, and Sentence Reading) were administered in Spring 1999 to 137 kindergartners, who had also been given PALS two weeks earlier.

The correlation between the end-of-year kindergarten PALS Summed Score and the Total Reading scaled score of the Stanford-9 was medium to high and significant (r = .72, p < .001). The correlations between the PALS Summed Score and the three Stanford-9 subtest scaled scores were also medium to high and significant (Sounds and Letters, r = .79; Word Reading, r = .74; and Sentence Reading, r = .58). Correlations between the PALS Summed Score and the Stanford-9 raw scores were similar: medium to high and significant (Total Reading, r = .79; Sounds and Letters, r = .80; Word Reading, r = .78; Sentence Reading, r = .56). Consistently medium to high correlations provide evidence of the concurrent validity of PALS with the Stanford-9 when administered at the end of kindergarten.

Construct Validity
Construct validity refers to the degree to which the underlying traits of an assessment can be identified and the extent to which these traits reflect the theoretical model on which the assessment was based.42

[Figure 1. PALS Theoretical Model]

The theoretical model on which PALS was based is illustrated in Figure 1. As demonstrated there, PALS was designed to assess children's knowledge of sound and print, and includes tasks that assess the wedding of the two. The pronunciation of letter sounds, the ability to invent a phonetically plausible spelling, and the ability to match speech to print and to recognize words out of context all require the application of both sound and print knowledge.

We tested the theoretical model illustrated in Figure 1 several ways. First, we conducted principal components analyses (PCA) on PALS data to verify the underlying factor structure. Second, we conducted discriminant analyses (DA) on PALS data to determine the extent to which group membership (i.e., Identified versus Not-identified as needing additional services) could be predicted accurately from PALS subtask scores. Third, we conducted receiver-operating characteristic (ROC) curve analysis to evaluate the diagnostic accuracy of PALS-K.

Principal Components Analysis (PCA). We tested the theoretical model illustrated in Figure 1 by subjecting the first-year PALS results to principal
components analysis (PCA). In the first year, PCA for the entire sample yielded one factor with an eigenvalue of 5.20. This factor represented the interrelationship between sound and print. The same unitary factor was also found using kindergarten data only (eigenvalue of 4.92) and first-grade data only (eigenvalue of 4.05). The one-factor solution suggested that PALS measures a unitary trait: emergent literacy. These results are in keeping with Perney et al.'s (1997) research, which also yielded a single factor. In Fall 1997, the single PALS factor accounted for 58% to 74% of the total variance in the children's scores on all the tasks in both the phonological awareness and literacy components of PALS for the entire sample, for kindergarten, and for first grade.43

This unitary factor was replicated using Fall 1998 and Fall 1999 PALS results. Principal components analysis again yielded a single eigenvalue greater than one for the entire sample, for kindergarten, and for first grade. Factor loadings from the second and third year were similar to the first: five core variables (Rhyme Awareness, Beginning Sound Awareness, Alphabet Recognition, Letter Sounds, and Spelling) defined the construct. Factor loadings for Letter Sounds and Spelling were consistently large and accounted for most of the construct. Factor loadings for Rhyme Awareness were the smallest. This pattern stayed the same for the entire sample, for kindergarten only, and for first grade only.

The PALS theoretical model was tested again using 2000–01 statewide data (n = 74,054) and subsequently using 2001–02 statewide data (n = 65,036), to see whether the newly configured Concept of Word task would also load onto one factor. Principal components analysis yielded one factor with an eigenvalue greater than one for each year. Even with Concept of Word included in the analysis, factor loadings are similar to previous factor loadings: all PALS-K tasks, including Concept of Word, define the construct. Factor loadings for Letter Sounds and Spelling were again the largest, and this pattern was the same for fall and spring. The eigenvalue for the single factor resulting from PCA on PALS-K Spring 2011 data was 3.80. This factor accounted for nearly two-thirds (65%) of the total variance in the data set, a pattern that has remained consistent across several years of statewide data.

Factor Analysis. Exploratory and confirmatory factor analysis (CFA) were used to assess the factor structure of PALS-K with a sample of 2,844 first-time public-school kindergarteners.44 The sample was randomly split in two to form exploratory and confirmatory samples. In a comparison of three models (a one-factor model, a two-correlated-factor model, and a hierarchical model), CFA results indicated that the data could best be represented by a second-order factor model: an overall general factor of early literacy influenced three first-order factors of alphabet knowledge, phonological awareness, and contextual knowledge. The factor model (see Figure 2) was found to replicate in the confirmatory holdout sample, providing further evidence of the model's generalizability.

[Figure 2. Second-order factor structure of PALS-K: a general Early Literacy factor influencing three first-order factors, Alphabet Knowledge (ABC, LS), Phonological Awareness (RHYME, BS), and Contextual Knowledge (COW, SPELL).]

In addition to assessing the factor structure of PALS-K, metric invariance of the instrument was investigated. Metric invariance concerns primarily whether the estimated factor structure is statistically indistinguishable between defined groups
of students. In this particular study, the factor model was tested comparing Spanish-speaking students who were English language learners (ELLs) with non-ELL students. Research supported the metric invariance of PALS-K between both groups of students, lending support for the use of PALS-K with these populations.

Discriminant analyses (DA). The purpose of discriminant analysis is to determine whether test scores can discriminate accurately between groups of subjects if their group identity is removed. Because PALS is designed as a screening tool to identify students in need of additional reading instruction, we test this hypothesis each year by determining the extent to which a combination of PALS subtest scores accurately predicts membership in Identified and Not-identified groups.

In Spring 2010, discriminant analyses using the six PALS-K subtask scores that made up the Summed Score (Rhyme Awareness, Beginning Sound Awareness, Alphabet Recognition, Letter Sounds, Spelling, and Concept of Word-word list) yielded a function that was statistically significant in differentiating groups (as indicated by a statistically significant Wilks's lambda for the discriminant function). The same function accurately classified 98% of students as Identified or Not-identified. This classification rate has remained consistent since 1997, with discriminant analyses accurately classifying 93% to 98% of students. Table 15 summarizes DA results across the last thirteen PALS cohorts.

Receiver-operating characteristic (ROC) curve analysis. Receiver-operating characteristic (ROC) curve analysis, often used in laboratory testing, epidemiology, and psychology, is a tool for evaluating the performance of a diagnostic test in classifying subjects into one of two categories (e.g., at risk for future reading difficulties, not at risk for future reading difficulties).45 To estimate the classification accuracy using ROC curve analysis, an external indicator is used to evaluate how well PALS discriminates between those students who were identified as needing additional intervention and those not identified as needing additional intervention.

As an external measure of risk status, students who scored at or below the 20th percentile of the Stanford Reading First46 (SRF) were identified as at risk for future difficulties. The SRF, an edition of the Stanford-10, is a national norm-referenced assessment. The SESAT/2 (Stanford Early School Achievement Test) multiple-choice reading subtest of the SRF, which was appropriate for students in the second half of kindergarten, was administered to the students by their teachers. The SRF focused on the five components of scientifically based reading research: phonemic awareness, phonics, vocabulary development, reading fluency, and comprehension.

A total of 3,506 kindergarteners (males = 50.2%) in 73 schools in Virginia were assessed using both the PALS-K and the SRF in the spring of 2005. Of the sample, 45% were White, 48% were Black, 4% were Hispanic, and 3% were of another race/ethnicity. A majority of students (58%) were eligible for free or reduced-price lunch, and 98% had no identified disabilities.

The Area Under the Curve (AUC) statistic of a ROC curve is an overall indication of diagnostic accuracy. AUC values of 1.00 indicate perfect classification accuracy, whereas values of 0.50 indicate that the screener is no better than chance. The National Center on Response to Intervention uses an AUC value greater than .85, along with several other criteria, to indicate whether a screener has convincing evidence for classification accuracy.47 In our sample, PALS-K had an overall AUC of .91. In terms of disaggregated classification accuracy for White, Black, and Hispanic populations, PALS-K had AUC values of .91, .90, and .92, respectively. These findings indicate that PALS-K has excellent classification capabilities.

Together, the results of our PCA, CFA, DA, and ROC curve analyses continue to suggest that PALS-K assesses a single general construct associated with emergent reading and, further, that the combination of variables making up the PALS-K subtasks discriminates reliably
32 PALS-K Technical Reference
between groups of students who are or are not identi- task scores. In Fall 2011, for example, these correla-
fied as needing additional reading instruction. tions were .87 (Alphabet Recognition); .94 (Letter
Sounds); and .88 (Spelling).
Intercorrelations among PALS-K Tasks. A third
source of evidence for a test’s construct validity may Medium-high correlations (between .60 and .79)
be found in the intercorrelations among its subtests. are consistently obtained between Summed Scores
We examined the intercorrelations among PALS-K and Beginning Sound Awareness (r = .76); Rhyme
task scores to assess the relationships among PALS-K Awareness (r = .60); preprimer Word Recognition in
tasks and, further, to verify that the pattern of inter- Isolation (r = .75); and primer Word Recognition in
correlations is consistent across student subgroups Isolation (r = .61). In Fall 2011, Concept of Word-
(e.g., SES levels or race/ethnicity categories). word list also correlated in the medium-high range
(r = .75) with the Summed Score. This pattern of
High correlations (above .80) are consistently intercorrelations among PALS tasks administered to
obtained between the PALS-K Summed Score and kindergartners in the fall has been consistent over
Alphabet Recognition, Letter Sounds, and Spelling the past years.
Section V Technical Adequacy 33
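For readers who want to see the mechanics, the AUC statistic discussed above can be sketched in a few lines of Python. The scores and risk labels below are invented for illustration; they are not PALS-K or SRF data.

```python
# Illustrative sketch: computing the Area Under the ROC Curve (AUC) for a
# screener score against a binary external indicator.

def auc(scores, at_risk):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    randomly chosen at-risk student scores LOWER on the screener than a
    randomly chosen not-at-risk student (low scores signal risk)."""
    pos = [s for s, r in zip(scores, at_risk) if r]       # at-risk group
    neg = [s for s, r in zip(scores, at_risk) if not r]   # not-at-risk group
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0      # screener ranks this pair correctly
            elif p == n:
                wins += 0.5      # ties count half
    return wins / (len(pos) * len(neg))

# Hypothetical summed scores and risk flags (True = identified as at risk)
scores = [12, 20, 35, 40, 55, 60, 71, 80]
at_risk = [True, True, True, False, True, False, False, False]

print(round(auc(scores, at_risk), 2))  # 0.94
```

An AUC of 1.0 would mean every at-risk student scored below every not-at-risk student; 0.5 would mean the screener ranks such pairs no better than chance.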
Patterns of intercorrelation among PALS-K tasks are also examined within subgroups of the statewide sample based on geographic region, gender, race/ethnicity, and SES. For example, the pattern of intercorrelation across regions of the state has consistently mirrored that of the entire sample. That is, for the last three years, all intercorrelations that were high in the entire sample were also high within each of Virginia's eight regions. The same holds for intercorrelations that were medium-high and medium. The patterns of intercorrelation are also similar for males and females, for racial/ethnic groups, and across all levels of SES, suggesting that PALS-K tasks behave in a similar manner for all students regardless of geographic region, gender, SES, or race/ethnicity.

The one exception to this pattern of consistency across groups emerged when we examined intercorrelations within groups of Identified versus Not-identified students. Generally, intercorrelations were lower within the Identified group than within the Not-identified group; this is likely due to the restriction in the range of scores that naturally occurs when the Identified group is isolated.

Differential item functioning. Differential item functioning is examined using the Mantel-Haenszel (MH) statistic, in which DIF is defined as the average factor by which the odds that members of one group will answer a question correctly exceed the corresponding odds for comparable members of another group. The MH statistic is a form of odds ratio.48

To explore the consistency of responses to PALS items, we examined the responses to PALS-K tasks from groups defined as Identified and Not-identified for additional instruction under EIRI, based on their PALS-K Summed Score. Since the purpose of PALS-K is to identify children in need of additional instruction, individual items within each task should function differently for Identified and Not-identified groups. For each of the last three mandatory screening windows, this was the case for kindergarten student scores. Table 16 displays the Mantel-Haenszel statistic (based on item scores) for each PALS-K subtask for Spring 2009, Fall 2009, and Spring 2010. As can be seen, the general association statistic is significant for all PALS tasks in both fall and spring.
Table 16. Mantel-Haenszel Statistics (Based on Item Scores) for Identified and Not-identified Groups: Spring 2009, Fall 2009, and Spring 2010

                                   Spring 2009        Fall 2009          Spring 2010
PALS Task                          GA        p        GA        p        GA        p
Group Rhyme Awareness              19,957  < .001     18,129  < .001     19,890  < .001
Group Beginning Sound Awareness    24,577  < .001     21,567  < .001     27,334  < .001
Alphabet Recognition               20,654  < .001     44,692  < .001     23,275  < .001
Letter Sounds                      37,641  < .001     28,192  < .001     39,187  < .001
Spelling                           45,584  < .001     16,339  < .001     46,977  < .001
Concept of Word (word list)        32,603  < .001     13,951  < .001     39,704  < .001

GA = general association
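The Mantel-Haenszel statistic described above is built from group-by-correctness 2x2 tables pooled across strata of ability. A minimal sketch of the MH common odds ratio, with invented counts (real DIF analyses stratify on the PALS-K Summed Score), is:

```python
# Illustrative sketch of the Mantel-Haenszel common odds ratio, which pools
# group-by-item-correctness 2x2 tables across ability strata. All counts
# below are hypothetical.

def mantel_haenszel_or(strata):
    """Each stratum is a tuple (a, b, c, d):
    a = group 1 correct, b = group 1 incorrect,
    c = group 2 correct, d = group 2 incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Three hypothetical score strata for one item
strata = [
    (30, 10, 25, 15),   # low-score stratum
    (40, 5, 35, 10),    # mid-score stratum
    (45, 2, 44, 3),     # high-score stratum
]

# An odds ratio near 1.0 would indicate comparable odds of success
# across the two groups within strata (little DIF for this item).
print(round(mantel_haenszel_or(strata), 2))
```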
Section VI
Summary
The technical adequacy of PALS-K has been established through pilots and statistical analyses of PALS scores from more than 600,000 kindergarten students statewide over the last ten years. The reliability of individual subtasks is supported through the use of Cronbach's alpha: reliability coefficients for individual tasks range from .79 to .89 and demonstrate the adequacy of their internal consistency. Inter-rater reliabilities, expressed as Pearson correlation coefficients, have ranged from .96 to .99, indicating that PALS-K tasks can be scored consistently across individuals. In all of these analyses, PALS-K has been shown to be steady, reliable, and consistent among many different groups of users.

Data analyses also support the content, construct, and criterion-related validity of PALS-K. Principal components analyses, discriminant function analyses, receiver-operating characteristic curve analyses, and intercorrelations among tasks provide evidence of the construct validity of PALS-K. Regression analyses have shown the predictive relationship between PALS-K Summed Scores in the fall and Stanford-9 scores in the spring. Coefficients of determination have demonstrated that a significant proportion of the variability in spring Stanford-9 scores can be explained by the PALS-K Summed Score from nine months earlier. Additional evidence of predictive validity is provided by regression equations based on fall PALS-K scores that account for 25% to 45% of the variance in PALS-K and PALS 1–3 scores obtained in the subsequent four screening windows (spring of kindergarten, fall of first grade, spring of first grade, and fall of second grade). Similar analyses have demonstrated the concurrent validity of PALS-K, also using the Stanford-9. In addition, differential item functioning analyses using the Mantel-Haenszel statistic demonstrate the consistency of responses to specific tasks across groups of Identified and Not-identified students. All of these analyses provide evidence of the validity of PALS-K as an emergent literacy assessment that reliably identifies students in need of additional instruction in reading and writing.

In summary, PALS-K provides an assessment tool with good evidence of validity that can be used reliably to screen kindergarten students for difficulty in emergent literacy. PALS-K shows evidence of both internal consistency and inter-rater reliability, indicating that it can be administered and scored consistently by different users. PALS-K also shows evidence of content, construct, and criterion-related validity, suggesting that it indeed captures the underlying constructs associated with emergent literacy.
Section VII
References
Abouzeid, M. P. (1986). Developmental stages of word knowledge in dyslexia (Doctoral dissertation, University of Virginia, 1986). Dissertation Abstracts International, 48, 09A2295.

Adams, M. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME) (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Bader, L. (1998). Bader Reading and Language Inventory (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.

Barnes, W. G. (1993). Word sorting: The cultivation of rules for spelling in English. Reading Psychology: An International Quarterly, 10, 293-307.

Bear, D. (1989). Why beginning reading must be word-by-word: Disfluent oral reading and orthographic development. Visible Language, 23(4), 353-367.

Bear, D., Invernizzi, M., Templeton, S., & Johnston, F. (2004). Words their way: Word study for phonics, vocabulary and spelling instruction (3rd ed.). Upper Saddle River, NJ: Merrill.

Bodrova, E., Leong, D., & Semenov, D. (1999). 100 most frequent words in books for beginning readers. Retrieved from http://www.mcrel.org/resources/literacy/road/100words

Bradley, L., & Bryant, P. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 301, 419-421.

Bradley, L., & Bryant, P. (1985). Rhyme and reason in reading and spelling. Ann Arbor, MI: University of Michigan Press.

Bryant, P., MacLean, M., & Bradley, L. (1990). Rhyme, language, and children's reading. Applied Psycholinguistics, 11, 237-252.

Bryant, P., MacLean, M., Bradley, L., & Crossland, J. (1989). Nursery rhymes, phonological skills and reading. Journal of Child Language, 16, 407-428.

Burton, R., Hill, E., Knowlton, L., & Sutherland, K. (1999). A Reason for Spelling. Retrieved from http://www.areasonfor.com/index.htm

Cantrell, R. (1991). Dialect and spelling in Appalachian first grade children (Doctoral dissertation, University of Virginia, 1991). Dissertation Abstracts International, 53, 01A0112.

Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage word frequency book. Boston: Houghton Mifflin.

Catts, H. W. (1993). The relationship between speech-language impairments and reading disabilities. Journal of Speech & Hearing Research, 36, 948-958.

Clay, M. M. (1979). Reading: The patterning of complex behavior. Auckland, New Zealand: Heinemann.

Code of Fair Testing Practices in Education. (1988). Washington, DC: Joint Committee on Testing Practices.

Dolch, E. W. (1936). A combined word list. Boston: Ginn.

Dorans, N. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel method. Applied Measurement in Education, 2, 217-233.

EDL (1997). EDL core vocabularies in reading, mathematics, science, and social studies. Orlando, FL: Steck-Vaughn.

Ganske, K. (1999). The developmental spelling analysis: A measure of orthographic knowledge. Educational Assessment, 6, 41-70.

Gill, C. E. (1980). An analysis of spelling errors in French (Doctoral dissertation, University of Virginia, 1980). Dissertation Abstracts International, 41, 09A3924.

Gill, T. P. (1985). The relationship between spelling and word recognition of first, second, and third graders (Doctoral dissertation, University of Virginia, 1985). Dissertation Abstracts International, 46, 10A2917.

Gronlund, N. E. (1985). Measurement and evaluation in teaching. New York: Macmillan.

Harcourt Assessment, Inc. (2004). Stanford Reading First: Technical data report. San Antonio, TX: Author.

Henderson, E. (1990). Teaching spelling (2nd ed.). Boston: Houghton Mifflin.

Henderson, E., & Beers, J. (1980). Developmental and cognitive aspects of learning to spell. Newark, DE: International Reading Association.

Huang, F., & Konold, T. (2013). A latent variable investigation of the Phonological Awareness Literacy Screening-Kindergarten assessment: Construct identification and multigroup comparisons between Spanish-speaking English language learners (ELLs) and non-ELL students. Language Testing. doi: 10.1177/0265532213496773

Invernizzi, M. (1992). The vowel and what follows: A phonological frame of orthographic analysis. In S. Templeton & D. Bear (Eds.), Development of orthographic knowledge and the foundation of literacy (pp. 105-136). Hillsdale, NJ: Lawrence Erlbaum and Associates, Inc.

Invernizzi, M., Juel, C., Rosemary, C., & Richards, H. (1997). At-risk readers and community volunteers: A three year perspective. Scientific Studies of Reading, 1, 277-300.

Invernizzi, M., Meier, J. D., Swank, L., & Juel, C. (1997). PALS: Phonological Awareness Literacy Screening. Charlottesville, VA: University Printing Services.

Invernizzi, M., Robey, R., & Moon, T. (1999). Phonological Awareness Literacy Screening (PALS) 1997-1998: Description of sample, first-year results, task analyses, and revisions. Technical manual and report prepared for the Virginia Department of Education. Charlottesville, VA: University Printing Services.

Invernizzi, M., Robey, R., & Moon, T. (2000). Phonological Awareness Literacy Screening (PALS) 1998-1999: Description of sample & second-year results. Technical manual and report prepared for the Virginia Department of Education. Charlottesville, VA: University Printing Services.

Johnston, F., Invernizzi, M., & Juel, C. (1998). Book Buddies: Guidelines for volunteer tutors of emergent and early readers. New York: Guilford Publications.
Leslie, L., & Caldwell, J. (1995). Qualitative Reading Inventory-II. New York: HarperCollins.

Lyon, R. (1998). Overview of reading and literacy initiatives. Retrieved January 15, 2002, from National Institutes of Health, National Institute of Child Health and Human Development, Child Development and Behavior Branch Web site: http://156.40.88.3/publications/pubs/jeffords.htm

McBride-Chang, C. (1998). The development of invented spelling. Early Education & Development, 9, 147-160.

McLoughlin, J., & Lewis, R. (2001). Assessing students with special needs. Upper Saddle River, NJ: Merrill/Prentice Hall.

Mehrens, W., & Lehmann, I. (1987). Using standardized tests in education. New York: Longman.

Meltzoff, J. (1998). Critical thinking about research. Washington, DC: American Psychological Association.

Morris, D. (1981). Concept of word: A developmental phenomenon in the beginning reading and writing process. Language Arts, 58, 659-668.

Morris, D. (1992). What constitutes at risk: Screening children for first grade reading intervention. In W. A. Secord (Ed.), Best practices in school speech-language pathology (pp. 17-38). San Antonio, TX: Psychological Corporation.

Morris, D. (1993). The relationship between children's concept of word in text and phoneme awareness in learning to read: A longitudinal study. Research in the Teaching of English, 27, 133-154.

Morris, D. (1999). The Howard Street tutoring manual. New York: Guilford Press.

Perney, J., Morris, D., & Carter, S. (1997). Factorial and predictive validity of first graders' scores on the Early Reading Screening Instrument. Psychological Reports, 81, 207-210.

Rathvon, N. (2004). Early reading assessment: A practitioner's handbook. New York: The Guilford Press.

Richardson, E., & DiBenedetto, B. (1985). Decoding skills test. Los Angeles: Western Psychological Services.

Rinsland, H. D. (1945). A basic vocabulary of elementary school children. New York: Macmillan.

Roberts, E. (1992). The evolution of the young child's concept of word as a unit of spoken and written language. Reading Research Quarterly, 27, 124-138.

Santa, C., & Hoien, T. (1999). An assessment of early steps: A program for early intervention of reading problems. Reading Research Quarterly, 34, 54-79.

Scarborough, H. S. (1998). Early identification of children at risk for reading disabilities: Phonological awareness and some other promising predictors. In B. K. Shapiro, P. J. Accardo, & A. J. Capute (Eds.), Specific reading disability: A view of the spectrum (pp. 75-199). Timonium, MD: York Press.

Scarborough, H. S. (2000). Predictive and causal links between language and literacy development: Current knowledge and future direction. Paper presented at the Workshop on Emergent and Early Literacy: Current Status and Research Direction, Rockville, MD.

Schlagal, R. C. (1989). Informal and qualitative assessment of spelling. The Pointer, 30(2), 37-41.

Shanker, J. L., & Ekwall, E. E. (2000). Ekwall/Shanker reading inventory (4th ed.). Boston: Allyn & Bacon.

Snow, C., Burns, M., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.

Stanford Achievement Test (9th ed.). (1996). San Antonio, TX: Harcourt Brace.

Stieglitz, E. (1997). The Stieglitz informal reading inventory (2nd ed.). Needham Heights, MA: Allyn & Bacon.

Superintendents' Memo No. 081–09 (2009). Early Intervention Reading Initiative — Application Process for the 2009–2010 school year. Retrieved May 23, 2005, from Commonwealth of Virginia Department of Education Web site: http://www.doe.virginia.gov/info_centers/administrators/superintendents_memos/2009/081-09.shtml

Swank, L. K. (1991). A two level hypothesis of phonological awareness (Doctoral dissertation, University of Kansas, 1991). Dissertation Abstracts International, 53-08A, 2754.

Swank, L. K. (1997). Linguistic influences on the emergence of written word decoding in first grade. American Journal of Speech-Language Pathology: A Journal of Clinical Practice, November 1997.

Swank, L. K., & Catts, H. W. (1994). Phonological awareness and written word decoding. Language, Speech, and Hearing Services in Schools, 25, 9-14.

Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1–26.

Templeton, S. (1983). Using the spelling/meaning connection to develop word knowledge in older students. Journal of Reading, 27, 8-14.

Torgesen, J. K., & Davis, C. (1996). Individual differences variables that predict responses to training in phonological awareness. Journal of Experimental Child Psychology, 63, 1-21.

Torgesen, J. K., & Wagner, R. K. (1998). Alternative diagnostic approaches for specific developmental reading disabilities. Learning Disabilities Research & Practice, 31, 220-232.

Torgesen, J. K., Wagner, R. K., Rashotte, C. A., Burgess, S., & Hecht, S. (1997). Contributions of phonological awareness and rapid automatic naming ability to the growth of word-reading skills in second to fifth grade children. Scientific Studies of Reading, 1(2), 161-185.

Vellutino, F. R., Scanlon, D. M., Sipay, E. R., Small, S. G., Pratt, A., Chen, R., & Denckla, M. B. (1996). Cognitive profiles of difficult-to-remediate and readily remediated poor readers: Early intervention as a vehicle for distinguishing between cognitive and experiential deficits as a basic cause of specific reading disability. Journal of Educational Psychology, 88, 601-638.

Viise, N. (1992). A comparison of child and adult spelling (Doctoral dissertation, University of Virginia, 1992). Dissertation Abstracts International, 54, 5A1745.

Virginia Department of Education. (1995). Standards of learning for Virginia public schools. Richmond, VA: Commonwealth of Virginia Board of Education.

Worthy, M. J., & Invernizzi, M. (1990). Spelling errors of normal and disabled students on achievement levels one through four: Instructional implications. Bulletin of the Orton Society, 40, 138-151.

Yopp, H. K. (1988). The validity and reliability of phonemic awareness tests. Reading Research Quarterly, 23(2), 159-177.

Zou, K. H., O'Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654–657.

Zutell, J. B. (1975). Spelling strategies of primary school children and their relationship to the Piagetian concept of decentration (Doctoral dissertation, University of Virginia, 1975). Dissertation Abstracts International, 36, 8A5030.
Section VIII
Endnotes
1. Supts. Memo No. 081–09, 2009.
2. Virginia Department of Education, 1995, p. 61.
3. Catts, 1993; Lyon, 1998; Scarborough, 1998, 2000; Torgesen & Wagner, 1998; Torgesen, Wagner, Rashotte, Burgess, & Hecht, 1997; Vellutino, Scanlon, Sipay, Small, Pratt, Chen, & Denckla, 1996.
4. Swank, 1991; Yopp, 1988.
5. Bradley & Bryant, 1983, 1985; Bryant, MacLean, & Bradley, 1990; Bryant, MacLean, Bradley, & Crossland, 1989; Swank, 1997; Swank & Catts, 1994.
6. Bradley & Bryant, 1983, 1985; Bryant et al., 1989; Bryant et al., 1990; Swank, 1997; Swank & Catts, 1994.
7. Bear, Invernizzi, Templeton, & Johnston, 2004.
8. Mehrens & Lehmann, 1987.
9. Morris, 1992.
10. Johnston, Invernizzi, & Juel, 1998.
11. Invernizzi, Juel, Rosemary, & Richards, 1997; Morris, 1999; Perney et al., 1997; Santa & Hoien, 1999.
12. Adams, 1990; Snow, Burns, & Griffin, 1998.
13. Invernizzi, Meier, Swank, & Juel, 1997.
14. McBride-Chang, 1998.
15. Torgesen & Davis, 1996.
16. Abouzeid, 1986; Barnes, 1993; Bear, 1989; Cantrell, 1991; Ganske, 1999;
19. Snow et al., 1998.
20. Bodrova, Leong, & Semenov, 1999.
21. Henderson, 1990.
22. Burton, Hill, Knowlton, & Sutherland, 1999.
23. Dolch, 1936.
24. Rinsland, 1945.
25. Leslie & Caldwell, 1995.
26. Richardson & DiBenedetto, 1985.
27. Shanker & Ekwall, 2000.
28. Johnston et al., 1998.
29. Morris, 1999.
30. Carroll, Davies, & Richman, 1971.
31. McLoughlin & Lewis, 2001.
32. Invernizzi, Robey, & Moon, 1999.
33. Rathvon, 2004, pp. 250–261.
34. Invernizzi et al., 1997.
35. Invernizzi, Robey, & Moon, 2000.
36. Mehrens & Lehmann, 1987.
37. Idem.
38. Swank, 1991; Yopp, 1988.
39. Standards for Educational and Psychological Testing, 1999.
40. Stanford-9, 1996.
41. Meltzoff, 1998.
42. Gronlund, 1985.
43. Invernizzi et al., 1999.
44. Huang & Konold, 2013.
45. Swets, Dawes, & Monahan, 2000; Zou, O'Malley, & Mauri, 2007.