Science Teachers' TPACK-Practical: Standard-Setting Using an Evidence-Based Approach
Tsung-Hau Jen¹, Yi-Fen Yeh¹*, Ying-Shao Hsu², Hsin-Kai Wu², & Kuan-Ming Chen¹
¹ Science Education Center, National Taiwan Normal University, Taipei, Taiwan
² Graduate Institute of Science Education, National Taiwan Normal University, Taipei, Taiwan
To cite this article: Jen, T.-H., Yeh, Y.-F., Hsu, Y.-S., Wu, H.K., & Chen, K.-M. (2016).
Science Teachers’ TPACK-practical: Standard-setting using an evidence-based approach.
Computers & Education, 95, 45-62.
1. Introduction
Surveys regarding technology in classrooms have revealed that teachers’ and students’
uses of technology have remained infrequent, even when technology was readily available for
instructional use (Gray, Thomas, & Lewis, 2010; Project Tomorrow, 2008). A lack of the necessary
knowledge and confidence, along with common concerns such as a shortage of good technological
tools, could explain this infrequent and ineffective use of technology in
classrooms (Afshari, Bakar, Luan, Samah, & Fooi, 2009; Mumtaz, 2000). Researchers have
proposed technological pedagogical content knowledge (TPACK) as a body of knowledge
that digital-age teachers should be expected to have in order to properly use technology in
their teaching (Mishra & Koehler, 2006; Niess, 2005). Teachers who use such knowledge are
believed to make their instruction more comprehensible and to assist their students through
thoughtful use of technology, such as information and communication technology (ICT).
Researchers have tried different approaches to investigating teachers’ TPACK. The
resulting information has revealed not only how well teachers may teach with technology, but
also constructive directions for future teacher education. However, what and how to measure
TPACK has become even more complicated, especially when considering its situated and
multifaceted nature (Cox & Graham, 2009; Doering, Veletsianos, Scharber, & Miller, 2009;
Koehler & Mishra, 2008) and the scales of the various proficiency levels (Dwyer, Ringstaff,
& Sandholtz, 1991; Sandholtz, Ringstaff, & Dwyer, 1997). Even more fundamental issues in
TPACK measurement concern how it is defined and what exactly it comprises, which makes objective
evaluation difficult, or at least in need of exploratory data to provide a basis (Mishra
& Henriksen, 2015). This research and development study constructed, validated, and
benchmarked a practice-based TPACK assessment tool.
preservice teachers (Lee, Brown, Luft, & Roehrig, 2007; Lee & Luft, 2008). Teachers’
knowledge evolves as they struggle to make instructional decisions and to negotiate among
diverse contextual elements (Koh, Chai, & Tay, 2014; Niess, 2011; Shulman, 1987) and
different situations in instruction (Authors, 2014). Reflection-in-action and reflection-on-
action in classroom teaching practices further individualize teachers’ TPACK and keep it
dynamic (Koh & Divaharan, 2011; Krajcik, Blumenfeld, Marx, & Soloway, 1994).
Knowledge can be attained for practice or elaborated in practice, but it can also achieve an
ultimate level of knowledge of practice (Cochran-Smith & Lytle, 1999). TPACK-P refers to
the knowledge teachers develop from long-term experiences in planning and enacting
instruction with flexible ICT uses to support different instructional needs (Ay, Karadag, &
Acat, 2015; Authors, 2014). Considering that both TPACK skills and personal teaching
experiences compose teachers' TPACK-P, it is expected that experienced teachers have
stronger TPACK-P than novice teachers or preservice teachers.
Teaching practices are the result of the complex and convoluted interactions of
instructional, social, and physical factors; there are no one-size-fits-all solutions for
instructional tasks (Koehler & Mishra, 2008; Mishra & Koehler, 2006). Teachers encounter a
wide variety of different problems and solutions throughout their respective teaching
experiences specific to their particular environment (Moersch, 1995; Niess et al., 2009;
Russell, 1995). Progression and evolution of teachers’ TPACK stem from their instructional
practices in response to different scenarios and challenges; therefore, varying proficiency
levels are expected amongst teachers. In the mid-1980s, Apple Computer funded a project called
Apple Classrooms of Tomorrow (ACOT), in which media were used to support
teaching and learning in classrooms. Dwyer et al. (1991) concluded that teachers’
instructional evolution in technology-implemented classrooms of the ACOT project went
through a series of phases, including entry, adoption, adaptation, appropriation, and innovation.
They considered the teachers’ mastery of technology and their level of technology infusion
when determining the success of the teachers’ classrooms. A similar learning trajectory was
also found for mathematics teachers' learning to use spreadsheets as a means of facilitating
students' learning in mathematics classrooms (i.e., recognizing, accepting, adapting,
exploring, and advancing) (Niess, 2007; Niess, et al., 2009). These performance-based
proficiency levels depict the typical features of the main stages of teacher development, while
statistical scrutiny of the cut points between levels can offer another means of confirming
the levels' validity.
1.2. Measurement of Teachers’ TPACK
Considering that TPACK is an internal and dynamic construct, it is difficult to measure
accurately (Kagan, 1990). However, the collection of different types of data and use of
different methods of analysis should enable a better understanding of TPACK. Self-report surveys
and performance assessments are two measurement types that TPACK researchers and
teacher educators have used (Abbitt, 2011; Koehler, Shin, & Mishra, 2012). Self-report
surveys are preferred by researchers hoping to rapidly collect large amounts of teachers’ self-
ratings regarding their technology understanding and use, but the TPACK models embedded
in the data should also be explored and estimated. Schmidt et al. (2009) collected 77 pre-
service teachers’ self-rating scores regarding ability descriptions developed from the seven
TPACK knowledge domains (TK, PK, CK, TPK, TCK, PCK, TPACK). These data were later
analyzed by principal component factor analyses and correlation analyses, from which
modifications to certain items and the relationships within component knowledge were
suggested. Lee and Tsai (2010) collected data from 558 pre-service teachers and used
exploratory and confirmatory factor analyses to examine the validity of the items, identify the
main factors affecting the construct, and determine the data fit within the proposed
framework of TPACK-Web. Although the teachers’ composite scores from their self-rating
can be quantitatively compared, a more valid and complete understanding about teachers’
abilities can be achieved by linking these self-report scores with indicators of teachers'
observed practical mastery in instructional artifacts and classroom applications of TPACK-P.
As for proficiency levels, Harris, Grandgenett, and Hofer (2010) constructed an assessment
rubric to evaluate teachers’ lesson plans; scores 1 to 4 were offered to four aspects of the
appropriateness of technology use toward instruction. Expert teachers and researchers tested
the validity and reliability of the assessment rubrics through judging the content and scoring
the lesson plans. Beside the instructional performance that teachers display, their negotiations
among different constraints in instructional contexts as well as their reflections and
experiences from prior teaching practices can also reveal the depth of their instructional
knowledge. To investigate teachers’ instructional dispositions, the authors first identified the
framework of TPACK-P in which TPACK transforms from and during teaching practices
(Authors, 2014) and then interviewed 40 teachers about their design thinking and teaching
experiences with ICTs in instructional tasks of assessment, planning and designing, and
enactment (Authors, 2015). A spectrum of design thinking, actions, and reflections was
identified, which elicited more features of teachers' TPACK-P at four levels.
1.3. Standard-setting for the Proficiency Levels of Teachers’ TPACK-P
Standard setting is a methodology utilized to provide cut scores for measurement scales.
These cut scores are used to separate performance categories by classifying test takers with
lower scores into one level and higher scores into another (Çetin & Gelbal, 2013; Shepard,
2008). Various approaches to standard setting have been proposed to achieve different
purposes, including conducting mastery classifications of the target population, finding norms
and passing rates for test takers or the greater population, inducting standards through
empirical means, and validating already developed assessment frameworks based on current
theories (Haertel & Lorié, 2004; Shepard, 1980). The technique used in this study for
standard setting was applied firstly to validate the proposed proficiency levels of science
teachers’ TPACK-P, and secondly to examine the in-service and pre-service science teachers’
knowledge about and application of TPACK in their teaching practices.
Once the hierarchical proficiency levels of teachers' TPACK-P had been qualitatively
generated in previous studies (Authors, 2014, 2015), the typical performance features
identified at these levels could be examined further and confirmed through statistical
analyses. The identified benchmarks and their features will be informative for teacher
educators, helping them to have a better awareness of what features science teachers with
different TPACK proficiency levels may display and develop in their teacher education
programs. Utilizing ICTs can be especially critical to teaching and learning science, since
representations in different forms facilitate learners' visualization of micro-level or macro-
level phenomena (Ainsworth, 2006; Mayer, 1999; Treagust, Chittleborough, & Mamiala,
2003; Wu, Krajcik, & Soloway, 2001) and their development of conceptual understanding
and inquiry abilities (de Jong & van Joolingen, 1998; Perkins et al., 2006). Science teachers
display their pedagogical uses of ICTs along a spectrum of emergent to advanced levels.
Their learning progress along the road to TPACK-P development, though content specific,
can offer some insights into professional development in other subjects.
2. Method
A questionnaire was developed to rapidly collect science teachers’ responses to different
instructional scenarios. Those responses were then used to validate the proposed proficiency
levels in the TPACK-P framework and to investigate the respondents' TPACK-P
proficiency in their teaching practices. Two analyses were performed on the responses in
order to validate the hierarchy of proficiency levels, determine the thresholds of each
performance level in the metric, and generally investigate these science teachers’ TPACK-P.
Table 1 (continued)

Level 3 (continued)
  Planning & Designing
    6. Be able to use appropriate instructional strategies to facilitate teachers' instruction and student learning of science in technology-supported curricula (e.g., engaging students in collaborative learning). [12B] [13A]
  Enactment: Use technology flexibly to assist students' learning and teachers' instruction management.
    7. Be able to use different technologies to improve the quality of content presentation, support communication, or build up students' learning profiles. [14A] [15B] [16A]
    8. Be able to use different technologies to manage instructional resources or track students' learning progress. [17C]

Level 2 – Simple adoption
  Assessment: Evaluate students by presenting content with ICTs.
    1. Be able to use ICTs to present science content, from which they observe students' learning performance and learning difficulties. [1A] [2A] [3B]
    2. Be able to evaluate students' learning through online assessments, digital representations, or ICT tools. [4B] [5A] [6B]
  Planning & Designing: Design technology-supported instruction from a teacher-centered perspective or with a focus on developing students' content comprehension.
    3. Be able to use representations or ICT tools for teachers (themselves) and students to learn abstract concepts. [7B] [8A]
    4. Consider technology use in instruction according to external factors or students' learning motivation. [9B] [10C]
    5. Be able to present science content with digital representations that are available and good for enhancing students' learning motivation. [11A]
    6. Be able to teach science with technology using a couple of instructional strategies for the purposes of enhancing students' motivation and conceptual understanding. [12A] [13B]
  Enactment: Use technology to make teaching more interesting and better supported.
    7. Be able to implement technology in class to impress students in science learning and to make teachers' instruction easier. [14B] [15A] [16B]
    8. Be able to use word processors or online platforms to manage instructional resources. [17B]

Level 1 – Lack of use
  Assessment: Think technology makes no specific contribution to student evaluation.
    1. Think technologies are not good tools for identifying students' learning styles or learning difficulties. [1D] [2D] [3D]
    2. Think technology-supported assessments are no different from conventional assessments, or have concerns about implementing ICTs to assist their assessments. [4D] [5D] [6D]
  Planning & Designing: Think technology makes no specific contribution to curriculum design over conventional teaching.
    3. View learning science content through technology as no better than learning from professional books or magazines. [7D] [8D]
    4. Consider teaching with technology to be merely an alternative to conventional instruction. [9D] [10D]
    5. Consider technology to be useful only on limited instructional occasions. [11D]
    6. View teaching with technology as good enough for instructional purposes, in need of no other teaching strategies for support. [12D] [13D]
  Enactment: Think technology makes no contribution to teaching practices.
    7. Believe teaching with technology brings contributions to student learning similar to those of conventional instruction. [14D] [15D] [16D]
    8. View current technology as not accommodating teachers' needs in instruction management. [17D]

Level 0
  In the current study, those who performed below Level 1 were categorized as Level 0.

Note: The information in brackets refers to the item and option numbers in the questionnaire.
item were designed to match the indicators of the four proficiency levels (see Table 1). Figure
1 shows a sample item (see the complete questionnaire in the supplementary material).
highest-frequency responses given by the science teachers (Authors, 2014, 2015). Four
in-service science teachers who had collaborated with the authors over an extended period on designing
and implementing technology-supported curricula at the middle and high school levels were
invited to be questionnaire reviewers. Their job was to review the preliminary questionnaire
and comment on how to make it more comprehensible to the population of science teachers,
as well as descriptive of teachers’ knowledge and experience in teaching with technology.
Necessary modifications to the items and options were made before the questionnaire was
used.
of TPACK-P. IRT allows test-takers’ ability and the difficulty of certain performances at
different proficiency levels for all task items to be located along the same scale. As a result,
researchers are able to compare the test-takers' abilities and the difficulty of the proficiency
levels for all task items (Bond & Fox, 2015; Wright & Stone, 1999). Based on the developed
metrics and proficiency levels, the in-service and pre-service science teachers’ knowledge
about and application of TPACK-P were also examined. The person and item parameters in
the current study were estimated by using an open-source software package called Test
Analysis Modules (TAM; Kiefer, Robitzsch, & Wu, 2015); the Wright maps were generated
with WrightMap (Irribarra & Freund, 2014). Both packages were run in R 3.1.3 (R
Development Core Team, 2014).
2.3.1. Analysis 1
Before engaging in standard setting for the science teachers’ TPACK-P, the response data
from the 47 in-service teachers to the 17 items in the first section (i.e., perceived importance
of the criteria) were used to cross-validate the proficiency levels for the different indicators in the
TPACK-P assessment framework developed through protocol analyses of previous
interviews. The four options for each question were treated as different pseudo-items because
multiple-response questions can be seen as a set of agree/disagree items. The partial credit
model (PCM; Masters, 1982) in IRT was applied to locate the thresholds for the pseudo-items
(see Equation A.1 in Appendix A).
The option listed as the most important criterion of the corresponding pseudo-item was
scored as 2, the option listed as an important criterion was scored as 1, and the other options
were scored as 0. A higher threshold (i.e., difficulty) of score 2 for a pseudo-item implied a
smaller likelihood that these teachers would choose that option as the most important. Thus,
we calculated the Spearman’s rank correlation coefficient between the proficiency ranks that
were estimated from the 47 responses and the ranks that were identified from the previous
interview study. The correlation could be viewed as a validity indicator; a high
correlation would suggest that the proposed proficiency levels (identified from the interview
results and embedded as typical responses in the item options) were cross-validated by the
in-service teachers' selections among the options.
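As an illustration of this analysis, the following R sketch shows how the pseudo-item scoring, the PCM thresholds, and the rank correlation could be obtained with the packages cited above. It is a minimal sketch rather than the authors' original script; the objects resp47 (the 47 × 68 matrix of pseudo-item scores) and interview_rank (the ranks assigned to the 68 options in the earlier interview study) are hypothetical.

```r
# Minimal sketch of Analysis 1 (not the authors' original script).
# resp47: 47 x 68 matrix of pseudo-item scores (17 questions x 4 options),
#   coded 2 = listed as most important, 1 = listed as important, 0 = otherwise.
# interview_rank: proficiency ranks of the 68 options from the interview study.
library(TAM)
library(WrightMap)

mod1 <- tam.mml(resp = resp47)   # partial credit model for the polytomous pseudo-items

thr  <- tam.threshold(mod1)      # thresholds per pseudo-item and score category
thr2 <- thr[, 2]                 # threshold of score 2 (chosen as most important)

# Within each question, a lower score-2 threshold means the option is more readily
# endorsed as most important; rank the options accordingly and correlate with the
# ranks identified from the interviews. Options never endorsed (e.g., option D for
# most questions) have no estimated threshold and would be assigned the lowest rank
# manually, as described in the Results.
question       <- rep(1:17, each = 4)
estimated_rank <- ave(-thr2, question, FUN = rank)
cor(estimated_rank, interview_rank, method = "spearman")

# Wright map of person abilities against item thresholds (cf. Figure 2).
wle <- tam.wle(mod1)
wrightMap(wle$theta, thr)
```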
2.3.2. Analysis 2
Knowledge differences between the pre-service and in-service teachers were explored by
applying a multidimensional PCM to the response data collected from the 99 participants.
The first dimension referred to their knowledge about TPACK-P (i.e., perceived importance
of each criterion) and the second dimension to their application of TPACK-P (i.e., the
application of the criterion in the teaching practice). All responses on both parts of the 17
questions were scored according to the corresponding proficiency level (i.e., 1, 2, 3, 4). A
blank response was scored as 0, referring to proficiency Level 0 (i.e., the respondent had no
idea how to use technology in a science class). Therefore, the scores for the 17 questions in
both the knowledge and application dimensions were in the range of 0 to 4. The highest level
among the chosen options was designated as the score received for the specific question.
Therefore, each teacher received 17 scores that could be used to estimate their TPACK-P
from the dimension of knowledge and another 17 scores for the dimension of application.
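To make the scoring rule concrete, the short R sketch below (a hypothetical helper, not the authors' code) converts one respondent's chosen options for a question into a single level score, taking the highest proficiency level among the chosen options and assigning 0 to a blank response.

```r
# chosen: character vector of options the respondent selected for a question
#   (e.g., c("A", "C")); empty if the question was left blank.
# option_level: named vector mapping each option of that question to the
#   proficiency level of the behavior it describes (1-4, cf. Table 1).
score_question <- function(chosen, option_level) {
  if (length(chosen) == 0) return(0)   # blank response -> proficiency Level 0
  max(option_level[chosen])            # highest level among the chosen options
}

# Example with a hypothetical option-to-level mapping for one question:
option_level <- c(A = 2, B = 3, C = 4, D = 1)
score_question(c("A", "B"), option_level)    # returns 3
score_question(character(0), option_level)   # returns 0
```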
Subgroup variances estimated from test-takers' raw scores, or from maximum likelihood scores
obtained by the joint maximum likelihood (JML) method, are likely to be overestimated due to
measurement error; as a result, the effect sizes of the differences among subpopulations are
attenuated. In the current study, therefore, marginal maximum likelihood (MML) estimation,
which takes the prior distributions of the subpopulations (i.e., in-service and pre-service
teachers) as additional conditions, was used to obtain unbiased estimates of the subpopulation
variances (Adams & Wu, 2007), so that the correct effect size of the group difference could be
calculated.
By applying MML estimation, the item thresholds and population parameters (i.e., means
and variances for the two groups) could be estimated simultaneously on the same scale in
each of the two dimensions (Bock, 1981). Therefore, the regression coefficients of the predictor
(i.e., the group variable) and their standard errors for the two latent abilities (knowledge
about and application of TPACK-P) in Equation 1 could be estimated directly, before estimating
individual scores (Adams & Wu, 2007; Bock, 1981; de Gruijter & van der Kamp, 2008).
$$\begin{pmatrix} \theta_K \\ \theta_A \end{pmatrix} = \begin{pmatrix} R_K \\ R_A \end{pmatrix} G + \begin{pmatrix} C_K \\ C_A \end{pmatrix} + \begin{pmatrix} E_K \\ E_A \end{pmatrix} \qquad (1)$$
In Equation 1, $\theta_K$ and $\theta_A$ refer to the estimated latent abilities of knowledge about and
application of TPACK-P, respectively; $G$ is the dummy variable, equal to 1 for an experienced
in-service teacher and 0 for an inexperienced pre-service teacher; $R_K$ and $R_A$ are the regression
coefficients of the predictor $G$; $C_K$ and $C_A$ are constants referring to pre-service teachers'
average abilities in knowledge about and application of TPACK-P; and $E_K$ and $E_A$ are error terms
distributed as $N(0, \sigma_K^2)$ and $N(0, \sigma_A^2)$.
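A minimal sketch of how such a two-dimensional PCM with a latent regression on group membership could be fitted with the TAM package is shown below. The objects resp99 (the 99 × 34 matrix of level scores) and group (the dummy variable G) are hypothetical, and the call is an illustration under these assumptions rather than the authors' original script.

```r
# Two-dimensional partial credit model with a latent regression (MML estimation).
library(TAM)

# resp99: 99 x 34 matrix of scores 0-4; columns 1-17 = knowledge items,
#   columns 18-34 = application items.
# group: dummy predictor G, 1 = in-service teacher, 0 = pre-service teacher.

Q <- matrix(0, nrow = 34, ncol = 2)   # item-by-dimension loading matrix
Q[1:17, 1]  <- 1                      # dimension 1: knowledge about TPACK-P
Q[18:34, 2] <- 1                      # dimension 2: application of TPACK-P

mod2 <- tam.mml(resp = resp99,
                Q   = Q,              # multidimensional structure
                Y   = group)          # latent regression predictor, as in Equation 1

summary(mod2)  # item thresholds, regression coefficients (R_K, R_A), and variances
```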
3. Results

3.1. Analysis 1
Analysis 1 established the rank of thresholds of score 2 for the options in all 17 questions
via the TAM and WrightMap packages. Figure 2 provides questions #1 and #8 as examples to
illustrate how the proficiency level for each option was identified. For question #1, option C
had the lowest threshold, implying that this option was likely to be the most important
consideration in the context provided in the item scenario. For most of the 17 questions
(including question #1), option D was not selected as an important consideration by any
respondent, so the threshold of option D for these questions was automatically assigned the
least important status among the four options. Therefore, option C of question #1 should be
the highest level (i.e., Level 4), option B could be identified as Level 3, option A as Level 2,
and option D as the lowest level (i.e., Level 1). Similarly, for question #8, we identified
option C as Level 4, option B as Level 3, option A as Level 2, and option D as Level 1. Based
on the 47 experienced teachers’ responses, the proficiency levels of the four options for the
17 questions in the TPACK-P Questionnaire were identified. The Spearman’s rank correlation
between the identified levels and the levels previously specified from the interview data was
0.87. In other words, the framework of the four proficiency levels identified from the
interview data was quantitatively supported by the current study.
Figure 2. Item thresholds in Wright Map and corresponding locations of options for
questions #1 and #8.
3.2. Analysis 2
Analysis 2 provided estimates of the 47 experienced in-service and 52 pre-service
teachers’ knowledge about and application of TPACK-P by using the multidimensional PCM.
In order to estimate the group differences on these two dimensions, the group (i.e., pre-
service, in-service) was used as a regression variable to predict the teachers’ latent abilities of
the TPACK-P from both dimensions. The results indicated that the person separation
reliabilities of the survey for knowledge about and application of TPACK-P were 0.85 and
0.90, respectively. These person separation reliabilities are good, but not as high as expected
for an instrument comprising 17 four-level questions (i.e., the reliability should approach that
of an instrument with 68 dichotomous items). The lower-than-expected reliability could be
explained by the small variance in respondents' proficiency levels. In addition, the item
separation reliabilities suggested by contemporary psychometrics (e.g., Bond & Fox, 2015;
Krishnan & Idris, 2014; Linacre, 2012; Wright & Stone, 1999) were also calculated to examine
the stability of the item parameter estimates and the appropriateness of the sample size. For the
dimensions of knowledge about and application of TPACK-P, the item separation reliabilities
were 0.95 and 0.96, demonstrating good replicability of item locations along the two
dimensions if the same items were given to another 99 science teachers with a background or
experience similar to that of the sample in the current study.
Various item- and model-level fit indices were utilized to examine the validity of using the
multidimensional PCM. For all 34 questions (17 each for knowledge and application), the
information-weighted MNSQ values ranged from 0.79 to 1.24, and the t values fell between
-1.96 and 1.96 (Appendix B). The results suggested that the equal discrimination
assumption was sustained (Bond & Fox, 2015; Linacre, 2012; Wright & Stone, 1999). In
addition, the absolute values of the elements in the Q3 matrix (Yen, 1984) ranged from 0.01 to
0.26, supporting the assumption of local independence (Kim, de Ayala, Ferdous, &
Nering, 2011; Yen, 1993). Finally, the standardized root-mean-square residual (SRMR) was
0.08, suggesting adequate global model fit (Hu & Bentler, 1999; Maydeu-Olivares, 2013).
Therefore, the proposed multidimensional PCM was used to interpret the subjects' responses
on the developed instrument.
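The item- and model-level fit statistics reported here can be obtained from the fitted TAM model along the following lines; this is a sketch that reuses the hypothetical model object mod2 from the previous step, and the exact names of the output components may vary across TAM versions.

```r
# Item fit: information-weighted (infit) MNSQ and t values (cf. Appendix B).
fit <- tam.fit(mod2)
fit$itemfit

# Residual-based fit: Q3 statistics for local independence and global fit measures.
mfit <- tam.modelfit(mod2)
summary(mfit)
```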
in the logit scale at which a person had a 75% chance to get a score point higher than or just
equal to the corresponding level of response was calculated. In each of the two dimensions,
the variation of thresholds of the same proficiency level across the 17 items reflected the fact
that the task difficulty interacts with the context of educational practice described in the
scenarios. In education, 75% is usually a reasonable probability to certify that a person can
reach a specific proficiency level for tasks with an average difficulty. Thus, by averaging the
thresholds across the items (see Equation A.2 in Appendix A), the thresholds of proficiency
levels were located for the dimensions of knowledge about TPACK-P as -0.45, -0.08, 0.80,
and 2.60 (logit) and of application of TPACK-P as -1.33, 0.62, 1.35, and 2.36, respectively
(Figure 3).
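The following R sketch illustrates, under hypothetical step parameters, how such level thresholds can be located: for each item, Equation A.2 in Appendix A is solved numerically for the ability at which the probability of responding at or above a given level is 0.75, and the resulting item thresholds are then averaged across items. Changing the prob argument would implement the harsher or more lenient criteria discussed later in the Discussion; none of the numbers below come from the study.

```r
# PCM category probabilities for one item with step parameters delta (Eq. A.1).
pcm_probs <- function(theta, delta) {
  num <- exp(cumsum(c(0, theta - delta)))   # unnormalized terms for scores 0..m_i
  num / sum(num)
}

# Ability at which P(X >= level) = prob for one item (Eq. A.2), found numerically.
level_threshold <- function(delta, level, prob = 0.75) {
  f <- function(theta) sum(pcm_probs(theta, delta)[-seq_len(level)]) - prob
  uniroot(f, interval = c(-10, 10))$root
}

# Hypothetical step parameters for three items scored 0-4 (four steps each):
deltas <- list(c(-1.5, -0.5, 0.6, 2.1),
               c(-1.0,  0.0, 0.9, 2.6),
               c(-2.0, -0.3, 0.7, 2.4))

# Item-level thresholds of, e.g., Level 3, averaged across the items:
thr_L3 <- sapply(deltas, level_threshold, level = 3)
mean(thr_L3)
```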
Figure 3. Wright Maps and the thresholds of proficiency levels of the metrics in (a) knowledge about and (b) application of TPACK-P.
and -0.10/0.18) for the two dimensions were between -1.96 and 1.96, indicating that there
was no significant difference between the two groups in both the dimensions of knowledge
about and application of TPACK-P.
The TAM package also provided the covariance matrix of the two abilities in knowledge
about and application of TPACK-P (Table 5). The correlation between knowledge about and
application of TPACK-P was 0.74, indicating good discriminant validity for differentiating
knowledge about and application of TPACK-P as two different latent abilities. In addition, the
variances of the two latent abilities were small (i.e., 0.35 and 0.64) in comparison with
general ability, such as reading or mathematics ability, in classroom contexts. In addition, the
distributions of the two groups at different proficiency levels were almost the same (see Table
6).
The average abilities and distributions at different proficiency levels were similar for the
two groups. In addition, most of the participants’ knowledge about TPACK-P was located at
Level 3 and application of TPACK-P was located at Level 1. This evidence implied that most
of the participants demonstrated their TPACK-P at proficiency Level 3 and Level 1 for the
dimensions of knowledge and application, respectively. Lacking obvious proficiency
differences between preservice and inservice teachers suggested even the inservice science
teachers in this study who reported having experiences teaching with technology did not
develop better TPACK-P than pre-service teachers did.
4. Discussion
It is common to see teachers' TPACK-P evaluated through composite scores reflecting how
well they understand and use technology, or through proficiency ranks determined by their
achievement of certain levels. Higher scores or higher ranks entail more advanced
TPACK-P. Teachers at the same developmental levels are assumed to share knowledge or
teaching performances at the same complexity levels. Qualitative data can be informative for
revealing typical features and identifying nuances between levels, but the thresholds for the
categorical ranks or ordinal scores are rarely examined statistically. In this study,
we validated the five proficiency ranks of teachers’ TPACK by examining the correlations
between the ranks located in Wright Maps constructed from the data provided by teachers’
questionnaire responses and the ranks identified in the interviews. The thresholds of these
ranks were also identified as the average threshold of each proficiency level across the 17
items on a logit scale. Science teachers in this study were found to possess
higher proficiency levels from the perspective of knowledge than application, though there
were no significant differences found between pre-service and in-service teachers.
Difficulty variations in these 17 items were expected because: (a) the different instructional
scenarios demanded different abilities, and (b) teacher evaluations had to be conducted as
specifically as possible in terms of the scope of the objective evaluated (e.g., oral language
use and direction offered for “communicating with students”; Danielson, 1996, p. 29).
Therefore, in the current study, we located the threshold of a proficiency level by averaging
the thresholds across all the task items; the threshold of each item was defined as the ability
at which a participant has a 75% chance of performing at the same or a higher level on the item.
However, most researchers in educational testing acknowledge that the techniques used in
standard setting are judgmental, due either to the arbitrary requirement of the likelihood of
success or the arbitrary selection of observed indicators (e.g., Block, 1978; Shepard, 1980).
One could apply a harsher criterion to set the cut scores for proficiency levels (for example,
by estimating an 85% chance of a participant performing at the same or a higher level for
80% of the task items). A harsher or more lenient criterion can be used for setting proficiency
levels, depending on the purposes and the accompanying risks. Researchers can easily apply
the procedure demonstrated in the current study and change the required probability of
success to meet their purposes and manage the associated risks.
The proficiency levels on an ability scale can be used to map the trajectory of
development. However, unlike the proficiency levels validated in the current study, which
emphasize a unidimensional, hierarchical structure to model and explain the variance in
population performance, studies of learning progression usually adopt a longitudinal design
and pay more attention to categorizing different patterns of developmental trajectories than to
finding a general one. In spite of the differences in nature
between the concept of learning proficiency and learning progression, the four proficiency
levels validated on both the scales of knowledge about and application of TPACK-P for
science teachers in the current study are highly parallel to Niess et al.'s (2009) five stages
describing teachers' learning progression related to a piece of technology designed
for instructional use. First, similar features were found for teachers ranked at Levels 2 through
4. Teachers at Level 4 (reflective application) showed their reflective thinking or
innovative curriculum construction abilities through their potential ICT uses. Teachers at
Level 3 (infusive application) displayed their ability to select and use appropriate ICTs to
support instruction, whereas teachers at Level 2 (simple adoption) tended to use ICTs to
facilitate students' learning of content knowledge. The degree of student-centeredness and
appropriateness of the ICT engagement increased as the level increased. Second, features of
teachers at Level 1 (lack of use) or below were quite different from those observed in
teachers' learning progression (i.e., recognizing, accepting). Sampling differences could be
the main reason for this characteristic. It makes sense that teachers' TPACK development
usually begins with recognizing technology in instruction and forming an accepting attitude
and belief in its value, especially when teachers are motivated learners of technology-
supported instruction or properly guided in teaching with certain technology (i.e., Niess et al.,
2009). However, due to teacher variations, we should not ignore the fact that there are still
some teachers who do not opt to attend or are too busy to make use of professional
development programs related to technology. These teachers may have little knowledge of the
importance of technology implementation or limited understanding of the strengths of ICTs,
and may still favor conventional instruction (Level 0 to Level 1). How to attract and
assist these two teacher groups along their TPACK and TPACK-P learning tracks is no less
important than assisting teachers in refining their teacher knowledge.
Examination of the thresholds of the four proficiency levels, which were based on the
average difficulty across the 17 items, indicated that it was especially hard for these science
teachers to reach Level 4 in knowledge of TPACK-P or Level 2 in application. The greater
difficulty faced by teachers hoping to master these two levels can be explained by the
greater proportion of teachers in Levels 2 and 3 in knowledge and Level 1 in application.
That is, the science teachers evaluated in this study might have developed knowledge of
adopting and infusing technology into their instruction, but it was rare that they applied such
knowledge in their actual teaching practices (TPACK-P). Considering that TPACK-P is
developed based on experiences of teaching with technology and instructional reflection,
neither knowledge about nor application of TPACK-P can be effectively enhanced when
technologies are not implemented or experimented with in classrooms. The instructional environment can
be supportive of devices but not supportive in terms of curriculum or teacher support systems
(Afshari et al., 2009; Ertmer, 1999; Mumtaz, 2000). For example, some science teachers
attributed their low technology implementation to tight curriculum schedules, which
minimized the time available to try out new learning tools in class or to digitize the curriculum.
In fact, teachers who had higher levels of TPACK-P and used technology adeptly commented
that the use of appropriate representations enhanced the effectiveness of their content delivery
and facilitated student learning (Authors et al., 2015). Increased knowledge
and uses of such devices can boost teachers’ confidence and self-efficacy, and then lead to
more instructional implementation in classrooms (Koh & Frick, 2009; Mueller et al., 2008;
Thomas & O'Bannon, 2015). Therefore, teachers' TPACK-P can be further developed through
continuous and reflective technology implementation in which they witness its instructional
effectiveness.
No significant differences with regard to mastery of knowledge and application of
TPACK-P were found between the preservice and experienced inservice science teachers,
which was surprising. Experienced teachers displayed a greater repertoire of representations
and flexible teaching strategies in their PCK (Clermont, Borko, & Krajcik, 1994). Inservice
teachers who had experience in teaching with technology were assumed to develop stronger
TPACK-P than preservice teachers, since TPACK-P develops through actual technology
implementation in instruction. Results of the current study showed that preservice teachers
may not know less about teaching with technology than inservice teachers, even though the
options were created based on the content of the in-service teachers’ interviews. Previous
survey results indicated that age and teaching experience may not be reliable predictors of
teachers’ technology uses in classrooms, or even their level of TPACK. Based on the results
of large-scale surveys, Russell, Bebell, O’Dwyer, & O’Connor (2003) found that novice
teachers (<5 years) possessed higher confidence in technology, but such confidence did not
translate to actual uses of technology or student-centered instruction. Veteran teachers (>15
years) seemed to possess the lowest confidence in technology, but both veteran teachers and
experienced teachers (6-15 years) showed significantly more teacher-directed student use of
technology during instruction. Another survey found that preservice teachers (digital natives)
recognized and used more features of smart phones, but they were not enthusiastic about the
use of smart phones in instruction (O’Bannon & Thomas, 2014; Thomas & O’Bannon, 2015).
These researchers suspected that it was a lack of models demonstrating the possible
implementations of smart phones in classrooms that caused this discrepancy. Teachers’
TPACK and TPACK-P are not simply composed of PCK, technological knowledge, or
confidence taken individually. Most teachers are now exposed to technology in daily life and are aware of the
possible benefits technology can offer, but there does not seem to be an automatic connection
between their PCK and technology knowledge or use. Teachers still need to be guided and
peer-supported in their development of TPACK and TPACK-P.
There are limitations to the current study. First, the statistical results should be interpreted
with caution because the participants were convenience samples from two target populations.
Second, designing items in a multiple-option format can be considered a first attempt among
the self-report questionnaires used for TPACK-P evaluation. IRT results showed that the options
generated from the interview data were sensitive to teachers’ TPACK-P proficiency;
therefore, these options can be viewed as typical features of the teachers at specific ranks.
However, it should be noted that self-reported questionnaires may not reflect actual
instructional performance; this issue was partially addressed by requiring questionnaire
respondents to report their knowledge about and application of TPACK-P. The questionnaire
constructed in this study offers a quick examination of science teachers’ TPACK in practical
teaching contexts.
5. Conclusions
Teacher evaluation is much more difficult than student evaluation because teachers’
instructional knowledge is dynamic, contextualized, and personal. In previous studies, the
developmental trajectory of TPACK and TPACK-P has been qualitatively validated based on
teachers’ actual performance. This study further examined and validated the hierarchy of
proficiency levels and corresponding typical features of teachers’ TPACK-P through an IRT
standard setting. Teachers' TPACK development is content-bound; however, teachers'
learning progression can be generic and not limited to a certain type of technology. These
qualitatively and quantitatively validated features can be used as milestones along the
TPACK-P roadmap, providing reference points for quick, one-time evaluations and longitudinal
observations of teachers’ TPACK development. Researchers and teachers can quantify how
mature a teacher’s TPACK-P is and what is needed for it to further evolve. Finally, knowing
is easier than doing. Science teachers may have significant knowledge of what ICT tools are
available and how they might facilitate instruction, but lacking actual experience (e.g.,
designing, enacting, and reflecting on technology-supported instruction) means it is unlikely
that teachers will further refine their TPACK-P. The availability of technology in the
classroom and teachers’ professional development in TPACK might not be the only issues;
how teachers feel about their environment and the support available for their teaching with
technology should be considered and must be addressed if the development and elaboration
of teachers’ TPACK-P is to be pursued.
References
Authors. (2014).
Authors. (2015).
Abbitt, J. T. (2011). Measuring technological pedagogical content knowledge in pre-service
teacher education: A review of current methods and instruments. Journal of Research on
Technology in Education, 43(4), 281–300.
Adams, R., & Wu, M. (2007). The mixed-coefficients multinomial logit model: A generalized
form of the Rasch model. In C. Carstensen (Ed.), Multivariate and mixture distribution
Rasch models (pp. 57-75). New York, NY: Springer.
Afshari, M., Bakar, K. A., Luan, W. S., Samah, B. A., & Fooi, F. S. (2009). Factors affecting
teachers’ use of information and communication technology. Online Submission, 2(1), 77-
104.
Ainsworth, S. (2006). DeFT: A conceptual framework for considering learning with multiple
representations. Learning and Instruction, 16(3), 183–198.
Angeli, C., & Valanides, N. (2009). Epistemological and methodological issues for the
conceptualization, development, and assessment of ICT-TPCK: Advances in technological
pedagogical content knowledge (TPCK). Computers & Education, 52(1), 154–168.
Archambault, L., & Crippen, K. (2009). Examining TPACK among K-12 online distance
educators in the United States. Contemporary Issues in Technology and Teacher
Education, 9, 71–88. Retrieved from
http://www.citejournal.org/vol9/iss1/general/article2.cfm
Ay, Y., Karadağ, E., & Acat, M. B. (2015). The Technological Pedagogical Content
Knowledge-practical (TPACK-Practical) model: Examination of its validity in the Turkish
culture via structural equation modeling. Computers & Education, 88, 97-108.
Block, J. H. (1978). Standards and criteria: A response. Journal of Educational Measurement,
15, 291-295.
Bond, T. G. & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in
the human sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Chen, W., Hendricks, K., & Archibald, K. (2011). Assessing pre-service teachers' quality
teaching practices. Educational Research and Evaluation,17(1), 13-32.
Clermont, C. P., Borko, H., & Krajcik, J. S. (1994). Comparative study of the pedagogical
content knowledge of experienced and novice chemical demonstrators. Journal of
Research in Science Teaching, 31(4), 419-441.
Çetin, S., & Gelbal, S. (2013). A comparison of bookmark and Angoff standard setting
methods. Kuram Ve Uygulamada Egitim Bilimleri, 13(4), 2169-2175.
Cochran-Smith, M., & Lytle, S. L. (1999). Relationships of knowledge and practice: Teacher
learning in communities. Review of research in education, 249-305.
Cox, S., & Graham, C. R. (2009). Using an elaborated model of the TPACK framework to
analyze and depict teacher knowledge. TechTrends, 53(5), 60-71.
de Gruijter, D. N. M., & van der Kamp, L. J. T. (2008). Statistical test theory for the
behavioral sciences. Boca Raton, FL: Chapman & Hall/CRC.
de Jong, T., & van Joolingen, W. R. (1998). Scientific discovery learning with computer
simulations of conceptual domains. Review of Educational Research, 68(2), 179–201.
Doering, A., Veletsianos, G., Scharber, C., & Miller, C. (2009). Using the technological,
pedagogical, and content knowledge framework to design online learning environments
and professional development. Journal of Educational Computing Research, 41(3), 319-
346.
Dwyer, D. C., Ringstaff, C., & Sandholtz, J. H. (1991). Changes in teachers’ beliefs and
practices in technology-rich classrooms. Educational Leadership, 48(8), 45–52.
Ertmer, P. A. (1999). Addressing first- and second-order barriers to change: Strategies for
technology integration. Educational Technology Research and Development, 47(4), 47-61.
Gess-Newsome, J., & Lederman, N. G. (1993). Pre-service biology teachers’ knowledge
structures as a function of professional teacher education: A year-long assessment. Science
Education, 77(1), 25–45.
Gray, L., Thomas, N., & Lewis, L. (2010). Teachers’ use of educational technology in U.S.
public schools: 2009 (NCES 2010–040). Washington, DC: National Center for Education
Statistics, Institute of Education Sciences, US Department of Education.
Haertel, E. H., & Lorié, W. A. (2004). Validating standards-based test score interpretations.
Measurement: Interdisciplinary Research and Perspectives, 2, 61-103.
Harris, J. B., Grandgenett, N., & Hofer, M. (2011). Testing a TPACK-based technology
integration assessment rubric. In D. Gibson & B. Dodge (Eds.), Proceedings of Society for
Information Technology & Teacher Education International Conference 2010 (pp. 3833-
3840). Chesapeake, VA: Association for the Advancement of Computing in Education
(AACE). Retrieved from http://www.editlib.org/p/33978.
Irribarra, D. T., & Freund, R. (2014). WrightMap: IRT item-person map with ConQuest
integration. Retrieved from http://github.com/david-ti/wrightmap
Jimoyiannis, A. (2010). Designing and implementing an integrated technological pedagogical
science knowledge framework for science teachers’ professional development. Computers
& Education, 55(3), 1259–1269.
Kagan, D. M. (1990). Ways of evaluating teacher cognition: Inferences concerning the
Goldilocks Principle. Review of Educational Research, 60(3), 419-469.
Kiefer, T., Robitzsch, A., & Wu, M. (2015). TAM: Test Analysis Modules (R package version
1.6-0). Retrieved from http://CRAN.R-project.org/package=TAM
Koehler, M. J., & Mishra, P. (2008). Introducing TPCK. In American Association of Colleges
for Teacher Education Committee on Innovation and Technology (Ed.), The handbook of
technological pedagogical content knowledge (TPCK) for educators (pp. 3-29). New York,
NY: Routledge.
Koehler, M. J., Shin, T. S., & Mishra, P. (2012). How do we measure TPACK? Let me count
the ways. In R. N. Ronau, C. R. Rakes, & M. L. Niess (eds.), Educational technology,
teacher knowledge, and classroom impact: A research handbook on frameworks and
approaches (pp. 16-31). Hershey, PA: Information Science Reference.
Koh, J. H. L., Chai, C. S., & Tay, L. Y. (2014). TPACK-in-Action: Unpacking the contextual
influences of teachers’ construction of technological pedagogical content knowledge
(TPACK). Computers & Education, 78, 20-29.
Koh, J. H. L., & Divaharan, S. (2011). Developing pre-service teachers’ technology
integration expertise through the TPACK-developing instructional model. Journal of
Educational Computing Research, 44(1), 35-58.
Koh, J. H., & Frick, T. W. (2009). Instructor and student classroom interactions during
technology skills instruction for facilitating preservice teachers' computer self-
efficacy. Journal of Educational Computing Research, 40(2), 211-228.
Krajcik, J. S., Blumenfeld, P. C., Marx, R. W., & Soloway, E. (1994). A collaborative model
for helping teachers learn project-based instruction. Elementary School Journal, 94(5),
483-497.
Lee, E., Brown, M. N., Luft, J. A., & Roehrig, G. H. (2007). Assessing beginning secondary
science teachers' PCK: Pilot year results. School Science and Mathematics, 107(2), 52-60.
Lee, E., & Luft, J. A. (2008). Experienced secondary science teachers’ representation of
pedagogical content knowledge. International Journal of Science Education, 30(10),
1343-1363.
Lee M.H. & Tsai C.C. (2010) Exploring teachers' perceived self efficacy and technological
pedagogical content knowledge with respect to educational use of the World Wide
Web. Instructional Science 38, 1–21.
Lord, F. M. (1960). Large-sample covariance analysis when the control variable is fallible.
Journal of the American Statistical Association, 55(290), 307-321.
Magnusson, S., Krajcik, J. S., & Borko, H. (1999). Nature, sources, and development of
pedagogical content knowledge for science teaching. In J. Gess-Newsome & N. G.
Lederman (Eds.), Examining pedagogical content knowledge: The construct and its
implications for science education (pp. 95–132). Dordrecht, The Netherlands: Kluwer
Academic.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Mayer, R. E. (1999). The promise of educational psychology: Learning in the content areas.
Upper Saddle River, NJ: Prentice Hall.
Mishra, P., & Henriksen, D. (2015). The end of the beginning: An epilogue. In Y.-S. Hsu
(Ed.). Development of science teachers’ TPACK: East Asia practices (pp. 133-142).
Singapore: Springer.
Mishra, P., & Koehler, M. J. (2006). Technological pedagogical content knowledge: A
framework for teacher knowledge. Teachers College Record, 108(6), 1017–1054.
Moersch, C. (1995). Levels of technology implementation (LoTi): A framework for
measuring classroom technology use. Learning and Leading with Technology, 23(3), 40–
42.
Mueller, J., Wood, E., Willoughby, T., Ross, C., & Specht, J. (2008). Identifying
discriminating variables between teachers who fully integrate computers and teachers with
limited integration. Computers & Education, 51(4), 1523-1537.
Mumtaz, S. (2000). Factors affecting teachers’ use of information and communications
technology: A review of the literature. Journal of Information Technology for Teacher
Education, 9(3), 319-342.
Niess, M. L. (2005). Preparing teachers to teach science and mathematics with technology:
Developing a technology pedagogical content knowledge. Teaching and Teacher
Education, 21(5), 509–523.
Niess, M. L. (2011). Investigating TPACK: Knowledge growth in teaching with technology.
Journal of Educational Computing Research, 44(3), 299-317.
Niess, M. L. (2007, June). Mathematics teachers developing technological pedagogical
content knowledge (TPCK). Paper presented at IMICT2007, Boston, MA.
Niess, M. L., Ronau, R. N., Shafer, K. G., Driskell, S. O., Harper, S. R., Johnston, C. …
Kersaint, G. (2009). Mathematics teacher TPACK standards and development model.
Contemporary Issues in Technology and Teacher Education, 9(1), 4–24.
O'Bannon, B. W., & Thomas, K. M. (2015). Mobile phones in the classroom: Preservice
teachers answer the call. Computers & Education, 85, 110-122.
Perkins, K., et al. (2006). PhET: Interactive simulations for teaching and learning physics.
The Physics Teacher, 44(1), 18-23.
Project Tomorrow. (2008). 21st century learners deserve a 21st century education: Selected
national findings of the speak up 2007 survey. Retrieved from
http://www.tomorrow.org/docs/national%20findings%20speak%20up%202007.pdf
R Development Core Team (2014). R: A language and environment for statistical computing.
Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-
project.org/
Russell, A. L. (1995). Stages in learning new technology: Naïve adult email users. Computers
& Education, 25(4), 173–178.
Russell, M., Bebell, D., O'Dwyer, L., & O'Connor, K. (2003). Examining teacher technology
use: Implications for preservice and inservice teacher preparation. Journal of Teacher
Education, 54(4), 297-310.
Sandholtz, J. H., Ringstaff, C., & Dwyer, D. C. (1997). Teaching with technology: Creating
student-centered classrooms. New York, NY: Teachers College Press.
Schmidt, D. A., Baran, E., Thompson, A. D., Mishra, P., Koehler, M. J., & Shin, T. S. (2009).
Technological pedagogical content knowledge (TPACK): The development and validation
of an assessment instrument for preservice teachers. Journal of Research on Technology in
Education, 42(2), 123-149.
Shepard, L. (1980). Standard setting issues and methods. Applied Psychological
Measurement, 4(4), 447-467.
Shepard, L. (2008). Commentary on the national mathematics advisory panel
recommendations on assessment. Educational Researcher, 37(9), 602–609.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational
Researcher, 15(2), 4–14.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard
Educational Review, 57(1), 1–22.
Thomas, K., & O'Bannon, B. (2015, March). Looking Across the New Digital Divide: A
Comparison of Inservice and Preservice Teacher Perceptions of Mobile Phone Integration.
In Society for Information Technology & Teacher Education International Conference (Vol.
2015, No. 1, pp. 3460-3467).
Treagust, D., Chittleborough, G., & Mamiala, T. (2003). The role of submicroscopic and
symbolic representations in chemical explanations. International Journal of Science
Education, 25(11), 1353–1368.
van Driel, J. H., Beijaard, D., & Verloop, N. (2001). Professional development and reform in
science education: The role of teachers’ practical knowledge. Journal of Research in
Science Teaching, 38(2), 137–158.
van Driel, J. H., Verloop, N., & de Vos, W. (1998). Developing science teachers’ pedagogical
content knowledge. Journal of Research in Science Teaching, 35(6), 673–695.
Wright, B. D., & Stone, M. H. (1999). Measurement essentials. Wilmington, DE: Wide
Range, Inc
Wu, H.-K., Krajcik, J., & Soloway, E. (2001). Promoting understanding of chemical
representations: Students’ use of a visualization tool in the classroom. Journal of Research
in Science Teaching, 38(7), 821–842.
Appendix A: Partial Credit Model
Under the partial credit model (PCM; Masters, 1982), the probability that person $n$ obtains score $x$ on item $i$ with $m_i$ steps is

$$P(X_i = x \mid \theta_n) = \frac{\exp\left[\sum_{k=0}^{x}(\theta_n - \delta_{ik})\right]}{\sum_{h=0}^{m_i}\exp\left[\sum_{k=0}^{h}(\theta_n - \delta_{ik})\right]}, \qquad (A.1)$$

where we define $\exp\left[\sum_{k=0}^{0}(\theta_n - \delta_{ik})\right] \equiv 1$. In Eq. (A.1), $\theta_n$ refers to the person's latent ability and $\delta_{ik}$ to the item step parameters. In the current study, the threshold ability $\theta_{T_{ix}}$ of the $x$th step for item $i$ is defined as the value of latent ability satisfying

$$\Pr\left(X_i \geq x \mid \theta_{T_{ix}}\right) = \frac{\sum_{h=x}^{m_i}\exp\left[\sum_{k=0}^{h}(\theta_{T_{ix}} - \delta_{ik})\right]}{\sum_{h=0}^{m_i}\exp\left[\sum_{k=0}^{h}(\theta_{T_{ix}} - \delta_{ik})\right]} = 0.75. \qquad (A.2)$$

Therefore, for a person whose latent ability is higher than $\theta_{T_{ix}}$, the probability of obtaining a score equal to or higher than $x$ on item $i$ is larger than 0.75.
Table B.1. Information-weighted fit (infit) indices for average item threshold
Knowledge about TPACK-P Application of TPACK-P
Item no. MNSQ CI T Item no. MNSQ CI T
1 0.92 (0.71, 1.29) -0.5 1 1.07 (0.72, 1.28) 0.5
2 1.15 (0.68, 1.32) 0.9 2 1.05 (0.64, 1.36) 0.3
3 0.98 (0.76, 1.24) -0.1 3 1.04 (0.64, 1.36) 0.2
4 1.02 (0.72, 1.28) 0.2 4 1.15 (0.73, 1.27) 1.1
5 1.02 (0.71, 1.29) 0.2 5 0.98 (0.67, 1.33) -0.1
6 0.99 (0.70, 1.30) 0.0 6 1.14 (0.70, 1.30) 0.9
7 1.03 (0.71, 1.29) 0.3 7 0.99 (0.71, 1.29) 0.0
8 0.9 (0.62, 1.38) -0.5 8 0.91 (0.75, 1.25) -0.7
9 1.16 (0.72, 1.28) 1.1 9 1.19 (0.74, 1.26) 1.4
10 1.18 (0.70, 1.30) 1.1 10 0.84 (0.75, 1.25) -1.3
11 1.14 (0.71, 1.29) 1.0 11 0.84 (0.73, 1.27) -1.2
12 1.03 (0.70, 1.30) 0.2 12 0.97 (0.71, 1.29) -0.2
13 0.95 (0.70, 1.30) -0.3 13 0.93 (0.72, 1.28) -0.5
14 0.99 (0.66, 1.34) 0.0 14 1.06 (0.75, 1.25) 0.5
15 0.95 (0.71, 1.29) -0.3 15 0.95 (0.72, 1.28) -0.3
16 1.17 (0.76, 1.24) 1.3 16 1.23 (0.74, 1.26) 1.8
17 0.94 (0.69, 1.31) -0.4 17 1.13 (0.73, 1.27) 0.9
Table B.2. Information-weighted fit (infit) indices for the thresholds of item steps
Knowledge about TPACK-P                          Application of TPACK-P
Item no.  Item step  MNSQ  CI  t                 Item no.  Item step  MNSQ  CI  t
1 0 1.04 (0.40, 1.60) 0.2 1 0 0.81 (0.50, 1.50) -0.7
1 1 0.93 (0.33, 1.67) -0.1 1 1 1.09 (0.89, 1.11) 1.5
1 2 1.01 (0.77, 1.23) 0.1 1 2 1.02 (0.82, 1.18) 0.2
1 3 1.00 (0.91, 1.09) 0.1 1 3 1.07 (0.58, 1.42) 0.4
1 4 1.00 (0.75, 1.25) 0.0 1 4 1.00 (0.05, 1.95) 0.2
2 0 1.51 (0.00, 2.43) 0.8 2 0 1.03 (0.52, 1.48) 0.2
2 1 1.01 (0.36, 1.64) 0.1 2 1 0.90 (0.87, 1.13) -1.6
2 2 0.97 (0.55, 1.45) -0.1 2 2 0.97 (0.76, 1.24) -0.2
2 3 0.96 (0.86, 1.14) -0.5 2 3 1.05 (0.00, 2.04) 0.3
2 4 0.90 (0.86, 1.14) -1.4 2 4 1.06 (0.07, 1.93) 0.3
3 0 n/a* (n/a, n/a ) n/a 3 0 0.91 (0.18, 1.82) -0.1
3 1 1.03 (0.21, 1.79) 0.2 3 1 1.03 (0.83, 1.17) 0.3
3 2 1.03 (0.76, 1.24) 0.3 3 2 0.99 (0.77, 1.23) -0.1
3 3 1.00 (0.77, 1.23) 0.1 3 3 1.15 (0.24, 1.76) 0.5
3 4 0.95 (0.85, 1.15) -0.7 3 4 0.93 (0.07, 1.93) 0.0
4 0 1.24 (0.40, 1.60) 0.8 4 0 1.02 (0.62, 1.38) 0.2
4 1 0.87 (0.56, 1.44) -0.5 4 1 1.03 (0.89, 1.11) 0.6
4 2 1.00 (0.75, 1.25) 0.0 4 2 1.05 (0.83, 1.17) 0.6
4 3 0.97 (0.91, 1.09) -0.7 4 3 1.09 (0.61, 1.39) 0.5
4 4 1.06 (0.62, 1.38) 0.4 4 4 0.97 (0.00, 2.20) 0.1
5 0 1.94 (0.00, 3.59) 1.0 5 0 0.89 (0.58, 1.42) -0.4
5 1 0.95 (0.16, 1.84) 0.0 5 1 0.94 (0.88, 1.12) -1.0
5 2 0.98 (0.65, 1.35) 0.0 5 2 0.99 (0.72, 1.28) 0.0
5 3 1.02 (0.84, 1.16) 0.3 5 3 0.96 (0.24, 1.76) 0.0
5 4 0.99 (0.86, 1.14) -0.2 5 4 1.04 (0.31, 1.69) 0.2
6 0 1.93 (0.00, 3.58) 1.0 6 0 1.01 (0.49, 1.51) 0.1
6 1 0.87 (0.25, 1.75) -0.3 6 1 1.06 (0.89, 1.11) 1.0
6 2 0.96 (0.77, 1.23) -0.4 6 2 1.08 (0.79, 1.21) 0.7
6 3 1.00 (0.92, 1.08) 0.1 6 3 0.89 (0.53, 1.47) -0.4
6 4 1.02 (0.64, 1.36) 0.2 6 4 1.09 (0.05, 1.95) 0.3
7 0 1.30 (0.00, 2.44) 0.6 7 0 0.79 (0.41, 1.59) -0.7
7 1 0.89 (0.00, 2.01) -0.1 7 1 1.01 (0.87, 1.13) 0.1
7 2 1.02 (0.74, 1.26) 0.2 7 2 1.01 (0.88, 1.12) 0.1
7 3 1.01 (0.83, 1.17) 0.2 7 3 0.99 (0.30, 1.70) 0.1
7 4 1.04 (0.85, 1.15) 0.5 7 4 1.17 (0.55, 1.45) 0.8
8 0 1.09 (0.00, 2.46) 0.4 8 0 0.90 (0.07, 1.93) -0.1
8 1 0.94 (0.00, 2.87) 0.3 8 1 1.00 (0.81, 1.19) 0.1
8 2 0.93 (0.68, 1.32) -0.4 8 2 1.00 (0.63, 1.37) 0.0
8 3 0.98 (0.89, 1.11) -0.3 8 3 0.98 (0.72, 1.28) -0.1
8 4 1.05 (0.69, 1.31) 0.4 8 4 0.94 (0.75, 1.25) -0.5
9 0 1.31 (0.60, 1.40) 1.4 9 0 0.94 (0.70, 1.30) -0.3
9 1 0.96 (0.32, 1.68) 0.0 9 1 0.99 (0.82, 1.18) -0.1
9 2 0.98 (0.75, 1.25) -0.1 9 2 1.00 (0.75, 1.25) 0.1
9 3 1.03 (0.90, 1.10) 0.5 9 3 1.05 (0.69, 1.31) 0.3
9 4 0.97 (0.70, 1.30) -0.1 9 4 1.16 (0.60, 1.40) 0.8
10 0 1.18 (0.40, 1.60) 0.7 10 0 0.93 (0.56, 1.44) -0.3
10 1 0.93 (0.32, 1.68) -0.1 10 1 0.94 (0.83, 1.17) -0.6
10 2 1.00 (0.89, 1.11) 0.0 10 2 1.00 (0.76, 1.24) 0.1
10 3 0.99 (0.89, 1.11) -0.1 10 3 0.85 (0.81, 1.19) -1.6
10 4 1.11 (0.38, 1.62) 0.4 10 4 0.87 (0.01, 1.99) -0.1
11 0 1.06 (0.25, 1.75) 0.3 11 0 0.81 (0.41, 1.59) -0.6
11 1 0.99 (0.00, 2.02) 0.1 11 1 0.98 (0.87, 1.13) -0.2
11 2 1.02 (0.79, 1.21) 0.2 11 2 0.98 (0.79, 1.21) -0.1
11 3 1.00 (0.89, 1.11) 0.1 11 3 1.01 (0.60, 1.40) 0.1
11 4 1.11 (0.81, 1.19) 1.1 11 4 0.87 (0.52, 1.48) -0.5
12 0 1.20 (0.13, 1.87) 0.6 12 0 0.85 (0.46, 1.54) -0.5
12 1 0.90 (0.15, 1.85) -0.1 12 1 0.96 (0.87, 1.13) -0.6
12 2 0.98 (0.74, 1.26) -0.1 12 2 1.02 (0.76, 1.24) 0.2
12 3 0.98 (0.90, 1.10) -0.4 12 3 0.96 (0.48, 1.52) -0.1
12 4 1.07 (0.82, 1.18) 0.8 12 4 1.12 (0.55, 1.45) 0.6
13 0 1.16 (0.34, 1.66) 0.6 13 0 1.02 (0.59, 1.41) 0.2
13 1 0.92 (0.46, 1.54) -0.2 13 1 0.92 (0.87, 1.13) -1.2
13 2 0.97 (0.64, 1.36) -0.1 13 2 1.02 (0.58, 1.42) 0.1
13 3 1.00 (0.93, 1.07) 0.0 13 3 1.04 (0.63, 1.37) 0.2
13 4 0.99 (0.78, 1.22) -0.1 13 4 0.82 (0.58, 1.42) -0.9
14 0 1.12 (0.14, 1.86) 0.4 14 0 0.89 (0.30, 1.70) -0.2
14 1 0.89 (0.16, 1.84) -0.1 14 1 0.99 (0.81, 1.19) -0.1
14 2 1.04 (0.51, 1.49) 0.2 14 2 1.02 (0.78, 1.22) 0.2
14 3 0.99 (0.94, 1.06) -0.2 14 3 1.01 (0.67, 1.33) 0.1
14 4 1.00 (0.84, 1.16) 0.0 14 4 0.99 (0.73, 1.27) 0.0
15 0 1.07 (0.53, 1.47) 0.3 15 0 1.03 (0.63, 1.37) 0.2
15 1 0.93 (0.32, 1.68) -0.1 15 1 0.90 (0.89, 1.11) -1.8
15 2 0.99 (0.72, 1.28) 0.0 15 2 1.00 (0.73, 1.27) 0.0
15 3 0.99 (0.92, 1.08) -0.2 15 3 1.01 (0.48, 1.52) 0.1
15 4 0.99 (0.70, 1.30) 0.0 15 4 0.92 (0.52, 1.48) -0.2
16 0 1.14 (0.79, 1.21) 1.2 16 0 1.24 (0.56, 1.44) 1.6
16 1 1.01 (0.49, 1.51) 0.1 16 1 1.09 (0.84, 1.16) 1.1
16 2 1.01 (0.78, 1.22) 0.1 16 2 1.02 (0.80, 1.20) 0.2
16 3 1.03 (0.74, 1.26) 0.3 16 3 1.20 (0.75, 1.25) 1.5
16 4 1.03 (0.34, 1.66) 0.2 16 4 0.97 (0.42, 1.58) 0.0
17 0 1.14 (0.26, 1.74) 0.5 17 0 0.98 (0.41, 1.59) 0.0
17 1 0.81 (0.34, 1.66) -0.5 17 1 0.93 (0.87, 1.13) -1.0
17 2 0.99 (0.59, 1.41) 0.0 17 2 1.05 (0.80, 1.20) 0.5
17 3 0.98 (0.94, 1.06) -0.6 17 3 1.04 (0.54, 1.46) 0.2
17 4 1.04 (0.82, 1.18) 0.4 17 4 1.11 (0.58, 1.42) 0.5
*For item # 3, the 95% CI and t value are not available because no respondent was scored 0 in the item.