DEVELOPMENT AND APPLICATION OF A FOUR-TIER TEST TO ASSESS PRE-SERVICE PHYSICS TEACHERS' MISCONCEPTIONS ABOUT GEOMETRICAL OPTICS
A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES
OF
MIDDLE EAST TECHNICAL UNIVERSITY
BY
DERYA KALTAKÇI
SEPTEMBER 2012
Approval of the thesis:

DEVELOPMENT AND APPLICATION OF A FOUR-TIER TEST TO ASSESS PRE-SERVICE PHYSICS TEACHERS' MISCONCEPTIONS ABOUT GEOMETRICAL OPTICS

submitted by DERYA KALTAKÇI in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Secondary Science and Mathematics Education Department, Middle East Technical University by,
Date: 03.09.2012
I hereby declare that all information in this document has been obtained and presented
in accordance with academic rules and ethical conduct. I also declare that, as required
by these rules and conduct, I have fully cited and referenced all material and results
that are not original to this work.
Signature :
ABSTRACT
Kaltakçı, Derya
Ph.D., Department of Secondary Science and Mathematics Education
Supervisor: Assoc. Prof. Dr. Ali Eryılmaz
Co-Supervisor: Prof. Dr. Lillian Christie McDermott
The main purpose of this study was to develop and administer a four-tier test for assessing
Turkish pre-service physics teachers' misconceptions about geometrical optics. Sixteen pre-
service physics teachers, who were selected by maximum variation and convenience
sampling methods from three state universities in Ankara, were interviewed in the contexts
of plane mirrors, spherical mirrors, and lenses. From these interviews and the studies in the
literature, the Open-Ended Geometrical Optics Test was developed. It was administered to
52 pre-service physics teachers from three state universities selected by purposive and
convenience sampling. The responses of each subject for each item were categorized in order
to determine the alternatives of the multiple-tier multiple-choice misconception test. The
initial form of the test, the Three-Tier Geometrical Optics Test (TTGOT), was administered
to 53 pre-service physics teachers from three state universities selected by purposive and
convenience sampling as a pilot study. The analysis of the results of the TTGOT was used to
revise the test items. Finally, the Four-Tier Geometrical Optics Test (FTGOT) was
developed and administered to 243 pre-service physics teachers from 12 state universities in
Turkey. The validity of the FTGOT scores was established by means of qualitative and quantitative methods. The Cronbach alpha reliability coefficients were calculated for different test scores. Those for the total correct scores and the standardized misconception scores (SUMM4) were found to be .59 and .42, respectively. Misconceptions held by more than 10% of the pre-service teachers were identified and considered significant.

Keywords: Physics Education, Misconceptions, Geometrical Optics, Diagnostic Tests, Four-Tier Test
ÖZ

Kaltakçı, Derya
Doktora, Ortaöğretim Fen ve Matematik Alanları Eğitimi Bölümü
Tez Danışmanı: Doç. Dr. Ali Eryılmaz
Ortak Tez Danışmanı: Prof. Dr. Lillian Christie McDermott

Bu çalışmanın asıl amacı Türk fizik öğretmen adaylarının geometrik optik hakkındaki kavram yanılgılarını ölçmek için dört basamaklı bir test geliştirip uygulamaktı. Ankara ilindeki üç devlet üniversitesinden maksimum çeşitlilik ve kolay ulaşılabilir durum örnekleme yöntemleri kullanılarak seçilen 16 fizik öğretmen adayı ile düzlem aynalar, küresel aynalar ve mercekler bağlamlarında görüşmeler yapıldı. Bu görüşmelere ve alanyazınındaki çalışmalara dayanarak Açık Uçlu Geometrik Optik Testi geliştirildi. Bu test, amaçlı ve kolay ulaşılabilir durum örnekleme yöntemleri kullanılarak seçilen üç devlet üniversitesinden 52 fizik öğretmen adayına uygulandı. Adayların her bir soru için verdikleri cevaplar, çoktan seçmeli olarak geliştirilecek olan çok aşamalı testin çeldiricilerine karar verebilmek amacıyla kategorize edildi. Testin ilk hali olan Üç Aşamalı Geometrik Optik Testi (ÜAGOT) pilot çalışma olarak, amaçlı ve kolay ulaşılabilir durum örnekleme yöntemleri kullanılarak seçilen üç devlet üniversitesinden 52 fizik öğretmen adayına uygulandı. ÜAGOT'nin sonuçlarının analizi, testin sorularının revizyonu için kullanıldı. Son olarak, Dört Aşamalı Geometrik Optik Testi (DAGOT) geliştirildi ve Türkiye'deki 12 devlet üniversitesinden 243 fizik öğretmen adayına uygulandı. DAGOT skorlarının geçerliği nitel ve nicel yöntemler kullanılarak sağlandı. Cronbach alpha güvenirlik katsayısı farklı test skorları için hesaplandı. Güvenirlik katsayısı, tüm test doğru skorları ve standardize edilmiş yanlış skorlar (SUMM4) üzerinden sırasıyla .59 ve .42 olarak hesaplandı. Katılımcıların %10'undan fazlası tarafından sahip olunan kavram yanılgıları tespit edildi ve önemli olarak belirlendi.

Anahtar Kelimeler: Fizik Eğitimi, Kavram Yanılgıları, Geometrik Optik, Tanı Testleri, Dört Basamaklı Test
To
ACKNOWLEDGMENTS
The completion of this thesis was possible with the efforts, encouragement, and support of many people, to whom I would like to present my special thanks. First of all, I would like to thank my supervisor Assoc. Prof. Dr. Ali Eryılmaz for his invaluable
guidance, helpful suggestions, constructive criticism, and time for discussing the viewpoints
and findings.
Secondly, I would like to express my appreciation and gratitude to my co-supervisor
Prof. Dr. Lillian C. McDermott. It was an honor for me to have her guidance and feel her
support in every stage of my dissertation study. Deep appreciation is further extended to
Prof. Dr. Peter Shaffer, Prof. Dr. Paula Heron, Assoc. Prof. Dr. MacKenzie Stetzer, Donna
Messina and the other members of the Physics Education Group at University of
Washington. I feel very fortunate to have had the chance to get to know all of them during my one-year visit to the USA.
I am thankful to the examining committee members of my thesis, Prof. Dr. Bilal
Güneş, Assoc. Prof. Dr. Esen Uzuntiryaki, Assist. Prof. Dr. Esma Buluş Kırıkkaya, Assist.
Prof. Dr. Ömer Faruk Özdemir for their comments and positive criticism.
I am also grateful to my special friends and colleagues Dr. Ufuk Yıldırım, Ulaş Üstün, Haki Peşman, Betül Demirdöğen, Ayşegül Tarkın, Kübra Eryurt, Lana Ivanjek, and Holly
Shelton at METU and at UW for their support and friendship.
I very much appreciate the financial support of the Scientific and Technological Research Council of Turkey (TÜBİTAK) through the BİDEB-2211 and BİDEB-2214 scholarship programs and the Faculty Development Program (ÖYP) at METU.
Special thanks to my parents Nevil and İhsan Kaltakçı and to my brother Onur
Kaltakçı for their never-ending support and love throughout my life. Lastly and mostly, I
thank my fiancé Hikmet Hakan Gürel for his deep love, patience, and constant
encouragement during this process. Without your love and encouragement, my thesis could
not have been accomplished.
TABLE OF CONTENTS
ABSTRACT............................................................................................................................ iv
ÖZ ........................................................................................................................................... vi
ACKNOWLEDGMENTS ...................................................................................................... ix
TABLE OF CONTENTS ......................................................................................................... x
LIST OF TABLES ................................................................................................................ xiv
LIST OF FIGURES ............................................................................................................ xviii
LIST OF ABBREVIATIONS ................................................................................................ xx
CHAPTERS
1. INTRODUCTION ............................................................................................................... 1
1.1. Research Questions ....................................................................................................... 6
1.2. Definition of Important Terms .................................................................................... 11
1.2.1. Constitutive Definitions ....................................................................................... 11
1.2.2. Operational Definitions ........................................................................................ 12
1.3. Geometrical Optics as the Topic of Investigation ....................................................... 14
1.3.1. Scope of Geometrical Optics Studied by Pre-service Teachers ........................... 14
1.3.2. Desired Pre-service Teacher Conceptual Understanding in Geometrical Optics. 15
1.4. Significance of the Study ............................................................................................ 16
2. LITERATURE REVIEW .................................................................................................. 19
2.1. Student Conceptions and Misconceptions .................................................................. 19
2.2. Probable Sources of Student Misconceptions and Teachers as a Source .................... 24
2.3. Methods for Diagnostic Assessment ........................................................................... 27
2.3.1. Interviews and Open-Ended Tests ....................................................................... 28
2.3.2. Multiple-Choice Tests .......................................................................................... 31
2.3.2.1. The Force Concept Inventory (FCI) .............................................................. 35
2.3.2.2. Concerns about the FCI ................................................................................. 37
2.3.3. Multiple-Tier Tests............................................................................................... 40
2.3.3.1. Two-Tier Tests .............................................................................................. 40
2.3.3.2. Three-Tier Tests ............................................................................................ 47
2.3.3.3. Four-Tier Tests .............................................................................................. 59
2.4. Previously Documented Misconceptions in Geometrical Optics ................................ 64
LIST OF TABLES
TABLES
Table 1.1 Calculations Performed for the Four-Tier Geometric Optics Test Scores in Order
to Address the Research Questions of the Present Study ..................................... 10
Table 1.2 Decision Table for the Lack of Knowledge on the FTGOT .................................. 12
Table 1.3 Decision Table Based on Combinations of the Results of the First and Third-Tiers
When the Participant is Sure on Both the Second and Fourth Tiers .................... 13
Table 2.1 Strengths and Limitations of IAI Method Mentioned by Osborne and Gilbert
(1980b) ................................................................................................................. 30
Table 2.2 Conceptual Tests in Physics Published in Journals ............................................... 34
Table 2.3 Average of MDT Scores (in %) for Arizona State University and Nearby High
School Students .................................................................................................... 35
Table 2.4 Average of FCI Scores (in %) for University and High School Students in the USA
.............................................................................................................................. 36
Table 2.5 Two-Tier Tests Published in Journals.................................................................... 43
Table 2.6 Percentages of Students' Misconceptions about Light Identified with Two-Tier
Diagnostic Test in Fetherstonhaugh and Treagust's (1992) Study ....................... 46
Table 2.7 Scoring System Used by Dressel and Schmid (1953) in Confidence Testing ....... 48
Table 2.8 Decision Matrix for an Individual Student and for a Given Question (Hasan et al.,
1999)..................................................................................................................... 50
Table 2.9 Determination of Certitude Scores for Interpreting Student Metacognition .......... 51
Table 2.10 Categorization of Alternative Conceptions by Caleon and Subramaniam (2010a;
2010b)................................................................................................................. 52
Table 2.11 The Misconceptions in Geometric Optics Identified in More than 10% of the
Sample in the Study of Kutluay (2005) .............................................................. 55
Table 2.12 Some Descriptive Statistics for Three-Tier Tests ................................................ 57
Table 2.13 The Mean Percentages of Misconceptions Based on Number of Tiers ............... 58
Table 2.14 Comparison of Four-Tier Tests and Three-Tier Tests in Terms of Determining
Lack of Knowledge ............................................................................................ 60
Table 2.15 Comparison of Decisions in Four-Tier Tests and Three-Tier Tests ................... 61
Table 2.16 Comparison of Strengths and Weaknesses of Diagnostic Methods ..................... 63
Table 2.17 Summary of Studies about Students' Conceptions in Geometrical Optics .......... 66
Table 2.18 Description of the Contexts and the Percentages of Students with Correct
Answers to the Questions in Each Context in the Study of Chu et al. (2009) .... 74
Table 2.19 Description and the Percentages of Students' Context Dependent Alternative
Conceptions Identified from the Study of Chu et al. (2009) ............................... 74
Table 3.1 Information about the Universities with Physics Teacher Education Programs in
Turkey that Constitute the Target Population of the Study ................................. 89
Table 3.2 Information about Interview Universities .............................................................. 91
Table 3.3 Demographic Information about the Sixteen Participants Selected for Interviews 92
Table 3.4 Information about Open Ended Test Universities .................................................. 93
Table 3.5 Information about Pilot Test of Three Tier Geometrical Optics Test Universities 93
Table 3.6 Information about FTGOT Universities ................................................................. 95
Table 3.7 Summary of the Samples and Sampling Methods of the Whole Study ................. 96
Table 3.8 Information about the Interview Sessions of the Study.......................................... 99
Table 3.9 Calculated AND, OR and SUM Scores for the Correct and Misconception Scores
............................................................................................................................ 106
Table 4.1 Categories of the Subjects' Responses for Each Item on the Open-Ended Test .. 144
Table 4.2 Percentage of Subjects Choosing Each First Tier and Reasoning Tier Alternative
for Each Item on the TTGOT.............................................................................. 161
Table 4.3 ITEMAN Analysis Results for the TTGOT for All Three Tiers Scores .............. 165
Table 4.4 Percentages of False Positives, False Negatives, Lack of Knowledge, Correct
Answers for Only First Tier, Correct Answers for First Two Tiers and Correct
Answers for All Three Tiers for the TTGOT Scores .......................................... 166
Table 4.5 Description of Coding Categories and Their Frequencies ................................... 169
Table 4.6 Number of Subjects Who Filled the Blank Alternative for Each Item Tier ......... 170
Table 4.7 Correlations between Correct Only First Score and First Confidence Score; Correct
Only Third Score and Second Confidence Score; Correct First and Third Score
and Both Confidences Score ............................................................................... 173
Table 4.8 Correlations between Correct Only First Score and First Confidence Score; Correct
Only Third Score and Second Confidence Score; Correct First and Third Score
and Both Confidences Score for Male and Female Subjects Separately ............ 177
Table 4.9 The SPSS Settings for Conducting Factor Analysis............................................. 180
Table 4.10 The SPSS Output Showing KMO and Bartlett's Test for Correct Only First Tier
Scores ................................................................................................................ 182
Table 4.11 The SPSS Output Showing Communalities for Correct Only First Tier Scores 183
Table 4.12 The SPSS Output Showing the Rotated Component Matrix for Correct Only First
Tier Scores ........................................................................................................ 183
Table 4.13 Interpretation of the Factors Obtained from the Analysis of Correct Scores of
Total Test-Only First Tier Scores ..................................................................... 185
Table 4.14 Results of the Reliability and Factor Analysis of the Correct Scores ................ 186
Table 4.15 Interpretations of the Factors Obtained from the Analysis of Correct Scores of
Total Test- First and Third Tiers, and All Four Tiers Scores ........................... 189
Table 4.16 Interpretation of the Factors Obtained from the Analysis of Correct Scores
According to Their Contexts ............................................................................ 191
Table 4.17 Interpretation of the Factors Obtained from the Analysis of Correct Scores
According to Their Cases ................................................................................. 193
Table 4.18 Interpretation of the Factors Obtained from the Analysis of Correct Scores Coded
with AND, OR, SUM Functions ...................................................................... 195
Table 4.19 Results of the Reliability and Factor Analysis of the Misconception Scores .... 197
Table 4.20 Interpretation of the Factors Obtained from the Analysis of Misconception Scores
Coded with AND, OR, SUM Functions ........................................................... 198
Table 4.21 The Percentages of False Positives, False Negatives, and Lack of Knowledge on
the FTGOT for Correct Scores ......................................................................... 202
Table 4.22 Cronbach Alpha Reliabilities for Correct Scores ............................................... 204
Table 4.23 Cronbach Alpha Reliabilities for Misconceptions Scores ................................. 205
Table 4.24 Difficulty Levels of Correct and Misconception Scores .................................... 206
Table 4.25 Discrimination Indexes of Correct and Misconception Scores .......................... 207
Table 4.26 Overall Descriptive Statistics for All Four Tiers Scores on the FTGOT ........... 208
Table 4.27 Distribution of Percentages of Misconceptions to the Items ............................. 225
Table 4.28 Descriptive Statistics for All four Tiers Score of 15-Misconceptions Measuring
Test ................................................................................................................... 226
Table 4.29 Reliability Indexes, Difficulty levels, and Discrimination Indexes of 15-
Misconceptions Measuring Test ....................................................................... 228
Table 4.30 Results of the Reliability and Factor Analysis of the New Misconception-15
Scores ............................................................................................................... 232
Table 4.31 Interpretation of the Factors Obtained from the Analysis of Misconception Scores
Coded with AND, OR, SUM Functions ........................................................... 233
Table A.1 Atatürk University Kazım Karabekir Education Faculty Physics Teacher
Education Program ............................................................................................. 278
Table A.2 Balıkesir University Necatibey Education Faculty Physics Teacher Education
Program............................................................................................................... 279
Table A.3 Boğaziçi University Education Faculty Physics Teacher Education Program .... 280
Table A.4 Dokuz Eylül University Education Faculty Physics Teacher Education Program
............................................................................................................................ 281
Table A.5 Dicle University Ziya Gökalp Education Faculty Physics Teacher Education
Program............................................................................................................... 282
Table A.6 Gazi University Education Faculty Physics Teacher Education Program........... 283
Table A.7 Hacettepe University Education Faculty Physics Teacher Education Program .. 285
Table A.8 Karadeniz Teknik University Fatih Education Faculty Physics Teacher Education
Program............................................................................................................... 286
Table A.9 Marmara University Atatürk Education Faculty Physics Teacher Education
Program............................................................................................................... 287
Table A.10 Ondokuz Mayıs University Education Faculty Physics Teacher Education
Program............................................................................................................... 288
Table A.11 Middle East Technical University Education Faculty Physics Teacher Education
Program............................................................................................................... 289
Table A.12 Selçuk University Ahmet Keleşoğlu Education Faculty Physics Teacher
Education Program ............................................................................................. 290
Table A.13 Yüzüncü Yıl University Education Faculty Physics Teacher Education Program
............................................................................................................................ 291
Table B.1 Pre-service Physics Teachers Registered to the 3rd Grade in the 2010-2011 Spring
Semester at Middle East Technical University (METU) .................................... 292
Table B.2 Pre-service Physics Teachers Registered to the 3rd Grade in the 2010-2011 Spring
Semester at Gazi University................................................................................ 293
Table B.3 Pre-service Physics Teachers Registered to the 4th Grade in the 2010-2011 Spring
Semester at Hacettepe University ....................................................................... 294
Table D.1 Table of Specification for the Interview Guide ................................................... 342
LIST OF FIGURES
FIGURES
Figure 1.1 Concept Map of the Scope of Geometrical Optics that is Studied in this
Investigation ......................................................................................................... 14
Figure 2.1 Examples of the IAI Cards for the Concepts Light and Force.............................. 29
Figure 2.2 Stages in the Development of Two-Tier Multiple-Choice Diagnostic Instruments
.............................................................................................................................. 41
Figure 2.3 General Three Tier Test Development Process .................................................... 53
Figure 2.4 Frequency Distribution of Geometrical Optics Topics Investigated in the Studies
.............................................................................................................................. 70
Figure 2.5 Frequency Distribution of Diagnosis Methods Used in Geometrical Optics Studies
.............................................................................................................................. 71
Figure 2.6 Frequency Distribution of Sample Grade Levels in Geometrical Optics Studies 72
Figure 3.1 An Example Illustration of How Subjects' Raw Data Coded into Nominal Level
by using Misconception Selection Choices ........................................................ 109
Figure 3.2 An Example Illustration of How ANDM1, ORM1, and SUMM1 Scores were
Produced ............................................................................................................. 110
Figure 3.3 The Development Process of the FTGOT in the Present Study ......................... 114
Figure 4.1 Drawings of Participant M3 for Explaining the Effect of Covering Some Parts of
a Spherical Mirror and Converging Lens ........................................................... 125
Figure 4.2 An Example of a Revision in the Interview Items with Previous and New Version
as a Result of the Analysis of the First Interview Session .................................. 126
Figure 4.3 Drawings by Participants Who Claimed that Any Obstacle between an Object and
a Plane Mirror Hinders the Image Formation and/or Observation Process ........ 129
Figure 4.4 Variation in Participants' Drawings for Item 1 .................................................. 132
Figure 4.5 A Sample Drawing of a Participant for Item 28 ................................................. 134
Figure 4.6 A Participant's Drawings for Locating an Image along Incoming Light Ray .... 135
Figure 4.7 A Typical Participant Answer to Item 30 ........................................................... 138
Figure 4.8 Positions Where the Girl First Sees Her Eye from the Concave and Convex
Mirrors ................................................................................................................ 140
Figure 4.9 Participants' Drawings Showing an Observer can See Herself out of the Region
Enclosed by the Normal Lines to the Mirror Borders ........................................ 141
Figure 4.10 Scatter Plots of Correct Only First Score vs. First Confidence Score; Correct
Only Third Score vs. Second Confidence Score; Correct First and Third Score vs.
Both Confidences Score...................................................................................... 175
Figure 4.11 Scatter Plots of Correct Only First Score vs. First Confidence Score; Correct Only Third
Score vs. Second Confidence Score; Correct First and Third Score vs. Both
Confidences Score for Male and Female Subjects Separately ............................ 179
Figure 4.12 The SPSS Output Showing the Scree Plot ........................................................ 184
Figure 4.13 Histograms of Correct Scores ........................................................................... 210
Figure 4.14 Histograms of Misconceptions Scores .............................................................. 213
Figure 4.15 Percentages of Correct Answers in Terms of Type of the Test ........................ 217
Figure 4.16 Percentages of Correct Scores with AND Function in Terms of Type of the Test
............................................................................................................................ 218
Figure 4.17 Percentages of Correct Scores with OR Function in Terms of Type of the Test
............................................................................................................................ 219
Figure 4.18 Percentages of Correct Scores with SUM Function in Terms of Type of the Test
............................................................................................................................ 220
Figure 4.19 Percentages of Misconceptions Scores with AND Function in Terms of Type of
the Test................................................................................................................ 221
Figure 4.20 Percentages of Misconceptions Scores with OR Function in Terms of Type of
the Test................................................................................................................ 222
Figure 4.21 Percentages of Misconceptions Scores with SUM Function in Terms of Type of
the Test................................................................................................................ 223
Figure 4.22 Histograms for the Test with Selected 15-Misconceptions............................... 227
Figure 4.23 Percentages of 15-Misconceptions Scores with AND Function in Terms of Type
of the Test ........................................................................................................... 229
Figure 4.24 Percentages of 15-Misconceptions Scores with OR Function in Terms of Type of
the Test................................................................................................................ 230
Figure 4.25 Percentages of 15-Misconceptions Scores with SUM Function in Terms of Type
of the Test ........................................................................................................... 231
LIST OF ABBREVIATIONS
SYMBOLS
AC : Alternative Conceptions
CRI : Certainty of Response Index
CGPA : Cumulative Grade Point Average
D : Discrimination Index
FCI : Force Concept Inventory
FN : False Negative
FP : False Positive
FTGOT : Four-Tier Geometrical Optics Test
IAI : Interview-About-Instances
IAE : Interview-About-Events
IDI : Individual Demonstration Interviews
KMO : Kaiser-Meyer-Olkin
LK : Lack of Knowledge
MBT : Mechanics Baseline Test
MCT : Multiple Choice Test
MDT : Mechanics Diagnostic Test
METU : Middle East Technical University
MISQ : Misconception Identification in Science Questionnaire
OCC : Optics Course Credit
OG : Optics Course Grade
OLC : Optics Laboratory Credit
OLG : Optics Laboratory Grade
p : Item Difficulty
PCI : Piagetian Clinical Interviews
POE : Predict-Observe-Explain
p-prims : Phenomenological primitives
r : Pearson correlation
SPSS : Statistical Package for the Social Sciences
TE : Teaching Experiment
TTGOT : Three-Tier Geometrical Optics Test
QMVI : Quantum Mechanics Visualization Inventory
CHAPTER 1
INTRODUCTION
A complete understanding of science and the most effective way of teaching science
have been two of the main concerns of researchers in science education for more than one
hundred years. These concerns will probably continue to be discussed as science and technology evolve. In today's changing world and society, the necessity to prepare students for the varying demands of society has become crucial. Taking these concerns into account,
starting from the 1900s, many educators became interested in developing new educational
theories that offered new insights into the way students learn and retain knowledge. By the
late 1950s, learning theories began to shift away from behaviorism's emphasis on stimulus-response associations and observable behaviors to cognitivism's complex cognitive and
mental processes, such as thinking, problem solving, concept formation, and information
processing. With the entrance of constructivism into the education literature, however,
learning began to be viewed as an individual process of concept development. As a result,
the roles of both learner and the teacher in the classroom have changed.
These innovations in the field of education brought new attempts to shift the focus of
research attention to the learner. Starting from the 1970s, there has been an increasing emphasis in science education on students' ways of conceptualizing and reasoning about the phenomena they encounter. There are several studies on students' conceptions and reasoning (diSessa, 1993; Driver, Guesne, & Tiberghien, 1985; Gilbert & Watts, 1983; Hammer, 1996;
2000; McDermott, 1993; Osborne, Black, & Smith, 1993; Reiner, Slotta, Chi, & Resnick,
2000; Smith, diSessa & Roschelle, 1993; Wandersee, Mintzes, & Novak, 1994). The
common purpose in these studies is to depict students' understanding of several science
concepts. The information gathered from these studies is particularly important, especially
when designing curricular innovations, as well as in the design of a single lesson by a
teacher.
Researchers from a variety of theoretical perspectives have argued that the most
important attribute that students bring to their classes is their conceptions (Ausubel, 1968;
Wandersee et al., 1994), most of which differ from those of scientists (Hammer, 1996;
Griffiths & Preston, 1992). Student conceptions that contradict the scientific view are often
labeled “misconceptions” (Al-Rubayea, 1996; Barke, Hazari, & Yitbarek, 2009; Gilbert &
Watts, 1983; Schmidt, 1997; Smith et al., 1993). Broadly, misconceptions have the common
feature of being strongly held, coherent conceptual structures which are resistant to change
through traditional instruction and need special attention for students to develop a scientific
understanding (Gil-Perez & Carrascosa, 1990; Hammer, 1996; Hasan, Bagayoko & Kelley,
1999; Kaltakci & Didis, 2007). Therefore, correct identification of misconceptions has
become an important first step in order to gain an understanding of student learning.
The usual measures of assessment such as the ability to state correct definitions,
reproduce proofs, and solve standard problems cannot provide sufficiently detailed information to determine students' scientific understanding (McDermott, 1991). Instead, in order to
measure students' conceptions of several concepts, different diagnostic tools have been developed and used by researchers. Interviews (Goldberg & McDermott, 1986; Osborne &
Gilbert, 1979; 1980a; White & Gunstone, 1992), concept maps (Novak, 1996), open-ended
or free response questionnaires (Wittman, 1998), word association (Champagne, Gunstone,
& Klopfer, 1980, as cited in Wandersee et al., 1994), drawings, essays, multiple-choice tests
(Beichner, 1994; Hestenes, Wells, & Swackhamer, 1992; Tamir, 1971 as cited in Chen, Lin,
& Lin, 2002), and multiple-tier tests (Caleon & Subramaniam, 2010a, 2010b; Chen et al.,
2002; Eryılmaz, 2010; Kutluay, 2005; Peşman, 2005; Peşman & Eryılmaz, 2010; Treagust, 1986; Tsai & Chou, 2002) are used to diagnose students' conceptions in science education.
Interviews, open-ended and multiple-choice tests are the ones commonly used in physics
education research. However, each tool has some advantages as well as disadvantages over
the others. Interviews have advantages such as flexibility and the possibility of obtaining in-
depth information. However, they can be conducted with only a limited number of
individuals and require a great deal of time. These limitations sometimes make it difficult to
generalize to a broader population. Open-ended tests give responders the chance to write
their answers in their own words and can be administered to larger samples compared to the
interviews. However, it takes time to analyze the results and scoring may be a problem.
Multiple-choice tests can be administered to a large number of individuals. They are easy to
administer and analyze but cannot probe the students' responses deeply (Fraenkel & Wallen,
2000). Also, with traditional multiple-choice tests the investigator cannot differentiate
correct answers due to correct reasoning from those due to incorrect reasoning (Caleon &
Subramaniam, 2010b; Peşman & Eryılmaz, 2010). In other words, correct responses may not
guarantee the presence of the correct scientific conception in these tests. Similarly, a wrong
answer given to a traditional multiple-choice item may not be due to the misconception held,
but might be a wrong answer with correct reasoning. Hestenes et al. (1992) proposed “false
positive” and “false negative” concepts in order to emphasize the importance of accuracy of
measures in a multiple-choice test. False positive is defined as a Newtonian answer chosen
with non-Newtonian reasoning; whereas false negative is a non-Newtonian answer with
Newtonian reasoning. False negatives are considered unproblematic and are attributed to
carelessness or inattention. The minimization of false positive answers, on the other hand, is
difficult. They cannot be eliminated altogether. The authors stated that the major problem in multiple-choice test development is to minimize false positives and false negatives. For this reason, researchers extended multiple-choice tests into multiple-tier tests, with two, three, or more recently four tiers, in order to overcome some of the limitations of using interviews, open-ended tests, and traditional multiple-choice tests to diagnose students' conceptions.
With the two-tier diagnostic tests developed first by Treagust (1986), more valid and
reliable multiple-choice diagnostic tests to identify misconceptions became possible. They
made valuable contributions to the education field. The first tier of a two-tier diagnostic test
is a typical multiple-choice item, but the second tier asks the reasoning for the answer in the
first tier. Hence, two tier tests provide the opportunity to detect and calculate the proportion
of wrong answers with correct reasoning (false negatives) and the correct answer with wrong
reasoning (false positives). However, two-tier tests cannot discriminate lack of knowledge
from misconceptions, as well as from scientific conceptions, false positives or false
negatives. Three-tier tests eliminated this problem since, in addition to the first two tiers, the
confidence level about the responses in the first two tiers is asked in the third tier. In this
way, misconceptions that are eliminated from lack of knowledge and errors can be assessed.
Although three-tier tests seem to eliminate many disadvantages mentioned above, they still
cannot fully discriminate the confidence choices for the main answer (first tier) from
confidence choices for reasoning (second tier) and therefore may overestimate students‟
scores and underestimate lack of knowledge. Hence, four-tier tests that ask students‟
confidence level for the main and for the reasoning tiers separately are proposed and
discussed in the present study. In the process of developing four-tier misconception tests,
other diagnostic methods such as in-depth interviews and open-ended tests were used to get
the aforementioned benefits of each method and increase the validity of test scores in
assessing student misconceptions.
Effective test development requires a systematic, well-organized approach to ensure
sufficient validity and reliability evidence to support the proposed inferences from the test
scores (Downing, 2006). Hammer (1996) makes an analogy between a researcher exploring
the knowledge structures of individuals and a doctor diagnosing diseases. In this analogy
Hammer emphasizes the importance of studies in education that explore individuals' conceptions. According to him, a doctor who knows only one or two diseases would have
only one or two options for diagnosing an ailment, regardless of the technical resources
available. When the diagnosis is correct, the prescribed treatment may be effective; however,
when the diagnosis is not correct, the treatment may not only be ineffective, it may be
damaging. With this analogy, it is clear that studies focusing on conceptual understanding
and methods to diagnose misconceptions in a valid and reliable way have great importance
in physics education research.
Student misconceptions have their origins in a diverse set of personal experiences.
Possible sources of student misconceptions are physical experience, direct observation,
intuition, school teaching, outside of school teaching, social environment, peer culture,
language, textbooks or other instructional materials, and teachers (Al-Rubayea, 1996;
Andersson, 1990; Kaltakci & Eryilmaz, 2010; Renner, Abraham, Grzybowski, & Marek, 1990;
Wandersee et al., 1994). Teachers often subscribe to the same misconceptions as their
students (Abell, 2007; Heywood, 2005; Kathryn & Jones, 1998; Wandersee et al., 1994).
They are thought to be the key to improving student performance (Abell, 2007; McDermott,
2006; McDermott, Heron, Shaffer & Stetzer, 2006). There is substantial evidence that
teachers are one of the main sources of student misconceptions (Abell, 2007; Heller &
Finley, 1992; Kikas, 2004; Wandersee et al., 1994). Therefore, before studying student
conceptions, teachers' conceptions should be determined and, if necessary, modified in order to improve students' conceptions. In this respect, exploring pre-service teachers' existing misconceptions about several science concepts becomes an important concern to be
investigated.
There are several studies in the literature on understanding students' conceptions in
science and there has been more research in the domain of physics than in any other
discipline (Duit, 2009; Duit, Niedderer, & Schecker, 2007; McDermott, 1990a; Reiner et al.,
2000; Wandersee et al., 1994). These studies have focused on certain concepts in mechanics
(Clement, 1982; Halloun & Hestenes, 1985; McCloskey, 1983), electricity (McDermott
& Shaffer, 1992; Shipstone, 1988), and heat and temperature (Erickson, 1979, 1980; Rogan,
1988). Duit et al. (2007) documented a bibliography of the publications on students' ideas in
science and reported that mechanics (especially the force concept) and electricity (especially
simple electric circuits) are over-researched domains in physics. The authors mentioned the
need for further attention to the other domains. Watts (1985) argued that studies of students'
understanding of geometrical optics are relatively rare. In another study by Gilbert and Watts
(1983), it was stated that there have been few studies in the world on the topic of light.
Kutluay (2005) emphasized the small number of studies on optics and stated a need for further research in this area.
Student understanding in geometrical optics has attracted the interest of some
researchers in different countries from early childhood to university level (Andersson &
Karrqvist, 1983; Bendall, Goldberg, & Galili, 1993; Colin & Viennot, 2001; Dedes & Ravanis, 2009; Eaton, Andersson, & Smith, 1984; Fetherstonhaugh & Treagust, 1987, 1992; Galili, 1996; Galili, Goldberg, & Bendall, 1993; Goldberg & McDermott, 1986, 1987; Heywood, 2005; Hubber, 2006; Kocakülah & Demirci, 2010; Langley, Ronen, & Eylon, 1997; Osborne et al., 1993; Rice & Feher, 1987; Ronen & Eylon, 1993; Settlage, 1995; Şen,
2003; Van Zee, Hammer, Bell, Roy, & Peter, 2005; Watts, 1985; Wosilait, Heron, Shaffer, &
McDermott, 1998; Yalçın, Altun, Turgut, & Aggül, 2008). All of these studies showed that
students have some misconceptions in this topic regardless of their grade level and culture.
However, there exists only a limited number of studies on pre-service teachers, including Turkish
pre-service teachers. In most of the studies in geometrical optics, interviews and open-ended
tests were used as the methods of investigation to determine student misconceptions, with small samples. However, the number of studies using multiple-tier tests with large samples was relatively small. The studies in geometrical optics can be categorized into four
interrelated categories as:
1. Nature and propagation of light (properties of light, rectilinear propagation of light,
illumination, shadows),
2. Vision (role of observer‟s eye, how we see),
3. Optical Imaging (reflection and refraction in plane mirrors, spherical mirrors, lenses,
prisms),
4. Color.
Most of the previous studies in geometrical optics were conducted on the nature and
propagation of light and optical imaging. However, the number of studies on topics in vision
and color was relatively small. Even though the number of studies in optical imaging was
comparatively high among all categories, most of them dealt with plane mirrors or converging
lenses as a context. Hardly any studies were found in the literature about students'
conceptions in the contexts of convex mirrors, hinged plane mirrors, and diverging lenses.
In Turkey, starting from the 2005-2006 academic year, the new Science and Technology teaching program has been compulsory in all elementary schools (Güneş, 2007). As a continuation of this change in the Science and Technology teaching program, the high
school physics teaching program has also been developed and has started to be applied.
Among the primary aims of the Science and Technology teaching program, as well as the
new Turkish high school physics teaching program, students' understanding and conceptualization of the natural world come first (MEB Talim Terbiye Kurulu Başkanlığı,
2005a, 2005b, 2007). For this reason, studies on conceptualization have great importance. In
the spiral curriculum approach of the Science and Technology teaching program, the light and optics topic is revisited regularly from the fourth grade through the seventh grade. It is then
discussed at the eleventh and twelfth grades in the four-year secondary school teaching
program. It is addressed in almost all physics teacher education programs at the university
level. Nevertheless, the limited number of studies on this topic in Turkey requires attention.
Beichner (1994) proposed the combination of the strengths of both interviews and
multiple-choice tests as an ideal course of action for testing student understanding in physics.
In the present study, qualitative and quantitative designs were combined so that both the in-depth inquiry of interviews and the broad generalizability of multiple-tier tests could provide an understanding of Turkish pre-service physics teachers' misconceptions in geometrical optics.
There is almost no in-depth study of Turkish pre-service physics teachers' conceptions in geometrical optics. In this respect, this research will provide valuable
information for both physics teacher education program developers and faculties at
universities. The information obtained can be used to design courses that take into account
student conceptions in geometrical optics. The main purpose of this study is to develop a
valid and reliable four-tier diagnostic test to assess misconceptions in geometrical optics.
The second purpose of the study is to diagnose the misconceptions in geometrical optics of
Turkish pre-service physics teachers who have already completed optics courses in their
programs.
1.1. Research Questions

The research questions investigated in this study can be classified as main questions
and sub-questions. The main objectives of this study were to develop a valid and reliable
four-tier multiple-choice test for assessing Turkish pre-service physics teachers' misconceptions in geometrical optics and to diagnose and categorize Turkish pre-service physics teachers' misconceptions in the topic. A two-phase study with qualitative and
quantitative parts was conducted by addressing the two main questions. In addition, some
sub-questions related to the two main questions were investigated. Each sub-question is related to one of the main questions and was investigated within the frame of that main question, but the sub-questions do not cover the whole content of the related main question.
In the first phase of the study, a valid and reliable four-tier test was developed in
order to assess pre-service physics teachers' misconceptions in geometrical optics. In-depth
interviews, an open-ended test, and the multiple-tier test were conducted. The purposes of
the in-depth interviews were:
1. to see if the Turkish pre-service physics teachers hold the same misconceptions in
geometrical optics as discussed in the literature;
2. to see if the Turkish sample has some other misconceptions which have not been
mentioned in the literature before;
3. to detect the most resourceful contexts to elicit students' misconceptions in
geometrical optics for the construction of an open-ended test.
The open-ended test questions were selected from the contexts revealed in the
interviews with the Turkish pre-service teachers and from the contexts which had not
received enough attention in previously documented studies in geometrical optics. The main
purpose of administering the open-ended test was to construct the alternatives of the
multiple-tier multiple-choice test from the answers and reasoning given by the pre-service
teachers in their own words. With the help of the interviews and the open-ended test, as well
as the literature, a misconception diagnostic test with four tiers was developed and
administered to the Turkish pre-service physics teachers. Then the test scores obtained from
pilot and main administrations were used to collect validity and reliability evidence for the
test scores. Test validity is the degree to which the interpretation of a test score is believed to
accurately represent what the test score is intended to measure.
Tests themselves cannot be valid or invalid. Instead, we validate the use of a test
score in a specific context. Hence, when we say the “validity of the test”, we refer to the
“validity of the test scores” throughout the study. Since validity is a matter of degree, not all-or-none, there exists no absolute method to determine the validity of the test scores.
However, qualitative and quantitative evidence from different sources were collected to
ensure validity. In the present study the validity of the four tier misconception test scores
was established in three ways:
1. Calculating the correlations among student scores on the first, the reasoning, and the
confidence tiers. It is expected that in a valid test, students who are able to
understand the question can then judge their ability to answer that particular question
correctly or incorrectly (Çataloğlu, 2002).
2. Conducting the factor analysis for the correct scores and misconception scores of the
students. It is expected that previously determined related items would result in some
acceptable factors if a test actually measures the concepts it aims to measure.
3. Estimating the percentages of false positives, false negatives, and lack of knowledge.
It is expected that these values should be minimized for establishing construct
validity (Hestenes & Halloun, 1995).
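A minimal sketch of how the first and third kinds of evidence above can be computed, assuming item-level 0/1 codes for correctness and confidence; the array names, shapes, and random data below are placeholders for illustration, not the FTGOT data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_items = 243, 36  # illustrative sizes only

# Hypothetical 0/1 codes per subject and item.
ans_ok = rng.integers(0, 2, (n_subjects, n_items))   # 1 = correct first-tier answer
reas_ok = rng.integers(0, 2, (n_subjects, n_items))  # 1 = correct third-tier reasoning
sure = rng.integers(0, 2, (n_subjects, n_items))     # 1 = sure on both confidence tiers

# Evidence 1: correlation between total correct scores and total confidence scores.
r, p = stats.pearsonr(ans_ok.sum(axis=1), sure.sum(axis=1))

# Evidence 3: average proportions of false positives, false negatives,
# and lack of knowledge over all subjects and items.
fp = ((ans_ok == 1) & (reas_ok == 0) & (sure == 1)).mean()  # correct answer, wrong reason
fn = ((ans_ok == 0) & (reas_ok == 1) & (sure == 1)).mean()  # wrong answer, correct reason
lk = (sure == 0).mean()                                     # not sure on a confidence tier
```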
The Cronbach alpha coefficient was calculated in order to establish the internal consistency of the scores on the four-tier geometrical optics test, as a measure of the reliability of the test scores.
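For reference, this coefficient reduces to the standard formula alpha = k/(k-1) * (1 - (sum of item variances)/(variance of total scores)); a sketch over a subjects-by-items score matrix follows, with placeholder data shapes.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (subjects x items) matrix of 0/1 or graded item scores."""
    k = scores.shape[1]                                # number of items
    item_variances = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1 - item_variances / total_variance)
```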
In light of the purpose of the first phase, the first main question and its sub-research
questions of the present study are:
1. Are the Turkish pre-service physics teachers' scores on the Four-Tier Geometrical
Optics Test (FTGOT) valid and reliable measures in assessing their misconceptions
in geometrical optics?
a. What are the misconceptions in geometrical optics of a divergent sample of pre-
service physics teachers who completed optics and optics laboratory courses in
all state universities in Turkey as investigated with in-depth interviews?
i. What contexts are more resourceful in eliciting pre-service physics teachers' misconceptions in geometrical optics?
ii. What are the pre-service physics teachers' misconceptions that are similar to the ones from the literature?
iii. What are the pre-service physics teachers' misconceptions that have not been mentioned in the literature before?
b. What are the misconceptions in geometrical optics of a divergent sample of pre-
service physics teachers who completed optics and optics laboratory courses in
all state universities in Turkey as investigated with open-ended questions?
i. What are the pre-service physics teachers' misconceptions that are similar to the ones from the literature?
ii. What are the pre-service physics teachers' misconceptions that have not been mentioned in the literature before?
e. Does the underlying descriptive factor structure of the FTGOT support the
previously determined misconception structure of the test?
f. What are the percentages of false positives, false negatives, and lack of
knowledge for each item and on average on the FTGOT?
g. What are the Cronbach alpha reliability coefficients of correct scores and
misconception scores on the FTGOT?
h. What are the difficulty levels and the discrimination indices of each item and on
average on the FTGOT?
In the second phase of the study, the constructed four-tier misconception test was used to diagnose Turkish pre-service teachers' misconceptions in geometrical optics. In this survey design, pre-service teachers' responses to the four-tier misconception test were
investigated with different codings (codings for the student scores over misconceptions and
over correct responses) and functions (misconception and correct scores calculated with
AND, OR, SUM functions) that were suitable for the four-tier misconception tests. The
results were compared with their one-tier, two-tier, and four-tier correspondences.
Hence, the second main research question and its sub-questions are:
2. What are the misconceptions in geometrical optics of all pre-service physics teachers
who completed optics and optics laboratory courses in all state universities in Turkey
as investigated with the FTGOT?
a. How do the percentages of the pre-service physics teachers giving correct
answers change when only the first tiers, first and third tiers, and all four tiers
are separately taken into account with different analysis types, namely the AND, OR, and SUM functions?
b. How do the percentages of the pre-service physics teachers having a
misconception change when only the first tier, first and third tiers, and all four
tiers are separately taken into account with different analysis types, namely the AND, OR, and SUM functions?
Table 1.1 summarizes the calculations for the FTGOT scores from the sub-research
questions in the two main phases of the present study, together with the purpose of the
calculations. The computable and incomputable calculations for each tier and all tiers of the
test are given in the table.
Table 1.1 Calculations Performed for the Four-Tier Geometric Optics Test Scores in Order to Address the Research Questions of the Present Study

Calculations for Four-Tier Testing | Only 1st Tier | 1st and 3rd Tiers | All Four Tiers | Purpose of Calculation
False Positive and False Negative Proportions | X | X | √ | Validity (test development)
Lack of Knowledge Proportions | X | X | √ | Validity (test development)
Correlation with Confidence | X | X | √ | Validity (test development)
Factor Analysis for Misconception Scores | √ | √ | √ | Validity (test development)
Factor Analysis for Correct Scores | √ | √ | √ | Validity (test development)
Discrimination Index | √ | √ | √ | Validity (test development)
Difficulty Index | √ | √ | √ | Reliability (test development)
Cronbach Alpha Reliability | √ | √ | √ | Reliability (test development)
Percentages of Correct Answers | √ | √ | √ | Diagnostic (survey)
Percentages of Misconceptions | √ | √ | √ | Diagnostic (survey)

Note: √ represents the computable statistics; X represents the incomputable or not computed statistics.
1.2. Definition of Important Terms

This section presents some important definitions related to the study. The
constitutive definitions include the dictionary definitions of the terms, whereas the
operational definitions specify the actions or operations necessary to measure or identify
these terms.
1.2.1. Constitutive Definitions

Geometrical optics: The branch of optics dealing with the concept of light as a ray.
Conception: Conception is the formation of an image, idea, or notion in the mind with the
conscious act of understanding (Webster, 1997).
Misconception: An idea which is wrong or untrue, but which people believe because they do
not understand the subject properly (Longman, 2012).
Diagnostic test: Diagnostic tests are assessment tools which are concerned with the
persistent or recurring learning difficulties that are left unresolved and are the causes of
learning difficulties (Gronlund, 1981).
Two-tier misconception test: A two-tier misconception test is a two-level question of which
the first tier requires a content response and the second tier requires a reason for the response
(Treagust, 1986).
Three-tier misconception test: An additional tier that asks the confidence about the answers
of the former two-tiers is included in a three-tier test (Çataloğlu, 2002).
False Negative: An incorrect answer that is chosen by individuals who have the correct,
scientific reasoning for the question (Hestenes et al., 1992; Hestenes & Halloun, 1995).
False Positive: A correct response that is chosen by individuals who do not have the correct,
scientific reasoning for the question (Hestenes et al., 1992; Hestenes & Halloun, 1995).
AND Function: A function used to calculate the proportion of subjects who hold a unique
conception in “all” of the contexts.
OR Function: A function used to calculate the proportion of subjects who hold a unique
conception in “at least one” of the contexts.
SUM Function: A function used to calculate the proportion of subjects who hold a unique
conception by summing the contexts in which each subject holds that conception.
1.2.2. Operational Definitions

Four-tier misconception test: A four-level multiple-choice test that has been designed with
common student misconceptions. In the first tier of the test, there is a question about
concepts in geometrical optics; in the second tier the degree of confidence about the given
response in the first tier is asked; in the third tier the reasoning for the answer in the first tier is asked; and in the fourth tier the degree of confidence about the given reasoning in the
third tier is asked.
Lack of knowledge: Lack of knowledge is decided by pre-service physics teachers' answers
for the second and fourth tiers of items on the FTGOT, which ask their confidence level
about their answers for the first and third tiers, respectively. The answers to those tiers may
to some extent be personality sensitive. If the pre-service teacher is not sure about his/her
answer in any or both of the tiers, then he/she is considered to lack the required knowledge.
Table 1.2 presents the decision table for the lack of knowledge for different possibilities of
confidence for the first and third tiers on the FTGOT.
Table 1.2 Decision Table for the Lack of Knowledge on the FTGOT
Confidence for the 1st tier | Confidence for the 3rd tier | Decision of researcher for lack of knowledge
Sure | Sure | No lack of knowledge
Sure | Not sure | Lack of knowledge
Not sure | Sure | Lack of knowledge
Not sure | Not sure | Lack of knowledge
Misconception: A pre-service teacher who chooses a wrong answer in the first tier with wrong reasoning in the third tier, and who is sure about the answer in the second tier and is sure about the answer in the fourth tier of an item on the FTGOT, may be considered to have a misconception.
False Positive: A correct answer chosen in the first tier with wrong reasoning in the third tier
of the FTGOT when the responder is sure about his/her answers in both of the tiers is
considered as a false positive.
False Negative: A wrong answer chosen in the first tier with correct reasoning in the third
tier of the FTGOT when the responder is sure about his/her answers in both of the tiers is
considered as a false negative.
Table 1.3 summarizes the decisions for the results of the first and third tier
combinations on the FTGOT when a participant is sure on both the second and fourth tiers of
the test (i.e., no lack of knowledge).
Table 1.3 Decision Table Based on Combinations of the Results of the First and Third-Tiers
When the Participant is Sure on Both the Second and Fourth Tiers*
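Because Table 1.3 simply crosses the first- and third-tier results under the condition of full confidence, its decision logic can be sketched compactly. The following is a minimal illustration in Python (function and argument names are hypothetical, not the study's scoring code), following the definitions given above:

```python
def classify_ftgot_item(first_correct, sure_second, third_correct, sure_fourth):
    """Classify a single FTGOT item response.

    first_correct / third_correct: correctness of the answer and
    reasoning tiers; sure_second / sure_fourth: confidence tiers.
    """
    # Per Table 1.2: any "not sure" rating signals lack of knowledge.
    if not (sure_second and sure_fourth):
        return "lack of knowledge"
    if first_correct and third_correct:
        return "scientific knowledge"
    if first_correct:
        return "false positive"   # correct answer, wrong reasoning
    if third_correct:
        return "false negative"   # wrong answer, correct reasoning
    return "misconception"        # wrong answer and wrong reasoning

# Example: wrong answer and wrong reasoning, confident on both tiers.
print(classify_ftgot_item(False, True, False, True))  # -> misconception
```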
AND Function: A function used to calculate the proportion of subjects by assigning “1” to
the subjects who choose the previously defined conceptual dimension or misconception
choices in “all” of the related items, “0” otherwise.
OR Function: A function used to calculate the proportion of subjects by assigning “1” to the
subjects who choose the previously defined conceptual dimension or misconception choices
in “at least one” of the related items, “0” otherwise.
SUM Function: A function used to calculate scores by assigning "1" to each related item in which a subject chooses the previously defined conceptual dimension or misconception choice, and summing these cases for each subject over all of the related items.
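As a minimal sketch (Python; variable names are illustrative only, not taken from the study), the three functions can be computed from an indicator matrix in which entry (s, i) is 1 if subject s chose the misconception option on related item i:

```python
import numpy as np

def and_or_sum_scores(indicator):
    """AND, OR, and SUM scores across the related items.

    indicator: subjects x items array of 0/1 misconception choices.
    """
    indicator = np.asarray(indicator)
    and_scores = indicator.all(axis=1).astype(int)  # held in all items
    or_scores = indicator.any(axis=1).astype(int)   # held in at least one
    sum_scores = indicator.sum(axis=1)              # count of items held in
    return and_scores, or_scores, sum_scores

# Proportions over subjects follow from the means; for example,
# and_scores.mean() gives the proportion of subjects holding the
# misconception in all of the related items.
```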
Geometrical Optics is the branch of optics that deals with the phenomena of
reflection and refraction of light in optical instruments, including mirrors (single plane,
hinged plane and spherical) and lenses (diverging and converging). Figure 1.1 summarizes the scope of geometrical optics, as studied in the present investigation, in a concept map.
Figure 1.1 Concept Map of the Scope of Geometrical Optics that is Studied in this
Investigation
The number of images formed and observed in hinged plane mirrors (two mirrors
case) depends on the angle between the mirrors, the position of the object located,
and the position of the observer.
A screen is a convenience for observation of real image points in space (i.e. aerial
image) and there is a particular image position for an image of an object to be
observed on a screen.
The light ray is a representational tool to show the direction of light propagation. The special rays serve as an algorithm to locate the position of an image. Any combination of two of these rays is sufficient to locate the position of an image. However, special rays are sufficient but not necessary in order to form an image.
As science and technology progress, many countries have begun to put more
emphasis on science education. For ensuring meaningful learning, exploration of students‟
conceptions on certain topics has great importance. These investigations constitute the
foundation or give direction to other studies that are intended to design appropriate learning
contexts and materials or to determine appropriate teaching methods. There has been an
acceleration in science education on investigating students‟ ways of conceptualization and
reasoning about the physical phenomena that they encounter. Misconceptions have been one
of the most prominent and widely studied research areas in physics education since they are
thought to be an important factor that prevents students‟ learning and understanding the
concepts. There are several studies in the literature that concern students' misconceptions in science. Most of these studies are focused on concepts from mechanics, electricity, or heat and temperature in physics. However, the number of studies in geometrical optics is very limited, and studies that focus on pre-service (or in-service) teachers are even more limited. For this
reason, the main aim of this study is to explore pre-service physics teachers‟ misconceptions
in geometrical optics.
The concept of “light” in physics is one of the concepts in which everyone may have
some informal and/or formal understanding from at least daily life experiences or language
used in early childhood. The literature reveals that many of the beliefs about light are not
scientific, but contradict the scientifically acceptable knowledge. Revealing the
misconceptions of individuals and developing appropriate instructional strategies to address
them are major concerns of educational researchers. In order to determine these
misconceptions, paper-and-pencil instruments alone are not enough. Instead, identifying and categorizing conceptions in geometrical optics requires in-depth investigation. For a given individual it is necessary to clarify, inquire into, and analyze every detail in his/her words, expressions, products, and activities. There is also a need for a valid and reliable diagnostic tool.
For these reasons, in the present study both qualitative and quantitative research
methodologies are used to explore misconceptions in-depth and to portray the pre-service
physics teachers‟ misconceptions in geometrical optics throughout Turkey.
The diagnostic tools used in assessing misconceptions have advantages as well as
disadvantages. For instance, widely used interviews provide the researcher with an
opportunity to investigate deeply and flexibly, but require time and effort to reach large
samples and analyze the results. Multiple-choice tests overcome the aforementioned
problems in interviewing, but are not good for investigating misconceptions deeply and for
detecting false positives and false negatives effectively. Two-tier and three-tier diagnostic
tests overcome these problems in ordinary multiple-choice tests but they are inadequate for
differentiating lack of knowledge from misconceptions and overestimate the misconception
scores. With the four-tier test that is developed and administered in this study, pre-service physics teachers' misconceptions can be assessed more confidently, free from mistakes and lack of knowledge. Since each tool has its own strengths, the combination of these strengths with the
appropriate use of different tools for diagnostic purposes is needed. Hence, the present study
aims to derive benefit from three diagnostic tools, namely interview, open-ended test, and
multiple-tier test with four tiers.
In the literature, there are an abundant number of studies on students‟
misconceptions on mechanics and electricity concepts, but little on geometrical optics and
most of the studies have focused on primary or secondary school grades. The number of
studies focusing on teachers‟ misconceptions is relatively small. Also, the studies in
geometrical optics have been mainly on the nature and propagation of light and optical
imaging with plane mirrors and converging lenses. There is a scarcity of studies in other optical imaging contexts such as hinged plane mirrors, diverging lenses, and convex mirrors. This
study aims to close this gap by including these relatively untouched contexts, in addition to
the others.
The findings may cautiously be generalized from pre-service physics teachers in Turkey to teachers more generally and to teachers beyond Turkey. The present
study focused on pre-service teachers in Turkey because of the similarity in their physics
curriculum, controlled conditions of their participation and their accessibility to the
researcher.
In summary, the significance of this study lies in its contribution to the literature
with the development of the four-tier misconception test with the help of in-depth inquiry of
qualitative research and the generalizability of quantitative research. The test helps to diagnose misconceptions in geometrical optics and to portray the misconceptions of Turkish pre-service physics teachers in geometrical optics. The
findings of this study can provide curriculum developers in higher education and faculties at
teacher education departments with valuable information that can guide them in designing
teaching programs or other instructional practices. As a result, this study can contribute both
to the literature and to practice.
CHAPTER 2
LITERATURE REVIEW
In this chapter, the literature review about student conceptions in geometrical optics
is summarized within five parts. The first part presents information about student
conceptions and misconceptions research. The probable sources of student misconceptions
and of teacher misconceptions as a source of student misconceptions are discussed in the
following part. In the third part, methods for diagnostic evaluation in misconception studies
are given. Fourthly, previously documented student misconceptions in geometrical optics in the literature are presented, and finally the chapter ends with a summary.
Studies on students‟ conceptions and reasoning in science have gained impetus over
the last four decades. As students learn about the world around them formally through school
education or informally through their everyday experiences, they often tend to form their
own views. Because of this concern, several studies have been conducted to depict students‟
understanding. The different forms of student understandings have been called by a number
of different terms such as preconceptions (Ausubel, 1962, as cited in Driver & Easley, 1978;
Clement, 1982), anchoring conceptions (Clement, Brown, & Zietsman, 1989), alternative
conceptions (Wandersee, et al, 1994; Klammer, 1998), misconceptions (Clement et al.,1989;
Driver & Easley, 1978; Helm; 1980), naïve beliefs (Caramazza, McCloskey, & Green,
1980), naïve theories (McCloskey, 1983); alternative frameworks (Driver & Easley, 1978;
Driver, et al., 1985), conceptual frameworks (Driver & Ericson, 1983), commonsense
concepts (Halloun & Hestenes, 1985), children‟s science (Gilbert, Osborne, & Fensham,
1982; Gilbert & Zylbersztajn, 1985), intuitive science (Mayer, 2003), children‟s ideas
(Osborne, et al., 1993), conceptual difficulties (McDermott, 1993), spontaneous reasoning
(Viennot, 1979), phenomenological primitives (shortly p-prims) (diSessa, 1993), and some
other terms as Renström, Andersson, and Marton (1990) stated: constructs, mini-theories,
conceptions, ideas, notions, beliefs, schemata, scripts, and frames, as Wandersee et al. (1994) also noted. Ideographic studies
include the explication of individual cases to uncover common features. Such studies are
more likely to be naturalistic and qualitative; they study fewer students in depth, use students' self-report data, and employ thick descriptions (Wandersee et al., 1994). Some terms used in
ideographic studies are: children‟s science, alternative conceptions, alternative framework,
commonsense theories, intuitive beliefs, everyday conceptions. Palacios, Cazorla and Madrid
(1989) assigned the studies on students‟ acquisition of knowledge into two paradigms of
which the first is traditional, scientific, experimental, reductionist, prescriptive, quantitative, and nomothetic, while the second paradigm is non-traditional, artistic, naturalistic, holistic, descriptive, qualitative, and ideographic. The constructivist model of learning is mostly thought to be related to the ideographic paradigm, whereas Palacios et al. (1989) found it reasonable to look for both qualitative and quantitative studies that are brought together and merged into a single set of educational implications. Wandersee et al. (1994)
suggested use of both qualitative and quantitative methods and warned about the temptation
to place a particular researcher into either the nomothetic or the ideographic side on the basis
of terms used.
More recently, however, there has been a shift from misconceptions research to
resources research. In 1993 diSessa proposed an alternative account of intuitive physics
knowledge involving phenomenological primitives (p-prims). In this model, instead of considering students' conceptions as inherently inconsistent with expert knowledge, they are considered as elements that, when appropriately organized, contribute to expert understanding.
Misconception research was criticized because of its examination of restricted sets of
contexts. According to resources researchers, students‟ conceptions failing in one context
may be productive in another unidentified context. For this reason the contexts where and
how conceptions are used by students have become an important concern. Smith et al. (1993)
and Reiner et al. (2000) emphasized that even experts have novice knowledge in several
contexts, and novice knowledge refinement might be useful in constructing expert
knowledge. Considering these arguments, Smith et al. (1993) recommended research that
focuses on the evolution of expert understandings in specific conceptual domains built on
students‟ novice conceptions.
Despite these variations, all the terms stress differences between the ideas that students bring to instruction and the concepts sanctioned by current scientific theories. Whatever it is called, in almost all of these studies the main aim is the understanding of the wrong and flawed conceptions that impede learning, or the identification of productive components of these
flawed conceptions for other contexts. Therefore, the identification of these conceptions in a
valid way becomes a prominent first step. The most frequently used terms in science education research are misconception and alternative conception.
Studies have shown that the prevalence of some misconceptions remains almost constant with age (Guesne, 1985; Viennot, 1976 and
Carrascosa, 1987, as cited in Gil-Perez & Carrascosa, 1990); while others change
dramatically (Erickson, 1980). However, regardless of cognitive, emotional, or physical maturation, misconceptions are still common. Sadler (1998) stated that student misconceptions appear to rise with age and ability before they decrease. Similarly,
Shipstone (1985) found that some of the conceptual models in electricity rise with age,
whereas some others decline. Moreover, results of the studies revealed that even honor
students (Peters, 1982), graduate teaching assistants (McDermott et al., 2006), or professors
(Reif, 1995) have similar (or the same) misconceptions on some topics that other physics
students possess. Considering the gender issues, some studies proposed that males are better
than females in physics (Kost, Pollock, & Finkelstein, 2009), and males have fewer
misconceptions than females (Sencar & Eryilmaz, 2004); whereas, some other researchers
advocated that this difference might be a subtle result of study focus and weak design
(Wandersee et al, 1994). For instance; Sencar and Eryilmaz (2004) found that the difference
between male and female students‟ misconceptions in electric circuits depends on the
context of the questions. For practical questions of the misconception measurement test there
was a significant difference between males and females, whereas for theoretical questions
there was no significant difference in terms of gender. After controlling students‟ age and
interest-experience related to electricity, however, this observed gender difference was
mediated on the scores of practical items. Sangsupata (1993, as cited in Al-Rubayea, 1996)
examined misconceptions in optics among high school students in Thailand, and found that
gender had no effect on misconceptions and male and female students held the same
misconceptions. However, he found that the lower grade students (10th graders) had fewer
misconceptions than the students in upper grades (11th and 12th graders). Finally, studies all
over the world in misconceptions on different topics of physics including optics revealed that
most misconceptions are common among individuals regardless of their cultures (Driver et
al., 1985; Fetherstonhaugh, Happs, & Treagust, 1987).
Wandersee et al. (1994) identified eight knowledge claims which emerged from substantial research on misconceptions around the world. These claims were considered not as isolated assertions but, taken as an integrated whole, as useful for summarizing the research in this field.
Claim 1: Learners come to formal science instruction with a diverse set of “misconceptions”
concerning natural objects and events.
Claim 2: The “misconceptions” that learners bring to formal science instruction cut across
age, ability, gender, and cultural boundaries.
Wandersee et al. (1994) asserted that the origins of misconceptions, by their nature,
remain speculative and the evidence for their origins are difficult to document. Different
researchers proposed different probable sources of student misconceptions. According to
Kikas (2004) the reasons for the rise of misconceptions are different depending on the
peculiarities of concepts and the way they are taught. Smith et al. (1993) stated that
misconceptions arise from students‟ prior learning, either in the classroom or from their
interaction with the physical and social world. Klammer (1998) argued that the question "where do the students' alternative conceptions originate?" is not simple to answer, since it deals with both cognitive features and pedagogical practice. However, he proposed
experience, language and the curriculum to be the three main possible sources. According to
Driver et al., (1985) children form their faulty ideas and interpretations as a result of
everyday experiences in all aspects of their lives, through practical activities, talking with
other people around them, and through media.
Kikas (2004) described four possible sources of misconceptions that hinder children
and adults, including teachers, from gaining scientific understanding and lead to the
development of misconceptions. Overgeneralizations on the basis of analogy were
mentioned as the first possible source of misconceptions. Although analogies are used in the
classrooms to relate new information to students‟ prior knowledge, people may take the
analogy too far. A second source of misconceptions is the nature of concepts which belong
to ontologically different categories. Some concepts such as force and light were considered
to be difficult to transfer to different ontological categories. Thirdly, the presentation of
knowledge in school textbooks was considered to promote misconceptions both in students
and teachers. The language of the textbook and the diagrams or models that it contains may
lead to an incorrect interpretation by students and teachers. And as a final source of
misconceptions in schools, teachers‟ faulty content knowledge was mentioned. Kikas (2004)
conducted research on a total of 198 physics, chemistry, biology, primary, humanities, and
trainee teachers to investigate misconceptions on three different phenomena: motion of
objects, seasons, and changes of matter. It was concluded that misconceptions were especially prevalent among trainee and primary teachers. Those teacher misconceptions act
as obstacles, especially in the practice of student-centered, active teaching methods. The
study also noted that there had been no studies comparing the prevalence of misconceptions on different phenomena among teachers, and that such studies are needed in different countries.
Helm (1980) developed a multiple-choice test to identify misconceptions in physics
and administered it in South Africa on individuals with a wide range of age and experience
including teachers (N=65). The reported results of the study showed that out of 20 questions
in the test only on one question did 80 % of the teachers select the right answer. In seven of
the questions there was a strong similarity in performance between the teachers and the
students (N=953); in five of the questions 51% or less of the teachers selected the right
answers while an equivalent number selected the fallacious options. Relying on the results of
the study, Helm (1980) suggested that the triumvirate of teachers, textbooks, and examination papers might be responsible for the origins of common student misconceptions: "some misconceptions in physics can be linked to or reinforced by errors in
textbooks, ill-conceived examination questions or carelessness in exposition by teachers”
(p.97). Especially teachers were considered to be a crucial source of student misconceptions
to be dealt with.
The possible sources of misconceptions in optics were grouped into four categories (students' personal experiences, textbooks, language, and teachers) by Kaltakci and colleagues.
Similarly, Ivowi (1984) conducted a study on secondary school students and teachers
in Nigeria to document their misconceptions in physics, and she concluded that the main
sources of the students‟ misconceptions were the teachers and the textbooks. Heller and
Finley (1992, as cited in Al-Rubayea, 1996) studied teachers‟ misconceptions about
electrical current, and they recommended that not only students‟ but also teachers‟
misconceptions should be considered in designing science curriculum.
Findings of studies in physics concerning teachers‟ subject matter knowledge
showed that “…teachers‟ misunderstandings mirror what we know about students. This
finding holds regardless of the methods used to assess teacher knowledge” (Abell, 2007, p.
1117). Ameh and Gunstone (1988, as cited in Cochran & Jones, 1998) found that the
teachers‟ misconceptions were similar in nature and pattern to those found for students, but
were couched in more scientific language. They found similar misconceptions among both
pre-service and experienced secondary teachers and their students in nine science concepts,
one of which was the concept of light in physics.
Teachers' subject matter knowledge has an effect not only on students' subject matter knowledge, but also on their motivation in physics. Korur (2008) conducted a multi-method
PhD study with qualitative and quantitative parts in order to explore the interaction between
effective physics teacher characteristics from teachers‟ and students‟ shared perceptions and
students‟ motivation. The findings of the study revealed that there were 38 effective physics
teacher characteristics affecting students‟ motivation in physics under eight categories as
perceived by teachers and students. Among these characteristics the teachers‟ subject matter
knowledge and their personal characteristics were the two categories that mostly affected the
students‟ motivation.
To sum up, misconceptions are strongly held, stable cognitive structures which pose
obstacles to learning, are resistant to change, are difficult to remedy, and need substantial
effort to overcome. There are several possible sources of student misconceptions, but they
are difficult to document and eliminate. However, the first step in overcoming students‟
misconceptions must be to identify them correctly (Klammer, 1998).
Although researchers have used a variety of methods for assessing misconceptions, they have not reached a consensus regarding a best method for this purpose (Al-Rubayea, 1996). The choice depends on the context of the topic to be investigated, the characteristics of the intended subjects, and the ability
and resources of the researcher. However, it is well known that a triangulation of all methods
is better than a single method. In this section interviews, open-ended tests, multiple-choice
tests, and multiple-tier tests are discussed in detail as the most frequently used methods for
diagnosing students‟ misconceptions in physics.
Interviews have been widely used to probe students' conceptions by several researchers (Bendall, Goldberg & Galili, 1993; Feher & Rice, 1987).
The IAI technique (Osborne & Gilbert, 1979) is basically a conversation between an
expert and a student based on questions about the concept. Questions are asked in relation to
a series of line pictures, drawings or diagrams with the use of interview cards. Students‟
meanings for these instances are investigated in depth. Figure 2.1 illustrates two example IAI cards, for the concepts of light (Watts, 1985) and force (Osborne & Gilbert, 1980b).
Figure 2.1 Examples of the IAI Cards for the Concepts Light (Watts, 1985) and Force
(Osborne & Gilbert, 1980b)
During the interview process each card, containing pictures of familiar objects or phenomena, is shown, and the interviewee is asked "In your own understanding of the focus concept, is this an example or not?" (Carr, 1996). In physics education the IAI method was used to examine
understanding of the concepts of force (Osborne & Gilbert, 1980b), light (Watts, 1985), and
energy (Kruger, 1990). Osborne and Gilbert (1980b) reported strengths and limitations of the
IAI method which is summarized in Table 2.1. The IAE technique is similar to the IAI, but
in this technique the stimulus shifts from pictures or drawings to activities or events carried
out with the student (Carr, 1996; Kruger, 1990).
Table 2.1 Strengths and Limitations of the IAI Method Mentioned by Osborne and Gilbert (1980b, p. 378)

Strengths:
1. It is applicable over a wide age range.
2. It is enjoyable for interviewer and interviewee.
3. It has advantages over written answers in terms of flexibility and depth of the investigation.
4. Classifying instances is more pertinent and penetrating than asking for a definition.
5. It is concerned with the students' view rather than merely examining if the student has the correct scientific view.

Limitations:
1. There is the problem of choosing a limited but adequate set of instances.
2. The order of instances may influence student responses.
3. Interviews, and the transcribing and analysis of transcripts, are time consuming.
4. There are difficulties associated with interviews and the analysis of the interview data, e.g. they are difficult to report succinctly.
The individual demonstration interview technique was developed at the University of Washington and is similar to POE (Goldberg & McDermott, 1983, 1986, 1987; Heron & McDermott, 1998). In this technique a simple physical system serves as the set of
tasks to be performed by the student. The task provides a context for further discussions.
Instead of helping the student overcome a difficulty, the main aim of the strategy is to find
out how the student is thinking.
Interviews may be conducted with individuals or with groups (Eshach, 2003; Galili, Bendall, & Goldberg, 1993; La Rosa, Mayer, Patrizi, & Vicentini, 1984; Olivieri, Torosantucci & Vicentini, 1988; Van Zee, Hammer, Bell, Roy, & Peter, 2005). Duit et al.
(1996) stated that the group interviews have the strength of studying the development of
ideas in the interaction process between students. The purpose of interviewing was stated by Fraenkel and Wallen (2000) as finding out what is on people's minds, what they think or how they feel about something. As stated by Hestenes et al. (1992), when skillfully done,
interviewing is one of the most effective means of dealing with misconceptions. Although
interview strategies have the advantages such as gaining in-depth information and flexibility,
a large amount of time is required to interview a large number of people in order to obtain
greater generalizability. Also, training in interviewing is required for the researcher; interviewer bias may taint the findings; and the analysis of the data is difficult and cumbersome (Adadan & Savasci, 2012; Rollnick & Mahooana, 1999; Sadler, 1998;
Tongchai et al., 2009).
In order to investigate students' understanding, open-ended free-response tests have also been used. This method gives test takers more time to think and write about their own ideas, but it is difficult to evaluate the results (Al-Rubayea, 1996). Also, because of language problems,
identification of students‟ misconceptions becomes difficult (Bouvens as cited in Al-
Rubayea, 1996) since students are generally less eager to write their answers in full
sentences. Andersson and Karrqvist (1983), Colin and Viennot (2001), Langley, Ronen and
Eylon (1997), Palacious, Cazorla and Cervantes (1989), Ronen and Eylon (1993)
investigated conceptions in geometrical optics and used open-ended questions or tests as a
diagnostic instrument.
Multiple-choice tests have also frequently been used to ascertain students' conceptions. These tests have been used either following in-depth interviews or alone as a broad investigative measure.
The development of multiple-choice tests on students' misconceptions makes a valuable contribution to the body of work in misconceptions research and assists in the process of helping science teachers more readily use the findings of research in their classrooms (Treagust, 1986). Results from diagnostic multiple-choice tests have been reported
frequently in misconception literature. The validity evidence for this format is strong
(Downing, 2006). From the point of view of teachers‟ usage, valid and reliable, easy-to-
score, easy-to-administer, paper-and-pencil instruments enable teachers to effectively assess
students‟ understanding of physics. A physics teacher can get information about students‟
knowledge and misconceptions by use of the diagnostic instruments. Once the student
misconceptions are identified, teachers can work to remedy the faulty conceptions with
appropriate instructional approaches. Advantages of using multiple-choice tests over other
methods have been discussed by several authors (Çataloğlu, 2002; Caleon & Subramaniam,
2010a; Iona, 1986; Tamir, 1990). To sum up, some of the advantages of multiple-choice tests
are:
1. They permit coverage of a wide range of topics in a relatively short time.
2. They are versatile, and can be used to measure different levels of learning and
cognitive skills.
3. They are objective in terms of scoring and therefore more reliable.
4. They are easily and quickly scored.
5. They are good for students who know their subject matter but are poor writers.
6. They are suitable for item analysis by which various attributes can be
determined.
7. They are the next best thing to essay questions when students are unable to write coherent and logical expressions.
8. They provide valuable diagnostic information and are viable alternatives to
interviews and other qualitative tools in gauging students‟ understanding and in
determining the prevalence and distribution of misconceptions across a
population.
Four Tier Wave Diagnostic Instrument (4WADI) (Caleon & Subramaniam, 2010b)
Among these tests the Force Concept Inventory (FCI) developed by Hestenes et al.
(1992) is the most popular in physics education for examining students‟ conceptions in
mechanics.
The FCI is the improved version of the Mechanics Diagnostic Test (MDT) which
was developed by Halloun and Hestenes in 1985 (Hestenes et al., 1992). MDT was first
prepared in an open-ended form and administered to more than 1000 college introductory physics students over three years. The most common misconceptions were
determined and used in the final multiple-choice version of the test. It consisted of 36
multiple-choice items about kinematical and dynamical concepts. The scores on the test were
thought to be a measure of qualitative understanding of mechanics. The instrument was
recommended for use as a placement exam, as a means of evaluating instruction, and as a
diagnostic test to identify and classify specific misconceptions. The face and content validity
of the test were carefully established. In order to establish the reliability, the researchers
conducted interviews and compared the students‟ answers in interviews and on the test. In
addition, different groups were tested at different times, and the Kuder-Richardson reliability
coefficient for use as a pretest was found to be .86 and for use as a post-test .89. Thus, the
MDT test scores are reliable. The test was administered at Arizona State University and a
nearby high school, and the average scores presented in Table 2.3 were obtained for the
pretest and post-test administrations. The average normalized gain (<g>) which was used as
a rough measure of the average effectiveness of a course (Hake, 1998) was found to be low
for all three groups.
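For reference, the average normalized gain of Hake (1998) is the ratio of the actual average gain to the maximum possible average gain, computed from the class-average pretest and post-test percentages:

\[ \langle g \rangle = \frac{\langle \%_{post} \rangle - \langle \%_{pre} \rangle}{100 - \langle \%_{pre} \rangle} \]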
Table 2.3 Average of MDT Scores (in %) for Arizona State University and Nearby High
School Students (revised from Halloun & Hestenes, 1985)
In 1992 Hestenes and his colleagues developed the FCI in order to supply a more
systematic and complete profile of misconceptions and administered the test to 1500 high
school students and 500 university students accompanied by the Mechanics Baseline Test (MBT). The scores are given
in Table 2.4. The items on the FCI were designed to be meaningful to students without
formal training in mechanics and to elicit their preconceptions about the topic, in contrast to
the MBT which emphasized the concepts that cannot be grasped without formal knowledge
about mechanics. Therefore the best use of the MBT was suggested for post-instruction
evaluation, whereas the FCI can be used for both pre- and post-instruction evaluations (Hestenes & Wells, 1992).
Table 2.4 Average of FCI Scores (in %) for University and High School Students in the USA
(Revised from Hestenes et al., 1992)
The FCI is composed of 29 multiple-choice items. Each item offers five choices, and each distracter measures a specific misconception. The items do not require quantitative manipulation, but a forced choice between Newtonian and non-Newtonian concepts (Cataloglu, 2002). Taxonomies of Newtonian concepts and misconceptions were provided in two separate tables. In these tables, Newtonian concepts were divided into six
categories (kinematics, first law, second law, third law, superposition principle, kinds of
forces), and misconceptions were divided into six categories (kinematics, impetus, active
force, action/reaction pairs, concatenation of influences, other influences on motion) with a
total of 30 specific misconceptions. Item distracter choices measuring each of these
categories in both of the tables were provided. The researchers provided evidence to
establish validity and reliability of the test scores. As a consequence of the analysis, the
authors claimed that an FCI score of 60 % could be interpreted as the entry threshold to
Newtonian physics. Students below this threshold were thought to have the following
characteristics: undifferentiated concepts of velocity and acceleration, lack of a vectorial
concept of velocity; lack of a universal force concept; fragmented and incoherent concepts
about force and motion. An FCI score of 85 %, on the other hand, was interpreted as the
Newtonian Mastery threshold, and anyone who gets above this score is considered as
confirmed Newtonian thinker (Hestenes & Halloun, 1995). Hestenes et al. (1992) proposed
three different uses of the FCI:
1. For identifying and classifying specific misconceptions in mechanics,
2. For evaluating the effectiveness of instruction,
3. For placement purposes for colleges and universities in conjunction with the MBT.
Huffman and Heller (1995) inquired “what does the FCI actually measure?”. The
authors tried to find evidence with factor analysis from the post-test scores of a group of 145
high school and 750 university students that the FCI actually did not measure the six
conceptual dimensions that were initially mentioned as comprising a force concept. For high
school students, a factor analysis produced a total of ten factors, but only two of them
accounted for enough variance to be considered significant. For the university students, a
factor analysis produced nine factors, but only one factor accounted for enough variance to
be considered significant. Based on these results, Huffman and Heller (1995) concluded that
the items on the FCI are only loosely related to each other and do not necessarily measure a
single force concept or the six conceptual dimensions of the force concept as originally
proposed by its authors. According to the authors, one explanation for the loose relationship
among the items on the FCI is that students actually have a loosely organized understanding
of forces, and the students' misconceptions about the force concept are not coherent, but are loosely organized, ill-defined pieces of knowledge that are dependent upon the context of the question. The knowledge-in-pieces and p-prims perspectives (diSessa, 1993) were used to
explain the lack of relationship between the items on the FCI. Another point discussed by
Huffman and Heller was that the piece of knowledge used to answer a question could be highly dependent upon the context of the question. So, when a student answers a question, it
is difficult to determine the extent to which the question is measuring students‟
understanding of the concept and the extent to which the question is measuring students‟
familiarity with the context. Therefore, the authors claimed that the inventory actually
measured bits and pieces of students‟ knowledge that do not necessarily form a coherent
force concept. Even though, from instructors‟ perspective, the FCI appears to measure six
conceptual dimensions, the factor analysis showed that from the students‟ perspective the
items are measuring something other than these dimensions.
As a response to Huffman and Heller's (1995) critique of what the FCI actually measures, Hestenes and Halloun (1995) reinterpreted the FCI scores. The authors stated that
in order to ensure content validity the proportions of false positives (Newtonian response
with non-Newtonian reasoning) and false negatives (non-Newtonian response with
Newtonian reasoning) should be estimated. The proportions of false positives and false
negatives were stated to be less than 10 %. In fact, the lower the probability of false positives
and false negatives, the greater the validity of a multiple-choice test. The minimization of
false positives was considered to be more difficult because of the chance factor by random
choices of false positives. They succeeded in reducing noise from false positives in two ways:
first, by assessing each category of force concepts with several questions involving different
contexts, and second with powerful distractors in the FCI culled from student interviews.
The data of Huffman and Heller, which did not cluster on the six dimensions, were attributed to the noise of false positives. The authors recommended that Huffman and Heller perform the factor analysis separately on the groups of students with FCI scores between 60 % and 80 %, and above 80 %. They also criticized Huffman and Heller (1995) for ignoring
the misconceptions table (Table II) and for interpreting the analysis of results according to
only the Newtonian Concepts table (Table I). According to Hestenes and Halloun, Table I
was never intended to describe students‟ concepts, but it described the Newtonian standard
against which students‟ concepts could be compared. If the data were collected from a
Newtonian population, such as physics professors, then they would find a single factor in
which every question correlated almost perfectly with every other. Lastly, Hestenes and
Halloun suggested that for best results the FCI should be administered and interpreted as a
whole since separate pieces of it are less reliable and informative.
Afterwards Heller and Huffman (1995) published a new article discussing the details
of factor analysis and in this respect the validity of the FCI. They stated that their results
indicated that the FCI items were only weakly correlated, such that for the university sample
only 17 % of the correlations were above 0.19. Similarly, for the high school sample only 12
% of the correlations were above 0.19. Since there were few strong correlations between the
items, the factor analysis yielded one or two weak factors that accounted for only 16 %
(university sample) to 20 % (high school sample) of the total variance in students‟ scores.
Regarding the previous suggestion of Hestenes and Halloun (1995), they conducted a factor
analysis for three separate groups of university students: one which scored below 60 %, the
second which scored between 60 % and 85 %, and the third which scored above 85 %.
However, they found that the correlations between the items for the three sub-samples were
very similar to those of the whole sample, only somewhat lower. As a result, they concluded
that students who scored between 60 % and 85 %, or above 85 % on the FCI did not have a
more coherent Newtonian force concept than the students who scored below 60 %. According
to Heller and Huffman, the low correlation between the FCI items might stem from two
possible alternative reasons:
1. Students do not have coherent concepts, and their knowledge is in fragmented bits
and pieces.
2. Students do have coherent concepts, but the test items do not measure this
knowledge.
To sum up, they stated that the FCI, which has a reasonable face and content validity, is the
best test available. However, a more valid test from the point of view of students‟ responses
is needed.
Halloun and Hestenes (no date) rejected using correlation coefficients as an
indication of test validity for several reasons. Reducing the items to dichotomous variables, scoring 1 for a right answer and 0 for a wrong one, was considered problematic in factor analysis. For
these reasons, they refuted Heller and Huffman‟s (1995) suggestion for the need for a better
test than the FCI, but emphasized the need for a better statistical analysis of the FCI results.
Steinberg and Sabella (1997) speculated that the presence of distracters in multiple-
choice format of the FCI items could bias students toward the incorrect answer and inaccurately measure students' conceptual understanding. For this reason, an open-ended
form of FCI concepts should be administered to understand student misconceptions better.
Rebello and Zollman (2004) studied the role of the distracters on the FCI by
investigating four aspects of the FCI. These aspects were 1) whether students‟ performance
on multiple-choice and open-ended forms of the FCI differ significantly; 2) whether student
responses to open-ended questions could be categorized into the same choices provided on
the current multiple-choice format of the FCI; 3) whether the presence of distracters affects
students‟ selection of incorrect responses in the current multiple-choice format; 4) whether
the student responses would change if alternative distracters arising from the open-ended responses were added to, or replaced, the current FCI distracters. In order to address these research questions, the researchers administered two forms of the FCI, one open-ended and the other multiple-choice, to a group of 238 introductory physics students. From
the results of the study, it was concluded that there is no notable difference between student
performances in terms of the percentage of correct responses on the two formats of the FCI,
but the current multiple-choice format of the FCI is of limited value in determining the
misconceptions for students who do not respond correctly.
Despite the controversial opinions about the FCI itself and the interpretation of the
FCI scores, the FCI seems to be still the most common choice of teachers and researchers for assessing students' conceptual understanding in mechanics.
[Figure: stages in the development of a two-tier diagnostic test. Stage 2, obtaining information about students' alternative conceptions, involves soliciting students' explanations using free-response exercises, soliciting students' explanations using multiple-choice content items with free-response justifications, and conducting semi-structured interviews; these feed into the development of the first draft instrument and then the final instrument.]
Since Treagust (1988) published his seminal work on the development of two-tier tests, a large number of researchers have developed and administered two-tier diagnostic tests
in biology (Cheong, Treagust, Kyeleve & Oh, 2010; Griffard & Wandersee, 2001; Odom &
Barrow, 1995; Tsui & Treagust, 2010), in chemistry (Adadan & Savasci, 2012; Peterson,
Treagust & Garnett, 1986; Tan, Goh, Chia & Treagust, 2002), and in physics (Chang et
al., 2007; Chen et al., 2002; Franklin, 1992; Rollnick & Mahooana, 1999; Tsai & Chou,
2002). Table 2.5 summarizes the two-tier tests published in science with their references.
Chen et al. (2002), for example, developed their two-tier test in several steps. First, the scope of the target conceptions in terms of propositional statements and concept maps was defined. Secondly, an open-ended questionnaire was prepared and administered to high school students. Thirdly, students' responses were analyzed and commonly held misconceptions were identified. Further information was gathered from interviews. Finally, a two-
tier multiple-choice test was constructed based on the answers given to an open-ended
questionnaire and interviews. The second tier also included a blank into which the students
could write any other reason beyond the givens. The constructed test was administered to
317 high school students in Taiwan. The internal consistency reliability was found as .74.
The results of the study showed that a high percentage of students do not comprehend the
nature and mechanism of image formation through mirror reflection.
Tsai and Chou (2002) developed three networked two-tier test items in science (on the concepts of weight, sound, and heat and light) and administered them to 555 fourteen-year-old and 599 sixteen-year-old students in Taiwan. The distinction of this study was that the test was administered
on the web. Only one item per screen was presented, and for every item there were two
steps. After the students made a choice in the first tier, then the system showed the second
tier with the first tier remaining on the screen. Students‟ responses on each of the three items
were categorized as incorrect (incorrect in both tiers), partially correct (correct in one tier
and incorrect in the other) or correct (correct in both tiers). The results of the single two-tier
item related to light and heat propagation showed that 16 % of the 8th graders and 19 % of
the 10th graders chose the correct answer, whereas an almost equal proportion of students
(14 % of 8th graders and 12 % of 10th graders) held an almost opposite misconception that
heat can be propagated in a vacuum condition but that light could not. The data analysis also
showed that the responses of the 10th graders were not significantly different from those of
the 8th graders for this light and heat related item (χ² = 14.93, d.f. = 19, p > .05).
Fetherstonhaugh and Treagust (1992) conducted research to obtain data on student
understanding of light and its properties, develop materials, and evaluate the effectiveness of
a teaching strategy using these materials to elicit conceptual change in the topic. They
developed a 16-item diagnostic test in four areas: How does light travel?; How do we see?;
How is light reflected?; How do lenses work?. Twelve of the items in the test were in two-
tier form with multiple-choice questions accompanied by an open response for the student to
provide a reason for choosing the distractor. The remaining four items had only an open
response. The test was administered in a city high school as a pretest (n=27) and the Kuder
Richardson reliability of the test was found to be .67. Also the test was administered as a
pretest (n=20), posttest (n=16), and delayed posttest (n=10) in a country high school. The
diagnostic test identified nine misconceptions and indicated that a large proportion of
students in both groups had misconceptions prior to formal instruction. Some of these
misconceptions were still prevalent in the post-test following a conceptual change teaching
strategy and in the delayed post-test three years after the administration of the post-test.
Misconceptions related to how lenses work especially seemed to be strongly held and
resistant to change. These nine misconceptions identified in this study together with their
percentages are summarized in Table 2.6. Another important result deduced from the study
was that student ideas were applied differently depending upon the context.
Table 2.6 Percentages of Students' Misconceptions about Light Identified with the Two-Tier Diagnostic Test in Fetherstonhaugh and Treagust's (1992) Study

Misconception | City School Pretest (n=27) | Country School Pretest (n=20) | Posttest (n=16) | Delayed Posttest (n=10)
1. Light travels a different distance depending upon whether it is day or night. | 41 | 35 | 13 | 0
2. Light does not travel during the day. | 19 | 20 | 6 | 0
3. Light does not travel at all during the night. | 4 | 15 | 6 | 0
4. We see by looking, not by light being reflected to our eyes. | 77 | 75 | 25 | 0
5. People can just see in the dark. | 22 | 10 | 0 | 10
6. Cats can see in the dark. | 70 | 60 | 12 | 20
7. Light stays on mirrors. | 22 | 25 | 6 | 0
8. Lenses are not necessary to form images. | 56 | 83 | 31 | 40
9. A whole lens is necessary to form an image. | 100 | 94 | 25 | 9
A study that criticized two-tier tests was done by Griffard and Wandersee (2001) in the discipline of biology. In order to examine the effectiveness of two-tier tests they used an instrument on photosynthesis developed by Haslam and Treagust. The study was conducted on six college students with a paper-and-pencil instrument designed to detect alternative conceptions, and the participants responded to the two-tier instrument in a think-
aloud task. The findings of the study raised concerns about the validity of using two tier
tests for diagnosing alternative conceptions, since they claimed that the two-tier tests may
diagnose alternative conceptions invalidly. It is not certain whether the students‟ mistakes
were due to misconceptions or unnecessary wording of the test. Another concern about two-
tier tests that was expressed by Tamir (1989) was that the forced-choice items in two-tier
tests provide clues to correct answers that participants would not have had in an open-ended
survey or interview. For instance a student can choose an answer in the second tier on the
basis of whether it logically followed from their responses to the first tier (Griffard &
Wandersee, 2001; Chang et al., 2007), or a second-tier choice may seem partially correct to a responder and thereby attract him or her (Chang et al., 2007). Caleon and Subramaniam (2010a) and Hasan et al. (1999) called attention to significant limitations of two-tier tests: those tests cannot differentiate
mistakes due to lack of knowledge from mistakes due to existence of alternative conceptions;
and they cannot differentiate correct responses due to scientific knowledge and those due to
guessing. Thus, two-tier tests might overestimate or underestimate students‟ scientific
conceptions (Chang et al., 2001) or overestimate the proportions of the misconceptions since
the gap in knowledge could not be determined by two tier tests (Aydın, 2007; Caleon &
Subramaniam, 2010a, 2010b; Kutluay, 2005; PeĢman, 2005; PeĢman & Eryılmaz, 2010;
Türker, 2005). Chang et al. (2007) also mentioned that, since the choices in the second tier are constructed from the results of interviews, open-ended questionnaires, and the literature review, students are likely to construct their own conceptions out of these and may tend to choose any item of the second tier arbitrarily. In order to eliminate this problem a blank
alternative was included with the multiple-choice items so that responders could write an
answer that is not provided (Aydın, 2007; Eryılmaz, 2010; PeĢman & Eryılmaz, 2010;
Türker, 2005).
To sum up, two-tier tests have advantages over ordinary multiple-choice tests. The
most important of them is that those tests provide students‟ reasoning or interpretation
behind their selected response. However, these tests have some limitations in discriminating
lack of knowledge from misconceptions, mistakes, or scientific knowledge. For this reason
three-tier tests become crucial in order to determine whether the answers given to the first
two tiers are due to a misconception or a mistake due to lack of knowledge.
The limitations mentioned for the two-tier tests were intended to be compensated by
incorporating a third tier to each item of the test asking for the confidence in the answers
given in the first two tiers (Aydın, 2007; Caleon & Subramaniam, 2010a; Eryılmaz &
Sürmeli, 2002; Eryılmaz, 2010; Kutluay, 2005; PeĢman, 2005; PeĢman & Eryılmaz, 2010;
Türker, 2005).
Confidence rating has been one of the main means of investigation in calibration research in the field of psychology since the 1970s, in which individuals are asked to indicate their belief about the accuracy of their answers to the test items (Echternacht, 1972; Stankov
& Crawford, 1997). Generally, confidence ratings are valuable for minimizing the guessing
effects and improving the reliability of the test by its special scoring system. In the methods
of scoring in confidence testing a right answer given with high confidence was given more
credit than a wrong answer given with low confidence (Echternacht, 1972). For example,
Dressel and Schmid (1953 as cited in Echternacht, 1972) used the scoring system below in
Table 2.7 in a degree of certainty test in which examinees indicated on a four-point scale
their certainty in a single answer selected.
Table 2.7 Scoring System Used by Dressel and Schmid (1953) in Confidence Testing
Degree of certainty marked | Score if the answer was right | Score if the answer was wrong
Positive | 4 | -4
Fairly sure | 3 | -3
Rational guess | 2 | -2
No defensible choice | 1 | -1
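A minimal sketch of this scoring rule (in Python; names are illustrative, not taken from Dressel and Schmid): the certainty points are added for a right answer and subtracted for a wrong one.

```python
CERTAINTY_POINTS = {
    "positive": 4,
    "fairly sure": 3,
    "rational guess": 2,
    "no defensible choice": 1,
}

def certainty_score(answer_is_right, certainty):
    """Score one item: +points if right, -points if wrong."""
    points = CERTAINTY_POINTS[certainty]
    return points if answer_is_right else -points

# Example: a wrong answer marked "fairly sure" scores -3.
print(certainty_score(False, "fairly sure"))
```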
confidence bias than the multiple-choice format tests. Even though the confidence paradigm
has been widely investigated in psychology, science education research involving the use of
confidence rating in assessment is scant (Caleon & Subramaniam, 2010b) and relatively
new. Al-Rubayea (1996), Caleon and Subramaniam (2010a), Clement et al. (1989), Cataloglu (2002), Franklin (1992), Hasan et al. (1999), and Woehl (2010) made good attempts by including confidence tiers in their studies. However, these studies still could not
improve typical multiple choice tests‟ limitations in discriminating lack of knowledge from
misconceptions, mistakes, or scientific knowledge.
Clement et al. (1989) used confidence ratings in the determination of anchoring conceptions of students. Anchoring conceptions are defined as intuitive knowledge structures that are in rough agreement with accepted theory. In their case, a problem situation
was considered as an anchoring example for a student if he or she makes a correct response
to the problem and indicates substantial confidence in the answer. The authors developed a
14 item diagnostic multiple choice test to identify anchoring examples in mechanics. For
each question in the test, students were asked to indicate their confidence on a scale ranging
from 0 (just a blind guess) to 3 (I‟m sure I‟m right). The confidence ratings in this study
were used to determine confident anchors for confidence level ratings of 2 or higher.
Gil-Perez and Carrascosa (1990) mentioned that in mechanics pupils have the most
coherent and rooted alternative framework. In a study asking students to express their
confidence in the answers, the results showed that most of the wrong answers on mechanics
were given with high confidence.
Hasan, Bagayoko and Kelley (1999) described a simple and novel method for
identifying misconceptions. They introduced the Certainty of Response Index (CRI) which is
used commonly in social sciences, where a respondent is requested to provide the degree of his or her certainty. Irrespective of whether the answer was correct or wrong, a low degree of
confidence indicates guessing which implies a lack of knowledge. On the other hand, if the
respondent has a high degree of confidence in his choice and arrived at the correct answer, it
would indicate that the high degree of certainty was justified. However, if the answer was
wrong, the high certainty would indicate a misplaced confidence in his knowledge of the
subject matter which is an indicator of the existence of a misconception. Hence, CRI enables
someone to differentiate between lack of knowledge and a misconception. With this aim
Hasan et al. (1999) modified the Mechanics Diagnostic Test (Halloun & Hestenes, 1985) by adding a CRI scale from 0 to 5 (0: totally guessed answer, 1: almost a guess, 2: not sure, 3: sure, 4: almost certain, 5: certain) to the ordinary multiple-choice test. The test was
administered to a total of 106 college freshmen students. In the analysis process, they used
the decision matrix given in Table 2.8. According to the analysis of students' responses, in 50 % of the items students had high average CRI values for wrong answers, which might be an
indicator of student misconceptions in these questions. In the remaining 50 % of the
questions, on the other hand, students had low average CRI values for wrong answers which
indicated a lack of knowledge. Differentiation of misconceptions from lack of knowledge
was seen to be important since the presence of misconceptions or lack of knowledge requires
different instructional practices.
Table 2.8 Decision Matrix for an Individual Student and for a Given Question (Hasan et al.,
1999)
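Since the body of Table 2.8 is not reproduced here, the following Python sketch reconstructs the decision logic only from the description above; the cutoff of 2.5 separating low from high CRI on the 0-5 scale is an assumption, not a value given in the text.

def classify_response(correct: bool, cri: float, cutoff: float = 2.5) -> str:
    """Map one (answer, CRI) pair to a diagnostic category (Hasan et al., 1999)."""
    if cri < cutoff:
        # Low confidence implies guessing, i.e. lack of knowledge,
        # whether or not the guess happened to be right.
        return "lack of knowledge (lucky guess)" if correct else "lack of knowledge"
    # High confidence: justified knowledge if correct, a misconception if wrong.
    return "knowledge of correct concepts" if correct else "misconception"

assert classify_response(correct=False, cri=4.0) == "misconception"
assert classify_response(correct=True, cri=1.0) == "lack of knowledge (lucky guess)"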
In his study, Çataloğlu (2002) used the degree of correlation between QMVI scores and confidence levels as validity evidence, with the assumption that students who were able to understand a question could then judge their ability to answer that particular question correctly or incorrectly. Hence, when a positive correlation between students' test scores and their confidence levels exists, it can be concluded that the students believed they understood what they were reading. Çataloğlu performed this correlational analysis for the different subgroups that took the test and found positive correlations between their QMVI scores and confidence levels, with Pearson coefficients ranging from .393 to .501 for the individual groups and .490 overall.
Woehl (2010) used the term "certitude" for the subjective estimate of the likelihood of responding correctly and measured it by asking participants to rate their confidence in the accuracy of an answer they had given to a question. A discrepancy between a student's confidence and the actual outcome was used as an indicator of poor metacognition, whereas agreement between them was interpreted as high metacognition. Table 2.9 shows how certitude scores were determined in this study. A higher total certitude score was used as an indication of higher certitude accuracy and, thus, greater metacognitive ability.
Caleon and Subramaniam (2010a, 2010b) developed three- and four-tier multiple-choice tests as enhanced versions of two-tier multiple-choice tests on the properties and propagation of mechanical waves. The 14-item three-tier test comprised a content tier, which measures content knowledge; a reason tier, which measures explanatory knowledge; and a confidence tier, which measures the strength of the respondents' conceptual understanding. The test was administered to 243 grade 10 students after formal instruction on the topic. Two scores were calculated for each question: (1) a content score ("1" for each correct content choice, "0" otherwise) and (2) a both-tiers score ("1" when the responses in both the content and reason tiers were correct, "0" otherwise). In addition to these two conceptual scores, several variables were calculated from the confidence rating ("1" for Just Guessing to "6" for Absolutely Confident) given for each question.
These variables, calculated from the confidence ratings, were:

Variable  Description
CF        Mean confidence of students when giving a correct or incorrect response.
CFC       Mean confidence of students when giving a correct response.
CFW       Mean confidence of students when giving a wrong response.
CAQ       Mean confidence accuracy quotient.
CB        Confidence bias, which gives the standardized value of confidence
          discrimination (CB = confidence rating on a scale of 0 to 1 - proportion
          of students who gave correct responses).
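The variables with stated definitions can be computed directly. The Python sketch below assumes one item's responses arrive as parallel lists of correctness flags and 1-6 confidence ratings; the linear rescaling of confidence to the 0-1 interval in CB is an assumption, and CAQ is omitted because its formula is not reproduced in the text.

from statistics import mean

def confidence_variables(correct: list[bool], conf: list[int]) -> dict:
    """CF, CFC, CFW, and CB for one item (after Caleon & Subramaniam, 2010a)."""
    cf = mean(conf)                                          # all responses
    cfc = mean(c for c, ok in zip(conf, correct) if ok)      # correct only
    cfw = mean(c for c, ok in zip(conf, correct) if not ok)  # wrong only
    # CB: confidence on a 0-1 scale minus the proportion of correct responses.
    cb = (cf - 1) / 5 - mean(correct)
    return {"CF": cf, "CFC": cfc, "CFW": cfw, "CB": cb}

print(confidence_variables([True, False, True, False], [5, 6, 4, 2]))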
Also, alternative conceptions were identified and labeled as in Table 2.10. Genuine alternative conceptions were thought to be due to a lack of understanding of concepts and the application of wrong reasoning, whereas spurious alternative conceptions were attributed to lack of knowledge or guessing. With the three-tier form (2010a), they identified 21 significant alternative conceptions, of which 11 were genuine ACs and 10 were spurious ACs. Among them, four were identified as strong significant alternative conceptions.
Aydın (2007), Eryılmaz and Sürmeli (2002), Eryılmaz (2010), Kaltakci and Didis (2007), Kızılcık and Güneş (2011), Kutluay (2005), Peşman (2005), Peşman and Eryılmaz (2010), and Türker (2005) assessed students' misconceptions, similar to those mentioned before, with three-tier tests. However, they used the three-tier test results to identify and analyze misconceptions in a slightly different manner.
In almost all of the above three-tier test development processes, the researchers benefited from diverse methods for diagnosing misconceptions. The diversity in data collection methods enabled the researchers to gain valuable information about students' misconceptions as well as providing a good foundation for developing a valid and reliable diagnostic assessment tool. Figure 2.3 illustrates the order and names of the diagnostic methods used in the three-tier test development process.
Figure 2.3 Order of the diagnostic methods used in three-tier test development: Interview → Open-Ended Test → Multiple-Choice Test with Three Tiers
For statistically establishing the validity of a developed three-tier test, three quantitative techniques were used: (1) finding the correlation between student scores on the first two tiers and confidence levels on the third tier, with the logic that, in a properly working test, students with higher scores are expected to be more confident about the correctness of their answers if they correctly understand what is asked (Çataloğlu, 2002); (2) conducting exploratory factor analysis to determine whether related items load onto the same factors; and (3) calculating the proportions of false positives and false negatives, with the logic that, for content validity, the proportions of false positives and false negatives should be minimal (Hestenes & Halloun, 1995). In addition to validity, reliability analyses were conducted to examine the consistency of results across items within a test. Finally, the mean percentages of each misconception were reported for the first tier, the first two tiers, and all three tiers, and the differences between these percentages were interpreted.
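Technique (3) is simple to state in code. The Python sketch below assumes each response is reduced to two flags, correctness on the answer tier and correctness on the reason tier, and follows the usual three-tier reading of a false positive (correct answer, wrong reason) and a false negative (wrong answer, correct reason); the data layout is illustrative.

def false_rates(answer_ok: list[bool], reason_ok: list[bool]) -> tuple[float, float]:
    """Proportions of false positives and false negatives over all responses."""
    n = len(answer_ok)
    fp = sum(a and not r for a, r in zip(answer_ok, reason_ok)) / n
    fn = sum(r and not a for a, r in zip(answer_ok, reason_ok)) / n
    return fp, fn

fp, fn = false_rates([True, True, False, False], [True, False, True, False])
print(f"false positives: {fp:.0%}, false negatives: {fn:.0%}")  # 25%, 25%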
Eryılmaz and Sürmeli (2002) developed a three-tier test on heat and temperature to assess ninth-grade students' misconceptions in Turkey. In this test, students were asked a content question in the first tier; the reasoning for the content answer was asked in the second tier; and in the third tier students' confidence in the first two tiers was asked. According to the authors, if a student selected an incorrect choice in the first tier, explained that answer with an incorrect reason in the second tier, and was confident about those two choices, then the student was considered to hold a predefined specific misconception. The 19-item test was developed from a test on the topic that had been translated into Turkish by Başer (1996). The researchers added second tiers for students' reasoning, based on a literature search, and third tiers for their confidence in each item. They also provided students with a blank choice in the first two tiers to write their own answers or reasons if these did not exist among the available choices. A table of specifications for the misconception choices and descriptions of the 15 misconceptions were prepared, and the table, together with the three-tier test, was examined by five experts in the field. Additionally, three students with different cognitive states were interviewed with the test items in a think-aloud process to see whether the items worked properly. Afterwards, the three-tier test was administered to 77 ninth-grade students in Ankara, Turkey. In the data analysis, the researchers calculated the percentages of students who held each of the predefined misconceptions, as well as the mean percentages according to only the first tiers, the first two tiers, and all three tiers of the test. They found mean percentages of 46 %, 27 %, and 18 %, respectively. The 19 % difference between the only-first-tier and first-two-tier results was attributed to the existence of students' errors. The 9 % difference between the first-two-tier and all-three-tier results was attributed to students' lack of knowledge in the heat and temperature topic. The researchers concluded that three-tier tests can assess misconceptions that are free of errors and lack of knowledge in a more valid way.
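The three-level analysis above amounts to counting a misconception under progressively stricter conditions. The sketch below assumes each student's record for an item is reduced to three flags: a misconception choice in the first tier, the matching reason in the second, and confidence in the third; names and layout are illustrative.

def misconception_percentages(responses):
    """Percentage flagged by the first tier, first two tiers, and all three tiers.

    responses: list of (first_tier, second_tier, confident) boolean flags.
    """
    n = len(responses)
    one = 100 * sum(a for a, _, _ in responses) / n
    two = 100 * sum(a and r for a, r, _ in responses) / n
    three = 100 * sum(a and r and c for a, r, c in responses) / n
    return one, two, three

# Each added tier can only lower the percentage, as in the 46 %, 27 %, 18 %
# pattern reported by Eryılmaz and Sürmeli (2002).
print(misconception_percentages([(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0)]))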
Kutluay (2005) developed a three-tier test to diagnose eleventh-grade students' misconceptions about geometrical optics in İstanbul, Turkey. In the first step of the test development, the researcher prepared an interview guide, and 15 students were interviewed with the 16 questions in the guide. Afterwards, an open-ended test was developed based on the interview results and a literature review and administered to 114 eleventh-grade science students. The students' responses to each item were then categorized with their frequencies. These categories, together with the interview results and the literature, were used in the development of the Three-Tier Geometric Optics Misconception Test (TTGOMT). The final form of the test was administered to 141 eleventh-grade high school students. In order to check the content validity of the test, a factor analysis was conducted, and five categories were found. Also, the proportions of false positives and false negatives were found to be 28.2 % and 3.4 %, respectively. Construct validity was established by estimating the correlation between students' scores on the first two tiers and their confidence levels in the third tier. The reliability of the test was found to be .55 for the correct scores and .28 for the misconception scores over all three tiers. The mean percentages of misconceptions were found to be 19 %, 13 %, and 10 % for the only-first-tier, first-two-tier, and all-three-tier analyses, respectively. A 6 % difference was calculated between the only-first-tier and first-two-tier analyses; 3.4 % of it was considered to be the proportion of false negatives, whereas the remaining 2.6 % was considered the result of inconsistent answers. Similarly, the 3 % difference between the first-two-tier and all-three-tier analyses was considered to stem from lack of knowledge. In a second analysis, conducted over the correct scores for the only-first-tier, first-two-tier, and all-three-tier cases, the percentages of correct answers given to each item and the mean percentages were calculated. As a result, average values of 45 %, 17 %, and 13 % were obtained for the only-first-tier, first-two-tier, and all-three-tier analyses, respectively. The 28 % difference between the only-first-tier and first-two-tier analyses was attributed to false positives, whereas the 4 %
difference between the first-two-tier and all-three-tier analyses was considered to be due to lack of knowledge. According to the results of the three-tier test, the most prevalent misconceptions, held by more than 10 % of the sample, are listed in Table 2.11. These results were in agreement with the findings of previous studies (Galili, 1996; Guesne, 1985; Fetherstonhaugh & Treagust, 1992; Goldberg & McDermott, 1986).
Table 2.11 The Misconceptions in Geometric Optics Identified in More than 10 % of the Sample in the Study of Kutluay (2005)

Misconception                                                                      %
M3   Eyes can get used to seeing in total darkness.                                11
M5   Light emanates in only one direction from each source, like flashlight
     beams.                                                                        18
M6   Shadows of objects are clearer when a bigger bulb is used as the light
     source.                                                                       18
M10  Shadow is black and light is white; when they overlap, they mix and form
     grey. In a similar way, when shadow and light overlap, the shadow reduces
     the brightness of the light.                                                  13
M12  An image in a plane mirror lies behind the mirror along the line of sight
     between the viewer and the object.                                            14
M13  An observer sees an object because the observer directs sight lines toward
     it, with light possibly emitted from the eyes.                                14
M16  The image of a black object in the mirror is due to black rays bouncing
     off the black object.                                                         22
M18  While watching an object, its position also shifts as it is viewed from
     different perspectives.                                                       20
In other similar studies, Peşman (2005), Türker (2005), Aydın (2007), and Kızılcık and Güneş (2011) developed three-tier tests in physics and calculated statistics in a similar manner. Table 2.12 summarizes some of the descriptive statistics for these three-tier tests, and Table 2.13 gives the mean percentages calculated in these studies based on the number of tiers. Among these tests, Türker (2005) developed a three-tier version of selected FCI items, and Aydın (2007) developed a three-tier version of the TUG-K items.
Eryılmaz (2010), however, performed some further analyses of misconception scores. The study concentrated on common heat and temperature misconceptions and developed a three-tier test with five items (four of them in three tiers, and one in two tiers, the second of which is a confidence tier) to assess five misconceptions. These are: (1) Heat and temperature are the same (MisHeatTemp). (2) The temperature of an object depends on its size (MisTemp1). (3) The temperature of an object depends on its material (MisTemp2). (4) The heat of an object depends on its size (MisHeat1). (5) The heat of an object depends on its material (MisHeat2). As can be understood, for each