CLASSICAL TEST THEORY: An Introduction To Linear Modeling Approach To Test and Item Analysis

This document provides an overview of Classical Test Theory (CTT), a framework used in educational measurement and psychometrics. CTT views an observed test score as the sum of a "true score" and an "error score", and it assumes that true scores and error scores are uncorrelated. CTT is used to determine test reliability and to minimize measurement errors. It introduces concepts such as the standard error of measurement, which indicates the accuracy with which an attribute is measured. CTT remains important for test development and analysis because of its simplicity, despite some limitations at the item, person, and ability levels.


International Journal for Social Studies ISSN: 2455-3220

Available at https://edupediapublications.org/journals Volume 02 Issue 09, September 2016

CLASSICAL TEST THEORY: An Introduction to Linear Modeling Approach to Test and Item Analysis
Ado Abdu Bichi
Department of Arts and Social Sciences Education,
Northwest University, Kano-Nigeria
[email protected] +2348032928301
Abstract:
The practice of testing has become increasingly common, and the reliance on information gained from test scores to make decisions has made an indelible mark on our culture. The entire educational system is today highly concerned with the design and development of tests, the procedures of testing, instruments for measuring data, and the methodology to understand and evaluate the results. In the theory of measurement in education and psychology, Classical Test Theory (CTT) is a popular framework. The techniques of CTT are applied in assessment situations to improve test analysis and test refinement procedures. The main purpose of this paper is to provide a comprehensive overview of CTT and its procedures as applied to test item development and analysis. The use of CTT in measurement is to determine maximum information about an individual. It is a scientific framework which has a pioneering role in educational measurement and the psychometric process. CTT has served the measurement community for decades. Besides depicting the simplicity of the CTT model from multiple points of view, the paper highlights various limitations of the model. These limitations are detailed at the item, person and ability levels. Despite the shortcomings attributed to CTT, it is recommended that the classical test theory approach to item analysis be maintained in test development and evaluation, because of its superiority and simplicity in the investigation of reliability and in minimizing measurement errors.

Keywords
Classical Test Theory, Test Development, Educational Measurement

1. Introduction
Assessment of students' learning is very important in education. The assessment of students' cognitive abilities, academic skills and intellectual development involves certain techniques employed to sample students' performance on a particular learning outcome targeted by the instructional objectives. One of those techniques is the test, which is expected to sample students' behaviours. Thus, creating quality tests is very important in assessing students' performance; many indices have been developed in order to construct valid and reliable items during test development. These indices mostly rely on two popular statistical frameworks, Classical Test Theory and Item Response Theory. The two frameworks are associated with the item development process in the field of educational and psychological testing. They have been widely used in test development to ensure the quality of measuring instruments, and their suitability and effectiveness in the test development process are discussed in various literature in the field of psychological and educational measurement. The models associated with each theory have been described and compared, and the ways in which test development generally proceeds within each framework have been demonstrated [1]. The existence of theoretical as well as empirical differences and similarities between the two frameworks has been extensively described in many studies. This paper provides a critical review of the existing empirical studies conducted to describe and compare the two popular frameworks.

2. Classical Test Theory: Overview
Random sampling theory and item response theory are the two major psychometric approaches used in measurement. The classical test theory approach and generalisability theory are the two approaches within random sampling theory [2]. [3] maintained that classical test theory is a simple model that describes how measurement errors can influence observed scores. According to [4], Classical Test Theory (CTT) grew out of early 20th-century approaches to measuring individual differences. CTT was born after the following three achievements or ideas were conceptualized: 1. recognition of the presence of errors in measurements, 2. a conception of that error as a random variable, and 3. a conception of correlation and how to index it. [5] stated that in 1904 Charles Spearman figured out how to correct a correlation coefficient for attenuation due to measurement error and how to obtain the reliability index needed in making the correction. His finding is considered to be the beginning of Classical Test Theory. Other scholars who played a significant role in the Classical Test Theory approach include Truman Lee Kelley, George Udny Yule, Louis Guttman,
Available online: http://edupediapublications.org/journals/index.php/IJSS/ Page | 27


and those involved in developing the Kuder-Richardson formulas [6].

2.1. What is Classical Test Theory?
Classical test theory has been used for decades to determine reliability and other characteristics of measurement instruments. According to [1], classical test theory is a theory about test scores that introduces three concepts: (1) test score (often called the observed score), (2) true score, and (3) error score. Within this framework, various models have been formulated. For example, in what is often referred to as the "classical test model,"

$$X = T + E \qquad (1)$$

This is a simple linear model that links the observable test score (X) to the sum of two unobservable variables, true score (T) and error score (E). Because the true score is not directly observable, it must be estimated from the individual's responses on a set of test items. Each examinee therefore contributes two unknowns (T and E) for a single observed X, so the equation is not solvable unless some simplifying assumptions are made.

The major assumptions underlying CTT are: true scores and error scores are uncorrelated, the average error score in the population of examinees is zero, and error scores on parallel tests are uncorrelated. According to [7], the assumption of classical test theory is that each individual examinee has a true score (unobservable) which would be obtained if there were no errors in measurement. However, because the instruments used are imperfect, the score observed for each individual may differ from the individual's true ability. This difference between the observed score and the true score results from measurement error. Error is often assumed to be a random variable having a normal distribution. The implication of classical test theory for examinees is that tests are fallible, imprecise tools. The score an individual would obtain in the absence of measurement error is called the individual's true score. This means that even with repeated application of the same test, the true score for an individual will not change. The observed score in CTT is always the true score influenced by some degree of error, and the influence of this error on the observed score can be positive or negative.

[8] stated that, theoretically, the standard deviation of the distribution of random errors for each examinee tells about the magnitude of measurement error. Usually, it is assumed that the distribution of random errors will be the same for all test takers. The standard deviation of errors is used as the basic measure of error in classical test theory. In practice, the reliability of the test and the standard deviation of the observed scores are used to estimate the standard error of measurement. The smaller the standard error of measurement, the more certain is the accuracy with which the attribute is measured, which also tells us that the individual's score is close to the true score. Conversely, the larger the standard error of measurement, the less certain is the accuracy with which an attribute is measured.

The standard error of measurement is represented and calculated with the formula:

$$SEM = S_x \sqrt{1 - R_{xx}} \qquad (2)$$

Where: SEM = standard error of measurement
$S_x$ = standard deviation of test scores
$R_{xx}$ = reliability coefficient
A small SEM indicates high reliability. For example, with $S_x = 10$ and $R_{xx} = 0.91$, $SEM = 10\sqrt{0.09} = 3$.

Standard errors of measurement are used to create confidence intervals around specific observed scores [8]. The upper and lower bounds of the confidence interval bracket the likely value of the true score.

The observed score in CTT is assumed to be measured with error. However, in developing measures, the goal of CTT is to minimize this error [9]. The importance of a test's reliability, and of calculating the reliability coefficient, increases in that case. If the reliability coefficient is known, the error variance can be estimated. The square root of the error variance is the standard error of measurement, which helps to define the confidence interval in order to have a more realistic estimate of the true score [10].

2.2. CTT Statistics and Item Analysis
Classical test analysis utilizes traditional item and sample-dependent statistics. These include item difficulty and item discrimination estimates, distractor analyses, item-test intercorrelations, and a variety of related statistics. Most psychometric analyses have focused on examinee assessment at the test score level, rather than at the item level. Classical test analysis also typically includes a measure of the reliability of scores (e.g., Cronbach's alpha), the difficulty of each test item, and its discrimination. Item analysis is a set of statistical procedures that focus on the selection of items that maximize score reliability. The major classical analysis statistics are: 1. Difficulty (item-level statistic); 2. Discrimination (item-level statistic); and 3. Reliability (test-level statistic).

i Item Difficulty
Item difficulty in classical theory is the first item characteristic to be determined. Item difficulty is simply the proportion of examinees taking the test,
who answered an item correctly. The larger the percentage answering an item correctly, the easier the item is; that is, the higher the difficulty value, the easier the item is understood to be. To compute the item difficulty index, divide the number of examinees answering the item correctly by the total number of examinees answering the item. An item answered correctly by 75% of the examinees would have a difficulty index, or p-value, of .75, whereas an item answered correctly by 40% of the examinees would have a lower item difficulty, or p-value, of .40 [11]. The item difficulty is denoted as P and is symbolically given as:

$$P = \frac{R}{N} \qquad (3)$$

Where: P = the difficulty of a certain item,
R = the number of examinees who get that item correct, and
N = the total number of examinees.

A general guideline for the interpretation of an item difficulty index is provided in the following table; see, for example, [12], [13] and [14], among others.

Table 1: Item difficulty indices interpretation [14]
Difficulty Index (P)    Interpretation
P ≤ 0.30                Difficult
0.31 ≤ P ≤ 0.70         Moderately difficult
P > 0.70                Easy

ii Item Discrimination
Item discrimination refers to the difference in correct responses between the low-scoring and the high-scoring students. It is the ability of a test item to discriminate between higher-ability and lower-ability examinees [12]. As with item difficulty, examinees are grouped into those who answered the item correctly and those who did not. This statistic focuses on determining whether the appropriate examinees get the item right or wrong in a test. In essence, the aim of item discrimination is to eliminate, drop or modify items that do not function well in the tested group [15]. The discriminating power of an item can be computed using two indices: the item discrimination index, D, and the item discrimination coefficient.

a. Item Discrimination Index (D)
This method can be applied to compute a simple measure of the discriminating power of an item using the extreme groups [11]. In calculating the D index, first rank order the students by their test scores. Next, separate the top 27% of the students and the 27% at the bottom for the analysis. As stated by [13], "27% is used because it has shown that this value will maximize differences in normal distributions while providing enough cases for analysis". The discrimination index, D, is given as

$$D = P_u - P_l \qquad (4)$$

with $P_u$ being the proportion of correct responses for the upper group and $P_l$ being the proportion of correct responses for the lower group. D ranges from -1 to +1. A negative index indicates that a larger proportion of the lower group answered the item correctly, while a positive index indicates that a higher proportion of the upper group got the item correct [15].

b. Discrimination coefficients
There are two indicators of an item's discrimination effectiveness: the point-biserial correlation and the biserial correlation coefficient. The choice of correlation depends on the kind of question we want to answer. One of the major shortcomings of the discrimination index, D, is that only 54% of the examinees (27% upper + 27% lower) are used to compute the item discrimination, and the remaining 46% are ignored. By contrast, the advantage of using discrimination coefficients to determine discriminating power is that every examinee taking the test is used to compute them. A point-biserial correlation coefficient ($r_{pbi}$) is defined by:

$$r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} \qquad (5)$$

Where: $M_p$ = whole-test mean for students answering the item correctly,
$M_q$ = whole-test mean for students answering the item incorrectly,
$S_t$ = standard deviation for the whole test,
p = proportion of students answering correctly,
q = proportion of students answering incorrectly [13].

A point-biserial correlation coefficient ($r_{pbi}$) ranges from -1 to +1. A high point-biserial coefficient means that students selecting the correct response are students with higher total scores, and students selecting incorrect responses to an item are associated with lower total scores. The value of $r_{pbi}$ thus indicates how well an item discriminates between high-ability and low-ability examinees. Very low or negative point-biserial coefficients help in identifying defective test items [15].

A summary of the widely used criteria and guidelines of [16] for categorizing discrimination indices in item and test analysis is used in this study.
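
The item statistics in equations (3), (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the paper: the response data are invented, the function names are hypothetical, items are scored 0/1, the 27% groups are sized by simple rounding, and the population standard deviation is used for $S_t$ (with that choice $r_{pbi}$ coincides with the Pearson correlation between item score and total score).

```python
from math import sqrt
from statistics import mean, pstdev


def item_difficulty(item):
    """Equation (3): P = R / N, the proportion answering the item correctly."""
    return sum(item) / len(item)


def discrimination_index(item, totals, fraction=0.27):
    """Equation (4): D = Pu - Pl using the top and bottom `fraction` of examinees.

    Group size is rounded to the nearest whole examinee (an assumption).
    """
    n_group = max(1, round(fraction * len(totals)))
    # Indices of examinees ordered from lowest to highest total score.
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    lower, upper = order[:n_group], order[-n_group:]
    p_upper = mean(item[i] for i in upper)
    p_lower = mean(item[i] for i in lower)
    return p_upper - p_lower


def point_biserial(item, totals):
    """Equation (5): r_pbi = (Mp - Mq) / St * sqrt(p * q)."""
    p = item_difficulty(item)
    q = 1 - p
    m_p = mean(t for x, t in zip(item, totals) if x == 1)
    m_q = mean(t for x, t in zip(item, totals) if x == 0)
    s_t = pstdev(totals)  # population SD of total scores (assumption)
    return (m_p - m_q) / s_t * sqrt(p * q)


# Invented data: total test scores and 0/1 responses to one item
# for ten hypothetical examinees.
totals = [18, 17, 15, 14, 12, 11, 9, 8, 6, 4]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

print("P     =", item_difficulty(item))
print("D     =", discrimination_index(item, totals))
print("r_pbi =", point_biserial(item, totals))
```

With only ten examinees the 27% groups contain three examinees each, so D rests on very few responses, while the point-biserial coefficient uses all ten. This mirrors the contrast between the two indices drawn in the text.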

Table 2: Interpretation of Discrimination Indices [18]
Discrimination Index    Quality of an Item
D ≥ 0.40                Item is functioning quite satisfactorily
0.30 ≤ D ≤ 0.39         Good item; little or no revision is required
0.20 ≤ D ≤ 0.29         Item is marginal and needs revision
D ≤ 0.19                Poor item; should be eliminated or completely revised

iii Reliability
There are different means of estimating the reliability of any measure [17]. These methods are explained with the help of the diagram below, adopted from [18].

Figure 1: Classical test reliability

a. Reliability as Equivalence
[19] pointed out that reliability as equivalence takes two forms: parallel or alternate form, and inter-rater form. Estimating reliability using parallel or alternate forms requires developing two forms of a test or instrument using the same content domain, same number of items, same test specifications and same item format, as well as similar difficulty and discrimination indices.

b. Reliability as Stability
[18] stated that test-retest reliability is used to measure the consistency of a test or instrument across time. It is assessed by correlating the results of tests administered over two or more different periods to the same group of people.

c. Reliability as Internal Consistency
According to [17], internal consistency gives an estimate of the equivalence of sets of items from the same measuring instrument (e.g., a set of questions aimed at assessing students' ability in Mathematics). The internal consistency reliability coefficient provides an estimate of the reliability of measurement, and it is based on the assumption that items measuring the same behaviour should correlate. Cronbach's alpha is the most widely used method for estimating internal consistency reliability. Other methods are split-half and Kuder-Richardson 20 and 21 (KR-20 and KR-21). In educational research using the classical test approach, internal consistency estimates are the easiest to obtain; they indicate the extent to which each item correlates with the other items. This is measured on a scale of 0 to 1: the higher the coefficient, the higher the reliability. Internal consistency is arrived at by using split-half, Kuder-Richardson 20 and 21, and Cronbach's alpha [18].

Split-half
Split-half reliability assumes that the items in an instrument can be split into two halves matched in terms of content and cumulative degree of difficulty. This is often achieved by assigning all the odd-numbered items to one group and all the even-numbered items to another. Essentially, a testee's marks on one half are expected to match his or her marks on the other half. The calculation proceeds by correlating the marks on the odd items with the marks on the even items using Pearson's correlation, and then correcting for the full test length using the Spearman-Brown formula:

$$R_{xx} = \frac{2 r_{hh}}{1 + r_{hh}} \qquad (6)$$

Where: $r_{hh}$ = correlation between the two halves of the test, and $R_{xx}$ = estimated reliability of the full-length test.

Procedure:
a) Divide the test into two equal halves
b) Calculate the correlation coefficient between the two halves
c) Calculate the Spearman-Brown reliability estimate
The Spearman-Brown formula gives an estimate of the maximum reliability that can be expected (an upper bound estimate).

Kuder-Richardson-20 and 21 (KR-20 and 21)
[21] developed procedures for determining the homogeneity of items. Probably the best-known index of homogeneity is KR-20, which is arrived at by considering the proportion of correct and incorrect responses to each of the items on a test. The formula for KR-20 is:

$$KR_{20} = \frac{K}{K-1}\left(1 - \frac{\sum pq}{S_x^2}\right) \qquad (7)$$

Where:
K = number of trials or items;
$S_x^2$ = variance of scores;
p = proportion answering the item right;
q = proportion answering the item wrong; and
$\sum pq$ = sum of the pq products for all K items.

KR-21, however, assumes that all items in the test are of equal difficulty, and it is computationally simpler. The formula for KR-21 is:
$$KR_{21} = \frac{K}{K-1}\left(1 - \frac{\bar{X}(K - \bar{X})}{K S^2}\right) \qquad (8)$$

Where: K = number of trials or items in the test;
$S^2$ = variance of the test scores; and
$\bar{X}$ = mean of the test scores.

Cronbach's alpha
The alpha formula is one of the best analyses that can be used to gauge the reliability (i.e., accuracy) of educational and psychological measurements. The formula was designed to be applied to a two-way table of data where rows represent persons (p) and columns represent the scores (x) obtained under two or more conditions (i). Because the analysis assesses the consistency of scores from one condition to another, procedures like alpha are known as internal consistency analyses [19]. Reliability is computed with coefficient alpha, defined as:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_x^2}\right) \qquad (9)$$

Where: k = number of items on the test;
$\sum s_i^2$ = sum of the variances of the different parts of the test (items i); and
$s_x^2$ = variance of the test scores.
Cronbach's α can be shown to provide a lower bound for reliability under rather mild assumptions. Thus, the reliability of test scores in a population is at least as high as the value of Cronbach's α in that population. A value of 0.7 to 0.8 is acceptable for Cronbach's α; values substantially lower indicate an unreliable scale [23].

2.3. Item Selection in Classical Test Theory
In classical test theory, item analysis consists of determining sample-specific parameters and eliminating items based on statistical criteria or set standards. A poor item is identified by an item difficulty index that is too low (P < 0.30) or too high (P > 0.70), or by a low item discrimination index, such as $r_{pbi}$ ≤ 0.20 [12]. According to [1], in test development items are selected on the basis of two characteristics: item difficulty and item discrimination. Items with the highest discrimination are normally prioritized in item selection. However, the choice of item difficulty and discrimination is usually informed by the purpose of the test and the anticipated ability distribution of the group of people for whom the test is intended. For example, where the purpose of a test is to select a group of high-ability students for the award of a scholarship, items that are quite difficult for the entire population of test takers are generally chosen. As another example, norm-referenced achievement tests are designed to differentiate between examinees with regard to their competence in a particular subject (e.g. Economics). That is, such a test is intended to yield a wide range of scores, maximising discrimination among all students taking the test. When a test for this purpose is designed, items are generally chosen within a medium level and narrow range of difficulty.

2.4. Advantages of Classical Test Analysis
According to [4], the benefits obtainable through the application of proper instructional objectives and item writing using classical test analysis include the following. First, using classical test theory, analyses can be performed with smaller representative samples of examinees. Secondly, classical test analysis employs relatively simple and straightforward mathematical procedures, and model parameter estimation is conceptually easy. Thirdly, the assumptions of classical test analysis are easily met by traditional testing procedures. Because of this, classical models are often referred to as "weak models".

2.5. Limitations of Classical Test Theory
Classical test methods have proven to be very useful and are still widely used among practitioners in test construction and analysis, but they have limitations. [1] mention that the two classical item statistics, item difficulty and item discrimination, which form the cornerstones of many classical test and item analyses, are group dependent (they depend on the sample). Thus, the P and D or r values depend on the sample of students in which they are obtained. In terms of discrimination indices, higher values will tend to be obtained from heterogeneous samples and lower values from homogeneous samples. Similarly, in terms of item difficulty indices, higher values will be obtained from samples of examinees of above-average ability and lower values from samples of examinees of low or below-average ability [24]. "Such sample dependency relationships reduce the overall utility of these statistics" [4].

Another weakness of classical test theory is that its applications are test dependent or "test-based". Test difficulty directly affects the resultant test scores. Higher knowledge scores are directly associated with tests composed of relatively easy items, and low knowledge scores can be attributed to a test composed of items that are more difficult. The true score model, upon which much of classical test theory is based, permits no consideration of examinee responses to any specific item. Thus, no basis exists to predict how a given examinee will perform on a particular test item [4]. This shows that the estimate of an examinee's ability depends on the difficulty of the test items.
[15] wrote that classical test reliability is an indicator of the quality of a set of test scores; hence, reliability
is dependent on characteristics of the group of examinees, in addition to being dependent on characteristics of the test and the test administration. Another limitation of classical test theory is that, to compare the performance of different examinees, the examinees must be given the same or parallel items. Another problem of classical test theory is its inability to provide a basis for determining how an examinee in a given population might perform when confronted with test items [25]. Finally, according to [25], classical test theory assumes that the measurement error is the same for all test takers.
Because of the criticisms heaped upon classical test theory, some test developers have turned to item response theory.

3. Conclusion and Recommendations
Multiple factors, such as the psychological state of the examinee, environmental factors or the test itself, affect examinees' scores in each administration of an instrument. Sometimes each test administration gives different results about the same person. Only valid and reliable examinations allow the real ability of an individual to be interpreted.

As mentioned before, the main purpose of the psychometric process and of using different measurement approaches or theories is to determine maximum information about an individual. This valuable information is accessible by different methods, provided that a valid theoretical and mathematical basis is used and reliable testing conditions are ensured. CTT is a scientific framework which has a pioneering role in educational measurement and the psychometric process. The essential rules of this theory are discussed and presented in this study. CTT has served the measurement community for decades; because of its weaknesses, IRT has witnessed exponential growth in recent decades [26]. This study therefore presented the main principles of CTT and their effects on the educational measurement process. Besides depicting the simplicity of the CTT model from multiple points of view, it highlighted various limitations of the model. These limitations are detailed at the item, person and ability levels.

Despite the shortcomings attributed to CTT, it is recommended that the classical test theory approach to item analysis be maintained in test development and evaluation, because of its superiority and simplicity in the investigation of reliability and in minimizing measurement errors. Secondly, achievement tests used in examining students' achievement against educational standards should be made to pass through all the processes of standardization and validation.

4. Acknowledgement
The author appreciates the effort of the Kano State Government under the visionary governor Engr. Dr. Rabi'u Musa Kwankwaso, FNSE, whose passion and concern for the welfare and educational development of his people introduced the postgraduate scholarship scheme, which offered the author the opportunity to achieve what he is celebrating today.

The author dedicates the work to his lovely wife Maryam Musa and his children Khadija and Fatima, who, despite his absence over a long distance, remained courageous and always encouraged him with prayers, love and goodwill.

5. References

[1] Hambleton, R. K., & Jones, R. W. (1993). Comparison of Classical Test Theory and Item Response Theory and their Applications to Test Development. Educational Measurement: Issues and Practice, 12(3), 38-47.

[2] Bejar, I. I. (1993). A Generative Approach to Psychological and Educational Measurement. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test Theory for a New Generation of Tests (pp. 323-357). Hillsdale, NJ: Erlbaum.

[3] Marcoulides, G. A. (1999). Generalizability theory: Picking up where the Rasch IRT model leaves off? In S. E. Embretson & S. L. Hershberger (Eds.), The New Rules of Measurement: What Every Psychologist and Educator Should Know (pp. 129-152). Mahwah, NJ: Erlbaum.

[4] Schumacker, R. E. (2010). Classical Test Analysis. http://appliedmeasurementassociates.com/ama/assets/File/CLASSICAL_TEST_ANALYSIS.pdf. Retrieved on 13 August, 2014.

[5] Allen, M. J., & Yen, W. M. (1979). Introduction to Measurement Theory. Monterey, CA: Brooks/Cole Publishing Company.

[6] Traub, R. E., & Fisher, C. W. (1997). On the Equivalence of Constructed Responses and Multiple-Choice Tests. Applied Psychological Measurement, 1, 355-370.

[7] Magno, C. (2009). Demonstrating the Difference between Classical Test Theory and Item Response Theory using Derived Test Data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11.

[8] Kaplan, R. M., & Saccuzzo, D. P. (1997). Psychological Testing: Principles, Applications and Issues. Pacific Grove: Brooks Cole Pub. Company.

[9] McBride, N. L. (2001). An Item Response Theory Analysis of the Scales from the International Personality Item Pool and the NEO Personality Inventory-Revised.
Master of Science thesis submitted to the Faculty of Virginia Polytechnic Institute and State University.

[10] Erguven, M. (2014). Two Approaches to Psychometric Process: Classical Test Theory and Item Response Theory. Journal of Education, 2(2), 23-30.

[11] Matlock-Hetzel, S. (1997). Basic Concepts in Item and Test Analysis. Texas A & M University, USA. files.eric.ed.gov/fulltext/ED406441.pdf. Accessed on 24 June, 2014.

[12] Adegoke, B. A. (2013). Comparison of Item Statistics of Physics Achievement Test using Classical Test and Item Response Theory Frameworks. Journal of Education and Practice, 4(22), 87-96.

[13] Zubairi, A. M., & Kassim, N. L. A. (2006). Classical and Rasch Analysis of Dichotomously Scored Reading Comprehension Test Items. Malaysian Journal of ELT Research, 2, 1-20.

[14] Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Cambridge, Mass.: Newbury House Publishers.

[15] Courville, T. G. (2004). An Empirical Comparison of Item Response Theory and Classical Test Theory Item/Person Statistics. Unpublished Ph.D. dissertation, Texas A & M University.

[16] Ebel, R. L., & Frisbie, D. A. (1991). Essentials of Educational Measurement (5th ed.). Englewood Cliffs, N.J.: Prentice Hall.

[17] Kimberlin, C. L., & Winterstein, A. G. (2008). Validity and Reliability of Measurement Instruments used in Research. American Journal of Health-System Pharmacy, Vol. 65, Dec 1, 2008.