STANDARDIZED AND NON-STANDARDIZED TESTS
COLLEGE OF NURSING, JAGDALPUR
STANDARDIZED TEST:- A standardized test is one in which all the students answer the same questions, drawn from a large number of questions, under uniform directions and uniform time limits, and in which there is a uniform or standard reference group against which performance can be compared. Standardized tests are developed with the help of professional writers, reviewers, and editors of test items. They are based on content and objectives common to many schools throughout the country.
Any test in which the same test is given in the same manner to all test takers, and graded in
the same manner for everyone, is a standardized test. Standardized tests do not need to
be high-stakes tests, time-limited tests, or multiple-choice tests. The questions can be simple
or complex. The subject matter among school-age students is frequently academic skills, but
a standardized test can be given on nearly any topic, including driving tests,
creativity, personality, professional ethics, or other attributes.
Most everyday quizzes and tests taken by students typically meet the definition of a
standardized test: everyone in the class takes the same test, at the same time, under the same
circumstances, and all of the students are graded by their teacher in the same way. However,
the term standardized test is most commonly used to refer to tests that are given to larger
groups, such as a test taken by all adults who wish to acquire a license to have a particular
kind of job, or by all students of a certain age.
A standardized test follows uniform procedures for administering, scoring, and interpreting the test results. Standardized tests are not restricted to use in one school or a few schools but are intended for a larger population, so that many schools can use such tests to assess their own performance in relation to others and to the general population for which the test has been standardized.
DEFINITION:-
The definition of a standardized test has changed somewhat over time. In 1960, standardized
tests were defined as those in which the conditions and content were equal for everyone
taking the test, regardless of when, where, or by whom the test was given or graded. The
purpose of this standardization is to make sure that the scores reliably indicate the abilities or
skills being measured, and not other things, such as different instructions about what to do if
the test taker does not know the answer to a question.
By the beginning of the 21st century, the focus shifted away from a strict sameness of
conditions towards equal fairness of conditions. For example, a test taker with a broken wrist
might write more slowly because of the injury, and it would be more fair, and produce a more
reliable understanding of the test taker's actual knowledge, if that person were given a few
more minutes to write down the answers to a written test. However, if the purpose of the test is
to see how quickly the student could write, then this would become a modification of the
content, and no longer a standardized test.
CHARACTERISTICS:-
1. The test is constructed according to a blue-print. Relevant items are included and irrelevant items are omitted.
2. Test scores are interpreted with reference to norms. Norms are derived for interpreting the test scores and are frequently based on age, grade, sex, etc. (a short sketch of norm-referenced interpretation follows the lists below).
3. Information needed for judging the value of the test is provided before the test is made available.
ADVANTAGES:-
1. Useful for ascertaining the level of intellectual ability and the strengths and weaknesses of the pupils.
2. These tests are useful in diagnosing the learning difficulties of the students.
3. They help the teacher to know the causal factors of the learning difficulties of the students.
4. They provide information for curriculum planning and for remedial coaching of the students who need it.
5. They also help the teacher to assess the effectiveness of his teaching and of the school's instructional programmes.
6. They provide data for tracing an individual's growth pattern over a period of years.
7. They evaluate the influences of courses of study, teachers' activities, teaching methods, and other aspects of the instructional programme.
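Since scores on a standardized test are interpreted with reference to norms, the short sketch below illustrates norm-referenced interpretation by converting a raw score into a percentile rank. It is only a minimal illustration: the norm-group scores and the pupil's raw score are hypothetical values, not data from any real test.

```python
# Minimal sketch of norm-referenced score interpretation (illustrative data only).
# The percentile rank of a raw score is taken here as the percentage of the
# norm group scoring at or below that raw score.

def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring at or below raw_score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical norm group (e.g. raw scores of pupils of the same age/grade).
norm_group = [34, 41, 45, 48, 50, 52, 55, 58, 61, 67]

pupil_score = 55
print(f"Raw score {pupil_score} = percentile rank "
      f"{percentile_rank(pupil_score, norm_group):.0f}")
```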
LIMITATION:-
Tests are often ambiguous and unclear.
CRITERIA OF A GOOD TEST:- Whatever the area of behaviour being evaluated or the use to be made of the results, all of the various tests and procedures used for the evaluation of a programme should possess certain qualities:
Validity
Reliability
Objectivity
Usability
VALIDITY:- The validity of a test is the degree to which it measures what it is intended to measure. This is the most important quality of a test. Validity is a matter of degree (a test may be moderately valid or highly valid), and it is always specific to a particular test and purpose. To be valid, a measuring instrument must be both relevant and reliable.
Purpose:- A test's validity is established in reference to a specific purpose; the test may not be valid for different purposes. For example, the test you use to make valid predictions about someone's technical proficiency on the job may not be valid for predicting his leadership skills or absenteeism rate.
Types of validity:-
Three types of validity have been identified and used in educational and psychological measurement:
Content validity.
Criterion-related validity.
Construct validity.
Content validity:- Content validity may be defined as the extent to which a test measures a representative sample of the subject matter content and the behavioural changes under consideration. The content of a course includes both subject matter content and instructional objectives or behavioural changes; content validity also means that the measurement includes attitudes, interests, and the personal and social adjustment of students. The focus of content validity, then, is on the adequacy of the sample and not on the appearance of the test (face validity).
In order to make sure that content validity is obtained, Gronlund recommends certain steps (a rough sketch of steps (b) to (d) follows the list):
(a) The major subject matter content and the behavioural changes expected are listed from the formulated objectives.
(b) These subject matter topics and types of expected behavioural changes are weighted in terms of their relative importance.
(c) Prepare a table of specifications from the weighted content and behavioural changes.
(d) Construct the achievement test in accordance with the table of specifications. The closer the test corresponds to the specifications indicated in the table, the higher the content validity.
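As a rough illustration of steps (b) to (d), the sketch below builds a simple table of specifications by allocating a fixed number of items across weighted content areas and objective categories. The content areas, objective categories, weights, and total item count are all hypothetical assumptions, and rounding may require small manual adjustments.

```python
# Sketch: building a table of specifications from weighted content areas and
# objectives, then allocating a fixed number of items to each cell.
# All names and weights below are hypothetical examples; rounded cell counts
# may need minor manual adjustment to sum exactly to total_items.

content_weights = {"Nutrition": 0.40, "Pharmacology": 0.35, "First aid": 0.25}
objective_weights = {"Knowledge": 0.5, "Understanding": 0.3, "Application": 0.2}
total_items = 40

table = {
    area: {
        objective: round(total_items * w_area * w_obj)
        for objective, w_obj in objective_weights.items()
    }
    for area, w_area in content_weights.items()
}

for area, row in table.items():
    print(f"{area:12s} {row}")
```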
Criterion-related validity:- Criterion-related validation may be defined as the process of determining the extent to which test performance is related to some other valued measure of the subject's actual behaviour. Two forms of criterion-related validity are recognised (a brief numerical sketch follows):
1. Concurrent validity:- It refers to the degree to which test scores correspond to criterion measures available at the same time, in the present situation. Concurrent validity diagnoses the existing status of the individual rather than making a prediction about future outcomes.
2. Predictive validity:- It refers to the degree of correlation between the measure of a concept and some future measure of the same concept. Predictive validity is the extent to which a test can predict the future performance of the students; such tests are used, for example, for selection and placement decisions.
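A minimal numerical sketch of the idea: criterion-related validity is commonly expressed as the correlation between test scores and a criterion measure, so the snippet below correlates hypothetical entrance-test scores with later first-year grades as an illustration of predictive validity. All scores are invented for illustration, and statistics.correlation requires Python 3.10 or later.

```python
# Sketch: criterion-related (predictive) validity as the correlation between
# test scores and a criterion measured later. All data are hypothetical.
from statistics import correlation  # Pearson's r, Python 3.10+

entrance_test = [62, 55, 71, 48, 66, 59]      # test scores obtained now
first_year_grade = [68, 52, 75, 50, 70, 57]   # criterion measured later

print(f"Predictive validity coefficient: "
      f"{correlation(entrance_test, first_year_grade):.2f}")
```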
Construct validity:- Construct validity may be defined as the extent to which test performance can be interpreted in terms of certain psychological constructs. The process of construct validation involves identifying and classifying the factors which influence test scores.
Face validity:- It ascertains that the measure appears to be assessing the intended construct under study. Stakeholders can easily assess face validity. Although this is not a rigorous form of validity, it matters for motivation: if test takers do not believe the measure is an accurate assessment of the ability, they may become disengaged with the task.
Factors affecting validity:-
1. If reading vocabulary is poor, the student fails to reply to the test item even if they know the answer.
2. Unclear or difficult sentences may be hard for the reader to comprehend and will affect validity.
3. Medium of expression.
Methods to improve validity:-
1. Carefully match the test with the learning objectives, content, and teaching methods.
2. Increase the sample of objectives and content areas included in any given test.
RELIABILITY
Reliability has to do with the quality of measurement. In its everyday sense, reliability is the
"consistency" or "repeatability" of your measures. Before we can define reliability precisely
we have to lay the groundwork. First, you have to learn about the foundation of reliability,
the true score theory of measurement. Along with that, you need to understand the
different types of measurement error because errors in measures play a key role in degrading
reliability. With this foundation, you can consider the basic theory of reliability, including a
precise definition of reliability. There you will find out that we cannot calculate reliability --
we can only estimate it. Because of this, there are a variety of different types of reliability that
each have multiple ways to estimate reliability for that type. In the end, it's important to
integrate the idea of reliability with the other major criteria for the quality of measurement --
validity -- and develop an understanding of the relationships between reliability and validity
in measurement.
DEFINITION OF RELIABILITY
“Reliability is the degree of consistency of a measure. A test will be reliable when it gives the
same repeated result under the same condition.”
“The probability that an item will perform a required function without failure under stated
condition for a stated period of time.”
Methods of estimating reliability:-
Split-half method:- This usually yields the largest reliability coefficient reported for a given test; the estimate may be inflated by factors such as speed. It is a subtype of internal-consistency reliability. The process of obtaining split-half reliability begins by splitting in half all items of a test that are intended to probe the same area of knowledge (e.g. World War II) in order to form two sets of items. The entire test is administered to a group of individuals, the total score for each set is computed, and finally the split-half reliability is obtained by determining the correlation between the two total set scores.
Test-retest method:- This gives a medium to large reliability coefficient for a given test, and it becomes smaller as the time interval between tests is increased. It is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. The scores from time 1 and time 2 can then be correlated in order to evaluate the test for stability over time.
Equivalent forms method (without time interval):- This gives a medium to large reliability coefficient for a given test.
Equivalent forms method (with time interval):- This gives the smallest reliability coefficient reported for a given test, and it becomes smaller as the time interval between forms is increased.
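The sketch below illustrates how the split-half and test-retest coefficients described above can be computed for hypothetical score data. The item scores, the odd/even split, and the use of the Spearman-Brown correction are illustrative assumptions rather than a prescription; statistics.correlation requires Python 3.10 or later.

```python
# Sketch: estimating reliability coefficients from hypothetical score data.
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical item scores (1 = correct, 0 = wrong) for five pupils.
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
]

# Split-half: correlate odd-item and even-item half scores, then apply the
# Spearman-Brown correction so the estimate refers to the full-length test.
odd_half = [sum(pupil[0::2]) for pupil in item_scores]
even_half = [sum(pupil[1::2]) for pupil in item_scores]
r_half = correlation(odd_half, even_half)
split_half = 2 * r_half / (1 + r_half)

# Test-retest: correlate total scores from two administrations of the same test.
time1 = [32, 28, 40, 22, 35]
time2 = [30, 29, 39, 20, 36]
test_retest = correlation(time1, time2)

print(f"Split-half reliability (Spearman-Brown corrected): {split_half:.2f}")
print(f"Test-retest reliability:                           {test_retest:.2f}")
```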
Types of reliability:-
1. Scorer reliability
2. Content reliability
3. Temporal reliability
1. Scorer reliability:- It concerns itself with the degree of agreement between two scorers of the same test answers. It also deals with the degree of consistency in grading the same test answers by the same scorer on two different occasions; this is also called intra-scorer reliability.
3. Temporal reliability:- As the name suggests, it concerns itself with the stability of the results of a test over time.
Some intrinsic and some extrinsic factors have been identified that affect the reliability of test scores.
Intrinsic factors:
(i) Length of the test: The longer the test, i.e. the more items it contains, the greater will be its reliability, and vice-versa. Logically, the more samples of items we take of a given area of knowledge, skill and the like, the more reliable the measurement. However, it is difficult to ensure the maximum length of the test; the length should not give rise to fatigue effects in the test takers. Thus it is advisable to use longer tests rather than shorter tests, since shorter tests are less reliable.
(ii) Homogeneity of items:
Homogeneity of items has two aspects: item reliability and the homogeneity of traits
measured from one item to another. If the items measure different functions and the inter-
correlations of items are ‘zero’ or near to it, then the reliability is ‘zero’ or very low and vice-
versa.
(iii) Difficulty value of items:
The difficulty level and clarity of expression of a test item also affect the reliability of test
scores. If the test items are too easy or too difficult for the group members, they will tend to produce scores of low reliability, because both such tests have a restricted spread of scores.
(iv) Discriminative value of items:
When items can discriminate well between superior and inferior pupils, the item-total correlation is high and the reliability is also likely to be high.
(v) Clarity of directions:
Unclear or ambiguous directions give rise to difficulties in understanding the questions and the nature of the response expected, and thus lower the reliability.
(vi) Scoring:
If the scoring is of the subjective type, the scores will vary from one situation to another; mistakes in scoring give rise to mistakes in the test scores and reduce reliability.
Extrinsic factors:
(i) Group variability:- When the group of pupils being tested is homogeneous in ability, the reliability of the test scores is likely to be lowered, and vice-versa.
(ii) Guessing and chance errors:- Guessing in a test gives rise to increased error variance and as such reduces reliability. For example, in two-alternative response items there is a 50 per cent chance of answering correctly by guessing.
(iii) Environmental conditions:- As far as practicable, the testing environment should be uniform. Arrangements should be such that light, sound, and other comforts are equal for all test takers; otherwise they will affect the reliability of the test scores.
(iv) Momentary fluctuations:- A broken pencil, a sudden distraction, or a mistake in giving an answer with no way to change it are factors which may affect the reliability of test scores.
To improve reliability when using less reliable methods, increase the number of questions or observations, or the examination time.
Reliability and validity are often confused; the terms describe two inter-related but
completely different concepts. Very simply:
Validity: does the test actually measure what it’s supposed to?
Reliability: does the test consistently give the same result under the same conditions?
This difference is best described with an example:
A researcher devises a new test that measures IQ more quickly than the standard IQ test:
If the test consistently delivers scores of 135, and the candidate’s true IQ is 120, the
test is reliable but not valid.
If the new test delivers scores for a candidate of 87, 65, 143 and 102, then the test is
not reliable OR valid. It doesn’t measure what it’s supposed to, and it does so
inconsistently!
If the scores are 100, 111, 132 and 150, then the validity and reliability are also low.
However, the distribution of these scores is slightly better than above, since it
surrounds the true score instead of missing it entirely. Such a test is likely suffering
from extreme random error.
If the researcher's test delivers a consistent score of 118, then that’s pretty close, and
the test can be considered both valid and reliable. The closer to 120, the more valid,
and the smaller the variation between repeat scores, the higher the reliability. A test
that routinely underestimates IQ by two points can be as useful as a more valid test
since the error itself is so reliable.
Reliability is necessary but not sufficient for validity. A test can be reliable but not valid, whereas a test cannot be valid yet unreliable. A test that is extremely unreliable is essentially not valid either. A bathroom scale that measures your weight one day as 5000 kg and the next day as 2 kg is so unreliable that it cannot be valid.
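A minimal sketch of the IQ example above: the distance of the average score from the true IQ reflects (lack of) validity, while the spread of repeated scores reflects (lack of) reliability. The repeated-score lists are a literal reading of the scenarios in the example, and the summary statistics are only one illustrative way of describing them.

```python
# Sketch: summarising repeated scores from a hypothetical IQ test against a
# known true score, to separate reliability (spread) from validity (bias).
from statistics import mean, pstdev

TRUE_IQ = 120

def describe(scores, label):
    bias = mean(scores) - TRUE_IQ   # systematic distance from the true score
    spread = pstdev(scores)         # inconsistency across repetitions
    print(f"{label:30s} bias={bias:+6.1f}  spread={spread:5.1f}")

describe([135, 135, 135, 135], "reliable, not valid")
describe([87, 65, 143, 102],   "neither reliable nor valid")
describe([100, 111, 132, 150], "unreliable, surrounds the truth")
describe([118, 118, 118, 118], "reliable and nearly valid")
```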
OBJECTIVITY
Objectivity in measurement helps to increase test validity and reliability. This is the extent to
which independent and competent examiners agree on what constitutes a good answer for
each of the items of a measuring instrument. Most standardized tests of aptitude and achievement are high in objectivity.
Advantages of objectivity
Objective tests are more reliable.
They enable a more extensive survey of content in a given time than can be obtained by any other procedure.
Disadvantages of objectivity
More time is required to prepare good objective questions.
USABILITY
Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. It gives direct input on how real users use the system. This is in contrast with usability
inspection methods where experts use different methods to evaluate a user interface without
involving users.
Usability testing focuses on measuring a human-made product's capacity to meet its intended
purpose. Examples of products that commonly benefit from usability testing are food,
consumer products, web sites or web applications, computer interfaces, documents, and
devices. Usability testing measures the usability, or ease of use, of a specific object or set of
objects, whereas general human–computer interaction studies attempt to formulate universal
principles.
Rather than showing users a rough draft and asking, "Do you understand this?", usability
testing involves watching people trying to use something for its intended purpose. For
example, when testing instructions for assembling a toy, the test subjects should be given the
instructions and a box of parts and, rather than being asked to comment on the parts and
materials, they are asked to put the toy together. Instruction phrasing, illustration quality, and
the toy's design all affect the assembly process.
ERROR OF MEASUREMENT
Observational error (or measurement error) is the difference between a measured value of
a quantity and its true value. In statistics, an error is not a "mistake". Variability is an inherent
part of the results of measurements and of the measurement process.
Measurement errors can be divided into two components: random error and systematic error.
Random errors are errors in measurement that lead to measurable values being inconsistent
when repeated measurements of a constant attribute or quantity are taken. Systematic errors
are errors that are not determined by chance but are introduced by an inaccuracy (involving
either the observation or measurement process) inherent to the system. Systematic error may
also refer to an error with a non-zero mean, the effect of which is not reduced
when observations are averaged.
Every time we repeat a measurement with a sensitive instrument, we obtain slightly different
results. The common statistical model used is that the error has two additive parts:
1. Systematic error which always occurs, with the same value, when we use the
instrument in the same way and in the same case
2. Random error, which may vary from one observation to another.
Systematic error is sometimes called statistical bias. It may often be reduced with
standardized procedures. Part of the learning process in the various sciences is learning how
to use standard instruments and protocols so as to minimize systematic error.
Random error (or random variation) is due to factors which cannot or will not be controlled.
One possible reason to forgo controlling for these random errors is that it may be too
expensive to control them each time the experiment is conducted or the measurements are
made. Other reasons may be that whatever we are trying to measure is changing in time
(see dynamic models), or is fundamentally probabilistic (as is the case in quantum mechanics
— see Measurement in quantum mechanics). Random error often occurs when instruments
are pushed to the extremes of their operating limits. For example, it is common for digital
balances to exhibit random error in their least significant digit. Three measurements of a
single object might read something like 0.9111 g, 0.9110 g, and 0.9112 g.
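A minimal sketch of this additive error model using the balance readings above: the mean of repeated readings estimates the true value plus any systematic error, while the scatter around the mean reflects random error. The "true mass" used to expose the systematic component is a hypothetical value assumed purely for illustration.

```python
# Sketch of the additive error model: measurement = true value + systematic + random.
from statistics import mean, pstdev

readings_g = [0.9111, 0.9110, 0.9112]   # repeated balance readings from the text
true_mass_g = 0.9105                     # hypothetical true mass, for illustration

systematic_error = mean(readings_g) - true_mass_g  # constant bias component
random_error = pstdev(readings_g)                  # scatter from reading to reading

print(f"mean reading      = {mean(readings_g):.4f} g")
print(f"systematic error  = {systematic_error:+.4f} g")
print(f"random error (sd) = {random_error:.4f} g")
```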
NON-STANDARDIZED TESTS
INTRODUCTION :- Non-standardized assessment looks at an individual's
performance, and does not produce scores that allow us to compare that
performance to another's. It allows us to obtain specific information about the
student, and this can be in different formats.
A non-standardized test is usually flexible in scope and format, variable in difficulty and
significance. Since these tests are usually developed by individual instructors, the format and
difficulty of these tests may not be widely adopted or used by other instructors or institutions.
A non-standardized test may be used to determine the proficiency level of students, to
motivate students to study, and to provide feedback to students. In some instances, a teacher
may develop non-standardized tests that resemble standardized tests in scope, format, and
difficulty for the purpose of preparing their students for an upcoming standardized test.
Finally, the frequency and setting in which non-standardized tests are administered are highly variable and are usually constrained by the duration of the class period. A class instructor may, for example, administer a test on a weekly basis or just twice a semester.
Depending on the policy of the instructor or institution, the duration of each test itself may
last for only five minutes to an entire class period.
Another term for non-standardized testing is informal testing. These tests are
classroom tests and are usually developed by the teacher as opposed to some group
of outside testers. These classroom tests assess students' learning over a period of
time or after a particular unit of study. A score of 80% on a multiple choice test
after reading a short story is a non-standardized score because it does not tell us
how the student did in relation to his peers.
CRITERION-REFERENCED MEASUREMENT
With portfolios the student gathers his work over a period of time, and the teacher
will evaluate the work based on a scoring guideline. The student is encouraged to
reflect on his work, which enhances the learning process. Performance exams are
tests given to all students and are based on students performing some task, like
writing an essay, or giving an oral presentation. These tasks are created by the
teachers who teach the students, and so the exams drive the curriculum. It makes
more sense for those doing the teaching to create the tests.
Parents and the community have a right to know how students are doing; therefore,
non-standardized tests need to show how well schools and students are doing.
Teachers are constantly assessing their students, and by doing so they are
constantly adjusting and changing their teaching to meet individual students' needs.
There can still be accountability with non-standardized assessment that provides
parents, local officials, and state officials with the information needed. Teachers
can be in constant touch with parents through the Internet, by calling, by parent
conferences and by sending home progress reports and samples of work.
The key questions to ask with any kind of assessment are, "What is the purpose of
this assessment?" and "Is this purpose meaningful and worthwhile?" If these
questions are constantly referred to and constantly addressed then the assessment
in itself is important, and this helps teachers address what is important to learn. It's
a kind of backwards design. Ultimately the goal is to help students to learn, and to
help them to learn the information and the skills that are important.
SUMMARY :-
A standardized test is one in which all the students answer the same questions, drawn from a large number of questions, under uniform directions and uniform time limits, and in which there is a uniform or standard reference group against which performance can be compared. Standardized tests are developed with the help of professional writers, reviewers, and editors of test items. They are based on content and objectives common to many schools throughout the country.
CONCLUSION :-
At the end of this presentation the student learner will be able to describe standardized and non-standardized tests: their meaning, definitions, characteristics, steps, advantages, disadvantages, types, validity, objectivity, usability, reliability, errors, forms, accountability, criteria, etc.
BIBLIOGRAPHY
1) R. Sudha. Nursing Education: Principles and Concepts. 1st Edition. Jaypee Brothers Medical Publishers (P) Ltd. Page no. 255-262.
2) Shebeer P. Basheer. Textbook of Nursing Education. 1st Edition. Page no. 227-231.
WEBOGRAPHY
1. http://www.edglossary.org
2. http://en.m.wikipedia.org
3. http://www.whitbyschool.org
4. http://standardizedtests.procon.org
5. http://classroom.synonym.com
6. http://www.ncbi.nlm.nih.gov
7. www.therapyconnect.amaze.org.au