NAME
YUSOP
Norm-Referenced Assessment
A norm-referenced assessment is a type of standardized test that compares students’ performances to
one another, for example by comparing a student’s performance to the course median.
The “norming process” refers to calculating norm-referenced scores, and the “norming group” refers
to the group that a student is compared against when a professor assigns grades.
While similar, a norm-referenced assessment and a criterion-referenced test have different goals. A
norm-referenced assessment is designed to compare a student’s performance to that of a peer group
(known as the “norm group”). Students are ranked on a continuum, showing where they stand
relative to their classmates. Results are usually reported as percentiles, standardized scores or
z-scores. An example is the SAT or ACT, where a reported score reflects how a student performed
compared to a representative group of test-takers. On the other hand, a criterion-referenced test
evaluates a student’s performance against a specific set of learning objectives or criteria. This test
helps determine whether a student has achieved mastery of particular skills or knowledge. Results
are reported as raw scores or levels of proficiency. Examples of criterion-referenced tests are
licensing exams or placement tests that gauge whether students meet minimum competency
requirements.
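As a rough illustration of the two reporting styles described above, the following Python sketch computes a z-score and a percentile rank against a norm group, and separately checks the same score against a fixed criterion. The scores, the cut-off of 90 and the function names are hypothetical and chosen only for illustration; real testing programs use their own norming procedures.

```python
from statistics import mean, pstdev


def z_score(score, norm_group):
    """Distance of a score from the norm-group mean, in standard deviations."""
    return (score - mean(norm_group)) / pstdev(norm_group)


def percentile_rank(score, norm_group):
    """Percentage of the norm group scoring strictly below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def meets_criterion(score, cutoff):
    """Criterion-referenced check: the peer group plays no role."""
    return score >= cutoff


# Hypothetical norm group of ten test-takers.
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
score = 85

print(round(z_score(score, norm_group), 2))   # norm-referenced: position relative to peers
print(percentile_rank(score, norm_group))     # norm-referenced: share of peers below
print(meets_criterion(score, cutoff=90))      # criterion-referenced: fixed standard
```

The same score of 85 sits above the norm-group mean yet fails the assumed criterion of 90, which shows that the two kinds of test answer different questions.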
What are examples of norm-referenced assessments?
Standardized admissions tests, including the Scholastic Assessment Test (SAT) and the American
College Testing (ACT).
Graduate admissions exams, including the Graduate Record Examination (GRE) and the Law School
Admission Test (LSAT).
Professional certification exams, including the National Council Licensure Examination (NCLEX) and
the United States Medical Licensing Examination (USMLE).
Placement and language proficiency tests, including the Test of English as a Foreign Language (TOEFL)
and the International English Language Testing System (IELTS).
Definition: In norm-referenced systems students are evaluated in relationship to one another (e.g.,
the top 10% of students receive an A, the next 30% a B, etc.). This grading system rests on the
assumption that the level of student performance will not vary much from class to class.
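The quota idea in this definition can be sketched in Python as follows. Only the shares for A (top 10%) and B (next 30%) come from the text; the C and D shares, the student names and the scores are hypothetical. The two sections illustrate what happens when the assumption of similar performance from class to class does not hold.

```python
def assign_by_quota(scores, quotas):
    """Rank students by score and hand out grades by fixed class shares.

    scores: dict mapping student name -> score
    quotas: list of (grade, fraction) pairs, best grade first, fractions summing to 1
    Ties are broken by the order in which students appear in `scores`.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0.0
    for grade, fraction in quotas:
        cumulative += fraction
        end = round(cumulative * n)  # number of students graded so far
        for name in ranked[start:end]:
            grades[name] = grade
        start = end
    return grades


# A and B shares follow the text; C and D shares are assumed for illustration.
QUOTAS = [("A", 0.10), ("B", 0.30), ("C", 0.40), ("D", 0.20)]

low_section = {"Ana": 88, "L2": 80, "L3": 78, "L4": 75, "L5": 72,
               "L6": 70, "L7": 68, "L8": 65, "L9": 60, "L10": 55}
high_section = {"H1": 96, "Ben": 88, "H3": 86, "H4": 84, "H5": 82,
                "H6": 80, "H7": 78, "H8": 75, "H9": 72, "H10": 70}

print(assign_by_quota(low_section, QUOTAS)["Ana"])   # 88 is the top score here
print(assign_by_quota(high_section, QUOTAS)["Ben"])  # 88 is only second best here
```

Ana and Ben both scored 88, but their grades differ because each is ranked only against their own section.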
GRADING SYSTEMS
The two most common types of grading systems used at the university level are
norm-referenced and criterion-referenced systems. [...] student performance.
1. Norm-Referenced Systems:
Definition: In norm-referenced systems students are evaluated in relationship to one
another (e.g., the top 10% of students receive an A, the next 30% a B, etc.). This
grading system rests on the assumption that the level of student performance will
not vary much from class to class. In this system the instructor usually determines
the percentage of students assigned each grade, although this percentage may be
Advantages:
• They work well in situations requiring rigid differentiation among students where,
for example, program size restrictions may limit the number of students
Disadvantages:
others. This may be true in a large non-selective lecture class, where we can be
fairly confident that the class is representative of the student population; but in
small classes (under 40) the group may not be a representative sample. One
student may get an A in a low-achieving section while a fellow student with the
same score in a higher-achieving section gets a B.
rather than cooperation. When students are pitted against each other for the few
Possible modification:
When using a norm-referenced system in a small class, the allocation of grades can
and Using Tests Effectively: A Guide for Faculty, 1992, describe the following ways to
use an anchor:
"If instructors have taught a class several times and have used the same or an
equivalent exam, then the distribution of test scores accumulated over many classes
can serve as the anchor. The present class is compared with this cumulative
distribution to judge the ability level of the group and the appropriate allocation of
grades. Anchoring also works well in multi-section courses where the same text,
same syllabus, and same examinations are used. The common examination can be
used to reveal whether and how the class groups differ in achievement and the grade
class for the first time and has no other scores for comparison, a relevant and well-constructed
teacher-made pretest may be used as an anchor."
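One possible reading of anchoring, sketched in Python: the grade shares are applied to the cumulative (anchor) distribution to obtain score cut-offs, and the present class is then graded against those cut-offs instead of against itself. This is an interpretation of the quoted passage, not a procedure it spells out. The anchor scores, the C and D shares and the function names are hypothetical.

```python
def anchor_cutoffs(anchor_scores, quotas):
    """Turn grade shares into minimum scores using the anchor distribution."""
    ranked = sorted(anchor_scores, reverse=True)
    n = len(ranked)
    cutoffs = []
    cumulative = 0.0
    for grade, fraction in quotas:
        cumulative += fraction
        # Index of the lowest anchor score that still earns this grade.
        last = max(round(cumulative * n), 1) - 1
        cutoffs.append((grade, ranked[last]))
    return cutoffs


def grade_with_anchor(score, cutoffs):
    """Grade one present-class score against the anchor cut-offs."""
    for grade, minimum in cutoffs:
        if score >= minimum:
            return grade
    return cutoffs[-1][0]  # below every cut-off: lowest grade in this sketch


# Hypothetical scores accumulated over earlier classes on an equivalent exam.
anchor = list(range(60, 100, 2))  # 60, 62, ..., 98
QUOTAS = [("A", 0.10), ("B", 0.30), ("C", 0.40), ("D", 0.20)]  # C and D assumed

cutoffs = anchor_cutoffs(anchor, QUOTAS)
small_strong_class = [97, 96, 90, 85, 70]
print(cutoffs)
print([grade_with_anchor(s, cutoffs) for s in small_strong_class])
```

In this sketch a small, strong class can earn more than 10% As, because its ability level is judged against the anchor distribution and not against its own five members.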
competition among students as they are not as directly in competition with each
other.
• What is the expected class size? If it is smaller than 40, do not use a norm-referenced system
unless anchoring is used to modify the system.
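2. Criterion-Referenced Systems:
Definition: In criterion-referenced systems students are evaluated against an
absolute scale (e.g., 95-100 = A, 88-94 = B, etc.). Normally the criteria are a set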
possible that all students could get As or all students could get Ds.
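A minimal Python sketch of grading against an absolute scale. The A and B cut-offs follow the example in the text; the C and D cut-offs and the class scores are hypothetical.

```python
# (grade, minimum score), best grade first.
# A and B follow the text's example; C and D are assumed for illustration.
SCALE = [("A", 95), ("B", 88), ("C", 80), ("D", 0)]


def criterion_grade(score, scale=SCALE):
    """Return the first grade whose minimum the score reaches."""
    for grade, minimum in scale:
        if score >= minimum:
            return grade
    return scale[-1][0]  # below every minimum: lowest grade in this sketch


strong_class = [96, 98, 100, 95]
print([criterion_grade(s) for s in strong_class])
```

Every student in this class receives an A, since no quota limits how many students may reach a given level.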
Advantages:
• Students are not competing with each other and are thus more likely to actively
help each other learn. A student's grade is not influenced by the caliber of the
class.
Disadvantages:
• It is difficult to set reasonable criteria for the students without a fair amount of
teaching experience. Most experienced faculty set these criteria based on their
knowledge of how students usually perform (thus making it fairly similar to the
Possible modifications:
telling the class in advance that the criteria may be lowered if it seems
appropriate, e.g., the 95% cut off for an A may be lowered to 93%. This way if a
first exam was more difficult for students than the instructor imagined, s/he can
lower the grading criteria rather than trying to compensate for the difficulty of
the first exam with an easy second exam. Raising the criteria because too many
assigning grades based on the extent the student achieved the class objectives
(e.g., A = Student has achieved all major and minor objectives of the course. B =
Student has achieved all major objectives and several minor objectives, etc.).
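The first modification above (lowering a pre-announced cut-off, e.g. from 95% to 93%) can be sketched in Python as below. The C and D cut-offs and the function names are hypothetical, and the sketch models only the lowering of criteria.

```python
def criterion_grade(score, scale):
    """Return the first grade whose minimum the score reaches."""
    for grade, minimum in scale:
        if score >= minimum:
            return grade
    return scale[-1][0]


def lower_cutoff(scale, grade, new_minimum):
    """Return a copy of the scale with one cut-off lowered.

    This sketch refuses to raise a cut-off, since only lowering was
    announced to the class in advance.
    """
    adjusted = []
    for g, minimum in scale:
        if g == grade:
            if new_minimum > minimum:
                raise ValueError("this sketch only lowers criteria")
            minimum = new_minimum
        adjusted.append((g, minimum))
    return adjusted


# A and B follow the text's example; C and D are assumed for illustration.
announced = [("A", 95), ("B", 88), ("C", 80), ("D", 0)]
adjusted = lower_cutoff(announced, "A", 93)

print(criterion_grade(94, announced))  # grade under the announced scale
print(criterion_grade(94, adjusted))   # grade after the A cut-off is lowered to 93
```

Returning a new scale keeps the announced criteria intact, so the original and adjusted grades can be compared side by side.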
• How will we determine reasonable criteria for students? When teaching the class
3. Other Systems: Some alternate systems of grading include contract grading, peer
objectives they can achieve, usually attaching a specified number of points for
each activity (e.g. book report = 30 points, term paper = 60 points). Students
select the activities and/or objectives that will give them the grade they want
evaluation of his/her performance. If students are told what to look for and how
to grade, they generally can do a good job. Agreement between peer and
instructor rating is about 80%. Peer grading is often used in composition classes
and speech classes. It can also be a useful source of information for evaluating
group work; knowing that group members have the opportunity to evaluate each
other’s work can go a long way in motivating peers to pull their weight on a
assessment can be a portion of the final grade. This method has educational
that the percentages of self-assessors whose grades agree with those of faculty
graders vary from 33% to 99%. Experienced students tend to rate themselves
quite similarly to the faculty while less experienced students generally give
themselves higher grades than a faculty grader. Students in science classes also
surprisingly, student and instructor assessments are more likely to agree if the
criteria for assessment have been clearly articulated. Without these shared
on the amount of work they put into a course, on the improvement they’ve seen
the instructor and student should meet to discuss the student's achievement