Lesson 3-1

The document discusses the quality of measurement, focusing on validity and reliability. It explains the importance of measurement in clinical settings, types of measurement errors, and various approaches to assessing reliability, including test-retest, rater, alternate forms, and internal consistency. Understanding these concepts is essential for ensuring accurate data collection and drawing valid conclusions in research and clinical practice.


Quality of Measurement

Quality of measures is evaluated in terms of validity and reliability.
Reliability of Measurements
By Dr. Eman Abdelmoez
Basic Science Department
Objectives

• To recognize the concept of measurement
• To understand the meaning of reliability
• To differentiate between different types of reliability
Measurement provides a mechanism for achieving a degree of precision, so that we can describe physical or behavioral characteristics according to their quantity, degree, capacity, or quality.
We can document that a patient's shoulder can flex to
75 degrees, rather than say motion is "limited," or
indicate that the air temperature is 95° F, rather than
just "hot."
We use measurement as a basis for choosing
between two courses of action. In this sense a
clinician might decide to implement one
treatment approach over another based on the
results of a comparative research study.
Clinicians use measurement as a means of
evaluating a patient's condition and response
to treatment; that is, we measure change or
progress. We also use measurements to
compare and discriminate between
individuals or groups.
For instance, a test can be used to distinguish
between children who do and do not have
learning disabilities or between different types
of learning disabilities.
• Finally, measurement allows us to draw
conclusions about the predictive
relationship between variables. We
might use grades on a college entrance
examination to predict a student's
ability to succeed in an academic
program.
• Also, We can measure the functional
status of an elderly patient to determine
the level of assistance that will be
required when the patient returns home.
What is Measurement?

• Measurement has been defined as the process of assigning numbers to variables to represent quantities of characteristics according to certain rules.
Measurement Errors
• All measurements, including those derived from questionnaires and physical tests, contain an error component.
• This means that values obtained from
measurements will differ from their true value
and are a product of the true value plus error:
Measurement Errors
• Measured value = true value + error

• The true value is the value that would be obtained under ideal conditions when using perfect measurement techniques.
Types of Measurement Error

There are two types of measurement error: systematic bias and random error.
Measurement Errors
• Both of these types of errors occur in any
measurement.
• Therefore understanding how they occur and
minimizing them as much as possible is
necessary to reduce total measurement error.
• Systematic bias refers to ‘predictable errors in measurement’ that occur in the same direction (under- or over-estimation) and have the same magnitude.
• For example, if active wrist extension was
measured on two occasions using a hand-held
goniometer in ten distal radius fracture
patients, systematic error would be present if
the second measurement was consistently
greater than the first, by a constant amount.
Systematic Error

• Systematic bias may occur for a variety of reasons:
• Fatigue after a hand strengthening programme could consistently influence measurements such that the second is always lower than the first.
• Alternatively, a learning effect, such as learning to extend the wrist with the fingers flexed, rather than in extension, could contribute to successive measurements being greater than the first.
Systematic Error
• Finally, the measurement instrument could
itself contribute to systematic bias.
• For example, electrogoniometers require
regular calibration to reduce the likelihood of
systematic measurement differences.
• Calibration is checking the accuracy
of a measurement instrument by
comparing it to reference standards
of known accuracy.
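
Below is a minimal sketch of such a calibration check, using hypothetical readings and an assumed tolerance (the values and the 1-degree threshold are illustrative, not from the lesson):

```python
# Compare instrument output against reference standards of known accuracy.
reference_angles = [0.0, 30.0, 60.0, 90.0]   # reference standards (degrees)
device_readings = [1.9, 32.2, 61.8, 92.1]    # hypothetical electrogoniometer output

# Offset of each reading from its reference value
offsets = [d - r for d, r in zip(device_readings, reference_angles)]
mean_offset = sum(offsets) / len(offsets)
print(f"Mean offset: {mean_offset:+.1f} degrees")

# A consistent non-zero offset suggests systematic bias; the tolerance
# below is an assumed threshold, not a published standard.
if abs(mean_offset) > 1.0:
    print("Recalibration recommended.")
```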
How to Reduce Systematic Error

• Knowledge of systematic bias properties can inform recommendations regarding:
• the number of ‘training’ trials required before measurement,
• the number of repetitions of the measurement,
• the period of time for recovery between measurements, and
• the frequency of equipment calibration.
Random Error
• Random error, in contrast to systematic bias, is not predictable and occurs owing to chance.
• Consequently, random errors differ in their
direction and magnitude between patients
and occasions of testing.
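
As a rough illustration, the following sketch simulates the model "measured value = true value + error" with a constant systematic bias and a chance (random) component; all numbers are hypothetical:

```python
import random

TRUE_VALUE = 60.0      # hypothetical true wrist extension, in degrees
SYSTEMATIC_BIAS = 5.0  # constant over-estimation: same direction and magnitude
RANDOM_SD = 2.0        # chance variation: differs in direction and magnitude

random.seed(1)  # fixed seed so the example is reproducible
for occasion in range(1, 4):
    random_error = random.gauss(0.0, RANDOM_SD)  # new value on every occasion
    measured = TRUE_VALUE + SYSTEMATIC_BIAS + random_error
    print(f"Occasion {occasion}: measured = {measured:.1f} degrees")
# The +5 degree systematic bias persists across occasions;
# the random component changes direction and magnitude each time.
```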
Sources of random error

■ Assessor errors associated with using the equipment (such as placing a goniometer on inappropriate bony landmarks)
■ Deviating from the measurement protocol (e.g., inconsistent patient position)
■ Patient distraction
Sources of Error

• Observer
• Instrument
• Individual
Reliability of Measurements
Reliability is the extent to which a
measurement is consistent and free from
error.
Reliability can be conceptualized as
reproducibility or dependability.
• A reliable examiner is one who will be
able to measure repeated outcomes with
consistent scores.
• Similarly, a reliable instrument is one that
will perform with predictable consistency
under determined conditions.
• Reliability is fundamental to all aspects of
measurement, because without it we
cannot have confidence in the data we
collect, nor can we draw logical
conclusions from those data.
TYPES OF RELIABILITY
Four general approaches to reliability testing:

• test-retest reliability
• rater reliability
• alternate forms reliability
• internal consistency
1-Test-Retest Reliability

• Test-retest reliability assesses the degree to which subjects' scores remain stable when the same test is administered to the same individuals on two or more occasions.
• This estimate can be obtained for a variety of testing tools, and is generally indicative of reliability in situations where raters are not involved, such as self-report survey instruments and physical and physiological measures with mechanical or digital readouts.
1-Test-Retest Reliability

If the test is reliable, the subject's score should be similar on multiple trials. In terms of reliability theory, the extent to which the scores vary is interpreted as measurement error.
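
A minimal sketch of how such an estimate might be computed, assuming hypothetical scores from six subjects measured on two occasions (requires Python 3.10+ for statistics.correlation):

```python
import statistics

trial_1 = [70.0, 75.0, 80.0, 65.0, 72.0, 78.0]  # hypothetical first-trial scores
trial_2 = [72.0, 74.0, 81.0, 66.0, 70.0, 79.0]  # same subjects, second trial

# Pearson correlation between trials is one simple test-retest estimate;
# in practice an intraclass correlation coefficient (ICC) is usually
# preferred because it also detects systematic shifts between trials.
r = statistics.correlation(trial_1, trial_2)
print(f"Test-retest correlation: r = {r:.2f}")
```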
1-Test-Retest Reliability

• Unfortunately, many variables do change over time.
• For example, a patient's self-assessment of pain may change between two testing sessions.
Test-Retest Intervals
• Because the stability of a response variable is
such a significant factor, the time interval
between tests must be considered carefully.
Intervals should be far enough apart to avoid
fatigue, learning, or memory effects, but close
enough to avoid real changes in the measured
variable.
• The primary criteria for choosing an appropriate
interval are the stability of the response variable
and the test's intended purpose.
Test-Retest Intervals

For example, if we were interested in the reproducibility of electromyographic measurements, it might be reasonable to test the patient on two occasions (time points) within one week.

Range of motion measurements can often be repeated within one day or even within a single session.
• Example: “Within-day test–retest reliability of the 6-min walk test in patients with pulmonary fibrosis.”
• Errors from the Subjects
• Carryover Effect
• A carryover effect, as its name implies, is an effect on the response that “carries over” from one condition to another.
• In other words, events in the first testing may
influence performance in the second testing.
• Errors from the Subjects
• Carryover Effect
• In most cases, the test–retest reliability is
inflated because the subjects still remember
what they answered the first time and thus
tend to give the same answer in the second
test.
• Errors from the Subjects
• Carryover Effect
• In some situations, the carryover effect is a learning effect, also known as the practice effect.
• In this case, the skill level of the subjects
improves because the first test provides them
an opportunity to practice.
Carryover and Testing Effects

Sometimes subjects are given a series of pretest trials to neutralize this effect, and data are collected only after performance has stabilized (a familiarization session).
Carryover and Testing Effects
A retest score can also be influenced by a
subject's effort to improve on the first
score.
This is especially relevant for variables such
as strength, where motivation plays an
important role. Researchers may not let
subjects know their first score to control
for this effect.
2-Rater Reliability

Many clinical measurements require that a human observer, or rater, be part of the measurement system.
In some cases, the rater is the actual
measuring instrument, such as in a manual
muscle test or joint mobility assessment.
In other situations, the rater must observe
performance and apply operational criteria to
subjective observations, as in a gait analysis or
functional assessment.
2-Rater Reliability
• Sometimes a test necessitates the
physical application of a tool, and the
rater becomes part of the instrument, as
in the use of a goniometer or taking of
blood pressure.
2-Rater Reliability
This aspect of reliability is of major importance
to the validity of any research study involving
testers, whether one individual does all the testing
or several testers are involved.
Data cannot be interpreted with confidence
unless those who collect, record and reduce the
data are reliable.
2-Rater Reliability

In many studies, raters undergo a period of training, so that techniques are standardized.
This is especially important when measuring devices are new or unfamiliar, or when subjective observations are used.
2-Rater Reliability

Even when raters are experienced, however, rater reliability should be documented as part of the research protocol.
2-Rater Reliability
• To establish rater reliability, the instrument and the response variable are considered stable, so that any differences between scores are attributed to rater error.
2-Rater Reliability

There are two forms of rater reliability:
a- Intra-rater reliability
b- Inter-rater reliability
A-Intrarater reliability

Intrarater reliability refers to the stability of data recorded by one individual across two or more trials.
A-Intrarater reliability

Reliability is best established with multiple trials (more than two).
• What is the difference between Test-Retest Reliability and Intra-rater Reliability?
Intrarater reliability
• In a test-retest situation, when a rater's
skill is relevant to the accuracy of the
test, intrarater reliability and test-retest
reliability are essentially the same
estimate.
• The effects of the rater and the test cannot be separated.
A-Intrarater reliability
• Rater Bias: We must also consider the possibility of bias when one rater takes two measurements. Raters can be influenced by their memory of the first score.
• This is most relevant in cases where human
observers use subjective criteria to rate
responses, but can operate in any situation
where a tester must read a score from an
instrument.
A-Intrarater reliability
• The most effective way to control for this
type of error is to blind the tester in
some way, so that the first score remains
unknown until after the second trial is
completed;
• however, as most clinical measurements
are observational, such a technique is
often unreasonable.
A-Intrarater reliability
• For instance, we could not blind a clinician to
measures of balance, function, muscle testing
or gait where the tester is an integral part of
the measurement system.
• The major protections against tester bias are
to develop grading criteria that are as
objective as possible, to train the testers in the
use of the instrument, and to document
reliability across raters.
B-Inter-rater Reliability
Interrater reliability concerns variation between
two or more raters who measure the same group
of subjects.
Even with detailed operational definitions and
equal skill, different raters are not always in
agreement about the quality or quantity of the
variable being assessed.
B-Inter-rater Reliability
• Interrater reliability is best assessed
when all raters are able to measure a
response during a single trial, where they
can observe a subject simultaneously and
independently.
• This eliminates true differences in scores
as a source of measurement error when
comparing raters' scores.
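
As a sketch of how agreement between raters scoring the same trial might be quantified, the following computes Cohen's kappa for hypothetical categorical gait ratings (the data and categories are illustrative, not from the lesson):

```python
from collections import Counter

# Two raters who observed the same six subjects simultaneously and independently
rater_a = ["normal", "impaired", "impaired", "normal", "normal", "impaired"]
rater_b = ["normal", "impaired", "normal", "normal", "normal", "impaired"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement

# Agreement expected by chance, from each rater's category proportions
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a | counts_b)

kappa = (observed - expected) / (1 - expected)  # chance-corrected agreement
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```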
B-Inter-rater Reliability

Videotapes of patients performing activities have proved useful for allowing multiple raters to observe the exact same performance.
Simultaneous scoring is not possible, however, for many variables that require interaction of the tester and subject.
B-Inter-rater Reliability

• For example, range of motion and manual muscle testing could not be tested simultaneously by two clinicians.
• With these types of measures, rater reliability may be affected if the true response changes from trial to trial.
• For instance, actual range of motion may
change if the joint tissues are stretched
from the first trial. Muscle force can
decrease if the muscle is fatigued from
the first trial.
B-Inter-rater Reliability
• If interrater reliability of measurement has not
been established, we cannot assume that
other raters would have obtained similar
results.
• This, in turn, limits the application of the
findings to other people and situations.
• Example: “Inter-rater and intra-rater reliability of Kinovea software for measurement of shoulder range of motion.”
B-Inter-rater Reliability

• Intrarater reliability should be established for each individual rater before comparing raters to each other.
3-Alternate Forms Reliability
Many measuring instruments exist in two or
more versions, called equivalent, parallel or
alternate forms. Interchange of these alternate
forms can be supported only by establishing
their parallel reliability.
3-Alternate Forms Reliability

Standardized tests, such as professional licensing exams or intelligence tests, are given several times a year, each time in a different form.
These different versions of the tests are
considered reliable alternatives based on their
statistical equivalence.
3-Alternate Forms Reliability

This type of reliability is established by administering two alternate forms of a test to the same group, usually in one sitting, and correlating paired observations.
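
A minimal sketch of that procedure, with hypothetical paired scores from the same group taking Form A and Form B in one sitting (Python 3.10+):

```python
import statistics

form_a = [82, 74, 91, 68, 77, 85]  # hypothetical Form A scores
form_b = [80, 76, 89, 70, 75, 86]  # same subjects, Form B

r = statistics.correlation(form_a, form_b)                # paired observations
mean_diff = statistics.mean(form_a) - statistics.mean(form_b)
print(f"r = {r:.2f}, mean difference = {mean_diff:+.1f}")
# A high correlation plus a near-zero mean difference supports treating
# the two forms as statistically equivalent.
```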
• Example: Ittenbach RF, Huang G, Barber Foss KD, Hewett TE, Myer GD. Reliability and Validity of the Anterior Knee Pain Scale: Applications for Use as an Epidemiologic Screener. PLoS One. 2016 Jul 21;11(7):e0159204.
Instrument
• The Kujala AKPS is a 13-item screening
instrument designed to assess patellofemoral
pain in adolescents and young adults, with a
variable ordinal response format.
• For example, a ‘Limp’ score would be scored
as follows: none (5), slight/periodic (3),
constant (0). Total scores range from 0 to 100.
• Myer et al. have offered a 6-item short form based on simplified, dichotomously recoded items. As such, the ‘Limp’ item would be recoded as: none (0), slight/periodic/constant (1).
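
The recoding described above can be sketched as follows (the score mappings are taken from the text; the function itself is illustrative):

```python
ORDINAL_LIMP = {"none": 5, "slight/periodic": 3, "constant": 0}  # 13-item form

def recode_limp(response: str) -> int:
    """Dichotomous recoding of the 'Limp' item for the 6-item short form."""
    return 0 if response == "none" else 1  # none -> 0; any limp -> 1

for response, ordinal in ORDINAL_LIMP.items():
    print(f"{response}: ordinal {ordinal} -> dichotomous {recode_limp(response)}")
```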
• Conclusion: Current AKPS data using the reduced 6-item form appear to offer reliability indices highly similar to those of the original, longer 13-item form, whether the ordinal or the dichotomized response format is considered.
4-Internal Consistency
Internal consistency assesses the correlation
between multiple items in a test that are
intended to measure the same concept.
You can calculate internal consistency without
repeating the test or involving other
researchers, so it’s a good way of assessing
reliability when you only have one data set.
4-Internal Consistency

• Internal consistency or homogeneity reflects the extent to which items measure various aspects of the same characteristic and nothing else.
4-Internal Consistency
To measure patient satisfaction, you could
create a questionnaire with a set of statements
that respondents must agree or disagree with.
Internal consistency tells you whether the
statements are all reliable indicators of patient
satisfaction.
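
A common index of internal consistency is Cronbach's alpha. Below is a minimal sketch using hypothetical responses (five respondents answering a four-item satisfaction scale rated 1-5):

```python
import statistics

# Rows are respondents; columns are the four questionnaire items
items = [
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [4, 4, 4, 5],
]

k = len(items[0])                                             # number of items
item_vars = [statistics.variance(col) for col in zip(*items)]  # per-item variance
total_var = statistics.variance([sum(row) for row in items])   # variance of totals

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```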
4-Internal Consistency

Survey instruments, such as questionnaires, written examinations, and interviews, are ideally composed of a set of questions or items designed to measure a particular body of knowledge or characteristic.
4-Internal Consistency

• For example, if a professor gives an exam to assess students' knowledge of research design, the items should reflect a summary of that knowledge; the test should not include items on anthropology or health policy.
Internal Consistency

If we assess a patient's ability to perform daily tasks using a physical function scale, then the items on the scale should relate to aspects of physical function only. If some items evaluated psychological or social characteristics, then the items would not be considered homogeneous.
• Reliability tells you how consistently a method
measures something. When you apply the
same method to the same sample under the
same conditions, you should get the same
results.
• If not, the method of measurement may be unreliable or biased.
• There are four main types of reliability.
• Each can be estimated by comparing different
sets of results produced by the same method.
Telerehabilitation for People With Physical
Disabilities and Movement Impairment: A
Survey of United Kingdom Practitioners
• A cross-sectional online survey was conducted.
• The questionnaire included a combination of
closed response (tick box, multiple choice) and
open response (free text) questions.
• A combination of opportunity and snowball sampling was used; potential participants were identified from contacts and networks of the research team, and these participants were in turn asked to forward the survey to other potential participants.
Question 1:
• This study is considered:
A- Quantitative research
B- Qualitative research
C- Mixed-method research
Question 2 (T/F):
• This study is an example of follow-up research.

Question 3 (T/F):
• In this study, a probability sampling method was used.
• The reliability of this questionnaire was assessed
by asking participants to repeat the
questionnaire two weeks after the first attempt.

Question:
• What type of reliability was examined?
