
Module 3 – Measurement

Learning Objectives:
- Understand/identify different types of outcome measures and their limits
- Define different measurement properties
- Describe methods to evaluate different types of validity, reliability, sensitivity to change
and responsiveness
- Design a study to evaluate the measurement properties of an outcome measure

Measuring Health

What is health?
- According to the WHO, it is a state of complete physical, mental, and social well-being, and not merely the absence of disease or infirmity
- It is a multi-faceted concept influenced by a person’s experiences, beliefs, expectations,
and perceptions
- It means different things to different people

ICF Model of Health


- ICF = International Classification of Functioning, Disability, and Health
- Model meant to standardize communication about health
- Health outcomes are classified according to the effect on body function and structure (impairment; includes mental health items), limitations in activities (disability), and restrictions in participation (handicap)
- Modifiers of these outcomes are: age, coping strategies, social attitudes, education,
experience

Measuring Health in Research


- Using the ICF as a guide, choose several outcome measures that can speak to the specific aspect(s) of health being affected by the intervention
- Use a QoL questionnaire whose items include specific aspects of health that patients have deemed important and relevant to their disease
o Not all questions are equally valued; some are more important than others
- Good outcome measures for a study have good measurement properties AND are well-known/commonly used (ease of interpretation by others)
- Issue: too many independent outcomes = potential for a multiple-comparisons error (see the sketch below)
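
To make the multiple-comparisons point concrete, here is a minimal sketch (hypothetical numbers, not from the module) of how the familywise error rate grows with the number of independent outcomes tested at alpha = 0.05, and the Bonferroni-adjusted per-test threshold that compensates:

```python
# Sketch: familywise error rate vs. number of independent outcomes,
# and the Bonferroni-corrected per-test alpha. Hypothetical numbers.

alpha = 0.05  # nominal significance level per test

for n_outcomes in (1, 3, 5, 10):
    # P(at least one false positive) if all tests are independent and null
    familywise = 1 - (1 - alpha) ** n_outcomes
    bonferroni = alpha / n_outcomes  # corrected per-test threshold
    print(f"{n_outcomes:2d} outcomes: "
          f"familywise error = {familywise:.2f}, "
          f"Bonferroni alpha = {bonferroni:.4f}")
```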

Types of Outcome Measures

Predictive Outcome Measure


- An instrument/device/method that predicts a future outcome
o E.g. MCAT predicts who is likely to perform well on licensing exam
o E.g. following an acute injury, predict who is likely to become chronic
- Design a predictive instrument using prognosis design
- Evaluate predictive validity using diagnosis design

Discriminative Outcome Measure


- An instrument/device/method that sorts individuals into groups
o E.g. x-ray (fracture present or absent)
- Evaluate validity using diagnosis design
Evaluative Outcome Measure
- An instrument/device/method that provides data on the quantity/quality of the result of
the experiment
- It is a basis for measuring the effects of the independent variable or change in dependent
variable
o E.g. pain measured pre- and post-intervention
- Evaluate using longitudinal construct validity and sensitivity to change

Types of Evaluative Measures


- Surrogate Outcomes
- Patient Important Outcomes

Surrogate Outcomes
- Outcome measures that are not of direct practical importance to patients but are believed
to reflect outcomes that are important.
- Validity depends on the magnitude of the association b/w the surrogate and the patient-important outcome (i.e. its predictive validity)
o E.g. reduction in cholesterol as a surrogate for reduction in mortality
o E.g. increased bone density as a surrogate for reduction in fracture incidence
- We use these outcomes b/c of their efficiency; changes can be measured in all patients over a shorter time interval

Patient Important Outcomes


- Outcome measures that are of direct importance to patients
o E.g. death/survival, success/failure, patient-reported QoL
- Advantage: validity
- Disadvantage: long time interval needed to measure

Types of QoL

Measurement Properties

Validity and Reliability


- Validity (accuracy) is a measure of how close a measurement comes to the true score for
a variable
- Reliability (precision) is a measure of the extent to which repeated measurements come
up with the same value
- All outcome measures need to demonstrate validity and reliability
- Exception: evaluative measures only need to show responsiveness

Validity vs. Reliability


- Improve the precision of an estimate by increasing the number of measurements taken and averaging them
o Reduces the level of random error and narrows the CI about the value being estimated
- Increasing precision when the experiment contains systematic error is not the solution
o Solution: calibration of the instrument

Validity: the extent to which an instrument measures what it is intended to measure

Types:
- Face: the extent to which a measurement instrument appears to measure what it is
intended to measure.
- Content: the extent to which a measurement instrument represents all facets of a given
social construct.
- Criterion: examines the extent to which a measure provides results that are consistent
with a gold standard.
o Predictive: compares the measure in question with an outcome assessed at a later
time.
o Concurrent: comparison between the measure in question and an outcome assessed at
the same time.
- Construct: forming theories about the attribute of interest and then assessing the extent to
which the measure under investigation provides results that are consistent with the
theories.
o Convergent: tests the degree to which two measures of constructs that theoretically should be related are, in fact, related
o Divergent: tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated (see the sketch below)
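
A minimal sketch of how convergent and divergent validity might be checked in practice, using hypothetical scores (the instruments and numbers below are illustrative only): correlate the new measure with a theoretically related scale and with a theoretically unrelated one.

```python
import numpy as np

# Hypothetical scores from 8 patients on three instruments.
new_measure   = np.array([12, 15, 9, 20, 17, 11, 14, 18], dtype=float)
related_scale = np.array([30, 34, 25, 45, 40, 28, 33, 42], dtype=float)  # same construct
unrelated     = np.array([ 3,  1,  4,  2,  5,  3,  1,  4], dtype=float)  # different construct

# Pearson correlation is appropriate here: construct validity is about
# association, unlike reliability, which is about agreement.
r_convergent = np.corrcoef(new_measure, related_scale)[0, 1]
r_divergent  = np.corrcoef(new_measure, unrelated)[0, 1]

print(f"convergent r = {r_convergent:.2f} (expect high)")
print(f"divergent  r = {r_divergent:.2f} (expect near zero)")
```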

Study Designs: Validity


- In a known-groups design, one group is known to have the disease and the other is known not to; a valid measure should differentiate b/w them

Reliability: the extent to which an instrument yields the same results in repeated
administrations in a stable population

Study Designs: Reliability


- All require the disease to be in a stable state; measurements are repeated at least twice
- Test-retest: assumes the rater and disease are consistent and evaluates the reproducibility of the test (the patient performs the test more than once)
- Inter-rater: the extent to which 2 or more raters are able to consistently differentiate subjects with higher and lower values on an underlying trait
o Assumes the test and disease are consistent and evaluates the reproducibility b/w different raters (2+ raters observe or rate the same clients)
- Intra-rater: the extent to which a rater is able to consistently differentiate participants with higher and lower values of an underlying trait on repeated ratings over time
o Assumes the test and disease are consistent and evaluates the reproducibility of one rater over time (the same rater rates the same clients on repeated occasions)

Statistics to Communicate Reliability

Relative Reliability:
- Reliability = measuring agreement, NOT association
- Cannot use Pearson/Spearman correlations to demonstrate reliability b/c they are measures of association and do not account for systematic differences b/w measures
- However, both the Intra-class Correlation Coefficient (ICC) and Kappa do account for this (measures of agreement)
- Ideal value = 1
- Measures that are highly associated but systematically different will have a correlation coefficient that is larger than the agreement statistic (see the sketch below)
- Measures that are highly associated without a systematic difference will have similar values for the correlation coefficient and the agreement statistic
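
The association-vs-agreement distinction can be demonstrated with a toy example, assuming a constant 5-point offset b/w two hypothetical raters: the Pearson correlation stays at 1.0 even though the raters never give the same score.

```python
import numpy as np

rater_a = np.array([10., 12., 14., 16., 18.])
rater_b = rater_a + 5.0  # rater B scores systematically 5 points higher

r = np.corrcoef(rater_a, rater_b)[0, 1]
mean_diff = np.mean(rater_b - rater_a)

# Perfect association (r = 1.0) despite zero exact agreement:
print(f"Pearson r = {r:.2f}, mean systematic difference = {mean_diff:.1f}")
```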

ICC:
- Is a measure of reproducibility that compares the variance b/w patients to the total variance (b/w-patient plus within-patient variance)
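
A minimal sketch of a one-way random-effects ICC (ICC(1,1) in the Shrout–Fleiss notation) computed from these variance components, assuming hypothetical test-retest data with two measurements per patient:

```python
import numpy as np

# Rows = patients, columns = repeated measurements (hypothetical data).
scores = np.array([
    [42., 44.],
    [55., 53.],
    [60., 62.],
    [38., 37.],
    [50., 49.],
])
n, k = scores.shape

grand_mean = scores.mean()
subj_means = scores.mean(axis=1)

# One-way ANOVA decomposition.
ms_between = k * np.sum((subj_means - grand_mean) ** 2) / (n - 1)
ms_within = np.sum((scores - subj_means[:, None]) ** 2) / (n * (k - 1))

# ICC(1,1): b/w-patient variance relative to total variance.
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc:.3f}")  # 1.0 = perfect reproducibility
```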

Kappa
- Is a measure of the extent to which observers achieve agreement beyond the level expected to occur by chance alone
- Used for categorical (e.g. binary) outcome variables; ranges from 0 (agreement no better than chance) to 1 (perfect agreement)
- The more discordant the raters are, the lower the value of Kappa
- The weighted Kappa is for ordered categories
o Discordant ratings that are further apart lower the weighted Kappa more
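
A minimal sketch of Cohen's Kappa for two raters and a binary outcome, using hypothetical ratings: observed agreement is corrected for the agreement expected by chance alone.

```python
import numpy as np

# Hypothetical binary ratings (1 = disease present) from two raters.
rater_a = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
rater_b = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

p_observed = np.mean(rater_a == rater_b)  # proportion of exact agreement

# Chance agreement: probability both say 1 plus probability both say 0.
p_a1, p_b1 = rater_a.mean(), rater_b.mean()
p_expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed = {p_observed:.2f}, chance = {p_expected:.2f}, kappa = {kappa:.2f}")
```
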
Absolute Reliability: Precision – Individual Score
- Standard Error of Measurement (SEM) is a statistic for absolute reliability and is calculated from a test-retest reliability study design
- SEM allows us to determine how certain we can be about a particular individual's score at a particular point in time
- SEM = √(within-client variance)
- Ideally 0
- A clinician can be x% confident (x defined by the confidence level chosen) that the true score lies within the reported interval
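
A minimal sketch of the SEM from a test-retest design, assuming hypothetical paired trials: with two trials per client, the within-client variance can be estimated from the trial-to-trial differences, and a confidence interval is placed around an individual's observed score.

```python
import numpy as np

# Hypothetical test-retest scores (two trials per client).
trial_1 = np.array([42., 55., 60., 38., 50.])
trial_2 = np.array([44., 53., 62., 37., 49.])

# With two trials, within-client variance = sum(d^2) / (2n).
diffs = trial_1 - trial_2
within_var = np.sum(diffs ** 2) / (2 * len(diffs))
sem = np.sqrt(within_var)  # SEM = sqrt(within-client variance)

# 95% CI around one individual's observed score.
score = 48.0
z = 1.96
print(f"SEM = {sem:.2f}; true score 95% CI: "
      f"{score - z * sem:.1f} to {score + z * sem:.1f}")
```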

Absolute Reliability: Real Change or Error?


- We can use the SEM to determine if there has been a real change in score over time
- We can be x% confident that a true change has occurred (as opposed to random error within the measurement) if the change exceeds the reported interval, known as the Minimal Detectable Change/Difference (MDC/D)
- MDC(x) = SEM × z(x) × √2 (e.g. MDC95 = 1.96 × √2 × SEM)
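
Continuing the SEM example, a minimal sketch of the MDC formula above (the SEM value is hypothetical; √2 appears b/c a change score carries measurement error from two occasions):

```python
import math

sem = 1.18   # SEM from a test-retest study (hypothetical value)
z_95 = 1.96  # z-score for 95% confidence

# MDC95 = SEM * z * sqrt(2): error from two measurement occasions.
mdc_95 = sem * z_95 * math.sqrt(2)
print(f"MDC95 = {mdc_95:.2f}; smaller changes may just be random error")
```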

Sensitivity to Change
- Is the ability to detect change that is not necessarily meaningful change
- Many statistics exist for expressing this; the Standardized Response Mean (SRM) is the most common
- Study design: in a population expected to change, administer the new test pre- and post-change
- SRM = (mean change) / (SD of change)
- If SRM > 1, the 'signal' (change) can be detected over the 'noise' (variability)
- Signal = change that occurred from pre- to post-treatment
- Noise = all systematic and random errors
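
A minimal sketch of the SRM under the study design just described, assuming hypothetical pre/post scores in a population expected to change:

```python
import numpy as np

# Hypothetical scores before and after an intervention expected to help.
pre  = np.array([30., 42., 35., 28., 39., 33.])
post = np.array([38., 47., 44., 33., 49., 40.])

change = post - pre
srm = change.mean() / change.std(ddof=1)  # SRM = mean change / SD of change

# SRM > 1: the change 'signal' exceeds the 'noise' of its variability.
print(f"SRM = {srm:.2f}")
```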

Responsiveness

Responsiveness: the instrument's ability to detect a clinically meaningful change


- Statistic: Minimal Clinically Important Difference (MCID)
- Sensitivity to change is a necessary but insufficient condition for responsiveness
- NOTE: using the wrong MCID has important implications for sample size
o Within-group: within a treatment group, every patient changes from pre- to post-treatment
o B/w-group: the difference we want to detect in a study evaluating two different treatments; the more similar the treatments, the smaller the expected difference b/w the groups
o A b/w-group MCID is approx. 20% of a within-group MCID

Anchor-Based Approach
- A way to establish the interpretability of measures of patient-reported outcomes
- All patients are measured at Time 1 and Time 2
- B/w these times, provide an intervention that usually produces some improvement
- At Time 2, the anchor is included = a Global Rating of Change (GRC) questionnaire
o The patient indicates how much better/worse they feel compared to Time 1
o Calculate the average change score of all patients who indicated a small but important change on the GRC (score of 2 or 3); this represents the within-group MCID for that instrument (see the sketch below)
- If the magnitudes of change in the 'better' and 'worse' groups are different, then averaging the scores is not valid
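
A minimal sketch of the anchor-based calculation, assuming hypothetical change scores paired with GRC ratings: the within-group MCID is the mean change among patients who reported a small but important improvement (GRC of 2 or 3).

```python
import numpy as np

# Hypothetical Time 2 - Time 1 change scores and GRC ratings per patient
# (GRC: 0 = no change, higher = more improvement).
change = np.array([1.0, 4.5, 3.8, 0.5, 6.0, 4.1, 2.2, 9.0])
grc    = np.array([0,   2,   3,   1,   5,   2,   3,   6  ])

# Patients reporting a small but important change (GRC of 2 or 3).
small_important = (grc == 2) | (grc == 3)
mcid_within = change[small_important].mean()

print(f"within-group MCID = {mcid_within:.2f}")
```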

Distribution-Based Approach

- Approach 1
o Measure the outcome at two time points in individuals not expected to change
o Calculate change scores for every participant and plot their distribution
o Choose a threshold (MCID) for classifying an individual as not having changed by an important amount

- Approach 2
o Measure the outcome at two time points in individuals expected to change by an important amount
o Calculate change scores for every participant and plot their distribution
o Choose a threshold (MCID) for classifying an individual as having changed by an important amount

- The score at the cut-off is the within-group MCID for that instrument
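
A minimal sketch of Approach 1, assuming hypothetical change scores in a stable group and one common convention (not specified in these notes) of taking the 95th percentile of the stable group's absolute change scores as the cut-off:

```python
import numpy as np

# Hypothetical change scores in individuals NOT expected to change (Approach 1).
stable_change = np.array([0.5, -1.0, 1.2, 0.0, -0.8, 1.5, 0.3, -0.4, 0.9, -1.1])

# One common convention: a change beyond the 95th percentile of the stable
# group's distribution is unlikely to be measurement noise alone.
cutoff = np.percentile(np.abs(stable_change), 95)
print(f"threshold (within-group MCID estimate) = {cutoff:.2f}")
```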

Self-Assessment
