Early Prediction of Poststroke Rehabilitation Outcomes Using Wearable Sensors
Downloaded from https://academic.oup.com/ptj/advance-article/doi/10.1093/ptj/pzad183/7505420 by guest on 11 January 2024
Revised Date: November 13, 2023
Accepted Date: December 3, 2023
Authors: Megan K. O’Brien, PhD1,2*; Francesco Lanotte, PhD1,2*; Rushmin Khazanchi, BA3*; Sung Yul
Shin, PhD1,2; Richard L. Lieber, PhD2,4,5; Roozbeh Ghaffari, PhD4,6; John A. Rogers, PhD4,6,7,8; Arun
Jayaraman, PT, PhD1,2†
* These authors should be considered co-first authors
† Corresponding Author
1 Max Nader Lab for Rehabilitation Technologies and Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL, USA
2 Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, USA
3 Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
The Author(s) 2024. Published by Oxford University Press on behalf of the American Physical
Therapy Association.
4 Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
5 Shirley Ryan AbilityLab, Chicago, IL, USA
6 Querrey Simpson Institute for Bioelectronics, Northwestern University, Evanston, IL, USA
7 Department of Neurological Surgery, Northwestern University Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Address all correspondence to Dr Jayaraman at: [email protected].
Keywords: Outcome Assessment (Health Care); Gait; Balance; Biomedical Engineering; Decision
Making: Computer-Assisted; Technology Assessment: Biomedical; Inpatients; Patient Care Planning;
Rehabilitation; Prognosis

Abstract
Objectives. Inpatient rehabilitation represents a critical setting for stroke treatment, providing
intensive, targeted therapy and task-specific practice to minimize a patient’s functional deficits
and facilitate their reintegration into the community. However, impairment and recovery vary
greatly after stroke, making it difficult to predict a patient’s future outcomes or response to
treatment. In this study, the authors examined the value of early-stage wearable sensor data to
predict 3 functional outcomes (ambulation, independence, and risk of falling) at rehabilitation
discharge.
Methods. Fifty-five individuals undergoing inpatient stroke rehabilitation participated in this
study. Supervised machine learning classifiers were retrospectively trained to predict discharge
outcomes using data collected at hospital admission, including patient information, functional
assessment scores, and inertial sensor data from the lower limbs during gait and/or balance tasks.
Model performance was compared across different data combinations and benchmarked against a
Results. For patients who were ambulatory at admission, sensor data improved predictions of
ambulation and risk of falling (with weighted F1-scores increasing by 19.6% and 23.4%,
benchmark model without sensor data. The best-performing sensor-based models predicted
discharge ambulation (community vs. household), independence (high vs. low), and risk of falling
(normal vs. high) with accuracies of 84.4%, 68.8%, and 65.9%, respectively. Most
misclassifications occurred with admission or discharge scores near the classification boundary.
For patients who were non-ambulatory at admission, sensor data recorded during simple balance
tasks did not offer predictive value over the benchmark models.
Conclusion. These findings support the continued investigation of wearable sensors as an
Impact. Accurate, early prediction of poststroke rehabilitation outcomes from wearable sensors
would improve our ability to deliver personalized, effective care and discharge planning in the
inpatient setting and beyond.

Introduction
Stroke is a leading cause of disability worldwide.1 Following initial treatment, many stroke
survivors are admitted to an inpatient rehabilitation facility (IRF) for ongoing medical care and
targeted, intensive, multidisciplinary therapy in the early stages after stroke. A primary goal of IRF
rehabilitation is to maximize neural and functional recovery to help patients reintegrate into the
community upon discharge.2 However, not all individuals have the same potential for recovery.
Patients achieve widely varying levels of function after initial treatment, with some returning to
pre-morbid function and others retaining severe deficits that require additional short- or long-term
care.3

Starting at IRF admission, clinicians must plan when the patient will be discharged from the
hospital, where they can be safely discharged (ie, to their home with or without caregiver
assistance, or to a skilled nursing facility for ongoing rehabilitative care), and how to structure
therapy to optimize a patient’s overall discharge disposition. In the US, the average IRF length of
stay has decreased to 12.9 days for patients with Medicare,4 giving clinicians, patients, and families
only a brief window to design short-term care strategies and post-discharge plans suited to the
patient’s needs (eg, seeking and training caregivers, making home modifications or alternative
living arrangements, ordering assistive devices). Early, objective, and accurate predictions of a
patient’s functional recovery would help clinicians, patients, and families plan appropriate

Numerous research models have been proposed to predict stroke recovery.5,6 Many of these models
exclusively use information available from electronic medical records (EMRs), including patient
demographics and clinical information.7-9 While such models lend themselves to simple and
relatively undemanding clinical implementation, their resolution may not detect subtle differences
between patients, leading more often to rules of thumb about recovery rather than predicting specific
stimulation or brain imaging, could improve prediction resolution and accuracy,10-13 but these
measures are costly and not often available in rehabilitation settings, posing barriers to clinical
uptake.

Non-invasive wearable sensors show promise for capturing biomarkers of disease and recovery
by mining patterns from continuous, high-resolution physiological or behavioral data.14,15 We
previously demonstrated that data from inertial measurement units (IMUs), recorded during a brief
walking bout within a week of IRF admission, improved prediction of ambulation ability at
discharge compared to traditional functional assessments and other patient descriptors.16 However,
a patient’s discharge disposition depends on different abilities, such as navigating their home
environment and performing activities of daily living safely and independently. Therefore, we
propose 3 functional outcomes for prediction models that may be considered broadly
representative of these attributes: the 10-Meter Walk Test (10MWT; ambulation), Functional
Independence Measure score (FIM; independence, specifically related to motor tasks), and the
Berg Balance Scale (BBS; risk of falling). To enhance the clinical value of model predictions, we
moderate-to-severe impairment. Finally, while in our previous work we used sensor data solely
from walking tasks, here the recorded activities also encompassed simple balance tasks.
Consequently, incorporating a non-ambulatory population into our approach expands our insights
into the potential of sensor-based prediction models for a broader range of patients and IMU data.

The objectives of the present study were to expand our early-stage prognostic models to predict 3
poststroke functional outcomes (ambulation, independence, and risk of falling) at IRF discharge
for both patients who are ambulatory and patients who are non-ambulatory using data recorded at
admission, and to evaluate the ability of IMU data to predict each of these 3 outcomes. We
hypothesized that incorporating lower-limb IMU data would improve the prediction of discharge
outcomes relative to models trained on clinician-scored functional assessments and demographic
and clinical patient information alone.

Materials and Methods
Participants
Fifty-five patients were recruited from the inpatient rehabilitation unit of the Shirley Ryan
AbilityLab (Chicago, IL, USA). Inclusion criteria were having a primary diagnosis of stroke, being
aged at least 18 years, and being able and willing to give consent and follow study directions. Exclusion
powered, implanted cardiac device for monitoring or supporting heart function. Medical clearance
was obtained from the primary physician prior to participation. All individuals (or a proxy)
provided written informed consent, and the study was approved by the Institutional Review Board

Experimental Protocol
Data were collected from patients at 2 timepoints: within one week of IRF admission and within
one week prior to discharge. At each timepoint, participants completed a series of standardized
functional assessments, including the 10MWT, BBS, 6-Minute Walk Test (6MWT), and Timed
“Up & Go” test (TUG). FIM scores were extracted from the patient’s EMR at each timepoint. All
assessments were administered and scored by a licensed physical therapist. Assessments that could
not be completed were scored as zero. Patient information, including demographics, pre-morbid
activity level, and stroke characteristics, was obtained from the EMR and a study intake form.

Sensor data were collected from 3 flexible, wireless IMUs (BioStampRC; MC10, Inc., Cambridge,
MA) during the functional assessments. These devices were attached to the lumbar region (L4-L5
level) and each ankle (proximal to the lateral malleolus, along the mid-sagittal line), using an
adhesive film (Tegaderm; 3M, St. Paul, MN, USA). They recorded triaxial signals from an
accelerometer (sensitivity ±4g) and a gyroscope (sensitivity ±2000°/s) sampled at 31.25 Hz.

We divided participants into 2 groups based on their walking status at IRF admission. Patients who
were ambulatory (N = 43) were individuals who could complete at least one walking assessment
at admission (10MWT, 6MWT, or TUG) with no more than moderate assistance from a physical
therapist. Patients who were unable to complete any of the walking assessments at admission were

To establish a simple yet inclusive set of physical activities to capture potential biomarkers of
recovery across these 2 groups, we narrowed the sensor analysis to a single walking task that could
be completed by most participants who are ambulatory, and a series of non-ambulatory tasks that

For the walking task, we selected a single trial of the 10MWT at self-selected velocity, which we
previously found to be predictive of ambulation discharge outcomes among individuals who are
ambulatory.16 In our present dataset, 33 patients who were ambulatory had IMU data during the
10MWT (Fig. 1).

For the non-ambulatory tasks, we selected the first 4 items of the BBS (standing unsupported for
up to 2 minutes, sitting unsupported for up to 2 minutes, stand-to-sit transition, and sit-to-stand
transition), which are among the least demanding and had a high completion rate among all patients
(Suppl. Fig. 1). In our dataset, 8 patients who were non-ambulatory and 42 patients who were
ambulatory had IMU data during these 4 tasks (Fig. 1).

Annotated sensor data for each task were cleaned by removing duplicate timestamps and
resampling to the expected sampling frequency (31.25 Hz) using spline interpolation. Data
processing, filtering, and subsequent feature extraction were completed in MATLAB R2017b

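As a rough illustration of the cleaning step described above, the following Python sketch drops duplicate timestamps and resamples one signal channel onto a uniform 31.25 Hz grid. It is not the authors' MATLAB pipeline: the function names are hypothetical, the input is assumed to be sorted by time with at least 2 distinct timestamps, and linear interpolation is used as a simpler stand-in for the spline interpolation reported in the study.

```python
from bisect import bisect_right

FS_HZ = 31.25  # expected sampling frequency reported in the study


def drop_duplicate_timestamps(times, values):
    """Keep the first sample for each timestamp (input assumed sorted by time)."""
    clean_t, clean_v = [], []
    for t, v in zip(times, values):
        if not clean_t or t != clean_t[-1]:
            clean_t.append(t)
            clean_v.append(v)
    return clean_t, clean_v


def resample_uniform(times, values, fs_hz=FS_HZ):
    """Resample onto a uniform grid using linear interpolation.

    Assumption: linear interpolation stands in for the spline used in the study.
    """
    times, values = drop_duplicate_timestamps(times, values)
    step = 1.0 / fs_hz
    n_out = int((times[-1] - times[0]) / step + 1e-9) + 1
    grid = [times[0] + k * step for k in range(n_out)]
    out = []
    for t in grid:
        # Index of the segment [times[i], times[i + 1]] that contains t
        i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[i] + w * (values[i + 1] - values[i]))
    return grid, out
```

In practice each accelerometer and gyroscope axis would be passed through the same routine so that all channels share one time base before feature extraction.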
Features are measurable, independent variables used as input to a machine learning algorithm to
make predictions. Three feature categories were defined in this study: patient information (PI),
functional assessment scores (FA), and wearable sensor (IMU) data. To reduce dimensionality of
the feature space and increase robustness to sensor placement, IMU features were computed from
the Euclidean norm of the triaxial accelerometer and gyroscope signals. IMU features for the BBS
were supplemented with measures of postural sway, computed from the mediolateral and
anteroposterior axes of the lumbar sensor (Suppl. Tab. 1). We applied one-hot encoding to
categorical variables to prevent ordinality issues. Supplementary Table 2 summarizes
characteristics of the PI and FA features for our different training and testing datasets.

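Two of the preprocessing ideas in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the helper names, sample values, and the category labels are all hypothetical.

```python
import math


def euclidean_norm(x, y, z):
    """Collapse triaxial samples into one magnitude signal per time point."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def one_hot(values, categories):
    """Encode each categorical value as a 0/1 indicator vector over `categories`."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical accelerometer samples (in g) from one sensor
acc_x, acc_y, acc_z = [0.0, 3.0], [0.0, 4.0], [1.0, 0.0]
acc_norm = euclidean_norm(acc_x, acc_y, acc_z)  # [1.0, 5.0]

# Hypothetical categorical patient-information variable
side = one_hot(["left", "right", "left"], categories=["left", "right"])
# [[1, 0], [0, 1], [1, 0]]
```

Because the norm does not change when the sensor axes are rotated, features derived from it depend less on how the device was oriented on the body, and it replaces three signals per sensor with one. One-hot encoding avoids implying an order among categories that have none.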
Combinations of these feature categories were used to train prediction models, creating 3 different
types of models for comparison: a benchmark model (PI+FA, no sensor data) including both
patient information and functional assessments, a streamlined sensor model (PI+IMU) including
easily obtained patient information, and a comprehensive model (PI+FA+IMU) including all
feature types. The PI+FA benchmark served as a comparative point of reference to determine the
impact of sensor data on predicting each discharge outcome.

We trained separate supervised learning classifiers to predict 3 different discharge outcomes:
ambulation, independence, and risk of falling (Fig. 2). For each outcome, we defined 2 classes of
patient function at discharge; namely, household vs. community ambulators (based on 10MWT
score17,18), low vs. high independence (based on FIM motor sub-score19,20), and high vs. normal

Classifiers were developed using the Scikit-Learn (0.23.2) library in Python 3.8.8. We selected
L1-penalized logistic regression (L1-LR) given its ability to handle the high dimensionality,
relatively small sample size, and the varying degrees of class imbalance. L1-LR also requires few
hyperparameters and calculates feature importance scores, simplifying the training and
interpretation processes for more direct comparison between the models. Models were trained and
tested to predict the 3 discharge outcomes for the ambulatory and non-ambulatory populations,
using nested leave-one-subject-out cross-validation (Suppl. Fig. 2).

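The structure of nested leave-one-subject-out cross-validation can be sketched in plain Python as follows. This is an illustrative skeleton under stated assumptions, not the study's implementation: the inner loop is assumed to select a hyperparameter (the usual purpose of nesting; the source defers details to Suppl. Fig. 2), and `fit` and `score` are hypothetical placeholders where an L1-penalized logistic regression and a performance metric would go.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, held_out_subject) pairs, one per unique subject."""
    unique = sorted(set(subject_ids))
    for held_out in unique:
        yield [s for s in unique if s != held_out], held_out


def nested_loso(subject_ids, candidate_params, fit, score):
    """Nested leave-one-subject-out cross-validation.

    Outer loop: hold out one test subject.
    Inner loop: pick a hyperparameter by LOSO on the remaining subjects only,
    so the test subject never influences model selection.
    `fit(train_subjects, param)` returns a model; `score(model, subject)` returns
    a number where higher is better. Both are placeholders supplied by the caller.
    """
    results = {}
    for outer_train, test_subject in leave_one_subject_out(subject_ids):
        def inner_score(param):
            scores = [score(fit(inner_train, param), val_subject)
                      for inner_train, val_subject in leave_one_subject_out(outer_train)]
            return sum(scores) / len(scores)

        best_param = max(candidate_params, key=inner_score)
        model = fit(outer_train, best_param)
        results[test_subject] = (best_param, score(model, test_subject))
    return results


# Toy usage with hypothetical data: one feature value and one binary label per subject
feature = {"s1": 0.2, "s2": 0.4, "s3": 0.7, "s4": 0.9}
label = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}


def fit(train_subjects, threshold):
    return threshold  # a stand-in "model" that is just a fixed decision threshold


def score(model, subject):
    return int((feature[subject] >= model) == bool(label[subject]))


results = nested_loso(list(feature), candidate_params=[0.1, 0.5, 0.8], fit=fit, score=score)
```

Holding out all data from one participant at a time keeps that participant's recordings out of both training and model selection, which matters when the sample is small.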
Models predicting ambulatory outcomes at discharge were exclusively trained and tested using the
32 patients who were ambulatory and had IMU data available for both the 10MWT and BBS (Fig.
1). To determine the most predictive sensor tasks for patients who were ambulatory, we compared
model performance when training with IMU features from BBS only (IMUBBS), 10MWT only
combined ambulatory and non-ambulatory populations to maximize the availability of the BBS
IMU data. We refer to these models as non-ambulatory models because they were tested and
intended exclusively for the 8 patients who were non-ambulatory. This combined training was
adopted to increase the sample size and heterogeneity of discharge outcomes for model learning

The primary performance metric was the weighted F1 score (WF1), defined as the harmonic mean
of the precision and recall, computed separately for each class $j$, and weighted by the number of
samples $n_j$ within each class, with the highest possible value of 1.0 indicating perfect precision
and recall:

$$WF1 = \frac{\sum_{j=1}^{L} 2 \cdot \frac{\mathrm{precision}_j \cdot \mathrm{recall}_j}{\mathrm{precision}_j + \mathrm{recall}_j} \cdot n_j}{\sum_{j=1}^{L} n_j}$$

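A direct Python transcription of the WF1 definition may make the weighting concrete. This is a minimal sketch with hypothetical labels; it assumes a class with no predicted or no true samples contributes an F1 of 0, which is a convention not stated in the source.

```python
def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 averaged with weights n_j (true samples per class)."""
    classes = sorted(set(y_true))
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * (tp + fn)  # n_j = number of true samples in class c
    return total / len(y_true)


# Hypothetical example: 2 samples of class 1 and 4 samples of class 0
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
wf1 = weighted_f1(y_true, y_pred)  # (0.5 * 2 + 0.75 * 4) / 6
```

Weighting by class size means the larger class dominates the score, which is worth keeping in mind when classes are imbalanced, as they are for several outcomes in this study.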
Secondary performance metrics were accuracy and log-loss scores. Accuracy is the ratio of correct
predictions to the total number of samples, with the highest value of 1.0:

$$Accuracy = \frac{tp + tn}{tp + fp + tn + fn}$$

where $tp$, $tn$, $fp$, and $fn$ are the numbers of true positives, true negatives, false positives, and
false negatives, respectively. Positive classes were household ambulation ability, low
independence, and high risk of falling.

Log-loss measures the variation between prediction probabilities and true classes, wherein lower
values indicate greater certainty about the predictions.23 Given a true label $y_i$ and the prediction
probability $p_i$ for each of the $N$ samples:

$$LogLoss = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \cdot \ln(p_i) + (1 - y_i) \cdot \ln(1 - p_i) \right).$$

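The two secondary metrics can be written out in Python to show how they differ: accuracy only counts correct labels, whereas log-loss also penalizes confident wrong probabilities. This is a sketch with hypothetical data; the clipping constant `eps` and the 0.5 decision threshold are assumptions, not values from the study.

```python
import math


def accuracy(y_true, y_pred, positive=1):
    """(tp + tn) / (tp + fp + tn + fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return (tp + tn) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss; p_pred[i] is the predicted probability that y_true[i] == 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid ln(0); eps is an assumed choice
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)  # 2 of 4 correct
ll = log_loss(y_true, p_pred)
```

Two models with identical accuracy can therefore have different log-loss values, which is how the study distinguishes models that are equally accurate but differ in certainty.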
Confusion matrices were generated for the best performing models to examine misclassifications
for each outcome and patient group. These were also compared to the benchmark PI+FA model.
Parameter importance was determined from coefficients fit in the best model, taking the median
coefficient value and the 25th and 75th percentiles across all participants. Parameters with median,
25th, and 75th percentile values equal to zero were discarded.

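The parameter-importance summary described above could be computed along these lines. The sketch assumes one fitted coefficient per feature for each leave-one-subject-out fold; the function name, feature names, and coefficient values are hypothetical, and the percentile interpolation method is an assumed choice.

```python
from statistics import median, quantiles


def summarize_coefficients(coefs_by_fold):
    """coefs_by_fold: {feature_name: [coefficient from each cross-validation fold]}.

    Returns {feature: (25th percentile, median, 75th percentile)}, dropping
    features whose three summary values are all zero.
    """
    summary = {}
    for name, coefs in coefs_by_fold.items():
        q25, _, q75 = quantiles(coefs, n=4, method="inclusive")
        med = median(coefs)
        if q25 == 0 and med == 0 and q75 == 0:
            continue
        summary[name] = (q25, med, q75)
    return summary


# Hypothetical coefficients from 5 folds
coefs = {
    "age": [0.5, 0.6, 0.7, 0.8, 0.9],
    "rarely_selected": [0.0, 0.0, 0.0, 0.0, 0.1],
}
importance = summarize_coefficients(coefs)
```

Because the L1 penalty drives many coefficients to exactly zero, a feature that is zero in most folds has zero quartiles and is filtered out, leaving only features that were selected consistently.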
Role of the Funding Source: The funders played no role in the design, conduct, or reporting of
this study.

Results
Classification performance for each model and feature set is presented in the Table and
summarized below.
Ambulation
For patients who were ambulatory, the benchmark PI+FA ambulation model had a WF1 of 0.709.
Gait-based IMU features, either alone or combined with balance features, improved performance
in both the streamlined and comprehensive sensor model configurations by 19.6%. Balance-based

The gait-based streamlined sensor model, PI+IMU10MWT, was selected as the best model for
patients who were ambulatory, given its simple configuration and highest WF1 (Fig. 3A). The
streamlined sensor model outperformed the benchmark, correctly identifying more patients who
were household (4 vs. 1 patients) and community (23 vs. 21 patients) ambulators at discharge
(Fig. 3B). The PI+IMU10MWT model also correctly identified 27 of 29 patients who did not change
ambulation category from IRF admission to discharge, though it misclassified all 3 patients who
improved from household to community ambulators (Fig. 3C). Eleven features were selected for
the PI+IMU10MWT model, including lesion location, activity lifestyle, and IMU features from all

For patients who were non-ambulatory, the comprehensive model trained on the combined dataset
was the best ambulation model, achieving a WF1 of 0.859 (Suppl. Fig. 3A). The PI+FA+IMUBBS
model correctly classified 1 of 2 individuals who were non-ambulatory and progressed to
community ambulators, as well as all 6 individuals who were non-ambulatory and were discharged
as household ambulators (Suppl. Fig. 3B, 3C). Notably, 2 individuals remained non-ambulatory at
discharge, with 1 completing the 6MWT but unable to complete the 10MWT. Among the 28
features selected for the comprehensive model, the admission 10MWT score and IMU balance
features were the most important predictors of community and household ambulation, respectively
(Suppl. Fig. 3D).

Independence
For patients who were ambulatory, the benchmark PI+FA independence model had a WF1 of
0.685. Gait-based IMU features yielded a similar WF1, while balance features performed slightly
worse in both the streamlined (−14.2%) and comprehensive (−9.2%) sensor models. Combining
gait and balance IMU features further decreased WF1 (up to −17.8%) for both sensor models.

The gait-based comprehensive model, PI+FA+IMU10MWT, was the best-performing model
according to WF1 (Fig. 4A). Compared to the benchmark, the comprehensive model correctly
classified more individuals who were ambulatory and were discharged with low independence
(11 vs. 9 patients), though with fewer correct predictions for individuals with high independence
(11 vs. 13 patients) (Fig. 4B). Misclassifications were higher among participants with discharge
FIM motor scores close to the class threshold. The PI+FA+IMU10MWT model correctly identified
10 out of the 16 patients who transitioned from low to high independence (Fig. 4C). Fourteen
features were selected for this model, including gyroscope features from the lumbar and
unaffected-side ankle. Participant age was the most discriminative feature for low independence
at discharge, while 10MWT and BBS admission scores indicated high independence (Fig. 4D).

For patients who were non-ambulatory, independence predictions achieved the same WF1 of 0.933
across models, with the least uncertainty in the comprehensive model (Suppl. Fig. 4A-4C). We
selected the benchmark as the best model, which used simple features such as age, admission 6MWT,
and admission BBS scores to differentiate between the two levels of discharge independence
(Suppl. Fig. 4D).

For patients who were ambulatory, the benchmark risk-of-falling model had a WF1 of 0.534 (Fig.
5A). Balance-based IMU features decreased performance in the streamlined sensor model to 0.347,
but slightly increased performance in the comprehensive model to 0.566. Gait-based IMU features
improved performance relative to the benchmark model in both the streamlined (23.4%) and
comprehensive (17.6%) sensor models. Combined gait and balance IMU features did not increase

The gait-based streamlined sensor model, PI+IMU10MWT, was selected as the best risk-of-falling
model. Compared to the benchmark, the streamlined sensor model correctly classified more
individuals who were ambulatory and were discharged with both high risk (12 vs. 9 patients) and
normal risk (9 vs. 8 patients) (Fig. 5B). Incorrect predictions were more likely when the BBS
discharge score was near the cut-off value. The PI+IMU10MWT model correctly predicted 5 patients
who transitioned from high to normal risk (out of 8 total) (Fig. 5C). Of the 23 features selected for
this model, various IMU and demographic features had similar average importance to distinguish
individuals with high and normal risk of falling (Fig. 5D).

For patients who were non-ambulatory, risk of falling predictions were perfectly accurate for the
benchmark and comprehensive models, whereas the streamlined sensor model exhibited
marginally lower performance (Suppl. Fig. 5A). Both the benchmark and the comprehensive
models identified all individuals who were non-ambulatory with high risk of falling (Suppl. Fig.
5B, 5C). The benchmark was selected as the best model, utilizing the simplest set of 4 features
with relatively low uncertainty. Lifestyle and left-side hemiparesis were markers for high fall risk,
whereas BBS admission score had the highest importance to identify individuals with normal risk

Discussion
For patients who were ambulatory at admission, we found that IMU sensor data recorded from the
lumbar and ankles during walking tasks improved early predictions of poststroke inpatient
predictions derived from EMR-based patient information and standardized functional assessment
scores. For ambulation and risk of falling, IMU features extracted during a 10m walking bout
increased the WF1 in a streamlined sensor model (PI+IMU10MWT), while FA features from
admission further improved predictions for independence. Similar to our previous work, including
sensor data improved predictions of discharge ambulation compared to a benchmark model.16 This
finding was repeatable across the 2 studies despite using different modeling approaches and
algorithms (Random Forest vs. L1-LR). A streamlined PI+IMU10MWT model improved ambulation
predictions by 19.6% over the benchmark performance, achieving an 84.4% accuracy to predict
community/household ambulators based on the 10MWT score. A comprehensive
scoring an accuracy of 68.8% to classify high/low independence based on the FIM motor subscore.
A streamlined PI+IMU10MWT model improved the benchmark risk-of-falling performance by
23.4%, achieving a 65.9% accuracy to classify normal/high risk based on the BBS score. Most
misclassifications occurred when patients had admission or discharge scores near the class
boundary (Fig. 3-5C).

For patients who were non-ambulatory at admission, incorporating IMU data from simple balance
tasks added less value to predicting discharge ambulation function. The comprehensive models
were as accurate as the benchmark models for independence and risk of falling outcomes, with
lower log-loss values indicating less uncertainty due to better convergence between prediction
probabilities and actual classes. Interpretation of the non-ambulatory models is challenging given
the small, imbalanced sample size and similar discharge outcomes for these patients, which likely
limited the model’s ability to learn from the available non-ambulatory patient data.

A growing body of research focuses on developing and testing early prediction tools after stroke.
Stinear et al24 provide a detailed review of models predicting functional and motor-related
outcomes, enumerating the strengths and limitations of methods published up to 2019. Several
previous studies have developed predictive models for IRF discharge, with most incorporating
functional assessments and therapist evaluations obtained at admission.25-27 For example, Bland et
al25 use the BBS and FIM walk scores at admission to predict ambulation at IRF discharge
according to the 10MWT, with greater sensitivity (91%–94%, household ambulation) but lower
specificity (60%–65%, community ambulation) compared to our findings. We have previously
developed sensor-free regression models to predict discharge scores using similar PI+FA features
and 50 participants from this study with mean absolute error of 0.3 m/s, 9.5 points, and 7.4 points
for the 10MWT, FIM, and BBS, respectively.9 The TWIST model28 is another promising approach
for predictions outside of the IRF setting, utilizing age, BBS, and knee extension grade at 1-week
poststroke to predict independent walking according to Functional Ambulation Categories at 4, 6,
9, 16, or 26 weeks after stroke, with 83% accuracy across all timepoints. Only recently has the
research community begun investigating the predictive value of wearable sensor data for similar
prognostic applications.14,15 However, the utility of sensor data in regression models or long-term

Accurately predicting expected post-treatment outcomes early in rehabilitation would improve
discharge planning for clinicians, patients, families, and insurance companies by providing a
roadmap of the patient’s care needs after leaving the hospital. In this study, sensor features were
important predictors for individuals discharged with limited ambulation ability and high risk of
falling, providing quantitative measures of movement symmetry (eg, the skewness) and
repeatability (eg, sample entropy) for treatment monitoring.9,29,30 Sensor models could replace or
reduce reliance on functional assessment scores, as less time is needed to collect the data.
Consumer-grade devices and an approximately 5-minute sequence of simple physical activities (brief
walking bout, standing, stand-to-sit, sitting, sit-to-stand) would enable quicker and more frequent
evaluations than the longer and more complex standardized functional assessments. The
assessments considered in this study (10MWT, 6MWT, BBS, TUG, FIM) are typically collected
at IRF admission in the US for clinical evaluation and insurance reporting. However, completing
these assessments upon admission can be challenging due to time limitations during
intake/treatment and varied patient impairments, including fatigability and physical or cognitive
deficits.

358 Our results should be considered in context of previous findings for clinical machine learning
D
359 models — namely, that appropriate choices of target population,15 activities,14 sensor modalities,14
TE
360 and prediction outcomes5 are paramount to design a successful model.6 For instance, in the case
361 of patients were ambulatory, IMU data from the 10MWT and BBS were less impactful for
EC
362 predicting discharge independence, as defined by the FIM motor subscore. This is unsurprising,
RR
363 considering the FIM motor assessment evaluates a breadth of functional activities — including
364 walking, stair climbing, transfers, dressing, bathing, grooming, toileting, and bowel or bladder
CO
365 management — and some of these activities may not be well-characterized by gait or balance
366 movements at IRF admission. Sensor features from other physical activities may better capture
N
367 biomarkers of motor independence according to the FIM. Similarly, predictions for patients who
U
368 were non-ambulatory did not significantly benefit from sensor data, revealing the need for
369 alternative modeling approaches for patients with severe gait impairment.
Limitations
The number of incorrect predictions is a primary limitation of the models presented in this study. Indeed, a naïve model predicting no change in outcome classification from admission to discharge would generally perform well for this study sample, since only a fraction of the patients changed classes in our study (ie, 9%–50% of patients who were ambulatory, or 0%–25% of patients who were non-ambulatory, depending on the outcome). However, such a model will always fail to identify individuals who improved functional classes, who are arguably the most difficult and clinically meaningful cases to predict. In contrast, our models could identify some individuals who improved in the independence (10 out of 16) and risk of falling (5 out of 8) functional classes. The small and unbalanced populations in our single-site study may limit the sensitivity, generalizability, and utility of the proposed models, with a potential risk of overfitting in these high-dimensional feature sets. Larger sample sizes, particularly for patients who are non-ambulatory at admission and achieve heterogeneous discharge outcomes, will be crucial to further
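The baseline argument in this section can be illustrated with a minimal Python sketch. It scores a no-change predictor on a hypothetical ten-patient cohort; the labels are invented for illustration and are not study data, and the function names are assumptions introduced here. The sketch shows how such a predictor can reach high accuracy while identifying none of the patients who change class.

```python
def no_change_baseline(admission_classes):
    """Naive model: predict that the discharge class equals the admission class."""
    return list(admission_classes)


def accuracy(y_true, y_pred):
    """Fraction of patients whose discharge class was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def changer_recall(admission, discharge, predicted):
    """Among patients whose class changed, fraction predicted correctly."""
    changed = [(d, p) for a, d, p in zip(admission, discharge, predicted) if a != d]
    if not changed:
        return None  # nobody changed class, so recall is undefined
    return sum(d == p for d, p in changed) / len(changed)


# Hypothetical toy cohort (not study data): 0 = lower class, 1 = higher class.
admission = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
discharge = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # two patients improved

naive_pred = no_change_baseline(admission)
print("accuracy:", accuracy(discharge, naive_pred))                              # 0.8
print("recall on class changers:", changer_recall(admission, discharge, naive_pred))  # 0.0
```

In this toy cohort only 20% of patients change class, so the naive predictor is correct for 80% of patients and for none of the improvers, which is why accuracy alone can overstate clinical usefulness.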
Future work should also investigate sensor-based regression models that predict continuous outcome scores at discharge rather than classification models that predict categories based on a cut-off score. Regression models may offer greater clinical utility by removing reliance on predefined classification boundaries and providing higher-resolution discharge predictions, though possibly with greater sensitivity to error.6,14 Alternative clinical outcomes (eg, Fugl-Meyer Assessment), sensor placements (eg, upper limbs), and functional abilities (eg, endurance) should also be
We did not evaluate other machine learning algorithms, which may outperform L1-penalized logistic regression. Rather, we sought to understand the relative value of sensor data using a single, well-performing and interpretable algorithm for each of these outcomes and patient groups. Alternative algorithms and extended hyperparameter tuning could improve the prediction performance shown here.
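For readers unfamiliar with why an L1 penalty aids interpretability, the following is a minimal sketch of L1-penalized logistic regression fit by proximal gradient descent (ISTA). It is not the implementation used in this study: the solver, step size, penalty values, and toy data are all assumptions chosen for illustration. It shows how the soft-thresholding step drives uninformative coefficients to exactly zero.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam=0.1, step=0.1, n_iter=2000):
    """Minimize mean log-loss + lam * ||w||_1; the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical toy data: feature 0 separates the classes, feature 1 is balanced noise.
X = [[-2.0, 1.0], [-2.0, -1.0], [-1.0, 1.0], [-1.0, -1.0],
     [1.0, 1.0], [1.0, -1.0], [2.0, 1.0], [2.0, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_weak, _ = fit_l1_logistic(X, y, lam=0.1)    # mild penalty keeps feature 0
w_strong, _ = fit_l1_logistic(X, y, lam=1.0)  # strong penalty removes every feature
print("lam=0.1:", w_weak)
print("lam=1.0:", w_strong)
```

With the mild penalty only the informative feature keeps a nonzero coefficient, and with the strong penalty all coefficients are zero. This sparsity is what makes the surviving features easy to inspect, and the penalty strength is one of the hyperparameters whose tuning could change performance.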
A potential disadvantage of models trained to predict outcomes at hospital discharge is the use of hospital- and care-specific data. Because treatment strategies and patient characteristics can vary nationally and internationally, a model trained using data from one location may not generalize to others. For example, the PREP2 model,31,32 which demonstrated 75% accuracy in New Zealand for categorizing 3-month upper limb function after 1 week poststroke, had drastically lower accuracy for patients in the US and Europe.29,30 This highlights the necessity for additional testing and external validation to determine whether site-specific training data are essential for prediction models, or whether combined training data from multiple sites would broaden generalization
Conclusions
This study affirms that motion-based measures from wearable sensors can be beneficial for predicting certain patient outcomes following acute poststroke rehabilitation. We have highlighted the potential and open challenges of moving these machine learning algorithms into clinical practice to inform tailored and effective rehabilitation therapies. While sensor-based models may increase predictive performance, additional research is needed to refine and validate these models
Author Contributions
(IV) Collection and assembly of data: M.K.O.
(VI) Manuscript writing: All authors
(VII) Final approval of manuscript: All authors
Acknowledgments
The authors thank Nsude Okeke Ewo, Alexander J. Boe, Marco Hidalgo-Araya, Sara Prokup, Matthew Giffhorn, Kelly McKenzie, Kristen Hohl, and Matthew McGuire for their help in patient recruitment and data collection.
Ethics Approval
All individuals (or a proxy) provided written informed consent, and the study was approved by
Funding
This work was supported by the Shirley Ryan AbilityLab, with partial support from the National
(T32HD007418 to M.K.O.), center grant to establish the Center for Smart Use of Technology to Assess Real-world Outcomes (C-STAR, P2CHD101899 to R.L.L.), and the National Institute on Aging of the NIH (R43AG067835 to R.L.L.). This work was also supported in part by a Research Career Scientist Award from the United States Department of Veterans Affairs Rehabilitation R&D Service (IK6 RX003351 to R.L.L.). The funders played no role in the design, conduct, or
Disclosure
The authors completed the ICMJE Form for Disclosure of Potential Conflicts of Interest and reported no conflicts of interest.
References
[1] Centers for Disease Control and Prevention (CDC). Prevalence and Most Common Causes
of Disability Among Adults --- United States, 2005. MMWR Morb Mortal Wkly Rep 2009;
58: 421–426.
[2] Le Danseur M. Stroke Rehabilitation. Crit Care Nurs Clin North Am 2020; 32: 97–108.
[3] Brandstater ME, Shutter LA. Rehabilitation Interventions During Acute Care of Stroke
Patients. Top Stroke Rehabil 2002; 9: 48–56.
[4] Report to the Congress: Medicare Payment Policy – MedPAC. Washington, DC,
https://www.medpac.gov/document/march-2022-report-to-the-congress-medicare-payment-policy/
(March 2022, accessed 11 September 2022).
[5] Kwah LK, Herbert RD. Prediction of Walking and Arm Recovery after Stroke: A Critical
Review. Brain Sci 2016; 6: 53.
[6] Campagnini S, Arienti C, Patrini M, et al. Machine learning methods for functional
recovery prediction and prognosis in post-stroke rehabilitation: a systematic review.
J Neuroeng Rehabil 2022; 19: 1–22.
[7] Harvey RL. Predictors of Functional Outcome Following Stroke. Phys Med Rehabil Clin
N Am 2015; 26: 583–598.
[8] Stinear CM, Smith MC, Byblow WD. Prediction Tools for Stroke Rehabilitation. Stroke
2019; 50: 3314–3322.
[9] Harari Y, O’Brien MK, Lieber RL, et al. Inpatient stroke rehabilitation: Prediction of
clinical outcomes using a machine-learning approach. J Neuroeng Rehabil 2020; 17: 1–10.
[10] Piron L, Piccione F, Tonin P, et al. Clinical correlation between motor evoked potentials
and gait recovery in poststroke patients. Arch Phys Med Rehabil 2005; 86: 1874–1878.
[11] Stinear CM, Barber PA, Petoe M, et al. The PREP algorithm predicts potential for upper
[13] Stinear CM, Byblow WD, Ackerley SJ, et al. PREP2: A biomarker-based algorithm for
predicting upper limb function after stroke. Ann Clin Transl Neurol 2017; 4: 811–820.
[14] Adans-Dester C, Hankov N, O’Brien A, et al. Enabling precision rehabilitation
interventions using wearable sensors and machine learning to track motor recovery. npj
Digit Med 2020; 3: 1–10.
[15] Lee SI, Adans-Dester CP, O’Brien AT, et al. Predicting and Monitoring Upper-Limb
Rehabilitation Outcomes Using Clinical and Wearable Sensor Data in Brain Injury
Survivors. IEEE Trans Biomed Eng 2021; 68: 1871–1881.
[16] O’Brien MK, Shin SY, Khazanchi R, et al. Wearable Sensors Improve Prediction of Post-
Stroke Walking Function Following Inpatient Rehabilitation. IEEE J Transl Eng Health
[19] Alexander MP. Stroke rehabilitation outcome: A potential use of predictive variables to
establish levels of care. Stroke 1994; 25: 128–134.
[20] Teasell R, Foley N. Managing the Stroke Rehabilitation Triage Process. Evidence Based
Review of Stroke Rehabilitation, http://www.ebrsr.com/evidence-review/4-managing-
stroke-rehabilitation-triage-process (2008, accessed 22 March 2021).
[21] Berg K, Wood-Dauphinee S, Williams JI. The balance scale: Reliability assessment with
elderly residents and patients with an acute stroke. Scand J Rehabil Med 1995; 27: 27–36.
[22] Sokolova M, Lapalme G. A systematic analysis of performance measures for
classification tasks. Inf Process Manag 2009; 45: 427–437.
[23] Murphy KP. Machine learning: a probabilistic perspective. Cambridge, MA: MIT Press,
2012.
[24] Stinear CM, Smith MC, Byblow WD. Prediction Tools for Stroke Rehabilitation. Stroke
2019; 50: 3314–3322.
[25] Bland MD, Sturmoski A, Whitson M, et al. Prediction of Discharge Walking Ability From
Initial Assessment in a Stroke Inpatient Rehabilitation Facility Population. Arch Phys Med
Rehabil 2012; 93: 1441–1447.
[26] Scrutinio D, Lanzillo B, Guida P, et al. Development and validation of a predictive model
for functional outcome after stroke rehabilitation: the Maugeri model. Stroke 2017; 48:
3308–3315.
[27] Henderson CE, Fahey M, Brazg G, et al. Predicting Discharge Walking Function With
High-Intensity Stepping Training During Inpatient Rehabilitation in Nonambulatory
Patients Poststroke. Arch Phys Med Rehabil 2022; 103: S189–S196.
[28] Smith M-C, Barber AP, Scrivener BJ, et al. The TWIST Tool Predicts When Patients Will
Recover Independent Walking After Stroke: An Observational Study. Neurorehabil Neural
Repair 2022; 36: 461–471.
[29] Barth J, Waddell KJ, Bland MD, et al. Accuracy of an Algorithm in Predicting Upper
Limb Functional Capacity in a United States Population. Arch Phys Med Rehabil 2022;
103: 44–51.
[30] Lundquist CB, Nielsen JF, Arguissain FG, et al. Accuracy of the Upper Limb Prediction
predicting upper limb function after stroke. Ann Clin Transl Neurol 2017; 4: 811–820.
[32] Smith MC, Ackerley SJ, Barber PA, et al. PREP2 Algorithm Predictions Are Correct at 2
Years Poststroke for Most Patients. Neurorehabil Neural Repair 2019; 33: 635–642.
Table. Performance Metrics for Ambulatory and Non-Ambulatory Patients for Each Prediction Model and Feature Set^a

Patient Group           Prediction Model  PI+FA                        IMU Task    Streamlined – PI+IMU         PI+FA+IMU
                                          WF1      Accuracy  Log-Loss              WF1      Accuracy  Log-Loss  WF1      Accuracy  Log-Loss
Ambulatory (N = 32)     Ambulation        0.709    0.719     0.904     BBS         0.688    0.688     0.594     0.688    0.688     0.484
                                                                       10MWT^b     0.848^b  0.844     0.546     0.848^b  0.844     0.545
                        Independence      0.685    0.688     1.236     BBS         0.588    0.594     1.016     0.622    0.625     1.044
                                                                       10MWT^b     0.657    0.656     1.211     0.688^b  0.688     1.346
                                                                       BBS+10MWT   0.563    0.563     1.422     0.622    0.625     1.081
                        Risk of falling   0.534    0.531     1.380     BBS         0.347    0.344     1.116     0.566    0.563     0.727
                                                                       10MWT^b     0.659^b  0.656     0.974     0.628    0.625     0.877
Non-ambulatory (N = 8)  Ambulation        0.643    0.750     0.787     BBS^b       0.300    0.250     0.916     0.859^b  0.875     0.392
                        Independence      0.933^b  0.875     0.782     BBS^b       0.933    0.875     0.284     0.933^b  0.875     0.407
                        Risk of falling   1.000^b  1.000     0.246     BBS^b       0.933    0.875     0.294     1.000^b  1.000     0.184

^a 10MWT = 10-Meter Walk Test; BBS = Berg Balance Scale; FA = functional assessments; IMU = inertial measurement unit; PI =
discharge outcomes.
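As a reading aid for the three metrics reported in the Table, the following Python sketch computes accuracy, a support-weighted F1 score, and binary log-loss from scratch. The labels and probabilities are hypothetical and are not study predictions; the function names are assumptions introduced here, and the study's own metric implementation may differ in detail.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n_label in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_label
    return total / len(y_true)


def log_loss(y_true, prob_positive, eps=1e-15):
    """Mean negative log-likelihood of the true binary labels (lower is better)."""
    clipped = [min(max(p, eps), 1 - eps) for p in prob_positive]
    return -sum(
        math.log(p) if t == 1 else math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)


# Hypothetical 8-patient example (not study data): 1 = higher class, 0 = lower class.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]  # one lower-class patient is misclassified
prob_positive = [0.9, 0.8, 0.9, 0.7, 0.8, 0.9, 0.6, 0.2]

print(round(accuracy(y_true, y_pred), 3))     # 0.875
print(round(weighted_f1(y_true, y_pred), 3))  # 0.859
print(round(log_loss(y_true, prob_positive), 3))
```

With eight patients, a single error always gives an accuracy of 0.875, whereas the weighted F1 depends on which class the error falls in. Log-loss additionally penalizes confident but wrong probabilities, so two models with identical accuracy can differ in log-loss.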
Figure 1. Inpatient dataset available for model training and testing. Data were collected from 55 individuals undergoing poststroke inpatient rehabilitation at admission and discharge. Training sets for prediction models were determined based on ambulatory status at admission and the availability of IMU data from gait and balance tasks. For patients who were ambulatory at admission, we utilized their IMU data recorded during the 10MWT and BBS (N = 32). For patients who were non-ambulatory at admission, we combined IMU BBS data for both patients who were ambulatory and non-ambulatory (N = 50) and tested only on those who were non-ambulatory (N = 8). All models were tested using a leave-one-subject-out approach. 6MWT = 6-Minute Walk Test; 10MWT = 10-Meter Walk Test; Adm = admission; Amb = ambulatory; BBS = Berg Balance Scale; FIM = Functional Independence Measure; IMU = inertial measurement unit; Ind = independence; Non-amb = non-ambulatory.
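The leave-one-subject-out scheme with a pooled training set, as described in the Figure 1 caption, can be sketched as follows. This is an illustrative Python generator under stated assumptions: the subject identifiers are hypothetical, each subject contributes one row here, and the `test_subjects` argument is a name introduced for this sketch rather than part of the study's code.

```python
def leave_one_subject_out(subject_ids, test_subjects=None):
    """Yield (held_out, train_idx, test_idx), holding out one subject per fold.

    If test_subjects is given, only those subjects are held out in turn; every
    other subject stays in the training pool of each fold.
    """
    eligible = set(subject_ids) if test_subjects is None else set(test_subjects)
    for held_out in sorted(set(subject_ids) & eligible):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy cohort (not study data): one row of admission features per subject.
subject_ids = ["amb_01", "amb_02", "amb_03", "non_01", "non_02"]
non_ambulatory = ["non_01", "non_02"]

folds = list(leave_one_subject_out(subject_ids, test_subjects=non_ambulatory))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train rows:", train_idx, "test rows:", test_idx)
```

In this toy setup the ambulatory subjects appear in every training fold but are never tested, mirroring the pooled-training, restricted-testing design for the non-ambulatory group. Because the held-out subject's rows are excluded from training, no subject contributes to both sides of a fold.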
Figure 2. Data pipeline for prediction models. Data collected at inpatient rehabilitation facility (IRF) admission (PI, FA, and IMU signals) were combined in different feature sets and input into an L1-penalized logistic regression model. The model was trained to predict functional outcomes at IRF discharge, related to the classification of ambulation, independence, and risk of falling. 6MWT = 6-Minute Walk Test; 10MWT = 10-Meter Walk Test; Acc = accelerometer; BBS = Berg Balance Scale; FA = functional assessments; FIM = Functional Independence Measure; Gyr = gyroscope; IMU = inertial measurement unit; ML = machine learning; PI = patient information; TUG = Timed “Up & Go” test; X1, X2, X3, XN = example features extracted from admission data.
Figure 3. Prediction models for ambulation at discharge (ambulatory at admission). (A) WF1, accuracy, and log-loss for the benchmark model (PI+FA), streamlined sensor model (PI+IMU10MWT), and comprehensive model (PI+FA+IMU10MWT). (B) Confusion matrices. (C) 10MWT score at admission (circles) and discharge (crosses) timepoints. Values at discharge are marked in blue if correctly predicted by the best-performing model (simplest model with the highest WF1), or in red if incorrectly predicted. (D) Median and interquartile ranges of the coefficients fit to the most important features for the best-performing model. 10MWT = 10-Meter Walk Test; Acc = accelerometer; ȧ = derivative of acceleration; 𝜔̇ = derivative of rotational velocity (from gyroscope); Adm = admission; Amb = ambulatory; AoM = amount of motion; AS = affected side; Dis = discharge; FA = functional assessments; Gyr = gyroscope; IMU = inertial measurement unit; PI = patient information; PSD = power spectral density; SampEn = sample entropy; US = unaffected side; WF1 = weighted F1 score.
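A summary of the kind shown in panel D can be sketched in a few lines of Python. The sketch assumes that the coefficient distribution comes from refitting the model once per cross-validation fold, which is an assumption made here for illustration; the feature names and coefficient values are hypothetical, and the quartile convention (`method="inclusive"`) is one of several in common use.

```python
import statistics


def coefficient_summary(fold_coefficients, feature_names):
    """Median and interquartile range of each feature's coefficient across folds."""
    summary = {}
    for j, name in enumerate(feature_names):
        values = [coefs[j] for coefs in fold_coefficients]
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = {"median": q2, "q1": q1, "q3": q3}
    return summary


# Hypothetical coefficients from five folds (not study data).
feature_names = ["feature_a", "feature_b"]
fold_coefficients = [
    [1.0, 0.0],
    [2.0, 0.0],
    [3.0, 0.0],
    [4.0, 0.5],
    [5.0, 0.0],
]

summary = coefficient_summary(fold_coefficients, feature_names)
for name, stats in summary.items():
    print(name, stats)
```

A feature whose coefficient is nonzero in most folds, such as the first one here, has a stable median away from zero, whereas a feature selected in only one fold has a median of zero. Under an L1 penalty this contrast is one simple way to rank feature importance.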
Figure 4. Prediction models for independence at discharge (ambulatory at admission). (A) WF1, accuracy, and log-loss for the benchmark model (PI+FA), streamlined sensor model (PI+IMU10MWT), and comprehensive model (PI+FA+IMU10MWT). (B) Confusion matrices. (C) 10MWT score at admission (circles) and discharge (crosses) timepoints. Values at discharge are marked in dark green if correctly predicted by the best-performing model (simplest model with the highest WF1), or in red if incorrectly predicted. (D) Median and interquartile ranges of the coefficients fit to the most important features for the best-performing model. 6MWT = 6-Minute Walk Test; 10MWT = 10-Meter Walk Test; 𝜔̇ = derivative of rotational velocity (from gyroscope); Acc = accelerometer; Adm = admission; Amb = ambulatory; AS = affected side; BBS = Berg Balance Scale; Dis = discharge; FA = functional assessments; FIM = Functional Independence Measure; Gyr = gyroscope; IMU = inertial measurement unit; Ind = independence; PI = patient information; PSD = power spectral density; US = unaffected side; WF1 = weighted F1 score.
Figure 5. Prediction models for risk of falling at discharge (ambulatory at admission). (A) WF1, accuracy, and log-loss for the benchmark model (PI+FA), streamlined sensor model (PI+IMU10MWT), and comprehensive model (PI+FA+IMU10MWT). (B) Confusion matrices. (C) 10MWT score at admission (circles) and discharge (crosses) timepoints. Values at discharge are marked in blue if correctly predicted by the best-performing model (simplest model with the highest WF1), or in red if incorrectly predicted. (D) Median and interquartile ranges of the coefficients fit to the most important features for the best-performing model. 10MWT = 10-Meter Walk Test; ȧ = derivative of acceleration; 𝜔̇ = derivative of rotational velocity (from gyroscope); a(fmax) =