Reassessing the role of machine learning in clinical prediction: a benchmark of predicting walking function after spinal cord injury
| Document type: | Article (Journal) |
|---|---|
| Language: | English |
| Published: | 30 October 2025 |
| In: | Journal of Neurotrauma. Year: 2025, Pages: 1-13 |
| ISSN: | 1557-9042 |
| DOI: | 10.1177/08977151251386008 |
| Online access: | Publisher, free full text: https://doi.org/10.1177/08977151251386008; Publisher, free full text: https://www.liebertpub.com/doi/10.1177/08977151251386008 |
| Authors: | Julia Bugajska, Louis P. Lukas, Rüdiger Rupp, Norbert Weidner, Martin Schubert, Frank Röhrich, Josina Waldmann, Yorck B. Kalke, Rainer Abel, Doris Maier, Harvinder S. Chhabra, Thomas Liebscher, Armin Curt, Sarah Brüningk, and Catherine R. Jutzeler |
| Abstract: | In light of growing biomedical data, machine learning (ML) models offer tremendous potential for personalized prediction in medicine. However, the additional value provided by these computational tools should always be critically evaluated. Using the example of predicting walking ability after spinal cord injury (SCI), we highlight a common scenario in which data-driven predictions are feasible but not clinically meaningful, because the task can be performed equally well by humans. We asked 11 human observers from diverse backgrounds (five researchers without clinical training but with proven knowledge of SCI and the International Standards for Neurological Classification of SCI [ISNCSCI], and six neurologists experienced in SCI) to predict walking ability following SCI based on the acute-phase neurological status assessed by the ISNCSCI motor and sensory scores (≤40 days after injury [DAI]). Following an established clinical prediction rule, walking ability was defined by a binary label derived from the indoor walking ability subitem of the Spinal Cord Independence Measure. We compared the performance of the human observers with extreme gradient boosting and logistic regression-based models, which represent popular approaches in the clinical literature on SCI. Using 794 patients from the European Multicenter Study about SCI, we show that all approaches provide similar, excellent performance at the population level (area under the receiver operating characteristic curve [AUROC]: 0.93-0.95; accuracy: 0.88-0.90). Importantly, predictions combined across multiple neurologists (accuracy: 0.89) were comparable with model-based predictions (accuracy: 0.88-0.90), whereas individual neurologists (accuracy: 0.79 [0.01]; mean [standard deviation]) were marginally outperformed by the computational approaches (accuracy: 0.88-0.90), particularly for more heterogeneous incomplete injuries. Individual SCI researchers performed as well as individual neurologists (accuracy: 0.78 [0.02]). Our results show that prediction of walking function following SCI, if described through a binary label, does not benefit from ML, as ensembles of clinical experts and of researchers each achieve performance similar to a range of ML models and an established clinical prediction rule. This highlights two key considerations in clinical applications of data-driven prediction models in SCI: first, the importance of carefully choosing the clinical outcome measure targeted by a prediction task to achieve a true benefit, and second, the necessity of benchmarking human performance on specific tasks to determine whether meaningful differences are present. |
| Notes: | Published online: 30 October 2025; viewed on 15 December 2025 |
| Description: | Online resource |
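The abstract contrasts three kinds of evaluation: individual observers (reported as mean [standard deviation] accuracy), predictions combined across several neurologists, and models scored by accuracy and AUROC. The abstract does not state how observer predictions were combined, so the minimal sketch below assumes a simple majority vote, with ties (possible with an even number of observers, such as six neurologists) assigned to the positive class. All data, function names (`accuracy`, `majority_vote`, `auroc`), and the 0.5 threshold are hypothetical illustrations, not the authors' pipeline or results.

```python
from statistics import mean, stdev


def accuracy(pred, truth):
    """Fraction of patients whose binary walking label is predicted correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(predictions):
    """Combine per-observer 0/1 predictions patient by patient.

    predictions: one list per observer, each of length n_patients.
    Assumption (hypothetical): ties go to the positive class (1).
    """
    n_obs = len(predictions)
    return [int(2 * sum(votes) >= n_obs) for votes in zip(*predictions)]


def auroc(scores, truth):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = positive walking label.
truth = [1, 1, 0, 0, 1, 0, 1, 0]
observers = [
    [1, 1, 0, 0, 1, 0, 0, 0],  # one error
    [1, 0, 0, 0, 1, 0, 1, 0],  # one error
    [1, 1, 1, 1, 1, 0, 1, 0],  # two errors
]

individual = [accuracy(p, truth) for p in observers]
print(f"individual accuracy: {mean(individual):.2f} [{stdev(individual):.2f}]")
print(f"majority-vote accuracy: {accuracy(majority_vote(observers), truth):.2f}")

# Hypothetical model probabilities; accuracy needs a threshold, AUROC does not.
scores = [0.9, 0.8, 0.3, 0.65, 0.7, 0.2, 0.6, 0.1]
model_pred = [int(s >= 0.5) for s in scores]
print(f"model accuracy: {accuracy(model_pred, truth):.2f}, AUROC: {auroc(scores, truth):.2f}")
```

In this toy case the observers make errors on different patients, so the majority vote corrects all of them; this illustrates why an ensemble of observers can outperform its individual members, although real observer errors are often correlated and the gain is usually smaller.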