Multi-modal dataset creation for federated learning with DICOM-structured reports

Bibliographic details
Main authors: Tölle, Malte (author), Burger, Lukas (author), Kelm, Halvar (author), André, Florian (author), Bannas, Peter (author), Diller, Gerhard (author), Frey, Norbert (author), Garthe, Philipp (author), Groß, Stefan (author), Hennemuth, Anja (author), Kaderali, Lars (author), Krüger, Nina (author), Leha, Andreas (author), Martin, Simon (author), Meyer, Alexander (author), Nagel, Eike (author), Orwat, Stefan (author), Scherer, Clemens (author), Seiffert, Moritz (author), Seliger, Jan Moritz (author), Simm, Stefan (author), Friede, Tim (author), Seidler, Tim (author), Engelhardt, Sandy (author)
Document type: Article (Journal)
Language: English
Published: 03 February 2025
In: International journal of computer assisted radiology and surgery
Year: 2025, Volume: 20, Issue: 3, Pages: 485-495
ISSN: 1861-6429
DOI: 10.1007/s11548-025-03327-y
Online access: Publisher, free of charge, full text: https://doi.org/10.1007/s11548-025-03327-y
Statement of responsibility: Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer, Moritz Seiffert, Jan Moritz Seliger, Stefan Simm, Tim Friede, Tim Seidler, Sandy Engelhardt
Description
Summary: Purpose: Federated training is often challenging on heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization including a uniform data representation and filtering options are of paramount importance. Methods: DICOM-structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration with interactive filtering capabilities, thereby simplifying the process of creation of patient cohorts over several sites with consistent multi-modal data. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We prove its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data include imaging and waveform data (i.e., computed tomography images, electrocardiography scans) as well as annotations (i.e., calcification segmentations, and pointsets), and metadata (i.e., prostheses and pacemaker dependency). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system arbitrary data types can be queried concurrently to create meaningful cohorts for multi-centric data analysis. The graphical interface as well as example structured report templates are available at https://github.com/Cardio-AI/fl-multi-modal-dataset-creation.
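Illustrative note: the concurrent filtering described in the summary can be pictured with a small sketch. This is not the authors' platform and does not use highdicom or any DICOM library; the record layout, the site and patient labels, and the function name `build_cohort` are all hypothetical, chosen only to show the idea of keeping, per site, the patients for whom every required data type is present.

```python
from collections import defaultdict

# Hypothetical, simplified index: one entry per stored object.
# "kind" stands in for a data type such as a CT image, an ECG, or a segmentation.
RECORDS = [
    {"site": "A", "patient": "p1", "kind": "CT"},
    {"site": "A", "patient": "p1", "kind": "ECG"},
    {"site": "A", "patient": "p1", "kind": "SEG"},
    {"site": "A", "patient": "p2", "kind": "CT"},
    {"site": "B", "patient": "p3", "kind": "CT"},
    {"site": "B", "patient": "p3", "kind": "ECG"},
    {"site": "B", "patient": "p3", "kind": "SEG"},
]


def build_cohort(records, required_kinds):
    """Return {site: [patient, ...]} for patients that have every required kind."""
    # Collect the set of available data types per (site, patient) pair.
    kinds_by_patient = defaultdict(set)
    for rec in records:
        kinds_by_patient[(rec["site"], rec["patient"])].add(rec["kind"])

    # Keep only patients whose available types cover all required types.
    cohort = defaultdict(list)
    for (site, patient), kinds in sorted(kinds_by_patient.items()):
        if required_kinds <= kinds:
            cohort[site].append(patient)
    return dict(cohort)


cohort = build_cohort(RECORDS, {"CT", "ECG", "SEG"})
```

In this toy index, patient p2 has only a CT object and is therefore excluded when all three types are required. Each site would apply the same filter locally, so the resulting cohorts are consistent across sites without moving the data.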
Description: Viewed on 5 August 2025
Description: Online resource