Extending CARDIO:DE: additional annotation guidelines and evaluation of NLP approaches for clinical applications

Background - Cardiovascular diseases are a major cause of morbidity and mortality, and the management of these conditions generates extensive clinical data. The CARDIO:DE dataset, a German-language corpus of cardiovascular clinical routine letters, has been developed to support natural language proc...

Bibliographic Details
Main Authors: Becker, Matthias (Author), Krumscheid, Mario (Author), Knobelspies, Alisa (Author), Seydel, Markus (Author), Richter-Pechanski, Phillip (Author), Karl, Alexander (Author)
Format: Article (Journal)
Language: English
Published: November 2025
In: International journal of medical informatics
Year: 2025, Volume: 203, Pages: 1-7
ISSN: 1872-8243
DOI: 10.1016/j.ijmedinf.2025.106009
Online Access: Publisher, free of charge, full text: https://doi.org/10.1016/j.ijmedinf.2025.106009
Publisher, free of charge, full text: https://www.sciencedirect.com/science/article/pii/S1386505625002266
Author Notes: Matthias Becker, Mario Krumscheid, Alisa Knobelspies, Markus Seydel, Phillip Richter-Pechanski, Alexander Karl

MARC

LEADER 00000caa a2200000 c 4500
001 1938518381
003 DE-627
005 20251112091519.0
007 cr uuu---uuuuu
008 251016s2025 xx |||||o 00| ||eng c
024 7 |a 10.1016/j.ijmedinf.2025.106009  |2 doi 
035 |a (DE-627)1938518381 
035 |a (DE-599)KXP1938518381 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Becker, Matthias  |d 1984-  |e VerfasserIn  |0 (DE-588)1219086460  |0 (DE-627)1734799625  |4 aut 
245 1 0 |a Extending CARDIO:DE  |b additional annotation guidelines and evaluation of NLP approaches for clinical applications  |c Matthias Becker, Mario Krumscheid, Alisa Knobelspies, Markus Seydel, Phillip Richter-Pechanski, Alexander Karl 
264 1 |c November 2025 
300 |b Illustrationen 
300 |a 7 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Online verfügbar: 6. Juni 2025, Artikelversion: 12. Juni 2025 
500 |a Gesehen am 16.10.2025 
520 |a Background - Cardiovascular diseases are a major cause of morbidity and mortality, and the management of these conditions generates extensive clinical data. The CARDIO:DE dataset, a German-language corpus of cardiovascular clinical routine letters, has been developed to support natural language processing research. This study seeks to enhance the dataset by introducing refined annotation guidelines and expanding the annotation schema. - Objective - The objective of this study was to extend the CARDIO:DE dataset with additional annotation categories, and evaluate state-of-the-art NLP models to enhance the utility of the dataset for clinical applications. - Methods - The annotation schema was expanded to include categories such as diagnostic procedures, medical findings, and therapeutic interventions (Diagnostic, Diagnosis, Drug, Medical_Finding, Therapy). The iterative annotation process involved expert annotators, ensuring high-quality, consistent annotations. Four models (GBERT, medBERT.de, XLM-RoBERTa, and TinyLlama) were fine-tuned and evaluated on the dataset. Model performance was assessed using entity-wise precision, recall, and F1 scores. - Results - The extended dataset includes 304,582 token-based annotations, with the highest concentration in medical findings. The inter-annotator agreement scores improved during the iterative process, reaching up to 0.98 for certain subsets. Among the evaluated models, TinyLlama outperformed the other models in entity recognition, achieving a macro-average F1 score of 0.845, highlighting its potential for clinical NLP tasks. - Conclusions - The extended CARDIO:DE dataset, with its refined annotation guidelines, provides a robust foundation for natural language processing applications in the clinical domain. The performance of the TinyLlama model demonstrates the potential of fine-tuning non-domain-specific models for clinical text processing. This work paves the way for more accurate NLP solutions in healthcare, particularly for information extraction and decision support in cardiology. 
650 4 |a Cardiovascular Diseases 
650 4 |a Clinical Text Mining 
650 4 |a Information Extraction 
650 4 |a Named-Entity Recognition 
650 4 |a Natural Language Processing 
700 1 |a Krumscheid, Mario  |e VerfasserIn  |4 aut 
700 1 |a Knobelspies, Alisa  |e VerfasserIn  |4 aut 
700 1 |a Seydel, Markus  |e VerfasserIn  |4 aut 
700 1 |a Richter-Pechanski, Phillip  |e VerfasserIn  |0 (DE-588)1204395470  |0 (DE-627)1689724056  |4 aut 
700 1 |a Karl, Alexander  |e VerfasserIn  |4 aut 
773 0 8 |i Enthalten in  |t International journal of medical informatics  |d Amsterdam [u.a.] : Elsevier, 1997  |g 203(2025) vom: Nov., Artikel-ID 106009, Seite 1-7  |h Online-Ressource  |w (DE-627)265783720  |w (DE-600)1466296-6  |w (DE-576)074890913  |x 1872-8243  |7 nnas  |a Extending CARDIO:DE additional annotation guidelines and evaluation of NLP approaches for clinical applications 
773 1 8 |g volume:203  |g year:2025  |g month:11  |g elocationid:106009  |g pages:1-7  |g extent:7  |a Extending CARDIO:DE additional annotation guidelines and evaluation of NLP approaches for clinical applications 
856 4 0 |u https://doi.org/10.1016/j.ijmedinf.2025.106009  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
856 4 0 |u https://www.sciencedirect.com/science/article/pii/S1386505625002266  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20251016 
993 |a Article 
994 |a 2025 
998 |g 1204395470  |a Richter-Pechanski, Phillip  |m 1204395470:Richter-Pechanski, Phillip  |d 910000  |d 910100  |e 910000PR1204395470  |e 910100PR1204395470  |k 0/910000/  |k 1/910000/910100/  |p 5 
999 |a KXP-PPN1938518381  |e 478860339X 
BIB |a Y 
SER |a journal 
JSO |a {"person":[{"family":"Becker","display":"Becker, Matthias","given":"Matthias","role":"aut"},{"given":"Mario","role":"aut","family":"Krumscheid","display":"Krumscheid, Mario"},{"family":"Knobelspies","display":"Knobelspies, Alisa","given":"Alisa","role":"aut"},{"family":"Seydel","display":"Seydel, Markus","role":"aut","given":"Markus"},{"display":"Richter-Pechanski, Phillip","family":"Richter-Pechanski","given":"Phillip","role":"aut"},{"family":"Karl","display":"Karl, Alexander","given":"Alexander","role":"aut"}],"language":["eng"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"origin":[{"dateIssuedKey":"2025","dateIssuedDisp":"November 2025"}],"note":["Online verfügbar: 6. Juni 2025, Artikelversion: 12. Juni 2025","Gesehen am 16.10.2025"],"title":[{"subtitle":"additional annotation guidelines and evaluation of NLP approaches for clinical applications","title_sort":"Extending CARDIO:DE","title":"Extending CARDIO:DE"}],"id":{"eki":["1938518381"],"doi":["10.1016/j.ijmedinf.2025.106009"]},"relHost":[{"disp":"Extending CARDIO:DE additional annotation guidelines and evaluation of NLP approaches for clinical applicationsInternational journal of medical informatics","type":{"bibl":"periodical","media":"Online-Ressource"},"origin":[{"publisherPlace":"Amsterdam [u.a.]","publisher":"Elsevier","dateIssuedKey":"1997","dateIssuedDisp":"1997-"}],"pubHistory":["Volume 44, issue 1 (March 1997)-"],"note":["Gesehen am 05.06.2018"],"title":[{"title_sort":"International journal of medical informatics","title":"International journal of medical informatics"}],"id":{"zdb":["1466296-6"],"issn":["1872-8243"],"eki":["265783720"]},"recId":"265783720","physDesc":[{"extent":"Online-Ressource"}],"part":{"text":"203(2025) vom: Nov., Artikel-ID 106009, Seite 1-7","extent":"7","volume":"203","year":"2025","pages":"1-7"},"language":["eng"]}],"recId":"1938518381","name":{"displayForm":["Matthias Becker, Mario Krumscheid, Alisa Knobelspies, Markus Seydel, Phillip Richter-Pechanski, Alexander Karl"]},"physDesc":[{"extent":"7 S.","noteIll":"Illustrationen"}]} 
SRT |a BECKERMATTEXTENDINGC2025