Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting

A vast amount of clinical data is still stored in unstructured text. Automatic extraction of medical information from these data poses several challenges: high costs of clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain-adaptation and prompting methods using lightweight masked language models showed promising results with minimal training data and allow for the application of well-established interpretability methods. We are the first to present a systematic evaluation of advanced domain-adaptation and prompting methods on a lower-resource medical-domain task, performing multi-class section classification on German doctor’s letters. We evaluate a variety of models, model sizes, (further-pre)training and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data and to ensure the interpretability of model predictions. We show that in few-shot learning scenarios a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model by increasing accuracy from to . By using Shapley values for model selection and training data optimization, we could further increase accuracy up to . Our analyses reveal that pretraining of masked language models on general-language data is important to support successful domain transfer to medical language, so that further pretraining of general-language models on domain-specific documents can outperform models pretrained on domain-specific data only. Our evaluations show that prompting based on general-language pretrained masked language models, combined with further pretraining on medical-domain data, achieves significant accuracy improvements over traditional models with minimal training data. Further performance improvements and interpretability of results can be achieved using interpretability methods such as Shapley values. Our findings highlight the feasibility of deploying powerful machine learning methods in clinical settings and can serve as a process-oriented guideline for clinical information extraction projects in lower-resource languages and domains.
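
The prompting approach summarized in the abstract can be sketched in a few lines. Below is a minimal, hypothetical illustration of cloze-style prompting for section classification with a masked language model; the checkpoint name, the German prompt pattern, and the section labels are illustrative assumptions, not taken from the paper. The few-shot step the abstract describes (fine-tuning on about 20 labelled examples per class) is omitted; the sketch shows only the prompt-scoring step.

    # Minimal sketch: cloze-style prompting for multi-class section classification
    # with a masked language model (MLM). All names below (checkpoint, prompt
    # pattern, section labels) are illustrative assumptions.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL_NAME = "bert-base-german-cased"  # assumption: any German MLM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
    model.eval()

    # Hypothetical section classes mapped to German verbalizer tokens; this
    # assumes each verbalizer is a single token in the model's vocabulary,
    # otherwise per-subword scoring would be needed.
    VERBALIZERS = {"anamnesis": "Anamnese", "findings": "Befund", "diagnosis": "Diagnose"}

    def classify_section(text: str) -> str:
        """Score each verbalizer at the [MASK] position and return the best class."""
        prompt = f"{text} Dieser Abschnitt ist: {tokenizer.mask_token}."
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos]
        scores = {label: logits[tokenizer.convert_tokens_to_ids(tok)].item()
                  for label, tok in VERBALIZERS.items()}
        return max(scores, key=scores.get)

    print(classify_section("Der Patient berichtet über Brustschmerzen seit drei Tagen."))

The Shapley-value analysis mentioned in the abstract could then be run over the same scorer, for example with the shap library's text explainer (again an illustrative setup, not the paper's exact method):

    # Token-level Shapley values over the scorer above (assumes the shap library).
    import numpy as np
    import shap

    def score_batch(texts):
        """Return per-class verbalizer logits for a batch of input texts."""
        rows = []
        for t in texts:
            prompt = f"{t} Dieser Abschnitt ist: {tokenizer.mask_token}."
            inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
            mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
            with torch.no_grad():
                logits = model(**inputs).logits[0, mask_pos]
            rows.append([logits[tokenizer.convert_tokens_to_ids(tok)].item()
                         for tok in VERBALIZERS.values()])
        return np.array(rows)

    explainer = shap.Explainer(score_batch, shap.maskers.Text(tokenizer),
                               output_names=list(VERBALIZERS))
    shap_values = explainer(["Der Patient berichtet über Brustschmerzen seit drei Tagen."])

In the paper's setting, such token attributions support model selection and training-data curation; here they serve only to illustrate the general technique.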

Detailed description

Bibliographic details
Main authors: Richter-Pechanski, Phillip (author), Wiesenbach, Philipp (author), Schwab, Dominic Mathias (author), Kiriakou, Christina (author), Geis, Nicolas (author), Dieterich, Christoph (author), Frank, Anette (author)
Document type: Article (Journal)
Language: English
Published: 2025
In: Natural language processing
Year: 2025, Volume: 31, Issue: 5, Pages: 1210-1233
ISSN:2977-0424
DOI:10.1017/nlp.2024.52
Online access: Publisher, license required, full text: https://doi.org/10.1017/nlp.2024.52
Publisher, license required, full text: https://www.cambridge.org/core/journals/natural-language-processing/article/clinical-information-extraction-for-lowerresource-languages-and-domains-with-fewshot-learning-using-pretrained-language-models-and-prompting/4596EA36DE0034F9A25D7576C4116BC9
Statement of responsibility: Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic Mathias Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank

MARC

LEADER 00000caa a2200000 c 4500
001 1921184035
003 DE-627
005 20260209090419.0
007 cr uuu---uuuuu
008 250402s2025 xx |||||o 00| ||eng c
024 7 |a 10.1017/nlp.2024.52  |2 doi 
035 |a (DE-627)1921184035 
035 |a (DE-599)KXP1921184035 
035 |a (OCoLC)1528043962 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Richter-Pechanski, Phillip  |e VerfasserIn  |0 (DE-588)1204395470  |0 (DE-627)1689724056  |4 aut 
245 1 0 |a Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting  |c Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic Mathias Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank 
264 1 |c 2025 
300 |a 24  |b Illustrationen 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Online veröffentlicht: 31. Oktober 2024 
500 |a Gesehen am 02.04.2025 
520 |a A vast amount of clinical data is still stored in unstructured text. Automatic extraction of medical information from these data poses several challenges: high costs of clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain-adaptation and prompting methods using lightweight masked language models showed promising results with minimal training data and allow for the application of well-established interpretability methods. We are the first to present a systematic evaluation of advanced domain-adaptation and prompting methods on a lower-resource medical-domain task, performing multi-class section classification on German doctor’s letters. We evaluate a variety of models, model sizes, (further-pre)training and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data and to ensure the interpretability of model predictions. We show that in few-shot learning scenarios a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model by increasing accuracy from to . By using Shapley values for model selection and training data optimization, we could further increase accuracy up to . Our analyses reveal that pretraining of masked language models on general-language data is important to support successful domain transfer to medical language, so that further pretraining of general-language models on domain-specific documents can outperform models pretrained on domain-specific data only. Our evaluations show that prompting based on general-language pretrained masked language models, combined with further pretraining on medical-domain data, achieves significant accuracy improvements over traditional models with minimal training data. Further performance improvements and interpretability of results can be achieved using interpretability methods such as Shapley values. Our findings highlight the feasibility of deploying powerful machine learning methods in clinical settings and can serve as a process-oriented guideline for clinical information extraction projects in lower-resource languages and domains. 
650 4 |a few-shot learning 
650 4 |a language models 
650 4 |a medical information extraction 
650 4 |a pretraining 
650 4 |a Prompting 
700 1 |a Wiesenbach, Philipp  |e VerfasserIn  |0 (DE-588)1292626887  |0 (DE-627)1848889755  |4 aut 
700 1 |a Schwab, Dominic Mathias  |d 1987-  |e VerfasserIn  |0 (DE-588)1163244422  |0 (DE-627)1027454801  |0 (DE-576)507847423  |4 aut 
700 1 |a Kiriakou, Christina  |e VerfasserIn  |0 (DE-588)1156805929  |0 (DE-627)1019739959  |0 (DE-576)502445238  |4 aut 
700 1 |a Geis, Nicolas  |d 1980-  |e VerfasserIn  |0 (DE-588)138513988  |0 (DE-627)696681854  |0 (DE-576)307885313  |4 aut 
700 1 |a Dieterich, Christoph  |d 1975-  |e VerfasserIn  |0 (DE-588)130054844  |0 (DE-627)494359269  |0 (DE-576)297972448  |4 aut 
700 1 |a Frank, Anette  |e VerfasserIn  |0 (DE-588)1020288108  |0 (DE-627)691172161  |0 (DE-576)36005689X  |4 aut 
773 0 8 |i Enthalten in  |t Natural language processing  |d Cambridge : Cambridge University Press, 2025  |g 31(2025), 5, Seite 1210-1233  |h Online-Ressource  |w (DE-627)1917904886  |w (DE-600)3208179-0  |x 2977-0424  |7 nnas  |a Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting 
773 1 8 |g volume:31  |g year:2025  |g number:5  |g pages:1210-1233  |g extent:24  |a Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting 
856 4 0 |u https://doi.org/10.1017/nlp.2024.52  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext  |7 1 
856 4 0 |u https://www.cambridge.org/core/journals/natural-language-processing/article/clinical-information-extraction-for-lowerresource-languages-and-domains-with-fewshot-learning-using-pretrained-language-models-and-prompting/4596EA36DE0034F9A25D7576C4116BC9  |x Verlag  |z lizenzpflichtig  |3 Volltext  |7 1 
951 |a AR 
992 |a 20250402 
993 |a Article 
994 |a 2025 
998 |g 1020288108  |a Frank, Anette  |m 1020288108:Frank, Anette  |d 90000  |d 90500  |e 90000PF1020288108  |e 90500PF1020288108  |k 0/90000/  |k 1/90000/90500/  |p 7  |y j 
998 |g 130054844  |a Dieterich, Christoph  |m 130054844:Dieterich, Christoph  |d 910000  |d 910100  |e 910000PD130054844  |e 910100PD130054844  |k 0/910000/  |k 1/910000/910100/  |p 6 
998 |g 138513988  |a Geis, Nicolas  |m 138513988:Geis, Nicolas  |d 910000  |d 910100  |d 50000  |e 910000PG138513988  |e 910100PG138513988  |e 50000PG138513988  |k 0/910000/  |k 1/910000/910100/  |k 0/50000/  |p 5 
998 |g 1156805929  |a Kiriakou, Christina  |m 1156805929:Kiriakou, Christina  |d 910000  |d 910100  |e 910000PK1156805929  |e 910100PK1156805929  |k 0/910000/  |k 1/910000/910100/  |p 4 
998 |g 1163244422  |a Schwab, Dominic Mathias  |m 1163244422:Schwab, Dominic Mathias  |d 910000  |d 910100  |e 910000PS1163244422  |e 910100PS1163244422  |k 0/910000/  |k 1/910000/910100/  |p 3 
998 |g 1292626887  |a Wiesenbach, Philipp  |m 1292626887:Wiesenbach, Philipp  |d 910000  |d 910100  |e 910000PW1292626887  |e 910100PW1292626887  |k 0/910000/  |k 1/910000/910100/  |p 2 
998 |g 1204395470  |a Richter-Pechanski, Phillip  |m 1204395470:Richter-Pechanski, Phillip  |d 910000  |d 910100  |e 910000PR1204395470  |e 910100PR1204395470  |k 0/910000/  |k 1/910000/910100/  |p 1  |x j 
999 |a KXP-PPN1921184035  |e 4696598985 
BIB |a Y 
SER |a journal 
JSO |a {"title":[{"title_sort":"Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting","title":"Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting"}],"type":{"bibl":"article-journal","media":"Online-Ressource"},"language":["eng"],"note":["Online veröffentlicht: 31. Oktober 2024","Gesehen am 02.04.2025"],"origin":[{"dateIssuedDisp":"2025","dateIssuedKey":"2025"}],"person":[{"display":"Richter-Pechanski, Phillip","family":"Richter-Pechanski","role":"aut","given":"Phillip"},{"role":"aut","given":"Philipp","family":"Wiesenbach","display":"Wiesenbach, Philipp"},{"display":"Schwab, Dominic Mathias","family":"Schwab","given":"Dominic Mathias","role":"aut"},{"family":"Kiriakou","role":"aut","given":"Christina","display":"Kiriakou, Christina"},{"family":"Geis","given":"Nicolas","role":"aut","display":"Geis, Nicolas"},{"role":"aut","given":"Christoph","family":"Dieterich","display":"Dieterich, Christoph"},{"role":"aut","given":"Anette","family":"Frank","display":"Frank, Anette"}],"relHost":[{"type":{"media":"Online-Ressource","bibl":"periodical"},"language":["eng"],"note":["Gesehen am 25.02.2025"],"title":[{"title":"Natural language processing","title_sort":"Natural language processing"}],"origin":[{"dateIssuedDisp":"2025-","dateIssuedKey":"2025","publisherPlace":"Cambridge","publisher":"Cambridge University Press"}],"disp":"Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and promptingNatural language processing","titleUni":[{"title":"Natural language processing (Cambridge)"}],"physDesc":[{"extent":"Online-Ressource"}],"recId":"1917904886","id":{"eki":["1917904886"],"issn":["2977-0424"],"doi":["10.1017/nlp"],"zdb":["3208179-0"]},"part":{"year":"2025","volume":"31","pages":"1210-1233","extent":"24","text":"31(2025), 5, Seite 1210-1233","issue":"5"},"pubHistory":["Volume 31, issue 1 (January 2025)-"]}],"recId":"1921184035","physDesc":[{"noteIll":"Illustrationen","extent":"24 S."}],"id":{"eki":["1921184035"],"doi":["10.1017/nlp.2024.52"]},"name":{"displayForm":["Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic Mathias Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank"]}} 
SRT |a RICHTERPECCLINICALIN2025