A software pipeline for medical information extraction with large language models, open source and suitable for oncology

In medical oncology, text data, such as clinical letters or procedure reports, is stored in an unstructured way, making quantitative analysis difficult. Manual review or structured information retrieval is time-consuming and costly, whereas Large Language Models (LLMs) offer new possibilities in nat...

Bibliographic Details
Main Authors: Wiest, Isabella (Author) , Wolf, Fabian (Author) , Leßmann, Marie-Elisabeth (Author) , van Treeck, Marko (Author) , Ferber, Dyke (Author) , Zhu, Jiefu (Author) , Boehme, Heiko (Author) , Bressem, Keno K. (Author) , Ulrich, Hannes (Author) , Ebert, Matthias (Author) , Kather, Jakob Nikolas (Author)
Format: Article (Journal)
Language: English
Published: 17 September 2025
In: npj precision oncology
Year: 2025, Volume: 9, Issue: 1, Pages: 1-12
ISSN: 2397-768X
DOI: 10.1038/s41698-025-01103-4
Online Access: Publisher, free of charge, full text: https://doi.org/10.1038/s41698-025-01103-4
Publisher, free of charge, full text: https://www.nature.com/articles/s41698-025-01103-4
Author Notes: Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather

MARC

LEADER 00000caa a2200000 c 4500
001 1939652537
003 DE-627
005 20251030144311.0
007 cr uuu---uuuuu
008 251029s2025 xx |||||o 00| ||eng c
024 7 |a 10.1038/s41698-025-01103-4  |2 doi 
035 |a (DE-627)1939652537 
035 |a (DE-599)KXP1939652537 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Wiest, Isabella  |d 1992-  |e VerfasserIn  |0 (DE-588)1198882956  |0 (DE-627)168103638X  |4 aut 
245 1 2 |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology  |c Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather 
264 1 |c 17 September 2025 
300 |b Illustrationen 
300 |a 12 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 29.10.2025 
520 |a In medical oncology, text data, such as clinical letters or procedure reports, is stored in an unstructured way, making quantitative analysis difficult. Manual review or structured information retrieval is time-consuming and costly, whereas Large Language Models (LLMs) offer new possibilities in natural language processing for structured Information Extraction (IE) from medical free text. This protocol describes a workflow (LLM-AIx) for extracting predefined clinical entities from unstructured oncology text using privacy-preserving LLMs. It addresses a key barrier in clinical research and care by enabling efficient information extraction to support decision-making and large-scale data analysis. It runs on local hospital infrastructure, eliminating the need to transfer patient data externally. We demonstrate its utility on 100 pathology reports from The Cancer Genome Atlas (TCGA) for TNM stage extraction. LLM-AIx requires no programming skills and offers a user-friendly interface for rapid, structured data extraction from clinical free text. 
650 4 |a Cancer 
650 4 |a Translational research 
700 1 |a Wolf, Fabian  |e VerfasserIn  |4 aut 
700 1 |a Leßmann, Marie-Elisabeth  |e VerfasserIn  |4 aut 
700 1 |a van Treeck, Marko  |e VerfasserIn  |4 aut 
700 1 |a Ferber, Dyke  |e VerfasserIn  |0 (DE-588)1171467079  |0 (DE-627)1040545629  |0 (DE-576)513746056  |4 aut 
700 1 |a Zhu, Jiefu  |e VerfasserIn  |4 aut 
700 1 |a Boehme, Heiko  |e VerfasserIn  |4 aut 
700 1 |a Bressem, Keno K.  |e VerfasserIn  |4 aut 
700 1 |a Ulrich, Hannes  |e VerfasserIn  |4 aut 
700 1 |a Ebert, Matthias  |d 1968-  |e VerfasserIn  |0 (DE-588)1030133522  |0 (DE-627)734827083  |0 (DE-576)377938432  |4 aut 
700 1 |a Kather, Jakob Nikolas  |d 1989-  |e VerfasserIn  |0 (DE-588)1064064914  |0 (DE-627)812897587  |0 (DE-576)423589091  |4 aut 
773 0 8 |i Enthalten in  |t npj precision oncology  |d [London] : Springer Nature, 2017  |g 9(2025), 1, Artikel-ID 313, Seite 1-12  |h Online-Ressource  |w (DE-627)884384454  |w (DE-600)2891458-2  |w (DE-576)486547728  |x 2397-768X  |7 nnas  |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology 
773 1 8 |g volume:9  |g year:2025  |g number:1  |g elocationid:313  |g pages:1-12  |g extent:12  |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology 
856 4 0 |u https://doi.org/10.1038/s41698-025-01103-4  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
856 4 0 |u https://www.nature.com/articles/s41698-025-01103-4  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20251029 
993 |a Article 
994 |a 2025 
998 |g 1064064914  |a Kather, Jakob Nikolas  |m 1064064914:Kather, Jakob Nikolas  |d 910000  |d 910100  |e 910000PK1064064914  |e 910100PK1064064914  |k 0/910000/  |k 1/910000/910100/  |p 11  |y j 
998 |g 1030133522  |a Ebert, Matthias  |m 1030133522:Ebert, Matthias  |d 60000  |d 61100  |e 60000PE1030133522  |e 61100PE1030133522  |k 0/60000/  |k 1/60000/61100/  |p 10 
998 |g 1171467079  |a Ferber, Dyke  |m 1171467079:Ferber, Dyke  |p 5 
998 |g 1198882956  |a Wiest, Isabella  |m 1198882956:Wiest, Isabella  |d 60000  |d 61100  |e 60000PW1198882956  |e 61100PW1198882956  |k 0/60000/  |k 1/60000/61100/  |p 1  |x j 
999 |a KXP-PPN1939652537  |e 479386600X 
BIB |a Y 
SER |a journal 
JSO |a {"id":{"eki":["1939652537"],"doi":["10.1038/s41698-025-01103-4"]},"name":{"displayForm":["Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather"]},"language":["eng"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"note":["Gesehen am 29.10.2025"],"title":[{"title":"A software pipeline for medical information extraction with large language models, open source and suitable for oncology","title_sort":"software pipeline for medical information extraction with large language models, open source and suitable for oncology"}],"origin":[{"dateIssuedDisp":"17 September 2025","dateIssuedKey":"2025"}],"person":[{"family":"Wiest","role":"aut","given":"Isabella","display":"Wiest, Isabella"},{"display":"Wolf, Fabian","role":"aut","given":"Fabian","family":"Wolf"},{"display":"Leßmann, Marie-Elisabeth","role":"aut","given":"Marie-Elisabeth","family":"Leßmann"},{"role":"aut","given":"Marko","family":"van Treeck","display":"van Treeck, Marko"},{"given":"Dyke","role":"aut","family":"Ferber","display":"Ferber, Dyke"},{"family":"Zhu","role":"aut","given":"Jiefu","display":"Zhu, Jiefu"},{"family":"Boehme","given":"Heiko","role":"aut","display":"Boehme, Heiko"},{"display":"Bressem, Keno K.","family":"Bressem","role":"aut","given":"Keno K."},{"display":"Ulrich, Hannes","given":"Hannes","role":"aut","family":"Ulrich"},{"display":"Ebert, Matthias","family":"Ebert","given":"Matthias","role":"aut"},{"family":"Kather","given":"Jakob Nikolas","role":"aut","display":"Kather, Jakob Nikolas"}],"relHost":[{"part":{"year":"2025","volume":"9","pages":"1-12","issue":"1","text":"9(2025), 1, Artikel-ID 313, Seite 1-12","extent":"12"},"id":{"issn":["2397-768X"],"eki":["884384454"],"zdb":["2891458-2"]},"titleAlt":[{"title":"a nature research journal"},{"title":"Precision oncology"}],"pubHistory":["20 March 2017-"],"name":{"displayForm":["published by Springer Nature in partnership with The Hormel Institute, University of Minnesota"]},"title":[{"subtitle":"a natureresearch journal","title":"npj precision oncology","title_sort":"npj precision oncology"}],"note":["Gesehen am 20. April 2017"],"type":{"media":"Online-Ressource","bibl":"periodical"},"language":["eng"],"origin":[{"publisher":"Springer Nature","dateIssuedDisp":"[2017]-","publisherPlace":"[London]"}],"disp":"A software pipeline for medical information extraction with large language models, open source and suitable for oncologynpj precision oncology","recId":"884384454","physDesc":[{"extent":"Online-Ressource"}]}],"physDesc":[{"extent":"12 S.","noteIll":"Illustrationen"}],"recId":"1939652537"} 
SRT |a WIESTISABESOFTWAREPI1720