A software pipeline for medical information extraction with large language models, open source and suitable for oncology
In medical oncology, text data, such as clinical letters or procedure reports, is stored in an unstructured way, making quantitative analysis difficult. Manual review or structured information retrieval is time-consuming and costly, whereas Large Language Models (LLMs) offer new possibilities in nat...
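| Main Authors: | Wiest, Isabella; Wolf, Fabian; Leßmann, Marie-Elisabeth; van Treeck, Marko; Ferber, Dyke; Zhu, Jiefu; Boehme, Heiko; Bressem, Keno K.; Ulrich, Hannes; Ebert, Matthias; Kather, Jakob Nikolas |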
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | 17 September 2025 |
| In: | npj precision oncology, Year: 2025, Volume: 9, Issue: 1, Pages: 1-12 |
| ISSN: | 2397-768X |
| DOI: | 10.1038/s41698-025-01103-4 |
| Online Access: | Publisher, free of charge, full text: https://doi.org/10.1038/s41698-025-01103-4 Publisher, free of charge, full text: https://www.nature.com/articles/s41698-025-01103-4 |
| Author Notes: | Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather |
MARC
| LEADER | 00000caa a2200000 c 4500 | ||
|---|---|---|---|
| 001 | 1939652537 | ||
| 003 | DE-627 | ||
| 005 | 20251030144311.0 | ||
| 007 | cr uuu---uuuuu | ||
| 008 | 251029s2025 xx |||||o 00| ||eng c | ||
| 024 | 7 | |a 10.1038/s41698-025-01103-4 |2 doi | |
| 035 | |a (DE-627)1939652537 | ||
| 035 | |a (DE-599)KXP1939652537 | ||
| 040 | |a DE-627 |b ger |c DE-627 |e rda | ||
| 041 | |a eng | ||
| 084 | |a 33 |2 sdnb | ||
| 100 | 1 | |a Wiest, Isabella |d 1992- |e VerfasserIn |0 (DE-588)1198882956 |0 (DE-627)168103638X |4 aut | |
| 245 | 1 | 2 | |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology |c Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather |
| 264 | 1 | |c 17 September 2025 | |
| 300 | |b Illustrationen | ||
| 300 | |a 12 | ||
| 336 | |a Text |b txt |2 rdacontent | ||
| 337 | |a Computermedien |b c |2 rdamedia | ||
| 338 | |a Online-Ressource |b cr |2 rdacarrier | ||
| 500 | |a Gesehen am 29.10.2025 | ||
| 520 | |a In medical oncology, text data, such as clinical letters or procedure reports, is stored in an unstructured way, making quantitative analysis difficult. Manual review or structured information retrieval is time-consuming and costly, whereas Large Language Models (LLMs) offer new possibilities in natural language processing for structured Information Extraction (IE) from medical free text. This protocol describes a workflow (LLM-AIx) for extracting predefined clinical entities from unstructured oncology text using privacy-preserving LLMs. It addresses a key barrier in clinical research and care by enabling efficient information extraction to support decision-making and large-scale data analysis. It runs on local hospital infrastructure, eliminating the need to transfer patient data externally. We demonstrate its utility on 100 pathology reports from The Cancer Genome Atlas (TCGA) for TNM stage extraction. LLM-AIx requires no programming skills and offers a user-friendly interface for rapid, structured data extraction from clinical free text. | ||
| 650 | 4 | |a Cancer | |
| 650 | 4 | |a Translational research | |
| 700 | 1 | |a Wolf, Fabian |e VerfasserIn |4 aut | |
| 700 | 1 | |a Leßmann, Marie-Elisabeth |e VerfasserIn |4 aut | |
| 700 | 1 | |a van Treeck, Marko |e VerfasserIn |4 aut | |
| 700 | 1 | |a Ferber, Dyke |e VerfasserIn |0 (DE-588)1171467079 |0 (DE-627)1040545629 |0 (DE-576)513746056 |4 aut | |
| 700 | 1 | |a Zhu, Jiefu |e VerfasserIn |4 aut | |
| 700 | 1 | |a Boehme, Heiko |e VerfasserIn |4 aut | |
| 700 | 1 | |a Bressem, Keno K. |e VerfasserIn |4 aut | |
| 700 | 1 | |a Ulrich, Hannes |e VerfasserIn |4 aut | |
| 700 | 1 | |a Ebert, Matthias |d 1968- |e VerfasserIn |0 (DE-588)1030133522 |0 (DE-627)734827083 |0 (DE-576)377938432 |4 aut | |
| 700 | 1 | |a Kather, Jakob Nikolas |d 1989- |e VerfasserIn |0 (DE-588)1064064914 |0 (DE-627)812897587 |0 (DE-576)423589091 |4 aut | |
| 773 | 0 | 8 | |i Enthalten in |t npj precision oncology |d [London] : Springer Nature, 2017 |g 9(2025), 1, Artikel-ID 313, Seite 1-12 |h Online-Ressource |w (DE-627)884384454 |w (DE-600)2891458-2 |w (DE-576)486547728 |x 2397-768X |7 nnas |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology |
| 773 | 1 | 8 | |g volume:9 |g year:2025 |g number:1 |g elocationid:313 |g pages:1-12 |g extent:12 |a A software pipeline for medical information extraction with large language models, open source and suitable for oncology |
| 856 | 4 | 0 | |u https://doi.org/10.1038/s41698-025-01103-4 |x Verlag |x Resolving-System |z kostenfrei |3 Volltext |
| 856 | 4 | 0 | |u https://www.nature.com/articles/s41698-025-01103-4 |x Verlag |z kostenfrei |3 Volltext |
| 951 | |a AR | ||
| 992 | |a 20251029 | ||
| 993 | |a Article | ||
| 994 | |a 2025 | ||
| 998 | |g 1064064914 |a Kather, Jakob Nikolas |m 1064064914:Kather, Jakob Nikolas |d 910000 |d 910100 |e 910000PK1064064914 |e 910100PK1064064914 |k 0/910000/ |k 1/910000/910100/ |p 11 |y j | ||
| 998 | |g 1030133522 |a Ebert, Matthias |m 1030133522:Ebert, Matthias |d 60000 |d 61100 |e 60000PE1030133522 |e 61100PE1030133522 |k 0/60000/ |k 1/60000/61100/ |p 10 | ||
| 998 | |g 1171467079 |a Ferber, Dyke |m 1171467079:Ferber, Dyke |p 5 | ||
| 998 | |g 1198882956 |a Wiest, Isabella |m 1198882956:Wiest, Isabella |d 60000 |d 61100 |e 60000PW1198882956 |e 61100PW1198882956 |k 0/60000/ |k 1/60000/61100/ |p 1 |x j | ||
| 999 | |a KXP-PPN1939652537 |e 479386600X | ||
| BIB | |a Y | ||
| SER | |a journal | ||
| JSO | |a {"id":{"eki":["1939652537"],"doi":["10.1038/s41698-025-01103-4"]},"name":{"displayForm":["Isabella Catharina Wiest, Fabian Wolf, Marie-Elisabeth Leßmann, Marko van Treeck, Dyke Ferber, Jiefu Zhu, Heiko Boehme, Keno K. Bressem, Hannes Ulrich, Matthias P. Ebert & Jakob Nikolas Kather"]},"language":["eng"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"note":["Gesehen am 29.10.2025"],"title":[{"title":"A software pipeline for medical information extraction with large language models, open source and suitable for oncology","title_sort":"software pipeline for medical information extraction with large language models, open source and suitable for oncology"}],"origin":[{"dateIssuedDisp":"17 September 2025","dateIssuedKey":"2025"}],"person":[{"family":"Wiest","role":"aut","given":"Isabella","display":"Wiest, Isabella"},{"display":"Wolf, Fabian","role":"aut","given":"Fabian","family":"Wolf"},{"display":"Leßmann, Marie-Elisabeth","role":"aut","given":"Marie-Elisabeth","family":"Leßmann"},{"role":"aut","given":"Marko","family":"van Treeck","display":"van Treeck, Marko"},{"given":"Dyke","role":"aut","family":"Ferber","display":"Ferber, Dyke"},{"family":"Zhu","role":"aut","given":"Jiefu","display":"Zhu, Jiefu"},{"family":"Boehme","given":"Heiko","role":"aut","display":"Boehme, Heiko"},{"display":"Bressem, Keno K.","family":"Bressem","role":"aut","given":"Keno K."},{"display":"Ulrich, Hannes","given":"Hannes","role":"aut","family":"Ulrich"},{"display":"Ebert, Matthias","family":"Ebert","given":"Matthias","role":"aut"},{"family":"Kather","given":"Jakob Nikolas","role":"aut","display":"Kather, Jakob Nikolas"}],"relHost":[{"part":{"year":"2025","volume":"9","pages":"1-12","issue":"1","text":"9(2025), 1, Artikel-ID 313, Seite 1-12","extent":"12"},"id":{"issn":["2397-768X"],"eki":["884384454"],"zdb":["2891458-2"]},"titleAlt":[{"title":"a nature research journal"},{"title":"Precision oncology"}],"pubHistory":["20 March 2017-"],"name":{"displayForm":["published by Springer Nature in partnership with The Hormel Institute, University of Minnesota"]},"title":[{"subtitle":"a natureresearch journal","title":"npj precision oncology","title_sort":"npj precision oncology"}],"note":["Gesehen am 20. April 2017"],"type":{"media":"Online-Ressource","bibl":"periodical"},"language":["eng"],"origin":[{"publisher":"Springer Nature","dateIssuedDisp":"[2017]-","publisherPlace":"[London]"}],"disp":"A software pipeline for medical information extraction with large language models, open source and suitable for oncologynpj precision oncology","recId":"884384454","physDesc":[{"extent":"Online-Ressource"}]}],"physDesc":[{"extent":"12 S.","noteIll":"Illustrationen"}],"recId":"1939652537"} | ||
| SRT | |a WIESTISABESOFTWAREPI1720 | ||