Medication information extraction using local large language models

Bibliographic Details
Main Authors: Richter-Pechanski, Phillip (Author), Seiferling, Marvin (Author), Kiriakou, Christina (Author), Schwab, Dominic Mathias (Author), Geis, Nicolas (Author), Dieterich, Christoph (Author), Frank, Anette (Author)
Format: Article (Journal)
Language:English
Published: September 2025
In: Journal of biomedical informatics
Year: 2025, Volume: 169, Pages: 1-32
ISSN:1532-0480
DOI:10.1016/j.jbi.2025.104898
Online Access: Publisher, free of charge, full text: https://doi.org/10.1016/j.jbi.2025.104898
Publisher, free of charge, full text: https://www.sciencedirect.com/science/article/pii/S1532046425001273
Author Notes:Phillip Richter-Pechanski, Marvin Seiferling, Christina Kiriakou, Dominic M. Schwab, Nicolas A. Geis, Christoph Dieterich, Anette Frank
Description
Summary:
Objective: Medication information is crucial for clinical routine and research. However, a vast amount of it is stored in unstructured text, such as doctors' letters, requiring manual extraction, a resource-intensive and error-prone task. Automating this process faces significant constraints in a clinical setting, including the need for clinical expertise, limited time resources, restricted IT infrastructure, and the demand for transparent predictions. Recent advances in generative large language models (LLMs) and parameter-efficient fine-tuning methods show potential to address these challenges.
Methods: We evaluated local LLMs for end-to-end extraction of medication information, combining named entity recognition and relation extraction. We used format-restricting instructions and developed an innovative feedback pipeline to facilitate automated evaluation. We applied token-level Shapley values to visualize and quantify token contributions, improving the transparency of model predictions.
Results: Two open-source LLMs, one general (Llama) and one domain-specific (OpenBioLLM), were evaluated on the English n2c2 2018 corpus and the German CARDIO:DE corpus. OpenBioLLM frequently struggled with structured outputs and hallucinations. Fine-tuned Llama models achieved new state-of-the-art results, improving the F1-score by up to 10 percentage points (pp.) for adverse drug events and 6 pp. for medication reasons on English data. On the German dataset, Llama established a new benchmark, outperforming traditional machine learning methods by up to 16 pp. micro-average F1-score.
Conclusion: Our findings show that fine-tuned local open-source generative LLMs outperform state-of-the-art methods for medication information extraction, delivering high performance with limited time and IT resources in a real-world clinical setting, and demonstrate their effectiveness on both English and German data. Applying Shapley values improved prediction transparency, supporting informed clinical decision-making.
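The token-level Shapley attribution mentioned in the summary can be illustrated in isolation. The sketch below is not the authors' pipeline or model: it computes exact Shapley values by enumerating all token subsets, using a hypothetical three-token input and a toy scoring function (both invented here for illustration) in place of a real LLM prediction.

```python
from itertools import combinations
from math import factorial

def shapley_values(tokens, value):
    """Exact Shapley value per token: weighted average of the marginal
    contribution of token i over all subsets S of the other tokens."""
    n = len(tokens)
    phis = {}
    for i, tok in enumerate(tokens):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi += weight * (value(s | {i}) - value(s))
        phis[tok] = phi
    return phis

# Toy score standing in for a model prediction (hypothetical):
# tokens are 0="takes", 1="aspirin", 2="daily"; the drug name drives the
# score, and "daily" only adds value together with the drug name.
def toy_score(present):
    score = 0.0
    if 1 in present:
        score += 1.0
    if 1 in present and 2 in present:
        score += 0.5
    return score

attributions = shapley_values(["takes", "aspirin", "daily"], toy_score)
for token, phi in attributions.items():
    print(f"{token}: {phi:.2f}")
```

By the efficiency property, the per-token attributions sum to the full-input score (here 1.5), and the interaction bonus is split evenly between "aspirin" and "daily". Exhaustive enumeration is exponential in the number of tokens, which is why practical tools approximate these values by sampling.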
Item Description: Available online: 21 August 2025, Version of record: 23 August 2025
Viewed on 12 November 2025
Physical Description:Online Resource