Medication information extraction using local large language models
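| Main Authors: | Phillip Richter-Pechanski, Marvin Seiferling, Christina Kiriakou, Dominic M. Schwab, Nicolas A. Geis, Christoph Dieterich, Anette Frank |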
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | September 2025 |
| In: | Journal of biomedical informatics, Year: 2025, Volume: 169, Pages: 1-32 |
| ISSN: | 1532-0480 |
| DOI: | 10.1016/j.jbi.2025.104898 |
| Online Access: | Publisher, free of charge, full text: https://doi.org/10.1016/j.jbi.2025.104898; Publisher, free of charge, full text: https://www.sciencedirect.com/science/article/pii/S1532046425001273 |
| Author Notes: | Phillip Richter-Pechanski, Marvin Seiferling, Christina Kiriakou, Dominic M. Schwab, Nicolas A. Geis, Christoph Dieterich, Anette Frank |
| Summary: | Objective: Medication information is crucial for clinical routine and research. However, a vast amount is stored in unstructured text, such as doctors' letters, requiring manual extraction, a resource-intensive and error-prone task. Automating this process comes with significant constraints in a clinical setup, including the demand for clinical expertise, limited time resources, restricted IT infrastructure, and the demand for transparent predictions. Recent advances in generative large language models (LLMs) and parameter-efficient fine-tuning methods show potential to address these challenges. Methods: We evaluated local LLMs for end-to-end extraction of medication information, combining named entity recognition and relation extraction. We used format-restricting instructions and developed an innovative feedback pipeline to facilitate automated evaluation. We applied token-level Shapley values to visualize and quantify token contributions, improving the transparency of model predictions. Results: Two open-source LLMs, one general (Llama) and one domain-specific (OpenBioLLM), were evaluated on the English n2c2 2018 corpus and the German CARDIO:DE corpus. OpenBioLLM frequently struggled with structured outputs and hallucinations. Fine-tuned Llama models achieved new state-of-the-art results, improving F1-score by up to 10 percentage points (pp.) for adverse drug events and 6 pp. for medication reasons on English data. On the German dataset, Llama established a new benchmark, outperforming traditional machine learning methods by up to 16 pp. in micro-averaged F1-score. Conclusion: Our findings show that fine-tuned local open-source generative LLMs outperform state-of-the-art (SOTA) methods for medication information extraction, delivering high performance with limited time and IT resources in a real-world clinical setup, and demonstrate their effectiveness on both English and German data. Applying Shapley values improved prediction transparency, supporting informed clinical decision-making. |
| Item Description: | Available online: 21 August 2025; article version: 23 August 2025. Viewed on 12 November 2025 |
| Physical Description: | Online Resource |
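
The summary reports results as micro-averaged F1-scores and differences in percentage points. As a reading aid, the following is a minimal sketch of how a micro-averaged F1-score can be computed for extracted entities. It is not the authors' evaluation code: the exact-match criterion, the function name `micro_f1`, and the toy annotations are assumptions made for illustration only.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over documents.

    gold, pred: lists of sets of (label, text) tuples, one set per document.
    Assumption: a prediction counts as correct only on an exact match.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted and annotated
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, invented for this example.
gold = [
    {("Drug", "aspirin"), ("Strength", "100 mg")},
    {("Drug", "ramipril")},
]
pred = [
    {("Drug", "aspirin")},
    {("Drug", "ramipril"), ("ADE", "cough")},
]

score = micro_f1(gold, pred)
print(f"micro F1 = {score:.3f}")
```

Micro-averaging pools true positives, false positives, and false negatives across all documents and labels before computing precision and recall, so frequent entity types weigh more heavily than rare ones. A gain of "16 pp." then means the score rose by 0.16 on this 0 to 1 scale.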