Enhancing clinicians’ trust in large language models via transparent source attribution: A randomized controlled evaluation in uro-oncology
| Main author(s): | Nicolas Carl, Martin Joachim Hetz, Christoph Wies, Sarah Haggenmüller, Jana Theres Winterstein, Maurin Helen Mangold, Lasse Maywald, Thomas Stefan Worst, Niklas Westhoff, Maurice Stephan Michel, Frederik Wessels, Titus Josef Brinker |
|---|---|
| Document type: | Article (journal) |
| Language: | English |
| Published: | 17 January 2026 |
| In: | European journal of cancer, 2026, Vol. 233, pp. 1-7 |
| ISSN: | 1879-0852 |
| DOI: | 10.1016/j.ejca.2025.116168 |
| Online access: | Publisher, free, full text: https://doi.org/10.1016/j.ejca.2025.116168 ; Publisher, free, full text: https://www.sciencedirect.com/science/article/pii/S0959804925010548 |
| Author statement: | Nicolas Carl, Martin Joachim Hetz, Christoph Wies, Sarah Haggenmüller, Jana Theres Winterstein, Maurin Helen Mangold, Lasse Maywald, Thomas Stefan Worst, Niklas Westhoff, Maurice Stephan Michel, Frederik Wessels, Titus Josef Brinker |
| Abstract: | Introduction - Large language models (LLMs) are used to answer queries in urology and oncology, yet their performance is limited by outdated training data and a lack of source transparency, which undermines clinical reliability and therefore adoption. - Material and methods - We developed UroBot, a urology-specific chatbot integrating retrieval-augmented generation (RAG) to provide in-line references and source text previews for each response. In a randomized controlled reader study, UroBot and ChatGPT were compared across ten uro-oncological case rounds. Thirty urologists assessed recommendation correctness, source attribution and source verifiability, with trust and preference ratings collected after each round. - Results - UroBot performed significantly better than ChatGPT in recommendation correctness (73% vs. 50%; p<0.001), source attribution (74% vs. 30%; p<0.001) and verifiability of sources (84% vs. 35%; p<0.001). Furthermore, clinicians consistently preferred UroBot for accuracy, source verifiability and trust. Qualitative analysis showed that ChatGPT often produced vague or incorrect citations, with 28% being non-existent or outdated and 83% lacking specific sections, whereas UroBot achieved complete alignment at the guideline sub-section and page level. These gains in citation precision were mirrored by higher clinician ratings for verifiability and trust. Limitations include the small sample of ten cases, chosen for feasibility, which may not cover the full uro-oncological spectrum. - Conclusion - Our findings show that combining LLMs with RAG, in-line references and source text previews markedly enhances perceived source attribution and verifiability compared with state-of-the-art conventional LLMs. Importantly, this approach is readily transferable across medical subspecialties, enabling reliable and up-to-date clinical decision support. |
|---|---|
| Description: | Available online: 11 December 2025; article version: 15 December 2025. Viewed on 16 February 2026 |
| Description: | Online resource |
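The abstract describes RAG that attaches in-line references and source text previews to each response so clinicians can verify recommendations at the guideline sub-section and page level. The sketch below illustrates that general pattern only; it is not the UroBot implementation, and all names (`GuidelinePassage`, `retrieve`, `answer_with_citations`), the toy word-overlap retrieval, and the example passages are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class GuidelinePassage:
    source: str    # guideline name
    section: str   # sub-section identifier
    page: int      # page number, for verifiability
    text: str      # passage shown as a source text preview

def retrieve(query: str, corpus: list[GuidelinePassage], k: int = 2) -> list[GuidelinePassage]:
    """Toy lexical retrieval: rank passages by word overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(terms & set(p.text.lower().split())))
    return ranked[:k]

def answer_with_citations(query: str, corpus: list[GuidelinePassage]) -> str:
    """Compose a response in which every statement carries an in-line
    reference [source, section, p. N] plus a preview of the cited text."""
    lines = []
    for i, p in enumerate(retrieve(query, corpus), 1):
        lines.append(f"({i}) {p.text} [{p.source}, {p.section}, p. {p.page}]")
        lines.append(f'    preview: "{p.text[:60]}..."')
    return "\n".join(lines)

corpus = [
    GuidelinePassage("Guideline A", "4.2.1", 37,
                     "Offer radical treatment to patients with localised disease"),
    GuidelinePassage("Guideline A", "5.1.3", 52,
                     "Active surveillance is an option for low-risk disease"),
]
print(answer_with_citations("options for low-risk localised disease", corpus))
```

In a real system the lexical ranking would be replaced by embedding-based retrieval over guideline documents, and the cited passages would be passed to the LLM as grounding context; the point here is only the attribution format that makes each claim verifiable.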