Comparing ChatGPT-3.5 and ChatGPT-4’s alignments with the German evidence-based S3 guideline for adult soft tissue sarcoma

Bibliographic Details
Main Authors:	Li, Cheng-Peng (Author); Jakob, Jens (Author); Menge, Franka (Author); Reißfelder, Christoph (Author); Hohenberger, Peter (Author); Yang, Cui (Author)
Format:	Article (Journal)
Language:	English
Published:	December 20, 2024
In:	iScience Year: 2024, Volume: 27, Issue: 12, Pages: 1-9
ISSN:	2589-0042
DOI:	10.1016/j.isci.2024.111493
Online Access:	Publisher, free of charge, full text: https://doi.org/10.1016/j.isci.2024.111493
	Publisher, free of charge, full text: https://www.sciencedirect.com/science/article/pii/S2589004224027202
Author Notes:	Cheng-Peng Li, Jens Jakob, Franka Menge, Christoph Reißfelder, Peter Hohenberger, and Cui Yang

Description
Summary:	Clinical reliability assessment of large language models is necessary due to their increasing use in healthcare. This study assessed the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions derived from the German evidence-based S3 guideline for adult soft tissue sarcoma (STS). Responses to 80 complex clinical questions covering diagnosis, treatment, and surveillance were independently scored by two sarcoma experts for accuracy and adequacy. ChatGPT-4 outperformed ChatGPT-3.5 overall, with higher median scores in both accuracy (5.5 vs. 5.0) and adequacy (5.0 vs. 4.0). While both versions performed similarly on questions about retroperitoneal/visceral sarcoma and gastrointestinal stromal tumor (GIST)-specific treatment, as well as on questions about surveillance, ChatGPT-4 performed better on questions about general STS treatment and extremity/trunk sarcomas. Despite their potential as supportive tools, both models occasionally offered misleading and potentially life-threatening information. This underscores the importance of cautious adoption and human oversight in clinical settings.
Item Description:	Available online: November 28, 2024; article version: December 11, 2024; accessed April 15, 2025
Physical Description:	Online Resource

Similar Items