Benchmarking vision-language models for diagnostics in emergency and critical care settings

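The applicability of vision-language models (VLMs) for acute care in emergency and intensive care units remains underexplored. Using a multimodal dataset of diagnostic questions involving medical images and clinical context, we benchmarked several small open-source VLMs against GPT-4o. While open models demonstrated limited diagnostic accuracy (up to 40.4%), GPT-4o significantly outperformed them (68.1%). Findings highlight the need for specialized training and optimization to improve open-source VLMs for acute care applications.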

Bibliographic Details
Main Authors: Kurz, Christoph (Author), Merzhevich, Tatiana (Author), Eskofier, Bjoern M. (Author), Kather, Jakob Nikolas (Author), Gmeiner, Benjamin (Author)
Document Type: Article (Journal) Editorial
Language: English
Published: 10 July 2025
In: npj digital medicine
Year: 2025, Volume: 8, Pages: 1-6
ISSN: 2398-6352
DOI: 10.1038/s41746-025-01837-2
Online Access: Publisher, free of charge, full text: https://doi.org/10.1038/s41746-025-01837-2
Publisher, free of charge, full text: https://www.nature.com/articles/s41746-025-01837-2
Statement of Responsibility: Christoph F. Kurz, Tatiana Merzhevich, Bjoern M. Eskofier, Jakob Nikolas Kather & Benjamin Gmeiner

MARC

LEADER 00000naa a2200000 c 4500
001 1941747612
003 DE-627
005 20251121084144.0
007 cr uuu---uuuuu
008 251121s2025 xx |||||o 00| ||eng c
024 7 |a 10.1038/s41746-025-01837-2  |2 doi 
035 |a (DE-627)1941747612 
035 |a (DE-599)KXP1941747612 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Kurz, Christoph  |d 1981-  |e VerfasserIn  |0 (DE-588)1204607125  |0 (DE-627)1690006889  |4 aut 
245 1 0 |a Benchmarking vision-language models for diagnostics in emergency and critical care settings  |c Christoph F. Kurz, Tatiana Merzhevich, Bjoern M. Eskofier, Jakob Nikolas Kather & Benjamin Gmeiner 
264 1 |c 10 July 2025 
300 |b Illustrationen, Diagramme 
300 |a 6 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 21.11.2025 
520 |a The applicability of vision-language models (VLMs) for acute care in emergency and intensive care units remains underexplored. Using a multimodal dataset of diagnostic questions involving medical images and clinical context, we benchmarked several small open-source VLMs against GPT-4o. While open models demonstrated limited diagnostic accuracy (up to 40.4%), GPT-4o significantly outperformed them (68.1%). Findings highlight the need for specialized training and optimization to improve open-source VLMs for acute care applications. 
650 4 |a Computational biology and bioinformatics 
650 4 |a Health care 
650 4 |a Medical research 
700 1 |a Merzhevich, Tatiana  |e VerfasserIn  |4 aut 
700 1 |a Eskofier, Bjoern M.  |e VerfasserIn  |4 aut 
700 1 |a Kather, Jakob Nikolas  |d 1989-  |e VerfasserIn  |0 (DE-588)1064064914  |0 (DE-627)812897587  |0 (DE-576)423589091  |4 aut 
700 1 |a Gmeiner, Benjamin  |e VerfasserIn  |4 aut 
773 0 8 |i Enthalten in  |t npj digital medicine  |d [Basingstoke] : Macmillan Publishers Limited, 2016  |g 8(2025), Artikel-ID 423, Seite 1-6  |h Online-Ressource  |w (DE-627)1016587104  |w (DE-600)2925182-5  |w (DE-576)501513582  |x 2398-6352  |7 nnas  |a Benchmarking vision-language models for diagnostics in emergency and critical care settings 
773 1 8 |g volume:8  |g year:2025  |g elocationid:423  |g pages:1-6  |g extent:6  |a Benchmarking vision-language models for diagnostics in emergency and critical care settings 
856 4 0 |u https://doi.org/10.1038/s41746-025-01837-2  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
856 4 0 |u https://www.nature.com/articles/s41746-025-01837-2  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20251121 
993 |a Editorial 
994 |a 2025 
998 |g 1064064914  |a Kather, Jakob Nikolas  |m 1064064914:Kather, Jakob Nikolas  |d 910000  |d 910100  |e 910000PK1064064914  |e 910100PK1064064914  |k 0/910000/  |k 1/910000/910100/  |p 4 
999 |a KXP-PPN1941747612  |e 4809768066 
BIB |a Y 
SER |a journal 
JSO |a {"recId":"1941747612","physDesc":[{"noteIll":"Illustrationen, Diagramme","extent":"6 S."}],"title":[{"title":"Benchmarking vision-language models for diagnostics in emergency and critical care settings","title_sort":"Benchmarking vision-language models for diagnostics in emergency and critical care settings"}],"note":["Gesehen am 21.11.2025"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"language":["eng"],"relHost":[{"origin":[{"publisherPlace":"[Basingstoke]","dateIssuedDisp":"[2016]-","publisher":"Macmillan Publishers Limited"}],"pubHistory":["2016-"],"type":{"bibl":"periodical","media":"Online-Ressource"},"language":["eng"],"note":["Gesehen am 06. September 2019"],"id":{"issn":["2398-6352"],"eki":["1016587104"],"zdb":["2925182-5"]},"title":[{"title":"npj digital medicine","title_sort":"npj digital medicine"}],"part":{"pages":"1-6","year":"2025","text":"8(2025), Artikel-ID 423, Seite 1-6","volume":"8","extent":"6"},"physDesc":[{"extent":"Online-Ressource"}],"recId":"1016587104","disp":"Benchmarking vision-language models for diagnostics in emergency and critical care settingsnpj digital medicine"}],"person":[{"family":"Kurz","role":"aut","given":"Christoph","display":"Kurz, Christoph"},{"given":"Tatiana","role":"aut","family":"Merzhevich","display":"Merzhevich, Tatiana"},{"display":"Eskofier, Bjoern M.","family":"Eskofier","given":"Bjoern M.","role":"aut"},{"display":"Kather, Jakob Nikolas","family":"Kather","role":"aut","given":"Jakob Nikolas"},{"family":"Gmeiner","role":"aut","given":"Benjamin","display":"Gmeiner, Benjamin"}],"origin":[{"dateIssuedDisp":"10 July 2025","dateIssuedKey":"2025"}],"id":{"eki":["1941747612"],"doi":["10.1038/s41746-025-01837-2"]},"name":{"displayForm":["Christoph F. Kurz, Tatiana Merzhevich, Bjoern M. Eskofier, Jakob Nikolas Kather & Benjamin Gmeiner"]}} 
SRT |a KURZCHRISTBENCHMARKI1020