Performance of large language models on a neurology board-style examination

Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. T...

Detailed description

Bibliographic details
Main authors: Schubert, Marc Cicero (Author), Wick, Wolfgang (Author), Venkataramani, Varun (Author)
Document type: Article (Journal)
Language: English
Published: December 7, 2023
In: JAMA network open
Year: 2023, Volume: 6, Issue: 12, Pages: 1-11
ISSN: 2574-3805
DOI: 10.1001/jamanetworkopen.2023.46721
Online access: Publisher, license required, full text: https://doi.org/10.1001/jamanetworkopen.2023.46721
Statement of responsibility: Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD

MARC

LEADER 00000caa a2200000 c 4500
001 1881551342
003 DE-627
005 20240307022355.0
007 cr uuu---uuuuu
008 240226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1001/jamanetworkopen.2023.46721  |2 doi 
035 |a (DE-627)1881551342 
035 |a (DE-599)KXP1881551342 
035 |a (OCoLC)1425200496 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Schubert, Marc Cicero  |e VerfasserIn  |0 (DE-588)1264158912  |0 (DE-627)1812829779  |4 aut 
245 1 0 |a Performance of large language models on a neurology board-style examination  |c Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD 
264 1 |c December 7, 2023 
300 |a 11 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 26.02.2024 
520 |a Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. To assess the performance of LLMs on neurology board-style examinations. This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation utilized a question bank resembling neurology board-style examination questions and was validated with a small question cohort by the European Board for Neurology. All questions were categorized into lower-order (recall, understanding) and higher-order (apply, analyze, synthesize) questions based on the Bloom taxonomy for learning and assessment. Performance by LLM ChatGPT versions 3.5 (LLM 1) and 4 (LLM 2) was assessed in relation to overall scores, question type, and topics, along with the confidence level and reproducibility of answers. Overall percentage scores of 2 LLMs. LLM 2 significantly outperformed LLM 1 by correctly answering 1662 of 1956 questions (85.0%) vs 1306 questions (66.8%) for LLM 1. Notably, LLM 2’s performance was greater than the mean human score of 73.8%, effectively achieving near-passing and passing grades in the neurology board-style examination. LLM 2 outperformed human users in behavioral, cognitive, and psychological-related questions and demonstrated superior performance to LLM 1 in 6 categories. Both LLMs performed better on lower-order than higher-order questions, with LLM 2 excelling in both lower-order and higher-order questions. Both models consistently used confident language, even when providing incorrect answers. Reproducible answers of both LLMs were associated with a higher percentage of correct answers than inconsistent answers. Despite the absence of neurology-specific training, LLM 2 demonstrated commendable performance, whereas LLM 1 performed slightly below the human average. While higher-order cognitive tasks were more challenging for both models, LLM 2’s results were equivalent to passing grades in specialized neurology examinations. These findings suggest that LLMs could have significant applications in clinical neurology and health care with further refinements. 
700 1 |a Wick, Wolfgang  |d 1970-  |e VerfasserIn  |0 (DE-588)120297736  |0 (DE-627)080586929  |0 (DE-576)186221320  |4 aut 
700 1 |a Venkataramani, Varun  |d 1989-  |e VerfasserIn  |0 (DE-588)1074364562  |0 (DE-627)832064378  |0 (DE-576)442609345  |4 aut 
773 0 8 |i Enthalten in  |t JAMA network open  |d Chicago, Ill. : American Medical Association, 2018  |g 6(2023), 12 vom: Dez., Artikel-ID e2346721, Seite 1-11  |h Online-Ressource  |w (DE-627)1023451867  |w (DE-600)2931249-8  |w (DE-576)505831112  |x 2574-3805  |7 nnas  |a Performance of large language models on a neurology board-style examination 
773 1 8 |g volume:6  |g year:2023  |g number:12  |g month:12  |g elocationid:e2346721  |g pages:1-11  |g extent:11  |a Performance of large language models on a neurology board-style examination 
856 4 0 |u https://doi.org/10.1001/jamanetworkopen.2023.46721  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20240226 
993 |a Article 
994 |a 2023 
998 |g 1074364562  |a Venkataramani, Varun  |m 1074364562:Venkataramani, Varun  |d 910000  |d 910600  |e 910000PV1074364562  |e 910600PV1074364562  |k 0/910000/  |k 1/910000/910600/  |p 3  |y j 
998 |g 120297736  |a Wick, Wolfgang  |m 120297736:Wick, Wolfgang  |d 910000  |d 911100  |e 910000PW120297736  |e 911100PW120297736  |k 0/910000/  |k 1/910000/911100/  |p 2 
998 |g 1264158912  |a Schubert, Marc Cicero  |m 1264158912:Schubert, Marc Cicero  |d 50000  |d 53100  |e 50000PS1264158912  |e 53100PS1264158912  |k 0/50000/  |k 1/50000/53100/  |p 1  |x j 
999 |a KXP-PPN1881551342  |e 4491049440 
BIB |a Y 
SER |a journal 
JSO |a {"name":{"displayForm":["Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD"]},"id":{"doi":["10.1001/jamanetworkopen.2023.46721"],"eki":["1881551342"]},"relHost":[{"recId":"1023451867","physDesc":[{"extent":"Online-Ressource"}],"disp":"Performance of large language models on a neurology board-style examinationJAMA network open","origin":[{"publisher":"American Medical Association","dateIssuedDisp":"[2018]-","publisherPlace":"Chicago, Ill."}],"name":{"displayForm":["American Medical Association"]},"pubHistory":["Vol 1, no. 1 (May 2018)-"],"id":{"issn":["2574-3805"],"eki":["1023451867"],"zdb":["2931249-8"]},"title":[{"title":"JAMA network open","title_sort":"JAMA network open"}],"part":{"pages":"1-11","extent":"11","text":"6(2023), 12 vom: Dez., Artikel-ID e2346721, Seite 1-11","issue":"12","year":"2023","volume":"6"},"language":["eng"],"type":{"media":"Online-Ressource","bibl":"periodical"}}],"person":[{"display":"Schubert, Marc Cicero","given":"Marc Cicero","role":"aut","family":"Schubert"},{"role":"aut","given":"Wolfgang","family":"Wick","display":"Wick, Wolfgang"},{"family":"Venkataramani","role":"aut","given":"Varun","display":"Venkataramani, Varun"}],"origin":[{"dateIssuedDisp":"December 7, 2023","dateIssuedKey":"2023"}],"title":[{"title_sort":"Performance of large language models on a neurology board-style examination","title":"Performance of large language models on a neurology board-style examination"}],"note":["Gesehen am 26.02.2024"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"language":["eng"],"recId":"1881551342","physDesc":[{"extent":"11 S."}]} 
SRT |a SCHUBERTMAPERFORMANC7202