Performance of large language models on a neurology board-style examination

Bibliographic details
Main authors:	Schubert, Marc Cicero (Author); Wick, Wolfgang (Author); Venkataramani, Varun (Author)
Document type:	Article (Journal)
Language:	English
Published:	December 7, 2023
In:	JAMA network open, Year: 2023, Volume: 6, Issue: 12, Pages: 1-11
ISSN:	2574-3805
DOI:	10.1001/jamanetworkopen.2023.46721
Online access:	Publisher, license required, full text: https://doi.org/10.1001/jamanetworkopen.2023.46721
Statement of responsibility:	Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD

MARC


LEADER	00000caa a2200000 c 4500
001	1881551342
003	DE-627
005	20240307022355.0
007	cr uuu---uuuuu
008	240226s2023 xx |||||o 00| ||eng c
024	7		|a 10.1001/jamanetworkopen.2023.46721 |2 doi
035			|a (DE-627)1881551342
035			|a (DE-599)KXP1881551342
035			|a (OCoLC)1425200496
040			|a DE-627 |b ger |c DE-627 |e rda
041			|a eng
084			|a 33 |2 sdnb
100	1		|a Schubert, Marc Cicero |e VerfasserIn |0 (DE-588)1264158912 |0 (DE-627)1812829779 |4 aut
245	1	0	|a Performance of large language models on a neurology board-style examination |c Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD
264		1	|c December 7, 2023
300			|a 11
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Gesehen am 26.02.2024
520			|a Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. To assess the performance of LLMs on neurology board-style examinations. This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation utilized a question bank resembling neurology board-style examination questions and was validated with a small question cohort by the European Board for Neurology. All questions were categorized into lower-order (recall, understanding) and higher-order (apply, analyze, synthesize) questions based on the Bloom taxonomy for learning and assessment. Performance by LLM ChatGPT versions 3.5 (LLM 1) and 4 (LLM 2) was assessed in relation to overall scores, question type, and topics, along with the confidence level and reproducibility of answers. Overall percentage scores of 2 LLMs. LLM 2 significantly outperformed LLM 1 by correctly answering 1662 of 1956 questions (85.0%) vs 1306 questions (66.8%) for LLM 1. Notably, LLM 2’s performance was greater than the mean human score of 73.8%, effectively achieving near-passing and passing grades in the neurology board-style examination. LLM 2 outperformed human users in behavioral, cognitive, and psychological-related questions and demonstrated superior performance to LLM 1 in 6 categories. Both LLMs performed better on lower-order than higher-order questions, with LLM 2 excelling in both lower-order and higher-order questions. Both models consistently used confident language, even when providing incorrect answers. Reproducible answers of both LLMs were associated with a higher percentage of correct answers than inconsistent answers. Despite the absence of neurology-specific training, LLM 2 demonstrated commendable performance, whereas LLM 1 performed slightly below the human average. While higher-order cognitive tasks were more challenging for both models, LLM 2’s results were equivalent to passing grades in specialized neurology examinations. These findings suggest that LLMs could have significant applications in clinical neurology and health care with further refinements.
700	1		|a Wick, Wolfgang |d 1970- |e VerfasserIn |0 (DE-588)120297736 |0 (DE-627)080586929 |0 (DE-576)186221320 |4 aut
700	1		|a Venkataramani, Varun |d 1989- |e VerfasserIn |0 (DE-588)1074364562 |0 (DE-627)832064378 |0 (DE-576)442609345 |4 aut
773	0	8	|i Enthalten in |t JAMA network open |d Chicago, Ill. : American Medical Association, 2018 |g 6(2023), 12 vom: Dez., Artikel-ID e2346721, Seite 1-11 |h Online-Ressource |w (DE-627)1023451867 |w (DE-600)2931249-8 |w (DE-576)505831112 |x 2574-3805 |7 nnas |a Performance of large language models on a neurology board-style examination
773	1	8	|g volume:6 |g year:2023 |g number:12 |g month:12 |g elocationid:e2346721 |g pages:1-11 |g extent:11 |a Performance of large language models on a neurology board-style examination
856	4	0	|u https://doi.org/10.1001/jamanetworkopen.2023.46721 |x Verlag |x Resolving-System |z lizenzpflichtig |3 Volltext
951			|a AR
992			|a 20240226
993			|a Article
994			|a 2023
998			|g 1074364562 |a Venkataramani, Varun |m 1074364562:Venkataramani, Varun |d 910000 |d 910600 |e 910000PV1074364562 |e 910600PV1074364562 |k 0/910000/ |k 1/910000/910600/ |p 3 |y j
998			|g 120297736 |a Wick, Wolfgang |m 120297736:Wick, Wolfgang |d 910000 |d 911100 |e 910000PW120297736 |e 911100PW120297736 |k 0/910000/ |k 1/910000/911100/ |p 2
998			|g 1264158912 |a Schubert, Marc Cicero |m 1264158912:Schubert, Marc Cicero |d 50000 |d 53100 |e 50000PS1264158912 |e 53100PS1264158912 |k 0/50000/ |k 1/50000/53100/ |p 1 |x j
999			|a KXP-PPN1881551342 |e 4491049440
BIB			|a Y
SER			|a journal
JSO			|a {"name":{"displayForm":["Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD"]},"id":{"doi":["10.1001/jamanetworkopen.2023.46721"],"eki":["1881551342"]},"relHost":[{"recId":"1023451867","physDesc":[{"extent":"Online-Ressource"}],"disp":"Performance of large language models on a neurology board-style examinationJAMA network open","origin":[{"publisher":"American Medical Association","dateIssuedDisp":"[2018]-","publisherPlace":"Chicago, Ill."}],"name":{"displayForm":["American Medical Association"]},"pubHistory":["Vol 1, no. 1 (May 2018)-"],"id":{"issn":["2574-3805"],"eki":["1023451867"],"zdb":["2931249-8"]},"title":[{"title":"JAMA network open","title_sort":"JAMA network open"}],"part":{"pages":"1-11","extent":"11","text":"6(2023), 12 vom: Dez., Artikel-ID e2346721, Seite 1-11","issue":"12","year":"2023","volume":"6"},"language":["eng"],"type":{"media":"Online-Ressource","bibl":"periodical"}}],"person":[{"display":"Schubert, Marc Cicero","given":"Marc Cicero","role":"aut","family":"Schubert"},{"role":"aut","given":"Wolfgang","family":"Wick","display":"Wick, Wolfgang"},{"family":"Venkataramani","role":"aut","given":"Varun","display":"Venkataramani, Varun"}],"origin":[{"dateIssuedDisp":"December 7, 2023","dateIssuedKey":"2023"}],"title":[{"title_sort":"Performance of large language models on a neurology board-style examination","title":"Performance of large language models on a neurology board-style examination"}],"note":["Gesehen am 26.02.2024"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"language":["eng"],"recId":"1881551342","physDesc":[{"extent":"11 S."}]}
SRT			|a SCHUBERTMAPERFORMANC7202
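The MARC view above prints each data field on one line: a three-character tag, the two indicator positions (tab-separated, blank when unused), and a run of subfields, each introduced by a vertical bar and a one-character code, for example `a` for the personal name and `4` for the relationship code in field 100. As a minimal sketch, assuming exactly that display layout and assuming no subfield value contains a vertical bar, such a line can be split in Python as follows. `parse_display_field` is a hypothetical helper, not part of any cataloging library; it does not handle control fields such as 001 or 008, which carry no indicators or subfields, and real MARC exchange files (ISO 2709, MARCXML) use different delimiters.

```python
from typing import List, Tuple


def parse_display_field(line: str) -> Tuple[str, str, str, List[Tuple[str, str]]]:
    """Split one display-style MARC data field into tag, indicators, subfields.

    Hypothetical helper. Assumes the layout of the catalog view: tag,
    indicator 1 and indicator 2 separated by tabs, followed by subfields
    written as "|<code> <value>". Control fields (001-009) are not handled.
    """
    tag, ind1, ind2, rest = line.split("\t", 3)
    subfields: List[Tuple[str, str]] = []
    # Text before the first "|" is empty, so skip it; every later chunk
    # begins with its one-character subfield code.
    for chunk in rest.split("|")[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))
    return tag, ind1, ind2, subfields


# Usage with field 100 from the record above (shortened).
line = "100\t1\t\t|a Schubert, Marc Cicero |e VerfasserIn |4 aut"
tag, ind1, ind2, subfields = parse_display_field(line)
print(tag, repr(ind1), repr(ind2))  # 100 '1' ''
print(subfields)  # [('a', 'Schubert, Marc Cicero'), ('e', 'VerfasserIn'), ('4', 'aut')]
```

Returning (code, value) pairs rather than a dictionary keeps repeated subfields intact, such as the two `0` authority identifiers in field 100. For actual MARC files, an established library such as pymarc is the more robust choice.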

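The percentages in the abstract (field 520) follow directly from the raw counts it reports. A short Python check, using only the numbers stated there (1662 and 1306 correct answers out of 1956 questions, mean human score 73.8%), reproduces them; it covers the arithmetic only, not the study's significance testing. The dictionary labels and the helper name `percent_correct` are illustrative choices, not taken from the study.

```python
# Counts as stated in the abstract (field 520) of this record.
TOTAL_QUESTIONS = 1956
CORRECT = {"LLM 1 (ChatGPT 3.5)": 1306, "LLM 2 (ChatGPT 4)": 1662}
MEAN_HUMAN_SCORE = 73.8  # percent, as reported in the abstract


def percent_correct(correct: int, total: int) -> float:
    """Share of correctly answered questions, in percent, rounded to 1 decimal."""
    return round(100 * correct / total, 1)


for model, n_correct in CORRECT.items():
    score = percent_correct(n_correct, TOTAL_QUESTIONS)
    relation = "above" if score > MEAN_HUMAN_SCORE else "below"
    print(f"{model}: {score}% ({relation} the mean human score of {MEAN_HUMAN_SCORE}%)")
```

Run as-is, it prints 66.8% for LLM 1 and 85.0% for LLM 2, matching the abstract and placing the mean human score between the two models.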