Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment

Objective This study evaluates the performance of four large language models—ChatGPT 4o, ChatGPT o1-mini, Gemini 2.0 Flash, and Gemini 1.5 Flash—in answering multiple-choice questions in child and adolescent psychiatry to assess their level of factual knowledge in the field. Methods A total of 150 s...

Bibliographic details
Main authors: Neubauer, Johanna Charlotte (author), Kaiser, Anna (author), Lettermann, Leon (author), Volkert, Tobias (author), Häge, Alexander (author)
Document type: Article (Journal)
Language: English
Published: September 19, 2025
In: PLOS ONE
Year: 2025, Volume: 20, Issue: 9, Pages: 1-9
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0332917
Online access: Publisher, free of charge, full text: https://doi.org/10.1371/journal.pone.0332917
Publisher, free of charge, full text: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0332917
Statement of responsibility: Johanna Charlotte Neubauer, Anna Kaiser, Leon Lettermann, Tobias Volkert, Alexander Häge

MARC

LEADER 00000caa a2200000 c 4500
001 1939349303
003 DE-627
005 20251209025737.0
007 cr uuu---uuuuu
008 251023s2025 xx |||||o 00| ||eng c
024 7 |a 10.1371/journal.pone.0332917  |2 doi 
035 |a (DE-627)1939349303 
035 |a (DE-599)KXP1939349303 
035 |a (OCoLC)1559713533 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Neubauer, Johanna Charlotte  |e VerfasserIn  |0 (DE-588)1214602347  |0 (DE-627)1725576546  |4 aut 
245 1 0 |a Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment  |c Johanna Charlotte Neubauer, Anna Kaiser, Leon Lettermann, Tobias Volkert, Alexander Häge 
264 1 |c September 19, 2025 
300 |b Illustrationen 
300 |a 9 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 23.10.2025 
520 |a Objective This study evaluates the performance of four large language models—ChatGPT 4o, ChatGPT o1-mini, Gemini 2.0 Flash, and Gemini 1.5 Flash—in answering multiple-choice questions in child and adolescent psychiatry to assess their level of factual knowledge in the field. Methods A total of 150 standardized multiple-choice questions from a specialty board review study guide were selected, ensuring a representative distribution across different topics. Each question had five possible answers, with only one correct option. To account for the stochastic nature of large language models, each question was asked 10 times with randomized answer orders to minimize known biases. Accuracy for each question was assessed as the percentage of correct answers across 10 requests. We calculated the mean accuracy for each model and performed statistical comparisons using paired t-tests to evaluate differences between Gemini 2.0 Flash and Gemini 1.5 Flash, as well as between Gemini 2.0 Flash and both ChatGPT 4o and ChatGPT o1-mini. As a post-hoc exploration, we identified questions with an accuracy below 10% across all models to highlight areas of particularly low performance. Results The accuracy of the tested models ranged from 68.3% to 78.9%. Both ChatGPT and Gemini demonstrated generally solid performance in the assessment of child and adolescent psychiatry knowledge, with variations between models and topics. The superior performance of Gemini 2.0 Flash compared with its predecessor, Gemini 1.5 Flash, may reflect advancements in artificial intelligence capabilities. Certain topics, such as psychopharmacology, posed greater challenges compared to disorders with well-defined diagnostic criteria, such as schizophrenia or eating disorders. Conclusion While the results indicate that language models can support knowledge acquisition in child and adolescent psychiatry, limitations remain. Variability in accuracy across different topics, potential biases, and risks of misinterpretation must be carefully considered before implementing these models in clinical decision-making. 
650 4 |a Adolescent psychiatry 
650 4 |a Artificial intelligence 
650 4 |a Language 
650 4 |a Medicine and health sciences 
650 4 |a Mental health and psychiatry 
650 4 |a Mental health therapies 
650 4 |a Psychometrics 
650 4 |a Psychopharmacology 
700 1 |a Kaiser, Anna  |d 1993-  |e VerfasserIn  |0 (DE-588)1105079961  |0 (DE-627)862468922  |0 (DE-576)473059649  |4 aut 
700 1 |a Lettermann, Leon  |d 1996-  |e VerfasserIn  |0 (DE-588)1341259234  |0 (DE-627)1902111710  |4 aut 
700 1 |a Volkert, Tobias  |e VerfasserIn  |4 aut 
700 1 |a Häge, Alexander  |d 1979-  |e VerfasserIn  |0 (DE-588)136964397  |0 (DE-627)588725773  |0 (DE-576)30136561X  |4 aut 
773 0 8 |i Enthalten in  |t PLOS ONE  |d San Francisco, California, US : PLOS, 2006  |g 20(2025), 9, Artikel-ID e0332917, Seite 1-9  |h Online-Ressource  |w (DE-627)523574592  |w (DE-600)2267670-3  |w (DE-576)281331979  |x 1932-6203  |7 nnas  |a Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment 
773 1 8 |g volume:20  |g year:2025  |g number:9  |g elocationid:e0332917  |g pages:1-9  |g extent:9  |a Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment 
856 4 0 |u https://doi.org/10.1371/journal.pone.0332917  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext  |7 0 
856 4 0 |u https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0332917  |x Verlag  |z kostenfrei  |3 Volltext  |7 0 
951 |a AR 
992 |a 20251023 
993 |a Article 
994 |a 2025 
998 |g 136964397  |a Häge, Alexander  |m 136964397:Häge, Alexander  |d 60000  |e 60000PH136964397  |k 0/60000/  |p 5  |y j 
998 |g 1341259234  |a Lettermann, Leon  |m 1341259234:Lettermann, Leon  |d 130000  |d 130300  |d 130000  |e 130000PL1341259234  |e 130300PL1341259234  |e 130000PL1341259234  |k 0/130000/  |k 1/130000/130300/  |k 0/130000/  |p 3 
998 |g 1105079961  |a Kaiser, Anna  |m 1105079961:Kaiser, Anna  |d 60000  |e 60000PK1105079961  |k 0/60000/  |p 2 
999 |a KXP-PPN1939349303  |e 4790981169 
BIB |a Y 
SER |a journal 
JSO |a {"relHost":[{"name":{"displayForm":["Public Library of Science"]},"disp":"Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessmentPLOS ONE","title":[{"title_sort":"PLOS ONE","title":"PLOS ONE"}],"origin":[{"dateIssuedDisp":"2006-","publisherPlace":"San Francisco, California, US ; Lawrence, Kan.","dateIssuedKey":"2006","publisher":"PLOS ; PLoS"}],"part":{"issue":"9","extent":"9","volume":"20","year":"2025","text":"20(2025), 9, Artikel-ID e0332917, Seite 1-9","pages":"1-9"},"id":{"issn":["1932-6203"],"eki":["523574592"],"zdb":["2267670-3"]},"language":["eng"],"pubHistory":["1.2006 -"],"type":{"media":"Online-Ressource","bibl":"periodical"},"note":["Schreibweise des Titels bis 2012: PLoS ONE","Gesehen am 20.03.19"],"corporate":[{"role":"isb","display":"Public Library of Science"}],"recId":"523574592","physDesc":[{"extent":"Online-Ressource"}]}],"physDesc":[{"extent":"9 S.","noteIll":"Illustrationen"}],"recId":"1939349303","note":["Gesehen am 23.10.2025"],"language":["eng"],"origin":[{"dateIssuedDisp":"September 19, 2025","dateIssuedKey":"2025"}],"person":[{"given":"Johanna Charlotte","role":"aut","display":"Neubauer, Johanna Charlotte","family":"Neubauer"},{"display":"Kaiser, Anna","role":"aut","family":"Kaiser","given":"Anna"},{"family":"Lettermann","role":"aut","display":"Lettermann, Leon","given":"Leon"},{"given":"Tobias","family":"Volkert","display":"Volkert, Tobias","role":"aut"},{"display":"Häge, Alexander","role":"aut","family":"Häge","given":"Alexander"}],"id":{"doi":["10.1371/journal.pone.0332917"],"eki":["1939349303"]},"type":{"media":"Online-Ressource","bibl":"article-journal"},"name":{"displayForm":["Johanna Charlotte Neubauer, Anna Kaiser, Leon Lettermann, Tobias Volkert, Alexander Häge"]},"title":[{"title":"Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment","title_sort":"Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment"}]} 
SRT |a NEUBAUERJOPERFORMANC1920
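
Note on the evaluation procedure: the abstract in field 520 describes shuffling the five answer options on each of 10 requests, scoring each question as the percentage of correct answers, and comparing models with paired t-tests. The Python sketch below illustrates those three steps only. It is not the authors' code: all function names and the example numbers are hypothetical, and the t statistic is computed by hand from the standard paired-sample formula using only the standard library.

```python
import math
import random
import statistics


def shuffle_options(options, correct_index, rng):
    """Return the options in random order plus the new index of the correct one."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)


def question_accuracy(outcomes):
    """Percentage of correct answers across the repeated requests for one question."""
    return 100.0 * sum(outcomes) / len(outcomes)


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic over the per-question accuracies of two models.

    Assumes both lists cover the same questions in the same order and that
    the differences are not all identical (otherwise the standard deviation
    is zero and the statistic is undefined).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Hypothetical data, for illustration only (not taken from the study).
rng = random.Random(0)
options = ["A", "B", "C", "D", "E"]
shuffled, new_correct = shuffle_options(options, correct_index=2, rng=rng)

one_question = [True] * 7 + [False] * 3   # 7 of 10 requests answered correctly
model_a = [100.0, 90.0, 80.0, 70.0]       # per-question accuracy, model A
model_b = [90.0, 70.0, 80.0, 50.0]        # per-question accuracy, model B
t_value = paired_t_statistic(model_a, model_b)
```

Tracking the new position of the correct option keeps scoring independent of answer order, which is the point of the randomization described in the abstract. Turning the t statistic into a p-value additionally requires the t distribution with n - 1 degrees of freedom, which this sketch leaves out.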