Collective human intelligence outperforms artificial intelligence in a skin lesion classification task

Bibliographic Details
Main Authors: Winkler, Julia K. (Author), Kommoss, Katharina (Author), Müller-Christmann, Christine (Author), Toberer, Ferdinand (Author), Enk, Alexander (Author), Abassi, Mohamed Souhayel (Author), Fuchs, Tobias (Author), Blum, Andreas (Author), Stolz, Wilhelm (Author), Coras-Stepanek, Brigitte (Author), Cipic, Robert (Author), Guther, Stefanie (Author), Hänßle, Holger (Author)
Format: Article (Journal)
Language: English
Published: 2021
In: Journal der Deutschen Dermatologischen Gesellschaft
Year: 2021, Volume: 19, Issue: 8, Pages: 1178-1184
ISSN: 1610-0387
DOI: 10.1111/ddg.14510
Online Access: Publisher, free of charge, full text: https://doi.org/10.1111/ddg.14510
Publisher, free of charge, full text: https://onlinelibrary.wiley.com/doi/abs/10.1111/ddg.14510
Author Notes: Julia K. Winkler, Katharina Sies, Christine Fink, Ferdinand Toberer, Alexander Enk, Mohamed Souhayel Abassi, Tobias Fuchs, Andreas Blum, Wilhelm Stolz, Brigitte Coras-Stepanek, Robert Cipic, Stefanie Guther, Holger A. Haenssle
Description
Summary: Background and objectives: Convolutional neural networks (CNN) enable accurate diagnosis of medical images and perform on or above the level of individual physicians. Recently, collective human intelligence (CoHI) was shown to exceed the diagnostic accuracy of individuals. Thus, diagnostic performance of CoHI (120 dermatologists) versus individual dermatologists versus two state-of-the-art CNN was investigated.
Patients and Methods: Cross-sectional reader study with presentation of 30 clinical cases to 120 dermatologists. Six diagnoses were offered and votes collected via remote voting devices (quizzbox®, Quizzbox Solutions GmbH, Stuttgart, Germany). Dermatoscopic images were classified by a binary and multiclass CNN (FotoFinder Systems GmbH, Bad Birnbach, Germany). Three sets of diagnostic classifications were scored against ground truth: (1) CoHI, (2) individual dermatologists, and (3) CNN.
Results: CoHI attained a significantly higher accuracy [95 % confidence interval] (80.0 % [62.7 %-90.5 %]) than individual dermatologists (75.7 % [73.8 %-77.5 %]) and CNN (70.0 % [52.1 %-83.3 %]; all P < 0.001) in binary classifications. Moreover, CoHI achieved a higher sensitivity (82.4 % [59.0 %-93.8 %]) and specificity (76.9 % [49.7 %-91.8 %]) than individual dermatologists (sensitivity 77.8 % [75.3 %-80.2 %], specificity 73.0 % [70.6 %-75.4 %]) and CNN (sensitivity 70.6 % [46.9 %-86.7 %], specificity 69.2 % [42.4 %-87.3 %]). The diagnostic accuracy of CoHI was superior to that of individual dermatologists (P < 0.001) in multiclass evaluation, with the accuracy of the latter comparable to multiclass CNN.
Conclusions: Our analysis revealed that the majority vote of an interconnected group of dermatologists (CoHI) outperformed individuals and CNN in a demanding skin lesion classification task.
Item Description: Viewed on 4 September 2023
Physical Description: Online Resource
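
Note on the confidence intervals in the summary: the record does not say how the 95 % intervals were computed, nor does it give raw counts. The sketch below assumes a Wilson score interval and hypothetical counts inferred from the reported percentages (for example, 80.0 % accuracy over 30 cases read as 24 of 30 correct). Under those assumptions the CoHI and CNN intervals appear to be reproducible; the function name and the counts are illustrative, not taken from the article.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95 %)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts inferred from the reported percentages;
# the catalog record itself does not state them.
counts = {
    "CoHI accuracy": (24, 30),
    "CNN accuracy": (21, 30),
    "CoHI sensitivity": (14, 17),
    "CoHI specificity": (10, 13),
}

for label, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f} % [{100 * low:.1f} %-{100 * high:.1f} %]")
```

The wide intervals for CoHI and CNN follow from the small denominators (30 cases, split here into an assumed 17 positive and 13 negative), whereas the individual-dermatologist figures pool many more readings and so have much narrower intervals.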