Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks

Bibliographic Details
Main Authors: Maron, Roman C. (Author) , Weichenthal, Michael (Author) , Utikal, Jochen (Author) , Hekler, Achim (Author) , Berking, Carola (Author) , Hauschild, Axel (Author) , Enk, Alexander (Author) , Haferkamp, Sebastian (Author) , Klode, Joachim (Author) , Schadendorf, Dirk (Author) , Jansen, Philipp (Author) , Holland-Letz, Tim (Author) , Schilling, Bastian (Author) , Kalle, Christof von (Author) , Fröhling, Stefan (Author) , Gaiser, Maria (Author) , Hartmann, Daniela (Author) , Gesierich, Anja Heike (Author) , Kähler, Katharina C. (Author) , Wehkamp, Ulrike (Author) , Karoglan, Ante (Author) , Bär, Claudia (Author) , Brinker, Titus Josef (Author)
Format: Article (Journal)
Language:English
Published: 14 August 2019
In: European journal of cancer
Year: 2019, Volume: 119, Pages: 57-65
ISSN:1879-0852
DOI:10.1016/j.ejca.2019.06.013
Online Access:Publisher, free of charge: https://doi.org/10.1016/j.ejca.2019.06.013
Publisher: http://www.sciencedirect.com/science/article/pii/S0959804919303818
Author Notes:Roman C. Maron, Michael Weichenthal, Jochen S. Utikal, Achim Hekler, Carola Berking, Axel Hauschild, Alexander H. Enk, Sebastian Haferkamp, Joachim Klode, Dirk Schadendorf, Philipp Jansen, Tim Holland-Letz, Bastian Schilling, Christof von Kalle, Stefan Fröhling, Maria R. Gaiser, Daniela Hartmann, Anja Gesierich, Katharina C. Kähler, Ulrike Wehkamp, Ante Karoglan, Claudia Bär, Titus J. Brinker, Collaborators
Description
Summary:Background - Recently, convolutional neural networks (CNNs) systematically outperformed dermatologists in distinguishing dermoscopic melanoma and nevi images. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account. - Methods - Using 11,444 dermoscopic images, which covered dermatologic diagnoses comprising the majority of pigmented skin lesions commonly faced in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories. - Findings - Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, outperformance (p < 0.001) was achieved except for basal cell carcinoma (on-par performance). - Interpretation - Our findings show that automated classification of dermoscopic melanoma and nevi images is extendable to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still outperforming dermatologists at a significant level (p < 0.001).
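Note on the metrics: the summary reports sensitivity, specificity and two-sided McNemar tests. The following is a minimal Python sketch of how such figures are commonly computed for a benign/malignant end-point. It is not the authors' code. The function names and the toy labels are hypothetical, the exact binomial form of the McNemar test is an assumption (the summary does not say which variant was used), and the sketch compares a single reader with a single classifier rather than 112 dermatologists.

```python
from math import comb


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two raters on the same cases.

    Only discordant cases count: b = A correct and B wrong,
    c = A wrong and B correct. Under the null hypothesis, b follows
    a Binomial(b + c, 0.5) distribution.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: 4 malignant and 4 benign lesions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
reader = [1, 1, 1, 0, 0, 0, 1, 1]  # hypothetical dermatologist calls
model = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical classifier calls

reader_sens, reader_spec = sensitivity_specificity(y_true, reader)
model_sens, model_spec = sensitivity_specificity(y_true, model)
p_value = mcnemar_exact(y_true, model, reader)
print(reader_sens, reader_spec, model_sens, model_spec, p_value)
```

In the toy data both raters have the same sensitivity, so the comparison rests on specificity, which mirrors the "at equal sensitivity" framing of the summary. With only eight cases the test has almost no power, so the p-value here is illustrative only.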
Item Description:Viewed on 6 November 2019
Physical Description:Online Resource