Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks
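| Main Authors: | Roman C. Maron, Michael Weichenthal, Jochen S. Utikal, Achim Hekler, Carola Berking, Axel Hauschild, Alexander H. Enk, Sebastian Haferkamp, Joachim Klode, Dirk Schadendorf, Philipp Jansen, Tim Holland-Letz, Bastian Schilling, Christof von Kalle, Stefan Fröhling, Maria R. Gaiser, Daniela Hartmann, Anja Gesierich, Katharina C. Kähler, Ulrike Wehkamp, Ante Karoglan, Claudia Bär, Titus J. Brinker |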
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | 14 August 2019 |
| In: | European journal of cancer. Year: 2019, Volume: 119, Pages: 57-65 |
| ISSN: | 1879-0852 |
| DOI: | 10.1016/j.ejca.2019.06.013 |
| Online Access: | Publisher, free of charge: https://doi.org/10.1016/j.ejca.2019.06.013 Publisher: http://www.sciencedirect.com/science/article/pii/S0959804919303818 |
| Author Notes: | Roman C. Maron, Michael Weichenthal, Jochen S. Utikal, Achim Hekler, Carola Berking, Axel Hauschild, Alexander H. Enk, Sebastian Haferkamp, Joachim Klode, Dirk Schadendorf, Philipp Jansen, Tim Holland-Letz, Bastian Schilling, Christof von Kalle, Stefan Fröhling, Maria R. Gaiser, Daniela Hartmann, Anja Gesierich, Katharina C. Kähler, Ulrike Wehkamp, Ante Karoglan, Claudia Bär, Titus J. Brinker, Collaborators |
| Summary: | Background: Recently, convolutional neural networks (CNNs) have systematically outperformed dermatologists in distinguishing dermoscopic melanoma and nevi images. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account. Methods: Using 11,444 dermoscopic images, which covered dermatologic diagnoses comprising the majority of pigmented skin lesions commonly faced in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories. Findings: Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, outperformance (p < 0.001) was achieved except for basal cell carcinoma (on-par performance). Interpretation: Our findings show that automated classification of dermoscopic melanoma and nevi images is extendable to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still outperforming dermatologists at a significant level (p < 0.001). |
| Item Description: | Viewed on 6 November 2019 |
| Physical Description: | Online Resource |
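
The summary reports sensitivity, specificity, and two-sided McNemar tests. As a rough illustration of how such quantities are computed, the sketch below uses only the Python standard library. The function names and the toy counts are hypothetical and are not taken from the article, and the exact binomial form of McNemar's test is an assumption, since the summary does not say which variant the authors used.

```python
from math import comb


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of malignant lesions correctly flagged
    specificity = tn / (tn + fp)  # share of benign lesions correctly cleared
    return sensitivity, specificity


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar p-value for paired classifications.

    b: cases rater A classified correctly and rater B incorrectly
    c: cases rater A classified incorrectly and rater B correctly
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # One-sided tail of Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy counts for illustration only (not data from the study).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=15, fp=5)
p_value = mcnemar_exact(b=1, c=9)
```

Only discordant pairs (images that exactly one of the two raters classified correctly) affect the McNemar p-value, which is why the test suits a comparison of a classifier and dermatologists on the same test images.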