Studies that systematically unravel possible causes of false diagnoses by deep learning convolutional neural networks (CNNs) are scarce, yet they are needed before broader application.
The objective of the study was to investigate whether scale bars in dermoscopic images are associated with the diagnostic accuracy of a market-approved CNN.
This cross-sectional analysis applied a CNN trained with more than 150,000 images (Moleanalyzer-pro®, FotoFinder Systems Inc., Bad Birnbach, Germany) to seven dermoscopic image sets depicting the same 130 melanocytic lesions (107 nevi, 23 melanomas) with or without digitally superimposed scale bars of different manufacturers. Sensitivity, specificity and the area under the curve (AUC) of the receiver operating characteristic (ROC) were assessed for the CNN’s binary classification of images with and without superimposed scale bars.
Six dermoscopic image sets with different scale bars and one control set without scale bars (910 images overall) were submitted to CNN analysis. In images without scale bars, the CNN attained a sensitivity [95% confidence interval] of 87.0% [67.9%-95.5%] and a specificity of 87.9% [80.3%-92.8%]. ROC AUC was 0.953 [0.914-0.992]. Scale bars were not associated with significant changes in sensitivity (range 87%-95.7%, all p = 1.0). However, four scale bars induced a decrease in the CNN’s specificity (range 0%-43.9%, all p < 0.001). Moreover, ROC AUC was significantly reduced by two scale bars (range 0.520-0.848, both p ≤ 0.042).
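As an illustration of how the control-set figures relate to the lesion counts, the following minimal sketch recomputes sensitivity and specificity with 95% confidence intervals. It rests on two assumptions that are not stated in the text: the counts 20 of 23 melanomas and 94 of 107 nevi are inferred from the reported percentages, and the Wilson score interval is used because it appears consistent with the reported bounds, not because the study is known to have used it. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Assumed counts, inferred from the reported percentages (not given in the text):
# 20 of 23 melanomas classified as malignant, 94 of 107 nevi classified as benign.
sens_lo, sens_hi = wilson_interval(20, 23)
spec_lo, spec_hi = wilson_interval(94, 107)

print(f"sensitivity {20 / 23:.1%} [{sens_lo:.1%}-{sens_hi:.1%}]")
print(f"specificity {94 / 107:.1%} [{spec_lo:.1%}-{spec_hi:.1%}]")
```

Under these assumptions the script prints 87.0% [67.9%-95.5%] for sensitivity and 87.9% [80.3%-92.8%] for specificity, in line with the values reported for images without scale bars.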
Superimposed scale bars in dermoscopic images may impair the CNN’s diagnostic accuracy, mostly by increasing the rate of false-positive diagnoses. We recommend avoiding scale bars in images intended for CNN analysis unless specific measures counteracting these effects are implemented.
This study was registered at the German Clinical Trials Register (DRKS-Study-ID: DRKS00013570; URL: https://www.drks.de/drks_web/).