Advances in biomedical artificial intelligence may introduce or perpetuate sex and gender discrimination. Convolutional neural networks (CNNs) have demonstrated dermatologist-level performance in image classification tasks but have not been assessed for sex and gender biases that may affect training data and diagnostic performance. In this study, we investigated sex-related imbalances in training data and in the diagnostic performance of a market-approved CNN for skin cancer classification (Moleanalyzer Pro®, Fotofinder Systems GmbH, Bad Birnbach, Germany).
We screened open-access dermoscopic image repositories widely used for CNN training for the sex distribution of their images. Moreover, the sex-related diagnostic performance of the market-approved CNN was tested in 1549 dermoscopic images stratified by sex (female n = 773; male n = 776).
Most open-access repositories showed a marked under-representation of images originating from female (40%) versus male (60%) patients. Despite these imbalances and well-known sex-related differences in skin anatomy or skin-directed behaviour, the tested CNN achieved a comparable sensitivity of 87.0% [80.9%-91.3%] versus 87.1% [81.1%-91.4%], specificity of 98.7% [97.4%-99.3%] versus 96.9% [95.2%-98.0%] and ROC-AUC of 0.984 [0.975-0.993] versus 0.979 [0.969-0.988] in dermoscopic images of female versus male origin, respectively. In the sample at hand, sex-related differences in ROC-AUCs were statistically significant neither in the per-image analysis nor in an additional per-individual analysis (p ≥ 0.59).
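As an illustration of how sex-stratified sensitivity and specificity with interval estimates of this kind can be computed, the following is a minimal Python sketch. It assumes a Wilson score interval at the 95% level; the abstract does not state which interval method was used. The function names and the confusion-matrix counts are hypothetical and are not data from this study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp, fn, tn, fp):
    """Point estimates and Wilson intervals from confusion-matrix counts."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts per stratum, invented for illustration only.
strata = {
    "female": dict(tp=87, fn=13, tn=490, fp=10),
    "male": dict(tp=86, fn=14, tn=480, fp=20),
}

results = {sex: sensitivity_specificity(**counts) for sex, counts in strata.items()}

for sex, metrics in results.items():
    for name, (estimate, (low, high)) in metrics.items():
        print(f"{sex:6s} {name:11s} {estimate:.3f} [{low:.3f}-{high:.3f}]")
```

Computing each metric separately per stratum, as sketched here, keeps the female and male estimates independent, so overlapping intervals can be read as a first indication of comparable performance before a formal test is applied.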
Design and training of artificial intelligence algorithms for medical applications should generally acknowledge sex and gender dimensions. Despite sex-related imbalances in open-access training data, the diagnostic performance of the tested CNN showed no sex-related bias in the classification of skin lesions.

Copyright © 2022 Elsevier Ltd. All rights reserved.