Assessing Bias in Skin Lesion Datasets: Insights and Implications
In "(De)Constructing Bias on Skin Lesion Datasets," Bissoto et al. examine the biases present in datasets used for automated skin lesion classification. The paper focuses on two prominent benchmarks, the ISIC Archive and the Atlas of Dermoscopy, both widely used to train and evaluate deep-learning models for early melanoma detection.
The authors conducted a series of experiments demonstrating spurious correlations in skin lesion datasets, where bias can inflate measured performance or mask genuinely useful signal. They devised destructive and constructive experiments to probe how manipulating the input data affects model performance. Their findings indicate that current practices may overlook biases that distort what models actually learn, a serious concern for real-world deployment.
Methodology and Experimental Design
The authors explored bias through "information destruction" and "information construction" experiments on the two datasets. In the destructive experiments, images from the Atlas and ISIC datasets were stripped of clinically relevant features, such as lesion detail, borders, and size information, to isolate the role of non-clinical artifacts. Despite this substantial loss of information, models continued to perform well, suggesting they exploit biases introduced by artifacts of image acquisition.
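To make the destructive setting concrete, here is a minimal sketch of one such occlusion transform, assuming a binary lesion segmentation mask is available per image (as in the ISIC Archive). The function and file names are illustrative, not the authors' code: the idea is simply to remove the lesion pixels so that only surrounding skin and acquisition artifacts remain visible to the model.

```python
import numpy as np
from PIL import Image

def occlude_lesion(image_path: str, mask_path: str) -> Image.Image:
    """Black out the lesion region, leaving background pixels untouched."""
    image = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.uint8)
    mask = np.asarray(Image.open(mask_path).convert("L")) > 127  # binarize lesion mask

    destroyed = image.copy()
    destroyed[mask] = 0  # erase the clinically relevant pixels inside the lesion
    return Image.fromarray(destroyed)
```

If a classifier trained and evaluated on such occluded images still detects melanoma well above chance, the signal must be coming from artifacts rather than from the lesion itself.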
Conversely, the constructive experiments fed models supplementary, clinically meaningful attributes. These experiments tested whether additional hand-engineered features could improve performance, the hope being that models would learn from genuinely relevant medical patterns rather than residual biases.
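A hedged sketch of this construction idea is shown below: a CNN's pooled image features are concatenated with a vector of hand-engineered clinical attributes before classification. The backbone choice, attribute count, and class count are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FusionClassifier(nn.Module):
    """Late fusion of CNN image features with clinical attribute vectors."""
    def __init__(self, num_attributes: int = 8, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep the pooled image features
        self.backbone = backbone
        self.head = nn.Linear(feat_dim + num_attributes, num_classes)

    def forward(self, images: torch.Tensor, attributes: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)                  # (B, feat_dim)
        fused = torch.cat([feats, attributes], dim=1)  # append clinical attributes
        return self.head(fused)

# Example forward pass with dummy data: 4 images, 8 attributes each.
model = FusionClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 8))
```

Under the paper's findings, one would expect such added attributes to yield little gain if the model is already leaning on spurious artifacts.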
Major Findings
The authors show that models consistently achieve respectable accuracy even when salient medical image features are almost entirely removed. Remarkably, under controlled evaluation settings the models stayed above benchmarks achieved by dermatologists, suggesting performance artificially inflated by bias. Furthermore, introducing clinically meaningful supplementary data did not significantly improve performance, implying that models either failed to leverage these attributes or remained biased toward exploiting irrelevant artifacts.
Implications and Future Work
Bissoto et al.'s findings call for a rethinking of how datasets are relied upon when training and evaluating machine-learning models. The biases they expose not only risk misleading optimization but also pose a significant barrier to reliable deployment in practice. The paper critically assesses current datasets and argues for more diverse, less biased data collection, as well as algorithmic adjustments or training regimes that mitigate bias exploitation.
Future work should analyze specific artifact-driven biases and their visual manifestations in more depth. There is an opportunity to develop more refined techniques that remove underlying biases while yielding more dependable predictions. Research could also explore alternative data augmentation, synthetic dataset generation, or standardized bias metrics to evaluate and guide safer algorithm deployment in clinical settings; one hypothetical such metric is sketched below.
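As one possible direction, a bias metric could follow the paper's own logic: the closer a model's AUC on artifact-only (lesion-occluded) images is to its AUC on intact images, the more it relies on spurious cues. The metric name and formulation below are assumptions for illustration, not something proposed in the paper.

```python
from sklearn.metrics import roc_auc_score

def artifact_reliance(y_true, scores_full, scores_occluded) -> float:
    """Return a value in [0, 1]; higher means more artifact-driven.

    Hypothetical metric: skill (AUC above the 0.5 chance level) retained
    when the lesion is occluded, normalized by skill on intact images.
    """
    auc_full = roc_auc_score(y_true, scores_full)
    auc_occluded = roc_auc_score(y_true, scores_occluded)
    return max(0.0, (auc_occluded - 0.5) / max(auc_full - 0.5, 1e-8))
```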
In summary, recognizing and addressing dataset biases is pivotal to building trustworthy artificial intelligence for dermatology, where real-world reliability and diagnostic efficacy are what ultimately matter. Bissoto et al. invite researchers to reassess existing datasets and pursue improvements that better align models with medically relevant, unbiased data properties.