Towards Transparency in Dermatology Image Datasets with Skin Tone Annotations by Experts, Crowds, and an Algorithm (2207.02942v1)

Published 6 Jul 2022 in cs.CV, cs.HC, and cs.LG

Abstract: While AI holds promise for supporting healthcare providers and improving the accuracy of medical diagnoses, a lack of transparency in the composition of datasets exposes AI models to the possibility of unintentional and avoidable mistakes. In particular, public and private image datasets of dermatological conditions rarely include information on skin color. As a start towards increasing transparency, AI researchers have appropriated the use of the Fitzpatrick skin type (FST) from a measure of patient photosensitivity to a measure for estimating skin tone in algorithmic audits of computer vision applications including facial recognition and dermatology diagnosis. In order to understand the variability of estimated FST annotations on images, we compare several FST annotation methods on a diverse set of 460 images of skin conditions from both textbooks and online dermatology atlases. We find the inter-rater reliability between three board-certified dermatologists is comparable to the inter-rater reliability between the board-certified dermatologists and two crowdsourcing methods. In contrast, we find that the Individual Typology Angle converted to FST (ITA-FST) method produces annotations that are significantly less correlated with the experts' annotations than the experts' annotations are correlated with each other. These results demonstrate that algorithms based on ITA-FST are not reliable for annotating large-scale image datasets, but human-centered, crowd-based protocols can reliably add skin type transparency to dermatology datasets. Furthermore, we introduce the concept of dynamic consensus protocols with tunable parameters including expert review that increase the visibility of crowdwork and provide guidance for future crowdsourced annotations of large image datasets.

An Essay on "Towards Transparency in Dermatology Image Datasets with Skin Tone Annotations by Experts, Crowds, and an Algorithm"

The paper "Towards Transparency in Dermatology Image Datasets with Skin Tone Annotations by Experts, Crowds, and an Algorithm" presents an investigation into the provision of Fitzpatrick Skin Type (FST) annotations to dermatological image datasets, aiming to improve the transparency and, consequently, the fairness and accountability of AI models in dermatological applications. The core contribution of this work is the comparative evaluation of FST annotation methods, highlighting the efficacy of crowd-sourcing approaches relative to expert dermatologists and algorithmic techniques.

Summary of Key Findings

  1. Methods of Annotation: The research compares three primary methods for annotating a set of 460 images: expert annotations, crowd-sourced annotations via a dynamic consensus protocol, and algorithmic annotations using the Individual Typology Angle converted to FST (ITA-FST); a minimal sketch of the ITA-FST conversion appears after this list.
  2. Reliability of Annotations: The paper finds that inter-rater reliability among the three board-certified dermatologists is comparable to the reliability between the dermatologists and the crowd-sourced methods. In contrast, the ITA-FST annotations correlate with the experts' annotations significantly less than the experts correlate with one another, undermining the algorithm's reliability for large-scale dataset annotation.
  3. Crowd-Sourced Annotations: The dynamic consensus protocol used for crowd-sourced annotations proves to be a robust way to generate reliable FST labels, incorporating mechanisms such as consensus thresholds and expert review to improve reliability.
  4. Expert Review for Edge Cases: The paper emphasizes the value of expert review, particularly for images with high disagreement within the crowd, to enhance the reliability of annotations and address any subjective nuances that crowd-sourcing may not effectively capture.
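
The ITA-FST method referenced above derives a skin-tone estimate from image color statistics rather than from human judgment. The sketch below is a rough illustration, not the authors' implementation: it computes the Individual Typology Angle from CIELAB L* and b* values (e.g., obtained from non-lesional skin pixels via a colorspace conversion such as skimage.color.rgb2lab) and maps it to an FST category using one commonly cited set of cutoffs; the masking step and the exact thresholds vary across the literature and are assumptions here.

```python
import numpy as np

# One commonly cited set of ITA -> Fitzpatrick cutoffs, in degrees.
# The exact thresholds (and the skin-masking step) vary across the literature.
ITA_TO_FST_CUTOFFS = [(55.0, 1), (41.0, 2), (28.0, 3), (10.0, 4), (-30.0, 5)]


def ita_degrees(L, b):
    """Individual Typology Angle, ITA = arctan((L* - 50) / b*) * 180 / pi.

    L and b are CIELAB L* and b* values (scalars or NumPy arrays), e.g. from
    skimage.color.rgb2lab applied to a mask of non-lesional skin pixels.
    arctan2 is equivalent to the textbook formula for the positive b* values
    typical of skin and avoids division by zero.
    """
    return np.degrees(np.arctan2(np.asarray(L, dtype=float) - 50.0,
                                 np.asarray(b, dtype=float)))


def ita_to_fst(ita):
    """Map a single ITA value (degrees) to an estimated Fitzpatrick type 1-6."""
    for cutoff, fst in ITA_TO_FST_CUTOFFS:
        if ita > cutoff:
            return fst
    return 6


# Illustrative values only: median ITA over a hypothetical patch of skin pixels.
L_pixels = np.array([62.0, 58.5, 60.2])
b_pixels = np.array([14.1, 15.8, 13.9])
print(ita_to_fst(np.median(ita_degrees(L_pixels, b_pixels))))  # -> 3
```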

Implications and Future Directions

The findings have significant implications for the development and evaluation of AI-based diagnostic tools in dermatology. Increasing the transparency of datasets with accurate skin tone annotations can help identify biases and ensure equitable AI performance across diverse populations. This is particularly critical given prior work showing that AI models can be less accurate on images of darker skin tones, which risks disparities in diagnostic outcomes.

For practitioners and dataset curators, the paper suggests prioritizing crowd-sourced annotations, augmented by expert review, over sole reliance on algorithmic approaches like ITA-FST. This recommendation is supported by the crowd-sourced methods' inter-rater reliability, which is comparable to that of the experts and well above that of ITA-FST, and by their feasibility for large-scale annotation tasks.
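
The protocol described in the paper exposes tunable parameters: how many crowd annotations to collect, how much agreement counts as consensus, and when to escalate an image to an expert. The sketch below is a minimal illustration of how such a consensus-with-escalation loop could be structured; the function names, default thresholds, and escalation policy are assumptions for illustration and do not reproduce the paper's exact protocol.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConsensusConfig:
    """Tunable parameters of this hypothetical protocol sketch."""
    min_labels: int = 3               # crowd labels collected before checking consensus
    max_labels: int = 9               # cap on crowd labels per image
    agreement_threshold: float = 0.6  # fraction that must agree on the modal FST


def consensus_or_escalate(labels, config=None):
    """Decide what to do with one image's crowd FST labels (integers 1-6).

    Returns ("consensus", fst), ("collect_more", None), or ("expert_review", None).
    """
    config = config or ConsensusConfig()
    if len(labels) < config.min_labels:
        return ("collect_more", None)
    modal_fst, modal_count = Counter(labels).most_common(1)[0]
    if modal_count / len(labels) >= config.agreement_threshold:
        return ("consensus", modal_fst)
    if len(labels) < config.max_labels:
        return ("collect_more", None)   # solicit additional crowd annotations
    return ("expert_review", None)      # persistent disagreement: route to a dermatologist


# Examples with made-up label lists:
print(consensus_or_escalate([3, 4, 3]))                    # ('consensus', 3)
print(consensus_or_escalate([2, 3, 4, 5, 3, 6, 2, 4, 5]))  # ('expert_review', None)
```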

The paper invites further exploration in a few areas:

  • Technological Developments: Continued refinement of algorithmic approaches to skin tone annotation may eventually yield methods that match or surpass the reliability of expert or crowd-sourced annotations.
  • Incorporation of Diverse Image Sources: Future studies might expand dataset sources, including more varied skin conditions and types, and test if the results hold across broader dermatological datasets.
  • Longitudinal and Ecological Validation: It is crucial to validate these annotation processes in the real-world settings where AI models are deployed, ensuring that datasets remain representative over time and across different populations.

Conclusion

This rigorous evaluation of skin tone annotation methods represents a substantive advance toward more transparent and accountable dermatology datasets. By establishing crowd-sourced dynamic consensus protocols, complemented by expert oversight, as a viable annotation method, the paper lays the groundwork for equitable AI applications in dermatological care. It also contributes to the ongoing discourse on mitigating biases in AI systems and fostering greater fairness within healthcare-oriented machine learning models.

Authors (5)
  1. Matthew Groh (20 papers)
  2. Caleb Harris (2 papers)
  3. Roxana Daneshjou (19 papers)
  4. Omar Badri (3 papers)
  5. Arash Koochek (4 papers)
Citations (36)