Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset (2402.18545v1)

Published 28 Feb 2024 in cs.CY

Abstract:

Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, which affects research, medical education, and AI tool development. Dermatology is a suitable area in which to develop and test a new, scalable method for creating representative health datasets.

Methods: We used Google Search advertisements to invite contributions to an open-access dataset of images of dermatology conditions, together with demographic and symptom information. With informed contributor consent, we describe and release this dataset, which contains 10,408 images from 5,033 contributions by internet users in the United States, collected over 8 months starting in March 2023. The dataset includes dermatologist condition labels as well as estimated Fitzpatrick Skin Type (eFST) and estimated Monk Skin Tone (eMST) labels for the images.

Results: We received a median of 22 submissions per day (IQR 14-30). Female (66.72%) and younger (52% under age 40) contributors were more highly represented in the dataset than in the US population, and 32.6% of contributors reported a non-White racial or ethnic identity. Over 97.5% of contributions were genuine images of skin conditions. Dermatologist confidence in assigning a differential diagnosis increased with the number of available variables and showed a weaker correlation with image sharpness (Spearman's P values < 0.001 and 0.01, respectively). Most contributions described short-duration conditions (54% with onset less than 7 days earlier), and 89% were allergic, infectious, or inflammatory conditions. The eFST and eMST distributions reflected the geographical origin of the dataset. The dataset is available at github.com/google-research-datasets/scin.

Conclusion: Search ads are effective at crowdsourcing images of health conditions. The SCIN dataset bridges important gaps in the availability of representative images of common skin conditions.
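The abstract summarizes daily submission volume as a median with an interquartile range and relates dermatologist confidence to other variables using Spearman's rank correlation. The following is a minimal, standard-library-only Python sketch of both statistics, assuming per-day counts and per-case values are already available as lists. All input values below are hypothetical placeholders, not data from SCIN, and the quartile convention (`statistics.quantiles` with its default "exclusive" method) may differ from the one the authors used. The sketch computes only Spearman's rho; a library routine such as `scipy.stats.spearmanr` would also return the P value that the abstract reports.

```python
from statistics import median, quantiles


def summarize_daily_counts(counts):
    """Return (median, Q1, Q3) for a list of daily submission counts.

    Uses statistics.quantiles with its default 'exclusive' method; this is an
    assumption and may not match the quartile method used in the paper.
    """
    q1, _, q3 = quantiles(counts, n=4)
    return median(counts), q1, q3


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical placeholder data, NOT taken from the SCIN dataset.
    daily_counts = [10, 14, 18, 22, 22, 25, 30, 35]
    print(summarize_daily_counts(daily_counts))

    # Hypothetical per-case values: number of available variables vs.
    # dermatologist confidence on an arbitrary ordinal scale.
    n_variables = [1, 2, 3, 4, 5, 6]
    confidence = [2, 2, 3, 4, 4, 5]
    print(spearman_rho(n_variables, confidence))
```

Average ranks are used for ties because ordinal ratings such as confidence scores tie often; Spearman's rho is then simply the Pearson correlation of those ranks.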

Authors (20)
  1. Abbi Ward (3 papers)
  2. Jimmy Li (6 papers)
  3. Julie Wang (3 papers)
  4. Sriram Lakshminarasimhan (2 papers)
  5. Ashley Carrick (1 paper)
  6. Bilson Campana (1 paper)
  7. Jay Hartford (2 papers)
  8. Pradeep Kumar S (3 papers)
  9. Tiya Tiyasirichokchai (2 papers)
  10. Sunny Virmani (3 papers)
  11. Renee Wong (5 papers)
  12. Yossi Matias (61 papers)
  13. Dawn Siegel (1 paper)
  14. Steven Lin (6 papers)
  15. Justin Ko (22 papers)
  16. Alan Karthikesalingam (31 papers)
  17. Christopher Semturs (12 papers)
  18. Pooja Rao (14 papers)
  19. Greg S. Corrado (37 papers)
  20. Dale R. Webster (20 papers)
Citations (3)