Confidence Contours: Uncertainty-Aware Annotation for Medical Semantic Segmentation (2308.07528v2)
Abstract: Medical image segmentation modeling is a high-stakes task where understanding of uncertainty is crucial for addressing visual ambiguity. Prior work has developed segmentation models utilizing probabilistic or generative mechanisms to infer uncertainty from labels where annotators draw a singular boundary. However, as these annotations cannot represent an individual annotator's uncertainty, models trained on them produce uncertainty maps that are difficult to interpret. We propose a novel segmentation representation, Confidence Contours, which uses high- and low-confidence "contours" to capture uncertainty directly, and develop a novel annotation system for collecting contours. We conduct an evaluation on the Lung Image Database Consortium (LIDC) dataset and a synthetic dataset. From an annotation study with 30 participants, results show that Confidence Contours provide high representative capacity without considerably higher annotator effort. We also find that general-purpose segmentation models can learn Confidence Contours at the same performance level as standard singular annotations. Finally, from interviews with 5 medical experts, we find that Confidence Contour maps are more interpretable than Bayesian maps due to their representation of structural uncertainty.
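As an illustration of the representation described in the abstract, the following is a minimal sketch of how a pair of high- and low-confidence contours might be stored and combined into a single label map. It is not the authors' implementation. The function name `encode_confidence_contours`, the use of nested binary masks, the requirement that the high-confidence region lie inside the low-confidence region, and the 0/1/2 label encoding are all assumptions made for this example.

```python
from typing import List

# Hypothetical type: a binary mask with 1 inside a contour and 0 outside.
Mask = List[List[int]]


def encode_confidence_contours(high: Mask, low: Mask) -> List[List[int]]:
    """Combine two nested binary masks into a 3-level label map.

    Assumed encoding (not taken from the paper):
      0 = outside both contours
      1 = inside the low-confidence contour only (the uncertain band)
      2 = inside the high-confidence contour
    """
    if len(high) != len(low) or any(len(h) != len(l) for h, l in zip(high, low)):
        raise ValueError("masks must have the same shape")
    labels = []
    for h_row, l_row in zip(high, low):
        row = []
        for h, l in zip(h_row, l_row):
            # Assumption: the high-confidence region is nested in the low one.
            if h and not l:
                raise ValueError("high-confidence region must lie inside the low-confidence region")
            row.append(h + l)
        labels.append(row)
    return labels


# Toy 3x4 example: a small confident core surrounded by an uncertain band.
high = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
low = [[0, 1, 1, 0],
       [1, 1, 1, 1],
       [0, 1, 1, 0]]
labels = encode_confidence_contours(high, low)
```

Under this assumed encoding, a general-purpose segmentation model could be trained on `labels` as an ordinary three-class target, which is one way to read the abstract's claim that standard models can learn the representation.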