A Probabilistic U-Net for Segmentation of Ambiguous Images

Published 13 Jun 2018 in cs.CV, cs.LG, cs.NE, and stat.ML | (1806.05034v4)

Abstract: Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, doing so significantly better than published approaches. These models could have a high impact in real-world applications, such as being used as clinical decision-making algorithms accounting for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.




Summary

The paper presents a segmentation model that combines a conditional variational autoencoder (CVAE) with a U-Net to generate diverse, plausible segmentation maps from ambiguous images.
It uses a low-dimensional latent space to encode segmentation variants and learns calibrated probabilities that reflect the variability among expert annotators.
On a lung CT task and a Cityscapes task, the model reproduces both the possible segmentation variants and the frequencies with which they occur, outperforming previously published approaches.


The paper introduces a generative segmentation model that addresses the inherent ambiguities of semantic segmentation, which are especially common in real-world vision problems such as medical imaging. The proposed framework combines a U-Net architecture with a Conditional Variational Autoencoder (CVAE), so that a single ambiguous input image can yield multiple plausible segmentation hypotheses.

Core Contributions

Generative Model Architecture: Combining a CVAE with a U-Net lets the model represent complex output distributions and cover the full range of plausible segmentation maps. It produces multiple hypotheses that are each internally consistent, whereas many existing models output only a single hypothesis or pixel-wise probability estimates.
Low-Dimensional Latent Space: The model uses a low-dimensional latent space to encode the possible segmentation variants. Each sample drawn from this space yields one segmentation map that interprets the whole image consistently, so the model represents a joint distribution over all pixels rather than independent per-pixel probabilities.
Calibrated Probabilities: The framework is designed to learn calibrated probabilities of segmentation variants, so that common and rare hypotheses are sampled with frequencies that match how often they occur in the annotations. This matters for applications in which decisions depend on weighing several hypotheses, such as considering more than one possible diagnosis.
Scalability and Efficiency: Unlike ensemble or multi-head approaches, which need one model or head per output variant, this framework can sample an arbitrary number of hypotheses from a single model, and the number of variants does not have to be fixed during training.
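The sampling idea behind these contributions can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the function names, the hand-set weights, and the way the latent sample is combined with the features (appending it to every pixel's feature vector and applying a single 1x1 linear layer) are assumptions made for illustration, and the grid of feature vectors merely stands in for the output of a U-Net. In a trained model the Gaussian parameters would come from a learned network rather than being passed in by hand.

```python
import random

def sample_latent(mu, sigma, rng):
    """Draw z from a diagonal Gaussian N(mu, sigma^2) in a low-dimensional space."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]

def segment(features, z, w_feat, w_z, bias=0.0):
    """Turn one latent sample into one binary mask.

    features: H x W grid of per-pixel feature vectors (hypothetical stand-in
    for U-Net features). The same z is used at every pixel, which is what
    makes each sampled mask a coherent, image-wide hypothesis.
    """
    # Contribution of the latent to the logit; identical for all pixels.
    z_term = sum(w * v for w, v in zip(w_z, z))
    mask = []
    for row in features:
        mask.append([
            1 if sum(w * f for w, f in zip(w_feat, px)) + z_term + bias > 0 else 0
            for px in row
        ])
    return mask

def sample_hypotheses(features, mu, sigma, w_feat, w_z, n, seed=0):
    """Draw n segmentation hypotheses; n is chosen freely at inference time."""
    rng = random.Random(seed)
    return [
        segment(features, sample_latent(mu, sigma, rng), w_feat, w_z)
        for _ in range(n)
    ]
```

Because only the latent sample changes between calls, the expensive feature computation could be done once per image and reused for every hypothesis, which is why the number of hypotheses does not need to be fixed in advance.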

Experimental Validation

The paper evaluates the model on two tasks with intrinsic ambiguities: segmentation of lung abnormalities in CT images (LIDC-IDRI dataset) and Cityscapes segmentation with artificially induced label ambiguity. On both tasks the Probabilistic U-Net outperforms the baseline methods as measured by an IoU-based energy distance, which scores how closely the distribution of sampled segmentations matches the distribution of ground-truth segmentations.
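An IoU-based energy distance of this kind can be sketched in a few lines of Python. The sketch assumes the common squared form 2·E[d(S, Y)] − E[d(S, S')] − E[d(Y, Y')] with d = 1 − IoU, where S and S' are model samples and Y and Y' are ground-truth segmentations. Masks are flat sequences of 0/1 values, the function names are illustrative, and treating two empty masks as a perfect match is an assumption of this sketch.

```python
def iou_distance(a, b):
    """Return 1 - IoU for two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        # Assumption: two empty masks agree perfectly, so their distance is 0.
        return 0.0
    return 1.0 - inter / union

def mean_distance(xs, ys):
    """Average IoU distance over all pairs drawn from two sets of masks."""
    total = sum(iou_distance(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def energy_distance_sq(samples, ground_truths):
    """Squared energy distance: 2*E[d(S,Y)] - E[d(S,S')] - E[d(Y,Y')]."""
    return (
        2.0 * mean_distance(samples, ground_truths)
        - mean_distance(samples, samples)
        - mean_distance(ground_truths, ground_truths)
    )
```

The two subtracted terms reward diversity: a model that always returns the same mask gets no credit for the spread among the graders, so it scores worse than a model whose samples vary the way the annotations do.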

LIDC-IDRI Task: On the lung abnormalities dataset, the model's samples reflect the variability among expert annotators more closely than those of the baselines. Because it models the joint likelihood across pixels, each sample is a coherent segmentation, which reproduces expert disagreement with higher fidelity than per-pixel uncertainty can.
Cityscapes Task: Selected classes are randomly replaced by synonymous classes with defined probabilities, so the true distribution over label maps is known. This tests the model's ability to reproduce complex conditional distributions of labels. The framework closely approximates the assigned frequency of each class variant, showing that it preserves the properties of the target distribution and can handle an output space with many distinct modes.
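The label-flipping setup of the Cityscapes task can be made concrete with a short Python sketch. The class names, the `_alt` suffix for the synonymous classes, and the flip probabilities used in any call are placeholders rather than the values from the paper, and the sketch assumes that each class is flipped independently and for the whole image at once. Under that assumption, k flippable classes give 2^k possible label-map variants whose probabilities are known exactly, and sampled segmentations can be compared against them.

```python
import itertools
import random

def mode_probabilities(flip_probs):
    """Probability of every combination of independent class flips.

    flip_probs maps a class name to the probability of switching it to its
    synonymous class. Returns the sorted class names and a dict from a tuple
    of booleans (flipped or not, per class) to that combination's probability.
    """
    classes = sorted(flip_probs)
    modes = {}
    for flips in itertools.product([False, True], repeat=len(classes)):
        p = 1.0
        for cls, flipped in zip(classes, flips):
            p *= flip_probs[cls] if flipped else 1.0 - flip_probs[cls]
        modes[flips] = p
    return classes, modes

def sample_label_map(labels, flip_probs, rng):
    """Draw one ambiguous ground truth by flipping whole classes at random."""
    flipped = {cls for cls, p in flip_probs.items() if rng.random() < p}
    return [[f"{c}_alt" if c in flipped else c for c in row] for row in labels]
```

Comparing how often each variant appears among a model's samples with the values returned by `mode_probabilities` is one simple way to check whether the sampled frequencies are calibrated.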

Theoretical and Practical Implications

The proposed framework is aimed at segmentation tasks in which uncertainty and variability must be represented explicitly rather than averaged away. Its ability to produce multiple hypotheses makes it especially relevant for clinical applications, where decisions are affected by uncertainty: a set of plausible segmentations can support several possible diagnoses and indicate which further diagnostic actions would resolve the ambiguity.

From a theoretical perspective, combining a CVAE with a U-Net is a promising way to bring generative modeling to segmentation tasks. It opens avenues for future research on refining the latent space representation and on making these models more interpretable.

This research lays the groundwork for developing more comprehensive segmentation frameworks capable of operating in environments characterized by complex and ambiguous data. Future developments may explore extensions across different datasets and application domains, with potential integration into broader diagnostic systems in healthcare and beyond.