HMRF-UNet: Integrating HMRF Energy in U-Net
- The paper introduces HMRF-UNet, which integrates HMRF energy as a differentiable loss into a U-Net architecture for unsupervised and semi-supervised segmentation tasks.
- HMRF-UNet is a segmentation model that employs fuzzy confidence vectors to fuse data fidelity with spatial regularization, reaching unsupervised Dice coefficients of about 0.956 on synthetic micro-CT data.
- By leveraging unsupervised pre-training, HMRF-UNet reduces manual annotation requirements while enhancing segmentation accuracy in challenging intensity contrast scenarios.
Hidden Markov Random Field U-Net (HMRF-UNet) refers to a class of segmentation architectures that integrate the classical Hidden Markov Random Field (HMRF) energy or its generalizations as a loss function within a U-Net convolutional neural network backbone. This approach aims to combine the unsupervised, spatially regularized modeling advantages of HMRF or MRF with the data-driven feature extraction capabilities and computational efficiency of U-Net. Key variants include the unsupervised HMRF-UNet for micro-CT segmentation (Grolig et al., 14 Nov 2025) and the differentiable "product-of-experts" MRF-UNet for (semi-)supervised neuroimaging (Brudfors et al., 2021).
1. Theoretical Foundations of HMRF Integration
HMRF-UNet constructs its loss from the negative log-posterior energy of a hidden Markov random field, decomposed into a data fidelity (unary) term and a spatial regularization (pairwise) term. Let $y = (y_i)_{i=1}^{N}$ denote image intensities and $x = (x_i)_{i=1}^{N}$ the hidden label field (with $x_i \in \{1, \dots, K\}$). Under a Gaussian mixture observation model, the HMRF energy is:
$$E(x \mid y) = \sum_{i=1}^{N} \left[ \frac{(y_i - \mu_{x_i})^2}{2\sigma_{x_i}^2} + \log \sigma_{x_i} \right] + \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_i} V(x_i, x_j)$$
The unary term models the data likelihood, while the pairwise term, typically a Potts or Banerjee potential, enforces spatial smoothness or more structured label interactions. The Potts prior penalizes label discontinuities via $V(x_i, x_j) = \beta\,(1 - \delta_{x_i x_j})$, where $\beta$ controls spatial regularization strength. The Banerjee clique potential incorporates class-conditional means and variances for more nuanced contextual penalization.
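As a rough illustration of how the two terms trade off, the following plain-Python sketch evaluates a Gaussian unary term plus a Potts pairwise term for a hard labeling of a tiny image. The function name `hmrf_energy`, the 4-connected neighborhood, and the toy values are assumptions made for this example, not details taken from the cited papers.

```python
import math

def hmrf_energy(image, labels, means, sigmas, beta):
    """Energy of a hard labeling: Gaussian unary term plus Potts pairwise term.

    image, labels: 2D lists of equal shape (intensities and integer class ids).
    means, sigmas: per-class Gaussian parameters, indexed by class id.
    beta: Potts weight; every ordered neighbor pair with differing labels costs beta.
    """
    height, width = len(image), len(image[0])
    # 4-connected neighborhood (an illustrative assumption).
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    unary = 0.0
    pairwise = 0.0
    for r in range(height):
        for c in range(width):
            k = labels[r][c]
            # Negative Gaussian log-likelihood, up to an additive constant.
            unary += (image[r][c] - means[k]) ** 2 / (2 * sigmas[k] ** 2)
            unary += math.log(sigmas[k])
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if inside and labels[rr][cc] != k:
                    pairwise += beta
    return unary + pairwise

# Toy image with a dark region (class 0) and a bright region (class 1).
image = [[0.1, 0.2, 0.9],
         [0.0, 0.8, 1.0]]
means, sigmas = [0.1, 0.9], [0.1, 0.1]
coherent = [[0, 0, 1],
            [0, 1, 1]]
speckled = [[0, 1, 0],
            [1, 0, 1]]
e_coherent = hmrf_energy(image, coherent, means, sigmas, beta=0.5)
e_speckled = hmrf_energy(image, speckled, means, sigmas, beta=0.5)
```

The coherent labeling matches the intensities and has few label changes between neighbors, so its energy is lower than that of the speckled labeling.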
2. Differentiable HMRF Loss Formulation
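To enable end-to-end unsupervised training, HMRF-UNet replaces hard label assignments with "fuzzy" confidence vectors produced by the U-Net's final softmax layer, $p_i = (p_{i,1}, \dots, p_{i,K})$, $\sum_{k} p_{i,k} = 1$. The class means and variances are computed via soft weighting:
$$\mu_k = \frac{\sum_{i} p_{i,k}\, y_i}{\sum_{i} p_{i,k}}, \qquad \sigma_k^2 = \frac{\sum_{i} p_{i,k}\,(y_i - \mu_k)^2}{\sum_{i} p_{i,k}}$$
The fuzzy data loss is:
$$\mathcal{L}_{\text{data}} = \sum_{i=1}^{N} \sum_{k=1}^{K} p_{i,k} \left[ \frac{(y_i - \mu_k)^2}{2\sigma_k^2} + \log \sigma_k \right]$$
where $y_i$ is the intensity of pixel $i$, $N$ is the number of pixels, and $K$ is the number of classes. The fuzzy neighborhood (Potts) loss is:
$$\mathcal{L}_{\text{nb}} = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_i} \left( 1 - \sum_{k=1}^{K} p_{i,k}\, p_{j,k} \right)$$
where $\mathcal{N}_i$ is the set of neighbors of pixel $i$. These terms are combined as $\mathcal{L} = \mathcal{L}_{\text{data}} + \beta\, \mathcal{L}_{\text{nb}}$, with $\beta \ge 0$.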
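The soft-weighted statistics and the two loss terms can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the function name `fuzzy_hmrf_loss`, the flat pixel list with explicit neighbor indices, the `eps` stabilizer, and the exact form of the Potts term (one minus the inner product of neighboring confidence vectors) are choices made here. A real training loop would express the same computation with tensors in an autodiff framework.

```python
import math

def fuzzy_hmrf_loss(pixels, probs, neighbors, beta, eps=1e-8):
    """Fuzzy HMRF loss on soft assignments (plain-Python sketch).

    pixels: list of N intensities.
    probs: list of N lists, each a length-K softmax vector.
    neighbors: list of N lists holding the neighbor indices of each pixel.
    beta: weight of the neighborhood (Potts) term.
    eps: small constant guarding against division by zero and log(0).
    """
    k_classes = len(probs[0])
    data_loss = 0.0
    for k in range(k_classes):
        # Soft-weighted class mean and variance.
        mass = sum(p[k] for p in probs) + eps
        mean = sum(p[k] * y for p, y in zip(probs, pixels)) / mass
        var = sum(p[k] * (y - mean) ** 2 for p, y in zip(probs, pixels)) / mass + eps
        for p, y in zip(probs, pixels):
            # log(sigma) is written as 0.5 * log(variance).
            data_loss += p[k] * ((y - mean) ** 2 / (2 * var) + 0.5 * math.log(var))
    potts_loss = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            agreement = sum(a * b for a, b in zip(probs[i], probs[j]))
            potts_loss += 1.0 - agreement
    return data_loss + beta * potts_loss

# Four pixels on a 1D chain: two dark, two bright.
pixels = [0.0, 0.1, 0.9, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
coherent = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
alternating = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
loss_coherent = fuzzy_hmrf_loss(pixels, coherent, neighbors, beta=0.5)
loss_alternating = fuzzy_hmrf_loss(pixels, alternating, neighbors, beta=0.5)
```

For one-hot confidence vectors the inner product is 1 when two neighbors share a label and 0 otherwise, so the neighborhood term reduces to counting label disagreements, as in the hard Potts prior.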
3. U-Net Backbone and Architectural Details
The segmentation backbone is a standard 2D U-Net with three downsampling and three upsampling levels, each employing three blocks of 3×3 Conv2D, BatchNorm, and ReLU. Max pooling (2×2) reduces the spatial resolution in the encoder, and transposed convolutions restore it in the decoder. Channel counts scale from 64 up to 256 in the encoder and then decrease in the decoder. The output layer is a 1×1 Conv2D followed by a $K$-way softmax. No explicit modifications to the U-Net topology are required: the HMRF energy is incorporated purely via the custom loss function (Grolig et al., 14 Nov 2025).
A related supervised MRF-UNet architecture for neuroimaging (Brudfors et al., 2021) uses a 3D U-Net (5 encoding/decoding levels), where the MRF prior is implemented as a learned convolutional layer and T steps of mean-field message-passing are unrolled into the computation graph. The product of the U-Net (likelihood) and the MRF (prior) defines the label posterior; backpropagation occurs through all recurrent mean-field updates.
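To make the unrolled message passing concrete, here is a small plain-Python sketch of synchronous mean-field updates for a pairwise MRF whose unary term is a per-pixel class likelihood. It is not the implementation of Brudfors et al.: the function name `mean_field`, the explicit neighbor lists, and the fixed hand-chosen compatibility matrix `compat` (standing in for the learned convolutional layer) are all assumptions for illustration.

```python
import math

def mean_field(likelihood, neighbors, compat, steps):
    """Unrolled mean-field updates for a pairwise MRF posterior.

    likelihood: list of N length-K probability vectors (stand-in for U-Net output).
    neighbors: list of N lists of neighbor indices.
    compat: K x K matrix; compat[k][l] is the log-compatibility of label k at a
            pixel with label l at a neighbor (hypothetical fixed values).
    steps: number of unrolled iterations T.
    """
    k_classes = len(compat)
    q = [list(p) for p in likelihood]  # initialize the posterior with the likelihood
    for _ in range(steps):
        new_q = []
        for i, nbrs in enumerate(neighbors):
            logits = []
            for k in range(k_classes):
                # Expected log-compatibility under the neighbors' current beliefs.
                message = sum(compat[k][l] * q[j][l]
                              for j in nbrs for l in range(k_classes))
                logits.append(math.log(likelihood[i][k] + 1e-12) + message)
            # Numerically stable softmax over the K classes.
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            new_q.append([w / total for w in weights])
        q = new_q
    return q

# Three pixels on a chain; the middle one is ambiguous on its own.
likelihood = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
neighbors = [[1], [0, 2], [1]]
compat = [[1.0, 0.0], [0.0, 1.0]]  # favors equal labels at neighboring pixels
posterior = mean_field(likelihood, neighbors, compat, steps=3)
```

In this toy case the middle pixel slightly prefers class 1 in isolation, but after message passing its posterior favors class 0, matching its two confident neighbors. Because every step is a composition of sums, exponentials, and normalizations, the same updates remain differentiable when written with tensors.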
4. Neighborhood and Regularization Strategies
Different formulations of the pairwise loss have been systematically investigated:
- Normal Potts loss: Uniform regularization weight $\beta$.
- Weighted Potts loss: Spatially varying $\beta$ derived from data statistics.
- Normal Banerjee clique: Incorporating class means and variances for label interactions.
- Weighted Banerjee clique: Data-adaptive Banerjee term.
All experiments in (Grolig et al., 14 Nov 2025) employ a first-order (8-pixel) neighborhood. Potts-based terms outperform Banerjee-based ones, with weighted Potts exhibiting the highest attained Dice (≈0.956). High weighting for Banerjee coupling degrades performance, especially in "normal" (non-weighted) settings. Fine-tuning the neighborhood weighting threshold can preserve thin structures.
5. Training Procedures and Pre-training Regimes
On the ArtPUFoam dataset (20,000 synthetic micro-CT images), architectural and loss hyperparameters are optimized via Bayesian search. Unsupervised training on the HMRF loss is then run for 200 epochs (learning rate value not recoverable, batch size 128) for the top candidate $\beta$ values and each neighborhood strategy. The model achieves state-of-the-art unsupervised Dice coefficients (e.g., weighted Potts DSC ≈0.956; see the table below).
| Loss variant | Dice (mean ± std) | Key trend |
|---|---|---|
| No-neighborhood | 0.950 ± 0.015 | Baseline |
| Normal Potts (best) | 0.957 ± 0.017 | Highest (with weighted Potts) |
| Weighted Potts | 0.956 ± 0.015 | Best peak Dice |
| Normal Banerjee | <0.88 | Degrades under high weight |
| Weighted Banerjee | 0.955 | Sub-peak |
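The Dice similarity coefficient (DSC) reported above measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask sizes. A minimal plain-Python sketch follows; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_coefficient(pred, target, foreground=1):
    """Dice similarity coefficient for one class of two equally shaped 2D label maps."""
    pred_flat = [v == foreground for row in pred for v in row]
    target_flat = [v == foreground for row in target for v in row]
    overlap = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

Here both masks contain three foreground pixels and share two of them, so the score is 4/6.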
Pre-training is performed by first running unsupervised HMRF-UNet training, then transferring weights to a fresh U-Net for supervised fine-tuning on limited labeled data. This pre-training enables substantial gains in segmentation accuracy even when only a handful of labeled images are available (e.g., 5 images: DSC improves from ≈0.848 to ≈0.977).
6. Empirical Results and Performance Evaluation
Unsupervised HMRF-UNet delivers near-supervised segmentation accuracy on artificial data and demonstrates strong pre-training value for few-shot supervised tasks. On real μCT data (RealPUFoam), purely unsupervised training shows oversegmentation along intensity contrasts, whereas fine-tuning with supervised labels recovers thin material features.
Inference is efficient: segmentation of a 256×256 slice requires ≈200 ms on an A100 GPU, several orders of magnitude faster than iterative HMRF-EM or evolutionary methods. Supervised MRF-UNet applications in 3D neuroimaging show robust accuracy improvements (up to a 0.13 Dice boost out-of-distribution) with minimal parameter overhead (Brudfors et al., 2021).
7. Advantages, Limitations, and Prospective Extensions
HMRF-UNet achieves fully unsupervised segmentation with end-to-end differentiability, enabling robust spatial regularization and competitive accuracy without ground-truth annotation. The method excels in pre-training, reducing manual labeling to a small number of annotated images while sustaining high performance.
Limitations include persistence of errors on thin, low-contrast structures due to intensity-only feature reliance and binary label constraints, and difficulty in modeling ambiguous border voxels.
Planned extensions encompass semi-supervised integration (combining HMRF with a small supervised Dice loss), contrastive or self-supervised auxiliary losses, true 3D U-Net adaptation, and multi-class label support to better capture complex border or artifact regions.
In summary, HMRF-UNet fuses fuzzy HMRF energy minimization—combining data fidelity and spatial prior regularization—with U-Net's feature learning, supporting end-to-end unsupervised, semi-supervised, or supervised segmentation with favorable computational efficiency and minimal annotation requirements (Grolig et al., 14 Nov 2025, Brudfors et al., 2021).