PW-DICE: Ultrasound & Imitation Advances
- PW-DICE for ultrasound imaging employs a score-based diffusion shortcut to reconstruct high-quality compounded B-mode images from a single plane wave, reducing diffusion steps by roughly 60%.
- PW-DICE for offline imitation learning uses a primal optimal transport framework with f-divergence regularization to achieve robust state-occupancy matching between learner and expert.
- Both methods deliver practical computational gains and theoretical unification while highlighting challenges in hyperparameter tuning and scalability.
PW-DICE encompasses two distinct methodological advances in machine learning: one in inverse problems for ultrasound (US) reconstruction via diffusion models, and another in offline imitation learning, where Primal Wasserstein DICE provides a regularized, optimal transport-based approach for state-occupancy matching. Despite the coincident acronym, the two contributions are unrelated in their application domains and theoretical underpinnings. This entry documents both lines in depth, reflecting their independent developments as established in (Li et al., 2023) and (Yan et al., 2023).
1. PW-DICE for Ultrasound Imaging
PW-DICE—"single plane wave takes a shortcut to plane wave compounding"—is a reconstruction paradigm designed to synthesize high-quality compounded B-mode US images from a single low-quality plane-wave (PW) acquisition using a score-based diffusion model shortcut (Li et al., 2023). The core innovation is leveraging the structural similarity between single-PW and compounded-PW images to bypass initialization from pure Gaussian noise, thereby reducing sampling cost.
Problem Formulation
US image reconstruction is modeled as a linear inverse problem. Let $\mathbf{y}$ be the measured radio-frequency data across transducer elements and time samples, let $\mathbf{x}$ discretize the field of scattering coefficients, and let $A$ model the physical measurement process. Signal formation is $\mathbf{y} = A\mathbf{x} + \mathbf{n}$, with Gaussian noise $\mathbf{n}$. Standard delay-and-sum (DAS) beamforming yields a low-quality image $\mathbf{x}_1$ from a single PW acquisition or a high-quality compounded image $\mathbf{x}_{75}$ from multi-angle (75-PW) acquisitions. The goal is to reconstruct a compounded-quality image given only $\mathbf{x}_1$.
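The linear measurement model can be illustrated with a toy sketch. All sizes and the random operator below are hypothetical stand-ins; the real $A$ encodes plane-wave propagation physics and is far more structured.

```python
import numpy as np

# Toy sketch of the linear measurement model y = A x + n. The random A and
# the problem sizes are hypothetical; the real operator models plane-wave
# propagation and delay-and-sum geometry.
rng = np.random.default_rng(0)
n_pixels, n_meas = 64, 128
x_true = rng.normal(size=n_pixels)                    # scattering coefficients
A = rng.normal(size=(n_meas, n_pixels)) / np.sqrt(n_meas)
y = A @ x_true + 0.01 * rng.normal(size=n_meas)       # signal formation

# Crude adjoint reconstruction (a loose single-angle DAS analogue) ...
x_adj = A.T @ y
# ... versus a least-squares inverse as a stand-in for a better-posed solver.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

err_adj = np.linalg.norm(x_adj - x_true)
err_ls = np.linalg.norm(x_ls - x_true)
```

The gap between `err_adj` and `err_ls` mirrors the gap the diffusion prior is meant to close without needing 75 acquisitions.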
Algorithmic Structure
PW-DICE modifies the standard diffusion generative process as follows:
- Rather than sampling from maximum noise ($\sigma_{\max}$), it forward-diffuses the accessible single-PW image $\mathbf{x}_1$ to an intermediate noise level $\sigma_t < \sigma_{\max}$, forming $\mathbf{x}_t = \mathbf{x}_1 + \sigma_t \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I)$.
- The reverse diffusion trajectory is initialized from $\mathbf{x}_t$ and carried out for the remaining steps, using standard EDM (Elucidated Diffusion Model) sampling, typically with the 2nd-order Heun solver.
- Measurement consistency is enforced at each reverse step by projecting the current iterate onto the data-consistent set $\{\mathbf{x} : A\mathbf{x} = \mathbf{y}\}$.
Empirically, initializing from the noised single-PW image (with an appropriately chosen intermediate noise level $\sigma_t$) reduces diffusion steps from 50 (conventional) to 20, a ~60% reduction in computational workload. The resulting images match or slightly exceed the contrast-to-noise ratio (CNR) and generalized CNR (gCNR) of both conventional diffusion models and the multi-angle (75-PW) DAS reconstruction.
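The shortcut sampling loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trained U-Net denoiser is replaced by a hypothetical stand-in so the loop runs end to end, and the measurement-consistency projection is omitted.

```python
import numpy as np

def denoise(x, sigma):
    # Hypothetical stand-in for the trained EDM denoiser D(x; sigma);
    # it simply shrinks toward zero so the loop is self-contained.
    return x / (1.0 + sigma**2)

def shortcut_sample(x_single_pw, sigma_start, n_steps, sigma_min=0.002,
                    rho=7.0, rng=None):
    """Initialize reverse diffusion from a noised single-PW image instead of
    pure Gaussian noise, then take Heun (2nd-order) steps as in EDM."""
    rng = rng or np.random.default_rng(0)
    # Forward-diffuse the single-PW image to the intermediate noise level.
    x = x_single_pw + sigma_start * rng.standard_normal(x_single_pw.shape)
    # EDM-style rho-spaced noise schedule from sigma_start down to 0.
    i = np.arange(n_steps + 1)
    sigmas = (sigma_start**(1/rho)
              + i / n_steps * (sigma_min**(1/rho) - sigma_start**(1/rho)))**rho
    sigmas[-1] = 0.0
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, s_cur)) / s_cur        # probability-flow ODE slope
        x_euler = x + (s_next - s_cur) * d         # Euler predictor
        if s_next > 0:                             # Heun corrector
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s_cur) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x

out = shortcut_sample(np.zeros((8, 8)), sigma_start=10.0, n_steps=20)
```

With a real denoiser, `sigma_start` plays the role of the intermediate noise level discussed above: the smaller it is, the fewer reverse steps are needed.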
Architecture and Training
The denoiser/score network adopts the EDM architecture: a U-Net conditioned on the noise level, trained with the standard $L_2$ denoising score-matching loss. The model is trained on 300 in-vivo carotid frames (compounded from 75 PWs with uniformly distributed angles) acquired on a Verasonics Vantage 256 system; the test set comprises 100 frames.
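The training objective can be sketched in a few lines. The one-parameter "denoiser" below is a hypothetical stand-in for the noise-conditioned U-Net; only the shape of the loss matches the description above.

```python
import numpy as np

def denoiser(x_noisy, sigma, w):
    # Hypothetical one-parameter denoiser; the real model is a U-Net
    # conditioned on sigma.
    return w * x_noisy / (1.0 + sigma**2)

def dsm_loss(x_clean, sigma, w, rng):
    """Standard L2 denoising score-matching loss: noise the clean image,
    then penalize the squared error of the denoiser's reconstruction."""
    eps = rng.standard_normal(x_clean.shape)
    x_noisy = x_clean + sigma * eps
    return np.mean((denoiser(x_noisy, sigma, w) - x_clean) ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)         # stand-in for compounded target images
loss_trained = dsm_loss(x, 1.0, 1.0, rng)   # near-optimal w for this toy setup
loss_zero = dsm_loss(x, 1.0, 0.0, rng)      # predicting zero everywhere
```

In training, the compounded 75-PW images serve as the clean targets `x_clean`.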
Ablation and Sensitivity
Performance is sensitive to the intermediate noise level $\sigma_t$: a lower $\sigma_t$ permits fewer reverse steps but increases the risk of structural bias inherited from the single-PW image, while a higher $\sigma_t$ yields more robust convergence at greater cost. No principled approach for tuning this hyperparameter is provided; adaptive selection is proposed as future work.
Results
PW-DICE achieves CNR/gCNR comparable or superior to both DAS and conventional EDM at roughly 40% of the computational cost. Variance across runs is reduced owing to the structured single-PW prior. For all tested noise initializations, gCNR surpasses the 75-PW baseline beyond 10–20 steps (Li et al., 2023).
2. PW-DICE for Offline Imitation from Observation
Primal Wasserstein DICE (PW-DICE) is an advancement in distribution correction estimation (DICE) methods for offline imitation learning from observation (LfO), offering a primal optimal transport–based approach to discounted state-occupancy matching between learner and expert, with theoretical unification over prior f-divergence-based approaches (Yan et al., 2023).
State-Occupancy Matching in LfO
Given a discounted MDP with discount factor $\gamma$ and policy $\pi$, the state-occupancy measure is $d^{\pi}(s) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t}\,\Pr(s_t = s \mid \pi)$. In LfO, the goal is to match $d^{\pi}$ (learner) to $d^{E}$ (expert), given a state-only expert dataset $\mathcal{D}_E$ and a non-expert dataset $\mathcal{D}_I$ of state–action–next-state tuples.
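In the tabular case the occupancy measure has a closed form, which makes the quantity being matched concrete. The two-state MDP below is a hypothetical example.

```python
import numpy as np

def state_occupancy(P_pi, mu0, gamma):
    """Discounted state occupancy d^pi = (1-gamma) * mu0^T (I - gamma P_pi)^{-1}
    for a tabular MDP, where P_pi[s, s'] is the state transition matrix induced
    by the policy and mu0 is the initial state distribution."""
    n = P_pi.shape[0]
    return (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P_pi.T, mu0)

# Hypothetical two-state chain: state 0 is "sticky", state 1 less so.
P_pi = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])
d = state_occupancy(P_pi, mu0, gamma=0.95)
```

`d` is a proper probability distribution over states; occupancy matching asks that the learner's `d` equal the expert's.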
Primal Wasserstein Formulation
PW-DICE replaces $f$-divergence objectives with the primal 1-Wasserstein distance

$$W_1(d^{\pi}, d^{E}) = \min_{\Pi \ge 0} \sum_{s, s'} \Pi(s, s')\, c(s, s')$$

subject to the marginal constraints

$$\sum_{s'} \Pi(s, s') = d^{\pi}(s), \qquad \sum_{s} \Pi(s, s') = d^{E}(s'),$$

and Bellman-flow constraints linking $d^{\pi}$ to the state–action occupancy $d^{\pi}(s, a)$.
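The primal LP (marginal constraints only; the Bellman-flow constraints are omitted here) can be solved directly for a small discrete example. The two distributions and ground cost below are hypothetical stand-ins for learner and expert occupancies.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete occupancies and ground cost c(s, s').
d_learner = np.array([0.5, 0.5])   # stand-in for d^pi
d_expert = np.array([1.0, 0.0])    # stand-in for d^E
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])

n, m = C.shape
# Variables: transport plan Pi flattened row-major; equality constraints pin
# its row marginals to d_learner and column marginals to d_expert.
A_eq, b_eq = [], []
for i in range(n):
    row = np.zeros(n * m); row[i*m:(i+1)*m] = 1.0
    A_eq.append(row); b_eq.append(d_learner[i])
for j in range(m):
    col = np.zeros(n * m); col[j::m] = 1.0
    A_eq.append(col); b_eq.append(d_expert[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (n * m))
w1 = res.fun  # optimal transport cost W_1(d_learner, d_expert)
```

Here half the learner's mass must move from state 1 to state 0 at unit cost, so the optimum is 0.5; the number of LP variables grows with the square of the state count, which is exactly why the high-dimensional case needs the dual treatment below.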
Because the primal LP is intractable for high-dimensional state spaces, pessimistic $f$-divergence regularizers (penalizing deviation of the learner's occupancy from the non-expert data distribution) are introduced. A single-level dual, parameterized by learned potential functions, emerges through Lagrangian and Fenchel duality, resulting in a convex, unconstrained objective amenable to SGD.
Theoretical Unification
With suitable costs and limits, PW-DICE's dual strictly generalizes SMODICE and LobsDICE, unifying f-divergence and Wasserstein objectives under one framework. Theorem 2 explicitly shows that for appropriate cost and regularizers, the dual reduces to the standard KL-divergence SMODICE formulation.
Learning the Cost Metric
Rather than fixing the cost, PW-DICE defines and learns the transport cost $c(s, s')$:
- The reward-like term is computed as a log-ratio of smoothed non-expert and expert state densities, estimated with a learned state discriminator.
- The reachability component is the distance between contrastively trained embeddings $\phi(s)$ and $\phi(s')$, learned with an InfoNCE loss over adjacent non-expert transitions.
This flexible, contrastively informed metric allows adaptation to the geometry of the state space.
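The InfoNCE objective for the reachability embedding can be sketched as follows. All names here are hypothetical, and a raw embedding matrix stands in for the learned network; only the loss structure (matching pairs as positives, other batch rows as negatives) follows the description above.

```python
import numpy as np

def info_nce(phi_s, phi_next, temperature=0.1):
    """InfoNCE loss: phi_s[i] and phi_next[i] embed an adjacent transition
    (s_t, s_{t+1}); the other rows of the batch serve as negatives."""
    a = phi_s / np.linalg.norm(phi_s, axis=1, keepdims=True)
    b = phi_next / np.linalg.norm(phi_next, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Cross-entropy with the matching pair (the diagonal) as the positive.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
# Embeddings of truly adjacent states should score much better than
# embeddings of unrelated states.
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
loss_random = info_nce(z, rng.normal(size=(16, 8)))
```

Minimizing this loss pulls embeddings of reachable (adjacent) states together, so embedding distance becomes a learned proxy for reachability in the cost $c(s, s')$.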
Implementation and Optimization
All networks (dual heads, discriminator, embeddings) are MLPs with 256 units per layer; training uses Adam with learning rates of 3e-4 for the potentials and 1e-3 for the policy. Policy extraction reduces to weighted behavior cloning, with weights derived from the learned dual variables. The algorithm alternates between training the cost embedding, the discriminator, and the dual potentials, and finally fits the policy via weighted BC.
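The weighted-BC extraction step can be sketched in a few lines. This is a toy illustration with a tabular softmax policy in place of the MLP, and hypothetical positive weights in place of the quantities derived from the dual solution.

```python
import numpy as np

def weighted_bc_loss(logits, actions, weights):
    """Weighted behavior cloning: minimize -E[w * log pi(a|s)].
    logits[i] holds unnormalized action scores for sample i, actions[i] the
    action taken in the dataset, weights[i] the (positive) importance weight."""
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    chosen = log_probs[np.arange(len(actions)), actions]
    return -np.mean(weights * chosen)

rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 4))           # stand-in policy outputs
actions = rng.integers(0, 4, size=32)       # dataset actions
weights = np.exp(rng.normal(size=32))       # hypothetical dual-derived weights
loss = weighted_bc_loss(logits, actions, weights)
```

With uniform weights this reduces to ordinary behavior cloning; the dual-derived weights upweight transitions that the occupancy-matching solution favors.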
Empirical Evaluation
Empirical benchmarks include:
- Tabular random MDPs: PW-DICE achieves lowest regret and TV distance.
- MuJoCo continuous control (Hopper, HalfCheetah, Walker2d, Ant): achieves or surpasses state-of-the-art normalized return.
- Ablations confirm superiority of the joint R + contrastive cost and robustness to regularization hyperparameters.
3. Comparative Summary Table
| Domain | Objective/Model | Main Contribution |
|---|---|---|
| US Imaging (Li et al., 2023) | Score-based diffusion (EDM) | Reduced steps for PWC-quality via shortcut initialization from single PW |
| Offline LfO (Yan et al., 2023) | Primal OT + f-divergence regularized DICE | Unified, convex dual for flexible cost occupancy matching |
PW-DICE thus refers to two domain-specific advances: one enabling efficient ultrasound compounding by initializing diffusion from a single measured instance, the other generalizing distribution-matching techniques via primal optimal transport and learned state geometry.
4. Significance and Context
For US imaging, PW-DICE reduces the computational cost of generative reconstruction while maintaining anatomical detail, with practical implications for real-time imaging and applications where compounded data are unavailable.
For offline RL and imitation, PW-DICE circumvents the limitations of fixed-metric optimal transport and f-divergence approaches, enabling metric learning and providing theoretical continuity among prior DICE algorithms. Its convex, unconstrained dual formulation and strong empirical results advance the state of the art for imitation from observation, particularly where collecting action-labeled expert data is challenging.
5. Limitations and Open Directions
PW-DICE for US imaging relies on careful selection of the intermediate noise level and step count; no closed-form procedures or adaptive rules are currently provided. For the RL setting, the expressivity and stability of cost-metric learning, as well as scalability to highly complex state/action spaces, remain areas for further study. Both lines suggest follow-up work in hyperparameter adaptation, structured priors (imaging), and advanced cost-function design (RL) (Li et al., 2023; Yan et al., 2023).