Combining Reward Fine-Tuning and Inference-Time Guidance
Determine a principled methodology for combining reward fine-tuning with inference-time reward alignment (guidance) in flow matching and diffusion generative models, so that the resulting approach inherits the advantages of both families: the flexibility to handle arbitrary rewards without retraining, and accurate, efficient sampling.
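To make the inference-time-guidance side of the problem concrete, here is a minimal toy sketch (not the paper's method): a 1D Langevin-style sampler whose drift adds a reward-gradient term to a base score. All names here (`base_score`, `reward_grad`, `guided_sample`, the quadratic reward, the `guidance` weight) are illustrative assumptions; a learned diffusion or flow model would replace the hand-coded score.

```python
import numpy as np

def base_score(x):
    # Score of a standard Gaussian prior; a toy stand-in for a
    # learned diffusion/flow model's score network.
    return -x

def reward_grad(x, target=2.0):
    # Gradient of a toy quadratic reward r(x) = -(x - target)^2.
    # In practice the reward gradient would come from a reward model.
    return -2.0 * (x - target)

def guided_sample(n_steps=500, n_samples=2000, dt=0.01,
                  guidance=1.0, seed=0):
    # Langevin dynamics whose drift is the base score plus a
    # reward-gradient term scaled by a guidance weight. With
    # guidance=0 this samples the base prior; with guidance>0 the
    # samples are tilted toward high-reward regions at inference
    # time, with no retraining of the base model.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        drift = base_score(x) + guidance * reward_grad(x)
        x = x + dt * drift + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)
    return x
```

For this toy setup the guided stationary density is proportional to exp(-x²/2 + r(x)), a Gaussian with mean 4/3, so the sample mean shifts from roughly 0 (unguided) toward about 1.33 (guided). The trade-off the open question targets is visible even here: the guidance term is flexible across rewards but is applied at every sampling step, whereas fine-tuning would bake the reward into the model once.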
References
It remains an open question how to combine the merits of both approaches.
— Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps
(2602.05993 - Holderrieth et al., 5 Feb 2026) in Introduction (Section 1)