- The paper introduces a two-stage, diffusion-based architecture that decouples defocus deblurring (DeblurNet) from controllable bokeh synthesis (BokehNet).
- Semi-supervised training on synthetic paired data and real unpaired data yields superior benchmark performance, with enhanced detail recovery and realistic blur gradients.
- The method offers flexible user control over focus plane, bokeh strength, and aperture shape, paving the way for advanced post-capture depth-of-field editing.
Generative Refocusing: Flexible Defocus Control from a Single Image
Introduction
"Generative Refocusing: Flexible Defocus Control from a Single Image" (2512.16923) presents a two-stage, diffusion-based paradigm for single-image refocusing, overcoming long-standing limitations in control and realism in post-capture depth-of-field (DoF) editing. The framework—comprising DeblurNet and BokehNet modules—enables flexible input handling (arbitrary focus states), comprehensive user control (focus plane, bokeh strength, and aperture shape), and elevated synthesis realism via semi-supervised training on synthetic and real unpaired data. The approach is validated on strong public and new benchmarks, outperforming state-of-the-art methods across deblurring, bokeh synthesis, and refocusing tasks.
Methodology
The system decomposes refocusing into two orthogonal subtasks: defocus deblurring and controllable bokeh synthesis. This decoupling allows specialized models for each operation, modularizing the learning and enabling precise user control.
Figure 1: The two-stage pipeline first recovers an all-in-focus image with DeblurNet and then applies parameterized, controllable bokeh using BokehNet.
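To make the decomposition concrete, the sketch below composes the two stages at inference time. The module names, call signatures, and the simple depth-difference defocus map are illustrative assumptions, not the authors' released interface.

```python
import torch

def refocus(image: torch.Tensor, deblur_net, depth_net, bokeh_net,
            focus_depth: float, bokeh_strength: float,
            aperture_kernel: torch.Tensor | None = None) -> torch.Tensor:
    """Hypothetical composition of the two-stage pipeline (names are assumed)."""
    # Stage 1: recover an all-in-focus estimate from the (possibly defocused) input.
    all_in_focus = deblur_net(image)

    # Build a defocus map from monocular depth and the user-chosen focus plane
    # (a deliberately simplified proxy; see the later defocus-map sketch).
    depth = depth_net(all_in_focus)
    defocus_map = bokeh_strength * (depth - focus_depth).abs()

    # Stage 2: re-render depth-of-field with user-controlled bokeh.
    return bokeh_net(all_in_focus, defocus_map, aperture_kernel)
```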
DeblurNet: Defocus Deblurring
DeblurNet targets spatially varying defocus blur. It is conditioned on the potentially defocused input I_in and, optionally, a pre-deblurred estimate I_pd from a classical restoration method. The dual conditioning is positional: I_in and I_pd are encoded on distinct spatial grids, and random dropout of I_pd regularizes the model against artifacts in the pre-deblurred estimate. The diffusion-based prior enables reconstruction of high-frequency detail that deterministic methods tend to collapse.
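A minimal sketch of the conditioning-dropout idea, assuming a simple channel-concatenation interface; the paper instead encodes I_in and I_pd on distinct spatial grids, so this stands in only for the regularization behavior.

```python
import torch

def make_deblur_condition(i_in: torch.Tensor, i_pd: torch.Tensor,
                          p_drop: float = 0.3, training: bool = True) -> torch.Tensor:
    """Concatenate the defocused input with a classical pre-deblurred estimate,
    randomly zeroing the latter during training so the diffusion model does not
    over-rely on its artifacts. p_drop is an assumed value, not from the paper."""
    if training and torch.rand(()).item() < p_drop:
        i_pd = torch.zeros_like(i_pd)
    return torch.cat([i_in, i_pd], dim=1)  # (B, 2C, H, W) conditioning tensor
```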
BokehNet: Controllable Bokeh Synthesis
BokehNet accepts the all-in-focus reconstruction together with a (potentially user-edited) defocus map, parameterized by estimated or user-provided depth and a focus plane, a bokeh level (strength), and an optional aperture shape kernel. This supports control over aperture size, aperture shape, and arbitrary focus position. Semi-supervised learning leverages both synthetic paired data and real unpaired bokeh photographs, letting the model learn blur behavior beyond what simulators provide.
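One common way to parameterize such a defocus map is a disparity-based circle-of-confusion proxy, sketched below under assumed conventions; the paper's exact parameterization may differ.

```python
import torch

def defocus_map(depth: torch.Tensor, focus_depth: float,
                bokeh_level: float) -> torch.Tensor:
    """Defocus is zero at the focus plane and grows with disparity from it,
    scaled by the user's bokeh level; a thin-lens-style approximation assumed
    here for illustration."""
    disparity = 1.0 / depth.clamp(min=1e-6)
    focus_disparity = 1.0 / max(focus_depth, 1e-6)
    return bokeh_level * (disparity - focus_disparity).abs()
```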
Experimental Results
Defocus Deblurring
On the DPDD and RealDOF benchmarks, DeblurNet achieves the strongest scores on perceptual metrics (LPIPS, FID, CLIP-IQA), outperforming transformer-based and implicit-representation models. Visual comparisons show superior geometric consistency and fidelity, particularly in text restoration and challenging high-variance blur regions.
Figure 3: Qualitative results on deblurring benchmarks show finer detail and geometry consistency compared to top-performing baselines.
Bokeh Synthesis
On the new LF-Bokeh benchmark, BokehNet surpasses both physics-based and neural bokeh renderers (LPIPS 0.1047 vs. 0.1228–0.1799). The model better preserves blur gradients and occlusion boundaries and adheres more closely to real lens behavior, especially when trained with unpaired real data.
Figure 4: Zoomed-in comparisons against multiple baselines highlight more realistic blur placement and intensity scaling.
Refocusing
For the complete refocusing pipeline, GenRefocus outperforms all pairwise combinations of top all-in-focus estimators (e.g., DRBNet, Restormer) with neural or classical bokeh synthesis modules, reflecting the advantage of joint training with real data and modular design.
Ablation Studies
A significant performance gap separates the two-stage design from a one-stage (direct mapping) alternative, with the two-stage design notably stronger due to improved depth control and subtask-specific semi-supervised learning. Incorporating real, unpaired bokeh data substantially boosts perceptual and fidelity metrics compared to purely simulated supervision.
Controllable Aperture and Text-Guided Applications
Shape-aware bokeh synthesis is supported via explicit conditioning on a shape kernel, and the model is fine-tuned on training sets built around point-light sources to maximize responsiveness to that kernel. The system also demonstrates text-guided restoration in DeblurNet, where prompts can correct hallucinated or ambiguous text in reconstructions.
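As an illustration of what the shape-conditioning signal could look like, the following sketch rasterizes a regular-polygon aperture mask; the kernel resolution and the conditioning interface are assumptions, not details from the paper.

```python
import numpy as np

def polygon_aperture_kernel(size: int = 63, sides: int = 5,
                            rotation: float = 0.0) -> np.ndarray:
    """Binary mask of a regular polygonal aperture (e.g., a 5-bladed iris)
    usable as a shape kernel. Resolution and scaling are assumed."""
    ys, xs = np.mgrid[:size, :size]
    x = (xs - size // 2) / (size / 2)
    y = (ys - size // 2) / (size / 2)
    theta = np.arctan2(y, x) - rotation
    r = np.hypot(x, y)
    # Distance from center to the polygon boundary as a function of angle
    # (circumradius 1, apothem cos(pi/sides)).
    boundary = np.cos(np.pi / sides) / np.cos((theta % (2 * np.pi / sides)) - np.pi / sides)
    return (r <= 0.9 * boundary).astype(np.float32)
```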
Figure 5: Example images demonstrating user-specified aperture shape control during bokeh synthesis (triangle, heart, star).
Figure 6: Results on text-guided deblurring, where text prompts at inference rectify text that would otherwise be misreconstructed due to blur.
Implications and Future Directions
GenRefocus introduces a scalable, modular approach to post-capture focus and bokeh control, bridging artistic and technical requirements in computational photography and rendering. By decoupling restoration from rendering, the architecture accommodates flexible data sources, subtask-specific regularization, and comprehensive control signals. The use of real unpaired bokeh images (with EXIF metadata) marks a crucial advance in capturing physical camera effects previously unattainable with simulator-only pipelines.
Practical implications include:
- Enhanced post-capture editing for consumers and professionals, with fine-grained DoF and bokeh styling.
- Data-driven understanding of real camera optics for neural rendering domains.
- Prompt synergy with vision-LLMs for informed content disambiguation and editing.
Theoretical implications extend to the integration of modular, semi-supervised pipelines for underdetermined physical phenomena (e.g., vision-conditioned rendering).
A limitation is the reliance on monocular depth estimation, which may degrade under severe blur; generalization to complex, user-drawn aperture shapes also requires further curated simulation. Future directions include making depth estimation more robust and expanding the vocabulary of controllable optical effects within diffusion-based refocusing.
Conclusion
Generative Refocusing (2512.16923) establishes a new paradigm for flexible, high-fidelity DoF and bokeh manipulation from single images. The two-stage architecture, underpinned by a semi-supervised strategy and explicit control over optical parameters, achieves superior quantitative and qualitative performance, offering a foundation for physically-plausible post-capture image editing and future vision–language-guided pipelines.