
PrefPaint: Human-Preferred Inpainting

Updated 2 July 2025
  • PrefPaint is a framework that aligns image inpainting outputs with human aesthetic and expert-driven preferences.
  • It leverages reward model-based reinforcement learning for general scenarios and direct feedback optimization for sensitive domains like medical imaging.
  • Experimental results indicate PrefPaint outperforms conventional methods, achieving improved subjective alignment and higher WinRate metrics.

PrefPaint encompasses a set of approaches for aligning the outputs of image inpainting diffusion models with human preferences, with demonstrated applications in both general visual domains and expert-driven fields such as medical imaging. The term “PrefPaint” refers to methodologies that systematically incorporate either general crowd-sourced preferences or domain-specific expert feedback into the optimization loop of deep generative inpainting models, superseding conventional metrics such as pixel-wise fidelity. The research underlying PrefPaint advances both algorithmic strategies—such as reward model-based reinforcement learning and direct human preference optimization—and system infrastructure for efficient human-in-the-loop training and deployment.

1. Human Preference Alignment in Image Inpainting

PrefPaint introduces alignment of image inpainting models with human aesthetic and perceptual preferences, contrasting prior approaches that optimize for objective losses or rely solely on paired ground-truth data. Two primary research lines define PrefPaint’s approach:

  • For general visual inpainting, a reinforcement learning (RL) paradigm is adopted, in which diffusion model outputs are steered toward configurations annotated as subjectively preferable by human raters. This strategy employs a dedicated reward model to operationalize subjective standards.
  • In expert domains such as medical imaging, PrefPaint employs direct human feedback—bypassing proxy reward models—to condition the learning process on the comparative or binary judgments of domain professionals.

This methodological shift positions PrefPaint as a framework for integrating direct human value systems into iterative generative model refinement, applicable to ambiguous and high-stakes visual tasks.

2. Core Technical Methodologies

2.1 Reward Model-Based Reinforcement Learning

For aligning image inpainting with general human aesthetics, PrefPaint employs a trust-region inspired RL framework for stable fine-tuning of pretrained diffusion models:

  • Reward Model: A reward model $\mathcal{R}(\cdot)$ is trained on a large, human-annotated dataset. The reward model's architecture incorporates a CLIP (ViT-B) backbone and a regression MLP head.
  • RL Update: Model parameters are optimized with a gradient that incorporates both expected reward and divergence constraints:

$$\nabla_{\boldsymbol{\theta}} \mathcal{J}(\boldsymbol{x}) = - \int_{P_{\boldsymbol{\theta}'}} \frac{\nabla_{\boldsymbol{\theta}} P_{\boldsymbol{\theta}}(\boldsymbol{x})}{P_{\boldsymbol{\theta}'}(\boldsymbol{x})}\, \mathcal{R}(\boldsymbol{x}) + \kappa\, \nabla_{\boldsymbol{\theta}} \mathcal{D}(P_{\boldsymbol{\theta}'} \,\|\, P_{\boldsymbol{\theta}})$$

where $\boldsymbol{\theta}$ and $\boldsymbol{\theta}'$ are the model and reference parameters, $\mathcal{D}$ is a divergence metric (e.g., KL), and $\kappa$ is a tradeoff coefficient.

  • Trustiness Weighting: To address uncertainty in reward model predictions, the update is reweighted per sample using a theoretically derived error bound linked to the sample’s embedding variance:

$$\gamma = \exp\left(-k\, \|z\|_{\mathbf{V}^{-1}} + b\right)$$

with final gradient $\nabla_{\boldsymbol{\theta}} \mathcal{J}'(\boldsymbol{x}) = \gamma\, \nabla_{\boldsymbol{\theta}} \mathcal{J}(\boldsymbol{x})$.
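Below is a minimal PyTorch-style sketch of how such a trustiness-weighted update could be assembled. The helper names, the single-sample KL estimator, and the treatment of the reward and weight as detached constants are illustrative assumptions, not the authors' implementation.

```python
import torch

def trustiness_weight(z: torch.Tensor, V_inv: torch.Tensor, k: float = 1.0, b: float = 0.0) -> torch.Tensor:
    """Confidence weight gamma = exp(-k * ||z||_{V^{-1}} + b) for one reward-model embedding z."""
    # Mahalanobis-style norm ||z||_{V^{-1}} = sqrt(z^T V^{-1} z)
    norm = torch.sqrt(z @ V_inv @ z)
    return torch.exp(-k * norm + b)

def aligned_loss(log_p_theta: torch.Tensor,
                 log_p_ref: torch.Tensor,
                 reward: torch.Tensor,
                 gamma: torch.Tensor,
                 kappa: float = 0.1) -> torch.Tensor:
    """Single-sample surrogate loss whose gradient mirrors the trust-region update above.

    log_p_theta : log P_theta(x) under the trainable diffusion model
    log_p_ref   : log P_theta'(x) under the frozen reference model
    reward      : R(x) from the reward model (treated as a constant)
    gamma       : trustiness weight for this sample (treated as a constant)
    kappa       : divergence tradeoff coefficient
    """
    # Importance ratio P_theta(x) / P_theta'(x); its gradient equals (grad P_theta) / P_theta'
    ratio = torch.exp(log_p_theta - log_p_ref.detach())
    reward_term = -ratio * reward.detach()
    # Single-sample Monte Carlo estimate (up to an additive constant) of KL(P_theta' || P_theta)
    # for a sample x drawn from the reference model P_theta'
    kl_term = kappa * (log_p_ref.detach() - log_p_theta)
    return gamma.detach() * (reward_term + kl_term)
```

Averaging this per-sample loss over a batch of reference-model samples and backpropagating recovers the reward-plus-divergence gradient given above, scaled by the per-sample confidence $\gamma$.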

2.2 Direct Human Feedback with Policy Optimization

In medical imaging and similarly sensitive fields, PrefPaint leverages Direct Preference Optimization (DPO) to incorporate domain-specific human feedback without relying on an explicit reward model:

  • D3PO Loss: For a pair of inpainted images $x^+$ (preferred) and $x^-$ (less preferred), the model is updated by minimizing:

$$L_{\mathrm{DPO}}(\theta) = -\log \left( \frac{\exp(r_\theta(x^+))}{\exp(r_\theta(x^+)) + \exp(r_\theta(x^-))} \right)$$

Here, $r_\theta(x)$ reflects the model's latent output, and updates shift the model toward expert-validated configurations at each diffusion denoising step, sidestepping instability from adversarial or reward-model-based RL.
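This loss is a pairwise softmax over the two implicit rewards, equivalent to a Bradley-Terry negative log-likelihood. A minimal sketch, assuming the implicit rewards $r_\theta(x^+)$ and $r_\theta(x^-)$ have already been computed from the model's latents (how they are derived is not specified here):

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """L = -log( exp(r+) / (exp(r+) + exp(r-)) ), i.e. -log sigmoid(r+ - r-).

    r_preferred, r_rejected: implicit rewards r_theta(x+) and r_theta(x-)
    for the expert-preferred and less-preferred inpaintings.
    """
    # Numerically stable form of the Bradley-Terry negative log-likelihood
    return -F.logsigmoid(r_preferred - r_rejected)
```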

3. Dataset Construction and Annotation Protocols

PrefPaint is grounded in robust dataset design:

  • General Domain Dataset: Images sampled from ADE20K, ImageNet, KITTI, and DIV2K yield 17,000 distinct prompts. Each prompt is inpainted with three completions, producing ~51,000 samples. Masking procedures cover both outpainting (boundary expansion, 15–40%) and complex inpainting (irregular warps).
  • Annotation Criteria: Human evaluators rate completions on structural rationality, local texture, and overall impression. Composite scores weight these three criteria as [0.15, 0.15, 0.7], prioritizing overall impression (a worked example appears below).
  • Medical Imaging Dataset: In the polyp inpainting scenario, expert annotation supersedes crowd-sourcing. Feedback is collected using binary or comparative ratings on candidate inpaintings generated by the model.

This comprehensive annotation strategy underpins the effectiveness and reliability of preference alignment.
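For concreteness, a composite rating under the stated [0.15, 0.15, 0.7] weighting could be computed as follows; the criterion field names are illustrative labels, not the dataset's actual schema.

```python
# Assumed mapping of the three annotation criteria to the stated [0.15, 0.15, 0.7] weights
WEIGHTS = {"structural_rationality": 0.15, "local_texture": 0.15, "overall_impression": 0.70}

def composite_score(ratings: dict) -> float:
    """Weighted composite of the three per-sample annotation scores."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)

# Example: a completion rated 4, 3, and 5 on the three criteria
print(composite_score({"structural_rationality": 4, "local_texture": 3, "overall_impression": 5}))  # 4.55
```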

4. Experimental Evaluation and Comparative Analysis

PrefPaint has been benchmarked against state-of-the-art inpainting methods across standard and expert domains:

  • General Inpainting: PrefPaint outperforms methods including Stable Diffusion (1.5, 2.1, XL, XL-inpainting), CompVis, Kandinsky, MAT, and Palette. Metrics include reward alignment, WinRate (the fraction of outputs preferred over a baseline; see the sketch after this list), T2I Reward, aesthetic and CLIP/BLIP scores, and Inception Score.
    • Achieves WinRate >70% with one sample, compared to 50–60% for strong baselines.
    • Outputs exhibit enhanced structural coherence, fine texture, and improved global integration on challenging semantic holes.
  • Medical Imaging (Polyp Domain): In crowdsourced and expert studies, PrefPaint shows superior performance:
    • The table below summarizes subjective ratings (1 = very bad, 5 = very good):

      | Method     | Sessile | Pedunculated | Landscape | Human |
      |------------|---------|--------------|-----------|-------|
      | SDi        | 2.81    | 2.96         | 2.77      | 3.38  |
      | DreamBooth | 3.24    | 3.18         | 3.15      | 3.61  |
      | SD2i       | 3.33    | 3.25         | 3.38      | 3.57  |
      | PrefPaint  | 4.38    | 4.34         | 4.30      | 4.22  |
    • PrefPaint reduces visual inconsistencies and artifacts in both medical and natural images, notably improving local detail and frame transitions.
  • Ablation and Analysis: Trustiness-aware feedback accelerates convergence and improves final preference scores; regression-based reward learning outperforms classification; theoretical error bounds for confidence weighting are empirically validated.
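WinRate, as used above, is the fraction of prompts on which the model's output is preferred over a baseline's. A minimal sketch, under the assumption that preference is decided by comparing reward-model scores per prompt:

```python
def win_rate(model_scores, baseline_scores):
    """Fraction of prompts on which the model's sample is preferred over the baseline's."""
    assert len(model_scores) == len(baseline_scores) and model_scores
    wins = sum(m > b for m, b in zip(model_scores, baseline_scores))
    return wins / len(model_scores)

# Example: reward-model scores for five prompts
print(win_rate([0.9, 0.4, 0.7, 0.8, 0.6], [0.5, 0.6, 0.3, 0.2, 0.1]))  # 0.8
```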

5. System Implementation and Web-Based Interface

PrefPaint’s workflow is deployed via a modern web-based system, emphasizing usability for both AI practitioners and medical professionals:

  • Interface: Features include image upload, region masking, prompt specification, visualization of ongoing/past projects, and a showcase of previous results. Interactive feedback capture is integrated.
  • System Architecture:
    • Built with Go and the Templ templating library for fast, secure server-side rendering.
    • Backend employs task queueing and GPU (RTX 3090) worker threads for responsive sampling and fine-tuning even under heavy computational load.
    • Visualization tools allow model version tracking as a branching tree, facilitating auditability and reproducibility.

This infrastructure enables frequent, fine-grained, and scalable human-in-the-loop collaboration for model refinement.

6. Implications, Applications, and Prospective Extensions

PrefPaint demonstrates that direct optimization for human preference—either through explicit reward models or by integrating domain-expert feedback—yields inpainting results with measurably higher subjective and practical quality in both high-level aesthetics and application-critical realism. Key implications include:

  • Generalizability: The methodology can be adapted to other generative tasks, including scene extension, 3D/VR content synthesis, restoration, and personalization. The preference modeling framework and interface support iterative refinement aligned with either population-level or bespoke standards.
  • Safety-Critical AI: In domains such as medical imaging, PrefPaint mitigates the risk of model hallucinations or clinically invalid outputs, advancing reliability, transparency, and efficacy in computer-aided diagnosis and education.
  • Technical Contributions: PrefPaint formalizes the use of trust-region RL for diffusion models, introduces reward trustiness-aware weighting, and presents an efficient, user-centric system for interactive model training and evaluation.
  • Future Directions: Ongoing research seeks to expand feedback modalities (e.g., regional annotation, ranking), enhance prompt generation, generalize to broader domains (histology, radiology), and validate performance at larger population scales.

7. Open Resources and Accessibility

The PrefPaint codebase and its large-scale, human-labeled dataset (51,000+ annotated inpaintings) are publicly released for research under a CC BY-NC 4.0 license. The project website (https://prefpaint.github.io) provides access to software, data, and documentation, facilitating broad adoption and extension in the community.

PrefPaint exemplifies a systematic, reproducible, and extensible approach to human preference alignment in image inpainting, with substantial evidence supporting its impact in both visual computing and expert-critical imaging domains.