- The paper presents a unified framework that integrates multi-task layout generation with human-mimicking evaluation, aligning the two through Dynamic-Margin Preference Optimization (DMPO).
- It introduces a multimodal instruction-based generator and builds a large-scale human feedback dataset, Layout-HF100k, for robust assessments.
- Experimental results and ablation studies demonstrate that the approach outperforms existing methods, setting a new performance benchmark.
Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation
Introduction to Layout Generation Challenges
Layout generation is pivotal in enhancing both user experience and design efficiency. Existing approaches often focus on specific task categories, which limits their applicability, and their evaluation techniques may not align with human perception. The paper "Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation" addresses these challenges by proposing a unified framework: Uni-Layout.
Uni-Layout encompasses three core components:
- Unified Generation: Incorporates various layout tasks into a single taxonomy and utilizes natural language prompts for universal generation.
- Human-Mimicking Evaluation: Builds a large-scale human feedback dataset, Layout-HF100k, to support evaluation aligned with human perception.
- Alignment Mechanism: Adopts Dynamic-Margin Preference Optimization (DMPO) to bridge the gap between generation outputs and human preferences.
Figure 1: Taxonomy of layout generation tasks and illustration of motivation. Diverse layout generation tasks can be divided into four categories: (a) BFEF, (b) BCEF, (c) BFEC, and (d) BCEC.
Framework Architecture
Unified Generation
Uni-Layout uses a multimodal instruction-based approach for generating layouts across different tasks. The layout generator handles all four task categories: Background-Free and Element-Free (BFEF), Background-Constrained and Element-Free (BCEF), Background-Free and Element-Constrained (BFEC), and Background-Constrained and Element-Constrained (BCEC). A scalable instruction function encodes the task-specific constraints, and multimodal LLMs (MLLMs) generate the layout from the resulting instruction.
Human-Mimicking Evaluation
Utilizing Layout-HF100k, the framework introduces a dual-branch learning strategy integrating visual content and geometrical features for human-like assessment. A Chain-of-Thought mechanism aids qualitative evaluation, while a classification module provides quantitative assessments.
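To make the dual-branch idea concrete, here is a toy sketch that concatenates a visual feature vector with a few geometric features computed from bounding boxes and feeds the result to a logistic classifier. Everything here is an assumption for illustration: the choice of geometric features, the `(x, y, w, h)` box format, the linear classifier, and all function names. The paper's evaluator is a learned model, and the Chain-of-Thought branch is not represented.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed normalized to [0, 1]


def _intersection(a: Box, b: Box) -> float:
    """Area of overlap between two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    return max(iw, 0.0) * max(ih, 0.0)


def geometric_features(boxes: Sequence[Box]) -> List[float]:
    """Toy geometric branch: element count, total area, total pairwise overlap."""
    total_area = sum(w * h for _, _, w, h in boxes)
    overlap = 0.0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            overlap += _intersection(boxes[i], boxes[j])
    return [float(len(boxes)), total_area, overlap]


def qualified_probability(
    visual_features: Sequence[float],
    boxes: Sequence[Box],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Fuse both branches and score with a logistic classifier (hypothetical)."""
    fused = list(visual_features) + geometric_features(boxes)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = bias + sum(w * f for w, f in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))
```

With a negative weight on the overlap feature, a layout whose elements collide scores lower than one whose elements are separated, which mirrors the qualified versus unqualified distinction in the dataset.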
Figure 2: Layout-HF100k examples. The top row shows qualified examples, while the bottom row shows unqualified ones.
Alignment Strategy
The alignment between generated layouts and evaluation outcomes is optimized using DMPO. This method dynamically adjusts the margin based on preference strength, which improves consistency with human judgments. The layout generator is fine-tuned with DMPO so that its outputs better match human-annotated preferences.
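The summary does not give the DMPO objective, so the following is only a minimal sketch of one way a dynamic margin could enter a DPO-style pairwise loss. It assumes the margin is proportional to the gap between evaluator scores of the preferred and rejected layouts; the function name, the `margin_scale` parameter, and this particular margin formula are assumptions, not the paper's definition.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dynamic_margin_loss(
    policy_logp_w: float,   # log-prob of the preferred layout under the policy
    policy_logp_l: float,   # log-prob of the rejected layout under the policy
    ref_logp_w: float,      # same, under the frozen reference model
    ref_logp_l: float,
    score_w: float,         # evaluator score of the preferred layout
    score_l: float,         # evaluator score of the rejected layout
    beta: float = 0.1,
    margin_scale: float = 1.0,
) -> float:
    """DPO-style pairwise loss with a preference-dependent margin (assumed form)."""
    # Implicit reward gap, as in standard DPO.
    reward_gap = beta * (
        (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    )
    # Assumption: a stronger preference (larger score gap) demands a larger margin.
    margin = margin_scale * max(score_w - score_l, 0.0)
    return -log_sigmoid(reward_gap - margin)
```

When the two scores are equal the margin vanishes and the expression reduces to the standard DPO loss; as the score gap grows, the policy has to separate the pair by more before the loss becomes small.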
Figure 3: Overview of the Uni-Layout framework.
Experimental Analysis
Uni-Layout was compared against state-of-the-art specialized and general-purpose methods. It is reported to outperform both groups in task-specific evaluations as well as in human-mimicking evaluations.
Figure 4: Layout Reward and Human Pass Rate across different methods.
Ablation Studies
A series of ablation studies confirmed the contribution of each framework component and emphasized the importance of DMPO in aligning generated layouts with human preferences. Visual comparisons of layouts before and after DMPO alignment show noticeable improvements in coherence and structure.
Figure 5: Comparison of effects before and after alignment.
Conclusion
Uni-Layout sets a new benchmark in the integration of human feedback within layout generation and evaluation. By addressing the limits of task-specific solutions and misaligned evaluation metrics, Uni-Layout paves the way for future research in unified frameworks that leverage human-centered design principles. Future work will explore extensions to three-dimensional layout generation, further bridging technological capabilities with complex application domains.