Graphic Design with Large Multimodal Model (2404.14368v1)

Published 22 Apr 2024 in cs.CV, cs.AI, and cs.CL

Abstract: In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to layout sequential design elements. It has been constrained by the necessity for a predefined correct sequence of layers, thus limiting creative potential and increasing user workload. In this paper, we present Hierarchical Layout Generation (HLG) as a more flexible and pragmatic setup, which creates graphic composition from unordered sets of design elements. To tackle the HLG task, we introduce Graphist, the first layout generation model based on large multimodal models. Graphist efficiently reframes the HLG as a sequence generation problem, utilizing RGB-A images as input, outputs a JSON draft protocol, indicating the coordinates, size, and order of each element. We develop new evaluation metrics for HLG. Graphist outperforms prior arts and establishes a strong baseline for this field. Project homepage: https://github.com/graphic-design-ai/graphist


Enhancing Graphic Design Automation with Graphist: A Large Multimodal Model Approach

Introduction

The paper introduces the Hierarchical Layout Generation (HLG) task, in which a model must determine both the layer order and the spatial arrangement of multimodal input elements without a predetermined sequence. HLG extends earlier Graphic Layout Generation (GLG) methods, which require elements to arrive in the correct layer order. Accepting unordered input leaves more room for creative composition and reduces user workload.

Overview of Graphist

Graphist is a layout generation model built on large multimodal models (LMMs) and designed for the HLG task; the authors describe it as the first layout generation model of this kind. It frames HLG as a sequence generation problem: the input elements are supplied as RGB-A images, and the output is a JSON draft protocol that gives the coordinates, size, and layer order of each element. Its components include an RGB-A Encoder, a Visual Shrinker, and a base LMM derived from established models such as Qwen1.5-0.5B/7B.
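To make the idea of a JSON draft protocol concrete, here is a minimal sketch of how such an output could be parsed and turned into a drawing order. The schema is an assumption made for illustration only: the field names (`canvas`, `elements`, `index`, `x`, `y`, `width`, `height`, `layer`) and the helper `parse_draft` are hypothetical and are not taken from the paper or the Graphist repository.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    """One entry of a hypothetical draft protocol (field names are assumed)."""
    index: int    # which input RGB-A image this entry refers to
    x: int        # left edge on the canvas, in pixels
    y: int        # top edge on the canvas, in pixels
    width: int
    height: int
    layer: int    # z-order: lower values are drawn first (further back)


def parse_draft(draft_json: str) -> list[Element]:
    """Parse a draft and return its elements sorted back-to-front."""
    raw = json.loads(draft_json)
    elements = [Element(**item) for item in raw["elements"]]
    return sorted(elements, key=lambda e: e.layer)


# An illustrative draft: the elements are listed in arbitrary (unordered) input
# order, and the "layer" field carries the order the model would have predicted.
example_draft = json.dumps({
    "canvas": {"width": 800, "height": 600},
    "elements": [
        {"index": 0, "x": 40, "y": 30, "width": 300, "height": 80, "layer": 2},
        {"index": 1, "x": 0, "y": 0, "width": 800, "height": 600, "layer": 0},
        {"index": 2, "x": 500, "y": 400, "width": 200, "height": 150, "layer": 1},
    ],
})

# Indices of the input images in the order a renderer would paint them.
drawing_order = [e.index for e in parse_draft(example_draft)]
```

In this sketch a renderer would paint element 1 (the full-canvas background) first, then element 2, then element 0 on top. Emitting position, size, and order together in one structured string is what allows layout generation to be treated as a single sequence generation problem.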

Key Contributions

Introduction of HLG Task: The hierarchical approach in layout generation bypasses the need for pre-determined layer ordering, broadening the scope for AI in practical graphic design applications.
Development of Graphist: As an end-to-end trainable model leveraging LMMs, Graphist efficiently processes multimodal inputs (images and text) and outputs detailed layout specifications in JSON format.
New Evaluation Metrics: The introduction of Inverse Order Pair Ratio (IOPR) and GPT-4V Eval helps in quantitatively assessing the performance of the layout generation models, with Graphist showing leading performance across these metrics.
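As a rough illustration of an order-based metric such as IOPR, the sketch below computes the fraction of element pairs whose relative layer order in a prediction is reversed with respect to a reference order. This is an assumed reading of the metric's name, not the paper's exact definition, which may differ in normalization or in which pairs it counts (for example, only overlapping elements). The function name `inverse_order_pair_ratio` is hypothetical.

```python
from itertools import combinations


def inverse_order_pair_ratio(predicted: list[int], reference: list[int]) -> float:
    """Fraction of element pairs ordered differently in `predicted` and `reference`.

    Both arguments list the same element ids from the bottom layer to the top
    layer. 0.0 means the orders agree on every pair; 1.0 means the prediction
    is the exact reverse of the reference. (Assumed definition, for illustration.)
    """
    if sorted(predicted) != sorted(reference):
        raise ValueError("predicted and reference must contain the same element ids")

    # Position of each element in the predicted stacking order.
    rank = {elem: pos for pos, elem in enumerate(predicted)}

    # combinations() yields (a, b) with a below b in the reference order.
    pairs = list(combinations(reference, 2))
    if not pairs:
        return 0.0  # fewer than two elements: no pair can be inverted

    inverted = sum(1 for a, b in pairs if rank[a] > rank[b])
    return inverted / len(pairs)


# One of the three pairs, (1, 2), is swapped in the prediction.
ratio = inverse_order_pair_ratio(predicted=[0, 2, 1], reference=[0, 1, 2])
```

Under this reading, lower values are better, since fewer pairs of layers are stacked in the wrong relative order.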

Experimental Results

Graphist outperforms existing methods, including Flex-DM and GPT-4V, on both HLG and GLG tasks. Its advantage is clearest in maintaining correct layer order and in the aesthetic quality of the resulting compositions, as reflected in its scores on the new evaluation metrics. The model also adapts to different graphic design tasks, as shown by experiments on datasets such as Crello and CGL-V2.

Future Implications and Developments

The results suggest room for further work on integrating AI into graphic design, from refining the model architecture to improving task-specific performance. Continued advances in LMMs could strengthen Graphist’s capabilities, which may help democratize high-quality design creation and broaden practical applications in industry.

Conclusion

The paper advances automated graphic design. Graphist, built on LMMs, addresses a central challenge in layout generation: it must decide the layer order of elements as well as their placement, and it does so with better performance and more flexibility than existing models. Together with its new evaluation metrics, the HLG task establishes a baseline for future work on AI-assisted design.