DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

Published 6 Jul 2021 in cs.CV | (2107.02638v1)

Abstract: Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have been also used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind.

Citations (18)

Summary

The paper introduces DocSynth, a framework that uses layout guidance and adversarial learning to generate realistic synthetic document images from predefined layouts.
It employs a dual adversarial network architecture, integrating a generator with discriminators and a conv-LSTM based spatial reasoning module for layout consistency.
Quantitative results on PubLayNet, including an FID of 33.75 and a Diversity Score of 0.197 at 128×128 resolution, indicate that the generated images are realistic and diverse.

DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

The paper "DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis" introduces a framework for synthesizing document images from predefined layouts. It addresses the challenge of generating document images with complex object layouts by using a deep generative model to produce realistic and diverse synthetic documents.

Introduction

Automatically generating document images from specified layouts is useful for Document Analysis and Recognition. Synthetic documents can augment training datasets for machine learning tasks, which helps in domains where data is scarce or restricted by privacy concerns. Traditional approaches in computer graphics and vision have struggled to generate documents with complex layouts while maintaining visual and logical consistency. Neural rendering, particularly GANs, provides a way to achieve controllable generation of document images from layouts. According to its authors, DocSynth is the first framework to generate synthetic document images with user-defined layout properties.

Figure 1: Illustration of the Task: Given an input document layout with object bounding boxes and categories configured in an image lattice, our model samples the semantic and spatial attributes of every layout object from a normal distribution, and generates multiple plausible document images as required by the user.

Methodology

Problem Formulation

The task is to generate a document image $\tilde{I}$ from a layout $L$, which consists of object categories and bounding boxes, together with a latent code $Z_{obj}$ sampled from a normal distribution. The generator implements the mapping $\tilde{I} = G(L, Z_{obj}; \Theta_{G})$, where $\Theta_{G}$ denotes the trainable parameters, learned so that generated images follow the data distribution while respecting the spatial configuration of the layout objects.
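To make the inputs of $G$ concrete, the following is a minimal sketch of a layout $L$ and of sampling one latent vector per object for $Z_{obj}$. The names `LayoutObject` and `sample_object_latents`, the normalized box format, and the latent dimension of 64 are illustrative assumptions, not details taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutObject:
    """One layout entry (hypothetical structure): a category and a bounding box."""
    category: str                           # e.g. "title", "text", "figure"
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def sample_object_latents(layout: List[LayoutObject], dim: int = 64, seed=None):
    """Draw one latent vector z ~ N(0, I) per layout object (dim is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in layout]


# A three-object layout; the boxes are made up for illustration.
layout = [
    LayoutObject("title", (0.10, 0.05, 0.90, 0.12)),
    LayoutObject("text", (0.10, 0.15, 0.90, 0.60)),
    LayoutObject("figure", (0.20, 0.65, 0.80, 0.95)),
]

# Two different draws of Z_obj for the same layout L.
z_a = sample_object_latents(layout, seed=0)
z_b = sample_object_latents(layout, seed=1)
```

Keeping $L$ fixed and resampling $Z_{obj}$ is what would let a generator of this form produce several different images for one layout.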

Model Architecture

The DocSynth architecture comprises a generator $G$ trained adversarially against two discriminators, $D_{img}$ and $D_{obj}$. The generator is built from a conditioned image generator $H$, a global layout encoder $C$, and an image decoder $K$, and it combines object and layout encodings to generate realistic document images.
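One common way to combine an object encoding with its position, used here purely as an illustration, is to spread the object's embedding over the region of the image lattice covered by its bounding box. The sketch below assumes normalized boxes and a square lattice; `box_to_mask` and `object_feature_map` are hypothetical helpers, and this summary does not confirm that DocSynth builds its feature maps this way.

```python
def box_to_mask(box, size):
    """Binary size x size mask that is 1 inside the normalized box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    c0 = int(x0 * size)
    c1 = max(int(x1 * size), c0 + 1)  # keep at least one column
    r0 = int(y0 * size)
    r1 = max(int(y1 * size), r0 + 1)  # keep at least one row
    return [
        [1 if (r0 <= r < r1 and c0 <= c < c1) else 0 for c in range(size)]
        for r in range(size)
    ]


def object_feature_map(box, embedding, size):
    """Spread an embedding over its box; the result is indexed [channel][row][col]."""
    mask = box_to_mask(box, size)
    return [[[value * m for m in row] for row in mask] for value in embedding]


# A 2-channel embedding placed in the lower-middle region of an 8 x 8 lattice.
fmap = object_feature_map((0.25, 0.5, 0.75, 1.0), [0.5, -1.0], size=8)
```

Each object then yields one feature map of the same spatial size, which is a convenient input format for a module that aggregates objects one by one.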

Figure 2: Overview of our DocSynth Framework: The model has been trained adversarially against a pair of discriminators and a set of learning objectives as depicted.
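For reference, the generic adversarial objective that such a generator-discriminator setup builds on can be written as follows. This is the standard GAN formulation, not the exact DocSynth loss: the summary does not state the specific loss terms or their weights, and the symbols $p_{data}$ and $p_z$ are introduced here for illustration.

$$
\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
$$

With two discriminators, a term of this kind would typically be applied once to whole images (for $D_{img}$) and once to individual object regions (for $D_{obj}$), alongside the additional learning objectives mentioned in the caption.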

Spatial Reasoning Module

A convolutional LSTM (conv-LSTM) network performs spatial reasoning over the layout. It aggregates the object feature maps $F_{i}$ into a hidden layout feature map $h$ that preserves both the local and global spatial features needed for document synthesis.
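As a reminder of how a conv-LSTM updates its state, the standard recurrence (shown without the optional peephole connections) is given below. Here $F_t$ is the feature map of the $t$-th object fed to the network (the source's $F_{i}$, re-indexed to avoid clashing with the input gate $i_t$), $*$ denotes convolution, $\odot$ the element-wise product, and $\sigma$ the logistic sigmoid. The weight names and the order in which objects are fed are assumptions for illustration.

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * F_t + W_{hi} * h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * F_t + W_{hf} * h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * F_t + W_{ho} * h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} * F_t + W_{hc} * h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$

After the last object has been processed, the final hidden state plays the role of the layout feature map $h$. Because every gate is computed by convolution, the state keeps its spatial dimensions throughout, which is what allows positional information to survive the aggregation.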

Experimental Validation

Qualitative Results

The DocSynth model generates diverse and realistic document images, as shown by a t-SNE visualization and by examples of synthesized documents. It maintains layout consistency while varying object appearance.

Figure 3: t-SNE visualization of the generated synthetic document images.

Figure 4: Examples of diverse synthesized documents generated from the same layout: Given an input document layout with object bounding boxes and categories, our model samples 3 images sharing the same layout structure, but different in style and appearance.

Figure 5: Examples of synthesized document images by adding or removing bounding boxes based on previous layout: There are 2 groups of images (a)-(c) and (d)-(f) in the order of adding or removing objects.

Quantitative Results

DocSynth is evaluated on the PubLayNet dataset using the Fréchet Inception Distance (FID) and a Diversity Score. For 128×128 images, it reports an FID of 33.75 and a Diversity Score of 0.197. A lower FID indicates that generated images are statistically closer to real ones, while a higher Diversity Score indicates more variation among the generated images.
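FID compares Gaussian fits to the feature statistics of real and generated images: $\|\mu_r - \mu_g\|^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)$, with features taken from an Inception network. The sketch below computes this distance under a simplifying assumption of diagonal covariances, where the trace term reduces to a per-dimension sum. It illustrates the formula only and is not the evaluation code behind the reported numbers; `frechet_distance_diag` is a hypothetical helper.

```python
import math


def frechet_distance_diag(mu_r, var_r, mu_g, var_g):
    """Frechet distance between two Gaussians with diagonal covariances.

    mu_* are per-dimension means and var_* are per-dimension variances.
    Real FID uses full covariance matrices of Inception features instead.
    """
    # Squared Euclidean distance between the two mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # Diagonal case of Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    cov_term = sum(
        vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g)
    )
    return mean_term + cov_term


# Identical statistics give a distance of zero; shifting the mean increases it.
same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

The distance is zero only when both the means and the variances match, which is why a lower value is read as generated images being closer to real ones.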

Conclusion

DocSynth contributes a framework for generating diverse, layout-guided synthetic documents. By modeling interactions between layout objects while preserving document structure, it opens the way to further research on high-resolution synthesis and on downstream applications such as document classification and layout analysis. Potential future work includes increasing the output resolution and exploring broader uses in document analytics and data augmentation.
