SDL: New data generation tools for full-level annotated document layout (2106.15117v1)
Abstract: We present a novel data generation tool for document processing. The tool focuses on providing a maximal level of visual information in a normal type document, ranging from character position to paragraph-level position. It also enables working with a large dataset on low-resource languages as well as providing a mean of processing thorough full-level information of the documented text. The data generation tools come with a dataset of 320000 Vietnamese synthetic document images and an instruction to generate a dataset of similar size in other languages. The repository can be found at: https://github.com/tson1997/SDL-Document-Image-Generation
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.