Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model (2312.13631v2)
Abstract: Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, the scarcity of oracle character images remains a significant challenge. To overcome this issue, we propose Diff-Oracle, a novel approach based on diffusion models that generates a diverse range of controllable oracle characters. Unlike traditional diffusion models that operate primarily on text prompts, Diff-Oracle incorporates a style encoder that uses style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images, converting style details into a text-embedding format via a pretrained language-vision model. In addition, a content encoder is integrated into Diff-Oracle to capture specific content details from content reference images, ensuring that the generated characters accurately represent the intended glyphs. To train Diff-Oracle effectively, we pre-generate pixel-level paired oracle character images (i.e., style and content images) using an image-to-image translation model. Extensive qualitative and quantitative experiments are conducted on the Oracle-241 and OBC306 datasets. Diff-Oracle not only significantly surpasses existing generative methods in image generation but also substantially benefits downstream oracle character recognition, outperforming all existing state-of-the-art (SOTA) methods by a large margin. In particular, on the challenging OBC306 dataset, Diff-Oracle yields an accuracy gain of 7.70% in the zero-shot setting and recognizes unseen oracle character images with an accuracy of 84.62%, setting a new benchmark for deciphering oracle bone scripts.
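The abstract describes a diffusion model whose denoiser is conditioned on a style prompt (extracted from a style reference image) and on content details (from a content reference image), trained on pre-generated style/content pairs. It does not specify the noise schedule, the training loss, or how the two conditions are injected, so the following is only a minimal sketch under stated assumptions: it uses the standard DDPM forward process and noise-prediction (epsilon) loss with a linear beta schedule, represents images as flat lists of floats, and hides the style and content conditioning behind a hypothetical `denoiser` callable. The names `training_loss` and `zero_denoiser` are illustrative placeholders, not the paper's code.

```python
import math
import random
from typing import Callable, List, Sequence

# Assumption: images are flattened lists of floats (e.g. pixels scaled to [-1, 1]).
Vector = List[float]

# Hypothetical denoiser signature: (x_t, t, style_embedding, content_image) -> predicted noise.
Denoiser = Callable[[Vector, int, Vector, Vector], Vector]


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule (the common DDPM default; assumed, not taken from the paper)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, eps: Sequence[float],
             alpha_bar: Sequence[float]) -> Vector:
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = math.sqrt(alpha_bar[t])
    s = math.sqrt(1.0 - alpha_bar[t])
    return [a * xi + s * ei for xi, ei in zip(x0, eps)]


def zero_denoiser(x_t: Vector, t: int, style_emb: Vector, content_img: Vector) -> Vector:
    """Placeholder for a conditional network; ignores its inputs and predicts zero noise."""
    return [0.0] * len(x_t)


def training_loss(x0: Vector, style_emb: Vector, content_img: Vector,
                  alpha_bar: Sequence[float], denoiser: Denoiser,
                  rng: random.Random) -> float:
    """One epsilon-prediction step: sample t and noise, corrupt x0, return the MSE."""
    t = rng.randrange(len(alpha_bar))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = q_sample(x0, t, eps, alpha_bar)
    # Both conditions (style prompt and content reference) are passed to the denoiser.
    eps_hat = denoiser(x_t, t, style_emb, content_img)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


if __name__ == "__main__":
    rng = random.Random(0)
    ab = cumulative_alphas(linear_beta_schedule())
    x0 = [0.5, -0.5, 1.0, 0.0]        # toy 2x2 target image, flattened
    style = [0.1, 0.2]                # hypothetical style embedding
    content = [1.0, 0.0, 0.0, 1.0]    # hypothetical content reference image
    print(training_loss(x0, style, content, ab, zero_denoiser, rng))
```

In a real model the denoiser would be a trained network that consumes the style embedding and content features; keeping it as a callable separates the diffusion training objective from that architecture, which the abstract leaves unspecified.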
Authors:
- Jing Li
- Qiu-Feng Wang
- Kaizhu Huang
- Rui Zhang
- Siyuan Wang
- Erik Cambria