Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model (2312.13631v2)

Published 21 Dec 2023 in cs.CV

Abstract: Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, the task remains highly challenging because oracle character images are scarce. To address this issue, we propose Diff-Oracle, a novel diffusion-based approach that generates a diverse range of controllable oracle characters. Conventional diffusion models are conditioned primarily on text prompts. Diff-Oracle instead incorporates a style encoder that uses style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images and converts style details into a text-embedding format via a pretrained vision-language model. In addition, Diff-Oracle integrates a content encoder that captures specific content details from content reference images, ensuring that the generated characters accurately represent the intended glyphs. To train Diff-Oracle effectively, we pre-generate pixel-level paired oracle character images (i.e., style and content images) using an image-to-image translation model. We conduct extensive qualitative and quantitative experiments on the Oracle-241 and OBC306 datasets. Diff-Oracle significantly surpasses existing generative methods in image generation. It also substantially benefits downstream oracle character recognition, outperforming all existing state-of-the-art methods by a large margin. In particular, on the challenging OBC306 dataset, Diff-Oracle yields an accuracy gain of 7.70% in the zero-shot setting and recognizes unseen oracle character images with an accuracy of 84.62%, setting a new benchmark for deciphering oracle bone scripts.
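The abstract describes two conditioning paths layered on a diffusion model. Style details from a reference image are mapped into a text-embedding space, and content details come from a content reference image. The sketch below shows one way such conditioning could feed a standard DDPM epsilon-prediction training step. It is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are assumptions: the linear noise schedule, the projection of a style feature into `n_tokens` pseudo-word embeddings, channel-wise concatenation for the content image, and treating the style-domain image of each pre-generated pair as the denoising target. All names (`style_prompt_tokens`, `toy_denoiser`, `params`) are hypothetical, and the denoiser is a toy linear stand-in for a conditional U-Net.

```python
import numpy as np

# Assumption: DDPM linear noise schedule with the common defaults from Ho et al. (2020).
def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, alpha_bar, noise):
    # Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def style_prompt_tokens(style_feat, proj, n_tokens):
    # Hypothetical style-encoder head: projects a global feature of the style
    # reference image (e.g. from a frozen vision-language image encoder) into
    # n_tokens pseudo-word embeddings living in the text-embedding space.
    return (style_feat @ proj).reshape(n_tokens, -1)

def toy_denoiser(x_t, content, tokens, w):
    # Illustrative stand-in for the conditional U-Net, NOT the paper's network:
    # content conditioning = channel-wise concatenation with x_t,
    # style conditioning = a scalar bias derived from the prompt tokens.
    h = np.concatenate([x_t, content], axis=0)      # (2, H, W)
    eps_hat = np.tensordot(w, h, axes=([0], [0]))   # weighted channel sum -> (H, W)
    return eps_hat[None, ...] + tokens.mean()       # (1, H, W)

def training_loss(x0, content, style_feat, params, alpha_bar, rng):
    # One epsilon-prediction training step on a hypothetical (target, content) pair.
    t = rng.integers(0, len(alpha_bar))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    tokens = style_prompt_tokens(style_feat, params["proj"], n_tokens=4)
    eps_hat = toy_denoiser(x_t, content, tokens, params["w"])
    return float(np.mean((eps_hat - noise) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = W = 16
    x0 = rng.uniform(-1, 1, size=(1, H, W))       # target glyph (style domain), random placeholder
    content = rng.uniform(-1, 1, size=(1, H, W))  # content reference image, random placeholder
    style_feat = rng.standard_normal(32)          # stand-in for an image-encoder feature
    params = {"proj": rng.standard_normal((32, 4 * 8)) * 0.01,
              "w": rng.standard_normal(2) * 0.1}
    alpha_bar = make_alpha_bar()
    print(training_loss(x0, content, style_feat, params, alpha_bar, rng))
```

In a real model, the style tokens would typically be consumed by cross-attention rather than reduced to a scalar bias. The epsilon-prediction loss itself would stay the same; only the conditioning inputs to the network change.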

Authors (6)
  1. Jing Li
  2. Qiu-Feng Wang
  3. Kaizhu Huang
  4. Rui Zhang
  5. Siyuan Wang
  6. Erik Cambria
