DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation (2404.05212v2)
Abstract: Chinese, Japanese, and Korean (CJK), with a vast number of native speakers, have a profound influence on society and culture. The typesetting of CJK languages carries a wide range of requirements due to the complexity of their scripts and their unique literary traditions. A critical aspect of this typesetting process is that a CJK font must provide a set of consistent-looking glyphs for approximately one hundred thousand characters. Creating such a font is therefore inherently labor-intensive and expensive, which significantly hampers the development of new CJK fonts for typesetting, historical, aesthetic, or artistic purposes. To bridge this gap, motivated by recent advances in diffusion-based generative models, we propose a novel diffusion method that generates glyphs in a target style conditioned on a single standard-form glyph. Our experiments show that the method can generate fonts in both printed and handwritten styles, the latter posing the greater challenge. Moreover, our approach shows remarkable zero-shot generalization to non-CJK but Chinese-inspired scripts. We also show that our method supports smooth style interpolation and produces bitmap images suitable for vectorization, a crucial step in the font creation process. In summary, the proposed method opens the door to high-quality, generative-model-assisted font creation for CJK characters, for both typesetting and artistic endeavors.
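The abstract describes a conditional diffusion model that produces a target-style glyph from a single standard-form source glyph. As an illustration only, the sketch below shows one common way such conditioning is implemented: a DDPM-style training step in which the clean source glyph is concatenated with the noisy target glyph along the channel dimension before being fed to the denoiser. The tiny denoiser, the linear noise schedule, and every name and hyperparameter here are assumptions for demonstration, not the paper's actual architecture.

```python
# Illustrative sketch (not the paper's code): conditioning a DDPM-style denoiser
# on a standard-form source glyph via channel concatenation. All names, shapes,
# and hyperparameters are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class TinyDenoiser(nn.Module):
    """Stand-in for a U-Net: predicts the noise added to the target glyph,
    given the noisy target concatenated with the clean source glyph."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 + 1, ch, 3, padding=1), nn.SiLU(),  # noisy + source + timestep channels
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, x_t, source, t):
        # Broadcast the normalized timestep as an extra conditioning channel.
        t_map = (t.float() / T).view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, source, t_map], dim=1))

def training_step(model, target, source):
    """One DDPM training step: noise the target glyph, predict that noise
    conditioned on the source glyph, and regress it with an MSE loss."""
    b = target.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(target)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * target + (1 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, source, t), noise)

# Example with random 64x64 grayscale glyph bitmaps.
model = TinyDenoiser()
loss = training_step(model, torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64))
loss.backward()
```

At sampling time, one would start from Gaussian noise and iteratively denoise while keeping the source glyph fixed as the condition, so the same standard glyph can be re-rendered in the learned target style.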