An open dataset for the evolution of oracle bone characters: EVOBC (2401.12467v2)

Published 23 Jan 2024 in cs.AI

Abstract: The earliest extant Chinese characters originate from oracle bone inscriptions, which are closely related to other East Asian languages. These inscriptions hold immense value for anthropology and archaeology. However, deciphering oracle bone script remains a formidable challenge: only about 1,600 of the more than 4,500 extant characters have been deciphered to date. Further scholarly investigation is required to fully understand this ancient writing system. Artificial intelligence is a promising avenue for deciphering oracle bone characters, particularly by tracing their evolution. However, one obstacle is the lack of datasets that map how these characters evolved over time. In this study, we systematically collected ancient characters from authoritative texts and websites spanning six historical stages: Oracle Bone Characters (OBC, 15th century B.C.), Bronze Inscriptions (BI, 13th century to 221 B.C.), Seal Script (SS, 11th to 8th centuries B.C.), Spring and Autumn period Characters (SAC, 770 to 476 B.C.), Warring States period Characters (WSC, 475 to 221 B.C.), and Clerical Script (CS, 221 B.C. to 220 A.D.). From these sources we constructed an extensive dataset, EVolution Oracle Bone Characters (EVOBC), consisting of 229,170 images representing 13,714 distinct character categories. We conducted validation and simulated decipherment on the constructed dataset, and the results demonstrate its high efficacy in aiding the study of oracle bone script. This openly accessible dataset aims to digitize ancient Chinese scripts across multiple eras, facilitating the decipherment of oracle bone script by examining the evolution of glyph forms.
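The abstract organizes EVOBC around six historical stages and reports overall totals. The Python sketch below is a reader's aid, not the authors' code or the dataset's actual file format. It encodes the stages exactly as the abstract lists them and computes the average number of images per character category from the reported totals. It also groups glyph images by category and stage so that a character's forms can be read in stage order. The `Stage` class, `STAGES` tuple, `group_by_stage` helper, and the `(category_label, stage_abbr, image_path)` record shape are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    abbr: str    # abbreviation used in the abstract
    name: str
    period: str  # date range as given in the abstract


# The six stages, in the order the abstract lists them (not strictly chronological).
STAGES = (
    Stage("OBC", "Oracle Bone Characters", "15th century B.C."),
    Stage("BI", "Bronze Inscriptions", "13th century to 221 B.C."),
    Stage("SS", "Seal Script", "11th to 8th centuries B.C."),
    Stage("SAC", "Spring and Autumn period Characters", "770 to 476 B.C."),
    Stage("WSC", "Warring States period Characters", "475 to 221 B.C."),
    Stage("CS", "Clerical Script", "221 B.C. to 220 A.D."),
)
STAGE_ORDER = {s.abbr: i for i, s in enumerate(STAGES)}

# Dataset totals reported in the abstract.
NUM_IMAGES = 229_170
NUM_CATEGORIES = 13_714


def mean_images_per_category() -> float:
    """Average images per character category, from the reported totals."""
    return NUM_IMAGES / NUM_CATEGORIES


def group_by_stage(records):
    """Hypothetical helper: records are (category_label, stage_abbr, image_path) tuples.

    Returns {category_label: [(stage_abbr, [image_path, ...]), ...]},
    with each category's stages sorted in STAGES order.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for label, stage, path in records:
        if stage not in STAGE_ORDER:
            raise ValueError(f"unknown stage: {stage}")
        grouped[label][stage].append(path)
    return {
        label: sorted(stages.items(), key=lambda kv: STAGE_ORDER[kv[0]])
        for label, stages in grouped.items()
    }


if __name__ == "__main__":
    print(round(mean_images_per_category(), 2))  # 16.71
    demo = [("ma", "CS", "ma/cs_0.png"), ("ma", "OBC", "ma/obc_0.png")]
    print(group_by_stage(demo))
```

On average the reported totals give about 16.7 images per category. The per-category count in the real dataset will vary, since not every character is attested in every stage.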

Authors (9)
  1. Haisu Guan (3 papers)
  2. Jinpeng Wan (2 papers)
  3. Yuliang Liu (82 papers)
  4. Pengjie Wang (51 papers)
  5. Kaile Zhang (5 papers)
  6. Zhebin Kuang (3 papers)
  7. Xinyu Wang (186 papers)
  8. Xiang Bai (222 papers)
  9. Lianwen Jin (116 papers)
Citations (3)