
Oracle Bone Inscriptions Multi-modal Dataset (2407.03900v1)

Published 4 Jul 2024 in cs.CV

Abstract: Oracle bone inscriptions (OBI) are the earliest developed writing system in China, bearing invaluable written evidence of early Shang history and paleography. However, deciphering OBI remains extremely challenging given the current state of scholarship: of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Leveraging advanced AI technology to assist in the decipherment of OBI is therefore an essential research topic. Fully utilizing AI's capabilities here depends on having a comprehensive, high-quality annotated OBI dataset, whereas most existing datasets are annotated in only one or a few dimensions, which limits their potential applications. For instance, the Oracle-MNIST dataset offers only 30k images classified into 10 categories. This paper therefore proposes an Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), which includes annotation information for 10,077 pieces of oracle bones. Each piece has two modalities: pixel-level aligned rubbings and facsimiles. For each oracle bone character, the dataset annotates the detection box, character category, transcription, corresponding inscription group, and reading sequence within the group, providing comprehensive and high-quality annotations. The dataset can be used for a variety of AI-related research tasks in the field of OBI, such as OBI Character Detection and Recognition, Rubbing Denoising, Character Matching, Character Generation, Reading Sequence Prediction, and Missing Characters Completion. We believe that the creation and publication of such a dataset will significantly advance the application of AI algorithms in OBI research.
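
To make the per-character annotations listed in the abstract more concrete, here is a minimal sketch of how one such record might be represented and how reading sequences could be reassembled per inscription group. The schema is an assumption for illustration: the class name `CharAnnotation`, its field names, the `(x, y, width, height)` box layout, and the helper `read_groups` are all hypothetical and are not taken from the dataset's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CharAnnotation:
    """One annotated character on a piece (hypothetical schema, not OBIMD's format)."""
    bbox: Tuple[int, int, int, int]  # detection box, assumed as (x, y, width, height)
    category: str                    # character category label
    transcription: str               # transcription of the character
    group_id: int                    # inscription group the character belongs to
    order: int                       # reading position within its group


def read_groups(chars: List[CharAnnotation]) -> Dict[int, str]:
    """Join transcriptions per inscription group, following the reading order."""
    groups: Dict[int, List[CharAnnotation]] = defaultdict(list)
    for ch in chars:
        groups[ch.group_id].append(ch)
    return {
        gid: "".join(c.transcription for c in sorted(members, key=lambda c: c.order))
        for gid, members in sorted(groups.items())
    }


# Made-up placeholder values, for illustration only.
piece = [
    CharAnnotation((40, 10, 20, 22), "c002", "B", group_id=0, order=1),
    CharAnnotation((12, 10, 20, 22), "c001", "A", group_id=0, order=0),
    CharAnnotation((12, 60, 20, 22), "c003", "C", group_id=1, order=0),
]
print(read_groups(piece))
```

Keeping the group and the in-group order as separate fields means a task such as Reading Sequence Prediction can be framed as recovering `order` from the boxes alone, while the other fields stay available for detection and recognition.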

Authors (20)
  1. Bang Li (2 papers)
  2. Donghao Luo (34 papers)
  3. Yujie Liang (3 papers)
  4. Jing Yang (320 papers)
  5. Zengmao Ding (2 papers)
  6. Xu Peng (6 papers)
  7. Boyuan Jiang (22 papers)
  8. Shengwei Han (5 papers)
  9. Dan Sui (1 paper)
  10. Peichao Qin (1 paper)
  11. Pian Wu (1 paper)
  12. Chaoyang Wang (52 papers)
  13. Yun Qi (3 papers)
  14. Taisong Jin (11 papers)
  15. Chengjie Wang (178 papers)
  16. Xiaoming Huang (10 papers)
  17. Zhan Shu (21 papers)
  18. Rongrong Ji (315 papers)
  19. Yongge Liu (7 papers)
  20. Yunsheng Wu (25 papers)
