
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR (2401.12513v2)

Published 23 Jan 2024 in cs.CV and cs.AI

Abstract: Purpose: The capacity to isolate and recognize individual characters from facsimile images of papyrus manuscripts yields rich opportunities for digital analysis. For this reason the "ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri" was held as part of the 17th International Conference on Document Analysis and Recognition. This paper discusses our submission to the competition. Methods: We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions: a transformer-based DeiT approach and a ResNet-50 model trained on a large corpus of unlabelled data using SimCLR, a self-supervised learning method. Results: Our submission won the recognition challenge with a mean average precision (mAP) of 42.2%, and was runner-up in the detection challenge with a mAP of 51.4%. At the more relaxed intersection over union threshold of 0.5, we achieved the highest mean average precision and mean average recall results for both detection and classification. Conclusion: The results demonstrate the potential of these techniques for automated character recognition on historical manuscripts. We ran the prediction pipeline on more than 4,500 images from the Oxyrhynchus Papyri to illustrate the utility of our approach, and we release the results publicly in multiple formats.
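
The intersection over union (IoU) threshold mentioned in the abstract can be illustrated with a minimal sketch. This is not the competition's evaluation code. The box format `(x_min, y_min, x_max, y_max)` and the helper names `iou` and `is_match` are assumptions made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """Hypothetical helper: a prediction counts as a hit if IoU >= threshold."""
    return iou(pred_box, true_box) >= threshold
```

Under this sketch, a predicted character box that overlaps a ground-truth box with an IoU of 0.75 would count as a match at the 0.5 threshold, while one with an IoU of about 0.33 would not. Stricter thresholds demand tighter localisation, which is why scores at a threshold of 0.5 are typically higher than scores that also count stricter thresholds.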

Authors (2)
  1. Robert Turnbull (7 papers)
  2. Evelyn Mannix (3 papers)