Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips (2310.00698v1)
Abstract: Comic strips are a popular and expressive form of visual storytelling that can convey humor, emotion, and information. However, they are inaccessible to the blind or low vision (BLV) community, who cannot perceive the images, layouts, and text of comics. Our goal in this paper is to create natural language descriptions of comic strips that are accessible to BLV readers. Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images; second, we use this information as additional context to prompt a multimodal LLM (MLLM) to produce the descriptions. We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics. The results of our experiments are encouraging.
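To make the second step of the method concrete, here is a minimal Python sketch of how extracted panel information might be turned into textual context for an MLLM prompt. It is an illustration under stated assumptions, not the authors' implementation: the `Panel` class, the `reading_order` and `build_prompt` functions, the prompt wording, and the top-to-bottom, left-to-right ordering rule are all hypothetical. The detection and text-recognition stage is assumed to have already run, so its output is entered by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    """Hypothetical container for what step one extracts from a single panel."""
    box: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    characters: list[str] = field(default_factory=list)  # detected character labels
    dialogue: list[str] = field(default_factory=list)    # text read from the panel


def reading_order(panels: list[Panel]) -> list[Panel]:
    """Sort panels top-to-bottom, then left-to-right (a simplifying assumption)."""
    return sorted(panels, key=lambda p: (p.box[1], p.box[0]))


def build_prompt(panels: list[Panel]) -> str:
    """Format the extracted panel data as context lines for a multimodal LLM."""
    lines = [
        "Describe this comic strip for a blind or low-vision reader.",
        "Use the extracted context below.",
        "",
    ]
    for i, panel in enumerate(reading_order(panels), start=1):
        who = ", ".join(panel.characters) or "none detected"
        said = " / ".join(panel.dialogue) or "no text"
        lines.append(f"Panel {i}: characters: {who}; text: {said}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written stand-in for the output of the extraction step.
    strip = [
        Panel(box=(310, 0, 300, 200), characters=["cat"], dialogue=["Not again."]),
        Panel(box=(0, 0, 300, 200), characters=["boy", "cat"], dialogue=["Look, snow!"]),
    ]
    print(build_prompt(strip))
```

In a full pipeline, the resulting string would be sent to the MLLM together with the comic image. Keeping the context as plain, numbered panel lines is one simple design choice; a real system would also need a more robust reading-order rule for irregular layouts.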