SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings (2403.17784v1)

Published 26 Mar 2024 in cs.HC and cs.AI

Abstract: Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing.
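
As a rough illustration of the generate-rate-revise loop the abstract describes, here is a minimal Python sketch. The four checklist aspects (helpfulness, OCR mention, key takeaways, visual properties reference) are taken from the abstract; everything else, including the `CaptionAssessment` class, the `rate_caption` heuristics, and the 0.5 pass threshold, is a hypothetical stand-in, not SciCapenter's actual models or API.

```python
# Illustrative sketch only -- not SciCapenter's implementation.
# The checklist aspects mirror those named in the abstract; each crude
# heuristic below stands in for one of the system's learned raters.
from dataclasses import dataclass, field

ASPECTS = ["helpfulness", "ocr_mention", "key_takeaways", "visual_properties"]

@dataclass
class CaptionAssessment:
    caption: str
    scores: dict[str, float] = field(default_factory=dict)

    @property
    def passes_checklist(self) -> bool:
        # A caption "passes" when every aspect clears a threshold (assumed 0.5).
        return all(s >= 0.5 for s in self.scores.values())

def rate_caption(caption: str, figure_ocr: list[str]) -> CaptionAssessment:
    """Hypothetical rater: the real system would invoke learned models here."""
    assessment = CaptionAssessment(caption)
    words = caption.lower().split()
    # OCR mention: does the caption reference any text extracted from the figure?
    assessment.scores["ocr_mention"] = (
        1.0 if any(tok.lower() in words for tok in figure_ocr) else 0.0
    )
    # Crude proxies standing in for the learned helpfulness/takeaway raters.
    assessment.scores["helpfulness"] = min(len(words) / 20, 1.0)
    assessment.scores["key_takeaways"] = 1.0 if "shows" in words else 0.0
    assessment.scores["visual_properties"] = (
        1.0 if any(w in words for w in ("red", "blue", "dashed", "solid")) else 0.0
    )
    return assessment

# One pass of the iterative loop: rate a draft, inspect scores, then edit
# and resubmit -- the edit-resubmit-refine cycle the abstract describes.
draft = "Accuracy vs. epochs."
assessment = rate_caption(draft, figure_ocr=["accuracy", "epochs"])
print(assessment.scores, assessment.passes_checklist)
```

In the paper's system, each heuristic would be a trained rater, and the per-aspect scores and checklist verdict would be surfaced in the editing interface so the author can revise the caption and resubmit for a fresh evaluation.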
