VisText: A Benchmark for Semantically Rich Chart Captioning (2307.05356v1)

Published 28 Jun 2023 in cs.CV, cs.HC, and cs.LG

Abstract: Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.
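The abstract describes fine-tuning sequence-to-sequence language models on serialized chart representations and using prefix-style conditioning to vary the semantic content of generated captions. The sketch below illustrates that general setup with Hugging Face Transformers; the model name, prefix strings, and the flattened scene-graph string are illustrative assumptions, not the dataset's actual schema or the authors' exact training configuration.

```python
# Minimal sketch (not the authors' exact pipeline): condition a seq2seq
# model on a chart's flattened scene graph plus a semantic-content prefix,
# in the spirit of the prefix-tuned captioning described in the abstract.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # placeholder model; VisText fine-tunes larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A chart serialized as a flattened scene-graph string (hypothetical example).
scene_graph = (
    "chart type: bar | x-axis: year (2010-2020) | y-axis: revenue (USD millions) | "
    "bars: 2010=12, 2015=30, 2020=55"
)

# Prefixes steering the semantic content of the caption
# (e.g., chart construction vs. trends and key statistics).
for prefix in ["describe construction:", "summarize trends:"]:
    inputs = tokenizer(f"{prefix} {scene_graph}", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(prefix, tokenizer.decode(output_ids[0], skip_special_tokens=True))
```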

Authors (3)
  1. Benny J. Tang (2 papers)
  2. Angie Boggust (11 papers)
  3. Arvind Satyanarayan (35 papers)
Citations (55)