
Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness (2407.20845v1)

Published 30 Jul 2024 in cs.CV, cs.HC, and cs.LG

Abstract: Recent advancements in vision models have greatly improved their ability to handle complex chart understanding tasks, like chart captioning and question answering. However, it remains challenging to assess how these models process charts. Existing benchmarks evaluate model performance only coarsely, without examining the underlying mechanisms, such as how models extract image embeddings. This limits our understanding of a model's ability to perceive fundamental graphical components. To address this, we introduce a novel evaluation framework to assess the graphical perception of image embedding models. For chart comprehension, we examine two main aspects of channel effectiveness: accuracy and discriminability of various visual channels. Channel accuracy is assessed through the linearity of embeddings, measuring how well the perceived magnitude aligns with the size of the stimulus. Discriminability is evaluated based on the distances between embeddings, indicating their distinctness. Our experiments with the CLIP model show that it perceives channel accuracy differently from humans and shows unique discriminability in channels like length, tilt, and curvature. We aim to develop this work into a broader benchmark for reliable visual encoders, enhancing models for precise chart comprehension and human-like perception in future applications.
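
The two measures named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how linearity or distance is computed, so the choices here are assumptions. Linearity is approximated as the R^2 between stimulus magnitude and the projection of the embeddings onto their first principal component, and discriminability as the mean Euclidean distance between embeddings of adjacent stimulus levels. The function names `linearity_score` and `mean_adjacent_distance` are hypothetical, and the embeddings are synthetic stand-ins rather than CLIP outputs.

```python
import numpy as np


def linearity_score(magnitudes, embeddings):
    """R^2 between stimulus magnitude and the first principal component
    of the embeddings (an assumed proxy for 'perceived magnitude')."""
    x = np.asarray(magnitudes, dtype=float)
    emb = np.asarray(embeddings, dtype=float)
    centered = emb - emb.mean(axis=0)
    # The first right singular vector of the centered data is the
    # first principal axis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    perceived = centered @ vt[0]
    # Squaring the correlation removes the arbitrary sign of the axis.
    r = np.corrcoef(x, perceived)[0, 1]
    return float(r ** 2)


def mean_adjacent_distance(embeddings):
    """Mean Euclidean distance between embeddings of consecutive
    stimulus levels (an assumed proxy for discriminability)."""
    emb = np.asarray(embeddings, dtype=float)
    steps = np.diff(emb, axis=0)
    return float(np.linalg.norm(steps, axis=1).mean())


# Synthetic stand-ins for image embeddings (assumption: 8 dimensions,
# 20 stimulus levels). A real study would embed rendered stimuli instead.
rng = np.random.default_rng(0)
magnitudes = np.linspace(1.0, 10.0, 20)
direction = rng.normal(size=8)

linear_emb = np.outer(magnitudes, direction)              # scales linearly
saturating_emb = np.outer(np.log(magnitudes), direction)  # compressive response

print("linear channel R^2:    ", linearity_score(magnitudes, linear_emb))
print("saturating channel R^2:", linearity_score(magnitudes, saturating_emb))
print("adjacent distance:     ", mean_adjacent_distance(linear_emb))
```

Under these assumptions, a channel whose embeddings move in proportion to the stimulus scores close to 1, while a compressive (logarithmic) response scores lower; larger adjacent distances would indicate that neighbouring stimulus levels are easier to tell apart.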

