Enhancing Textbooks with Visuals from the Web for Improved Learning (2304.08931v2)

Published 18 Apr 2023 in cs.CV and cs.CL

Abstract: Textbooks are one of the main mediums for delivering high-quality education to students. In particular, explanatory and illustrative visuals play a key role in retention, comprehension and general transfer of knowledge. However, many textbooks lack these interesting visuals to support student learning. In this paper, we investigate the effectiveness of vision-language models to automatically enhance textbooks with images from the web. We collect a dataset of e-textbooks in the math, science, social science and business domains. We then set up a text-image matching task that involves retrieving and appropriately assigning web images to textbooks, which we frame as a matching optimization problem. Through a crowd-sourced evaluation, we verify that (1) while the original textbook images are rated higher, automatically assigned ones are not far behind, and (2) the precise formulation of the optimization problem matters. We release the dataset of textbooks with an associated image bank to inspire further research in this intersectional area of computer vision and NLP for education.
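
The abstract frames image assignment as a matching optimization problem but does not spell out the objective or constraints. The following is only an illustrative sketch under assumed constraints (each image used at most once, a cap on images per section, a minimum relevance score), solved with a simple greedy heuristic. The function name `assign_images`, its parameters, and the toy scores are hypothetical and are not taken from the paper; in practice the scores might come from a vision-language model's text-image similarity.

```python
from typing import Dict, List, Tuple


def assign_images(
    similarity: List[List[float]],
    max_per_section: int = 1,
    min_score: float = 0.0,
) -> Dict[int, List[int]]:
    """Greedily assign images to textbook sections (illustrative heuristic).

    similarity[s][i] is an assumed relevance score between section s and
    image i. Each image is used at most once, each section receives at most
    max_per_section images, and pairs scoring below min_score are skipped.
    """
    pairs: List[Tuple[float, int, int]] = [
        (score, s, i)
        for s, row in enumerate(similarity)
        for i, score in enumerate(row)
        if score >= min_score
    ]
    # Highest-scoring pairs first; ties broken by section, then image index.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))

    assignment: Dict[int, List[int]] = {s: [] for s in range(len(similarity))}
    used_images = set()
    for _score, s, i in pairs:
        if i in used_images or len(assignment[s]) >= max_per_section:
            continue
        assignment[s].append(i)
        used_images.add(i)
    return assignment


# Toy scores: 3 sections x 4 candidate images (made-up numbers).
scores = [
    [0.9, 0.2, 0.1, 0.3],
    [0.8, 0.7, 0.1, 0.2],
    [0.1, 0.2, 0.05, 0.6],
]
result = assign_images(scores, max_per_section=1, min_score=0.15)
print(result)
```

Note that section 1 scores image 0 highly but loses it to section 0, which scores it higher still. Interactions of this kind are why the choice of constraints and objective can change the resulting assignment.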

