
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI (2405.07163v1)

Published 12 May 2024 in physics.ed-ph and cs.AI

Abstract: Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that show classroom dynamics, students' drawings related to the learning content, and textbook illustrations. To date, most qualitative analysis and explanation of image data has been conducted by human researchers, without machine-based automation. This was partly because most image-processing artificial intelligence models were neither accessible to general educational scholars nor explainable, owing to their complex deep neural network architectures. However, the recent development of Visual Question Answering (VQA) techniques is producing usable visual LLMs, which receive from the user a question about a given image and return an answer, both in natural language. In particular, GPT-4V, released by OpenAI, has opened up a state-of-the-art visual language model service so that VQA can be used for a variety of purposes. However, VQA and GPT-4V have not yet been widely applied in educational studies. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the use of VQA techniques by any educational scholar without technical or accessibility barriers, and (2) GPT-4V makes educational scholars realize the usefulness of VQA to educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. In this paper, Chapter II reviews the development of VQA techniques, which culminates in the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with working prompts provided. Finally, Chapter V discusses future implications.

Authors (2)
  1. Gyeong-Geon Lee (11 papers)
  2. Xiaoming Zhai (48 papers)
