Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI (2405.07163v1)
Abstract: Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that shows classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestioningly, most qualitative analysis of and explanation on image data have been conducted by human researchers, without machine-based automation. It was partially because most image processing artificial intelligence models were not accessible to general educational scholars or explainable due to their complex deep neural network architecture. However, the recent development of Visual Question Answering (VQA) techniques is accomplishing usable visual LLMs, which receive from the user a question about the given image and returns an answer, both in natural language. Particularly, GPT-4V released by OpenAI, has wide opened the state-of-the-art visual langauge model service so that VQA could be used for a variety of purposes. However, VQA and GPT-4V have not yet been applied to educational studies much. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the utilization of VQA techniques by any educational scholars without technical/accessibility barrier, and (2) GPT-4V makes educational scholars realize the usefulness of VQA to educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. In this paper, chapter II reviews the development of VQA techniques, which primes with the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with operating prompts provided. Finally, chapter V discusses the future implications.
- \bibcommenthead
- \APACrefYearMonthDay2022November. \BBOQ\APACrefatitleStacked Attention based Textbook Visual Question Answering with BERT Stacked attention based textbook visual question answering with bert.\BBCQ \APACrefbtitle2022 IEEE 19th India Council International Conference (INDICON) 2022 ieee 19th india council international conference (indicon) (\BPGS 1–7). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2022. \BBOQ\APACrefatitleFlamingo: a Visual Language Model for Few-Shot Learning Flamingo: a visual language model for few-shot learning.\BBCQ S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho\BCBL \BBA A. Oh (\BEDS), \APACrefbtitleAdvances in Neural Information Processing Systems Advances in neural information processing systems (\BVOL 35, \BPGS 23716–23736). \APACaddressPublisherCurran Associates, Inc. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2015. \BBOQ\APACrefatitleVQA: Visual Question Answering Vqa: Visual question answering.\BBCQ \APACrefbtitleProceedings of the IEEE International Conference on Computer Vision (ICCV) Proceedings of the ieee international conference on computer vision (iccv) (\BPGS 2425–2433). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2024. \APACrefbtitleTaking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education. Taking the next step with generative artificial intelligence: The transformative role of multimodal large language models in science education. {APACrefURL} https://doi.org/10.48550/arXiv.2401.00832 \APACrefnotearXiv:2401.00832 [cs.AI] \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2010October. \BBOQ\APACrefatitleVizWiz: Nearly Real-Time Answers to Visual Questions Vizwiz: Nearly real-time answers to visual questions.\BBCQ \APACrefbtitleProceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology Proceedings of the 23rd annual acm symposium on user interface software and technology (\BPGS 333–342). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023May. \BBOQ\APACrefatitleAutomatic item generation: foundations and machine learning-based approaches for assessments Automatic item generation: foundations and machine learning-based approaches for assessments.\BBCQ \APACjournalVolNumPagesFrontiers in Education8858273, \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleUsing GPT-4 to Augment Unbalanced Data for Automatic Scoring. Using gpt-4 to augment unbalanced data for automatic scoring. {APACrefURL} https://doi.org/10.48550/arXiv.2310.18365 \APACrefnotearXiv:2310.18365v2 [cs.CL] \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleEDUVI: An Educational-Based Visual Question Answering and Image Captioning System for Enhancing the Knowledge of Primary Level Students. Eduvi: An educational-based visual question answering and image captioning system for enhancing the knowledge of primary level students. {APACrefURL} https://doi.org/10.21203/rs.3.rs-2594097/v1 \APACrefnoteThis is a preprint; it has not been peer-reviewed by a journal. This work is licensed under a CC BY 4.0 License. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2021Jul. \BBOQ\APACrefatitleScaling up visual and vision-language representation learning with noisy text supervision Scaling up visual and vision-language representation learning with noisy text supervision.\BBCQ \APACrefbtitleInternational Conference on Machine Learning International conference on machine learning (\BPGS 4904–4916). \APACaddressPublisherPMLR. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2024. \BBOQ\APACrefatitleApplying large language models and chain-of-thought for automatic scoring Applying large language models and chain-of-thought for automatic scoring.\BBCQ \APACjournalVolNumPagesComputers and Education: Artificial Intelligence100213, \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2021. \BBOQ\APACrefatitleRe-examining Student Conception on the Particulate Nature of Matter: A Cross-sectional Approach Re-examining student conception on the particulate nature of matter: A cross-sectional approach.\BBCQ \APACrefbtitleProceedings of the 2021 International Conference of Korean Association for Science Education Proceedings of the 2021 international conference of korean association for science education (\BPG 191). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2024. \BBOQ\APACrefatitleCollaborative Learning with Artificial Intelligence Speakers (CLAIS): Pre-Service Elementary Science Teachers’ Responses to the Prototype Collaborative learning with artificial intelligence speakers (clais): Pre-service elementary science teachers’ responses to the prototype.\BBCQ \APACjournalVolNumPagesScience & Education, \APACrefnoteIn press \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleMultimodality of AI for Education: Towards Artificial General Intelligence. Multimodality of ai for education: Towards artificial general intelligence. {APACrefURL} https://doi.org/10.48550/arXiv.2312.06037 \APACrefnotearXiv:2312.06037 [cs.AI] \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \BBOQ\APACrefatitleNERIF: GPT-4V for Automatic Scoring of Drawn Models Nerif: Gpt-4v for automatic scoring of drawn models.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2311.12990, 2311.12990 \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \BBOQ\APACrefatitleAutomated Assessment of Student Hand Drawings in Free-Response Items on the Particulate Nature of Matter Automated assessment of student hand drawings in free-response items on the particulate nature of matter.\BBCQ \APACjournalVolNumPagesJournal of Science Education and Technology1–18, {APACrefDOI} https://doi.org/10.1007/s10956-023-10042-3 {APACrefURL} https://doi.org/10.1007/s10956-023-10042-3 \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2019. \APACrefbtitleVisualBERT: A simple and performant baseline for vision and language. Visualbert: A simple and performant baseline for vision and language. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \BBOQ\APACrefatitleCan we and should we use artificial intelligence for formative assessment in science? Can we and should we use artificial intelligence for formative assessment in science?\BBCQ \APACjournalVolNumPagesJournal of Research in Science Teaching6061385–1389, \PrintBackRefs\CurrentBib
- \APACinsertmetastarlin2023research{APACrefauthors}Lin, F. \APACrefYearMonthDay2023. \BBOQ\APACrefatitleResearch on the Teaching Method of College Students’ Education Based on Visual Question Answering Technology Research on the teaching method of college students’ education based on visual question answering technology.\BBCQ \APACjournalVolNumPagesInternational Journal of Emerging Technologies in Learning (iJET)1822167–182, {APACrefDOI} https://doi.org/10.3991/ijet.v18i22.44103 {APACrefURL} https://doi.org/10.3991/ijet.v18i22.44103 \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay1981. \BBOQ\APACrefatitlePupils’ understanding of the particulate nature of matter: A cross-age study Pupils’ understanding of the particulate nature of matter: A cross-age study.\BBCQ \APACjournalVolNumPagesScience Education652187–196, \PrintBackRefs\CurrentBib
- \APACinsertmetastaropenai2023chatgpt{APACrefauthors}OpenAI \APACrefYearMonthDay2023. \APACrefbtitleChatGPT can now see, hear, and speak. Chatgpt can now see, hear, and speak. \APAChowpublishedhttps://openai.com/blog/chatgpt-can-now-see-hear-and-speak. \APACrefnoteAccessed: 2023-09-25 \PrintBackRefs\CurrentBib
- \APACinsertmetastaropenai2023gpt4{APACrefauthors}OpenAI \APACrefYearMonthDay2023\BCnt1. \BBOQ\APACrefatitleGPT-4 Technical Report Gpt-4 technical report.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2303.08774, \PrintBackRefs\CurrentBib
- \APACinsertmetastarOpenAI2023GPT4V{APACrefauthors}OpenAI \APACrefYearMonthDay2023\BCnt2September25. \APACrefbtitleGPT-4V(ision) System Card. GPT-4V(ision) System Card. {APACrefURL} https://openai.com/research/gpt-4v-system-card \APACrefnoteAccessed: 1-Mar-2024 \PrintBackRefs\CurrentBib
- \APACinsertmetastarortiz2024figure{APACrefauthors}Ortiz, S. \APACrefYearMonthDay2024mar14. \BBOQ\APACrefatitleFigure’s humanoid robot can have a full conversation with you. Watch for yourself Figure’s humanoid robot can have a full conversation with you. watch for yourself.\BBCQ \APACjournalVolNumPagesZDNet, {APACrefURL} https://www.zdnet.com/article/figure-and-openais-humanoid-robot-can-have-a-full-conversation-with-you-watch-for-yourself/ \APACrefnoteAccessed: 2024-05-11 \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2021Jul. \BBOQ\APACrefatitleLearning transferable visual models from natural language supervision Learning transferable visual models from natural language supervision.\BBCQ \APACrefbtitleInternational Conference on Machine Learning International conference on machine learning (\BPGS 8748–8763). \APACaddressPublisherPMLR. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2021. \BBOQ\APACrefatitleEDUBOT-A Chatbot For Education in Covid-19 Pandemic and VQAbot Comparison Edubot-a chatbot for education in covid-19 pandemic and vqabot comparison.\BBCQ \APACrefbtitle2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) 2021 second international conference on electronics and sustainable communication systems (icesc) (\BPG 1707-1714). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2018. \BBOQ\APACrefatitleGamification of a Visual Question Answer System Gamification of a visual question answer system.\BBCQ \APACrefbtitle2018 IEEE Tenth International Conference on Technology for Education (T4E) 2018 ieee tenth international conference on technology for education (t4e) (\BPG 41-44). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleBioinformatics Illustrations Decoded by ChatGPT: The Good, The Bad, and The Ugly. Bioinformatics illustrations decoded by chatgpt: The good, the bad, and the ugly. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleCan GPT-4V (ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis. Can gpt-4v (ision) serve medical applications? case studies on gpt-4v for multimodal medical diagnosis. \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \BBOQ\APACrefatitleVisual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models Visual chatgpt: Talking, drawing and editing with visual foundation models.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2303.04671, {APACrefDOI} https://doi.org/10.48550/arXiv.2303.04671 {APACrefURL} https://doi.org/10.48550/arXiv.2303.04671 2303.04671 [cs.CV] \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleAn Early Evaluation of GPT-4V (ision). An early evaluation of gpt-4v (ision). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2023. \APACrefbtitleThe dawn of lmms: Preliminary explorations with gpt-4v (ision) The dawn of lmms: Preliminary explorations with gpt-4v (ision) (\BVOL 9). \PrintBackRefs\CurrentBib
- \APACrefYearMonthDay2024. \BBOQ\APACrefatitleVision-Language Models for Vision Tasks: A Survey Vision-language models for vision tasks: A survey.\BBCQ \APACjournalVolNumPagesIEEE Transactions on Pattern Analysis and Machine Intelligence1-20, {APACrefDOI} https://doi.org/10.1109/TPAMI.2024.3369699 \PrintBackRefs\CurrentBib
- Gyeong-Geon Lee (11 papers)
- Xiaoming Zhai (48 papers)