
Multimodality of AI for Education: Towards Artificial General Intelligence (2312.06037v2)

Published 10 Dec 2023 in cs.AI
Abstract: This paper presents a comprehensive examination of how multimodal AI approaches are paving the way towards the realization of AGI in educational contexts. It scrutinizes the evolution and integration of AI in educational systems, emphasizing the crucial role of multimodality, which encompasses auditory, visual, kinesthetic, and linguistic modes of learning. This research delves deeply into the key facets of AGI, including cognitive frameworks, advanced knowledge representation, adaptive learning mechanisms, strategic planning, sophisticated language processing, and the integration of diverse multimodal data sources. It critically assesses AGI's transformative potential in reshaping educational paradigms, focusing on enhancing teaching and learning effectiveness, filling gaps in existing methodologies, and addressing ethical considerations and responsible usage of AGI in educational settings. The paper also discusses the implications of multimodal AI's role in education, offering insights into future directions and challenges in AGI development. This exploration aims to provide a nuanced understanding of the intersection between AI, multimodality, and education, setting a foundation for future research and development in AGI.

Overview of Multimodal AGI for Education

Multimodal AI is driving a shift towards AGI in educational environments. The shift involves integrating different learning modes (auditory, visual, kinesthetic, and linguistic) into AI systems, so that the learning experience is more comprehensive and better tailored to individual students' needs.

Theoretical Foundations of Multimodality in Learning

Multimodality refers to the multiple channels through which humans process information, including text, images, sounds, and gestures. Its importance in education is underpinned by theories such as Dual Coding Theory and Multimedia Theory, which emphasize the need for diverse sensory channels in learning. For instance, combining visual and auditory information can enhance memory and understanding. These theories inform the development of AGI systems, which aim to simulate these varied cognitive processes to aid human learning.

AGI in Educational Settings

AGI has the potential to redefine the educational landscape by integrating three capabilities: cognitive frameworks that mirror human reasoning and perception, knowledge representation strategies that reflect human capacities for logic and learning, and adaptive systems that tailor their approach to individual students' preferences and requirements. The field is moving away from strictly single-modal AI applications toward AGI systems that can interact with the world and with students across multiple modalities. For instance, recent advances in natural language processing allow students to communicate more naturally with AGI education systems.

Educational AGI is being designed not just to replicate human intelligence but to work synergistically with educators to shape the learning experience. Strategic planning and decision-making processes are embedded within AGI, lending it the capability to assist with administrative tasks within educational institutions and individualize learning pathways for students.

AGI's Transformative Potential and Challenges

The potential of AGI in reshaping education is immense, with technologies designed to enhance and supplement various aspects of learning and teaching. However, along with technological advancements, AGI systems in education must be developed and implemented responsibly, maintaining ethical standards, ensuring transparency, and preserving academic integrity.

Ensuring Ethical Integrity in Educational AGI

Data privacy, potential bias, and the ethical use of AGI systems are significant concerns. Because AGI systems can generate content, including assessments, academic integrity issues such as plagiarism become paramount. Educational institutions must establish clear policies to guide the integration of AGI and help students understand the importance of original work.

Explainability and Transparency in Educational AGI

The complexity of AGI models calls for a transparent approach in which teachers can understand and trust the system's decision-making processes, especially in multimodal assessments. Addressing AI-generated misinformation is also essential as the line between human-generated and AGI-generated content becomes increasingly blurred.

Responsible Use of Educational AGI

The rapid integration of AGI into educational settings demands shared responsibility. Stakeholders, including educators, policy-makers, and researchers, must work together to address the ethical implications of AGI in classrooms. Human agency must be preserved, with AGI serving as a tool that enhances human capabilities rather than replacing them.

In conclusion, multimodal AI paves the way towards AGI in education, promising a future of personalized and dynamic educational experiences. As we transition to AGI, it is crucial to approach this new horizon with a commitment to ethical practices, accountability, and a deeper understanding of AI’s role in education.

  189. D. R. Russell. 2009. Texts in Contexts: Theorizing Learning by Looking at Genre and Activity. (2009).
  190. Fairness and diversity in social-based recommender systems. In Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization. 83–88.
  191. P. M. Sadler and E. Good. 2006. The impact of self-and peer-grading on student learning. Educational Assessment 11, 1 (2006), 1–31.
  192. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022), 36479–36494.
  193. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 4 (2022), 4713–4726.
  194. KS Sahla and T Senthil Kumar. 2016. Classroom teaching assessment based on student emotions. In The International Symposium on Intelligent Systems Technologies and Applications. Springer, 475–486.
  195. Mausumi Sahu. 2016. Plagiarism detection using artificial intelligence technique in multiple files. International Journal 0f Scientific and Technology Research 5, 4 (2016).
  196. Can AI-Generated Text be Reliably Detected? arXiv e-prints (2023), arXiv–2303.
  197. Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature. https://doi.org/10.21203/rs.3.rs-3015858/v1 arXiv:rs-3015858/v1 This is a preprint; it has not been peer reviewed by a journal. This work is licensed under a CC BY 4.0 License..
  198. Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. In Proceedings of the 2022 ACM Conference on International Computing Education Research - Volume 1 (Lugano and Virtual Event, Switzerland) (ICER ’22). Association for Computing Machinery, New York, NY, USA, 27–43. https://doi.org/10.1145/3501385.3543957
  199. PEER: Empowering Writing with Large Language Models. In Responsive and Sustainable Educational Futures, Olga Viberg, Ioana Jivet, Pedro J. Muñoz-Merino, Maria Perifanou, and Tina Papathoma (Eds.). Springer Nature Switzerland, Cham, 755–761.
  200. Providing Feedback based on Student Errors in Experimentation Using Artificial Intelligence and Large Language Models: A Comparative Study with Human Experts. Frontiers in Education (2024).
  201. Multi-object Detection Based on Deep Learning in Real Classrooms. In PRICAI 2018: Trends in Artificial Intelligence (Lecture Notes in Computer Science, Vol. 11013), X. Geng and B.H. Kang (Eds.). Springer, Cham. https://doi.org/10.1007/978-3-319-97310-4_40
  202. Building pipelines for educational data using AI and multimodal analytics: A “grey-box” approach. British Journal of Educational Technology 50, 6 (2019), 3004–3031.
  203. A Study on various Applications of Computer Vision for Teaching Learning in Classroom. In 2022 6th International Conference on Electronics, Communication and Aerospace Technology. IEEE, 896–900.
  204. Chatgraph: Interpretable text classification by converting chatgpt knowledge to graphs. arXiv preprint arXiv:2305.03513 (2023).
  205. The Impact of AI Chatbot-Based Learning on Students’ Motivation in English Writing Classroom. In International Conference on Innovative Technologies and Learning. Springer Nature Switzerland, Cham, 542–549.
  206. P. Silvia. 2012. Curiosity and Motivation.
  207. Sahib Preet Singh. 2019. Artificial narrow intelligence adaptive audio processing. Ph. D. Dissertation. Dublin Business School.
  208. Christine Slade. 2023. Academic integrity and artificial intelligence. (2023).
  209. A multimodal assessment framework for integrating student writing and drawing in elementary science learning. IEEE Transactions on Learning Technologies 12, 1 (2018), 3–15.
  210. Denoising Diffusion Implicit Models. In International Conference on Learning Representations.
  211. Consistency models. arXiv preprint arXiv:2303.01469 (2023).
  212. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations.
  213. J.Jinu Sophia and T.Prem Jacob. 2021. EDUBOT-A Chatbot For Education in Covid-19 Pandemic and VQAbot Comparison. In 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC). 1707–1714. https://doi.org/10.1109/ICESC51422.2021.9532611
  214. Robert Speer and Catherine Havasi. 2013. ConceptNet 5: A large semantic network for relational knowledge. The People’s Web Meets NLP: Collaboratively Constructed Language Resources (2013), 161–176.
  215. The Potential of a Visual Dialogue Agent In a Tandem Automated Audio Description System for Videos. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. 1–17.
  216. Learning behavior recognition based on multi-object image in single viewpoint. Personal and Ubiquitous Computing 25 (2021), 1081–1090. https://doi.org/10.1007/s00779-020-01545-1
  217. Gamifying Math Education using Object Detection. arXiv (2023). arXiv:2304.06270 [cs.CV] https://doi.org/10.48550/arXiv.2304.06270
  218. Gamification of a Visual Question Answer System. In 2018 IEEE Tenth International Conference on Technology for Education (T4E). 41–44. https://doi.org/10.1109/T4E.2018.00016
  219. Learning analytics dashboard: a tool for providing actionable insights to learners. International Journal of Educational Technology in Higher Education 19, 1 (2022), 12.
  220. John Sweller. 2011. Cognitive Load Theory. In Psychology of Learning and Motivation, Jose P. Mestre and Brian H. Ross (Eds.). Vol. 55. Academic Press, 37–76. https://doi.org/10.1016/B978-0-12-387691-1.00002-8
  221. Z. Swiecki et al. 2022. Assessment in the age of artificial intelligence. Computers and (2022).
  222. Modern threats in academia: Evaluating plagiarism and artificial intelligence detection scores of ChatGPT. Eye (2023), 1–4.
  223. Assessing multimodal literacies in practice: A critical review of its implementations in educational settings. Language and Education 34, 2 (2020), 97–114.
  224. Kanchan M Tarwani and Swathi Edem. 2017. Survey on recurrent neural network in natural language processing. Int. J. Eng. Trends Technol 48, 6 (2017), 301–304.
  225. Sravani Teeparthi. 2021. Long Term Object Detection and Tracking in Collaborative Learning Environments. Master’s Thesis. The University of New Mexico, Albuquerque, New Mexico. B.Tech., Electronics and Communication Engineering, 2015.
  226. Visual question answering: A tutorial. IEEE Signal Processing Magazine 34, 6 (2017), 63–75. https://doi.org/10.1109/MSP.2017.2736920
  227. C. Thompson. 2011. Critical thinking across the curriculum: Process over output. International Journal of Humanities and Social Science 1, 9 (2011), 1–7.
  228. Phil Torres. 2019. The possibility and risks of artificial general intelligence. Bulletin of the atomic scientists 75, 3 (2019), 105–108.
  229. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
  230. Conditional image generation with pixelcnn decoders. Advances in neural information processing systems 29 (2016).
  231. Dustin van der Haar. 2020. Student emotion recognition using computer vision as an assistive technology for education. In Information Science and Applications: ICISA 2019. Springer, 183–192.
  232. Attention is all you need. Advances in neural information processing systems 30 (2017).
  233. E Volini et al. 2020. Ethical implications of AI and the future of work. Deloitte Insights (2020).
  234. Scoring graphical responses in TIMSS 2019 using artificial neural networks. Educational and Psychological Measurement 83, 3 (2023), 556–585.
  235. Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM 57, 10 (2014), 78–85.
  236. The dark side of generative artificial intelligence: A critical analysis of controversies and risks of ChatGPT. Entrepreneurial Business and Economics Review 11, 2 (2023), 7–24.
  237. Applying Machine Learning to Assess Paper-Pencil Drawn Models of Optics. In Uses of Artificial Intelligence in STEM Education. Oxford University Press, 1–22.
  238. Seeing ChatGPT Through Universities’ Policies, Resources and Guidelines. arXiv preprint arXiv:2312.05235 (2023).
  239. Review of large vision models and visual prompt engineering. arXiv preprint arXiv:2307.00855 (2023).
  240. Pei Wang. 2019. On defining artificial intelligence. Journal of Artificial General Intelligence 10, 2 (2019), 1–37.
  241. Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps. arXiv preprint arXiv:2309.05021 (2023).
  242. AI now report 2018. AI Now Institute at New York University New York.
  243. Using automated analysis to assess middle school students’ competence with scientific argumentation. Journal of Research in Science Teaching (2023), 1–32. https://doi.org/10.1002/tea.21864
  244. Alan FT Winfield and Marina Jirotka. 2018. Ethical governance is essential to building trust in robotics and artificial intelligence systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376, 2133 (2018), 20180085.
  245. Recognition of student classroom behaviors based on moving target detection. Traitement du Signal 38, 1 (2021), 215–220. https://doi.org/10.18280/ts.380123
  246. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7623–7633.
  247. Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding 163 (2017), 21–40.
  248. Next-gpt: Any-to-any multimodal llm. arXiv preprint arXiv:2309.05519 (2023).
  249. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564 (2023).
  250. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 842–850.
  251. Instruction-vit: Multi-modal prompts for instruction learning in vit. arXiv preprint arXiv:2305.00201 (2023).
  252. Analyzing students’ attention by gaze tracking and object detection in classroom teaching. Data Technologies and Applications 57, 5 (2023), 643–667. https://doi.org/10.1108/DTA-09-2021-0236
  253. Target Classification System Based on Target Detection for Students’ Classroom Assessment. In 2019 10th International Conference on Information Technology in Medicine and Education (ITME). 541–545. https://doi.org/10.1109/ITME.2019.00128
  254. S. S. Yeh. 2010. Understanding and addressing the achievement gap through individualized instruction and formative assessment. Assessment in Education: Principles, Policy & Practice 17, 2 (2010), 169–182.
  255. A Survey on Multimodal Large Language Models. arXiv preprint arXiv:2306.13549 (2023).
  256. Haci Hasan Yolcu. 2023. Redefining the Teacher’s Role in Education through Artificial General Intelligence (AGI). (2023).
  257. Wordcraft: Story Writing with Large Language Models. In 27th International Conference on Intelligent User Interfaces. 841–852.
  258. Xiaoming Zhai. 2021a. Advancing automatic guidance in virtual science inquiry: from ease of use to personalization. Educational Technology Research and Development 69, 1 (2021), 255–258.
  259. Xiaoming Zhai. 2021b. Practices and theories: How can machine learning assist in innovative assessment practices in science education. Journal of Science Education and Technology 30, 2 (2021), 139–149.
  260. Xiaoming Zhai. 2022a. Assessing high-school students’ modeling performance on Newtonian mechanics. Journal of Research in Science Teaching 59, 8 (2022), 1313–1353. https://doi.org/10.1002/tea.21758
  261. Xiaoming Zhai. 2022b. ChatGPT user experience: Implications for education. Available at SSRN 4312418 (2022).
  262. Xiaoming Zhai. 2023. ChatGPT and AI: The Game Changer for Education. SSRN (2023). https://doi.org/abstract=4389098
  263. A Review of Artificial Intelligence (AI) in Education from 2010 to 2020. Complexity 2021 (2021), 1–18.
  264. Applying machine learning to automatically assess scientific models. Journal of Research in Science Teaching 59, 10 (2022), 1765–1794.
  265. Xiaoming Zhai and Joseph Krajcik. 2022. Pseudo AI Bias. https://doi.org/10.48550/arXiv.2210.08141
  266. Xiaoming Zhai and Ross Nehm. 2023. AI and formative assessment: The train has left the station. Journal of Research in Science Teaching 60, 6 (2023), 1390–1398. https://doi.org/DOI:10.1002/tea.21885
  267. Can AI Outperform Humans on Cognitive-demanding Tasks in Science? SSRN (2023). https://ssrn.com/abstract=4451722orhttp://dx.doi.org/10.2139/ssrn.4451722
  268. Xiaoming Zhai and Eric Wiebe. 2023. Technology-Based Innovative Assessment. Community for Advancing Discovery Research in Education, Education Development Center, Inc., 99–125.
  269. Applying machine learning in science assessment: a systematic review. Studies in Science Education 56, 1 (2020), 111–151.
  270. An overview on restricted Boltzmann machines. Neurocomputing 275 (2018), 1186–1199.
  271. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).
  272. When brain-inspired ai meets agi. arXiv preprint arXiv:2303.15935 (2023).
  273. Object Detection With Deep Learning: A Review. IEEE Transactions on Neural Networks and Learning Systems 30, 11 (2019), 3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
  274. Synthetic lies: Understanding ai-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–20.
  275. Fine-grained artificial neurons in audio-transformers for disentangling neural auditory encoding. In Findings of the Association for Computational Linguistics: ACL 2023. 7943–7956.
  276. Zimmermann and Schunk. [n. d.]. Motivation: An Essential Dimension of Self-Regulated Learning.
  277. Object Detection in 20 Years: A Survey. Proc. IEEE 111, 3 (2023), 257–276. https://doi.org/10.1109/JPROC.2023.3238524
Authors (13)
  1. Gyeong-Geon Lee (11 papers)
  2. Lehong Shi (6 papers)
  3. Ehsan Latif (36 papers)
  4. Yizhu Gao (4 papers)
  5. Matthew Nyaaba (10 papers)
  6. Shuchen Guo (13 papers)
  7. Zihao Wu (100 papers)
  8. Zhengliang Liu (91 papers)
  9. Hui Wang (371 papers)
  10. Gengchen Mai (46 papers)
  11. Tianming Liu (1 paper)
  12. Xiaoming Zhai (48 papers)
  13. Arne Bewersdorff (5 papers)
Citations (28)