Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education (2401.00832v3)

Published 1 Jan 2024 in cs.AI and cs.CY

Abstract: The integration of AI, particularly LLM-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal LLMs (MLLMs) like GPT-4 with vision (GPT-4V), capable of processing multimodal data including text, sound, and visual inputs, opens a new era of enriched, personalized, and interactive learning landscapes in education. Grounded in the theory of multimedia learning, this paper explores the transformative role of MLLMs in central aspects of science education by presenting exemplary innovative learning scenarios. Possible applications for MLLMs could range from content creation to tailored support for learning, fostering competencies in scientific practices, and providing assessment and feedback. These scenarios are not limited to text-based and uni-modal formats but can be multimodal, thus increasing personalization, accessibility, and potential learning effectiveness. Alongside many opportunities, challenges such as data protection and ethical considerations become more salient, calling for robust frameworks to ensure responsible integration. This paper underscores the necessity for a balanced approach in implementing MLLMs, where the technology complements rather than supplants the educator's role, thus ensuring an effective and ethical use of AI in science education. It calls for further research to explore the nuanced implications of MLLMs on the evolving role of educators and to extend the discourse beyond science education to other disciplines. Through the exploration of potentials, challenges, and future implications, we aim to contribute to a preliminary understanding of the transformative trajectory of MLLMs in science education and beyond.

Introduction

Science education is a field replete with activities that range from acquiring scientific knowledge to engaging in scientific practices and communicating scientific ideas effectively. Science learning is intrinsically multimodal, requiring engagement with activities that draw on different modalities, such as reading and writing scientific text, deciphering diagrams, and crafting and interpreting data visualizations. This multimodal character is reflected in cognitive theories such as the Cognitive Theory of Multimedia Learning, which holds that knowledge acquisition is enhanced when text and imagery are combined.

Framework

The Role of Multimodal Learning in Science Education

Science education prepares students to handle complex realities through robust content knowledge and the cultivation of scientific practices. Engagement with scientific material becomes more dynamic through multimodal learning: combining text, images, and other sensory inputs helps learners construct an integrated mental model. Multimodal LLMs (MLLMs), such as GPT-4V, are designed to cater to this multifaceted nature of science education, supporting both educators and learners in creating and engaging with multimodal content. This can potentially transform educational practices by enabling personalized content generation, tailored learning support, and multimodal assessment.

The Advancements in AI-Driven Models

To date, text-based LLMs such as ChatGPT have been used extensively in education for content creation and problem-solving. The advent of MLLMs introduces the ability to process and generate content beyond text, encompassing imagery, audio, and video, thus mirroring the multimodality of science learning. They can interpret and respond to multimodal information, bridging the gap between text-centric learning and the demands of science education. MLLMs could support the essential tasks of educators by providing comprehensive analysis, generating novel material, and offering assessment and feedback across diverse modalities.
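To make the kind of multimodal interaction described above concrete, the sketch below sends an image together with a textual question to a vision-capable chat model via the OpenAI Python client. This is a minimal illustration rather than part of the paper: the model name, file name, and prompt are assumptions, and any MLLM with a comparable API could be substituted.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set


def encode_image(path: str) -> str:
    """Read a local image file and return it as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Hypothetical input: a photograph of a student's experimental setup.
image_b64 = encode_image("experiment_setup.jpg")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model available to you
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this experimental setup and explain which "
                            "physical quantity it is designed to measure.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```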

Applications in Science Education

MLLMs open up a range of applications within science education, from content creation to learning support and assessment. They provide the tools to create adaptive, multimodal learning materials that are accessible to students with diverse needs. By transforming and supplementing textual information with visuals, MLLMs promote deeper understanding of and engagement with scientific content. Additionally, their ability to provide instantaneous, personalized feedback on both textual and visual student work represents a significant advancement for learning processes and outcomes.
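As a complementary sketch of the content-creation direction mentioned above (supplementing a text passage with a visual), the example below calls an image-generation endpoint through the same client. The model name, topic, and prompt are illustrative assumptions, not recommendations from the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative use case: generate a supporting visual for a text passage on photosynthesis.
result = client.images.generate(
    model="dall-e-3",  # any available image-generation model
    prompt=(
        "A clear, labeled textbook-style diagram of photosynthesis in a leaf, "
        "with arrows for sunlight, carbon dioxide, water, oxygen, and glucose."
    ),
    size="1024x1024",
    n=1,
)

# The response contains a temporary URL to the generated image.
print(result.data[0].url)
```

In a classroom workflow, an educator would still review such generated material for scientific accuracy before use, in line with the balanced, teacher-mediated approach the paper argues for.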

Challenges and Considerations

Despite the immense potential of MLLMs, the integration of these technologies into the classroom must be approached with caution. Challenges such as the pitfalls of minimally guided instruction, increased cognitive load, and the need to balance technology use remain salient. Ethical considerations also arise, including data privacy, biased content, and the reliability of automated assessment. It is therefore vital for educators to play a pivotal role in mediating the use of MLLMs, ensuring that they serve as an enhancement of, not a replacement for, human interaction and learning. Further research is needed to explore the nuanced implications of MLLMs for teacher roles and the education system as a whole.

Conclusion

As science education continues to evolve, MLLMs promise a trajectory where educational processes are enhanced and personalized. These models can potentially give rise to learning environments that respond adaptively to student needs, thus improving learning experiences significantly. However, the successful incorporation of MLLMs requires a thoughtful, balanced approach, prioritizing the enhancement of human-centric teaching and comprehensive understanding of scientific concepts.

Authors (9)
  1. Arne Bewersdorff
  2. Christian Hartmann
  3. Marie Hornberger
  4. Kathrin Seßler
  5. Maria Bannert
  6. Enkelejda Kasneci
  7. Gjergji Kasneci
  8. Xiaoming Zhai
  9. Claudia Nerdel