Enhancing Higher Education with Generative AI: A Multimodal Approach for Personalised Learning
This paper presents a focused exploration of the application of Generative AI (GenAI) in higher education through the development of a multimodal chatbot. The central aim of the research is to enrich personalized learning experiences by combining text, image, and file inputs. The multimodal chatbot is designed to handle a wide spectrum of educational queries, helping to bridge gaps in conventional educational technologies.
The authors draw on several state-of-the-art GenAI technologies, using the ChatGPT API for text-based interactions and Google Bard for image analysis and diagram-to-code conversion. The integration of multimodal input is the primary contribution of this research, enabling the chatbot to process and respond to complex educational queries. This is particularly relevant in disciplines that rely heavily on visual information, such as the STEM fields.
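The paper does not reproduce implementation code, but the text-based interaction can be illustrated with a minimal sketch against the official openai Python client; the model name, system prompt, and function below are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a text-based educational query handler, assuming
# the official openai Python client (>= 1.0). The system prompt and
# model choice are illustrative, not the authors' configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_educational_query(question: str) -> str:
    """Send a student question to the ChatGPT API and return the reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a tutoring assistant for university coursework."},
            {"role": "user", "content": question},
        ],
        temperature=0.2,  # keep answers focused for educational use
    )
    return response.choices[0].message.content


print(answer_educational_query("Explain Big-O notation with an example."))
```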
The paper further introduces a file-based analyzer to support the teaching process. This component accepts uploads of coursework-related documents and provides nuanced sentiment and emotion analysis, giving educators comprehensive insight into student feedback and course evaluations. Key metrics such as sentiment scores and keyword summaries offer a substantive tool for pedagogical assessment and improvement.
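The authors do not specify which NLP stack powers these metrics; the following sketch approximates the two headline outputs, a sentiment score and a keyword summary, using NLTK's VADER analyzer and simple frequency counts as stand-ins.

```python
# Illustrative sketch of the file-based analyzer's core metrics.
# NLTK's VADER and frequency-based keywords are assumptions; the paper
# does not name the authors' actual NLP components.
import re
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
nltk.download("stopwords", quiet=True)


def analyse_feedback(comments: list[str], top_k: int = 5) -> dict:
    """Return a mean sentiment score and a keyword summary for comments."""
    sia = SentimentIntensityAnalyzer()
    scores = [sia.polarity_scores(c)["compound"] for c in comments]

    stop = set(stopwords.words("english"))
    words = [w for c in comments
             for w in re.findall(r"[a-z]+", c.lower())
             if w not in stop]
    return {
        "mean_sentiment": sum(scores) / len(scores),
        "keywords": [w for w, _ in Counter(words).most_common(top_k)],
    }


print(analyse_feedback([
    "The lectures were engaging and well paced.",
    "Assignments felt rushed and the feedback arrived late.",
]))
```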
The methodology involves the design of three primary modules: text-based, image-based, and file-based components. The text-based module applies fine-tuning principles to adapt the ChatGPT API to specific educational contexts. The image-based module employs Google Bard's capabilities to interpret diagrammatic content and convert it into executable code, a notable advance given the challenges inherent in such conversions. The file-based analyzer draws on NLP methods and Plutchik's emotion wheel to generate detailed analyses of feedback data.
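Plutchik's wheel organizes affect into eight primary emotions: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation. A toy sketch of folding feedback text onto those categories follows; the cue-word lexicon is invented for illustration and is not the authors' classifier.

```python
# Toy mapping of feedback text onto Plutchik's eight primary emotions.
# The cue-word lexicon below is a hypothetical example, not the
# emotion model used in the paper.
import re

PLUTCHIK_PRIMARY = {
    "joy": {"happy", "delighted", "enjoyed", "fun"},
    "trust": {"reliable", "supportive", "helpful"},
    "fear": {"worried", "anxious", "overwhelmed"},
    "surprise": {"unexpected", "surprising"},
    "sadness": {"disappointed", "discouraged"},
    "disgust": {"boring", "tedious"},
    "anger": {"frustrated", "unfair", "annoyed"},
    "anticipation": {"excited", "curious", "eager"},
}


def plutchik_profile(text: str) -> dict[str, int]:
    """Count cue words for each primary emotion in a feedback comment."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {emotion: len(tokens & cues)
            for emotion, cues in PLUTCHIK_PRIMARY.items()}


print(plutchik_profile("I enjoyed the labs but felt overwhelmed and frustrated."))
```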
The authors demonstrate a proof of concept built with Gradio, an open-source Python library for building web interfaces, showcasing each module's functionality. The demonstration underscores the feasibility of an integrated GenAI system that addresses complex educational requirements within a user-friendly, scalable web application.
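A minimal reconstruction of such a Gradio layout might look like the sketch below; the three handlers are placeholders standing in for the modules described above, and the tab-based wiring is an assumption about the interface rather than the authors' exact design.

```python
# Minimal sketch of a three-module Gradio interface; the handler
# bodies are placeholders for the text, image, and file modules.
import gradio as gr


def text_module(question):
    return f"(ChatGPT API response to: {question})"


def image_module(image):
    return "# (Bard-style diagram-to-code output)"


def file_module(file):
    return {"mean_sentiment": 0.0, "keywords": []}  # placeholder analysis


with gr.Blocks(title="Multimodal Educational Chatbot") as demo:
    with gr.Tab("Text"):
        q = gr.Textbox(label="Ask a question")
        a = gr.Textbox(label="Answer")
        q.submit(text_module, q, a)
    with gr.Tab("Image"):
        img = gr.Image(label="Upload a diagram", type="filepath")
        code = gr.Code(label="Generated code")
        img.upload(image_module, img, code)
    with gr.Tab("File"):
        f = gr.File(label="Upload feedback document")
        summary = gr.JSON(label="Analysis")
        f.upload(file_module, f, summary)

demo.launch()
```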
The implications of this research are substantial, heralding a shift toward more dynamic and responsive educational environments facilitated by GenAI technologies. The combination of multimodal input capabilities and scalable, granular analysis marks significant progress toward more personalized, adaptable, and efficient educational processes. For future research, the integration of additional modalities such as voice and haptics could be explored to further augment interactive capabilities.
In conclusion, while multimodal conversational AI in education remains in its formative stages, the developments presented in this paper indicate a promising trajectory. The research lays a foundation for the practical application of GenAI in educational settings, with the potential to contribute to both the theory and practice of adaptive learning technologies.