Improving Students' Academic Performance with AI and Semantic Technologies (2206.03213v2)

Published 2 May 2022 in cs.CY and cs.AI

Abstract: Artificial intelligence and semantic technologies are evolving and have been applied in various research areas, including education. Higher Education institutions strive to improve students' academic performance, and early intervention for at-risk students and a reasonable curriculum are vital for students' success. Prior research opted for traditional machine learning models to predict students' performance. For curriculum semantic analysis, a comprehensive systematic review of the use of semantic technologies in the Computer Science curriculum found that the technologies used to measure similarity are limited in accuracy and suffer from ambiguity in how concepts, courses, etc. are represented. To fill these gaps, this study developed three implementations: predicting students' performance using marks from the previous semester, modeling course representations semantically and computing their similarity, and identifying the prerequisite relationship between two similar courses. For performance prediction, we combined a Genetic Algorithm with Long Short-Term Memory (LSTM) on a dataset from a Brazilian university containing 248,730 records. For similarity measurement, we used BERT to encode sentences and cosine similarity to measure how close two courses are. For prerequisite identification, TextRazor was applied to extract concepts from course descriptions, and SemRefD was then used to measure the degree of prerequisite dependency between two concepts. The outcomes of this study can be summarized as: (i) a result that improves on Manrique's work by 2.5% in dropout-prediction accuracy; (ii) uncovering the similarity between courses based on their descriptions; and (iii) identifying the prerequisite relationships among three compulsory courses of the School of Computing at ANU.

Summary

The paper demonstrates a novel GA+LSTM model that improved dropout prediction accuracy to 97.65% on the ARQ dataset.
The paper uses BERT sentence embeddings to measure course similarity, and TextRazor concept extraction combined with SemRefD to identify prerequisite relationships in university curricula.
The research offers practical applications for HEIs, including early intervention, improved curriculum design, and enhanced student advising.

This paper explores the use of AI and Semantic Technologies to improve student academic performance, focusing on two main areas: predicting student dropout and analyzing university curricula.

Problem & Motivation:

High student dropout rates are a significant concern for both Higher Education Institutions (HEIs) and students. Curriculum design, including course prerequisites and sequencing, also plays a vital role in student success and retention. Existing research had made limited use of deep learning for dropout prediction, and the methods used in curriculum analysis struggled to measure semantic similarity between courses accurately.

Objectives:

The research aimed to:

Predict student dropout using grades from previous semesters.
Model semantic representations of courses and compute similarity between them.
Identify prerequisite relationships (sequences) between similar courses.

Methodology:

Three main implementations were developed:

Dropout Prediction:
- Data: Academic records (grades, status, course info, etc.) of 5,582 students across 3 degrees (Information Systems - CSI, Management - ADM, Architecture - ARQ) from a Brazilian university (248,730 records, 2001-2009).
- Preprocessing: Data cleaning, handling missing values (using Random Forest imputation), converting categorical data, normalization (z-score), and addressing class imbalance (using SMOTE for the CSI dataset).
- Feature Selection: A Genetic Algorithm (GA) combined with Support Vector Machine (SVM) fitness evaluation was used to select the most relevant features from the initial 27.
- Prediction Model: A Long Short-Term Memory (LSTM) network, followed by a 3-layer Fully Connected (FC) network, was trained on the selected features represented as time series data (32 time steps per student, representing 8 semesters). Mean Squared Error (MSE) was used as the loss function with the Adam optimizer.
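The GA feature-selection step can be illustrated with a minimal, stdlib-only sketch. Each individual is a bit mask over the candidate features (1 = keep the column), and the GA evolves masks toward higher fitness. The paper scores masks with an SVM; to keep the example self-contained, `toy_fitness` is a hypothetical stand-in, and the operators chosen here (truncation selection, one-point crossover, bit-flip mutation, elitism), the population size and the mutation rate are assumptions rather than the paper's settings.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    history: List[float] = []
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        parents = scored[: pop_size // 2]       # truncation selection: keep the top half
        children: List[Mask] = [scored[0]]      # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_features):         # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(tuple(child))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical stand-in for "SVM accuracy on the selected columns":
# features 0, 3 and 5 are useful, and every kept feature costs a little.
USEFUL = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


best_mask, best_per_generation = ga_select_features(8, toy_fitness, seed=1)
```

In a real pipeline, `fitness` could wrap something like scikit-learn's `cross_val_score` with an `SVC` restricted to the selected columns and return the mean score (guarding against an all-zero mask); the surviving columns would then be arranged into per-student time series for the LSTM. Elitism guarantees the best fitness never decreases from one generation to the next.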
Course Similarity Measurement:
- Data: Course descriptions from the Australian National University (ANU) Computer Science ("COMP") program website.
- Encoding: Bidirectional Encoder Representations from Transformers (BERT) was used to generate contextual sentence embeddings (vectors) for each sentence in the course descriptions.
- Similarity Calculation: Cosine similarity was calculated between the sentence vectors. The overall similarity between two courses was determined by averaging the similarity scores across their sentences.
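The similarity step reduces to cosine similarity plus an aggregation rule. The sketch below assumes the aggregation is a plain mean over all sentence pairs of the two courses (the summary only says the scores are averaged), and the short vectors are toy stand-ins for BERT sentence embeddings, which for BERT-base have 768 dimensions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors (an assumption)
    return dot / (norm_u * norm_v)


def course_similarity(course_a: List[Vector], course_b: List[Vector]) -> float:
    """Mean cosine similarity over every (sentence in A, sentence in B) pair."""
    scores = [cosine(u, v) for u in course_a for v in course_b]
    return sum(scores) / len(scores)


# Toy "embeddings": course A has two sentences, course B has one.
print(course_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

Averaging over all pairs is simple but dilutes a strong match between two sentences with every unrelated pair; a best-match-per-sentence average is a common alternative if that dilution is a concern.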
Prerequisite Identification:
- Data: Same ANU course descriptions.
- Concept Extraction: TextRazor API was used to extract key concepts (entities) from the course descriptions.
- Prerequisite Measurement: SemRefD, a semantic extension of Reference Distance (RefD), was employed. SemRefD measures the prerequisite dependency between two concepts by querying the DBpedia knowledge graph, taking into account semantic properties and paths between concepts. The sum of SemRefD scores over the concepts extracted from two courses indicates their overall prerequisite relationship (e.g., whether Course A is a prerequisite for Course B).
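SemRefD's DBpedia-specific weighting is not reproduced here. As a rough illustration of the idea it extends, the sketch implements a link-based RefD with equal weights: RefD(A, B) is the fraction of A's linked concepts that themselves link to B, minus the fraction of B's linked concepts that link to A, so a positive value suggests B is a prerequisite of A. The link graph, the concept names and the `course_prereq_score` helper are all hypothetical; the helper only mirrors the summing step described in the methodology.

```python
from typing import Dict, Iterable, Set

Links = Dict[str, Set[str]]  # concept -> concepts its article links to


def refd(a: str, b: str, links: Links) -> float:
    """Equal-weight RefD(a, b); a positive value suggests b is a prerequisite of a."""

    def ref_share(x: str, y: str) -> float:
        # Fraction of the concepts x links to that themselves link to y.
        related = links.get(x, set())
        if not related:
            return 0.0
        return sum(1 for c in related if y in links.get(c, set())) / len(related)

    return ref_share(a, b) - ref_share(b, a)


def course_prereq_score(later: Iterable[str], earlier: Iterable[str], links: Links) -> float:
    """Sum of RefD over concept pairs; > 0 suggests `earlier` is a prerequisite of `later`."""
    earlier_list = list(earlier)
    return sum(refd(x, y, links) for x in later for y in earlier_list)


# Hypothetical toy link graph standing in for knowledge-graph links.
LINKS: Links = {
    "recursion": {"function", "stack", "base case"},
    "function": {"variable"},
    "stack": {"function"},
    "base case": {"function", "recursion"},
    "variable": set(),
}

print(refd("recursion", "function", LINKS))  # 2 of recursion's 3 neighbours link to "function"
```

By construction RefD(A, B) = -RefD(B, A), so the sign alone encodes the suggested direction of the dependency.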

Key Results & Contributions:

Dropout Prediction: The GA+LSTM model achieved high accuracy, reaching 97.65% on the ARQ dataset, an improvement of 2.45% over previous work by Manrique et al. (1903.10210). Performance varied slightly across the three datasets (ADM, ARQ, CSI), with feature selection identifying a different optimal subset for each. Some instability (multiple descent) was observed in the training loss, possibly due to dataset characteristics or hyperparameter choices (such as a high dropout rate in the network's dropout layers).
Systematic Review: A comprehensive review identified how Semantic Web and NLP technologies are used in CS curriculum analysis, highlighting limitations in existing similarity measures and inspiring the prerequisite identification approach.
Course Similarity: Heatmaps visualized similarity scores between ANU COMP courses. Foundational courses (e.g., COMP1110) showed higher average similarity within their level, while similarity decreased and differentiation increased at higher levels (2000, 3000, 4000), reflecting specialization.
Prerequisite Identification: Applied to three related ANU courses (COMP1100, COMP1110, COMP2100), the SemRefD analysis confirmed strong prerequisite relationships (COMP1100 -> COMP2100, COMP1110 -> COMP2100) and a weaker, more parallel relationship between COMP1100 and COMP1110, aligning with typical curriculum structure.
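The heatmap observation about within-level similarity amounts to averaging the off-diagonal entries of the course similarity matrix for courses that share a level. The sketch below assumes ANU-style codes with a four-letter prefix (so the first digit gives the level); the matrix values are invented for illustration and are not the paper's results.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def course_level(code: str) -> int:
    """'COMP2100' -> 2000; assumes a four-letter subject prefix."""
    return int(code[4]) * 1000


def within_level_means(codes: List[str], sim: List[List[float]]) -> Dict[int, float]:
    """Mean off-diagonal similarity among courses that share a level."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for i, j in combinations(range(len(codes)), 2):
        level = course_level(codes[i])
        if level == course_level(codes[j]):
            totals[level] += sim[i][j]
            counts[level] += 1
    return {level: totals[level] / counts[level] for level in counts}


# Invented symmetric similarity matrix (diagonal = 1.0).
CODES = ["COMP1100", "COMP1110", "COMP2100", "COMP2120"]
SIM = [
    [1.0, 0.8, 0.6, 0.5],
    [0.8, 1.0, 0.7, 0.4],
    [0.6, 0.7, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]
print(within_level_means(CODES, SIM))
```

A falling within-level mean at higher levels is the numeric counterpart of the greater differentiation visible in the heatmaps.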

Practical Implications & Future Work:

The developed techniques offer practical applications for HEIs:

Early Intervention: The dropout prediction model can identify at-risk students early, allowing for timely support.
Curriculum Analysis & Design: Similarity and prerequisite identification tools can help analyze existing curricula, ensure logical sequencing, identify overlaps or gaps, and inform redesign efforts.
Student Advising: These tools can aid advisors in guiding students through course selection.
Recommendation Systems: Combining semantic analysis and student performance data could power course recommendation systems.
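For the sequencing use case, identified prerequisite relationships can be treated as a directed graph and used to check a study plan. This is a minimal sketch assuming course-level edges; the prerequisite map below is hypothetical and only mirrors the COMP1100/COMP1110 -> COMP2100 relationships reported in the results. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical prerequisite map: course -> courses that should be taken before it.
PREREQS: Dict[str, Set[str]] = {"COMP2100": {"COMP1100", "COMP1110"}}


def plan_respects_prereqs(plan: List[List[str]], prereqs: Dict[str, Set[str]]) -> bool:
    """True if every course's prerequisites appear in an earlier semester of `plan`."""
    taken: Set[str] = set()
    for semester in plan:
        for course in semester:
            if not prereqs.get(course, set()) <= taken:
                return False
        taken.update(semester)  # courses count as completed only after their semester
    return True


def suggested_order(prereqs: Dict[str, Set[str]]) -> List[str]:
    """One valid course order; raises graphlib.CycleError if prerequisites are cyclic."""
    return list(TopologicalSorter(prereqs).static_order())


print(plan_respects_prereqs([["COMP1100", "COMP1110"], ["COMP2100"]], PREREQS))
```

A cycle reported by `TopologicalSorter` would flag a contradictory set of inferred prerequisites, which is exactly the kind of inconsistency curriculum designers would want surfaced.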

Future work includes refining the LSTM model (e.g., using dynamic time steps), using more balanced datasets for dropout prediction, and further developing tools for curriculum analysis and student support.