Cluster Model for parsimonious selection of variables and enhancing Students Employability Prediction (2407.16884v1)

Published 5 Jun 2024 in cs.CY, cs.AI, and cs.LG

Abstract: Educational Data Mining (EDM) is a promising field in which data mining is widely used to predict students' performance. One of the most prevalent recent challenges facing higher education is making students skillfully employable. Institutions possess large volumes of data, yet they are unable to extract knowledge from it and guide their students. Educational data is generally very large, multidimensional, and unbalanced, so extracting knowledge from it is a complicated task with its own set of problems. In this paper, data on Engineering and MCA (Masters in Computer Applications) students is collected from various universities and institutes across India. The dataset is large, unbalanced, and multidimensional. A cluster-based model is presented which, when applied at the preprocessing stage, helps in parsimonious selection of variables and improves the performance of predictive algorithms, thereby facilitating better prediction of students' employability.


Insights into a Cluster Model for Enhancing Students’ Employability Prediction

The paper "Cluster Model for parsimonious selection of variables and enhancing Students’ Employability Prediction" addresses a significant challenge faced by educational institutions: the improvement of student employability through data mining. The paper highlights the multifaceted nature of Educational Data Mining (EDM) and proposes a methodological innovation by employing a cluster-based model in the preprocessing phase of data analysis.

Core Contributions and Methodology

The authors focus on engineering and MCA students across India, aiming to refine predictive algorithms for employability forecasting. One of the principal challenges identified is the large, unbalanced, and multidimensional nature of educational datasets. Traditional predictive models often fall short due to these complexities, notably when the data classes are unbalanced and include numerous attributes.

To tackle this, the paper presents a cluster-based model utilized during the data preprocessing stage. The central thesis posits that such clustering leads to the more efficient selection of relevant variables, thereby enhancing the performance of various predictive algorithms. The dataset employed is substantial, with approximately 8,973 instances across 152 attributes, offering a robust basis for the analysis.
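As a rough illustration of how clustering at the preprocessing stage could thin out attributes, the sketch below groups binary attribute columns by Jaccard distance and keeps one representative per group. It is not the authors' RapidMiner workflow: the function names, the toy attribute names and values, and the use of a simple k-medoids loop (swapped in for k-means so that each cluster centre is an actual attribute) are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary (0/1) columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster_attributes(columns, k, max_iter=20):
    """Group attribute columns into k clusters with a basic k-medoids loop.

    `columns` maps attribute name -> list of 0/1 values (one per student).
    Returns a dict mapping each medoid attribute to its cluster members.
    """
    names = sorted(columns)
    dist = {(p, q): jaccard_distance(columns[p], columns[q])
            for p in names for q in names}
    medoids = names[:k]  # deterministic initialisation (an assumption)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each attribute joins its nearest medoid;
        # a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for n in names:
            if n in clusters:
                nearest = n
            else:
                nearest = min(medoids, key=lambda m: dist[(n, m)])
            clusters[nearest].append(n)
        # Update step: the new medoid minimises total distance to its members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[(c, o)] for o in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


# Hypothetical toy data: four binary attributes observed on six students.
toy = {
    "a_aptitude": [1, 1, 1, 0, 0, 0],
    "a_logic":    [1, 1, 1, 0, 0, 1],
    "b_comm":     [0, 0, 0, 1, 1, 1],
    "b_english":  [0, 0, 1, 1, 1, 1],
}
groups = cluster_attributes(toy, k=2)
selected = sorted(groups)  # one representative attribute per cluster
```

On this toy input the two aptitude-style columns end up in one group and the two communication-style columns in the other, so a downstream classifier would be trained on two attributes instead of four. How the paper itself maps clusters to selected variables is not described in this summary.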

Algorithms and Findings

In the comparative analysis, the authors applied several well-known predictive algorithms, including k-Nearest Neighbors (k-NN), Naive Bayes Kernel, Decision Trees, Neural Networks (such as Perceptron and AutoMLP), and Support Vector Machines (SVM). The findings reveal that initial predictive performance was suboptimal due to the dataset's challenges. However, the introduction of a clustering step at preprocessing significantly increased prediction effectiveness.

The paper uses RapidMiner Studio for implementation, applying k-means clustering with Jaccard similarity to handle dimensionality and improve attribute selection. The cluster model showed a marked improvement over both the direct application of predictive algorithms and the use of Principal Component Analysis (PCA) for dimensionality reduction, with significant gains in F1 score and Kappa values. These two metrics matter on unbalanced data, where plain accuracy can look high even when the minority class is predicted poorly.
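To make the role of F1 and Kappa concrete, the following minimal sketch computes accuracy, F1, and Cohen's kappa from a binary confusion matrix. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper's results; they only show how the metrics behave on an unbalanced class split.

```python
def accuracy_f1_kappa(tp, fp, fn, tn):
    """Return (accuracy, F1, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Agreement expected by chance, from the marginal totals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return accuracy, f1, kappa


# Hypothetical case 1: 1000 students, 50 in the minority class,
# and a model that always predicts the majority class.
majority_only = accuracy_f1_kappa(tp=0, fp=0, fn=50, tn=950)

# Hypothetical case 2: a model that finds 40 of the 50 minority cases
# at the cost of 10 false positives.
balanced_model = accuracy_f1_kappa(tp=40, fp=10, fn=10, tn=940)
```

In the first case accuracy is 0.95 while both F1 and kappa are 0, which is why accuracy alone can mislead on unbalanced data. In the second case accuracy rises only slightly (to 0.98), but F1 reaches 0.8 and kappa is roughly 0.79.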

Implications and Speculative Outlook

The implications of this research are twofold: practical and theoretical. Practically, the enhanced predictive performance can aid educational institutions in identifying students at risk of unemployment and tailoring interventions accordingly. Theoretically, this research presents a step forward in EDM methodologies by illustrating the benefits of clustering in pre-processing large, complex datasets.

The authors suggest that the model’s adaptability and success offer fertile ground for future work, particularly in exploring clustering's role in other domains of educational data and its generalizability across diverse educational contexts. Further investigations could integrate other clustering techniques or hybrid models to explore their potential in refining predictive accuracy. Continuous enhancement of such models could prove pivotal as data volumes and complexity increase in educational settings.

In conclusion, while the paper doesn't offer a singular transformative innovation, it presents a well-reasoned and empirically validated approach to tackling a pervasive challenge in EDM. It opens avenues for further exploration in the utilization of pre-processing steps to bolster the performance of predictive algorithms in the domain of student employability.


Authors

Pooja Thakar
Anil Mehta
Manisha
