Data Mining: A prediction for performance improvement using classification (1201.3418v1)

Published 17 Jan 2012 in cs.IR

Abstract: Nowadays the amount of data stored in educational databases is increasing rapidly. These databases contain hidden information for improvement of students' performance. The performance in higher education in India is a turning point in the academics for all students. This academic performance is influenced by many factors; therefore it is essential to develop a predictive data mining model for students' performance so as to identify the difference between high learners and slow learners. In the present investigation, an experimental methodology was adopted to generate a database. The raw data was preprocessed in terms of filling up missing values, transforming values in one form into another and relevant attribute/variable selection. As a result, we had 300 student records, which were used to construct a Bayes classification prediction model. Keywords- Data Mining, Educational Data Mining, Predictive Model, Classification.

Authors (2)

Brijesh Kumar Bhardwaj (2 papers)
Saurabh Pal (12 papers)


Summary

Data Mining: Predicting Academic Performance Using Classification

The paper by Brijesh Kumar Bhardwaj and Saurabh Pal explores the application of data mining techniques, specifically classification, to predict student performance. This research is framed around the increasing volume of data available in educational databases and leverages these data to model academic outcomes, particularly in the context of higher education in India.

Methodological Approach

The authors implemented a structured data mining process, intending to distinguish between high and low academic achievers. The methodology involves:

Data Collection and Preprocessing: Data were gathered from degree colleges affiliated with Dr. R. M. L. Awadh University, focusing on students pursuing a Bachelor of Computer Applications (BCA). The dataset, comprising 300 student records, underwent preprocessing which included handling missing values and selecting relevant attributes for analysis.
Variable Identification and Selection: The paper identified various predictive variables categorized under demographic, academic, and socio-economic characteristics. Key high-potential variables influencing student performance included Senior Secondary grades (GSS), living location (LLoc), and medium of teaching (Med).
Classification Technique Implementation: The research employs a Bayesian Classification algorithm for the prediction model. Bayesian methods, with their probabilistic underpinnings and capacity for handling missing data, serve as an efficient choice for this educational context. The model was trained and validated using MATLAB.
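
The preprocessing and classification steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a naive Bayes classifier over categorical attributes with Laplace smoothing, fills missing values with the most common observed value, and uses made-up records. The attribute names GSS, LLoc, and Med come from the summary; the attribute values, the "Result" label, and the helper names fill_missing, train, and predict are hypothetical.

```python
from collections import Counter, defaultdict
import math

ATTRIBUTES = ["GSS", "LLoc", "Med"]  # names from the summary
LABEL = "Result"                      # hypothetical label column

def fill_missing(records, attributes):
    """Replace None with the most common observed value of each attribute."""
    filled = [dict(r) for r in records]
    for a in attributes:
        observed = [r[a] for r in records if r[a] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[a] is None:
                r[a] = mode
    return filled

def train(records, attributes, label):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[label] for r in records)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> set of seen values
    for r in records:
        for a in attributes:
            value_counts[(a, r[label])][r[a]] += 1
            domains[a].add(r[a])
    return class_counts, value_counts, domains

def predict(model, record, attributes):
    """Return normalized posterior probabilities for each class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for a in attributes:
            count = value_counts[(a, c)][record[a]]
            # Laplace smoothing avoids zero probabilities for unseen values.
            score += math.log((count + 1) / (n_c + len(domains[a])))
        log_scores[c] = score
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}

# Made-up records for illustration only; None marks a missing value.
records = [
    {"GSS": "A", "LLoc": "Town", "Med": "English", "Result": "Pass"},
    {"GSS": "A", "LLoc": "Village", "Med": "English", "Result": "Pass"},
    {"GSS": "B", "LLoc": "Town", "Med": None, "Result": "Pass"},
    {"GSS": "C", "LLoc": "Village", "Med": "Hindi", "Result": "Fail"},
    {"GSS": "C", "LLoc": "Village", "Med": "Hindi", "Result": "Fail"},
    {"GSS": "B", "LLoc": "Village", "Med": "Hindi", "Result": "Fail"},
]

filled = fill_missing(records, ATTRIBUTES)
model = train(filled, ATTRIBUTES, LABEL)
posterior = predict(model, {"GSS": "A", "LLoc": "Town", "Med": "English"}, ATTRIBUTES)
print(posterior)
```

The returned posterior gives one probability per class, so an institution could flag students whose predicted probability of a poor outcome exceeds a chosen threshold.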

Key Findings

The results highlight several significant attributes that influence academic performance:

Senior Secondary Grades (GSS): This was found to be the most substantial predictor (probability value: 0.8642) of student success in higher education.
Living Location (LLoc) and Medium of Teaching (Med): These factors also exhibited considerable influence on student outcomes, with probability values of 0.7862 and 0.7225 respectively.
Socio-economic and Parental Factors: Attributes such as the mother's qualification, family income, and family status also showed noteworthy associations with student performance, albeit with slightly lower probabilities.

Implications and Future Directions

The implications of this paper are twofold: practical and theoretical. Practically, the predictive model can help educational institutions identify students who may require additional support, thereby informing targeted interventions. Theoretically, the paper contributes to the field of Educational Data Mining (EDM) by demonstrating that Bayesian classification techniques can be applied to educational datasets, and it leaves room for future research to explore other machine learning models.

In terms of future developments, more comprehensive datasets that include real-time analytics and diverse learning behavior data could enhance the robustness and predictive accuracy of such models. Additionally, more advanced machine learning methods such as ensemble learning could improve prediction outcomes.

This paper serves as a valuable reference point for researchers interested in the intersection of data mining and educational performance prediction, paving the way for further exploration into optimizing academic outcomes through technological interventions.
