
A CHAID Based Performance Prediction Model in Educational Data Mining (1002.1144v1)

Published 5 Feb 2010 in cs.LG

Abstract: The performance in higher secondary school education in India is a turning point in the academic lives of all students. As this academic performance is influenced by many factors, it is essential to develop a predictive data mining model for students' performance so as to identify the slow learners and study the influence of the dominant factors on their academic performance. In the present investigation, a survey cum experimental methodology was adopted to generate a database and it was constructed from a primary and a secondary source. While the primary data was collected from the regular students, the secondary data was gathered from the school and office of the Chief Educational Officer (CEO). A total of 1000 datasets of the year 2006 from five different schools in three different districts of Tamil Nadu were collected. The raw data was preprocessed in terms of filling up missing values, transforming values in one form into another and relevant attribute/variable selection. As a result, we had 772 student records, which were used for CHAID prediction model construction. A set of prediction rules were extracted from the CHAID prediction model and the efficiency of the generated CHAID prediction model was found. The accuracy of the present model was compared with other models and it has been found to be satisfactory.

Authors (2)
  1. M. Ramaswami (2 papers)
  2. R. Bhaskaran (5 papers)
Citations (203)

Summary

  • The paper presents a CHAID-based model that predicts higher secondary academic success using robust educational data mining methods.
  • Methodology involved refining a dataset of 772 student records from Tamil Nadu through meticulous data preprocessing and variable selection.
  • Key predictors such as medium of instruction and previous academic performance guide proactive interventions to enhance student outcomes.

Analysis of a CHAID-Based Performance Prediction Model in Educational Data Mining

The paper presents an investigation into the development of a performance prediction model for higher secondary students' academic achievement in Indian schools using the Chi-squared Automatic Interaction Detector (CHAID) algorithm. The impetus behind this research stems from the need to accurately identify students who may underperform, allowing educational stakeholders to intervene proactively. Leveraging educational data mining, this paper endeavors to uncover which factors most significantly influence student performance.

Methodological Approach

The authors used a survey-cum-experimental methodology to build a dataset from both primary and secondary sources. It initially comprised 1,000 student records from the year 2006, drawn from five schools across three districts in Tamil Nadu. Preprocessing (filling in missing values, transforming values from one form into another, and selecting relevant variables) left 772 student records, which were used to construct the CHAID model.

Data were gathered primarily through a detailed questionnaire, enriched with secondary information from educational institutions and relevant officials. This methodological rigor ensured a robust framework to analyze predictors of academic achievement at the higher secondary level.
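The preprocessing steps named above (filling missing values, transforming values, selecting attributes) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names, the mark cut-offs, and the toy records are invented, and dropping incomplete rows is just one possible way a dataset could shrink; the paper does not say this is how 1,000 records became 772.

```python
from collections import Counter

# Hypothetical raw records; field names and values are invented for illustration.
raw = [
    {"roll_no": 1, "medium": "Tamil", "prev_marks": 412},
    {"roll_no": 2, "medium": None, "prev_marks": 355},
    {"roll_no": 3, "medium": "Tamil", "prev_marks": None},
    {"roll_no": 4, "medium": "English", "prev_marks": 468},
]


def fill_missing_with_mode(rows, field):
    """Replace None in a categorical field with the most common observed value."""
    observed = [r[field] for r in rows if r[field] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, field: mode if r[field] is None else r[field]} for r in rows]


def to_band(marks, cutoffs=(400, 450)):
    """Transform a numeric mark into a categorical band (cut-offs are assumed)."""
    low, high = cutoffs
    if marks < low:
        return "low"
    if marks < high:
        return "medium"
    return "high"


# 1. Fill missing categorical values.
rows = fill_missing_with_mode(raw, "medium")
# 2. Drop records whose numeric field cannot be recovered.
rows = [r for r in rows if r["prev_marks"] is not None]
# 3. Transform marks into bands and keep only the selected attributes
#    (the identifier roll_no carries no predictive information).
cleaned = [{"medium": r["medium"], "prev_band": to_band(r["prev_marks"])} for r in rows]

for record in cleaned:
    print(record)
```

Converting numeric fields into bands matters for a CHAID-style model because the chi-square test it relies on operates on categorical variables.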

Key Findings

The application of the CHAID algorithm resulted in a prediction model that delineates the interplay between several critical variables affecting student performance. Through feature selection using Chi-square tests, the paper identified high-impact predictors such as the medium of instruction, previous academic performance, location of the school, type of secondary education, and socio-economic variables like parental education and income. These findings were instrumental in elucidating the various factors contributing to student success or underperformance.
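To make the chi-square step concrete, the sketch below computes the Pearson chi-square statistic for the association between one categorical predictor and the outcome. It is a minimal illustration, not the authors' implementation: the toy counts are invented, and a full CHAID procedure also merges similar categories and compares Bonferroni-adjusted p-values rather than raw statistics.

```python
from collections import Counter


def chi_square(pairs):
    """Return (statistic, degrees_of_freedom) for (predictor, outcome) pairs."""
    n = len(pairs)
    row_totals = Counter(p for p, _ in pairs)
    col_totals = Counter(o for _, o in pairs)
    observed = Counter(pairs)

    statistic = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected count if predictor and outcome were independent.
            expected = r_total * c_total / n
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical toy data: (medium of instruction, result).
by_medium = (
    [("English", "pass")] * 30 + [("English", "fail")] * 10
    + [("Tamil", "pass")] * 20 + [("Tamil", "fail")] * 40
)
# A second hypothetical predictor with no association to the result.
by_shift = (
    [("morning", "pass")] * 25 + [("morning", "fail")] * 25
    + [("evening", "pass")] * 25 + [("evening", "fail")] * 25
)

stat_medium, dof_medium = chi_square(by_medium)
stat_shift, dof_shift = chi_square(by_shift)
print(f"medium: chi2={stat_medium:.3f}, dof={dof_medium}")
print(f"shift:  chi2={stat_shift:.3f}, dof={dof_shift}")
```

With one degree of freedom, the 5% critical value is about 3.84, so in this toy data the first predictor would be judged significant and the second would not. A tree builder in the CHAID style would split on the most significant predictor first.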

The results indicated that medium of instruction and previous academic achievement, in particular, were strong predictors of outcomes at the higher secondary level, in line with existing literature on predictors of educational performance. The CHAID model reached a classification accuracy of 44.69%, which the authors judged satisfactory relative to the other models they compared.
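A multi-class accuracy figure is easier to interpret next to a simple baseline, such as always predicting the most frequent class. The sketch below shows how both numbers come out of a confusion matrix. The class labels and counts are invented for illustration; they are not the paper's results.

```python
def accuracy(confusion):
    """confusion[actual][predicted] -> count; returns the fraction classified correctly."""
    correct = sum(row.get(label, 0) for label, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


def majority_baseline(confusion):
    """Accuracy of always predicting the most frequent actual class."""
    class_sizes = [sum(row.values()) for row in confusion.values()]
    return max(class_sizes) / sum(class_sizes)


# Hypothetical three-class confusion matrix (rows: actual, columns: predicted).
confusion = {
    "high":   {"high": 12, "medium": 6,  "low": 2},
    "medium": {"high": 5,  "medium": 15, "low": 10},
    "low":    {"high": 3,  "medium": 9,  "low": 18},
}

print(f"accuracy: {accuracy(confusion):.4f}")
print(f"majority-class baseline: {majority_baseline(confusion):.4f}")
```

When the outcome has several grade categories, an accuracy below 50% can still be well above the baseline, which is why the comparison is worth reporting.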

Implications and Future Directions

Practically, the findings give educational practitioners actionable insights for tailoring interventions, focusing resources, and creating environments conducive to academic achievement. They also point to the potential for applying such models in broader and more diverse educational contexts.

Theoretically, this research contributes to the discourse on the efficacy of different data mining algorithms in educational settings, juxtaposing CHAID with other methodologies like Decision Trees, Naïve Bayes, and Neural Networks. The comparative analysis underscores that while CHAID can handle small and unbalanced datasets, further enhancement in predictive performance can be sought through advanced techniques such as Boosting and Bagging, or hybrid models that integrate more sophisticated variable selection mechanisms.
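As a rough illustration of the bagging idea mentioned above, the sketch below trains several copies of a very simple base learner on bootstrap resamples and combines them by majority vote. The assumptions are loud ones: the base learner is a one-attribute lookup rule standing in for a tree (it is not CHAID), the data are invented, and the function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_rule(sample):
    """Base learner: map each predictor value to its most common outcome."""
    by_value = {}
    for value, outcome in sample:
        by_value.setdefault(value, []).append(outcome)
    # Fallback for predictor values absent from this bootstrap sample.
    default = majority([outcome for _, outcome in sample])
    table = {value: majority(outcomes) for value, outcomes in by_value.items()}
    return lambda value: table.get(value, default)


def bagging(data, n_models=25, seed=0):
    """Fit n_models base learners on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_rule(rng.choices(data, k=len(data))) for _ in range(n_models)]
    return lambda value: majority([model(value) for model in models])


# Hypothetical toy data: (medium of instruction, result).
data = (
    [("English", "pass")] * 30 + [("English", "fail")] * 10
    + [("Tamil", "pass")] * 20 + [("Tamil", "fail")] * 40
)

predict = bagging(data)
for medium in ("English", "Tamil"):
    print(medium, "->", predict(medium))
```

Bagging mainly helps when the base learner is unstable, as decision trees tend to be on small samples; with a learner this simple the ensemble adds little, so the example shows the mechanics rather than a gain in accuracy.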

Despite the model's utility, the authors acknowledge limitations arising from the geographical specificity of the dataset and suggest that generalization requires larger, more varied samples. Future research may address this by incorporating more diverse educational data to improve the model's robustness and breadth of application.

Overall, the paper advances the understanding of educational performance prediction, offering a nuanced perspective on applying CHAID in educational data mining and setting a precedent for further explorations in predictive educational analytics.