Classification of Heart Disease Using K- Nearest Neighbor and Genetic Algorithm (1508.02061v1)

Published 7 May 2015 in cs.CY and cs.DB

Abstract: Data mining techniques have been widely used to mine knowledgeable information from medical databases. In data mining, classification is a supervised learning technique that can be used to design models describing important data classes, where the class attribute is involved in the construction of the classifier. Nearest neighbor (KNN) is a very simple, popular, efficient and effective algorithm for pattern recognition. KNN is a straightforward classifier, where samples are classified based on the class of their nearest neighbor. Medical databases are high volume in nature. If the data set contains redundant and irrelevant attributes, classification may produce less accurate results. Heart disease is the leading cause of death in India. In Andhra Pradesh, heart disease was the leading cause of mortality, accounting for 32% of all deaths, a rate as high as Canada (35%) and USA. Hence there is a need to define a decision support system that helps clinicians decide to take precautionary steps. In this paper we propose a new algorithm which combines KNN with a genetic algorithm for effective classification. Genetic algorithms perform global search in complex, large and multimodal landscapes and provide an optimal solution. Experimental results show that our algorithm enhances the accuracy in diagnosis of heart disease.

Summary

The paper introduces a novel hybrid method combining KNN and GA to enhance heart disease classification by pruning irrelevant attributes.
The GA component optimizes feature selection, leading to a notable 15% improvement in cross-validation accuracy for heart disease diagnosis.
Robust evaluation across seven datasets, including regional data from Andhra Pradesh, demonstrates the method’s practical value in clinical decision support.

Classification of Heart Disease Using KNN and Genetic Algorithm

The paper presented by M. Akhil Jabbar et al. discusses a novel approach for enhancing the classification accuracy of heart disease diagnosis through the integration of the K-Nearest Neighbor (KNN) algorithm with a Genetic Algorithm (GA). The motivation underlying this research stems from heart disease being a leading cause of mortality in Andhra Pradesh, India, necessitating improved decision support systems for clinicians.

This work introduces a hybrid classification algorithm aimed at reducing the impact of the redundant and irrelevant attributes often found in medical datasets, which can otherwise lower classification accuracy. The proposed method works in two steps: first, it applies genetic search to rank attributes by their contribution to classification accuracy; second, it builds a KNN-based classifier using only the most relevant attributes.

Theoretical and Methodological Overview

K-Nearest Neighbor (KNN): Known for its simplicity and effectiveness in pattern recognition, KNN is a non-parametric classification method reliant on distance measures like Euclidean distance. Despite its efficacy, KNN's performance is influenced by the choice of the 'k' parameter and the presence of irrelevant features in the dataset.
Genetic Algorithm (GA): GA is an optimization method inspired by biological evolution that uses crossover and mutation operators to explore a solution space. It is well suited to multimodal landscapes, where it performs a global search rather than settling on the nearest local optimum.
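
The KNN decision rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names (`euclidean`, `knn_predict`), the toy two-feature data, and the choice of k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair each training sample with its label and sort by distance to the query.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled 0 and 1.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]

prediction_near_origin = knn_predict(train_X, train_y, [1.1, 1.0], k=3)
prediction_far = knn_predict(train_X, train_y, [5.1, 5.0], k=3)
```

Because every feature contributes to the distance, an irrelevant attribute with a large range can dominate the neighbor ranking, which is the weakness the GA step is meant to address.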

Integrating GA into the KNN framework addresses KNN's sensitivity to irrelevant attributes through feature selection. In the proposed method, GA identifies and retains the attributes that improve classification, thereby pruning unhelpful attributes (not data samples), which in turn improves the KNN model's accuracy.
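
One common way to combine the two algorithms is a wrapper: each GA individual is a binary mask over the attributes, and its fitness is the accuracy of KNN restricted to the selected attributes. The sketch below follows that pattern under stated assumptions. The summary does not specify the paper's encoding, operators, or parameters, so the mask encoding, tournament selection, single-point crossover, elitism, leave-one-out fitness, synthetic data, and all names and default values here are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fitness(mask, X, y, k=3):
    """Leave-one-out KNN accuracy using only the features where mask is 1."""
    if not any(mask):
        return 0.0  # an empty feature set cannot classify anything
    Xp = [[v for v, keep in zip(row, mask) if keep] for row in X]
    hits = 0
    for i in range(len(Xp)):
        rest_X, rest_y = Xp[:i] + Xp[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, Xp[i], k) == y[i]
    return hits / len(Xp)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for a feature mask with high KNN accuracy (needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n - 1)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)

# Synthetic data (an assumption): features 0-1 are informative, 2-3 are noise.
data_rng = random.Random(42)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label * 3 + data_rng.gauss(0, 0.5),
              label * 3 + data_rng.gauss(0, 0.5),
              data_rng.uniform(-20, 20),
              data_rng.uniform(-20, 20)])
    y.append(label)

baseline = fitness([1, 1, 1, 1], X, y)   # plain KNN on all features
best_mask, best_score = ga_select(X, y)
```

Seeding the population with the all-features mask and keeping the best individual each generation means the returned mask can never score below plain KNN on the same fitness measure. Note that this fitness is measured on the data used for selection, so a held-out set is still needed for an unbiased accuracy estimate.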

Empirical Evaluation and Results

The proposed KNN+GA approach is evaluated on seven datasets: six from the UCI repository and one heart disease dataset collected in Andhra Pradesh. The evaluation shows higher classification accuracy on several of these datasets when GA is used to select the attributes than when KNN is applied to all attributes.

Reported results include a 15% improvement in cross-validation accuracy for heart disease classification on the Andhra Pradesh dataset, compared to KNN without GA, and a 5% improvement when evaluating on the full training set. The authors report results for various attribute configurations and different values of k to show the conditions under which the proposed method outperforms traditional KNN and other algorithms.
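
The two evaluation modes mentioned above measure different things, which a small sketch can make concrete. It assumes 10-fold cross-validation and synthetic overlapping classes; the fold count, data, and function names are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, q, k) == t
               for q, t in zip(test_X, test_y))
    return hits / len(test_X)

def cross_val_accuracy(X, y, k, folds=10, seed=0):
    """Mean accuracy over `folds` disjoint held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test_idx = idx[f::folds]                 # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        scores.append(accuracy([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], [y[i] for i in test_idx], k))
    return sum(scores) / folds

# Synthetic, partially overlapping classes (an assumption for illustration).
rng = random.Random(7)
X = [[(i % 2) * 1.0 + rng.gauss(0, 1.0), (i % 2) * 1.0 + rng.gauss(0, 1.0)]
     for i in range(60)]
y = [i % 2 for i in range(60)]

train_acc = accuracy(X, y, X, y, k=1)        # evaluated on the training set itself
cv_acc = cross_val_accuracy(X, y, k=1)       # evaluated on held-out folds
```

With k = 1 and no duplicate points, every training sample is its own nearest neighbor, so training-set accuracy is trivially perfect, while cross-validation scores each sample without letting it vote for itself. This is why cross-validation figures are usually the more informative of the two.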

Conclusions and Implications

The paper suggests that the proposed KNN+GA method improves heart disease classification accuracy by combining the strengths of both algorithms. Feature selection reduces the noise and redundancy that impair KNN's performance, while GA supplies a robust search over the large space of attribute subsets. The hybrid method also makes the resulting model easier to interpret, because it relies only on the most impactful features.

In terms of future directions, the integration of GA could be expanded with additional evolutionary strategies or hybridized with other classification methods, potentially improving its applicability across disparate medical conditions beyond heart disease. Furthermore, the dataset-focused approach offers an adaptable framework for personalized medical diagnostics by accommodating demographic-specific data, such as those from Andhra Pradesh.

This research contribution underlines the potential for evolutionary algorithms to not only enhance traditional machine learning classifiers but also pave the way for more accurate and efficient medical decision-support systems.
