On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset (1711.07831v4)

Published 20 Nov 2017 in cs.LG and stat.ML

Abstract: This paper presents a comparison of six ML algorithms: GRU-SVM (Agarap, 2017), Linear Regression, Multilayer Perceptron (MLP), Nearest Neighbor (NN) search, Softmax Regression, and Support Vector Machine (SVM) on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset (Wolberg, Street, & Mangasarian, 1992) by measuring their classification test accuracy and their sensitivity and specificity values. The said dataset consists of features which were computed from digitized images of FNA tests on a breast mass (Wolberg, Street, & Mangasarian, 1992). For the implementation of the ML algorithms, the dataset was partitioned in the following fashion: 70% for training phase, and 30% for the testing phase. The hyper-parameters used for all the classifiers were manually assigned. Results show that all the presented ML algorithms performed well (all exceeded 90% test accuracy) on the classification task. The MLP algorithm stands out among the implemented algorithms with a test accuracy of ~99.04%.

Citations (212)


Summary

The paper compares six ML algorithms, emphasizing classification accuracy and performance in breast cancer detection.
It employs diverse methods, including a hybrid GRU-SVM and classical models, to explore benefits and limitations in diagnosis.
Results show the MLP model reaching approximately 99% accuracy, highlighting the potential of ML for clinical diagnostics.

Application of Machine Learning Algorithms on Breast Cancer Detection

The paper "On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset" by Abien Fred M. Agarap presents a comparative study of six machine learning algorithms applied to breast cancer diagnosis using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The study evaluates the algorithms on their classification test accuracy, sensitivity, and specificity, showing how well each performs in a critical medical application.
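The three evaluation measures named above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the paper's code: the function name `classification_metrics`, the convention that label 1 marks the positive class, and the toy labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumption: `positive` marks the positive class; every other label
    is treated as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Sensitivity (true positive rate): share of positives that were caught.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity (true negative rate): share of negatives correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Toy labels, invented for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting because a missed positive case and a false alarm carry different costs, and accuracy alone does not distinguish between them.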

Methodology and Algorithms

The paper employs a diverse set of machine learning algorithms: GRU-SVM, Linear Regression, Multilayer Perceptron (MLP), Nearest Neighbor search, Softmax Regression, and Support Vector Machine (SVM). These algorithms were trained and tested using a 70/30 data split. The choice of these algorithms reflects an exploration of both classical and modern approaches. Notably, the GRU-SVM model combines a gated recurrent unit (GRU) with a support vector machine (SVM), proposing a hybrid approach to classification.
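A 70/30 partition like the one described above can be sketched with a seeded shuffle of sample indices. This is an illustrative sketch only: the helper name `train_test_split_indices`, the seed, and the use of a simple unstratified shuffle are assumptions, and the sample count of 569 comes from the public WDBC dataset rather than from this summary.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into train and test parts.

    Assumption: a plain random shuffle without stratification; the
    summary does not say how the paper drew its partition.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 569 is the size of the public WDBC dataset (not stated in the summary).
train_idx, test_idx = train_test_split_indices(569)
```

With a single split of this kind, the reported test accuracy depends on which samples happen to land in the test portion, so fixing the seed is what makes a run repeatable.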

For preprocessing, the dataset is standardized so that all features are on a comparable scale, a common practice in machine learning that helps gradient-based models converge and can reduce training time. The paper also specifies the hyper-parameters for each algorithm, which were assigned manually rather than found by an automated search. The implementation used TensorFlow and other scientific computing libraries.
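Standardization in this sense usually means rescaling each feature to zero mean and unit variance. The sketch below assumes the statistics are estimated on the training rows only and then reused for the test rows; the summary does not say whether the paper did it this way, and the function names and toy values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against constant features: fall back to 1.0 to avoid dividing by 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling, (x - mean) / std, feature by feature."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy rows with two features on very different scales (invented values).
train = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
test = [[12.0, 260.0]]

means, stds = fit_standardizer(train)        # statistics from training data only
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)   # reuse the training statistics
```

Fitting the scaler on the training portion alone keeps information about the test samples out of the preprocessing step.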

Results and Performance Analysis

All algorithms exceeded 90% test accuracy, with the MLP achieving the highest at approximately 99.04%. The strong performance of the MLP might be attributed to its hidden layers and non-linear activation functions, such as ReLU, which let it approximate complex decision functions. Linear classifiers such as the SVM also performed well, which is consistent with the WDBC dataset being close to linearly separable.

The GRU-SVM model achieved an accuracy of 93.75%, above the 90% mark but well below the MLP. The paper hypothesizes that the recurrent structure and non-linear transformations introduced by the GRU may have made it harder to exploit the linearly separable nature of the data. Furthermore, the sensitivity of recurrent neural networks to weight initialization could have affected the consistency of its results.
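The summary does not spell out how the SVM is attached to the GRU. One common design for such hybrids, assumed here rather than taken from the text, is to feed the network's final scores into a margin-based objective such as the squared hinge (L2-SVM) loss. The sketch below shows only that loss on precomputed scores; the function name, the penalty parameter `c`, and the toy numbers are illustrative.

```python
def l2_svm_loss(scores, labels, weights, c=1.0):
    """Squared hinge loss plus an L2 penalty on the weights.

    Assumptions: `labels` are in {-1, +1}, `scores` are the raw outputs
    of the final layer, and `c` trades margin violations off against
    the size of the weights.
    """
    regularizer = 0.5 * sum(w * w for w in weights)
    # A sample contributes only if it is misclassified or inside the margin.
    violations = sum(max(0.0, 1.0 - y * s) ** 2 for s, y in zip(scores, labels))
    return regularizer + c * violations


# Toy values, invented for illustration.
scores = [2.0, 0.5, -0.5]
labels = [1, 1, -1]
weights = [1.0, -1.0]
loss = l2_svm_loss(scores, labels, weights)
```

In this example the first sample lies outside the margin and adds nothing, while the other two are correctly classified but inside the margin, so each adds a small penalty.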

Implications and Future Directions

The results support the efficacy of ML algorithms for breast cancer classification on the WDBC dataset, although they rest on a single dataset and a single train/test split. In practice, integrating such models into diagnostic workflows could assist clinicians in early detection, potentially improving patient outcomes. The side-by-side comparison also gives practitioners a sense of which algorithms suit data with these characteristics.

On the theoretical side, this research contributes to the understanding of how hybrid models like GRU-SVM perform in classification tasks, providing a foundation for further exploration into combining recurrent networks with other classification techniques.

In the future, extending this study with cross-validation and hyper-parameter optimization could yield more robust evaluations. Moreover, experimenting with alternative datasets and expanding the range of algorithms could further refine the understanding of ML applications in medical diagnostics.
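Cross-validation replaces the single train/test partition with several, so that every sample is used for testing exactly once. The following is a minimal sketch of k-fold index generation; the helper name `k_fold_indices`, the choice of k = 10, and the seed are assumptions, and 569 is the size of the public WDBC dataset rather than a figure from this summary.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 569 is the size of the public WDBC dataset (not stated in the summary).
splits = list(k_fold_indices(569, k=10))
```

Averaging a metric over the k test folds, and reporting its spread, gives a more stable estimate than one 70/30 split.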

In summary, the paper examines six machine learning models on a breast cancer classification task, reporting quantitative results alongside qualitative observations about why the models behave as they do. Studies of this kind help clarify how such models might be integrated into medical research and practice.
