A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE)
Abstract: The paper proposes Quantum-SMOTE, a method that uses quantum computing techniques to address the prevalent problem of class imbalance in machine learning datasets. Inspired by the Synthetic Minority Oversampling Technique (SMOTE), Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation. It differs from the conventional SMOTE algorithm, which relies on K-Nearest Neighbors (KNN) and Euclidean distances, in that synthetic instances are generated from minority class data points without relying on neighbor proximity. The algorithm offers greater control over synthetic data generation by introducing hyperparameters such as rotation angle, minority percentage, and splitting factor, which allow it to be tailored to specific dataset requirements. Because it uses a compact swap test, the algorithm can accommodate a large number of features. The approach is tested on a public Telecom Churn dataset and evaluated with two prominent classification algorithms, Random Forest and Logistic Regression, to determine its impact across varying proportions of synthetic data.
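The two quantum primitives named in the abstract can be illustrated with a small classical simulation. The sketch below is not the paper's implementation: it only reproduces the ideal swap-test statistic, P(ancilla = 0) = 1/2 + 1/2 |<a|b>|^2, and applies a plane rotation to an amplitude-normalized minority point. The function names (`swap_test_p0`, `overlap_from_p0`, `rotate_pair`), the sample vectors, and the choice of which two features to rotate are all assumptions made for illustration.

```python
import numpy as np


def normalize(v):
    """Scale a real feature vector to unit norm (amplitude-style encoding)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return v / norm


def swap_test_p0(a, b):
    """Ideal swap-test probability of reading the ancilla as 0.

    For normalized states |a> and |b> this is 1/2 + 1/2 * |<a|b>|^2.
    """
    a, b = normalize(a), normalize(b)
    return 0.5 + 0.5 * float(np.dot(a, b)) ** 2


def overlap_from_p0(p0):
    """Recover the squared overlap |<a|b>|^2 from the ancilla statistic."""
    return 2.0 * p0 - 1.0


def rotate_pair(v, i, j, theta):
    """Rotate v by angle theta in the plane of features i and j.

    A hypothetical stand-in for the 'quantum rotation' step: the rotation
    preserves the norm, so the result is still a valid amplitude vector.
    """
    v = np.asarray(v, dtype=float)
    out = v.copy()
    c, s = np.cos(theta), np.sin(theta)
    out[i] = c * v[i] - s * v[j]
    out[j] = s * v[i] + c * v[j]
    return out


# Illustrative values only (not taken from the paper or its dataset).
minority_point = normalize([0.9, 0.3, 0.2])
reference_point = normalize([0.7, 0.6, 0.1])

# A small rotation angle keeps the synthetic point close to the original.
synthetic_point = rotate_pair(minority_point, 0, 1, theta=0.1)

p0 = swap_test_p0(synthetic_point, reference_point)
print("P(ancilla = 0):", p0)
print("squared overlap:", overlap_from_p0(p0))
```

On real hardware the probability would be estimated from repeated measurements rather than computed exactly, and the rotation angle would act as one of the tunable hyperparameters the abstract mentions.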