A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE)

Published 27 Feb 2024 in quant-ph, cs.AI, and cs.LG | (2402.17398v3)

Abstract: The paper proposes Quantum-SMOTE, a method that uses quantum computing techniques to address the prevalent problem of class imbalance in machine learning datasets. Inspired by the Synthetic Minority Oversampling Technique (SMOTE), Quantum-SMOTE generates synthetic data points using quantum processes such as swap tests and quantum rotation. Unlike the conventional SMOTE algorithm, which relies on K-Nearest Neighbors (KNN) and Euclidean distances, it generates synthetic instances from minority class data points without relying on neighbor proximity. The algorithm offers greater control over the synthetic data generation process through hyperparameters such as rotation angle, minority percentage, and splitting factor, which allow it to be tailored to specific dataset requirements. Because it uses a compact swap test, the algorithm can accommodate a large number of features. The approach is tested on a public Telecom Churn dataset and evaluated with two prominent classification algorithms, Random Forest and Logistic Regression, to determine its impact across varying proportions of synthetic data.
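
To make the two quantum ingredients named in the abstract more concrete, here is a minimal classical sketch in Python. It is not the paper's implementation. It simulates the ideal swap-test outcome probability, P(0) = 1/2 + 1/2 |<a|b>|^2, for two real amplitude-encoded vectors, and then rotates a minority point by a chosen angle in the plane it spans with a reference vector. The function names, the use of a reference vector (for example a cluster centroid), and the rotation rule are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def swap_test_p0(a, b):
    """Ideal probability of measuring the ancilla in |0> after a swap test.

    Both inputs are normalized first, mimicking amplitude encoding.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    overlap = float(np.dot(a, b))
    return 0.5 + 0.5 * overlap**2


def angle_from_p0(p0):
    """Recover the angle between the two states from P(0).

    The swap test only yields the squared overlap, so the sign is lost
    and the result lies in [0, pi/2].
    """
    overlap_sq = min(1.0, max(0.0, 2.0 * p0 - 1.0))
    return float(np.arccos(np.sqrt(overlap_sq)))


def rotate_towards(point, reference, angle):
    """Hypothetical rotation step (an assumption, not the paper's rule).

    Rotates `point` by `angle` radians within the plane spanned by
    `point` and `reference`, preserving the norm of `point`.
    """
    point = np.asarray(point, dtype=float)
    reference = np.asarray(reference, dtype=float)
    norm = np.linalg.norm(point)
    u = point / norm
    # Component of the reference orthogonal to the point (Gram-Schmidt).
    w = reference - np.dot(reference, u) * u
    w_norm = np.linalg.norm(w)
    if w_norm < 1e-12:
        # Collinear vectors define no rotation plane; return a copy.
        return point.copy()
    v = w / w_norm
    return norm * (np.cos(angle) * u + np.sin(angle) * v)


if __name__ == "__main__":
    minority_point = np.array([1.0, 0.2, 0.1])
    centroid = np.array([0.8, 0.5, 0.3])  # illustrative values only
    p0 = swap_test_p0(minority_point, centroid)
    theta = angle_from_p0(p0)
    # A small fraction of the measured angle plays the role of a
    # "rotation angle" hyperparameter in this sketch.
    synthetic = rotate_towards(minority_point, centroid, 0.25 * theta)
    print(p0, theta, synthetic)
```

In this sketch the rotation angle acts as a tunable knob: a smaller angle keeps the synthetic point closer to the original minority sample, while a larger one moves it toward the reference direction. On real hardware P(0) would be estimated from repeated measurements rather than computed exactly.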

