QSMOTE: Quantum SMOTE for Imbalanced Data

Updated 21 January 2026

QSMOTE is a quantum-inspired oversampling technique that uses quantum state encoding and fidelity metrics to generate robust synthetic minority samples.
It replaces classical Euclidean interpolation with controlled quantum rotations and swap-test similarity measures to improve sample diversity and classifier performance.
Empirical evaluations on industrial and medical datasets demonstrate significant accuracy gains and enhanced robustness, particularly for non-linear classifiers under noise.

Quantum Synthetic Minority Oversampling Technique (QSMOTE) is a quantum-inspired approach for mitigating class imbalance in machine learning datasets by generating synthetic minority class samples based on quantum state geometry, quantum superposition, and fidelity-driven interpolation in Hilbert space. Unlike classical SMOTE, which relies on Euclidean feature-space interpolation along nearest-neighbor directions, QSMOTE harnesses quantum data encodings, controlled angular rotations, and amplitude-based similarity estimation to enhance both the diversity and representativeness of synthetic samples. QSMOTE has been examined on tabular, industrial, and medical datasets and systematically benchmarked against classical oversampling and more recent quantum-inspired augmenters, revealing substantial gains in classifier performance and robustness, particularly for non-linear learners under realistic noise models (Mohanty et al., 2024, Behera et al., 18 Dec 2025, Patel et al., 16 Jan 2026).

1. Quantum Principles and Data Encoding

QSMOTE translates classical feature vectors into normalized quantum states so that data diversity can be probed geometrically in high-dimensional spaces. The primary scheme is amplitude encoding, in which each vector $x \in \mathbb{R}^d$ is mapped as:

$|\psi_x\rangle = \frac{1}{\|x\|} \sum_{i=0}^{d-1} x_i\,|i\rangle$

(Behera et al., 18 Dec 2025, Mohanty et al., 2024). For comparative operations, centroid vectors $c$ from cluster assignments, obtained via k-means, are likewise encoded.
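As a minimal classical sketch of the amplitude-encoding map above, the amplitudes can be obtained by normalizing the feature vector. The function names `amplitude_encode` and `pad_to_power_of_two` are illustrative and do not come from the cited papers; zero-padding to a power-of-two length is an assumption made here so that the amplitudes fit a whole number of qubits.

```python
import math

def amplitude_encode(x):
    """Return the unit-norm amplitudes x_i / ||x|| for a real feature vector x."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("the zero vector has no amplitude encoding")
    return [v / norm for v in x]

def pad_to_power_of_two(x):
    """Zero-pad x so that its length is a power of two (assumption of this sketch)."""
    size = 1
    while size < len(x):
        size *= 2
    return list(x) + [0.0] * (size - len(x))

# Three features are padded to four amplitudes, i.e. a 2-qubit register.
psi = amplitude_encode(pad_to_power_of_two([3.0, 4.0, 0.0]))
print(psi)  # [0.6, 0.8, 0.0, 0.0]
```

Because the vector is divided by its norm, amplitude encoding keeps only the direction of $x$; two samples that differ only in scale map to the same state.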

Multiple quantum copies (tensor products) make the embedding further nonlinear in the original features:

$|\psi_x\rangle^{\otimes n_\text{copies}}$

(Behera et al., 18 Dec 2025). Stereographic encoding is also used:

$|\psi_x^\mathrm{st}\rangle = \frac{1}{\sqrt{1 + \|x\|^2}} \bigl(|0\rangle + \sum_{i=1}^{d} x_{i-1}\,|i\rangle \bigr)$

which tends to yield superior separation and balanced oversampling geometry.
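The two constructions above can be illustrated with a short classical sketch. The helper names (`stereographic_encode`, `tensor_copies`, `overlap`) are hypothetical, and the states are stored as plain lists of real amplitudes rather than prepared on a quantum device.

```python
import math

def stereographic_encode(x):
    """Amplitudes (1, x_0, ..., x_{d-1}) / sqrt(1 + ||x||^2)."""
    scale = math.sqrt(1.0 + sum(v * v for v in x))
    return [1.0 / scale] + [v / scale for v in x]

def tensor_copies(psi, n_copies):
    """Amplitudes of |psi> tensored with itself n_copies times (Kronecker power)."""
    state = [1.0]
    for _ in range(n_copies):
        state = [a * b for a in state for b in psi]
    return state

def overlap(u, v):
    """Real inner product of two amplitude vectors."""
    return sum(a * b for a, b in zip(u, v))

a = stereographic_encode([1.0, 2.0])
b = stereographic_encode([2.0, 1.0])
single = overlap(a, b)                                      # one copy
double = overlap(tensor_copies(a, 2), tensor_copies(b, 2))  # two copies
print(single, double)
```

For these two vectors the single-copy overlap is 5/6, and the two-copy overlap is its square, since $\langle\psi_a|\psi_b\rangle^{n}$ is the overlap of $n$ tensor copies. This is the sense in which extra copies sharpen the separation between non-identical states. Unlike amplitude encoding, the stereographic map keeps information about $\|x\|$ in the first amplitude.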

2. Synthetic Sample Generation: Quantum Geometry and Interpolation

QSMOTE fundamentally replaces the classical KNN-based SMOTE with quantum geometric operations:

Swap Test Similarity: A compact SWAP test circuit estimates the quantum fidelity between the encoded states of $x$ and $c$. Measuring the ancilla qubit returns outcome 0 with probability

$P(0) = \frac{1}{2}\left[1 + |\langle \psi_x | \psi_c \rangle|^2 \right]$

from which the overlap $s(x,c) = |\langle \psi_x | \psi_c \rangle| = \sqrt{2P(0) - 1}$ and the angular separation $\alpha(x,c) = \arccos\,s(x,c)$ are derived (Mohanty et al., 2024).
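The swap-test relation can be checked with a classical simulation of the ancilla statistics. This is only a sketch under stated assumptions: the states are real amplitude vectors, the ancilla outcomes are drawn from the ideal $P(0)$ with a pseudo-random generator instead of a circuit, and the names `swap_test_p0` and `estimate_overlap` are illustrative.

```python
import math
import random

def swap_test_p0(psi_x, psi_c):
    """Ideal probability of ancilla outcome 0: (1 + |<psi_x|psi_c>|^2) / 2."""
    fidelity = sum(a * b for a, b in zip(psi_x, psi_c)) ** 2
    return 0.5 * (1.0 + fidelity)

def estimate_overlap(psi_x, psi_c, shots, rng):
    """Estimate the overlap magnitude and the angle from simulated ancilla shots."""
    p0 = swap_test_p0(psi_x, psi_c)
    zeros = sum(1 for _ in range(shots) if rng.random() < p0)
    p0_hat = zeros / shots
    # Invert P(0) = (1 + F) / 2; clip because shot noise can push F outside [0, 1].
    fidelity_hat = min(1.0, max(0.0, 2.0 * p0_hat - 1.0))
    s_hat = math.sqrt(fidelity_hat)
    return s_hat, math.acos(s_hat)

psi_x = [1.0, 0.0]
psi_c = [math.cos(math.pi / 6), math.sin(math.pi / 6)]  # 30 degrees away from psi_x
s_hat, alpha_hat = estimate_overlap(psi_x, psi_c, shots=20000, rng=random.Random(7))
print(s_hat, alpha_hat)
```

Here the exact values are $P(0) = 0.875$, overlap $\cos(\pi/6) \approx 0.866$, and angle $\pi/6$; the estimates approach them as the number of shots grows. Note that the swap test returns only the magnitude of the overlap, so the sign of the inner product is not available from this measurement.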

Direction and Spread: Given the angular separation obtained from the swap test and a split factor, a random parameter is sampled and the synthetic sample is formed from it (Patel et al., 16 Jan 2026). This process ensures that the diversity and proximity of synthetic samples are controlled by the quantum angle and the spread.
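One way to picture angle-controlled generation is a rotation of the encoded sample toward the encoded centroid within the plane they span. The sketch below is an assumption of this example, not the formula from the cited paper: it draws a rotation angle uniformly from $[0, \alpha/k]$ for a hypothetical split factor $k$ and applies spherical interpolation, using the signed classical inner product in place of a swap-test estimate.

```python
import math
import random

def rotate_toward(psi_x, psi_c, theta):
    """Rotate unit vector psi_x by angle theta toward unit vector psi_c."""
    dot = sum(a * b for a, b in zip(psi_x, psi_c))
    alpha = math.acos(max(-1.0, min(1.0, dot)))
    if math.sin(alpha) < 1e-12:
        # Parallel or antipodal states: the rotation plane is undefined.
        return list(psi_x)
    w_x = math.sin(alpha - theta) / math.sin(alpha)
    w_c = math.sin(theta) / math.sin(alpha)
    return [w_x * a + w_c * b for a, b in zip(psi_x, psi_c)]

def synthesize(psi_x, psi_c, split, rng):
    """Hypothetical rule: sample theta uniformly in [0, alpha / split], then rotate."""
    dot = sum(a * b for a, b in zip(psi_x, psi_c))
    alpha = math.acos(max(-1.0, min(1.0, dot)))
    theta = rng.uniform(0.0, alpha / split)
    return rotate_toward(psi_x, psi_c, theta)

new_state = synthesize([1.0, 0.0], [0.0, 1.0], split=3.0, rng=random.Random(0))
print(new_state)
```

Under this assumed rule, the overlap between the synthetic state and the original sample is $\cos\theta \ge \cos(\alpha/k)$, so a larger split factor keeps synthetic samples closer to the original and a smaller one spreads them further toward the centroid.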

Rotational Gates: Quantum rotations about the X (or Y, Z) axis generate synthetic minority samples by parametrically rotating the encoded data register through an angle derived from the swap-test output (Mohanty et al., 2024).
Fidelity-weighted QSMOTE: Quantum-inspired weighting uses the Born-rule overlap (the squared inner product between encoded states) in synthetic generation, and the corresponding overlap is evaluated for each candidate sample (Behera et al., 18 Dec 2025).

3. Algorithmic Workflow and Complexity

The canonical QSMOTE pipeline comprises:

Clustering (typically k-means) to define local minority regions.
Quantum encoding (amplitude or stereo) for all minority samples and their associated cluster centroids.
Similarity estimation via swap test for each minority sample-centroid pair.
Synthetic generation by controlled rotation/displacement in quantum-encoded space.
Aggregation and integration into the original dataset, yielding an augmented class balance.
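The five steps above can be strung together in a purely classical sketch. Everything here is illustrative: `qsmote_sketch` and its helpers are hypothetical names, the similarity is the exact inner product rather than a swap-test estimate, the generation rule is an assumed rotation toward the cluster centroid by an angle drawn from $[0, \alpha/\text{split}]$, and rescaling the synthetic state by the norm of the original sample is an assumed decoding step.

```python
import math
import random

def unit(v):
    """Return (v / ||v||, ||v||); assumes v is not the zero vector."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v], n

def nearest(p, centroids):
    """Index of the centroid closest to p in Euclidean distance."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def kmeans(points, k, iters, rng):
    """Plain Lloyd iterations; returns (centroids, cluster label per point)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, [nearest(p, centroids) for p in points]

def qsmote_sketch(minority, n_new, k=2, split=3.0, seed=0):
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, iters=10, rng=rng)  # 1. clustering
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        psi_x, norm_x = unit(minority[i])                        # 2. encoding
        psi_c, _ = unit(centroids[labels[i]])
        s = sum(a * b for a, b in zip(psi_x, psi_c))             # 3. similarity
        alpha = math.acos(max(-1.0, min(1.0, s)))
        theta = rng.uniform(0.0, alpha / split)                  # 4. generation
        if math.sin(alpha) < 1e-12:
            psi_new = psi_x
        else:
            psi_new = [
                (math.sin(alpha - theta) * a + math.sin(theta) * b) / math.sin(alpha)
                for a, b in zip(psi_x, psi_c)
            ]
        synthetic.append([norm_x * a for a in psi_new])          # assumed decoding
    return synthetic

minority = [[1.0, 0.2], [0.9, 0.3], [0.2, 1.0], [0.3, 0.9], [1.1, 0.1]]
synthetic = qsmote_sketch(minority, n_new=4)
augmented = minority + synthetic                                 # 5. aggregation
print(len(augmented))
```

In practice `n_new` would be chosen to bring the minority class up to the desired balance with the majority class, and the augmented minority set would then be merged back into the full training data.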

Pseudocode for QSMOTE consists of amplitude encoding, the swap test circuit, and quantum rotation (see Algorithms 1–5 in Mohanty et al., 2024). The overall gate count and qubit usage are reported to scale polylogarithmically with the feature dimension $d$, i.e., $O(\mathrm{polylog}\,d)$ for state preparation and test depth, which would offer a marked complexity advantage over classical KNN-based SMOTE, whose cost scales linearly (or worse) in the number of samples and in $d$ (Mohanty et al., 2024).
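The logarithmic register width is easy to make concrete. The sketch below counts qubits only, assuming amplitude encoding into $\lceil \log_2 d \rceil$ qubits and a standard swap test with two data registers and one ancilla; the function names are illustrative. It does not model gate counts, which depend on the state-preparation routine used.

```python
def amplitude_qubits(d):
    """Qubits needed to hold d amplitudes: ceil(log2(d)), with a minimum of 1."""
    return max(1, (d - 1).bit_length())

def swap_test_qubits(d):
    """Two amplitude-encoded registers plus one ancilla qubit."""
    return 2 * amplitude_qubits(d) + 1

for d in (4, 5, 1024):
    print(d, amplitude_qubits(d), swap_test_qubits(d))
```

For example, a 1024-dimensional feature vector fits in 10 qubits, and comparing it with a centroid via the swap test needs 21 qubits in total.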

4. Classifier Integration and Noise-Robustness Analysis

QSMOTE-augmented datasets support multiple classifier families:

Ensemble/Non-linear: Random Forest (RF), SVM (RBF), Decision Trees (DT).
Linear/Probabilistic: Logistic Regression (LR), Naive Bayes (NB).

Empirical evaluation on industrial benchmarks (SPID, CWRUBD, EFDD, IFDD) demonstrates that RF, SVM, and DT consistently exhibit substantial accuracy improvements post-QSMOTE (up to about 176% relative gain for DT on EFDD; RF achieves ≥0.99 accuracy on IFDD) (Patel et al., 16 Jan 2026). In contrast, LR and NB can experience degraded performance, especially in overlapping feature spaces where interpolation-induced boundary distortion breaks class separability.

Robustness was further tested using six quantum-inspired noise channels (bit-flip, phase-flip, bit-phase-flip, depolarizing, amplitude-damping, phase-damping), each realized via Kraus maps. Ensemble and margin-based models preserve ≥95% accuracy even under maximal noise, whereas linear/probabilistic models are highly susceptible to catastrophic failure (Patel et al., 16 Jan 2026).
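For readers unfamiliar with Kraus maps, a channel acts on a density matrix as $\rho \mapsto \sum_k K_k \rho K_k^\dagger$. The sketch below applies three of the listed channels to a single-qubit density matrix using their textbook Kraus operators. It is an illustration only: how the cited study maps these channels onto dataset features is not specified here, and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def apply_channel(rho, kraus_ops):
    """Return sum_k K rho K^dagger for the given Kraus operators."""
    out = [[0j, 0j], [0j, 0j]]
    for op in kraus_ops:
        term = matmul(matmul(op, rho), dagger(op))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def bit_flip(p):
    """Identity with probability 1 - p, Pauli X with probability p."""
    a, b = math.sqrt(1.0 - p), math.sqrt(p)
    return [[[a, 0.0], [0.0, a]], [[0.0, b], [b, 0.0]]]

def phase_flip(p):
    """Identity with probability 1 - p, Pauli Z with probability p."""
    a, b = math.sqrt(1.0 - p), math.sqrt(p)
    return [[[a, 0.0], [0.0, a]], [[b, 0.0], [0.0, -b]]]

def amplitude_damping(gamma):
    """Decay from |1> to |0> with probability gamma."""
    return [[[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]],
            [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]

rho_zero = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
rho_one = [[0.0, 0.0], [0.0, 1.0]]    # |1><1|
rho_plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|

flipped = apply_channel(rho_zero, bit_flip(0.25))
dephased = apply_channel(rho_plus, phase_flip(0.5))
damped = apply_channel(rho_one, amplitude_damping(0.5))
print(flipped, dephased, damped)
```

The bit-flip channel moves population between the diagonal entries, the phase-flip channel shrinks the off-diagonal coherences (to zero at $p = 0.5$), and amplitude damping drains population from $|1\rangle$ into $|0\rangle$; in every case the trace of the density matrix stays equal to 1.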

5. Comparative Results and Variant Performance

Major results from telecom churn and industrial benchmarks are summarized below.

Dataset/Model	Pre-QSMOTE Acc	Post-QSMOTE Acc	Relative Gain
SPID RF	0.7756	0.8533	+10%
EFDD DT	0.2979	0.8228	+176.2%
IFDD RF	0.6763	0.9919	+46.7%

(Patel et al., 16 Jan 2026)
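The Relative Gain column follows from the two accuracy columns as $(\text{post} - \text{pre}) / \text{pre}$. A short check, using only the values in the table above (the variable and function names are illustrative):

```python
# (pre-QSMOTE accuracy, post-QSMOTE accuracy) copied from the table above
results = {
    "SPID RF": (0.7756, 0.8533),
    "EFDD DT": (0.2979, 0.8228),
    "IFDD RF": (0.6763, 0.9919),
}

def relative_gain_pct(pre, post):
    """Relative accuracy gain in percent."""
    return 100.0 * (post - pre) / pre

gains = {name: relative_gain_pct(pre, post) for name, (pre, post) in results.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
```

The computed gains round to +10.0%, +176.2%, and +46.7%, matching the table. The large EFDD figure reflects the low pre-QSMOTE baseline of 0.2979 as much as the post-QSMOTE accuracy itself.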

For classifier variants (PGM, kPGM), optimization over encoding schemes (amplitude, stereo) and the number of quantum copies yields further trade-offs:

PGM (stereo, $n_\text{copies}$ = [value missing in source]): Accuracy [value missing in source], F1 [value missing in source]
kPGM: stable across QSMOTE variants, with top scores Accuracy [value missing in source], F1 [value missing in source]
Margin-QSMOTE and Fidelity-QSMOTE outperform KNN-QSMOTE in both recall and precision (Behera et al., 18 Dec 2025).

6. Limitations and Extension Opportunities

QSMOTE state preparation relies on classical pre-normalization; no explicit entanglement layer is deployed in most current variants, limiting enhancement of inter-feature correlations. Synthetic sample generation remains linear in the number of minority samples, with each data point processed individually. Hardware implementation is constrained by noise and amplitude-loading limitations; quantum amplitude amplification and composite rotations are proposed as extensions (Mohanty et al., 2024). Empirical analyses suggest that noise tolerance is classifier-dependent, requiring careful selection of learning algorithms for practical deployment (Patel et al., 16 Jan 2026).

Recommended future directions include systematic utilization of multi-axis rotations, introduction of entangling gates, exploration of multi-class and streaming settings, and fusion with hybrid quantum-classical pipelines. Error-mitigation and variational refinement circuits offer avenues for enhancing synthetic sample fidelity (Mohanty et al., 2024).

7. Theoretical Significance and Deployment Considerations

QSMOTE establishes a geometry-aware, quantum-driven paradigm for resampling imbalanced data, avoiding pitfalls of linear interpolation and neighbor search inherent to SMOTE. By leveraging quantum angle, fidelity, and higher-order data embedding, it adapts the density and spread of synthetic minority samples to respect local manifold structure. For non-linear classifiers in industrial fault diagnosis and labeled tabular applications, QSMOTE offers drop-in resampling with empirically reported gains in accuracy, recall, and resilience, especially under quantum and classical noise regimes (Patel et al., 16 Jan 2026, Behera et al., 18 Dec 2025).

Deployment recommendations: select a split factor in the 2–5 range, prefer RF/SVM learners in noisy or overlap-prone settings, consider kPGM for large tabular data, and apply QSMOTE as preprocessing prior to conventional training and hyperparameter tuning pipelines. This framework provides a rigorous baseline for imbalance- and noise-aware AI in contemporary and next-generation industrial environments.

A plausible implication is that QSMOTE's fusion of quantum encoding and angle-driven interpolation not only augments dataset balance and classifier performance but also fundamentally enhances robustness to structured perturbations, marking it as a technically significant advance in quantum-inspired resampling for machine learning (Mohanty et al., 2024, Behera et al., 18 Dec 2025, Patel et al., 16 Jan 2026).