Quantum-UnIMP: Hybrid Quantum-Classical Imputation
- Quantum-UnIMP is a hybrid quantum-classical framework that integrates shallow IQP circuits with LLM-based techniques to impute missing values in mixed-type datasets.
- It encodes classical data into expressive quantum feature maps, capturing nonlinear and higher-order correlations that traditional methods often miss.
- Empirical results show significant performance improvements, with up to 15.2% RMSE reduction and 8.7% increase in Macro F1-Score on benchmark datasets.
Quantum-UnIMP is a hybrid quantum-classical machine learning framework designed for imputation of missing values in real-world, mixed-type tabular datasets. Building on the neural imputation capabilities of LLM-based architectures such as UnIMP, Quantum-UnIMP introduces shallow quantum circuits—specifically, Instantaneous Quantum Polynomial (IQP) circuits—to replace conventional classical feature embeddings, with the aim of capturing complex nonlinear and higher-order correlations that classical methods often fail to represent. This section presents a detailed account of the methodology, integration strategy, key performance metrics, empirical comparisons, and implications for quantum computing hardware (2507.08255).
1. Quantum Feature Mapping with IQP Circuits
Quantum-UnIMP encodes classical data into quantum feature maps by leveraging shallow IQP circuits. The approach proceeds as follows:
- Each data point, after preprocessing (including normalization of numerical features to a fixed interval and suitable embedding of categorical/textual features), is transformed into a parameter vector $\mathbf{x} \in \mathbb{R}^d$.
- This parameter vector defines the rotation angles and phases for a quantum circuit of $n$ qubits. The standard form of the IQP embedding is:

$$|\psi(\mathbf{x})\rangle = H^{\otimes n} \, U_D(\mathbf{x}) \, H^{\otimes n} \, |0\rangle^{\otimes n}.$$

Here, $H^{\otimes n}$ denotes a layer of Hadamard gates applied to all qubits, producing a uniform superposition, and $U_D(\mathbf{x})$ is a diagonal unitary whose phases are polynomial functions of the input features.
- The quantum state is then measured in the Pauli-$Z$ basis to extract expectation values:

$$f_i(\mathbf{x}) = \langle \psi(\mathbf{x}) | \, Z_i \, | \psi(\mathbf{x}) \rangle, \qquad i = 1, \dots, n.$$

The resulting vector $\Phi(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_n(\mathbf{x}))$ constitutes the quantum feature map that nonlinearly encodes the input, with entanglement and superposition allowing the representation to reflect higher-order dependencies across all input types (numerical, categorical, textual).
This mapping is distinguished by its capacity to encode information in a non-classical, highly expressive manner. The choice of IQP circuits is deliberate, as they are amenable to near-term (NISQ) quantum hardware due to their shallow depth and reduced susceptibility to gate errors.
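The mapping above can be sketched with a small statevector simulation. This is a minimal illustration, not the paper's implementation: the degree-2 phase polynomial (single-qubit and pairwise $ZZ$ terms) and the function name are our assumptions, since the exact polynomial is not reproduced here.

```python
import numpy as np

def iqp_feature_map(x, n_qubits):
    """Statevector simulation of a shallow IQP embedding:
    H^{xn} U_D(x) H^{xn} |0...0>, followed by per-qubit Pauli-Z
    expectation values. Assumes a degree-2 phase polynomial."""
    dim = 2 ** n_qubits
    # First Hadamard layer on |0...0> yields the uniform superposition.
    state = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)

    # Diagonal unitary U_D(x): phases polynomial in the features
    # (single-qubit x_j Z_j terms plus pairwise x_j x_k Z_j Z_k terms).
    phases = np.zeros(dim)
    for b in range(dim):
        z = [1 - 2 * ((b >> q) & 1) for q in range(n_qubits)]  # Z eigenvalues +-1
        phases[b] = sum(x[j] * z[j] for j in range(n_qubits))
        phases[b] += sum(x[j] * x[k] * z[j] * z[k]
                         for j in range(n_qubits) for k in range(j + 1, n_qubits))
    state *= np.exp(1j * phases)

    # Second Hadamard layer, applied axis by axis.
    state = state.reshape([2] * n_qubits)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for q in range(n_qubits):
        state = np.tensordot(H, state, axes=([1], [q]))
        state = np.moveaxis(state, 0, q)
    state = state.reshape(dim)

    # Measure each qubit in the Z basis: f_i = <psi| Z_i |psi>.
    probs = np.abs(state) ** 2
    feats = np.empty(n_qubits)
    for q in range(n_qubits):
        z_vals = np.array([1 - 2 * ((b >> q) & 1) for b in range(dim)])
        feats[q] = float(probs @ z_vals)
    return feats
```

For a zero input the diagonal unitary reduces to the identity, the two Hadamard layers cancel, and every expectation value returns to $+1$, which gives a quick sanity check of the simulation.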
2. Hybrid Integration with LLM-Based Imputation Architectures
Quantum-UnIMP employs a two-stage hybrid architecture. The sequence is as follows:
- Mixed-type features for each data row are first preprocessed and concatenated into a single classical vector $\mathbf{x}$.
- This vector is encoded into a quantum state using the IQP circuit described above, producing the quantum feature map $\Phi(\mathbf{x})$.
- The quantum feature vectors serve as the input node embeddings within the subsequent LLM-based imputer. In this component, the LLM (often Transformer-based) operates on a hypergraph structure where nodes correspond to features and hyperedges to dataset rows, thereby allowing the model to represent complex higher-order relations.
- The enhanced expressivity of $\Phi(\mathbf{x})$, arising from superposition and entanglement, enables the LLM to better attend to and reason about intricate patterns of missingness, particularly in mixed-type data where conventional classical embeddings may underperform.
By placing quantum feature extraction at the input layer, the remainder of the LLM pipeline and its training regimes remain largely classical, allowing seamless hybridization.
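The two-stage flow can be illustrated with a minimal sketch. All names here (`hybrid_embed`, `placeholder_map`) are ours, not the paper's API, and the classical placeholder merely stands in for the IQP circuit so the example runs standalone; in the actual system the per-row map is the quantum feature map and the consumer is the LLM-based imputer.

```python
import numpy as np

def hybrid_embed(X, feature_map):
    """Stage 1: replace each preprocessed row with its feature vector,
    producing the node-embedding matrix consumed by the imputer."""
    return np.stack([feature_map(row) for row in X])

def placeholder_map(x):
    """Classical stand-in for the IQP embedding: bounded, nonlinear
    features built from single and pairwise products of the inputs."""
    pairwise = np.outer(x, x)[np.triu_indices(len(x))]
    return np.cos(pairwise)

rng = np.random.default_rng(0)
rows = rng.random((5, 4))   # 5 preprocessed rows, 4 features each
embeddings = hybrid_embed(rows, placeholder_map)
# `embeddings` would serve as node embeddings for the downstream imputer,
# leaving the rest of the classical pipeline and training loop unchanged.
```

The design point this illustrates is the modular boundary: swapping `placeholder_map` for a call into a quantum backend changes nothing downstream.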
3. Empirical Performance and Evaluation
Quantum-UnIMP demonstrates pronounced improvements in benchmark experiments on mixed-type datasets. Core metrics employed in evaluation include:
- Root Mean Squared Error (RMSE) for numerical features:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{x}_i - x_i \right)^2},$$

where $\hat{x}_i$ is the imputed value, $x_i$ the ground truth, and $N$ the number of missing numerical cells.
Quantum-UnIMP achieves up to a 15.2% reduction in RMSE compared to state-of-the-art classical imputers and LLM-based architectures with purely classical feature embeddings.
- Macro F1-Score for categorical features: this averages per-class F1 scores with equal weight, giving a balanced measure under class imbalance for categorical prediction tasks. Improvements of up to 8.7% in Macro F1-Score are observed.
- These gains are validated on benchmark datasets such as UCI Adult Income, Bank Marketing, and synthetic healthcare data, using cross-validation. Both ablation studies and comparisons against frameworks such as MICE, MissForest, GAIN, and the classical UnIMP baseline confirm that the quantum feature mapping is the dominant source of improvement.
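For reference, the two metrics can be computed as follows. These are the standard textbook definitions; any dataset-specific conventions in the paper (e.g., which cells count as missing) are not captured here.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over imputed numerical cells."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over imputed categorical cells."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(scores))
```

A reported "15.2% RMSE reduction" is relative: `(rmse_baseline - rmse_quantum) / rmse_baseline`.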
A summary table, as reported, is as follows:
| Feature Type | Metric | Quantum-UnIMP Improvement |
| --- | --- | --- |
| Numerical | RMSE | Up to 15.2% reduction |
| Categorical | Macro F1-Score | Up to 8.7% increase |
4. Comparison with Prior Classical Imputation Methods
Quantum-UnIMP departs from conventional workflows in two important respects:
- Classical embeddings in previous methods—including standard MLPs in UnIMP, autoencoders, random forests, and statistical techniques—typically represent data via linear or shallow non-linear mappings, potentially failing to reflect intricate cross-type dependencies and nonclassical correlations.
- The use of quantum feature maps enables embeddings to inherently exploit non-local, non-linear, and higher-order relationships thanks to quantum phenomena. Superposition makes it possible to evaluate multiple hypotheses about the data simultaneously, while entanglement encodes global dependencies across all features—even those that are not obviously directly related.
- Empirical tests demonstrate that neither purely random embeddings nor classical MLP/autoencoder-based embeddings attain the same level of imputation accuracy and discrimination as the IQP-driven quantum embeddings.
- The quantum methodology is modular: an ablation study replacing only the feature map while leaving the LLM structure constant produces most of the observed performance gain, indicating that the quantum feature map is the primary mechanism of improvement.
5. Compatibility and Implications for Near-Term Quantum Hardware
Quantum-UnIMP has been constructed to be compatible with present-day noisy intermediate-scale quantum (NISQ) hardware:
- The chosen IQP circuits are shallow (e.g., 8 qubits, 2 layers), keeping circuit depth low enough to limit exposure to decoherence and gate noise, and are implementable without error correction on current-generation devices.
- Experimental evaluations, while predominantly on quantum simulators, establish a credible case for transferring these findings to quantum hardware as it matures. The architecture can already be tested on real quantum processors for moderate input dimension sizes.
- The hybrid quantum-classical design allows the computationally intensive processing within the LLM to remain on classical accelerators, ensuring that limitations in quantum layer width or fidelity do not bottleneck the system.
- As quantum hardware improves, further benefits from deeper circuits and larger encoding spaces are anticipated, especially once error-robust quantum feature maps and co-designed optimization with LLMs are realized.
6. Outlook and Research Directions
Quantum-UnIMP illustrates the practical potential for integrating quantum feature maps as an input layer to large-scale, classical deep learning models. The integration offers a promising route for addressing complex missing data patterns in mixed-type real-world datasets. Future research directions include:
- Scaling the quantum embedding layer to larger feature spaces as quantum hardware advances.
- Developing joint optimization/co-design routines where both quantum circuit parameters and LLM weights are trained jointly, potentially offering further improvements in representation and imputation accuracy.
- Extending the approach to other machine learning tasks where rich, nonlinear and high-dimensional data representations are critical, such as time-series forecasting, anomaly detection, or recommendation systems.
In sum, Quantum-UnIMP marks a significant advance in quantum-enhanced machine learning for imputation, establishing measurable empirical superiority over standard classical methods while remaining accessible to near-term quantum computers (2507.08255).