Quantum Floating-Point Encodings

Updated 8 March 2026

Quantum floating-point encodings are methods that decompose numbers into sign, exponent, and mantissa components to enable precise arithmetic on quantum hardware.
They support efficient quantum arithmetic protocols, including addition, multiplication, inversion, and rotation synthesis with resource-optimized circuits.
They underpin applications in quantum simulation, annealing, and machine learning by balancing error, precision, and hardware resource constraints.

Quantum floating-point encodings define representations and arithmetic protocols for real (and sometimes complex) numbers suitable for implementation on quantum information processing platforms. These include gate-based quantum circuits, quantum annealers, and quantum-inspired compression techniques for simulation and machine learning. Such encodings are essential for scientific, engineering, and data-centric quantum applications that require non-integer, wide-dynamic-range quantities to be processed with bounded error, resource efficiency, and compatibility with fault-tolerant protocols.

1. Core Encoding Schemes and Their Mathematical Structure

Quantum floating-point representations parallel and generalize classical floating-point formats, typically decomposing each value into sign, exponent, and mantissa components, but employing register structures and algorithmic primitives suitable to quantum computation or annealing. Major schemes include:

Register-based quantum floating-point (gate-based): Typical quantum encodings use one qubit for the sign $S$, $E$ qubits for a two’s-complement exponent, and $M$ qubits for a mantissa, implemented as $|S\rangle\,|E_{E-1}\cdots E_0\rangle\,|M_{M-1}\cdots M_0\rangle$. The mantissa most often has an implied leading 1, so the encoded value is $(-1)^S \cdot (1.M_{M-1}\cdots M_0)_2 \cdot 2^E$ (Häner et al., 2018). Some approaches use two’s-complement fixed-point mantissas and exponents, omitting the hidden bit and thereby facilitating certain arithmetic operations and reducing ancilla usage (Serrallés et al., 23 Oct 2025).
Floating-point via quantum annealing (QUBO): On annealing platforms, floating-point variables are replaced by binary expansions over bounded intervals, e.g., $\chi = \sum_{r=0}^{R-1} 2^{-r} Q_r$, $x = c\chi - d$, with $Q_r\in\{0,1\}$ promoted to QUBO variables. This encodes a real number as a weighted sum of Ising or QUBO variables, optimized by minimizing a quadratic cost energy (Rogers et al., 2019).
Semi-Boolean polynomial (SBP) encoding: For certain applications, such as modular arithmetic or circuits benefiting from global entangling gates, the mantissa is stored on an $n$-qubit quantum register, and the exponent is maintained classically. Quantum arithmetic is then realized by semi-Boolean polynomial evaluation or QFT-based subroutines, which accommodate arbitrary-precision, block-parallelizable adders and multipliers (Seidel et al., 2021).
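
As a purely classical illustration of the first two encodings, the sketch below decodes one computational-basis state of a register-based format (hidden leading 1, two's-complement exponent) and one assignment of the QUBO binary expansion $x = c\chi - d$. The function names and the most-significant-bit-first ordering are assumptions made for this example, not conventions taken from the cited papers.

```python
from fractions import Fraction

def twos_complement(bits):
    """Interpret a bit list (most significant bit first) as a two's-complement integer."""
    value = int("".join(str(b) for b in bits), 2)
    if bits[0] == 1:
        value -= 1 << len(bits)
    return value

def decode_register_fp(sign, exp_bits, man_bits):
    """Value (-1)^S * (1.M)_2 * 2^E encoded by the basis state |S>|E>|M>."""
    exponent = twos_complement(exp_bits)
    # Hidden leading 1 followed by the stored fraction bits.
    fraction = Fraction(int("".join(str(b) for b in man_bits), 2), 1 << len(man_bits))
    return (-1) ** sign * (1 + fraction) * Fraction(2) ** exponent

def decode_qubo_expansion(q_bits, c, d):
    """Value x = c * chi - d with chi = sum_r 2^(-r) * Q_r (r starts at 0)."""
    chi = sum(Fraction(q, 1 << r) for r, q in enumerate(q_bits))
    return c * chi - d
```

With exact rational arithmetic the decoded values can be compared directly, for example `decode_register_fp(0, [0, 0, 1], [1, 0])` is $1.5 \cdot 2^1$ and `decode_qubo_expansion([1, 0, 1], 2, 2)` is $2 \cdot 1.25 - 2$.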

2. Quantum Floating-Point Arithmetic Algorithms

Quantum floating-point arithmetic comprises addition, multiplication, division, reciprocals, and specialized rotation synthesis, each demanding quantum-specific circuit constructs for normalization, alignment, and error control:

Addition: Exponents are subtracted via reversible subtraction, followed by alignment shifts on the mantissa (controlled Fredkin gate cascades), two’s-complement addition, renormalization (first-one circuits to detect leading bits and corresponding shifts), and exponent updating. Zero, underflow, and overflow handling is implemented by conditional bit masks or register resets (Häner et al., 2018, Serrallés et al., 23 Oct 2025).
Multiplication: The exponents are summed, mantissas are multiplied, and post-product normalization is performed via shifts and exponent increment. The sign is given by XOR of operand signs. SBP and QFT-based approaches can parallelize many of these steps, especially in modular contexts (Serrallés et al., 23 Oct 2025, Seidel et al., 2021).
Reciprocal and division: Gate-based circuit approaches can use Newton-Raphson iteration to compute reciprocals in a fixed- or floating-point mantissa representation, initializing the mantissa and exponent appropriately and refining iteratively with fused multiply-add (FMA) and subtraction primitives. Resource counts scale with the register width and the number of iterations (Serrallés et al., 23 Oct 2025).
QUBO-encoded division/inversion (annealing): Objectives such as $(mx-y)^2$ for division, and the analogous squared-residual objective for inversion, are mapped to quadratic forms in the binary (mantissa) variables. Minimizing these cost functions in the QUBO Hamiltonian yields approximate floating-point results after measurement (Rogers et al., 2019).
Rotation synthesis (floating-point circuit synthesis): In Clifford+T, extremely small unitaries can be realized by floating-point gearboxes, with a mantissa encoded by an optimal small rotation, and an exponent factorized via a cascade of non-deterministic “gearbox” subcircuits. Exponent scale is realized efficiently, with relative (not absolute) precision cost (Wiebe et al., 2013).
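
The addition steps listed above (exponent comparison, mantissa alignment, signed addition, renormalization, exponent update) can be traced in a classical emulation. This is a minimal sketch under stated assumptions: operands are hypothetical `(sign, exponent, significand)` triples whose significand is an integer of $M+1$ bits including the leading 1, alignment truncates rather than rounds, and an exact zero result is clamped to an all-zero triple. It mirrors the data flow only, not the reversible circuits of the cited works.

```python
def fp_add(a, b, M):
    """Add two (sign, exponent, significand) triples.

    Each triple encodes (-1)**sign * significand * 2**(exponent - M),
    where a normalized significand satisfies 2**M <= significand < 2**(M + 1).
    """
    # Step 1: compare exponents so that `a` holds the larger one.
    if a[1] < b[1]:
        a, b = b, a
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Step 2: align the smaller operand by right-shifting (truncating) its significand.
    mb >>= ea - eb
    # Step 3: signed addition of the aligned significands.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    if total == 0:
        return (0, 0, 0)  # clamp an exact zero to an all-zero encoding
    sign, mag, exp = int(total < 0), abs(total), ea
    # Step 4: renormalize so the leading 1 returns to bit position M.
    while mag >= 1 << (M + 1):  # carry-out: shift right, increment exponent
        mag >>= 1
        exp += 1
    while mag < 1 << M:  # cancellation: shift left, decrement exponent
        mag <<= 1
        exp -= 1
    return (sign, exp, mag)

def fp_value(x, M):
    """Real value of a triple, for checking results."""
    sign, exp, mag = x
    return (-1) ** sign * mag * 2.0 ** (exp - M)
```

For $M = 3$, `fp_add((0, 0, 12), (0, -1, 12), 3)` adds 1.5 and 0.75; the carry-out triggers one right shift and an exponent increment. In a quantum circuit the two `while` loops correspond to the first-one detection and controlled shifts described above.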

3. Resource Analysis and Trade-offs

Resource usage (number of qubits, T-count and T-depth, overall circuit depth, and ancilla requirements) is critical for practical utility, especially in fault-tolerant and large-scale applications. Key findings include:

Encoding/Algorithm	Qubit Usage (Single precision)	T-count Scaling	Ancilla Requirement
Hand-optimized register FP (Häner et al., 2018)	140 (32-bit)	[…]	[…] (for normalization, shifts)
Two’s-comp FP (QFT-based) (Serrallés et al., 23 Oct 2025)	[…] (32-bit)	[…] for most operations	[…]
SBP/Fourier (mantissa quantum, exp. classical) (Seidel et al., 2021)	$n$ (mantissa)	Add […], Mul […]	[…] (serial), up to […] (parallel)
QUBO/annealing encoding (Rogers et al., 2019)	$R$ bits per variable	Dependent on […]; […] QUBO graph	Limited by problem embedding
Gearbox/FP rotation (Wiebe et al., 2013)	[…] ancillas (exponent), […] ancilla (mantissa)	[…]	Parallel-prep of gearbox ancillas

Circuit synthesis via floating-point gearbox circuits achieves a lower $T$-count slope than the best ancilla-free synthesis, markedly reducing the gate complexity of small-angle synthesis (Wiebe et al., 2013). QFT/ancilla-supported SBP techniques can outperform carry-ripple arithmetic by up to an order of magnitude in circuit depth (Seidel et al., 2021). Recent designs significantly reduce ancilla requirements compared to HDL-style or fixed-point arithmetic (Serrallés et al., 23 Oct 2025).

4. Range, Precision, and Error Analysis

Representation choices determine achievable range, precision, and their scaling with resource count:

Mantissa and exponent allocation: Mantissa bit width $M$ controls precision (relative error on the order of $2^{-M}$), while exponent width $E$ determines the representable range (magnitudes on the order of $2^{\pm 2^{E-1}}$ for a two’s-complement exponent). Two’s-complement encodings support negative and positive exponents for signed numbers. Omitting the hidden bit, and with it IEEE-754-style denormal handling, can save resources and reduce circuit complexity (Serrallés et al., 23 Oct 2025).
Error scaling: In Newton reciprocal and ODE simulation, relative errors decrease exponentially with total register width, so each added qubit shrinks the error by a constant factor (Serrallés et al., 23 Oct 2025). In QUBO-based annealing, a finite bit count $R$ restricts solutions to a grid (spacing $c\,2^{-(R-1)}$ in $x$ for the expansion $x = c\chi - d$) and forces rounding to grid points, but energy minimization combined with iterative refinement can shrink the solution error further within a few passes (Rogers et al., 2019).
Lossy compression (for simulation): Scalar quantization of complex amplitudes in Schrödinger-style simulation retains high fidelity with a reduced number of significand bits; vector-quantization (codebook) methods need a number of bits per amplitude that grows only slowly to maintain a target fidelity, supporting scale-out simulation (Huffman et al., 2024).
Precision choice guidelines: For quantum simulation, analytically derived inequalities specify required bits for given target fidelity and circuit depth. For arithmetic, exponent and mantissa bits can be chosen statically, or via ML-adaptive methods (see below) (Huffman et al., 2024, Nikolić et al., 2022).
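
The link between mantissa width and reciprocal accuracy can be explored with a classical emulation of the Newton-Raphson update $y \leftarrow y\,(2 - a y)$ in which every intermediate result is truncated to $M$ fractional bits. The sketch below is an illustration only: the starting guess of $3/4$, the restriction to $a \in [1, 2)$, and the helper names are assumptions for this example and do not reproduce the circuits of the cited work.

```python
from fractions import Fraction

def truncate(x, M):
    """Keep M fractional bits of a non-negative rational (round toward zero)."""
    scale = 1 << M
    return Fraction(int(x * scale), scale)

def newton_reciprocal(a, M, iterations=6):
    """Approximate 1/a for a in [1, 2) using M-bit truncated Newton-Raphson steps."""
    y = Fraction(3, 4)  # assumed starting guess; 1/a lies in (1/2, 1]
    for _ in range(iterations):
        y = truncate(y * (2 - a * y), M)
    return y

def relative_error(a, M):
    """Relative error |a * y - 1| of the truncated reciprocal."""
    return abs(a * newton_reciprocal(a, M) - 1)
```

Evaluating `relative_error(Fraction(3, 2), M)` for increasing `M` shows the error floor set by truncation dropping roughly by half for each added mantissa bit, since the quadratic convergence of the iteration is quickly dominated by the $2^{-M}$ truncation step.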

5. Adaptivity and Learning of Floating-Point Parameters

Recent work leverages machine learning to optimize the number of exponent and mantissa bits per tensor or circuit component, especially in quantum-inspired models for classical hardware or quantum-enhanced ML:

Quantum Mantissa/Quantum Exponent (QM/QE): These ML-driven schemes introduce learnable real-valued bit-length parameters for the mantissa and exponent of each weight/activation tensor; stochastic rounding and straight-through estimators allow gradient descent to minimize the total bits used under loss and regularization constraints. Mantissa and exponent bits adapt independently because they influence precision and range differently. QM+QE reduces the average mantissa and exponent bit widths with at most 0.4% top-1 accuracy loss compared to FP32, lowering overall memory/storage; post-processing via the Gecko compressor yields a further reduction (Nikolić et al., 2022).
Lossless and lossy compression enhancements: Such approaches exploit clustering or redundancy in exponent values, packing via variable-length coding, especially effective for activations and weights with non-uniform exponent distributions (Nikolić et al., 2022).
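
To make the role of stochastic rounding concrete, the sketch below shows one plausible forward-pass step: a real-valued mantissa length is rounded to a neighboring integer with probability equal to its fractional part, and a value is then truncated to that many mantissa bits. This is an assumption-laden illustration, not the authors' implementation; the function names are hypothetical, and the straight-through gradient and the regularization term are omitted.

```python
import math
import random

def stochastic_bitlength(n_real, rng):
    """Round a real-valued bit length to floor or ceil; P(ceil) is the fractional part."""
    low = math.floor(n_real)
    return low + int(rng.random() < n_real - low)

def truncate_mantissa(x, n_bits):
    """Keep n_bits fractional mantissa bits of x (the hidden leading 1 is not counted)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)         # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (n_bits + 1)  # +1 because frexp places the leading bit at 2**-1
    return math.ldexp(math.trunc(m * scale) / scale, e)
```

Because the expected value of `stochastic_bitlength(n_real, rng)` equals `n_real`, a fractional length such as 2.3 behaves on average like a bit width between 2 and 3, which is what lets a continuous parameter stand in for a discrete container size during training.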

6. Platform-Specific Approaches and Performance

Encoding and operational procedures vary by quantum hardware paradigm:

Gate-based fault-tolerant devices: Focus is on optimizing T-depth, resource state reuse, and subroutine composition using hand-optimized subcircuits, surface code compatibility, and QFT-based acceleration (Häner et al., 2018, Serrallés et al., 23 Oct 2025, Wiebe et al., 2013, Seidel et al., 2021).
Annealers and QUBO encodings: Formulation as quadratic binary optimization enables direct use of quantum annealers (D-Wave Chimera graphs), though embedding challenges are significant considerations: chain stability, emulating connectivity that the hardware graph lacks natively, and near-degenerate spectra for ill-conditioned systems (Rogers et al., 2019).
Simulation and classical-quantum boundary: Reduced-precision representations, codebook-based vector quantization, and empirical error-fidelity models enable resource-efficient Schrödinger-style quantum circuit simulation, which is needed in the NISQ regime and for benchmarking claims of quantum advantage (Huffman et al., 2024).
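
For the annealing route, the division objective $(mx - y)^2$ with $x = c\chi - d$ and $\chi = \sum_{r=0}^{R-1} 2^{-r} Q_r$ expands into a QUBO because $Q_r^2 = Q_r$. The sketch below builds that upper-triangular matrix and, in place of an annealer, finds the minimum by exhaustive search, which is feasible only for small $R$. The defaults $c = d = 2$ (giving $x \in [-2, 2)$) and the function names are assumptions for this example.

```python
from itertools import product

def division_qubo(m, y, R, c=2.0, d=2.0):
    """Upper-triangular QUBO matrix and constant offset for (m*x - y)**2."""
    a = [m * c * 2.0 ** -r for r in range(R)]  # weight of Q_r inside m*x
    b = m * d + y                              # constant part: m*x - y = sum_r a_r Q_r - b
    Q = [[0.0] * R for _ in range(R)]
    for r in range(R):
        Q[r][r] = a[r] ** 2 - 2.0 * b * a[r]   # linear terms, using Q_r**2 == Q_r
        for s in range(r + 1, R):
            Q[r][s] = 2.0 * a[r] * a[s]        # pairwise couplings
    return Q, b ** 2

def brute_force_minimum(Q, offset, c=2.0, d=2.0):
    """Exhaustively minimize the QUBO energy; returns (decoded x, minimum energy)."""
    R = len(Q)
    best_energy, best_bits = None, None
    for bits in product((0, 1), repeat=R):
        energy = offset + sum(
            Q[r][s] * bits[r] * bits[s] for r in range(R) for s in range(r, R)
        )
        if best_energy is None or energy < best_energy:
            best_energy, best_bits = energy, bits
    x = c * sum(q * 2.0 ** -r for r, q in enumerate(best_bits)) - d
    return x, best_energy
```

Calling `brute_force_minimum(*division_qubo(4.0, 3.0, 4))` recovers $x = 3/4$ with zero energy because $3/4$ lies on the 4-bit grid; for a quotient off the grid, such as $1/3$, the minimizer is the nearest grid point, which is the rounding behavior noted in the error analysis.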

7. Extensions, Limitations, and Open Directions

Floating-point encodings on quantum hardware remain an active area, with known limitations and generalization avenues:

Normalization, subnormal/denormal support, and rounding: Certain SBP/Fourier and gate-based approaches defer sticky bits, rounding modes, and full IEEE-754 compliance to future work (Seidel et al., 2021). Clamping to zero or ∞, rather than handling NaN or subnormal values, is common (Häner et al., 2018).
Extension to general real and complex numbers: Most current frameworks generalize naturally to arbitrary real (and sometimes complex) values, though work on compatibility with tensor networks, tensor-product Hilbert spaces, and quantum operator representations is ongoing.
Adaptation to hardware-native gates and topologies: Improved mapping to ion-trap operations (GMS gates), photonic encodings, and alternative architectures is required for device-level optimization (Seidel et al., 2021).
Integration with error-correction and fault tolerance: Considerable research focuses on minimizing resources and maximizing error-resilience, especially for T-gate–dominated subroutines and modular factoring applications.
Learning-based and hybrid encoding/design: Adaptive floating-point “quantum-inspired” frameworks exploiting ML enable automatic precision-range tradeoff, especially salient in large-scale quantum-compatible ML models and hardware-efficient simulation (Nikolić et al., 2022, Huffman et al., 2024).

Quantum floating-point representations thus provide a fundamental set of tools and protocols underpinning the implementation of scientific, engineering, and data analysis algorithms on quantum and quantum-inspired hardware, subject to deep trade-offs in error, complexity, and resource allocation.