Fujitsu Digital Annealer: QUBO Optimization

Updated 4 July 2026

Fujitsu Digital Annealer is a quantum-inspired, CMOS-based platform designed specifically to solve dense QUBO and Ising-model optimization problems.
It employs hardware-accelerated parallel-trial updates and annealing-derived search methods to achieve competitive speeds on fully connected, dense optimization models.
Generational advancements have increased variable capacity up to 100,000 and enabled integrated constraint handling for applications ranging from network systems to quantum error correction.

The Fujitsu Digital Annealer (DA) is a quantum-inspired, CMOS-based platform for solving quadratic unconstrained binary optimization (QUBO) and equivalent Ising-model optimization problems. Across the literature, it is described as an application-specific hardware accelerator or hybrid hardware–software system that natively supports dense, fully connected binary quadratic models and applies annealing-derived search procedures in digital logic rather than on analog quantum hardware (Aramon et al., 2018). Reported generations span early systems for fully connected problems of up to $1024$ variables, second-generation systems with $8\,192$ binary variables, and later DA 3.0/DAv3-class systems with up to $100\,000$ variables or hybrid DAU+CPU execution, depending on the study and deployment model (Huang et al., 2022).

1. Architecture and generational evolution

Early descriptions present the DA as a custom CMOS chip tailored to fully connected QUBO instances of up to $N=1024$ binary variables, with biases stored to 26-bit fixed precision and couplers to 16-bit precision (Aramon et al., 2018). The same line of work emphasizes $N$ parallel flip engines, constant-time effective-field updates on fully connected graphs, and a global controller implementing annealing schedules and escape logic. A second-generation DA is reported to implement a fully connected QUBO engine with up to $8\,192$ binary variables and up to 64 bits of integer precision for biases and couplers (Cohen et al., 2020). Benchmarking work on practical use cases likewise characterizes the DA as a purpose-built CMOS ASIC that natively supports up to $8\,192$ binary “spins,” with all spin states and coupler weights stored on-chip in SRAM and a controller orchestrating temperature scheduling, bit-flip proposals, and replica exchange (Huang et al., 2022).

Later studies describe a broader family of DA systems rather than a single immutable device. DA 3.0 is reported to handle up to $100\,000$ binary variables and to provide built-in handling of one-hot and inequality constraints (Ayodele et al., 2022). A Max-Cut benchmark distinguishes DAv2, with native support for fully connected QUBO of up to $8\,192$ variables, from DAv3, which reaches $100\,000$ variables through a hybrid DAU + CPU “tabu” layer (Shaglel et al., 29 Jul 2025). A 2025 comparative study describes DA v4 as a massively parallel architecture built on CPU and GPU resources, with tightly coupled memory and arithmetic units, full connectivity among variables, QUBO + QC support, and API-level HOBO-to-QUBO conversion utilities (Upadhyay et al., 11 Sep 2025). By contrast, a 2026 transpilation study uses a Fujitsu Digital Annealer Gen4 with full connectivity and no minor-embedding, but reports a maximum of $8\,192$ 0 variables per QUBO in that workflow, reflecting the formulation and interface used there rather than a universal architectural ceiling (Watanabe et al., 12 May 2026).

This reported evolution suggests that “Digital Annealer” denotes a product line combining specialized digital hardware, firmware, and solver interfaces whose exposed capabilities depend on generation, API, and workload. A consistent theme across generations is native support for dense couplings and the avoidance of the sparse-topology embedding overhead associated with analog quantum annealers (Huang et al., 2022).

2. Optimization model and annealing dynamics

The DA is used to solve optimization problems cast in QUBO or Ising form. One formulation given in the literature is the binary objective

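$$E(\mathbf{x}) = \mathbf{x}^{\top} Q\, \mathbf{x} + \mathbf{b}^{\top}\mathbf{x}, \qquad \mathbf{x} \in \{0,1\}^{N},$$

where the DA accepts a fully connected QUBO matrix $Q \in \mathbb{R}^{N \times N}$ and linear bias vector $\mathbf{b} \in \mathbb{R}^{N}$, then returns a binary vector that approximately minimizes the energy (Wei et al., 2021). The equivalent Ising form used in several analyses is

$$H(\mathbf{s}) = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_{i} h_i\, s_i, \qquad \mathbf{s} \in \{-1,+1\}^{N},$$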

with standard binary–spin conversion between QUBO and Ising variables (Fukushima-Kimura et al., 2023).
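
The binary–spin conversion can be made concrete. The Python sketch below assumes the QUBO energy $E(\mathbf{x}) = \mathbf{x}^{\top}Q\mathbf{x} + \mathbf{b}^{\top}\mathbf{x}$, the Ising energy $H(\mathbf{s}) = -\sum_{i<j}J_{ij}s_is_j - \sum_i h_i s_i$, and the substitution $x_i = (1+s_i)/2$. It maps one energy onto the other up to a constant offset. The function names are illustrative and are not part of any DA API.

```python
import itertools
import numpy as np

def qubo_to_ising(Q, b):
    """Map E(x) = x^T Q x + b^T x (x in {0,1}^N) to
    H(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i (s in {-1,+1}^N)
    so that E(x) = H(s) + offset under x_i = (1 + s_i) / 2."""
    Q = np.asarray(Q, dtype=float)
    b = np.asarray(b, dtype=float)
    W = Q + Q.T                      # W_ij multiplies x_i x_j once for each pair i < j
    np.fill_diagonal(W, 0.0)
    c = np.diag(Q) + b               # x_i^2 = x_i, so the diagonal acts as a linear bias
    J = -W / 4.0                     # symmetric, zero diagonal
    h = -(c / 2.0 + W.sum(axis=1) / 4.0)
    offset = W.sum() / 8.0 + c.sum() / 2.0   # W.sum() counts every pair twice
    return J, h, offset

def qubo_energy(Q, b, x):
    x = np.asarray(x, dtype=float)
    return float(x @ np.asarray(Q, float) @ x + np.asarray(b, float) @ x)

def ising_energy(J, h, s):
    s = np.asarray(s, dtype=float)
    # 0.5 * s^T J s counts each i<j pair once because J is symmetric
    return float(-0.5 * s @ J @ s - h @ s)

# Exhaustive check on a random 4-variable instance.
rng = np.random.default_rng(0)
Q, b = rng.normal(size=(4, 4)), rng.normal(size=4)
J, h, offset = qubo_to_ising(Q, b)
for bits in itertools.product([0, 1], repeat=4):
    x = np.array(bits)
    assert np.isclose(qubo_energy(Q, b, x), ising_energy(J, h, 2 * x - 1) + offset)
```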

The algorithmic core is annealing-derived rather than quantum-mechanical. Early DA descriptions present a simulated-annealing extension in which all single-bit flips are proposed in parallel, their energy changes $\Delta E_i$ are computed simultaneously, and one accepted flip is chosen uniformly at random. If no flip is accepted, a global offset is increased to help the search escape local minima (Aramon et al., 2018). Later work on surface-code decoding describes replica exchange across temperature replicas, parallel-trial Metropolis evaluation, and user-adjustable replica counts, temperatures, and sweeps (Fujisaki et al., 2022). Benchmarking work on practical use cases likewise describes an enhanced simulated annealing based on parallel tempering, with neighboring temperature replicas swapped by the standard Metropolis criterion (Huang et al., 2022).
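
A minimal Python sketch of the first-generation update rule described above follows. It makes three assumptions: a QUBO energy $E(\mathbf{x}) = \mathbf{x}^{\top}Q\mathbf{x} + \mathbf{b}^{\top}\mathbf{x}$, an escape offset that is subtracted from every $\Delta E_i$ and reset after an accepted flip, and a user-supplied inverse-temperature schedule. The names `da_style_anneal`, `offset_step`, and `replica_swap_probability` are hypothetical. The hardware evaluates the $N$ trials in parallel, whereas NumPy only vectorizes them. The swap helper implements the standard parallel-tempering Metropolis criterion mentioned for later systems.

```python
import math
import numpy as np

def da_style_anneal(Q, b, betas, offset_step, seed=0):
    """Parallel-trial annealing sketch: evaluate every single-bit flip at once,
    pick one accepted flip uniformly, and grow an escape offset when none is
    accepted (assumed: offset subtracted from dE, reset after each flip)."""
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=float)
    S = (Q + Q.T) / 2.0                 # symmetrized couplings, same energy as Q
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = rng.integers(0, 2, size=n)
    field = S @ x                       # local fields (S x)_i, updated incrementally
    energy = float(x @ S @ x + b @ x)
    best_x, best_e = x.copy(), energy
    e_off = 0.0
    for beta in betas:
        delta = 1 - 2 * x               # +1 for a 0->1 flip, -1 for a 1->0 flip
        dE = delta * (2 * field + b) + np.diag(S)      # all N energy changes
        accept_p = np.exp(-beta * np.maximum(dE - e_off, 0.0))
        accepted = np.flatnonzero(rng.random(n) < accept_p)
        if accepted.size == 0:
            e_off += offset_step        # nothing accepted: raise the escape offset
            continue
        i = rng.choice(accepted)        # one accepted flip, chosen uniformly
        field += delta[i] * S[:, i]
        energy += dE[i]
        x[i] ^= 1
        e_off = 0.0
        if energy < best_e:
            best_x, best_e = x.copy(), energy
    return best_x, best_e

def replica_swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance for exchanging the configurations of two replicas
    at inverse temperatures beta_i, beta_j with energies e_i, e_j."""
    return math.exp(min(0.0, (beta_i - beta_j) * (e_i - e_j)))

rng = np.random.default_rng(1)
Q = rng.normal(size=(8, 8))
b = rng.normal(size=8)
best_x, best_e = da_style_anneal(Q, b, betas=np.linspace(0.1, 5.0, 2000), offset_step=0.5)
```

Tracking the local fields makes each step's $\Delta E_i$ evaluation a vector operation, loosely mirroring the constant-time field updates attributed to the hardware.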

The mathematical analysis of the first-generation DA formalizes this procedure as a time-inhomogeneous Markov chain with transition kernel $P_{\beta}$ at inverse temperature $\beta$. For each fixed $\beta$, the chain is irreducible and aperiodic and therefore has a unique stationary distribution $\pi_{\beta}$. In general, however, $\pi_{\beta} \neq G_{\beta}$, where $G_{\beta}(\sigma) \propto e^{-\beta H(\sigma)}$ is the Gibbs–Boltzmann distribution; equality occurs only in degenerate cases with no pairwise interactions (Fukushima-Kimura et al., 2023). The same work establishes a necessary and sufficient condition for asymptotic convergence to the ground-state set $\mathcal{S}_{\min}$: $\sum_{n=1}^{\infty} e^{-\beta_n d^{*}} = \infty$, where $d^{*}$ is the depth of the deepest nonglobal local minimum. For a logarithmic schedule $\beta_n = \frac{1}{c}\log n$, this requires $c \geq d^{*}$ (Fukushima-Kimura et al., 2023).
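
The difference between the chain's stationary distribution and the Gibbs–Boltzmann distribution can be checked numerically on a tiny instance. The Python sketch below makes two assumptions. The first is the Ising energy $H(\mathbf{s}) = -\tfrac12\mathbf{s}^{\top}J\mathbf{s} - \mathbf{h}^{\top}\mathbf{s}$. The second is the parallel-trial rule without the escape offset: each flip is accepted independently with probability $\min(1, e^{-\beta\Delta E_i})$, then one accepted flip is chosen uniformly. Under those assumptions the code builds the exact transition matrix at fixed $\beta$ and compares its stationary distribution with the Gibbs weights. The helper names are hypothetical, and exhaustive enumeration is only feasible for a handful of spins.

```python
import itertools
import numpy as np

def ising_energy(J, h, s):
    return float(-0.5 * s @ J @ s - h @ s)

def parallel_trial_kernel(J, h, beta):
    """Exact transition matrix of the parallel-trial rule with no escape offset."""
    n = len(h)
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    index = {tuple(s): k for k, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for k, s in enumerate(states):
        dE = 2.0 * s * (J @ s + h)                   # energy change of flipping each spin
        a = np.exp(-beta * np.maximum(dE, 0.0))      # Metropolis acceptance per trial
        for accepted in itertools.product([False, True], repeat=n):
            prob = float(np.prod(np.where(accepted, a, 1.0 - a)))
            chosen = np.flatnonzero(accepted)
            if chosen.size == 0:                     # no trial accepted: stay put
                P[k, k] += prob
                continue
            for i in chosen:                         # flip one accepted spin uniformly
                t = s.copy()
                t[i] = -t[i]
                P[k, index[tuple(t)]] += prob / chosen.size
    return states, P

def stationary_distribution(P):
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def gibbs_distribution(J, h, beta, states):
    w = np.array([np.exp(-beta * ising_energy(J, h, s)) for s in states])
    return w / w.sum()

beta = 0.7
J = np.array([[0.0, 1.0], [1.0, 0.0]])               # two coupled spins, no field
states, P = parallel_trial_kernel(J, np.zeros(2), beta)
pi = stationary_distribution(P)
gibbs = gibbs_distribution(J, np.zeros(2), beta, states)
print(np.round(pi, 4), np.round(gibbs, 4))           # the two vectors differ
```

For this two-spin case a short balance argument gives $\pi(-+)/\pi(--) = r(2-r)$ with $r = e^{-2\beta}$, whereas the Gibbs ratio is $r$. Setting $J = 0$ makes the two distributions coincide, which matches the degenerate case noted above.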

A common misconception is that the DA merely reproduces classical single-site simulated annealing in hardware. The formal Markov-chain analysis and the architectural descriptions indicate a more specific process. Parallel-trial Metropolis evaluation and uniform random selection of one flip among the accepted candidates, together with the dynamic escape offset and, in later generations, replica exchange, alter the transition kernel and hence the stationary behavior relative to textbook single-spin simulated annealing (Fukushima-Kimura et al., 2023).

3. Constraint handling and QUBO engineering

DA applications rely on converting constrained combinatorial problems into binary quadratic form. Reported interfaces vary by generation. Some studies use pure penalty embedding, adding large quadratic penalties for one-hot, balance, or inequality constraints (Kao et al., 2023). Others report built-in support for one-way one-hot and inequality constraints in the DA API, reducing the need for manual slack-variable expansion (Kao et al., 2023). A later comparative study describes a QUBO + QC mode in which quadratic constraints are handled separately rather than absorbed into a single penalty-augmented objective (Upadhyay et al., 11 Sep 2025).

Construct	Representative form	Reported usage
One-hot assignment	$\sum_{j} x_{ij} = 1$ or $\lambda\bigl(1 - \sum_{j} x_{ij}\bigr)^{2}$	graph partitioning, ABR, transpilation (Wei et al., 2021)
Balance / inequality	slack-variable square penalties	graph partitioning, rebuffering, QEC quadratization (Kao et al., 2023)
Higher-order reduction	auxiliary bits/spins for quartic terms	surface-code decoding under syndrome constraints (Fujisaki et al., 2022)
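
Both penalty rows of the table are instances of one construction. A linear equality $\mathbf{c}^{\top}\mathbf{x} = d$ becomes the quadratic penalty $\lambda(\mathbf{c}^{\top}\mathbf{x} - d)^2 = \mathbf{x}^{\top}(\lambda\,\mathbf{c}\mathbf{c}^{\top})\mathbf{x} - 2\lambda d\,\mathbf{c}^{\top}\mathbf{x} + \lambda d^2$. The Python sketch below assumes nonnegative integer coefficients for the inequality case and a binary-expansion slack. This is a common choice, not necessarily the one used in the cited studies. `squared_penalty` and `slack_weights` are hypothetical helpers.

```python
import itertools
import numpy as np

def squared_penalty(c, d, lam):
    """Return (Q, b, const) with lam*(c @ x - d)**2 == x @ Q @ x + b @ x + const."""
    c = np.asarray(c, dtype=float)
    return lam * np.outer(c, c), -2.0 * lam * d * c, lam * d * d

def slack_weights(capacity):
    """Weights 1, 2, 4, ... whose binary combinations cover 0..capacity."""
    return [2 ** k for k in range(int(capacity).bit_length())]

def penalty_value(pen, x):
    Q, b, const = pen
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x + b @ x + const)

# One-hot over three bits: the penalty is zero exactly when one bit is set.
one_hot = squared_penalty([1, 1, 1], 1, lam=4.0)
one_hot_values = {x: penalty_value(one_hot, x) for x in itertools.product([0, 1], repeat=3)}

# Inequality 3*x0 + 2*x1 + 4*x2 <= 5, rewritten as a @ x + w @ s == 5 with slack bits s.
a, cap = [3, 2, 4], 5
w = slack_weights(cap)                       # [1, 2, 4]
ineq = squared_penalty(a + w, cap, lam=1.0)  # variables: 3 decision bits, then 3 slack bits
```

Because the coefficients are nonnegative, a zero penalty forces $\mathbf{a}^{\top}\mathbf{x} \le C$. Every feasible $\mathbf{x}$ admits a slack value $C - \mathbf{a}^{\top}\mathbf{x}$ that the binary weights can represent.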

The recurring pattern is direct binary encoding of discrete assignments, followed by quadratization of any higher-order terms. In graph partitioning, the objective combines a quadratic modularity or cut term with one-hot and balance penalties (Kao et al., 2023). In adaptive bitrate control, each segment–bitrate choice is binary, with quality maximization, quality-switch penalties, rebuffering constraints using slack variables, and an exact-one-bitrate constraint assembled into a single QUBO Hamiltonian (Wei et al., 2021). In surface-code decoding, quartic syndrome terms are converted to quadratic form with auxiliary variables and penalty terms so that the final problem matches the DA input format (Fujisaki et al., 2022).
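
Quadratization can be illustrated with Rosenberg's substitution. This is a standard technique, not necessarily the exact reduction used for the surface-code decoder. An auxiliary bit $y$ replaces a product $x_a x_b$, and the penalty $M(x_a x_b - 2x_a y - 2x_b y + 3y)$ enforces the substitution: it is zero when $y = x_a x_b$ and at least $M$ otherwise. The Python sketch below reduces a single quartic term $w\,x_0x_1x_2x_3$ with two auxiliaries. It then checks exhaustively that minimizing over the auxiliaries recovers the original term. The dictionary representation and function names are illustrative assumptions.

```python
import itertools

def rosenberg_penalty(a, b, y, M):
    """Terms of M*(x_a*x_b - 2*x_a*y - 2*x_b*y + 3*y): zero iff y == x_a*x_b, else >= M."""
    return {(a, b): M, (a, y): -2 * M, (b, y): -2 * M, (y,): 3 * M}

def quadratize_quartic(w, M):
    """Quadratic stand-in for w*x0*x1*x2*x3 using y = x0*x1 (index 4) and z = x2*x3 (index 5)."""
    poly = {(4, 5): w}                               # w*y*z replaces the quartic term
    for pen in (rosenberg_penalty(0, 1, 4, M), rosenberg_penalty(2, 3, 5, M)):
        for key, coef in pen.items():
            poly[key] = poly.get(key, 0) + coef
    return poly

def evaluate(poly, bits):
    """Evaluate a polynomial stored as {tuple_of_variable_indices: coefficient}."""
    total = 0
    for variables, coef in poly.items():
        term = coef
        for v in variables:
            term *= bits[v]
        total += term
    return total

w = -3
poly = quadratize_quartic(w, M=abs(w) + 1)           # M > |w| keeps the substitution exact
for x in itertools.product([0, 1], repeat=4):
    reduced = min(evaluate(poly, x + (y, z)) for y in (0, 1) for z in (0, 1))
    assert reduced == w * x[0] * x[1] * x[2] * x[3]
```

The choice $M > |w|$ matters. A wrong auxiliary assignment can lower the $w\,yz$ term by at most $|w|$ but always pays at least $M$ in penalty.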

This engineering step is often the decisive modeling burden. Several studies note that full logical connectivity removes the need for minor embedding on sparse hardware, but does not remove the need for careful encoding, penalty selection, auxiliary-variable management, or coefficient scaling (Fujisaki et al., 2022). A plausible implication is that the DA’s effectiveness is coupled as much to formulation quality as to raw annealing throughput.

4. Application mappings across domains

The DA has been applied to a wide range of QUBO-encodable problems, and the literature is notable for the diversity of formulations rather than for a single canonical workload.

In networked media systems, adaptive bitrate control has been formulated as a QUBO over binary segment–bitrate selection variables. Its Hamiltonian terms cover aggregate video quality, inter-segment quality switches, rebuffering avoidance via slack variables, and one-hot bitrate selection (Wei et al., 2021). On real-world throughput traces from Tram and Ferry scenarios, the reported QoE values were $N=1024$ 0 and $N=1024$ 1 for the QUBO-DA method, versus $N=1024$ 2/ $N=1024$ 3 for Pensieve, $N=1024$ 4/ $N=1024$ 5 for MPC, and $N=1024$ 6/ $N=1024$ 7 for BBA (Wei et al., 2021).

In industrial control, automated guided vehicle coordination was formulated with binary route-assignment variables over a finite horizon, together with penalties enforcing one route per vehicle and collision avoidance on shared edges. In a 10-AGV example, the reported average working rates over repeated $N=1024$ 9 s simulations were $N$ 0 for the Fujitsu DA, $N$ 1 for D-Wave 2000Q, $N$ 2 exact for Gurobi MIP, and $N$ 3 for the conventional rule-based method (Ohzeki et al., 2018).
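
The structure of such a formulation, a one-hot route choice per vehicle plus pairwise collision penalties, can be sketched on a toy instance. Everything in the example below is hypothetical: two vehicles, hand-written candidate routes given as edge sets, route length as travel cost, and hand-picked penalty weights `lam` and `mu`. It is not the cited study's model, and brute-force enumeration stands in for the annealer.

```python
import itertools
import numpy as np

# Hypothetical instance: candidate routes per vehicle, each given as a set of edges.
routes = {
    ("A", 0): {"e1", "e2"},
    ("A", 1): {"e3", "e4", "e5"},
    ("B", 0): {"e2", "e6"},
    ("B", 1): {"e7", "e8", "e9", "e10"},
}
keys = list(routes)                        # one binary variable per (vehicle, route)

def build_qubo(lam=10.0, mu=5.0):
    """Upper-triangular Q: route cost + lam*(1 - sum_r x_vr)^2 per vehicle
    (constant dropped) + mu for each pair of different vehicles' routes sharing an edge."""
    n = len(keys)
    Q = np.zeros((n, n))
    for i, key in enumerate(keys):
        Q[i, i] = len(routes[key]) - lam   # travel cost plus linear part of the one-hot penalty
    for i, j in itertools.combinations(range(n), 2):
        if keys[i][0] == keys[j][0]:
            Q[i, j] = 2 * lam              # two routes chosen for the same vehicle
        elif routes[keys[i]] & routes[keys[j]]:
            Q[i, j] = mu                   # different vehicles on a shared edge
    return Q

Q = build_qubo()
best = min(itertools.product([0, 1], repeat=len(keys)),
           key=lambda x: np.array(x) @ Q @ np.array(x))
chosen = [keys[i] for i, bit in enumerate(best) if bit]
```

Here the two shortest routes share edge `e2`, so the collision penalty pushes vehicle A onto its longer route rather than letting both vehicles take their shortest paths.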

Quantum error correction is one of the most technically detailed DA application areas. A DA decoder for the planar surface code maps syndrome consistency and error sparsity to Ising/QUBO form and reports threshold behavior between $N$ 4 and $N$ 5, very close to the MWPM decoder threshold of about $N$ 6 (Fujisaki et al., 2022). Under depolarizing noise, a related Ising-based study compares soft-constraint and hard-constraint mappings and reports thresholds of about $N$ 7 and $N$ 8 for the DA decoder, compared with about $N$ 9 for MWPM and about $8\,192$ 0 for CPU-SA and CPLEX in that setting (Takeuchi et al., 2023). The same study reports average iteration counts at $8\,192$ 1 and $8\,192$ 2 of $8\,192$ 3 for the soft mapping and $8\,192$ 4 for the hard mapping, with a DA runtime estimate of about $8\,192$ 5 ms per instance under the stated architecture and replica configuration (Takeuchi et al., 2023).

Graph partitioning and community detection form another major cluster of applications. Modularity-based QUBO formulations have been run on networks ranging from Karate Club to large power-grid graphs. One study reports modularity $8\,192$ 6 on Zachary’s Karate Club and identifies communities in IEEE 33-bus and IEEE 118-bus power networks (Kao et al., 2023). A later study reports modularity values $8\,192$ 7 for weighted Karate Club, $8\,192$ 8 for weighted Les Misérables, $8\,192$ 9 for American Football, and $8\,192$ 0 for Dolphin; on the Case 1354pegase power-grid network it reports $8\,192$ 1 binary variables and modularity $8\,192$ 2 within roughly $8\,192$ 3 s at $8\,192$ 4 communities (Kao et al., 2023).
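
A modularity QUBO of this kind can be sketched with one-hot community bits $x_{ic}$. The goal is to maximize $\frac{1}{2m}\sum_{c}\sum_{i,j} B_{ij}x_{ic}x_{jc}$, where $B_{ij} = A_{ij} - k_ik_j/2m$ is the modularity matrix; this is rewritten as minimizing its negative plus a one-hot penalty. The Python example below is a hypothetical toy: two triangles joined by one edge, two communities, penalty weight 2, and brute-force search in place of the DA. It does not reproduce the cited benchmarks.

```python
import itertools
import numpy as np

def modularity_qubo(A, k, lam):
    """Q such that x @ Q @ x + n*lam == -modularity for one-hot x (variable index i*k + c)."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    deg = A.sum(axis=1)
    two_m = deg.sum()
    B = A - np.outer(deg, deg) / two_m           # modularity matrix B_ij
    Q = np.zeros((n * k, n * k))
    for c in range(k):
        idx = np.arange(n) * k + c               # variables x_{i,c} for community c
        Q[np.ix_(idx, idx)] -= B / two_m         # -(1/2m) sum_ij B_ij x_ic x_jc
    for i in range(n):
        block = slice(i * k, (i + 1) * k)
        # lam*(1 - sum_c x_ic)^2 without its constant: -lam on the diagonal, 2*lam per pair
        Q[block, block] += lam * (np.ones((k, k)) - 2 * np.eye(k))
    return Q

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k, lam = 2, 2.0
Q = modularity_qubo(A, k, lam)
X = np.array(list(itertools.product([0, 1], repeat=6 * k)))   # all 4096 assignments
energies = np.einsum("si,ij,sj->s", X, Q, X) + 6 * lam
best = X[np.argmin(energies)].reshape(6, k)
labels = best.argmax(axis=1)                     # community index of each node
```

The minimum energy corresponds to splitting the two triangles, with modularity $5/14$. With $\lambda = 2$, any assignment that violates one-hot pays more in penalty than it can gain in modularity.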

Near-term quantum compilation has also been cast in DA-compatible form. An “accuracy-first” transpilation framework uses the DA either only for global initial mapping (“Hybrid”) or for both mapping and iterative short-horizon routing (“Full DA”). Reported benchmarks show an average CNOT reduction of $8\,192$ 5 and up to $8\,192$ 6 versus Qiskit level 3 for the Hybrid strategy, while the Full DA approach outperforms ISAAQ by $8\,192$ 7 on average and up to $8\,192$ 8 on structured circuits, but degrades on random or concentrated-connectivity circuits (Watanabe et al., 12 May 2026).

Machine-learning and scientific-inference uses are similarly heterogeneous. Consensus clustering has been encoded as pairwise-similarity and correlation-clustering QUBOs and solved on a second-generation DA, with DA-based models reported as best or tied-best on all seven datasets by consensus ARI for the correlation-clustering formulation (Cohen et al., 2020). Nonnegative/binary matrix factorization uses the DA to solve the binary subproblem in alternating updates; on Olivetti faces, the reported final RMSE and average iteration counts were $8\,192$ 9 and $100\,000$ 0 for NBMF + DA, versus $100\,000$ 1 and $100\,000$ 2 for classical NMF, with classification accuracies of $100\,000$ 3 and $100\,000$ 4, respectively (Asaoka et al., 2020). In chemical reaction-condition optimization, a Digital Annealing Unit was used for QUBO-based search over combinatorial condition spaces; with a $100\,000$ 5 s annealing cycle, the DAU found $100\,000$ 6 candidate conditions, $100\,000$ 7 of which outperformed the best among $100\,000$ 8 CPU-sampled conditions, for a Negishi-example search space of $100\,000$ 9 combinations (Li et al., 2024).

5. Empirical benchmark profile

Across broad benchmarks, the DA’s strongest empirical profile is on dense, highly interconnected, or heavily constrained binary quadratic problems, while its advantages are weaker on sparse problems or on formulations whose encodings are dominated by decomposition overhead.

Early physics-motivated benchmarking on spin glasses reports a time-to-solution speedup of roughly two orders of magnitude over single-core simulated annealing and parallel tempering for fully connected Sherrington–Kirkpatrick problems, but no speedup for sparse two-dimensional spin glasses (Aramon et al., 2018). A benchmarking study on practical use cases reaches a mixed conclusion. Both D-Wave and the Fujitsu DA are effective on small, simple instances but lose utility at practical sizes and settings, and decomposition extends scalability but remains far from practical use (Huang et al., 2022). This tension between strong dense-QUBO behavior and formulation-sensitive practical scalability reappears in later studies.

Problem regime	Reported DA behavior	Representative study
Fully connected spin glasses	roughly two orders of magnitude speedup over single-core SA/PT on dense instances	(Aramon et al., 2018)
Max-Cut, large benchmark set	competitive with best heuristics on up to $8\,192$ 1 vertices	(Shaglel et al., 29 Jul 2025)
QAP / MKP / short-budget TSP	often better average objective than tuned GA	(Ayodele, 2022)
Sparse low-density CRN QUBOs	classical MIP/CP superior	(Upadhyay et al., 11 Sep 2025)
Dense codon-selection QUBOs	near-linear scaling; DA competitive with hybrid annealers	(Upadhyay et al., 11 Sep 2025)

A large Max-Cut benchmark on over $8\,192$ 2 MQLib instances reports that DA v2 wins on $8\,192$ 3, ties on $8\,192$ 4, and loses on $8\,192$ 5 of $8\,192$ 6 medium–large instances against the best-of-37 MQLib heuristics, while DAv3 wins on $8\,192$ 7, ties on $8\,192$ 8, and loses on $8\,192$ 9 of $100\,000$ 0 instances (Shaglel et al., 29 Jul 2025). On the D-Wave hybrid solver comparison set, DA v3 recorded $100\,000$ 1 wins, $100\,000$ 2 ties, and $100\,000$ 3 loss on $100\,000$ 4 integer-weight instances, and $100\,000$ 5 wins, $100\,000$ 6 ties, and $100\,000$ 7 losses on $100\,000$ 8 float-weight instances, most losses being within $100\,000$ 9– $8\,192$ 00 accuracy ratio (Shaglel et al., 29 Jul 2025).

A direct comparison with a tuned genetic algorithm on QAP, MKP, and TSP reports that at $8\,192$ 01 s the DA found the optimum on $8\,192$ 02 MKP instances versus $8\,192$ 03 for GA, reached the optimum on $8\,192$ 04 QAP instances within $8\,192$ 05 s while GA reached the optimum on only $8\,192$ 06 even after $8\,192$ 07 s, and was uniformly better on TSP at $8\,192$ 08 s and $8\,192$ 09 s, though GA closed the gap on some TSP instances by $8\,192$ 10 s (Ayodele, 2022). By contrast, a later comparative study on industrial applications reports that for reaction network pathway analysis, classical MIP/CP solvers solve the problem to optimality in reasonable time frames while the DA is not able to do so, whereas in mRNA codon selection the DA reaches the same average cost $8\,192$ 11 as CP-SAT and SCIP on standard and large proteins, albeit with higher average time-to-solution than CP-SAT (Upadhyay et al., 11 Sep 2025).

The aggregate benchmark picture is therefore not that the DA dominates all optimizers. Rather, the reported evidence suggests a regime-dependent performance profile: especially strong on dense QUBOs with significant pairwise structure, competitive on some large unconstrained graph problems, and less compelling when the underlying optimization is sparse, linear-cost dominated, or requires decomposition that erodes global structure.

6. Limitations, interpretation, and future directions

Several limitations recur across the literature. First, the DA is not a quantum annealer in the physical sense. It is a digital, CMOS-based Ising/QUBO solver, and one practical-use benchmark explicitly lists “No inherent quantum tunneling—very tall or wide energy barriers remain challenging” among its limitations (Huang et al., 2022). Second, the mathematical analysis shows that the stationary distribution of the first-generation DA generally differs from the Gibbs–Boltzmann distribution, so conventional equilibrium intuitions from classical simulated annealing must be applied with care (Fukushima-Kimura et al., 2023).

Third, performance is highly sensitive to formulation. Dense-graph partitioning work reports that the DA excels on dense graphs but loses ground on sparse graphs as $8\,192$ 12 or imbalance increases (Liu et al., 2022). The transpilation study identifies a trade-off between QUBO size and solution quality, with Full DA degrading on circuits with random or concentrated connectivity because a single anneal is insufficient for the enlarged routing search space (Watanabe et al., 12 May 2026). The Max-Cut benchmark notes that instances with very unbalanced floating-point weight ranges may suffer from rounding losses when mapped into the DAU’s integer format (Shaglel et al., 29 Jul 2025). Application papers also sometimes omit runtime or energy analyses; the ABR-control study, for example, reports QoE but not DA runtime, energy consumption, annealing schedule, or maximum tested QUBO size (Wei et al., 2021).
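
The rounding issue can be illustrated with a generic fixed-point mapping. The sketch below assumes a hypothetical signed 16-bit target. That is the width reported above for first-generation couplers, used here only as an example; the integer format of later DAU generations may differ. The code scales the largest-magnitude weight to the top of that range. With a widely spread weight set, the small weights round to zero.

```python
import numpy as np

def to_fixed_point(weights, bits=16):
    """Scale so that max |w| maps to 2**(bits-1) - 1, then round to the nearest integer."""
    w = np.asarray(weights, dtype=float)
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(w))
    return np.rint(w * scale).astype(np.int64), scale

weights = np.array([1e5, 1.0, 0.4, -2.5])       # hypothetical, very unbalanced range
ints, scale = to_fixed_point(weights)
recovered = ints / scale                        # what the solver effectively optimizes
rel_error = np.abs(recovered - weights) / np.abs(weights)
```

In this example the weights `1.0` and `0.4` vanish entirely, so the terms they encode no longer influence the search.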

Fourth, decomposition remains a structural bottleneck. The practical-use benchmark concludes that decomposition methods extend scalability but are still far away from practical use in the settings tested (Huang et al., 2022). This suggests that native large-variable support does not eliminate the need for better partitioning, encoding compression, or hybrid optimization pipelines when problem structure exceeds the solver’s directly usable regime.

Reported future directions are correspondingly technical rather than generic. They include tuning annealing schedules and reducing QUBO dimensionality in adaptive bitrate optimization (Wei et al., 2021); larger DA generations, tighter temperature schedules, and integrated cryo-interfaces for real-time quantum-error-correction decoding (Fujisaki et al., 2022); adaptive $8\,192$ 13-step routing, multi-trial anneals, and noise-aware weight augmentation for DA-assisted transpilation (Watanabe et al., 12 May 2026); and smart encoding, smart decomposition, and error-mitigation strategies for practical large-scale QUBO workloads (Huang et al., 2022).

Taken together, the literature supports a specific interpretation of the Fujitsu Digital Annealer: it is a family of dense-QUBO optimization systems whose value lies in the conjunction of native full connectivity, hardware-accelerated parallel-trial updates, and annealing-derived global search. The empirical record is strongest where those properties align with the problem’s structure, and markedly less uniform where sparsity, awkward encodings, or decomposition dominate the effective computational cost.