
Fujitsu Digital Annealer: QUBO Optimization

Updated 4 July 2026
  • Fujitsu Digital Annealer is a quantum-inspired, CMOS-based platform designed specifically to solve dense QUBO and Ising-model optimization problems.
  • It employs hardware-accelerated parallel-trial updates and annealing-derived search methods to achieve competitive speeds on fully connected, dense optimization models.
  • Generational advancements have increased variable capacity up to 100,000 and enabled integrated constraint handling for applications ranging from network systems to quantum error correction.

The Fujitsu Digital Annealer (DA) is a quantum-inspired, CMOS-based platform for solving quadratic unconstrained binary optimization (QUBO) and equivalent Ising-model optimization problems. Across the literature, it is described as an application-specific hardware accelerator or hybrid hardware–software system that natively supports dense, fully connected binary quadratic models and applies annealing-derived search procedures in digital logic rather than on analog quantum hardware (Aramon et al., 2018). Reported generations span early systems for fully connected problems of up to $1024$ variables, second-generation systems with $8\,192$ binary variables, and later DA 3.0/DAv3-class systems with up to $100\,000$ variables or hybrid DAU+CPU execution, depending on the study and deployment model (Huang et al., 2022).

1. Architecture and generational evolution

Early descriptions present the DA as a custom CMOS chip tailored to fully connected QUBO instances of up to $N=1024$ binary variables, with biases stored to 26-bit fixed precision and couplers to 16-bit precision (Aramon et al., 2018). The same line of work emphasizes $N$ parallel flip engines, constant-time effective-field updates on fully connected graphs, and a global controller implementing annealing schedules and escape logic. A second-generation DA is reported to implement a fully connected QUBO engine with up to $8\,192$ binary variables and up to 64 bits of integer precision for biases and couplers (Cohen et al., 2020). Benchmarking work on practical use cases likewise characterizes the DA as a purpose-built CMOS ASIC that natively supports up to $8\,192$ binary “spins,” with all spin states and coupler weights stored on-chip in SRAM and a controller orchestrating temperature scheduling, bit-flip proposals, and replica exchange (Huang et al., 2022).

Later studies describe a broader family of DA systems rather than a single immutable device. DA 3.0 is reported to handle up to $100\,000$ binary variables and to provide built-in handling of one-hot and inequality constraints (Ayodele et al., 2022). A Max-Cut benchmark distinguishes DAv2, with native support for fully connected QUBO of up to $8\,192$ variables, from DAv3, which reaches $100\,000$ variables through a hybrid DAU + CPU “tabu” layer (Shaglel et al., 29 Jul 2025). A 2025 comparative study describes DA v4 as a massively parallel architecture built on CPU and GPU resources, with tightly coupled memory and arithmetic units, full connectivity among variables, QUBO + QC support, and API-level HOBO-to-QUBO conversion utilities (Upadhyay et al., 11 Sep 2025). By contrast, a 2026 transpilation study uses a Fujitsu Digital Annealer Gen4 with full connectivity and no minor-embedding, but reports a workflow-specific maximum number of variables per QUBO, reflecting the formulation and interface used there rather than a universal architectural ceiling (Watanabe et al., 12 May 2026).

This reported evolution suggests that “Digital Annealer” denotes a product line combining specialized digital hardware, firmware, and solver interfaces whose exposed capabilities depend on generation, API, and workload. A consistent theme across generations is native support for dense couplings and the avoidance of the sparse-topology embedding overhead associated with analog quantum annealers (Huang et al., 2022).

2. Optimization model and annealing dynamics

The DA is used to solve optimization problems cast in QUBO or Ising form. One formulation given in the literature is the binary objective

$$E(x) = \sum_{i,j} Q_{ij}\, x_i x_j + \sum_i b_i x_i, \qquad x \in \{0,1\}^N,$$

where the DA accepts a fully connected QUBO matrix $Q$ and linear bias vector $b$, then returns a binary vector that approximately minimizes the energy (Wei et al., 2021). The equivalent Ising form used in several analyses is

$$H(\sigma) = -\sum_{i<j} J_{ij}\, \sigma_i \sigma_j - \sum_i h_i \sigma_i, \qquad \sigma \in \{-1,+1\}^N,$$

with standard binary–spin conversion $x_i = (1+\sigma_i)/2$ between QUBO and Ising variables (Fukushima-Kimura et al., 2023).
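The binary–spin conversion can be made concrete with a short sketch. It assumes the conventions $E(x) = \sum_{i,j} Q_{ij} x_i x_j + \sum_i b_i x_i$ and $H(\sigma) = -\sum_{i<j} J_{ij}\sigma_i\sigma_j - \sum_i h_i\sigma_i$ with $x_i = (1+\sigma_i)/2$; sign and symmetry conventions differ between papers, and the function names below are illustrative rather than part of any DA interface.

```python
def qubo_energy(Q, b, x):
    """E(x) = sum_ij Q[i][j] x_i x_j + sum_i b[i] x_i for x in {0,1}^n."""
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def qubo_to_ising(Q, b):
    """Return (J, h, const) with E(x) = H(s) + const under x_i = (1 + s_i) / 2.

    Assumed convention: H(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i.
    """
    n = len(b)
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    const = 0.0
    for i in range(n):
        linear = Q[i][i] + b[i]          # x_i^2 = x_i, so the diagonal is linear
        h[i] -= linear / 2
        const += linear / 2
        for j in range(i + 1, n):
            w = Q[i][j] + Q[j][i]        # total coupling of the pair (i, j)
            J[i][j] = -w / 4
            h[i] -= w / 4
            h[j] -= w / 4
            const += w / 4
    return J, h, const


def ising_energy(J, h, s):
    """H(s) for spins s in {-1,+1}^n, using the upper triangle of J."""
    n = len(s)
    return (-sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
            - sum(h[i] * s[i] for i in range(n)))
```

For any assignment, `qubo_energy(Q, b, x)` should equal `ising_energy(J, h, s) + const` with `s = [2 * v - 1 for v in x]`, so minimizers of the two forms coincide.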

The algorithmic core is annealing-derived rather than quantum-mechanical. Early DA descriptions present a simulated-annealing extension in which all single-bit flips are proposed in parallel, their energy changes $\Delta E_i$ are computed simultaneously, and one accepted flip is chosen uniformly at random; if no flip is accepted, a global offset is increased to assist escape from local minima (Aramon et al., 2018). Later descriptions of surface-code decoding add replica exchange across temperature replicas, parallel-trial Metropolis evaluation, and user-adjustable replica counts, temperatures, and sweeps (Fujisaki et al., 2022). Benchmarking work on practical use cases likewise describes an enhanced simulated annealing based on Parallel Tempering, with neighboring temperature replicas swapped by the standard Metropolis criterion (Huang et al., 2022).
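A minimal software sketch of the parallel-trial step described above may help fix ideas. It is not Fujitsu's implementation: the linear inverse-temperature schedule, the offset increment, and the choice to reset the offset to zero after an accepted flip are assumptions made for illustration, and the hardware evaluates all trials concurrently rather than in a loop.

```python
import math
import random


def qubo_energy(Q, b, x):
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def delta_energies(Q, b, x):
    """Energy change of flipping each bit individually (one trial per bit)."""
    n = len(x)
    deltas = []
    for i in range(n):
        field = Q[i][i] + b[i] + sum((Q[i][j] + Q[j][i]) * x[j]
                                     for j in range(n) if j != i)
        deltas.append((1 - 2 * x[i]) * field)   # +field for 0->1, -field for 1->0
    return deltas


def parallel_trial_step(Q, b, x, beta, offset, rng, offset_increment=1.0):
    """Test every flip with Metropolis, apply one accepted flip at random.

    Returns the new offset: reset to 0 after a flip (assumption), otherwise
    increased so that uphill moves become easier on the next step.
    """
    deltas = delta_energies(Q, b, x)
    accepted = [i for i, d in enumerate(deltas)
                if rng.random() < math.exp(min(0.0, -beta * (d - offset)))]
    if accepted:
        x[rng.choice(accepted)] ^= 1
        return 0.0
    return offset + offset_increment


def anneal(Q, b, steps=2000, beta_start=0.1, beta_end=5.0, seed=0):
    """Run the step under a hypothetical linear schedule; return best state seen."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(len(b))]
    best_x, best_e = x[:], qubo_energy(Q, b, x)
    offset = 0.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(1, steps - 1)
        offset = parallel_trial_step(Q, b, x, beta, offset, rng)
        e = qubo_energy(Q, b, x)
        if e < best_e:
            best_x, best_e = x[:], e
    return best_x, best_e
```

Because every step evaluates all $N$ candidate flips, a move is found far more often than in single-site simulated annealing, which is the property the dedicated flip engines exploit.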

The mathematical analysis of the first-generation DA formalizes this procedure as a time-inhomogeneous Markov chain with transition kernel $P_\beta$ at inverse temperature $\beta$. For each fixed $\beta$, the chain is irreducible and aperiodic and therefore has a unique stationary distribution $\pi_\beta$, but in general $\pi_\beta \neq \mu_\beta$, where $\mu_\beta$ is the Gibbs–Boltzmann distribution; equality occurs only in degenerate cases with no pairwise interactions (Fukushima-Kimura et al., 2023). The same work establishes a necessary and sufficient condition for asymptotic convergence to the ground-state set $\mathrm{GS}$: $\sum_{t=1}^{\infty} e^{-\beta_t d^*} = \infty$, where $d^*$ is the depth of the deepest nonglobal local minimum. For a logarithmic schedule $\beta_t = \frac{1}{c}\log(t+1)$, this requires $c \ge d^*$ (Fukushima-Kimura et al., 2023).

A common misconception is therefore that the DA merely reproduces classical single-site simulated annealing in hardware. The formal Markov-chain analysis and the architectural descriptions indicate a more specific process: parallel-trial Metropolis evaluation, one-flip random selection among accepted candidates, and later-generation extensions such as dynamic offsets and replica exchange alter both the transition kernel and the stationary behavior relative to textbook single-spin simulated annealing (Fukushima-Kimura et al., 2023).
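The replica-exchange extension mentioned above can be sketched with the standard parallel-tempering swap rule, under which two replicas at inverse temperatures $\beta_i, \beta_j$ with energies $E_i, E_j$ exchange states with probability $\min\{1, \exp[(\beta_i - \beta_j)(E_i - E_j)]\}$. The function names and the adjacent-pair sweep order are illustrative assumptions, not details taken from the DA literature.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def exchange_sweep(states, energies, betas, rng):
    """Attempt one swap between each pair of neighboring temperatures, in place."""
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
```

A swap is always accepted when the colder replica currently holds the higher energy, so good configurations drift toward low temperatures while hot replicas keep exploring.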

3. Constraint handling and QUBO engineering

DA applications rely on converting constrained combinatorial problems into binary quadratic form. Reported interfaces vary by generation. Some studies use pure penalty embedding, adding large quadratic penalties for one-hot, balance, or inequality constraints (Kao et al., 2023). Others report built-in support for one-way one-hot and inequality constraints in the DA API, reducing the need for manual slack-variable expansion (Kao et al., 2023). A later comparative study describes a QUBO + QC mode in which quadratic constraints are handled separately rather than absorbed into a single penalty-augmented objective (Upadhyay et al., 11 Sep 2025).
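As an illustration of pure penalty embedding, a one-hot constraint over a group of bits can be added to a QUBO by expanding $\lambda\,(\sum_k x_k - 1)^2 = \lambda\,(-\sum_k x_k + 2\sum_{k<l} x_k x_l + 1)$, using $x_k^2 = x_k$. The sketch below assumes an upper-triangular coupling matrix and a hand-chosen penalty weight; both are illustrative choices rather than DA API behavior.

```python
def qubo_energy(Q, b, x):
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def add_one_hot_penalty(Q, b, indices, weight):
    """Add weight * (sum_{k in indices} x_k - 1)^2 to (Q, b) in place.

    Returns the constant term (equal to weight) that the QUBO form drops.
    """
    for pos, k in enumerate(indices):
        b[k] -= weight                              # from -2*x_k + x_k^2
        for l in indices[pos + 1:]:
            Q[min(k, l)][max(k, l)] += 2 * weight   # cross term 2*x_k*x_l
    return weight
```

The penalty is zero exactly when one bit in the group is set, and the weight has to be large enough relative to the objective coefficients that violating the constraint never pays off.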

| Construct | Representative form | Reported usage |
| --- | --- | --- |
| One-hot assignment | $\sum_k x_{i,k} = 1$ or penalty $\left(\sum_k x_{i,k} - 1\right)^2$ | graph partitioning, ABR, transpilation (Wei et al., 2021) |
| Balance / inequality | slack-variable square penalties | graph partitioning, rebuffering, QEC quadratization (Kao et al., 2023) |
| Higher-order reduction | auxiliary bits/spins for quartic terms | surface-code decoding under syndrome constraints (Fujisaki et al., 2022) |

The recurring pattern is direct binary encoding of discrete assignments, followed by quadratization of any higher-order terms. In graph partitioning, the objective combines a quadratic modularity or cut term with one-hot and balance penalties (Kao et al., 2023). In adaptive bitrate control, each segment–bitrate choice is binary, with quality maximization, quality-switch penalties, rebuffering constraints using slack variables, and an exact-one-bitrate constraint assembled into a single QUBO Hamiltonian (Wei et al., 2021). In surface-code decoding, quartic syndrome terms are converted to quadratic form with auxiliary variables and penalty terms so that the final problem matches the DA input format (Fujisaki et al., 2022).
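The slack-variable technique used for inequality constraints can be sketched as follows. An inequality $\sum_i a_i x_i \le c$ with nonnegative integer coefficients is turned into the equality $\sum_i a_i x_i + s = c$, where the slack $s \ge 0$ is written in binary, and the squared residual is then expanded into QUBO coefficients. The function name and the binary slack encoding are assumptions for illustration; the cited studies may use other encodings.

```python
def qubo_energy(Q, b, x):
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def inequality_penalty_qubo(a, c):
    """QUBO for (sum_i a[i]*x_i + sum_k 2^k*y_k - c)^2 over bits (x, y).

    Assumes nonnegative integers a[i] and c. Returns (Q, b, const, num_slack);
    the variable order is the original bits followed by the slack bits.
    """
    num_slack = c.bit_length()                       # slack can reach at least c
    v = list(a) + [1 << k for k in range(num_slack)]
    n = len(v)
    Q = [[0] * n for _ in range(n)]
    b = [v[i] * v[i] - 2 * c * v[i] for i in range(n)]   # z_i^2 = z_i
    for i in range(n):
        for j in range(i + 1, n):
            Q[i][j] = 2 * v[i] * v[j]
    return Q, b, c * c, num_slack
```

Minimizing over the slack bits gives zero penalty exactly for assignments that satisfy the inequality. The cost is extra variables and large coefficients, which is one reason built-in inequality handling is attractive.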

This engineering step is often the decisive modeling burden. Several studies note that full logical connectivity removes the need for minor embedding on sparse hardware, but does not remove the need for careful encoding, penalty selection, auxiliary-variable management, or coefficient scaling (Fujisaki et al., 2022). A plausible implication is that the DA’s effectiveness is coupled as much to formulation quality as to raw annealing throughput.
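Auxiliary-variable management can be illustrated with one common quadratization, the Rosenberg substitution. A product $x_1 x_2$ is replaced by a new bit $y$, and the penalty $x_1 x_2 - 2 x_1 y - 2 x_2 y + 3y$ is added; it vanishes exactly when $y = x_1 x_2$ and is at least 1 otherwise. The sketch reduces a cubic term, and applying it twice handles a quartic term. This is a textbook technique shown under the assumption that it resembles what the cited decoders do; the source does not specify their reduction.

```python
def rosenberg_penalty(x1, x2, y):
    """Zero iff y == x1 * x2; otherwise at least 1."""
    return x1 * x2 - 2 * x1 * y - 2 * x2 * y + 3 * y


def reduced_cubic(c, x1, x2, x3, y, strength=None):
    """Quadratic stand-in for c * x1 * x2 * x3 using the auxiliary bit y.

    The default strength 2*|c| + 1 is a safe but hypothetical choice: it makes
    every inconsistent value of y strictly worse than the consistent one.
    """
    if strength is None:
        strength = 2 * abs(c) + 1
    return c * y * x3 + strength * rosenberg_penalty(x1, x2, y)
```

Minimizing `reduced_cubic` over `y` recovers `c * x1 * x2 * x3` for every assignment, at the cost of one extra variable and a penalty weight that enlarges the coefficient range.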

4. Application mappings across domains

The DA has been applied to a wide range of QUBO-encodable problems, and the literature is notable for the diversity of formulations rather than for a single canonical workload.

In networked media systems, adaptive bitrate control has been formulated as a QUBO over segment–bitrate binary variables, with Hamiltonian terms for aggregate video quality, inter-segment quality switches, rebuffering avoidance via slack variables, and one-hot bitrate selection (Wei et al., 2021). On real-world throughput traces from Tram and Ferry scenarios, the study reports QoE for the QUBO-DA method alongside Pensieve, MPC, and BBA baselines (Wei et al., 2021).

In industrial control, automated guided vehicle coordination was formulated with binary variables indicating route assignments over a finite horizon, together with penalties enforcing one route per vehicle and collision avoidance on shared edges. In a 10-AGV example, the study reports average working rates over repeated simulations for the Fujitsu DA, D-Wave 2000Q, an exact Gurobi MIP solve, and the conventional rule-based method (Ohzeki et al., 2018).

Quantum error correction is one of the most technically detailed DA application areas. A DA decoder for the planar surface code maps syndrome consistency and error sparsity to Ising/QUBO form and reports a threshold very close to that of the MWPM decoder (Fujisaki et al., 2022). Under depolarizing noise, a related Ising-based study compares soft-constraint and hard-constraint mappings and reports thresholds for the DA decoder under each mapping alongside those for MWPM, CPU-SA, and CPLEX in that setting (Takeuchi et al., 2023). The same study reports average iteration counts for the soft and hard mappings, together with a per-instance DA runtime estimate under the stated architecture and replica configuration (Takeuchi et al., 2023).

Graph partitioning and community detection form another major cluster of applications. Modularity-based QUBO formulations have been run on networks ranging from Karate Club to large power-grid graphs. One study reports modularity on Zachary’s Karate Club and identifies communities in IEEE 33-bus and IEEE 118-bus power networks (Kao et al., 2023). A later study reports modularity values for weighted Karate Club, weighted Les Misérables, American Football, and Dolphin, and also partitions the Case 1354pegase power-grid network into communities (Kao et al., 2023).
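For readers unfamiliar with modularity-based formulations, the sketch below builds the standard modularity matrix $B_{ij} = A_{ij} - k_i k_j / (2m)$ for an unweighted graph and evaluates the objective under a one-hot encoding $x_{i,c}$ (node $i$ in community $c$). It illustrates the general construction only; the cited studies' exact objectives, weights, and penalty terms are not reproduced here, and the function names are illustrative.

```python
def modularity_matrix(n, edges):
    """Return (B, two_m) for an undirected, unweighted edge list."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1.0
        A[j][i] += 1.0
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    B = [[A[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
         for i in range(n)]
    return B, two_m


def modularity(B, two_m, labels):
    """Newman modularity of a partition given as one community label per node."""
    n = len(labels)
    return sum(B[i][j] for i in range(n) for j in range(n)
               if labels[i] == labels[j]) / two_m


def one_hot_objective(B, two_m, x):
    """Negative modularity as a quadratic form in one-hot bits x[i][c]."""
    n, k = len(x), len(x[0])
    return -sum(B[i][j] * x[i][c] * x[j][c]
                for c in range(k) for i in range(n) for j in range(n)) / two_m
```

Minimizing `one_hot_objective` subject to one-hot rows is equivalent to maximizing modularity, and in a pure-penalty formulation the one-hot rows would be enforced by additional quadratic terms.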

Near-term quantum compilation has also been cast in DA-compatible form. An “accuracy-first” transpilation framework uses the DA either only for global initial mapping (“Hybrid”) or for both mapping and iterative short-horizon routing (“Full DA”). Reported benchmarks show an average CNOT reduction versus Qiskit level 3 for the Hybrid strategy, while the Full DA approach outperforms ISAAQ on average, with its largest gains on structured circuits, but degrades on random or concentrated-connectivity circuits (Watanabe et al., 12 May 2026).

Machine-learning and scientific-inference uses are similarly heterogeneous. Consensus clustering has been encoded as pairwise-similarity and correlation-clustering QUBOs and solved on a second-generation DA, with DA-based models reported as best or tied-best on all seven datasets by consensus ARI for the correlation-clustering formulation (Cohen et al., 2020). Nonnegative/binary matrix factorization uses the DA to solve the binary subproblem in alternating updates; on Olivetti faces, the study reports final RMSE, average iteration counts, and classification accuracies for NBMF + DA and for classical NMF (Asaoka et al., 2020). In chemical reaction-condition optimization, a Digital Annealing Unit was used for QUBO-based search over combinatorial condition spaces; within a single annealing cycle, the DAU found candidate conditions, a number of which outperformed the best among the CPU-sampled conditions, for a Negishi-example search space (Li et al., 2024).

5. Empirical benchmark profile

Across broad benchmarks, the DA’s strongest empirical profile is on dense, highly interconnected, or heavily constrained binary quadratic problems, while its advantages are weaker on sparse problems or on formulations whose encodings are dominated by decomposition overhead.

Early physics-motivated benchmarking on spin glasses reports a time-to-solution speedup of roughly two orders of magnitude over single-core simulated annealing and parallel tempering for fully connected Sherrington–Kirkpatrick problems, but no speedup for sparse two-dimensional spin glasses (Aramon et al., 2018). A benchmarking study on practical use cases reaches a mixed conclusion: both D-Wave and the Fujitsu DA are effective on small size and simple settings, but lose utility on practical size and settings; decomposition extends scalability but remains far from practical use (Huang et al., 2022). This tension between excellent dense-QUBO behavior and formulation-sensitive practical scalability reappears in later studies.
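Time-to-solution comparisons of this kind usually rest on a simple formula. If a single run of duration $t_{\text{run}}$ finds the target solution with probability $p$, the expected time to reach it at least once with confidence $p_{\text{target}}$ is $\mathrm{TTS} = t_{\text{run}} \cdot \ln(1 - p_{\text{target}}) / \ln(1 - p)$. The sketch below assumes the common choice $p_{\text{target}} = 0.99$; whether each cited benchmark uses exactly this definition is not stated in the text above.

```python
import math


def time_to_solution(run_time, success_probability, target=0.99):
    """Time needed to see the target solution at least once with prob. `target`.

    Returns infinity if the solver never succeeds, and a single run time if it
    always succeeds.
    """
    if success_probability <= 0.0:
        return math.inf
    if success_probability >= 1.0:
        return run_time
    repeats = math.log(1.0 - target) / math.log(1.0 - success_probability)
    return run_time * repeats
```

A speedup of two orders of magnitude then means the ratio of two such TTS values, so it reflects both per-run time and per-run success probability.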

| Problem regime | Reported DA behavior | Representative study |
| --- | --- | --- |
| Fully connected spin glasses | roughly two orders of magnitude speedup over SA/PT on dense instances | (Aramon et al., 2018) |
| Max-Cut, large benchmark set | competitive with best heuristics on large instances | (Shaglel et al., 29 Jul 2025) |
| QAP / MKP / short-budget TSP | often better average objective than tuned GA | (Ayodele, 2022) |
| Sparse low-density CRN QUBOs | classical MIP/CP superior | (Upadhyay et al., 11 Sep 2025) |
| Dense codon-selection QUBOs | near-linear scaling; DA competitive with hybrid annealers | (Upadhyay et al., 11 Sep 2025) |

A large Max-Cut benchmark on MQLib instances reports win, tie, and loss counts for DAv2 and for DAv3 against the best-of-37 MQLib heuristics on medium–large instances (Shaglel et al., 29 Jul 2025). On the D-Wave hybrid solver comparison set, DAv3 was evaluated separately on integer-weight and float-weight instances, with most of its losses confined to a stated accuracy-ratio band (Shaglel et al., 29 Jul 2025).
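Max-Cut maps to QUBO without any penalty terms, which is part of why it is a natural benchmark for dense solvers. With $x_i \in \{0,1\}$ marking the side of node $i$, an edge $(i,j)$ of weight $w_{ij}$ is cut exactly when $x_i + x_j - 2 x_i x_j = 1$, so minimizing $-\sum_{(i,j)} w_{ij}(x_i + x_j - 2 x_i x_j)$ maximizes the cut. The sketch below uses illustrative function names and an upper-triangular coupling matrix.

```python
def qubo_energy(Q, b, x):
    n = len(x)
    return (sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))


def maxcut_qubo(n, edges):
    """Return (Q, b) whose energy equals minus the cut weight.

    `edges` is a list of (i, j, weight) tuples with i != j.
    """
    Q = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, w in edges:
        Q[min(i, j)][max(i, j)] += 2 * w   # +2*w*x_i*x_j
        b[i] -= w                          # -w*x_i
        b[j] -= w                          # -w*x_j
    return Q, b


def cut_value(edges, x):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for i, j, w in edges if x[i] != x[j])
```

Every edge contributes one coupler, so dense graphs produce dense QUBO matrices, which is the regime where full connectivity avoids embedding overhead.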

A direct comparison with a tuned genetic algorithm on QAP, MKP, and TSP reports the number of MKP instances solved to optimality by the DA and by GA at a fixed time budget, finds that the DA reached the optimum on more QAP instances within a short budget than GA did even with a longer one, and finds the DA uniformly better on TSP at the shorter time budgets, though GA closed the gap on some TSP instances at the longest budget (Ayodele, 2022). By contrast, a later comparative study on industrial applications reports that for reaction network pathway analysis, classical MIP/CP solvers solve the problem to optimality in reasonable time frames while the DA is not able to do so, whereas in mRNA codon selection the DA reaches the same average cost as CP-SAT and SCIP on standard and large proteins, albeit with higher average time-to-solution than CP-SAT (Upadhyay et al., 11 Sep 2025).

The aggregate benchmark picture is therefore not that the DA dominates all optimizers. Rather, the reported evidence suggests a regime-dependent performance profile: especially strong on dense QUBOs with significant pairwise structure, competitive on some large unconstrained graph problems, and less compelling when the underlying optimization is sparse, linear-cost dominated, or requires decomposition that erodes global structure.

6. Limitations, interpretation, and future directions

Several limitations recur across the literature. First, the DA is not a quantum annealer in the physical sense. It is a digital, CMOS-based Ising/QUBO solver, and one practical-use benchmark explicitly lists “No inherent quantum tunneling—very tall or wide energy barriers remain challenging” among its limitations (Huang et al., 2022). Second, the mathematical analysis shows that the stationary distribution of the first-generation DA generally differs from the Gibbs–Boltzmann distribution, so conventional equilibrium intuitions from classical simulated annealing must be applied with care (Fukushima-Kimura et al., 2023).

Third, performance is highly sensitive to formulation. Dense-graph partitioning work reports that the DA excels on dense graphs but loses ground on sparse graphs as 81928\,19212 or imbalance increases (Liu et al., 2022). The transpilation study identifies a trade-off between QUBO size and solution quality, with Full DA degrading on circuits with random or concentrated connectivity because a single anneal is insufficient for the enlarged routing search space (Watanabe et al., 12 May 2026). The Max-Cut benchmark notes that instances with very unbalanced floating-point weight ranges may suffer from rounding losses when mapped into the DAU’s integer format (Shaglel et al., 29 Jul 2025). Application papers also sometimes omit runtime or energy analyses; the ABR-control study, for example, reports QoE but not DA runtime, energy consumption, annealing schedule, or maximum tested QUBO size (Wei et al., 2021).
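The rounding issue with unbalanced floating-point weights can be seen in a small sketch. It scales weights so that the largest magnitude fills a signed integer range and then rounds; the 16-bit default and the scale-then-round scheme are assumptions for illustration, since the text above does not describe the DAU's actual conversion.

```python
def quantize_weights(weights, bits=16):
    """Scale weights to a signed `bits`-bit integer range and round.

    Returns (integer_weights, scale); the scheme and bit width are hypothetical.
    """
    limit = (1 << (bits - 1)) - 1
    largest = max(abs(w) for w in weights)
    scale = limit / largest if largest > 0 else 1.0
    return [round(w * scale) for w in weights], scale


def relative_errors(weights, integer_weights, scale):
    """Per-weight relative error after mapping the integers back to floats."""
    return [abs(q / scale - w) / abs(w) if w != 0 else 0.0
            for w, q in zip(weights, integer_weights)]
```

For weights such as `[1e-4, 0.5, 1000.0]`, the smallest weight rounds to zero and its edge effectively disappears from the problem, while the largest is represented exactly. Wider integer formats reduce this effect but do not remove it when the dynamic range is extreme.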

Fourth, decomposition remains a structural bottleneck. The practical-use benchmark concludes that decomposition methods extend scalability but are still far away from practical use in the settings tested (Huang et al., 2022). This suggests that native large-variable support does not eliminate the need for better partitioning, encoding compression, or hybrid optimization pipelines when problem structure exceeds the solver’s directly usable regime.

Reported future directions are correspondingly technical rather than generic. They include tuning annealing schedules and reducing QUBO dimensionality in adaptive bitrate optimization (Wei et al., 2021); larger DA generations, tighter temperature schedules, and integrated cryo-interfaces for real-time quantum-error-correction decoding (Fujisaki et al., 2022); adaptive 81928\,19213-step routing, multi-trial anneals, and noise-aware weight augmentation for DA-assisted transpilation (Watanabe et al., 12 May 2026); and smart encoding, smart decomposition, and error-mitigation strategies for practical large-scale QUBO workloads (Huang et al., 2022).

Taken together, the literature supports a specific interpretation of the Fujitsu Digital Annealer: it is a family of dense-QUBO optimization systems whose value lies in the conjunction of native full connectivity, hardware-accelerated parallel-trial updates, and annealing-derived global search. The empirical record is strongest where those properties align with the problem’s structure, and markedly less uniform where sparsity, awkward encodings, or decomposition dominate the effective computational cost.
