
QAISim: Quantum Annealing & QAI Toolkit

Updated 4 July 2026
  • QAISim is a term used in quantum research to denote both a fast simulated annealing method for Ising/QUBO optimization and a Python simulation toolkit for QAI-based resource management.
  • The annealing approach leverages a collapsed Trotter approximation to reduce computational cost and improve search efficiency on problems such as Max–Cut and QUBO.
  • The toolkit offers a gym-style environment with PQC-based reinforcement learning agents for dynamic scheduling in quantum-cloud platforms, enabling practical benchmarking and policy development.

QAISim is a name used in recent arXiv literature for two distinct quantum-computing research artifacts. In Murashima’s 2023 usage, QAISim denotes a Fast Simulated Annealing algorithm inspired by Quantum Monte Carlo (QMC) for binary quadratic optimization, including Max–Cut and QUBO formulations (Murashima, 2023). In a 2025 usage, QAISim denotes a Python-based toolkit for modeling and simulation of Quantum Artificial Intelligence models for resource management in quantum cloud computing environments, with quantum reinforcement learning implemented through parameterized quantum circuits (Singh et al., 1 Dec 2025). The shared label therefore does not identify a single canonical framework. A similarly named but unrelated system, AutoMRISimQA, concerns automated daily quality control of a 3T MRI simulator rather than quantum optimization or quantum-cloud scheduling (Xing et al., 2024).

1. Terminological scope and disambiguation

The literature currently uses “QAISim” in two non-overlapping senses. One sense is algorithmic and concerns approximate quantum-annealing-inspired search in classical optimization; the other is infrastructural and concerns simulation of AI-driven resource allocation in Quantum Computing as a Service environments. The overlap is nominal rather than methodological (Murashima, 2023; Singh et al., 1 Dec 2025).

| Name | Domain | Core function |
|---|---|---|
| QAISim (Murashima, 2023) | Ising/QUBO/Max–Cut optimization | Fast Simulated Annealing inspired by QMC |
| QAISim (Singh et al., 1 Dec 2025) | Quantum cloud computing environments | Toolkit for modeling and simulation of QAI models for resource management |
| AutoMRISimQA (Xing et al., 2024) | MRI simulator quality control | Automated daily QC of a 3T MRI simulator |

This disambiguation is important because the two QAISim works operate at different abstraction levels. The Murashima formulation begins from Suzuki–Trotter-based reasoning and compresses a path-integral viewpoint into a single-spin-vector annealing procedure. The toolkit formulation instead exposes a gym-style environment, QTask and QNode abstractions, and PQC-based reinforcement-learning agents for dynamic scheduling. A plausible implication is that citations to “QAISim” require inspection of the problem domain before any technical comparison is made.

2. QAISim as a QMC-inspired optimization method

In Murashima’s formulation, the target problem is a generic binary quadratic optimization over spin variables $x_i \in \{\pm 1\}$:

$$\arg\min_{x} E(x), \qquad E(x) = \sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i .$$
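The objective and its link to Max–Cut can be checked on a toy instance. What follows is a minimal NumPy sketch; the helper names are illustrative and are not from the paper. It assumes $J$ is symmetric with a zero diagonal. Because $J_{ij}$ enters $E(x)$ with a plus sign, setting $J = W$ (the edge weights) and $h = 0$ gives $\mathrm{cut}(x) = \big(\sum_{i<j} w_{ij} - E(x)\big)/2$, so minimizing $E$ maximizes the cut.

```python
import itertools
import numpy as np

def ising_energy(x: np.ndarray, J: np.ndarray, h: np.ndarray) -> float:
    """E(x) = sum_{i<j} J_ij x_i x_j + sum_i h_i x_i for x_i in {-1, +1}.

    J is assumed symmetric with zero diagonal, so the i<j sum is 0.5 * x^T J x.
    """
    return 0.5 * float(x @ J @ x) + float(h @ x)

def cut_value(x: np.ndarray, W: np.ndarray) -> float:
    """Total weight of edges whose endpoints lie on opposite sides of the cut."""
    # (1 - x_i x_j) / 2 is 1 for a cut edge and 0 otherwise; the full double
    # sum counts each pair twice, hence the factor 0.25.
    return 0.25 * float(np.sum(W * (1 - np.outer(x, x))))

def brute_force_min(J: np.ndarray, h: np.ndarray):
    """Exhaustive search over all 2^N spin vectors (tiny N only)."""
    best = None
    for bits in itertools.product((-1, 1), repeat=len(h)):
        x = np.array(bits)
        e = ising_energy(x, J, h)
        if best is None or e < best[0]:
            best = (e, x)
    return best
```

On an unweighted triangle, the minimum energy is $-1$ and the maximum cut is 2.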

In spin language, this is cast as a classical Ising Hamiltonian $H_d$. It is written below with the opposite sign convention for $J_{ij}$ and without the local fields $h_i$. The quantum-annealing construction adds a transverse-field term $H_o$, so that

$$H = H_d + H_o = -\sum_{i<j} J_{ij}\sigma_i^z \sigma_j^z - \Gamma \sum_i \sigma_i^x .$$

The derivation then uses the second-order Suzuki–Trotter formula with $M$ Trotter slices to obtain an effective classical model with periodic boundary conditions (Murashima, 2023).

The effective classical energy is

$$E_{\mathrm{eff}}(\{s\}) = -\frac{\beta}{M}\sum_{k,i<j} J_{ij} s_i^{(k)} s_j^{(k)} - K \sum_{k=1}^{M}\sum_{i=1}^{N} s_i^{(k)} s_i^{(k+1)},$$

with $s_i^{(M+1)} \equiv s_i^{(1)}$ and

$$K = -\frac{1}{2}\ln \tanh(\beta \Gamma / M),$$

A Metropolis update at slice $k$ accepts a single-spin flip with probability

$$p_{\mathrm{accept}} = \min[1,\exp(-\Delta E_{\mathrm{eff}})].$$
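The Trotterized model can be made concrete with a short NumPy sketch. It is a plain single-spin-flip sampler for $E_{\mathrm{eff}}$ as written above, not Murashima's code, and the function names are illustrative. It assumes the sign convention of $E_{\mathrm{eff}}$ (couplings enter with a minus sign), a symmetric $J$ with zero diagonal, and $M \ge 2$. For those inputs, the local energy change from flipping $s_i^{(k)}$ is $2 s_i^{(k)}\big[(\beta/M)\sum_j J_{ij} s_j^{(k)} + K\,(s_i^{(k-1)} + s_i^{(k+1)})\big]$.

```python
import numpy as np

def coupling_K(beta: float, gamma: float, M: int) -> float:
    """Inter-slice coupling K = -1/2 ln tanh(beta * Gamma / M); positive for Gamma > 0."""
    return float(-0.5 * np.log(np.tanh(beta * gamma / M)))

def effective_energy(S: np.ndarray, J: np.ndarray, beta: float, gamma: float) -> float:
    """E_eff for replicas S of shape (M, N); slice M+1 wraps around to slice 1.

    J is assumed symmetric with zero diagonal, so sum_{i<j} J_ij s_i s_j = 0.5 * s^T J s.
    """
    M = S.shape[0]
    K = coupling_K(beta, gamma, M)
    intra = sum(0.5 * float(S[k] @ J @ S[k]) for k in range(M))
    inter = float(np.sum(S * np.roll(S, -1, axis=0)))  # sum_k sum_i s_i^(k) s_i^(k+1)
    return -(beta / M) * intra - K * inter

def delta_eff(S, J, beta, gamma, k, i) -> float:
    """Change in E_eff if spin i of slice k is flipped (requires M >= 2)."""
    M = S.shape[0]
    K = coupling_K(beta, gamma, M)
    s = S[k, i]
    neighbours = S[(k - 1) % M, i] + S[(k + 1) % M, i]  # same spin in adjacent slices
    return float(2 * s * ((beta / M) * (J[i] @ S[k]) + K * neighbours))

def metropolis_step(S, J, beta, gamma, rng) -> bool:
    """Propose one random single-spin flip and accept it with min[1, exp(-dE_eff)]."""
    M, N = S.shape
    k, i = rng.integers(M), rng.integers(N)
    dE = delta_eff(S, J, beta, gamma, k, i)
    if dE <= 0 or rng.random() < np.exp(-dE):
        S[k, i] *= -1
        return True
    return False
```

Each `metropolis_step` touches all $M$ slices' worth of state only through its neighbours, but a full sweep still visits $M \times N$ spins. That factor is what the collapsed approximation removes.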

This construction places QAISim within the family of simulated quantum annealing approximations, but the distinctive move is not the Trotterized model itself. It is the subsequent approximation that collapses the $M$ layers into a single working configuration. The paper explicitly characterizes the new approach as advantageous in runtime, while also stating that it “isn’t rigorous mathematically” (Murashima, 2023). That limitation is central to interpreting the method: it is presented as a heuristic rather than as an exact reformulation of path-integral QMC.

3. Collapsed Trotter approximation and computational profile

The key approximation in QAISim is the claim that, in the zero-temperature limit, all $M$ replicas tend to the same spin configuration. This permits an approximate reordering of the minimization. The $M$ slices are replaced by a single working configuration, and the inter-slice coupling survives only as a coupling to the current best-so-far configuration, termed the “temporal minimum” in the summary (Murashima, 2023).

Once explicit slicing is dropped, each sweep updates a single collapsed spin vector $s = (s_1, \dots, s_N)$. For each proposed flip of spin $i$, the algorithm computes the change $\Delta E_i$ in the collapsed effective energy. It then accepts the flip with the Metropolis probability $\min[1, \exp(-\Delta E_i)]$. Temperature $T$ and transverse field $\Gamma$ are annealed jointly, for example by multiplying both by a common geometric factor. Meanwhile, the best-so-far state is updated whenever a lower classical energy is found.
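The sweep structure can be sketched as follows. This is a hypothetical reading, not Murashima's implementation, because the exact collapsed energy is not reproduced here. The sketch assumes a working energy $\beta E(s) - K \sum_i s_i s_i^{*}$, where $E$ is the objective defined at the start of this section and $s^{*}$ is the best-so-far configuration. It also assumes that $K$ uses the Trotter formula with a nominal $M$ and that $T$ and $\Gamma$ share one geometric factor `r`. All parameter defaults are illustrative.

```python
import numpy as np

def collapsed_anneal(J, h, T0=2.0, G0=2.0, M=16, r=0.95, sweeps=200,
                     cool_every=1, seed=0):
    """HYPOTHETICAL collapsed-Trotter annealer minimizing
    E(x) = sum_{i<j} J_ij x_i x_j + sum_i h_i x_i over x in {-1, +1}^N.

    Assumed working energy: beta * E(s) - K * sum_i s_i * best_i, with
    K = -1/2 ln tanh(beta * Gamma / M) acting as a "pull-to-best" term.
    J must be symmetric with zero diagonal.
    """
    rng = np.random.default_rng(seed)
    n = len(h)
    s = rng.choice(np.array([-1, 1], dtype=np.int8), size=n)
    field = J @ s + h                         # running local fields f_i = sum_j J_ij s_j + h_i
    energy = 0.5 * float(s @ J @ s) + float(h @ s)
    best, best_e = s.copy(), energy
    T, G = T0, G0
    for sweep in range(sweeps):
        beta = 1.0 / T
        K = -0.5 * np.log(np.tanh(beta * G / M))
        for i in rng.permutation(n):
            d_class = -2.0 * s[i] * field[i]             # change in E(x) if s_i flips
            d_eff = beta * d_class + 2.0 * K * s[i] * best[i]
            if d_eff <= 0 or rng.random() < np.exp(-d_eff):
                field += -2.0 * s[i] * J[:, i]          # O(N) dense update; O(deg) with CSR
                s[i] = -s[i]
                energy += d_class
                if energy < best_e:                      # track the "temporal minimum"
                    best, best_e = s.copy(), energy
        if (sweep + 1) % cool_every == 0:                # joint geometric annealing of T, Gamma
            T, G = r * T, r * G
    return best, best_e
```

Note that scaling $T$ and $\Gamma$ by the same factor keeps $\beta\Gamma$ and therefore $K$ fixed. The pull-to-best term then loses weight relative to $\beta E$ as $\beta$ grows. Other schedules would change that balance.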

The computational consequence is explicit. Conventional path-integral QMC must update all $M$ slices, so each sweep costs $M$ times as much as a single-configuration sweep. $M$ is commonly chosen large to control Trotter discretization error. QAISim eliminates the layer index and reduces the sweep cost to that of a single configuration. The summary states that the number of sweeps and temperatures needed is empirically comparable to simulated quantum annealing (SQA), so QAISim is faster by roughly a factor of $M$ (Murashima, 2023).

The implementation guidance is correspondingly low-level. The summary recommends three measures:

- storing the couplings $J_{ij}$ in a sparse adjacency list or CSR format;
- maintaining the spin vector as int8 (or similarly compact) values;
- keeping a running array of local fields, so that the energy change of each proposed flip is updated incrementally rather than recomputed.

It also recommends graph coloring for CPU parallelization, and thread-level mapping with red–black ordering on GPU or FPGA platforms. The “pull-to-best” term reads only the best-so-far configuration, so it is described as inherently parallel-safe.
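Sparse storage, incremental local fields and coloring fit together as shown below. This is a minimal pure-NumPy sketch, not code from the paper; CSR is built by hand to avoid a SciPy dependency. A greedy coloring partitions the spins so that no two spins of one color share an edge, which means a whole color class can be updated in parallel. On a bipartite lattice this yields the red–black (two-color) ordering.

```python
import numpy as np

def to_csr(n, edges):
    """Build symmetric CSR arrays (indptr, indices, data) from (i, j, w) edge triples."""
    rows = [[] for _ in range(n)]
    for i, j, w in edges:
        rows[i].append((j, w))
        rows[j].append((i, w))
    indptr = np.zeros(n + 1, dtype=np.int64)
    indices, data = [], []
    for i, row in enumerate(rows):
        for j, w in sorted(row):
            indices.append(j)
            data.append(w)
        indptr[i + 1] = len(indices)
    return indptr, np.array(indices, dtype=np.int64), np.array(data, dtype=float)

def greedy_coloring(indptr, indices):
    """Give each vertex the smallest color not used by its already-colored neighbours."""
    n = len(indptr) - 1
    color = -np.ones(n, dtype=np.int64)
    for v in range(n):
        used = {int(color[u]) for u in indices[indptr[v]:indptr[v + 1]] if color[u] >= 0}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def flip_and_update(i, s, field, indptr, indices, data):
    """Flip spin i and update the neighbours' local fields in O(deg(i)) time."""
    for p in range(indptr[i], indptr[i + 1]):
        field[indices[p]] -= 2.0 * data[p] * s[i]   # uses the pre-flip value of s_i
    s[i] = -s[i]
```

On a 4-cycle the greedy pass produces two colors, and a flip keeps the running fields equal to a fresh dense recomputation.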

4. Benchmarks, tuning practice, and limitations of the optimization QAISim

Murashima evaluated the method on Max–Cut instances from GSET with 800–2000 nodes. The reported hyperparameter regime fixed the initial annealing parameters, used different sweep budgets for different instance settings, and applied a geometric cooling step at a fixed sweep interval (Murashima, 2023).

The reported results, over 50 trials per instance, were instance-specific. For G9 ($N = 800$, best known cut 2054), QAISim reached 2054 in 100% of runs, whereas SA reached it in approximately 55%. For G34 ($N = 2000$, best known cut 1384), QAISim reached 1384 in 2 of 50 runs, while SA never reached it. This suggests that the collapsed-layer heuristic can materially improve search quality over plain simulated annealing on some large instances. Success nonetheless remains sensitive to instance structure and schedule design.

The tuning guidance is explicit. The initial temperature should be high enough to comfortably accept uphill moves, and the cooling factor should be chosen within a bounded interval. The acceptance ratio should be monitored, with a target of about 30–50% early and about 1–5% at the end. The summary also lists several failure modes:

- if $\Gamma$ becomes too small relative to the temperature, then $\tanh(\beta\Gamma/M) \to 0$ and $K \to \infty$, freezing all spins to the best-so-far configuration;
- overcooling causes freeze-out before a global optimum is found;
- undercooling spends too much time at high temperature;
- large coefficient values motivate 64-bit floating-point evaluation of the energy terms;
- widely varying $J_{ij}$ magnitudes may require normalization or adaptive temperature scales (Murashima, 2023).
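The temperature and acceptance-ratio guidance can be turned into two small helpers. The first is a common simulated-annealing heuristic, not taken from the paper: choose $T_0$ so that an average uphill move is accepted with a target probability, that is, $\exp(-\overline{\Delta E^{+}}/T_0) = p_0$. The second tracks the acceptance ratio over a sliding window against the 30–50% (early) and 1–5% (late) bands quoted above. Names and the window size are illustrative.

```python
import math
from collections import deque

def initial_temperature(uphill_deltas, target_accept=0.5):
    """Heuristic T0 with exp(-mean(dE+) / T0) = target_accept (assumption, not from the source)."""
    mean_de = sum(uphill_deltas) / len(uphill_deltas)
    return -mean_de / math.log(target_accept)

class AcceptanceMonitor:
    """Sliding-window acceptance ratio for checking the early and late target bands."""

    BANDS = {"early": (0.30, 0.50), "late": (0.01, 0.05)}

    def __init__(self, window: int = 1000):
        self.events = deque(maxlen=window)

    def record(self, accepted: bool) -> None:
        self.events.append(bool(accepted))

    @property
    def ratio(self) -> float:
        return sum(self.events) / len(self.events) if self.events else float("nan")

    def phase_ok(self, phase: str) -> bool:
        lo, hi = self.BANDS[phase]
        return lo <= self.ratio <= hi
```

In practice the uphill deltas would come from a short trial run of random flips at the start of annealing.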

5. QAISim as a toolkit for QAI-driven resource management in quantum clouds

In the 2025 work, QAISim is a Python-based simulation framework for modeling and evaluating quantum-reinforcement-learning-driven resource-management policies in Quantum Computing as a Service platforms, with emphasis on large-scale IoT applications (Singh et al., 1 Dec 2025). Its stated objectives are to provide a flexible, gym-style environment in which quantum tasks can be generated, queued, and dispatched to simulated quantum processing units, and to support the design, training, and benchmarking of resource-allocation policies implemented via parameterized quantum circuits.

The architecture is divided into three main modules:

- The Environment Simulator implements the Gymnasium interface (the successor to OpenAI Gym) through reset, step, and render, while maintaining a queue of QTasks and a registry of QNodes.
- The Quantum Agent Interface wraps TensorFlow Quantum and Cirq to define PQC-based policies or Q-value networks. It exposes a QuantumAgent base class with derived PolicyGradientAgent and DeepQLearningAgent implementations.
- The Resource Manager, or Broker, matches pending QTasks to selected QNodes and computes each task's execution time on its assigned QNode.
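The broker's matching step can be sketched as below. The execution-time model here (circuit layers × shots ÷ CLOPS) is a hypothetical proxy chosen for illustration, not the formula used by QAISim. The `shots` field, the `free_at` queue model, and every name in the block are likewise assumptions; none of them come from the qaisim package.

```python
from dataclasses import dataclass

@dataclass
class QTask:
    task_id: int
    arrival: float        # arrival time in seconds
    qubits: int           # qubit count
    layers: int           # circuit layers
    shots: int = 1024     # hypothetical field

@dataclass
class QNode:
    name: str
    qubits: int           # device capacity
    clops: float          # circuit layer operations per second
    free_at: float = 0.0  # time at which this node's queue drains

def execution_time(task: QTask, node: QNode) -> float:
    """HYPOTHETICAL proxy: layers * shots / CLOPS (not QAISim's published formula)."""
    return task.layers * task.shots / node.clops

def dispatch(task: QTask, node: QNode) -> dict:
    """Hypothetical broker step: queue the task on `node` (FIFO) and report its timing."""
    if task.qubits > node.qubits:
        raise ValueError(f"{node.name} has only {node.qubits} qubits")
    start = max(task.arrival, node.free_at)
    finish = start + execution_time(task, node)
    node.free_at = finish
    return {"wait": start - task.arrival, "finish": finish}
```

A task that arrives while the node is still busy waits until `free_at`. That waiting time is the kind of quantity a scheduling policy would try to reduce.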

The scheduling problem is formalized as a Markov Decision Process (MDP). The state $s_t$ includes the current queue length of each QNode and the feature vector of the arriving QTask: its arrival time, qubit count, and number of circuit layers. The action space is the discrete set of available QNodes, with each action assigning the current task to one QNode. After each assignment the environment returns a scalar reward $r_t$. The policy $\pi_\theta$ is trained to maximize the expected discounted return

$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t} \gamma^{t} r_t\right].$$
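The objective can be estimated from sampled episodes. This is a minimal sketch assuming each episode's rewards are available as a list; the helper names are illustrative and are not part of QAISim. The backward recursion also produces the per-step returns-to-go $G_t = \sum_{k \ge t} \gamma^{k-t} r_k$, which REINFORCE uses to weight $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """sum_t gamma^t r_t for a single episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def returns_to_go(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards in O(T)."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def estimate_objective(episodes, gamma):
    """Monte Carlo estimate of J(theta): mean discounted return over sampled episodes."""
    return float(np.mean([discounted_return(ep, gamma) for ep in episodes]))
```

For rewards `[1, 1, 1]` and $\gamma = 0.5$ the discounted return is 1.75, and it matches `returns_to_go(...)[0]`.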

This framework positions QAISim not as a solver for a fixed combinatorial objective, but as a simulator for online decision policies under queueing, execution-time, and hardware-capability constraints. A plausible implication is that its primary unit of analysis is policy quality under workload variation rather than asymptotic optimization quality on a static Ising instance.

6. QRL implementations, software organization, and empirical behavior of the toolkit

The toolkit implements two QRL methods, Policy Gradient and Deep Q-Learning, both using the “data-reuploading” PQC ansatz of Jerbi et al. (2021) (Singh et al., 1 Dec 2025). The circuit alternates encoding layers, which rotate each qubit by input-scaled angles, with variational layers built from $R_x$, $R_y$, and $R_z$ rotations plus CZ entangling gates. Expectation values of weighted Pauli-$Z$ observables, measured at the end of the circuit, yield either a policy distribution or a Q-value estimate.
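The circuit structure can be illustrated without Cirq or TensorFlow Quantum by simulating the state vector directly in NumPy. This is a simplified sketch of a data re-uploading circuit, not QAISim's implementation. It assumes one $R_x(\lambda x)$ encoding per qubit per layer, a CZ chain for entanglement, and one weighted $\langle Z_q \rangle$ per action feeding a softmax policy. Shapes and names are illustrative.

```python
import numpy as np

def rot(axis: str, theta: float) -> np.ndarray:
    """Single-qubit rotation exp(-i * theta * P / 2) for P in {X, Y, Z}."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    if axis == "x":
        return np.array([[c, -1j * s], [-1j * s, c]])
    if axis == "y":
        return np.array([[c, -s], [s, c]], dtype=complex)
    return np.array([[c - 1j * s, 0], [0, c + 1j * s]])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = np.tensordot(gate, state.reshape((2,) * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cz_chain(state, n):
    """CZ on neighbouring pairs, plus a wrap-around pair when n > 2."""
    psi = state.reshape((2,) * n).copy()
    pairs = [(q, q + 1) for q in range(n - 1)] + ([(n - 1, 0)] if n > 2 else [])
    for a, b in pairs:
        idx = [slice(None)] * n
        idx[a], idx[b] = 1, 1
        psi[tuple(idx)] *= -1                 # phase -1 where both qubits are |1>
    return psi.reshape(-1)

def pqc_z_expectations(x, thetas, lambdas):
    """Per layer: encode Rx(lambda * x) on each qubit, then trainable Rx, Ry, Rz and a
    CZ chain. thetas: (L, n, 3); lambdas (input scalings): (L, n). Returns <Z_q>."""
    n_layers, n = lambdas.shape
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for l in range(n_layers):
        for q in range(n):
            state = apply_1q(state, rot("x", lambdas[l, q] * x[q]), q, n)
        for q in range(n):
            for k, axis in enumerate("xyz"):
                state = apply_1q(state, rot(axis, thetas[l, q, k]), q, n)
        state = apply_cz_chain(state, n)
    probs = (np.abs(state) ** 2).reshape((2,) * n)
    return np.array([probs.take(0, axis=q).sum() - probs.take(1, axis=q).sum()
                     for q in range(n)])

def softmax_policy(z_expvals, weights):
    """pi(a|s) proportional to exp(w_a * <Z_a>): one weighted observable per action."""
    logits = weights * z_expvals
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

With one qubit, zero variational angles and $\lambda = 1$, the circuit reduces to $R_x(x)\lvert 0\rangle$, so $\langle Z\rangle = \cos x$. This gives a quick correctness check.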

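For policy gradient, the policy is a softmax over the observable expectation values:

$$\pi_\theta(a \mid s) = \frac{\exp\langle O_a \rangle_{s,\theta}}{\sum_{a'} \exp\langle O_{a'} \rangle_{s,\theta}},$$

where $O_a$ is the weighted Pauli-$Z$ observable associated with action $a$. REINFORCE is applied using episode returns and an Adam optimizer. For Deep Q-Learning, the action-value estimate is read directly from the observables, $Q_\theta(s,a) = \langle O_a \rangle_{s,\theta}$. Training uses the Bellman target

$$y = r + \gamma \max_{a'} Q_{\theta^-}(s', a'),$$

with experience replay and periodic synchronization of the target-network parameters $\theta^-$.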
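The Deep Q-Learning bookkeeping of replay, target computation and periodic target synchronization is independent of whether $Q$ comes from a PQC or an MLP. The minimal sketch below uses NumPy arrays for parameters. It assumes terminal transitions are not bootstrapped, which is standard DQN practice, and its class and function names are illustrative rather than QAISim API.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size: int, rng=random):
        return rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

def bellman_targets(rewards, next_q_target, dones, gamma):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * np.max(next_q_target, axis=1)

def maybe_sync(step, sync_every, online_params, target_params):
    """Periodic hard update: copy online parameters into the target network in place."""
    if step % sync_every == 0:
        target_params[:] = online_params
```

Keeping the target parameters frozen between syncs stops the regression target from moving at every gradient step.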

The software organization is explicit. Core classes include qaisim.core.Broker, qaisim.core.QNode, qaisim.core.QTask, and qaisim.env.QaiEnv. The quantum-agent layer exposes qaisim.qai.PolicyGradientAgent and qaisim.qai.DeepQLearningAgent, each with methods such as select_action(state), store_transition(s,a,r,s′), and learn(). A 5-layer data-reuploading PQC contains 181 trainable parameters, while a comparable 3-layer MLP classical DRL agent with 64-unit hidden layers has 9,221 parameters. The summary identifies this as a model-complexity advantage of more than 95% fewer trainable parameters (181 is about 2% of 9,221).
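The parameter comparison is simple arithmetic. The helper below counts weights and biases of a fully connected network and computes the relative reduction. The layer widths in the usage line are hypothetical, since the exact input and output sizes of the classical agent are not stated here.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

def reduction(pqc_params: int, mlp_params: int) -> float:
    """Fraction of parameters saved by the smaller model."""
    return 1.0 - pqc_params / mlp_params

# Hypothetical widths: 8 inputs, two 64-unit hidden layers, 5 actions.
example_count = mlp_param_count([8, 64, 64, 5])
saving = reduction(181, 9221)    # about 0.980, i.e. roughly 98% fewer parameters
```

Applied to the reported counts, the saving is about 98%, consistent with the "more than 95%" characterization.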

The reported evaluation used the following setup:

- Cirq and TensorFlow Quantum within a Gymnasium environment;
- 100 randomly sampled circuits from the MQT Bench library, with 2–50 qubits and depths up to 17,598;
- five simulated IBM devices (Marrakesh, Torino, Quebec, Brisbane, and Kolkata) with realistic CLOPS and EPLG metrics.

Baselines were a greedy assignment rule and classical DRL agents from QSimPy. Both Policy Gradient and Deep Q-Learning converged within approximately 500–800 episodes to reward levels comparable to those of the classical agents. Under noisy simulation with amplitude-damping and depolarizing noise, performance degraded gracefully and still exceeded the greedy baseline. Average per-episode cumulative returns were reported as 44.04 for QAISim Policy Gradient versus 40.06 for classical Policy Gradient, and 40.04 for QAISim Deep Q-Learning versus 45.36 for classical Deep Q-Learning. These came alongside gains over greedy scheduling of 50% and 57%, respectively (Singh et al., 1 Dec 2025).

The extension guidance is also concrete. New allocation policies are created by subclassing QuantumAgent and overriding _build_pqc(). Hyperparameters include the learning rates, the discount factor $\gamma$, and PQC depth. For larger IoT deployments, the summary recommends:

- increasing the PQC qubit count or using factorized observables to keep measurement overhead low;
- starting with 2–3 encoding–variational layers;
- applying gradient-norm clipping in noisy environments;
- using large replay buffers for stable Deep Q-Learning performance.
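The subclass-and-override pattern, together with gradient-norm clipping, can be sketched as follows. `QuantumAgentSketch` and `ShallowAgent` are hypothetical stand-ins that only mirror the described method names. They do not import or reproduce the real qaisim classes. The update is a plain gradient-descent step on a loss, with Adam omitted for brevity.

```python
from abc import ABC, abstractmethod
import numpy as np

class QuantumAgentSketch(ABC):
    """HYPOTHETICAL stand-in for a QuantumAgent-style base class (not the qaisim API)."""

    def __init__(self, n_qubits, n_layers, lr=1e-2, clip_norm=1.0, seed=0):
        self.n_qubits, self.n_layers = n_qubits, n_layers
        self.lr, self.clip_norm = lr, clip_norm
        self.rng = np.random.default_rng(seed)
        self.params = self._build_pqc()   # subclasses decide the circuit's parameters
        self.transitions = []

    @abstractmethod
    def _build_pqc(self) -> np.ndarray:
        """Return the initial trainable parameters of the circuit."""

    def store_transition(self, s, a, r, s_next):
        self.transitions.append((s, a, r, s_next))

    def apply_gradient(self, grad):
        """Clip the gradient to a maximum global norm, then take a descent step."""
        norm = np.linalg.norm(grad)
        if norm > self.clip_norm:
            grad = grad * (self.clip_norm / norm)
        self.params = self.params - self.lr * grad
        return grad

class ShallowAgent(QuantumAgentSketch):
    """Custom policy: n_layers variational layers with 3 rotation angles per qubit."""

    def _build_pqc(self):
        return self.rng.uniform(0, 2 * np.pi, size=(self.n_layers, self.n_qubits, 3))
```

Following the depth advice above, a shallow agent would be created as `ShallowAgent(n_qubits=4, n_layers=2)`. Clipping caps the update size when noisy circuit evaluations produce occasional large gradients.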

The two QAISim lines of work therefore share a name but not a common technical substrate. One is a QMC-inspired annealing heuristic centered on a collapsed Trotter approximation and best-so-far coupling; the other is a simulation and benchmarking environment for PQC-based reinforcement learning in quantum-cloud resource management. Any technical discussion of QAISim is incomplete unless this naming collision is made explicit.
