QAISim: Quantum Annealing & QAI Toolkit

Updated 4 July 2026

QAISim is a term used in quantum research to denote both a fast simulated annealing method for Ising/QUBO optimization and a Python simulation toolkit for QAI-based resource management.
The annealing approach leverages a collapsed Trotter approximation to streamline computational cost and improve search efficiency on problems like MAX–CUT and QUBO.
The toolkit offers a gym-style environment with PQC-based reinforcement learning agents for dynamic scheduling in quantum-cloud platforms, enabling practical benchmarking and policy development.

QAISim is a name used in recent arXiv literature for two distinct quantum-computing research artifacts. In Murashima’s 2023 usage, QAISim denotes a Fast Simulated Annealing algorithm inspired by Quantum Monte Carlo (QMC) for binary quadratic optimization, including Max–Cut and QUBO formulations (Murashima, 2023). In a 2025 usage, QAISim denotes a Python-based toolkit for modeling and simulation of Quantum Artificial Intelligence models for resource management in quantum cloud computing environments, with quantum reinforcement learning implemented through parameterized quantum circuits (Singh et al., 1 Dec 2025). The shared label therefore does not identify a single canonical framework. A similarly named but unrelated system, AutoMRISimQA, concerns automated daily quality control of a 3T MRI simulator rather than quantum optimization or quantum-cloud scheduling (Xing et al., 2024).

1. Terminological scope and disambiguation

The literature currently uses “QAISim” in two non-overlapping senses. One sense is algorithmic and concerns approximate quantum-annealing-inspired search in classical optimization; the other is infrastructural and concerns simulation of AI-driven resource allocation in Quantum Computing as a Service environments. The overlap is nominal rather than methodological (Murashima, 2023; Singh et al., 1 Dec 2025).

Name	Domain	Core function
QAISim	Ising/QUBO/Max–Cut optimization	Fast Simulated Annealing inspired by QMC
QAISim	Quantum cloud computing environments	Toolkit for modeling and simulation of QAI models for resource management
AutoMRISimQA	MRI simulator quality control	Automated daily QC of a 3T MRI simulator

This disambiguation is important because the two QAISim works operate at different abstraction levels. The Murashima formulation begins from Suzuki–Trotter-based reasoning and compresses a path-integral viewpoint into a single-spin-vector annealing procedure. The toolkit formulation instead exposes a gym-style environment, QTask and QNode abstractions, and PQC-based reinforcement-learning agents for dynamic scheduling. A plausible implication is that citations to “QAISim” require inspection of the problem domain before any technical comparison is made.

2. QAISim as a QMC-inspired optimization method

In Murashima’s formulation, the target problem is a generic binary quadratic optimization over binary variables $x_i \in \{\pm 1\}$, written as

$\arg\min_x E(x), \qquad E(x) = \sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i .$

In spin language, this is cast as a classical Ising Hamiltonian $H_d$, and the quantum-annealing construction introduces a transverse-field term so that

$H = H_d + H_o = -\sum_{i<j} J_{ij}\sigma_i^z \sigma_j^z - \Gamma \sum_i \sigma_i^x .$

The derivation then uses the second-order Suzuki–Trotter formula with $M$ Trotter slices to obtain an effective classical model with periodic boundary conditions (Murashima, 2023).

The effective classical energy is

$E_{\mathrm{eff}}(\{s\}) = -\frac{\beta}{M}\sum_{k,i<j} J_{ij} s_i^{(k)} s_j^{(k)} - K \sum_{k=1}^{M}\sum_{i=1}^{N} s_i^{(k)} s_i^{(k+1)},$

with

$K = -\frac{1}{2}\ln \tanh(\beta \Gamma / M),$

and a Metropolis update at slice $k$ accepts a flip with

$p_{\mathrm{accept}} = \min[1,\exp(-\Delta E_{\mathrm{eff}})].$

This construction places QAISim within the family of simulated quantum annealing approximations, but the distinctive move is not the Trotterized model itself. It is the subsequent approximation that collapses the $M$ layers into a single working configuration. The paper explicitly characterizes the new approach as advantageous in runtime, while also stating that it “isn’t rigorous mathematically” (Murashima, 2023). That limitation is central to interpreting the method: it is presented as a heuristic rather than as an exact reformulation of path-integral QMC.
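The effective energy and the Metropolis rule above can be sketched in a few lines of Python. This is a minimal illustration of conventional Trotter-sliced simulated quantum annealing, not code from the paper; the function names, the dictionary layout for $J_{ij}$, and the list-of-lists replica storage are assumptions made for the example.

```python
import math
import random


def trotter_coupling(beta, gamma, M):
    """Inter-slice coupling K = -(1/2) ln tanh(beta * Gamma / M)."""
    return -0.5 * math.log(math.tanh(beta * gamma / M))


def e_eff(spins, J, beta, gamma):
    """Effective energy of M replicas; spins[k][i] is +/-1, J maps (i, j) -> J_ij."""
    M, N = len(spins), len(spins[0])
    K = trotter_coupling(beta, gamma, M)
    intra = sum(Jij * s[a] * s[b] for s in spins for (a, b), Jij in J.items())
    # Periodic boundary conditions along the Trotter direction.
    inter = sum(spins[k][i] * spins[(k + 1) % M][i]
                for k in range(M) for i in range(N))
    return -(beta / M) * intra - K * inter


def delta_e_eff(spins, J, beta, gamma, k, i):
    """Change in E_eff if spin i of slice k were flipped (needs M >= 2)."""
    M = len(spins)
    s = spins[k][i]
    # Field on spin i from the classical couplings inside slice k.
    field = sum(Jij * spins[k][b if a == i else a]
                for (a, b), Jij in J.items() if i in (a, b))
    # The same site in the two neighbouring slices.
    neighbours = spins[(k - 1) % M][i] + spins[(k + 1) % M][i]
    K = trotter_coupling(beta, gamma, M)
    return 2.0 * s * ((beta / M) * field + K * neighbours)


def metropolis_step(spins, J, beta, gamma, rng):
    """Propose one random single-spin flip; accept with min(1, exp(-dE))."""
    M, N = len(spins), len(spins[0])
    k, i = rng.randrange(M), rng.randrange(N)
    dE = delta_e_eff(spins, J, beta, gamma, k, i)
    if dE <= 0 or rng.random() < math.exp(-dE):
        spins[k][i] = -spins[k][i]
        return True
    return False
```

Each proposal touches one spin in one of the $M$ slices, so a full sweep visits $M$ times as many spins as a single-layer method. That is the cost the collapsed approximation is meant to remove.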

3. Collapsed Trotter approximation and computational profile

The key approximation in QAISim is the claim that, in the zero-temperature limit, all $M$ replicas tend to the same spin configuration. This permits an approximate reordering of the minimization in which the replica stack is replaced by a single configuration coupled to the current best-so-far configuration, termed the “temporal minimum” in the summary (Murashima, 2023).

Once explicit slicing is dropped, each sweep updates a single collapsed spin vector $s$. For each candidate flip, the algorithm computes an effective energy change $\Delta E$ and uses it in a Metropolis acceptance step. Temperature $T$ and transverse field $\Gamma$ are annealed jointly, for example with a geometric factor, while the best-so-far state is updated whenever a lower classical energy is found.
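A single-vector annealing loop of this kind might look like the sketch below. It is not Murashima's algorithm as published: the form of the pull-to-best term (an extra energy $-K x_i x_i^{\text{best}}$ with $K = -\tfrac{1}{2}\ln\tanh(\beta\Gamma)$), the separate geometric factors `alpha` and `gamma_alpha`, and all default values are assumptions chosen only to make the idea concrete. The classical energy follows the $E(x)$ convention stated earlier.

```python
import math
import random


def classical_energy(x, J, h):
    """E(x) = sum_{i<j} J_ij x_i x_j + sum_i h_i x_i."""
    return (sum(Jij * x[a] * x[b] for (a, b), Jij in J.items())
            + sum(hi * xi for hi, xi in zip(h, x)))


def flip_delta(x, J, h, i):
    """Classical energy change from flipping x_i."""
    field = h[i] + sum(Jij * x[b if a == i else a]
                       for (a, b), Jij in J.items() if i in (a, b))
    return -2.0 * x[i] * field


def collapsed_anneal(J, h, n, sweeps=200, T0=2.0, gamma0=2.0,
                     alpha=0.97, gamma_alpha=0.94, seed=0):
    """Anneal one spin vector with an ASSUMED coupling to the best-so-far state."""
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    e = classical_energy(x, J, h)
    best, best_e = x[:], e
    T, gamma = T0, gamma0
    for _ in range(sweeps):
        beta = 1.0 / T
        # Assumed single-layer analogue of the Trotter coupling.
        K = -0.5 * math.log(math.tanh(beta * gamma))
        for i in range(n):
            d_cl = flip_delta(x, J, h, i)
            # Assumed pull-to-best contribution from the term -K * x_i * best_i.
            d_eff = beta * d_cl + 2.0 * K * x[i] * best[i]
            if d_eff <= 0 or rng.random() < math.exp(-d_eff):
                x[i] = -x[i]
                e += d_cl
                if e < best_e:          # track the lowest classical energy seen
                    best, best_e = x[:], e
        T *= alpha                      # geometric cooling
        gamma *= gamma_alpha            # geometric decay of the field
    return best, best_e


# Hypothetical 4-spin ring used only as a smoke test.
ring = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
state, energy = collapsed_anneal(ring, [0.0] * 4, 4, sweeps=50)
```

The inner loop visits each of the $N$ spins once per sweep and never indexes a replica, which is where the saving over the sliced version comes from.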

The computational consequence is explicit. Conventional path-integral QMC updates all $M$ Trotter slices in each sweep, with $M$ commonly chosen large to control discretization error, whereas QAISim updates only one spin vector per sweep by eliminating the layer index. The summary states that the number of sweeps and temperatures needed is empirically comparable to SQA, so QAISim is faster by roughly a factor of $M$ (Murashima, 2023).

The implementation guidance is correspondingly low-level. The summary recommends storing $J_{ij}$ in a sparse adjacency list or CSR format, maintaining the spin vector as int8 or $\pm 1$ bytes, and keeping a running per-spin array so that $\Delta E$ updates stay cheap. It also recommends graph coloring for CPU parallelization and thread-level mapping with red–black ordering on GPU or FPGA platforms. The “pull-to-best” term reads only the stored best-so-far configuration, so it is described as inherently parallel-safe.
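The storage advice can be illustrated with a small standard-library sketch: couplings in CSR arrays, spins in a signed-byte array, and a running local-field array that is patched after each flip so that the classical energy change of any spin is a constant-time lookup. The helper names are hypothetical, and the energy convention assumed here is $E = \sum_{i<j} J_{ij} s_i s_j$.

```python
from array import array


def build_csr(n, edges):
    """Build CSR arrays from undirected edges (i, j, J_ij), each listed once."""
    neigh = [[] for _ in range(n)]
    for i, j, w in edges:
        neigh[i].append((j, w))
        neigh[j].append((i, w))
    indptr, indices, data = [0], [], []
    for row in neigh:
        for j, w in row:
            indices.append(j)
            data.append(w)
        indptr.append(len(indices))
    return indptr, indices, data


def init_local_field(indptr, indices, data, spins):
    """field[i] = sum_j J_ij * s_j, computed once from scratch."""
    return [sum(data[p] * spins[indices[p]]
                for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(spins))]


def flip_delta_e(i, spins, field):
    """Classical energy change of flipping spin i, read in O(1)."""
    return -2.0 * spins[i] * field[i]


def flip(i, indptr, indices, data, spins, field):
    """Flip spin i and patch only the local fields of its neighbours."""
    spins[i] = -spins[i]
    for p in range(indptr[i], indptr[i + 1]):
        field[indices[p]] += 2.0 * data[p] * spins[i]


# Hypothetical triangle instance; array('b') stores each spin in one signed byte.
csr = build_csr(3, [(0, 1, 1.0), (1, 2, -2.0), (0, 2, 0.5)])
spins = array('b', [1, -1, 1])
field = init_local_field(*csr, spins)
```

A flip costs time proportional to the degree of the flipped spin, while evaluating a proposal costs a single multiplication.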

4. Benchmarks, tuning practice, and limitations of the optimization QAISim

Murashima evaluated the method on MAX-CUT instances from GSET with 800–2000 nodes. The reported hyperparameter regime fixed the schedule parameters, used sweep counts that depended on the instance class, and applied a geometric update at a fixed sweep interval (Murashima, 2023).

The reported averages over 50 trials were instance-specific. For G9, with 800 nodes and a best known cut of 2054, QAISim reached 2054 in 100% of runs, whereas SA reached it in approximately 55%. For G34, with 2000 nodes and a best known cut of 1384, QAISim reached 1384 in 2/50 runs, while SA failed to reach it. This suggests that the collapsed-layer heuristic can materially improve search quality relative to plain simulated annealing on some large instances, but also that success remains sensitive to instance structure and schedule design.
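For readers less familiar with the benchmark, the sketch below shows how a cut value relates to an Ising-style energy and how a hit rate such as "2/50 runs" would be computed. The function names and the toy 4-cycle are illustrative only and do not come from the paper. The identity used is the standard one: cut $= \tfrac{1}{2}\bigl(\sum w_{ij} - \sum w_{ij} s_i s_j\bigr)$, so maximizing the cut is the same as minimizing $\sum w_{ij} s_i s_j$.

```python
def cut_value(edges, spins):
    """Total weight of edges whose endpoints have opposite spins."""
    return sum(w for i, j, w in edges if spins[i] != spins[j])


def ising_energy(edges, spins):
    """sum_{(i,j)} w_ij * s_i * s_j for spins in {-1, +1}."""
    return sum(w * spins[i] * spins[j] for i, j, w in edges)


def success_rate(cuts, best_known):
    """Fraction of runs whose final cut reaches the best known value."""
    return sum(1 for c in cuts if c >= best_known) / len(cuts)


# Toy 4-cycle with unit weights; alternating spins cut every edge.
cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
alternating = [1, -1, 1, -1]
total_weight = sum(w for _, _, w in cycle)
assert cut_value(cycle, alternating) == 4
assert cut_value(cycle, alternating) == (total_weight - ising_energy(cycle, alternating)) / 2
```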

The tuning guidance is explicit. The initial temperature should be high enough to comfortably accept uphill moves; the remaining schedule parameter should stay within its recommended range; and the acceptance ratio should be monitored, with a target of about 30–50% early and about 1–5% at the end. The summary also lists several failure modes: if the annealed parameters become too extreme, the pull toward the best-so-far configuration dominates and freezes all spins to it; overcooling causes freeze-out before a global optimum is found; undercooling spends too much time at high temperature; large parameter values motivate 64-bit floating-point evaluation of the acceptance probability; and widely varying $J_{ij}$ magnitudes may require normalization or adaptive temperature scales (Murashima, 2023).
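Monitoring of this kind is easy to bolt onto any annealing loop. The sketch below pairs a geometric schedule with a small acceptance counter and a check against the two target bands quoted above. The class, the function names, and the "too hot" and "too cold" labels are assumptions for illustration, not part of the published guidance.

```python
def geometric_schedule(start, factor, steps):
    """Return start, start*factor, start*factor**2, ... (factor < 1 cools)."""
    return [start * factor ** k for k in range(steps)]


class AcceptanceMonitor:
    """Counts proposed and accepted Metropolis moves within a window."""

    def __init__(self):
        self.accepted = 0
        self.proposed = 0

    def record(self, was_accepted):
        self.proposed += 1
        self.accepted += int(was_accepted)

    def ratio(self):
        return self.accepted / self.proposed if self.proposed else 0.0

    def reset(self):
        self.accepted = self.proposed = 0


# Target bands from the text: about 30-50% early, about 1-5% at the end.
TARGETS = {"early": (0.30, 0.50), "late": (0.01, 0.05)}


def diagnose(ratio, phase):
    """Compare an observed acceptance ratio with the band for this phase."""
    low, high = TARGETS[phase]
    if ratio < low:
        return "too cold"   # few moves accepted: start hotter or cool more slowly
    if ratio > high:
        return "too hot"    # many moves accepted: cool faster or run longer
    return "ok"
```

A typical use would call `record` inside the sweep loop, read `ratio()` at the end of each temperature step, and then call `reset()`.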

5. QAISim as a toolkit for QAI-driven resource management in quantum clouds

In the 2025 work, QAISim is a Python-based simulation framework for modeling and evaluating quantum-reinforcement-learning-driven resource-management policies in Quantum Computing as a Service platforms, with emphasis on large-scale IoT applications (Singh et al., 1 Dec 2025). Its stated objectives are to provide a flexible, gym-style environment in which quantum tasks can be generated, queued, and dispatched to simulated quantum processing units, and to support the design, training, and benchmarking of resource-allocation policies implemented via parameterized quantum circuits.

The architecture is divided into three main modules. The Environment Simulator implements the OpenAI Gymnasium interface through reset, step, and render, while maintaining a queue of QTasks and a registry of QNodes. The Quantum Agent Interface wraps TensorFlow Quantum and Cirq to define PQC-based policies or Q-value networks, exposing a QuantumAgent base class with derived PolicyGradientAgent and DeepQLearningAgent implementations. The Resource Manager, or Broker, matches pending QTasks to selected QNodes and computes the execution time of each task on its assigned QNode.

The scheduling problem is formalized as a Markov Decision Process. The state includes the current queue lengths of each QNode and the feature vector of the arriving QTask, including arrival time, qubit count, and circuit layers. The action space is the set of available QNodes, with each action assigning the current task to one QNode. Each assignment yields a scalar reward, and policies are trained to maximize the expected return.

This framework positions QAISim not as a solver for a fixed combinatorial objective, but as a simulator for online decision policies under queueing, execution-time, and hardware-capability constraints. A plausible implication is that its primary unit of analysis is policy quality under workload variation rather than asymptotic optimization quality on a static Ising instance.
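The decision loop described in this section can be mimicked with a toy environment that follows the Gymnasium calling convention (`reset` returns an observation and an info dict; `step` returns observation, reward, terminated, truncated, info). Everything in the sketch is hypothetical and is not the QAISim API: the class names, the execution-time model (circuit layers divided by a per-node speed), the use of pending work as a stand-in for queue length, and the reward (negative waiting plus execution time) are assumptions made only to show the shape of the state and action spaces.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyTask:
    arrival: float
    qubits: int
    layers: int


class ToySchedulingEnv:
    """Gym-style scheduling loop over a fixed set of simulated nodes."""

    def __init__(self, node_speeds, n_tasks=20, seed=0):
        self.node_speeds = list(node_speeds)  # assumed: layers per time unit
        self.n_tasks = n_tasks
        self.rng = random.Random(seed)

    def _new_task(self, t):
        return ToyTask(float(t), self.rng.randint(2, 50), self.rng.randint(1, 100))

    def _obs(self):
        t = self.task
        # Per-node backlog followed by the arriving task's feature vector.
        return self.backlog + [t.arrival, float(t.qubits), float(t.layers)]

    def reset(self):
        self.step_count = 0
        self.backlog = [0.0] * len(self.node_speeds)
        self.task = self._new_task(0)
        return self._obs(), {}

    def step(self, action):
        exec_time = self.task.layers / self.node_speeds[action]  # assumed model
        wait = self.backlog[action]
        self.backlog[action] += exec_time
        reward = -(wait + exec_time)                              # assumed reward
        self.step_count += 1
        # One time unit elapses before the next arrival; backlogs drain.
        self.backlog = [max(0.0, b - 1.0) for b in self.backlog]
        terminated = self.step_count >= self.n_tasks
        self.task = self._new_task(self.step_count)
        return self._obs(), reward, terminated, False, {}


def greedy_action(env):
    """Baseline: pick the node with the earliest estimated completion."""
    return min(range(len(env.node_speeds)),
               key=lambda a: env.backlog[a] + env.task.layers / env.node_speeds[a])
```

A learned policy would replace `greedy_action` with a function of the observation alone, which is the role played by the PQC-based agents in the toolkit.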

6. QRL implementations, software organization, and empirical behavior of the toolkit

The toolkit implements two QRL methods, Policy Gradient and Deep Q-Learning, both using the “data-reuploading” PQC ansatz of Jerbi et al. (2021) (Singh et al., 1 Dec 2025). The circuit alternates encoding layers, which rotate each qubit by input-scaled angles, with variational layers built from $R_x$, $R_y$, and $R_z$ rotations plus CZ entangling gates. Final measurement of weighted Pauli-$Z$ observables yields either a policy distribution or a Q-value estimate.
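To make the layer structure concrete, the sketch below simulates a small data-reuploading circuit with a plain state vector and no quantum libraries. It is an illustration under assumptions rather than the toolkit's circuit: the layer order (variational rotations, a chain of CZ gates, then $R_x$ encoding with trainable input scaling), the choice of one Pauli-$Z$ expectation per qubit, and the softmax readout are all choices made for the example.

```python
import cmath
import math


def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]


def apply_1q(state, g, q, n):
    """Apply 2x2 matrix g to qubit q (qubit 0 is the most significant bit)."""
    out = state[:]
    bit = 1 << (n - 1 - q)
    for idx in range(len(state)):
        if not idx & bit:
            a0, a1 = state[idx], state[idx | bit]
            out[idx] = g[0][0] * a0 + g[0][1] * a1
            out[idx | bit] = g[1][0] * a0 + g[1][1] * a1
    return out


def apply_cz_chain(state, n):
    """CZ between neighbouring qubits: negate amplitudes where both bits are 1."""
    out = state[:]
    for q in range(n - 1):
        mask = (1 << (n - 1 - q)) | (1 << (n - 2 - q))
        out = [-a if (idx & mask) == mask else a for idx, a in enumerate(out)]
    return out


def pqc_expectations(x, theta, lam):
    """Return <Z_q> per qubit; theta[l][q] = (rx, ry, rz) angles, lam[l][q] scales x[q]."""
    n, layers = len(x), len(lam)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for l in range(layers + 1):
        for q in range(n):                       # variational rotations
            for gate, angle in zip((rx, ry, rz), theta[l][q]):
                state = apply_1q(state, gate(angle), q, n)
        if l == layers:                          # final layer has no encoding
            break
        state = apply_cz_chain(state, n)         # entangling gates
        for q in range(n):                       # re-upload the scaled input
            state = apply_1q(state, rx(lam[l][q] * x[q]), q, n)
    return [sum(abs(a) ** 2 * (-1 if idx & (1 << (n - 1 - q)) else 1)
                for idx, a in enumerate(state)) for q in range(n)]


def softmax_policy(expectations, weights):
    """Assumed readout: softmax over weighted expectation values."""
    scores = [w * e for w, e in zip(weights, expectations)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return [v / sum(exps) for v in exps]
```

With `layers` encoding blocks the input is injected `layers` times, which is what distinguishes data re-uploading from a single initial encoding.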

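For policy gradient, the measured observables parameterize the policy over actions, and REINFORCE is applied through episode returns and an Adam optimizer. For Deep Q-Learning, the measured observables provide the action-value estimate, and training uses the Bellman target, the immediate reward plus the discounted maximum target-network value of the next state, with replay and periodic target-network synchronization.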
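The two training signals can be written down independently of any quantum backend. The sketch below computes discounted episode returns as used by REINFORCE and one-step Bellman targets as used by Deep Q-Learning. The function names and the `(state, action, reward, next_state, done)` tuple layout are assumptions, and `q_target` stands for whatever frozen network (here any callable returning a list of action values) plays the target role.

```python
def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def bellman_targets(batch, q_target, gamma):
    """y = r for terminal transitions, else r + gamma * max_a' Q_target(s', a')."""
    return [r if done else r + gamma * max(q_target(s_next))
            for (_s, _a, r, s_next, done) in batch]


def sync_target(online_params, target_params):
    """Periodic synchronization: copy online parameters into the target copy."""
    target_params[:] = online_params
```

In a training loop, `sync_target` would be called every fixed number of updates so that the targets change more slowly than the online estimates.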

The software organization is explicit. Core classes include qaisim.core.Broker, qaisim.core.QNode, qaisim.core.QTask, and qaisim.env.QaiEnv. The quantum-agent layer exposes qaisim.qai.PolicyGradientAgent and qaisim.qai.DeepQLearningAgent, each with methods such as select_action(state), store_transition(s,a,r,s′), and learn(). A 5-layer data-reuploading PQC contains 181 trainable parameters, while a comparable 3-layer MLP classical DRL agent with 64-unit hidden layers has 9,221 parameters. The summary identifies this as a model-complexity advantage of approximately 95% fewer variables, although the two quoted counts correspond to a reduction of about 98%.
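The quoted parameter counts can be checked with a few lines of arithmetic. The layer sizes below are an assumption: an input of 8 features (five queue lengths plus three task features), three hidden layers of 64 units, and 5 outputs is one layout that reproduces the quoted classical total, but the source does not spell out the architecture.

```python
def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


classical = mlp_param_count([8, 64, 64, 64, 5])  # assumed layout, gives 9221
quantum = 181                                    # count quoted in the text
reduction = 1 - quantum / classical              # about 0.98
```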

The reported evaluation used Cirq and TensorFlow Quantum within a Gymnasium environment, 100 randomly sampled circuits from the MQT Bench library with 2–50 qubits and depths up to 17,598, and five simulated IBM devices—Marrakesh, Torino, Quebec, Brisbane, and Kolkata—with realistic CLOPS and EPLG metrics. Baselines were a greedy assignment rule and classical DRL agents from QSimPy. Both Policy Gradient and Deep Q-Learning converged within approximately 500–800 episodes to reward levels comparable to classical agents, while noisy simulations with amplitude-damping and depolarization showed graceful degradation but still outperformed the greedy baseline. Average per-episode cumulative returns were reported as 44.04 for QAISim Policy Gradient versus 40.06 for classical Policy Gradient, and 40.04 for QAISim Deep Q-Learning versus 45.36 for classical Deep Q-Learning, alongside gains over greedy scheduling of 50% and 57%, respectively (Singh et al., 1 Dec 2025).

The extension guidance is also concrete. New allocation policies are created by subclassing QuantumAgent and overriding _build_pqc(). Hyperparameters include the learning rates, the discount factor, and the PQC depth. For larger IoT deployments, the summary recommends increasing PQC qubit count or using factorized observables to keep measurement overhead low, starting with 2–3 encoding–variational layers, employing gradient-norm clipping in noisy environments, and using a sufficiently large replay buffer for stable Deep Q-Learning performance.
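Two of these recommendations, a bounded replay buffer and gradient-norm clipping, are generic enough to sketch without reference to the toolkit's classes. The names below are hypothetical, and the capacity and clipping threshold are left to the caller because the source's recommended values are not reproduced here.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        """Uniform sample without replacement, capped at the current size."""
        return self.rng.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat gradient list so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm
```

Clipping by the global norm preserves the gradient direction, which matters when noisy expectation estimates occasionally produce very large updates.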

The two QAISim lines of work therefore share a name but not a common technical substrate. One is a QMC-inspired annealing heuristic centered on a collapsed Trotter approximation and best-so-far coupling; the other is a simulation and benchmarking environment for PQC-based reinforcement learning in quantum-cloud resource management. Any technical discussion of QAISim is incomplete unless this naming collision is made explicit.