Factorization Machine with Annealing
- Factorization Machine with Annealing is a hybrid optimization framework that integrates surrogate modeling with annealing strategies to efficiently tackle complex, high-dimensional problems.
- It transforms objective functions into quadratic forms (QUBO/Ising) using factorization machines, enabling scalable and precise modeling of black-box functions.
- The framework iteratively refines surrogate accuracy via annealing (simulated or quantum), balancing exploration and exploitation to mitigate challenges like the curse of dimensionality.
Factorization Machine with Annealing (FMA) is a paradigm for black-box and combinatorial optimization that unites the surrogate modeling capabilities of factorization machines (FMs) with metaheuristic and hardware-accelerated annealing. In the FMA framework, an FM is trained to approximate an objective function in quadratic form, and the trained model is then optimized with annealing algorithms, simulated or quantum, by interpreting it as a quadratic unconstrained binary optimization (QUBO) problem or Ising Hamiltonian. The methodology addresses scalability, efficiency, and the curse of dimensionality in high-complexity optimization tasks, and has been applied to recommendation systems, materials discovery, combinatorial design problems, and continuous parameter learning with binary representations. This article synthesizes the formalism, key algorithmic innovations, computational properties, and notable applications of Factorization Machine with Annealing.
1. Mathematical and Algorithmic Foundations
At its core, FMA leverages the functional form of a factorization machine as a surrogate to model a black-box function $f(\mathbf{x})$ over binary variables $\mathbf{x} \in \{0,1\}^n$ (or over other domains after conversion to binary representation). The FM model expresses predictions as

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,$$

where $x_i \in \{0,1\}$ (or $x_i \in \{-1,+1\}$ under certain encodings), $w_0$ is the bias, $w_i$ are linear coefficients, and $\mathbf{v}_i \in \mathbb{R}^k$ are latent vectors with inner dimension $k$. All second-order interactions can be directly and efficiently represented, supporting high expressivity with tractable parameterization in high dimension.

This quadratic surrogate can be unambiguously mapped onto a QUBO matrix $Q$, where

$$Q_{ii} = w_i, \qquad Q_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle \quad (i < j),$$

and the annealing objective becomes $E(\mathbf{x}) = \mathbf{x}^\top Q \mathbf{x}$, with the bias $w_0$ a constant offset that does not affect the minimizer (for binary variables $x_i^2 = x_i$, so the linear terms sit on the diagonal).
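As a concrete illustration, the following minimal NumPy sketch (the function and variable names are illustrative, not taken from the cited works) builds the QUBO matrix from trained FM parameters and verifies the identity above on a random binary vector:

```python
import numpy as np

def fm_to_qubo(w, V):
    """Map trained FM parameters to an upper-triangular QUBO matrix Q.

    w : (n,) linear coefficients; V : (n, k) latent factors.
    For binary x (where x_i**2 == x_i), the FM prediction minus the
    constant bias w0 equals x^T Q x with Q built as below.
    """
    Q = np.triu(V @ V.T, k=1)        # pairwise terms <v_i, v_j> for i < j
    np.fill_diagonal(Q, w)           # linear terms absorbed into the diagonal
    return Q

# sanity check against the explicit FM formula
rng = np.random.default_rng(0)
n, k = 6, 3
w0, w, V = 0.5, rng.normal(size=n), rng.normal(size=(n, k))
x = rng.integers(0, 2, size=n)
fm = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                      for i in range(n) for j in range(i + 1, n))
Q = fm_to_qubo(w, V)
assert np.isclose(fm, w0 + x @ Q @ x)
```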
The entire FMA workflow thus consists of (1) surrogate model training, (2) conversion to quadratic annealing form, and (3) candidate solution search using an annealing optimizer. This process is iterated, with each search cycle augmenting the training set with evaluated candidate solutions, thereby refining the surrogate's fidelity to the original black-box function (Tamura et al., 24 Jul 2025).
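The loop below is a self-contained toy realization of this three-step cycle, reusing the `fm_to_qubo` helper from the sketch above. The gradient-descent FM trainer and the single-flip simulated annealer are deliberately minimal stand-ins for production solvers, and the "black box" here is just a fixed random QUBO used for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 12, 4
J = np.triu(rng.normal(size=(n, n)), k=1)

def f(x):                            # stand-in black box, unknown to the loop
    return x @ J @ x

def fm_predict(w0, w, V, X):
    """FM prediction via the standard O(nk) identity for the pairwise term."""
    inter = 0.5 * (((X @ V) ** 2).sum(1) - (X ** 2) @ (V ** 2).sum(1))
    return w0 + X @ w + inter

def train_fm(X, y, k, epochs=300, lr=0.01):
    n = X.shape[1]
    w0, w, V = 0.0, np.zeros(n), 0.01 * rng.normal(size=(n, k))
    for _ in range(epochs):          # plain full-batch gradient descent
        err = fm_predict(w0, w, V, X) - y
        w0 -= lr * err.mean()
        w -= lr * (X.T @ err) / len(y)
        XV = X @ V
        for i in range(n):           # standard FM gradient wrt latent factors
            g = (err[:, None] * (X[:, [i]] * XV - (X[:, [i]] ** 2) * V[i])).mean(0)
            V[i] -= lr * g
    return w0, w, V

def anneal(Q, sweeps=300, T0=2.0):
    """Single-flip simulated annealing on E(x) = x^T Q x."""
    S = Q + Q.T                      # symmetrized couplings (diagonal doubled)
    x = rng.integers(0, 2, size=len(Q))
    for t in range(sweeps):
        T = max(T0 * (1 - t / sweeps), 1e-3)
        for i in rng.permutation(len(Q)):
            dE = (1 - 2 * x[i]) * (Q[i, i] + S[i] @ x - 2 * Q[i, i] * x[i])
            if dE < 0 or rng.random() < np.exp(-dE / T):
                x[i] ^= 1
    return x

X = rng.integers(0, 2, size=(10, n)).astype(float)   # random initial dataset
y = np.array([f(x) for x in X])
for cycle in range(20):                              # the FMA cycle
    w0, w, V = train_fm(X, y, k)                     # (1) train surrogate
    cand = anneal(fm_to_qubo(w, V)).astype(float)    # (2)-(3) QUBO + anneal
    X, y = np.vstack([X, cand]), np.append(y, f(cand))  # augment training set
print("best objective found:", y.min())
```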
2. Annealing Mechanisms and Exploration Strategies
FMA supports a variety of annealing strategies:
- Simulated Annealing (SA): Here, candidate solutions are updated according to a temperature-based probabilistic acceptance rule. Notably, (Shehata et al., 2017) applies simulated annealing with Lévy-flight random walks in place of Gaussian steps, balancing local exploitation and global exploration. The update for component $i$ at iteration $t$ takes the form $x_i^{(t+1)} = x_i^{(t)} + \alpha\, s_i$, with the step $s_i$ drawn from a Lévy distribution via Mantegna's algorithm (a minimal sketch of the step generator follows this list).
- Quantum Annealing (QA): In QA, the QUBO matrix learned by FM is provided as the problem Hamiltonian for hardware solvers (e.g., D-Wave annealer). The quantum system exploits quantum tunneling and adiabatic processes to seek ground-state (minimum energy) assignments, often reaching good suboptimal solutions with lower time complexities than classical solvers, particularly in highly non-convex landscapes (Liu et al., 2022).
Both modalities capitalize on the quadratic structure provided by FMs, with the choice dictated by available hardware and problem characteristics. Quantum annealing is especially suited for large-scale, hardware-accelerated scenarios.
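The following is a minimal sketch of the Lévy step generator referenced above, using Mantegna's algorithm; the scale parameter `alpha` and stability index `beta` are illustrative defaults, not values taken from the cited work:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def levy_step(beta=1.5, size=None):
    """Draw Lévy-stable steps via Mantegna's algorithm: s = u / |v|**(1/beta)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)   # scale of the numerator Gaussian
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def levy_propose(x, alpha=0.01):
    """SA proposal: heavy-tailed perturbation instead of a Gaussian step."""
    return x + alpha * levy_step(size=x.shape)
```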
3. Labeling, Encoding, and Optimization of Complex Domains
The effective application of FMA to combinatorial problems (e.g., TSP, scheduling) and continuous domains relies on variable encoding strategies:
- Bit Labeling: For permutation problems, encoding schemes directly affect the smoothness of the effective energy landscape. For instance, in the TSP context, natural labeling (direct mapping of index to binary) yields Hamming and solution spaces that are uncorrelated, while Gray labeling (using inversion numbers and Gray codes) ensures adjacent routes are close in both physical and binary distance. Gray labeling thus reduces the density of local minima and accelerates convergence (Koshikawa et al., 2 Jul 2024); a minimal Gray-code illustration follows this list.
- Continuous Variable Embeddings: Real variables often require conversion via one-hot, binary, or Gray-code encodings. This binary representation, however, can introduce "noisy" Hamiltonian surfaces when some bit combinations are never activated during training, impeding sampling performance (Endo et al., 5 Jul 2024). Function smoothing regularization (FSR), in which adjacent binary variables are explicitly regularized toward smoothness, mitigates such artifacts; a speculative sketch appears at the end of this subsection.
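To make the labeling point concrete, here is a minimal sketch (illustrative only, not the full inversion-number construction of the cited work) showing that reflected Gray codes keep adjacent indices at Hamming distance one, whereas natural binary labels can flip every bit at once:

```python
def gray(i: int) -> int:
    """Reflected binary Gray code of index i."""
    return i ^ (i >> 1)

for i in range(8):
    print(i, format(i, "03b"), format(gray(i), "03b"))

# Natural labeling: 3 -> 4 is 011 -> 100, a three-bit flip.
# Gray labeling: adjacent indices always differ by exactly one bit,
# so nearby routes stay nearby in Hamming space.
assert all(bin(gray(i) ^ gray(i + 1)).count("1") == 1 for i in range(7))
```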
Encoding strategies are problem-dependent, with tailored approaches improving both optimization accuracy and sampling reliability.
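As a loose sketch only: one way to realize a smoothing regularizer in the spirit of FSR is to penalize jumps of the surrogate between encodings of adjacent grid values of a continuous variable. The exact formulation in (Endo et al., 5 Jul 2024) may differ; `fm_predict` is the helper from the earlier loop sketch, and the `level_rows` layout is hypothetical:

```python
import numpy as np

def smoothing_penalty(w0, w, V, level_rows, lam=0.1):
    """Penalize squared jumps of the FM surrogate between adjacent levels.

    level_rows : (L, n) binary matrix; row j is the encoding of the
    j-th grid value of a continuous variable (hypothetical layout).
    """
    preds = fm_predict(w0, w, V, level_rows)
    return lam * np.sum(np.diff(preds) ** 2)

# Added to the squared training loss, this discourages a "noisy"
# Hamiltonian surface over bit combinations rarely seen in training.
```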
4. Model Initialization and Surrogate Enhancements
Initialization of FM parameters, notably the quadratic interaction component, is critical for efficient optimization:
- Low-Rank Approximation Initialization: Warm-starting the FM model with a low-rank eigendecomposition that closely matches a pre-existing Ising model or coupling matrix can substantially reduce training iterations and keep the surrogate close to the problem's dominant interactions. This approach, supported by random matrix theory, ensures that an effective rank (often denoted $k$) captures the essential subspace of interactions, enhancing optimization success rates for black-box combinatorial problems (Seki et al., 16 Oct 2024); a minimal sketch appears after this list.
- Augmentation with Slack Variables: To capture higher-order interactions, slack variables are appended to the FM, implicitly expanding the effective function space beyond quadratic forms. The extended surrogate is optimized with annealing, and the slack variables are updated iteratively as part of the solution process, both improving expressivity and unifying what would otherwise be a two-step reduction-and-solve workflow (Wang et al., 2 Jul 2025).
Surrogate enhancements such as these are crucial for domains requiring high-fidelity modeling of non-trivial or higher-order energetic dependencies.
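A minimal NumPy sketch of the warm-start idea, assuming a known symmetric coupling matrix `J`. Note that the FM interaction matrix $VV^\top$ is positive semidefinite, so this simple truncation captures only the positive part of the spectrum; the construction in (Seki et al., 16 Oct 2024) is more careful:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 32, 8
A = rng.normal(size=(n, n))
J = (A + A.T) / 2                    # a pre-existing symmetric coupling matrix

vals, vecs = np.linalg.eigh(J)
top = np.argsort(vals)[::-1][:k]     # indices of the k largest eigenvalues
V0 = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# V0 @ V0.T is a rank-k approximation of J's dominant (positive) subspace;
# using V0 to warm-start the FM keeps training near these interactions.
err = np.linalg.norm(V0 @ V0.T - J) / np.linalg.norm(J)
print(f"relative approximation error: {err:.2f}")
```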
5. Practical Applications and Computational Properties
FMA and its variants have demonstrated impact across multiple domains:
| Application Area | FMA Mechanism | Notable Outcomes |
|---|---|---|
| Recommender Systems | FM-QUBO + Quantum Annealing | Faster-than-quadratic user recommendation (Liu et al., 2022) |
| Materials Discovery | FM-QUBO → Annealing with constraints | Rapid crystal ground-state sampling (Couzinie et al., 7 Aug 2024) |
| Combinatorial Optimization | SA/QA with bit labeling + FMs | Efficient TSP route search with Gray labeling (Koshikawa et al., 2 Jul 2024) |
| Continuous Optimization | FM-QUBO + smoothing regularization | Enhanced generalization, faster convergence (Endo et al., 5 Jul 2024) |
| Drug Combination Prediction | FM-QUBO + slack variables | Improved prediction of drug effects (Wang et al., 2 Jul 2025) |
Performance metrics are application-dependent but commonly include convergence speed, quality of sampled minima, rank correlation to reference outcomes, and surrogate-model generalization. For recommender systems using quantum hardware, experimental benchmarks indicate significant reductions in runtime and improved scalability compared to classical heuristics (Liu et al., 2022). In continuous parameter estimation, function smoothing regularization yields high $R^2$ with up to two orders of magnitude fewer samples than naive FMs (Endo et al., 5 Jul 2024).
Implementation is further supported by available Python packages enabling integration of FMA workflows and annealing backends (e.g., FMQA codebase, Amplify-BBOpt) (Tamura et al., 24 Jul 2025).
6. Current Limitations and Future Directions
Current limitations and avenues for further investigation include:
- Dataset Construction and Stagnation: Conventional FMA, which accumulates all historical training data, exhibits "stiffness" as recent information is diluted. Employing FIFO-style sequential datasets, in which only the latest $N$ samples inform model updates, has been shown to enhance surrogate adaptability and lead to better optimization outcomes with fewer black-box function calls. However, setting the window size $N$ too low (overfitting) or too high (dilution) remains a trade-off requiring problem-specific tuning (Nakano et al., 28 Jul 2025); a minimal sketch follows this list.
- Regularization and Overfitting in High Dimension: While function smoothing regularization addresses surface noise in binary-encoded continuous optimization, improper selection of the smoothing strength can induce over-smoothing and information loss.
- Scalability and Hardware Constraints: Quantum annealers natively support QUBO inputs but are presently limited in connectivity and qubit count; mappings to higher-order or large-variable problems require innovative reduction and embedding strategies.
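A deque with a fixed `maxlen` gives a minimal realization of the FIFO windowing idea mentioned above; the window size `N` is the problem-specific hyperparameter whose tuning the first bullet discusses:

```python
from collections import deque
import numpy as np

N = 64                               # window size: too small overfits, too large dilutes
X_win, y_win = deque(maxlen=N), deque(maxlen=N)

def record(x, y):
    """Keep only the N most recent (x, f(x)) pairs; older samples fall out."""
    X_win.append(x)
    y_win.append(y)

# Each FMA cycle then retrains the FM on np.array(X_win), np.array(y_win)
# instead of the full history, keeping the surrogate responsive to recent data.
```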
Prospective research directions include dynamic/adaptive data windowing to maintain surrogate “agility,” hybrid regularization strategies to balance landscape smoothness and model expressivity, advanced encoding schemes for domain-adapted QUBO construction, and deeper integration with emerging Ising/annealing hardware to further improve sampling density, time-to-solution, and accuracy in large-scale applications.
7. Theoretical and Algorithmic Significance
FMA establishes a unifying surrogate-guided black-box optimization framework able to exploit both metaheuristic and hardware-accelerated search. The methodology generalizes stochastic optimization, classical Bayesian optimization, and combinatorial annealing by providing an expressively efficient, QUBO-compatible surrogate that is adaptively updated. Integration of modern learning paradigms (slack variables for higher-order effects, smoothing regularization for continuous spaces, and advanced initialization via low-rank decompositions) expands FMA capacity to model and optimize highly complex non-linear landscapes inherent in scientific and engineering problems.
The algorithmic synergy between FM surrogate expressivity and annealing optimizer exploration makes FMA a prominent technique for scalable, data-efficient optimization in high-dimensional and black-box environments.