HeatACO: Neural-ACO Decoder for TSP
- The paper introduces HeatACO, a decoding algorithm that blends neural priors with a Max-Min Ant System to construct feasible TSP tours under strict degree and single-cycle constraints.
- It employs a candidate edge list and dynamic pheromone updates, using local distance heuristics and a heatmap exponent to balance exploration and error correction.
- Optional 2-opt and 3-opt post-processing further refine solutions, yielding competitive gaps and CPU times on TSP instances up to 10K nodes.
HeatACO is a decoding algorithm introduced for large-scale Travelling Salesman Problems (TSP) that integrates neural "heatmap" predictions with a probabilistic Ant Colony Optimization (ACO) framework. It is designed to translate dense edge-probability matrices generated by neural predictors into feasible TSP tours that obey degree-2 and single-cycle constraints, offering high-quality solutions with computational efficiency at scale (Lin et al., 26 Jan 2026).
1. Problem Formulation and Decoding Challenges
The large-scale symmetric TSP is defined over n points with coordinates x_i ∈ ℝ² and inter-point distances d_ij. A legal TSP tour satisfies two critical constraints: (i) each node has degree 2, enforced by Σ_j A_ij = 2 for the adjacency matrix A, and (ii) the tour forms a single cycle, excluding subtours.
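Both constraints can be verified directly on a 0/1 adjacency matrix. A minimal NumPy sketch (function name and traversal scheme are illustrative, not from the paper):

```python
import numpy as np

def is_valid_tour(adj: np.ndarray) -> bool:
    """Check degree-2 and single-cycle constraints on a 0/1 adjacency matrix."""
    n = adj.shape[0]
    # (i) every node must have degree exactly 2
    if not np.all(adj.sum(axis=1) == 2):
        return False
    # (ii) edges must form ONE cycle: walk from node 0 and count
    # how many distinct nodes are visited before returning
    prev, cur = 0, int(np.flatnonzero(adj[0])[0])
    visited = 1
    while cur != 0:
        nbrs = np.flatnonzero(adj[cur])
        cur, prev = (int(nbrs[1]) if nbrs[0] == prev else int(nbrs[0])), cur
        visited += 1
    return visited == n

# A 4-node square tour 0-1-2-3-0 satisfies both constraints
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = A[j, i] = 1
print(is_valid_tour(A))  # True
```

A union of two disjoint triangles passes the degree check but fails the single-cycle walk, which is exactly the subtour case the second constraint excludes.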
Heatmap-based non-autoregressive TSP solvers output a confidence matrix H ∈ [0,1]^{n×n}, where higher H_ij signals greater neural confidence that edge (i, j) belongs to a near-optimal solution. Decoding aims to map H to a feasible Hamiltonian cycle. Standard greedy heuristics—such as edge selection by ranking—aggregate errors at scale, yielding poor performance as n increases. While MCTS-guided k-opt solvers mitigate such error cascades and enforce constraints accurately, their computational costs are prohibitive for high n.
HeatACO instead reframes decoding as constrained probabilistic construction. It samples tours from a distribution that blends three influences:
- Local geometry (1/d_ij as a distance heuristic),
- Neural prior (H_ij as a soft edge prior),
- Global feedback (pheromone trails τ_ij learned during search).
2. HeatACO Algorithm: Max-Min Ant System Structure
HeatACO is instantiated as a Max-Min Ant System (MMAS) [Stützle & Hoos 2000], maintaining:
- A pheromone matrix τ (dynamic global feedback),
- A static distance heuristic η_ij = 1/d_ij,
- A fixed heatmap H.
Transition Probability:
Ants construct tours stepwise. At node i, the next node j (unvisited, feasible) is selected with probability

p(j | i) ∝ τ_ij^α · η_ij^β · (H_ij + ε)^γ,

where α and β are the pheromone and distance exponents, ε > 0 avoids zero-probability transitions, and γ is the heatmap exponent modulating reliance on the neural prior. Setting γ = 0 recovers vanilla MMAS.
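The transition rule above can be sketched as a sampling step over unvisited nodes; parameter names (alpha, beta, gamma, eps) mirror the exponents in the text, while the fallback branch is an assumption for exhausted candidate sets:

```python
import numpy as np

def next_node(i, visited, tau, eta, H, alpha, beta, gamma, eps=1e-3, rng=None):
    """Sample the next city from the MMAS-style transition distribution
    p(j|i) ∝ tau[i,j]**alpha * eta[i,j]**beta * (H[i,j] + eps)**gamma."""
    rng = rng or np.random.default_rng()
    n = tau.shape[0]
    mask = ~visited  # feasible moves = unvisited nodes (degree constraint)
    weights = tau[i] ** alpha * eta[i] ** beta * (H[i] + eps) ** gamma
    weights = np.where(mask, weights, 0.0)
    total = weights.sum()
    if total == 0.0:  # assumed fallback: first unvisited node
        return int(np.flatnonzero(mask)[0])
    return int(rng.choice(n, p=weights / total))
```

With gamma = 0 the heatmap factor becomes 1 for every edge, so the distribution reduces to the vanilla MMAS rule, matching the text.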
Candidate Edge Lists:
To achieve scalability, HeatACO restricts sampling and search to k candidate edges per node (k ≪ n). For each node i:
- Retain edges whose heatmap confidence H_ij exceeds a threshold.
- Take the top-k highest-H_ij neighbors among those above the threshold.
- If needed, pad to k neighbors with the closest nodes by d_ij.
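The three steps above can be sketched as follows; the function name and the threshold symbol `delta` are illustrative, not from the paper:

```python
import numpy as np

def candidate_lists(H, D, k, delta):
    """Per-node candidate lists: top-k neighbors by heatmap confidence
    (those with H[i,j] >= delta), padded with nearest neighbors by distance."""
    n = H.shape[0]
    cands = []
    for i in range(n):
        conf = H[i].copy()
        conf[i] = -np.inf  # exclude self-loop
        order = np.argsort(-conf)  # neighbors by descending confidence
        keep = [int(j) for j in order[:k] if conf[j] >= delta]
        if len(keep) < k:  # pad with closest nodes by distance
            for j in np.argsort(D[i]):
                if j != i and int(j) not in keep:
                    keep.append(int(j))
                if len(keep) == k:
                    break
        cands.append(np.array(keep[:k]))
    return cands
```

Restricting both ant sampling and local search to these lists is what brings the per-step cost from O(n) down to O(k).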
Pheromone Update:
After each batch of ant tours:
- Evaporation: τ_ij ← (1 − ρ) τ_ij for all edges.
- Reinforcement: if edge (i, j) is on the elite tour T*, τ_ij ← τ_ij + Δτ.
- Clamping: τ_ij ← min(max(τ_ij, τ_min), τ_max), where τ_min and τ_max bound the pheromone range (preventing stagnation), and ρ is the evaporation rate.
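One update round can be sketched as below; the deposit amount 1/L(T*) is the standard MMAS choice and is an assumption here, as is the in-place NumPy layout:

```python
import numpy as np

def mmas_update(tau, elite_tour, elite_len, rho, tau_min, tau_max):
    """One MMAS pheromone round: evaporate all edges, reinforce the elite
    tour's edges (assumed deposit 1/L(T*)), then clamp to [tau_min, tau_max]."""
    tau *= (1.0 - rho)           # evaporation on every edge
    deposit = 1.0 / elite_len    # assumed reinforcement amount
    for a, b in zip(elite_tour, np.roll(elite_tour, -1)):
        tau[a, b] += deposit
        tau[b, a] += deposit     # symmetric TSP: mirror the deposit
    np.clip(tau, tau_min, tau_max, out=tau)
    return tau
```

Clamping is what distinguishes MMAS from plain Ant System: bounding τ keeps low-confidence edges reachable, which is precisely the error-correction mechanism Section 3 relies on.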
3. Global Coordination and Correction of Local Errors
The heatmap serves as a soft prior—no edge is strictly forbidden, as even low-confidence options remain accessible. Tour feasibility constraints (degree, subtour) are enforced during sampling. Over multiple iterations, if a high-H_ij edge consistently leads to infeasible or suboptimal tours, reinforcement is withheld and pheromone levels for such edges decay, while effective ones are reinforced. This moderates local heatmap mis-rankings, correcting error cascades without resorting to intensive backtracking or search-tree expansion.
4. Post-Processing: 2-opt and 3-opt Local Search
Optional post-processing using 2-opt or 3-opt exchanges is undertaken on the candidate edge set to further refine constructed tours. In 2-opt, pairs of edges are considered for replacement if the exchange reduces total tour length, with iterative improvement halted when no further gains are found. 3-opt iteratively attempts more complex triple-edge improvements.
Because both routines operate on the candidate edge set, each 2-opt pass costs on the order of n·k edge evaluations, with 3-opt passes correspondingly more expensive. For n up to 10,000, these searches typically complete within seconds.
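A minimal first-improvement 2-opt pass is sketched below; for clarity it scans all edge pairs rather than restricting to the candidate set, and the function name is illustrative:

```python
import math

def two_opt_pass(tour, D):
    """One first-improvement 2-opt pass: replace edges (a,b) and (c,d)
    by (a,c) and (b,d) when that shortens the tour, reversing the
    intervening segment. Returns (tour, improved_flag)."""
    n = len(tour)
    for i in range(n - 2):
        a, b = tour[i], tour[i + 1]
        # avoid re-pairing the wrap-around edge with edge (tour[0], tour[1])
        for j in range(i + 2, n - (1 if i == 0 else 0)):
            c, d = tour[j], tour[(j + 1) % n]
            if D[a][c] + D[b][d] < D[a][b] + D[c][d] - 1e-12:
                tour[i + 1 : j + 1] = tour[i + 1 : j + 1][::-1]
                return tour, True
    return tour, False

# Uncross the tour 0-2-1-3 on the unit square
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(p, q) for q in pts] for p in pts]
tour, improved = [0, 2, 1, 3], True
while improved:
    tour, improved = two_opt_pass(tour, D)
```

On the unit-square example the crossing edges (0,2) and (1,3) are swapped out, and iterating to a fixed point yields the optimal length-4 tour.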
5. Experimental Results and Performance Benchmarks
HeatACO was evaluated on TSP500, TSP1K, and TSP10K datasets, with heatmaps derived from AttGCN [Fu et al.], DIMES (Ożański et al., 2022), UTSP (Erceg et al., 2023), and DIFUSCO (Troulé et al., 2023). Baselines include NAR + Greedy merge (fast but brittle), published parallel MCTS combined with k-opt (Pan et al., 2024), and vanilla MMAS.
Key empirical outcomes for fixed heatmaps with 2-opt post-processing, using a batch of ants per iteration over 5000 iterations:
| Dataset | Gap (%) | CPU Time |
|---|---|---|
| TSP500 | 0.11 | ≈ 2 s |
| TSP1K | 0.23 | ≈ 5 s |
| TSP10K | 1.15 | ≈ 1 m |
Further tightening with 3-opt achieved sub-0.01% gaps on TSP500, approximately 0.05% on TSP1K (tens of seconds), and approximately 0.4% on TSP10K (≈4 m). Greedy merge delivered significantly inferior results (gaps >10–40%), while MCTS/k-opt achieved gaps of 1–4% with much higher CPU times (50 s–16 m).
6. Heatmap Reliability and Distribution Shift Effects
Sparse candidate sets with near-perfect recall are attainable by thresholding the heatmap. However, most candidates reside in low-confidence regions, complicating decoding since true tour edges concentrate in a mid-to-high confidence band. Under distribution shift (e.g., TSPLIB circuits, drilling instances), candidate set sizes inflate (more candidate edges per node) and heatmap confidence can collapse, leading to degraded performance for greedy approaches. HeatACO remained robust, maintaining sub-1% gaps in seconds and matching or surpassing parallel MCTS at substantially reduced CPU burden.
Auxiliary diagnostics such as binary cross-entropy (CE) and class-weighted CE (WCE) of the heatmap H relative to the reference tour correlated with decoding difficulty, but did not fully predict performance.
7. Hyperparameterization and Practical Considerations
The heatmap exponent γ sharply modulates the influence of the neural prior, and a small sweep over γ values is empirically sufficient. Larger γ sharpens the prior and can accelerate convergence but risks overcommitting to misranked edges or suffering from poor calibration, especially under aggressive local search. Smaller γ promotes exploration when H is noisy. An entropy-based, label-free heuristic can also automate γ selection by targeting the effective support size of the heatmap-only proposal per node.
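The entropy-based heuristic can be sketched as follows: measure the per-node perplexity (exp of entropy) of the heatmap-only proposal q(j|i) ∝ (H_ij + ε)^γ, then pick the γ whose mean perplexity is closest to a desired candidate-set size. The target value and grid are illustrative assumptions:

```python
import numpy as np

def effective_support(H, gamma, eps=1e-3):
    """Per-node perplexity exp(entropy) of q(j|i) ∝ (H[i,j] + eps)**gamma,
    i.e. the 'effective number' of candidates each node spreads mass over."""
    w = (H + eps) ** gamma
    np.fill_diagonal(w, 0.0)                      # no self-loops
    q = w / w.sum(axis=1, keepdims=True)
    ent = -np.sum(np.where(q > 0, q * np.log(q), 0.0), axis=1)
    return np.exp(ent)

def pick_gamma(H, target=5.0, grid=(0.5, 1, 2, 4, 8)):
    """Choose gamma whose mean effective support is closest to `target`."""
    sizes = [effective_support(H, g).mean() for g in grid]
    return grid[int(np.argmin([abs(s - target) for s in sizes]))]
```

Raising γ tempers the proposal toward its mode, so the effective support shrinks monotonically; this is why larger γ sharpens the prior while smaller γ keeps exploration broad.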
Parameter settings, full reproducibility instructions, and source code are available at https://github.com/bochenglin/HEATACO (Lin et al., 26 Jan 2026).