HeatACO: Neural-ACO Decoder for TSP
- The paper introduces HeatACO, a decoding algorithm that blends neural priors with a Max-Min Ant System to construct feasible TSP tours under strict degree and single-cycle constraints.
- It employs a candidate edge list and dynamic pheromone updates, using local distance heuristics and a heatmap exponent to balance exploration and error correction.
- Optional 2-opt and 3-opt post-processing further refine solutions, yielding competitive gaps and CPU times on TSP instances up to 10K nodes.
HeatACO is a decoding algorithm introduced for large-scale Travelling Salesman Problems (TSP) that integrates neural "heatmap" predictions with a probabilistic Ant Colony Optimization (ACO) framework. It is designed to translate dense edge-probability matrices generated by neural predictors into feasible TSP tours that obey degree-2 and single-cycle constraints, offering high-quality solutions with computational efficiency at scale (Lin et al., 26 Jan 2026).
1. Problem Formulation and Decoding Challenges
The large-scale symmetric TSP is defined over n points with coordinates x_i ∈ ℝ² and inter-point distances d_ij. A legal TSP tour satisfies two critical constraints: (i) each node has degree 2, enforced by Σ_j A_ij = 2 for the adjacency matrix A, and (ii) the tour forms a single cycle, excluding subtours.
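Both constraints can be verified directly on a 0/1 adjacency matrix. A minimal NumPy sketch (function name and traversal scheme are illustrative, not from the paper):

```python
import numpy as np

def is_valid_tour(adj: np.ndarray) -> bool:
    """Check degree-2 and single-cycle constraints on a 0/1 adjacency matrix."""
    n = adj.shape[0]
    # (i) every node must have degree exactly 2
    if not np.all(adj.sum(axis=1) == 2):
        return False
    # (ii) edges must form ONE cycle: walk from node 0 and count
    # how many distinct nodes are visited before returning
    prev, cur = 0, int(np.flatnonzero(adj[0])[0])
    visited = 1
    while cur != 0:
        nbrs = np.flatnonzero(adj[cur])
        cur, prev = (int(nbrs[1]) if nbrs[0] == prev else int(nbrs[0])), cur
        visited += 1
    return visited == n

# A 4-node square tour 0-1-2-3-0 satisfies both constraints
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = A[j, i] = 1
print(is_valid_tour(A))  # True
```

A union of two disjoint triangles passes the degree check but fails the single-cycle walk, which is exactly the subtour case the second constraint excludes.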
Heatmap-based non-autoregressive TSP solvers output a confidence matrix H ∈ [0,1]^{n×n}, where higher H_ij signals greater neural confidence that edge (i, j) belongs to a near-optimal solution. Decoding aims to map H to a feasible Hamiltonian cycle. Standard greedy heuristics—such as edge selection by ranking—aggregate errors at scale, yielding poor performance as n increases. While MCTS-guided k-opt solvers mitigate such error cascades and enforce constraints accurately, their computational costs are prohibitive for high n.
HeatACO instead reframes decoding as constrained probabilistic construction. It samples tours from a distribution that blends three influences:
- Local geometry (1/d_ij as a distance heuristic),
- Neural prior (H_ij as a soft edge prior),
- Global feedback (pheromone trails τ_ij learned during search).
2. HeatACO Algorithm: Max-Min Ant System Structure
HeatACO is instantiated as a Max-Min Ant System (MMAS) [Stützle & Hoos 2000], maintaining:
- A pheromone matrix τ (dynamic global feedback),
- A static distance heuristic η_ij = 1/d_ij,
- A fixed heatmap H.
Transition Probability:
Ants construct tours stepwise. At node i, the next node j (unvisited, feasible) is selected with probability

p(j | i) ∝ τ_ij^α · η_ij^β · (H_ij + ε)^γ,

where α and β are the pheromone and distance exponents, ε > 0 avoids zero-probability transitions, and γ is the heatmap exponent modulating reliance on the neural prior. Setting γ = 0 recovers vanilla MMAS.
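The transition rule above can be sketched as a sampling step over unvisited nodes; parameter names (alpha, beta, gamma, eps) mirror the exponents in the text, while the fallback branch is an assumption for exhausted candidate sets:

```python
import numpy as np

def next_node(i, visited, tau, eta, H, alpha, beta, gamma, eps=1e-3, rng=None):
    """Sample the next city from the MMAS-style transition distribution
    p(j|i) ∝ tau[i,j]**alpha * eta[i,j]**beta * (H[i,j] + eps)**gamma."""
    rng = rng or np.random.default_rng()
    n = tau.shape[0]
    mask = ~visited  # feasible moves = unvisited nodes (degree constraint)
    weights = tau[i] ** alpha * eta[i] ** beta * (H[i] + eps) ** gamma
    weights = np.where(mask, weights, 0.0)
    total = weights.sum()
    if total == 0.0:  # assumed fallback: first unvisited node
        return int(np.flatnonzero(mask)[0])
    return int(rng.choice(n, p=weights / total))
```

With gamma = 0 the heatmap factor becomes 1 for every edge, so the distribution reduces to the vanilla MMAS rule, matching the text.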
Candidate Edge Lists:
To achieve scalability, HeatACO restricts sampling and search to k candidate edges per node (k ≪ n). For each node i:
- Retain edges whose heatmap confidence H_ij exceeds a threshold.
- Take the top-k highest-H_ij neighbors among those above the threshold.
- If needed, pad to k neighbors with the closest nodes by d_ij.
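The three steps above can be sketched as follows; the function name and the threshold symbol `delta` are illustrative, not from the paper:

```python
import numpy as np

def candidate_lists(H, D, k, delta):
    """Per-node candidate lists: top-k neighbors by heatmap confidence
    (those with H[i,j] >= delta), padded with nearest neighbors by distance."""
    n = H.shape[0]
    cands = []
    for i in range(n):
        conf = H[i].copy()
        conf[i] = -np.inf  # exclude self-loop
        order = np.argsort(-conf)  # neighbors by descending confidence
        keep = [int(j) for j in order[:k] if conf[j] >= delta]
        if len(keep) < k:  # pad with closest nodes by distance
            for j in np.argsort(D[i]):
                if j != i and int(j) not in keep:
                    keep.append(int(j))
                if len(keep) == k:
                    break
        cands.append(np.array(keep[:k]))
    return cands
```

Restricting both ant sampling and local search to these lists is what brings the per-step cost from O(n) down to O(k).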
Pheromone Update:
After each batch of ant tours:
- Evaporation: τ_ij ← (1 − ρ) τ_ij for all edges.
- Reinforcement: if edge (i, j) is on the elite tour T*, τ_ij ← τ_ij + Δτ.
- Clamping: τ_ij ← min(max(τ_ij, τ_min), τ_max), where τ_min and τ_max bound the pheromone range (preventing stagnation), and ρ is the evaporation rate.
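One update round can be sketched as below; the deposit amount 1/L(T*) is the standard MMAS choice and is an assumption here, as is the in-place NumPy layout:

```python
import numpy as np

def mmas_update(tau, elite_tour, elite_len, rho, tau_min, tau_max):
    """One MMAS pheromone round: evaporate all edges, reinforce the elite
    tour's edges (assumed deposit 1/L(T*)), then clamp to [tau_min, tau_max]."""
    tau *= (1.0 - rho)           # evaporation on every edge
    deposit = 1.0 / elite_len    # assumed reinforcement amount
    for a, b in zip(elite_tour, np.roll(elite_tour, -1)):
        tau[a, b] += deposit
        tau[b, a] += deposit     # symmetric TSP: mirror the deposit
    np.clip(tau, tau_min, tau_max, out=tau)
    return tau
```

Clamping is what distinguishes MMAS from plain Ant System: bounding τ keeps low-confidence edges reachable, which is precisely the error-correction mechanism Section 3 relies on.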
3. Global Coordination and Correction of Local Errors
The heatmap serves as a soft prior—no edge is strictly forbidden, as even low-confidence options remain accessible. Tour feasibility constraints (degree, subtour) are enforced during sampling. Over multiple iterations, if a high-H_ij edge consistently leads to infeasible or suboptimal tours, reinforcement is withheld and pheromone levels for such edges decay, while effective ones are reinforced. This moderates local heatmap mis-rankings, correcting error cascades without resorting to intensive backtracking or search-tree expansion.
4. Post-Processing: 2-opt and 3-opt Local Search
Optional post-processing using 2-opt or 3-opt exchanges is undertaken on the candidate edge set to further refine constructed tours. In 2-opt, pairs of edges are considered for replacement if the exchange reduces total tour length, with iterative improvement halted when no further gains are found. 3-opt iteratively attempts more complex triple-edge improvements.
Because both routines operate on the candidate edge set, each 2-opt pass costs on the order of n·k edge evaluations, with 3-opt passes correspondingly more expensive. For n up to 10,000, these searches typically complete within seconds.
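A minimal first-improvement 2-opt pass is sketched below; for clarity it scans all edge pairs rather than restricting to the candidate set, and the function name is illustrative:

```python
import math

def two_opt_pass(tour, D):
    """One first-improvement 2-opt pass: replace edges (a,b) and (c,d)
    by (a,c) and (b,d) when that shortens the tour, reversing the
    intervening segment. Returns (tour, improved_flag)."""
    n = len(tour)
    for i in range(n - 2):
        a, b = tour[i], tour[i + 1]
        # avoid re-pairing the wrap-around edge with edge (tour[0], tour[1])
        for j in range(i + 2, n - (1 if i == 0 else 0)):
            c, d = tour[j], tour[(j + 1) % n]
            if D[a][c] + D[b][d] < D[a][b] + D[c][d] - 1e-12:
                tour[i + 1 : j + 1] = tour[i + 1 : j + 1][::-1]
                return tour, True
    return tour, False

# Uncross the tour 0-2-1-3 on the unit square
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(p, q) for q in pts] for p in pts]
tour, improved = [0, 2, 1, 3], True
while improved:
    tour, improved = two_opt_pass(tour, D)
```

On the unit-square example the crossing edges (0,2) and (1,3) are swapped out, and iterating to a fixed point yields the optimal length-4 tour.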
5. Experimental Results and Performance Benchmarks
HeatACO was evaluated on TSP500, TSP1K, and TSP10K datasets, with heatmaps derived from AttGCN [Fu et al.], DIMES (Ożański et al., 2022), UTSP (Erceg et al., 2023), and DIFUSCO (Troulé et al., 2023). Baselines include NAR + Greedy merge (fast but brittle), published parallel MCTS combined with k-opt (Pan et al., 2024), and vanilla MMAS.
Key empirical outcomes for fixed heatmaps with 2-opt post-processing, using a batch of ants per iteration over 5000 iterations:
| Dataset | Gap (%) | CPU Time |
|---|---|---|
| TSP500 | 0.11 | ≈ 2 s |
| TSP1K | 0.23 | ≈ 5 s |
| TSP10K | 1.15 | ≈ 1 m |
Further tightening with 3-opt achieved sub-0.01% gaps on TSP500, approximately 0.05% on TSP1K (tens of seconds), and approximately 0.4% on TSP10K (≈4 m). Greedy merge delivered significantly inferior results (gaps >10–40%), while MCTS/k-opt achieved gaps of 1–4% with much higher CPU times (50 s–16 m).
6. Heatmap Reliability and Distribution Shift Effects
Sparse candidate sets with near-perfect recall are attainable by thresholding the heatmap. However, most candidates reside in low-confidence regions, complicating decoding since true tour edges concentrate in a mid-to-high confidence band. Under distribution shift (e.g., TSPLIB circuits, drilling instances), candidate set sizes inflate (more candidate edges per node) and heatmap confidence can collapse, leading to degraded performance for greedy approaches. HeatACO remained robust, maintaining sub-1% gaps in seconds and matching or surpassing parallel MCTS at substantially reduced CPU burden.
Auxiliary diagnostics such as binary cross-entropy (CE) and class-weighted CE (WCE) of the heatmap H relative to the reference tour correlated with decoding difficulty, but did not fully predict performance.
7. Hyperparameterization and Practical Considerations
The heatmap exponent γ sharply modulates the influence of the neural prior, and a small sweep over γ values is empirically sufficient. Larger γ sharpens the prior and can accelerate convergence but risks overcommitting to misranked edges or suffering from poor calibration, especially under aggressive local search. Smaller γ promotes exploration when H is noisy. An entropy-based, label-free heuristic can also automate γ selection by targeting the effective support size of the heatmap-only proposal per node.
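The entropy-based heuristic can be sketched as follows: measure the per-node perplexity (exp of entropy) of the heatmap-only proposal q(j|i) ∝ (H_ij + ε)^γ, then pick the γ whose mean perplexity is closest to a desired candidate-set size. The target value and grid are illustrative assumptions:

```python
import numpy as np

def effective_support(H, gamma, eps=1e-3):
    """Per-node perplexity exp(entropy) of q(j|i) ∝ (H[i,j] + eps)**gamma,
    i.e. the 'effective number' of candidates each node spreads mass over."""
    w = (H + eps) ** gamma
    np.fill_diagonal(w, 0.0)                      # no self-loops
    q = w / w.sum(axis=1, keepdims=True)
    ent = -np.sum(np.where(q > 0, q * np.log(q), 0.0), axis=1)
    return np.exp(ent)

def pick_gamma(H, target=5.0, grid=(0.5, 1, 2, 4, 8)):
    """Choose gamma whose mean effective support is closest to `target`."""
    sizes = [effective_support(H, g).mean() for g in grid]
    return grid[int(np.argmin([abs(s - target) for s in sizes]))]
```

Raising γ tempers the proposal toward its mode, so the effective support shrinks monotonically; this is why larger γ sharpens the prior while smaller γ keeps exploration broad.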
Parameter settings, full reproducibility instructions, and source code are available at https://github.com/bochenglin/HEATACO (Lin et al., 26 Jan 2026).