Relaxed One-Hot Optimization
- Relaxed one-hot optimization is a method that converts discrete one-hot constraints into continuous, differentiable surrogates, making gradient-based optimization feasible.
- It employs entry-wise concave proxy functions and deterministic rounding to ensure that the relaxed solutions approximate the optimal combinatorial outcomes.
- Empirical results demonstrate that this approach outperforms techniques like Gumbel-Softmax and reinforcement learning in applications such as node matching, FPGA resource balancing, and approximate computing.
Relaxed one-hot optimization is a methodology for transforming hard combinatorial problems involving discrete one-hot or binary selections into a form amenable to gradient-based optimization and end-to-end machine learning. Instead of directly optimizing over discrete indicator vectors, which is generally NP-hard and non-differentiable, this approach introduces continuous surrogates (typically probability vectors on simplices or points in interval boxes) and applies principled relaxation-plus-rounding schemes. This enables the application of supervised or unsupervised learning, backpropagation, and modern neural network function classes to otherwise intractable discrete decision spaces.
1. Discrete Formulation and Continuous Relaxation
The canonical setting involves a decision variable constrained to be one-hot: $x \in \{0,1\}^n$ with $\sum_{i=1}^{n} x_i = 1$. The original combinatorial objective $f(x; C)$ (for configuration $C$) and feasibility constraint $g(x; C)$ are defined over these discrete variables. To permit gradient-based optimization, the discrete space is relaxed to the $(n-1)$-simplex:
$$\Delta^{n-1} = \Big\{ \bar{x} \in [0,1]^n : \sum_{i=1}^{n} \bar{x}_i = 1 \Big\},$$
so that each coordinate $\bar{x}_i$ represents the “soft” assignment probability. For more general combinatorial decisions, box relaxations $\bar{x} \in [0,1]^n$ are used. The mapping from configuration $C$ to the relaxed solution $\bar{x}$ is typically parameterized by a neural network $\mathcal{A}_\theta$, with parameters $\theta$.
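As a minimal sketch of the relaxation step, the snippet below maps unconstrained scores to a point on the simplex with a softmax and compares it with the one-hot vector it approximates. The scores stand in for the output of a network; their values and the helper names are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Map real-valued scores to a point on the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, n):
    """Discrete indicator vector with a single 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(n)]

logits = [2.0, 0.5, -1.0]   # hypothetical network outputs
x_soft = softmax(logits)    # relaxed ("soft") assignment on the simplex
best = max(range(len(x_soft)), key=x_soft.__getitem__)
x_hard = one_hot(best, len(x_soft))  # the discrete point it is closest to
```

The soft vector is differentiable in the scores, while the one-hot vector is not, which is the reason the relaxation is introduced.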
2. Relaxed Objective and Entry-Wise Concavity
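Relaxed objective functions $f_r$ and $g_r$ are constructed to match the original $f$ and $g$ at all discrete points $x \in \{0,1\}^n$. The relaxed optimization problem is
$$\min_{\theta} \; \ell_r(\theta; C) = f_r(\bar{x}; C) + \beta \, g_r(\bar{x}; C),$$
where $\bar{x} = \mathcal{A}_\theta(C)$ and $\beta > 0$ is a penalty coefficient.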
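The penalized objective can be illustrated on a toy problem. The sketch below assumes a hypothetical task (select at least one of three items at minimum cost), a linear relaxed cost, and a multilinear relaxed constraint that equals 1 exactly when nothing is selected. None of these choices come from the source; they only show how the two terms combine.

```python
COSTS = [3.0, 1.0, 2.0]  # hypothetical per-item costs
BETA = 10.0              # penalty coefficient (assumed larger than any total cost)

def f_r(x):
    """Relaxed cost: agrees with the discrete cost at 0/1 points."""
    return sum(c * xi for c, xi in zip(COSTS, x))

def g_r(x):
    """Relaxed constraint: product of (1 - x_i).

    At 0/1 points it is 1 when no item is selected (infeasible) and 0 otherwise.
    """
    p = 1.0
    for xi in x:
        p *= 1.0 - xi
    return p

def relaxed_loss(x, beta=BETA):
    """Penalized relaxed objective f_r + beta * g_r."""
    return f_r(x) + beta * g_r(x)

loss_soft = relaxed_loss([0.1, 0.8, 0.1])        # a soft point
loss_feasible = relaxed_loss([0.0, 1.0, 0.0])    # feasible discrete point
loss_infeasible = relaxed_loss([0.0, 0.0, 0.0])  # infeasible discrete point
```

At the infeasible point the loss is at least the penalty coefficient, so a soft solution whose loss falls below that value carries useful information about feasibility.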
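A key requirement on the relaxed proxies is entry-wise concavity: for any $\bar{x}$, $\bar{x}'$ differing only in coordinate $i$, and $\gamma \in [0,1]$,
$$\gamma \, h(\bar{x}) + (1-\gamma) \, h(\bar{x}') \le h\big(\gamma \bar{x} + (1-\gamma) \bar{x}'\big)$$
for $h \in \{f_r, g_r\}$. Entry-wise concavity is a critical structural assumption enabling deterministic rounding guarantees and is strictly weaker than full concavity (Wang et al., 2022).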
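To see why entry-wise concavity is weaker than full concavity, consider the bilinear function $h(x) = x_1 x_2$, chosen here purely as an illustration. It is affine in each coordinate separately, so the inequality holds (with equality) whenever two points differ in one coordinate, yet it fails along the diagonal. The sketch below checks both cases numerically.

```python
def h(x):
    """Bilinear toy proxy: affine in each coordinate separately."""
    return x[0] * x[1]

def concavity_gap(func, a, b, gamma=0.5):
    """func(mixture) minus the mixture of func values.

    The concavity inequality holds for the pair (a, b) when this is >= 0.
    """
    mix = [gamma * ai + (1 - gamma) * bi for ai, bi in zip(a, b)]
    return func(mix) - (gamma * func(a) + (1 - gamma) * func(b))

# Points differing only in coordinate 0: the inequality holds (with equality).
entrywise_gap = concavity_gap(h, [0.0, 0.7], [1.0, 0.7])

# Points differing in both coordinates: the inequality fails, so h is not concave.
joint_gap = concavity_gap(h, [0.0, 0.0], [1.0, 1.0])
```

Here `joint_gap` is negative while `entrywise_gap` is zero, so the function satisfies the entry-wise condition without being concave.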
3. Deterministic Rounding and Performance Guarantees
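After minimizing the relaxed surrogate to produce a “soft” solution $\bar{x}$ with low loss $\ell_r(\theta; C)$, a deterministic rounding procedure yields a feasible discrete $x$. Rounding proceeds coordinate-wise:
- Fix previously rounded coordinates.
- For coordinate $i$, minimize the penalized combination of $f_r$ and $g_r$ over $x_i \in \{0,1\}$, keeping other coordinates soft:
$$x_i = \arg\min_{j \in \{0,1\}} \; f_r(x_1, \dots, x_{i-1}, j, \bar{x}_{i+1}, \dots, \bar{x}_n; C) + \beta \, g_r(x_1, \dots, x_{i-1}, j, \bar{x}_{i+1}, \dots, \bar{x}_n; C)$$
- Repeat for all $i = 1, \dots, n$.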
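Provided $f \ge 0$ and $g$ is normalized (with $g(x; C) = 0$ for feasible $x$ and $g(x; C) \ge 1$ otherwise), $f_r$ and $g_r$ are entry-wise concave, and $\ell_r(\theta; C) < \beta$, this rounding algorithm produces a feasible discrete solution $x$ with $f(x; C) \le \ell_r(\theta; C)$ [(Wang et al., 2022), Theorem 3.3]. This result supplies a deterministic quality guarantee not available to sampling-based relaxations.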
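The coordinate-wise rounding described above can be sketched as follows. The toy problem (select at least one of three items at minimum cost), the cost values, and the penalty coefficient are assumptions for illustration; the relaxed cost is linear and the relaxed constraint is multilinear, so both are entry-wise affine and therefore entry-wise concave.

```python
COSTS = [3.0, 1.0, 2.0]  # hypothetical per-item costs
BETA = 10.0              # hypothetical penalty coefficient

def f_r(x):
    return sum(c * xi for c, xi in zip(COSTS, x))

def g_r(x):
    p = 1.0
    for xi in x:
        p *= 1.0 - xi  # equals 1 at a 0/1 point only when nothing is selected
    return p

def loss(x):
    return f_r(x) + BETA * g_r(x)

def round_coordinatewise(x_soft, loss_fn):
    """Round one coordinate at a time to whichever of 0/1 gives the lower loss.

    Earlier coordinates stay fixed at their rounded values and later ones stay soft.
    """
    x = list(x_soft)
    for i in range(len(x)):
        candidates = []
        for value in (0.0, 1.0):
            trial = x[:i] + [value] + x[i + 1:]
            candidates.append((loss_fn(trial), value))
        x[i] = min(candidates)[1]
    return x

x_soft = [0.2, 0.7, 0.3]  # hypothetical soft solution, loss 3.58 < BETA
x_hard = round_coordinatewise(x_soft, loss)
```

Because the loss is entry-wise concave, each rounding step can only keep or lower it, so the discrete result here is feasible and costs no more than the relaxed loss of the soft input.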
4. Neural Parametrization and End-to-End Optimization
The mapping $\mathcal{A}_\theta$ and the proxies $f_r$, $g_r$ are parameterized as graph neural networks (GNNs) or multilayer perceptrons (MLPs). For graph-structured inputs, a GNN encodes nodes or edges, followed by MLP “heads” that map to $\bar{x} \in [0,1]^n$ via elementwise sigmoid or softmax activations. Proxies $f_r$ and $g_r$ are constructed as follows:
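- Affine proxy (AFF): $f_r(\bar{x}; C) = \langle w, \phi(\bar{x}; C) \rangle$ with $\phi$ containing affine and low-degree monomial features in $\bar{x}$.
- Entry-wise concave proxy (CON): $f_r(\bar{x}; C) = \langle w, -\mathrm{ReLU}(\phi(\bar{x}; C)) \rangle + b$ with $w \ge 0$, which is entry-wise concave by construction.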
The resulting loss $\ell_r(\theta; C)$ is fully differentiable in both neural parameters $\theta$ and proxy weights, enabling training via backpropagation and Adam. At test time, inference consists of a single neural forward pass followed by rounding to produce the output.
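One way to obtain a function that is entry-wise concave by construction, assumed here for illustration rather than taken from the source, is a non-negative combination of negated ReLUs applied to features that are affine in each coordinate. The features, weights, and bias below are hypothetical stand-ins for learned quantities.

```python
FEATURES = [
    lambda x: x[0] + x[1] - 1.0,   # affine feature
    lambda x: x[0] * x[1] - 0.25,  # bilinear feature: affine in each coordinate
]
WEIGHTS = [1.0, 2.0]  # must be non-negative to preserve concavity
BIAS = 0.5

def con_proxy(x):
    """Bias plus a non-negative combination of -ReLU(feature)."""
    return BIAS + sum(w * -max(0.0, phi(x)) for w, phi in zip(WEIGHTS, FEATURES))

def midpoint_gap(func, p, q):
    """func(midpoint) minus the average of func(p) and func(q); >= 0 if concave on (p, q)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return func(mid) - 0.5 * (func(p) + func(q))

# Spot check: two points that differ only in coordinate 0.
gap = midpoint_gap(con_proxy, [0.0, 0.8], [1.0, 0.8])
```

Each feature is affine in a single coordinate, a ReLU of an affine function is convex, and negating it and scaling by a non-negative weight gives a concave term, so the sum is concave in every coordinate taken separately.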
5. Comparison with Gumbel-Softmax and RL Baselines
Alternative relaxation approaches include the Gumbel-Softmax trick (“GS-Trick”) and actor-critic RL optimization. In the Gumbel-Softmax method, the softmax output is made approximately discrete by adding sampled Gumbel noise and applying a temperature annealing schedule, but the resulting samples are only stochastically close to being one-hot, and no deterministic guarantee exists for the integral solution quality. RL-based optimization (e.g., policy gradients, actor-critic) suffers from higher sample complexity and lack of guarantees.
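For comparison, a Gumbel-Softmax sample can be sketched in a few lines. This is a generic standard-library version of the trick, not the baseline implementation used in the cited experiments; the scores and temperatures are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    return softmax(noisy)

logits = [1.0, 0.2, -0.5]  # hypothetical scores
# The same seed gives the same noise, so only the temperature differs.
warm = gumbel_softmax_sample(logits, temperature=1.0, rng=random.Random(0))
cold = gumbel_softmax_sample(logits, temperature=0.1, rng=random.Random(0))
```

Lowering the temperature pushes the sample toward a vertex of the simplex, but the output still depends on the random noise and remains strictly inside the simplex, which is why no deterministic statement about the rounded solution follows from it.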
Empirical results across benchmarks show that relaxations with entry-wise concave proxies (CON) or affine proxies (AFF), combined with coordinate-wise deterministic rounding, outperform GS-Trick, RL, and naive relaxation in solution quality and convergence speed. For example, in node matching, the final cost of AFF (418.96) is close to the combinatorial optimum (416.01) and lower than that of Gumbel-Softmax (429.39) and RL (426.97). In FPGA resource balancing, CON achieves the best average rank (2.35, versus 2.87 for GS-Tr+R and 3.16 for RL) and requires 7 GB of GPU memory versus 22 GB for RL (Wang et al., 2022).
6. Applications and Empirical Results
Relaxed one-hot optimization has been applied to diverse unsupervised combinatorial tasks:
- Feature-based node matching: Grid graphs where edge costs depend on embedded node features (e.g., MNIST digit product). Entry-wise concave relaxations achieve near-optimal matchings.
- FPGA resource balancing: Resource allocation in large data-flow graphs for digital circuit design, using GNN-based proxies for actual resource usage synthesized offline. Entry-wise concave proxy achieves best rank and solution quality.
- Approximate computing unit assignment: Task assignment to approximate hardware with hard constraints (a prescribed number of nodes), minimizing expected computation error. AFF/CON relaxations find solutions within 10–15% of optimal, beating GS-Trick, RL, and naive relaxation baselines.
A summary table of selected empirical metrics (from Wang et al., 2022; lower values are better for every metric shown):
| Benchmark | Method | Solution Quality |
|---|---|---|
| Node Matching (MNIST grid) | AFF / CON | 418.96 / 422.47 |
| | RL | 426.97 |
| | GS-Tr+R | 429.39 |
| | Optimal | 416.01 |
| FPGA Resource Balancing | CON | Avg. rank 2.35 |
| | RL | Avg. rank 3.16 |
| | GS-Tr+R | Avg. rank 2.87 |
| Approximate Computing Assign | AFF / CON | 3.10–10.04 / 3.18–10.17 |
| | RL | 7.68–12.83 |
| | GS-Tr+R | 3.24–10.62 |
| | Optimal | 2.77–8.56 |
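The node matching costs reported above can be turned into relative optimality gaps with a short calculation. The snippet uses only the figures quoted in this article; the gap formula (excess cost divided by the optimal cost) is a conventional choice assumed here, not one stated by the source.

```python
OPTIMAL = 416.01  # combinatorial optimum for node matching, as quoted above

costs = {
    "AFF": 418.96,
    "CON": 422.47,
    "RL": 426.97,
    "GS-Tr+R": 429.39,
}

# Relative gap to the optimum, in percent.
gaps = {method: 100.0 * (cost - OPTIMAL) / OPTIMAL for method, cost in costs.items()}

for method, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{method:8s} {gap:.2f}%")
```

Under this definition AFF is within about 0.7% of the optimum and CON within about 1.6%, while RL and GS-Tr+R are roughly 2.6% and 3.2% above it.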
7. Significance, Limitations, and Extensions
Relaxed one-hot optimization enables tractable unsupervised or differentiable learning for combinatorial decision problems where objectives and constraints may not be explicitly known or are evaluated via expensive black-box processes. The deterministic guarantee afforded by entry-wise concavity of proxies represents a significant theoretical advance, generalizing prior frameworks based on the probabilistic method.
A plausible implication is that this methodology will broaden the applicability of neural combinatorial optimization to domains previously limited by sample complexity or lack of deterministic guarantees. Nevertheless, the construction of suitable entry-wise concave proxy functions and the calibration of rounding procedures require problem-specific insight. Future research may investigate automated proxy design and broader classes of combinatorial constraints (Wang et al., 2022).