Compositional Energy Minimization

Updated 27 October 2025
  • Compositional energy minimization is a framework that decomposes complex reasoning tasks into modular, energy-based subproblems to enable scalable solution finding.
  • It constructs a global energy landscape by summing locally trained energy functions and applies gradient-based and particle-based optimization methods.
  • The approach demonstrates robust generalization across CSPs and reasoning tasks by seamlessly integrating additional constraints during inference.

Compositional energy minimization refers to a paradigm in which complex problems are decomposed into simpler subproblems, with each subproblem admitting its own energy-based model (EBM) or energy function. Solutions to the overall problem are obtained by constructing a global energy landscape as the composition—typically a sum—of the subproblem energy functions and then performing joint minimization. This approach addresses the challenge of generalization in reasoning tasks, enabling models to solve problems of greater complexity than those encountered during training by leveraging modularity and the structure of the underlying solution space (Oarga et al., 23 Oct 2025).

1. Foundations: Compositional Subproblem Decomposition and Energy Landscapes

Compositional energy minimization is characterized by learning energy functions over the solution spaces of tractable subproblems. In this framework, a complex reasoning task or combinatorial constraint satisfaction problem (CSP) is decomposed, typically via problem-specific structure, into smaller subproblems (e.g., rows in N-Queens, clauses in 3-SAT, edges in graph coloring). Each subproblem is associated with its own energy function, often parameterized as E_θ^k(x_k, y_k), where x_k encodes the subproblem's local context and y_k its proposed solution.

A global energy landscape over a candidate solution y (potentially high-dimensional) is then constructed by aggregating these subproblem energies:

\hat{y} = \operatorname*{argmin}_y \sum_{k=1}^{N} E_\theta^k(x_k, y_k)

An energy landscape, in this context, is a mapping from candidate (potentially partial or complete) solutions to scalar energy values, where the goal of optimization is to find a solution minimizing total energy. Valid solutions—those that satisfy all constraints—are located at global minima of this landscape. During inference, gradient-based or particle-based methods are used to traverse this landscape, and the compositional architecture allows for plug-and-play addition of constraints by incorporating further energy terms.
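
As a concrete illustration of this compositional structure, the sketch below sums per-row and per-column energies over a relaxed 8x8 candidate assignment. The hand-crafted quadratic penalties are hypothetical stand-ins for the learned subproblem energies E_θ^k; they are not the paper's models.

```python
import torch

def row_energy(y_row):
    # Hypothetical subproblem energy (stand-in for a learned E_theta^k):
    # penalize a row whose entries do not sum to one.
    return (y_row.sum() - 1.0) ** 2

def col_energy(y_col):
    # Another hypothetical subproblem energy, one per column.
    return (y_col.sum() - 1.0) ** 2

def global_energy(y):
    # Global energy landscape: the sum of all subproblem energies.
    n = y.shape[0]
    row_terms = sum(row_energy(y[k]) for k in range(n))
    col_terms = sum(col_energy(y[:, j]) for j in range(n))
    return row_terms + col_terms

y = torch.rand(8, 8)            # candidate (relaxed) solution for an 8x8 board
print(global_energy(y).item())  # scalar energy; valid solutions lie at global minima
```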

2. Methodology: Global Energy Landscape Construction and Parallel Minimization

Instead of training a monolithic EBM end-to-end over the full input, the compositional approach trains energy functions individually on subproblems. At test time, these are composed to yield an energy function over the global solution, which may encode substantially more elaborate constraints or additional structure than were seen during training.

The composition typically uses additive aggregation:

E_{\text{global}}(x, y) = \sum_{k=1}^{N} E_\theta^k(x_k, y_k)

Optimization is carried out using a gradient-based update:

y^{t} = y^{t-1} - \lambda \, \nabla_y E_{\text{global}}(x, y^{t-1})

where the step size λ may vary and batch or stochastic variants may be used. Because the resulting energy landscape may be highly nonconvex, the paper proposes Parallel Energy Minimization (PEM): a particle-based sampling procedure in which a population of P solution candidates ("particles") evolves in parallel.
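
A minimal sketch of this plain gradient update using automatic differentiation is given below; the fixed step size, iteration count, and use of PyTorch autodiff are illustrative assumptions rather than the paper's implementation. PEM wraps this step inside the particle updates listed next.

```python
import torch

def gradient_minimize(energy_fn, y0, steps=100, lr=0.1):
    """Plain gradient descent on a composed energy (illustrative settings)."""
    y = y0.detach().clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_fn(y)                    # E_global(x, y^{t-1})
        grad, = torch.autograd.grad(energy, y)   # nabla_y E_global
        with torch.no_grad():
            y -= lr * grad                       # y^t = y^{t-1} - lambda * grad
    return y.detach()
```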

Each particle is updated by:

  • Resampling based on softmax-weighted energies (selecting low-energy candidates)
  • Injection of scheduled Gaussian noise to foster exploration (preventing premature convergence)
  • Gradient descent updates to lower energy further.

This mitigates the well-known challenge of local minima and enables broader coverage of the high-dimensional solution landscape.
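
The sketch below outlines one possible realization of the PEM loop under simplifying assumptions: energies are evaluated per particle, the noise schedule decays linearly, and resampling occurs at a fixed interval. These hyperparameters and the exact ordering of steps are illustrative choices, not the paper's specification.

```python
import torch

def parallel_energy_minimization(energy_fn, num_particles, shape,
                                 steps=200, lr=0.05, noise_init=0.5,
                                 resample_every=20):
    """Schematic PEM: a population of candidate solutions ("particles") is
    periodically resampled towards low-energy candidates, perturbed with
    scheduled Gaussian noise, and refined by gradient descent.
    All hyperparameters here are illustrative, not the paper's settings."""
    y = torch.rand(num_particles, *shape, requires_grad=True)
    for t in range(steps):
        # Resample particles in proportion to softmax(-E): low energy -> high weight.
        if t > 0 and t % resample_every == 0:
            with torch.no_grad():
                energies = torch.stack([energy_fn(p) for p in y])
                weights = torch.softmax(-energies, dim=0)
                idx = torch.multinomial(weights, num_particles, replacement=True)
            y = y.detach()[idx].requires_grad_(True)

        # Gradient step on the summed particle energies ...
        total_energy = torch.stack([energy_fn(p) for p in y]).sum()
        grad, = torch.autograd.grad(total_energy, y)
        noise_scale = noise_init * (1.0 - t / steps)    # simple linear decay
        with torch.no_grad():
            y -= lr * grad                              # ... lowers each particle's energy,
            y += noise_scale * torch.randn_like(y)      # ... noise fosters exploration.

    with torch.no_grad():
        energies = torch.stack([energy_fn(p) for p in y])
    return y.detach()[torch.argmin(energies)]           # lowest-energy particle
```

With the toy global_energy sketched earlier, calling parallel_energy_minimization(global_energy, num_particles=1024, shape=(8, 8)) would return the lowest-energy candidate found by the population.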

3. Incorporation of Additional Constraints and Generalization

A salient feature of compositional energy minimization is extensibility: new constraints can be directly incorporated during inference by adding new energy terms. Each subproblem (e.g., enforcing column or diagonal constraints in N-Queens, or cross-grid alignment in crosswords) instantiates its own energy, enabling the global minimization to adapt to more complex or application-specific scenarios without retraining the underlying subproblem EBMs.
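
A minimal sketch of this plug-and-play composition is given below, echoing the toy row energies used earlier; the diagonal term stands in for a hypothetical new constraint added only at inference.

```python
import torch

def compose(*energy_terms):
    """Compose any number of energy terms into one global energy by summation."""
    def global_energy(y):
        return sum(term(y) for term in energy_terms)
    return global_energy

# Energies trained (here: hand-crafted stand-ins) on simple subproblems ...
row_terms = [lambda y, k=k: (y[k].sum() - 1.0) ** 2 for k in range(8)]

# ... and an extra constraint introduced only at inference, without retraining.
def diagonal_term(y):
    return (y.diagonal().sum() - 1.0) ** 2    # hypothetical additional constraint

base_energy = compose(*row_terms)
extended_energy = compose(*row_terms, diagonal_term)   # plug-and-play extension

y = torch.rand(8, 8)
print(base_energy(y).item(), extended_energy(y).item())
```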

This modular structure also supports generalization: models trained on smaller or simpler instances (a row, a clause, etc.) can be composed and scaled up to solve larger instances (an entire board, a multi-clause SAT formula, or more complex grids) never seen during training. The approach relies crucially on the capacity of the energy functions to recognize valid partial solutions and assign them low energy regardless of context, and on the ability of the composition scheme to combine the constraints smoothly.

4. Empirical Evaluation and Comparative Performance

Evaluation is performed across a spectrum of CSPs and reasoning tasks, including:

  • N-Queens (placement of non-attacking queens)
  • 3-SAT (Boolean satisfiability)
  • Graph coloring (color assignment avoiding conflicts)
  • Crossword puzzle completion (combining semantic and grid constraints)

The methodology involves training energy models for individual subproblems (e.g., one row in N-Queens, one clause in SAT) and, at test time, composing them to encode the constraints of the full problem instance. Baselines include reinforcement learning models, GFlowNets, combinatorial optimization via diffusion (DIFUSCO, Fast T2T), and neural SAT solvers such as NeuroSAT and NSNet.

The compositional PEM approach yields markedly improved results across all considered domains. For instance, on the 8-Queens problem, it achieves a 97% valid-solution rate when sampling with P = 1024 particles, substantially higher than alternative methods. In 3-SAT, it produces more complete satisfying assignments than neural or diffusion-based benchmarks. Similar improvements are observed in graph coloring (lower edge conflicts) and crosswords (completion rate), with ablation studies confirming that both compositionality and particle-based sampling are instrumental for robust generalization.

| Task | Compositional PEM | Best Baseline |
|---|---|---|
| 8-Queens valid-solution rate | 97% | Significantly lower |
| 3-SAT complete satisfying assignments | Highest among methods | Lower |
| Graph coloring (edge conflicts) | Fewest | More |
| Crossword grid completion | Competitive with ToT | Comparable or lower |

Ablation studies further show that PEM's resampling and noise injection are essential for escaping local minima.

5. Mathematical Formulation and Training Objectives

The compositional EBM is trained with a diffusion-inspired loss:

\mathcal{L}_{\mathrm{MSE}}(\theta) = \mathbb{E}_{y, \epsilon \sim \mathcal{N}(0, I)} \left[\left\| \epsilon + \sigma_t \nabla_y E_{\theta}(y^*, t) \right\|^2 \right]

with

y^* = \sqrt{1 - \sigma_t}\, y + \sigma_t \epsilon

A contrastive term is included, comparing positive and negative samples to sharpen the landscape and facilitate energy shaping around true solutions.
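
A schematic sketch of this denoising-style objective is given below; the energy network's (y, t) signature, the scalar noise level, and the omission of the contrastive term are simplifying assumptions rather than the paper's exact training setup.

```python
import torch

def diffusion_mse_loss(energy_net, y, sigma_t, t):
    """Denoising-style objective: the energy gradient at the noised solution y*
    should cancel the injected noise (schematic; contrastive term omitted)."""
    eps = torch.randn_like(y)
    y_star = ((1.0 - sigma_t) ** 0.5 * y + sigma_t * eps).detach().requires_grad_(True)
    energy = energy_net(y_star, t)                      # hypothetical E_theta(y*, t)
    grad, = torch.autograd.grad(energy.sum(), y_star, create_graph=True)
    return ((eps + sigma_t * grad) ** 2).mean()         # || eps + sigma_t * grad E ||^2
```

Because the gradient is taken with create_graph=True, backpropagating this loss updates the parameters of energy_net so that -∇_y E_θ points from noised candidates back towards valid solutions.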

At inference, the sum-based composition arises naturally:

\hat{y} = \operatorname*{argmin}_y \sum_{k=1}^{N} E_\theta^k(x_k, y_k)

Parallel Energy Minimization is formalized by particle-based updates coupled with noise and periodic resampling informed by softmax(-E_θ), promoting effective exploration.

6. Implications, Advantages, and Future Research Directions

The compositional energy minimization framework supports modularity, interpretability, and extensibility. By breaking global reasoning into localized, tractable modeling, it enables transfer to larger and harder instances, flexible addition of constraints, and robust generalization performance outside the training regime.

Open research directions include refinement of energy landscape shaping (for improved extrapolation to unseen solutions), exploration of non-Gaussian sampling or alternative noise schedules, integration with neural-symbolic and other structured reasoning systems, and application to broader domains such as transfer learning, adaptive CSPs, and general-purpose reasoning.

The approach also highlights the contemporary viability of energy-based models—historically challenging to train for large-scale structured prediction—for combinatorial generalization, provided suitable compositional design and inference schemes are employed. Further work on sampling, optimization, and energy composition mechanisms is suggested for scaling to even more intricate task families.

7. Summary

Compositional energy minimization, as formalized in (Oarga et al., 23 Oct 2025), advances the state of reasoning generalization by leveraging modular energy functions trained on tractable subproblems and composing them at inference to address more complex, constraint-rich tasks. Through parallel energy minimization and robust training losses, this framework demonstrates not only superior generalization but also practical extensibility to new constraints and task scales, thereby contributing an adaptable architecture for structured reasoning and combinatorial tasks in machine learning.
