Re-evaluating LLM-based Heuristic Search: A Case Study on the 3D Packing Problem (2509.02297v1)

Published 2 Sep 2025 in cs.AI

Abstract: The art of heuristic design has traditionally been a human pursuit. While LLMs can generate code for search heuristics, their application has largely been confined to adjusting simple functions within human-crafted frameworks, leaving their capacity for broader innovation an open question. To investigate this, we tasked an LLM with building a complete solver for the constrained 3D Packing Problem. Direct code generation quickly proved fragile, prompting us to introduce two supports: constraint scaffolding--prewritten constraint-checking code--and iterative self-correction--additional refinement cycles to repair bugs and produce a viable initial population. Notably, even within a vast search space in a greedy process, the LLM concentrated its efforts almost exclusively on refining the scoring function. This suggests that the emphasis on scoring functions in prior work may reflect not a principled strategy, but rather a natural limitation of LLM capabilities. The resulting heuristic was comparable to a human-designed greedy algorithm, and when its scoring function was integrated into a human-crafted metaheuristic, its performance rivaled established solvers, though its effectiveness waned as constraints tightened. Our findings highlight two major barriers to automated heuristic design with current LLMs: the engineering required to mitigate their fragility in complex reasoning tasks, and the influence of pretrained biases, which can prematurely narrow the search for novel solutions.

Summary

The paper demonstrates that integrating constraint scaffolding and iterative self-correction enables LLMs to generate effective scoring functions for the 3D Packing Problem.
It shows that the evolutionary process transitions from simple heuristics to refined scoring functions, though it remains confined to modular mathematical expressions.
The study highlights significant limitations in LLM autonomy, noting the need for manual interventions and the model's bias toward component-level optimization.

Introduction

This paper investigates the capabilities and limitations of LLMs for automated heuristic design, using the constrained 3D Packing Problem as a case study. Unlike prior work, which uses LLMs for component-level optimization within established algorithmic frameworks, it probes the feasibility of end-to-end solver generation in a "knowledge-poor" domain characterized by three-dimensional geometry and complex constraints. The authors identify critical barriers to LLM-driven heuristic search, propose engineering interventions to mitigate them, and empirically evaluate the resulting heuristics against state-of-the-art human-designed methods.

Problem Formulation and Methodological Framework

The paper addresses the Input Minimization variant of the 3D Packing Problem: pack a given set of items into the minimum number of containers, subject to geometric and real-world constraints such as item incompatibility and vertical stability. The solution space is combinatorially large: each item must be assigned a container, an orientation, and continuous placement coordinates, while satisfying non-overlap and support constraints.
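To make the geometric constraints concrete, here is a minimal Python sketch of the checks a placement has to pass. The `Box` type, the function names, and the definition of support as "fraction of the base area resting on the floor or on the top face of another item" are illustrative assumptions, not the paper's formulation. The example uses integer coordinates so that the exact height comparison in `support_ratio` is safe.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Hypothetical placed item: corner (x, y, z) and extents (dx, dy, dz)."""
    x: float
    y: float
    z: float
    dx: float
    dy: float
    dz: float


def orientations(dims):
    """Distinct axis-aligned rotations of an item with dimensions dims."""
    return sorted(set(permutations(dims)))


def inside(box, container):
    """True if the box lies fully inside a container of size (cx, cy, cz)."""
    cx, cy, cz = container
    return (box.x >= 0 and box.y >= 0 and box.z >= 0
            and box.x + box.dx <= cx
            and box.y + box.dy <= cy
            and box.z + box.dz <= cz)


def overlaps(a, b):
    """True if the interiors of a and b intersect (touching faces are allowed)."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx
            and a.y < b.y + b.dy and b.y < a.y + a.dy
            and a.z < b.z + b.dz and b.z < a.z + a.dz)


def support_ratio(box, placed):
    """Fraction of the box's base area supported from below (assumed definition)."""
    if box.z == 0:
        return 1.0  # resting on the container floor
    supported = 0.0
    for p in placed:
        if p.z + p.dz == box.z:  # top face of p is level with the base of box
            ox = min(box.x + box.dx, p.x + p.dx) - max(box.x, p.x)
            oy = min(box.y + box.dy, p.y + p.dy) - max(box.y, p.y)
            if ox > 0 and oy > 0:
                supported += ox * oy
    return supported / (box.dx * box.dy)
```

A vertical stability rule could then be expressed as a threshold on `support_ratio`; the threshold itself would be a modeling choice.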

The Evolution of Heuristics (EoH) framework is employed as the search paradigm. EoH iteratively evolves a population of candidate heuristics, represented as (thought, code) pairs, using an LLM as a mutation operator. Five prompt strategies (exploration and modification operators) guide the generation of new heuristics, which are evaluated for fitness and feasibility.
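The loop described above can be sketched roughly as follows. This is a simplified illustration, not the EoH implementation: the LLM call is abstracted as a `propose` callable, the operator labels are placeholders for the five prompt strategies, and the parent selection and survivor rules (two random parents, keep the best `pop_size`) are assumptions.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A heuristic as a (thought, code) pair plus its measured fitness."""
    thought: str
    code: str
    fitness: Optional[float] = None  # lower is better; None means infeasible


# Placeholder labels for the five prompt strategies (assumed names).
OPERATORS = ("E1", "E2", "M1", "M2", "M3")


def evolve(initial, propose, evaluate, generations, pop_size, rng):
    """Evolve a population; propose(operator, parents) stands in for the LLM."""
    population = []
    for cand in initial:
        cand.fitness = evaluate(cand)
        if cand.fitness is not None:
            population.append(cand)
    for _ in range(generations):
        children = []
        for op in OPERATORS:
            parents = rng.sample(population, k=min(2, len(population)))
            child = propose(op, parents)
            child.fitness = evaluate(child)
            if child.fitness is not None:  # discard infeasible candidates
                children.append(child)
        # Keep only the best pop_size candidates for the next generation.
        population = sorted(population + children, key=lambda c: c.fitness)
        population = population[:pop_size]
    return population
```

In a real run, `evaluate` would execute the candidate's code on training instances and return something like the number of containers used, and `propose` would format the parents' thoughts and code into a prompt for the chosen operator.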

Engineering Interventions: Constraint Scaffolding and Iterative Self-Correction

Direct application of EoH to the 3D Packing Problem proved infeasible due to the fragility of LLM-generated code. The authors document a high incidence of logical errors, constraint violations, and computational inefficiency, which severely impede population initialization and evolutionary progress.

To address these challenges, two interventions are introduced:

Constraint Scaffolding: A verified API encapsulates all geometric and physical constraint checks, abstracting away low-level logic from the LLM. The LLM is tasked only with orchestrating calls to this API, focusing its generative capacity on high-level strategy.
Iterative Self-Correction: Diagnostic feedback from failed candidate programs (syntax errors, constraint violations, timeouts) is appended to the prompt, enabling the LLM to repair faults over multiple refinement cycles. This process systematically increases the proportion of feasible and efficient heuristics in the population.
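A minimal sketch of the self-correction cycle, under stated assumptions: the candidate is expected to define a function `score(item)`, the LLM is abstracted as a `generate(prompt)` callable, and the feedback wording is invented for illustration. Timeout handling and sandboxing, which a real system would need before executing generated code, are omitted.

```python
def diagnose(code, sample_item):
    """Return None if the candidate loads and runs, else a diagnostic string."""
    namespace = {}
    try:
        # NOTE: exec on untrusted code is unsafe outside a sandbox.
        exec(compile(code, "<candidate>", "exec"), namespace)
    except SyntaxError as err:
        return f"syntax error: {err.msg} (line {err.lineno})"
    except Exception as err:
        return f"load error: {err!r}"
    fn = namespace.get("score")
    if not callable(fn):
        return "missing function: score(item)"
    try:
        value = fn(sample_item)
    except Exception as err:
        return f"runtime error: {err!r}"
    if not isinstance(value, (int, float)):
        return f"score must return a number, got {type(value).__name__}"
    return None


def self_correct(generate, base_prompt, sample_item, max_rounds=3):
    """Regenerate with diagnostics appended until the candidate passes."""
    prompt = base_prompt
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)
        error = diagnose(code, sample_item)
        if error is None:
            return code, round_no
        prompt = f"{prompt}\n\nPrevious attempt failed: {error}\nPlease fix it."
    return None, max_rounds
```

In the scaffolded setting, feasibility checks such as non-overlap and support would live in the verified API rather than in the candidate, so `diagnose` only needs to exercise the candidate's strategy code.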

Empirical validation demonstrates that constraint scaffolding dramatically reduces code errors, while iterative self-correction further increases the success rate by resolving inefficiencies and residual logical faults.

Heuristic Discovery and Evolutionary Dynamics

With the interventions in place, the evolutionary process successfully discovers functional packing heuristics. Analysis of the evolutionary trajectory reveals that the LLM's optimization is almost exclusively concentrated on the scoring function used to guide item selection and placement, rather than inventing novel algorithmic structures.

Figure 1: The evolutionary trajectory for one run, plotting the fitness of the best-performing heuristic in each generation evaluated on training datasets.

The population evolves from simple "largest-item-first" heuristics to more sophisticated scoring functions that incorporate volume utilization, item quantity, cubeness, adjacency, and placement efficiency. However, the search space remains narrowly focused on modular mathematical expressions, with no emergence of advanced procedural logic such as wall-building or layer-based packing.
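To illustrate the kind of modular expression the search converges on, here is a hypothetical scoring function next to a "largest-item-first" baseline. The feature definitions (cubeness as the ratio of shortest to longest side, a penalty for high placements) and the weights are assumptions made for this sketch; they are not the evolved function from the paper.

```python
def volume(dims):
    """Volume of an item with dimensions (l, w, h)."""
    return dims[0] * dims[1] * dims[2]


def cubeness(dims):
    """1.0 for a cube, approaching 0 for long thin items (assumed definition)."""
    return min(dims) / max(dims)


def largest_item_first(dims, position, container):
    """Baseline: prefer the item with the largest volume."""
    return volume(dims)


def weighted_score(dims, position, container, weights=(1.0, 0.5, 0.25)):
    """Hypothetical refined score: volume share, cubeness, and low placement."""
    w_vol, w_cube, w_low = weights
    volume_share = volume(dims) / volume(container)
    height_penalty = position[2] / container[2]
    return w_vol * volume_share + w_cube * cubeness(dims) - w_low * height_penalty


def pick_best(candidates, container, score=weighted_score):
    """Greedy step: choose the (dims, position) pair with the highest score."""
    return max(candidates, key=lambda c: score(c[0], c[1], container))
```

Swapping `score` in `pick_best` changes which placement the greedy step prefers while the surrounding procedure stays the same, which is the sense in which the search remained at the component level.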

Performance Evaluation and Component Transplantation

The best LLM-discovered heuristic, operating within a greedy framework, achieves performance comparable to human-designed greedy algorithms (e.g., S-GRASP), but lags behind advanced metaheuristics and exact solvers. Notably, the scoring function itself is identified as a high-quality component. When transplanted into a two-stage metaheuristic framework (Randomized Search + Set Partitioning), the LLM-generated scoring function enables near-optimal performance, rivaling state-of-the-art methods on unconstrained benchmarks.
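The second stage of such a two-stage scheme can be illustrated with a small sketch. It assumes that the randomized search has already produced a pool of feasible single-container loadings ("patterns"), each a set of unique item identifiers, and then selects the fewest patterns that cover every item exactly once. The brute-force enumeration here is only for illustration; a practical solver would more likely pose this as an integer program.

```python
from itertools import combinations


def min_set_partition(items, patterns):
    """Fewest patterns covering every item exactly once, or None if impossible.

    items: iterable of unique item ids.
    patterns: list of frozensets, each a feasible loading of one container.
    """
    items = frozenset(items)
    for k in range(1, len(patterns) + 1):
        for combo in combinations(patterns, k):
            # Sizes summing to len(items) plus full coverage implies disjointness.
            if (sum(len(p) for p in combo) == len(items)
                    and frozenset().union(*combo) == items):
                return list(combo)
    return None
```

For example, with items `a` to `d` and patterns `{a, b}`, `{c}`, `{d}`, `{c, d}`, `{a, b, c}`, the function returns the two-container solution `{a, b}` and `{c, d}`.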

However, as additional constraints (load stability, item separation) are introduced, the effectiveness of the LLM-discovered heuristic diminishes. The scoring function alone is insufficient to navigate the complex trade-offs required by real-world constraints, resulting in a widening performance gap relative to leading solvers.

Limitations and Implications

The paper highlights two major limitations of current LLM-based heuristic search:

Engineering Overhead: Significant manual intervention is required to scaffold constraints and enable robust code generation. Autonomous solver synthesis remains out of reach for current LLMs in complex domains.
Pretrained Biases: LLMs exhibit a strong bias toward refining modular scoring functions, with limited capacity for inventing novel algorithmic paradigms or procedural logic.

These findings have direct implications for the future of automated algorithm design. Progress will require either advances in LLM capabilities (e.g., fine-tuning for domain-specific reasoning) or reframing the generation task to leverage LLM strengths (e.g., formula generation for dedicated solvers). The generalizability of these results to other combinatorial optimization problems and more powerful LLMs remains an open question, constrained by practical issues of model accessibility and prompt sensitivity.

Future Directions

Potential avenues for future research include:

Model Fine-Tuning: Training LLMs on domain-specific data to improve their handling of geometric and logical constraints.
Hybrid Architectures: Integrating LLMs with symbolic solvers or physics engines to offload deterministic reasoning.
Prompt Engineering: Developing more effective prompt strategies to elicit procedural innovation rather than component-level optimization.
Benchmark Expansion: Systematic evaluation across a broader set of combinatorial problems and LLM architectures.

Conclusion

This work provides a detailed empirical and methodological analysis of LLM-based heuristic search in the context of the 3D Packing Problem. The authors demonstrate that, with appropriate engineering interventions, LLMs can discover competitive scoring functions for greedy algorithms, which, when integrated into advanced metaheuristics, yield strong performance on unconstrained benchmarks. However, the approach is fundamentally limited by the fragility of LLM-generated code and the model's bias toward component-level optimization. Overcoming these barriers will be essential for realizing the full potential of automated algorithm design in complex, real-world domains.

