- The paper demonstrates that clustering the optimal action sequences of an MDP yields effective macro actions for simplifying complex expressions.
- Experimental results show that an RL agent with a policy network outperforms a rule-based system at selecting simplification actions.
- The extracted macro actions reduce the number of steps required, pointing toward more efficient search strategies for complex problem-solving.
An Overview of "Learn to Simplify Expression" by Xinyun Chen and Yuandong Tian
This paper explores identifying and leveraging macro actions within Markov Decision Processes (MDPs) to simplify expressions more efficiently. The authors propose a framework built on clustering optimal action sequences and then applying the resulting macro actions to solve complex problems in fewer steps.
Theoretical Foundations
At the core of the paper is the hypothesis that macro actions can be inferred by clustering the optimal solutions of an MDP. The authors formalize this with a theorem stating that, for an MDP with state set S, action set A, and transition dynamics p(s′∣s,a), the optimal action sequences cluster according to a family of reward distributions. In other words, rather than being uniformly distributed, optimal actions contain recurring structure from which meaningful macro actions can be extracted, even in complex environments such as a 2D maze with multiple rooms.
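To make the clustering idea concrete, here is a minimal Python sketch of one way such macro candidates could be mined: count contiguous action n-grams across a set of optimal action sequences and keep those that recur far more often than a uniform action distribution would predict. The function name, thresholds, and toy rewrite-rule names are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def mine_macro_candidates(
    optimal_trajectories: Iterable[List[str]],
    max_len: int = 4,
    min_count: int = 5,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Count contiguous action n-grams across optimal action sequences.

    Frequently recurring n-grams are candidate macro actions: if optimal
    solutions cluster, the same short sub-sequences should reappear far
    more often than they would under a uniform action distribution.
    """
    counts: Counter = Counter()
    for actions in optimal_trajectories:
        for n in range(2, max_len + 1):
            for i in range(len(actions) - n + 1):
                counts[tuple(actions[i:i + n])] += 1
    return [(ngram, c) for ngram, c in counts.most_common() if c >= min_count]

# Toy usage with hypothetical rewrite-rule names:
trajs = [
    ["distribute", "fold_const", "cancel"],
    ["distribute", "fold_const", "cancel", "reassociate"],
    ["reassociate", "distribute", "fold_const", "cancel"],
]
print(mine_macro_candidates(trajs, min_count=2))
```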
Experimental Insights
The paper turns to empirical evaluation by generating complicated mathematical expressions, presumably representative of real-world workloads, to test this hypothesis. The experiments compare a rule-based system, specifically the rule-based simplifier in the Halide framework, against a reinforcement learning (RL) agent. The results show that the RL agent with a policy network outperforms the Halide rules at selecting simplification actions, and performance improves further with a search-based method, although numerical comparisons were not provided.
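As a rough illustration of what selecting actions with a policy network means in this setting, the sketch below scores candidate rewrite actions for an expression using a tiny linear-softmax policy over hand-crafted features. The feature set, action names, and network shape are placeholders chosen for readability and are not the paper's actual architecture.

```python
import numpy as np

# A deliberately minimal stand-in for a policy network: a linear layer over
# hand-crafted expression features followed by a softmax over rewrite actions.
ACTIONS = ["distribute", "fold_const", "cancel", "reassociate"]  # hypothetical

rng = np.random.default_rng(0)
W = rng.normal(size=(len(ACTIONS), 8))  # 8 = assumed feature dimension
b = np.zeros(len(ACTIONS))

def featurize(expr: str) -> np.ndarray:
    """Toy features: length, operator counts, and nesting depth (illustrative)."""
    return np.array([
        len(expr),
        expr.count("+"), expr.count("*"), expr.count("-"),
        expr.count("("), expr.count("min"), expr.count("max"),
        max(expr[:i].count("(") - expr[:i].count(")") for i in range(1, len(expr) + 1)),
    ], dtype=float)

def policy(expr: str) -> np.ndarray:
    """Return a probability distribution over the candidate rewrite actions."""
    logits = W @ featurize(expr) + b
    logits -= logits.max()          # numerical stability for the softmax
    probs = np.exp(logits)
    return probs / probs.sum()

print(dict(zip(ACTIONS, policy("(x + 0) * (y + 1)").round(3))))
```

In training, the action probabilities would be updated from rewards tied to how much each rewrite shortens the expression; here the weights are random and serve only to show the interface.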
Extraction and Application of Macro Actions
By applying the principle of optimality, the authors detect recurring patterns in the optimal action sequences, which they term macro actions. These macros are then applied to previously unseen expression simplification tasks to evaluate how well they generalize. Using the macro actions reduces the number of simplification steps required, indicating a more efficient exploration of the search space. The paper documents cases where this faster exploration lets the RL agent tackle more intricate problems, i.e., those with deeper search trees.
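One simple way to picture how macro actions shorten the search is to add each mined macro to the agent's action set as a single composite rewrite, so one decision covers several primitive steps and the effective depth of the search tree shrinks. The sketch below assumes hypothetical string-rewrite primitives and is meant only to illustrate the mechanism, not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

def augment_action_space(
    primitive_actions: Dict[str, Callable[[str], str]],
    macros: Sequence[Sequence[str]],
) -> Dict[str, Callable[[str], str]]:
    """Add each mined macro as a single composite action.

    Applying a macro executes its primitive rewrites in order, so one agent
    decision covers several simplification steps.
    """
    augmented = dict(primitive_actions)

    def make_macro(steps: List[str]) -> Callable[[str], str]:
        def apply(expr: str) -> str:
            for step in steps:
                expr = primitive_actions[step](expr)
            return expr
        return apply

    for steps in macros:
        augmented["macro:" + "+".join(steps)] = make_macro(list(steps))
    return augmented

# Toy usage with hypothetical string-rewrite primitives:
primitives = {
    "strip_plus_zero": lambda e: e.replace(" + 0", ""),
    "strip_times_one": lambda e: e.replace(" * 1", ""),
}
actions = augment_action_space(primitives, [["strip_plus_zero", "strip_times_one"]])
print(actions["macro:strip_plus_zero+strip_times_one"]("(x + 0) * 1"))  # -> "(x)"
```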
Implications and Future Directions
The findings have significant implications for using macro actions to improve computational efficiency in expression simplification. By identifying patterns in optimal actions, the work provides a pathway for pushing RL systems beyond what traditional rule-based systems achieve.
Practical implications extend to fields that rely on expression simplification, such as compiler optimization and automated problem-solving systems. The theoretical result and the accompanying experiments also point to future research directions, including extending the analysis to broader classes of MDPs and automating macro-action extraction with more advanced RL methods.
In conclusion, "Learn to Simplify Expression" presents an innovative approach to using macro actions, affirming theoretical insights with experimental validation. Its contribution lies in offering a refined lens through which computational complexity in MDPs can be addressed and reduced, paving the way for more efficient and intelligent computation strategies in AI research.