Entropy Search for Information-Efficient Global Optimization
This paper introduces a probabilistic approach to global optimization, called Entropy Search, that frames optimization as inference about the location of the function's minimum. This perspective distinguishes it from prevalent techniques such as response surface optimization and heuristic search, and gives the method a more explicit theoretical grounding.
Problem Statement and Theoretical Foundation
The authors define global optimization as the task of minimizing the expected loss incurred after a finite number of function evaluations. They point out that traditional algorithms operate on local utility measures and rarely maintain an explicit probabilistic belief about the function's extremum. The problem is formalized within a Gaussian process (GP) framework, which provides a probability measure over the space of functions; this measure in turn induces a probability distribution over the location of the function's minimum.
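In the paper's setup, with I denoting the (discretized) domain and θ the Heaviside step function, the induced belief over the location of the minimum can be written roughly as follows (a reconstruction from the paper's definitions, not a verbatim quote):

```latex
p_{\min}(x) \;\equiv\; p\bigl(x = \arg\min_{\tilde{x} \in I} f(\tilde{x})\bigr)
\;=\; \int p(f)\, \prod_{\tilde{x} \in I,\; \tilde{x} \neq x} \theta\bigl(f(\tilde{x}) - f(x)\bigr)\, \mathrm{d}f
```

The intractability of this integral is what motivates the approximations described in the next section.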
Algorithmic Contributions
Entropy Search is developed through a series of approximations and innovative algorithmic steps:
- Gaussian Process Prior: The paper uses Gaussian processes for their flexibility and tractable analytic properties. The GP provides a probability measure over functions and allows predicting how the belief will change under future observations.
- Discretization of Function Space: To manage the intrinsic complexity of infinite-dimensional function spaces, the algorithm discretizes the problem onto a finite set of representer points sampled from a tailored proposal distribution, such as probability of improvement. This non-uniform sampling concentrates representer points in regions where the minimum is likely to lie, which is key to coping with the curse of dimensionality.
- Approximation of the Probability Distribution: Expectation Propagation (EP) is used to approximate the belief over the minimum's location, yielding an efficient and analytically differentiable solution. The authors also explore Monte Carlo integration as a complementary approach, at higher computational cost.
- Predicting Information Gain: The core decision rule maximizes the expected reduction in the relative entropy of the belief about the minimum's location. The algorithm predicts how this belief would change under hypothetical future evaluations and selects the point promising the largest expected information gain (a Monte Carlo sketch of this loop follows the list below).
- Greedy Planning Strategy: Evaluation points are selected one step at a time. Although greedy planning is suboptimal relative to full dynamic programming, the authors argue that information-gain utilities are comparatively forgiving: an exploratory evaluation never leads to a 'dead end', because every observation still contributes information.
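To make the pipeline concrete, the sketch below illustrates the overall loop under simplifying assumptions: p_min is approximated by Monte Carlo over GP posterior samples on a small set of representer points (drawn here with a crude probability-of-improvement weighting), and the next evaluation is the candidate with the largest one-step expected reduction in the entropy of that belief. This is not the authors' EP-based implementation; all function names, kernel choices, and parameter values are illustrative.

```python
# Illustrative Entropy Search loop in NumPy (not the paper's EP implementation).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)


def rbf_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(X, y, Xs, noise_var=1e-3):
    """Posterior mean and covariance of a zero-mean GP at test points Xs."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kss = rbf_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - v.T @ v


def pmin_entropy(mean, cov, n_samples=400):
    """Monte Carlo estimate of the entropy of p_min on the representer points."""
    samples = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(mean)),
                                      size=n_samples, check_valid="ignore")
    counts = np.bincount(samples.argmin(axis=1), minlength=len(mean))
    p = (counts + 1e-12) / counts.sum()
    return -(p * np.log(p)).sum()


def select_next_point(X, y, candidates, representers, n_fantasies=10):
    """Pick the candidate with the largest expected reduction in H[p_min]."""
    mean_r, cov_r = gp_posterior(X, y, representers)
    h_now = pmin_entropy(mean_r, cov_r)
    gains = []
    for xc in candidates:
        mu_c, var_c = gp_posterior(X, y, np.array([xc]))
        # "Fantasize" a few possible outcomes of evaluating at xc.
        fantasies = mu_c[0] + np.sqrt(max(var_c[0, 0], 1e-12)) * rng.standard_normal(n_fantasies)
        h_future = []
        for yf in fantasies:
            m, c = gp_posterior(np.append(X, xc), np.append(y, yf), representers)
            h_future.append(pmin_entropy(m, c))
        gains.append(h_now - np.mean(h_future))
    return candidates[int(np.argmax(gains))]


# --- toy usage: one step on a noisy 1-D objective over [0, 1] ---------------
f = lambda x: np.sin(6 * x) + 0.1 * rng.standard_normal()   # noisy objective
X = np.array([0.1, 0.5, 0.9])                                # initial design
y = np.array([f(x) for x in X])

# Non-uniform representer points: resample a uniform pool in proportion to a
# crude probability-of-improvement weight (a stand-in for the paper's proposal).
pool = rng.uniform(0, 1, 500)
mu, cov = gp_posterior(X, y, pool)
sd = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
pi = norm.cdf((y.min() - mu) / sd)
representers = rng.choice(pool, size=50, replace=False, p=pi / pi.sum())

x_next = select_next_point(X, y, np.linspace(0, 1, 30), representers)
print("next evaluation at x =", x_next)
```

Replacing the inner Monte Carlo loop with the paper's EP approximation is what makes the information gain analytically differentiable and cheap enough to optimize directly.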
Empirical Evaluation
The paper compares Entropy Search against existing Gaussian-process-based methods (Expected Improvement, Probability of Improvement) and the continuous bandit algorithm GP-UCB on several test cases. Entropy Search achieves lower error in the estimated function value at the inferred minimum and smaller Euclidean distance to the true global minimum, particularly when function evaluations are noisy.
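For reference, these baselines choose evaluation points by maximizing per-point acquisition functions; in their standard textbook form for minimization (posterior mean μ(x), standard deviation σ(x), current best observed value η, and z(x) = (η − μ(x)) / σ(x)), they read approximately:

```latex
\alpha_{\mathrm{PI}}(x) = \Phi\bigl(z(x)\bigr), \qquad
\alpha_{\mathrm{EI}}(x) = \sigma(x)\bigl[z(x)\,\Phi\bigl(z(x)\bigr) + \phi\bigl(z(x)\bigr)\bigr], \qquad
\alpha_{\mathrm{LCB}}(x) = \mu(x) - \sqrt{\beta_t}\,\sigma(x)\ \ \text{(minimized)}
```

In contrast, Entropy Search ranks candidate evaluations by their expected effect on the belief over the minimum's location rather than by the value expected at the candidate point itself.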
The method is also tested on functions sampled out-of-model to probe robustness against model mismatch. The results indicate that it remains stable and competitive even when the test functions are not drawn from the GP prior assumed by the optimizer.
Implications and Future Directions
The implications of this research are significant both theoretically and practically. By directly optimizing information acquisition, Entropy Search offers a compelling alternative to traditional methods, especially when function evaluations are expensive. It connects naturally to experimental design and adaptive exploration, areas rich with potential for future research.
Future work could explore non-Gaussian priors or loss functions other than relative entropy, addressing the requirements of diverse real-world applications. Integrating a hierarchical Bayesian treatment of hyperparameters could further enhance the method's adaptability.
In summary, this paper delivers a comprehensive framework and practical algorithm for probabilistic global optimization, expanding the domain’s methodological toolkit and inviting further exploration into probabilistic decision-making in optimization contexts.