Hypervolume Subset Selection (HSS)
- Hypervolume Subset Selection is the problem of selecting a representative subset of nondominated solutions that maximizes the hypervolume indicator, combining strict Pareto compliance with practical efficiency.
- Algorithms utilize greedy, lazy greedy, and advanced approximations—including deep learning surrogates—to compute hypervolume contributions effectively in multi-objective optimization.
- Applications extend from evolutionary algorithm population management to skyline queries, with submodularity and computational strategies supporting near-optimal performance.
Hypervolume Subset Selection (HSS) is a central problem in multi-objective optimization and evolutionary computation that addresses the task of selecting a representative subset of nondominated solutions from a large archive so as to maximize the hypervolume indicator—the Lebesgue measure of the region of objective space weakly dominated by the selected solutions and bounded by a reference point. The hypervolume indicator is the only known unary set-quality measure that is strictly Pareto compliant, and it is moreover monotone and submodular, making HSS both a theoretically rigorous and practically impactful criterion for population management, archiving, and postprocessing in multi-objective algorithms.
1. Definition and Formulation
Given a finite nondominated set $A \subset \mathbb{R}^d$ and a reference point $r \in \mathbb{R}^d$, the hypervolume indicator is defined as
$$\mathrm{HV}(A) = \lambda\bigl(\{\, z \in \mathbb{R}^d : \exists\, a \in A,\ a \preceq z \preceq r \,\}\bigr),$$
where $\preceq$ denotes coordinatewise (weak) dominance and $\lambda$ is the Lebesgue measure. The Hypervolume Subset Selection Problem (HSSP) formalizes the optimization: given a budget $k \le |A|$, find
$$S^\star \in \operatorname*{arg\,max}_{S \subseteq A,\ |S| \le k} \mathrm{HV}(S).$$
The individual hypervolume contribution of $a \in S$ is $\mathrm{HVC}(a, S) = \mathrm{HV}(S) - \mathrm{HV}(S \setminus \{a\})$, a key quantity in incremental or decremental selection strategies (Guerreiro et al., 2020).
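The following minimal Python sketch (an illustration, not code from the cited works) computes the two-dimensional hypervolume by a sweep and derives individual contributions by leave-one-out differences; it assumes minimization and a reference point weakly dominated by every solution. The names `hv2d` and `contributions` are illustrative.

```python
# Illustrative sketch (not from the cited works): 2-D hypervolume under
# minimization with reference point r, plus leave-one-out contributions.

def hv2d(points, r):
    """Hypervolume of the region weakly dominated by `points` and bounded by `r`."""
    # Keep only points that strictly improve on the reference point, sorted by the
    # first objective; for a nondominated set the second objective then decreases.
    pts = sorted(p for p in points if p[0] < r[0] and p[1] < r[1])
    hv, prev_y = 0.0, r[1]
    for x, y in pts:
        if y < prev_y:                       # skip dominated points
            hv += (r[0] - x) * (prev_y - y)  # add the newly covered strip
            prev_y = y
    return hv

def contributions(points, r):
    """Exclusive contribution of each point: HV(S) minus HV of S without that point."""
    total = hv2d(points, r)
    return [total - hv2d(points[:i] + points[i + 1:], r) for i in range(len(points))]

if __name__ == "__main__":
    A = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
    r = (1.0, 1.0)
    print(hv2d(A, r))           # ≈ 0.37
    print(contributions(A, r))  # ≈ [0.06, 0.09, 0.06]
```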
2. Theoretical Properties and Complexity
The strict monotonicity and submodularity of the hypervolume indicator underpin both the optimality guarantees and the heuristic efficiency of HSS algorithms. Submodularity means that marginal gains decrease as the selected set grows:
$$\mathrm{HV}(S \cup \{a\}) - \mathrm{HV}(S) \;\ge\; \mathrm{HV}(T \cup \{a\}) - \mathrm{HV}(T)$$
for all $S \subseteq T \subseteq A$ and $a \in A \setminus T$ (Guerreiro et al., 2020, Chen et al., 2021). This property ensures that the simple greedy inclusion algorithm attains a $(1 - 1/e)$-approximation to the optimal subset (Chen et al., 2021).
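As a concrete illustration (the numbers here are chosen for exposition, not taken from the cited works): with minimization, reference point $r=(1,1)$, and points $a=(0.2,0.8)$, $b=(0.5,0.5)$, $c=(0.8,0.2)$, adding $c$ to $S=\{a\}$ raises the hypervolume from $0.16$ to $0.28$ (gain $0.12$), whereas adding $c$ to the superset $T=\{a,b\}$ raises it only from $0.31$ to $0.37$ (gain $0.06$), exhibiting the diminishing returns that submodularity guarantees.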
Computationally, however, HSS exhibits significant complexity barriers:
- NP-hardness: The exact subset selection HSSP is NP-hard in three or more dimensions (Bringmann et al., 2018), even though the two-dimensional case admits efficient (dynamic-programming) solutions (Guerreiro et al., 2020).
- #P-hardness of hypervolume contributions: Even determining the solution with the minimal hypervolume contribution is #P-hard, as are all reasonable variants, and approximating it within any multiplicative factor remains intractable unless P = NP (0812.2636).
This motivates the widespread use of heuristics, greedy schemes, and approximations in practical large-scale, many-objective scenarios.
3. Core Algorithmic Paradigms
HSS algorithms can be categorized as follows:
Greedy Procedures:
- Incremental Inclusion: Iteratively add the candidate with the maximal hypervolume contribution (Guerreiro et al., 2020, Chen et al., 2021); a minimal code sketch follows this list.
- Decremental Removal: Remove the least contributor at each step, frequently relying on quickly updated contribution calculations (Guerreiro et al., 2020, Chen et al., 2020).
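The sketch below illustrates the incremental greedy procedure under the same two-objective minimization assumptions as the earlier sketch; `greedy_hss` and the example data are illustrative, not the cited implementations.

```python
# Illustrative sketch of incremental greedy HSS (2-D minimization; hv2d is the
# sweep from the earlier sketch, repeated so this block is self-contained).

def hv2d(points, r):
    pts = sorted(p for p in points if p[0] < r[0] and p[1] < r[1])
    hv, prev_y = 0.0, r[1]
    for x, y in pts:
        if y < prev_y:
            hv += (r[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def greedy_hss(points, k, r, hv=hv2d):
    """Select k points, each round adding the candidate with the largest HV gain."""
    selected, remaining, current = [], list(points), 0.0
    for _ in range(min(k, len(remaining))):
        # Evaluate the marginal hypervolume gain of every remaining candidate.
        best_gain, best = max((hv(selected + [p], r) - current, p) for p in remaining)
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected

if __name__ == "__main__":
    A = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.9, 0.1)]
    print(greedy_hss(A, 3, r=(1.0, 1.0)))
```

By the submodularity argument above, this incremental scheme inherits the $(1-1/e)$ worst-case guarantee, at the cost of one hypervolume evaluation per remaining candidate in every round.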
Lazy Greedy Algorithms:
- The key innovation is using submodularity to avoid recalculating contributions for all candidates: tentative values are kept in a priority queue and re-evaluated only when necessary. This dramatically reduces computation time, especially for large archives (Chen et al., 2020, Chen et al., 2021).
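A sketch of this lazy evaluation strategy, under the same two-objective minimization assumptions as above (the name `lazy_greedy_hss` is illustrative):

```python
# Illustrative sketch of the lazy greedy idea (2-D minimization, hv2d as above):
# tentative gains live in a heap and only the top candidate is re-evaluated,
# which is safe because submodularity implies gains can only shrink over time.

import heapq

def hv2d(points, r):
    pts = sorted(p for p in points if p[0] < r[0] and p[1] < r[1])
    hv, prev_y = 0.0, r[1]
    for x, y in pts:
        if y < prev_y:
            hv += (r[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def lazy_greedy_hss(points, k, r):
    selected, current, rnd = [], 0.0, 0
    # Min-heap of (-tentative_gain, point, round_when_computed).
    heap = [(-hv2d([p], r), p, 0) for p in points]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, p, stamp = heapq.heappop(heap)
        if stamp == rnd:
            # Gain is up to date, so no other candidate can beat it: select p.
            selected.append(p)
            current -= neg_gain
            rnd += 1
        else:
            # Stale value: recompute the gain against the current selection.
            gain = hv2d(selected + [p], r) - current
            heapq.heappush(heap, (-gain, p, rnd))
    return selected

if __name__ == "__main__":
    A = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.9, 0.1)]
    print(lazy_greedy_hss(A, 3, (1.0, 1.0)))
```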
Exact and Approximate Algorithms:
- Exact in low dimensions: the HSSP is solvable exactly in polynomial time by dynamic programming for $d = 2$, while dimension-sweep algorithms such as HV3D compute the hypervolume in $O(n \log n)$ for $d = 3$ (Guerreiro et al., 2020).
- Box Decomposition: Partitioning the dominated region into axis-parallel hyperrectangles (HBDA), with incremental and nonincremental variants that remain competitive as the number of objectives grows (Lacour et al., 2015).
- Quick Hypervolume/Improved (QHV/QHV-II): Divide-and-conquer strategies inspired by Quicksort, nearly linear in practice on uniform spherical/planar data, greatly reducing overhead (Russo et al., 2012, Jaszkiewicz, 2016).
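For very small instances the exact optimum can also be obtained by brute force, which is useful mainly as a reference to validate greedy or approximate selectors; the sketch below is an illustrative baseline, not one of the algorithms listed above.

```python
# Illustrative brute-force baseline: enumerate all size-k subsets and keep the
# one with maximal hypervolume.  Only feasible for small n and k.

from itertools import combinations

def hv2d(points, r):
    pts = sorted(p for p in points if p[0] < r[0] and p[1] < r[1])
    hv, prev_y = 0.0, r[1]
    for x, y in pts:
        if y < prev_y:
            hv += (r[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def exact_hss(points, k, r):
    """Optimal size-k subset; cost is (n choose k) hypervolume evaluations."""
    return max(combinations(points, k), key=lambda subset: hv2d(subset, r))

if __name__ == "__main__":
    A = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.9, 0.1)]
    best = exact_hss(A, 3, (1.0, 1.0))
    print(best, hv2d(best, (1.0, 1.0)))
```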
Advanced Approximations:
- R2-based Hypervolume Contribution: Direct line-based probing into the exclusive region of each candidate, targeting high accuracy for ranking and subset selection, especially in many-objective cases (Shang et al., 2018).
- Learning-based Vector Set Generation: The Learning-to-Approximate (LtA) method automatically generates direction vectors for R2-based HVC estimation with higher Pearson correlation to the true contributions (Shang et al., 2022).
- Monte Carlo Sampling: Random sampling within bounding boxes to estimate hypervolume contributions, with error controlled via Chernoff bounds and adaptive elimination (0812.2636); see the sketch after this list.
- Deep Learning Surrogates: HV-Net (DeepSets architecture (Shang et al., 2022)) and DeepHV (equivariant neural networks (Boelrijk et al., 2022)) provide fast, permutation-invariant hypervolume estimation scalable to higher dimensions, enabling real-time candidate scoring.
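A hedged sketch of the basic Monte Carlo idea referenced above (without the adaptive elimination and Chernoff-bound machinery of the cited work): estimate a point's exclusive contribution by sampling uniformly inside its dominated box and counting the samples that no other point covers. The function names are illustrative.

```python
# Illustrative Monte Carlo sketch: estimate the exclusive contribution of
# `point` by sampling uniformly in its dominated box [point, r] and counting
# samples that no other point dominates (2-D or higher, minimization).

import random

def dominates(p, z):
    """True if p weakly dominates sample z (minimization)."""
    return all(pi <= zi for pi, zi in zip(p, z))

def mc_contribution(point, others, r, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    box_volume = 1.0
    for pi, ri in zip(point, r):
        box_volume *= ri - pi                      # volume of the box [point, r]
    hits = 0
    for _ in range(n_samples):
        z = [rng.uniform(pi, ri) for pi, ri in zip(point, r)]
        if not any(dominates(q, z) for q in others):
            hits += 1                              # z lies in the exclusive region
    return box_volume * hits / n_samples

if __name__ == "__main__":
    A = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
    r = (1.0, 1.0)
    others = [p for p in A if p != (0.5, 0.5)]
    print(mc_contribution((0.5, 0.5), others, r))  # exact value is 0.09
```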
4. Mathematical and Geometric Insights
The combinatorial and geometric structure of HSS has been analyzed both in terms of volume distinctness and optimal distribution:
- Distinct Volume Subsets: Any $n$ points in $\mathbb{R}^d$ contain a polynomially large subset in which all $(a+1)$-point tuples, for $a \le d$, span distinct nonzero $a$-dimensional volumes, unifying distinct-distance and distinct-volume problems (Conlon et al., 2014).
- Hypervolume-Optimal $\mu$-Distributions: In two dimensions, equispaced solutions are globally optimal on linear fronts, but in three dimensions optimal distributions can be nonuniform, depending on the shape of the Pareto front (line-based or plane-based) and the reference point location (Shang et al., 2021). Uniform distributions are optimal only under specific decomposition conditions; otherwise, locally optimal subsets may not reach the global optimum.
These results inform effective solution set design for maximal hypervolume coverage and reveal when simple equidistant selection strategies may fail.
5. Applications and Impact in Optimization
HSS underlies several practical procedures in multi-objective evolutionary algorithms (MOEAs), Bayesian optimization, and decision support:
- Archive Management: Retaining a finite diverse subset from an unbounded external archive, e.g., in postprocessing stages with tens of thousands of solutions, while achieving a good trade-off between coverage and diversity (Chen et al., 2021).
- Indicator-based Selection: Hypervolume-driven MOEAs such as SMS-EMOA employ HSS-style truncation during population update or environmental selection, and hypervolume-based variants of frameworks such as NSGA-II/III and SPEA2 do likewise. Efficient computation (e.g., via lazy greedy or deep learning surrogates) is crucial for scaling to many objectives (Guerreiro et al., 2020, Boelrijk et al., 2022).
- Skyline Queries: In database applications, HSS finds compact, representative sets in multicriteria datasets, with direct extensions for "best record" selection in query postprocessing (Bringmann et al., 2018).
Hypervolume-based indicators are unique among common set-quality measures in their strict compliance with set dominance, justifying their use as a gold standard for subset selection and Pareto front assessment.
6. Computational Advances and Future Directions
Recent years have seen a convergence of algorithmic and machine learning approaches for fast, scalable HSS:
- Algorithmic Efficiency: Adaptive sampling, pruning, and data structure refinement reduce computational cost in exact and approximate evaluation, permitting application to real-time and high-dimensional optimization problems (0812.2636, Lacour et al., 2015, Jaszkiewicz, 2016).
- Machine Learning Surrogates: Permutation-invariant networks (HV-Net, DeepHV) attain sub-1% mean absolute percentage error for many-objective cases and enable orders-of-magnitude speedup for candidate evaluation (Shang et al., 2022, Boelrijk et al., 2022).
- EHVI Approximation: Gauss–Hermite quadrature replaces Monte Carlo integration in computing the Expected Hypervolume Improvement, providing accurate candidate ranking even when the predictive densities are correlated, with direct benefit for Bayesian optimization and subset selection (Rahat et al., 2022); see the sketch after this list.
- Learning-based Direction Sets: LtA (Learning to Approximate) for line-based HVC estimation now guides the selection of direction vectors, yielding higher correct identification rates and subset optimality (Shang et al., 2022).
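As an illustration of the Gauss–Hermite idea, the sketch below approximates the EHVI of a candidate with independent Gaussian predictive marginals in two objectives (the cited method also handles correlated predictive densities); the function names and parameters are illustrative assumptions, not the cited implementation.

```python
# Hedged sketch of Gauss–Hermite EHVI approximation (two objectives,
# minimization, independent Gaussian predictive marginals).

import numpy as np

def hv2d(points, r):
    pts = sorted(p for p in points if p[0] < r[0] and p[1] < r[1])
    hv, prev_y = 0.0, r[1]
    for x, y in pts:
        if y < prev_y:
            hv += (r[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def ehvi_gauss_hermite(P, r, mu, sigma, deg=16):
    """Approximate E[HV(P ∪ {Y}) - HV(P)] for Y ~ N(mu, diag(sigma^2))."""
    x, w = np.polynomial.hermite.hermgauss(deg)      # nodes/weights for weight exp(-t^2)
    base = hv2d(P, r)
    ehvi = 0.0
    for xi, wi in zip(x, w):
        y1 = mu[0] + np.sqrt(2.0) * sigma[0] * xi    # change of variables per objective
        for xj, wj in zip(x, w):
            y2 = mu[1] + np.sqrt(2.0) * sigma[1] * xj
            # Hypervolume improvement at this quadrature node (guard tiny rounding).
            gain = max(hv2d(P + [(y1, y2)], r) - base, 0.0)
            ehvi += wi * wj * gain
    return ehvi / np.pi                              # normalization for two dimensions

if __name__ == "__main__":
    P = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
    r = (1.0, 1.0)
    print(ehvi_gauss_hermite(P, r, mu=(0.3, 0.3), sigma=(0.05, 0.05)))
```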
A plausible implication is that HSS will increasingly rely on hybrid algorithmic–learning approaches as objective dimensionality and candidate archive sizes continue to rise. Methodological advances in fast surrogate evaluations and scalable approximation will remain essential to enable efficient, theoretically well-grounded population management in multi-objective domains.
7. Summary Table: HSS Algorithm Landscape
| Algorithm/Method | Core Idea | Context/Advantages |
|---|---|---|
| Greedy/Incremental | Add candidate with maximal HVC | $(1-1/e)$-approximation; scalable for moderate $n$ and $d$ (Guerreiro et al., 2020, Chen et al., 2021) |
| Lazy Greedy | Priority-queue updates | ~90% runtime reduction for large archives (Chen et al., 2020, Chen et al., 2021) |
| Box Decomposition (HBDA) | Partition dominated region | Output-sensitive; competitive for higher-dimensional cases (Lacour et al., 2015) |
| QHV/QHV-II | Divide-and-conquer | Near-linear for uniform data; improved recursion (Russo et al., 2012, Jaszkiewicz, 2016) |
| MC Sampling | Randomized estimation | Bounded error; practical for very high $d$ (0812.2636) |
| R2-Based Approximation | Line-based probing of exclusive regions | Direct HVC approximation; robust for many objectives (Shang et al., 2018) |
| LtA Direction Selection | Learned direction vectors | Superior identification rates (Shang et al., 2022) |
| Deep HV Surrogates | Permutation-invariant networks | Sub-percent error; scalable (Shang et al., 2022, Boelrijk et al., 2022) |
Hypervolume Subset Selection remains a fundamental, complex, and evolving challenge. Continued advances in submodular exploitation, algorithmic efficiency, and adaptive surrogate modeling are driving its scalability and impact in multi-objective evolutionary computation and beyond.