Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection (2402.02196v4)
Abstract: This work seeks to break the sample-efficiency bottleneck in parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a correlation-based clustering step, transforming it into "clustering and conquer". This seemingly simple modification achieves the optimal sample complexity reduction for a widely used class of efficient large-scale R&S procedures. Our approach enjoys two key advantages: 1) it does not require highly accurate correlation estimation or precise clustering, and 2) it integrates seamlessly with various existing R&S procedures while preserving the optimal sample complexity. Theoretically, we develop a novel gradient analysis framework to analyze sample efficiency and guide the design of large-scale R&S procedures. We also introduce a new parallel clustering algorithm tailored to large-scale scenarios. Finally, our methods demonstrate superior performance in large-scale AI applications such as neural architecture search.
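To make the high-level workflow concrete, the following is a minimal sketch of the general "clustering and conquer" idea described in the abstract, not the paper's actual procedure: the pilot-sampling budget, the average-linkage clustering on estimated correlations, and the within-cluster selection rule used here are all illustrative assumptions, and the `simulate` interface and parameter names are hypothetical.

```python
# Illustrative "clustering and conquer" loop (assumed details, not the paper's procedure):
# 1) pilot sampling to estimate correlations, 2) correlation-based clustering,
# 3) within-cluster selection (one worker per cluster in a real parallel setup),
# 4) comparison of cluster winners.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def clustering_and_conquer(simulate, k, n_pilot=30, n_main=200, n_clusters=10, seed=0):
    """simulate(i, rng) returns one noisy observation of alternative i.
    Returns the index of the estimated-best alternative."""
    rng = np.random.default_rng(seed)

    # Stage 1: pilot replications, used only to estimate correlations roughly.
    pilot = np.array([[simulate(i, rng) for i in range(k)] for _ in range(n_pilot)])
    corr = np.corrcoef(pilot, rowvar=False)

    # Stage 2: cluster alternatives by correlation (average linkage on 1 - corr).
    # The abstract notes that neither correlation estimation nor clustering
    # needs to be precise for the efficiency gains to hold.
    dist = np.clip(1.0 - corr[np.triu_indices(k, 1)], 0.0, None)  # condensed distances
    labels = fcluster(linkage(dist, method="average"),
                      t=n_clusters, criterion="maxclust")

    # Stage 3: "conquer" -- run a simple selection procedure within each cluster.
    finalists = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        means = np.array([np.mean([simulate(i, rng) for _ in range(n_main)])
                          for i in members])
        finalists.append(members[int(np.argmax(means))])

    # Stage 4: compare cluster winners with additional replications.
    final_means = np.array([np.mean([simulate(i, rng) for _ in range(n_main)])
                            for i in finalists])
    return int(finalists[int(np.argmax(final_means))])
```

In this sketch each cluster's stage-3 loop is independent, which is where a parallel implementation would dispatch clusters to separate workers; any existing R&S procedure could be substituted for the plain sample-mean selection used here.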