Design of the candidate threshold set for cascade selection

Determine how to design the candidate threshold set used for cascade threshold selection in BARGAIN-style algorithms, including the choice between uniformly spaced percentiles, non-uniform spacings (e.g., exponentially spaced thresholds), and adaptive, sample-dependent selection, so as to maximize utility while preserving the stated statistical guarantees for accuracy- and precision-target queries.

Background

The paper instantiates BARGAIN with a candidate threshold set, denoted C_M, that consists of the j/M-th percentiles of the proxy scores, and analyzes how the parameter M affects utility and sampling cost. The authors note that while this choice works well empirically, other designs are possible and may offer benefits.
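As a rough illustration of the uniformly spaced design, the sketch below builds a set like C_M from a list of proxy scores. The function name `uniform_percentile_candidates`, the index range j = 1, ..., M, and the nearest-rank quantile convention are assumptions made for this example; the paper's exact construction may differ.

```python
def uniform_percentile_candidates(scores, m):
    """Return thresholds at the j/m quantiles of `scores` for j = 1..m.

    Hypothetical helper: the nearest-rank convention and the range of j
    are assumptions, not taken from the paper.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not scores:
        raise ValueError("scores must be non-empty")

    ordered = sorted(scores)
    n = len(ordered)
    candidates = set()
    for j in range(1, m + 1):
        # Nearest-rank quantile: smallest rank r with r / n >= j / m,
        # computed with integer arithmetic to avoid rounding issues.
        rank = max(1, (j * n + m - 1) // m)
        candidates.add(ordered[rank - 1])

    # Duplicates collapse when m exceeds the number of distinct scores.
    return sorted(candidates)


if __name__ == "__main__":
    proxy_scores = list(range(1, 11))  # toy scores 1..10
    print(uniform_percentile_candidates(proxy_scores, 5))  # [2, 4, 6, 8, 10]
```

Under these assumptions, a larger M gives a finer grid of candidate thresholds, and the set stops growing once M exceeds the number of distinct scores.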

Within the same section, the authors discuss alternatives such as non-uniformly spaced candidate thresholds (e.g., exponentially spaced) and even adaptively modifying the candidate set as more samples are observed. Although they mention seeing little empirical benefit in preliminary exploration, they highlight that a comprehensive investigation into how to construct and adapt the candidate threshold set could further improve utility without compromising guarantees.
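One possible reading of "exponentially spaced" is sketched below: quantile levels 1 - 2^(-j) for j = 1, ..., k, which place candidates increasingly densely near the top of the score distribution. The function name `exponential_quantile_candidates`, the base 2, the direction of the spacing, and the nearest-rank convention are all assumptions for illustration; the source does not specify how the authors spaced their thresholds, and the sketch says nothing about whether the statistical guarantees carry over.

```python
def exponential_quantile_candidates(scores, k):
    """Return thresholds at quantile levels 1 - 2**(-j) for j = 1..k.

    Hypothetical helper: base 2 and spacing toward the high-score end
    are illustrative choices, not the paper's construction.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not scores:
        raise ValueError("scores must be non-empty")

    ordered = sorted(scores)
    n = len(ordered)
    candidates = set()
    for j in range(1, k + 1):
        # Nearest-rank quantile for level 1 - 2**(-j):
        # ceil(n * (1 - 2**-j)) equals n - floor(n / 2**j).
        rank = max(1, n - n // (2 ** j))
        candidates.add(ordered[rank - 1])

    # Levels converge to 1, so later candidates crowd near the maximum.
    return sorted(candidates)


if __name__ == "__main__":
    proxy_scores = list(range(1, 17))  # toy scores 1..16
    print(exponential_quantile_candidates(proxy_scores, 4))  # [8, 12, 14, 15]
```

Compared with a uniform grid of the same size, this spacing spends more candidates on high proxy scores and fewer on the lower half, which is the kind of trade-off a study of candidate set design would need to evaluate.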

References

We leave an in depth study of how the candidate threshold set should be designed to the future work.

Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees (2509.02896 - Zeighami et al., 2 Sep 2025) in Appendix, Section "Impact of Candidate Set" (\section{Impact of Candidate Set}, label: sec:candidate_set)