Accelerating ERM for data-driven algorithm design using output-sensitive techniques (2204.03569v3)
Abstract: Data-driven algorithm design is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of computationally efficient data-driven algorithms for combinatorial algorithm families with multiple parameters. When one fixes the problem instance and varies the parameters, the "dual" loss function typically has a piecewise-decomposable structure, i.e., it is well-behaved except at certain sharp transition boundaries. In this work we initiate the study of techniques for developing efficient ERM learning algorithms for data-driven algorithm design by enumerating the pieces of the sum dual loss function over a collection of problem instances. The running time of our approach scales with the number of pieces that actually appear, as opposed to worst-case upper bounds on the number of pieces. Our approach involves two novel ingredients: an output-sensitive algorithm for enumerating the polytopes induced by a set of hyperplanes, using tools from computational geometry, and an execution graph that compactly represents all the states the algorithm could attain for all possible parameter values. We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering, and dynamic-programming-based sequence alignment.
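The core idea, that ERM cost should scale with the pieces of the sum dual loss that actually appear rather than with a worst-case bound, is easiest to see with a single parameter. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: one real parameter ρ, and each instance's dual loss is piecewise constant with known, strictly increasing breakpoints. The multi-parameter setting (pieces as polytopes of a hyperplane arrangement, built via an execution graph) is not attempted here. The function name `erm_piecewise_constant` and the `(breakpoints, values)` encoding are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical encoding of one instance's piecewise-constant dual loss in rho:
#   loss(rho) = values[j], where j = number of breakpoints <= rho.
# Requires len(values) == len(breakpoints) + 1 and strictly increasing breakpoints.
PiecewiseLoss = Tuple[Sequence[float], Sequence[float]]  # (breakpoints, values)


def erm_piecewise_constant(losses: List[PiecewiseLoss]) -> Tuple[float, float, float]:
    """Return (min total loss, left end, right end) of a piece [left, right) of the
    sum dual loss attaining the minimum. Cost is O(B log B), where B is the number
    of breakpoints that actually occur across all instances."""
    # Sum of all instances' losses on the leftmost piece (-inf, first breakpoint).
    total = sum(values[0] for _, values in losses)

    # Each breakpoint is an event that shifts one instance's loss by a fixed delta.
    events = []
    for breakpoints, values in losses:
        assert len(values) == len(breakpoints) + 1
        for j, t in enumerate(breakpoints):
            events.append((t, values[j + 1] - values[j]))
    events.sort()

    best = (total, -math.inf, events[0][0] if events else math.inf)
    i = 0
    while i < len(events):
        t = events[i][0]
        # Apply every change occurring at the same parameter value t.
        while i < len(events) and events[i][0] == t:
            total += events[i][1]
            i += 1
        # The new total holds on [t, next breakpoint).
        right = events[i][0] if i < len(events) else math.inf
        if total < best[0]:  # strict '<' keeps the leftmost minimizing piece
            best = (total, t, right)
    return best


# Two instances: A is 5 / 2 / 4 split at 1.0 and 3.0; B is 3 / 1 split at 2.0.
print(erm_piecewise_constant([([1.0, 3.0], [5, 2, 4]), ([2.0], [3, 1])]))
# → (3, 2.0, 3.0)
```

Sorting the breakpoints dominates, so each piece of the sum is visited once after an O(B log B) sort. With several parameters the pieces become polytopes, and the analogous step is the output-sensitive polytope enumeration described in the abstract.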