Determinantal Beam Search (2106.07400v4)
Abstract: Beam search is a go-to strategy for decoding neural sequence models. The algorithm can naturally be viewed as a subset optimization problem, albeit one where the corresponding set function does not reflect interactions between candidates. Empirically, this leads to sets often exhibiting high overlap, e.g., strings may differ by only a single word. Yet in use-cases that call for multiple solutions, a diverse or representative set is often desired. To address this issue, we propose a reformulation of beam search, which we call determinantal beam search. Determinantal beam search has a natural relationship to determinantal point processes (DPPs), models over sets that inherently encode intra-set interactions. By posing iterations in beam search as a series of subdeterminant maximization problems, we can turn the algorithm into a diverse subset selection process. In a case study, we use the string subsequence kernel to explicitly encourage n-gram coverage in text generated from a sequence model. We observe that our algorithm offers competitive performance against other diverse set generation strategies in the context of language generation, while providing a more general approach to optimizing for diversity.
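The abstract describes the core idea only at a high level: at each decoding step, instead of keeping the k highest-scoring hypotheses, keep the size-k subset that maximizes a log-determinant combining model scores with a pairwise similarity kernel. The sketch below is a minimal, illustrative greedy version of one such subdeterminant-maximization step; the function name, the `diversity_weight` parameter, and the exact construction of the matrix `L` are assumptions made for illustration, not the paper's precise formulation.

```python
import numpy as np

def greedy_subdeterminant_beam_step(log_probs, kernel, k, diversity_weight=1.0):
    """Greedily pick k of n candidate hypotheses to approximately maximize
    the log-determinant of the selected principal submatrix of

        L = diag(exp(log_probs)) + diversity_weight * K,

    where K[i, j] is a similarity kernel between candidates i and j
    (e.g., a string subsequence kernel). Model probability raises the
    diagonal; pairwise similarity shrinks the determinant, so the greedy
    choice trades quality against diversity.
    """
    n = len(log_probs)
    # Quality on the diagonal, similarity off the diagonal.
    L = diversity_weight * np.asarray(kernel, dtype=float)
    L[np.diag_indices(n)] += np.exp(np.asarray(log_probs, dtype=float))

    selected = []
    for _ in range(min(k, n)):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # Log-det of the subset with candidate i added; the argmax over i
            # coincides with the argmax of the marginal gain.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            score = logdet if sign > 0 else -np.inf
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break
        selected.append(best)
    return selected
```

With `diversity_weight = 0` the matrix is diagonal, the log-determinant is additive in the candidates' log-probabilities, and the greedy step reduces to ordinary top-k beam search; increasing the weight penalizes selecting highly similar strings.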
Authors: Clara Meister, Martina Forster, Ryan Cotterell