Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering (2306.06779v1)
Abstract: In this work, we study multi-source test-time model adaptation from user feedback, where K distinct source models are available for adaptation. To allow efficient adaptation, we cast the problem as a stochastic decision-making process, aiming to identify the best adapted model after adaptation. We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits. Compared with multi-armed bandit learning, the dueling framework allows pairwise collaboration among the K models; we solve it with a novel method, Co-UCB, proposed in this work. Experiments on six extractive question answering (QA) datasets show that the dueling framework with Co-UCB is more effective than other strong baselines on our studied problem.
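As background for the two frameworks the abstract contrasts, the sketch below shows classical UCB1 arm selection for the plain multi-armed bandit setting, with each arm standing in for one of the K candidate models and a simulated binary user-feedback reward. This is a minimal illustration of the generic bandit baseline, not the paper's Co-UCB method; the feedback rates, round count, and helper names are assumptions for the example.

```python
import math
import random

def ucb1_select(counts, rewards, t):
    """Pick the arm with the highest UCB1 index.

    counts[k]  -- number of times arm k was pulled so far
    rewards[k] -- cumulative reward observed for arm k
    t          -- current round (1-indexed)
    """
    # Pull each arm once before applying the confidence index.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    return max(
        range(len(counts)),
        key=lambda k: rewards[k] / counts[k]
        + math.sqrt(2 * math.log(t) / counts[k]),
    )

# Simulate K = 3 candidate models with different (unknown to the
# learner) probabilities of receiving positive user feedback.
random.seed(0)
probs = [0.3, 0.5, 0.7]
counts = [0] * len(probs)
rewards = [0.0] * len(probs)
for t in range(1, 2001):
    k = ucb1_select(counts, rewards, t)
    counts[k] += 1
    rewards[k] += 1.0 if random.random() < probs[k] else 0.0

# After enough rounds, the best model (index 2) dominates the pulls.
best = max(range(len(probs)), key=lambda k: counts[k])
```

The dueling-bandit framework studied in the paper replaces the per-arm scalar reward above with pairwise preference feedback between two selected models, which is what enables the collaboration Co-UCB exploits.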