Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering (2306.06779v1)

Published 11 Jun 2023 in cs.CL

Abstract: In this work, we study multi-source test-time model adaptation from user feedback, where K distinct models are available for adaptation. To allow efficient adaptation, we cast the problem as a stochastic decision-making process, aiming to determine the best adapted model after adaptation. We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits. Compared to multi-armed bandit learning, the dueling framework allows pairwise collaboration among the K models, which we solve with Co-UCB, a novel method proposed in this work. Experiments on six extractive question answering (QA) datasets show that the dueling framework with Co-UCB is more effective than other strong baselines for the studied problem.
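As a rough illustration of the dueling-bandit framing described in the abstract (not the paper's Co-UCB method), the sketch below runs a generic RUCB-style loop over K candidate models: each round it picks a pair of models by optimism on pairwise win rates, queries a user preference between their answers, and updates the win counts. The `prefer` oracle, the `alpha` exploration constant, and the Copeland-score readout at the end are assumptions made for this example.

```python
import numpy as np

def rucb_style_duel(num_models, num_rounds, prefer, alpha=0.51, seed=0):
    """Illustrative dueling-bandit loop (RUCB-style) over K candidate models.

    `prefer(i, j)` is a hypothetical user-feedback oracle returning True if
    model i's answer is preferred over model j's on the current query; in the
    paper's setting this role is played by QA user feedback.
    """
    rng = np.random.default_rng(seed)
    wins = np.zeros((num_models, num_models))      # wins[i, j]: times i beat j

    for t in range(1, num_rounds + 1):
        n = wins + wins.T                          # duels played per (i, j) pair
        ucb = np.ones((num_models, num_models))    # optimistic default for unseen pairs
        seen = n > 0
        ucb[seen] = wins[seen] / n[seen] + np.sqrt(alpha * np.log(t) / n[seen])
        np.fill_diagonal(ucb, 0.5)

        # Candidate "champions": arms not clearly beaten by any other arm.
        champs = np.flatnonzero((ucb >= 0.5).all(axis=1))
        c = int(rng.choice(champs)) if champs.size else int(rng.integers(num_models))

        # Strongest optimistic challenger to the chosen champion.
        challenger_scores = ucb[:, c].copy()
        challenger_scores[c] = -np.inf
        d = int(np.argmax(challenger_scores))

        # Duel: ask for a preference between the two models' answers.
        if prefer(c, d):
            wins[c, d] += 1
        else:
            wins[d, c] += 1

    # Report the arm with the best empirical Copeland score (most pairwise wins).
    copeland = ((wins / np.maximum(wins + wins.T, 1)) > 0.5).sum(axis=1)
    return int(np.argmax(copeland)), wins


# Hypothetical usage with a synthetic preference oracle (arm 2 is the best model).
quality = [0.3, 0.5, 0.8, 0.4]
best, _ = rucb_style_duel(
    num_models=4,
    num_rounds=2000,
    prefer=lambda i, j: np.random.rand() < quality[i] / (quality[i] + quality[j]),
)
print("estimated best model:", best)
```

The only state kept is the pairwise win matrix, which is what makes preference (dueling) feedback a natural fit when users can only say which of two answers is better rather than assign an absolute score.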

