Improving Multitask Retrieval by Promoting Task Specialization (2307.00342v1)

Published 1 Jul 2023 in cs.CL and cs.IR

Abstract: In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks. Despite its practical appeal, naive multitask retrieval lags behind task-specific retrieval in which a separate retriever is trained for each task. We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization. The main ingredients are: (1) a better choice of pretrained model (one that is explicitly optimized for multitasking) along with compatible prompting, and (2) a novel adaptive learning method that encourages each parameter to specialize in a particular task. The resulting multitask retriever is highly performant on the KILT benchmark. Upon analysis, we find that the model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.
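The two ingredients named in the abstract lend themselves to a brief illustration. The sketch below is a minimal, hypothetical PyTorch rendering, not the authors' implementation: it prepends an invented task instruction to each query before encoding with a shared dual encoder, and scales each parameter's gradient update by a |theta * dL/dtheta| sensitivity proxy, loosely in the spirit of the sensitivity-guided adaptive learning rate work the paper cites (reference 14 below). All class names, prompts, and the exact update rule here are assumptions for illustration.

```python
# Minimal sketch (assumed PyTorch setup, not the authors' released code) of
# task-specific prompting for a shared dual-encoder retriever plus a
# sensitivity-scaled per-parameter update that nudges parameters toward
# specializing in the current task. Everything here is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

TASK_PROMPTS = {  # hypothetical instructions, one per task family
    "qa": "Retrieve evidence to answer the question: ",
    "fact_checking": "Retrieve evidence to verify the claim: ",
    "entity_linking": "Retrieve the entity referred to in the context: ",
}

class TinyDualEncoder(nn.Module):
    """Toy stand-in for a shared query/passage encoder."""
    def __init__(self, vocab_size: int = 30522, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.embed(token_ids, offsets), dim=-1)

def prompted_query(task: str, query: str) -> str:
    # "Compatible prompting": prepend the task instruction to the raw query.
    return TASK_PROMPTS[task] + query

def adaptive_step(model: nn.Module, base_lr: float = 1e-3, eps: float = 1e-8) -> None:
    # Scale each parameter's update by a first-order sensitivity proxy
    # |theta * dL/dtheta|, so parameters important to the current task's loss
    # move more than unimportant ones. The paper's actual rule may differ.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            s = (p * p.grad).abs()
            p -= base_lr * (s / (s.mean() + eps)) * p.grad

def encode(texts, model):
    # Crude hash tokenizer, just enough to make the sketch runnable end to end.
    ids = torch.tensor([hash(w) % 30522 for t in texts for w in t.split()])
    offsets = torch.tensor([0] + [len(t.split()) for t in texts[:-1]]).cumsum(0)
    return model(ids, offsets)

if __name__ == "__main__":
    model = TinyDualEncoder()
    queries = [prompted_query("qa", "who wrote Dune"),
               prompted_query("fact_checking", "Dune was published in 1965")]
    passages = ["Frank Herbert wrote the novel Dune.",
                "Dune was first published in 1965."]
    scores = encode(queries, model) @ encode(passages, model).T  # in-batch negatives
    loss = F.cross_entropy(scores, torch.arange(len(queries)))
    loss.backward()
    adaptive_step(model)
    print("loss:", loss.item())
```

In-batch negatives stand in for the contrastive retrieval loss; the point of the sketch is only to show where task prompts enter the query side and where a per-parameter sensitivity scale replaces a uniform learning rate.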

References (30)
  1. Task-aware retrieval with instructions. arXiv preprint arXiv:2211.09260.
  2. Autoregressive search engines: Generating substrings as document identifiers. arXiv preprint arXiv:2204.10628.
  3. Signature verification using a" siamese" time delay neural network. Advances in neural information processing systems, 6.
  4. Corpusbrain: Pre-train a generative retrieval model for knowledge-intensive language tasks. arXiv preprint arXiv:2208.07652.
  5. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International conference on machine learning, pages 794–803. PMLR.
  6. Autoregressive entity retrieval. In International Conference on Learning Representations.
  7. Learning dense representations for entity retrieval. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 528–537, Hong Kong, China. Association for Computational Linguistics.
  8. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 2333–2338.
  9. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.
  10. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019, pages 4171–4186.
  11. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR (Poster).
  12. Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, and Luke Zettlemoyer. 2020. Zero-shot entity linking with dense entity retrieval. In EMNLP.
  13. Tabi: Type-aware bi-encoders for open-domain entity retrieval. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2147–2166.
  14. No parameters left behind: Sensitivity guided adaptive learning rate for training large transformer models. In International Conference on Learning Representations.
  15. Super tickets in pre-trained language models: From model compression to improving generalization. arXiv preprint arXiv:2105.12002.
  16. Openmatch: An open source library for neu-ir research. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2531–2535.
  17. Zero-shot entity linking by reading entity descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3449–3460.
  18. Multi-task retrieval for knowledge-intensive tasks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1098–1111.
  19. Are sixteen heads really better than one? Advances in neural information processing systems, 32.
  20. Importance estimation for neural network pruning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11264–11272.
  21. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440.
  22. Ms marco: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
  23. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. arXiv preprint arXiv:2108.08877.
  24. Kilt: a benchmark for knowledge intensive language tasks. In NAACL-HLT.
  25. Focus on the common good: Group distributional robustness follows. arXiv preprint arXiv:2110.02619.
  26. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research.
  27. Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663.
  28. Gradient vaccine: Investigating and improving multi-task optimization in massively multilingual models. arXiv preprint arXiv:2010.05874.
  29. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In International Conference on Learning Representations.
  30. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33:5824–5836.
Authors (4)
  1. Wenzheng Zhang (9 papers)
  2. Chenyan Xiong (95 papers)
  3. Karl Stratos (26 papers)
  4. Arnold Overwijk (9 papers)
Citations (1)