Improving Multitask Retrieval by Promoting Task Specialization (2307.00342v1)
Abstract: In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks. Despite its practical appeal, naive multitask retrieval lags behind task-specific retrieval in which a separate retriever is trained for each task. We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization. The main ingredients are: (1) a better choice of pretrained model (one that is explicitly optimized for multitasking) along with compatible prompting, and (2) a novel adaptive learning method that encourages each parameter to specialize in a particular task. The resulting multitask retriever is highly performant on the KILT benchmark. Upon analysis, we find that the model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.
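The abstract's first ingredient, a multitask-optimized pretrained model paired with compatible prompting, can be pictured with a small dual-encoder sketch. The snippet below is an illustrative assumption rather than the paper's implementation: the `t5-base` checkpoint is a placeholder for whatever multitask-optimized model the authors use, and the `"<task>: <query>"` template, the `prompt`/`encode` helpers, and mean pooling are all invented here for illustration.

```python
# Minimal sketch of task-prefix prompting for a single shared retriever.
# Assumptions (not from the abstract): a T5-style encoder, mean pooling,
# and the illustrative prompt template "<task>: <query>".
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "t5-base"  # placeholder; the paper uses a multitask-optimized model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME).encoder  # keep only the encoder stack


def encode(texts: list[str]) -> torch.Tensor:
    """Mean-pool encoder states into fixed-size embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (B, H)


def prompt(task: str, query: str) -> str:
    """Prepend a task instruction so one retriever can serve many tasks."""
    return f"{task}: {query}"


queries = [
    prompt("question answering", "who wrote the iliad"),
    prompt("entity linking", "The [START] Iliad [END] is an ancient Greek epic."),
]
passages = [
    "Homer is the presumed author of the Iliad.",
    "The Odyssey is attributed to Homer.",
]

q_emb, p_emb = encode(queries), encode(passages)
scores = q_emb @ p_emb.T  # dot-product relevance scores, one row per query
print(scores)
```

The abstract's second ingredient, the adaptive learning method that encourages each parameter to specialize in a particular task, operates during training and is not reflected in this inference-time sketch.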
Authors: Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk