Efficient Sample-Specific Encoder Perturbations (2405.01601v1)
Abstract: Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks. This paper proposes a simple, lightweight, and inference-efficient modification to such systems that controls their behaviour according to a specific attribute of interest. Specifically, we show that a small proxy network can be used to find a sample-by-sample perturbation of the encoder output of a frozen foundation model that triggers the decoder to generate improved decodings. This work explores a specific realization of this framework focused on improving the COMET performance of Flan-T5 on Machine Translation and the WER of Whisper foundation models on Speech Recognition. Results display consistent improvements in performance as measured by COMET and WER respectively. Furthermore, experiments show that the proxies are robust to the exact nature of the data used to train them and can extend to other domains.
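
To make the framework concrete, below is a minimal sketch of the inference path using PyTorch and Hugging Face Transformers: a frozen Flan-T5 encoder-decoder with a small proxy network that computes a per-sample additive perturbation of the encoder output before the frozen decoder generates. The `ProxyPerturber` class, its hidden size, and the checkpoint name are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: the proxy architecture, hidden size, and checkpoint
# are assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
model.eval()
for p in model.parameters():          # the foundation model stays frozen
    p.requires_grad_(False)

class ProxyPerturber(nn.Module):
    """Hypothetical small proxy mapping encoder states to an additive perturbation."""
    def __init__(self, d_model: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, d_model))

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        return self.net(enc_states)   # one delta per encoder position, per sample

proxy = ProxyPerturber(model.config.d_model)

text = "translate English to German: The weather is nice today."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    # 1) run the frozen encoder once
    enc = model.encoder(input_ids=inputs.input_ids,
                        attention_mask=inputs.attention_mask)
    # 2) perturb the encoder output sample-by-sample with the proxy
    perturbed = BaseModelOutput(
        last_hidden_state=enc.last_hidden_state + proxy(enc.last_hidden_state))
    # 3) let the frozen decoder generate from the perturbed encoder states
    out = model.generate(input_ids=inputs.input_ids,
                         encoder_outputs=perturbed,
                         attention_mask=inputs.attention_mask,
                         max_new_tokens=64)

print(tok.batch_decode(out, skip_special_tokens=True)[0])
```

The sketch only shows the inference path; at training time the proxy would be optimised to improve the attribute of interest (e.g. COMET for translation or WER for speech recognition) while the foundation model remains frozen.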