FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference (2212.08153v2)

Published 15 Dec 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
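The abstract does not name the two architectural changes; per the full paper they are layer-sparse cross-attention and multi-query attention, both of which shrink the cross-attention key/value cache the decoder must stream from memory at every generated token. The sketch below is a minimal back-of-the-envelope illustration of that effect, not the authors' code: all dimensions (100 retrieved passages of 256 tokens, a 24-layer decoder with 16 heads of size 64, bf16 weights) are hypothetical stand-ins for a T5-Large-like configuration.

```python
# Illustrative sketch (not the authors' implementation) of why the FiD decoder
# is memory-bandwidth bound, and how FiDO-style changes shrink the
# cross-attention key/value cache read at every decoding step.

def kv_cache_bytes(n_passages, tokens_per_passage, n_layers, n_kv_heads,
                   head_dim, bytes_per_elem=2):
    """Bytes of cross-attention keys+values streamed per decoding step."""
    encoder_tokens = n_passages * tokens_per_passage  # FiD concatenates passages
    # Factor of 2 accounts for both keys and values.
    return 2 * encoder_tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical T5-Large-like decoder over 100 retrieved passages.
common = dict(n_passages=100, tokens_per_passage=256, n_layers=24, head_dim=64)

multi_head = kv_cache_bytes(**common, n_kv_heads=16)  # one K/V per head (baseline FiD)
multi_query = kv_cache_bytes(**common, n_kv_heads=1)  # single K/V shared by all heads

# Layer-sparse cross-attention additionally keeps cross-attention in only a
# fraction of decoder layers (e.g. one in six), shrinking the layer factor too.
sparse_mq = kv_cache_bytes(n_passages=100, tokens_per_passage=256,
                           n_layers=4, n_kv_heads=1, head_dim=64)

print(f"multi-head  KV read/step: {multi_head / 2**30:.2f} GiB")
print(f"multi-query KV read/step: {multi_query / 2**30:.3f} GiB")
print(f"sparse + MQ KV read/step: {sparse_mq / 2**30:.3f} GiB")
print(f"combined reduction: {multi_head / sparse_mq:.0f}x")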

Authors (7)
  1. Michiel de Jong
  2. Yury Zemlyanskiy
  3. Joshua Ainslie
  4. Nicholas FitzGerald
  5. Sumit Sanghai
  6. Fei Sha
  7. William Cohen
Citations (31)
