Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers (2406.09315v1)

Published 13 Jun 2024 in cs.AI, cs.CV, and cs.LG

Abstract: In this paper, we show how Transformers can be interpreted as dense Expectation-Maximization algorithms performed on Bayesian Nets. Based on this interpretation, we propose a new model design paradigm, Vertical LoRA (VLoRA), which reduces the parameter count dramatically while preserving performance. In VLoRA, a model consists of layers, each of which recursively learns an increment relative to the previous layer. We then apply LoRA decomposition to the increments. VLoRA operates on the base model and is therefore orthogonal to LoRA, so the two can be used together. We run experiments on various tasks and models. The results show that, with VLoRA, 1) the Transformer model parameter count can be reduced dramatically and 2) the performance of the original model is preserved. The source code is available at https://github.com/neverUseThisName/vlora.
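As a rough illustration of the layer-increment idea in the abstract, the sketch below rebuilds each layer's weight matrix as the previous layer's weight plus a low-rank product, and compares parameter counts against storing every layer densely. It is based only on the abstract and is not the authors' implementation: the recurrence W_l = W_{l-1} + B_l A_l, the helper names (`vlora_weights`, `param_counts`), and the sizes `d`, `num_layers`, and `rank` are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix, standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def vlora_weights(w0, factors):
    """Assumed recurrence: W_l = W_{l-1} + B_l @ A_l for each (B_l, A_l)."""
    weights = [w0]
    for b, a in factors:
        weights.append(add(weights[-1], matmul(b, a)))
    return weights

def param_counts(d, num_layers, rank):
    """Parameters for dense d x d layers vs. one dense base plus low-rank increments."""
    dense = num_layers * d * d
    vlora = d * d + (num_layers - 1) * 2 * d * rank
    return dense, vlora

# Hypothetical sizes, chosen small so the example runs instantly.
rng = random.Random(0)
d, num_layers, rank = 8, 4, 2
w0 = rand_matrix(d, d, rng)
factors = [(rand_matrix(d, rank, rng), rand_matrix(rank, d, rng))
           for _ in range(num_layers - 1)]
weights = vlora_weights(w0, factors)
dense, vlora = param_counts(d, num_layers, rank)
print(dense, vlora)
```

Under these assumptions only the first layer is stored in full, so the saving grows with depth and shrinks as the rank approaches the layer width.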

Authors (1)
