LAuReL: Learned Augmented Residual Layer (2411.07501v4)

Published 12 Nov 2024 in cs.LG, cs.AI, and cs.CV

Abstract: One of the core pillars of efficient deep learning is architectural improvement, such as the residual/skip connection, which has led to significantly better model convergence and quality. Since its introduction, the residual connection has become ubiquitous not just in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper we introduce the Learned Augmented Residual Layer (LAuReL), a novel generalization of the canonical residual connection, with the goal of serving as an in-situ replacement for it while outperforming it on both model quality and footprint metrics. Our experiments show that LAuReL can boost performance for both vision models and LLMs. For example, on the ResNet-50, ImageNet 1K task, it achieves 60% of the gains from adding an extra layer while adding only 0.003% more parameters, and matches those gains while adding 2.6 times fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.
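The abstract does not spell out the formulation, so the following is only a minimal sketch of what "generalizing the canonical residual connection" could look like. It assumes the simplest possible variant: the standard residual output f(x) + x is replaced by a weighted sum alpha * f(x) + beta * x, where alpha and beta would be learned scalars. The names `residual`, `learned_weighted_residual`, `alpha`, `beta`, and the toy layer `double` are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual(f: Layer, x: Vector) -> Vector:
    """Canonical residual connection: f(x) + x, element-wise."""
    return [fx + xi for fx, xi in zip(f(x), x)]


def learned_weighted_residual(f: Layer, x: Vector, alpha: float, beta: float) -> Vector:
    """Assumed generalization: alpha * f(x) + beta * x.

    alpha and beta stand in for learned scalars (hypothetical names).
    With alpha = beta = 1 this reduces to the canonical residual connection.
    """
    return [alpha * fx + beta * xi for fx, xi in zip(f(x), x)]


def double(x: Vector) -> Vector:
    """Toy stand-in for a network layer: multiplies each entry by 2."""
    return [2.0 * v for v in x]


x = [1.0, -2.0, 0.5]
plain = residual(double, x)
same_as_plain = learned_weighted_residual(double, x, alpha=1.0, beta=1.0)
skip_suppressed = learned_weighted_residual(double, x, alpha=0.5, beta=0.0)
```

In this sketch the generalization adds only two scalars per residual connection, which is at least consistent in spirit with the very small parameter overheads quoted in the abstract; the actual variants evaluated in the paper may differ.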


Tweets

https://twitter.com/asher5772/status/1926841007603142801

https://twitter.com/GauravML/status/1938504484994646106

https://twitter.com/antimatter15/status/1926459079863472357

https://twitter.com/GauravML/status/1919634103680540890

https://twitter.com/GauravML/status/1924917897488658710

https://twitter.com/papers_anon/status/1856547598355837267
