Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and Gated Residual Connections (2405.13407v1)

Published 22 May 2024 in cs.LG and cs.AI

Abstract: Transformers have revolutionized various domains of artificial intelligence due to their unique ability to model long-range dependencies in data. However, they lack nuanced, context-dependent modulation of features and information flow. This paper introduces two significant enhancements to the transformer architecture, the Evaluator Adjuster Unit (EAU) and Gated Residual Connections (GRC), designed to address these limitations. The EAU dynamically modulates attention outputs based on the relevance of the input context, allowing for more adaptive response patterns. Concurrently, the GRC modifies the transformer's residual connections through a gating mechanism that selectively controls the information flow, thereby enhancing the network's ability to focus on contextually important features. We evaluate the performance of these enhancements across several benchmarks in natural language processing. Our results demonstrate improved adaptability and efficiency, suggesting that these modifications could set new standards for designing flexible and context-aware transformer models.
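The abstract's description of the two mechanisms can be read as two small gating modules attached to a standard transformer block. The PyTorch sketch below is only one plausible interpretation of that description: the module names, shapes, and gating formulas (a sigmoid gate computed from the attention output and its input context for the EAU, and a learned per-feature sigmoid gate on the residual branch for the GRC) are assumptions for illustration, not the authors' published implementation.

```python
# Hedged sketch of the EAU and GRC as described in the abstract.
# All design details below are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn

class EvaluatorAdjusterUnit(nn.Module):
    """Scores the attention output against its input context and rescales it."""
    def __init__(self, d_model: int):
        super().__init__()
        self.evaluator = nn.Linear(2 * d_model, d_model)  # estimates per-feature relevance

    def forward(self, attn_out: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Gate each feature of the attention output by its estimated relevance in [0, 1].
        gate = torch.sigmoid(self.evaluator(torch.cat([attn_out, context], dim=-1)))
        return gate * attn_out

class GatedResidualConnection(nn.Module):
    """Residual connection whose skip/update mix is learned per feature."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(x))   # how much of the sublayer update to admit
        return x + g * sublayer_out       # selectively gated information flow

# Example wiring inside one encoder block (shapes: [batch, seq_len, d_model]).
d_model = 512
x = torch.randn(2, 16, d_model)
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
attn_out, _ = attn(x, x, x)
attn_out = EvaluatorAdjusterUnit(d_model)(attn_out, x)   # context-dependent modulation
x = GatedResidualConnection(d_model)(x, attn_out)        # gated residual connection
```

In this reading, the EAU acts multiplicatively on the attention output while the GRC replaces the plain additive skip connection; both reduce to the standard transformer when their gates saturate at one.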

