Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 71 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 27 tok/s Pro
GPT-5 High 30 tok/s Pro
GPT-4o 93 tok/s Pro
Kimi K2 207 tok/s Pro
GPT OSS 120B 460 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization (2503.04598v3)

Published 6 Mar 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Transformers have become the de facto architecture for a wide range of machine learning tasks, particularly in LLMs. Despite their remarkable performance, challenges remain in training deep transformer networks, especially regarding the position of layer normalization. While Pre-Norm structures facilitate more stable training owing to their stronger identity path, they often lead to suboptimal performance compared to Post-Norm. In this paper, we propose $\textbf{HybridNorm}$, a simple yet effective hybrid normalization strategy that integrates the advantages of both Pre-Norm and Post-Norm. Specifically, HybridNorm employs QKV normalization within the attention mechanism and Post-Norm in the feed-forward network (FFN) of each transformer block. We provide both theoretical insights and empirical evidence demonstrating that HybridNorm improves gradient flow and model robustness. Extensive experiments on large-scale transformer models, including both dense and sparse variants, show that HybridNorm consistently outperforms both Pre-Norm and Post-Norm approaches across multiple benchmarks. These findings highlight the potential of HybridNorm as a more stable and effective technique for improving the training and performance of deep transformer models. Code is available at https://github.com/BryceZhuo/HybridNorm.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 4 posts and received 5 likes.