Efficacy of Momentum Reset as a Proactive Mitigation for Silent Data Corruption
Determine whether momentum reset, applied to parameters that exhibit gradient spikes, can serve as a proactive mitigation for silent data corruption (SDC) during large language model pretraining. The difficulty is that SDC can corrupt gradients across many parameters at once and may therefore trigger numerous momentum resets.
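To make the concern concrete, the following is a minimal sketch of a per-parameter momentum reset inside an Adam-style update. It is not the method evaluated in the cited paper: the spike rule (squared gradient exceeding `spike_factor` times the bias-corrected second-moment estimate), the threshold value, and all function and variable names are assumptions chosen for illustration.

```python
def adam_step_with_reset(params, grads, m, v, t, lr=1e-3, beta1=0.9,
                         beta2=0.999, eps=1e-8, spike_factor=50.0):
    """One Adam-style step over flat lists; returns how many entries were reset.

    Hypothetical spike rule (an assumption, not taken from the source):
    entry i is a spike when grads[i]**2 > spike_factor * v_hat[i], where
    v_hat is the bias-corrected second moment. A spiking entry has its
    moments zeroed and its update skipped for this step.
    """
    resets = 0
    for i, g in enumerate(grads):
        if t[i] > 0:  # only test entries that already have history
            v_hat = v[i] / (1.0 - beta2 ** t[i])
            if g * g > spike_factor * v_hat:
                m[i], v[i], t[i] = 0.0, 0.0, 0  # momentum reset
                resets += 1
                continue
        t[i] += 1  # per-entry step count, restarted after a reset
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat = m[i] / (1.0 - beta1 ** t[i])
        v_hat = v[i] / (1.0 - beta2 ** t[i])
        params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return resets


def warmed_state(n, steps=10, clean_grad=0.1):
    """Build optimizer state after a few steps of clean, constant gradients."""
    params, m, v, t = [0.0] * n, [0.0] * n, [0.0] * n, [0] * n
    for _ in range(steps):
        adam_step_with_reset(params, [clean_grad] * n, m, v, t)
    return params, m, v, t


n = 8

# Case 1: an isolated spike in a single entry.
params, m, v, t = warmed_state(n)
local_grads = [0.1] * n
local_grads[3] = 100.0
resets_local = adam_step_with_reset(params, local_grads, m, v, t)

# Case 2: widespread corruption, standing in for an SDC event.
params, m, v, t = warmed_state(n)
resets_wide = adam_step_with_reset(params, [100.0] * n, m, v, t)

print(resets_local, resets_wide)
```

In this toy setup the isolated spike resets one entry, while the widespread corruption resets all eight, discarding the accumulated optimizer state for every parameter at once. Whether that outcome helps or harms training at scale is exactly what the problem statement leaves open.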
References
It remains an open question whether momentum reset can serve as a proactive mitigation for silent data corruption (SDC), as SDC can cause widespread gradient corruption and may trigger numerous momentum resets.
— Exploring Silent Data Corruption as a Reliability Challenge in LLM Training
(2604.00726 - Altenbernd et al., 1 Apr 2026) in Subsection 4.4, Severe Effects in the Backward Pass (Gradient corruption persists even with clipping)