Effectiveness of SAGE beyond language modeling
Determine whether the SAGE (Sign-Adaptive Gradient) optimizer is effective beyond language modeling, specifically on other modalities such as computer vision and on fine-tuning tasks, by evaluating its training stability and performance in those settings.
References
Finally, our analysis was confined to language modeling on The Pile dataset. The effectiveness of SAGE on other modalities (e.g., vision) or fine-tuning tasks remains an open question.
— SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization
(2604.07663 - Lee et al., 9 Apr 2026) in Limitations