Optimized Hardware Kernels for Ternary Operations
Develop and evaluate optimized hardware kernels for ternary operations to efficiently support training and inference of 1.58-bit quantized architectures such as Hybrid Gated Flow (HGF).
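To make concrete what a "ternary operation" involves, here is a minimal pure-Python sketch. It is not taken from the HGF paper and is not a hardware kernel. It assumes weights restricted to {-1, 0, +1} (hence log2(3) ≈ 1.58 bits of information per weight) and an illustrative 2-bit packing, four weights per byte. The function names `pack_ternary` and `ternary_dot` and the bit codes are hypothetical choices made for this example. The point it shows is that a dot product against ternary weights needs no multiplications, only additions, subtractions, and skips, which is the structure an optimized kernel would try to exploit.

```python
from typing import List

# Assumed 2-bit encoding (illustrative, not from the source):
#   0b00 -> 0, 0b01 -> +1, 0b10 -> -1 (0b11 is unused)
_CODES = {0: 0b00, 1: 0b01, -1: 0b10}


def pack_ternary(weights: List[int]) -> bytes:
    """Pack ternary weights {-1, 0, +1} into 2-bit codes, four per byte."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= _CODES[w] << (2 * (i % 4))
    return bytes(out)


def ternary_dot(packed: bytes, x: List[float], n: int) -> float:
    """Dot product of n packed ternary weights with x, using only add/subtract."""
    acc = 0.0
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        if code == 0b01:      # weight +1: accumulate the activation
            acc += x[i]
        elif code == 0b10:    # weight -1: subtract the activation
            acc -= x[i]
        # weight 0: skip the activation entirely
    return acc


weights = [1, -1, 0, 1, -1]
activations = [2.0, 3.0, 5.0, 7.0, 11.0]
packed = pack_ternary(weights)
result = ternary_dot(packed, activations, len(weights))  # 2 - 3 + 7 - 11 = -5.0
```

The 2-bit layout spends more storage than the 1.58-bit ideal in exchange for simple shift-and-mask decoding. A real kernel would have to weigh denser encodings against decode cost, which is part of what the open question asks.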
References
Key open questions include: (1) scaling behavior to billion-parameter models, (2) hardware kernel optimization for ternary operations, (3) adaptive gating mechanisms that vary across layers or heads, and (4) application to other modalities (vision, audio).
— Hybrid Gated Flow (HGF): Stabilizing 1.58-bit LLMs via Selective Low-Rank Correction
(2602.05269 - Pizzo, 5 Feb 2026) in Conclusion, Future Directions