WebGPU-based deployment of Tiny-QMoE beyond terminal-only usage
Develop a WebGPU-based deployment that lets Tiny-QMoE, a quantization and dictionary-based compression framework for LLaMA 3.2 models, run outside a terminal environment, making the system accessible to the public.
References
On top of this while we were unable to bring the model outside of the terminal which we had hoped to do with Web-GPU, we hope to do so in the future as to make this work more public beyond the terminal.
— Tiny-QMoE
(2509.22951 - Cashman et al., 26 Sep 2025) in Conclusion, Section 6