Extend LIFE to VLMs, MoEs, and Speculative Decoding
Extend the LLM Inference Forecast Engine (LIFE), a hardware- and dataset-agnostic analytical framework, beyond dense large language models to Vision Language Models (VLMs), Mixture-of-Experts (MoE) architectures, and speculative decoding. This requires developing operator-level analytical models and workload characterizations that forecast inference metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT), and tokens-per-second using only hardware specifications.
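To make the operator-level modeling concrete, here is a minimal roofline-style sketch of how such a forecast might work. All names, formulas, and hardware numbers below are illustrative assumptions, not the LIFE implementation: each operator is bounded by whichever of compute or memory bandwidth dominates, an `active_frac` parameter approximates MoE routing (only a fraction of weights touched per token), and an `accept_len` parameter approximates speculative decoding (average tokens accepted per verification step).

```python
# Hypothetical sketch (not the LIFE framework): a roofline-style analytical
# model forecasting TTFT / TPOT / tokens-per-second from hardware specs alone.
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float      # peak FLOP/s of the accelerator
    mem_bandwidth: float   # memory bandwidth in bytes/s


@dataclass
class DenseModel:
    n_params: float               # total parameter count
    bytes_per_param: float = 2.0  # FP16/BF16 weights


def op_time(flops: float, bytes_moved: float, hw: Hardware) -> float:
    """An operator's latency is bounded by compute or memory, whichever dominates."""
    return max(flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth)


def forecast(model: DenseModel, hw: Hardware, prompt_len: int,
             active_frac: float = 1.0, accept_len: float = 1.0):
    """Forecast (TTFT, TPOT, tokens/s).

    active_frac < 1 approximates MoE: only a fraction of weights is
    active per token.  accept_len > 1 approximates speculative decoding:
    average tokens accepted per verification step.
    """
    active = model.n_params * active_frac
    # Prefill: ~2 FLOPs per active parameter per prompt token, weights read
    # once from memory -- typically compute-bound at realistic prompt lengths.
    ttft = op_time(2 * active * prompt_len,
                   active * model.bytes_per_param, hw)
    # Decode: one token's FLOPs against a full active-weight read --
    # typically memory-bound, so speculative acceptance amortizes it.
    tpot = op_time(2 * active, active * model.bytes_per_param, hw) / accept_len
    return ttft, tpot, 1.0 / tpot
```

For example, a 7B-parameter dense model on hardware with 312 TFLOP/s and 2 TB/s bandwidth gives a memory-bound decode step of about 7 ms per token under this sketch; setting `active_frac=0.25` (an MoE with a quarter of the weights active) or `accept_len=2` roughly quarters or halves that figure, which is the kind of sensitivity the extended framework would need to capture analytically rather than heuristically.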
References
While we showcase our study on dense LLMs, extending this to Vision LLMs (VLMs), Mixture-of-Experts (MoEs) and Speculative Decoding is left for future exploration.
— Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling
(2508.00904 - Patwari et al., 29 Jul 2025) in Section 7: Conclusion