Are all prompt tokens essential for first-token generation?
Determine whether all tokens in an input prompt are essential for predicting the first generated token during the prefilling stage of autoregressive transformer-based large language models (the initial pass that computes the key–value cache for all prompt tokens).
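For intuition, here is a toy sketch (not the cited paper's actual method; all arrays are synthetic and the pruning budget is an illustrative assumption) of how attention weights from the final prompt position could rank prompt tokens by their contribution to the first generated token, since the last position's attention output is what drives that prediction:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16  # toy prompt length and per-head dimension

# Synthetic queries/keys standing in for one attention head during prefill.
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))

# Attention from the LAST prompt token, whose output determines
# the first generated token in an autoregressive decoder.
scores = Q[-1] @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max())  # numerically stable softmax
weights /= weights.sum()

# Rank prompt positions by how strongly the last token attends to them;
# low-weight positions are candidates for pruning before decoding.
order = np.argsort(weights)[::-1]
keep = seq_len // 2                      # illustrative budget: keep top 50%
kept_positions = np.sort(order[:keep])
print("attention weights:", np.round(weights, 3))
print("kept positions:", kept_positions)
```

If many positions receive near-zero weight here, the question above asks whether they can be dropped from the prefill computation without changing the first predicted token.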
References
An open question remains whether all prompt tokens are essential for generating the first token.
— LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
(arXiv:2407.14057, Fu et al., 2024), Abstract