CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding (2408.04678v1)

Published 8 Aug 2024 in cs.CL, cs.AI, and cs.DB

Abstract: We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding based on retrieving exact n-gram matches of the most recent n tokens generated by the target LLM from a datastore. The key idea of CREST is to only store a subset of the smallest and most common n-grams in the datastore with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST using the same storage space on the HumanEval and MT Bench benchmarks.
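The retrieval-and-compaction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`build_datastore`, `compact`, `draft`), the `budget` parameter, and the ranking rule (shorter contexts first, then higher frequency) are all assumptions made for illustration, and the sketch proposes only a single draft token rather than the multi-token drafts that a target LLM would then verify.

```python
from collections import Counter, defaultdict


def build_datastore(corpus, max_n=3):
    """Map each context n-gram (length 1..max_n) to a Counter of next tokens."""
    store = defaultdict(Counter)
    for seq in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(seq) - n):
                store[tuple(seq[i:i + n])][seq[i + n]] += 1
    return store


def compact(store, budget):
    """Keep at most `budget` contexts (hypothetical rule: shortest first,
    then most frequent). The paper's real selection criteria may differ."""
    ranked = sorted(
        store.items(),
        key=lambda kv: (len(kv[0]), -sum(kv[1].values())),
    )
    return dict(ranked[:budget])


def draft(store, context, max_n=3):
    """Propose one token from the longest context suffix found in the store."""
    for n in range(min(max_n, len(context)), 0, -1):
        key = tuple(context[-n:])
        if key in store:
            return store[key].most_common(1)[0][0]
    return None  # no exact match: nothing to draft


corpus = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "b", "c", "d"]]
full = build_datastore(corpus)
small = compact(full, budget=4)
```

In this toy corpus the compacted store keeps only the four single-token contexts, yet `draft(small, ["a", "b", "c"])` still falls back to the context `("c",)` and proposes a token. That fallback is the intuition behind keeping short, common n-grams when storage is limited.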


Authors (3)