
Esoteric Language Models (2506.01928v1)

Published 2 Jun 2025 in cs.CL and cs.LG

Abstract: Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models (MDMs) achieve the strongest performance but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the first to introduce KV caching for MDMs while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to 65x faster inference than standard MDMs and 4x faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/Eso-LMs
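
For context, the KV caching the abstract highlights is the standard decoder optimization of computing each position's key and value projections once and reusing them at every later step, so each new step costs O(t) rather than re-attending from scratch. The sketch below is a minimal single-head NumPy illustration of that generic mechanism only; it is not Eso-LMs' architecture or sampler, and all names in it (KVCacheAttention, step) are hypothetical.

```python
# Minimal sketch of KV caching in single-head attention (NumPy).
# Illustrative of the generic mechanism only, not Eso-LMs' method.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCacheAttention:
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.Wq = rng.normal(size=(d_model, d_model)) * scale
        self.Wk = rng.normal(size=(d_model, d_model)) * scale
        self.Wv = rng.normal(size=(d_model, d_model)) * scale
        self.keys, self.values = [], []  # the KV cache

    def step(self, x):
        """Attend from one new token vector x over all cached positions."""
        q = x @ self.Wq
        self.keys.append(x @ self.Wk)    # computed once, reused at every later step
        self.values.append(x @ self.Wv)
        K = np.stack(self.keys)          # (t, d)
        V = np.stack(self.values)        # (t, d)
        attn = softmax(q @ K.T / np.sqrt(len(q)))
        return attn @ V                  # (d,)

d = 8
layer = KVCacheAttention(d)
rng = np.random.default_rng(1)
for t in range(4):                       # each step costs O(t), not O(t^2)
    out = layer.step(rng.normal(size=d))
print(out.shape)                         # (8,)
```

The paper's contribution, per the abstract, is making this kind of caching compatible with MDMs while still allowing tokens to be generated in parallel; how that interleaving is scheduled is what the linked project page and paper detail.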

HackerNews

  1. Esoteric Language Models (2 points, 0 comments)