
Esoteric Language Models (2506.01928v1)

Published 2 Jun 2025 in cs.CL and cs.LG

Abstract: Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models (MDMs) achieve the strongest performance but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the first to introduce KV caching for MDMs while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to 65x faster inference than standard MDMs and 4x faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/Eso-LMs
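
For context, the KV caching the abstract highlights is the standard decoder optimization of computing each position's key and value projections once and reusing them at every later step, so each new step costs O(t) rather than re-attending from scratch. The sketch below is a minimal single-head NumPy illustration of that generic mechanism only; it is not Eso-LMs' architecture or sampler, and all names in it (KVCacheAttention, step) are hypothetical.

```python
# Minimal sketch of KV caching in single-head attention (NumPy).
# Illustrative of the generic mechanism only, not Eso-LMs' method.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCacheAttention:
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.Wq = rng.normal(size=(d_model, d_model)) * scale
        self.Wk = rng.normal(size=(d_model, d_model)) * scale
        self.Wv = rng.normal(size=(d_model, d_model)) * scale
        self.keys, self.values = [], []  # the KV cache

    def step(self, x):
        """Attend from one new token vector x over all cached positions."""
        q = x @ self.Wq
        self.keys.append(x @ self.Wk)    # computed once, reused at every later step
        self.values.append(x @ self.Wv)
        K = np.stack(self.keys)          # (t, d)
        V = np.stack(self.values)        # (t, d)
        attn = softmax(q @ K.T / np.sqrt(len(q)))
        return attn @ V                  # (d,)

d = 8
layer = KVCacheAttention(d)
rng = np.random.default_rng(1)
for t in range(4):                       # each step costs O(t), not O(t^2)
    out = layer.step(rng.normal(size=d))
print(out.shape)                         # (8,)
```

The paper's contribution, per the abstract, is making this kind of caching compatible with MDMs while still allowing tokens to be generated in parallel; how that interleaving is scheduled is what the linked project page and paper detail.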

HackerNews

  1. Esoteric Language Models (2 points, 0 comments)