AdaSplash: GPU-Efficient Adaptive Sparse Attention

Updated 12 June 2026

AdaSplash is a family of GPU-efficient adaptive sparse attention algorithms that use the α-entmax transformation to improve the scalability of sparse attention in transformers while preserving accuracy.
It combines hardware-tailored kernels, specialized root-finding solvers, and bitpacked block masking to achieve higher throughput than prior α-entmax implementations.
AdaSplash-2 adds a histogram-based initialization scheme that accelerates computation of the entmax normalizer.

AdaSplash is a family of GPU-efficient adaptive sparse attention algorithms for transformers, centered on high-performance implementations of the α-entmax family of attention mechanisms. AdaSplash methods address both the algorithmic and the systems challenges posed by adaptive, input-dependent sparsity in attention, and surpass prior α-entmax implementations in efficiency, scale, and integration with end-to-end transformer training. The approach combines specialized root-finding solvers, hardware-tailored kernels, bitpacked block masking, and, in AdaSplash-2, a histogram-based initialization scheme that dramatically accelerates the computation of the entmax normalizer. AdaSplash methods achieve throughput competitive with or superior to FlashAttention-2 in moderate-to-high sparsity regimes and maintain accuracy comparable to softmax baselines on both short- and long-context benchmarks (Gonçalves et al., 17 Feb 2025; Gonçalves et al., 16 Apr 2026).
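To make the idea of bitpacked block masking concrete, the following is a minimal sketch of the general technique. It is an illustration under stated assumptions, not the AdaSplash kernel layout, which this text does not specify. The function names `pack_block_mask` and `block_is_active` and the block sizes are hypothetical. The sketch stores one bit per (query block, key block) pair, set when the block contains any nonzero attention weight, so that all-zero blocks can be skipped.

```python
def pack_block_mask(weights, block_rows, block_cols):
    """Pack block-level sparsity into one integer bitmask per row of blocks.

    weights is a list of equal-length rows of attention weights.
    Bit j of packed[i] is set when block (i, j) holds any nonzero weight.
    Assumption: a Python int stands in for the fixed-width machine words
    (for example 32- or 64-bit) that a GPU kernel would use.
    """
    n_rows = len(weights)
    n_row_blocks = -(-n_rows // block_rows)  # ceiling division
    packed = [0] * n_row_blocks
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            if w != 0.0:
                packed[r // block_rows] |= 1 << (c // block_cols)
    return packed


def block_is_active(packed, i, j):
    """Return True when block (i, j) has at least one nonzero weight."""
    return bool((packed[i] >> j) & 1)


# A 4x4 sparse attention matrix split into 2x2 blocks.
example_weights = [
    [0.7, 0.3, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.5],
]
example_mask = pack_block_mask(example_weights, block_rows=2, block_cols=2)
```

In this example only the two diagonal blocks are active, so a block-sparse computation could skip the other two blocks entirely.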

1. Mathematical Foundations: α-entmax and Sparse Attention

The α-entmax transformation is a parametric family of differentiable, input-adaptive sparse alternatives to softmax [Peters et al. 2019]. For a score vector $s\in\mathbb{R}^n$, the softmax attention weights are given by

$$\mathrm{softmax}(s)_i = \frac{\exp(s_i)}{\sum_{j=1}^{n} \exp(s_j)}, \qquad i = 1, \dots, n.$$
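As a minimal sketch of the contrast between softmax and α-entmax, the code below computes both for a single score vector. It assumes the standard definition from Peters et al. 2019 for $\alpha > 1$, namely $\alpha\text{-entmax}(s)_i = [(\alpha-1)s_i - \tau]_+^{1/(\alpha-1)}$, where the threshold $\tau$ is chosen so that the weights sum to 1. Plain bisection is used here as a simple stand-in for the specialized root-finding solvers that AdaSplash relies on. The names `softmax`, `entmax_bisect`, and `n_iter` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Dense baseline: every weight is strictly positive."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entmax_bisect(scores, alpha=1.5, n_iter=50):
    """alpha-entmax via bisection on the threshold tau (assumes alpha > 1).

    Finds tau such that
        sum_i max((alpha - 1) * s_i - tau, 0) ** (1 / (alpha - 1)) = 1.
    """
    if alpha <= 1.0:
        raise ValueError("this sketch assumes alpha > 1")
    n = len(scores)
    z = [(alpha - 1.0) * s for s in scores]
    exponent = 1.0 / (alpha - 1.0)

    def weights(tau):
        return [max(zi - tau, 0.0) ** exponent for zi in z]

    # Bracket the root: at lo the largest entry alone has weight 1, so the
    # sum is >= 1; at hi every entry has weight at most 1/n, so the sum is <= 1.
    lo = max(z) - 1.0
    hi = max(z) - n ** (1.0 - alpha)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) >= 1.0:
            lo = mid
        else:
            hi = mid

    p = weights(lo)
    total = sum(p)  # renormalize to absorb the remaining bisection error
    return [pi / total for pi in p]


example_scores = [2.0, 1.0, -1.0, -3.0]
dense = softmax(example_scores)
sparse = entmax_bisect(example_scores, alpha=1.5)
```

For these scores with $\alpha = 1.5$, the two lowest-scoring entries receive exactly zero weight under α-entmax, whereas softmax assigns every entry a positive weight. The sparsity pattern depends on the input, which is what makes the sparsity adaptive.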

References (2)

AdaSplash: Adaptive Sparse Flash Attention (2025)

AdaSplash-2: Faster Differentiable Sparse Attention (2026)
