Adaptive autoregressive depth for Ordered Action Tokenization

Develop an online method for autoregressive policies that use Ordered Action Tokenization (OAT) to adaptively select the number of generated tokens at inference time by (i) estimating the intrinsic complexity of the current action chunk and (ii) deciding whether generating additional OAT tokens will meaningfully reduce uncertainty in the detokenized continuous action chunk.

Background

OAT introduces an ordered, prefix-decodable token space for continuous robot actions, enabling policies to detokenize valid action chunks from any token prefix. In the reported experiments, the number of generated tokens (the autoregressive depth) is fixed at deployment time for simplicity and consistency across evaluations.

The authors argue that, from an information-theoretic perspective, the token budget required to represent an action chunk should vary with task complexity: simple behaviors may be represented with few tokens, whereas contact-rich, complex behaviors may require more tokens. Determining, during inference, when further tokens provide meaningful uncertainty reduction is therefore crucial for realizing OAT’s anytime computation benefits, but remains unresolved.
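To make the open question concrete, here is a minimal sketch of one possible stopping heuristic. It is not a method from the OAT paper. It assumes two hypothetical callables, `next_token` (the policy's autoregressive step) and `decode_prefix` (a prefix-capable detokenizer), and stops generating once the newest token changes the decoded action chunk by less than a tolerance `tol`. The cosine-basis "tokenizer" used for the demonstration is a toy stand-in with continuous coefficients, chosen only so the example runs on its own.

```python
import numpy as np


def adaptive_decode(next_token, decode_prefix, max_tokens, tol):
    """Generate tokens until the decoded chunk changes by less than `tol`.

    `next_token(prefix) -> token` and `decode_prefix(prefix) -> np.ndarray`
    are hypothetical stand-ins for the policy and the detokenizer.
    """
    tokens, prev, chunk = [], None, None
    for _ in range(max_tokens):
        tokens.append(next_token(tokens))
        chunk = decode_prefix(tokens)
        if prev is not None:
            # RMS change in the decoded chunk caused by the newest token.
            delta = float(np.sqrt(np.mean((chunk - prev) ** 2)))
            if delta < tol:
                break
        prev = chunk
    return tokens, chunk


def make_toy_codec(coeffs, horizon=16):
    """Toy ordered code (an assumption, not OAT): token k is the k-th
    coefficient of a cosine basis ordered from low to high frequency."""
    coeffs = np.asarray(coeffs, dtype=float)
    t = np.arange(horizon) + 0.5
    k = np.arange(len(coeffs))[:, None]
    basis = np.cos(np.pi * k * t / horizon)  # shape (num_tokens, horizon)

    def next_token(prefix):
        # A real policy would predict this token; the toy reads it off.
        return float(coeffs[len(prefix)])

    def decode_prefix(prefix):
        # Any prefix decodes to a valid chunk: sum of the first k components.
        return np.asarray(prefix) @ basis[: len(prefix)]

    return next_token, decode_prefix


simple = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # constant action chunk
complex_ = [1.0, 0.8, -0.6, 0.5, 0.4, 0.0, 0.0, 0.0]  # richer action chunk

for name, coeffs in [("simple", simple), ("complex", complex_)]:
    nt, dp = make_toy_codec(coeffs)
    tokens, _ = adaptive_decode(nt, dp, max_tokens=len(coeffs), tol=0.05)
    print(name, "depth:", len(tokens))
```

In this toy setting the simple chunk stops at a smaller depth than the complex one, which is the behavior the problem statement asks for. The heuristic also shows why the problem is hard: a small change from one token does not guarantee that later tokens are unimportant, so the rule can stop too early, and it measures refinement size rather than the policy's actual uncertainty.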

References

Estimating action complexity online and deciding when additional tokens meaningfully reduce uncertainty remains an open problem.

OAT: Ordered Action Tokenization (2602.04215 - Liu et al., 4 Feb 2026) in Discussion and Limitations