- The paper introduces BEAST, which leverages B-spline encoding to create consistent, smooth tokenizations of action sequences that enhance imitation learning efficiency.
- It enables fast inference via parallel decoding and fixed token lengths, demonstrated across VAE, decoder-only transformer, and vision-language model architectures.
- Empirical evaluations on 166 simulated and 8 real-world tasks confirm BEAST’s reduced computational costs and competitive task success rates.
Overview of B-Spline Encoded Action Sequence Tokenizer (BEAST) for Imitation Learning
In imitation learning, efficient action sequence tokenization is crucial for models that learn and reproduce complex behaviors from human demonstrations. The paper introduces the B-spline Encoded Action Sequence Tokenizer (BEAST), which encodes action sequences into tokens using B-spline representations, providing a smooth and resource-efficient action representation that plugs into a variety of model architectures.
Key Innovations and Methodology
BEAST distinguishes itself by encoding action sequences with B-splines: piecewise polynomial curves in which a trajectory is expressed as a weighted sum of smooth basis functions, a(t) = Σ_i c_i B_i,k(t), so a short vector of control points c_i fully determines a continuous action curve of degree k. This smoothness is particularly advantageous for imitating human-like actions, where continuity and fluidity between movements are essential. Unlike learned tokenizers, BEAST requires no separate tokenizer training; every action chunk maps to a fixed number of control points, which keeps token lengths consistent and enables parallel decoding, significantly speeding up sequence generation.
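To make the encode/decode mechanics concrete, here is a minimal sketch in Python built on SciPy's standard B-spline routines. The spline degree, knot placement, and control-point count are illustrative assumptions, not the paper's reported configuration:

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

# Toy action chunk: T timesteps of a 2-DoF action, sampled uniformly on [0, 1].
T, dof = 50, 2
t = np.linspace(0.0, 1.0, T)
actions = np.stack([np.sin(2 * np.pi * t), np.cos(np.pi * t)], axis=1)

# Encode: least-squares fit of a clamped cubic B-spline with n_ctrl control
# points per DoF. The (k+1)-fold boundary knots clamp the spline near the
# chunk endpoints; interior knots are spaced uniformly.
k, n_ctrl = 3, 8
interior = np.linspace(0.0, 1.0, n_ctrl - k + 1)[1:-1]
knots = np.concatenate([[0.0] * (k + 1), interior, [1.0] * (k + 1)])
spline = make_lsq_spline(t, actions, knots, k=k)
tokens = spline.c                        # (n_ctrl, dof) control points = the "tokens"

# Decode: every timestep is evaluated in one call, so there is no
# autoregressive rollout over the action horizon -- decoding is parallel,
# and the output is smooth by construction at any control frequency.
t_dense = np.linspace(0.0, 1.0, 200)
reconstructed = BSpline(knots, tokens, k)(t_dense)   # (200, dof)
print(tokens.shape, reconstructed.shape)
```

Note that the token count depends only on n_ctrl, never on the chunk length T, which is where the fixed token length comes from.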
Three separate model architectures serve as platforms in the evaluation of BEAST:
- Variational Autoencoder (VAE) with Continuous Tokens: BEAST represents action sequences as continuous tokens, leveraging the VAE's probabilistic framework for expressive sequence modeling.
- Decoder-Only Transformer with Discrete Tokens: BEAST's control points are quantized into discrete tokens and generated autoregressively; the fixed token length keeps inference fast (see the quantization sketch after this list).
- Florence-2 Vision-Language Model (VLM) with an Encoder-Decoder Framework: BEAST is integrated into Florence-2, demonstrating its adaptability and scalability with large pretrained models.
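For the discrete-token variant, the continuous control points must be quantized into a fixed vocabulary. A simple per-dimension uniform binning scheme is sketched below; the bin count, value range, and helper names are illustrative assumptions, and the paper's actual discretization may differ:

```python
import numpy as np

N_BINS = 256            # assumed vocabulary size per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized control-point range

def to_discrete_tokens(control_points: np.ndarray) -> np.ndarray:
    """Quantize (n_ctrl, dof) control points into integer token ids."""
    clipped = np.clip(control_points, LOW, HIGH)
    ids = np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1))
    return ids.astype(np.int64).reshape(-1)   # flatten to a fixed-length sequence

def from_discrete_tokens(ids: np.ndarray, n_ctrl: int, dof: int) -> np.ndarray:
    """Invert the quantization back to approximate control points."""
    values = ids.astype(np.float64) / (N_BINS - 1) * (HIGH - LOW) + LOW
    return values.reshape(n_ctrl, dof)

tokens = to_discrete_tokens(np.random.uniform(-1.0, 1.0, size=(8, 2)))
print(tokens.shape)     # (16,): the same length for every action chunk
```

Because every chunk yields the same number of tokens, the transformer's generation cost is constant per chunk regardless of the control horizon.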
Empirical Evaluations and Results
The paper provides a comprehensive assessment of BEAST across diverse benchmarks, comprising 166 simulated tasks and eight real-world tasks on distinct robot setups. The evaluations highlight three significant outcomes:
- Reduced Computational Costs: Computational resources for both training and inference are markedly reduced.
- High-Quality Action Sequences: The model generates high-frequency control signals with smooth transitions, which are essential for continuous control tasks (a smoothness check follows this list).
- Competitive Task Success Rates: BEAST matches or surpasses the performance of state-of-the-art methods in a range of tasks, providing robust and reliable action generation.
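The smoothness claim follows directly from the representation: a cubic B-spline is C²-continuous, so decoded trajectories have continuous velocity and acceleration by construction. The check below illustrates this generic B-spline property (it is not a result reported in the paper) and reuses the knot layout from the earlier sketch:

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3
knots = np.concatenate([[0.0] * 4, [0.2, 0.4, 0.6, 0.8], [1.0] * 4])
tokens = np.random.uniform(-1.0, 1.0, size=(8, 2))   # arbitrary control points

pos = BSpline(knots, tokens, k)
vel = pos.derivative(1)   # the derivative of a B-spline is again a B-spline
acc = pos.derivative(2)

t = np.linspace(0.0, 1.0, 1000)
# Finite differences of the velocity shrink with the grid spacing: no jumps,
# i.e. the decoded control signal has continuous velocity.
print(np.max(np.abs(np.diff(vel(t), axis=0))))
```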
Implications and Future Directions
The introduction of BEAST opens up new possibilities for efficient and effective action sequence modeling. The use of B-splines provides a theoretically sound and practical approach to generating smooth trajectories, a persistent challenge in robotic applications. The approach could potentially extend to other domains requiring precise action modeling and trajectory generation, such as autonomous vehicles and animatronics.
For future work, exploring how BEAST scales with other large pretrained models could further improve its handling of complex tasks, and integrating it with techniques such as reinforcement learning could broaden its utility across AI applications. Because its tokenization requires no separate training and produces fixed-length sequences, BEAST is well positioned to serve as a building block in larger systems as AI moves toward more holistic and integrated solutions.