
Libero-Object Benchmark (LIBERO)

Updated 24 June 2025

The Libero-Object Benchmark is a standardized evaluation suite within the LIBERO lifelong robot learning framework, targeting object-centric knowledge transfer in sequential robotic manipulation tasks. It is designed to assess and catalyze research on compositionality, transfer, and robustness of manipulation policies when exposed to a diverse set of objects, facilitating the study of both declarative (object properties, identities) and procedural (manipulation skills) knowledge in a lifelong learning setting.

1. Design and Objectives

The primary objective of the Libero-Object Benchmark is to provide a rigorous, extensible substrate for evaluating lifelong imitation learning, with a focus on object-driven generalization and memory retention. Each task in the suite introduces a novel object for a standard manipulation scenario, typically pick-and-place, requiring agents to continually integrate and utilize knowledge about previously unseen object attributes and dynamics while minimizing catastrophic forgetting of earlier tasks. This enables controlled investigation into the ability of learning algorithms and architectures to transfer object-specific knowledge across a temporal learning curriculum.

The benchmark is part of the broader LIBERO LLDM suite, which encompasses 130 tasks organized into thematic suites addressing spatial, goal, and entangled knowledge in addition to the object-centric Libero-Object subset (Liu et al., 2023).

2. Procedural Generation of Tasks

Libero-Object tasks are generated through a systematic, extensible pipeline based on behavioral templates extracted from large-scale datasets of human activities. The process entails sampling task instructions, configuring initial scene layouts with newly introduced objects, and specifying goal predicates in PDDL. Each task utilizes human-teleoperated demonstrations (typically 50 per task) to provide ground truth and training signals for imitation learning.

All tasks are instantiated in Robosuite, a modular simulation platform, ensuring realism and standardization across evaluations. The object suite is readily extensible: new object models and configurations can be programmatically added, expanding the benchmark's applicability.
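The pipeline described above can be pictured as a small task-specification record. The sketch below is purely illustrative: the field names and the example instruction are assumptions, not LIBERO's actual schema or API.

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """Illustrative record for one procedurally generated task.
    Field names are hypothetical, not LIBERO's real data format."""
    instruction: str          # sampled natural-language task instruction
    scene_objects: list       # objects placed in the initial scene layout
    goal_pddl: str            # goal predicate expressed in PDDL syntax
    num_demos: int = 50       # human-teleoperated demonstrations per task

# A hypothetical pick-and-place task with a newly introduced object.
pick_place = TaskSpec(
    instruction="pick up the alphabet soup and place it in the basket",
    scene_objects=["alphabet_soup", "basket"],
    goal_pddl="(In alphabet_soup_1 basket_1)",
)
```

Instantiating such a record for each novel object, then rendering the scene and goal predicate in the simulator, is the essence of the generation step; extending the benchmark amounts to adding new object models and re-running the pipeline.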

3. Evaluation Protocols and Metrics

Evaluation on LIBERO-OBJECT follows lifelong imitation learning conventions, where algorithms are exposed to a stream of tasks in sequence, without revisiting all past data. Core metrics include:

  • Forward Transfer (FWT): Assesses how prior learning accelerates or benefits new task acquisition.
  • Negative Backward Transfer (NBT): Quantifies the extent of performance degradation on previous tasks as new ones are learned (measure of catastrophic forgetting).
  • Area Under the Success Rate Curve (AUC): Aggregates performance across all tasks and incremental learning stages, summarizing sustained competence.

Formally, let $c_{i,j,e}$ denote the success rate on task $j$ after $e$ epochs of learning task $i$. For $K$ tasks:

$\begin{split} \text{FWT}_k &= \frac{1}{11}\sum_{e \in \{0, 5, \dots, 50\}} c_{k,k,e} \\ \text{NBT}_k &= \frac{1}{K-k} \sum_{\tau = k+1}^{K} \left(c_{k,k} - c_{\tau,k}\right) \\ \text{AUC}_k &= \frac{1}{K-k+1} \left(\text{FWT}_k + \sum_{\tau=k+1}^{K} c_{\tau,k}\right) \end{split}$

These per-task quantities are averaged across all $K$ tasks for comprehensive reporting.
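The three metrics can be computed directly from a matrix of measured success rates. The sketch below is a minimal illustration, not the official evaluation code; the array shapes and the layout of evaluation checkpoints are assumptions.

```python
import numpy as np

def lifelong_metrics(c, c_epochs):
    """Per-task FWT, NBT, AUC for a stream of K tasks (0-indexed).

    c        : (K, K) array, c[i, j] = success rate on task j after
               finishing training on task i.
    c_epochs : (K, E) array, c_epochs[k, e] = success rate on task k at
               the e-th evaluation checkpoint while learning task k.
    """
    K, _ = c_epochs.shape
    fwt = c_epochs.mean(axis=1)          # average over evaluation checkpoints
    nbt = np.zeros(K)
    auc = np.zeros(K)
    for k in range(K):
        later = np.arange(k + 1, K)      # tasks learned after task k
        if len(later):                   # forgetting on task k
            nbt[k] = np.mean(c[k, k] - c[later, k])
        # FWT plus all later revisits, normalized by the number of terms
        auc[k] = (fwt[k] + c[later, k].sum()) / (K - k)
    return fwt, nbt, auc
```

Aggregate scores are then simple means, e.g. `fwt.mean()`; note that the last task contributes no NBT term, since no task is learned after it.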

4. Comparative Assessment of Lifelong Algorithms

Extensive benchmarking on LIBERO-OBJECT has revealed several empirical insights:

  • Sequential fine-tuning can outperform established lifelong learning algorithms (EWC, Experience Replay, PackNet) in forward transfer, suggesting that aggressive regularization may unduly restrict beneficial plasticity (Liu et al., 2023).
  • No single visual encoder is universally optimal. Vision Transformer policies yield better object-centric transfer, while ResNet-based encoders excel on procedural or motion-centric tasks.
  • The choice of language encoder (BERT, CLIP, task-ID) exerts minimal influence—embeddings primarily serve as task differentiators under current paradigms.
  • Task ordering has a pronounced effect, especially for dynamic architecture methods and experience replay schemes.
  • Naive supervised pretraining might impair lifelong performance, contrary to its typical advantages in other domains.

A summary table of LIBERO-OBJECT’s properties:

Feature         Description
Purpose         Object-centric lifelong knowledge transfer
Tasks           10, each with a unique object (train/test split possible)
Focus           Declarative object knowledge across manipulations
Data provided   50 human demonstrations per task, object meshes, instructions
Metrics         FWT, NBT, AUC
Extensibility   Programmatic addition of new objects/tasks

5. Advances from Multi-Modal and Policy Distillation

Recent methods such as M2Distill have demonstrated state-of-the-art performance on LIBERO-OBJECT by explicitly regulating distribution shifts in latent features and policy outputs across vision, language, and action modalities (Roy et al., 30 Sep 2024). The method augments the imitation learning objective with auxiliary loss terms:

  • Latent feature distillation: Constrains the squared Euclidean drift of features for vision, language, joint, and gripper states between current and prior models, maintaining representation consistency.

$\mathcal{L}_\epsilon = \frac{1}{N L} \sum_{i=1}^{N} \sum_{j=1}^{L} \left\| f_{i,j}^{k,\epsilon} - f_{i,j}^{k-1,\epsilon} \right\|_2^2$

  • Policy distribution alignment: Enforces KL-divergence minimization between old and new Gaussian Mixture Model (GMM) policy outputs, reducing behavioral drift:

$\mathcal{L}_{\text{policy}} = \mathrm{KL}\left(\pi^k \,\|\, \pi^{k-1}\right) \approx \frac{1}{N} \sum_{s=1}^{N} \left( \log \pi^k(a^s) - \log \pi^{k-1}(a^s) \right)$

Combined, these objectives preserve task-relevant features and behaviors as new objects and skills are encountered.
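Both auxiliary losses reduce to simple averages over sampled features and action log-probabilities. The NumPy sketch below illustrates the two formulas under stated assumptions (array shapes, and a Monte-Carlo KL estimate over actions drawn from the current policy); it is not the authors' implementation, which operates on live network outputs during training.

```python
import numpy as np

def latent_distill_loss(feats_new, feats_old):
    """Mean squared-Euclidean drift between current and previous-model
    latent features.

    feats_* : (N, L, D) arrays -- N samples, L modality streams
              (e.g. vision, language, joint, gripper), D feature dims.
    """
    N, L, _ = feats_new.shape
    return np.sum((feats_new - feats_old) ** 2) / (N * L)

def policy_distill_loss(logp_new, logp_old):
    """Monte-Carlo estimate of KL(pi_k || pi_{k-1}), given log-probs of
    N actions sampled from the current policy under both models."""
    return np.mean(logp_new - logp_old)
```

In training, these terms would be added (with weighting coefficients) to the behavioral-cloning objective, penalizing representation and policy drift away from the previous task's model.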

Empirical comparison on LIBERO-OBJECT yielded the following results:

Method       FWT            NBT            AUC
Sequential   0.62 (±.00)    0.63 (±.02)    0.30 (±.00)
EWC          0.56 (±.03)    0.69 (±.02)    0.16 (±.02)
ER           0.56 (±.01)    0.24 (±.00)    0.49 (±.01)
BUDS         0.52 (±.02)    0.21 (±.01)    0.47 (±.01)
LOTUS        0.74 (±.03)    0.11 (±.01)    0.65 (±.03)
M2Distill    0.75 (±.03)    0.08 (±.05)    0.69 (±.04)

(Higher FWT and AUC are better; lower NBT is better.)

These results indicate that multi-modal distillation significantly reduces forgetting (lowest NBT) and improves overall performance consistency (highest AUC).

6. Research Implications and Extensions

The LIBERO-OBJECT benchmark has become a reference for evaluating object-centric lifelong robotic learning algorithms. Its structure enables isolation of object knowledge transfer, offering granular analysis of how representations persist, transform, or degrade with incremental exposure to novel objects.

The procedural task generation pipeline ensures ongoing extensibility—new objects, manipulation variants, and sensor modalities can be integrated, supporting future exploration of compositional and generalization challenges.

The benchmark’s metrics and outcomes also highlight ongoing open issues: regularization and experience replay methods developed for standard continual learning may need further adaptation for the high-dimensional, multi-modal, and sequentially compositional tasks characteristic of real-world robotics.

7. Summary Table

Aspect                Libero-Object Specification                         Key Insights
Number of tasks       10, with possibility for expansion                  Enables object-centric generalization study
Evaluation metrics    FWT, NBT, AUC                                       Quantifies transfer, forgetting, overall success
Data                  Human demonstrations, meshes, language              Supports sample-efficient policy learning
Notable methods       M2Distill, LOTUS, ER, EWC, PackNet                  Multi-modal distillation yields best retention
Observed challenges   Task ordering, naive pretraining, encoder choice    Sensitivities documented in Section 4
Extensibility         Procedurally generated, extensible indefinitely     Ongoing research utility

Conclusion

The Libero-Object Benchmark provides a standardized, extensible framework for evaluating robotic agents’ ability to learn, generalize, and retain object-centered manipulation skills in a lifelong learning context. Its adoption has advanced the field’s understanding of knowledge transfer, memory, and the role of architecture and training strategy in sequential robot learning. The benchmark continues to serve as a foundational testbed for both incremental algorithmic progress and the identification of persistent challenges in embodied lifelong intelligence.