Identify the causes of failure episodes labeled as “Unknown” in BEHAVIOR-1K evaluation

Identify the underlying causes of the failure episodes that the authors labeled “Unknown” in their annotated subset of BEHAVIOR-1K evaluation tasks, that is, failures they could not attribute to any specific problem. Pinning down these causes would improve diagnosis and enable targeted remediation strategies.

Background

To analyze failure modes, the authors labeled a subset of evaluation episodes across 15 tasks with multiple-choice reasons such as dexterity issues, order errors, confusion, and navigation problems. Among these, some failures were grouped as “Unknown,” indicating the authors could not assign a specific cause.
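As a rough illustration of the labeling scheme described above, the following Python sketch tallies failure reasons and pulls out the “Unknown” episodes for manual review. It is not the authors' tooling: the `FailureEpisode` record, the helper functions, the task names, and the toy records are all hypothetical, and the sketch assumes an episode may carry one or more reason labels.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

UNKNOWN = "Unknown"  # label for failures with no attributed cause


@dataclass(frozen=True)
class FailureEpisode:
    """Hypothetical record for one labeled failure episode."""
    task: str                 # placeholder task identifier
    episode_id: int           # placeholder episode index
    reasons: Tuple[str, ...]  # one or more multiple-choice labels (assumption)


def tally_reasons(episodes: List[FailureEpisode]) -> Counter:
    """Count how often each reason label appears across all episodes."""
    counts: Counter = Counter()
    for ep in episodes:
        counts.update(ep.reasons)
    return counts


def unknown_episodes(episodes: List[FailureEpisode]) -> List[FailureEpisode]:
    """Return the episodes labeled Unknown, i.e. the ones needing diagnosis."""
    return [ep for ep in episodes if UNKNOWN in ep.reasons]


# Toy records for illustration only; these are not data from the paper.
episodes = [
    FailureEpisode("task_a", 0, ("Dexterity",)),
    FailureEpisode("task_a", 1, ("Unknown",)),
    FailureEpisode("task_b", 0, ("Order", "Confusion")),
    FailureEpisode("task_b", 1, ("Unknown",)),
    FailureEpisode("task_c", 0, ("Navigation",)),
]

counts = tally_reasons(episodes)
to_review = unknown_episodes(episodes)
```

Grouping the “Unknown” episodes by task in this way is one plausible starting point for re-inspecting rollouts and proposing finer-grained labels.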

Characterizing and diagnosing these unknown failures would help refine training data, heuristics, and model components (e.g., recovery behaviors, stage tracking), ultimately strengthening robustness in long-horizon household manipulation.

References

Unknown: Failures that we could not attribute to any specific problem.

Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (2512.06951 - Larchenko et al., 7 Dec 2025) in Section 6.2, Failure Mode Analysis