Task Decomposition into Subgoals within JEPA

Develop a principled mechanism within the Joint Embedding Predictive Architecture (JEPA) to discover how to decompose a task into a sequence of subgoals (temporal abstractions), enabling hierarchical control without relying on an autoregressive predictive model.

Background

The paper contrasts its metacontroller-based approach—built around a pretrained autoregressive next-action predictor—with LeCun’s Joint Embedding Predictive Architecture (JEPA), which aims to learn abstract observation and action representations without an autoregressive model.

The authors report that learning a raw next-action predictor is partly what enables the discovery of subgoal sequences, and they explicitly note that decomposing a task into subgoals remains an open problem in the JEPA proposal.
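
As a rough illustration of the contrast drawn in this background, the two kinds of training signal can be written side by side. The notation is assumed here for illustration and is not taken from the paper: o_t and a_t denote observations and raw actions, Enc and Pred denote an encoder and a latent predictor, z_t denotes an abstract action variable, and D is some distance in representation space. Regularization terms that a JEPA-style objective typically needs to avoid representation collapse are omitted.

```latex
% Autoregressive next-action prediction: model raw actions given the history.
\mathcal{L}_{\mathrm{AR}}(\theta)
  = -\sum_{t=1}^{T} \log p_\theta\!\left(a_t \mid o_{1:t},\, a_{1:t-1}\right)

% JEPA-style latent prediction (schematic): predict the representation of the
% next observation rather than raw observations or raw actions.
\mathcal{L}_{\mathrm{JEPA}}(\phi)
  = \sum_{t=1}^{T-1} D\!\left(\mathrm{Enc}_\phi(o_{t+1}),\;
      \mathrm{Pred}_\phi\!\left(\mathrm{Enc}_\phi(o_t),\, z_t\right)\right)
```

In these terms, the open problem is how a sequence of subgoals could be discovered from an objective of the second kind alone, without relying on the first.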

References

In fact, we show that learning a (raw) action predictor is partly what enables discovering how to decompose a task into a sequence of subgoals, one of the open problems in the JEPA proposal.