Which Features are Best for Successor Features?
Abstract: In reinforcement learning, universal successor features (SFs) are a way to provide zero-shot adaptation to new tasks at test time: they provide optimal policies for all downstream reward functions lying in the linear span of a set of base features. But it is unclear what constitutes a good set of base features that could be useful for a wide range of downstream tasks beyond their linear span. Laplacian eigenfunctions (the eigenfunctions of $\Delta+\Delta^\ast$, with $\Delta$ the Laplacian operator of some reference policy and $\Delta^\ast$ that of the time-reversed dynamics) have been argued to play a role, and offer good empirical performance. Here, for the first time, we identify the optimal base features based on an objective criterion of downstream performance, in a non-tautological way that does not assume the downstream tasks are linear in the features. We do this for three generic classes of downstream tasks: reaching a random goal state, dense random Gaussian rewards, and random "scattered" sparse rewards. The features yielding optimal expected downstream performance turn out to be the \emph{same} for these three task families. They do not coincide with Laplacian eigenfunctions in general, though they can be expressed from $\Delta$: in the simplest case (deterministic environment and decay factor $\gamma$ close to $1$), they are the eigenfunctions of $\Delta^{-1}+(\Delta^{-1})^\ast$. We obtain these results under an assumption of large behavior cloning regularization with respect to a reference policy, a setting often used for offline RL. Along the way, we get new insights into KL-regularized natural policy gradient, and into the lack of SF information in the norm of Bellman gaps.
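The closed-form characterization in the abstract suggests a simple tabular recipe, sketched below. This is an illustrative reconstruction rather than the paper's implementation: it assumes $\Delta = I - \gamma P$ for the reference policy's transition matrix $P$ (so $\Delta^{-1}$ is the successor representation up to normalization), takes the time-reversed operator as the adjoint of $\Delta^{-1}$ with respect to the stationary distribution, and returns the top eigenvectors of $\Delta^{-1}+(\Delta^{-1})^\ast$ as base features. The function names, the toy cyclic environment, and the number of features are all hypothetical.

```python
# Minimal tabular sketch of the proposed base features (assumed definitions, see lead-in).
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of P: left eigenvector for eigenvalue 1, as a probability vector."""
    vals, vecs = np.linalg.eig(P.T)
    rho = np.abs(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return rho / rho.sum()

def optimal_base_features(P, gamma=0.99, k=4):
    """Top-k eigenvectors of Delta^{-1} + (Delta^{-1})*, with Delta = I - gamma*P (assumed form)."""
    n = P.shape[0]
    rho = stationary_distribution(P)
    D = np.diag(rho)
    Delta_inv = np.linalg.inv(np.eye(n) - gamma * P)        # successor representation (up to normalization)
    Delta_inv_star = np.linalg.inv(D) @ Delta_inv.T @ D     # adjoint w.r.t. rho, i.e. time reversal
    M = Delta_inv + Delta_inv_star                          # symmetric in the rho-weighted inner product
    # Diagonalize in the rho-weighted geometry so the spectrum and eigenvectors are real.
    S = np.diag(np.sqrt(rho))
    eigvals, eigvecs = np.linalg.eigh(S @ M @ np.linalg.inv(S))
    order = np.argsort(eigvals)[::-1]
    return np.linalg.inv(S) @ eigvecs[:, order[:k]]         # one base feature per column

# Toy example: deterministic cycle over 8 states (purely illustrative).
P = np.roll(np.eye(8), 1, axis=1)
features = optimal_base_features(P, gamma=0.99, k=3)
print(features.shape)  # (8, 3)
```

In this sketch the features can then serve as the base features $\varphi$ of a universal SF agent; the eigendecomposition is done on the similarity-transformed matrix so that the adjoint structure with respect to the stationary distribution yields a real, symmetric problem.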