Reward Adaptation Via Q-Manipulation (2503.13414v1)

Published 17 Mar 2025 in cs.LG and cs.AI

Abstract: In this paper, we propose a new solution to reward adaptation (RA), the problem in which a learning agent adapts to a target reward function using one or more existing behaviors learned a priori under the same domain dynamics but different reward functions. Learning the target behavior from scratch is possible but often inefficient given the available source behaviors. Our approach tackles RA by manipulating Q-functions. Assuming that the target reward function is a known function of the source reward functions, it computes bounds on the Q-function. We introduce an iterative process, similar to value iteration, to tighten these bounds. This enables action pruning in the target domain before learning even starts. We refer to this method as Q-Manipulation (Q-M). We formally prove that our pruning strategy does not affect the optimality of the returned policy, and we show empirically that it improves sample complexity. Q-M is evaluated in a variety of synthetic and simulation domains to demonstrate its effectiveness, generalizability, and practicality.
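To make the bound-then-prune idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a tiny tabular MDP with known transitions, a target reward that is simply the sum of two source rewards, and access to each source's optimal Q-function and worst-case (minimizing) Q-function. The helper names (`q_iteration`, `backup`, `tighten`, `prune`) and the toy MDP are hypothetical. Under these assumptions, the sum of the source optimal Q-values upper-bounds the target optimal Q-value, the sum of the worst-case Q-values lower-bounds it, and an action can be pruned safely when its upper bound falls below the best lower bound in the same state.

```python
def q_iteration(P, R, gamma, pick=max, iters=500):
    """Tabular Q-iteration. pick=max yields Q*, pick=min yields the worst-case Q."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * pick(Q[t]) for t, p in enumerate(P[s][a]))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def backup(P, R, gamma, B):
    """One Bellman optimality backup applied to a Q-table B."""
    return [[R[s][a] + gamma * sum(p * max(B[t]) for t, p in enumerate(P[s][a]))
             for a in range(len(R[0]))] for s in range(len(R))]


def tighten(P, R, gamma, lower, upper, iters=50):
    """Tighten bounds with value-iteration-style backups (assumed scheme).

    The backup is monotone, so a backed-up valid bound is still valid;
    taking the elementwise max/min keeps the tighter of the two.
    """
    for _ in range(iters):
        tl, tu = backup(P, R, gamma, lower), backup(P, R, gamma, upper)
        lower = [[max(x, y) for x, y in zip(r, t)] for r, t in zip(lower, tl)]
        upper = [[min(x, y) for x, y in zip(r, t)] for r, t in zip(upper, tu)]
    return lower, upper


def prune(lower, upper):
    """Keep action a in state s only if its upper bound reaches the best lower bound."""
    return [[a for a, u in enumerate(upper[s]) if u >= max(lower[s])]
            for s in range(len(upper))]


def add(A, B):
    """Elementwise sum of two Q-tables."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical toy MDP: 2 states, 2 actions; action k moves to state k.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R1 = [[1.0, 0.0], [0.0, 0.0]]   # source reward 1
R2 = [[0.0, 0.0], [0.0, 2.0]]   # source reward 2
R_target = add(R1, R2)          # assumed combination: target = R1 + R2
gamma = 0.9

# Initial bounds built only from the source behaviors.
upper = add(q_iteration(P, R1, gamma), q_iteration(P, R2, gamma))
lower = add(q_iteration(P, R1, gamma, pick=min), q_iteration(P, R2, gamma, pick=min))

lower, upper = tighten(P, R_target, gamma, lower, upper)
keep = prune(lower, upper)      # surviving actions per state
```

Because the sketch uses the known model for tightening, it is only an illustration of why pruning on such bounds cannot remove an optimal action; how the paper obtains and tightens its bounds, and what it assumes about the dynamics, should be taken from the paper itself.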

Authors (2)
