Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
99 tokens/sec
Gemini 2.5 Pro Premium
56 tokens/sec
GPT-5 Medium
26 tokens/sec
GPT-5 High Premium
20 tokens/sec
GPT-4o
106 tokens/sec
DeepSeek R1 via Azure Premium
99 tokens/sec
GPT OSS 120B via Groq Premium
507 tokens/sec
Kimi K2 via Groq Premium
213 tokens/sec
2000 character limit reached

Planning and Learning with Adaptive Lookahead (2201.12403v2)

Published 28 Jan 2022 in cs.LG and cs.AI

Abstract: Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.