
Long-term Planning by Short-term Prediction (1602.01580v1)

Published 4 Feb 2016 in cs.LG

Abstract: We consider planning problems, which often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge in a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are characterized by continuous state and action spaces, and by interaction with multiple agents, whose behavior can be adversarial. We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces. We propose to tackle the planning task by decomposing the problem into two phases: First, we apply supervised learning for predicting the near future based on the present. We require that the predictor be differentiable with respect to the representation of the present. Second, we model a full trajectory of the agent using a recurrent neural network, where unexplained factors are modeled as (additive) input nodes. This allows us to solve the long-term planning problem using supervised learning techniques and direct optimization over the recurrent neural network. Our approach enables us to learn robust policies by incorporating adversarial elements into the environment.

Citations (60)

Summary

  • The paper proposes a novel framework for autonomous driving planning that decomposes the problem into supervised learning for short-term predictions and a recurrent neural network for long-term optimization, bypassing traditional MDP limitations.
  • The framework demonstrates robustness against adversarial environments by modeling unpredictable interactions within a constrained range without relying on probabilistic assumptions.
  • Experiments on simulated adaptive cruise control and roundabout merging problems show the framework successfully learns policies, validating its potential for complex autonomous driving scenarios.

Long-term Planning by Short-term Prediction: A Novel Framework for Autonomous Driving

In "Long-term Planning by Short-term Prediction," the authors present an innovative approach to planning in autonomous driving, addressing the challenges presented by continuous state and action spaces and interactions with adversarial agents. The paper proposes decomposing the planning problem into supervised learning for short-term predictions and a recurrent neural network (RNN) framework to optimize long-term objectives, bypassing the limitations of traditional reinforcement learning (RL) methods that rely on Markov Decision Processes (MDPs).

Problem Context and Limitations of Traditional Approaches

The domain of autonomous driving necessitates robust decision-making to optimize long-term objectives while considering immediate actions. This scenario typically involves continuous state and action spaces, fundamentally challenging the traditional MDP framework due to the non-Markovian characteristics of the natural state space in this application. The paper critiques reliance on MDPs and dual approaches involving the value function and the $Q$ function, highlighting difficulties related to the requirement for discretization in continuous spaces and the sensitivity to noise, which complicates learning robust policies.
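
To make the critique concrete, these are the standard definitions of the two dual objects in question, written with a discount factor $\gamma$ and transition kernel $P$ (notation chosen here for exposition, not taken from the paper):

```latex
V^\pi(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t) \;\middle|\; s_0 = s,\ a_t = \pi(s_t)\right],
\qquad
Q^\pi(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[V^\pi(s')\big].
```

Both definitions hinge on a Markovian kernel $P(s' \mid s, a)$; when the natural state representation in driving is not Markovian, these objects are no longer well defined, which is exactly the difficulty the paper raises.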

Proposed Framework

The authors present a two-phase solution: supervised learning (SL) is used to predict the near future from current observations, and an RNN then models the whole trajectory of the agent. This framework captures the predictable aspects of the environment using SL and employs the RNN to model long-term behavior, with unexplained factors integrated as additive input nodes. By eliminating dependence on the traditional value functions of RL, their approach directly optimizes the policy as a function $\pi_\theta$ over a hypothesis class $\mathcal{H}$ of deep neural networks. The gradient of the cumulative reward with respect to the policy parameters is obtained by backpropagation through the unrolled RNN.
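
A minimal PyTorch sketch of this pipeline may help fix ideas. The predictor stands in for the phase-one supervised model, the reward function, network sizes, and horizon are illustrative assumptions, and the rollout loop plays the role of the unrolled RNN whose backward pass delivers the policy gradient; none of this is the authors' code.

```python
# Sketch of the two-phase idea (illustrative, not the paper's implementation):
# a differentiable predictor maps (state, action) -> next state, and a policy
# is trained by backpropagating a differentiable reward through the rollout.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, T = 8, 1, 40  # assumed dimensions and planning horizon

predictor = nn.Sequential(           # phase 1: fit by supervised learning
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(), nn.Linear(64, STATE_DIM))
policy = nn.Sequential(              # pi_theta, the object actually optimized
    nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, ACTION_DIM))

for p in predictor.parameters():
    p.requires_grad_(False)          # the phase-1 model stays frozen in phase 2

def reward(state, action):
    # Illustrative smooth reward: stay near the origin, penalize control effort.
    return -(state.pow(2).sum() + 0.1 * action.pow(2).sum())

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(1000):
    state = torch.randn(STATE_DIM)          # sampled initial condition
    total = torch.zeros(())
    for t in range(T):                      # unroll the recurrence over the horizon
        action = policy(state)
        nu = 0.05 * torch.randn(STATE_DIM)  # unexplained factors as additive inputs
        state = predictor(torch.cat([state, action])) + nu
        total = total + reward(state, action)
    loss = -total
    opt.zero_grad()
    loss.backward()                         # backprop through time: gradient w.r.t. theta
    opt.step()
```

Because only the policy's parameters are handed to the optimizer, the gradient flows through the frozen predictor at every step but updates only $\pi_\theta$, mirroring the separation between the two phases.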

Addressing Adversarial Environments

The framework's distinctive advantage is its robustness to adversarial environments. The authors model the unpredictable elements of agent interaction, such as adversarial driving behavior, as disturbances confined to a constrained range, without imposing probabilistic assumptions on them. This capability is key in real-world driving scenarios, where the behavior of other drivers can be hostile or unpredictable. Furthermore, by incorporating adversarial interactions, the framework can accelerate policy learning by focusing on robust decision pathways that are resilient to negative influences.
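
One hypothetical way to realize "adversarial within a constrained range" is to choose the additive input $\nu_t$ inside a bounded set so as to minimize the planner's reward, for instance with a one-step signed-gradient approximation. The paper bounds the disturbance but does not prescribe this particular attack; the bound `EPS` and the helper below are assumptions for illustration.

```python
# Hypothetical bounded adversary: pick nu in an l-infinity ball of radius EPS
# to (approximately) minimize the one-step reward, instead of sampling nu at
# random as in the rollout sketch above.
import torch

EPS = 0.05  # assumed bound on the unexplained additive input

def worst_case_nu(state, action, predictor, reward):
    """One-step approximation of the reward-minimizing bounded disturbance."""
    nu = torch.zeros(state.shape, requires_grad=True)
    # Detach state/action so gradients here touch only the disturbance.
    next_state = predictor(torch.cat([state.detach(), action.detach()])) + nu
    r = reward(next_state, action.detach())
    r.backward()
    # Step against the reward gradient, clipped to the allowed range.
    return (-EPS * nu.grad.sign()).detach()
```

Substituting this $\nu_t$ for the random one in the rollout forces the learned policy to perform well under the least favorable admissible disturbance, which is the sense in which the resulting policies are robust.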

Experiments

Two toy problems, adaptive cruise control (ACC) and roundabout merging, illustrate the framework’s efficacy. In these simulated environments, the authors demonstrated successful policy learning without requiring the explicit modeling of all state transitions as Markovian. The results underline that the policies learned could navigate adversarial settings effectively, identifying behavior patterns such as when to merge or yield, thus attesting to the practical utility of the method.
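
For intuition about the ACC setting, a guessed minimal version of such a simulator might look as follows; the dynamics, constants, and reward shaping are illustrative assumptions, not the paper's actual simulator.

```python
# Minimal adaptive-cruise-control step in the spirit of the ACC toy problem:
# the agent picks its own acceleration, the lead car's acceleration plays the
# role of the bounded "unexplained" input, and the reward trades off tracking
# a target speed against keeping a safe gap. All constants are assumptions.
DT, V_TARGET, SAFE_GAP, A_MAX = 0.1, 15.0, 10.0, 3.0

def acc_step(gap, v_ego, v_lead, a_ego, a_lead):
    """Advance the two-car longitudinal state by one time step."""
    v_ego = max(0.0, v_ego + DT * a_ego)
    v_lead = max(0.0, v_lead + DT * a_lead)  # |a_lead| <= A_MAX: adversary's choice
    gap = gap + DT * (v_lead - v_ego)
    return gap, v_ego, v_lead

def acc_reward(gap, v_ego):
    # Reward tracking of the target speed; heavily penalize unsafe gaps.
    speed_term = -abs(v_ego - V_TARGET)
    safety_term = -100.0 if gap < SAFE_GAP else 0.0
    return speed_term + safety_term
```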

Implications and Future Work

The paper offers a compelling alternative to traditional RL for complex planning tasks like autonomous driving. The presented work is theoretically significant as it circumvents some inherent assumptions in MDPs, potentially extending the applicability of RL approaches to more sophisticated environments. Practically, this approach holds promise for enhancing decision-making in driving automation systems beyond simple tasks, extending to complex scenarios like urban driving or collaborative maneuvers.

Future research could enhance this framework's generalization to varied driving contexts or integrate the adversarial environment more tightly into the learning process to refine policy robustness. Moreover, improving the approach's scalability and real-time performance could pave the way for its broad adoption in real-world driving systems.
