Backprop KF: Learning Discriminative Deterministic State Estimators

Published 23 May 2016 in cs.LG and cs.AI | (1605.07148v4)

Abstract: Generative state estimators based on probabilistic filters and smoothers are one of the most popular classes of state estimators for robots and autonomous vehicles. However, generative models have limited capacity to handle rich sensory observations, such as camera images, since they must model the entire distribution over sensor readings. Discriminative models do not suffer from this limitation, but are typically more complex to train as latent variable models for state estimation. We present an alternative approach where the parameters of the latent state distribution are directly optimized as a deterministic computation graph, resulting in a simple and effective gradient descent algorithm for training discriminative state estimators. We show that this procedure can be used to train state estimators that use complex input, such as raw camera images, which must be processed using expressive nonlinear function approximators such as convolutional neural networks. Our model can be viewed as a type of recurrent neural network, and the connection to probabilistic filtering allows us to design a network architecture that is particularly well suited for state estimation. We evaluate our approach on synthetic tracking task with raw image inputs and on the visual odometry task in the KITTI dataset. The results show significant improvement over both standard generative approaches and regular recurrent neural networks.

Abstract PDF Upgrade to Chat

Citations (191)

View on Semantic Scholar

Summary

Summary of Backprop KF: Learning Discriminative Deterministic State Estimators

The paper presents a novel approach to state estimation, specifically tailored to scenarios dealing with high-dimensional sensory inputs like camera images. The technique proposed by Haarnoja et al. leverages discriminative models, avoiding the pitfalls of generative models that need to capture the full distribution of sensor readings, which is computationally demanding and often impractical in real-world settings. Generative models, a common choice for probabilistic filters and smoothers, face limitations when tasked with interpreting complex sensory inputs directly, owing to their need to model entire distributions. Discriminative models, conversely, can potentially bypass this by focusing directly on mapping inputs to outputs.

The paper introduces the concept of a deterministic computation graph to optimize the latent state parameters directly, thus simplifying the training process while maintaining a high representational power comparable to traditional probabilistic models. This approach allows gradient descent and backpropagation methods to effectively train the discriminative state estimators, akin to recurrent neural network training. This methodology facilitates the incorporation of raw camera images and other complex data forms into the state estimation process using expressive nonlinear function approximators like convolutional neural networks (CNNs).

From a practical standpoint, the paper showcases improvements over traditional generative methods and standard recurrent neural networks (RNNs) through evaluations on synthetic tracking tasks and visual odometry using the KITTI dataset. The mathematical framework underpinning this approach hinges on rewriting state-space models as computation graphs, effectively merging probabilistic state estimation advantages with deterministic neural network flexibilities. The deterministic view of the filtering problem allows the integration of sophisticated training techniques generally reserved for neural networks, thus bridging a gap between neural network methodologies and probabilistic inference approaches.

Numerical Results and Claims

The authors provide a comprehensive experimental evaluation highlighting the effectiveness of their approach. In synthetic tracking tasks, the Backprop KF (BKF) displayed superior performance, with root mean square (RMS) test errors markedly lower than those achieved by piecewise Kalman filters and LSTM models, despite the latter's greater parameter count. For instance, the BKF achieved a RMS test error of 0.0537, significantly undercutting the LSTM model errors of approximately 0.1407 and 0.1423 depending on the specific configuration. The benefits of discriminative training are notably apparent when handling tasks with frequent occlusions and complex dynamics.

On the KITTI dataset for visual odometry estimations, the BKF exhibited significant translational and rotational error reductions compared to alternatives. Translational errors with BKF were approximately 0.2062 meters/meter trajectory length, compared to 0.2265 for piecewise KF and significantly higher errors in the most general LSTM models. Similarly, rotational errors observed were consistently lower with BKF.

Theoretical and Practical Implications

The implications of this research are twofold, encompassing both theoretical advancements and practical enhancements. Theoretically, the paper presents a paradigm where deterministic models, informed by probabilistic structures, can yield highly precise state estimations. This expands the architectural design space for neural networks, encouraging further exploration of hybrid models that capture the essence of probabilistic reasoning without the usual constraints on learning efficiency.

Practically, the findings suggest an avenue for improving robotic vision systems and autonomous vehicle navigation by harnessing discriminative estimators for real-time sensor data integration. The scalability and simplicity of the training process make it appealing for deployment in real-world applications, where sensor data can often be deeply complex and dynamic.

Future Developments

Potential future efforts may focus on extending the discriminative deterministic state estimation framework to accommodate more complex dynamics models or higher-dimensional latent spaces. Bridging further into semi-supervised or unsupervised domains may also be viable, especially in settings where labeled data is scarce but raw sensory information is plentiful. Additionally, exploring alternative non-linear probabilistic models for the recurrent components could provide richer insights and further decrease reliance on handcrafted dynamics models.

Overall, the paper by Haarnoja et al. provides a robust foundation for state estimation in high-dimensional environments, blending deterministic neural architectures with insights from probabilistic models, challenging conventionally discrete paths between generative and discriminative methodologies.