PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network (2003.06959v4)

Published 16 Mar 2020 in cs.LG, cs.GR, and stat.ML

Abstract: Data-driven methods for physics-based character control using reinforcement learning have been successfully applied to generate high-quality motions. However, existing approaches typically rely on Gaussian distributions to represent the action policy, which can prematurely commit to suboptimal actions when solving high-dimensional continuous control problems for highly-articulated characters. In this paper, to improve the learning performance of physics-based character controllers, we propose a framework that considers a particle-based action policy as a substitute for Gaussian policies. We exploit particle filtering to dynamically explore and discretize the action space, and track the posterior policy represented as a mixture distribution. The resulting policy can replace the unimodal Gaussian policy which has been the staple for character control problems, without changing the underlying model architecture of the reinforcement learning algorithm used to perform policy optimization. We demonstrate the applicability of our approach on various motion capture imitation tasks. Baselines using our particle-based policies achieve better imitation performance and speed of convergence as compared to corresponding implementations using Gaussians, and are more robust to external perturbations during character control. Related code is available at: https://motion-lab.github.io/PFPN.

Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network

The paper addresses a significant challenge in physics-based character control with reinforcement learning (RL): handling high-dimensional continuous action spaces for highly articulated characters. Traditional RL methods use Gaussian distributions to represent the action policy; because these distributions are unimodal, they often commit prematurely to suboptimal actions. The authors propose an alternative, the Particle Filtering Policy Network (PFPN), which uses particle filtering to dynamically explore and discretize the action space and represents the policy as a mixture distribution over particles rather than a single Gaussian.

The primary contribution of the paper is the PFPN framework, which represents the action policy as a mixture distribution maintained via particle filtering. Each action dimension is discretized by a set of state-independent particles, yielding a more expressive, multimodal policy representation. This departs from traditional Gaussian-based methods and lets PFPN adaptively sample the action space, promoting more efficient exploration and policy optimization without architectural changes to existing RL algorithms.
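To make the idea concrete, below is a minimal PyTorch sketch of a particle-based policy head: particle locations and scales are state-independent learnable parameters, while the network outputs state-dependent mixture weights over them. The layer sizes, particle count, initialization, and sampling details are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class ParticleMixturePolicy(nn.Module):
    """Illustrative particle-based policy head (not the paper's exact code).

    Each of the `action_dim` dimensions owns `num_particles` state-independent
    particles (learnable locations and log-scales). The network maps the state
    to mixture logits over those particles, so each action dimension follows a
    mixture of Gaussians instead of a single Gaussian.
    """

    def __init__(self, state_dim, action_dim, num_particles=16, hidden=256):
        super().__init__()
        self.action_dim = action_dim
        self.num_particles = num_particles
        # State-independent particles, one set per action dimension: shape (A, K).
        self.loc = nn.Parameter(
            torch.linspace(-1.0, 1.0, num_particles).repeat(action_dim, 1)
        )
        self.log_scale = nn.Parameter(
            torch.full((action_dim, num_particles), -1.0)
        )
        # State-dependent mixture logits, flattened to (A * K) outputs.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim * num_particles),
        )

    def sample(self, state):
        logits = self.net(state).view(-1, self.action_dim, self.num_particles)
        weights = torch.softmax(logits, dim=-1)                    # (B, A, K)
        # Choose a particle per action dimension, then sample around it.
        idx = torch.distributions.Categorical(weights).sample()    # (B, A)
        gather_idx = idx.unsqueeze(-1)
        batch = state.shape[0]
        loc = self.loc.expand(batch, -1, -1).gather(-1, gather_idx).squeeze(-1)
        scale = (
            self.log_scale.exp().expand(batch, -1, -1)
            .gather(-1, gather_idx).squeeze(-1)
        )
        return torch.normal(loc, scale)


# Usage with arbitrary dimensions: sample actions for a batch of 8 states.
policy = ParticleMixturePolicy(state_dim=64, action_dim=12)
actions = policy.sample(torch.randn(8, 64))   # -> shape (8, 12)
```

Because the particles only replace the output distribution, such a head can in principle be dropped into an existing actor network while the rest of the RL algorithm is left untouched, which is the property the paper emphasizes.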

The efficacy of PFPN is demonstrated across multiple character control tasks, including motion-capture imitation for a humanoid and a dog. The results show that PFPN-based baselines not only achieve better imitation performance than Gaussian-based policies but also converge faster and are more robust to external perturbations. In particular, PFPN-trained characters produce more stable and natural-looking motions, avoiding the visual artifacts typically observed with Gaussian policies, and the robustness evaluation confirms that PFPN maintains character balance under external disturbances more effectively than conventional methods.

Several novel aspects underpin the PFPN approach:

  • Particle-based Policy Representation: Representing the policy with a set of particles lets the action space be discretized and explored dynamically, potentially capturing multiple modes of good actions and reflecting the multimodal nature of real-world control tasks.
  • Resampling Strategy: To handle particle degeneracy, a common issue in particle filtering, the paper introduces a resampling strategy that reactivates dead particles and preserves the diversity and expressiveness of the policy (see the sketch after this list).
  • Action-Value Based Optimization Compatibility: Despite the reparameterization challenges posed by mixture policies, the framework is shown to extend to off-policy algorithms such as Soft Actor-Critic (SAC), supporting its general applicability.
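As a rough illustration of the resampling step referenced above, the sketch below re-spawns particles whose average mixture weight has collapsed, placing them near alive particles drawn in proportion to their weights. The threshold, noise scale, and in-place update are illustrative assumptions; the paper's actual procedure is more involved (e.g., it must keep the represented policy consistent after resampling), so this is a simplified stand-in rather than the authors' method.

```python
import torch


def resample_particles(loc, log_scale, mean_weights, threshold=1e-3):
    """Simplified resampling sketch for fighting particle degeneracy.

    Particles whose average mixture weight (`mean_weights`, shape (A, K))
    falls below `threshold` are treated as dead and re-spawned near alive
    particles sampled in proportion to their weights. Threshold and noise
    scale are arbitrary choices for illustration only.
    """
    with torch.no_grad():
        for a in range(loc.shape[0]):
            dead = mean_weights[a] < threshold
            if not dead.any() or dead.all():
                continue
            probs = mean_weights[a].clone()
            probs[dead] = 0.0
            # Pick anchor particles among the alive ones, weight-proportionally.
            anchors = torch.multinomial(probs, int(dead.sum()), replacement=True)
            noise = 0.1 * log_scale[a, anchors].exp() * torch.randn(int(dead.sum()))
            loc[a, dead] = loc[a, anchors] + noise
            log_scale[a, dead] = log_scale[a, anchors]
    return loc, log_scale
```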

The findings in this paper have several implications. Theoretically, PFPN offers a more expressive policy representation for RL, which may change how high-dimensional action spaces are handled, particularly in character control and animation. Practically, it points toward greater motion realism and robustness, key requirements in the simulation and gaming industries. Future work could investigate integrating PFPN into broader applications, including robotics, where physical interaction demands both adaptability and precision.

In conclusion, the PFPN framework provides a promising alternative to Gaussian-based policies in RL for continuous control. It sets the stage for learning models that capture complex, multimodal decision landscapes, broadening the horizons for AI-driven animation and control systems. Its ability to integrate with existing RL architectures without changes to the underlying model makes PFPN a versatile and efficient tool for advancing the state of the art in character animation.

Authors (2)
  1. Pei Xu (18 papers)
  2. Ioannis Karamouzas (13 papers)
Citations (2)