- The paper introduces Diffuse-CLoC, a guided diffusion framework that integrates state-action modeling to generate realistic, steerable character motions.
- It employs a transformer-based diffusion architecture with specialized attention mechanisms to overcome the limitations of prior kinematic motion diffusion models and diffusion policies.
- Experimental results demonstrate improved performance in tasks like dynamic obstacle avoidance and navigation without requiring retraining.
"Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control"
The paper introduces Diffuse-CLoC, a guided diffusion framework designed for physics-based character control. By modeling the joint distribution of states and actions, it generates realistic, steerable motions, addressing the limited steerability of diffusion policies and the limited physical realism of kinematic motion diffusion approaches.
Motivation and Approach
The paper aims to address the shortcomings of kinematic motion diffusion models in generating physically feasible motions and the restricted adaptability of diffusion-based control policies for unseen tasks. Diffuse-CLoC effectively merges these paradigms by using a diffusion model that conditions action generation on predicted states, thus maintaining the physical realism of actions while providing intuitive steerability.
Diffuse-CLoC employs a transformer-based diffusion architecture with a specialized attention mechanism—non-causal for states and causal for actions. This design allows dynamic obstacle avoidance and task-space control without requiring a high-level planner. The system can handle a wide variety of tasks, such as static and dynamic obstacle avoidance, motion in-betweening, and more, using a single pretrained model.
Technical Implementation
Diffuse-CLoC's architecture is depicted in Figure 1. It uses a decoder-only transformer that denoises a trajectory represented as separate state and action tokens; a tailored attention mask that gives states and actions different attention scopes is central to the model's performance.
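As a rough sketch of how such a denoiser can drive a character, the loop below jointly denoises a window of future state and action tokens conditioned on the observed past and executes only the first action, receding-horizon style. The names (`model.sample_noise`, `model.denoise`, `obs_history`) are hypothetical stand-ins, not the paper's API.

```python
import torch

@torch.no_grad()
def control_step(model, obs_history, horizon=32, n_denoise=10):
    """One look-ahead control step: jointly denoise a window of future state
    and action tokens conditioned on the observed past, then hand only the
    first action to the physics simulator and replan at the next step."""
    states, actions = model.sample_noise(horizon)          # pure-noise future window
    for k in reversed(range(n_denoise)):
        # Joint denoising: the predicted states form the look-ahead plan, and
        # the actions stay consistent with it because both are denoised together.
        states, actions = model.denoise(states, actions, obs_history, step=k)
    return actions[0]
```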
Attention and Rolling Scheme
The attention strategy plays a critical role. Action tokens use causal attention and attend only to current and past states, which limits the influence of artifacts in predicted future states, while state tokens attend non-causally to past and future states (Figure 2). The rolling inference scheme improves temporal consistency and real-time responsiveness: each timestep in the prediction window is assigned a noise level according to its proximity to execution, so previously denoised information is reused in a FIFO manner rather than recomputed from scratch.
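As a concrete illustration of the split attention scope, the mask below lets state tokens attend to every state token while action tokens attend causally. The token layout (all states followed by all actions) and the choice to also let actions attend to earlier actions are assumptions made for this sketch, not details confirmed by the paper.

```python
import torch

def state_action_attention_mask(horizon: int) -> torch.Tensor:
    """Boolean mask for tokens ordered [s_0..s_{H-1}, a_0..a_{H-1}].
    True = attention allowed (the convention used by
    torch.nn.functional.scaled_dot_product_attention for boolean masks)."""
    H = horizon
    allow = torch.zeros(2 * H, 2 * H, dtype=torch.bool)
    t = torch.arange(H)
    causal = t.unsqueeze(1) >= t.unsqueeze(0)   # row t may see columns 0..t
    allow[:H, :H] = True                        # states: non-causal over all states
    allow[H:, :H] = causal                      # actions: causal over states only
    allow[H:, H:] = causal                      # assumption: actions also see past actions
    return allow

mask = state_action_attention_mask(horizon=4)   # 8x8 mask for a 4-step window
```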
Figure 1: Framework of Diffuse-CLoC, highlighting its architecture, attention mechanism, and rolling buffer implementation.
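The rolling buffer highlighted in Figure 1 can be pictured as a FIFO queue of partially denoised timesteps: near-term entries are almost clean, far-future entries are noisier, and a fresh fully-noised entry is appended every control step. The sketch below assumes a linear noise schedule over the window and a hypothetical `denoise_fn`; it only illustrates the bookkeeping, not the paper's exact implementation.

```python
from collections import deque
import torch

class RollingPlanBuffer:
    """FIFO buffer of per-timestep plan tokens, each with its own noise level."""

    def __init__(self, horizon: int, token_dim: int, max_noise_level: int):
        self.horizon = horizon
        self.token_dim = token_dim
        self.max_noise = max_noise_level
        # Assumed schedule: noise level grows linearly with distance into the future.
        self.buf = deque(
            {"token": torch.randn(token_dim),
             "noise": round(max_noise_level * (i + 1) / horizon)}
            for i in range(horizon)
        )

    def step(self, denoise_fn):
        """Refine the whole window by one denoising step and pop the executed
        timestep. `denoise_fn(tokens, noise_levels)` is a hypothetical call
        into the transformer denoiser."""
        tokens = torch.stack([e["token"] for e in self.buf])
        levels = torch.tensor([e["noise"] for e in self.buf])
        refined = denoise_fn(tokens, levels)
        for entry, tok in zip(self.buf, refined):
            entry["token"] = tok
            entry["noise"] = max(entry["noise"] - 1, 0)
        executed = self.buf.popleft()                        # lowest-noise timestep
        self.buf.append({"token": torch.randn(self.token_dim),
                         "noise": self.max_noise})
        return executed["token"]
```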
Figure 3: Rollouts on the jump task: Kin+PHC (left) suffers from artifacts, while Diffuse-CLoC (right) completes the agile motion robustly.
Experimental Results
Experiments demonstrate Diffuse-CLoC's advantages over decoupled planning-and-tracking baselines such as Kin+PHC. On the jump task, Diffuse-CLoC achieves a significantly higher success rate: its integrated state-action modeling keeps actions consistent with the predicted motion, so it responds effectively even when the predicted kinematic states are perturbed.
In the "forest" task, longer state prediction horizons improve navigation success, as they allow for effective trajectory planning (Figure 4).
Figure 4: Trajectories on the forest task: short horizons cause repetitive patterns, while long horizons yield diverse, successful navigation.
Application Scenarios
Diffuse-CLoC's practical applications include static and dynamic obstacle avoidance, path and waypoint following steered by classifier guidance at inference time, and seamless motion in-betweening, all without additional retraining.
Figure 5: Downstream task rollouts demonstrating Diffuse-CLoC's planning, steering, and motion synthesis capabilities.
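Steering at inference time boils down to nudging the predicted state tokens with the gradient of a task cost during denoising, while the jointly denoised actions keep the motion physically consistent. The following sketch uses placeholder names (`model.denoise`, `cost_fn`, `scale`) and an assumed state layout; it illustrates the classifier-guidance idea rather than the paper's exact procedure.

```python
import torch

def guided_denoise_step(model, states, actions, context, step, cost_fn, scale=1.0):
    """One denoising step with classifier guidance applied to the state tokens.
    `model.denoise`, `cost_fn`, and `scale` are illustrative placeholders."""
    states, actions = model.denoise(states, actions, context, step)
    states = states.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(cost_fn(states), states)
    # Nudge the predicted states downhill on the task cost; the actions follow
    # because they are denoised jointly with the guided states at the next step.
    return (states - scale * grad).detach(), actions

# Example placeholder cost: squared distance of the last predicted root
# position (assumed to be the first three state dimensions) to a goal point.
def goal_cost(states, goal=torch.tensor([2.0, 0.0, 0.9])):
    return ((states[-1, :3] - goal) ** 2).sum()
```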
Conclusion
Diffuse-CLoC presents a substantial advancement in physics-based control by merging the adaptability of kinematic generation with the realism of action diffusion. The unified approach enables a broad array of tasks without retraining, fostering innovations in virtual environments and robotics.
Future research could expand data coverage to improve robustness and explore alternative attention formulations to reduce bias toward previously seen motion patterns. Extending the framework to object interaction also offers rich avenues for AI-driven animation and control systems.