- The paper introduces a novel diffusion-based framework that integrates HOI into pose-guided video generation to improve product video fidelity.
- It employs multi-view feature fusion and a dual-adapter mechanism to decouple and refine human-object appearance and motion details.
- Empirical evaluations show marked gains in Object-IoU and Object-CLIP, alongside lower FVD and FID-VID scores, supporting automated e-commerce content creation.
An Expert Overview of AnchorCrafter: Animate CyberAnchors for Product Video Generation
The paper presents AnchorCrafter, a diffusion-based framework for generating high-fidelity anchor-style product promotion videos, leveraging advancements in human-object interaction (HOI) for improved visual fidelity and interaction awareness. This novel approach addresses a notable gap in automatic video generation, specifically in automating anchor-style promotional content integral to online commerce.
Methodological Contributions
The core novelty of AnchorCrafter lies in its integration of HOI into pose-guided human video generation, an area previously underexplored in video synthesis. AnchorCrafter is centered on two primary components: HOI-appearance perception and HOI-motion injection.
HOI-Appearance Perception: This module refines object appearance perception through multi-view feature fusion and a dual-adapter mechanism. The multi-view object feature fusion captures the object's details from multiple perspectives, thereby enhancing 3D structure fidelity. The human-object dual adapter further ensures decoupled representation of human and object appearances, mitigating artifacts previously observed with traditional embedding methods.
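To make the dual-adapter idea concrete, here is a minimal numpy sketch, not the paper's implementation: human and object reference features are injected into the video latent through two separate cross-attention branches, so the two appearance streams stay decoupled. All function names and the parameter layout are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv, wq, wk, wv):
    # single-head cross-attention: latent queries attend to reference features
    Q, K, V = q @ wq, kv @ wk, kv @ wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V

def dual_adapter(latent, human_feats, obj_feats, params):
    # separate branches (hypothetical layout) keep human and object
    # appearance decoupled, then both residuals condition the latent
    h = cross_attention(latent, human_feats, *params["human"])
    o = cross_attention(latent, obj_feats, *params["object"])
    return latent + h + o
```

In a real diffusion backbone these branches would sit inside each transformer block; the sketch only shows why two adapters avoid the entangled-embedding artifacts the paper describes.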
HOI-Motion Injection: The model adeptly captures interaction dynamics by leveraging trajectory-conditioned depth maps and 3D hand mesh sequences. This innovative approach allows precise control over object trajectories in the video, accommodating complex interaction scenarios like occlusions and overlapping object-human dynamics.
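A trajectory-conditioned depth map can be pictured as rasterizing the object's per-frame position and depth into a coarse conditioning image. The following is a toy sketch under that assumption (the paper's actual rendering pipeline is more involved, and the function name and disk rasterization are illustrative):

```python
import numpy as np

def depth_condition_map(traj_xyz, h=64, w=64, radius=5):
    """Rasterize a coarse per-frame depth map from an object trajectory.

    traj_xyz: (T, 3) array of (x, y, depth), with x and y normalized to [0, 1].
    Returns a (T, h, w) stack where each frame carries the object's depth
    painted as a disk at its projected location.
    """
    maps = np.zeros((len(traj_xyz), h, w), dtype=np.float32)
    yy, xx = np.mgrid[0:h, 0:w]
    for t, (x, y, d) in enumerate(traj_xyz):
        cx, cy = x * w, y * h
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        maps[t][mask] = d
    return maps
```

Stacking such maps with 3D hand-mesh renders gives the model an explicit, frame-aligned signal for where the object and hands should be, which is what lets it handle occlusions and overlapping human-object motion.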
Empirical Evaluation
Quantitative evaluations indicate that AnchorCrafter achieves superior performance relative to existing methods, with marked improvements in Object-IoU and Object-CLIP scores, underscoring its efficacy in maintaining object trajectory and appearance integrity. Experimental results also demonstrate enhanced video quality (lower FVD and FID-VID scores) and improved hand motion accuracy, reflected in a lower landmark mean distance between predicted and reference hand keypoints.
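Object-IoU is a standard intersection-over-union computed on object masks; a minimal reference implementation (assuming binary masks for the generated and reference frames) is:

```python
import numpy as np

def object_iou(mask_gen, mask_ref):
    """IoU between binary object masks of a generated and a reference frame.

    Returns intersection / union; defined as 1.0 when both masks are empty.
    """
    inter = np.logical_and(mask_gen, mask_ref).sum()
    union = np.logical_or(mask_gen, mask_ref).sum()
    return inter / union if union else 1.0
```

Averaged over frames, this rewards videos in which the object stays on its intended trajectory, complementing Object-CLIP, which compares object appearance in embedding space.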
Through extensive qualitative experiments, AnchorCrafter consistently produces videos with realistic human-object interactions that align closely with specified poses, an achievement not matched by current frameworks such as AnimateAnyone and MimicMotion, which treat objects largely as static extensions of the human appearance. User study outcomes further corroborate these findings, with viewers awarding high scores across appearance and motion criteria for AnchorCrafter-generated content.
Implications and Future Directions
AnchorCrafter lays important groundwork for future work in HOI-based video generation. The framework expands the possibilities for automated content creation in e-commerce, enhancing consumer engagement through interactive and realistic product demonstrations. The dual pathways of appearance and motion offer a compelling basis for more nuanced and context-aware video synthesis methods.
Looking forward, the potential for applying AnchorCrafter's principles to a wider set of non-rigid and transparent objects could foster significant advancements in virtual reality and augmented product presentations. Moreover, expanding the model's capability to handle more complex, multi-object environments may further broaden its applicability across dynamic commercial and entertainment domains.
This paper thus contributes a technically robust and methodologically sophisticated framework that significantly advances the state of the art in human-object interactive video generation, with promising implications for theoretical exploration and real-world applications in AI-driven media production.