
Imagine360: Immersive 360 Video Generation from Perspective Anchor (2412.03552v1)

Published 4 Dec 2024 in cs.CV

Abstract: $360^\circ$ videos offer a hyper-immersive experience that allows the viewers to explore a dynamic scene from full 360 degrees. To achieve more user-friendly and personalized content creation in $360^\circ$ video format, we seek to lift standard perspective videos into $360^\circ$ equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-$360^\circ$ video generation framework that creates high-quality $360^\circ$ videos with rich and diverse motion patterns from video anchors. Imagine360 learns fine-grained spherical visual and motion patterns from limited $360^\circ$ video data with several key designs. 1) Firstly we adopt the dual-branch design, including a perspective and a panorama video denoising branch to provide local and global constraints for $360^\circ$ video generation, with motion module and spatial LoRA layers fine-tuned on extended web $360^\circ$ videos. 2) Additionally, an antipodal mask is devised to capture long-range motion dependencies, enhancing the reversed camera motion between antipodal pixels across hemispheres. 3) To handle diverse perspective video inputs, we propose elevation-aware designs that adapt to varying video masking due to changing elevations across frames. Extensive experiments show Imagine360 achieves superior graphics quality and motion coherence among state-of-the-art $360^\circ$ video generation methods. We believe Imagine360 holds promise for advancing personalized, immersive $360^\circ$ video creation.

Summary

  • The paper introduces a novel framework that generates 360° videos from standard perspective inputs using a dual-branch design to capture both global and local motion dynamics.
  • It utilizes antipodal masking and elevation-aware training to manage long-range dependencies and camera elevation variations for more naturalistic video patterns.
  • Experimental evaluation shows that Imagine360 outperforms existing methods in frame quality and motion coherence, highlighting its potential for immersive content creation.

Overview of Imagine360: Immersive 360 Video Generation from Perspective Anchor

The paper introduces Imagine360, a novel framework designed to advance the generation of 360-degree videos from standard perspective video inputs. This approach addresses the growing demand for personalized and immersive video content driven by the proliferation of head-mounted spatial computing systems. Unlike existing models requiring panoramic optical flow or high-quality panoramic images, Imagine360 employs easily accessible perspective videos as anchors to generate 360-degree equirectangular videos, offering a more user-friendly solution.

Three components form the core of Imagine360's methodology:

  1. Dual-Branch Design: Imagine360 pairs a panorama denoising branch with a perspective denoising branch, which supply global and local constraints for video generation, respectively. The motion module and spatial LoRA layers are fine-tuned to adapt to panoramic video patterns. The two branches share noisy conditions while keeping domain-specific tuning, which is intended to narrow the domain gap between perspective and panoramic videos.
  2. Antipodal Masking: Imagine360 employs an antipodal mask to capture long-range motion dependencies, in particular the reversed camera motion between antipodal pixels in opposite hemispheres. Antipodal pixels are pairs that sit at diametrically opposite points on the sphere, so they lie far apart in the equirectangular frame even though their motion is closely related. Emphasizing these pairs helps the model learn motion patterns that stay consistent across the whole sphere.
  3. Elevation-Aware Design: The authors introduce elevation-aware training and inference designs to handle varying camera elevations. When a perspective video is mapped into the equirectangular frame, a change in elevation changes the shape of the video mask, so the model must adapt to masks that vary across frames. The methodology combines elevation data augmentation during training with elevation estimation at inference.
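
The geometry behind items 2 and 3 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `antipodal_index` and `perspective_mask`, the pixel-center convention, the square field of view, and the axis convention (x right, y up, z forward) are all assumptions made here for illustration. The first function finds the antipodal partner of an equirectangular pixel; the second marks which equirectangular pixels a pinhole camera covers at a given elevation.

```python
import math


def antipodal_index(row, col, height, width):
    """Return the (row, col) of the pixel antipodal to (row, col).

    Assumed convention: pixel centers sit at
    lon = (col + 0.5) / width * 2*pi - pi and
    lat = pi/2 - (row + 0.5) / height * pi, with an even width.
    The antipode flips the latitude and shifts the longitude by pi.
    """
    return height - 1 - row, (col + width // 2) % width


def perspective_mask(height, width, fov_deg, elevation_deg):
    """Mark equirectangular pixels covered by a camera pitched by elevation_deg.

    Assumes a pinhole camera at the sphere center with a square field of
    view, looking along +z before the pitch is applied.
    """
    half_tan = math.tan(math.radians(fov_deg) / 2)
    e = math.radians(elevation_deg)
    # Camera axes after pitching up by e about the x axis (right stays (1, 0, 0)).
    forward = (0.0, math.sin(e), math.cos(e))
    up = (0.0, math.cos(e), -math.sin(e))
    mask = []
    for row in range(height):
        lat = math.pi / 2 - (row + 0.5) / height * math.pi
        line = []
        for col in range(width):
            lon = (col + 0.5) / width * 2 * math.pi - math.pi
            # Unit viewing direction of this equirectangular pixel.
            d = (math.cos(lat) * math.sin(lon),
                 math.sin(lat),
                 math.cos(lat) * math.cos(lon))
            x_cam = d[0]
            y_cam = d[1] * up[1] + d[2] * up[2]
            z_cam = d[1] * forward[1] + d[2] * forward[2]
            # Visible if in front of the camera and inside the image plane bounds.
            visible = (z_cam > 0
                       and abs(x_cam) <= half_tan * z_cam
                       and abs(y_cam) <= half_tan * z_cam)
            line.append(visible)
        mask.append(line)
    return mask
```

Under these assumptions, `perspective_mask(8, 16, 90, 0)` covers a block around the center of the frame, while `perspective_mask(8, 16, 90, 90)` covers the entire top row, which shows how strongly the mask shape depends on elevation. How the paper uses antipodal pairs inside its attention layers is not specified in this summary, so the sketch stops at the index mapping.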

Empirical Evaluation: The experiments reported in the paper indicate that Imagine360 outperforms existing state-of-the-art 360-degree video generation methods in frame quality, motion coherence, and video quality assessment scores. The framework can also be applied to panorama image outpainting, which suggests it adapts beyond video generation alone.

Implications and Future Directions: Imagine360 has potential applications in entertainment, education, and communication. Practically, it lowers the barrier to immersive content creation because it starts from ordinary perspective video. Theoretically, it contributes to cross-domain adaptation and spherical video representation. One limitation is its reliance on general-purpose, off-the-shelf elevation estimation; addressing this could improve accuracy and broaden the range of applications, possibly extending to real-time video processing and generation in diverse environments.

Imagine360 opens avenues for further innovation in AI-driven content generation, with implications that could transform immersive media landscapes, enhancing user experience in virtual reality and augmented reality applications.
