StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization
The paper "StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization" presents an innovative approach for the stylization of extensive urban digital models using both visual and textual inputs. It proposes a neural methodology tailored to apply artistic and photorealistic styles to large-scale 3D city scenes, demonstrating significant expertise in handling textured mesh stylization.
Methodology
At the core of StyleCity is a neural texture field optimized with a multi-scale progressive strategy, which makes it well suited to rendering large urban environments with photorealistic nuance. Its key components are:
- Neural Texture Field: StyleCity represents scene appearance as a compact UV-based neural texture field, combining a multi-resolution feature grid for UV encoding with a multilayer-perceptron decoder. This keeps the high-dimensional texture data tractable without sacrificing visual fidelity (a minimal sketch follows this list).
- Progressive Optimization: During training, views are rendered at progressively varying scales, from coarse, wide views to fine, close-up ones. This multi-scale schedule yields detailed, high-quality textures across scales and transfers style features more effectively (see the scale-schedule sketch after the list).
- Global and Local Style Optimization: The system combines a global style-harmonization objective with a local semantic style loss. By adapting the style references to the scale of the training views, StyleCity keeps the aesthetics of large 3D scenes globally consistent, while the local semantic loss preserves region-level feature consistency and enhances photorealism (a per-region loss sketch follows the list).
- Omnidirectional Sky Synthesis: To deepen realism and atmosphere, a generative diffusion model synthesizes an omnidirectional sky image consistent with the stylistic elements of the textual and visual references. The synthesized sky not only improves the aesthetic but also serves as semantic support during optimization (an illustrative placeholder follows the list).
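To make the neural texture field concrete, the following is a minimal PyTorch sketch of the general idea: learnable 2D feature grids at several resolutions are sampled at UV coordinates and decoded to RGB by a small MLP. The grid resolutions, feature dimensions, and layer widths are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a UV-based neural texture field: multi-resolution feature
# grids indexed by UV coordinates, decoded to RGB by a small MLP.
# All sizes below are illustrative guesses, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralTextureField(nn.Module):
    def __init__(self, resolutions=(64, 256, 1024), feat_dim=8, hidden=64):
        super().__init__()
        # One learnable 2D feature grid per resolution level.
        self.grids = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(1, feat_dim, r, r)) for r in resolutions]
        )
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim * len(resolutions), hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # RGB
        )

    def forward(self, uv):
        # uv: (N, 2) in [0, 1]; convert to grid_sample's [-1, 1] convention.
        g = (uv * 2.0 - 1.0).view(1, -1, 1, 2)
        feats = [
            F.grid_sample(grid, g, align_corners=True).squeeze(-1).squeeze(0).t()
            for grid in self.grids  # each entry: (N, feat_dim)
        ]
        return torch.sigmoid(self.decoder(torch.cat(feats, dim=-1)))


# Usage: query colors for the UVs of rasterized mesh fragments.
field = NeuralTextureField()
colors = field(torch.rand(4096, 2))  # (4096, 3) RGB in [0, 1]
```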
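The progressive optimization can be pictured as a simple scale schedule over training steps: coarse, wide renders early for global style, and higher-resolution, zoomed-in renders later for fine detail. The phase boundaries and scale values below are illustrative assumptions, not the paper's schedule.

```python
# Toy sketch of a progressive multi-scale schedule for rendered training views.
def render_scale(step, total_steps, scales=(0.25, 0.5, 1.0)):
    """Return the rendering scale to use at a given optimization step."""
    phase = min(len(scales) * step // total_steps, len(scales) - 1)
    return scales[phase]


# Example: a 3000-step run spends 1000 steps at each scale.
print([render_scale(s, 3000) for s in (0, 1500, 2999)])  # [0.25, 0.5, 1.0]
```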
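One common way to realize a local semantic style loss, sketched below, is to match Gram matrices of frozen VGG features separately within each semantic region (e.g., building vs. sky) of the rendered view and the style reference. This is a generic formulation of the idea and not necessarily the paper's exact loss.

```python
# Per-region Gram-matrix style loss over VGG features (generic sketch).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights


def gram(feat):  # feat: (C, N) features restricted to one semantic region
    return feat @ feat.t() / feat.shape[1]


def semantic_style_loss(render_feat, style_feat, render_masks, style_masks):
    """Gram-matrix loss computed separately for each semantic class.

    render_feat, style_feat: (C, H, W) feature maps from a frozen VGG layer.
    render_masks, style_masks: dicts {class_name: (H, W) boolean mask},
    resized to the feature-map resolution.
    """
    loss = 0.0
    for cls in render_masks:
        r = render_feat[:, render_masks[cls]]  # (C, N_render)
        s = style_feat[:, style_masks[cls]]    # (C, N_style)
        if r.numel() and s.numel():
            loss = loss + F.mse_loss(gram(r), gram(s))
    return loss


# Frozen VGG-16 features (up to relu3_3) as a perceptual backbone.
features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
```

In practice the masks would come from a semantic segmentation of the rendered view and of the style image, downsampled to the VGG feature resolution.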
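For the sky component, the snippet below is only a placeholder showing where a text-conditioned diffusion model plugs in: a generic Hugging Face diffusers call with an illustrative model name and prompt. Producing a seamless omnidirectional (equirectangular) sky, as the paper does, requires a dedicated synthesis procedure beyond this sketch.

```python
# Placeholder: generic text-to-image diffusion call standing in for the
# paper's omnidirectional sky synthesis. Model name, prompt, and resolution
# are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

sky = pipe(
    "a dramatic sunset sky with scattered clouds, wide panorama",
    height=512, width=1024,  # wide aspect ratio as a stand-in for a panorama
).images[0]
sky.save("sky_panorama.png")
```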
Experimental Results
Through a range of experiments, the authors demonstrate the qualitative and quantitative advantages of their method over existing stylization techniques, reporting higher user preference and stronger performance metrics than methods such as StyleMesh, ARF, and Instruct-NeRF2NeRF. StyleCity is particularly strong at maintaining semantic style consistency and reproducing high-fidelity detail.
Implications and Future Work
The implications of this research are manifold. Practically, it provides automated, aesthetically controlled transformations of 3D city scenes, with applications in virtual reality, game design, and urban visualization. On the theoretical side, it contributes to neural rendering and texture synthesis, opening new avenues for high-resolution, semantically consistent stylization in an urban context.
The authors speculate that future work could scale the method toward more immersive experiences and integrate additional sensory cues, such as acoustic or climatic effects, to enrich environmental realism in virtual simulations. Improving computational efficiency for even larger scenes also remains an open direction.
In summary, StyleCity offers a comprehensive framework that couples neural texture fields with modern vision-and-text style guidance, striking a considered balance between artistic flexibility and computational efficiency.