NLUT: Neural-based 3D Lookup Tables for Video Photorealistic Style Transfer (2303.09170v2)

Published 16 Mar 2023 in cs.CV and eess.IV

Abstract: Video photorealistic style transfer aims to generate videos whose photorealistic style matches that of a style image while maintaining temporal consistency. However, existing methods obtain stylized video sequences by performing photorealistic style transfer frame by frame, which is inefficient and does not ensure the temporal consistency of the stylized video. To address this issue, we use neural network-based 3D Lookup Tables (LUTs) for the photorealistic transfer of videos, achieving a balance between efficiency and effectiveness. We first train a neural network for generating photorealistic stylized 3D LUTs on a large-scale dataset; then, when performing photorealistic style transfer for a specific video, we select a keyframe from the video together with the style image as the data source and fine-tune the neural network; finally, we query the 3D LUTs generated by the fine-tuned neural network for the colors in the video, resulting in super-fast photorealistic style transfer: even processing 8K video takes less than 2 milliseconds per frame. The experimental results show that our method not only realizes the photorealistic style transfer of arbitrary style images but also outperforms the existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/NLUT_Project.


Summary

The paper introduces a neural network-based 3D LUT method that enables efficient photorealistic style transfer in videos with high temporal consistency.
It employs a two-stage training process combining large-scale pre-training and video-specific fine-tuning to refine style mapping.
The authors report processing 8K video in under 2 ms per frame, along with better visual quality and temporal consistency than prior techniques.

The paper "NLUT: Neural-based 3D Lookup Tables for Video Photorealistic Style Transfer" proposes a method for photorealistic video style transfer that is both efficient and effective while maintaining temporal consistency. Existing techniques stylize each frame independently, which is slow and can introduce temporal inconsistencies such as flicker. This paper addresses these issues with neural network-based 3D Lookup Tables (LUTs).

Here is a detailed summary of their method and key contributions:

Key Contributions and Methodology

Neural Network-Based 3D LUTs:
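- The core idea is to have a neural network generate a 3D LUT, a lattice over the RGB color cube that maps each input color to a stylized output color, and to use that LUT for photorealistic style transfer. Because stylizing a pixel then reduces to a table lookup (with interpolation between lattice points), applying the style is very cheap.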
Training Phase:
- First, a neural network is trained on a large-scale dataset to generate photorealistic stylized 3D LUTs. This pre-training teaches the model to produce, for an arbitrary style image, a LUT that gives the content the corresponding photorealistic look.
Fine-Tuning for Specific Videos:
- For a given video, the system selects a keyframe from the video and pairs it with the target style image. The pre-trained network is fine-tuned on this pair, refining the generated LUT for the specific video content.
Efficient Querying and Application:
- Once fine-tuned, the network generates a 3D LUT that is queried with the colors of every frame in the video. Because each pixel requires only a lookup, this step is extremely fast: the authors report less than 2 milliseconds per frame for 8K video.
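The lookup step can be sketched in NumPy to show why it is so cheap. This is a minimal illustration, not the authors' implementation: it assumes a LUT stored as a (size, size, size, 3) float array over RGB values in [0, 1], trilinear interpolation between lattice points, and a 33-point lattice; the helper names `identity_lut` and `apply_lut` are hypothetical.

```python
import numpy as np


def identity_lut(size: int) -> np.ndarray:
    """Identity 3D LUT of shape (size, size, size, 3), indexed as lut[r, g, b]."""
    grid = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)


def apply_lut(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) float image with values in [0, 1] through a 3D LUT
    using trilinear interpolation between the 8 surrounding lattice points."""
    size = lut.shape[0]
    # Continuous lattice coordinates of each pixel's (r, g, b) color.
    coords = np.clip(image, 0.0, 1.0) * (size - 1)
    # Lower corner of the enclosing cell, capped so that lower + 1 stays in range.
    lower = np.minimum(np.floor(coords).astype(np.int64), size - 2)
    frac = coords - lower  # position inside the cell, each component in [0, 1]
    out = np.zeros(image.shape, dtype=np.float64)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                # Trilinear weight of this corner: product of the per-axis weights.
                w = ((frac[..., 0] if dr else 1 - frac[..., 0])
                     * (frac[..., 1] if dg else 1 - frac[..., 1])
                     * (frac[..., 2] if db else 1 - frac[..., 2]))
                corner = lut[lower[..., 0] + dr, lower[..., 1] + dg, lower[..., 2] + db]
                out += w[..., None] * corner
    return out


# Usage: the identity LUT reproduces the frame; an edited LUT recolors it.
rng = np.random.default_rng(0)
frame = rng.random((4, 6, 3))   # stand-in for one video frame
same = apply_lut(frame, identity_lut(33))
warm = identity_lut(33)
warm[..., 2] *= 0.9             # hypothetical "style": damp the blue channel
styled = apply_lut(frame, warm)
```

A 33-point lattice stores only 33³ = 35,937 output colors; every other color is interpolated from its 8 neighbors. The per-pixel cost is therefore constant and independent of how large the LUT-generating network is, and every pixel is processed independently, which maps naturally onto GPUs.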

Experimental Results

The authors report that their method achieves high visual quality and temporal consistency. Because a single LUT is generated once and then applied to every frame, the same input color is always mapped to the same output color; this avoids the cost and the flicker of running a stylization network on each frame independently.
According to their experiments, the method supports arbitrary style images and outperforms existing methods in both visual quality and temporal consistency.
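The consistency argument can be checked with a toy example. The sketch below compares a fixed LUT against a hypothetical frame-dependent transfer (a per-frame mean shift, chosen only as a simple stand-in for methods whose output depends on the whole frame, not as any specific prior method). For simplicity it uses nearest-lattice-point lookup rather than interpolation, and the LUT, frame sizes, and target colors are all invented for illustration.

```python
import numpy as np


def apply_lut_nearest(frame: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Look up each pixel at its nearest LUT lattice point (simplified; LUT
    pipelines usually interpolate). Output depends only on the pixel's color."""
    size = lut.shape[0]
    idx = np.rint(np.clip(frame, 0.0, 1.0) * (size - 1)).astype(np.int64)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]


def per_frame_mean_shift(frame: np.ndarray, target_mean: np.ndarray) -> np.ndarray:
    """Hypothetical frame-dependent transfer: shift colors so the frame's mean
    matches target_mean. A pixel's output depends on every other pixel."""
    return np.clip(frame - frame.mean(axis=(0, 1)) + target_mean, 0.0, 1.0)


# Hypothetical stylized LUT on a 17-point lattice: boost red, damp blue.
size = 17
grid = np.linspace(0.0, 1.0, size)
r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
lut = np.stack([np.clip(1.1 * r, 0.0, 1.0), g, 0.9 * b], axis=-1)

# Two consecutive frames: the top half is static, the bottom half changes.
rng = np.random.default_rng(0)
static = rng.random((8, 16, 3))
frame_a = np.concatenate([static, rng.random((8, 16, 3))], axis=0)
frame_b = np.concatenate([static, rng.random((8, 16, 3))], axis=0)

lut_a, lut_b = apply_lut_nearest(frame_a, lut), apply_lut_nearest(frame_b, lut)
target = np.array([0.6, 0.5, 0.4])
shift_a = per_frame_mean_shift(frame_a, target)
shift_b = per_frame_mean_shift(frame_b, target)

# Largest change in the static region between the two stylized frames.
lut_drift = np.abs(lut_a[:8] - lut_b[:8]).max()
shift_drift = np.abs(shift_a[:8] - shift_b[:8]).max()
print(f"LUT drift in static region:       {lut_drift:.4f}")
print(f"Per-frame drift in static region: {shift_drift:.4f}")
```

With the fixed LUT the static region is stylized identically in both frames, while the frame-dependent transfer changes it whenever the rest of the frame changes, which is what appears as flicker in video.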

Practical Impact

The approach's efficiency makes it particularly suited to high-resolution video, including 8K, addressing both the speed and quality requirements of modern video production.
By ensuring temporal consistency and maintaining high visual quality, this method could be highly impactful in fields such as video editing, movie production, and real-time visual effects in gaming and virtual reality.
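To put the reported speed in perspective, the throughput implied by "less than 2 ms per 8K frame" can be computed directly. This assumes "8K" means 8K UHD (7680 × 4320); because 2 ms is an upper bound, the results are lower bounds. The calculation covers only the per-frame lookup: the one-time fine-tuning and LUT generation are not included. The helper name `lut_throughput` is hypothetical.

```python
def lut_throughput(width: int, height: int, ms_per_frame: float):
    """Pixels per frame, frames per second, and pixel lookups per second
    implied by a given per-frame latency."""
    pixels = width * height
    fps = 1000.0 / ms_per_frame
    return pixels, fps, pixels * fps


# Assumption: "8K" means 8K UHD (7680 x 4320); 2 ms is the reported upper bound.
pixels, fps, lookups_per_s = lut_throughput(7680, 4320, 2.0)
print(f"{pixels:,} pixels/frame, >= {fps:.0f} fps, >= {lookups_per_s:.3e} lookups/s")
```

That is roughly 33.2 million pixels per frame, at least 500 frames per second, and about 1.66 × 10¹⁰ color lookups per second, far inside the roughly 16.7 ms per-frame budget of 60 fps playback.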

Additional Resources

The authors provide further details and resources on their project page: https://semchan.github.io/NLUT_Project

In summary, the paper makes a significant step forward in video photorealistic style transfer by introducing a neural-based 3D LUT approach that efficiently generates and applies stylized LUTs while maintaining high visual quality and temporal consistency.
