- The paper presents a novel approach that constructs "patch-craft frames", artificial frames tiled from matched patches, to supply additional contextual information for denoising.
- It combines classical patch matching with a separable CNN architecture to exploit spatio-temporal redundancy in video sequences.
- Extensive experiments report consistent PSNR gains over leading video denoising algorithms across a range of noise levels.
Video Denoising by Deep Modeling and Patch Matching
The research paper "Patch Craft: Video Denoising by Deep Modeling and Patch Matching" presents a novel approach to video denoising that combines classical patch-based methods with modern convolutional neural networks (CNNs). Its underlying principle is the non-local self-similarity of video sequences, which the method exploits to remove additive Gaussian noise.
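As background, the degradation model assumed here is additive white Gaussian noise. The snippet below is a minimal illustration of how such noisy inputs are typically synthesized for evaluation; the function name and the [0, 255] intensity convention are assumptions for this sketch, not the authors' code.

```python
import numpy as np

def add_gaussian_noise(frames, sigma, seed=0):
    """Degradation model: y = x + n, with n ~ N(0, sigma^2) i.i.d. per pixel.

    frames : clean video as a float array (e.g. shape (T, H, W), values in [0, 255])
    sigma  : noise standard deviation on the same intensity scale
    """
    rng = np.random.default_rng(seed)
    return frames + sigma * rng.standard_normal(frames.shape)
```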
Methodology Overview
The paper introduces the concept of "patch-craft frames": artificial frames constructed by tiling matched patches. These generated frames are similar, but not identical, to the actual video frames and therefore provide additional contextual information that strengthens denoising. Because video sequences carry strong temporal redundancy, patch matching across frames finds closer neighbors than it can within a single image, which makes the approach particularly valuable compared with traditional image-based methods.
Key Steps in the Approach:
- Patch Matching: Video frames are divided into overlapping patches, and for each patch a set of nearest neighbors is identified within a search window spanning both the spatial and temporal dimensions.
- Patch-Craft Frame Construction: The matched patches are tiled together to form the patch-craft frames, which are then combined with the actual video frames as input to the network (a brute-force sketch of these two steps appears after this list).
- Separable Convolutional Architecture: The augmented frames are fed into a separable convolutional neural network (SepConv) designed to handle the high-dimensional input efficiently, reducing the number of parameters through multi-dimensional separable convolutions (see the separable-block sketch below).
- Two-Stage Filtering: The denoising architecture applies spatial filtering with SepConv layers followed by a temporal filtering stage based on a 3D extension of DnCNN (see the temporal-stage sketch below).
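To make the first two steps concrete, here is a minimal, brute-force sketch of patch matching and patch-craft frame construction. It is not the authors' implementation: the function name `build_patch_craft_frames`, the use of non-overlapping tiles, and the small search window are simplifications chosen for readability (the paper works with overlapping patches and a far more efficient nearest-neighbor search).

```python
import numpy as np

def build_patch_craft_frames(frames, t, patch=7, n_neighbors=3, search=(2, 15)):
    """Tile nearest-neighbor patches into synthetic "patch-craft" frames.

    frames      : (T, H, W) grayscale video, float32
    t           : index of the frame being denoised
    patch       : patch side length
    n_neighbors : neighbors kept per patch; neighbor k fills the k-th crafted frame
    search      : (temporal radius, spatial radius) of the search window
    """
    T, H, W = frames.shape
    Hc, Wc = H - H % patch, W - W % patch            # crop to a multiple of the patch size
    t_rad, s_rad = search
    crafted = np.zeros((n_neighbors, Hc, Wc), dtype=frames.dtype)

    for i in range(0, Hc, patch):
        for j in range(0, Wc, patch):
            ref = frames[t, i:i + patch, j:j + patch]
            candidates = []                           # (distance, patch) from the search window
            for dt in range(max(0, t - t_rad), min(T, t + t_rad + 1)):
                for ii in range(max(0, i - s_rad), min(Hc - patch, i + s_rad) + 1):
                    for jj in range(max(0, j - s_rad), min(Wc - patch, j + s_rad) + 1):
                        if dt == t and ii == i and jj == j:
                            continue                  # exclude the reference patch itself
                        cand = frames[dt, ii:ii + patch, jj:jj + patch]
                        candidates.append((np.sum((cand - ref) ** 2), cand))
            candidates.sort(key=lambda c: c[0])       # closest patches first
            for k in range(n_neighbors):              # k-th neighbor fills the k-th crafted frame
                crafted[k, i:i + patch, j:j + patch] = candidates[k][1]
    return crafted
```

In the paper the crafted frames augment the noisy frame before it enters the network; here they are simply returned for inspection.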
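The following PyTorch sketch illustrates the kind of parameter saving that separable convolutions provide. It is a generic spatially-separable block, not the paper's exact layer: the class name, channel widths, and the factorization into a per-frame spatial convolution plus a 1x1x1 mixing convolution are assumptions made purely for illustration.

```python
import torch.nn as nn

class SeparableBlock(nn.Module):
    """Factorized alternative to a dense 3D convolution over a frame stack.

    A dense k x k x k Conv3d costs c_in * c_out * k^3 weights, whereas the
    factorized version below costs roughly c_in * c_mid * k^2 + c_mid * c_out.
    """
    def __init__(self, c_in, c_mid, c_out, k=3):
        super().__init__()
        # spatial filtering applied independently to every frame in the stack
        self.spatial = nn.Conv3d(c_in, c_mid, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        # 1x1x1 convolution mixing information across frames and channels
        self.mix = nn.Conv3d(c_mid, c_out, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                    # x: (N, c_in, T, H, W)
        return self.act(self.mix(self.spatial(x)))

# rough parameter comparison against a dense 3x3x3 convolution
dense = nn.Conv3d(64, 64, kernel_size=3, padding=1)
sep = SeparableBlock(64, 64, 64)
print(sum(p.numel() for p in dense.parameters()))    # 110,656
print(sum(p.numel() for p in sep.parameters()))      # 41,088
```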
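Finally, a sketch of what a temporal filtering stage in the spirit of a 3D DnCNN extension could look like. The layer count, widths, and the residual (noise-predicting) formulation are illustrative assumptions; only the overall idea of a 3D residual CNN applied after the spatial stage comes from the paper's description.

```python
import torch.nn as nn

class TemporalStage3D(nn.Module):
    """Illustrative 3D residual CNN for the temporal filtering stage."""
    def __init__(self, channels=1, features=64, depth=6):
        super().__init__()
        layers = [nn.Conv3d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv3d(features, features, 3, padding=1),
                       nn.BatchNorm3d(features),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv3d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):            # x: (N, C, T, H, W) spatially filtered frames
        return x - self.body(x)      # residual learning: subtract the predicted noise
```

In a full two-stage pipeline, the spatial stage (built from separable blocks fed with the patch-craft-augmented frames) would run first, and a short window of its outputs would then pass through this temporal stage.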
Key Results and Conclusions
Extensive experiments show that integrating patch-craft frames with CNNs leads to a notable improvement in video denoising performance, outperforming state-of-the-art (SOTA) algorithms such as V-BM4D, VNLB, DVDnet, and FastDVDnet. Across the tested noise levels, the approach consistently delivers higher PSNR values.
Key Implications:
- Enhanced Denoising Accuracy: The patch-craft methodology significantly improves noise reduction accuracy, with PSNR gains ranging from 0.5 dB to 1.2 dB over other leading methods.
- Competitive Edge in CNN Utilization: The architecture leverages the benefits of both classical patch-based methods and contemporary CNNs, a combination often considered challenging because the two paradigms differ so markedly in structure.
Theoretical and Practical Implications
The research builds on the established theoretical premise of self-similarity in both the spatial and temporal dimensions of video content, reimagining its application in the era of deep learning. Practically, the approach suggests pathways for improving video processing in applications such as video streaming, surveillance, and media production, where high-quality video output is critical.
Future Directions
Moving forward, the integration of patch-based methodologies with deep learning architectures holds promise for further breakthroughs in video enhancement technologies. Future research could explore:
- Adaptive Patch Matching: Dynamic adjustment of patch similarity measures based on video content characteristics.
- Real-Time Processing: Optimization for real-time implementations, given the computational demands of multi-dimensional processing.
- Extension to Other Modalities: Investigating the applicability of patch-craft strategies in other modalities such as audio and multispectral imaging.
Overall, the "Patch Craft" approach marks a substantial step towards more effective and efficient video denoising, encapsulating a compelling synergy between traditional image processing techniques and the power of neural networks.