- The paper presents a dual-mode IoT camera system that alternates between low-power grayscale video and periodic high-resolution color images to minimize energy use.
- It employs an attention-based deep neural network with bidirectional recurrent architecture, grid propagation, and deformable convolutions for effective video super-resolution and colorization.
- Evaluation shows up to 7x energy savings, PSNR gains of 3.7 dB (grayscale) and 5.6 dB (RGB) over state-of-the-art methods, and a real-time inference latency of 54 ms.
NeuriCam: Key-Frame Video Super-Resolution and Colorization for IoT Cameras
The paper "NeuriCam: Key-Frame Video Super-Resolution and Colorization for IoT Cameras" introduces an innovative approach to video capture using low-power dual-mode IoT camera systems. The authors propose a system whereby low-resolution, grayscale video is continuously captured in real-time by a low-power camera, complemented by periodic, high-resolution color images from a higher-power camera. This approach aims to strike a balance between preserving image quality and minimizing energy consumption, which is critical for battery-powered devices such as IoT cameras.
Key Innovations and Methodology
The core innovation in this work is the dual-mode camera system, which alternates between two modes: a low-power mode that consumes roughly 1.1 mW while capturing noisy, grayscale QQVGA-resolution video, and a high-power mode that consumes roughly 100 mW and captures periodic full-color VGA key frames. The system saves energy by duty-cycling the high-resolution color sensor down to one frame per second, while the continuous low-resolution stream preserves video continuity.
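To make the energy argument concrete, the sketch below estimates average power under a simple duty-cycling model. The 1.1 mW and 100 mW figures come from the paper; the 30 fps continuous frame rate and the assumption that the color sensor's power scales linearly with its duty cycle are simplifications for illustration only.

```python
# Back-of-the-envelope average-power estimate for the dual-mode design.
# Power figures are from the paper; frame rates and the linear duty-cycle
# scaling are illustrative assumptions, not measurements.

LOW_POWER_MW = 1.1      # grayscale QQVGA camera, always on
HIGH_POWER_MW = 100.0   # color VGA camera, duty-cycled
VIDEO_FPS = 30          # assumed continuous video rate
KEYFRAME_FPS = 1        # key frames captured once per second (from the paper)

# Fraction of time the high-power sensor is active if it only wakes up
# long enough to grab one frame out of every VIDEO_FPS frames.
duty_cycle = KEYFRAME_FPS / VIDEO_FPS

avg_power_dual = LOW_POWER_MW + HIGH_POWER_MW * duty_cycle
avg_power_always_on = HIGH_POWER_MW

print(f"dual-mode average power : {avg_power_dual:.1f} mW")
print(f"always-on color camera  : {avg_power_always_on:.1f} mW")
print(f"idealized reduction     : {avg_power_always_on / avg_power_dual:.1f}x")
# Real systems also pay for sensor wake-up, readout, and radio transmission,
# which is why the paper's measured end-to-end saving is roughly 7x rather
# than this idealized figure.
```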
The NeuriCam system incorporates a novel deep learning-based neural network for video super-resolution and colorization, utilizing a bidirectional recurrent network architecture. Key contributions include:
- Attention Feature Filter Mechanism: An attention-based feature filter assigns weights to propagated features according to their spatial correlation with the current input, improving the accuracy of both super-resolution and colorization (a minimal sketch follows this list).
- Grid Propagation and Deformable Convolutional Networks: These let the super-resolution network propagate and align temporal information across frames, even under large inter-frame motion.
- Homographic Transformation: A perspective-alignment step corrects the viewpoint mismatch between the two physically separate cameras (an alignment example also follows this list).
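The attention feature filter is described here only at a high level; the PyTorch sketch below illustrates the general idea of weighting key-frame features by their per-pixel correlation with the current frame's features. The module name, layer sizes, and the cosine-similarity weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFeatureFilter(nn.Module):
    """Minimal sketch of a correlation-based feature filter.

    Features propagated from a key frame are weighted by how well they
    spatially correlate with the current low-resolution frame's features,
    so mismatched information is attenuated before fusion. Names and
    sizes here are illustrative, not the paper's exact architecture.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, current_feat: torch.Tensor, keyframe_feat: torch.Tensor) -> torch.Tensor:
        # Per-pixel cosine similarity over the channel dimension,
        # mapped to a (0, 1) attention weight.
        corr = F.cosine_similarity(current_feat, keyframe_feat, dim=1, eps=1e-6)
        attn = torch.sigmoid(corr).unsqueeze(1)   # shape: (N, 1, H, W)

        # Down-weight key-frame features where they disagree with the
        # current frame, then fuse the two feature maps.
        filtered = attn * keyframe_feat
        return self.fuse(current_feat + filtered)

# Example usage with dummy feature maps:
# f_cur, f_key = torch.randn(1, 64, 30, 40), torch.randn(1, 64, 30, 40)
# out = AttentionFeatureFilter(64)(f_cur, f_key)   # -> (1, 64, 30, 40)
```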
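For the homographic transformation, the paper's exact calibration procedure is not reproduced here; the snippet below shows a standard OpenCV keypoint-matching and RANSAC homography pipeline as one plausible way to warp a color key frame into the grayscale camera's viewpoint. The function and variable names are illustrative.

```python
import cv2
import numpy as np

def align_color_to_gray(color_img: np.ndarray, gray_img: np.ndarray) -> np.ndarray:
    """Warp the color key frame into the grayscale camera's viewpoint.

    Textbook ORB + RANSAC homography estimation; the paper's alignment
    step may differ, but this is the standard way to correct the
    perspective mismatch between two physically separate sensors.
    """
    orb = cv2.ORB_create(1000)
    color_gray = cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY)

    kp1, des1 = orb.detectAndCompute(color_gray, None)
    kp2, des2 = orb.detectAndCompute(gray_img, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly estimate the 3x3 homography and warp the color frame.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = gray_img.shape[:2]
    return cv2.warpPerspective(color_img, H, (w, h))
```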
Evaluation and Results
The dual-camera setup reduces energy consumption by 7x compared to existing systems, while the network achieves PSNR gains of 3.7 dB in grayscale and 5.6 dB in RGB over state-of-the-art super-resolution methods. Performance is validated on standard datasets (Vimeo-90K, Vid4, UDM10, REDS4) as well as in real-world scenarios using hardware prototypes. The model's 54 ms inference time on an Nvidia RTX 2080 Ti makes it suitable for real-time applications.
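For reference, the reported gains use the standard peak-signal-to-noise-ratio metric; a minimal implementation for 8-bit images is sketched below (the dB figures above come from the paper, not from this code).

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images (higher is better).

    A gain of several dB, as reported in the paper, corresponds to a large
    reduction in mean squared reconstruction error.
    """
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```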
Implications and Future Prospects
This work has significant implications for energy-efficient video processing in IoT applications. By offloading the computationally intensive reconstruction to less resource-constrained devices such as routers or edge servers, the paper outlines a path for addressing the power constraints typical of mobile and IoT deployments. Although inference power and latency remain critical for real-time communication, the proposed design fits the practical constraints commonly encountered in edge computing environments.
Potential future directions include refining the network architecture to further improve real-time performance, or adapting the methodology to application scenarios such as autonomous vehicles and augmented reality that demand fast, accurate video processing. Given the dual-camera design, similar networks could also be applied to sensing modalities beyond video capture.
In conclusion, NeuriCam demonstrates a compelling approach to the power challenges inherent in IoT video capture, showing how neural network strategies can be paired with low-power hardware to meet real-world application requirements. The insights and methods presented hold promise for further work on efficient resource utilization in IoT.