- The paper introduces the Dual-Camera Smooth Zoom (DCSZ) task and the ZoomGS method, which constructs camera-specific 3D Gaussian models for smooth zoom transitions.
- It fine-tunes frame interpolation models on a synthetic dataset generated by a tailored data factory to remove the discontinuities of dual-camera zooming.
- Experiments show significant improvements in zoom smoothness on both synthetic and real-world data, enhancing the mobile photography experience.
Dual-Camera Smooth Zoom: Enhancing Mobile Zoom Experience through Frame Interpolation and 3D Reconstruction
Overview of the Work
Mobile phones commonly pair an ultra-wide (UW) camera with a wide (W) camera to implement zoom, but switching between the two during zoom-in produces noticeable jumps in geometric content and color that significantly detract from the user experience. Recognizing this, the paper introduces a new task, Dual-Camera Smooth Zoom (DCSZ), which targets smooth zoom previews free of these abrupt transitions.
Core Contributions
- Novel Task Introduction (DCSZ): The paper formulates dual-camera smooth zoom as a new task: transitioning seamlessly between the dual cameras of a mobile device so that the zoom preview is fluid, without the jarring jumps in content and color that users currently experience.
- Dual-Camera Smooth Zoom Gaussian Splatting (ZoomGS): At the heart of addressing DCSZ is the ZoomGS approach. It uses a camera-specific encoding to construct a distinct 3D model for each virtual camera positioned between the UW and W cameras, enabling the rendering of intermediate frames for a smooth zoom effect (a minimal sketch of this encoding-conditioned mapping follows the list). Frame interpolation models are then fine-tuned on synthetic data generated with the proposed data factory.
- Synthetic Dataset and Real-world Evaluation: Since ground-truth data for DCSZ is hard to acquire, the authors build a synthetic dataset with the proposed data factory and additionally collect real-world dual-zoom image sets for comprehensive evaluation.
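To make the camera-specific encoding concrete, below is a minimal sketch of the idea, not the authors' implementation: an MLP, conditioned on a scalar encoding c in [0, 1] (0 for UW, 1 for W), maps the base (UW) Gaussian model to a camera-specific one. All names, dimensions, and the residual formulation are illustrative assumptions.

```python
# Minimal sketch of ZoomGS's camera-specific encoding (not the authors' code).
import torch
import torch.nn as nn

class CameraTransition(nn.Module):
    def __init__(self, feat_dim=59, hidden=128):
        super().__init__()
        # feat_dim covers per-Gaussian parameters (e.g., position, scale,
        # rotation, opacity, SH color coefficients); +1 for the encoding.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, base_gaussians: torch.Tensor, c: float) -> torch.Tensor:
        # base_gaussians: (N, feat_dim) parameters of the base (UW) model.
        enc = base_gaussians.new_full((base_gaussians.shape[0], 1), c)
        offsets = self.mlp(torch.cat([base_gaussians, enc], dim=-1))
        # Scaling by c makes c = 0 reproduce the base model exactly.
        return base_gaussians + c * offsets
```

Sweeping c from 0 to 1 then yields a continuum of camera-specific 3D models between the UW and W cameras, which the data factory can render to produce intermediate frames.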
Technical Insights
The paper first examines why existing frame interpolation methods fail when applied directly to DCSZ: there is a gap between the motion domain of their training data and that of dual-camera data. To bridge it, the authors generate DCSZ-suited training data by synthesizing continuous virtual cameras with ZoomGS. Its camera-specific encoding decouples the scene's geometric content from camera-dependent characteristics (e.g., field of view and color), so realistic intermediate frames can be rendered for any camera along the zoom path; a sketch of the camera-parameter interpolation involved is given below.
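As a concrete illustration of synthesizing a continuous virtual camera between the calibrated UW and W cameras, the sketch below interpolates the calibrated parameters: rotations via SLERP, translations and intrinsics linearly. The exact interpolation scheme is an assumption for illustration, not quoted from the paper.

```python
# Illustrative sketch of a virtual camera between the UW and W cameras.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_camera(K0, R0, t0, K1, R1, t1, alpha):
    """alpha in [0, 1]: 0 gives the UW camera, 1 the W camera.
    K: 3x3 intrinsics, R: 3x3 rotation, t: 3-vector translation."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    R = slerp(alpha).as_matrix()                       # spherical interpolation
    t = (1 - alpha) * np.asarray(t0) + alpha * np.asarray(t1)
    K = (1 - alpha) * np.asarray(K0) + alpha * np.asarray(K1)  # lerped focal length models the zoom
    return K, R, t
```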
Building the Data Factory
The data factory is a pipeline of three essential steps: data preparation, 3D model construction via ZoomGS, and data generation. Its output is a synthetic dataset ready for fine-tuning frame interpolation models. Specifically:
- Data Preparation: Involves capturing multi-view dual-camera images and calibrating extrinsic and intrinsic camera parameters.
- ZoomGS Modeling: ZoomGS constructs camera-specific 3D models by introducing a camera-specific encoding that disentangles scene geometry from camera-dependent traits.
- Synthetic Data Generation: Virtual cameras are obtained by interpolating the parameters of the real UW and W cameras, and synthetic zoom sequences are rendered from them, yielding a dataset suited to fine-tuning frame interpolation models for the DCSZ task (a sketch of assembling training samples from such a sequence follows the list).
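A hedged sketch of that final step: packaging a rendered virtual zoom sequence into supervised samples for a frame interpolation model. The (start, end, t, target) format is a standard FI training setup assumed here, not quoted from the paper.

```python
# Turn a rendered zoom sequence into frame interpolation training samples.
import numpy as np

def make_triplets(zoom_sequence):
    """zoom_sequence: list of (H, W, 3) frames rendered from virtual cameras,
    where index 0 is the real UW view and index -1 is the real W view."""
    n = len(zoom_sequence)
    samples = []
    for i in range(1, n - 1):
        t = i / (n - 1)  # normalized zoom position of the intermediate frame
        samples.append((zoom_sequence[0], zoom_sequence[-1], t, zoom_sequence[i]))
    return samples

# Example with dummy frames: 9 rendered views -> 7 supervised samples.
frames = [np.zeros((256, 256, 3), np.float32) for _ in range(9)]
samples = make_triplets(frames)
```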
Results and Implications
Extensive experiments compare fine-tuned frame interpolation models with their original counterparts, on both the synthetic dataset and real-world captures. The fine-tuned models show significant improvements, validating that the data factory bridges the domain gap between generic training data and dual-camera zooming (a toy sketch of the fine-tuning step is given below).
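For completeness, a toy sketch of what fine-tuning on such triplets looks like. The small ConvNet below is only a stand-in for a real pretrained frame interpolation network (the paper fine-tunes existing models rather than training from scratch); all names and the plain L1 loss are illustrative assumptions.

```python
# Toy fine-tuning step for a frame interpolation model on synthetic triplets.
import torch
import torch.nn as nn

class ToyFINet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two RGB frames plus a time channel in, one predicted frame out.
        self.body = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, f0, f1, t):
        tmap = torch.full_like(f0[:, :1], t)  # broadcast t spatially
        return self.body(torch.cat([f0, f1, tmap], dim=1))

net = ToyFINet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
# One step on a dummy batch; in practice the batch comes from the triplets above.
f0, f1, gt = (torch.rand(2, 3, 64, 64) for _ in range(3))
pred = net(f0, f1, t=0.5)
loss = (pred - gt).abs().mean()  # L1 reconstruction against the rendered frame
opt.zero_grad(); loss.backward(); opt.step()
```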
Looking Forward
The paper's approach to dual-camera smooth zoom opens avenues for further work on enhancing mobile photography, especially zoom functionality. ZoomGS and the data-factory concept for generating task-specific training data set a foundation for future research in this domain. The advances also reach beyond photography: video recording and live streaming stand to benefit just as much from the seamless integration of multi-lens mobile camera systems.