Learning Kernel-Modulated Neural Representation for Efficient Light Field Compression
Abstract: A light field is a type of image data that captures 3D scene information by recording light rays emitted from a scene at various orientations. It offers a more immersive perception than classic 2D images, but at the cost of a huge data volume. In this paper, we draw inspiration from the visual characteristics of the Sub-Aperture Images (SAIs) of a light field and design a compact neural network representation for the light field compression task. The network backbone takes randomly initialized noise as input and is supervised on the SAIs of the target light field. It is composed of two types of complementary kernels: descriptive kernels (descriptors), which store scene description information learned during training, and modulatory kernels (modulators), which control the rendering of different SAIs from the queried perspectives. To further enhance the compactness of the network while retaining high quality in the decoded light field, we introduce modulator allocation and kernel tensor decomposition mechanisms, followed by non-uniform quantization and lossless entropy coding, to form an efficient compression pipeline. Extensive experiments demonstrate that our method outperforms other state-of-the-art (SOTA) methods by a significant margin in the light field compression task. Moreover, after aligning descriptors, the modulators learned from one light field can be transferred to new light fields for rendering dense views, indicating a potential solution for the view synthesis task.
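To make the last stage of the pipeline more concrete, the following is a minimal sketch of non-uniform quantization of a kernel tensor, followed by an estimate of the bit cost an ideal lossless entropy coder would need. It is an illustration only and not the authors' implementation: the abstract does not specify the quantizer, so Lloyd's algorithm (1-D k-means) is swapped in as one common non-uniform scheme, and the names `lloyd_quantize`, `entropy_bits_per_symbol`, the 16-level codebook, and the random `kernel` tensor are all assumptions.

```python
import numpy as np

def lloyd_quantize(weights, n_levels=16, n_iters=20):
    """Fit a non-uniform 1-D codebook with Lloyd's algorithm (assumed scheme)."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    # Start from evenly spaced quantiles so dense value ranges get more levels.
    codebook = np.quantile(w, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iters):
        # Assign every weight to its nearest codebook level.
        idx = np.abs(w[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each level to the mean of the weights assigned to it.
        for k in range(n_levels):
            members = w[idx == k]
            if members.size:
                codebook[k] = members.mean()
    # Final assignment against the converged codebook.
    idx = np.abs(w[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx

def entropy_bits_per_symbol(idx, n_levels):
    """Empirical entropy of the level indices: a lower bound on the
    average code length of any lossless entropy coder."""
    counts = np.bincount(idx, minlength=n_levels).astype(np.float64)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical kernel tensor standing in for trained network weights.
rng = np.random.default_rng(0)
kernel = rng.normal(0.0, 0.05, size=(64, 3, 3))

codebook, idx = lloyd_quantize(kernel, n_levels=16)
dequantized = codebook[idx].reshape(kernel.shape)

bits = entropy_bits_per_symbol(idx, 16)
mse = float(np.mean((kernel - dequantized) ** 2))
print(f"entropy: {bits:.3f} bits/weight, reconstruction MSE: {mse:.3e}")
```

In this sketch only the codebook and the entropy-coded indices would need to be stored. Because the levels follow the weight distribution rather than a uniform grid, the indices are unevenly used, which is what lets entropy coding push the cost below the fixed 4 bits per weight of a 16-level code.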