Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss (2404.02731v1)
Abstract: Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges that the sensor design of event cameras introduces in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing new challenges for RAW domain processes such as demosaicing. The challenge intensifies because most research in the RAW domain assumes that every pixel contains a value, which makes it problematic to adapt these methods directly to event camera demosaicing. To this end, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW domain processing. Our core motivation is to refine a general and widely applicable foundational model from the RGB domain for RAW domain processing, thereby broadening the model's applicability within the entire imaging process. Our method harnesses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computational complexity. We also propose the Pixel-focus Loss function for network fine-tuning, which improves network convergence and is motivated by our discovery of a long-tailed distribution in the training loss. Our method has been validated on the MIPI Demosaic Challenge dataset, and subsequent analytical experiments confirm its efficacy. All code and trained models are available at: https://github.com/yunfanLu/ev-demosaic
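The space-to-depth technique mentioned in the abstract can be illustrated with a minimal NumPy sketch. It shows the generic operation only: each `block x block` spatial neighborhood is folded into the channel dimension, so later layers work on a feature map with `block**2` times fewer spatial positions. The function names `space_to_depth` and `depth_to_space`, the `(H, W, C)` layout, and the block size of 2 are assumptions made for illustration; the abstract does not specify how the released model arranges its channels.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Fold each block x block spatial patch of an (H, W, C) array into channels.

    Returns an array of shape (H // block, W // block, C * block * block).
    Illustrative helper, not taken from the paper's code.
    """
    h, w, c = x.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    # Split both spatial axes into (coarse index, offset inside the block).
    x = x.reshape(h // block, block, w // block, block, c)
    # Move the two in-block offsets next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // block, w // block, block * block * c)


def depth_to_space(y: np.ndarray, block: int = 2) -> np.ndarray:
    """Inverse of space_to_depth: unfold channels back into spatial patches."""
    hb, wb, d = y.shape
    c = d // (block * block)
    y = y.reshape(hb, wb, block, block, c)
    y = y.transpose(0, 2, 1, 3, 4)
    return y.reshape(hb * block, wb * block, c)


# Toy single-channel 4x4 "RAW" frame with values 0..15.
raw = np.arange(16).reshape(4, 4, 1)
packed = space_to_depth(raw, block=2)      # shape (2, 2, 4)
restored = depth_to_space(packed, block=2)  # shape (4, 4, 1)
```

In this toy case the top-left 2x2 patch of `raw` (values 0, 1, 4, 5) becomes the four channels of `packed[0, 0]`, and `depth_to_space` recovers the original frame exactly, so the rearrangement itself loses no information.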