
Image Reconstruction using Enhanced Vision Transformer (2307.05616v1)

Published 11 Jul 2023 in cs.CV

Abstract: Removing noise from images is a challenging and fundamental problem in the field of computer vision. Images captured by modern cameras are inevitably degraded by noise, which limits the accuracy of any quantitative measurements on those images. In this project, we propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting. The proposed model is based on the Vision Transformer (ViT): it takes 2D images as input and outputs embeddings which can be used for reconstructing denoised images. We incorporate four additional optimization techniques in the framework to improve the model's reconstruction capability, namely Locality Sensitive Attention (LSA), Shifted Patch Tokenization (SPT), Rotary Position Embeddings (RoPE) and an adversarial loss function inspired by Generative Adversarial Networks (GANs). LSA, SPT and RoPE enable the transformer to learn from the dataset more efficiently, while the adversarial loss function enhances the resolution of the reconstructed images. Based on our experiments, the proposed architecture outperforms the benchmark U-Net model by more than 3.5% structural similarity (SSIM) for the reconstruction tasks of image denoising and inpainting. The proposed enhancements further show an improvement of ~5% SSIM over the benchmark for both tasks.
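
Since the abstract reports its results in SSIM, a minimal sketch of how that metric is computed may help. The block below is an illustration only, not the authors' evaluation code: it computes a single-window (global) SSIM with NumPy, whereas standard SSIM averages the same statistic over local sliding windows. The function name `global_ssim`, the constants `k1=0.01` and `k2=0.03` (the commonly used defaults), and the synthetic test image are all assumptions made for this example.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two images of equal shape.

    Simplification: statistics are taken over the whole image rather
    than averaged over local windows as in the standard metric.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")

    # Stabilizing constants, scaled by the dynamic range of the pixels.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical data: a random "clean" image and a noisy copy of it.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + rng.normal(0.0, 0.1, clean.shape), 0.0, 1.0)

print("SSIM(clean, clean):", global_ssim(clean, clean))
print("SSIM(clean, noisy):", global_ssim(clean, noisy))
```

An image compared with itself scores 1 (up to floating-point error), and added noise lowers the score, which is why a higher SSIM indicates a better reconstruction.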
