Convolutional Neural Network (CNN) vs Vision Transformer (ViT) for Digital Holography (2108.09147v4)

Published 20 Aug 2021 in cs.CV and eess.IV

Abstract: In Digital Holography (DH), extracting the object distance from a hologram is crucial for reconstructing the object's amplitude and phase. This step, called auto-focusing, is conventionally solved by first reconstructing a stack of images and then evaluating the sharpness of each reconstructed image with a focus metric such as entropy or variance. The distance corresponding to the sharpest image is taken as the focal position. This approach, while effective, is computationally demanding and time-consuming. In this paper, the distance is instead determined by Deep Learning (DL). Two DL architectures are compared: Convolutional Neural Network (CNN) and Vision Transformer (ViT). Both treat auto-focusing as a classification problem. Compared to a first attempt [11] in which the distance between two consecutive classes was 100 $\mu$m, our proposal drastically reduces this distance to 1 $\mu$m. Moreover, ViT reaches similar accuracy and is more robust than CNN.
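
The conventional auto-focusing step and the classification framing mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the stack of reconstructions has already been computed (the numerical propagation is not shown), it uses variance as the focus metric and assumes that the sharpest image is the one with the highest variance, and the names `variance_focus`, `autofocus`, `distance_to_class`, `class_to_distance`, and the lower bound `z_min_um` are hypothetical.

```python
from statistics import pvariance


def variance_focus(image):
    """Focus metric (assumption): variance of all pixel intensities."""
    return pvariance([pixel for row in image for pixel in row])


def autofocus(stack, distances, metric=variance_focus):
    """Return the distance whose reconstruction scores highest under `metric`.

    `stack[i]` is the image reconstructed at `distances[i]`; the
    reconstruction itself is assumed to have been done elsewhere.
    """
    scores = [metric(image) for image in stack]
    best = max(range(len(scores)), key=scores.__getitem__)
    return distances[best]


def distance_to_class(z_um, z_min_um, step_um=1.0):
    """Map a distance in micrometres to a class index (1 um spacing by default)."""
    return round((z_um - z_min_um) / step_um)


def class_to_distance(k, z_min_um, step_um=1.0):
    """Inverse mapping: class index back to a distance in micrometres."""
    return z_min_um + k * step_um


# Toy 2x2 "reconstructions": flat, low-contrast, and high-contrast.
stack = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.4, 0.6], [0.6, 0.4]],
    [[0.0, 1.0], [1.0, 0.0]],
]
distances_um = [100.0, 101.0, 102.0]  # hypothetical candidate distances

focal_um = autofocus(stack, distances_um)
label = distance_to_class(focal_um, z_min_um=100.0)
```

The search evaluates the metric once per candidate distance, which is why the conventional approach becomes expensive as the stack grows. In the classification framing, a network would instead predict `label` directly from the hologram, and `class_to_distance` would convert the prediction back to micrometres.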

Citations (28)
