DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization (2405.03436v1)
Abstract: Embedding invisible hyperlinks or hidden codes in images to replace QR codes has recently become a hot topic. This technology requires localizing the embedded region in captured photos before decoding. Existing methods that train models to find the invisible embedded region struggle to obtain accurate localization results, which degrades decoding accuracy. This limitation arises primarily because CNNs are sensitive to low-frequency signals, whereas the embedded signal typically takes a high-frequency form. Motivated by this observation, this paper proposes a Dual-Branch Dual-Head (DBDH) neural network tailored for precise localization of invisible embedded regions. Specifically, DBDH uses a low-level texture branch containing 62 high-pass filters to capture the high-frequency signals induced by embedding, and a high-level context branch to extract features that discriminate embedded regions from normal ones. DBDH employs a detection head to directly detect the four vertices of the embedded region. In addition, we introduce an extra segmentation head that segments the mask of the embedded region during training; it provides pixel-level supervision, facilitating better learning of the embedded signals. Based on two state-of-the-art invisible offline-to-online messaging methods, we construct two datasets together with augmentation strategies for training and testing localization models. Extensive experiments demonstrate the superior performance of the proposed DBDH over existing methods.
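The intuition behind the texture branch can be illustrated with a minimal numpy sketch. The 3x3 kernel below is one classic second-order high-pass residual filter of the kind used in steganalysis rich models; it is an assumption for illustration only, not necessarily one of the exact 62 filters DBDH employs. A smooth, low-frequency image produces a near-zero response, while a high-frequency "embedding-like" pattern survives the filtering — which is why such filters help a CNN notice the embedded signal.

```python
import numpy as np

# A classic SRM-style second-order high-pass residual kernel
# (illustrative assumption; DBDH uses a bank of 62 high-pass filters).
KV = np.array([[-1,  2, -1],
               [ 2, -4,  2],
               [-1,  2, -1]], dtype=np.float64) / 4.0

def high_pass_response(img):
    """Valid-mode 2D cross-correlation of img with KV."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * KV)
    return out

# A smooth (low-frequency) bilinear ramp yields an essentially zero residual...
smooth = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
# ...while a faint high-frequency checkerboard "embedding" pattern does not.
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 0.1

print(np.abs(high_pass_response(smooth)).max())            # ~0: smooth content is suppressed
print(np.abs(high_pass_response(smooth + checker)).max())  # large: embedding signal survives
```

In a real texture branch these kernels would be stacked as fixed (or lightly fine-tuned) convolution weights in the network's first layer, so every subsequent layer sees embedding residuals rather than image content.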